Posted by: David Greenwood, Chief of Signal

If you are reading this blog post via a third-party source, it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.

In this post I will introduce you to a few tools that will help you create and manage STIX 2.1 content.

In the last post, I showed examples of STIX objects. All of these were created using the cti-python-stix2 library.

The STIX 2 Python library from OASIS is a set of Python APIs that allow you to quickly start creating STIX 2.1 content. It is likely to be the tool you use most as a STIX 2.1 producer.

There is a wide range of tasks it can be used for. This post aims to cover the most common ones you are likely to want to perform.

To follow along with this tutorial, first clone our tutorial repository. Then let’s jump in feet first and create two different STIX objects: an SDO and an SCO.

First, I’ll create a venv to isolate our work and install the cti-python-stix2 library:

mkdir stix2_python_tutorial
python3 -m venv stix2_python_tutorial
source stix2_python_tutorial/bin/activate
pip3 install stix2

Creating SDOs and SCOs

Now let’s create a file called generate_sdo.py and use it to generate an Attack Pattern SDO.

# python3 generate_sdo.py
## Start by importing all the things you will need
### https://stix2.readthedocs.io/en/latest/api/v21/stix2.v21.sdo.html#stix2.v21.sdo.AttackPattern
### https://stix2.readthedocs.io/en/latest/api/stix2.v21.html?highlight=tlp#stix2.v21.TLPMarking

from stix2 import AttackPattern, TLP_GREEN

## Create the AttackPattern SDO

AttackPatternDemo = AttackPattern(
    created_by_ref="identity--d2916708-57b9-5636-8689-62f049e9f727",
    name="Spear Phishing",
    description="Used for tutorial content",
    object_marking_refs=[
        TLP_GREEN
    ]
)

## Print all the objects to the command line

print(AttackPatternDemo.serialize(pretty=True))

Running the command:

python3 generate_sdo.py
{
    "type": "attack-pattern",
    "spec_version": "2.1",
    "id": "attack-pattern--794709ca-2407-4da8-a6ec-e4b1e074a18d",
    "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
    "created": "2020-01-01T07:38:55.364693Z",
    "modified": "2020-01-01T07:38:55.364693Z",
    "name": "Spear Phishing",
    "description": "Used for tutorial content",
    "object_marking_refs": [
        "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"
    ]
}

Note how the library takes care of some of the required properties here: type, spec_version, id, created, and modified.

It’s important to review the STIX specification beforehand to ensure you’re passing the right properties, and the correct data types for these properties.

For example, try again, but this time set created_by_ref="Signals Corps Demo" in the above code. You’ll see an error like the following:

stix2.exceptions.InvalidValueError: Invalid value for AttackPattern 'created_by_ref': not a valid STIX identifier, must match <object-type>--<UUID>: Signals Corps Demo

The process to create an SCO is very similar.

I’ll start by creating a file called generate_sco.py:

# python3 generate_sco.py
## Start by importing all the things you will need
### IPv4 SCO https://stix2.readthedocs.io/en/latest/api/stix2.v21.html#stix2.v21.IPv4Address

from stix2 import IPv4Address

## Create the IPv4Address SCO

IPv4AddressDemo = IPv4Address(
    value="177.60.40.7"
)

## Print all the objects to the command line

print(IPv4AddressDemo.serialize(pretty=True))

Running the command:

python3 generate_sco.py
{
    "type": "ipv4-addr",
    "spec_version": "2.1",
    "id": "ipv4-addr--dc63603e-e634-5357-b239-d4b562bc5445",
    "value": "177.60.40.7"
}

A long (but important) note on ID generation

The STIX specification states:

All identifiers, excluding those used in the deprecated Cyber Observable Container, MUST follow the form object-type--UUID, where object-type is the exact value (all type names are lowercase strings, by definition) from the type property of the object being identified or referenced and where the UUID MUST be an RFC 4122-compliant UUID [RFC4122].

STIX Domain Objects, STIX Relationship Objects, STIX Meta Objects, and STIX Bundle Object SHOULD use UUIDv4 for the UUID portion of the identifier. Producers using something other than UUIDv4 need to be mindful of potential collisions and should use a namespace that guarantees uniqueness, however, they MUST NOT use a namespace of 00abedb4-aa42-466c-9c01-fed23315a9b7 if generating a UUIDv5.

As such, when using the STIX 2 Python library, random UUID v4s will be generated for SDOs, SROs, SMOs, and SBOs.

This means every time I run generate_sdo.py, a new UUID is generated. Try it.

As SCOs represent atomic objects that don’t change, they instead use UUID v5s, so the UUID persists no matter who generated the object or when. Thus all IPv4 SCOs with the value 1.1.1.1 should have the same ID.

This is useful because, when sharing STIX data, it is clear when two producers of intel are talking about the same thing: the IDs will be identical.

As per the STIX spec:

STIX Cyber-observable Objects SHOULD use UUIDv5 for the UUID portion of the identifier

using the following rules:

  • The namespace SHOULD be 00abedb4-aa42-466c-9c01-fed23315a9b7. This defined namespace is necessary to support the goal of deduplication and semantic equivalence of some STIX objects in the community of producers.
  • The value of the name portion SHOULD be the list of “ID Contributing Properties” (property-name and property value pairs) as defined on each SCO object and SHOULD be represented as a JSON object that is then serialized / stringified according to [RFC8785] to ensure a canonical representation of the JSON data.

Going back to the first post, remember that SCOs contain ID Contributing Properties? Take the Domain Name SCO specification…

STIX Domain SCO Properties

This means the ID here will be generated using the namespace 00abedb4-aa42-466c-9c01-fed23315a9b7 and the value property of the domain-name object.

The good news is, the STIX 2 Python library does this for us automatically, so I don’t have to worry about generating the UUID v5s for SCOs.

# python3 sco_uuid_demo.py
## Start by importing all the things you will need
### Domain name SCO https://stix2.readthedocs.io/en/latest/api/v21/stix2.v21.observables.html#stix2.v21.observables.DomainName

from stix2 import DomainName

## Create the DomainName SCO

DomainNameDemo = DomainName(
    value="google.com"
)

## Print all the objects to the command line

print(DomainNameDemo.serialize(pretty=True))
python3 sco_uuid_demo.py
{
    "type": "domain-name",
    "spec_version": "2.1",
    "id": "domain-name--dd686e37-6889-53bd-8ae1-b1a503452613",
    "value": "google.com"
}

Running it again:

python3 sco_uuid_demo.py
{
    "type": "domain-name",
    "spec_version": "2.1",
    "id": "domain-name--dd686e37-6889-53bd-8ae1-b1a503452613",
    "value": "google.com"
}

Let’s add another property that is not an ID contributing property (that is, a property that does not affect how the UUID portion of the id is generated):

# python3 sco_uuid_contributing_prop_demo.py
## Start by importing all the things you will need
### Domain name SCO https://stix2.readthedocs.io/en/latest/api/v21/stix2.v21.observables.html#stix2.v21.observables.DomainName

from stix2 import DomainName

## Create the DomainName SCO

DomainNameDemo = DomainName(
    value="google.com",
    resolves_to_refs=["ipv4-addr--dc63603e-e634-5357-b239-d4b562bc5445"]
)

## Print all the objects to the command line

print(DomainNameDemo.serialize(pretty=True))
python3 sco_uuid_contributing_prop_demo.py
{
    "type": "domain-name",
    "spec_version": "2.1",
    "id": "domain-name--dd686e37-6889-53bd-8ae1-b1a503452613",
    "value": "google.com",
    "resolves_to_refs": [
        "ipv4-addr--dc63603e-e634-5357-b239-d4b562bc5445"
    ]
}

See how the ID is still the same? That’s because only the value property is used to generate the UUIDv5; any other property has no effect on the id. Note that each SCO type defines its own ID contributing properties (some have more than one), and only those affect the UUIDv5.
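For intuition, here is a stdlib-only sketch of the derivation described by the spec rules quoted above: keep only the ID contributing properties, serialize them canonically, and hash them into a UUIDv5 under the SCO namespace. Two loud assumptions: the property table is a small illustrative subset of the spec, and canonical_json only approximates RFC 8785 for flat objects of strings. If this mirrors the library’s logic (an assumption, not verified here), the domain-name id it prints should match the one shown above.

```python
import json
import uuid

# Namespace the STIX 2.1 spec defines for SCO identifiers
SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

# Partial, illustrative subset of ID contributing properties per SCO type
ID_CONTRIBUTING_PROPERTIES = {
    "domain-name": ["value"],
    "ipv4-addr": ["value"],
    "mutex": ["name"],
    "software": ["name", "cpe", "swid", "vendor", "version"],
}


def canonical_json(obj):
    """Approximate RFC 8785 for flat dicts of strings (not a full JCS implementation)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sco_id(sco):
    """Return an id for a dict-based SCO using only its contributing properties."""
    sco_type = sco["type"]
    contributing = {
        key: sco[key]
        for key in ID_CONTRIBUTING_PROPERTIES[sco_type]
        if key in sco
    }
    if not contributing:
        # No contributing properties present: fall back to a random UUIDv4
        return f"{sco_type}--{uuid.uuid4()}"
    return f"{sco_type}--{uuid.uuid5(SCO_NAMESPACE, canonical_json(contributing))}"


plain = {"type": "domain-name", "value": "google.com"}
with_refs = dict(
    plain,
    resolves_to_refs=["ipv4-addr--dc63603e-e634-5357-b239-d4b562bc5445"],
)

print(sco_id(plain))
print(sco_id(with_refs))  # same id: resolves_to_refs is not a contributing property
```

Because the keys are sorted before hashing, two producers that build the same SCO with properties in a different order still end up with the same id.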

When generating SDOs, SROs, and SMOs, I occasionally use UUID v5s too (but UUID v4s are recommended by OASIS).

Using UUIDv5s means I can identify objects we generated from the id property alone, and I can control STIX IDs to meet our needs (e.g. giving an Indicator and a Vulnerability the same UUID portion of the id when they are directly coupled).

To do this, I generate the ID myself and pass it explicitly when creating the SDO, SRO, or SMO. For example, I could modify the Attack Pattern SDO example above to generate a UUIDv5 as follows:

# python3 generate_sdo_with_uuidv5.py
## Start by importing all the things you will need
import uuid

from uuid import UUID
from stix2 import AttackPattern, TLP_GREEN

## Set the uuid variables

namespace = UUID("d2916708-57b9-5636-8689-62f049e9f727")
value = "Some fixed value"
generated_id = "attack-pattern--" + str(uuid.uuid5(namespace, value))

## Create the AttackPattern SDO using the generated UUIDv5 id

AttackPatternUUID5Demo = AttackPattern(
    id=generated_id,
    created_by_ref="identity--d2916708-57b9-5636-8689-62f049e9f727",
    name="Spear Phishing",
    description="Used for tutorial content",
    object_marking_refs=[
        TLP_GREEN
    ]
)

## Print all the objects to the command line

print(AttackPatternUUID5Demo.serialize(pretty=True))
python3 generate_sdo_with_uuidv5.py
{
    "type": "attack-pattern",
    "spec_version": "2.1",
    "id": "attack-pattern--6b948b5a-3c09-5365-b48a-da95c3964cb5",
    "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
    "created": "2020-01-01T11:21:07.478851Z",
    "modified": "2020-01-01T11:21:07.478851Z",
    "name": "Spear Phishing",
    "description": "Used for tutorial content",
    "object_marking_refs": [
        "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"
    ]
}

Notice how each time the script is run, the ID of the Attack Pattern object generated is always the same (attack-pattern--6b948b5a-3c09-5365-b48a-da95c3964cb5).
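The same idea covers the coupled Indicator and Vulnerability case mentioned above: derive one UUIDv5 from a shared seed and reuse it under each object type. A sketch using only the standard library; coupled_ids is a hypothetical helper, the namespace reuses the identity UUID from the example above, and the CVE identifier is just an illustrative seed:

```python
import uuid

# Assumption: reuse the identity UUID from the example above as the namespace
NAMESPACE = uuid.UUID("d2916708-57b9-5636-8689-62f049e9f727")


def coupled_ids(seed, object_types):
    """Give several STIX object types the same UUIDv5 portion, derived from seed."""
    shared = uuid.uuid5(NAMESPACE, seed)
    return {object_type: f"{object_type}--{shared}" for object_type in object_types}


ids = coupled_ids("CVE-2021-44228", ["vulnerability", "indicator"])
print(ids["vulnerability"])
print(ids["indicator"])
```

Each returned string can then be passed as the id argument when creating the corresponding stix2 object, exactly as generated_id is used above.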

Intelligence changes over time, and STIX 2.1 versioning allows for the tracking and management of such changes.

SDOs, SROs, and Language Content SMOs

STIX SDOs, SROs, and Language Content SMOs can be versioned in order to update, add, or remove information, or to revoke the entire Object.

When creating a new version, the decision ultimately comes down to whether to keep the same Object (minor change), create a new Object (major change), or revoke the Object from circulation entirely.

First, it is important to realise that every STIX SDO, SRO, and Language Content SMO has three required Common Properties that are important for versioning:

  1. id
  2. created
  3. modified

In the STIX specification, OASIS mentions minor and major changes. In short, a minor change means the object keeps the same id, but other properties are modified. A major change creates an entirely new object.

However, the specification does not definitively define exactly what constitutes a minor or major change. Let me give you our take on it…

Generally minor changes are what I use. Intelligence evolves over time. New things are learned. This doesn’t usually require a new object. For example, I might want to update the description, or add a new property to reflect what I’ve learned.

In this case I just modify the object with these changes making sure the id and created values do not change, and the modified property reflects the time of the update.
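That rule is easy to check mechanically. A stdlib-only sketch that works on plain dicts (for example, json.loads of serialized objects); the helper names are hypothetical, and the sample values are taken from the update example later in this post:

```python
from datetime import datetime, timezone


def parse_stix_timestamp(ts):
    """Parse a STIX timestamp like 2020-01-01T07:38:55.364693Z (fraction optional)."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in ts else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)


def is_valid_minor_update(old, new):
    """True if new keeps old's id and created, and has a later modified."""
    return (
        new["id"] == old["id"]
        and parse_stix_timestamp(new["created"]) == parse_stix_timestamp(old["created"])
        and parse_stix_timestamp(new["modified"]) > parse_stix_timestamp(old["modified"])
    )


old = {
    "id": "attack-pattern--f6455edf-222b-48c3-8604-d672929cd40e",
    "created": "2020-01-01T08:28:30.688249Z",
    "modified": "2020-01-01T08:28:30.688249Z",
    "description": "Used for tutorial content",
}
new = dict(old, modified="2020-02-01T07:38:55.364693Z", description="new description")

print(is_valid_minor_update(old, new))  # True
```

Timestamps are compared as parsed datetimes rather than strings, because STIX allows different numbers of fractional-second digits for the same instant.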

As noted earlier, using UUIDv5s to control the IDs of SDOs, SROs, and SMOs is not required (and not recommended by OASIS). The STIX 2 library offers versioning methods that are useful for updating these objects while ensuring their IDs persist.

For example, let’s use the original Attack Pattern object I generated:

python3 generate_sdo.py
{
    "type": "attack-pattern",
    "spec_version": "2.1",
    "id": "attack-pattern--794709ca-2407-4da8-a6ec-e4b1e074a18d",
    "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
    "created": "2020-01-01T07:38:55.364693Z",
    "modified": "2020-01-01T07:38:55.364693Z",
    "name": "Spear Phishing",
    "description": "Used for tutorial content",
    "object_marking_refs": [
        "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"
    ]
}

To update this and ensure the ID persists, I can use the new_version function in the STIX 2 library as follows:

# python3 update_sdo.py
## Start by importing all the things you will need
### https://stix2.readthedocs.io/en/latest/api/v21/stix2.v21.sdo.html#stix2.v21.sdo.AttackPattern
### https://stix2.readthedocs.io/en/latest/api/stix2.v21.html?highlight=tlp#stix2.v21.TLPMarking

from stix2 import AttackPattern, TLP_GREEN, new_version

## Create the Attack Pattern SDO

AttackPatternDemo = AttackPattern(
    created_by_ref="identity--d2916708-57b9-5636-8689-62f049e9f727",
    name="Spear Phishing",
    description="Used for tutorial content",
    object_marking_refs=[
        TLP_GREEN
    ]
)

## Print all the objects to the command line

print(AttackPatternDemo.serialize(pretty=True))

## Update the Attack Pattern SDO

UpdatedAttackPatternDemo = new_version(
    AttackPatternDemo,
    description="new description")

## Print all the objects to the command line

print(UpdatedAttackPatternDemo.serialize(pretty=True))
python3 update_sdo.py
{
    "type": "attack-pattern",
    "spec_version": "2.1",
    "id": "attack-pattern--f6455edf-222b-48c3-8604-d672929cd40e",
    "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
    "created": "2020-01-01T08:28:30.688249Z",
    "modified": "2020-01-01T08:28:30.688249Z",
    "name": "Spear Phishing",
    "description": "Used for tutorial content",
    "object_marking_refs": [
        "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"
    ]
}
{
    "type": "attack-pattern",
    "spec_version": "2.1",
    "id": "attack-pattern--f6455edf-222b-48c3-8604-d672929cd40e",
    "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
    "created": "2020-01-01T08:28:30.688249Z",
    "modified": "2020-02-01T07:38:55.364693Z",
    "name": "Spear Phishing",
    "description": "new description",
    "object_marking_refs": [
        "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"
    ]
}

As you can see, the id and created properties persist, but the modified time changes.

One common scenario where a new Object must be created, thus denoting a major change, is when someone other than the creator (defined in the created_by_ref Property) wants to make a change to an Object.

Another scenario where a major change might be considered is when the original object contains a serious factual error.

Instead of updating the Object as in a minor change, a major change involves creating a new Object, as shown earlier in this post.

It’s also a good idea to:

  1. revoke the first object, if applicable
  2. link the objects together using a Relationship SRO so that there is a history of what has happened

Revoking an object is easy. SDOs, SROs, and Language Content SMOs have an optional common property, revoked. Setting it to true marks the object as no longer active, so it should not be considered.

On the second point, creating a Relationship object, I’ll come back to that in a bit.

What about versioning SCOs and Marking Definition SMOs?

SCOs and Marking Definition SMOs never contain the modified property.

However, there are of course many occasions where SCOs do need to be updated.

Generally, an SCO’s ID will not change on modification unless you modify an ID contributing property. For many SCO types, this is the value property. Changing an ID contributing property therefore always constitutes a major update, because it produces a new ID. Note that the revoked property does not exist for SCOs, but you will likely want to handle major updates to SCOs in the same way as for SDOs, et al.

When it comes to minor updates, things are slightly different, as there is no modified property to show which one is the latest version (and the id should always persist).

Thus I use the same approach as for major updates to update SCOs, using an SRO to link them with the property "relationship_type": "update-of".

Speaking of relationship objects…

Creating Relationship SROs

Continuing the example of a major update…

Let’s imagine I have two SDOs:

  1. malware--2f559518-c844-4c4e-bca3-cc97520c164a: original version of the object but it contains many serious errors
  2. malware--09d22009-b575-4880-889f-6c539157dbc7: the new version of the object (denoting the same Malware) with errors corrected.

Here I want to create a Relationship SRO to link these objects and describe the link between them (replaced-by). To create a Relationship SRO:

# python3 generate_sro.py
## Start by importing all the things you will need
### Relationship SRO https://stix2.readthedocs.io/en/latest/api/v21/stix2.v21.sro.html#stix2.v21.sro.Relationship
from stix2 import Relationship

## Create the Relationship SRO

RelationshipDemo = Relationship(
    created_by_ref="identity--d2916708-57b9-5636-8689-62f049e9f727",
    relationship_type="replaced-by",
    source_ref="malware--2f559518-c844-4c4e-bca3-cc97520c164a",
    target_ref="malware--09d22009-b575-4880-889f-6c539157dbc7"
)

## Print all the objects to the command line

print(RelationshipDemo.serialize(pretty=True))
python3 generate_sro.py
{
    "type": "relationship",
    "spec_version": "2.1",
    "id": "relationship--44ceafde-0027-45cc-bab9-b46c5e001ceb",
    "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
    "created": "2020-01-01T15:25:53.686507Z",
    "modified": "2020-01-01T15:25:53.686507Z",
    "relationship_type": "replaced-by",
    "source_ref": "malware--2f559518-c844-4c4e-bca3-cc97520c164a",
    "target_ref": "malware--09d22009-b575-4880-889f-6c539157dbc7"
}

Saving Objects to the FileSystemStore

In the examples so far I printed the Objects to the command line. However, it typically makes more sense to save them for reuse later rather than create them all in one go. For this I can use the STIX 2 FileSystemStore API.

Let’s now create an Identity SDO and store it in the filesystem:

# python3 generate_identity_sdo_store_fs.py
## Start by importing all the things you will need
### https://stix2.readthedocs.io/en/latest/api/v21/stix2.v21.sdo.html#stix2.v21.sdo.Identity
### https://stix2.readthedocs.io/en/latest/api/stix2.datastore.html
import os

from stix2 import Identity
from stix2 import FileSystemStore

## Create the Identity SDO (type is set automatically)

IdentityDemo = Identity(
    identity_class="organization",
    name="Example Corp."
)

## Write the object to the directory tmp/stix2_store
## (FileSystemStore expects the directory to already exist)

os.makedirs("tmp/stix2_store", exist_ok=True)
fs = FileSystemStore("tmp/stix2_store")
fs.add([IdentityDemo])
python3 generate_identity_sdo_store_fs.py

After running the script, you’ll find the object stored in the location defined, in my case tmp/stix2_store:

STIX filesystem store

For objects that contain a modified property, such as SDOs and SROs, you will see the objects stored in the following directory structure:

FILESYSTEM_DEFINED/
├── OBJECT_TYPE/
│   ├── OBJECT_ID
│   │   ├── MODIFIED_DATE_OF_OBJECT.json
│   │   └── MODIFIED_DATE_OF_OBJECT.json
│   └── OBJECT_ID
│       ├── MODIFIED_DATE_OF_OBJECT.json
│       └── MODIFIED_DATE_OF_OBJECT.json
└── OBJECT_TYPE/
    └── OBJECT_ID
        ├── MODIFIED_DATE_OF_OBJECT.json
        └── MODIFIED_DATE_OF_OBJECT.json

Those that do not contain a modified time, namely SCOs, will be stored as follows:

FILESYSTEM_DEFINED/
├── OBJECT_TYPE/
│   ├── OBJECT_ID
│   └── OBJECT_ID
└── OBJECT_TYPE/
    ├── OBJECT_ID
    └── OBJECT_ID

Again, to demonstrate:

# python3 generate_sco_in_fs.py
## Start by importing all the things you will need
### IPv4 SCO https://stix2.readthedocs.io/en/latest/api/stix2.v21.html#stix2.v21.IPv4Address
from stix2 import IPv4Address
from stix2 import FileSystemStore

## Create the IPv4Address SCO

IPv4AddressDemo = IPv4Address(
    value="177.60.40.7"
)

## Write the object to the directory tmp/stix2_store

fs = FileSystemStore("tmp/stix2_store")
fs.add([IPv4AddressDemo])

STIX filesystem store
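To check what a store contains without opening every folder, a small stdlib walker can list each saved object. This is a sketch that assumes the store writes one JSON object per file with a .json extension; list_store is a hypothetical helper, not part of stix2:

```python
import json
import os


def list_store(root="tmp/stix2_store"):
    """Yield (relative path, type, id) for every .json file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".json"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8") as f:
                obj = json.load(f)
            yield os.path.relpath(path, root), obj.get("type"), obj.get("id")


if __name__ == "__main__":
    # Prints nothing if the directory does not exist yet
    for rel_path, obj_type, obj_id in list_store():
        print(f"{obj_type:<16} {obj_id}  ({rel_path})")
```

For versioned objects you should see several files under the same id folder, one per modified timestamp, which makes version history easy to spot.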

Calling objects from the filesystem

In many cases you’ll want to retrieve existing objects from the filesystem and reuse them.

Let’s write some more objects to the filesystem:

# python3 store_objects_to_fs_for_recall.py
## Start by importing all the things you will need
from stix2 import AttackPattern, ThreatActor, TLP_GREEN
from stix2 import FileSystemStore

## Create the AttackPattern and ThreatActor SDOs (fixed ids so they can be retrieved later)

AttackPatternFSDemo = AttackPattern(
    id="attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61",
    created_by_ref="identity--d2916708-57b9-5636-8689-62f049e9f727",
    name="Spear Phishing",
    object_marking_refs=[
        TLP_GREEN
    ]
)

ThreatActorFSDemo = ThreatActor(
    id="threat-actor--db09d012-6be1-4c08-bd0e-15f6910f1758",
    created_by_ref="identity--d2916708-57b9-5636-8689-62f049e9f727",
    name="A bad guy",
    threat_actor_types=["sensationalist"],
    object_marking_refs=[
        TLP_GREEN
    ]
)

## Write the objects to the directory tmp/stix2_store

fs = FileSystemStore("tmp/stix2_store")
fs.add([AttackPatternFSDemo,ThreatActorFSDemo])

STIX filesystem store

Now that these are stored, I can retrieve them at any time from the filesystem as follows:

# python3 recall_objects_for_sro.py
from stix2 import Relationship
from stix2 import FileSystemStore

## Get required Objects previously saved to filesystem source
### https://stix2.readthedocs.io/en/latest/guide/filesystem.html#FileSystemSource

fs = FileSystemStore("tmp/stix2_store")

## Load them

AttackPatternInFS = fs.get("attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61")
ThreatActorInFS = fs.get("threat-actor--db09d012-6be1-4c08-bd0e-15f6910f1758")

RelationshipDemoUsingFS = Relationship(
    id="relationship--4a58e575-8ace-49b3-9137-26c76aaa25b8",
    created_by_ref="identity--d2916708-57b9-5636-8689-62f049e9f727",
    relationship_type="uses",
    source_ref=ThreatActorInFS,
    target_ref=AttackPatternInFS
)

## Write the Relationship to the directory tmp/stix2_store

fs.add([RelationshipDemoUsingFS])

And voila, I now have an SRO connecting them…

STIX filesystem store

However, the get method assumes I already know the ids of the objects I want, which clearly is not always the case…

Searching the filesystem

You can search the filesystem using a query:

# python3 query_filesystem.py
## https://stix2.readthedocs.io/en/latest/api/stix2.datastore.html#stix2.datastore.CompositeDataSource.query
## https://stix2.readthedocs.io/en/latest/api/datastore/stix2.datastore.filters.html?highlight=Filter

from stix2 import FileSystemStore, Filter

fs = FileSystemStore("tmp/stix2_store")

# Create a filter for the query
filter1 = Filter('name', '=', 'Spear Phishing')

# Perform the query using the filter
SpearPhishingSearch = fs.query([filter1])

# Print the results
for item in SpearPhishingSearch:
    print(item)

Running this returns all objects in the filesystem where the name property is Spear Phishing:

python3 query_filesystem.py
{"type": "attack-pattern", "spec_version": "2.1", "id": "attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61", "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727", "created": "2023-12-13T08:35:47.58147Z", "modified": "2023-12-13T08:35:47.58147Z", "name": "Spear Phishing", "object_marking_refs": ["marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"]}

Another useful way to search is to retrieve STIX Objects that have a Relationship involving the given STIX object. This can be done using related_to:

# python3 query_filesystem_related.py
## https://stix2.readthedocs.io/en/latest/api/stix2.datastore.html#stix2.datastore.CompositeDataSource.related_to
## https://stix2.readthedocs.io/en/latest/api/datastore/stix2.datastore.filters.html?highlight=Filter

from stix2 import FileSystemStore

fs = FileSystemStore("tmp/stix2_store")

# Find objects connected to the Attack Pattern by a Relationship
RelatedToSpearPhishing = fs.related_to("attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61")

# Print the results
for item in RelatedToSpearPhishing:
    print(item)
python3 query_filesystem_related.py
{"type": "threat-actor", "spec_version": "2.1", "id": "threat-actor--db09d012-6be1-4c08-bd0e-15f6910f1758", "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727", "created": "2023-12-13T08:35:47.581813Z", "modified": "2023-12-13T08:35:47.581813Z", "name": "A bad guy", "threat_actor_types": ["sensationalist"], "object_marking_refs": ["marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"]}

I can see the Threat Actor that is related to our Attack Pattern object (attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61).

Bundling everything together

If you remember back to last week’s post, STIX objects are usually bundled together to be shared.

Let’s bundle the objects previously stored in the filestore:

# python3 bundle_filesystem_objects.py
## https://stix2.readthedocs.io/en/latest/guide/filesystem.html#FileSystemSource
## https://stix2.readthedocs.io/en/latest/api/v21/stix2.v21.bundle.html#stix2.v21.bundle.Bundle

from stix2 import Bundle
from stix2 import FileSystemStore

fs = FileSystemStore("tmp/stix2_store")

## Get all Objects previously saved to filesystem source

AttackPatternInFS = fs.get("attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61")
ThreatActorInFS = fs.get("threat-actor--db09d012-6be1-4c08-bd0e-15f6910f1758")
RelationshipInFS = fs.get("relationship--4a58e575-8ace-49b3-9137-26c76aaa25b8")

BundleofAllObjects = Bundle(AttackPatternInFS,ThreatActorInFS,RelationshipInFS)

## Print the bundle

print(BundleofAllObjects.serialize(pretty=True))
python3 bundle_filesystem_objects.py
{
    "type": "bundle",
    "id": "bundle--ed31bd4b-46ab-4965-8796-12b4d5ad9fcc",
    "objects": [
        {
            "type": "attack-pattern",
            "spec_version": "2.1",
            "id": "attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61",
            "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
            "created": "2020-01-01T08:28:30.688249Z",
            "modified": "2020-01-01T08:28:30.688249Z",
            "name": "Spear Phishing",
            "object_marking_refs": [
                "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"
            ]
        },
        {
            "type": "threat-actor",
            "spec_version": "2.1",
            "id": "threat-actor--db09d012-6be1-4c08-bd0e-15f6910f1758",
            "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
            "created": "2020-01-01T08:28:30.688249Z",
            "modified": "2020-01-01T08:28:30.688249Z",
            "name": "A bad guy",
            "threat_actor_types": [
                "sensationalist"
            ],
            "object_marking_refs": [
                "marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da"
            ]
        },
        {
            "type": "relationship",
            "spec_version": "2.1",
            "id": "relationship--4a58e575-8ace-49b3-9137-26c76aaa25b8",
            "created_by_ref": "identity--d2916708-57b9-5636-8689-62f049e9f727",
            "created": "2020-01-01T08:28:30.688249Z",
            "modified": "2020-01-01T08:28:30.688249Z",
            "relationship_type": "uses",
            "source_ref": "threat-actor--db09d012-6be1-4c08-bd0e-15f6910f1758",
            "target_ref": "attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61"
        }
    ]
}

Note that you can’t store a bundle as-is in the FileSystemStore, so if you want to save one to disk, you’ll need a bit of code similar to the following:

# python3 bundle_filesystem_objects_to_fs.py
## https://stix2.readthedocs.io/en/latest/guide/filesystem.html#FileSystemSource
## https://stix2.readthedocs.io/en/latest/api/v21/stix2.v21.bundle.html#stix2.v21.bundle.Bundle

from stix2 import Bundle
from stix2 import FileSystemStore

fs = FileSystemStore("tmp/stix2_store")

## Get all Objects previously saved to filesystem source

AttackPatternInFS = fs.get("attack-pattern--b2c77df1-7aac-4b02-bdf1-6e71cb023d61")
ThreatActorInFS = fs.get("threat-actor--db09d012-6be1-4c08-bd0e-15f6910f1758")
RelationshipInFS = fs.get("relationship--4a58e575-8ace-49b3-9137-26c76aaa25b8")

BundleofAllObjects = Bundle(
    id="bundle--1534220d-dc40-465b-a9e2-1bb7af2f8a55",
    objects=[AttackPatternInFS,ThreatActorInFS,RelationshipInFS]
)

## Save the bundle as <bundle id>.json (it cannot be written to the FileSystemStore as-is)
## serialize() produces the same JSON the library prints elsewhere in this post

with open(BundleofAllObjects.id + ".json", "w") as f:
    f.write(BundleofAllObjects.serialize(pretty=True))

Database storage

When it comes to storing and retrieving STIX 2.1 data easily and efficiently at scale, a database is much better suited than the filestore.

There are a number of considerations when selecting a database to use.

The team at Sekoia did a great job detailing their decision to use ArangoDB to store STIX Objects.

ArangoDB has a few concepts that are useful for this:

  • Databases: A database contains Collections which hold documents and/or edges
  • Collections: There are two types of Collections
    1. Vertex Collections: contain documents on a graph
    2. Edge Collections: link documents from Vertex Collections
  • Documents: the data

Mapped onto STIX, Documents are the STIX objects (other than Relationship SROs), Vertex Collections are groupings of those objects, and Edge Collections are collections of Relationship objects.

With that in mind, I can start up ArangoDB as follows:

## Install
brew install arangodb
## Run
brew services start arangodb
## will now be accessible in a browser at: http://127.0.0.1:8529 . Default username is root with no password set (leave blank) 

Then log into the UI to create:

  1. Database = arango_stix_demo
  2. Vertex Collection = stix_objects
  3. Edge Collection = stix_relationships

Now those are defined I can start populating the database with some STIX 2.1 data.

Storing STIX 2.1 Objects in ArangoDB

In this example I will create two SDOs and link them together with an SRO (taken from this OASIS STIX example bundle).

ArangoDB provides its own query language, the ArangoDB Query Language (AQL), which will allow me to do this:

AQL is mainly a declarative language, meaning that a query expresses what result should be achieved but not how it should be achieved. AQL aims to be human-readable and therefore uses keywords from the English language.[…] Further design goals of AQL were the support of complex query patterns and the different data models ArangoDB offers.

The syntax of AQL queries is different to SQL, even if some keywords overlap. Nevertheless, AQL should be easy to understand for anyone with an SQL background.

In the queries section of the ArangoDB UI, I will start by creating two SDOs, one Indicator (indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f) and one Malware (malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b):

LET objects = [
  {
    "_key": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
    "type": "indicator",
    "id": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
    "created": "2016-04-06T20:03:48.000Z",
    "modified": "2016-04-06T20:03:48.000Z",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "labels": [
      "malicious-activity"
    ],
    "name": "Poison Ivy Malware",
    "description": "This file is part of Poison Ivy",
    "pattern": "[ file:hashes.'SHA-256' = '4bac27393bdd9777ce02453256c5577cd02275510b2227f473d03f533924f877' ]",
    "valid_from": "2016-01-01T00:00:00Z"
  },
  {
    "_key": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
    "type": "malware",
    "id": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
    "created": "2016-04-06T20:07:09.000Z",
    "modified": "2016-04-06T20:07:09.000Z",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "name": "Poison Ivy"
  }
]
FOR object IN objects
 INSERT object INTO stix_objects

In the query I use the STIX id attribute as the document primary key (_key) so I can easily retrieve the documents later.
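If you load objects from Python rather than typing AQL, the _key step can be done before insertion. A minimal sketch, assuming each STIX Object arrives as a plain dict (for example, parsed from a bundle with json.load); stix_to_document is a hypothetical helper name. With the python-arango driver, the resulting dicts could then be passed to a collection's insert_many method, which is not shown here.

```python
import copy


def stix_to_document(stix_object: dict) -> dict:
    """Hypothetical helper: reuse the STIX id as the ArangoDB primary key."""
    document = copy.deepcopy(stix_object)  # leave the original STIX object untouched
    document["_key"] = stix_object["id"]
    return document


malware = {
    "type": "malware",
    "id": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
    "created": "2016-04-06T20:07:09.000Z",
    "modified": "2016-04-06T20:07:09.000Z",
    "name": "Poison Ivy",
}
print(stix_to_document(malware)["_key"])
# prints malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b
```

Because the _key is the STIX id, inserting a later version of the same object (same id, newer modified) would fail on the primary key, which is why changing an existing object calls for an UPDATE rather than an INSERT.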

Now I need to store the SRO relationship between these Objects in the Edge Collection stix_relationships. Three attributes are set in the AQL query for this: _key, _from and _to. (Strictly, only _from and _to are required for an edge document; if _key is omitted, ArangoDB generates one.)

For the _key I am going to use the STIX SRO id, in the same way I did for the SDOs. For the _from and _to fields I will use the STIX SRO properties source_ref and target_ref respectively. The _from and _to fields take the format <collection>/<_key>, where <collection> is the Document Collection name (stix_objects) and <_key> is the _key of the SDO. Because I set each _key to the STIX Object id, the <_key> value is simply the STIX id of the source or target object.

To demonstrate, the AQL request for this example is as follows:

INSERT {
    "_key": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
    "_from": "stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
    "_to": "stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
    "type": "relationship",
    "id": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
    "created": "2016-04-06T20:06:37.000Z",
    "modified": "2016-04-06T20:06:37.000Z",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "relationship_type": "indicates",
    "source_ref": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
    "target_ref": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b"
} IN stix_relationships
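The same _key, _from and _to mapping can be done in Python before the edge is inserted. A minimal sketch, assuming SROs arrive as plain dicts and that every SDO/SCO lives in the stix_objects collection with its STIX id as _key; sro_to_edge is a hypothetical helper name.

```python
def sro_to_edge(sro: dict, vertex_collection: str = "stix_objects") -> dict:
    """Hypothetical helper: build an ArangoDB edge document from a STIX SRO."""
    if sro.get("type") != "relationship":
        raise ValueError("expected a STIX relationship object")
    edge = dict(sro)  # shallow copy keeps every STIX property on the edge
    edge["_key"] = sro["id"]
    edge["_from"] = f"{vertex_collection}/{sro['source_ref']}"  # <collection>/<_key>
    edge["_to"] = f"{vertex_collection}/{sro['target_ref']}"
    return edge


sro = {
    "type": "relationship",
    "id": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
    "created": "2016-04-06T20:06:37.000Z",
    "modified": "2016-04-06T20:06:37.000Z",
    "relationship_type": "indicates",
    "source_ref": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
    "target_ref": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
}
edge = sro_to_edge(sro)
print(edge["_from"])  # stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f
print(edge["_to"])    # stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b
```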

As discussed earlier, STIX Objects also contain embedded relationships, e.g. created_by_ref, sighting_of_ref, observed_data_refs, object_refs, etc.

An Object might have more than one embedded relationship property (identified by a property name ending in _ref or _refs). Embedded relationships can also contain lists of references. Here is an example of a Report SDO with three embedded relationships under the object_refs property:

{
    "type": "report",
    "spec_version": "2.1",
    "id": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "created": "2015-12-21T19:59:11.000Z",
    "modified": "2015-12-21T19:59:11.000Z",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "name": "The Black Vine Cyberespionage Group",
    "description": "A simple report with an indicator and campaign",
    "published": "2015-12-21T19:59:11.000Z",
    "report_types": [
        "campaign"
    ],
    "object_refs": [
        "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
        "campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
        "relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a"
    ]
}
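The _ref / _refs naming rule makes these properties easy to find programmatically. A minimal sketch, assuming the object is a plain dict and that only top-level properties are inspected (references nested inside extensions or granular_markings are ignored); embedded_refs is a hypothetical helper name.

```python
def embedded_refs(stix_object: dict) -> list:
    """Hypothetical helper: (property_name, target_id) pairs for each embedded reference."""
    pairs = []
    for prop, value in stix_object.items():
        if prop.endswith("_refs") and isinstance(value, list):
            # list-valued property: one pair per referenced object
            pairs.extend((prop, target) for target in value)
        elif prop.endswith("_ref") and isinstance(value, str):
            pairs.append((prop, value))
    return pairs


report = {
    "type": "report",
    "id": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "object_refs": [
        "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
        "campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
        "relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a",
    ],
}
for prop, target in embedded_refs(report):
    print(prop, target)
```

Applied to an SRO, this rule would also pick up source_ref and target_ref, which are already modelled by the SRO edge itself, so you may want to skip those two properties for relationship objects.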

These embedded relationships need to be represented as edges too, but in a slightly different way to SROs. In the previous examples I added pure STIX Objects to ArangoDB (exactly as they would be received). Embedded relationships, however, require a custom (non-STIX) object to define each relationship, and that object in turn has to be parsed out of the STIX Object containing the _ref or _refs property.

In our implementation, the _key of an embedded relationship edge is made of both STIX Object IDs in the relationship (the object containing the property first, then the referenced object), joined with a +. Each edge also carries a relationship_type property set to the name of the embedded relationship property in the STIX Object (e.g. created_by_ref or object_refs).

To demonstrate, first I create the Report SDO as a Document:

INSERT {
    "_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "type": "report",
    "spec_version": "2.1",
    "id": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "created": "2015-12-21T19:59:11.000Z",
    "modified": "2015-12-21T19:59:11.000Z",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "name": "The Black Vine Cyberespionage Group",
    "description": "A simple report with an indicator and campaign",
    "published": "2015-12-21T19:59:11.000Z",
    "report_types": [
        "campaign"
    ],
    "object_refs": [
        "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
        "campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
        "relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a"
    ]
} IN stix_objects

This Report SDO has four embedded relationships:

  • created_by_ref = identity--c2aceda2-0e46-431d-be40-7b4a4e797f81
  • object_refs = indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2
  • object_refs = campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c
  • object_refs = relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a

I will assume the referenced objects already exist in the stix_objects Document Collection.

As such, all that is required is to create the four embedded relationship edges in the stix_relationships Edge Collection.

LET embedded_relationships = [
{
    "_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3+identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "_from": "stix_objects/report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "_to": "stix_objects/identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "relationship_type": "created_by_ref"
},
{
    "_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3+indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "_from": "stix_objects/report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "_to": "stix_objects/indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "relationship_type": "object_refs"
},
{
    "_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3+campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
    "_from": "stix_objects/report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "_to": "stix_objects/campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
    "relationship_type": "object_refs"
},
{
    "_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3+relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a",
    "_from": "stix_objects/report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "_to": "stix_objects/relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a",
    "relationship_type": "object_refs"
}
]
FOR embedded_relationship IN embedded_relationships
 INSERT embedded_relationship INTO stix_relationships
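Writing these edges by hand gets tedious for larger objects. A minimal sketch that generates them from a STIX Object dict using the key scheme described above (source id + target id for _key, the property name as relationship_type), assuming top-level properties only and a single stix_objects vertex collection; embedded_relationship_edges is a hypothetical helper name.

```python
def embedded_relationship_edges(stix_object: dict,
                                vertex_collection: str = "stix_objects") -> list:
    """Hypothetical helper: one edge document per embedded _ref/_refs reference."""
    source_id = stix_object["id"]
    edges = []
    for prop, value in stix_object.items():
        if prop.endswith("_refs") and isinstance(value, list):
            targets = value
        elif prop.endswith("_ref") and isinstance(value, str):
            targets = [value]
        else:
            continue  # not an embedded relationship property
        for target_id in targets:
            edges.append({
                "_key": f"{source_id}+{target_id}",
                "_from": f"{vertex_collection}/{source_id}",
                "_to": f"{vertex_collection}/{target_id}",
                "relationship_type": prop,
            })
    return edges


report = {
    "type": "report",
    "id": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "object_refs": [
        "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
        "campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
        "relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a",
    ],
}
for edge in embedded_relationship_edges(report):
    print(edge["relationship_type"], edge["_to"])
```

One consequence of the source+target key scheme: if the same object is referenced from two different properties (say, an identity that is both the created_by_ref and listed in object_refs), both edges get the same _key, and the second INSERT would fail on the primary key.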

Now I can filter the results by relationship_type. For example, filtering the documents in the stix_relationships collection on relationship_type == created_by_ref returns only the edges created from created_by_ref properties.
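When this filter is run from code rather than the UI, it is safer to pass the value as a bind parameter than to paste it into the query string. A minimal sketch that only builds the AQL text and bind variables; the helper name edges_by_type_query is hypothetical. In AQL, @name binds a value and @@name binds a collection name (passed with a single @ in the bind variables).

```python
def edges_by_type_query(relationship_type: str,
                        edge_collection: str = "stix_relationships"):
    """Hypothetical helper: parameterised AQL returning edges of one relationship_type."""
    query = (
        "FOR edge IN @@collection "
        "FILTER edge.relationship_type == @relationship_type "
        "RETURN edge"
    )
    bind_vars = {
        "@collection": edge_collection,         # collection bind parameter
        "relationship_type": relationship_type,  # value bind parameter
    }
    return query, bind_vars


query, bind_vars = edges_by_type_query("created_by_ref")
print(query)
print(bind_vars)
```

With the python-arango driver, the pair could be run as db.aql.execute(query, bind_vars=bind_vars). Because the SRO edge also carries a relationship_type (indicates), the same query works for SROs too.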

I talked a bit about versioning earlier in this post. In ArangoDB this can be achieved using updates to documents. Let's start by creating a new object:

INSERT {
    "_key": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
    "type": "report",
    "id": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
    "created": "2021-01-01T00:00:00.000Z",
    "modified": "2021-01-01T00:00:00.000Z",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "name": "Demoing version",
    "published": "2021-01-01T00:00:00.000Z",
    "report_types": ["campaign"]
} IN stix_objects

Then I update it with the new properties:

UPDATE {
    "_key": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
    "type": "report",
    "id": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
    "created": "2021-01-01T00:00:00.000Z",
    "modified": "2022-01-01T00:00:00.000Z",
    "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
    "name": "Demoing version",
    "description": "Adding a new field",
    "published": "2021-01-01T00:00:00.000Z",
    "report_types": ["campaign"]
} IN stix_objects

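With each UPDATE to a document, ArangoDB assigns a new value to its _rev (Document Revision) property. Note that _rev only identifies the current revision: ArangoDB does not keep the previous version of the document, so the UPDATE above overwrites the first version of the Report.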
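Before sending an UPDATE like the one above, the new version of the object has to be built. If you are working with stix2 library objects, their new_version() method handles this; the minimal sketch below is a dict-based equivalent. It keeps id and created unchanged, requires the new modified timestamp to be later than the old one, and refuses to change a small set of properties (the UNMODIFIABLE tuple is an assumption, mirroring the properties the stix2 library treats as unmodifiable). new_version here is a hypothetical helper, not the stix2 method.

```python
import copy
from datetime import datetime, timezone
from typing import Optional

# Assumed list of properties that must not change between versions
UNMODIFIABLE = ("id", "type", "created", "created_by_ref")


def parse_timestamp(ts: str) -> datetime:
    # Replace the trailing Z with an explicit offset so fromisoformat()
    # also accepts it on Python versions older than 3.11
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def format_timestamp(dt: datetime) -> str:
    # Millisecond precision with a trailing Z, e.g. 2022-01-01T00:00:00.000Z
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def new_version(stix_object: dict, modified: Optional[str] = None, **changes) -> dict:
    """Hypothetical helper: return the next version of a STIX object dict."""
    for prop in UNMODIFIABLE:
        if prop in changes:
            raise ValueError(f"{prop} must not change between versions")
    updated = copy.deepcopy(stix_object)
    updated.update(changes)
    updated["modified"] = modified or format_timestamp(datetime.now(timezone.utc))
    if parse_timestamp(updated["modified"]) <= parse_timestamp(stix_object["modified"]):
        raise ValueError("modified must be later than the previous version")
    return updated


report_v1 = {
    "type": "report",
    "id": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
    "created": "2021-01-01T00:00:00.000Z",
    "modified": "2021-01-01T00:00:00.000Z",
    "name": "Demoing version",
}
report_v2 = new_version(report_v1, modified="2022-01-01T00:00:00.000Z",
                        description="Adding a new field")
print(report_v2["modified"], report_v2["description"])
```

The returned dict (with _key set to its id) is what the UPDATE query above writes back to stix_objects.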

Querying STIX 2.1 Objects in ArangoDB

Now that these STIX Objects and relationships have been written, I can start to explore the database.

The ArangoDB UI contains a graph visualisation tool that is useful for browsing visually (Graphs > Your Graph).

To use it, I first need to create a Graph named stix_graph which contains:

  • stix_relationships as the Edge Definition (an edge definition defines a relation of the graph)
  • stix_objects for both the fromCollections (Collections that contain the start vertices of the relation) and toCollections (Collections that contain the end vertices of the relation)
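The same named graph can also be created without the UI, through ArangoDB's named-graph ("gharial") HTTP API. A minimal sketch, assuming the database and collections created earlier, the default root user with an empty password, and a server at http://127.0.0.1:8529; graph_definition and create_graph are hypothetical helper names, and create_graph is defined but not called here.

```python
import base64
import json
import urllib.request


def graph_definition(name="stix_graph", edge_collection="stix_relationships",
                     vertex_collection="stix_objects") -> dict:
    """Hypothetical helper: JSON body describing the named graph."""
    return {
        "name": name,
        "edgeDefinitions": [{
            "collection": edge_collection,
            "from": [vertex_collection],  # fromCollections in the UI
            "to": [vertex_collection],    # toCollections in the UI
        }],
    }


def create_graph(base_url="http://127.0.0.1:8529", db_name="arango_stix_demo",
                 username="root", password=""):
    """Hypothetical helper: POST the graph definition to a running ArangoDB."""
    request = urllib.request.Request(
        f"{base_url}/_db/{db_name}/_api/gharial",
        data=json.dumps(graph_definition()).encode("utf-8"),
        method="POST",
    )
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(json.dumps(graph_definition(), indent=2))
```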

Here is what the stix_graph looks like for the Objects I have added to the database:

ArangoDB Graph UI STIX example

As you can see from the first example, the two objects in the stix_objects Document Collection are represented as individual nodes on the graph, and the relationship in the stix_relationships Edge Collection as an edge connecting them.

This is about the simplest example of a STIX Object graph. Nodes (STIX SDOs/SCOs) can have multiple edges (SROs and embedded relationships), making for much more complex graphs.

In these cases, you will likely want to filter the information being returned using AQL queries. AQL is well documented on the ArangoDB website, so I will only cover some basic queries here.

For example, here is a query against the first example that gets the IDs (RETURN item.id) of the five (LIMIT 5) most recently created Objects (SORT item.created DESC) of type "indicator" (FILTER item.type == "indicator") in the stix_objects collection.

FOR item IN stix_objects
    FILTER item.type == "indicator"
    SORT item.created DESC
    LIMIT 5
    RETURN item.id

Which returns:

[
  "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f"
]
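To make each clause concrete, here is a minimal Python sketch that mirrors FILTER, SORT ... DESC, LIMIT and RETURN over a list of STIX Object dicts held in memory. It is an illustration of the query's logic, not how ArangoDB executes it; latest_ids is a hypothetical helper name. The sketch parses timestamps rather than comparing strings, so values with different fractional-second precision (e.g. 2016-01-01T00:00:00Z and 2016-04-06T20:03:48.000Z) still sort chronologically.

```python
from datetime import datetime


def parse_timestamp(ts: str) -> datetime:
    # Replace the trailing Z so fromisoformat() accepts it on older Pythons
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_ids(objects, stix_type="indicator", limit=5):
    """Hypothetical helper: ids of the most recently created objects of a type."""
    matching = [o for o in objects if o.get("type") == stix_type]         # FILTER
    matching.sort(key=lambda o: parse_timestamp(o["created"]), reverse=True)  # SORT ... DESC
    return [o["id"] for o in matching[:limit]]                              # LIMIT + RETURN


stix_objects = [
    {"type": "indicator", "id": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
     "created": "2016-04-06T20:03:48.000Z"},
    {"type": "malware", "id": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
     "created": "2016-04-06T20:07:09.000Z"},
]
print(latest_ids(stix_objects))
# prints ['indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f']
```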

Earlier I showed you the out-of-the-box graph view for the Graph stix_graph, which I created in the UI. That view is built from an automatically generated graph traversal query. It is also possible to write these queries yourself, to customise the nodes and edges returned and modelled in the graph. For example:

FOR vertex, edge, path IN 1..5
    OUTBOUND 'stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f'
    GRAPH 'stix_graph'
    RETURN path

Here is how this query is formed;

  • FOR takes three variables: the current vertex in the traversal, the current edge, and the current path. The path contains two members, vertices and edges. Here I return the matching paths, which is why the result contains edges and vertices keys.
  • IN 1..5 specifies the minimum and maximum depth of the traversal. A minimum depth of 0 would also return the start vertex itself.
  • OUTBOUND specifies the direction to follow. Possible values are OUTBOUND, INBOUND or ANY. The object I used as the start vertex is the source of the relationship, so INBOUND would not return any results in this example.
  • 'stix_objects/<_key>' defines the vertex the traversal starts from.
  • GRAPH 'stix_graph' identifies the named graph to use for the traversal.

Running the query gives the following JSON output (which can also be rendered as a graph):

[
  {
    "edges": [
      {
        "_key": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
        "_id": "stix_relationships/relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
        "_from": "stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
        "_to": "stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
        "_rev": "_ep_8O9C---",
        "type": "relationship",
        "id": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
        "created": "2016-04-06T20:06:37.000Z",
        "modified": "2016-04-06T20:06:37.000Z",
        "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
        "relationship_type": "indicates",
        "source_ref": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
        "target_ref": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b"
      }
    ],
    "vertices": [
      {
        "_key": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
        "_id": "stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
        "_rev": "_ep_1Eeq---",
        "type": "indicator",
        "id": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
        "created": "2016-04-06T20:03:48.000Z",
        "modified": "2016-04-06T20:03:48.000Z",
        "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
        "labels": [
          "malicious-activity"
        ],
        "name": "Poison Ivy Malware",
        "description": "This file is part of Poison Ivy",
        "pattern": "[ file:hashes.'SHA-256' = '4bac27393bdd9777ce02453256c5577cd02275510b2227f473d03f533924f877' ]",
        "valid_from": "2016-01-01T00:00:00Z"
      },
      {
        "_key": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
        "_id": "stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
        "_rev": "_ep_1Eeq--_",
        "type": "malware",
        "id": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
        "created": "2016-04-06T20:07:09.000Z",
        "modified": "2016-04-06T20:07:09.000Z",
        "created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
        "name": "Poison Ivy"
      }
    ]
  }
]
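To make the traversal semantics above concrete (minimum and maximum depth, direction, and a path made of vertices and edges), here is a minimal in-memory sketch of a depth-first path enumeration over edge dicts with _from and _to. It is an illustration under stated assumptions, not ArangoDB's traversal engine: each edge is used at most once per path, missing vertices appear as None, and result ordering may differ from AQL's. traverse is a hypothetical helper name.

```python
def traverse(start, edges, vertices, min_depth=1, max_depth=5, direction="OUTBOUND"):
    """Hypothetical helper: paths from start with length in [min_depth, max_depth]."""
    def steps(vertex_id):
        # Yield (edge, next vertex id) pairs allowed by the chosen direction
        for edge in edges:
            if direction in ("OUTBOUND", "ANY") and edge["_from"] == vertex_id:
                yield edge, edge["_to"]
            if direction in ("INBOUND", "ANY") and edge["_to"] == vertex_id:
                yield edge, edge["_from"]

    paths = []

    def walk(vertex_ids, path_edges):
        if len(path_edges) >= min_depth:
            paths.append({
                "vertices": [vertices.get(v) for v in vertex_ids],
                "edges": list(path_edges),
            })
        if len(path_edges) == max_depth:
            return
        used = {e["_key"] for e in path_edges}  # an edge appears once per path
        for edge, next_id in steps(vertex_ids[-1]):
            if edge["_key"] not in used:
                walk(vertex_ids + [next_id], path_edges + [edge])

    walk([start], [])
    return paths


vertices = {
    "stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f": {"type": "indicator"},
    "stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b": {"type": "malware"},
}
edges = [{
    "_key": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
    "_from": "stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
    "_to": "stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
}]
paths = traverse("stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f", edges, vertices)
print(len(paths), [v["type"] for v in paths[0]["vertices"]])
# prints 1 ['indicator', 'malware']
```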

As the graph grows, AQL queries like these let you pull out exactly the STIX 2.1 objects and relationship structures you need.

Creating STIX Patterns for Indicator SDOs

Indicators contain a pattern that can be used to detect suspicious or malicious cyber activity.

In the next post I will take a deep dive into STIX Indicator SDOs, introducing patterns and how they can be constructed.


STIX 2.1 Certification (Virtual and In Person)

The content used in this post is a small subset of our full training material used in our STIX 2.1 training.

If you want to join a select group of certified STIX 2.1 professionals, subscribe to our newsletter below to be notified of new course dates.



