If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.
In this post I will show you how Signals Corps products use ArangoDB to store STIX 2.1 Objects.
As I demonstrated in our STIX 2.1 tutorial, STIX is a very comprehensive way to represent cyber threat intelligence. Not only does it offer a lot of options to model data, it is also a widely used standard.
However, both these factors bring downsides. Being a defined standard means I cannot be too creative (to ensure maximum downstream compatibility). Similarly the flexibility to model relationships and connect STIX Objects can make for a complex graph of information.
When it comes to storing and retrieving STIX 2.1 data in an easy and efficient manner there are number of considerations when selecting a database to use.
In building Signal Corps products (which are all built around the STIX 2.1 specification), I reviewed various relational databases, document-oriented databases, graph databases and hybrid graph/document databases.
I finally settled on ArangoDB. ArangoDB is a native multi-model database system supporting three data models: key/value, documents and graphs.
The team at Sekoia went through a similar process and have done a great job detailing their decision to use ArangoDB to store STIX Objects too.
In this post, I will show you how to get started storing and retrieving STIX 2.1 Objects using ArangoDB.
You can download ArangoDB here.
For this tutorial, I will be heavily using the the ArangoDB Web UI to demonstrate the concepts.
Storing STIX 2.1 Objects
Firstly, I need to create two Collections;
- A Document Collection
stix_objects
to store the SDOs and SCOs. Document Collections are used to store vertex documents. - An Edge Collection
stix_relationships
to store the SROs and embedded relationships (e.g.object_refs
). Edges are special documents used for connecting other documents into a graph. An edge describes the connection between two documents using the internal attributes:_from
and_to
.
Using these two collections I will also need a Graph, stix_graph
:
stix_relationships
as the Edge Definition (edge definition define a relation of the graph)stix_objects
for both thefromCollections
(Collections that contain the start vertices of the relation) andtoCollections
(Collections that contain the end vertices of the relation)
Now the Collections and Graph are defined I can start populating the database with some STIX 2.1 data.
In this example I will create two SDOs and link them together with an SRO (taken from this OASIS STIX example bundle)
ArangoDB provides its own query language named Arango Query Language (AQL) which will allow me to do this:
AQL is mainly a declarative language, meaning that a query expresses what result should be achieved but not how it should be achieved. AQL aims to be human-readable and therefore uses keywords from the English language.[…] Further design goals of AQL were the support of complex query patterns and the different data models ArangoDB offers.
The syntax of AQL queries is different to SQL, even if some keywords overlap. Nevertheless, AQL should be easy to understand for anyone with an SQL background.
I will start by creating two SDOs, one Indicator (indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f
) and one Malware (malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b
);
LET objects = [
{
"_key": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"type": "indicator",
"id": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"created": "2016-04-06T20:03:48.000Z",
"modified": "2016-04-06T20:03:48.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"labels": [
"malicious-activity"
],
"name": "Poison Ivy Malware",
"description": "This file is part of Poison Ivy",
"pattern": "[ file:hashes.'SHA-256' = '4bac27393bdd9777ce02453256c5577cd02275510b2227f473d03f533924f877' ]",
"valid_from": "2016-01-01T00:00:00Z"
},
{
"_key": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
"type": "malware",
"id": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
"created": "2016-04-06T20:07:09.000Z",
"modified": "2016-04-06T20:07:09.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"name": "Poison Ivy"
}
]
FOR object IN objects
INSERT object INTO stix_objects
In the query I use the STIX id
attribute as the document primary key (_key
) so I can easily retrieve the documents later.
As you can see in the screenshot above, this successfully executes two writes to the stix_objects
Document Collection.
Now I need to store the SRO relationship between these Objects in the Edge Collection stix_relationships
. Three attributes must be set in the AQL query for this: _key
, _from
and _to
.
For the _key
I am going to use STIX SRO ID in the same way I did for the SDOs. For the _from
and _to
field I will use the STIX SRO properties source_ref
and target_ref
respectively. The format to use for the _from
and _to
fields is as follows: <collection>/<_key>
. Where <collection>
is the Collection is the Document Collection name (stix_objects
) and the <_key>
is the _key
SDO ID (I set _key
as the STIX Object id
, so the _key
value is equal to the STIX Object id
)
To demonstrate, the AQL request for this example is as follow:
INSERT {
"_key": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad+2016-04-06T20:06:37.000Z",
"_from": "stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"_to": "stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
"type": "relationship",
"id": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
"created": "2016-04-06T20:06:37.000Z",
"modified": "2016-04-06T20:06:37.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81"
"relationship_type": "indicates",
"source_ref": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"target_ref": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b"
} IN stix_relationships
STIX also has embedded relationships defined in each SDO, e.g. created_by_ref
, sighting_of_ref
, observed_data_refs
, object_refs
, etc.
An SDO might have more than one embedded relationship property (identified where the property name ends with _refs
. Embedded relationships can also contains lists of relationships. Here is an example of a Report SDO with three embedded relationships under the objects_refs
property:
{
"type": "report",
"spec_version": "2.1",
"id": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
"created": "2015-12-21T19:59:11.000Z",
"modified": "2015-12-21T19:59:11.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"name": "The Black Vine Cyberespionage Group",
"description": "A simple report with an indicator and campaign",
"published": "2015-12-21T19:59:11.000Z",
"report_types": ["campaign"],
"object_refs": [
"indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
"campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
"relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a"
]
}
These embedded relationships need to be represented as edges too, but in a slightly different way to native STIX Objects. In the previous examples I added pure STIX Objects into ArangoDB (as they would be received), however, embedded relationships require a custom object (non STIX Object) to define the relationship that in turn needs to be parsed out of a STIX Object.
In our implementation, embedded relationship edges have the _key
containing both the STIX Object IDs for the embedded relationship, joined with a +
. The edges also contain two unique properties; 1) type
which is always embedded-relationship
, and 2) relationship_description
property which is equal to the property in the STIX Object.
Let me demonstrate for clarity. First I would create the Report SDO as a Document;
INSERT {
"_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
"type": "report",
"id": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
"created": "2015-12-21T19:59:11.000Z",
"modified": "2015-12-21T19:59:11.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"name": "The Black Vine Cyberespionage Group",
"description": "A simple report with an indicator and campaign",
"published": "2015-12-21T19:59:11.000Z",
"report_types": ["campaign"],
"object_refs": [
"indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
"campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
"relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a"
]
} IN stix_objects
This report has four embedded relationships;
created_by_ref
:identity--c2aceda2-0e46-431d-be40-7b4a4e797f81
object_refs
:indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2
object_refs
:campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c
object_refs
:relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a
I will assume these SDOs already exist in the stix_objects
document collection.
As such, all that that is required is to create the four embedded relationship edges in the stix_relationships
Edge Collection.
LET embedded_relationships = [
{
"_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3+indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
"_from": "stix_objects/report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
"_to": "stix_objects/indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
"type": "embedded-relationship",
"relationship_description": "created_by_ref",
"deprecated" false
},
{
"_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3+identity--a463ffb3-1bd9-4d94-b02d-74e4f1658283",
"_from": "stix_objects/report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
"_to": "stix_objects/identity--a463ffb3-1bd9-4d94-b02d-74e4f1658283",
"type": "embedded-relationship",
"relationship_description": "object_refs",
"deprecated" false
},
{
"_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3+campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
"_from": "stix_objects/report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
"_to": "stix_objects/campaign--83422c77-904c-4dc1-aff5-5c38f3a2c55c",
"type": "embedded-relationship",
"relationship_description": "object_refs",
"deprecated" false
},
{
"_key": "report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3+relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a",
"_from": "stix_objects/report--84e4d88f-44ea-4bcd-bbf3-b2c1c320bcb3",
"_to": "stix_objects/relationship--f82356ae-fe6c-437c-9c24-6b64314ae68a",
"type": "embedded-relationship",
"relationship_description": "object_refs",
"deprecated" false
}
]
FOR embedded_relationship IN embedded_relationships
INSERT embedded_relationship INTO stix_relationships
Now I can filter the graph by type
, either equal to relationship
for SROs or embedded-relationship
for relationships embedded in STIX SDOs.
As described in the 105 Versioning tutorial, STIX 2.1 Objects can go through minor updates. A minor update can contain updated, removed or new property field/values.
With each UPDATE to a document, ArangoDB tracks its revision under a _rev
(Document Revision) property. However, relying on this alone does not allow for the ability to easily track revision history (e.g. for an audit trail of how the document changed).
To ensure an audit trail is kept, we can add a new fields _version
where the property is the time of INSERT/UPDATE. This means the latest version of the object will always have the highest time.
For example, lets start create a new report object with the _version
key;
INSERT {
"_key": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
"_version": "2020-01-01T00:00:00.000Z",
"type": "report",
"id": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
"created": "2021-01-01T00:00:00.000Z",
"modified": "2021-01-01T00:00:00.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"name": "Demoing version",
"published": "2021-01-01T00:00:00.000Z",
"report_types": ["campaign"]
} IN stix_objects
Then I will update it, with a new _version
time, along with the additional fields;
UPDATE {
"_key": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
"_version": "2022-04-08T09:23:23.000Z",
"type": "report",
"id": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
"created": "2021-01-01T00:00:00.000Z",
"modified": "2022-01-01T00:00:00.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"name": "Demoing version",
"description": "Adding a new field",
"published": "2021-01-01T00:00:00.000Z",
"report_types": ["campaign"]
} IN stix_objects
So now I have two revisions (_rev
s) of the object with "_key": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1"
. By default the latest revision will be returned. Though there’s a problem with this approach;
On disk, it is therefore possible that multiple revisions of the same document (as identified by the same
_key
value) exist at the same time. However, stale revisions are not accessible.
Source: ArangoDB docs
Therefore a new object needs to be created to ensure the old STIX objects for the ID are still accessible.
In addition to making the update we need to write a new object to the database to represent the old object version that has just been overwritten. To do this, the key can take the form id+new_version_value
.
For example, in the above example, the original version of the object would be recreated as follows;
INSERT {
"_key": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1+2022-04-08T09:23:23.000Z",
"_version": "2020-01-01T00:00:00.000Z",
"type": "report",
"id": "report--02ee5fc1-6321-4007-b6b8-c3c5c8d5e1a1",
"created": "2021-01-01T00:00:00.000Z",
"modified": "2021-01-01T00:00:00.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"name": "Demoing version",
"published": "2021-01-01T00:00:00.000Z",
"report_types": ["campaign"]
} IN stix_objects
This same logic can be applied to any object, including relationships (because the latest object ID will always be the same).
Similarly, if any new embedded relationship (_ref
or _refs
) properties are added to the updated object in the Document Collection, these new relationships will be created in ArangoDB in the same way as described earlier.
If an embedded relationship property is removed, the embedded relationship object in the edge collection is updated so that the deprecated
property equals true
.
Historic STIX 2.1 Relationships Objects (NOT embedded relationships) in the Edge Collection do not need to be updated as they should always point to the latest version but retain their original created
and modified
times. If a user wants to add, update or remove these, they should do so explicitly using a separate UPDATE.
Querying STIX 2.1 Objects
Now these STIX Objects and relationships have successfully been written, I can start to explore the database.
The ArangoDB UI contains a graph visualisation tool that is useful for browsing visually (Graphs > Your Graph). Here is what the stix_graph
I created earlied looks like for the Objects I have added to the database;
As you can see from the first example, the two stix_objects
in the Document Collection are represented as individual nodes on the graph, and the stix_relationships
in the Edge Collection as an edge.
This is about the simplest example of a STIX Object Graph. Nodes (STIX SDO/SCOs) can have multiple edges (SROs) making for much more complex graphs. Similarly, relationships can be defined in lists in STIX SDO properties, e.g. object_refs
).
In these cases, it is likely you will want to filter the information being returned using AQL queries. AQL is really well documented on the ArangoDB website, but I will cover some basic queries for this.
For example, here is a query using the first example to get the ID’s ( RETURN item.id
) five (LIMIT 5
) most recently created Objects (SORT item.created DESC
) of type “indicator” (FILTER item.type == "indicator"
) in the stix_objects
collection.
FOR item IN stix_objects
FILTER item.type == "indicator"
SORT item.created DESC
LIMIT 5
RETURN item.id
Which returns;
[
"indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f"
]
Before I showed you the out-of-the-box graph view using the UI query builder (that I used to create the Graph stix_graph
) – this created an automatically generated graph traversal query. It is also possible to write these queries to customise the nodes and edges returned in the graph to modelled. For example;
FOR vertex, edge, path IN 1..5
OUTBOUND 'stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f'
GRAPH 'stix_graph'
RETURN path
Here is how this query is formed;
- The
FOR
takes three variables, thevertex
in the traversal, theedge
in the traversal and the currentpath
. The current path contains two members, vertices and edges. Here I am asking to return the paths matching the request that’s why I can find the edges and vertices keys in the result. IN
1..5
specifies the minimal and maximal depth for the traversal. 0 would have been a traversal starting from the original vertex.OUTBOUND
specifies the direction to follow Possible values areOUTBOUND
ORINBOUND
ORANY
. The object I used as the original vertex for the traversal is the source in the relationship object. For this reasonINBOUND
would no return any result in our example.stix_objects/{_key}
defines the vertex where the traversal originates from.GRAPH stix_graph
is the name identifying the named graph to use for the traversal.
Running the query gives us the following json output (which can also be modelled as a graph):
[
{
"edges": [
{
"_key": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
"_id": "stix_relationships/relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
"_from": "stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"_to": "stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
"_rev": "_ep_8O9C---",
"type": "relationship",
"id": "relationship--44298a74-ba52-4f0c-87a3-1824e67d7fad",
"created": "2016-04-06T20:06:37.000Z",
"modified": "2016-04-06T20:06:37.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"relationship_type": "indicates",
"source_ref": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"target_ref": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b"
}
],
"vertices": [
{
"_key": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"_id": "stix_objects/indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"_rev": "_ep_1Eeq---",
"type": "indicator",
"id": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
"created": "2016-04-06T20:03:48.000Z",
"modified": "2016-04-06T20:03:48.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"labels": [
"malicious-activity"
],
"name": "Poison Ivy Malware",
"description": "This file is part of Poison Ivy",
"pattern": "[ file:hashes.'SHA-256' = '4bac27393bdd9777ce02453256c5577cd02275510b2227f473d03f533924f877' ]",
"valid_from": "2016-01-01T00:00:00Z"
},
{
"_key": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
"_id": "stix_objects/malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
"_rev": "_ep_1Eeq--_",
"type": "malware",
"id": "malware--31b940d4-6f7f-459a-80ea-9c1f17b5891b",
"created": "2016-04-06T20:07:09.000Z",
"modified": "2016-04-06T20:07:09.000Z",
"created_by_ref": "identity--c2aceda2-0e46-431d-be40-7b4a4e797f81",
"name": "Poison Ivy"
}
]
}
]
As the graphs grow, these AQL queries prove very efficient at returning STIX 2.1 data with complex relationship structures.
Hopefully this post has given those building cyber threat intelligence tools who are new to ArangoDB enough information to give them a few ideas, and avoid making some of the errors I have in the past – it can be complex to store and query STIX 2.1 Objects in regular databases without degrading performance.
If you want to dive a bit deeper, I suggest taking a look at the tutorials in the ArangoDB documentation which helped me a lot to create this post.
I am still learning ArangoDB but you have any questions about the content covered in these posts (or have feedback), please do not hesitate to drop me a message on Slack.
CONGRATULATIONS!
You have made it to the end of this short course.
Hopefully the last three months have given you a deeper understanding of modelling cyber threat intelligence using STIX 2.1 and a few tools that will help you turn this theory into reality.
Here are some useful links to bookmark following this course, some I have covered, some I have not, that you I find useful when working with STIX 2.1:
- The STIX 2.1 Specification will be your number one resource for looking information up
- Alongside the specification, the Python APIs for STIX 2.1 will prove very handy
- Not using cti-python-stix2? To make sure any STIX Objects you create or modify conform to the STIX 2.1 specification, you can use OASIS’s STIX Object Validator.
- The STIX Pattern Validator is also useful when creating Patterns outside of cti-python-stix2
- And the STIX Pattern Matcher compares STIX Observed Data content against patterns used in STIX Indicators for detections
- For visualising STIX Objects I prefer to use (including in this post), STIX View
- I should also note OASIS also offer their own STIX Viewer
If you have any questions about the content in this tutorial, please do not hesitate to drop us a message on Slack.
STIX 2.1 Certification (Virtual and In Person)
The content used in this post is a small subset of our full training material used in our STIX 2.1 training.
If you want to join a select group of certified STIX 2.1 professionals, subscribe to our newsletter below to be notified of new course dates.
Discuss this post

Never miss an update
Sign up to receive new articles in your inbox as they published.