If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.
In this post I will explain the difference between UUID versions and which versions are best suited to different scenarios when modelling threat intelligence.
The STIX specification states:
All identifiers, excluding those used in the deprecated Cyber Observable Container, MUST follow the form object-type--UUID, where object-type is the exact value (all type names are lowercase strings, by definition) from the type property of the object being identified or referenced and where the UUID MUST be an RFC 4122-compliant UUID [RFC4122].
STIX Domain Objects, STIX Relationship Objects, STIX Meta Objects, and STIX Bundle Object SHOULD use UUIDv4 for the UUID portion of the identifier. Producers using something other than UUIDv4 need to be mindful of potential collisions and should use a namespace that guarantees uniqueness, however, they MUST NOT use a namespace of 00abedb4-aa42-466c-9c01-fed23315a9b7 if generating a UUIDv5.
Essentially this means any one of the five UUID (Universally Unique IDentifier) versions defined in RFC 4122 could be used for many STIX Object IDs.
For all demos used previously on this blog I have used randomly generated UUIDv4s, though that’s not to say it has always been the best choice.
Before I explain how different UUID versions could improve some STIX related use-cases, let me start by explaining the different UUID versions.
A brief introduction to UUIDs
UUIDs were designed to provide a consistent format for any unique ID data.
All UUID versions are just 128-bit pieces of data, displayed as (128/4) = 32 hexadecimal digits, like this: 4edcc5e0-2460-4e30-8b1e-cb7d2e91b8ea (which is a v4 UUID).
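To make that structure concrete, here is a minimal sketch using Python's standard uuid module to inspect the example above. The variable name example is mine, purely for illustration.

```python
import uuid

# Parse the example UUID from the text
example = uuid.UUID("4edcc5e0-2460-4e30-8b1e-cb7d2e91b8ea")

print(len(example.bytes) * 8)            # 128 bits of data
print(len(example.hex))                  # 32 hexadecimal digits
print(example.version)                   # 4 (first digit of the third group)
print(example.variant == uuid.RFC_4122)  # True
```

The version is stored inside the UUID itself, which is why you can tell a v4 from a v5 just by looking at the ID.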
UUIDv1
UUIDv1 is based on the time of generation and the MAC address for the computer or “node” generating the UUID.
In addition to these two pieces of data, it also includes a third component, the clock sequence (usually initialised randomly), to guard against duplicates, for example if the system clock is set backwards.
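As a rough sketch of what that looks like in practice, the snippet below generates a UUIDv1 with Python's standard uuid module and decodes its embedded time. The helper uuid1_time and the constant GREGORIAN_EPOCH are my own names; the v1 timestamp counts 100-nanosecond intervals since 15 October 1582.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUIDv1 timestamps count 100ns intervals from the start of the Gregorian calendar
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def uuid1_time(u: uuid.UUID) -> datetime:
    """Return the generation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

fresh = uuid.uuid1()
print(fresh, uuid1_time(fresh))
print(hex(fresh.node), fresh.clock_seq)  # node and clock sequence components

# The RFC 4122 DNS namespace UUID happens to be a v1 UUID itself
dns_namespace = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
print(uuid1_time(dns_namespace).year)  # 1998
```

Note that Python may substitute a random node value if it cannot read a MAC address, so the node is not always a reliable pointer to the machine.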
UUIDv2
Like a UUIDv1, a UUIDv2 uses the time of generation, along with the MAC address (or node) for a network interface on the local machine.
Additionally, a UUIDv2 replaces the low part of the time field with a local identifier such as the user ID or group ID of the local account that created the UUID. This serves three purposes:
- You can know when the identifier was created
- You can know where the identifier was created
- You can know who created the identifier
UUIDv2s are usually only used for distributed systems use-cases (e.g. identifying database nodes).
UUIDv3
UUIDv3s are generated from a “namespace” and a unique “name”.
The namespace and name are concatenated and hashed using the MD5 algorithm.
The UUID specification establishes four pre-defined namespaces:
- DNS (for domain names): 6ba7b810-9dad-11d1-80b4-00c04fd430c8
- URL (for URLs): 6ba7b811-9dad-11d1-80b4-00c04fd430c8
- OID (for ISO Object IDentifiers): 6ba7b812-9dad-11d1-80b4-00c04fd430c8
- X.500 DN (for X.500 Distinguished Names): 6ba7b814-9dad-11d1-80b4-00c04fd430c8
A namespace is itself just a UUID, so you are not limited to these four. For STIX I could use any namespace in the format 00000000-0000-0000-0000-000000000000 (e.g. 8818ff6f-bf1f-4108-a22a-2cd3b468789b).
For example, assuming 8818ff6f-bf1f-4108-a22a-2cd3b468789b is the namespace and I use the STIX Object name FancyBear, the resulting UUIDv3 would be 9a7483f5-02e8-325d-9e6b-978b6588a483.
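Here is a minimal sketch of the same idea using Python's standard uuid module, which also ships the four pre-defined namespaces as constants. The variable names are mine.

```python
import uuid

# Custom namespace and name taken from the example above
namespace = uuid.UUID("8818ff6f-bf1f-4108-a22a-2cd3b468789b")
threat_actor_uuid = uuid.uuid3(namespace, "FancyBear")

print(threat_actor_uuid)
print(threat_actor_uuid.version)  # 3

# The same namespace and name always produce the same UUID
print(threat_actor_uuid == uuid.uuid3(namespace, "FancyBear"))  # True

# The pre-defined namespaces are available as constants
print(uuid.NAMESPACE_DNS)  # 6ba7b810-9dad-11d1-80b4-00c04fd430c8
```

The important property is determinism: anyone with the same namespace and name can regenerate the ID without coordinating with you.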
UUIDv4
UUIDv4s are randomly generated; that is, they require no input to generate them. This is the most common UUID version.
Timestamp-first UUIDs are not mentioned in the UUID RFC; however, they have become a common (but non-standard) variation of version 4 UUIDs. This format is sometimes called “Ordered UUIDs” or “COMB” (combined time-GUID).
UUIDv4s with timestamps start with the time property followed by randomness. There are two main reasons for beginning UUIDs with the current timestamp:
- When sorting by UUIDv4 they will appear in the order created
- Ordered UUIDv4s can be stored more efficiently in indexed database columns than random UUIDv4s
There are quite a few variations of timestamp-first UUIDv4s, so beware.
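Here’s one implementation in Node. In their docs the example UUIDv4 (15972459-4799-4612-a723-231092612723) starts with the time written as decimal digits: the first ten digits (1597245947, i.e. the first group plus the start of the second) are the Unix timestamp for Wed, 12 Aug 2020 15:25:47 GMT, with the digits that follow adding sub-second precision (down to nanoseconds, according to their docs).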
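To show how such an ID can be decoded, here is a small sketch in Python. It assumes the layout of this one example, where the first ten characters are decimal Unix seconds; other timestamp-first schemes use different layouts, and the function name decode_timestamp_first is mine.

```python
from datetime import datetime, timezone

def decode_timestamp_first(uuid_string: str) -> datetime:
    """Read the leading ten decimal digits as Unix seconds (layout assumed)."""
    digits = uuid_string.replace("-", "")
    seconds = int(digits[:10])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = decode_timestamp_first("15972459-4799-4612-a723-231092612723")
print(created.isoformat())  # 2020-08-12T15:25:47+00:00
```

Because the time is the leading part of the string, a plain lexical sort of these IDs is also a sort by creation time.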
UUIDv5
UUIDv5s are very similar to UUIDv3s, but are based on a SHA-1 hash of the namespace and name (versus the MD5 used by UUIDv3).
For example, again assuming 8818ff6f-bf1f-4108-a22a-2cd3b468789b is the namespace and I use the STIX Object name FancyBear, the resulting UUIDv5 would be f20c3d16-4740-5639-ba7a-0156bf37b305 (versus the MD5-based UUIDv3, which was 9a7483f5-02e8-325d-9e6b-978b6588a483).
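A quick sketch of the difference, again with Python's standard uuid module (variable names are mine): the inputs are identical and only the hash function changes.

```python
import uuid

namespace = uuid.UUID("8818ff6f-bf1f-4108-a22a-2cd3b468789b")
name = "FancyBear"

v3 = uuid.uuid3(namespace, name)  # MD5 based
v5 = uuid.uuid5(namespace, name)  # SHA-1 based

print(v3.version, v5.version)  # 3 5
print(v3 != v5)                # True: same inputs, different hash, different UUID
```

This is worth remembering when comparing IDs from different producers: a v3 and a v5 built from the same namespace and name will never match.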
UUIDv5s are the more secure (SHA-1 vs MD5) and generally recommended version. If you are dealing with very strict resource requirements (e.g. a very busy Arduino board), UUIDv3s can be the better choice as a trade-off.
UUIDv6, v7 and v8 (proposed additions)
UUIDv6, UUIDv7, and UUIDv8 are proposed additions to RFC 4122. The draft is being discussed in a GitHub repository.
See the articles here and here to fully understand the reason for the new UUIDs (and for a little more insight into the benefits of timestamps in UUIDs). tl;dr: they offer a much more efficient way of using timestamps in UUIDs (especially useful for database queries; see the first link for benchmarks).
UUIDs and STIX
The structure of different UUID versions gives us some interesting options to consider when generating id values for various STIX objects.
SCOs
For SCOs it is important to consider the types of properties they contain. Take for example the Domain Name SCO.
There are five types of properties to consider for an SCO:
- Required Common Properties
- Optional Common Properties
- Not Applicable Common Properties
- Object Specific Properties (e.g. Domain Name Specific Properties)
- ID Contributing Properties
SCOs do not contain time properties (created or modified). This is because these objects are not really created and modified. For example, the domain name value example.com should never change. Of course, you might learn more about what example.com is doing (e.g. being used as a phishing domain), but this data should be represented by other connected STIX Objects.
This is where ID Contributing Properties come in. An SCO’s ID Contributing Properties should be the inputs used to generate the ID; that is, the ID should not really be random. In this case a UUIDv5 is perfect. OASIS use UUIDv5 in their STIX2 Python library for generating SCOs.
OASIS suggest that the namespace for generating the UUIDv5 should be 00abedb4-aa42-466c-9c01-fed23315a9b7.
Let’s say I have a domain name value example.com.
The value of the name portion SHOULD be the list of “ID Contributing Properties” (property-name and property value pairs) as defined on each SCO object and SHOULD be represented as a JSON object that is then serialized / stringified according to [RFC8785] to ensure a canonical representation of the JSON data.
Source: STIX Specification
So I would pass the namespace 00abedb4-aa42-466c-9c01-fed23315a9b7 and the JSON object…
{"value" : "example.com"}
…to generate the UUIDv5.
Together they give me the UUIDv5 d9653e29-95e6-5fd8-8954-925b1b3dea97.
This gives me a full domain name SCO:
{
"type": "domain-name",
"spec_version": "2.1",
"id": "domain-name--d9653e29-95e6-5fd8-8954-925b1b3dea97",
"value": "example.com"
}
A word of warning…
The JSON object I passed above contains whitespace around the colon. RFC 8785 canonicalisation, which OASIS recommend, emits no insignificant whitespace, so the canonical form is the compact one:
{"value":"example.com"}
Using the OASIS namespace, this gives the UUIDv5 bedb4899-d24b-5401-bc86-8f6b4cc18ec7, different to the version with whitespace. Any deviation from the canonical form changes the ID, and you would be surprised at the number of producers who deviate (both intentionally and unintentionally).
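The sensitivity to whitespace is easy to reproduce. The sketch below uses Python's standard json and uuid modules; canonical_json is my own helper and only approximates RFC 8785 (sorted keys, no insignificant whitespace), which is enough for simple string values like this one but is not a full implementation.

```python
import json
import uuid

OASIS_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

def canonical_json(properties: dict) -> str:
    """Approximate RFC 8785 output: sorted keys and no extra whitespace."""
    return json.dumps(properties, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

compact = canonical_json({"value": "example.com"})
spaced = '{"value" : "example.com"}'

id_from_compact = uuid.uuid5(OASIS_SCO_NAMESPACE, compact)
id_from_spaced = uuid.uuid5(OASIS_SCO_NAMESPACE, spaced)

print(compact)                            # {"value":"example.com"}
print(id_from_compact != id_from_spaced)  # True: one space changes the whole ID
```

The hash only ever sees bytes, so two strings that are equivalent as JSON are still different names as far as UUIDv5 is concerned.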
It gets more complex though…
Take for example the Network Traffic SCO, which has the ID Contributing Properties start, end, src_ref, dst_ref, src_port, dst_port, protocols, and extensions.
If multiple ID Contributing Properties exist, the order in which they are serialised is vital for the same reason.
For example, passing some of the ID Contributing Properties:
{"src_ref":"ipv4-addr--4d22aae0-2bf9-5427-8819-e4f6abf20a53","dst_ref":"ipv4-addr--ff26c055-6336-5bc5-b98d-13d6226742dd","src_port":"80","dest_port":"80"}
Using the OASIS namespace, this gives the UUIDv5 87a6bafa-f243-52b3-bc33-c453afd49000.
If I swap the two port properties around…
{"src_ref":"ipv4-addr--4d22aae0-2bf9-5427-8819-e4f6abf20a53","dst_ref":"ipv4-addr--ff26c055-6336-5bc5-b98d-13d6226742dd","dest_port":"80","src_port":"80"}
This gives the UUIDv5 6d56c245-c5dd-582b-bf28-cdf3959bda4a.
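The ordering problem, and why canonicalisation removes it, can be sketched with Python's standard json and uuid modules. The helper name_to_uuid is mine, the port values are illustrative, and sorting keys with json.dumps only approximates what RFC 8785 specifies.

```python
import json
import uuid

OASIS_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

def name_to_uuid(properties: dict, sort_keys: bool) -> uuid.UUID:
    """Serialise the properties compactly, then hash them into a UUIDv5."""
    name = json.dumps(properties, sort_keys=sort_keys, separators=(",", ":"))
    return uuid.uuid5(OASIS_SCO_NAMESPACE, name)

src = "ipv4-addr--4d22aae0-2bf9-5427-8819-e4f6abf20a53"
dst = "ipv4-addr--ff26c055-6336-5bc5-b98d-13d6226742dd"

# Same properties, with the two port keys in a different order
src_port_first = {"src_ref": src, "dst_ref": dst, "src_port": 80, "dst_port": 80}
dst_port_first = {"src_ref": src, "dst_ref": dst, "dst_port": 80, "src_port": 80}

# Serialised in insertion order, the two inputs hash to different UUIDs
print(name_to_uuid(src_port_first, False) != name_to_uuid(dst_port_first, False))  # True

# With keys sorted first, the order they were supplied in no longer matters
print(name_to_uuid(src_port_first, True) == name_to_uuid(dst_port_first, True))    # True
```

In other words, key order only bites when a producer skips the canonicalisation step, which is exactly the kind of detail a shared library takes care of.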
This is why I strongly suggest all producers use the STIX 2.1 Python Library to generate STIX objects, so that the same generation behaviour occurs, supporting the goal of deduplication and semantic equivalence of some STIX objects across the community of producers (e.g. 10 producers creating an example.com domain name SCO would all create the same object, with the same ID).
It’s also much easier to use the library than trying to figure out this logic for yourself!
OASIS advise you can use custom namespaces when generating UUIDv5s for SCOs. I’d advise against this (at least without a clear reason to do so) for the reasons outlined above.
If the contributing properties are all optional, and none are present on the SCO, then a UUIDv4 MUST be used.
Source: STIX Specification
In some cases the situation above can arise, and UUIDv4 is then the fallback (this is the only situation in which I’d use v4s for SCO ID generation).
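That fallback rule can be sketched as follows, using Python's standard json and uuid modules. This is not how the STIX2 library is implemented; generate_sco_id is a hypothetical helper, the JSON serialisation only approximates RFC 8785, and the list of contributing properties for the email message is my assumption for illustration.

```python
import json
import uuid

OASIS_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

def generate_sco_id(sco: dict, contributing_names: list) -> str:
    """UUIDv5 from the contributing properties present, else a random UUIDv4."""
    present = {name: sco[name] for name in contributing_names if name in sco}
    if present:
        canonical = json.dumps(present, sort_keys=True, separators=(",", ":"))
        identifier = uuid.uuid5(OASIS_SCO_NAMESPACE, canonical)
    else:
        # None of the (optional) contributing properties are set
        identifier = uuid.uuid4()
    return f"{sco['type']}--{identifier}"

domain = {"type": "domain-name", "value": "example.com"}
email = {"type": "email-message", "is_multipart": False}

print(generate_sco_id(domain, ["value"]))                        # deterministic v5
print(generate_sco_id(email, ["from_ref", "subject", "body"]))  # random v4
```

The second ID changes on every run, which is the price of having nothing stable to hash.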
SDOs
Unlike SCOs, SDOs do not contain ID Contributing Properties, so the same object could have multiple IDs (which is important for the approach to major versioning that the STIX standard defines). Essentially this means almost all of an SDO’s properties can change at any time.
OASIS strongly advise UUIDv4 should be used for SDO generation (and implement that in the STIX 2 Python library). Though let’s theorise about the options available to us.
If a major version change happens to an SDO, the created date always remains the same (as does the type). What this means is that the type and time can be written into the UUID, which gives us a few options:
- UUIDv1: can include time
- UUIDv4: can include time (hacky)
- UUIDv3 / UUIDv5: where the time and type could both be written into the name part (or as shown earlier, type could be used as the namespace)
The question really becomes: how much do you need the ID to contain a time?
For example, there might be situations where you don’t have the full STIX object (e.g. IDs in a list), so having the time easily decodable in a UUIDv1 or UUIDv4 is the better choice. As shown earlier, UUIDv1 and timestamp-first UUIDv4 (although not a standard) reserve specific parts of the UUID for time, which can therefore be decoded (although not easily for v4). With UUIDv3 and UUIDv5, the namespace and name are hashed, which means there is no way to recover them from a given UUID other than brute force, which is clearly not feasible at scale.
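As a thought experiment only, here is what the UUIDv5 option could look like with Python's standard uuid module. The helper sdo_id_from_type_and_created, the way type and created are joined into a name, and the use of the example namespace from earlier are all my assumptions, not anything OASIS define.

```python
import uuid

# A producer-specific namespace (the example one used earlier in this post)
PRODUCER_NAMESPACE = uuid.UUID("8818ff6f-bf1f-4108-a22a-2cd3b468789b")
# The OASIS SCO namespace, which must not be reused for SDOs
OASIS_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

def sdo_id_from_type_and_created(sdo_type: str, created: str) -> str:
    """Derive an SDO id by hashing its type and created time (hypothetical scheme)."""
    name = f"{sdo_type}+{created}"
    return f"{sdo_type}--{uuid.uuid5(PRODUCER_NAMESPACE, name)}"

first = sdo_id_from_type_and_created("threat-actor", "2020-01-01T00:00:00.000Z")
second = sdo_id_from_type_and_created("threat-actor", "2020-01-01T00:00:00.000Z")

print(first)
print(first == second)  # True: same type and created time, same id
```

The output also illustrates the downsides: the time cannot be read back out of the ID, and two objects of the same type created at the same instant would collide.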
All our data is stored in databases with time fields, making this less of a concern. I’d also argue for most use-cases, time is not needed for UUIDs. Thus, I use UUIDv4s for generating all SDO IDs.
If you do decide to use UUIDv5s for SDOs, OASIS dictate you should not use the OASIS namespace used to generate SCOs for SDOs.
SROs
SROs are most like SDOs when it comes to their properties and life-cycle.
Usually once generated, an SRO never changes until it is revoked. It is designed to link two objects in perpetuity.
If the IDs of the objects change (i.e. due to a major version change), then a new SRO is generated, and the old one remains valid but revoked. Similarly, if the description of the relationship between the objects (relationship_type) changes, a new SRO is usually generated.
OASIS recommend UUIDv4 again as there is no real use case for v5, and UUIDv4 is the approach we take.
In summary
Generally speaking STIX objects are interrogated in a number of ways. Below are some of the most common use-cases for me when working with threat intel:
- string search: e.g. find an object of type X with name containing Y
- graph search: e.g. show me all objects linked to object Z
- time search/sort: e.g. show me the most recently created or modified objects
Assuming you’re the producer of a STIX Object, normalising on UUIDv5 for SCOs is the best choice as it helps ensure integrity of objects and makes for easier management of those objects. For SDOs and SROs, UUIDv4 for IDs solves for most use-cases.
OASIS use this mix of UUIDv4 (for SDOs and SROs) and UUIDv5 (for SCOs) in their STIX2 Python library.
If you need to search on ID (key) only (like I describe here), then adding a timestamp to the UUIDv4 in SDO and SRO ids might be the way to go, but there are probably better ways to do this (i.e. better database design).
Coming to the end of this post I’m aware I might have introduced more complexity than clarity, but hopefully this provides some food-for-thought behind what appears, at first glance, to be somewhat simplistic.
tl;dr… use the STIX2 Python library from OASIS to handle object id generation.