Posted by: David Greenwood, Chief of Signal

If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.

In this post I will explain the difference between UUID versions and which versions are best suited to different scenarios when modelling threat intelligence.

The STIX specification states:

All identifiers, excluding those used in the deprecated Cyber Observable Container, MUST follow the form object-type–UUID, where object-type is the exact value (all type names are lowercase strings, by definition) from the type property of the object being identified or referenced and where the UUID MUST be an RFC 4122-compliant UUID [RFC4122].

STIX Domain Objects, STIX Relationship Objects, STIX Meta Objects, and STIX Bundle Object SHOULD use UUIDv4 for the UUID portion of the identifier. Producers using something other than UUIDv4 need to be mindful of potential collisions and should use a namespace that guarantees uniqueness, however, they MUST NOT use a namespace of 00abedb4-aa42-466c-9c01-fed23315a9b7 if generating a UUIDv5.

Essentially this means any one of the five UUID (Universally Unique IDentifier) versions defined in RFC 4122 could be used for many STIX Object IDs.

For all demos used previously on this blog I have used randomly generated UUIDv4s, though that’s not to say it has always been the best choice.

Before I explain how different UUID versions could improve some STIX related use-cases, let me start by explaining the different UUID versions.

A brief introduction to UUIDs

UUIDs were designed to provide a consistent format for any unique ID data.

All UUID versions are just 128-bit pieces of data, displayed as (128/4) = 32 hexadecimal digits in five hyphen-separated groups, like this: 4edcc5e0-2460-4e30-8b1e-cb7d2e91b8ea (which is a v4 UUID).
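As a quick check of that arithmetic, Python’s standard uuid module can parse the example and report its size and version. This is only an illustrative sketch; the variable names are my own.

```python
import uuid

# The example v4 UUID from the paragraph above
u = uuid.UUID("4edcc5e0-2460-4e30-8b1e-cb7d2e91b8ea")

bits = len(u.bytes) * 8   # the raw value is 16 bytes long
hex_digits = len(u.hex)   # hex string with the hyphens removed

print(bits, hex_digits, u.version)
print(u.variant == uuid.RFC_4122)
```

The version number lives in the first hex digit of the third group, which is why the example reads ...-4e30-... for a v4 UUID.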

UUIDv1

UUIDv1 is based on the time of generation and the MAC address for the computer or “node” generating the UUID.

In addition to these two pieces of data, it also includes a third component, the clock sequence (typically initialised to a random value), to help ensure uniqueness, for example if the system clock is set backwards.
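Because the time is stored in a fixed part of a UUIDv1, it can be read back out. The sketch below uses Python’s standard uuid module; the helper name uuid1_to_datetime is my own, and the offset constant is the gap between the UUID epoch (15 October 1582) and the Unix epoch, counted in 100-nanosecond intervals.

```python
import uuid
from datetime import datetime, timezone

# 100-nanosecond intervals between 1582-10-15 and 1970-01-01
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def uuid1_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the generation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 10_000_000
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

u = uuid.uuid1()
print(u)
print("created:", uuid1_to_datetime(u))
# The node is the MAC address, or a random value if none is available
print("node:", f"{u.node:012x}")
print("clock sequence:", u.clock_seq)
```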

UUIDv2

Like a UUIDv1, a UUIDv2 uses the time of generation, along with the MAC address (or node) for a network interface on the local machine.

Additionally, a UUIDv2 replaces the low part of the time field with a local identifier such as the user ID or group ID of the local account that created the UUID. This serves three purposes:

  • You can know when the identifier was created
  • You can know where the identifier was created
  • You can know who created the identifier

UUIDv2s are usually only used for distributed-systems use-cases (e.g. identifying database nodes).

UUIDv3

UUIDv3s are generated from a “namespace” and a unique “name”.

The namespace and name are concatenated and hashed using the MD5 algorithm.

The UUID specification establishes four pre-defined namespaces. The pre-defined namespaces are:

  • DNS (for domain names): 6ba7b810-9dad-11d1-80b4-00c04fd430c8
  • URL (for URLs): 6ba7b811-9dad-11d1-80b4-00c04fd430c8
  • OID (for ISO Object IDentifiers): 6ba7b812-9dad-11d1-80b4-00c04fd430c8
  • X.500 DN (for X.500 Distinguished Names): 6ba7b814-9dad-11d1-80b4-00c04fd430c8

You are not limited to these; any UUID can act as a namespace. This means for STIX I could use any namespace in the format 00000000-0000-0000-0000-000000000000 (e.g. 8818ff6f-bf1f-4108-a22a-2cd3b468789b).

For example, with 8818ff6f-bf1f-4108-a22a-2cd3b468789b as the namespace and the STIX Object name FancyBear as the name, the resulting UUIDv3 would be 9a7483f5-02e8-325d-9e6b-978b6588a483.
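To make the “concatenate and hash” step concrete, here is a minimal sketch using Python’s standard library. It builds the UUIDv3 twice, once with uuid.uuid3 and once by hand from the MD5 digest, and checks that the two agree. It assumes the name is encoded as UTF-8, which is what Python’s uuid module does; I have not asserted the printed value here.

```python
import hashlib
import uuid

namespace = uuid.UUID("8818ff6f-bf1f-4108-a22a-2cd3b468789b")
name = "FancyBear"

# Library call
v3 = uuid.uuid3(namespace, name)

# The same thing by hand: MD5 over the namespace bytes followed by the
# name, after which the version and variant bits are overwritten
digest = hashlib.md5(namespace.bytes + name.encode("utf-8")).digest()
v3_manual = uuid.UUID(bytes=digest, version=3)

print(v3)
print(v3 == v3_manual)

# The pre-defined namespaces ship with the module
print(uuid.NAMESPACE_DNS)
```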

UUIDv4

UUIDv4s are randomly generated; that is, they require no input to generate them. This is the most common UUID version.

Timestamp-first UUIDs are not mentioned in the UUID RFC; however, they have become a common (but non-standard) variation of version 4 UUIDs. This format is sometimes called “Ordered UUIDs” or “COMB” (combined time-GUID).

Timestamp-first UUIDv4s start with the time followed by randomness. There are two main reasons for beginning UUIDs with the current timestamp:

  • When sorting by UUIDv4 they will appear in the order created
  • Ordered UUIDv4s can be stored more efficiently in indexed database columns than random UUIDv4s

There are several variations of timestamp-first UUIDv4s, so beware of which one you are dealing with.

Here’s one implementation in Node. In their docs the example UUIDv4 (15972459-4799-4612-a723-231092612723) encodes the time to nanosecond precision in its leading digits; the first ten digits (1597245947, spanning the first two groups) are the Unix time in seconds: Wed, 12 Aug 2020 15:25:47 GMT.
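The date quoted for the Node example can be reproduced by reading the first ten decimal digits of the UUID as a Unix timestamp in seconds. That reading is my own assumption based on the example, not something taken from the library’s documentation, and other timestamp-first schemes lay out their digits differently.

```python
from datetime import datetime, timezone

example = "15972459-4799-4612-a723-231092612723"

# Assumption: the first ten decimal digits are Unix time in seconds
digits = example.replace("-", "")
seconds = int(digits[:10])

created = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(created.isoformat())  # 2020-08-12T15:25:47+00:00
```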

UUIDv5

UUIDv5s are very similar to UUIDv3s; however, they are based on a SHA-1 hash of the namespace and name (versus the MD5 used by UUIDv3).

For example, again with 8818ff6f-bf1f-4108-a22a-2cd3b468789b as the namespace and the STIX Object name FancyBear, the resulting UUIDv5 would be f20c3d16-4740-5639-ba7a-0156bf37b305 (versus the MD5-based UUIDv3, which was 9a7483f5-02e8-325d-9e6b-978b6588a483).
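A short sketch of the difference, again with Python’s standard uuid module. SHA-1 produces 20 bytes, so only the first 16 are kept before the version and variant bits are set. The manual construction is there purely to show what uuid.uuid5 does; I have left the printed values unasserted.

```python
import hashlib
import uuid

namespace = uuid.UUID("8818ff6f-bf1f-4108-a22a-2cd3b468789b")
name = "FancyBear"

v3 = uuid.uuid3(namespace, name)  # MD5 based
v5 = uuid.uuid5(namespace, name)  # SHA-1 based

# By hand: SHA-1 over namespace bytes + name, truncated to 16 bytes
sha1 = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
v5_manual = uuid.UUID(bytes=sha1[:16], version=5)

print("v3:", v3)
print("v5:", v5)
print(v5 == v5_manual)
```

Both versions are deterministic: the same namespace and name always produce the same UUID, which is the property the rest of this post relies on.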

UUIDv5 is the more secure (SHA-1 vs MD5) and generally recommended version. If you are dealing with very strict resource requirements (e.g. a very busy Arduino board), UUIDv3 can be the better choice as a trade-off.

UUIDv6, v7 and v8 (proposed additions)

UUIDv6, UUIDv7, and UUIDv8 are proposed additions to RFC 4122. The draft is being discussed in a GitHub repository.

See the articles here and here to fully understand the reasons for the new UUID versions (and for a little more insight into the benefits of timestamps in UUIDs). tl;dr: they offer a much more efficient way of using timestamps in UUIDs (especially useful for database queries; see the first link for benchmarks).

UUIDs and STIX

The structure of different UUID versions gives us some interesting options to consider when generating id values for various STIX objects.

SCOs

For SCOs it is important to consider the types of properties they contain. Take for example the Domain Name SCO.

There are five types of properties that can be considered for SCOs:

STIX 2.1 Domain name Properties

  • Required Common Properties
  • Optional Common Properties
  • Not Applicable Common Properties
  • Specific Properties
  • ID Contributing Properties

SCOs do not contain time properties (created or modified). This is because these objects are not really created and modified. For example the domain name value example.com should never change. Of course, you might learn more about what example.com is doing (e.g. being used as a phishing domain), but this data should be represented by other connected STIX Objects.

This is where ID Contributing Properties come in. An SCO’s ID Contributing Properties should be the inputs used to generate the ID; that is, the ID should not really be random. In this case a UUIDv5 would be perfect. OASIS use UUIDv5 in their STIX2 Python library for generating SCO IDs.

Let’s say I have a domain name value example.com. I could create a namespace for the type domain-name (e.g. 20f86936-8762-11ed-967d-13065b84bbdb, which I randomly generated), and then use the ID Contributing Property ("value": "example.com") as the name, which gives me a UUIDv5 of 0c8e0fdf-dd39-51c6-a4f1-5395839f569d. Try it for yourself. This gives me a full domain name SCO:

{
    "type": "domain-name",
    "spec_version": "2.1",
    "id": "domain-name--0c8e0fdf-dd39-51c6-a4f1-5395839f569d",
    "value": "example.com"
}
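The same construction can be sketched in Python. The function name domain_name_sco is hypothetical, and I have assumed the bare string example.com is passed as the UUIDv5 name. The post does not spell out the exact name string it used, so the ID printed here may differ from the one shown above.

```python
import uuid

# Namespace chosen for the domain-name type in the text above
DOMAIN_NAME_NAMESPACE = uuid.UUID("20f86936-8762-11ed-967d-13065b84bbdb")

def domain_name_sco(value: str) -> dict:
    """Build a Domain Name SCO whose id is derived from its value."""
    # Assumption: the bare value is used as the UUIDv5 name
    sco_uuid = uuid.uuid5(DOMAIN_NAME_NAMESPACE, value)
    return {
        "type": "domain-name",
        "spec_version": "2.1",
        "id": f"domain-name--{sco_uuid}",
        "value": value,
    }

first = domain_name_sco("example.com")
second = domain_name_sco("example.com")

print(first["id"])
print(first["id"] == second["id"])  # True
```

Two producers that agree on the namespace and on how the name is built will arrive at the same ID without ever exchanging data.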

Note, the STIX2 Python library takes the approach of a single namespace for all SCOs, which you should not use for other objects (SDOs and SROs) when generating a UUIDv5.

If you want an additional level of certainty that two objects are the same, you could pass the entire JSON STIX object, without the ID, as the name.

However, a word of warning if you choose this route: this approach assumes all the properties in the object appear in the same order between producers when the UUIDv5 is being generated. If all producers follow the order of the specification when generating objects it’s not such an issue (but that’s nowhere near a certainty). When custom fields, etc., are used to generate the UUIDv5, you will end up with different IDs.

For example, here is the same example as above as a minified JSON object (which I will use as the name field to generate the UUIDv5):

{"type":"domain-name","spec_version":"2.1","value":"example.com"}

And using the same namespace as before would give a UUIDv5 of; 2cf60ee3-58dc-52cc-a2c2-1fcc7ef4aed3, which gives us the full STIX object;

{
    "type": "domain-name",
    "spec_version": "2.1",
    "id": "domain-name--2cf60ee3-58dc-52cc-a2c2-1fcc7ef4aed3",
    "value": "example.com"
}

But if I passed the name properties in a different order I would get a different UUIDv5, e.g.

{"value":"example.com","type":"domain-name","spec_version":"2.1",}

Gives a UUIDv5 of; 8ba81a7a-8f0d-5ba6-a7f0-8278f2699657.
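The ordering problem, and one possible way around it, can be sketched as follows. Sorting the keys before hashing is my own suggestion rather than something taken from the STIX tooling, and it is only a simple stand-in for a complete canonicalisation scheme. The helper names minify and canonical are hypothetical.

```python
import json
import uuid

NAMESPACE = uuid.UUID("20f86936-8762-11ed-967d-13065b84bbdb")

# The same object with its properties in two different orders
a = {"type": "domain-name", "spec_version": "2.1", "value": "example.com"}
b = {"value": "example.com", "type": "domain-name", "spec_version": "2.1"}

def minify(obj: dict) -> str:
    """Minified JSON that preserves the key order of the dict."""
    return json.dumps(obj, separators=(",", ":"))

def canonical(obj: dict) -> str:
    """Minified JSON with sorted keys, so key order no longer matters."""
    return json.dumps(obj, separators=(",", ":"), sort_keys=True)

same_when_minified = uuid.uuid5(NAMESPACE, minify(a)) == uuid.uuid5(NAMESPACE, minify(b))
same_when_canonical = uuid.uuid5(NAMESPACE, canonical(a)) == uuid.uuid5(NAMESPACE, canonical(b))

print(same_when_minified)   # False
print(same_when_canonical)  # True
```

Sorting keys only helps if every producer does it, and it does not address differences in whitespace inside values, number formatting, or optional properties that one producer includes and another omits.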

However this second approach is particularly useful for producers not using the STIX2 Python library because many SCOs have multiple ID Contributing Properties.

Take for example the Network Traffic SCO which has the ID Contributing Properties; start, end, src_ref, dst_ref, src_port, dst_port, protocols, and extensions.

In these cases instead of figuring out how to construct a schema for the name field, you can simply pass the object without the ID.

Creating a namespace for network-traffic (e.g. 6ad222de-2f99-49fb-993f-dd5f43a49f35 which I randomly generated) and the minified object;

{"type":"network-traffic","spec_version":"2.1","id":"network-traffic--162f6d1f-4e77-56e4-adb9-ac55c416f852","src_ref":"ipv4-addr--4d22aae0-2bf9-5427-8819-e4f6abf20a53","dst_ref":"ipv4-addr--ff26c055-6336-5bc5-b98d-13d6226742dd","src_port":"80","dest_port":"80"}

Would give a UUIDv5 of; 41987da3-a081-54f0-81a0-745d81d784c3.

Regardless of which of these two approaches you choose, there are two big benefits of using UUIDv5 over UUIDv4 for SCO generation:

  1. it makes unauthorised modifications to an SCO detectable, because the ID can be recomputed from the object’s properties and compared
  2. if everyone in the STIX community were to adopt this approach (the STIX2 lib already declares a namespace for SCOs), it would ensure SCOs describing the same thing (e.g. 10 vendors all creating an example.com domain name SCO) could be easily linked (because they would all have the same ID).
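The first benefit amounts to recomputing the ID and comparing it with the one stored on the object. Here is a minimal sketch, assuming the ID was derived from the bare value property under the domain-name namespace used earlier; expected_id and id_matches are hypothetical helper names.

```python
import uuid

DOMAIN_NAME_NAMESPACE = uuid.UUID("20f86936-8762-11ed-967d-13065b84bbdb")

def expected_id(sco: dict) -> str:
    # Assumption: the id is derived from the bare "value" property
    return f"{sco['type']}--{uuid.uuid5(DOMAIN_NAME_NAMESPACE, sco['value'])}"

def id_matches(sco: dict) -> bool:
    """True if the stored id agrees with the id recomputed from the value."""
    return sco["id"] == expected_id(sco)

original = {"type": "domain-name", "spec_version": "2.1", "value": "example.com"}
original["id"] = expected_id(original)

# Same id, but the value has been altered after the fact
tampered = dict(original, value="examp1e.com")

print(id_matches(original))  # True
print(id_matches(tampered))  # False
```

This is a consistency check rather than a signature: anyone who knows the namespace can recompute a matching ID for altered content, so it catches accidental or careless changes, not a determined attacker.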

SDOs

Unlike SCOs, SDOs do not contain ID Contributing Properties, so the same object could have multiple IDs (which is important for the approach to major versioning that the STIX standard defines). Essentially this means almost all of an SDO’s properties can change at any time.

STIX 2.1 Indicator Properties

That said, if a major version happens to an SDO the created date always remains the same (as does the type). What this means is that the type and time can be written into the UUIDs which gives us a few options;

  • UUIDv1: where only the time can be written in
  • UUIDv4 (timestamp-first): where only the time can be written in
  • UUIDv3 / UUIDv5: where the time and type could both be written into the name part (or, as shown earlier, the type could be used as the namespace)

The question really becomes; how much do you need the ID to contain a time?

For example, there might be situations where you don’t have the full STIX object, so having the time easily decodable from a UUIDv1 or a timestamp-first UUIDv4 is the better choice. As shown earlier, UUIDv1 and timestamp-first UUIDv4 (although not a standard) reserve specific parts of the UUID for the time, which makes it simple to decode. With UUIDv3 and UUIDv5, the namespace and name are hashed, which means there is no way to recover them from a given UUID other than brute force, which is clearly not feasible at scale.

All our data is stored in databases with time fields, making this less of a concern, and as such I have considered an approach similar to SCO generation.

First I minify the entire JSON object WITHOUT the ID field, for example:

{"type":"indicator","spec_version":"2.1","created_by_ref":"identity--f431f809-377b-45e0-aa1c-6a4751cae5ff","created":"2016-04-06T20:03:48.000Z","modified":"2016-04-06T20:03:48.000Z","indicator_types":["malicious-activity"],"name":"Poison Ivy Malware","description":"This file is part of Poison Ivy","pattern":"[ file:hashes.'SHA-256' = '4bac27393bdd9777ce02453256c5577cd02275510b2227f473d03f533924f877' ]","pattern_type":"stix","valid_from":"2016-01-01T00:00:00Z"}

And pass it as I did before, this time with an Indicator namespace (6564c092-5f61-4f9b-bec8-0f9915d4fc49), to give a UUIDv5 of; fd5b243a-07de-59e0-af54-fbb57e5df946.

This makes it very easy to see if an object has been changed, in a similar but more thorough way than described for SCOs. The thoroughness is not such an issue when compared to SCOs though, because even though they might be talking about the same thing, the information held in SDOs is more subjective between producers.
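The steps above can be sketched in Python. The function name sdo_id is hypothetical, the indicator is a shortened version of the one shown earlier, and I have assumed the minified JSON keeps the dictionary’s own key order, so the ID printed will not match the one quoted for the full object.

```python
import json
import uuid

INDICATOR_NAMESPACE = uuid.UUID("6564c092-5f61-4f9b-bec8-0f9915d4fc49")

def sdo_id(obj: dict, namespace: uuid.UUID) -> str:
    """Derive an id from every property of the object except id itself."""
    body = {k: v for k, v in obj.items() if k != "id"}
    # Assumption: minified JSON in the dict's own key order is the name
    name = json.dumps(body, separators=(",", ":"))
    return f"{obj['type']}--{uuid.uuid5(namespace, name)}"

indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "created": "2016-04-06T20:03:48.000Z",
    "modified": "2016-04-06T20:03:48.000Z",
    "name": "Poison Ivy Malware",
    "pattern": "[ file:hashes.'SHA-256' = '4bac27393bdd9777ce02453256c5577cd02275510b2227f473d03f533924f877' ]",
    "pattern_type": "stix",
    "valid_from": "2016-01-01T00:00:00Z",
}
indicator["id"] = sdo_id(indicator, INDICATOR_NAMESPACE)

# Adding a description changes the payload, and therefore the id
edited = dict(indicator, description="This file is part of Poison Ivy")

print(indicator["id"])
print(sdo_id(edited, INDICATOR_NAMESPACE) == indicator["id"])  # False
```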

What this does mean is that any change to an SDO is essentially a major change (as any changes to the JSON payload will change the id UUIDv5 value), which may or may not suit your approach as any updates will need to filter down to relationship objects, which brings us nicely to…

SROs

STIX 2.1 Relationship Properties

SROs are a cross between SCOs when it comes to properties and SDOs when it comes to their life-cycle.

As you can see, SROs have ID Contributing Properties like SCOs that could be used to generate a UUIDv5.

Usually once generated, an SRO never changes until it is revoked. It is designed to link two objects.

If the IDs of the objects change (i.e. due to major version change), then a new SRO is generated, and the old one remains valid but revoked. Similarly if the description of the relationship between the objects changes (relationship_type), again, a new SRO is generated for this.

As such, either of the approaches for UUIDv5 generation described earlier, for SDOs (use all properties) or SCOs (use only ID Contributing Properties), could be adopted.

In summary

Generally speaking, STIX objects are interrogated in a number of ways. Below are some of the most common use-cases for me when working with threat intel:

  • string search: e.g. find an object of type X with name containing Y
  • graph search: e.g. show me all objects linked to object Z
  • time search/sort: e.g. show me the most recently created or modified objects

Assuming you’re the producer of a STIX Object, normalising on UUIDv5 is probably the best choice as it helps ensure integrity of objects and makes for easier management of those objects. However, UUIDv5 is most important for SCOs.

OASIS use a mix of UUIDv4 and UUIDv5 in their STIX2 Python library – UUIDv5 for SCOs and UUIDv4 for all other objects (and this is what I use right now in all our tools).

If you have a need to search on ID (key) only (like I describe here), then adding a timestamp to the UUIDv4 in SDO and SRO IDs might be the way to go, but this is not of any benefit if you have access to the created or modified properties already.

Coming to the end of this post I’m aware I might have introduced more complexity than clarity, but hopefully this provides some food-for-thought behind what appears, at first glance, to be somewhat simplistic. My recommendation: let the STIX2 Python library from OASIS handle object id generation.



