Posted by: David Greenwood, Chief of Signal


In this post I will deconstruct STIX Patterns and demonstrate how to write effective detection rules (aka patterns).

Note: this post is written for OASIS STIX version 2.1. The concepts discussed are not always correct for earlier versions of OASIS STIX.

The ultimate use of intelligence is to defend against, or counteract, the threats it describes. For example, understanding how to put network defences in place, or how to mitigate an attack that has already succeeded in part of its objectives.

Part of this is to ensure you are able to detect security events (to ensure the bit of intelligence you are looking at has not already impacted you).

Many of you will be familiar with detection languages in SIEMs to search for malicious events. These searches might be as simple as looking for an IP address, or more complex, looking for behaviours and patterns alongside evidential breadcrumbs.

In STIX 2.1, Indicator SDOs must contain a pattern Property that can be used to describe suspicious or malicious cyber activity.

The STIX 2.1 Indicator SDO specification is flexible enough to allow for a range of detection languages (pattern_type), as defined in the Pattern Type Vocabulary. These are;

  • pcre: Perl Compatible Regular Expressions language
  • sigma: SIGMA language
  • snort: SNORT language
  • suricata: SURICATA language
  • yara: YARA language
  • stix: STIX pattern language

For example, I could use a YARA rule inside an Indicator SDO by setting the Property "pattern_type": "yara" and placing the actual YARA rule under the "pattern" Property.

Note, the YARA rule (or any non-STIX pattern type) must be escaped so that it is a valid JSON string. You can see this in the raw JSON in the bundle below;
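As a minimal sketch of what that encoding looks like, the Python below builds an Indicator as a plain dictionary and lets json.dumps handle the escaping. The YARA rule, the name and the timestamps are hypothetical placeholders, and the id is randomly generated purely for illustration.

```python
import json
import uuid

# Hypothetical YARA rule, used purely for illustration.
yara_rule = '''rule ExampleRule
{
    strings:
        $a = "malicious-string"
    condition:
        $a
}'''

# Minimal Indicator with placeholder values for every field except the pattern.
indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": f"indicator--{uuid.uuid4()}",
    "created": "2020-01-01T00:00:00.000Z",
    "modified": "2020-01-01T00:00:00.000Z",
    "name": "Example YARA indicator",
    "pattern_type": "yara",
    "pattern": yara_rule,
    "valid_from": "2020-01-01T00:00:00Z",
}

# json.dumps escapes the newlines and double quotes inside the rule.
encoded = json.dumps(indicator, indent=2)
print(encoded)
```

In the printed output the rule appears on a single line, with its newlines written as \n and its double quotes written as \".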

STIX pattern language

You will have seen the STIX specific pattern_type listed above. This is a pattern language defined by OASIS in the STIX 2.1 specification.

Before continuing, it is worth making the distinction between Indicator SDOs and Indicators of Compromise (IOCs) – they are often confused.

IOCs are atomic indicators. These are elements or fragments of data that cannot be broken down any further like a hostname, IP address, email address, file name, etc.

Indicator SDOs can contain zero or more IOCs and the logical relationship between them as patterns.

In the case of sigma pattern_types, for example, the detection might be purely behavioural and not contain any IOCs.

Whereas stix pattern_types are composed of IOCs (aka SCOs), ranging from simple key-value comparisons to more complex, context-sensitive expressions.

Here is the general structure of a STIX Pattern;

STIX Pattern structure

It is a lot! Let me try and take this structure apart for you.

Comparison Expressions and Operators

Comparison Expressions are the fundamental building blocks of STIX patterns.

They take an Object Path (using SCOs) and Object Value with a Comparison Operator to evaluate their relationship.

STIX Pattern Comparison Expressions and Operators

Multiple Comparison Expressions can be joined by Comparison Expression Operators to create an Observation Expression.

My earlier example of a filename showed a simple Comparison Expression in a Pattern.

Here is an example of a simple Comparison Expression to detect an IPv4 address:

[ipv4-addr:value='198.51.100.1']

It uses the IPv4 Address SCO (ipv4-addr) and its ID Contributing Property (value) as the Object path (shown in specification screenshot below). The Object value is 198.51.100.1.

ipv4 SCO specification

Another example, using a Windows Registry Key;

[windows-registry-key:key='HKEY_LOCAL_MACHINE\\System\\Foo\\Bar']

Here I use the Windows Registry Key SCO (windows-registry-key) and its ID Contributing Property (key) (shown in specification screenshot below). The Object value is HKEY_LOCAL_MACHINE\\System\\Foo\\Bar.

Windows Registry Key SCO specification
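To make the parts of a Comparison Expression easier to see, here is a rough sketch that splits the two examples above into object type, property, operator and value. It is a deliberately simplified regular expression (one comparison, single-quoted string values only), not a full STIX Pattern parser, and the function name split_comparison is my own.

```python
import re

# Simplified: a single comparison expression with a single-quoted string value.
COMPARISON = re.compile(
    r"^\[\s*(?P<object_type>[a-z0-9-]+):(?P<property>[^\s=!<>]+)\s*"
    r"(?P<operator>!=|<=|>=|=|<|>)\s*'(?P<value>.*)'\s*\]$"
)


def split_comparison(pattern):
    """Split a simple pattern into its Object Path, operator and value."""
    match = COMPARISON.match(pattern)
    if match is None:
        raise ValueError(f"not a simple comparison expression: {pattern}")
    return match.groupdict()


print(split_comparison("[ipv4-addr:value='198.51.100.1']"))
print(split_comparison(
    r"[windows-registry-key:key='HKEY_LOCAL_MACHINE\\System\\Foo\\Bar']"
))
```

The object type and property together form the Object Path; the value is returned exactly as written in the pattern, so the doubled backslashes of the registry key are preserved.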

You can use a range of Comparison Operators in addition to equals (=): does not equal (!=), is greater than (>), is greater than or equal to (>=), etc.

[directory:path LIKE 'C:\\Windows\\%\\foo']

In the above example I am using the LIKE Comparison Operator. You will notice it is possible to pass wildcards. In the example above % matches 0 or more characters (an underscore, _, matches exactly one character).

As such, the pattern would match (be true) if C:\Windows\DAVID\foo, C:\Windows\JAMES\foo, etc. was observed.
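One way to picture how LIKE behaves is to translate it into a regular expression. The sketch below does that for the directory example; like_to_regex is a hypothetical helper of mine, not part of any STIX library, and it assumes the pattern string has already been unescaped.

```python
import re


def like_to_regex(like_pattern):
    """Translate a LIKE string into an anchored regular expression."""
    parts = []
    for char in like_pattern:
        if char == "%":
            parts.append(".*")  # zero or more characters
        elif char == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


# The pattern string after STIX unescaping (\\ becomes \).
matcher = like_to_regex(r"C:\Windows\%\foo")

for path in [r"C:\Windows\DAVID\foo", r"C:\Windows\JAMES\foo", r"C:\Users\DAVID\foo"]:
    print(path, bool(matcher.match(path)))
```

The first two paths match and the third does not, because only the segment covered by % is allowed to vary.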

Observation Expressions, Operators and Qualifiers

More than one Comparison Expression can be joined using a Comparison Expression Operator to create an Observation Expression.

STIX Pattern Observation Expressions

The entire Observation Expression is enclosed in square brackets [].

For example, a pattern to match on either 198.51.100.1/32 or 203.0.113.33/32 could be expressed with the OR Comparison Expression Operator;

[ipv4-addr:value='198.51.100.1/32' OR ipv4-addr:value='203.0.113.33/32']

Changing the Comparison Expression Operator to an AND makes the pattern match on both 198.51.100.1/32 and 203.0.113.33/32;

[ipv4-addr:value='198.51.100.1/32' AND ipv4-addr:value='203.0.113.33/32']

Observation Expressions can also be joined using Observation Operators.

In the following example there are two Observation Expressions joined by the Observation Operator FOLLOWEDBY;

[ipv4-addr:value='198.51.100.1/32'] FOLLOWEDBY [ipv4-addr:value='203.0.113.33/32']

The FOLLOWEDBY Observation Operator defines the order in which Observation Expressions must match. In this case 198.51.100.1/32 must be followed by 203.0.113.33/32. Put another way, 198.51.100.1/32 must be detected before 203.0.113.33/32.
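The ordering requirement can be sketched in a few lines of Python. The observations and timestamps here are hypothetical, the addresses are compared as plain strings rather than CIDR ranges, and followed_by is my own simplified helper rather than a real matcher.

```python
from datetime import datetime

# Hypothetical observations: (timestamp, observed IPv4 address).
observations = [
    (datetime(2020, 1, 1, 9, 0, 0), "198.51.100.1"),
    (datetime(2020, 1, 1, 9, 0, 5), "203.0.113.33"),
]


def followed_by(observations, first_value, second_value):
    """True if first_value is seen no later than a different observation of second_value."""
    for i, (first_time, first_seen) in enumerate(observations):
        if first_seen != first_value:
            continue
        for j, (second_time, second_seen) in enumerate(observations):
            if i != j and second_seen == second_value and second_time >= first_time:
                return True
    return False


print(followed_by(observations, "198.51.100.1", "203.0.113.33"))  # in order
print(followed_by(observations, "203.0.113.33", "198.51.100.1"))  # wrong order
```

Swapping the two values makes the check fail, because the second address was never seen after the first.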

Observation Expression Qualifiers allow for even more definition at the end of a pattern.

You can define WITHIN, START/STOP, and REPEATS Observation Expression Qualifiers.

The following example requires the two Observation Expressions to repeat 5 times in order for a match;

([ipv4-addr:value='198.51.100.1/32'] FOLLOWEDBY [ipv4-addr:value='203.0.113.33/32']) REPEATS 5 TIMES

Here is another example that is very similar to a pattern used for malware detection;

([file:hashes.'SHA-256'='ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'] AND [windows-registry-key:key='hkey']) WITHIN 120 SECONDS

Here, if the file hash Observation Expression and the Windows Registry Key Observation Expression are both true within 120 seconds of each other, the pattern matches.
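The WITHIN check itself is simple arithmetic on the timestamps of the matching observations. A minimal sketch, with hypothetical timestamps and a helper name (within) of my own:

```python
from datetime import datetime, timedelta


def within(timestamps, seconds):
    """True if all matching observations fall inside a window of `seconds`."""
    return max(timestamps) - min(timestamps) <= timedelta(seconds=seconds)


# Hypothetical times at which each Observation Expression was true.
file_hash_seen = datetime(2020, 1, 1, 12, 0, 0)
registry_key_seen = datetime(2020, 1, 1, 12, 1, 30)  # 90 seconds later

print(within([file_hash_seen, registry_key_seen], 120))
print(within([file_hash_seen, registry_key_seen], 60))
```

With the two observations 90 seconds apart, the 120 second window is satisfied and a 60 second window would not be.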

Precedence and Parentheses

Operator Precedence is an important consideration to keep in mind when writing Patterns.

Consider the following Pattern:

[ipv4-addr:value='198.51.100.1/32'] FOLLOWEDBY ([ipv4-addr:value='203.0.113.33/32'] REPEATS 5 TIMES)

Here, the pattern requires a match on an ipv4-addr:value equal to 198.51.100.1/32 that precedes 5 occurrences of the Observation Expression where ipv4-addr:value is equal to 203.0.113.33/32.

Now consider the following Pattern (almost identical to before, but notice the parentheses):

([ipv4-addr:value='198.51.100.1/32'] FOLLOWEDBY [ipv4-addr:value='203.0.113.33/32']) REPEATS 5 TIMES

Here, the first Observation Expression requires a match on an ipv4-addr:value equal to 198.51.100.1/32, followed by a match on the second Observation Expression for an ipv4-addr:value equal to 203.0.113.33/32. This whole sequence must be seen 5 times for a match.

Some examples to test you

Below are samples from a Linux audit log (/var/log/audit/audit.log) – it is better viewed by copying it into a text editor for this exercise.

2019-08-20 09:08:55:906 type=USER_LOGIN msg=audit(1566306445.906:280) user pid=2318 uid=0 auid 4294967295 ses=4294967295 username=unknown subj=system_u:system_r:sshd_t:s0-"(unknown)" exe="/usr/sbin/sshd" hostname=? addr=218.92.0.173 terminal=ssh res=failed'
2019-08-20 09:07:25:647 type=USER_LOGIN msg=audit(1566306445.647:242) user pid=2314 uid=0 auid 4294967295 ses=4294967295 username=mike subj=system_u:system_r:sshd_t:s0-"(mike)" exe="/usr/sbin/sshd" hostname=? addr=60.242.115.215 terminal=ssh res=failed'
2019-08-20 09:07:25:195 type=USER_LOGIN msg=audit(1566306445.195.262) user pid=2311 uid=0 auid 4294967295 ses=4294967295 username=mike subj=system_u:system_r:sshd_t:s0-"(mike)" exe="/usr/sbin/sshd" hostname=? addr=60.242.115.215 terminal=ssh res=failed'

Assume the SIEM has aliased field names correctly (e.g. addr field in the logs resolves to an IPv4 address field in the data model, which in turn is mapped to the ipv4-addr SCO).
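To make that assumption concrete, here is a rough sketch of the kind of aliasing a SIEM might perform on these three lines. The mapping of addr, username and pid onto ipv4-addr:value, user-account:account_login and process:pid is my own illustration of the data model, not the behaviour of any particular product.

```python
import re
from datetime import datetime

RAW_LOG = """2019-08-20 09:08:55:906 type=USER_LOGIN msg=audit(1566306445.906:280) user pid=2318 uid=0 auid 4294967295 ses=4294967295 username=unknown subj=system_u:system_r:sshd_t:s0-"(unknown)" exe="/usr/sbin/sshd" hostname=? addr=218.92.0.173 terminal=ssh res=failed'
2019-08-20 09:07:25:647 type=USER_LOGIN msg=audit(1566306445.647:242) user pid=2314 uid=0 auid 4294967295 ses=4294967295 username=mike subj=system_u:system_r:sshd_t:s0-"(mike)" exe="/usr/sbin/sshd" hostname=? addr=60.242.115.215 terminal=ssh res=failed'
2019-08-20 09:07:25:195 type=USER_LOGIN msg=audit(1566306445.195.262) user pid=2311 uid=0 auid 4294967295 ses=4294967295 username=mike subj=system_u:system_r:sshd_t:s0-"(mike)" exe="/usr/sbin/sshd" hostname=? addr=60.242.115.215 terminal=ssh res=failed'"""


def parse_line(line):
    """Alias the raw key=value fields to the Object Paths used in the examples."""
    # The first 23 characters hold the human-readable timestamp.
    timestamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S:%f")
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    return {
        "time": timestamp,
        "ipv4-addr:value": fields["addr"],
        "user-account:account_login": fields["username"],
        "process:pid": int(fields["pid"]),
    }


events = [parse_line(line) for line in RAW_LOG.splitlines()]
for event in events:
    print(event)
```

Note that the log lines are listed newest first, so log line 3 is the earliest event and log line 1 the latest.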

Example 1: Using the OR Observation Operator

[ipv4-addr:value='218.92.0.173'] OR [ipv4-addr:value='1.1.1.1']

Matches.

The statement IPv4 218.92.0.173 is true for one line (log line 1), which is enough to satisfy the OR operator.

Example 2: Using the AND Observation Operator

[ipv4-addr:value='218.92.0.173'] AND [ipv4-addr:value='1.1.1.1']

Does not match.

Both of the statements needed to be True to satisfy the AND operator, but only the IPv4 218.92.0.173 statement was ever true (log line 1).

Example 3: Using the FOLLOWEDBY Observation Operator

[ipv4-addr:value='60.242.115.215'] FOLLOWEDBY [user-account:account_login='mike']

Matches.

The IPv4 address 60.242.115.215 (log line 3) is followed by a login for the mike user account (log line 2).

Example 4: Using the != Comparison Operator

[ipv4-addr:value!='218.92.0.173']

Matches.

Log lines 2 and 3 each contain an IPv4 address that is not 218.92.0.173.

Example 5: Using the >= Comparison Operator

[process:pid>=2315]

Matches.

Log line 1 is the only line where the process ID (pid=2318) is greater than or equal to 2315 (the other two lines have process IDs less than 2315).

Example 6: Parentheses and Precedence

[ipv4-addr:value='218.92.0.173'] FOLLOWEDBY ([user-account:account_login='mike'] OR [user-account:account_login='david'])

Does not match.

The IPv4 address 218.92.0.173 must be followed by at least one of the statements in the parentheses. Log line 1 contains 218.92.0.173 but there are no logs that follow it (by time), thus this statement is not true for the 3 logs shown.

Example 7: Using the WITHIN Observation Expression Qualifier

([ipv4-addr:value='60.242.115.215'] FOLLOWEDBY [ipv4-addr:value='218.92.0.173']) WITHIN 60 SECONDS

Does not match.

The IPv4 address 60.242.115.215 was seen at 09:07:25:647 (log line 2), then the IPv4 address 218.92.0.173 was seen at 09:08:55:906 (log line 1), roughly 90 seconds later, which is outside the 60 second window.

Example 8: Using the REPEATS Observation Expression Qualifier

([ipv4-addr:value='60.242.115.215'] FOLLOWEDBY [ipv4-addr:value='60.242.115.215']) REPEATS 2 TIMES

Does not match.

The IPv4 address 60.242.115.215 (log line 3) was followed by the same IPv4 address (log line 2), but that sequence was seen only once, not twice.
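If you want to check your reasoning for the ordering examples (3, 6 and 7), the rough sketch below replays them against the three log lines. The events are reduced by hand to the fields the examples use, the one minute window of Example 7 is expressed as 60 seconds, and matches and followed_by are simplified helpers of my own rather than a real pattern matcher.

```python
from datetime import datetime

# The three log lines, reduced by hand to the fields the examples use.
events = [
    {"time": datetime(2019, 8, 20, 9, 8, 55, 906000), "addr": "218.92.0.173", "user": "unknown"},
    {"time": datetime(2019, 8, 20, 9, 7, 25, 647000), "addr": "60.242.115.215", "user": "mike"},
    {"time": datetime(2019, 8, 20, 9, 7, 25, 195000), "addr": "60.242.115.215", "user": "mike"},
]


def matches(field, value):
    """Return the events for which a single equality comparison is true."""
    return [event for event in events if event[field] == value]


def followed_by(first, second, within_seconds=None):
    """True if an event in `first` precedes a different event in `second`."""
    for a in first:
        for b in second:
            if a is b or b["time"] < a["time"]:
                continue
            gap = (b["time"] - a["time"]).total_seconds()
            if within_seconds is None or gap <= within_seconds:
                return True
    return False


example_3 = followed_by(matches("addr", "60.242.115.215"), matches("user", "mike"))
example_6 = followed_by(
    matches("addr", "218.92.0.173"),
    matches("user", "mike") + matches("user", "david"),
)
example_7 = followed_by(
    matches("addr", "60.242.115.215"),
    matches("addr", "218.92.0.173"),
    within_seconds=60,
)
print(example_3, example_6, example_7)
```

Only Example 3 evaluates to true here, which agrees with the answers given above.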

Pattern Matches as Sighting SROs

Now you have seen how Patterns can be used, detections (aka sightings) need to be modelled. If you start to use STIX Patterns for threat detection, you will probably want to represent the detection matches in STIX format too.

That is where the STIX Sighting SRO and Observed Data SDO can help, as detailed in the previous post.

STIX Pattern Matching Model

In the example below I am using a Sighting SRO to show that the Pattern inside the Indicator SDO (indicating Malware), [ipv4-addr:value='198.51.100.3' AND domain-name:value='example.com'], was matched 50 times.

The Observed Data SDO captures that information too, but also points to the specific things (SCOs) that were seen (the bits of the Pattern that matched). In this case it is a domain-name (example.com) and ipv4-addr (198.51.100.3);
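As a hedged sketch of how those objects relate (not the original bundle), the Python below builds the two SCOs, the Observed Data SDO and the Sighting SRO as plain dictionaries. Every id is a random placeholder, the timestamp is hypothetical, and indicator_id simply stands in for the Indicator SDO that holds the Pattern.

```python
import json
import uuid


def new_id(object_type):
    """Random placeholder identifier in the STIX type--UUID form."""
    return f"{object_type}--{uuid.uuid4()}"


NOW = "2020-01-01T00:00:00.000Z"  # hypothetical timestamp

indicator_id = new_id("indicator")  # stands in for the Indicator holding the Pattern

ipv4 = {"type": "ipv4-addr", "spec_version": "2.1",
        "id": new_id("ipv4-addr"), "value": "198.51.100.3"}
domain = {"type": "domain-name", "spec_version": "2.1",
          "id": new_id("domain-name"), "value": "example.com"}

# The Observed Data SDO points at the SCOs that were actually seen.
observed_data = {
    "type": "observed-data", "spec_version": "2.1", "id": new_id("observed-data"),
    "created": NOW, "modified": NOW,
    "first_observed": NOW, "last_observed": NOW,
    "number_observed": 50,
    "object_refs": [ipv4["id"], domain["id"]],
}

# The Sighting SRO links the Indicator to the Observed Data and records the count.
sighting = {
    "type": "sighting", "spec_version": "2.1", "id": new_id("sighting"),
    "created": NOW, "modified": NOW,
    "sighting_of_ref": indicator_id,
    "observed_data_refs": [observed_data["id"]],
    "count": 50,
}

print(json.dumps([ipv4, domain, observed_data, sighting], indent=2))
```

The Sighting answers "which Indicator matched, and how often", while the Observed Data answers "what exactly was seen".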

Helpful tools to create and validate STIX Patterns

The STIX 2 Pattern Validator from OASIS is a great tool for checking your patterns are written correctly.

Simply run the STIX 2 Pattern Validator script by declaring your Pattern. If the Pattern is valid it will return something similar to the following;

$ validate-patterns
Enter a pattern to validate: [file:hashes.md5 = '79054025255fb1a26e4bc422aef54eb4']
PASS: [file:hashes.md5 = '79054025255fb1a26e4bc422aef54eb4']

If you are trying to see if content in an Observed Data SDO matches an existing STIX Pattern you can use the CTI Pattern Matcher.

Customising STIX

Whilst the STIX standard is very broad when it comes to threat intelligence concepts, there are always edge cases.

In such cases, STIX Objects, Properties and Extensions can all be customised to suit your requirements.

I will cover the topic of customisation of STIX 2.1 in the next post.

