If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.
In this post I will experiment with generating rich threat intelligence reports using the knowledge obtained through existing intelligence.
In the world of threat intelligence, many analysts consume information from structured threat feeds and then write their own reports based on their research of this data.
Automated STIX 2.1 translation of existing structured threat intelligence data
Threat intelligence vendors often offer structured data via feeds. The payloads of these feeds are usually structured in that vendor's own format.
Take, for example, the response from the AlienVault OTX API for a pulse.
GET https://otx.alienvault.com/api/v1/pulses/5cc3cafbc25eed7fb30974a7
{
"id": "5cc3cafbc25eed7fb30974a7",
"name": "Neutrino - Malware Domain Feed V2",
"description": "Command and Control domains for malware known as Neutrino. These domains are extracted from malware sandbox reports using a Machine Learning model trained on a corpus of good and bad domains.",
"author_name": "otxrobottwo",
"modified": "2022-08-06T19:46:55.831000",
"created": "2019-04-27T03:22:35.347000",
"tags": [],
"references": [],
"public": 1,
"adversary": "",
"targeted_countries": [],
"malware_families": [],
"attack_ids": [],
"industries": [],
"TLP": "white",
"indicators": [
{
"id": 1656760116,
"indicator": "www.relichunter.top",
"type": "hostname",
"created": "2019-04-27T03:22:36",
"content": "",
"title": "Command and Control",
"description": "",
"expiration": null,
"is_active": 1
}
],
"revision": 1,
"groups": [
{
"name": "Ransomware",
"id": 66
}
],
"in_group": true,
"author": {
"username": "otxrobottwo",
"id": "78495",
"avatar_url": "/otxapi/users/avatar_image/media/avatars/user_78495/resized/80/avatar_ba5a8acdbd.png",
"is_subscribed": false,
"is_following": false
},
"is_subscribing": null
}
My idea here is that if I can teach GPT-3 the schema of each vendor's response, they can all then be mapped to STIX 2.1, a standardised threat intelligence schema.
For example, the indicators listed in this response could easily be turned into STIX 2.1 SCOs.
Input:
Consider the following JSON payload snippet from the Alienvault OTX API:
"indicators": [
{
"id": 1656760116,
"indicator": "www.relichunter.top",
"type": "hostname",
"created": "2019-04-27T03:22:36",
"content": "",
"title": "Command and Control",
"description": "",
"expiration": null,
"is_active": 1
}
]
Now please turn it into a STIX 2.1 SCO.
Output:
Sure! Here’s the STIX 2.1 SCO equivalent of the JSON payload snippet you provided:
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--26b365f2-7d91-4669-9f9a-b0d57f8c246e",
"created": "2019-04-27T03:22:36.000Z",
"modified": "2019-04-27T03:22:36.000Z",
"name": "Command and Control",
"pattern": "[domain-name:value = 'www.relichunter.top']",
"valid_from": "2019-04-27T03:22:36.000Z",
"labels": [
"C2"
],
"pattern_type": "stix",
"pattern_version": "2.1",
"x_mitre_is_active": true
}
In STIX 2.1, an indicator is represented as a SCO (Sighting Consensus Object) with a unique ID. Here, we create a SCO with the type “indicator”, specifying the version of the STIX standard as “2.1”. We also set the “created” and “modified” timestamps to the value from the original JSON payload.
The “name” property in the SCO corresponds to the “title” property in the original JSON payload. We set the value of “name” to “Command and Control”.
The “pattern” property in the SCO is used to specify the actual indicator value. Here, we use the domain name value from the original JSON payload to construct a STIX pattern that specifies the domain name as the value of the “domain-name” object.
We set the “valid_from” property to the same value as the “created” property, indicating when this indicator became valid. We also add a label of “C2” to the SCO to indicate that this indicator is associated with Command and Control activity.
Finally, we set the “x_mitre_is_active” property to true to indicate that this indicator is currently active.
Review:
All in all, very good, especially the conversion to include a correct pattern property. The only errors I see are:
- this is actually an SDO not an SCO (which is a better representation of the data anyway) and
- the use of the MITRE custom property x_mitre_is_active, because this is not authored by MITRE
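Both of these review points can be checked mechanically. Below is a minimal sketch, assuming hand-maintained sets of object types (the SCO set in particular is only a partial, illustrative list) and two hypothetical helper names, classify and custom_properties. It is not a substitute for a real STIX validator.

```python
# Partial, illustrative sets of STIX 2.1 object types (assumption: not exhaustive).
SDO_TYPES = {
    "attack-pattern", "campaign", "course-of-action", "grouping", "identity",
    "indicator", "infrastructure", "intrusion-set", "location", "malware",
    "malware-analysis", "note", "observed-data", "opinion", "report",
    "threat-actor", "tool", "vulnerability",
}
SCO_TYPES = {"domain-name", "ipv4-addr", "ipv6-addr", "url", "file", "email-addr"}


def classify(stix_object):
    """Return 'SDO', 'SCO' or 'unknown' based on the object's type property."""
    object_type = stix_object.get("type")
    if object_type in SDO_TYPES:
        return "SDO"
    if object_type in SCO_TYPES:
        return "SCO"
    return "unknown"


def custom_properties(stix_object):
    """Return the sorted names of custom (x_ prefixed) properties."""
    return sorted(name for name in stix_object if name.startswith("x_"))


# A trimmed copy of the object GPT-3 generated above.
generated = {
    "type": "indicator",
    "pattern": "[domain-name:value = 'www.relichunter.top']",
    "x_mitre_is_active": True,
}

print(classify(generated))
print(custom_properties(generated))
```

Run against the generated object, this reports an SDO carrying one custom property, which matches the two issues found by eye.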
Input:
Thanks! Just for reference that is an SDO not and SCO.
Secondly, can you regenerate the same STIX object but do not include the x_mitre_is_active property please?
Output:
Apologies for the mistake, you are correct that it is an SDO.
Sure, here’s the same STIX 2.1 SDO without the x_mitre_is_active property:
{
"type": "indicator",
"spec_version": "2.1",
"id": "indicator--26b365f2-7d91-4669-9f9a-b0d57f8c246e",
"created": "2019-04-27T03:22:36.000Z",
"modified": "2019-04-27T03:22:36.000Z",
"name": "Command and Control",
"pattern": "[domain-name:value = 'www.relichunter.top']",
"valid_from": "2019-04-27T03:22:36.000Z",
"labels": [
"C2"
],
"pattern_type": "stix",
"pattern_version": "2.1"
}
This STIX 2.1 SDO is identical to the previous one, except that the x_mitre_is_active property has been removed. This property is used by the MITRE ATT&CK framework to indicate whether an indicator is currently being used in live attacks. Since this information was not present in the original JSON payload, it is reasonable to omit it from the STIX SDO.
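For comparison, the same conversion can be written by hand, which is useful as a baseline when judging what GPT-3 produces. The following is only a sketch: the function names and the OTX-type-to-pattern mapping are my own assumptions (and the mapping is partial), and OTX timestamps are assumed to be UTC.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed, partial mapping from OTX indicator types to STIX pattern paths.
OTX_TYPE_TO_STIX_PATH = {
    "hostname": "domain-name:value",
    "domain": "domain-name:value",
    "IPv4": "ipv4-addr:value",
    "URL": "url:value",
}


def to_stix_timestamp(otx_timestamp):
    """Convert an OTX timestamp (no timezone, assumed UTC) to STIX format."""
    parsed = datetime.fromisoformat(otx_timestamp).replace(tzinfo=timezone.utc)
    milliseconds = parsed.microsecond // 1000
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.") + f"{milliseconds:03d}Z"


def otx_indicator_to_stix(otx_indicator):
    """Map one OTX indicator dict to a STIX 2.1 Indicator SDO dict."""
    path = OTX_TYPE_TO_STIX_PATH[otx_indicator["type"]]
    # Escape backslashes and single quotes inside the pattern string literal.
    value = otx_indicator["indicator"].replace("\\", "\\\\").replace("'", "\\'")
    timestamp = to_stix_timestamp(otx_indicator["created"])
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": timestamp,
        "modified": timestamp,
        "name": otx_indicator["title"],
        "pattern": f"[{path} = '{value}']",
        "pattern_type": "stix",
        "valid_from": timestamp,
    }


otx_indicator = {
    "id": 1656760116,
    "indicator": "www.relichunter.top",
    "type": "hostname",
    "created": "2019-04-27T03:22:36",
    "title": "Command and Control",
}
stix_indicator = otx_indicator_to_stix(otx_indicator)
print(json.dumps(stix_indicator, indent=2))
```

The trade-off is clear: a hand-written mapper is predictable but needs one mapping per vendor schema, which is exactly the work I am hoping GPT-3 can take on.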
Writing threat intelligence reports using GPT-3
GPT-3 also opens up the possibility for intelligence analysts to write reports from knowledge extracted from various sources, using a series of further prompts.
Input:
Please create STIX 2.1 Domain Objects and STIX 2.1 Cyber-observable Objects in a STIX 2.1 bundle using the following items in the list;
- Infrastructure is “My server”. The description of this infrastructure is “Web app backend”.
- IPv4 of “My server” is 1.1.1.1
- The owner (identity) of that server is “My fake company”
- A malware called “Some Malware”. The description of this malware is “Exploits a backdoor in some program”.
- A threat actor called “A Threat Actor”. The description of this actor is “A very bad person”.
- The vulnerability CVE-2022-41082. Include the external reference to the NVD website for this CVE.
Do not print any relationships in the bundle yet.
Output:
Sure! Here’s an example STIX 2.1 bundle that includes the objects you specified:
{
"type": "bundle",
"id": "bundle--82d7fa4a-0c3e-4c4e-ba7a-12d471f9b2f9",
"spec_version": "2.1",
"objects": [
{
"type": "identity",
"spec_version": "2.1",
"id": "identity--b4ec3e4e-4f04-4d14-88e2-8f566e9dcdbf",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "My fake company"
},
{
"type": "vulnerability",
"spec_version": "2.1",
"id": "vulnerability--1d944e45-9e9e-42c6-a12d-17aa6180e71b",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "CVE-2022-41082",
"external_references": [
{
"source_name": "NVD",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2022-41082"
}
]
},
{
"type": "malware",
"spec_version": "2.1",
"id": "malware--a4c03e48-d8a1-4cc4-b576-292c0ab31d0e",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "Some Malware",
"description": "Exploits a backdoor in some program"
},
{
"type": "threat-actor",
"spec_version": "2.1",
"id": "threat-actor--0be73132-b29d-40f5-bb9e-0e9c312e20b8",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "A Threat Actor",
"description": "A very bad person"
},
{
"type": "infrastructure",
"spec_version": "2.1",
"id": "infrastructure--9f30f900-8b57-4619-9a2f-02b5ed12d054",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "My server",
"description": "Web app backend"
},
{
"type": "ipv4-addr",
"spec_version": "2.1",
"id": "ipv4-addr--cd286c38-6e05-48fa-9e9b-3c95f1d19c1e",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"value": "1.1.1.1",
"belongs_to": {
"type": "identity",
"id": "identity--b4ec3e4e-4f04-4d14-88e2-8f566e9dcdbf"
}
}
]
}
This STIX 2.1 bundle includes six objects: an Identity object, a Vulnerability object, a Malware object, a Threat Actor object, an Infrastructure object, and an IPv4 Address object.
The Identity object represents the owner of the server, “My fake company”. We give it a unique ID and set its name property to the name of the company.
The Vulnerability object represents the vulnerability with the CVE ID CVE-2022-41082. We give it a unique ID and set its name property to the CVE ID. We also include an external_references property with a reference to the CVE entry on the NVD website.
The Malware object represents the malware called “Some Malware”. We give it a unique ID and set its name property to the name of the malware. We also set its description property to a description of the malware.
The Threat Actor object represents the threat actor called “A Threat Actor”. We give it a unique ID and set its name property to the name of the threat actor. We also set its description property to a description of the threat actor.
The Infrastructure object represents the server “My server” that hosts the web app backend. We give it a unique ID and set its name property to the name of the server. We also set its description property to a description of the server.
The IPv4 Address object represents the IP address of the server, which is 1.1.1.1. We give it a unique ID and set its value property to the IPv4 address. We also include a belongs_to property to indicate that the IP address belongs to the identity representing “My fake company”.
Note that we have not yet included any relationships between these objects. These will depend on the specific context in which these objects are being used, and the relationships will need to be added separately to the STIX 2.1 bundle.
The responses to two prompts have been concatenated in the above output.
Review:
The response is very good. I’m particularly impressed by the external_references link to the NVD in the Vulnerability object.
{
"source_name": "NVD",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2022-41082"
}
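A property-level check is still worthwhile on generated objects. As a sketch, assuming a hand-written allow-list taken from my reading of the STIX 2.1 specification for the ipv4-addr SCO (the helper name unexpected_properties is hypothetical), the generated IPv4 object can be compared against the properties the specification defines:

```python
# Properties defined for an ipv4-addr SCO in STIX 2.1: the common SCO
# properties plus the type-specific ones (hand-written list, treat as an assumption).
IPV4_ADDR_PROPERTIES = {
    "type", "spec_version", "id", "object_marking_refs", "granular_markings",
    "defanged", "extensions", "value", "resolves_to_refs", "belongs_to_refs",
}


def unexpected_properties(stix_object, allowed):
    """Return the sorted property names that are not in the allow-list."""
    return sorted(set(stix_object) - allowed)


# The IPv4 Address object exactly as GPT-3 generated it.
generated_ipv4 = {
    "type": "ipv4-addr",
    "spec_version": "2.1",
    "id": "ipv4-addr--cd286c38-6e05-48fa-9e9b-3c95f1d19c1e",
    "created": "2023-03-25T00:00:00Z",
    "modified": "2023-03-25T00:00:00Z",
    "value": "1.1.1.1",
    "belongs_to": {
        "type": "identity",
        "id": "identity--b4ec3e4e-4f04-4d14-88e2-8f566e9dcdbf",
    },
}

print(unexpected_properties(generated_ipv4, IPV4_ADDR_PROPERTIES))
```

This flags belongs_to, created and modified. The closest defined property, belongs_to_refs, is meant for references to autonomous-system objects rather than identities, so ownership by an identity is probably better expressed as a separate relationship object.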
Now I’ll get GPT-3 to add the relationships between them…
Input:
Can you please print a STIX 2.1 Bundle with the previous STIX 2.1 objects and now include Relationship Objects linking them as described below;
- The Infrastructure has the Vulnerability CVE-2022-41082
- The Infrastructure has the IPv4 1.1.1.1
- The owner (identity) of 1.1.1.1 is “My fake company”
- The Malware exploits the Vulnerability CVE-2022-41082
- The Threat Actor targets the Vulnerability
- The Threat Actor uses the Malware
Output:
Certainly! Here’s an example STIX 2.1 bundle that includes the objects from the previous example and the specified relationships:
{
"type": "bundle",
"id": "bundle--82d7fa4a-0c3e-4c4e-ba7a-12d471f9b2f9",
"spec_version": "2.1",
"objects": [
{
"type": "identity",
"spec_version": "2.1",
"id": "identity--b4ec3e4e-4f04-4d14-88e2-8f566e9dcdbf",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "My fake company"
},
{
"type": "vulnerability",
"spec_version": "2.1",
"id": "vulnerability--1d944e45-9e9e-42c6-a12d-17aa6180e71b",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "CVE-2022-41082",
"external_references": [
{
"source_name": "NVD",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2022-41082"
}
]
},
{
"type": "malware",
"spec_version": "2.1",
"id": "malware--a4c03e48-d8a1-4cc4-b576-292c0ab31d0e",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "Some Malware",
"description": "Exploits a backdoor in some program"
},
{
"type": "threat-actor",
"spec_version": "2.1",
"id": "threat-actor--0be73132-b29d-40f5-bb9e-0e9c312e20b8",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "A Threat Actor",
"description": "A very bad person"
},
{
"type": "infrastructure",
"spec_version": "2.1",
"id": "infrastructure--9f30f900-8b57-4619-9a2f-02b5ed12d054",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"name": "My server",
"description": "Web app backend"
},
{
"type": "ipv4-addr",
"spec_version": "2.1",
"id": "ipv4-addr--cd286c38-6e05-48fa-9e9b-3c95f1d19c1e",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"value": "1.1.1.1",
"belongs_to": {
"type": "identity",
"id": "identity--b4ec3e4e-4f04-4d14-88e2-8f566e9dcdbf"
}
},
{
"type": "relationship",
"spec_version": "2.1",
"id": "relationship--d32294f3-9123-4423-b9a9-aa16f6d482f6",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"relationship_type": "targets",
"source_ref": "threat-actor--0be73132-b29d-40f5-bb9e-0e9c312e20b8",
"target_ref": "vulnerability--1d944e45-9e9e-42c6-a12d-17aa6180e71b"
},
{
"type": "relationship",
"spec_version": "2.1",
"id": "relationship--17d46389-1b44-44ec-9e5b-19e7e00d7315",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"relationship_type": "uses",
"source_ref": "threat-actor--0be73132-b29d-40f5-bb9e-0e9c312e20b8",
"target_ref": "malware--a4c03e48-d8a1-4cc4-b576-292c0ab31d0e"
},
{
"type": "relationship",
"spec_version": "2.1",
"id": "relationship--91dc542c-6f83-4f3a-bd9b-5f7a879a72d5",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"relationship_type": "exploits",
"source_ref": "malware--a4c03e48-d8a1-4cc4-b576-292c0ab31d0e",
"target_ref": "vulnerability--1d944e45-9e9e-42c6-a12d-17aa6180e71b"
},
{
"type": "relationship",
"spec_version": "2.1",
"id": "relationship--37b0a846-bff1-43c2-b6ee-8a233f0d88ed",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"relationship_type": "has",
"source_ref": "infrastructure--9f30f900-8b57-4619-9a2f-02b5ed12d054",
"target_ref": "ipv4-addr--cd286c38-6e05-48fa-9e9b-3c95f1d19c1e"
}
]
}
The responses to two prompts have been concatenated in the above output.
Review:
As you can see, GPT-3 has printed only 4 relationships (compared to the 6 I provided in the prompt).
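Counting by eye gets tedious with larger bundles, so a check like this can be scripted. The sketch below assumes each expected relationship can be described as a (source type, relationship type, target type) triple. The relationship type names "has" and "belongs-to" are my own guesses, since the prompt did not name them, and the relationship objects are trimmed to the fields the check needs.

```python
THREAT_ACTOR = "threat-actor--0be73132-b29d-40f5-bb9e-0e9c312e20b8"
MALWARE = "malware--a4c03e48-d8a1-4cc4-b576-292c0ab31d0e"
VULNERABILITY = "vulnerability--1d944e45-9e9e-42c6-a12d-17aa6180e71b"
INFRASTRUCTURE = "infrastructure--9f30f900-8b57-4619-9a2f-02b5ed12d054"
IPV4 = "ipv4-addr--cd286c38-6e05-48fa-9e9b-3c95f1d19c1e"


def relationship(source_ref, relationship_type, target_ref):
    """Build a trimmed-down relationship object (only the fields checked here)."""
    return {
        "type": "relationship",
        "relationship_type": relationship_type,
        "source_ref": source_ref,
        "target_ref": target_ref,
    }


def missing_relationships(bundle, expected):
    """Return the expected triples that have no matching relationship object."""
    found = set()
    for obj in bundle["objects"]:
        if obj["type"] == "relationship":
            source_type = obj["source_ref"].split("--", 1)[0]
            target_type = obj["target_ref"].split("--", 1)[0]
            found.add((source_type, obj["relationship_type"], target_type))
    return [triple for triple in expected if triple not in found]


# The four relationships GPT-3 actually returned.
bundle = {
    "type": "bundle",
    "objects": [
        relationship(THREAT_ACTOR, "targets", VULNERABILITY),
        relationship(THREAT_ACTOR, "uses", MALWARE),
        relationship(MALWARE, "exploits", VULNERABILITY),
        relationship(INFRASTRUCTURE, "has", IPV4),
    ],
}

# The six relationships requested in the prompt.
EXPECTED = [
    ("infrastructure", "has", "vulnerability"),
    ("infrastructure", "has", "ipv4-addr"),
    ("ipv4-addr", "belongs-to", "identity"),
    ("malware", "exploits", "vulnerability"),
    ("threat-actor", "targets", "vulnerability"),
    ("threat-actor", "uses", "malware"),
]

missing = missing_relationships(bundle, EXPECTED)
for triple in missing:
    print(triple)
```

The two triples it prints are the two relationships that need to be requested again.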
Input:
You missed the STIX Relationships objects for:
- The Infrastructure is vulnerability Vulnerability CVE-2022-41082
- The owner (identity) of 1.1.1.1 is “My fake company”
Please include these too.
Output:
Apologies for the oversight! Here are the additional Relationship objects that link the Infrastructure and the IP address to the Vulnerability and the Identity, respectively:
{
"type": "relationship",
"spec_version": "2.1",
"id": "relationship--32d1b58e-5d5c-4a84-b482-d3d11a74efbb",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"relationship_type": "has",
"source_ref": "infrastructure--9f30f900-8b57-4619-9a2f-02b5ed12d054",
"target_ref": "vulnerability--1d944e45-9e9e-42c6-a12d-17aa6180e71b"
},
{
"type": "relationship",
"spec_version": "2.1",
"id": "relationship--ea5b8e5e-3c0f-4ec5-87c8-5d1bb5b4d4d1",
"created": "2023-03-25T00:00:00Z",
"modified": "2023-03-25T00:00:00Z",
"relationship_type": "belongs-to",
"source_ref": "ipv4-addr--cd286c38-6e05-48fa-9e9b-3c95f1d19c1e",
"target_ref": "identity--b4ec3e4e-4f04-4d14-88e2-8f566e9dcdbf"
}
Review:
I’ve manually added these final two relationship objects to the bundle.
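The manual step is a simple append to the bundle's objects list, so it could equally be scripted. A minimal sketch, using a hypothetical add_objects helper, a shortened stand-in for the bundle, and only a few of the properties of the two objects returned above:

```python
def add_objects(bundle, new_objects):
    """Return a copy of the bundle with new objects appended, skipping known ids."""
    merged_objects = list(bundle["objects"])
    seen_ids = {obj["id"] for obj in merged_objects}
    for obj in new_objects:
        if obj["id"] not in seen_ids:
            merged_objects.append(obj)
            seen_ids.add(obj["id"])
    return {**bundle, "objects": merged_objects}


# Shortened stand-in for the bundle GPT-3 produced (one relationship shown).
bundle = {
    "type": "bundle",
    "id": "bundle--82d7fa4a-0c3e-4c4e-ba7a-12d471f9b2f9",
    "objects": [
        {
            "type": "relationship",
            "id": "relationship--37b0a846-bff1-43c2-b6ee-8a233f0d88ed",
            "relationship_type": "has",
        },
    ],
}

# The two relationship objects from the follow-up response (trimmed).
additional = [
    {
        "type": "relationship",
        "id": "relationship--32d1b58e-5d5c-4a84-b482-d3d11a74efbb",
        "relationship_type": "has",
    },
    {
        "type": "relationship",
        "id": "relationship--ea5b8e5e-3c0f-4ec5-87c8-5d1bb5b4d4d1",
        "relationship_type": "belongs-to",
    },
]

complete_bundle = add_objects(bundle, additional)
print(len(complete_bundle["objects"]))
```

Skipping ids that are already present means the same follow-up response can be merged twice without creating duplicates.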
All in all, in lieu of dedicated STIX 2.1 authoring software, GPT-3 proves a fast and effective tool for structuring intelligence from prompts.
Automating everything
In the past three posts I have only been using the web UI at chat.openai.com to generate these responses.
GPT-3 comes with an API that can be used for this purpose too, specifically the Text Completion endpoint.
The completions endpoint can be used for a wide variety of tasks. It provides a simple but powerful interface to any of our models. You input some text as a prompt, and the model will generate a text completion that attempts to match whatever context or pattern you gave it.
All you need is an API key (which you can create in your OpenAI account) to start making requests, as I will show you.
First let me set up my environment;
python3 -m venv tutorial_env
source tutorial_env/bin/activate
pip3 install openai
Now I’ll create the script called ioc_extract_prompt.py.
In this I will use the openai.Completion.create() function with the following parameters:
import openai
# Replace YOUR_API_KEY with your OpenAI API key
openai.api_key = "YOUR_API_KEY"
# Set the model and prompt
model_engine = "text-davinci-003"
prompt = "Please extract all the indicators of compromise from the following text. The first IP contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample."
# Set the maximum number of tokens to generate in the response
max_tokens = 2048
# Generate a response
completion = openai.Completion.create(
engine=model_engine,
prompt=prompt,
max_tokens=max_tokens,
temperature=0.5,
top_p=1,
frequency_penalty=0,
presence_penalty=0
)
# Print the response
print(completion.choices[0].text)
Where I set the following:
- model_engine: The name of the model you want to use. I will be using the text-davinci-003 model for this tutorial. This is the most capable GPT-3 model. Note, during this exercise I realised that the latest ChatGPT model isn’t available yet via the OpenAI API.
- prompt: The prompt or context for the conversation. This can be a single line of text or a multi-line prompt separated by newlines. Put another way, this is where you put the questions being asked earlier in this post.
- max_tokens: The maximum number of tokens (roughly, pieces of words and punctuation) to generate in the response. The minimum is 1 and the maximum is 2048 (I mentioned tokens in part 1).
There are a few more parameters that I use in the above request, but I will let you look up what they are used for in the OpenAI documentation should you want to tweak them yourself.
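To reuse the script across many reports, it helps to separate the fixed instruction from the report text, and to keep the API key out of the source file. The sketch below only builds the request arguments and does not call the API. The helper names are hypothetical, and reading the key from an OPENAI_API_KEY environment variable is my own choice rather than something the script above does.

```python
import os

# The fixed instruction used in the prompt above.
INSTRUCTION = "Please extract all the indicators of compromise from the following text."


def build_completion_request(report_text, max_tokens=2048):
    """Build keyword arguments for openai.Completion.create() without sending them."""
    return {
        "engine": "text-davinci-003",
        "prompt": f"{INSTRUCTION} {report_text.strip()}",
        "max_tokens": max_tokens,
        "temperature": 0.5,
    }


def load_api_key(environ=os.environ):
    """Read the API key from the environment instead of hard-coding it."""
    key = environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first")
    return key


request = build_completion_request(
    "This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in.\n"
)
print(request["prompt"])
```

With the openai package from this tutorial installed, the request would then be sent with openai.Completion.create(**request) after setting openai.api_key = load_api_key().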
Now I will run my script;
python3 ioc_extract_prompt.py
Which prints;
Indicators of Compromise:
1. portal-Share.mfa[.]new
2. 158.247.222[.]6
3. www[.]delldrivers[.]in
4. Turian backdoor sample
Again, the query format will need some tuning, but even using the older model, simple indicator extraction is working as expected.
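Before a response like this can be passed to other tooling, it needs to be turned back into structured values. A minimal sketch, assuming the response is always a numbered list like the one above and that "[.]" is the only defanging used (both helper names are hypothetical):

```python
import re


def parse_ioc_response(response_text):
    """Extract the items from a numbered list such as '1. value'."""
    items = []
    for line in response_text.splitlines():
        match = re.match(r"\s*\d+\.\s+(.*\S)", line)
        if match:
            items.append(match.group(1))
    return items


def refang(value):
    """Undo '[.]' defanging so the value can be processed further."""
    return value.replace("[.]", ".")


response = """Indicators of Compromise:
1. portal-Share.mfa[.]new
2. 158.247.222[.]6
3. www[.]delldrivers[.]in
4. Turian backdoor sample"""

indicators = [refang(item) for item in parse_ioc_response(response)]
print(indicators)
```

Note that the fourth item is a malware name rather than an observable, so some typing of each extracted value is still needed before converting anything to STIX.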
In the coming months I’m going to try and fine tune the queries for extraction of intelligence so that the OpenAI API can be used in file2stix.
In the meantime, the above should be enough to get you started on your own endeavors!
Oh, and one final word of warning: keep track of the amount of free credits you are using up during development.
At the time of writing, usage is charged at $0.0200 per 1,000 tokens (about 750 words). You get $18 of free credits (good for about 675,000 words), but don’t be complacent: this quota can be used up very quickly (as I have found!).
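The arithmetic behind those figures is easy to reproduce, which helps when budgeting a batch of requests. This sketch uses the prices quoted above, working in whole cents to avoid floating point rounding. The function names are my own.

```python
PRICE_CENTS_PER_1K_TOKENS = 2   # $0.0200 per 1,000 tokens
WORDS_PER_1K_TOKENS = 750       # rough rule of thumb quoted above
FREE_CREDIT_CENTS = 1800        # $18 of free credits


def tokens_for_budget(budget_cents):
    """Number of tokens a budget (in cents) pays for."""
    return budget_cents * 1000 // PRICE_CENTS_PER_1K_TOKENS


def words_for_tokens(tokens):
    """Approximate number of words represented by a token count."""
    return tokens * WORDS_PER_1K_TOKENS // 1000


def cost_in_cents(tokens):
    """Cost in cents of processing the given number of tokens."""
    return tokens * PRICE_CENTS_PER_1K_TOKENS / 1000


free_tokens = tokens_for_budget(FREE_CREDIT_CENTS)
print(free_tokens, "tokens, or about", words_for_tokens(free_tokens), "words")
print("A full 2048-token response costs about", cost_in_cents(2048), "cents")
```

At roughly 4 cents for a maximum-length response, a few hundred test runs during development is enough to exhaust the free credits.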