Posted by: David Greenwood, Chief of Signal

If you are reading this blog post via a third-party source, many parts of it will very likely not render correctly. Please view the post on signalscorps.com for the full interactive experience.

In this post I will show you some of my early attempts to productise what OpenAI's models can do to aid the work of intelligence analysts.

It’s a fair assumption that you too have played with chat.openai.com over the last few months.

Before I continue, I want to take full credit for this post and let you know it was not generated by OpenAI (although, believe me, I tried).

Beyond writing blog posts, though, GPT-3 clearly has huge potential for the operational side of threat intelligence. For a long time I’ve had it in the back of my mind as a potentially very useful addition to the file2stix logic.

Currently, file2stix uses regular expressions to extract observables from reports. As discussed in another post, this works, but it misses lots of contextual information that could also be captured as observables.
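Here is a minimal regex extractor in the same spirit, which makes the limitation concrete. It is a hypothetical sketch, not file2stix's actual code, and the patterns are deliberately loose. It finds the IP address and the domain in a refanged sentence from the report used below. It has no way to recognise that "Turian" is a malware family or that "The Constant Company" is an organisation.

```python
import ipaddress
import re

# Deliberately loose patterns for two observable types (illustrative only).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b", re.IGNORECASE
)


def extract_observables(text: str) -> dict:
    """Pull IPv4 addresses and domain names out of already-refanged text."""
    ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects e.g. octets above 255
        except ValueError:
            continue
        ips.append(candidate)
    return {"ipv4-addr": ips, "domain-name": DOMAIN_RE.findall(text)}


if __name__ == "__main__":
    sentence = (
        "This second IP (158.247.222.6) hosted the domain www.delldrivers.in. "
        "This domain is associated with a Turian backdoor sample."
    )
    # Finds the IP and the domain, but nothing tells it "Turian" is malware.
    print(extract_observables(sentence))
```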

I’ve also previously discussed this problem in my post about MITRE TRAM, so I was interested to see how GPT-3 compares.

Here is how my initial research went…

Observable Extraction

I wanted to start simply: how would GPT-3 perform when asked to extract indicators of compromise (observables) from text?

Input:

Please extract all the indicators of compromise from the following text: The first IP contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample.

Source.

Output:

Indicators of compromise:

  1. Historical certificate reference to portal-Share.mfa[.]new
  2. Virtual private server (VPS) owned by The Constant Company
  3. IP address 158.247.222[.]6
  4. Domain www[.]delldrivers[.]in
  5. Association with a Turian backdoor sample.

Not perfect, but very close.

Let’s correct GPT-3 so it gets these right for the rest of this conversation:

Input:

Some corrections are needed please.

Line 1, portal-Share.mfa.new is an indicator of compromise of type domain, not a historical certificate reference

Line 5, Turian is an indicator of compromise of type malware family.

Output:

My apologies, thank you for bringing this to my attention. Here are the corrected indicators of compromise:

Indicators of compromise in the text:

  1. Domain: portal-Share.mfa[.]new
  2. Virtual private server (VPS) owned by The Constant Company
  3. IP address 158.247.222[.]6
  4. Domain www[.]delldrivers[.]in
  5. Malware family: Turian

Now let’s see how GPT-3 deals with defanged data.

Input:

Now please remove the fangs from these indicators of compromise.

Output:

Sure, here are the indicators of compromise without the brackets (fangs):

Indicators of compromise in the text:

  1. Domain: portal-Share.mfa.new
  2. Virtual private server (VPS) owned by The Constant Company
  3. IP address 158.247.222.6
  4. Domain www.delldrivers.in
  5. Malware family: Turian
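Strictly speaking, the bracketed form is called "defanged", and restoring it is called "refanging". Refanging is a simple string transformation, so it can also be done deterministically in code instead of by the model. The minimal sketch below covers a few common conventions: [.], (.), {.}, [dot], [:] and hxxp. That set is my own assumption, because vendors vary in how they defang.

```python
import re

# A few common defanging conventions (an assumption: vendors vary).
_DOT = re.compile(r"\[\.\]|\(\.\)|\{\.\}|\[dot\]", re.IGNORECASE)
_COLON = re.compile(r"\[:\]")
_SCHEME = re.compile(r"\bhxxp", re.IGNORECASE)


def refang(text: str) -> str:
    """Undo common defanging so observables can be parsed and matched."""
    text = _DOT.sub(".", text)
    text = _COLON.sub(":", text)
    return _SCHEME.sub("http", text)


if __name__ == "__main__":
    for ioc in ["portal-Share.mfa[.]new", "158.247.222[.]6", "www[.]delldrivers[.]in"]:
        print(refang(ioc))
```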

Now I’ll try to classify these IOCs more precisely as STIX 2.1 Objects.

Input:

Now, please classify these indicators of compromise into the correct OASIS STIX 2.1 Domain Objects or STIX 2.1 Cyber-observable Objects.

Output:

Sure, here are the correct OASIS STIX 2.1 Domain Object type and STIX 2.1 Cyber-observable Object types for each of the indicators of compromise:

  1. Domain: portal-Share.mfa.new -> Domain Name Object (STIX 2.1 Domain Object type)
  2. Virtual private server (VPS) owned by The Constant Company -> Not a STIX-defined Object type, but could be represented as an Observed Data Object to indicate that a VPS owned by a certain company was observed.
  3. IP address 158.247.222.6 -> IPv4-Address Object (STIX 2.1 Cyber-observable Object type)
  4. Domain www.delldrivers.in -> Domain Name Object (STIX 2.1 Domain Object type)
  5. Malware family: Turian -> Malware Object (STIX 2.1 Domain Object type)

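This is very good, although in my opinion “The Constant Company” is actually a STIX 2.1 Identity Object. GPT-3 has also mislabelled the Domain Name Objects (lines 1 and 4): domain-name is a STIX 2.1 Cyber-observable Object, not a Domain Object.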

Classifying data as STIX 2.1 Objects

Input:

One small correction please; “The Constant Company” is actually a STIX 2.1 Identity Object.

Output:

Thank you for correcting me. Here is the corrected version:

Indicators of compromise in the text:

  1. Domain: portal-Share.mfa.new -> Domain Name Object (STIX 2.1 Domain Object type)
  2. Virtual private server (VPS) owned by The Constant Company -> Identity Object (STIX 2.1 Identity Object type)
  3. IP address 158.247.222.6 -> IPv4-Address Object (STIX 2.1 Cyber-observable Object type)
  4. Domain www.delldrivers.in -> Domain Name Object (STIX 2.1 Domain Object type)
  5. Malware family: Turian -> Malware Object (STIX 2.1 Domain Object type)

Now I’m intrigued: how well does GPT-3 understand the STIX 2.1 schema?

Input:

Please, can you create a STIX 2.1 bundle with all these objects inside?

Output:

Certainly, here is an example of a STIX 2.1 bundle that includes all of these objects:

{
    "type": "bundle",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "identity",
            "id": "identity--00000000-0000-0000-0000-000000000001",
            "name": "The Constant Company",
            "identity_class": "organization"
        },
        {
            "type": "domain-name",
            "id": "domain-name--00000000-0000-0000-0000-000000000002",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--00000000-0000-0000-0000-000000000003",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--00000000-0000-0000-0000-000000000004",
            "value": "www.delldrivers.in"
        },
        {
            "type": "malware",
            "id": "malware--00000000-0000-0000-0000-000000000005",
            "name": "Turian"
        }
    ]
}

Nice!
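Nice as it looks, this bundle is not strictly valid STIX 2.1. The bundle has no id. The Identity and Malware objects are Domain Objects, so they need the spec_version, created and modified properties, and they lack all three. Malware also requires is_family. A minimal structural check can catch problems like these. The stdlib-only sketch below covers just the rules named here; the function name is hypothetical. The OASIS stix2-validator package would be the more thorough option.

```python
import re

# Common properties STIX 2.1 requires on every SDO and SRO.
REQUIRED_COMMON = ("type", "spec_version", "id", "created", "modified")

# SCO types used in this post; this sketch only checks their id format.
SCO_TYPES = {"domain-name", "ipv4-addr"}

# "<type>--<UUID>", with the prefix captured so it can be compared to "type".
ID_RE = re.compile(
    r"^([a-z0-9-]+)--[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def check_bundle(bundle: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if bundle.get("type") != "bundle":
        problems.append("bundle: 'type' must be 'bundle'")
    if "id" not in bundle:
        problems.append("bundle: missing required 'id'")
    for obj in bundle.get("objects", []):
        obj_id = obj.get("id", "<no id>")
        match = ID_RE.match(obj_id)
        if match is None or match.group(1) != obj.get("type"):
            problems.append(f"{obj_id}: id is not '<type>--<UUID>'")
        if obj.get("type") in SCO_TYPES:
            continue  # SCO-specific properties are not checked here
        for prop in REQUIRED_COMMON:
            if prop not in obj:
                problems.append(f"{obj_id}: missing '{prop}'")
        if obj.get("type") == "malware" and "is_family" not in obj:
            problems.append(f"{obj_id}: malware requires 'is_family'")
    return problems


if __name__ == "__main__":
    gpt_bundle = {
        "type": "bundle",
        "objects": [
            {"type": "malware",
             "id": "malware--00000000-0000-0000-0000-000000000005",
             "name": "Turian"},
        ],
    }
    for problem in check_bundle(gpt_bundle):
        print(problem)
```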

But… there are no relationships describing how these objects are linked, which gives a network graph of disconnected nodes that looks as follows:

Let’s see how GPT-3 copes with this task…

Input:

Using the original text input supplied at the start of this chat, can you link these STIX 2.1 Objects using STIX 2.1 Relationship Objects please?

Output:

Sure, here is an example of a STIX 2.1 bundle that includes all of the objects and their relationships:

{
    "type": "bundle",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "identity",
            "id": "identity--00000000-0000-0000-0000-000000000001",
            "name": "The Constant Company",
            "identity_class": "organization"
        },
        {
            "type": "domain-name",
            "id": "domain-name--00000000-0000-0000-0000-000000000002",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--00000000-0000-0000-0000-000000000003",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--00000000-0000-0000-0000-000000000004",
            "value": "www.delldrivers.in"
        },
        {
            "type": "malware",
            "id": "malware--00000000-0000-0000-0000-000000000005",
            "name": "Turian"
        },
        {
            "type": "relationship",
            "id": "relationship--00000000-0000-0000-0000-000000000006",
            "source_ref": "identity--00000000-0000-0000-0000-000000000001",
            "target_ref": "ipv4-addr--00000000-0000-0000-0000-000000000003",
            "relationship_type": "owns"
        },
        {
            "type": "relationship",
            "id": "relationship--00000000-0000-0000-0000-000000000007",
            "source_ref": "ipv4-addr--00000000-0000-0000-0000-000000000003",
            "target_ref": "domain-name--00000000-0000-0000-0000-000000000004",
            "relationship_type": "hosts"
        },
        {
            "type": "relationship",
            "id": "relationship--00000000-0000-0000-0000-000000000008",
            "source_ref": "domain-name--00000000-0000-0000-0000-000000000004",
            "target_ref": "malware--00000000-0000-0000-0000-000000000005",
            "relationship_type": "associated-with"
        }
    ]
}

Mapped on a graph, this looks as follows:
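A graph like this can be produced straight from the bundle. The sketch below turns each Relationship Object into a labelled edge and lists the objects that no relationship touches. It then emits Graphviz DOT text. The helper names are hypothetical, and labelling nodes by name or value is my own choice for readability.

```python
import json


def label(obj: dict) -> str:
    """Node label: 'name' for SDOs, 'value' for SCOs, else the raw id."""
    return obj.get("name") or obj.get("value") or obj["id"]


def to_graph(bundle: dict):
    """Return (edges, isolated): labelled edges from SROs, and unlinked nodes."""
    nodes = {o["id"]: label(o) for o in bundle["objects"] if o["type"] != "relationship"}
    edges, linked = [], set()
    for o in bundle["objects"]:
        if o["type"] != "relationship":
            continue
        src, dst = o["source_ref"], o["target_ref"]
        # Fall back to the raw ref if it points outside the bundle.
        edges.append((nodes.get(src, src), o["relationship_type"], nodes.get(dst, dst)))
        linked.update((src, dst))
    isolated = [name for obj_id, name in nodes.items() if obj_id not in linked]
    return edges, isolated


def quote(text: str) -> str:
    """Double-quote and escape a string for use as a DOT identifier."""
    return json.dumps(text, ensure_ascii=False)


def to_dot(edges, isolated) -> str:
    """Render the edges and isolated nodes in Graphviz DOT syntax."""
    lines = ["digraph stix {"]
    lines += [f"  {quote(s)} -> {quote(t)} [label={quote(r)}];" for s, r, t in edges]
    lines += [f"  {quote(n)};" for n in isolated]
    lines.append("}")
    return "\n".join(lines)
```

If Graphviz is installed, you could save the DOT text to a file and render it with something like dot -Tpng bundle.dot -o bundle.png.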

Beware of the id values here: they are not random and will lead to collisions. Let’s get GPT-3 to fix that…

Input:

Can you randomise the UUIDv4 part of the id property values please?

Output:

Yes, I can randomize the UUIDv4 part of the id property values.

{
    "type": "bundle",
    "id": "bundle--4c46bdb3-f8ed-4c21-ac2a-1c9e67a8dc7a",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "identity",
            "id": "identity--e64c8d06-be1b-4e5d-9c67-7738af84c1b6",
            "name": "The Constant Company",
            "identity_class": "organization"
        },
        {
            "type": "domain-name",
            "id": "domain-name--cfb9dc56-d10c-4442-9a7f-97d6f16b2e12",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "value": "www.delldrivers.in"
        },
        {
            "type": "malware",
            "id": "malware--ddcdd412-9219-4f24-8403-2c3cae17af06",
            "name": "Turian"
        },
        {
            "type": "relationship",
            "id": "relationship--cbd013e4-83b6-4ed3-b70d-e89f0d70e57d",
            "source_ref": "identity--e64c8d06-be1b-4e5d-9c67-7738af84c1b6",
            "target_ref": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "relationship_type": "owns"
        },
        {
            "type": "relationship",
            "id": "relationship--c30c002c-5bf9-4e92-8f98-20b99e744ea4",
            "source_ref": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "target_ref": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "relationship_type": "hosts"
        },
        {
            "type": "relationship",
            "id": "relationship--36254daa-cbb8-40a4-ab63-dd79a78464c5",
            "source_ref": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "target_ref": "malware--ddcdd412-9219-4f24-8403-2c3cae17af06",
            "relationship_type": "associated-with"
        }
    ]
}

It even adds a bundle id to the response, which STIX 2.1 requires and which was missing from the earlier bundles!
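Asking GPT-3 for random UUIDs worked here, but the values are still generated text. Nothing guarantees they are random or even valid. Generating ids in code is more dependable. The sketch below gives SDOs, SROs and the bundle fresh UUIDv4 ids. For the domain-name and ipv4-addr objects it follows the STIX 2.1 recommendation for Cyber-observable Objects and uses deterministic UUIDv5 ids, so the same observable always gets the same id. It then rewrites every *_ref and *_refs property so the relationships still line up. The function names are hypothetical. To my knowledge, the namespace UUID is the one the STIX 2.1 specification defines for this purpose. The JSON canonicalisation is a simplification. Both are worth checking against the spec.

```python
import copy
import json
import uuid

# To my knowledge, the namespace STIX 2.1 defines for deterministic SCO ids
# (worth double-checking against the specification).
SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

# ID-contributing properties for the SCO types used in this post.
SCO_ID_PROPS = {"domain-name": ["value"], "ipv4-addr": ["value"]}


def new_id(obj: dict) -> str:
    """UUIDv5 from contributing properties for known SCOs, else a random UUIDv4."""
    props = SCO_ID_PROPS.get(obj["type"])
    if props:
        # Compact, key-sorted JSON stands in for full JSON canonicalisation;
        # the two agree for simple ASCII string values like these.
        name = json.dumps({p: obj[p] for p in props}, sort_keys=True,
                          separators=(",", ":"), ensure_ascii=False)
        return f"{obj['type']}--{uuid.uuid5(SCO_NAMESPACE, name)}"
    return f"{obj['type']}--{uuid.uuid4()}"


def reassign_ids(bundle: dict) -> dict:
    """Copy the bundle with fresh ids, rewriting every *_ref and *_refs property."""
    new = copy.deepcopy(bundle)
    mapping = {obj["id"]: new_id(obj) for obj in new["objects"]}

    def remap(value):
        if isinstance(value, list):
            return [mapping.get(v, v) for v in value]
        return mapping.get(value, value)

    for obj in new["objects"]:
        obj["id"] = mapping[obj["id"]]
        for key in list(obj):
            if key.endswith("_ref") or key.endswith("_refs"):
                obj[key] = remap(obj[key])  # refs outside the bundle are kept
    new["id"] = f"bundle--{uuid.uuid4()}"
    return new
```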

Reviewing the original text again:

The first IP contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample.

I can see the graph accurately describes the relationships in the text.

It would appear GPT-3 is pretty good at extracting indicators of compromise and linking them together. But what about non-explicit references?

Enriching the extracted observables and entities

To test this, I will now try to enrich the report data with more external context (ignoring the rest of the context in the full report for now).

Input:

What threat actors are known to use Turian malware?

Output:

Turian is a backdoor that has been used by BackdoorDiplomacy, a threat actor group, to target Ministries of Foreign Affairs, telecommunication companies, and charities in Africa, Europe, the Middle East, and Asia. First reported in 2021, Turian is likely related to Quarian, an older backdoor that was last observed being used in 2013 against diplomatic targets in Syria and the United States.

What if I develop this further and use a publicly available knowledge base like MITRE ATT&CK?

Input:

Can you add any additional context to the last STIX 2.1 bundle you create using the MITRE ATT&CK framework?

Output:

Sure! The STIX 2.1 bundle could be enhanced by adding the mapping of the Turian malware to the MITRE ATT&CK framework. This could provide additional context and insight into the tactics, techniques, and procedures (TTPs) used by the threat actors that leverage the Turian malware.

Here is an example of how the bundle could be updated to include the mapping to the MITRE ATT&CK framework:

{
    "type": "bundle",
    "id": "bundle--4c46bdb3-f8ed-4c21-ac2a-1c9e67a8dc7a",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "identity",
            "id": "identity--e64c8d06-be1b-4e5d-9c67-7738af84c1b6",
            "name": "The Constant Company",
            "identity_class": "organization"
        },
        {
            "type": "domain-name",
            "id": "domain-name--cfb9dc56-d10c-4442-9a7f-97d6f16b2e12",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "value": "www.delldrivers.in"
        },
        {
            "type": "malware",
            "id": "malware--ddcdd412-9219-4f24-8403-2c3cae17af06",
            "name": "Turian",
            "x_mitre_platforms": [
                "Windows"
            ],
            "x_mitre_attack_refs": [
                "T1190",
                "T1086"
            ]
        },
        {
            "type": "relationship",
            "id": "relationship--cbd013e4-83b6-4ed3-b70d-e89f0d70e57d",
            "source_ref": "identity--e64c8d06-be1b-4e5d-9c67-7738af84c1b6",
            "target_ref": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "relationship_type": "owns"
        },
        {
            "type": "relationship",
            "id": "relationship--c30c002c-5bf9-4e92-8f98-20b99e744ea4",
            "source_ref": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "target_ref": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "relationship_type": "hosts"
        },
        {
            "type": "relationship",
            "id": "relationship--36254daa-cbb8-40a4-ab63-dd79a78464c5",
            "source_ref": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "target_ref": "malware--ddcdd412-9219-4f24-8403-2c3cae17af06",
            "relationship_type": "associated-with"
        }
    ]
}

The main problem with this is that the output uses a custom STIX Object to represent the Turian malware family, rather than the one from ATT&CK (malware--350f12cf-fd3b-4dad-b323-14b943090df4).

To fix this, I need to show GPT-3 what this object looks like.

Input:

Instead of using a custom STIX 2.1 object to represent the Turian Malware, can you use the one from the MITRE ATT&CK dictionary that I’ve printed below;

       {
            "labels": [
                "malware"
            ],
            "x_mitre_platforms": [
                "Windows",
                "Linux"
            ],
            "x_mitre_domains": [
                "enterprise-attack"
            ],
            "x_mitre_contributors": [
                "Zaw Min Htun, @Z3TAE"
            ],
            "x_mitre_aliases": [
                "Turian"
            ],
            "object_marking_refs": [
                "marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168"
            ],
            "id": "malware--350f12cf-fd3b-4dad-b323-14b943090df4",
            "type": "malware",
            "created": "2021-09-21T15:21:31.795Z",
            "created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
            "external_references": [
                {
                    "external_id": "S0647",
                    "source_name": "mitre-attack",
                    "url": "https://attack.mitre.org/software/S0647"
                },
                {
                    "source_name": "ESET BackdoorDiplomacy Jun 2021",
                    "url": "https://www.welivesecurity.com/2021/06/10/backdoordiplomacy-upgrading-quarian-turian/",
                    "description": "Adam Burgher. (2021, June 10). BackdoorDiplomacy: Upgrading from Quarian to Turian. Retrieved September 1, 2021"
                }
            ],
            "modified": "2021-10-18T13:19:48.020Z",
            "name": "Turian",
            "description": "[Turian](https://attack.mitre.org/software/S0647) is a backdoor that has been used by [BackdoorDiplomacy](https://attack.mitre.org/groups/G0135) to target Ministries of Foreign Affairs, telecommunication companies, and charities in Africa, Europe, the Middle East, and Asia. First reported in 2021, [Turian](https://attack.mitre.org/software/S0647) is likely related to Quarian, an older backdoor that was last observed being used in 2013 against diplomatic targets in Syria and the United States.(Citation: ESET BackdoorDiplomacy Jun 2021)",
            "x_mitre_version": "1.0",
            "x_mitre_attack_spec_version": "2.1.0",
            "x_mitre_modified_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5"
        }

Output:

Yes, I can do that. Here’s the updated STIX 2.1 Bundle, including the MITRE ATT&CK Turian malware information:

{
    "type": "bundle",
    "id": "bundle--4c46bdb3-f8ed-4c21-ac2a-1c9e67a8dc7a",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "identity",
            "id": "identity--e64c8d06-be1b-4e5d-9c67-7738af84c1b6",
            "name": "The Constant Company",
            "identity_class": "organization"
        },
        {
            "type": "domain-name",
            "id": "domain-name--cfb9dc56-d10c-4442-9a7f-97d6f16b2e12",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "value": "www.delldrivers.in"
        },
        {
            "labels": [
                "malware"
            ],
            "x_mitre_platforms": [
                "Windows",
                "Linux"
            ],
            "x_mitre_domains": [
                "enterprise-attack"
            ],
            "x_mitre_contributors": [
                "Zaw Min Htun, @Z3TAE"
            ],
            "x_mitre_aliases": [
                "Turian"
            ],
            "object_marking_refs": [
                "marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168"
            ],
            "id": "malware--350f12cf-fd3b-4dad-b323-14b943090df4",
            "type": "malware",
            "created": "2021-09-21T15:21:31.795Z",
            "created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
            "external_references": [
                {
                    "external_id": "S0647",
                    "source_name": "mitre-attack",
                    "url": "https://attack.mitre.org/software/S0647"
                },
                {
                    "source_name": "ESET BackdoorDiplomacy Jun 2021",
                    "url": "https://www.welivesecurity.com/2021/06/10/backdoordiplomacy-upgrading-quarian-turian/",
                    "description": "Adam Burgher. (2021, June 10). BackdoorDiplomacy: Upgrading from Quarian to Turian. Retrieved September 1, 2021"
                }
            ],
            "modified": "2021-10-18T13:19:48.020Z",
            "name": "Turian",
            "description": "[Turian](https://attack.mitre.org/software/S0647) is a backdoor that has been used by [BackdoorDiplomacy](https://attack.mitre.org/groups/G0135) to target Ministries of Foreign Affairs, telecommunication companies, and charities in Africa, Europe, the Middle East, and Asia. First reported in 2021, [Turian](https://attack.mitre.org/software/S0647) is likely related to Quarian, an older backdoor that was last observed being used in 2013 against diplomatic targets in Syria and the United States.(Citation: ESET BackdoorDiplomacy Jun 2021)",
            "x_mitre_version": "1.0",
            "x_mitre_attack_spec_version": "2.1.0",
            "x_mitre_modified_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5"
        },
        {
            "type": "relationship",
            "id": "relationship--cbd013e4-83b6-4ed3-b70d-e89f0d70e57d",
            "source_ref": "identity--e64c8d06-be1b-4e5d-9c67-7738af84c1b6",
            "target_ref": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "relationship_type": "owns"
        },
        {
            "type": "relationship",
            "id": "relationship--c30c002c-5bf9-4e92-8f98-20b99e744ea4",
            "source_ref": "ipv4-addr--0318ef25-92b9-41d9-b4b8-bddb5a4a6e46",
            "target_ref": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "relationship_type": "hosts"
        },
        {
            "type": "relationship",
            "id": "relationship--36254daa-cbb8-40a4-ab63-dd79a78464c5",
            "source_ref": "domain-name--aa3103b8-fd56-4f55-8b2b-70a303c93d2d",
            "target_ref": "malware--350f12cf-fd3b-4dad-b323-14b943090df4",
            "relationship_type": "associated-with"
        }
    ]
}
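This substitution is purely mechanical, so it could also be done in code rather than by pasting the object into the prompt. The minimal sketch below assumes the ATT&CK objects have already been loaded into a list. One source is the enterprise-attack.json bundle that MITRE publishes in its cti GitHub repository. find_attack_object and swap_object are hypothetical helper names.

```python
def find_attack_object(objects: list, attack_id: str):
    """Find the object whose mitre-attack external_id matches, e.g. "S0647"."""
    for obj in objects:
        for ref in obj.get("external_references", []):
            if ref.get("source_name") == "mitre-attack" and ref.get("external_id") == attack_id:
                return obj
    return None


def swap_object(bundle: dict, old_id: str, replacement: dict) -> dict:
    """Replace one object and repoint every relationship that referenced it."""
    objects = []
    for obj in bundle["objects"]:
        if obj["id"] == old_id:
            objects.append(replacement)
            continue
        obj = dict(obj)  # shallow copy so the input bundle is left untouched
        for key in ("source_ref", "target_ref"):
            if obj.get(key) == old_id:
                obj[key] = replacement["id"]
        objects.append(obj)
    return {**bundle, "objects": objects}
```

For example, find_attack_object(objects, "S0647") should return the Turian object shown above. swap_object can then put it in place of malware--ddcdd412-9219-4f24-8403-2c3cae17af06, which updates the associated-with relationship at the same time.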

Now, what about other MITRE ATT&CK objects relevant to the original text?

Input:

Can you please mark this document up with relevant objects in the Mitre ATT&CK framework?

Output:

Sure, here’s an example of how the text can be marked up with the MITRE ATT&CK framework, including the indicators of compromise (IOCs) that were extracted:

The first IP contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. (T1192 Spearphishing Attachment, T1598 Supply Chain Compromise) IOC: portal-Share.mfa[.]new

The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample. (T1071 Application Layer Protocol, T1070.004 Domain Fronting, T1566.001 Phishing) IOC: 158.247.222[.]6, www[.]delldrivers[.]in, Turian backdoor sample.

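It completely missed Turian (added above). Several of the technique IDs also don’t match their names. For example, T1598 is Phishing for Information (Supply Chain Compromise is T1195), and T1070.004 is File Deletion (Domain Fronting is T1090.004). These mappings need checking against ATT&CK before they are used.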
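GPT-3’s ATT&CK references are easy to check mechanically. The sketch below pulls "ID Name" pairs out of the marked-up text and compares them with a small, hand-copied table of ATT&CK names. The table is illustrative only. A real check should load names from MITRE’s published ATT&CK data, since names can change between releases. The function name is hypothetical.

```python
import re

# A hand-copied subset of ATT&CK names, for illustration only.
ATTACK_NAMES = {
    "T1071": "Application Layer Protocol",
    "T1070.004": "Indicator Removal: File Deletion",
    "T1090.004": "Proxy: Domain Fronting",
    "T1192": "Spearphishing Link",  # deprecated technique
    "T1193": "Spearphishing Attachment",  # deprecated technique
    "T1195": "Supply Chain Compromise",
    "T1566.001": "Phishing: Spearphishing Attachment",
    "T1598": "Phishing for Information",
    "S0647": "Turian",
}

# "T1598 Supply Chain Compromise" -> ("T1598", "Supply Chain Compromise")
CLAIM_RE = re.compile(r"\b([TS]\d{4}(?:\.\d{3})?)\s+([^,()]+)")


def check_attack_claims(text: str) -> list:
    """Return (id, claimed name, actual name) for each mismatched pair."""
    mismatches = []
    for attack_id, claimed in CLAIM_RE.findall(text):
        claimed = claimed.strip()
        actual = ATTACK_NAMES.get(attack_id)
        if actual is None:
            mismatches.append((attack_id, claimed, "<unknown to this sketch>"))
            continue
        # Accept the full name or either part of a "Parent: Child" name.
        accepted = {actual.lower()} | {part.strip().lower() for part in actual.split(":")}
        if claimed.lower() not in accepted:
            mismatches.append((attack_id, claimed, actual))
    return mismatches
```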

Input:

Thanks, but you missed Turian, Mitre ATT&CK S0647.

Output:

You are correct, thank you for pointing that out. The Turian backdoor sample is associated with the S0647 technique in the MITRE ATT&CK framework. So the complete markup would be:

The first IP contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. (T1192 Spearphishing Attachment, T1598 Supply Chain Compromise) IOC: portal-Share.mfa[.]new

The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample (S0647). (T1071 Application Layer Protocol, T1070.004 Domain Fronting, T1566.001 Phishing) IOC: 158.247.222[.]6, www[.]delldrivers[.]in, Turian backdoor sample.

Now, let’s make the references easier to follow by turning them into Markdown links, so analysts reading the text can look them up quickly.

Input:

Finally, can you please provide the links inline with the original text to the MITRE ATT&CK website in markdown format please?

Output:

Sure, here’s the original text with the MITRE ATT&CK links inserted inline:

The first IP contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. (T1192 Spearphishing Attachment, T1598 Supply Chain Compromise) IOC: portal-Share.mfa[.]new

The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample (S0647 Multi-Stage Channels). (T1071 Application Layer Protocol, T1070.004 Domain Fronting, T1566.001 Phishing) IOC: 158.247.222[.]6, www[.]delldrivers[.]in, Turian backdoor sample.
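Building these links doesn’t need a language model, because ATT&CK URLs follow a predictable pattern. The patterns are:

- /techniques/T1071 for techniques
- /techniques/T1566/001 for sub-techniques
- /software/S0647 for software
- /groups/G0135 for groups

The minimal sketch below wraps such IDs in Markdown links. I inferred the URL pattern from the ATT&CK website, and the function names are hypothetical.

```python
import re

ATTACK_ID_RE = re.compile(r"\b([TSG])(\d{4})(?:\.(\d{3}))?\b")

# Path segment on attack.mitre.org for each ID prefix.
SECTIONS = {"T": "techniques", "S": "software", "G": "groups"}


def attack_url(match) -> str:
    """Build the attack.mitre.org URL for one matched ID."""
    prefix, number, sub = match.groups()
    url = f"https://attack.mitre.org/{SECTIONS[prefix]}/{prefix}{number}"
    return f"{url}/{sub}" if sub else url


def linkify(text: str) -> str:
    """Wrap ATT&CK IDs (T1071, T1566.001, S0647, G0135) in Markdown links."""
    return ATTACK_ID_RE.sub(lambda m: f"[{m.group(0)}]({attack_url(m)})", text)
```

Run it only once, on plain text. The generated URLs also contain the IDs, so running it again on already-linked text would nest the links.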

If I wanted to, I could at this point teach GPT-3 the ATT&CK objects used, and recreate a new, richer STIX bundle.

Automated STIX 2.1 translation of threat intelligence data

Many threat intelligence vendors offer structured data via feeds. Often, the payloads of these feeds are structured in the vendor’s own format.

Take, for example, the response from AlienVault OTX for a pulse:

GET https://otx.alienvault.com/api/v1/pulses/5cc3cafbc25eed7fb30974a7
{
    "id": "5cc3cafbc25eed7fb30974a7",
    "name": "Neutrino - Malware Domain Feed V2",
    "description": "Command and Control domains for malware known as Neutrino. These domains are extracted from malware sandbox reports using                             a Machine Learning model trained on a corpus of good and bad domains.",
    "author_name": "otxrobottwo",
    "modified": "2022-08-06T19:46:55.831000",
    "created": "2019-04-27T03:22:35.347000",
    "tags": [],
    "references": [],
    "public": 1,
    "adversary": "",
    "targeted_countries": [],
    "malware_families": [],
    "attack_ids": [],
    "industries": [],
    "TLP": "white",
    "indicators": [
        {
            "id": 1656760116,
            "indicator": "www.relichunter.top",
            "type": "hostname",
            "created": "2019-04-27T03:22:36",
            "content": "",
            "title": "Command and Control",
            "description": "",
            "expiration": null,
            "is_active": 1
        }    
    ],
    "revision": 1,
    "groups": [
        {
            "name": "Ransomware",
            "id": 66
        }
    ],
    "in_group": true,
    "author": {
        "username": "otxrobottwo",
        "id": "78495",
        "avatar_url": "/otxapi/users/avatar_image/media/avatars/user_78495/resized/80/avatar_ba5a8acdbd.png",
        "is_subscribed": false,
        "is_following": false
    },
    "is_subscribing": null
}

Inside this response is a lot of data that could be modelled as STIX 2.1 Objects.

For example, each of the indicators listed could be modelled as its own STIX 2.1 Object.
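Here is a sketch of what a deterministic translation could look like. It turns the indicators in an OTX pulse into STIX 2.1 Indicator Objects with STIX patterns. Three parts of it are my own assumptions:

- the mapping from a few OTX indicator types to STIX Cyber-observable types
- treating OTX timestamps as UTC
- the helper names

```python
import json
import uuid
from datetime import datetime

# Assumed mapping from a few OTX indicator types to STIX 2.1 SCO types.
OTX_TO_SCO = {
    "hostname": "domain-name",
    "domain": "domain-name",
    "IPv4": "ipv4-addr",
    "IPv6": "ipv6-addr",
    "URL": "url",
}


def stix_time(otx_time: str) -> str:
    """Convert an OTX timestamp (no zone given; assumed UTC) to STIX format."""
    dt = datetime.fromisoformat(otx_time)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def escape(value: str) -> str:
    """Escape backslashes and single quotes for a STIX pattern string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def pulse_to_bundle(pulse: dict) -> dict:
    """Turn each mappable OTX indicator into a STIX 2.1 Indicator Object."""
    objects = []
    for ioc in pulse.get("indicators", []):
        sco_type = OTX_TO_SCO.get(ioc["type"])
        if sco_type is None:
            continue  # unmapped OTX types (e.g. file hashes) are skipped here
        created = stix_time(ioc["created"])
        objects.append({
            "type": "indicator",
            "spec_version": "2.1",
            "id": f"indicator--{uuid.uuid4()}",
            "created": created,
            "modified": created,
            "name": ioc["indicator"],
            "pattern": f"[{sco_type}:value = '{escape(ioc['indicator'])}']",
            "pattern_type": "stix",
            "valid_from": created,
        })
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    pulse = {"indicators": [{"indicator": "www.relichunter.top", "type": "hostname",
                             "created": "2019-04-27T03:22:36"}]}
    print(json.dumps(pulse_to_bundle(pulse), indent=2))
```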

Writing threat intelligence reports using GPT-3

Stepping back slightly from automated extraction, GPT-3 also opens up the possibility for intelligence analysts to write their reports as GPT-3 prompts.

Input:

Please create STIX 2.1 Domain Objects and STIX 2.1 Cyber-observable Objects in a STIX 2.1 bundle using the following items in the list;

  1. Infrastructure is “My server”. The description of this infrastructure is “Web app backend”.
  2. IPv4 of “My server” is 1.1.1.1
  3. The owner (identity) of that server is “My fake company”
  4. A malware called “Some Malware”. The description of this malware is “Exploits a backdoor in some program”.
  5. A threat actor called “A Threat Actor”. The description of this actor is “A very bad person”.
  6. The vulnerability CVE-2022-41082. Include the external reference to the NVD website for this CVE.

Do not print any relationships in the bundle yet.

Output:

Continuing from the previous response, here is the updated bundle with the external reference to the NVD website for the CVE:

{
    "type": "bundle",
    "id": "bundle--f3d3b537-d8d8-4bb9-aa34-7b2f4a4c4b21",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "infrastructure",
            "id": "infrastructure--c4daa105-33a8-4b98-b07e-921cd1b1c8b9",
            "created": "2023-02-21T00:00:00.000Z",
            "modified": "2023-02-21T00:00:00.000Z",
            "name": "My server",
            "description": "Web app backend"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--fc1833f3-b450-4929-9b80-fd0a55c39ca5",
            "created": "2023-02-21T00:00:00.000Z",
            "modified": "2023-02-21T00:00:00.000Z",
            "value": "1.1.1.1",
            "resolves_to_refs": [
                "infrastructure--c4daa105-33a8-4b98-b07e-921cd1b1c8b9"
            ]
        },
        {
            "type": "identity",
            "id": "identity--1e675e7d-61f6-4cf3-87b6-999c6dd23e6e",
            "created": "2023-02-21T00:00:00.000Z",
            "modified": "2023-02-21T00:00:00.000Z",
            "name": "My fake company",
            "identity_class": "organization"
        },
        {
            "type": "malware",
            "id": "malware--b890dbd6-81d8-4b28-9735-7f768a5f2ce7",
            "created": "2023-02-21T00:00:00.000Z",
            "modified": "2023-02-21T00:00:00.000Z",
            "name": "Some Malware",
            "description": "Exploits a backdoor in some program"
        },
        {
            "type": "threat-actor",
            "id": "threat-actor--aeedf5c5-7e89-43f9-a7d3-56a04511cfda",
            "created": "2023-02-21T00:00:00.000Z",
            "modified": "2023-02-21T00:00:00.000Z",
            "name": "A Threat Actor",
            "description": "A very bad person"
        },
        {
            "type": "vulnerability",
            "id": "vulnerability--4a85d6c4-6aa4-4e3d-8ec3-926aa1efb59d",
            "created": "2023-02-21T00:00:00.000Z",
            "modified": "2023-02-21T00:00:00.000Z",
            "name": "CVE-2022-41082"

Notice how this bundle is cut short. This is due to the limit on the length of GPT-3 responses, which is measured in tokens rather than characters. There is a way around this that worked for me here, but beware, this is not always the case:

Input:

Can you please print the final few lines of the last response?

Output:

Apologies for the incomplete response earlier. Here are the final lines of the updated bundle with the external reference to the NVD website for the CVE:

            "external_references": [
                {
                    "source_name": "NVD",
                    "external_id": "CVE-2022-41082",
                    "url": "https://nvd.nist.gov/vuln/detail/CVE-2022-41082"
                }
            ]
        }
    ]
}

Please note that this is just an example of how STIX 2.1 Domain Objects and STIX 2.1 Cyber-observable Objects can be created in a STIX 2.1 bundle. The actual structure and content of the objects may vary depending on the specific use case and requirements.

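We can now cut and paste the two parts together into a single JSON STIX 2.1 Bundle. The result is valid JSON, but it is not yet strictly valid STIX 2.1. For example, the Domain Objects are missing the required spec_version property, and the Malware Object is missing the required is_family property, so it is worth validating before use.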

In the same vein, I will ask ChatGPT to print relationships for these objects.

Input:

Can you please print a STIX 2.1 Bundle with STIX 2.1 Relationship Objects for the object in the previous bundle using the relationships described below;

  1. The Infrastructure has the Vulnerability CVE-2022-41082
  2. The Infrastructure has the IPv4 1.1.1.1
  3. The owner (identity) of 1.1.1.1 is “My fake company”
  4. The Malware exploits the Vulnerability CVE-2022-41082
  5. The Threat Actor targets the Vulnerability
  6. The Threat Actor uses the Malware

Please only print the STIX 2.1 relatioship objects

Output:

Sure, here is a STIX 2.1 bundle with the requested relationships:

{
    "type": "bundle",
    "id": "bundle--1ebce0a3-9da9-41d9-97f1-4cbe4f4db4d4",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "relationship",
            "id": "relationship--366f90a6-22c4-47d9-a9b9-1d04210e0cf7",
            "relationship_type": "targets",
            "source_ref": "threat-actor--f2b7d4ab-9e3a-4d87-8df5-6f5b6de5e546",
            "target_ref": "vulnerability--ca21ec13-358d-4011-a68a-1f12c2d7f6b5"
        },
        {
            "type": "relationship",
            "id": "relationship--5aaf515b-9960-4c04-9b16-6ee5a6e736a6",
            "relationship_type": "uses",
            "source_ref": "threat-actor--f2b7d4ab-9e3a-4d87-8df5-6f5b6de5e546",
            "target_ref": "malware--3dc3e497-7abf-4e09-9b51-2f7b81f104ee"
        },
        {
            "type": "relationship",
            "id": "relationship--f96d8149-bb81-4961-92ce-12c88d78e6ba",
            "relationship_type": "exploits",
            "source_ref": "malware--3dc3e497-7abf-4e09-9b51-2f7b81f104ee",
            "target_ref": "vulnerability--ca21ec13-358d-4011-a68a-1f12c2d7f6b5"
        },
        {
            "type": "relationship",
            "id": "relationship--afe0f712-23e9-45b8-9b9c-5f5a5e1a5e64",
            "relationship_type": "has",
            "source_ref": "infrastructure--4e2c4d1e-0f27-4ce5-94f5-5dc5a4821d3c",
            "target_ref": "ipv4-addr--3c49b2e6-3258-4a9c-9a23-3f23c11f77de"
        },
        {
            "type": "relationship",
            "id": "relationship--11b4d8f4-f9b4-4e87-881f-15a39b6d99f1",
            "relationship_type": "belongs-to",
            "source_ref": "ipv4-addr--3c49b2e6-3258-4a9c-9a23-3f23c11f77de",
            "target_ref": "identity--a55d7da0-37a5-47f5-8f2d-bd8161f6b49d"
        },
        {
            "type": "relationship",
            "id": "773e28c2-8f60-41cc-8f5a-4825",
            "relationship_type": "uses",
            "source_ref": "threat-actor--4cc87d4f-53d4-4a9c-9a9d-a416d350d6a7",
            "target_ref": "malware--c1b320ed-3439-42af-9b53-5ae07a5c1edc"
        }

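The most obvious gaps in this response are the created and modified fields (plus spec_version), which STIX 2.1 requires on every relationship object. These can be fixed by asking ChatGPT to include them in a follow-up request. They are not the only problems, though. The first requested relationship (the Infrastructure has the Vulnerability) is missing. The final object has a malformed id without the relationship-- prefix, and its threat actor and malware IDs differ from those used in the other relationships. Finally, the response stops before the objects array and the bundle are closed.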
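You don't have to spend another request (and more tokens) on this: the missing fields can also be filled in locally. Below is a minimal sketch, assuming the relationship objects have already been parsed into Python dicts. `add_required_fields` and `stix_timestamp` are hypothetical helpers. Where fields are absent, the code adds spec_version, created and modified, with created and modified set to the same UTC timestamp trimmed to milliseconds. It also returns any id that lacks the relationship-- prefix, which catches the last object in the response above. The fixed time in the example is only there to make the output reproducible.

```python
from datetime import datetime, timezone


def stix_timestamp(moment=None):
    """Format a UTC datetime as a STIX timestamp, e.g. 2023-01-10T09:30:00.123Z."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # %f yields microseconds (6 digits); drop the last 3 to keep milliseconds.
    return moment.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def add_required_fields(relationships, moment=None):
    """Add missing spec_version/created/modified and return malformed IDs."""
    timestamp = stix_timestamp(moment)
    malformed = []
    for rel in relationships:
        rel.setdefault("spec_version", "2.1")
        rel.setdefault("created", timestamp)
        # For a brand-new object, modified is the same as created.
        rel.setdefault("modified", rel["created"])
        if not rel.get("id", "").startswith("relationship--"):
            malformed.append(rel.get("id"))
    return malformed


# Two of the relationship objects from ChatGPT's response above.
relationships = [
    {
        "type": "relationship",
        "id": "relationship--366f90a6-22c4-47d9-a9b9-1d04210e0cf7",
        "relationship_type": "targets",
        "source_ref": "threat-actor--f2b7d4ab-9e3a-4d87-8df5-6f5b6de5e546",
        "target_ref": "vulnerability--ca21ec13-358d-4011-a68a-1f12c2d7f6b5",
    },
    {
        "type": "relationship",
        "id": "773e28c2-8f60-41cc-8f5a-4825",
        "relationship_type": "uses",
        "source_ref": "threat-actor--4cc87d4f-53d4-4a9c-9a9d-a416d350d6a7",
        "target_ref": "malware--c1b320ed-3439-42af-9b53-5ae07a5c1edc",
    },
]

fixed_time = datetime(2023, 1, 10, 9, 30, 0, 123456, tzinfo=timezone.utc)
print(add_required_fields(relationships, fixed_time))  # ['773e28c2-8f60-41cc-8f5a-4825']
print(relationships[0]["created"])  # 2023-01-10T09:30:00.123Z
```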

Automating everything

So far I’ve only been using the web UI at chat.openai.com to generate these responses.

GPT-3 comes with an API that can be used for this purpose too, specifically the Completions endpoint.

As OpenAI’s documentation puts it: “The completions endpoint can be used for a wide variety of tasks. It provides a simple but powerful interface to any of our models. You input some text as a prompt, and the model will generate a text completion that attempts to match whatever context or pattern you gave it.”

All you need to start making requests is an API key (create it here). Let me show you how.
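For context, the openai Python library wraps a REST API. A completion request is an HTTP POST to https://api.openai.com/v1/completions, with your key sent as a Bearer token. Below is a minimal standard-library sketch that builds (but does not send) such a request using the same parameters as the script below. `build_completion_request` is a hypothetical name. Reading the key from an environment variable (OPENAI_API_KEY here) is my choice, not a requirement.

```python
import json
import os
import urllib.request


def build_completion_request(prompt, model="text-davinci-003", max_tokens=2048):
    """Build (but do not send) a POST request to the completions endpoint."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.5,
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Read the key from the environment rather than hard-coding it.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', 'YOUR_API_KEY')}",
        },
        method="POST",
    )


request = build_completion_request("Please extract all the indicators of compromise from the following text.")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; the response body is JSON.
```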

First, let me set up my environment:

python3 -m venv openai_tutorial
source openai_tutorial/bin/activate
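# the script below uses the pre-1.0 interface (openai.Completion), which was removed in openai 1.0
pip3 install "openai<1.0"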

Now I’ll create the script called ioc_extract_prompt.py.

In it, I will use the openai.Completion.create() function with the following parameters:

import openai

# Replace YOUR_API_KEY with your OpenAI API key
openai.api_key = "YOUR_API_KEY"

# Set the model and prompt
model_engine = "text-davinci-003"
prompt = "Please extract all the indicators of compromise from the following text. The first IP contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample."

# Set the maximum number of tokens to generate in the response
max_tokens = 2048

# Generate a response
completion = openai.Completion.create(
    engine=model_engine,
    prompt=prompt,
    max_tokens=max_tokens,
    temperature=0.5,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

# Print the response
print(completion.choices[0].text)

Where I set the following:

  • model_engine: The name of the model you want to use. I will be using the text-davinci-003 model for this tutorial, which is the most capable GPT-3 model (see them all here). Note: during this exercise I realised that the model behind ChatGPT isn’t available via the OpenAI API yet.
  • prompt: The prompt or context for the conversation. This can be a single line of text or a multi-line prompt separated by newlines. Put another way, this is where you put the questions I asked earlier in this post.
  • max_tokens: The maximum number of tokens to generate in the response. Tokens are chunks of text, often whole words but sometimes parts of words or punctuation. The minimum is 1, and the prompt plus the response must fit within the model’s context length (4,097 tokens for text-davinci-003), so 2048 leaves plenty of room for a short prompt like this one.

Tokens are very important here, especially with code output.

For English text, 1 token is approximately 4 characters or 0.75 words.
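Those ratios are enough for a rough budget check before sending a request, because the prompt and the completion share the model’s context window. Below is a rough sketch based on the 4-characters (or 0.75-words) per token rule of thumb. Real counts come from OpenAI’s tokenizer and may differ noticeably for JSON, UUIDs and defanged indicators. `estimate_tokens` and `completion_budget` are hypothetical helpers. The 4,097-token context length in the example is the figure OpenAI listed for text-davinci-003 at the time.

```python
import math


def estimate_tokens(text):
    """Rough token count for English text, using the rules of thumb above."""
    by_chars = len(text) / 4  # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~0.75 words per token
    # Take the larger estimate and round up, to err on the side of caution.
    return math.ceil(max(by_chars, by_words))


def completion_budget(prompt, context_limit):
    """Estimated tokens left for the completion once the prompt is counted."""
    return context_limit - estimate_tokens(prompt)


prompt = "Please extract all the indicators of compromise from the following text."
print(estimate_tokens(prompt))  # 18
print(completion_budget(prompt, context_limit=4097))  # 4079
```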

In some of the prompts above I had to break the requests down into multiple parts; otherwise the response was cut short because it exceeded the token limit.
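Splitting by hand works for one-off chat sessions. For longer reports sent through the API, the same idea can be automated by splitting the input. Below is a rough sketch, assuming paragraphs are separated by blank lines. It uses the 4-characters-per-token rule of thumb rather than OpenAI’s real tokenizer. `chunk_report` is a hypothetical helper. Each chunk it returns could then be sent as its own extraction prompt.

```python
def chunk_report(text, max_tokens_per_chunk):
    """Group paragraphs into chunks whose estimated size stays under a token budget.

    Token counts are estimated as len(text) / 4, a rule of thumb for English text.
    A single paragraph larger than the budget is kept whole in its own chunk.
    """
    max_chars = max_tokens_per_chunk * 4
    chunks, current = [], ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would overflow the budget: start a new chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


report = (
    "First paragraph about 158.247.222[.]6.\n\n"
    "Second paragraph about www[.]delldrivers[.]in.\n\n"
    "Third paragraph."
)
for chunk in chunk_report(report, max_tokens_per_chunk=20):
    print(repr(chunk))
```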

There are a few more parameters in the above request, but I will let you look up what they do in the OpenAI documentation should you want to tweak them yourself.

Now I will run my script;

python3 ioc_extract_prompt.py

Which prints;

Indicators of Compromise: 
1. portal-Share.mfa[.]new 
2. 158.247.222[.]6 
3. www[.]delldrivers[.]in 
4. Turian backdoor sample

Again, the query format will need some tuning, but even using the older model, simple indicator extraction is working as expected.
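Before a response like this can feed into something like file2stix, the numbered list still has to be turned into typed observables. Below is a minimal sketch, assuming the model keeps answering with a numbered list. `parse_indicators` is a hypothetical helper. It refangs `[.]` back to `.` and then uses loose regular expressions (not file2stix’s actual patterns) to sort entries into STIX `ipv4-addr` and `domain-name` types. Free-text entries such as "Turian backdoor sample" are left unclassified.

```python
import re

# Loose patterns for illustration only (e.g. octets above 255 are not rejected).
IPV4 = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")
DOMAIN = re.compile(r"^(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")


def parse_indicators(completion_text):
    """Turn a numbered-list completion into (type, value) pairs."""
    results = []
    for line in completion_text.splitlines():
        match = re.match(r"^\s*\d+\.\s+(.+?)\s*$", line)
        if not match:
            continue  # skip headings such as "Indicators of Compromise:"
        value = match.group(1).replace("[.]", ".")  # refang defanged dots
        if IPV4.match(value):
            results.append(("ipv4-addr", value))
        elif DOMAIN.match(value):
            results.append(("domain-name", value))
        else:
            results.append(("unclassified", match.group(1)))
    return results


output = """Indicators of Compromise:
1. portal-Share.mfa[.]new
2. 158.247.222[.]6
3. www[.]delldrivers[.]in
4. Turian backdoor sample"""

for indicator in parse_indicators(output):
    print(indicator)
```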

In the coming months I’m going to try to fine-tune the queries used to extract intelligence so that OpenAI can be used in file2stix.

In the meantime, the above should be enough to get you started on your own endeavors!

Oh, and one final word of warning: keep track of how many free credits you use up during development.

At the time of writing, usage is charged at $0.0200 per 1,000 tokens (about 750 words). You get $18 of free credits (good for about 900,000 tokens, or roughly 675,000 words), but don’t be complacent: this quota can be used up very quickly (as I have found!).
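The arithmetic behind those numbers is simple enough to build into a development script. Below is a small sketch. It assumes the $0.0200 per 1,000 tokens price quoted above (prices change, so check OpenAI’s pricing page) and the 0.75 words-per-token rule of thumb. The function names are hypothetical. Each completions response also reports the tokens it consumed (`completion.usage.total_tokens` in the script above), which you could pass to `cost_of` to keep a running total.

```python
from decimal import Decimal

PRICE_PER_1K_TOKENS = Decimal("0.0200")  # USD, as quoted at the time of writing
WORDS_PER_TOKEN = Decimal("0.75")  # rule of thumb for English text


def cost_of(tokens):
    """Dollar cost of a number of tokens."""
    return Decimal(tokens) / 1000 * PRICE_PER_1K_TOKENS


def tokens_for_budget(dollars):
    """How many tokens a dollar budget buys."""
    return int(Decimal(dollars) / PRICE_PER_1K_TOKENS * 1000)


free_tokens = tokens_for_budget(18)
print(free_tokens)  # 900000
print(int(free_tokens * WORDS_PER_TOKEN))  # 675000
print(cost_of(1000))  # 0.0200
```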



