Posted by: David Greenwood, Chief of Signal

If you are reading this blog post via a third-party source, it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.

In this post I will show you some of my early attempts to productise what GPT-3 can do to aid the work of intelligence analysts.

I’d wager that you, too, have played with chat.openai.com over the last few weeks.

Before I continue, I want to take full credit for this post: it was not generated by OpenAI (although, believe me, I tried).

Beyond writing blog posts, though, GPT-3 clearly has huge potential for the operational side of threat intelligence. For a long time I’ve had it in the back of my mind as a potentially very useful addition to the file2stix logic.

Currently, file2stix uses regular expressions to extract observables from reports. This is acceptable for now, but it misses a lot of contextual information that could also be captured as observables.
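To make that limitation concrete, here is a minimal sketch of regex-based extraction in Python. The patterns are my own simplified assumptions for illustration, not the ones file2stix actually uses; they accept both plain and defanged ([.]) dots.

```python
import re

# Illustrative patterns only (my assumptions, not the ones file2stix uses).
# A dot may appear either as "." or as the defanged "[.]".
DOT = r"(?:\[\.\]|\.)"
IPV4_RE = re.compile(rf"\b(?:\d{{1,3}}{DOT}){{3}}\d{{1,3}}\b")
DOMAIN_RE = re.compile(rf"\b(?:[A-Za-z0-9-]+{DOT})+[A-Za-z]{{2,}}\b")


def extract_iocs(text: str) -> dict:
    """Return IPv4 addresses and domain names found in text, defanged or not."""
    return {
        "ipv4-addr": IPV4_RE.findall(text),
        "domain-name": DOMAIN_RE.findall(text),
    }


SAMPLE_TEXT = (
    "The first IP (151.248.24[.]251) contains a historical certificate reference "
    "to portal-Share.mfa[.]new, which suggests an ambiguous \u201cMinistry of Foreign "
    "Affairs (MFA)\u201d nexus. The second is a virtual private server (VPS) owned by "
    "The Constant Company. This second IP (158.247.222[.]6) hosted the domain "
    "www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is "
    "associated with a Turian backdoor sample."
)

print(extract_iocs(SAMPLE_TEXT))
```

It finds the two IPs and the two domains, but nothing in its output records that 158.247.222[.]6 hosted www[.]delldrivers[.]in, that The Constant Company owns the VPS, or that the domain is linked to Turian: that is exactly the context that gets lost.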

I’ve previously discussed this problem in my post about MITRE TRAM, so I was interested to see how GPT-3 compares.

Here is how my initial research went…

Observable Extraction

I wanted to start simply: how would GPT-3 perform when asked to extract indicators of compromise (IoCs) from text? (In this post I use the terms IoCs and observables interchangeably.)

Input:

Please extract all the indicators of compromise from the following text: The first IP (151.248.24[.]251) contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample.

Source.

Output:

Here are the indicators of compromise (IOCs) extracted from the given text:

  1. IP address: 151.248.24[.]251
    • Contains a historical certificate reference to portal-Share.mfa[.]new
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • Indicator of compromise type: IP address
  2. IP address: 158.247.222[.]6
    • Virtual Private Server (VPS) owned by The Constant Company
    • Hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022
    • Associated with a Turian backdoor sample
    • Indicator of compromise type: IP address
  3. Domain: portal-Share.mfa[.]new
    • Contains a historical certificate reference
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • Indicator of compromise type: Domain
  4. Domain: www[.]delldrivers[.]in
    • Hosted on the IP address 158.247.222[.]6
    • Associated with a Turian backdoor sample
    • Indicator of compromise type: Domain

ChatGPT Response 1

This is very good. Not only has it extracted the IoCs, it has also very accurately described the relationships between them.

It can be slightly improved by classifying Turian as a malware IoC.

Let’s teach GPT-3 for future runs:

Input:

Thank you! Just one small improvement please; Turian is an indicator of compromise of type malware family.

Output:

Sure! The indicators of compromise (IOCs) extracted from the given text are:

  1. IP address: 151.248.24[.]251
    • Contains a historical certificate reference to portal-Share.mfa[.]new
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • Indicator of compromise type: IP address
  2. IP address: 158.247.222[.]6
    • Virtual Private Server (VPS) owned by The Constant Company
    • Hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022
    • Associated with a Turian backdoor sample
    • Indicator of compromise type: IP address
  3. Domain: portal-Share.mfa[.]new
    • Contains a historical certificate reference
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • Indicator of compromise type: Domain
  4. Domain: www[.]delldrivers[.]in
    • Hosted on the IP address 158.247.222[.]6
    • Associated with a Turian backdoor sample
    • Indicator of compromise type: Domain
  5. Malware: Turian
    • Associated with the IP address 158.247.222[.]6
    • Indicator of compromise type: Malware

ChatGPT Response 2

Now let’s see how GPT-3 handles defanged data (the [ ] brackets), with the aim of removing the brackets to refang the IoCs.

Input:

Now please remove the fangs ([ ]) from these indicators of compromise.

Output:

Sure, here are the IOCs with the brackets removed:

  1. IP address: 151.248.24.251
    • Contains a historical certificate reference to portal-Share.mfa.new
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • Indicator of compromise type: IP address
  2. IP address: 158.247.222.6
    • Virtual Private Server (VPS) owned by The Constant Company
    • Hosted the domain www.delldrivers.in from July 7, 2022 to Oct. 11, 2022
    • Associated with a Turian backdoor sample
    • Indicator of compromise type: IP address
  3. Domain: portal-Share.mfa.new
    • Contains a historical certificate reference
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • Indicator of compromise type: Domain
  4. Domain: www.delldrivers.in
    • Hosted on the IP address 158.247.222.6
    • Associated with a Turian backdoor sample
    • Indicator of compromise type: Domain
  5. Malware: Turian
    • Associated with the IP address 158.247.222.6
    • Indicator of compromise type: Malware

ChatGPT Response 3

Perfect!
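For comparison, refanging is one step that doesn’t need a language model at all. A minimal sketch, assuming only a few common defanging styles ([.], [:] and hxxp/hxxps); others, such as (.) or [dot], are not handled:

```python
import re

# Assumed defanging conventions: "[.]" -> ".", "[:]" -> ":", "hxxp(s)" -> "http(s)".
# Other styles, e.g. "(.)" or "[dot]", are deliberately not handled here.
_REFANG_RULES = [
    (re.compile(r"\[\.\]"), "."),
    (re.compile(r"\[:\]"), ":"),
    (re.compile(r"\bhxxp(s?)\b", re.IGNORECASE), r"http\1"),
]


def refang(ioc: str) -> str:
    """Return the indicator with the supported defanging markers removed."""
    for pattern, replacement in _REFANG_RULES:
        ioc = pattern.sub(replacement, ioc)
    return ioc


for defanged in ["151.248.24[.]251", "portal-Share.mfa[.]new", "www[.]delldrivers[.]in"]:
    print(refang(defanged))
```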

Now I’ll try to classify these IoCs more precisely as STIX 2.1 Objects, for easier sharing.

Input:

Now, please classify these indicators of compromise into the correct OASIS STIX 2.1 Domain Objects or STIX 2.1 Cyber-observable Objects.

Output:

Sure, here are the IOCs classified into the correct STIX 2.1 Domain Objects or Cyber-observable Objects:

  1. IP address: 151.248.24.251
    • Contains a historical certificate reference to portal-Share.mfa.new
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • STIX 2.1 Domain Object: ipv4-addr
  2. IP address: 158.247.222.6
    • Virtual Private Server (VPS) owned by The Constant Company
    • Hosted the domain www.delldrivers.in from July 7, 2022 to Oct. 11, 2022
    • Associated with a Turian backdoor sample
    • STIX 2.1 Domain Object: ipv4-addr
  3. Domain: portal-Share.mfa.new
    • Contains a historical certificate reference
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • STIX 2.1 Domain Object: domain-name
  4. Domain: www.delldrivers.in
    • Hosted on the IP address 158.247.222.6
    • Associated with a Turian backdoor sample
    • STIX 2.1 Domain Object: domain-name
  5. Malware: Turian
    • Associated with the IP address 158.247.222.6
    • STIX 2.1 Cyber-observable Object: malware

ChatGPT Response 4

Review:

GPT-3 is clearly aware of STIX 2.1; however, it is making a big mistake here, confusing SCOs (Cyber-observable Objects) with SDOs (Domain Objects) and vice versa.
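For reference, the STIX 2.1 specification fixes which category every built-in object type belongs to, so this part is a simple lookup. The sets below are transcribed from the specification’s lists of SDOs, SROs and SCOs; meta objects such as marking-definition and extension-definition are left out, and any custom types would need adding.

```python
# Object categories as listed in the STIX 2.1 specification. Meta objects
# (e.g. marking-definition, extension-definition) are deliberately left out.
SDO_TYPES = frozenset({
    "attack-pattern", "campaign", "course-of-action", "grouping", "identity",
    "incident", "indicator", "infrastructure", "intrusion-set", "location",
    "malware", "malware-analysis", "note", "observed-data", "opinion",
    "report", "threat-actor", "tool", "vulnerability",
})
SRO_TYPES = frozenset({"relationship", "sighting"})
SCO_TYPES = frozenset({
    "artifact", "autonomous-system", "directory", "domain-name", "email-addr",
    "email-message", "file", "ipv4-addr", "ipv6-addr", "mac-addr", "mutex",
    "network-traffic", "process", "software", "url", "user-account",
    "windows-registry-key", "x509-certificate",
})


def stix_category(object_type: str) -> str:
    """Return "SDO", "SRO" or "SCO" for a built-in STIX 2.1 object type."""
    if object_type in SDO_TYPES:
        return "SDO"
    if object_type in SRO_TYPES:
        return "SRO"
    if object_type in SCO_TYPES:
        return "SCO"
    raise ValueError(f"not a built-in STIX 2.1 SDO/SRO/SCO type: {object_type!r}")


for t in ("ipv4-addr", "domain-name", "malware"):
    print(t, stix_category(t))
```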

Let’s fix that:

Input:

The STIX 2.1 object type classifications are incorrect. Here are the correct classifications

  • ipv4-addr is a STIX 2.1 Cyber-observable Object (SCO)
  • domain-name is a STIX 2.1 Cyber-observable Object (SCO)
  • malware is a STIX 2.1 Domain Object (SDO)

Please can you correct the previous response?

Output:

I apologize for the mistake. Thank you for letting me know. Here are the corrected STIX 2.1 object type classifications:

  1. IP address: 151.248.24.251
    • Contains a historical certificate reference to portal-Share.mfa.new
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • STIX 2.1 Cyber-observable Object: ipv4-addr
  2. IP address: 158.247.222.6
    • Virtual Private Server (VPS) owned by The Constant Company
    • Hosted the domain www.delldrivers.in from July 7, 2022 to Oct. 11, 2022
    • Associated with a Turian backdoor sample
    • STIX 2.1 Cyber-observable Object: ipv4-addr
  3. Domain: portal-Share.mfa.new
    • Contains a historical certificate reference
    • Suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus
    • STIX 2.1 Cyber-observable Object: domain-name
  4. Domain: www.delldrivers.in
    • Hosted on the IP address 158.247.222.6
    • Associated with a Turian backdoor sample
    • STIX 2.1 Cyber-observable Object: domain-name
  5. Malware: Turian
    • Associated with the IP address 158.247.222.6
    • STIX 2.1 Domain Object: malware

ChatGPT Response 5

Review:

Now I’m intrigued: how well does GPT-3 understand the STIX 2.1 schema?

Classifying data as STIX 2.1 Objects

Input:

Perfect! Now can you please model the first IP address in the list (151.248.24.251) as a STIX 2.1 SCO?

Output:

Certainly! Here’s an example of how to model the first IP address (151.248.24.251) as a STIX 2.1 Cyber-observable Object (SCO) using the Python STIX2 library:

import stix2

ipv4 = stix2.IPv4Address(
    value="151.248.24.251",
    x509_fingerprint="9b33c371cf35a7a8581b936d7467edaff69b835f",
    # The SHA-1 fingerprint of a certificate associated with the IP address
    custom_properties={
        "description": "IP address containing a historical certificate reference to portal-Share.mfa.new",
        "source": "Example data source",
        "ambiguous_nexus": "Ministry of Foreign Affairs (MFA)",
    }
)
print(ipv4.serialize())

This will produce the following STIX 2.1 SCO for the IP address:

{
    "type": "ipv4-addr",
    "value": "151.248.24.251",
    "x509_fingerprint": "9b33c371cf35a7a8581b936d7467edaff69b835f",
    "description": "IP address containing a historical certificate reference to portal-Share.mfa.new",
    "source": "Example data source",
    "ambiguous_nexus": "Ministry of Foreign Affairs (MFA)"
}

Note that I added some custom properties to the object to include the description, source, and ambiguous nexus information mentioned in the original text. You can adjust these properties as needed to match the context of the data.

ChatGPT Response 6

Review:

Wow! GPT-3 delivers a code example using the STIX2 Python library (here’s my introduction to the STIX2 library).

However, the first attempt is far from perfect.

Firstly…

If you look at the STIX 2.1 specification, this object is missing the required id property.

STIX 2.1 IPv4 SCO specification

If you read my previous post, you will know this id should be a UUIDv5 generated from the value property.
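As a refresher, STIX 2.1 derives SCO ids as a UUIDv5 in a namespace defined by the specification (00abedb4-aa42-466c-9c01-fed23315a9b7), computed over the canonicalised JSON of the object’s ID contributing properties; for ipv4-addr that is just value. Here is a minimal standard-library sketch. Using json.dumps with sorted keys and compact separators is my approximation of the JSON Canonicalization Scheme (RFC 8785): adequate for simple string values like these, but not a full implementation. As far as I know, recent versions of the stix2 library also generate these ids automatically when you omit id, but the sketch does not rely on that.

```python
import json
import uuid

# Namespace defined by the STIX 2.1 specification for deterministic SCO ids.
STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def deterministic_sco_id(sco_type: str, contributing_props: dict) -> str:
    """Build '<type>--<UUIDv5>' from an SCO's ID contributing properties.

    Sorted keys plus compact separators only approximate RFC 8785 (JCS)
    canonicalisation; that is fine for simple string-valued properties.
    """
    canonical = json.dumps(contributing_props, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return f"{sco_type}--{uuid.uuid5(STIX_SCO_NAMESPACE, canonical)}"


print(deterministic_sco_id("ipv4-addr", {"value": "151.248.24.251"}))
```

Because the id depends only on the type and value, the same IP always gets the same id, which is what lets separate tools deduplicate SCOs.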

Secondly…

You will notice that, of the properties shown, only type and value are defined for ipv4-addr SCOs in the specification. GPT-3 does acknowledge that it uses custom properties; however, as detailed in this post, it is doing so incorrectly (it should use a custom extension, with extension_types set to property-extension).
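To show roughly what that looks like, here is a sketch of a property extension in STIX 2.1, written as plain Python dicts. The extension-definition id, identity id, schema URL, timestamps and extension name are all hypothetical placeholders I made up; only the overall shape (an extension-definition with extension_types, and an extensions entry on the SCO with extension_type set to property-extension) comes from the specification.

```python
import json
import uuid

# Namespace defined by the STIX 2.1 specification for deterministic SCO ids.
STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

# Hypothetical placeholder ids, for illustration only.
EXTENSION_ID = "extension-definition--2a6a1e8c-6b47-4d3b-9d0e-3f1f0c2d4e5a"
AUTHOR_IDENTITY_ID = "identity--00000000-0000-4000-8000-000000000000"

# The extension definition declares the extension (its schema describes the new properties).
extension_definition = {
    "type": "extension-definition",
    "spec_version": "2.1",
    "id": EXTENSION_ID,
    "created_by_ref": AUTHOR_IDENTITY_ID,
    "created": "2023-01-18T03:00:00.000Z",
    "modified": "2023-01-18T03:00:00.000Z",
    "name": "Example IP context extension",  # hypothetical
    "schema": "https://example.com/schemas/ip-context.json",  # hypothetical
    "version": "1.0.0",
    "extension_types": ["property-extension"],
}

# The SCO keeps the extra properties inside "extensions", keyed by that id,
# rather than as unknown top-level properties.
ipv4_value = "151.248.24.251"
ipv4_name = json.dumps({"value": ipv4_value}, separators=(",", ":"))
ipv4 = {
    "type": "ipv4-addr",
    "spec_version": "2.1",
    "id": f"ipv4-addr--{uuid.uuid5(STIX_SCO_NAMESPACE, ipv4_name)}",
    "value": ipv4_value,
    "extensions": {
        EXTENSION_ID: {
            "extension_type": "property-extension",
            "source": "Example data source",
            "ambiguous_nexus": "Ministry of Foreign Affairs (MFA)",
        }
    },
}

print(json.dumps([extension_definition, ipv4], indent=4))
```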

That said, in this case (and generally in STIX modelling), it makes more sense to capture this contextual data inside an SDO than an SCO.

Time to inform GPT-3 of these changes…

Input:

Thanks! However, two improvements are needed as follows please:

Firstly, this object is missing an id property that should be in UUIDv5 format. In the case of ipv4-addr type SCOs this UUIDv5 ID should be generated using the value property of the object.

Secondly, please remove all the custom properties (x509_fingerprint, description, source, ambiguous_nexus) from the ipv4-addr SCO (these will be used later).

Output:

Thank you for the feedback. Here’s an updated example of how to model the first IP address (151.248.24.251) as a STIX 2.1 Cyber-observable Object (SCO) using the Python STIX2 library with the requested improvements:

import uuid
import stix2

ipv4 = stix2.IPv4Address(
    id=uuid.uuid5(uuid.NAMESPACE_DNS, "151.248.24.251"),
    value="151.248.24.251",
)
print(ipv4.serialize())

This will produce the following STIX 2.1 SCO for the IP address:

{
    "type": "ipv4-addr",
    "id": "5066d2f2-1ea2-5e72-b16f-ea3d3ef4413e",
    "value": "151.248.24.251"
}

Note that I removed the custom properties from the object as requested. These properties can be added later as separate properties or as part of a related STIX 2.1 Sighting or Report object.

ChatGPT Response 7

Review:

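Much better. The namespace used to generate the UUIDv5 is not the one the STIX 2.1 specification defines for SCO ids (00abedb4-aa42-466c-9c01-fed23315a9b7), and the id is missing its ipv4-addr-- prefix; both could be fixed with a further prompt, but for brevity I will ignore them for now.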

As I suggested, the custom properties can be used for context later, and GPT-3 agrees:

These properties can be added later as separate properties or as part of a related STIX 2.1 Sighting or Report object.

Just what I was thinking!

Now that GPT-3 is a little more knowledgeable about the STIX 2.1 specification, let’s try modelling all the STIX 2.1 objects identified in the original text.

Input:

Great! Now can you turn all the identified STIX 2.1 SDOs and SCOs identified in the original text and print them all inside a valid STIX 2.1 bundle please?

Output:

Certainly. Here’s an updated example of how to create a STIX 2.1 Bundle that includes all the identified STIX 2.1 SDOs and SCOs using the Python STIX2 library with the corrected UUIDv5 IDs:

import stix2

# Create the STIX 2.1 objects
ipv4_1 = stix2.IPv4Address(
    id=str(stix2.utils._hash("151.248.24.251", stix2.utils._HASH_SHA256)),
    value="151.248.24.251",
)
ipv4_2 = stix2.IPv4Address(
    id=str(stix2.utils._hash("158.247.222.6", stix2.utils._HASH_SHA256)),
    value="158.247.222.6",
)
domain_1 = stix2.DomainName(
    id=str(stix2.utils._hash("portal-Share.mfa.new", stix2.utils._HASH_SHA256)),
    value="portal-Share.mfa.new",
)
domain_2 = stix2.DomainName(
    id=str(stix2.utils._hash("www.delldrivers.in", stix2.utils._HASH_SHA256)),
    value="www.delldrivers.in",
)
malware = stix2.Malware(
    id=str(stix2.utils._hash("Turian", stix2.utils._HASH_SHA256)),
    name="Turian",
    labels=["trojan"],
)

# Create a STIX 2.1 Bundle that includes the objects
bundle = stix2.Bundle(objects=[ipv4_1, ipv4_2, domain_1, domain_2, malware])

# Serialize the bundle to JSON and print it
print(bundle.serialize())

This will produce a valid STIX 2.1 Bundle that includes all the identified STIX 2.1 SDOs and SCOs with UUIDv5 IDs for all SCOs:

{
    "type": "bundle",
    "id": "bundle--ab6bb13e-39f1-5f78-8903-e3f38de5299f",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--ae6d1c96-8e77-50a6-94a3-3f9f9b05495f",
            "value": "151.248.24.251"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--5dd5df13-15c5-5ec5-a67c-50c1cbf7a0fa",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--4cb4a3c3-2c8c-565e-92dd-54eaf2672e12",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "domain-name",
            "id": "domain-name--c6e0dcd7-1d38-5436-b8d6-540d2bb53f6c",
            "value": "www.delldrivers.in"
        },
        {
            "type": "malware",
            "id": "malware--fe6f350f-1d84-56c9-ae33-b3a3dc6bde38",
            "name

ChatGPT Response 8

Review:

Ignoring the fact that the response is cut short (more on that in a minute), the response is slightly incorrect because GPT-3 is now using a SHA-256 hash function, rather than UUIDv5, to generate SCO ids:

ipv4_1 = stix2.IPv4Address(
    id=str(stix2.utils._hash("151.248.24.251", stix2.utils._HASH_SHA256)),
    value="151.248.24.251",
)

It’s an easy fix…

Input:

Thanks! However, can you make one small fix please; use uuidv5 ID’s using the STIX2 library for all SCOs please using each SCOs value property?

Output:

My apologies for the confusion. Here’s the updated example of how to create a STIX 2.1 Bundle with corrected UUIDs:

import uuid
import stix2

# Create the STIX 2.1 objects
ipv4_1 = stix2.IPv4Address(
    id=str(uuid.uuid5(uuid.NAMESPACE_DNS, "151.248.24.251")),
    value="151.248.24.251",
)
ipv4_2 = stix2.IPv4Address(
    id=str(uuid.uuid5(uuid.NAMESPACE_DNS, "158.247.222.6")),
    value="158.247.222.6",
)
domain_1 = stix2.DomainName(
    id=str(uuid.uuid5(uuid.NAMESPACE_DNS, "portal-Share.mfa.new")),
    value="portal-Share.mfa.new",
)
domain_2 = stix2.DomainName(
    id=str(uuid.uuid5(uuid.NAMESPACE_DNS, "www.delldrivers.in")),
    value="www.delldrivers.in",
)
malware = stix2.Malware(
    id=str(uuid.uuid4()),
    name="Turian",
    labels=["trojan"],
)

# Create a STIX 2.1 Bundle that includes the objects
bundle = stix2.Bundle(objects=[ipv4_1, ipv4_2, domain_1, domain_2, malware])

# Serialize the bundle to JSON and print it
print(bundle.serialize())

This will produce a valid STIX 2.1 Bundle that includes all the identified STIX 2.1 SDOs and SCOs with the corrected UUIDs:

{
    "type": "bundle",
    "id": "bundle--f365a102-e774-5d5e-939c-24ab4259e4b4",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--c6a57506-355d-5c06-8357-32a8b0ac9d67",
            "value": "151.248.24.251"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--e2a34b95-3643-5efb-9d36-86739dd1f9c1",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--0dcd1bb9-d3a3-53cc-a23e-13975f73b7ee",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "domain-name",
            "id": "domain-name--10464b8f-0c9c-54a9-8190-49fa1391b734",
            "value": "www.delldrivers.in"
        },
        {
            "type": "malware",
            "id": "malware--6d1c6e57-3669-4d4c-9443-b8fc4a26fcf1",
            "name": "Turian",
            "labels": [
                "trojan"
            ]
        }
    ]
}

ChatGPT Response 9

Review:

Perfect!

Here’s what this bundle looks like as a graph:

What’s immediately clear when I graph the bundle is that there are no relationships describing the links between these objects.

As shown in the first response, GPT-3 understands the relationships well, so let’s see how it copes with generating STIX 2.1 Relationship objects.
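Before looking at GPT-3’s attempt, here is a rough sketch of what a STIX 2.1 Relationship object needs, using plain dicts so it runs without the stix2 library. The required properties (type, spec_version, id, created, modified, relationship_type, source_ref, target_ref) come from the specification; the generic related-to relationship type and the random malware id are my own placeholder choices, and the SCO ids follow the UUIDv5 scheme with an approximate canonicalisation.

```python
import json
import uuid
from datetime import datetime, timezone

# Namespace defined by the STIX 2.1 specification for deterministic SCO ids.
STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def sco_id(sco_type: str, value: str) -> str:
    """Deterministic id for SCOs whose only ID contributing property is value."""
    name = json.dumps({"value": value}, separators=(",", ":"), ensure_ascii=False)
    return f"{sco_type}--{uuid.uuid5(STIX_SCO_NAMESPACE, name)}"


def stix_timestamp() -> str:
    """Current UTC time in STIX timestamp form, e.g. 2023-01-18T03:00:00.000Z."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def make_relationship(source_ref: str, relationship_type: str, target_ref: str) -> dict:
    """Minimal STIX 2.1 Relationship object (SRO) with its required properties."""
    now = stix_timestamp()
    return {
        "type": "relationship",
        "spec_version": "2.1",
        "id": f"relationship--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "relationship_type": relationship_type,
        "source_ref": source_ref,
        "target_ref": target_ref,
    }


ip_1 = sco_id("ipv4-addr", "151.248.24.251")
ip_2 = sco_id("ipv4-addr", "158.247.222.6")
domain_1 = sco_id("domain-name", "portal-Share.mfa.new")
domain_2 = sco_id("domain-name", "www.delldrivers.in")
malware_id = f"malware--{uuid.uuid4()}"  # SDO ids are typically random UUIDv4s

# "related-to" is a placeholder; a more specific type would usually be better.
relationships = [
    make_relationship(ip_1, "related-to", domain_1),
    make_relationship(ip_2, "related-to", domain_2),
    make_relationship(domain_2, "related-to", malware_id),
]
print(json.dumps(relationships, indent=4))
```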

Input:

Using the original text input supplied at the start of this chat and the generate STIX 2.1 SDOs and SCOs, can you link these STIX 2.1 Objects using STIX 2.1 Relationship Objects please?

Output:

Certainly! Here’s an updated example that includes STIX 2.1 Relationship Objects to link the STIX 2.1 Objects created earlier:

import uuid
import stix2

# Create the STIX 2.1 objects
ipv4_1 = stix2.IPv4Address(
    id=str(uuid.uuid5(uuid.NAMESPACE_DNS, "151.248.24.251")),
    value="151.248.24.251",
)
ipv4_2 = stix2.IPv4Address(
    id=str(uuid.uuid5(uuid.NAMESPACE_DNS, "158.247.222.6")),
    value="158.247.222.6",
)
domain_1 = stix2.DomainName(
    id=str(uuid.uuid5(uuid.NAMESPACE_DNS, "portal-Share.mfa.new")),
    value="portal-Share.mfa.new",
)
domain_2 = stix2.DomainName(
    id=str(uuid.uuid5(uuid.NAMESPACE_DNS, "www.delldrivers.in")),
    value="www.delldrivers.in",
)
malware = stix2.Malware(
    id=str(uuid.uuid4()),
    name="Turian",
    labels=["trojan"],
)

# Create STIX 2.1 Relationship Objects to link the objects
relationship_1 = stix2.Relationship(
    id=str(uuid.uuid4()),
    source_ref=ipv4_1.id,
    relationship_type="associated-with",
    target_ref=domain_1.id,
)
relationship_2 = stix2.Relationship(
    id=str(uuid.uuid4()),
    source_ref=ipv4_2.id,
    relationship_type="associated-with",
    target_ref=domain_2.id,
)
relationship_3 = stix2.Relationship(
    id=str(uuid.uuid4()),
    source_ref=domain_2.id,
    relationship_type="indicates",
    target_ref=malware.id,
)

# Create a STIX 2.1 Bundle that includes the objects and relationships
bundle = stix2.Bundle(
    objects=[ipv4_1, ipv4_2, domain_1, domain_2, malware, relationship_1, relationship_2, relationship_3],
)

# Serialize the bundle to JSON and print it
print(bundle.serialize())

This will produce a valid STIX 2.1 Bundle that includes the STIX 2.1 SDOs and SCOs created earlier, as well as the STIX 2.1 Relationship Objects to link them:

{
    "type": "bundle",
    "id": "bundle--c0b2147f-52cf-56f7-8b67-9b7e912c07bb",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--a51d0cc2-0f09-5d70-b4fc-9ee3ec1d9774",
            "value": "151.248.24.251"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--350c5b62-ef92-54db-b2a5-5a5129a9b418",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--d163cf7b-95ca-5583-88a1-02eaa7c12fb6",
            "value": "portal-Share.mfa.new"
        },
        {
            "

ChatGPT Response 10

Review:

Ah, the response is being cut off again!

This is because ChatGPT responses have an output limit of 512 tokens (by the rule of thumb below, only around 2,000 characters of English text), so if the output crosses this limit, the response will be incomplete.

Tokens matter a lot here, especially with code output: JSON full of UUIDs and punctuation tends to use up tokens much faster than plain English prose.

As a rough rule of thumb, 1 token is approximately 4 characters or 0.75 words for English text.

Source: OpenAI docs
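Applying that rule of thumb is simple arithmetic. A minimal sketch (it encodes only the heuristic quoted above, not a real tokenizer, so treat the numbers as ballpark figures, especially for code):

```python
# OpenAI's rule of thumb for English text: 1 token ~ 4 characters ~ 0.75 words.
# This is a heuristic only; real token counts depend on the tokenizer and the text.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def tokens_from_chars(num_chars: int) -> float:
    """Estimated number of tokens for a given number of characters."""
    return num_chars / CHARS_PER_TOKEN


def chars_from_tokens(num_tokens: int) -> int:
    """Estimated number of characters that fit in a given number of tokens."""
    return num_tokens * CHARS_PER_TOKEN


def words_from_tokens(num_tokens: int) -> float:
    """Estimated number of English words that fit in a given number of tokens."""
    return num_tokens * WORDS_PER_TOKEN


print(chars_from_tokens(512))   # 2048
print(words_from_tokens(512))   # 384.0
print(tokens_from_chars(4096))  # 1024.0
```

By this estimate, a 512-token output is only around 2,000 characters of English, and often less when the output is JSON or code.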

Similarly, there is a context limit for each prompt of 4096 tokens.

There are two ways I get around the output problem I’m facing here:

  1. Just ask for the part of the response you need (in this case, the STIX 2.1 Bundle). This might not always work: if the part of the response you want is itself longer than the output limit, in which case…
  2. Ask ChatGPT to continue in the next prompt, using short prompts such as “go on”, “keep going”, or “continue”.

Here’s an example of the first approach:

Input:

Please print only the STIX 2.1 bundle from the last message.

Output:

Sure! Here’s the STIX bundle without any other output:

{
    "type": "bundle",
    "id": "bundle--c0b2147f-52cf-56f7-8b67-9b7e912c07bb",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--a51d0cc2-0f09-5d70-b4fc-9ee3ec1d9774",
            "value": "151.248.24.251"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--350c5b62-ef92-54db-b2a5-5a5129a9b418",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--d163cf7b-95ca-5583-88a1-02eaa7c12fb6",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "domain-name",
            "id": "domain-name--fa5b5f5b-20ea-5e68-8e15-90f841b209d2",
            "value": "www.delldrivers.in"
        },
        {
            "type": "malware",
            "id": "malware--12d319e8-f728-41b9-9e81-cd5eb91b5750",
            "name": "Turian",
            "labels": [
                "trojan"
            ]
        },
        {
            "type": "relationship",
            "id": "relationship--f3df2798-8822-5651-b65e-d6c3fda50c28",
            "source_ref": "ipv4-addr--a51d0cc2-0f09-5d70-b4fc-9ee3ec1d9774",
            "target_ref": "domain-name--d163cf7b-95ca-5583-88a1-02eaa7c12fb6",
            "relationship_type": "associated-with"
        },
        {
            "type": "relationship",
            "id": "relationship--a3f4011e-55d3-5a8e-a6c9-431244c2db64",
            "source_ref": "ipv4-addr--350c5b62-ef92-54db-b2a5-5a5129a9b418",
            "target_ref": "domain-name--fa5b5f5b-20ea-5e68-8e15-90f841b209d2",
            "relationship_type": "associated-with"
        },
        {
            "type": "relationship",
            "id": "relationship--ce523f6b-ef6d-5d01-a8e4-4cc4b4cc4f29",
            "source_ref": "domain-name--fa5b5f5b-20ea-5e68-8e15-90f841b209d2",
            "target_ref": "malware--12d319e8-f728-41b9-9e81-cd5eb91b5750",
            "relationship_type": "indicates"
        }
    ]
}

ChatGPT Response 11

Review:

Now that we have the full bundle, let’s look at it graphically:

Reviewing the original text again:

The first IP (151.248.24[.]251) contains a historical certificate reference to portal-Share.mfa[.]new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222[.]6) hosted the domain www[.]delldrivers[.]in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample.

I can see the graph is fairly accurate.

If I were writing published intelligence, I would update the relationship types to better reflect the relationships (e.g. change associated-with between the IP and domain to hosts, or something similar); however, for now I’ll leave them as they are.
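A change like that is also easy to make after the fact, outside GPT-3. A minimal sketch, assuming the bundle has already been parsed into a Python dict; the function name is my own, and the replacement type is a parameter because picking the best one is an analyst’s call. It works out each referenced object’s type from the prefix of its STIX id.

```python
def retype_relationships(bundle: dict, source_type: str, target_type: str,
                         old_type: str, new_type: str) -> int:
    """Rename matching relationship types in place; return how many changed."""
    changed = 0
    for obj in bundle.get("objects", []):
        if obj.get("type") != "relationship" or obj.get("relationship_type") != old_type:
            continue
        # STIX ids look like "<object-type>--<uuid>", so the prefix is the type.
        src_type = obj["source_ref"].split("--", 1)[0]
        tgt_type = obj["target_ref"].split("--", 1)[0]
        if (src_type, tgt_type) == (source_type, target_type):
            # Strictly, editing a STIX object should also bump its modified timestamp.
            obj["relationship_type"] = new_type
            changed += 1
    return changed


# Two relationships (ids omitted) trimmed from the bundle in ChatGPT Response 11.
bundle = {
    "type": "bundle",
    "objects": [
        {
            "type": "relationship",
            "source_ref": "ipv4-addr--a51d0cc2-0f09-5d70-b4fc-9ee3ec1d9774",
            "target_ref": "domain-name--d163cf7b-95ca-5583-88a1-02eaa7c12fb6",
            "relationship_type": "associated-with",
        },
        {
            "type": "relationship",
            "source_ref": "domain-name--fa5b5f5b-20ea-5e68-8e15-90f841b209d2",
            "target_ref": "malware--12d319e8-f728-41b9-9e81-cd5eb91b5750",
            "relationship_type": "indicates",
        },
    ],
}
print(retype_relationships(bundle, "ipv4-addr", "domain-name", "associated-with", "hosts"))  # prints 1
```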

There are two other big improvements that I can immediately see would be valuable, and I’ll ask GPT-3 to make them…

Input:

That’s great! Two improvements please;

  1. Create a report object (to represent the text) with:
    • the name property “Chinese Playful Taurus Activity in Iran”
    • the description property should contain the entire original text considered for IOC extraction
    • the published property should be January 18, 2023 at 3:00 AM in RFC 3339 format
    • the object_refs property should list all the generated STIX 2.1 Objects in the bundle, including any new ones generated in this response
  2. Add an Identity SDO so for The Constant Company with a relationship from the IP 158.247.222.6

Please regenerate the STIX 2.1 bundle with these additions and only print the STIX 2.1 bundle in your response.

Output:

Certainly! Here is the STIX 2.1 Bundle containing all the generated STIX 2.1 objects and relationships, as well as the report object and the identity object for The Constant Company:

{
    "type": "bundle",
    "id": "bundle--2e7d5a80-82f7-4a6c-9907-4ef4e05fc357",
    "spec_version": "2.1",
    "objects": [
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--ea132e96-1d3e-5a9a-80df-6650cde12f24",
            "value": "151.248.24.251"
        },
        {
            "type": "ipv4-addr",
            "id": "ipv4-addr--47436f88-c2b4-5a68-9c4b-4c27d7b416e8",
            "value": "158.247.222.6"
        },
        {
            "type": "domain-name",
            "id": "domain-name--35411d71-3b05-5b90-b117-910ee064080c",
            "value": "portal-Share.mfa.new"
        },
        {
            "type": "domain-name",
            "id": "domain-name--3c43d417-e33c-573d-b69a-1fa17b7f5548",
            "value": "www.delldrivers.in"
        },
        {
            "type": "malware",
            "id": "malware--b04a9c47-b3a3-4613-8f74-03c77cf1871a",
            "name": "Turian",
            "labels": [
                "trojan"
            ]
        },
        {
            "type": "identity",
            "id": "identity--f39f2d43-1925-48df-8f6c-5f5a48c74dd8",
            "name": "The Constant Company",
            "identity_class": "organization"
        },
        {
            "type": "relationship",
            "id": "relationship--32c0bde6-817d-5f62-a919-0c3f3d61c3b8",
            "source_ref": "ipv4-addr--ea132e96-1d3e-5a9a-80df-6650cde12f24",
            "relationship_type": "associated-with",
            "target_ref": "domain-name--35411d71-3b05-5b90-b117-910ee064080c"
        },
        {
            "type": "relationship",
            "id": "relationship--8af744ec-e284-56b8-8e2c-4b4a3bc4d9e9",
            "source_ref": "ipv4-addr--47436f88-c2b4-5a68-9c4b-4c27d7b416e8",
            "relationship_type": "associated-with",
            "target_ref": "domain-name--3c43d417-e33c-573d-b69a-1fa17b7f5548"
        },
        {
            "type": "relationship",
            "id": "relationship--c3e6f3a3-8b1a-534e-95c5-904d9870b9f6",
            "source_ref":

ChatGPT Response 12

Review:

The JSON STIX 2.1 bundle is too long, so now I will use a prompt to continue the generation and then concatenate the two code blocks to form a valid bundle.
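Stitching the halves together is mostly string concatenation, but it is worth parsing the result to confirm it really is valid JSON. A minimal sketch; the two fragments below are shortened, hypothetical stand-ins for the copied responses, and joining them with no separator assumes the continuation picks up exactly where the first response stopped.

```python
import json


def join_continuations(*fragments: str) -> dict:
    """Concatenate truncated response fragments and parse them as one JSON document.

    Raises json.JSONDecodeError if the result is not valid JSON, for example
    when the model repeats or skips text after being asked to continue.
    """
    return json.loads("".join(fragments))


# Shortened, hypothetical fragments split at the same point as Responses 12 and 13.
part_1 = '{"type": "bundle", "objects": [{"type": "relationship", "source_ref":'
part_2 = ('\n    "domain-name--3c43d417-e33c-573d-b69a-1fa17b7f5548",'
          ' "relationship_type": "indicates"}]}')

bundle = join_continuations(part_1, part_2)
print(bundle["objects"][0]["source_ref"])
```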

Input:

Continue please

Output:

            "domain-name--3c43d417-e33c-573d-b69a-1fa17b7f5548",
            "relationship_type": "indicates",
            "target_ref": "malware--b04a9c47-b3a3-4613-8f74-03c77cf1871a"
        },
        {
            "type": "relationship",
            "id": "relationship--f6a1c6ed-38b8-5f7e-a75c-77b39d3d33f9",
            "source_ref": "identity--f39f2d43-1925-48df-8f6c-5f5a48c74dd8",
            "relationship_type": "owns",
            "target_ref": "ipv4-addr--47436f88-c2b4-5a68-9c4b-4c27d7b416e8"
        },
        {
            "type": "report",
            "id": "report--cbf8ec49-fd7c-4ebc-ae7e-cac9f72b39d3",
            "name": "Chinese Playful Taurus Activity in Iran",
            "published": "2023-01-18T03:00:00Z",
            "description": "The first IP (151.248.24.251) contains a historical certificate reference to portal-Share.mfa.new, which suggests an ambiguous “Ministry of Foreign Affairs (MFA)” nexus. The second is a virtual private server (VPS) owned by The Constant Company. This second IP (158.247.222.6) hosted the domain www.delldrivers.in from July 7, 2022 to Oct. 11, 2022. This domain is associated with a Turian backdoor sample.",
            "object_refs": [
                "ipv4-addr--ea132e96-1d3e-5a9a-80df-6650cde12f24",
                "ipv4-addr--47436f88-c2b4-5a68-9c4b-4c27d7b416e8",
                "domain-name--35411d71-3b05-5b90-b117-910ee064080c",
                "domain-name--3c43d417-e33c-573d-b69a-1fa17b7f5548",
                "malware--b04a9c47-b3a3-4613-8f74-03c77cf1871a",
                "identity--f39f2d43-1925-48df-8f6c-5f5a48c74dd8",
                "relationship--32c0bde6-817d-5f62-a919-0c3f3d61c3b8",
                "relationship--8af744ec-e284-56b8-8e2c-4b4a3bc4d9e9",
                "relationship--c3e6f3a3-8b1a-534e-95c5-904d9870b9f6",
                "relationship--f6a1c6ed-38b8-5f7e-a75c-77b39d3d33f9"
            ]
        }
    ]
}

ChatGPT Response 13

Review:

Here is the concatenated bundle:

I’m fairly happy with that!
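Because GPT-3 writes the ids as text (and they change from one response to the next), it is worth checking that every reference in the final bundle points at an object that is actually in it. A minimal sketch, assuming the bundle has been parsed into a Python dict; it only inspects top-level *_ref and *_refs properties, and references such as created_by_ref can legitimately point outside a bundle.

```python
def dangling_refs(bundle: dict) -> list:
    """Return (object id, property, missing id) for references not in the bundle."""
    objects = bundle.get("objects", [])
    known_ids = {obj["id"] for obj in objects}
    missing = []
    for obj in objects:
        for prop, value in obj.items():
            if prop.endswith("_ref") and isinstance(value, str):
                refs = [value]
            elif prop.endswith("_refs") and isinstance(value, list):
                refs = value
            else:
                continue
            missing.extend((obj["id"], prop, ref) for ref in refs if ref not in known_ids)
    return missing


# Trimmed, hypothetical example using ids from Response 13: the report
# refers to an identity that has been left out of the bundle.
example = {
    "type": "bundle",
    "objects": [
        {"type": "malware", "id": "malware--b04a9c47-b3a3-4613-8f74-03c77cf1871a", "name": "Turian"},
        {
            "type": "report",
            "id": "report--cbf8ec49-fd7c-4ebc-ae7e-cac9f72b39d3",
            "object_refs": [
                "malware--b04a9c47-b3a3-4613-8f74-03c77cf1871a",
                "identity--f39f2d43-1925-48df-8f6c-5f5a48c74dd8",
            ],
        },
    ],
}
print(dangling_refs(example))
```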

Next up: Enriching the extracted observables and entities

It would appear GPT-3 is pretty good at extracting indicators of compromise and linking them together, but what about non-explicit references?

To test this, in the next post I will try to enrich the report data with more external context.



