Posted by: David Greenwood, Chief of Signal

If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.

In this post I will show you how I craft STIX schemas for STIX 2.1 Extensions.

Last year I wrote a post on customising STIX Objects using extensions. That post proved rather useful to us recently as we needed to create our own for file2stix.

In April I showed you a proof-of-concept for adding a custom extension to represent full Sigma Rules more logically (in addition to the entire YAML in the Indicator SDO’s pattern field).

file2stix uses this extension definition when creating Indicator SDOs that represent Sigma Rules.

file2stix also uses other extensions (where extension_type = property-extension) to extend Indicator SDO, Software SCO and Vulnerability SDO Objects when representing;

  • MISP / Custom Warning List Matches (Indicator SDO)
  • NVD CPE enrichments (Software SCO)
  • NVD CVE enrichments (Vulnerability SDO)

file2stix also creates 4 types of custom SCOs (where extension_type = new-sco) using extensions for the following detections;

  • User agent (user-agent)
  • Credit card (credit-card)
  • Bank account (bank-account)
  • Cryptocurrency wallet (cryptocurrency-wallet)

You can view them all on GitHub here.

Creating the extension definition objects is easy – simply create an extension-definition object with the required properties.

However, one of the crucial parts I glossed over in previous posts was defining the schema for available properties used in an extension.

In the extension definition objects you might have seen the schema property that typically links to a public site where the schema can be viewed, for example;

    "schema": "https://raw.githubusercontent.com/signalscorps/stix2-objects/main/schemas/new-sco/cryptocurrency-wallet/schema.json",

A well-defined schema is vital for creators of STIX objects who want to use your extension, so they can understand the properties and data types available to them. It’s equally important for consumers to understand the type of values that can be returned.

A good way to get started with defining a schema (especially if you’re new to it, like I was, and still am) is to take a look at some existing examples – the schemas for native STIX objects created by OASIS are perfect for this. For example, the Vulnerability SDO schema.

This guide, Understanding JSON Schema, is also a helpful resource for newbies.

And that is important to keep in mind: I am a newbie. I am sharing how I built our schemas. It is definitely not the best way, and if someone drops into the community Slack to tell me a better way, I’d be very grateful!

Schema metadata

The top level part of the schema defines its purpose and format.

e.g.

{
    "$id": "https://raw.githubusercontent.com/signalscorps/stix2-objects/main/schemas/new-sco/cryptocurrency-wallet/schema.json",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "cryptocurrency-wallet",
    "description": "This extension creates a new SCO that can be used to represent cryptocurrency wallets.",
    "type": "object",
    "allOf": [

Breaking down each of these properties;

  • $id: Should be unique. The easiest way to do this is to point to the schema file itself (e.g. http://raw.githubusercontent.com/oasis-open/cti-stix2-json-schemas/stix2.1/schemas/sdos/vulnerability.json)
  • $schema: STIX schemas are JSON Schemas, so this should always reference the JSON Schema dialect being used (e.g. https://json-schema.org/draft/2020-12/schema)
  • title: Title of the schema – not the title of the STIX Object (but can be the same). e.g. Vulnerability
  • description: Description of the schema – not the description of the STIX Object (but can be the same). e.g. A Vulnerability is a mistake in software that can be directly used by a hacker to gain access to a system or network.
  • type: The JSON data type the schema describes. Should always be object in the case of STIX.

Below the top-level schema information you will see the allOf property, which the full schema is nested within.
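As a rough illustration, the top-level metadata can also be assembled programmatically. This is only a sketch and not part of file2stix: build_schema_header, its parameters and BASE_URL are hypothetical names, the base URL simply follows the layout of the example $id above, and the $schema value is the identifier json-schema.org publishes for the 2020-12 dialect.

```python
import json

# Hypothetical base URL, modelled on the example $id shown above.
BASE_URL = "https://raw.githubusercontent.com/signalscorps/stix2-objects/main/schemas"


def build_schema_header(extension_type, name, description):
    """Return the top-level metadata for an extension schema (illustrative helper)."""
    return {
        # Unique identifier: here, the location of the schema file itself.
        "$id": f"{BASE_URL}/{extension_type}/{name}/schema.json",
        # The JSON Schema dialect the file is written against.
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": name,
        "description": description,
        # STIX objects are always JSON objects.
        "type": "object",
        # Inherited schemas and the extension's own properties are added here.
        "allOf": [],
    }


header = build_schema_header(
    "new-sco",
    "cryptocurrency-wallet",
    "This extension creates a new SCO that can be used to represent cryptocurrency wallets.",
)
print(json.dumps(header, indent=4))
```

Generating the header from one helper keeps $id and title in step with the directory name, which is easy to get wrong when copying schema files by hand.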

Inheriting schemas metadata

One of the useful features when defining a schema is the ability to inherit other schemas.

For example, the Vulnerability SDO Schema inherits the core STIX schema, defined in the Vulnerability SDO Schema like so;

"allOf": [
    {
        "$ref": "../common/core.json"
    },

This includes the common STIX 2.1 properties (e.g. id, created_by_ref, labels, etc.).

In the cryptocurrency-wallet new-sco schema I inherit the core SCO schema;

{
    "$ref": "http://raw.githubusercontent.com/oasis-open/cti-stix2-json-schemas/stix2.1/schemas/common/cyber-observable-core.json"
},

Similarly in the CPE property-extension for Software SCOs, I inherit the Software SCO schema, like so;

{
    "$ref": "http://raw.githubusercontent.com/oasis-open/cti-stix2-json-schemas/stix2.1/schemas/observables/software.json"
},

…which in turn inherits the SCO core schema (in the Software SCO schema).

In the CPE property-extension schema you will also see the following property;

"$ref": "https://csrc.nist.gov/schema/nvd/api/2.0/cpe_api_json_2.0.schema"

In addition to the STIX schemas being inherited, this also pulls in an external schema.
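When a schema inherits from several places it can be handy to list everything it references. The sketch below walks a schema and collects every $ref it finds. The collect_refs helper and the cut-down example_schema are made up for illustration and only combine the two kinds of reference discussed above; they are not copied from the real CPE schema.

```python
def collect_refs(node):
    """Recursively gather every "$ref" target found in a schema fragment."""
    refs = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.append(value)
            else:
                refs.extend(collect_refs(value))
    elif isinstance(node, list):
        for item in node:
            refs.extend(collect_refs(item))
    return refs


# Made-up, cut-down schema: one inherited STIX schema plus one external schema.
example_schema = {
    "type": "object",
    "allOf": [
        {
            "$ref": "http://raw.githubusercontent.com/oasis-open/cti-stix2-json-schemas/stix2.1/schemas/common/cyber-observable-core.json"
        },
        {
            "properties": {
                "nvd_cpe": {
                    "$ref": "https://csrc.nist.gov/schema/nvd/api/2.0/cpe_api_json_2.0.schema"
                }
            }
        },
    ],
}

for ref in collect_refs(example_schema):
    print(ref)
```

Every URL printed has to resolve to a JSON document for a validator to follow it, so this list doubles as a checklist of links to test.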

Nesting property-extensions

In the case of the extension_type = property-extension I need to define how the extension will be nested in the original STIX Object.

Using the example CPE extension nested in the Software SCO;

{
    "type": "software",
    "extensions": {
        "extension-definition--6c453e0f-9895-498f-a273-2e2dda473377": {
            "extension_type": "property-extension",
            "nvd_cpe": {

Nested under the Software SCOs extensions property will be the id of the extension definition object for my CPE definition (extension-definition--6c453e0f-9895-498f-a273-2e2dda473377).

Nested below that, the extension_type is declared, followed by the nvd_cpe property, which contains all the NVD enrichment data nested within it.

Complete example here for clarity.

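Represented in my schema, this looks like the following. Each level of nesting is described with its own properties keyword, and extension_type is fixed with an enum, so that a validator checks the nested values rather than ignoring them;

"properties": {
    "extensions": {
        "type": "object",
        "properties": {
            "extension-definition--6c453e0f-9895-498f-a273-2e2dda473377": {
                "type": "object",
                "properties": {
                    "extension_type": {
                        "type": "string",
                        "enum": [
                            "property-extension"
                        ]
                    },
                    "nvd_cpe": {
                        "$ref": "https://csrc.nist.gov/schema/nvd/api/2.0/cpe_api_json_2.0.schema"
                    }
                },
                "required": [
                    "extension_type",
                    "nvd_cpe"
                ]
            }
        },
        "required": [
            "extension-definition--6c453e0f-9895-498f-a273-2e2dda473377"
        ]
    }
}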
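To make the nesting rules concrete, here is a small hand-written check of the same structure. It is a sketch rather than a real schema validator: check_property_extension is a hypothetical helper, and the nvd_cpe content is left empty because only the nesting is being tested.

```python
EXTENSION_ID = "extension-definition--6c453e0f-9895-498f-a273-2e2dda473377"


def check_property_extension(stix_object, extension_id, required_properties):
    """Return a list of problems with how a property-extension is nested."""
    extensions = stix_object.get("extensions")
    if not isinstance(extensions, dict):
        return ["object has no extensions property"]
    extension = extensions.get(extension_id)
    if not isinstance(extension, dict):
        return [f"extensions does not contain {extension_id}"]
    problems = []
    if extension.get("extension_type") != "property-extension":
        problems.append("extension_type must be property-extension")
    for name in required_properties:
        if name not in extension:
            problems.append(f"missing required property {name}")
    return problems


software = {
    "type": "software",
    "extensions": {
        EXTENSION_ID: {
            "extension_type": "property-extension",
            "nvd_cpe": {},  # NVD enrichment data left out of this sketch
        }
    },
}

print(check_property_extension(software, EXTENSION_ID, ["nvd_cpe"]))
print(check_property_extension({"type": "software"}, EXTENSION_ID, ["nvd_cpe"]))
```

The first call prints an empty list; the second reports that the extensions property is missing.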

Auto-generating schema from an example object

I have found the easiest way to generate the schema for each property in an extension (if there is not an external schema available, like for the CPE extension) is to first mock it up in an example, and then use an automated schema creation tool to create a skeleton schema that can be built upon.

GenSON proved to be a good tool for auto-generating a schema. GenSON can be installed and used like so;

git clone https://github.com/signalscorps/stix2-objects
cd stix2-objects
python3 -m venv genson
source genson/bin/activate
pip3 install genson

Once installed, I can then run it on one of the example STIX 2.1 Objects like so;

genson schemas/property-extension/warning-list-extension/example.json

The output will provide a full JSON schema listing all the properties in the object. In this case, this includes both the core Indicator SDO properties (e.g. id, type, etc.) and the nested extension properties inside it (e.g. list_name, list_url, etc.).

Firstly, the core properties can all be cut from the file (these are already defined in the inherited STIX core schemas).
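Cutting the core properties by hand is fine for one schema, but it could also be scripted. A minimal sketch, assuming the generated schema has been loaded into a dict: strip_core_properties is a hypothetical helper and CORE_PROPERTIES is only an assumed subset of the common property names, so adjust it to whatever the inherited schema already covers.

```python
# Assumed subset of the properties already defined by the inherited core schema.
CORE_PROPERTIES = {"type", "spec_version", "id", "created", "modified", "created_by_ref", "labels"}


def strip_core_properties(schema, core=CORE_PROPERTIES):
    """Return a copy of a generated schema without the inherited core properties."""
    trimmed = dict(schema)
    trimmed["properties"] = {
        name: definition
        for name, definition in schema.get("properties", {}).items()
        if name not in core
    }
    if "required" in schema:
        trimmed["required"] = [name for name in schema["required"] if name not in core]
    return trimmed


# Made-up fragment in the shape a schema generator typically produces.
generated = {
    "type": "object",
    "properties": {
        "type": {"type": "string"},
        "id": {"type": "string"},
        "extensions": {"type": "object"},
    },
    "required": ["extensions", "id", "type"],
}

print(strip_core_properties(generated))
```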

The output for the custom properties tries to identify the correct data type. However, unless you write very detailed examples, the data type identified for most fields will be string (unless the data type is obvious to GenSON, e.g. deprecated = boolean);

"warning_list_match": {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "list_name": {
                "type": "string"
            },
            "list_url": {
                "type": "string"
            },
            "list_type": {
                "type": "string"
            }
        },

Nonetheless this data is still useful. It is these properties extracted by GenSON we can use as the skeleton to define our schema.
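To see why the generated types are so generic, it helps to look at what inferring a schema from a single example involves. The function below is a deliberately simplified stand-in written for this post, not GenSON's implementation, and the example values are invented. All it can learn from an example is the JSON type of each value.

```python
def infer_schema(value):
    """Infer a minimal JSON Schema fragment from one example value."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Simplification: describe the items using the first element only.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            # Every property seen in the single example is treated as required.
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {value!r}")


example = {
    "warning_list_match": [
        {
            "list_name": "Example warning list",
            "list_url": "https://example.com/warning-list.json",
            "list_type": "cidr",
        }
    ],
    "deprecated": False,
}

skeleton = infer_schema(example)
print(skeleton["properties"]["deprecated"])  # {'type': 'boolean'}
print(skeleton["properties"]["warning_list_match"]["items"]["properties"]["list_url"])  # {'type': 'string'}
```

Nothing in the example tells the tool that list_url is a URL or that list_type comes from a fixed set of values, which is why the manual clean-up is still needed.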

The next step is to go one by one through each property.

Let me show you by going over a few examples.

Cleaning up the JSON schema

Defining fixed fields

Starting with extension_type, for property extensions I know this must always be property-extension.

Therefore for my Warning List schema I can define an enum for this property with only property-extension available.

Going from the GenSON output;

"extension_type": {
    "type": "string"
},

to;

"extension_type": {
    "type": "string",
    "description": "The value of this property MUST be property-extension.",
    "enum": [
        "property-extension"
    ]
},

As shown, it is useful to add a description property to the definition to provide more clarity to those reading it.
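The same tightening step could be applied in code when several schemas need it. This is a sketch only: pin_to_constant and matches_enum are hypothetical helpers, and matches_enum applies nothing but the enum keyword.

```python
def pin_to_constant(property_schema, value):
    """Return a copy of a property schema restricted to one allowed value."""
    tightened = dict(property_schema)
    tightened["description"] = f"The value of this property MUST be {value}."
    tightened["enum"] = [value]
    return tightened


def matches_enum(property_schema, candidate):
    """Apply only the enum keyword: with no enum, any value passes."""
    allowed = property_schema.get("enum")
    return allowed is None or candidate in allowed


generated = {"type": "string"}  # what the generator produced
extension_type = pin_to_constant(generated, "property-extension")

print(matches_enum(generated, "new-sco"))  # True: nothing restricts the value yet
print(matches_enum(extension_type, "new-sco"))  # False
print(matches_enum(extension_type, "property-extension"))  # True
```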

Dealing with arrays

Now let’s look at the input property in the cryptocurrency-wallet schema.

The input property contains objects with the properties address_ref and amount_sent.

Because input can hold more than one of these objects, its data type is an array. Each item in the array is an object with the two nested properties.

The output from GenSON is correct without modification; the general structure and data types identified are all correct. All I did was add descriptions, which gave a schema as follows;

"input": {
    "type": "array",
    "description": "One or more input addresses for the transaction",
    "items": {
        "type": "object",
        "properties": {
            "address_ref": {
                "type": "string",
                "description": "Specifies the hash of the sender (source) wallet, as a reference to a cryptocurrency-wallet object"
            },
            "amount_sent": {
                "type": "number",
                "description": "The amount sent by this input (in the same currency as the cryptocurrency defined in the symbol property)"
            }
        },
        "required": [
            "address_ref",
            "amount_sent"
        ]
    }
},
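Spelled out as a hand-written check, the array schema above amounts to the following. This is a sketch, not a full validator: check_inputs is a hypothetical helper and the wallet id in the example data is made up.

```python
def check_inputs(inputs):
    """Check a value against the shape of the input schema above."""
    if not isinstance(inputs, list):
        return ["input must be an array"]
    errors = []
    for index, item in enumerate(inputs):
        if not isinstance(item, dict):
            errors.append(f"input[{index}] must be an object")
            continue
        for name in ("address_ref", "amount_sent"):
            if name not in item:
                errors.append(f"input[{index}] is missing {name}")
        if "address_ref" in item and not isinstance(item["address_ref"], str):
            errors.append(f"input[{index}].address_ref must be a string")
        if "amount_sent" in item:
            amount = item["amount_sent"]
            # JSON numbers map to int or float in Python; bool is excluded explicitly.
            if isinstance(amount, bool) or not isinstance(amount, (int, float)):
                errors.append(f"input[{index}].amount_sent must be a number")
    return errors


WALLET_REF = "cryptocurrency-wallet--11111111-2222-4333-8444-555555555555"  # made-up id

good = [{"address_ref": WALLET_REF, "amount_sent": 0.5}]
bad = [{"address_ref": WALLET_REF, "amount_sent": "0.5"}, {"amount_sent": 1}]

print(check_inputs(good))
print(check_inputs(bad))
```

The second call reports that the first item's amount_sent is a string rather than a number, and that the second item is missing address_ref.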

Pattern validation

In our Credit Card SCO there are a few more constraints on the fields. I decided to add a bit more logic to the field values to help users avoid making mistakes when creating objects.

For the id field the values must always start with credit-card--. This can be defined in a pattern property.

        "id": {
          "title": "id",
          "pattern": "^credit-card--"
        },

Regular expression patterns like this can be made much more complex, if required.
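Patterns are easy to try out before committing them to a schema. The sketch below tests the prefix pattern from above alongside a stricter variant that also expects a lowercase hyphenated UUID after the prefix. The stricter pattern and the example ids are my own assumptions for illustration, and Python's re module stands in for the regular expression dialect JSON Schema uses, which behaves the same for simple patterns like these.

```python
import re

# The pattern from the schema above: only the prefix is checked.
PREFIX_PATTERN = r"^credit-card--"
# Assumed stricter variant: the prefix followed by a lowercase hyphenated UUID.
STRICT_PATTERN = r"^credit-card--[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"


def matches(pattern, value):
    """JSON Schema patterns are not anchored implicitly, so search mirrors them."""
    return re.search(pattern, value) is not None


valid_id = "credit-card--11111111-2222-4333-8444-555555555555"  # made-up id
sloppy_id = "credit-card--not-a-uuid"

print(matches(PREFIX_PATTERN, sloppy_id))  # True: the prefix alone is enough
print(matches(STRICT_PATTERN, sloppy_id))  # False
print(matches(STRICT_PATTERN, valid_id))  # True
```

Because patterns are not anchored automatically, leaving out the leading ^ would let any id that merely contains credit-card-- through.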

When it comes to the value of the credit card (the number property), it’s also possible to be even more specific about the values accepted using the schema.

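Firstly, the type should stay a string, even though credit card numbers only contain digits. Stored as a JSON number, a card number would lose any leading zeros and could lose precision on long values, and the maxLength keyword only applies to strings. A pattern property can restrict the string to digits instead.

Secondly I also know that most major credit cards do not have more than 20 digits.

Therefore I can change the simplistic GenSON output (where only "type": "string" is defined) to a tighter schema definition as follows;

"number": {
    "type": "string",
    "description": "Specifies the full credit card number with no spaces.",
    "pattern": "^[0-9]+$",
    "maxLength": 20
},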

If I really wanted, I could use other schema properties to further define acceptable values (e.g. number must start with 4242 if issuer is visa, etc.).
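JSON Schema has conditional keywords (if and then) for rules of this kind. Written out as plain code, the idea looks roughly like the sketch below. check_card and ISSUER_PREFIXES are hypothetical, and the 4242 prefix is simply the example from the sentence above, not a real issuer rule. The number is converted to text first so the check works however the value is stored.

```python
# Prefix taken from the example in the text above; not a real issuer rule.
ISSUER_PREFIXES = {"visa": "4242"}


def check_card(card):
    """Return problems with a credit-card object's number (illustrative only)."""
    problems = []
    digits = str(card.get("number", ""))
    if not (digits.isascii() and digits.isdigit()):
        problems.append("number must contain digits only")
    if len(digits) > 20:
        problems.append("number must not be longer than 20 digits")
    issuer = str(card.get("issuer", "")).lower()
    prefix = ISSUER_PREFIXES.get(issuer)
    # Conditional rule: only applied when the issuer has a known prefix.
    if prefix is not None and not digits.startswith(prefix):
        problems.append(f"number must start with {prefix} when issuer is {issuer}")
    return problems


print(check_card({"issuer": "visa", "number": "4242424242424242"}))
print(check_card({"issuer": "visa", "number": "5555555555554444"}))
print(check_card({"number": "4242 4242"}))
```

The first call returns an empty list, the second flags the prefix, and the third flags the space in the number.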

Required fields

Finally, we can define the required fields. By default GenSON assumes all properties are required, which for my extensions is not the case.

For our Credit Card new-sco I define the following top-level properties inside the extension as always required;

    "required": [
        "type",
        "id",
        "number"
    ],

It is also a requirement in the STIX specification that extension_type is declared for custom extensions.

In the credit card SCO you can see this defined in the nested objects, extensions and extension-definition--abd6fc0e-749e-4e6c-a20c-1faa419f5ee4 respectively.

        "required": [
            "extension_type"
        ]
    }
},
"required": [
    "extension-definition--abd6fc0e-749e-4e6c-a20c-1faa419f5ee4"
]
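The effect of declaring required at each level can be seen with a small recursive check. This is a sketch under assumptions: missing_required is a hypothetical helper that only understands the required and properties keywords, the schema is a cut-down version with each nested level placed under properties, and the credit card id is made up.

```python
def missing_required(instance, schema, path=""):
    """List required properties that are absent, following nested "properties"."""
    missing = [f"{path}{name}" for name in schema.get("required", []) if name not in instance]
    for name, subschema in schema.get("properties", {}).items():
        value = instance.get(name)
        if isinstance(value, dict) and isinstance(subschema, dict):
            missing.extend(missing_required(value, subschema, f"{path}{name}."))
    return missing


EXT_ID = "extension-definition--abd6fc0e-749e-4e6c-a20c-1faa419f5ee4"

# Cut-down schema; nesting each level under "properties" is an assumption of this sketch.
schema = {
    "required": ["type", "id", "number"],
    "properties": {
        "extensions": {
            "required": [EXT_ID],
            "properties": {EXT_ID: {"required": ["extension_type"]}},
        }
    },
}

card = {
    "type": "credit-card",
    "id": "credit-card--11111111-2222-4333-8444-555555555555",  # made-up id
    "extensions": {EXT_ID: {}},
}

for name in missing_required(card, schema):
    print(name)
```

For this object the check reports number at the top level and extension_type inside the extension definition entry.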

Sharing your extension

Once you’ve created a schema for your extensions it’s time to start using it in STIX objects.

There is no requirement to share your extension if the objects you create from them are never shared outside your organisation.

Of course, if you want to share objects that use extensions then it is important that the schema and extension definition objects are available.

Signals Corps extensions are shared publicly in our STIX2 Objects repository.

OASIS also accept pull requests to their STIX common object repository here. There are a few examples in the extension-definition-specification directory (ignore the stix1x ones).

Submission to the OASIS repository can ultimately end up in your extensions being adopted into the core STIX schema. There is a very detailed policy from OASIS outlining this.





