Posted by: David Greenwood, Chief of Signal

If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.

In this post I will introduce the core concepts of TAXII 2.1 and prepare you for the following posts in this tutorial series.

TAXII, or Trusted Automated Exchange of Intelligence Information, is a protocol designed specifically to share cyber threat intelligence.

TAXII enables organisations to share CTI by defining a single API that all upstream and downstream technology can be built to support, removing the need to support many individual API designs.

Before I kick off, it’s important to make the distinction between STIX and TAXII. Many people confuse the two, myself included until a few months ago.

STIX is a representation of threat intelligence – the content (covered in a previous tutorial).

TAXII is a standard way to share that content – the protocol.

So how are they related? A TAXII Server must be able to handle STIX content (return it in responses and receive it from producers).

A TAXII server can also handle other intelligence formats in addition to STIX. However, in the following tutorial posts I will only focus on TAXII servers and clients that use STIX 2.1 structured data. Why? 99% of the industry uses TAXII to distribute only STIX data (a total guess, but I’m yet to see a TAXII server distributing anything but STIX... prove me wrong!).

Just one final note for the avoidance of doubt: one of the things that leads to confusion is the versioning of the standards. STIX is currently on version 2.1, and so is TAXII. However, the versioning is completely independent; there is no coupling between the two based on version number.

Now that’s (hopefully) clear, at its core TAXII has two main concepts…

  1. TAXII Servers: store intelligence received from producers and disseminate it to consumers (the TAXII Clients) via the API
  2. TAXII Clients: publish intelligence to and/or consume intelligence from TAXII Servers

Note that a TAXII Server and a TAXII Client can be the same machine.

For example, a Threat Intelligence Platform (TIP) acts as a TAXII Client and consumes intelligence feeds from remote TAXII Servers. The TIP also acts as a TAXII Server for downstream security tools that connect to it to poll for curated intel.

TAXII Services

The TAXII 2.1 specification defines two primary services to support a variety of common intelligence sharing models:

TAXII 2.1 Collections and Channels

  • Collections: An interface to a server-provided repository of objects that allows a producer to serve consumers in a request-response model.
  • Channels: Allows the exchange of information according to a publish-subscribe model.

For the more technically inclined, a good equivalent is to think of Collections as a REST API and Channels as webhooks.

In reality, the Channels service is not yet defined and I am not really sure why OASIS decided to include it in the published specification. As such, Channels will be touched on minimally in this tutorial.

Collections and Channels can be organized in different ways.

How you search for information on a TAXII server depends on what you are looking for and how you want to receive it. Generally, the design of Collections and Channels on a TAXII Server will look something like this:

TAXII 2.1 Flow diagram

TAXII Clients contain the logic to consume data from and publish data to Collections (request/response) or Channels (streamed) via the TAXII Server’s API.

A TAXII Client might just be a script making API calls to the TAXII Server to retrieve data, though a few fully fledged TAXII Client products with more advanced logic for interacting with TAXII APIs exist, some of which I will show you in this tutorial.

Before I get into those, I will demonstrate these concepts by going through the TAXII APIs.

To do this, let’s jump right in and install a TAXII Server.

Medallion TAXII server

Medallion is an open-source TAXII server implementation built by OASIS.

Medallion has been designed to be a simple front-end REST server providing access to the endpoints defined in the TAXII 2.1 specification.

It is important to note that medallion was designed as a prototype and a reference implementation of TAXII 2.1 – it was not intended for production use.

Medallion can be installed very simply:

git clone https://github.com/oasis-open/cti-taxii-server
cd cti-taxii-server
git checkout v2.0.1
python3 -m venv cti-taxii-server
source cti-taxii-server/bin/activate
pip3 install medallion
medallion -h
usage: medallion [-h] [--host HOST] [--port PORT] [--debug-mode]
                 [--log-level {DEBUG,INFO,WARN,ERROR,CRITICAL}]
                 CONFIG_PATH

medallion v3.0.0

positional arguments:
  CONFIG_PATH           The location of the JSON configuration file to use.

options:
  -h, --help            show this help message and exit
  --host HOST           The host to listen on.
  --port PORT           The port of the web server.
  --debug-mode          If set, start application in debug mode.
  --log-level {DEBUG,INFO,WARN,ERROR,CRITICAL}
                        The logging output level for medallion.

If you are wondering why I cloned the cti-taxii-server repository, it is to have the demo config and data files used in this tutorial available locally.

The first thing needed is to define the configurations in the Medallion config file. The config file contains:

  1. configuration information for the backend plugin (including an initial data file)
  2. a simple user name/password dictionary

Two back-end plugins are provided with Medallion:

  1. The Memory back-end: persists data “in memory”. It is initialized using a JSON file that contains TAXII data and metadata. It is possible to save the current state of the in-memory store, but this back-end is really intended only for testing purposes.
  2. The MongoDB back-end: somewhat more robust, it makes use of a MongoDB server, installed independently.

For simplicity in this tutorial, I will demonstrate with the Memory back-end.

Older versions of the repository also ship with a sample data file containing STIX 2.1 objects at medallion/test/data/default_data.json (relative to the repository root), which will fill the TAXII server with initial data (you can of course add more or remove it once the server is running, as shown in the last post).

In this sample data file you will see the Discovery URL config (/discovery), API Roots config (e.g. api1, trustgroup1), and the collections config within the API Roots (and the STIX 2.1 Objects added to them).

If it is unclear at first glance how these are structured in the file, it will become much clearer as I continue.

Park that for now; let’s look at authenticating to the server first.

As required by the TAXII specification, Medallion supports HTTP Basic authentication. The user names and passwords for this are stored in the config file in plain text (remember, it is not production ready).

To add users to the server, first create a new config file:

vi config_file.json

Then add the following content, which defines two users:

{
    "backend": {
        "module": "medallion.backends.memory_backend",
        "module_class": "MemoryBackend",
        "filename": "medallion/test/data/default_data.json"
    },
    "users": {
        "admin": "Password0",
        "user1": "Password1"
    },
    "taxii": {
        "max_page_size": 100
    }
}

Now that the basic config file is complete, I can run Medallion like so:

medallion config_file.json --host localhost --port 5000

If successful, you should see the server start:

 * Serving Flask app 'medallion'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://localhost:5000

Now that the server is running, I can interact with it.

The following flow demonstrates how TAXII Clients generally interact with a TAXII Server’s API to get STIX 2.1 structured cyber threat intelligence…

TAXII Discovery

It can be assumed that a client trying to access a TAXII server is unaware of the collections of data that it holds.

That’s where server discovery is useful. The HOST/taxii2/ endpoint should always return the available API Roots, which show where the data is.

However, if you go to http://localhost:5000/taxii2 in the browser (and enter the credentials admin:Password0), you’ll see the error:

{
    "description": "Media type in the Accept header is invalid or not found.",
    "http_status": "406",
    "title": "ProcessingError"
}

That’s because the endpoint is expecting a few things.

First, a user:password pair needs to be encoded in base64 and passed in the Authorization header. You can use one of the users in the config file to do this (e.g. admin:Password0 in base64 is YWRtaW46UGFzc3dvcmQw).

Secondly, the “Required Headers” section of the TAXII 2.1 specification for the /taxii2/ endpoint lists Accept: application/taxii+json;version=2.1.
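If you would rather script these requests than type curl commands, both headers can be built with Python’s standard library. This is a minimal sketch; basic_auth_header and taxii_headers are hypothetical helper names of my own, not part of Medallion or any TAXII client library.

```python
import base64

# Media type the TAXII 2.1 endpoints expect in the Accept header.
TAXII_ACCEPT = "application/taxii+json;version=2.1"


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization value: 'Basic ' + base64(user:password)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def taxii_headers(username: str, password: str) -> dict:
    """Hypothetical helper: the two headers sent with every request in this post."""
    return {
        "Authorization": basic_auth_header(username, password),
        "Accept": TAXII_ACCEPT,
    }


print(basic_auth_header("admin", "Password0"))  # Basic YWRtaW46UGFzc3dvcmQw
```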

So I will pass both of these as headers in my request. In a new terminal window (keep the Medallion terminal window open), let’s discover our new TAXII server…

curl -X GET "http://localhost:5000/taxii2/" \
    -H "Authorization: Basic YWRtaW46UGFzc3dvcmQw" \
    -H "Accept: application/taxii+json;version=2.1"
{
    "api_roots": [
        "http://localhost:5000/api1/",
        "http://localhost:5000/api2/",
        "http://localhost:5000/trustgroup1/"
    ],
    "contact": "string containing contact information",
    "default": "http://localhost:5000/trustgroup1/",
    "description": "This TAXII Server contains a listing of",
    "title": "Some TAXII Server"
}   

This data all comes from the default_data.json file. See the /discovery config:

{
    "/discovery": {
        "title": "Some TAXII Server",
        "description": "This TAXII Server contains a listing of",
        "contact": "string containing contact information",
        "default": "http://localhost:5000/trustgroup1/",
        "api_roots": [
            "http://localhost:5000/api1/",
            "http://localhost:5000/api2/",
            "http://localhost:5000/trustgroup1/"
        ]
    },

You can see that the API response lists three API Roots…
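A client script would typically read the API Root URLs out of the discovery response rather than hard-coding them. A minimal sketch, assuming the JSON has already been fetched (here it is pasted in from the response above); api_root_paths is a hypothetical helper name, and it tolerates a response with no api_roots list.

```python
import json
from urllib.parse import urlparse

# Discovery response from the curl request above (trimmed to the fields used here).
discovery = json.loads("""
{
    "api_roots": [
        "http://localhost:5000/api1/",
        "http://localhost:5000/api2/",
        "http://localhost:5000/trustgroup1/"
    ],
    "default": "http://localhost:5000/trustgroup1/",
    "title": "Some TAXII Server"
}
""")


def api_root_paths(discovery: dict) -> list:
    """Return the path of each API Root URL the server advertised to this user."""
    return [urlparse(url).path for url in discovery.get("api_roots", [])]


print(api_root_paths(discovery))  # ['/api1/', '/api2/', '/trustgroup1/']
print(discovery.get("default"))   # http://localhost:5000/trustgroup1/
```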

TAXII API Roots

In the previous example, the server exposed three API Roots to the user: /api1/, /api2/ and /trustgroup1/. There could be even more API Roots on this server that are simply not accessible to the authenticated user.

API Roots are logical groupings of TAXII Collections (and in the future, Channels). A TAXII server instance can support one or more API Roots.

API Roots offer a way for the owner of a TAXII Server to segment the data between groups of users. A user can have access to zero or more API Roots depending on the user permissions assigned to them on the TAXII server.

For example, a single TAXII Server could host multiple API Roots: one API Root for the Collections and Channels used by Sharing Group A, and another for the Collections and Channels used by Sharing Group B. Different consumers can then be given access to different groups (i.e. API Roots).

To see what is inside an API Root you can use the Get API Root Information endpoint.

The GET request for the endpoint takes the form HOST/<API_ROOT>/.
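The endpoint paths used in this post follow a simple pattern, so a client can build them from the host and the API Root name. A minimal sketch; the function names are hypothetical, and the helpers cover only the endpoints shown in this post (discovery, API Root information, collections and objects).

```python
def discovery_url(host: str) -> str:
    """Server discovery endpoint: HOST/taxii2/."""
    return f"{host.rstrip('/')}/taxii2/"


def api_root_url(host: str, api_root: str) -> str:
    """Get API Root Information endpoint: HOST/<API_ROOT>/."""
    return f"{host.rstrip('/')}/{api_root.strip('/')}/"


def collections_url(host: str, api_root: str) -> str:
    """List the Collections in an API Root: HOST/<API_ROOT>/collections/."""
    return api_root_url(host, api_root) + "collections/"


def objects_url(host: str, api_root: str, collection_id: str) -> str:
    """Objects in one Collection: HOST/<API_ROOT>/collections/<ID>/objects/."""
    return f"{collections_url(host, api_root)}{collection_id}/objects/"


print(api_root_url("http://localhost:5000", "trustgroup1"))
# http://localhost:5000/trustgroup1/
```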

Let’s take a look at the /trustgroup1/ API Root:

curl -X GET "http://localhost:5000/trustgroup1/" \
    -H "Authorization: Basic YWRtaW46UGFzc3dvcmQw" \
    -H "Accept: application/taxii+json;version=2.1"
{
    "description": "A trust group setup for malware researchers",
    "max_content_length": 9765625,
    "title": "Malware Research Group",
    "versions": [
        "application/taxii+json;version=2.1"
    ]
} 

Again, this data is pulled from the server’s memory, loaded from the default_data.json file.

As you can see, this endpoint is useful because it describes the root: its title and description, the TAXII versions it supports, and the maximum content length (in bytes) it will accept.
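Before sending data to a root, a client can use this information to check that the root speaks TAXII 2.1 and that a request body will not exceed max_content_length. A minimal sketch with hypothetical helper names; measuring the body as UTF-8 encoded JSON is an assumption about how the client will serialise it.

```python
import json

TAXII_21 = "application/taxii+json;version=2.1"

# API Root information returned for /trustgroup1/ above.
root_info = {
    "description": "A trust group setup for malware researchers",
    "max_content_length": 9765625,
    "title": "Malware Research Group",
    "versions": [TAXII_21],
}


def supports_taxii21(info: dict) -> bool:
    """True if the root lists TAXII 2.1 among its supported versions."""
    return TAXII_21 in info.get("versions", [])


def fits_in_root(payload: dict, info: dict) -> bool:
    """True if the UTF-8 JSON body is no larger than max_content_length bytes."""
    body = json.dumps(payload).encode("utf-8")
    return len(body) <= info["max_content_length"]


print(supports_taxii21(root_info))               # True
print(fits_in_root({"objects": []}, root_info))  # True
```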

Now that I know this is the root I want, let’s look at the TAXII Collections it holds…

TAXII Collections

A TAXII Collection is a logical grouping of threat intelligence that enables the exchange of information between a TAXII Client and a TAXII Server via a TAXII API in a request-response manner.

For example, I might create one collection for compromised credit card intelligence, another for C2 domains, and so on. Ultimately, the producer decides what goes into each collection. Many producers simply have one collection (which users can filter using the TAXII API parameters; more on that later).

I can discover the Collections in a root using the collections endpoint like so:

curl -X GET "http://localhost:5000/trustgroup1/collections/" \
    -H "Authorization: Basic YWRtaW46UGFzc3dvcmQw" \
    -H "Accept: application/taxii+json;version=2.1" 
{
    "collections": [
        {
            "can_read": false,
            "can_write": true,
            "id": "472c94ae-3113-4e3e-a4dd-a9f4ac7471d4",
            "media_types": [
                "application/stix+json;version=2.1"
            ],
            "title": "This data collection is for testing querying across collections"
        },
        {
            "can_read": true,
            "can_write": true,
            "id": "365fed99-08fa-fdcd-a1b3-fb247eb41d01",
            "media_types": [
                "application/stix+json;version=2.1"
            ],
            "title": "This data collection is for testing adding objects"
        },
        {
            "can_read": true,
            "can_write": true,
            "description": "This data collection is for collecting high value IOCs",
            "id": "91a7b528-80eb-42ed-a74d-c6fbd5a26116",
            "media_types": [
                "application/stix+json;version=2.1"
            ],
            "title": "High Value Indicator Collection"
        },
        {
            "can_read": true,
            "can_write": false,
            "description": "This data collection is for collecting current IOCs",
            "id": "52892447-4d7e-4f70-b94d-d7f22742ff63", 
            "media_types": [
                "application/stix+json;version=2.1"
            ],
            "title": "Indicators from the past 24-hours"
        },
        {
            "can_read": false,
            "can_write": false,
            "description": "Non accessible",
            "id": "64993447-4d7e-4f70-b94d-d7f33742ee63",
            "media_types": [
                "application/stix+json;version=2.1"
            ],
            "title": "Secret Indicators"
        }
    ]
}  

The response shows five Collections available to the authenticated user.

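Note that the permissions differ between Collections: one is read-only (can_read=true, can_write=false), two are read/write (can_read=true and can_write=true), one is write-only (can_read=false, can_write=true) and one allows neither.

Most TAXII servers offer collections that are read-only, that is, collections with can_read=true and can_write=false.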

Less commonly, TAXII servers give users write access (can_write=true) so they can add their own data to a collection. This is often seen in threat sharing communities that use a collection as a pool of all their intelligence.
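For a writable collection, a client adds data by POSTing a TAXII envelope (a JSON object with an objects list) to the collection’s objects endpoint. The sketch below only builds such a request with Python’s standard library and does not send it; build_add_objects_request is a hypothetical name, and the collection ID is the writable “testing adding objects” collection from the response above. Check the Add Objects section of the TAXII 2.1 specification for what the server returns.

```python
import base64
import json
import urllib.request

TAXII_21 = "application/taxii+json;version=2.1"


def build_add_objects_request(objects_endpoint: str, username: str,
                              password: str, stix_objects: list) -> urllib.request.Request:
    """Build (but do not send) a POST that adds STIX objects to a Collection."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    envelope = {"objects": stix_objects}  # TAXII envelope wrapping the STIX objects
    return urllib.request.Request(
        objects_endpoint,
        data=json.dumps(envelope).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": TAXII_21,
            "Content-Type": TAXII_21,
        },
    )


request = build_add_objects_request(
    "http://localhost:5000/trustgroup1/collections/"
    "365fed99-08fa-fdcd-a1b3-fb247eb41d01/objects/",
    "admin",
    "Password0",
    [],  # replace with the STIX 2.1 objects you want to add
)
# To send it while Medallion is running: urllib.request.urlopen(request)
```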

Also, remember I mentioned earlier that TAXII servers can be used to distribute intelligence in formats other than STIX. The media_types property of each collection tells the Client polling it what type of data is held within it.
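Putting the permission flags and media_types together, a client that only consumes STIX 2.1 might first narrow the collection list to those it can actually read. A minimal sketch, using a trimmed copy of the collections response above; readable_stix_collections is a hypothetical name, and skipping collections that have no media_types list is a design choice of this sketch rather than something the specification requires.

```python
STIX_21 = "application/stix+json;version=2.1"

# Trimmed copy of the /trustgroup1/collections/ response above.
collections_response = {
    "collections": [
        {"id": "472c94ae-3113-4e3e-a4dd-a9f4ac7471d4", "can_read": False,
         "can_write": True, "media_types": [STIX_21]},
        {"id": "365fed99-08fa-fdcd-a1b3-fb247eb41d01", "can_read": True,
         "can_write": True, "media_types": [STIX_21]},
        {"id": "91a7b528-80eb-42ed-a74d-c6fbd5a26116", "can_read": True,
         "can_write": True, "media_types": [STIX_21]},
        {"id": "52892447-4d7e-4f70-b94d-d7f22742ff63", "can_read": True,
         "can_write": False, "media_types": [STIX_21]},
        {"id": "64993447-4d7e-4f70-b94d-d7f33742ee63", "can_read": False,
         "can_write": False, "media_types": [STIX_21]},
    ]
}


def readable_stix_collections(response: dict) -> list:
    """IDs of Collections the user can read that declare STIX 2.1 content."""
    return [
        c["id"]
        for c in response.get("collections", [])
        if c.get("can_read") and STIX_21 in c.get("media_types", [])
    ]


print(readable_stix_collections(collections_response))
```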

I can then retrieve the objects in an individual Collection using its ID as follows:

curl -X GET "http://localhost:5000/trustgroup1/collections/91a7b528-80eb-42ed-a74d-c6fbd5a26116/objects/" \
    -H "Authorization: Basic YWRtaW46UGFzc3dvcmQw" \
    -H "Accept: application/taxii+json;version=2.1" 
{
   "more":false,
   "objects":[
      {
         "created":"2014-05-08T09:00:00.000Z",
         "id":"relationship--2f9a9aa9-108a-4333-83e2-4fb25add0463",
         "modified":"2014-05-08T09:00:00.000Z",
         "relationship_type":"indicates",
         "source_ref":"indicator--cd981c25-8042-4166-8945-51178443bdac",
         "spec_version":"2.1",
         "target_ref":"malware--c0931cc6-c75e-47e5-9036-78fabc95d4ec",
         "type":"relationship"
      },
      {
         "created":"2014-05-08T09:00:00.000Z",
         "id":"indicator--cd981c25-8042-4166-8945-51178443bdac",
         "indicator_types":[
            "file-hash-watchlist"
         ],
         "modified":"2014-05-08T09:00:00.000Z",
         "name":"File hash for Poison Ivy variant",
         "pattern":"[file:hashes.'SHA-256' = 'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c']",
         "pattern_type":"stix",
         "spec_version":"2.1",
         "type":"indicator",
         "valid_from":"2014-05-08T09:00:00.000000Z"
      },
      {
         "created":"2017-01-20T00:00:00.000Z",
         "definition":{
            "tlp":"green"
         },
         "definition_type":"tlp",
         "id":"marking-definition--34098fce-860f-48ae-8e50-ebd3cc5e41da",
         "name":"TLP:GREEN",
         "spec_version":"2.1",
         "type":"marking-definition"
      },
      {
         "created":"2017-01-27T13:49:53.997Z",
         "description":"Poison Ivy",
         "id":"malware--c0931cc6-c75e-47e5-9036-78fabc95d4ec",
         "is_family":true,
         "malware_types":[
            "remote-access-trojan"
         ],
         "modified":"2017-01-27T13:49:53.997Z",
         "name":"Poison Ivy",
         "spec_version":"2.1",
         "type":"malware"
      },
      {
         "created":"2016-11-03T12:30:59.000Z",
         "description":"Accessing this url will infect your machine with malware. This is the last updated indicator",
         "id":"indicator--6770298f-0fd8-471a-ab8c-1c658a46574e",
         "indicator_types":[
            "url-watchlist"
         ],
         "modified":"2017-01-27T13:49:53.935Z",
         "name":"Malicious site hosting downloader",
         "pattern":"[url:value = 'http://x4z9arb.cn/4712']",
         "pattern_type":"stix",
         "spec_version":"2.1",
         "type":"indicator",
         "valid_from":"2016-11-03T12:30:59.000Z"
      }
   ]
}

Note the use of the more property. This is how pagination is implemented in TAXII.

Here, more=false because max_page_size in the config was set to 100 results and there are only five results here. When more pages are available (in this case, when more than 100 results exist for the query), more=true.

Let me create a second config file, config_file_pagination.json, to demonstrate…

vi config_file_pagination.json

Then add the following content (identical to the first config, except for max_page_size):

{
    "backend": {
        "module": "medallion.backends.memory_backend",
        "module_class": "MemoryBackend",
        "filename": "medallion/test/data/default_data.json"
    },
    "users": {
        "admin": "Password0",
        "user1": "Password1"
    },
    "taxii": {
        "max_page_size": 2
    }
}

Here I’ve set the max_page_size to 2 objects (note this only applies to STIX object endpoints).
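Since the two config files differ only in max_page_size, they could also be generated from one function rather than edited by hand. A minimal sketch; medallion_config and write_config are hypothetical helpers, and the keys simply mirror the JSON shown in this post.

```python
import json


def medallion_config(max_page_size: int,
                     data_file: str = "medallion/test/data/default_data.json") -> dict:
    """Return the Memory back-end config used in this post."""
    return {
        "backend": {
            "module": "medallion.backends.memory_backend",
            "module_class": "MemoryBackend",
            "filename": data_file,
        },
        # Plain-text credentials: fine for a local demo, never for production.
        "users": {"admin": "Password0", "user1": "Password1"},
        "taxii": {"max_page_size": max_page_size},
    }


def write_config(path: str, max_page_size: int) -> None:
    """Write the config as indented JSON to path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(medallion_config(max_page_size), f, indent=4)
```

For example, write_config("config_file.json", 100) and write_config("config_file_pagination.json", 2) reproduce the two files used here.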

Now stop the running server (Ctrl+C) and restart it with the new config…

medallion config_file_pagination.json --host localhost --port 5000

And run the same request…

curl -X GET "http://localhost:5000/trustgroup1/collections/91a7b528-80eb-42ed-a74d-c6fbd5a26116/objects/" \
    -H "Authorization: Basic YWRtaW46UGFzc3dvcmQw" \
    -H "Accept: application/taxii+json;version=2.1" 
{
   "more":true,
   "next":"96e8fbed-0f9b-48fb-bff3-d9dece725c10",
   "objects":[
      {
         "created":"2014-05-08T09:00:00.000Z",
         "id":"relationship--2f9a9aa9-108a-4333-83e2-4fb25add0463",
         "modified":"2014-05-08T09:00:00.000Z",
         "relationship_type":"indicates",
         "source_ref":"indicator--cd981c25-8042-4166-8945-51178443bdac",
         "spec_version":"2.1",
         "target_ref":"malware--c0931cc6-c75e-47e5-9036-78fabc95d4ec",
         "type":"relationship"
      },
      {
         "created":"2014-05-08T09:00:00.000Z",
         "id":"indicator--cd981c25-8042-4166-8945-51178443bdac",
         "indicator_types":[
            "file-hash-watchlist"
         ],
         "modified":"2014-05-08T09:00:00.000Z",
         "name":"File hash for Poison Ivy variant",
         "pattern":"[file:hashes.'SHA-256' = 'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c']",
         "pattern_type":"stix",
         "spec_version":"2.1",
         "type":"indicator",
         "valid_from":"2014-05-08T09:00:00.000000Z"
      }
   ]
}

Now only the first two objects are returned, more is true, and the response includes a next value that identifies the following page of results.
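A client that wants every object therefore keeps requesting pages until more is false, passing the previous page’s next value back to the server; in TAXII 2.1 that value is sent as the next URL parameter on the objects endpoint. The sketch below separates the paging loop from the HTTP call so the loop can be tried without a server. iter_objects and make_fetcher are hypothetical names, and make_fetcher assumes Medallion accepts ?next= on this endpoint as the specification describes.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

FetchPage = Callable[[Optional[str]], dict]


def iter_objects(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every object across all pages of a TAXII 2.1 objects endpoint."""
    next_token = None  # no token for the first page
    while True:
        envelope = fetch_page(next_token)
        yield from envelope.get("objects", [])
        if not envelope.get("more"):
            return
        next_token = envelope.get("next")
        if next_token is None:
            raise ValueError("server set more=true but returned no next value")


def make_fetcher(objects_endpoint: str, headers: dict) -> FetchPage:
    """Return a fetch_page function that GETs one page over HTTP."""
    def fetch_page(next_token: Optional[str]) -> dict:
        url = objects_endpoint
        if next_token is not None:
            url += "?" + urlencode({"next": next_token})
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch_page
```

With Medallion running on the max_page_size=2 config, list(iter_objects(make_fetcher(objects_endpoint, headers))), where headers carries the same Authorization and Accept values as the curl requests, should collect the collection’s objects two at a time.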

Filtering and querying objects

TAXII Collections often contain a lot of data, thus the need for pagination.

The TAXII specification defines a range of URL parameters that a TAXII Client can use to filter the objects returned, making the request/response flow more efficient.
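As a preview, the TAXII 2.1 specification defines URL parameters on the objects endpoint such as added_after, limit, next, match[type] and match[id], where multiple values for a match filter are comma-separated. A minimal sketch of assembling them into a request URL; objects_query_url is a hypothetical helper, and which filters a given server honours is worth checking against its documentation.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def objects_query_url(objects_endpoint: str,
                      added_after: Optional[str] = None,
                      limit: Optional[int] = None,
                      types: Optional[Iterable[str]] = None,
                      ids: Optional[Iterable[str]] = None,
                      next_token: Optional[str] = None) -> str:
    """Append TAXII 2.1 filter parameters to an objects endpoint URL."""
    params = {}
    if added_after:
        params["added_after"] = added_after  # RFC 3339 timestamp
    if limit is not None:
        params["limit"] = str(limit)
    if types:
        params["match[type]"] = ",".join(types)  # comma-separated values are ORed
    if ids:
        params["match[id]"] = ",".join(ids)
    if next_token:
        params["next"] = next_token
    if not params:
        return objects_endpoint
    # Keep brackets and commas readable; everything else is percent-encoded.
    return objects_endpoint + "?" + urlencode(params, safe="[],")


print(objects_query_url(
    "http://localhost:5000/trustgroup1/collections/"
    "91a7b528-80eb-42ed-a74d-c6fbd5a26116/objects/",
    types=["indicator", "malware"],
    limit=2,
))
```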

In the next post I’ll show you how this filtering works with more hands-on examples.


TAXII 2.1 Certification (Virtual and In Person)

The content used in this post is a small subset of the material used in our full TAXII 2.1 training.

If you want to join a select group of certified TAXII 2.1 professionals, subscribe to our newsletter below to be notified of new course dates.





Never miss an update


Sign up to receive new articles in your inbox as they are published.