Posted by:

David Greenwood, Chief of Signal

In this post I will show you how to turn unstructured data into structured threat intelligence with ATT&CK context.

Note: this tutorial is written for MITRE ATT&CK version 11.0 (published on 2022-04-24) and TRAM v1.2.0 (released on 2022-06-04). Some of the concepts discussed may not apply to other versions.

A significant amount of the intelligence available today is still shared through blogs, advisories and research articles, which require further processing to make them machine readable (and usable by SIEM and SOAR tools).

Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.

MITRE’s Threat Report ATT&CK Mapping (TRAM) uses Natural Language Processing (NLP) and Artificial Intelligence / Machine Learning (AI/ML) to map Threat Reports to MITRE ATT&CK Techniques.

Here is a nice overview of the ATT&CK Navigator presented by MITRE:

Install and run

1. Download the required repositories

The TRAM code is open source and available on GitHub.

The documentation for TRAM is here.

First, clone the latest version of the repository:

git clone https://github.com/center-for-threat-informed-defense/tram/
cd tram/docker

2. Build and run

docker-compose up

Now open up a browser and navigate to localhost:8000.

If you are using the default docker-compose.yml, the default username and password are set by the variables DJANGO_SUPERUSER_USERNAME and DJANGO_SUPERUSER_PASSWORD (user: djangoSuperuser, password: LEGITPassword1234).

TRAM Workflow

The general TRAM workflow is:

  1. A Threat Report is added to the job processing queue
  2. TRAM breaks the Threat Report into Sentences
  3. The AI/ML model proposes ATT&CK Techniques on a per-sentence basis
  4. Someone (e.g. an analyst) edits and confirms the mappings
  5. (Optional) The mappings can be exported to support other workflows
  6. (Optional) The AI/ML model can be retrained on confirmed mappings
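To make the data flow in steps 2 to 4 a little more concrete, here is a minimal sketch in Python. It is not TRAM's code: the regex sentence splitter, the `KEYWORD_TO_TECHNIQUE` lookup (a stand-in for the trained model) and the `Sentence` class are all hypothetical names I have made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword table standing in for TRAM's trained model.
KEYWORD_TO_TECHNIQUE = {
    "spearphishing": "T1566",   # Phishing
    "scheduled task": "T1053",  # Scheduled Task/Job
}


@dataclass
class Sentence:
    text: str
    order: int
    mappings: List[str] = field(default_factory=list)  # proposed ATT&CK IDs
    disposition: Optional[str] = None  # stays None until an analyst reviews it


def split_sentences(report_text: str) -> List[Sentence]:
    """Step 2: naive split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report_text.strip())
    return [Sentence(text=part, order=i) for i, part in enumerate(parts) if part]


def propose_techniques(sentence: Sentence) -> Sentence:
    """Step 3: propose Techniques for one sentence (keyword stand-in)."""
    lowered = sentence.text.lower()
    sentence.mappings = [
        technique_id
        for keyword, technique_id in KEYWORD_TO_TECHNIQUE.items()
        if keyword in lowered
    ]
    return sentence


def accept(sentence: Sentence) -> None:
    """Step 4: the analyst confirms the mappings for a sentence."""
    sentence.disposition = "accept"


report = (
    "APT39 used spearphishing emails. "
    "The group created a scheduled task for persistence. "
    "It targets the telecommunications sector."
)

sentences = [propose_techniques(s) for s in split_sentences(report)]
for s in sentences:
    accept(s)
    print(s.order, s.mappings, s.disposition)
```

The point of the sketch is the shape of the data: every sentence carries its own list of proposed mappings and its own review state, which is why the review step later in this post works sentence by sentence.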

MITRE TRAM Upload report

On first install you will see some training data listed (Bootstrap Training Data). I will go ahead and upload my own report.

To upload via the UI, simply click the “Upload Report” button in the navigation bar.

For this demo I will use the following FireEye report: “APT39: An Iranian Cyber Espionage Group Focused on Personal Information”.

MITRE have created a .pdf copy I can use.

Pro-tip: Many of you will want to use TRAM with web content. TRAM can be used with HTML content. All you need to do is export the webpage (in Chrome; File > Save Page As), and then upload the .html file to TRAM. If the website uses lots of JavaScript (and other complexity) the results can vary (as TRAM cannot always parse the HTML tags correctly). In these cases, I would recommend exporting the page as a .pdf and using that with TRAM instead (in Chrome; Print > Destination = Save as PDF).
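If you are unsure whether a saved .html file is clean enough to upload, one option is to preview the text that survives once tags and scripts are stripped. The sketch below uses only Python's standard library. It is not how TRAM parses HTML, and `VisibleTextExtractor` and `html_to_text` are names I have made up for this example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of script-like tags."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text content of an HTML string, one chunk per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = """<html><head><title>Report</title><style>p{color:red}</style></head>
<body><script>var x = 1;</script>
<p>APT39 targets the telecommunications sector.</p></body></html>"""

text = html_to_text(sample)
print(text)
```

If the preview is mostly navigation menus and cookie banners rather than report text, that is a good sign the PDF route will give TRAM cleaner sentences to work with.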

Once TRAM has finished processing the report and comparing the sentences extracted against the default model, its status will change from “Queued” to “Reviewing”.

MITRE TRAM report reviewing

You can see TRAM has extracted a total of 32 ATT&CK Techniques for review. I can review them by clicking “Analyze”.

MITRE TRAM report extraction

You can now see each extracted sentence on the left-hand side. Where a number is shown in the left column, it is the number of Techniques TRAM has detected for that sentence (there may be more than one).

Clicking a sentence where a Technique has been extracted shows the Technique(s) identified and the confidence score of the extraction.

I can now review the accuracy manually, with one of three outcomes:

  1. I agree with the extraction and want to assign the Technique(s) to the sentence by clicking “Accepted”. You will see the row turn green.
  2. I disagree with the extraction and want to remove it by clicking the red no-entry button. This removes the Technique from the sentence.
  3. I want to add an additional Technique to the sentence by clicking the green “Add Mapping” button and selecting the Object I want to add. The TRAM model supports Technique mappings by default, however, this manual option also allows you to link any ATT&CK Object (Groups, Software, etc.) to the sentence. In the screenshot below, I am adding the Group G0087 to a sentence.

MITRE TRAM add group

Once all the mappings have been reviewed and accepted you can close the report.

MITRE TRAM report reviewed

Note: if any sentence has not been reviewed and accepted (unreviewed sentences remain yellow), including sentences with no mappings, the report will stay in the “Reviewing” state.

MITRE TRAM export results

You can now export the data in either .docx or .json format, using the “Export” button (not the “Download” button, which simply exports the original upload).

The .docx structure is as follows:

MITRE TRAM Word Export

The .json structure is more verbose, as it includes the original text as well as the ATT&CK mappings. Each sentence and its mappings are represented like so:

  "sentences": [
    {
      "id": 13568,
      "text": " \n \n \n \n \nAPT39: An Iranian Cyber \nEspionage Group Focused on \nPersonal Information \n \n \n \n \n \n \nJanuary 29, 2019\t\n\t\nSarah\tHawley,\tBen\tRead,\tCristiana\tBrafman-Kittner,\tNalani\tFraser,\tAndrew\t\nThompson,\tYuri\tRozhansky,\tSanaz\tYashar\t\n \n In December 2018, FireEye identified APT39 as an Iranian cyber espionage group \nresponsible for widespread theft of personal information.",
      "order": 0,
      "disposition": "accept",
      "mappings": [
        {
          "id": 1785,
          "attack_id": "G0087",
          "name": "APT39",
          "confidence": "100.0"
        }
      ]
    },

In this example the sentence (text) is linked to the ATT&CK ID G0087 ("attack_id": "G0087").
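As a rough sketch of how this export could be consumed, the Python below lists every accepted mapping in a report. The field names come from the snippet above, but the inline sample is shortened and its `id` values are invented, and I am assuming `sentences` sits at the top level of the exported object. `accepted_mappings` is a helper name of my own.

```python
import json

# Shortened, made-up sample that mirrors the structure of a TRAM .json export.
export_json = """
{
  "sentences": [
    {
      "id": 1,
      "text": "FireEye identified APT39 as an Iranian cyber espionage group.",
      "order": 0,
      "disposition": "accept",
      "mappings": [
        {"id": 10, "attack_id": "G0087", "name": "APT39", "confidence": "100.0"}
      ]
    },
    {
      "id": 2,
      "text": "A sentence with no mappings.",
      "order": 1,
      "disposition": "accept",
      "mappings": []
    }
  ]
}
"""


def accepted_mappings(export: dict) -> list:
    """Return (attack_id, name, confidence, sentence order) for accepted sentences."""
    rows = []
    for sentence in export.get("sentences", []):
        if sentence.get("disposition") != "accept":
            continue  # skip anything an analyst has not confirmed
        for mapping in sentence.get("mappings", []):
            rows.append(
                (
                    mapping["attack_id"],
                    mapping["name"],
                    float(mapping["confidence"]),  # exported as a string
                    sentence["order"],
                )
            )
    return rows


rows = accepted_mappings(json.loads(export_json))
for row in rows:
    print(row)
```

Note that the confidence value is exported as a string, so it needs converting before you can sort or filter on it.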

Here is the full .json for my report showing the full structure.

The exported data can then be used in other tools, for example, Navigator. In Navigator you could create a new layer representing this report, and assign the Objects detected by TRAM.
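Here is a minimal sketch of what building such a layer could look like in Python. The `build_layer` helper is hypothetical, and the values under `versions` are my assumption for a Navigator release that supports ATT&CK v11, so adjust them to match the Navigator you are running. Only Technique IDs are kept, because a Navigator layer scores Techniques rather than Groups or Software.

```python
import json
import re

# Matches Technique and sub-Technique IDs such as T1566 or T1566.001.
TECHNIQUE_ID = re.compile(r"^T\d{4}(\.\d{3})?$")


def build_layer(name: str, attack_ids: list) -> dict:
    """Build a minimal Navigator layer dict from a list of ATT&CK IDs."""
    technique_ids = sorted({a for a in attack_ids if TECHNIQUE_ID.match(a)})
    return {
        "name": name,
        # Assumed version values; change them to suit your Navigator install.
        "versions": {"attack": "11", "navigator": "4.6.4", "layer": "4.3"},
        "domain": "enterprise-attack",
        "description": "Techniques confirmed in TRAM",
        "techniques": [
            {"techniqueID": tid, "score": 1, "comment": "Mapped by TRAM"}
            for tid in technique_ids
        ],
    }


# G0087 is a Group, so it is filtered out; the duplicate T1566 is collapsed.
layer = build_layer("APT39 report", ["G0087", "T1566", "T1053", "T1566"])
print(json.dumps(layer, indent=2))
```

Saving the printed JSON to a file and loading it with Navigator's “Open Existing Layer” option should highlight the listed Techniques.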

TRAM Machine Learning Models

TRAM has four machine learning models that can be used out-of-the-box:

MITRE TRAM ML Mapping

All ML models are implemented as an SKLearn Pipeline.
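If you have not used an SKLearn Pipeline before, the sketch below shows the general idea: a text vectoriser followed by a classifier, trained on sentences labelled with Technique IDs. This is not TRAM's own model code. The step names, the six training sentences and the labels are made up for illustration, and a real model needs far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny, invented training set: sentence -> ATT&CK Technique ID.
train_sentences = [
    "The actor sent spearphishing emails with malicious attachments.",
    "Victims received a phishing email containing a malicious link.",
    "Spearphishing messages delivered a weaponised document.",
    "The malware created a scheduled task to maintain persistence.",
    "A scheduled task was registered to run the payload at logon.",
    "Persistence was achieved through a cron job and scheduled tasks.",
]
train_labels = ["T1566", "T1566", "T1566", "T1053", "T1053", "T1053"]

# Step 1 turns text into word counts; step 2 classifies those counts.
pipeline = Pipeline(
    [
        ("features", CountVectorizer(lowercase=True, stop_words="english")),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)
pipeline.fit(train_sentences, train_labels)

# Score one unseen sentence against every Technique the model knows.
sentence = "APT39 leveraged spearphishing emails with malicious attachments."
probabilities = pipeline.predict_proba([sentence])[0]
classes = pipeline.named_steps["clf"].classes_

for technique_id, probability in zip(classes, probabilities):
    print(technique_id, f"{probability * 100:.1f}%")
```

The per-class probability is the kind of value that can be surfaced as a confidence score next to each proposed Technique.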

MITRE TRAM LogisticRegressionModel Accuracy

The AI/ML models are retrained on confirmed mappings.

It is also possible to add your own models to the SKLearn Pipeline. I will not cover that in this tutorial. You can read the following section of the TRAM documentation for more information about how this can be achieved.

Other notable cyber threat intelligence NLP projects

The field of NLP/ML is an exciting one in cyber threat intelligence, though it has some way to go.

Whilst not ATT&CK specific, I did want to take this opportunity to share some similar tools that look at extracting intelligence from relatively unstructured data sources.

Unstructured Threat Intelligence Processing Tool (UTIP)

One of the first fully formed projects I found was Unstructured Threat Intelligence Processing Tool (UTIP).

UTIP appears to be built by a team at Accenture and was presented at Black Hat in 2015 (though I couldn’t find the session listed publicly, nor find the recording).

Unstructured Threat Intelligence Processing Tool

From the slides you can see information extraction is done on uploaded reports. The logic for extraction (and confidence in extraction) comes from an NLP component (sadly, not much I can deduce from the slides about this) before the data is packaged into STIX 1.0.

Death to the IOC: What’s Next in Threat Intelligence

Another Black Hat talk, this time from 2019: Death to the IOC: What’s Next in Threat Intelligence.

The talk describes the process of building a Cyber Entity Extractor and the evaluation of various models on real world data to do so.

The slides from the talk are available here.

Making Sense of Unstructured Threat Data

This project applies a Doc2Vec classification methodology to cluster vulnerability descriptions from the NVD and map each cluster to a specific ATT&CK technique.

TTPDrill

In 2017 a conference paper was released with the title: TTPDrill: Automatic and Accurate Extraction of Threat Actions from Unstructured Text of CTI Sources.

TTPDrill goes further than UTIP, as it maps extracted TTPs to other standardised frameworks including ATT&CK and the Kill Chain.

The project appears to have since been abandoned; the last commit on GitHub was 2 years ago.





