So that it can be used in SOC detection workflows.

Security operations teams have become increasingly proficient at using structured threat intelligence to enrich alerts and to accelerate investigation and threat hunting workflows.

Threat intelligence platforms have largely automated the process of collecting, extracting and normalizing intel from structured data sources.

However, a significant amount of intelligence available today is still shared through blogs, advisories and research articles, which requires further processing to make it machine readable (and usable by SIEM and SOAR tools).

In Stixify and Obstracts we currently rely on keyword matching, for example creating a STIX 2.1 object from a keyword or regular expression match.
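To make the idea concrete, here is a minimal sketch of regular expression matching feeding a STIX 2.1 object. It is not how Stixify or Obstracts are implemented. The `IPV4_RE` pattern, the `indicators_from_text` function and the sample report text are all assumptions for illustration, and the output is a plain dictionary carrying the required properties of a STIX 2.1 Indicator rather than anything produced by a STIX library.

```python
import re
import uuid
from datetime import datetime, timezone

# Simple IPv4 pattern (assumption): it does not validate octet ranges.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def indicators_from_text(text):
    """Return one STIX 2.1 style indicator dict per unique IPv4 match."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    indicators = []
    for ip in sorted(set(IPV4_RE.findall(text))):
        indicators.append({
            "type": "indicator",
            "spec_version": "2.1",
            "id": f"indicator--{uuid.uuid4()}",
            "created": now,
            "modified": now,
            "name": f"ipv4: {ip}",
            "pattern": f"[ipv4-addr:value = '{ip}']",
            "pattern_type": "stix",
            "valid_from": now,
        })
    return indicators


# Made-up report sentence using documentation-range IP addresses.
report = "The implant beacons to 198.51.100.7 and later to 203.0.113.9."
for indicator in indicators_from_text(report):
    print(indicator["pattern"])
```

This works well for values with a rigid shape (IP addresses, hashes, CVE IDs), which is exactly why it struggles with free-form descriptions of behaviour.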

The problem with relying on such logic is that reports are often very descriptive. For example, different authors describe the same behaviours and techniques with differing language.
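A small, hypothetical illustration of that problem: the keyword list and both sentences below are invented for this example, but they show how two descriptions of the same behaviour can produce different results under simple keyword matching.

```python
# Hypothetical keyword list for one behaviour (credential dumping).
KEYWORDS = ["credential dumping", "lsass"]

# Two invented sentences describing the same behaviour in different words.
sentences = [
    "The actor performed credential dumping against the domain controller.",
    "Passwords were harvested from process memory on the domain controller.",
]


def keyword_hit(sentence, keywords=KEYWORDS):
    """Return True if any keyword appears in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return any(keyword in lowered for keyword in keywords)


for sentence in sentences:
    print(keyword_hit(sentence), "-", sentence)
```

The first sentence matches and the second does not, even though an analyst would read both as the same technique. Adding more keywords helps only until the next author phrases it differently.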

Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.

There is a lot of academic research into NLP. In other fields this research has been widely commercialised, and we’re now starting to see it being adopted in the field of threat intelligence.

I’ve recently been doing a bit of research into open tools that turn unstructured intelligence into a structured format, like STIX 2.1, using NLP.

Unstructured Threat Intelligence Processing Tool (UTIP)

One of the first fully formed projects I found was the Unstructured Threat Intelligence Processing Tool (UTIP).

UTIP appears to have been built by a team at Accenture and was presented at Black Hat in 2015 (though I couldn’t find the session listed publicly, nor could I find the recording).

From the slides you can see that information extraction is performed on uploaded reports. The logic for extraction (and the confidence in each extraction) comes from an NLP component before the data is packaged into STIX 1.0. Sadly, there is not much I can deduce about the NLP component from the slides.

TTPDrill

In 2017 a conference paper was released with the title TTPDrill: Automatic and Accurate Extraction of Threat Actions from Unstructured Text of CTI Sources.

TTPDrill builds on UTIP in that it maps extracted TTPs to standardised frameworks, including ATT&CK and the Kill Chain.

The project appears to have since been abandoned; the last commit on GitHub was 2 years ago.
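To give a feel for what mapping an extracted action to a framework involves, here is a deliberately simplified sketch. It is not the method from the TTPDrill paper: the `ONTOLOGY` descriptions are invented, and plain token overlap (Jaccard similarity) stands in for whatever ranking a real system would use. Only the ATT&CK technique IDs are real.

```python
import re

# Hypothetical mini-ontology: ATT&CK technique ID -> words describing the action.
ONTOLOGY = {
    "T1003": "dump credentials passwords hashes memory lsass",
    "T1566": "send phishing email attachment link victim",
    "T1059": "execute run command script powershell shell",
}


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def map_action(action, ontology=ONTOLOGY):
    """Return (technique_id, score) for the entry with the highest Jaccard overlap.

    Returns (None, 0.0) when the action shares no tokens with any entry.
    """
    action_tokens = tokens(action)
    best_id, best_score = None, 0.0
    for technique_id, description in ontology.items():
        desc_tokens = tokens(description)
        union = action_tokens | desc_tokens
        score = len(action_tokens & desc_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = technique_id, score
    return best_id, best_score


print(map_action("attackers run a powershell script"))
print(map_action("dump password hashes from memory"))
```

The score doubles as a rough confidence value, which is useful for deciding when a mapping should be sent to an analyst for review instead of being accepted automatically.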

Threat Report ATT&CK Mapper (TRAM)

The most developed (and open) tool I found was MITRE’s Threat Report ATT&CK Mapper (TRAM).

TRAM provides a streamlined approach for analysing reports and extracting the ATT&CK techniques they describe.

You can find the TRAM repository on GitHub here.

Death to the IOC: What’s Next in Threat Intelligence

Another Black Hat talk, this time from 2019: Death to the IOC: What’s Next in Threat Intelligence.

The talk describes the process of building a Cyber Entity Extractor and the evaluation of various models on real world data to do so.

The slides from the talk are available here.

Making Sense of Unstructured Threat Data

This research applies a Doc2Vec classification methodology to cluster vulnerability descriptions from the NVD and map each cluster to a specific ATT&CK technique.
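The general shape of that pipeline (turn each description into a vector, then group similar vectors under a technique) can be sketched without any machine learning library. To be clear about the substitution: the code below uses simple term-frequency vectors and cosine similarity as a crude stand-in for Doc2Vec embeddings, and it groups descriptions by their nearest labelled seed instead of running a clustering algorithm. The seed texts and the sample descriptions are invented. Only the ATT&CK technique IDs are real.

```python
import math
import re
from collections import Counter


def vectorise(text):
    """Term-frequency vector; a crude stand-in for a Doc2Vec embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical seed descriptions, each labelled with an ATT&CK technique ID.
SEEDS = {
    "T1190": "remote attacker exploits public facing web application via crafted request",
    "T1068": "local user gains elevated privileges via kernel driver flaw",
}


def nearest_technique(description, seeds=SEEDS):
    """Return the technique ID whose seed text is most similar to the description."""
    vec = vectorise(description)
    scores = {tid: cosine(vec, vectorise(text)) for tid, text in seeds.items()}
    return max(scores, key=scores.get)


# Invented vulnerability descriptions written in the style of NVD entries.
descriptions = [
    "A crafted HTTP request lets a remote attacker exploit the web application.",
    "A flaw in the kernel driver allows a local user to gain elevated privileges.",
]

clusters = {}
for description in descriptions:
    clusters.setdefault(nearest_technique(description), []).append(description)

for technique_id, members in clusters.items():
    print(technique_id, len(members))
```

A learned embedding such as Doc2Vec should cope better with paraphrasing than raw word counts, which is the main reason to prefer it over this stand-in.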

Other notable research



