So that it can be used in SOC detection workflows.
Security operations teams have become increasingly proficient at using structured threat intelligence to enrich alerts and to accelerate investigation and threat hunting workflows.
Threat intelligence platforms have largely automated the process of collecting, extracting and normalizing intel from structured data sources.
However, a significant amount of the intelligence available today is still shared through blogs, advisories and research articles, and it requires further processing to become machine readable (and usable by SIEM and SOAR tools).
The problem with relying on simple processing logic is that reports are often very descriptive. For example, different authors describe the same behaviours and techniques in different language.
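To illustrate the limitation, here is a minimal sketch of pattern-based extraction. The regular expressions and the `extract_iocs` helper are hypothetical, written for this example rather than taken from any tool discussed in this post, and both report sentences are invented. Patterns find atomic indicators well enough, but a behaviour described in prose gives them nothing to match.

```python
import re

# Illustrative patterns only (an assumption for this sketch). Real extractors
# also handle defanged values, IPv6, domains, other hash types and so on.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def extract_iocs(text):
    """Return a dict mapping each indicator type to its sorted unique matches."""
    return {name: sorted(set(p.findall(text))) for name, p in IOC_PATTERNS.items()}


# Two invented sentences describing the same credential theft behaviour.
report_a = ("The actor exploited CVE-2021-44228, then dumped credentials "
            "from LSASS and beaconed to 198.51.100.7.")
report_b = "After harvesting passwords out of memory, the implant called home."

print(extract_iocs(report_a))  # finds the CVE and the IP address
print(extract_iocs(report_b))  # every list is empty
```

Neither result captures the credential dumping itself, which is the part a detection engineer would most want in structured form.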
Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
There is a lot of academic research into NLP. In other fields this research has been widely commercialised, and we’re now starting to see it adopted in the field of threat intelligence.
I’ve recently been doing a bit of research into open tools that turn unstructured intelligence into a structured format, like STIX 2.1, using NLP.
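For readers unfamiliar with the target format, here is a minimal sketch of what a single extracted value might look like as a STIX 2.1 Indicator, built as a plain dictionary with the standard library. The `make_indicator` helper, the example IP address and the `name` text are assumptions for illustration; a real pipeline would more likely use a dedicated STIX library and wrap the objects in a bundle.

```python
import json
import uuid
from datetime import datetime, timezone


def make_indicator(ipv4, source_name):
    """Build a STIX 2.1 Indicator (as a plain dict) for one IPv4 address."""
    # STIX timestamps are RFC 3339 in UTC; trim microseconds to milliseconds.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": f"IPv4 address mentioned in {source_name}",
        "pattern": f"[ipv4-addr:value = '{ipv4}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


# Hypothetical input: an address pulled out of a blog post.
indicator = make_indicator("198.51.100.7", "an example blog post")
print(json.dumps(indicator, indent=2))
```

The hard part the tools below try to solve is not emitting this JSON, but deciding from free text which objects and relationships should be emitted in the first place.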
Unstructured Threat Intelligence Processing Tool (UTIP)
One of the first fully formed projects I found was the Unstructured Threat Intelligence Processing Tool (UTIP).
UTIP appears to have been built by a team at Accenture and was presented at Black Hat in 2015 (though I couldn’t find the session listed publicly, nor the recording).
From the slides you can see that information extraction is done on uploaded reports. The logic for extraction (and the confidence in each extraction) comes from an NLP component (sadly, I can’t deduce much about it from the slides) before the data is packaged into STIX 1.0.
In 2017 a conference paper was released with the title TTPDrill: Automatic and Accurate Extraction of Threat Actions from Unstructured Text of CTI Sources.
TTPDrill builds on UTIP by mapping extracted TTPs to other standardised frameworks, including ATT&CK and the Kill Chain.
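To make the idea of mapping text to frameworks concrete, here is a toy sketch that scores a sentence against a tiny knowledge base by keyword overlap. This is not the method from the paper, which I have not reproduced here. The ATT&CK IDs and names are real, but the keyword sets, the Kill Chain phase assignments and the `map_sentence` helper are assumptions made up for this example.

```python
import re

# Hypothetical mini knowledge base. Keyword sets and Kill Chain phase
# assignments are illustrative assumptions, not an official mapping.
TECHNIQUES = [
    {"id": "T1566", "name": "Phishing", "phase": "Delivery",
     "keywords": {"phishing", "email", "attachment", "lure"}},
    {"id": "T1059", "name": "Command and Scripting Interpreter",
     "phase": "Exploitation",
     "keywords": {"powershell", "script", "command", "shell"}},
    {"id": "T1003", "name": "OS Credential Dumping",
     "phase": "Actions on Objectives",
     "keywords": {"credentials", "lsass", "dumped", "passwords"}},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def map_sentence(sentence):
    """Return the technique with the largest keyword overlap, or None."""
    tokens = tokenize(sentence)
    best, best_score = None, 0
    for technique in TECHNIQUES:
        score = len(tokens & technique["keywords"])
        if score > best_score:
            best, best_score = technique, score
    if best is None:
        return None
    return {"technique_id": best["id"], "name": best["name"],
            "kill_chain_phase": best["phase"], "score": best_score}


print(map_sentence("The actor sent a phishing email with a malicious attachment."))
print(map_sentence("Attackers dumped credentials from LSASS memory."))
print(map_sentence("The implant called home."))  # None: no keyword overlaps
```

Keyword overlap breaks down for exactly the reason described earlier (authors phrase the same behaviour differently), which is why the research in this area leans on ontologies, retrieval scoring and trained models instead.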
The project appears to have since been abandoned; the last commit on GitHub was 2 years ago.
Threat Report ATT&CK Mapper (TRAM)
The most developed (and open) tool I found was MITRE’s Threat Report ATT&CK Mapper (TRAM).
TRAM provides a streamlined approach for analysing reports and extracting ATT&CK techniques from them.
Death to the IOC: What’s Next in Threat Intelligence
Another Black Hat talk, this time from 2019, titled Death to the IOC: What’s Next in Threat Intelligence.
The talk describes the process of building a Cyber Entity Extractor, including the evaluation of various models on real-world data.
Making Sense of Unstructured Threat Data
This research applies a Doc2Vec classification methodology to cluster vulnerability descriptions from the NVD and map each cluster to a specific ATT&CK technique.
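The general shape of that approach (embed descriptions, group similar ones, label each group with a technique) can be sketched with the standard library alone. Note the swap: this uses plain word-count vectors and cosine similarity as a stand-in for Doc2Vec embeddings, and a simple greedy grouping rather than whatever clustering the authors used. The descriptions are invented rather than real NVD entries, and the cluster labels are hand-assigned assumptions (the two ATT&CK IDs themselves are real).

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Word-count vector; a crude stand-in for a learned Doc2Vec embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed document is
    similar enough, otherwise start a new cluster."""
    vectors = [vectorize(t) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Invented descriptions written in the style of vulnerability summaries.
descriptions = [
    "SQL injection in the login form allows remote attackers to execute arbitrary SQL commands",
    "SQL injection in the search parameter allows remote attackers to execute arbitrary SQL commands",
    "Buffer overflow in the image parser allows local users to gain privileges",
    "Heap buffer overflow in the font parser allows local users to gain privileges",
]

clusters = cluster(descriptions)

# Hand-assigned labels per cluster index (an assumption for this sketch).
labels = {0: "T1190 Exploit Public-Facing Application",
          1: "T1068 Exploitation for Privilege Escalation"}

for idx, members in enumerate(clusters):
    print(labels.get(idx, "unlabelled"), members)
```

Word counts only group descriptions that share vocabulary; the appeal of learned embeddings such as Doc2Vec is that they can also place differently worded descriptions of the same weakness near each other.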
Other notable research
- Semi-Automated Information Extraction from Unstructured Threat Advisories (2017)
- A Supervised Machine Learning Based Approach for Automatically Extracting High-Level Threat Intelligence from Unstructured Sources (2018)
- Threat Action Extraction using Information Retrieval (2021)