Posted by: David Greenwood, Chief of Signal

If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.

In this post I will show you the design decisions that went into building the cve2stix API (and why I didn’t choose TAXII).

Any long-time follower of this blog will know that I am a big fan of open standards, TAXII being one (if you missed my tutorial from a few years ago, check it out here).

As such, when it came to designing an API for cve2stix, TAXII was my starting point.

Then I took a step back to ask: what do users actually want?

The recently launched v2 of the NVD API has clearly incorporated a lot of feedback from v1 users into its new design. I saw similar use cases among cve2stix users:

  • retrieve information about a specific CVE using its ID
    • e.g. show me everything about CVE-XXXX
  • retrieve CVEs based on a certain match criteria
    • e.g. pass a CPE URI for a product they are using and have the API return a list of CVE matches
    • e.g. show me all CVEs linked to a certain weakness (CWE)
  • retrieve information about a product (CPE) using its ID
    • e.g. show me all product versions (CPE URIs) for Apple iOS

Choosing a pure TAXII API would have made ingestion of data from down/up-stream tools much simpler: a user could simply plug in their credentials and the product would natively understand how the TAXII server is structured to access the data. However, for the use cases above, the TAXII API design is not particularly well suited.

The ability to filter results using the default TAXII parameters is very limited, meaning use cases beyond a simple backfill are not achievable without adding some customisation (which would defeat a large part of the purpose of choosing TAXII in the first place).

As such, I decided to design the endpoints from scratch and came up with the following:

  • <HOST>/api/<VERSION>/cve
    • <HOST>/api/<VERSION>/cve/<CVE_ID>

CVE Endpoint

I decided the granularity of the parameters offered by NVD was more than most people need, but that the following would be critical on the root CVE endpoint for anyone browsing CVEs:

  • cpeMatchString [n]: allows for a full or partial CPE URI to be passed. Will return all CVEs that contain it (whether or not the product is vulnerable to the CVE)
    • e.g. cpe:2.3:a:apple
  • cweId [n]: allows for a CWE ID to be passed. Will return all CVE IDs linked to this CWE
    • e.g. CWE-79
  • capecID [n]: allows for a CAPEC ID to be passed. Will return all CVE IDs linked to this CAPEC
    • e.g. CAPEC-509
  • attackID [n]: allows for an ATT&CK ID (Technique) to be passed. Will return all CVE IDs linked to this ATT&CK technique
    • e.g. T1583
  • keywordSearch: allows for a string to be passed. The title and description of CVEs will be searched for the string as a wildcard. For example, keywordSearch=Windows is treated as the wildcard pattern *windows*.
    • e.g. windows
  • lastModifiedStartDate: the earliest modified date (in the format YYYY-MM-DDTHH:MM:SS) of a CVE you want returned. All timestamps are UTC.
    • e.g. 1988-11-11T05:00:00
  • lastModifiedEndDate: the latest modified date of a CVE you want returned
  • createdStartDate: the earliest created date of a CVE you want returned
  • createdEndDate: the latest created date of a CVE you want returned
  • page: the page you want results for.
    • e.g. 11

Multiple values can be passed for the parameters above marked with [n]. They are always combined with an AND operator. For example, ?capecID=79&capecID=55&attackID=T1583 would only return CVEs that are linked to all three objects.

The response from this endpoint is designed to be minimal to reduce the initial payload size, as follows:
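To make the parameter handling concrete, here is a minimal sketch of how a client might build a search URL with repeated [n] parameters. The host, the version segment and the helper name build_cve_query are placeholders of my own for illustration (the post only specifies <HOST>/api/<VERSION>/cve), and ID values are passed through unchanged, so use whatever form the API expects.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical placeholder: the real <HOST> and <VERSION> are not given here.
BASE_URL = "https://cve2stix.example/api/v1/cve"


def build_cve_query(base_url, *, cpe_match=(), cwe_ids=(), capec_ids=(),
                    attack_ids=(), keyword=None, modified_since=None, page=None):
    """Return a search URL. Repeated [n] parameters are ANDed by the API."""
    params = []
    params += [("cpeMatchString", value) for value in cpe_match]
    params += [("cweId", value) for value in cwe_ids]
    params += [("capecID", value) for value in capec_ids]
    params += [("attackID", value) for value in attack_ids]
    if keyword is not None:
        params.append(("keywordSearch", keyword))
    if modified_since is not None:
        # Expects a timezone-aware datetime; it is converted to UTC and
        # rendered as YYYY-MM-DDTHH:MM:SS.
        stamp = modified_since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
        params.append(("lastModifiedStartDate", stamp))
    if page is not None:
        params.append(("page", str(page)))
    # urlencode percent-encodes reserved characters such as ':'.
    return f"{base_url}?{urlencode(params)}" if params else base_url


url = build_cve_query(
    BASE_URL,
    capec_ids=["CAPEC-509"],
    attack_ids=["T1583"],
    modified_since=datetime(1988, 11, 11, 5, 0, 0, tzinfo=timezone.utc),
)
print(url)
```

Passing a list of key/value pairs (rather than a dict) is what allows the same parameter name to appear more than once in the query string.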

{
    "_metadata": {
        "timestamp": "<TIMESTAMP OF REQUEST>",
        "parameters": "<FILTERING PARAMETERS>",
        "page_number": "<CURRENT PAGE, 0-INDEXED>",
        "results_per_page": "<ALWAYS 50>",
        "cve_page_count": "<COUNT OF RESULTS ON THIS PAGE>",
        "total_cve_count": "<TOTAL COUNT OF RESULTS ON ALL PAGES>",
        "links": [
            {"self": "<PATH>"},
            {"first": "<PATH>"},
            {"previous": "<PATH>"},
            {"next": "<PATH>"},
            {"last": "<PATH>"}
        ]
    },
    "vulnerabilities": [
        {
            "cve": {
                "id": "<VULNERABILITY OBJECT - external_references.external_id>",
                "description": "<VULNERABILITY OBJECT - description>",
                "created": "<VULNERABILITY OBJECT - created>",
                "modified": "<VULNERABILITY OBJECT - modified>"
            }
        },
        {
            "cve": {
                "id": "<VULNERABILITY OBJECT - external_references.external_id>",
                "description": "<VULNERABILITY OBJECT - description>",
                "created": "<VULNERABILITY OBJECT - created>",
                "modified": "<VULNERABILITY OBJECT - modified>"
            }
        },
        ...
    ]
}
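As a rough illustration of how a client might consume this response, the sketch below pulls out the CVE IDs, works out how many pages exist, and finds the next link. The sample values and CVE IDs are invented, the helper names are my own, and I am assuming the numeric fields can be parsed with int() whether they arrive as numbers or strings.

```python
import math

# Invented sample data shaped like the response described above.
sample_page = {
    "_metadata": {
        "page_number": 0,
        "results_per_page": 50,
        "cve_page_count": 2,
        "total_cve_count": 120,
        "links": [
            {"self": "/api/v1/cve?page=0"},
            {"next": "/api/v1/cve?page=1"},
        ],
    },
    "vulnerabilities": [
        {"cve": {"id": "CVE-0000-0001", "description": "placeholder"}},
        {"cve": {"id": "CVE-0000-0002", "description": "placeholder"}},
    ],
}


def flatten_links(metadata):
    """Merge the list of single-key link objects into one dict."""
    merged = {}
    for entry in metadata.get("links", []):
        merged.update(entry)
    return merged


def summarise_page(page):
    """Return the CVE IDs on this page, the total page count and the next link."""
    meta = page["_metadata"]
    total = int(meta["total_cve_count"])
    per_page = int(meta["results_per_page"])
    return {
        "ids": [item["cve"]["id"] for item in page["vulnerabilities"]],
        "total_pages": math.ceil(total / per_page),
        "next": flatten_links(meta).get("next"),  # None on the last page
    }


summary = summarise_page(sample_page)
print(summary)
```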

cve2stix API responses are always sorted by pubDate, newest first.

The idea is that the user first finds the CVEs they are interested in based on their search. Once they have a list of CVE IDs in the response, they can then pass these individually to the <HOST>/api/<VERSION>/cve/<CVE_ID> endpoint to get all the STIX Objects related to that CVE.

However, it quickly became clear that this use case would be much more performant if there was some level of grouping of STIX Objects that the backend could use to quickly retrieve the objects for the specific CVE being queried. That’s because there are almost 200,000 CVE records at the time of writing. Considering the way cve2stix creates STIX objects for each CVE, this means there are even more objects that have relationships to each CVE.
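The two-step flow could be sketched roughly as follows. The function names, URLs and CVE IDs here are hypothetical, and the fetch function is injected so the example runs against in-memory data instead of a real server; I am also assuming the per-CVE response nests its STIX objects under an objects key.

```python
def collect_stix_objects(search_url, detail_url_for, fetch_json):
    """Search once, then fetch the STIX objects for each CVE ID found."""
    page = fetch_json(search_url)
    results = {}
    for item in page["vulnerabilities"]:
        cve_id = item["cve"]["id"]
        detail = fetch_json(detail_url_for(cve_id))
        results[cve_id] = detail["objects"]
    return results


# Hypothetical URLs and canned responses standing in for the real API.
SEARCH_URL = "https://cve2stix.example/api/v1/cve?cweId=CWE-79"


def detail_url_for(cve_id):
    return f"https://cve2stix.example/api/v1/cve/{cve_id}"


fake_responses = {
    SEARCH_URL: {
        "vulnerabilities": [
            {"cve": {"id": "CVE-0000-0001"}},
            {"cve": {"id": "CVE-0000-0002"}},
        ]
    },
    detail_url_for("CVE-0000-0001"): {"objects": [{"type": "vulnerability"}, {"type": "grouping"}]},
    detail_url_for("CVE-0000-0002"): {"objects": [{"type": "vulnerability"}]},
}

bundles = collect_stix_objects(SEARCH_URL, detail_url_for, fake_responses.__getitem__)
print({cve_id: len(objects) for cve_id, objects in bundles.items()})
```

In a real client, fetch_json would make an HTTP request and decode the JSON body, and the search step would also need to walk through every page of results.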

STIX Grouping SDOs are perfect for being able to quickly return a list of all objects linked to a specific CVE.

On ingest of a new CVE, cve2stix also creates one Grouping SDO for every CVE (Vulnerability Object) created, as follows:

{
    "type": "grouping",
    "spec_version": "2.1",
    "id": "grouping--<GENERATED BY STIX2 LIBRARY>",
    "created_by_ref": "identity--<CVE2STIX IDENTITY ID>",
    "created": "<vulnerabilities.cve.published>",
    "modified": "<vulnerabilities.cve.lastModifiedDate>",
    "name": "<vulnerabilities.cve.id>",
    "object_refs": [
        "<IDS OF ALL OBJECTS RELATED TO CVE INCLUDING RELATIONSHIPS>"
    ]
}

The Grouping Object means that when a user queries a specific CVE, the backend can first search for the right Grouping Object (by searching its name property). The Grouping Object contains a list of all STIX Objects (listed in the Grouping Object's object_refs property) that are linked to that CVE, and thus need to be returned in the response.

For this endpoint, the following response structure is returned:
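A minimal sketch of that lookup, using plain dictionaries and an in-memory list in place of whatever datastore the backend really uses, might look like this. The function name, the object IDs and the CVE ID are invented for illustration and this is not the actual cve2stix implementation.

```python
def objects_for_cve(cve_id, stix_objects):
    """Find the Grouping named after the CVE and resolve its object_refs."""
    by_id = {obj["id"]: obj for obj in stix_objects}
    grouping = next(
        (obj for obj in stix_objects
         if obj.get("type") == "grouping" and obj.get("name") == cve_id),
        None,
    )
    if grouping is None:
        return []
    # Return the Grouping itself plus every referenced object that exists.
    return [grouping] + [by_id[ref] for ref in grouping["object_refs"] if ref in by_id]


# Invented objects with placeholder IDs.
store = [
    {"type": "vulnerability", "id": "vulnerability--0001", "name": "CVE-0000-0001"},
    {"type": "software", "id": "software--0001"},
    {"type": "relationship", "id": "relationship--0001",
     "source_ref": "vulnerability--0001", "target_ref": "software--0001"},
    {"type": "vulnerability", "id": "vulnerability--0002", "name": "CVE-0000-0002"},
    {"type": "grouping", "id": "grouping--0001", "name": "CVE-0000-0001",
     "object_refs": ["vulnerability--0001", "software--0001", "relationship--0001"]},
]

found = objects_for_cve("CVE-0000-0001", store)
print([obj["id"] for obj in found])
```

The point of the design is that one lookup by name replaces a walk over every relationship that touches the CVE.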

{
    "_metadata": {
        "timestamp": "<TIMESTAMP OF REQUEST>",
        "id": "<VULNERABILITY OBJECT - external_references.external_id>",
        "description": "<VULNERABILITY OBJECT - description>",
        "created": "<VULNERABILITY OBJECT - created>",
        "modified": "<VULNERABILITY OBJECT - modified>"
    },
    "objects": [
        "A LIST OF ALL STIX JSON OBJECTS FOR CVE"
    ]
}

This response is never paginated: a single page contains all STIX Objects for the CVE (including the Grouping and all enrichments), nested under objects.

Authentication

I also wanted to make the service free, but to put some basic security controls in front of it to try to avoid abuse.

All requests must contain an API key in the header (X-API-Key).

There is also a simple rate limit of 500 requests per hour which should suit most of the use cases defined.
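For illustration, a client could attach the key and pace itself under the limit along these lines. This is only a sketch using Python's standard library: the URL, the key and the helper names are placeholders, and spacing requests evenly is just one simple way to stay within 500 requests per hour.

```python
import urllib.request

RATE_LIMIT_PER_HOUR = 500  # the limit stated above


def authed_request(url, api_key):
    """Build a request carrying the API key in the X-API-Key header."""
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def min_seconds_between_requests(limit_per_hour=RATE_LIMIT_PER_HOUR):
    """Spacing that keeps evenly paced requests within the hourly limit."""
    return 3600 / limit_per_hour


# Hypothetical URL and key. Nothing is sent until urlopen() is called.
req = authed_request("https://cve2stix.example/api/v1/cve/CVE-0000-0001", "my-secret-key")

# urllib stores header names with normalised capitalisation ("X-api-key");
# HTTP header names are case-insensitive, so the server sees the same header.
print(req.header_items())
print(min_seconds_between_requests())
```

Sleeping for that interval between calls (time.sleep) is enough for a simple backfill script. Bursty clients would need a smarter scheme.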

For those who run into limits and want to run cve2stix themselves, the code is completely open source under an MIT license here.

Try cve2stix now

Hopefully this post has given you a little insight into why I built cve2stix and how you can use it.

To really understand the power of cve2stix, take a look at the user documentation here.

cve2stix is available to download on Github here.

I hope you find it useful, and I am always very happy to receive feedback either via Github issues, or directly via our Slack community.



