Posted by: David Greenwood, Chief of Signal

If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly. Please view the post on signalscorps.com for the full interactive viewing experience.

In this post I will show you the architecture of the hosted version of cve2stix.

I set out with the intention of making cve2stix free (or as close to free as possible) for as long as possible.

This meant financial frugality and the risks of providing a “free” service were always top of mind.

Database design and data storage

Beyond the initial backfill, the daily processing power required by cve2stix is very low – it’s converting and updating around a hundred entries per day.

Introducing the ability for users to query data via the endpoints adds a little more overhead, but the size of each bundle is very small.

Whilst there are hundreds of thousands of objects, cve2stix minifies the STIX 2.1 Objects it creates, meaning the entire dataset as it stands is around 5 GB.

The worry is that there will be thousands of users, but at this point in time that is wishful thinking. Therefore I designed the service to scale, but wasn’t worried about seeking out marginal performance improvements.

There are two entry points to CVE data via the API:

  1. Get a list of CVEs/CPEs based on some criteria (GET /cve/ or GET /cpe/)
  2. Return the objects related to a specific CVE (GET /cve/<CVE_ID>), or return a specific CPE (GET /cpe/<CPE_URI>)

For the first set of endpoints, a relational database is needed with a row for each entry, holding the properties that can be searched on via the API.

At the time of writing, the parameters listed in this post can be used with the GET /cve/ endpoint. This means the following property values need to exist in the database for each parameter:

  • cpeURI: vulnerability.extensions.extension-definition--b2b5f2cd-49e6-4091-a0e0-c0bb71543e23.configurations.nodes.cpeMatch.criteria
  • cpeMatchString: vulnerability.extensions.extension-definition--b2b5f2cd-49e6-4091-a0e0-c0bb71543e23.configurations.nodes.cpeMatch.criteria
  • cweId: vulnerability.extensions.extension-definition--b2b5f2cd-49e6-4091-a0e0-c0bb71543e23.configurations.nodes.cpeMatch.matchCriteriaId
  • keywordSearch: vulnerability.name and vulnerability.description
  • capecID: vulnerability.external_references.external_id (where "source_name": "capec")
  • attackID: vulnerability.external_references.external_id (where "source_name": "mitre-attack")
  • lastModifiedStartDate: vulnerability.modified
  • lastModifiedEndDate: vulnerability.modified
  • createdStartDate: vulnerability.created
  • createdEndDate: vulnerability.created
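
As a rough illustration of the mapping above, the sketch below pulls the searchable values out of a Vulnerability object held as a plain dictionary. The function names and the output keys are hypothetical, not part of cve2stix, and the CPE-related parameters are left out because the nested layout of the extension is not shown in this post.

```python
from typing import Any


def external_ids(stix_object: dict[str, Any], source_name: str) -> list[str]:
    """Return every external_id whose reference has the given source_name."""
    return [
        ref["external_id"]
        for ref in stix_object.get("external_references", [])
        if ref.get("source_name") == source_name and "external_id" in ref
    ]


def searchable_row(vulnerability: dict[str, Any]) -> dict[str, Any]:
    """Collect the values the GET /cve/ parameters are matched against.

    The keys of the returned dict are illustrative column names only.
    """
    name = vulnerability.get("name", "")
    description = vulnerability.get("description", "")
    return {
        # keywordSearch looks at both the name and the description.
        "keyword_text": f"{name} {description}".strip(),
        # capecID and attackID come from external references, by source_name.
        "capec_ids": external_ids(vulnerability, "capec"),
        "attack_ids": external_ids(vulnerability, "mitre-attack"),
        # The start/end date parameters are ranges over these two timestamps.
        "created": vulnerability.get("created"),
        "modified": vulnerability.get("modified"),
    }
```

Keeping these values in their own columns means the list endpoint never has to open the full STIX objects to answer a search.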

We also need to factor in the fields returned to the user in the response, which contains the following values:

        {
            "cve": {
                "name": "<VULNERABILITY OBJECT - external_references.external_id>",
                "id": "<VULNERABILITY OBJECT - id>",
                "description": "<VULNERABILITY OBJECT - description>",
                "created": "<VULNERABILITY OBJECT - created>",
                "modified": "<VULNERABILITY OBJECT - modified>"
            }
        },

Carrying out the same exercise for CPEs:

  • cpeMatchString: software.cpe
  • keywordSearch: software.name

And in the response:

            "product": {
                "name": "<SOFTWARE OBJECT - name>",
                "id": "<SOFTWARE OBJECT - id>",
                "cpe": "<SOFTWARE OBJECT - cpe>"
            }

This database is fairly trivial to store and maintain.
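
To give a feel for how small this database is, here is a minimal sketch using SQLite from the Python standard library. The table layout, column names, and the find_cves helper are assumptions made for illustration; the hosted service may well use a different engine and schema.

```python
import sqlite3

# Hypothetical schema: one row per CVE and one row per CPE, holding only
# the fields needed for searching and for the list responses.
CVE_SCHEMA = """
CREATE TABLE cve (
    id          TEXT PRIMARY KEY,  -- STIX id of the Vulnerability object
    name        TEXT NOT NULL,     -- the CVE ID
    description TEXT,
    created     TEXT NOT NULL,     -- ISO 8601 text; sorts correctly if all
    modified    TEXT NOT NULL      -- values share one format and timezone
);
CREATE TABLE cpe (
    id   TEXT PRIMARY KEY,         -- STIX id of the Software object
    name TEXT NOT NULL,
    cpe  TEXT NOT NULL
);
"""


def find_cves(conn, keyword=None, modified_start=None, modified_end=None):
    """Return CVE rows matching the optional filters, newest first."""
    clauses, params = [], []
    if keyword:
        clauses.append("(name LIKE ? OR description LIKE ?)")
        params += [f"%{keyword}%", f"%{keyword}%"]
    if modified_start:
        clauses.append("modified >= ?")
        params.append(modified_start)
    if modified_end:
        clauses.append("modified <= ?")
        params.append(modified_end)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    columns = ("name", "id", "description", "created", "modified")
    rows = conn.execute(
        "SELECT name, id, description, created, modified FROM cve"
        + where
        + " ORDER BY modified DESC",
        params,
    )
    return [dict(zip(columns, row)) for row in rows]
```

Only a subset of the parameters is shown; the remaining ones would follow the same pattern of appending a clause and a bound value.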

However, one consideration was how to store the actual objects returned by the ID endpoints (GET /cve/<CVE_ID> and GET /cpe/<CPE_URI>) once the user knows what they want to query.

I really didn’t want to have to manage databases of objects if at all possible, especially when versioning can result in many instances of the same object. It made much more sense to put these in a file structure and reference the path to the file in the database.

  • cve2stix
    • cve
      • <CVE_ID>
        • bundles
          • <BUNDLE VERSION>
        • objects
          • <OBJECT TYPE>
            • <OBJECT VERSION>
    • cpe
      • <CPE_URI>
        • objects
          • <OBJECT TYPE>
            • <OBJECT VERSION>

As the STIX2 Python library names the bundle and object versions using the date of generation (see: Building cve2stix: Modelling NVD Data as STIX 2.1 Objects (part 3)), it is possible to select the latest file by picking the .json filename with the largest number in the directory.

One of the simplest ways to handle file storage for a web app, besides storing files locally, is to use a storage service like AWS S3. Therefore, in addition to local storage, cve2stix can be configured to use an S3 bucket to store and reference the STIX 2.1 Objects.
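
For local storage, picking the latest version could look something like the sketch below. The latest_version helper is hypothetical, and it assumes every version file is named with digits only, as in the examples in this post.

```python
from pathlib import Path
from typing import Optional


def latest_version(directory: Path) -> Optional[Path]:
    """Return the .json file with the largest numeric name, or None."""
    # Ignore any file whose name is not purely numeric.
    candidates = [p for p in directory.glob("*.json") if p.stem.isdigit()]
    # Compare as integers so names of different lengths still order correctly.
    return max(candidates, key=lambda p: int(p.stem), default=None)
```

For example, latest_version(Path("cve2stix/cve/CVE-2022-29384/bundles")) would return the newest bundle file for that CVE, if one exists.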

The benefit is that the local database can be made aware of the Object reference in S3 on creation. That means we can use the S3 API to query a specific object directly. For example, to query a specific bundle:

GET /cve/CVE-2022-29384/bundles/20220217101756888626.json HTTP/1.1
Host: cve2stix.s3.eu-west-2.amazonaws.com
Date: Mon, 3 Oct 2022 22:32:00 GMT
Authorization: authorization string

Or a specific object:

GET /cve/CVE-2022-29384/objects/vulnerability/20220217101303144117.json HTTP/1.1
Host: cve2stix.s3.eu-west-2.amazonaws.com
Date: Mon, 3 Oct 2022 22:32:00 GMT
Authorization: authorization string

This approach relies on the database storing the latest version of the objects when they’re created so that they can be queried directly like this.
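
The keys in those requests follow directly from the file structure shown earlier, so they can be built from values held in the database. The two helpers below are a hypothetical sketch of that, not code taken from cve2stix.

```python
def bundle_key(cve_id: str, version: str) -> str:
    """Key of one version of a CVE bundle, following the tree above."""
    return f"cve/{cve_id}/bundles/{version}.json"


def object_key(kind: str, identifier: str, object_type: str, version: str) -> str:
    """Key of one version of a single object.

    kind is "cve" or "cpe"; identifier is the CVE ID or the CPE URI.
    """
    if kind not in ("cve", "cpe"):
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{kind}/{identifier}/objects/{object_type}/{version}.json"
```

With boto3 installed and credentials configured, a key built this way could be fetched with boto3.client("s3").get_object(Bucket="cve2stix", Key=key), where the bucket name is assumed from the Host header in the requests above.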

For the CPE_URI endpoint this is trivial, as each CPE has one Object linked to it (and that object is never updated). We can simply include the object ID in the CPE table described above, along with the latest version of the software object that exists in S3 (or the filesystem).

For the CVE_ID endpoint things get a little more difficult – one CVE can have many objects that can change over time. Here, the Grouping Object mentioned in part 3 comes in useful.

The ID of the Grouping Object never changes; only the object_refs inside it, which point to the objects related to the CVE, do. As such, in the CVE table described earlier, the Grouping Object ID can be used as a key to link to a new table that lists the Grouping Objects (remember, one for each CVE) along with all the objects and versions linked to each, for querying the S3 API (or indeed the filesystem).

As such, the full objects can be queried and returned in the API response for both the GET /cve/<CVE_ID> and GET /cpe/<CPE_URI> endpoints.
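
One possible shape for that linking table is sketched below, again with SQLite. The table name, columns, and helper functions are assumptions for illustration, and the IDs used with them would be real STIX IDs in practice.

```python
import sqlite3

# Hypothetical table: one row per object referenced by a Grouping Object,
# holding the latest stored version of that object.
GROUPING_SCHEMA = """
CREATE TABLE grouping_objects (
    grouping_id    TEXT NOT NULL,  -- stable for the lifetime of the CVE
    object_type    TEXT NOT NULL,  -- e.g. vulnerability
    object_id      TEXT NOT NULL,
    latest_version TEXT NOT NULL,  -- filename stem of the newest version
    PRIMARY KEY (grouping_id, object_id)
);
"""


def record_version(conn, grouping_id, object_type, object_id, version):
    """Insert the object, or overwrite its latest_version if already known."""
    conn.execute(
        "INSERT OR REPLACE INTO grouping_objects VALUES (?, ?, ?, ?)",
        (grouping_id, object_type, object_id, version),
    )


def keys_for_cve(conn, cve_id, grouping_id):
    """Return the storage key of the latest version of each linked object."""
    rows = conn.execute(
        "SELECT object_type, latest_version FROM grouping_objects "
        "WHERE grouping_id = ? ORDER BY object_type, object_id",
        (grouping_id,),
    )
    return [
        f"cve/{cve_id}/objects/{object_type}/{version}.json"
        for object_type, version in rows
    ]
```

Because the Grouping Object ID is stable, the CVE table only needs to store it once; every update to the CVE touches rows in this table alone.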

Authentication and security

Making the service free meant putting in some basic security controls to avoid abuse. Here’s the current list of security controls:

  • Users must agree to a basic TOS around usage, and accept that the cloud service might become chargeable in the future.
  • All requests must contain an API key in the header (X-API-Key). You can request an API key here.
  • A simple rate limit of 500 requests per key per day will be enforced. Such a limit should suit most of the use cases identified; those who run into limits can run and host cve2stix themselves.
  • IP rate limits will also be enforced.
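
The per-key daily limit can be pictured with a small in-memory counter like the one below. This is only a sketch under simple assumptions (a single process, counts lost on restart, old days never purged); the class name is hypothetical and the hosted service may enforce the limit quite differently.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

DAILY_LIMIT = 500  # requests per API key per day, as stated above


class DailyRateLimiter:
    """Count requests per (API key, calendar day) and refuse any over the limit."""

    def __init__(self, limit: int = DAILY_LIMIT) -> None:
        self.limit = limit
        self._counts = defaultdict(int)  # (api_key, day) -> requests so far

    def allow(self, api_key: str, today: Optional[date] = None) -> bool:
        """Return True and count the request, or False if the key is over limit."""
        day = today or date.today()
        if self._counts[(api_key, day)] >= self.limit:
            return False
        self._counts[(api_key, day)] += 1
        return True
```

A request handler would read the X-API-Key header, call allow() with its value, and reject the request when the result is False.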

Try cve2stix now

Hopefully this post has given you a little insight into why I built cve2stix and how you can use it.

To really understand the power of cve2stix, take a look at the user documentation here.

cve2stix is available to download on GitHub here.

I hope you find it useful, and I am always very happy to receive feedback, either via GitHub issues or directly via our Slack community.



