In this post I will show you the design decisions that went into building the cve2stix API (and why I didn’t choose TAXII).
Any long-time follower of this blog will know that I am a big fan of open standards, TAXII being one (if you missed my tutorial from a few years ago, check it out here).
As such, when it came to designing an API for cve2stix, TAXII was my starting point.
Then I took a step back to ask: what do users actually want?
The recently launched v2 of the NVD API has clearly taken a lot of feedback from v1 users into its new design. I saw similar use-cases among cve2stix users;
- retrieve information about a specific CVE using its ID
- e.g. show me everything about CVE-XXXX
- retrieve history about a specific CVE using its ID
- e.g. show all published versions of a CVE
- retrieve CVEs based on a certain match criteria
- e.g. pass a CPE URI for a product they are using and have the API return a list of CVE matches
- e.g. show me all CVEs affected by a certain weakness (CWE)
- retrieve information about a product (CPE) using its ID
- e.g. show me all product versions (CPE URIs) for Apple iOS
Choosing a pure TAXII API would have made ingestion of data from down/up-stream tools much simpler: a user could simply plug in their credentials and the product would natively understand how the TAXII server is structured to access the data. However, for the use-cases above the TAXII API design is not particularly well suited.
The ability to filter results using default TAXII parameters is very limited, meaning use-cases beyond a simple backfill are not achievable without adding some customisation (which would defeat a large part of the point of choosing TAXII in the first place).
As such, I decided to design the endpoints from scratch and came up with the following;
- Search CVEs:
GET <HOST>/api/<VERSION>/cve
- Show specific CVE:
GET <HOST>/api/<VERSION>/cve/<CVE_ID>
- Show STIX object versions of a CVE:
GET <HOST>/api/<VERSION>/cve/<CVE_ID>/versions
- Search CPEs:
GET <HOST>/api/<VERSION>/cpe
- Show specific CPE:
GET <HOST>/api/<VERSION>/cpe/<CPE_URI>
- Show summary of vendors:
GET <HOST>/api/<VERSION>/cpe/vendors
- Get user information:
GET <HOST>/api/<VERSION>/users
- Get plan information:
GET <HOST>/api/<VERSION>/plans
Authentication and security
All API requests must contain an API key in the header (X-API-Key). Requests are rate limited based on plans (see part 6 in 2 weeks for more information).
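As a rough sketch of what an authenticated request could look like using only the Python standard library: the host, version and key values are placeholders, and build_request is a hypothetical helper for illustration, not part of cve2stix.

```python
from urllib.request import Request


def build_request(host: str, version: str, path: str, api_key: str) -> Request:
    """Build a GET request for the API with the X-API-Key header set."""
    url = f"{host}/api/{version}/{path.lstrip('/')}"
    # Every request must carry the API key in the X-API-Key header.
    return Request(url, headers={"X-API-Key": api_key}, method="GET")


# Placeholder values; nothing is sent over the network here.
req = build_request("https://www.cve2stix.com", "v1", "cve", "my-secret-key")
print(req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the request; that step is left out so the sketch stays offline.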
CVE Endpoint
I decided the granularity of parameters used by NVD was too much for most people, but the following would be critical for the root CVE endpoint for those performing CVE browsing use-cases:
- cpeMatchString [n] (optional): allows a full or partial CPE URI to be passed. Returns all CVEs that contain it (whether or not it is vulnerable to the CVE). Default: not passed.
- e.g. cpe:2.3:a:apple
- matchCriteriaId (optional): allows a full match criteria ID to be passed. If any CPE in the CVE (vulnerable or not) matches the matchCriteriaId, that CVE will be returned. Default: not passed.
- e.g. A76D2886-66B2-4799-96C8-E00D961A91F7
- keywordSearch (optional): allows a string to be passed. The title and description of CVEs will be searched for the string as a wildcard. For example, keywordSearch=Windows is treated as the wildcard pattern *windows*. Default: not passed.
- e.g. windows
- cweId [n] (optional): allows a CWE ID to be passed. Returns all CVE IDs linked to this CWE. Default: not passed.
- e.g. CWE-79
- capecID [n] (optional): allows a CAPEC ID to be passed. Returns all CVE IDs linked to this CAPEC. Default: not passed.
- e.g. CAPEC-509
- attackID [n] (optional): allows an ATT&CK ID (Technique) to be passed. Returns all CVE IDs linked to this ATT&CK technique. Default: not passed.
- e.g. T1583
- cvss3ExploitabilityScoreMin (optional): if set, returns only CVEs with a CVSS v3 Exploitability Score above the value set. Value must be between 0 and 10 (and can be supplied to one decimal place).
- e.g. 3.2
- cvss3ImpactScoreMin (optional): if set, returns only CVEs with a CVSS v3 Impact Score above the value set. Value must be between 0 and 10 (and can be supplied to one decimal place).
- e.g. 10
- lastModifiedStartDate (optional): the earliest modified date (in format YYYY-MM-DDTHH:MM:SS) of a CVE you want returned. All timestamps are UTC. Default: not passed.
- e.g. 1988-11-11T05:00:00
- lastModifiedEndDate (optional): the latest modified date of a CVE you want returned. Default: not passed.
- e.g. 1990-11-11T05:00:00
- createdStartDate (optional): the earliest created date of a CVE you want returned. Default: not passed.
- e.g. 1988-11-11T05:00:00
- createdEndDate (optional): the latest created date of a CVE you want returned. Default: not passed.
- e.g. 1990-11-11T05:00:00
- page (optional): the page you want results for. Default is 0. Max is 50.
- e.g. 11
- metadata (optional): all responses contain full _metadata of the request and general information about the response (e.g. pagination). Default is true, but the user can set it to false if they don’t want this data in the response.
Multiple values can be passed for the parameters above marked with [n]. In all cases they are combined with an AND operator. For example, ?capecID=79&capecID=55&attackID=T1583 would only return CVEs that are linked to all three objects.
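To make the repeated-parameter behaviour concrete, here is a small sketch of how a client might encode multiple values and how AND matching could be evaluated. Both build_query and matches_all are hypothetical helpers written for this post, not cve2stix code.

```python
from urllib.parse import urlencode


def build_query(filters: dict) -> str:
    """Encode repeated parameters, e.g. capecID=79&capecID=55."""
    # doseq=True expands each list into one key=value pair per item.
    return "?" + urlencode(filters, doseq=True)


def matches_all(linked_ids: set, requested_ids: list) -> bool:
    """AND semantics: a CVE matches only if it is linked to every requested ID."""
    return all(requested in linked_ids for requested in requested_ids)


query = build_query({"capecID": ["79", "55"], "attackID": ["T1583"]})
print(query)

# A CVE linked to all three objects matches; one missing a link does not.
full_match = matches_all({"79", "55", "T1583"}, ["79", "55", "T1583"])
partial_match = matches_all({"79", "T1583"}, ["79", "55", "T1583"])
print(full_match, partial_match)
```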
The response from this endpoint is designed to be minimal to reduce initial payload size, as follows;
GET <HOST>/api/<VERSION>/cve
{
"_metadata": {
"timestamp": "<TIMESTAMP OF REQUEST>",
"host": "<CVE2STIX HOST>",
"version": "<API VERSION>",
"endpoint": "cve",
"parameters": [
"<FILTERING PARAMETERS>"
],
"page_number": "<CURRENT PAGE, 0 RATED>",
"page_max_count": 50,
"results_page_count": "<COUNT OF RESULTS ON THIS PAGE>",
"result_total_count": "<TOTAL COUNT OF RESULTS ON ALL PAGES>",
"links": [
{"self": "<PATH>"},
{"first": "<PATH>"},
{"previous": "<PATH>"},
{"next": "<PATH>"},
{"last": "<PATH>"}
]
},
"vulnerabilities": [
{
"cve": {
"name": "<VULNERABILITY OBJECT - external_references.external_id>",
"id": "<VULNERABILITY OBJECT - id>",
"description": "<VULNERABILITY OBJECT - description>",
"created": "<VULNERABILITY OBJECT - created>",
"modified": "<VULNERABILITY OBJECT - modified>",
"versions": [
"<AVAILABLE STIX VERSIONS>"
]
}
},
{
"cve": {
"name": "<VULNERABILITY OBJECT - external_references.external_id>",
"id": "<VULNERABILITY OBJECT - id>",
"description": "<VULNERABILITY OBJECT - description>",
"created": "<VULNERABILITY OBJECT - created>",
"modified": "<VULNERABILITY OBJECT - modified>",
"versions": [
"<AVAILABLE STIX BUNDLE VERSIONS>"
],
"cvss3ExploitabilityScore": "<vulnerabilities.cve.metrics.cvssMetricV31.exploitabilityScore>",
"cvss3ImpactScore": "<vulnerabilities.cve.metrics.cvssMetricV31.impactScore>"
}
},
...
]
}
An example response might look like;
{
"_metadata": {
"timestamp": "2023-02-10T07:55:14.340",
"host": "https://www.cve2stix.com",
"version": "api/v1",
"endpoint": "cve",
"parameters": [
"?keywordSearch=*windows*"
],
"page_number": "0",
"page_max_count": 50,
"results_page_count": "1",
"result_total_count": "1",
"links": [
{"self": "https://www.cve2stix.com/api/v1/cve?page=0"},
{"first": "https://www.cve2stix.com/api/v1/cve?page=0"},
{"previous": ""},
{"next": ""},
{"last": "https://www.cve2stix.com/api/v1/cve?page=0"}
]
},
"vulnerabilities": [
{
"cve": {
"name": "CVE-2021-24092",
"id": "vulnerability--d7d9829a-ee93-4bc7-a1a9-027676f6bf7a",
"description": "Microsoft Defender Elevation of Privilege Vulnerability",
"created": "2011-02-25T09:23:23.000",
"modified": "2011-02-25T09:23:23.000",
"versions": [
"20110225092323000"
],
"cvss3ExploitabilityScore": 3.2,
"cvss3ImpactScore": 5.5
}
}
]
}
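Because links is a list of single-key objects rather than one flat object, a client needs a small lookup to find the next page. The sketch below assumes, as in the example response above, that an empty string means there is no such page; get_link is a hypothetical helper name.

```python
from typing import Optional


def get_link(metadata: dict, rel: str) -> Optional[str]:
    """Return the URL for a link relation ("next", "last", ...) or None."""
    for entry in metadata.get("links", []):
        # Each entry is a single-key object such as {"next": "<PATH>"}.
        if rel in entry:
            return entry[rel] or None  # treat an empty string as "no page"
    return None


metadata = {
    "page_number": "0",
    "links": [
        {"self": "https://www.cve2stix.com/api/v1/cve?page=0"},
        {"first": "https://www.cve2stix.com/api/v1/cve?page=0"},
        {"previous": ""},
        {"next": ""},
        {"last": "https://www.cve2stix.com/api/v1/cve?page=0"},
    ],
}
next_url = get_link(metadata, "next")
last_url = get_link(metadata, "last")
print(next_url, last_url)
```

A paging loop would keep requesting get_link(metadata, "next") until it returns None.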
The results of this endpoint are always in descending order (most recent first) of the modified date of each CVE’s Vulnerability STIX 2.1 Object.
The idea is that the user first finds the CVEs they are interested in based on their search. Once they have a list of CVE IDs in the response, they can pass these individually to the <HOST>/api/<VERSION>/cve/<CVE_ID> endpoint to get all the STIX Objects related to that CVE.
It quickly became clear, though, that for this use case it would be much more performant if there was some level of grouping of STIX Objects that the backend could use to quickly retrieve objects for the specific CVE being queried. That’s because there are almost 200,000 CVE records at the time of writing. Considering the way cve2stix creates STIX objects for each CVE, this means there are even more objects that have relationships to each CVE.
As noted earlier, on ingest of a new CVE, cve2stix also creates one Grouping SDO for every CVE (Vulnerability Object) created.
The Grouping object means that when a user queries a specific CVE, the backend can first search for the right Grouping object (by searching its name property, which is a CVE ID, e.g. CVE-XXXX-XXXXX). The Grouping Object contains a list of all STIX Objects (listed in the Grouping Object’s object_refs property) that are linked to that CVE, and thus need to be returned in the response.
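The lookup described above can be sketched with an in-memory dictionary standing in for the database. The store layout, the shortened object IDs and the objects_for_cve function are all assumptions made for illustration; the real backend queries database tables.

```python
def objects_for_cve(cve_id: str, store: dict) -> list:
    """Resolve a CVE ID to its STIX objects via the Grouping object."""
    # Step 1: find the Grouping whose name property is the CVE ID.
    grouping = next(
        (
            obj
            for obj in store.values()
            if obj["type"] == "grouping" and obj.get("name") == cve_id
        ),
        None,
    )
    if grouping is None:
        return []
    # Step 2: return the Grouping plus every object listed in object_refs.
    return [grouping] + [store[ref] for ref in grouping["object_refs"] if ref in store]


# Toy data; real STIX IDs use full UUIDs.
store = {
    "grouping--1": {
        "type": "grouping",
        "id": "grouping--1",
        "name": "CVE-2021-24092",
        "object_refs": ["vulnerability--1", "indicator--1"],
    },
    "vulnerability--1": {"type": "vulnerability", "id": "vulnerability--1"},
    "indicator--1": {"type": "indicator", "id": "indicator--1"},
    "vulnerability--2": {"type": "vulnerability", "id": "vulnerability--2"},
}
result_ids = [obj["id"] for obj in objects_for_cve("CVE-2021-24092", store)]
print(result_ids)
```

The point of the design is that one lookup by name replaces a scan of every relationship that mentions the CVE.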
In many cases a user will not want all STIX Objects related to the CVE in question, therefore a user can filter by STIX Object type on this endpoint (and/or version);
- stixObjects: by default, all STIX objects linked to a matching CVE are returned in the response. You can also explicitly define one or more STIX object types you want returned in the response from the following (default: not passed);
- cve2stix-all: included (and can be filtered); vulnerability-cve, indicator-cve, relationship-indicator-cve, vulnerability-cwe, relationship-vulnerability-cwe, software, relationship-software, grouping, identity-signalscorps, marking-definition-white.
- capec-all: cannot filter on enrichments; all objects are always returned if passed.
- attack-all: cannot filter on enrichments; all objects are always returned if passed.
- version: by default, only the latest bundle for the CVE is returned. However, if the user wants an older version (as listed on the CVE list endpoint), they can use this parameter to define the specific bundle version they want returned in the response. A user can use the CVE versions endpoint to see all available versions.
- metadata (optional): all responses contain full _metadata of the request and general information about the response. Default is true, but the user can set it to false if they don’t want this data in the response.
For this endpoint, the following response structure is returned;
GET <HOST>/api/<VERSION>/cve/<CVE_ID>
{
"_metadata": {
"timestamp": "<TIMESTAMP OF REQUEST>",
"host": "<CVE2STIX HOST>",
"version": "<API VERSION>",
"endpoint": "cve/<CVE_ID>",
"parameters": [
"<PARAMETERS USED>"
],
"name": "<VULNERABILITY OBJECT - external_references.external_id>",
"description": "<VULNERABILITY OBJECT - description>",
"created": "<VULNERABILITY OBJECT - created>",
"modified": "<VULNERABILITY OBJECT - modified>",
"object_count": "<COUNT OF STIX OBJECTS>"
},
"vulnerabilities": [
{
"cve": {
"objects": [
{
"type": "bundle",
"id": "bundle--<UUID OF VULNERABILITY OBJECT>",
"objects": [
"<A LIST OF ALL STIX JSON OBJECTS FOR CVE>"
]
}
]
}
}
]
}
Only one page is ever returned for this response, containing all STIX Objects for the CVE (including the Grouping and all enrichments) nested under objects.
To see what versions of the CVE exist, should a user not want the latest version, the versions endpoint can be used.
GET <HOST>/api/<VERSION>/cve/<CVE_ID>/versions
This endpoint takes the following parameters;
- page (optional): the page you want results for. Default is 0. Max is 50.
- e.g. 11
- metadata (optional): all responses contain full _metadata of the request and general information about the response (e.g. pagination). Default is true, but the user can set it to false if they don’t want this data in the response.
An example response looks as follows;
{
"_metadata": {
"timestamp": "<TIMESTAMP OF REQUEST>",
"host": "<CVE2STIX HOST>",
"version": "<API VERSION>",
"endpoint": "cve/<CVE_ID>/versions",
"parameters": [
"<PARAMETERS USED>"
],
"name": "<VULNERABILITY OBJECT - external_references.external_id>",
"description": "<VULNERABILITY OBJECT - description>",
"created": "<VULNERABILITY OBJECT - created>",
"modified": "<VULNERABILITY OBJECT - modified>",
"version_count": "<COUNT OF VERSIONS>",
"page_number": "<CURRENT PAGE, 0 RATED>",
"page_max_count": 50,
"results_page_count": "<COUNT OF RESULTS ON THIS PAGE>",
"result_total_count": "<TOTAL COUNT OF RESULTS ON ALL PAGES>",
"links": [
{"self": "<PATH>"},
{"first": "<PATH>"},
{"previous": "<PATH>"},
{"next": "<PATH>"},
{"last": "<PATH>"}
]
},
"vulnerabilities": [
{
"cve": {
"versions": [
"<LIST OF STIX VERSIONS FOR CVE VULNERABILITY OBJECT>"
]
}
}
]
}
CPE Endpoint
Using the CVE endpoint, it is possible to filter on CPEs (using the cpeMatchString parameter). However, this returns CPEs linked to CVEs. Whilst this returns the software object desired, it’s also very inefficient if all that’s wanted is the software object alone, to fulfil the use-case of learning more about the product being searched.
As such, the CPE endpoint has similar parameters to the CVE endpoint, but is designed to only return information about CPEs.
- cpeMatchString [n] (optional): allows a full or partial CPE URI to be passed. Returns all CPEs that match it. Default: not passed.
- e.g. cpe:2.3:a:apple
- keywordSearch (optional): allows a string to be passed. The name of the software objects will be searched for the string as a wildcard. For example, keywordSearch=Windows is treated as the wildcard pattern *windows*. Default: not passed.
- e.g. windows
- page (optional): the page you want results for. Default is 0. Max is 50.
- e.g. 11
- metadata (optional): all responses contain full _metadata of the request and general information about the response (e.g. pagination). Default is true, but the user can set it to false if they don’t want this data in the response.
Note, STIX Software objects do not contain any date properties. To search on date here, cve2stix uses the times captured in the extension of the Software object returned in the NVD API response.
Like the CVE endpoint, the response from the root CPE endpoint is designed to be minimal to reduce initial payload size, as follows;
GET <HOST>/api/<VERSION>/cpe
{
"_metadata": {
"timestamp": "<TIMESTAMP OF REQUEST>",
"host": "<CVE2STIX HOST>",
"version": "<API VERSION>",
"endpoint": "cpe",
"parameters": [
"<PARAMETERS USED N>"
],
"page_number": "<CURRENT PAGE, 0 RATED>",
"page_max_count": 50,
"results_page_count": "<COUNT OF RESULTS ON THIS PAGE>",
"result_total_count": "<TOTAL COUNT OF RESULTS ON ALL PAGES>",
"links": [
{"self": "<PATH>"},
{"first": "<PATH>"},
{"previous": "<PATH>"},
{"next": "<PATH>"},
{"last": "<PATH>"}
]
},
"products": [
{
"product": {
"name": "<SOFTWARE OBJECT - name>",
"id": "<SOFTWARE OBJECT - id>",
"cpe": "<SOFTWARE OBJECT - cpe>",
"created": "<SOFTWARE OBJECT - created>",
"modified": "<SOFTWARE OBJECT - modified>"
}
},
{
"product": {
"name": "<SOFTWARE OBJECT - name>",
"id": "<SOFTWARE OBJECT - id>",
"cpe": "<SOFTWARE OBJECT - cpe>",
"created": "<SOFTWARE OBJECT - created>",
"modified": "<SOFTWARE OBJECT - modified>"
}
}
]
}
An example response might look like;
{
"_metadata": {
"timestamp": "2023-02-10T07:55:14.340",
"host": "https://www.cve2stix.com",
"version": "api/v1",
"endpoint": "cpe",
"parameters": [
"?keywordSearch=*windows*"
],
"page_number": "0",
"page_max_count": 50,
"results_page_count": "1",
"result_total_count": "1",
"links": [
{"self": "https://www.cve2stix.com/api/v1/cpe?page=0"},
{"first": "https://www.cve2stix.com/api/v1/cpe?page=0"},
{"previous": ""},
{"next": ""},
{"last": "https://www.cve2stix.com/api/v1/cpe?page=0"}
]
},
"products": [
{
"product": {
"name": "Microsoft Windows",
"id": "software--aa027d2d-697a-4390-a0b7-d69a1a2bbc6e",
"cpe": "cpe:2.3:o:microsoft:windows:-:*:*:*:*:*:*:*",
"created": "2011-02-25T09:23:23.000",
"modified": "2011-02-25T09:23:23.000"
}
}
]
}
The results of this endpoint are always in descending order (most recent first) of the modified date of each product’s Software STIX 2.1 Object.
Once the CPE URI is known, the individual software object can be returned like so;
GET <HOST>/api/<VERSION>/cpe/<CPE_URI>
{
"_metadata": {
"timestamp": "<TIMESTAMP OF REQUEST>",
"host": "<CVE2STIX HOST>",
"version": "<API VERSION>",
"endpoint": "cpe/<CPE_URI>",
"name": "<SOFTWARE OBJECT - name>",
"cpe": "<SOFTWARE OBJECT - cpe>",
"created": "<SOFTWARE OBJECT - created>",
"modified": "<SOFTWARE OBJECT - modified>"
},
"product": [
{"SOFTWARE OBJECT FOR CPE URI"}
]
}
This endpoint will only ever return one STIX Software Object for a CPE URI.
To help discovery of software products for further requests (and avoid lots of requests to the API to do so), it is also possible for users to get a summary of the software vendors listed in the CPE dictionary;
GET <HOST>/api/<VERSION>/cpe/vendors
To support autocomplete actions this API has the following parameter:
- keywordSearch (optional): allows a string to be passed. The name of the vendor will be searched for the string as a wildcard. For example, keywordSearch=Micro is treated as the wildcard pattern *micro*. Default: not passed.
- e.g. microsoft
- page (optional): the page you want results for. Default is 0. Max is 50.
- e.g. 11
- metadata (optional): all responses contain full _metadata of the request and general information about the response (e.g. pagination). Default is true, but the user can set it to false if they don’t want this data in the response.
{
"_metadata": {
"timestamp": "<TIMESTAMP OF REQUEST>",
"host": "<CVE2STIX HOST>",
"version": "<API VERSION>",
"endpoint": "cpe/vendors",
"parameters": [
"<PARAMETERS USED N>"
],
"page_number": "<CURRENT PAGE, 0 RATED>",
"page_max_count": 50,
"results_page_count": "<COUNT OF RESULTS ON THIS PAGE>",
"result_total_count": "<TOTAL COUNT OF RESULTS ON ALL PAGES>",
"links": [
{"self": "<PATH>"},
{"first": "<PATH>"},
{"previous": "<PATH>"},
{"next": "<PATH>"},
{"last": "<PATH>"}
]
},
"vendors": [
{
"vendor": {
"name": "<VENDOR NAME>",
"products": {
"a": [
"<LIST OF APPLICATION TYPE PRODUCT NAMES FOR VENDOR>"
],
"o": [
"<LIST OF OS TYPE PRODUCT NAMES FOR VENDOR>"
],
"h": [
"<LIST OF HARDWARE TYPE PRODUCT NAME FOR VENDOR>"
]
}
}
},
{
"vendor": {
"name": "<VENDOR NAME>",
"products": {
"a": [
"<LIST OF APPLICATION TYPE PRODUCT NAMES FOR VENDOR>"
],
"o": [
"<LIST OF OS TYPE PRODUCT NAMES FOR VENDOR>"
],
"h": [
"<LIST OF HARDWARE TYPE PRODUCT NAME FOR VENDOR>"
]
}
}
}
]
}
An example response might look like;
{
"_metadata": {
"timestamp": "2023-02-10T07:55:14.340",
"host": "https://www.cve2stix.com",
"version": "api/v1",
"endpoint": "cpe/vendors",
"parameters": [
"?page=0"
],
"page_number": "0",
"page_max_count": 50,
"results_page_count": "1",
"result_total_count": "1",
"links": [
{"self": "https://www.cve2stix.com/api/v1/cpe/vendors?page=0"},
{"first": "https://www.cve2stix.com/api/v1/cpe/vendors?page=0"},
{"previous": ""},
{"next": ""},
{"last": "https://www.cve2stix.com/api/v1/cpe/vendors?page=0"}
]
},
"vendors": [
{
"vendor": {
"name": "apple",
"products": {
"a": [
"garageband"
],
"o": [
"OSX"
],
"h": [
"iphone",
"macbook"
]
}
}
},
{
"vendor": {
"name": "microsoft",
"products": {
"a": [
"word",
"excel"
],
"o": [
"windows XP"
],
"h": [
"keyboard"
]
}
}
}
]
}
The response of this endpoint is sorted in numerical then alphabetical order (0-9 then a-z).
User info endpoint
This endpoint helps users track their daily requests against their plan, and lets staff (and the UI) look up a user.
Non-staff users can only ever see their own account (always one result returned). Staff users (with the correct API permissions) can also use this endpoint to search and track users (using the id parameter).
GET <HOST>/api/<VERSION>/users
The following parameters are used on this endpoint.
- id (optional): the ID of the user. Note, if the request is not made by a staff user, this must be the user ID associated with the API key in the request, else an unauthorized response is returned. Default: not passed.
- metadata (optional): all responses contain full _metadata of the request and general information about the response. Default is true, but the user can set it to false if they don’t want this data in the response.
{
"_metadata": {
"timestamp": "<TIMESTAMP OF REQUEST>",
"host": "<CVE2STIX HOST>",
"version": "<API VERSION>",
"endpoint": "users",
"parameters": [
"<PARAMETERS USED N>"
],
"page_number": "<CURRENT PAGE, 0 RATED>",
"page_max_count": 50,
"results_page_count": "<COUNT OF RESULTS ON THIS PAGE>",
"result_total_count": "<TOTAL COUNT OF RESULTS ON ALL PAGES>",
"links": [
{"self": "<PATH>"},
{"first": "<PATH>"},
{"previous": "<PATH>"},
{"next": "<PATH>"},
{"last": "<PATH>"}
]
},
"users": [
{
"user": {
"id": "<USER ID>",
"email": "<USER EMAIL>",
"registration_time": "<DATETIME>",
"last_ui_login_time": "<DATETIME>",
"sso_account": "<SSO PROVIDER, IF SSO, ELSE USERPASS>",
"last_api_request_time": "<DATETIME>",
"total_api_requests_day": "<COUNT OF DAILY REQUESTS>",
"plan_id": "<PLAN ID>"
}
]
}
Plans endpoint
GET <HOST>/api/<VERSION>/plans
The following parameters are used on this endpoint.
- id (optional): the ID of the plan. Note, if the request is not made by a staff user, this must be a plan visible to the user.
- metadata (optional): all responses contain full _metadata of the request and general information about the response. Default is true, but the user can set it to false if they don’t want this data in the response.
Note, for plans that are custom or archived, only staff users can see all of them. Users on custom or archived plans can see, in addition to public plans, only the custom or archived plans they are subscribed to.
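The visibility rule can be expressed as a small filter. This is only a sketch: the state values ("public", "custom", "archived"), the single subscribed plan per user and the visible_plans function are assumptions for illustration.

```python
def visible_plans(plans: list, is_staff: bool, subscribed_plan_id: str) -> list:
    """Return the plans a requester is allowed to see."""
    if is_staff:
        # Staff users see every plan, including custom and archived ones.
        return plans
    # Everyone else sees public plans plus the plan they are subscribed to.
    return [
        plan
        for plan in plans
        if plan["state"] == "public" or plan["id"] == subscribed_plan_id
    ]


plans = [
    {"id": "free", "state": "public"},
    {"id": "acme", "state": "custom"},
    {"id": "legacy", "state": "archived"},
]
user_view = [p["id"] for p in visible_plans(plans, False, "acme")]
staff_view = [p["id"] for p in visible_plans(plans, True, "free")]
print(user_view, staff_view)
```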
{
"_metadata": {
"timestamp": "<TIMESTAMP OF REQUEST>",
"host": "<CVE2STIX HOST>",
"version": "<API VERSION>",
"endpoint": "plans",
"parameters": [
"<PARAMETERS USED N>"
],
"page_number": "<CURRENT PAGE, 0 RATED>",
"page_max_count": 50,
"results_page_count": "<COUNT OF RESULTS ON THIS PAGE>",
"result_total_count": "<TOTAL COUNT OF RESULTS ON ALL PAGES>",
"links": [
{"self": "<PATH>"},
{"first": "<PATH>"},
{"previous": "<PATH>"},
{"next": "<PATH>"},
{"last": "<PATH>"}
]
},
"plans": [
{
"plan": {
"id": "<PLAN ID>",
"name": "<PLAN NAME>",
"description": "<PLAN DESCRIPTION>",
"created": "<DATETIME>",
"state": "<PUBLIC/PRIVATE/CUSTOM>",
"daily_api_request_limit": "<DAILY API LIMIT QUOTA>",
"monthly_cost_usd": "<MONTHLY COST IN USD>",
"ui_stix_download": "<BOOLEAN>"
}
}
]
}
Error responses
The API (everything under /api) provides detailed error responses in JSON format.
Invalid request (400)
This is the most common error returned. cve2stix will throw this response if any part of the request is invalid, and will not action the request until it is corrected to be 100% valid.
Generally this error is returned if:
- the parameters used in the request do not exist, or an invalid combination of parameters is being used (e.g. a parameter that should only be used once is incorrectly being passed twice)
- the values for the parameters used in the request are invalid
{
"_metadata": {
"METADATA FOR ENDPOINT"
},
"errors": [
{
"status": 400,
"title": "<GENERIC TYPE OF ERROR>",
"detail": "<DETAIL ON WHAT EXACTLY FAILED>"
}
]
}
Note, this error will be returned on the first error identified. To explain this in more detail, let’s assume a request has two errors;
- the page parameter is used twice (only allowed once)
- the format of createdEndDate is incorrect (must be in format YYYY-MM-DDTHH:MM:SS)
The request looks like this;
GET ?page=2&page=3&createdEndDate=bad_value
As the page error is detected first by the server, the error response will be;
{
"_metadata": {
"timestamp": "2023-02-10T07:55:14.340",
"host": "https://www.cve2stix.com",
"version": "api/v1",
"endpoint": "cve",
"parameters": [
"?page=2&page=3&createdEndDate=bad_value"
],
"page_number": "",
"page_max_count": "",
"results_page_count": "",
"result_total_count": "",
"links": [
{"self": ""},
{"first": ""},
{"previous": ""},
{"next": ""},
{"last": ""}
]
},
"errors": [
{
"status": 400,
"title": "Too many parameters",
"detail": "page should only be passed once."
}
]
}
Hopefully the user will then correct this mistake, and will then be faced with the createdEndDate error on the next request. This is annoying, but it lets the user see very clearly what each error is, which helps with debugging.
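The fail-on-first-error behaviour could be implemented along these lines. This is a sketch under assumptions: the list of once-only parameters is a partial guess, the title used for the date error is hypothetical (only the "Too many parameters" error appears above), and first_error is not the actual cve2stix validator.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import parse_qs

SINGLE_USE = ("page", "createdEndDate")  # assumed subset of once-only parameters
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"  # YYYY-MM-DDTHH:MM:SS


def first_error(query_string: str) -> Optional[dict]:
    """Return the first validation error found, or None if the query is valid."""
    params = parse_qs(query_string.lstrip("?"))
    # Check 1: parameters that may only be passed once.
    for name in SINGLE_USE:
        if len(params.get(name, [])) > 1:
            return {
                "status": 400,
                "title": "Too many parameters",
                "detail": f"{name} should only be passed once.",
            }
    # Check 2: date values must parse with the documented format.
    for value in params.get("createdEndDate", []):
        try:
            datetime.strptime(value, DATE_FORMAT)
        except ValueError:
            return {
                "status": 400,
                "title": "Invalid parameter value",  # hypothetical title
                "detail": "createdEndDate must be in format YYYY-MM-DDTHH:MM:SS.",
            }
    return None


first = first_error("?page=2&page=3&createdEndDate=bad_value")
second = first_error("?page=2&createdEndDate=bad_value")
third = first_error("?page=2&createdEndDate=1990-11-11T05:00:00")
print(first, second, third, sep="\n")
```

Returning as soon as one check fails is what produces the one-error-per-request experience described above.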
Unauthorized request (401)
If the API key used to make the request is invalid, or does not have the permissions to view what is being requested, a 401 will be returned.
{
"_metadata": {
"METADATA FOR ENDPOINT"
},
"errors": [
{
"status": 401,
"title": "Unauthorized",
"detail": "Authentication failed"
}
]
}
Internal server error (500)
When the cve2stix server is experiencing issues, a 500 will be returned with a generic “retry later” message.
{
"_metadata": {
"METADATA FOR ENDPOINT"
},
"errors": [
{
"status": 500,
"title": "Internal Server Error",
"detail": "The server returned a 500 Internal Server Error. Please try again in 10 minutes."
}
]
}
Database design and data storage
Now that I knew the request and response design of the API, I could efficiently design a database/data-store structure for data to be retrieved with each request.
Beyond the initial backfill, the daily processing power required by cve2stix is very low – it’s converting and updating around a hundred entries per day.
Introducing the ability for users to query data via the endpoints adds a little more overhead, but the size of each bundle is very small.
First, it’s important to point out how cve2stix stores data.
On setup a user can select to use local storage or AWS S3. The Objects generated by the STIX2 Library are stored in a structured manner in a local database (used by the API) and as full raw STIX objects in the filesystem or remotely on S3 (for easier manual lookup).
Using the STIX2 Python Library, cve2stix stores a copy of all STIX objects it creates;
stix_data
    cve
        <CVE_ID>
            bundles
                <BUNDLE VERSION>.json
            objects
                <OBJECT TYPE>
                    <OBJECT VERSION>.json
    cpe
        <CPE_URI>
            objects
                <OBJECT TYPE>
                    <OBJECT VERSION>.json
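To illustrate the layout, the sketch below builds storage paths for a bundle and for an individual object. The function names are hypothetical, and the example values are taken from the S3 requests shown later in this post.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("stix_data")


def bundle_path(cve_id: str, bundle_version: str) -> str:
    """Path of one bundle version for a CVE."""
    return str(ROOT / "cve" / cve_id / "bundles" / f"{bundle_version}.json")


def object_path(kind: str, identifier: str, object_type: str, object_version: str) -> str:
    """Path of one object version; kind is "cve" or "cpe"."""
    return str(ROOT / kind / identifier / "objects" / object_type / f"{object_version}.json")


bundle = bundle_path("CVE-2022-29384", "20220217101756888626")
vuln = object_path("cve", "CVE-2022-29384", "vulnerability", "20220217101303144117")
print(bundle)
print(vuln)
```

PurePosixPath keeps forward slashes on every platform, which suits both a local filestore and S3 object keys.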
Whilst there are hundreds of thousands of CVE/CPE objects, cve2stix minifies the STIX 2.1 Objects it creates, so the entire dataset as it stands is around 5 GB.
My worry is that thousands of queries against the API will affect performance, but at this point in time that is wishful thinking. Therefore I designed to scale, but wasn’t worried about seeking out marginal performance improvements.
There are two entry points to CVE/CPE data via the API:
- Get a list of CVEs/CPEs based on some criteria (GET /cve/ or GET /cpe/)
- Print objects related to a specific CVE (GET cve/<CVE_ID>), or print a specific CPE (GET cpe/<CPE_URI>)
In the case of CVEs, the CVE object database table can be used efficiently to query this data. I decided not to use a summarised table for the lookup as there are a number of parameters present on the CVE and data in the response that can simply be pulled directly from this table.
For GET /cpe/, a summary table of vendors and products is used. I wanted to create a summary table in the DB for this data because, although it changes with every update, it is fairly static. For the GET cpe/<CPE_URI> endpoint, the CPE object database table is used directly, as is done for CVEs.
The database tables for these are fairly trivial to store and maintain.
The STIX2 Python library names the bundle and object versions using the date of generation (see: Building cve2stix: Modelling NVD Data as STIX 2.1 Objects (part 3)). This means many versions of the same object exist in the DBs, but a user will typically only want the latest. Using the date of generation in the filename makes it easy to select the latest file by picking the .json filename with the largest number in the directory (or a specific other version by date).
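Selecting the latest version from a directory listing then reduces to a numeric maximum over the filename stems. A minimal sketch, assuming every filename is a numeric timestamp followed by .json (latest_version is a hypothetical helper):

```python
from pathlib import Path


def latest_version(filenames: list) -> str:
    """Return the filename whose numeric (timestamp) stem is largest."""
    # Compare as integers so stems of different lengths still order correctly.
    return max(filenames, key=lambda name: int(Path(name).stem))


files = [
    "20220217101303144117.json",
    "20220217101756888626.json",
    "20210101000000000000.json",
]
latest = latest_version(files)
print(latest)
```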
As noted earlier, a copy of all the JSON STIX objects is also stored in addition to the DB object records.
One of the simplest ways to handle simple file storage, besides locally, for a web app is using a cloud storage service like AWS S3. Therefore in addition to local storage, cve2stix can be configured to leverage an S3 bucket to store and reference the STIX 2.1 Objects (in addition to the local filestore).
The benefit is that the local database can be made aware of the Object reference in S3 on creation. That means using the S3 API we can query a specific object directly, for example, to query a specific bundle;
GET /stix_data/cve/CVE-2022-29384/bundles/20220217101756888626.json HTTP/1.1
Host: cve2stix.s3.eu-west-2.amazonaws.com
Date: Mon, 3 Oct 2022 22:32:00 GMT
Authorization: authorization string
Or a specific object;
GET /stix_data/cve/CVE-2022-29384/objects/vulnerability/20220217101303144117.json HTTP/1.1
Host: cve2stix.s3.eu-west-2.amazonaws.com
Date: Mon, 3 Oct 2022 22:32:00 GMT
Authorization: authorization string
Whilst this is not used by the cve2stix API to return objects, having the files stored remotely (and the object URLs referenced in DB tables, as they would be if local filestore was set) means it’s easier for users using cloud services to create a backup of objects away from the local machine.
Next up: Creating a hosted version of cve2stix
In many cases users won’t want to install and maintain their own copy of cve2stix.
For these users we decided to host our own version of cve2stix with a complete backfill of data, including the first batch of CVEs ever published (for posterity more than anything, assuming cve2stix is not lost to time like many software products!).
Next time I’ll show how we designed a web app on top of the existing cve2stix features to make it as easy as possible for users (and us) to manage as a service.