Database design

## JSON backend

All the data should be stored in individual JSON files, one for each vulnerability. These files should be stored in `db/` and the name should be `<vuln-id>-<vulnerability-name>.json` , where `<vuln-id>` is a unique ID (integer) that we'll use to reference/find the vulnerabilities and the `<vulnerability-name>` is a human readable name to make it easier for us to find a vulnerability in the output of an "ls" command. The vulnerability name should be dash separated. Examples:
- `1-cross-site-scripting.json`
- `2-sql-injection.json`
- `3-missing-hsts-header.json`

JSON files should look similar to this example:

``` json
{
  "id": 12345,
  "title": "Cross-Site Scripting",
  "description": "A very long description for Cross-Site Scripting",
  "severity": "medium",
  "wasc": ["0003"],
  "tags": ["xss", "client side"],
  "cwe": ["0003", "0007"],
  "owasp_top_10": {
        "2010": [1],
        "2013": [2]
  },
  "fix": {
      "guidance": "A very long text explaining how to fix XSS vulnerabilities",
      "effort": 50
    },
  "references": [
      {"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
      {"url": "http://asp.net/xss", "title": "How to fix XSS vulns in ASP.NET"},
      {"url": "http://owasp.org/xss", "title": "OWASP desc for XSS"}
    ]
  }
```

Notes about the JSON file:
- "id" holds the vulnerability unique ID. The contents of the "id" field must match filename
- The "OWASP" fields points to the OWASP Top10 category. We have different versions of this (2010 and 2013) to match the different releases of the OWASP Top10 project
- "fix-effort" gives the user an idea of how much time it will take to solve this vulnerability, it's stored in minutes
- Since a long blob of unformatted text is hard to read for any user, the **description and solution text MUST use markdown for formatting**. There are various python libraries which translate markdown to HTML, which will allow us to show the text in the GUI and or Web UI.
## Long strings in JSON

JSON has an awful limitation: "No multi-line strings". In our case this is very limiting since we don't want to have awfully long lines in these sections:

```
  "description": "A very long description for Cross-Site Scripting",
  "solution": "A very long text explaining how to fix XSS vulnerabilities",
```

So, the parser should check if the `description` and `solution` fields are strings or arrays. If they are arrays, the contents of the array should be joined using an empty space. For example:

```
  "description": ["A very long description for Cross-Site Scripting",
                         " which has more than one line"]
```

Will be parsed as `A very long description for Cross-Site Scripting which has more than one line` and then if the user wants to enter new lines, he needs to do so explicitly:

```
  "description": ["Line1\n",
                         "Line2\n"]
```

Will be parsed as:

```
Line1
Line2
```
## Translations

One of the cool things about this architecture is that it will allow us to easily add translations. Just adding a new set of JSON files (keeping the vulnerability IDs) would work.

Implementation details:
- The English version will be in `db/`
- Unix's environment variable `LANG` is used for setting the language
- If there is no translation for the current `LANG` then the English version is used
- Translations will be in `db/ru/` , `db/es/` , etc.
- The translation json files will override the parts of the main JSON file which is in English, for example, the main file contains:

```
  "title": "Cross-Site Scripting",
```

And then the Russian translation file (for the same vulnerability id) says:

```
  "title": "Межсайтовый скриптинг",
```

And if the `LANG` environment variable is set to Russian then the python/ruby/go wrapper should return `Cross-Site Scripting in Russian` as the title for this vulnerability. Any field from the main JSON file can be overridden (for example you can override a link to wikipedia to point to the XSS description in Russian).
## Content

We'll use the content already created for the Arachni scanner, gently contributed to vulndb/data by Tasos.
## Unittesting
- Unittests must be in the `tests/` directory
- We should write a json-schema for our JSON, and then validate the JSON after each push with a unittest. See https://python-jsonschema.readthedocs.org/en/latest/ for more information about json-schemas
- Assert that the `severity` field is one of `high`, `medium`, `low`, `informational`
- Assert that these fields are present and contain at least 30 chars:
  - description
  - solution
- Assert that these fields are present:
  - id
  - title
  - fix_effort
  - severity
- Assert that all references have `url` and `title` fields, and that they are not empty
- For each value in WASC/OWASP, make sure that we can generate a link like `http://owasp.org/foo/<id>`, the URL is valid, site is online, not 404
- For each URL field in the schema, we need to check that:
  - It's a valid URL
  - The site is online, and no 404 is returned
- `fix_effort` value is in the right format
- The markdown in both description and solution are well formed
- The ID in the file name is the same as the one inside the JSON file
- The lines are not longer than 90 columns
## References

https://github.com/andresriancho/w3af/issues/53


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Database design #5

JSON backend

Long strings in JSON

Translations

Content

Unittesting

References

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Database design #5

Description

JSON backend

Long strings in JSON

Translations

Content

Unittesting

References

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions