Customizing Library Models for JavaScript¶

Beta Notice - Unstable API

Library customization using data extensions is currently in beta and subject to change.

Breaking changes to this format may occur while in beta.

JavaScript analysis can be customized by adding library models in data extension files.

A data extension for JavaScript is a YAML file of the form:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: <name of extensible predicate>
    data:
      - <tuple1>
      - <tuple2>
      - ...

The CodeQL library for JavaScript exposes the following extensible predicates:

sourceModel(type, path, kind)
sinkModel(type, path, kind)
typeModel(type1, type2, path)
summaryModel(type, path, input, output, kind)
barrierModel(type, path, kind)
barrierGuardModel(type, path, acceptingValue, kind)

We’ll explain how to use these using a few examples, and provide some reference material at the end of this article.

Example: Taint sink in the ‘execa’ package¶

In this example, we’ll show how to add the following argument, passed to execa, as a command-line injection sink:

import { shell } from "execa";
shell(cmd); // <-- add 'cmd' as a taint sink

Note that this sink is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["execa", "Member[shell].Argument[0]", "command-injection"]

The first column, "execa", identifies a set of values from which to begin the search for the sink. The string "execa" means we start at the places where the codebase imports the NPM package execa.
The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column.
- Member[shell] selects accesses to the shell member of the execa package.
- Argument[0] selects the first argument to calls to that member.
command-injection indicates that this is considered a sink for the command injection query.

Example: Taint sources from window ‘message’ events¶

In this example, we’ll show how the event.data expression below could be marked as a remote flow source:

window.addEventListener("message", function (event) {
  let data = event.data; // <-- add 'event.data' as a taint source
});

Note that this source is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the sourceModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "global",
          "Member[addEventListener].Argument[1].Parameter[0].Member[data]",
          "remote",
        ]

The first column, "global", begins the search at references to the global object (also known as window in browser contexts). This is a special JavaScript object that contains all global variables and methods.
Member[addEventListener] selects accesses to the addEventListener member.
Argument[1] selects the second argument of calls to that member (the argument containing the callback).
Parameter[0] selects the first parameter of the callback (the parameter named event).
Member[data] selects accesses to the data property of the event object.
Finally, the kind remote indicates that this is considered a source of remote flow.

In the next section, we’ll show how to restrict the model to recognize events of a specific type.

Continued example: Restricting the event type¶

The model above treats all events as sources of remote flow, not just message events. For example, it would also pick up this irrelevant source:

window.addEventListener("onclick", function (event) {
  let data = event.data; // <-- 'event.data' became a spurious taint source
});

We can refine the model by adding the WithStringArgument component to restrict the set of calls being considered:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "global",
          "Member[addEventListener].WithStringArgument[0=message].Argument[1].Parameter[0].Member[data]",
          "remote",
        ]

The WithStringArgument[0=message] component here selects the subset of calls to addEventListener where the first argument is a string literal with the value "message".

Example: Using types to add MySQL injection sinks¶

In this example, we’ll show how to add the following SQL injection sink:

import { Connection } from "mysql";

function submit(connection: Connection, q: string) {
  connection.query(q); // <-- add 'q' as a SQL injection sink
}

We need to add a tuple to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql.Connection", "Member[query].Argument[0]", "sql-injection"]

The first column, "mysql.Connection", begins the search at any expression whose value is known to be an instance of the Connection type from the mysql package. This will select the connection parameter above because of its type annotation.
Member[query] selects the query member from the connection object.
Argument[0] selects the first argument of a call to that member.
sql-injection indicates that this is considered a sink for the SQL injection query.

This works in this example because the connection parameter has a type annotation that matches what the model is looking for.

Note that there is a significant difference between the following two rows:

data:
- ["mysql.Connection", "", ...]
- ["mysql", "Member[Connection]", ...]

The first row matches instances of mysql.Connection, which are objects that encapsulate a MySQL connection. The second row would match something like require('mysql').Connection, which is not itself a connection object.

In the next section, we’ll show how to generalize the model to handle the absence of type annotations.

Continued example: Dealing with untyped code¶

Suppose we want the model from above to detect the sink in this snippet:

import { getConnection } from "@example/db";
let connection = getConnection();
connection.query(q); // <-- add 'q' as a SQL injection sink

There is no type annotation on connection, and there is no indication of what getConnection() returns. By adding a tuple to the typeModel(type1, type2, path) extensible predicate we can tell our model that this function returns an instance of mysql.Connection:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: typeModel
    data:
      - ["mysql.Connection", "@example/db", "Member[getConnection].ReturnValue"]

The first column, "mysql.Connection", names the type that we’re adding a new definition for.
The second column, "@example/db", begins the search at imports of the hypothetical NPM package @example/db.
Member[getConnection] selects references to the getConnection member from that package.
ReturnValue selects the return value from a call to that member.

The new model states that the return value of getConnection() has type mysql.Connection. Combining this with the sink model we added earlier, the sink in the example is detected by the model.

The mechanism used here is how library models work for both TypeScript and plain JavaScript. A good library model contains typeModel tuples to ensure it works even in codebases without type annotations. For example, the mysql model that is included with the CodeQL JS analysis includes this type definition (among many others):

- ["mysql.Connection", "mysql", "Member[createConnection].ReturnValue"]

Example: Using fuzzy models to simplify modeling¶

In this example, we’ll show how to add the following SQL injection sink using a “fuzzy” model:

import * as mysql from 'mysql';
const pool = mysql.createPool({...});
pool.getConnection((err, conn) => {
  conn.query(q, (err, rows) => {...}); // <-- add 'q' as a SQL injection sink
});

We need to add a tuple for a fuzzy model to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql", "Fuzzy.Member[query].Argument[0]", "sql-injection"]

The first column, "mysql", begins the search at places where the mysql package is imported.
Fuzzy selects all objects that appear to originate from the mysql package, such as the pool, conn, err, and rows objects.
Member[query] selects the query member from any of those objects. In this case, the only such member is conn.query. In principle, this would also find expressions such as pool.query and err.query, but in practice such expressions are not likely to occur, because the pool and err objects do not have a member named query.
Argument[0] selects the first argument of a call to the selected member, that is, the q argument to conn.query.
sql-injection indicates that this is considered as a sink for the SQL injection query.

For reference, a more detailed model might look like this, as described in the preceding examples:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql.Connection", "Member[query].Argument[0]", "sql-injection"]

  - addsTo:
      pack: codeql/javascript-all
      extensible: typeModel
    data:
      - ["mysql.Pool", "mysql", "Member[createPool].ReturnValue"]
      - ["mysql.Connection", "mysql.Pool", "Member[getConnection].Argument[0].Parameter[1]"]

The model using the Fuzzy component is simpler, at the cost of being approximate. This technique is useful when modeling a large or complex library, where it is difficult to write a detailed model.

Example: Adding flow through ‘decodeURIComponent’¶

In this example, we’ll show how to add flow through calls to decodeURIComponent:

let y = decodeURIComponent(x); // add taint flow from 'x' to 'y'

Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the summaryModel(type, path, input, output, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "global",
          "Member[decodeURIComponent]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]

The first column, "global", begins the search for relevant calls at references to the global object. In JavaScript, global variables are properties of the global object, so this lets us access global variables or functions.
The second column, Member[decodeURIComponent], is a path leading to the function calls we wish to model. In this case, we select references to the decodeURIComponent member from the global object, that is, the global variable named decodeURIComponent.
The third column, Argument[0], indicates the input of the flow. In this case, the first argument to the function call.
The fourth column, ReturnValue, indicates the output of the flow. In this case, the return value of the function call.
The last column, taint, indicates the kind of flow to add. The value taint means the output is not necessarily equal to the input, but was derived from the input in a taint-preserving way.

Example: Adding flow through ‘underscore.forEach’¶

In this example, we’ll show how to add flow through calls to forEach from the underscore package:

require('underscore').forEach([x, y], (v) => { ... }); // add value flow from 'x' and 'y' to 'v'

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "underscore",
          "Member[forEach]",
          "Argument[0].ArrayElement",
          "Argument[1].Parameter[0]",
          "value",
        ]

The first column, "underscore", begins the search for relevant calls at places where the underscore package is imported.
The second column, Member[forEach], selects references to the forEach member from the underscore package.
The third column specifies the input of the flow:
- Argument[0] selects the first argument of forEach, which is the array being iterated over.
- ArrayElement selects the elements of that array (the expressions x and y).
The fourth column specifies the output of the flow:
- Argument[1] selects the second argument of forEach (the argument containing the callback function).
- Parameter[0] selects the first parameter of the callback function (the parameter named v).
The last column, value, indicates the kind of flow to add. The value value means the input value is unchanged as it flows to the output.

Example: Modeling properties injected by a middleware function¶

In this example, we’ll show how to model a hypothetical middleware function that adds a tainted value on the incoming request objects:

const express = require('express')
const app = express()

app.use(require('@example/middleware').injectData())

app.get('/foo', (req, res) => {
  req.data; // <-- mark 'req.data' as a taint source
});

We need to add a tuple to the sourceModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "@example/middleware",
          "Member[injectData].ReturnValue.GuardedRouteHandler.Parameter[0].Member[data]",
          "remote",
        ]

The first column, "@example/middleware", begins the search at imports of the hypothetical NPM package @example/middleware.
Member[injectData] selects accesses to the injectData member.
ReturnValue selects the return value of the call to injectData.
GuardedRouteHandler interprets the current value as a middleware function and selects all route handlers guarded by that middleware. Since the current value is passd to app.use(), the callback subsequently passed to app.get() is seen as a guarded route handler.
Parameter[0] selects the first parameter of the callback (the parameter named req).
Member[data] selects accesses to the data property of the req object.
Finally, the kind remote indicates that this is considered a source of remote flow.

Example: Taint barrier using the ‘encodeURIComponent’ function¶

In this example, we’ll show how to add the return value of encodeURIComponent as a barrier for XSS.

let escaped = encodeURIComponent(input); // The return value of this method is safe for XSS.
document.body.innerHTML = escaped;

We need to add a tuple to the barrierModel(type, path, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: barrierModel
    data:
      - ["global", "Member[encodeURIComponent].ReturnValue", "html-injection"]

The first column, "global", begins the search for relevant calls at references to the global object.
The second column, Member[encodeURIComponent].ReturnValue, selects the return value of the encodeURIComponent function.
The third column, "html-injection", is the kind of the barrier.

Example: Add a barrier guard¶

This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. Consider a function called isValid which returns true when the data is considered safe.

if (isValid(userInput)) { // The check guards the use, so the input is safe.
  db.query(userInput); // This is safe.
}

We need to add a tuple to the barrierGuardModel(type, path, acceptingValue, kind) extensible predicate by updating a data extension file.

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: barrierGuardModel
    data:
      - ["my-package", "Member[isValid].Argument[0]", "true", "sql-injection"]

The first column, "my-package", begins the search at imports of the hypothetical NPM package my-package.
The second column, Member[isValid].Argument[0], selects the first argument of the isValid function. This is the value being validated.
The third column, "true", is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply.
The fourth column, "sql-injection", is the kind of the barrier guard.

Reference material¶

The following sections provide reference material for extensible predicates, access paths, types, and kinds.

Extensible predicates¶

sourceModel(type, path, kind)¶

Adds a new taint source. Most taint-tracking queries will use the new source.

type: Name of a type from which to evaluate path.
path: Access path leading to the source.
kind: Kind of source to add. See the section on source kinds for a list of supported kinds.

Example:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - ["global", "Member[user].Member[name]", "remote"]

sinkModel(type, path, kind)¶

Adds a new taint sink. Sinks are query-specific and will typically affect one or two queries.

type: Name of a type from which to evaluate path.
path: Access path leading to the sink.
kind: Kind of sink to add. See the section on sink kinds for a list of supported kinds.

Example:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["global", "Member[eval].Argument[0]", "code-injection"]

summaryModel(type, path, input, output, kind)¶

Adds flow through a function call.

type: Name of a type from which to evaluate path.
path: Access path leading to a function call.
input: Path relative to the function call that leads to input of the flow.
output: Path relative to the function call leading to the output of the flow.
kind: Kind of summary to add. Can be taint for taint-propagating flow, or value for value-preserving flow.

Example:

extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "global",
          "Member[decodeURIComponent]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]

typeModel(type1, type2, path)¶

Adds a new definition of a type.

type1: Name of the type to define.
type2: Name of the type from which to evaluate path.
path: Access path leading from type2 to type1.

Example:

extensions:
- addsTo:
    pack: codeql/javascript-all
    extensible: typeModel
  data:
    - [
        "mysql.Connection",
        "@example/db",
        "Member[getConnection].ReturnValue",
      ]

Types¶

A type is a string that identifies a set of values. In each of the extensible predicates mentioned in previous section, the first column is always the name of a type. A type can be defined by adding typeModel tuples for that type. Additionally, the following built-in types are available:

The name of an NPM package matches imports of that package. For example, the type express matches the expression require("express"). If the package name includes dots, it must be surrounded by single quotes, such as in 'lodash.escape'.
The type global identifies the global object, also known as window. In JavaScript, global variables are properties of the global object, so global variables can be identified using this type. (This type also matches imports of the NPM package named global, which is a package that happens to export the global object.)
A qualified type name of form <package>.<type> identifies expressions of type <type> from <package>. For example, mysql.Connection identifies expression of type Connection from the mysql package. Note that this only works if type annotations are present in the codebase, or if sufficient typeModel tuples have been provided for that type.

Access paths¶

The path, input, and output columns consist of a .-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values.

The following components are supported:

Argument[number] selects the argument at the given index.
Argument[this] selects the receiver of a method call.
Parameter[number] selects the parameter at the given index.
Parameter[this] selects the this parameter of a function.
ReturnValue selects the return value of a function or call.
Member[name] selects the property with the given name.
AnyMember selects any property regardless of name.
ArrayElement selects an element of an array.
MapValue selects a value of a map object.
Awaited selects the value of a promise.
Instance selects instances of a class, including instances of its subclasses.
Fuzzy selects all values that are derived from the current value through a combination of the other operations described in this list. For example, this can be used to find all values that appear to originate from a particular package. This can be useful for finding method calls from a known package, but where the receiver type is not known or is difficult to model.

The following components are called “call site filters”. They select a subset of the previously-selected calls, if the call fits certain criteria:

WithArity[number] selects the subset of calls that have the given number of arguments.
WithStringArgument[number=value] selects the subset of calls where the argument at the given index is a string literal with the given value.

Components related to decorators:

DecoratedClass selects a class that has the current value as a decorator. For example, Member[Component].DecoratedClass selects any class that is decorated with @Component.
DecoratedParameter selects a parameter that is decorated by the current value.
DecoratedMember selects a method, field, or accessor that is decorated by the current value.

Additionally there is a component related to middleware functions:

GuardedRouteHandler interprets the current value as a middleware function, and selects any route handler function that comes after it in the routing hierarchy. This can be used to model properties injected onto request and response objects, such as req.db after a middleware that injects a database connection. Note that this currently over-approximates the set of route handlers but may be made more accurate in the future.

Additional notes about the syntax of operands:

Multiple operands may be given to a single component, as a shorthand for the union of the operands. For example, Member[foo,bar] matches the union of Member[foo] and Member[bar].
Numeric operands to Argument, Parameter, and WithArity may be given as an interval. For example, Argument[0..2] matches argument 0, 1, or 2.
Argument[N-1] selects the last argument of a call, and Parameter[N-1] selects the last parameter of a function, with N-2 being the second-to-last and so on.

Kinds¶

Source kinds¶

remote: A general source of remote flow.
browser: A source in the browser environment that does not fit a more specific browser kind.
browser-url-query: A source derived from the query parameters of the browser URL, such as location.search.
browser-url-fragment: A source derived from the fragment part of the browser URL, such as location.hash.
browser-url-path: A source derived from the pathname of the browser URL, such as location.pathname.
browser-url: A source derived from the browser URL, where the untrusted part is prefixed by trusted data such as the scheme and hostname.
browser-window-name: A source derived from the window name, such as window.name.
browser-message-event: A source derived from cross-window message passing, such as event in window.onmessage = event => {...}.

Sink kinds¶

Unlike sources, sinks tend to be highly query-specific, rarely affecting more than one or two queries. Not every query supports customizable sinks. If the following sinks are not suitable for your use case, you should add a new query.

code-injection: A sink that can be used to inject code, such as in calls to eval.
command-injection: A sink that can be used to inject shell commands, such as in calls to child_process.spawn.
path-injection: A sink that can be used for path injection in a file system access, such as in calls to fs.readFile.
sql-injection: A sink that can be used for SQL injection, such as in a MySQL query call.
nosql-injection: A sink that can be used for NoSQL injection, such as in a MongoDB findOne call.
html-injection: A sink that can be used for HTML injection, such as in a jQuery $() call.
request-forgery: A sink that controls the URL of a request, such as in a fetch call.
url-redirection: A sink that can be used to redirect the user to a malicious URL.
unsafe-deserialization: A deserialization sink that can lead to code execution or other unsafe behaviour, such as an unsafe YAML parser.
log-injection: A sink that can be used for log injection, such as in a console.log call.

Summary kinds¶

taint: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well.
value: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved.

Threat models¶

Note

Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C#, Python and JavaScript/TypeScript analysis.

A threat model is a named class of dataflow sources that can be enabled or disabled independently. Threat models allow you to control the set of dataflow sources that you want to consider unsafe. For example, one codebase may only consider remote HTTP requests to be tainted, whereas another may also consider data from local files to be unsafe. You can use threat models to ensure that the relevant taint sources are used in a CodeQL analysis.

The kind property of the sourceModel determines which threat model a source is associated with. There are two main categories:

remote which represents requests and responses from the network.
local which represents data from local files (file), command-line arguments (commandargs), database reads (database), environment variables(environment), standard input (stdin) and Windows registry values (“windows-registry”). Currently, Windows registry values are used by C# only.

Note that subcategories can be turned included or excluded separately, so you can specify local without database, or just commandargs and environment without the rest of local.

The less commonly used categories are:

android which represents reads from external files in Android (android-external-storage-dir) and parameter of an entry-point method declared in a ContentProvider class (contentprovider). Currently only used by Java/Kotlin.
database-access-result which represents a database access. Currently only used by JavaScript.
file-write which represents opening a file in write mode. Currently only used in C#.
reverse-dns which represents reverse DNS lookups. Currently only used in Java.
view-component-input which represents inputs to a React, Vue, or Angular component (also known as “props”). Currently only used by JavaScript/TypeScript.

When running a CodeQL analysis, the remote threat model is included by default. You can optionally include other threat models as appropriate when using the CodeQL CLI and in GitHub code scanning. For more information, see Analyzing your code with CodeQL queries and Customizing your advanced setup for code scanning.