Customizing Library Models for JavaScript
Beta Notice - Unstable API
Library customization using data extensions is currently in beta and subject to change.
Breaking changes to this format may occur while in beta.
JavaScript analysis can be customized by adding library models in data extension files.
A data extension for JavaScript is a YAML file of the form:
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: <name of extensible predicate>
    data:
      - <tuple1>
      - <tuple2>
      - ...
The CodeQL library for JavaScript exposes the following extensible predicates:
sourceModel(type, path, kind)
sinkModel(type, path, kind)
typeModel(type1, type2, path)
summaryModel(type, path, input, output, kind)
barrierModel(type, path, kind)
barrierGuardModel(type, path, acceptingValue, kind)
We’ll explain how to use these using a few examples, and provide some reference material at the end of this article.
Example: Taint sink in the ‘execa’ package
In this example, we’ll show how to add the following argument, passed to execa, as a command-line injection sink:
import { shell } from "execa";
shell(cmd); // <-- add 'cmd' as a taint sink
Note that this sink is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the
sinkModel(type, path, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["execa", "Member[shell].Argument[0]", "command-injection"]
The first column, "execa", identifies a set of values from which to begin the search for the sink. The string "execa" means we start at the places where the codebase imports the NPM package execa.
The second column is an access path that is evaluated from left to right, starting at the values that were identified by the first column.
Member[shell] selects accesses to the shell member of the execa package.
Argument[0] selects the first argument to calls to that member.
command-injection indicates that this is considered a sink for the command injection query.
Example: Taint sources from window ‘message’ events
In this example, we’ll show how the event.data expression below could be marked as a remote flow source:
window.addEventListener("message", function (event) {
  let data = event.data; // <-- add 'event.data' as a taint source
});
Note that this source is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the
sourceModel(type, path, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "global",
          "Member[addEventListener].Argument[1].Parameter[0].Member[data]",
          "remote",
        ]
The first column, "global", begins the search at references to the global object (also known as window in browser contexts). This is a special JavaScript object that contains all global variables and methods.
Member[addEventListener] selects accesses to the addEventListener member.
Argument[1] selects the second argument of calls to that member (the argument containing the callback).
Parameter[0] selects the first parameter of the callback (the parameter named event).
Member[data] selects accesses to the data property of the event object.
Finally, the kind remote indicates that this is considered a source of remote flow.
In the next section, we’ll show how to restrict the model to recognize events of a specific type.
Continued example: Restricting the event type
The model above treats all events as sources of remote flow, not just message events.
For example, it would also pick up this irrelevant source:
window.addEventListener("onclick", function (event) {
  let data = event.data; // <-- 'event.data' became a spurious taint source
});
We can refine the model by adding the WithStringArgument component to restrict the set of calls being considered:
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "global",
          "Member[addEventListener].WithStringArgument[0=message].Argument[1].Parameter[0].Member[data]",
          "remote",
        ]
The WithStringArgument[0=message] component here selects the subset of calls to addEventListener where the first argument is a string literal with the value "message".
Example: Using types to add MySQL injection sinks
In this example, we’ll show how to add the following SQL injection sink:
import { Connection } from "mysql";
function submit(connection: Connection, q: string) {
  connection.query(q); // <-- add 'q' as a SQL injection sink
}
We need to add a tuple to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql.Connection", "Member[query].Argument[0]", "sql-injection"]
The first column, "mysql.Connection", begins the search at any expression whose value is known to be an instance of the Connection type from the mysql package. This will select the connection parameter above because of its type annotation.
Member[query] selects the query member from the connection object.
Argument[0] selects the first argument of a call to that member.
sql-injection indicates that this is considered a sink for the SQL injection query.
This works in this example because the connection parameter has a type annotation that matches what the model is looking for.
Note that there is a significant difference between the following two rows:
data:
  - ["mysql.Connection", "", ...]
  - ["mysql", "Member[Connection]", ...]
The first row matches instances of mysql.Connection, which are objects that encapsulate a MySQL connection.
The second row would match something like require('mysql').Connection, which is not itself a connection object.
In the next section, we’ll show how to generalize the model to handle the absence of type annotations.
Continued example: Dealing with untyped code
Suppose we want the model from above to detect the sink in this snippet:
import { getConnection } from "@example/db";
let connection = getConnection();
connection.query(q); // <-- add 'q' as a SQL injection sink
There is no type annotation on connection, and there is no indication of what getConnection() returns.
By adding a tuple to the typeModel(type1, type2, path) extensible predicate we can tell our model that
this function returns an instance of mysql.Connection:
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: typeModel
    data:
      - ["mysql.Connection", "@example/db", "Member[getConnection].ReturnValue"]
The first column, "mysql.Connection", names the type that we’re adding a new definition for.
The second column, "@example/db", begins the search at imports of the hypothetical NPM package @example/db.
Member[getConnection] selects references to the getConnection member from that package.
ReturnValue selects the return value from a call to that member.
The new model states that the return value of getConnection() has type mysql.Connection.
Combining this with the sink model we added earlier, the sink in the example is detected by the model.
The mechanism used here is how library models work for both TypeScript and plain JavaScript.
A good library model contains typeModel tuples to ensure it works even in codebases without type annotations.
For example, the mysql model that is included with the CodeQL JS analysis includes this type definition (among many others):
- ["mysql.Connection", "mysql", "Member[createConnection].ReturnValue"]
Example: Using fuzzy models to simplify modeling
In this example, we’ll show how to add the following SQL injection sink using a “fuzzy” model:
import * as mysql from 'mysql';
const pool = mysql.createPool({...});
pool.getConnection((err, conn) => {
  conn.query(q, (err, rows) => {...}); // <-- add 'q' as a SQL injection sink
});
We need to add a tuple for a fuzzy model to the sinkModel(type, path, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql", "Fuzzy.Member[query].Argument[0]", "sql-injection"]
The first column, "mysql", begins the search at places where the mysql package is imported.
Fuzzy selects all objects that appear to originate from the mysql package, such as the pool, conn, err, and rows objects.
Member[query] selects the query member from any of those objects. In this case, the only such member is conn.query. In principle, this would also find expressions such as pool.query and err.query, but in practice such expressions are not likely to occur, because the pool and err objects do not have a member named query.
Argument[0] selects the first argument of a call to the selected member, that is, the q argument to conn.query.
sql-injection indicates that this is considered a sink for the SQL injection query.
For reference, a more detailed model might look like this, as described in the preceding examples:
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["mysql.Connection", "Member[query].Argument[0]", "sql-injection"]
  - addsTo:
      pack: codeql/javascript-all
      extensible: typeModel
    data:
      - ["mysql.Pool", "mysql", "Member[createPool].ReturnValue"]
      - ["mysql.Connection", "mysql.Pool", "Member[getConnection].Argument[0].Parameter[1]"]
The model using the Fuzzy component is simpler, at the cost of being approximate.
This technique is useful when modeling a large or complex library, where it is difficult to write a detailed model.
Example: Adding flow through ‘decodeURIComponent’
In this example, we’ll show how to add flow through calls to decodeURIComponent:
let y = decodeURIComponent(x); // add taint flow from 'x' to 'y'
Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the
summaryModel(type, path, input, output, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "global",
          "Member[decodeURIComponent]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]
The first column, "global", begins the search for relevant calls at references to the global object. In JavaScript, global variables are properties of the global object, so this lets us access global variables or functions.
The second column, Member[decodeURIComponent], is a path leading to the function calls we wish to model. In this case, we select references to the decodeURIComponent member from the global object, that is, the global variable named decodeURIComponent.
The third column, Argument[0], indicates the input of the flow. In this case, the first argument to the function call.
The fourth column, ReturnValue, indicates the output of the flow. In this case, the return value of the function call.
The last column, taint, indicates the kind of flow to add. The value taint means the output is not necessarily equal to the input, but was derived from the input in a taint-preserving way.
Example: Adding flow through ‘underscore.forEach’
In this example, we’ll show how to add flow through calls to forEach from the underscore package:
require('underscore').forEach([x, y], (v) => { ... }); // add value flow from 'x' and 'y' to 'v'
Note that this flow is already recognized by the CodeQL JS analysis, but for this example, you could add a tuple to the
summaryModel(type, path, input, output, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "underscore",
          "Member[forEach]",
          "Argument[0].ArrayElement",
          "Argument[1].Parameter[0]",
          "value",
        ]
The first column, "underscore", begins the search for relevant calls at places where the underscore package is imported.
The second column, Member[forEach], selects references to the forEach member from the underscore package.
The third column specifies the input of the flow:
Argument[0] selects the first argument of forEach, which is the array being iterated over.
ArrayElement selects the elements of that array (the expressions x and y).
The fourth column specifies the output of the flow:
Argument[1] selects the second argument of forEach (the argument containing the callback function).
Parameter[0] selects the first parameter of the callback function (the parameter named v).
The last column, value, indicates the kind of flow to add. The value value means the input value is unchanged as it flows to the output.
Example: Modeling properties injected by a middleware function
In this example, we’ll show how to model a hypothetical middleware function that adds a tainted value on the incoming request objects:
const express = require('express')
const app = express()
app.use(require('@example/middleware').injectData())
app.get('/foo', (req, res) => {
  req.data; // <-- mark 'req.data' as a taint source
});
We need to add a tuple to the sourceModel(type, path, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - [
          "@example/middleware",
          "Member[injectData].ReturnValue.GuardedRouteHandler.Parameter[0].Member[data]",
          "remote",
        ]
The first column, "@example/middleware", begins the search at imports of the hypothetical NPM package @example/middleware.
Member[injectData] selects accesses to the injectData member.
ReturnValue selects the return value of the call to injectData.
GuardedRouteHandler interprets the current value as a middleware function and selects all route handlers guarded by that middleware. Since the current value is passed to app.use(), the callback subsequently passed to app.get() is seen as a guarded route handler.
Parameter[0] selects the first parameter of the callback (the parameter named req).
Member[data] selects accesses to the data property of the req object.
Finally, the kind remote indicates that this is considered a source of remote flow.
Example: Taint barrier using the ‘encodeURIComponent’ function
In this example, we’ll show how to add the return value of encodeURIComponent as a barrier for XSS.
let escaped = encodeURIComponent(input); // The return value of this method is safe for XSS.
document.body.innerHTML = escaped;
We need to add a tuple to the barrierModel(type, path, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: barrierModel
    data:
      - ["global", "Member[encodeURIComponent].ReturnValue", "html-injection"]
The first column, "global", begins the search for relevant calls at references to the global object.
The second column, Member[encodeURIComponent].ReturnValue, selects the return value of the encodeURIComponent function.
The third column, "html-injection", is the kind of the barrier.
Example: Add a barrier guard
This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data. Consider a function called isValid which returns true when the data is considered safe.
if (isValid(userInput)) { // The check guards the use, so the input is safe.
  db.query(userInput); // This is safe.
}
We need to add a tuple to the barrierGuardModel(type, path, acceptingValue, kind) extensible predicate by updating a data extension file.
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: barrierGuardModel
    data:
      - ["my-package", "Member[isValid].Argument[0]", "true", "sql-injection"]
The first column, "my-package", begins the search at imports of the hypothetical NPM package my-package.
The second column, Member[isValid].Argument[0], selects the first argument of the isValid function. This is the value being validated.
The third column, "true", is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply.
The fourth column, "sql-injection", is the kind of the barrier guard.
Reference material
The following sections provide reference material for extensible predicates, access paths, types, and kinds.
Extensible predicates
sourceModel(type, path, kind)
Adds a new taint source. Most taint-tracking queries will use the new source.
type: Name of a type from which to evaluate path.
path: Access path leading to the source.
kind: Kind of source to add. See the section on source kinds for a list of supported kinds.
Example:
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      - ["global", "Member[user].Member[name]", "remote"]
sinkModel(type, path, kind)
Adds a new taint sink. Sinks are query-specific and will typically affect one or two queries.
type: Name of a type from which to evaluate path.
path: Access path leading to the sink.
kind: Kind of sink to add. See the section on sink kinds for a list of supported kinds.
Example:
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      - ["global", "Member[eval].Argument[0]", "code-injection"]
summaryModel(type, path, input, output, kind)
Adds flow through a function call.
type: Name of a type from which to evaluate path.
path: Access path leading to a function call.
input: Path relative to the function call that leads to input of the flow.
output: Path relative to the function call leading to the output of the flow.
kind: Kind of summary to add. Can be taint for taint-propagating flow, or value for value-preserving flow.
Example:
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      - [
          "global",
          "Member[decodeURIComponent]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]
typeModel(type1, type2, path)
Adds a new definition of a type.
type1: Name of the type to define.
type2: Name of the type from which to evaluate path.
path: Access path leading from type2 to type1.
Example:
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: typeModel
    data:
      - [
          "mysql.Connection",
          "@example/db",
          "Member[getConnection].ReturnValue",
        ]
Types
A type is a string that identifies a set of values.
In each of the extensible predicates mentioned in the previous section, the first column is always the name of a type.
A type can be defined by adding typeModel tuples for that type. Additionally, the following built-in types are available:
The name of an NPM package matches imports of that package. For example, the type express matches the expression require("express"). If the package name includes dots, it must be surrounded by single quotes, such as in 'lodash.escape'.
The type global identifies the global object, also known as window. In JavaScript, global variables are properties of the global object, so global variables can be identified using this type. (This type also matches imports of the NPM package named global, which is a package that happens to export the global object.)
A qualified type name of the form <package>.<type> identifies expressions of type <type> from <package>. For example, mysql.Connection identifies expressions of type Connection from the mysql package. Note that this only works if type annotations are present in the codebase, or if sufficient typeModel tuples have been provided for that type.
Access paths
The path, input, and output columns consist of a .-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values.
The following components are supported:
Argument[number] selects the argument at the given index.
Argument[this] selects the receiver of a method call.
Parameter[number] selects the parameter at the given index.
Parameter[this] selects the this parameter of a function.
ReturnValue selects the return value of a function or call.
Member[name] selects the property with the given name.
AnyMember selects any property regardless of name.
ArrayElement selects an element of an array.
MapValue selects a value of a map object.
Awaited selects the value of a promise.
Instance selects instances of a class, including instances of its subclasses.
Fuzzy selects all values that are derived from the current value through a combination of the other operations described in this list. For example, this can be used to find all values that appear to originate from a particular package. This can be useful for finding method calls from a known package, but where the receiver type is not known or is difficult to model.
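As a rough illustration of how some of these components combine, the sketch below adds two flow summaries for a hypothetical package @example/db. The package, its fetchRows and Builder members, and their behaviour are assumptions made up for this example and do not describe a real library.

```yaml
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      # Hypothetical: fetchRows(query) returns a promise. Taint flows from the
      # first argument into the value that the promise resolves to.
      - ["@example/db", "Member[fetchRows]", "Argument[0]", "ReturnValue.Awaited", "taint"]
      # Hypothetical: append(x) on a Builder instance returns the receiver, so the
      # receiver flows unchanged to the return value.
      - ["@example/db", "Member[Builder].Instance.Member[append]", "Argument[this]", "ReturnValue", "value"]
```

In the second row, Instance moves from the Builder class to its instances before Member[append] selects the method, and Argument[this] refers to the receiver of the call.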
The following components are called “call site filters”. They select a subset of the previously-selected calls, if the call fits certain criteria:
WithArity[number] selects the subset of calls that have the given number of arguments.
WithStringArgument[number=value] selects the subset of calls where the argument at the given index is a string literal with the given value.
Components related to decorators:
DecoratedClass selects a class that has the current value as a decorator. For example, Member[Component].DecoratedClass selects any class that is decorated with @Component.
DecoratedParameter selects a parameter that is decorated by the current value.
DecoratedMember selects a method, field, or accessor that is decorated by the current value.
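The decorator components might be used along the following lines. This is only a sketch: the package @example/framework and its Body and Controller decorators are hypothetical, as is the handle method name.

```yaml
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      # Hypothetical: a parameter decorated with @Body() receives request data.
      # The decorator is the result of calling Body(), hence ReturnValue.
      - ["@example/framework", "Member[Body].ReturnValue.DecoratedParameter", "remote"]
      # Hypothetical: in a class decorated with @Controller (used without a call),
      # the first parameter of the handle() method receives request data.
      - ["@example/framework", "Member[Controller].DecoratedClass.Instance.Member[handle].Parameter[0]", "remote"]
```

Whether ReturnValue is needed before the decorator component depends on whether the decorator is written as a call (@Body()) or as a plain reference (@Controller).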
Additionally there is a component related to middleware functions:
GuardedRouteHandler interprets the current value as a middleware function, and selects any route handler function that comes after it in the routing hierarchy. This can be used to model properties injected onto request and response objects, such as req.db after a middleware that injects a database connection. Note that this currently over-approximates the set of route handlers but may be made more accurate in the future.
Additional notes about the syntax of operands:
Multiple operands may be given to a single component, as a shorthand for the union of the operands. For example, Member[foo,bar] matches the union of Member[foo] and Member[bar].
Numeric operands to Argument, Parameter, and WithArity may be given as an interval. For example, Argument[0..2] matches argument 0, 1, or 2.
Argument[N-1] selects the last argument of a call, and Parameter[N-1] selects the last parameter of a function, with N-2 being the second-to-last and so on.
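The operand syntax described above could be combined as in the following sketch. The packages @example/shell and @example/paths and all of their members are hypothetical and serve only to show the notation.

```yaml
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sinkModel
    data:
      # Union of operands: the first argument of either run() or runSync().
      - ["@example/shell", "Member[run,runSync].Argument[0]", "command-injection"]
      # Interval: any of the first three arguments of join().
      - ["@example/paths", "Member[join].Argument[0..2]", "path-injection"]
      # Call site filter: only calls to open() that have exactly two arguments.
      - ["@example/paths", "Member[open].WithArity[2].Argument[0]", "path-injection"]
      # Last argument: whatever is passed in the final position of write().
      - ["@example/paths", "Member[write].Argument[N-1]", "path-injection"]
```

Each row is independent, so a model file can mix these forms freely.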
Kinds
Source kinds
remote: A general source of remote flow.
browser: A source in the browser environment that does not fit a more specific browser kind.
browser-url-query: A source derived from the query parameters of the browser URL, such as location.search.
browser-url-fragment: A source derived from the fragment part of the browser URL, such as location.hash.
browser-url-path: A source derived from the pathname of the browser URL, such as location.pathname.
browser-url: A source derived from the browser URL, where the untrusted part is prefixed by trusted data such as the scheme and hostname.
browser-window-name: A source derived from the window name, such as window.name.
browser-message-event: A source derived from cross-window message passing, such as event in window.onmessage = event => {...}.
See also Threat models.
Sink kinds
Unlike sources, sinks tend to be highly query-specific, rarely affecting more than one or two queries. Not every query supports customizable sinks. If the following sinks are not suitable for your use case, you should add a new query.
code-injection: A sink that can be used to inject code, such as in calls to eval.
command-injection: A sink that can be used to inject shell commands, such as in calls to child_process.spawn.
path-injection: A sink that can be used for path injection in a file system access, such as in calls to fs.readFile.
sql-injection: A sink that can be used for SQL injection, such as in a MySQL query call.
nosql-injection: A sink that can be used for NoSQL injection, such as in a MongoDB findOne call.
html-injection: A sink that can be used for HTML injection, such as in a jQuery $() call.
request-forgery: A sink that controls the URL of a request, such as in a fetch call.
url-redirection: A sink that can be used to redirect the user to a malicious URL.
unsafe-deserialization: A deserialization sink that can lead to code execution or other unsafe behaviour, such as an unsafe YAML parser.
log-injection: A sink that can be used for log injection, such as in a console.log call.
Summary kinds
taint: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well.
value: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved.
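To make the difference between the two summary kinds concrete, here is a small sketch for a hypothetical package @example/util. Both functions and their behaviour are assumptions made for illustration.

```yaml
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: summaryModel
    data:
      # Hypothetical: identity(x) returns x itself, so the value is preserved.
      - ["@example/util", "Member[identity]", "Argument[0]", "ReturnValue", "value"]
      # Hypothetical: trim(s) returns a new string derived from s, not s itself,
      # so only taint is propagated.
      - ["@example/util", "Member[trim]", "Argument[0]", "ReturnValue", "taint"]
```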
Threat models
Note
Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C#, Python and JavaScript/TypeScript analysis.
A threat model is a named class of dataflow sources that can be enabled or disabled independently. Threat models allow you to control the set of dataflow sources that you want to consider unsafe. For example, one codebase may only consider remote HTTP requests to be tainted, whereas another may also consider data from local files to be unsafe. You can use threat models to ensure that the relevant taint sources are used in a CodeQL analysis.
The kind property of the sourceModel determines which threat model a source is associated with. There are two main categories:
remote, which represents requests and responses from the network.
local, which represents data from local files (file), command-line arguments (commandargs), database reads (database), environment variables (environment), standard input (stdin) and Windows registry values (windows-registry). Currently, Windows registry values are used by C# only.
Note that subcategories can be included or excluded separately, so you can specify local without database, or just commandargs and environment without the rest of local.
The less commonly used categories are:
android, which represents reads from external files in Android (android-external-storage-dir) and parameters of an entry-point method declared in a ContentProvider class (contentprovider). Currently only used by Java/Kotlin.
database-access-result, which represents a database access. Currently only used by JavaScript.
file-write, which represents opening a file in write mode. Currently only used in C#.
reverse-dns, which represents reverse DNS lookups. Currently only used in Java.
view-component-input, which represents inputs to a React, Vue, or Angular component (also known as “props”). Currently only used by JavaScript/TypeScript.
When running a CodeQL analysis, the remote threat model is included by default. You can optionally include other threat models as appropriate when using the CodeQL CLI and in GitHub code scanning. For more information, see Analyzing your code with CodeQL queries and Customizing your advanced setup for code scanning.
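As a sketch of how a source might be tied to a non-default threat model, the rows below use threat-model names as the kind. This assumes that these kinds are accepted in the kind column of sourceModel for JavaScript, as the description above suggests. The package @example/config and its members are hypothetical.

```yaml
extensions:
  - addsTo:
      pack: codeql/javascript-all
      extensible: sourceModel
    data:
      # Hypothetical: getSetting() returns a value read from an environment variable.
      - ["@example/config", "Member[getSetting].ReturnValue", "environment"]
      # Hypothetical: loadFile() returns the contents of a local file.
      - ["@example/config", "Member[loadFile].ReturnValue", "file"]
```

Under that assumption, these sources would only be considered when the corresponding threat model (or the enclosing local category) is enabled for the analysis, since only remote is included by default.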