Commit e49597e (parent 389def4) by Max Schaefer: "JavaScript: Add guide to using summaries."

1 file changed

Lines changed: 223 additions & 0 deletions

File tree

Lines changed: 223 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,223 @@

Summary-based information flow analysis
=======================================

Overview
--------

This document presents an approach for running information flow analyses (such as the standard
Semmle security queries) on an application that depends on one or more npm packages. Instead of
installing the npm packages during the snapshot build and analyzing them together with application
code, we analyze each package in isolation and compute *flow summaries* that record information
about any sources, sinks and flow steps contributed by the package's API. These flow summaries
are then imported when building a snapshot of the application (usually in the form of CSV files
added as external data), and are picked up by the standard security queries, allowing them to reason
about flow into, out of and through the npm packages as though they had been included as part of the
build.

Motivating example
------------------

Let us take the `mkdirp <https://www.npmjs.com/package/mkdirp>`_ package as an example. It exports
a function that takes a file system path as its first argument and creates a folder with that
path, as well as any parent folders that do not exist yet. As further arguments, the function
accepts an optional configuration object and a callback to invoke once the folder has been
created.

An application might use this package as follows:

.. code-block:: js

    const mkdirp = require('mkdirp');
    // ...
    // `p` is a file system path, possibly controlled by an untrusted user.
    mkdirp(p, opts, function cb(err) {
      // ...
    });

If the value of ``p`` can be controlled by an untrusted user, this would allow them to create arbitrary
folders, which may not be desirable.

If the application code base is analyzed together with the source code of the ``mkdirp`` package,
Semmle's default path injection analysis can track taint through the call to ``mkdirp`` into its
implementation, which ultimately uses built-in Node.js file system APIs to create the folder. Since
the path injection analysis has built-in models of these APIs, it can then spot and flag this
vulnerability.

However, analyzing ``mkdirp`` from scratch for every client application is wasteful. Moreover, in this
case it would be undesirable for the alert to flag the location inside ``mkdirp`` where the folder is
actually created: the developer of the client application did not write that code and will therefore
have a hard time understanding why it is being flagged.

Both of these concerns can be addressed by treating the first argument to ``mkdirp`` as a path injection
sink in its own right: the analysis no longer needs to track flow into the implementation of ``mkdirp``,
so its source code no longer needs to be included in the analysis, and the alert flags the call
to ``mkdirp`` in application code rather than its implementation in library code.

The information that the first parameter of ``mkdirp`` is interpreted as a file system path and hence should
be considered a path injection sink is an example of a *flow summary*, or more precisely a *sink summary*.
Besides sink summaries, we also consider *source summaries* and *flow-step summaries*.

In general, a sink summary states that some API interface point (such as a function parameter) should
be considered a sink for a certain analysis, so that if data from a known source reaches this point without
undergoing appropriate sanitization, it is flagged with an alert. A sink summary may also
specify which taint kind the data needs to have for the sink to be problematic.
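
For a path injection sink such as the first parameter of ``mkdirp``, one common form of sanitization
is to resolve the path against a fixed base directory and reject anything that escapes it. The
Node.js sketch below illustrates the idea; ``resolveUnder`` and ``/srv/uploads`` are hypothetical, and
whether a particular analysis recognizes such a check as a sanitizer is not specified here.

.. code-block:: js

    const path = require('path');

    // Hypothetical helper (not part of mkdirp or of any Semmle library): resolves a
    // user-supplied path against a fixed base directory and rejects results that
    // escape it, for example via "../" segments or an absolute path.
    function resolveUnder(baseDir, userPath) {
      const base = path.resolve(baseDir);
      const prefix = base.endsWith(path.sep) ? base : base + path.sep;
      const resolved = path.resolve(base, userPath);
      if (resolved !== base && !resolved.startsWith(prefix)) {
        throw new Error(`path escapes ${base}: ${userPath}`);
      }
      return resolved;
    }

A client would then call ``mkdirp(resolveUnder('/srv/uploads', p), opts, cb)`` instead of passing
``p`` to the sink directly.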

Conversely, a source summary identifies some API (such as the return value of a function) as a source
of tainted data for a certain analysis, again optionally specifying a taint kind.

Finally, a flow-step summary records the fact that data flowing into the package at some point
may propagate to another point (for example, from a function parameter to its return value).
In this case, there are two relevant taint kinds: one describing the taint of the data that
enters, and one describing the taint of the data that emerges. In general, flow steps (like sources
and sinks) are analysis-specific, since whether taint survives a step depends on which sanitizers
the analysis recognizes.
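
As an illustration of why flow-step summaries are needed, consider the hypothetical library
function ``toAbsolute`` below (it is not taken from any real package). Data passed in through its
first parameter emerges from its return value, so a flow-step summary for it would record a step
from that parameter to the return value. Without such a summary, and without the library's source
code, the analysis would lose track of the taint at the call. The example assumes a POSIX system.

.. code-block:: js

    const path = require('path');

    // Hypothetical library function: data in the first parameter flows to the return value.
    function toAbsolute(p) {
      return path.resolve('/srv/app', p);
    }

    // Client code: if `userInput` is tainted, the returned path is tainted too.
    const userInput = '../../etc/passwd';
    const abs = toAbsolute(userInput); // '/etc/passwd' on POSIX systems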

In what follows, we first discuss how summaries are generated from a snapshot of an npm package,
and then how they are imported when analyzing client code. Finally, we discuss the format in which
flow summaries are stored.

Note that flow summaries are considered an experimental feature at this point. Using them involves
some manual configuration, and we make no guarantee that the API will remain stable.

Generating summaries
--------------------

Flow summaries of an npm package can be generated by running special summary extraction queries
either on a snapshot of the package itself, or on a snapshot of a hand-written model of the
package. (Note that this requires a working installation of Semmle Core.)

There are three default summary extraction queries:

- Extract flow step summaries (``js/step-summary-extraction``,
  ``Security/Summaries/ExtractFlowStepSummaries.ql``)
- Extract sink summaries (``js/sink-summary-extraction``,
  ``Security/Summaries/ExtractSinkSummaries.ql``)
- Extract source summaries (``js/source-summary-extraction``,
  ``Security/Summaries/ExtractSourceSummaries.ql``)

Using ``odasa runQuery``, you can run each of these queries against a snapshot of the npm package
you want to create flow summaries for, and store the output as CSV files named
``additional-steps.csv``, ``additional-sinks.csv`` and ``additional-sources.csv``, respectively.

For example, assuming that the folder ``mkdirp-snapshot`` contains a snapshot of the ``mkdirp``
project, we can extract sink summaries using the following command:

.. code-block:: bash

    odasa runQuery \
      --query $SEMMLE_DIST/queries/semmlecode-javascript-queries/Security/Summaries/ExtractSinkSummaries.ql \
      --output-file additional-sinks.csv --snapshot mkdirp-snapshot
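
Since all three queries are run the same way, a small Node.js script can run them in one go. This
is only a sketch: it reuses the query folder and the ``--query``, ``--output-file`` and ``--snapshot``
flags from the command above, assumes ``odasa`` is on the ``PATH`` and ``SEMMLE_DIST`` is set, and
assumes the flow step query lives at ``Security/Summaries/ExtractFlowStepSummaries.ql``. The script
name ``extract-summaries.js`` and the function ``summaryCommands`` are hypothetical.

.. code-block:: js

    const { execFileSync } = require('child_process');
    const path = require('path');

    // Output CSV file -> extraction query (the flow step query name is an assumption).
    const SUMMARY_QUERIES = {
      'additional-steps.csv': 'ExtractFlowStepSummaries.ql',
      'additional-sinks.csv': 'ExtractSinkSummaries.ql',
      'additional-sources.csv': 'ExtractSourceSummaries.ql',
    };

    // Builds one `odasa` argument list per extraction query.
    function summaryCommands(semmleDist, snapshot, outDir) {
      const queryDir = path.join(semmleDist, 'queries', 'semmlecode-javascript-queries',
        'Security', 'Summaries');
      return Object.entries(SUMMARY_QUERIES).map(([csvFile, query]) => [
        'runQuery',
        '--query', path.join(queryDir, query),
        '--output-file', path.join(outDir, csvFile),
        '--snapshot', snapshot,
      ]);
    }

    // Usage: SEMMLE_DIST=... node extract-summaries.js mkdirp-snapshot
    // Does nothing unless both SEMMLE_DIST and a snapshot argument are given.
    const snapshotArg = process.argv[2];
    if (require.main === module && process.env.SEMMLE_DIST && snapshotArg) {
      for (const args of summaryCommands(process.env.SEMMLE_DIST, snapshotArg, '.')) {
        execFileSync('odasa', args, { stdio: 'inherit' });
      }
    }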

Instead of generating summaries directly from the package source code, you can also generate
them from a hand-written model of the package. The model should contain a ``package.json`` file
giving the correct package name, and models for the relevant API entry points. The models are
plain JavaScript with special comments annotating certain expressions as sources or sinks.

For example, a model of ``mkdirp`` might look like this:

.. code-block:: js

    module.exports = function mkdirp(path) {
      path /* Semmle: sink: taint, TaintedPath */
    };

Annotation comments start with ``Semmle:`` and contain ``source`` and ``sink`` specifications.
Each such specification lists a flow label (in this case, ``taint``) and a configuration to which
the specification applies (in this case, ``TaintedPath``).

A source specification annotates an expression as being a source of flow with the given label
for the purposes of the given configuration, and similarly for sinks. Annotation comments apply to
any expression (and more generally any data flow node) whose source location ends on the line
where the comment starts.
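
To make the line-based matching rule concrete, the sketch below scans a model's source text for
annotation comments and records the line on which each one starts. It is not Semmle's
implementation: ``findAnnotations`` is a hypothetical helper, and it assumes each comment holds a
single specification of the form ``Semmle: <source|sink>: <label>, <configuration>``, as in the
``mkdirp`` model above.

.. code-block:: js

    // Matches e.g. /* Semmle: sink: taint, TaintedPath */ (one specification per comment assumed).
    const ANNOTATION = /\/\*\s*Semmle:\s*(source|sink):\s*([\w-]+)\s*,\s*([\w-]+)\s*\*\//g;

    // Hypothetical helper: returns one record per annotation comment in `text`.
    function findAnnotations(text) {
      const result = [];
      for (const match of text.matchAll(ANNOTATION)) {
        // 1-based line on which the comment starts; any expression whose source
        // location ends on this line is annotated.
        const line = text.slice(0, match.index).split('\n').length;
        result.push({ role: match[1], label: match[2], configuration: match[3], line });
      }
      return result;
    }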

Using summaries
---------------

Once you have created summaries using the approach outlined above, you have two options for
including them in the analysis of a client application.

External data
:::::::::::::

Firstly, you can include the CSV files generated by running the extraction queries as external
data when building a snapshot of the client application by copying them into the
``$snapshot/external/data`` folder. This is typically done by including a command like this
in your ``project`` file:

.. code-block:: xml

    <build>cp /path/to/additional-sinks.csv ${snapshot}/external/data</build>

If you want to include summaries for multiple libraries, you have to concatenate the
corresponding CSV files (for example, all ``additional-sinks.csv`` files) before copying them into
the external data folder.
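
A minimal Node.js sketch of the concatenation step follows. ``concatCsvFiles`` is a hypothetical
helper; it treats the summary files as plain text and simply joins them, as described above, only
making sure that each file ends with a newline so that rows from different files are not merged.

.. code-block:: js

    const fs = require('fs');

    // Hypothetical helper: concatenates summary CSV files of the same kind into `output`.
    function concatCsvFiles(inputs, output) {
      const parts = inputs.map((file) => {
        const text = fs.readFileSync(file, 'utf8');
        // Add a trailing newline if missing, so the next file starts on a new row.
        return text === '' || text.endsWith('\n') ? text : text + '\n';
      });
      fs.writeFileSync(output, parts.join(''));
    }

For example, ``concatCsvFiles(['mkdirp/additional-sinks.csv', 'other/additional-sinks.csv'],
'additional-sinks.csv')`` (with hypothetical paths) produces a single file that can then be copied
into the external data folder.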

Additionally, you need to import the library ``Security.Summaries.ImportFromCsv`` in your
``javascript.qll``, which will pick up the summaries from external data and interpret them
as additional sources, sinks and flow steps:

.. code-block:: ql

    import Security.Summaries.ImportFromCsv

After these preparatory steps, you can run your analysis without any further changes.

External predicates
:::::::::::::::::::

The second method for including flow summaries is to import the
``Security.Summaries.ImportFromExternalPredicates`` library in your analysis. This library declares
three external predicates, ``additionalSteps``, ``additionalSinks`` and ``additionalSources``, which
need to be populated with the flow summary CSV data.

This is most easily done in QL for Eclipse, which will prompt you for CSV files to populate
the three predicates.

This approach has the advantage that you do not need to include the CSV files during the
snapshot build, so you can use an existing snapshot, for example as downloaded from LGTM.com.

Summary format
--------------

Source and sink summaries are specified as tuples of the form ``(portal, kind, configuration)``,
where ``portal`` is a description of the API element being marked as a source or sink, ``kind``
is a flow label (also known as "taint kind") describing the kind of information being generated
or consumed, and ``configuration`` specifies which flow configuration the summary applies to.

If ``kind`` is empty, it defaults to ``data`` for sources and either ``data`` or ``taint`` for sinks.
If ``configuration`` is empty, the specification applies to all configurations.
The default extraction queries never produce empty ``kind`` or ``configuration`` columns.
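
The defaulting rules above can be summarized in a few lines of JavaScript. This is only a sketch of
one reading of the rules, not how the import libraries implement them: ``effectiveKinds`` and
``appliesToConfiguration`` are hypothetical names, empty columns are assumed to arrive as empty
strings, and a sink with an empty ``kind`` is read as accepting both ``data`` and ``taint``.

.. code-block:: js

    // Returns the flow labels a source or sink summary stands for.
    // `role` is 'source' or 'sink'; an empty `kind` means "not specified".
    function effectiveKinds(role, kind) {
      if (kind !== '') return [kind];
      return role === 'source' ? ['data'] : ['data', 'taint'];
    }

    // An empty configuration column applies to every configuration.
    function appliesToConfiguration(configuration, configName) {
      return configuration === '' || configuration === configName;
    }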

Similarly, step summaries are tuples of the form
``(inPortal, inKind, outPortal, outKind, configuration)``, stating that information with label
``inKind`` that flows into ``inPortal`` resurfaces from ``outPortal``, now having kind ``outKind``.
As before, ``configuration`` specifies which configuration this information applies to.

In all of the above, each portal (``portal``, ``inPortal`` or ``outPortal``) is an S-expression that
abstractly describes a *portal*, that is, an API interface point through which data may enter or
leave the npm package being analyzed.

Currently, we model five kinds of portals:

- ``(root <uri>)``, representing the ``module`` object of the main module of the npm package
  described by ``<uri>``, which is a URL of the form ``https://www.npmjs.com/package/<pkg>``;
- ``(member <base> <name>)``, representing property ``<name>`` of an object described by
  portal ``<base>``;
- ``(instance <base>)``, representing an instance of a (constructor) function or class
  described by portal ``<base>``;
- ``(parameter <base> <i>)``, representing the ``<i>``-th parameter (counting from 0) of a function
  described by portal ``<base>``;
- ``(return <base>)``, representing the return value of a function described by portal ``<base>``.
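
Because portals nest, their textual form can be assembled from small constructor functions, one
per portal kind listed above. The helpers below are hypothetical (they are not part of Semmle's
tooling) and only produce strings in the notation shown here; ``returnOf`` stands in for
``return``, which is a reserved word in JavaScript.

.. code-block:: js

    // Hypothetical helpers that build portal S-expressions as strings.
    const root = (pkg) => `(root https://www.npmjs.com/package/${pkg})`;
    const member = (base, name) => `(member ${base} ${name})`;
    const instance = (base) => `(instance ${base})`;
    const parameter = (base, i) => `(parameter ${base} ${i})`;
    const returnOf = (base) => `(return ${base})`;

For instance, ``parameter(member(root('mkdirp'), 'default'), 0)`` yields the ``mkdirp`` portal
discussed below.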

In our example above, the first parameter of the default export of package ``mkdirp`` is
described by the portal:

.. code-block:: lisp

    (parameter (member (root https://www.npmjs.com/package/mkdirp) default) 0)

As a more complicated example, the portal

.. code-block:: lisp

    (parameter (parameter (member (instance (member (root https://www.npmjs.com/package/bluebird) Promise)) then) 1) 0)

describes the first parameter of a function passed as the second argument to the ``then`` method of
instances of the ``Promise`` class exported by package ``bluebird``.
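
Going in the other direction, a portal S-expression can be parsed back into a tree and rendered as
a rough English description. The following is a minimal sketch assuming that atoms (URLs, property
names and indices) never contain whitespace or parentheses; ``parsePortal`` and ``describePortal``
are hypothetical helpers, not part of the Semmle libraries.

.. code-block:: js

    // Hypothetical helper: parses a portal S-expression into nested arrays,
    // e.g. "(return (root u))" becomes ['return', ['root', 'u']].
    function parsePortal(text) {
      // Tokens are parentheses or runs of non-whitespace, non-parenthesis characters.
      const tokens = text.match(/[()]|[^\s()]+/g) || [];
      let pos = 0;

      function parseExpr() {
        if (pos >= tokens.length) throw new Error('unexpected end of portal');
        const token = tokens[pos++];
        if (token === ')') throw new Error('unexpected )');
        if (token !== '(') return token; // an atom
        const list = [];
        while (tokens[pos] !== ')') {
          list.push(parseExpr()); // throws if the input ends before ')'
        }
        pos++; // consume ')'
        return list;
      }

      const result = parseExpr();
      if (pos !== tokens.length) throw new Error('trailing input after portal');
      return result;
    }

    // Renders a parsed portal as an English phrase, one case per portal kind.
    function describePortal(portal) {
      const [kind, base, arg] = portal;
      switch (kind) {
        case 'root': return `the main module of ${base}`;
        case 'member': return `property ${arg} of ${describePortal(base)}`;
        case 'instance': return `an instance of ${describePortal(base)}`;
        case 'parameter': return `parameter ${arg} of ${describePortal(base)}`;
        case 'return': return `the return value of ${describePortal(base)}`;
        default: throw new Error(`unknown portal kind: ${kind}`);
      }
    }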
