Commit 6a12864

JS: Document how API graphs should be interpreted
1 parent 07e450d commit 6a12864

1 file changed

Lines changed: 104 additions & 8 deletions

javascript/ql/lib/semmle/javascript/ApiGraphs.qll

@@ -2,24 +2,120 @@
22
* Provides an implementation of _API graphs_, which are an abstract representation of the API
33
* surface used and/or defined by a code base.
44
*
5-
* The nodes of the API graph represent definitions and uses of API components. The edges are
6-
* directed and labeled; they specify how the components represented by nodes relate to each other.
7-
* For example, if one of the nodes represents a definition of an API function, then there
8-
* will be nodes corresponding to the function's parameters, which are connected to the function
9-
* node by edges labeled `parameter <i>`.
5+
* See `API::Node` for more in-depth documentation.
106
*/
117

128
import javascript
139
private import semmle.javascript.dataflow.internal.FlowSteps as FlowSteps
1410
private import internal.CachedStages
1511

1612
/**
17-
* Provides classes and predicates for working with APIs defined or used in a database.
13+
* Provides classes and predicates for working with the API boundary between the current
14+
* codebase and external libraries.
15+
*
16+
* See `API::Node` for more in-depth documentation.
1817
*/
1918
module API {
2019
/**
21-
* An abstract representation of a definition or use of an API component such as a function
22-
* exported by an npm package, a parameter of such a function, or its result.
20+
* A node in the API graph, representing a value that has crossed the boundary between this
21+
* codebase and an external library.
22+
*
23+
* ### Basic usage
24+
*
25+
* API graphs are typically used to identify "API calls", that is, calls to an external function
26+
* whose implementation is not necessarily part of the current codebase.
27+
*
28+
* The most basic use of API graphs is typically as follows:
29+
* 1. Start with `API::moduleImport` for the relevant library.
30+
* 2. Follow up with a chain of accessors such as `getMember` describing how to get to the relevant API function.
31+
* 3. Map the resulting API graph nodes to data-flow nodes, using `getAnImmediateUse` or `getARhs`.
32+
*
33+
* For example, a simplified way to get arguments to `underscore.extend` would be
34+
* ```codeql
35+
* API::moduleImport("underscore").getMember("extend").getParameter(0).getARhs()
36+
* ```
37+
*
38+
* The most commonly used accessors are `getMember`, `getParameter`, and `getReturn`.
39+
*
40+
* ### API graph nodes
41+
*
42+
* There are two kinds of nodes in the API graphs, distinguished by who is "holding" the value:
43+
* - **Use-nodes** represent values held by the current codebase, which came from an external library.
44+
* (The current codebase is "using" a value that came from the library).
45+
* - **Def-nodes** represent values held by the external library, which came from this codebase.
46+
* (The current codebase "defines" the value seen by the library).
47+
*
48+
* API graph nodes are associated with data-flow nodes in the current codebase.
49+
* (Since external libraries are not part of the database, there is no way to associate with concrete
50+
* data-flow nodes from the external library).
51+
* - **Use-nodes** are associated with data-flow nodes where a value enters the current codebase,
52+
* such as the return value of a call to an external function.
53+
* - **Def-nodes** are associated with data-flow nodes where a value leaves the current codebase,
54+
* such as an argument passed in a call to an external function.
55+
*
56+
*
57+
* ### Access paths and edge labels
58+
*
59+
* Nodes in the API graph are associated with a set of access paths, describing a series of operations
60+
* that may be performed to obtain that value.
61+
*
62+
* For example, the access path `API::moduleImport("lodash").getMember("extend")` represents the action of
63+
* importing `lodash` and then accessing the member `extend` on the resulting object.
64+
* It would be associated with an expression such as `require("lodash").extend`.
65+
*
66+
* Each edge in the graph is labelled by such an "operation". For an edge `A->B`, the type of the `A` node
67+
* determines who is performing the operation, and the type of the `B` node determines who ends up holding
68+
* the result:
69+
* - An edge starting from a use-node describes what the current codebase is doing to a value that
70+
* came from a library.
71+
* - An edge starting from a def-node describes what the external library might do to a value that
72+
* came from the current codebase.
73+
* - An edge ending in a use-node means the result ends up in the current codebase (at its associated data-flow node).
74+
* - An edge ending in a def-node means the result ends up in external code (its associated data-flow node is
75+
* the place where it was "last seen" in the current codebase before flowing out).
76+
*
77+
* Because the implementation of the external library is not visible, it is not known exactly what operations
78+
* it will perform on values that flow there. Instead, the edges starting from a def-node are operations that would
79+
* lead to an observable effect within the current codebase, without knowing for certain if the library will actually perform
80+
* those operations. (When constructing these edges, we assume the library is somewhat well-behaved).
81+
*
82+
* For example, given this snippet:
83+
* ```js
84+
* require('foo')(x => { doSomething(x) })
85+
* ```
86+
* A callback is passed to the external function `foo`. We can't know if `foo` will actually invoke this callback.
87+
* But _if_ the library should decide to invoke the callback, then a value will flow into the current codebase via the `x` parameter.
88+
* For that reason, an edge is generated representing the argument-passing operation that might be performed by `foo`.
89+
* This edge goes from the def-node associated with the callback to the use-node associated with the parameter `x`.
90+
*
91+
* ### Thinking in operations versus code patterns
92+
*
93+
* Treating edges as "operations" helps avoid a pitfall in which library models become overly specific to certain code patterns.
94+
* Consider the following two equivalent calls to `foo`:
95+
* ```js
96+
* const foo = require('foo');
97+
*
98+
* foo({
99+
*   myMethod(x) {...}
100+
* });
101+
*
102+
* foo({
103+
*   get myMethod() {
104+
*     return function(x) {...}
105+
*   }
106+
* });
107+
* ```
108+
* If `foo` calls `myMethod` on its first parameter, either of the `myMethod` implementations will be invoked.
109+
* And indeed, the access path `API::moduleImport("foo").getParameter(0).getMember("myMethod").getParameter(0)` correctly
110+
* identifies both `x` parameters.
111+
*
112+
* Observe how `getMember("myMethod")` behaves when the member is defined via a getter. When thinking in code patterns,
113+
* it might seem obvious that `getMember` should have obtained a reference to the getter method itself.
114+
* But when seeing it as an access to `myMethod` performed by the library, we can deduce that the relevant expression
115+
* on the client side is actually the return-value of the getter.
116+
*
117+
* Although one may think of API graphs as a tool for finding certain program elements in the codebase,
118+
* that view can lead to situations where intuition does not match what works best in practice.
23119
*/
24120
class Node extends Impl::TApiNode {
25121
/**
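
The "operations versus code patterns" point in the doc comment can be checked at run time with a minimal sketch. It assumes a hypothetical stand-in for the external library `foo` that reads `myMethod` from its first argument and calls it with one value; the commit does not show how `foo` is implemented, so that behaviour and the value `42` are assumptions made only for illustration. The sketch shows that both object shapes deliver a value to `x`, which is why a single access path can cover both. It does not exercise CodeQL itself.

```javascript
// Hypothetical stand-in for the external library `foo` (an assumption:
// the real implementation is not part of the commit). It performs the
// "access member myMethod, then call it" operations on its first argument.
function foo(options) {
  const method = options.myMethod; // member access performed by the library
  return method(42);               // argument-passing performed by the library
}

// Records which implementation received which value for `x`.
const seen = [];

// Shape 1: `myMethod` is an ordinary method.
foo({
  myMethod(x) { seen.push(['method', x]); }
});

// Shape 2: `myMethod` is a getter returning a function. The member access
// inside `foo` yields the getter's return value, not the getter itself.
foo({
  get myMethod() {
    return function (x) { seen.push(['getter', x]); };
  }
});

console.log(seen);
```

In both calls the library-side operations are identical; only the client-side code pattern differs, which matches the comment's advice to model edges as operations.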
