|
2 | 2 | * Provides an implementation of _API graphs_, which are an abstract representation of the API |
3 | 3 | * surface used and/or defined by a code base. |
4 | 4 | * |
5 | | - * The nodes of the API graph represent definitions and uses of API components. The edges are |
6 | | - * directed and labeled; they specify how the components represented by nodes relate to each other. |
7 | | - * For example, if one of the nodes represents a definition of an API function, then there |
8 | | - * will be nodes corresponding to the function's parameters, which are connected to the function |
9 | | - * node by edges labeled `parameter <i>`. |
| 5 | + * See `API::Node` for more in-depth documentation. |
10 | 6 | */ |
11 | 7 |
|
12 | 8 | import javascript |
13 | 9 | private import semmle.javascript.dataflow.internal.FlowSteps as FlowSteps |
14 | 10 | private import internal.CachedStages |
15 | 11 |
|
16 | 12 | /** |
17 | | - * Provides classes and predicates for working with APIs defined or used in a database. |
| 13 | + * Provides classes and predicates for working with the API boundary between the current |
| 14 | + * codebase and external libraries. |
| 15 | + * |
| 16 | + * See `API::Node` for more in-depth documentation. |
18 | 17 | */ |
19 | 18 | module API { |
20 | 19 | /** |
21 | | - * An abstract representation of a definition or use of an API component such as a function |
22 | | - * exported by an npm package, a parameter of such a function, or its result. |
| 20 | + * A node in the API graph, representing a value that has crossed the boundary between this |
| 21 | + * codebase and an external library. |
| 22 | + * |
| 23 | + * ### Basic usage |
| 24 | + * |
| 25 | + * API graphs are typically used to identify "API calls", that is, calls to an external function |
| 26 | + * whose implementation is not necessarily part of the current codebase. |
| 27 | + * |
| 28 | + * The most basic use of API graphs is typically as follows: |
| 29 | + * 1. Start with `API::moduleImport` for the relevant library. |
| 30 | + * 2. Follow up with a chain of accessors such as `getMember` describing how to get to the relevant API function. |
| 31 | + * 3. Map the resulting API graph nodes to data-flow nodes, using `getAnImmediateUse` or `getARhs`. |
| 32 | + * |
| 33 | + * For example, a simplified way to get the first argument to `underscore.extend` would be: |
| 34 | + * ```codeql |
| 35 | + * API::moduleImport("underscore").getMember("extend").getParameter(0).getARhs() |
| 36 | + * ``` |
| 37 | + * |
| 38 | + * The most commonly used accessors are `getMember`, `getParameter`, and `getReturn`. |
| 39 | + * |
| 40 | + * ### API graph nodes |
| 41 | + * |
| 42 | + * There are two kinds of nodes in API graphs, distinguished by who is "holding" the value: |
| 43 | + * - **Use-nodes** represent values held by the current codebase, which came from an external library. |
| 44 | + * (The current codebase is "using" a value that came from the library.) |
| 45 | + * - **Def-nodes** represent values held by the external library, which came from this codebase. |
| 46 | + * (The current codebase "defines" the value seen by the library.) |
| 47 | + * |
| 48 | + * API graph nodes are associated with data-flow nodes in the current codebase. |
| 49 | + * (Since external libraries are not part of the database, there is no way to associate them with concrete |
| 50 | + * data-flow nodes inside the external library.) |
| 51 | + * - **Use-nodes** are associated with data-flow nodes where a value enters the current codebase, |
| 52 | + * such as the return value of a call to an external function. |
| 53 | + * - **Def-nodes** are associated with data-flow nodes where a value leaves the current codebase, |
| 54 | + * such as an argument passed in a call to an external function. |
| 55 | + * |
| 56 | + * |
| 57 | + * ### Access paths and edge labels |
| 58 | + * |
| 59 | + * Nodes in the API graph are associated with a set of access paths, each describing a series of operations |
| 60 | + * that may be performed to obtain the value represented by the node. |
| 61 | + * |
| 62 | + * For example, the access path `API::moduleImport("lodash").getMember("extend")` represents the action of |
| 63 | + * importing `lodash` and then accessing the member `extend` on the resulting object. |
| 64 | + * It would be associated with an expression such as `require("lodash").extend`. |
| 65 | + * |
| 66 | + * Each edge in the graph is labeled with such an "operation". For an edge `A->B`, the type of the `A` node |
| 67 | + * determines who is performing the operation, and the type of the `B` node determines who ends up holding |
| 68 | + * the result: |
| 69 | + * - An edge starting from a use-node describes what the current codebase is doing to a value that |
| 70 | + * came from a library. |
| 71 | + * - An edge starting from a def-node describes what the external library might do to a value that |
| 72 | + * came from the current codebase. |
| 73 | + * - An edge ending in a use-node means the result ends up in the current codebase (at its associated data-flow node). |
| 74 | + * - An edge ending in a def-node means the result ends up in external code (its associated data-flow node is |
| 75 | + * the place where it was "last seen" in the current codebase before flowing out). |
| 76 | + * |
| 77 | + * Because the implementation of the external library is not visible, it is not known exactly what operations |
| 78 | + * it will perform on values that flow there. Instead, the edges starting from a def-node are operations that would |
| 79 | + * lead to an observable effect within the current codebase, without knowing for certain whether the library will actually perform |
| 80 | + * those operations. (When constructing these edges, we assume the library is somewhat well-behaved.) |
| 81 | + * |
| 82 | + * For example, given this snippet: |
| 83 | + * ```js |
| 84 | + * require('foo')(x => { doSomething(x) }) |
| 85 | + * ``` |
| 86 | + * A callback is passed to the external function `foo`. We can't know if `foo` will actually invoke this callback. |
| 87 | + * But _if_ the library should decide to invoke the callback, then a value will flow into the current codebase via the `x` parameter. |
| 88 | + * For that reason, an edge is generated representing the argument-passing operation that might be performed by `foo`. |
| 89 | + * This edge goes from the def-node associated with the callback to the use-node associated with the parameter `x`. |
| 90 | + * |
| 91 | + * ### Thinking in operations versus code patterns |
| 92 | + * |
| 93 | + * Treating edges as "operations" helps avoid a pitfall in which library models become overly specific to certain code patterns. |
| 94 | + * Consider the following two equivalent calls to `foo`: |
| 95 | + * ```js |
| 96 | + * const foo = require('foo'); |
| 97 | + * |
| 98 | + * foo({ |
| 99 | + * myMethod(x) {...} |
| 100 | + * }); |
| 101 | + * |
| 102 | + * foo({ |
| 103 | + * get myMethod() { |
| 104 | + * return function(x) {...} |
| 105 | + * } |
| 106 | + * }); |
| 107 | + * ``` |
| 108 | + * If `foo` calls `myMethod` on its first parameter, either of the `myMethod` implementations will be invoked. |
| 109 | + * And indeed, the access path `API::moduleImport("foo").getParameter(0).getMember("myMethod").getParameter(0)` correctly |
| 110 | + * identifies both `x` parameters. |
| 111 | + * |
| 112 | + * Observe how `getMember("myMethod")` behaves when the member is defined via a getter. When thinking in code patterns, |
| 113 | + * it might seem obvious that `getMember` should have obtained a reference to the getter method itself. |
| 114 | + * But when viewing it as an access to `myMethod` performed by the library, we can deduce that the relevant expression |
| 115 | + * on the client side is actually the return value of the getter. |
| 116 | + * |
| 117 | + * Although one may think of API graphs as a tool for finding certain program elements in the codebase, |
| 118 | + * this view can lead to situations where intuition does not match what works best in practice. |
23 | 119 | */ |
24 | 120 | class Node extends Impl::TApiNode { |
25 | 121 | /** |
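The callback snippet in the new `API::Node` documentation describes an edge that models something the library *might* do. The sketch below makes that concrete in plain Node.js. It is illustrative only: `fooThatCalls` and `fooThatIgnores` are hypothetical stand-ins for two possible implementations of the external `foo`, and `doSomething` simply records what it receives. Whether `x` ever receives a value depends entirely on library code that is not part of the database. This is why the edge from the callback's def-node to the use-node for `x` is an assumption about a reasonably well-behaved library, not a certainty.

```javascript
// Hypothetical implementations of the external function `foo`.
// The analysis cannot tell which of these behaviors the real library has.
function fooThatCalls(callback) {
  // This library does perform the argument-passing operation.
  callback("value-from-library");
}

function fooThatIgnores(callback) {
  // This library never invokes the callback it was given.
}

const seen = [];

// Stand-in for the client's `doSomething`; records each value it is given.
function doSomething(x) {
  seen.push(x);
}

// Client code, as in the documentation snippet: the arrow function leaves the
// current codebase (def-node), and its parameter `x` is where a value would
// re-enter the current codebase (use-node).
fooThatCalls(x => { doSomething(x) });
fooThatIgnores(x => { doSomething(x) });

console.log(seen);
```

Only the first call delivers a value to `x`. The API graph edge covers that case without claiming that every implementation of `foo` behaves this way.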
|
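The documentation claims that if `foo` calls `myMethod` on its first parameter, both client-side definitions of `myMethod` will be invoked. This can be checked directly in Node.js. In the sketch below, `foo` is a hypothetical stand-in for the external library. It is assumed to read `myMethod` from its first argument and call it with some value. When the property is defined as a getter, reading it runs the getter. The function that is actually called, and whose parameter `x` receives the value, is therefore the getter's return value. This matches the documentation's point that `getMember("myMethod")` corresponds to that return value rather than to the getter itself.

```javascript
// Hypothetical stand-in for the external library `foo`: we assume it
// reads `myMethod` from its first argument and calls it.
function foo(options) {
  return options.myMethod("value-from-library");
}

const received = [];

// Client code, variant 1: `myMethod` is an ordinary method.
foo({
  myMethod(x) {
    received.push(["method", x]);
  },
});

// Client code, variant 2: `myMethod` is defined via a getter.
foo({
  get myMethod() {
    // The getter runs when the library reads `options.myMethod`;
    // the function it returns is what actually receives the argument.
    return function (x) {
      received.push(["getter", x]);
    };
  },
});

console.log(received);
```

Both entries in `received` carry the library's value. A single access path ending in `getMember("myMethod").getParameter(0)` therefore describes the `x` parameter in both variants.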