Commit 4218472

committed
Ruby: first draft of data flow docs
1 parent 5e28e5a commit 4218472

2 files changed

+392
-0
lines changed

Lines changed: 390 additions & 0 deletions
@@ -0,0 +1,390 @@
1+
.. _analyzing-data-flow-in-ruby:
2+
3+
Analyzing data flow in Ruby
4+
=============================
5+
6+
You can use CodeQL to track the flow of data through a Ruby program to places where the data is used.
7+
8+
About this article
9+
------------------
10+
11+
This article describes how data flow analysis is implemented in the CodeQL libraries for Ruby and includes examples to help you write your own data flow queries.
12+
The following sections describe how to use the libraries for local data flow, global data flow, and taint tracking.
13+
For a more general introduction to modeling data flow, see ":ref:`About data flow analysis <about-data-flow-analysis>`."
14+
15+
Local data flow
16+
---------------
17+
18+
Local data flow is data flow within a single method or callable. Local data flow is usually easier, faster, and more precise than global data flow, and is sufficient for many queries.
19+
20+
Using local data flow
21+
~~~~~~~~~~~~~~~~~~~~~
22+
23+
The local data flow library is in the module ``DataFlow`` and it defines the class ``Node``, representing any element through which data can flow.
24+
``Node``\ s are divided into expression nodes (``ExprNode``) and parameter nodes (``ParameterNode``).
25+
You can map between a data flow ``ParameterNode`` and its corresponding ``Parameter`` AST node using the ``asParameter`` member predicate.
26+
Meanwhile, the ``asExpr`` member predicate maps between a data flow ``ExprNode`` and its corresponding ``ExprCfgNode`` in the control-flow library.
27+
28+
.. code-block:: ql

   class Node {
     /** Gets the expression corresponding to this node, if any. */
     CfgNodes::ExprCfgNode asExpr() { ... }

     /** Gets the parameter corresponding to this node, if any. */
     Parameter asParameter() { ... }

     ...
   }
39+
40+
You can also use the predicates ``exprNode`` and ``parameterNode``:
41+
42+
.. code-block:: ql

   /**
    * Gets a node corresponding to expression `e`.
    */
   ExprNode exprNode(CfgNodes::ExprCfgNode e) { ... }

   /**
    * Gets the node corresponding to the value of parameter `p` at function entry.
    */
   ParameterNode parameterNode(Parameter p) { ... }
53+
54+
Note that ``asExpr`` and ``exprNode`` map between data-flow and control-flow nodes, so you then need to call the ``getExpr`` member predicate on the control-flow node to reach the corresponding AST node, for example by writing ``node.asExpr().getExpr()``.
Because the control-flow graph is split, there can be multiple data-flow and control-flow nodes associated with a single expression AST node.
57+
58+
The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``.
59+
You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``.
60+
61+
For example, you can find flow from an expression ``source`` to an expression ``sink`` in zero or more local steps:
62+
63+
.. code-block:: ql

   DataFlow::localFlow(source, sink)
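As a rough illustration on the Ruby side, the sketch below shows the kind of value-preserving steps that local flow follows inside one method. The method name ``greeting`` is hypothetical, and the comments describe the steps one would expect ``localFlow`` to cover, not verified query output.

```ruby
# Hypothetical method, used only to illustrate value-preserving local steps.
def greeting(name)
  temp = name    # the value of `name` is passed to `temp` unchanged
  result = temp  # and again to `result`
  result         # the returned value is still the original object
end

input = "Ada"
puts greeting(input).equal?(input)  # prints true: the same object comes back
```

Every step here is a plain assignment, so the value that leaves the method is the very object that entered it. That is the property "value-preserving" refers to.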
66+
67+
Using local taint tracking
68+
~~~~~~~~~~~~~~~~~~~~~~~~~~
69+
70+
Local taint tracking extends local data flow by including non-value-preserving flow steps.
71+
For example:
72+
73+
.. code-block:: ruby

   temp = x
   y = temp + ", " + temp
77+
78+
If ``x`` is a tainted string then ``y`` is also tainted.
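The difference from plain data flow can be checked by running a variant of the snippet above. This is only a sketch; the method name ``label`` is hypothetical.

```ruby
# Hypothetical method wrapping the concatenation from the snippet above.
def label(x)
  temp = x
  temp + ", " + temp  # builds a new string derived from `x`
end

x = "abc"
y = label(x)
puts y.equal?(x)    # prints false: `y` is a different object, so the value is not preserved
puts y.include?(x)  # prints true: the content of `y` still depends on `x`
```

The concatenation is a non-value-preserving step: ordinary data flow stops there, while taint tracking continues through it.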
79+
80+
The local taint tracking library is in the module ``TaintTracking``.
81+
Like local data flow, a predicate ``localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)`` holds if there is an immediate taint propagation edge from the node ``nodeFrom`` to the node ``nodeTo``.
82+
You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localTaint``.
83+
84+
For example, you can find taint propagation from an expression ``source`` to an expression ``sink`` in zero or more local steps:
85+
86+
.. code-block:: ql

   TaintTracking::localTaint(source, sink)
89+
90+
91+
Using local sources
92+
~~~~~~~~~~~~~~~~~~~
93+
94+
When asking for local data flow or taint propagation between two expressions as above, you would normally constrain the expressions to be relevant to a certain investigation.
95+
The next section will give some concrete examples, but there is a more abstract concept that we should call out explicitly, namely that of a local source.
96+
97+
A local source is a data-flow node with no local data flow into it.
98+
As such, it is a local origin of data flow, a place where a new value is created.
99+
This includes parameters (which only receive global data flow) and most expressions (because they are not value-preserving).
100+
Restricting attention to such local sources gives a much lighter and more performant data-flow graph and in most cases also a more suitable abstraction for the investigation of interest.
101+
The class ``LocalSourceNode`` represents data-flow nodes that are also local sources.
102+
It comes with a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``.
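To make the idea more concrete, the following Ruby sketch marks where new values appear in a small method. The method name ``describe`` is hypothetical, and the comments reflect what the description above suggests, not the output of a query.

```ruby
# Hypothetical method; comments mark where new values appear (likely local sources).
def describe(path)         # `path` is a parameter: it only receives flow from callers
  name = path              # no new value: `path` flows locally into `name`
  label = "file: " + name  # the concatenation produces a new value
  copy = label             # no new value: the concatenation result flows into `copy`
  copy
end

puts describe("notes.txt")  # prints "file: notes.txt"
```

Under these assumptions, the parameter and the concatenation would be the local sources, and the reads of ``name``, ``label``, and ``copy`` would be nodes they flow to.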
103+
104+
Examples
105+
~~~~~~~~
106+
107+
This query finds the filename argument passed in each call to ``File.open``:
108+
109+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.ApiGraphs

   from DataFlow::CallNode call
   where call = API::getTopLevelMember("File").getAMethodCall("open")
   select call.getArgument(0)
117+
118+
Notice the use of the ``API`` module for referring to library methods.
119+
For more information, see ":doc:`Using API graphs in Ruby <using-api-graphs-in-ruby>`."
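For orientation, here is a small Ruby program of the kind this query targets. The method name ``write_and_read`` and the file name are hypothetical; the query would be expected to select the first argument of each ``File.open`` call.

```ruby
require "tmpdir"

# Hypothetical method containing two calls to File.open.
def write_and_read
  File.open("greeting.txt", "w") { |f| f.write("hello") }  # argument 0: a string literal
  File.open("greeting.txt") { |f| f.read }                 # argument 0: a string literal
end

# Run inside a temporary directory so no stray file is left behind.
Dir.mktmpdir do |dir|
  Dir.chdir(dir) { puts write_and_read }  # prints "hello"
end
```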
120+
121+
Unfortunately, this gives only the expression in the argument, not the values that could be passed to it.
So we use local data flow to find all expressions that flow into the argument:
123+
124+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.ApiGraphs

   from DataFlow::CallNode call, DataFlow::ExprNode expr
   where
     call = API::getTopLevelMember("File").getAMethodCall("open") and
     DataFlow::localFlow(expr, call.getArgument(0))
   select call, expr
134+
135+
Many expressions flow to the same call.
136+
If you run this query, you may notice that you get several data-flow nodes for an expression as it flows towards a call (notice repeated locations in the ``call`` column).
137+
We are mostly interested in the "first" of these, what might be called the local source for the file name.
138+
To restrict attention to such local sources, and to simultaneously make the analysis more performant, we have the QL class ``LocalSourceNode``.
139+
We could demand that ``expr`` is such a node:
140+
141+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.ApiGraphs

   from DataFlow::CallNode call, DataFlow::ExprNode expr
   where
     call = API::getTopLevelMember("File").getAMethodCall("open") and
     DataFlow::localFlow(expr, call.getArgument(0)) and
     expr instanceof DataFlow::LocalSourceNode
   select call, expr
152+
153+
However, we could also enforce this by casting.
154+
That would allow us to use the member predicate ``flowsTo`` on ``LocalSourceNode`` like so:
155+
156+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.ApiGraphs

   from DataFlow::CallNode call, DataFlow::ExprNode expr
   where
     call = API::getTopLevelMember("File").getAMethodCall("open") and
     expr.(DataFlow::LocalSourceNode).flowsTo(call.getArgument(0))
   select call, expr
166+
167+
As an alternative, we can ask more directly that ``expr`` is a local source of the first argument, via the predicate ``getALocalSource``:
168+
169+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.ApiGraphs

   from DataFlow::CallNode call, DataFlow::ExprNode expr
   where
     call = API::getTopLevelMember("File").getAMethodCall("open") and
     expr = call.getArgument(0).getALocalSource()
   select call, expr
179+
180+
All three of these queries give identical results.
We now mostly have one expression per call.
182+
183+
We may still have cases of more than one expression flowing to a call, but then they flow through different code paths (possibly due to control-flow splitting).
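A Ruby method like the following sketch is one way such a case can arise. The method name ``open_log`` and the file names are hypothetical; the two string literals reach the same ``File.open`` call along different branches, so one would expect both to be reported as local sources for that call.

```ruby
require "tmpdir"

# Hypothetical method: two different literals can reach the same File.open call.
def open_log(verbose)
  if verbose
    name = "verbose.log"  # one possible origin of the file name
  else
    name = "quiet.log"    # another possible origin, on a different path
  end
  File.open(name, "a") { |f| f.write("entry\n") }
  name
end

# Run inside a temporary directory so the log files are cleaned up afterwards.
Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    puts open_log(true)   # prints "verbose.log"
    puts open_log(false)  # prints "quiet.log"
  end
end
```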
184+
185+
We might want to make the source more specific, for example a parameter to a method or block.
186+
This query finds instances where a parameter is used as the name when opening a file:
187+
188+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.ApiGraphs

   from DataFlow::CallNode call, DataFlow::ParameterNode p
   where
     call = API::getTopLevelMember("File").getAMethodCall("open") and
     DataFlow::localFlow(p, call.getArgument(0))
   select call, p
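The Ruby sketch below shows the shape of code this query is looking for: a parameter that reaches ``File.open`` without being modified. The method name ``first_line`` is hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: the parameter `path` reaches File.open unchanged.
def first_line(path)
  target = path                      # value-preserving step
  File.open(target) { |f| f.gets }   # argument 0 holds the parameter's value
end

Tempfile.create("demo") do |tmp|
  tmp.puts("first")
  tmp.flush
  puts first_line(tmp.path)  # prints "first"
end
```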
198+
199+
Using the exact name supplied via the parameter may be too strict.
200+
If we want to know if the parameter influences the file name, we can use taint tracking instead of data flow.
201+
This query finds calls to ``File.open`` where the filename is derived from a parameter:
202+
203+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.TaintTracking
   import codeql.ruby.ApiGraphs

   from DataFlow::CallNode call, DataFlow::ParameterNode p
   where
     call = API::getTopLevelMember("File").getAMethodCall("open") and
     TaintTracking::localTaint(p, call.getArgument(0))
   select call, p
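As a sketch of what "derived from a parameter" can look like in Ruby, the method below builds the file name by concatenation. The method name ``read_note`` and the file names are hypothetical. Because concatenation is not value-preserving, one would expect the taint-tracking query to relate the parameters to the call, while the plain data-flow query would not.

```ruby
require "tmpdir"

# Hypothetical helper: the file name is built from the parameters, not passed as-is.
def read_note(dir, name)
  path = dir + "/" + name + ".txt"  # a new string derived from `dir` and `name`
  File.open(path) { |f| f.read }
end

Dir.mktmpdir do |dir|
  File.write(dir + "/todo.txt", "buy milk")
  puts read_note(dir, "todo")  # prints "buy milk"
end
```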
214+
215+
Global data flow
216+
----------------
217+
218+
Global data flow tracks data flow throughout the entire program, and is therefore more powerful than local data flow.
219+
However, global data flow is less precise than local data flow, and the analysis typically requires significantly more time and memory to perform.
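The following Ruby sketch illustrates why local flow is not always enough. All method names are hypothetical. The value is created in one method, passed through a second, and used in a third, so no single method contains both the origin and the use.

```ruby
# Hypothetical methods: a value created in one method is used in another.
def config_name
  "settings.conf"    # the value is created here
end

def pass_through(name)
  name               # returned unchanged
end

def check(name)
  File.exist?(name)  # the value is used here
end

puts check(pass_through(config_name))
```

Local data flow considers each of these methods in isolation. Connecting the literal in ``config_name`` to the argument in ``check`` requires following calls and returns, which is what global data flow does.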
220+
221+
.. pull-quote:: Note
222+
223+
.. include:: ../reusables/path-problem.rst
224+
225+
Using global data flow
226+
~~~~~~~~~~~~~~~~~~~~~~
227+
228+
The global data flow library is used by extending the class ``DataFlow::Configuration``:
229+
230+
.. code-block:: ql

   import codeql.ruby.DataFlow

   class MyDataFlowConfiguration extends DataFlow::Configuration {
     MyDataFlowConfiguration() { this = "..." }

     override predicate isSource(DataFlow::Node source) {
       ...
     }

     override predicate isSink(DataFlow::Node sink) {
       ...
     }
   }
245+
246+
These predicates are defined in the configuration:
247+
248+
- ``isSource`` - defines where data may flow from.
249+
- ``isSink`` - defines where data may flow to.
250+
- ``isBarrier`` - optionally, restricts the data flow.
251+
- ``isAdditionalFlowStep`` - optionally, adds additional flow steps.
252+
253+
The characteristic predicate (``MyDataFlowConfiguration()``) defines the name of the configuration, so ``"..."`` must be replaced with a unique name (for instance the class name).
254+
255+
The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``:
256+
257+
.. code-block:: ql

   from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
   where dataflow.hasFlow(source, sink)
   select source, "Dataflow to $@.", sink, sink.toString()
262+
263+
Using global taint tracking
264+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
265+
266+
Global taint tracking is to global data flow what local taint tracking is to local data flow.
267+
That is, global taint tracking extends global data flow with additional non-value-preserving steps.
268+
The global taint tracking library is used by extending the class ``TaintTracking::Configuration``:
269+
270+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.TaintTracking

   class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
     MyTaintTrackingConfiguration() { this = "..." }

     override predicate isSource(DataFlow::Node source) {
       ...
     }

     override predicate isSink(DataFlow::Node sink) {
       ...
     }
   }
286+
287+
These predicates are defined in the configuration:
288+
289+
- ``isSource`` - defines where taint may flow from.
290+
- ``isSink`` - defines where taint may flow to.
291+
- ``isSanitizer`` - optionally, restricts the taint flow.
292+
- ``isAdditionalTaintStep`` - optionally, adds additional taint steps.
293+
294+
As with global data flow, the characteristic predicate (``MyTaintTrackingConfiguration()``) defines the unique name of the configuration. The taint analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``.
295+
296+
Predefined sources and sinks
297+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
298+
299+
The data flow library contains a number of predefined sources and sinks, providing a good starting point for defining data flow based security queries.
300+
301+
- The class ``RemoteFlowSource`` (defined in module ``codeql.ruby.dataflow.RemoteFlowSources``) represents data flow from remote network inputs. This is useful for finding security problems in networked services.
302+
- The library ``Concepts`` (defined in module ``codeql.ruby.Concepts``) contains several subclasses of ``DataFlow::Node`` that are security relevant, such as ``FileSystemAccess`` and ``SqlExecution``.
303+
304+
For global flow, it is also useful to restrict sources to instances of ``LocalSourceNode``.
305+
The predefined sources generally do that.
306+
307+
Class hierarchy
308+
~~~~~~~~~~~~~~~
309+
310+
- ``DataFlow::Configuration`` - base class for custom global data flow analysis.
311+
- ``DataFlow::Node`` - an element behaving as a data-flow node.
312+
313+
- ``DataFlow::CfgNode`` - a control-flow node behaving as a data-flow node.
314+
315+
- ``DataFlow::ExprNode`` - an expression behaving as a data-flow node.
316+
- ``DataFlow::ParameterNode`` - a parameter data-flow node representing the value of a parameter at method/block entry.
317+
318+
- ``RemoteFlowSource`` - data flow from network/remote input.
319+
- ``Concepts::SystemCommandExecution`` - a data-flow node that executes an operating system command, for instance by spawning a new process.
320+
- ``Concepts::FileSystemAccess`` - a data-flow node that performs a file system access, including reading and writing data, creating and deleting files and folders, checking and updating permissions, and so on.
321+
- ``Concepts::Path::PathNormalization`` - a data-flow node that performs path normalization. This is often needed in order to safely access paths.
322+
- ``Concepts::CodeExecution`` - a data-flow node that dynamically executes Ruby code.
323+
- ``Concepts::SqlExecution`` - a data-flow node that executes SQL statements.
324+
- ``Concepts::HTTP::Server::RouteSetup`` - a data-flow node that sets up a route on a server.
325+
- ``Concepts::HTTP::Server::HttpResponse`` - a data-flow node that creates an HTTP response on a server.
326+
327+
- ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis.
328+
329+
Examples
330+
~~~~~~~~
331+
332+
This query shows a taint tracking configuration that uses all remote network input as sources and the path arguments of file system accesses as sinks:
333+
334+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.TaintTracking
   import codeql.ruby.Concepts
   import codeql.ruby.dataflow.RemoteFlowSources

   class RemoteToFileConfiguration extends TaintTracking::Configuration {
     RemoteToFileConfiguration() { this = "RemoteToFileConfiguration" }

     override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

     override predicate isSink(DataFlow::Node sink) {
       sink = any(FileSystemAccess fa).getAPathArgument()
     }
   }

   from DataFlow::Node input, DataFlow::Node fileAccess, RemoteToFileConfiguration config
   where config.hasFlow(input, fileAccess)
   select fileAccess, "This file access uses data from $@.", input, "user-controllable input."
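The Ruby sketch below mimics the shape of code such a configuration is aimed at. It is a stand-in that runs without any web framework: the class ``UploadsController``, its ``params`` accessor, and the ``"name"`` key are all hypothetical, and this plain class would not itself be recognized as a remote input. In a real web application, request parameters are a typical example of the data that ``RemoteFlowSource`` is meant to cover.

```ruby
# Stand-in for a web controller so the sketch runs on its own.
# `UploadsController`, `params`, and the "name" key are hypothetical.
class UploadsController
  attr_reader :params

  def initialize(params)
    @params = params
  end

  def show
    name = params["name"]     # in a real controller, request data would enter here
    path = "uploads/" + name  # the path is derived from that input
    File.exist?(path) ? File.read(path) : nil  # file system access using the derived path
  end
end

puts UploadsController.new({ "name" => "missing.txt" }).show.inspect
```

The path is built by concatenation, which is why this example needs taint tracking and not plain data flow to connect the input to the file access.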
354+
355+
This data flow configuration tracks data flow from environment variables to opening files:
356+
357+
.. code-block:: ql

   import codeql.ruby.DataFlow
   import codeql.ruby.controlflow.CfgNodes
   import codeql.ruby.ApiGraphs

   class EnvironmentToFileConfiguration extends DataFlow::Configuration {
     EnvironmentToFileConfiguration() { this = "EnvironmentToFileConfiguration" }

     override predicate isSource(DataFlow::Node source) {
       exists(ExprNodes::ConstantReadAccessCfgNode env |
         env.getExpr().getName() = "ENV" and
         env = source.asExpr().(ExprNodes::ElementReferenceCfgNode).getReceiver()
       )
     }

     override predicate isSink(DataFlow::Node sink) {
       sink = API::getTopLevelMember("File").getAMethodCall("open").getArgument(0)
     }
   }

   from EnvironmentToFileConfiguration config, DataFlow::Node environment, DataFlow::Node fileOpen
   where config.hasFlow(environment, fileOpen)
   select fileOpen, "This call to 'File.open' uses data from $@.", environment,
     "an environment variable"
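A Ruby program of the following shape is the kind of code this configuration is meant to match. The method names and the variable name ``APP_CONFIG`` are hypothetical. The element reference on ``ENV`` plays the role of the source, and the first argument of ``File.open`` plays the role of the sink, with the value crossing a method boundary in between.

```ruby
require "tempfile"

# Hypothetical helper: reads a path from an environment variable.
def configured_path
  ENV["APP_CONFIG"]  # element reference on ENV
end

# Hypothetical helper: opens whatever path the environment supplied.
def load_config
  File.open(configured_path) { |f| f.read }  # argument 0 comes from the environment
end

Tempfile.create("app_config") do |tmp|
  tmp.write("debug=true")
  tmp.flush
  ENV["APP_CONFIG"] = tmp.path
  puts load_config  # prints "debug=true"
end
```

Because the value is returned from ``configured_path`` unchanged, plain data flow is enough here and taint tracking is not needed.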
382+
383+
Further reading
384+
---------------
385+
386+
- ":ref:`Exploring data flow with path queries <exploring-data-flow-with-path-queries>`"
387+
388+
389+
.. include:: ../reusables/ruby-further-reading.rst
390+
.. include:: ../reusables/codeql-ref-tools-further-reading.rst

docs/codeql/codeql-language-guides/codeql-for-ruby.rst

Lines changed: 2 additions & 0 deletions
@@ -15,4 +15,6 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
1515

1616
- :doc:`CodeQL library for Ruby <codeql-library-for-ruby>`: When you're analyzing a Ruby program, you can make use of the large collection of classes in the CodeQL library for Ruby.
1717

18+
- :doc:`Analyzing data flow in Ruby <analyzing-data-flow-in-ruby>`: You can use CodeQL to track the flow of data through a Ruby program to places where the data is used.
19+
1820
.. include:: ../reusables/ruby-beta-note.rst
