
Commit 1af2e56

Summary of recent meeting.
Perhaps a not-python-specific version of this could go into the shared implementation.
1 parent 375da38 commit 1af2e56

1 file changed

Lines changed: 106 additions & 0 deletions

File tree

  • python/ql/src/semmle/code/python/dataflow/internal
@@ -0,0 +1,106 @@
1+
# Using the shared dataflow library
2+
3+
## File organisation
4+
5+
The files currently live in `semmle/code/python` (whereas the existing implementation lives in `semmle/python/dataflow`).
6+
7+
In there are `DataFlow.qll`, `DataFlow2.qll`, etc., which refer to `internal/DataFlowImpl`, `internal/DataFlowImpl2`, etc., respectively. The `DataFlowImplN` files are all identical copies, to avoid mutual recursion. Each starts off by including two files, `internal/DataFlowImplCommon` and `internal/DataFlowImplSpecific`. The former contains all the language-agnostic definitions, while the latter is where we describe our favorite language. `DataFlowImplSpecific` simply forwards to two other files, `internal/DataFlowPrivate.qll` and `internal/DataFlowPublic.qll`. Definitions in the former will be hidden behind a `private` modifier, while those in the latter can be referred to in data flow queries. For instance, the definition of `DataFlow::Node` should likely be in `DataFlowPublic.qll`.
8+
9+
## Define the dataflow graph
10+
11+
In order to use the dataflow library, we need to define the dataflow graph,
12+
that is, define the nodes and the edges.
13+
14+
### Define the nodes
15+
16+
The nodes are defined in the type `DataFlow::Node` (found in `DataFlowPublic.qll`).
17+
This should likely be an IPA type, so we can extend it as needed.
18+
19+
Typical cases needed to construct the call graph include
20+
- argument node
21+
- parameter node
22+
- return node
23+
24+
Typical extensions include
25+
- postupdate nodes
26+
- implicit `this`-nodes
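
To make these node kinds concrete, the small Python program below marks where each kind might arise. It is only an illustration: the names `Account`, `deposit`, and `run` are hypothetical, and the comments describe one possible assignment of nodes, not how the library defines them.

```python
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):   # parameter nodes: `self` and `amount`
        self.balance += amount
        return self.balance      # return node: the returned expression


def run(amount):
    account = Account()
    # Argument node: `amount`.
    # The receiver `account` is bound to `self` without appearing in the
    # argument list; treating it as an extra argument node is one possible
    # modelling (an assumption here).
    # After the call, `account` has been modified, which is the situation a
    # post-update node for the receiver would represent.
    return account.deposit(amount)
```

Calling `run(5)` exercises every node kind listed above in a single call.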
27+
28+
### Define the edges
29+
30+
The edges split into local flow (within a function) and global flow (the call graph, between functions/procedures).
31+
32+
Extra flow, such as reading from and writing to global variables, can be captured in `jumpStep`.
33+
The local flow should be obtainable from an SSA computation.
34+
35+
The global flow should be obtainable from a `PointsTo` analysis. It is specified via `viableCallable` and
36+
`getAnOutNode`. Consider making `ReturnKind` a singleton IPA type, as in Java.
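
As a rough illustration of the three kinds of edges, the Python program below contains one example of each. All function and variable names are hypothetical, and the comments only suggest which predicate would plausibly be responsible for each edge.

```python
_cache = None  # module-level (global) variable


def local_flow(source):
    x = source   # definition of x
    y = x        # local step x -> y, visible from SSA def-use information
    return y


def identity(p):   # parameter node `p`
    return p       # return node


def global_flow(source):
    # argument `source` -> parameter `p` needs the call target (viableCallable);
    # return node -> value of the call needs the out node (getAnOutNode).
    return identity(source)


def remember(value):
    global _cache
    _cache = value   # write to a global variable


def recall():
    return _cache    # read in a different function: a candidate for jumpStep


def jump_flow(source):
    remember(source)
    return recall()
```

In `jump_flow` there is no argument-to-parameter or return edge connecting the write to the read, which is why such flow has to be added separately.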
37+
38+
If complicated dispatch needs to be modelled, try using the `[reduced|pruned]viable*` predicates.
39+
40+
## Field flow
41+
42+
To track flow through fields we need to provide a model of fields, that is the `Content` class.
43+
44+
Field access is specified via `read_step` and `store_step`.
45+
46+
Work is being done to make field flow handle lists and dictionaries and the like.
47+
48+
`PostUpdateNode`s become important when field flow is used, as they track modifications to fields resulting from function calls.
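
The following Python sketch shows the pattern this section is about: a store into a field inside a callee, followed by a read in the caller. The names `Box`, `fill`, and `field_flow` are hypothetical, and the comments describe the intended modelling rather than an existing implementation.

```python
class Box:
    def __init__(self):
        self.value = None


def fill(box, item):
    box.value = item   # store: `item` is written into the content `value` of `box`


def field_flow(source):
    box = Box()
    # The call modifies `box`. The argument node describes `box` before the
    # call; a post-update node would describe it afterwards, now carrying
    # the stored content back into the caller.
    fill(box, source)
    return box.value   # read: the content `value` is read back out of `box`
```

Without a post-update node for the argument `box`, the store inside `fill` would have no way to reach the read in `field_flow`.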
49+
50+
## Type pruning
51+
52+
If type information is available, flows can be discarded on the grounds of type mismatch.
53+
54+
Tracked types are given by the class `DataFlowType` and the predicate `getTypeBound`, and compatibility is recorded in the predicate `compatibleTypes`.
55+
56+
Further, possible casts are given by the class `CastNode`.
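
The idea of pruning can be sketched with a toy model in Python. This is not the QL API: `compatible_types` only loosely mirrors the role of `compatibleTypes`, the subclass-based rule is an assumption chosen for illustration, and the candidate list is made up.

```python
def compatible_types(t1, t2):
    """Toy rule: two types are compatible if one is a subclass of the other."""
    return issubclass(t1, t2) or issubclass(t2, t1)


def prune(candidate_flows):
    """Keep only (source_type, sink_type) pairs that pass the compatibility check."""
    return [(s, t) for (s, t) in candidate_flows if compatible_types(s, t)]


# Hypothetical candidate flows, given as (type at source, type at sink).
candidates = [(str, object), (int, str), (bool, int)]
kept = prune(candidates)
```

Here the `(int, str)` candidate is discarded because neither type is a subclass of the other, while the other two survive. A check that always succeeds would keep every candidate, so no flow would be pruned.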
57+
58+
---
59+
60+
# Plan
61+
62+
## Stage I, data flow
63+
64+
### Phase 0, setup
65+
Define minimal IPA type for `DataFlow::Node`
66+
Define all required predicates empty (via `none()`),
67+
except `compatibleTypes` which should be `any()`.
68+
Define `ReturnKind`, `DataFlowType`, and `Content` as singleton IPA types.
69+
70+
71+
### Phase 1, local flow
72+
Implement `simpleLocalFlowStep` based on the existing SSA computation
73+
74+
### Phase 2, global flow
75+
Implement `viableCallable` and `getAnOutNode` based on the existing predicate `PointsTo`.
76+
77+
### Phase 3, field flow
78+
Redefine `Content` and implement `read_step` and `store_step`.
79+
80+
Review use of post-update nodes.
81+
82+
### Phase 4, type pruning
83+
Use type trackers to obtain relevant type information and redefine `DataFlowType` to contain appropriate cases. Record the type information in `getTypeBound`.
84+
85+
Implement `compatibleTypes` (perhaps simply as the identity).
86+
87+
If necessary, re-implement `getErasedRepr` and `ppReprType`.
88+
89+
If necessary, redefine `CastNode`.
90+
91+
### Phase 5, bonus
92+
Review possible use of `[reduced|pruned]viable*` predicates.
93+
94+
Review need for more elaborate `ReturnKind`.
95+
96+
Review need for non-empty `jumpStep`.
97+
98+
Review need for non-empty `isUnreachableInCall`.
99+
100+
## Stage II, taint tracking
101+
102+
### Phase 0, setup
103+
Implement all predicates empty.
104+
105+
### Phase 1, experiments
106+
Try recovering an existing taint tracking query by implementing sources, sinks, sanitizers, and barriers.
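
As a small test subject for such an experiment, a Python program along the following lines could be used. The functions `get_user_input` and `render` are hypothetical stand-ins for a real source and sink; `html.escape` is the standard library function and plays the role of the sanitizer.

```python
import html


def get_user_input():
    # Source: stands in for attacker-controlled data (hypothetical).
    return "<script>alert(1)</script>"


def render(fragment):
    # Sink: stands in for writing to an HTML response (hypothetical).
    # The concatenation does not preserve the value, so following it is a
    # taint step rather than a plain data flow step.
    return "<p>" + fragment + "</p>"


def unsafe_page():
    # The source reaches the sink unmodified: a query should report this.
    return render(get_user_input())


def safe_page():
    # A sanitizer sits on the path: a query should not report this.
    return render(html.escape(get_user_input()))
```

A recovered query would be expected to flag `unsafe_page` and stay silent on `safe_page`.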
