
Commit a8944c8

model accept states more accurately by adding an AcceptAny state, modelling $, and checking the existence of rejecting suffixes
1 parent d9ebb7b commit a8944c8

3 files changed

Lines changed: 121 additions & 60 deletions


javascript/ql/src/Performance/ReDoS.ql

Lines changed: 115 additions & 52 deletions
@@ -46,8 +46,8 @@ import semmle.javascript.security.performance.SuperlinearBackTracking
4646
*
4747
* This is what the query does. It makes a simple attempt to construct a
4848
* prefix `v` leading into `q`, but only to improve the alert message.
49-
* And the query only weakly attempts to construct a suffix that ensures
50-
* rejection; this causes some false positives.
49+
* And the query tries to prove the existence of a suffix that ensures
50+
* rejection. This check might fail, which can cause false positives.
5151
*
5252
* Finally, sometimes it depends on the translation whether the NFA generated
5353
* for a regular expression has a pumpable fork or not. We implement one
@@ -59,7 +59,9 @@ import semmle.javascript.security.performance.SuperlinearBackTracking
5959
*
6060
* * Every sub-term `t` gives rise to an NFA state `Match(t,i)`, representing
6161
* the state of the automaton before attempting to match the `i`th character in `t`.
62-
* * There is one additional accepting state `Accept(r)`.
62+
* * There is one accepting state `Accept(r)`.
63+
* * There is a special `AcceptAnySuffix(r)` state, which accepts any suffix string
64+
* by using an epsilon transition to `Accept(r)` and an any transition to itself.
6365
* * Transitions between states may be labelled with epsilon, or an abstract
6466
* input symbol.
6567
* * Each abstract input symbol represents a set of concrete input characters:
@@ -73,13 +75,8 @@ import semmle.javascript.security.performance.SuperlinearBackTracking
7375
* * Once a trace of pairs of abstract input symbols that leads from a fork
7476
* back to itself has been identified, we attempt to construct a concrete
7577
* string corresponding to it, which may fail.
76-
* * Instead of trying to construct a suffix that makes the automaton fail,
77-
* we ensure that repeating `n` copies of `w` does not reach a state that is
78-
* an epsilon transition from the accepting state.
79-
* This assumes that the accepting state accepts any suffix.
80-
* Regular expressions - where the end anchor `$` is used - have an accepting state
81-
* that does not accept all suffixes. Such regular expression not accurately
82-
* modelled by this assumption, which can cause false negatives.
78+
* * Lastly we ensure that any state reached by repeating `n` copies of `w` has
79+
* a suffix `x` (possibly empty) that is __not__ accepted.
8380
*/
8481

8582
/**
@@ -457,26 +454,35 @@ newtype TState =
457454
exists(t.(RegexpCharacterConstant).getValue().charAt(i))
458455
)
459456
} or
460-
Accept(RegExpRoot l) { l.isRelevant() }
457+
Accept(RegExpRoot l) { l.isRelevant() } or
458+
AcceptAnySuffix(RegExpRoot l) { l.isRelevant() }
461459

462460
/**
463461
* A state in the NFA corresponding to a regular expression.
464462
*
465463
* Each regular expression literal `l` has one accepting state
466-
* `Accept(l)` and a state `Match(t, i)` for every subterm `t`,
464+
* `Accept(l)`, one state that accepts all suffixes `AcceptAnySuffix(l)`,
465+
* and a state `Match(t, i)` for every subterm `t`,
467466
* which represents the state of the NFA before starting to
468467
* match `t`, or the `i`th character in `t` if `t` is a constant.
469468
*/
470469
class State extends TState {
471470
RegExpTerm repr;
472471

473-
State() { this = Match(repr, _) or this = Accept(repr) }
472+
State() {
473+
this = Match(repr, _) or
474+
this = Accept(repr) or
475+
this = AcceptAnySuffix(repr)
476+
}
474477

475478
string toString() {
476479
exists(int i | this = Match(repr, i) | result = "Match(" + repr + "," + i + ")")
477480
or
478481
this instanceof Accept and
479482
result = "Accept(" + repr + ")"
483+
or
484+
this instanceof AcceptAnySuffix and
485+
result = "AcceptAny(" + repr + ")"
480486
}
481487

482488
Location getLocation() { result = repr.getLocation() }
@@ -524,7 +530,7 @@ State after(RegExpTerm t) {
524530
or
525531
exists(RegExpOpt opt | t = opt.getAChild() | result = after(opt))
526532
or
527-
exists(RegExpRoot root | t = root | result = Accept(root))
533+
exists(RegExpRoot root | t = root | result = AcceptAnySuffix(root))
528534
}
529535

530536
/**
@@ -579,6 +585,16 @@ predicate delta(State q1, EdgeLabel lbl, State q2) {
579585
or
580586
q1 = before(opt) and q2 = after(opt)
581587
)
588+
or
589+
exists(RegExpRoot root | q1 = AcceptAnySuffix(root) |
590+
lbl = Any() and q2 = q1
591+
or
592+
lbl = Epsilon() and q2 = Accept(root)
593+
)
594+
or
595+
exists(RegExpDollar dollar | q1 = before(dollar) |
596+
lbl = Epsilon() and q2 = Accept(getRoot(dollar))
597+
)
582598
}
583599

584600
/**
@@ -959,34 +975,98 @@ module PrefixConstruction {
959975
}
960976

961977
/**
962-
* Gets a state that can be reached from pumpable `fork` consuming all
963-
* chars in `w` any number of times followed by the first `i+1` characters of `w`.
978+
* Predicates for testing the presence of a rejecting suffix.
979+
*
980+
* These predicates are used to ensure that all the states reached from the fork
981+
* by repeating `w` have a rejecting suffix.
982+
*
983+
* For example, a regexp like `/^(a+)+/` will accept any string as long as the prefix is
984+
* some number of `"a"`s, and it is therefore not possible to construct a rejecting suffix.
964985
*
965-
* This predicate is used to ensure that the accepting state is not reached from the fork by repeating `w`.
966-
* This works under the assumption that any accepting state accepts all suffixes.
967-
* For example, a regexp like `/^(a+)+/` will accept any string as long the prefix is some number of `"a"`s,
968-
* and it is therefore not possible to construct a rejected suffix.
969-
* This assumption breaks on regular expression that use the anchor `$`, e.g: `/^(a+)+$/`, and such regular
970-
* expression are not accurately modeled by this query.
986+
* A regexp like `/(a+)+$/` or `/(a+)+b/` trivially has a rejecting suffix,
987+
* as the suffix "X" will cause both the regular expressions to be rejected.
971988
*
972989
* The string `w` is repeated any number of times because it needs to be
973990
* infinitely repeatable for the attack to work.
974-
* For a regular expression `/((ab)+)*abab/` the accepting state is not reachable from the fork
975-
* using epsilon transitions. But any attempt at repeating `w` will end in the accepting state.
976-
* This also relies on the assumption that any accepting state will accept all suffixes.
991+
* For the regular expression `/((ab)+)*abab/` the accepting state is not reachable from the fork
992+
* using epsilon transitions. But any attempt at repeating `w` will end in a state that accepts all suffixes.
977993
*/
978-
State process(State fork, string w, int i) {
979-
isPumpable(fork, w) and
980-
exists(State prev |
981-
i = 0 and prev = fork
994+
module SuffixConstruction {
995+
/**
996+
* Holds if all states reachable from `fork` by repeating `w`
997+
* are rejectable by appending some suffix.
998+
*/
999+
predicate reachesOnlyRejectableSuffixes(State fork, string w) {
1000+
isPumpable(fork, w) and
1001+
forex(State next | next = process(fork, w, w.length() - 1) | isDefinitelyRejectable(next))
1002+
}
1003+
1004+
/**
1005+
* Holds if there definitely exists a path starting from `s` that leads to the regular expression being rejected.
1006+
*/
1007+
private predicate isDefinitelyRejectable(State s) {
1008+
// exists a reject edge with some char.
1009+
hasRejectEdge(s, _)
9821010
or
983-
prev = process(fork, w, i - 1)
1011+
// all edges (at least one) with some char lead to another state that is rejectable.
1012+
exists(string char | char = relevant() |
1013+
forex(State next | deltaClosed(s, getAnInputSymbolMatching(char), next) |
1014+
isDefinitelyRejectable(next)
1015+
)
1016+
)
9841017
or
985-
// repeat until fixpoint
986-
i = 0 and
987-
prev = process(fork, w, w.length() - 1)
988-
|
989-
deltaClosed(prev, getAnInputSymbolMatching(w.charAt(i)), result)
1018+
// stopping here is rejection
1019+
not epsilonSucc*(s) = Accept(_)
1020+
}
1021+
1022+
/**
1023+
* Gets a char used for finding possible suffixes.
1024+
*/
1025+
private string relevant() { result = CharacterClasses::getARelevantChar() }
1026+
1027+
/**
1028+
* Holds if there is no edge from `s` labeled `char` in our NFA.
1029+
* The NFA does not model reject states, so the above is the same as saying there is a reject edge.
1030+
*/
1031+
private predicate hasRejectEdge(State s, string char) {
1032+
char = relevant() and
1033+
not deltaClosed(s, getAnInputSymbolMatching(char), _)
1034+
}
1035+
1036+
/**
1037+
* Gets a state that can be reached from pumpable `fork` consuming all
1038+
* chars in `w` any number of times followed by the first `i+1` characters of `w`.
1039+
*/
1040+
private State process(State fork, string w, int i) {
1041+
isPumpable(fork, w) and
1042+
exists(State prev |
1043+
i = 0 and prev = fork
1044+
or
1045+
prev = process(fork, w, i - 1)
1046+
or
1047+
// repeat until fixpoint
1048+
i = 0 and
1049+
prev = process(fork, w, w.length() - 1)
1050+
|
1051+
deltaClosed(prev, getAnInputSymbolMatching(w.charAt(i)), result)
1052+
)
1053+
}
1054+
}
1055+
1056+
/**
1057+
* Holds if `term` may cause exponential backtracking on strings containing many repetitions of `witness`.
1058+
*/
1059+
predicate isReDoSAttackable(RegExpTerm term, string witness, State s) {
1060+
exists(int i, string c | s = Match(term, i) |
1061+
c =
1062+
min(string w |
1063+
isPumpable(s, w) and
1064+
not isPumpable(epsilonSucc+(s), _) and
1065+
SuffixConstruction::reachesOnlyRejectableSuffixes(s, w)
1066+
|
1067+
w order by w.length(), w
1068+
) and
1069+
witness = escape(rotate(c, i))
9901070
)
9911071
}
9921072

@@ -1015,23 +1095,6 @@ string rotate(string str, int i) {
10151095
result = str.suffix(str.length() - i) + str.prefix(str.length() - i)
10161096
}
10171097

1018-
/**
1019-
* Holds if `term` may cause exponential backtracking on strings containing many repetitions of `witness`.
1020-
*/
1021-
predicate isReDoSAttackable(RegExpTerm term, string witness, State s) {
1022-
exists(int i, string c | s = Match(term, i) |
1023-
c =
1024-
min(string w |
1025-
isPumpable(s, w) and
1026-
not isPumpable(epsilonSucc+(s), _) and
1027-
not epsilonSucc*(process(s, w, _)) = Accept(_)
1028-
|
1029-
w order by w.length(), w
1030-
) and
1031-
witness = escape(rotate(c, i))
1032-
)
1033-
}
1034-
10351098
from RegExpTerm t, string witness, State s, string prefixMsg
10361099
where
10371100
isReDoSAttackable(t, witness, s) and
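
The rejecting-suffix check in this file can be illustrated outside of QL. The following is a minimal JavaScript sketch on a hand-built toy NFA, not the CodeQL implementation: the names `makeNfa`, `epsilonClosure`, `step` and `rejectableStates`, the state names, and the two-character alphabet are all assumptions made for illustration. It mirrors the three rules of `isDefinitelyRejectable` and shows why an `AcceptAnySuffix`-style state blocks rejection while a `$`-style state does not.

```javascript
// Hypothetical toy NFA, not the CodeQL implementation.
// Edge labels: "eps" (epsilon), "any" (every character), or one literal character.
function makeNfa(edges) {
  return { edges };
}

// All states reachable from `state` using only epsilon edges (including `state`).
function epsilonClosure(nfa, state) {
  const seen = new Set([state]);
  const work = [state];
  while (work.length > 0) {
    const cur = work.pop();
    for (const e of nfa.edges) {
      if (e.from === cur && e.label === "eps" && !seen.has(e.to)) {
        seen.add(e.to);
        work.push(e.to);
      }
    }
  }
  return seen;
}

// States reached from `state` by epsilon moves followed by consuming `ch`.
function step(nfa, state, ch) {
  const closure = epsilonClosure(nfa, state);
  const result = new Set();
  for (const e of nfa.edges) {
    if (closure.has(e.from) && (e.label === "any" || e.label === ch)) {
      result.add(e.to);
    }
  }
  return result;
}

// Least fixpoint: a state is rejectable if stopping there does not reach
// `accept`, or if some character either has no edge at all or leads only
// to states already known to be rejectable.
function rejectableStates(nfa, states, alphabet, accept) {
  const rejectable = new Set();
  let changed = true;
  while (changed) {
    changed = false;
    for (const s of states) {
      if (rejectable.has(s)) continue;
      const stopRejects = !epsilonClosure(nfa, s).has(accept);
      const charRejects = alphabet.some((ch) =>
        [...step(nfa, s, ch)].every((next) => rejectable.has(next))
      );
      if (stopRejects || charRejects) {
        rejectable.add(s);
        changed = true;
      }
    }
  }
  return rejectable;
}

const alphabet = ["a", "X"];

// Tail of /^(a+)+/: the loop can exit into a state that accepts any suffix.
const unanchored = makeNfa([
  { from: "loop", label: "a", to: "loop" },
  { from: "loop", label: "eps", to: "acceptAny" },
  { from: "acceptAny", label: "any", to: "acceptAny" },
  { from: "acceptAny", label: "eps", to: "accept" },
]);

// Tail of /^(a+)+$/: the state before `$` only has an epsilon edge to accept.
const anchored = makeNfa([
  { from: "loop", label: "a", to: "loop" },
  { from: "loop", label: "eps", to: "dollar" },
  { from: "dollar", label: "eps", to: "accept" },
]);

const unanchoredResult = rejectableStates(
  unanchored, ["loop", "acceptAny", "accept"], alphabet, "accept");
const anchoredResult = rejectableStates(
  anchored, ["loop", "dollar", "accept"], alphabet, "accept");

console.log(unanchoredResult.has("loop")); // false: every suffix is accepted
console.log(anchoredResult.has("loop"));   // true: appending "X" rejects
```

The iterative loop stands in for QL's recursive predicate evaluation; in both cases a state counts as rejectable only once a finite justification for it has been found.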

javascript/ql/test/query-tests/Performance/ReDoS/ReDoS.expected

Lines changed: 2 additions & 4 deletions
@@ -57,8 +57,8 @@
5757
| tst.js:31:54:31:55 | .* | This part of the regular expression may cause exponential backtracking on strings starting with '!\|\\n-\|\\n' and containing many repetitions of '\|\|\\n'. |
5858
| tst.js:36:23:36:32 | (\\\\\\/\|.)*? | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '\\\\/'. |
5959
| tst.js:41:27:41:28 | .* | This part of the regular expression may cause exponential backtracking on strings starting with '#' and containing many repetitions of '#'. |
60-
| tst.js:47:25:47:27 | .*? | This part of the regular expression may cause exponential backtracking on strings starting with '"' and containing many repetitions of '""'. |
61-
| tst.js:47:31:47:33 | .*? | This part of the regular expression may cause exponential backtracking on strings starting with ''' and containing many repetitions of ''''. |
60+
| tst.js:47:31:47:33 | .*? | This part of the regular expression may cause exponential backtracking on strings starting with '"' and containing many repetitions of '""'. |
61+
| tst.js:47:37:47:39 | .*? | This part of the regular expression may cause exponential backtracking on strings starting with ''' and containing many repetitions of ''''. |
6262
| tst.js:52:37:52:39 | .*? | This part of the regular expression may cause exponential backtracking on strings starting with '$[' and containing many repetitions of ']['. |
6363
| tst.js:52:70:52:72 | .*? | This part of the regular expression may cause exponential backtracking on strings starting with '$.$[' and containing many repetitions of ']['. |
6464
| tst.js:58:15:58:20 | [a-z]+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
@@ -93,7 +93,6 @@
9393
| tst.js:167:15:167:27 | (1s\|[\\da-z])* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '1s'. |
9494
| tst.js:170:15:170:23 | (0\|[\\d])* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '0'. |
9595
| tst.js:173:16:173:20 | [\\d]+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '0'. |
96-
| tst.js:182:17:182:21 | [^>]+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '='. |
9796
| tst.js:185:16:185:21 | [^>a]+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '='. |
9897
| tst.js:188:17:188:19 | \\s* | This part of the regular expression may cause exponential backtracking on strings starting with '\\n' and containing many repetitions of '\\n'. |
9998
| tst.js:191:18:191:20 | \\s+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of ' '. |
@@ -117,7 +116,6 @@
117116
| tst.js:275:38:275:40 | \\s* | This part of the regular expression may cause exponential backtracking on strings starting with '<a a=' and containing many repetitions of '"" a='. |
118117
| tst.js:281:16:281:17 | a+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
119118
| tst.js:284:16:284:17 | a+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
120-
| tst.js:287:16:287:17 | a+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
121119
| tst.js:290:16:290:17 | a+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
122120
| tst.js:293:17:293:18 | a+ | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
123121
| tst.js:299:90:299:91 | e+ | This part of the regular expression may cause exponential backtracking on strings starting with '00000000000000' and containing many repetitions of 'e'. |

javascript/ql/test/query-tests/Performance/ReDoS/tst.js

Lines changed: 4 additions & 4 deletions
@@ -43,8 +43,8 @@ var bad6 = /^([\s\[\{\(]|#.*)*$/;
4343
// GOOD
4444
var good4 = /(\r\n|\r|\n)+/;
4545

46-
// GOOD because it cannot be made to fail after the loop (but we can't tell that)
47-
var good5 = /((?:[^"']|".*?"|'.*?')*?)([(,)]|$)/;
46+
// BAD - PoC: `node -e "/((?:[^\"\']|\".*?\"|\'.*?\')*?)([(,)]|$)/.test(\"'''''''''''''''''''''''''''''''''''''''''''''\\\"\");"`. It's complicated though, because the regexp still matches something, it just matches the empty-string after the attack string.
47+
var actuallyBad = /((?:[^"']|".*?"|'.*?')*?)([(,)]|$)/;
4848

4949
// NOT GOOD; attack: "a" + "[]".repeat(100) + ".b\n"
5050
// Adapted from Knockout (https://github.com/knockout/knockout), which is
@@ -178,7 +178,7 @@ var good12 = /(\d+(X\d+)?)+/;
178178
// GOOD - there is no witness in the end that could cause the regexp to not match
179179
var good13 = /([0-9]+(X[0-9]*)?)*/;
180180

181-
// GOOD - but still flagged (always matches something)
181+
// GOOD
182182
var good15 = /^([^>]+)*(>|$)/;
183183

184184
// NOT GOOD
@@ -283,7 +283,7 @@ var good31 = /(a+)*[^]{2,3}/;
283283
// GOOD - but we don't find that no suffix is rejected
284284
var good32 = /(a+)*([^]{2,}|X)$/;
285285

286-
// GOOD - but still flagged
286+
// GOOD
287287
var good33 = /(a+)*([^]*|X)$/;
288288

289289
// NOT GOOD
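
The relabelled test cases can be sanity-checked directly in Node. The sketch below is an illustration rather than part of the test suite: the sample inputs and variable names other than the regular expressions themselves are assumptions, and the attack string is kept deliberately short so that it finishes instantly. It checks that `good15` and `good33` match every sample (so no rejecting suffix exists), that `/^(a+)+$/` is rejected by a trailing `X`, and that the `actuallyBad` expression still matches, but only the empty string at the end of the input.

```javascript
// Regular expressions copied from tst.js.
const good15 = /^([^>]+)*(>|$)/;
const good33 = /(a+)*([^]*|X)$/;
const actuallyBad = /((?:[^"']|".*?"|'.*?')*?)([(,)]|$)/;

// Illustrative inputs; each one is matched on the first attempt,
// so appending characters never turns a match into a rejection.
const samples = ["", "=".repeat(40), "a".repeat(40) + "X", "a".repeat(40) + "!\n"];
const allMatch = samples.every((s) => good15.test(s) && good33.test(s));

// An end-anchored loop is rejected once a non-matching suffix is appended.
// The input is short on purpose: the work grows exponentially with its length.
const anchoredRejected = !/^(a+)+$/.test("a".repeat(12) + "X");

// Shortened version of the PoC string: five single quotes, then a double quote.
const shortAttack = "'".repeat(5) + '"';
const m = actuallyBad.exec(shortAttack);

console.log(allMatch);         // true
console.log(anchoredRejected); // true
console.log(m.index, JSON.stringify(m[0])); // 6 ""
```

The last result reflects the note in the test comment: every start position inside the attack string fails after backtracking, and the only match is the empty string after it.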
