Use different regexes in KeywordDetector to improve accuracy#86
Conversation
Turned on Keyword detector by default Down-graded version to '0.1.666' to test it on a few repos without causing havoc
|
Just stating for memories sake, that we had a discussion that I'll paraphrase here, I'll maybe make a separate issue: Currently, the assumption is that "no secret is repeated within a single file" however, with this plugin it is more likely that this assumption is gonna break e.g. if you have 2 different assignments of We can either (a) incorporate line numbers in , or (b) put a count of how many secrets are in the file, and put the count in baseline.The downsides of doing nothing, and merging the PR as is:
We agree changing that assumption is a larger task than what's at hand. |
Filter out $variables for PHP files Filter out `(|[` followed by `)|]` Add `not`, more empty quotes and `password` variable names to FALSE_POSITIVES
After merging in master
Trim uncovered code Change tox to ensure tests are covered 100%
Removed `token` as a keyword Made FOLLOWED_BY_EQUAL_SIGNS_RE require variable ends with keyword
| """Generates raw secrets by re-scanning the line, with the specified plugin""" | ||
| for raw_secret in plugin.secret_generator(secret_line): | ||
| yield raw_secret | ||
| if isinstance(plugin, KeywordDetector): |
There was a problem hiding this comment.
I felt :/ about writing it like this, but didn't see a better way.
Just wanted to put neon lights on it for the review 😅
In keyword_test.py
Made quotes required in Python files/added regexes for this Added a Filetype Enum and `determine_file_type` function Replaced 'pass' with 'db_pass' in BLACKLIST Added 'aws_secret_access_key' to BLACKLIST Added some trailing char cases to FALSE_POSITIVES :boom: Changed secret_type to 'Secret Keyword'
ea99830 to
e01d818
Compare
ce5862b to
ec1e0cd
Compare
By adding an optional `((\'|")])?` to the regexes This is to catch 'foo' in e.g. `some_dict["secret"] = "foo"`
ec1e0cd to
b7e48ab
Compare
76ddcdf to
a37a9c9
Compare
calvinli
left a comment
There was a problem hiding this comment.
LGTM based on internal testing. There are some remaining false positives but it should be okay.
Added Javascript specific false-positive checks
Added ${ before } heuristic for e.g. ${link}
Added more false-positives to FALSE_POSITIVES
:zap: keyword_test.py
Make STANDARD_NEGATIVES list and STANDARD_POSITIVES set for DRYness
7dd4926 to
a29108b
Compare
* Base on minimal vs python39 UBI * use tested version - python38 Co-authored-by: Timothy Figgins <[email protected]>
No description provided.