Use different regexes in KeywordDetector to improve accuracy#86

Merged
KevinHock merged 19 commits into master from upgrade_keyword_detector on Jan 3, 2019

Conversation

@KevinHock
Collaborator

No description provided.

Comment thread detect_secrets/plugins/keyword.py Outdated
Comment thread tests/plugins/keyword_test.py Outdated
Comment thread detect_secrets/plugins/keyword.py Outdated
Comment thread detect_secrets/plugins/keyword.py Outdated
Comment thread detect_secrets/plugins/keyword.py Outdated
Comment thread detect_secrets/plugins/keyword.py Outdated
Comment thread detect_secrets/plugins/keyword.py Outdated
Comment thread detect_secrets/plugins/keyword.py Outdated
Comment thread detect_secrets/plugins/keyword.py Outdated
Comment thread tests/plugins/keyword_test.py Outdated
Collaborator Author

@KevinHock KevinHock left a comment

One nit

Comment thread tests/plugins/keyword_test.py Outdated
@KevinHock
Collaborator Author

Just stating for memory's sake: we had a discussion that I'll paraphrase here, and I may open a separate issue for it.

Currently, the assumption is that "no secret is repeated within a single file". With this plugin, that assumption is more likely to break. For example, if you have 2 different assignments of password = "ehehehwew" in the same file, the hash of ehehehwew will appear in 1 place in the baseline but in 2 places in the file. The same already holds for other plugins, e.g. high-entropy secrets.

We can either (a) incorporate line numbers in `self.fields_to_compare = ['filename', 'secret_hash', 'type']`, or (b) put a count of how many secrets are in the file, and put the count in baseline.

The downsides of doing nothing, and merging the PR as is:

  1. It looks unintelligent to users, as in "This tool doesn't even detect the same secret on the next line."
  2. When removing a secret, you suddenly get alerted to a new one that we did not complain about before.
  3. Less concerning: Adding a new secret can be missed, if it is already in that file.
  4. Less concerning: The audit command does not show the secret more than once.

We agree changing that assumption is a larger task than what's at hand.
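
To make the limitation concrete, here is a minimal sketch rather than the project's actual code. The `Finding` class, the `lineno` field name, and the file name are hypothetical; only the three compared fields and the `'Secret Keyword'` type come from this PR. With the current fields, two identical assignments collapse into one entry; adding a line number, as in option (a), keeps them apart.

```python
import hashlib


class Finding:
    """Hypothetical stand-in for a detected secret; not the detect-secrets class."""

    def __init__(self, filename, secret, type_, lineno, fields_to_compare):
        self.filename = filename
        self.secret_hash = hashlib.sha1(secret.encode('utf-8')).hexdigest()
        self.type = type_
        self.lineno = lineno  # assumed field name, used only for option (a)
        self.fields_to_compare = fields_to_compare

    def _key(self):
        # Identity is defined solely by the configured fields.
        return tuple(getattr(self, field) for field in self.fields_to_compare)

    def __eq__(self, other):
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


def unique_findings(fields):
    """Count distinct findings for the same secret on two lines of one file."""
    findings = [
        Finding('config.py', 'ehehehwew', 'Secret Keyword', lineno, fields)
        for lineno in (3, 17)
    ]
    return len(set(findings))


current = unique_findings(['filename', 'secret_hash', 'type'])
option_a = unique_findings(['filename', 'secret_hash', 'type', 'lineno'])
print(current, option_a)
```

Under these assumptions the first count is 1 and the second is 2, which is the gap behind downsides 1 to 4 above.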

Filter out $variables for PHP files
Filter out `(|[` followed by `)|]`
Add `not`, more empty quotes and `password` variable names to FALSE_POSITIVES
After merging in master
Trim uncovered code
Change tox to ensure tests are covered 100%
Removed `token` as a keyword
Made FOLLOWED_BY_EQUAL_SIGNS_RE require variable ends with keyword
"""Generates raw secrets by re-scanning the line, with the specified plugin"""
for raw_secret in plugin.secret_generator(secret_line):
    yield raw_secret
if isinstance(plugin, KeywordDetector):
Collaborator Author

I felt :/ about writing it like this, but didn't see a better way.

Just wanted to put neon lights on it for the review 😅

Comment thread tests/plugins/keyword_test.py Outdated
Made quotes required in Python files/added regexes for this
Added a Filetype Enum and `determine_file_type` function
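
The commit message names the enum and the function but not their contents. As a rough sketch, assuming detection by file extension and assuming the member names and extension mapping shown here (none of which are confirmed by this page), it might look like:

```python
import os
from enum import Enum


class FileType(Enum):
    # Assumed members; this PR page only mentions PHP, Python and JavaScript.
    JAVASCRIPT = 0
    PHP = 1
    PYTHON = 2
    OTHER = 3


# Assumed extension mapping, for illustration only.
_EXTENSION_TO_FILETYPE = {
    '.js': FileType.JAVASCRIPT,
    '.php': FileType.PHP,
    '.py': FileType.PYTHON,
}


def determine_file_type(filename):
    """Map a filename to a FileType by its extension, defaulting to OTHER."""
    _, extension = os.path.splitext(filename)
    return _EXTENSION_TO_FILETYPE.get(extension.lower(), FileType.OTHER)
```

A detector could then branch on the result, for example applying a quotes-required regex only when the type is `FileType.PYTHON`.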

Replaced 'pass' with 'db_pass' in BLACKLIST
Added 'aws_secret_access_key' to BLACKLIST
Added some trailing char cases to FALSE_POSITIVES

:boom: Changed secret_type to 'Secret Keyword'
@KevinHock KevinHock force-pushed the upgrade_keyword_detector branch from ea99830 to e01d818 Compare December 28, 2018 22:43
@KevinHock KevinHock force-pushed the upgrade_keyword_detector branch from ce5862b to ec1e0cd Compare December 28, 2018 23:48
By adding an optional `((\'|")])?` to the regexes
This is to catch 'foo' in e.g. `some_dict["secret"] = "foo"`
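
A small sketch of the idea follows. The pattern below is illustrative and is not the regex from `keyword.py`: the keyword list and overall shape are assumptions, and only the optional quote-plus-bracket group mirrors the commit message.

```python
import re

# Illustrative pattern only; keyword list and overall shape are assumptions.
# The optional group (?:('|")\])? lets a closing quote and bracket sit
# between the keyword and the equals sign.
KEYWORD_ASSIGNMENT_RE = re.compile(
    r'''(?:password|secret)(?:('|")\])?\s*=\s*('|")(?P<value>[^'"]+)('|")''',
    re.IGNORECASE,
)


def extract_value(line):
    """Return the quoted value assigned to a keyword, or None if no match."""
    match = KEYWORD_ASSIGNMENT_RE.search(line)
    return match.group('value') if match else None
```

With this pattern, both `password = "bar"` and `some_dict["secret"] = "foo"` yield the quoted value; without the optional group, the `"]` after the keyword would stop the second line from matching.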
@KevinHock KevinHock force-pushed the upgrade_keyword_detector branch from 76ddcdf to a37a9c9 Compare December 29, 2018 00:05
Comment thread detect_secrets/plugins/keyword.py
@calvinli calvinli self-requested a review January 3, 2019 01:18
Member

@calvinli calvinli left a comment

LGTM based on internal testing. There are some remaining false positives but it should be okay.

Comment thread detect_secrets/plugins/keyword.py Outdated
Added Javascript specific false-positive checks
Added ${ before } heuristic for  e.g. ${link}
Added more false-positives to FALSE_POSITIVES
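
One possible reading of the `${` before `}` heuristic, sketched with a hypothetical helper name (the actual check in the plugin is not shown on this page): treat a candidate value as a template placeholder, and so a likely false positive, when `${` occurs and a `}` follows it.

```python
def looks_like_template_placeholder(candidate):
    """Return True when '${' appears and a '}' follows it later in the string."""
    start = candidate.find('${')
    # Search for the closing brace only after the opening '${'.
    return start != -1 and candidate.find('}', start + 2) != -1
```

Under this reading, `${link}` would be filtered out, while a plain string with no `${` would still be reported.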

:zap: keyword_test.py
Make STANDARD_NEGATIVES list and STANDARD_POSITIVES set for DRYness
@KevinHock KevinHock force-pushed the upgrade_keyword_detector branch from 7dd4926 to a29108b Compare January 3, 2019 23:03
@KevinHock KevinHock merged commit 164b7eb into master Jan 3, 2019
@KevinHock KevinHock deleted the upgrade_keyword_detector branch March 21, 2019 22:04
jfagoagas pushed a commit to jfagoagas/detect-secrets that referenced this pull request Mar 14, 2026
* Base on minimal vs python39 UBI

* use tested version - python38

Co-authored-by: Timothy Figgins <[email protected]>
4 participants