feat: Add 'auto_obfuscate' transformation to basic transformer #20728

Open: wants to merge 4 commits into base: main

Conversation

blesniewski
Contributor

Summary

Adds an automatic obfuscation transformation based on the new metadata field added in cloudquery/plugin-sdk#2134.

@cq-bot cq-bot added the area/cli label May 8, 2025
@blesniewski
Contributor Author

blesniewski commented May 8, 2025

Not ready yet:

  • Needs the SDK PR
  • docs need to be updated

Also, not sure if:

  • we should add a way to skip certain tables and to set that in the spec
  • modifying the default output of obfuscate_columns is OK, or if this should have its own logic to do the obfuscation

…feature/eng-1033-allow-specifying-sensitive-table-columns-in-the-sdk
@blesniewski
Contributor Author

blesniewski commented May 9, 2025

One question remaining:
If there's a JSON column, the current obfuscate transformation won't obfuscate the entire column; it requires a JSON path. Without one it fails with: failed to transform schema: column tags is not a string column

So 2 options:

  • if we want this capability, I'll probably split the logic between obfuscate and auto_obfuscate and add the handling there (or adjust the obfuscate behavior if we want to keep sharing the logic)
  • if we don't want to be able to obfuscate entire JSON columns, I'll adjust the validation in the SDK

I assume we do want it, just putting the question out for confirmation @murarustefaan

@blesniewski
Contributor Author

I've added handling for obfuscating entire JSON columns.

Drawback of the current solution: to avoid unmarshalling every value for each handled column, we calculate a hash of the entire value. The result is valid JSON, but the internal structure is not preserved:
{"redacted_by_cloudquery": "81f2a9ddc7ae49a...hash_value"}

Additionally, the output of the obfuscate transformation has changed: hashed values are now preceded by Redacted by CloudQuery |
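For string columns, the new output shape might look like the following Go sketch. The function name obfuscateString, the SHA-256 choice, and the exact spacing around the | separator are assumptions; only the "Redacted by CloudQuery |" prefix is taken from the comment.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactedPrefix follows the wording in the PR comment; the trailing
// space after "|" is an assumption.
const redactedPrefix = "Redacted by CloudQuery | "

// obfuscateString (hypothetical name) replaces a value with a prefixed
// hex digest. SHA-256 is assumed, not confirmed by the PR.
func obfuscateString(v string) string {
	sum := sha256.Sum256([]byte(v))
	return redactedPrefix + hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(obfuscateString("my-secret-token"))
}
```

The prefix makes redacted values recognizable at a glance, while the deterministic digest still allows equal inputs to be matched against each other.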

@blesniewski blesniewski marked this pull request as ready for review May 9, 2025 14:54
@blesniewski blesniewski requested a review from a team as a code owner May 9, 2025 14:54
@blesniewski blesniewski requested a review from maaarcelino May 9, 2025 14:54
2 participants