This project combines natural language processing (NLP), specifically named entity recognition (NER), with regular expressions to identify and remove personally identifiable information (PII) from a piece of text.
- After cloning the repo, save the raw text file(s) to be sanitised in the `Files` directory.
- Start `sbt` and type `run`.
- Enter the name of the file you want to process (e.g. 'inputfile.txt').
- Your sanitised file will appear in the Files folder as 'sanitised.txt' and the PII that was found will be in 'pii.txt'.
- Repeat for each file.
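
To give a feel for the regular-expression half of the approach described at the top, here is a minimal sketch in Scala. It is not taken from the project's source: the object name `RegexPiiSketch`, the `sanitise` helper, the two patterns, and the `[EMAIL]`/`[PHONE]` placeholder labels are all assumptions made for illustration, and the NER step is left out entirely.

```scala
import scala.util.matching.Regex

// Hypothetical sketch, not the project's actual code.
object RegexPiiSketch {
  // Illustrative patterns only (assumed); real email and phone formats vary far more widely.
  val patterns: Seq[(String, Regex)] = Seq(
    "EMAIL" -> """[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}""".r,
    "PHONE" -> """\+?\d[\d\s-]{7,}\d""".r
  )

  /** Applies each pattern in turn, replacing matches with a bracketed label.
    * Returns the sanitised text together with the matches that were removed.
    */
  def sanitise(text: String): (String, Seq[String]) =
    patterns.foldLeft((text, Seq.empty[String])) {
      case ((current, found), (label, regex)) =>
        val matches = regex.findAllIn(current).toList
        (regex.replaceAllIn(current, s"[$label]"), found ++ matches)
    }

  def main(args: Array[String]): Unit = {
    val (clean, pii) = sanitise("Contact jane.doe@example.com or call 020 7946 0958.")
    println(clean)        // sanitised text
    pii.foreach(println)  // one removed item per line
  }
}
```

The two return values loosely mirror the 'sanitised.txt' and 'pii.txt' outputs described above. A regex-only pass like this catches only pattern-shaped PII; names and places have no fixed pattern, which is presumably why the project pairs it with NER.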