Fast profanity detection and filtering for 13 languages.
- Multi-format Detection: Single words, phrases, and contextual profanity
- Custom Word Lists: Extend built-in lists with your own profanity words
- Whitelisting: Exclude specific words from detection
- Auto Language Detection: From text or subtitle files
- Precise Filtering: Exact position tracking and custom censoring
- Simple Integration: One-line setup with clean API
Install safetext with pip:
pip install safetext

For development setup, see our scripts documentation.
>>> from safetext import SafeText
>>> st = SafeText(language='en')
>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]
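The `start` and `end` offsets in each result can drive custom censoring. The following is a minimal sketch, not part of safetext: `mask_matches` is a hypothetical helper, and it assumes `end` is exclusive, which is consistent with the offsets in the example output above (15 + 16 characters = 31).

```python
def mask_matches(text, matches, mask_char="#"):
    """Replace each reported span with mask_char, keeping the text length unchanged."""
    chars = list(text)
    for match in matches:
        start, end = match["start"], match["end"]
        # Assumption: `end` is exclusive, so the span covers text[start:end].
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


# Result list copied from the example output above.
sample = "Some text with <profanity-word>."
matches = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]
masked = mask_matches(sample, matches)
print(masked)
```

Because the mask has the same length as the matched span, the offsets of any later matches stay valid while the text is rewritten.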
>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
'Some text with ***.'

Add your own profanity words by providing a custom words directory:
# Directory structure:
# custom_profanity_words/
# ├── en.txt # English custom words
# ├── tr.txt # Turkish custom words
# └── es.txt # Spanish custom words
>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')
>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]

Custom word files should contain one word/phrase per line:
# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
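A custom words directory in this layout can also be generated from code. The sketch below is illustrative only: `write_custom_words` is a hypothetical helper, not a safetext function, and it assumes UTF-8 files named `<language code>.txt` with one entry per line, as described above.

```python
import tempfile
from pathlib import Path


def write_custom_words(directory, words_by_language):
    """Write one <code>.txt file per language, one word or phrase per line."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for language, words in words_by_language.items():
        path = directory / f"{language}.txt"
        path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return directory


# A temporary location is used so the example leaves the working directory untouched.
base = Path(tempfile.mkdtemp())
custom_dir = write_custom_words(
    base / "custom_profanity_words",
    {"en": ["mycustomword", "inappropriate phrase", "company specific term"]},
)
en_lines = (custom_dir / "en.txt").read_text(encoding="utf-8").splitlines()
print(en_lines)
```

The resulting directory path can then be passed as `custom_words_dir` when constructing `SafeText`.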
Exclude specific words from profanity detection:
# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])
# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')
# Combining custom words with whitelist
>>> st = SafeText(
...     language='en',
...     custom_words_dir='custom_profanity_words',
...     whitelist=['allowedcustomword']
... )

The language can also be detected automatically.
- From text:
>>> from safetext import SafeText
>>> eng_text = "This story is about to take a dark turn."
>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)
>>> st.language
'en'

- From .srt (subtitle) file:
>>> from safetext import SafeText
>>> turkish_srt_file_path = "turkish.srt"
>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)
>>> st.language
'tr'

safetext currently supports profanity detection in 13 languages:
| Language | ISO 639-1 Code | Language Name |
|---|---|---|
| 🇸🇦 | ar | Arabic |
| 🇦🇿 | az | Azerbaijani |
| 🇩🇪 | de | German |
| 🇬🇧 | en | English |
| 🇪🇸 | es | Spanish |
| 🇮🇷 | fa | Persian (Farsi) |
| 🇫🇷 | fr | French |
| 🇮🇳 | hi | Hindi |
| 🇯🇵 | ja | Japanese |
| 🇵🇹 | pt | Portuguese |
| 🇷🇺 | ru | Russian |
| 🇹🇷 | tr | Turkish |
| 🇨🇳 | zh | Chinese |
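
When a language code comes from user input or configuration, it can be checked against the table above before constructing `SafeText`. This is a minimal sketch: `SUPPORTED_LANGUAGES` and `normalize_language_code` are hypothetical names defined here, not part of the safetext API, and the mapping is copied by hand from the table.

```python
# Copied from the supported-languages table; not read from safetext itself.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "az": "Azerbaijani",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fa": "Persian (Farsi)",
    "fr": "French",
    "hi": "Hindi",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
    "tr": "Turkish",
    "zh": "Chinese",
}


def normalize_language_code(code):
    """Return the lowercase ISO 639-1 code, or raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized


print(normalize_language_code(" TR "))
```

A hand-copied mapping like this can drift from the library as languages are added, so it is best treated as a convenience check only.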
Join our mission in refining content moderation!
Contribute by:
- Adding new languages: create a folder named with the ISO 639-1 code and include a words.txt.
- Enhancing word lists: improve detection accuracy.
- Sharing feedback: your ideas can shape safetext.
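
Before submitting a new or extended word list, a quick cleanup pass may help keep it tidy. The sketch below is one possible approach, not a project requirement: `normalize_word_list` is a hypothetical helper, and it only trims whitespace, drops blank lines, and removes exact duplicates. It deliberately leaves letter case untouched, since the project's casing rules are not stated here.

```python
def normalize_word_list(lines):
    """Collapse whitespace, drop blank entries and exact duplicates, then sort."""
    seen = set()
    cleaned = []
    for line in lines:
        # Collapse internal runs of whitespace and strip both ends.
        entry = " ".join(line.split())
        if entry and entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return sorted(cleaned)


raw = ["foo", "foo ", "", "bar  baz", "bar baz"]
cleaned = normalize_word_list(raw)
print(cleaned)
```

The cleaned entries can then be written back to the language's words.txt, one per line.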
See our contributing guidelines for the development workflow, the test documentation for running tests, and the scripts guide for automation tools.
Meet our awesome contributors who make safetext better every day!