High-performance profanity filter for Python with multilingual support and evasion detection.
Installation • Quick Start • Supported Languages • Advanced Evasion Detection
BadWords is a profanity filtering library for cleaning up user-generated content. Beyond simple keyword matching, it uses similarity scoring, homoglyph detection, and transliteration to catch disguised insults that exact matching would miss.
- Recommended: Python 3.13
- Minimum: Python 3.10

Install from GitHub:

    pip install git+https://github.com/FlacSy/badwords.git

Or install from PyPI:

    pip install badwords-py

Create a filter and load the languages you need:

    from badwords import ProfanityFilter

    # Initialize filter
    p = ProfanityFilter()

    # Load specific languages (e.g., English and Russian)
    p.init(languages=["en", "ru"])

    # Or load ALL 26+ supported languages
    p.init()

Then check or censor text:

    text = "Some very b4d text here"

    # 1. Simple check (returns a bool)
    is_bad = p.filter_text(text)
    print(is_bad)  # True

    # 2. Censoring text (returns a str)
    clean_text = p.filter_text(text, replace_character="*")
    print(clean_text)  # "Some very *** text here"

`filter_text` is the core method of the library.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | Required | Input text to check. |
| `match_threshold` | `float` | `0.8` | Similarity threshold (1.0 = exact match, 0.7 = aggressive). |
| `replace_character` | `str` / `None` | `None` | If provided, returns the censored string. If `None`, returns a `bool`. |

Warning: Using `match_threshold` < 1.0 enables fuzzy matching, which is slower. Use 1.0 for high-traffic real-time filtering, or 0.95 for a good balance.
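
To make the threshold trade-off concrete, here is a minimal sketch of threshold-based matching. It is not the library's implementation: the scoring uses the standard library's `difflib.SequenceMatcher` as a stand-in, and `BANNED` and `is_flagged` are hypothetical names invented for this illustration.

```python
from difflib import SequenceMatcher

# Hypothetical word list for illustration; not the library's dictionary.
BANNED = {"badword", "spam"}


def is_flagged(token: str, match_threshold: float = 0.8) -> bool:
    """Return True if `token` is close enough to any banned word."""
    token = token.lower()
    if match_threshold >= 1.0:
        # Exact matching is a single set lookup.
        return token in BANNED
    # Fuzzy matching compares the token against every banned word,
    # which is why thresholds below 1.0 cost more per token.
    return any(
        SequenceMatcher(None, token, word).ratio() >= match_threshold
        for word in BANNED
    )


print(is_flagged("badw0rd", 1.0))   # False: not an exact match
print(is_flagged("badw0rd", 0.8))   # True: similarity is about 0.86
print(is_flagged("badw0rd", 0.95))  # False: stricter threshold
```

In this sketch the exact path is one hash lookup, while the fuzzy path scales with the size of the word list, which matches the direction of the performance advice above.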
Standard filters are easy to bypass. BadWords is built to detect:
- Homoglyphs: Detects `hеllo` (using Cyrillic 'е') or `h4llo` (numbers).
- Transliteration: Automatically handles mapping between Cyrillic and Latin alphabets.
- Normalization: Strips diacritics, special characters, and decorative Unicode symbols.
- Similarity Analysis: Uses fuzzy matching to find words with deliberate typos.
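
The normalization ideas in this list can be sketched with the standard library alone. This is an illustrative approximation rather than BadWords' actual pipeline: the `HOMOGLYPHS` and `LEET` tables and the `normalize` function are assumptions made for this example, and the tables are far from exhaustive.

```python
import unicodedata

# Illustrative lookup tables (assumptions, far from exhaustive).
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
    "\u03b1": "a",  # Greek alpha
}
LEET = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"}


def normalize(text: str) -> str:
    """Map look-alike characters to Latin letters and strip decoration."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop diacritics
        ch = HOMOGLYPHS.get(ch, ch)
        ch = LEET.get(ch, ch)
        # Keep letters, digits, and whitespace; drop separators and symbols.
        if ch.isalnum() or ch.isspace():
            out.append(ch)
    return "".join(out)


print(normalize("h\u0435ll\u043e"))  # hello (Cyrillic letters mapped)
print(normalize("h3ll0"))            # hello (digits mapped)
print(normalize("b.a.d.w.o.r.d"))    # badword (separators removed)
```

A normalized string like this can then be compared against a word list, exactly or by similarity. Note that mapping every digit to a letter also rewrites genuine numbers, so a real filter would need to be more selective than this sketch.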
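
For example, each of the following calls is reported as detected:

    from badwords import ProfanityFilter

    _filter = ProfanityFilter()
    _filter.init(languages=["en", "ru"])

    _filter.filter_text("hеllо")   # Mixed alphabets (Cyrillic + Latin) -> DETECTED
    _filter.filter_text("h3ll0")   # Character substitution -> DETECTED
    _filter.filter_text("h⍺llo")   # Mathematical/Greek symbols -> DETECTED
    _filter.filter_text("привет")  # Transliterated matches -> DETECTED

BadWords currently supports 26 languages out of the box: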
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| `en` | English | `ru` | Russian | `ua` | Ukrainian |
| `de` | German | `fr` | French | `it` | Italian |
| `sp` | Spanish | `pl` | Polish | `cz` | Czech |
| `ja` | Japanese | `ko` | Korean | `th` | Thai |
| ... | & 14 more | | | | |

Use p.get_all_languages() to see the full list in your code.

Full example:

    from badwords import ProfanityFilter


    def monitor_chat():
        # Setup for a global chat
        profanity_filter = ProfanityFilter()
        profanity_filter.init(["en", "ru", "de"])

        # Custom project-specific banned words
        profanity_filter.add_words(["spam_link_v1", "scam_bot_99"])

        user_input = "Hey! Check out this b.a.d.w.o.r.d"

        # Moderate with high accuracy
        is_offensive = profanity_filter.filter_text(user_input, match_threshold=0.95)

        if is_offensive:
            print("Message blocked: Contains restricted language.")
        else:
            # Proceed with processing
            pass


    if __name__ == "__main__":
        monitor_chat()

Contributions are what make the open-source community an amazing place to learn, inspire, and create.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request

Distributed under the MIT License. See LICENSE for more information.