gosh-darnit

A fast, efficient Go library for profanity detection and censorship.

Features

  • Fast: Uses the Aho-Corasick algorithm to match all patterns in a single pass
  • Smart word boundaries: Prevents false positives like "bass", "analyst", "assist", "Scunthorpe"
  • Evasion resistant: Handles common obfuscation techniques:
    • Leetspeak: @ss, sh1t, fvck, a$$
    • Unicode homoglyphs: Cyrillic, Greek, fullwidth characters
    • Zero-width characters: U+200B, U+200C, U+200D, U+FEFF
    • Repeated characters: fuuuuck, shiiiit
    • NFKC Unicode normalization
  • Flexible censoring: Multiple modes for replacing profanity
  • Minimal dependencies: Only the Go standard library plus golang.org/x/text
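The Aho-Corasick idea behind the "Fast" bullet is to compile every pattern into one automaton (a trie plus failure links) and then walk the input once, so scanning time depends on the input length and the number of matches rather than on how many patterns are in the list. The following is a minimal, self-contained sketch of the textbook algorithm over bytes, using the classic pattern set {he, she, his, hers}. The names `acNode`, `buildAutomaton`, and `findAll` are hypothetical, and this is not gosh-darnit's actual implementation. It reports raw substring hits, overlaps included, so on its own it would still flag "ass" inside "bass"; a separate word-boundary check is needed for that.

```go
package main

import "fmt"

// acNode is one state of a hypothetical byte-level Aho-Corasick automaton.
type acNode struct {
	next map[byte]int // trie (goto) transitions
	fail int          // state for the longest proper suffix that is also a trie prefix
	out  []int        // indexes of patterns recognized when this state is reached
}

// match is one pattern occurrence at the half-open byte range [Start, End).
type match struct {
	Pattern, Start, End int
}

// buildAutomaton inserts every pattern into a trie, then computes failure
// links breadth-first so shallower states are finished before deeper ones.
func buildAutomaton(patterns []string) []acNode {
	nodes := []acNode{{next: map[byte]int{}}} // state 0 is the root
	for i, p := range patterns {
		cur := 0
		for j := 0; j < len(p); j++ {
			nxt, ok := nodes[cur].next[p[j]]
			if !ok {
				nodes = append(nodes, acNode{next: map[byte]int{}})
				nxt = len(nodes) - 1
				nodes[cur].next[p[j]] = nxt
			}
			cur = nxt
		}
		nodes[cur].out = append(nodes[cur].out, i)
	}

	var queue []int
	for _, child := range nodes[0].next {
		queue = append(queue, child) // depth-1 states fail back to the root
	}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		for c, child := range nodes[s].next {
			// Follow failure links until some state has a transition on c.
			f := nodes[s].fail
			for f != 0 {
				if _, ok := nodes[f].next[c]; ok {
					break
				}
				f = nodes[f].fail
			}
			if t, ok := nodes[f].next[c]; ok {
				nodes[child].fail = t
			}
			// Inherit matches from the fallback state ("she" also ends "he").
			nodes[child].out = append(nodes[child].out, nodes[nodes[child].fail].out...)
			queue = append(queue, child)
		}
	}
	return nodes
}

// findAll scans text once, following failure links on mismatches, and
// reports every occurrence of every pattern.
func findAll(nodes []acNode, patterns []string, text string) []match {
	var res []match
	s := 0
	for i := 0; i < len(text); i++ {
		c := text[i]
		for s != 0 {
			if _, ok := nodes[s].next[c]; ok {
				break
			}
			s = nodes[s].fail
		}
		if t, ok := nodes[s].next[c]; ok {
			s = t
		}
		for _, p := range nodes[s].out {
			res = append(res, match{Pattern: p, Start: i + 1 - len(patterns[p]), End: i + 1})
		}
	}
	return res
}

func main() {
	patterns := []string{"he", "she", "his", "hers"}
	nodes := buildAutomaton(patterns)
	for _, m := range findAll(nodes, patterns, "ushers") {
		fmt.Println(patterns[m.Pattern], m.Start, m.End)
	}
	// Output:
	// she 1 4
	// he 2 4
	// hers 2 6
}
```

The failure links are what make the single pass possible: on a mismatch the scanner falls back to the longest suffix it has already seen instead of rescanning earlier input.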

Installation

go get github.com/geoherna/gosh-darnit

Usage

Basic Detection

package main

import (
	"fmt"

	goshdarnit "github.com/geoherna/gosh-darnit"
)

func main() {
	// Check whether the text contains profanity
	if goshdarnit.IsProfane("What the fuck?") {
		fmt.Println("Profanity detected!")
	}

	// Find which words matched
	words := goshdarnit.FindProfanity("This is some shit")
	fmt.Println("Found:", words) // Found: [shit]
}

Censoring

package main

import (
	"fmt"

	goshdarnit "github.com/geoherna/gosh-darnit"
)

func main() {
	text := "What the fuck is this shit?"

	// Replace all characters with asterisks
	fmt.Println(goshdarnit.Censor(text, goshdarnit.CensorAll))
	// Output: "What the **** is this ****?"

	// Keep first character visible
	fmt.Println(goshdarnit.Censor(text, goshdarnit.CensorKeepFirst))
	// Output: "What the f*** is this s***?"

	// Keep first and last characters visible
	fmt.Println(goshdarnit.Censor(text, goshdarnit.CensorKeepFirstLast))
	// Output: "What the f**k is this s**t?"
}

Evasion Detection

The library automatically handles common evasion techniques:

// Leetspeak
goshdarnit.IsProfane("@ss")      // true (@ -> a)
goshdarnit.IsProfane("sh1t")     // true (1 -> i)
goshdarnit.IsProfane("fvck")     // true (v -> u)
goshdarnit.IsProfane("a$$")      // true ($ -> s)

// Repeated characters
goshdarnit.IsProfane("fuuuuck")  // true
goshdarnit.IsProfane("shiiiit")  // true

// Unicode homoglyphs: U+0430 is Cyrillic 'а', which looks like Latin 'a'
goshdarnit.IsProfane("\u0430ss") // true
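The evasion handling above can be understood as a normalization pass applied before matching. Below is a minimal sketch of such a pass using only the standard library. The substitution table, the `normalize` function, and the rule that only runs of three or more identical characters are collapsed are illustrative assumptions, not gosh-darnit's actual tables or rules. NFKC normalization is left out because it requires golang.org/x/text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// zeroWidth holds the invisible code points named in the feature list.
var zeroWidth = map[rune]bool{'\u200B': true, '\u200C': true, '\u200D': true, '\uFEFF': true}

// substitutions is a small, hypothetical leetspeak and homoglyph table.
// Rewriting 'v' to 'u' also changes ordinary words ("very" -> "uery"), so the
// word list must be normalized the same way for matches to stay consistent.
var substitutions = map[rune]rune{
	'@': 'a', '$': 's', '1': 'i', '0': 'o', '3': 'e', 'v': 'u',
	'\u0430': 'a', // Cyrillic small a
	'\u0435': 'e', // Cyrillic small ie
	'\u043E': 'o', // Cyrillic small o
	'\u03BF': 'o', // Greek small omicron
}

// normalize lowercases s, drops zero-width runes, applies substitutions, and
// collapses any run of three or more identical runes into a single rune.
// Runs of two are kept so that words like "ass" survive unchanged.
func normalize(s string) string {
	var mapped []rune
	for _, r := range s {
		if zeroWidth[r] {
			continue
		}
		r = unicode.ToLower(r)
		if sub, ok := substitutions[r]; ok {
			r = sub
		}
		mapped = append(mapped, r)
	}

	var b strings.Builder
	for i := 0; i < len(mapped); {
		j := i
		for j < len(mapped) && mapped[j] == mapped[i] {
			j++
		}
		n := j - i
		if n >= 3 {
			n = 1
		}
		b.WriteString(strings.Repeat(string(mapped[i]), n))
		i = j
	}
	return b.String()
}

func main() {
	inputs := []string{"@ss", "sh1t", "fvck", "a$$", "fuuuuck", "shiiiit", "\u0430ss", "sh\u200Bit"}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, normalize(in))
	}
}
```

Applying the same function to both the input and the word list keeps the two comparable. Because normalization can change lengths, a censoring step built on it would also need to map match positions back to the original text.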

False Positive Prevention

Word boundary detection prevents common false positives:

goshdarnit.IsProfane("The bass is great")     // false
goshdarnit.IsProfane("She's an analyst")      // false
goshdarnit.IsProfane("I need to assist you")  // false
goshdarnit.IsProfane("Scunthorpe is a town")  // false
goshdarnit.IsProfane("The shitake mushrooms") // false
goshdarnit.IsProfane("Assess the situation")  // false
goshdarnit.IsProfane("Classic movie")         // false
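A common way to implement this kind of boundary rule is to accept a raw substring match only when the characters immediately before and after it are not letters or digits. The sketch below applies that check with the standard library; `isWordMatch` and `containsWord` are hypothetical helpers, and gosh-darnit's actual boundary rules (for example, around apostrophes, underscores, or suffixes) may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isWordChar reports whether r could be part of a word.
func isWordChar(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r)
}

// isWordMatch reports whether text[start:end] stands alone: the rune just
// before start and the rune at end (if any) are not word characters.
func isWordMatch(text string, start, end int) bool {
	if start > 0 {
		r, _ := utf8.DecodeLastRuneInString(text[:start])
		if isWordChar(r) {
			return false
		}
	}
	if end < len(text) {
		r, _ := utf8.DecodeRuneInString(text[end:])
		if isWordChar(r) {
			return false
		}
	}
	return true
}

// containsWord checks every case-insensitive occurrence of word in text and
// reports true if any occurrence passes the boundary check.
func containsWord(text, word string) bool {
	lower := strings.ToLower(text)
	word = strings.ToLower(word)
	for i := 0; i <= len(lower)-len(word); {
		j := strings.Index(lower[i:], word)
		if j < 0 {
			return false
		}
		start := i + j
		if isWordMatch(lower, start, start+len(word)) {
			return true
		}
		i = start + 1 // keep looking past this rejected occurrence
	}
	return false
}

func main() {
	for _, s := range []string{"The bass is great", "I need to assist you", "Classic movie", "Kiss my ass!"} {
		fmt.Println(s, "=>", containsWord(s, "ass"))
	}
}
```

A strict check like this also rejects inflected forms such as plurals, so those forms would have to be listed explicitly in the word list.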

API Reference

Functions

| Function | Description |
|---|---|
| IsProfane(text string) bool | Returns true if text contains profanity |
| ContainsProfanity(text string) bool | Alias for IsProfane |
| Censor(text string, mode CensorMode) string | Replaces profanity with asterisks |
| CensorWithDefault(text string) string | Censors with CensorAll mode |
| FindProfanity(text string) []string | Returns list of matched profane words |

Censor Modes

| Mode | Example | Description |
|---|---|---|
| CensorAll | **** | Replace all characters |
| CensorKeepFirst | f*** | Keep first character visible |
| CensorKeepFirstLast | f**k | Keep first and last characters visible |
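The three modes can be expressed as a small masking function over the runes of a matched word. This is a hypothetical sketch: the `CensorMode` constants and `mask` function below are local stand-ins, not imports from gosh-darnit. Counting runes rather than bytes keeps the asterisk count right for non-ASCII matches. How the library treats very short matches is not documented here, so fully masking words too short to keep the requested characters is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// CensorMode mirrors the three documented modes (local, hypothetical type).
type CensorMode int

const (
	CensorAll CensorMode = iota
	CensorKeepFirst
	CensorKeepFirstLast
)

// mask hides the runes of word according to mode. Words too short to keep
// the requested characters visible are fully masked (an assumption).
func mask(word string, mode CensorMode) string {
	r := []rune(word)
	n := len(r)
	switch {
	case mode == CensorKeepFirst && n >= 2:
		return string(r[0]) + strings.Repeat("*", n-1)
	case mode == CensorKeepFirstLast && n >= 3:
		return string(r[0]) + strings.Repeat("*", n-2) + string(r[n-1])
	default:
		return strings.Repeat("*", n)
	}
}

func main() {
	for _, m := range []CensorMode{CensorAll, CensorKeepFirst, CensorKeepFirstLast} {
		fmt.Println(mask("fuck", m))
	}
	// Output:
	// ****
	// f***
	// f**k
}
```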

Performance

Benchmarks on Apple M4 Max:

| Benchmark | Time per op | Allocations per op |
|---|---|---|
| CleanShort | ~766 ns | 8 |
| ProfaneShort | ~839 ns | 9 |
| Leetspeak | ~847 ns | 9 |
| RepeatedChars | ~1.0 µs | 11 |
| MixedText | ~2.5 µs | 14 |

Run benchmarks yourself:

go test -bench=. -benchmem

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Special Shoutouts

Huge thanks to John Kim for the graphic design assets used in this project.

Note on content

This software contains a list of profanities, slurs, and other offensive terms solely for the purpose of detecting and filtering harmful language in user-generated content. These terms are included for harm-reduction, research, and moderation purposes only. Their presence in the source code does not constitute endorsement or promotion of such language by the authors.

License

MIT License - see LICENSE for details.