# Invisible, Contextual, and Neural Watermarking for AI-Generated Text
This repository contains a complete implementation of invisible watermarking techniques for AI-generated text: a Unicode-based baseline and a more advanced contextual neural watermarking system built with PyTorch and HuggingFace Transformers.

The goal of this project is to embed a watermark during generation so that the text remains visually unchanged but can be reliably detected later, even after copy/paste or light editing.
## Unicode Watermarking (Baseline)

- Uses zero-width Unicode characters
- Fully invisible to readers
- Survives copy/paste into plain-text editors
- A good baseline watermarking method (see the sketch below)
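As a rough illustration of the idea (not the repository's actual API; all names below are hypothetical), payload bits can be mapped to the zero-width characters U+200B and U+200C and hidden inside ordinary text:

```python
# Hypothetical sketch of zero-width watermarking; not the repo's actual API.
BIT_TO_ZW = {"0": "\u200b", "1": "\u200c"}   # zero-width space / non-joiner
ZW_TO_BIT = {v: k for k, v in BIT_TO_ZW.items()}

def embed(text: str, payload: str) -> str:
    """Hide a payload as zero-width characters after the first word."""
    bits = "".join(f"{ord(c):08b}" for c in payload)
    marks = "".join(BIT_TO_ZW[b] for b in bits)
    head, sep, tail = text.partition(" ")
    return head + marks + sep + tail

def extract(text: str) -> str:
    """Recover the payload by collecting zero-width characters."""
    bits = "".join(ZW_TO_BIT[c] for c in text if c in ZW_TO_BIT)
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits) - 7, 8))

stamped = embed("The quick brown fox", "AI")
print(stamped == "The quick brown fox")  # False, but renders identically
print(extract(stamped))                  # "AI"
```

Because the marks are ordinary Unicode code points, they survive plain copy/paste but are destroyed by any pipeline that strips or normalizes non-printing characters.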
## Contextual Neural Watermarking

Built using state-of-the-art techniques:

- EnhancedHashNet: neural, context-based hashing (sketched below)
- Green/red token lists recomputed at every decoding step
- Logit manipulation via a custom LogitsProcessor
- Dynamic watermark embedding
- Statistical detection using z-scores and p-values
- Visualization of watermark patterns

Compared to the Unicode baseline, this method provides higher security, robustness, and stealth.
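EnhancedHashNet is the component that turns a context window into a seed. Its actual architecture lives in the repository's code; the following is only a plausible sketch of the interface:

```python
import torch
import torch.nn as nn

class EnhancedHashNet(nn.Module):
    """Plausible sketch: map a window of token IDs to a deterministic seed.

    The generator and detector must share the exact same weights, or the
    reconstructed seeds (and hence the green lists) will not match.
    """
    def __init__(self, vocab_size: int, context_width: int = 5, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Sequential(
            nn.Linear(context_width * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),
        )

    @torch.no_grad()
    def forward(self, context_ids: torch.Tensor) -> int:
        # context_ids: shape (context_width,), the preceding token IDs
        h = self.embed(context_ids).flatten()
        raw = self.proj(h).item()
        return int(abs(raw) * 1e6) % (2**31 - 1)  # squash to a usable RNG seed
```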
### How Embedding Works

During text generation:

1. Context analysis: the model takes the previous tokens (a context window)
2. Neural hashing: the hash network generates a unique seed from this context
3. Vocabulary permutation: the vocabulary is permuted using the seed
4. Token biasing: "green" tokens are boosted, "red" tokens are penalized
5. Natural selection: the model becomes more likely to choose green tokens → an invisible statistical pattern

The watermark is embedded seamlessly, without degrading text quality or fluency. A sketch of this loop as a HuggingFace LogitsProcessor follows.
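This is a minimal sketch, assuming the EnhancedHashNet interface above; the class and parameter names are illustrative, not the repo's exact ones:

```python
import torch
from transformers import LogitsProcessor

class WatermarkLogitsProcessor(LogitsProcessor):
    """Illustrative sketch of embedding steps 1-5 above."""

    def __init__(self, hash_net, gamma: float = 0.25, delta: float = 2.0,
                 context_width: int = 5):
        self.hash_net = hash_net
        self.gamma = gamma              # fraction of vocabulary marked green
        self.delta = delta              # logit boost for green tokens
        self.context_width = context_width

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        vocab_size = scores.shape[-1]
        for b in range(input_ids.shape[0]):
            context = input_ids[b, -self.context_width:]      # 1. context window
            seed = self.hash_net(context)                     # 2. neural hash
            rng = torch.Generator().manual_seed(seed)
            perm = torch.randperm(vocab_size, generator=rng)  # 3. permutation
            green = perm[: int(self.gamma * vocab_size)]
            # 4./5. Boosting green tokens is equivalent, after the softmax,
            # to penalizing red ones; sampling then favors the green list.
            scores[b, green] += self.delta
        return scores
```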
### How Detection Works

Given a text to verify:

1. Seed reconstruction: the detector recomputes the seed at each position
2. Green-list rebuilding: green lists are rebuilt exactly as during generation
3. Pattern matching: the detector counts how often the text lands on green tokens
4. Statistical analysis: it computes
   - z-score: deviation of the green-token count from random selection
   - p-value: statistical significance of the watermark's presence
   - confidence: overall detection confidence score

A minimal detection sketch appears below.
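The sketch assumes the same hash_net weights used at generation time; the function name and signature are illustrative. It rebuilds each green list and applies a one-proportion z-test:

```python
import math
import torch
from scipy.stats import norm

def detect_watermark(token_ids, hash_net, vocab_size,
                     gamma=0.25, context_width=5):
    """Return (z_score, p_value, green_ratio) for a list of token IDs."""
    green_hits, total = 0, 0
    for t in range(context_width, len(token_ids)):
        context = torch.tensor(token_ids[t - context_width:t])
        seed = hash_net(context)                      # 1. seed reconstruction
        rng = torch.Generator().manual_seed(seed)
        perm = torch.randperm(vocab_size, generator=rng)
        green = set(perm[: int(gamma * vocab_size)].tolist())  # 2. green list
        green_hits += token_ids[t] in green           # 3. pattern matching
        total += 1
    # 4. One-proportion z-test against the no-watermark rate gamma
    z = (green_hits - gamma * total) / math.sqrt(total * gamma * (1 - gamma))
    p = norm.sf(z)                                    # one-sided p-value
    return z, p, green_hits / total
```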
## Requirements

- Python 3.8+
- PyTorch 2.0+
- HuggingFace Transformers
- NumPy, SciPy
- Matplotlib (for visualization)
## Installation

```bash
pip install torch transformers numpy scipy matplotlib
```

Or use requirements.txt:

```bash
pip install -r requirements.txt
```

## Configuration

| Parameter | Description | Default |
|---|---|---|
| `context_width` | Number of previous tokens used for hashing | 5 |
| `gamma` | Proportion of vocabulary marked as "green" | 0.25 |
| `delta` | Logit bias added to green tokens | 2.0 |
| `detection_threshold` | Z-score threshold for detection | 4.0 |
Tuning trade-offs:

- Higher `gamma`: more tokens marked green → stronger watermark, but potentially less natural text
- Higher `delta`: stronger bias → easier to detect, but may affect quality
- Larger `context_width`: more secure, but slower detection

An end-to-end usage example follows.
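Putting the pieces together with the defaults from the table. GPT-2 and the classes sketched earlier stand in for whatever model and classes the repository actually uses:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessorList)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Same hash_net instance (and weights) must be reused at detection time.
hash_net = EnhancedHashNet(vocab_size=tokenizer.vocab_size, context_width=5)
processor = WatermarkLogitsProcessor(hash_net, gamma=0.25, delta=2.0,
                                     context_width=5)

inputs = tokenizer("The history of cryptography", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100, do_sample=True,
                     logits_processor=LogitsProcessorList([processor]))
print(tokenizer.decode(out[0], skip_special_tokens=True))
```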
## Detection Metrics

The detector provides several metrics:

- Z-score: measures how unusual the observed green-token frequency is (formula below)
  - z > 4.0: strong watermark detected
  - 2.0 < z < 4.0: weak signal
  - z < 2.0: no watermark
- P-value: probability of observing this pattern by chance
  - p < 0.0001: very high confidence
  - p < 0.05: significant detection
- Green token ratio: percentage of tokens that fall in the green list
  - Expected ratio without a watermark: `gamma` (e.g., 0.25)
  - With a watermark: typically > 0.5
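For reference, the z-score here is presumably the standard one-proportion test used by Kirchenbauer et al. (2023), with $T$ scored tokens, $n_{\text{green}}$ green hits, and green-list fraction $\gamma$:

$$z = \frac{n_{\text{green}} - \gamma T}{\sqrt{T\,\gamma\,(1-\gamma)}}$$

For example, with $T = 200$, $\gamma = 0.25$, and 120 green hits, $z = (120 - 50)/\sqrt{37.5} \approx 11.4$, far above the 4.0 detection threshold.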
## Robustness

The watermark survives:

✅ Copy/paste operations
✅ Light paraphrasing
✅ Minor edits
✅ Format changes

It does not survive:

❌ Heavy rewriting or summarization
❌ Translation to another language
❌ Adversarial attacks specifically designed to remove watermarks
## Visualization

Generate watermark pattern visualizations to analyze detection results and see how green tokens are distributed throughout the text.
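A minimal plotting sketch, assuming the detector can return one boolean per token indicating green-list membership (this interface is an assumption, not the repo's documented API):

```python
import matplotlib.pyplot as plt

def plot_green_pattern(green_flags):
    """Plot per-position green/red membership (green_flags: list of bool)."""
    colors = ["green" if g else "red" for g in green_flags]
    plt.bar(range(len(green_flags)), [1] * len(green_flags),
            color=colors, width=1.0)
    plt.xlabel("Token position")
    plt.yticks([])
    plt.title("Green/red token pattern across the text")
    plt.show()
```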
## References

This implementation is based on research in:

- "A Watermark for Large Language Models" (Kirchenbauer et al., 2023)
- "On the Reliability of Watermarks for Large Language Models" (Kirchenbauer et al., 2023)
- Zero-width character steganography techniques
## License

This project is licensed under the MIT License; see the LICENSE file for details.