A utility that cleans up text by removing common "slop" patterns (unnecessary or non-standard characters and phrases) or translating them into cleaner equivalents. It is particularly useful for cleaning up AI-generated text from various LLMs.
- Handles em-dashes (—) and en-dashes (–) differently depending on the surrounding context
- Replaces non-standard spaces with regular spaces
- Condenses multiple spaces while preserving indentation
- Standardizes bullet points and list formatting
- Preserves technical and scientific characters (±, §, µ, °, etc.)
- Detects and removes overused emoji and emoji clusters
- Selectively removes only the most egregious phrases like "Certainly! " at the start of text
- Maintains most natural phrasing while removing unnecessary verbosity
- Configurable to add custom phrase removal patterns
- Can optionally remove transitional phrases like "That being said," and "Here's why:"
- Standardizes date formats for consistency
- Preserves human-readable month names and abbreviations
- Ensures proper spacing in time formats with AM/PM and time zones
- Standardizes case for time zone abbreviations (UTC, GMT, EST, etc.)
- Ensures consistent capitalization for months, seasons, and astronomical terms
- Maintains appropriate capitalization based on context
- Reduces multiple exclamation marks to single marks
- Standardizes excessive question marks
- Fixes unbalanced quotes and parentheses
- Converts smart (curly) quotes to normal (straight) quotes
- Normalizes ellipses that have too many dots
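The list above describes what deslopify does rather than how. As a rough illustration only, here is a minimal TypeScript sketch of how a few of these behaviors could be expressed as an ordered list of regex replacements. The rule set, the sketchClean name, and the specific heuristics (for example, turning an em-dash squeezed between two words into " - ") are assumptions for illustration, not the package's actual rules or API.

```typescript
// Hypothetical sketch of regex-based cleanup in the spirit of the feature
// list. These rules are illustrative assumptions, not deslopify's own rules.

type CleanupRule = { pattern: RegExp; replacement: string };

const sketchRules: CleanupRule[] = [
  // Drop a leading "Certainly! " opener (start-of-text phrase removal).
  { pattern: /^Certainly!\s+/, replacement: "" },
  // Non-breaking, thin, and other Unicode space characters -> plain space.
  { pattern: /[\u00A0\u2000-\u200A\u202F\u205F\u3000]/g, replacement: " " },
  // Em-dash squeezed between two word characters -> spaced hyphen.
  { pattern: /(\w)\u2014(?=\w)/g, replacement: "$1 - " },
  // En-dash between digits (a numeric range) -> plain hyphen.
  { pattern: /(\d)\u2013(?=\d)/g, replacement: "$1-" },
  // Curly double and single quotes -> straight quotes.
  { pattern: /[\u201C\u201D]/g, replacement: '"' },
  { pattern: /[\u2018\u2019]/g, replacement: "'" },
  // Runs of exclamation marks -> a single mark.
  { pattern: /!{2,}/g, replacement: "!" },
  // Four or more dots -> a three-dot ellipsis.
  { pattern: /\.{4,}/g, replacement: "..." },
  // Missing space between a time and AM/PM ("3pm" -> "3 pm").
  { pattern: /(\d)(am|pm)\b/gi, replacement: "$1 $2" },
  // Collapse space runs that follow a non-space character, so leading
  // indentation at the start of the text is left alone.
  { pattern: /(\S) {2,}/g, replacement: "$1 " },
];

// Apply each rule in order; later rules see the output of earlier ones.
function sketchClean(text: string): string {
  return sketchRules.reduce(
    (current, rule) => current.replace(rule.pattern, rule.replacement),
    text,
  );
}
```

With these rules, the input from the simple usage example below yields 'This text - has some slop in it!', matching the documented output, although deslopify may reach that result by different means. Rule order matters here: the space-collapsing rule runs last so that it also tidies runs of spaces created by the Unicode-space rule.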
# Install from npm
npm install -g deslopify
# Or clone and build locally
git clone https://github.com/davedean/deslopify.git
cd deslopify
npm install
npm run build
npm link # Makes the CLI available globally

# Basic usage with stdin/stdout
deslopify < input.txt > output.txt
# Using file arguments
deslopify --input input.txt --output output.txt
# Pipe from another command
cat input.txt | deslopify > output.txt
# Skip specific processing
deslopify --skip-chars < input.txt > output.txt
deslopify --skip-phrases < input.txt > output.txt
deslopify --skip-datetime < input.txt > output.txt
deslopify --skip-abbreviations < input.txt > output.txt
deslopify --skip-punctuation < input.txt > output.txt
deslopify --skip-emoji < input.txt > output.txt
# Disable fixing unbalanced delimiters
deslopify --no-fix-unbalanced < input.txt > output.txt
# Emoji handling options
deslopify --remove-all-emoji < input.txt > output.txt
deslopify --remove-overused-emoji < input.txt > output.txt

import deslopify, { Deslopifier } from 'deslopify';
// Simple usage
const cleanText = deslopify('Certainly! This text—has some slop in it!!!!');
console.log(cleanText); // 'This text - has some slop in it!'
// Advanced usage with options
const processor = new Deslopifier({
  skipCharacterReplacement: false,
  skipPhraseRemoval: false,
  skipDateTimeFormatting: false,
  skipAbbreviationHandling: false,
  skipPunctuationNormalization: false,
  skipEmojiHandling: false,
  fixUnbalancedDelimiters: true,
  // Configure emoji handling
  emojiOptions: {
    removeAll: false,
    removeOverused: true
  },
  // Add custom mappings if needed
  customCharacterMappings: [
    { pattern: /\*/g, replacement: '•' }
  ],
  customPhrasePatterns: [
    { pattern: /In conclusion,/g, position: 'anywhere' }
  ],
  customDateTimeMappings: [
    // Word boundaries keep longer digit runs (e.g. phone numbers) untouched
    { pattern: /\b(\d{4})(\d{2})(\d{2})\b/g, replacement: '$1-$2-$3' }
  ],
  customAbbreviationMappings: [
    { pattern: /\bpst\b/g, replacement: 'PST', preserveCase: false }
  ],
  customPunctuationMappings: [
    { pattern: /\.{5,}/g, replacement: '...' }
  ]
});
// Add more custom patterns if needed
processor.addCharacterMapping(/\?\?\?/g, '?');
processor.addPhrasePattern(/^To be honest,/i, 'start');
processor.addDateTimeMapping(/\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b/g, '$3-$2-$1');
processor.addAbbreviationMapping(/\bcentral european time\b/gi, 'CET', false);
processor.addPunctuationMapping(/!\?!\?/g, '!?');
processor.addOverusedEmojiPattern(/🦄/gu); // Add unicorn emoji to overused list
processor.setEmojiOptions({ removeAll: true }); // Remove all emoji
const result = processor.process('To be honest, this text has ??? many stars *** in it!!!');
console.log(result); // 'this text has ? many stars ••• in it!'

The Deslopifier accepts the following options:
- customCharacterMappings: Custom character mappings to use instead of defaults
- customPhrasePatterns: Custom phrase patterns to use instead of defaults
- customDateTimeMappings: Custom date/time format mappings to use
- customAbbreviationMappings: Custom abbreviation mappings to use
- customPunctuationMappings: Custom punctuation mappings to use
- skipCharacterReplacement: Skip the character replacement step
- skipPhraseRemoval: Skip the phrase removal step
- skipDateTimeFormatting: Skip the date/time formatting step
- skipAbbreviationHandling: Skip the abbreviation handling step
- skipPunctuationNormalization: Skip the punctuation normalization step
- skipEmojiHandling: Skip the emoji handling step
- fixUnbalancedDelimiters: Whether to fix unbalanced quotes and parentheses
- emojiOptions: Configuration for emoji handling
  - removeAll: Remove all emoji characters
  - removeOverused: Remove only overused emoji and emoji clusters
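The README does not state outright how the CLI flags relate to these options, but the names line up closely. The following TypeScript sketch assumes that correspondence (for example, that --skip-chars sets skipCharacterReplacement and --no-fix-unbalanced sets fixUnbalancedDelimiters to false). The type names, the flag table, and flagsToOptions are hypothetical and are not deslopify's published typings or CLI source; the custom mapping arrays are left out for brevity.

```typescript
// Hypothetical sketch: option names come from the options list, but the type
// names and the flag-to-option table are assumptions inferred from matching
// names, not deslopify's published typings or CLI source.

type SkipKey =
  | "skipCharacterReplacement"
  | "skipPhraseRemoval"
  | "skipDateTimeFormatting"
  | "skipAbbreviationHandling"
  | "skipPunctuationNormalization"
  | "skipEmojiHandling";

// The custom*Mappings / custom*Patterns arrays are omitted for brevity.
type DeslopifierOptionsSketch = Partial<Record<SkipKey, boolean>> & {
  fixUnbalancedDelimiters?: boolean;
  emojiOptions?: { removeAll?: boolean; removeOverused?: boolean };
};

// Which skip option each --skip-* flag presumably sets.
const skipFlags = new Map<string, SkipKey>([
  ["--skip-chars", "skipCharacterReplacement"],
  ["--skip-phrases", "skipPhraseRemoval"],
  ["--skip-datetime", "skipDateTimeFormatting"],
  ["--skip-abbreviations", "skipAbbreviationHandling"],
  ["--skip-punctuation", "skipPunctuationNormalization"],
  ["--skip-emoji", "skipEmojiHandling"],
]);

// Translate CLI-style flags into an options object. Unrecognized arguments
// (including --input/--output and their file names) are ignored.
function flagsToOptions(argv: string[]): DeslopifierOptionsSketch {
  const options: DeslopifierOptionsSketch = {};
  for (const arg of argv) {
    const skipKey = skipFlags.get(arg);
    if (skipKey !== undefined) {
      options[skipKey] = true;
    } else if (arg === "--no-fix-unbalanced") {
      options.fixUnbalancedDelimiters = false;
    } else if (arg === "--remove-all-emoji") {
      options.emojiOptions = { ...options.emojiOptions, removeAll: true };
    } else if (arg === "--remove-overused-emoji") {
      options.emojiOptions = { ...options.emojiOptions, removeOverused: true };
    }
  }
  return options;
}
```

Only flags that were actually passed appear in the result, so defaults (such as whether fixUnbalancedDelimiters starts out true) are left to the library rather than guessed here.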
Contributions are welcome! Please see CONTRIBUTING.md for details.
The LLM_HINTS.md file contains additional information about the code structure and methods, plus tips for running and testing the project. It is particularly useful for helping LLMs (large language models) understand the codebase.
When publishing a new version:
- Update the version in package.json
- Run the linter and tests, then build the project:
npm run lint && npm test && npm run build
- Publish to npm:
npm publish
The prepublishOnly script also runs the linter, tests, and build automatically before publishing.
MIT