| name | yargen |
|---|---|
| description | Generate YARA rules from malware samples using yarGen-Go. Manage goodware databases, use CLI or API for rule generation, and integrate with yarGen web server. Use when generating YARA rules, managing goodware databases, creating custom string/opcode databases, or interacting with yarGen web API. |
Automatic YARA rule generator that extracts strings from malware samples while filtering out goodware strings.
yarGen database initialization takes from about 30 seconds to 10 minutes, depending on hardware:
- High-end systems: ~30-60 seconds
- Average systems: 2-5 minutes
- Lower-end systems: 5-10 minutes
During this time, you'll see messages like:
[+] Loaded dbs/good-strings-part1.db (1416757 entries)
Do not interrupt this process - the databases are being loaded into memory.
| Scenario | Method | Recommendation |
|---|---|---|
| Single sample | CLI with -f flag | Use -f for quick one-offs |
| Multiple samples | Start server once | More efficient - databases loaded once |
💡 Recommendation: If analyzing more than one sample, start the yarGen server (`./yargen serve`) and keep it running. The database initialization happens only once, making subsequent samples much faster to process.
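Because initialization can take minutes, a script that starts the server in the background needs some way to tell when it is ready. The following is a minimal sketch, not part of yarGen: it assumes the server's output has been redirected to a log file (the name `yargen-server.log` is hypothetical) and polls that file for the web server startup message. `wait_for_log_line` is an illustrative helper name.

```shell
# Wait until a log file contains a fixed string, or give up after a timeout.
# Usage: wait_for_log_line <logfile> <text> [timeout_seconds]
# Returns 0 once the text is found, 1 if the timeout expires first.
wait_for_log_line() {
  local logfile="$1"
  local text="$2"
  local timeout="${3:-600}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -F: match a fixed string, -q: no output.
    # A log file that does not exist yet simply counts as "not ready".
    if grep -Fq -- "$text" "$logfile" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example (assumed log file name, commented out so this block only defines the helper):
#   ./yargen serve > yargen-server.log 2>&1 &
#   wait_for_log_line yargen-server.log "Starting web server at" 600 \
#     && ./yargen-util submit malware.exe
```

Polling a log file keeps the sketch independent of any particular API endpoint; if the server is known to be reachable, a health check against it would serve the same purpose.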
# 1. Ensure yarGen is available
export YARGEN_DIR="$HOME/clawd/projects/yarGen-Go/repo"
# 2. Download databases (first time)
$SKILL_DIR/scripts/yargen-db.sh update
# 3. Generate rules from a single file
$SKILL_DIR/scripts/yargen-generate.sh -f ./malware.exe -a "Your Name" --opcodes
# 4. Or generate from a directory
$SKILL_DIR/scripts/yargen-generate.sh -m ./malware-samples -a "Your Name" --opcodes
yarGen-Go must be cloned and built:
git clone https://github.com/Neo23x0/yarGen-Go.git ~/clawd/projects/yarGen-Go
cd ~/clawd/projects/yarGen-Go
go build -o yargen ./cmd/yargen
go build -o yargen-util ./cmd/yargen-util
./yargen-util update
Analyze a single sample without starting the server:
# Using the wrapper script
./yargen-generate.sh -f malware.exe -a "Author Name"
# Or directly with yarGen
./yargen -f malware.exe -a "Author Name" -o rule.yar
# With opcodes (recommended for PE files)
./yargen -f malware.exe -a "Author Name" --opcodes
💡 Note: When using `-f`, yarGen creates a temporary directory internally and cleans it up after processing. This is equivalent to:
mkdir -p /tmp/yarGen-work && cp sample.exe /tmp/yarGen-work/
./yargen -m /tmp/yarGen-work -a "Author" -o rule.yar
For multiple samples, start the server once and submit samples via API:
# Start server (if not running) - takes 2-10 min to initialize
cd $YARGEN_DIR && ./yargen serve &
# Wait for: "[+] Starting web server at http://127.0.0.1:8080"
# Submit sample - simplest usage
./yargen-util submit malware.exe
# With options (flags must come BEFORE the sample file)
./yargen-util submit -a "Florian Roth" -show-scores -v malware.exe
# Save to file
./yargen-util submit -o rules.yar -wait 300 malware.exe
Important: Flags must come before the sample file (Go flag parsing limitation).
Options:
| Flag | Description | Default |
|---|---|---|
| `-a <author>` | Author name in rule meta | yarGen |
| `-r <reference>` | Reference string (URL, report) | none |
| `-show-scores` | Include string scores as comments | false |
| `-no-opcodes` | Skip opcode analysis (faster) | false |
| `-o <file>` | Save rules to file | stdout |
| `-wait <sec>` | Max wait time for large files | 600 (10min) |
| `-v` | Verbose progress output | false |
| `-server <url>` | yarGen server URL | http://127.0.0.1:8080 |
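The flags-before-file rule matters most when submissions are scripted. The following is a minimal sketch of a loop that submits every file in a directory to an already running server and writes one rule file per sample. `submit_all` and the `YARGEN_UTIL` override are hypothetical names introduced for illustration; only the `submit` subcommand and the `-o` flag come from the options listed here, and any extra flags are passed through unchanged.

```shell
# Submit every regular file in a directory to a running yarGen server.
# Usage: submit_all <sample-dir> <output-dir> [submit flags...]
# YARGEN_UTIL (hypothetical) overrides the client path; default is ./yargen-util.
submit_all() {
  local dir="$1"
  local outdir="$2"
  shift 2
  local sample
  local name
  for sample in "$dir"/*; do
    # Skip subdirectories and the unexpanded glob of an empty directory.
    [ -f "$sample" ] || continue
    name=$(basename "$sample")
    # All flags come first; the sample path is always the last argument.
    "${YARGEN_UTIL:-./yargen-util}" submit "$@" -o "$outdir/$name.yar" "$sample" || return 1
  done
}

# Example (commented out so this block only defines the helper):
#   mkdir -p rules
#   submit_all ./malware-samples ./rules -a "Author Name"
```

The loop stops at the first failed submission so that a server that has gone away is noticed early rather than after every sample has timed out.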
Use the generate script for batch processing:
$SKILL_DIR/scripts/yargen-generate.sh -m <malware-dir> [options]
Options:
-m <dir> Malware directory (required for batch mode)
-f <file> Single file mode (alternative to -m)
-o <file> Output file (default: yargen_rules.yar)
-a <author> Author name
-r <reference> Reference string
--opcodes Include opcode analysis
--score Show scores as comments
Or use yarGen directly:
cd $YARGEN_DIR
./yargen -m ./malware --opcodes -a "Author"
Use the database script:
$SKILL_DIR/scripts/yargen-db.sh <command>
Commands:
list List all databases
update Download pre-built databases
create Create from goodware directory
append Append to existing database
merge Merge multiple databases
inspect Show database stats
See database-guide.md for detailed best practices.
Start the server:
cd $YARGEN_DIR
./yargen serve --port 8080
Use the API client script:
# Check server
$SKILL_DIR/scripts/yargen-api.sh health
# Upload and generate (one-shot)
$SKILL_DIR/scripts/yargen-api.sh full ./malware.exe -a "Author"
# Or step by step:
$SKILL_DIR/scripts/yargen-api.sh upload malware.exe
# → Copy job_id from output
$SKILL_DIR/scripts/yargen-api.sh generate <job-id> -a "Author"
$SKILL_DIR/scripts/yargen-api.sh rules <job-id>
See api-reference.md for complete API documentation.
- Clone and build yarGen-Go
- Run `yargen-db.sh update` to download databases
- Optionally create custom database: `yargen-db.sh create -g /opt/goodware -i local`
- Run `./yargen -f ./malware.exe --opcodes -a "Author"`
- Review and post-process generated rule
💡 Note: This will show a recommendation message suggesting the server mode for multiple samples.
- Start server: `./yargen serve` (wait 2-10 min for initialization)
- Submit samples: `yargen-util submit -a "Author" sample1.exe`
- Continue submitting more samples - no re-initialization needed
- Stop server when done: `pkill -f "yargen serve"`
Why this is better: The databases are loaded once and stay in memory. Each subsequent sample processes in seconds instead of minutes.
The yarGen server keeps all goodware databases in memory (~1-2GB RAM depending on configuration).
After all work is complete, stop the service to free memory:
pkill -f "yargen serve"
- `yargen-db.sh list` - Check database sizes
- `yargen-db.sh inspect <db>` - Review contents
- `yargen-db.sh update` - Get latest pre-built DBs
- `yargen-db.sh append -g <dir> -i local` - Add to custom DB
- Multiple `good-strings-part*.db` files
- Your `good-strings-local.db`
- yarGen merges them at runtime
yargen-util merge -o combined.db dbs/good-strings-*.db
See database-guide.md for trade-offs.
Create config/config.yaml for LLM integration:
llm:
  provider: "openai"
  model: "gpt-4o-mini"
  api_key: "${OPENAI_API_KEY}"
database:
  dbs_dir: "./dbs"
- Use `--opcodes` for executable files (adds opcode analysis)
- Use `--score` to see string scoring in rule comments
- Custom databases help reduce false positives for your environment
- The web API is useful for automation and integrations
- For single files, use the `-f` flag instead of creating temp directories manually
- Start the server once and keep it running when analyzing multiple samples
- Remember to kill the server after all work is done to free up RAM
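The last tip is easy to forget in a script that exits early on an error. One way to handle it, sketched here under the assumption that the script itself started the server and recorded its PID, is to stop the process from an exit trap. `stop_server` is a hypothetical helper name and is not part of yarGen.

```shell
# Stop a background process by PID if it is still running.
# Usage: stop_server <pid>
stop_server() {
  local pid="$1"
  # kill -0 sends no signal; it only checks that the process exists.
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    # Reap the process if it is a child of this shell; ignore its exit status.
    wait "$pid" 2>/dev/null || true
  fi
  return 0
}

# Example (commented out so this block only defines the helper):
#   ./yargen serve > yargen-server.log 2>&1 &
#   server_pid=$!
#   trap 'stop_server "$server_pid"' EXIT
#   ... submit samples ...
```

Stopping by PID only affects the server this script started, whereas `pkill -f "yargen serve"` stops every matching process on the machine; which one is appropriate depends on whether the server is shared.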