A standalone Babashka tool for extracting, analyzing, and validating keywords in Markdown documentation. Supports keyword indexing, relationship graphing, validation against taxonomy, and cross-project analysis.
This repo has been deprecated in favor of the .doc utilities, which split the functionality into multiple scripts that each "do one thing, and do it well."
- Extract keywords from individual Markdown files
- Build keyword indexes showing which documents contain which keywords
- Generate relationship graphs visualizing keyword connections
- Validate keywords against a project-specific taxonomy
- Suggest keywords based on document content
- Usage statistics across single or multiple projects
- Cross-project analysis for comparing keyword usage across codebases
- Project-aware with optional configuration per project
- No dependencies beyond Babashka - single executable script

Prerequisites:
- Babashka installed on your system
Option 1: clone and add to PATH

git clone <repository-url> ~/bin/doc-keywords
# Add ~/bin/doc-keywords to your PATH
echo 'export PATH="$HOME/bin/doc-keywords:$PATH"' >> ~/.bashrc # or ~/.zshrc
source ~/.bashrc # or source ~/.zshrc

Option 2: clone elsewhere and symlink into ~/bin

git clone <repository-url> ~/tools/doc-keywords
ln -s ~/tools/doc-keywords/doc-keywords ~/bin/doc-keywords
# Ensure ~/bin is in your PATH

Option 3: download the script directly

curl -o ~/bin/doc-keywords <raw-url-to-script>
chmod +x ~/bin/doc-keywords

# Get help
doc-keywords help
# Extract keywords from a file
doc-keywords extract-keywords doc/README.md
# Get statistics for current project
cd ~/my-project
doc-keywords keyword-stats
# Validate keywords against taxonomy
doc-keywords validate-keywords
# Build keyword index as JSON
doc-keywords keyword-index > keywords.json
# Generate DOT graph
doc-keywords keyword-graph > keywords.dot
dot -Tpng keywords.dot -o keywords.png

Keywords follow Clojure-style conventions and are embedded in Markdown using bracket notation:
Single keyword: [:architecture]
Multiple keywords: [:security :authentication :encryption]

Extract all keywords from a specific Markdown file.
doc-keywords extract-keywords path/to/file.md
Output: List of keywords, one per line
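To make the bracket notation concrete, here is a minimal Python sketch of how such keywords could be pulled out of Markdown text. It is not the tool's actual Babashka implementation: the regular expression, the `extract_keywords` name, and the choice to print keywords without the leading colon are all assumptions made for illustration.

```python
import re

# Assumed pattern: a bracket group holding one or more space-separated
# Clojure-style keywords, e.g. [:architecture] or [:security :authentication].
BRACKET_SET = re.compile(r"\[((?::[A-Za-z][\w-]*)(?:\s+:[A-Za-z][\w-]*)*)\]")


def extract_keywords(markdown_text):
    """Return keywords in order of first appearance, without the leading colon."""
    seen = []
    for group in BRACKET_SET.findall(markdown_text):
        for token in group.split():
            name = token[1:]  # drop the leading ":"
            if name not in seen:
                seen.append(name)
    return seen


sample = "Covers [:architecture] and [:security :authentication :encryption]."
for keyword in extract_keywords(sample):
    print(keyword)
```

Because the pattern requires a colon right after the opening bracket, ordinary Markdown links such as `[text](url)` are not picked up as keywords.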
Build a JSON index showing which documents contain which keywords.
# Current project
doc-keywords keyword-index
# Specific project
doc-keywords keyword-index ~/projects/my-app
# Multiple projects (cross-project index)
doc-keywords keyword-index ~/proj1 ~/proj2 ~/proj3

Output: JSON object mapping keywords to arrays of file paths
Example:
{
  "security": ["components/auth.md", "architecture/overview.md"],
  "architecture": ["architecture/overview.md", "doc/design.md"]
}

Generate a DOT format graph showing keyword relationships. Keywords that appear together in the same bracket set are considered related.
# Current project
doc-keywords keyword-graph > keywords.dot
# Multiple projects
doc-keywords keyword-graph ~/proj1 ~/proj2 > keywords.dot
# Generate PNG
doc-keywords keyword-graph > keywords.dot
dot -Tpng keywords.dot -o keywords.png

Output: GraphViz DOT format
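The "same bracket set" rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: the regular expression, the function names, and the choice of an undirected `graph` with one edge per keyword pair are hypothetical, and the real output may be laid out differently.

```python
import re
from itertools import combinations

# Assumed pattern for a bracket set such as [:security :authentication].
BRACKET_SET = re.compile(r"\[((?::[A-Za-z][\w-]*)(?:\s+:[A-Za-z][\w-]*)*)\]")


def cooccurrence_edges(markdown_text):
    """Return sorted, de-duplicated keyword pairs that share a bracket set."""
    edges = set()
    for group in BRACKET_SET.findall(markdown_text):
        names = sorted({token[1:] for token in group.split()})
        edges.update(combinations(names, 2))
    return sorted(edges)


def to_dot(edges):
    """Render the pairs as an undirected GraphViz graph."""
    lines = ["graph keywords {"]
    lines += [f'  "{a}" -- "{b}";' for a, b in edges]
    lines.append("}")
    return "\n".join(lines)


text = "Auth notes [:security :authentication :encryption] and design [:architecture]."
print(to_dot(cooccurrence_edges(text)))
```

A bracket set with a single keyword, like `[:architecture]` here, contributes no edges because there is nothing for it to co-occur with.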
Validate all keywords used in documentation against the project's taxonomy file.
# Current project
doc-keywords validate-keywords
# Specific project
doc-keywords validate-keywords ~/projects/my-app
# Multiple projects (validates each separately)
doc-keywords validate-keywords ~/proj1 ~/proj2

Output: Success message or list of invalid keywords by file
Exit codes:
- 0: All keywords valid
- 1: Invalid keywords found
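The validation result and exit codes above can be modeled roughly as follows. This Python sketch assumes keywords have already been extracted per file and that the taxonomy is available as a set; the names `find_invalid` and `exit_code` and the sample data are hypothetical, and the tool's real report format may differ.

```python
def find_invalid(keywords_by_file, valid_keywords):
    """Map each file to its keywords that are missing from the taxonomy."""
    invalid = {}
    for path, keywords in keywords_by_file.items():
        unknown = sorted(set(keywords) - set(valid_keywords))
        if unknown:
            invalid[path] = unknown
    return invalid


def exit_code(invalid):
    """0 when every keyword is valid, 1 otherwise (the codes listed above)."""
    return 1 if invalid else 0


# Hypothetical input: extracted keywords per file and a small taxonomy.
docs = {
    "doc/design.md": ["architecture", "api"],
    "doc/auth.md": ["security", "oauth"],
}
taxonomy = {"architecture", "api", "security"}

invalid = find_invalid(docs, taxonomy)
for path, unknown in invalid.items():
    print(f"{path}: {', '.join(unknown)}")
print("exit code:", exit_code(invalid))
```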
Analyze a file's content and suggest relevant keywords based on common patterns.
doc-keywords suggest-keywords path/to/file.md

Output: Suggested keywords based on content analysis
Show usage statistics for all keywords across projects.
# Current project
doc-keywords keyword-stats
# Specific project
doc-keywords keyword-stats ~/projects/my-app
# Multiple projects (combined statistics)
doc-keywords keyword-stats ~/proj1 ~/proj2 ~/proj3

Output: Keyword usage counts, sorted by frequency
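The index and the statistics can be seen as two views of the same data: a mapping from keyword to the files that use it, and the size of each entry. The Python sketch below illustrates that idea only. It assumes usage is counted as the number of files containing a keyword, which may not match how the tool counts, and `build_index` and `usage_counts` are hypothetical names.

```python
from collections import defaultdict


def build_index(keywords_by_file):
    """Invert file -> keywords into keyword -> sorted list of files."""
    index = defaultdict(set)
    for path, keywords in keywords_by_file.items():
        for keyword in keywords:
            index[keyword].add(path)
    return {keyword: sorted(paths) for keyword, paths in index.items()}


def usage_counts(index):
    """Return (keyword, file count) pairs, most used first, ties alphabetical."""
    return sorted(((k, len(v)) for k, v in index.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical extraction results for three documents.
docs = {
    "architecture/overview.md": ["security", "architecture"],
    "components/auth.md": ["security"],
    "doc/design.md": ["architecture", "api"],
}
index = build_index(docs)
for keyword, count in usage_counts(index):
    print(f"{keyword}: {count}")
```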
Create .doc-keywords.edn in your project root to customize behavior:
{:doc-dir "doc"
 :taxonomy "doc-tools/keyword-taxonomy.md"}

Configuration options:
- :doc-dir - Directory containing Markdown documentation (default: "doc")
- :taxonomy - Path to keyword taxonomy file (default: "doc-tools/keyword-taxonomy.md")

If no .doc-keywords.edn is present, the tool uses these defaults:
- Documentation directory: doc/
- Taxonomy file: doc-tools/keyword-taxonomy.md
The tool automatically discovers your project root by:
- Looking for a git repository (searches upward from current directory)
- Falling back to the current working directory if not in a git repo
All paths in the config are resolved relative to the project root.
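The discovery and path-resolution rules can be sketched as below. This is a Python illustration under assumptions, not the tool's code: it treats any `.git` entry as marking the root, does not parse the EDN file (overrides are passed in as a plain dict), and the names `find_project_root` and `resolve_paths` are hypothetical.

```python
from pathlib import Path

# Defaults taken from the text above; the EDN config itself is not parsed here.
DEFAULTS = {"doc-dir": "doc", "taxonomy": "doc-tools/keyword-taxonomy.md"}


def find_project_root(start):
    """Walk upward from start looking for a .git entry; fall back to start."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return start


def resolve_paths(root, overrides=None):
    """Merge overrides onto the defaults and resolve each path against root."""
    settings = {**DEFAULTS, **(overrides or {})}
    return {key: Path(root) / value for key, value in settings.items()}


root = find_project_root(Path.cwd())
for key, path in resolve_paths(root, {"doc-dir": "docs"}).items():
    print(key, "->", path)
```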
The taxonomy file defines valid keywords for your project. Keywords should be documented using backtick-colon notation:
# Keyword Taxonomy
## Architecture
- `:architecture` - Architectural decisions and patterns
- `:modularity` - Module boundaries and organization
- `:abstraction` - Abstraction layers and interfaces
## Security
- `:security` - General security concerns
- `:authentication` - Authentication mechanisms
- `:authorization` - Authorization and access control

See example-taxonomy.md for a complete example.
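One plausible way to read valid keywords out of a taxonomy file in this backtick-colon notation is shown below. It is a Python sketch under assumptions: the regular expression and the `parse_taxonomy` name are hypothetical, and the tool may recognize entries more strictly or more loosely.

```python
import re

# Assumed shape of a taxonomy entry: a keyword wrapped in backticks with a
# leading colon, e.g. `:security`.
TAXONOMY_ENTRY = re.compile(r"`:([A-Za-z][\w-]*)`")


def parse_taxonomy(markdown_text):
    """Collect every backtick-colon keyword found in the taxonomy text."""
    return set(TAXONOMY_ENTRY.findall(markdown_text))


taxonomy_md = """\
# Keyword Taxonomy
## Security
- `:security` - General security concerns
- `:authentication` - Authentication mechanisms
"""
print(sorted(parse_taxonomy(taxonomy_md)))
```

Headings and descriptions are ignored, so the taxonomy can be organized freely as long as each keyword appears in backticks.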
cd ~/my-project
# Create taxonomy
mkdir -p doc doc-tools
cat > doc-tools/keyword-taxonomy.md << 'EOF'
# Project Keywords
- `:architecture` - Architecture decisions
- `:security` - Security concerns
- `:api` - API design
EOF
# Add keywords to your docs
echo "This document covers [:architecture :api] design." > doc/design.md
# Validate
doc-keywords validate-keywords
# Output: ✓ All keywords valid
# Get statistics
doc-keywords keyword-stats
# Output: Keyword usage statistics...

Compare keyword usage across multiple projects:
# Analyze three projects
doc-keywords keyword-stats ~/work/api ~/work/frontend ~/work/backend
# Build unified index
doc-keywords keyword-index ~/work/* > all-projects-index.json
# Generate relationship graph across projects
doc-keywords keyword-graph ~/work/* > all-keywords.dot

Validate keywords in continuous integration:
# In .github/workflows/docs.yml or similar
- name: Validate documentation keywords
  run: doc-keywords validate-keywords

The command exits with code 1 if validation fails, making it suitable for CI pipelines.
If your docs are in a different location:
;; .doc-keywords.edn
{:doc-dir "docs"
 :taxonomy "docs/taxonomy.md"}

doc-keywords validate-keywords
# Uses docs/ directory and docs/taxonomy.md

This tool extracts and generalizes the keyword utilities from the original doc-tools/bb.edn format:
Original (project-specific):
cd project
bb keyword-index
bb validate-keywords

New (standalone, reusable):
doc-keywords keyword-index ~/project
doc-keywords validate-keywords ~/project

Key improvements:
- Single executable script (no bb.edn per project)
- Works across multiple projects
- Project-specific configuration via .doc-keywords.edn
- Cross-project analysis built-in
- Standalone with no external dependencies
Ensure your taxonomy file exists at the configured location (default: doc-tools/keyword-taxonomy.md). You can customize this in .doc-keywords.edn:
{:taxonomy "docs/keywords.md"}

Check that:
- Your Markdown files use the correct syntax: [:keyword] or [:keyword1 :keyword2]
- Files are in the configured doc directory (default: doc/)
- Files have the .md extension
Ensure the script is executable:
chmod +x ~/bin/doc-keywords/doc-keywords

The tool is a single Babashka script with no external dependencies beyond Babashka's built-in libraries:
- babashka.fs - File system operations
- clojure.edn - Configuration parsing
- clojure.string - String manipulation
- clojure.java.io - I/O operations
- cheshire.core - JSON generation

To modify:
- Edit the doc-keywords script directly
- Test with: bb doc-keywords <command> [args]
- No compilation or build step required
MIT
Will review Pull Requests