zircote · Dec 26, 2025
@@ -286,6 +286,15 @@ LSP hooks are configured in `.claude/hooks.json` for immediate feedback on Pytho
 
 ## Completed Spec Projects
 
+- `docs/spec/completed/2025-12-25-observability-instrumentation/` - Observability Instrumentation
+  - Completed: 2025-12-26
+  - Outcome: success
+  - GitHub Issue: [#10](https://github.com/zircote/git-notes-memory/issues/10)
+  - Features: Metrics collection, distributed tracing, structured logging, CLI commands (/memory:metrics, /memory:traces, /memory:health)
+  - Deliverables: 115+ tests, 4 phases completed (6 total, 2 optional skipped), 20 tasks, 11 ADRs
+  - Note: Phases 5-6 (OpenTelemetry, Docker stack) skipped as optional Tier 3 enhancements
+  - Key docs: REQUIREMENTS.md, ARCHITECTURE.md, IMPLEMENTATION_PLAN.md, DECISIONS.md, PROGRESS.md
+
 - `docs/spec/completed/2025-12-25-fix-git-notes-fetch-refspec/` - Fix Git Notes Fetch Refspec
   - Completed: 2025-12-25
   - Outcome: success

@@ -0,0 +1,135 @@
+---
+description: Health check with optional timing percentiles
+argument-hint: "[--timing] [--verbose]"
+allowed-tools: ["Bash", "Read"]
+---
+
+<help_check>
+## Help Check
+
+If `$ARGUMENTS` contains `--help` or `-h`:
+
+**Output this help and HALT (do not proceed further):**
+
+<help_output>
+```
+HEALTH(1)                                            User Commands                                            HEALTH(1)
+
+NAME
+    health - Health check with optional timing percentiles
+
+SYNOPSIS
+    /memory:health [--timing] [--verbose]
+
+DESCRIPTION
+    Perform a quick health check of the memory system. Shows component status
+    and optionally includes latency percentiles from collected metrics.
+
+OPTIONS
+    --help, -h       Show this help message
+    --timing         Include latency percentiles from metrics
+    --verbose        Show detailed component status
+
+EXAMPLES
+    /memory:health
+    /memory:health --timing
+    /memory:health --verbose
+    /memory:health --timing --verbose
+    /memory:health --help
+
+SEE ALSO
+    /memory:status for detailed system status
+    /memory:metrics for all collected metrics
+    /memory:traces for recent operation traces
+
+                                                                      HEALTH(1)
+```
+</help_output>
+
+**After outputting help, HALT immediately. Do not proceed with command execution.**
+</help_check>
+
+---
+
+# /memory:health - System Health Check
+
+Quick health check of the memory system with optional timing information.
+
+## Your Task
+
+You will check the health of the memory system components.
+
+<step number="1" name="Parse Arguments">
+
+**Arguments format**: `$ARGUMENTS`
+
+Parse the following options:
+- `--timing` - Include latency percentiles
+- `--verbose` - Show detailed component status
+
+</step>
+
+<step number="2" name="Run Health Check">
+
+**Execute the health check**:
+```bash
+PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT:-$(ls -d ~/.claude/plugins/cache/git-notes-memory/memory-capture/*/ 2>/dev/null | head -1)}"
+uv run --directory "$PLUGIN_ROOT" python3 "$PLUGIN_ROOT/scripts/health.py" $ARGUMENTS
+```
+
+</step>
+
+<step number="3" name="Provide Recommendations">
+
+If issues are detected, show recommendations:
+
+```
+### Recommendations
+
+- **Index not initialized** - Run `/memory:sync` to create the index
+- **Embedding model not loaded** - The first search may be slow while the model loads
+- **Git notes not configured** - Run `/memory:capture` to create the first memory
+- **High timeout rate** - Consider increasing the hook timeouts configured in the environment
+```
+
+</step>
+
+## Output Sections
+
+| Section | Description |
+|---------|-------------|
+| Components | Status of each system component |
+| Latency Percentiles | Timing metrics (with `--timing`) |
+| Hook Timeout Rate | Percentage of timed-out hooks |
+| Component Details | Detailed stats (with `--verbose`) |
+
+## Status Indicators
+
+| Symbol | Meaning |
+|--------|---------|
+| ✓ | Healthy/OK |
+| ✗ | Error/Failed |
+| ⚠ | Warning/Issues |
+
+## Examples
+
+**User**: `/memory:health`
+**Action**: Quick health check of all components
+
+**User**: `/memory:health --timing`
+**Action**: Health check with latency percentiles
+
+**User**: `/memory:health --verbose`
+**Action**: Health check with detailed component info
+
+**User**: `/memory:health --timing --verbose`
+**Action**: Full health check with all details
+
+## Related Commands
+
+| Command | Description |
+|---------|-------------|
+| `/memory:status` | Detailed system status |
+| `/memory:metrics` | View all metrics |
+| `/memory:traces` | Recent operation traces |
+| `/memory:validate` | Full validation |
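
The health command above reports latency percentiles and a hook timeout rate, but this diff does not include `scripts/health.py`, so how those numbers are derived is not shown. The following is a minimal sketch under stated assumptions: percentiles use the nearest-rank method over raw millisecond samples, and the timeout rate is taken to be `hook_timeouts_total / hook_executions_total`. The function names and sample values are hypothetical and are not taken from the plugin.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer in 1..100)."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples):
    """Return the p50/p95/p99 trio that --timing is described as showing."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


def timeout_rate(timeouts, executions):
    """Percentage of hook executions that timed out; 0.0 when nothing ran."""
    if executions == 0:
        return 0.0
    return 100.0 * timeouts / executions


# Hypothetical capture durations in milliseconds, with one slow outlier.
durations_ms = [12.0, 15.0, 11.0, 240.0, 14.0, 13.0, 16.0, 18.0, 17.0, 19.0]
print(latency_summary(durations_ms))
print(f"{timeout_rate(timeouts=3, executions=120):.1f}% of hooks timed out")
```

With only ten samples, p95 and p99 both land on the single outlier, which is one reason a health report built this way is easier to interpret when it also shows the sample count.
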
@@ -0,0 +1,161 @@
+---
+description: Display observability metrics for the memory system
+argument-hint: "[--format=text|json|prometheus] [--filter=<pattern>]"
+allowed-tools: ["Bash", "Read"]
+---
+
+<help_check>
+## Help Check
+
+If `$ARGUMENTS` contains `--help` or `-h`:
+
+**Output this help and HALT (do not proceed further):**
+
+<help_output>
+```
+METRICS(1)                                           User Commands                                           METRICS(1)
+
+NAME
+    metrics - Display observability metrics for the memory system
+
+SYNOPSIS
+    /memory:metrics [--format=text|json|prometheus] [--filter=<pattern>]
+
+DESCRIPTION
+    Display collected observability metrics including counters, histograms, and gauges.
+    Metrics track operation counts, durations, errors, and system health indicators.
+
+OPTIONS
+    --help, -h            Show this help message
+    --format=FORMAT       Output format: text (default), json, prometheus
+    --filter=PATTERN      Filter metrics by name pattern (e.g., "capture", "hook")
+
+EXAMPLES
+    /memory:metrics
+    /memory:metrics --format=json
+    /memory:metrics --format=prometheus
+    /memory:metrics --filter=hook
+    /memory:metrics --format=json --filter=capture
+    /memory:metrics --help
+
+SEE ALSO
+    /memory:status for system status
+    /memory:health for health checks with timing
+
+                                                                      METRICS(1)
+```
+</help_output>
+
+**After outputting help, HALT immediately. Do not proceed with command execution.**
+</help_check>
+
+---
+
+# /memory:metrics - Observability Metrics
+
+Display collected observability metrics for the memory system.
+
+## Your Task
+
+You will display metrics collected by the observability system.
+
+<step number="1" name="Parse Arguments">
+
+**Arguments format**: `$ARGUMENTS`
+
+Parse the following options:
+- `--format=text|json|prometheus` - Output format (default: text)
+- `--filter=<pattern>` - Filter metrics by name pattern
+
+</step>
+
+<step number="2" name="Collect and Display Metrics">
+
+**Execute the metrics collection**:
+```bash
+PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT:-$(ls -d ~/.claude/plugins/cache/git-notes-memory/memory-capture/*/ 2>/dev/null | head -1)}"
+uv run --directory "$PLUGIN_ROOT" python3 "$PLUGIN_ROOT/scripts/metrics.py" $ARGUMENTS
+```
+
+</step>
+
+<step number="3" name="Explain Output">
+
+After displaying metrics, provide context:
+
+**For text format**:
+```
+### Metric Types
+
+- **Counters**: Cumulative values that only increase (e.g., memories_captured_total)
+- **Histograms**: Distribution of values (e.g., capture_duration_ms)
+- **Gauges**: Current values that can go up or down (e.g., index_size_bytes)
+
+Use `--format=prometheus` to emit the Prometheus text exposition format (scraped by Prometheus, visualized in Grafana).
+```
+
+**For JSON format**:
+```
+### JSON Structure
+
+The output contains:
+- `counters`: Name → value pairs
+- `histograms`: Name → {count, sum, avg, p50, p95, p99}
+- `gauges`: Name → current value
+```
+
+**For Prometheus format**:
+```
+### Prometheus Format
+
+Ready for scraping by Prometheus. Each metric includes:
+- TYPE declaration (counter, histogram, gauge)
+- HELP description
+- Labels in {key="value"} format
+```
+
+</step>
+
+## Output Formats
+
+| Format | Use Case |
+|--------|----------|
+| text | Human-readable output (default) |
+| json | Machine parsing, debugging |
+| prometheus | Prometheus text exposition format for scraping |
+
+## Available Metrics
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| memories_captured_total | counter | Total memories captured, broken down by namespace |
+| capture_duration_ms | histogram | Capture operation timing |
+| recall_duration_ms | histogram | Search/recall timing |
+| hook_execution_duration_ms | histogram | Hook handler timing |
+| hook_executions_total | counter | Hook invocations by name and status |
+| hook_timeouts_total | counter | Hook timeout events |
+| silent_failures_total | counter | Logged silent failures by location |
+| embedding_duration_ms | histogram | Embedding generation timing |
+| index_operations_total | counter | Index CRUD operations |
+
+## Examples
+
+**User**: `/memory:metrics`
+**Action**: Show all metrics in text format
+
+**User**: `/memory:metrics --format=json`
+**Action**: Show all metrics as JSON
+
+**User**: `/memory:metrics --filter=hook`
+**Action**: Show only hook-related metrics
+
+**User**: `/memory:metrics --format=prometheus`
+**Action**: Show Prometheus exposition format for scraping
+
+## Related Commands
+
+| Command | Description |
+|---------|-------------|
+| `/memory:status` | System status and statistics |
+| `/memory:health` | Health checks with timing |
+| `/memory:validate` | Full system validation |
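
The metrics command describes three metric types, a JSON layout (`counters`, `histograms`, `gauges`), a name filter, and Prometheus output with `TYPE`, `HELP`, and `{key="value"}` labels. Since `scripts/metrics.py` is not part of this diff, the sketch below only illustrates how those pieces could fit together. `Registry`, `series_name`, the label names, and the substring interpretation of `--filter` are all assumptions made for illustration, and histogram buckets are left out of the Prometheus rendering to keep it short.

```python
import json


def series_name(name, labels):
    """Render a metric name with labels in {key="value"} form (no value escaping)."""
    if not labels:
        return name
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}}"


class Registry:
    """Hypothetical in-memory registry; not the plugin's actual collector."""

    def __init__(self):
        self.counters = {}    # series name -> cumulative value
        self.gauges = {}      # name -> current value
        self.histograms = {}  # name -> list of observed values

    def inc(self, name, amount=1, **labels):
        key = series_name(name, labels)
        self.counters[key] = self.counters.get(key, 0) + amount

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def observe(self, name, value):
        self.histograms.setdefault(name, []).append(value)

    def snapshot(self, pattern=None):
        """JSON-shaped dict; pattern is treated as a substring filter (assumption)."""
        def keep(name):
            return pattern is None or pattern in name

        def summary(values):
            ordered = sorted(values)
            n = len(ordered)

            def pick(pct):
                # Nearest-rank percentile using integer ceiling division.
                return ordered[max(1, -(-pct * n // 100)) - 1]

            total = sum(ordered)
            return {"count": n, "sum": total, "avg": total / n,
                    "p50": pick(50), "p95": pick(95), "p99": pick(99)}

        return {
            "counters": {k: v for k, v in self.counters.items() if keep(k)},
            "histograms": {k: summary(v) for k, v in self.histograms.items() if keep(k)},
            "gauges": {k: v for k, v in self.gauges.items() if keep(k)},
        }

    def to_prometheus(self):
        """Counters and gauges in text exposition format (histograms omitted)."""
        lines, declared = [], set()
        for kind, series in (("counter", self.counters), ("gauge", self.gauges)):
            for key in sorted(series):
                base = key.split("{", 1)[0]
                if base not in declared:
                    declared.add(base)
                    # Placeholder HELP text: the metric name stands in for a description.
                    lines.append(f"# HELP {base} {base}")
                    lines.append(f"# TYPE {base} {kind}")
                lines.append(f"{key} {series[key]}")
        return "\n".join(lines) + "\n"


registry = Registry()
registry.inc("memories_captured_total", namespace="decisions")
registry.inc("memories_captured_total", namespace="decisions")
registry.inc("hook_executions_total", hook="SessionStart", status="ok")
registry.set_gauge("index_size_bytes", 4096)
for ms in (12.0, 15.0, 30.0):
    registry.observe("capture_duration_ms", ms)

print(json.dumps(registry.snapshot(pattern="capture"), indent=2))
print(registry.to_prometheus(), end="")
```

Folding the labels into the series name keeps the JSON `counters` section a flat name-to-value map, matching the layout described above, while still letting the Prometheus renderer recover the base metric name for its `HELP` and `TYPE` lines.
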