feat(cloudwatch-logs): Add field index recommender tools #2738
Merged
shri-tambe merged 9 commits on Apr 9, 2026
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2738      +/-   ##
==========================================
+ Coverage   91.38%   91.40%   +0.01%
==========================================
  Files        1014     1017       +3
  Lines       74971    75433     +462
  Branches    12053    12153     +100
==========================================
+ Hits        68511    68947     +436
- Misses       3988     3995       +7
- Partials     2472     2491      +19
7e5aced to 1f01330
Add two MCP tools that analyze CloudWatch Logs Insights query history and recommend fields for indexing.

Tools:
1. recommend_indexes_loggroup - deep analysis of a specific log group
2. recommend_indexes_account - fast triage across all log groups

Features:
- Parses CWLI, SQL, and PPL queries using API-provided queryLanguage
- Handles log group name or ARN (strips :* suffix)
- Chunks field existence queries (50 per batch) to avoid length limits
- Strips quoted strings/regex patterns to avoid false positives
- Excludes numeric-prefixed tokens (5m, 1h) and query aliases
- Dynamic function call detection (no hardcoded function names)
- Expanded SYSTEM_FIELDS with data-source-specific default indexes
- Error handling: parse failures logged and skipped

Scoring: frequency (30%), equality filter ratio (25%), recency (15%), scan volume (15%), cardinality top-10 (15%). Weights validated to sum to 1.0.

Account tool uses lightweight analysis (no Insights queries) for speed. Log group tool runs full analysis with field existence and cardinality.

73 tests, ruff clean, pyright clean.

Co-authored-by: Shrikant Tambe <[email protected]>
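To make the scoring description concrete, here is a minimal sketch of a weighted sum using the percentages stated in the commit message. The names `WEIGHTS` and `score_field`, and the assumption that every signal is already normalized to the range 0 to 1, are illustrative and are not taken from the PR's code.

```python
import math

# Weights as stated in the commit message; the key names are assumptions.
WEIGHTS = {
    "frequency": 0.30,
    "equality_ratio": 0.25,
    "recency": 0.15,
    "scan_volume": 0.15,
    "cardinality": 0.15,
}

# Tolerance-based check, so float rounding in the sum cannot trip validation.
assert math.isclose(sum(WEIGHTS.values()), 1.0), "weights must sum to 1.0"


def score_field(signals: dict[str, float]) -> float:
    """Combine per-field signals (assumed normalized to [0, 1]) into one score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Missing signals count as 0; out-of-range values are clamped.
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return total
```

Under these assumptions, a field that is queried often but only ever with range filters would score at most 0.75, because the equality filter ratio contributes nothing.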
1f01330 to fe35df5
- Use math.isclose() for floating-point weight validation to avoid platform-dependent rounding issues at import time
- Replace fixed 1s polling in _run_quick_query with exponential backoff (0.2s start, 2s cap) for faster response on quick queries
- Run field existence check chunks concurrently via asyncio.gather instead of sequentially, reducing wall-clock time proportionally to the number of chunks
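The polling and concurrency changes in this commit can be sketched roughly as follows. This is not the PR's `_run_quick_query`: the helper names, the doubling factor, the `max_polls` budget, and the injectable `sleep` parameter are all assumptions made for illustration. Only the 0.2s start, the 2s cap, and the use of `asyncio.gather` come from the commit message.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterator
from typing import Optional, TypeVar

T = TypeVar("T")


def backoff_delays(start: float = 0.2, cap: float = 2.0, factor: float = 2.0) -> Iterator[float]:
    """Yield poll delays that grow by `factor` until they reach `cap`."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


async def poll_until_done(
    check: Callable[[], Awaitable[Optional[T]]],
    max_polls: int = 20,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call `check` until it returns a non-None result, backing off between polls."""
    delays = backoff_delays()
    for _ in range(max_polls):
        result = await check()
        if result is not None:
            return result
        await sleep(next(delays))
    raise TimeoutError("query did not complete within the polling budget")


async def run_chunks(
    chunks: list[list[str]],
    run_one: Callable[[list[str]], Awaitable[T]],
) -> list[T]:
    """Run one check per chunk concurrently; results keep the order of `chunks`."""
    return await asyncio.gather(*(run_one(chunk) for chunk in chunks))
```

With a short first delay, a query that finishes almost immediately is picked up after about 0.2s instead of a full second, while the cap keeps long-running queries from being polled too aggressively.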
goranmod previously approved these changes on Apr 2, 2026
…ules

Split the 1232-line index_recommender.py into three modules with clear single responsibilities and zero circular dependencies:
- query_parser.py (274 lines): Regex patterns, CWLI/SQL/PPL parsing, language detection, field extraction, value stripping
- scoring.py (256 lines): Weights, constants, Pydantic result models, scoring engine, cardinality refinement
- index_recommender.py (463 lines): Analysis pipelines, API helpers, tool entry points

No behavioral changes. All 73 index recommender tests pass, and the full suite of 444 tests passes.
8c29a14 to 5e7c563
goranmod approved these changes on Apr 9, 2026
agiuliano approved these changes on Apr 9, 2026
Summary
Changes
Add two MCP tools to the cloudwatch-mcp-server that analyze CloudWatch Logs Insights query history and recommend fields for indexing:

- recommend_indexes_loggroup: Deep analysis of a specific log group. Fetches query history, checks existing index policies, verifies field existence in log data, and scores candidates using: query frequency (30%), equality filter ratio (25%), recency (15%), scan volume (15%), and cardinality of top-10 fields (15%).
- recommend_indexes_account: Fast triage across all log groups in the account. Lightweight scan that parses query history and checks account-level index policies (single API call) without running per-log-group Insights queries.

Key implementation details:
- Parses CWLI, SQL, and PPL queries using the API-provided queryLanguage field
- Handles log group name or ARN (strips :* suffix)
- Excludes numeric-prefixed tokens (5m, 1h) and query aliases
- Expanded SYSTEM_FIELDS with data-source-specific default indexes (VPC Flow Logs, Route53, WAF, CloudTrail)

59 unit tests, 97% coverage.
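A few of these input-handling details can be illustrated with a small sketch. It is a simplified illustration, not the PR's parser: the function names and regular expressions are assumptions, the literal stripping ignores escaped quotes, and keyword, function, and alias filtering is left out. The batch size of 50 is the one mentioned in the commit message.

```python
import re

CHUNK_SIZE = 50  # batch size for field existence queries, per the commit message


def normalize_log_group(identifier: str) -> str:
    """Accept a log group name or ARN and drop a trailing ':*' if present."""
    return identifier[:-2] if identifier.endswith(":*") else identifier


def chunked(fields: list[str], size: int = CHUNK_SIZE) -> list[list[str]]:
    """Split a field list into batches of at most `size` entries."""
    return [fields[i:i + size] for i in range(0, len(fields), size)]


# Double-quoted, single-quoted, and /regex/ literals (simplified: no escapes).
_LITERALS = re.compile(r'"[^"]*"' r"|'[^']*'" r"|/[^/\n]+/")
# Word-like tokens, optionally starting with '@' (for example @message).
_TOKEN = re.compile(r"[@\w][\w.@]*")


def candidate_tokens(query: str) -> list[str]:
    """Return tokens outside literals, skipping digit-led ones such as 5m or 1h."""
    stripped = _LITERALS.sub(" ", query)
    return [tok for tok in _TOKEN.findall(stripped) if not tok[0].isdigit()]
```

Stripping literals first matters because a value such as "error 5xx" would otherwise contribute `error` as a false candidate field. The returned tokens still include query keywords and function names, which a real parser would need to filter separately.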
User experience
Before: Users had no automated way to identify which CloudWatch Logs fields would benefit from indexing. They had to manually review query patterns and guess which fields to index.
After: Users can ask their MCP client to recommend field indexes:
- recommend_indexes_account scans account-wide query history in ~5 seconds
- recommend_indexes_loggroup returns scored recommendations with detailed breakdowns, already-indexed fields, and fields not found in log data

Checklist
If your change doesn't seem to apply, please leave them unchecked.
Is this a breaking change? N
RFC issue number: NA
Checklist:
Acknowledgment
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.