feat(cloudwatch-logs): Add field index recommender tools #2738

Merged
shri-tambe merged 9 commits into awslabs:main from shri-tambe:feat/cloudwatch-logs-index-recommender
Apr 9, 2026

Conversation

@shri-tambe
Contributor

Summary

Changes

Add two MCP tools to the cloudwatch-mcp-server that analyze CloudWatch Logs Insights query history and recommend fields for indexing:

  1. recommend_indexes_loggroup — Deep analysis of a specific log group. Fetches query history, checks existing index policies, verifies field existence in log data, and scores candidates using: query frequency (30%), equality filter ratio (25%), recency (15%), scan volume (15%), and cardinality of top-10 fields (15%).

  2. recommend_indexes_account — Fast triage across all log groups in the account. Lightweight scan that parses query history and checks account-level index policies (single API call) without running per-log-group Insights queries.
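
The weighting described for recommend_indexes_loggroup can be sketched as a plain weighted sum. This is a minimal illustration, not the PR's scoring engine: the weights come from the description above, while the signal names, the `score_field` helper, and the assumption that each signal is already normalized to the range 0..1 are hypothetical.

```python
import math

# Weights as listed in the PR description; the key names are illustrative.
WEIGHTS = {
    "frequency": 0.30,
    "equality_ratio": 0.25,
    "recency": 0.15,
    "scan_volume": 0.15,
    "cardinality": 0.15,
}

# Fail fast if the weight table is edited so that it no longer sums to 1.0.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def score_field(signals: dict[str, float]) -> float:
    """Combine per-field signals into one score (hypothetical helper).

    Each signal is assumed to be normalized to [0, 1]; values outside
    that range are clamped, and missing signals count as 0.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return total
```

With this shape, a field that is queried often but never used in equality filters is capped at 0.75, which is one way the equality filter ratio can keep range-only fields from ranking first.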

Key implementation details:

  • Parses CWLI, SQL, and PPL queries using the API-provided queryLanguage field
  • Handles log group name or ARN input (strips :* suffix)
  • Chunks field existence queries (50 per batch) to avoid query string length limits
  • Strips quoted strings, regex patterns, and inline comments to avoid false field matches
  • Excludes numeric-prefixed tokens (5m, 1h) and query aliases
  • Dynamic function call detection (no hardcoded function names)
  • Expanded SYSTEM_FIELDS with data-source-specific default indexes (VPC Flow Logs, Route53, WAF, CloudTrail)
  • Graceful error handling: parse failures logged and skipped, API errors produce warnings not crashes
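
Two of the items above, the :* suffix handling and the 50-field batching, are simple enough to sketch. The helper names `normalize_log_group` and `chunked` are hypothetical and the PR's own helpers may differ; only the suffix and the batch size come from the description.

```python
from typing import Iterator, Sequence

CHUNK_SIZE = 50  # batch size stated in the PR description


def normalize_log_group(identifier: str) -> str:
    """Drop a trailing ':*' from a log group ARN (hypothetical helper).

    Plain log group names pass through unchanged.
    """
    return identifier[:-2] if identifier.endswith(":*") else identifier


def chunked(fields: Sequence[str], size: int = CHUNK_SIZE) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` field names."""
    for start in range(0, len(fields), size):
        yield list(fields[start:start + size])
```

Batching by a fixed count is a proxy for query string length; a stricter variant could batch by accumulated character count instead.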

59 unit tests, 97% coverage.

User experience

Before: Users had no automated way to identify which CloudWatch Logs fields would benefit from indexing. They had to manually review query patterns and guess which fields to index.

After: Users can ask their MCP client to recommend field indexes:

  • "Which log groups in my account would benefit from field indexing?" → recommend_indexes_account scans account-wide query history in ~5 seconds
  • "What fields should I index for /aws/lambda/my-func?" → recommend_indexes_loggroup returns scored recommendations with detailed breakdowns, already-indexed fields, and fields not found in log data

Checklist

If an item doesn't apply to your change, please leave it unchecked.

  • I have reviewed the contributing guidelines
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Is this a breaking change? N

RFC issue number: NA

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

@codecov

codecov Bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 94.37229% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.40%. Comparing base (04d2ca2) to head (5e7c563).
⚠️ Report is 4 commits behind head on main.

Files with missing lines                                   Patch %   Lines
...udwatch_mcp_server/cloudwatch_logs/query_parser.py      89.93%    3 Missing and 12 partials ⚠️
...ch_mcp_server/cloudwatch_logs/index_recommender.py      95.75%    3 Missing and 6 partials ⚠️
...s/cloudwatch_mcp_server/cloudwatch_logs/scoring.py      97.95%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2738      +/-   ##
==========================================
+ Coverage   91.38%   91.40%   +0.01%     
==========================================
  Files        1014     1017       +3     
  Lines       74971    75433     +462     
  Branches    12053    12153     +100     
==========================================
+ Hits        68511    68947     +436     
- Misses       3988     3995       +7     
- Partials     2472     2491      +19     

@shri-tambe shri-tambe force-pushed the feat/cloudwatch-logs-index-recommender branch from 7e5aced to 1f01330 Compare March 20, 2026 08:06
Add two MCP tools that analyze CloudWatch Logs Insights query history
and recommend fields for indexing.

Tools:
1. recommend_indexes_loggroup - deep analysis of a specific log group
2. recommend_indexes_account - fast triage across all log groups

Features:
- Parses CWLI, SQL, and PPL queries using API-provided queryLanguage
- Handles log group name or ARN (strips :* suffix)
- Chunks field existence queries (50 per batch) to avoid length limits
- Strips quoted strings/regex patterns to avoid false positives
- Excludes numeric-prefixed tokens (5m, 1h) and query aliases
- Dynamic function call detection (no hardcoded function names)
- Expanded SYSTEM_FIELDS with data-source-specific default indexes
- Error handling: parse failures logged and skipped
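
The field extraction rules in this list can be illustrated with a rough sketch for CWLI-style queries. Everything here is an assumption made for illustration: the regexes, the small `KEYWORDS` subset, and the `candidate_fields` helper are not taken from the PR, and filtering against SYSTEM_FIELDS is left out.

```python
import re

# Illustrative patterns only; the PR's actual regexes are not shown here.
SINGLE_QUOTED = r"'(?:\\.|[^'\\])*'"
DOUBLE_QUOTED = r'"(?:\\.|[^"\\])*"'
QUOTED = re.compile(f"{SINGLE_QUOTED}|{DOUBLE_QUOTED}")
REGEX_LITERAL = re.compile(r"/(?:\\.|[^/\\\n])+/")
COMMENT = re.compile(r"#.*$", re.MULTILINE)
ALIAS = re.compile(r"\bas\s+([A-Za-z_]\w*)", re.IGNORECASE)
TOKEN = re.compile(r"@?[A-Za-z_][\w.]*|\d\w*")
OPEN_PAREN = re.compile(r"\s*\(")

# A small, incomplete keyword subset chosen for this example.
KEYWORDS = {
    "fields", "filter", "stats", "sort", "limit", "parse", "display",
    "dedup", "by", "as", "and", "or", "not", "like", "in", "asc", "desc",
}


def candidate_fields(query: str) -> set[str]:
    """Return tokens that look like field references (hypothetical helper)."""
    # Remove literal values first so their contents are never tokenized.
    text = QUOTED.sub(" ", query)
    text = REGEX_LITERAL.sub(" ", text)
    text = COMMENT.sub(" ", text)
    aliases = set(ALIAS.findall(text))
    fields: set[str] = set()
    for match in TOKEN.finditer(text):
        token = match.group()
        if token[0].isdigit():
            continue  # numeric-prefixed tokens such as 5m or 1h
        if OPEN_PAREN.match(text, match.end()):
            continue  # any name followed by "(" is treated as a function call
        if token.lower() in KEYWORDS or token in aliases:
            continue
        fields.add(token)
    return fields
```

Treating any name followed by an opening parenthesis as a function call is what removes the need for a hardcoded function list in this sketch.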

Scoring: frequency (30%), equality filter ratio (25%), recency (15%),
scan volume (15%), cardinality top-10 (15%). Weights validated to sum
to 1.0.

Account tool uses lightweight analysis (no Insights queries) for speed.
Log group tool runs full analysis with field existence and cardinality.

73 tests, ruff clean, pyright clean.

Co-authored-by: Shrikant Tambe <[email protected]>
@shri-tambe shri-tambe force-pushed the feat/cloudwatch-logs-index-recommender branch from 1f01330 to fe35df5 Compare March 20, 2026 08:06
gcacace and others added 6 commits March 20, 2026 08:20
- Use math.isclose() for floating-point weight validation to avoid
  platform-dependent rounding issues at import time
- Replace fixed 1s polling in _run_quick_query with exponential backoff
  (0.2s start, 2s cap) for faster response on quick queries
- Run field existence check chunks concurrently via asyncio.gather
  instead of sequentially, reducing wall-clock time proportionally
  to the number of chunks
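
A minimal sketch of the polling and concurrency changes described in this commit, assuming a doubling factor between the 0.2s start and the 2s cap (the commit message does not state the growth factor). The names `backoff_delays`, `poll_until_done`, and `check_chunks`, along with the injectable `sleep` parameter, are hypothetical and do not reflect the body of `_run_quick_query`.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, Sequence


def backoff_delays(start: float = 0.2, cap: float = 2.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield delays that grow by `factor` until they reach `cap`."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


async def poll_until_done(
    is_done: Callable[[], Awaitable[bool]],
    max_attempts: int = 20,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Poll `is_done` with growing delays; return False if attempts run out."""
    for _, delay in zip(range(max_attempts), backoff_delays()):
        if await is_done():
            return True
        await sleep(delay)
    return False


async def check_chunks(
    chunks: Sequence[list[str]],
    check_one: Callable[[list[str]], Awaitable[list[str]]],
) -> list[list[str]]:
    """Run one check per chunk concurrently; results keep the chunk order."""
    return await asyncio.gather(*(check_one(chunk) for chunk in chunks))
```

Because asyncio.gather returns results in argument order, the per-chunk results can be concatenated without tracking which task finished first.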
@shri-tambe shri-tambe enabled auto-merge April 2, 2026 03:27
goranmod previously approved these changes Apr 2, 2026
…ules

Split the 1232-line index_recommender.py into three modules with clear
single responsibilities and zero circular dependencies:

- query_parser.py (274 lines): Regex patterns, CWLI/SQL/PPL parsing,
  language detection, field extraction, value stripping
- scoring.py (256 lines): Weights, constants, Pydantic result models,
  scoring engine, cardinality refinement
- index_recommender.py (463 lines): Analysis pipelines, API helpers,
  tool entry points

No behavioral changes. All 73 index recommender tests pass, and the
full suite of 444 tests passes.
@shri-tambe shri-tambe force-pushed the feat/cloudwatch-logs-index-recommender branch from 8c29a14 to 5e7c563 Compare April 9, 2026 03:44
@shri-tambe shri-tambe added this pull request to the merge queue Apr 9, 2026
Merged via the queue into awslabs:main with commit 69677e8 Apr 9, 2026
147 checks passed
@shri-tambe shri-tambe deleted the feat/cloudwatch-logs-index-recommender branch April 9, 2026 17:06
@github-project-automation github-project-automation Bot moved this from To triage to Done in awslabs/mcp Project Apr 9, 2026