@trevorwhitney
Collaborator

What this PR does / why we need it:

This PR adds a new endpoint at /loki/api/v1/config that exposes a mix of tenant limits and config that is useful to Drilldown to make decisions to improve the user experience.

This extends the work done in #18521, which was originally done for this purpose (of supporting Drilldown), but in practice has two major drawbacks.

First, the tenant limits endpoint is at /config/tenant/v1/limits, which is a path outside of what the current Loki datasource can access, so for Drilldown to use it there would need to be changes to core Grafana.

Second, that endpoint only returns limits, and there is also some config Drilldown needs to know about (such as whether the pattern ingesters are enabled). It would be possible to move this to a per-tenant config, but this PR is more future-proof in case configs relevant to Drilldown are added later.

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@trevorwhitney trevorwhitney requested a review from a team as a code owner August 26, 2025 22:55
@trevorwhitney trevorwhitney added the claude-review Requests a review from Claude Code when added label Aug 29, 2025
@claude

claude bot commented Aug 29, 2025

Claude finished @trevorwhitney's task —— View job


PR Review Complete ✅

I've completed a comprehensive review of PR #19028: feat: drilldown config endpoint

📋 Review Summary

This PR successfully adds a new /loki/api/v1/config endpoint that provides a JSON response containing filtered tenant limits and pattern ingester configuration for Drilldown. The implementation is solid with good test coverage.

✅ Positive Aspects

Code Quality & Architecture

  • Clean separation of concerns with dedicated DrilldownConfigResponse struct
  • Consistent error handling patterns throughout
  • Good code organization and readability
  • Proper use of existing Loki patterns and interfaces

Security & Authorization

  • ✅ Proper tenant ID extraction using tenant.ExtractTenantIDFromHTTPRequest
  • ✅ Respects the existing TenantLimitsAllowPublish allowlist for field filtering
  • ✅ Graceful degradation when tenant limits are unavailable
  • ✅ Consistent authorization model with existing endpoints

Test Coverage

  • Comprehensive test suite covering all major scenarios
  • ✅ Tests for fallback behavior (TenantLimits → Overrides → defaults)
  • ✅ Tests for allowlist filtering functionality
  • ✅ Tests for pattern ingester enabled/disabled states
  • ✅ Edge cases covered (missing tenants, empty configs)
  • ✅ Proper use of table-driven tests

🔍 Detailed Technical Analysis

Implementation Quality

pkg/loki/drilldown_config.go:9 - Simple, focused struct with clear field mappings

type DrilldownConfigResponse struct {
    Limits                 map[string]any `json:"limits"`
    PatternIngesterEnabled bool           `json:"pattern_ingester_enabled"`
    Version                string         `json:"version"`
}
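
As a rough illustration of the JSON shape this struct would produce, here is a minimal sketch. The struct is copied from the snippet above; the `encodeExample` helper and all field values are made up for the example and are not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DrilldownConfigResponse mirrors the struct quoted in the review above.
type DrilldownConfigResponse struct {
	Limits                 map[string]any `json:"limits"`
	PatternIngesterEnabled bool           `json:"pattern_ingester_enabled"`
	Version                string         `json:"version"`
}

// encodeExample is a hypothetical helper that marshals a response
// populated with illustrative values only.
func encodeExample() (string, error) {
	resp := DrilldownConfigResponse{
		Limits:                 map[string]any{"max_query_length": "30d1h"},
		PatternIngesterEnabled: true,
		Version:                "unknown",
	}
	b, err := json.Marshal(resp)
	return string(b), err
}

func main() {
	out, err := encodeExample()
	if err != nil {
		panic(err)
	}
	// Struct fields are emitted in declaration order, map keys sorted.
	fmt.Println(out)
}
```

The JSON tags put `pattern_ingester_enabled` and `version` at the top level, next to the `limits` object rather than inside it.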

pkg/loki/config_handler.go:193-243 - Handler follows Loki conventions:

  • Proper error handling with appropriate HTTP status codes
  • Tenant-aware limit resolution with fallback chain
  • JSON response with correct content-type
  • Uses existing filterLimitFields for consistency

Limit Resolution Logic

The fallback chain is implemented correctly:

  1. Try TenantLimits.TenantLimits(user) for per-tenant overrides
  2. Fall back to Overrides.DefaultLimits() for defaults
  3. Error if no limits are available

This matches the expected behavior described in the tests and follows Loki's multi-tenancy patterns.
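
The fallback chain described above could be sketched roughly as follows. This is not the handler code from the PR: `Limits`, `tenantLimits`, `staticTenantLimits`, `resolveLimits`, and `errNoLimits` are hypothetical stand-ins, and the real Loki interfaces may have different signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// Limits is a hypothetical stand-in for Loki's validation.Limits.
type Limits struct {
	MaxQueryLength string
}

// tenantLimits is an assumed interface for per-tenant overrides;
// it is not the real Loki signature.
type tenantLimits interface {
	TenantLimits(userID string) *Limits
}

// staticTenantLimits is an in-memory implementation used for the sketch.
type staticTenantLimits map[string]*Limits

func (s staticTenantLimits) TenantLimits(userID string) *Limits { return s[userID] }

var errNoLimits = errors.New("no default limits configured")

// resolveLimits tries per-tenant overrides first, then the defaults,
// and returns an error if neither is available.
func resolveLimits(tl tenantLimits, defaults *Limits, userID string) (*Limits, error) {
	if tl != nil {
		if l := tl.TenantLimits(userID); l != nil {
			return l, nil
		}
	}
	if defaults != nil {
		return defaults, nil
	}
	return nil, errNoLimits
}

func main() {
	tl := staticTenantLimits{"tenant-a": &Limits{MaxQueryLength: "7d"}}
	defaults := &Limits{MaxQueryLength: "30d"}

	for _, user := range []string{"tenant-a", "tenant-b"} {
		l, err := resolveLimits(tl, defaults, user)
		if err != nil {
			panic(err)
		}
		fmt.Println(user, l.MaxQueryLength)
	}
}
```

In this sketch a tenant without an override falls through to the defaults, and the error path is only reached when no defaults exist either.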

Performance Considerations

  • ✅ Efficient O(1) allowlist lookup using map
  • ✅ Direct JSON marshaling of validation.Limits (has proper json tags)
  • ✅ No unnecessary data copying or transformations
  • ✅ Lightweight endpoint suitable for frequent polling
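
To make the map-based allowlist lookup concrete, here is a minimal sketch of that kind of filtering. It is not the PR's `filterLimitFields`; `filterByAllowlist` and the sample field values are assumptions made for illustration.

```go
package main

import "fmt"

// filterByAllowlist is a hypothetical helper: it keeps only the limit
// fields whose names appear in the allowlist. The allowlist is turned
// into a set so each membership check is a constant-time map lookup.
func filterByAllowlist(limits map[string]any, allowed []string) map[string]any {
	allow := make(map[string]struct{}, len(allowed))
	for _, name := range allowed {
		allow[name] = struct{}{}
	}

	out := make(map[string]any)
	for k, v := range limits {
		if _, ok := allow[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	limits := map[string]any{
		"max_query_length":   "30d1h",
		"max_query_lookback": "0s",
		"ingestion_rate_mb":  4, // illustrative field that is not allowlisted here
	}
	filtered := filterByAllowlist(limits, []string{"max_query_length", "max_query_lookback"})
	fmt.Println(len(filtered))
}
```

Fields absent from the allowlist are dropped silently, so an operator controls what a tenant can discover by editing the list alone.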

Pattern Integration

  • ✅ Clean integration with t.Cfg.Pattern.Enabled
  • ✅ Provides necessary config for Drilldown UI decisions

🚦 Minor Observations

Version Handling (pkg/loki/config_handler.go:226-229)

version := build.GetVersion().Version
if version == "" {
    version = "unknown"
}

Good defensive programming - handles edge cases where version might be empty.

Error Messages

All error messages are clear and actionable:

  • "No default limits configured" (line 213)
  • Clear tenant extraction errors propagated from dskit

🎯 Security Assessment

✅ No Security Concerns Identified

  • Uses existing tenant isolation mechanisms
  • Respects configured field allowlists
  • No credential exposure risk
  • Same authorization model as existing config endpoints
  • No potential for privilege escalation

📊 Test Coverage Analysis

The test suite is exemplary with:

  • Multiple test files covering different aspects
  • 527 lines of comprehensive test code
  • Edge case coverage including missing tenants, empty configs
  • Mock implementations properly simulating real behavior
  • Table-driven tests for maintainability
  • Response validation checking both structure and content

🎯 Recommendation

✅ APPROVED - This PR is well-implemented and ready for merge.

Why This PR Succeeds:

  1. Addresses real need: Solves Drilldown's path accessibility issue vs /config/tenant/v1/limits
  2. Future-proof: Extensible for additional config beyond just limits
  3. Consistent: Follows established Loki patterns and conventions
  4. Thoroughly tested: Comprehensive test coverage with edge cases
  5. Secure: Proper tenant isolation and field filtering
  6. Performant: Efficient implementation suitable for production use

No changes required - this implementation demonstrates good software engineering practices and is ready for production deployment.


@trevorwhitney
Collaborator Author

@claude are you there? this is a test

@claude

claude bot commented Aug 29, 2025

Claude finished @trevorwhitney's task —— View job


Yes, I'm here and working!

Todo List

  • Respond to test message

Test confirmed - Claude Code is operational and ready to help with your Loki development tasks!


@gtk-grafana
Contributor

gtk-grafana commented Sep 3, 2025

@trevorwhitney I'm not seeing volume_enabled getting pulled through when running locally. Is this something we're able to add to this PR?

@gtk-grafana
Contributor

A few more questions I asked in Slack that might be nice to have greater visibility on:

All of the limits we’re returning (i.e. max_query_length, max_query_lookback, max_query_range) seem to be duration, are there limits associated with bytes processed that we might want to expose?

pattern_persistence_enabled is in limits, but pattern_ingester_enabled is a top level property, is this on purpose?

otlp_config contains AttributesConfig properties that appear identical, same action, same attributes.
Why are there two entries? What does it mean?

{
  "otlp_config": {
      "LogAttributes": null,
      "ResourceAttributes": {
          "AttributesConfig": [
              {
                  "Action": "index_label",
                  "Attributes": [
                      "service.name",
                      "service.namespace",
                      "service.instance.id",
                      "deployment.environment",
                      "deployment.environment.name",
                      "cloud.region",
                      "cloud.availability_zone",
                      "k8s.cluster.name",
                      "k8s.namespace.name",
                      "k8s.pod.name",
                      "k8s.container.name",
                      "container.name",
                      "k8s.replicaset.name",
                      "k8s.deployment.name",
                      "k8s.statefulset.name",
                      "k8s.daemonset.name",
                      "k8s.cronjob.name",
                      "k8s.job.name"
                  ],
                  "Regex": ""
              },
              {
                  "Action": "index_label",
                  "Attributes": [
                      "service.name",
                      "service.namespace",
                      "service.instance.id",
                      "deployment.environment",
                      "deployment.environment.name",
                      "cloud.region",
                      "cloud.availability_zone",
                      "k8s.cluster.name",
                      "k8s.namespace.name",
                      "k8s.pod.name",
                      "k8s.container.name",
                      "container.name",
                      "k8s.replicaset.name",
                      "k8s.deployment.name",
                      "k8s.statefulset.name",
                      "k8s.daemonset.name",
                      "k8s.cronjob.name",
                      "k8s.job.name"
                  ],
                  "Regex": ""
              }
          ],
          "IgnoreDefaults": false
      },
      "ScopeAttributes": null,
      "SeverityTextAsLabel": false
  }
}

@gtk-grafana
Contributor

All of the limits we’re returning (i.e. max_query_length, max_query_lookback, max_query_range) seem to be duration, are there limits associated with bytes processed that we might want to expose?

A: there are, but you'd need to make a case for them, because the argument to not expose them is that an operator may not want a user to discover certain limits on their queries, as they may request the operator to bump them

If discover_log_levels is false, can we assume that level is a label?

A: no, we can never assume anything about a customer's labeling strategy. discover log level was implemented to give us a consistent place we could assume level will be (detected_level).

pattern_persistence_enabled is in limits, but pattern_ingester_enabled is a top level property, is this on purpose?

A: yes, the latter is a Loki config, and the whole reason we needed an endpoint beyond the tenant limits endpoint

otlp_config contains AttributesConfig properties that appear identical, same action, same attributes.
Why are there two entries? What does it mean?

A: that's a good question, I'm not sure, I'd have to poke around a bit to see if this happens everywhere or is an artifact of your config.

@gtk-grafana
Contributor

@trevorwhitney can we add max_entries_limit_per_query?

@gtk-grafana
Contributor

And max_query_bytes_read?

@trevorwhitney
Collaborator Author

@gtk-grafana I added the extra properties to the default allow list. I also confirmed the duplicate otlp config you noticed. It looks like we're registering the defaults twice, here and here. I'm investigating why.

@trevorwhitney trevorwhitney force-pushed the dirlldown-config-endpoint branch from 23ee772 to c0d7feb Compare September 17, 2025 22:34
@github-actions
Contributor

github-actions bot commented Sep 17, 2025

💻 Deploy preview deleted.

@trevorwhitney
Collaborator Author

Chatted with @salvacorts about this one today. We're going to move the endpoint to something more Drilldown-specific to make it clearer what it's for (i.e. api/v1/drilldown-limits), and we'll try to consolidate the logic better with the existing limits endpoint.

Comment on lines -144 to -148
if t.TenantLimits == nil {
	http.Error(w, "Tenant configs not enabled", http.StatusNotFound)
	return
}

Collaborator Author

This logic was moved because previously the defaults weren't working unless a runtime config file was configured.

return
}
} else {
writeYAMLResponse(w, filteredLimits)
Contributor

super nit so feel free to disregard: it would be more idiomatic to return early and avoid this if-else

if !forDrilldown {
	writeYAMLResponse(w, filteredLimits)
	return
}

// Hereafter, we know the response should be for drilldown so we build the custom response
version := build.GetVersion().Version
if version == "" {
	version = "unknown"
}
...

Collaborator Author

yep, I agree, I'll make that change!

@trevorwhitney trevorwhitney merged commit 52b5d95 into main Oct 9, 2025
65 checks passed
@trevorwhitney trevorwhitney deleted the dirlldown-config-endpoint branch October 9, 2025 23:11
Labels

claude-review Requests a review from Claude Code when added size/XL

3 participants