fix: hypothesis always "(not specified)" in daily experiment report #34037
Co-authored-by: pelikhan <[email protected]>
…ote workflow files
The experiments analyze command was failing to find remote workflow .md files
because workflowFileCandidates only returned the sanitized experiment name
(e.g. "cicoach") while the actual file has hyphens ("ci-coach.md").
Add findRemoteWorkflowFilenameForExperiment which lists .github/workflows/ via
the GitHub API and finds the file whose sanitized basename matches the experiment
name — mirroring the local findWorkflowFileForExperiment. Update
loadRemoteExperimentConfigs to use it as the primary lookup, falling back to
the bare sanitized name as before.
Co-authored-by: pelikhan <[email protected]>
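The matching step described in this commit can be sketched in Go as below. This is a minimal, hypothetical version, not the PR's code. It assumes sanitization means lowercasing and stripping hyphens (as the PR description states), whereas the real `SanitizeWorkflowIDForCacheKey` may do more. `sanitizeID` and `matchWorkflowFilename` are illustrative names.

```go
package main

import (
	"log"
	"strings"
)

// sanitizeID is a hypothetical stand-in for SanitizeWorkflowIDForCacheKey.
// ASSUMPTION: sanitization lowercases and strips hyphens, per the PR description.
func sanitizeID(name string) string {
	return strings.ReplaceAll(strings.ToLower(name), "-", "")
}

// matchWorkflowFilename returns the basename (without ".md") of the first
// file whose sanitized basename equals experimentName, or "" if none match.
// When several files collide on the same sanitized name, it logs all of them.
func matchWorkflowFilename(filenames []string, experimentName string) string {
	var matches []string
	for _, f := range filenames {
		if !strings.HasSuffix(f, ".md") {
			continue // skip lock files and other non-markdown entries
		}
		base := strings.TrimSuffix(f, ".md")
		if sanitizeID(base) == experimentName {
			matches = append(matches, base)
		}
	}
	if len(matches) == 0 {
		return ""
	}
	if len(matches) > 1 {
		log.Printf("experiment %q matches multiple workflows: %s; using %q",
			experimentName, strings.Join(matches, ", "), matches[0])
	}
	return matches[0]
}
```

Returning the first match keeps the result deterministic for a given directory order. Logging the rest still surfaces collisions such as `ci-coach.md` vs `cicoach.md`.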
Address code review: extract filename-matching logic into testable helper, add warning log for ambiguous matches, and add unit tests covering all real workflow name cases from the experiment report. Co-authored-by: pelikhan <[email protected]>
Pull request overview
Fixes gh aw experiments analyze --repo failing to load remote workflow frontmatter when the experiment branch suffix is a sanitized workflow ID (hyphens removed), causing hypothesis/config overrides to always fall back to defaults.
Changes:
- Add a remote workflow filename resolver that lists `.github/workflows` via the GitHub contents API and matches the real `.md` filename by comparing sanitized basenames.
- Extract the matching logic into a pure helper (`matchWorkflowFilenameByExperiment`) and add unit tests for matching and ambiguous collisions.
- Regenerate/update workflow lock files (notably `smoke-temporary-id.lock.yml`) with large, unrelated workflow/template/version changes.
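For the first change: when the requested path is a directory, the contents API returns a JSON array of entries. The sketch below turns that response into `.md` filenames. It assumes the raw body from `gh api repos/{owner}/{repo}/contents/.github/workflows` is available as bytes; how the PR actually invokes and decodes it is not shown here. `contentEntry` and `markdownFilenames` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// contentEntry holds the subset of fields used from each item in a
// GitHub contents API directory listing.
type contentEntry struct {
	Name string `json:"name"`
	Type string `json:"type"` // e.g. "file" or "dir"
}

// markdownFilenames decodes a directory listing and returns the names of
// regular files ending in ".md", preserving the order the API returned.
func markdownFilenames(body []byte) ([]string, error) {
	var entries []contentEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		// A non-array body (e.g. an error object) lands here.
		return nil, fmt.Errorf("parse workflow directory listing: %w", err)
	}
	var names []string
	for _, e := range entries {
		if e.Type == "file" && strings.HasSuffix(e.Name, ".md") {
			names = append(names, e.Name)
		}
	}
	return names, nil
}
```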
| File | Description |
|---|---|
| pkg/cli/experiments_command.go | Resolves the correct remote workflow .md filename before fetching/parsing frontmatter configs. |
| pkg/cli/experiments_analyze_statistics_test.go | Adds tests for the new filename matching helper and updates candidate fallback expectations. |
| .github/workflows/smoke-temporary-id.lock.yml | Large regenerated lock workflow diff (template/version/step changes) unrelated to the CLI bugfix. |
| .github/workflows/ai-moderator.lock.yml | Minor change to GH_AW_SKIP_BOTS list ordering/contents. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 2
// Prepend the resolved name so it is tried before the bare sanitized form.
candidates = append([]string{resolved}, candidates...)
| # gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"f53e3dc40b0ce0efcc0d2598bbbfac5c5aaf180ae5eec48b8e0d8cea9c56f62e","strict":true,"agent_id":"copilot"} | ||
| # gh-aw-manifest: {"version":1,"secrets":["COPILOT_GITHUB_TOKEN","GH_AW_GITHUB_MCP_SERVER_TOKEN","GH_AW_GITHUB_TOKEN","GH_AW_OTEL_GRAFANA_AUTHORIZATION","GH_AW_OTEL_GRAFANA_ENDPOINT","GH_AW_OTEL_SENTRY_AUTHORIZATION","GH_AW_OTEL_SENTRY_ENDPOINT","GITHUB_TOKEN"],"actions":[{"repo":"actions/checkout","sha":"de0fac2e4500dabe0009e67214ff5f5447ce83dd","version":"v6.0.2"},{"repo":"actions/download-artifact","sha":"3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c","version":"v8.0.1"},{"repo":"actions/github-script","sha":"3a2844b7e9c422d3c10d287c895573f7108da1b3","version":"v9.0.0"},{"repo":"actions/setup-node","sha":"48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e","version":"v6.4.0"},{"repo":"actions/upload-artifact","sha":"043fb46d1a93c77aae656e7c1c64a875d1fc6a0a","version":"v7.0.1"}],"containers":[{"image":"ghcr.io/github/gh-aw-firewall/agent:0.25.51"},{"image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.25.51"},{"image":"ghcr.io/github/gh-aw-firewall/squid:0.25.51"},{"image":"ghcr.io/github/gh-aw-mcpg:v0.3.17"},{"image":"ghcr.io/github/github-mcp-server:v1.0.4"},{"image":"node:lts-alpine","digest":"sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f","pinned_image":"node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f"}]} | ||
🧪 Test Quality Sentinel completed test quality analysis.
✅ Design Decision Gate 🏗️ completed the design decision gate check.
✅ PR Code Quality Reviewer completed the code quality review.
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅
Solid fix for remote experiment config resolution
The core logic correctly addresses the root cause: sanitized experiment names (hyphens removed) couldn't match hyphenated workflow filenames. The directory listing + sanitization match approach mirrors the local lookup pattern and is well-tested.
Review highlights
Strengths:
- ✅ Comprehensive test coverage including edge cases (ambiguous names, no matches)
- ✅ Proper error handling with graceful degradation to fallback
- ✅ Clear function separation (`findRemoteWorkflowFilenameForExperiment`, `matchWorkflowFilenameByExperiment`)
- ✅ Good documentation explaining the sanitization reversibility problem
Minor improvements suggested:
- Deduplicate candidates to avoid wasteful double-fetch when resolved == experimentName
- Log first fetch failure for better debugging
Lockfile changes:
- Bot name reordering and workflow regeneration are standard artifacts - no concerns
🔎 Code quality review by PR Code Quality Reviewer · ● 813.9K
Comments that could not be inline-anchored
pkg/cli/experiments_command.go:366
Potential duplicate candidate entry: When findRemoteWorkflowFilenameForExperiment returns a basename that equals experimentName (e.g., a file named exactly "nohyphen.md"), line 366 prepends a duplicate entry. This wastes one API call attempt.
<details>
<summary>💡 Suggested fix</summary>
if resolved := findRemoteWorkflowFilenameForExperiment(repoOverride, experimentName); resolved != "" && resolved != experimentName {
// Prepend the resolved name so it is tried before the ba…
</details>
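The suggested `resolved != experimentName` guard handles the single-fallback case. A slightly more general sketch deduplicates while preserving order, assuming the fallback list could grow beyond one entry. `buildCandidates` is a hypothetical helper.

```go
package main

// buildCandidates returns the lookup order: the resolved filename (if any)
// first, then the fallback candidates. It skips empty and duplicate entries
// so that no filename is fetched twice.
func buildCandidates(resolved string, fallback []string) []string {
	seen := make(map[string]bool, len(fallback)+1)
	out := make([]string, 0, len(fallback)+1)
	for _, c := range append([]string{resolved}, fallback...) {
		if c == "" || seen[c] {
			continue
		}
		seen[c] = true
		out = append(out, c)
	}
	return out
}
```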
<details><summary>pkg/cli/experiments_command.go:379</summary>
**Silent error swallowing in loop**: When API fetch or parsing fails for a candidate, the error is silently ignored (`continue` on lines 379, 386, 391, 396). While this fallback behavior is intentional, consider logging at least the first failure for debugging.
<details>
<summary>💡 Suggested improvement</summary>
```go
for i, candidate := range candidates {
	apiPath := ".github/workflows/" + candidate + ".md"
	args := []string{"api",
		"repos/{owner}/{repo}/contents/" + url.PathE…
```
</details>
</details>
Merge main, recompile, lint go, lint js, review all comments
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
🏗️ Design Decision Gate: ADR Required
This PR adds 105 new lines of code in business logic directories (above the 100-line default threshold) but does not link to an Architecture Decision Record (ADR).
📄 Draft ADR committed:
📋 What to do next
Once the ADR link appears in the PR body, this gate will re-run and verify the implementation matches the decision.
❓ Why an ADR for a bug fix?
This is borderline: it's framed as a bug fix, but the chosen remedy commits the codebase to a particular resolution strategy (directory listing) over realistic alternatives (combinatorial candidate generation, a static map, or fixing the sanitizer itself). That kind of "which approach do we lock in?" choice is exactly what an ADR exists to record, so that a future contributor renaming a workflow or auditing API-call budgets understands why
If on review you decide this truly doesn't warrant an ADR (e.g. it's purely a regression fix of a previously documented design), you may mark the draft
📋 Michael Nygard ADR Format Reference
An ADR must contain these four narrative sections plus a normative specification:
All ADRs are stored in
References:
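The gate lists "combinatorial candidate generation" as one alternative. Its cost can be made concrete: re-inserting hyphens into a sanitized name has 2^(n-1) possibilities for an n-character name. Even that cannot recover letter case lost to lowercasing. The hypothetical sketch below assumes ASCII names, since it slices bytes.

```go
package main

// hyphenVariants returns every way to re-insert single hyphens between the
// characters of a sanitized name, illustrating a hypothetical combinatorial
// candidate generator. A name of n characters yields 2^(n-1) variants.
func hyphenVariants(name string) []string {
	if len(name) <= 1 {
		return []string{name}
	}
	var out []string
	for _, rest := range hyphenVariants(name[1:]) {
		// Either join the first byte directly, or separate it with a hyphen.
		out = append(out, name[:1]+rest, name[:1]+"-"+rest)
	}
	return out
}
```

For `cicoach` this yields 64 paths to probe, each a separate API request, compared with a single directory listing.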
Skills-Based Review 🧠
Applied /diagnose and /tdd — approving with minor suggestions on error handling and test coverage.
📋 Review Summary
Fix Quality
✅ Root cause addressed: The fix correctly mirrors the local lookup logic (findWorkflowFileForExperiment) for remote repos
✅ Clean extraction: matchWorkflowFilenameByExperiment is a well-isolated pure helper
✅ Good test coverage: New tests validate the core resolution logic with multiple hyphenation scenarios
Key Themes
- Error context — API failure logs could include path context for easier debugging
- Edge case tests — API failure and empty directory scenarios lack test coverage
- Log noise — Ambiguous match warning may repeat on every run if legitimately ambiguous files exist
Positive Highlights
✅ Excellent diagnostic example in the PR description (before/after candidate lists)
✅ Ambiguous collision warning prevents silent wrong-file selection
✅ Fallback candidate list maintained for graceful degradation when API unavailable
✅ Test names are descriptive and read as specifications
This is a solid bug fix with thoughtful fallback behavior. The suggestions above are minor refinements around error context and test completeness — none are blockers.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 813.2K
// GitHub API and returns the basename (without .md) of the first file whose sanitized name
// matches experimentName. This mirrors findWorkflowFileForExperiment for remote repos.
// Returns "" when the directory cannot be listed or no match is found.
func findRemoteWorkflowFilenameForExperiment(repoOverride, experimentName string) string {
[/diagnose] Missing error context: when the API call fails, the logged error lacks the repository name, which makes debugging harder in multi-repo scenarios.
💡 Suggested improvement
if err != nil {
	experimentsLog.Printf("Failed to list remote workflow files from %s (.github/workflows): %v", repoOverride, err)
	return ""
}
Adding the path context helps when debugging GitHub API permission or network issues.
// TestMatchWorkflowFilenameByExperimentAmbiguous verifies that the first match is returned
// and a warning is logged when multiple files share the same sanitized name.
func TestMatchWorkflowFilenameByExperimentAmbiguous(t *testing.T) {
[/tdd] Test coverage gap: API failure scenarios — findRemoteWorkflowFilenameForExperiment calls the GitHub API but no test verifies error handling when the API fails.
💡 Suggested test cases
Consider adding tests for:
func TestFindRemoteWorkflowFilenameForExperiment_APIFailure(t *testing.T) {
	// Test with invalid repo format
	result := findRemoteWorkflowFilenameForExperiment("invalid-repo", "cicoach")
	assert.Empty(t, result, "should return empty on API failure")
}
func TestFindRemoteWorkflowFilenameForExperiment_EmptyDirectory(t *testing.T) {
	// Mock scenario where directory exists but has no .md files
	// (may require refactoring to inject the API call)
}
These edge cases ensure graceful degradation when the GitHub API is unavailable.
// Build the candidate list. First, use the directory listing to find the exact filename
// whose sanitized basename matches experimentName (e.g. "ci-coach" for "cicoach").
// Fall back to the bare experiment name if the listing is unavailable.
candidates := workflowFileCandidates(experimentName)
[/diagnose] Resolution order reads as inverted: the resolved filename is prepended only after the workflowFileCandidates() call, while the comment says it is "tried before". The prepend does make resolved the first candidate, which is correct; the concern is readability, not behavior.
💡 Clarify intent or reorder
Current flow:
candidates := workflowFileCandidates(experimentName) // ["cicoach"]
if resolved := ...; resolved != "" {
	candidates = append([]string{resolved}, candidates...) // ["ci-coach", "cicoach"]
}
This is correct but reads awkwardly. Consider:
Option A: Initialize empty, build in order:
var candidates []string
if resolved := findRemoteWorkflowFilenameForExperiment(repoOverride, experimentName); resolved != "" {
	candidates = append(candidates, resolved)
}
candidates = append(candidates, workflowFileCandidates(experimentName)...)
Option B: Keep current but clarify comment:
-// Prepend the resolved name so it is tried before the bare sanitized form.
+// Prepend the resolved name so it is tried first (before fallback candidates).
experimentsLog.Printf("Failed to parse remote workflow file listing: %v", err)
return ""
}
[/diagnose] Edge case: empty filenames list — when the directory exists but contains no .md files, this returns "" silently.
💡 Consider adding defensive logging
func matchWorkflowFilenameByExperiment(filenames []string, experimentName string) string {
	if len(filenames) == 0 {
		experimentsLog.Printf("No workflow files provided for matching experiment %q", experimentName)
		return ""
	}
	var matches []string
	// ... rest of function
}
This would help distinguish "no files in directory" from "files present but no match" when debugging.
}{
	{"myworkflow", "myworkflow"},
	{"daily-report", "daily-report"},
	{"dailyreport", "dailyreport"},
[/tdd] Assertion message missing — test checks presence but does not explain why the candidate should be in the list.
💡 Add descriptive assertion message
found := false
for _, c := range candidates {
	if c == tt.wantContains {
		found = true
		break
	}
}
assert.True(t, found, "candidates should include %q as fallback when directory listing unavailable", tt.wantContains)
Descriptive messages help future maintainers understand test intent when they fail.
}
if len(matches) == 0 {
	return ""
}
[/diagnose] Ambiguous match warning may fire on every analysis run — if two files legitimately map to the same sanitized name (e.g., "ci-coach.md" and "cicoach.md"), this warning appears in logs repeatedly.
💡 Consider severity or deduplication
Options:
- Log once per session using a simple cache:
var warnedAmbiguous = make(map[string]bool)
if len(matches) > 1 {
	key := experimentName
	if !warnedAmbiguous[key] {
		experimentsLog.Printf("...", experimentName, strings.Join(matches, ", "))
		warnedAmbiguous[key] = true
	}
}
- Treat as debug-level if this is expected in some repos:
experimentsLog.Printf("[DEBUG] Ambiguous experiment name...")
Repeated warnings for expected conditions can obscure real issues in logs.
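The package-level map in the first option is not safe if analyses ever run concurrently. The sketch below shows a once-per-name guard protected by a mutex. It assumes the standard `log` package in place of `experimentsLog`, and `ambiguityWarner` is hypothetical.

```go
package main

import (
	"log"
	"strings"
	"sync"
)

// ambiguityWarner logs each ambiguous experiment name at most once per process.
// A mutex guards the map so concurrent analyses cannot race on it.
type ambiguityWarner struct {
	mu     sync.Mutex
	warned map[string]bool
}

// warn logs the collision the first time experimentName is seen and
// reports whether a log line was written.
func (w *ambiguityWarner) warn(experimentName string, matches []string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.warned == nil {
		w.warned = make(map[string]bool)
	}
	if w.warned[experimentName] {
		return false
	}
	w.warned[experimentName] = true
	log.Printf("ambiguous experiment name %q matches: %s", experimentName, strings.Join(matches, ", "))
	return true
}
```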
🧪 Test Quality Sentinel Report
✅ Test Quality Score: 100/100 (Excellent)
📊 Metrics & Test Classification (2 tests analyzed)
Test Classification Details
Language Support
Tests analyzed:
Verdict
📖 Understanding Test Classifications
Design Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.
…ypothesis-not-found Co-authored-by: pelikhan <[email protected]>
- Merge origin/main
- Skip duplicate candidate when findRemoteWorkflowFilenameForExperiment returns the same value as experimentName (avoids redundant API fetch)
- Fix gofmt formatting in experiments_analyze_statistics_test.go
- Fix JS prettier formatting in actions/setup/js/send_otlp_span.cjs
- Recompile all 234 workflow lock files
Co-authored-by: pelikhan <[email protected]>
…thub.com/github/gh-aw into copilot/investigate-hypothesis-not-found Co-authored-by: pelikhan <[email protected]>
Done:
`gh aw experiments analyze --repo` was silently failing to load workflow frontmatter for every experiment, so the `hypothesis`, `analysis_type`, `guardrail_metrics`, and `min_samples` overrides all fell back to defaults.
Root cause:
`loadRemoteExperimentConfigs` tried to fetch `.github/workflows/<experimentName>.md`, where `experimentName` is the sanitized branch suffix: hyphens stripped, lowercased (e.g. `experiments/cicoach` → tries `cicoach.md`). The actual file is `ci-coach.md`, so every API call returned 404 and the function returned `nil`.
Fix:
- `findRemoteWorkflowFilenameForExperiment`: lists `.github/workflows/` via the GitHub contents API, then finds the `.md` file whose `SanitizeWorkflowIDForCacheKey(basename)` matches the experiment name, mirroring what `findWorkflowFileForExperiment` already does for local lookups.
- `matchWorkflowFilenameByExperiment` (pure helper; logs a warning on ambiguous collisions).
- `loadRemoteExperimentConfigs` calls the new resolver first; the bare sanitized name is kept as a last-resort fallback.

pr-sous-chef: branch updated from https://github.com/github/gh-aw/actions/runs/26296266826