fix: hypothesis always "(not specified)" in daily experiment report#34037

Merged: pelikhan merged 7 commits into main from copilot/investigate-hypothesis-not-found on May 22, 2026
Copilot AI commented May 22, 2026

gh aw experiments analyze --repo was silently failing to load workflow frontmatter for every experiment, so hypothesis, analysis_type, guardrail_metrics, and min_samples overrides all fell back to defaults.

Root cause: loadRemoteExperimentConfigs tried to fetch .github/workflows/<experimentName>.md where experimentName is the sanitized branch suffix — hyphens stripped, lowercased (e.g. experiments/cicoach → tries cicoach.md). The actual file is ci-coach.md, so every API call 404'd and the function returned nil.

Fix:

  • Add findRemoteWorkflowFilenameForExperiment: lists .github/workflows/ via the GitHub contents API, then finds the .md file whose SanitizeWorkflowIDForCacheKey(basename) matches the experiment name — mirroring what findWorkflowFileForExperiment already does for local lookups.
  • Extract the scan loop into matchWorkflowFilenameByExperiment (pure helper; logs a warning on ambiguous collisions).
  • loadRemoteExperimentConfigs calls the new resolver first; bare sanitized name kept as last-resort fallback.
// Before: only tried "cicoach.md" → 404
candidates := workflowFileCandidates("cicoach")  // → ["cicoach"]

// After: directory listing finds "ci-coach.md" first
resolved := findRemoteWorkflowFilenameForExperiment(repo, "cicoach")  // → "ci-coach"
candidates = append([]string{resolved}, candidates...)
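
The matching step described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `sanitizeID` is a hypothetical stand-in for `SanitizeWorkflowIDForCacheKey` that assumes only the behavior stated in the root cause (lowercase, hyphens stripped), and `matchByExperiment` is an illustrative name.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeID is a hypothetical stand-in for SanitizeWorkflowIDForCacheKey.
// Assumption: the real sanitizer lowercases and strips hyphens; it may do more.
func sanitizeID(name string) string {
	return strings.ReplaceAll(strings.ToLower(name), "-", "")
}

// matchByExperiment returns the basename (without ".md") of the first file
// whose sanitized basename equals experimentName, or "" if none matches.
func matchByExperiment(filenames []string, experimentName string) string {
	for _, f := range filenames {
		if !strings.HasSuffix(f, ".md") {
			continue // skip lock files and other non-markdown entries
		}
		base := strings.TrimSuffix(f, ".md")
		if sanitizeID(base) == experimentName {
			return base
		}
	}
	return ""
}

func main() {
	files := []string{"ai-moderator.md", "ci-coach.md", "ci-coach.lock.yml"}
	fmt.Println(matchByExperiment(files, "cicoach")) // prints "ci-coach"
}
```

Because sanitization is lossy, the lookup has to go from real filenames to sanitized IDs rather than the other way around, which is why a directory listing is needed.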

pr-sous-chef: branch updated from https://github.com/github/gh-aw/actions/runs/26296266826

Generated by 👨‍🍳 PR Sous Chef · ● 5.2M ·

Copilot AI and others added 3 commits May 22, 2026 14:49
Co-authored-by: pelikhan <[email protected]>
…ote workflow files

The experiments analyze command was failing to find remote workflow .md files
because workflowFileCandidates only returned the sanitized experiment name
(e.g. "cicoach") while the actual file has hyphens ("ci-coach.md").

Add findRemoteWorkflowFilenameForExperiment which lists .github/workflows/ via
the GitHub API and finds the file whose sanitized basename matches the experiment
name — mirroring the local findWorkflowFileForExperiment. Update
loadRemoteExperimentConfigs to use it as the primary lookup, falling back to
the bare sanitized name as before.

Co-authored-by: pelikhan <[email protected]>
Address code review: extract filename-matching logic into testable helper,
add warning log for ambiguous matches, and add unit tests covering all
real workflow name cases from the experiment report.

Co-authored-by: pelikhan <[email protected]>
Copilot AI changed the title from "fix: resolve hypothesis not shown in experiment report" to "fix: hypothesis always "(not specified)" in daily experiment report" on May 22, 2026
Copilot AI requested a review from pelikhan May 22, 2026 14:55
@pelikhan pelikhan marked this pull request as ready for review May 22, 2026 14:57
Copilot AI review requested due to automatic review settings May 22, 2026 14:57
Copilot AI left a comment

Pull request overview

Fixes gh aw experiments analyze --repo failing to load remote workflow frontmatter when the experiment branch suffix is a sanitized workflow ID (hyphens removed), causing hypothesis/config overrides to always fall back to defaults.

Changes:

  • Add a remote workflow filename resolver that lists .github/workflows via the GitHub contents API and matches the real .md filename by comparing sanitized basenames.
  • Extract matching logic into a pure helper (matchWorkflowFilenameByExperiment) and add unit tests for matching + ambiguous collisions.
  • Regenerate/update workflow lock files (notably smoke-temporary-id.lock.yml) with large, unrelated workflow/template/version changes.
Summary per file:

| File | Description |
| --- | --- |
| pkg/cli/experiments_command.go | Resolves the correct remote workflow .md filename before fetching/parsing frontmatter configs. |
| pkg/cli/experiments_analyze_statistics_test.go | Adds tests for the new filename matching helper and updates candidate fallback expectations. |
| .github/workflows/smoke-temporary-id.lock.yml | Large regenerated lock workflow diff (template/version/step changes) unrelated to the CLI bugfix. |
| .github/workflows/ai-moderator.lock.yml | Minor change to GH_AW_SKIP_BOTS list ordering/contents. |
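
To illustrate the directory-listing step, here is a hedged sketch of turning a GitHub contents API directory response into candidate basenames. The `name` and `type` fields are part of the contents API response for directories; `contentEntry` and `markdownBasenames` are hypothetical names, and the actual PR code may decode the response differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// contentEntry holds the two fields this sketch reads from each element
// of a contents API directory listing.
type contentEntry struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// markdownBasenames decodes a directory listing and returns the basenames
// (without ".md") of the regular files that end in ".md".
func markdownBasenames(listing []byte) ([]string, error) {
	var entries []contentEntry
	if err := json.Unmarshal(listing, &entries); err != nil {
		return nil, fmt.Errorf("parse workflow listing: %w", err)
	}
	var names []string
	for _, e := range entries {
		if e.Type == "file" && strings.HasSuffix(e.Name, ".md") {
			names = append(names, strings.TrimSuffix(e.Name, ".md"))
		}
	}
	return names, nil
}

func main() {
	sample := []byte(`[
		{"name": "ci-coach.md", "type": "file"},
		{"name": "ci-coach.lock.yml", "type": "file"},
		{"name": "shared", "type": "dir"}
	]`)
	names, err := markdownBasenames(sample)
	fmt.Println(names, err) // prints "[ci-coach] <nil>"
}
```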

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 2

Comment on lines +365 to +366
// Prepend the resolved name so it is tried before the bare sanitized form.
candidates = append([]string{resolved}, candidates...)
Comment on lines +1 to 5
# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"f53e3dc40b0ce0efcc0d2598bbbfac5c5aaf180ae5eec48b8e0d8cea9c56f62e","strict":true,"agent_id":"copilot"}
# gh-aw-manifest: {"version":1,"secrets":["COPILOT_GITHUB_TOKEN","GH_AW_GITHUB_MCP_SERVER_TOKEN","GH_AW_GITHUB_TOKEN","GH_AW_OTEL_GRAFANA_AUTHORIZATION","GH_AW_OTEL_GRAFANA_ENDPOINT","GH_AW_OTEL_SENTRY_AUTHORIZATION","GH_AW_OTEL_SENTRY_ENDPOINT","GITHUB_TOKEN"],"actions":[{"repo":"actions/checkout","sha":"de0fac2e4500dabe0009e67214ff5f5447ce83dd","version":"v6.0.2"},{"repo":"actions/download-artifact","sha":"3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c","version":"v8.0.1"},{"repo":"actions/github-script","sha":"3a2844b7e9c422d3c10d287c895573f7108da1b3","version":"v9.0.0"},{"repo":"actions/setup-node","sha":"48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e","version":"v6.4.0"},{"repo":"actions/upload-artifact","sha":"043fb46d1a93c77aae656e7c1c64a875d1fc6a0a","version":"v7.0.1"}],"containers":[{"image":"ghcr.io/github/gh-aw-firewall/agent:0.25.51"},{"image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.25.51"},{"image":"ghcr.io/github/gh-aw-firewall/squid:0.25.51"},{"image":"ghcr.io/github/gh-aw-mcpg:v0.3.17"},{"image":"ghcr.io/github/github-mcp-server:v1.0.4"},{"image":"node:lts-alpine","digest":"sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f","pinned_image":"node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f"}]}
# ___ _ _
# / _ \ | | (_)
# | |_| | __ _ ___ _ __ | |_ _ ___
@github-actions

@copilot review all comments and address the unresolved review feedback on this PR.

Generated by 👨‍🍳 PR Sous Chef · ● 5.2M ·

@github-actions

@copilot please rebase onto main, rerun checks, and summarize any remaining blockers.

Generated by 👨‍🍳 PR Sous Chef · ● 5.2M ·

@github-actions

github-actions Bot commented May 22, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented May 22, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

@github-actions

github-actions Bot commented May 22, 2026

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented May 22, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions github-actions Bot left a comment

Solid fix for remote experiment config resolution

The core logic correctly addresses the root cause: sanitized experiment names (hyphens removed) couldn't match hyphenated workflow filenames. The directory listing + sanitization match approach mirrors the local lookup pattern and is well-tested.

Review highlights

Strengths:

  • ✅ Comprehensive test coverage including edge cases (ambiguous names, no matches)
  • ✅ Proper error handling with graceful degradation to fallback
  • ✅ Clear function separation (findRemoteWorkflowFilenameForExperiment, matchWorkflowFilenameByExperiment)
  • ✅ Good documentation explaining the sanitization reversibility problem

Minor improvements suggested:

  • Deduplicate candidates to avoid wasteful double-fetch when resolved == experimentName
  • Log first fetch failure for better debugging

Lockfile changes:

  • Bot name reordering and workflow regeneration are standard artifacts - no concerns

🔎 Code quality review by PR Code Quality Reviewer · ● 813.9K

Comments that could not be inline-anchored

pkg/cli/experiments_command.go:366

Potential duplicate candidate entry: When findRemoteWorkflowFilenameForExperiment returns a basename that equals experimentName (e.g., a file named exactly "nohyphen.md"), line 366 prepends a duplicate entry. This wastes one API call attempt.

💡 Suggested fix

if resolved := findRemoteWorkflowFilenameForExperiment(repoOverride, experimentName); resolved != "" && resolved != experimentName {
    // Prepend the resolved name so it is tried before the ba…

pkg/cli/experiments_command.go:379

Silent error swallowing in loop: When API fetch or parsing fails for a candidate, the error is silently ignored (continue on lines 379, 386, 391, 396). While this fallback behavior is intentional, consider logging at least the first failure for debugging.

💡 Suggested improvement

for i, candidate := range candidates {
    apiPath := ".github/workflows/" + candidate + ".md"
    args := []string{"api",
        "repos/{owner}/{repo}/contents/" + url.PathE…
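
One way to act on the "log at least the first failure" suggestion is sketched below. `firstLoadable`, `errNotFound`, and the `fetch` callback are hypothetical names introduced for illustration; the real loop in pkg/cli calls the GitHub API and parses frontmatter, which this sketch replaces with an injected function.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNotFound is a placeholder error used by the demo fetcher.
var errNotFound = errors.New("404 Not Found")

// firstLoadable tries each candidate in order with the supplied fetch
// function and returns the first one that loads. Only the first failure
// is logged, so the fallback stays quiet while still leaving a trace.
func firstLoadable(candidates []string, fetch func(string) error) (string, bool) {
	logged := false
	for _, c := range candidates {
		err := fetch(c)
		if err == nil {
			return c, true
		}
		if !logged {
			log.Printf("failed to load workflow candidate %q: %v", c, err)
			logged = true
		}
	}
	return "", false
}

func main() {
	// Hypothetical fetcher: only "ci-coach" exists remotely.
	fetch := func(name string) error {
		if name == "ci-coach" {
			return nil
		}
		return errNotFound
	}
	name, ok := firstLoadable([]string{"cicoach", "ci-coach"}, fetch)
	fmt.Println(name, ok) // prints "ci-coach true"
}
```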

@pelikhan

@copilot

Merge main, recompile, lint go, lint js, review all comments

@github-actions

🏗️ Design Decision Gate — ADR Required

This PR adds 105 new lines of code in business logic directories (above the 100-line default threshold) but does not link to an Architecture Decision Record (ADR).

📄 Draft ADR committed: docs/adr/34037-resolve-remote-workflow-filename-via-directory-listing.md — review and complete it before merging.

🔒 This PR cannot merge until an ADR is linked in the PR body.

📋 What to do next
  1. Review the draft ADR committed to your branch — it was generated from the PR diff and description. The narrative names three real alternatives (combinatorial hyphen reinsertion, static name→file map, change the sanitizer) and the RFC 2119 section locks in directory-listing-first resolution with the bare sanitized name as fallback.
  2. Verify what the AI couldn't infer — in particular, the Deciders field is Unknown, and the negative consequence about contents-API rate limits assumes typical repo sizes; confirm those reflect reality.
  3. Refine or correct any sections that don't match the actual decision history (e.g. if you considered a fourth alternative, or if the sanitizer change was genuinely on the table).
  4. Commit the finalized ADR on this branch (the file path is already created).
  5. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-34037: Resolve Remote Workflow Filenames via Directory Listing

Once the ADR link appears in the PR body, this gate will re-run and verify the implementation matches the decision.

❓ Why an ADR for a bug fix?

This is borderline — it's framed as a bug fix, but the chosen remedy commits the codebase to a particular resolution strategy (directory listing) over realistic alternatives (combinatorial candidate generation, a static map, or fixing the sanitizer itself). That kind of "which approach do we lock in?" choice is exactly what an ADR exists to record, so that a future contributor renaming a workflow or auditing API-call budgets understands why findRemoteWorkflowFilenameForExperiment exists and why workflowFileCandidates was demoted to a fallback.

If on review you decide this truly doesn't warrant an ADR (e.g. it's purely a regression fix of a previously documented design), you may mark the draft Status: Deprecated with a short justification rather than completing it.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four narrative sections plus a normative specification:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done? (≥2 genuine alternatives)
  • Consequences — What are the trade-offs (positive and negative)?
  • Normative Specification — RFC 2119 requirements that an implementation must satisfy.

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g. 34037-*.md for PR #34037).

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · ● 5.4M ·

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /diagnose and /tdd — approving with minor suggestions on error handling and test coverage.

📋 Review Summary

Fix Quality

Root cause addressed: The fix correctly mirrors the local lookup logic (findWorkflowFileForExperiment) for remote repos
Clean extraction: matchWorkflowFilenameByExperiment is a well-isolated pure helper
Good test coverage: New tests validate the core resolution logic with multiple hyphenation scenarios

Key Themes

  1. Error context — API failure logs could include path context for easier debugging
  2. Edge case tests — API failure and empty directory scenarios lack test coverage
  3. Log noise — Ambiguous match warning may repeat on every run if legitimately ambiguous files exist

Positive Highlights

✅ Excellent diagnostic example in the PR description (before/after candidate lists)
✅ Ambiguous collision warning prevents silent wrong-file selection
✅ Fallback candidate list maintained for graceful degradation when API unavailable
✅ Test names are descriptive and read as specifications


This is a solid bug fix with thoughtful fallback behavior. The suggestions above are minor refinements around error context and test completeness — none are blockers.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 813.2K

// GitHub API and returns the basename (without .md) of the first file whose sanitized name
// matches experimentName. This mirrors findWorkflowFileForExperiment for remote repos.
// Returns "" when the directory cannot be listed or no match is found.
func findRemoteWorkflowFilenameForExperiment(repoOverride, experimentName string) string {

[/diagnose] Missing error context — when the API call fails, the logged error lacks repository name for debugging in multi-repo scenarios.

💡 Suggested improvement
if err != nil {
    experimentsLog.Printf("Failed to list remote workflow files from %s (.github/workflows): %v", repoOverride, err)
    return ""
}

Adding the path context helps when debugging GitHub API permission or network issues.


// TestMatchWorkflowFilenameByExperimentAmbiguous verifies that the first match is returned
// and a warning is logged when multiple files share the same sanitized name.
func TestMatchWorkflowFilenameByExperimentAmbiguous(t *testing.T) {

[/tdd] Test coverage gap: API failure scenarios — findRemoteWorkflowFilenameForExperiment calls the GitHub API but no test verifies error handling when the API fails.

💡 Suggested test cases

Consider adding tests for:

func TestFindRemoteWorkflowFilenameForExperiment_APIFailure(t *testing.T) {
    // Test with invalid repo format
    result := findRemoteWorkflowFilenameForExperiment("invalid-repo", "cicoach")
    assert.Empty(t, result, "should return empty on API failure")
}

func TestFindRemoteWorkflowFilenameForExperiment_EmptyDirectory(t *testing.T) {
    // Mock scenario where directory exists but has no .md files
    // (may require refactoring to inject the API call)
}

These edge cases ensure graceful degradation when GitHub API is unavailable.
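
The refactoring hinted at in the second test stub (injecting the API call) could look roughly like this. It is a sketch under assumptions: `listWorkflowsFunc`, `resolveWorkflowName`, and `errFakeAPI` are hypothetical names, and the sanitizer is assumed to lowercase and strip hyphens as described in the PR summary.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errFakeAPI simulates a failed directory listing in the demo below.
var errFakeAPI = errors.New("simulated API failure")

// listWorkflowsFunc is a hypothetical seam that returns the filenames
// under .github/workflows/ for a repository.
type listWorkflowsFunc func(repo string) ([]string, error)

// resolveWorkflowName returns the real workflow basename for a sanitized
// experiment name, or "" when listing fails or nothing matches.
func resolveWorkflowName(list listWorkflowsFunc, repo, experimentName string) string {
	files, err := list(repo)
	if err != nil {
		return ""
	}
	for _, f := range files {
		if !strings.HasSuffix(f, ".md") {
			continue
		}
		base := strings.TrimSuffix(f, ".md")
		// Assumed sanitizer behavior: lowercase, hyphens removed.
		if strings.ReplaceAll(strings.ToLower(base), "-", "") == experimentName {
			return base
		}
	}
	return ""
}

func main() {
	failing := func(string) ([]string, error) { return nil, errFakeAPI }
	empty := func(string) ([]string, error) { return nil, nil }
	listing := func(string) ([]string, error) { return []string{"ci-coach.md"}, nil }

	fmt.Printf("%q %q %q\n",
		resolveWorkflowName(failing, "owner/repo", "cicoach"),
		resolveWorkflowName(empty, "owner/repo", "cicoach"),
		resolveWorkflowName(listing, "owner/repo", "cicoach"))
	// prints: "" "" "ci-coach"
}
```

With the listing injected, the API-failure and empty-directory cases become plain unit tests that need neither network access nor the gh CLI.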

// Build the candidate list. First, use the directory listing to find the exact filename
// whose sanitized basename matches experimentName (e.g. "ci-coach" for "cicoach").
// Fall back to the bare experiment name if the listing is unavailable.
candidates := workflowFileCandidates(experimentName)

[/diagnose] Resolution order reads as inverted: the resolved filename is prepended only after workflowFileCandidates() has already built the fallback list. The prepend does make resolved the first candidate, so the behavior is correct, but the list is constructed in the opposite order from the one in which candidates are tried.

💡 Clarify intent or reorder

Current flow:

candidates := workflowFileCandidates(experimentName)  // ["cicoach"]
if resolved := ...; resolved != "" {
    candidates = append([]string{resolved}, candidates...)  // ["ci-coach", "cicoach"]
}

This is correct but reads awkwardly. Consider:

Option A: Initialize empty, build in order:

var candidates []string
if resolved := findRemoteWorkflowFilenameForExperiment(repoOverride, experimentName); resolved != "" {
    candidates = append(candidates, resolved)
}
candidates = append(candidates, workflowFileCandidates(experimentName)...)

Option B: Keep current but clarify comment:

-// Prepend the resolved name so it is tried before the bare sanitized form.
+// Prepend the resolved name so it is tried first (before fallback candidates).

experimentsLog.Printf("Failed to parse remote workflow file listing: %v", err)
return ""
}


[/diagnose] Edge case: empty filenames list — when the directory exists but contains no .md files, this returns "" silently.

💡 Consider adding defensive logging
func matchWorkflowFilenameByExperiment(filenames []string, experimentName string) string {
    if len(filenames) == 0 {
        experimentsLog.Printf("No workflow files provided for matching experiment %q", experimentName)
        return ""
    }
    var matches []string
    // ... rest of function
}

This would help distinguish "no files in directory" from "files present but no match" when debugging.

}{
{"myworkflow", "myworkflow"},
{"daily-report", "daily-report"},
{"dailyreport", "dailyreport"},

[/tdd] Assertion message missing — test checks presence but does not explain why the candidate should be in the list.

💡 Add descriptive assertion message
found := false
for _, c := range candidates {
    if c == tt.wantContains {
        found = true
        break
    }
}
assert.True(t, found, "candidates should include %q as fallback when directory listing unavailable", tt.wantContains)

Descriptive messages help future maintainers understand test intent when they fail.

}
if len(matches) == 0 {
return ""
}

[/diagnose] Ambiguous match warning may fire on every analysis run — if two files legitimately map to the same sanitized name (e.g., "ci-coach.md" and "cicoach.md"), this warning appears in logs repeatedly.

💡 Consider severity or deduplication

Options:

  1. Log once per session using a simple cache:
var warnedAmbiguous = make(map[string]bool)
if len(matches) > 1 {
    key := experimentName
    if !warnedAmbiguous[key] {
        experimentsLog.Printf("...", experimentName, strings.Join(matches, ", "))
        warnedAmbiguous[key] = true
    }
}
  2. Treat as debug-level if this is expected in some repos:
experimentsLog.Printf("[DEBUG] Ambiguous experiment name...")

Repeated warnings for expected conditions can obscure real issues in logs.
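
The first option above uses a bare package-level map, which is not safe if the matcher is ever called from multiple goroutines. A mutex-guarded variant is sketched here; `onceWarner` and `shouldWarn` are hypothetical names, not part of this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// onceWarner remembers which keys have already produced a warning.
type onceWarner struct {
	mu   sync.Mutex
	seen map[string]bool
}

// shouldWarn reports true the first time key is seen and false afterwards.
func (w *onceWarner) shouldWarn(key string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.seen == nil {
		w.seen = make(map[string]bool)
	}
	if w.seen[key] {
		return false
	}
	w.seen[key] = true
	return true
}

func main() {
	var w onceWarner
	for i := 0; i < 3; i++ {
		if w.shouldWarn("cicoach") {
			fmt.Println("ambiguous experiment name: cicoach")
		}
	}
	// The warning line is printed once, not three times.
}
```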

@github-actions

🧪 Test Quality Sentinel Report

Test Quality Score: 100/100 — Excellent

Analyzed 2 test(s): 2 design, 0 implementation, 0 guideline violation(s).

📊 Metrics & Test Classification (2 tests analyzed)
| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 2 |
| ✅ Design tests (behavioral contracts) | 2 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 2 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No (0.75:1 ratio) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| TestMatchWorkflowFilenameByExperiment | pkg/cli/experiments_analyze_statistics_test.go:577 | ✅ Design | None (covers edge cases: not found, no hyphens) |
| TestMatchWorkflowFilenameByExperimentAmbiguous | pkg/cli/experiments_analyze_statistics_test.go:608 | ✅ Design | None (verifies ambiguous match behavior) |

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 2 tests — unit (//go:build !integration)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%).

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 1.1M ·

@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

Copilot AI and others added 2 commits May 22, 2026 16:16
- Merge origin/main
- Skip duplicate candidate when findRemoteWorkflowFilenameForExperiment
  returns the same value as experimentName (avoids redundant API fetch)
- Fix gofmt formatting in experiments_analyze_statistics_test.go
- Fix JS prettier formatting in actions/setup/js/send_otlp_span.cjs
- Recompile all 234 workflow lock files

Co-authored-by: pelikhan <[email protected]>
…thub.com/github/gh-aw into copilot/investigate-hypothesis-not-found

Co-authored-by: pelikhan <[email protected]>

Copilot AI commented May 22, 2026

@copilot

Merge main, recompile, lint go, lint js, review all comments

Done:

  • Merged origin/main
  • Recompiled all 234 workflow lock files (make recompile — 0 errors, 55 warnings all pre-existing)
  • Fixed gofmt issue in experiments_analyze_statistics_test.go
  • Fixed JS prettier issue in actions/setup/js/send_otlp_span.cjs
  • Addressed the duplicate-candidate review comment: resolved is now only prepended when it differs from experimentName
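
The duplicate-candidate rule in the last item can be sketched as a small helper. `buildCandidates` is a hypothetical name, and the sketch assumes, per the PR description, that the fallback list is just the bare sanitized experiment name.

```go
package main

import "fmt"

// buildCandidates puts the resolved filename first, but only when it is
// non-empty and differs from the bare experiment name, so the same file
// is never fetched twice.
func buildCandidates(resolved, experimentName string) []string {
	candidates := make([]string, 0, 2)
	if resolved != "" && resolved != experimentName {
		candidates = append(candidates, resolved)
	}
	return append(candidates, experimentName)
}

func main() {
	fmt.Println(buildCandidates("ci-coach", "cicoach"))  // prints "[ci-coach cicoach]"
	fmt.Println(buildCandidates("nohyphen", "nohyphen")) // prints "[nohyphen]"
	fmt.Println(buildCandidates("", "cicoach"))          // prints "[cicoach]"
}
```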

Copilot AI requested a review from pelikhan May 22, 2026 16:22
@pelikhan pelikhan merged commit 31ccffc into main May 22, 2026
@pelikhan pelikhan deleted the copilot/investigate-hypothesis-not-found branch May 22, 2026 17:27
3 participants