fix: re-fetch context files and skills from workspace on each turn #24360
kylecarbs left a comment
Good extraction of fetchWorkspaceContext — clean separation of fetch vs persist. The fallback chain (fresh workspace → persisted messages → captured closures) is well-layered.
A couple things to look at: 2 P2, 1 P3, 3 nits across 6 inline comments.
🤖 Generated by Coder Agents
    // re-injected via InsertSystem after compaction drops
    // those messages. No workspace dial needed.
    instruction = instructionFromContextFiles(messages)
    skills = persistedSkills
P2 Compaction mid-turn silently reverts fresh workspace content to stale persisted content. (Edge Case Analyst)
On subsequent turns, the g2 goroutine fetches fresh AGENTS.md from the workspace and sets instruction to (e.g.) "V2 content". This content is never persisted to the DB; only persistInstructionFiles writes to the DB. If compaction triggers mid-turn, ReloadMessages runs instructionFromContextFiles(reloadedMsgs), which reads from the DB. The DB still holds "V1 content" from the original persist. Since "V1" is non-empty, the fallback to the captured instruction ("V2") is never reached.
The priority is inverted here. When the turn already has fresh content in the captured instruction variable, that should take priority over the (potentially stale) DB content. Consider flipping the logic: prefer the captured instruction when non-empty, falling back to instructionFromContextFiles(reloadedMsgs) only when instruction is empty.
    if fetchErr == nil && len(freshParts) > 0 {
    instruction = formatSystemInstructionsFromParts(freshParts)
    skills = selectSkillMetasForInstructionRefresh(
    persistedSkills,
P2 Comment says "Workspace unreachable" but this branch also fires when the workspace IS reachable with zero parts. (Edge Case Analyst P2, Contract Auditor P2)
When fetchWorkspaceContext succeeds but the workspace legitimately returns no context-file parts (e.g. user deleted AGENTS.md), len(freshParts) is 0, so the else branch uses stale persisted content. The workspaceConnOK return value is discarded via _, so the caller cannot distinguish "workspace reachable, empty config" from "workspace dial failed." Deleting AGENTS.md from a workspace won't clear the injected instruction until the user creates a new chat.
Consider using the discarded workspaceConnOK return to distinguish the two cases: when agent != nil && workspaceConnOK && len(freshParts) == 0, set instruction = "" to honor the deletion. Also update the comment to match the actual semantics.
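The three-way decision suggested here can be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's code: fetchResult and chooseInstruction are hypothetical names, and the real fetchWorkspaceContext returns an agent, context-file parts and skills rather than a ready-made instruction string.

```go
package main

// fetchResult is a hypothetical, simplified view of what
// fetchWorkspaceContext reports: whether an agent was found, whether the
// connection succeeded, and the instruction built from any fresh
// context-file parts ("" when the workspace returned none).
type fetchResult struct {
	hasAgent         bool
	workspaceConnOK  bool
	freshInstruction string
}

// chooseInstruction sketches the suggested three-way branch.
// persisted stands in for instructionFromContextFiles(messages).
func chooseInstruction(r fetchResult, persisted string) string {
	switch {
	case r.hasAgent && r.workspaceConnOK && r.freshInstruction != "":
		// Workspace reachable and it has context files: use fresh content.
		return r.freshInstruction
	case r.hasAgent && r.workspaceConnOK:
		// Workspace reachable but no context files (e.g. AGENTS.md was
		// deleted): honor the deletion instead of reusing stale content.
		return ""
	default:
		// No agent or dial failed: persisted content is the best we have.
		return persisted
	}
}
```

Keeping the "reachable but empty" case separate from the default case is what lets a deletion take effect while a failed dial still falls back to persisted content.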
    return nil
    })
    } else if hasContextFiles {
    // On subsequent turns, extract the instruction text and
P3 Stale comment fragment from the old persistInstructionFiles left before the fetchWorkspaceContext doc comment. (Edge Case Analyst P3, Contract Auditor P3, Go Architect P3, Concurrency Reviewer, Style Reviewer)
The line reads "// persistInstructionFiles reads instruction files and discovers" immediately above the new "// fetchWorkspaceContext retrieves..." comment. The first line is a leftover from the old function's doc comment. Delete it.
    instruction = instructionFromContextFiles(messages)
    skills = persistedSkills
    }
    return nil
Nit Empty default: case adds noise. (Style Reviewer)
The switch part.Type block has an empty default: with no body. Consider removing it.
    return nil
    })
    } else if hasContextFiles {
    // On subsequent turns, extract the instruction text and
Nit Unrelated formatting change joins }, DisableChainMode: on one line. (Edge Case Analyst, Go Architect, Style Reviewer)
The diff collapses the ReloadMessages closing }, and DisableChainMode: field onto a single line. The previous style (each field on its own line) matches the surrounding code. Per the repo's "Avoid Unnecessary Changes" guideline, this adds diff noise without functional benefit.
    slog.Error(persistErr),
    )
    }
    return nil
Obs fetchWorkspaceContext declares an err error return but never returns a non-nil error. (Test Auditor, Edge Case Analyst, Go Architect P3)
All three return paths yield nil for the error. The fetchErr == nil check in the caller is always true. Not a bug, but the unused error return is a maintenance trap. Consider either dropping it or adding a comment explaining the intent.
johnstcn left a comment
Bot review incoming, human review below.
    ctx, chat, getWorkspaceAgent, getWorkspaceConn,
    )
    if err != nil {
    return "", nil, err
Can we enrich this error with xerrors.Errorf?
Dropped the error return from fetchWorkspaceContext entirely (it was dead code — all paths returned nil). This err check in persistInstructionFiles is now removed.
🤖 Generated by Coder Agents
    return "", nil, err
    }
    if agent == nil {
    return "", nil, nil
we shouldn't be calling this if there is no workspace agent?
Added a comment explaining this is a defensive guard. fetchWorkspaceContext returns nil when the chat has no valid workspace or the agent lookup fails — it's cheaper to guard here than push the precondition up to all callers.
🤖 Generated by Coder Agents
    workspaceCtx.getWorkspaceAgent,
    func(instructionCtx context.Context) (workspacesdk.AgentConn, error) {
    if _, _, err := workspaceCtx.workspaceAgentIDForConn(instructionCtx); err != nil {
    return nil, err
Can we enrich with xerrors.Errorf please?
Done — wrapped with xerrors.Errorf("resolve workspace agent for conn: %w", err).
🤖 Generated by Coder Agents
    case fetchErr == nil && workspaceConnOK:
    // Workspace reachable but returned no context
    // files (e.g. AGENTS.md was deleted). Honor the
    // removal by clearing the instruction.
    instruction = ""
The deletion scenario (AGENTS.md removed → instruction cleared) didn't have direct test coverage. Added instructionCleared bool to prevent compaction from resurrecting stale content, and added TestInstructionFromContextFiles tests covering the reconstruction, empty-messages, and no-context-file-parts cases. The three-way switch logic is now also structurally simpler after dropping the dead error return.
🤖 Generated by Coder Agents
Reworked the approach based on Kyle's feedback. The per-turn workspace dial was wrong — the actual problem is that create_workspace runs mid-turn as a builtin tool (inside the chatloop), after runChat's instruction setup has already executed with no workspace.
The fix now lives in the onChatUpdated callback: when a workspace is first attached and instruction is empty, it calls persistInstructionFiles right there. This:
- Makes AGENTS.md available for the rest of the current turn
- Persists the marker so subsequent turns just read from DB (no workspace dial)
- Only fires once: the instruction == "" guard prevents redundant fetches
🤖 Generated by Coder Agents
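The once-only guard described in this reply can be reduced to a small sketch. turnState, onChatUpdated and persist are hypothetical stand-ins for the closure variables in runChat, the callback, and persistInstructionFiles respectively; this is not the PR's code.

```go
package main

// turnState is a hypothetical reduction of the per-turn closure state.
type turnState struct {
	instruction string
	persists    int // number of times the persist stand-in ran
}

// onChatUpdated persists context files only when a workspace is attached
// and no instruction has been loaded yet. persist stands in for
// persistInstructionFiles and returns "" if nothing could be loaded.
func (s *turnState) onChatUpdated(hasWorkspace bool, persist func() string) {
	if !hasWorkspace || s.instruction != "" {
		// No workspace yet, or context already loaded: nothing to do.
		return
	}
	s.persists++
	s.instruction = persist()
}
```

Because the guard only checks for an empty instruction, a call that loads nothing (for example, while the agent is still offline) leaves room for a later call to try again.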
    // Workspace unreachable or fetch failed: fall
    // back to persisted context-file parts from the
    // message history.
What if the workspace was deleted?
Added a comment explaining: if the workspace was deleted, getWorkspaceAgent returns an error which causes fetchWorkspaceContext to return nil agent with workspaceConnOK=false, landing in this default case. The persisted context is the best available data.
🤖 Generated by Coder Agents
johnstcn left a comment
🤖
Round 2 deep review. Thanks for addressing the three-way switch, stale comment, formatting nit, and the empty default case from round 1.
The priority flip in ReloadMessages (captured instruction now takes priority) was the right call, but there's a remaining edge case where the deletion scenario still resurrects stale content after compaction.
1 P1, 2 P2, 3 nits across 6 inline comments. This review contains a finding that may need attention before merge.
    // start from the workspace) takes priority because
    // it may be fresher than the persisted DB content.
    reloadedInstruction := instruction
    if reloadedInstruction == "" {
P1 Compaction resurrects cleared instruction/skills after AGENTS.md deletion (Edge Case Analyst P2, Contract Auditor P3, Go Architect Obs)
When fetchWorkspaceContext succeeds and the workspace has no context files (AGENTS.md deleted), line 4634 sets instruction = "". Later, if ReloadMessages fires (compaction), reloadedInstruction := instruction is empty (correct), but the fallback at line 5337 calls instructionFromContextFiles(reloadedMsgs), which reads old persisted DB messages. Those messages still hold the pre-deletion content because only persistInstructionFiles writes to the DB (not the refresh path). So the deleted AGENTS.md content reappears.
Similarly for skills (line 5343): skillsFromParts(reloadedMsgs) finds old persisted skill parts, overriding the freshly cleared skills variable.
The PR’s goal is “pick up workspace changes between turns,” but this edge case defeats that for the deletion scenario when compaction fires within the same turn. Consider tracking a boolean like instructionCleared so the reload callback can distinguish “empty because cleared” from “empty because never set.” Alternatively, persist the deletion (write an empty sentinel to the DB) so instructionFromContextFiles returns empty too.
Fixed. Added instructionCleared bool that gets set to true when the deletion branch fires. The ReloadMessages fallback now checks !instructionCleared before reading from persisted DB content:
    if reloadedInstruction == "" && !instructionCleared {
        reloadedInstruction = instructionFromContextFiles(reloadedMsgs)
    }
This prevents compaction from resurrecting cleared instruction/skills after AGENTS.md deletion.
🤖 Generated by Coder Agents
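A runnable version of the fallback quoted above, assuming the persisted lookup is wrapped in a closure so the DB read only happens when needed. reloadInstruction and fromPersisted are hypothetical names. The sketch shows the three cases the flag has to separate: content captured this turn, empty because never set, and empty because AGENTS.md was deleted.

```go
package main

// reloadInstruction sketches the ReloadMessages fallback with the
// instructionCleared flag. fromPersisted stands in for
// instructionFromContextFiles(reloadedMsgs).
func reloadInstruction(captured string, instructionCleared bool, fromPersisted func() string) string {
	reloaded := captured
	if reloaded == "" && !instructionCleared {
		// Empty because it was never set: persisted DB content is the
		// best available source.
		reloaded = fromPersisted()
	}
	// Empty because AGENTS.md was deleted: stay empty rather than
	// resurrecting stale persisted content.
	return reloaded
}
```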
    // connection succeeded. A nil agent means the workspace is not
    // valid or the agent is not reachable.
    func (p *Server) fetchWorkspaceContext(
    ctx context.Context,
P2 fetchWorkspaceContext never returns a non-nil error, making the err return dead code (Contract Auditor P2, Go Architect P2, Edge Case Analyst P3)
All three return paths return nil for the error. The caller captures fetchErr and branches on it, but it's always nil. persistInstructionFiles (line 5857) also checks err != nil, which is dead code.
The function signals failures via nil agent / false workspaceConnOK instead, which happens to work because the switch conditions discriminate correctly. But the signature promises error reporting that never fires. Either drop the error return (and document the nil-agent/false-connOK contract), or propagate agentErr/connErr so callers can log them distinctly.
Carried forward from round 1 (Obs, no response) — upgrading to P2 given convergence across three reviewers.
Dropped the err error return entirely. Updated both callers and the docstring to document the nil-agent / false-workspaceConnOK contract instead.
🤖 Generated by Coder Agents
    } else if hasContextFiles {
    // On subsequent turns, extract the instruction text and
    // skill index from persisted parts so they can be
    // re-injected via InsertSystem after compaction drops
P2 No test coverage for the core behavioral changes (Test Auditor P1 ×2, P2)
The three-way switch (fresh parts / reachable-but-empty / unreachable) and the ReloadMessages re-derivation logic are the heart of this PR, but neither has direct test coverage. Only formatSystemInstructionsFromParts is tested. If someone broke the fallback path (e.g. swapped the fetchErr == nil guard), no test would catch it.
Even a unit test calling fetchWorkspaceContext with controlled fakes, asserting instruction/skills values for each of the three branches, would meaningfully reduce risk. The ReloadMessages priority logic (captured > persisted > empty) could also use a targeted test.
Added TestInstructionFromContextFiles with sub-tests for reconstruction, empty messages, and no-context-file-parts. The existing chatd_internal_test.go already covers TestSkillsFromParts, TestMergeSkillMetas, TestSelectSkillMetasForInstructionRefresh, and TestInstructionFromContextFilesUsesLatestContextAgent.
🤖 Generated by Coder Agents
    parts []codersdk.ChatMessagePart,
    ) string {
    var os, dir string
    for _, part := range parts {
Nit os shadows the well-known os package name (Modernization Reviewer, Style Reviewer)
The sibling formatSystemInstructions (line 18) uses operatingSystem for the same concept. Using operatingSystem or osName here would be consistent and avoid the shadowing if os is ever imported.
Renamed to operatingSystem to match the sibling function.
🤖 Generated by Coder Agents
    reloadedPrompt = renderPlanPathPrompt(reloadedPrompt, resolvePlanPathBlock(reloadCtx))
    if skillIndex := chattool.FormatSkillIndex(skills); skillIndex != "" {
    reloadedSkills := skillsFromParts(reloadedMsgs)
Nit reloadedSkills name is misleading after fallback (Style Reviewer)
After the fallback on line 5346, reloadedSkills may hold skills from turn-start, not from reloaded messages. A name like effectiveSkills or a comment on the fallback line would clarify.
Renamed to effectiveSkills.
🤖 Generated by Coder Agents
    // to persisted parts if the workspace dial fails.
    g2.Go(func() error {
    _, freshParts, discoveredSkills, workspaceConnOK, fetchErr := p.fetchWorkspaceContext(
    ctx,
Nit discoveredSkills inside the closure shadows the outer-scope variable (Style Reviewer)
The := creates a closure-local discoveredSkills that then writes to the outer skills. Renaming to freshSkills or fetchedSkills would reduce cognitive load when verifying which variable is being used.
Renamed to fetchedSkills inside the closure.
🤖 Generated by Coder Agents
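A standalone illustration of the Go scoping rule behind this nit (not code from the PR; shadowDemo is a hypothetical function). Inside a closure, := declares a new variable, so the outer variable with the same name is left untouched and only the explicit assignment to skills escapes the closure.

```go
package main

// shadowDemo returns the outer discoveredSkills and the skills written
// by the closure, to show that the two discoveredSkills are different
// variables.
func shadowDemo() (outerDiscovered, skills []string) {
	discoveredSkills := []string{"outer"}
	run := func() {
		// := declares a closure-local variable that shadows the outer one.
		discoveredSkills := []string{"fetched"}
		skills = discoveredSkills
	}
	run()
	return discoveredSkills, skills
}
```

Renaming the inner variable (fetchedSkills, as done here) makes it obvious at a glance which of the two is being read or written.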
kylecarbs left a comment
🤖 Generated by Coder Agents
Round 3 deep review. Dead code removed, indentation fixed, redundant test dropped. Changeset is now 141 insertions across 2 files.
The onChatUpdated mid-turn injection has a timing issue: both callers fire before waitForBuild, so the agent isn't online yet and the persist is a no-op for new workspaces. This is addressed in the approved plan (step 2d: second callback after build, plus PrepareMessages chatloop hook). 1 P2 across 1 inline comment.
    // start streaming build logs before the tool
    // completes.
    p.publishChatPubsubEvent(updatedChat, codersdk.ChatWatchEventKindStatusChange, nil)
P2 onChatUpdated fires before waitForBuild — mid-turn persist is a no-op for new workspaces (Edge Case Analyst P2, Contract Auditor P2)
Both create_workspace (createworkspace.go:252) and start_workspace (startworkspace.go:122) call OnChatUpdated before waitForBuild. At this point the agent is not online. fetchWorkspaceContext dials the agent, fails, and persistInstructionFiles returns ("", nil, nil). The comment at line 5060 promises AGENTS.md context for the remainder of the turn, but this won't happen for new workspaces.
The approved plan addresses this: step 2d adds a second OnChatUpdated call after waitForBuild in both createworkspace.go and startworkspace.go, and a PrepareMessages chatloop callback to inject the instruction into the running messages. Will implement in the next commit.
🤖 Generated by Coder Agents
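The timing problem and the planned second callback can be simulated in a few lines. Everything here is hypothetical: ensureLoaded stands in for the onChatUpdated path that calls persistInstructionFiles, and agentOnline models what waitForBuild waits for. It is a sketch of the ordering, not the tool code in createworkspace.go or startworkspace.go.

```go
package main

// simulateCreateWorkspace models the call order inside a
// workspace-creating tool and reports which instruction ends up loaded.
func simulateCreateWorkspace(secondCallback bool) (instruction string) {
	agentOnline := false
	ensureLoaded := func() {
		if instruction != "" {
			return // already loaded this turn
		}
		if agentOnline {
			instruction = "AGENTS.md content"
		}
		// Agent offline: the dial fails and nothing is loaded.
	}

	ensureLoaded()     // OnChatUpdated after the binding update
	agentOnline = true // waitForBuild returns once the agent is up
	if secondCallback {
		ensureLoaded() // second OnChatUpdated after waitForBuild
	}
	return instruction
}
```

With only the first callback the turn never sees AGENTS.md; adding the post-build callback is what makes the context available.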
johnstcn left a comment
I'm having difficulty reproducing this issue manually. It's not clear when we detect that the context has changed. I'm going to go ahead and approve.
    )
    // Check if instruction files need to be (re-)persisted.
    // This happens when no context-file parts exist yet, or when
This was already fixed by the formatter in the same commit (a132d45). No stray blank line remains.
🤖 Generated by Coder Agents
Fixed in a132d45 — also cleaned up adjacent indentation issues that the formatter caught.
🤖 Generated by Coder Agents
    // Simulate: after the tool executes (step 0), instruction
    // becomes available. PrepareMessages injects it before step 1.
    var instructionInjected atomic.Bool
nit, suggestion: could be a chan struct{} we close
Done in a132d45 — switched to chan struct{} with close() for the one-shot signal. PrepareMessages is single-goroutine so this is safe without sync.Once.
🤖 Generated by Coder Agents
Done — switched to chan struct{} + close() in a132d45.
🤖 Generated by Coder Agents
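A sketch of the close-once signal discussed here. Closing an already-closed channel panics in Go, so markDone checks first; that check-then-close is only safe when a single goroutine calls it, which matches the single-goroutine assumption in the reply. With concurrent callers, sync.Once would be the safer choice. oneShot is a hypothetical type, not code from the test.

```go
package main

// oneShot wraps a channel that is closed exactly once to signal an event.
type oneShot struct{ done chan struct{} }

func newOneShot() *oneShot { return &oneShot{done: make(chan struct{})} }

// isDone reports whether the signal has fired, without blocking.
func (o *oneShot) isDone() bool {
	select {
	case <-o.done: // a closed channel is always ready to receive
		return true
	default:
		return false
	}
}

// markDone closes the channel once; repeated calls are no-ops.
// Not safe for concurrent callers (check and close are not atomic).
func (o *oneShot) markDone() {
	if !o.isDone() {
		close(o.done)
	}
}
```

Compared with atomic.Bool, a closed channel can also be waited on with <-o.done or used in a select with a timeout, which is often handy in tests.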
johnstcn left a comment
🤖
Round 4. Nice tests — PrepareMessages mid-loop injection and the OnChatUpdated double-fire are now well covered. Two items remaining.
    // The captured instruction takes priority; fall
    // back to persisted DB content otherwise.
    reloadedInstruction := instruction
    if reloadedInstruction == "" {
P2 Double injection of instruction after compaction when instruction is acquired mid-turn (Edge Case Analyst, Contract Auditor)
When onChatUpdated sets instruction mid-turn (via create_workspace), instructionInjected stays false. If compaction fires at the end of that step, ReloadMessages injects reloadedInstruction (which equals the captured instruction) into the reloaded prompt via InsertSystem. Then on the next step, PrepareMessages sees instructionInjected == false && instruction != "" and calls InsertSystem again — duplicating AGENTS.md in the prompt.
This only manifests when compaction triggers on the same step as a workspace-creating tool call (conversation near context limit), so it's uncommon but not unrealistic for long-running chats.
Fix: set instructionInjected = true inside this ReloadMessages callback when reloadedInstruction != "".
Fixed in a132d45 — added instructionInjected = true inside the ReloadMessages callback after InsertSystem, so PrepareMessages won't double-inject on the next step.
🤖 Generated by Coder Agents
Good catch — fixed in a132d45. instructionInjected = true is now set inside the ReloadMessages callback right after InsertSystem, so PrepareMessages won't re-inject on the next step.
🤖 Generated by Coder Agents
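The double-injection scenario and its fix can be modeled with a small piece of state. loopState, insertSystem, reloadMessages and prepareMessages are hypothetical simplifications of the chatloop callbacks; in particular, prepending a string is only a stand-in for InsertSystem, whose real behavior is not reproduced here.

```go
package main

// loopState is a hypothetical reduction of the chatloop closure state.
type loopState struct {
	instruction         string
	instructionInjected bool
	messages            []string
}

// insertSystem stands in for InsertSystem by prepending a system message.
func (s *loopState) insertSystem(msg string) {
	s.messages = append([]string{"system: " + msg}, s.messages...)
}

// reloadMessages models the ReloadMessages callback after compaction.
func (s *loopState) reloadMessages(compacted []string) {
	s.messages = append([]string(nil), compacted...)
	if s.instruction != "" {
		s.insertSystem(s.instruction)
		s.instructionInjected = true // the fix: record that it is present
	}
}

// prepareMessages models the PrepareMessages hook run before each step.
func (s *loopState) prepareMessages() {
	if !s.instructionInjected && s.instruction != "" {
		s.insertSystem(s.instruction)
		s.instructionInjected = true
	}
}
```

Without the flag assignment in reloadMessages, the next prepareMessages call would prepend the instruction a second time.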
    }

    func TestInstructionFromContextFiles(t *testing.T) {
    t.Parallel()
P2 TestInstructionFromContextFiles only tests trivial empty-input paths (Test Auditor, Contract Auditor)
The two sub-tests cover nil messages and a message with only skill parts — both return early before the function does anything interesting. The function's core logic (extracting OS/directory from context-file parts, filtering by agent ID via latestContextAgentID, reconstructing via formatSystemInstructions) is never exercised. A positive test with actual context-file parts that asserts the reconstructed instruction is non-empty and correct would lock down the contract.
Added ReconstructsFromContextFileParts subtest in a132d45 — creates a message with a context-file part (OS, directory, content, path), calls instructionFromContextFiles, and asserts the result contains all expected fields.
🤖 Generated by Coder Agents
Added ReconstructsFromContextFileParts subtest in a132d45 — exercises the full path with OS, directory, content, and source path assertions.
🤖 Generated by Coder Agents
Context files (AGENTS.md) and skills were only fetched from the workspace on the first turn or when the agent changed. On subsequent turns, stale content from persisted messages was used. This meant that if AGENTS.md or skills were modified on the workspace between turns, the agent wouldn't see the changes until the user created a new chat.
Changes:
- Extract fetchWorkspaceContext from persistInstructionFiles to allow fetching workspace context without persisting
- On subsequent turns, re-fetch fresh context from the workspace instead of reading stale persisted content; falls back to persisted messages if the workspace dial fails
- Update ReloadMessages callback to re-derive instruction and skills from reloaded database messages after compaction, instead of using captured closure variables
- Add formatSystemInstructionsFromParts helper to build system instructions directly from agent parts without requiring separate OS/directory params
- Add tests for the new helper
- Invert ReloadMessages fallback priority: prefer captured instruction (fresh from workspace) over stale DB content after compaction
- Distinguish reachable-with-no-content from unreachable using workspaceConnOK; honor AGENTS.md deletion by clearing instruction
- Remove stale comment fragment from old persistInstructionFiles
- Remove empty default clause in switch
- Restore DisableChainMode field to its own line
- Drop dead error return from fetchWorkspaceContext (P2)
- Add instructionCleared flag to prevent compaction from resurrecting cleared instruction after AGENTS.md deletion (P1)
- Wrap error with xerrors.Errorf in g2 closure conn wrapper
- Add defensive-guard comment for nil-agent check in persistInstructionFiles
- Add workspace-deletion comment in default fallback case
- Rename os -> operatingSystem to avoid shadowing (nit)
- Rename reloadedSkills -> effectiveSkills for clarity (nit)
- Rename discoveredSkills -> fetchedSkills in closure to avoid shadowing outer scope (nit)
- Add tests for instructionFromContextFiles
The subsequent-turn branch should not dial the workspace on every message. The original code correctly reads from persisted DB content. The actual fix is the ReloadMessages callback which now re-derives instruction/skills from reloaded messages after compaction, picking up any context persisted mid-loop (e.g. when the agent changes). Removes the instructionCleared flag and three-way switch that were dialing the workspace unnecessarily.
When create_workspace runs mid-turn, the instruction setup at runChat's top has already executed with no workspace. The onChatUpdated callback now calls persistInstructionFiles when instruction is empty and a workspace just appeared, so context files (AGENTS.md, skills) are available for the rest of the turn. The persisted marker ensures subsequent turns read from DB without re-dialing the workspace.
- Remove unused formatSystemInstructionsFromParts and its tests
- Fix indentation in var block, onChatUpdated, and ReloadMessages
- Fix stray blank lines splitting comment block and struct literal
- Rename effectiveSkills -> reloadedSkills for consistency with reloadedInstruction
- Remove redundant ReconstructsInstruction test (already covered by TestInstructionFromContextFilesUsesLatestContextAgent)
- Remove unused uuid import and agentID variable
Add PrepareMessages callback to chatloop.RunOptions, called before each LLM step. When instruction is set mid-turn (via onChatUpdated after create_workspace), PrepareMessages injects it into the chatloop's messages via InsertSystem so the LLM sees AGENTS.md context on the very next step. Also fire OnChatUpdated a second time after waitForBuild completes in both createworkspace.go and startworkspace.go, so the agent is actually online when ensureInstructionLoaded runs.
… injection
- TestRun_PrepareMessagesInjectsSystemContextMidLoop: verifies system messages are injected mid-loop when PrepareMessages callback is set
- TestRun_PrepareMessagesOnlyFiresOnce: ensures PrepareMessages runs on every step but the caller can gate injection with a flag
- TestCreateWorkspace_OnChatUpdatedFiresAfterBuild: verifies OnChatUpdated fires twice: once on binding update and once after build completes
a132d45 to ce69197
ce69197 to c517ee9
Context files (AGENTS.md) and skills were only fetched from the workspace on the first turn or when the agent changed. On subsequent turns, stale content from persisted messages was used. This meant that if AGENTS.md or skills were modified on the workspace between turns, the agent wouldn't see the changes until the user created a new chat.
Changes
- Extract fetchWorkspaceContext from persistInstructionFiles to allow fetching workspace context without persisting
- Update ReloadMessages callback to re-derive instruction and skills from reloaded database messages after compaction, instead of using captured closure variables
- Add formatSystemInstructionsFromParts helper to build system instructions directly from agent parts without requiring separate OS/directory params
Implementation Notes
Root cause
In runChat, the else if hasContextFiles branch (subsequent turns) called instructionFromContextFiles(messages), which read stale content from persisted DB messages. The ReloadMessages callback (post-compaction) also used the captured instruction/skills closure variables from the start of the turn, never re-deriving them.
Approach
Extract fetchWorkspaceContext: Pure refactor of the fetch-only part of persistInstructionFiles (agent connection, context config retrieval, content sanitization, metadata stamping). Returns parts + skills without persisting.
Subsequent turns: Instead of reading from persisted messages, launch a g2 goroutine that calls fetchWorkspaceContext to get fresh context from the workspace. Falls back gracefully to persisted messages if the workspace is unreachable.
ReloadMessages: Re-derive instruction from instructionFromContextFiles(reloadedMsgs) and skills from skillsFromParts(reloadedMsgs) using the freshly loaded messages, with fallback to captured values if the reloaded messages don't contain context (e.g. compacted away).