Commit 962191f
Restore expired sessions via summary-based catch-up (#52)

When the backend loses a session (typically: HF Space restarted), the chat now shows a small inline banner instead of silently erroring:

    Where were we? Let me skim the conversation so far and pick up right
    where we left off — or we can start something new.
    [ Catch me up ]  [ Start fresh ]

Flow, in one pass:

* Frontend stashes the raw backend messages in localStorage on every mount-hydrate and turn_complete. When the backend 404s for the session id, the SSE transport and the mount effect both fire onSessionDead → the session is flagged `expired` in sessionStore; the sidebar marks it "needs a catch-up".
* Catch me up → POST /api/session/restore-summary with the cached messages. Backend creates a fresh session, runs the existing summarizer (factored into summarize_messages() and shared with in-session compaction) with a restore-specific prompt that preserves the tool-call trail, and seeds the new session with that summary wrapped in a [SYSTEM: ...] user turn. The new id is swapped back via renameSession; UIMessages + backend cache move with it.
* Start fresh → delete the session + its caches.
* The design is lazy per session: users with 5 stale tabs only pay for a summary on the ones they actually reopen.

Frontend filters [SYSTEM: ...] user turns from rendering so the seed message (plus existing doom-loop / compact nudges) stays invisible. For sessions that predate the raw-message cache, fall back to reconstructing the backend message list from the longstanding UIMessage cache (tool calls + paired results, text preserved).

Also sets litellm.modify_params = True globally in agent/__init__.py (moved out of agent/main.py) so the backend entry also picks it up — required for Anthropic to accept a history containing tool_calls without a `tools=` kwarg, which is exactly the summarization shape.
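The seed-hiding rule described above reduces to a prefix check on user turns. A minimal sketch, assuming plain dict messages (the real frontend does this in TypeScript over UIMessages; `visible_messages` is a hypothetical name):

```python
# Hide user turns that carry a "[SYSTEM: ...]" prefix (the restore seed
# and other internal nudges) while keeping every normal turn.
def visible_messages(messages: list[dict]) -> list[dict]:
    return [
        m for m in messages
        if not (m.get("role") == "user"
                and str(m.get("content", "")).startswith("[SYSTEM:"))
    ]

history = [
    {"role": "user", "content": "[SYSTEM: Your prior memory...]\n\nsummary"},
    {"role": "user", "content": "What were we doing?"},
    {"role": "assistant", "content": "We were wiring up the restore flow."},
]
print(len(visible_messages(history)))  # 2 — the seed turn is hidden
```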
1 parent 28b8f2b commit 962191f

15 files changed

Lines changed: 580 additions & 41 deletions

File tree

‎agent/__init__.py‎

Lines changed: 15 additions & 1 deletion

@@ -2,6 +2,20 @@
 HF Agent - Main agent module
 """
 
-from agent.core.agent_loop import submission_loop
+import litellm
+
+# Global LiteLLM behavior — set once at package import so both CLI and
+# backend entries share the same config.
+#   drop_params: quietly drop unsupported params rather than raising
+#   suppress_debug_info: hide the noisy "Give Feedback" banner on errors
+#   modify_params: let LiteLLM patch Anthropic's tool-call requirements
+#     (synthesize a dummy tool spec when we call completion on a history
+#     that contains tool_calls but aren't passing `tools=` — happens
+#     during summarization / session seeding).
+litellm.drop_params = True
+litellm.suppress_debug_info = True
+litellm.modify_params = True
+
+from agent.core.agent_loop import submission_loop  # noqa: E402
 
 __all__ = ["submission_loop"]

‎agent/context_manager/manager.py‎

Lines changed: 78 additions & 21 deletions

@@ -68,6 +68,63 @@ def _get_hf_username(hf_token: str | None = None) -> str:
     return "unknown"
 
 
+_COMPACT_PROMPT = (
+    "Please provide a concise summary of the conversation above, focusing on "
+    "key decisions, the 'why' behind the decisions, problems solved, and "
+    "important context needed for developing further. Your summary will be "
+    "given to someone who has never worked on this project before and they "
+    "will have to be filled in."
+)
+
+# Used when seeding a brand-new session from prior browser-cached messages.
+# Here we're writing a note to *ourselves* — so preserve the tool-call trail,
+# files produced, and planned next steps in first person. Optimized for
+# continuity, not brevity.
+_RESTORE_PROMPT = (
+    "You're about to be restored into a fresh session with no memory of the "
+    "conversation above. Write a first-person note to your future self so "
+    "you can continue right where you left off. Include:\n"
+    " • What the user originally asked for and what progress you've made.\n"
+    " • Every tool you called, with arguments and a one-line result summary.\n"
+    " • Any code, files, scripts, or artifacts you produced (with paths).\n"
+    " • Key decisions and the reasoning behind them.\n"
+    " • What you were planning to do next.\n\n"
+    "Don't be cute. Be specific. This is the only context you'll have."
+)
+
+
+async def summarize_messages(
+    messages: list[Message],
+    model_name: str,
+    hf_token: str | None = None,
+    max_tokens: int = 2000,
+    tool_specs: list[dict] | None = None,
+    prompt: str = _COMPACT_PROMPT,
+) -> tuple[str, int]:
+    """Run a summarization prompt against a list of messages.
+
+    ``prompt`` defaults to the compaction prompt (terse, decision-focused).
+    Callers seeding a new session after a restart should pass ``_RESTORE_PROMPT``
+    instead — it preserves the tool-call trail so the agent can answer
+    follow-up questions about what it did.
+
+    Returns ``(summary_text, completion_tokens)``.
+    """
+    from agent.core.llm_params import _resolve_llm_params
+
+    prompt_messages = list(messages) + [Message(role="user", content=prompt)]
+    llm_params = _resolve_llm_params(model_name, hf_token, reasoning_effort="high")
+    response = await acompletion(
+        messages=prompt_messages,
+        max_completion_tokens=max_tokens,
+        tools=tool_specs,
+        **llm_params,
+    )
+    summary = response.choices[0].message.content or ""
+    completion_tokens = response.usage.completion_tokens if response.usage else 0
+    return summary, completion_tokens
+
+
 class ContextManager:
     """Manages conversation context and message history for the agent"""
 
@@ -318,32 +375,32 @@ async def compact(
         if not messages_to_summarize:
             return
 
-        messages_to_summarize.append(
-            Message(
-                role="user",
-                content="Please provide a concise summary of the conversation above, focusing on key decisions, the 'why' behind the decisions, problems solved, and important context needed for developing further. Your summary will be given to someone who has never worked on this project before and they will be have to be filled in.",
-            )
-        )
-
-        from agent.core.llm_params import _resolve_llm_params
-
-        llm_params = _resolve_llm_params(model_name, hf_token, reasoning_effort="high")
-        response = await acompletion(
-            messages=messages_to_summarize,
-            max_completion_tokens=self.compact_size,
-            tools=tool_specs,
-            **llm_params,
-        )
-        summarized_message = Message(
-            role="assistant", content=response.choices[0].message.content
+        summary, completion_tokens = await summarize_messages(
+            messages_to_summarize,
+            model_name=model_name,
+            hf_token=hf_token,
+            max_tokens=self.compact_size,
+            tool_specs=tool_specs,
+            prompt=_COMPACT_PROMPT,
         )
+        summarized_message = Message(role="assistant", content=summary)
 
         # Reconstruct: system + first user msg + summary + recent messages
         head = [system_msg] if system_msg else []
         if first_user_msg:
             head.append(first_user_msg)
         self.items = head + [summarized_message] + recent_messages
 
-        self.running_context_usage = (
-            len(self.system_prompt) // 4 + response.usage.completion_tokens
-        )
+        # Count the actual post-compact context — system prompt + first user
+        # turn + summary + the preserved tail all contribute, not just the
+        # summary. litellm.token_counter uses the model's real tokenizer.
+        from litellm import token_counter
+
+        try:
+            self.running_context_usage = token_counter(
+                model=model_name,
+                messages=[m.model_dump() for m in self.items],
+            )
+        except Exception as e:
+            logger.warning(
+                "token_counter failed post-compact (%s); falling back to rough estimate", e
+            )
+            self.running_context_usage = len(self.system_prompt) // 4 + completion_tokens
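The reconstruction step in compact() — optional system prompt, optional first user turn, a single summary message, then the preserved tail — is simple list surgery. A minimal sketch with plain dicts standing in for Message objects (`rebuild_context` is a hypothetical helper, not part of the diff):

```python
def rebuild_context(system_msg, first_user_msg, summary_text, recent):
    # Post-compact shape: [system?] + [first user?] + [summary] + tail.
    head = [system_msg] if system_msg else []
    if first_user_msg:
        head.append(first_user_msg)
    summary = {"role": "assistant", "content": summary_text}
    return head + [summary] + recent

items = rebuild_context(
    {"role": "system", "content": "You are an agent."},
    {"role": "user", "content": "Build me a demo."},
    "Summary of 40 dropped turns.",
    [{"role": "user", "content": "Now add tests."}],
)
print([m["role"] for m in items])  # ['system', 'user', 'assistant', 'user']
```

Keeping the first user turn alongside the summary means the original task statement survives compaction verbatim, even when everything between it and the recent tail is collapsed.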

‎backend/routes/agent.py‎

Lines changed: 44 additions & 0 deletions

@@ -227,6 +227,50 @@ async def create_session(
     return SessionResponse(session_id=session_id, ready=True)
 
 
+@router.post("/session/restore-summary", response_model=SessionResponse)
+async def restore_session_summary(
+    request: Request, body: dict, user: dict = Depends(get_current_user)
+) -> SessionResponse:
+    """Create a new session seeded with a summary of the caller's prior
+    conversation. The client sends its cached messages; we run the standard
+    summarization prompt on them and drop the result into the new
+    session's context as a user-role system note.
+    """
+    messages = body.get("messages")
+    if not isinstance(messages, list) or not messages:
+        raise HTTPException(status_code=400, detail="Missing 'messages' array")
+
+    hf_token = None
+    auth_header = request.headers.get("Authorization", "")
+    if auth_header.startswith("Bearer "):
+        hf_token = auth_header[7:]
+    if not hf_token:
+        hf_token = request.cookies.get("hf_access_token")
+    if not hf_token:
+        hf_token = os.environ.get("HF_TOKEN")
+
+    try:
+        session_id = await session_manager.create_session(
+            user_id=user["user_id"], hf_token=hf_token
+        )
+    except SessionCapacityError as e:
+        raise HTTPException(status_code=503, detail=str(e))
+
+    try:
+        summarized = await session_manager.seed_from_summary(session_id, messages)
+    except ValueError as e:
+        raise HTTPException(status_code=500, detail=str(e))
+    except Exception as e:
+        logger.exception("seed_from_summary failed")
+        raise HTTPException(status_code=500, detail=f"Summary failed: {e}")
+
+    logger.info(
+        f"Seeded session {session_id} for {user.get('username', 'unknown')} "
+        f"(summary of {summarized} messages)"
+    )
+    return SessionResponse(session_id=session_id, ready=True)
+
+
 @router.get("/session/{session_id}", response_model=SessionInfo)
 async def get_session(
     session_id: str, user: dict = Depends(get_current_user)

‎backend/session_manager.py‎

Lines changed: 63 additions & 0 deletions

@@ -207,6 +207,69 @@ def _create_session_sync():
         logger.info(f"Created session {session_id} for user {user_id}")
         return session_id
 
+    async def seed_from_summary(self, session_id: str, messages: list[dict]) -> int:
+        """Rehydrate a session from cached prior messages via summarization.
+
+        Runs the standard summarization prompt (same one compaction uses)
+        over the provided messages, then seeds the new session's context
+        with that summary. Tool-call pairing concerns disappear because the
+        output is plain text. Returns the number of messages summarized.
+        """
+        from litellm import Message
+
+        from agent.context_manager.manager import _RESTORE_PROMPT, summarize_messages
+
+        agent_session = self.sessions.get(session_id)
+        if not agent_session:
+            raise ValueError(f"Session {session_id} not found")
+
+        # Parse into Message objects, tolerating malformed entries.
+        parsed: list[Message] = []
+        for raw in messages:
+            if raw.get("role") == "system":
+                continue  # the new session has its own system prompt
+            try:
+                parsed.append(Message.model_validate(raw))
+            except Exception as e:
+                logger.warning("Dropping malformed message during seed: %s", e)
+
+        if not parsed:
+            return 0
+
+        session = agent_session.session
+        # Pass the real tool specs so the summarizer sees what the agent
+        # actually has — otherwise Anthropic's modify_params injects a
+        # dummy tool and the summarizer editorializes that the original
+        # tool calls were fabricated.
+        tool_specs = None
+        try:
+            tool_specs = agent_session.tool_router.get_tool_specs_for_llm()
+        except Exception:
+            pass
+        try:
+            summary, _ = await summarize_messages(
+                parsed,
+                model_name=session.config.model_name,
+                hf_token=session.hf_token,
+                max_tokens=4000,
+                prompt=_RESTORE_PROMPT,
+                tool_specs=tool_specs,
+            )
+        except Exception as e:
+            logger.error("Summary call failed during seed: %s", e)
+            raise
+
+        seed = Message(
+            role="user",
+            content=(
+                "[SYSTEM: Your prior memory of this conversation — written "
+                "in your own voice right before restart. Continue from here.]\n\n"
+                + (summary or "(no summary returned)")
+            ),
+        )
+        session.context_manager.items.append(seed)
+        return len(parsed)
+
     @staticmethod
     async def _cleanup_sandbox(session: Session) -> None:
         """Delete the sandbox Space if one was created for this session."""
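The tolerant-parse loop in seed_from_summary can be isolated: skip system turns (the new session brings its own system prompt) and drop anything that fails validation rather than aborting the whole restore. A stdlib-only sketch, with a trivial content check standing in for Message.model_validate:

```python
def parse_cached_messages(raw_messages: list[dict]) -> list[dict]:
    parsed = []
    for raw in raw_messages:
        if raw.get("role") == "system":
            continue  # the new session has its own system prompt
        try:
            # Stand-in validation; the real code uses Message.model_validate.
            if not isinstance(raw.get("content"), str):
                raise ValueError("content must be a string")
            parsed.append(raw)
        except ValueError:
            pass  # tolerate malformed entries; the real code logs a warning

    return parsed

cached = [
    {"role": "system", "content": "old system prompt"},  # skipped
    {"role": "user", "content": "hello"},                # kept
    {"role": "assistant", "content": None},              # dropped: malformed
]
print(len(parse_cached_messages(cached)))  # 1
```

Dropping bad entries one at a time matters here because the input comes from browser localStorage, which may hold stale or partially written data.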
Lines changed: 113 additions & 0 deletions

@@ -0,0 +1,113 @@
/**
 * Shown inline in a chat when the backend no longer recognizes the
 * session id (typically: Space was restarted). Lets the user catch the
 * agent up with a summary of the prior conversation, or start over.
 */
import { useState, useCallback } from 'react';
import { Box, Button, CircularProgress, Typography } from '@mui/material';
import { apiFetch } from '@/utils/api';
import { useSessionStore } from '@/store/sessionStore';
import { useAgentStore } from '@/store/agentStore';
import { loadBackendMessages } from '@/lib/backend-message-store';
import { loadMessages } from '@/lib/chat-message-store';
import { uiMessagesToLLMMessages } from '@/lib/convert-llm-messages';
import { logger } from '@/utils/logger';

interface Props {
  sessionId: string;
}

export default function ExpiredBanner({ sessionId }: Props) {
  const { renameSession, deleteSession } = useSessionStore();
  const [busy, setBusy] = useState<'catch-up' | 'start-over' | null>(null);
  const [error, setError] = useState<string | null>(null);

  const handleCatchUp = useCallback(async () => {
    setBusy('catch-up');
    setError(null);
    try {
      // Prefer the raw backend-message cache; fall back to reconstructing
      // from UIMessages (for sessions that predate the backend cache).
      let messages = loadBackendMessages(sessionId);
      if (!messages || messages.length === 0) {
        const uiMsgs = loadMessages(sessionId);
        if (uiMsgs.length > 0) messages = uiMessagesToLLMMessages(uiMsgs);
      }
      if (!messages || messages.length === 0) {
        setError('Nothing to summarize from this chat.');
        setBusy(null);
        return;
      }

      const res = await apiFetch('/api/session/restore-summary', {
        method: 'POST',
        body: JSON.stringify({ messages }),
      });
      if (!res.ok) throw new Error(`restore-summary failed: ${res.status}`);
      const data = await res.json();
      const newId = data.session_id as string | undefined;
      if (!newId) throw new Error('no session_id in response');

      useAgentStore.getState().clearSessionState(sessionId);
      renameSession(sessionId, newId);
    } catch (e) {
      logger.warn('Catch-up failed:', e);
      setError("Couldn't catch up — try starting over.");
      setBusy(null);
    }
  }, [sessionId, renameSession]);

  const handleStartOver = useCallback(() => {
    setBusy('start-over');
    useAgentStore.getState().clearSessionState(sessionId);
    deleteSession(sessionId);
  }, [sessionId, deleteSession]);

  return (
    <Box
      sx={{
        mx: { xs: 2, md: 'auto' },
        my: 2,
        maxWidth: 720,
        p: 2.5,
        borderRadius: 2,
        border: '1px solid',
        borderColor: 'divider',
        bgcolor: 'background.paper',
        boxShadow: '0 1px 3px rgba(0,0,0,0.06)',
      }}
    >
      <Typography variant="body1" sx={{ fontWeight: 600, mb: 0.5 }}>
        Where were we?
      </Typography>
      <Typography variant="body2" sx={{ color: 'text.secondary', mb: 2 }}>
        Let me skim the conversation so far and pick up right where we left
        off — or we can start something new.
      </Typography>
      <Box sx={{ display: 'flex', gap: 1, flexWrap: 'wrap' }}>
        <Button
          variant="contained"
          onClick={handleCatchUp}
          disabled={busy !== null}
          startIcon={busy === 'catch-up' ? <CircularProgress size={16} color="inherit" /> : null}
          sx={{ textTransform: 'none' }}
        >
          {busy === 'catch-up' ? 'Catching up…' : 'Catch me up'}
        </Button>
        <Button
          variant="outlined"
          onClick={handleStartOver}
          disabled={busy !== null}
          sx={{ textTransform: 'none' }}
        >
          Start fresh
        </Button>
      </Box>
      {error && (
        <Typography variant="caption" sx={{ display: 'block', mt: 1.5, color: 'error.main' }}>
          {error}
        </Typography>
      )}
    </Box>
  );
}
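The cache-selection logic at the top of handleCatchUp — raw backend messages first, otherwise rebuild from the UIMessage cache — is language-agnostic. A Python sketch with hypothetical loader stand-ins passed in as arguments (the real component calls loadBackendMessages, loadMessages, and uiMessagesToLLMMessages):

```python
def pick_restore_messages(backend_cache, ui_cache, convert):
    # Prefer the raw backend-message cache; fall back to converting the
    # UIMessage cache for sessions that predate the backend cache.
    if backend_cache:
        return backend_cache
    if ui_cache:
        return convert(ui_cache)
    return []  # nothing to summarize: the caller shows an error instead

msgs = pick_restore_messages(
    [],                          # backend cache empty (older session)
    [{"id": 1, "text": "hi"}],   # UIMessage cache still present
    lambda ui: [{"role": "user", "content": m["text"]} for m in ui],
)
print(msgs)  # [{'role': 'user', 'content': 'hi'}]
```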
