
Contributor

@doxav doxav commented Jul 3, 2025

This is not the final version; it is shared so that the current implementation can be understood (NOT TO BE MERGED).

I am currently simplifying the data model (below) and will reduce the number of key parameters for get/log.

B. Data Model {#b.-data-model}

Core Fields (Required for All Entries) {#b.2-core-fields-(required-for-all-entries)}

| Field | Type | Description |
|---|---|---|
| entry_id | string | Globally unique ID for this record (e.g., UUID or “goalID_stepID_candID”). |
| goal_id | string | Top‐level task/problem identifier (groups multiple steps). |
| parent_goal_id | string or null | OPTIONAL: If this entry’s goal is a sub‐goal, this points to the parent goal’s goal_id. |
| step_id | integer | Sub‐step or iteration number within that goal. |
| candidate_id | integer | Distinguishes parallel candidates at the same step (default = 1 if no branching). |
| data | Dict[any] | The step data payload. For example: a code snippet (string), prompt text (string), error trace (string), or a structured dict/array. |
| metadata | object | Auxiliary key–value pairs, including: agent (string): which agent or component logged this entry; data_types: list of all types of data stored in data; timestamp (string, ISO8601): when it was logged; status (string): e.g. “pending”/“evaluated”/“error”; tags ([string]): free‐form, arbitrary labels to flag this entry (e.g., ["mutation","candidate","checkpoint"]); source_entry_id (string |
| embedding | float[] or null | A vector (e.g., length D) used for semantic retrieval. Present only if embeddings are enabled. |
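
To make the core fields concrete, here is a minimal sketch of one entry as a Python dataclass. It is not the `TraceMemoryDB` implementation: the class name `MemoryEntry`, the choice to key `data` by payload type, and the defaults filled in by `__post_init__` are all assumptions made for illustration. Only the field names and the “goalID_stepID_candID” ID pattern come from the table.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class MemoryEntry:
    """Hypothetical container mirroring the core fields (not the real API)."""

    goal_id: str
    step_id: int
    # Assumption: data maps a payload type (e.g. "code") to its value.
    data: dict[str, Any]
    candidate_id: int = 1  # default = 1 if no branching
    parent_goal_id: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)
    embedding: Optional[list[float]] = None  # only if embeddings are enabled
    entry_id: str = ""

    def __post_init__(self) -> None:
        # Build the ID from the "goalID_stepID_candID" pattern when none is given.
        if not self.entry_id:
            self.entry_id = f"{self.goal_id}_{self.step_id}_{self.candidate_id}"
        # Assumed defaults for the metadata keys listed in the table.
        self.metadata.setdefault(
            "timestamp", datetime.now(timezone.utc).isoformat()
        )
        self.metadata.setdefault("status", "pending")
        self.metadata.setdefault("data_types", sorted(self.data))


entry = MemoryEntry(
    goal_id="g1",
    step_id=2,
    data={"code": "print(1)", "score": 0.5},
    metadata={"agent": "optimizer", "tags": ["candidate"]},
)
print(entry.entry_id, entry.metadata["data_types"])
```

Deriving `data_types` from the keys of `data` keeps the two in sync, which is one possible way to avoid logging them separately.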

Canonical data Payload Type Values {#b.3-canonical-data-payload-types-values}

Below is the canonical set of values that may appear in data’s payload. Each key indicates how the value field should be interpreted:

| Payload type | Meaning | Expected value type |
|---|---|---|
| goal | A high‐level description or user instruction for the entire task. | string (task description) |
| prompt | The full LLM prompt text issued at this step. | string |
| code | Complete code snippet returned by an LLM or generated by an agent. | string (source code) |
| diff_patch | A unified‐diff or patch describing how to modify a previous code to produce a new candidate. | string (diff text) |
| graph_nodes | Optional storage of graph nodes (important for decoupling) | nodes object |
| score | A primary numeric evaluation for a candidate (e.g., a reward or objective value). | number |
| scores | (Alias) A multi‐metric evaluation; same as optional field score in JSON. | object – mapping names→numbers |
| feedback | Free‐text critique or commentary (LLM‐generated or human). | string |
| error | Any error or exception information produced while executing a candidate. | string (stack trace/text) |
| validation_result | Outcome of a formal check or test suite, often a pass/fail or detailed test log. | string or small object |
| hypothesis | A textual or structured hypothesis/reasoning proposed during exploration. | string or object |
| observation | Raw data or environment observation relevant to that step (e.g., sensor reading, function output). | string or object |
| checkpoint | A saved state reference (e.g., “use this code as a starting point later”). | string or minimal object |
| context | Any additional context (environment state, configuration, etc.) consumed/needed at that step. | object |
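
As a rough illustration of how the payload types could be checked before logging, the sketch below transcribes the "expected value type" column into a lookup table. The names `PAYLOAD_TYPES` and `validate_payload` are hypothetical and not part of this PR; mapping "object" (including the graph_nodes "nodes object") to a Python `dict` is also an assumption.

```python
from typing import Any

# Expected Python types per payload type, transcribed from the table.
# Assumption: "object" is represented as a dict, "number" as int or float.
PAYLOAD_TYPES: dict[str, tuple[type, ...]] = {
    "goal": (str,),
    "prompt": (str,),
    "code": (str,),
    "diff_patch": (str,),
    "graph_nodes": (dict,),
    "score": (int, float),
    "scores": (dict,),
    "feedback": (str,),
    "error": (str,),
    "validation_result": (str, dict),
    "hypothesis": (str, dict),
    "observation": (str, dict),
    "checkpoint": (str, dict),
    "context": (dict,),
}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_payload(data: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems: list[str] = []
    for key, value in data.items():
        expected = PAYLOAD_TYPES.get(key)
        if expected is None:
            problems.append(f"{key}: unknown payload type")
        elif isinstance(value, bool) or not isinstance(value, expected):
            names = " or ".join(t.__name__ for t in expected)
            problems.append(
                f"{key}: expected {names}, got {type(value).__name__}"
            )
        elif key == "scores" and not all(_is_number(v) for v in value.values()):
            problems.append("scores: every metric value must be a number")
    return problems


print(validate_payload({"code": "x = 1", "scores": {"accuracy": 0.9}}))
print(validate_payload({"score": "high", "plan": "unlisted key"}))
```

Returning a list of problems instead of raising lets a caller decide whether an unknown payload type should be rejected or only flagged, for example through `metadata.status`.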

@allenanie
Member

Very excited about this actually -- let's discuss more on Friday :D

doxav added 3 commits July 23, 2025 17:33
support heapq
all unit tests of TraceMemoryDB passing
added first TraceMemoryDB test use cases (DEMO how to replace optimizer buffer and log)
@doxav
Contributor Author

doxav commented Jul 25, 2025

FYI: I'm progressing by drafting some of the key use cases to better design the API

@allenanie
Member

allenanie commented Aug 15, 2025

Technically, is this a Memory Module for the Optimizer (treating the Optimizer as an Agent)? I think you already mentioned this in your doc -- but any chance of making it part of the "agent-memory interface"? (i.e., defining the ways an optimizer agent will interact with the memory database to retrieve and store memory?)

@doxav
Contributor Author

doxav commented Aug 21, 2025

This is a shared Memory mechanism available to the Optimizer(s) and the Trainer(s).

Sure, it could be the agent memory interface => you can see some use cases in test_trace_memory_db.py
