docs: replace core-components with architecture page#1323
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
d2251b3 to
472627d
Compare
|
|
||
| Each task attempt flows through three steps. The resulting trajectory is called a *rollout*: | ||
|
|
||
| 1. **Initialize** — The agent receives a task row from the dataset and initializes a session on the Resources Server, which sets up isolated state for this task. |
There was a problem hiding this comment.
initializes a session on the Resources Server
Not always, right? eg SWE RL
There was a problem hiding this comment.
I think it should always be initialized by the resources server. i opened a separate issue to track decoupling for SWE here #1249
|
|
||
| ## Where Tools Live | ||
|
|
||
| Tools exist on a spectrum — some belong to the agent, some belong to the environment: |
There was a problem hiding this comment.
| Tools exist on a spectrum — some belong to the agent, some belong to the environment: | |
| Tools can be implemented in an agent server, but it is useful to decouple them with a resources server for reuse and scalability: |
There was a problem hiding this comment.
Users with existing agents will already have tools in their agent, and they should be able to use Gym without modifying their agent. Let me think on how to make this clearer in the wording
There was a problem hiding this comment.
I prefer the initial wording. Many users will already have tools in their agent.
There was a problem hiding this comment.
kept the bullet wording, updated the intro sentence to provide additional clarity
| Tools exist on a spectrum — some belong to the agent, some belong to the environment: | ||
|
|
||
| - **Agent-side tools** are part of the agent harness. They're capabilities the agent brings regardless of which environment it runs in (e.g., OpenHands brings file editing and terminal tools). | ||
| - **Environment-side tools** are part of the Resources Server. They're capabilities the environment provides to any agent that connects (e.g., a `run_tests` endpoint, a database query tool, a sandbox execution API). |
There was a problem hiding this comment.
this separation of agent and environment feels a bit confusing given that we say environment contains agent, then here it sounds like it doesnt
There was a problem hiding this comment.
agree - let me iterate on the wording
There was a problem hiding this comment.
renamed to environment-specific and agent-specific
| |---------|---------------| | ||
| | Dataset | JSONL: Responses API input — each row is a task for the agent | | ||
| | Agent Harness | FastAPI Agent Server or your own agent via HTTP | | ||
| | Verifier + State | FastAPI Resources Server | |
There was a problem hiding this comment.
Do we also add environment tools here?
There was a problem hiding this comment.
good question, i originally had it, but then felt that would be confusing because i dont have that as a core concept in the Environments concepts page. let me think on this
There was a problem hiding this comment.
I first was thinking to put it in, but now I lean towards leaving it out. This is because tools can be in either the agent harness or in the state (resources server). It's too confusing to introduce here and it's better down below expanded upon.
| | Dataset | JSONL: Responses API input — each row is a task for the agent | | ||
| | Agent Harness | FastAPI Agent Server or your own agent via HTTP | | ||
| | Verifier + State | FastAPI Resources Server | | ||
| | Model | FastAPI Model Server or managed by your own agent | |
There was a problem hiding this comment.
Don't understand what "managed by your own agent" means.
We should be explicit (maybe) that you can use both hosted models and local models, perhaps? Unless it's obvious to user.
There was a problem hiding this comment.
Ah, I see. You probably mean if the model is TIED to the agent harness.
Claude code, for example, will let you bring your own endpoint or it can be a default out of the box.
Nevermind, I think this is good.
|
|
||
| ### Agent Server | ||
|
|
||
| Hosts built-in agent harnesses, including NeMo Gym native harnesses and integrated open-source harnesses like OpenHands. |
There was a problem hiding this comment.
Closed source harnesses will also be supported. Claude code, for example.
There was a problem hiding this comment.
I dropped 'open source'. once we have claude code we could add it explicitly
| Manages the verifier, state, and environment-specific tools: | ||
|
|
||
| - **Environment Tools** — environment-specific capabilities available to any agent (e.g., code execution, database queries, API calls) | ||
| - **State** — isolated per rollout via session IDs. Some environments are stateless (e.g., math verification), while others provision full container environments (e.g., a Docker container with a specific repo checkout for SWE-Bench-style tasks). |
There was a problem hiding this comment.
stateless from task to task? Not sure if right place to address. Like does stateless mean that each task has a fresh state or something else?
There was a problem hiding this comment.
wait building on this it sounds like yes, it's isolated per single task row in the jsonl.
I don't understand how provisioning a container environment is related to statelessness. You can still be stateless and have a container environment. What are we trying to say here? Kinda open q
There was a problem hiding this comment.
updated the language to clarify
69f5161 to
888d2e1
Compare
ananthsub
left a comment
There was a problem hiding this comment.
these definitions make the architecture much clearer!
5b8df16 to
98cad74
Compare
Refactor the "Environment Components" page into an "Architecture" page that bridges conceptual components (from the environments concept page) with NeMo Gym's server-based implementation. Clarifies composability, external agent/model support, tool ownership, and the task execution flow. Signed-off-by: Chris Wing <[email protected]>
Address reviewer feedback: replace ambiguous "stateless" framing with an isolation spectrum, and use consistent "environment-specific tools" / "agent-specific tools" naming throughout. Signed-off-by: Chris Wing <[email protected]>
98cad74 to
caf87ad
Compare
Co-authored-by: Ananth Subramaniam <[email protected]> Signed-off-by: Chris Wing <[email protected]>
caf87ad to
6cdb6d8
Compare
Summary
core-components.mdx) with a new "Architecture" page that bridges conceptual components with NeMo Gym's server implementation.Builds on the concept pages added in #1265.
Test plan
fern devand verify the Architecture page renders correctlyarchitecture.mdxand no longer showscore-components