Commit 9acd209

Merge pull request #51 from cybertronai/docs/build-notes
docs: add BUILD_NOTES.md — meta dev report on how v1 was built
2 parents d95278f + 26cfa77 commit 9acd209

1 file changed

Lines changed: 238 additions & 0 deletions

BUILD_NOTES.md

@@ -0,0 +1,238 @@
1+
# Session Report: Building hinton-problems via Agent Teams
2+
3+
**Session ID:** `d8af4bb0-1435-4528-a5da-ac91c30b7bcb`
4+
**Project:** SutroYaro (the lead session was checked out there)
5+
**Output:** [cybertronai/hinton-problems](https://github.com/cybertronai/hinton-problems) — 53 stubs, all merged
6+
**Span:** 2026-05-01 21:52 → 2026-05-04 03:35 (about 54 hours end to end by these timestamps; counted as ~30 wall hours, with overnight idle gaps)
7+
**Source:** the full jsonl is at `~/.claude/projects/-Users-yadkonrad-dev-dev-year26-feb26-SutroYaro/d8af4bb0-...jsonl` (5.1 MB, 3,033 events)
8+
9+
This report summarizes what the session log actually shows, in a form suitable for a team video.
10+
11+
---
12+
13+
## TL;DR for the video opener
14+
15+
- 53 Hinton-paper stubs implemented in 30 wall hours, ~800k tokens, on Claude Opus 4.7 with the 1M-token context window.
16+
- The **SPEC was a single GitHub issue** ([#1](https://github.com/cybertronai/hinton-problems/issues/1)).
17+
- The **dispatcher was Claude Code's `agent-teams` primitive**: one team, eleven waves (0 through 10), fresh teammates per wave.
18+
- **One human prompt of intent ("use parallel team of agents… DONT USE THE SKILL CRAP!")** turned a serial workflow into an 11-wave parallel build.
19+
- All work routed through GitHub: 18 issues, 15 PRs, audits via subagents, merges only on user approval.
20+
21+
---
22+
23+
## The actual chain of events
24+
25+
| Time (UTC) | Event |
26+
|---|---|
27+
| 05-01 21:52 | Session opens in SutroYaro |
28+
| 05-01 21:53 | Yad invokes the `sutro-sync` skill — the only skill call in the session — to pull Telegram chat, Google Docs, and GitHub context |
29+
| 05-01 21:59 | Yad: *"lets focus on hinton, pull it into may26, ok, shall we pull hinto-porblems, make a branch and then try doing SPEC, branch, and then github issues / what do we think?"* — the SPEC-first idea is born |
30+
| 05-01 22:04 | Yad: *"Don't merge anything, we need to pull it, we need to open up a GitHub issue and create the spec as a GitHub issue saying that's what you will follow"* — SPEC = issue |
31+
| 05-01 22:13 | **Issue #1 opened**: *Spec: minimum implementation requirements for stub problems (v1)*. Authored by `agent-0bserver07 (Claude Code) on behalf of Yad`. Lists required files, 8-section README template, reproducibility rules, acceptance checklist. |
32+
| 05-02 05:21 | Yad: *"ok do u see Yaroslav's comment, can that help with pre-context for our waves of agents?"* — Yaroslav had commented on issue #1 overnight |
33+
| 05-02 05:48 | Yad: *"deploy all the waves one after another given Yaroslav's comment and our spec and the local repo of hintons problems, and do branches per waves"* |
34+
| 05-02 05:51 | Yad: ***"I need you to use parallel team of agents that claude code has built in, DONT USE THE SKILL CRAP! https://code.claude.com/docs/en/agent-teams"*** |
35+
| 05-02 05:51 | Lead dispatches a `claude-code-guide` subagent to read the agent-teams docs |
36+
| 05-02 05:53 | **`TeamCreate`** — team `hinton-impl` born. agent_type `orchestrator`. Description: *"Each teammate owns one stub, works in its own worktree at `/may26/hinton-problems-waves/`, pushes branch `impl/<slug>`, opens PR. Lead is the SutroYaro session; reviews PRs and merges only on user approval."* |
37+
| 05-02 05:55 → 06:07 | **Wave 0**: single-stub spike. `xor-builder` teammate spawned, builds, opens PR #3. Sanity check passes. |
38+
| 05-02 06:07 → 09:20 | **Wave 1**: 3 teammates (`n-bit-parity-builder`, `symmetry-builder`, `negation-builder`). All three open PRs. Then shut down via `SendMessage(shutdown_request)`. |
39+
| 05-02 09:21 → 13:46 | **Wave 2**: 5 teammates (binary-addition, encoder-3-parity, encoder-4-3-4, encoder-8-3-8, encoder-backprop-8-3-8). |
40+
| 05-02 13:49 → 15:37 | **Wave 3**: 6 teammates (encoder-40-10-40, shifter, grapheme-sememe, distributed-to-local-bottleneck, t-c-discrimination, recurrent-shift-register). |
41+
| 05-02 14:48 | Yad: *"why are there so many PRs? Weren't there supposed to be 5 waves?"* — turning point. From here, **multiple stubs per PR**, one PR per wave. |
42+
| 05-02 15:37 → 20:17 | **Waves 4 → 7**: 6 stubs each. PR titles read like a tour: *tier-B 1980s-90s classics*, *Helmholtz/MDL/Imax/fast-weights*, *TRBM/RTRBM/gated-RBM/RNN/factorial-VQ/eGLOM*, *MNIST cluster (FF + distillation + capsule precursor)*. |
43+
| 05-02 21:10 | **Wave 8**: 6 stubs (external-data + harder architectures). |
44+
| 05-03 04:20 | **Wave 9**: 5 stubs. 50/53 v1 done. |
45+
| 05-03 22:56 | **Wave 10**: final 3 stubs (AIR + matrix capsules). **v1 complete at 53/53.** |
46+
| 05-03 23:18 → 23:55 | Docs PRs: RESULTS.md, MkDocs site, **switch to mdBook (after Yad pushed back hard)**, 4-column catalog tables. |
47+
| 05-04 00:34 | Introduction page in mdBook style + Unlicense. |
48+
| 05-04 00:55 | **v1 gap analysis** issue opened — umbrella tracker for 25 partials + 1 non-replication. |
49+
| 05-04 01:08 | Yad: *"so whats left, coz we aending this sessions 800k"* — context budget watermark. |
50+
| 05-04 03:35 | Last event in the session log. |
51+
52+
---
53+
54+
## The SPEC (issue #1) — the actual contract
55+
56+
The contract between Yad and every teammate was a single GitHub issue. Not chat. Not a system prompt. An issue every PR linked back to.
57+
58+
It defined:
59+
- **Required files** per stub: `<name>.py`, `README.md`, `make_<name>_gif.py`, `visualize_<name>.py`, `<name>.gif`, `viz/`
60+
- **8 README sections**: Header / Problem / Files / Running / Results / Visualizations / Deviations / Open questions
61+
- **Reproducibility rules**: seed exposed via CLI, all hyperparameters in Results, command in §Running reproduces the number
62+
- **Acceptance checklist** (8 checkboxes): reproduces under 5 min on a laptop / final accuracy with seed / GIF / weight viz / training curves / deviations section / open questions / no `NotImplementedError`
63+
- **Out of scope for v1**: energy metric (deferred to v2 ByteDMD), GPU / large-scale runs
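
The file and README requirements above lend themselves to a mechanical check. The following is a minimal sketch, not the checker used in the session: the function names are hypothetical, "Header" is assumed to mean the README title line, and matching sections by heading text is an assumption about how the template is written.

```python
from pathlib import Path

# Required artifacts per stub, as listed in the spec; {name} is the stub name.
REQUIRED_FILES = ("{name}.py", "README.md", "make_{name}_gif.py",
                  "visualize_{name}.py", "{name}.gif")
REQUIRED_DIRS = ("viz",)

# "Header" is treated as the README title; the other seven sections are
# expected as headings. Matching on heading text is an assumption.
README_SECTIONS = ("Problem", "Files", "Running", "Results",
                   "Visualizations", "Deviations", "Open questions")


def missing_artifacts(stub_dir: Path, name: str) -> list[str]:
    """Return spec-required paths that are absent from stub_dir."""
    missing = [t.format(name=name) for t in REQUIRED_FILES
               if not (stub_dir / t.format(name=name)).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (stub_dir / d).is_dir()]
    return missing


def missing_sections(readme_text: str) -> list[str]:
    """Return required section names with no matching markdown heading."""
    headings = [line.lstrip("#").strip().lower()
                for line in readme_text.splitlines() if line.startswith("#")]
    return [s for s in README_SECTIONS
            if not any(s.lower() in h for h in headings)]
```

An empty list from both functions would mean the stub passes the structural part of the checklist; accuracy and runtime still need a real run.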
64+
65+
That's the entire DSL. Every stub had to fit.
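
One way a stub could satisfy the reproducibility rules (seed exposed via CLI, hyperparameters reported alongside the result) is sketched below. The flag names `--seed`, `--epochs`, and `--lr` and the stand-in "training" are illustrative assumptions, not taken from any stub in the repository.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse the hypothetical stub CLI; every knob that affects the result is a flag."""
    parser = argparse.ArgumentParser(description="Hypothetical stub entry point")
    parser.add_argument("--seed", type=int, default=0, help="RNG seed for the run")
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--lr", type=float, default=0.1)
    return parser.parse_args(argv)


def run(args):
    """Stand-in for training: the output depends only on the parsed arguments."""
    rng = random.Random(args.seed)  # all randomness flows from the CLI seed
    score = sum(rng.random() for _ in range(args.epochs)) / args.epochs
    # Report the seed and hyperparameters with the number they produced.
    return {"seed": args.seed, "epochs": args.epochs, "lr": args.lr, "score": score}
```

Calling `run(parse_args(["--seed", "7"]))` twice returns the same dictionary, which is the property the spec asks of the command in the Running section.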
66+
67+
---
68+
69+
## The orchestration model
70+
71+
```
                 ┌──────────────────┐
                 │ hinton-impl team │  (TeamCreate, agent_type=orchestrator)
                 └─────────┬────────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
         Wave 0/1/2…  SendMessage  Subagent dispatches
              │                         │
              ▼                         ▼
        ┌───────────┐            ┌──────────────┐
        │ teammates │            │ Agent tool   │
        │ <slug>-   │            │ (general-    │
        │ builder   │            │ purpose,     │
        │ x53       │            │ Explore)     │
        └─────┬─────┘            └──────┬───────┘
              │                         │
              ▼                         ▼
        worktree branch          PR audits, code reads
        impl/<slug>
              │
              ▼
        gh pr create
              │
              ▼
        PR review + merge (Yad approves)
              │
              ▼
        SendMessage(shutdown_request)
              │
              ▼
        Next wave starts fresh
```
104+
105+
**Why fresh teammates per wave**: each teammate burns context as it builds and tests. Shutting down between waves keeps later waves running on full context windows. The lead persists; the workers turn over.
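
The turnover pattern can be sketched as a plain loop. This is an illustration of the idea only and does not call any Claude Code API: the `Teammate` class and both functions are hypothetical, while the `<slug>-builder` and `impl/<slug>` naming follows the timeline and the `TeamCreate` description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Teammate:
    name: str    # "<slug>-builder", as in the timeline
    branch: str  # "impl/<slug>", as in the TeamCreate description


def plan_wave(slugs):
    """Build a fresh roster for one wave; nothing carries over from earlier waves."""
    return [Teammate(f"{s}-builder", f"impl/{s}") for s in slugs]


def run_waves(waves):
    """Yield (wave_index, roster) in order, with a brand-new roster each time."""
    for index, slugs in enumerate(waves):
        roster = plan_wave(slugs)
        yield index, roster
        # In the session, PR review happened here, followed by
        # SendMessage(shutdown_request) before the next wave started.


# Waves 0 and 1 from the timeline.
WAVES = [["xor"], ["n-bit-parity", "symmetry", "negation"]]
for index, roster in run_waves(WAVES):
    print(index, [t.name for t in roster])
```

The lead corresponds to the loop itself: it outlives every roster, which is why only its context accumulates.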
106+
107+
---
108+
109+
## What the session actually used
110+
111+
### Tool calls (in the lead session)
112+
113+
| Tool | Calls | What for |
114+
|---|---|---|
115+
| Bash | 191 | git, gh CLI, file ops, running tests |
116+
| Read | 124 | reading paper PDFs, stub code, READMEs |
117+
| Agent | **62** | subagent dispatches (see breakdown below) |
118+
| Write | 55 | new files (READMEs, scripts, configs) |
119+
| **SendMessage** | **53** | inter-teammate messaging (mostly wave shutdowns) |
120+
| TaskUpdate | 24 | shared task list maintenance |
121+
| TaskCreate | 22 | new tasks added to the team's list |
122+
| Edit | 10 | small in-place edits |
123+
| ToolSearch | 3 | loading deferred tool schemas |
124+
| WebFetch | 2 | external doc reads |
125+
| **Skill** | **1** | only `sutro-sync` at session start |
126+
| **TeamCreate** | **1** | the `hinton-impl` team itself |
127+
128+
### Subagent dispatches (Agent tool, n=62)
129+
130+
| Type | Count | Use |
131+
|---|---|---|
132+
| `general-purpose` | 54 | per-stub builders ("Build xor stub for hinton-problems") |
133+
| `Explore` | 7 | PR audits, stub correctness checks, wave reviews |
134+
| `claude-code-guide` | 1 | researched the agent-teams docs at session start |
135+
136+
### GitHub artifacts produced
137+
138+
- **18 issues created** (1 SPEC + 15 per-stub issues for early waves + 2 follow-up: v2 ByteDMD, v1 gap analysis)
139+
- **15 PRs created** (10 wave PRs + 5 docs PRs)
140+
- **6 PRs merged via `gh pr merge`** in-session (the rest were merged separately by Yad)
141+
- **24 git pushes**
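
The per-teammate Git flow behind these artifacts (worktree, `impl/<slug>` branch, PR, merge only on approval) can be sketched as command builders that return argument lists without running anything. The per-slug subdirectory under the worktree root, the PR title and body wording, and the `--merge` strategy are assumptions of this sketch; the log only records that `gh pr create` and `gh pr merge` were used.

```python
def worktree_cmd(slug, root="/may26/hinton-problems-waves"):
    """git worktree with a new impl/<slug> branch (per-slug subdirectory is assumed)."""
    return ["git", "worktree", "add", "-b", f"impl/{slug}", f"{root}/{slug}"]


def pr_create_cmd(slug, spec_issue=1):
    """gh pr create that links back to the SPEC issue (title/body text is assumed)."""
    return ["gh", "pr", "create",
            "--head", f"impl/{slug}",
            "--title", f"impl: {slug}",
            "--body", f"Implements the {slug} stub per the spec in #{spec_issue}."]


def pr_merge_cmd(pr_number, approved):
    """gh pr merge, refused unless the human has approved."""
    if not approved:
        raise PermissionError(f"PR #{pr_number} has not been approved for merge")
    return ["gh", "pr", "merge", str(pr_number), "--merge"]


print(" ".join(worktree_cmd("xor")))
```

Keeping the approval check inside the merge builder mirrors the rule that merges were gated on Yad.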
142+
143+
### Skills
144+
145+
- **One skill call.** `sutro-sync`, used once at the very start to pull Telegram + Google Docs + GitHub context.
146+
- After that, **Yad explicitly told the lead to use `agent-teams` instead of skills**: *"DONT USE THE SKILL CRAP!"*
147+
148+
That's the cleanest single data point about which mechanism is right for what:
149+
- Skill = a recipe for "do this set of steps once at start"
150+
- Agent team = parallel workers with shared task list
151+
- They serve different purposes; this session used both, just sparingly.
152+
153+
---
154+
155+
## The waves at a glance
156+
157+
| Wave | Stubs | Highlights |
158+
|---|---|---|
159+
| 0 | 1 | xor (sanity check, single teammate) |
160+
| 1 | 3 | n-bit-parity, symmetry, negation |
161+
| 2 | 5 | binary-addition + encoder family |
162+
| 3 | 6 | tier-B 1980s foundational (shifter, grapheme-sememe, etc.) |
163+
| 4 | 6 | tier-B 1980s-90s classics |
164+
| 5 | 6 | Helmholtz / MDL / Imax / fast-weights |
165+
| 6 | 6 | TRBM / RTRBM / gated-RBM / RNN / factorial-VQ / eGLOM |
166+
| 7 | 6 | MNIST cluster: FF + distillation + capsule precursor |
167+
| 8 | 6 | external-data + harder architectures |
168+
| 9 | 5 | hard stubs ahead of the final wave (50/53 v1 done) |
169+
| 10 | 3 | AIR + matrix capsules — **v1 complete at 53/53** |
170+
171+
Total: **53 stubs in 11 waves**.
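
The totals can be re-derived from the table with a running sum. A small sketch, with the per-wave counts copied from the table above:

```python
from itertools import accumulate

# Stubs per wave for waves 0 through 10, copied from the table.
STUBS_PER_WAVE = [1, 3, 5, 6, 6, 6, 6, 6, 6, 5, 3]

# running[i] is the number of stubs done once wave i has finished.
running = list(accumulate(STUBS_PER_WAVE))

for wave, (count, done) in enumerate(zip(STUBS_PER_WAVE, running)):
    print(f"wave {wave:>2}: +{count} -> {done}/53")
```

The running sum reaches 50 after wave 9 and 53 after wave 10, matching the timeline entries.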
172+
173+
---
174+
175+
## Yad's interaction pattern (the human side)
176+
177+
70 typed prompts across 30 wall hours. Most of them were one of three types:
178+
179+
**Type A — high-leverage direction (rare, big effects):**
180+
- *"shall we pull hinton-problems, make a branch and then try doing SPEC, branch, and then github issues"* — chose the SPEC-as-issue model
181+
- *"deploy multiple parallel workstreams to get things done under our supervision, by doing waves"* — chose the wave model
182+
- *"I need you to use parallel team of agents that claude code has built in, DONT USE THE SKILL CRAP!"* — chose agent-teams
183+
- *"why are there so many PRs? Weren't there supposed to be 5 waves?"* — collapsed per-stub PRs into per-wave PRs
184+
- *"why didnt u use mdbook like i asked?"* — pushed back when the lead drifted (MkDocs got swapped for mdBook)
185+
186+
**Type B — status checks (frequent, low cost):**
187+
- *"status?"* / *"status, what is left?"* / *"whats up"* — appears ~10 times. The lead summarises and continues.
188+
189+
**Type C — review and merge approvals:**
190+
- *"i dont wanna merge yet, lets do audits left and then finish the implementation other waves right?"*
191+
- *"ok wana comment on the PR that are ready to merge with evidence?"*
192+
- *"Should we get up an issue with the context of the partials and the no?"* → led to the v1 gap analysis issue
193+
194+
The session also has frustrated moments. They are part of an honest report: when the lead drifted on docs tooling, Yad swore at it, and the lead course-corrected within minutes. Worth showing in a team video as the realistic version of "human in the loop."
195+
196+
---
197+
198+
## What this session actually proves
199+
200+
1. **A SPEC issue is enough contract.** No instruction file, no role prompt — just a versioned GitHub issue every PR points at. Acceptance checklist becomes the PR review template.
201+
2. **Waves with fresh teammates beat one long-running team.** The lead persists; workers turn over per wave. This is what kept the run inside 800k tokens.
202+
3. **`agent-teams` is the dispatcher; subagents are the workers.** This session used both: `TeamCreate` to spin up the team, then `Agent(subagent_type=general-purpose)` 54 times to actually build the stubs. Each teammate's work happened inside its subagent.
203+
4. **Audits via separate Explore subagents.** 7 of the 62 Agent dispatches were reviewers, not builders. Keeps review context separate from build context.
204+
5. **GitHub is the substrate.** Issues created the work, PRs delivered it, comments coordinated with Yaroslav, merges gated on Yad. No Slack. No call.
205+
6. **One human per session is enough.** 70 prompts, mostly status checks. 5–6 of them set direction. The rest let the lead run.
206+
207+
---
208+
209+
## Concrete numbers you can quote in the video
210+
211+
- **53 / 53** Hinton-paper stubs implemented
212+
- **27 reproduce** paper claims, **25 partial** (gap documented), **1 honest non-replication**
213+
- **~30 wall hours**, with overnight idle gaps
214+
- **~800,000 tokens** on Claude Opus 4.7 (1M context)
215+
- **1 GitHub issue** as the SPEC
216+
- **1 `TeamCreate`**, **53 named teammates**, **11 waves**
217+
- **18 issues + 15 PRs** filed
218+
- **62 subagent dispatches** (54 builders + 7 auditors + 1 docs-research)
219+
- **191 bash, 124 reads, 55 writes, 53 SendMessages, 1 Skill**
220+
- **70 human prompts** total; ~6 set direction, ~10 were status checks, ~10 were merge approvals, the rest were follow-ups
221+
- **6 PRs merged in-session via `gh pr merge`**, 24 pushes
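
Before quoting these figures on screen, it may be worth confirming that the breakdowns add up to their totals. A minimal sketch, with values copied from this report and hypothetical key names:

```python
# Figures quoted in this report; the dictionary keys are made up for this sketch.
QUOTED = {
    "stubs": 53,
    "reproduce": 27, "partial": 25, "non_replication": 1,
    "agent_dispatches": 62, "builders": 54, "auditors": 7, "docs_research": 1,
    "issues": 18, "spec_issues": 1, "per_stub_issues": 15, "follow_up_issues": 2,
    "prs": 15, "wave_prs": 10, "docs_prs": 5,
}


def check_totals(q):
    """Map each breakdown to True when its parts sum to the quoted total."""
    return {
        "outcomes == stubs":
            q["reproduce"] + q["partial"] + q["non_replication"] == q["stubs"],
        "dispatch types == dispatches":
            q["builders"] + q["auditors"] + q["docs_research"] == q["agent_dispatches"],
        "issue kinds == issues":
            q["spec_issues"] + q["per_stub_issues"] + q["follow_up_issues"] == q["issues"],
        "PR kinds == PRs":
            q["wave_prs"] + q["docs_prs"] == q["prs"],
    }


for name, ok in check_totals(QUOTED).items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```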
222+
223+
---
224+
225+
## Suggested video shot list
226+
227+
1. **Open on the SPEC issue** ([#1](https://github.com/cybertronai/hinton-problems/issues/1)) on screen. *"This is the entire contract."*
228+
2. **Cut to the GitHub PRs page** filtered to "wave" — show the 10 wave PRs. *"This is what came out of it."*
229+
3. **Show the agent-teams docs page** ([code.claude.com/docs/en/agent-teams](https://code.claude.com/docs/en/agent-teams)). *"This is the primitive that made parallel cheap."*
230+
4. **Show the `TeamCreate` call** (its team name, `agent_type`, and description are quoted in the timeline above). *"One call. One team."*
231+
5. **Walk through one wave** — pick wave 5 (Helmholtz/MDL/Imax/fast-weights, 6 stubs). Show the 6 teammate names, the 6 PRs, the merged commit.
232+
6. **Show a single per-stub README** (e.g., `encoder-4-2-4`) — show how it satisfies all 8 spec sections.
233+
7. **Show the v1 gap analysis issue** at the end. *"v1 = correctness. v1.5 = paper parity. v2 = energy."*
234+
8. **Close on the bottom-line numbers** (53 / 30 hr / 800k / 1 spec / 11 waves).
235+
236+
---
237+
238+
*Generated from the live session log on 2026-05-04. Throwaway artifact — delete after the video is recorded.*
