
tamnd · 2026-06-15T06:42:25Z

The read commands each answer one question about one object: a video's metadata, a channel's uploads, a playlist's items. `discover` chains them. From a seed video, channel, or playlist, it follows that object's links outward hop by hop, streaming one row per node as each node is reached.

The walker

The traversal lives in the library (`youtube/discover.go`) behind a small `grapher` interface, the exact subset of `*Client` it needs. Because of that seam, the BFS is tested hermetically against a fake in-memory graph with no network access: the bounds, ordering, dedup, and degradation are all covered by unit tests rather than integration tests.

Nine edges across five node kinds:

| Edge | From | To | Gated | Meaning |
|---|---|---|---|---|
| `channel` | video | channel | no | the uploader |
| `related` | video | video | no | a related video |
| `comments` | video | comment | yes | a comment |
| `uploads` | channel | video | no | an upload |
| `playlists` | channel | playlist | no | an owned playlist |
| `community` | channel | post | no | a community post |
| `items` | playlist | video | no | a playlist item |
| `owner` | playlist | channel | no | the owner |
| `commenter` | comment | channel | yes | the comment's author |
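The edge table can be carried as plain data. In this sketch (the `EdgeSpec` type and the helper names are hypothetical), `edgesFrom` picks the enabled edges that leave a node kind, in table order, and `nodeKinds` confirms that the nine edges touch five kinds.

```go
package main

import "fmt"

// EdgeSpec mirrors one row of the edge table; names are hypothetical.
type EdgeSpec struct {
	Name  string
	From  string
	To    string
	Gated bool // true for the comment edges that Restricted Mode can refuse
}

var edgeSpecs = []EdgeSpec{
	{"channel", "video", "channel", false},
	{"related", "video", "video", false},
	{"comments", "video", "comment", true},
	{"uploads", "channel", "video", false},
	{"playlists", "channel", "playlist", false},
	{"community", "channel", "post", false},
	{"items", "playlist", "video", false},
	{"owner", "playlist", "channel", false},
	{"commenter", "comment", "channel", true},
}

// edgesFrom returns the enabled edges leaving a node of the given kind, in
// table order, which keeps the BFS expansion order deterministic.
func edgesFrom(kind string, enabled map[string]bool) []EdgeSpec {
	var out []EdgeSpec
	for _, e := range edgeSpecs {
		if e.From == kind && enabled[e.Name] {
			out = append(out, e)
		}
	}
	return out
}

// nodeKinds collects every kind that appears at either end of an edge.
func nodeKinds() map[string]bool {
	kinds := map[string]bool{}
	for _, e := range edgeSpecs {
		kinds[e.From] = true
		kinds[e.To] = true
	}
	return kinds
}

func main() {
	all := map[string]bool{}
	for _, e := range edgeSpecs {
		all[e.Name] = true
	}
	for _, e := range edgesFrom("video", all) {
		fmt.Println(e.Name, e.Gated)
	}
	fmt.Println(len(edgeSpecs), "edges,", len(nodeKinds()), "node kinds")
}
```

Keeping the table in one slice means the gated flag and the expansion order live next to the edge definitions instead of in the walker.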

Presets bundle the edges by intent (`content`, `feed`, `comments`, `all`), with `content` as the default because it spans every seed kind, so a plain `discover` does the obvious thing with no flags. Preset names are kept disjoint from edge names, so naming an edge follows just that link.
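A sketch of resolving a comma-separated edge spec, assuming presets and edge names share one namespace. Only three presets are shown, the membership of `content` and `feed` is assumed for illustration, and `parseEdges` and `disjoint` are hypothetical names. When no preset shares a name with an edge, every token resolves to exactly one meaning.

```go
package main

import (
	"fmt"
	"strings"
)

// edgeNames lists the nine edges from the edge table.
var edgeNames = []string{"channel", "related", "comments", "uploads",
	"playlists", "community", "items", "owner", "commenter"}

// presets maps a preset name to its edges. Membership of content and feed is
// assumed for illustration; all is every edge.
var presets = map[string][]string{
	"content": {"channel", "related", "uploads", "playlists", "items", "owner"},
	"feed":    {"uploads", "community"},
	"all":     edgeNames,
}

const defaultPreset = "content"

// parseEdges expands preset and edge names into a de-duplicated edge list,
// keeping first-seen order. An empty spec means the default preset.
func parseEdges(spec string) ([]string, error) {
	if strings.TrimSpace(spec) == "" {
		spec = defaultPreset
	}
	isEdge := map[string]bool{}
	for _, e := range edgeNames {
		isEdge[e] = true
	}
	seen := map[string]bool{}
	var out []string
	add := func(e string) {
		if !seen[e] {
			seen[e] = true
			out = append(out, e)
		}
	}
	for _, tok := range strings.Split(spec, ",") {
		tok = strings.TrimSpace(tok)
		if p, ok := presets[tok]; ok {
			for _, e := range p {
				add(e)
			}
		} else if isEdge[tok] {
			add(tok) // a bare edge name follows just that link
		} else {
			return nil, fmt.Errorf("unknown edge or preset %q", tok)
		}
	}
	return out, nil
}

// disjoint reports whether no preset shares a name with an edge.
func disjoint() bool {
	for name := range presets {
		for _, e := range edgeNames {
			if name == e {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(parseEdges(""))
	fmt.Println(parseEdges("feed,commenter"))
	fmt.Println(disjoint())
}
```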

Tier-less, with graceful degradation

Unlike x-cli, there are no scrape tiers to gate, so no edge is dropped up front. The only walled edges are the two that touch comments (`comments` and `commenter`), which YouTube's per-IP Restricted Mode refuses on some networks. The walk attempts them anyway and, on failure, degrades to a one-line note and continues on the rest of the graph instead of failing the whole walk. A seed that cannot be fetched is still fatal, matching a single read; failures deeper in the walk become notes.

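Depth, fanout, and a total node budget bound the walk, so it always terminates, even with uncapped fanout. Edges are recorded eagerly, including links to nodes that were already visited, so the stored graph stays complete while nodes stay de-duplicated by an alias-collapsing key: one object reached under different aliases is visited only once.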

Persisting

`ytb discover --store` tees the walk into the typed crawl store and records each traversed link in a new `edges` table, so a live walk doubles as a crawl you can query afterwards with `ytb db query`. The existing seed/crawl/queue/jobs worklist crawler is untouched; `discover` complements it by finding a worklist through walking rather than draining one.

Tests and docs

Hermetic walker tests cover edge parsing, seed classification, BFS order, presets, the comment degradation path, the budget and fanout caps, dedup, a fatal seed, and depth zero. The docs gain a graph-discovery guide, a persist-a-walk section in the store guide, and a `discover` entry in the CLI reference.

Verified: with `CGO_ENABLED=0`, `go build ./...`, `go vet ./...`, and `go test ./...` are green; `gofmt` is clean; `go mod tidy` is a no-op; and a live walk was confirmed against a real video.


tamnd added 2 commits June 15, 2026 13:41

Add v0.4.0 release notes for discover (acc61f8)

tamnd merged commit 8432603 into main Jun 15, 2026
7 checks passed

tamnd deleted the discover-graph-bfs branch June 15, 2026 12:53


Add discover, a breadth-first walk of the YouTube graph #9

tamnd merged 2 commits into main from discover-graph-bfs

tamnd commented Jun 15, 2026
