Description
The subgraph processing loop is currently entirely sequential. Pipelining is a way to parallelize it. The first target for this is the block stream, so that while a block is being processed the next blocks are already being scanned for triggers. This could reduce indexing time by up to 50%, though for most subgraphs that number is more like 20%. A few points to consider:
The block stream must be made independent from the subgraph store
Currently the block stream depends on a subgraph_store:

subgraph_store: Arc<dyn WritableStore>,
This is for two reasons: first, to keep track of the current block pointer for the subgraph
let subgraph_ptr = ctx.subgraph_store.block_ptr()?;
which can be solved simply by having the block stream keep track of the last block pointer it emitted. Second, to update the synced status of the subgraph
fn update_subgraph_synced_status(&self) -> Result<(), Error> {
This responsibility should be moved to the instance manager. Maybe BlockStreamEvent should gain a Done variant to signal that the synced status should be checked; a possible shape is sketched below.
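As a concrete illustration, the stream's state and events could look roughly like this. This is only a sketch: BlockStreamState, last_emitted, and the exact variant list are made up for illustration, and only the Done idea comes from the proposal above.

```rust
// Stubs standing in for graph-node's real types.
pub struct BlockPtr;
pub struct BlockWithTriggers;

/// The stream remembers the last pointer it emitted itself, instead
/// of asking the WritableStore for the subgraph's block pointer.
pub struct BlockStreamState {
    last_emitted: Option<BlockPtr>,
}

/// Events as they might look with a `Done` variant that tells the
/// instance manager to check whether the subgraph is now synced.
pub enum BlockStreamEvent {
    Block(BlockWithTriggers),
    Revert(BlockPtr),
    Done,
}
```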
The pipelining mechanism
An attractive solution is to implement this as an adapter on top of the existing block stream: a tokio task that pulls blocks from the block stream and puts them in a channel. Buffering a single range will likely be enough for the optimization to be effective. However, the fact that the block stream scans blocks in batches (ranges) but emits them as individual blocks will require some async cleverness in this adapter. Possibly the block stream itself should emit batches (a stream of Vec<BlockWithTriggers>) and the adapter would expose it as a stream of individual blocks. A sketch of that adapter follows.
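A minimal sketch of the adapter, assuming the block stream is changed to yield Vec<BlockWithTriggers> batches. The function name pipelined and the buffered_ranges parameter are made up for illustration; only the bounded-channel idea comes from the proposal above.

```rust
use futures::{Stream, StreamExt};
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;

// Stub standing in for graph-node's real type.
pub struct BlockWithTriggers;

/// Wraps a stream of scanned ranges in a bounded channel so that the
/// next range is scanned while the current one is being processed,
/// then flattens the ranges back into individual blocks.
/// `buffered_ranges` must be at least 1.
pub fn pipelined<S>(
    batches: S,
    buffered_ranges: usize,
) -> impl Stream<Item = BlockWithTriggers>
where
    S: Stream<Item = Vec<BlockWithTriggers>> + Send + 'static,
{
    let (tx, rx) = mpsc::channel(buffered_ranges);

    // The producer task runs ahead of the consumer, suspending only
    // once `buffered_ranges` ranges are already waiting in the channel.
    tokio::spawn(async move {
        futures::pin_mut!(batches);
        while let Some(batch) = batches.next().await {
            // A send error means the consumer hung up; stop scanning.
            if tx.send(batch).await.is_err() {
                break;
            }
        }
    });

    // Expose the buffered ranges as a stream of individual blocks.
    ReceiverStream::new(rx).flat_map(futures::stream::iter)
}
```

With buffered_ranges set to 1 this gives exactly one range of lookahead, which per the above should be enough for the optimization to be effective.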
Stopwatch metrics
Right now the block stream uses only the "scan_blocks" stopwatch section. We could keep that name for backwards compatibility with the dashboards and just redefine "scan_blocks" to mean the time spent waiting for the block stream to emit a block, as sketched below.
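On the consumer side that could look roughly like this. The stopwatch types are minimal stand-ins written for this sketch, not graph-node's verified metrics API.

```rust
use futures::{Stream, StreamExt};

// Minimal stand-ins for graph-node's stopwatch types, for illustration.
pub struct StopwatchMetrics;
pub struct Section;
impl StopwatchMetrics {
    pub fn start_section(&self, _name: &str) -> Section {
        Section
    }
}
impl Section {
    pub fn end(self) {}
}
pub struct BlockWithTriggers;

/// "scan_blocks" now measures only the time the consumer spends
/// waiting on the pipelined stream; the actual scanning happens
/// concurrently in the adapter's task.
pub async fn next_block<S>(
    stream: &mut S,
    stopwatch: &StopwatchMetrics,
) -> Option<BlockWithTriggers>
where
    S: Stream<Item = BlockWithTriggers> + Unpin,
{
    let section = stopwatch.start_section("scan_blocks");
    let block = stream.next().await;
    section.end();
    block
}
```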