Description
The subgraph processing loop is currently entirely sequential. Pipelining is a way to parallelize it. The first target for this is the block stream, so that while a block is being processed the next blocks are already being scanned for triggers. This could reduce indexing time by up to 50%, though for most subgraphs that number is more like 20%. A few points to consider:
The block stream must be made independent from the subgraph store
Currently the block stream depends on a subgraph_store:

subgraph_store: Arc<dyn WritableStore>,
This is for two reasons: first, to keep track of the current block pointer for the subgraph
let subgraph_ptr = ctx.subgraph_store.block_ptr()?;
which can be solved simply by having the block stream keep track of the last block pointer it emitted. Second, to update the synced status of the subgraph
fn update_subgraph_synced_status(&self) -> Result<(), Error> {
This responsibility should be moved to the instance manager. Maybe BlockStreamEvent should gain a Done variant to signal that the synced status should be checked; a possible shape is sketched below.
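As a concrete illustration, the stream's state and events could look roughly like this. This is only a sketch: BlockStreamState, last_emitted, and the exact variant list are made up for illustration, and only the Done idea comes from the proposal above.

```rust
// Stubs standing in for graph-node's real types.
pub struct BlockPtr;
pub struct BlockWithTriggers;

/// The stream remembers the last pointer it emitted itself, instead
/// of asking the WritableStore for the subgraph's block pointer.
pub struct BlockStreamState {
    last_emitted: Option<BlockPtr>,
}

/// Events as they might look with a `Done` variant that tells the
/// instance manager to check whether the subgraph is now synced.
pub enum BlockStreamEvent {
    Block(BlockWithTriggers),
    Revert(BlockPtr),
    Done,
}
```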
The pipelining mechanism
An attractive solution is to implement this as an adapter on top of the existing block stream: a tokio task that pulls blocks from the block stream and puts them in a channel. Buffering a single range will likely be enough for the optimization to be effective. However, the fact that the block stream scans blocks in batches (ranges) but emits them as individual blocks will require some async cleverness in this adapter. Possibly the block stream itself should emit batches (a stream of Vec<BlockWithTriggers>) and the adapter would expose it as a stream of individual blocks. A sketch of that adapter follows.
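A minimal sketch of the adapter, assuming the block stream is changed to yield Vec<BlockWithTriggers> batches. The function name pipelined and the buffered_ranges parameter are made up for illustration; only the bounded-channel idea comes from the proposal above.

```rust
use futures::{Stream, StreamExt};
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;

// Stub standing in for graph-node's real type.
pub struct BlockWithTriggers;

/// Wraps a stream of scanned ranges in a bounded channel so that the
/// next range is scanned while the current one is being processed,
/// then flattens the ranges back into individual blocks.
/// `buffered_ranges` must be at least 1.
pub fn pipelined<S>(
    batches: S,
    buffered_ranges: usize,
) -> impl Stream<Item = BlockWithTriggers>
where
    S: Stream<Item = Vec<BlockWithTriggers>> + Send + 'static,
{
    let (tx, rx) = mpsc::channel(buffered_ranges);

    // The producer task runs ahead of the consumer, suspending only
    // once `buffered_ranges` ranges are already waiting in the channel.
    tokio::spawn(async move {
        futures::pin_mut!(batches);
        while let Some(batch) = batches.next().await {
            // A send error means the consumer hung up; stop scanning.
            if tx.send(batch).await.is_err() {
                break;
            }
        }
    });

    // Expose the buffered ranges as a stream of individual blocks.
    ReceiverStream::new(rx).flat_map(futures::stream::iter)
}
```

With buffered_ranges set to 1 this gives exactly one range of lookahead, which per the above should be enough for the optimization to be effective.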
Stopwatch metrics
Right now the block stream uses only the "scan_blocks" stopwatch section. We could keep that name for backwards compatibility with the dashboards and just redefine "scan_blocks" to mean the time spent waiting for the block stream to emit a block, as sketched below.
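On the consumer side that could look roughly like this. The stopwatch types are minimal stand-ins written for this sketch, not graph-node's verified metrics API.

```rust
use futures::{Stream, StreamExt};

// Minimal stand-ins for graph-node's stopwatch types, for illustration.
pub struct StopwatchMetrics;
pub struct Section;
impl StopwatchMetrics {
    pub fn start_section(&self, _name: &str) -> Section {
        Section
    }
}
impl Section {
    pub fn end(self) {}
}
pub struct BlockWithTriggers;

/// "scan_blocks" now measures only the time the consumer spends
/// waiting on the pipelined stream; the actual scanning happens
/// concurrently in the adapter's task.
pub async fn next_block<S>(
    stream: &mut S,
    stopwatch: &StopwatchMetrics,
) -> Option<BlockWithTriggers>
where
    S: Stream<Item = BlockWithTriggers> + Unpin,
{
    let section = stopwatch.start_section("scan_blocks");
    let block = stream.next().await;
    section.end();
    block
}
```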