perf: reduce SQL round-trips in scheduler hot paths #57

Merged
deepjoy merged 5 commits into main from improve-perf on Mar 19, 2026

deepjoy (Owner) commented on Mar 19, 2026

Summary

  • Batch dependency queries: Replace per-dep iterative SQL lookups (history check, active check, edge insert) with single batched queries, and swap BFS cycle detection for a recursive CTE that resolves in one round-trip.
  • Merge completion + dependency resolution into a single transaction: New `complete_with_record_and_resolve` combines the `complete_with_record` and `resolve_dependents` calls, eliminating a `BEGIN IMMEDIATE` / `COMMIT` cycle on every task completion.
  • Add fast-path flags to skip unnecessary queries: `has_paused_tasks` (atomic bool) skips the `paused_tasks()` query when nothing has been preempted; `has_tags` skips `populate_tags` when no tags exist; `check_scheduled` skips `next_run_after` when no scheduled tasks are present.
  • Lightweight task claim: New `claim_task` uses a simple `UPDATE … SET status='running'` without `RETURNING *`, patching the already-held in-memory `TaskRecord` instead of re-fetching the full row.
  • Gate pprof behind optional `profile` feature: Moves pprof from a mandatory dev-dependency to an optional feature, fixing CI builds on platforms where pprof fails to compile.
  • Add 0.4.x → 0.5.0 migration guide covering the `ModuleDomain<D>` API transition.
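
The fast-path flag idea from the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `Store`, `pause`, and `queries_issued` are hypothetical names, and an in-memory `Vec` stands in for the database so the skipped round-trip can be counted.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the task store; only the flag logic matters here.
struct Store {
    /// Raised when a task is preempted. This sketch never clears it.
    has_paused_tasks: AtomicBool,
    /// Counts simulated SQL round-trips so the saving is observable.
    queries_issued: AtomicUsize,
    /// In-memory stand-in for the rows a paused-tasks query would return.
    paused: Mutex<Vec<u64>>,
}

impl Store {
    fn new() -> Self {
        Store {
            has_paused_tasks: AtomicBool::new(false),
            queries_issued: AtomicUsize::new(0),
            paused: Mutex::new(Vec::new()),
        }
    }

    /// Record a preemption and raise the flag while still holding the lock.
    fn pause(&self, task_id: u64) {
        let mut paused = self.paused.lock().unwrap();
        paused.push(task_id);
        self.has_paused_tasks.store(true, Ordering::Release);
    }

    /// Return paused task IDs, skipping the simulated query on the fast path.
    fn paused_tasks(&self) -> Vec<u64> {
        if !self.has_paused_tasks.load(Ordering::Acquire) {
            return Vec::new(); // fast path: no round-trip at all
        }
        self.queries_issued.fetch_add(1, Ordering::Relaxed);
        self.paused.lock().unwrap().clone()
    }
}
```

A flag like this is safe as long as it can only err toward running the query: a stale `true` costs one unnecessary round-trip, whereas a stale `false` would hide paused tasks.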

deepjoy added 5 commits March 18, 2026 22:27
The iterative BFS issued one SQL round-trip per graph node, producing
O(n²) total queries for linear dependency chains (~19,900 for depth 200).
A single recursive CTE collapses each cycle check to one query,
yielding an 82% speedup at depth 200.
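
The ~19,900 figure is consistent with the triangular number 200·199/2. The sketch below reproduces that arithmetic under an assumed cost model that the commit message does not spell out: tasks are submitted one at a time along a linear chain, and the BFS check for the task at position `i` issues one query per ancestor. Both function names are hypothetical.

```rust
/// Round-trips for the iterative BFS under the assumed model: checking the
/// task at position `i` in a linear chain visits its `i` ancestors, with
/// one query per visited node.
fn bfs_round_trips(depth: u64) -> u64 {
    (0..depth).sum()
}

/// Round-trips with one recursive CTE per cycle check: every task that has
/// a dependency costs exactly one query (the first task has none).
fn cte_round_trips(depth: u64) -> u64 {
    depth.saturating_sub(1)
}
```

Under this model, depth 200 needs 19,900 round-trips with BFS and 199 with the CTE, so query count grows quadratically in one case and linearly in the other.
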
Combine completion + dependency resolution into a single transaction,
cache next_run_after to skip the query when no scheduled tasks exist,
carry tags from peek to pop to avoid a redundant populate_tags query,
skip tag queries entirely when the store has never had tags inserted,
and skip the paused_tasks query when no tasks have been preempted.

Benchmarked at ~40% improvement on dep_chain_dispatch/50 (42ms → 26ms)
and ~20-34% on fan-in dispatch benchmarks.
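
To make the transaction merge concrete, here is a toy model that records statements instead of executing them. `RecordingConn` and the placeholder statement strings are hypothetical; only the `BEGIN IMMEDIATE` / `COMMIT` framing comes from the PR description, and the function bodies are not the library's implementation.

```rust
/// Hypothetical connection that records statements instead of executing them.
#[derive(Default)]
struct RecordingConn {
    log: Vec<String>,
}

impl RecordingConn {
    fn exec(&mut self, sql: &str) {
        self.log.push(sql.to_string());
    }

    /// Number of transactions opened so far.
    fn transactions(&self) -> usize {
        self.log
            .iter()
            .filter(|s| s.as_str() == "BEGIN IMMEDIATE")
            .count()
    }
}

/// Before: completion and dependency resolution each open a transaction.
fn complete_then_resolve(conn: &mut RecordingConn, task_id: u64) {
    conn.exec("BEGIN IMMEDIATE");
    conn.exec(&format!("-- record completion of task {task_id} (placeholder)"));
    conn.exec("COMMIT");

    conn.exec("BEGIN IMMEDIATE");
    conn.exec(&format!("-- unblock dependents of task {task_id} (placeholder)"));
    conn.exec("COMMIT");
}

/// After: the same two steps share one transaction.
fn complete_with_record_and_resolve(conn: &mut RecordingConn, task_id: u64) {
    conn.exec("BEGIN IMMEDIATE");
    conn.exec(&format!("-- record completion of task {task_id} (placeholder)"));
    conn.exec(&format!("-- unblock dependents of task {task_id} (placeholder)"));
    conn.exec("COMMIT");
}
```

The merged version opens one transaction per completion instead of two, which is the cycle the commit message says it eliminates.
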
…-trips

- Batch resolve_dependency_edges: replace 3N per-dep queries (history check,
  active check, edge insert) with 3 batch queries using IN clauses
- Single-pass cycle detection: seed one recursive CTE with all dep IDs
  instead of running N separate CTEs
- Replace pop_by_id_no_tags (UPDATE RETURNING *) with claim_task (UPDATE
  only), reusing the TaskRecord already fetched by peek_next
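
Batching N per-dependency lookups into one query with an IN clause usually means generating one placeholder per ID. A minimal sketch, assuming positional `?` parameters; the table `task_history` and column `id` are hypothetical names, not taken from the PR.

```rust
/// Build a comma-separated list of `n` positional placeholders for an IN clause.
fn in_placeholders(n: usize) -> String {
    vec!["?"; n].join(", ")
}

/// Build one batched lookup covering all dependency IDs.
/// `task_history` and `id` are hypothetical names used for illustration.
fn batched_history_query(dep_count: usize) -> Option<String> {
    if dep_count == 0 {
        // An empty IN list is rejected by many SQL engines, and there is
        // nothing to look up anyway, so skip the query entirely.
        return None;
    }
    Some(format!(
        "SELECT id FROM task_history WHERE id IN ({})",
        in_placeholders(dep_count)
    ))
}
```

The caller binds the dependency IDs in order and issues a single query, so the round-trip count no longer depends on the number of dependencies.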

Benchmarked improvement on dep_fan_in_dispatch:
  width=10:  -15%  (5.2ms → 4.4ms)
  width=50:  -19%  (21.7ms → 17.5ms)
  width=100: -14%  (40.5ms → 34.8ms)
…mpatibility

Move pprof to an optional dependency gated by a `profile` feature so
benchmarks can run in CI without requiring perf_event_open. Local
profiling is still available via `cargo bench --features profile`.
Also simplify TTL expiry check in run_loop to use if-let instead of
unwrap.
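
On the code side, gating on a Cargo feature is typically done with `cfg` attributes. The sketch below assumes the feature is named `profile` as described above; `profiler_label` is a hypothetical function, and the real benchmark setup may be wired differently. In `Cargo.toml` this would usually pair with `optional = true` on the dependency and a feature entry such as `profile = ["dep:pprof"]`, which is likewise an assumption about the manifest.

```rust
/// Compiled only when the crate is built with `--features profile`.
#[cfg(feature = "profile")]
fn profiler_label() -> &'static str {
    "pprof enabled"
}

/// Default build: no profiler, so nothing in this path depends on pprof.
#[cfg(not(feature = "profile"))]
fn profiler_label() -> &'static str {
    "profiling disabled"
}
```

Because the gated code is not compiled at all in a default build, platforms where the optional dependency fails to build are unaffected.
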
deepjoy changed the title from "Improve perf" to "perf: reduce SQL round-trips in scheduler hot paths" on Mar 19, 2026
deepjoy enabled auto-merge (squash) on March 19, 2026 06:04
deepjoy merged commit 208f55b into main on Mar 19, 2026
2 checks passed
github-actions (bot) mentioned this pull request on Mar 19, 2026
github-actions (Contributor)

Benchmark Comparison

group                                       current
-----                                       -------
backoff_delay/constant                      1.00     48.1±0.50ns        ? ?/sec
backoff_delay/exponential                   1.00    189.0±1.18ns        ? ?/sec
backoff_delay/exponential_jitter            1.00    267.7±0.55ns        ? ?/sec
backoff_delay/linear                        1.00     76.5±2.20ns        ? ?/sec
batch_submit_1000                           1.00     33.4±2.71ms        ? ?/sec
byte_progress/byte_reporting_500            1.00    476.4±4.57ms        ? ?/sec
byte_progress/noop_500                      1.00    448.3±6.85ms        ? ?/sec
byte_progress_snapshot_100_tasks            1.00    134.9±2.05ms        ? ?/sec
concurrency_scaling/1                       1.00    541.4±5.11ms        ? ?/sec
concurrency_scaling/2                       1.00    494.0±6.41ms        ? ?/sec
concurrency_scaling/4                       1.00    455.7±7.12ms        ? ?/sec
concurrency_scaling/8                       1.00    453.5±5.65ms        ? ?/sec
count_by_tags/100                           1.00    126.3±3.10µs        ? ?/sec
count_by_tags/1000                          1.00    214.2±5.26µs        ? ?/sec
count_by_tags/5000                          1.00    618.3±4.42µs        ? ?/sec
dep_chain_dispatch/10                       1.00     14.7±0.18ms        ? ?/sec
dep_chain_dispatch/25                       1.00     35.5±0.65ms        ? ?/sec
dep_chain_dispatch/50                       1.00     69.9±1.03ms        ? ?/sec
dep_chain_submit/10                         1.00      3.4±0.27ms        ? ?/sec
dep_chain_submit/200                        1.00     77.7±3.72ms        ? ?/sec
dep_chain_submit/50                         1.00     17.3±0.99ms        ? ?/sec
dep_fan_in_dispatch/10                      1.00     12.2±0.18ms        ? ?/sec
dep_fan_in_dispatch/100                     1.00     98.6±1.06ms        ? ?/sec
dep_fan_in_dispatch/50                      1.00     50.2±0.78ms        ? ?/sec
dispatch_and_complete_1000                  1.00   905.4±12.39ms        ? ?/sec
dispatch_group_scaling/1                    1.00    505.5±8.51ms        ? ?/sec
dispatch_group_scaling/10                   1.00    504.3±6.17ms        ? ?/sec
dispatch_group_scaling/100                  1.00    503.8±5.74ms        ? ?/sec
dispatch_group_scaling/50                   1.00    502.9±6.44ms        ? ?/sec
dispatch_no_groups_500                      1.00    439.6±5.16ms        ? ?/sec
dispatch_one_group_500                      1.00    503.1±8.49ms        ? ?/sec
dispatch_permanent_failure_500              1.00    466.1±5.86ms        ? ?/sec
history_by_type/100                         1.00  1040.0±42.50µs        ? ?/sec
history_by_type/1000                        1.00  1047.1±32.01µs        ? ?/sec
history_by_type/5000                        1.00  1039.6±28.76µs        ? ?/sec
history_query/100                           1.00   615.3±20.68µs        ? ?/sec
history_query/1000                          1.00   621.7±10.89µs        ? ?/sec
history_query/5000                          1.00    613.5±7.78µs        ? ?/sec
history_stats/100                           1.00    144.1±1.46µs        ? ?/sec
history_stats/1000                          1.00    352.9±1.16µs        ? ?/sec
history_stats/5000                          1.00   1288.4±3.71µs        ? ?/sec
mixed_priority_dispatch_500                 1.00    456.4±6.97ms        ? ?/sec
peek_next/100                               1.00    117.5±2.63µs        ? ?/sec
peek_next/1000                              1.00    117.8±3.02µs        ? ?/sec
peek_next/5000                              1.00    118.6±3.25µs        ? ?/sec
query_by_tags/100                           1.00  1282.8±153.31µs        ? ?/sec
query_by_tags/1000                          1.00     10.7±1.31ms        ? ?/sec
query_by_tags/5000                          1.00     59.3±5.17ms        ? ?/sec
retryable_dead_letter/constant              1.00    220.6±2.55ms        ? ?/sec
retryable_dead_letter/exponential           1.00    220.9±3.14ms        ? ?/sec
retryable_dead_letter/exponential_jitter    1.00    220.7±3.22ms        ? ?/sec
retryable_dead_letter/linear                1.00    221.4±4.62ms        ? ?/sec
submit_1000_tasks                           1.00    179.9±4.29ms        ? ?/sec
submit_dedup_hit_1000                       1.00    238.5±7.87ms        ? ?/sec
submit_with_tags/0                          1.00     90.6±2.94ms        ? ?/sec
submit_with_tags/10                         1.00   241.9±11.20ms        ? ?/sec
submit_with_tags/20                         1.00   392.5±18.41ms        ? ?/sec
submit_with_tags/5                          1.00    166.3±6.36ms        ? ?/sec
tag_values/100                              1.00    132.9±2.55µs        ? ?/sec
tag_values/1000                             1.00    193.2±2.96µs        ? ?/sec
tag_values/5000                             1.00    460.2±5.05µs        ? ?/sec

deepjoy pushed a commit that referenced this pull request Mar 19, 2026
## 🤖 New release

* `taskmill`: 0.5.1 -> 0.5.2 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.5.2](v0.5.1...v0.5.2) - 2026-03-19

### Other

- reduce SQL round-trips and CPU overhead in scheduler hot paths ([#60](#60))
- coalesce task completions into batched transactions ([#59](#59))
- reduce SQL round-trips in scheduler hot paths ([#57](#57))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>