tokio-postgres: fix deadlock in query_typed/prepare when result schema has unknown OIDs #1348

Open
rubenfiszel wants to merge 1 commit into rust-postgres:master from rubenfiszel:fix-query-typed-raw-deadlock

rubenfiszel commented May 11, 2026

Summary

Client::query_typed_raw (and the streaming-prepare code path) deadlocks when:

  1. The result schema contains a column whose OID is not a built-in
    tokio_postgres::types::Type (citext, custom enums / domains, postgis
    geometry, etc.), AND
  2. The typeinfo cache is cold for that OID, AND
  3. The result set is large enough to span more than one BackendMessages
    frame on the wire (≈100+ small rows on localhost in repro).

The cause is head-of-line blocking: query::query_typed awaits
get_type(client, oid) inline, before it returns the RowStream, while it
still holds the original query's Responses stream
(tokio-postgres/src/query.rs):

Message::RowDescription(row_description) => {
    let mut columns: Vec<Column> = vec![];
    let mut it = row_description.fields();
    while let Some(field) = it.next().map_err(Error::parse)? {
        let type_ = get_type(client, field.type_oid()).await?;  // <-- blocks here
        ...
    }
    return Ok(RowStream { ... });
}

While get_type is pending, nothing reads from that stream, so the original
query's DataRows back up in the per-request Responses mpsc::channel(1).
Once the channel is full, Connection::poll_read parks the next
BackendMessages frame in pending_responses and returns Ok(None) (the
Poll::Pending branch around connection.rs:159). The wire stops being
drained, but the typeinfo sub-query's response is queued on the same socket
behind those DataRows, so it never arrives and the outer get_type await
never completes.
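
To make the stall concrete, here is a toy, std-only model of it. Nothing in it is tokio-postgres code: `Frame` and `typeinfo_reply_arrives` are hypothetical names, the wire is a plain `VecDeque`, and the channel bound is treated as a simple count (the real channel's capacity accounting is a little different, but all that matters here is that it is finite).

```rust
use std::collections::VecDeque;

/// Hypothetical wire frame: which request the bytes belong to.
#[derive(Clone, Copy)]
enum Frame {
    /// A batch of DataRows for the outer query.
    OuterRows,
    /// The reply to the nested typeinfo lookup issued by get_type.
    TypeinfoReply,
}

/// Returns true if the typeinfo reply reaches its waiter.
/// `capacity` is the per-request channel bound; `None` models an unbounded channel.
fn typeinfo_reply_arrives(outer_frames: usize, capacity: Option<usize>) -> bool {
    // The server answers in request order, so the typeinfo reply
    // sits behind every frame of the outer query.
    let mut wire: VecDeque<Frame> = std::iter::repeat(Frame::OuterRows)
        .take(outer_frames)
        .collect();
    wire.push_back(Frame::TypeinfoReply);

    // Channel for the outer query. Its consumer is parked inside get_type,
    // so nothing is ever popped from this queue.
    let mut outer_queue: Vec<Frame> = Vec::new();

    while let Some(frame) = wire.pop_front() {
        match frame {
            Frame::OuterRows => {
                let full = capacity.map_or(false, |cap| outer_queue.len() >= cap);
                if full {
                    // The connection parks this frame and stops reading the
                    // socket; the typeinfo reply is stranded behind it.
                    return false;
                }
                outer_queue.push(frame);
            }
            Frame::TypeinfoReply => return true,
        }
    }
    false
}
```

In this model a single outer frame still fits in a bound of 1 and the reply gets through, which mirrors why small result sets work and larger ones hang.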

Two fix variants

This PR is the minimal variant. There's a more involved alternative in
#1349 that preserves the bounded-channel API. The two are mutually
exclusive — pick whichever you prefer. Same root cause, same memory profile
in the deadlock case, different surface.

| | this PR (#1348) | #1349 |
|---|---|---|
| Per-Responses channel | unbounded | mpsc::channel(1) (unchanged) |
| Where overflow buffers | inside the mpsc itself | new parked: VecDeque<…> field on each Response |
| New routing state | none | completion_seen flag + target_response_idx() |
| Lines changed | ~25 | ~120 |
| Memory in deadlock case | one result set's worth | one result set's worth (identical) |
| Public-ish API surface | Responses receiver type changes (pub-crate-private) | unchanged |

This PR (minimal)

Switch the per-request response channel from mpsc::channel(1) to
mpsc::unbounded() so Connection::poll_read can keep draining the wire
regardless of how slowly the per-request consumer is iterating. The Request
channel (Client → Connection) is already unbounded; this makes the
per-request response side match.

Trade-off: there is no longer any per-Responses backpressure between the
Connection task and its consumer, so buffering on that channel is
unbounded. In practice the kernel socket buffer was what bounded buffering
anyway: channel(1) didn't slow the server down, it just shifted where the
bytes pile up.
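
The difference in hand-off behaviour can be seen with the standard library's channels, used here only as an analogy: `std::sync::mpsc::sync_channel(1)` stands in for the bounded `mpsc::channel(1)` and `std::sync::mpsc::channel()` for `mpsc::unbounded()`. The helper names are hypothetical and the receiver is deliberately left idle, the way the outer query's consumer is while it waits on get_type.

```rust
use std::sync::mpsc::{channel, sync_channel};

/// Counts how many frames a producer can hand off to a bounded channel
/// whose receiver is alive but never reads.
fn frames_accepted_bounded(bound: usize, frames: usize) -> usize {
    // `_rx` keeps the receiver alive for the whole function.
    let (tx, _rx) = sync_channel::<usize>(bound);
    let mut accepted = 0;
    for i in 0..frames {
        // try_send fails once the buffer is full instead of blocking.
        if tx.try_send(i).is_err() {
            break;
        }
        accepted += 1;
    }
    accepted
}

/// Same producer against an unbounded channel: every send succeeds,
/// and the frames wait in the channel until the consumer catches up.
fn frames_accepted_unbounded(frames: usize) -> usize {
    let (tx, rx) = channel::<usize>();
    for i in 0..frames {
        tx.send(i).expect("receiver is still alive");
    }
    // Close the sending side so the iterator below terminates.
    drop(tx);
    rx.iter().count()
}
```

With a bound of 1 the producer is turned away after the first frame, whereas the unbounded channel absorbs all of them; that absorbed backlog is the "one result set's worth" of memory mentioned above.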

#1349 (surgical alternative)

Keeps mpsc::channel(1) and instead changes Connection::poll_read to
keep draining the wire when a sender backs up, parking the unsent frame on
a per-response queue (parked: VecDeque<(BackendMessages, bool)>) inside
Response itself. It also tracks a completion_seen flag so that wire frames
arriving after a parked-but-not-yet-delivered ReadyForQuery are routed to
the next response.

Making the parked queue per-response (rather than a single global one) is
the critical detail: a global queue still deadlocks because once
response[0]'s sender is full, frames for response[1] would pile up behind
it with no way to deliver them. Per-response queues let us poll each
sender independently.
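
A rough sketch of the per-response idea, not the code in #1349: the `Response` type below is a hypothetical stand-in with a one-frame `slot` in place of the bounded channel, frames are plain integers already tagged with their response index, and the completion_seen routing is left out entirely.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a per-request response handle.
struct Response {
    /// Models the one-slot bounded channel to the consumer.
    slot: Option<u32>,
    /// Frames that arrived while the slot was occupied.
    parked: VecDeque<u32>,
}

impl Response {
    fn new() -> Self {
        Response { slot: None, parked: VecDeque::new() }
    }

    /// Connection side: deliver if the slot is free, otherwise park.
    /// Parking behind existing parked frames preserves ordering.
    fn accept(&mut self, frame: u32) {
        if self.slot.is_none() && self.parked.is_empty() {
            self.slot = Some(frame);
        } else {
            self.parked.push_back(frame);
        }
    }

    /// Consumer side: take the delivered frame and promote the next parked one.
    fn recv(&mut self) -> Option<u32> {
        let out = self.slot.take();
        self.slot = self.parked.pop_front();
        out
    }
}

/// Drains the whole wire without ever stalling on a slow response.
fn drain_wire(wire: &[(usize, u32)], responses: &mut [Response]) {
    for &(idx, frame) in wire {
        responses[idx].accept(frame);
    }
}
```

Feeding it three frames for response 0 followed by one for response 1 leaves two frames parked on response 0 while response 1 can already receive its frame, which is the property a single global queue cannot provide.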

Why two PRs

I have a slight preference for this one because it's a one-line semantic
change and a one-type-swap diff. But I don't know if there's a reason the
original mpsc::channel(1) was specifically 1 rather than some other
small bound, and #1349 is there in case you'd rather preserve that. Happy
to fold the loser into the winner if you'd like — let me know.

Reproduction

Minimal standalone repro: https://github.com/rubenfiszel/tokio-postgres-deadlock-repro

git clone https://github.com/rubenfiszel/tokio-postgres-deadlock-repro
cd tokio-postgres-deadlock-repro
./setup.sh
cargo run --release

Buggy output on master:

[limit   1]  ok (1 rows) in 827µs
...
[limit 100]  ok (100 rows) in 663µs
[limit 200]  TIMEOUT after 10s
[limit 500]  TIMEOUT after 10s

After warming the typeinfo cache via client.prepare(...) on the same client:
[warm cache, limit 500]  ok (500 rows) in 604µs

With this PR applied (via [patch.crates-io] to this branch):

[limit 200]  ok (200 rows) in 669µs
[limit 500]  ok (500 rows) in 676µs

The repro's README has a walkthrough of the deadlock with line-by-line
references to query.rs and connection.rs.

Test plan

  • cargo build -p tokio-postgres builds cleanly
  • cargo test -p tokio-postgres --lib passes (the in-process unit tests
    that don't need a live Postgres)
  • Standalone repro: deadlocks on master, passes with this PR
  • End-to-end verified against a real workload (the same query_typed_raw
    call against a partitioned table with a citext column on Neon) —
    hangs indefinitely on master, completes in ~420ms with the patch

…a has unknown OIDs

When the result schema of a streaming query (`Client::query_typed_raw`,
`Client::query_typed`, or `Client::prepare` when followed by a Bind+Execute
on the same connection) contains a column whose OID isn't a built-in
`tokio_postgres::types::Type`, and the typeinfo cache is cold, `query::query_typed`
calls `get_type(client, oid).await` synchronously while it still holds
the original query's `Responses` stream:

    Message::RowDescription(row_description) => {
        let mut columns: Vec<Column> = vec![];
        let mut it = row_description.fields();
        while let Some(field) = it.next().map_err(Error::parse)? {
            let type_ = get_type(client, field.type_oid()).await?; // <-- blocks here
            ...
        }
        return Ok(RowStream { ... });
    }

If the result is large enough that the server keeps sending `DataRow`
messages for the original query before that `get_type` completes, those
`DataRow`s back up in the per-request `Responses` `mpsc::channel(1)`. As
soon as the channel is full, `Connection::poll_read` parks the next
`BackendMessages` frame in `pending_responses` and returns early (its
`Poll::Pending` branch), so it stops draining the wire. The typeinfo
sub-query's response is queued on the same socket behind those `DataRow`s,
so it never arrives, and the outer `get_type` await never completes.
Classic head-of-line blocking.

This trips most often with the `citext` extension, custom enums, custom
domains, and postgis geometry — anything with an OID that
`Type::from_oid` doesn't recognise. The deadlock requires the result to
span more than one `BackendMessages` frame on the wire, so it manifests
at moderate row counts (≈100+ small rows on localhost in repro).

Fix: switch the per-request response channel from `mpsc::channel(1)` to
`mpsc::unbounded()` so `Connection::poll_read` can keep draining the
wire regardless of how slowly the per-request consumer is iterating.
The `Request` channel (`Client` → `Connection`) is already unbounded;
this makes the per-request response side match.

Trade-off: per-`Responses` backpressure between the `Connection` task
and its consumer is now unbounded. In practice the kernel socket buffer
is what bounded buffering anyway — `channel(1)` didn't slow the server
down, it just shifted where the bytes pile up.

Minimal reproduction:
https://github.com/rubenfiszel/tokio-postgres-deadlock-repro