tokio-postgres: fix deadlock in query_typed/prepare when result schema has unknown OIDs #1348
Open
rubenfiszel wants to merge 1 commit into
When the result schema of a streaming query (`Client::query_typed_raw`,
`Client::query_typed`, or `Client::prepare` when followed by a Bind+Execute
on the same connection) contains a column whose OID isn't a built-in
`tokio_postgres::types::Type`, and the typeinfo cache is cold, `query::query_typed`
calls `get_type(client, oid).await` synchronously while it still holds
the original query's `Responses` stream:
    Message::RowDescription(row_description) => {
        let mut columns: Vec<Column> = vec![];
        let mut it = row_description.fields();
        while let Some(field) = it.next().map_err(Error::parse)? {
            let type_ = get_type(client, field.type_oid()).await?; // <-- blocks here
            ...
        }
        return Ok(RowStream { ... });
    }
If the result is large enough that the server keeps sending `DataRow`
messages for the original query before that `get_type` completes, those
`DataRow`s back up in the per-request `Responses` `mpsc::channel(1)`. As
soon as the channel is full `Connection::poll_read` returns `Poll::Pending`
and parks the next `BackendMessages` frame in `pending_responses`, then
stops draining the wire. The typeinfo sub-query's response is queued on
the same socket behind those `DataRow`s, so it never arrives, and the
outer `get_type` await never completes. Classic head-of-line blocking.
This trips most often with the `citext` extension, custom enums, custom
domains, and postgis geometry — anything with an OID that
`Type::from_oid` doesn't recognise. The deadlock requires the result to
span more than one `BackendMessages` frame on the wire, so it manifests
at moderate row counts (≈100+ small rows on localhost in repro).
Fix: switch the per-request response channel from `mpsc::channel(1)` to
`mpsc::unbounded()` so `Connection::poll_read` can keep draining the
wire regardless of how slowly the per-request consumer is iterating.
The `Request` channel (`Client` → `Connection`) is already unbounded;
this makes the per-request response side match.
Trade-off: buffering between the `Connection` task and each `Responses`
consumer is now unbounded, so that channel no longer applies backpressure.
In practice the kernel socket buffer was what bounded buffering anyway:
`channel(1)` didn't slow the server down, it just shifted where the bytes
pile up.
Minimal reproduction:
https://github.com/rubenfiszel/tokio-postgres-deadlock-repro
Summary
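`Client::query_typed_raw` (and the streaming-prepare code path) deadlocks when:
- the result schema contains a column whose OID isn't a built-in `tokio_postgres::types::Type` (`citext`, custom enums / domains, postgis `geometry`, etc.) while the typeinfo cache is cold, AND
- the result spans more than one `BackendMessages` frame on the wire (≈100+ small rows on localhost in repro).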
Head-of-line blocking:
`query::query_typed` calls `get_type(client, oid).await` synchronously while
it still holds the original query's `Responses` stream
(`tokio-postgres/src/query.rs`). The original query's `DataRow`s back up in
the per-request `Responses` `mpsc::channel(1)`. Once full,
`Connection::poll_read` parks the next `BackendMessages` frame in
`pending_responses` and returns `Ok(None)` (the `Poll::Pending` branch around
`connection.rs:159`). The wire stops being drained, but the typeinfo
sub-query's response is queued on the same socket behind those `DataRow`s, so
it never arrives, and the outer `get_type` await never completes.
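The blocking sequence above can be modelled with a toy, single-threaded simulation. This is only a sketch under stated assumptions: the `Frame` enum and the `typeinfo_delivered` function are hypothetical names invented for illustration, a `VecDeque` stands in for the per-request channel, and none of it is tokio-postgres code. It shows that once the outer consumer stops reading, a queue bound of 1 prevents the connection from ever reaching the typeinfo reply.

```rust
use std::collections::VecDeque;

/// A frame on the shared socket (hypothetical model, not the real wire types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Frame {
    /// Belongs to the outer query.
    DataRow,
    /// Reply to the nested typeinfo lookup.
    TypeInfo,
}

/// Routes frames in wire order into the outer query's queue.
/// `cap` is the queue bound (`None` models an unbounded channel).
/// Returns true if the typeinfo reply is ever reached.
fn typeinfo_delivered(wire: &[Frame], cap: Option<usize>) -> bool {
    // The outer consumer is blocked awaiting the typeinfo reply,
    // so nothing ever pops from this queue.
    let mut outer: VecDeque<Frame> = VecDeque::new();
    for &frame in wire {
        match frame {
            Frame::DataRow => {
                if cap.map_or(false, |c| outer.len() >= c) {
                    // Queue full: the connection parks this frame and
                    // stops reading the wire, so the reply is stranded.
                    return false;
                }
                outer.push_back(frame);
            }
            Frame::TypeInfo => return true,
        }
    }
    false
}

fn main() {
    let wire = [Frame::DataRow, Frame::DataRow, Frame::DataRow, Frame::TypeInfo];
    println!("bounded(1): {}", typeinfo_delivered(&wire, Some(1)));
    println!("unbounded:  {}", typeinfo_delivered(&wire, None));
}
```

In this model the bounded case reports `false` and the unbounded case reports `true`. With a single `DataRow` ahead of the reply, the bounded case also succeeds, which matches the observation that the hang needs more than one frame of results.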
Two fix variants
This PR is the minimal variant. There's a more involved alternative in
#1349 that preserves the bounded-channel API. The two are mutually
exclusive — pick whichever you prefer. Same root cause, same memory profile
in the deadlock case, different surface.
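- This PR: the `Responses` channel becomes unbounded; the `Responses` receiver type changes (pub-crate-private).
- #1349: the `Responses` channel stays `mpsc::channel(1)` (unchanged); adds a `parked: VecDeque<…>` field on each `Response` and a `completion_seen` flag + `target_response_idx()`.
This PR (minimal)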
Switch the per-request response channel from `mpsc::channel(1)` to
`mpsc::unbounded()` so `Connection::poll_read` can keep draining the wire
regardless of how slowly the per-request consumer is iterating. The request
channel (`Client` → `Connection`) is already unbounded; this makes the
per-request response side match.
Trade-off: buffering between the `Connection` task and each `Responses`
consumer is now unbounded, so that channel no longer applies backpressure.
In practice the kernel socket buffer was what bounded buffering anyway:
`channel(1)` didn't slow the server down, it just shifted where the bytes
pile up.
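The difference between the two channel kinds can be sketched with the standard library's channels. This is an illustration under assumptions: `std::sync::mpsc` stands in for the futures-style channel the crate uses (whose exact capacity accounting may differ), and `accepted_bounded` and `accepted_unbounded` are hypothetical helper names. The point is only that a bounded producer stalls once the consumer stops reading, while an unbounded one keeps accepting frames.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Counts how many of `n` frames a producer can hand off to a bounded
/// channel of capacity `cap` while the consumer reads nothing.
fn accepted_bounded(n: usize, cap: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<usize>(cap);
    let mut accepted = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // Full: a real producer would have to wait here.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Same experiment with an unbounded channel: every send succeeds,
/// and the unread frames accumulate in memory instead.
fn accepted_unbounded(n: usize) -> usize {
    let (tx, _rx) = channel::<usize>();
    (0..n).filter(|&i| tx.send(i).is_ok()).count()
}

fn main() {
    println!("bounded(1) accepted: {}", accepted_bounded(100, 1));
    println!("unbounded accepted:  {}", accepted_unbounded(100));
}
```

With an idle consumer, the bounded variant accepts one frame and the unbounded variant accepts all 100, which is the trade-off described above: the producer never stalls, at the cost of buffering whatever the consumer has not yet read.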
#1349 (surgical alternative)
Keeps `mpsc::channel(1)` and instead changes `Connection::poll_read` to
keep draining the wire when a sender backs up, parking the unsent frame on
a per-response queue (`parked: VecDeque<(BackendMessages, bool)>`) inside
`Response` itself. Tracks a `completion_seen` flag so wire frames after a
parked-but-not-yet-delivered `ReadyForQuery` get routed to the next
response.
Per-response (rather than a single global parked queue) is the critical
detail: a global queue still deadlocks because once `response[0]`'s
sender is full, frames for `response[1]` would pile up behind it with no
way to deliver them. Per-response queues let us poll each sender
independently.
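The contrast between a global parked queue and per-response parked queues can be sketched as a toy model. Everything here is an assumption made for illustration: the `Response` struct, `CAP`, `per_response`, and `global_queue` are hypothetical and are not taken from #1349, a `VecDeque` stands in for each bounded sender, and the `completion_seen` routing is left out. Frames are `(response index, payload)` pairs and no consumer ever reads.

```rust
use std::collections::VecDeque;

/// Stand-in bound for each response's channel.
const CAP: usize = 1;

/// Hypothetical per-response state: a bounded channel plus its own overflow.
struct Response {
    chan: VecDeque<u32>,
    parked: VecDeque<u32>,
}

impl Response {
    fn new() -> Self {
        Response { chan: VecDeque::new(), parked: VecDeque::new() }
    }

    /// Delivers the frame if there is room, otherwise parks it locally.
    fn offer(&mut self, frame: u32) {
        if self.parked.is_empty() && self.chan.len() < CAP {
            self.chan.push_back(frame);
        } else {
            self.parked.push_back(frame);
        }
    }
}

/// Per-response parking: returns how many frames reached response 1.
fn per_response(wire: &[(usize, u32)]) -> usize {
    let mut responses = vec![Response::new(), Response::new()];
    for &(idx, frame) in wire {
        responses[idx].offer(frame);
    }
    responses[1].chan.len()
}

/// Single global parked FIFO: only the head may be delivered, to keep
/// wire order, so a blocked head stalls every response behind it.
fn global_queue(wire: &[(usize, u32)]) -> usize {
    let mut chans: Vec<VecDeque<u32>> = vec![VecDeque::new(), VecDeque::new()];
    let mut parked: VecDeque<(usize, u32)> = VecDeque::new();
    for &(idx, frame) in wire {
        parked.push_back((idx, frame));
        while let Some(&(i, f)) = parked.front() {
            if chans[i].len() < CAP {
                chans[i].push_back(f);
                parked.pop_front();
            } else {
                break;
            }
        }
    }
    chans[1].len()
}

fn main() {
    // Two frames for response 0, then one for response 1.
    let wire = [(0, 1), (0, 2), (1, 3)];
    println!("per-response delivered to response 1: {}", per_response(&wire));
    println!("global queue delivered to response 1: {}", global_queue(&wire));
}
```

In this model the per-response variant delivers the frame for response 1 while the global variant delivers none, because the undeliverable frame for response 0 sits at the head of the shared queue.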
Why two PRs
I have a slight preference for this one because it's a one-line semantic
change and a one-type-swap diff. But I don't know if there's a reason the
original `mpsc::channel(1)` was specifically `1` rather than some other
small bound, and #1349 is there in case you'd rather preserve that. Happy
to fold the loser into the winner if you'd like; let me know.
Reproduction
Minimal standalone repro: https://github.com/rubenfiszel/tokio-postgres-deadlock-repro
    git clone https://github.com/rubenfiszel/tokio-postgres-deadlock-repro
    cd tokio-postgres-deadlock-repro
    ./setup.sh
    cargo run --release
Buggy output on `master`:
With this PR applied (via `[patch.crates-io]` to this branch):
The repro's README has a walkthrough of the deadlock with line-by-line
references to `query.rs` and `connection.rs`.
Test plan
- `cargo build -p tokio-postgres` clean
- `cargo test -p tokio-postgres --lib` passes (the in-process unit tests that don't need a live Postgres)
- … `master`, passes with this PR
- … (`query_typed_raw` call against a partitioned table with a `citext` column on Neon): hangs indefinitely on `master`, completes in ~420ms with the patch