Collaborator

@glevkovich glevkovich commented Dec 14, 2025

Implemented TlsSocket::TrySend using a non-blocking state machine loop to handle upstream backpressure and TLS state requirements (NEED_READ/NEED_WRITE) without unnecessary context switching.

Key changes include:

  • Async Flush Offload: Introduced a "Fire and Hold" mechanism. If TrySend successfully consumes all user data but fails to fully flush the pending ciphertext to the network (partial write), the remaining flush is offloaded to a detached background AsyncReq. This prevents stalling the caller while ensuring data safety.

  • Async Logic Update: Updated AsyncRoleBasedAction to correctly handle detached flush requests (where vec == nullptr), allowing the background fiber to exit gracefully once the buffer is drained.

  • Small Buffer Optimization (SBO): Applied SBO using absl::InlinedVector for iovec copying to minimize heap allocations for standard batch sizes.

  • Build System: Added iovec_utils.cc to the tls_lib target in CMakeLists.txt.
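As a rough illustration of the small-buffer idea in the list above, the sketch below copies an iovec array into inline storage when it fits and spills to the heap otherwise. It avoids absl on purpose so that it stays self-contained; `IovecCopy` and `kInlineCap` are hypothetical names, and the inline capacity of 16 is an assumption rather than something taken from the PR's code.

```cpp
#include <sys/uio.h>

#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for absl::InlinedVector<iovec, N>: small inputs live in
// an inline array, larger inputs spill to a heap-allocated vector.
class IovecCopy {
 public:
  static constexpr std::size_t kInlineCap = 16;  // assumed inline capacity

  IovecCopy(const iovec* v, std::size_t len) : size_(len) {
    assert(v != nullptr || len == 0);
    if (len <= kInlineCap) {
      for (std::size_t i = 0; i < len; ++i) inline_[i] = v[i];
    } else {
      heap_.assign(v, v + len);  // only this path allocates
    }
  }

  iovec* data() { return on_heap() ? heap_.data() : inline_.data(); }
  std::size_t size() const { return size_; }
  bool on_heap() const { return size_ > kInlineCap; }

 private:
  std::array<iovec, kInlineCap> inline_{};
  std::vector<iovec> heap_;
  std::size_t size_;
};
```

The copy is what lets a send loop trim entries in place without touching the caller's array; the inline path keeps that copy off the heap for typical batch sizes.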

Testing improvements:

  • Scatter-Gather Tests: Added TrySendVectorTest to validate behavior with various iovec counts and split patterns.

  • White-Box Testing: Introduced MockTlsSocketTest infrastructure with Strict/Nice mocks for the Proactor, FiberSocket, and TlsEngine.

  • Edge Case Coverage: Added TrySendErrorTest to verify handling of EAGAIN, Dirty Shutdowns, and Concurrency Conflicts.

  • Async Verification: Added TrySendAsyncFlushTest to verify that stranded data is correctly offloaded to the async path.

  • Bug Fix: Fixed RegisterOnRecv test to explicitly call ResetOnRecvHook() before manual TryRecv, resolving a "Concurrent TryRecv and Recv" usage error.

Contributor

Copilot AI left a comment

Pull request overview

This PR implements non-blocking TrySend functionality for TLS sockets with support for scatter-gather I/O and comprehensive error handling. The implementation uses a state machine loop to handle TLS engine requirements (NEED_WRITE, NEED_READ) without fiber context switching, applies Small Buffer Optimization (SBO) to minimize heap allocations for common iovec counts, and includes extensive test coverage for both happy paths and error scenarios.

Key changes:

  • Implemented TlsSocket::TrySend with non-blocking state machine handling for TLS engine opcodes
  • Added AdvanceIovec helper for correctly tracking partial consumption in scatter-gather arrays
  • Applied SBO pattern (stack buffer for ≤16 iovecs) to reduce allocations
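To make the partial-consumption tracking concrete, here is a minimal sketch of how a helper of this kind might work. The name `AdvanceIovecSketch` and its pointer-based signature are assumptions for illustration; the PR's actual `AdvanceIovec` declaration is not shown in this conversation and may differ.

```cpp
#include <sys/uio.h>

#include <cassert>
#include <cstddef>

// Skips `consumed` bytes from the front of the array [*v, *v + *len).
// Fully consumed entries are dropped; a partially consumed entry is trimmed
// in place so the next send resumes exactly where the previous one stopped.
inline void AdvanceIovecSketch(iovec** v, std::size_t* len, std::size_t consumed) {
  while (consumed > 0 && *len > 0) {
    iovec& head = **v;
    if (consumed >= head.iov_len) {
      consumed -= head.iov_len;
      ++*v;
      --*len;
    } else {
      head.iov_base = static_cast<char*>(head.iov_base) + consumed;
      head.iov_len -= consumed;
      consumed = 0;
    }
  }
  assert(consumed == 0);  // caller must not report more bytes than were supplied
}
```

Because the helper mutates entries, it would normally run on a private copy of the caller's array rather than on the caller's memory.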

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| util/tls/tls_socket.h | Added AdvanceIovec helper method declaration and improved its documentation |
| util/tls/tls_socket.cc | Implemented TrySend methods with state machine loop, AdvanceIovec helper, and SBO optimization; includes flush-encrypt loop with error handling |
| util/tls/tls_socket_test.cc | Added parameterized scatter-gather tests (TrySendVectorTest) for various iovec counts; added comprehensive mock-based error scenario tests (TrySendErrorTest) covering concurrency guards, flush blockages, renegotiation, and fatal errors; updated RegisterOnRecv test to use new TrySend; improved parameter parsing for uring detection |

codecov-commenter commented Dec 14, 2025

Codecov Report

❌ Patch coverage is 87.34177% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.43%. Comparing base (dd0b9d2) to head (df7ced4).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| util/tls/tls_socket_test.cc | 91.00% | 18 Missing ⚠️ |
| util/tls/iovec_utils.cc | 62.06% | 11 Missing ⚠️ |
| util/tls/tls_socket.cc | 87.35% | 11 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #516      +/-   ##
==========================================
+ Coverage   78.12%   78.43%   +0.30%     
==========================================
  Files         116      117       +1     
  Lines       10319    10629     +310     
==========================================
+ Hits         8062     8337     +275     
- Misses       2257     2292      +35     

@glevkovich glevkovich force-pushed the glevkovich/improve_io_flow_trysend_impl branch 3 times, most recently from b034858 to 7771310 Compare December 14, 2025 13:29
@glevkovich glevkovich requested a review from romange December 14, 2025 13:30
@glevkovich glevkovich force-pushed the glevkovich/improve_io_flow_trysend_impl branch from 7771310 to f3dad3a Compare December 14, 2025 13:52
@glevkovich glevkovich marked this pull request as ready for review December 14, 2025 14:44
Comment on lines 536 to 544
bool has_data{false};
for (size_t i{}; i < len; ++i) {
  if (v[i].iov_len > 0) {
    has_data = true;
    break;
  }
}
if (!has_data)
  return 0;  // nothing to send
Owner

This can be factored out into a helper function, and let's DCHECK as well, as I do not see why we should allow callers to call this function with no data.

Collaborator Author

@glevkovich glevkovich Dec 16, 2025

  • I'll put a helper, thanks - see my other comment about iovec_utils.cc/h. I'm going to add part of it there.
  • I'll add a DCHECK since I can assume we never want to send 0 data in debug.
  • Production: Standard POSIX behaviour (specifically writev and sendmsg) dictates that if you pass a valid iovec array where the sum of lengths is zero, the system call simply returns 0 and does nothing. It is not an error, and we cannot crash on it.
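A minimal sketch of the kind of helper discussed here, assuming a free function named `HasPayload` (a hypothetical name; the helper that eventually landed in iovec_utils may look different). The release-mode policy from the last bullet, returning 0 for an all-empty array instead of failing, would sit in the caller.

```cpp
#include <sys/uio.h>

#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns true if at least one entry of the array carries payload bytes.
inline bool HasPayload(const iovec* v, std::size_t len) {
  assert(v != nullptr || len == 0);
  return std::any_of(v, v + len, [](const iovec& e) { return e.iov_len > 0; });
}
```

A caller could then write `if (!HasPayload(v, len)) return 0;` in release builds and add a debug-only check before it, matching the writev/sendmsg behaviour described above.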

  DVSOCK(3) << "TrySend blocked: WRITE_IN_PROGRESS detected";
  return make_unexpected(make_error_code(errc::resource_unavailable_try_again));
}
bool read_in_progress{(state_ & READ_IN_PROGRESS) != 0};
Owner

should we DCHECK here that engine_->OutputPending() == 0 ?

Collaborator Author

I don't think so:

  • TrySend does not control the SSL engine exclusively. I have no idea who called the engine before, maybe via another function. Maybe some read on the socket generated SSL metadata on the output, among other cases.
  • TrySend can run concurrently with a Read operation, so again OutputPending() can become non-zero.


while ((curr_iovec_len > 0) || (engine_->OutputPending() > 0)) {
  // 1. Flush into the upstream socket any pending output from the engine output buffer before
  // pushing more data to the engine from the user. These might be bytes from previous call.
Owner

I am not sure it's the best approach.

  1. batching writes is beneficial - we do not want to send a packet per few bytes, we already had such bugs.
  2. if engine_->OutputPending() > 0 that should mean that there is an asynchronous process that takes care of it, imho.

Collaborator Author

  1. Regarding Batching: You are right that the current "Flush-Push-Flush" loop could result in small packets (one per iovec), which isn't ideal for performance. I will try to refactor this to perform multiple PushToEngine calls before triggering a single flush to the upstream socket. If this makes the non-blocking logic too complex for this single PR, I will merge this correct version first and optimize this in a dedicated follow-up PR to keep the changes manageable.

  2. Regarding OutputPending & Async Flushing: I have to disagree on the "asynchronous process" point. At the very top of TrySend, I check:
    if ((state_ & WRITE_IN_PROGRESS) != 0) return ... try_again;
    If there were any asynchronous process or background fiber currently flushing this buffer, WRITE_IN_PROGRESS would be set, and I would have exited immediately.
    Since I reached this line, I am the exclusive writer. There is no other active process responsible for this data. If I don't flush OutputPending here, the data will sit in the SSL BIO indefinitely (causing latency) until the next API call happens to trigger a write. Therefore, it is my responsibility to flush it. This also serves as the single place to flush between iterations (only when needed, in the optimised version still to be written).
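The exclusivity argument in point 2 can be illustrated with a toy state word. This is only a sketch: `ToySocket`, `BeginTrySend`, the bit values, and the `output_pending` counter are assumptions for illustration, not the PR's actual members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <system_error>

// Toy model of the state flags discussed above; the bit values are assumed.
enum : std::uint8_t { WRITE_IN_PROGRESS = 1, READ_IN_PROGRESS = 2 };

struct ToySocket {
  std::uint8_t state = 0;
  std::size_t output_pending = 0;  // ciphertext bytes waiting in the engine

  // Stand-in for flushing to the upstream socket; only the exclusive writer
  // may call it.
  void Flush() {
    assert((state & WRITE_IN_PROGRESS) == 0);
    output_pending = 0;
  }

  // If another writer owns the flush, back off with try_again. Otherwise the
  // caller is the exclusive writer and drains pending output itself.
  std::error_code BeginTrySend() {
    if ((state & WRITE_IN_PROGRESS) != 0)
      return std::make_error_code(std::errc::resource_unavailable_try_again);
    Flush();
    return {};
  }
};
```

Note that a concurrent read (READ_IN_PROGRESS) does not block the send path in this model, which matches the claim that only a competing writer forces an early exit.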

DVSOCK(3) << "Flushed " << *send_result << " bytes to upstream";
if ((*send_result) < output_buf.size()) {  // case 1.A: partial write
  // upstream socket is full - try again later
  returned_status = make_error_code(errc::resource_unavailable_try_again);
Owner

  1. Bug. what ensures that the write will be flushed when next_sock_ becomes available?
  2. make_error_code(errc::resource_unavailable_try_again); is wrong as you already consumed some of the input data and it was copied to ssl engine. From a caller perspective - they need to retry the entire operation, so they will try to push the same data again

Collaborator Author

@glevkovich glevkovich Dec 17, 2025

I believe that for partial writes the code is correct. There is only one edge case, which I will mention at the end and which is a bit "tricky" (a full write with a partial flush of the SSL engine output).

====================

  • "What will push the data eventually?" (1)
    Since I return total_bytes_sent > 0, the caller MUST treat this as a successful partial write. Standard non-blocking socket behaviour implies the caller will eventually call TrySend (or TryRecv) again. In fact, it MUST call TrySend again, since a partial write also implies "try again after advancing your iovec".

When the user does call TrySend again (even with new data), the very first block of this function:
if (engine_->OutputPending() > 0) { ... }
ensures that we attempt to flush the pending ciphertext before accepting any new user data.
Regarding the error handling (swallowing EAGAIN/errors): this mimics standard POSIX write() semantics. If a write partially succeeds but then hits a network error or blocks, we report the success first (we return total_bytes_sent > 0). The error remains pending and will be returned on the next call (when total_bytes_sent is 0). If the error is fatal, it will still "wait for" the next user call; if it is non-fatal, the next TrySend/TryRecv might not encounter the error anymore. There is one edge case to discuss at the end of this comment.

=========================
Regarding the duplication concern (2):
I believe the duplication concern does not exist, and is resolved by the return logic at the end of the function.
You are correct that returned_status is set to try_again, but notice the check at the very end:

if (total_bytes_sent > 0) { 
     return total_bytes_sent;
}

Even if returned_status contains an error (like EAGAIN or a socket error), if I have successfully pushed any bytes to the engine (total_bytes_sent > 0), I return that positive count, effectively masking the error for this specific call. This ensures that:

  • The caller sees a positive return value (partial write).
  • The caller advances their buffer pointers by total_bytes_sent.
  • The caller invokes TrySend again with the remaining (new) data.

Therefore, no data is duplicated. The user never retries with the same data because they received a positive confirmation for the chunk that was processed.
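The return rule described in this part of the reply can be sketched in isolation. `SendOutcome`, `FinalizeSend`, and `RemainingAfter` are hypothetical names, and the real function returns a different result type; only the "progress wins over errors" ordering is taken from the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>

// Outcome of one non-blocking send attempt: a byte count or an error.
struct SendOutcome {
  std::size_t bytes = 0;
  std::error_code ec;
};

// Progress wins over errors: if any bytes were consumed, report them and let
// the error resurface on the next call, when no progress can be made.
inline SendOutcome FinalizeSend(std::size_t total_bytes_sent, std::error_code status) {
  if (total_bytes_sent > 0) return SendOutcome{total_bytes_sent, std::error_code{}};
  return SendOutcome{0, status};
}

// Caller side: advance past what was accepted; only the rest is retried.
inline std::size_t RemainingAfter(std::size_t len, const SendOutcome& r) {
  assert(r.bytes <= len);
  return len - r.bytes;
}
```

With this ordering a caller never resubmits bytes that were already accepted, which is the duplication concern the reply addresses.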

====================
Regarding the "Full Write, Partial Flush" scenario (this one is tricky):

Consider the case where the user sends 1000 bytes. We successfully push all 1000 bytes into the SSL engine (encrypting them), but the upstream socket only accepts 500 bytes of the ciphertext before returning EAGAIN.
In this state:
I must return 1000 (success). All the user data has been consumed and encrypted (but some of it not sent yet on the upstream socket). If I return EAGAIN here, the user will retry sending the same 1000 bytes. This would encrypt the data a second time, corrupting the TLS stream (duplication).

So, what ensures the flush?

My claim is that TrySend is a "best effort" function with a minimalist return value, which can't reflect every complex situation and must return the number of bytes sent even if there was an error. It's the duty of the caller to make sure the TLS engine output buffer is flushed, by calling TrySend/TryRecv again or by using AsyncReq.

In detail:

  1. Immediate Retry: Since I return a positive byte count, the non-blocking contract implies the user (or upper layer) will continue to call TrySend (to send more data) or TryRecv (to wait for a response). Both functions begin by attempting to flush the engine buffer. But what if the user does not call again because they do not want to send or receive more data? For that we have mechanism 2.
  2. Async Layer Safety: For the async/fiber implementation (AsyncReq), we explicitly handle this state. In AsyncReq::MaybeSendOutputAsyncWithRead (and AsyncReadSome), we check engine_->OutputPending() and call StartUpstreamWrite() if it is positive. That call registers a write on the upstream socket via next_socket_->AsyncWriteSome. If there is buffered ciphertext, we register there for a Write event (EPOLLOUT) even if the user requested a Read, preventing any deadlock.

Collaborator Author

We concluded that the function is correct for all cases except one: when all user data has been sent and there is still pending data in the engine output buffer. In that case we would like to make this function "send and forget": the code will start an async process to make sure the engine output buffer is flushed into the upstream socket.
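The "full write, partial flush" case and the agreed hand-off can be simulated with a toy model. Everything here is an assumption for illustration: `ToyTls` and its members are hypothetical, the "engine" produces exactly one ciphertext byte per plaintext byte (real TLS records add overhead), and the background flush is a flag rather than a detached AsyncReq.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy model of a TLS socket in front of a capacity-limited upstream socket.
struct ToyTls {
  std::size_t pending = 0;        // ciphertext not yet written upstream
  std::size_t upstream_room = 0;  // bytes the upstream socket will take now
  bool background_flush = false;  // set when the flush is handed off

  // Consumes all `len` user bytes. If ciphertext is left over, the remainder
  // is handed to a (simulated) background flusher instead of failing.
  std::size_t TrySendAll(std::size_t len) {
    assert(!background_flush);  // a real guard would return try_again here
    pending += len;
    std::size_t written = std::min(pending, upstream_room);
    pending -= written;
    upstream_room -= written;
    if (pending > 0) background_flush = true;
    return len;  // user data was fully consumed, so report full success
  }

  // Simulated progress of the background flush once upstream drains.
  void OnUpstreamWritable(std::size_t room) {
    std::size_t written = std::min(pending, room);
    pending -= written;
    if (pending == 0) background_flush = false;
  }
};
```

With 1000 bytes sent and room for only 500 upstream, the call still reports 1000, and the remaining 500 bytes are owned by the background flush until it drains them.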

  }
  // case 1.B: full write - fall through to the next step
} else {  // case 1.C: write failed (EAGAIN or other Error).
  returned_status = send_result.error();
Owner

  1. same thing - what will push the data eventually?

Collaborator Author

  • Regarding the general error case: This follows standard POSIX partial write semantics. If we successfully processed some bytes but then hit a hard error (e.g., Broken Pipe), we return total_bytes_sent first. The next call will attempt to flush, hit the error immediately (with total_bytes_sent == 0), and correctly return the error to the user. For the edge case of a full write with a partial flush, see the end of this comment: feat(tls): implement non-blocking TrySend with async flush offload #516 (comment)

Collaborator Author

We concluded that the function is correct for all cases except one: when all user data has been sent and there is still pending data in the engine output buffer. In that case we would like to make this function "send and forget": the code will start an async process to make sure the engine output buffer is flushed into the upstream socket.

- Implemented TlsSocket::TrySend using a non-blocking state machine loop
  to handle NEED_WRITE and NEED_READ without context switching.
- Added AdvanceIovec helper to correctly track partial consumption of
  scatter-gather arrays.
- Applied Small Buffer Optimization (SBO) to iovec vectors, using stack
  storage for small batches to minimize heap allocations.
- Simplified the flush/encrypt loop structure for improved readability
  and reduced code size.
- Updated tls_socket_test with parameterized scatter-gather tests and
  mock-based error scenarios.

Signed-off-by: Gil Levkovich <[email protected]>
@glevkovich glevkovich force-pushed the glevkovich/improve_io_flow_trysend_impl branch from d05c05a to ede2ab2 Compare December 17, 2025 17:33
Signed-off-by: Gil Levkovich <[email protected]>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

@glevkovich glevkovich changed the title from "feat(tls): implement non-blocking TrySend with SBO and state handling" to "feat(tls): implement non-blocking TrySend with async flush offload" Dec 18, 2025
4 participants