Conversation

@Dltmd202 (Contributor) commented Jul 6, 2025

  • I have registered the PR changes.

Ⅰ. Describe what this PR did

Fixes a potential Netty I/O thread blocking issue by executing releaseChannel() asynchronously via a dedicated reconnectExecutor thread pool.

Also ensures proper shutdown of reconnectExecutor to avoid thread leaks.
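The fix described above can be sketched with plain `java.util.concurrent` types. This is a simplified stand-in, not the real `AbstractNettyRemotingClient` code; the names `reconnectExecutor` and `releaseChannel` mirror the PR description for illustration only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch: offload a potentially blocking releaseChannel() call
// from the Netty I/O thread to a dedicated executor, then shut the executor
// down so no threads leak. All names are illustrative stand-ins.
public class AsyncReleaseSketch {

    public static boolean demo() throws InterruptedException {
        ExecutorService reconnectExecutor = Executors.newSingleThreadExecutor();
        AtomicBoolean released = new AtomicBoolean(false);

        // Before the fix this ran inline on the I/O thread; now it is queued.
        reconnectExecutor.execute(() -> released.set(true)); // stand-in for releaseChannel(addr)

        // Lifecycle hook: shut the pool down on destroy to avoid thread leaks.
        reconnectExecutor.shutdown();
        reconnectExecutor.awaitTermination(1, TimeUnit.SECONDS);
        return released.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```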

Ⅱ. Does this pull request fix one issue?

fixes #7497

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

  • Naming conventions (rpcReconnectExecutor) aligned with existing merge thread patterns.
  • reconnectExecutor is now managed with proper init and destroy lifecycle hooks.

@Dltmd202 Dltmd202 changed the title fix: prevent Netty I/O thread blocking by async channel release via r… bugfix: prevent Netty I/O thread blocking by async channel release via reconnectExecutor Jul 6, 2025
@codecov bot commented Jul 6, 2025

Codecov Report

Attention: Patch coverage is 63.63636% with 8 lines in your changes missing coverage. Please review.

Project coverage is 60.61%. Comparing base (9f39706) to head (2a82afb).
Report is 1 commit behind head on 2.x.

Files with missing lines | Patch % | Lines
...ta/core/rpc/netty/AbstractNettyRemotingClient.java | 63.63% | 8 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##                2.x    #7505      +/-   ##
============================================
+ Coverage     60.50%   60.61%   +0.11%     
  Complexity      658      658              
============================================
  Files          1307     1307              
  Lines         49359    49377      +18     
  Branches       5805     5805              
============================================
+ Hits          29865    29932      +67     
+ Misses        16848    16791      -57     
- Partials       2646     2654       +8     
Files with missing lines | Coverage Δ
...ta/core/rpc/netty/AbstractNettyRemotingClient.java | 44.54% <63.63%> (+6.92%) ⬆️

... and 10 files with indirect coverage changes


@Dltmd202 Dltmd202 force-pushed the 7497 branch 5 times, most recently from e24feeb to b60f6d1 Compare July 7, 2025 00:22
@Dltmd202 Dltmd202 marked this pull request as ready for review July 7, 2025 00:35
new NamedThreadFactory(getThreadPrefix(), MAX_MERGE_SEND_THREAD));
mergeSendExecutorService.submit(new MergedSendRunnable());
}
reconnectExecutor = new ThreadPoolExecutor(
Contributor:

Why not use the timerExecutor thread pool directly?

Contributor Author:

Hi @funky-eyes
I added a separate thread pool mainly to give it a clear thread name, so it’s easier to trace when reconnect-related issues happen. But if that feels unnecessary here, I’m happy to switch to using the existing timerExecutor. Let me know what you think!

Member:

IMO, creating a new thread pool is generally preferred when there’s a high-priority task or when real-time processing is critical. Also, how frequently the issue occurs can be an important factor in deciding whether to create a new thread pool.

From a simple traceability standpoint, the current timeoutExecutor is used in several places — such as handling reconnections and removing timed-out messages.

So, it might make sense to update the prefix to something more general that fits well across all these usages, and reuse it accordingly.
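The traceability concern in this thread can be illustrated with a minimal stand-in for Seata's NamedThreadFactory. The class below and the "timeoutChecker" prefix are hypothetical, not the real Seata values; the point is that whichever pool is reused, a readable thread-name prefix is what keeps reconnect work visible in stack traces.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal stand-in for a named-thread factory: each pool's threads get a
// readable prefix so stack traces show which pool they belong to.
public class NamedFactorySketch {

    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger(1);
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "_" + counter.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
    }

    public static String demo() throws Exception {
        // One shared pool can serve both reconnects and timed-out-message
        // cleanup; a general prefix keeps either activity traceable.
        ExecutorService timeoutExecutor =
                Executors.newSingleThreadExecutor(named("timeoutChecker"));
        String threadName =
                timeoutExecutor.submit(() -> Thread.currentThread().getName()).get();
        timeoutExecutor.shutdown();
        return threadName;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```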

Contributor:

> Hi @funky-eyes I added a separate thread pool mainly to give it a clear thread name, so it’s easier to trace when reconnect-related issues happen. But if that feels unnecessary here, I’m happy to switch to using the existing timerExecutor. Let me know what you think!

I wrote very clearly in the issue why we should reuse the timeoutExecutor; I suggest you take a look at the reasoning there.

Contributor Author:

Thanks for pointing that out. I revisited the issue and I now see your point more clearly — reusing timeoutExecutor makes sense, especially since it’s already used for similar tasks like handling reconnections and timeouts. Avoiding an extra thread pool also helps keep things lean and easier to manage.

I’ll go ahead and update the code to reuse timeoutExecutor accordingly. Appreciate the feedback!

Contributor Author:

I’ve finished the changes. Would you mind taking a look when you have time?
@funky-eyes @YongGoose

@Dltmd202 Dltmd202 force-pushed the 7497 branch 2 times, most recently from 6a1dfca to 0489f84 Compare July 9, 2025 11:47
@funky-eyes funky-eyes added type: bug Category issues or prs related to bug. module/core core module labels Jul 10, 2025
@funky-eyes (Contributor) left a comment:

LGTM

handler.exceptionCaught(mockCtx, new IllegalArgumentException("test"));

Thread.sleep(500);
verify(spyManager).releaseChannel(eq(channel), anyString());
Member:

Is there a specific reason you used anyString()?
It seems like serverAddress will always be 127.0.0.1:8091, so I’m wondering if matching the exact value would be more appropriate.

Contributor Author:

You’re right — that makes sense to me as well. I’ll go ahead and update it to match the exact value instead of using anyString().
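The reviewer's point can be sketched without Mockito using a hand-rolled recording stub (class and method names here are illustrative, though the fixture address 127.0.0.1:8091 comes from the test under review): verifying the exact address, as `eq("127.0.0.1:8091")` would, is stricter than `anyString()`, which would also pass if the handler released the wrong server's channel.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled recording stub illustrating exact-value vs any-value matching.
public class ExactMatchSketch {

    static final List<String> releasedAddresses = new ArrayList<>();

    static void releaseChannel(Object channel, String serverAddress) {
        releasedAddresses.add(serverAddress);
    }

    public static boolean demo() {
        releaseChannel(new Object(), "127.0.0.1:8091");
        // anyString()-style check: passes for any recorded address at all.
        boolean anyMatch = !releasedAddresses.isEmpty();
        // eq(...)-style check: passes only for the expected exact address.
        boolean exactMatch = releasedAddresses.contains("127.0.0.1:8091");
        return anyMatch && exactMatch;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```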

@PeppaO (Contributor) commented Jul 10, 2025

This modification cannot solve the problem. I suggest reproducing the problem first, analyzing the cause, and then retesting after the modification.

@PeppaO (Contributor) commented Jul 10, 2025

To reproduce: start three TC nodes on ports 8091/8092/8093, start a business XA application, and observe the corresponding TCP connection ports. Then kill the 8091 node and observe whether the TCP connection is disconnected and reconnected.

@PeppaO (Contributor) commented Jul 10, 2025

After resolving the disconnection and reconnection, run a stress test to check whether TPS drops to 0 during the shutdown window after killing one of the TC nodes.
@funky-eyes @Dltmd202

@funky-eyes (Contributor) left a comment:

All places in ChannelHandler that call the clientChannelManager.releaseChannel method need to be made asynchronous.
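The request above can be sketched as a single async helper that every handler callback routes through. The classes below are simplified stand-ins, not the real Netty/Seata signatures; only the callback names (channelInactive, exceptionCaught) mirror Netty's ChannelHandler.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: every ChannelHandler callback that used to call
// clientChannelManager.releaseChannel inline goes through one async helper,
// so no release ever blocks the I/O thread.
public class AsyncHandlerSketch {
    static final ExecutorService executor = Executors.newSingleThreadExecutor();
    static final CopyOnWriteArrayList<String> releases = new CopyOnWriteArrayList<>();

    static void releaseChannel(String addr) { releases.add(addr); }

    // Single choke point: all call sites offload through here.
    static void asyncRelease(String addr) {
        executor.execute(() -> releaseChannel(addr));
    }

    static void channelInactive(String addr) { asyncRelease(addr); }
    static void exceptionCaught(String addr) { asyncRelease(addr); }

    public static int demo() throws Exception {
        channelInactive("127.0.0.1:8091");
        exceptionCaught("127.0.0.1:8092");
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.SECONDS);
        return releases.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```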

@Dltmd202 Dltmd202 force-pushed the 7497 branch 3 times, most recently from 15a1116 to e31f5ab Compare July 12, 2025 14:27
@funky-eyes (Contributor):

> 18.18% of diff hit (target 60.50%)
Can you add some test cases to improve the coverage?

@Dltmd202 (Contributor Author):

Sure, I’ll take care of it!

@Dltmd202 (Contributor Author):

I’ve finished the changes. Would you mind taking a look when you have time?
@funky-eyes

@funky-eyes funky-eyes added this to the 2.6.0 milestone Jul 24, 2025
@funky-eyes funky-eyes merged commit 61d6cb7 into apache:2.x Jul 24, 2025
10 checks passed
slievrly pushed a commit to slievrly/fescar that referenced this pull request Oct 21, 2025
YvCeung pushed a commit to YvCeung/incubator-seata that referenced this pull request Dec 25, 2025

Labels

module/core core module type: bug Category issues or prs related to bug.


Development

Successfully merging this pull request may close these issues.

The client I/O thread may be blocked

4 participants