Bugfix for stuck in write method of WiFiClient and WiFiClientSecure until the remote peer closed connection #6104

Merged 2 commits on May 16, 2019

sislakd (Contributor) commented May 16, 2019

For a couple of days I was troubleshooting strange stability problems in components built on top of WiFiClient and WiFiClientSecure. I finally found the root cause: from time to time, a call to the write method gets stuck until the remote peer closes the connection. The underlying bug seems to have been present in the code for quite a long time.

When the TCP send buffer is full, ClientContext::_write_from_source increments _send_waiting and switches context to NONOS using esp_yield. If something else calls esp_schedule (that is, not the _write_some_from_cb method of the same ClientContext instance), the loop in _write_from_source repeats, the send buffer is still full, and _send_waiting is incremented again (so from this moment _send_waiting > 1). After that, a successful ack on the connection never calls esp_schedule, because of the condition in _write_some_from_cb, where _send_waiting is decremented only if it is equal to 1.

One case where something else can call esp_schedule is when there are two or more ClientContext instances (e.g. two client connections). An ack on another client context calls esp_schedule and thus resumes the write on this client context while there is still no space in its TCP send buffer.

The simplest solution is to set _send_waiting to 1 instead of incrementing it. As _send_waiting is one byte, it makes no sense to change it to bool.
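
The failure mode described above can be reproduced in a small host-side model. This is only a sketch under stated assumptions: `WaitModel`, `writer_yields`, `ack_arrives` and `resumed_by_ack` are hypothetical names invented for illustration, and the model keeps nothing of the real ClientContext except a one-byte `send_waiting` value and the `== 1` check.

```cpp
#include <cstdint>

// Hypothetical model of the wait/resume handshake; not the real ClientContext.
struct WaitModel {
    uint8_t send_waiting = 0;
    bool increment;  // true: behaviour described in the report, false: proposed fix

    explicit WaitModel(bool increment_on_wait) : increment(increment_on_wait) {}

    // The writer found the send buffer full and is about to yield.
    void writer_yields() {
        if (increment) {
            ++send_waiting;    // counts every pass through the wait loop
        } else {
            send_waiting = 1;  // only records that a writer is waiting
        }
    }

    // An ack arrived for this connection. Returns true if the writer would
    // be resumed (the point where esp_schedule would be called).
    bool ack_arrives() {
        if (send_waiting == 1) {
            --send_waiting;
            return true;
        }
        return false;
    }
};

// One yield, then `spurious` wake-ups caused by something else while the
// buffer is still full (each makes the writer yield again), then one ack.
bool resumed_by_ack(bool increment_on_wait, int spurious) {
    WaitModel m(increment_on_wait);
    m.writer_yields();
    for (int i = 0; i < spurious; ++i) {
        m.writer_yields();
    }
    return m.ack_arrives();
}
```

In this model, `resumed_by_ack(true, 1)` returns false: a single unrelated wake-up is enough to push the counter to 2, and the ack no longer resumes the writer. `resumed_by_ack(false, 1)` returns true, because the value never exceeds 1.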

earlephilhower (Collaborator) commented

To @d-a-v to give this a once-over.

earlephilhower (Collaborator) left a comment

I think you found a sneaky bug that has been there for a long time, thanks!

However, I would request you change it to a bool and adjust the if (send_waiting==1) statement (which caused the infinite hang once send_waiting got to 2) accordingly. We really want a flag here, not a count, so a bool would reduce technical debt.
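
A flag-based version of the handshake might look roughly like the sketch below. It is an illustration only, not the merged change: `FlagModel`, `writer_yields`, `ack_arrives` and `resumes_with_flag` are hypothetical names, and the only thing taken from the discussion is the idea of replacing the `== 1` comparison with a plain bool test.

```cpp
// Hypothetical flag-based model; not the real ClientContext.
struct FlagModel {
    bool send_waiting = false;

    // The writer found the send buffer full and is about to yield.
    void writer_yields() { send_waiting = true; }

    // An ack arrived. Returns true if the writer would be resumed
    // (the point where esp_schedule would be called).
    bool ack_arrives() {
        if (send_waiting) {
            send_waiting = false;
            return true;
        }
        return false;
    }
};

// The writer yields once, is woken `spurious` times by unrelated events
// while the buffer is still full, then `acks` acks arrive in a row.
// Returns how many of those acks would resume the writer.
int resumes_with_flag(int spurious, int acks) {
    FlagModel m;
    m.writer_yields();
    for (int i = 0; i < spurious; ++i) {
        m.writer_yields();
    }
    int resumes = 0;
    for (int i = 0; i < acks; ++i) {
        if (m.ack_arrives()) {
            ++resumes;
        }
    }
    return resumes;
}
```

However many unrelated wake-ups happen first, the first ack resumes the writer exactly once and later acks do nothing until the writer waits again.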

sislakd (Contributor, Author) commented May 16, 2019

I've updated _send_waiting to be a clear bool flag.

earlephilhower (Collaborator) commented

Thanks! I'll leave it to @d-a-v to double-check that this only needs to be a flag and not a count (in which case the ==1 in the original code should be a >= 1)...
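
The count alternative mentioned here can be explored with a toy model as well. The sketch assumes a single writer waiting per connection and uses hypothetical names (`CountModel`, `resumes_with_count`); it shows what a `>= 1` check would do to a count inflated by unrelated wake-ups, and is not an analysis of the real code.

```cpp
#include <cstdint>

// Hypothetical count-based variant: the ack handler fires whenever the
// count is at least 1 and decrements it once per ack.
struct CountModel {
    uint8_t send_waiting = 0;

    void writer_yields() { ++send_waiting; }

    bool ack_arrives() {
        if (send_waiting >= 1) {
            --send_waiting;
            return true;  // models the esp_schedule call
        }
        return false;
    }
};

// One logical wait interrupted by `spurious` unrelated wake-ups, followed
// by `acks` acks. Returns how many acks would trigger a resume.
int resumes_with_count(int spurious, int acks) {
    CountModel m;
    m.writer_yields();
    for (int i = 0; i < spurious; ++i) {
        m.writer_yields();
    }
    int resumes = 0;
    for (int i = 0; i < acks; ++i) {
        if (m.ack_arrives()) {
            ++resumes;
        }
    }
    return resumes;
}
```

With two unrelated wake-ups, `resumes_with_count(2, 5)` returns 3: the writer is no longer stuck, but two of the three resumes come from stale counts rather than from a writer that is actually waiting. Under the single-writer assumption, that points towards a flag being sufficient.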

d-a-v (Collaborator) left a comment

@sislakd this is a great find and fix!
I have always been suspicious of these send_waiting operations but wasn't sure whether they had to be fixed.
No fail, no fix... until it fails. Thanks!

@d-a-v d-a-v merged commit 25c95ac into esp8266:master May 16, 2019

TD-er (Contributor) commented May 17, 2019

Well, this looks like a very sneaky bug indeed, and just from looking at the code I can already imagine a lot of related reported issues.
I will make a test build for a few of my nodes immediately to test it.

@d-a-v if you're aware of more of these 'suspect' parts of the code, please open issues for them so they can be looked into.
This change does have the potential to fix a lot of hanging issues.

earlephilhower added a commit to earlephilhower/Arduino that referenced this pull request May 20, 2019
Changes since 2.5.1 (to 2.5.2)

Core
----
* Add explicit Print::write(char) (esp8266#6101)

Build system
----
* Fix typo in elf2bin for QOUT binary generation (esp8266#6116)
* Support PIO Wl-T and Arduino -T linking properly (esp8266#6095)
* Allow *.cc files to be linked into flash by default (esp8266#6100)
* Use custom "ElfToBin" builder for PIO (esp8266#6091)
* Fail if generated JSON file cannot be read (esp8266#6076)
* Moved 'Dropping' print from stdout to stderr in drop_versions.py (esp8266#6071)
* Fix PIO issue when build environment contains spaces (esp8266#6119)

Libraries
----
* Remove deadlock when server is not acking our data (esp8266#6107)
* Bugfix for stuck in write method of WiFiClient and WiFiClientSecure until the remote peer closed connection (esp8266#6104)
* Re-add original SD FAT info access methods (esp8266#6092)
* Make FILE_WRITE append in SD.h wrapper (esp8266#6106)
* Drop X509 after connection, avoid hang on TLS broken (esp8266#6065)
@earlephilhower earlephilhower mentioned this pull request May 20, 2019
earlephilhower added a commit that referenced this pull request May 20, 2019