raspberrypi 8.0.0 socket exceptions when web workflow is enabled #7333
Web workflow uses 1 socket for mdns, 1 socket to accept connections with a buffer of 1, 1 "active" socket for web requests and 1 socket for the websocket connection. So, 4 or 5 depending on how the pending connection is handled. |
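As a rough sanity check on the numbers above, here is a minimal sketch of the socket budget. The pool size of 8 is the figure quoted in the issue description and is treated here as an assumption, not something verified against the port's source; the dictionary and function names are hypothetical.

```python
# Socket budget sketch. TOTAL_SOCKETS is the "8 sockets available" figure
# from the issue description (an assumption, not checked against the source).
TOTAL_SOCKETS = 8

# Per-feature counts as listed in the comment above.
WEB_WORKFLOW_SOCKETS = {
    "mdns": 1,
    "listen": 1,
    "active_request": 1,
    "websocket": 1,
}


def sockets_left_for_user_code(pending_connection_counts):
    """Return how many sockets remain for user code under these assumptions."""
    used = sum(WEB_WORKFLOW_SOCKETS.values())
    if pending_connection_counts:
        # The pending (not yet accepted) connection may hold a fifth socket.
        used += 1
    return TOTAL_SOCKETS - used


print(sockets_left_for_user_code(False))  # 4
print(sockets_left_for_user_code(True))   # 3
```

If those figures hold, a single Requests session should still have room, which would point away from plain socket exhaustion.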
On some URLs, with web workflow off, things appear to work. But with the added print-debugs to Requests to expose the This happens in beta.4 and beta.5. So something has been lurking there that may be related to the web workflow beta.5 case. |
Forgot to mention that with web workflow on raspberrypi beta.5, when using the above code, I've tried several URLs. With the Adafruit URL, I don't know what the server sees, but none are successful. When I hit a WAN Apache server I can access (using domain name), the server shows HTTP 200 status codes and occasionally some are successful, but usually error as described above. But a LAN Apache server (using IP address) shows HTTP 408 status codes (server timed out). So it looks like some level of connection and data transfer is made, but the reads tend to fail in most circumstances. |
Did some testing on However, with some URLs, the first try in the Addendum: In another espressif code context (QT Py ESP32-S2 running beta.5, requests timeout set to 5 seconds), the first request in It does seem that there are several issues at play here, but the most immediate is |
I'm testing with
This contains some fixes to make a wider range of socket name lookup failures throw the "gaierror" exception as intended, but should be functionally identical to main. Wifi workflow is enabled. However, I have not connected to the web workflow interface or messed with resolving the device's mdns name from other computers. To exclude as many factors as possible, I did not use adafruit_requests, but just wrote low level tests of getaddrinfo and socket.connect. All code was run by import from
Unfortunately, this doesn't reproduce any problems for me. Typical output:
|
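The test code from the comment above did not survive in this copy of the thread. As an illustration only, a low-level getaddrinfo and connect probe of that general kind might look like the sketch below. The function name `probe` and its parameters are hypothetical; `pool` is assumed to be any object exposing `getaddrinfo`, `socket`, `AF_INET` and `SOCK_STREAM`.

```python
import time


def probe(pool, host, port, attempts=10, pause=0.1, sleep=time.sleep):
    """Resolve host once, then try `attempts` TCP connects.

    Returns a (succeeded, failed) tuple. This is an illustrative sketch,
    not the test that was actually run in the thread.
    """
    # Each getaddrinfo entry ends with the (address, port) pair to connect to.
    addr = pool.getaddrinfo(host, port)[0][-1]
    succeeded = failed = 0
    for _ in range(attempts):
        sock = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
        try:
            sock.connect(addr)
            succeeded += 1
        except OSError as exc:
            print("connect failed:", repr(exc))
            failed += 1
        finally:
            sock.close()
        sleep(pause)
    return succeeded, failed
```

On CircuitPython the pool would be `socketpool.SocketPool(wifi.radio)`; on desktop Python the `socket` module itself exposes the same four names and can be passed in directly.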
I tried running the original test script on the Pico W running the 12/8 beta.5 bits from s3 and after about 5 minutes I got the "OutOfRetries: Repeated socket failures" message. I then flashed the latest bits from github and replaced my .env file with a settings.toml file and ran the test script which results in the success message being displayed repeatedly. I've run it now for several minutes without any failures. |
This is probably expected behavior but I hadn't noticed it before. Prior to running this script the Pico W shows up in the web workflow list of "CircuitPython devices on your network" from another idle device, but while the script is running the device doesn't show up in the list..... |
No, I think it would be expected to still show up. @tannewt can you say for sure? |
I'd expect it to show up. However, mdns does use udp so it may be missed occasionally due to that. |
I did check it out on a Feather ESP32-S3 and didn't see the same behavior so I opened an issue up on it #7346 |
I just tested this with latest S3 This works now with web workflow enabled. (Requests is still often getting an Does anyone know what was broken in beta.5, and what fixed it? |
Re-opening based on: |
I suspect there may be some marginal timing (perhaps variable) where (TLS?) sockets aren't ready at code start, and the perception of persistence across power / reset is just the code not working. But once it doesn't work, it will continue to not work until some delay is added and the device is reset. This works after a reset and ongoing, but if
import wifi
import time
import socketpool
import ssl

STARTUP_WAIT = 5
ITERATIONS = 10
HOST = "example.com"
PATH = "/"
PORT = 443
MAXBUF = 4096

time.sleep(STARTUP_WAIT) # wait for serial (and maybe wait for sockets)
print(f"{'='*25}")
print("Web Workflow enabled - already connected to AP")
pool = socketpool.SocketPool(wifi.radio)
for _ in range(ITERATIONS):
    print(f"{'-'*25}\nCreate TCP Client Socket")
    with pool.socket(pool.AF_INET, pool.SOCK_STREAM) as sock:
        s = ssl.create_default_context().wrap_socket(sock, server_hostname=HOST)
        print("Connecting")
        s.connect((HOST, PORT))
        size = s.send(f"HEAD {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
        print("Sent", size, "bytes")
        # just get the first hunk and call it a day
        buf = bytearray(MAXBUF)
        size = s.recv_into(buf)
        print('Received', size, "bytes", buf[:size])
Indeed, even @jepler's code behaves the same way if the TLS block is attempted first:
import time
import os  # needed for os.getenv below
import socketpool
import wifi
import ssl

time.sleep(0) # wait for serial (and maybe wait for sockets)
print(f"{'-'*25}")
if wifi.radio.ipv4_address is None:
    print("connecting to wifi")
    wifi.radio.connect(os.getenv("WIFI_SSID"), os.getenv('WIFI_PASSWORD'))
print(f"local address {wifi.radio.ipv4_address}")
socket = socketpool.SocketPool(wifi.radio)
print()
print("SSL connection test")
time.sleep(.1)
ctx = ssl.create_default_context()
success = failure = 0
for i in range(10):
    try:
        with ctx.wrap_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
            print(f"{s=}")
            time.sleep(.1)
            s.connect(("example.com", 443))
            s.send('HEAD / HTTP/1.1\r\nHost: example.com\r\n\r\n')
            buf = bytearray(4096)
            size = s.recv_into(buf)
            print('Received', size, "bytes", buf[:size])
        success += 1
    except Exception as e:
        print(f"{type(e)}: {e} {getattr(e, 'errno', None)}")
        failure += 1
    print()
print(f"Over {success+failure} attempts, {success} success {failure} failure")
Addendum: The issue does not appear to be TLS-specific. The original example fails as well after a reset without a delay:
import time
import traceback
import wifi
import socketpool
import ssl
import adafruit_requests
TEXT_URL = "http://wifitest.adafruit.com/testwifi/index.html"
STARTUP_WAIT = 0
time.sleep(STARTUP_WAIT) # wait for serial (and maybe wait for sockets)
print(f"{'='*25}")
print("Web Workflow enabled - already connected to AP")
pool = socketpool.SocketPool(wifi.radio)
requests = adafruit_requests.Session(pool, ssl.create_default_context())
while True:
    try:
        print("Fetching text from", TEXT_URL)
        response = requests.get(TEXT_URL)
        print("-" * 40)
        print(response.text)
        print("-" * 40)
    except Exception as e:
        traceback.print_exception(e, e, e.__traceback__)
    time.sleep(5)
Perhaps related to #7313 (thanks, @Neradoc) - one second may not be enough. |
@jepler Is there something we need to wait for when starting up the co-processor? Sounds like same problem reported here: https://forums.adafruit.com/viewtopic.php?t=198486 (noted by mgmt). I haven't caught up on discord this weekend, so maybe this is all already known and linked. |
hyde00001 confirmed on Discord just now that a delay at the start of the code works as above, once the board is then reset. |
I have not seen a documented need for such a delay. I do not understand how to check the latest reproducer script to see the resulting exception, since without a delay I won't be able to connect and see the printed exception. What exception does the non-tls version produce? |
No exception, it (non-tls) just times out with zero bytes received. Try your TLS loop but with no TLS right at the top of code, no delay, right after a reset. You'll miss the first prints, but subsequent loops will print. This works with 5 seconds delay after reset, but not with 0 seconds delay after reset:
import time
import socketpool
import wifi
import ssl
import traceback
time.sleep(5) # wait for serial (and maybe wait for sockets)
print(f"{'='*25}")
print("Web Workflow enabled - already connected to AP")
socket = socketpool.SocketPool(wifi.radio)
print()
print("non-SSL connection test")
time.sleep(.1)
# ctx = ssl.create_default_context()
success = failure = 0
for i in range(10):
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            print(f"local address {wifi.radio.ipv4_address}")
            print(f"{s=}")
            time.sleep(.1)
            s.connect(("wifitest.adafruit.com", 80))
            # "http://wifitest.adafruit.com/testwifi/index.html"
            s.send('HEAD /testwifi/index.html HTTP/1.1\r\nHost: wifitest.adafruit.com\r\n\r\n')
            buf = bytearray(1024)
            size = s.recv_into(buf)
            print('Received', size, "bytes", buf[:size])
        success += 1
    except Exception as e:
        traceback.print_exception(e, e, e.__traceback__)
        print(f"{type(e)}: {e} {getattr(e, 'errno', None)}")
        failure += 1
    print()
print(f"Over {success+failure} attempts, {success} success {failure} failure") |
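Because the non-TLS failure shows up as zero bytes received rather than as an exception, a loop like the one above counts it as a success. A minimal sketch of a helper that turns an empty read into an error, so the tally reflects it, could look like this. The name `head_request` and the error message are hypothetical; `sock` is assumed to be an already connected socket with `send` and `recv_into`.

```python
def head_request(sock, host, path="/", bufsize=1024):
    """Send a HEAD request on a connected socket and return the first chunk.

    Raises OSError when nothing is received, so callers that count
    exceptions also count silent zero-byte reads as failures.
    """
    request = f"HEAD {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode()
    sock.send(request)
    buf = bytearray(bufsize)
    size = sock.recv_into(buf)
    if size == 0:
        # Treat the "timed out with zero bytes" case as an explicit failure.
        raise OSError("no data received")
    return bytes(buf[:size])
```

Inside the test loop this would replace the `send` / `recv_into` / `print` lines, for example `print(head_request(s, "wifitest.adafruit.com", "/testwifi/index.html"))`.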
It may not be possible to reproduce it in MicroPython since it's really more like the non-web-workflow case in CircuitPython (where it's likely that the time to set up and connect to the AP is enough). This MicroPython code without a delay (
import time
ITERATIONS = 10
# Connect to network
import network
wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect('ssid', 'password')
# Should be connected and have an IP address
wlan.status() # 3 == success
wlan.ifconfig()
# Get IP address for destination
import socket
ai = socket.getaddrinfo("wifitest.adafruit.com", 80)
addr = ai[0][-1]
for _ in range(ITERATIONS):
    # Create a socket and make a HTTP request
    s = socket.socket()
    s.connect(addr)
    # s.send(b"GET / HTTP/1.0\r\n\r\n")
    s.send('HEAD /testwifi/index.html HTTP/1.1\r\nHost: wifitest.adafruit.com\r\n\r\n')
    # Print the response
    print(s.recv(512))
    s.close()
    time.sleep(1)
time.sleep(10)
import machine
machine.reset() |
Sorry - I missed that is with the web workflow only. |
|
Interesting. I didn't expect that to behave the same, but it did. The issue appears more fundamental than wifi init. This code works with a 5-second delay, but not with 0 seconds (again non-SSL version times out with 0 bytes received; SSL version gets the
import time
time.sleep(5) # wait for serial and cyw43
print(f"{'='*25}")
print("Web Workflow enabled - already connected to AP")
import wifi
import socketpool
# import ssl
import traceback
socket = socketpool.SocketPool(wifi.radio)
print()
print("non-SSL connection test")
time.sleep(.1)
# ctx = ssl.create_default_context()
success = failure = 0
for i in range(10):
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            print(f"local address {wifi.radio.ipv4_address}")
            print(f"{s=}")
            time.sleep(.1)
            s.connect(("wifitest.adafruit.com", 80))
            # "http://wifitest.adafruit.com/testwifi/index.html"
            s.send('HEAD /testwifi/index.html HTTP/1.1\r\nHost: wifitest.adafruit.com\r\n\r\n')
            buf = bytearray(1024)
            size = s.recv_into(buf)
            print('Received', size, "bytes", buf[:size])
        success += 1
    except Exception as e:
        traceback.print_exception(e, e, e.__traceback__)
        print(f"{type(e)}: {e} {getattr(e, 'errno', None)}")
        failure += 1
    print()
print(f"Over {success+failure} attempts, {success} success {failure} failure")
I guess that makes sense though, even |
Clearly some timing issue here, though I cannot see where. Putting a "mp_hal_delay_ms(1500);" at the top of common_hal_socketpool_socket() in ports/raspberrypi/common-hal/socketpool/Socket.c makes everything work, but that's quite a hack. |
Increase number of LWIP timers for MDNS (fixes #7333)
CircuitPython version
Code/REPL
Behavior
The above code, slightly modified boilerplate internet connect code, does not work on raspberrypi 8.0.0-beta.5 when web workflow is enabled (wifi credentials in .env file).
Consistently gets exception:
It seems that the timeout in requests is kicking in within the library. No data is received. Occasionally ETIMEDOUT.
Underlying exceptions to the OutOfRetries occur (discovered through some print-debugging in adafruit_requests) if a manual timeout is supplied. Exceptions are initially OSError: [Errno 116] ETIMEDOUT in recv_into, waiting for the H of HTTP to kick off the receive. Then OSError: 32 in _send takes over on later requests. But again, sometimes there is no error, just no data received.
I suspect some low-level socket shenanigans. There should be 8 sockets available, web workflow uses at least one for the TCP listen, probably something for mDNS, and more with client(s) accessing web workflow features?
Interestingly, if requests alternate between two URLs, after several failures for both, the second will succeed once. Rinse, repeat.
No issue with the above code on 8.0.0-beta.4, or on 8.0.0-beta.5 with web workflow disabled (no .env file).
Description
No response
Additional information
No response
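The Behavior section describes a mix of outcomes: ETIMEDOUT (errno 116) in recv_into, OSError 32 in _send, and sometimes no error at all. A small sketch for tallying those outcomes over repeated attempts is shown below. The function and table names are hypothetical, and labelling errno 32 as EPIPE is an assumption (the report only shows the bare number).

```python
# Labels for the error numbers seen in the report. 116 -> ETIMEDOUT is taken
# from the issue text; 32 -> EPIPE is an assumption based on the usual
# errno numbering.
ERRNO_NAMES = {116: "ETIMEDOUT", 32: "EPIPE"}


def tally_failures(operation, attempts):
    """Call `operation` repeatedly and count outcomes by error number."""
    counts = {}
    for _ in range(attempts):
        try:
            operation()
            key = "ok"
        except OSError as exc:
            # args[0] holds the error number on both CircuitPython and CPython.
            code = exc.args[0] if exc.args else None
            key = ERRNO_NAMES.get(code, f"errno {code}")
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Here `operation` would be a zero-argument callable that performs one request, for example `lambda: requests.get(TEXT_URL)`. Note that the "no error, just no data received" case is counted as "ok" unless the callable checks for it itself.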