Commit cd09c2f
[libc++] Fix semaphore timed wait hanging on Windows (llvm#180398)
Fixes llvm#180334
The semaphore timed wait test is flaky on Windows. It hangs from time to
time.
Some examples:
[windows (clang-cl-no-vcruntime, false, clang-cl,
clang-cl)](https://github.com/llvm/llvm-project/actions/runs/21737380876/job/62707542836#logs)
[windows (mingw-static, true, cc,
c++)](https://github.com/llvm/llvm-project/actions/runs/21636063482/job/62367831823?pr=179483#logs)
[windows (clang-cl-static, false, clang-cl,
clang-cl)](https://github.com/llvm/llvm-project/actions/runs/21453876753/job/61794464147#logs)
[windows (clang-cl-dll, false, clang-cl,
clang-cl)](https://github.com/llvm/llvm-project/actions/runs/21382902941/job/61556154029#logs)
[windows (mingw-static, true, cc,
c++)](https://github.com/llvm/llvm-project/actions/runs/21365713577/job/61502377123#logs)
The internal dylib function takes a timeout:
```cpp
static void __platform_wait_on_address(void const* __ptr, void const* __val, uint64_t __timeout_ns)
```
It follows the same convention as `__libcpp_thread_poll_with_backoff`,
where `0ns` means "wait indefinitely until notified":
```cpp
_LIBCPP_HIDE_FROM_ABI __poll_with_backoff_results __libcpp_thread_poll_with_backoff(
_Poll&& __poll, _Backoff&& __backoff, chrono::nanoseconds __max_elapsed = chrono::nanoseconds::zero())
```
This is problematic: if the caller really wants to wait `0ns` and passes
`0`, the internal dylib function `__platform_wait_on_address` waits
indefinitely:
```cpp
__timeout_ns == 0 ? INFINITE : static_cast<DWORD>(__timeout_ns / 1'000'000)
```
This is what actually happened here. The fix is to update the internal
dylib function to take an `optional` and use `nullopt` to indicate
"wait indefinitely":
```cpp
static void __platform_wait_on_address(void const* __ptr, void const* __val, optional<uint64_t> __timeout_ns)
```
Edit: after code review, the code was updated to use a tag type
`NoTimeout` to indicate "wait indefinitely". This is superior because the
distinction is encoded in the type system instead of a runtime check.
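A minimal sketch of the tag-type idea, not the code from this commit: the names `NoTimeoutTag`, `to_wait_ms`, and `kInfinite` are hypothetical, and `kInfinite` stands in for the Windows `INFINITE` constant so the sketch compiles on any platform. With two overloads, "no timeout" and "a timeout of `0ns`" can no longer be confused.

```cpp
#include <chrono>
#include <cstdint>

// Stand-in for the Windows INFINITE constant (0xFFFFFFFF); an assumption made
// so that the sketch does not need <windows.h>.
constexpr std::uint32_t kInfinite = 0xFFFFFFFFu;

// Hypothetical tag type: passing it selects the "wait indefinitely" overload
// at compile time.
struct NoTimeoutTag {};

// No timeout requested: wait until notified.
constexpr std::uint32_t to_wait_ms(NoTimeoutTag) { return kInfinite; }

// Finite timeout: 0ns now really means "do not block".
// Negative durations are not handled in this sketch.
constexpr std::uint32_t to_wait_ms(std::chrono::nanoseconds timeout) {
  return static_cast<std::uint32_t>(timeout.count() / 1'000'000);
}
```

A caller that wants to block forever writes `to_wait_ms(NoTimeoutTag{})`, while `to_wait_ms(std::chrono::nanoseconds(0))` yields `0`. The choice is made by overload resolution rather than by a runtime comparison against a sentinel value.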
Why was this not a problem before?
`__libcpp_thread_poll_with_backoff` has had this "`0ns` means wait
indefinitely" semantic for years (it has always been like that), but it
never caused issues. This is because it has:
```cpp
chrono::nanoseconds const __elapsed = chrono::high_resolution_clock::now() - __start;
if (__max_elapsed != chrono::nanoseconds::zero() && __max_elapsed < __elapsed)
return __poll_with_backoff_results::__timeout;
```
`__max_elapsed` is what the user passed in. Suppose the user passed in
`0ns`: `__elapsed` is certainly a positive number, so the function
returns directly and never reaches the backoff function. No hang is
possible.
In the test, we pass some wait time, say `1ms`, through the `semaphore`
public API, which calls `__libcpp_thread_poll_with_backoff` with a `1ms`
timeout. `__libcpp_thread_poll_with_backoff` runs some polling loops and
then calls the backoff function, the platform timed wait in this case,
with a timeout of `1ms - elapsed`, say `950us`.
However, the Windows platform wait has millisecond precision, so `950us`
is rounded down to `0ms` (`static_cast<DWORD>(__timeout_ns / 1'000'000)`).
The call returns almost immediately, and
`__libcpp_thread_poll_with_backoff` keeps polling like this. As time goes
by in the polling loop, the timeout passed to the platform wait shrinks
from `950us` to smaller and smaller values.
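The rounding described above can be reproduced with a small sketch. The helper name `whole_ms` is hypothetical. It relies on `std::chrono::duration_cast`, which truncates toward zero in the same way as the integer division in the dylib code.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: convert a nanosecond timeout to whole milliseconds.
// duration_cast truncates toward zero, so any value below 1ms becomes 0ms.
constexpr std::uint32_t whole_ms(std::chrono::nanoseconds timeout) {
  return static_cast<std::uint32_t>(
      std::chrono::duration_cast<std::chrono::milliseconds>(timeout).count());
}
```

Under this conversion, every remaining timeout in the open interval between `0ns` and `1ms` turns into a non-blocking call, which is why the loop keeps spinning until the deadline passes.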
In `__libcpp_thread_poll_with_backoff`:
```cpp
if (__max_elapsed != chrono::nanoseconds::zero() && __max_elapsed < __elapsed)
return __poll_with_backoff_results::__timeout;
if (auto __backoff_res = __backoff(__elapsed); __backoff_res == __backoff_results::__continue_poll)
```
`__max_elapsed` is the user-requested timeout, `1ms` in this case, and
`__elapsed` gradually increases. Eventually, when it becomes greater
than `1ms`, we have `__max_elapsed < __elapsed`, the function returns,
and the test passes. All good. But there is a slim chance that on one
iteration `__elapsed` is exactly equal to the requested `__max_elapsed`
of `1ms`. With `__max_elapsed == __elapsed`, the code path goes to the
backoff platform wait with the timeout
`__max_elapsed - __elapsed == 0ns`. So now we are asking
`__platform_wait_on_address` to wait exactly `0ns`, and because of our
ambiguous API, `0ns` means wait indefinitely:
```cpp
__timeout_ns == 0 ? INFINITE : static_cast<DWORD>(__timeout_ns / 1'000'000)
```
The test will just hang forever.
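The failure path above can be modeled in a few lines. This is a sketch under stated assumptions, not the libc++ code: `remaining_for_backoff`, `old_timeout_to_ms`, and `kInfinite` are hypothetical names, `kInfinite` stands in for the Windows `INFINITE` constant, and the model covers only the finite-timeout case.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>

// Stand-in for the Windows INFINITE constant (0xFFFFFFFF).
constexpr std::uint32_t kInfinite = 0xFFFFFFFFu;

// Hypothetical model of one loop iteration's timeout check. Returns the time
// left for the backoff function, or nullopt when the loop reports a timeout.
// Only meaningful for a non-zero max_elapsed.
inline std::optional<std::chrono::nanoseconds> remaining_for_backoff(
    std::chrono::nanoseconds max_elapsed, std::chrono::nanoseconds elapsed) {
  if (max_elapsed != std::chrono::nanoseconds::zero() && max_elapsed < elapsed)
    return std::nullopt;
  return max_elapsed - elapsed;
}

// Hypothetical helper mirroring the old conversion, where 0 doubles as
// "no timeout".
constexpr std::uint32_t old_timeout_to_ms(std::uint64_t timeout_ns) {
  return timeout_ns == 0 ? kInfinite
                         : static_cast<std::uint32_t>(timeout_ns / 1'000'000);
}
```

When `elapsed` equals `max_elapsed`, the strict `<` check does not fire, the remaining time is exactly `0ns`, and the old conversion maps that `0` to the infinite wait. One nanosecond later, the check fires and the loop reports a timeout instead.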
In `__libcpp_thread_poll_with_backoff`, if we checked `__max_elapsed <=
__elapsed` instead of `__max_elapsed < __elapsed`, we would avoid the
call to the platform wait with `0ns`. But according to the standard, I
think the current `__max_elapsed < __elapsed` is more correct:
> The timeout expires
> ([[thread.req.timing]](https://eel.is/c++draft/thread.req.timing)) when
> the current time is after abs_time (for try_acquire_until) or when at
> least rel_time has passed from the start of the function (for
> try_acquire_for).

https://eel.is/c++draft/thread.sema#cnt-18.2
So I think this **after** means the comparison needs to be strictly
greater-than.
The fix is to update the internal dylib API to use `nullopt` to indicate
"wait indefinitely", so a call to the platform wait with `0ns` no longer
causes a hang.
Edit: after code review, the code was updated to use a tag type
`NoTimeout` to indicate "wait indefinitely". This is superior because the
distinction is encoded in the type system instead of a runtime check.
I made a small change as well. It is very easy to get into a situation
where the requested platform wait timeout is `<1ms`; we would then keep
calling the Windows platform wait with `0ms` because of the rounding,
which is effectively a spin lock. I changed this so that a requested
timeout between `100us` and `1ms` is rounded up to `1ms` to wait a bit
longer (which is conforming, IIUC). Let me know if this change is
necessary; happy to take this part out if it is not considered good.
1 parent 2bb9885 commit cd09c2f
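The sub-millisecond rounding described in the last paragraph of the commit message could look roughly like the sketch below. The function name `timeout_to_ms_rounded` is hypothetical, and the exact bounds (inclusive at `100us`, exclusive at `1ms`) are an assumption, since the message only says "between `100us` to `1ms`".

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical conversion of a finite timeout to whole milliseconds.
// Assumption: timeouts in [100us, 1ms) are rounded up to 1ms so that the
// caller blocks briefly instead of spinning; everything else is truncated.
constexpr std::uint32_t timeout_to_ms_rounded(std::chrono::nanoseconds timeout) {
  if (timeout >= std::chrono::microseconds(100) &&
      timeout < std::chrono::milliseconds(1))
    return 1;
  return static_cast<std::uint32_t>(
      std::chrono::duration_cast<std::chrono::milliseconds>(timeout).count());
}
```

Waiting slightly longer than requested is the trade-off here: a timed wait may return late, so rounding up reduces the number of non-blocking platform calls near the deadline.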
2 files changed
Lines changed: 105 additions & 30 deletions (first file); 45 additions & 0 deletions (second file)