Appveyor tests are being flaky #3895

Comments
What makes you think it's related to stdout? The tracebacks all seem to point to this line:
This does appear to be part of a pytest internal class (FDCapture.resume()). |
The previous stack is

    def resume_capturing(self):
        if self.out:
            self.out.resume()
        if self.err:
    >       self.err.resume()

I suppose stderr is more accurate, my apologies, but the effect is the same. This is part of pytest capturing output to the console. |
Apparently PyInstaller had similar issues with pytest-xdist, so I believe we can attribute this to issues with pytest-xdist's scheduling. At PyBay, @pkch and I saw an interesting talk about pytest-concurrent. It may be too early for a switch, but it might be nice to use in the future, e.g. "allowing certain tests to be grouped so that they are executed sequentially". |
Scrolling through Twitter this morning, I also ran into this: https://twitter.com/pumpichank/status/903280328978178048 So maybe it's Appveyor flakiness? Hard to tell. |
Since it has been 3 weeks, I think (I hope) it is safe to close this. |
It happened again in #3973 |
Darn. Okay, I think that rules out this being Appveyor flakiness. It seems our entire test suite needs to be tested (see also #3975). |
@ethanhs see my recent comment there. If my comment is related to the bug, then it seems to be unrelated to Appveyor flakiness, or at least not precisely the same problem. Besides, the problems with Appveyor only began after my PR (#3870), whereas the problems with Travis came up earlier. What do you think? |
@elazarg The timeout issue seems quite plausible as the cause for Travis; that is a good find! However, the Appveyor flakes seem to be something different. Unless timeouts cause issues with streams, I don't see how a timed-out test could cause issues with duplicating the stderr stream (which I believe is the symptom of this issue). |
This is slowing down our progress. @ethanhs have you thought about this more? |
Only a little. Since Travis is having timeout issues, my initial hunch was that it was related to that. However, I believe I have eliminated timeouts as the root of the issue: if I force tests to time out by spawning an inordinately large number of processes, it does not cause the errors we are seeing. I looked at the relevant source of pytest, and it appears that the failure happens when trying to copy the file descriptor from a temp file used to capture output back to stderr. My hunch is that something is causing the file to be cleaned up prematurely, thus invalidating the file descriptor. I don't yet have an idea of why, but that is what I will look at next. |
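A minimal sketch of the suspected failure mode (an illustration of the hypothesis above, not pytest's actual code): duplicating a temp file's descriptor back onto stderr after the file has been cleaned up raises exactly the error we are seeing.

    import os
    import tempfile

    # A temp file standing in for pytest's capture file.
    tmp = tempfile.TemporaryFile()
    fd = tmp.fileno()
    tmp.close()  # premature cleanup invalidates the descriptor

    try:
        # Roughly what resuming capture does: point stderr (fd 2)
        # back at the capture file via dup2.
        os.dup2(fd, 2)
    except OSError as e:
        print(e)  # [Errno 9] Bad file descriptor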
Wow, I keep looking at that log and getting pulled in. At the very end there are two "normal" failures, and the second of these seems to be a clue: |
I just had another failure like this, and it failed on |
I don't think that will help. As I previously mentioned, I simulated a high process/thread ratio (the issue with Travis) by spawning a hundred processes on my 12-thread CPU. Nothing happened except a few tests timed out (empty output). |
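For reference, a rough sketch of the stress test described above (the exact invocation is assumed): oversubscribe the CPU with far more busy children than hardware threads, then check whether anything beyond timeouts appears.

    import subprocess
    import sys

    # Spawn many CPU-bound children to oversubscribe the machine,
    # simulating the high process/thread ratio seen on Travis.
    procs = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(100)
    ]

    # Clean up: kill and reap every child.
    for p in procs:
        p.kill()
        p.wait()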
So can we generate another theory why the 1st test in python2eval is more likely to be hit, but sometimes other tests are hit? Is it always in python2eval? |
If all else fails, maybe we should just skip the python2eval tests in the AppVeyor script; they have never found anything Windows-specific. |
Digging a little bit more. Our tests (the Python eval tests specifically) use subprocesses to capture stdout and stderr. However, the subprocess module uses OS-level interactions to open and close handles to these file descriptors. I was pretty sure that the handle was being closed by subprocess, and based on this pull request and this change set, I can surmise that is indeed quite plausible. In those examples the claim is that using sys.std(out/err) is the issue; we use something else, however, so I'm not certain exactly what the cause is. Is it perhaps that we don't call communicate on the process after killing it? According to https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1081 the fd stays open until another communicate call. |
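A minimal sketch of the pattern in question (hypothetical details, not our actual test code): killing a child without calling communicate() leaves the pipe file descriptors open until something later drains them.

    import subprocess
    import sys

    # Start a child whose output is captured through pipes.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    proc.kill()
    # communicate() reaps the child and closes the pipe descriptors;
    # skipping it leaves the fds open, per the CPython source linked above.
    out, err = proc.communicate()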
There's something magical about the number 10, apparently. I got another failure and the failing test, |
This replaces the old subprocess based method. In addition, the test cases are no longer run, in order to reduce test time. This has led to a 10-20% speedup in pytest. This might help with python#3895, as it removes the subprocesses. Fixes python#1671
Jukka and I debated this briefly offline. We decided that it would be simplest to try to get this off our backs by no longer running the python2eval tests on AppVeyor -- they probably don't bring much of interest to the table. Perhaps the easiest way to accomplish this would be to not install Python 2, since then runtests.py will automatically skip those tests. (But how?) Or we could just add some "-x" flag to the runtests.py call in appveyor.yml. |
I'd much rather have the tests excluded in the AppVeyor config file; I have yet to hit the same issue locally, so I see no reason to exclude the tests locally on Windows. |
OK, can you submit a diff for that? |
I will do that when I am back in front of a computer in an hour. |
This was first brought up in #3846. Moved here to avoid clutter in the PR.
You can see the results here: https://ci.appveyor.com/project/JukkaL/mypy/build/1.0.1042
The relevant exception is
    OSError: [Errno 9] Bad file descriptor
which I believe is from pytest trying to read from stdout; however, for some reason stdout is either locked, unavailable, or Appveyor is incorrectly reporting the handle of stdout.

@elazarg chimed in:

If true, AIUI, this should explain the issue. However, I'm not sure we currently spawn multiple pytest instances concurrently, so this may indicate our scheduling logic is broken, or it is an entirely different issue.