MAINT: add a fuzzing test to try to introduce segfaults #24175

Open
wants to merge 3 commits into base: main

Conversation

@mikedh commented Jul 13, 2023

Adds the fuzzing test used to surface two segfaults in #24023 with a slightly cleaned-up loop and check counts sized to finish in less than a minute. Note that this surfaces another intermittent segfault probably in array.choose that is beyond my ability to debug on a Mac:

...
checking method: `byteswap`
checking method: `choose`
zsh: segmentation fault  python test_segfault.py
mikedh@luna tests % python -c "import numpy; print(numpy.__version__)"
1.25.1
mikedh@luna tests % python --version
Python 3.9.2
mikedh@luna tests % uname -a
Darwin luna.localdomain 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun  8 22:22:22 PDT 2023; root:xnu-8796.121.3~7/RELEASE_X86_64 x86_64

While this catches things, especially in a build matrix, it still takes quite a bit of work to hunt down the guilty arguments (especially since segfaults kill everything). I played with writing args to a tempfile but it was unacceptably slow for a unit test. I totally understand if the project doesn't want to run this in the large test matrix.

This is mostly deterministic, but it does check values produced by numpy.random. I can change this to always be seeded from a constant unless that is somehow already done in the test framework.
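
For context, a minimal, hypothetical sketch of the kind of junk-call loop described above; the argument pool and names are illustrative rather than the actual test file, and it is seeded from a constant so runs stay deterministic:

import warnings
import numpy as np

# a small pool of deliberately nonsensical arguments (illustrative only)
JUNK_ARGS = [None, -1, 0, 1.5, "junk", b"\x00", [], {}, np.empty(0)]

def fuzz_array_methods(seed=42):
    # a fixed seed keeps the run deterministic
    rng = np.random.default_rng(seed)
    arr = rng.integers(0, 100, size=10)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        for name in dir(arr):
            method = getattr(arr, name)
            if not callable(method):
                continue
            print('checking method: `{}`'.format(name))
            for arg in JUNK_ARGS:
                try:
                    method(arg)
                except Exception:
                    # Python-level exceptions are fine; the point is that
                    # nothing should crash the interpreter
                    pass

if __name__ == "__main__":
    fuzz_array_methods()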

@ngoldbaum (Member) left a comment

I would name the new test file test_junk_calls.py to be consistent with the test class name; I also think test_junk_calls.py is a little clearer about what it's trying to do.

The new test is pretty slow, about 97 seconds on my machine. Obviously breaking it up into many tests using parametrization and fixtures will make this into many shorter tests, but still, that's a decent amount of overhead.

There's a lot of itertools.product happening; are some of these combinations redundant or unnecessary? Could you use fewer choices in some of the categories you're enumerating over, especially if any are particularly expensive?

I'm not able to reproduce the segfault you mention in the PR description. If you end up with a reproducer, please file a separate issue.

warnings.filterwarnings("ignore")
# loop through the named methods
for method in methods:
    print('checking method: `{}`'.format(method))

Member:
better to use a pytest parametrized test so you can get this sort of reporting from pytest itself
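
For illustration, a hedged sketch of what that parametrization could look like; the method list and the junk arguments are placeholders rather than the PR's actual code:

import pytest
import numpy as np

# one pytest test case per ndarray method, so failures are reported per method
METHODS = sorted(set(dir(np.empty(1))))

@pytest.mark.parametrize("method_name", METHODS)
def test_junk_call(method_name):
    arr = np.arange(10)
    attr = getattr(arr, method_name)
    if not callable(attr):
        pytest.skip("not a callable attribute")
    for arg in (None, -1, "junk", []):
        try:
            attr(arg)
        except Exception:
            # exceptions are acceptable; segfaults are not
            pass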

# a list of all methods on the numpy array
methods = dir(np.empty(1))

with warnings.catch_warnings():

Member:
If this is supposed to catch floating point warnings from NumPy, probably better to use np.errstate.
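
For reference, np.errstate scopes NumPy's floating-point error handling (divide, over, under, invalid) to a block; a minimal example:

import numpy as np

with np.errstate(all="ignore"):
    np.float64(1.0) / np.float64(0.0)   # divide-by-zero would otherwise warn
    np.exp(np.float64(1000.0))          # overflow would otherwise warn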

Author:
There were some errors I didn't see errstate catching; admittedly I might be using it incorrectly, but I was still seeing these make it through:

test_junk_calls.py:75: RuntimeWarning: overflow encountered in cast
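
A minimal illustration of suppressing that kind of warning with the warnings machinery, as the test currently does; the cast below is only an example of a value that overflows float32 on recent NumPy versions:

import warnings
import numpy as np

with warnings.catch_warnings():
    # a blanket ignore filter silences RuntimeWarning however it is raised
    warnings.simplefilter("ignore", RuntimeWarning)
    np.float64(1e300).astype(np.float32)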

@mikedh (Author) commented Jul 13, 2023

Sounds good! I got a reproducer on `choose` but it's still very intermittent; I'll try to clean it up into a bug report. For this PR:

  • I'll rename the file to test_junk_calls.py
  • refactor the loops into pytest fixtures
  • see if the sample data can be de-duplicated in a way that still catches the issues surfaced in 1.25.0
  • aim to reduce the test runtime to ~10s

@mattip (Member) commented Jul 14, 2023

Perhaps you could use hypothesis rather than a home-grown fuzzer? It provides a structured way to explore property-based testing, and it saves state between runs, which allows capturing problematic input cases.

Adding this to every CI run would be too expensive, but maybe we could add a marker to run it only rarely.
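
As an illustration only (not something in this PR), a hedged sketch of what a hypothesis-driven version might look like; the strategies and names are placeholders:

import numpy as np
from hypothesis import given, settings, strategies as st

METHODS = sorted(name for name in dir(np.empty(1)) if not name.startswith("_"))

# a strategy producing deliberately ill-typed arguments
junk = st.one_of(st.none(), st.integers(), st.floats(), st.text(), st.binary())

@settings(deadline=None)
@given(method_name=st.sampled_from(METHODS), arg=junk)
def test_junk_call_property(method_name, arg):
    arr = np.arange(10)
    attr = getattr(arr, method_name)
    if not callable(attr):
        return
    try:
        attr(arg)
    except Exception:
        # the property under test is simply "no crash"
        pass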

@seberg (Member) commented Jul 14, 2023

There is also OSS-Fuzz from Google, which this might actually fit into well (mainly pointing it out in case you find it interesting). Right now they have something for NumPy, but it only fuzzes boring calls to loadtxt, IIRC: https://github.com/google/oss-fuzz/tree/master/projects/numpy

@mikedh (Author) commented Jul 14, 2023

Yeah, using a fuzzing library could be desirable, although adding a dependency to numpy is above my pay grade 😄. I think there is an argument that the needs here are a bit more specialized than a general-purpose fuzzer covers, and a self-contained script is usually easier to maintain. Either way, I made the following changes based on the suggestions:

  • I reduced the runtime to 1.2s on my laptop by reducing the number of cases checked (i.e. checking an array with every byte order and dtype) and verified that it still catches the errors surfaced in numpy==1.25.0.
  • I refactored the argument generation into a pytest fixture.
  • I changed the argument generation to use itertools.product and added a check to make sure it wasn't including any duplicates (sketched below).
  • I renamed the file to test_junk_calls.py.
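
As a sketch of the itertools.product approach and duplicate check mentioned above (the argument pools and fixture name here are hypothetical, not the PR's actual code):

import itertools
import pytest
import numpy as np

SCALARS = (None, -1, 0, 1.5)
SHAPES = ((), (0,), (3,), (2, 2))
DTYPES = (np.int64, np.float64)

@pytest.fixture(scope="module")
def junk_arguments():
    # every combination of the pools above, generated once per test module
    combos = list(itertools.product(SCALARS, SHAPES, DTYPES))
    # product over pools of distinct values should never repeat a combination
    assert len(combos) == len(set(combos))
    return combos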

@seberg (Member) commented Jul 31, 2023

Do we want to pursue this? I am fine with just putting it in; it found some nice bugs! OTOH, it would be more useful to also fuzz functions and offload it so it doesn't run regularly. But if we don't integrate it into the test suite (which requires keeping it very fast), I am not sure we will actually end up running it often enough to be useful.

@mattip added the "triage review" label (Issue/PR to be discussed at the next triage meeting) on Jul 31, 2023
@mattip (Member) commented Jul 31, 2023

Let's discuss it at a community/triage meeting.

@seberg (Member) commented Aug 11, 2023

@ngoldbaum do you want to make a call either way? It seems somewhat useful, but I agree with Matti that the real deal would be property-based testing (hypothesis) or a dedicated extensive fuzzer (like OSS-Fuzz).

@ngoldbaum (Member) commented

I think it probably makes sense to pull this in, since the test has a much faster runtime than when this PR was initially proposed (although I haven't checked that today). Hypothesis or something would be better, but that requires someone to wire it up, and this already exists and is finding real bugs in poorly tested error paths in numpy.

However, the test failures are real and need to be fixed before this can be merged. There's a segfault on PyPy that needs to be looked at, the full tests are failing because of the warning-level test, and it looks like the build with assertions turned on and some Windows builds had issues as well, although the build log has been deleted on Azure, so I triggered another run.

@seberg removed the "triage review" label on Jan 10, 2024
Projects: Awaiting a code review

4 participants