Multiplication of a matrix with its transpose causes segfault for "large" matrices #19685
Comments
I don't really understand your question. CPU caches are of course orders of magnitude smaller. However, I fail to understand your logic here, since multiplying two different matrices does not segfault, and in that case the input is twice as large while the output size is the same. Or am I missing something?
The last one is puzzling. We do have an optimization to detect
If you look at
So I think it goes to the If you have any suggestions or tips on how to debug that further, I can certainly give it a try.
Thanks. It seems the segfaults are only in the SYRK codepath, which the last variant avoids since it copies the data. I am not familiar with the OpenBLAS internals. Maybe @martin-frbg has a thought? Did you try using fewer threads with
Oh yeah, I did and clearly forgot to mention that. Yes,
Ahh thanks, makes sense. Is there a way we can trap this without segfaulting and warn the user to use fewer threads?
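The exact thread setting discussed above is cut off in this copy of the thread. As a minimal sketch, assuming the standard OpenBLAS environment variable `OPENBLAS_NUM_THREADS` is the knob in question, the thread count can be capped before NumPy is imported. The helper names `limit_openblas_threads` and `gram_matrix` are hypothetical, and the matrix shape is a tiny placeholder rather than one of the sizes from the report.

```python
import os


def limit_openblas_threads(n_threads):
    """Ask OpenBLAS to use at most n_threads threads.

    Assumption: this is called before NumPy is first imported, because
    OpenBLAS reads the variable when the library is loaded.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be a positive integer")
    os.environ["OPENBLAS_NUM_THREADS"] = str(n_threads)
    return os.environ["OPENBLAS_NUM_THREADS"]


def gram_matrix(rows, cols):
    """Compute a @ a.T for a random matrix of placeholder size."""
    # Imported late on purpose, so the thread limit set above applies.
    import numpy as np

    a = np.random.rand(rows, cols)
    return a @ a.T
```

Calling `limit_openblas_threads(1)` first and `gram_matrix(4, 3)` afterwards exercises the same `a @ a.T` pattern as the report. Whether a given thread count avoids the crash depends on the matrix size and the OpenBLAS build.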
Hi, my comment is simply to highlight that this issue is still present. Moreover, the temporary workarounds still work. It is not a critical issue, but I wanted to leave a note in case someone wants to work on it in the near future.
I guess it would be possible to calculate columns times rows times bytes per variable type and check if that fits into the compile-time BUFFERSIZE for the bundled OpenBLAS (which IIRC is built with a smaller-than-default buffer allocation to reduce memory footprint). OTOH it would probably be tedious to do this every time, and one couldn't be sure the user is
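The size check suggested above could look roughly like this. It is only a sketch of the arithmetic (rows times columns times bytes per element, compared against a buffer size). The function name `operand_fits_buffer` is hypothetical, and the buffer size is a parameter because the actual BUFFERSIZE of the bundled OpenBLAS is not stated in this thread.

```python
def operand_fits_buffer(rows, cols, itemsize, buffer_bytes):
    """Return True if rows * cols * itemsize is at most buffer_bytes.

    itemsize is the number of bytes per element (8 for float64).
    buffer_bytes is an assumed stand-in for the compile-time BUFFERSIZE.
    """
    if min(rows, cols, itemsize, buffer_bytes) < 0:
        raise ValueError("all arguments must be non-negative")
    needed = rows * cols * itemsize  # Python ints do not overflow
    return needed <= buffer_bytes
```

For example, `operand_fits_buffer(1000, 1000, 8, 32 * 2**20)` compares 8,000,000 bytes against a made-up 32 MiB limit. Whether such a check actually predicts the segfault is not established here.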
Multiplication of a matrix with its transpose causes a segfault for "large" matrices. Multiplication works for "small" matrices and also works if the transpose is copied (`np.copy()`). Specific examples of large/small matrices are below. I do have a sufficient amount of memory (>350GB) on the machine, and I observe the same behavior on an EC2 instance (both Ubuntu and Amazon Linux) as well as in a Docker container.
Reproducing code example:
This fails:
This also fails:
All these work:
Error message:
NumPy/Python version information: