Native code order of magnitude slower than translated code on Apple M1 #17989

Closed
@neurolabusc

Description

I realize NumPy is built with experimental compilers for native M1 builds and still has some bugs, so it may be premature to discuss optimizations. Perhaps this is a feature request rather than a bug. However, one would expect native ARM code to be at least as fast as translated x86-64 code. I noticed that the nibabel bench_finite_range.py benchmark is much slower when run natively than when translated: translated code (Python 3.8.3, NumPy 1.19.4) is roughly 10 times faster than native code (Python 3.9.1rc1, NumPy 1.19.4).

Reproducing code example:

# -*- coding: utf-8 -*-

import numpy as np
from numpy.testing import measure
# Example where translated code (Python 3.8.3, NumPy 1.19.4) is roughly
# 10 times faster than native code (Python 3.9.1rc1, NumPy 1.19.4).
rng = np.random.RandomState(20111001)
img_shape = (128, 128, 64, 10)
repeat = 100
arr = rng.normal(size=img_shape)
mtime = measure('np.max(arr)', repeat)
print('%30s %6.2f' % ('max all finite', mtime))
mtime = measure('np.min(arr)', repeat)
print('%30s %6.2f' % ('min all finite', mtime))
arr[:, :, :, 1] = np.nan
mtime = measure('np.max(arr)', repeat)
print('%30s %6.2f' % ('max all nan', mtime))
mtime = measure('np.min(arr)', repeat)
print('%30s %6.2f' % ('min all nan', mtime))
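
As a cross-check that does not depend on numpy.testing.measure, the same reductions could be timed with the standard library timeit module. This is only a minimal sketch: the helper name time_reduction, the results dictionary, and the smaller array shape are assumptions made for illustration and are not part of the original benchmark.

```python
import timeit

import numpy as np


def time_reduction(func, arr, repeat=100):
    """Return the total seconds taken to call func(arr) `repeat` times."""
    return timeit.timeit(lambda: func(arr), number=repeat)


rng = np.random.RandomState(20111001)
# Assumption: a smaller array than the report's (128, 128, 64, 10) keeps the run short.
arr = rng.normal(size=(32, 32, 16, 10))

results = {}
for label, func in (("max all finite", np.max), ("min all finite", np.min)):
    results[label] = time_reduction(func, arr)

# Same NaN injection as the report: one slice along the last axis becomes NaN.
arr[:, :, :, 1] = np.nan
for label, func in (("max all nan", np.max), ("min all nan", np.min)):
    results[label] = time_reduction(func, arr)

for label, seconds in results.items():
    print('%30s %8.4f' % (label, seconds))
```

Running this under both interpreters should show whether the gap is specific to the measure helper or to np.max and np.min themselves.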

Performance:

Translated:

$ time ./numpy_native_slower_than_translated.py
                max all finite   0.18
                min all finite   0.18
                   max all nan   0.18
                   min all nan   0.19
./numpy_native_slower_than_translated.py  1.32s user 1.28s system 214% cpu 1.213 total

Native:

$ time ./numpy_native_slower_than_translated.py
                max all finite   1.98
                min all finite   1.99
                   max all nan   1.99
                   min all nan   1.98
./numpy_native_slower_than_translated.py  8.49s user 0.14s system 104% cpu 8.237 total
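
The size of the gap can be worked out directly from the two runs above. The sketch below only restates the reported numbers; the variable names are chosen here for illustration. Per-call timings differ by roughly a factor of 10 to 11, while total wall-clock time differs by a smaller factor because it also includes interpreter startup and array creation.

```python
# Timings copied from the two runs reported above.
translated = {
    "max all finite": 0.18,
    "min all finite": 0.18,
    "max all nan": 0.18,
    "min all nan": 0.19,
}
native = {
    "max all finite": 1.98,
    "min all finite": 1.99,
    "max all nan": 1.99,
    "min all nan": 1.98,
}

# Ratio of native to translated time for each benchmark line.
slowdown = {label: native[label] / translated[label] for label in translated}
for label, ratio in slowdown.items():
    print('%30s %6.1fx slower natively' % (label, ratio))

# Wall-clock totals reported by `time` for the whole script.
wall_ratio = 8.237 / 1.213
print('%30s %6.1fx slower natively' % ('total wall clock', wall_ratio))
```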

NumPy/Python version information:

Translated:

  • 1.19.4 3.8.3 (default, May 19 2020, 13:54:14)
    [Clang 10.0.0 ]

Native:

  • 1.19.4 3.9.1rc1 | packaged by conda-forge | (default, Nov 28 2020, 22:21:58)
    [Clang 11.0.0 ]
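
To confirm which interpreter is native and which is translated, the architecture can be printed alongside the version strings. This is a hedged sketch: the helper name describe_build is hypothetical, and it assumes that platform.machine() reports arm64 for a native Apple Silicon interpreter and x86_64 for one running under Rosetta 2, which is the typical behavior but was not verified in this report.

```python
import platform
import sys

import numpy as np


def describe_build():
    """Collect the NumPy version, Python version, and reported CPU architecture."""
    return {
        "numpy": np.__version__,
        "python": sys.version,
        # Assumption: 'arm64' for a native build, 'x86_64' under Rosetta 2.
        "machine": platform.machine(),
    }


info = describe_build()
for key, value in info.items():
    print('%8s: %s' % (key, value))
```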
