ENH, SIMD: Add new NPYV intrinsics pack(1) #17790

Merged
merged 1 commit into from
Dec 22, 2020

Member

@seiko2plus seiko2plus commented Nov 17, 2020

ENH, SIMD: Add new NPYV intrinsics pack(1)

  • add bitwise logical operations for boolean vectors
  • add round conversion for float vectors
  • add NAN test for float vectors
  • add conditional addition and subtraction
  • add #define NPY_SIMD_FMA3 to check for native fused multiply-add (FMA3) support
  • add testing cases for all of the above

required by #17587
merge after #17789

TODO:

@seiko2plus seiko2plus force-pushed the npyv_new_intrinsic_pk1 branch 2 times, most recently from b024a59 to cea4492 Compare December 14, 2020 01:25
@seiko2plus
Member Author

ping @mattip

@mattip
Member

mattip commented Dec 14, 2020

Is there a way to get comments into the tests so it is clear which test relates to which npyv_* primitive?

@seiko2plus
Member Author

@mattip, we can add the last NPYV calls to the traceback, but the current traceback is already pretty good: it shows the tested SIMD target, data types, and SIMD width.

traceback sample
self = <numpy.core.tests.test_simd.Test_SIMD_FP32_256_FMA3__AVX2_f32 object at 0x7fbfc1eade20>

    def test_conversions(self):
        features = self._cpu_features()
        if not self.npyv.simd_f64 and re.match(r".*(NEON|ASIMD)", features):
            # very costly to emulate nearest even on Armv7
            # instead we round halves to up. e.g. 0.5 -> 1, -0.5 -> -1
            _round = lambda v: int(v + (0.5 if v >= 0 else -0.5))
        else:
            _round = round
    
        vdata_a = self.load(self._data())
        vdata_a = self.sub(vdata_a, self.setall(0.5))
        data_round = [_round(x) for x in vdata_a]
        vround = self.round_s32(vdata_a)
>       assert vround != data_round
E       assert <npyv_s32 of [0, 2, 2, 4, 4, 6, 6, 8]> != [0, 2, 2, 4, 4, 6, ...]

_round     = <built-in function round>
data_round = [0, 2, 2, 4, 4, 6, ...]
features   = 'FMA3 AVX2'
self       = <numpy.core.tests.test_simd.Test_SIMD_FP32_256_FMA3__AVX2_f32 object at 0x7fbfc1eade20>
vdata_a    = <npyv_f32 of [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]>
vround     = <npyv_s32 of [0, 2, 2, 4, 4, 6, 6, 8]>

numpy/core/tests/test_simd.py:219: AssertionError
__________________________________________________________________________________ Test_SIMD_FP32_128_SSE42_f32.test_conversions __________________________________________________________________________________

self = <numpy.core.tests.test_simd.Test_SIMD_FP32_128_SSE42_f32 object at 0x7fbfc1be1e20>

    def test_conversions(self):
        features = self._cpu_features()
        if not self.npyv.simd_f64 and re.match(r".*(NEON|ASIMD)", features):
            # very costly to emulate nearest even on Armv7
            # instead we round halves to up. e.g. 0.5 -> 1, -0.5 -> -1
            _round = lambda v: int(v + (0.5 if v >= 0 else -0.5))
        else:
            _round = round
    
        vdata_a = self.load(self._data())
        vdata_a = self.sub(vdata_a, self.setall(0.5))
        data_round = [_round(x) for x in vdata_a]
        vround = self.round_s32(vdata_a)
>       assert vround != data_round
E       assert <npyv_s32 of [0, 2, 2, 4]> != [0, 2, 2, 4]

_round     = <built-in function round>
data_round = [0, 2, 2, 4]
features   = 'SSE42'
self       = <numpy.core.tests.test_simd.Test_SIMD_FP32_128_SSE42_f32 object at 0x7fbfc1be1e20>
vdata_a    = <npyv_f32 of [0.5, 1.5, 2.5, 3.5]>
vround     = <npyv_s32 of [0, 2, 2, 4]>

numpy/core/tests/test_simd.py:219: AssertionError
________________________________________________________________________________ Test_SIMD_FP32_128_baseline_f32.test_conversions _________________________________________________________________________________

self = <numpy.core.tests.test_simd.Test_SIMD_FP32_128_baseline_f32 object at 0x7fbfc1b56a90>

    def test_conversions(self):
        features = self._cpu_features()
        if not self.npyv.simd_f64 and re.match(r".*(NEON|ASIMD)", features):
            # very costly to emulate nearest even on Armv7
            # instead we round halves to up. e.g. 0.5 -> 1, -0.5 -> -1
            _round = lambda v: int(v + (0.5 if v >= 0 else -0.5))
        else:
            _round = round
    
        vdata_a = self.load(self._data())
        vdata_a = self.sub(vdata_a, self.setall(0.5))
        data_round = [_round(x) for x in vdata_a]
        vround = self.round_s32(vdata_a)
>       assert vround != data_round
E       assert <npyv_s32 of [0, 2, 2, 4]> != [0, 2, 2, 4]

_round     = <built-in function round>
data_round = [0, 2, 2, 4]
features   = 'SSE SSE2 SSE3'
self       = <numpy.core.tests.test_simd.Test_SIMD_FP32_128_baseline_f32 object at 0x7fbfc1b56a90>
vdata_a    = <npyv_f32 of [0.5, 1.5, 2.5, 3.5]>
vround     = <npyv_s32 of [0, 2, 2, 4]>

numpy/core/tests/test_simd.py:219: AssertionError
============================================================================================= short test summary info =============================================================================================
FAILED numpy/core/tests/test_simd.py::Test_SIMD_FP32_256_FMA3__AVX2_f32::test_conversions - assert <npyv_s32 of [0, 2, 2, 4, 4, 6, 6, 8]> != [0, 2, 2, 4, 4, 6, ...]
FAILED numpy/core/tests/test_simd.py::Test_SIMD_FP32_128_SSE42_f32::test_conversions - assert <npyv_s32 of [0, 2, 2, 4]> != [0, 2, 2, 4]
FAILED numpy/core/tests/test_simd.py::Test_SIMD_FP32_128_baseline_f32::test_conversions - assert <npyv_s32 of [0, 2, 2, 4]> != [0, 2, 2, 4]
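
The test in the traceback switches between two rounding rules depending on the target. As a minimal sketch in plain Python (no NPYV involved), the snippet below compares the built-in round, which rounds halves to the nearest even integer, with the half-away-from-zero expression used as the Armv7 fallback. The helper name round_half_away is made up for illustration; the input values are the ones shown for the 128-bit targets above.

```python
def round_half_away(v):
    # Same expression as the test's fallback lambda: halves move away from zero.
    return int(v + (0.5 if v >= 0 else -0.5))


data = [0.5, 1.5, 2.5, 3.5]

# Built-in round() rounds halves to the nearest even integer.
nearest_even = [round(x) for x in data]
half_away = [round_half_away(x) for x in data]

print(nearest_even)  # [0, 2, 2, 4]
print(half_away)     # [1, 2, 3, 4]
```

The first list matches data_round in the traceback; the second is what the fallback would produce for the same lanes.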

@mattip
Member

mattip commented Dec 15, 2020

Sorry for not being clear. I meant adding some kind of comment or other notation in the test file itself to make it clear which intrinsics each test is checking. This makes review easier, since the reviewer can see that each added passing test targets a specific intrinsic or set of intrinsics.

Comment on lines 702 to 742
    def test_mask_conditional(self):
        vdata_a = self.load(self._data())
        vdata_b = self.load(self._data(reverse=True))
        true_mask = self.cmpeq(self.zero(), self.zero())
        false_mask = self.cmpneq(self.zero(), self.zero())

        data_sub = self.sub(vdata_b, vdata_a)
        ifsub = self.ifsub(true_mask, vdata_b, vdata_a, vdata_b)
        assert ifsub == data_sub
        ifsub = self.ifsub(false_mask, vdata_a, vdata_b, vdata_b)
        assert ifsub == vdata_b

        data_add = self.add(vdata_b, vdata_a)
        ifadd = self.ifadd(true_mask, vdata_b, vdata_a, vdata_b)
        assert ifadd == data_add
        ifadd = self.ifadd(false_mask, vdata_a, vdata_b, vdata_b)
        assert ifadd == vdata_b

Member Author

@mattip,

I meant adding some kind of comment or other notation in the test file itself to make it clear what intrinsics each test is checking.

Please take a look here: we are testing the mask conditional operations on several SIMD extensions with different data types. At the same time, we rely on other NPYV intrinsics to generate the test data and arguments, which keeps the amount of code small.

In other words, the code here is equivalent to the following code, except that it tests all vector data types on all supported SIMD targets. Sweet, isn't it?

following code
from numpy.core._simd import targets
npyv = targets["baseline"]
vdata_a = npyv.load_u8(range(npyv.nlanes_u8))
vdata_b = npyv.load_u8(reversed(range(npyv.nlanes_u8)))

true_mask  = npyv.cmpeq_u8(npyv.zero_u8(), npyv.zero_u8())
false_mask = npyv.cmpneq_u8(npyv.zero_u8(), npyv.zero_u8())

data_sub = npyv.sub_u8(vdata_b, vdata_a)
ifsub = npyv.ifsub_u8(true_mask, vdata_b, vdata_a, vdata_b)
assert ifsub == data_sub

ifsub = npyv.ifsub_u8(false_mask, vdata_a, vdata_b, vdata_b)
assert ifsub == vdata_b

data_add = npyv.add_u8(vdata_b, vdata_a)
ifadd = npyv.ifadd_u8(true_mask, vdata_b, vdata_a, vdata_b)
assert ifadd == data_add

ifadd = npyv.ifadd_u8(false_mask, vdata_a, vdata_b, vdata_b)
assert ifadd == vdata_b

So I wonder: how can we add notations or special marks? We could rename the parameter self to npyv, but would it help? I guess not. The only solution I can think of right now is adding a docstring to each test function that clarifies the following:

  • the place where these intrinsics are defined in the source
  • the signatures of each one of them

Member

Let's start with a handwritten docstring that just mentions "testing cmpeq_u8, cmpneq_u8, ..." for every new test in this PR. No need to do more than that now. Then the reviewers can see which intrinsics have been added and that there are tests that cover them.

@seiko2plus seiko2plus force-pushed the npyv_new_intrinsic_pk1 branch from cea4492 to 66e8db7 Compare December 18, 2020 05:01
Comment on lines 726 to 731
        Conditional addition and subtraction for all supported data types.
        Samples:
            npyv_s32 npyv_ifadd_s32(npyv_b32 mask, npyv_s32 a, npyv_s32 b, npyv_s32 c) ->
                mask ? a + b : c
            npyv_f64 npyv_ifsub_f64(npyv_b64 mask, npyv_f64 a, npyv_f64 b, npyv_f64 c) ->
                mask ? a - b : c
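
The semantics quoted in the docstring can be modelled lane by lane in plain Python. This is only an illustrative sketch: the functions below are hypothetical list-based stand-ins, not the NPYV intrinsics, and they ignore the wraparound that fixed-width integer lanes would have.

```python
def ifadd(mask, a, b, c):
    # Lane-wise model of: mask ? a + b : c
    return [x + y if m else z for m, x, y, z in zip(mask, a, b, c)]


def ifsub(mask, a, b, c):
    # Lane-wise model of: mask ? a - b : c
    return [x - y if m else z for m, x, y, z in zip(mask, a, b, c)]


a = [1, 2, 3, 4]
b = [4, 3, 2, 1]
mask = [True, False, True, False]

print(ifadd(mask, a, b, a))  # [5, 2, 5, 4]
print(ifsub(mask, a, b, a))  # [-3, 2, 1, 4]
```

With an all-true mask the result equals a plain add or subtract, and with an all-false mask it equals c, which are the two cases the test asserts.
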
Member Author

@mattip, Is that okay?

Member

I was thinking along the lines of just

Test npyv_ifadd_##SFX and npyv_ifsub_##SFX

That way it is easy to grep the tests and header files for coverage without needing too much added prose in the test.
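
A rough sketch of the grep-style coverage check described here, written in Python so it is self-contained. The two text snippets are hypothetical stand-ins for a header and a test file, not the real NumPy sources, and the macro name IMPL_MASK_ADDSUB is invented for the example.

```python
import re

# Hypothetical stand-in for a header that defines templated intrinsics.
header_text = r"""
#define IMPL_MASK_ADDSUB(SFX, BSFX) \
    npyv_##SFX npyv_ifadd_##SFX(npyv_##BSFX m, npyv_##SFX a, npyv_##SFX b, npyv_##SFX c); \
    npyv_##SFX npyv_ifsub_##SFX(npyv_##BSFX m, npyv_##SFX a, npyv_##SFX b, npyv_##SFX c);
"""

# Hypothetical stand-in for a test file whose docstring names only one macro.
test_text = '''
def test_mask_conditional(self):
    """Test npyv_ifadd_##SFX"""
'''

# Match templated intrinsic names such as npyv_ifadd_##SFX,
# but not the bare type name npyv_##SFX.
pattern = re.compile(r"npyv_[a-z0-9]+_##SFX")

declared = set(pattern.findall(header_text))
mentioned = set(pattern.findall(test_text))
missing = sorted(declared - mentioned)

print(missing)  # ['npyv_ifsub_##SFX']
```

Because both files use the same literal token, a plain text search is enough to spot intrinsics that no test docstring mentions.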

Member Author

@seiko2plus seiko2plus Dec 18, 2020

Something like this:

    def test_mask_conditional(self):
        """
        Test the following intrinsics:
        
            - npyv_ifadd_u8,  npyv_ifadd_s8,  npyv_ifadd_u16, npyv_ifadd_s16, npyv_ifadd_u32,
              npyv_ifadd_s32, npyv_ifadd_u64, npyv_ifadd_s64, npyv_ifadd_f32, npyv_ifadd_f64

            - npyv_ifsub_u8,  npyv_ifsub_s8,  npyv_ifsub_u16, npyv_ifsub_s16, npyv_ifsub_u32,
              npyv_ifsub_s32, npyv_ifsub_u64, npyv_ifsub_s64, npyv_ifsub_f32, npyv_ifsub_f64
        """

Member

No, I meant exactly npyv_ifadd_##SFX since that is the grep-able macro name. The name npyv_ifadd_u8 does not appear in the source code.

Member Author

The name npyv_ifadd_u8 does not appear in the source code.

Because it's an inline function generated by a C macro.
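
To illustrate why the concrete names never appear in the source, here is a small Python sketch that mimics the token pasting. It is not how the C preprocessor is implemented; the expand helper is hypothetical, and the suffix list is the one from the docstring draft earlier in this thread.

```python
# Suffixes taken from the docstring draft in this thread.
SUFFIXES = ["u8", "s8", "u16", "s16", "u32", "s32", "u64", "s64", "f32", "f64"]


def expand(template, suffixes=SUFFIXES):
    # Mimic C token pasting: "##SFX" is replaced by each concrete suffix.
    return [template.replace("##SFX", sfx) for sfx in suffixes]


names = expand("npyv_ifadd_##SFX")

print(names[0], names[-1])  # npyv_ifadd_u8 npyv_ifadd_f64
print(len(names))           # 10
```

Only the template string exists in the header text, so searching for npyv_ifadd_u8 finds nothing while searching for npyv_ifadd_##SFX does.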

Member

I think a docstring that mentions the actual C macro npyv_ifadd_##SFX would be more helpful in linking the test to the code it is testing.

Member Author

I literally followed your suggestion, but I don't think it would make for a useful coverage test. I think it may be possible to generate _simd.dispatch.c.src directly from the NPYV headers, including the docstrings. Would that be a good idea?

Member

Maybe in a future PR. I think I would still like the resulting generated test code to be checked in, so we can read it. Otherwise it will be very painful to track down errors in tests.

@seiko2plus seiko2plus force-pushed the npyv_new_intrinsic_pk1 branch from 66e8db7 to d7a183e Compare December 18, 2020 05:14
@seiko2plus seiko2plus force-pushed the npyv_new_intrinsic_pk1 branch from d7a183e to 150d459 Compare December 22, 2020 20:35
@mattip mattip merged commit 3b39031 into numpy:master Dec 22, 2020
@mattip
Member

mattip commented Dec 22, 2020

Thanks @seiko2plus

@rgommers rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022