Yes, we have low-hanging fruit in vectorization.
Some time ago I discovered that the compiler only needs some help to auto-vectorize replace_copy and replace_copy_if. The help was very minor compared to the couple of other places where we also help the compiler auto-vectorize.
So PR #4431 was opened, and it was merged with only mild reluctance.
Shortly thereafter, the optimization ceased to work. I filed DevCom-10895463, but unfortunately it hasn't been fixed so far. The final VS 2022 release ended up with replace_copy and replace_copy_if non-vectorized.
We cannot manually vectorize replace_copy_if, as it takes a user-supplied predicate, and there isn't a reasonable standard predicate to query against. (Well, actually there might be a way, but even if what I'm thinking of would work, you would not like it.) Anyway, we can still vectorize replace_copy, and it is a pretty easy thing to do.
Note that unlike assisted auto-vectorization, the condition for manual vectorization has to be strict: only contiguous iterators! And be sure to get the pointers properly; see #5683 and the linked issue and paper. Also, a new approach to the control macro should be used, but this part is hard to miss.
Unlike replace, it doesn't need to rely on AVX2 masked stores. It is classic byte blending, so it can work on any element size.
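
To illustrate the blending idea, here is a rough sketch for 4-byte elements. It is not the STL's implementation: the name replace_copy_u32_sketch and the SKETCH_HAS_SSE2 macro are made up for this example, it uses plain SSE2 and/andnot/or rather than whatever instruction set the real code would dispatch to, and it takes raw pointers, assuming the contiguous-iterator check and pointer extraction have already happened. Since every destination element gets written anyway, a full-width store is enough and no masked store is needed.

```cpp
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical helper for illustration only; assumes [first, last) and the
// destination range do not overlap, as replace_copy requires.
inline void replace_copy_u32_sketch(const std::uint32_t* first, const std::uint32_t* last,
                                    std::uint32_t* dest, std::uint32_t old_val,
                                    std::uint32_t new_val) {
#if SKETCH_HAS_SSE2
    const __m128i old_vec = _mm_set1_epi32(static_cast<int>(old_val));
    const __m128i new_vec = _mm_set1_epi32(static_cast<int>(new_val));
    for (; last - first >= 4; first += 4, dest += 4) {
        const __m128i data = _mm_loadu_si128(reinterpret_cast<const __m128i*>(first));
        // All-ones in each lane that equals old_val, all-zeros elsewhere.
        const __m128i mask = _mm_cmpeq_epi32(data, old_vec);
        // Blend: take new_val where the mask is set, the source value otherwise.
        const __m128i blended =
            _mm_or_si128(_mm_and_si128(mask, new_vec), _mm_andnot_si128(mask, data));
        // Unconditional full-width store; no masked store required.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dest), blended);
    }
#endif
    // Scalar tail (or the whole range when SSE2 is unavailable).
    for (; first != last; ++first, ++dest) {
        *dest = *first == old_val ? new_val : *first;
    }
}
```

The same compare-then-blend shape should carry over to 1-, 2-, and 8-byte elements by swapping the comparison for the matching lane width.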