Performance on MSVC and Apple Clang is about 3-5x worse than on gcc or clang on Linux. We should try to bring them to parity. The gap is likely related to differences in how each compiler autovectorizes our loops, and these resources should help us debug it:
- https://llvm.org/docs/Vectorizers.html
- https://docs.microsoft.com/en-us/cpp/parallel/auto-parallelization-and-auto-vectorization?view=msvc-170#auto-vectorizer
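
As a possible starting point, each compiler can report which loops it did or did not vectorize. The sketch below is a minimal probe, not code from this project: `saxpy` and the file name `probe.cpp` are hypothetical stand-ins for one of our hot loops, and the build lines in the comments use the diagnostic flags described in the compilers' documentation. Comparing the reports across compilers for the same loop should show whether vectorization is actually where the gap comes from.

```cpp
// Build with vectorization diagnostics enabled:
//   clang / Apple Clang:
//     clang++ -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
//             -Rpass-analysis=loop-vectorize -c probe.cpp
//   gcc:
//     g++ -O3 -fopt-info-vec-optimized -fopt-info-vec-missed -c probe.cpp
//   MSVC:
//     cl /O2 /EHsc /Qvec-report:2 /c probe.cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a hot loop: y[i] = a * x[i] + y[i].
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());

    // __restrict (a non-standard extension accepted by MSVC, gcc, and clang)
    // tells the compiler the two buffers do not alias. This is an assumption
    // the caller must uphold; possible aliasing is a common reason a loop is
    // reported as not vectorized or is guarded by a runtime check.
    const float* __restrict xp = x.data();
    float* __restrict yp = y.data();
    const std::size_t n = x.size();

    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

If a loop is reported as vectorized on Linux but missed on MSVC or Apple Clang, the reason given in the report is probably the most useful thing to attach to this issue.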