Perhaps I've missed it, but for a new user of this library, the documentation doesn't make clear what the ideal basic usage pattern is with respect to the compiler's target architecture and whatever xsimd is doing under the hood.
Should you target compilation at the highest CPU architecture possible and let xsimd figure it out from there? Or the lowest? The former would seem to prevent the binary from running on lesser CPUs (the compiler might emit something like AVX-512 instructions while a user's CPU only supports AVX). The latter seems safer (runs everywhere), but does it then constrain xsimd to the maximum instruction set allowed by the compiler flags?
For example, in the readme, the section "Auto detection of the instruction set extension to be used" does not currently say anything about the compilation settings, whereas the preceding section, "Explicit use of an instruction set extension", does specify setting the compiler to target AVX. The same clarity for the automatic case would be extremely useful.
Dispatch would seem to handle this at runtime... but even then it's unclear which architecture the binary as a whole should target at compile time.
In summary, just a little bit of clarification in the docs would go a long way, I think.