[mypyc] librt base64: use existing SIMD CPU dispatch by customizing build flags #20253
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| import platform | ||
| import sys | ||
|
|
||
| try: | ||
| # Import setuptools so that it monkey-patch overrides distutils | ||
| import setuptools # noqa: F401 | ||
| except ImportError: | ||
| pass | ||
|
|
||
| if sys.version_info >= (3, 12): | ||
| # From setuptools' monkeypatch | ||
| from distutils import ccompiler # type: ignore[import-not-found] | ||
| else: | ||
| from distutils import ccompiler | ||
|
|
||
| EXTRA_FLAGS_PER_COMPILER_TYPE_PER_PATH_COMPONENT = { | ||
| "unix": { | ||
| "base64/arch/ssse3": ["-mssse3", "-DBASE64_WITH_SSSE3"], | ||
| "base64/arch/sse41": ["-msse4.1", "-DBASE64_WITH_SSE41"], | ||
| "base64/arch/sse42": ["-msse4.2", "-DBASE64_WITH_SSE42"], | ||
| "base64/arch/avx2": ["-mavx2", "-DBASE64_WITH_AVX2"], | ||
| "base64/arch/avx": ["-mavx", "-DBASE64_WITH_AVX"], | ||
| }, | ||
| "msvc": { | ||
| "base64/arch/sse42": ["/arch:SSE4.2", "/DBASE64_WITH_SSE42"], | ||
| "base64/arch/avx2": ["/arch:AVX2", "/DBASE64_WITH_AVX2"], | ||
| "base64/arch/avx": ["/arch:AVX", "/DBASE64_WITH_AVX"], | ||
| }, | ||
| } | ||
|
|
||
| ccompiler.CCompiler.__spawn = ccompiler.CCompiler.spawn # type: ignore[attr-defined] | ||
| X86_64 = platform.machine() in ("x86_64", "AMD64", "amd64") | ||
|
|
||
|
|
||
| def spawn(self, cmd, **kwargs) -> None: # type: ignore[no-untyped-def] | ||
|
Contributor
Is there any reason not to annotate this?
Contributor
Author
Yes, I tried annotating this before, but the signature varies too much between setuptools/distutils versions and Python versions. |
||
| compiler_type: str = self.compiler_type | ||
| extra_options = EXTRA_FLAGS_PER_COMPILER_TYPE_PER_PATH_COMPONENT[compiler_type] | ||
| new_cmd = list(cmd) | ||
| if X86_64 and extra_options is not None: | ||
| # filenames are closer to the end of command line | ||
| for argument in reversed(new_cmd): | ||
| # Check if argument contains a filename. We must check for all | ||
| # possible extensions; checking for target extension is faster. | ||
| if self.obj_extension and not str(argument).endswith(self.obj_extension): | ||
| continue | ||
|
|
||
| for path in extra_options.keys(): | ||
| if path in str(argument): | ||
| if compiler_type == "bcpp": | ||
| compiler = new_cmd.pop() | ||
| # Borland accepts a source file name at the end, | ||
| # insert the options before it | ||
| new_cmd.extend(extra_options[path]) | ||
| new_cmd.append(compiler) | ||
| else: | ||
| new_cmd.extend(extra_options[path]) | ||
|
|
||
| # path component is found, no need to search any further | ||
| break | ||
| self.__spawn(new_cmd, **kwargs) | ||
|
|
||
|
|
||
| ccompiler.CCompiler.spawn = spawn # type: ignore[method-assign] | ||
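The file above wraps `CCompiler.spawn` so that every command passes through a hook before it runs. A minimal sketch of that wrapping pattern follows, using a hypothetical `ToyCompiler` class in place of distutils' `CCompiler` so it can run without a C toolchain. The class, the `spawn_with_flags` name, and the single hard-coded path check are assumptions for illustration only, not the PR's code.

```python
from __future__ import annotations


class ToyCompiler:
    """Hypothetical stand-in for CCompiler: records commands instead of running them."""

    compiler_type = "unix"

    def __init__(self) -> None:
        self.executed: list[list[str]] = []

    def spawn(self, cmd, **kwargs) -> None:
        self.executed.append(list(cmd))


# Keep a reference to the original method before replacing it.
_original_spawn = ToyCompiler.spawn


def spawn_with_flags(self, cmd, **kwargs) -> None:
    # Copy the command so the caller's list is never mutated.
    new_cmd = list(cmd)
    if any("base64/arch/avx2" in str(arg) for arg in new_cmd):
        new_cmd.append("-mavx2")
    _original_spawn(self, new_cmd, **kwargs)


# Install the wrapper on the class, so every instance picks it up.
ToyCompiler.spawn = spawn_with_flags

compiler = ToyCompiler()
compiler.spawn(["cc", "-c", "base64/arch/avx2/codec.c"])
compiler.spawn(["cc", "-c", "pythonsupport.c"])
```

The sketch stores the original method in a module-level name rather than as an extra attribute on the class; either approach keeps the unpatched behaviour reachable from the wrapper.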
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,9 +25,56 @@ | |
| "pythonsupport.c", | ||
| ] | ||
|
|
||
| EXTRA_FLAGS_PER_COMPILER_TYPE_PER_PATH_COMPONENT = { | ||
| "unix": { | ||
| "base64/arch/ssse3": ["-mssse3", "-DBASE64_WITH_SSSE3"], | ||
| "base64/arch/sse41": ["-msse4.1", "-DBASE64_WITH_SSE41"], | ||
| "base64/arch/sse42": ["-msse4.2", "-DBASE64_WITH_SSE42"], | ||
| "base64/arch/avx2": ["-mavx2", "-DBASE64_WITH_AVX2"], | ||
| "base64/arch/avx": ["-mavx", "-DBASE64_WITH_AVX"], | ||
| }, | ||
| "msvc": { | ||
| "base64/arch/sse42": ["/arch:SSE4.2", "/DBASE64_WITH_SSE42"], | ||
| "base64/arch/avx2": ["/arch:AVX2", "/DBASE64_WITH_AVX2"], | ||
| "base64/arch/avx": ["/arch:AVX", "/DBASE64_WITH_AVX"], | ||
| }, | ||
| } | ||
|
|
||
| ccompiler.CCompiler.__spawn = ccompiler.CCompiler.spawn # type: ignore[attr-defined] | ||
| X86_64 = platform.machine() in ("x86_64", "AMD64", "amd64") | ||
|
|
||
|
|
||
| def spawn(self, cmd, **kwargs) -> None: # type: ignore[no-untyped-def] | ||
|
Contributor
This looks similar if not the same, any particular reason why not have it in a shared location, or is this because of the fact these are
Contributor
Author
I think the latter; the existing code also has duplication issues, as already noted.
Collaborator
Now that there is more duplication, we can try to share more of the code, but that can happen outside this PR. |
||
| compiler_type: str = self.compiler_type | ||
| extra_options = EXTRA_FLAGS_PER_COMPILER_TYPE_PER_PATH_COMPONENT[compiler_type] | ||
| new_cmd = list(cmd) | ||
| if X86_64 and extra_options is not None: | ||
| # filenames are closer to the end of command line | ||
| for argument in reversed(new_cmd): | ||
| # Check if argument contains a filename. We must check for all | ||
| # possible extensions; checking for target extension is faster. | ||
| if self.obj_extension and not str(argument).endswith(self.obj_extension): | ||
| continue | ||
|
|
||
| for path in extra_options.keys(): | ||
| if path in str(argument): | ||
| if compiler_type == "bcpp": | ||
| compiler = new_cmd.pop() | ||
| # Borland accepts a source file name at the end, | ||
| # insert the options before it | ||
| new_cmd.extend(extra_options[path]) | ||
| new_cmd.append(compiler) | ||
| else: | ||
| new_cmd.extend(extra_options[path]) | ||
|
|
||
| # path component is found, no need to search any further | ||
| break | ||
| self.__spawn(new_cmd, **kwargs) | ||
|
|
||
|
|
||
| ccompiler.CCompiler.spawn = spawn # type: ignore[method-assign] | ||
|
|
||
|
|
||
| class BuildExtGtest(build_ext): | ||
| def get_library_names(self) -> list[str]: | ||
| return ["gtest"] | ||
|
|
@@ -80,14 +127,10 @@ def run(self) -> None: | |
| compiler = ccompiler.new_compiler() | ||
| sysconfig.customize_compiler(compiler) | ||
| cflags: list[str] = [] | ||
| if compiler.compiler_type == "unix": | ||
| if compiler.compiler_type == "unix": # type: ignore[attr-defined] | ||
| cflags += ["-O3"] | ||
| if X86_64: | ||
| cflags.append("-msse4.2") # Enable SIMD (see also mypyc/build.py) | ||
| elif compiler.compiler_type == "msvc": | ||
| elif compiler.compiler_type == "msvc": # type: ignore[attr-defined] | ||
| cflags += ["/O2"] | ||
| if X86_64: | ||
| cflags.append("/arch:SSE4.2") # Enable SIMD (see also mypyc/build.py) | ||
|
|
||
| setup( | ||
| ext_modules=[ | ||
|
|
||
I think these `BASE64_WITH...` #defines need to be enabled in all files. Otherwise the codec choosing code doesn't get triggered (which happens in `codec_choose.c`). With these changes we compile the SIMD versions, but I don't think they will be used at runtime. I ran a microbenchmark and performance was slower on an AMD system with this PR.

Here's the benchmark I used (added it to `run-base64.test` temporarily):
Ah, you need to force the optimization level to 3 to get meaningful benchmark results (e.g. patch `mypyc.build.mypycify` and force `opt_level` to be 3).
@JukkaL Thanks for looking into this. My laptop died this morning, so feel free to push additional fixes to my branch
FYI: In the lib-rt `setup.py` the 3rd optimization level is enabled:
mypy/mypyc/lib-rt/setup.py, lines 130 to 131 in 379cd1e

I'm surprised you had an issue with `mypycify`, as `3` is the default level:
mypy/mypyc/build.py, line 545 in 379cd1e
Okay, I agree that for X86-64, all the `HAVE_*` definitions should always be enabled.

I guess the easiest way is to edit `mypyc/lib-rt/base64/config.h` to set those flags inside a `#if defined(__x86_64__) && defined(__LP64__)` check and trim the above flags to just setting `-mavx2` and similar.
I'm back; I got a new laptop charger :-D
Thank you for the micro benchmark, it helps a lot!
Ah, it got overridden by this line, so setting `MYPYC_OPT_LEVEL=3 pytest -n0 -vvv -s mypyc -k testXXX_librt_experimental` is easier than patching:
mypy/mypyc/test/test_run.py, line 283 in 1f09855

I've added a commit to set the `HAVE_{SSSE3,SSE41,SSE42,AVX,AVX2}` flags automatically for amd64/x86-64 systems, removing the need for the `BASE64_WITH_*` definitions at compile time.

The baseline speed on my system using your benchmark was 9,089 MB/s before my changes; now it is 14,461 MB/s. It also showed that all the `-mavx2 -mavx` flags were also being added to the final linking stage, which is obviously not appropriate.

So the next commit limits the matches to arguments that end in `.c`. The new speed is 14,242 MB/s, a 57% improvement over the baseline (before this PR).
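The restriction to arguments ending in `.c` can be sketched as a pure function that is easy to check in isolation. This is an illustration under assumptions, not the commit itself: the `flags_for_command` name, the trimmed unix-only table, and the example file paths are all hypothetical.

```python
from __future__ import annotations

# Hypothetical, trimmed flag table (unix only), without the BASE64_WITH_*
# defines that the thread says are no longer needed.
EXTRA_FLAGS = {
    "unix": {
        "base64/arch/ssse3": ["-mssse3"],
        "base64/arch/sse41": ["-msse4.1"],
        "base64/arch/sse42": ["-msse4.2"],
        # "avx2" is listed before "avx" because "base64/arch/avx" is a
        # substring of "base64/arch/avx2" and the first match wins.
        "base64/arch/avx2": ["-mavx2"],
        "base64/arch/avx": ["-mavx"],
    },
}


def flags_for_command(compiler_type: str, cmd: list[str]) -> list[str]:
    """Return extra flags for a compile command, or [] for any other command."""
    per_path = EXTRA_FLAGS.get(compiler_type)
    if per_path is None:
        return []
    # File names tend to sit near the end of the command line.
    for argument in reversed(cmd):
        # Only C source files count; object files on a link line are skipped.
        if not argument.endswith(".c"):
            continue
        for path, flags in per_path.items():
            if path in argument:
                return list(flags)
    return []


compile_cmd = ["cc", "-O3", "-c", "base64/arch/avx2/codec.c",
               "-o", "build/base64/arch/avx2/codec.o"]
link_cmd = ["cc", "-shared", "build/base64/arch/avx2/codec.o", "-o", "librt.so"]

compile_flags = flags_for_command("unix", compile_cmd)
link_flags = flags_for_command("unix", link_cmd)
```

Matching on the source suffix leaves the link command untouched, because it only carries object files. Looking the compiler type up with `.get()` is a choice made for this sketch so that compiler types missing from the table simply receive no extra flags.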