Fix `USE_STATIC_MKL` lost functionality #138996

xuhancn · 2024-10-26T19:16:45Z

Currently, `USE_STATIC_MKL` has lost its functionality: it no longer controls whether PyTorch links MKL statically or dynamically. The reason is that `cmake/Modules/FindMKL.cmake` ignores the `USE_STATIC_MKL` CMake variable and instead searches for the MKL libraries through many workarounds.

This PR aims to fix this issue. It is important for the PyTorch XPU build, where we expect to:

- Link MKL statically in CPU and CUDA builds.
- Link MKL dynamically in XPU builds. The oneAPI environment is already present there, so we can reuse its shared MKL binaries.
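
The intended policy boils down to a small lookup from build backend to link type. This is a minimal sketch, assuming a hypothetical helper `preferred_mkl_link_type` that does not exist in PyTorch; in the real build the choice is expressed through the `USE_STATIC_MKL` CMake variable.

```python
def preferred_mkl_link_type(backend: str) -> str:
    """Map a build backend to the MKL link type proposed in this PR (hypothetical helper)."""
    policy = {
        "cpu": "static",   # CPU builds: link MKL statically.
        "cuda": "static",  # CUDA builds: link MKL statically.
        "xpu": "shared",   # XPU builds: reuse the shared MKL from the oneAPI environment.
    }
    try:
        return policy[backend.lower()]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None


for backend in ("cpu", "cuda", "xpu"):
    print(f"{backend}: {preferred_mkl_link_type(backend)}")
```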

For the MKL link configuration, we can refer to Intel's official online tool, the oneMKL Link Line Advisor: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.htm

| OS | Link type | Linked MKL binaries |
|---|---|---|
| Windows | static | mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib |
| Windows | shared | mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib |
| Linux | static | libmkl_intel_lp64.a libmkl_core.a libpthread.a libm.so libdl.a |
| Linux | shared | libmkl_intel_lp64.so libmkl_gnu_thread.so libmkl_core.so libpthread.a libm.so libdl.a |
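
For scripting or sanity checks, the table can be carried as plain data. This is a minimal sketch that transcribes the rows above verbatim; the helper names are hypothetical, and the oneMKL Link Line Advisor remains the authoritative source for an actual link line.

```python
# Rows transcribed from the table above; verify against the Link Line Advisor before relying on them.
MKL_LINK_LIBS = {
    ("windows", "static"): ["mkl_intel_lp64.lib", "mkl_intel_thread.lib",
                            "mkl_core.lib", "libiomp5md.lib"],
    ("windows", "shared"): ["mkl_intel_lp64_dll.lib", "mkl_intel_thread_dll.lib",
                            "mkl_core_dll.lib", "libiomp5md.lib"],
    ("linux", "static"): ["libmkl_intel_lp64.a", "libmkl_core.a",
                          "libpthread.a", "libm.so", "libdl.a"],
    ("linux", "shared"): ["libmkl_intel_lp64.so", "libmkl_gnu_thread.so", "libmkl_core.so",
                          "libpthread.a", "libm.so", "libdl.a"],
}


def mkl_link_libs(os_name: str, link_type: str) -> list[str]:
    """Return every binary listed for an OS / link type pair (hypothetical helper)."""
    key = (os_name.lower(), link_type.lower())
    if key not in MKL_LINK_LIBS:
        raise ValueError(f"no entry for {key}")
    # Return a copy so callers cannot mutate the shared table.
    return list(MKL_LINK_LIBS[key])


def mkl_only_libs(os_name: str, link_type: str) -> list[str]:
    """Keep only the MKL layers, dropping OpenMP and system libraries."""
    return [lib for lib in mkl_link_libs(os_name, link_type) if "mkl" in lib]


print(mkl_only_libs("Linux", "shared"))
```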

After fixing the `USE_STATIC_MKL` option, the matching MKL package must be installed; otherwise, the build will not find the MKL binaries.

Install the static MKL package on Windows/Linux:

```shell
pip install mkl-include mkl-static
```

Install the shared MKL package on Windows/Linux:

```shell
pip install mkl mkl-devel mkl-include
```
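
The two recipes differ only in the package set. A minimal sketch, assuming hypothetical helper names, that picks the package set from the desired link type and builds the pip command for the current interpreter (the command is only constructed here, not executed):

```python
import sys


def mkl_pip_packages(static: bool) -> list[str]:
    """Package sets copied from the two install recipes above."""
    if static:
        return ["mkl-include", "mkl-static"]
    return ["mkl", "mkl-devel", "mkl-include"]


def pip_install_command(static: bool) -> list[str]:
    """Invoke pip through the running interpreter so packages land in the active environment."""
    return [sys.executable, "-m", "pip", "install", *mkl_pip_packages(static)]


print(" ".join(pip_install_command(static=True)))
```

Pass the returned list to `subprocess.run(...)` to actually perform the install.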

Changes:

- Fix the lost `USE_STATIC_MKL` functionality on Linux.
- Fix the lost `USE_STATIC_MKL` functionality on Windows.
- Set the `USE_STATIC_MKL` default value to ON, since we recommend linking MKL statically.
- Add related documentation to the README.
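
Whichever default is chosen, source builds usually set the option from the environment (the commits in this PR, for example, export `USE_STATIC_MKL=1` for the Linux wheel build), and PyTorch's build scripts forward `USE_*` variables to CMake. A minimal sketch of that translation, with hypothetical helper names and an assumed set of truthy spellings; it is not PyTorch's actual code:

```python
import os
from typing import Mapping

# Truthy spellings assumed for this sketch.
TRUTHY = {"ON", "1", "YES", "TRUE", "Y"}


def env_flag(name: str, env: Mapping[str, str], default: bool) -> bool:
    """Interpret an environment variable such as USE_STATIC_MKL as a boolean."""
    value = env.get(name)
    if value is None:
        # Unset: fall back to the option's default.
        return default
    return value.strip().upper() in TRUTHY


def cmake_bool_define(name: str, enabled: bool) -> str:
    """Render a boolean as a CMake cache definition, e.g. -DUSE_STATIC_MKL=ON."""
    return f"-D{name}={'ON' if enabled else 'OFF'}"


use_static = env_flag("USE_STATIC_MKL", os.environ, default=False)
print(cmake_bool_define("USE_STATIC_MKL", use_static))
```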

TODO:

- Set up `USE_STATIC_MKL` correctly in the CI system.
- Merge "print USE_STATIC_MKL for further debug." #138902 to help debug CI.
- Set up `USE_STATIC_MKL` correctly in CI; it needs to match the installed MKL package.
- Merge this PR after all CI jobs pass.
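
Matching the CI setting to the installed package could be checked before configuring. A minimal sketch, assuming a hypothetical helper that inspects an MKL library directory for the core library names from the table above (`libmkl_core.a` / `mkl_core.lib` for static, `libmkl_core.so*` / `mkl_core_dll.lib` for shared); where pip places that directory varies by platform and is not shown here:

```python
import tempfile
from pathlib import Path


def detect_mkl_link_types(lib_dir: str) -> set[str]:
    """Report which MKL flavors ("static", "shared") appear to be present in lib_dir."""
    d = Path(lib_dir)
    found: set[str] = set()
    # Static archives: libmkl_core.a on Linux, mkl_core.lib on Windows.
    if (d / "libmkl_core.a").is_file() or (d / "mkl_core.lib").is_file():
        found.add("static")
    # Shared: libmkl_core.so, possibly with a version suffix, or the mkl_core_dll.lib import library.
    if any(d.glob("libmkl_core.so*")) or (d / "mkl_core_dll.lib").is_file():
        found.add("shared")
    return found


def matches_use_static_mkl(lib_dir: str, use_static_mkl: bool) -> bool:
    """True if the installed MKL flavor can satisfy the requested USE_STATIC_MKL setting."""
    wanted = "static" if use_static_mkl else "shared"
    return wanted in detect_mkl_link_types(lib_dir)


# Demo: a throwaway directory that only contains a static core library.
with tempfile.TemporaryDirectory() as lib_dir:
    (Path(lib_dir) / "libmkl_core.a").touch()
    print(detect_mkl_link_types(lib_dir))
    print(matches_use_static_mkl(lib_dir, use_static_mkl=False))
```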

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot · 2024-10-26T19:16:49Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138996

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 7a9d8b3 with merge base f16053f:

NEW FAILURE - The following job has failed:

pull / linux-jammy-py3_9-clang9-xla / build (gh)
/var/lib/jenkins/workspace/third_party/googletest/googlemock/src/gmock-internal-utils.cc:186:36: error: too few arguments to function call, expected 2, have 1

FLAKY - The following job failed but was likely due to flakiness present on trunk:

xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 5, 6, linux.idc.xpu) (gh) (similar failure)
'Test'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang · 2024-10-28T16:41:30Z

CMakeLists.txt

```diff
@@ -330,7 +330,7 @@ cmake_dependent_option(
 set(MKLDNN_ENABLE_CONCURRENT_EXEC ${USE_MKLDNN})
 cmake_dependent_option(USE_MKLDNN_CBLAS "Use CBLAS in MKLDNN" OFF "USE_MKLDNN"
                        OFF)
-option(USE_STATIC_MKL "Prefer to link with MKL statically (Unix only)" OFF)
+option(USE_STATIC_MKL "Prefer to link with MKL statically (recommanded)." ON)
```


recommended

@ezyang I don't understand this. Also, I'm still fixing CI.

There is a typo. But also, we don't want static MKL; this is a deliberate decision.

malfet

Before changing the default, let's discuss the benefits and drawbacks of doing it one way vs. another. If I'm building locally, dynamic linking is always preferred, isn't it?
The only time you want to link with MKL statically is when you ship binaries, as that makes your life much easier. The downside is bulkier releases, and considering the PyPI size limit, we want to rely on https://pypi.org/project/mkl/, which requires dynamic linking.

xuhancn · 2024-10-28T18:14:12Z

Thanks for your reply. Then let's keep using shared MKL as the default configuration.

Fixes #138994. We can turn off `USE_MIMALLOC_ON_MKL` temporarily, because it caused #138994. For a complete fix, we need to fix the `USE_STATIC_MKL` lost functionality issue (#138996) and then get the correct MKL linking type (shared/static). That still needs some time to pass all CI and builder scripts. Pull Request resolved: #139204 Approved by: https://github.com/ezyang

xuhancn · 2024-11-14T13:57:27Z

@pytorchbot rebase

pytorchmergebot · 2024-11-14T13:59:09Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-11-14T13:59:13Z

Successfully rebased xu_fix_USE_STATIC_MKL_lost_functionality onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_fix_USE_STATIC_MKL_lost_functionality && git pull --rebase)

xuhancn · 2025-06-13T09:37:43Z

@pytorchbot rebase

pytorchmergebot · 2025-06-13T09:39:10Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-06-13T09:39:13Z

Successfully rebased xu_fix_USE_STATIC_MKL_lost_functionality onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_fix_USE_STATIC_MKL_lost_functionality && git pull --rebase)

xuhancn · 2025-06-19T13:04:34Z

Status update on 2025/6/19:

After rebasing, the torch_xla UT still fails, and the failure is a build issue:

```
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/src/gmock-all.cc:39:
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/include/gmock/gmock.h:58:
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/include/gmock/gmock-function-mocker.h:44:
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/include/gmock/gmock-spec-builders.h:78:
/var/lib/jenkins/workspace/third_party/googletest/googlemock/include/gmock/gmock-matchers.h:4795:5: warning: 'GTEST_OS_XTENSA' is not defined, evaluates to 0 [-Wundef]
/opt/conda/include/gtest/internal/gtest-port.h:471:62: note: expanded from macro 'GTEST_HAS_STD_WSTRING'
     GTEST_OS_HAIKU || GTEST_OS_ESP32 || GTEST_OS_ESP8266 || GTEST_OS_XTENSA))
                                                             ^
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/src/gmock-all.cc:43:
/var/lib/jenkins/workspace/third_party/googletest/googlemock/src/gmock-internal-utils.cc:186:36: error: too few arguments to function call, expected 2, have 1
                     actual_to_skip);
                                   ^
/opt/conda/include/gtest/internal/gtest-internal.h:834:24: note: 'GetCurrentOsStackTraceExceptTop' declared here
GTEST_API_ std::string GetCurrentOsStackTraceExceptTop(
                       ^
7 warnings and 1 error generated.
```
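
In this log, the bundled googlemock sources under `third_party/googletest` are compiled against gtest headers from `/opt/conda/include`, whose `GetCurrentOsStackTraceExceptTop` declaration expects two arguments. That points to (but does not prove) an installed gtest shadowing the bundled copy on the include path. A minimal sketch for checking which copy of a header an ordered `-I` list resolves to; it is a simplified model that ignores quoted-include lookup, `-isystem`, and built-in system directories, and the helper names are hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def resolve_include(header: str, include_dirs: Sequence[str]) -> Optional[Path]:
    """Simplified `#include <header>` lookup: the first directory containing the file wins."""
    for directory in include_dirs:
        candidate = Path(directory) / header
        if candidate.is_file():
            return candidate
    return None


def all_copies(header: str, include_dirs: Sequence[str]) -> list[Path]:
    """Every copy of `header` on the search path; more than one means the later ones are shadowed."""
    return [Path(d) / header for d in include_dirs if (Path(d) / header).is_file()]


# Demo: two throwaway directories standing in for an installed and a bundled gtest.
with tempfile.TemporaryDirectory() as installed, tempfile.TemporaryDirectory() as bundled:
    for root in (installed, bundled):
        header = Path(root) / "gtest" / "internal" / "gtest-internal.h"
        header.parent.mkdir(parents=True)
        header.touch()
    search_path = [installed, bundled]  # the installed directory is listed first, so it wins
    print(resolve_include("gtest/internal/gtest-internal.h", search_path))
    print(len(all_copies("gtest/internal/gtest-internal.h", search_path)))
```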

I need some information to continue my work:

- Does torch_xla need to link to MKL? If so, should it be linked statically or dynamically?
- Can I build torch_xla in my x86 Ubuntu Linux environment? Is there a guide for building torch_xla?

@atalman could you please ask a torch_xla developer to help me fix this issue?

xuhancn · 2025-06-30T19:27:56Z

@pytorchbot rebase

pytorchmergebot · 2025-06-30T19:29:43Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-06-30T19:29:47Z

Successfully rebased xu_fix_USE_STATIC_MKL_lost_functionality onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_fix_USE_STATIC_MKL_lost_functionality && git pull --rebase)

xuhancn · 2025-07-01T00:09:18Z

@pytorchbot rebase

pytorchmergebot · 2025-07-01T00:10:53Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-07-01T00:10:57Z

Successfully rebased xu_fix_USE_STATIC_MKL_lost_functionality onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_fix_USE_STATIC_MKL_lost_functionality && git pull --rebase)

xuhancn added the module: mkl, ciflow/trunk, intel, and topic: not user facing labels Oct 26, 2024

pytorchbot added the open source label Oct 26, 2024

xuhancn requested a review from EikanWang October 27, 2024 01:53

xuhancn marked this pull request as ready for review October 27, 2024 08:38

xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from a83f021 to b920061 October 28, 2024 01:58

xuhancn requested a review from chuanqi129 October 28, 2024 05:40

xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch 2 times, most recently from dffb3eb to 6661809 October 28, 2024 15:08

xuhancn requested a review from a team as a code owner October 28, 2024 15:08

xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from 6661809 to 03e7cd2 October 28, 2024 15:52

ezyang reviewed Oct 28, 2024

ezyang requested a review from malfet October 28, 2024 16:41

ezyang added the triaged label Oct 28, 2024

xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from 03e7cd2 to a2799f3 October 28, 2024 17:50

xuhancn marked this pull request as draft October 28, 2024 17:55

malfet requested changes Oct 28, 2024

xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch 2 times, most recently from b26c81b to 9f66386 October 29, 2024 02:48

This was referenced Oct 29, 2024

[Windows][cpu] mkl use mimalloc as allocator on Windows #138419

Closed

turn off USE_MIMALLOC_ON_MKL temporary. #139204

Closed

pytorchmergebot force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from 4495017 to bb6cdd9 June 13, 2025 01:27

pytorchmergebot force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from bb6cdd9 to 6a3c1c4 June 13, 2025 09:39

pytorchmergebot force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from 6a3c1c4 to c468b41 June 30, 2025 19:29

xuhancn added 15 commits July 1, 2025 00:10

Fix USE_STATIC_MKL

173e25e

* fix USE_STATIC_MKL on Linux. * fix USE_STATIC_MKL on Windows. * keep set USE_STATIC_MKL off. * fix shared mkl version number.

setup USE_STATIC_MKL=1

60d972d

update code.

c3713f3

clean env

d54fc0b

setup USE_STATIC_MKL=1 for wheel build.

84b859e

export USE_STATIC_MKL=1 for linux wheel build.

5b6adb7

update code.

90c6cc3

try to fix torch_cuda_linalg mkl dependency.

335fdf1

update code.

fba2243

test code.

26fc07e

try to fix issue 146551

609d1b6

use static mkl for XPU build.

ca010b5

try to fix xla.

61adf9c

add comments for torch_cuda_linalg fixing.

1e4c1d0

remove debug log. Work around MKL for CUDA.

clean up useless files.

1f51b52

pytorchmergebot force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from c468b41 to 1f51b52 July 1, 2025 00:10

setup mkl build for cuda_v129

7a9d8b3
