Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Fix USE_STATIC_MKL lost functionality #138996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

xuhancn
Copy link
Collaborator

@xuhancn xuhancn commented Oct 26, 2024

Currently, USE_STATIC_MKL is lost functionality to control static or shared link mkl of PyTorch. The reason is cmake/Modules/FindMKL.cmake code ignore USE_STATIC_MKL cmake variable. And search MKL libraries with many work around.

This PR is target to fix this issue. It is important to PyTorch XPU version build, we expected that:

  1. In CPU and CUDA build, link MKL staticly.
  2. In XPU build, link MKL shared link. We would have oneAPI environment, we can re-use shared MKL binaries.

The MKL config, we can reference to Intel official online tool: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.htm

OS Link Type Linked MKL Binaries
Windows static mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib
Windows shared mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib
Linux static libmkl_intel_lp64.a libmkl_core.a libpthread.a libm.so libdl.a
Linux shared libmkl_intel_lp64.so libmkl_gnu_thread.so libmkl_core.so libpthread.a libm.so libdl.a

After fixed USE_STATIC_MKL option, we need to install correctly MKL version. Otherwise, it shouldn't find MKL binaries. To install MKL:
Install MKL static version on Windows/Linux:

pip install mkl-include mkl-static

Install MKL shared version on Windows/Linux:

pip install mkl mkl-devel mkl-include 

Changes:

  1. Fix USE_STATIC_MKL lost functionality on Linux.
  2. Fix USE_STATIC_MKL lost functionality on Windows.
  3. Set USE_STATIC_MKL default value to ON, we recommanded to link MKL statically.
  4. Add related document to ReadMe.

TODO:
Setup correct USE_STATIC_MKL to CI system.

  1. Merge print USE_STATIC_MKL for further debug. #138902 to help debug CI.
  2. Setup USE_STATIC_MKL correctly in CI, need to match correct installed MKL version.
  3. Merge this PR after all CI passed.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

Copy link

pytorch-bot bot commented Oct 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138996

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 7a9d8b3 with merge base f16053f (image):

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@xuhancn xuhancn added module: mkl Related to our MKL support ciflow/trunk Trigger trunk jobs on your pull request intel This tag is for PR from Intel topic: not user facing topic category labels Oct 26, 2024
@xuhancn xuhancn requested a review from EikanWang October 27, 2024 01:53
@xuhancn xuhancn marked this pull request as ready for review October 27, 2024 08:38
@xuhancn xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from a83f021 to b920061 Compare October 28, 2024 01:58
@xuhancn xuhancn requested a review from chuanqi129 October 28, 2024 05:40
@xuhancn xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch 2 times, most recently from dffb3eb to 6661809 Compare October 28, 2024 15:08
@xuhancn xuhancn requested a review from a team as a code owner October 28, 2024 15:08
@xuhancn xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from 6661809 to 03e7cd2 Compare October 28, 2024 15:52
CMakeLists.txt Outdated
@@ -330,7 +330,7 @@ cmake_dependent_option(
set(MKLDNN_ENABLE_CONCURRENT_EXEC ${USE_MKLDNN})
cmake_dependent_option(USE_MKLDNN_CBLAS "Use CBLAS in MKLDNN" OFF "USE_MKLDNN"
OFF)
option(USE_STATIC_MKL "Prefer to link with MKL statically (Unix only)" OFF)
option(USE_STATIC_MKL "Prefer to link with MKL statically (recommanded)." ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rec911ended

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang not understand it. And I'm still fixing CI now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a typo. But also, we don't want static MKL, this is a deliberate decision

@ezyang ezyang requested a review from malfet October 28, 2024 16:41
@ezyang ezyang added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 28, 2024
@xuhancn xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from 03e7cd2 to a2799f3 Compare October 28, 2024 17:50
@xuhancn xuhancn marked this pull request as draft October 28, 2024 17:55
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before changing the default, let's discuss benefits/drawback of doing it one way vs another. If I'm building locally, dynamic is always preferred, isn't it?
The only time you want to link with MKL statically if you ship binaries, as that makes your life much easier, but downside is bulkier releases, and considering PyPI size limit we want to rely on https://pypi.org/project/mkl/ which requires dynamic linking

@xuhancn
Copy link
Collaborator Author

xuhancn commented Oct 28, 2024

Before changing the default, let's discuss benefits/drawback of doing it one way vs another. If I'm building locally, dynamic is always preferred, isn't it? The only time you want to link with MKL statically if you ship binaries, as that makes your life much easier, but downside is bulkier releases, and considering PyPI size limit we want to rely on https://pypi.org/project/mkl/ which requires dynamic linking

Thanks for your reply, then let keep using MKL shared as default config.

@xuhancn xuhancn force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch 2 times, most recently from b26c81b to 9f66386 Compare October 29, 2024 02:48
pytorchmergebot pushed a commit that referenced this pull request Oct 30, 2024
Fixes #138994

We can turn off `USE_MIMALLOC_ON_MKL` temporary. Due to it caused #138994

For totally fixed, we need fix `USE_STATIC_MKL` lost functionality issue: #138996, and then get the correctly MKL linking type(shared/static). It still need some time to pass all CI and builder scripts.

Pull Request resolved: #139204
Approved by: https://github.com/ezyang
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
Fixes pytorch#138994

We can turn off `USE_MIMALLOC_ON_MKL` temporary. Due to it caused pytorch#138994

For totally fixed, we need fix `USE_STATIC_MKL` lost functionality issue: pytorch#138996, and then get the correctly MKL linking type(shared/static). It still need some time to pass all CI and builder scripts.

Pull Request resolved: pytorch#139204
Approved by: https://github.com/ezyang
@xuhancn
Copy link
Collaborator Author

xuhancn commented Nov 14, 2024

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased xu_fix_USE_STATIC_MKL_lost_functionality onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_fix_USE_STATIC_MKL_lost_functionality && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from 4495017 to bb6cdd9 Compare June 13, 2025 01:27
@xuhancn
Copy link
Collaborator Author

xuhancn commented Jun 13, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased xu_fix_USE_STATIC_MKL_lost_functionality onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_fix_USE_STATIC_MKL_lost_functionality && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from bb6cdd9 to 6a3c1c4 Compare June 13, 2025 09:39
@xuhancn
Copy link
Collaborator Author

xuhancn commented Jun 19, 2025

Status update on 2025/6/19:

After rebase code, we still have torch_xla UT fail, and the fail is build issue:

In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/src/gmock-all.cc:39:
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/include/gmock/gmock.h:58:
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/include/gmock/gmock-function-mocker.h:44:
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/include/gmock/gmock-spec-builders.h:78:
/var/lib/jenkins/workspace/third_party/googletest/googlemock/include/gmock/gmock-matchers.h:4795:5: warning: 'GTEST_OS_XTENSA' is not defined, evaluates to 0 [-Wundef]
/opt/conda/include/gtest/internal/gtest-port.h:471:62: note: expanded from macro 'GTEST_HAS_STD_WSTRING'
     GTEST_OS_HAIKU || GTEST_OS_ESP32 || GTEST_OS_ESP8266 || GTEST_OS_XTENSA))
                                                             ^
In file included from /var/lib/jenkins/workspace/third_party/googletest/googlemock/src/gmock-all.cc:43:
/var/lib/jenkins/workspace/third_party/googletest/googlemock/src/gmock-internal-utils.cc:186:36: error: too few arguments to function call, expected 2, have 1
                     actual_to_skip);
                                   ^
/opt/conda/include/gtest/internal/gtest-internal.h:834:24: note: 'GetCurrentOsStackTraceExceptTop' declared here
GTEST_API_ std::string GetCurrentOsStackTraceExceptTop(
                       ^
7 warnings and 1 error generated.

I need some information to continue my work:

  1. Whether the torch_xla need link to mkl? If need, it should be static/dynamic link?
  2. Whether I can build torch_xla in my x86 ubuntu Linux environment? Whether we have a guide to build torch_xla?

@atalman could you please engage some torch_xla developer help me to fix the issue?

@xuhancn
Copy link
Collaborator Author

xuhancn commented Jun 30, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased xu_fix_USE_STATIC_MKL_lost_functionality onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_fix_USE_STATIC_MKL_lost_functionality && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from 6a3c1c4 to c468b41 Compare June 30, 2025 19:29
@xuhancn
Copy link
Collaborator Author

xuhancn commented Jul 1, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased xu_fix_USE_STATIC_MKL_lost_functionality onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xu_fix_USE_STATIC_MKL_lost_functionality && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the xu_fix_USE_STATIC_MKL_lost_functionality branch from c468b41 to 1f51b52 Compare July 1, 2025 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/binaries Trigger all binary build and upload jobs on the PR ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks intel This tag is for PR from Intel module: mkl Related to our MKL support open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants