[OpenMP] Change build of OpenMP device runtime to be a separate runtime #136729
Author: Joseph Huber (jhuber6)

Changes

Summary:
This follows the same build we use for libc, libc++, compiler-rt, and …

This most importantly will require that users update their build …

This also changed where the …

Patch is 24.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136729.diff

36 Files Affected:
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 8646c55060b17..7cc4008ec1f2b 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2794,6 +2794,11 @@ void tools::addOpenMPDeviceRTL(const Driver &D,
for (const auto &LibPath : HostTC.getFilePaths())
LibraryPaths.emplace_back(LibPath);
+ // Check the target specific library path for the triple as well.
+ SmallString<128> P(D.Dir);
+ llvm::sys::path::append(P, "..", "lib", Triple.getTriple());
+ LibraryPaths.emplace_back(P);
+
OptSpecifier LibomptargetBCPathOpt =
Triple.isAMDGCN() ? options::OPT_libomptarget_amdgpu_bc_path_EQ
: Triple.isNVPTX() ? options::OPT_libomptarget_nvptx_bc_path_EQ
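The driver hunk above adds one more directory to the device-library search list: `<driver dir>/../lib/<triple>`, appended after the host toolchain's file paths. Here is a minimal C++17 sketch of that ordering using `std::filesystem`. `deviceLibrarySearchPaths` is a hypothetical helper, not part of clang's API. The real driver builds the path with `SmallString` and `llvm::sys::path::append`, as the hunk shows.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the search-path logic in tools::addOpenMPDeviceRTL:
// host toolchain file paths come first, then the per-triple directory next to
// the driver binary, <driver-dir>/../lib/<triple>.
std::vector<fs::path> deviceLibrarySearchPaths(
    const fs::path &driverDir, const std::vector<fs::path> &hostFilePaths,
    const std::string &triple) {
  std::vector<fs::path> paths(hostFilePaths.begin(), hostFilePaths.end());
  // Mirrors llvm::sys::path::append(P, "..", "lib", Triple.getTriple()).
  paths.push_back(driverDir / ".." / "lib" / triple);
  return paths;
}
```

Take a driver installed in `/opt/llvm/bin` (an example path). For `amdgcn-amd-amdhsa`, the new candidate is `/opt/llvm/bin/../lib/amdgcn-amd-amdhsa`, and it is tried after every host toolchain path.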
diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt
index 25c879710645c..70ac6a6d1e6c3 100644
--- a/offload/CMakeLists.txt
+++ b/offload/CMakeLists.txt
@@ -113,6 +113,13 @@ else()
set(CMAKE_CXX_EXTENSIONS NO)
endif()
+# Emit a warning for people who haven't updated their build.
+if(NOT "openmp" IN_LIST RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES AND
+ NOT "openmp" IN_LIST RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES)
+ message(WARNING "Building the offloading runtime with no device library. See "
+ "https://openmp.llvm.org//SupportAndFAQ.html for help.")
+endif()
+
# Set the path of all resulting libraries to a unified location so that it can
# be used for testing.
set(LIBOMPTARGET_LIBRARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
@@ -373,7 +380,6 @@ set(LIBOMPTARGET_LLVM_LIBRARY_INTDIR "${LIBOMPTARGET_INTDIR}" CACHE STRING
# Build offloading plugins and device RTLs if they are available.
add_subdirectory(plugins-nextgen)
-add_subdirectory(DeviceRTL)
add_subdirectory(tools)
# Build target agnostic offloading library.
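The new check in `offload/CMakeLists.txt` warns only when `openmp` is missing from *both* per-GPU runtime lists. The C++ sketch below expresses the same predicate, assuming the two CMake list values arrive as semicolon-separated strings. `splitCMakeList`, `inList` and `shouldWarnNoDeviceLibrary` are hypothetical names. Like CMake's `IN_LIST`, the membership test compares whole list elements.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split a CMake list ("a;b;c") into its non-empty elements.
std::vector<std::string> splitCMakeList(const std::string &list) {
  std::vector<std::string> items;
  std::stringstream ss(list);
  std::string item;
  while (std::getline(ss, item, ';'))
    if (!item.empty())
      items.push_back(item);
  return items;
}

// Like `"<value>" IN_LIST <var>`: exact element equality, not substring search.
bool inList(const std::string &value, const std::string &list) {
  const auto items = splitCMakeList(list);
  return std::find(items.begin(), items.end(), value) != items.end();
}

// True when neither GPU runtime target enables openmp, i.e. when the hunk
// above would emit its "no device library" warning.
bool shouldWarnNoDeviceLibrary(const std::string &amdgpuRuntimes,
                               const std::string &nvptxRuntimes) {
  return !inList("openmp", amdgpuRuntimes) && !inList("openmp", nvptxRuntimes);
}
```

With the updated `Offload.cmake` cache, both lists contain `openmp`, so the warning stays silent. The old lists, `compiler-rt;libc;libcxx;libcxxabi`, would trigger it.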
diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt
deleted file mode 100644
index 12f53a30761f3..0000000000000
--- a/offload/DeviceRTL/CMakeLists.txt
+++ /dev/null
@@ -1,181 +0,0 @@
-set(LIBOMPTARGET_BUILD_DEVICERTL_BCLIB TRUE CACHE BOOL
- "Can be set to false to disable building this library.")
-
-if (NOT LIBOMPTARGET_BUILD_DEVICERTL_BCLIB)
- message(STATUS "Not building DeviceRTL: Disabled by LIBOMPTARGET_BUILD_DEVICERTL_BCLIB")
- return()
-endif()
-
-# Check to ensure the host system is a supported host architecture.
-if(NOT ${CMAKE_SIZEOF_VOID_P} EQUAL "8")
- message(STATUS "Not building DeviceRTL: Runtime does not support 32-bit hosts")
- return()
-endif()
-
-if (LLVM_DIR)
- # Builds that use pre-installed LLVM have LLVM_DIR set.
- # A standalone or LLVM_ENABLE_RUNTIMES=openmp build takes this route
- find_program(CLANG_TOOL clang PATHS ${LLVM_TOOLS_BINARY_DIR} NO_DEFAULT_PATH)
-elseif (LLVM_TOOL_CLANG_BUILD AND NOT CMAKE_CROSSCOMPILING AND NOT OPENMP_STANDALONE_BUILD)
- # LLVM in-tree builds may use CMake target names to discover the tools.
- # A LLVM_ENABLE_PROJECTS=openmp build takes this route
- set(CLANG_TOOL $<TARGET_FILE:clang>)
-else()
- message(STATUS "Not building DeviceRTL. No appropriate clang found")
- return()
-endif()
-
-set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR})
-set(include_directory ${devicertl_base_directory}/include)
-set(source_directory ${devicertl_base_directory}/src)
-
-set(include_files
- ${include_directory}/Allocator.h
- ${include_directory}/Configuration.h
- ${include_directory}/Debug.h
- ${include_directory}/Interface.h
- ${include_directory}/LibC.h
- ${include_directory}/Mapping.h
- ${include_directory}/Profiling.h
- ${include_directory}/State.h
- ${include_directory}/Synchronization.h
- ${include_directory}/DeviceTypes.h
- ${include_directory}/DeviceUtils.h
- ${include_directory}/Workshare.h
-)
-
-set(src_files
- ${source_directory}/Allocator.cpp
- ${source_directory}/Configuration.cpp
- ${source_directory}/Debug.cpp
- ${source_directory}/Kernel.cpp
- ${source_directory}/LibC.cpp
- ${source_directory}/Mapping.cpp
- ${source_directory}/Misc.cpp
- ${source_directory}/Parallelism.cpp
- ${source_directory}/Profiling.cpp
- ${source_directory}/Reduction.cpp
- ${source_directory}/State.cpp
- ${source_directory}/Synchronization.cpp
- ${source_directory}/Tasking.cpp
- ${source_directory}/DeviceUtils.cpp
- ${source_directory}/Workshare.cpp
-)
-
-# We disable the slp vectorizer during the runtime optimization to avoid
-# vectorized accesses to the shared state. Generally, those are "good" but
-# the optimizer pipeline (esp. Attributor) does not fully support vectorized
-# instructions yet and we end up missing out on way more important constant
-# propagation. That said, we will run the vectorizer again after the runtime
-# has been linked into the user program.
-set(clang_opt_flags -O3 -mllvm -openmp-opt-disable -DSHARED_SCRATCHPAD_SIZE=512 -mllvm -vectorize-slp=false )
-
-# If the user built with the GPU C library enabled we will use that instead.
-if(${LIBOMPTARGET_GPU_LIBC_SUPPORT})
- list(APPEND clang_opt_flags -DOMPTARGET_HAS_LIBC)
-endif()
-
-# Set flags for LLVM Bitcode compilation.
-set(bc_flags -c -flto -std=c++17 -fvisibility=hidden
- ${clang_opt_flags} -nogpulib -nostdlibinc
- -fno-rtti -fno-exceptions -fconvergent-functions
- -Wno-unknown-cuda-version
- -DOMPTARGET_DEVICE_RUNTIME
- -I${include_directory}
- -I${devicertl_base_directory}/../include
- -I${devicertl_base_directory}/../../libc
-)
-
-# first create an object target
-function(compileDeviceRTLLibrary target_name target_triple)
- set(target_bc_flags ${ARGN})
-
- foreach(src ${src_files})
- get_filename_component(infile ${src} ABSOLUTE)
- get_filename_component(outfile ${src} NAME)
- set(outfile "${outfile}-${target_name}.o")
- set(depfile "${outfile}.d")
-
- # Passing an empty CPU to -march= suppressed target specific metadata.
- add_custom_command(OUTPUT ${outfile}
- COMMAND ${CLANG_TOOL}
- ${bc_flags}
- --target=${target_triple}
- ${target_bc_flags}
- -MD -MF ${depfile}
- ${infile} -o ${outfile}
- DEPENDS ${infile}
- DEPFILE ${depfile}
- COMMENT "Building LLVM bitcode ${outfile}"
- VERBATIM
- )
- if(TARGET clang)
- # Add a file-level dependency to ensure that clang is up-to-date.
- # By default, add_custom_command only builds clang if the
- # executable is missing.
- add_custom_command(OUTPUT ${outfile}
- DEPENDS clang
- APPEND
- )
- endif()
- set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${outfile})
-
- list(APPEND obj_files ${CMAKE_CURRENT_BINARY_DIR}/${outfile})
- endforeach()
- # Trick to combine these into a bitcode file via the linker's LTO pass. This
- # is used to provide the legacy `libomptarget-<name>.bc` files. Hack this
- # through as an executable to get it to use the relocatable link.
- add_executable(libomptarget-${target_name} ${obj_files})
- set_target_properties(libomptarget-${target_name} PROPERTIES
- RUNTIME_OUTPUT_DIRECTORY ${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}
- LINKER_LANGUAGE CXX
- BUILD_RPATH ""
- INSTALL_RPATH ""
- RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
- target_compile_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}" "-march=")
- target_link_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}"
- "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm" "-march=")
- install(TARGETS libomptarget-${target_name}
- PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
- DESTINATION ${OFFLOAD_INSTALL_LIBDIR})
-
- add_library(omptarget.${target_name}.all_objs OBJECT IMPORTED)
- set_property(TARGET omptarget.${target_name}.all_objs APPEND PROPERTY IMPORTED_OBJECTS
- ${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}/libomptarget-${target_name}.bc)
-
- # Archive all the object files generated above into a static library
- add_library(omptarget.${target_name} STATIC)
- set_target_properties(omptarget.${target_name} PROPERTIES
- ARCHIVE_OUTPUT_DIRECTORY "${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}/${target_triple}"
- ARCHIVE_OUTPUT_NAME ompdevice
- LINKER_LANGUAGE CXX
- )
- target_link_libraries(omptarget.${target_name} PRIVATE omptarget.${target_name}.all_objs)
-
- install(TARGETS omptarget.${target_name}
- ARCHIVE DESTINATION "lib${LLVM_LIBDIR_SUFFIX}/${target_triple}")
-
- if (CMAKE_EXPORT_COMPILE_COMMANDS)
- set(ide_target_name omptarget-ide-${target_name})
- add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files})
- target_compile_options(${ide_target_name} PRIVATE
- -fvisibility=hidden --target=${target_triple}
- -nogpulib -nostdlibinc -Wno-unknown-cuda-version
- )
- target_compile_definitions(${ide_target_name} PRIVATE SHARED_SCRATCHPAD_SIZE=512)
- target_include_directories(${ide_target_name} PRIVATE
- ${include_directory}
- ${devicertl_base_directory}/../../libc
- ${devicertl_base_directory}/../include
- )
- install(TARGETS ${ide_target_name} EXCLUDE_FROM_ALL)
- endif()
-endfunction()
-
-if(NOT LLVM_TARGETS_TO_BUILD OR "AMDGPU" IN_LIST LLVM_TARGETS_TO_BUILD)
- compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none)
-endif()
-
-if(NOT LLVM_TARGETS_TO_BUILD OR "NVPTX" IN_LIST LLVM_TARGETS_TO_BUILD)
- compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63)
-endif()
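The deleted `DeviceRTL` build compiled every source once per GPU target from inside the host build. It picked those targets from `LLVM_TARGETS_TO_BUILD`, and an empty list meant both. The C++ sketch below models that selection table. `LegacyTarget` and `legacyDeviceRTLTargets` are hypothetical names. The triples and extra flags are copied from the two `compileDeviceRTLLibrary` calls above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One invocation of the removed compileDeviceRTLLibrary() function.
struct LegacyTarget {
  std::string name;                // suffix of libomptarget-<name>.bc
  std::string triple;              // passed as --target=<triple>
  std::vector<std::string> flags;  // extra per-target clang flags
};

// Mirrors: if(NOT LLVM_TARGETS_TO_BUILD OR "<T>" IN_LIST LLVM_TARGETS_TO_BUILD)
std::vector<LegacyTarget>
legacyDeviceRTLTargets(const std::vector<std::string> &targetsToBuild) {
  auto enabled = [&](const std::string &t) {
    return targetsToBuild.empty() ||
           std::find(targetsToBuild.begin(), targetsToBuild.end(), t) !=
               targetsToBuild.end();
  };
  std::vector<LegacyTarget> out;
  if (enabled("AMDGPU"))
    out.push_back({"amdgpu", "amdgcn-amd-amdhsa",
                   {"-Xclang", "-mcode-object-version=none"}});
  if (enabled("NVPTX"))
    out.push_back({"nvptx", "nvptx64-nvidia-cuda", {"--cuda-feature=+ptx63"}});
  return out;
}
```

The patch removes this loop. Each GPU triple now configures `openmp` as its own runtime target instead, as the updated `Offload.cmake` cache does.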
diff --git a/offload/cmake/caches/Offload.cmake b/offload/cmake/caches/Offload.cmake
index 5533a6508f5d5..3747a1d3eb299 100644
--- a/offload/cmake/caches/Offload.cmake
+++ b/offload/cmake/caches/Offload.cmake
@@ -5,5 +5,5 @@ set(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR ON CACHE BOOL "")
set(LLVM_RUNTIME_TARGETS default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda CACHE STRING "")
set(RUNTIMES_nvptx64-nvidia-cuda_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/NVPTX.cmake" CACHE STRING "")
set(RUNTIMES_amdgcn-amd-amdhsa_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/AMDGPU.cmake" CACHE STRING "")
-set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
-set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index c206386fa6b61..c1c533d00f8bb 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -88,6 +88,14 @@ else()
set(CMAKE_CXX_EXTENSIONS NO)
endif()
+# Targeting the GPU directly requires a few flags to make CMake happy.
+if("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+ set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -nogpulib")
+elseif("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+ set(CMAKE_REQUIRED_FLAGS
+ "${CMAKE_REQUIRED_FLAGS} -flto -c -Wno-unused-command-line-argument")
+endif()
+
# Check and set up common compiler flags.
include(config-ix)
include(HandleOpenMPOptions)
@@ -122,35 +130,41 @@ else()
get_clang_resource_dir(LIBOMP_HEADERS_INSTALL_PATH SUBDIR include)
endif()
-# Build host runtime library, after LIBOMPTARGET variables are set since they are needed
-# to enable time profiling support in the OpenMP runtime.
-add_subdirectory(runtime)
-
-set(ENABLE_OMPT_TOOLS ON)
-# Currently tools are not tested well on Windows or MacOS X.
-if (APPLE OR WIN32)
- set(ENABLE_OMPT_TOOLS OFF)
-endif()
-
-option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
- ${ENABLE_OMPT_TOOLS})
-if (OPENMP_ENABLE_OMPT_TOOLS)
- add_subdirectory(tools)
-endif()
-
-# Propagate OMPT support to offload
-if(NOT ${OPENMP_STANDALONE_BUILD})
- set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
- set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+# Use the current compiler target to determine the appropriate runtime to build.
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn|^nvptx" OR
+ "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn|^nvptx")
+ add_subdirectory(device)
+else()
+ # Build host runtime library, after LIBOMPTARGET variables are set since they
+ # are needed to enable time profiling support in the OpenMP runtime.
+ add_subdirectory(runtime)
+
+ set(ENABLE_OMPT_TOOLS ON)
+ # Currently tools are not tested well on Windows or MacOS X.
+ if (APPLE OR WIN32)
+ set(ENABLE_OMPT_TOOLS OFF)
+ endif()
+
+ option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
+ ${ENABLE_OMPT_TOOLS})
+ if (OPENMP_ENABLE_OMPT_TOOLS)
+ add_subdirectory(tools)
+ endif()
+
+ # Propagate OMPT support to offload
+ if(NOT ${OPENMP_STANDALONE_BUILD})
+ set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
+ set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+ endif()
+
+ option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
+
+ # Build libompd.so
+ add_subdirectory(libompd)
+
+ # Build documentation
+ add_subdirectory(docs)
+
+ # Now that we have seen all testsuites, create the check-openmp target.
+ construct_check_openmp_target()
endif()
-
-option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
-
-# Build libompd.so
-add_subdirectory(libompd)
-
-# Build documentation
-add_subdirectory(docs)
-
-# Now that we have seen all testsuites, create the check-openmp target.
-construct_check_openmp_target()
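After this change, `openmp/CMakeLists.txt` builds either the device runtime or the host runtime, never both. The choice depends on whether `LLVM_DEFAULT_TARGET_TRIPLE` or `CMAKE_CXX_COMPILER_TARGET` starts with `amdgcn` or `nvptx`. The sketch below mirrors that decision. It also mirrors the extra `CMAKE_REQUIRED_FLAGS` chosen earlier in the same file, which depend on `CMAKE_CXX_COMPILER_TARGET` alone. The function names are hypothetical. `std::regex_search` stands in for CMake's `MATCHES`, which likewise searches rather than requiring a full match.

```cpp
#include <regex>
#include <string>

// Same pattern as the CMake condition: "^amdgcn|^nvptx".
bool isGPUTriple(const std::string &triple) {
  static const std::regex gpu("^amdgcn|^nvptx");
  return std::regex_search(triple, gpu);
}

// Which subdirectory openmp/CMakeLists.txt adds after the patch.
std::string openmpSubdirectory(const std::string &defaultTriple,
                               const std::string &compilerTarget) {
  return isGPUTriple(defaultTriple) || isGPUTriple(compilerTarget) ? "device"
                                                                   : "runtime";
}

// Flags appended to CMAKE_REQUIRED_FLAGS so CMake's compiler checks pass when
// targeting a GPU directly; keyed only on CMAKE_CXX_COMPILER_TARGET.
std::string extraRequiredFlags(const std::string &compilerTarget) {
  if (std::regex_search(compilerTarget, std::regex("^amdgcn")))
    return "-nogpulib";
  if (std::regex_search(compilerTarget, std::regex("^nvptx")))
    return "-flto -c -Wno-unused-command-line-argument";
  return "";
}
```

Several steps used to run unconditionally: the OMPT tools, `libompd`, the docs and `construct_check_openmp_target()`. They now run only in the host (non-GPU) branch.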
diff --git a/openmp/device/CMakeLists.txt b/openmp/device/CMakeLists.txt
new file mode 100644
index 0000000000000..9211186f4012a
--- /dev/null
+++ b/openmp/device/CMakeLists.txt
@@ -0,0 +1,99 @@
+# Ensure the compiler is a valid clang when building the GPU target.
+set(req_ver "${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}")
+if(LLVM_VERSION_MAJOR AND NOT (CMAKE_CXX_COMPILER_ID MATCHES "[Cc]lang" AND
+ ${CMAKE_CXX_COMPILER_VERSION} VERSION_EQUAL "${req_ver}"))
+ message(FATAL_ERROR "Cannot build GPU device runtime. CMake compiler "
+ "'${CMAKE_CXX_COMPILER_ID} ${CMAKE_CXX_COMPILER_VERSION}' "
+ " is not 'Clang ${req_ver}'.")
+endif()
+
+set(src_files
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Allocator.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Configuration.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Debug.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Kernel.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/LibC.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Mapping.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Misc.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Parallelism.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Profiling.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Reduction.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/State.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Synchronization.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Tasking.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/DeviceUtils.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Workshare.cpp
+)
+
+list(APPEND compile_options -flto)
+list(APPEND compile_options -fvisibility=hidden)
+list(APPEND compile_options -nogpulib)
+list(APPEND compile_options -nostdlibinc)
+list(APPEND compile_options -fno-rtti)
+list(APPEND compile_options -fno-exceptions)
+list(APPEND compile_options -fconvergent-functions)
+list(APPEND compile_options -Wno-unknown-cuda-version)
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+ list(APPEND compile_options --target=${LLVM_DEFAULT_TARGET_TRIPLE})
+endif()
+
+# We disable the slp vectorizer during the runtime optimization to avoid
+# vectorized accesses to the shared state. Generally, those are "good" but
+# the optimizer pipeline (esp. Attributor) does not fully support vectorized
+# instructions yet and we end up missing out on way more important constant
+# propagation. That said, we will run the vectorizer again after the runtime
+# has been linked into the user program.
+list(APPEND compile_flags "SHELL: -mllvm -vectorize-slp=false")
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn" OR
+ "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+ set(target_name "amdgpu")
+ list(APPEND compile_flags "SHELL:-Xclang -mcode-object-version=none")
+elseif("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^nvptx" OR
+ "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+ set(target_name "nvptx")
+ list(APPEND compile_flags --cuda-feature=+ptx63)
+endif()
+
+# Trick to combine these into a bitcode file via the linker's LTO pass.
+add_executable(libompdevice ${src_files})
+set_target_properties(libompdevice PROPERTIES
+ RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+ LINKER_LANGUAGE CXX
+ BUILD_RPATH ""
+ INSTALL_RPATH ""
+ RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
+
+# If the user built with the GPU C library enabled we will use that instead.
+if(LIBOMPTARGET_GPU_LIBC_SUPPORT)
+ target_compile_definitions(libompdevice PRIVATE OMPTARGET_HAS_LIBC)
+endif()
+target_compile_definitions(libompdevice PRIVATE SHARED_SCRATCHPAD_SIZE=512)
+
+target_include_directories(libompdevice PRIVATE
+ ${CMAKE_CURRENT_SOURCE_DIR}/include
+ ${CMAKE_CURRENT_SOURCE_DIR}/../../libc
+ ${CMAKE_CURRENT_SOURCE_DIR}/../../offload/include)
+target_compile_options(libompdevice PRIVATE ${compile_options})
+target_link_options(libompdevice PRIVATE
+ "-flto" "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+ target_link_options(libompdevice PRIVATE "--target=${LLVM_DEFAULT_TARGET_TRIPLE}")
+endif()
+install(TARGETS libompdevice
+ PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
+ DESTINATION ${OPENMP_INSTALL_LIBDIR})
+
+add_library(ompdevice.all_objs OBJECT IMPORTED)
+set_property(TARGET ompdevice.all_objs APPEND PROPERTY IMPORTED_OBJECTS
+ ${CMAKE_CURRENT_BINARY_DIR}/libomptarget-${target_name}.bc)
+
+# Archive all the object files generated above into a static library
+add_library(ompdevice STATIC)
+add_dependencies(ompdevice libompdevice)
+set_target_properties(ompdevice PROPERTIES
+ ARCHIVE_OUTPUT_DIRECTORY "${OPENMP_INSTALL_LIBDIR}"
+ ARCHIVE_OUTPUT_NAME ompdevice
+ LINKER_LANGUAGE CXX
+)
+target_link_libraries(ompdevice PRIVATE ompdevice.all_objs)
+install(TARGETS ompdevice ARCHIVE DESTINATION "${OPENMP_INSTALL_LIBDIR}")
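The new `openmp/device/CMakeLists.txt` makes two decisions:

* When `LLVM_VERSION_MAJOR` is set, it refuses to build unless the compiler ID matches `[Cc]lang` and the version equals `${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}`.
* It derives a short `target_name` (`amdgpu` or `nvptx`). That name labels the bitcode output `libomptarget-<name>.bc`. The archive name is fixed as `ompdevice` through `ARCHIVE_OUTPUT_NAME`.

The C++ sketch below models both decisions. All names are hypothetical, and version components are assumed to be purely numeric. In the listing as shown, the per-target flags are appended to `compile_flags`, but only `compile_options` is passed to `target_compile_options`. The sketch simply reports the flags the listing names.

```cpp
#include <array>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Parse "M.m.p[.t]" into four integers; missing components count as 0.
// Assumes well-formed numeric components (CMake's VERSION_EQUAL is more lenient).
std::array<int, 4> parseVersion(const std::string &v) {
  std::array<int, 4> parts{0, 0, 0, 0};
  std::stringstream ss(v);
  std::string piece;
  for (std::size_t i = 0; i < parts.size() && std::getline(ss, piece, '.'); ++i)
    parts[i] = std::stoi(piece);
  return parts;
}

// Mirrors the FATAL_ERROR guard: ID must match "[Cc]lang" (a search, so
// "AppleClang" also matches) and the version must equal the LLVM version.
bool deviceCompilerAccepted(const std::string &compilerId,
                            const std::string &compilerVersion,
                            const std::string &llvmVersion) {
  static const std::regex clangId("[Cc]lang");
  return std::regex_search(compilerId, clangId) &&
         parseVersion(compilerVersion) == parseVersion(llvmVersion);
}

struct DeviceRuntimeConfig {
  std::string targetName;          // "amdgpu" or "nvptx"
  std::vector<std::string> flags;  // per-target entries of compile_flags
  std::string bitcodeName;         // RUNTIME_OUTPUT_NAME of libompdevice
};

DeviceRuntimeConfig deviceRuntimeConfig(const std::string &defaultTriple,
                                        const std::string &compilerTarget) {
  auto starts = [](const std::string &s, const char *prefix) {
    return s.rfind(prefix, 0) == 0;
  };
  DeviceRuntimeConfig c;
  if (starts(defaultTriple, "amdgcn") || starts(compilerTarget, "amdgcn")) {
    c.targetName = "amdgpu";
    c.flags = {"-Xclang", "-mcode-object-version=none"};
  } else if (starts(defaultTriple, "nvptx") || starts(compilerTarget, "nvptx")) {
    c.targetName = "nvptx";
    c.flags = {"--cuda-feature=+ptx63"};
  }
  // Like the CMake file, an unmatched triple leaves targetName empty.
  c.bitcodeName = "libomptarget-" + c.targetName + ".bc";
  return c;
}
```

Keeping the `libomptarget-<name>.bc` name preserves the file name the old `DeviceRTL` build produced. Only its build location changes.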
diff --git a/offload/DeviceRTL/include/Allocator.h b/openmp/device/include/Allocator.h
similarity index 100%
rename from offload/DeviceRTL/include/Allocator.h
rename to openmp/device/include/Allocator.h
diff --git a/offload/DeviceRTL/include/Configuration.h b/openmp/device/include/Configuration.h
similarity index 100%
rename from offload/DeviceRTL/include/Configuration.h
rename to openmp/device/include/Configuration.h
diff --git a/offload/DeviceRTL/include/Debug.h b/openmp/device/include/Debug.h
similarity index 100%
rename from offload/DeviceRTL/include/Debug.h
rename to openmp/device/include/Debug.h
diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/openmp/device/include/DeviceTypes.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceTypes.h
rename to openmp/device/include/DeviceTypes.h
diff --git a/offload/DeviceRTL/include/DeviceUtils.h b/openmp/device/include/DeviceUtils.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceUtils.h
rename to openmp/device/include/DeviceUtils.h
diff --git a/offload/DeviceRTL/include/Interface.h b/openmp/device/include/Interface.h
similarity index 100%
rename from offload/DeviceRTL/include/Interface.h
rename to openmp/device/include/Interface.h
diff --git a/offload/DeviceRTL/include/LibC.h b/openmp/device/include/LibC.h
similarity index 100%
rename from offload/DeviceRTL/include/LibC.h
rename to openmp/device/include/LibC.h
diff --git a/offload/DeviceRTL/include/Mapping.h b/openmp/device/include/Mapping.h
similarity index 100%
rename from offload/DeviceRTL/include/Mapping.h
rename to openmp/device/include/Mapping.h
diff --git a/offload/DeviceRTL/include/Profiling.h b/openmp/device/include/Profiling.h
similarity index 100%
rename from offload/DeviceRTL/include/Profiling.h
rename to openmp/device/include/Profiling.h
diff --git a/offload/DeviceRTL/include/State.h b/openmp/device/include/State.h
similarity index 100%
rename from offload/Dev...
[truncated]
-# Build libompd.so
-add_subdirectory(libompd)
-
-# Build documentation
-add_subdirectory(docs)
-
-# Now that we have seen all testsuites, create the check-openmp target.
-construct_check_openmp_target()
diff --git a/openmp/device/CMakeLists.txt b/openmp/device/CMakeLists.txt
new file mode 100644
index 0000000000000..9211186f4012a
--- /dev/null
+++ b/openmp/device/CMakeLists.txt
@@ -0,0 +1,99 @@
+# Ensure the compiler is a valid clang when building the GPU target.
+set(req_ver "${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}")
+if(LLVM_VERSION_MAJOR AND NOT (CMAKE_CXX_COMPILER_ID MATCHES "[Cc]lang" AND
+ ${CMAKE_CXX_COMPILER_VERSION} VERSION_EQUAL "${req_ver}"))
+ message(FATAL_ERROR "Cannot build GPU device runtime. CMake compiler "
+ "'${CMAKE_CXX_COMPILER_ID} ${CMAKE_CXX_COMPILER_VERSION}' "
+ " is not 'Clang ${req_ver}'.")
+endif()
+
+set(src_files
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Allocator.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Configuration.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Debug.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Kernel.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/LibC.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Mapping.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Misc.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Parallelism.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Profiling.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Reduction.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/State.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Synchronization.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Tasking.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/DeviceUtils.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/Workshare.cpp
+)
+
+list(APPEND compile_options -flto)
+list(APPEND compile_options -fvisibility=hidden)
+list(APPEND compile_options -nogpulib)
+list(APPEND compile_options -nostdlibinc)
+list(APPEND compile_options -fno-rtti)
+list(APPEND compile_options -fno-exceptions)
+list(APPEND compile_options -fconvergent-functions)
+list(APPEND compile_options -Wno-unknown-cuda-version)
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+ list(APPEND compile_options --target=${LLVM_DEFAULT_TARGET_TRIPLE})
+endif()
+
+# We disable the slp vectorizer during the runtime optimization to avoid
+# vectorized accesses to the shared state. Generally, those are "good" but
+# the optimizer pipeline (esp. Attributor) does not fully support vectorized
+# instructions yet and we end up missing out on way more important constant
+# propagation. That said, we will run the vectorizer again after the runtime
+# has been linked into the user program.
+list(APPEND compile_options "SHELL: -mllvm -vectorize-slp=false")
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn" OR
+ "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+ set(target_name "amdgpu")
+ list(APPEND compile_options "SHELL:-Xclang -mcode-object-version=none")
+elseif("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^nvptx" OR
+ "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+ set(target_name "nvptx")
+ list(APPEND compile_options --cuda-feature=+ptx63)
+endif()
+
+# Trick to combine these into a bitcode file via the linker's LTO pass.
+add_executable(libompdevice ${src_files})
+set_target_properties(libompdevice PROPERTIES
+ RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+ LINKER_LANGUAGE CXX
+ BUILD_RPATH ""
+ INSTALL_RPATH ""
+ RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
+
+# If the user built with the GPU C library enabled we will use that instead.
+if(LIBOMPTARGET_GPU_LIBC_SUPPORT)
+ target_compile_definitions(libompdevice PRIVATE OMPTARGET_HAS_LIBC)
+endif()
+target_compile_definitions(libompdevice PRIVATE SHARED_SCRATCHPAD_SIZE=512)
+
+target_include_directories(libompdevice PRIVATE
+ ${CMAKE_CURRENT_SOURCE_DIR}/include
+ ${CMAKE_CURRENT_SOURCE_DIR}/../../libc
+ ${CMAKE_CURRENT_SOURCE_DIR}/../../offload/include)
+target_compile_options(libompdevice PRIVATE ${compile_options})
+target_link_options(libompdevice PRIVATE
+ "-flto" "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+ target_link_options(libompdevice PRIVATE "--target=${LLVM_DEFAULT_TARGET_TRIPLE}")
+endif()
+install(TARGETS libompdevice
+ PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
+ DESTINATION ${OPENMP_INSTALL_LIBDIR})
+
+add_library(ompdevice.all_objs OBJECT IMPORTED)
+set_property(TARGET ompdevice.all_objs APPEND PROPERTY IMPORTED_OBJECTS
+ ${CMAKE_CURRENT_BINARY_DIR}/libomptarget-${target_name}.bc)
+
+# Archive all the object files generated above into a static library
+add_library(ompdevice STATIC)
+add_dependencies(ompdevice libompdevice)
+set_target_properties(ompdevice PROPERTIES
+ ARCHIVE_OUTPUT_DIRECTORY "${OPENMP_INSTALL_LIBDIR}"
+ ARCHIVE_OUTPUT_NAME ompdevice
+ LINKER_LANGUAGE CXX
+)
+target_link_libraries(ompdevice PRIVATE ompdevice.all_objs)
+install(TARGETS ompdevice ARCHIVE DESTINATION "${OPENMP_INSTALL_LIBDIR}")
diff --git a/offload/DeviceRTL/include/Allocator.h b/openmp/device/include/Allocator.h
similarity index 100%
rename from offload/DeviceRTL/include/Allocator.h
rename to openmp/device/include/Allocator.h
diff --git a/offload/DeviceRTL/include/Configuration.h b/openmp/device/include/Configuration.h
similarity index 100%
rename from offload/DeviceRTL/include/Configuration.h
rename to openmp/device/include/Configuration.h
diff --git a/offload/DeviceRTL/include/Debug.h b/openmp/device/include/Debug.h
similarity index 100%
rename from offload/DeviceRTL/include/Debug.h
rename to openmp/device/include/Debug.h
diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/openmp/device/include/DeviceTypes.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceTypes.h
rename to openmp/device/include/DeviceTypes.h
diff --git a/offload/DeviceRTL/include/DeviceUtils.h b/openmp/device/include/DeviceUtils.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceUtils.h
rename to openmp/device/include/DeviceUtils.h
diff --git a/offload/DeviceRTL/include/Interface.h b/openmp/device/include/Interface.h
similarity index 100%
rename from offload/DeviceRTL/include/Interface.h
rename to openmp/device/include/Interface.h
diff --git a/offload/DeviceRTL/include/LibC.h b/openmp/device/include/LibC.h
similarity index 100%
rename from offload/DeviceRTL/include/LibC.h
rename to openmp/device/include/LibC.h
diff --git a/offload/DeviceRTL/include/Mapping.h b/openmp/device/include/Mapping.h
similarity index 100%
rename from offload/DeviceRTL/include/Mapping.h
rename to openmp/device/include/Mapping.h
diff --git a/offload/DeviceRTL/include/Profiling.h b/openmp/device/include/Profiling.h
similarity index 100%
rename from offload/DeviceRTL/include/Profiling.h
rename to openmp/device/include/Profiling.h
diff --git a/offload/DeviceRTL/include/State.h b/openmp/device/include/State.h
similarity index 100%
rename from offload/Dev...
[truncated]
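The `libompdevice` target in the new `openmp/device/CMakeLists.txt` uses a trick to get a single bitcode file out of CMake: the sources are declared as an executable, but the link step passes `-flto -r -nostdlib -Wl,--lto-emit-llvm`, so the linker's LTO pass writes the merged module as LLVM bitcode (named `libomptarget-${target_name}.bc` via `RUNTIME_OUTPUT_NAME`) instead of producing machine code. Below is a minimal sketch of that pattern factored into a reusable function. The name `add_gpu_bitcode_library` and its keyword arguments are hypothetical and not part of the patch; it assumes the same toolchain the patch requires (a Clang matching the LLVM version, targeting `amdgcn` or `nvptx`).

```cmake
# Hypothetical helper (not part of the patch): compile device sources and
# merge them into one LLVM bitcode file, mirroring the libompdevice target.
function(add_gpu_bitcode_library name)
  cmake_parse_arguments(ARG "" "OUTPUT_NAME" "SOURCES;COMPILE_OPTIONS" ${ARGN})

  # An "executable" target is used only so CMake drives a link step.
  add_executable(${name} ${ARG_SOURCES})
  set_target_properties(${name} PROPERTIES
    RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
    RUNTIME_OUTPUT_NAME ${ARG_OUTPUT_NAME}
    LINKER_LANGUAGE CXX
    BUILD_RPATH ""
    INSTALL_RPATH "")

  # Each translation unit is emitted as LTO bitcode rather than object code.
  target_compile_options(${name} PRIVATE -flto ${ARG_COMPILE_OPTIONS})

  # "-r" requests a relocatable link and "--lto-emit-llvm" makes the linker's
  # LTO pass write the merged module as bitcode instead of machine code.
  target_link_options(${name} PRIVATE
    "-flto" "-r" "-nostdlib" "-Wl,--lto-emit-llvm")

  # Compile and link for the GPU triple when one is configured.
  if(LLVM_DEFAULT_TARGET_TRIPLE)
    target_compile_options(${name} PRIVATE --target=${LLVM_DEFAULT_TARGET_TRIPLE})
    target_link_options(${name} PRIVATE --target=${LLVM_DEFAULT_TARGET_TRIPLE})
  endif()
endfunction()
```

For example, `add_gpu_bitcode_library(libompdevice OUTPUT_NAME libomptarget-${target_name}.bc SOURCES ${src_files} COMPILE_OPTIONS ${compile_options})` would roughly reproduce the target from the patch, minus its include directories, compile definitions, and install rules.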
Branch updated from ee6ca95 to 748a7f7.
Summary: This was accidentally kept in the old location when we moved to the new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the delta with llvm#136729.
I think using the `LLVM_ENABLE_RUNTIMES` mechanism is a great idea.
Regarding the move back to `openmp/device`, I don't really have an opinion. However, there are some arguments to make:
- The same arguments apply to `libomptarget` as well.
- Definitions such as those in `Interface.h` are indeed OpenMP-only.
- Some definitions could be useful for other languages as well, such as `Synchronization.h`. However, they are also in the `ompx` namespace.
if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn|^nvptx" OR
  "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn|^nvptx")
  add_subdirectory(device)
[serious] What happens with host offloading? They also need device-like functions such as `omp_get_device_num()`. The device-side and host-side implementations are different. This also matters when, e.g., offloading to a remote cluster (non-GPU) node via MPI.
I don't think we should (or can) assume that the triple determines whether it is executing on the host or device.
Host offloading uses `libomp.so`. The way I think about it is that this `ompdevice` is basically `libomp` for GPUs.
The device-side `omp_get_device_num()` (defined in `libomptarget.so`, not `libomp.so`) only returns `omp_get_initial_device()`, which is wrong for any kind of offloading.
After trying out what actually happens, I found that it executes the Fortran wrapper (in `libomp.so`). It also incorrectly assumes it is always executing on the host. That looks like a bug.
Honestly, I am thoroughly confused about all that `openmp` ↔ `offload` moving. But if these don't share much code with the current
Yes, I strongly believe that
Yeah, it's a little confusing because right now
Branch updated from 748a7f7 to d8eeb33.
Summary: Currently we build the OpenMP device runtime as part of the `offload/` project. This is problematic because it has several restrictions compared to the normal offloading runtime: it can only be built with an up-to-date clang, and we need to set the target appropriately. Currently we hack around this by creating the compiler invocation manually, but this patch moves it into a separate runtimes build.

This follows the same build we use for libc, libc++, compiler-rt, and flang-rt. This also moves it from `offload/` into `openmp/` because it is still the `openmp/` runtime and I feel it is more appropriate. We do want a generic `offload/` library at some point, but it would then be trivial to add that as a separate library, now that we have the infrastructure for adding new runtimes.

Most importantly, this will require users to update their build configs, mostly by adding the following lines at a minimum. I was debating whether or not I should 'auto-upgrade' this, but I just went with a warning.

```
-DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda' \
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=openmp \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=openmp \
```

This also changes where the `.bc` version of the library lives, but it is still created.
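For builds driven by an initial-cache file rather than command-line flags, the same settings can be written as a CMake cache script in the style of `offload/cmake/caches/Offload.cmake` from the diff. This is a sketch under stated assumptions, not a file from the patch: the file name `gpu-openmp.cmake`, the `LLVM_ENABLE_PROJECTS` list, and the host `LLVM_ENABLE_RUNTIMES` list are illustrative choices; only the `LLVM_RUNTIME_TARGETS` and `RUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES` entries come from the summary.

```cmake
# Hypothetical initial-cache file (e.g. gpu-openmp.cmake); pass it with
#   cmake -G Ninja -C gpu-openmp.cmake <path-to>/llvm
set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")
set(LLVM_ENABLE_RUNTIMES "openmp;offload" CACHE STRING "")
set(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR ON CACHE BOOL "")

# "default" is the host runtimes build; each GPU triple gets its own build.
set(LLVM_RUNTIME_TARGETS "default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda" CACHE STRING "")

# Enable only the OpenMP runtime for the GPU triples. With this patch,
# openmp/CMakeLists.txt sees a GPU target triple and builds openmp/device
# instead of the host libomp.
set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "openmp" CACHE STRING "")
set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "openmp" CACHE STRING "")
```

Keeping the GPU entries in a cache file means the host and device runtime lists can be changed independently without retyping long `-D` flags.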
Branch updated from d8eeb33 to 145b566.
To make one thing clear early on: standalone, this only introduces cost. There is no tangible benefit from this PR, only a CMake change that will break people's builds. If this is done after other reorganizations have happened (e.g., a generic device RTL is created), this might change, though I am not sure about tangible benefits then either. Alternative Proposal:
Now Background: Upsides of this PR (as I remember them):
Upsides of my proposal:
Now, one could argue DeviceRTLs should not be in offload but maybe compilerRT. Even then, I'd argue you want [EDIT]
This is not true, and I believe we should avoid making such statements: Offload depends on OpenMP (for now), but OpenMP is useful standalone. Now, should Offload depend on OpenMP? No.
I'm assuming you mean that moving to
As I understand, we already have a pretty strong tendency toward the former. We have right now
Yes.
Please describe the usage scenario that benefits from this. Keep in mind that we seem to all agree on a generic GPU runtime inside of offload, which has to be split out of what we have right now. So, with this proposal, there will be a GPU runtime in offload and a GPU runtime in openmp, and ... [EDIT] I was referring to the benefit of the code movement part, not of the separate GPU runtime build part, which can be achieved without any code movement at all.
Summary: Override the default linker in case the user is passing it separately. This requires `lld` but it always did. This will be fixed *properly* when llvm#136729 lands.
So, I'm assuming there's a reasonable consensus that splitting up the device and host builds is the right way to go. Right now the argument is whether or not this should live in
For historical context, this library used to live in
One argument is that the code in
Future languages may want their own runtimes. HIP and CUDA have some kind of device runtime
Wanting to share code is somewhat compelling, but there's nothing stopping us from putting generic utility headers in
So, I think it should go back in
Only because the OpenMP DeviceRTL duplicates definitions such as
If breaking the dependence means copy-pasting shared definitions wholesale, then I am strongly against it. This increases the maintenance burden instead of decreasing it. If you know how to do without, please sketch out your plan.
This should not be about usefulness but about component dependencies. A generic utility library should not contain code that can only be used by one specific project that uses the library, and it should not have knowledge of the dependent project's internal workings, even if that is not strictly a dependency because its definitions are just duplicated.
Summary: Override the default linker in case the user is passing it separately. This requires `lld` but it always did. This will be fixed *properly* when llvm#136729 lands. Fixes llvm#136822
My understanding (which might be incorrect) is that
I find this argument compelling as well. Perhaps it would make sense to keep
I don't really like to make a distinction between 'host' and 'device' here. As shown by the
Summary: Another hacky fix done until llvm#136729 lands. This time for `-mcpu`.
This addresses one of my main concerns: spreading device runtimes all over the place or introducing N new top-level folders. I don't think we want either, but keeping the device code together in a new top-level
FWIW, this PR contains two conceptual changes, and my objection and comments have all targeted one of them: the code move.
…h (#136754) Summary: This was accidentally kept in the old location when we moved to the new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the delta with llvm/llvm-project#136729.