ELF: Switch to parallelSort for RELR relocations. #138370

Open: pcc wants to merge 1 commit into main

pcc (Contributor) commented May 3, 2025

For firefox-x64, one of the more time-consuming parts
of finalizeSections() was the call to llvm::sort in
RelrSection::updateAllocSize(). Switching that call to parallelSort
yielded the following improvement on firefox-x64 with ldflags -S on
an Apple M2 Ultra:

    N           Min           Max        Median           Avg        Stddev
x 512     1.1446024     1.2462944     1.1918706     1.1929871      0.016145
+ 512     1.1142867     1.2350058     1.1858642     1.1858839   0.016219708
Difference at 95.0% confidence
	-0.00710318 +/- 0.00198234
	-0.595412% +/- 0.166166%
	(Student's t, pooled s = 0.0161824)
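
The summary lines of the table can be reproduced from the per-row figures. The sketch below is illustrative only: the `Sample` type and helper names are hypothetical, the `x` row is assumed to be the baseline and the `+` row the patched linker, and a critical t value of 1.960 is assumed for the 95% interval.

```cpp
#include <cmath>

// Summary statistics for one benchmark configuration (hypothetical type).
struct Sample {
  double n;      // number of runs
  double mean;   // average of the measured times
  double stddev; // sample standard deviation
};

// Pooled standard deviation of two samples.
double pooledStddev(const Sample &a, const Sample &b) {
  double num = (a.n - 1) * a.stddev * a.stddev +
               (b.n - 1) * b.stddev * b.stddev;
  return std::sqrt(num / (a.n + b.n - 2));
}

// Half-width of the confidence interval for the difference of the means.
// tCrit is the critical value of Student's t (assumed 1.960 for this table).
double ciHalfWidth(const Sample &a, const Sample &b, double tCrit) {
  return tCrit * pooledStddev(a, b) * std::sqrt(1.0 / a.n + 1.0 / b.n);
}

// Change of b's mean relative to a's mean, in percent.
double percentChange(const Sample &a, const Sample &b) {
  return 100.0 * (b.mean - a.mean) / a.mean;
}
```

With the two rows above as input, these helpers give a pooled s of about 0.01618, a half-width of about 0.00198, and a relative change of about -0.595%, in line with the reported summary.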

Created using spr 1.3.6-beta.1
pcc requested a review from MaskRay and smithp35 on May 3, 2025 at 00:26
llvmbot (Member) commented May 3, 2025

@llvm/pr-subscribers-lld-elf

Author: Peter Collingbourne (pcc)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/138370.diff

1 Files Affected:

  • (modified) lld/ELF/SyntheticSections.cpp (+1-1)
diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index 2531227cb99b7..eceb297dbfc0d 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -2111,7 +2111,7 @@ template <class ELFT> bool RelrSection<ELFT>::updateAllocSize(Ctx &ctx) {
   std::unique_ptr<uint64_t[]> offsets(new uint64_t[relocs.size()]);
   for (auto [i, r] : llvm::enumerate(relocs))
     offsets[i] = r.getOffset();
-  llvm::sort(offsets.get(), offsets.get() + relocs.size());
+  llvm::parallelSort(offsets.get(), offsets.get() + relocs.size());
 
   // For each leading relocation, find following ones that can be folded
   // as a bitmap and fold them.
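
For context on why the offsets are sorted first, here is a minimal stand-alone sketch of RELR-style folding for a 64-bit target. It is not lld's code: `encodeRelr` is a hypothetical helper, offsets are assumed to be word-aligned, and `std::sort` stands in for the sort call this PR changes so that the example has no LLVM dependency.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode relocation offsets as RELR entries
// (64-bit target, word-aligned offsets assumed).
std::vector<uint64_t> encodeRelr(std::vector<uint64_t> offsets) {
  const uint64_t wordSize = 8;             // bytes per address
  const uint64_t nBits = wordSize * 8 - 1; // 63 payload bits per bitmap entry

  // The folding loop relies on ascending offsets. This is the step the PR
  // moves to a parallel sort; std::sort is used here as a stand-in.
  std::sort(offsets.begin(), offsets.end());

  std::vector<uint64_t> out;
  for (size_t i = 0, e = offsets.size(); i != e;) {
    // Leading relocation: a plain address entry (low bit 0).
    out.push_back(offsets[i]);
    uint64_t base = offsets[i] + wordSize;
    ++i;

    // Fold the following offsets into bitmap entries while they fit.
    for (;;) {
      uint64_t bitmap = 0;
      for (; i != e; ++i) {
        uint64_t d = offsets[i] - base;
        if (d >= nBits * wordSize || d % wordSize != 0)
          break;
        bitmap |= uint64_t(1) << (d / wordSize);
      }
      if (bitmap == 0)
        break;
      out.push_back((bitmap << 1) | 1); // low bit 1 marks a bitmap entry
      base += nBits * wordSize;
    }
  }
  return out;
}
```

Four consecutive words starting at 0x10000 fold into one address entry plus one bitmap entry, while two offsets that are far apart each need their own address entry.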

llvmbot (Member) commented May 3, 2025

@llvm/pr-subscribers-lld

MaskRay (Member) commented May 3, 2025

(Still on a trip with limited computer access)

We call updateAllocSize on sections like .relr.dyn and .rela.dyn. Since .relr.dyn is predominant in PIEs, parallelism could be beneficial.
I likely tested parallelSort around 2022 (around commit da0e5b8) while optimizing -z combreloc but saw no significant improvement, so I didn't pursue it.

Using parallelSort in computeRels is a compromise due to our basic scheduler's lack of support for nested parallelism (see https://reviews.llvm.org/D61115) and how lld parallelizes OutputSection and InputSectionBase writes (see https://reviews.llvm.org/D131247).

However, updateAllocSize should be fine, as it’s called within finalizeAddressDependentContent without nested parallelism requirements.
