merge main into amd-staging#2793

Merged
ronlieb merged 63 commits into
amd-staging from
amd/merge/upstream_merge_20260604142231
Jun 5, 2026

Conversation


@ronlieb ronlieb commented Jun 4, 2026

No description provided.

tbaederr and others added 30 commits June 4, 2026 14:13
…lvm#201369)

If the extern variable is constexpr and of non-array type, we should
diagnose it as missing an initializer. Otherwise, we diagnose a read of
a non-constexpr variable.
…lvm#185929)

Loop Strength Reduction can give different (and worse) results for a loop
when it is followed by uses of variables used inside the loop. This is
because the uses outside the loop increase the size of the search space,
which can lead LSR to use NarrowSearchSpaceByPickingWinnerRegs, which often
discards the best solution.

Solve this by narrowing the search space, merging uses outside the
loop with uses inside the loop. This ignores the Kind and AccessTy of
the use, so the computed cost may be inaccurate, but it is the same cost
we would get by simply ignoring the uses outside of the loop.
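
The following C++ sketch (hypothetical; it is not taken from the patch or its tests) shows the loop shape being described: the induction variable and a pointer derived from it are used inside the loop and again after it exits. The function name and signature are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example of the loop shape described above: the induction
// variable `i` and the derived pointer `p + i` are used inside the loop and
// again after it exits. Those post-loop uses are the "uses outside the loop"
// that the patch merges with the in-loop uses.
long sumWithExitValues(const int *p, std::size_t n, std::size_t *endIndex,
                       const int **endPtr) {
  assert(endIndex != nullptr && endPtr != nullptr);
  long sum = 0;
  std::size_t i = 0;
  for (; i < n; ++i)
    sum += p[i];    // in-loop use of p + i
  *endIndex = i;    // use of the induction variable after the loop
  *endPtr = p + i;  // use of the derived pointer after the loop
  return sum;
}
```

Whether LSR actually picks a worse solution for this particular function depends on the target and its cost model; the sketch only shows where the outside-of-loop uses come from.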
…on by `dataFill` (llvm#200202)

`omp_target_memset` was initially implemented before the existence of
`offload`. Because of this, a slow path was chosen to implement
`omp_target_memset`, first allocating memory on the host, calling
`memset` on that memory, and then transferring this to the device.

Aside from the inefficient way of setting device memory, this also
causes a data transfer event for the OpenMP Tools Interface, interfering
with the added memset event in OpenMP v6.0.

Since offload now implements setting data via `dataFill`, replace the
slow path with a direct call to `dataFill`. This resolves the
inefficiency and removes the superfluous event dispatched to a tool.
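
To make the difference between the two strategies concrete, here is a toy C++ model. `ToyDevice`, `submitData`, `fill`, and both `memset*` helpers are hypothetical stand-ins, not the libomptarget or offload API; the model only assumes that a host-to-device copy is what a tool observes as a data-transfer event.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy device: "device memory" is a byte vector, and every
// host-to-device copy is counted as a data-transfer event.
struct ToyDevice {
  std::vector<unsigned char> memory;
  int transferEvents = 0;

  explicit ToyDevice(std::size_t size) : memory(size, 0) {}

  // Models a host-to-device copy; a tool would see this as a transfer.
  void submitData(std::size_t offset, const unsigned char *src, std::size_t n) {
    assert(offset + n <= memory.size());
    std::memcpy(memory.data() + offset, src, n);
    ++transferEvents;
  }

  // Models a device-side fill (in the spirit of `dataFill`); no transfer.
  void fill(std::size_t offset, unsigned char value, std::size_t n) {
    assert(offset + n <= memory.size());
    std::memset(memory.data() + offset, value, n);
  }
};

// Old strategy: stage the pattern in host memory, then transfer it.
void memsetViaHost(ToyDevice &dev, std::size_t offset, int value, std::size_t n) {
  if (n == 0)
    return;
  std::vector<unsigned char> staging(n);
  std::memset(staging.data(), value, n);
  dev.submitData(offset, staging.data(), n);
}

// New strategy: ask the device to fill the range directly.
void memsetViaFill(ToyDevice &dev, std::size_t offset, int value, std::size_t n) {
  if (n == 0)
    return;
  dev.fill(offset, static_cast<unsigned char>(value), n);
}
```

Both paths leave the device buffer in the same state; only the staged path produces a transfer event, which is the event the change removes.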

Signed-off-by: Jan André Reuter <[email protected]>
…storage. (llvm#200886)

Value::setRawBits had inconsistent units: the default value and the size
assert treated the parameter as bytes (sizeof(Storage)), while the
memcpy treated it as bits (NBits / 8). A caller passing the natural byte
count (e.g. sizeof(long long)) ended up copying only sizeof(T)/8 bytes
-- one byte for an 8-byte payload, leaving the rest stale. The one
in-tree caller compensated by multiplying by 8, hiding the bug.

Rename the parameter to NBytes and drop the / 8 so the API name,
default, assert, and memcpy all agree on bytes. Update the caller in
InterpreterValuePrinter.cpp to pass ElemSize directly.

Right-size the Storage::m_RawBits array while we are here: it was
sizeof(long double) * 8 bytes, which reads like a bit/byte confusion
since the widest typed member of the union is long double itself. The
oversized array made sizeof(Value) ~144 bytes on x86_64 instead of ~40,
bloating every copy/move of a Value.

Add a regression test exercising setRawBits with both an explicit byte
count and the default argument. Pre-fix the test fails for both: the
explicit-count branch copies 1 byte instead of 8, and the default branch
copies sizeof(Storage)/8 bytes instead of the full union width.
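
A reduced C++ model of the unit mismatch, assuming a `Storage` union with only a few members (the real `Value` class has more, and the member names other than `m_RawBits` are assumptions here). `setRawBitsBuggy` reproduces the pre-fix behavior and `setRawBits` the byte-based fix described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for Value's storage (assumed member set). After the
// fix, m_RawBits is sized by the widest typed member, long double.
union Storage {
  long long m_LongLong;
  double m_Double;
  long double m_LongDouble;
  unsigned char m_RawBits[sizeof(long double)];
};

// Pre-fix behavior: the default and the assert treat the argument as bytes,
// but the memcpy treats it as bits.
void setRawBitsBuggy(Storage &S, const void *Ptr,
                     unsigned NBits = sizeof(Storage)) {
  assert(NBits <= sizeof(Storage));
  std::memcpy(S.m_RawBits, Ptr, NBits / 8);
}

// Post-fix behavior: parameter name, default, assert and memcpy all use bytes.
void setRawBits(Storage &S, const void *Ptr,
                unsigned NBytes = sizeof(Storage)) {
  assert(NBytes <= sizeof(Storage));
  std::memcpy(S.m_RawBits, Ptr, NBytes);
}
```

Passing `sizeof(long long)` to the buggy version copies a single byte, which is the failure mode the regression test in the patch checks for.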
This PR modifies the regex for error messages so that it also matches on z/OS, where the messages look like:
```
[Errno 129] EDC5129I No such file or directory.: 'temp1.txt'
wc: file "missing-file": EDC5129I No such file or directory.
cat: does-not-exist: EDC5129I No such file or directory.
```
Implement the functionality to read and parse a pre-parsed perf-script
profile generated by perf2bolt's '--profile-format=perfscript' option.

The '-ps' option selects the perfscript input profile format. It requires
specifying the aggregation type ('--spe', '--ba') if it differs from
the default ('brstack'). Note that the profile must also have been generated
with the same aggregation type.

Examples:
For ARM SPE:
1) $ perf2bolt BINARY -p perf.data -o test.text --spe --profile-format=perfscript
2) $ perf2bolt BINARY -o test.fdata -p test.text --spe -ps

For Brstack aggregation:
1) $ perf2bolt BINARY -p perf.data -o test.text --profile-format=perfscript
2) $ perf2bolt BINARY -o test.fdata -p test.text -ps
This PR tweaks the clang/test/DebugInfo/line.cpp test to pass on z/OS.

The test was failing because the RUN lines specify
-triple %itanium_abi_triple, which expands to s390x-ibm-zos when run on z/OS.
The IR that is emitted for this triple does not match the patterns
expected by the test.

This PR tweaks the patterns in the CHECK lines so that the test also
passes on z/OS.
By doing the IR printing inside DXILPrettyPrinter, we have the option to
customise what we print and include the info that we collect and
generate in DXILDebugInfo.
Fixes a buildbot failure related to FP rounding error in LV debug
output.
`TestDAP_restart_console` is already failing on Windows. It reliably
crashes (UNRESOLVED) on some Windows versions, including inside Docker
containers.

This is preventing us from enabling pre-merge CI testing for lldb on
Windows in llvm#198906.

This patch skips the test entirely. See
llvm#200840 for more details.
`OnCreateThread` runs from the `DebuggerThread` loop after a
`CREATE_THREAD_DEBUG_EVENT`. Each iteration of that loop ends with a
`ContinueDebugEvent`, which on Windows resumes every thread in the
debuggee that *isn't* individually suspended with `SuspendThread`.

If a thread is created while the debuggee is stopped, all the existing
threads are suspended except the new one. After the next
`ContinueDebugEvent`, the new thread simply runs, while lldb's `StateType`
still reads `eStateStopped`.

This patch suspends the new thread when the debuggee is stopped.

This fixes `TestTwoHitsOneActual.py` and `TestBreakOnLambdaCapture.py`
when running the test suite with `LLDB_USE_LLDB_SERVER=1`.

rdar://178718627
…sions (llvm#198583)

Update FuncToEmitC to bail out before creating invalid EmitC ops for
unsupported cases.

FuncToEmitC now rejects functions, calls, and returns whose converted
result type is `emitc.array`, instead of relying on later `emitc.func`,
`emitc.call`, or `emitc.return` verifier failures.

This does not add support for returning memrefs from functions. It only
makes the existing limitation explicit at the conversion boundary.

## Tests

Added negative tests for the standalone conversion pass. This pass marks
their source ops illegal, so when a pattern bails out the pass reports a
legalization failure. This is the expected behavior and documents the
unsupported cases directly.

`convert-to-emitc` is more permissive because it allows partial
conversion and does not mark the same source ops illegal, so it can
leave unsupported ops unconverted without reporting the same failures.

Assisted-by: Codex (refine description). I reviewed all text before
submission.
…lvm#201596)

The recently added structured script feature currently relies on
DAP-based debuggers, and LLDB is the only such debugger that Dexter
supports. To prevent the tests that depend on this feature from
running with other debuggers, we require LLDB for the script test
directory.
For compile time/memory reasons, dag-maps-huge-region is the number of
memory instructions at which we create a barrier and reset maps.
Previously we'd get to dag-maps-huge-region number of instructions, then
add a barrier in the middle of the current set of instructions, and
continue processing the second half of remaining instructions.

With this change, now we simply add a barrier every time we reach
dag-maps-huge-region number of memory instructions, and blow away all
previous instructions.

So now instead of waiting until we get to 1000 memory operations before
creating a barrier for 500 of them, we do it at 500 and do it for all
500.

With this change, -dag-maps-huge-region=500 still has
addChainDependencies() taking up over half of the codegen pipeline in
some cases I looked at, but it's much better than the previous 90%.
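
A toy C++ model of the new policy, with hypothetical names: memory instructions are counted as they are added, and when the count reaches the threshold (the role played by dag-maps-huge-region) a barrier is recorded and all tracked instructions are dropped. It deliberately ignores the real dependency maps and only shows why the tracked set never grows past the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the toy model (hypothetical type).
struct BarrierResult {
  std::vector<std::size_t> barrierPositions; // instruction index at which a barrier was placed
  std::size_t maxTracked = 0;                // largest number of instructions tracked at once
};

// New policy: barrier every `hugeRegion` memory instructions, then forget
// everything tracked so far.
BarrierResult placeBarriers(std::size_t numMemInstrs, std::size_t hugeRegion) {
  assert(hugeRegion > 0);
  BarrierResult R;
  std::size_t tracked = 0;
  for (std::size_t i = 0; i < numMemInstrs; ++i) {
    ++tracked; // add instruction i to the maps
    if (tracked > R.maxTracked)
      R.maxTracked = tracked;
    if (tracked == hugeRegion) {
      R.barrierPositions.push_back(i);
      tracked = 0; // drop all previously tracked instructions
    }
  }
  return R;
}
```

With 1200 memory instructions and a threshold of 500, barriers land at the 500th and 1000th instructions, and at most 500 instructions are tracked at any time.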
…lvm#200814)

This patch renames ClangExecutable to DriverExecutable and
getClangProgramPath to getDriverProgramPath. This makes the
names more neutral and less confusing when used in flang.
I looked at the BlendedBlockHash function in
llvm/include/llvm/CodeGen/MachineBlockHashInfo.h and rewrote the failing test.

---------

Co-authored-by: mattarde <[email protected]>
…#197316)

Most of these dependencies are available in the
[Bazel Central Registry](https://github.com/bazelbuild/bazel-central-registry)
(BCR). To improve build performance for bzlmod users, llvm-project
should pull them from the BCR to consolidate targets.
Part of llvm#185382

Moved the test cases to
[intrinsics.c](clang/test/CodeGen/AArch64/neon/intrinsics.c).

Removed the test cases from
[neon-intrinsics.c](clang/test/CodeGen/AArch64/neon-intrinsics.c).

Removed [neon-across.c](clang/test/CodeGen/AArch64/neon-across.c).

---------

Co-authored-by: Andrzej Warzyński <[email protected]>
The sentinel attribute was undocumented; this PR documents its
behavior.
We can implement these using combinations of rev, rev8, and ppairoe.*.

Rename REV16->REV16_RV64. A hypothetical REV16 on RV32 would have a
different encoding, as is already the case for REV and REV8.

Long term we should probably custom lower these instead of having
complex isel patterns. That would allow additional optimizations. But I
think the isel patterns are fine as a starting point.
…lvm#201546)

Previously, attempting to select the intrinsic
@llvm.aarch64.neon.scalar.uqxtn would cause GlobalISel to fall back to
SDAG.
This was due to both:
1. RegBankSelect placing the operands on GPR banks.
2. No instruction selection patterns existing for the intrinsic.

Add the pattern, and fix RegBankSelect to place the operands on the correct
banks.
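
For reference, a plain C++ sketch of the value semantics of the operation being selected, assuming the i64 to i32 form of `@llvm.aarch64.neon.scalar.uqxtn` (the scalar `UQXTN Sd, Dn` instruction): the 64-bit unsigned input is narrowed to 32 bits with unsigned saturation instead of truncation. Both the source and destination are SIMD&FP registers, which is consistent with moving the operands off the GPR bank. The function name is hypothetical, and the cumulative saturation flag (FPSR.QC) is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unsigned saturating narrow: clamp to UINT32_MAX instead of truncating.
std::uint32_t scalarUqxtn(std::uint64_t x) {
  constexpr std::uint64_t kMax = std::numeric_limits<std::uint32_t>::max();
  return static_cast<std::uint32_t>(x > kMax ? kMax : x);
}
```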
… with invalid iterator types (llvm#201461)

Previously, diagnostic notes issued for errors encountered due to invalid
iterator types in C++11 range-based for statements reported the range type
as the iterator type instead of the invalid iterator type. This is now fixed.
…0918)

This patch introduces the ISA under the BHI_CTRL CPUID.
The following technical paper was published in May 2025:

[intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html#ibhf](https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html#ibhf)

As shown in the paper, the encoding is F3 48 0F 1E F8.
It does not need a C intrinsic.

---------

Co-authored-by: mattarde <[email protected]>
Include the AArch64 SME (Scalable Matrix Extension) source files in the
compiler-rt builtins library when targeting aarch64, selecting either the
Apple or the non-Apple sources based on the OS platform.
These two functions do expensive per-regunit work, but are no-ops if
there are no Copies, so short-circuit this case.
charles-zablit and others added 19 commits June 4, 2026 18:01
This change creates new FP-specific binary operations and updates the
existing binary operations that previously accepted any arithmetic type
to only allow integer and vector-of-integer types.

This change is being done to prepare for extended floating-point
handling such as strict FP semantics and fast-math handling. It also
simplifies the handling of integer overflow flags.

Assisted-by: Cursor / claude-opus-4.8
These test failures are caused by bugs in clang where arm64e support is not yet
complete.
…ion (llvm#201614)

flang has supported this for a long time, but it wasn't documented as an
extension.
Problem: LLVM generates `umov w8, v0.h[0]` + `strh w8, [x0]` instead of
`str h0, [x0]` when storing vector lane 0 to memory, specifically when
SimplifyCFG merges stores across branches -- splitting the
extractelement and store into different basic blocks and preventing the
existing DAG combine from firing.

https://godbolt.org/z/v5G9ohMPa

Root cause: SimplifyCFG creates a PHI + merged store in a successor
block. SelectionDAG ISel processes each block independently, so it
lowers the extract to `UMOV` (GPR) in the predecessor and the store sees
only a GPR value via the PHI. Late tail duplication puts the store back
in the same block, but the `UMOV` is already baked in.

Fix: Added a post-RA peephole in `AArch64LoadStoreOptimizer` (step 6 in
`optimizeBlock`) that recognizes `UMOVvi*_idx0` + GPR store patterns and
replaces them with direct FPR sub-register stores. The peephole:
- Handles all element sizes: i8 (`bsub`), i16 (`hsub`), i32 (`ssub`),
i64 (`dsub`)
- Correctly updates liveness by clearing intervening kill flags on the
vector register
- Bails out if the GPR value has other uses, the vector register is
clobbered, or the store doesn't kill the GPR
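
A toy C++ model of the peephole's decision logic. `UmovStorePair` and both functions are hypothetical; the real pass inspects `MachineInstr`s and register liveness in `AArch64LoadStoreOptimizer`. The sketch encodes only the element-size to sub-register mapping and the three bail-out conditions listed above.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Facts the peephole needs about a UMOV (lane 0) + GPR store pair
// (hypothetical summary type).
struct UmovStorePair {
  unsigned elementBits;        // 8, 16, 32 or 64
  bool gprHasOtherUses;        // the UMOV result is read elsewhere
  bool storeKillsGpr;          // the store is the last use of the GPR
  bool vectorClobberedBetween; // the vector register is redefined in between
};

// Sub-register that holds lane 0 for a given element size.
std::optional<std::string> lane0SubReg(unsigned elementBits) {
  switch (elementBits) {
  case 8:  return std::string("bsub");
  case 16: return std::string("hsub");
  case 32: return std::string("ssub");
  case 64: return std::string("dsub");
  default: return std::nullopt;
  }
}

// Returns the FPR sub-register to store from, or nullopt if we must bail out.
std::optional<std::string> tryFoldUmovStore(const UmovStorePair &P) {
  if (P.gprHasOtherUses || !P.storeKillsGpr || P.vectorClobberedBetween)
    return std::nullopt;
  return lane0SubReg(P.elementBits);
}
```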

Assisted-by: Claude

Fixes: llvm#137086

---------

Co-authored-by: Kunal Pathak <[email protected]>
Convert the ten user-facing RST docs under lldb/docs/use/ to MyST
Markdown. This is the third batch of an incremental RST -> Markdown
migration; PR1 covered the small leaf pages and PR2 covered the
contributor-facing docs under resources/.

Files: formatting, intel_pt, map, remote, symbolfilejson, symbolication,
symbols, troubleshooting, tutorial, variable.

Verified by building the docs on origin/main and on this branch with
identical sphinx flags and diffing both the warnings and the rendered
HTML. After file extension and line numbers are normalized, the warning
sets match exactly. Seven of the ten pages are byte-identical. The three
that differ (symbolication, tutorial, variable) differ only in
CommonMark collapsing two-spaces-after-period to one and MyST renaming
auto-numbered footnote IDs (`id6` -> `footnote-1`) plus adding an `<hr>`
separator before footnote sections.

The diff also surfaced three semantic regressions in the conversion,
fixed here:

- variable.md lost cross-reference behavior on single-backtick refs to
`SBValue` and `SBData`. RST's default role is `any`, so single backticks
attempted xrefs; in MyST single backticks are plain code spans.
Converted these occurrences to explicit `` {any}`...` `` roles.
- map.md emitted bare `[Section Name]` for the page TOC, which
CommonMark treats as an undefined reference shortcut and falls through
to literal text. Converted to `[Section Name](#slug)`.
- variable.md emitted `[format name][format name]` as a similar
undefined reference shortcut. Converted to `[format name](#format-name)`
to match the new `(format-name)=` anchor.

Context:
https://discourse.llvm.org/t/rfc-make-myst-markdown-the-llvm-docs-format-rip-rest/

Assisted-by: Claude
llvm#201646)

Since this operation is simply a zero-offset view, attach the
FortranObjectViewOpInterface to allow FIR AA to walk this if needed.
Calling `FileManager::GetUniqueIDMapping()` during modular builds gets
very expensive if the `FileManager` has seen lots of files. This
function is used in two places in the `ASTWriter` to look up
`HeaderFileInfo` in `HeaderSearch`.

This PR changes the storage of `HeaderFileInfo` from
`FileEntry::getUID()`-indexed `std::vector<T>` to
`llvm::DenseMap<FileEntryRef, T>`, improving scanning performance by
~2.5%.
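
A toy C++ contrast of the two storage layouts (hypothetical types; a path string stands in for `FileEntryRef`, and `std::unordered_map` for `llvm::DenseMap`). With UID indexing, the container, and any walk over it, scales with the largest UID the `FileManager` has handed out; with a map keyed by the file, only headers that actually have info are stored.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-header payload.
struct HeaderInfo {
  bool isImport = false;
};

// Before: a vector indexed by file UID; touching a high UID grows the
// vector to cover every UID below it.
struct UIDIndexedStore {
  std::vector<HeaderInfo> byUID;
  HeaderInfo &get(unsigned uid) {
    if (uid >= byUID.size())
      byUID.resize(uid + 1);
    return byUID[uid];
  }
};

// After: a map keyed by the file itself; only looked-up files get entries.
struct FileKeyedStore {
  std::unordered_map<std::string, HeaderInfo> byFile;
  HeaderInfo &get(const std::string &file) { return byFile[file]; }
};
```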
…llvm#201509)

All of i64, f64, v2i32, v4i16, v8i8 are assigned to the DoubleRegs
register class (64-bit register pairs). A bitcast between any two of
these types is a machine-level no-op (i.e., the same physical register is
reinterpreted with a different type).

HexagonPatterns.td had NopCast_pat entries for all int-to-int bitcasts
within DoubleRegs, and explicit patterns for f64 <-> i64, but was
missing patterns for f64 <-> v2i32, f64 <-> v4i16, and f64 <-> v8i8. The
same gap existed in IntRegs for f32 <-> v2i16 and f32 <-> v4i8.

Without a TableGen pattern for the "f64 = bitcast v2i32" node, the
instruction selector crashed with:

```
LLVM ERROR: Cannot select: t26: f64 = bitcast t6
  t6: v2i32,ch = CopyFromReg t0, Register:v2i32 %2
```

Fix by adding the five missing NopCast_pat entries.
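
A bitcast only reinterprets bits. This C++ sketch shows the f64 <-> v2i32 case at the source level, modeling v2i32 as two 32-bit lanes (`V2I32` is a hypothetical alias) and copying the bits with `std::memcpy`; C++20's `std::bit_cast` would do the same. Nothing is computed, which is why the Hexagon pattern can be a no-op when both types live in the same DoubleRegs register pair.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// v2i32 modeled as two 32-bit lanes: 64 bits in total, like a double.
using V2I32 = std::array<std::uint32_t, 2>;
static_assert(sizeof(V2I32) == sizeof(double), "both types must be 64 bits wide");

// f64 -> v2i32: copy the bits unchanged; no numeric conversion happens.
V2I32 f64ToV2i32(double d) {
  V2I32 v;
  std::memcpy(v.data(), &d, sizeof d);
  return v;
}

// v2i32 -> f64: the inverse reinterpretation.
double v2i32ToF64(const V2I32 &v) {
  double d;
  std::memcpy(&d, v.data(), sizeof d);
  return d;
}
```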

Fixes: llvm#195495
…iles (llvm#201643)

Makes it easier to move around crash diagnostics.

Reland of llvm#198838 with crash-diagnostics-tar.c and
crash-report-crashfile.m fixed.
After llvm#199152, CMake failed for me with:

```
CMake Error at cmake/modules/AddLLVM.cmake:2805 (get_target_property):
  get_target_property() called with non-existent target "llvm-nm".
Call Stack (most recent call first):
  F:/Dev/llvm-project/lldb/source/API/CMakeLists.txt:205 (get_host_tool_path)
```

I'm not sure why it didn't fail in CI or on the buildbots. The fix here
is to add llvm-nm before lldb like we do with other projects.
Relands llvm#199528 

This implements a new strategy for collecting the template arguments,
relying on the qualifiers and template parameter lists to navigate the
template context of out-of-line definitions.

This greatly simplifies the signature of that function by removing a
bunch of workarounds, and simplifies a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch spends most of its effort making sure these are placed,
transformed, and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the
template parameter lists by storing their own template parameter list in
a dedicated field, similar to partial specializations.
@ronlieb ronlieb merged commit 16c790e into amd-staging Jun 5, 2026
150 of 161 checks passed
@ronlieb ronlieb deleted the amd/merge/upstream_merge_20260604142231 branch June 5, 2026 02:34