1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow,
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instruction (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned.
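///
/// As a rough illustration only (the pass emits LLVM IR, and the exact shadow
/// address computation is target-specific), a store "*p = a + b" becomes,
/// conceptually:
///
///   sc = shadow(a) | shadow(b);   // propagate shadow through the add
///   *shadow_addr(p) = sc;         // store shadow alongside the data
///   *p = a + b;                   // original store
///
/// and a branch on a value v is preceded by a check that shadow(v) is clean,
/// with a call to __msan_warning* otherwise.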
21///
22/// But there are differences too. The first and the major one:
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
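///
/// For illustration (conceptual, not the emitted IR), a call "r = f(x)" is
/// instrumented on the caller side roughly as:
///
///   __msan_param_tls[0] = shadow(x);
///   r = f(x);
///   shadow(r) = __msan_retval_tls;
///
/// and the callee copies its parameter shadow out of __msan_param_tls on
/// entry.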
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
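///
/// For example, for "c = a + b" with origin tracking enabled, the propagation
/// is, conceptually:
///
///   origin(c) = (shadow(b) != 0) ? origin(b) : origin(a);
///
/// i.e. the origin of a dirty operand is picked, with later dirty operands
/// winning.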
56///
57/// Every aligned group of 4 consecutive bytes of application memory has one
58/// origin value associated with it. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needlessly overwriting the origin of the 4-byte region
66/// on a short (i.e. 1-byte) clean store, and it is also good for performance.
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of an application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, an atomic
72/// store to two disjoint locations cannot be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
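///
/// Conceptually (illustrative pseudo-code, not the emitted IR), the
/// instrumented store/load pair looks like:
///
///   *shadow_addr(p) = 0;            // clean shadow, stored before the store
///   atomic_store(p, v, release);
///   ...
///   v = atomic_load(p, acquire);
///   s = *shadow_addr(p);            // shadow loaded after the load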
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code, LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It may be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics may only be visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach, generating calls to
101/// __msan_instrument_asm_store(ptr, size),
102/// which defer the memory unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
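///
/// As an illustration (conservative mode), for an asm statement with a memory
/// output such as "asm volatile(... : "=m"(var))" the pass emits, before the
/// asm itself, roughly:
///
///   __msan_instrument_asm_store(&var, sizeof(var));
///
/// leaving it to the runtime to decide whether the shadow of var can be
/// touched.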
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses
121/// are possible and return the pointers to shadow and origin memory.
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter (see the sketch after this list).
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted into every instrumented function before the entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
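///
/// As a rough sketch (not the exact IR), a 4-byte store "*p = v" is
/// instrumented under KMSAN along the lines of:
///
///   {shadow_ptr, origin_ptr} = __msan_metadata_ptr_for_store_4(p);
///   *shadow_ptr = shadow(v);
///   if (shadow(v) != 0) *origin_ptr = origin(v);
///   *p = v;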
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
225static const Align kMinOriginAlignment = Align(4);
226static const Align kShadowTLSAlignment = Align(8);
227
228// These constants must be kept in sync with the ones in msan.h.
229static const unsigned kParamTLSSize = 800;
230static const unsigned kRetvalTLSSize = 800;
231
232// Access sizes are powers of two: 1, 2, 4, 8.
233static const size_t kNumberOfAccessSizes = 4;
234
235/// Track origins of uninitialized values.
236///
237/// Adds a section to MemorySanitizer report that points to the allocation
238/// (stack or heap) the uninitialized bits came from originally.
240 "msan-track-origins",
241 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
242 cl::init(0));
243
244static cl::opt<bool> ClKeepGoing("msan-keep-going",
245 cl::desc("keep going after reporting a UMR"),
246 cl::Hidden, cl::init(false));
247
248static cl::opt<bool>
249 ClPoisonStack("msan-poison-stack",
250 cl::desc("poison uninitialized stack variables"), cl::Hidden,
251 cl::init(true));
252
254 "msan-poison-stack-with-call",
255 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
256 cl::init(false));
257
259 "msan-poison-stack-pattern",
260 cl::desc("poison uninitialized stack variables with the given pattern"),
261 cl::Hidden, cl::init(0xff));
262
263static cl::opt<bool>
264 ClPrintStackNames("msan-print-stack-names",
265 cl::desc("Print name of local stack variable"),
266 cl::Hidden, cl::init(true));
267
268static cl::opt<bool>
269 ClPoisonUndef("msan-poison-undef",
270 cl::desc("Poison fully undef temporary values. "
271 "Partially undefined constant vectors "
272 "are unaffected by this flag (see "
273 "-msan-poison-undef-vectors)."),
274 cl::Hidden, cl::init(true));
275
277 "msan-poison-undef-vectors",
278 cl::desc("Precisely poison partially undefined constant vectors. "
279 "If false (legacy behavior), the entire vector is "
280 "considered fully initialized, which may lead to false "
281 "negatives. Fully undefined constant vectors are "
282 "unaffected by this flag (see -msan-poison-undef)."),
283 cl::Hidden, cl::init(false));
284
286 "msan-precise-disjoint-or",
287 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
288 "disjointedness is ignored (i.e., 1|1 is initialized)."),
289 cl::Hidden, cl::init(false));
290
291static cl::opt<bool>
292 ClHandleICmp("msan-handle-icmp",
293 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
294 cl::Hidden, cl::init(true));
295
296static cl::opt<bool>
297 ClHandleICmpExact("msan-handle-icmp-exact",
298 cl::desc("exact handling of relational integer ICmp"),
299 cl::Hidden, cl::init(true));
300
302 "msan-handle-lifetime-intrinsics",
303 cl::desc(
304 "when possible, poison scoped variables at the beginning of the scope "
305 "(slower, but more precise)"),
306 cl::Hidden, cl::init(true));
307
308// When compiling the Linux kernel, we sometimes see false positives related to
309// MSan being unable to understand that inline assembly calls may initialize
310// local variables.
311// This flag makes the compiler conservatively unpoison every memory location
312// passed into an assembly call. Note that this may cause false positives.
313// Because it's impossible to figure out the array sizes, we can only unpoison
314// the first sizeof(type) bytes for each type* pointer.
316 "msan-handle-asm-conservative",
317 cl::desc("conservative handling of inline assembly"), cl::Hidden,
318 cl::init(true));
319
320// This flag controls whether we check the shadow of the address
321// operand of load or store. Such bugs are very rare, since load from
322// a garbage address typically results in SEGV, but still happen
323// (e.g. only lower bits of address are garbage, or the access happens
324// early at program startup where malloc-ed memory is more likely to
325// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
327 "msan-check-access-address",
328 cl::desc("report accesses through a pointer which has poisoned shadow"),
329 cl::Hidden, cl::init(true));
330
332 "msan-eager-checks",
333 cl::desc("check arguments and return values at function call boundaries"),
334 cl::Hidden, cl::init(false));
335
337 "msan-dump-strict-instructions",
338 cl::desc("print out instructions with default strict semantics i.e.,"
339 "check that all the inputs are fully initialized, and mark "
340 "the output as fully initialized. These semantics are applied "
341 "to instructions that could not be handled explicitly nor "
342 "heuristically."),
343 cl::Hidden, cl::init(false));
344
345// Currently, all the heuristically handled instructions are specifically
346// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
347// to parallel 'msan-dump-strict-instructions', and to keep the door open to
348// handling non-intrinsic instructions heuristically.
350 "msan-dump-heuristic-instructions",
351 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
352 "Use -msan-dump-strict-instructions to print instructions that "
353 "could not be handled explicitly nor heuristically."),
354 cl::Hidden, cl::init(false));
355
357 "msan-instrumentation-with-call-threshold",
358 cl::desc(
359 "If the function being instrumented requires more than "
360 "this number of checks and origin stores, use callbacks instead of "
361 "inline checks (-1 means never use callbacks)."),
362 cl::Hidden, cl::init(3500));
363
364static cl::opt<bool>
365 ClEnableKmsan("msan-kernel",
366 cl::desc("Enable KernelMemorySanitizer instrumentation"),
367 cl::Hidden, cl::init(false));
368
369static cl::opt<bool>
370 ClDisableChecks("msan-disable-checks",
371 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
372 cl::init(false));
373
374static cl::opt<bool>
375 ClCheckConstantShadow("msan-check-constant-shadow",
376 cl::desc("Insert checks for constant shadow values"),
377 cl::Hidden, cl::init(true));
378
379// This is off by default because of a bug in gold:
380// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
381static cl::opt<bool>
382 ClWithComdat("msan-with-comdat",
383 cl::desc("Place MSan constructors in comdat sections"),
384 cl::Hidden, cl::init(false));
385
386// These options allow the user to specify custom memory map parameters.
387// See MemoryMapParams for details.
388static cl::opt<uint64_t> ClAndMask("msan-and-mask",
389 cl::desc("Define custom MSan AndMask"),
390 cl::Hidden, cl::init(0));
391
392static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
393 cl::desc("Define custom MSan XorMask"),
394 cl::Hidden, cl::init(0));
395
396static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
397 cl::desc("Define custom MSan ShadowBase"),
398 cl::Hidden, cl::init(0));
399
400static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
401 cl::desc("Define custom MSan OriginBase"),
402 cl::Hidden, cl::init(0));
403
404static cl::opt<int>
405 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
406 cl::desc("Define threshold for number of checks per "
407 "debug location to force origin update."),
408 cl::Hidden, cl::init(3));
409
410const char kMsanModuleCtorName[] = "msan.module_ctor";
411const char kMsanInitName[] = "__msan_init";
412
413namespace {
414
415// Memory map parameters used in application-to-shadow address calculation.
416// Offset = (Addr & ~AndMask) ^ XorMask
417// Shadow = ShadowBase + Offset
418// Origin = OriginBase + Offset
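// For example, with the x86_64 Linux parameters below (AndMask = 0,
// XorMask = 0x500000000000, ShadowBase = 0, OriginBase = 0x100000000000),
// an application address of 0x700000000000 maps to
//   Offset = 0x700000000000 ^ 0x500000000000 = 0x200000000000
//   Shadow = 0x200000000000
//   Origin = 0x300000000000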
419struct MemoryMapParams {
420 uint64_t AndMask;
421 uint64_t XorMask;
422 uint64_t ShadowBase;
423 uint64_t OriginBase;
424};
425
426struct PlatformMemoryMapParams {
427 const MemoryMapParams *bits32;
428 const MemoryMapParams *bits64;
429};
430
431} // end anonymous namespace
432
433// i386 Linux
434static const MemoryMapParams Linux_I386_MemoryMapParams = {
435 0x000080000000, // AndMask
436 0, // XorMask (not used)
437 0, // ShadowBase (not used)
438 0x000040000000, // OriginBase
439};
440
441// x86_64 Linux
442static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
443 0, // AndMask (not used)
444 0x500000000000, // XorMask
445 0, // ShadowBase (not used)
446 0x100000000000, // OriginBase
447};
448
449// mips32 Linux
450// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
451// after picking good constants
452
453// mips64 Linux
454static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
455 0, // AndMask (not used)
456 0x008000000000, // XorMask
457 0, // ShadowBase (not used)
458 0x002000000000, // OriginBase
459};
460
461// ppc32 Linux
462// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
463// after picking good constants
464
465// ppc64 Linux
466static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
467 0xE00000000000, // AndMask
468 0x100000000000, // XorMask
469 0x080000000000, // ShadowBase
470 0x1C0000000000, // OriginBase
471};
472
473// s390x Linux
474static const MemoryMapParams Linux_S390X_MemoryMapParams = {
475 0xC00000000000, // AndMask
476 0, // XorMask (not used)
477 0x080000000000, // ShadowBase
478 0x1C0000000000, // OriginBase
479};
480
481// arm32 Linux
482// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
483// after picking good constants
484
485// aarch64 Linux
486static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
487 0, // AndMask (not used)
488 0x0B00000000000, // XorMask
489 0, // ShadowBase (not used)
490 0x0200000000000, // OriginBase
491};
492
493// loongarch64 Linux
494static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
495 0, // AndMask (not used)
496 0x500000000000, // XorMask
497 0, // ShadowBase (not used)
498 0x100000000000, // OriginBase
499};
500
501// riscv32 Linux
502// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
503// after picking good constants
504
505// aarch64 FreeBSD
506static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
507 0x1800000000000, // AndMask
508 0x0400000000000, // XorMask
509 0x0200000000000, // ShadowBase
510 0x0700000000000, // OriginBase
511};
512
513// i386 FreeBSD
514static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
515 0x000180000000, // AndMask
516 0x000040000000, // XorMask
517 0x000020000000, // ShadowBase
518 0x000700000000, // OriginBase
519};
520
521// x86_64 FreeBSD
522static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
523 0xc00000000000, // AndMask
524 0x200000000000, // XorMask
525 0x100000000000, // ShadowBase
526 0x380000000000, // OriginBase
527};
528
529// x86_64 NetBSD
530static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
531 0, // AndMask
532 0x500000000000, // XorMask
533 0, // ShadowBase
534 0x100000000000, // OriginBase
535};
536
537static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
538 &Linux_I386_MemoryMapParams,
539 &Linux_X86_64_MemoryMapParams,
540};
541
542static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
543 nullptr,
544 &Linux_MIPS64_MemoryMapParams,
545};
546
547static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
548 nullptr,
549 &Linux_PowerPC64_MemoryMapParams,
550};
551
552static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
553 nullptr,
554 &Linux_S390X_MemoryMapParams,
555};
556
557static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
558 nullptr,
559 &Linux_AArch64_MemoryMapParams,
560};
561
562static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
563 nullptr,
564 &Linux_LoongArch64_MemoryMapParams,
565};
566
567static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
568 nullptr,
569 &FreeBSD_AArch64_MemoryMapParams,
570};
571
572static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
573 &FreeBSD_I386_MemoryMapParams,
574 &FreeBSD_X86_64_MemoryMapParams,
575};
576
577static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
578 nullptr,
579 &NetBSD_X86_64_MemoryMapParams,
580};
581
582namespace {
583
584/// Instrument functions of a module to detect uninitialized reads.
585///
586/// Instantiating MemorySanitizer inserts the msan runtime library API function
587/// declarations into the module if they don't exist already. Instantiating
588/// ensures the __msan_init function is in the list of global constructors for
589/// the module.
590class MemorySanitizer {
591public:
592 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
593 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
594 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
595 initializeModule(M);
596 }
597
598 // MSan cannot be moved or copied because of MapParams.
599 MemorySanitizer(MemorySanitizer &&) = delete;
600 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
601 MemorySanitizer(const MemorySanitizer &) = delete;
602 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
603
604 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
605
606private:
607 friend struct MemorySanitizerVisitor;
608 friend struct VarArgHelperBase;
609 friend struct VarArgAMD64Helper;
610 friend struct VarArgAArch64Helper;
611 friend struct VarArgPowerPC64Helper;
612 friend struct VarArgPowerPC32Helper;
613 friend struct VarArgSystemZHelper;
614 friend struct VarArgI386Helper;
615 friend struct VarArgGenericHelper;
616
617 void initializeModule(Module &M);
618 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
619 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
620 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
621
622 template <typename... ArgsTy>
623 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
624 ArgsTy... Args);
625
626 /// True if we're compiling the Linux kernel.
627 bool CompileKernel;
628 /// Track origins (allocation points) of uninitialized values.
629 int TrackOrigins;
630 bool Recover;
631 bool EagerChecks;
632
633 Triple TargetTriple;
634 LLVMContext *C;
635 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
636 Type *OriginTy;
637 PointerType *PtrTy; ///< Pointer type in the default address space.
638
639 // XxxTLS variables represent the per-thread state in MSan and per-task state
640 // in KMSAN.
641 // For the userspace these point to thread-local globals. In the kernel land
642 // they point to the members of a per-task struct obtained via a call to
643 // __msan_get_context_state().
644
645 /// Thread-local shadow storage for function parameters.
646 Value *ParamTLS;
647
648 /// Thread-local origin storage for function parameters.
649 Value *ParamOriginTLS;
650
651 /// Thread-local shadow storage for function return value.
652 Value *RetvalTLS;
653
654 /// Thread-local origin storage for function return value.
655 Value *RetvalOriginTLS;
656
657 /// Thread-local shadow storage for in-register va_arg function.
658 Value *VAArgTLS;
659
660 /// Thread-local origin storage for in-register va_arg function.
661 Value *VAArgOriginTLS;
662
663 /// Thread-local shadow storage for va_arg overflow area.
664 Value *VAArgOverflowSizeTLS;
665
666 /// Are the instrumentation callbacks set up?
667 bool CallbacksInitialized = false;
668
669 /// The run-time callback to print a warning.
670 FunctionCallee WarningFn;
671
672 // These arrays are indexed by log2(AccessSize).
673 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
674 FunctionCallee MaybeWarningVarSizeFn;
675 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
676
677 /// Run-time helper that generates a new origin value for a stack
678 /// allocation.
679 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
680 // No description version
681 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
682
683 /// Run-time helper that poisons stack on function entry.
684 FunctionCallee MsanPoisonStackFn;
685
686 /// Run-time helper that records a store (or any event) of an
687 /// uninitialized value and returns an updated origin id encoding this info.
688 FunctionCallee MsanChainOriginFn;
689
690 /// Run-time helper that paints an origin over a region.
691 FunctionCallee MsanSetOriginFn;
692
693 /// MSan runtime replacements for memmove, memcpy and memset.
694 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
695
696 /// KMSAN callback for task-local function argument shadow.
697 StructType *MsanContextStateTy;
698 FunctionCallee MsanGetContextStateFn;
699
700 /// Functions for poisoning/unpoisoning local variables
701 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
702
703 /// Pair of shadow/origin pointers.
704 Type *MsanMetadata;
705
706 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
707 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
708 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
709 FunctionCallee MsanMetadataPtrForStore_1_8[4];
710 FunctionCallee MsanInstrumentAsmStoreFn;
711
712 /// Storage for return values of the MsanMetadataPtrXxx functions.
713 Value *MsanMetadataAlloca;
714
715 /// Helper to choose between different MsanMetadataPtrXxx().
716 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
717
718 /// Memory map parameters used in application-to-shadow calculation.
719 const MemoryMapParams *MapParams;
720
721 /// Custom memory map parameters used when -msan-shadow-base or
722 /// -msan-origin-base is provided.
723 MemoryMapParams CustomMapParams;
724
725 MDNode *ColdCallWeights;
726
727 /// Branch weights for origin store.
728 MDNode *OriginStoreWeights;
729};
730
731void insertModuleCtor(Module &M) {
732 getOrCreateSanitizerCtorAndInitFunctions(
733 M, kMsanModuleCtorName, kMsanInitName,
734 /*InitArgTypes=*/{},
735 /*InitArgs=*/{},
736 // This callback is invoked when the functions are created the first
737 // time. Hook them into the global ctors list in that case:
738 [&](Function *Ctor, FunctionCallee) {
739 if (!ClWithComdat) {
740 appendToGlobalCtors(M, Ctor, 0);
741 return;
742 }
743 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
744 Ctor->setComdat(MsanCtorComdat);
745 appendToGlobalCtors(M, Ctor, 0, Ctor);
746 });
747}
748
749template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
750 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
751}
752
753} // end anonymous namespace
754
755MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
756 bool EagerChecks)
757 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
758 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
759 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
760 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
761
762PreservedAnalyses MemorySanitizerPass::run(Module &M,
763 ModuleAnalysisManager &AM) {
764 // Return early if the nosanitize_memory module flag is present for the module.
765 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
766 return PreservedAnalyses::all();
767 bool Modified = false;
768 if (!Options.Kernel) {
769 insertModuleCtor(M);
770 Modified = true;
771 }
772
773 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
774 for (Function &F : M) {
775 if (F.empty())
776 continue;
777 MemorySanitizer Msan(*F.getParent(), Options);
778 Modified |=
779 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
780 }
781
782 if (!Modified)
783 return PreservedAnalyses::all();
784
785 PreservedAnalyses PA = PreservedAnalyses::none();
786 // GlobalsAA is considered stateless and does not get invalidated unless
787 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
788 // make changes that require GlobalsAA to be invalidated.
789 PA.abandon<GlobalsAA>();
790 return PA;
791}
792
793void MemorySanitizerPass::printPipeline(
794 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
795 static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
796 OS, MapClassName2PassName);
797 OS << '<';
798 if (Options.Recover)
799 OS << "recover;";
800 if (Options.Kernel)
801 OS << "kernel;";
802 if (Options.EagerChecks)
803 OS << "eager-checks;";
804 OS << "track-origins=" << Options.TrackOrigins;
805 OS << '>';
806}
807
808/// Create a non-const global initialized with the given string.
809///
810/// Creates a writable global for Str so that we can pass it to the
811/// run-time lib. Runtime uses first 4 bytes of the string to store the
812/// frame ID, so the string needs to be mutable.
814 StringRef Str) {
815 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
816 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
817 GlobalValue::PrivateLinkage, StrConst, "");
818}
819
820template <typename... ArgsTy>
822MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
823 ArgsTy... Args) {
824 if (TargetTriple.getArch() == Triple::systemz) {
825 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
826 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
827 std::forward<ArgsTy>(Args)...);
828 }
829
830 return M.getOrInsertFunction(Name, MsanMetadata,
831 std::forward<ArgsTy>(Args)...);
832}
833
834/// Create KMSAN API callbacks.
835void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
836 IRBuilder<> IRB(*C);
837
838 // These will be initialized in insertKmsanPrologue().
839 RetvalTLS = nullptr;
840 RetvalOriginTLS = nullptr;
841 ParamTLS = nullptr;
842 ParamOriginTLS = nullptr;
843 VAArgTLS = nullptr;
844 VAArgOriginTLS = nullptr;
845 VAArgOverflowSizeTLS = nullptr;
846
847 WarningFn = M.getOrInsertFunction("__msan_warning",
848 TLI.getAttrList(C, {0}, /*Signed=*/false),
849 IRB.getVoidTy(), IRB.getInt32Ty());
850
851 // Requests the per-task context state (kmsan_context_state*) from the
852 // runtime library.
853 MsanContextStateTy = StructType::get(
854 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
855 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
856 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
857 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
858 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
859 OriginTy);
860 MsanGetContextStateFn =
861 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
862
863 MsanMetadata = StructType::get(PtrTy, PtrTy);
864
865 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
866 std::string name_load =
867 "__msan_metadata_ptr_for_load_" + std::to_string(size);
868 std::string name_store =
869 "__msan_metadata_ptr_for_store_" + std::to_string(size);
870 MsanMetadataPtrForLoad_1_8[ind] =
871 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
872 MsanMetadataPtrForStore_1_8[ind] =
873 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
874 }
875
876 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
877 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
878 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
879 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
880
881 // Functions for poisoning and unpoisoning memory.
882 MsanPoisonAllocaFn = M.getOrInsertFunction(
883 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
884 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
885 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
886}
887
888static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
889 return M.getOrInsertGlobal(Name, Ty, [&] {
890 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
891 nullptr, Name, nullptr,
892 GlobalVariable::InitialExecTLSModel);
893 });
894}
895
896/// Insert declarations for userspace-specific functions and globals.
897void MemorySanitizer::createUserspaceApi(Module &M,
898 const TargetLibraryInfo &TLI) {
899 IRBuilder<> IRB(*C);
900
901 // Create the callback.
902 // FIXME: this function should have "Cold" calling conv,
903 // which is not yet implemented.
904 if (TrackOrigins) {
905 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
906 : "__msan_warning_with_origin_noreturn";
907 WarningFn = M.getOrInsertFunction(WarningFnName,
908 TLI.getAttrList(C, {0}, /*Signed=*/false),
909 IRB.getVoidTy(), IRB.getInt32Ty());
910 } else {
911 StringRef WarningFnName =
912 Recover ? "__msan_warning" : "__msan_warning_noreturn";
913 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
914 }
915
916 // Create the global TLS variables.
917 RetvalTLS =
918 getOrInsertGlobal(M, "__msan_retval_tls",
919 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
920
921 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
922
923 ParamTLS =
924 getOrInsertGlobal(M, "__msan_param_tls",
925 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
926
927 ParamOriginTLS =
928 getOrInsertGlobal(M, "__msan_param_origin_tls",
929 ArrayType::get(OriginTy, kParamTLSSize / 4));
930
931 VAArgTLS =
932 getOrInsertGlobal(M, "__msan_va_arg_tls",
933 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
934
935 VAArgOriginTLS =
936 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
937 ArrayType::get(OriginTy, kParamTLSSize / 4));
938
939 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
940 IRB.getIntPtrTy(M.getDataLayout()));
941
942 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
943 AccessSizeIndex++) {
944 unsigned AccessSize = 1 << AccessSizeIndex;
945 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
946 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
947 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
948 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
949 MaybeWarningVarSizeFn = M.getOrInsertFunction(
950 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
951 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
952 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
953 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
954 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
955 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
956 IRB.getInt32Ty());
957 }
958
959 MsanSetAllocaOriginWithDescriptionFn =
960 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
961 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
962 MsanSetAllocaOriginNoDescriptionFn =
963 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
964 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
965 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
966 IRB.getVoidTy(), PtrTy, IntptrTy);
967}
968
969/// Insert extern declaration of runtime-provided functions and globals.
970void MemorySanitizer::initializeCallbacks(Module &M,
971 const TargetLibraryInfo &TLI) {
972 // Only do this once.
973 if (CallbacksInitialized)
974 return;
975
976 IRBuilder<> IRB(*C);
977 // Initialize callbacks that are common for kernel and userspace
978 // instrumentation.
979 MsanChainOriginFn = M.getOrInsertFunction(
980 "__msan_chain_origin",
981 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
982 IRB.getInt32Ty());
983 MsanSetOriginFn = M.getOrInsertFunction(
984 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
985 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
986 MemmoveFn =
987 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
988 MemcpyFn =
989 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
990 MemsetFn = M.getOrInsertFunction("__msan_memset",
991 TLI.getAttrList(C, {1}, /*Signed=*/true),
992 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
993
994 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
995 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
996
997 if (CompileKernel) {
998 createKernelApi(M, TLI);
999 } else {
1000 createUserspaceApi(M, TLI);
1001 }
1002 CallbacksInitialized = true;
1003}
1004
1005FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1006 int size) {
1007 FunctionCallee *Fns =
1008 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1009 switch (size) {
1010 case 1:
1011 return Fns[0];
1012 case 2:
1013 return Fns[1];
1014 case 4:
1015 return Fns[2];
1016 case 8:
1017 return Fns[3];
1018 default:
1019 return nullptr;
1020 }
1021}
1022
1023/// Module-level initialization.
1024///
1025/// Inserts a call to __msan_init into the module's constructor list.
1026void MemorySanitizer::initializeModule(Module &M) {
1027 auto &DL = M.getDataLayout();
1028
1029 TargetTriple = M.getTargetTriple();
1030
1031 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1032 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1033 // Check the overrides first
1034 if (ShadowPassed || OriginPassed) {
1035 CustomMapParams.AndMask = ClAndMask;
1036 CustomMapParams.XorMask = ClXorMask;
1037 CustomMapParams.ShadowBase = ClShadowBase;
1038 CustomMapParams.OriginBase = ClOriginBase;
1039 MapParams = &CustomMapParams;
1040 } else {
1041 switch (TargetTriple.getOS()) {
1042 case Triple::FreeBSD:
1043 switch (TargetTriple.getArch()) {
1044 case Triple::aarch64:
1045 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1046 break;
1047 case Triple::x86_64:
1048 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1049 break;
1050 case Triple::x86:
1051 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1052 break;
1053 default:
1054 report_fatal_error("unsupported architecture");
1055 }
1056 break;
1057 case Triple::NetBSD:
1058 switch (TargetTriple.getArch()) {
1059 case Triple::x86_64:
1060 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1061 break;
1062 default:
1063 report_fatal_error("unsupported architecture");
1064 }
1065 break;
1066 case Triple::Linux:
1067 switch (TargetTriple.getArch()) {
1068 case Triple::x86_64:
1069 MapParams = Linux_X86_MemoryMapParams.bits64;
1070 break;
1071 case Triple::x86:
1072 MapParams = Linux_X86_MemoryMapParams.bits32;
1073 break;
1074 case Triple::mips64:
1075 case Triple::mips64el:
1076 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1077 break;
1078 case Triple::ppc64:
1079 case Triple::ppc64le:
1080 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1081 break;
1082 case Triple::systemz:
1083 MapParams = Linux_S390_MemoryMapParams.bits64;
1084 break;
1085 case Triple::aarch64:
1086 case Triple::aarch64_be:
1087 MapParams = Linux_ARM_MemoryMapParams.bits64;
1088 break;
1089 case Triple::loongarch64:
1090 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1091 break;
1092 default:
1093 report_fatal_error("unsupported architecture");
1094 }
1095 break;
1096 default:
1097 report_fatal_error("unsupported operating system");
1098 }
1099 }
1100
1101 C = &(M.getContext());
1102 IRBuilder<> IRB(*C);
1103 IntptrTy = IRB.getIntPtrTy(DL);
1104 OriginTy = IRB.getInt32Ty();
1105 PtrTy = IRB.getPtrTy();
1106
1107 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1108 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1109
1110 if (!CompileKernel) {
1111 if (TrackOrigins)
1112 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1113 return new GlobalVariable(
1114 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1115 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1116 });
1117
1118 if (Recover)
1119 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1120 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1121 GlobalValue::WeakODRLinkage,
1122 IRB.getInt32(Recover), "__msan_keep_going");
1123 });
1124 }
1125}
1126
1127namespace {
1128
1129/// A helper class that handles instrumentation of VarArg
1130/// functions on a particular platform.
1131///
1132/// Implementations are expected to insert the instrumentation
1133/// necessary to propagate argument shadow through VarArg function
1134/// calls. Visit* methods are called during an InstVisitor pass over
1135/// the function, and should avoid creating new basic blocks. A new
1136/// instance of this class is created for each instrumented function.
1137struct VarArgHelper {
1138 virtual ~VarArgHelper() = default;
1139
1140 /// Visit a CallBase.
1141 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1142
1143 /// Visit a va_start call.
1144 virtual void visitVAStartInst(VAStartInst &I) = 0;
1145
1146 /// Visit a va_copy call.
1147 virtual void visitVACopyInst(VACopyInst &I) = 0;
1148
1149 /// Finalize function instrumentation.
1150 ///
1151 /// This method is called after visiting all interesting (see above)
1152 /// instructions in a function.
1153 virtual void finalizeInstrumentation() = 0;
1154};
1155
1156struct MemorySanitizerVisitor;
1157
1158} // end anonymous namespace
1159
1160static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1161 MemorySanitizerVisitor &Visitor);
1162
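// Illustrative mapping (derived from the function below): 8 bits -> index 0
// (1-byte access), 16 -> 1, 32 -> 2, 64 -> 3; 128 bits or more yields
// kNumberOfAccessSizes, i.e. no fixed-size fast-path callback applies.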
1163static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1164 if (TS.isScalable())
1165 // Scalable types unconditionally take slowpaths.
1166 return kNumberOfAccessSizes;
1167 unsigned TypeSizeFixed = TS.getFixedValue();
1168 if (TypeSizeFixed <= 8)
1169 return 0;
1170 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1171}
1172
1173namespace {
1174
1175/// Helper class to attach the debug information of a given instruction to the
1176/// new instructions inserted after it.
1177class NextNodeIRBuilder : public IRBuilder<> {
1178public:
1179 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1180 SetCurrentDebugLocation(IP->getDebugLoc());
1181 }
1182};
1183
1184/// This class does all the work for a given function. Store and Load
1185/// instructions store and load corresponding shadow and origin
1186/// values. Most instructions propagate shadow from arguments to their
1187/// return values. Certain instructions (most importantly, BranchInst)
1188/// test their argument shadow and print reports (with a runtime call) if it's
1189/// non-zero.
1190struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1191 Function &F;
1192 MemorySanitizer &MS;
1193 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1194 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1195 std::unique_ptr<VarArgHelper> VAHelper;
1196 const TargetLibraryInfo *TLI;
1197 Instruction *FnPrologueEnd;
1198 SmallVector<Instruction *, 16> Instructions;
1199
1200 // The following flags disable parts of MSan instrumentation based on
1201 // exclusion list contents and command-line options.
1202 bool InsertChecks;
1203 bool PropagateShadow;
1204 bool PoisonStack;
1205 bool PoisonUndef;
1206 bool PoisonUndefVectors;
1207
1208 struct ShadowOriginAndInsertPoint {
1209 Value *Shadow;
1210 Value *Origin;
1211 Instruction *OrigIns;
1212
1213 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1214 : Shadow(S), Origin(O), OrigIns(I) {}
1215 };
1216 SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
1217 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1218 SmallSetVector<AllocaInst *, 16> AllocaSet;
1219 SmallVector<std::pair<IntrinsicInst *, AllocaInst *>, 16> LifetimeStartList;
1220 SmallVector<StoreInst *, 16> StoreList;
1221 int64_t SplittableBlocksCount = 0;
1222
1223 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1224 const TargetLibraryInfo &TLI)
1225 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1226 bool SanitizeFunction =
1227 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1228 InsertChecks = SanitizeFunction;
1229 PropagateShadow = SanitizeFunction;
1230 PoisonStack = SanitizeFunction && ClPoisonStack;
1231 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1232 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1233
1234 // In the presence of unreachable blocks, we may see Phi nodes with
1235 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1236 // blocks, such nodes will not have any shadow value associated with them.
1237 // It's easier to remove unreachable blocks than deal with missing shadow.
1238 removeUnreachableBlocks(F);
1239
1240 MS.initializeCallbacks(*F.getParent(), TLI);
1241 FnPrologueEnd =
1242 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1243 .CreateIntrinsic(Intrinsic::donothing, {});
1244
1245 if (MS.CompileKernel) {
1246 IRBuilder<> IRB(FnPrologueEnd);
1247 insertKmsanPrologue(IRB);
1248 }
1249
1250 LLVM_DEBUG(if (!InsertChecks) dbgs()
1251 << "MemorySanitizer is not inserting checks into '"
1252 << F.getName() << "'\n");
1253 }
1254
1255 bool instrumentWithCalls(Value *V) {
1256 // Constants likely will be eliminated by follow-up passes.
1257 if (isa<Constant>(V))
1258 return false;
1259 ++SplittableBlocksCount;
1260 return ClInstrumentationWithCallThreshold >= 0 &&
1261 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1262 }
1263
1264 bool isInPrologue(Instruction &I) {
1265 return I.getParent() == FnPrologueEnd->getParent() &&
1266 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1267 }
1268
1269 // Creates a new origin and records the stack trace. In general we can call
1270 // this function for any origin manipulation we like. However, it costs
1271 // runtime resources, so use it wisely, only where it can provide additional
1272 // information helpful to a user.
1273 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1274 if (MS.TrackOrigins <= 1)
1275 return V;
1276 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1277 }
1278
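  // Widens a 4-byte origin to pointer size by replicating it. E.g. on a
  // 64-bit target, origin 0x0000BEEF becomes 0x0000BEEF0000BEEF, so a single
  // intptr-sized store paints two adjacent 4-byte origin slots at once.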
1279 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1280 const DataLayout &DL = F.getDataLayout();
1281 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1282 if (IntptrSize == kOriginSize)
1283 return Origin;
1284 assert(IntptrSize == kOriginSize * 2);
1285 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1286 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1287 }
1288
1289 /// Fill memory range with the given origin value.
1290 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1291 TypeSize TS, Align Alignment) {
1292 const DataLayout &DL = F.getDataLayout();
1293 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1294 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1295 assert(IntptrAlignment >= kMinOriginAlignment);
1296 assert(IntptrSize >= kOriginSize);
1297
1298 // Note: The loop-based formulation works for fixed-length vectors too;
1299 // however, we prefer to unroll and specialize alignment below.
1300 if (TS.isScalable()) {
1301 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1302 Value *RoundUp =
1303 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1304 Value *End =
1305 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1306 auto [InsertPt, Index] =
1307 SplitBlockAndInsertSimpleForLoop(End, &*IRB.GetInsertPoint());
1308 IRB.SetInsertPoint(InsertPt);
1309
1310 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1311 IRB.CreateAlignedStore(Origin, GEP, kMinOriginAlignment);
1312 return;
1313 }
1314
1315 unsigned Size = TS.getFixedValue();
1316
1317 unsigned Ofs = 0;
1318 Align CurrentAlignment = Alignment;
1319 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1320 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1321 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1322 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1323 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1324 : IntptrOriginPtr;
1325 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1326 Ofs += IntptrSize / kOriginSize;
1327 CurrentAlignment = IntptrAlignment;
1328 }
1329 }
1330
1331 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1332 Value *GEP =
1333 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1334 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1335 CurrentAlignment = kMinOriginAlignment;
1336 }
1337 }
1338
1339 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1340 Value *OriginPtr, Align Alignment) {
1341 const DataLayout &DL = F.getDataLayout();
1342 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1343 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1344 // ZExt cannot convert between vector and scalar
1345 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1346 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1347 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1348 // Origin is not needed: value is initialized or const shadow is
1349 // ignored.
1350 return;
1351 }
1352 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1353 // Copy origin as the value is definitely uninitialized.
1354 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1355 OriginAlignment);
1356 return;
1357 }
1358 // Fallback to runtime check, which still can be optimized out later.
1359 }
1360
1361 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1362 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1363 if (instrumentWithCalls(ConvertedShadow) &&
1364 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1365 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1366 Value *ConvertedShadow2 =
1367 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1368 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1369 CB->addParamAttr(0, Attribute::ZExt);
1370 CB->addParamAttr(2, Attribute::ZExt);
1371 } else {
1372 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1373 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1374 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1375 IRBuilder<> IRBNew(CheckTerm);
1376 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1377 OriginAlignment);
1378 }
1379 }
1380
1381 void materializeStores() {
1382 for (StoreInst *SI : StoreList) {
1383 IRBuilder<> IRB(SI);
1384 Value *Val = SI->getValueOperand();
1385 Value *Addr = SI->getPointerOperand();
1386 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1387 Value *ShadowPtr, *OriginPtr;
1388 Type *ShadowTy = Shadow->getType();
1389 const Align Alignment = SI->getAlign();
1390 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1391 std::tie(ShadowPtr, OriginPtr) =
1392 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1393
1394 [[maybe_unused]] StoreInst *NewSI =
1395 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1396 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1397
1398 if (SI->isAtomic())
1399 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1400
1401 if (MS.TrackOrigins && !SI->isAtomic())
1402 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1403 OriginAlignment);
1404 }
1405 }
1406
1407 // Returns true if Debug Location corresponds to multiple warnings.
1408 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1409 if (MS.TrackOrigins < 2)
1410 return false;
1411
1412 if (LazyWarningDebugLocationCount.empty())
1413 for (const auto &I : InstrumentationList)
1414 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1415
1416 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1417 }
1418
1419 /// Helper function to insert a warning at IRB's current insert point.
1420 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1421 if (!Origin)
1422 Origin = (Value *)IRB.getInt32(0);
1423 assert(Origin->getType()->isIntegerTy());
1424
1425 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1426 // Try to create additional origin with debug info of the last origin
1427 // instruction. It may provide additional information to the user.
1428 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1429 assert(MS.TrackOrigins);
1430 auto NewDebugLoc = OI->getDebugLoc();
1431 // Origin update with missing or the same debug location provides no
1432 // additional value.
1433 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1434 // Insert update just before the check, so we call runtime only just
1435 // before the report.
1436 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1437 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1438 Origin = updateOrigin(Origin, IRBOrigin);
1439 }
1440 }
1441 }
1442
1443 if (MS.CompileKernel || MS.TrackOrigins)
1444 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1445 else
1446 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1447 // FIXME: Insert UnreachableInst if !MS.Recover?
1448 // This may invalidate some of the following checks and needs to be done
1449 // at the very end.
1450 }
1451
1452 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1453 Value *Origin) {
1454 const DataLayout &DL = F.getDataLayout();
1455 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1456 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1457 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1458 // ZExt cannot convert between vector and scalar
1459 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1460 Value *ConvertedShadow2 =
1461 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1462
1463 if (SizeIndex < kNumberOfAccessSizes) {
1464 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1465 CallBase *CB = IRB.CreateCall(
1466 Fn,
1467 {ConvertedShadow2,
1468 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1469 CB->addParamAttr(0, Attribute::ZExt);
1470 CB->addParamAttr(1, Attribute::ZExt);
1471 } else {
1472 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1473 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1474 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1475 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1476 CallBase *CB = IRB.CreateCall(
1477 Fn,
1478 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1479 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1480 CB->addParamAttr(1, Attribute::ZExt);
1481 CB->addParamAttr(2, Attribute::ZExt);
1482 }
1483 } else {
1484 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1485 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1486 Cmp, &*IRB.GetInsertPoint(),
1487 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1488
1489 IRB.SetInsertPoint(CheckTerm);
1490 insertWarningFn(IRB, Origin);
1491 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1492 }
1493 }
1494
1495 void materializeInstructionChecks(
1496 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1497 const DataLayout &DL = F.getDataLayout();
1498 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1499 // correct origin.
1500 bool Combine = !MS.TrackOrigins;
1501 Instruction *Instruction = InstructionChecks.front().OrigIns;
1502 Value *Shadow = nullptr;
1503 for (const auto &ShadowData : InstructionChecks) {
1504 assert(ShadowData.OrigIns == Instruction);
1505 IRBuilder<> IRB(Instruction);
1506
1507 Value *ConvertedShadow = ShadowData.Shadow;
1508
1509 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1510 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1511 // Skip, value is initialized or const shadow is ignored.
1512 continue;
1513 }
1514 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1515 // Report as the value is definitely uninitialized.
1516 insertWarningFn(IRB, ShadowData.Origin);
1517 if (!MS.Recover)
1518 return; // Always fail and stop here, no need to check the rest.
1519 // Skip the entire instruction.
1520 continue;
1521 }
1522 // Fallback to runtime check, which still can be optimized out later.
1523 }
1524
1525 if (!Combine) {
1526 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1527 continue;
1528 }
1529
1530 if (!Shadow) {
1531 Shadow = ConvertedShadow;
1532 continue;
1533 }
1534
1535 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1536 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1537 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1538 }
1539
1540 if (Shadow) {
1541 assert(Combine);
1542 IRBuilder<> IRB(Instruction);
1543 materializeOneCheck(IRB, Shadow, nullptr);
1544 }
1545 }
1546
1547 void materializeChecks() {
1548#ifndef NDEBUG
1549 // For assert below.
1550 SmallPtrSet<Instruction *, 16> Done;
1551#endif
1552
1553 for (auto I = InstrumentationList.begin();
1554 I != InstrumentationList.end();) {
1555 auto OrigIns = I->OrigIns;
1556 // Checks are grouped by the original instruction. All checks queued by
1557 // `insertCheckShadow` for an instruction are materialized at once.
1558 assert(Done.insert(OrigIns).second);
1559 auto J = std::find_if(I + 1, InstrumentationList.end(),
1560 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1561 return OrigIns != R.OrigIns;
1562 });
1563 // Process all checks of instruction at once.
1564 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1565 I = J;
1566 }
1567
1568 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1569 }
1570
1571 // Set up the KMSAN prologue: fetch the per-task context state and cache
1571 // pointers to its shadow/origin TLS fields.
1572 void insertKmsanPrologue(IRBuilder<> &IRB) {
1573 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1574 Constant *Zero = IRB.getInt32(0);
1575 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1576 {Zero, IRB.getInt32(0)}, "param_shadow");
1577 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1578 {Zero, IRB.getInt32(1)}, "retval_shadow");
1579 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1580 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1581 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1582 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1583 MS.VAArgOverflowSizeTLS =
1584 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1585 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1586 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1587 {Zero, IRB.getInt32(5)}, "param_origin");
1588 MS.RetvalOriginTLS =
1589 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1590 {Zero, IRB.getInt32(6)}, "retval_origin");
1591 if (MS.TargetTriple.getArch() == Triple::systemz)
1592 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1593 }
1594
1595 /// Add MemorySanitizer instrumentation to a function.
1596 bool runOnFunction() {
1597 // Iterate all BBs in depth-first order and create shadow instructions
1598 // for all instructions (where applicable).
1599 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1600 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1601 visit(*BB);
1602
1603 // `visit` above only collects instructions. Process them after iterating
1604 // the CFG, to avoid depending on CFG transformations.
1605 for (Instruction *I : Instructions)
1606 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1607
1608 // Finalize PHI nodes.
1609 for (PHINode *PN : ShadowPHINodes) {
1610 PHINode *PNS = cast<PHINode>(getShadow(PN));
1611 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1612 size_t NumValues = PN->getNumIncomingValues();
1613 for (size_t v = 0; v < NumValues; v++) {
1614 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1615 if (PNO)
1616 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1617 }
1618 }
1619
1620 VAHelper->finalizeInstrumentation();
1621
1622 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1623 // instrumenting only allocas.
1624 if (InstrumentLifetimeStart) {
1625 for (auto Item : LifetimeStartList) {
1626 instrumentAlloca(*Item.second, Item.first);
1627 AllocaSet.remove(Item.second);
1628 }
1629 }
1630 // Poison the allocas for which we didn't instrument the corresponding
1631 // lifetime intrinsics.
1632 for (AllocaInst *AI : AllocaSet)
1633 instrumentAlloca(*AI);
1634
1635 // Insert shadow value checks.
1636 materializeChecks();
1637
1638 // Delayed instrumentation of StoreInst.
1639 // This may not add new address checks.
1640 materializeStores();
1641
1642 return true;
1643 }
1644
1645 /// Compute the shadow type that corresponds to a given Value.
1646 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1647
1648 /// Compute the shadow type that corresponds to a given Type.
1649 Type *getShadowTy(Type *OrigTy) {
1650 if (!OrigTy->isSized()) {
1651 return nullptr;
1652 }
1653 // For integer type, shadow is the same as the original type.
1654 // This may return weird-sized types like i1.
1655 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1656 return IT;
1657 const DataLayout &DL = F.getDataLayout();
1658 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1659 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1660 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1661 VT->getElementCount());
1662 }
1663 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1664 return ArrayType::get(getShadowTy(AT->getElementType()),
1665 AT->getNumElements());
1666 }
1667 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1668 SmallVector<Type *, 4> Elements;
1669 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1670 Elements.push_back(getShadowTy(ST->getElementType(i)));
1671 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1672 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1673 return Res;
1674 }
1675 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1676 return IntegerType::get(*MS.C, TypeSize);
1677 }
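// [Editorial note -- illustrative addition, not part of MemorySanitizer.cpp]
// Concrete instances of the mapping performed by getShadowTy above, assuming
// a 64-bit DataLayout:
//   i32             ==> i32
//   <4 x float>     ==> <4 x i32>
//   [8 x i16]       ==> [8 x i16]
//   { ptr, double } ==> { i64, i64 }
//   float           ==> i32   (falls through to the final "other sized type" case)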
1678
1679 /// Extract combined shadow of struct elements as a bool
1680 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1681 IRBuilder<> &IRB) {
1682 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1683 Value *Aggregator = FalseVal;
1684
1685 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1686 // Combine by ORing together each element's bool shadow
1687 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1688 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1689
1690 if (Aggregator != FalseVal)
1691 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1692 else
1693 Aggregator = ShadowBool;
1694 }
1695
1696 return Aggregator;
1697 }
1698
1699 // Extract combined shadow of array elements
1700 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1701 IRBuilder<> &IRB) {
1702 if (!Array->getNumElements())
1703 return IRB.getIntN(/* width */ 1, /* value */ 0);
1704
1705 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1706 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1707
1708 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1709 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1710 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1711 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1712 }
1713 return Aggregator;
1714 }
1715
1716 /// Convert a shadow value to its flattened variant. The resulting
1717 /// shadow may not necessarily have the same bit width as the input
1718 /// value, but it will always be comparable to zero.
1719 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1720 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1721 return collapseStructShadow(Struct, V, IRB);
1722 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1723 return collapseArrayShadow(Array, V, IRB);
1724 if (isa<VectorType>(V->getType())) {
1725 if (isa<ScalableVectorType>(V->getType()))
1726 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1727 unsigned BitWidth =
1728 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1729 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1730 }
1731 return V;
1732 }
1733
1734 // Convert a scalar value to an i1 by comparing with 0
1735 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1736 Type *VTy = V->getType();
1737 if (!VTy->isIntegerTy())
1738 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1739 if (VTy->getIntegerBitWidth() == 1)
1740 // Just converting a bool to a bool, so do nothing.
1741 return V;
1742 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1743 }
1744
1745 Type *ptrToIntPtrType(Type *PtrTy) const {
1746 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1747 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1748 VectTy->getElementCount());
1749 }
1750 assert(PtrTy->isIntOrPtrTy());
1751 return MS.IntptrTy;
1752 }
1753
1754 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1755 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1756 return VectorType::get(
1757 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1758 VectTy->getElementCount());
1759 }
1760 assert(IntPtrTy == MS.IntptrTy);
1761 return MS.PtrTy;
1762 }
1763
1764 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1765 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1766 return ConstantVector::getSplat(
1767 VectTy->getElementCount(),
1768 constToIntPtr(VectTy->getElementType(), C));
1769 }
1770 assert(IntPtrTy == MS.IntptrTy);
1771 return ConstantInt::get(MS.IntptrTy, C);
1772 }
1773
1774 /// Returns the integer shadow offset that corresponds to a given
1775 /// application address, whereby:
1776 ///
1777 /// Offset = (Addr & ~AndMask) ^ XorMask
1778 /// Shadow = ShadowBase + Offset
1779 /// Origin = (OriginBase + Offset) & ~Alignment
1780 ///
1781 /// Note: for efficiency, many shadow mappings only require the XorMask
1782 /// and OriginBase; the AndMask and ShadowBase are often zero.
1783 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1784 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1785 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1786
1787 if (uint64_t AndMask = MS.MapParams->AndMask)
1788 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1789
1790 if (uint64_t XorMask = MS.MapParams->XorMask)
1791 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1792 return OffsetLong;
1793 }
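// [Editorial sketch -- illustrative addition, not part of MemorySanitizer.cpp]
// The mapping above, evaluated on plain integers. The constants are assumed
// to match the common Linux/x86_64 layout (AndMask = 0, ShadowBase = 0,
// XorMask = 0x500000000000, OriginBase = 0x100000000000); treat them as an
// example of the scheme, not a specification.
#include <cstdint>

static uint64_t exampleShadowAddr(uint64_t Addr) {
  const uint64_t AndMask = 0x0;               // assumed (unused on x86_64)
  const uint64_t XorMask = 0x500000000000ULL; // assumed
  uint64_t Offset = (Addr & ~AndMask) ^ XorMask;
  return Offset;                              // ShadowBase assumed to be 0
}

static uint64_t exampleOriginAddr(uint64_t Addr) {
  const uint64_t OriginBase = 0x100000000000ULL; // assumed
  uint64_t Offset = exampleShadowAddr(Addr);     // equals Offset, ShadowBase == 0
  return (OriginBase + Offset) & ~3ULL;          // align down to the 4-byte origin slot
}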
1794
1795 /// Compute the shadow and origin addresses corresponding to a given
1796 /// application address.
1797 ///
1798 /// Shadow = ShadowBase + Offset
1799 /// Origin = (OriginBase + Offset) & ~3ULL
1800 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1801 /// a single pointee.
1802 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1803 std::pair<Value *, Value *>
1804 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1805 MaybeAlign Alignment) {
1806 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1807 if (!VectTy) {
1808 assert(Addr->getType()->isPointerTy());
1809 } else {
1810 assert(VectTy->getElementType()->isPointerTy());
1811 }
1812 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1813 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1814 Value *ShadowLong = ShadowOffset;
1815 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1816 ShadowLong =
1817 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1818 }
1819 Value *ShadowPtr = IRB.CreateIntToPtr(
1820 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1821
1822 Value *OriginPtr = nullptr;
1823 if (MS.TrackOrigins) {
1824 Value *OriginLong = ShadowOffset;
1825 uint64_t OriginBase = MS.MapParams->OriginBase;
1826 if (OriginBase != 0)
1827 OriginLong =
1828 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1829 if (!Alignment || *Alignment < kMinOriginAlignment) {
1830 uint64_t Mask = kMinOriginAlignment.value() - 1;
1831 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1832 }
1833 OriginPtr = IRB.CreateIntToPtr(
1834 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1835 }
1836 return std::make_pair(ShadowPtr, OriginPtr);
1837 }
1838
1839 template <typename... ArgsTy>
1840 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1841 ArgsTy... Args) {
1842 if (MS.TargetTriple.getArch() == Triple::systemz) {
1843 IRB.CreateCall(Callee,
1844 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1845 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1846 }
1847
1848 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1849 }
1850
1851 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1852 IRBuilder<> &IRB,
1853 Type *ShadowTy,
1854 bool isStore) {
1855 Value *ShadowOriginPtrs;
1856 const DataLayout &DL = F.getDataLayout();
1857 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1858
1859 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1860 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1861 if (Getter) {
1862 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1863 } else {
1864 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1865 ShadowOriginPtrs = createMetadataCall(
1866 IRB,
1867 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1868 AddrCast, SizeVal);
1869 }
1870 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1871 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1872 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1873
1874 return std::make_pair(ShadowPtr, OriginPtr);
1875 }
1876
1877 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1878 /// a single pointee.
1879 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1880 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1881 IRBuilder<> &IRB,
1882 Type *ShadowTy,
1883 bool isStore) {
1884 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1885 if (!VectTy) {
1886 assert(Addr->getType()->isPointerTy());
1887 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1888 }
1889
1890 // TODO: Support callbacks with vectors of addresses.
1891 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1892 Value *ShadowPtrs = ConstantInt::getNullValue(
1893 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1894 Value *OriginPtrs = nullptr;
1895 if (MS.TrackOrigins)
1896 OriginPtrs = ConstantInt::getNullValue(
1897 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1898 for (unsigned i = 0; i < NumElements; ++i) {
1899 Value *OneAddr =
1900 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1901 auto [ShadowPtr, OriginPtr] =
1902 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1903
1904 ShadowPtrs = IRB.CreateInsertElement(
1905 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1906 if (MS.TrackOrigins)
1907 OriginPtrs = IRB.CreateInsertElement(
1908 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1909 }
1910 return {ShadowPtrs, OriginPtrs};
1911 }
1912
1913 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1914 Type *ShadowTy,
1915 MaybeAlign Alignment,
1916 bool isStore) {
1917 if (MS.CompileKernel)
1918 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1919 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1920 }
1921
1922 /// Compute the shadow address for a given function argument.
1923 ///
1924 /// Shadow = ParamTLS+ArgOffset.
1925 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1926 return IRB.CreatePtrAdd(MS.ParamTLS,
1927 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1928 }
1929
1930 /// Compute the origin address for a given function argument.
1931 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1932 if (!MS.TrackOrigins)
1933 return nullptr;
1934 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1935 ConstantInt::get(MS.IntptrTy, ArgOffset),
1936 "_msarg_o");
1937 }
1938
1939 /// Compute the shadow address for a retval.
1940 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1941 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1942 }
1943
1944 /// Compute the origin address for a retval.
1945 Value *getOriginPtrForRetval() {
1946 // We keep a single origin for the entire retval. Might be too optimistic.
1947 return MS.RetvalOriginTLS;
1948 }
1949
1950 /// Set SV to be the shadow value for V.
1951 void setShadow(Value *V, Value *SV) {
1952 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1953 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1954 }
1955
1956 /// Set Origin to be the origin value for V.
1957 void setOrigin(Value *V, Value *Origin) {
1958 if (!MS.TrackOrigins)
1959 return;
1960 assert(!OriginMap.count(V) && "Values may only have one origin");
1961 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1962 OriginMap[V] = Origin;
1963 }
1964
1965 Constant *getCleanShadow(Type *OrigTy) {
1966 Type *ShadowTy = getShadowTy(OrigTy);
1967 if (!ShadowTy)
1968 return nullptr;
1969 return Constant::getNullValue(ShadowTy);
1970 }
1971
1972 /// Create a clean shadow value for a given value.
1973 ///
1974 /// Clean shadow (all zeroes) means all bits of the value are defined
1975 /// (initialized).
1976 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
1977
1978 /// Create a dirty shadow of a given shadow type.
1979 Constant *getPoisonedShadow(Type *ShadowTy) {
1980 assert(ShadowTy);
1981 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
1982 return Constant::getAllOnesValue(ShadowTy);
1983 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
1984 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
1985 getPoisonedShadow(AT->getElementType()));
1986 return ConstantArray::get(AT, Vals);
1987 }
1988 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
1989 SmallVector<Constant *, 4> Vals;
1990 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1991 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
1992 return ConstantStruct::get(ST, Vals);
1993 }
1994 llvm_unreachable("Unexpected shadow type");
1995 }
1996
1997 /// Create a dirty shadow for a given value.
1998 Constant *getPoisonedShadow(Value *V) {
1999 Type *ShadowTy = getShadowTy(V);
2000 if (!ShadowTy)
2001 return nullptr;
2002 return getPoisonedShadow(ShadowTy);
2003 }
2004
2005 /// Create a clean (zero) origin.
2006 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2007
2008 /// Get the shadow value for a given Value.
2009 ///
2010 /// This function either returns the value set earlier with setShadow,
2011 /// or extracts it from ParamTLS (for function arguments).
2012 Value *getShadow(Value *V) {
2013 if (Instruction *I = dyn_cast<Instruction>(V)) {
2014 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2015 return getCleanShadow(V);
2016 // For instructions the shadow is already stored in the map.
2017 Value *Shadow = ShadowMap[V];
2018 if (!Shadow) {
2019 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2020 assert(Shadow && "No shadow for a value");
2021 }
2022 return Shadow;
2023 }
2024 // Handle fully undefined values
2025 // (partially undefined constant vectors are handled later)
2026 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2027 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2028 : getCleanShadow(V);
2029 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2030 return AllOnes;
2031 }
2032 if (Argument *A = dyn_cast<Argument>(V)) {
2033 // For arguments we compute the shadow on demand and store it in the map.
2034 Value *&ShadowPtr = ShadowMap[V];
2035 if (ShadowPtr)
2036 return ShadowPtr;
2037 Function *F = A->getParent();
2038 IRBuilder<> EntryIRB(FnPrologueEnd);
2039 unsigned ArgOffset = 0;
2040 const DataLayout &DL = F->getDataLayout();
2041 for (auto &FArg : F->args()) {
2042 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2043 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2044 ? "vscale not fully supported\n"
2045 : "Arg is not sized\n"));
2046 if (A == &FArg) {
2047 ShadowPtr = getCleanShadow(V);
2048 setOrigin(A, getCleanOrigin());
2049 break;
2050 }
2051 continue;
2052 }
2053
2054 unsigned Size = FArg.hasByValAttr()
2055 ? DL.getTypeAllocSize(FArg.getParamByValType())
2056 : DL.getTypeAllocSize(FArg.getType());
2057
2058 if (A == &FArg) {
2059 bool Overflow = ArgOffset + Size > kParamTLSSize;
2060 if (FArg.hasByValAttr()) {
2061 // ByVal pointer itself has clean shadow. We copy the actual
2062 // argument shadow to the underlying memory.
2063 // Figure out maximal valid memcpy alignment.
2064 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2065 FArg.getParamAlign(), FArg.getParamByValType());
2066 Value *CpShadowPtr, *CpOriginPtr;
2067 std::tie(CpShadowPtr, CpOriginPtr) =
2068 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2069 /*isStore*/ true);
2070 if (!PropagateShadow || Overflow) {
2071 // ParamTLS overflow.
2072 EntryIRB.CreateMemSet(
2073 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2074 Size, ArgAlign);
2075 } else {
2076 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2077 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2078 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2079 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2080 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2081
2082 if (MS.TrackOrigins) {
2083 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2084 // FIXME: OriginSize should be:
2085 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2086 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2087 EntryIRB.CreateMemCpy(
2088 CpOriginPtr,
2089 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2090 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2091 OriginSize);
2092 }
2093 }
2094 }
2095
2096 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2097 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2098 ShadowPtr = getCleanShadow(V);
2099 setOrigin(A, getCleanOrigin());
2100 } else {
2101 // Shadow over TLS
2102 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2103 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2104 kShadowTLSAlignment);
2105 if (MS.TrackOrigins) {
2106 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2107 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2108 }
2109 }
2110 LLVM_DEBUG(dbgs()
2111 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2112 break;
2113 }
2114
2115 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2116 }
2117 assert(ShadowPtr && "Could not find shadow for an argument");
2118 return ShadowPtr;
2119 }
2120
2121 // Check for partially-undefined constant vectors
2122 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2123 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2124 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2125 PoisonUndefVectors) {
2126 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2127 SmallVector<Constant *, 32> ShadowVector(NumElems);
2128 for (unsigned i = 0; i != NumElems; ++i) {
2129 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2130 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2131 : getCleanShadow(Elem);
2132 }
2133
2134 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2135 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2136 << *ShadowConstant << "\n");
2137
2138 return ShadowConstant;
2139 }
2140
2141 // TODO: partially-undefined constant arrays, structures, and nested types
2142
2143 // For everything else the shadow is zero.
2144 return getCleanShadow(V);
2145 }
2146
2147 /// Get the shadow for i-th argument of the instruction I.
2148 Value *getShadow(Instruction *I, int i) {
2149 return getShadow(I->getOperand(i));
2150 }
2151
2152 /// Get the origin for a value.
2153 Value *getOrigin(Value *V) {
2154 if (!MS.TrackOrigins)
2155 return nullptr;
2156 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2157 return getCleanOrigin();
2158 assert((isa<Instruction>(V) || isa<Argument>(V)) &&
2159 "Unexpected value type in getOrigin()");
2160 if (Instruction *I = dyn_cast<Instruction>(V)) {
2161 if (I->getMetadata(LLVMContext::MD_nosanitize))
2162 return getCleanOrigin();
2163 }
2164 Value *Origin = OriginMap[V];
2165 assert(Origin && "Missing origin");
2166 return Origin;
2167 }
2168
2169 /// Get the origin for i-th argument of the instruction I.
2170 Value *getOrigin(Instruction *I, int i) {
2171 return getOrigin(I->getOperand(i));
2172 }
2173
2174 /// Remember the place where a shadow check should be inserted.
2175 ///
2176 /// This location will later be instrumented with a check that prints a
2177 /// UMR warning at runtime if the shadow value is not 0.
2178 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2179 assert(Shadow);
2180 if (!InsertChecks)
2181 return;
2182
2183 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2184 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2185 << *OrigIns << "\n");
2186 return;
2187 }
2188#ifndef NDEBUG
2189 Type *ShadowTy = Shadow->getType();
2190 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2191 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2192 "Can only insert checks for integer, vector, and aggregate shadow "
2193 "types");
2194#endif
2195 InstrumentationList.push_back(
2196 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2197 }
2198
2199 /// Get shadow for value, and remember the place where a shadow check should
2200 /// be inserted.
2201 ///
2202 /// This location will later be instrumented with a check that prints a
2203 /// UMR warning at runtime if the value is not fully defined.
2204 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2205 assert(Val);
2206 Value *Shadow, *Origin;
2207 if (ClCheckConstantShadow) {
2208 Shadow = getShadow(Val);
2209 if (!Shadow)
2210 return;
2211 Origin = getOrigin(Val);
2212 } else {
2213 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2214 if (!Shadow)
2215 return;
2216 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2217 }
2218 insertCheckShadow(Shadow, Origin, OrigIns);
2219 }
2220
2221 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2222 switch (a) {
2223 case AtomicOrdering::NotAtomic:
2224 return AtomicOrdering::NotAtomic;
2225 case AtomicOrdering::Unordered:
2226 case AtomicOrdering::Monotonic:
2227 case AtomicOrdering::Release:
2228 return AtomicOrdering::Release;
2229 case AtomicOrdering::Acquire:
2230 case AtomicOrdering::AcquireRelease:
2231 return AtomicOrdering::AcquireRelease;
2232 case AtomicOrdering::SequentiallyConsistent:
2233 return AtomicOrdering::SequentiallyConsistent;
2234 }
2235 llvm_unreachable("Unknown ordering");
2236 }
2237
2238 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2239 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2240 uint32_t OrderingTable[NumOrderings] = {};
2241
2242 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2243 OrderingTable[(int)AtomicOrderingCABI::release] =
2244 (int)AtomicOrderingCABI::release;
2245 OrderingTable[(int)AtomicOrderingCABI::consume] =
2246 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2247 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2248 (int)AtomicOrderingCABI::acq_rel;
2249 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2250 (int)AtomicOrderingCABI::seq_cst;
2251
2252 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2253 }
2254
2255 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2256 switch (a) {
2257 case AtomicOrdering::NotAtomic:
2258 return AtomicOrdering::NotAtomic;
2259 case AtomicOrdering::Unordered:
2260 case AtomicOrdering::Monotonic:
2261 case AtomicOrdering::Acquire:
2262 return AtomicOrdering::Acquire;
2263 case AtomicOrdering::Release:
2264 case AtomicOrdering::AcquireRelease:
2265 return AtomicOrdering::AcquireRelease;
2266 case AtomicOrdering::SequentiallyConsistent:
2267 return AtomicOrdering::SequentiallyConsistent;
2268 }
2269 llvm_unreachable("Unknown ordering");
2270 }
2271
2272 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2273 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2274 uint32_t OrderingTable[NumOrderings] = {};
2275
2276 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2277 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2278 OrderingTable[(int)AtomicOrderingCABI::consume] =
2279 (int)AtomicOrderingCABI::acquire;
2280 OrderingTable[(int)AtomicOrderingCABI::release] =
2281 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2282 (int)AtomicOrderingCABI::acq_rel;
2283 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2284 (int)AtomicOrderingCABI::seq_cst;
2285
2286 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2287 }
2288
2289 // ------------------- Visitors.
2290 using InstVisitor<MemorySanitizerVisitor>::visit;
2291 void visit(Instruction &I) {
2292 if (I.getMetadata(LLVMContext::MD_nosanitize))
2293 return;
2294 // Don't want to visit if we're in the prologue
2295 if (isInPrologue(I))
2296 return;
2297 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2298 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2299 // We still need to set the shadow and origin to clean values.
2300 setShadow(&I, getCleanShadow(&I));
2301 setOrigin(&I, getCleanOrigin());
2302 return;
2303 }
2304
2305 Instructions.push_back(&I);
2306 }
2307
2308 /// Instrument LoadInst
2309 ///
2310 /// Loads the corresponding shadow and (optionally) origin.
2311 /// Optionally, checks that the load address is fully defined.
2312 void visitLoadInst(LoadInst &I) {
2313 assert(I.getType()->isSized() && "Load type must have size");
2314 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2315 NextNodeIRBuilder IRB(&I);
2316 Type *ShadowTy = getShadowTy(&I);
2317 Value *Addr = I.getPointerOperand();
2318 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2319 const Align Alignment = I.getAlign();
2320 if (PropagateShadow) {
2321 std::tie(ShadowPtr, OriginPtr) =
2322 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2323 setShadow(&I,
2324 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2325 } else {
2326 setShadow(&I, getCleanShadow(&I));
2327 }
2328
2329 if (ClCheckAccessAddress)
2330 insertCheckShadowOf(I.getPointerOperand(), &I);
2331
2332 if (I.isAtomic())
2333 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2334
2335 if (MS.TrackOrigins) {
2336 if (PropagateShadow) {
2337 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2338 setOrigin(
2339 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2340 } else {
2341 setOrigin(&I, getCleanOrigin());
2342 }
2343 }
2344 }
2345
2346 /// Instrument StoreInst
2347 ///
2348 /// Stores the corresponding shadow and (optionally) origin.
2349 /// Optionally, checks that the store address is fully defined.
2350 void visitStoreInst(StoreInst &I) {
2351 StoreList.push_back(&I);
2352 if (ClCheckAccessAddress)
2353 insertCheckShadowOf(I.getPointerOperand(), &I);
2354 }
2355
2356 void handleCASOrRMW(Instruction &I) {
2357 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2358
2359 IRBuilder<> IRB(&I);
2360 Value *Addr = I.getOperand(0);
2361 Value *Val = I.getOperand(1);
2362 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2363 /*isStore*/ true)
2364 .first;
2365
2366 if (ClCheckAccessAddress)
2367 insertCheckShadowOf(Addr, &I);
2368
2369 // Only test the conditional argument of cmpxchg instruction.
2370 // The other argument can potentially be uninitialized, but we cannot
2371 // detect this situation reliably without possible false positives.
2372 if (isa<AtomicCmpXchgInst>(I))
2373 insertCheckShadowOf(Val, &I);
2374
2375 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2376
2377 setShadow(&I, getCleanShadow(&I));
2378 setOrigin(&I, getCleanOrigin());
2379 }
2380
2381 void visitAtomicRMWInst(AtomicRMWInst &I) {
2382 handleCASOrRMW(I);
2383 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2384 }
2385
2386 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2387 handleCASOrRMW(I);
2388 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2389 }
2390
2391 // Vector manipulation.
2392 void visitExtractElementInst(ExtractElementInst &I) {
2393 insertCheckShadowOf(I.getOperand(1), &I);
2394 IRBuilder<> IRB(&I);
2395 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2396 "_msprop"));
2397 setOrigin(&I, getOrigin(&I, 0));
2398 }
2399
2400 void visitInsertElementInst(InsertElementInst &I) {
2401 insertCheckShadowOf(I.getOperand(2), &I);
2402 IRBuilder<> IRB(&I);
2403 auto *Shadow0 = getShadow(&I, 0);
2404 auto *Shadow1 = getShadow(&I, 1);
2405 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2406 "_msprop"));
2407 setOriginForNaryOp(I);
2408 }
2409
2410 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2411 IRBuilder<> IRB(&I);
2412 auto *Shadow0 = getShadow(&I, 0);
2413 auto *Shadow1 = getShadow(&I, 1);
2414 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2415 "_msprop"));
2416 setOriginForNaryOp(I);
2417 }
2418
2419 // Casts.
2420 void visitSExtInst(SExtInst &I) {
2421 IRBuilder<> IRB(&I);
2422 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2423 setOrigin(&I, getOrigin(&I, 0));
2424 }
2425
2426 void visitZExtInst(ZExtInst &I) {
2427 IRBuilder<> IRB(&I);
2428 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2429 setOrigin(&I, getOrigin(&I, 0));
2430 }
2431
2432 void visitTruncInst(TruncInst &I) {
2433 IRBuilder<> IRB(&I);
2434 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2435 setOrigin(&I, getOrigin(&I, 0));
2436 }
2437
2438 void visitBitCastInst(BitCastInst &I) {
2439 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2440 // a musttail call and a ret, don't instrument. New instructions are not
2441 // allowed after a musttail call.
2442 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2443 if (CI->isMustTailCall())
2444 return;
2445 IRBuilder<> IRB(&I);
2446 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2447 setOrigin(&I, getOrigin(&I, 0));
2448 }
2449
2450 void visitPtrToIntInst(PtrToIntInst &I) {
2451 IRBuilder<> IRB(&I);
2452 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2453 "_msprop_ptrtoint"));
2454 setOrigin(&I, getOrigin(&I, 0));
2455 }
2456
2457 void visitIntToPtrInst(IntToPtrInst &I) {
2458 IRBuilder<> IRB(&I);
2459 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2460 "_msprop_inttoptr"));
2461 setOrigin(&I, getOrigin(&I, 0));
2462 }
2463
2464 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2465 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2466 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2467 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2468 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2469 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2470
2471 /// Propagate shadow for bitwise AND.
2472 ///
2473 /// This code is exact, i.e. if, for example, a bit in the left argument
2474 /// is defined and 0, then neither the value nor the definedness of the
2475 /// corresponding bit in B affects the resulting shadow.
2476 void visitAnd(BinaryOperator &I) {
2477 IRBuilder<> IRB(&I);
2478 // "And" of 0 and a poisoned value results in unpoisoned value.
2479 // 1&1 => 1; 0&1 => 0; p&1 => p;
2480 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2481 // 1&p => p; 0&p => 0; p&p => p;
2482 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2483 Value *S1 = getShadow(&I, 0);
2484 Value *S2 = getShadow(&I, 1);
2485 Value *V1 = I.getOperand(0);
2486 Value *V2 = I.getOperand(1);
2487 if (V1->getType() != S1->getType()) {
2488 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2489 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2490 }
2491 Value *S1S2 = IRB.CreateAnd(S1, S2);
2492 Value *V1S2 = IRB.CreateAnd(V1, S2);
2493 Value *S1V2 = IRB.CreateAnd(S1, V2);
2494 setShadow(&I, IRB.CreateOr({S1S2, V1S2, S1V2}));
2495 setOriginForNaryOp(I);
2496 }
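// [Editorial sketch -- illustrative addition, not part of MemorySanitizer.cpp]
// The AND propagation rule S = (S1 & S2) | (V1 & S2) | (S1 & V2) above,
// evaluated on plain 8-bit values: a result bit stays clean whenever it can
// be proven 0 from an initialized operand bit.
#include <cassert>
#include <cstdint>

static uint8_t andShadow(uint8_t V1, uint8_t S1, uint8_t V2, uint8_t S2) {
  return (S1 & S2) | (V1 & S2) | (S1 & V2);
}

static void andShadowExample() {
  // V1 == 0 and fully initialized: V1 & V2 is 0 no matter what V2 holds,
  // so the result is clean even though V2 is fully poisoned.
  assert(andShadow(/*V1=*/0x00, /*S1=*/0x00, /*V2=*/0xAA, /*S2=*/0xFF) == 0x00);
  // V1 == 0x0F initialized, V2 poisoned in its low nibble: those result
  // bits depend on unknown data and stay poisoned.
  assert(andShadow(/*V1=*/0x0F, /*S1=*/0x00, /*V2=*/0x00, /*S2=*/0x0F) == 0x0F);
}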
2497
2498 void visitOr(BinaryOperator &I) {
2499 IRBuilder<> IRB(&I);
2500 // "Or" of 1 and a poisoned value results in unpoisoned value:
2501 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2502 // 1|0 => 1; 0|0 => 0; p|0 => p;
2503 // 1|p => 1; 0|p => p; p|p => p;
2504 //
2505 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2506 //
2507 // If the "disjoint OR" property is violated, the result is poison, and
2508 // hence the entire shadow is uninitialized:
2509 // S = S | SignExt(V1 & V2 != 0)
2510 Value *S1 = getShadow(&I, 0);
2511 Value *S2 = getShadow(&I, 1);
2512 Value *V1 = I.getOperand(0);
2513 Value *V2 = I.getOperand(1);
2514 if (V1->getType() != S1->getType()) {
2515 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2516 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2517 }
2518
2519 Value *NotV1 = IRB.CreateNot(V1);
2520 Value *NotV2 = IRB.CreateNot(V2);
2521
2522 Value *S1S2 = IRB.CreateAnd(S1, S2);
2523 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2524 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2525
2526 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2527
2528 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2529 Value *V1V2 = IRB.CreateAnd(V1, V2);
2530 Value *DisjointOrShadow = IRB.CreateSExt(
2531 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2532 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2533 }
2534
2535 setShadow(&I, S);
2536 setOriginForNaryOp(I);
2537 }
2538
2539 /// Default propagation of shadow and/or origin.
2540 ///
2541 /// This class implements the general case of shadow propagation, used in all
2542 /// cases where we don't know and/or don't care about what the operation
2543 /// actually does. It converts all input shadow values to a common type
2544 /// (extending or truncating as necessary), and bitwise OR's them.
2545 ///
2546 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2547 /// fully initialized), and less prone to false positives.
2548 ///
2549 /// This class also implements the general case of origin propagation. For a
2550 /// Nary operation, result origin is set to the origin of an argument that is
2551 /// not entirely initialized. If there is more than one such argument, the
2552 /// rightmost of them is picked. It does not matter which one is picked if all
2553 /// arguments are initialized.
2554 template <bool CombineShadow> class Combiner {
2555 Value *Shadow = nullptr;
2556 Value *Origin = nullptr;
2557 IRBuilder<> &IRB;
2558 MemorySanitizerVisitor *MSV;
2559
2560 public:
2561 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2562 : IRB(IRB), MSV(MSV) {}
2563
2564 /// Add a pair of shadow and origin values to the mix.
2565 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2566 if (CombineShadow) {
2567 assert(OpShadow);
2568 if (!Shadow)
2569 Shadow = OpShadow;
2570 else {
2571 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2572 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2573 }
2574 }
2575
2576 if (MSV->MS.TrackOrigins) {
2577 assert(OpOrigin);
2578 if (!Origin) {
2579 Origin = OpOrigin;
2580 } else {
2581 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2582 // No point in adding something that might result in 0 origin value.
2583 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2584 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2585 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2586 }
2587 }
2588 }
2589 return *this;
2590 }
2591
2592 /// Add an application value to the mix.
2593 Combiner &Add(Value *V) {
2594 Value *OpShadow = MSV->getShadow(V);
2595 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2596 return Add(OpShadow, OpOrigin);
2597 }
2598
2599 /// Set the current combined values as the given instruction's shadow
2600 /// and origin.
2601 void Done(Instruction *I) {
2602 if (CombineShadow) {
2603 assert(Shadow);
2604 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2605 MSV->setShadow(I, Shadow);
2606 }
2607 if (MSV->MS.TrackOrigins) {
2608 assert(Origin);
2609 MSV->setOrigin(I, Origin);
2610 }
2611 }
2612
2613 /// Store the current combined value at the specified origin
2614 /// location.
2615 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2616 if (MSV->MS.TrackOrigins) {
2617 assert(Origin);
2618 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2619 }
2620 }
2621 };
2622
2623 using ShadowAndOriginCombiner = Combiner<true>;
2624 using OriginCombiner = Combiner<false>;
2625
2626 /// Propagate origin for arbitrary operation.
2627 void setOriginForNaryOp(Instruction &I) {
2628 if (!MS.TrackOrigins)
2629 return;
2630 IRBuilder<> IRB(&I);
2631 OriginCombiner OC(this, IRB);
2632 for (Use &Op : I.operands())
2633 OC.Add(Op.get());
2634 OC.Done(&I);
2635 }
2636
2637 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2638 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2639 "Vector of pointers is not a valid shadow type");
2640 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2641 Ty->getScalarSizeInBits()
2642 : Ty->getPrimitiveSizeInBits();
2643 }
2644
2645 /// Cast between two shadow types, extending or truncating as
2646 /// necessary.
2647 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2648 bool Signed = false) {
2649 Type *srcTy = V->getType();
2650 if (srcTy == dstTy)
2651 return V;
2652 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2653 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2654 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2655 return IRB.CreateICmpNE(V, getCleanShadow(V));
2656
2657 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2658 return IRB.CreateIntCast(V, dstTy, Signed);
2659 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2660 cast<VectorType>(dstTy)->getElementCount() ==
2661 cast<VectorType>(srcTy)->getElementCount())
2662 return IRB.CreateIntCast(V, dstTy, Signed);
2663 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2664 Value *V2 =
2665 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2666 return IRB.CreateBitCast(V2, dstTy);
2667 // TODO: handle struct types.
2668 }
2669
2670 /// Cast an application value to the type of its own shadow.
2671 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2672 Type *ShadowTy = getShadowTy(V);
2673 if (V->getType() == ShadowTy)
2674 return V;
2675 if (V->getType()->isPtrOrPtrVectorTy())
2676 return IRB.CreatePtrToInt(V, ShadowTy);
2677 else
2678 return IRB.CreateBitCast(V, ShadowTy);
2679 }
2680
2681 /// Propagate shadow for arbitrary operation.
2682 void handleShadowOr(Instruction &I) {
2683 IRBuilder<> IRB(&I);
2684 ShadowAndOriginCombiner SC(this, IRB);
2685 for (Use &Op : I.operands())
2686 SC.Add(Op.get());
2687 SC.Done(&I);
2688 }
2689
2690 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2691 // of elements.
2692 //
2693 // For example, suppose we have:
2694 // VectorA: <a1, a2, a3, a4, a5, a6>
2695 // VectorB: <b1, b2, b3, b4, b5, b6>
2696 // ReductionFactor: 3.
2697 // The output would be:
2698 // <a1|a2|a3, a4|a5|a6, b1|b2|b3, b4|b5|b6>
2699 //
2700 // This is convenient for instrumenting horizontal add/sub.
2701 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2702 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2703 Value *VectorA, Value *VectorB) {
2704 assert(isa<FixedVectorType>(VectorA->getType()));
2705 unsigned TotalNumElems =
2706 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2707
2708 if (VectorB) {
2709 assert(VectorA->getType() == VectorB->getType());
2710 TotalNumElems = TotalNumElems * 2;
2711 }
2712
2713 assert(TotalNumElems % ReductionFactor == 0);
2714
2715 Value *Or = nullptr;
2716
2717 IRBuilder<> IRB(&I);
2718 for (unsigned i = 0; i < ReductionFactor; i++) {
2719 SmallVector<int, 16> Mask;
2720 for (unsigned X = 0; X < TotalNumElems; X += ReductionFactor)
2721 Mask.push_back(X + i);
2722
2723 Value *Masked;
2724 if (VectorB)
2725 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2726 else
2727 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2728
2729 if (Or)
2730 Or = IRB.CreateOr(Or, Masked);
2731 else
2732 Or = Masked;
2733 }
2734
2735 return Or;
2736 }
2737
2738 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2739 /// fields.
2740 ///
2741 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2742 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2743 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I) {
2744 assert(I.arg_size() == 1 || I.arg_size() == 2);
2745
2746 assert(I.getType()->isVectorTy());
2747 assert(I.getArgOperand(0)->getType()->isVectorTy());
2748
2749 [[maybe_unused]] FixedVectorType *ParamType =
2750 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2751 assert((I.arg_size() != 2) ||
2752 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2753 [[maybe_unused]] FixedVectorType *ReturnType =
2754 cast<FixedVectorType>(I.getType());
2755 assert(ParamType->getNumElements() * I.arg_size() ==
2756 2 * ReturnType->getNumElements());
2757
2758 IRBuilder<> IRB(&I);
2759
2760 // Horizontal OR of shadow
2761 Value *FirstArgShadow = getShadow(&I, 0);
2762 Value *SecondArgShadow = nullptr;
2763 if (I.arg_size() == 2)
2764 SecondArgShadow = getShadow(&I, 1);
2765
2766 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, FirstArgShadow,
2767 SecondArgShadow);
2768
2769 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2770
2771 setShadow(&I, OrShadow);
2772 setOriginForNaryOp(I);
2773 }
2774
2775 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2776 /// fields, with the parameters reinterpreted to have elements of a specified
2777 /// width. For example:
2778 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2779 /// conceptually operates on
2780 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2781 /// and can be handled with ReinterpretElemWidth == 16.
2782 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I,
2783 int ReinterpretElemWidth) {
2784 assert(I.arg_size() == 1 || I.arg_size() == 2);
2785
2786 assert(I.getType()->isVectorTy());
2787 assert(I.getArgOperand(0)->getType()->isVectorTy());
2788
2789 FixedVectorType *ParamType =
2790 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2791 assert((I.arg_size() != 2) ||
2792 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2793
2794 [[maybe_unused]] FixedVectorType *ReturnType =
2795 cast<FixedVectorType>(I.getType());
2796 assert(ParamType->getNumElements() * I.arg_size() ==
2797 2 * ReturnType->getNumElements());
2798
2799 IRBuilder<> IRB(&I);
2800
2801 FixedVectorType *ReinterpretShadowTy = nullptr;
2802 assert(isAligned(Align(ReinterpretElemWidth),
2803 ParamType->getPrimitiveSizeInBits()));
2804 ReinterpretShadowTy = FixedVectorType::get(
2805 IRB.getIntNTy(ReinterpretElemWidth),
2806 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2807
2808 // Horizontal OR of shadow
2809 Value *FirstArgShadow = getShadow(&I, 0);
2810 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2811
2812 // If we had two parameters each with an odd number of elements, the total
2813 // number of elements is even, but we have never seen this in extant
2814 // instruction sets, so we enforce that each parameter must have an even
2815 // number of elements.
2816 assert(isAligned(
2817 Align(2),
2818 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2819
2820 Value *SecondArgShadow = nullptr;
2821 if (I.arg_size() == 2) {
2822 SecondArgShadow = getShadow(&I, 1);
2823 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2824 }
2825
2826 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, FirstArgShadow,
2827 SecondArgShadow);
2828
2829 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2830
2831 setShadow(&I, OrShadow);
2832 setOriginForNaryOp(I);
2833 }
2834
2835 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2836
2837 // Handle multiplication by constant.
2838 //
2839 // Handle a special case of multiplication by constant that may have one or
2840 // more zeros in the lower bits. This makes the corresponding number of lower bits
2841 // of the result zero as well. We model it by shifting the other operand
2842 // shadow left by the required number of bits. Effectively, we transform
2843 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2844 // We use multiplication by 2**N instead of shift to cover the case of
2845 // multiplication by 0, which may occur in some elements of a vector operand.
2846 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2847 Value *OtherArg) {
2848 Constant *ShadowMul;
2849 Type *Ty = ConstArg->getType();
2850 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
2851 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
2852 Type *EltTy = VTy->getElementType();
2853 SmallVector<Constant *, 16> Elements;
2854 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2855 if (ConstantInt *Elt =
2856 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
2857 const APInt &V = Elt->getValue();
2858 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2859 Elements.push_back(ConstantInt::get(EltTy, V2));
2860 } else {
2861 Elements.push_back(ConstantInt::get(EltTy, 1));
2862 }
2863 }
2864 ShadowMul = ConstantVector::get(Elements);
2865 } else {
2866 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
2867 const APInt &V = Elt->getValue();
2868 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2869 ShadowMul = ConstantInt::get(Ty, V2);
2870 } else {
2871 ShadowMul = ConstantInt::get(Ty, 1);
2872 }
2873 }
2874
2875 IRBuilder<> IRB(&I);
2876 setShadow(&I,
2877 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
2878 setOrigin(&I, getOrigin(OtherArg));
2879 }
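// [Editorial sketch -- illustrative addition, not part of MemorySanitizer.cpp]
// Multiplication by a constant with trailing zero bits, as described above:
// for X * 24 (24 == 3 * 2**3) the low 3 bits of the product are always 0, so
// the shadow of X is multiplied by 2**3 and its low 3 bits become clean.
#include <cassert>
#include <cstdint>

static void mulByConstShadowExample() {
  uint32_t Sx = 0b101;          // bits 0 and 2 of X are uninitialized
  uint32_t ShadowMul = 1u << 3; // 2 ** countr_zero(24)
  uint32_t Sres = Sx * ShadowMul;
  assert(Sres == 0b101000);     // poison moves up; the low 3 bits are defined
}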
2880
2881 void visitMul(BinaryOperator &I) {
2882 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
2883 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
2884 if (constOp0 && !constOp1)
2885 handleMulByConstant(I, constOp0, I.getOperand(1));
2886 else if (constOp1 && !constOp0)
2887 handleMulByConstant(I, constOp1, I.getOperand(0));
2888 else
2889 handleShadowOr(I);
2890 }
2891
2892 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2893 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2894 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2895 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2896 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2897 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2898
2899 void handleIntegerDiv(Instruction &I) {
2900 IRBuilder<> IRB(&I);
2901 // Strict on the second argument.
2902 insertCheckShadowOf(I.getOperand(1), &I);
2903 setShadow(&I, getShadow(&I, 0));
2904 setOrigin(&I, getOrigin(&I, 0));
2905 }
2906
2907 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2908 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2909 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2910 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2911
2912 // Floating-point division is side-effect free. We cannot require that the
2913 // divisor is fully initialized; we must propagate shadow instead. See PR37523.
2914 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2915 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2916
2917 /// Instrument == and != comparisons.
2918 ///
2919 /// Sometimes the comparison result is known even if some of the bits of the
2920 /// arguments are not.
2921 void handleEqualityComparison(ICmpInst &I) {
2922 IRBuilder<> IRB(&I);
2923 Value *A = I.getOperand(0);
2924 Value *B = I.getOperand(1);
2925 Value *Sa = getShadow(A);
2926 Value *Sb = getShadow(B);
2927
2928 // Get rid of pointers and vectors of pointers.
2929 // For ints (and vectors of ints), types of A and Sa match,
2930 // and this is a no-op.
2931 A = IRB.CreatePointerCast(A, Sa->getType());
2932 B = IRB.CreatePointerCast(B, Sb->getType());
2933
2934 // A == B <==> (C = A^B) == 0
2935 // A != B <==> (C = A^B) != 0
2936 // Sc = Sa | Sb
2937 Value *C = IRB.CreateXor(A, B);
2938 Value *Sc = IRB.CreateOr(Sa, Sb);
2939 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
2940 // Result is defined if one of the following is true
2941 // * there is a defined 1 bit in C
2942 // * C is fully defined
2943 // Si = !(C & ~Sc) && Sc
2944 Value *Zero = Constant::getNullValue(Sc->getType());
2945 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
2946 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
2947 Value *RHS =
2948 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
2949 Value *Si = IRB.CreateAnd(LHS, RHS);
2950 Si->setName("_msprop_icmp");
2951 setShadow(&I, Si);
2952 setOriginForNaryOp(I);
2953 }
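// [Editorial sketch -- illustrative addition, not part of MemorySanitizer.cpp]
// The equality-comparison shadow above, on plain 8-bit values: the result of
// A == B is poisoned iff some bit of A^B is unknown (Sc != 0) and no *known*
// bit of A^B is already 1 (a known 1 would decide the comparison regardless).
#include <cassert>
#include <cstdint>

static bool eqCmpShadow(uint8_t A, uint8_t Sa, uint8_t B, uint8_t Sb) {
  uint8_t C = A ^ B;
  uint8_t Sc = Sa | Sb;
  bool LHS = Sc != 0;                           // some bit is unknown
  bool RHS = (uint8_t)(C & (uint8_t)~Sc) == 0;  // no defined bit differs
  return LHS && RHS;                            // comparison result is poisoned
}

static void eqCmpShadowExample() {
  // A and B differ in a fully defined bit: the compare is known to be false,
  // so its shadow is clean even though other bits are poisoned.
  assert(eqCmpShadow(/*A=*/0x01, /*Sa=*/0xF0, /*B=*/0x00, /*Sb=*/0x00) == false);
  // All differing bits are poisoned: the result is unknown.
  assert(eqCmpShadow(0x00, 0x0F, 0x00, 0x00) == true);
}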
2954
2955 /// Instrument relational comparisons.
2956 ///
2957 /// This function does exact shadow propagation for all relational
2958 /// comparisons of integers, pointers and vectors of those.
2959 /// FIXME: output seems suboptimal when one of the operands is a constant
2960 void handleRelationalComparisonExact(ICmpInst &I) {
2961 IRBuilder<> IRB(&I);
2962 Value *A = I.getOperand(0);
2963 Value *B = I.getOperand(1);
2964 Value *Sa = getShadow(A);
2965 Value *Sb = getShadow(B);
2966
2967 // Get rid of pointers and vectors of pointers.
2968 // For ints (and vectors of ints), types of A and Sa match,
2969 // and this is a no-op.
2970 A = IRB.CreatePointerCast(A, Sa->getType());
2971 B = IRB.CreatePointerCast(B, Sb->getType());
2972
2973 // Let [a0, a1] be the interval of possible values of A, taking into account
2974 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
2975 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
2976 bool IsSigned = I.isSigned();
2977
2978 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
2979 if (IsSigned) {
2980 // Sign-flip to map from signed range to unsigned range. Relation A vs B
2981 // should be preserved, if checked with `getUnsignedPredicate()`.
2982 // Relationship between Amin, Amax, Bmin, Bmax also will not be
2983 // affected, as they are created by effectively adding/subtracting from
2984 // A (or B) a value, derived from shadow, with no overflow, either
2985 // before or after sign flip.
2986 APInt MinVal =
2987 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
2988 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
2989 }
2990 // Minimize undefined bits.
2991 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
2992 Value *Max = IRB.CreateOr(V, S);
2993 return std::make_pair(Min, Max);
2994 };
2995
2996 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
2997 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
2998 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
2999 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3000
3001 Value *Si = IRB.CreateXor(S1, S2);
3002 setShadow(&I, Si);
3003 setOriginForNaryOp(I);
3004 }
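// [Editorial sketch -- illustrative addition, not part of MemorySanitizer.cpp]
// The interval reasoning above, on unsigned 8-bit values: [Amin, Amax] =
// [A & ~Sa, A | Sa] is the range A can take once its poisoned bits are
// allowed to be anything; the compare is defined iff both extremes agree.
// The signed case additionally flips the sign bit first, as noted above.
#include <cassert>
#include <cstdint>

static bool ultShadow(uint8_t A, uint8_t Sa, uint8_t B, uint8_t Sb) {
  uint8_t Amin = A & (uint8_t)~Sa, Amax = A | Sa;
  uint8_t Bmin = B & (uint8_t)~Sb, Bmax = B | Sb;
  bool S1 = Amin < Bmax;
  bool S2 = Amax < Bmin;
  return S1 != S2; // poisoned iff the two extreme cases disagree
}

static void ultShadowExample() {
  // A in [0x00, 0x0F] (low nibble poisoned), B == 0x80 fully defined:
  // every possible A is < B, so the result is clean.
  assert(ultShadow(/*A=*/0x00, /*Sa=*/0x0F, /*B=*/0x80, /*Sb=*/0x00) == false);
  // A in [0x00, 0x0F], B == 0x08 defined: the answer depends on the
  // poisoned bits, so the result is poisoned.
  assert(ultShadow(0x00, 0x0F, 0x08, 0x00) == true);
}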
3005
3006 /// Instrument signed relational comparisons.
3007 ///
3008 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3009 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3010 void handleSignedRelationalComparison(ICmpInst &I) {
3011 Constant *constOp;
3012 Value *op = nullptr;
3013 CmpInst::Predicate pre;
3014 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3015 op = I.getOperand(0);
3016 pre = I.getPredicate();
3017 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3018 op = I.getOperand(1);
3019 pre = I.getSwappedPredicate();
3020 } else {
3021 handleShadowOr(I);
3022 return;
3023 }
3024
3025 if ((constOp->isNullValue() &&
3026 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3027 (constOp->isAllOnesValue() &&
3028 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3029 IRBuilder<> IRB(&I);
3030 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3031 "_msprop_icmp_s");
3032 setShadow(&I, Shadow);
3033 setOrigin(&I, getOrigin(op));
3034 } else {
3035 handleShadowOr(I);
3036 }
3037 }
3038
3039 void visitICmpInst(ICmpInst &I) {
3040 if (!ClHandleICmp) {
3041 handleShadowOr(I);
3042 return;
3043 }
3044 if (I.isEquality()) {
3045 handleEqualityComparison(I);
3046 return;
3047 }
3048
3049 assert(I.isRelational());
3050 if (ClHandleICmpExact) {
3051 handleRelationalComparisonExact(I);
3052 return;
3053 }
3054 if (I.isSigned()) {
3055 handleSignedRelationalComparison(I);
3056 return;
3057 }
3058
3059 assert(I.isUnsigned());
3060 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3061 handleRelationalComparisonExact(I);
3062 return;
3063 }
3064
3065 handleShadowOr(I);
3066 }
3067
3068 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3069
3070 void handleShift(BinaryOperator &I) {
3071 IRBuilder<> IRB(&I);
3072 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3073 // Otherwise perform the same shift on S1.
3074 Value *S1 = getShadow(&I, 0);
3075 Value *S2 = getShadow(&I, 1);
3076 Value *S2Conv =
3077 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3078 Value *V2 = I.getOperand(1);
3079 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3080 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3081 setOriginForNaryOp(I);
3082 }
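// [Editorial sketch -- illustrative addition, not part of MemorySanitizer.cpp]
// The shift rule above, on plain 8-bit values: if the shift amount is at all
// poisoned, the whole result is poisoned; otherwise the operand shadow is
// shifted by the same (concrete) amount.
#include <cassert>
#include <cstdint>

static uint8_t shlShadow(uint8_t S1, uint8_t V2, uint8_t S2) {
  uint8_t S2Conv = S2 ? 0xFF : 0x00;  // sext(S2 != 0)
  return (uint8_t)(S1 << V2) | S2Conv;
}

static void shlShadowExample() {
  assert(shlShadow(/*S1=*/0b1, /*V2=*/4, /*S2=*/0) == 0b10000); // poison shifts with the value
  assert(shlShadow(0b1, 4, /*S2=*/1) == 0xFF);                  // unknown shift amount
}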
3083
3084 void visitShl(BinaryOperator &I) { handleShift(I); }
3085 void visitAShr(BinaryOperator &I) { handleShift(I); }
3086 void visitLShr(BinaryOperator &I) { handleShift(I); }
3087
3088 void handleFunnelShift(IntrinsicInst &I) {
3089 IRBuilder<> IRB(&I);
3090 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3091 // Otherwise perform the same shift on S0 and S1.
3092 Value *S0 = getShadow(&I, 0);
3093 Value *S1 = getShadow(&I, 1);
3094 Value *S2 = getShadow(&I, 2);
3095 Value *S2Conv =
3096 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3097 Value *V2 = I.getOperand(2);
3098 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3099 {S0, S1, V2});
3100 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3101 setOriginForNaryOp(I);
3102 }
3103
3104 /// Instrument llvm.memmove
3105 ///
3106 /// At this point we don't know if llvm.memmove will be inlined or not.
3107 /// If we don't instrument it and it gets inlined,
3108 /// our interceptor will not kick in and we will lose the memmove.
3109 /// If we instrument the call here, but it does not get inlined,
3110 /// we will memmove the shadow twice, which is bad in case
3111 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3112 ///
3113 /// Similar situation exists for memcpy and memset.
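  ///
  /// Roughly, the lowering replaces
  ///   llvm.memmove.p0.p0.i64(ptr %dst, ptr %src, i64 %n, i1 false)
  /// with a call to
  ///   __msan_memmove(ptr %dst, ptr %src, i64 %n)
  /// whose runtime implementation moves the shadow (and origins) together
  /// with the data.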
3114 void visitMemMoveInst(MemMoveInst &I) {
3115 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3116 IRBuilder<> IRB(&I);
3117 IRB.CreateCall(MS.MemmoveFn,
3118 {I.getArgOperand(0), I.getArgOperand(1),
3119 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3120 I.eraseFromParent();
3121 }
3122
3123 /// Instrument memcpy
3124 ///
3125 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3126 /// unfortunate as it may slow down small constant memcpys.
3127 /// FIXME: consider doing manual inline for small constant sizes and proper
3128 /// alignment.
3129 ///
3130 /// Note: This also handles memcpy.inline, which promises no calls to external
3131 /// functions as an optimization. However, with instrumentation enabled this
3132 /// is difficult to promise; additionally, we know that the MSan runtime
3133 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3134 /// instrumentation it's safe to turn memcpy.inline into a call to
3135 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3136 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3137 void visitMemCpyInst(MemCpyInst &I) {
3138 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3139 IRBuilder<> IRB(&I);
3140 IRB.CreateCall(MS.MemcpyFn,
3141 {I.getArgOperand(0), I.getArgOperand(1),
3142 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3143 I.eraseFromParent();
3144 }
3145
3146 // Same as memcpy.
3147 void visitMemSetInst(MemSetInst &I) {
3148 IRBuilder<> IRB(&I);
3149 IRB.CreateCall(
3150 MS.MemsetFn,
3151 {I.getArgOperand(0),
3152 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3153 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3154 I.eraseFromParent();
3155 }
3156
3157 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3158
3159 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3160
3161 /// Handle vector store-like intrinsics.
3162 ///
3163 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3164 /// has 1 pointer argument and 1 vector argument, returns void.
3165 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3166 assert(I.arg_size() == 2);
3167
3168 IRBuilder<> IRB(&I);
3169 Value *Addr = I.getArgOperand(0);
3170 Value *Shadow = getShadow(&I, 1);
3171 Value *ShadowPtr, *OriginPtr;
3172
3173 // We don't know the pointer alignment (could be unaligned SSE store!).
3174 // Have to assume the worst case.
3175 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3176 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3177 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3178
3179 if (ClCheckAccessAddress)
3180 insertCheckShadowOf(Addr, &I);
3181
3182 // FIXME: factor out common code from materializeStores
3183 if (MS.TrackOrigins)
3184 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3185 return true;
3186 }
3187
3188 /// Handle vector load-like intrinsics.
3189 ///
3190 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3191 /// has 1 pointer argument, returns a vector.
3192 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3193 assert(I.arg_size() == 1);
3194
3195 IRBuilder<> IRB(&I);
3196 Value *Addr = I.getArgOperand(0);
3197
3198 Type *ShadowTy = getShadowTy(&I);
3199 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3200 if (PropagateShadow) {
3201 // We don't know the pointer alignment (could be unaligned SSE load!).
3202 // Have to assume the worst case.
3203 const Align Alignment = Align(1);
3204 std::tie(ShadowPtr, OriginPtr) =
3205 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3206 setShadow(&I,
3207 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3208 } else {
3209 setShadow(&I, getCleanShadow(&I));
3210 }
3211
3212 if (ClCheckAccessAddress)
3213 insertCheckShadowOf(Addr, &I);
3214
3215 if (MS.TrackOrigins) {
3216 if (PropagateShadow)
3217 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3218 else
3219 setOrigin(&I, getCleanOrigin());
3220 }
3221 return true;
3222 }
3223
3224 /// Handle (SIMD arithmetic)-like intrinsics.
3225 ///
3226 /// Instrument intrinsics with any number of arguments of the same type [*],
3227 /// equal to the return type, plus a specified number of trailing flags of
3228 /// any type.
3229 ///
3230 /// [*] The type should be simple (no aggregates or pointers; vectors are
3231 /// fine).
3232 ///
3233 /// Caller guarantees that this intrinsic does not access memory.
3234 ///
3235 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3236 /// by this handler. See horizontalReduce().
3237 ///
3238 /// TODO: permutation intrinsics are also often incorrectly matched.
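  ///
  /// e.g., <4 x float> @llvm.x86.sse.min.ps(<4 x float>, <4 x float>) fits
  /// this shape; the combined shadow is simply the OR of the operand shadows,
  /// with origins combined as usual.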
3239 [[maybe_unused]] bool
3240 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3241 unsigned int trailingFlags) {
3242 Type *RetTy = I.getType();
3243 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3244 return false;
3245
3246 unsigned NumArgOperands = I.arg_size();
3247 assert(NumArgOperands >= trailingFlags);
3248 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3249 Type *Ty = I.getArgOperand(i)->getType();
3250 if (Ty != RetTy)
3251 return false;
3252 }
3253
3254 IRBuilder<> IRB(&I);
3255 ShadowAndOriginCombiner SC(this, IRB);
3256 for (unsigned i = 0; i < NumArgOperands; ++i)
3257 SC.Add(I.getArgOperand(i));
3258 SC.Done(&I);
3259
3260 return true;
3261 }
3262
3263 /// Returns whether it was able to heuristically instrument unknown
3264 /// intrinsics.
3265 ///
3266 /// The main purpose of this code is to do something reasonable with all
3267 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3268 /// We recognize several classes of intrinsics by their argument types and
3269 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3270 /// sure that we know what the intrinsic does.
3271 ///
3272 /// We special-case intrinsics where this approach fails. See llvm.bswap
3273 /// handling as an example of that.
3274 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3275 unsigned NumArgOperands = I.arg_size();
3276 if (NumArgOperands == 0)
3277 return false;
3278
3279 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3280 I.getArgOperand(1)->getType()->isVectorTy() &&
3281 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3282 // This looks like a vector store.
3283 return handleVectorStoreIntrinsic(I);
3284 }
3285
3286 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3287 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3288 // This looks like a vector load.
3289 return handleVectorLoadIntrinsic(I);
3290 }
3291
3292 if (I.doesNotAccessMemory())
3293 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3294 return true;
3295
3296 // FIXME: detect and handle SSE maskstore/maskload?
3297 // Some cases are now handled in handleAVXMasked{Load,Store}.
3298 return false;
3299 }
3300
3301 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3302 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3303 if (ClDumpStrictIntrinsics)
3304 dumpInst(I);
3305
3306 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3307 << "\n");
3308 return true;
3309 } else
3310 return false;
3311 }
3312
3313 void handleInvariantGroup(IntrinsicInst &I) {
3314 setShadow(&I, getShadow(&I, 0));
3315 setOrigin(&I, getOrigin(&I, 0));
3316 }
3317
3318 void handleLifetimeStart(IntrinsicInst &I) {
3319 if (!PoisonStack)
3320 return;
3321 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3322 if (AI)
3323 LifetimeStartList.push_back(std::make_pair(&I, AI));
3324 }
3325
3326 void handleBswap(IntrinsicInst &I) {
3327 IRBuilder<> IRB(&I);
3328 Value *Op = I.getArgOperand(0);
3329 Type *OpType = Op->getType();
3330 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3331 getShadow(Op)));
3332 setOrigin(&I, getOrigin(Op));
3333 }
3334
3335 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3336 // and a 1. If the input is all zero, it is fully initialized iff
3337 // !is_zero_poison.
3338 //
3339 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3340 // concrete value 0/1, and ? is an uninitialized bit:
3341 // - 0001 0??? is fully initialized
3342 // - 000? ???? is fully uninitialized (*)
3343 // - ???? ???? is fully uninitialized
3344 // - 0000 0000 is fully uninitialized if is_zero_poison,
3345 // fully initialized otherwise
3346 //
3347 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3348 // only need to poison 4 bits.
3349 //
3350 // OutputShadow =
3351 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3352 // || (is_zero_poison && AllZeroSrc)
3353 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3354 IRBuilder<> IRB(&I);
3355 Value *Src = I.getArgOperand(0);
3356 Value *SrcShadow = getShadow(Src);
3357
3358 Value *False = IRB.getInt1(false);
3359 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3360 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3361 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3362 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3363
3364 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3365 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3366
3367 Value *NotAllZeroShadow =
3368 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3369 Value *OutputShadow =
3370 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3371
3372 // If zero poison is requested, mix in with the shadow
3373 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3374 if (!IsZeroPoison->isZeroValue()) {
3375 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3376 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3377 }
3378
3379 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3380
3381 setShadow(&I, OutputShadow);
3382 setOriginForNaryOp(I);
3383 }
3384
3385 /// Handle Arm NEON vector convert intrinsics.
3386 ///
3387 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3388 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64(double)
3389 ///
3390 /// For x86 SSE vector convert intrinsics, see
3391 /// handleSSEVectorConvertIntrinsic().
3392 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I) {
3393 assert(I.arg_size() == 1);
3394
3395 IRBuilder<> IRB(&I);
3396 Value *S0 = getShadow(&I, 0);
3397
3398 /// For scalars:
3399 /// Since they are converting from floating-point to integer, the output is
3400 /// - fully uninitialized if *any* bit of the input is uninitialized
3401 /// - fully initialized if all bits of the input are initialized
3402 /// We apply the same principle on a per-field basis for vectors.
3403 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3404 getShadowTy(&I));
3405 setShadow(&I, OutShadow);
3406 setOriginForNaryOp(I);
3407 }
3408
3409 /// Some instructions have additional zero-elements in the return type
3410 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3411 ///
3412 /// This function will return a vector type with the same number of elements
3413 /// as the input, but the same per-element width as the return value, e.g.,
3414 /// <8 x i8>.
3415 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3416 assert(isa<FixedVectorType>(getShadowTy(&I)));
3417 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3418
3419 // TODO: generalize beyond 2x?
3420 if (ShadowType->getElementCount() ==
3421 cast<VectorType>(Src->getType())->getElementCount() * 2)
3422 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3423
3424 assert(ShadowType->getElementCount() ==
3425 cast<VectorType>(Src->getType())->getElementCount());
3426
3427 return ShadowType;
3428 }
3429
3430 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3431 /// to match the length of the shadow for the instruction.
3432 /// If scalar types of the vectors are different, it will use the type of the
3433 /// input vector.
3434 /// This is more type-safe than CreateShadowCast().
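  /// e.g., a <4 x i32> shadow is widened to <8 x i32> by shuffling it with a
  /// zero vector using indices <0 ... 7>, so elements 4..7 of the result come
  /// from the (clean) zero vector.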
3435 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3436 IRBuilder<> IRB(&I);
3437 assert(isa<FixedVectorType>(Shadow->getType()));
3438 assert(isa<FixedVectorType>(I.getType()));
3439
3440 Value *FullShadow = getCleanShadow(&I);
3441 unsigned ShadowNumElems =
3442 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3443 unsigned FullShadowNumElems =
3444 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3445
3446 assert((ShadowNumElems == FullShadowNumElems) ||
3447 (ShadowNumElems * 2 == FullShadowNumElems));
3448
3449 if (ShadowNumElems == FullShadowNumElems) {
3450 FullShadow = Shadow;
3451 } else {
3452 // TODO: generalize beyond 2x?
3453 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3454 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3455
3456 // Append zeros
3457 FullShadow =
3458 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3459 }
3460
3461 return FullShadow;
3462 }
3463
3464 /// Handle x86 SSE vector conversion.
3465 ///
3466 /// e.g., single-precision to half-precision conversion:
3467 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3468 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3469 ///
3470 /// floating-point to integer:
3471 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3472 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3473 ///
3474 /// Note: if the output has more elements, they are zero-initialized (and
3475 /// therefore the shadow will also be initialized).
3476 ///
3477 /// This differs from handleSSEVectorConvertIntrinsic() because it
3478 /// propagates uninitialized shadow (instead of checking the shadow).
3479 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3480 bool HasRoundingMode) {
3481 if (HasRoundingMode) {
3482 assert(I.arg_size() == 2);
3483 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3484 assert(RoundingMode->getType()->isIntegerTy());
3485 } else {
3486 assert(I.arg_size() == 1);
3487 }
3488
3489 Value *Src = I.getArgOperand(0);
3490 assert(Src->getType()->isVectorTy());
3491
3492 // The return type might have more elements than the input.
3493 // Temporarily shrink the return type's number of elements.
3494 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3495
3496 IRBuilder<> IRB(&I);
3497 Value *S0 = getShadow(&I, 0);
3498
3499 /// For scalars:
3500 /// Since they are converting to and/or from floating-point, the output is:
3501 /// - fully uninitialized if *any* bit of the input is uninitialized
3502 /// - fully initialized if all bits of the input are initialized
3503 /// We apply the same principle on a per-field basis for vectors.
3504 Value *Shadow =
3505 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3506
3507 // The return type might have more elements than the input.
3508 // Extend the return type back to its original width if necessary.
3509 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3510
3511 setShadow(&I, FullShadow);
3512 setOriginForNaryOp(I);
3513 }
3514
3515 // Instrument x86 SSE vector convert intrinsic.
3516 //
3517 // This function instruments intrinsics like cvtsi2ss:
3518 // %Out = int_xxx_cvtyyy(%ConvertOp)
3519 // or
3520 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3521 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3522 // number of \p Out elements, and (if it has 2 arguments) copies the rest of
3523 // the elements from \p CopyOp.
3524 // In most cases conversion involves a floating-point value which may trigger a
3525 // hardware exception when not fully initialized. For this reason we require
3526 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3527 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3528 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3529 // return a fully initialized value.
3530 //
3531 // For Arm NEON vector convert intrinsics, see
3532 // handleNEONVectorConvertIntrinsic().
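  //
  // e.g., for <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float> %a, i32 %b) with
  // NumUsedElements == 1: %b (ConvertOp) is checked for full initialization,
  // and the result shadow is %a's (CopyOp's) shadow with element 0 zeroed,
  // since element 0 of the result is produced by the conversion.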
3533 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3534 bool HasRoundingMode = false) {
3535 IRBuilder<> IRB(&I);
3536 Value *CopyOp, *ConvertOp;
3537
3538 assert((!HasRoundingMode ||
3539 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3540 "Invalid rounding mode");
3541
3542 switch (I.arg_size() - HasRoundingMode) {
3543 case 2:
3544 CopyOp = I.getArgOperand(0);
3545 ConvertOp = I.getArgOperand(1);
3546 break;
3547 case 1:
3548 ConvertOp = I.getArgOperand(0);
3549 CopyOp = nullptr;
3550 break;
3551 default:
3552 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3553 }
3554
3555 // The first *NumUsedElements* elements of ConvertOp are converted to the
3556 // same number of output elements. The rest of the output is copied from
3557 // CopyOp, or (if not available) filled with zeroes.
3558 // Combine shadow for elements of ConvertOp that are used in this operation,
3559 // and insert a check.
3560 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3561 // int->any conversion.
3562 Value *ConvertShadow = getShadow(ConvertOp);
3563 Value *AggShadow = nullptr;
3564 if (ConvertOp->getType()->isVectorTy()) {
3565 AggShadow = IRB.CreateExtractElement(
3566 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3567 for (int i = 1; i < NumUsedElements; ++i) {
3568 Value *MoreShadow = IRB.CreateExtractElement(
3569 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3570 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3571 }
3572 } else {
3573 AggShadow = ConvertShadow;
3574 }
3575 assert(AggShadow->getType()->isIntegerTy());
3576 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3577
3578 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3579 // ConvertOp.
3580 if (CopyOp) {
3581 assert(CopyOp->getType() == I.getType());
3582 assert(CopyOp->getType()->isVectorTy());
3583 Value *ResultShadow = getShadow(CopyOp);
3584 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3585 for (int i = 0; i < NumUsedElements; ++i) {
3586 ResultShadow = IRB.CreateInsertElement(
3587 ResultShadow, ConstantInt::getNullValue(EltTy),
3588 ConstantInt::get(IRB.getInt32Ty(), i));
3589 }
3590 setShadow(&I, ResultShadow);
3591 setOrigin(&I, getOrigin(CopyOp));
3592 } else {
3593 setShadow(&I, getCleanShadow(&I));
3594 setOrigin(&I, getCleanOrigin());
3595 }
3596 }
3597
3598 // Given a scalar or vector, extract lower 64 bits (or less), and return all
3599 // zeroes if it is zero, and all ones otherwise.
3600 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3601 if (S->getType()->isVectorTy())
3602 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3603 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3604 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3605 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3606 }
3607
3608 // Given a vector, extract its first element, and return all
3609 // zeroes if it is zero, and all ones otherwise.
3610 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3611 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3612 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3613 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3614 }
3615
3616 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3617 Type *T = S->getType();
3618 assert(T->isVectorTy());
3619 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3620 return IRB.CreateSExt(S2, T);
3621 }
3622
3623 // Instrument vector shift intrinsic.
3624 //
3625 // This function instruments intrinsics like int_x86_avx2_psll_w.
3626 // Intrinsic shifts %In by %ShiftSize bits.
3627 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3628 // size, and the rest is ignored. Behavior is defined even if shift size is
3629 // greater than register (or field) width.
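  //
  // e.g., for <8 x i16> @llvm.x86.sse2.psll.w(<8 x i16> %a, <8 x i16> %count),
  // the shadow of %a is shifted by the concrete %count using the same
  // intrinsic, and if any of the low 64 bits of %count's shadow are set, the
  // whole result is additionally poisoned.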
3630 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3631 assert(I.arg_size() == 2);
3632 IRBuilder<> IRB(&I);
3633 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3634 // Otherwise perform the same shift on S1.
3635 Value *S1 = getShadow(&I, 0);
3636 Value *S2 = getShadow(&I, 1);
3637 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3638 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3639 Value *V1 = I.getOperand(0);
3640 Value *V2 = I.getOperand(1);
3641 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3642 {IRB.CreateBitCast(S1, V1->getType()), V2});
3643 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3644 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3645 setOriginForNaryOp(I);
3646 }
3647
3648 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3649 // vectors.
3650 Type *getMMXVectorTy(unsigned EltSizeInBits,
3651 unsigned X86_MMXSizeInBits = 64) {
3652 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3653 "Illegal MMX vector element size");
3654 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3655 X86_MMXSizeInBits / EltSizeInBits);
3656 }
3657
3658 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3659 // intrinsic.
3660 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3661 switch (id) {
3662 case Intrinsic::x86_sse2_packsswb_128:
3663 case Intrinsic::x86_sse2_packuswb_128:
3664 return Intrinsic::x86_sse2_packsswb_128;
3665
3666 case Intrinsic::x86_sse2_packssdw_128:
3667 case Intrinsic::x86_sse41_packusdw:
3668 return Intrinsic::x86_sse2_packssdw_128;
3669
3670 case Intrinsic::x86_avx2_packsswb:
3671 case Intrinsic::x86_avx2_packuswb:
3672 return Intrinsic::x86_avx2_packsswb;
3673
3674 case Intrinsic::x86_avx2_packssdw:
3675 case Intrinsic::x86_avx2_packusdw:
3676 return Intrinsic::x86_avx2_packssdw;
3677
3678 case Intrinsic::x86_mmx_packsswb:
3679 case Intrinsic::x86_mmx_packuswb:
3680 return Intrinsic::x86_mmx_packsswb;
3681
3682 case Intrinsic::x86_mmx_packssdw:
3683 return Intrinsic::x86_mmx_packssdw;
3684
3685 case Intrinsic::x86_avx512_packssdw_512:
3686 case Intrinsic::x86_avx512_packusdw_512:
3687 return Intrinsic::x86_avx512_packssdw_512;
3688
3689 case Intrinsic::x86_avx512_packsswb_512:
3690 case Intrinsic::x86_avx512_packuswb_512:
3691 return Intrinsic::x86_avx512_packsswb_512;
3692
3693 default:
3694 llvm_unreachable("unexpected intrinsic id");
3695 }
3696 }
3697
3698 // Instrument vector pack intrinsic.
3699 //
3700 // This function instruments intrinsics like x86_mmx_packsswb, that
3701 // packs elements of 2 input vectors into half as many bits with saturation.
3702 // Shadow is propagated with the signed variant of the same intrinsic applied
3703 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3704 // MMXEltSizeInBits is used only for x86mmx arguments.
3705 //
3706 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
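  //
  // e.g., for <16 x i8> @llvm.x86.sse2.packsswb.128(<8 x i16> %a, <8 x i16> %b),
  // each input word's shadow is first collapsed to 0x0000/0xFFFF via
  // sext(Sw != 0); signed-saturating packing of those values yields exactly
  // 0x00/0xFF, so an output byte is poisoned iff any bit of the corresponding
  // input word is poisoned.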
3707 void handleVectorPackIntrinsic(IntrinsicInst &I,
3708 unsigned MMXEltSizeInBits = 0) {
3709 assert(I.arg_size() == 2);
3710 IRBuilder<> IRB(&I);
3711 Value *S1 = getShadow(&I, 0);
3712 Value *S2 = getShadow(&I, 1);
3713 assert(S1->getType()->isVectorTy());
3714
3715 // SExt and ICmpNE below must apply to individual elements of input vectors.
3716 // In case of x86mmx arguments, cast them to appropriate vector types and
3717 // back.
3718 Type *T =
3719 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3720 if (MMXEltSizeInBits) {
3721 S1 = IRB.CreateBitCast(S1, T);
3722 S2 = IRB.CreateBitCast(S2, T);
3723 }
3724 Value *S1_ext =
3725 IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3726 Value *S2_ext =
3727 IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3728 if (MMXEltSizeInBits) {
3729 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3730 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3731 }
3732
3733 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3734 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3735 "_msprop_vector_pack");
3736 if (MMXEltSizeInBits)
3737 S = IRB.CreateBitCast(S, getShadowTy(&I));
3738 setShadow(&I, S);
3739 setOriginForNaryOp(I);
3740 }
3741
3742 // Convert `Mask` into `<n x i1>`.
3743 Constant *createDppMask(unsigned Width, unsigned Mask) {
3744 SmallVector<Constant *, 4> R(Width);
3745 for (auto &M : R) {
3746 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3747 Mask >>= 1;
3748 }
3749 return ConstantVector::get(R);
3750 }
3751
3752 // Calculate output shadow as array of booleans `<n x i1>`, assuming that if
3753 // any arg is poisoned, the entire dot product is poisoned.
3754 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3755 unsigned DstMask) {
3756 const unsigned Width =
3757 cast<FixedVectorType>(S->getType())->getNumElements();
3758
3759 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3760 Constant::getNullValue(S->getType()));
3761 Value *SElem = IRB.CreateOrReduce(S);
3762 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3763 Value *DstMaskV = createDppMask(Width, DstMask);
3764
3765 return IRB.CreateSelect(
3766 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3767 }
3768
3769 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3770 //
3771 // The 2 and 4 element versions produce a single dot-product scalar and then
3772 // put it into the elements of the output vector selected by the 4 lowest
3773 // bits of the mask. The top 4 bits of the mask control which elements of
3774 // the input to use for the dot product.
3775 //
3776 // The 8 element version's mask still has only 4 bits for input and 4 bits
3777 // for output. According to the spec it just operates as the 4 element
3778 // version on the first 4 elements of inputs and output, and then on the
3779 // last 4 elements of inputs and output.
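  //
  // e.g., with mask = 0x31: SrcMask = 0x3 (elements 0 and 1 feed the dot
  // product) and DstMask = 0x1 (only element 0 of the output is written), so
  // output element 0 is poisoned iff element 0 or 1 of either input is
  // poisoned, and the remaining (zeroed) output elements stay clean.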
3780 void handleDppIntrinsic(IntrinsicInst &I) {
3781 IRBuilder<> IRB(&I);
3782
3783 Value *S0 = getShadow(&I, 0);
3784 Value *S1 = getShadow(&I, 1);
3785 Value *S = IRB.CreateOr(S0, S1);
3786
3787 const unsigned Width =
3788 cast<FixedVectorType>(S->getType())->getNumElements();
3789 assert(Width == 2 || Width == 4 || Width == 8);
3790
3791 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3792 const unsigned SrcMask = Mask >> 4;
3793 const unsigned DstMask = Mask & 0xf;
3794
3795 // Calculate shadow as `<n x i1>`.
3796 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3797 if (Width == 8) {
3798 // First 4 elements of shadow are already calculated. findDppPoisonedOutput()
3799 // operates on 32-bit masks, so we can just shift the masks and repeat.
3800 SI1 = IRB.CreateOr(
3801 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3802 }
3803 // Extend to real size of shadow, poisoning either all or none bits of an
3804 // element.
3805 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3806
3807 setShadow(&I, S);
3808 setOriginForNaryOp(I);
3809 }
3810
3811 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3812 C = CreateAppToShadowCast(IRB, C);
3813 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3814 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3815 C = IRB.CreateAShr(C, ElSize - 1);
3816 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3817 return IRB.CreateTrunc(C, FVT);
3818 }
3819
3820 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3821 void handleBlendvIntrinsic(IntrinsicInst &I) {
3822 Value *C = I.getOperand(2);
3823 Value *T = I.getOperand(1);
3824 Value *F = I.getOperand(0);
3825
3826 Value *Sc = getShadow(&I, 2);
3827 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3828
3829 {
3830 IRBuilder<> IRB(&I);
3831 // Extract top bit from condition and its shadow.
3832 C = convertBlendvToSelectMask(IRB, C);
3833 Sc = convertBlendvToSelectMask(IRB, Sc);
3834
3835 setShadow(C, Sc);
3836 setOrigin(C, Oc);
3837 }
3838
3839 handleSelectLikeInst(I, C, T, F);
3840 }
3841
3842 // Instrument sum-of-absolute-differences intrinsic.
3843 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3844 const unsigned SignificantBitsPerResultElement = 16;
3845 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
3846 unsigned ZeroBitsPerResultElement =
3847 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3848
3849 IRBuilder<> IRB(&I);
3850 auto *Shadow0 = getShadow(&I, 0);
3851 auto *Shadow1 = getShadow(&I, 1);
3852 Value *S = IRB.CreateOr(Shadow0, Shadow1);
3853 S = IRB.CreateBitCast(S, ResTy);
3854 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
3855 ResTy);
3856 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
3857 S = IRB.CreateBitCast(S, getShadowTy(&I));
3858 setShadow(&I, S);
3859 setOriginForNaryOp(I);
3860 }
3861
3862 // Instrument multiply-add(-accumulate)? intrinsics.
3863 //
3864 // e.g., Two operands:
3865 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3866 //
3867 // Two operands which require an EltSizeInBits override:
3868 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3869 //
3870 // Three operands:
3871 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3872 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3873 // (this is equivalent to multiply-add on %a and %b, followed by
3874 // adding/"accumulating" %s. "Accumulation" stores the result in one
3875 // of the source registers, but this accumulate vs. add distinction
3876 // is lost when dealing with LLVM intrinsics.)
3877 void handleVectorPmaddIntrinsic(IntrinsicInst &I, unsigned ReductionFactor,
3878 unsigned EltSizeInBits = 0) {
3879 IRBuilder<> IRB(&I);
3880
3881 [[maybe_unused]] FixedVectorType *ReturnType =
3882 cast<FixedVectorType>(I.getType());
3883 assert(isa<FixedVectorType>(ReturnType));
3884
3885 // Vectors A and B, and shadows
3886 Value *Va = nullptr;
3887 Value *Vb = nullptr;
3888 Value *Sa = nullptr;
3889 Value *Sb = nullptr;
3890
3891 assert(I.arg_size() == 2 || I.arg_size() == 3);
3892 if (I.arg_size() == 2) {
3893 Va = I.getOperand(0);
3894 Vb = I.getOperand(1);
3895
3896 Sa = getShadow(&I, 0);
3897 Sb = getShadow(&I, 1);
3898 } else if (I.arg_size() == 3) {
3899 // Operand 0 is the accumulator. We will deal with that below.
3900 Va = I.getOperand(1);
3901 Vb = I.getOperand(2);
3902
3903 Sa = getShadow(&I, 1);
3904 Sb = getShadow(&I, 2);
3905 }
3906
3907 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
3908 assert(ParamType == Vb->getType());
3909
3910 assert(ParamType->getPrimitiveSizeInBits() ==
3911 ReturnType->getPrimitiveSizeInBits());
3912
3913 if (I.arg_size() == 3) {
3914 [[maybe_unused]] auto *AccumulatorType =
3915 cast<FixedVectorType>(I.getOperand(0)->getType());
3916 assert(AccumulatorType == ReturnType);
3917 }
3918
3919 FixedVectorType *ImplicitReturnType = ReturnType;
3920 // Step 1: instrument multiplication of corresponding vector elements
3921 if (EltSizeInBits) {
3922 ImplicitReturnType = cast<FixedVectorType>(
3923 getMMXVectorTy(EltSizeInBits * ReductionFactor,
3924 ParamType->getPrimitiveSizeInBits()));
3925 ParamType = cast<FixedVectorType>(
3926 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
3927
3928 Va = IRB.CreateBitCast(Va, ParamType);
3929 Vb = IRB.CreateBitCast(Vb, ParamType);
3930
3931 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
3932 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
3933 } else {
3934 assert(ParamType->getNumElements() ==
3935 ReturnType->getNumElements() * ReductionFactor);
3936 }
3937
3938 // Multiplying an *initialized* zero by an uninitialized element results in
3939 // an initialized zero element.
3940 //
3941 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
3942 // results in an unpoisoned value. We can therefore adapt the visitAnd()
3943 // instrumentation:
3944 // OutShadow = (SaNonZero & SbNonZero)
3945 // | (VaNonZero & SbNonZero)
3946 // | (SaNonZero & VbNonZero)
3947 // where non-zero is checked on a per-element basis (not per bit).
3948 Value *SZero = Constant::getNullValue(Va->getType());
3949 Value *VZero = Constant::getNullValue(Sa->getType());
3950 Value *SaNonZero = IRB.CreateICmpNE(Sa, SZero);
3951 Value *SbNonZero = IRB.CreateICmpNE(Sb, SZero);
3952 Value *VaNonZero = IRB.CreateICmpNE(Va, VZero);
3953 Value *VbNonZero = IRB.CreateICmpNE(Vb, VZero);
3954
3955 Value *SaAndSbNonZero = IRB.CreateAnd(SaNonZero, SbNonZero);
3956 Value *VaAndSbNonZero = IRB.CreateAnd(VaNonZero, SbNonZero);
3957 Value *SaAndVbNonZero = IRB.CreateAnd(SaNonZero, VbNonZero);
3958
3959 // Each element of the vector is represented by a single bit (poisoned or
3960 // not) e.g., <8 x i1>.
3961 Value *And = IRB.CreateOr({SaAndSbNonZero, VaAndSbNonZero, SaAndVbNonZero});
3962
3963 // Extend <8 x i1> to <8 x i16>.
3964 // (The real pmadd intrinsic would have computed intermediate values of
3965 // <8 x i32>, but that is irrelevant for our shadow purposes because we
3966 // consider each element to be either fully initialized or fully
3967 // uninitialized.)
3968 And = IRB.CreateSExt(And, Sa->getType());
3969
3970 // Step 2: instrument horizontal add
3971 // We don't need bit-precise horizontalReduce because we only want to check
3972 // if each pair/quad of elements is fully zero.
3973 // Cast to <4 x i32>.
3974 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
3975
3976 // Compute <4 x i1>, then extend back to <4 x i32>.
3977 Value *OutShadow = IRB.CreateSExt(
3978 IRB.CreateICmpNE(Horizontal,
3979 Constant::getNullValue(Horizontal->getType())),
3980 ImplicitReturnType);
3981
3982 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
3983 // AVX, it is already correct).
3984 if (EltSizeInBits)
3985 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
3986
3987 // Step 3 (if applicable): instrument accumulator
3988 if (I.arg_size() == 3)
3989 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
3990
3991 setShadow(&I, OutShadow);
3992 setOriginForNaryOp(I);
3993 }
3994
3995 // Instrument compare-packed intrinsic.
3996 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
3997 // all-ones shadow.
3998 void handleVectorComparePackedIntrinsic(IntrinsicInst &I) {
3999 IRBuilder<> IRB(&I);
4000 Type *ResTy = getShadowTy(&I);
4001 auto *Shadow0 = getShadow(&I, 0);
4002 auto *Shadow1 = getShadow(&I, 1);
4003 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4004 Value *S = IRB.CreateSExt(
4005 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4006 setShadow(&I, S);
4007 setOriginForNaryOp(I);
4008 }
4009
4010 // Instrument compare-scalar intrinsic.
4011 // This handles both cmp* intrinsics which return the result in the first
4012 // element of a vector, and comi* which return the result as i32.
4013 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4014 IRBuilder<> IRB(&I);
4015 auto *Shadow0 = getShadow(&I, 0);
4016 auto *Shadow1 = getShadow(&I, 1);
4017 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4018 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4019 setShadow(&I, S);
4020 setOriginForNaryOp(I);
4021 }
4022
4023 // Instrument generic vector reduction intrinsics
4024 // by ORing together all their fields.
4025 //
4026 // If AllowShadowCast is true, the return type does not need to be the same
4027 // type as the fields
4028 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4029 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4030 assert(I.arg_size() == 1);
4031
4032 IRBuilder<> IRB(&I);
4033 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4034 if (AllowShadowCast)
4035 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4036 else
4037 assert(S->getType() == getShadowTy(&I));
4038 setShadow(&I, S);
4039 setOriginForNaryOp(I);
4040 }
4041
4042 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4043 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4044 // %a1)
4045 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4046 //
4047 // The type of the return value, initial starting value, and elements of the
4048 // vector must be identical.
4049 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4050 assert(I.arg_size() == 2);
4051
4052 IRBuilder<> IRB(&I);
4053 Value *Shadow0 = getShadow(&I, 0);
4054 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4055 assert(Shadow0->getType() == Shadow1->getType());
4056 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4057 assert(S->getType() == getShadowTy(&I));
4058 setShadow(&I, S);
4059 setOriginForNaryOp(I);
4060 }
4061
4062 // Instrument vector.reduce.or intrinsic.
4063 // Valid (non-poisoned) set bits in the operand pull low the
4064 // corresponding shadow bits.
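  // e.g., if one element of the operand is fully initialized and all-ones,
  // every bit of the reduction result is known to be 1, so the result shadow
  // is fully clean regardless of the other elements' shadows.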
4065 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4066 assert(I.arg_size() == 1);
4067
4068 IRBuilder<> IRB(&I);
4069 Value *OperandShadow = getShadow(&I, 0);
4070 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4071 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4072 // Bit N is clean if any field's bit N is 1 and unpoisoned
4073 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4074 // Otherwise, it is clean if every field's bit N is unpoisoned
4075 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4076 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4077
4078 setShadow(&I, S);
4079 setOrigin(&I, getOrigin(&I, 0));
4080 }
4081
4082 // Instrument vector.reduce.and intrinsic.
4083 // Valid (non-poisoned) unset bits in the operand pull down the
4084 // corresponding shadow bits.
4085 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4086 assert(I.arg_size() == 1);
4087
4088 IRBuilder<> IRB(&I);
4089 Value *OperandShadow = getShadow(&I, 0);
4090 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4091 // Bit N is clean if any field's bit N is 0 and unpoisoned
4092 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4093 // Otherwise, it is clean if every field's bit N is unpoisoned
4094 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4095 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4096
4097 setShadow(&I, S);
4098 setOrigin(&I, getOrigin(&I, 0));
4099 }
4100
4101 void handleStmxcsr(IntrinsicInst &I) {
4102 IRBuilder<> IRB(&I);
4103 Value *Addr = I.getArgOperand(0);
4104 Type *Ty = IRB.getInt32Ty();
4105 Value *ShadowPtr =
4106 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4107
4108 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4109
4110 if (ClCheckAccessAddress)
4111 insertCheckShadowOf(Addr, &I);
4112 }
4113
4114 void handleLdmxcsr(IntrinsicInst &I) {
4115 if (!InsertChecks)
4116 return;
4117
4118 IRBuilder<> IRB(&I);
4119 Value *Addr = I.getArgOperand(0);
4120 Type *Ty = IRB.getInt32Ty();
4121 const Align Alignment = Align(1);
4122 Value *ShadowPtr, *OriginPtr;
4123 std::tie(ShadowPtr, OriginPtr) =
4124 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4125
4126 if (ClCheckAccessAddress)
4127 insertCheckShadowOf(Addr, &I);
4128
4129 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4130 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4131 : getCleanOrigin();
4132 insertCheckShadow(Shadow, Origin, &I);
4133 }
4134
4135 void handleMaskedExpandLoad(IntrinsicInst &I) {
4136 IRBuilder<> IRB(&I);
4137 Value *Ptr = I.getArgOperand(0);
4138 MaybeAlign Align = I.getParamAlign(0);
4139 Value *Mask = I.getArgOperand(1);
4140 Value *PassThru = I.getArgOperand(2);
4141
4142 if (ClCheckAccessAddress) {
4143 insertCheckShadowOf(Ptr, &I);
4144 insertCheckShadowOf(Mask, &I);
4145 }
4146
4147 if (!PropagateShadow) {
4148 setShadow(&I, getCleanShadow(&I));
4149 setOrigin(&I, getCleanOrigin());
4150 return;
4151 }
4152
4153 Type *ShadowTy = getShadowTy(&I);
4154 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4155 auto [ShadowPtr, OriginPtr] =
4156 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4157
4158 Value *Shadow =
4159 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4160 getShadow(PassThru), "_msmaskedexpload");
4161
4162 setShadow(&I, Shadow);
4163
4164 // TODO: Store origins.
4165 setOrigin(&I, getCleanOrigin());
4166 }
4167
4168 void handleMaskedCompressStore(IntrinsicInst &I) {
4169 IRBuilder<> IRB(&I);
4170 Value *Values = I.getArgOperand(0);
4171 Value *Ptr = I.getArgOperand(1);
4172 MaybeAlign Align = I.getParamAlign(1);
4173 Value *Mask = I.getArgOperand(2);
4174
4175 if (ClCheckAccessAddress) {
4176 insertCheckShadowOf(Ptr, &I);
4177 insertCheckShadowOf(Mask, &I);
4178 }
4179
4180 Value *Shadow = getShadow(Values);
4181 Type *ElementShadowTy =
4182 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4183 auto [ShadowPtr, OriginPtrs] =
4184 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4185
4186 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4187
4188 // TODO: Store origins.
4189 }
4190
4191 void handleMaskedGather(IntrinsicInst &I) {
4192 IRBuilder<> IRB(&I);
4193 Value *Ptrs = I.getArgOperand(0);
4194 const Align Alignment(
4195 cast<ConstantInt>(I.getArgOperand(1))->getZExtValue());
4196 Value *Mask = I.getArgOperand(2);
4197 Value *PassThru = I.getArgOperand(3);
4198
4199 Type *PtrsShadowTy = getShadowTy(Ptrs);
4200 if (ClCheckAccessAddress) {
4201 insertCheckShadowOf(Mask, &I);
4202 Value *MaskedPtrShadow = IRB.CreateSelect(
4203 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4204 "_msmaskedptrs");
4205 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4206 }
4207
4208 if (!PropagateShadow) {
4209 setShadow(&I, getCleanShadow(&I));
4210 setOrigin(&I, getCleanOrigin());
4211 return;
4212 }
4213
4214 Type *ShadowTy = getShadowTy(&I);
4215 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4216 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4217 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4218
4219 Value *Shadow =
4220 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4221 getShadow(PassThru), "_msmaskedgather");
4222
4223 setShadow(&I, Shadow);
4224
4225 // TODO: Store origins.
4226 setOrigin(&I, getCleanOrigin());
4227 }
4228
4229 void handleMaskedScatter(IntrinsicInst &I) {
4230 IRBuilder<> IRB(&I);
4231 Value *Values = I.getArgOperand(0);
4232 Value *Ptrs = I.getArgOperand(1);
4233 const Align Alignment(
4234 cast<ConstantInt>(I.getArgOperand(2))->getZExtValue());
4235 Value *Mask = I.getArgOperand(3);
4236
4237 Type *PtrsShadowTy = getShadowTy(Ptrs);
4238 if (ClCheckAccessAddress) {
4239 insertCheckShadowOf(Mask, &I);
4240 Value *MaskedPtrShadow = IRB.CreateSelect(
4241 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4242 "_msmaskedptrs");
4243 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4244 }
4245
4246 Value *Shadow = getShadow(Values);
4247 Type *ElementShadowTy =
4248 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4249 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4250 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4251
4252 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4253
4254 // TODO: Store origin.
4255 }
4256
4257 // Intrinsic::masked_store
4258 //
4259 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4260 // stores are lowered to Intrinsic::masked_store.
4261 void handleMaskedStore(IntrinsicInst &I) {
4262 IRBuilder<> IRB(&I);
4263 Value *V = I.getArgOperand(0);
4264 Value *Ptr = I.getArgOperand(1);
4265 const Align Alignment(
4266 cast<ConstantInt>(I.getArgOperand(2))->getZExtValue());
4267 Value *Mask = I.getArgOperand(3);
4268 Value *Shadow = getShadow(V);
4269
4270 if (ClCheckAccessAddress) {
4271 insertCheckShadowOf(Ptr, &I);
4272 insertCheckShadowOf(Mask, &I);
4273 }
4274
4275 Value *ShadowPtr;
4276 Value *OriginPtr;
4277 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4278 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4279
4280 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4281
4282 if (!MS.TrackOrigins)
4283 return;
4284
4285 auto &DL = F.getDataLayout();
4286 paintOrigin(IRB, getOrigin(V), OriginPtr,
4287 DL.getTypeStoreSize(Shadow->getType()),
4288 std::max(Alignment, kMinOriginAlignment));
4289 }
4290
4291 // Intrinsic::masked_load
4292 //
4293 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4294 // loads are lowered to Intrinsic::masked_load.
4295 void handleMaskedLoad(IntrinsicInst &I) {
4296 IRBuilder<> IRB(&I);
4297 Value *Ptr = I.getArgOperand(0);
4298 const Align Alignment(
4299 cast<ConstantInt>(I.getArgOperand(1))->getZExtValue());
4300 Value *Mask = I.getArgOperand(2);
4301 Value *PassThru = I.getArgOperand(3);
4302
4303 if (ClCheckAccessAddress) {
4304 insertCheckShadowOf(Ptr, &I);
4305 insertCheckShadowOf(Mask, &I);
4306 }
4307
4308 if (!PropagateShadow) {
4309 setShadow(&I, getCleanShadow(&I));
4310 setOrigin(&I, getCleanOrigin());
4311 return;
4312 }
4313
4314 Type *ShadowTy = getShadowTy(&I);
4315 Value *ShadowPtr, *OriginPtr;
4316 std::tie(ShadowPtr, OriginPtr) =
4317 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4318 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4319 getShadow(PassThru), "_msmaskedld"));
4320
4321 if (!MS.TrackOrigins)
4322 return;
4323
4324 // Choose between PassThru's and the loaded value's origins.
4325 Value *MaskedPassThruShadow = IRB.CreateAnd(
4326 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4327
4328 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4329
4330 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4331 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4332
4333 setOrigin(&I, Origin);
4334 }
4335
4336 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4337 // dst mask src
4338 //
4339 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4340 // by handleMaskedStore.
4341 //
4342 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4343 // vector of integers, unlike the LLVM masked intrinsics, which require a
4344 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4345 // mentions that the x86 backend does not know how to efficiently convert
4346 // from a vector of booleans back into the AVX mask format; therefore, they
4347 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4348 // intrinsics.
4349 void handleAVXMaskedStore(IntrinsicInst &I) {
4350 assert(I.arg_size() == 3);
4351
4352 IRBuilder<> IRB(&I);
4353
4354 Value *Dst = I.getArgOperand(0);
4355 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4356
4357 Value *Mask = I.getArgOperand(1);
4358 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4359
4360 Value *Src = I.getArgOperand(2);
4361 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4362
4363 const Align Alignment = Align(1);
4364
4365 Value *SrcShadow = getShadow(Src);
4366
4367 if (ClCheckAccessAddress) {
4368 insertCheckShadowOf(Dst, &I);
4369 insertCheckShadowOf(Mask, &I);
4370 }
4371
4372 Value *DstShadowPtr;
4373 Value *DstOriginPtr;
4374 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4375 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4376
4377 SmallVector<Value *, 2> ShadowArgs;
4378 ShadowArgs.append(1, DstShadowPtr);
4379 ShadowArgs.append(1, Mask);
4380 // The intrinsic may require floating-point but shadows can be arbitrary
4381 // bit patterns, of which some would be interpreted as "invalid"
4382 // floating-point values (NaN etc.); we assume the intrinsic will happily
4383 // copy them.
4384 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4385
4386 CallInst *CI =
4387 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4388 setShadow(&I, CI);
4389
4390 if (!MS.TrackOrigins)
4391 return;
4392
4393 // Approximation only
4394 auto &DL = F.getDataLayout();
4395 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4396 DL.getTypeStoreSize(SrcShadow->getType()),
4397 std::max(Alignment, kMinOriginAlignment));
4398 }
4399
4400 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4401 // return src mask
4402 //
4403 // Masked-off values are replaced with 0, which conveniently also represents
4404 // initialized memory.
4405 //
4406 // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4407 // by handleMaskedLoad.
4408 //
4409 // We do not combine this with handleMaskedLoad; see comment in
4410 // handleAVXMaskedStore for the rationale.
4411 //
4412 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4413 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4414 // parameter.
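  //
  // The shadow is loaded by applying the same maskload intrinsic to the shadow
  // address with the original mask, roughly
  //   <8 x float> @llvm.x86.avx.maskload.ps.256(ptr %shadow_addr, <8 x i32> %mask)
  // and then bitcasting the result to the integer shadow type.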
4415 void handleAVXMaskedLoad(IntrinsicInst &I) {
4416 assert(I.arg_size() == 2);
4417
4418 IRBuilder<> IRB(&I);
4419
4420 Value *Src = I.getArgOperand(0);
4421 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4422
4423 Value *Mask = I.getArgOperand(1);
4424 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4425
4426 const Align Alignment = Align(1);
4427
4428 if (ClCheckAccessAddress) {
4429 insertCheckShadowOf(Mask, &I);
4430 }
4431
4432 Type *SrcShadowTy = getShadowTy(Src);
4433 Value *SrcShadowPtr, *SrcOriginPtr;
4434 std::tie(SrcShadowPtr, SrcOriginPtr) =
4435 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4436
4437 SmallVector<Value *, 2> ShadowArgs;
4438 ShadowArgs.append(1, SrcShadowPtr);
4439 ShadowArgs.append(1, Mask);
4440
4441 CallInst *CI =
4442 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4443 // The AVX masked load intrinsics do not have integer variants. We use the
4444 // floating-point variants, which will happily copy the shadows even if
4445 // they are interpreted as "invalid" floating-point values (NaN etc.).
4446 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4447
4448 if (!MS.TrackOrigins)
4449 return;
4450
4451 // The "pass-through" value is always zero (initialized). To the extent
4452 // that this results in initialized aligned 4-byte chunks, the origin value
4453 // is ignored. It is therefore correct to simply copy the origin from src.
4454 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4455 setOrigin(&I, PtrSrcOrigin);
4456 }
4457
4458 // Test whether the mask indices are initialized, only checking the bits that
4459 // are actually used.
4460 //
4461 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4462 // used/checked.
4463 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4464 assert(isFixedIntVector(Idx));
4465 auto IdxVectorSize =
4466 cast<FixedVectorType>(Idx->getType())->getNumElements();
4467 assert(isPowerOf2_64(IdxVectorSize));
4468
4469 // Compiler isn't smart enough, let's help it
4470 if (isa<Constant>(Idx))
4471 return;
4472
4473 auto *IdxShadow = getShadow(Idx);
4474 Value *Truncated = IRB.CreateTrunc(
4475 IdxShadow,
4476 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4477 IdxVectorSize));
4478 insertCheckShadow(Truncated, getOrigin(Idx), I);
4479 }
4480
4481 // Instrument AVX permutation intrinsic.
4482 // We apply the same permutation (argument index 1) to the shadow.
4483 void handleAVXVpermilvar(IntrinsicInst &I) {
4484 IRBuilder<> IRB(&I);
4485 Value *Shadow = getShadow(&I, 0);
4486 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4487
4488 // Shadows are integer-ish types but some intrinsics require a
4489 // different (e.g., floating-point) type.
4490 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4491 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4492 {Shadow, I.getArgOperand(1)});
4493
4494 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4495 setOriginForNaryOp(I);
4496 }
4497
4498 // Instrument AVX permutation intrinsic.
4499 // We apply the same permutation (argument index 1) to the shadows.
4500 void handleAVXVpermi2var(IntrinsicInst &I) {
4501 assert(I.arg_size() == 3);
4502 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4503 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4504 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4505 [[maybe_unused]] auto ArgVectorSize =
4506 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4507 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4508 ->getNumElements() == ArgVectorSize);
4509 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4510 ->getNumElements() == ArgVectorSize);
4511 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4512 assert(I.getType() == I.getArgOperand(0)->getType());
4513 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4514 IRBuilder<> IRB(&I);
4515 Value *AShadow = getShadow(&I, 0);
4516 Value *Idx = I.getArgOperand(1);
4517 Value *BShadow = getShadow(&I, 2);
4518
4519 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4520
4521 // Shadows are integer-ish types but some intrinsics require a
4522 // different (e.g., floating-point) type.
4523 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4524 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4525 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4526 {AShadow, Idx, BShadow});
4527 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4528 setOriginForNaryOp(I);
4529 }
4530
4531 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4532 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4533 }
4534
4535 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4536 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4537 }
4538
4539 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4540 return isFixedIntVectorTy(V->getType());
4541 }
4542
4543 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4544 return isFixedFPVectorTy(V->getType());
4545 }
4546
4547 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4548 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4549 // i32 rounding)
4550 //
4551 // Inconveniently, some similar intrinsics have a different operand order:
4552 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4553 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4554 // i16 mask)
4555 //
4556 // If the return type has more elements than A, the excess elements are
4557 // zeroed (and the corresponding shadow is initialized).
4558 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4559 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4560 // i8 mask)
4561 //
4562 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4563 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4564 // where all_or_nothing(x) is fully uninitialized if x has any
4565 // uninitialized bits
4566 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4567 IRBuilder<> IRB(&I);
4568
4569 assert(I.arg_size() == 4);
4570 Value *A = I.getOperand(0);
4571 Value *WriteThrough;
4572 Value *Mask;
4573 Value *RoundingMode;
4574 if (LastMask) {
4575 WriteThrough = I.getOperand(2);
4576 Mask = I.getOperand(3);
4577 RoundingMode = I.getOperand(1);
4578 } else {
4579 WriteThrough = I.getOperand(1);
4580 Mask = I.getOperand(2);
4581 RoundingMode = I.getOperand(3);
4582 }
4583
4584 assert(isFixedFPVector(A));
4585 assert(isFixedIntVector(WriteThrough));
4586
4587 unsigned ANumElements =
4588 cast<FixedVectorType>(A->getType())->getNumElements();
4589 [[maybe_unused]] unsigned WriteThruNumElements =
4590 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4591 assert(ANumElements == WriteThruNumElements ||
4592 ANumElements * 2 == WriteThruNumElements);
4593
4594 assert(Mask->getType()->isIntegerTy());
4595 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4596 assert(ANumElements == MaskNumElements ||
4597 ANumElements * 2 == MaskNumElements);
4598
4599 assert(WriteThruNumElements == MaskNumElements);
4600
4601 // Some bits of the mask may be unused, though it's unusual to have partly
4602 // uninitialized bits.
4603 insertCheckShadowOf(Mask, &I);
4604
4605 assert(RoundingMode->getType()->isIntegerTy());
4606 // Only some bits of the rounding mode are used, though it's very
4607 // unusual to have uninitialized bits there (more commonly, it's a
4608 // constant).
4609 insertCheckShadowOf(RoundingMode, &I);
4610
4611 assert(I.getType() == WriteThrough->getType());
4612
4613 Value *AShadow = getShadow(A);
4614 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4615
4616 if (ANumElements * 2 == MaskNumElements) {
4617 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4618 // from the zeroed shadow instead of the writethrough's shadow.
4619 Mask =
4620 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4621 Mask =
4622 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4623 }
4624
4625 // Convert the integer mask to a vector of i1 (e.g., an i16 mask to <16 x i1>)
4626 Mask = IRB.CreateBitCast(
4627 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4628 "_ms_mask_bitcast");
4629
4630 /// For floating-point to integer conversion, the output is:
4631 /// - fully uninitialized if *any* bit of the input is uninitialized
4632 /// - fully initialized if all bits of the input are initialized
4633 /// We apply the same principle on a per-element basis for vectors.
4634 ///
4635 /// We use the scalar width of the return type instead of A's.
4636 AShadow = IRB.CreateSExt(
4637 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4638 getShadowTy(&I), "_ms_a_shadow");
4639
4640 Value *WriteThroughShadow = getShadow(WriteThrough);
4641 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4642 "_ms_writethru_select");
4643
4644 setShadow(&I, Shadow);
4645 setOriginForNaryOp(I);
4646 }
4647
4648 // Instrument BMI / BMI2 intrinsics.
4649 // All of these intrinsics are Z = I(X, Y)
4650 // where the types of all operands and the result match, and are either i32 or
4651 // i64. The following instrumentation happens to work for all of them:
4652 // Sz = I(Sx, Y) | (sext (Sy != 0))
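// Illustrative example (shadow values assumed): for Z = pext(X, Y),
// if Y is fully initialized (Sy == 0) the result shadow is just
// pext(Sx, Y), i.e., the shadow bits of X compacted by the same mask;
// if any bit of Y is uninitialized (Sy != 0), the sext term turns the
// whole result shadow into all-ones (fully uninitialized).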
4653 void handleBmiIntrinsic(IntrinsicInst &I) {
4654 IRBuilder<> IRB(&I);
4655 Type *ShadowTy = getShadowTy(&I);
4656
4657 // If any bit of the mask operand is poisoned, then the whole thing is.
4658 Value *SMask = getShadow(&I, 1);
4659 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4660 ShadowTy);
4661 // Apply the same intrinsic to the shadow of the first operand.
4662 Value *S = IRB.CreateCall(I.getCalledFunction(),
4663 {getShadow(&I, 0), I.getOperand(1)});
4664 S = IRB.CreateOr(SMask, S);
4665 setShadow(&I, S);
4666 setOriginForNaryOp(I);
4667 }
4668
4669 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4670 SmallVector<int, 8> Mask;
4671 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4672 Mask.append(2, X);
4673 }
4674 return Mask;
4675 }
4676
4677 // Instrument pclmul intrinsics.
4678 // These intrinsics operate either on odd or on even elements of the input
4679 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4680 // Replace the unused elements with copies of the used ones, ex:
4681 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4682 // or
4683 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4684 // and then apply the usual shadow combining logic.
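// For instance, getPclmulMask(4, /*OddElements=*/false) yields {0, 0, 2, 2}
// and getPclmulMask(4, /*OddElements=*/true) yields {1, 1, 3, 3}.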
4685 void handlePclmulIntrinsic(IntrinsicInst &I) {
4686 IRBuilder<> IRB(&I);
4687 unsigned Width =
4688 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4689 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4690 "pclmul 3rd operand must be a constant");
4691 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4692 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4693 getPclmulMask(Width, Imm & 0x01));
4694 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4695 getPclmulMask(Width, Imm & 0x10));
4696 ShadowAndOriginCombiner SOC(this, IRB);
4697 SOC.Add(Shuf0, getOrigin(&I, 0));
4698 SOC.Add(Shuf1, getOrigin(&I, 1));
4699 SOC.Done(&I);
4700 }
4701
4702 // Instrument _mm_*_sd|ss intrinsics
4703 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4704 IRBuilder<> IRB(&I);
4705 unsigned Width =
4706 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4707 Value *First = getShadow(&I, 0);
4708 Value *Second = getShadow(&I, 1);
4709 // First element of second operand, remaining elements of first operand
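// e.g., for Width == 4 the shuffle mask is <4, 1, 2, 3>: index 4 selects
// element 0 of Second, and indices 1..3 keep elements 1..3 of First.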
4710 SmallVector<int, 16> Mask;
4711 Mask.push_back(Width);
4712 for (unsigned i = 1; i < Width; i++)
4713 Mask.push_back(i);
4714 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4715
4716 setShadow(&I, Shadow);
4717 setOriginForNaryOp(I);
4718 }
4719
4720 void handleVtestIntrinsic(IntrinsicInst &I) {
4721 IRBuilder<> IRB(&I);
4722 Value *Shadow0 = getShadow(&I, 0);
4723 Value *Shadow1 = getShadow(&I, 1);
4724 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4725 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4726 Value *Scalar = convertShadowToScalar(NZ, IRB);
4727 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4728
4729 setShadow(&I, Shadow);
4730 setOriginForNaryOp(I);
4731 }
4732
4733 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4734 IRBuilder<> IRB(&I);
4735 unsigned Width =
4736 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4737 Value *First = getShadow(&I, 0);
4738 Value *Second = getShadow(&I, 1);
4739 Value *OrShadow = IRB.CreateOr(First, Second);
4740 // First element of both OR'd together, remaining elements of first operand
4741 SmallVector<int, 16> Mask;
4742 Mask.push_back(Width);
4743 for (unsigned i = 1; i < Width; i++)
4744 Mask.push_back(i);
4745 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4746
4747 setShadow(&I, Shadow);
4748 setOriginForNaryOp(I);
4749 }
4750
4751 // _mm_round_pd / _mm_round_ps.
4752 // Similar to maybeHandleSimpleNomemIntrinsic except
4753 // the second argument is guaranteed to be a constant integer.
4754 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4755 assert(I.getArgOperand(0)->getType() == I.getType());
4756 assert(I.arg_size() == 2);
4757 assert(isa<ConstantInt>(I.getArgOperand(1)));
4758
4759 IRBuilder<> IRB(&I);
4760 ShadowAndOriginCombiner SC(this, IRB);
4761 SC.Add(I.getArgOperand(0));
4762 SC.Done(&I);
4763 }
4764
4765 // Instrument @llvm.abs intrinsic.
4766 //
4767 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4768 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
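// Shadow sketch (worked example, i8 width chosen for illustration): if
// <is_int_min_poison> is true and Src == -128 (INT8_MIN), abs(-128) is
// poison, so the selects below force a fully uninitialized shadow for such
// lanes; otherwise the result simply inherits Src's shadow.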
4769 void handleAbsIntrinsic(IntrinsicInst &I) {
4770 assert(I.arg_size() == 2);
4771 Value *Src = I.getArgOperand(0);
4772 Value *IsIntMinPoison = I.getArgOperand(1);
4773
4774 assert(I.getType()->isIntOrIntVectorTy());
4775
4776 assert(Src->getType() == I.getType());
4777
4778 assert(IsIntMinPoison->getType()->isIntegerTy());
4779 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4780
4781 IRBuilder<> IRB(&I);
4782 Value *SrcShadow = getShadow(Src);
4783
4784 APInt MinVal =
4785 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
4786 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
4787 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
4788
4789 Value *PoisonedShadow = getPoisonedShadow(Src);
4790 Value *PoisonedIfIntMinShadow =
4791 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
4792 Value *Shadow =
4793 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
4794
4795 setShadow(&I, Shadow);
4796 setOrigin(&I, getOrigin(&I, 0));
4797 }
4798
4799 void handleIsFpClass(IntrinsicInst &I) {
4800 IRBuilder<> IRB(&I);
4801 Value *Shadow = getShadow(&I, 0);
4802 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
4803 setOrigin(&I, getOrigin(&I, 0));
4804 }
4805
4806 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4807 IRBuilder<> IRB(&I);
4808 Value *Shadow0 = getShadow(&I, 0);
4809 Value *Shadow1 = getShadow(&I, 1);
4810 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
4811 Value *ShadowElt1 =
4812 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
4813
4814 Value *Shadow = PoisonValue::get(getShadowTy(&I));
4815 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
4816 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
4817
4818 setShadow(&I, Shadow);
4819 setOriginForNaryOp(I);
4820 }
4821
4822 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4823 assert(isa<FixedVectorType>(V->getType()));
4824 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4825 Value *Shadow = getShadow(V);
4826 return IRB.CreateExtractElement(Shadow,
4827 ConstantInt::get(IRB.getInt32Ty(), 0));
4828 }
4829
4830 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4831 //
4832 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4833 // (<8 x i64>, <16 x i8>, i8)
4834 // A WriteThru Mask
4835 //
4836 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4837 // (<16 x i32>, <16 x i8>, i16)
4838 //
4839 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4840 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4841 //
4842 // If Dst has more elements than A, the excess elements are zeroed (and the
4843 // corresponding shadow is initialized).
4844 //
4845 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4846 // and is much faster than this handler.
4847 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4848 IRBuilder<> IRB(&I);
4849
4850 assert(I.arg_size() == 3);
4851 Value *A = I.getOperand(0);
4852 Value *WriteThrough = I.getOperand(1);
4853 Value *Mask = I.getOperand(2);
4854
4855 assert(isFixedIntVector(A));
4856 assert(isFixedIntVector(WriteThrough));
4857
4858 unsigned ANumElements =
4859 cast<FixedVectorType>(A->getType())->getNumElements();
4860 unsigned OutputNumElements =
4861 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4862 assert(ANumElements == OutputNumElements ||
4863 ANumElements * 2 == OutputNumElements);
4864
4865 assert(Mask->getType()->isIntegerTy());
4866 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4867 insertCheckShadowOf(Mask, &I);
4868
4869 assert(I.getType() == WriteThrough->getType());
4870
4871 // Widen the mask, if necessary, to have one bit per element of the output
4872 // vector.
4873 // We want the extra bits to have '1's, so that the CreateSelect will
4874 // select the values from AShadow instead of WriteThroughShadow ("maskless"
4875 // versions of the intrinsics are sometimes implemented using an all-1's
4876 // mask and an undefined value for WriteThroughShadow). We accomplish this
4877 // by using bitwise NOT before and after the ZExt.
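// e.g., widening an i8 mask %m to i16 this way produces (0xFF00 | zext %m):
// the low 8 bits are the original mask and the 8 extra high bits are all
// 1's.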
4878 if (ANumElements != OutputNumElements) {
4879 Mask = IRB.CreateNot(Mask);
4880 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
4881 "_ms_widen_mask");
4882 Mask = IRB.CreateNot(Mask);
4883 }
4884 Mask = IRB.CreateBitCast(
4885 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4886
4887 Value *AShadow = getShadow(A);
4888
4889 // The return type might have more elements than the input.
4890 // Temporarily shrink the return type's number of elements.
4891 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
4892
4893 // PMOV truncates; PMOVS/PMOVUS use signed/unsigned saturation.
4894 // This handler treats them all as truncation, which leads to some rare
4895 // false positives in the cases where the truncated bytes could
4896 // unambiguously saturate the value e.g., if A = ??????10 ????????
4897 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
4898 // fully defined, but the truncated byte is ????????.
4899 //
4900 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
4901 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
4902 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4903
4904 Value *WriteThroughShadow = getShadow(WriteThrough);
4905
4906 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
4907 setShadow(&I, Shadow);
4908 setOriginForNaryOp(I);
4909 }
4910
4911 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
4912 // values and perform an operation whose shadow propagation should be handled
4913 // as all-or-nothing [*], with masking provided by a vector and a mask
4914 // supplied as an integer.
4915 //
4916 // [*] if all bits of a vector element are initialized, the output is fully
4917 // initialized; otherwise, the output is fully uninitialized
4918 //
4919 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
4920 // (<16 x float>, <16 x float>, i16)
4921 // A WriteThru Mask
4922 //
4923 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
4924 // (<2 x double>, <2 x double>, i8)
4925 //
4926 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
4927 // (<8 x double>, i32, <8 x double>, i8, i32)
4928 // A Imm WriteThru Mask Rounding
4929 //
4930 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
4931 // be fully initialized.
4932 //
4933 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
4934 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
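// Illustrative example (element index assumed): for
// @llvm.x86.avx512.rsqrt14.ps.512, if any bit of A[3] is uninitialized and
// Mask[3] is set, Dst_shadow[3] becomes all-ones; if Mask[3] is clear,
// Dst_shadow[3] is WriteThru_shadow[3] regardless of A_shadow[3].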
4935 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
4936 unsigned WriteThruIndex,
4937 unsigned MaskIndex) {
4938 IRBuilder<> IRB(&I);
4939
4940 unsigned NumArgs = I.arg_size();
4941 assert(AIndex < NumArgs);
4942 assert(WriteThruIndex < NumArgs);
4943 assert(MaskIndex < NumArgs);
4944 assert(AIndex != WriteThruIndex);
4945 assert(AIndex != MaskIndex);
4946 assert(WriteThruIndex != MaskIndex);
4947
4948 Value *A = I.getOperand(AIndex);
4949 Value *WriteThru = I.getOperand(WriteThruIndex);
4950 Value *Mask = I.getOperand(MaskIndex);
4951
4952 assert(isFixedFPVector(A));
4953 assert(isFixedFPVector(WriteThru));
4954
4955 [[maybe_unused]] unsigned ANumElements =
4956 cast<FixedVectorType>(A->getType())->getNumElements();
4957 unsigned OutputNumElements =
4958 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
4959 assert(ANumElements == OutputNumElements);
4960
4961 for (unsigned i = 0; i < NumArgs; ++i) {
4962 if (i != AIndex && i != WriteThruIndex) {
4963 // Imm, Mask, Rounding etc. are "control" data, hence we require that
4964 // they be fully initialized.
4965 assert(I.getOperand(i)->getType()->isIntegerTy());
4966 insertCheckShadowOf(I.getOperand(i), &I);
4967 }
4968 }
4969
4970 // The mask has 1 bit per element of A, but a minimum of 8 bits.
4971 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
4972 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, ANumElements));
4973 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4974
4975 assert(I.getType() == WriteThru->getType());
4976
4977 Mask = IRB.CreateBitCast(
4978 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4979
4980 Value *AShadow = getShadow(A);
4981
4982 // All-or-nothing shadow
4983 AShadow = IRB.CreateSExt(IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow)),
4984 AShadow->getType());
4985
4986 Value *WriteThruShadow = getShadow(WriteThru);
4987
4988 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThruShadow);
4989 setShadow(&I, Shadow);
4990
4991 setOriginForNaryOp(I);
4992 }
4993
4994 // For sh.* compiler intrinsics:
4995 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
4996 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
4997 // A B WriteThru Mask RoundingMode
4998 //
4999 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5000 // DstShadow[1..7] = AShadow[1..7]
5001 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5002 IRBuilder<> IRB(&I);
5003
5004 assert(I.arg_size() == 5);
5005 Value *A = I.getOperand(0);
5006 Value *B = I.getOperand(1);
5007 Value *WriteThrough = I.getOperand(2);
5008 Value *Mask = I.getOperand(3);
5009 Value *RoundingMode = I.getOperand(4);
5010
5011 // Technically, we could probably just check whether the LSB is
5012 // initialized, but intuitively it feels like a partly uninitialized mask
5013 // is unintended, and we should warn the user immediately.
5014 insertCheckShadowOf(Mask, &I);
5015 insertCheckShadowOf(RoundingMode, &I);
5016
5017 assert(isa<FixedVectorType>(A->getType()));
5018 unsigned NumElements =
5019 cast<FixedVectorType>(A->getType())->getNumElements();
5020 assert(NumElements == 8);
5021 assert(A->getType() == B->getType());
5022 assert(B->getType() == WriteThrough->getType());
5023 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5024 assert(RoundingMode->getType()->isIntegerTy());
5025
5026 Value *ALowerShadow = extractLowerShadow(IRB, A);
5027 Value *BLowerShadow = extractLowerShadow(IRB, B);
5028
5029 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5030
5031 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5032
5033 Mask = IRB.CreateBitCast(
5034 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5035 Value *MaskLower =
5036 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5037
5038 Value *AShadow = getShadow(A);
5039 Value *DstLowerShadow =
5040 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5041 Value *DstShadow = IRB.CreateInsertElement(
5042 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5043 "_msprop");
5044
5045 setShadow(&I, DstShadow);
5046 setOriginForNaryOp(I);
5047 }
5048
5049 // Approximately handle AVX Galois Field Affine Transformation
5050 //
5051 // e.g.,
5052 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5053 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5054 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5055 // Out A x b
5056 // where A and x are packed matrices, b is a vector,
5057 // Out = A * x + b in GF(2)
5058 //
5059 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5060 // computation also includes a parity calculation.
5061 //
5062 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5063 // Out_Shadow = (V1_Shadow & V2_Shadow)
5064 // | (V1 & V2_Shadow)
5065 // | (V1_Shadow & V2 )
5066 //
5067 // We approximate the shadow of gf2p8affineqb using:
5068 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5069 // | gf2p8affineqb(x, A_shadow, 0)
5070 // | gf2p8affineqb(x_Shadow, A, 0)
5071 // | set1_epi8(b_Shadow)
5072 //
5073 // This approximation has false negatives: if an intermediate dot-product
5074 // contains an even number of 1's, the parity is 0.
5075 // It has no false positives.
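// For instance (assuming A and b are fully initialized), if exactly two
// uninitialized bits of x feed the same dot product, the corresponding
// shadow dot product has even parity and evaluates to 0; the terms
// involving A_shadow are also 0, so the uninitialized output bit is
// missed.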
5076 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5077 IRBuilder<> IRB(&I);
5078
5079 assert(I.arg_size() == 3);
5080 Value *A = I.getOperand(0);
5081 Value *X = I.getOperand(1);
5082 Value *B = I.getOperand(2);
5083
5084 assert(isFixedIntVector(A));
5085 assert(cast<VectorType>(A->getType())
5086 ->getElementType()
5087 ->getScalarSizeInBits() == 8);
5088
5089 assert(A->getType() == X->getType());
5090
5091 assert(B->getType()->isIntegerTy());
5092 assert(B->getType()->getScalarSizeInBits() == 8);
5093
5094 assert(I.getType() == A->getType());
5095
5096 Value *AShadow = getShadow(A);
5097 Value *XShadow = getShadow(X);
5098 Value *BZeroShadow = getCleanShadow(B);
5099
5100 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5101 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5102 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5103 {X, AShadow, BZeroShadow});
5104 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5105 {XShadow, A, BZeroShadow});
5106
5107 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5108 Value *BShadow = getShadow(B);
5109 Value *BBroadcastShadow = getCleanShadow(AShadow);
5110 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5111 // This loop generates a lot of LLVM IR, which we expect CodeGen will
5112 // lower appropriately (e.g., VPBROADCASTB).
5113 // Besides, b is often a constant, in which case it is fully initialized.
5114 for (unsigned i = 0; i < NumElements; i++)
5115 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5116
5117 setShadow(&I, IRB.CreateOr(
5118 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5119 setOriginForNaryOp(I);
5120 }
5121
5122 // Handle Arm NEON vector load intrinsics (vld*).
5123 //
5124 // The WithLane instructions (ld[234]lane) are similar to:
5125 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5126 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5127 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5128 // %A)
5129 //
5130 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5131 // to:
5132 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
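// Instrumentation sketch (%_shadow_A below is a placeholder for the shadow
// address computed by getShadowOriginPtr): for
//   %r = call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
// the handler emits
//   call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %_shadow_A)
// so the result's shadow is de-interleaved exactly like the data.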
5133 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5134 unsigned int numArgs = I.arg_size();
5135
5136 // Return type is a struct of vectors of integers or floating-point
5137 assert(I.getType()->isStructTy());
5138 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5139 assert(RetTy->getNumElements() > 0);
5140 assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5141 RetTy->getElementType(0)->isFPOrFPVectorTy());
5142 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5143 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5144
5145 if (WithLane) {
5146 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5147 assert(4 <= numArgs && numArgs <= 6);
5148
5149 // Return type is a struct of the input vectors
5150 assert(RetTy->getNumElements() + 2 == numArgs);
5151 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5152 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5153 } else {
5154 assert(numArgs == 1);
5155 }
5156
5157 IRBuilder<> IRB(&I);
5158
5159 SmallVector<Value *, 6> ShadowArgs;
5160 if (WithLane) {
5161 for (unsigned int i = 0; i < numArgs - 2; i++)
5162 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5163
5164 // Lane number, passed verbatim
5165 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5166 ShadowArgs.push_back(LaneNumber);
5167
5168 // TODO: blend shadow of lane number into output shadow?
5169 insertCheckShadowOf(LaneNumber, &I);
5170 }
5171
5172 Value *Src = I.getArgOperand(numArgs - 1);
5173 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5174
5175 Type *SrcShadowTy = getShadowTy(Src);
5176 auto [SrcShadowPtr, SrcOriginPtr] =
5177 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5178 ShadowArgs.push_back(SrcShadowPtr);
5179
5180 // The NEON vector load instructions handled by this function all have
5181 // integer variants. It is easier to use those rather than trying to cast
5182 // a struct of vectors of floats into a struct of vectors of integers.
5183 CallInst *CI =
5184 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5185 setShadow(&I, CI);
5186
5187 if (!MS.TrackOrigins)
5188 return;
5189
5190 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5191 setOrigin(&I, PtrSrcOrigin);
5192 }
5193
5194 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5195 /// and vst{2,3,4}lane).
5196 ///
5197 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5198 /// last argument, with the initial arguments being the inputs (and lane
5199 /// number for vst{2,3,4}lane). They return void.
5200 ///
5201 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5202 /// abcdabcdabcdabcd... into *outP
5203 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5204 /// writes aaaa...bbbb...cccc...dddd... into *outP
5205 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5206 /// These instructions can all be instrumented with essentially the same
5207 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
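/// Instrumentation sketch (value names illustrative): for
///   call void @llvm.aarch64.neon.st2.v16i8.p0(<16 x i8> %A, <16 x i8> %B,
///                                             ptr %P)
/// the handler emits the same st2 call on %A's and %B's shadows, with the
/// shadow address of %P as the destination, so shadow memory is interleaved
/// exactly like the data.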
5208 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5209 IRBuilder<> IRB(&I);
5210
5211 // Don't use getNumOperands() because it includes the callee
5212 int numArgOperands = I.arg_size();
5213
5214 // The last arg operand is the output (pointer)
5215 assert(numArgOperands >= 1);
5216 Value *Addr = I.getArgOperand(numArgOperands - 1);
5217 assert(Addr->getType()->isPointerTy());
5218 int skipTrailingOperands = 1;
5219
5220 if (ClCheckAccessAddress)
5221 insertCheckShadowOf(Addr, &I);
5222
5223 // Second-last operand is the lane number (for vst{2,3,4}lane)
5224 if (useLane) {
5225 skipTrailingOperands++;
5226 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5227 assert(isa<IntegerType>(
5228 I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5229 }
5230
5231 SmallVector<Value *, 8> ShadowArgs;
5232 // All the initial operands are the inputs
5233 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5234 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5235 Value *Shadow = getShadow(&I, i);
5236 ShadowArgs.append(1, Shadow);
5237 }
5238
5239 // MSan's getShadowTy() assumes the LHS is the type we want the shadow for
5240 // e.g., for:
5241 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5242 // we know the type of the output (and its shadow) is <16 x i8>.
5243 //
5244 // Arm NEON VST is unusual because the last argument is the output address:
5245 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5246 // call void @llvm.aarch64.neon.st2.v16i8.p0
5247 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5248 // and we have no type information about P's operand. We must manually
5249 // compute the type (<16 x i8> x 2).
5250 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5251 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5252 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5253 (numArgOperands - skipTrailingOperands));
5254 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5255
5256 if (useLane)
5257 ShadowArgs.append(1,
5258 I.getArgOperand(numArgOperands - skipTrailingOperands));
5259
5260 Value *OutputShadowPtr, *OutputOriginPtr;
5261 // AArch64 NEON does not need alignment (unless OS requires it)
5262 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5263 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5264 ShadowArgs.append(1, OutputShadowPtr);
5265
5266 CallInst *CI =
5267 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5268 setShadow(&I, CI);
5269
5270 if (MS.TrackOrigins) {
5271 // TODO: if we modelled the vst* instruction more precisely, we could
5272 // more accurately track the origins (e.g., if both inputs are
5273 // uninitialized for vst2, we currently blame the second input, even
5274 // though part of the output depends only on the first input).
5275 //
5276 // This is particularly imprecise for vst{2,3,4}lane, since only one
5277 // lane of each input is actually copied to the output.
5278 OriginCombiner OC(this, IRB);
5279 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5280 OC.Add(I.getArgOperand(i));
5281
5282 const DataLayout &DL = F.getDataLayout();
5283 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5284 OutputOriginPtr);
5285 }
5286 }
5287
5288 /// Handle intrinsics by applying the intrinsic to the shadows.
5289 ///
5290 /// The trailing arguments are passed verbatim to the intrinsic, though any
5291 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5292 /// intrinsic with one trailing verbatim argument:
5293 /// out = intrinsic(var1, var2, opType)
5294 /// we compute:
5295 /// shadow[out] =
5296 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5297 ///
5298 /// Typically, shadowIntrinsicID will be specified by the caller to be
5299 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5300 /// intrinsic of the same type.
5301 ///
5302 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5303 /// bit-patterns (for example, if the intrinsic accepts floats for
5304 /// var1, we require that it doesn't care if inputs are NaNs).
5305 ///
5306 /// For example, this can be applied to the Arm NEON vector table intrinsics
5307 /// (tbl{1,2,3,4}).
5308 ///
5309 /// The origin is approximated using setOriginForNaryOp.
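/// Illustrative example (pshufb, one trailing verbatim argument; value
/// names assumed):
///   %out = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %t,
///                                                     <16 x i8> %idx)
/// gets the shadow
///   %s = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %t_shadow,
///                                                   <16 x i8> %idx)
///   %out_shadow = or <16 x i8> %s, %idx_shadow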
5310 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5311 Intrinsic::ID shadowIntrinsicID,
5312 unsigned int trailingVerbatimArgs) {
5313 IRBuilder<> IRB(&I);
5314
5315 assert(trailingVerbatimArgs < I.arg_size());
5316
5317 SmallVector<Value *, 8> ShadowArgs;
5318 // Don't use getNumOperands() because it includes the callee
5319 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5320 Value *Shadow = getShadow(&I, i);
5321
5322 // Shadows are integer-ish types but some intrinsics require a
5323 // different (e.g., floating-point) type.
5324 ShadowArgs.push_back(
5325 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5326 }
5327
5328 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5329 i++) {
5330 Value *Arg = I.getArgOperand(i);
5331 ShadowArgs.push_back(Arg);
5332 }
5333
5334 CallInst *CI =
5335 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5336 Value *CombinedShadow = CI;
5337
5338 // Combine the computed shadow with the shadow of trailing args
5339 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5340 i++) {
5341 Value *Shadow =
5342 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5343 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5344 }
5345
5346 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5347
5348 setOriginForNaryOp(I);
5349 }
5350
5351 // Approximation only
5352 //
5353 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5354 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5355 assert(I.arg_size() == 2);
5356
5357 handleShadowOr(I);
5358 }
5359
5360 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5361 switch (I.getIntrinsicID()) {
5362 case Intrinsic::uadd_with_overflow:
5363 case Intrinsic::sadd_with_overflow:
5364 case Intrinsic::usub_with_overflow:
5365 case Intrinsic::ssub_with_overflow:
5366 case Intrinsic::umul_with_overflow:
5367 case Intrinsic::smul_with_overflow:
5368 handleArithmeticWithOverflow(I);
5369 break;
5370 case Intrinsic::abs:
5371 handleAbsIntrinsic(I);
5372 break;
5373 case Intrinsic::bitreverse:
5374 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5375 /*trailingVerbatimArgs*/ 0);
5376 break;
5377 case Intrinsic::is_fpclass:
5378 handleIsFpClass(I);
5379 break;
5380 case Intrinsic::lifetime_start:
5381 handleLifetimeStart(I);
5382 break;
5383 case Intrinsic::launder_invariant_group:
5384 case Intrinsic::strip_invariant_group:
5385 handleInvariantGroup(I);
5386 break;
5387 case Intrinsic::bswap:
5388 handleBswap(I);
5389 break;
5390 case Intrinsic::ctlz:
5391 case Intrinsic::cttz:
5392 handleCountLeadingTrailingZeros(I);
5393 break;
5394 case Intrinsic::masked_compressstore:
5395 handleMaskedCompressStore(I);
5396 break;
5397 case Intrinsic::masked_expandload:
5398 handleMaskedExpandLoad(I);
5399 break;
5400 case Intrinsic::masked_gather:
5401 handleMaskedGather(I);
5402 break;
5403 case Intrinsic::masked_scatter:
5404 handleMaskedScatter(I);
5405 break;
5406 case Intrinsic::masked_store:
5407 handleMaskedStore(I);
5408 break;
5409 case Intrinsic::masked_load:
5410 handleMaskedLoad(I);
5411 break;
5412 case Intrinsic::vector_reduce_and:
5413 handleVectorReduceAndIntrinsic(I);
5414 break;
5415 case Intrinsic::vector_reduce_or:
5416 handleVectorReduceOrIntrinsic(I);
5417 break;
5418
5419 case Intrinsic::vector_reduce_add:
5420 case Intrinsic::vector_reduce_xor:
5421 case Intrinsic::vector_reduce_mul:
5422 // Signed/Unsigned Min/Max
5423 // TODO: handling similarly to AND/OR may be more precise.
5424 case Intrinsic::vector_reduce_smax:
5425 case Intrinsic::vector_reduce_smin:
5426 case Intrinsic::vector_reduce_umax:
5427 case Intrinsic::vector_reduce_umin:
5428 // TODO: this has no false positives, but arguably we should check that all
5429 // the bits are initialized.
5430 case Intrinsic::vector_reduce_fmax:
5431 case Intrinsic::vector_reduce_fmin:
5432 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5433 break;
5434
5435 case Intrinsic::vector_reduce_fadd:
5436 case Intrinsic::vector_reduce_fmul:
5437 handleVectorReduceWithStarterIntrinsic(I);
5438 break;
5439
5440 case Intrinsic::scmp:
5441 case Intrinsic::ucmp: {
5442 handleShadowOr(I);
5443 break;
5444 }
5445
5446 case Intrinsic::fshl:
5447 case Intrinsic::fshr:
5448 handleFunnelShift(I);
5449 break;
5450
5451 case Intrinsic::is_constant:
5452 // The result of llvm.is.constant() is always defined.
5453 setShadow(&I, getCleanShadow(&I));
5454 setOrigin(&I, getCleanOrigin());
5455 break;
5456
5457 default:
5458 return false;
5459 }
5460
5461 return true;
5462 }
5463
5464 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5465 switch (I.getIntrinsicID()) {
5466 case Intrinsic::x86_sse_stmxcsr:
5467 handleStmxcsr(I);
5468 break;
5469 case Intrinsic::x86_sse_ldmxcsr:
5470 handleLdmxcsr(I);
5471 break;
5472
5473 // Convert Scalar Double Precision Floating-Point Value
5474 // to Unsigned Doubleword Integer
5475 // etc.
5476 case Intrinsic::x86_avx512_vcvtsd2usi64:
5477 case Intrinsic::x86_avx512_vcvtsd2usi32:
5478 case Intrinsic::x86_avx512_vcvtss2usi64:
5479 case Intrinsic::x86_avx512_vcvtss2usi32:
5480 case Intrinsic::x86_avx512_cvttss2usi64:
5481 case Intrinsic::x86_avx512_cvttss2usi:
5482 case Intrinsic::x86_avx512_cvttsd2usi64:
5483 case Intrinsic::x86_avx512_cvttsd2usi:
5484 case Intrinsic::x86_avx512_cvtusi2ss:
5485 case Intrinsic::x86_avx512_cvtusi642sd:
5486 case Intrinsic::x86_avx512_cvtusi642ss:
5487 handleSSEVectorConvertIntrinsic(I, 1, true);
5488 break;
5489 case Intrinsic::x86_sse2_cvtsd2si64:
5490 case Intrinsic::x86_sse2_cvtsd2si:
5491 case Intrinsic::x86_sse2_cvtsd2ss:
5492 case Intrinsic::x86_sse2_cvttsd2si64:
5493 case Intrinsic::x86_sse2_cvttsd2si:
5494 case Intrinsic::x86_sse_cvtss2si64:
5495 case Intrinsic::x86_sse_cvtss2si:
5496 case Intrinsic::x86_sse_cvttss2si64:
5497 case Intrinsic::x86_sse_cvttss2si:
5498 handleSSEVectorConvertIntrinsic(I, 1);
5499 break;
5500 case Intrinsic::x86_sse_cvtps2pi:
5501 case Intrinsic::x86_sse_cvttps2pi:
5502 handleSSEVectorConvertIntrinsic(I, 2);
5503 break;
5504
5505 // TODO:
5506 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5507 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5508 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5509
5510 case Intrinsic::x86_vcvtps2ph_128:
5511 case Intrinsic::x86_vcvtps2ph_256: {
5512 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5513 break;
5514 }
5515
5516 // Convert Packed Single Precision Floating-Point Values
5517 // to Packed Signed Doubleword Integer Values
5518 //
5519 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5520 // (<16 x float>, <16 x i32>, i16, i32)
5521 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5522 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5523 break;
5524
5525 // Convert Packed Double Precision Floating-Point Values
5526 // to Packed Single Precision Floating-Point Values
5527 case Intrinsic::x86_sse2_cvtpd2ps:
5528 case Intrinsic::x86_sse2_cvtps2dq:
5529 case Intrinsic::x86_sse2_cvtpd2dq:
5530 case Intrinsic::x86_sse2_cvttps2dq:
5531 case Intrinsic::x86_sse2_cvttpd2dq:
5532 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5533 case Intrinsic::x86_avx_cvt_ps2dq_256:
5534 case Intrinsic::x86_avx_cvt_pd2dq_256:
5535 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5536 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5537 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5538 break;
5539 }
5540
5541 // Convert Single-Precision FP Value to 16-bit FP Value
5542 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5543 // (<16 x float>, i32, <16 x i16>, i16)
5544 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5545 // (<4 x float>, i32, <8 x i16>, i8)
5546 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5547 // (<8 x float>, i32, <8 x i16>, i8)
5548 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5549 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5550 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5551 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5552 break;
5553
5554 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5555 case Intrinsic::x86_avx512_psll_w_512:
5556 case Intrinsic::x86_avx512_psll_d_512:
5557 case Intrinsic::x86_avx512_psll_q_512:
5558 case Intrinsic::x86_avx512_pslli_w_512:
5559 case Intrinsic::x86_avx512_pslli_d_512:
5560 case Intrinsic::x86_avx512_pslli_q_512:
5561 case Intrinsic::x86_avx512_psrl_w_512:
5562 case Intrinsic::x86_avx512_psrl_d_512:
5563 case Intrinsic::x86_avx512_psrl_q_512:
5564 case Intrinsic::x86_avx512_psra_w_512:
5565 case Intrinsic::x86_avx512_psra_d_512:
5566 case Intrinsic::x86_avx512_psra_q_512:
5567 case Intrinsic::x86_avx512_psrli_w_512:
5568 case Intrinsic::x86_avx512_psrli_d_512:
5569 case Intrinsic::x86_avx512_psrli_q_512:
5570 case Intrinsic::x86_avx512_psrai_w_512:
5571 case Intrinsic::x86_avx512_psrai_d_512:
5572 case Intrinsic::x86_avx512_psrai_q_512:
5573 case Intrinsic::x86_avx512_psra_q_256:
5574 case Intrinsic::x86_avx512_psra_q_128:
5575 case Intrinsic::x86_avx512_psrai_q_256:
5576 case Intrinsic::x86_avx512_psrai_q_128:
5577 case Intrinsic::x86_avx2_psll_w:
5578 case Intrinsic::x86_avx2_psll_d:
5579 case Intrinsic::x86_avx2_psll_q:
5580 case Intrinsic::x86_avx2_pslli_w:
5581 case Intrinsic::x86_avx2_pslli_d:
5582 case Intrinsic::x86_avx2_pslli_q:
5583 case Intrinsic::x86_avx2_psrl_w:
5584 case Intrinsic::x86_avx2_psrl_d:
5585 case Intrinsic::x86_avx2_psrl_q:
5586 case Intrinsic::x86_avx2_psra_w:
5587 case Intrinsic::x86_avx2_psra_d:
5588 case Intrinsic::x86_avx2_psrli_w:
5589 case Intrinsic::x86_avx2_psrli_d:
5590 case Intrinsic::x86_avx2_psrli_q:
5591 case Intrinsic::x86_avx2_psrai_w:
5592 case Intrinsic::x86_avx2_psrai_d:
5593 case Intrinsic::x86_sse2_psll_w:
5594 case Intrinsic::x86_sse2_psll_d:
5595 case Intrinsic::x86_sse2_psll_q:
5596 case Intrinsic::x86_sse2_pslli_w:
5597 case Intrinsic::x86_sse2_pslli_d:
5598 case Intrinsic::x86_sse2_pslli_q:
5599 case Intrinsic::x86_sse2_psrl_w:
5600 case Intrinsic::x86_sse2_psrl_d:
5601 case Intrinsic::x86_sse2_psrl_q:
5602 case Intrinsic::x86_sse2_psra_w:
5603 case Intrinsic::x86_sse2_psra_d:
5604 case Intrinsic::x86_sse2_psrli_w:
5605 case Intrinsic::x86_sse2_psrli_d:
5606 case Intrinsic::x86_sse2_psrli_q:
5607 case Intrinsic::x86_sse2_psrai_w:
5608 case Intrinsic::x86_sse2_psrai_d:
5609 case Intrinsic::x86_mmx_psll_w:
5610 case Intrinsic::x86_mmx_psll_d:
5611 case Intrinsic::x86_mmx_psll_q:
5612 case Intrinsic::x86_mmx_pslli_w:
5613 case Intrinsic::x86_mmx_pslli_d:
5614 case Intrinsic::x86_mmx_pslli_q:
5615 case Intrinsic::x86_mmx_psrl_w:
5616 case Intrinsic::x86_mmx_psrl_d:
5617 case Intrinsic::x86_mmx_psrl_q:
5618 case Intrinsic::x86_mmx_psra_w:
5619 case Intrinsic::x86_mmx_psra_d:
5620 case Intrinsic::x86_mmx_psrli_w:
5621 case Intrinsic::x86_mmx_psrli_d:
5622 case Intrinsic::x86_mmx_psrli_q:
5623 case Intrinsic::x86_mmx_psrai_w:
5624 case Intrinsic::x86_mmx_psrai_d:
5625 handleVectorShiftIntrinsic(I, /* Variable */ false);
5626 break;
5627 case Intrinsic::x86_avx2_psllv_d:
5628 case Intrinsic::x86_avx2_psllv_d_256:
5629 case Intrinsic::x86_avx512_psllv_d_512:
5630 case Intrinsic::x86_avx2_psllv_q:
5631 case Intrinsic::x86_avx2_psllv_q_256:
5632 case Intrinsic::x86_avx512_psllv_q_512:
5633 case Intrinsic::x86_avx2_psrlv_d:
5634 case Intrinsic::x86_avx2_psrlv_d_256:
5635 case Intrinsic::x86_avx512_psrlv_d_512:
5636 case Intrinsic::x86_avx2_psrlv_q:
5637 case Intrinsic::x86_avx2_psrlv_q_256:
5638 case Intrinsic::x86_avx512_psrlv_q_512:
5639 case Intrinsic::x86_avx2_psrav_d:
5640 case Intrinsic::x86_avx2_psrav_d_256:
5641 case Intrinsic::x86_avx512_psrav_d_512:
5642 case Intrinsic::x86_avx512_psrav_q_128:
5643 case Intrinsic::x86_avx512_psrav_q_256:
5644 case Intrinsic::x86_avx512_psrav_q_512:
5645 handleVectorShiftIntrinsic(I, /* Variable */ true);
5646 break;
5647
5648 // Pack with Signed/Unsigned Saturation
5649 case Intrinsic::x86_sse2_packsswb_128:
5650 case Intrinsic::x86_sse2_packssdw_128:
5651 case Intrinsic::x86_sse2_packuswb_128:
5652 case Intrinsic::x86_sse41_packusdw:
5653 case Intrinsic::x86_avx2_packsswb:
5654 case Intrinsic::x86_avx2_packssdw:
5655 case Intrinsic::x86_avx2_packuswb:
5656 case Intrinsic::x86_avx2_packusdw:
5657 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5658 // (<32 x i16> %a, <32 x i16> %b)
5659 // <32 x i16> @llvm.x86.avx512.packssdw.512
5660 // (<16 x i32> %a, <16 x i32> %b)
5661 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5662 case Intrinsic::x86_avx512_packsswb_512:
5663 case Intrinsic::x86_avx512_packssdw_512:
5664 case Intrinsic::x86_avx512_packuswb_512:
5665 case Intrinsic::x86_avx512_packusdw_512:
5666 handleVectorPackIntrinsic(I);
5667 break;
5668
5669 case Intrinsic::x86_sse41_pblendvb:
5670 case Intrinsic::x86_sse41_blendvpd:
5671 case Intrinsic::x86_sse41_blendvps:
5672 case Intrinsic::x86_avx_blendv_pd_256:
5673 case Intrinsic::x86_avx_blendv_ps_256:
5674 case Intrinsic::x86_avx2_pblendvb:
5675 handleBlendvIntrinsic(I);
5676 break;
5677
5678 case Intrinsic::x86_avx_dp_ps_256:
5679 case Intrinsic::x86_sse41_dppd:
5680 case Intrinsic::x86_sse41_dpps:
5681 handleDppIntrinsic(I);
5682 break;
5683
5684 case Intrinsic::x86_mmx_packsswb:
5685 case Intrinsic::x86_mmx_packuswb:
5686 handleVectorPackIntrinsic(I, 16);
5687 break;
5688
5689 case Intrinsic::x86_mmx_packssdw:
5690 handleVectorPackIntrinsic(I, 32);
5691 break;
5692
5693 case Intrinsic::x86_mmx_psad_bw:
5694 handleVectorSadIntrinsic(I, true);
5695 break;
5696 case Intrinsic::x86_sse2_psad_bw:
5697 case Intrinsic::x86_avx2_psad_bw:
5698 handleVectorSadIntrinsic(I);
5699 break;
5700
5701 // Multiply and Add Packed Words
5702 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5703 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5704 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5705 //
5706 // Multiply and Add Packed Signed and Unsigned Bytes
5707 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5708 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5709 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5710 //
5711 // These intrinsics are auto-upgraded into non-masked forms:
5712 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5713 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5714 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5715 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5716 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5717 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5718 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5719 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5720 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5721 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5722 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5723 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
5724 case Intrinsic::x86_sse2_pmadd_wd:
5725 case Intrinsic::x86_avx2_pmadd_wd:
5726 case Intrinsic::x86_avx512_pmaddw_d_512:
5727 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5728 case Intrinsic::x86_avx2_pmadd_ub_sw:
5729 case Intrinsic::x86_avx512_pmaddubs_w_512:
5730 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2);
5731 break;
5732
5733 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5734 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5735 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/8);
5736 break;
5737
5738 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5739 case Intrinsic::x86_mmx_pmadd_wd:
5740 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/16);
5741 break;
5742
5743 // AVX Vector Neural Network Instructions: bytes
5744 //
5745 // Multiply and Add Packed Signed and Unsigned Bytes
5746 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
5747 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5748 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
5749 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5750 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
5751 // (<16 x i32>, <64 x i8>, <64 x i8>)
5752 //
5753 // Multiply and Add Unsigned and Signed Bytes With Saturation
5754 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
5755 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5756 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
5757 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5758 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
5759 // (<16 x i32>, <64 x i8>, <64 x i8>)
5760 //
5761 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5762 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5763 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5764 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5765 //
5766 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
5767 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5768 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
5769 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5770 //
5771 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
5772 // (<16 x i32>, <16 x i32>, <16 x i32>)
5773 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
5774 // (<16 x i32>, <16 x i32>, <16 x i32>)
5775 //
5776 // These intrinsics are auto-upgraded into non-masked forms:
5777 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
5778 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5779 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
5780 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5781 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
5782 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5783 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
5784 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5785 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
5786 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5787 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
5788 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5789 //
5790 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
5791 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5792 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
5793 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5794 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
5795 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5796 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
5797 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5798 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
5799 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5800 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
5801 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5802 case Intrinsic::x86_avx512_vpdpbusd_128:
5803 case Intrinsic::x86_avx512_vpdpbusd_256:
5804 case Intrinsic::x86_avx512_vpdpbusd_512:
5805 case Intrinsic::x86_avx512_vpdpbusds_128:
5806 case Intrinsic::x86_avx512_vpdpbusds_256:
5807 case Intrinsic::x86_avx512_vpdpbusds_512:
5808 case Intrinsic::x86_avx2_vpdpbssd_128:
5809 case Intrinsic::x86_avx2_vpdpbssd_256:
5810 case Intrinsic::x86_avx10_vpdpbssd_512:
5811 case Intrinsic::x86_avx2_vpdpbssds_128:
5812 case Intrinsic::x86_avx2_vpdpbssds_256:
5813 case Intrinsic::x86_avx10_vpdpbssds_512:
5814 case Intrinsic::x86_avx2_vpdpbsud_128:
5815 case Intrinsic::x86_avx2_vpdpbsud_256:
5816 case Intrinsic::x86_avx10_vpdpbsud_512:
5817 case Intrinsic::x86_avx2_vpdpbsuds_128:
5818 case Intrinsic::x86_avx2_vpdpbsuds_256:
5819 case Intrinsic::x86_avx10_vpdpbsuds_512:
5820 case Intrinsic::x86_avx2_vpdpbuud_128:
5821 case Intrinsic::x86_avx2_vpdpbuud_256:
5822 case Intrinsic::x86_avx10_vpdpbuud_512:
5823 case Intrinsic::x86_avx2_vpdpbuuds_128:
5824 case Intrinsic::x86_avx2_vpdpbuuds_256:
5825 case Intrinsic::x86_avx10_vpdpbuuds_512:
5826 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/4, /*EltSize=*/8);
5827 break;
5828
5829 // AVX Vector Neural Network Instructions: words
5830 //
5831 // Multiply and Add Signed Word Integers
5832 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
5833 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5834 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
5835 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5836 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
5837 // (<16 x i32>, <16 x i32>, <16 x i32>)
5838 //
5839 // Multiply and Add Signed Word Integers With Saturation
5840 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
5841 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5842 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
5843 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5844 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
5845 // (<16 x i32>, <16 x i32>, <16 x i32>)
5846 //
5847 // These intrinsics are auto-upgraded into non-masked forms:
5848 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
5849 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5850 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
5851 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5852 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
5853 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5854 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
5855 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5856 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
5857 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5858 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
5859 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5860 //
5861 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
5862 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5863 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
5864 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5865 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
5866 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5867 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
5868 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5869 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
5870 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5871 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
5872 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5873 case Intrinsic::x86_avx512_vpdpwssd_128:
5874 case Intrinsic::x86_avx512_vpdpwssd_256:
5875 case Intrinsic::x86_avx512_vpdpwssd_512:
5876 case Intrinsic::x86_avx512_vpdpwssds_128:
5877 case Intrinsic::x86_avx512_vpdpwssds_256:
5878 case Intrinsic::x86_avx512_vpdpwssds_512:
5879 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/16);
5880 break;
5881
5882 // TODO: Dot Product of BF16 Pairs Accumulated Into Packed Single
5883 // Precision
5884 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
5885 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
5886 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
5887 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
5888 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
5889 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
5890 // handleVectorPmaddIntrinsic() currently only handles integer types.
5891
5892 case Intrinsic::x86_sse_cmp_ss:
5893 case Intrinsic::x86_sse2_cmp_sd:
5894 case Intrinsic::x86_sse_comieq_ss:
5895 case Intrinsic::x86_sse_comilt_ss:
5896 case Intrinsic::x86_sse_comile_ss:
5897 case Intrinsic::x86_sse_comigt_ss:
5898 case Intrinsic::x86_sse_comige_ss:
5899 case Intrinsic::x86_sse_comineq_ss:
5900 case Intrinsic::x86_sse_ucomieq_ss:
5901 case Intrinsic::x86_sse_ucomilt_ss:
5902 case Intrinsic::x86_sse_ucomile_ss:
5903 case Intrinsic::x86_sse_ucomigt_ss:
5904 case Intrinsic::x86_sse_ucomige_ss:
5905 case Intrinsic::x86_sse_ucomineq_ss:
5906 case Intrinsic::x86_sse2_comieq_sd:
5907 case Intrinsic::x86_sse2_comilt_sd:
5908 case Intrinsic::x86_sse2_comile_sd:
5909 case Intrinsic::x86_sse2_comigt_sd:
5910 case Intrinsic::x86_sse2_comige_sd:
5911 case Intrinsic::x86_sse2_comineq_sd:
5912 case Intrinsic::x86_sse2_ucomieq_sd:
5913 case Intrinsic::x86_sse2_ucomilt_sd:
5914 case Intrinsic::x86_sse2_ucomile_sd:
5915 case Intrinsic::x86_sse2_ucomigt_sd:
5916 case Intrinsic::x86_sse2_ucomige_sd:
5917 case Intrinsic::x86_sse2_ucomineq_sd:
5918 handleVectorCompareScalarIntrinsic(I);
5919 break;
5920
5921 case Intrinsic::x86_avx_cmp_pd_256:
5922 case Intrinsic::x86_avx_cmp_ps_256:
5923 case Intrinsic::x86_sse2_cmp_pd:
5924 case Intrinsic::x86_sse_cmp_ps:
5925 handleVectorComparePackedIntrinsic(I);
5926 break;
5927
5928 case Intrinsic::x86_bmi_bextr_32:
5929 case Intrinsic::x86_bmi_bextr_64:
5930 case Intrinsic::x86_bmi_bzhi_32:
5931 case Intrinsic::x86_bmi_bzhi_64:
5932 case Intrinsic::x86_bmi_pdep_32:
5933 case Intrinsic::x86_bmi_pdep_64:
5934 case Intrinsic::x86_bmi_pext_32:
5935 case Intrinsic::x86_bmi_pext_64:
5936 handleBmiIntrinsic(I);
5937 break;
5938
5939 case Intrinsic::x86_pclmulqdq:
5940 case Intrinsic::x86_pclmulqdq_256:
5941 case Intrinsic::x86_pclmulqdq_512:
5942 handlePclmulIntrinsic(I);
5943 break;
5944
5945 case Intrinsic::x86_avx_round_pd_256:
5946 case Intrinsic::x86_avx_round_ps_256:
5947 case Intrinsic::x86_sse41_round_pd:
5948 case Intrinsic::x86_sse41_round_ps:
5949 handleRoundPdPsIntrinsic(I);
5950 break;
5951
5952 case Intrinsic::x86_sse41_round_sd:
5953 case Intrinsic::x86_sse41_round_ss:
5954 handleUnarySdSsIntrinsic(I);
5955 break;
5956
5957 case Intrinsic::x86_sse2_max_sd:
5958 case Intrinsic::x86_sse_max_ss:
5959 case Intrinsic::x86_sse2_min_sd:
5960 case Intrinsic::x86_sse_min_ss:
5961 handleBinarySdSsIntrinsic(I);
5962 break;
5963
5964 case Intrinsic::x86_avx_vtestc_pd:
5965 case Intrinsic::x86_avx_vtestc_pd_256:
5966 case Intrinsic::x86_avx_vtestc_ps:
5967 case Intrinsic::x86_avx_vtestc_ps_256:
5968 case Intrinsic::x86_avx_vtestnzc_pd:
5969 case Intrinsic::x86_avx_vtestnzc_pd_256:
5970 case Intrinsic::x86_avx_vtestnzc_ps:
5971 case Intrinsic::x86_avx_vtestnzc_ps_256:
5972 case Intrinsic::x86_avx_vtestz_pd:
5973 case Intrinsic::x86_avx_vtestz_pd_256:
5974 case Intrinsic::x86_avx_vtestz_ps:
5975 case Intrinsic::x86_avx_vtestz_ps_256:
5976 case Intrinsic::x86_avx_ptestc_256:
5977 case Intrinsic::x86_avx_ptestnzc_256:
5978 case Intrinsic::x86_avx_ptestz_256:
5979 case Intrinsic::x86_sse41_ptestc:
5980 case Intrinsic::x86_sse41_ptestnzc:
5981 case Intrinsic::x86_sse41_ptestz:
5982 handleVtestIntrinsic(I);
5983 break;
5984
5985 // Packed Horizontal Add/Subtract
5986 case Intrinsic::x86_ssse3_phadd_w:
5987 case Intrinsic::x86_ssse3_phadd_w_128:
5988 case Intrinsic::x86_avx2_phadd_w:
5989 case Intrinsic::x86_ssse3_phsub_w:
5990 case Intrinsic::x86_ssse3_phsub_w_128:
5991 case Intrinsic::x86_avx2_phsub_w: {
5992 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/16);
5993 break;
5994 }
5995
5996 // Packed Horizontal Add/Subtract
5997 case Intrinsic::x86_ssse3_phadd_d:
5998 case Intrinsic::x86_ssse3_phadd_d_128:
5999 case Intrinsic::x86_avx2_phadd_d:
6000 case Intrinsic::x86_ssse3_phsub_d:
6001 case Intrinsic::x86_ssse3_phsub_d_128:
6002 case Intrinsic::x86_avx2_phsub_d: {
6003 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/32);
6004 break;
6005 }
6006
6007 // Packed Horizontal Add/Subtract and Saturate
6008 case Intrinsic::x86_ssse3_phadd_sw:
6009 case Intrinsic::x86_ssse3_phadd_sw_128:
6010 case Intrinsic::x86_avx2_phadd_sw:
6011 case Intrinsic::x86_ssse3_phsub_sw:
6012 case Intrinsic::x86_ssse3_phsub_sw_128:
6013 case Intrinsic::x86_avx2_phsub_sw: {
6014 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/16);
6015 break;
6016 }
6017
6018 // Packed Single/Double Precision Floating-Point Horizontal Add
6019 case Intrinsic::x86_sse3_hadd_ps:
6020 case Intrinsic::x86_sse3_hadd_pd:
6021 case Intrinsic::x86_avx_hadd_pd_256:
6022 case Intrinsic::x86_avx_hadd_ps_256:
6023 case Intrinsic::x86_sse3_hsub_ps:
6024 case Intrinsic::x86_sse3_hsub_pd:
6025 case Intrinsic::x86_avx_hsub_pd_256:
6026 case Intrinsic::x86_avx_hsub_ps_256: {
6027 handlePairwiseShadowOrIntrinsic(I);
6028 break;
6029 }
6030
6031 case Intrinsic::x86_avx_maskstore_ps:
6032 case Intrinsic::x86_avx_maskstore_pd:
6033 case Intrinsic::x86_avx_maskstore_ps_256:
6034 case Intrinsic::x86_avx_maskstore_pd_256:
6035 case Intrinsic::x86_avx2_maskstore_d:
6036 case Intrinsic::x86_avx2_maskstore_q:
6037 case Intrinsic::x86_avx2_maskstore_d_256:
6038 case Intrinsic::x86_avx2_maskstore_q_256: {
6039 handleAVXMaskedStore(I);
6040 break;
6041 }
6042
6043 case Intrinsic::x86_avx_maskload_ps:
6044 case Intrinsic::x86_avx_maskload_pd:
6045 case Intrinsic::x86_avx_maskload_ps_256:
6046 case Intrinsic::x86_avx_maskload_pd_256:
6047 case Intrinsic::x86_avx2_maskload_d:
6048 case Intrinsic::x86_avx2_maskload_q:
6049 case Intrinsic::x86_avx2_maskload_d_256:
6050 case Intrinsic::x86_avx2_maskload_q_256: {
6051 handleAVXMaskedLoad(I);
6052 break;
6053 }
6054
6055 // Packed
6056 case Intrinsic::x86_avx512fp16_add_ph_512:
6057 case Intrinsic::x86_avx512fp16_sub_ph_512:
6058 case Intrinsic::x86_avx512fp16_mul_ph_512:
6059 case Intrinsic::x86_avx512fp16_div_ph_512:
6060 case Intrinsic::x86_avx512fp16_max_ph_512:
6061 case Intrinsic::x86_avx512fp16_min_ph_512:
6062 case Intrinsic::x86_avx512_min_ps_512:
6063 case Intrinsic::x86_avx512_min_pd_512:
6064 case Intrinsic::x86_avx512_max_ps_512:
6065 case Intrinsic::x86_avx512_max_pd_512: {
6066 // These AVX512 variants contain the rounding mode as a trailing flag.
6067 // Earlier variants do not have a trailing flag and are already handled
6068 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6069 // maybeHandleUnknownIntrinsic.
6070 [[maybe_unused]] bool Success =
6071 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6072 assert(Success);
6073 break;
6074 }
6075
6076 case Intrinsic::x86_avx_vpermilvar_pd:
6077 case Intrinsic::x86_avx_vpermilvar_pd_256:
6078 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6079 case Intrinsic::x86_avx_vpermilvar_ps:
6080 case Intrinsic::x86_avx_vpermilvar_ps_256:
6081 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6082 handleAVXVpermilvar(I);
6083 break;
6084 }
6085
6086 case Intrinsic::x86_avx512_vpermi2var_d_128:
6087 case Intrinsic::x86_avx512_vpermi2var_d_256:
6088 case Intrinsic::x86_avx512_vpermi2var_d_512:
6089 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6090 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6091 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6092 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6093 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6094 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6095 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6096 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6097 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6098 case Intrinsic::x86_avx512_vpermi2var_q_128:
6099 case Intrinsic::x86_avx512_vpermi2var_q_256:
6100 case Intrinsic::x86_avx512_vpermi2var_q_512:
6101 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6102 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6103 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6104 handleAVXVpermi2var(I);
6105 break;
6106
6107 // Packed Shuffle
6108 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6109 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6110 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6111 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6112 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6113 //
6114 // The following intrinsics are auto-upgraded:
6115 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6116 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6117 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6118 case Intrinsic::x86_avx2_pshuf_b:
6119 case Intrinsic::x86_sse_pshuf_w:
6120 case Intrinsic::x86_ssse3_pshuf_b_128:
6121 case Intrinsic::x86_ssse3_pshuf_b:
6122 case Intrinsic::x86_avx512_pshuf_b_512:
6123 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6124 /*trailingVerbatimArgs=*/1);
6125 break;
6126
6127 // AVX512 PMOV: Packed MOV, with truncation
6128 // Precisely handled by applying the same intrinsic to the shadow
6129 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6130 case Intrinsic::x86_avx512_mask_pmov_db_512:
6131 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6132 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6133 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6134 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6135 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6136 /*trailingVerbatimArgs=*/1);
6137 break;
6138 }
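// Sketch of the "apply to shadow" strategy used above: the shadow of the
// truncating move is computed by issuing the very same pmov intrinsic on the
// shadows of the vector operands, with the trailing mask operand passed
// through verbatim (trailingVerbatimArgs=1). Lanes that are truncated or
// masked out in the data are therefore truncated or masked out identically
// in the shadow, which is why this handling is precise.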
6139
6140 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6141 // Approximately handled using the corresponding truncation intrinsic
6142 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6143 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6144 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6145 handleIntrinsicByApplyingToShadow(I,
6146 Intrinsic::x86_avx512_mask_pmov_dw_512,
6147 /* trailingVerbatimArgs=*/1);
6148 break;
6149 }
6150
6151 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6152 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6153 handleIntrinsicByApplyingToShadow(I,
6154 Intrinsic::x86_avx512_mask_pmov_db_512,
6155 /* trailingVerbatimArgs=*/1);
6156 break;
6157 }
6158
6159 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6160 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6161 handleIntrinsicByApplyingToShadow(I,
6162 Intrinsic::x86_avx512_mask_pmov_qb_512,
6163 /* trailingVerbatimArgs=*/1);
6164 break;
6165 }
6166
6167 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6168 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6169 handleIntrinsicByApplyingToShadow(I,
6170 Intrinsic::x86_avx512_mask_pmov_qw_512,
6171 /* trailingVerbatimArgs=*/1);
6172 break;
6173 }
6174
6175 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6176 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6177 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6178 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6179 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6180 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6181 // slow-path handler.
6182 handleAVX512VectorDownConvert(I);
6183 break;
6184 }
6185
6186 // AVX512/AVX10 Reciprocal Square Root
6187 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6188 // (<16 x float>, <16 x float>, i16)
6189 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6190 // (<8 x float>, <8 x float>, i8)
6191 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6192 // (<4 x float>, <4 x float>, i8)
6193 //
6194 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6195 // (<8 x double>, <8 x double>, i8)
6196 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6197 // (<4 x double>, <4 x double>, i8)
6198 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6199 // (<2 x double>, <2 x double>, i8)
6200 //
6201 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6202 // (<32 x bfloat>, <32 x bfloat>, i32)
6203 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6204 // (<16 x bfloat>, <16 x bfloat>, i16)
6205 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6206 // (<8 x bfloat>, <8 x bfloat>, i8)
6207 //
6208 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6209 // (<32 x half>, <32 x half>, i32)
6210 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6211 // (<16 x half>, <16 x half>, i16)
6212 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6213 // (<8 x half>, <8 x half>, i8)
6214 //
6215 // TODO: 3-operand variants are not handled:
6216 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6217 // (<2 x double>, <2 x double>, <2 x double>, i8)
6218 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6219 // (<4 x float>, <4 x float>, <4 x float>, i8)
6220 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6221 // (<8 x half>, <8 x half>, <8 x half>, i8)
6222 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6223 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6224 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6225 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6226 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6227 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6228 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6229 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6230 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6231 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6232 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6233 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6234 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6235 /*MaskIndex=*/2);
6236 break;
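// A minimal sketch of what the argument indices above select, using the
// rsqrt14.ps.512 signature from the comment:
//   <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
//       (<16 x float> A, <16 x float> WriteThru, i16 Mask)
//   AIndex=0         -> A         (the vector being approximated)
//   WriteThruIndex=1 -> WriteThru (lanes copied through where the mask is 0)
//   MaskIndex=2      -> Mask      (one bit per output lane)
// The rndscale variants further below use WriteThruIndex=2 / MaskIndex=3
// because an immediate operand sits between A and the write-through vector.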
6237
6238 // AVX512/AVX10 Reciprocal
6239 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6240 // (<16 x float>, <16 x float>, i16)
6241 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6242 // (<8 x float>, <8 x float>, i8)
6243 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6244 // (<4 x float>, <4 x float>, i8)
6245 //
6246 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6247 // (<8 x double>, <8 x double>, i8)
6248 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6249 // (<4 x double>, <4 x double>, i8)
6250 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6251 // (<2 x double>, <2 x double>, i8)
6252 //
6253 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6254 // (<32 x bfloat>, <32 x bfloat>, i32)
6255 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6256 // (<16 x bfloat>, <16 x bfloat>, i16)
6257 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6258 // (<8 x bfloat>, <8 x bfloat>, i8)
6259 //
6260 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6261 // (<32 x half>, <32 x half>, i32)
6262 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6263 // (<16 x half>, <16 x half>, i16)
6264 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6265 // (<8 x half>, <8 x half>, i8)
6266 //
6267 // TODO: 3-operand variants are not handled:
6268 // <2 x double> @llvm.x86.avx512.rcp14.sd
6269 // (<2 x double>, <2 x double>, <2 x double>, i8)
6270 // <4 x float> @llvm.x86.avx512.rcp14.ss
6271 // (<4 x float>, <4 x float>, <4 x float>, i8)
6272 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6273 // (<8 x half>, <8 x half>, <8 x half>, i8)
6274 case Intrinsic::x86_avx512_rcp14_ps_512:
6275 case Intrinsic::x86_avx512_rcp14_ps_256:
6276 case Intrinsic::x86_avx512_rcp14_ps_128:
6277 case Intrinsic::x86_avx512_rcp14_pd_512:
6278 case Intrinsic::x86_avx512_rcp14_pd_256:
6279 case Intrinsic::x86_avx512_rcp14_pd_128:
6280 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6281 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6282 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6283 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6284 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6285 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6286 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6287 /*MaskIndex=*/2);
6288 break;
6289
6290 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6291 // (<32 x half>, i32, <32 x half>, i32, i32)
6292 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6293 // (<16 x half>, i32, <16 x half>, i32, i16)
6294 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6295 // (<8 x half>, i32, <8 x half>, i32, i8)
6296 //
6297 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6298 // (<16 x float>, i32, <16 x float>, i16, i32)
6299 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6300 // (<8 x float>, i32, <8 x float>, i8)
6301 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6302 // (<4 x float>, i32, <4 x float>, i8)
6303 //
6304 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6305 // (<8 x double>, i32, <8 x double>, i8, i32)
6306 // A Imm WriteThru Mask Rounding
6307 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6308 // (<4 x double>, i32, <4 x double>, i8)
6309 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6310 // (<2 x double>, i32, <2 x double>, i8)
6311 // A Imm WriteThru Mask
6312 //
6313 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6314 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6315 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6316 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6317 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6318 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6319 //
6320 // Not supported: three vectors
6321 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6322 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6323 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6324 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6325 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6326 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6327 // i32)
6328 // A B WriteThru Mask Imm
6329 // Rounding
6330 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6331 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6332 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6333 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6334 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6335 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6336 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6337 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6338 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6339 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6340 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6341 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6342 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6343 /*MaskIndex=*/3);
6344 break;
6345
6346 // AVX512 FP16 Arithmetic
6347 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6348 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6349 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6350 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6351 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6352 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6353 visitGenericScalarHalfwordInst(I);
6354 break;
6355 }
6356
6357 // AVX Galois Field New Instructions
6358 case Intrinsic::x86_vgf2p8affineqb_128:
6359 case Intrinsic::x86_vgf2p8affineqb_256:
6360 case Intrinsic::x86_vgf2p8affineqb_512:
6361 handleAVXGF2P8Affine(I);
6362 break;
6363
6364 default:
6365 return false;
6366 }
6367
6368 return true;
6369 }
6370
6371 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6372 switch (I.getIntrinsicID()) {
6373 case Intrinsic::aarch64_neon_rshrn:
6374 case Intrinsic::aarch64_neon_sqrshl:
6375 case Intrinsic::aarch64_neon_sqrshrn:
6376 case Intrinsic::aarch64_neon_sqrshrun:
6377 case Intrinsic::aarch64_neon_sqshl:
6378 case Intrinsic::aarch64_neon_sqshlu:
6379 case Intrinsic::aarch64_neon_sqshrn:
6380 case Intrinsic::aarch64_neon_sqshrun:
6381 case Intrinsic::aarch64_neon_srshl:
6382 case Intrinsic::aarch64_neon_sshl:
6383 case Intrinsic::aarch64_neon_uqrshl:
6384 case Intrinsic::aarch64_neon_uqrshrn:
6385 case Intrinsic::aarch64_neon_uqshl:
6386 case Intrinsic::aarch64_neon_uqshrn:
6387 case Intrinsic::aarch64_neon_urshl:
6388 case Intrinsic::aarch64_neon_ushl:
6389 // Not handled here: aarch64_neon_vsli (vector shift left and insert)
6390 handleVectorShiftIntrinsic(I, /* Variable */ false);
6391 break;
6392
6393 // TODO: handling max/min similarly to AND/OR may be more precise
6394 // Floating-Point Maximum/Minimum Pairwise
6395 case Intrinsic::aarch64_neon_fmaxp:
6396 case Intrinsic::aarch64_neon_fminp:
6397 // Floating-Point Maximum/Minimum Number Pairwise
6398 case Intrinsic::aarch64_neon_fmaxnmp:
6399 case Intrinsic::aarch64_neon_fminnmp:
6400 // Signed/Unsigned Maximum/Minimum Pairwise
6401 case Intrinsic::aarch64_neon_smaxp:
6402 case Intrinsic::aarch64_neon_sminp:
6403 case Intrinsic::aarch64_neon_umaxp:
6404 case Intrinsic::aarch64_neon_uminp:
6405 // Add Pairwise
6406 case Intrinsic::aarch64_neon_addp:
6407 // Floating-point Add Pairwise
6408 case Intrinsic::aarch64_neon_faddp:
6409 // Add Long Pairwise
6410 case Intrinsic::aarch64_neon_saddlp:
6411 case Intrinsic::aarch64_neon_uaddlp: {
6412 handlePairwiseShadowOrIntrinsic(I);
6413 break;
6414 }
6415
6416 // Floating-point Convert to integer, rounding to nearest with ties to Away
6417 case Intrinsic::aarch64_neon_fcvtas:
6418 case Intrinsic::aarch64_neon_fcvtau:
6419 // Floating-point convert to integer, rounding toward minus infinity
6420 case Intrinsic::aarch64_neon_fcvtms:
6421 case Intrinsic::aarch64_neon_fcvtmu:
6422 // Floating-point convert to integer, rounding to nearest with ties to even
6423 case Intrinsic::aarch64_neon_fcvtns:
6424 case Intrinsic::aarch64_neon_fcvtnu:
6425 // Floating-point convert to integer, rounding toward plus infinity
6426 case Intrinsic::aarch64_neon_fcvtps:
6427 case Intrinsic::aarch64_neon_fcvtpu:
6428 // Floating-point Convert to integer, rounding toward Zero
6429 case Intrinsic::aarch64_neon_fcvtzs:
6430 case Intrinsic::aarch64_neon_fcvtzu:
6431 // Floating-point convert to lower precision narrow, rounding to odd
6432 case Intrinsic::aarch64_neon_fcvtxn: {
6433 handleNEONVectorConvertIntrinsic(I);
6434 break;
6435 }
6436
6437 // Add reduction to scalar
6438 case Intrinsic::aarch64_neon_faddv:
6439 case Intrinsic::aarch64_neon_saddv:
6440 case Intrinsic::aarch64_neon_uaddv:
6441 // Signed/Unsigned min/max (Vector)
6442 // TODO: handling similarly to AND/OR may be more precise.
6443 case Intrinsic::aarch64_neon_smaxv:
6444 case Intrinsic::aarch64_neon_sminv:
6445 case Intrinsic::aarch64_neon_umaxv:
6446 case Intrinsic::aarch64_neon_uminv:
6447 // Floating-point min/max (vector)
6448 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6449 // but our shadow propagation is the same.
6450 case Intrinsic::aarch64_neon_fmaxv:
6451 case Intrinsic::aarch64_neon_fminv:
6452 case Intrinsic::aarch64_neon_fmaxnmv:
6453 case Intrinsic::aarch64_neon_fminnmv:
6454 // Sum long across vector
6455 case Intrinsic::aarch64_neon_saddlv:
6456 case Intrinsic::aarch64_neon_uaddlv:
6457 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6458 break;
6459
6460 case Intrinsic::aarch64_neon_ld1x2:
6461 case Intrinsic::aarch64_neon_ld1x3:
6462 case Intrinsic::aarch64_neon_ld1x4:
6463 case Intrinsic::aarch64_neon_ld2:
6464 case Intrinsic::aarch64_neon_ld3:
6465 case Intrinsic::aarch64_neon_ld4:
6466 case Intrinsic::aarch64_neon_ld2r:
6467 case Intrinsic::aarch64_neon_ld3r:
6468 case Intrinsic::aarch64_neon_ld4r: {
6469 handleNEONVectorLoad(I, /*WithLane=*/false);
6470 break;
6471 }
6472
6473 case Intrinsic::aarch64_neon_ld2lane:
6474 case Intrinsic::aarch64_neon_ld3lane:
6475 case Intrinsic::aarch64_neon_ld4lane: {
6476 handleNEONVectorLoad(I, /*WithLane=*/true);
6477 break;
6478 }
6479
6480 // Saturating extract narrow
6481 case Intrinsic::aarch64_neon_sqxtn:
6482 case Intrinsic::aarch64_neon_sqxtun:
6483 case Intrinsic::aarch64_neon_uqxtn:
6484 // These only have one argument, but we (ab)use handleShadowOr because it
6485 // does work on single argument intrinsics and will typecast the shadow
6486 // (and update the origin).
6487 handleShadowOr(I);
6488 break;
6489
6490 case Intrinsic::aarch64_neon_st1x2:
6491 case Intrinsic::aarch64_neon_st1x3:
6492 case Intrinsic::aarch64_neon_st1x4:
6493 case Intrinsic::aarch64_neon_st2:
6494 case Intrinsic::aarch64_neon_st3:
6495 case Intrinsic::aarch64_neon_st4: {
6496 handleNEONVectorStoreIntrinsic(I, false);
6497 break;
6498 }
6499
6500 case Intrinsic::aarch64_neon_st2lane:
6501 case Intrinsic::aarch64_neon_st3lane:
6502 case Intrinsic::aarch64_neon_st4lane: {
6503 handleNEONVectorStoreIntrinsic(I, true);
6504 break;
6505 }
6506
6507 // Arm NEON vector table intrinsics have the source/table register(s) as
6508 // arguments, followed by the index register. They return the output.
6509 //
6510 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6511 // original value unchanged in the destination register.'
6512 // Conveniently, zero denotes a clean shadow, which means out-of-range
6513 // indices for TBL will initialize the user data with zero and also clean
6514 // the shadow. (For TBX, neither the user data nor the shadow will be
6515 // updated, which is also correct.)
6516 case Intrinsic::aarch64_neon_tbl1:
6517 case Intrinsic::aarch64_neon_tbl2:
6518 case Intrinsic::aarch64_neon_tbl3:
6519 case Intrinsic::aarch64_neon_tbl4:
6520 case Intrinsic::aarch64_neon_tbx1:
6521 case Intrinsic::aarch64_neon_tbx2:
6522 case Intrinsic::aarch64_neon_tbx3:
6523 case Intrinsic::aarch64_neon_tbx4: {
6524 // The last trailing argument (index register) should be handled verbatim
6525 handleIntrinsicByApplyingToShadow(
6526 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6527 /*trailingVerbatimArgs*/ 1);
6528 break;
6529 }
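// Worked example (illustrative), using the v16i8 form of TBL1:
//   %r  = call <16 x i8> @llvm.aarch64.neon.tbl1(<16 x i8> %table, <16 x i8> %idx)
//   %sr = call <16 x i8> @llvm.aarch64.neon.tbl1(<16 x i8> %table_shadow, <16 x i8> %idx)
// The same lookup is performed on the table's shadow with the original index
// vector. In-range indices pick up the shadow of the selected table byte;
// out-of-range indices produce zero in both the data and the shadow, which
// matches the TBL semantics quoted above.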
6530
6531 case Intrinsic::aarch64_neon_fmulx:
6532 case Intrinsic::aarch64_neon_pmul:
6533 case Intrinsic::aarch64_neon_pmull:
6534 case Intrinsic::aarch64_neon_smull:
6535 case Intrinsic::aarch64_neon_pmull64:
6536 case Intrinsic::aarch64_neon_umull: {
6537 handleNEONVectorMultiplyIntrinsic(I);
6538 break;
6539 }
6540
6541 default:
6542 return false;
6543 }
6544
6545 return true;
6546 }
6547
6548 void visitIntrinsicInst(IntrinsicInst &I) {
6549 if (maybeHandleCrossPlatformIntrinsic(I))
6550 return;
6551
6552 if (maybeHandleX86SIMDIntrinsic(I))
6553 return;
6554
6555 if (maybeHandleArmSIMDIntrinsic(I))
6556 return;
6557
6558 if (maybeHandleUnknownIntrinsic(I))
6559 return;
6560
6561 visitInstruction(I);
6562 }
6563
6564 void visitLibAtomicLoad(CallBase &CB) {
6565 // Since we use getNextNode here, we can't have CB terminate the BB.
6566 assert(isa<CallInst>(CB));
6567
6568 IRBuilder<> IRB(&CB);
6569 Value *Size = CB.getArgOperand(0);
6570 Value *SrcPtr = CB.getArgOperand(1);
6571 Value *DstPtr = CB.getArgOperand(2);
6572 Value *Ordering = CB.getArgOperand(3);
6573 // Convert the call to have at least Acquire ordering to make sure
6574 // the shadow operations aren't reordered before it.
6575 Value *NewOrdering =
6576 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
6577 CB.setArgOperand(3, NewOrdering);
6578
6579 NextNodeIRBuilder NextIRB(&CB);
6580 Value *SrcShadowPtr, *SrcOriginPtr;
6581 std::tie(SrcShadowPtr, SrcOriginPtr) =
6582 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6583 /*isStore*/ false);
6584 Value *DstShadowPtr =
6585 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6586 /*isStore*/ true)
6587 .first;
6588
6589 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
6590 if (MS.TrackOrigins) {
6591 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
6592 kMinOriginAlignment);
6593 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
6594 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
6595 }
6596 }
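// Illustrative sketch of the transformation above, for a call lowered from
// C11 atomics such as
//   __atomic_load(size, src, dst, memory_order_relaxed);
// Conceptually, the instrumented sequence becomes:
//   __atomic_load(size, src, dst, memory_order_acquire); // ordering raised
//   memcpy(shadow(dst), shadow(src), size);               // inserted after the call
//   __msan_set_origin(dst, size, origin);                 // only with origin tracking
// Raising the ordering to at least Acquire keeps the shadow copy from being
// reordered before the atomic load itself.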
6597
6598 void visitLibAtomicStore(CallBase &CB) {
6599 IRBuilder<> IRB(&CB);
6600 Value *Size = CB.getArgOperand(0);
6601 Value *DstPtr = CB.getArgOperand(2);
6602 Value *Ordering = CB.getArgOperand(3);
6603 // Convert the call to have at least Release ordering to make sure
6604 // the shadow operations aren't reordered after it.
6605 Value *NewOrdering =
6606 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
6607 CB.setArgOperand(3, NewOrdering);
6608
6609 Value *DstShadowPtr =
6610 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
6611 /*isStore*/ true)
6612 .first;
6613
6614 // Atomic store always paints clean shadow/origin. See file header.
6615 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
6616 Align(1));
6617 }
6618
6619 void visitCallBase(CallBase &CB) {
6620 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
6621 if (CB.isInlineAsm()) {
6622 // For inline asm (either a call to asm function, or callbr instruction),
6623 // do the usual thing: check argument shadow and mark all outputs as
6624 // clean. Note that any side effects of the inline asm that are not
6625 // immediately visible in its constraints are not handled.
6626 if (ClHandleAsmConservative)
6627 visitAsmInstruction(CB);
6628 else
6629 visitInstruction(CB);
6630 return;
6631 }
6632 LibFunc LF;
6633 if (TLI->getLibFunc(CB, LF)) {
6634 // libatomic.a functions need to have special handling because there isn't
6635 // a good way to intercept them or compile the library with
6636 // instrumentation.
6637 switch (LF) {
6638 case LibFunc_atomic_load:
6639 if (!isa<CallInst>(CB)) {
6640 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load."
6641 "Ignoring!\n";
6642 break;
6643 }
6644 visitLibAtomicLoad(CB);
6645 return;
6646 case LibFunc_atomic_store:
6647 visitLibAtomicStore(CB);
6648 return;
6649 default:
6650 break;
6651 }
6652 }
6653
6654 if (auto *Call = dyn_cast<CallInst>(&CB)) {
6655 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
6656
6657 // We are going to insert code that relies on the fact that the callee
6658 // will become a non-readonly function after it is instrumented by us. To
6659 // prevent this code from being optimized out, mark that function
6660 // non-readonly in advance.
6661 // TODO: We can likely do better than dropping memory() completely here.
6662 AttributeMask B;
6663 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
6664
6665 Call->removeFnAttrs(B);
6666 if (Function *Func = Call->getCalledFunction()) {
6667 Func->removeFnAttrs(B);
6668 }
6669
6670 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
6671 }
6672 IRBuilder<> IRB(&CB);
6673 bool MayCheckCall = MS.EagerChecks;
6674 if (Function *Func = CB.getCalledFunction()) {
6675 // __sanitizer_unaligned_{load,store} functions may be called by users
6676 // and always expect shadows in the TLS. So don't check them.
6677 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
6678 }
6679
6680 unsigned ArgOffset = 0;
6681 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
6682 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
6683 if (!A->getType()->isSized()) {
6684 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
6685 continue;
6686 }
6687
6688 if (A->getType()->isScalableTy()) {
6689 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
6690 // Handle as noundef, but don't reserve tls slots.
6691 insertCheckShadowOf(A, &CB);
6692 continue;
6693 }
6694
6695 unsigned Size = 0;
6696 const DataLayout &DL = F.getDataLayout();
6697
6698 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
6699 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
6700 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
6701
6702 if (EagerCheck) {
6703 insertCheckShadowOf(A, &CB);
6704 Size = DL.getTypeAllocSize(A->getType());
6705 } else {
6706 [[maybe_unused]] Value *Store = nullptr;
6707 // Compute the Shadow for arg even if it is ByVal, because
6708 // in that case getShadow() will copy the actual arg shadow to
6709 // __msan_param_tls.
6710 Value *ArgShadow = getShadow(A);
6711 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
6712 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
6713 << " Shadow: " << *ArgShadow << "\n");
6714 if (ByVal) {
6715 // ByVal requires some special handling as it's too big for a single
6716 // load
6717 assert(A->getType()->isPointerTy() &&
6718 "ByVal argument is not a pointer!");
6719 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
6720 if (ArgOffset + Size > kParamTLSSize)
6721 break;
6722 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
6723 MaybeAlign Alignment = std::nullopt;
6724 if (ParamAlignment)
6725 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
6726 Value *AShadowPtr, *AOriginPtr;
6727 std::tie(AShadowPtr, AOriginPtr) =
6728 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
6729 /*isStore*/ false);
6730 if (!PropagateShadow) {
6731 Store = IRB.CreateMemSet(ArgShadowBase,
6732 Constant::getNullValue(IRB.getInt8Ty()),
6733 Size, Alignment);
6734 } else {
6735 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
6736 Alignment, Size);
6737 if (MS.TrackOrigins) {
6738 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
6739 // FIXME: OriginSize should be:
6740 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
6741 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
6742 IRB.CreateMemCpy(
6743 ArgOriginBase,
6744 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
6745 AOriginPtr,
6746 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
6747 }
6748 }
6749 } else {
6750 // Any other parameters mean we need bit-grained tracking of uninit
6751 // data
6752 Size = DL.getTypeAllocSize(A->getType());
6753 if (ArgOffset + Size > kParamTLSSize)
6754 break;
6755 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
6756 kShadowTLSAlignment);
6757 Constant *Cst = dyn_cast<Constant>(ArgShadow);
6758 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
6759 IRB.CreateStore(getOrigin(A),
6760 getOriginPtrForArgument(IRB, ArgOffset));
6761 }
6762 }
6763 assert(Store != nullptr);
6764 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
6765 }
6766 assert(Size != 0);
6767 ArgOffset += alignTo(Size, kShadowTLSAlignment);
6768 }
6769 LLVM_DEBUG(dbgs() << " done with call args\n");
6770
6771 FunctionType *FT = CB.getFunctionType();
6772 if (FT->isVarArg()) {
6773 VAHelper->visitCallBase(CB, IRB);
6774 }
6775
6776 // Now, get the shadow for the RetVal.
6777 if (!CB.getType()->isSized())
6778 return;
6779 // Don't emit the epilogue for musttail call returns.
6780 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
6781 return;
6782
6783 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
6784 setShadow(&CB, getCleanShadow(&CB));
6785 setOrigin(&CB, getCleanOrigin());
6786 return;
6787 }
6788
6789 IRBuilder<> IRBBefore(&CB);
6790 // Until we have full dynamic coverage, make sure the retval shadow is 0.
6791 Value *Base = getShadowPtrForRetval(IRBBefore);
6792 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
6793 kShadowTLSAlignment);
6794 BasicBlock::iterator NextInsn;
6795 if (isa<CallInst>(CB)) {
6796 NextInsn = ++CB.getIterator();
6797 assert(NextInsn != CB.getParent()->end());
6798 } else {
6799 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
6800 if (!NormalDest->getSinglePredecessor()) {
6801 // FIXME: this case is tricky, so we are just conservative here.
6802 // Perhaps we need to split the edge between this BB and NormalDest,
6803 // but a naive attempt to use SplitEdge leads to a crash.
6804 setShadow(&CB, getCleanShadow(&CB));
6805 setOrigin(&CB, getCleanOrigin());
6806 return;
6807 }
6808 // FIXME: NextInsn is likely in a basic block that has not been visited
6809 // yet. Anything inserted there will be instrumented by MSan later!
6810 NextInsn = NormalDest->getFirstInsertionPt();
6811 assert(NextInsn != NormalDest->end() &&
6812 "Could not find insertion point for retval shadow load");
6813 }
6814 IRBuilder<> IRBAfter(&*NextInsn);
6815 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
6816 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
6817 "_msret");
6818 setShadow(&CB, RetvalShadow);
6819 if (MS.TrackOrigins)
6820 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
6821 }
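// End-to-end sketch of the parameter/return-value shadow protocol implemented
// above (simplified, origins omitted):
//
//   caller:                                callee:
//     store Sarg0 -> __msan_param_tls[0]     Sparam0 = load __msan_param_tls[0]
//     store Sarg1 -> __msan_param_tls[8]     ...
//     store 0     -> __msan_retval_tls       store Sret -> __msan_retval_tls
//     call callee                            ret
//     Sret = load __msan_retval_tls
//
// Eager checks (noundef arguments and returns, with -msan-eager-checks) skip
// the TLS slots and instead insert a shadow check at the call site.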
6822
6823 bool isAMustTailRetVal(Value *RetVal) {
6824 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
6825 RetVal = I->getOperand(0);
6826 }
6827 if (auto *I = dyn_cast<CallInst>(RetVal)) {
6828 return I->isMustTailCall();
6829 }
6830 return false;
6831 }
6832
6833 void visitReturnInst(ReturnInst &I) {
6834 IRBuilder<> IRB(&I);
6835 Value *RetVal = I.getReturnValue();
6836 if (!RetVal)
6837 return;
6838 // Don't emit the epilogue for musttail call returns.
6839 if (isAMustTailRetVal(RetVal))
6840 return;
6841 Value *ShadowPtr = getShadowPtrForRetval(IRB);
6842 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
6843 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
6844 // FIXME: Consider using SpecialCaseList to specify a list of functions that
6845 // must always return fully initialized values. For now, we hardcode "main".
6846 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
6847
6848 Value *Shadow = getShadow(RetVal);
6849 bool StoreOrigin = true;
6850 if (EagerCheck) {
6851 insertCheckShadowOf(RetVal, &I);
6852 Shadow = getCleanShadow(RetVal);
6853 StoreOrigin = false;
6854 }
6855
6856 // The caller may still expect information passed over TLS if we pass our
6857 // check
6858 if (StoreShadow) {
6859 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
6860 if (MS.TrackOrigins && StoreOrigin)
6861 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
6862 }
6863 }
6864
6865 void visitPHINode(PHINode &I) {
6866 IRBuilder<> IRB(&I);
6867 if (!PropagateShadow) {
6868 setShadow(&I, getCleanShadow(&I));
6869 setOrigin(&I, getCleanOrigin());
6870 return;
6871 }
6872
6873 ShadowPHINodes.push_back(&I);
6874 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
6875 "_msphi_s"));
6876 if (MS.TrackOrigins)
6877 setOrigin(
6878 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
6879 }
6880
6881 Value *getLocalVarIdptr(AllocaInst &I) {
6882 ConstantInt *IntConst =
6883 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
6884 return new GlobalVariable(*F.getParent(), IntConst->getType(),
6885 /*isConstant=*/false, GlobalValue::PrivateLinkage,
6886 IntConst);
6887 }
6888
6889 Value *getLocalVarDescription(AllocaInst &I) {
6890 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
6891 }
6892
6893 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
6894 if (PoisonStack && ClPoisonStackWithCall) {
6895 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
6896 } else {
6897 Value *ShadowBase, *OriginBase;
6898 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
6899 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
6900
6901 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
6902 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
6903 }
6904
6905 if (PoisonStack && MS.TrackOrigins) {
6906 Value *Idptr = getLocalVarIdptr(I);
6907 if (ClPrintStackNames) {
6908 Value *Descr = getLocalVarDescription(I);
6909 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
6910 {&I, Len, Idptr, Descr});
6911 } else {
6912 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
6913 }
6914 }
6915 }
6916
6917 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
6918 Value *Descr = getLocalVarDescription(I);
6919 if (PoisonStack) {
6920 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
6921 } else {
6922 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
6923 }
6924 }
6925
6926 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
6927 if (!InsPoint)
6928 InsPoint = &I;
6929 NextNodeIRBuilder IRB(InsPoint);
6930 const DataLayout &DL = F.getDataLayout();
6931 TypeSize TS = DL.getTypeAllocSize(I.getAllocatedType());
6932 Value *Len = IRB.CreateTypeSize(MS.IntptrTy, TS);
6933 if (I.isArrayAllocation())
6934 Len = IRB.CreateMul(Len,
6935 IRB.CreateZExtOrTrunc(I.getArraySize(), MS.IntptrTy));
6936
6937 if (MS.CompileKernel)
6938 poisonAllocaKmsan(I, IRB, Len);
6939 else
6940 poisonAllocaUserspace(I, IRB, Len);
6941 }
6942
6943 void visitAllocaInst(AllocaInst &I) {
6944 setShadow(&I, getCleanShadow(&I));
6945 setOrigin(&I, getCleanOrigin());
6946 // We'll get to this alloca later unless it's poisoned at the corresponding
6947 // llvm.lifetime.start.
6948 AllocaSet.insert(&I);
6949 }
6950
6951 void visitSelectInst(SelectInst &I) {
6952 // a = select b, c, d
6953 Value *B = I.getCondition();
6954 Value *C = I.getTrueValue();
6955 Value *D = I.getFalseValue();
6956
6957 handleSelectLikeInst(I, B, C, D);
6958 }
6959
6960 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
6961 IRBuilder<> IRB(&I);
6962
6963 Value *Sb = getShadow(B);
6964 Value *Sc = getShadow(C);
6965 Value *Sd = getShadow(D);
6966
6967 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
6968 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
6969 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
6970
6971 // Result shadow if condition shadow is 0.
6972 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
6973 Value *Sa1;
6974 if (I.getType()->isAggregateType()) {
6975 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
6976 // an extra "select". This results in much more compact IR.
6977 // Sa = select Sb, poisoned, (select b, Sc, Sd)
6978 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
6979 } else {
6980 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
6981 // If Sb (condition is poisoned), look for bits in c and d that are equal
6982 // and both unpoisoned.
6983 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
6984
6985 // Cast arguments to shadow-compatible type.
6986 C = CreateAppToShadowCast(IRB, C);
6987 D = CreateAppToShadowCast(IRB, D);
6988
6989 // Result shadow if condition shadow is 1.
6990 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
6991 }
6992 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
6993 setShadow(&I, Sa);
6994 if (MS.TrackOrigins) {
6995 // Origins are always i32, so any vector conditions must be flattened.
6996 // FIXME: consider tracking vector origins for app vectors?
6997 if (B->getType()->isVectorTy()) {
6998 B = convertToBool(B, IRB);
6999 Sb = convertToBool(Sb, IRB);
7000 }
7001 // a = select b, c, d
7002 // Oa = Sb ? Ob : (b ? Oc : Od)
7003 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7004 }
7005 }
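// Worked example for the non-aggregate case above, with 4-bit values:
//   Sb = 1 (the condition bit is poisoned)
//   c = 0b1010, Sc = 0b0000;   d = 0b1000, Sd = 0b0000
// Then Sa1 = (c ^ d) | Sc | Sd = 0b0010, and since Sb is poisoned the final
// shadow is Sa = Sa1 = 0b0010: even with a poisoned condition, the bits where
// c and d agree and are both unpoisoned (bits 3, 2 and 0) are reported as
// initialized, because the result is the same whichever way the select goes.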
7006
7007 void visitLandingPadInst(LandingPadInst &I) {
7008 // Do nothing.
7009 // See https://github.com/google/sanitizers/issues/504
7010 setShadow(&I, getCleanShadow(&I));
7011 setOrigin(&I, getCleanOrigin());
7012 }
7013
7014 void visitCatchSwitchInst(CatchSwitchInst &I) {
7015 setShadow(&I, getCleanShadow(&I));
7016 setOrigin(&I, getCleanOrigin());
7017 }
7018
7019 void visitFuncletPadInst(FuncletPadInst &I) {
7020 setShadow(&I, getCleanShadow(&I));
7021 setOrigin(&I, getCleanOrigin());
7022 }
7023
7024 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7025
7026 void visitExtractValueInst(ExtractValueInst &I) {
7027 IRBuilder<> IRB(&I);
7028 Value *Agg = I.getAggregateOperand();
7029 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7030 Value *AggShadow = getShadow(Agg);
7031 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7032 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7033 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7034 setShadow(&I, ResShadow);
7035 setOriginForNaryOp(I);
7036 }
7037
7038 void visitInsertValueInst(InsertValueInst &I) {
7039 IRBuilder<> IRB(&I);
7040 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7041 Value *AggShadow = getShadow(I.getAggregateOperand());
7042 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7043 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7044 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7045 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7046 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7047 setShadow(&I, Res);
7048 setOriginForNaryOp(I);
7049 }
7050
7051 void dumpInst(Instruction &I) {
7052 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7053 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7054 } else {
7055 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7056 }
7057 errs() << "QQQ " << I << "\n";
7058 }
7059
7060 void visitResumeInst(ResumeInst &I) {
7061 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7062 // Nothing to do here.
7063 }
7064
7065 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7066 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7067 // Nothing to do here.
7068 }
7069
7070 void visitCatchReturnInst(CatchReturnInst &CRI) {
7071 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7072 // Nothing to do here.
7073 }
7074
7075 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7076 IRBuilder<> &IRB, const DataLayout &DL,
7077 bool isOutput) {
7078 // For each assembly argument, we check its value for being initialized.
7079 // If the argument is a pointer, we assume it points to a single element
7080 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7081 // Each such pointer is instrumented with a call to the runtime library.
7082 Type *OpType = Operand->getType();
7083 // Check the operand value itself.
7084 insertCheckShadowOf(Operand, &I);
7085 if (!OpType->isPointerTy() || !isOutput) {
7086 assert(!isOutput);
7087 return;
7088 }
7089 if (!ElemTy->isSized())
7090 return;
7091 auto Size = DL.getTypeStoreSize(ElemTy);
7092 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7093 if (MS.CompileKernel) {
7094 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7095 } else {
7096 // ElemTy, derived from elementtype(), does not encode the alignment of
7097 // the pointer. Conservatively assume that the shadow memory is unaligned.
7098 // When Size is large, avoid StoreInst as it would expand to many
7099 // instructions.
7100 auto [ShadowPtr, _] =
7101 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7102 if (Size <= 32)
7103 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7104 else
7105 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7106 SizeVal, Align(1));
7107 }
7108 }
7109
7110 /// Get the number of output arguments returned by pointers.
7111 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7112 int NumRetOutputs = 0;
7113 int NumOutputs = 0;
7114 Type *RetTy = cast<Value>(CB)->getType();
7115 if (!RetTy->isVoidTy()) {
7116 // Register outputs are returned via the CallInst return value.
7117 auto *ST = dyn_cast<StructType>(RetTy);
7118 if (ST)
7119 NumRetOutputs = ST->getNumElements();
7120 else
7121 NumRetOutputs = 1;
7122 }
7123 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7124 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7125 switch (Info.Type) {
7126 case InlineAsm::isOutput:
7127 NumOutputs++;
7128 break;
7129 default:
7130 break;
7131 }
7132 }
7133 return NumOutputs - NumRetOutputs;
7134 }
7135
7136 void visitAsmInstruction(Instruction &I) {
7137 // Conservative inline assembly handling: check for poisoned shadow of
7138 // asm() arguments, then unpoison the result and all the memory locations
7139 // pointed to by those arguments.
7140 // An inline asm() statement in C++ contains lists of input and output
7141 // arguments used by the assembly code. These are mapped to operands of the
7142 // CallInst as follows:
7143 // - nR register outputs ("=r) are returned by value in a single structure
7144 // (SSA value of the CallInst);
7145 // - nO other outputs ("=m" and others) are returned by pointer as first
7146 // nO operands of the CallInst;
7147 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7148 // remaining nI operands.
7149 // The total number of asm() arguments in the source is nR+nO+nI, and the
7150 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7151 // function to be called).
7152 const DataLayout &DL = F.getDataLayout();
7153 CallBase *CB = cast<CallBase>(&I);
7154 IRBuilder<> IRB(&I);
7155 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7156 int OutputArgs = getNumOutputArgs(IA, CB);
7157 // The last operand of a CallInst is the function itself.
7158 int NumOperands = CB->getNumOperands() - 1;
7159
7160 // Check input arguments. Doing so before unpoisoning output arguments, so
7161 // that we won't overwrite uninit values before checking them.
7162 for (int i = OutputArgs; i < NumOperands; i++) {
7163 Value *Operand = CB->getOperand(i);
7164 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7165 /*isOutput*/ false);
7166 }
7167 // Unpoison output arguments. This must happen before the actual InlineAsm
7168 // call, so that the shadow for memory published in the asm() statement
7169 // remains valid.
7170 for (int i = 0; i < OutputArgs; i++) {
7171 Value *Operand = CB->getOperand(i);
7172 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7173 /*isOutput*/ true);
7174 }
7175
7176 setShadow(&I, getCleanShadow(&I));
7177 setOrigin(&I, getCleanOrigin());
7178 }
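// Concrete example of the operand mapping described above (illustrative):
//   asm("..." : "=r"(ret), "=m"(mem_out) : "r"(in));
// gives nR = 1 register output, nO = 1 other output and nI = 1 input, so the
// CallInst has nO + nI + 1 = 3 operands:
//   operand 0: &mem_out  (output passed by pointer; unpoisoned before the asm)
//   operand 1: in        (input; its shadow is checked)
//   operand 2: the InlineAsm callee
// and getNumOutputArgs() returns (nR + nO) - nR = 1.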
7179
7180 void visitFreezeInst(FreezeInst &I) {
7181 // Freeze always returns a fully defined value.
7182 setShadow(&I, getCleanShadow(&I));
7183 setOrigin(&I, getCleanOrigin());
7184 }
7185
7186 void visitInstruction(Instruction &I) {
7187 // Everything else: stop propagating and check for poisoned shadow.
7188 if (ClDumpStrictInstructions)
7189 dumpInst(I);
7190 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7191 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7192 Value *Operand = I.getOperand(i);
7193 if (Operand->getType()->isSized())
7194 insertCheckShadowOf(Operand, &I);
7195 }
7196 setShadow(&I, getCleanShadow(&I));
7197 setOrigin(&I, getCleanOrigin());
7198 }
7199};
7200
7201struct VarArgHelperBase : public VarArgHelper {
7202 Function &F;
7203 MemorySanitizer &MS;
7204 MemorySanitizerVisitor &MSV;
7205 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7206 const unsigned VAListTagSize;
7207
7208 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7209 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7210 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7211
7212 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7213 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7214 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7215 }
7216
7217 /// Compute the shadow address for a given va_arg.
7218 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7219 return IRB.CreatePtrAdd(
7220 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7221 }
7222
7223 /// Compute the shadow address for a given va_arg.
7224 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7225 unsigned ArgSize) {
7226 // Make sure we don't overflow __msan_va_arg_tls.
7227 if (ArgOffset + ArgSize > kParamTLSSize)
7228 return nullptr;
7229 return getShadowPtrForVAArgument(IRB, ArgOffset);
7230 }
7231
7232 /// Compute the origin address for a given va_arg.
7233 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7234 // getOriginPtrForVAArgument() is always called after
7235 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7236 // overflow.
7237 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7238 ConstantInt::get(MS.IntptrTy, ArgOffset),
7239 "_msarg_va_o");
7240 }
7241
7242 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7243 unsigned BaseOffset) {
7244 // The tail of __msan_va_arg_tls is not large enough to fit the full
7245 // value shadow, but it will be copied to the backup anyway. Make it
7246 // clean.
7247 if (BaseOffset >= kParamTLSSize)
7248 return;
7249 Value *TailSize =
7250 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7251 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7252 TailSize, Align(8));
7253 }
7254
7255 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7256 IRBuilder<> IRB(&I);
7257 Value *VAListTag = I.getArgOperand(0);
7258 const Align Alignment = Align(8);
7259 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7260 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7261 // Unpoison the whole __va_list_tag.
7262 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7263 VAListTagSize, Alignment, false);
7264 }
7265
7266 void visitVAStartInst(VAStartInst &I) override {
7267 if (F.getCallingConv() == CallingConv::Win64)
7268 return;
7269 VAStartInstrumentationList.push_back(&I);
7270 unpoisonVAListTagForInst(I);
7271 }
7272
7273 void visitVACopyInst(VACopyInst &I) override {
7274 if (F.getCallingConv() == CallingConv::Win64)
7275 return;
7276 unpoisonVAListTagForInst(I);
7277 }
7278};
7279
7280/// AMD64-specific implementation of VarArgHelper.
7281struct VarArgAMD64Helper : public VarArgHelperBase {
7282 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7283 // See a comment in visitCallBase for more details.
7284 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7285 static const unsigned AMD64FpEndOffsetSSE = 176;
7286 // If SSE is disabled, fp_offset in va_list is zero.
7287 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
7288
7289 unsigned AMD64FpEndOffset;
7290 AllocaInst *VAArgTLSCopy = nullptr;
7291 AllocaInst *VAArgTLSOriginCopy = nullptr;
7292 Value *VAArgOverflowSize = nullptr;
7293
7294 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7295
7296 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7297 MemorySanitizerVisitor &MSV)
7298 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7299 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7300 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7301 if (Attr.isStringAttribute() &&
7302 (Attr.getKindAsString() == "target-features")) {
7303 if (Attr.getValueAsString().contains("-sse"))
7304 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7305 break;
7306 }
7307 }
7308 }
7309
7310 ArgKind classifyArgument(Value *arg) {
7311 // A very rough approximation of X86_64 argument classification rules.
7312 Type *T = arg->getType();
7313 if (T->isX86_FP80Ty())
7314 return AK_Memory;
7315 if (T->isFPOrFPVectorTy())
7316 return AK_FloatingPoint;
7317 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7318 return AK_GeneralPurpose;
7319 if (T->isPointerTy())
7320 return AK_GeneralPurpose;
7321 return AK_Memory;
7322 }
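// For illustration, how this rough approximation classifies some common
// variadic argument types:
//   int, long, void*            -> AK_GeneralPurpose (integer register area)
//   double, <4 x float>         -> AK_FloatingPoint  (SSE register area)
//   long double (x86_fp80)      -> AK_Memory         (overflow area)
//   structs passed by value     -> AK_Memory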
7323
7324 // For VarArg functions, store the argument shadow in an ABI-specific format
7325 // that corresponds to va_list layout.
7326 // We do this because Clang lowers va_arg in the frontend, and this pass
7327 // only sees the low level code that deals with va_list internals.
7328 // A much easier alternative (provided that Clang emits va_arg instructions)
7329 // would have been to associate each live instance of va_list with a copy of
7330 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7331 // order.
7332 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7333 unsigned GpOffset = 0;
7334 unsigned FpOffset = AMD64GpEndOffset;
7335 unsigned OverflowOffset = AMD64FpEndOffset;
7336 const DataLayout &DL = F.getDataLayout();
7337
7338 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7339 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7340 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7341 if (IsByVal) {
7342 // ByVal arguments always go to the overflow area.
7343 // Fixed arguments passed through the overflow area will be stepped
7344 // over by va_start, so don't count them towards the offset.
7345 if (IsFixed)
7346 continue;
7347 assert(A->getType()->isPointerTy());
7348 Type *RealTy = CB.getParamByValType(ArgNo);
7349 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7350 uint64_t AlignedSize = alignTo(ArgSize, 8);
7351 unsigned BaseOffset = OverflowOffset;
7352 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7353 Value *OriginBase = nullptr;
7354 if (MS.TrackOrigins)
7355 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7356 OverflowOffset += AlignedSize;
7357
7358 if (OverflowOffset > kParamTLSSize) {
7359 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7360 continue; // We have no space to copy shadow there.
7361 }
7362
7363 Value *ShadowPtr, *OriginPtr;
7364 std::tie(ShadowPtr, OriginPtr) =
7365 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
7366 /*isStore*/ false);
7367 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
7368 kShadowTLSAlignment, ArgSize);
7369 if (MS.TrackOrigins)
7370 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
7371 kShadowTLSAlignment, ArgSize);
7372 } else {
7373 ArgKind AK = classifyArgument(A);
7374 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7375 AK = AK_Memory;
7376 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7377 AK = AK_Memory;
7378 Value *ShadowBase, *OriginBase = nullptr;
7379 switch (AK) {
7380 case AK_GeneralPurpose:
7381 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
7382 if (MS.TrackOrigins)
7383 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
7384 GpOffset += 8;
7385 assert(GpOffset <= kParamTLSSize);
7386 break;
7387 case AK_FloatingPoint:
7388 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
7389 if (MS.TrackOrigins)
7390 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
7391 FpOffset += 16;
7392 assert(FpOffset <= kParamTLSSize);
7393 break;
7394 case AK_Memory:
7395 if (IsFixed)
7396 continue;
7397 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7398 uint64_t AlignedSize = alignTo(ArgSize, 8);
7399 unsigned BaseOffset = OverflowOffset;
7400 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7401 if (MS.TrackOrigins) {
7402 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7403 }
7404 OverflowOffset += AlignedSize;
7405 if (OverflowOffset > kParamTLSSize) {
7406 // We have no space to copy shadow there.
7407 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7408 continue;
7409 }
7410 }
7411 // Take fixed arguments into account for GpOffset and FpOffset,
7412 // but don't actually store shadows for them.
7413 // TODO(glider): don't call get*PtrForVAArgument() for them.
7414 if (IsFixed)
7415 continue;
7416 Value *Shadow = MSV.getShadow(A);
7417 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
7418 if (MS.TrackOrigins) {
7419 Value *Origin = MSV.getOrigin(A);
7420 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
7421 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
7422 std::max(kShadowTLSAlignment, kMinOriginAlignment));
7423 }
7424 }
7425 }
7426 Constant *OverflowSize =
7427 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
7428 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7429 }
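// Resulting layout of the va_arg TLS shadow for this ABI (with SSE enabled),
// as a sketch of the offsets used above:
//   [0, 48)    arguments passed in general-purpose registers
//              (6 registers x 8 bytes = AMD64GpEndOffset)
//   [48, 176)  arguments passed in SSE registers
//              (8 registers x 16 bytes, up to AMD64FpEndOffsetSSE)
//   [176, ...) arguments passed in memory (the overflow area), whose size is
//              published via MS.VAArgOverflowSizeTLS for finalizeInstrumentation.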
7430
7431 void finalizeInstrumentation() override {
7432 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7433 "finalizeInstrumentation called twice");
7434 if (!VAStartInstrumentationList.empty()) {
7435 // If there is a va_start in this function, make a backup copy of
7436 // va_arg_tls somewhere in the function entry block.
7437 IRBuilder<> IRB(MSV.FnPrologueEnd);
7438 VAArgOverflowSize =
7439 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7440 Value *CopySize = IRB.CreateAdd(
7441 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
7442 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7443 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7444 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7445 CopySize, kShadowTLSAlignment, false);
7446
7447 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7448 Intrinsic::umin, CopySize,
7449 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7450 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7451 kShadowTLSAlignment, SrcSize);
7452 if (MS.TrackOrigins) {
7453 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7454 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7455 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
7456 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
7457 }
7458 }
7459
7460 // Instrument va_start.
7461 // Copy va_list shadow from the backup copy of the TLS contents.
7462 for (CallInst *OrigInst : VAStartInstrumentationList) {
7463 NextNodeIRBuilder IRB(OrigInst);
7464 Value *VAListTag = OrigInst->getArgOperand(0);
7465
7466 Value *RegSaveAreaPtrPtr =
7467 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
7468 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7469 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7470 const Align Alignment = Align(16);
7471 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7472 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7473 Alignment, /*isStore*/ true);
7474 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7475 AMD64FpEndOffset);
7476 if (MS.TrackOrigins)
7477 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
7478 Alignment, AMD64FpEndOffset);
7479 Value *OverflowArgAreaPtrPtr =
7480 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
7481 Value *OverflowArgAreaPtr =
7482 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
7483 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7484 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
7485 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
7486 Alignment, /*isStore*/ true);
7487 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
7488 AMD64FpEndOffset);
7489 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
7490 VAArgOverflowSize);
7491 if (MS.TrackOrigins) {
7492 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
7493 AMD64FpEndOffset);
7494 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
7495 VAArgOverflowSize);
7496 }
7497 }
7498 }
7499};
7500
7501/// AArch64-specific implementation of VarArgHelper.
7502struct VarArgAArch64Helper : public VarArgHelperBase {
7503 static const unsigned kAArch64GrArgSize = 64;
7504 static const unsigned kAArch64VrArgSize = 128;
7505
7506 static const unsigned AArch64GrBegOffset = 0;
7507 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7508 // Make VR space aligned to 16 bytes.
7509 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7510 static const unsigned AArch64VrEndOffset =
7511 AArch64VrBegOffset + kAArch64VrArgSize;
7512 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
7513
7514 AllocaInst *VAArgTLSCopy = nullptr;
7515 Value *VAArgOverflowSize = nullptr;
7516
7517 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7518
7519 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7520 MemorySanitizerVisitor &MSV)
7521 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7522
7523 // A very rough approximation of aarch64 argument classification rules.
7524 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7525 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7526 return {AK_GeneralPurpose, 1};
7527 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7528 return {AK_FloatingPoint, 1};
7529
7530 if (T->isArrayTy()) {
7531 auto R = classifyArgument(T->getArrayElementType());
7532 R.second *= T->getScalarType()->getArrayNumElements();
7533 return R;
7534 }
7535
7536 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
7537 auto R = classifyArgument(FV->getScalarType());
7538 R.second *= FV->getNumElements();
7539 return R;
7540 }
7541
7542 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7543 return {AK_Memory, 0};
7544 }
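// For illustration, the pair returned above is (kind, number of registers):
//   i32 / i64 / ptr   -> {AK_GeneralPurpose, 1}
//   double            -> {AK_FloatingPoint, 1}
//   [4 x float]       -> {AK_FloatingPoint, 4}
//   [2 x i64]         -> {AK_GeneralPurpose, 2}
// In visitCallBase each general-purpose register consumes 8 bytes of the GR
// shadow area and each FP/SIMD register consumes 16 bytes of the VR area;
// anything that does not fit is classified as AK_Memory.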
7545
7546 // The instrumentation stores the argument shadow in a non ABI-specific
7547 // format because it does not know which argument is named (since Clang,
7548 // as in the x86_64 case, lowers va_arg in the frontend and this pass only
7549 // sees the low level code that deals with va_list internals).
7550 // The first seven GR registers are saved in the first 56 bytes of the
7551 // va_arg tls arra, followed by the first 8 FP/SIMD registers, and then
7552 // the remaining arguments.
7553 // Using constant offset within the va_arg TLS array allows fast copy
7554 // in the finalize instrumentation.
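  // A sketch of the resulting va_arg TLS layout, as implied by the constants
  // above (illustrative only):
  //   [0, 64)    shadow of the GP registers x0-x7
  //   [64, 192)  shadow of the FP/SIMD registers q0-q7 (16 bytes each)
  //   [192, ...) shadow of the arguments passed on the stack (overflow area)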
7555 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7556 unsigned GrOffset = AArch64GrBegOffset;
7557 unsigned VrOffset = AArch64VrBegOffset;
7558 unsigned OverflowOffset = AArch64VAEndOffset;
7559
7560 const DataLayout &DL = F.getDataLayout();
7561 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7562 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7563 auto [AK, RegNum] = classifyArgument(A->getType());
7564 if (AK == AK_GeneralPurpose &&
7565 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
7566 AK = AK_Memory;
7567 if (AK == AK_FloatingPoint &&
7568 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
7569 AK = AK_Memory;
7570 Value *Base;
7571 switch (AK) {
7572 case AK_GeneralPurpose:
7573 Base = getShadowPtrForVAArgument(IRB, GrOffset);
7574 GrOffset += 8 * RegNum;
7575 break;
7576 case AK_FloatingPoint:
7577 Base = getShadowPtrForVAArgument(IRB, VrOffset);
7578 VrOffset += 16 * RegNum;
7579 break;
7580 case AK_Memory:
7581 // Don't count fixed arguments in the overflow area - va_start will
7582 // skip right over them.
7583 if (IsFixed)
7584 continue;
7585 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7586 uint64_t AlignedSize = alignTo(ArgSize, 8);
7587 unsigned BaseOffset = OverflowOffset;
7588 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
7589 OverflowOffset += AlignedSize;
7590 if (OverflowOffset > kParamTLSSize) {
7591 // We have no space to copy shadow there.
7592 CleanUnusedTLS(IRB, Base, BaseOffset);
7593 continue;
7594 }
7595 break;
7596 }
7597 // Count Gp/Vr fixed arguments to their respective offsets, but don't
7598 // bother to actually store a shadow.
7599 if (IsFixed)
7600 continue;
7601 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7602 }
7603 Constant *OverflowSize =
7604 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
7605 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7606 }
7607
7608 // Retrieve a va_list field of 'void*' size.
7609 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7610 Value *SaveAreaPtrPtr =
7611 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7612 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
7613 }
7614
7615 // Retrieve a va_list field of 'int' size.
7616 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7617 Value *SaveAreaPtr =
7618 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7619 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
7620 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
7621 }
7622
7623 void finalizeInstrumentation() override {
7624 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7625 "finalizeInstrumentation called twice");
7626 if (!VAStartInstrumentationList.empty()) {
7627 // If there is a va_start in this function, make a backup copy of
7628 // va_arg_tls somewhere in the function entry block.
7629 IRBuilder<> IRB(MSV.FnPrologueEnd);
7630 VAArgOverflowSize =
7631 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7632 Value *CopySize = IRB.CreateAdd(
7633 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
7634 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7635 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7636 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7637 CopySize, kShadowTLSAlignment, false);
7638
7639 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7640 Intrinsic::umin, CopySize,
7641 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7642 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7643 kShadowTLSAlignment, SrcSize);
7644 }
7645
7646 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
7647 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
7648
7649 // Instrument va_start, copy va_list shadow from the backup copy of
7650 // the TLS contents.
7651 for (CallInst *OrigInst : VAStartInstrumentationList) {
7652 NextNodeIRBuilder IRB(OrigInst);
7653
7654 Value *VAListTag = OrigInst->getArgOperand(0);
7655
7656      // The variadic ABI for AArch64 creates two areas to save the incoming
7657      // argument registers (one for the 64-bit general registers x0-x7 and
7658      // another for the 128-bit FP/SIMD registers v0-v7).
7659      // We then need to propagate the shadow arguments to both regions,
7660      // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
7661      // The remaining arguments have their shadow saved for 'va::stack'.
7662      // One caveat is that only the non-named arguments need to be
7663      // propagated; however, the call site instrumentation saves 'all' the
7664      // arguments. So to copy the shadow values from the va_arg TLS array,
7665      // we need to adjust the offsets of both the GR and VR regions based on
7666      // the __{gr,vr}_offs values (since the stores depend on the incoming
7667      // named arguments).
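      // For reference, the AAPCS64 va_list the code below reads from (a
      // sketch; the field offsets are the ones used by getVAField64/32):
      //   void *__stack;   // offset 0
      //   void *__gr_top;  // offset 8
      //   void *__vr_top;  // offset 16
      //   int   __gr_offs; // offset 24
      //   int   __vr_offs; // offset 28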
7668 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
7669
7670 // Read the stack pointer from the va_list.
7671 Value *StackSaveAreaPtr =
7672 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
7673
7674 // Read both the __gr_top and __gr_off and add them up.
7675 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
7676 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
7677
7678 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
7679 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
7680
7681 // Read both the __vr_top and __vr_off and add them up.
7682 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
7683 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
7684
7685 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
7686 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
7687
7688      // We do not know how many named arguments were used and, at the call
7689      // site, all the arguments were saved. Since __gr_off is defined as
7690      // '0 - ((8 - named_gr) * 8)', the idea is to propagate only the variadic
7691      // arguments by ignoring the bytes of shadow from named arguments.
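      // Worked example (a sketch, assuming two named GR arguments):
      //   __gr_offs = 0 - ((8 - 2) * 8) = -48, so
      //   GrRegSaveAreaShadowPtrOff = 64 + (-48) = 16 bytes of named-argument
      //   shadow to skip, and GrCopySize = 64 - 16 = 48 bytes of shadow to
      //   copy for the variadic registers x2-x7.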
7692 Value *GrRegSaveAreaShadowPtrOff =
7693 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
7694
7695 Value *GrRegSaveAreaShadowPtr =
7696 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7697 Align(8), /*isStore*/ true)
7698 .first;
7699
7700 Value *GrSrcPtr =
7701 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
7702 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
7703
7704 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
7705 GrCopySize);
7706
7707 // Again, but for FP/SIMD values.
7708 Value *VrRegSaveAreaShadowPtrOff =
7709 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
7710
7711 Value *VrRegSaveAreaShadowPtr =
7712 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7713 Align(8), /*isStore*/ true)
7714 .first;
7715
7716 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
7717 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
7718 IRB.getInt32(AArch64VrBegOffset)),
7719 VrRegSaveAreaShadowPtrOff);
7720 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
7721
7722 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
7723 VrCopySize);
7724
7725 // And finally for remaining arguments.
7726 Value *StackSaveAreaShadowPtr =
7727 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
7728 Align(16), /*isStore*/ true)
7729 .first;
7730
7731 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
7732 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
7733
7734 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
7735 Align(16), VAArgOverflowSize);
7736 }
7737 }
7738};
7739
7740/// PowerPC64-specific implementation of VarArgHelper.
7741struct VarArgPowerPC64Helper : public VarArgHelperBase {
7742 AllocaInst *VAArgTLSCopy = nullptr;
7743 Value *VAArgSize = nullptr;
7744
7745 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
7746 MemorySanitizerVisitor &MSV)
7747 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
7748
7749 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7750    // For PowerPC, we need to deal with the alignment of stack arguments -
7751    // they are mostly aligned to 8 bytes, but vectors and i128 arrays are
7752    // aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
7753    // For that reason, we compute the current offset from the stack pointer
7754    // (which is always properly aligned) and the offset of the first vararg,
7755    // then subtract them.
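    // Illustrative trace (a sketch, assuming ELFv2 and one fixed i64
    // argument): the fixed argument occupies nominal offsets [32, 40),
    // VAArgBase is advanced to 40 once it is processed, so the shadow of the
    // first variadic argument lands at offset 0 of the va_arg TLS array.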
7756 unsigned VAArgBase;
7757 Triple TargetTriple(F.getParent()->getTargetTriple());
7758 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
7759 // and 32 bytes for ABIv2. This is usually determined by target
7760 // endianness, but in theory could be overridden by function attribute.
7761 if (TargetTriple.isPPC64ELFv2ABI())
7762 VAArgBase = 32;
7763 else
7764 VAArgBase = 48;
7765 unsigned VAArgOffset = VAArgBase;
7766 const DataLayout &DL = F.getDataLayout();
7767 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7768 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7769 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7770 if (IsByVal) {
7771 assert(A->getType()->isPointerTy());
7772 Type *RealTy = CB.getParamByValType(ArgNo);
7773 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7774 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
7775 if (ArgAlign < 8)
7776 ArgAlign = Align(8);
7777 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7778 if (!IsFixed) {
7779 Value *Base =
7780 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7781 if (Base) {
7782 Value *AShadowPtr, *AOriginPtr;
7783 std::tie(AShadowPtr, AOriginPtr) =
7784 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7785 kShadowTLSAlignment, /*isStore*/ false);
7786
7787 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7788 kShadowTLSAlignment, ArgSize);
7789 }
7790 }
7791 VAArgOffset += alignTo(ArgSize, Align(8));
7792 } else {
7793 Value *Base;
7794 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7795 Align ArgAlign = Align(8);
7796 if (A->getType()->isArrayTy()) {
7797 // Arrays are aligned to element size, except for long double
7798 // arrays, which are aligned to 8 bytes.
7799 Type *ElementTy = A->getType()->getArrayElementType();
7800 if (!ElementTy->isPPC_FP128Ty())
7801 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7802 } else if (A->getType()->isVectorTy()) {
7803 // Vectors are naturally aligned.
7804 ArgAlign = Align(ArgSize);
7805 }
7806 if (ArgAlign < 8)
7807 ArgAlign = Align(8);
7808 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7809 if (DL.isBigEndian()) {
7810        // Adjust the shadow for arguments with size < 8 to match the
7811        // placement of bits on a big-endian system.
7812 if (ArgSize < 8)
7813 VAArgOffset += (8 - ArgSize);
7814 }
7815 if (!IsFixed) {
7816 Base =
7817 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7818 if (Base)
7819 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7820 }
7821 VAArgOffset += ArgSize;
7822 VAArgOffset = alignTo(VAArgOffset, Align(8));
7823 }
7824 if (IsFixed)
7825 VAArgBase = VAArgOffset;
7826 }
7827
7828 Constant *TotalVAArgSize =
7829 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
7830    // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a new
7831    // class member; i.e., it holds the total size of all varargs.
7832 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
7833 }
7834
7835 void finalizeInstrumentation() override {
7836 assert(!VAArgSize && !VAArgTLSCopy &&
7837 "finalizeInstrumentation called twice");
7838 IRBuilder<> IRB(MSV.FnPrologueEnd);
7839 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7840 Value *CopySize = VAArgSize;
7841
7842 if (!VAStartInstrumentationList.empty()) {
7843 // If there is a va_start in this function, make a backup copy of
7844 // va_arg_tls somewhere in the function entry block.
7845
7846 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7847 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7848 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7849 CopySize, kShadowTLSAlignment, false);
7850
7851 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7852 Intrinsic::umin, CopySize,
7853 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
7854 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7855 kShadowTLSAlignment, SrcSize);
7856 }
7857
7858 // Instrument va_start.
7859 // Copy va_list shadow from the backup copy of the TLS contents.
7860 for (CallInst *OrigInst : VAStartInstrumentationList) {
7861 NextNodeIRBuilder IRB(OrigInst);
7862 Value *VAListTag = OrigInst->getArgOperand(0);
7863 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
7864
7865 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
7866
7867 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7868 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7869 const DataLayout &DL = F.getDataLayout();
7870 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
7871 const Align Alignment = Align(IntptrSize);
7872 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7873 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7874 Alignment, /*isStore*/ true);
7875 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7876 CopySize);
7877 }
7878 }
7879};
7880
7881/// PowerPC32-specific implementation of VarArgHelper.
7882struct VarArgPowerPC32Helper : public VarArgHelperBase {
7883 AllocaInst *VAArgTLSCopy = nullptr;
7884 Value *VAArgSize = nullptr;
7885
7886 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
7887 MemorySanitizerVisitor &MSV)
7888 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
7889
7890 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7891 unsigned VAArgBase;
7892 // Parameter save area is 8 bytes from frame pointer in PPC32
7893 VAArgBase = 8;
7894 unsigned VAArgOffset = VAArgBase;
7895 const DataLayout &DL = F.getDataLayout();
7896 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
7897 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7898 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7899 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7900 if (IsByVal) {
7901 assert(A->getType()->isPointerTy());
7902 Type *RealTy = CB.getParamByValType(ArgNo);
7903 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7904 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
7905 if (ArgAlign < IntptrSize)
7906 ArgAlign = Align(IntptrSize);
7907 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7908 if (!IsFixed) {
7909 Value *Base =
7910 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7911 if (Base) {
7912 Value *AShadowPtr, *AOriginPtr;
7913 std::tie(AShadowPtr, AOriginPtr) =
7914 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7915 kShadowTLSAlignment, /*isStore*/ false);
7916
7917 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7918 kShadowTLSAlignment, ArgSize);
7919 }
7920 }
7921 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
7922 } else {
7923 Value *Base;
7924 Type *ArgTy = A->getType();
7925
7926        // On PPC32, floating-point variable arguments are stored in a
7927        // separate area: fp_save_area = reg_save_area + 4*8. We do not copy
7928        // shadow for them, as they will be found when checking call arguments.
7929 if (!ArgTy->isFloatingPointTy()) {
7930 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
7931 Align ArgAlign = Align(IntptrSize);
7932 if (ArgTy->isArrayTy()) {
7933 // Arrays are aligned to element size, except for long double
7934 // arrays, which are aligned to 8 bytes.
7935 Type *ElementTy = ArgTy->getArrayElementType();
7936 if (!ElementTy->isPPC_FP128Ty())
7937 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7938 } else if (ArgTy->isVectorTy()) {
7939 // Vectors are naturally aligned.
7940 ArgAlign = Align(ArgSize);
7941 }
7942 if (ArgAlign < IntptrSize)
7943 ArgAlign = Align(IntptrSize);
7944 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7945 if (DL.isBigEndian()) {
7946          // Adjust the shadow for arguments with size < IntptrSize to match
7947          // the placement of bits on a big-endian system.
7948 if (ArgSize < IntptrSize)
7949 VAArgOffset += (IntptrSize - ArgSize);
7950 }
7951 if (!IsFixed) {
7952 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
7953 ArgSize);
7954 if (Base)
7955              IRB.CreateAlignedStore(MSV.getShadow(A), Base,
7956                                     kShadowTLSAlignment);
7957 }
7958 VAArgOffset += ArgSize;
7959 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
7960 }
7961 }
7962 }
7963
7964 Constant *TotalVAArgSize =
7965 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
7966    // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a new
7967    // class member; i.e., it holds the total size of all varargs.
7968 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
7969 }
7970
7971 void finalizeInstrumentation() override {
7972 assert(!VAArgSize && !VAArgTLSCopy &&
7973 "finalizeInstrumentation called twice");
7974 IRBuilder<> IRB(MSV.FnPrologueEnd);
7975 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
7976 Value *CopySize = VAArgSize;
7977
7978 if (!VAStartInstrumentationList.empty()) {
7979 // If there is a va_start in this function, make a backup copy of
7980 // va_arg_tls somewhere in the function entry block.
7981
7982 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7983 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7984 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7985 CopySize, kShadowTLSAlignment, false);
7986
7987 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7988 Intrinsic::umin, CopySize,
7989 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7990 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7991 kShadowTLSAlignment, SrcSize);
7992 }
7993
7994 // Instrument va_start.
7995 // Copy va_list shadow from the backup copy of the TLS contents.
7996 for (CallInst *OrigInst : VAStartInstrumentationList) {
7997 NextNodeIRBuilder IRB(OrigInst);
7998 Value *VAListTag = OrigInst->getArgOperand(0);
7999 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8000 Value *RegSaveAreaSize = CopySize;
8001
8002 // In PPC32 va_list_tag is a struct
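      // A sketch of that struct, matching the offsets used below:
      //   char gpr;                 // offset 0
      //   char fpr;                 // offset 1
      //   unsigned short reserved;  // offset 2
      //   void *overflow_arg_area;  // offset 4
      //   void *reg_save_area;      // offset 8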
8003 RegSaveAreaPtrPtr =
8004 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8005
8006      // On PPC32, reg_save_area can only hold 32 bytes of data.
8007 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8008 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8009
8010 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8011 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8012
8013 const DataLayout &DL = F.getDataLayout();
8014 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8015 const Align Alignment = Align(IntptrSize);
8016
8017 { // Copy reg save area
8018 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8019 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8020 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8021 Alignment, /*isStore*/ true);
8022 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8023 Alignment, RegSaveAreaSize);
8024
8025 RegSaveAreaShadowPtr =
8026 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8027 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8028 ConstantInt::get(MS.IntptrTy, 32));
8029 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8030        // We fill the FP shadow with zeroes, as uninitialized FP args should
8031        // have been found during the call base check.
8032 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8033 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8034 }
8035
8036 { // Copy overflow area
8037 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8038 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8039
8040 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8041 OverflowAreaPtrPtr =
8042 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8043 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8044
8045 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8046
8047 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8048 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8049 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8050 Alignment, /*isStore*/ true);
8051
8052 Value *OverflowVAArgTLSCopyPtr =
8053 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8054 OverflowVAArgTLSCopyPtr =
8055 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8056
8057 OverflowVAArgTLSCopyPtr =
8058 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8059 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8060 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8061 }
8062 }
8063 }
8064};
8065
8066/// SystemZ-specific implementation of VarArgHelper.
8067struct VarArgSystemZHelper : public VarArgHelperBase {
8068 static const unsigned SystemZGpOffset = 16;
8069 static const unsigned SystemZGpEndOffset = 56;
8070 static const unsigned SystemZFpOffset = 128;
8071 static const unsigned SystemZFpEndOffset = 160;
8072 static const unsigned SystemZMaxVrArgs = 8;
8073 static const unsigned SystemZRegSaveAreaSize = 160;
8074 static const unsigned SystemZOverflowOffset = 160;
8075 static const unsigned SystemZVAListTagSize = 32;
8076 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8077 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
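  // For reference, the s390x va_list these constants describe (a sketch; the
  // two pointer fields are the ones dereferenced below):
  //   long __gpr;                 // offset 0
  //   long __fpr;                 // offset 8
  //   void *__overflow_arg_area;  // offset 16
  //   void *__reg_save_area;      // offset 24
  // for a total of 32 bytes (SystemZVAListTagSize).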
8078
8079 bool IsSoftFloatABI;
8080 AllocaInst *VAArgTLSCopy = nullptr;
8081 AllocaInst *VAArgTLSOriginCopy = nullptr;
8082 Value *VAArgOverflowSize = nullptr;
8083
8084 enum class ArgKind {
8085 GeneralPurpose,
8086 FloatingPoint,
8087 Vector,
8088 Memory,
8089 Indirect,
8090 };
8091
8092 enum class ShadowExtension { None, Zero, Sign };
8093
8094 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8095 MemorySanitizerVisitor &MSV)
8096 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8097 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8098
8099 ArgKind classifyArgument(Type *T) {
8100 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8101 // only a few possibilities of what it can be. In particular, enums, single
8102 // element structs and large types have already been taken care of.
8103
8104 // Some i128 and fp128 arguments are converted to pointers only in the
8105 // back end.
8106 if (T->isIntegerTy(128) || T->isFP128Ty())
8107 return ArgKind::Indirect;
8108 if (T->isFloatingPointTy())
8109 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8110 if (T->isIntegerTy() || T->isPointerTy())
8111 return ArgKind::GeneralPurpose;
8112 if (T->isVectorTy())
8113 return ArgKind::Vector;
8114 return ArgKind::Memory;
8115 }
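  // For example: i128 classifies as Indirect (and is then treated as a
  // GeneralPurpose pointer by visitCallBase), while double classifies as
  // FloatingPoint, or as GeneralPurpose under the soft-float ABI.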
8116
8117 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8118 // ABI says: "One of the simple integer types no more than 64 bits wide.
8119 // ... If such an argument is shorter than 64 bits, replace it by a full
8120 // 64-bit integer representing the same number, using sign or zero
8121 // extension". Shadow for an integer argument has the same type as the
8122 // argument itself, so it can be sign or zero extended as well.
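    // For example (a sketch): a 'signext i32' vararg occupies a full 64-bit
    // slot, so its 32-bit shadow is sign-extended to 64 bits below before
    // being stored, keeping the whole slot's shadow meaningful.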
8123 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8124 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8125 if (ZExt) {
8126 assert(!SExt);
8127 return ShadowExtension::Zero;
8128 }
8129 if (SExt) {
8130 assert(!ZExt);
8131 return ShadowExtension::Sign;
8132 }
8133 return ShadowExtension::None;
8134 }
8135
8136 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8137 unsigned GpOffset = SystemZGpOffset;
8138 unsigned FpOffset = SystemZFpOffset;
8139 unsigned VrIndex = 0;
8140 unsigned OverflowOffset = SystemZOverflowOffset;
8141 const DataLayout &DL = F.getDataLayout();
8142 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8143 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8144 // SystemZABIInfo does not produce ByVal parameters.
8145 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8146 Type *T = A->getType();
8147 ArgKind AK = classifyArgument(T);
8148 if (AK == ArgKind::Indirect) {
8149 T = MS.PtrTy;
8150 AK = ArgKind::GeneralPurpose;
8151 }
8152 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8153 AK = ArgKind::Memory;
8154 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8155 AK = ArgKind::Memory;
8156 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8157 AK = ArgKind::Memory;
8158 Value *ShadowBase = nullptr;
8159 Value *OriginBase = nullptr;
8160 ShadowExtension SE = ShadowExtension::None;
8161 switch (AK) {
8162 case ArgKind::GeneralPurpose: {
8163 // Always keep track of GpOffset, but store shadow only for varargs.
8164 uint64_t ArgSize = 8;
8165 if (GpOffset + ArgSize <= kParamTLSSize) {
8166 if (!IsFixed) {
8167 SE = getShadowExtension(CB, ArgNo);
8168 uint64_t GapSize = 0;
8169 if (SE == ShadowExtension::None) {
8170 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8171 assert(ArgAllocSize <= ArgSize);
8172 GapSize = ArgSize - ArgAllocSize;
8173 }
8174 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8175 if (MS.TrackOrigins)
8176 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8177 }
8178 GpOffset += ArgSize;
8179 } else {
8180 GpOffset = kParamTLSSize;
8181 }
8182 break;
8183 }
8184 case ArgKind::FloatingPoint: {
8185 // Always keep track of FpOffset, but store shadow only for varargs.
8186 uint64_t ArgSize = 8;
8187 if (FpOffset + ArgSize <= kParamTLSSize) {
8188 if (!IsFixed) {
8189 // PoP says: "A short floating-point datum requires only the
8190 // left-most 32 bit positions of a floating-point register".
8191 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8192 // don't extend shadow and don't mind the gap.
8193 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8194 if (MS.TrackOrigins)
8195 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8196 }
8197 FpOffset += ArgSize;
8198 } else {
8199 FpOffset = kParamTLSSize;
8200 }
8201 break;
8202 }
8203 case ArgKind::Vector: {
8204 // Keep track of VrIndex. No need to store shadow, since vector varargs
8205 // go through AK_Memory.
8206 assert(IsFixed);
8207 VrIndex++;
8208 break;
8209 }
8210 case ArgKind::Memory: {
8211 // Keep track of OverflowOffset and store shadow only for varargs.
8212 // Ignore fixed args, since we need to copy only the vararg portion of
8213 // the overflow area shadow.
8214 if (!IsFixed) {
8215 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8216 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8217 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8218 SE = getShadowExtension(CB, ArgNo);
8219 uint64_t GapSize =
8220 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8221 ShadowBase =
8222 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8223 if (MS.TrackOrigins)
8224 OriginBase =
8225 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8226 OverflowOffset += ArgSize;
8227 } else {
8228 OverflowOffset = kParamTLSSize;
8229 }
8230 }
8231 break;
8232 }
8233 case ArgKind::Indirect:
8234 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8235 }
8236 if (ShadowBase == nullptr)
8237 continue;
8238 Value *Shadow = MSV.getShadow(A);
8239 if (SE != ShadowExtension::None)
8240 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8241 /*Signed*/ SE == ShadowExtension::Sign);
8242 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8243 IRB.CreateStore(Shadow, ShadowBase);
8244 if (MS.TrackOrigins) {
8245 Value *Origin = MSV.getOrigin(A);
8246 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8247        MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8248                        kMinOriginAlignment);
8249 }
8250 }
8251 Constant *OverflowSize = ConstantInt::get(
8252 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8253 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8254 }
8255
8256 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8257 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8258 IRB.CreateAdd(
8259 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8260 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8261 MS.PtrTy);
8262 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8263 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8264 const Align Alignment = Align(8);
8265 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8266 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8267 /*isStore*/ true);
8268 // TODO(iii): copy only fragments filled by visitCallBase()
8269 // TODO(iii): support packed-stack && !use-soft-float
8270 // For use-soft-float functions, it is enough to copy just the GPRs.
8271 unsigned RegSaveAreaSize =
8272 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8273 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8274 RegSaveAreaSize);
8275 if (MS.TrackOrigins)
8276 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8277 Alignment, RegSaveAreaSize);
8278 }
8279
8280 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8281 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
8282 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8283 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8284 IRB.CreateAdd(
8285 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8286 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8287 MS.PtrTy);
8288 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8289 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8290 const Align Alignment = Align(8);
8291 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8292 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8293 Alignment, /*isStore*/ true);
8294 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8295 SystemZOverflowOffset);
8296 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8297 VAArgOverflowSize);
8298 if (MS.TrackOrigins) {
8299 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8300 SystemZOverflowOffset);
8301 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8302 VAArgOverflowSize);
8303 }
8304 }
8305
8306 void finalizeInstrumentation() override {
8307 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8308 "finalizeInstrumentation called twice");
8309 if (!VAStartInstrumentationList.empty()) {
8310 // If there is a va_start in this function, make a backup copy of
8311 // va_arg_tls somewhere in the function entry block.
8312 IRBuilder<> IRB(MSV.FnPrologueEnd);
8313 VAArgOverflowSize =
8314 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8315 Value *CopySize =
8316 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8317 VAArgOverflowSize);
8318 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8319 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8320 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8321 CopySize, kShadowTLSAlignment, false);
8322
8323 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8324 Intrinsic::umin, CopySize,
8325 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8326 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8327 kShadowTLSAlignment, SrcSize);
8328 if (MS.TrackOrigins) {
8329 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8330 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8331 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8332 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8333 }
8334 }
8335
8336 // Instrument va_start.
8337 // Copy va_list shadow from the backup copy of the TLS contents.
8338 for (CallInst *OrigInst : VAStartInstrumentationList) {
8339 NextNodeIRBuilder IRB(OrigInst);
8340 Value *VAListTag = OrigInst->getArgOperand(0);
8341 copyRegSaveArea(IRB, VAListTag);
8342 copyOverflowArea(IRB, VAListTag);
8343 }
8344 }
8345};
8346
8347/// i386-specific implementation of VarArgHelper.
8348struct VarArgI386Helper : public VarArgHelperBase {
8349 AllocaInst *VAArgTLSCopy = nullptr;
8350 Value *VAArgSize = nullptr;
8351
8352 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8353 MemorySanitizerVisitor &MSV)
8354 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
8355
8356 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8357 const DataLayout &DL = F.getDataLayout();
8358 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8359 unsigned VAArgOffset = 0;
8360 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8361 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8362 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8363 if (IsByVal) {
8364 assert(A->getType()->isPointerTy());
8365 Type *RealTy = CB.getParamByValType(ArgNo);
8366 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8367 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8368 if (ArgAlign < IntptrSize)
8369 ArgAlign = Align(IntptrSize);
8370 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8371 if (!IsFixed) {
8372 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8373 if (Base) {
8374 Value *AShadowPtr, *AOriginPtr;
8375 std::tie(AShadowPtr, AOriginPtr) =
8376 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8377 kShadowTLSAlignment, /*isStore*/ false);
8378
8379 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8380 kShadowTLSAlignment, ArgSize);
8381 }
8382 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8383 }
8384 } else {
8385 Value *Base;
8386 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8387 Align ArgAlign = Align(IntptrSize);
8388 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8389 if (DL.isBigEndian()) {
8390          // Adjust the shadow for arguments with size < IntptrSize to match
8391          // the placement of bits on a big-endian system.
8392 if (ArgSize < IntptrSize)
8393 VAArgOffset += (IntptrSize - ArgSize);
8394 }
8395 if (!IsFixed) {
8396 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8397 if (Base)
8398 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8399 VAArgOffset += ArgSize;
8400 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8401 }
8402 }
8403 }
8404
8405 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8406    // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a new
8407    // class member; i.e., it holds the total size of all varargs.
8408 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8409 }
8410
8411 void finalizeInstrumentation() override {
8412 assert(!VAArgSize && !VAArgTLSCopy &&
8413 "finalizeInstrumentation called twice");
8414 IRBuilder<> IRB(MSV.FnPrologueEnd);
8415 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8416 Value *CopySize = VAArgSize;
8417
8418 if (!VAStartInstrumentationList.empty()) {
8419 // If there is a va_start in this function, make a backup copy of
8420 // va_arg_tls somewhere in the function entry block.
8421 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8422 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8423 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8424 CopySize, kShadowTLSAlignment, false);
8425
8426 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8427 Intrinsic::umin, CopySize,
8428 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8429 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8430 kShadowTLSAlignment, SrcSize);
8431 }
8432
8433 // Instrument va_start.
8434 // Copy va_list shadow from the backup copy of the TLS contents.
8435 for (CallInst *OrigInst : VAStartInstrumentationList) {
8436 NextNodeIRBuilder IRB(OrigInst);
8437 Value *VAListTag = OrigInst->getArgOperand(0);
8438 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8439 Value *RegSaveAreaPtrPtr =
8440 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8441 PointerType::get(*MS.C, 0));
8442 Value *RegSaveAreaPtr =
8443 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8444 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8445 const DataLayout &DL = F.getDataLayout();
8446 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8447 const Align Alignment = Align(IntptrSize);
8448 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8449 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8450 Alignment, /*isStore*/ true);
8451 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8452 CopySize);
8453 }
8454 }
8455};
8456
8457/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8458/// LoongArch64.
8459struct VarArgGenericHelper : public VarArgHelperBase {
8460 AllocaInst *VAArgTLSCopy = nullptr;
8461 Value *VAArgSize = nullptr;
8462
8463 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8464 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8465 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8466
8467 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8468 unsigned VAArgOffset = 0;
8469 const DataLayout &DL = F.getDataLayout();
8470 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8471 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8472 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8473 if (IsFixed)
8474 continue;
8475 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8476 if (DL.isBigEndian()) {
8477        // Adjust the shadow for arguments with size < IntptrSize to match
8478        // the placement of bits on a big-endian system.
8479 if (ArgSize < IntptrSize)
8480 VAArgOffset += (IntptrSize - ArgSize);
8481 }
8482 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8483 VAArgOffset += ArgSize;
8484 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
8485 if (!Base)
8486 continue;
8487 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8488 }
8489
8490 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8491    // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a new
8492    // class member; i.e., it holds the total size of all varargs.
8493 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8494 }
8495
8496 void finalizeInstrumentation() override {
8497 assert(!VAArgSize && !VAArgTLSCopy &&
8498 "finalizeInstrumentation called twice");
8499 IRBuilder<> IRB(MSV.FnPrologueEnd);
8500 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8501 Value *CopySize = VAArgSize;
8502
8503 if (!VAStartInstrumentationList.empty()) {
8504 // If there is a va_start in this function, make a backup copy of
8505 // va_arg_tls somewhere in the function entry block.
8506 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8507 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8508 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8509 CopySize, kShadowTLSAlignment, false);
8510
8511 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8512 Intrinsic::umin, CopySize,
8513 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8514 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8515 kShadowTLSAlignment, SrcSize);
8516 }
8517
8518 // Instrument va_start.
8519 // Copy va_list shadow from the backup copy of the TLS contents.
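    // On these targets (assuming the ARM32/MIPS/RISC-V/LoongArch conventions
    // named above), va_list is effectively a single pointer to the saved
    // argument area, so we load that pointer straight from the va_list tag
    // and copy the whole TLS backup over its shadow.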
8520 for (CallInst *OrigInst : VAStartInstrumentationList) {
8521 NextNodeIRBuilder IRB(OrigInst);
8522 Value *VAListTag = OrigInst->getArgOperand(0);
8523 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8524 Value *RegSaveAreaPtrPtr =
8525 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8526 PointerType::get(*MS.C, 0));
8527 Value *RegSaveAreaPtr =
8528 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8529 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8530 const DataLayout &DL = F.getDataLayout();
8531 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8532 const Align Alignment = Align(IntptrSize);
8533 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8534 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8535 Alignment, /*isStore*/ true);
8536 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8537 CopySize);
8538 }
8539 }
8540};
8541
8542// ARM32, LoongArch64, MIPS, and RISCV share the same calling conventions
8543// regarding VAArgs.
8544using VarArgARM32Helper = VarArgGenericHelper;
8545using VarArgRISCVHelper = VarArgGenericHelper;
8546using VarArgMIPSHelper = VarArgGenericHelper;
8547using VarArgLoongArch64Helper = VarArgGenericHelper;
8548
8549/// A no-op implementation of VarArgHelper.
8550struct VarArgNoOpHelper : public VarArgHelper {
8551 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8552 MemorySanitizerVisitor &MSV) {}
8553
8554 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8555
8556 void visitVAStartInst(VAStartInst &I) override {}
8557
8558 void visitVACopyInst(VACopyInst &I) override {}
8559
8560 void finalizeInstrumentation() override {}
8561};
8562
8563} // end anonymous namespace
8564
8565static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
8566 MemorySanitizerVisitor &Visitor) {
8567  // VarArg handling is implemented only for the targets below; on other
8568  // platforms a no-op helper is used, so false positives are possible.
8569 Triple TargetTriple(Func.getParent()->getTargetTriple());
8570
8571 if (TargetTriple.getArch() == Triple::x86)
8572 return new VarArgI386Helper(Func, Msan, Visitor);
8573
8574 if (TargetTriple.getArch() == Triple::x86_64)
8575 return new VarArgAMD64Helper(Func, Msan, Visitor);
8576
8577 if (TargetTriple.isARM())
8578 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8579
8580 if (TargetTriple.isAArch64())
8581 return new VarArgAArch64Helper(Func, Msan, Visitor);
8582
8583 if (TargetTriple.isSystemZ())
8584 return new VarArgSystemZHelper(Func, Msan, Visitor);
8585
8586 // On PowerPC32 VAListTag is a struct
8587 // {char, char, i16 padding, char *, char *}
8588 if (TargetTriple.isPPC32())
8589 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
8590
8591 if (TargetTriple.isPPC64())
8592 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
8593
8594 if (TargetTriple.isRISCV32())
8595 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8596
8597 if (TargetTriple.isRISCV64())
8598 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8599
8600 if (TargetTriple.isMIPS32())
8601 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8602
8603 if (TargetTriple.isMIPS64())
8604 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8605
8606 if (TargetTriple.isLoongArch64())
8607 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
8608 /*VAListTagSize=*/8);
8609
8610 return new VarArgNoOpHelper(Func, Msan, Visitor);
8611}
8612
8613bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
8614 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
8615 return false;
8616
8617 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
8618 return false;
8619
8620 MemorySanitizerVisitor Visitor(F, *this, TLI);
8621
8622 // Clear out memory attributes.
8623  AttributeMask B;
8624  B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
8625 F.removeFnAttrs(B);
8626
8627 return Visitor.runOnFunction();
8628}
#define Success
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
constexpr LLT S1
This file implements a class to represent arbitrary precision integral constant values and operations...
static bool isStore(int Opcode)
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
static cl::opt< ITMode > IT(cl::desc("IT block support"), cl::Hidden, cl::init(DefaultIT), cl::values(clEnumValN(DefaultIT, "arm-default-it", "Generate any type of IT block"), clEnumValN(RestrictedIT, "arm-restrict-it", "Disallow complex IT blocks")))
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClWithComdat("asan-with-comdat", cl::desc("Place ASan constructors in comdat sections"), cl::Hidden, cl::init(true))
VarLocInsertPt getNextNode(const DbgRecord *DVR)
Atomic ordering constants.
This file contains the simple types necessary to represent the attributes associated with functions a...
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< StatepointGC > D("statepoint-example", "an example strategy for statepoint")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
Analysis containing CSE Info
Definition CSEInfo.cpp:27
This file contains the declarations for the subclasses of Constant, which represent the different fla...
const MemoryMapParams Linux_LoongArch64_MemoryMapParams
const MemoryMapParams Linux_X86_64_MemoryMapParams
static cl::opt< int > ClTrackOrigins("dfsan-track-origins", cl::desc("Track origins of labels"), cl::Hidden, cl::init(0))
static AtomicOrdering addReleaseOrdering(AtomicOrdering AO)
static AtomicOrdering addAcquireOrdering(AtomicOrdering AO)
const MemoryMapParams Linux_AArch64_MemoryMapParams
static bool isAMustTailRetVal(Value *RetVal)
This file provides an implementation of debug counters.
#define DEBUG_COUNTER(VARNAME, COUNTERNAME, DESC)
This file defines the DenseMap class.
This file builds on the ADT/GraphTraits.h file to build generic depth first graph iterator.
@ Default
static bool runOnFunction(Function &F, bool PostInlining)
This is the interface for a simple mod/ref and alias analysis over globals.
static size_t TypeSizeToSizeIndex(uint32_t TypeSize)
#define op(i)
Hexagon Common GEP
#define _
Hexagon Vector Combine
Module.h This file contains the declarations for the Module class.
static LVOptions Options
Definition LVOptions.cpp:25
#define F(x, y, z)
Definition MD5.cpp:55
#define I(x, y, z)
Definition MD5.cpp:58
Machine Check Debug Module
static const PlatformMemoryMapParams Linux_S390_MemoryMapParams
static const Align kMinOriginAlignment
static cl::opt< uint64_t > ClShadowBase("msan-shadow-base", cl::desc("Define custom MSan ShadowBase"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClPoisonUndef("msan-poison-undef", cl::desc("Poison fully undef temporary values. " "Partially undefined constant vectors " "are unaffected by this flag (see " "-msan-poison-undef-vectors)."), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_X86_MemoryMapParams
static cl::opt< uint64_t > ClOriginBase("msan-origin-base", cl::desc("Define custom MSan OriginBase"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClCheckConstantShadow("msan-check-constant-shadow", cl::desc("Insert checks for constant shadow values"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams
static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams
static const unsigned kOriginSize
static cl::opt< bool > ClWithComdat("msan-with-comdat", cl::desc("Place MSan constructors in comdat sections"), cl::Hidden, cl::init(false))
static cl::opt< int > ClTrackOrigins("msan-track-origins", cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden, cl::init(0))
Track origins of uninitialized values.
static cl::opt< int > ClInstrumentationWithCallThreshold("msan-instrumentation-with-call-threshold", cl::desc("If the function being instrumented requires more than " "this number of checks and origin stores, use callbacks instead of " "inline checks (-1 means never use callbacks)."), cl::Hidden, cl::init(3500))
static cl::opt< int > ClPoisonStackPattern("msan-poison-stack-pattern", cl::desc("poison uninitialized stack variables with the given pattern"), cl::Hidden, cl::init(0xff))
static const Align kShadowTLSAlignment
static cl::opt< bool > ClHandleICmpExact("msan-handle-icmp-exact", cl::desc("exact handling of relational integer ICmp"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams
static cl::opt< bool > ClDumpStrictInstructions("msan-dump-strict-instructions", cl::desc("print out instructions with default strict semantics i.e.," "check that all the inputs are fully initialized, and mark " "the output as fully initialized. These semantics are applied " "to instructions that could not be handled explicitly nor " "heuristically."), cl::Hidden, cl::init(false))
static Constant * getOrInsertGlobal(Module &M, StringRef Name, Type *Ty)
static cl::opt< bool > ClPreciseDisjointOr("msan-precise-disjoint-or", cl::desc("Precisely poison disjoint OR. If false (legacy behavior), " "disjointedness is ignored (i.e., 1|1 is initialized)."), cl::Hidden, cl::init(false))
static const MemoryMapParams Linux_S390X_MemoryMapParams
static cl::opt< bool > ClPoisonStack("msan-poison-stack", cl::desc("poison uninitialized stack variables"), cl::Hidden, cl::init(true))
static const MemoryMapParams Linux_I386_MemoryMapParams
const char kMsanInitName[]
static cl::opt< bool > ClPoisonUndefVectors("msan-poison-undef-vectors", cl::desc("Precisely poison partially undefined constant vectors. " "If false (legacy behavior), the entire vector is " "considered fully initialized, which may lead to false " "negatives. Fully undefined constant vectors are " "unaffected by this flag (see -msan-poison-undef)."), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPrintStackNames("msan-print-stack-names", cl::desc("Print name of local stack variable"), cl::Hidden, cl::init(true))
static cl::opt< uint64_t > ClAndMask("msan-and-mask", cl::desc("Define custom MSan AndMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleLifetimeIntrinsics("msan-handle-lifetime-intrinsics", cl::desc("when possible, poison scoped variables at the beginning of the scope " "(slower, but more precise)"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClKeepGoing("msan-keep-going", cl::desc("keep going after reporting a UMR"), cl::Hidden, cl::init(false))
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams
static GlobalVariable * createPrivateConstGlobalForString(Module &M, StringRef Str)
Create a non-const global initialized with the given string.
static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClEagerChecks("msan-eager-checks", cl::desc("check arguments and return values at function call boundaries"), cl::Hidden, cl::init(false))
static cl::opt< int > ClDisambiguateWarning("msan-disambiguate-warning-threshold", cl::desc("Define threshold for number of checks per " "debug location to force origin update."), cl::Hidden, cl::init(3))
static VarArgHelper * CreateVarArgHelper(Function &Func, MemorySanitizer &Msan, MemorySanitizerVisitor &Visitor)
static const MemoryMapParams Linux_MIPS64_MemoryMapParams
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams
static cl::opt< uint64_t > ClXorMask("msan-xor-mask", cl::desc("Define custom MSan XorMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleAsmConservative("msan-handle-asm-conservative", cl::desc("conservative handling of inline assembly"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams
static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams
static const unsigned kParamTLSSize
static cl::opt< bool > ClHandleICmp("msan-handle-icmp", cl::desc("propagate shadow through ICmpEQ and ICmpNE"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClEnableKmsan("msan-kernel", cl::desc("Enable KernelMemorySanitizer instrumentation"), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPoisonStackWithCall("msan-poison-stack-with-call", cl::desc("poison uninitialized stack variables with a call"), cl::Hidden, cl::init(false))
static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams
static cl::opt< bool > ClDumpHeuristicInstructions("msan-dump-heuristic-instructions", cl::desc("Prints 'unknown' instructions that were handled heuristically. " "Use -msan-dump-strict-instructions to print instructions that " "could not be handled explicitly nor heuristically."), cl::Hidden, cl::init(false))
static const unsigned kRetvalTLSSize
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams
const char kMsanModuleCtorName[]
static const MemoryMapParams FreeBSD_I386_MemoryMapParams
static cl::opt< bool > ClCheckAccessAddress("msan-check-access-address", cl::desc("report accesses through a pointer which has poisoned shadow"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClDisableChecks("msan-disable-checks", cl::desc("Apply no_sanitize to the whole file"), cl::Hidden, cl::init(false))
#define T
FunctionAnalysisManager FAM
if(PassOpts->AAPipeline)
const SmallVectorImpl< MachineOperand > & Cond
static const char * name
void visit(MachineFunction &MF, MachineBasicBlock &Start, std::function< void(MachineBasicBlock *)> op)
This file implements a set that has insertion order iteration characteristics.
This file defines the SmallPtrSet class.
This file defines the SmallVector class.
This file contains some functions that are useful when dealing with strings.
#define LLVM_DEBUG(...)
Definition Debug.h:114
static TableGen::Emitter::OptClass< SkeletonEmitter > X("gen-skeleton-class", "Generate example skeleton class")
static SymbolRef::Type getType(const Symbol *Sym)
Definition TapiFile.cpp:39
Value * RHS
Value * LHS
static APInt getSignedMinValue(unsigned numBits)
Gets minimum signed value of APInt for a specific bit width.
Definition APInt.h:219
void setAlignment(Align Align)
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
const T & front() const
front - Get the first element.
Definition ArrayRef.h:150
static LLVM_ABI ArrayType * get(Type *ElementType, uint64_t NumElements)
This static method is the primary way to construct an ArrayType.
This class stores enough information to efficiently remove some attributes from an existing AttrBuild...
AttributeMask & addAttribute(Attribute::AttrKind Val)
Add an attribute to the mask.
iterator end()
Definition BasicBlock.h:472
LLVM_ABI const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
LLVM_ABI const BasicBlock * getSinglePredecessor() const
Return the predecessor of this block if it has a single predecessor block.
InstListType::iterator iterator
Instruction iterators...
Definition BasicBlock.h:170
bool isInlineAsm() const
Check if this call is an inline asm statement.
Function * getCalledFunction() const
Returns the function called, or null if this is an indirect function invocation or the function signa...
bool hasRetAttr(Attribute::AttrKind Kind) const
Determine whether the return value has the given attribute.
LLVM_ABI bool paramHasAttr(unsigned ArgNo, Attribute::AttrKind Kind) const
Determine whether the argument or parameter has the given attribute.
void removeFnAttrs(const AttributeMask &AttrsToRemove)
Removes the attributes from the function.
void setCannotMerge()
MaybeAlign getParamAlign(unsigned ArgNo) const
Extract the alignment for a call or parameter (0=unknown).
Type * getParamByValType(unsigned ArgNo) const
Extract the byval type for a call or parameter.
Value * getCalledOperand() const
Type * getParamElementType(unsigned ArgNo) const
Extract the elementtype type for a parameter.
Value * getArgOperand(unsigned i) const
void setArgOperand(unsigned i, Value *v)
FunctionType * getFunctionType() const
iterator_range< User::op_iterator > args()
Iteration adapter for range-for loops.
void addParamAttr(unsigned ArgNo, Attribute::AttrKind Kind)
Adds the attribute to the indicated argument.
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:678
@ ICMP_SLT
signed less than
Definition InstrTypes.h:707
@ ICMP_SLE
signed less or equal
Definition InstrTypes.h:708
@ ICMP_SGT
signed greater than
Definition InstrTypes.h:705
@ ICMP_SGE
signed greater or equal
Definition InstrTypes.h:706
static LLVM_ABI Constant * get(ArrayType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getString(LLVMContext &Context, StringRef Initializer, bool AddNull=true)
This method constructs a CDS and initializes it with a text string.
static LLVM_ABI Constant * get(LLVMContext &Context, ArrayRef< uint8_t > Elts)
get() constructors - Return a constant with vector type with an element count and element type matching the ArrayRef passed in.
static ConstantInt * getSigned(IntegerType *Ty, int64_t V)
Return a ConstantInt with the specified value for the specified type.
Definition Constants.h:131
static LLVM_ABI ConstantInt * getBool(LLVMContext &Context, bool V)
static LLVM_ABI Constant * get(StructType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getSplat(ElementCount EC, Constant *Elt)
Return a ConstantVector with the specified constant in each element.
static LLVM_ABI Constant * get(ArrayRef< Constant * > V)
This is an important base class in LLVM.
Definition Constant.h:43
static LLVM_ABI Constant * getAllOnesValue(Type *Ty)
LLVM_ABI bool isAllOnesValue() const
Return true if this is the value that would be returned by getAllOnesValue.
static LLVM_ABI Constant * getNullValue(Type *Ty)
Constructor to create a '0' constant of arbitrary type.
LLVM_ABI Constant * getAggregateElement(unsigned Elt) const
For aggregates (struct/array/vector) return the constant that corresponds to the specified element if possible, or null if not.
LLVM_ABI bool isZeroValue() const
Return true if the value is negative zero or the null value.
Definition Constants.cpp:76
LLVM_ABI bool isNullValue() const
Return true if this is the value that would be returned by getNullValue.
Definition Constants.cpp:90
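A minimal sketch composing the Constant factory methods listed above to build typical "clean" (all-zero) and "fully poisoned" (all-ones) constants. Ctx and the variable names are illustrative only.

#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  IntegerType *Int64Ty = IntegerType::get(Ctx, 64);

  // A zero ("clean") constant of an arbitrary type.
  Constant *Clean = Constant::getNullValue(Int64Ty);

  // An all-ones ("fully poisoned") constant of the same type.
  Constant *Poisoned = Constant::getAllOnesValue(Int64Ty);

  // A splat vector: <4 x i64> with -1 in every lane.
  Constant *PoisonedVec =
      ConstantVector::getSplat(ElementCount::getFixed(4), Poisoned);

  // Queries shown in the listing above.
  bool IsClean = Clean->isNullValue();        // true
  bool IsAllOnes = Poisoned->isAllOnesValue(); // true
  (void)PoisonedVec; (void)IsClean; (void)IsAllOnes;
  return 0;
}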
static bool shouldExecute(unsigned CounterName)
bool empty() const
Definition DenseMap.h:109
unsigned getNumElements() const
static LLVM_ABI FixedVectorType * get(Type *ElementType, unsigned NumElts)
Definition Type.cpp:803
static FixedVectorType * getHalfElementsVectorType(FixedVectorType *VTy)
A handy container for a FunctionType+Callee-pointer pair, which can be passed around as a single entity.
unsigned getNumParams() const
Return the number of fixed parameters this function type requires.
LLVM_ABI void setComdat(Comdat *C)
Definition Globals.cpp:214
@ PrivateLinkage
Like Internal, but omit from symbol table.
Definition GlobalValue.h:61
@ ExternalLinkage
Externally visible function.
Definition GlobalValue.h:53
Analysis pass providing a never-invalidated alias analysis result.
ConstantInt * getInt1(bool V)
Get a constant value representing either true or false.
Definition IRBuilder.h:497
Value * CreateInsertElement(Type *VecTy, Value *NewElt, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2571
Value * CreateConstGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name="")
Definition IRBuilder.h:1936
AllocaInst * CreateAlloca(Type *Ty, unsigned AddrSpace, Value *ArraySize=nullptr, const Twine &Name="")
Definition IRBuilder.h:1830
IntegerType * getInt1Ty()
Fetch the type representing a single bit.
Definition IRBuilder.h:547
LLVM_ABI CallInst * CreateMaskedCompressStore(Value *Val, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr)
Create a call to Masked Compress Store intrinsic.
Value * CreateInsertValue(Value *Agg, Value *Val, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2625
Value * CreateExtractElement(Value *Vec, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2559
IntegerType * getIntNTy(unsigned N)
Fetch the type representing an N-bit integer.
Definition IRBuilder.h:575
LoadInst * CreateAlignedLoad(Type *Ty, Value *Ptr, MaybeAlign Align, const char *Name)
Definition IRBuilder.h:1864
Value * CreateZExtOrTrunc(Value *V, Type *DestTy, const Twine &Name="")
Create a ZExt or Trunc from the integer value V to DestTy.
Definition IRBuilder.h:2100
CallInst * CreateMemCpy(Value *Dst, MaybeAlign DstAlign, Value *Src, MaybeAlign SrcAlign, uint64_t Size, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memcpy between the specified pointers.
Definition IRBuilder.h:687
LLVM_ABI CallInst * CreateAndReduce(Value *Src)
Create a vector int AND reduction intrinsic of the source vector.
Value * CreatePointerCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2251
Value * CreateExtractValue(Value *Agg, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2618
LLVM_ABI CallInst * CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Load intrinsic.
LLVM_ABI Value * CreateSelect(Value *C, Value *True, Value *False, const Twine &Name="", Instruction *MDFrom=nullptr)
BasicBlock::iterator GetInsertPoint() const
Definition IRBuilder.h:202
Value * CreateSExt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2094
Value * CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2199
Value * CreateLShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1513
IntegerType * getInt32Ty()
Fetch the type representing a 32-bit integer.
Definition IRBuilder.h:562
ConstantInt * getInt8(uint8_t C)
Get a constant 8-bit value.
Definition IRBuilder.h:512
Value * CreatePtrAdd(Value *Ptr, Value *Offset, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:2036
IntegerType * getInt64Ty()
Fetch the type representing a 64-bit integer.
Definition IRBuilder.h:567
Value * CreateUDiv(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1454
Value * CreateICmpNE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2333
Value * CreateGEP(Type *Ty, Value *Ptr, ArrayRef< Value * > IdxList, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:1923
Value * CreateNeg(Value *V, const Twine &Name="", bool HasNSW=false)
Definition IRBuilder.h:1781
LLVM_ABI CallInst * CreateOrReduce(Value *Src)
Create a vector int OR reduction intrinsic of the source vector.
LLVM_ABI Value * CreateBinaryIntrinsic(Intrinsic::ID ID, Value *LHS, Value *RHS, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with 2 operands which is mangled on the first type.
LLVM_ABI CallInst * CreateIntrinsic(Intrinsic::ID ID, ArrayRef< Type * > Types, ArrayRef< Value * > Args, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with Args, mangled using Types.
ConstantInt * getInt32(uint32_t C)
Get a constant 32-bit value.
Definition IRBuilder.h:522
PHINode * CreatePHI(Type *Ty, unsigned NumReservedValues, const Twine &Name="")
Definition IRBuilder.h:2494
Value * CreateNot(Value *V, const Twine &Name="")
Definition IRBuilder.h:1805
Value * CreateICmpEQ(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2329
LLVM_ABI DebugLoc getCurrentDebugLocation() const
Get location information used by debugging information.
Definition IRBuilder.cpp:63
Value * CreateSub(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1420
Value * CreateBitCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2204
ConstantInt * getIntN(unsigned N, uint64_t C)
Get a constant N-bit value, zero extended or truncated from a 64-bit value.
Definition IRBuilder.h:533
LoadInst * CreateLoad(Type *Ty, Value *Ptr, const char *Name)
Provided to resolve 'CreateLoad(Ty, Ptr, "...")' correctly, instead of converting the string to 'bool' for the isVolatile parameter.
Definition IRBuilder.h:1847
Value * CreateShl(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1492
CallInst * CreateMemSet(Value *Ptr, Value *Val, uint64_t Size, MaybeAlign Align, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memset to the specified pointer and the specified value.
Definition IRBuilder.h:630
Value * CreateZExt(Value *V, Type *DestTy, const Twine &Name="", bool IsNonNeg=false)
Definition IRBuilder.h:2082
Value * CreateShuffleVector(Value *V1, Value *V2, Value *Mask, const Twine &Name="")
Definition IRBuilder.h:2593
LLVMContext & getContext() const
Definition IRBuilder.h:203
Value * CreateAnd(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1551
StoreInst * CreateStore(Value *Val, Value *Ptr, bool isVolatile=false)
Definition IRBuilder.h:1860
LLVM_ABI CallInst * CreateMaskedStore(Value *Val, Value *Ptr, Align Alignment, Value *Mask)
Create a call to Masked Store intrinsic.
Value * CreateAdd(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1403
Value * CreatePtrToInt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2194
Value * CreateIsNotNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg != 0.
Definition IRBuilder.h:2651
CallInst * CreateCall(FunctionType *FTy, Value *Callee, ArrayRef< Value * > Args={}, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:2508
Value * CreateTrunc(Value *V, Type *DestTy, const Twine &Name="", bool IsNUW=false, bool IsNSW=false)
Definition IRBuilder.h:2068
PointerType * getPtrTy(unsigned AddrSpace=0)
Fetch the type representing a pointer.
Definition IRBuilder.h:605
Value * CreateBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:1708
Value * CreateICmpSLT(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2361
LLVM_ABI Value * CreateTypeSize(Type *Ty, TypeSize Size)
Create an expression which evaluates to the number of units in Size at runtime.
Value * CreateICmpUGE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2341
Value * CreateIntCast(Value *V, Type *DestTy, bool isSigned, const Twine &Name="")
Definition IRBuilder.h:2277
Value * CreateIsNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg == 0.
Definition IRBuilder.h:2646
void SetInsertPoint(BasicBlock *TheBB)
This specifies that created instructions should be appended to the end of the specified block.
Definition IRBuilder.h:207
Type * getVoidTy()
Fetch the type representing void.
Definition IRBuilder.h:600
StoreInst * CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align, bool isVolatile=false)
Definition IRBuilder.h:1883
LLVM_ABI CallInst * CreateMaskedExpandLoad(Type *Ty, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Expand Load intrinsic.
Value * CreateInBoundsPtrAdd(Value *Ptr, Value *Offset, const Twine &Name="")
Definition IRBuilder.h:2041
Value * CreateAShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1532
Value * CreateXor(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1599
Value * CreateICmp(CmpInst::Predicate P, Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2439
Value * CreateOr(Value *LHS, Value *RHS, const Twine &Name="", bool IsDisjoint=false)
Definition IRBuilder.h:1573
IntegerType * getInt8Ty()
Fetch the type representing an 8-bit integer.
Definition IRBuilder.h:552
Value * CreateMul(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1437
LLVM_ABI CallInst * CreateMaskedScatter(Value *Val, Value *Ptrs, Align Alignment, Value *Mask=nullptr)
Create a call to Masked Scatter intrinsic.
LLVM_ABI CallInst * CreateMaskedGather(Type *Ty, Value *Ptrs, Align Alignment, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Gather intrinsic.
This provides a uniform API for creating instructions and inserting them into a basic block: either at the end of a BasicBlock, or at a specific iterator location in a block.
Definition IRBuilder.h:2780
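A minimal sketch of how the IRBuilder calls listed above typically chain when propagating shadow for a binary operation: OR the operand shadows, then test the result against zero. BB, ShadowA, and ShadowB are assumed to exist; this is illustrative, not this pass's actual shadow-propagation code.

#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

Value *propagateBinOpShadow(BasicBlock *BB, Value *ShadowA, Value *ShadowB) {
  IRBuilder<> IRB(BB); // append new instructions at the end of BB
  // Shadow of the result is the union (bitwise OR) of the operand shadows.
  Value *S = IRB.CreateOr(ShadowA, ShadowB, "_ms_or");
  // A boolean "is anything uninitialized?" flag, e.g. to guard a report.
  Value *Dirty =
      IRB.CreateICmpNE(S, Constant::getNullValue(S->getType()), "_ms_cmp");
  (void)Dirty;
  return S;
}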
std::vector< ConstraintInfo > ConstraintInfoVector
Definition InlineAsm.h:123
void visit(Iterator Start, Iterator End)
Definition InstVisitor.h:87
const DebugLoc & getDebugLoc() const
Return the debug location for this node as a DebugLoc.
LLVM_ABI InstListType::iterator eraseFromParent()
This method unlinks 'this' from the containing basic block and deletes it.
MDNode * getMetadata(unsigned KindID) const
Get the metadata of given kind attached to this Instruction.
LLVM_ABI bool comesBefore(const Instruction *Other) const
Given an instruction Other in the same basic block as this instruction, return true if this instruction comes before Other.
static LLVM_ABI IntegerType * get(LLVMContext &C, unsigned NumBits)
This static method is the primary way of constructing an IntegerType.
Definition Type.cpp:319
LLVM_ABI MDNode * createUnlikelyBranchWeights()
Return metadata containing two branch weights, with significant bias towards false destination.
Definition MDBuilder.cpp:48
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return a 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition Analysis.h:118
PreservedAnalyses & abandon()
Mark an analysis as abandoned.
Definition Analysis.h:171
bool remove(const value_type &X)
Remove an item from the set vector.
Definition SetVector.h:180
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:150
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
StringRef - Represent a constant reference to a string, i.e. a character array and a length, which need not be null terminated.
Definition StringRef.h:55
static LLVM_ABI StructType * get(LLVMContext &Context, ArrayRef< Type * > Elements, bool isPacked=false)
This static method is the primary way to create a literal StructType.
Definition Type.cpp:414
unsigned getNumElements() const
Random access to the elements.
Type * getElementType(unsigned N) const
Analysis pass providing the TargetLibraryInfo.
Provides information about what library functions are available for the current target.
AttributeList getAttrList(LLVMContext *C, ArrayRef< unsigned > ArgNos, bool Signed, bool Ret=false, AttributeList AL=AttributeList()) const
bool getLibFunc(StringRef funcName, LibFunc &F) const
Searches for a particular function name.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isMIPS64() const
Tests whether the target is MIPS 64-bit (little and big endian).
Definition Triple.h:1030
@ loongarch64
Definition Triple.h:65
bool isRISCV32() const
Tests whether the target is 32-bit RISC-V.
Definition Triple.h:1073
bool isPPC32() const
Tests whether the target is 32-bit PowerPC (little and big endian).
Definition Triple.h:1046
ArchType getArch() const
Get the parsed architecture type of this triple.
Definition Triple.h:411
bool isRISCV64() const
Tests whether the target is 64-bit RISC-V.
Definition Triple.h:1078
bool isLoongArch64() const
Tests whether the target is 64-bit LoongArch.
Definition Triple.h:1019
bool isMIPS32() const
Tests whether the target is MIPS 32-bit (little and big endian).
Definition Triple.h:1025
bool isARM() const
Tests whether the target is ARM (little and big endian).
Definition Triple.h:914
bool isPPC64() const
Tests whether the target is 64-bit PowerPC (little and big endian).
Definition Triple.h:1051
bool isAArch64() const
Tests whether the target is AArch64 (little and big endian).
Definition Triple.h:998
bool isSystemZ() const
Tests whether the target is SystemZ.
Definition Triple.h:1097
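A minimal sketch using the Triple predicates above to branch on the target architecture when picking per-platform parameters. The helper name and the example triple string are illustrative assumptions.

#include "llvm/TargetParser/Triple.h"
using namespace llvm;

bool is64BitSanitizableTarget(const Triple &T) {
  return T.isAArch64() || T.isMIPS64() || T.isRISCV64() ||
         T.isLoongArch64() || T.isPPC64() || T.isSystemZ() ||
         T.getArch() == Triple::x86_64;
}
// Example: is64BitSanitizableTarget(Triple("x86_64-unknown-linux-gnu"))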
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:45
LLVM_ABI unsigned getIntegerBitWidth() const
bool isVectorTy() const
True if this is an instance of VectorType.
Definition Type.h:273
bool isArrayTy() const
True if this is an instance of ArrayType.
Definition Type.h:264
bool isIntOrIntVectorTy() const
Return true if this is an integer type or a vector of integer types.
Definition Type.h:246
bool isPointerTy() const
True if this is an instance of PointerType.
Definition Type.h:267
Type * getArrayElementType() const
Definition Type.h:408
bool isPPC_FP128Ty() const
Return true if this is powerpc long double.
Definition Type.h:165
static LLVM_ABI Type * getVoidTy(LLVMContext &C)
Definition Type.cpp:281
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition Type.h:352
LLVM_ABI TypeSize getPrimitiveSizeInBits() const LLVM_READONLY
Return the basic size of this type if it is a primitive type.
Definition Type.cpp:198
bool isSized(SmallPtrSetImpl< Type * > *Visited=nullptr) const
Return true if it makes sense to take the size of this type.
Definition Type.h:311
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
Definition Type.cpp:231
bool isFloatingPointTy() const
Return true if this is one of the floating-point types.
Definition Type.h:184
bool isIntOrPtrTy() const
Return true if this is an integer type or a pointer type.
Definition Type.h:255
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition Type.h:240
bool isFPOrFPVectorTy() const
Return true if this is a FP type or a vector of FP.
Definition Type.h:225
bool isVoidTy() const
Return true if this is 'void'.
Definition Type.h:139
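A simplified sketch combining the Type queries above to map an IR type to an integer type (or fixed vector of integers) of the same bit width, a common shape for shadow types. This is an illustration only, not the pass's own shadow-type logic; it handles scalar int/FP types and fixed vectors of them, and returns null otherwise.

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

Type *toSameWidthIntTy(Type *Ty) {
  LLVMContext &Ctx = Ty->getContext();
  if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
    // Assumes integer or floating-point elements.
    unsigned EltBits = VTy->getScalarSizeInBits();
    return FixedVectorType::get(IntegerType::get(Ctx, EltBits),
                                VTy->getNumElements());
  }
  if (Ty->isIntegerTy() || Ty->isFloatingPointTy()) {
    unsigned Bits = Ty->getPrimitiveSizeInBits().getFixedValue();
    return IntegerType::get(Ctx, Bits);
  }
  return nullptr; // pointers and aggregates omitted in this sketch
}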
Value * getOperand(unsigned i) const
Definition User.h:232
unsigned getNumOperands() const
Definition User.h:254
size_type count(const KeyT &Val) const
Return 1 if the specified key is in the map, 0 otherwise.
Definition ValueMap.h:156
Type * getType() const
All values are typed, get the type of this value.
Definition Value.h:256
LLVM_ABI void setName(const Twine &Name)
Change the name of the value.
Definition Value.cpp:390
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:322
ElementCount getElementCount() const
Return an ElementCount instance to represent the (possibly scalable) number of elements in the vector...
Type * getElementType() const
int getNumOccurrences() const
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:201
constexpr bool isScalable() const
Returns whether the quantity is scaled by a runtime quantity (vscale).
Definition TypeSize.h:169
An efficient, type-erasing, non-owning reference to a callable.
const ParentTy * getParent() const
Definition ilist_node.h:34
self_iterator getIterator()
Definition ilist_node.h:123
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
CallInst * Call
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
initializer< Ty > init(const Ty &Val)
friend class Instruction
Iterator for Instructions in a BasicBlock.
Definition BasicBlock.h:73
This is an optimization pass for GlobalISel generic memory operations.
unsigned Log2_32_Ceil(uint32_t Value)
Return the ceil log base 2 of the specified value, 32 if the value is zero.
Definition MathExtras.h:355
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
Definition STLExtras.h:1657
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B, C, ...), such that A is the 0-based index of the item in the sequence, and B, C, ... are the values from the original input ranges.
Definition STLExtras.h:2452
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:644
@ Done
Definition Threading.h:60
bool isAligned(Align Lhs, uint64_t SizeInBytes)
Checks that SizeInBytes is a multiple of the alignment.
Definition Alignment.h:134
LLVM_ABI std::pair< Instruction *, Value * > SplitBlockAndInsertSimpleForLoop(Value *End, BasicBlock::iterator SplitBefore)
Insert a for (int i = 0; i < End; i++) loop structure (with the exception that End is assumed > 0, and thus not checked on entry) at SplitBefore.
InnerAnalysisManagerProxy< FunctionAnalysisManager, Module > FunctionAnalysisManagerModuleProxy
Provide the FunctionAnalysisManager to Module proxy.
constexpr bool isPowerOf2_64(uint64_t Value)
Return true if the argument is a power of two > 0 (64 bit edition.)
Definition MathExtras.h:293
unsigned Log2_64(uint64_t Value)
Return the floor log base 2 of the specified value, -1 if the value is zero.
Definition MathExtras.h:348
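A minimal sketch of the MathExtras helpers above in a typical size-to-shift computation; the function name and values are illustrative.

#include "llvm/Support/MathExtras.h"
#include <cassert>
#include <cstdint>
using namespace llvm;

unsigned sizeToShift(uint64_t SizeInBytes) {
  assert(isPowerOf2_64(SizeInBytes) && "expected a power-of-two size");
  return Log2_64(SizeInBytes); // e.g. 8 -> 3, 16 -> 4
}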
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:754
LLVM_ABI std::pair< Function *, FunctionCallee > getOrCreateSanitizerCtorAndInitFunctions(Module &M, StringRef CtorName, StringRef InitName, ArrayRef< Type * > InitArgTypes, ArrayRef< Value * > InitArgs, function_ref< void(Function *, FunctionCallee)> FunctionsCreatedCallback, StringRef VersionCheckName=StringRef(), bool Weak=false)
Creates sanitizer constructor function lazily.
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference sizeof(SmallVector<T, 0>).
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type arguments.
Definition Casting.h:548
LLVM_ABI bool isKnownNonZero(const Value *V, const SimplifyQuery &Q, unsigned Depth=0)
Return true if the given value is known to be non-zero when defined.
LLVM_ABI raw_fd_ostream & errs()
This returns a reference to a raw_ostream for standard error.
AtomicOrdering
Atomic ordering for LLVM's memory model.
IRBuilder(LLVMContext &, FolderTy, InserterTy, MDNode *, ArrayRef< OperandBundleDef >) -> IRBuilder< FolderTy, InserterTy >
@ Or
Bitwise or logical OR of integers.
@ And
Bitwise or logical AND of integers.
@ Add
Sum of integers.
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
RoundingMode
Rounding mode.
ArrayRef(const T &OneElt) -> ArrayRef< T >
constexpr unsigned BitWidth
LLVM_ABI void appendToGlobalCtors(Module &M, Function *F, int Priority, Constant *Data=nullptr)
Append F to the list of global ctors of module M with the given Priority.
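A minimal sketch of how a sanitizer module constructor is typically registered, using getOrCreateSanitizerCtorAndInitFunctions (listed earlier) together with appendToGlobalCtors. The ctor/init names are illustrative, not necessarily the ones this pass emits.

#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;

void insertModuleCtor(Module &M) {
  getOrCreateSanitizerCtorAndInitFunctions(
      M, /*CtorName=*/"sanitizer.module_ctor",
      /*InitName=*/"__sanitizer_init",
      /*InitArgTypes=*/{}, /*InitArgs=*/{},
      // Callback is invoked when the ctor/init pair is created; register the
      // new constructor so it runs before main().
      [&](Function *Ctor, FunctionCallee) {
        appendToGlobalCtors(M, Ctor, /*Priority=*/0);
      });
}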
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:560
iterator_range< df_iterator< T > > depth_first(const T &G)
LLVM_ABI Instruction * SplitBlockAndInsertIfThen(Value *Cond, BasicBlock::iterator SplitBefore, bool Unreachable, MDNode *BranchWeights=nullptr, DomTreeUpdater *DTU=nullptr, LoopInfo *LI=nullptr, BasicBlock *ThenBlock=nullptr)
Split the containing block at the specified instruction - everything before SplitBefore stays in the old basic block, and the rest of the instructions in the BB are moved to a new block.
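A minimal sketch of the common "branch to a cold check block" pattern built from SplitBlockAndInsertIfThen plus MDBuilder::createUnlikelyBranchWeights (both listed above). Cond guards the slow path; WarningFn and the helper name are illustrative assumptions.

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
using namespace llvm;

void emitGuardedCall(Instruction *InsertBefore, Value *Cond,
                     FunctionCallee WarningFn) {
  // Split the block at InsertBefore; the new 'then' block runs only when
  // Cond is true and its branch is annotated as unlikely.
  MDBuilder MDB(InsertBefore->getContext());
  Instruction *ThenTerm = SplitBlockAndInsertIfThen(
      Cond, InsertBefore->getIterator(), /*Unreachable=*/false,
      MDB.createUnlikelyBranchWeights());
  IRBuilder<> IRB(ThenTerm);
  IRB.CreateCall(WarningFn);
}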
LLVM_ABI void maybeMarkSanitizerLibraryCallNoBuiltin(CallInst *CI, const TargetLibraryInfo *TLI)
Given a CallInst, check if it calls a string function known to CodeGen, and mark it with NoBuiltin if so.
Definition Local.cpp:3838
LLVM_ABI bool removeUnreachableBlocks(Function &F, DomTreeUpdater *DTU=nullptr, MemorySSAUpdater *MSSAU=nullptr)
Remove all blocks that cannot be reached from the function's entry.
Definition Local.cpp:2883
LLVM_ABI bool checkIfAlreadyInstrumented(Module &M, StringRef Flag)
Check if module has flag attached, if not add the flag.
std::string itostr(int64_t X)
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
LLVM_ABI void printPipeline(raw_ostream &OS, function_ref< StringRef(StringRef)> MapClassName2PassName)
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
A CRTP mix-in to automatically provide informational APIs needed for passes.
Definition PassManager.h:70