MemorySanitizer.cpp
1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instructions (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned.
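///
/// For illustration, a minimal sketch of user code (hypothetical; not part of
/// this file) whose instrumented build triggers a report:
///
///   int f() {
///     int x;          // the alloca's shadow is poisoned
///     int y = x + 1;  // the poisoned shadow propagates through the add
///     if (y)          // branch on a poisoned value: __msan_warning* is called
///       return 1;
///     return 0;
///   }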
21///
22/// But there are differences too. The first and the major one:
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
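///
/// For reference, a sketch of how the userspace runtime declares these
/// (the authoritative definitions live in compiler-rt; sizes must stay in
/// sync with kParamTLSSize / kRetvalTLSSize below):
///
///   __thread uint64_t __msan_param_tls[800 / sizeof(uint64_t)];
///   __thread uint64_t __msan_retval_tls[800 / sizeof(uint64_t)];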
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
56///
57/// Every 4 aligned, consecutive bytes of application memory have one origin
58/// value associated with them. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needless overwriting origin of the 4-byte region on
66/// a short (i.e. 1 byte) clean store, and it is also good for performance.
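///
/// For example (hypothetical user code, with origin tracking enabled):
///
///   char buf[4];
///   buf[0] = a;  // 'a' is uninitialized: the origin slot for buf[0..3] now
///                // records the allocation point of 'a'
///   buf[1] = b;  // 'b' is also uninitialized: the same 4-byte slot is
///                // overwritten, so the last store wins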
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, atomic store
72/// of two disjoint locations can not be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
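///
/// Schematically, an atomic application store is instrumented as (a sketch):
///
///   store i32 0, ptr %shadow_addr                ; clean shadow goes first
///   store atomic i32 %v, ptr %app_addr release   ; then the application store
///
/// so a thread that observes %v through a happens-before arc also observes a
/// fully initialized (clean) shadow for it.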
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It may be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics can be only visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach generating calls to
101/// __msan_instrument_asm_store(ptr, size),
102/// which defers the memory unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
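///
/// Schematically, for an asm() with a pointer output %p to a value of type T
/// (a sketch; the exact IR depends on the constraint string):
///
///   call void @__msan_instrument_asm_store(ptr %p, i64 <sizeof(T)>)
///   call void asm sideeffect "...", "=*m,..."(ptr %p)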
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses
121/// are possible and return the pointers to shadow and origin memory.
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter.
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted into every instrumented function before the entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
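///
/// For example, the metadata for a 4-byte access is obtained roughly as
/// follows (a sketch; see the ABI note above for how the pair is actually
/// returned on each target):
///
///   %pair   = call { ptr, ptr } @__msan_metadata_ptr_for_load_4(ptr %addr)
///   %shadow = extractvalue { ptr, ptr } %pair, 0
///   %origin = extractvalue { ptr, ptr } %pair, 1
///   %s      = load i32, ptr %shadow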
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
225static const Align kMinOriginAlignment = Align(4);
226static const Align kShadowTLSAlignment = Align(8);
227
228// These constants must be kept in sync with the ones in msan.h.
229static const unsigned kParamTLSSize = 800;
230static const unsigned kRetvalTLSSize = 800;
231
232// Accesses sizes are powers of two: 1, 2, 4, 8.
233static const size_t kNumberOfAccessSizes = 4;
234
235/// Track origins of uninitialized values.
236///
237/// Adds a section to MemorySanitizer report that points to the allocation
238/// (stack or heap) the uninitialized bits came from originally.
239static cl::opt<int> ClTrackOrigins(
240 "msan-track-origins",
241 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
242 cl::init(0));
243
244static cl::opt<bool> ClKeepGoing("msan-keep-going",
245 cl::desc("keep going after reporting a UMR"),
246 cl::Hidden, cl::init(false));
247
248static cl::opt<bool>
249 ClPoisonStack("msan-poison-stack",
250 cl::desc("poison uninitialized stack variables"), cl::Hidden,
251 cl::init(true));
252
253static cl::opt<bool> ClPoisonStackWithCall(
254 "msan-poison-stack-with-call",
255 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
256 cl::init(false));
257
258static cl::opt<int> ClPoisonStackPattern(
259 "msan-poison-stack-pattern",
260 cl::desc("poison uninitialized stack variables with the given pattern"),
261 cl::Hidden, cl::init(0xff));
262
263static cl::opt<bool>
264 ClPrintStackNames("msan-print-stack-names",
265 cl::desc("Print name of local stack variable"),
266 cl::Hidden, cl::init(true));
267
268static cl::opt<bool>
269 ClPoisonUndef("msan-poison-undef",
270 cl::desc("Poison fully undef temporary values. "
271 "Partially undefined constant vectors "
272 "are unaffected by this flag (see "
273 "-msan-poison-undef-vectors)."),
274 cl::Hidden, cl::init(true));
275
276static cl::opt<bool> ClPoisonUndefVectors(
277 "msan-poison-undef-vectors",
278 cl::desc("Precisely poison partially undefined constant vectors. "
279 "If false (legacy behavior), the entire vector is "
280 "considered fully initialized, which may lead to false "
281 "negatives. Fully undefined constant vectors are "
282 "unaffected by this flag (see -msan-poison-undef)."),
283 cl::Hidden, cl::init(false));
284
285static cl::opt<bool> ClPreciseDisjointOr(
286 "msan-precise-disjoint-or",
287 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
288 "disjointedness is ignored (i.e., 1|1 is initialized)."),
289 cl::Hidden, cl::init(false));
290
291static cl::opt<bool>
292 ClHandleICmp("msan-handle-icmp",
293 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
294 cl::Hidden, cl::init(true));
295
296static cl::opt<bool>
297 ClHandleICmpExact("msan-handle-icmp-exact",
298 cl::desc("exact handling of relational integer ICmp"),
299 cl::Hidden, cl::init(true));
300
301static cl::opt<bool> ClHandleLifetimeIntrinsics(
302 "msan-handle-lifetime-intrinsics",
303 cl::desc(
304 "when possible, poison scoped variables at the beginning of the scope "
305 "(slower, but more precise)"),
306 cl::Hidden, cl::init(true));
307
308// When compiling the Linux kernel, we sometimes see false positives related to
309// MSan being unable to understand that inline assembly calls may initialize
310// local variables.
311// This flag makes the compiler conservatively unpoison every memory location
312// passed into an assembly call. Note that this may cause false positives.
313// Because it's impossible to figure out the array sizes, we can only unpoison
314// the first sizeof(type) bytes for each type* pointer.
315static cl::opt<bool> ClHandleAsmConservative(
316 "msan-handle-asm-conservative",
317 cl::desc("conservative handling of inline assembly"), cl::Hidden,
318 cl::init(true));
319
320// This flag controls whether we check the shadow of the address
321// operand of load or store. Such bugs are very rare, since load from
322// a garbage address typically results in SEGV, but still happen
323// (e.g. only lower bits of address are garbage, or the access happens
324// early at program startup where malloc-ed memory is more likely to
325// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
326static cl::opt<bool> ClCheckAccessAddress(
327 "msan-check-access-address",
328 cl::desc("report accesses through a pointer which has poisoned shadow"),
329 cl::Hidden, cl::init(true));
330
331static cl::opt<bool> ClEagerChecks(
332 "msan-eager-checks",
333 cl::desc("check arguments and return values at function call boundaries"),
334 cl::Hidden, cl::init(false));
335
336static cl::opt<bool> ClDumpStrictInstructions(
337 "msan-dump-strict-instructions",
338 cl::desc("print out instructions with default strict semantics, i.e., "
339 "check that all the inputs are fully initialized, and mark "
340 "the output as fully initialized. These semantics are applied "
341 "to instructions that could not be handled explicitly nor "
342 "heuristically."),
343 cl::Hidden, cl::init(false));
344
345// Currently, all the heuristically handled instructions are specifically
346// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
347// to parallel 'msan-dump-strict-instructions', and to keep the door open to
348// handling non-intrinsic instructions heuristically.
349static cl::opt<bool> ClDumpHeuristicInstructions(
350 "msan-dump-heuristic-instructions",
351 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
352 "Use -msan-dump-strict-instructions to print instructions that "
353 "could not be handled explicitly nor heuristically."),
354 cl::Hidden, cl::init(false));
355
356static cl::opt<int> ClInstrumentationWithCallThreshold(
357 "msan-instrumentation-with-call-threshold",
358 cl::desc(
359 "If the function being instrumented requires more than "
360 "this number of checks and origin stores, use callbacks instead of "
361 "inline checks (-1 means never use callbacks)."),
362 cl::Hidden, cl::init(3500));
363
364static cl::opt<bool>
365 ClEnableKmsan("msan-kernel",
366 cl::desc("Enable KernelMemorySanitizer instrumentation"),
367 cl::Hidden, cl::init(false));
368
369static cl::opt<bool>
370 ClDisableChecks("msan-disable-checks",
371 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
372 cl::init(false));
373
374static cl::opt<bool>
375 ClCheckConstantShadow("msan-check-constant-shadow",
376 cl::desc("Insert checks for constant shadow values"),
377 cl::Hidden, cl::init(true));
378
379// This is off by default because of a bug in gold:
380// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
381static cl::opt<bool>
382 ClWithComdat("msan-with-comdat",
383 cl::desc("Place MSan constructors in comdat sections"),
384 cl::Hidden, cl::init(false));
385
386// These options allow specifying custom memory map parameters.
387// See MemoryMapParams for details.
388static cl::opt<uint64_t> ClAndMask("msan-and-mask",
389 cl::desc("Define custom MSan AndMask"),
390 cl::Hidden, cl::init(0));
391
392static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
393 cl::desc("Define custom MSan XorMask"),
394 cl::Hidden, cl::init(0));
395
396static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
397 cl::desc("Define custom MSan ShadowBase"),
398 cl::Hidden, cl::init(0));
399
400static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
401 cl::desc("Define custom MSan OriginBase"),
402 cl::Hidden, cl::init(0));
403
404static cl::opt<int>
405 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
406 cl::desc("Define threshold for number of checks per "
407 "debug location to force origin update."),
408 cl::Hidden, cl::init(3));
409
410const char kMsanModuleCtorName[] = "msan.module_ctor";
411const char kMsanInitName[] = "__msan_init";
412
413namespace {
414
415// Memory map parameters used in application-to-shadow address calculation.
416// Offset = (Addr & ~AndMask) ^ XorMask
417// Shadow = ShadowBase + Offset
418// Origin = OriginBase + Offset
419struct MemoryMapParams {
420 uint64_t AndMask;
421 uint64_t XorMask;
422 uint64_t ShadowBase;
423 uint64_t OriginBase;
424};
425
426struct PlatformMemoryMapParams {
427 const MemoryMapParams *bits32;
428 const MemoryMapParams *bits64;
429};
430
431} // end anonymous namespace
432
433// i386 Linux
434static const MemoryMapParams Linux_I386_MemoryMapParams = {
435 0x000080000000, // AndMask
436 0, // XorMask (not used)
437 0, // ShadowBase (not used)
438 0x000040000000, // OriginBase
439};
440
441// x86_64 Linux
442static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
443 0, // AndMask (not used)
444 0x500000000000, // XorMask
445 0, // ShadowBase (not used)
446 0x100000000000, // OriginBase
447};
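
// Worked example (illustrative arithmetic only) using the x86_64 Linux
// parameters above, for an application address Addr = 0x7fff80001234:
//   Offset = (Addr & ~AndMask) ^ XorMask = 0x7fff80001234 ^ 0x500000000000
//          = 0x2fff80001234
//   Shadow = ShadowBase + Offset = 0x2fff80001234
//   Origin = OriginBase + Offset = 0x3fff80001234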
448
449// mips32 Linux
450// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
451// after picking good constants
452
453// mips64 Linux
454static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
455 0, // AndMask (not used)
456 0x008000000000, // XorMask
457 0, // ShadowBase (not used)
458 0x002000000000, // OriginBase
459};
460
461// ppc32 Linux
462// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
463// after picking good constants
464
465// ppc64 Linux
466static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
467 0xE00000000000, // AndMask
468 0x100000000000, // XorMask
469 0x080000000000, // ShadowBase
470 0x1C0000000000, // OriginBase
471};
472
473// s390x Linux
474static const MemoryMapParams Linux_S390X_MemoryMapParams = {
475 0xC00000000000, // AndMask
476 0, // XorMask (not used)
477 0x080000000000, // ShadowBase
478 0x1C0000000000, // OriginBase
479};
480
481// arm32 Linux
482// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
483// after picking good constants
484
485// aarch64 Linux
486static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
487 0, // AndMask (not used)
488 0x0B00000000000, // XorMask
489 0, // ShadowBase (not used)
490 0x0200000000000, // OriginBase
491};
492
493// loongarch64 Linux
494static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
495 0, // AndMask (not used)
496 0x500000000000, // XorMask
497 0, // ShadowBase (not used)
498 0x100000000000, // OriginBase
499};
500
501// riscv32 Linux
502// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
503// after picking good constants
504
505// aarch64 FreeBSD
506static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
507 0x1800000000000, // AndMask
508 0x0400000000000, // XorMask
509 0x0200000000000, // ShadowBase
510 0x0700000000000, // OriginBase
511};
512
513// i386 FreeBSD
514static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
515 0x000180000000, // AndMask
516 0x000040000000, // XorMask
517 0x000020000000, // ShadowBase
518 0x000700000000, // OriginBase
519};
520
521// x86_64 FreeBSD
522static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
523 0xc00000000000, // AndMask
524 0x200000000000, // XorMask
525 0x100000000000, // ShadowBase
526 0x380000000000, // OriginBase
527};
528
529// x86_64 NetBSD
530static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
531 0, // AndMask
532 0x500000000000, // XorMask
533 0, // ShadowBase
534 0x100000000000, // OriginBase
535};
536
537static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
538 &Linux_I386_MemoryMapParams,
539 &Linux_X86_64_MemoryMapParams,
540};
541
542static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
543 nullptr,
544 &Linux_MIPS64_MemoryMapParams,
545};
546
547static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
548 nullptr,
549 &Linux_PowerPC64_MemoryMapParams,
550};
551
552static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
553 nullptr,
554 &Linux_S390X_MemoryMapParams,
555};
556
557static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
558 nullptr,
559 &Linux_AArch64_MemoryMapParams,
560};
561
562static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
563 nullptr,
564 &Linux_LoongArch64_MemoryMapParams,
565};
566
567static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
568 nullptr,
569 &FreeBSD_AArch64_MemoryMapParams,
570};
571
572static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
573 &FreeBSD_I386_MemoryMapParams,
574 &FreeBSD_X86_64_MemoryMapParams,
575};
576
577static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
578 nullptr,
579 &NetBSD_X86_64_MemoryMapParams,
580};
581
582namespace {
583
584/// Instrument functions of a module to detect uninitialized reads.
585///
586/// Instantiating MemorySanitizer inserts the msan runtime library API function
587/// declarations into the module if they don't exist already. Instantiating
588/// ensures the __msan_init function is in the list of global constructors for
589/// the module.
590class MemorySanitizer {
591public:
592 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
593 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
594 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
595 initializeModule(M);
596 }
597
598 // MSan cannot be moved or copied because of MapParams.
599 MemorySanitizer(MemorySanitizer &&) = delete;
600 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
601 MemorySanitizer(const MemorySanitizer &) = delete;
602 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
603
604 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
605
606private:
607 friend struct MemorySanitizerVisitor;
608 friend struct VarArgHelperBase;
609 friend struct VarArgAMD64Helper;
610 friend struct VarArgAArch64Helper;
611 friend struct VarArgPowerPC64Helper;
612 friend struct VarArgPowerPC32Helper;
613 friend struct VarArgSystemZHelper;
614 friend struct VarArgI386Helper;
615 friend struct VarArgGenericHelper;
616
617 void initializeModule(Module &M);
618 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
619 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
620 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
621
622 template <typename... ArgsTy>
623 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
624 ArgsTy... Args);
625
626 /// True if we're compiling the Linux kernel.
627 bool CompileKernel;
628 /// Track origins (allocation points) of uninitialized values.
629 int TrackOrigins;
630 bool Recover;
631 bool EagerChecks;
632
633 Triple TargetTriple;
634 LLVMContext *C;
635 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
636 Type *OriginTy;
637 PointerType *PtrTy; ///< Pointer type in the default address space.
638
639 // XxxTLS variables represent the per-thread state in MSan and per-task state
640 // in KMSAN.
641 // For the userspace these point to thread-local globals. In the kernel land
642 // they point to the members of a per-task struct obtained via a call to
643 // __msan_get_context_state().
644
645 /// Thread-local shadow storage for function parameters.
646 Value *ParamTLS;
647
648 /// Thread-local origin storage for function parameters.
649 Value *ParamOriginTLS;
650
651 /// Thread-local shadow storage for function return value.
652 Value *RetvalTLS;
653
654 /// Thread-local origin storage for function return value.
655 Value *RetvalOriginTLS;
656
657 /// Thread-local shadow storage for in-register va_arg function.
658 Value *VAArgTLS;
659
660 /// Thread-local origin storage for in-register va_arg function.
661 Value *VAArgOriginTLS;
662
663 /// Thread-local storage for the size of the va_arg overflow area.
664 Value *VAArgOverflowSizeTLS;
665
666 /// Are the instrumentation callbacks set up?
667 bool CallbacksInitialized = false;
668
669 /// The run-time callback to print a warning.
670 FunctionCallee WarningFn;
671
672 // These arrays are indexed by log2(AccessSize).
673 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
674 FunctionCallee MaybeWarningVarSizeFn;
675 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
676
677 /// Run-time helper that generates a new origin value for a stack
678 /// allocation.
679 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
680 // No description version
681 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
682
683 /// Run-time helper that poisons stack on function entry.
684 FunctionCallee MsanPoisonStackFn;
685
686 /// Run-time helper that records a store (or any event) of an
687 /// uninitialized value and returns an updated origin id encoding this info.
688 FunctionCallee MsanChainOriginFn;
689
690 /// Run-time helper that paints an origin over a region.
691 FunctionCallee MsanSetOriginFn;
692
693 /// MSan runtime replacements for memmove, memcpy and memset.
694 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
695
696 /// KMSAN callback for task-local function argument shadow.
697 StructType *MsanContextStateTy;
698 FunctionCallee MsanGetContextStateFn;
699
700 /// Functions for poisoning/unpoisoning local variables
701 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
702
703 /// Pair of shadow/origin pointers.
704 Type *MsanMetadata;
705
706 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
707 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
708 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
709 FunctionCallee MsanMetadataPtrForStore_1_8[4];
710 FunctionCallee MsanInstrumentAsmStoreFn;
711
712 /// Storage for return values of the MsanMetadataPtrXxx functions.
713 Value *MsanMetadataAlloca;
714
715 /// Helper to choose between different MsanMetadataPtrXxx().
716 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
717
718 /// Memory map parameters used in application-to-shadow calculation.
719 const MemoryMapParams *MapParams;
720
721 /// Custom memory map parameters used when -msan-shadow-base or
722 /// -msan-origin-base is provided.
723 MemoryMapParams CustomMapParams;
724
725 MDNode *ColdCallWeights;
726
727 /// Branch weights for origin store.
728 MDNode *OriginStoreWeights;
729};
730
731void insertModuleCtor(Module &M) {
732 getOrCreateSanitizerCtorAndInitFunctions(
733 M, kMsanModuleCtorName, kMsanInitName,
734 /*InitArgTypes=*/{},
735 /*InitArgs=*/{},
736 // This callback is invoked when the functions are created the first
737 // time. Hook them into the global ctors list in that case:
738 [&](Function *Ctor, FunctionCallee) {
739 if (!ClWithComdat) {
740 appendToGlobalCtors(M, Ctor, 0);
741 return;
742 }
743 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
744 Ctor->setComdat(MsanCtorComdat);
745 appendToGlobalCtors(M, Ctor, 0, Ctor);
746 });
747}
748
749template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
750 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
751}
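
// For example, getOptOrDefault(ClTrackOrigins, TO) yields the value given on
// the command line when -msan-track-origins was passed explicitly, and the
// constructor argument TO otherwise (see MemorySanitizerOptions below).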
752
753} // end anonymous namespace
754
755MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
756 bool EagerChecks)
757 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
758 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
759 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
760 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
761
762PreservedAnalyses MemorySanitizerPass::run(Module &M,
763 ModuleAnalysisManager &AM) {
764 // Return early if nosanitize_memory module flag is present for the module.
765 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
766 return PreservedAnalyses::all();
767 bool Modified = false;
768 if (!Options.Kernel) {
769 insertModuleCtor(M);
770 Modified = true;
771 }
772
773 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
774 for (Function &F : M) {
775 if (F.empty())
776 continue;
777 MemorySanitizer Msan(*F.getParent(), Options);
778 Modified |=
779 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
780 }
781
782 if (!Modified)
783 return PreservedAnalyses::all();
784
785 PreservedAnalyses PA = PreservedAnalyses::none();
786 // GlobalsAA is considered stateless and does not get invalidated unless
787 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
788 // make changes that require GlobalsAA to be invalidated.
789 PA.abandon<GlobalsAA>();
790 return PA;
791}
792
793void MemorySanitizerPass::printPipeline(
794 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
795 static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
796 OS, MapClassName2PassName);
797 OS << '<';
798 if (Options.Recover)
799 OS << "recover;";
800 if (Options.Kernel)
801 OS << "kernel;";
802 if (Options.EagerChecks)
803 OS << "eager-checks;";
804 OS << "track-origins=" << Options.TrackOrigins;
805 OS << '>';
806}
807
808/// Create a non-const global initialized with the given string.
809///
810/// Creates a writable global for Str so that we can pass it to the
811/// run-time lib. Runtime uses first 4 bytes of the string to store the
812/// frame ID, so the string needs to be mutable.
814 StringRef Str) {
815 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
816 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
817 GlobalValue::PrivateLinkage, StrConst, "");
818}
819
820template <typename... ArgsTy>
821FunctionCallee
822MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
823 ArgsTy... Args) {
824 if (TargetTriple.getArch() == Triple::systemz) {
825 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
826 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
827 std::forward<ArgsTy>(Args)...);
828 }
829
830 return M.getOrInsertFunction(Name, MsanMetadata,
831 std::forward<ArgsTy>(Args)...);
832}
833
834/// Create KMSAN API callbacks.
835void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
836 IRBuilder<> IRB(*C);
837
838 // These will be initialized in insertKmsanPrologue().
839 RetvalTLS = nullptr;
840 RetvalOriginTLS = nullptr;
841 ParamTLS = nullptr;
842 ParamOriginTLS = nullptr;
843 VAArgTLS = nullptr;
844 VAArgOriginTLS = nullptr;
845 VAArgOverflowSizeTLS = nullptr;
846
847 WarningFn = M.getOrInsertFunction("__msan_warning",
848 TLI.getAttrList(C, {0}, /*Signed=*/false),
849 IRB.getVoidTy(), IRB.getInt32Ty());
850
851 // Requests the per-task context state (kmsan_context_state*) from the
852 // runtime library.
853 MsanContextStateTy = StructType::get(
854 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
855 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
856 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
857 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
858 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
859 OriginTy);
860 MsanGetContextStateFn =
861 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
862
863 MsanMetadata = StructType::get(PtrTy, PtrTy);
864
865 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
866 std::string name_load =
867 "__msan_metadata_ptr_for_load_" + std::to_string(size);
868 std::string name_store =
869 "__msan_metadata_ptr_for_store_" + std::to_string(size);
870 MsanMetadataPtrForLoad_1_8[ind] =
871 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
872 MsanMetadataPtrForStore_1_8[ind] =
873 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
874 }
875
876 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
877 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
878 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
879 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
880
881 // Functions for poisoning and unpoisoning memory.
882 MsanPoisonAllocaFn = M.getOrInsertFunction(
883 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
884 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
885 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
886}
887
888static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
889 return M.getOrInsertGlobal(Name, Ty, [&] {
890 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
891 nullptr, Name, nullptr,
892 GlobalVariable::InitialExecTLSModel);
893 });
894}
895
896/// Insert declarations for userspace-specific functions and globals.
897void MemorySanitizer::createUserspaceApi(Module &M,
898 const TargetLibraryInfo &TLI) {
899 IRBuilder<> IRB(*C);
900
901 // Create the callback.
902 // FIXME: this function should have "Cold" calling conv,
903 // which is not yet implemented.
904 if (TrackOrigins) {
905 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
906 : "__msan_warning_with_origin_noreturn";
907 WarningFn = M.getOrInsertFunction(WarningFnName,
908 TLI.getAttrList(C, {0}, /*Signed=*/false),
909 IRB.getVoidTy(), IRB.getInt32Ty());
910 } else {
911 StringRef WarningFnName =
912 Recover ? "__msan_warning" : "__msan_warning_noreturn";
913 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
914 }
915
916 // Create the global TLS variables.
917 RetvalTLS =
918 getOrInsertGlobal(M, "__msan_retval_tls",
919 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
920
921 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
922
923 ParamTLS =
924 getOrInsertGlobal(M, "__msan_param_tls",
925 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
926
927 ParamOriginTLS =
928 getOrInsertGlobal(M, "__msan_param_origin_tls",
929 ArrayType::get(OriginTy, kParamTLSSize / 4));
930
931 VAArgTLS =
932 getOrInsertGlobal(M, "__msan_va_arg_tls",
933 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
934
935 VAArgOriginTLS =
936 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
937 ArrayType::get(OriginTy, kParamTLSSize / 4));
938
939 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
940 IRB.getIntPtrTy(M.getDataLayout()));
941
942 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
943 AccessSizeIndex++) {
944 unsigned AccessSize = 1 << AccessSizeIndex;
945 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
946 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
947 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
948 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
949 MaybeWarningVarSizeFn = M.getOrInsertFunction(
950 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
951 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
952 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
953 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
954 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
955 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
956 IRB.getInt32Ty());
957 }
958
959 MsanSetAllocaOriginWithDescriptionFn =
960 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
961 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
962 MsanSetAllocaOriginNoDescriptionFn =
963 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
964 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
965 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
966 IRB.getVoidTy(), PtrTy, IntptrTy);
967}
968
969/// Insert extern declaration of runtime-provided functions and globals.
970void MemorySanitizer::initializeCallbacks(Module &M,
971 const TargetLibraryInfo &TLI) {
972 // Only do this once.
973 if (CallbacksInitialized)
974 return;
975
976 IRBuilder<> IRB(*C);
977 // Initialize callbacks that are common for kernel and userspace
978 // instrumentation.
979 MsanChainOriginFn = M.getOrInsertFunction(
980 "__msan_chain_origin",
981 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
982 IRB.getInt32Ty());
983 MsanSetOriginFn = M.getOrInsertFunction(
984 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
985 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
986 MemmoveFn =
987 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
988 MemcpyFn =
989 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
990 MemsetFn = M.getOrInsertFunction("__msan_memset",
991 TLI.getAttrList(C, {1}, /*Signed=*/true),
992 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
993
994 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
995 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
996
997 if (CompileKernel) {
998 createKernelApi(M, TLI);
999 } else {
1000 createUserspaceApi(M, TLI);
1001 }
1002 CallbacksInitialized = true;
1003}
1004
1005FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1006 int size) {
1007 FunctionCallee *Fns =
1008 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1009 switch (size) {
1010 case 1:
1011 return Fns[0];
1012 case 2:
1013 return Fns[1];
1014 case 4:
1015 return Fns[2];
1016 case 8:
1017 return Fns[3];
1018 default:
1019 return nullptr;
1020 }
1021}
1022
1023/// Module-level initialization.
1024///
1025/// inserts a call to __msan_init to the module's constructor list.
1026void MemorySanitizer::initializeModule(Module &M) {
1027 auto &DL = M.getDataLayout();
1028
1029 TargetTriple = M.getTargetTriple();
1030
1031 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1032 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1033 // Check the overrides first
1034 if (ShadowPassed || OriginPassed) {
1035 CustomMapParams.AndMask = ClAndMask;
1036 CustomMapParams.XorMask = ClXorMask;
1037 CustomMapParams.ShadowBase = ClShadowBase;
1038 CustomMapParams.OriginBase = ClOriginBase;
1039 MapParams = &CustomMapParams;
1040 } else {
1041 switch (TargetTriple.getOS()) {
1042 case Triple::FreeBSD:
1043 switch (TargetTriple.getArch()) {
1044 case Triple::aarch64:
1045 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1046 break;
1047 case Triple::x86_64:
1048 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1049 break;
1050 case Triple::x86:
1051 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1052 break;
1053 default:
1054 report_fatal_error("unsupported architecture");
1055 }
1056 break;
1057 case Triple::NetBSD:
1058 switch (TargetTriple.getArch()) {
1059 case Triple::x86_64:
1060 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1061 break;
1062 default:
1063 report_fatal_error("unsupported architecture");
1064 }
1065 break;
1066 case Triple::Linux:
1067 switch (TargetTriple.getArch()) {
1068 case Triple::x86_64:
1069 MapParams = Linux_X86_MemoryMapParams.bits64;
1070 break;
1071 case Triple::x86:
1072 MapParams = Linux_X86_MemoryMapParams.bits32;
1073 break;
1074 case Triple::mips64:
1075 case Triple::mips64el:
1076 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1077 break;
1078 case Triple::ppc64:
1079 case Triple::ppc64le:
1080 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1081 break;
1082 case Triple::systemz:
1083 MapParams = Linux_S390_MemoryMapParams.bits64;
1084 break;
1085 case Triple::aarch64:
1086 case Triple::aarch64_be:
1087 MapParams = Linux_ARM_MemoryMapParams.bits64;
1088 break;
1089 case Triple::loongarch64:
1090 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1091 break;
1092 default:
1093 report_fatal_error("unsupported architecture");
1094 }
1095 break;
1096 default:
1097 report_fatal_error("unsupported operating system");
1098 }
1099 }
1100
1101 C = &(M.getContext());
1102 IRBuilder<> IRB(*C);
1103 IntptrTy = IRB.getIntPtrTy(DL);
1104 OriginTy = IRB.getInt32Ty();
1105 PtrTy = IRB.getPtrTy();
1106
1107 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1108 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1109
1110 if (!CompileKernel) {
1111 if (TrackOrigins)
1112 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1113 return new GlobalVariable(
1114 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1115 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1116 });
1117
1118 if (Recover)
1119 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1120 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1121 GlobalValue::WeakODRLinkage,
1122 IRB.getInt32(Recover), "__msan_keep_going");
1123 });
1124 }
1125}
1126
1127namespace {
1128
1129/// A helper class that handles instrumentation of VarArg
1130/// functions on a particular platform.
1131///
1132/// Implementations are expected to insert the instrumentation
1133/// necessary to propagate argument shadow through VarArg function
1134/// calls. Visit* methods are called during an InstVisitor pass over
1135/// the function, and should avoid creating new basic blocks. A new
1136/// instance of this class is created for each instrumented function.
1137struct VarArgHelper {
1138 virtual ~VarArgHelper() = default;
1139
1140 /// Visit a CallBase.
1141 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1142
1143 /// Visit a va_start call.
1144 virtual void visitVAStartInst(VAStartInst &I) = 0;
1145
1146 /// Visit a va_copy call.
1147 virtual void visitVACopyInst(VACopyInst &I) = 0;
1148
1149 /// Finalize function instrumentation.
1150 ///
1151 /// This method is called after visiting all interesting (see above)
1152 /// instructions in a function.
1153 virtual void finalizeInstrumentation() = 0;
1154};
1155
1156struct MemorySanitizerVisitor;
1157
1158} // end anonymous namespace
1159
1160static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1161 MemorySanitizerVisitor &Visitor);
1162
1163static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1164 if (TS.isScalable())
1165 // Scalable types unconditionally take slowpaths.
1166 return kNumberOfAccessSizes;
1167 unsigned TypeSizeFixed = TS.getFixedValue();
1168 if (TypeSizeFixed <= 8)
1169 return 0;
1170 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1171}
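
// For example, a fixed 32-bit shadow maps to index 2 (the 4-byte callbacks),
// a 64-bit shadow to index 3 (8 bytes), and anything wider than 64 bits, like
// the scalable types above, yields an index >= kNumberOfAccessSizes, i.e. the
// slow path.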
1172
1173namespace {
1174
1175/// Helper class to attach debug information of the given instruction onto new
1176/// instructions inserted after.
1177class NextNodeIRBuilder : public IRBuilder<> {
1178public:
1179 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1180 SetCurrentDebugLocation(IP->getDebugLoc());
1181 }
1182};
1183
1184/// This class does all the work for a given function. Store and Load
1185/// instructions store and load corresponding shadow and origin
1186/// values. Most instructions propagate shadow from arguments to their
1187/// return values. Certain instructions (most importantly, BranchInst)
1188/// test their argument shadow and print reports (with a runtime call) if it's
1189/// non-zero.
1190struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1191 Function &F;
1192 MemorySanitizer &MS;
1193 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1194 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1195 std::unique_ptr<VarArgHelper> VAHelper;
1196 const TargetLibraryInfo *TLI;
1197 Instruction *FnPrologueEnd;
1198 SmallVector<Instruction *, 16> Instructions;
1199
1200 // The following flags disable parts of MSan instrumentation based on
1201 // exclusion list contents and command-line options.
1202 bool InsertChecks;
1203 bool PropagateShadow;
1204 bool PoisonStack;
1205 bool PoisonUndef;
1206 bool PoisonUndefVectors;
1207
1208 struct ShadowOriginAndInsertPoint {
1209 Value *Shadow;
1210 Value *Origin;
1211 Instruction *OrigIns;
1212
1213 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1214 : Shadow(S), Origin(O), OrigIns(I) {}
1215 };
1216 SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
1217 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1218 SmallSetVector<AllocaInst *, 16> AllocaSet;
1219 SmallVector<std::pair<IntrinsicInst *, AllocaInst *>, 16> LifetimeStartList;
1220 SmallVector<StoreInst *, 16> StoreList;
1221 int64_t SplittableBlocksCount = 0;
1222
1223 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1224 const TargetLibraryInfo &TLI)
1225 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1226 bool SanitizeFunction =
1227 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1228 InsertChecks = SanitizeFunction;
1229 PropagateShadow = SanitizeFunction;
1230 PoisonStack = SanitizeFunction && ClPoisonStack;
1231 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1232 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1233
1234 // In the presence of unreachable blocks, we may see Phi nodes with
1235 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1236 // blocks, such nodes will not have any shadow value associated with them.
1237 // It's easier to remove unreachable blocks than deal with missing shadow.
1238 removeUnreachableBlocks(F);
1239
1240 MS.initializeCallbacks(*F.getParent(), TLI);
1241 FnPrologueEnd =
1242 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1243 .CreateIntrinsic(Intrinsic::donothing, {});
1244
1245 if (MS.CompileKernel) {
1246 IRBuilder<> IRB(FnPrologueEnd);
1247 insertKmsanPrologue(IRB);
1248 }
1249
1250 LLVM_DEBUG(if (!InsertChecks) dbgs()
1251 << "MemorySanitizer is not inserting checks into '"
1252 << F.getName() << "'\n");
1253 }
1254
1255 bool instrumentWithCalls(Value *V) {
1256 // Constants likely will be eliminated by follow-up passes.
1257 if (isa<Constant>(V))
1258 return false;
1259 ++SplittableBlocksCount;
1260 return ClInstrumentationWithCallThreshold >= 0 &&
1261 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1262 }
1263
1264 bool isInPrologue(Instruction &I) {
1265 return I.getParent() == FnPrologueEnd->getParent() &&
1266 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1267 }
1268
1269 // Creates a new origin and records the stack trace. In general we can call
1270 // this function for any origin manipulation we like. However it will cost
1271 // runtime resources. So use this wisely only if it can provide additional
1272 // information helpful to a user.
1273 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1274 if (MS.TrackOrigins <= 1)
1275 return V;
1276 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1277 }
1278
1279 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1280 const DataLayout &DL = F.getDataLayout();
1281 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1282 if (IntptrSize == kOriginSize)
1283 return Origin;
1284 assert(IntptrSize == kOriginSize * 2);
1285 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1286 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1287 }
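
  // For example, originToIntptr() above widens the 4-byte origin 0xAABBCCDD to
  // 0xAABBCCDDAABBCCDD on a 64-bit target, so a single intptr-sized store in
  // paintOrigin() below paints two consecutive 4-byte origin slots at once.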
1288
1289 /// Fill memory range with the given origin value.
1290 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1291 TypeSize TS, Align Alignment) {
1292 const DataLayout &DL = F.getDataLayout();
1293 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1294 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1295 assert(IntptrAlignment >= kMinOriginAlignment);
1296 assert(IntptrSize >= kOriginSize);
1297
1298 // Note: The loop based formation works for fixed length vectors too,
1299 // however we prefer to unroll and specialize alignment below.
1300 if (TS.isScalable()) {
1301 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1302 Value *RoundUp =
1303 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1304 Value *End =
1305 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1306 auto [InsertPt, Index] =
1307 SplitBlockAndInsertSimpleForLoop(End, &*IRB.GetInsertPoint());
1308 IRB.SetInsertPoint(InsertPt);
1309
1310 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1311 IRB.CreateAlignedStore(Origin, GEP, kMinOriginAlignment);
1312 return;
1313 }
1314
1315 unsigned Size = TS.getFixedValue();
1316
1317 unsigned Ofs = 0;
1318 Align CurrentAlignment = Alignment;
1319 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1320 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1321 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1322 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1323 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1324 : IntptrOriginPtr;
1325 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1326 Ofs += IntptrSize / kOriginSize;
1327 CurrentAlignment = IntptrAlignment;
1328 }
1329 }
1330
1331 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1332 Value *GEP =
1333 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1334 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1335 CurrentAlignment = kMinOriginAlignment;
1336 }
1337 }
1338
1339 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1340 Value *OriginPtr, Align Alignment) {
1341 const DataLayout &DL = F.getDataLayout();
1342 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1343 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1344 // ZExt cannot convert between vector and scalar
1345 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1346 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1347 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1348 // Origin is not needed: value is initialized or const shadow is
1349 // ignored.
1350 return;
1351 }
1352 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1353 // Copy origin as the value is definitely uninitialized.
1354 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1355 OriginAlignment);
1356 return;
1357 }
1358 // Fallback to runtime check, which still can be optimized out later.
1359 }
1360
1361 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1362 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1363 if (instrumentWithCalls(ConvertedShadow) &&
1364 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1365 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1366 Value *ConvertedShadow2 =
1367 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1368 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1369 CB->addParamAttr(0, Attribute::ZExt);
1370 CB->addParamAttr(2, Attribute::ZExt);
1371 } else {
1372 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1373 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1374 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1375 IRBuilder<> IRBNew(CheckTerm);
1376 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1377 OriginAlignment);
1378 }
1379 }
1380
1381 void materializeStores() {
1382 for (StoreInst *SI : StoreList) {
1383 IRBuilder<> IRB(SI);
1384 Value *Val = SI->getValueOperand();
1385 Value *Addr = SI->getPointerOperand();
1386 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1387 Value *ShadowPtr, *OriginPtr;
1388 Type *ShadowTy = Shadow->getType();
1389 const Align Alignment = SI->getAlign();
1390 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1391 std::tie(ShadowPtr, OriginPtr) =
1392 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1393
1394 [[maybe_unused]] StoreInst *NewSI =
1395 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1396 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1397
1398 if (SI->isAtomic())
1399 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1400
1401 if (MS.TrackOrigins && !SI->isAtomic())
1402 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1403 OriginAlignment);
1404 }
1405 }
1406
1407 // Returns true if Debug Location corresponds to multiple warnings.
1408 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1409 if (MS.TrackOrigins < 2)
1410 return false;
1411
1412 if (LazyWarningDebugLocationCount.empty())
1413 for (const auto &I : InstrumentationList)
1414 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1415
1416 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1417 }
1418
1419 /// Helper function to insert a warning at IRB's current insert point.
1420 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1421 if (!Origin)
1422 Origin = (Value *)IRB.getInt32(0);
1423 assert(Origin->getType()->isIntegerTy());
1424
1425 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1426 // Try to create additional origin with debug info of the last origin
1427 // instruction. It may provide additional information to the user.
1428 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1429 assert(MS.TrackOrigins);
1430 auto NewDebugLoc = OI->getDebugLoc();
1431 // Origin update with missing or the same debug location provides no
1432 // additional value.
1433 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1434 // Insert update just before the check, so we call runtime only just
1435 // before the report.
1436 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1437 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1438 Origin = updateOrigin(Origin, IRBOrigin);
1439 }
1440 }
1441 }
1442
1443 if (MS.CompileKernel || MS.TrackOrigins)
1444 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1445 else
1446 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1447 // FIXME: Insert UnreachableInst if !MS.Recover?
1448 // This may invalidate some of the following checks and needs to be done
1449 // at the very end.
1450 }
1451
1452 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1453 Value *Origin) {
1454 const DataLayout &DL = F.getDataLayout();
1455 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1456 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1457 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1458 // ZExt cannot convert between vector and scalar
1459 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1460 Value *ConvertedShadow2 =
1461 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1462
1463 if (SizeIndex < kNumberOfAccessSizes) {
1464 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1465 CallBase *CB = IRB.CreateCall(
1466 Fn,
1467 {ConvertedShadow2,
1468 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1469 CB->addParamAttr(0, Attribute::ZExt);
1470 CB->addParamAttr(1, Attribute::ZExt);
1471 } else {
1472 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1473 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1474 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1475 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1476 CallBase *CB = IRB.CreateCall(
1477 Fn,
1478 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1479 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1480 CB->addParamAttr(1, Attribute::ZExt);
1481 CB->addParamAttr(2, Attribute::ZExt);
1482 }
1483 } else {
1484 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1485 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1486 Cmp, &*IRB.GetInsertPoint(),
1487 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1488
1489 IRB.SetInsertPoint(CheckTerm);
1490 insertWarningFn(IRB, Origin);
1491 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1492 }
1493 }
1494
1495 void materializeInstructionChecks(
1496 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1497 const DataLayout &DL = F.getDataLayout();
1498 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1499 // correct origin.
1500 bool Combine = !MS.TrackOrigins;
1501 Instruction *Instruction = InstructionChecks.front().OrigIns;
1502 Value *Shadow = nullptr;
1503 for (const auto &ShadowData : InstructionChecks) {
1504 assert(ShadowData.OrigIns == Instruction);
1505 IRBuilder<> IRB(Instruction);
1506
1507 Value *ConvertedShadow = ShadowData.Shadow;
1508
1509 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1510 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1511 // Skip, value is initialized or const shadow is ignored.
1512 continue;
1513 }
1514 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1515 // Report as the value is definitely uninitialized.
1516 insertWarningFn(IRB, ShadowData.Origin);
1517 if (!MS.Recover)
1518 return; // Always fail and stop here, no need to check the rest.
1519 // Skip the entire instruction.
1520 continue;
1521 }
1522 // Fallback to runtime check, which still can be optimized out later.
1523 }
1524
1525 if (!Combine) {
1526 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1527 continue;
1528 }
1529
1530 if (!Shadow) {
1531 Shadow = ConvertedShadow;
1532 continue;
1533 }
1534
1535 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1536 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1537 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1538 }
1539
1540 if (Shadow) {
1541 assert(Combine);
1542 IRBuilder<> IRB(Instruction);
1543 materializeOneCheck(IRB, Shadow, nullptr);
1544 }
1545 }
1546
1547 void materializeChecks() {
1548#ifndef NDEBUG
1549 // For assert below.
1550 SmallPtrSet<Instruction *, 16> Done;
1551#endif
1552
1553 for (auto I = InstrumentationList.begin();
1554 I != InstrumentationList.end();) {
1555 auto OrigIns = I->OrigIns;
1556 // Checks are grouped by the original instruction: all checks queued by
1557 // `insertCheckShadow` for that instruction are materialized at once.
1558 assert(Done.insert(OrigIns).second);
1559 auto J = std::find_if(I + 1, InstrumentationList.end(),
1560 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1561 return OrigIns != R.OrigIns;
1562 });
1563 // Process all checks of instruction at once.
1564 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1565 I = J;
1566 }
1567
1568 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1569 }
1570
1571 // Insert the KMSAN prologue: fetch the per-task context state and bind the shadow/origin TLS slots.
1572 void insertKmsanPrologue(IRBuilder<> &IRB) {
1573 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1574 Constant *Zero = IRB.getInt32(0);
1575 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1576 {Zero, IRB.getInt32(0)}, "param_shadow");
1577 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1578 {Zero, IRB.getInt32(1)}, "retval_shadow");
1579 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1580 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1581 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1582 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1583 MS.VAArgOverflowSizeTLS =
1584 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1585 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1586 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1587 {Zero, IRB.getInt32(5)}, "param_origin");
1588 MS.RetvalOriginTLS =
1589 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1590 {Zero, IRB.getInt32(6)}, "retval_origin");
1591 if (MS.TargetTriple.getArch() == Triple::systemz)
1592 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1593 }
1594
1595 /// Add MemorySanitizer instrumentation to a function.
1596 bool runOnFunction() {
1597 // Iterate all BBs in depth-first order and create shadow instructions
1598 // for all instructions (where applicable).
1599 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1600 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1601 visit(*BB);
1602
1603 // `visit` above only collects instructions. Process them after iterating over
1604 // the CFG to avoid placing requirements on CFG transformations.
1605 for (Instruction *I : Instructions)
1606 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1607
1608 // Finalize PHI nodes.
1609 for (PHINode *PN : ShadowPHINodes) {
1610 PHINode *PNS = cast<PHINode>(getShadow(PN));
1611 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1612 size_t NumValues = PN->getNumIncomingValues();
1613 for (size_t v = 0; v < NumValues; v++) {
1614 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1615 if (PNO)
1616 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1617 }
1618 }
1619
1620 VAHelper->finalizeInstrumentation();
1621
1622 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1623 // instrumenting only allocas.
1624 if (InstrumentLifetimeStart) {
1625 for (auto Item : LifetimeStartList) {
1626 instrumentAlloca(*Item.second, Item.first);
1627 AllocaSet.remove(Item.second);
1628 }
1629 }
1630 // Poison the allocas for which we didn't instrument the corresponding
1631 // lifetime intrinsics.
1632 for (AllocaInst *AI : AllocaSet)
1633 instrumentAlloca(*AI);
1634
1635 // Insert shadow value checks.
1636 materializeChecks();
1637
1638 // Delayed instrumentation of StoreInst.
1639 // This may not add new address checks.
1640 materializeStores();
1641
1642 return true;
1643 }
1644
1645 /// Compute the shadow type that corresponds to a given Value.
1646 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1647
1648 /// Compute the shadow type that corresponds to a given Type.
1649 Type *getShadowTy(Type *OrigTy) {
1650 if (!OrigTy->isSized()) {
1651 return nullptr;
1652 }
1653 // For integer type, shadow is the same as the original type.
1654 // This may return weird-sized types like i1.
1655 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1656 return IT;
1657 const DataLayout &DL = F.getDataLayout();
1658 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1659 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1660 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1661 VT->getElementCount());
1662 }
1663 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1664 return ArrayType::get(getShadowTy(AT->getElementType()),
1665 AT->getNumElements());
1666 }
1667 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1668 SmallVector<Type *, 4> Elements;
1669 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1670 Elements.push_back(getShadowTy(ST->getElementType(i)));
1671 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1672 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1673 return Res;
1674 }
1675 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1676 return IntegerType::get(*MS.C, TypeSize);
1677 }
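 // Examples of the mapping above (illustrative, not part of the original
 // source): i32 -> i32, <4 x float> -> <4 x i32>, [8 x i16] -> [8 x i16],
 // {i64, double} -> {i64, i64}; a lone float, double or pointer falls through
 // to the last case and becomes an integer of the same size (e.g. double -> i64).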
1678
1679 /// Extract combined shadow of struct elements as a bool
1680 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1681 IRBuilder<> &IRB) {
1682 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1683 Value *Aggregator = FalseVal;
1684
1685 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1686 // Combine by ORing together each element's bool shadow
1687 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1688 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1689
1690 if (Aggregator != FalseVal)
1691 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1692 else
1693 Aggregator = ShadowBool;
1694 }
1695
1696 return Aggregator;
1697 }
1698
1699 // Extract combined shadow of array elements
1700 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1701 IRBuilder<> &IRB) {
1702 if (!Array->getNumElements())
1703 return IRB.getIntN(/* width */ 1, /* value */ 0);
1704
1705 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1706 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1707
1708 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1709 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1710 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1711 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1712 }
1713 return Aggregator;
1714 }
1715
1716 /// Convert a shadow value to its flattened variant. The resulting
1717 /// shadow may not necessarily have the same bit width as the input
1718 /// value, but it will always be comparable to zero.
1719 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1720 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1721 return collapseStructShadow(Struct, V, IRB);
1722 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1723 return collapseArrayShadow(Array, V, IRB);
1724 if (isa<VectorType>(V->getType())) {
1725 if (isa<ScalableVectorType>(V->getType()))
1726 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1727 unsigned BitWidth =
1728 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1729 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1730 }
1731 return V;
1732 }
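 // Examples (illustrative, not part of the original source): a <4 x i32> shadow
 // is bitcast to a single i128; a scalable vector is OR-reduced first; a struct
 // shadow collapses to an i1, and an array shadow to the OR of its flattened
 // elements.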
1733
1734 // Convert a scalar value to an i1 by comparing with 0
1735 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1736 Type *VTy = V->getType();
1737 if (!VTy->isIntegerTy())
1738 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1739 if (VTy->getIntegerBitWidth() == 1)
1740 // Just converting a bool to a bool, so do nothing.
1741 return V;
1742 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1743 }
1744
1745 Type *ptrToIntPtrType(Type *PtrTy) const {
1746 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1747 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1748 VectTy->getElementCount());
1749 }
1750 assert(PtrTy->isIntOrPtrTy());
1751 return MS.IntptrTy;
1752 }
1753
1754 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1755 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1756 return VectorType::get(
1757 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1758 VectTy->getElementCount());
1759 }
1760 assert(IntPtrTy == MS.IntptrTy);
1761 return MS.PtrTy;
1762 }
1763
1764 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1765 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1766 return ConstantVector::getSplat(
1767 VectTy->getElementCount(),
1768 constToIntPtr(VectTy->getElementType(), C));
1769 }
1770 assert(IntPtrTy == MS.IntptrTy);
1771 return ConstantInt::get(MS.IntptrTy, C);
1772 }
1773
1774 /// Returns the integer shadow offset that corresponds to a given
1775 /// application address, whereby:
1776 ///
1777 /// Offset = (Addr & ~AndMask) ^ XorMask
1778 /// Shadow = ShadowBase + Offset
1779 /// Origin = (OriginBase + Offset) & ~Alignment
1780 ///
1781 /// Note: for efficiency, many shadow mappings only use the XorMask
1782 /// and OriginBase; the AndMask and ShadowBase are often zero.
1783 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1784 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1785 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1786
1787 if (uint64_t AndMask = MS.MapParams->AndMask)
1788 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1789
1790 if (uint64_t XorMask = MS.MapParams->XorMask)
1791 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1792 return OffsetLong;
1793 }
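 // Illustrative sketch (not part of the original pass): the same offset
 // arithmetic on plain integers. The constants below are an assumption modeled
 // on a typical Linux/x86_64 layout (AndMask = 0, XorMask = 0x500000000000,
 // ShadowBase = 0) and may differ per platform or release; the helper name is
 // hypothetical.
 static uint64_t exampleShadowAddress(uint64_t Addr) {
   const uint64_t AndMask = 0x0, XorMask = 0x500000000000ULL, ShadowBase = 0x0;
   uint64_t Offset = (Addr & ~AndMask) ^ XorMask; // Offset = (Addr & ~AndMask) ^ XorMask
   return ShadowBase + Offset;                    // Shadow = ShadowBase + Offset
 }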
1794
1795 /// Compute the shadow and origin addresses corresponding to a given
1796 /// application address.
1797 ///
1798 /// Shadow = ShadowBase + Offset
1799 /// Origin = (OriginBase + Offset) & ~3ULL
1800 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1801 /// a single pointee.
1802 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1803 std::pair<Value *, Value *>
1804 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1805 MaybeAlign Alignment) {
1806 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1807 if (!VectTy) {
1808 assert(Addr->getType()->isPointerTy());
1809 } else {
1810 assert(VectTy->getElementType()->isPointerTy());
1811 }
1812 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1813 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1814 Value *ShadowLong = ShadowOffset;
1815 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1816 ShadowLong =
1817 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1818 }
1819 Value *ShadowPtr = IRB.CreateIntToPtr(
1820 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1821
1822 Value *OriginPtr = nullptr;
1823 if (MS.TrackOrigins) {
1824 Value *OriginLong = ShadowOffset;
1825 uint64_t OriginBase = MS.MapParams->OriginBase;
1826 if (OriginBase != 0)
1827 OriginLong =
1828 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1829 if (!Alignment || *Alignment < kMinOriginAlignment) {
1830 uint64_t Mask = kMinOriginAlignment.value() - 1;
1831 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1832 }
1833 OriginPtr = IRB.CreateIntToPtr(
1834 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1835 }
1836 return std::make_pair(ShadowPtr, OriginPtr);
1837 }
1838
1839 template <typename... ArgsTy>
1840 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1841 ArgsTy... Args) {
1842 if (MS.TargetTriple.getArch() == Triple::systemz) {
1843 IRB.CreateCall(Callee,
1844 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1845 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1846 }
1847
1848 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1849 }
1850
1851 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1852 IRBuilder<> &IRB,
1853 Type *ShadowTy,
1854 bool isStore) {
1855 Value *ShadowOriginPtrs;
1856 const DataLayout &DL = F.getDataLayout();
1857 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1858
1859 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1860 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1861 if (Getter) {
1862 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1863 } else {
1864 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1865 ShadowOriginPtrs = createMetadataCall(
1866 IRB,
1867 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1868 AddrCast, SizeVal);
1869 }
1870 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1871 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1872 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1873
1874 return std::make_pair(ShadowPtr, OriginPtr);
1875 }
1876
1877 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1878 /// a single pointee.
1879 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1880 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1881 IRBuilder<> &IRB,
1882 Type *ShadowTy,
1883 bool isStore) {
1884 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1885 if (!VectTy) {
1886 assert(Addr->getType()->isPointerTy());
1887 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1888 }
1889
1890 // TODO: Support callbacks with vectors of addresses.
1891 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1892 Value *ShadowPtrs = ConstantInt::getNullValue(
1893 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1894 Value *OriginPtrs = nullptr;
1895 if (MS.TrackOrigins)
1896 OriginPtrs = ConstantInt::getNullValue(
1897 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1898 for (unsigned i = 0; i < NumElements; ++i) {
1899 Value *OneAddr =
1900 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1901 auto [ShadowPtr, OriginPtr] =
1902 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1903
1904 ShadowPtrs = IRB.CreateInsertElement(
1905 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1906 if (MS.TrackOrigins)
1907 OriginPtrs = IRB.CreateInsertElement(
1908 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1909 }
1910 return {ShadowPtrs, OriginPtrs};
1911 }
1912
1913 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1914 Type *ShadowTy,
1915 MaybeAlign Alignment,
1916 bool isStore) {
1917 if (MS.CompileKernel)
1918 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1919 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1920 }
1921
1922 /// Compute the shadow address for a given function argument.
1923 ///
1924 /// Shadow = ParamTLS+ArgOffset.
1925 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1926 Value *Base = IRB.CreatePointerCast(MS.ParamTLS, MS.IntptrTy);
1927 if (ArgOffset)
1928 Base = IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
1929 return IRB.CreateIntToPtr(Base, IRB.getPtrTy(0), "_msarg");
1930 }
1931
1932 /// Compute the origin address for a given function argument.
1933 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1934 if (!MS.TrackOrigins)
1935 return nullptr;
1936 Value *Base = IRB.CreatePointerCast(MS.ParamOriginTLS, MS.IntptrTy);
1937 if (ArgOffset)
1938 Base = IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
1939 return IRB.CreateIntToPtr(Base, IRB.getPtrTy(0), "_msarg_o");
1940 }
1941
1942 /// Compute the shadow address for a retval.
1943 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1944 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1945 }
1946
1947 /// Compute the origin address for a retval.
1948 Value *getOriginPtrForRetval() {
1949 // We keep a single origin for the entire retval. Might be too optimistic.
1950 return MS.RetvalOriginTLS;
1951 }
1952
1953 /// Set SV to be the shadow value for V.
1954 void setShadow(Value *V, Value *SV) {
1955 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1956 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1957 }
1958
1959 /// Set Origin to be the origin value for V.
1960 void setOrigin(Value *V, Value *Origin) {
1961 if (!MS.TrackOrigins)
1962 return;
1963 assert(!OriginMap.count(V) && "Values may only have one origin");
1964 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1965 OriginMap[V] = Origin;
1966 }
1967
1968 Constant *getCleanShadow(Type *OrigTy) {
1969 Type *ShadowTy = getShadowTy(OrigTy);
1970 if (!ShadowTy)
1971 return nullptr;
1972 return Constant::getNullValue(ShadowTy);
1973 }
1974
1975 /// Create a clean shadow value for a given value.
1976 ///
1977 /// Clean shadow (all zeroes) means all bits of the value are defined
1978 /// (initialized).
1979 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
1980
1981 /// Create a dirty shadow of a given shadow type.
1982 Constant *getPoisonedShadow(Type *ShadowTy) {
1983 assert(ShadowTy);
1984 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
1985 return Constant::getAllOnesValue(ShadowTy);
1986 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
1987 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
1988 getPoisonedShadow(AT->getElementType()));
1989 return ConstantArray::get(AT, Vals);
1990 }
1991 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
1992 SmallVector<Constant *, 4> Vals;
1993 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1994 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
1995 return ConstantStruct::get(ST, Vals);
1996 }
1997 llvm_unreachable("Unexpected shadow type");
1998 }
1999
2000 /// Create a dirty shadow for a given value.
2001 Constant *getPoisonedShadow(Value *V) {
2002 Type *ShadowTy = getShadowTy(V);
2003 if (!ShadowTy)
2004 return nullptr;
2005 return getPoisonedShadow(ShadowTy);
2006 }
2007
2008 /// Create a clean (zero) origin.
2009 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2010
2011 /// Get the shadow value for a given Value.
2012 ///
2013 /// This function either returns the value set earlier with setShadow,
2014 /// or extracts it from ParamTLS (for function arguments).
2015 Value *getShadow(Value *V) {
2016 if (Instruction *I = dyn_cast<Instruction>(V)) {
2017 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2018 return getCleanShadow(V);
2019 // For instructions the shadow is already stored in the map.
2020 Value *Shadow = ShadowMap[V];
2021 if (!Shadow) {
2022 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2023 assert(Shadow && "No shadow for a value");
2024 }
2025 return Shadow;
2026 }
2027 // Handle fully undefined values
2028 // (partially undefined constant vectors are handled later)
2029 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2030 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2031 : getCleanShadow(V);
2032 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2033 return AllOnes;
2034 }
2035 if (Argument *A = dyn_cast<Argument>(V)) {
2036 // For arguments we compute the shadow on demand and store it in the map.
2037 Value *&ShadowPtr = ShadowMap[V];
2038 if (ShadowPtr)
2039 return ShadowPtr;
2040 Function *F = A->getParent();
2041 IRBuilder<> EntryIRB(FnPrologueEnd);
2042 unsigned ArgOffset = 0;
2043 const DataLayout &DL = F->getDataLayout();
2044 for (auto &FArg : F->args()) {
2045 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2046 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2047 ? "vscale not fully supported\n"
2048 : "Arg is not sized\n"));
2049 if (A == &FArg) {
2050 ShadowPtr = getCleanShadow(V);
2051 setOrigin(A, getCleanOrigin());
2052 break;
2053 }
2054 continue;
2055 }
2056
2057 unsigned Size = FArg.hasByValAttr()
2058 ? DL.getTypeAllocSize(FArg.getParamByValType())
2059 : DL.getTypeAllocSize(FArg.getType());
2060
2061 if (A == &FArg) {
2062 bool Overflow = ArgOffset + Size > kParamTLSSize;
2063 if (FArg.hasByValAttr()) {
2064 // ByVal pointer itself has clean shadow. We copy the actual
2065 // argument shadow to the underlying memory.
2066 // Figure out maximal valid memcpy alignment.
2067 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2068 FArg.getParamAlign(), FArg.getParamByValType());
2069 Value *CpShadowPtr, *CpOriginPtr;
2070 std::tie(CpShadowPtr, CpOriginPtr) =
2071 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2072 /*isStore*/ true);
2073 if (!PropagateShadow || Overflow) {
2074 // ParamTLS overflow.
2075 EntryIRB.CreateMemSet(
2076 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2077 Size, ArgAlign);
2078 } else {
2079 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2080 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2081 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2082 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2083 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2084
2085 if (MS.TrackOrigins) {
2086 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2087 // FIXME: OriginSize should be:
2088 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2089 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2090 EntryIRB.CreateMemCpy(
2091 CpOriginPtr,
2092 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2093 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2094 OriginSize);
2095 }
2096 }
2097 }
2098
2099 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2100 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2101 ShadowPtr = getCleanShadow(V);
2102 setOrigin(A, getCleanOrigin());
2103 } else {
2104 // Shadow over TLS
2105 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2106 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2107 kShadowTLSAlignment);
2108 if (MS.TrackOrigins) {
2109 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2110 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2111 }
2112 }
2113 LLVM_DEBUG(dbgs()
2114 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2115 break;
2116 }
2117
2118 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2119 }
2120 assert(ShadowPtr && "Could not find shadow for an argument");
2121 return ShadowPtr;
2122 }
2123
2124 // Check for partially-undefined constant vectors
2125 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2126 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2127 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2128 PoisonUndefVectors) {
2129 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2130 SmallVector<Constant *, 32> ShadowVector(NumElems);
2131 for (unsigned i = 0; i != NumElems; ++i) {
2132 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2133 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2134 : getCleanShadow(Elem);
2135 }
2136
2137 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2138 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2139 << *ShadowConstant << "\n");
2140
2141 return ShadowConstant;
2142 }
2143
2144 // TODO: partially-undefined constant arrays, structures, and nested types
2145
2146 // For everything else the shadow is zero.
2147 return getCleanShadow(V);
2148 }
2149
2150 /// Get the shadow for i-th argument of the instruction I.
2151 Value *getShadow(Instruction *I, int i) {
2152 return getShadow(I->getOperand(i));
2153 }
2154
2155 /// Get the origin for a value.
2156 Value *getOrigin(Value *V) {
2157 if (!MS.TrackOrigins)
2158 return nullptr;
2159 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2160 return getCleanOrigin();
2161 assert((isa<Instruction>(V) || isa<Argument>(V)) &&
2162 "Unexpected value type in getOrigin()");
2163 if (Instruction *I = dyn_cast<Instruction>(V)) {
2164 if (I->getMetadata(LLVMContext::MD_nosanitize))
2165 return getCleanOrigin();
2166 }
2167 Value *Origin = OriginMap[V];
2168 assert(Origin && "Missing origin");
2169 return Origin;
2170 }
2171
2172 /// Get the origin for i-th argument of the instruction I.
2173 Value *getOrigin(Instruction *I, int i) {
2174 return getOrigin(I->getOperand(i));
2175 }
2176
2177 /// Remember the place where a shadow check should be inserted.
2178 ///
2179 /// This location will be later instrumented with a check that will print a
2180 /// UMR warning in runtime if the shadow value is not 0.
2181 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2182 assert(Shadow);
2183 if (!InsertChecks)
2184 return;
2185
2186 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2187 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2188 << *OrigIns << "\n");
2189 return;
2190 }
2191#ifndef NDEBUG
2192 Type *ShadowTy = Shadow->getType();
2193 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2194 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2195 "Can only insert checks for integer, vector, and aggregate shadow "
2196 "types");
2197#endif
2198 InstrumentationList.push_back(
2199 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2200 }
2201
2202 /// Get shadow for value, and remember the place where a shadow check should
2203 /// be inserted.
2204 ///
2205 /// This location will be later instrumented with a check that will print a
2206 /// UMR warning in runtime if the value is not fully defined.
2207 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2208 assert(Val);
2209 Value *Shadow, *Origin;
2210 if (ClCheckConstantShadow) {
2211 Shadow = getShadow(Val);
2212 if (!Shadow)
2213 return;
2214 Origin = getOrigin(Val);
2215 } else {
2216 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2217 if (!Shadow)
2218 return;
2219 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2220 }
2221 insertCheckShadow(Shadow, Origin, OrigIns);
2222 }
2223
2224 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2225 switch (a) {
2226 case AtomicOrdering::NotAtomic:
2227 return AtomicOrdering::NotAtomic;
2228 case AtomicOrdering::Unordered:
2229 case AtomicOrdering::Monotonic:
2230 case AtomicOrdering::Release:
2231 return AtomicOrdering::Release;
2232 case AtomicOrdering::Acquire:
2233 case AtomicOrdering::AcquireRelease:
2234 return AtomicOrdering::AcquireRelease;
2235 case AtomicOrdering::SequentiallyConsistent:
2236 return AtomicOrdering::SequentiallyConsistent;
2237 }
2238 llvm_unreachable("Unknown ordering");
2239 }
2240
2241 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2242 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2243 uint32_t OrderingTable[NumOrderings] = {};
2244
2245 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2246 OrderingTable[(int)AtomicOrderingCABI::release] =
2247 (int)AtomicOrderingCABI::release;
2248 OrderingTable[(int)AtomicOrderingCABI::consume] =
2249 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2250 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2251 (int)AtomicOrderingCABI::acq_rel;
2252 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2253 (int)AtomicOrderingCABI::seq_cst;
2254
2255 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2256 }
2257
2258 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2259 switch (a) {
2260 case AtomicOrdering::NotAtomic:
2261 return AtomicOrdering::NotAtomic;
2262 case AtomicOrdering::Unordered:
2263 case AtomicOrdering::Monotonic:
2264 case AtomicOrdering::Acquire:
2265 return AtomicOrdering::Acquire;
2266 case AtomicOrdering::Release:
2267 case AtomicOrdering::AcquireRelease:
2268 return AtomicOrdering::AcquireRelease;
2269 case AtomicOrdering::SequentiallyConsistent:
2270 return AtomicOrdering::SequentiallyConsistent;
2271 }
2272 llvm_unreachable("Unknown ordering");
2273 }
2274
2275 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2276 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2277 uint32_t OrderingTable[NumOrderings] = {};
2278
2279 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2280 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2281 OrderingTable[(int)AtomicOrderingCABI::consume] =
2282 (int)AtomicOrderingCABI::acquire;
2283 OrderingTable[(int)AtomicOrderingCABI::release] =
2284 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2285 (int)AtomicOrderingCABI::acq_rel;
2286 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2287 (int)AtomicOrderingCABI::seq_cst;
2288
2289 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2290 }
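 // Example of the tables above (illustrative, not part of the original source):
 // an __atomic_load libcall requested with memory_order_relaxed has its
 // ordering argument rewritten to memory_order_acquire, so that shadow bytes
 // written before the matching release store in another thread are visible by
 // the time the instrumentation reads the shadow of the loaded value.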
2291
2292 // ------------------- Visitors.
2293 using InstVisitor<MemorySanitizerVisitor>::visit;
2294 void visit(Instruction &I) {
2295 if (I.getMetadata(LLVMContext::MD_nosanitize))
2296 return;
2297 // Don't want to visit if we're in the prologue
2298 if (isInPrologue(I))
2299 return;
2300 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2301 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2302 // We still need to set the shadow and origin to clean values.
2303 setShadow(&I, getCleanShadow(&I));
2304 setOrigin(&I, getCleanOrigin());
2305 return;
2306 }
2307
2308 Instructions.push_back(&I);
2309 }
2310
2311 /// Instrument LoadInst
2312 ///
2313 /// Loads the corresponding shadow and (optionally) origin.
2314 /// Optionally, checks that the load address is fully defined.
2315 void visitLoadInst(LoadInst &I) {
2316 assert(I.getType()->isSized() && "Load type must have size");
2317 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2318 NextNodeIRBuilder IRB(&I);
2319 Type *ShadowTy = getShadowTy(&I);
2320 Value *Addr = I.getPointerOperand();
2321 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2322 const Align Alignment = I.getAlign();
2323 if (PropagateShadow) {
2324 std::tie(ShadowPtr, OriginPtr) =
2325 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2326 setShadow(&I,
2327 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2328 } else {
2329 setShadow(&I, getCleanShadow(&I));
2330 }
2331
2332 if (ClCheckAccessAddress)
2333 insertCheckShadowOf(I.getPointerOperand(), &I);
2334
2335 if (I.isAtomic())
2336 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2337
2338 if (MS.TrackOrigins) {
2339 if (PropagateShadow) {
2340 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2341 setOrigin(
2342 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2343 } else {
2344 setOrigin(&I, getCleanOrigin());
2345 }
2346 }
2347 }
2348
2349 /// Instrument StoreInst
2350 ///
2351 /// Stores the corresponding shadow and (optionally) origin.
2352 /// Optionally, checks that the store address is fully defined.
2353 void visitStoreInst(StoreInst &I) {
2354 StoreList.push_back(&I);
2355 if (ClCheckAccessAddress)
2356 insertCheckShadowOf(I.getPointerOperand(), &I);
2357 }
2358
2359 void handleCASOrRMW(Instruction &I) {
2360 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2361
2362 IRBuilder<> IRB(&I);
2363 Value *Addr = I.getOperand(0);
2364 Value *Val = I.getOperand(1);
2365 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2366 /*isStore*/ true)
2367 .first;
2368
2369 if (ClCheckAccessAddress)
2370 insertCheckShadowOf(Addr, &I);
2371
2372 // Only test the conditional argument of a cmpxchg instruction.
2373 // The other argument can potentially be uninitialized, but we cannot
2374 // detect this situation reliably without possible false positives.
2375 if (isa<AtomicCmpXchgInst>(I))
2376 insertCheckShadowOf(Val, &I);
2377
2378 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2379
2380 setShadow(&I, getCleanShadow(&I));
2381 setOrigin(&I, getCleanOrigin());
2382 }
2383
2384 void visitAtomicRMWInst(AtomicRMWInst &I) {
2385 handleCASOrRMW(I);
2386 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2387 }
2388
2389 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2390 handleCASOrRMW(I);
2391 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2392 }
2393
2394 // Vector manipulation.
2395 void visitExtractElementInst(ExtractElementInst &I) {
2396 insertCheckShadowOf(I.getOperand(1), &I);
2397 IRBuilder<> IRB(&I);
2398 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2399 "_msprop"));
2400 setOrigin(&I, getOrigin(&I, 0));
2401 }
2402
2403 void visitInsertElementInst(InsertElementInst &I) {
2404 insertCheckShadowOf(I.getOperand(2), &I);
2405 IRBuilder<> IRB(&I);
2406 auto *Shadow0 = getShadow(&I, 0);
2407 auto *Shadow1 = getShadow(&I, 1);
2408 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2409 "_msprop"));
2410 setOriginForNaryOp(I);
2411 }
2412
2413 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2414 IRBuilder<> IRB(&I);
2415 auto *Shadow0 = getShadow(&I, 0);
2416 auto *Shadow1 = getShadow(&I, 1);
2417 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2418 "_msprop"));
2419 setOriginForNaryOp(I);
2420 }
2421
2422 // Casts.
2423 void visitSExtInst(SExtInst &I) {
2424 IRBuilder<> IRB(&I);
2425 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2426 setOrigin(&I, getOrigin(&I, 0));
2427 }
2428
2429 void visitZExtInst(ZExtInst &I) {
2430 IRBuilder<> IRB(&I);
2431 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2432 setOrigin(&I, getOrigin(&I, 0));
2433 }
2434
2435 void visitTruncInst(TruncInst &I) {
2436 IRBuilder<> IRB(&I);
2437 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2438 setOrigin(&I, getOrigin(&I, 0));
2439 }
2440
2441 void visitBitCastInst(BitCastInst &I) {
2442 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2443 // a musttail call and a ret, don't instrument. New instructions are not
2444 // allowed after a musttail call.
2445 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2446 if (CI->isMustTailCall())
2447 return;
2448 IRBuilder<> IRB(&I);
2449 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2450 setOrigin(&I, getOrigin(&I, 0));
2451 }
2452
2453 void visitPtrToIntInst(PtrToIntInst &I) {
2454 IRBuilder<> IRB(&I);
2455 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2456 "_msprop_ptrtoint"));
2457 setOrigin(&I, getOrigin(&I, 0));
2458 }
2459
2460 void visitIntToPtrInst(IntToPtrInst &I) {
2461 IRBuilder<> IRB(&I);
2462 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2463 "_msprop_inttoptr"));
2464 setOrigin(&I, getOrigin(&I, 0));
2465 }
2466
2467 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2468 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2469 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2470 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2471 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2472 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2473
2474 /// Propagate shadow for bitwise AND.
2475 ///
2476 /// This code is exact, i.e. if, for example, a bit in the left argument
2477 /// is defined and 0, then neither the value nor the definedness of the
2478 /// corresponding bit in B affects the resulting shadow.
2479 void visitAnd(BinaryOperator &I) {
2480 IRBuilder<> IRB(&I);
2481 // "And" of 0 and a poisoned value results in unpoisoned value.
2482 // 1&1 => 1; 0&1 => 0; p&1 => p;
2483 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2484 // 1&p => p; 0&p => 0; p&p => p;
2485 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2486 Value *S1 = getShadow(&I, 0);
2487 Value *S2 = getShadow(&I, 1);
2488 Value *V1 = I.getOperand(0);
2489 Value *V2 = I.getOperand(1);
2490 if (V1->getType() != S1->getType()) {
2491 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2492 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2493 }
2494 Value *S1S2 = IRB.CreateAnd(S1, S2);
2495 Value *V1S2 = IRB.CreateAnd(V1, S2);
2496 Value *S1V2 = IRB.CreateAnd(S1, V2);
2497 setShadow(&I, IRB.CreateOr({S1S2, V1S2, S1V2}));
2498 setOriginForNaryOp(I);
2499 }
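 // Illustrative sketch (not part of the original pass): the AND rule above on
 // plain bit masks, where a set shadow bit means "uninitialized". The helper
 // name is hypothetical.
 static unsigned exampleAndShadow(unsigned V1, unsigned S1, unsigned V2,
                                  unsigned S2) {
   // A result bit is undefined only if it is undefined in one operand and the
   // other operand is not a known 0 in that position.
   return (S1 & S2) | (V1 & S2) | (S1 & V2);
 }
 // e.g. exampleAndShadow(/*V1=*/0b0000, /*S1=*/0b0000, /*V2=*/0b0101,
 //                       /*S2=*/0b1010) == 0b0000: ANDing with a fully defined
 // zero gives a fully defined zero.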
2500
2501 void visitOr(BinaryOperator &I) {
2502 IRBuilder<> IRB(&I);
2503 // "Or" of 1 and a poisoned value results in unpoisoned value:
2504 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2505 // 1|0 => 1; 0|0 => 0; p|0 => p;
2506 // 1|p => 1; 0|p => p; p|p => p;
2507 //
2508 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2509 //
2510 // If the "disjoint OR" property is violated, the result is poison, and
2511 // hence the entire shadow is uninitialized:
2512 // S = S | SignExt(V1 & V2 != 0)
2513 Value *S1 = getShadow(&I, 0);
2514 Value *S2 = getShadow(&I, 1);
2515 Value *V1 = I.getOperand(0);
2516 Value *V2 = I.getOperand(1);
2517 if (V1->getType() != S1->getType()) {
2518 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2519 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2520 }
2521
2522 Value *NotV1 = IRB.CreateNot(V1);
2523 Value *NotV2 = IRB.CreateNot(V2);
2524
2525 Value *S1S2 = IRB.CreateAnd(S1, S2);
2526 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2527 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2528
2529 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2530
2531 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2532 Value *V1V2 = IRB.CreateAnd(V1, V2);
2533 Value *DisjointOrShadow = IRB.CreateSExt(
2534 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2535 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2536 }
2537
2538 setShadow(&I, S);
2539 setOriginForNaryOp(I);
2540 }
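 // Illustrative sketch (not part of the original pass): the OR rule above,
 // without the disjoint-OR refinement, on plain bit masks. The helper name is
 // hypothetical.
 static unsigned exampleOrShadow(unsigned V1, unsigned S1, unsigned V2,
                                 unsigned S2) {
   // A result bit is defined if either operand is a known 1, or if both
   // operand bits are defined.
   return (S1 & S2) | (~V1 & S2) | (S1 & ~V2);
 }
 // e.g. exampleOrShadow(/*V1=*/0b1111, /*S1=*/0b0000, /*V2=*/0b0000,
 //                      /*S2=*/0b1111) == 0b0000: ORing with a fully defined
 // all-ones value yields a fully defined result.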
2541
2542 /// Default propagation of shadow and/or origin.
2543 ///
2544 /// This class implements the general case of shadow propagation, used in all
2545 /// cases where we don't know and/or don't care about what the operation
2546 /// actually does. It converts all input shadow values to a common type
2547 /// (extending or truncating as necessary), and bitwise OR's them.
2548 ///
2549 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2550 /// fully initialized), and less prone to false positives.
2551 ///
2552 /// This class also implements the general case of origin propagation. For a
2553 /// Nary operation, result origin is set to the origin of an argument that is
2554 /// not entirely initialized. If there is more than one such arguments, the
2555 /// rightmost of them is picked. It does not matter which one is picked if all
2556 /// arguments are initialized.
2557 template <bool CombineShadow> class Combiner {
2558 Value *Shadow = nullptr;
2559 Value *Origin = nullptr;
2560 IRBuilder<> &IRB;
2561 MemorySanitizerVisitor *MSV;
2562
2563 public:
2564 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2565 : IRB(IRB), MSV(MSV) {}
2566
2567 /// Add a pair of shadow and origin values to the mix.
2568 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2569 if (CombineShadow) {
2570 assert(OpShadow);
2571 if (!Shadow)
2572 Shadow = OpShadow;
2573 else {
2574 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2575 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2576 }
2577 }
2578
2579 if (MSV->MS.TrackOrigins) {
2580 assert(OpOrigin);
2581 if (!Origin) {
2582 Origin = OpOrigin;
2583 } else {
2584 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2585 // No point in adding something that might result in 0 origin value.
2586 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2587 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2588 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2589 }
2590 }
2591 }
2592 return *this;
2593 }
2594
2595 /// Add an application value to the mix.
2596 Combiner &Add(Value *V) {
2597 Value *OpShadow = MSV->getShadow(V);
2598 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2599 return Add(OpShadow, OpOrigin);
2600 }
2601
2602 /// Set the current combined values as the given instruction's shadow
2603 /// and origin.
2604 void Done(Instruction *I) {
2605 if (CombineShadow) {
2606 assert(Shadow);
2607 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2608 MSV->setShadow(I, Shadow);
2609 }
2610 if (MSV->MS.TrackOrigins) {
2611 assert(Origin);
2612 MSV->setOrigin(I, Origin);
2613 }
2614 }
2615
2616 /// Store the current combined value at the specified origin
2617 /// location.
2618 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2619 if (MSV->MS.TrackOrigins) {
2620 assert(Origin);
2621 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2622 }
2623 }
2624 };
2625
2626 using ShadowAndOriginCombiner = Combiner<true>;
2627 using OriginCombiner = Combiner<false>;
2628
2629 /// Propagate origin for arbitrary operation.
2630 void setOriginForNaryOp(Instruction &I) {
2631 if (!MS.TrackOrigins)
2632 return;
2633 IRBuilder<> IRB(&I);
2634 OriginCombiner OC(this, IRB);
2635 for (Use &Op : I.operands())
2636 OC.Add(Op.get());
2637 OC.Done(&I);
2638 }
2639
2640 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2641 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2642 "Vector of pointers is not a valid shadow type");
2643 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2644 Ty->getScalarSizeInBits()
2645 : Ty->getPrimitiveSizeInBits();
2646 }
2647
2648 /// Cast between two shadow types, extending or truncating as
2649 /// necessary.
2650 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2651 bool Signed = false) {
2652 Type *srcTy = V->getType();
2653 if (srcTy == dstTy)
2654 return V;
2655 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2656 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2657 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2658 return IRB.CreateICmpNE(V, getCleanShadow(V));
2659
2660 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2661 return IRB.CreateIntCast(V, dstTy, Signed);
2662 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2663 cast<VectorType>(dstTy)->getElementCount() ==
2664 cast<VectorType>(srcTy)->getElementCount())
2665 return IRB.CreateIntCast(V, dstTy, Signed);
2666 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2667 Value *V2 =
2668 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2669 return IRB.CreateBitCast(V2, dstTy);
2670 // TODO: handle struct types.
2671 }
2672
2673 /// Cast an application value to the type of its own shadow.
2674 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2675 Type *ShadowTy = getShadowTy(V);
2676 if (V->getType() == ShadowTy)
2677 return V;
2678 if (V->getType()->isPtrOrPtrVectorTy())
2679 return IRB.CreatePtrToInt(V, ShadowTy);
2680 else
2681 return IRB.CreateBitCast(V, ShadowTy);
2682 }
2683
2684 /// Propagate shadow for arbitrary operation.
2685 void handleShadowOr(Instruction &I) {
2686 IRBuilder<> IRB(&I);
2687 ShadowAndOriginCombiner SC(this, IRB);
2688 for (Use &Op : I.operands())
2689 SC.Add(Op.get());
2690 SC.Done(&I);
2691 }
2692
2693 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2694 // of elements.
2695 //
2696 // For example, suppose we have:
2697 // VectorA: <a1, a2, a3, a4, a5, a6>
2698 // VectorB: <b1, b2, b3, b4, b5, b6>
2699 // ReductionFactor: 3.
2700 // The output would be:
2701 // <a1|a2|a3, a4|a5|a6, b1|b2|b3, b4|b5|b6>
2702 //
2703 // This is convenient for instrumenting horizontal add/sub.
2704 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2705 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2706 Value *VectorA, Value *VectorB) {
2707 assert(isa<FixedVectorType>(VectorA->getType()));
2708 unsigned TotalNumElems =
2709 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2710
2711 if (VectorB) {
2712 assert(VectorA->getType() == VectorB->getType());
2713 TotalNumElems = TotalNumElems * 2;
2714 }
2715
2716 assert(TotalNumElems % ReductionFactor == 0);
2717
2718 Value *Or = nullptr;
2719
2720 IRBuilder<> IRB(&I);
2721 for (unsigned i = 0; i < ReductionFactor; i++) {
2722 SmallVector<int, 16> Mask;
2723 for (unsigned X = 0; X < TotalNumElems; X += ReductionFactor)
2724 Mask.push_back(X + i);
2725
2726 Value *Masked;
2727 if (VectorB)
2728 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2729 else
2730 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2731
2732 if (Or)
2733 Or = IRB.CreateOr(Or, Masked);
2734 else
2735 Or = Masked;
2736 }
2737
2738 return Or;
2739 }
2740
2741 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2742 /// fields.
2743 ///
2744 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2745 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2746 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I) {
2747 assert(I.arg_size() == 1 || I.arg_size() == 2);
2748
2749 assert(I.getType()->isVectorTy());
2750 assert(I.getArgOperand(0)->getType()->isVectorTy());
2751
2752 [[maybe_unused]] FixedVectorType *ParamType =
2753 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2754 assert((I.arg_size() != 2) ||
2755 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2756 [[maybe_unused]] FixedVectorType *ReturnType =
2757 cast<FixedVectorType>(I.getType());
2758 assert(ParamType->getNumElements() * I.arg_size() ==
2759 2 * ReturnType->getNumElements());
2760
2761 IRBuilder<> IRB(&I);
2762
2763 // Horizontal OR of shadow
2764 Value *FirstArgShadow = getShadow(&I, 0);
2765 Value *SecondArgShadow = nullptr;
2766 if (I.arg_size() == 2)
2767 SecondArgShadow = getShadow(&I, 1);
2768
2769 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, FirstArgShadow,
2770 SecondArgShadow);
2771
2772 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2773
2774 setShadow(&I, OrShadow);
2775 setOriginForNaryOp(I);
2776 }
2777
2778 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2779 /// fields, with the parameters reinterpreted to have elements of a specified
2780 /// width. For example:
2781 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2782 /// conceptually operates on
2783 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2784 /// and can be handled with ReinterpretElemWidth == 16.
2785 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I,
2786 int ReinterpretElemWidth) {
2787 assert(I.arg_size() == 1 || I.arg_size() == 2);
2788
2789 assert(I.getType()->isVectorTy());
2790 assert(I.getArgOperand(0)->getType()->isVectorTy());
2791
2792 FixedVectorType *ParamType =
2793 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2794 assert((I.arg_size() != 2) ||
2795 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2796
2797 [[maybe_unused]] FixedVectorType *ReturnType =
2798 cast<FixedVectorType>(I.getType());
2799 assert(ParamType->getNumElements() * I.arg_size() ==
2800 2 * ReturnType->getNumElements());
2801
2802 IRBuilder<> IRB(&I);
2803
2804 FixedVectorType *ReinterpretShadowTy = nullptr;
2805 assert(isAligned(Align(ReinterpretElemWidth),
2806 ParamType->getPrimitiveSizeInBits()));
2807 ReinterpretShadowTy = FixedVectorType::get(
2808 IRB.getIntNTy(ReinterpretElemWidth),
2809 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2810
2811 // Horizontal OR of shadow
2812 Value *FirstArgShadow = getShadow(&I, 0);
2813 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2814
2815 // If we had two parameters each with an odd number of elements, the total
2816 // number of elements is even, but we have never seen this in extant
2817 // instruction sets, so we enforce that each parameter must have an even
2818 // number of elements.
2819 assert(isAligned(
2820 Align(2),
2821 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2822
2823 Value *SecondArgShadow = nullptr;
2824 if (I.arg_size() == 2) {
2825 SecondArgShadow = getShadow(&I, 1);
2826 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2827 }
2828
2829 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, FirstArgShadow,
2830 SecondArgShadow);
2831
2832 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2833
2834 setShadow(&I, OrShadow);
2835 setOriginForNaryOp(I);
2836 }
2837
2838 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2839
2840 // Handle multiplication by constant.
2841 //
2842 // Handle a special case of multiplication by constant that may have one or
2843 // more zeros in the lower bits. This makes corresponding number of lower bits
2844 // of the result zero as well. We model it by shifting the other operand
2845 // shadow left by the required number of bits. Effectively, we transform
2846 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2847 // We use multiplication by 2**N instead of shift to cover the case of
2848 // multiplication by 0, which may occur in some elements of a vector operand.
2849 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2850 Value *OtherArg) {
2851 Constant *ShadowMul;
2852 Type *Ty = ConstArg->getType();
2853 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
2854 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
2855 Type *EltTy = VTy->getElementType();
2856 SmallVector<Constant *, 16> Elements;
2857 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2858 if (ConstantInt *Elt =
2859 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
2860 const APInt &V = Elt->getValue();
2861 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2862 Elements.push_back(ConstantInt::get(EltTy, V2));
2863 } else {
2864 Elements.push_back(ConstantInt::get(EltTy, 1));
2865 }
2866 }
2867 ShadowMul = ConstantVector::get(Elements);
2868 } else {
2869 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
2870 const APInt &V = Elt->getValue();
2871 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2872 ShadowMul = ConstantInt::get(Ty, V2);
2873 } else {
2874 ShadowMul = ConstantInt::get(Ty, 1);
2875 }
2876 }
2877
2878 IRBuilder<> IRB(&I);
2879 setShadow(&I,
2880 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
2881 setOrigin(&I, getOrigin(OtherArg));
2882 }
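 // Illustrative sketch (not part of the original pass): the shadow multiplier
 // chosen above, for a scalar constant. Multiplying by A * 2**B forces the low
 // B bits of the product to zero, so the other operand's shadow is multiplied
 // by 2**B; A itself never clears any bits and is ignored. The helper name is
 // hypothetical.
 static uint64_t exampleMulConstShadow(uint64_t OtherShadow, uint64_t Const) {
   if (Const == 0)
     return 0; // x * 0 == 0 is fully defined regardless of x's shadow.
   uint64_t PowerOfTwo = Const & -Const; // lowest set bit of Const, i.e. 2**B
   return OtherShadow * PowerOfTwo;      // e.g. Const == 12 (= 4 * 3) -> shadow * 4
 }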
2883
2884 void visitMul(BinaryOperator &I) {
2885 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
2886 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
2887 if (constOp0 && !constOp1)
2888 handleMulByConstant(I, constOp0, I.getOperand(1));
2889 else if (constOp1 && !constOp0)
2890 handleMulByConstant(I, constOp1, I.getOperand(0));
2891 else
2892 handleShadowOr(I);
2893 }
2894
2895 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2896 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2897 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2898 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2899 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2900 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2901
2902 void handleIntegerDiv(Instruction &I) {
2903 IRBuilder<> IRB(&I);
2904 // Strict on the second argument.
2905 insertCheckShadowOf(I.getOperand(1), &I);
2906 setShadow(&I, getShadow(&I, 0));
2907 setOrigin(&I, getOrigin(&I, 0));
2908 }
2909
2910 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2911 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2912 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2913 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2914
2915 // Floating point division is side-effect free. We cannot require that the
2916 // divisor is fully initialized and must propagate shadow. See PR37523.
2917 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2918 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2919
2920 /// Instrument == and != comparisons.
2921 ///
2922 /// Sometimes the comparison result is known even if some of the bits of the
2923 /// arguments are not.
2924 void handleEqualityComparison(ICmpInst &I) {
2925 IRBuilder<> IRB(&I);
2926 Value *A = I.getOperand(0);
2927 Value *B = I.getOperand(1);
2928 Value *Sa = getShadow(A);
2929 Value *Sb = getShadow(B);
2930
2931 // Get rid of pointers and vectors of pointers.
2932 // For ints (and vectors of ints), types of A and Sa match,
2933 // and this is a no-op.
2934 A = IRB.CreatePointerCast(A, Sa->getType());
2935 B = IRB.CreatePointerCast(B, Sb->getType());
2936
2937 // A == B <==> (C = A^B) == 0
2938 // A != B <==> (C = A^B) != 0
2939 // Sc = Sa | Sb
2940 Value *C = IRB.CreateXor(A, B);
2941 Value *Sc = IRB.CreateOr(Sa, Sb);
2942 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
2943 // Result is defined if one of the following is true
2944 // * there is a defined 1 bit in C
2945 // * C is fully defined
2946 // Si = !(C & ~Sc) && Sc
2947 Value *Zero = Constant::getNullValue(Sc->getType());
2948 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
2949 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
2950 Value *RHS =
2951 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
2952 Value *Si = IRB.CreateAnd(LHS, RHS);
2953 Si->setName("_msprop_icmp");
2954 setShadow(&I, Si);
2955 setOriginForNaryOp(I);
2956 }
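 // Illustrative sketch (not part of the original pass): the rule above on plain
 // integers. Returns true when the result of A == B is itself uninitialized.
 // The helper name is hypothetical.
 static bool exampleEqCmpShadow(uint64_t A, uint64_t Sa, uint64_t B,
                                uint64_t Sb) {
   uint64_t C = A ^ B;                         // A == B  <=>  C == 0
   uint64_t Sc = Sa | Sb;                      // undefined bits of C
   bool HasDefinedOneBit = (C & ~Sc) != 0;     // the values surely differ
   bool FullyDefined = Sc == 0;                // no undefined bits at all
   return !(HasDefinedOneBit || FullyDefined); // Si = !(C & ~Sc) && Sc
 }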
2957
2958 /// Instrument relational comparisons.
2959 ///
2960 /// This function does exact shadow propagation for all relational
2961 /// comparisons of integers, pointers and vectors of those.
2962 /// FIXME: output seems suboptimal when one of the operands is a constant
2963 void handleRelationalComparisonExact(ICmpInst &I) {
2964 IRBuilder<> IRB(&I);
2965 Value *A = I.getOperand(0);
2966 Value *B = I.getOperand(1);
2967 Value *Sa = getShadow(A);
2968 Value *Sb = getShadow(B);
2969
2970 // Get rid of pointers and vectors of pointers.
2971 // For ints (and vectors of ints), types of A and Sa match,
2972 // and this is a no-op.
2973 A = IRB.CreatePointerCast(A, Sa->getType());
2974 B = IRB.CreatePointerCast(B, Sb->getType());
2975
2976 // Let [a0, a1] be the interval of possible values of A, taking into account
2977 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
2978 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
2979 bool IsSigned = I.isSigned();
2980
2981 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
2982 if (IsSigned) {
2983 // Sign-flip to map from signed range to unsigned range. Relation A vs B
2984 // should be preserved, if checked with `getUnsignedPredicate()`.
2985 // Relationship between Amin, Amax, Bmin, Bmax also will not be
2986 // affected, as they are created by effectively adding/subtracting from
2987 // A (or B) a value, derived from shadow, with no overflow, either
2988 // before or after sign flip.
2989 APInt MinVal =
2990 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
2991 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
2992 }
2993 // Minimize undefined bits.
2994 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
2995 Value *Max = IRB.CreateOr(V, S);
2996 return std::make_pair(Min, Max);
2997 };
2998
2999 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3000 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3001 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3002 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3003
3004 Value *Si = IRB.CreateXor(S1, S2);
3005 setShadow(&I, Si);
3006 setOriginForNaryOp(I);
3007 }
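 // Illustrative sketch (not part of the original pass): the interval rule above
 // for an unsigned `A < B` on 8-bit values. Undefined bits widen each operand
 // to the range [V & ~S, V | S]; the comparison is defined iff both extreme
 // pairings agree. The helper name is hypothetical.
 static bool exampleUnsignedLessThanIsUndefined(uint8_t A, uint8_t Sa,
                                                uint8_t B, uint8_t Sb) {
   uint8_t Amin = A & ~Sa, Amax = A | Sa;
   uint8_t Bmin = B & ~Sb, Bmax = B | Sb;
   return (Amin < Bmax) != (Amax < Bmin); // shadow = S1 xor S2
 }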
3008
3009 /// Instrument signed relational comparisons.
3010 ///
3011 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3012 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3013 void handleSignedRelationalComparison(ICmpInst &I) {
3014 Constant *constOp;
3015 Value *op = nullptr;
3016 CmpInst::Predicate pre;
3017 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3018 op = I.getOperand(0);
3019 pre = I.getPredicate();
3020 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3021 op = I.getOperand(1);
3022 pre = I.getSwappedPredicate();
3023 } else {
3024 handleShadowOr(I);
3025 return;
3026 }
3027
3028 if ((constOp->isNullValue() &&
3029 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3030 (constOp->isAllOnesValue() &&
3031 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3032 IRBuilder<> IRB(&I);
3033 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3034 "_msprop_icmp_s");
3035 setShadow(&I, Shadow);
3036 setOrigin(&I, getOrigin(op));
3037 } else {
3038 handleShadowOr(I);
3039 }
3040 }
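 // Example (illustrative, not part of the original source): for `x < 0` the
 // outcome depends only on the sign bit of x, so the i1 shadow produced above
 // is simply the sign bit of x's shadow, computed as (shadow(x) s< 0).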
3041
3042 void visitICmpInst(ICmpInst &I) {
3043 if (!ClHandleICmp) {
3044 handleShadowOr(I);
3045 return;
3046 }
3047 if (I.isEquality()) {
3048 handleEqualityComparison(I);
3049 return;
3050 }
3051
3052 assert(I.isRelational());
3053 if (ClHandleICmpExact) {
3054 handleRelationalComparisonExact(I);
3055 return;
3056 }
3057 if (I.isSigned()) {
3058 handleSignedRelationalComparison(I);
3059 return;
3060 }
3061
3062 assert(I.isUnsigned());
3063 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3064 handleRelationalComparisonExact(I);
3065 return;
3066 }
3067
3068 handleShadowOr(I);
3069 }
3070
3071 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3072
3073 void handleShift(BinaryOperator &I) {
3074 IRBuilder<> IRB(&I);
3075 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3076 // Otherwise perform the same shift on S1.
3077 Value *S1 = getShadow(&I, 0);
3078 Value *S2 = getShadow(&I, 1);
3079 Value *S2Conv =
3080 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3081 Value *V2 = I.getOperand(1);
3082 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3083 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3084 setOriginForNaryOp(I);
3085 }
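 // Illustrative sketch (not part of the original pass): the shift rule above
 // for a scalar left shift, assuming ShiftAmt < 64. The helper name is
 // hypothetical.
 static uint64_t exampleShlShadow(uint64_t S1, uint64_t ShiftAmt, uint64_t S2) {
   // An uninitialized shift amount poisons the whole result; otherwise the
   // value's shadow is shifted exactly like the value.
   return (S2 != 0) ? ~0ULL : (S1 << ShiftAmt);
 }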
3086
3087 void visitShl(BinaryOperator &I) { handleShift(I); }
3088 void visitAShr(BinaryOperator &I) { handleShift(I); }
3089 void visitLShr(BinaryOperator &I) { handleShift(I); }
3090
3091 void handleFunnelShift(IntrinsicInst &I) {
3092 IRBuilder<> IRB(&I);
3093 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3094 // Otherwise perform the same shift on S0 and S1.
3095 Value *S0 = getShadow(&I, 0);
3096 Value *S1 = getShadow(&I, 1);
3097 Value *S2 = getShadow(&I, 2);
3098 Value *S2Conv =
3099 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3100 Value *V2 = I.getOperand(2);
3101 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3102 {S0, S1, V2});
3103 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3104 setOriginForNaryOp(I);
3105 }
3106
3107 /// Instrument llvm.memmove
3108 ///
3109 /// At this point we don't know if llvm.memmove will be inlined or not.
3110 /// If we don't instrument it and it gets inlined,
3111 /// our interceptor will not kick in and we will lose the memmove.
3112 /// If we instrument the call here, but it does not get inlined,
3113 /// we will memmove the shadow twice, which is bad in the case
3114 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3115 ///
3116 /// Similar situation exists for memcpy and memset.
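/// For instance (illustrative; assuming MS.MemmoveFn refers to the runtime's
/// __msan_memmove), a call such as
///   call void @llvm.memmove.p0.p0.i64(ptr %d, ptr %s, i64 %n, i1 false)
/// is rewritten below into
///   call void @__msan_memmove(ptr %d, ptr %s, i64 %n)
/// and the runtime moves both the data and the corresponding shadow.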
3117 void visitMemMoveInst(MemMoveInst &I) {
3118 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3119 IRBuilder<> IRB(&I);
3120 IRB.CreateCall(MS.MemmoveFn,
3121 {I.getArgOperand(0), I.getArgOperand(1),
3122 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3123 I.eraseFromParent();
3124 }
3125
3126 /// Instrument memcpy
3127 ///
3128 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3129 /// unfortunate as it may slow down small constant memcpys.
3130 /// FIXME: consider doing manual inline for small constant sizes and proper
3131 /// alignment.
3132 ///
3133 /// Note: This also handles memcpy.inline, which promises no calls to external
3134 /// functions as an optimization. However, with instrumentation enabled this
3135 /// is difficult to promise; additionally, we know that the MSan runtime
3136 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3137 /// instrumentation it's safe to turn memcpy.inline into a call to
3138 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3139 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3140 void visitMemCpyInst(MemCpyInst &I) {
3141 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3142 IRBuilder<> IRB(&I);
3143 IRB.CreateCall(MS.MemcpyFn,
3144 {I.getArgOperand(0), I.getArgOperand(1),
3145 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3146 I.eraseFromParent();
3147 }
3148
3149 // Same as memcpy.
3150 void visitMemSetInst(MemSetInst &I) {
3151 IRBuilder<> IRB(&I);
3152 IRB.CreateCall(
3153 MS.MemsetFn,
3154 {I.getArgOperand(0),
3155 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3156 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3157 I.eraseFromParent();
3158 }
3159
3160 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3161
3162 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3163
3164 /// Handle vector store-like intrinsics.
3165 ///
3166 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3167 /// has 1 pointer argument and 1 vector argument, returns void.
3168 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3169 assert(I.arg_size() == 2);
3170
3171 IRBuilder<> IRB(&I);
3172 Value *Addr = I.getArgOperand(0);
3173 Value *Shadow = getShadow(&I, 1);
3174 Value *ShadowPtr, *OriginPtr;
3175
3176 // We don't know the pointer alignment (could be unaligned SSE store!).
3177 // Have to assume the worst case.
3178 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3179 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3180 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3181
3182 if (ClCheckAccessAddress)
3183 insertCheckShadowOf(Addr, &I);
3184
3185 // FIXME: factor out common code from materializeStores
3186 if (MS.TrackOrigins)
3187 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3188 return true;
3189 }
3190
3191 /// Handle vector load-like intrinsics.
3192 ///
3193 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3194 /// has 1 pointer argument, returns a vector.
3195 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3196 assert(I.arg_size() == 1);
3197
3198 IRBuilder<> IRB(&I);
3199 Value *Addr = I.getArgOperand(0);
3200
3201 Type *ShadowTy = getShadowTy(&I);
3202 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3203 if (PropagateShadow) {
3204 // We don't know the pointer alignment (could be unaligned SSE load!).
3205 // Have to assume the worst case.
3206 const Align Alignment = Align(1);
3207 std::tie(ShadowPtr, OriginPtr) =
3208 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3209 setShadow(&I,
3210 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3211 } else {
3212 setShadow(&I, getCleanShadow(&I));
3213 }
3214
3215 if (ClCheckAccessAddress)
3216 insertCheckShadowOf(Addr, &I);
3217
3218 if (MS.TrackOrigins) {
3219 if (PropagateShadow)
3220 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3221 else
3222 setOrigin(&I, getCleanOrigin());
3223 }
3224 return true;
3225 }
3226
3227 /// Handle (SIMD arithmetic)-like intrinsics.
3228 ///
3229 /// Instrument intrinsics with any number of arguments of the same type [*],
3230 /// equal to the return type, plus a specified number of trailing flags of
3231 /// any type.
3232 ///
3233 /// [*] The type should be simple (no aggregates or pointers; vectors are
3234 /// fine).
3235 ///
3236 /// Caller guarantees that this intrinsic does not access memory.
3237 ///
3238 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3239 /// by this handler. See horizontalReduce().
3240 ///
3241 /// TODO: permutation intrinsics are also often incorrectly matched.
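/// For example, an intrinsic with the shape
///   <8 x i16> @llvm.x86.sse2.pmulh.w(<8 x i16>, <8 x i16>)
/// fits this pattern (with trailingFlags == 0): every argument has the return
/// type, so the argument shadows are simply OR-ed together below.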
3242 [[maybe_unused]] bool
3243 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3244 unsigned int trailingFlags) {
3245 Type *RetTy = I.getType();
3246 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3247 return false;
3248
3249 unsigned NumArgOperands = I.arg_size();
3250 assert(NumArgOperands >= trailingFlags);
3251 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3252 Type *Ty = I.getArgOperand(i)->getType();
3253 if (Ty != RetTy)
3254 return false;
3255 }
3256
3257 IRBuilder<> IRB(&I);
3258 ShadowAndOriginCombiner SC(this, IRB);
3259 for (unsigned i = 0; i < NumArgOperands; ++i)
3260 SC.Add(I.getArgOperand(i));
3261 SC.Done(&I);
3262
3263 return true;
3264 }
3265
3266 /// Returns whether it was able to heuristically instrument unknown
3267 /// intrinsics.
3268 ///
3269 /// The main purpose of this code is to do something reasonable with all
3270 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3271 /// We recognize several classes of intrinsics by their argument types and
3272 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3273 /// sure that we know what the intrinsic does.
3274 ///
3275 /// We special-case intrinsics where this approach fails. See llvm.bswap
3276 /// handling as an example of that.
3277 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3278 unsigned NumArgOperands = I.arg_size();
3279 if (NumArgOperands == 0)
3280 return false;
3281
3282 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3283 I.getArgOperand(1)->getType()->isVectorTy() &&
3284 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3285 // This looks like a vector store.
3286 return handleVectorStoreIntrinsic(I);
3287 }
3288
3289 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3290 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3291 // This looks like a vector load.
3292 return handleVectorLoadIntrinsic(I);
3293 }
3294
3295 if (I.doesNotAccessMemory())
3296 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3297 return true;
3298
3299 // FIXME: detect and handle SSE maskstore/maskload?
3300 // Some cases are now handled in handleAVXMasked{Load,Store}.
3301 return false;
3302 }
3303
3304 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3305 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3306 if (ClDumpStrictIntrinsics)
3307 dumpInst(I);
3308
3309 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3310 << "\n");
3311 return true;
3312 } else
3313 return false;
3314 }
3315
3316 void handleInvariantGroup(IntrinsicInst &I) {
3317 setShadow(&I, getShadow(&I, 0));
3318 setOrigin(&I, getOrigin(&I, 0));
3319 }
3320
3321 void handleLifetimeStart(IntrinsicInst &I) {
3322 if (!PoisonStack)
3323 return;
3324 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3325 if (AI)
3326 LifetimeStartList.push_back(std::make_pair(&I, AI));
3327 }
3328
3329 void handleBswap(IntrinsicInst &I) {
3330 IRBuilder<> IRB(&I);
3331 Value *Op = I.getArgOperand(0);
3332 Type *OpType = Op->getType();
3333 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3334 getShadow(Op)));
3335 setOrigin(&I, getOrigin(Op));
3336 }
3337
3338 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3339 // and a 1. If the input is all zeros, the output is fully initialized iff
3340 // !is_zero_poison.
3341 //
3342 // e.g., for ctlz, writing the most-significant bit first, if 0/1 are
3343 // initialized bits with concrete value 0/1, and ? is an uninitialized bit:
3344 // - 0001 0??? is fully initialized
3345 // - 000? ???? is fully uninitialized (*)
3346 // - ???? ???? is fully uninitialized
3347 // - 0000 0000 is fully uninitialized if is_zero_poison,
3348 // fully initialized otherwise
3349 //
3350 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3351 // only need to poison 4 bits.
3352 //
3353 // OutputShadow =
3354 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3355 // || (is_zero_poison && AllZeroSrc)
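// Worked example (illustrative), for an i8 ctlz with is_zero_poison == false:
// Src = 0b00010110 with SrcShadow = 0b00000011 gives ConcreteZerosCount = 3
// and ShadowZerosCount = 6; since 3 >= 6 is false the output is clean - the
// first definite 1 bit lies above every uninitialized bit, so ctlz is 3 no
// matter what the unknown bits are. Conversely, Src = 0b00000100 with
// SrcShadow = 0b00011000 gives 5 >= 3 with a non-zero shadow, so the output
// is poisoned: an unknown bit above the first definite 1 could change the
// count.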
3356 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3357 IRBuilder<> IRB(&I);
3358 Value *Src = I.getArgOperand(0);
3359 Value *SrcShadow = getShadow(Src);
3360
3361 Value *False = IRB.getInt1(false);
3362 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3363 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3364 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3365 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3366
3367 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3368 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3369
3370 Value *NotAllZeroShadow =
3371 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3372 Value *OutputShadow =
3373 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3374
3375 // If zero poison is requested, mix in with the shadow
3376 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3377 if (!IsZeroPoison->isZeroValue()) {
3378 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3379 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3380 }
3381
3382 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3383
3384 setShadow(&I, OutputShadow);
3385 setOriginForNaryOp(I);
3386 }
3387
3388 /// Handle Arm NEON vector convert intrinsics.
3389 ///
3390 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3391 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64(double)
3392 ///
3393 /// For x86 SSE vector convert intrinsics, see
3394 /// handleSSEVectorConvertIntrinsic().
3395 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I) {
3396 assert(I.arg_size() == 1);
3397
3398 IRBuilder<> IRB(&I);
3399 Value *S0 = getShadow(&I, 0);
3400
3401 /// For scalars:
3402 /// Since they are converting from floating-point to integer, the output is
3403 /// - fully uninitialized if *any* bit of the input is uninitialized
3404 /// - fully initialized if all bits of the input are initialized
3405 /// We apply the same principle on a per-field basis for vectors.
3406 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3407 getShadowTy(&I));
3408 setShadow(&I, OutShadow);
3409 setOriginForNaryOp(I);
3410 }
3411
3412 /// Some instructions have additional zero-elements in the return type
3413 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3414 ///
3415 /// This function will return a vector type with the same number of elements
3416 /// as the input, but the same per-element width as the return value, e.g.,
3417 /// <8 x i8>.
3418 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3419 assert(isa<FixedVectorType>(getShadowTy(&I)));
3420 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3421
3422 // TODO: generalize beyond 2x?
3423 if (ShadowType->getElementCount() ==
3424 cast<VectorType>(Src->getType())->getElementCount() * 2)
3425 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3426
3427 assert(ShadowType->getElementCount() ==
3428 cast<VectorType>(Src->getType())->getElementCount());
3429
3430 return ShadowType;
3431 }
3432
3433 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3434 /// to match the length of the shadow for the instruction.
3435 /// If scalar types of the vectors are different, it will use the type of the
3436 /// input vector.
3437 /// This is more type-safe than CreateShadowCast().
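/// Illustrative example: a <4 x i32> shadow being matched against an
/// <8 x i32> instruction shadow is widened with a shufflevector whose mask is
/// <0,1,2,3,4,5,6,7>; indices 4-7 select from the clean (all-zero) shadow, so
/// the extra elements come out as initialized.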
3438 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3439 IRBuilder<> IRB(&I);
3441 assert(isa<FixedVectorType>(I.getType()));
3442
3443 Value *FullShadow = getCleanShadow(&I);
3444 unsigned ShadowNumElems =
3445 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3446 unsigned FullShadowNumElems =
3447 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3448
3449 assert((ShadowNumElems == FullShadowNumElems) ||
3450 (ShadowNumElems * 2 == FullShadowNumElems));
3451
3452 if (ShadowNumElems == FullShadowNumElems) {
3453 FullShadow = Shadow;
3454 } else {
3455 // TODO: generalize beyond 2x?
3456 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3457 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3458
3459 // Append zeros
3460 FullShadow =
3461 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3462 }
3463
3464 return FullShadow;
3465 }
3466
3467 /// Handle x86 SSE vector conversion.
3468 ///
3469 /// e.g., single-precision to half-precision conversion:
3470 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3471 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3472 ///
3473 /// floating-point to integer:
3474 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3475 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3476 ///
3477 /// Note: if the output has more elements, they are zero-initialized (and
3478 /// therefore the shadow will also be initialized).
3479 ///
3480 /// This differs from handleSSEVectorConvertIntrinsic() because it
3481 /// propagates uninitialized shadow (instead of checking the shadow).
3482 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3483 bool HasRoundingMode) {
3484 if (HasRoundingMode) {
3485 assert(I.arg_size() == 2);
3486 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3487 assert(RoundingMode->getType()->isIntegerTy());
3488 } else {
3489 assert(I.arg_size() == 1);
3490 }
3491
3492 Value *Src = I.getArgOperand(0);
3493 assert(Src->getType()->isVectorTy());
3494
3495 // The return type might have more elements than the input.
3496 // Temporarily shrink the return type's number of elements.
3497 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3498
3499 IRBuilder<> IRB(&I);
3500 Value *S0 = getShadow(&I, 0);
3501
3502 /// For scalars:
3503 /// Since they are converting to and/or from floating-point, the output is:
3504 /// - fully uninitialized if *any* bit of the input is uninitialized
3505 /// - fully initialized if all bits of the input are initialized
3506 /// We apply the same principle on a per-field basis for vectors.
3507 Value *Shadow =
3508 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3509
3510 // The return type might have more elements than the input.
3511 // Extend the return type back to its original width if necessary.
3512 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3513
3514 setShadow(&I, FullShadow);
3515 setOriginForNaryOp(I);
3516 }
3517
3518 // Instrument x86 SSE vector convert intrinsic.
3519 //
3520 // This function instruments intrinsics like cvtsi2ss:
3521 // %Out = int_xxx_cvtyyy(%ConvertOp)
3522 // or
3523 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3524 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3525 // number \p Out elements, and (if has 2 arguments) copies the rest of the
3526 // elements from \p CopyOp.
3527 // In most cases the conversion involves a floating-point value which may trigger a
3528 // hardware exception when not fully initialized. For this reason we require
3529 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3530 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3531 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3532 // return a fully initialized value.
3533 //
3534 // For Arm NEON vector convert intrinsics, see
3535 // handleNEONVectorConvertIntrinsic().
3536 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3537 bool HasRoundingMode = false) {
3538 IRBuilder<> IRB(&I);
3539 Value *CopyOp, *ConvertOp;
3540
3541 assert((!HasRoundingMode ||
3542 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3543 "Invalid rounding mode");
3544
3545 switch (I.arg_size() - HasRoundingMode) {
3546 case 2:
3547 CopyOp = I.getArgOperand(0);
3548 ConvertOp = I.getArgOperand(1);
3549 break;
3550 case 1:
3551 ConvertOp = I.getArgOperand(0);
3552 CopyOp = nullptr;
3553 break;
3554 default:
3555 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3556 }
3557
3558 // The first *NumUsedElements* elements of ConvertOp are converted to the
3559 // same number of output elements. The rest of the output is copied from
3560 // CopyOp, or (if not available) filled with zeroes.
3561 // Combine shadow for elements of ConvertOp that are used in this operation,
3562 // and insert a check.
3563 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3564 // int->any conversion.
3565 Value *ConvertShadow = getShadow(ConvertOp);
3566 Value *AggShadow = nullptr;
3567 if (ConvertOp->getType()->isVectorTy()) {
3568 AggShadow = IRB.CreateExtractElement(
3569 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3570 for (int i = 1; i < NumUsedElements; ++i) {
3571 Value *MoreShadow = IRB.CreateExtractElement(
3572 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3573 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3574 }
3575 } else {
3576 AggShadow = ConvertShadow;
3577 }
3578 assert(AggShadow->getType()->isIntegerTy());
3579 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3580
3581 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3582 // ConvertOp.
3583 if (CopyOp) {
3584 assert(CopyOp->getType() == I.getType());
3585 assert(CopyOp->getType()->isVectorTy());
3586 Value *ResultShadow = getShadow(CopyOp);
3587 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3588 for (int i = 0; i < NumUsedElements; ++i) {
3589 ResultShadow = IRB.CreateInsertElement(
3590 ResultShadow, ConstantInt::getNullValue(EltTy),
3591 ConstantInt::get(IRB.getInt32Ty(), i));
3592 }
3593 setShadow(&I, ResultShadow);
3594 setOrigin(&I, getOrigin(CopyOp));
3595 } else {
3596 setShadow(&I, getCleanShadow(&I));
3597 setOrigin(&I, getCleanOrigin());
3598 }
3599 }
3600
3601 // Given a scalar or vector, extract the lower 64 bits (or fewer), and return
3602 // all zeroes if they are zero, and all ones otherwise.
3603 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3604 if (S->getType()->isVectorTy())
3605 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3606 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3607 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3608 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3609 }
3610
3611 // Given a vector, extract its first element, and return all
3612 // zeroes if it is zero, and all ones otherwise.
3613 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3614 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3615 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3616 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3617 }
3618
3619 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3620 Type *T = S->getType();
3621 assert(T->isVectorTy());
3622 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3623 return IRB.CreateSExt(S2, T);
3624 }
3625
3626 // Instrument vector shift intrinsic.
3627 //
3628 // This function instruments intrinsics like int_x86_avx2_psll_w.
3629 // Intrinsic shifts %In by %ShiftSize bits.
3630 // %ShiftSize may be a vector. In that case the lower 64 bits determine the
3631 // shift size, and the rest is ignored. Behavior is defined even if the shift
3632 // size is greater than the register (or field) width.
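// In shadow terms (sketch): the result shadow is the same shift applied to
// S1, OR-ed with all-ones whenever the (used part of the) shift amount's
// shadow is non-zero. E.g., if the shift amount is fully initialized, only
// S1 shifted by the same amount remains; if it has any poisoned bit, every
// result bit is poisoned.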
3633 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3634 assert(I.arg_size() == 2);
3635 IRBuilder<> IRB(&I);
3636 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3637 // Otherwise perform the same shift on S1.
3638 Value *S1 = getShadow(&I, 0);
3639 Value *S2 = getShadow(&I, 1);
3640 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3641 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3642 Value *V1 = I.getOperand(0);
3643 Value *V2 = I.getOperand(1);
3644 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3645 {IRB.CreateBitCast(S1, V1->getType()), V2});
3646 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3647 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3648 setOriginForNaryOp(I);
3649 }
3650
3651 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3652 // vectors.
3653 Type *getMMXVectorTy(unsigned EltSizeInBits,
3654 unsigned X86_MMXSizeInBits = 64) {
3655 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3656 "Illegal MMX vector element size");
3657 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3658 X86_MMXSizeInBits / EltSizeInBits);
3659 }
3660
3661 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3662 // intrinsic.
3663 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3664 switch (id) {
3665 case Intrinsic::x86_sse2_packsswb_128:
3666 case Intrinsic::x86_sse2_packuswb_128:
3667 return Intrinsic::x86_sse2_packsswb_128;
3668
3669 case Intrinsic::x86_sse2_packssdw_128:
3670 case Intrinsic::x86_sse41_packusdw:
3671 return Intrinsic::x86_sse2_packssdw_128;
3672
3673 case Intrinsic::x86_avx2_packsswb:
3674 case Intrinsic::x86_avx2_packuswb:
3675 return Intrinsic::x86_avx2_packsswb;
3676
3677 case Intrinsic::x86_avx2_packssdw:
3678 case Intrinsic::x86_avx2_packusdw:
3679 return Intrinsic::x86_avx2_packssdw;
3680
3681 case Intrinsic::x86_mmx_packsswb:
3682 case Intrinsic::x86_mmx_packuswb:
3683 return Intrinsic::x86_mmx_packsswb;
3684
3685 case Intrinsic::x86_mmx_packssdw:
3686 return Intrinsic::x86_mmx_packssdw;
3687
3688 case Intrinsic::x86_avx512_packssdw_512:
3689 case Intrinsic::x86_avx512_packusdw_512:
3690 return Intrinsic::x86_avx512_packssdw_512;
3691
3692 case Intrinsic::x86_avx512_packsswb_512:
3693 case Intrinsic::x86_avx512_packuswb_512:
3694 return Intrinsic::x86_avx512_packsswb_512;
3695
3696 default:
3697 llvm_unreachable("unexpected intrinsic id");
3698 }
3699 }
3700
3701 // Instrument vector pack intrinsic.
3702 //
3703 // This function instruments intrinsics like x86_mmx_packsswb, which
3704 // pack elements of 2 input vectors into half as many bits with saturation.
3705 // Shadow is propagated with the signed variant of the same intrinsic applied
3706 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3707 // MMXEltSizeInBits is used only for x86mmx arguments.
3708 //
3709 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
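// Why the signed variant works (illustrative): each lane's shadow is first
// widened to all-ones or all-zero, and a signed saturating pack maps 0 to 0
// and -1 (0xFFFF) to -1 (0xFF), so a fully-poisoned wide lane packs to a
// fully-poisoned narrow lane and a clean lane stays clean.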
3710 void handleVectorPackIntrinsic(IntrinsicInst &I,
3711 unsigned MMXEltSizeInBits = 0) {
3712 assert(I.arg_size() == 2);
3713 IRBuilder<> IRB(&I);
3714 Value *S1 = getShadow(&I, 0);
3715 Value *S2 = getShadow(&I, 1);
3716 assert(S1->getType()->isVectorTy());
3717
3718 // SExt and ICmpNE below must apply to individual elements of input vectors.
3719 // In case of x86mmx arguments, cast them to appropriate vector types and
3720 // back.
3721 Type *T =
3722 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3723 if (MMXEltSizeInBits) {
3724 S1 = IRB.CreateBitCast(S1, T);
3725 S2 = IRB.CreateBitCast(S2, T);
3726 }
3727 Value *S1_ext =
3728 IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3729 Value *S2_ext =
3730 IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3731 if (MMXEltSizeInBits) {
3732 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3733 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3734 }
3735
3736 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3737 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3738 "_msprop_vector_pack");
3739 if (MMXEltSizeInBits)
3740 S = IRB.CreateBitCast(S, getShadowTy(&I));
3741 setShadow(&I, S);
3742 setOriginForNaryOp(I);
3743 }
3744
3745 // Convert `Mask` into `<n x i1>`.
3746 Constant *createDppMask(unsigned Width, unsigned Mask) {
3747 SmallVector<Constant *, 4> R(Width);
3748 for (auto &M : R) {
3749 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3750 Mask >>= 1;
3751 }
3752 return ConstantVector::get(R);
3753 }
3754
3755 // Calculate output shadow as array of booleans `<n x i1>`, assuming if any
3756 // arg is poisoned, entire dot product is poisoned.
3757 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3758 unsigned DstMask) {
3759 const unsigned Width =
3760 cast<FixedVectorType>(S->getType())->getNumElements();
3761
3762 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3763 Constant::getNullValue(S->getType()));
3764 Value *SElem = IRB.CreateOrReduce(S);
3765 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3766 Value *DstMaskV = createDppMask(Width, DstMask);
3767
3768 return IRB.CreateSelect(
3769 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3770 }
3771
3772 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3773 //
3774 // The 2- and 4-element versions produce a single scalar dot product and then
3775 // put it into the elements of the output vector selected by the 4 lowest bits
3776 // of the mask. The top 4 bits of the mask control which elements of the input
3777 // are used for the dot product.
3778 //
3779 // The 8-element version's mask still has only 4 bits for the input and 4 bits
3780 // for the output mask. According to the spec it simply operates as the
3781 // 4-element version on the first 4 elements of the inputs and output, and then
3782 // on the last 4 elements of the inputs and output.
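// Worked example (illustrative): for a 4-element dpps with mask 0x31,
// SrcMask = 0x3 selects input elements 0 and 1 for the dot product and
// DstMask = 0x1 writes the result only to output element 0. If either
// selected input element is poisoned, output element 0 becomes poisoned;
// the remaining output elements are written with 0.0 and stay clean.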
3783 void handleDppIntrinsic(IntrinsicInst &I) {
3784 IRBuilder<> IRB(&I);
3785
3786 Value *S0 = getShadow(&I, 0);
3787 Value *S1 = getShadow(&I, 1);
3788 Value *S = IRB.CreateOr(S0, S1);
3789
3790 const unsigned Width =
3791 cast<FixedVectorType>(S->getType())->getNumElements();
3792 assert(Width == 2 || Width == 4 || Width == 8);
3793
3794 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3795 const unsigned SrcMask = Mask >> 4;
3796 const unsigned DstMask = Mask & 0xf;
3797
3798 // Calculate shadow as `<n x i1>`.
3799 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3800 if (Width == 8) {
3801 // The first 4 elements of the shadow are already calculated. findDppPoisonedOutput
3802 // operates on 32-bit masks, so we can just shift the masks and repeat.
3803 SI1 = IRB.CreateOr(
3804 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3805 }
3806 // Extend to the real size of the shadow, poisoning either all or none of the
3807 // bits of an element.
3808 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3809
3810 setShadow(&I, S);
3811 setOriginForNaryOp(I);
3812 }
3813
3814 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3815 C = CreateAppToShadowCast(IRB, C);
3816 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3817 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3818 C = IRB.CreateAShr(C, ElSize - 1);
3819 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3820 return IRB.CreateTrunc(C, FVT);
3821 }
3822
3823 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3824 void handleBlendvIntrinsic(IntrinsicInst &I) {
3825 Value *C = I.getOperand(2);
3826 Value *T = I.getOperand(1);
3827 Value *F = I.getOperand(0);
3828
3829 Value *Sc = getShadow(&I, 2);
3830 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3831
3832 {
3833 IRBuilder<> IRB(&I);
3834 // Extract top bit from condition and its shadow.
3835 C = convertBlendvToSelectMask(IRB, C);
3836 Sc = convertBlendvToSelectMask(IRB, Sc);
3837
3838 setShadow(C, Sc);
3839 setOrigin(C, Oc);
3840 }
3841
3842 handleSelectLikeInst(I, C, T, F);
3843 }
3844
3845 // Instrument sum-of-absolute-differences intrinsic.
3846 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3847 const unsigned SignificantBitsPerResultElement = 16;
3848 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
3849 unsigned ZeroBitsPerResultElement =
3850 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3851
3852 IRBuilder<> IRB(&I);
3853 auto *Shadow0 = getShadow(&I, 0);
3854 auto *Shadow1 = getShadow(&I, 1);
3855 Value *S = IRB.CreateOr(Shadow0, Shadow1);
3856 S = IRB.CreateBitCast(S, ResTy);
3857 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
3858 ResTy);
3859 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
3860 S = IRB.CreateBitCast(S, getShadowTy(&I));
3861 setShadow(&I, S);
3862 setOriginForNaryOp(I);
3863 }
3864
3865 // Instrument multiply-add(-accumulate)? intrinsics.
3866 //
3867 // e.g., Two operands:
3868 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3869 //
3870 // Two operands which require an EltSizeInBits override:
3871 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3872 //
3873 // Three operands:
3874 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3875 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3876 // (this is equivalent to multiply-add on %a and %b, followed by
3877 // adding/"accumulating" %s. "Accumulation" stores the result in one
3878 // of the source registers, but this accumulate vs. add distinction
3879 // is lost when dealing with LLVM intrinsics.)
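// Illustrative example of the shadow rule used below: with pmadd.wd, an
// output lane computed from the pairs (a0, b0) and (a1, b1) is clean if each
// product is clean, where a product is treated as clean when both factors are
// clean or when one factor is a fully-initialized zero. So a0 = 0 (clean),
// b0 = poisoned, a1 = 3, b1 = 5 still yields a clean lane, because
// 0 * (anything) is a defined 0.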
3880 void handleVectorPmaddIntrinsic(IntrinsicInst &I, unsigned ReductionFactor,
3881 unsigned EltSizeInBits = 0) {
3882 IRBuilder<> IRB(&I);
3883
3884 [[maybe_unused]] FixedVectorType *ReturnType =
3885 cast<FixedVectorType>(I.getType());
3886 assert(isa<FixedVectorType>(ReturnType));
3887
3888 // Vectors A and B, and shadows
3889 Value *Va = nullptr;
3890 Value *Vb = nullptr;
3891 Value *Sa = nullptr;
3892 Value *Sb = nullptr;
3893
3894 assert(I.arg_size() == 2 || I.arg_size() == 3);
3895 if (I.arg_size() == 2) {
3896 Va = I.getOperand(0);
3897 Vb = I.getOperand(1);
3898
3899 Sa = getShadow(&I, 0);
3900 Sb = getShadow(&I, 1);
3901 } else if (I.arg_size() == 3) {
3902 // Operand 0 is the accumulator. We will deal with that below.
3903 Va = I.getOperand(1);
3904 Vb = I.getOperand(2);
3905
3906 Sa = getShadow(&I, 1);
3907 Sb = getShadow(&I, 2);
3908 }
3909
3910 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
3911 assert(ParamType == Vb->getType());
3912
3913 assert(ParamType->getPrimitiveSizeInBits() ==
3914 ReturnType->getPrimitiveSizeInBits());
3915
3916 if (I.arg_size() == 3) {
3917 [[maybe_unused]] auto *AccumulatorType =
3918 cast<FixedVectorType>(I.getOperand(0)->getType());
3919 assert(AccumulatorType == ReturnType);
3920 }
3921
3922 FixedVectorType *ImplicitReturnType = ReturnType;
3923 // Step 1: instrument multiplication of corresponding vector elements
3924 if (EltSizeInBits) {
3925 ImplicitReturnType = cast<FixedVectorType>(
3926 getMMXVectorTy(EltSizeInBits * ReductionFactor,
3927 ParamType->getPrimitiveSizeInBits()));
3928 ParamType = cast<FixedVectorType>(
3929 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
3930
3931 Va = IRB.CreateBitCast(Va, ParamType);
3932 Vb = IRB.CreateBitCast(Vb, ParamType);
3933
3934 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
3935 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
3936 } else {
3937 assert(ParamType->getNumElements() ==
3938 ReturnType->getNumElements() * ReductionFactor);
3939 }
3940
3941 // Multiplying an *initialized* zero by an uninitialized element results in
3942 // an initialized zero element.
3943 //
3944 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
3945 // results in an unpoisoned value. We can therefore adapt the visitAnd()
3946 // instrumentation:
3947 // OutShadow = (SaNonZero & SbNonZero)
3948 // | (VaNonZero & SbNonZero)
3949 // | (SaNonZero & VbNonZero)
3950 // where non-zero is checked on a per-element basis (not per bit).
3951 Value *SZero = Constant::getNullValue(Va->getType());
3952 Value *VZero = Constant::getNullValue(Sa->getType());
3953 Value *SaNonZero = IRB.CreateICmpNE(Sa, SZero);
3954 Value *SbNonZero = IRB.CreateICmpNE(Sb, SZero);
3955 Value *VaNonZero = IRB.CreateICmpNE(Va, VZero);
3956 Value *VbNonZero = IRB.CreateICmpNE(Vb, VZero);
3957
3958 Value *SaAndSbNonZero = IRB.CreateAnd(SaNonZero, SbNonZero);
3959 Value *VaAndSbNonZero = IRB.CreateAnd(VaNonZero, SbNonZero);
3960 Value *SaAndVbNonZero = IRB.CreateAnd(SaNonZero, VbNonZero);
3961
3962 // Each element of the vector is represented by a single bit (poisoned or
3963 // not) e.g., <8 x i1>.
3964 Value *And = IRB.CreateOr({SaAndSbNonZero, VaAndSbNonZero, SaAndVbNonZero});
3965
3966 // Extend <8 x i1> to <8 x i16>.
3967 // (The real pmadd intrinsic would have computed intermediate values of
3968 // <8 x i32>, but that is irrelevant for our shadow purposes because we
3969 // consider each element to be either fully initialized or fully
3970 // uninitialized.)
3971 And = IRB.CreateSExt(And, Sa->getType());
3972
3973 // Step 2: instrument horizontal add
3974 // We don't need bit-precise horizontalReduce because we only want to check
3975 // if each pair/quad of elements is fully zero.
3976 // Cast to <4 x i32>.
3977 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
3978
3979 // Compute <4 x i1>, then extend back to <4 x i32>.
3980 Value *OutShadow = IRB.CreateSExt(
3981 IRB.CreateICmpNE(Horizontal,
3982 Constant::getNullValue(Horizontal->getType())),
3983 ImplicitReturnType);
3984
3985 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
3986 // AVX, it is already correct).
3987 if (EltSizeInBits)
3988 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
3989
3990 // Step 3 (if applicable): instrument accumulator
3991 if (I.arg_size() == 3)
3992 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
3993
3994 setShadow(&I, OutShadow);
3995 setOriginForNaryOp(I);
3996 }
3997
3998 // Instrument compare-packed intrinsic.
3999 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4000 // all-ones shadow.
4001 void handleVectorComparePackedIntrinsic(IntrinsicInst &I) {
4002 IRBuilder<> IRB(&I);
4003 Type *ResTy = getShadowTy(&I);
4004 auto *Shadow0 = getShadow(&I, 0);
4005 auto *Shadow1 = getShadow(&I, 1);
4006 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4007 Value *S = IRB.CreateSExt(
4008 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4009 setShadow(&I, S);
4010 setOriginForNaryOp(I);
4011 }
4012
4013 // Instrument compare-scalar intrinsic.
4014 // This handles both cmp* intrinsics which return the result in the first
4015 // element of a vector, and comi* which return the result as i32.
4016 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4017 IRBuilder<> IRB(&I);
4018 auto *Shadow0 = getShadow(&I, 0);
4019 auto *Shadow1 = getShadow(&I, 1);
4020 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4021 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4022 setShadow(&I, S);
4023 setOriginForNaryOp(I);
4024 }
4025
4026 // Instrument generic vector reduction intrinsics
4027 // by ORing together all their fields.
4028 //
4029 // If AllowShadowCast is true, the return type does not need to be the same
4030 // type as the fields
4031 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4032 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4033 assert(I.arg_size() == 1);
4034
4035 IRBuilder<> IRB(&I);
4036 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4037 if (AllowShadowCast)
4038 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4039 else
4040 assert(S->getType() == getShadowTy(&I));
4041 setShadow(&I, S);
4042 setOriginForNaryOp(I);
4043 }
4044
4045 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4046 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4047 // %a1)
4048 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4049 //
4050 // The type of the return value, initial starting value, and elements of the
4051 // vector must be identical.
4052 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4053 assert(I.arg_size() == 2);
4054
4055 IRBuilder<> IRB(&I);
4056 Value *Shadow0 = getShadow(&I, 0);
4057 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4058 assert(Shadow0->getType() == Shadow1->getType());
4059 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4060 assert(S->getType() == getShadowTy(&I));
4061 setShadow(&I, S);
4062 setOriginForNaryOp(I);
4063 }
4064
4065 // Instrument vector.reduce.or intrinsic.
4066 // Valid (non-poisoned) set bits in the operand pull low the
4067 // corresponding shadow bits.
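// Illustrative example with <2 x i8>: if element 0 is a clean 0b00000001 and
// element 1 is fully poisoned, bit 0 of the reduction is 1 no matter what
// element 1 holds, so only bit 0 of the result is clean and the result
// shadow is 0b11111110.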
4068 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4069 assert(I.arg_size() == 1);
4070
4071 IRBuilder<> IRB(&I);
4072 Value *OperandShadow = getShadow(&I, 0);
4073 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4074 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4075 // Bit N is clean if any field's bit N is 1 and unpoisoned
4076 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4077 // Otherwise, it is clean if every field's bit N is unpoisoned
4078 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4079 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4080
4081 setShadow(&I, S);
4082 setOrigin(&I, getOrigin(&I, 0));
4083 }
4084
4085 // Instrument vector.reduce.and intrinsic.
4086 // Valid (non-poisoned) unset bits in the operand pull down the
4087 // corresponding shadow bits.
4088 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4089 assert(I.arg_size() == 1);
4090
4091 IRBuilder<> IRB(&I);
4092 Value *OperandShadow = getShadow(&I, 0);
4093 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4094 // Bit N is clean if any field's bit N is 0 and unpoisoned
4095 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4096 // Otherwise, it is clean if every field's bit N is unpoisoned
4097 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4098 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4099
4100 setShadow(&I, S);
4101 setOrigin(&I, getOrigin(&I, 0));
4102 }
4103
4104 void handleStmxcsr(IntrinsicInst &I) {
4105 IRBuilder<> IRB(&I);
4106 Value *Addr = I.getArgOperand(0);
4107 Type *Ty = IRB.getInt32Ty();
4108 Value *ShadowPtr =
4109 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4110
4111 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4112
4113 if (ClCheckAccessAddress)
4114 insertCheckShadowOf(Addr, &I);
4115 }
4116
4117 void handleLdmxcsr(IntrinsicInst &I) {
4118 if (!InsertChecks)
4119 return;
4120
4121 IRBuilder<> IRB(&I);
4122 Value *Addr = I.getArgOperand(0);
4123 Type *Ty = IRB.getInt32Ty();
4124 const Align Alignment = Align(1);
4125 Value *ShadowPtr, *OriginPtr;
4126 std::tie(ShadowPtr, OriginPtr) =
4127 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4128
4129 if (ClCheckAccessAddress)
4130 insertCheckShadowOf(Addr, &I);
4131
4132 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4133 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4134 : getCleanOrigin();
4135 insertCheckShadow(Shadow, Origin, &I);
4136 }
4137
4138 void handleMaskedExpandLoad(IntrinsicInst &I) {
4139 IRBuilder<> IRB(&I);
4140 Value *Ptr = I.getArgOperand(0);
4141 MaybeAlign Align = I.getParamAlign(0);
4142 Value *Mask = I.getArgOperand(1);
4143 Value *PassThru = I.getArgOperand(2);
4144
4145 if (ClCheckAccessAddress) {
4146 insertCheckShadowOf(Ptr, &I);
4147 insertCheckShadowOf(Mask, &I);
4148 }
4149
4150 if (!PropagateShadow) {
4151 setShadow(&I, getCleanShadow(&I));
4152 setOrigin(&I, getCleanOrigin());
4153 return;
4154 }
4155
4156 Type *ShadowTy = getShadowTy(&I);
4157 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4158 auto [ShadowPtr, OriginPtr] =
4159 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4160
4161 Value *Shadow =
4162 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4163 getShadow(PassThru), "_msmaskedexpload");
4164
4165 setShadow(&I, Shadow);
4166
4167 // TODO: Store origins.
4168 setOrigin(&I, getCleanOrigin());
4169 }
4170
4171 void handleMaskedCompressStore(IntrinsicInst &I) {
4172 IRBuilder<> IRB(&I);
4173 Value *Values = I.getArgOperand(0);
4174 Value *Ptr = I.getArgOperand(1);
4175 MaybeAlign Align = I.getParamAlign(1);
4176 Value *Mask = I.getArgOperand(2);
4177
4178 if (ClCheckAccessAddress) {
4179 insertCheckShadowOf(Ptr, &I);
4180 insertCheckShadowOf(Mask, &I);
4181 }
4182
4183 Value *Shadow = getShadow(Values);
4184 Type *ElementShadowTy =
4185 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4186 auto [ShadowPtr, OriginPtrs] =
4187 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4188
4189 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4190
4191 // TODO: Store origins.
4192 }
4193
4194 void handleMaskedGather(IntrinsicInst &I) {
4195 IRBuilder<> IRB(&I);
4196 Value *Ptrs = I.getArgOperand(0);
4197 const Align Alignment(
4198 cast<ConstantInt>(I.getArgOperand(1))->getZExtValue());
4199 Value *Mask = I.getArgOperand(2);
4200 Value *PassThru = I.getArgOperand(3);
4201
4202 Type *PtrsShadowTy = getShadowTy(Ptrs);
4203 if (ClCheckAccessAddress) {
4204 insertCheckShadowOf(Mask, &I);
4205 Value *MaskedPtrShadow = IRB.CreateSelect(
4206 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4207 "_msmaskedptrs");
4208 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4209 }
4210
4211 if (!PropagateShadow) {
4212 setShadow(&I, getCleanShadow(&I));
4213 setOrigin(&I, getCleanOrigin());
4214 return;
4215 }
4216
4217 Type *ShadowTy = getShadowTy(&I);
4218 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4219 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4220 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4221
4222 Value *Shadow =
4223 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4224 getShadow(PassThru), "_msmaskedgather");
4225
4226 setShadow(&I, Shadow);
4227
4228 // TODO: Store origins.
4229 setOrigin(&I, getCleanOrigin());
4230 }
4231
4232 void handleMaskedScatter(IntrinsicInst &I) {
4233 IRBuilder<> IRB(&I);
4234 Value *Values = I.getArgOperand(0);
4235 Value *Ptrs = I.getArgOperand(1);
4236 const Align Alignment(
4237 cast<ConstantInt>(I.getArgOperand(2))->getZExtValue());
4238 Value *Mask = I.getArgOperand(3);
4239
4240 Type *PtrsShadowTy = getShadowTy(Ptrs);
4241 if (ClCheckAccessAddress) {
4242 insertCheckShadowOf(Mask, &I);
4243 Value *MaskedPtrShadow = IRB.CreateSelect(
4244 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4245 "_msmaskedptrs");
4246 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4247 }
4248
4249 Value *Shadow = getShadow(Values);
4250 Type *ElementShadowTy =
4251 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4252 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4253 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4254
4255 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4256
4257 // TODO: Store origin.
4258 }
4259
4260 // Intrinsic::masked_store
4261 //
4262 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4263 // stores are lowered to Intrinsic::masked_store.
4264 void handleMaskedStore(IntrinsicInst &I) {
4265 IRBuilder<> IRB(&I);
4266 Value *V = I.getArgOperand(0);
4267 Value *Ptr = I.getArgOperand(1);
4268 const Align Alignment(
4269 cast<ConstantInt>(I.getArgOperand(2))->getZExtValue());
4270 Value *Mask = I.getArgOperand(3);
4271 Value *Shadow = getShadow(V);
4272
4273 if (ClCheckAccessAddress) {
4274 insertCheckShadowOf(Ptr, &I);
4275 insertCheckShadowOf(Mask, &I);
4276 }
4277
4278 Value *ShadowPtr;
4279 Value *OriginPtr;
4280 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4281 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4282
4283 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4284
4285 if (!MS.TrackOrigins)
4286 return;
4287
4288 auto &DL = F.getDataLayout();
4289 paintOrigin(IRB, getOrigin(V), OriginPtr,
4290 DL.getTypeStoreSize(Shadow->getType()),
4291 std::max(Alignment, kMinOriginAlignment));
4292 }
4293
4294 // Intrinsic::masked_load
4295 //
4296 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4297 // loads are lowered to Intrinsic::masked_load.
4298 void handleMaskedLoad(IntrinsicInst &I) {
4299 IRBuilder<> IRB(&I);
4300 Value *Ptr = I.getArgOperand(0);
4301 const Align Alignment(
4302 cast<ConstantInt>(I.getArgOperand(1))->getZExtValue());
4303 Value *Mask = I.getArgOperand(2);
4304 Value *PassThru = I.getArgOperand(3);
4305
4306 if (ClCheckAccessAddress) {
4307 insertCheckShadowOf(Ptr, &I);
4308 insertCheckShadowOf(Mask, &I);
4309 }
4310
4311 if (!PropagateShadow) {
4312 setShadow(&I, getCleanShadow(&I));
4313 setOrigin(&I, getCleanOrigin());
4314 return;
4315 }
4316
4317 Type *ShadowTy = getShadowTy(&I);
4318 Value *ShadowPtr, *OriginPtr;
4319 std::tie(ShadowPtr, OriginPtr) =
4320 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4321 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4322 getShadow(PassThru), "_msmaskedld"));
4323
4324 if (!MS.TrackOrigins)
4325 return;
4326
4327 // Choose between PassThru's and the loaded value's origins.
4328 Value *MaskedPassThruShadow = IRB.CreateAnd(
4329 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4330
4331 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4332
4333 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4334 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4335
4336 setOrigin(&I, Origin);
4337 }
4338
4339 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4340 // dst mask src
4341 //
4342 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4343 // by handleMaskedStore.
4344 //
4345 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4346 // vector of integers, unlike the LLVM masked intrinsics, which require a
4347 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4348 // mentions that the x86 backend does not know how to efficiently convert
4349 // from a vector of booleans back into the AVX mask format; therefore, they
4350 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4351 // intrinsics.
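// For reference (illustrative): with maskstore.ps.256, lane i of src is
// written to memory iff the most significant bit of mask element i is set,
// e.g. a mask element of 0x80000000 stores the lane while 0x7fffffff leaves
// the destination untouched.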
4352 void handleAVXMaskedStore(IntrinsicInst &I) {
4353 assert(I.arg_size() == 3);
4354
4355 IRBuilder<> IRB(&I);
4356
4357 Value *Dst = I.getArgOperand(0);
4358 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4359
4360 Value *Mask = I.getArgOperand(1);
4361 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4362
4363 Value *Src = I.getArgOperand(2);
4364 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4365
4366 const Align Alignment = Align(1);
4367
4368 Value *SrcShadow = getShadow(Src);
4369
4370 if (ClCheckAccessAddress) {
4371 insertCheckShadowOf(Dst, &I);
4372 insertCheckShadowOf(Mask, &I);
4373 }
4374
4375 Value *DstShadowPtr;
4376 Value *DstOriginPtr;
4377 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4378 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4379
4380 SmallVector<Value *, 2> ShadowArgs;
4381 ShadowArgs.append(1, DstShadowPtr);
4382 ShadowArgs.append(1, Mask);
4383 // The intrinsic may require floating-point but shadows can be arbitrary
4384 // bit patterns, of which some would be interpreted as "invalid"
4385 // floating-point values (NaN etc.); we assume the intrinsic will happily
4386 // copy them.
4387 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4388
4389 CallInst *CI =
4390 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4391 setShadow(&I, CI);
4392
4393 if (!MS.TrackOrigins)
4394 return;
4395
4396 // Approximation only
4397 auto &DL = F.getDataLayout();
4398 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4399 DL.getTypeStoreSize(SrcShadow->getType()),
4400 std::max(Alignment, kMinOriginAlignment));
4401 }
4402
4403 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4404 // return src mask
4405 //
4406 // Masked-off values are replaced with 0, which conveniently also represents
4407 // initialized memory.
4408 //
4409 // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4410 // by handleMaskedLoad.
4411 //
4412 // We do not combine this with handleMaskedLoad; see comment in
4413 // handleAVXMaskedStore for the rationale.
4414 //
4415 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4416 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4417 // parameter.
4418 void handleAVXMaskedLoad(IntrinsicInst &I) {
4419 assert(I.arg_size() == 2);
4420
4421 IRBuilder<> IRB(&I);
4422
4423 Value *Src = I.getArgOperand(0);
4424 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4425
4426 Value *Mask = I.getArgOperand(1);
4427 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4428
4429 const Align Alignment = Align(1);
4430
4431 if (ClCheckAccessAddress) {
4432 insertCheckShadowOf(Mask, &I);
4433 }
4434
4435 Type *SrcShadowTy = getShadowTy(Src);
4436 Value *SrcShadowPtr, *SrcOriginPtr;
4437 std::tie(SrcShadowPtr, SrcOriginPtr) =
4438 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4439
4440 SmallVector<Value *, 2> ShadowArgs;
4441 ShadowArgs.append(1, SrcShadowPtr);
4442 ShadowArgs.append(1, Mask);
4443
4444 CallInst *CI =
4445 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4446 // The AVX masked load intrinsics do not have integer variants. We use the
4447 // floating-point variants, which will happily copy the shadows even if
4448 // they are interpreted as "invalid" floating-point values (NaN etc.).
4449 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4450
4451 if (!MS.TrackOrigins)
4452 return;
4453
4454 // The "pass-through" value is always zero (initialized). To the extent
4455 // that this results in initialized aligned 4-byte chunks, the origin value
4456 // is ignored. It is therefore correct to simply copy the origin from src.
4457 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4458 setOrigin(&I, PtrSrcOrigin);
4459 }
4460
4461 // Test whether the mask indices are initialized, only checking the bits that
4462 // are actually used.
4463 //
4464 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4465 // used/checked.
4466 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4467 assert(isFixedIntVector(Idx));
4468 auto IdxVectorSize =
4469 cast<FixedVectorType>(Idx->getType())->getNumElements();
4470 assert(isPowerOf2_64(IdxVectorSize));
4471
4472 // A constant index has a clean shadow, so skip the check (the optimizer cannot always fold it away).
4473 if (isa<Constant>(Idx))
4474 return;
4475
4476 auto *IdxShadow = getShadow(Idx);
4477 Value *Truncated = IRB.CreateTrunc(
4478 IdxShadow,
4479 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4480 IdxVectorSize));
4481 insertCheckShadow(Truncated, getOrigin(Idx), I);
4482 }
4483
4484 // Instrument AVX permutation intrinsic.
4485 // We apply the same permutation (argument index 1) to the shadow.
4486 void handleAVXVpermilvar(IntrinsicInst &I) {
4487 IRBuilder<> IRB(&I);
4488 Value *Shadow = getShadow(&I, 0);
4489 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4490
4491 // Shadows are integer-ish types but some intrinsics require a
4492 // different (e.g., floating-point) type.
4493 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4494 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4495 {Shadow, I.getArgOperand(1)});
4496
4497 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4498 setOriginForNaryOp(I);
4499 }
4500
4501 // Instrument AVX permutation intrinsic.
4502 // We apply the same permutation (argument index 1) to the shadows.
4503 void handleAVXVpermi2var(IntrinsicInst &I) {
4504 assert(I.arg_size() == 3);
4505 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4506 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4507 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4508 [[maybe_unused]] auto ArgVectorSize =
4509 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4510 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4511 ->getNumElements() == ArgVectorSize);
4512 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4513 ->getNumElements() == ArgVectorSize);
4514 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4515 assert(I.getType() == I.getArgOperand(0)->getType());
4516 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4517 IRBuilder<> IRB(&I);
4518 Value *AShadow = getShadow(&I, 0);
4519 Value *Idx = I.getArgOperand(1);
4520 Value *BShadow = getShadow(&I, 2);
4521
4522 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4523
4524 // Shadows are integer-ish types but some intrinsics require a
4525 // different (e.g., floating-point) type.
4526 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4527 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4528 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4529 {AShadow, Idx, BShadow});
4530 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4531 setOriginForNaryOp(I);
4532 }
4533
4534 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4535 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4536 }
4537
4538 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4539 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4540 }
4541
4542 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4543 return isFixedIntVectorTy(V->getType());
4544 }
4545
4546 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4547 return isFixedFPVectorTy(V->getType());
4548 }
4549
4550 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4551 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4552 // i32 rounding)
4553 //
4554 // Inconveniently, some similar intrinsics have a different operand order:
4555 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4556 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4557 // i16 mask)
4558 //
4559 // If the return type has more elements than A, the excess elements are
4560 // zeroed (and the corresponding shadow is initialized).
4561 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4562 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4563 // i8 mask)
4564 //
4565 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4566 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4567 // where all_or_nothing(x) is fully uninitialized if x has any
4568 // uninitialized bits
4569 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4570 IRBuilder<> IRB(&I);
4571
4572 assert(I.arg_size() == 4);
4573 Value *A = I.getOperand(0);
4574 Value *WriteThrough;
4575 Value *Mask;
4576 Value *RoundingMode;
4577 if (LastMask) {
4578 WriteThrough = I.getOperand(2);
4579 Mask = I.getOperand(3);
4580 RoundingMode = I.getOperand(1);
4581 } else {
4582 WriteThrough = I.getOperand(1);
4583 Mask = I.getOperand(2);
4584 RoundingMode = I.getOperand(3);
4585 }
4586
4587 assert(isFixedFPVector(A));
4588 assert(isFixedIntVector(WriteThrough));
4589
4590 unsigned ANumElements =
4591 cast<FixedVectorType>(A->getType())->getNumElements();
4592 [[maybe_unused]] unsigned WriteThruNumElements =
4593 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4594 assert(ANumElements == WriteThruNumElements ||
4595 ANumElements * 2 == WriteThruNumElements);
4596
4597 assert(Mask->getType()->isIntegerTy());
4598 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4599 assert(ANumElements == MaskNumElements ||
4600 ANumElements * 2 == MaskNumElements);
4601
4602 assert(WriteThruNumElements == MaskNumElements);
4603
4604 // Some bits of the mask may be unused, though it's unusual to have partly
4605 // uninitialized bits.
4606 insertCheckShadowOf(Mask, &I);
4607
4608 assert(RoundingMode->getType()->isIntegerTy());
4609 // Only some bits of the rounding mode are used, though it's very
4610 // unusual to have uninitialized bits there (more commonly, it's a
4611 // constant).
4612 insertCheckShadowOf(RoundingMode, &I);
4613
4614 assert(I.getType() == WriteThrough->getType());
4615
4616 Value *AShadow = getShadow(A);
4617 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4618
4619 if (ANumElements * 2 == MaskNumElements) {
4620 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4621 // from the zeroed shadow instead of the writethrough's shadow.
4622 Mask =
4623 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4624 Mask =
4625 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4626 }
4627
4628 // Convert i16 mask to <16 x i1>
4629 Mask = IRB.CreateBitCast(
4630 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4631 "_ms_mask_bitcast");
4632
4633 /// For floating-point to integer conversion, the output is:
4634 /// - fully uninitialized if *any* bit of the input is uninitialized
4635 /// - fully initialized if all bits of the input are initialized
4636 /// We apply the same principle on a per-element basis for vectors.
4637 ///
4638 /// We use the scalar width of the return type instead of A's.
4639 AShadow = IRB.CreateSExt(
4640 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4641 getShadowTy(&I), "_ms_a_shadow");
4642
4643 Value *WriteThroughShadow = getShadow(WriteThrough);
4644 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4645 "_ms_writethru_select");
4646
4647 setShadow(&I, Shadow);
4648 setOriginForNaryOp(I);
4649 }
4650
4651 // Instrument BMI / BMI2 intrinsics.
4652 // All of these intrinsics are Z = I(X, Y)
4653 // where the types of all operands and the result match, and are either i32 or
4654 // i64. The following instrumentation happens to work for all of them:
4655 // Sz = I(Sx, Y) | (sext (Sy != 0))
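// For instance, for PDEP (Z = pdep(X, Y)): pdep(Sx, Y) scatters each shadow
// bit of X to exactly the result bit position that the corresponding data bit
// of X lands in, and the (sext (Sy != 0)) term poisons the whole result if
// any bit of the mask Y is uninitialized.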
4656 void handleBmiIntrinsic(IntrinsicInst &I) {
4657 IRBuilder<> IRB(&I);
4658 Type *ShadowTy = getShadowTy(&I);
4659
4660 // If any bit of the mask operand is poisoned, then the whole thing is.
4661 Value *SMask = getShadow(&I, 1);
4662 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4663 ShadowTy);
4664 // Apply the same intrinsic to the shadow of the first operand.
4665 Value *S = IRB.CreateCall(I.getCalledFunction(),
4666 {getShadow(&I, 0), I.getOperand(1)});
4667 S = IRB.CreateOr(SMask, S);
4668 setShadow(&I, S);
4669 setOriginForNaryOp(I);
4670 }
4671
4672 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4673 SmallVector<int, 8> Mask;
4674 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4675 Mask.append(2, X);
4676 }
4677 return Mask;
4678 }
4679
4680 // Instrument pclmul intrinsics.
4681 // These intrinsics operate either on odd or on even elements of the input
4682 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4683 // Replace the unused elements with copies of the used ones, ex:
4684 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4685 // or
4686 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4687 // and then apply the usual shadow combining logic.
4688 void handlePclmulIntrinsic(IntrinsicInst &I) {
4689 IRBuilder<> IRB(&I);
4690 unsigned Width =
4691 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4692 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4693 "pclmul 3rd operand must be a constant");
4694 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4695 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4696 getPclmulMask(Width, Imm & 0x01));
4697 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4698 getPclmulMask(Width, Imm & 0x10));
4699 ShadowAndOriginCombiner SOC(this, IRB);
4700 SOC.Add(Shuf0, getOrigin(&I, 0));
4701 SOC.Add(Shuf1, getOrigin(&I, 1));
4702 SOC.Done(&I);
4703 }
4704
4705 // Instrument _mm_*_sd|ss intrinsics
4706 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4707 IRBuilder<> IRB(&I);
4708 unsigned Width =
4709 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4710 Value *First = getShadow(&I, 0);
4711 Value *Second = getShadow(&I, 1);
4712 // First element of second operand, remaining elements of first operand
4713 SmallVector<int, 16> Mask;
4714 Mask.push_back(Width);
4715 for (unsigned i = 1; i < Width; i++)
4716 Mask.push_back(i);
4717 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4718
4719 setShadow(&I, Shadow);
4720 setOriginForNaryOp(I);
4721 }
4722
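// Instrument ptest/vtest intrinsics, which return a scalar flag derived from
// all lanes of both vector operands: the result shadow is non-zero iff any
// bit of either operand's shadow is set.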
4723 void handleVtestIntrinsic(IntrinsicInst &I) {
4724 IRBuilder<> IRB(&I);
4725 Value *Shadow0 = getShadow(&I, 0);
4726 Value *Shadow1 = getShadow(&I, 1);
4727 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4728 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4729 Value *Scalar = convertShadowToScalar(NZ, IRB);
4730 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4731
4732 setShadow(&I, Shadow);
4733 setOriginForNaryOp(I);
4734 }
4735
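// Instrument _mm_{min,max}_sd|ss intrinsics: element 0 of the result depends
// on element 0 of both operands (their shadows are OR'ed together); the
// remaining elements are copied from the first operand, and so are their
// shadows.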
4736 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4737 IRBuilder<> IRB(&I);
4738 unsigned Width =
4739 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4740 Value *First = getShadow(&I, 0);
4741 Value *Second = getShadow(&I, 1);
4742 Value *OrShadow = IRB.CreateOr(First, Second);
4743 // First element of both OR'd together, remaining elements of first operand
4744 SmallVector<int, 16> Mask;
4745 Mask.push_back(Width);
4746 for (unsigned i = 1; i < Width; i++)
4747 Mask.push_back(i);
4748 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4749
4750 setShadow(&I, Shadow);
4751 setOriginForNaryOp(I);
4752 }
4753
4754 // _mm_round_pd / _mm_round_ps.
4755 // Similar to maybeHandleSimpleNomemIntrinsic except
4756 // the second argument is guaranteed to be a constant integer.
4757 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4758 assert(I.getArgOperand(0)->getType() == I.getType());
4759 assert(I.arg_size() == 2);
4760 assert(isa<ConstantInt>(I.getArgOperand(1)));
4761
4762 IRBuilder<> IRB(&I);
4763 ShadowAndOriginCombiner SC(this, IRB);
4764 SC.Add(I.getArgOperand(0));
4765 SC.Done(&I);
4766 }
4767
4768 // Instrument @llvm.abs intrinsic.
4769 //
4770 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4771 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
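// When <is_int_min_poison> is true, abs(INT_MIN) is poison, so for lanes
// where Src equals the minimum signed value we conservatively force the
// shadow to be fully poisoned; otherwise the shadow is simply propagated
// from Src.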
4772 void handleAbsIntrinsic(IntrinsicInst &I) {
4773 assert(I.arg_size() == 2);
4774 Value *Src = I.getArgOperand(0);
4775 Value *IsIntMinPoison = I.getArgOperand(1);
4776
4777 assert(I.getType()->isIntOrIntVectorTy());
4778
4779 assert(Src->getType() == I.getType());
4780
4781 assert(IsIntMinPoison->getType()->isIntegerTy());
4782 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4783
4784 IRBuilder<> IRB(&I);
4785 Value *SrcShadow = getShadow(Src);
4786
4787 APInt MinVal =
4788 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
4789 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
4790 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
4791
4792 Value *PoisonedShadow = getPoisonedShadow(Src);
4793 Value *PoisonedIfIntMinShadow =
4794 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
4795 Value *Shadow =
4796 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
4797
4798 setShadow(&I, Shadow);
4799 setOrigin(&I, getOrigin(&I, 0));
4800 }
4801
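// Instrument llvm.is.fpclass: the i1 result is poisoned iff any bit of the
// floating-point operand's shadow is set.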
4802 void handleIsFpClass(IntrinsicInst &I) {
4803 IRBuilder<> IRB(&I);
4804 Value *Shadow = getShadow(&I, 0);
4805 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
4806 setOrigin(&I, getOrigin(&I, 0));
4807 }
4808
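// Instrument llvm.*.with.overflow intrinsics, which return a {result,
// overflow-bit} pair: the result's shadow is the OR of the operand shadows,
// and the overflow bit's shadow is set iff that OR is non-zero.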
4809 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4810 IRBuilder<> IRB(&I);
4811 Value *Shadow0 = getShadow(&I, 0);
4812 Value *Shadow1 = getShadow(&I, 1);
4813 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
4814 Value *ShadowElt1 =
4815 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
4816
4817 Value *Shadow = PoisonValue::get(getShadowTy(&I));
4818 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
4819 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
4820
4821 setShadow(&I, Shadow);
4822 setOriginForNaryOp(I);
4823 }
4824
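// Returns the shadow of element 0 of the given fixed-vector value.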
4825 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4826 assert(isa<FixedVectorType>(V->getType()));
4827 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4828 Value *Shadow = getShadow(V);
4829 return IRB.CreateExtractElement(Shadow,
4830 ConstantInt::get(IRB.getInt32Ty(), 0));
4831 }
4832
4833 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4834 //
4835 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4836 // (<8 x i64>, <16 x i8>, i8)
4837 // A WriteThru Mask
4838 //
4839 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4840 // (<16 x i32>, <16 x i8>, i16)
4841 //
4842 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4843 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4844 //
4845 // If Dst has more elements than A, the excess elements are zeroed (and the
4846 // corresponding shadow is initialized).
4847 //
4848 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4849 // and is much faster than this handler.
4850 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4851 IRBuilder<> IRB(&I);
4852
4853 assert(I.arg_size() == 3);
4854 Value *A = I.getOperand(0);
4855 Value *WriteThrough = I.getOperand(1);
4856 Value *Mask = I.getOperand(2);
4857
4858 assert(isFixedIntVector(A));
4859 assert(isFixedIntVector(WriteThrough));
4860
4861 unsigned ANumElements =
4862 cast<FixedVectorType>(A->getType())->getNumElements();
4863 unsigned OutputNumElements =
4864 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4865 assert(ANumElements == OutputNumElements ||
4866 ANumElements * 2 == OutputNumElements);
4867
4868 assert(Mask->getType()->isIntegerTy());
4869 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4870 insertCheckShadowOf(Mask, &I);
4871
4872 assert(I.getType() == WriteThrough->getType());
4873
4874 // Widen the mask, if necessary, to have one bit per element of the output
4875 // vector.
4876 // We want the extra bits to have '1's, so that the CreateSelect will
4877 // select the values from AShadow instead of WriteThroughShadow ("maskless"
4878 // versions of the intrinsics are sometimes implemented using an all-1's
4879 // mask and an undefined value for WriteThroughShadow). We accomplish this
4880 // by using bitwise NOT before and after the ZExt.
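// For example (illustrative only), widening an i8 mask 0b10110001 for an
// 8-element A to a 16-element output yields 0b11111111'10110001: the low 8
// mask bits are preserved and the 8 new high bits are all 1.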
4881 if (ANumElements != OutputNumElements) {
4882 Mask = IRB.CreateNot(Mask);
4883 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
4884 "_ms_widen_mask");
4885 Mask = IRB.CreateNot(Mask);
4886 }
4887 Mask = IRB.CreateBitCast(
4888 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4889
4890 Value *AShadow = getShadow(A);
4891
4892 // The return type might have more elements than the input.
4893 // Temporarily shrink the return type's number of elements.
4894 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
4895
4896 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
4897 // This handler treats them all as truncation, which leads to some rare
4898 // false positives in the cases where the truncated bytes could
4899 // unambiguously saturate the value e.g., if A = ??????10 ????????
4900 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
4901 // fully defined, but the truncated byte is ????????.
4902 //
4903 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
4904 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
4905 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4906
4907 Value *WriteThroughShadow = getShadow(WriteThrough);
4908
4909 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
4910 setShadow(&I, Shadow);
4911 setOriginForNaryOp(I);
4912 }
4913
4914 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
4915 // values and perform an operation whose shadow propagation should be handled
4916 // as all-or-nothing [*], with masking provided by a vector and a mask
4917 // supplied as an integer.
4918 //
4919 // [*] if all bits of a vector element are initialized, the output is fully
4920 // initialized; otherwise, the output is fully uninitialized
4921 //
4922 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
4923 // (<16 x float>, <16 x float>, i16)
4924 // A WriteThru Mask
4925 //
4926 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
4927 // (<2 x double>, <2 x double>, i8)
4928 //
4929 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
4930 // (<8 x double>, i32, <8 x double>, i8, i32)
4931 // A Imm WriteThru Mask Rounding
4932 //
4933 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
4934 // be fully initialized.
4935 //
4936 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
4937 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
4938 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
4939 unsigned WriteThruIndex,
4940 unsigned MaskIndex) {
4941 IRBuilder<> IRB(&I);
4942
4943 unsigned NumArgs = I.arg_size();
4944 assert(AIndex < NumArgs);
4945 assert(WriteThruIndex < NumArgs);
4946 assert(MaskIndex < NumArgs);
4947 assert(AIndex != WriteThruIndex);
4948 assert(AIndex != MaskIndex);
4949 assert(WriteThruIndex != MaskIndex);
4950
4951 Value *A = I.getOperand(AIndex);
4952 Value *WriteThru = I.getOperand(WriteThruIndex);
4953 Value *Mask = I.getOperand(MaskIndex);
4954
4955 assert(isFixedFPVector(A));
4956 assert(isFixedFPVector(WriteThru));
4957
4958 [[maybe_unused]] unsigned ANumElements =
4959 cast<FixedVectorType>(A->getType())->getNumElements();
4960 unsigned OutputNumElements =
4961 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
4962 assert(ANumElements == OutputNumElements);
4963
4964 for (unsigned i = 0; i < NumArgs; ++i) {
4965 if (i != AIndex && i != WriteThruIndex) {
4966 // Imm, Mask, Rounding etc. are "control" data, hence we require that
4967 // they be fully initialized.
4968 assert(I.getOperand(i)->getType()->isIntegerTy());
4969 insertCheckShadowOf(I.getOperand(i), &I);
4970 }
4971 }
4972
4973 // The mask has 1 bit per element of A, but a minimum of 8 bits.
4974 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
4975 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, ANumElements));
4976 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4977
4978 assert(I.getType() == WriteThru->getType());
4979
4980 Mask = IRB.CreateBitCast(
4981 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4982
4983 Value *AShadow = getShadow(A);
4984
4985 // All-or-nothing shadow
4986 AShadow = IRB.CreateSExt(IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow)),
4987 AShadow->getType());
4988
4989 Value *WriteThruShadow = getShadow(WriteThru);
4990
4991 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThruShadow);
4992 setShadow(&I, Shadow);
4993
4994 setOriginForNaryOp(I);
4995 }
4996
4997 // For sh.* compiler intrinsics:
4998 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
4999 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5000 // A B WriteThru Mask RoundingMode
5001 //
5002 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5003 // DstShadow[1..7] = AShadow[1..7]
5004 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5005 IRBuilder<> IRB(&I);
5006
5007 assert(I.arg_size() == 5);
5008 Value *A = I.getOperand(0);
5009 Value *B = I.getOperand(1);
5010 Value *WriteThrough = I.getOperand(2);
5011 Value *Mask = I.getOperand(3);
5012 Value *RoundingMode = I.getOperand(4);
5013
5014 // Technically, we could probably just check whether the LSB is
5015 // initialized, but intuitively it feels like a partly uninitialized mask
5016 // is unintended, and we should warn the user immediately.
5017 insertCheckShadowOf(Mask, &I);
5018 insertCheckShadowOf(RoundingMode, &I);
5019
5020 assert(isa<FixedVectorType>(A->getType()));
5021 unsigned NumElements =
5022 cast<FixedVectorType>(A->getType())->getNumElements();
5023 assert(NumElements == 8);
5024 assert(A->getType() == B->getType());
5025 assert(B->getType() == WriteThrough->getType());
5026 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5027 assert(RoundingMode->getType()->isIntegerTy());
5028
5029 Value *ALowerShadow = extractLowerShadow(IRB, A);
5030 Value *BLowerShadow = extractLowerShadow(IRB, B);
5031
5032 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5033
5034 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5035
5036 Mask = IRB.CreateBitCast(
5037 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5038 Value *MaskLower =
5039 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5040
5041 Value *AShadow = getShadow(A);
5042 Value *DstLowerShadow =
5043 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5044 Value *DstShadow = IRB.CreateInsertElement(
5045 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5046 "_msprop");
5047
5048 setShadow(&I, DstShadow);
5049 setOriginForNaryOp(I);
5050 }
5051
5052 // Approximately handle AVX Galois Field Affine Transformation
5053 //
5054 // e.g.,
5055 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5056 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5057 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5058 // Out A x b
5059 // where A and x are packed matrices, b is a vector,
5060 // Out = A * x + b in GF(2)
5061 //
5062 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5063 // computation also includes a parity calculation.
5064 //
5065 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5066 // Out_Shadow = (V1_Shadow & V2_Shadow)
5067 // | (V1 & V2_Shadow)
5068 // | (V1_Shadow & V2 )
5069 //
5070 // We approximate the shadow of gf2p8affineqb using:
5071 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5072 // | gf2p8affineqb(x, A_shadow, 0)
5073 // | gf2p8affineqb(x_Shadow, A, 0)
5074 // | set1_epi8(b_Shadow)
5075 //
5076 // This approximation has false negatives: if an intermediate dot-product
5077 // contains an even number of 1's, the parity is 0.
5078 // It has no false positives.
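// For example, if exactly two of the AND terms feeding one dot-product bit
// are uninitialized, their shadow bits cancel in the XOR (parity) reduction
// performed by gf2p8affineqb, and that output bit is incorrectly reported as
// initialized.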
5079 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5080 IRBuilder<> IRB(&I);
5081
5082 assert(I.arg_size() == 3);
5083 Value *A = I.getOperand(0);
5084 Value *X = I.getOperand(1);
5085 Value *B = I.getOperand(2);
5086
5087 assert(isFixedIntVector(A));
5088 assert(cast<VectorType>(A->getType())
5089 ->getElementType()
5090 ->getScalarSizeInBits() == 8);
5091
5092 assert(A->getType() == X->getType());
5093
5094 assert(B->getType()->isIntegerTy());
5095 assert(B->getType()->getScalarSizeInBits() == 8);
5096
5097 assert(I.getType() == A->getType());
5098
5099 Value *AShadow = getShadow(A);
5100 Value *XShadow = getShadow(X);
5101 Value *BZeroShadow = getCleanShadow(B);
5102
5103 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5104 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5105 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5106 {X, AShadow, BZeroShadow});
5107 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5108 {XShadow, A, BZeroShadow});
5109
5110 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5111 Value *BShadow = getShadow(B);
5112 Value *BBroadcastShadow = getCleanShadow(AShadow);
5113 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5114 // This loop generates a lot of LLVM IR, which we expect that CodeGen will
5115 // lower appropriately (e.g., VPBROADCASTB).
5116 // Besides, b is often a constant, in which case it is fully initialized.
5117 for (unsigned i = 0; i < NumElements; i++)
5118 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5119
5120 setShadow(&I, IRB.CreateOr(
5121 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5122 setOriginForNaryOp(I);
5123 }
5124
5125 // Handle Arm NEON vector load intrinsics (vld*).
5126 //
5127 // The WithLane instructions (ld[234]lane) are similar to:
5128 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5129 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5130 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5131 // %A)
5132 //
5133 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5134 // to:
5135 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
5136 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5137 unsigned int numArgs = I.arg_size();
5138
5139 // Return type is a struct of vectors of integers or floating-point
5140 assert(I.getType()->isStructTy());
5141 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5142 assert(RetTy->getNumElements() > 0);
5143 assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5144 RetTy->getElementType(0)->isFPOrFPVectorTy());
5145 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5146 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5147
5148 if (WithLane) {
5149 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5150 assert(4 <= numArgs && numArgs <= 6);
5151
5152 // Return type is a struct of the input vectors
5153 assert(RetTy->getNumElements() + 2 == numArgs);
5154 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5155 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5156 } else {
5157 assert(numArgs == 1);
5158 }
5159
5160 IRBuilder<> IRB(&I);
5161
5162 SmallVector<Value *, 6> ShadowArgs;
5163 if (WithLane) {
5164 for (unsigned int i = 0; i < numArgs - 2; i++)
5165 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5166
5167 // Lane number, passed verbatim
5168 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5169 ShadowArgs.push_back(LaneNumber);
5170
5171 // TODO: blend shadow of lane number into output shadow?
5172 insertCheckShadowOf(LaneNumber, &I);
5173 }
5174
5175 Value *Src = I.getArgOperand(numArgs - 1);
5176 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5177
5178 Type *SrcShadowTy = getShadowTy(Src);
5179 auto [SrcShadowPtr, SrcOriginPtr] =
5180 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5181 ShadowArgs.push_back(SrcShadowPtr);
5182
5183 // The NEON vector load instructions handled by this function all have
5184 // integer variants. It is easier to use those rather than trying to cast
5185 // a struct of vectors of floats into a struct of vectors of integers.
5186 CallInst *CI =
5187 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5188 setShadow(&I, CI);
5189
5190 if (!MS.TrackOrigins)
5191 return;
5192
5193 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5194 setOrigin(&I, PtrSrcOrigin);
5195 }
5196
5197 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5198 /// and vst{2,3,4}lane).
5199 ///
5200 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5201 /// last argument, with the initial arguments being the inputs (and lane
5202 /// number for vst{2,3,4}lane). They return void.
5203 ///
5204 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5205 /// abcdabcdabcdabcd... into *outP
5206 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5207 /// writes aaaa...bbbb...cccc...dddd... into *outP
5208 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5209 /// These instructions can all be instrumented with essentially the same
5210 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
5211 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5212 IRBuilder<> IRB(&I);
5213
5214 // Don't use getNumOperands() because it includes the callee
5215 int numArgOperands = I.arg_size();
5216
5217 // The last arg operand is the output (pointer)
5218 assert(numArgOperands >= 1);
5219 Value *Addr = I.getArgOperand(numArgOperands - 1);
5220 assert(Addr->getType()->isPointerTy());
5221 int skipTrailingOperands = 1;
5222
5223 if (ClCheckAccessAddress)
5224 insertCheckShadowOf(Addr, &I);
5225
5226 // Second-last operand is the lane number (for vst{2,3,4}lane)
5227 if (useLane) {
5228 skipTrailingOperands++;
5229 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5230 assert(isa<IntegerType>(
5231 I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5232 }
5233
5234 SmallVector<Value *, 8> ShadowArgs;
5235 // All the initial operands are the inputs
5236 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5237 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5238 Value *Shadow = getShadow(&I, i);
5239 ShadowArgs.append(1, Shadow);
5240 }
5241
5242 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5243 // e.g., for:
5244 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5245 // we know the type of the output (and its shadow) is <16 x i8>.
5246 //
5247 // Arm NEON VST is unusual because the last argument is the output address:
5248 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5249 // call void @llvm.aarch64.neon.st2.v16i8.p0
5250 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5251 // and we have no type information about P's operand. We must manually
5252 // compute the type (<16 x i8> x 2).
5253 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5254 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5255 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5256 (numArgOperands - skipTrailingOperands));
5257 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5258
5259 if (useLane)
5260 ShadowArgs.append(1,
5261 I.getArgOperand(numArgOperands - skipTrailingOperands));
5262
5263 Value *OutputShadowPtr, *OutputOriginPtr;
5264 // AArch64 NEON does not need alignment (unless OS requires it)
5265 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5266 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5267 ShadowArgs.append(1, OutputShadowPtr);
5268
5269 CallInst *CI =
5270 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5271 setShadow(&I, CI);
5272
5273 if (MS.TrackOrigins) {
5274 // TODO: if we modelled the vst* instruction more precisely, we could
5275 // more accurately track the origins (e.g., if both inputs are
5276 // uninitialized for vst2, we currently blame the second input, even
5277 // though part of the output depends only on the first input).
5278 //
5279 // This is particularly imprecise for vst{2,3,4}lane, since only one
5280 // lane of each input is actually copied to the output.
5281 OriginCombiner OC(this, IRB);
5282 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5283 OC.Add(I.getArgOperand(i));
5284
5285 const DataLayout &DL = F.getDataLayout();
5286 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5287 OutputOriginPtr);
5288 }
5289 }
5290
5291 /// Handle intrinsics by applying the intrinsic to the shadows.
5292 ///
5293 /// The trailing arguments are passed verbatim to the intrinsic, though any
5294 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5295 /// intrinsic with one trailing verbatim argument:
5296 /// out = intrinsic(var1, var2, opType)
5297 /// we compute:
5298 /// shadow[out] =
5299 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5300 ///
5301 /// Typically, shadowIntrinsicID will be specified by the caller to be
5302 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5303 /// intrinsic of the same type.
5304 ///
5305 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5306 /// bit-patterns (for example, if the intrinsic accepts floats for
5307 /// var1, we require that it doesn't care if inputs are NaNs).
5308 ///
5309 /// For example, this can be applied to the Arm NEON vector table intrinsics
5310 /// (tbl{1,2,3,4}).
5311 ///
5312 /// The origin is approximated using setOriginForNaryOp.
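/// For example, for the x86 byte shuffle
///   <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %a, <16 x i8> %b)
/// instrumented with trailingVerbatimArgs = 1, the shadow is
///   pshuf.b.128(shadow(%a), %b) | shadow(%b)
/// (with the necessary bitcasts between shadow and operand types).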
5313 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5314 Intrinsic::ID shadowIntrinsicID,
5315 unsigned int trailingVerbatimArgs) {
5316 IRBuilder<> IRB(&I);
5317
5318 assert(trailingVerbatimArgs < I.arg_size());
5319
5320 SmallVector<Value *, 8> ShadowArgs;
5321 // Don't use getNumOperands() because it includes the callee
5322 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5323 Value *Shadow = getShadow(&I, i);
5324
5325 // Shadows are integer-ish types but some intrinsics require a
5326 // different (e.g., floating-point) type.
5327 ShadowArgs.push_back(
5328 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5329 }
5330
5331 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5332 i++) {
5333 Value *Arg = I.getArgOperand(i);
5334 ShadowArgs.push_back(Arg);
5335 }
5336
5337 CallInst *CI =
5338 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5339 Value *CombinedShadow = CI;
5340
5341 // Combine the computed shadow with the shadow of trailing args
5342 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5343 i++) {
5344 Value *Shadow =
5345 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5346 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5347 }
5348
5349 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5350
5351 setOriginForNaryOp(I);
5352 }
5353
5354 // Approximation only
5355 //
5356 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5357 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5358 assert(I.arg_size() == 2);
5359
5360 handleShadowOr(I);
5361 }
5362
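// Handle target-independent intrinsics. Returns true if the intrinsic was
// recognized and instrumented, false if the caller should try other handlers.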
5363 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5364 switch (I.getIntrinsicID()) {
5365 case Intrinsic::uadd_with_overflow:
5366 case Intrinsic::sadd_with_overflow:
5367 case Intrinsic::usub_with_overflow:
5368 case Intrinsic::ssub_with_overflow:
5369 case Intrinsic::umul_with_overflow:
5370 case Intrinsic::smul_with_overflow:
5371 handleArithmeticWithOverflow(I);
5372 break;
5373 case Intrinsic::abs:
5374 handleAbsIntrinsic(I);
5375 break;
5376 case Intrinsic::bitreverse:
5377 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5378 /*trailingVerbatimArgs*/ 0);
5379 break;
5380 case Intrinsic::is_fpclass:
5381 handleIsFpClass(I);
5382 break;
5383 case Intrinsic::lifetime_start:
5384 handleLifetimeStart(I);
5385 break;
5386 case Intrinsic::launder_invariant_group:
5387 case Intrinsic::strip_invariant_group:
5388 handleInvariantGroup(I);
5389 break;
5390 case Intrinsic::bswap:
5391 handleBswap(I);
5392 break;
5393 case Intrinsic::ctlz:
5394 case Intrinsic::cttz:
5395 handleCountLeadingTrailingZeros(I);
5396 break;
5397 case Intrinsic::masked_compressstore:
5398 handleMaskedCompressStore(I);
5399 break;
5400 case Intrinsic::masked_expandload:
5401 handleMaskedExpandLoad(I);
5402 break;
5403 case Intrinsic::masked_gather:
5404 handleMaskedGather(I);
5405 break;
5406 case Intrinsic::masked_scatter:
5407 handleMaskedScatter(I);
5408 break;
5409 case Intrinsic::masked_store:
5410 handleMaskedStore(I);
5411 break;
5412 case Intrinsic::masked_load:
5413 handleMaskedLoad(I);
5414 break;
5415 case Intrinsic::vector_reduce_and:
5416 handleVectorReduceAndIntrinsic(I);
5417 break;
5418 case Intrinsic::vector_reduce_or:
5419 handleVectorReduceOrIntrinsic(I);
5420 break;
5421
5422 case Intrinsic::vector_reduce_add:
5423 case Intrinsic::vector_reduce_xor:
5424 case Intrinsic::vector_reduce_mul:
5425 // Signed/Unsigned Min/Max
5426 // TODO: handling similarly to AND/OR may be more precise.
5427 case Intrinsic::vector_reduce_smax:
5428 case Intrinsic::vector_reduce_smin:
5429 case Intrinsic::vector_reduce_umax:
5430 case Intrinsic::vector_reduce_umin:
5431 // TODO: this has no false positives, but arguably we should check that all
5432 // the bits are initialized.
5433 case Intrinsic::vector_reduce_fmax:
5434 case Intrinsic::vector_reduce_fmin:
5435 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5436 break;
5437
5438 case Intrinsic::vector_reduce_fadd:
5439 case Intrinsic::vector_reduce_fmul:
5440 handleVectorReduceWithStarterIntrinsic(I);
5441 break;
5442
5443 case Intrinsic::scmp:
5444 case Intrinsic::ucmp: {
5445 handleShadowOr(I);
5446 break;
5447 }
5448
5449 case Intrinsic::fshl:
5450 case Intrinsic::fshr:
5451 handleFunnelShift(I);
5452 break;
5453
5454 case Intrinsic::is_constant:
5455 // The result of llvm.is.constant() is always defined.
5456 setShadow(&I, getCleanShadow(&I));
5457 setOrigin(&I, getCleanOrigin());
5458 break;
5459
5460 default:
5461 return false;
5462 }
5463
5464 return true;
5465 }
5466
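// Handle x86 SIMD intrinsics; returns true if the intrinsic was recognized
// and instrumented (analogous to maybeHandleCrossPlatformIntrinsic above).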
5467 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5468 switch (I.getIntrinsicID()) {
5469 case Intrinsic::x86_sse_stmxcsr:
5470 handleStmxcsr(I);
5471 break;
5472 case Intrinsic::x86_sse_ldmxcsr:
5473 handleLdmxcsr(I);
5474 break;
5475
5476 // Convert Scalar Double Precision Floating-Point Value
5477 // to Unsigned Doubleword Integer
5478 // etc.
5479 case Intrinsic::x86_avx512_vcvtsd2usi64:
5480 case Intrinsic::x86_avx512_vcvtsd2usi32:
5481 case Intrinsic::x86_avx512_vcvtss2usi64:
5482 case Intrinsic::x86_avx512_vcvtss2usi32:
5483 case Intrinsic::x86_avx512_cvttss2usi64:
5484 case Intrinsic::x86_avx512_cvttss2usi:
5485 case Intrinsic::x86_avx512_cvttsd2usi64:
5486 case Intrinsic::x86_avx512_cvttsd2usi:
5487 case Intrinsic::x86_avx512_cvtusi2ss:
5488 case Intrinsic::x86_avx512_cvtusi642sd:
5489 case Intrinsic::x86_avx512_cvtusi642ss:
5490 handleSSEVectorConvertIntrinsic(I, 1, true);
5491 break;
5492 case Intrinsic::x86_sse2_cvtsd2si64:
5493 case Intrinsic::x86_sse2_cvtsd2si:
5494 case Intrinsic::x86_sse2_cvtsd2ss:
5495 case Intrinsic::x86_sse2_cvttsd2si64:
5496 case Intrinsic::x86_sse2_cvttsd2si:
5497 case Intrinsic::x86_sse_cvtss2si64:
5498 case Intrinsic::x86_sse_cvtss2si:
5499 case Intrinsic::x86_sse_cvttss2si64:
5500 case Intrinsic::x86_sse_cvttss2si:
5501 handleSSEVectorConvertIntrinsic(I, 1);
5502 break;
5503 case Intrinsic::x86_sse_cvtps2pi:
5504 case Intrinsic::x86_sse_cvttps2pi:
5505 handleSSEVectorConvertIntrinsic(I, 2);
5506 break;
5507
5508 // TODO:
5509 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5510 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5511 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5512
5513 case Intrinsic::x86_vcvtps2ph_128:
5514 case Intrinsic::x86_vcvtps2ph_256: {
5515 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5516 break;
5517 }
5518
5519 // Convert Packed Single Precision Floating-Point Values
5520 // to Packed Signed Doubleword Integer Values
5521 //
5522 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5523 // (<16 x float>, <16 x i32>, i16, i32)
5524 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5525 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5526 break;
5527
5528 // Convert Packed Double Precision Floating-Point Values
5529 // to Packed Single Precision Floating-Point Values
5530 case Intrinsic::x86_sse2_cvtpd2ps:
5531 case Intrinsic::x86_sse2_cvtps2dq:
5532 case Intrinsic::x86_sse2_cvtpd2dq:
5533 case Intrinsic::x86_sse2_cvttps2dq:
5534 case Intrinsic::x86_sse2_cvttpd2dq:
5535 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5536 case Intrinsic::x86_avx_cvt_ps2dq_256:
5537 case Intrinsic::x86_avx_cvt_pd2dq_256:
5538 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5539 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5540 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5541 break;
5542 }
5543
5544 // Convert Single-Precision FP Value to 16-bit FP Value
5545 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5546 // (<16 x float>, i32, <16 x i16>, i16)
5547 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5548 // (<4 x float>, i32, <8 x i16>, i8)
5549 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5550 // (<8 x float>, i32, <8 x i16>, i8)
5551 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5552 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5553 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5554 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5555 break;
5556
5557 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5558 case Intrinsic::x86_avx512_psll_w_512:
5559 case Intrinsic::x86_avx512_psll_d_512:
5560 case Intrinsic::x86_avx512_psll_q_512:
5561 case Intrinsic::x86_avx512_pslli_w_512:
5562 case Intrinsic::x86_avx512_pslli_d_512:
5563 case Intrinsic::x86_avx512_pslli_q_512:
5564 case Intrinsic::x86_avx512_psrl_w_512:
5565 case Intrinsic::x86_avx512_psrl_d_512:
5566 case Intrinsic::x86_avx512_psrl_q_512:
5567 case Intrinsic::x86_avx512_psra_w_512:
5568 case Intrinsic::x86_avx512_psra_d_512:
5569 case Intrinsic::x86_avx512_psra_q_512:
5570 case Intrinsic::x86_avx512_psrli_w_512:
5571 case Intrinsic::x86_avx512_psrli_d_512:
5572 case Intrinsic::x86_avx512_psrli_q_512:
5573 case Intrinsic::x86_avx512_psrai_w_512:
5574 case Intrinsic::x86_avx512_psrai_d_512:
5575 case Intrinsic::x86_avx512_psrai_q_512:
5576 case Intrinsic::x86_avx512_psra_q_256:
5577 case Intrinsic::x86_avx512_psra_q_128:
5578 case Intrinsic::x86_avx512_psrai_q_256:
5579 case Intrinsic::x86_avx512_psrai_q_128:
5580 case Intrinsic::x86_avx2_psll_w:
5581 case Intrinsic::x86_avx2_psll_d:
5582 case Intrinsic::x86_avx2_psll_q:
5583 case Intrinsic::x86_avx2_pslli_w:
5584 case Intrinsic::x86_avx2_pslli_d:
5585 case Intrinsic::x86_avx2_pslli_q:
5586 case Intrinsic::x86_avx2_psrl_w:
5587 case Intrinsic::x86_avx2_psrl_d:
5588 case Intrinsic::x86_avx2_psrl_q:
5589 case Intrinsic::x86_avx2_psra_w:
5590 case Intrinsic::x86_avx2_psra_d:
5591 case Intrinsic::x86_avx2_psrli_w:
5592 case Intrinsic::x86_avx2_psrli_d:
5593 case Intrinsic::x86_avx2_psrli_q:
5594 case Intrinsic::x86_avx2_psrai_w:
5595 case Intrinsic::x86_avx2_psrai_d:
5596 case Intrinsic::x86_sse2_psll_w:
5597 case Intrinsic::x86_sse2_psll_d:
5598 case Intrinsic::x86_sse2_psll_q:
5599 case Intrinsic::x86_sse2_pslli_w:
5600 case Intrinsic::x86_sse2_pslli_d:
5601 case Intrinsic::x86_sse2_pslli_q:
5602 case Intrinsic::x86_sse2_psrl_w:
5603 case Intrinsic::x86_sse2_psrl_d:
5604 case Intrinsic::x86_sse2_psrl_q:
5605 case Intrinsic::x86_sse2_psra_w:
5606 case Intrinsic::x86_sse2_psra_d:
5607 case Intrinsic::x86_sse2_psrli_w:
5608 case Intrinsic::x86_sse2_psrli_d:
5609 case Intrinsic::x86_sse2_psrli_q:
5610 case Intrinsic::x86_sse2_psrai_w:
5611 case Intrinsic::x86_sse2_psrai_d:
5612 case Intrinsic::x86_mmx_psll_w:
5613 case Intrinsic::x86_mmx_psll_d:
5614 case Intrinsic::x86_mmx_psll_q:
5615 case Intrinsic::x86_mmx_pslli_w:
5616 case Intrinsic::x86_mmx_pslli_d:
5617 case Intrinsic::x86_mmx_pslli_q:
5618 case Intrinsic::x86_mmx_psrl_w:
5619 case Intrinsic::x86_mmx_psrl_d:
5620 case Intrinsic::x86_mmx_psrl_q:
5621 case Intrinsic::x86_mmx_psra_w:
5622 case Intrinsic::x86_mmx_psra_d:
5623 case Intrinsic::x86_mmx_psrli_w:
5624 case Intrinsic::x86_mmx_psrli_d:
5625 case Intrinsic::x86_mmx_psrli_q:
5626 case Intrinsic::x86_mmx_psrai_w:
5627 case Intrinsic::x86_mmx_psrai_d:
5628 handleVectorShiftIntrinsic(I, /* Variable */ false);
5629 break;
5630 case Intrinsic::x86_avx2_psllv_d:
5631 case Intrinsic::x86_avx2_psllv_d_256:
5632 case Intrinsic::x86_avx512_psllv_d_512:
5633 case Intrinsic::x86_avx2_psllv_q:
5634 case Intrinsic::x86_avx2_psllv_q_256:
5635 case Intrinsic::x86_avx512_psllv_q_512:
5636 case Intrinsic::x86_avx2_psrlv_d:
5637 case Intrinsic::x86_avx2_psrlv_d_256:
5638 case Intrinsic::x86_avx512_psrlv_d_512:
5639 case Intrinsic::x86_avx2_psrlv_q:
5640 case Intrinsic::x86_avx2_psrlv_q_256:
5641 case Intrinsic::x86_avx512_psrlv_q_512:
5642 case Intrinsic::x86_avx2_psrav_d:
5643 case Intrinsic::x86_avx2_psrav_d_256:
5644 case Intrinsic::x86_avx512_psrav_d_512:
5645 case Intrinsic::x86_avx512_psrav_q_128:
5646 case Intrinsic::x86_avx512_psrav_q_256:
5647 case Intrinsic::x86_avx512_psrav_q_512:
5648 handleVectorShiftIntrinsic(I, /* Variable */ true);
5649 break;
5650
5651 // Pack with Signed/Unsigned Saturation
5652 case Intrinsic::x86_sse2_packsswb_128:
5653 case Intrinsic::x86_sse2_packssdw_128:
5654 case Intrinsic::x86_sse2_packuswb_128:
5655 case Intrinsic::x86_sse41_packusdw:
5656 case Intrinsic::x86_avx2_packsswb:
5657 case Intrinsic::x86_avx2_packssdw:
5658 case Intrinsic::x86_avx2_packuswb:
5659 case Intrinsic::x86_avx2_packusdw:
5660 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5661 // (<32 x i16> %a, <32 x i16> %b)
5662 // <32 x i16> @llvm.x86.avx512.packssdw.512
5663 // (<16 x i32> %a, <16 x i32> %b)
5664 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5665 case Intrinsic::x86_avx512_packsswb_512:
5666 case Intrinsic::x86_avx512_packssdw_512:
5667 case Intrinsic::x86_avx512_packuswb_512:
5668 case Intrinsic::x86_avx512_packusdw_512:
5669 handleVectorPackIntrinsic(I);
5670 break;
5671
5672 case Intrinsic::x86_sse41_pblendvb:
5673 case Intrinsic::x86_sse41_blendvpd:
5674 case Intrinsic::x86_sse41_blendvps:
5675 case Intrinsic::x86_avx_blendv_pd_256:
5676 case Intrinsic::x86_avx_blendv_ps_256:
5677 case Intrinsic::x86_avx2_pblendvb:
5678 handleBlendvIntrinsic(I);
5679 break;
5680
5681 case Intrinsic::x86_avx_dp_ps_256:
5682 case Intrinsic::x86_sse41_dppd:
5683 case Intrinsic::x86_sse41_dpps:
5684 handleDppIntrinsic(I);
5685 break;
5686
5687 case Intrinsic::x86_mmx_packsswb:
5688 case Intrinsic::x86_mmx_packuswb:
5689 handleVectorPackIntrinsic(I, 16);
5690 break;
5691
5692 case Intrinsic::x86_mmx_packssdw:
5693 handleVectorPackIntrinsic(I, 32);
5694 break;
5695
5696 case Intrinsic::x86_mmx_psad_bw:
5697 handleVectorSadIntrinsic(I, true);
5698 break;
5699 case Intrinsic::x86_sse2_psad_bw:
5700 case Intrinsic::x86_avx2_psad_bw:
5701 handleVectorSadIntrinsic(I);
5702 break;
5703
5704 // Multiply and Add Packed Words
5705 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5706 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5707 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5708 //
5709 // Multiply and Add Packed Signed and Unsigned Bytes
5710 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5711 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5712 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5713 //
5714 // These intrinsics are auto-upgraded into non-masked forms:
5715 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5716 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5717 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5718 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5719 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5720 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5721 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5722 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5723 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5724 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5725 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5726 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
5727 case Intrinsic::x86_sse2_pmadd_wd:
5728 case Intrinsic::x86_avx2_pmadd_wd:
5729 case Intrinsic::x86_avx512_pmaddw_d_512:
5730 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5731 case Intrinsic::x86_avx2_pmadd_ub_sw:
5732 case Intrinsic::x86_avx512_pmaddubs_w_512:
5733 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2);
5734 break;
5735
5736 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5737 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5738 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/8);
5739 break;
5740
5741 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5742 case Intrinsic::x86_mmx_pmadd_wd:
5743 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/16);
5744 break;
5745
5746 // AVX Vector Neural Network Instructions: bytes
5747 //
5748 // Multiply and Add Packed Signed and Unsigned Bytes
5749 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
5750 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5751 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
5752 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5753 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
5754 // (<16 x i32>, <64 x i8>, <64 x i8>)
5755 //
5756 // Multiply and Add Unsigned and Signed Bytes With Saturation
5757 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
5758 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5759 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
5760 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5761 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
5762 // (<16 x i32>, <64 x i8>, <64 x i8>)
5763 //
5764 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5765 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5766 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5767 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5768 //
5769 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
5770 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5771 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
5772 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5773 //
5774 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
5775 // (<16 x i32>, <16 x i32>, <16 x i32>)
5776 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
5777 // (<16 x i32>, <16 x i32>, <16 x i32>)
5778 //
5779 // These intrinsics are auto-upgraded into non-masked forms:
5780 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
5781 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5782 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
5783 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5784 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
5785 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5786 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
5787 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5788 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
5789 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5790 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
5791 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5792 //
5793 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
5794 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5795 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
5796 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5797 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
5798 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5799 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
5800 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5801 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
5802 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5803 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
5804 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5805 case Intrinsic::x86_avx512_vpdpbusd_128:
5806 case Intrinsic::x86_avx512_vpdpbusd_256:
5807 case Intrinsic::x86_avx512_vpdpbusd_512:
5808 case Intrinsic::x86_avx512_vpdpbusds_128:
5809 case Intrinsic::x86_avx512_vpdpbusds_256:
5810 case Intrinsic::x86_avx512_vpdpbusds_512:
5811 case Intrinsic::x86_avx2_vpdpbssd_128:
5812 case Intrinsic::x86_avx2_vpdpbssd_256:
5813 case Intrinsic::x86_avx10_vpdpbssd_512:
5814 case Intrinsic::x86_avx2_vpdpbssds_128:
5815 case Intrinsic::x86_avx2_vpdpbssds_256:
5816 case Intrinsic::x86_avx10_vpdpbssds_512:
5817 case Intrinsic::x86_avx2_vpdpbsud_128:
5818 case Intrinsic::x86_avx2_vpdpbsud_256:
5819 case Intrinsic::x86_avx10_vpdpbsud_512:
5820 case Intrinsic::x86_avx2_vpdpbsuds_128:
5821 case Intrinsic::x86_avx2_vpdpbsuds_256:
5822 case Intrinsic::x86_avx10_vpdpbsuds_512:
5823 case Intrinsic::x86_avx2_vpdpbuud_128:
5824 case Intrinsic::x86_avx2_vpdpbuud_256:
5825 case Intrinsic::x86_avx10_vpdpbuud_512:
5826 case Intrinsic::x86_avx2_vpdpbuuds_128:
5827 case Intrinsic::x86_avx2_vpdpbuuds_256:
5828 case Intrinsic::x86_avx10_vpdpbuuds_512:
5829 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/4, /*EltSize=*/8);
5830 break;
5831
5832 // AVX Vector Neural Network Instructions: words
5833 //
5834 // Multiply and Add Signed Word Integers
5835 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
5836 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5837 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
5838 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5839 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
5840 // (<16 x i32>, <16 x i32>, <16 x i32>)
5841 //
5842 // Multiply and Add Signed Word Integers With Saturation
5843 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
5844 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5845 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
5846 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5847 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
5848 // (<16 x i32>, <16 x i32>, <16 x i32>)
5849 //
5850 // These intrinsics are auto-upgraded into non-masked forms:
5851 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
5852 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5853 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
5854 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5855 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
5856 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5857 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
5858 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5859 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
5860 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5861 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
5862 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5863 //
5864 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
5865 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5866 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
5867 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5868 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
5869 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5870 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
5871 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5872 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
5873 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5874 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
5875 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5876 case Intrinsic::x86_avx512_vpdpwssd_128:
5877 case Intrinsic::x86_avx512_vpdpwssd_256:
5878 case Intrinsic::x86_avx512_vpdpwssd_512:
5879 case Intrinsic::x86_avx512_vpdpwssds_128:
5880 case Intrinsic::x86_avx512_vpdpwssds_256:
5881 case Intrinsic::x86_avx512_vpdpwssds_512:
5882 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/16);
5883 break;
5884
5885 // TODO: Dot Product of BF16 Pairs Accumulated Into Packed Single
5886 // Precision
5887 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
5888 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
5889 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
5890 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
5891 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
5892 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
5893 // handleVectorPmaddIntrinsic() currently only handles integer types.
5894
5895 case Intrinsic::x86_sse_cmp_ss:
5896 case Intrinsic::x86_sse2_cmp_sd:
5897 case Intrinsic::x86_sse_comieq_ss:
5898 case Intrinsic::x86_sse_comilt_ss:
5899 case Intrinsic::x86_sse_comile_ss:
5900 case Intrinsic::x86_sse_comigt_ss:
5901 case Intrinsic::x86_sse_comige_ss:
5902 case Intrinsic::x86_sse_comineq_ss:
5903 case Intrinsic::x86_sse_ucomieq_ss:
5904 case Intrinsic::x86_sse_ucomilt_ss:
5905 case Intrinsic::x86_sse_ucomile_ss:
5906 case Intrinsic::x86_sse_ucomigt_ss:
5907 case Intrinsic::x86_sse_ucomige_ss:
5908 case Intrinsic::x86_sse_ucomineq_ss:
5909 case Intrinsic::x86_sse2_comieq_sd:
5910 case Intrinsic::x86_sse2_comilt_sd:
5911 case Intrinsic::x86_sse2_comile_sd:
5912 case Intrinsic::x86_sse2_comigt_sd:
5913 case Intrinsic::x86_sse2_comige_sd:
5914 case Intrinsic::x86_sse2_comineq_sd:
5915 case Intrinsic::x86_sse2_ucomieq_sd:
5916 case Intrinsic::x86_sse2_ucomilt_sd:
5917 case Intrinsic::x86_sse2_ucomile_sd:
5918 case Intrinsic::x86_sse2_ucomigt_sd:
5919 case Intrinsic::x86_sse2_ucomige_sd:
5920 case Intrinsic::x86_sse2_ucomineq_sd:
5921 handleVectorCompareScalarIntrinsic(I);
5922 break;
5923
5924 case Intrinsic::x86_avx_cmp_pd_256:
5925 case Intrinsic::x86_avx_cmp_ps_256:
5926 case Intrinsic::x86_sse2_cmp_pd:
5927 case Intrinsic::x86_sse_cmp_ps:
5928 handleVectorComparePackedIntrinsic(I);
5929 break;
5930
5931 case Intrinsic::x86_bmi_bextr_32:
5932 case Intrinsic::x86_bmi_bextr_64:
5933 case Intrinsic::x86_bmi_bzhi_32:
5934 case Intrinsic::x86_bmi_bzhi_64:
5935 case Intrinsic::x86_bmi_pdep_32:
5936 case Intrinsic::x86_bmi_pdep_64:
5937 case Intrinsic::x86_bmi_pext_32:
5938 case Intrinsic::x86_bmi_pext_64:
5939 handleBmiIntrinsic(I);
5940 break;
5941
5942 case Intrinsic::x86_pclmulqdq:
5943 case Intrinsic::x86_pclmulqdq_256:
5944 case Intrinsic::x86_pclmulqdq_512:
5945 handlePclmulIntrinsic(I);
5946 break;
5947
5948 case Intrinsic::x86_avx_round_pd_256:
5949 case Intrinsic::x86_avx_round_ps_256:
5950 case Intrinsic::x86_sse41_round_pd:
5951 case Intrinsic::x86_sse41_round_ps:
5952 handleRoundPdPsIntrinsic(I);
5953 break;
5954
5955 case Intrinsic::x86_sse41_round_sd:
5956 case Intrinsic::x86_sse41_round_ss:
5957 handleUnarySdSsIntrinsic(I);
5958 break;
5959
5960 case Intrinsic::x86_sse2_max_sd:
5961 case Intrinsic::x86_sse_max_ss:
5962 case Intrinsic::x86_sse2_min_sd:
5963 case Intrinsic::x86_sse_min_ss:
5964 handleBinarySdSsIntrinsic(I);
5965 break;
5966
5967 case Intrinsic::x86_avx_vtestc_pd:
5968 case Intrinsic::x86_avx_vtestc_pd_256:
5969 case Intrinsic::x86_avx_vtestc_ps:
5970 case Intrinsic::x86_avx_vtestc_ps_256:
5971 case Intrinsic::x86_avx_vtestnzc_pd:
5972 case Intrinsic::x86_avx_vtestnzc_pd_256:
5973 case Intrinsic::x86_avx_vtestnzc_ps:
5974 case Intrinsic::x86_avx_vtestnzc_ps_256:
5975 case Intrinsic::x86_avx_vtestz_pd:
5976 case Intrinsic::x86_avx_vtestz_pd_256:
5977 case Intrinsic::x86_avx_vtestz_ps:
5978 case Intrinsic::x86_avx_vtestz_ps_256:
5979 case Intrinsic::x86_avx_ptestc_256:
5980 case Intrinsic::x86_avx_ptestnzc_256:
5981 case Intrinsic::x86_avx_ptestz_256:
5982 case Intrinsic::x86_sse41_ptestc:
5983 case Intrinsic::x86_sse41_ptestnzc:
5984 case Intrinsic::x86_sse41_ptestz:
5985 handleVtestIntrinsic(I);
5986 break;
5987
5988 // Packed Horizontal Add/Subtract
5989 case Intrinsic::x86_ssse3_phadd_w:
5990 case Intrinsic::x86_ssse3_phadd_w_128:
5991 case Intrinsic::x86_avx2_phadd_w:
5992 case Intrinsic::x86_ssse3_phsub_w:
5993 case Intrinsic::x86_ssse3_phsub_w_128:
5994 case Intrinsic::x86_avx2_phsub_w: {
5995 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/16);
5996 break;
5997 }
5998
5999 // Packed Horizontal Add/Subtract
6000 case Intrinsic::x86_ssse3_phadd_d:
6001 case Intrinsic::x86_ssse3_phadd_d_128:
6002 case Intrinsic::x86_avx2_phadd_d:
6003 case Intrinsic::x86_ssse3_phsub_d:
6004 case Intrinsic::x86_ssse3_phsub_d_128:
6005 case Intrinsic::x86_avx2_phsub_d: {
6006 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/32);
6007 break;
6008 }
6009
6010 // Packed Horizontal Add/Subtract and Saturate
6011 case Intrinsic::x86_ssse3_phadd_sw:
6012 case Intrinsic::x86_ssse3_phadd_sw_128:
6013 case Intrinsic::x86_avx2_phadd_sw:
6014 case Intrinsic::x86_ssse3_phsub_sw:
6015 case Intrinsic::x86_ssse3_phsub_sw_128:
6016 case Intrinsic::x86_avx2_phsub_sw: {
6017 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/16);
6018 break;
6019 }
6020
6021 // Packed Single/Double Precision Floating-Point Horizontal Add
6022 case Intrinsic::x86_sse3_hadd_ps:
6023 case Intrinsic::x86_sse3_hadd_pd:
6024 case Intrinsic::x86_avx_hadd_pd_256:
6025 case Intrinsic::x86_avx_hadd_ps_256:
6026 case Intrinsic::x86_sse3_hsub_ps:
6027 case Intrinsic::x86_sse3_hsub_pd:
6028 case Intrinsic::x86_avx_hsub_pd_256:
6029 case Intrinsic::x86_avx_hsub_ps_256: {
6030 handlePairwiseShadowOrIntrinsic(I);
6031 break;
6032 }
6033
6034 case Intrinsic::x86_avx_maskstore_ps:
6035 case Intrinsic::x86_avx_maskstore_pd:
6036 case Intrinsic::x86_avx_maskstore_ps_256:
6037 case Intrinsic::x86_avx_maskstore_pd_256:
6038 case Intrinsic::x86_avx2_maskstore_d:
6039 case Intrinsic::x86_avx2_maskstore_q:
6040 case Intrinsic::x86_avx2_maskstore_d_256:
6041 case Intrinsic::x86_avx2_maskstore_q_256: {
6042 handleAVXMaskedStore(I);
6043 break;
6044 }
6045
6046 case Intrinsic::x86_avx_maskload_ps:
6047 case Intrinsic::x86_avx_maskload_pd:
6048 case Intrinsic::x86_avx_maskload_ps_256:
6049 case Intrinsic::x86_avx_maskload_pd_256:
6050 case Intrinsic::x86_avx2_maskload_d:
6051 case Intrinsic::x86_avx2_maskload_q:
6052 case Intrinsic::x86_avx2_maskload_d_256:
6053 case Intrinsic::x86_avx2_maskload_q_256: {
6054 handleAVXMaskedLoad(I);
6055 break;
6056 }
6057
6058 // Packed
6059 case Intrinsic::x86_avx512fp16_add_ph_512:
6060 case Intrinsic::x86_avx512fp16_sub_ph_512:
6061 case Intrinsic::x86_avx512fp16_mul_ph_512:
6062 case Intrinsic::x86_avx512fp16_div_ph_512:
6063 case Intrinsic::x86_avx512fp16_max_ph_512:
6064 case Intrinsic::x86_avx512fp16_min_ph_512:
6065 case Intrinsic::x86_avx512_min_ps_512:
6066 case Intrinsic::x86_avx512_min_pd_512:
6067 case Intrinsic::x86_avx512_max_ps_512:
6068 case Intrinsic::x86_avx512_max_pd_512: {
6069 // These AVX512 variants contain the rounding mode as a trailing flag.
6070 // Earlier variants do not have a trailing flag and are already handled
6071 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6072 // maybeHandleUnknownIntrinsic.
6073 [[maybe_unused]] bool Success =
6074 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6075 assert(Success);
6076 break;
6077 }
6078
6079 case Intrinsic::x86_avx_vpermilvar_pd:
6080 case Intrinsic::x86_avx_vpermilvar_pd_256:
6081 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6082 case Intrinsic::x86_avx_vpermilvar_ps:
6083 case Intrinsic::x86_avx_vpermilvar_ps_256:
6084 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6085 handleAVXVpermilvar(I);
6086 break;
6087 }
6088
6089 case Intrinsic::x86_avx512_vpermi2var_d_128:
6090 case Intrinsic::x86_avx512_vpermi2var_d_256:
6091 case Intrinsic::x86_avx512_vpermi2var_d_512:
6092 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6093 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6094 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6095 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6096 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6097 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6098 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6099 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6100 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6101 case Intrinsic::x86_avx512_vpermi2var_q_128:
6102 case Intrinsic::x86_avx512_vpermi2var_q_256:
6103 case Intrinsic::x86_avx512_vpermi2var_q_512:
6104 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6105 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6106 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6107 handleAVXVpermi2var(I);
6108 break;
6109
6110 // Packed Shuffle
6111 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6112 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6113 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6114 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6115 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6116 //
6117 // The following intrinsics are auto-upgraded:
6118 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6119 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6120 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6121 case Intrinsic::x86_avx2_pshuf_b:
6122 case Intrinsic::x86_sse_pshuf_w:
6123 case Intrinsic::x86_ssse3_pshuf_b_128:
6124 case Intrinsic::x86_ssse3_pshuf_b:
6125 case Intrinsic::x86_avx512_pshuf_b_512:
6126 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6127 /*trailingVerbatimArgs=*/1);
6128 break;
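// Conceptually (a sketch, not literal output), for
//   %r = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %a, <16 x i8> %idx)
// handleIntrinsicByApplyingToShadow() re-invokes the same intrinsic on the
// shadow of %a, passing the trailing %idx argument through verbatim:
//   %sr = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %sa, <16 x i8> %idx)
// so the result shadow is shuffled exactly like the data.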
6129
6130 // AVX512 PMOV: Packed MOV, with truncation
6131 // Precisely handled by applying the same intrinsic to the shadow
6132 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6133 case Intrinsic::x86_avx512_mask_pmov_db_512:
6134 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6135 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6136 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6137 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6138 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6139 /*trailingVerbatimArgs=*/1);
6140 break;
6141 }
6142
6143 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6144 // Approximately handled using the corresponding truncation intrinsic
6145 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6146 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6147 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6148 handleIntrinsicByApplyingToShadow(I,
6149 Intrinsic::x86_avx512_mask_pmov_dw_512,
6150 /* trailingVerbatimArgs=*/1);
6151 break;
6152 }
6153
6154 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6155 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6156 handleIntrinsicByApplyingToShadow(I,
6157 Intrinsic::x86_avx512_mask_pmov_db_512,
6158 /* trailingVerbatimArgs=*/1);
6159 break;
6160 }
6161
6162 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6163 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6164 handleIntrinsicByApplyingToShadow(I,
6165 Intrinsic::x86_avx512_mask_pmov_qb_512,
6166 /* trailingVerbatimArgs=*/1);
6167 break;
6168 }
6169
6170 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6171 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6172 handleIntrinsicByApplyingToShadow(I,
6173 Intrinsic::x86_avx512_mask_pmov_qw_512,
6174 /* trailingVerbatimArgs=*/1);
6175 break;
6176 }
6177
6178 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6179 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6180 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6181 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6182 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6183 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6184 // slow-path handler.
6185 handleAVX512VectorDownConvert(I);
6186 break;
6187 }
6188
6189 // AVX512/AVX10 Reciprocal Square Root
6190 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6191 // (<16 x float>, <16 x float>, i16)
6192 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6193 // (<8 x float>, <8 x float>, i8)
6194 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6195 // (<4 x float>, <4 x float>, i8)
6196 //
6197 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6198 // (<8 x double>, <8 x double>, i8)
6199 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6200 // (<4 x double>, <4 x double>, i8)
6201 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6202 // (<2 x double>, <2 x double>, i8)
6203 //
6204 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6205 // (<32 x bfloat>, <32 x bfloat>, i32)
6206 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6207 // (<16 x bfloat>, <16 x bfloat>, i16)
6208 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6209 // (<8 x bfloat>, <8 x bfloat>, i8)
6210 //
6211 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6212 // (<32 x half>, <32 x half>, i32)
6213 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6214 // (<16 x half>, <16 x half>, i16)
6215 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6216 // (<8 x half>, <8 x half>, i8)
6217 //
6218 // TODO: 3-operand variants are not handled:
6219 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6220 // (<2 x double>, <2 x double>, <2 x double>, i8)
6221 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6222 // (<4 x float>, <4 x float>, <4 x float>, i8)
6223 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6224 // (<8 x half>, <8 x half>, <8 x half>, i8)
6225 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6226 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6227 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6228 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6229 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6230 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6231 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6232 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6233 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6234 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6235 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6236 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6237 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6238 /*MaskIndex=*/2);
6239 break;
6240
6241 // AVX512/AVX10 Reciprocal
6242 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6243 // (<16 x float>, <16 x float>, i16)
6244 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6245 // (<8 x float>, <8 x float>, i8)
6246 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6247 // (<4 x float>, <4 x float>, i8)
6248 //
6249 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6250 // (<8 x double>, <8 x double>, i8)
6251 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6252 // (<4 x double>, <4 x double>, i8)
6253 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6254 // (<2 x double>, <2 x double>, i8)
6255 //
6256 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6257 // (<32 x bfloat>, <32 x bfloat>, i32)
6258 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6259 // (<16 x bfloat>, <16 x bfloat>, i16)
6260 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6261 // (<8 x bfloat>, <8 x bfloat>, i8)
6262 //
6263 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6264 // (<32 x half>, <32 x half>, i32)
6265 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6266 // (<16 x half>, <16 x half>, i16)
6267 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6268 // (<8 x half>, <8 x half>, i8)
6269 //
6270 // TODO: 3-operand variants are not handled:
6271 // <2 x double> @llvm.x86.avx512.rcp14.sd
6272 // (<2 x double>, <2 x double>, <2 x double>, i8)
6273 // <4 x float> @llvm.x86.avx512.rcp14.ss
6274 // (<4 x float>, <4 x float>, <4 x float>, i8)
6275 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6276 // (<8 x half>, <8 x half>, <8 x half>, i8)
6277 case Intrinsic::x86_avx512_rcp14_ps_512:
6278 case Intrinsic::x86_avx512_rcp14_ps_256:
6279 case Intrinsic::x86_avx512_rcp14_ps_128:
6280 case Intrinsic::x86_avx512_rcp14_pd_512:
6281 case Intrinsic::x86_avx512_rcp14_pd_256:
6282 case Intrinsic::x86_avx512_rcp14_pd_128:
6283 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6284 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6285 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6286 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6287 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6288 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6289 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6290 /*MaskIndex=*/2);
6291 break;
6292
6293 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6294 // (<32 x half>, i32, <32 x half>, i32, i32)
6295 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6296 // (<16 x half>, i32, <16 x half>, i32, i16)
6297 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6298 // (<8 x half>, i32, <8 x half>, i32, i8)
6299 //
6300 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6301 // (<16 x float>, i32, <16 x float>, i16, i32)
6302 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6303 // (<8 x float>, i32, <8 x float>, i8)
6304 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6305 // (<4 x float>, i32, <4 x float>, i8)
6306 //
6307 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6308 // (<8 x double>, i32, <8 x double>, i8, i32)
6309 // A Imm WriteThru Mask Rounding
6310 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6311 // (<4 x double>, i32, <4 x double>, i8)
6312 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6313 // (<2 x double>, i32, <2 x double>, i8)
6314 // A Imm WriteThru Mask
6315 //
6316 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6317 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6318 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6319 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6320 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6321 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6322 //
6323 // Not supported: three vectors
6324 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6325 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6326 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6327 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6328 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6329 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6330 // i32)
6331 // A B WriteThru Mask Imm
6332 // Rounding
6333 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6334 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6335 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6336 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6337 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6338 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6339 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6340 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6341 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6342 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6343 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6344 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6345 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6346 /*MaskIndex=*/3);
6347 break;
6348
6349 // AVX512 FP16 Arithmetic
6350 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6351 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6352 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6353 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6354 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6355 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6356 visitGenericScalarHalfwordInst(I);
6357 break;
6358 }
6359
6360 // AVX Galois Field New Instructions
6361 case Intrinsic::x86_vgf2p8affineqb_128:
6362 case Intrinsic::x86_vgf2p8affineqb_256:
6363 case Intrinsic::x86_vgf2p8affineqb_512:
6364 handleAVXGF2P8Affine(I);
6365 break;
6366
6367 default:
6368 return false;
6369 }
6370
6371 return true;
6372 }
6373
6374 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6375 switch (I.getIntrinsicID()) {
6376 case Intrinsic::aarch64_neon_rshrn:
6377 case Intrinsic::aarch64_neon_sqrshl:
6378 case Intrinsic::aarch64_neon_sqrshrn:
6379 case Intrinsic::aarch64_neon_sqrshrun:
6380 case Intrinsic::aarch64_neon_sqshl:
6381 case Intrinsic::aarch64_neon_sqshlu:
6382 case Intrinsic::aarch64_neon_sqshrn:
6383 case Intrinsic::aarch64_neon_sqshrun:
6384 case Intrinsic::aarch64_neon_srshl:
6385 case Intrinsic::aarch64_neon_sshl:
6386 case Intrinsic::aarch64_neon_uqrshl:
6387 case Intrinsic::aarch64_neon_uqrshrn:
6388 case Intrinsic::aarch64_neon_uqshl:
6389 case Intrinsic::aarch64_neon_uqshrn:
6390 case Intrinsic::aarch64_neon_urshl:
6391 case Intrinsic::aarch64_neon_ushl:
6392 // Not handled here: aarch64_neon_vsli (vector shift left and insert)
6393 handleVectorShiftIntrinsic(I, /* Variable */ false);
6394 break;
6395
6396 // TODO: handling max/min similarly to AND/OR may be more precise
6397 // Floating-Point Maximum/Minimum Pairwise
6398 case Intrinsic::aarch64_neon_fmaxp:
6399 case Intrinsic::aarch64_neon_fminp:
6400 // Floating-Point Maximum/Minimum Number Pairwise
6401 case Intrinsic::aarch64_neon_fmaxnmp:
6402 case Intrinsic::aarch64_neon_fminnmp:
6403 // Signed/Unsigned Maximum/Minimum Pairwise
6404 case Intrinsic::aarch64_neon_smaxp:
6405 case Intrinsic::aarch64_neon_sminp:
6406 case Intrinsic::aarch64_neon_umaxp:
6407 case Intrinsic::aarch64_neon_uminp:
6408 // Add Pairwise
6409 case Intrinsic::aarch64_neon_addp:
6410 // Floating-point Add Pairwise
6411 case Intrinsic::aarch64_neon_faddp:
6412 // Add Long Pairwise
6413 case Intrinsic::aarch64_neon_saddlp:
6414 case Intrinsic::aarch64_neon_uaddlp: {
6415 handlePairwiseShadowOrIntrinsic(I);
6416 break;
6417 }
6418
6419 // Floating-point Convert to integer, rounding to nearest with ties to Away
6420 case Intrinsic::aarch64_neon_fcvtas:
6421 case Intrinsic::aarch64_neon_fcvtau:
6422 // Floating-point convert to integer, rounding toward minus infinity
6423 case Intrinsic::aarch64_neon_fcvtms:
6424 case Intrinsic::aarch64_neon_fcvtmu:
6425 // Floating-point convert to integer, rounding to nearest with ties to even
6426 case Intrinsic::aarch64_neon_fcvtns:
6427 case Intrinsic::aarch64_neon_fcvtnu:
6428 // Floating-point convert to integer, rounding toward plus infinity
6429 case Intrinsic::aarch64_neon_fcvtps:
6430 case Intrinsic::aarch64_neon_fcvtpu:
6431 // Floating-point Convert to integer, rounding toward Zero
6432 case Intrinsic::aarch64_neon_fcvtzs:
6433 case Intrinsic::aarch64_neon_fcvtzu:
6434 // Floating-point convert to lower precision narrow, rounding to odd
6435 case Intrinsic::aarch64_neon_fcvtxn: {
6436 handleNEONVectorConvertIntrinsic(I);
6437 break;
6438 }
6439
6440 // Add reduction to scalar
6441 case Intrinsic::aarch64_neon_faddv:
6442 case Intrinsic::aarch64_neon_saddv:
6443 case Intrinsic::aarch64_neon_uaddv:
6444 // Signed/Unsigned min/max (Vector)
6445 // TODO: handling similarly to AND/OR may be more precise.
6446 case Intrinsic::aarch64_neon_smaxv:
6447 case Intrinsic::aarch64_neon_sminv:
6448 case Intrinsic::aarch64_neon_umaxv:
6449 case Intrinsic::aarch64_neon_uminv:
6450 // Floating-point min/max (vector)
6451 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6452 // but our shadow propagation is the same.
6453 case Intrinsic::aarch64_neon_fmaxv:
6454 case Intrinsic::aarch64_neon_fminv:
6455 case Intrinsic::aarch64_neon_fmaxnmv:
6456 case Intrinsic::aarch64_neon_fminnmv:
6457 // Sum long across vector
6458 case Intrinsic::aarch64_neon_saddlv:
6459 case Intrinsic::aarch64_neon_uaddlv:
6460 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6461 break;
6462
6463 case Intrinsic::aarch64_neon_ld1x2:
6464 case Intrinsic::aarch64_neon_ld1x3:
6465 case Intrinsic::aarch64_neon_ld1x4:
6466 case Intrinsic::aarch64_neon_ld2:
6467 case Intrinsic::aarch64_neon_ld3:
6468 case Intrinsic::aarch64_neon_ld4:
6469 case Intrinsic::aarch64_neon_ld2r:
6470 case Intrinsic::aarch64_neon_ld3r:
6471 case Intrinsic::aarch64_neon_ld4r: {
6472 handleNEONVectorLoad(I, /*WithLane=*/false);
6473 break;
6474 }
6475
6476 case Intrinsic::aarch64_neon_ld2lane:
6477 case Intrinsic::aarch64_neon_ld3lane:
6478 case Intrinsic::aarch64_neon_ld4lane: {
6479 handleNEONVectorLoad(I, /*WithLane=*/true);
6480 break;
6481 }
6482
6483 // Saturating extract narrow
6484 case Intrinsic::aarch64_neon_sqxtn:
6485 case Intrinsic::aarch64_neon_sqxtun:
6486 case Intrinsic::aarch64_neon_uqxtn:
6487 // These only have one argument, but we (ab)use handleShadowOr because it
6488 // does work on single argument intrinsics and will typecast the shadow
6489 // (and update the origin).
6490 handleShadowOr(I);
6491 break;
6492
6493 case Intrinsic::aarch64_neon_st1x2:
6494 case Intrinsic::aarch64_neon_st1x3:
6495 case Intrinsic::aarch64_neon_st1x4:
6496 case Intrinsic::aarch64_neon_st2:
6497 case Intrinsic::aarch64_neon_st3:
6498 case Intrinsic::aarch64_neon_st4: {
6499 handleNEONVectorStoreIntrinsic(I, false);
6500 break;
6501 }
6502
6503 case Intrinsic::aarch64_neon_st2lane:
6504 case Intrinsic::aarch64_neon_st3lane:
6505 case Intrinsic::aarch64_neon_st4lane: {
6506 handleNEONVectorStoreIntrinsic(I, true);
6507 break;
6508 }
6509
6510 // Arm NEON vector table intrinsics have the source/table register(s) as
6511 // arguments, followed by the index register. They return the output.
6512 //
6513 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6514 // original value unchanged in the destination register.'
6515 // Conveniently, zero denotes a clean shadow, which means out-of-range
6516 // indices for TBL will initialize the user data with zero and also clean
6517 // the shadow. (For TBX, neither the user data nor the shadow will be
6518 // updated, which is also correct.)
6519 case Intrinsic::aarch64_neon_tbl1:
6520 case Intrinsic::aarch64_neon_tbl2:
6521 case Intrinsic::aarch64_neon_tbl3:
6522 case Intrinsic::aarch64_neon_tbl4:
6523 case Intrinsic::aarch64_neon_tbx1:
6524 case Intrinsic::aarch64_neon_tbx2:
6525 case Intrinsic::aarch64_neon_tbx3:
6526 case Intrinsic::aarch64_neon_tbx4: {
6527 // The last trailing argument (index register) should be handled verbatim
6528 handleIntrinsicByApplyingToShadow(
6529 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6530 /*trailingVerbatimArgs*/ 1);
6531 break;
6532 }
6533
6534 case Intrinsic::aarch64_neon_fmulx:
6535 case Intrinsic::aarch64_neon_pmul:
6536 case Intrinsic::aarch64_neon_pmull:
6537 case Intrinsic::aarch64_neon_smull:
6538 case Intrinsic::aarch64_neon_pmull64:
6539 case Intrinsic::aarch64_neon_umull: {
6540 handleNEONVectorMultiplyIntrinsic(I);
6541 break;
6542 }
6543
6544 default:
6545 return false;
6546 }
6547
6548 return true;
6549 }
6550
6551 void visitIntrinsicInst(IntrinsicInst &I) {
6552 if (maybeHandleCrossPlatformIntrinsic(I))
6553 return;
6554
6555 if (maybeHandleX86SIMDIntrinsic(I))
6556 return;
6557
6558 if (maybeHandleArmSIMDIntrinsic(I))
6559 return;
6560
6561 if (maybeHandleUnknownIntrinsic(I))
6562 return;
6563
6564 visitInstruction(I);
6565 }
6566
6567 void visitLibAtomicLoad(CallBase &CB) {
6568 // Since we use getNextNode here, we can't have CB terminate the BB.
6569 assert(isa<CallInst>(CB));
6570
6571 IRBuilder<> IRB(&CB);
6572 Value *Size = CB.getArgOperand(0);
6573 Value *SrcPtr = CB.getArgOperand(1);
6574 Value *DstPtr = CB.getArgOperand(2);
6575 Value *Ordering = CB.getArgOperand(3);
6576 // Convert the call to have at least Acquire ordering to make sure
6577 // the shadow operations aren't reordered before it.
6578 Value *NewOrdering =
6579 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
6580 CB.setArgOperand(3, NewOrdering);
6581
6582 NextNodeIRBuilder NextIRB(&CB);
6583 Value *SrcShadowPtr, *SrcOriginPtr;
6584 std::tie(SrcShadowPtr, SrcOriginPtr) =
6585 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6586 /*isStore*/ false);
6587 Value *DstShadowPtr =
6588 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6589 /*isStore*/ true)
6590 .first;
6591
6592 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
6593 if (MS.TrackOrigins) {
6594 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
6595 kMinOriginAlignment);
6596 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
6597 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
6598 }
6599 }
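// For reference, the generic libatomic entry point assumed by the argument
// indices above is approximately
//   void __atomic_load(size_t size, void *src, void *ret, int memorder);
// hence arg 0 is the byte count, arg 1 the source, arg 2 the destination
// buffer, and arg 3 the ordering that gets strengthened to at least Acquire.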
6600
6601 void visitLibAtomicStore(CallBase &CB) {
6602 IRBuilder<> IRB(&CB);
6603 Value *Size = CB.getArgOperand(0);
6604 Value *DstPtr = CB.getArgOperand(2);
6605 Value *Ordering = CB.getArgOperand(3);
6606 // Convert the call to have at least Release ordering to make sure
6607 // the shadow operations aren't reordered after it.
6608 Value *NewOrdering =
6609 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
6610 CB.setArgOperand(3, NewOrdering);
6611
6612 Value *DstShadowPtr =
6613 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
6614 /*isStore*/ true)
6615 .first;
6616
6617 // Atomic store always paints clean shadow/origin. See file header.
6618 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
6619 Align(1));
6620 }
6621
6622 void visitCallBase(CallBase &CB) {
6623 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
6624 if (CB.isInlineAsm()) {
6625 // For inline asm (either a call to asm function, or callbr instruction),
6626 // do the usual thing: check argument shadow and mark all outputs as
6627 // clean. Note that any side effects of the inline asm that are not
6628 // immediately visible in its constraints are not handled.
6629 if (ClHandleAsmConservative)
6630 visitAsmInstruction(CB);
6631 else
6632 visitInstruction(CB);
6633 return;
6634 }
6635 LibFunc LF;
6636 if (TLI->getLibFunc(CB, LF)) {
6637 // libatomic.a functions need to have special handling because there isn't
6638 // a good way to intercept them or compile the library with
6639 // instrumentation.
6640 switch (LF) {
6641 case LibFunc_atomic_load:
6642 if (!isa<CallInst>(CB)) {
6643 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load. "
6644 "Ignoring!\n";
6645 break;
6646 }
6647 visitLibAtomicLoad(CB);
6648 return;
6649 case LibFunc_atomic_store:
6650 visitLibAtomicStore(CB);
6651 return;
6652 default:
6653 break;
6654 }
6655 }
6656
6657 if (auto *Call = dyn_cast<CallInst>(&CB)) {
6658 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
6659
6660 // We are going to insert code that relies on the fact that the callee
6661 // will become a non-readonly function after it is instrumented by us. To
6662 // prevent this code from being optimized out, mark that function
6663 // non-readonly in advance.
6664 // TODO: We can likely do better than dropping memory() completely here.
6665 AttributeMask B;
6666 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
6667
6668 Call->removeFnAttrs(B);
6669 if (Function *Func = Call->getCalledFunction()) {
6670 Func->removeFnAttrs(B);
6671 }
6672
6673 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
6674 }
6675 IRBuilder<> IRB(&CB);
6676 bool MayCheckCall = MS.EagerChecks;
6677 if (Function *Func = CB.getCalledFunction()) {
6678 // __sanitizer_unaligned_{load,store} functions may be called by users
6679 // and always expect shadows in TLS, so don't check them.
6680 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
6681 }
6682
6683 unsigned ArgOffset = 0;
6684 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
6685 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
6686 if (!A->getType()->isSized()) {
6687 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
6688 continue;
6689 }
6690
6691 if (A->getType()->isScalableTy()) {
6692 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
6693 // Handle as noundef, but don't reserve tls slots.
6694 insertCheckShadowOf(A, &CB);
6695 continue;
6696 }
6697
6698 unsigned Size = 0;
6699 const DataLayout &DL = F.getDataLayout();
6700
6701 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
6702 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
6703 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
6704
6705 if (EagerCheck) {
6706 insertCheckShadowOf(A, &CB);
6707 Size = DL.getTypeAllocSize(A->getType());
6708 } else {
6709 [[maybe_unused]] Value *Store = nullptr;
6710 // Compute the Shadow for arg even if it is ByVal, because
6711 // in that case getShadow() will copy the actual arg shadow to
6712 // __msan_param_tls.
6713 Value *ArgShadow = getShadow(A);
6714 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
6715 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
6716 << " Shadow: " << *ArgShadow << "\n");
6717 if (ByVal) {
6718 // ByVal requires some special handling as it's too big for a single
6719 // load
6720 assert(A->getType()->isPointerTy() &&
6721 "ByVal argument is not a pointer!");
6722 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
6723 if (ArgOffset + Size > kParamTLSSize)
6724 break;
6725 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
6726 MaybeAlign Alignment = std::nullopt;
6727 if (ParamAlignment)
6728 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
6729 Value *AShadowPtr, *AOriginPtr;
6730 std::tie(AShadowPtr, AOriginPtr) =
6731 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
6732 /*isStore*/ false);
6733 if (!PropagateShadow) {
6734 Store = IRB.CreateMemSet(ArgShadowBase,
6735 Constant::getNullValue(IRB.getInt8Ty()),
6736 Size, Alignment);
6737 } else {
6738 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
6739 Alignment, Size);
6740 if (MS.TrackOrigins) {
6741 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
6742 // FIXME: OriginSize should be:
6743 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
6744 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
6745 IRB.CreateMemCpy(
6746 ArgOriginBase,
6747 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
6748 AOriginPtr,
6749 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
6750 }
6751 }
6752 } else {
6753 // Any other parameters mean we need bit-grained tracking of uninit
6754 // data
6755 Size = DL.getTypeAllocSize(A->getType());
6756 if (ArgOffset + Size > kParamTLSSize)
6757 break;
6758 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
6759 kShadowTLSAlignment);
6760 Constant *Cst = dyn_cast<Constant>(ArgShadow);
6761 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
6762 IRB.CreateStore(getOrigin(A),
6763 getOriginPtrForArgument(IRB, ArgOffset));
6764 }
6765 }
6766 assert(Store != nullptr);
6767 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
6768 }
6769 assert(Size != 0);
6770 ArgOffset += alignTo(Size, kShadowTLSAlignment);
6771 }
6772 LLVM_DEBUG(dbgs() << " done with call args\n");
6773
6774 FunctionType *FT = CB.getFunctionType();
6775 if (FT->isVarArg()) {
6776 VAHelper->visitCallBase(CB, IRB);
6777 }
6778
6779 // Now, get the shadow for the RetVal.
6780 if (!CB.getType()->isSized())
6781 return;
6782 // Don't emit the epilogue for musttail call returns.
6783 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
6784 return;
6785
6786 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
6787 setShadow(&CB, getCleanShadow(&CB));
6788 setOrigin(&CB, getCleanOrigin());
6789 return;
6790 }
6791
6792 IRBuilder<> IRBBefore(&CB);
6793 // Until we have full dynamic coverage, make sure the retval shadow is 0.
6794 Value *Base = getShadowPtrForRetval(IRBBefore);
6795 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
6796 kShadowTLSAlignment);
6797 BasicBlock::iterator NextInsn;
6798 if (isa<CallInst>(CB)) {
6799 NextInsn = ++CB.getIterator();
6800 assert(NextInsn != CB.getParent()->end());
6801 } else {
6802 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
6803 if (!NormalDest->getSinglePredecessor()) {
6804 // FIXME: this case is tricky, so we are just conservative here.
6805 // Perhaps we need to split the edge between this BB and NormalDest,
6806 // but a naive attempt to use SplitEdge leads to a crash.
6807 setShadow(&CB, getCleanShadow(&CB));
6808 setOrigin(&CB, getCleanOrigin());
6809 return;
6810 }
6811 // FIXME: NextInsn is likely in a basic block that has not been visited
6812 // yet. Anything inserted there will be instrumented by MSan later!
6813 NextInsn = NormalDest->getFirstInsertionPt();
6814 assert(NextInsn != NormalDest->end() &&
6815 "Could not find insertion point for retval shadow load");
6816 }
6817 IRBuilder<> IRBAfter(&*NextInsn);
6818 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
6819 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
6820 "_msret");
6821 setShadow(&CB, RetvalShadow);
6822 if (MS.TrackOrigins)
6823 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
6824 }
6825
6826 bool isAMustTailRetVal(Value *RetVal) {
6827 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
6828 RetVal = I->getOperand(0);
6829 }
6830 if (auto *I = dyn_cast<CallInst>(RetVal)) {
6831 return I->isMustTailCall();
6832 }
6833 return false;
6834 }
6835
6836 void visitReturnInst(ReturnInst &I) {
6837 IRBuilder<> IRB(&I);
6838 Value *RetVal = I.getReturnValue();
6839 if (!RetVal)
6840 return;
6841 // Don't emit the epilogue for musttail call returns.
6842 if (isAMustTailRetVal(RetVal))
6843 return;
6844 Value *ShadowPtr = getShadowPtrForRetval(IRB);
6845 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
6846 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
6847 // FIXME: Consider using SpecialCaseList to specify a list of functions that
6848 // must always return fully initialized values. For now, we hardcode "main".
6849 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
6850
6851 Value *Shadow = getShadow(RetVal);
6852 bool StoreOrigin = true;
6853 if (EagerCheck) {
6854 insertCheckShadowOf(RetVal, &I);
6855 Shadow = getCleanShadow(RetVal);
6856 StoreOrigin = false;
6857 }
6858
6859 // The caller may still expect information passed over TLS if we pass our
6860 // check
6861 if (StoreShadow) {
6862 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
6863 if (MS.TrackOrigins && StoreOrigin)
6864 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
6865 }
6866 }
6867
6868 void visitPHINode(PHINode &I) {
6869 IRBuilder<> IRB(&I);
6870 if (!PropagateShadow) {
6871 setShadow(&I, getCleanShadow(&I));
6872 setOrigin(&I, getCleanOrigin());
6873 return;
6874 }
6875
6876 ShadowPHINodes.push_back(&I);
6877 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
6878 "_msphi_s"));
6879 if (MS.TrackOrigins)
6880 setOrigin(
6881 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
6882 }
6883
6884 Value *getLocalVarIdptr(AllocaInst &I) {
6885 ConstantInt *IntConst =
6886 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
6887 return new GlobalVariable(*F.getParent(), IntConst->getType(),
6888 /*isConstant=*/false, GlobalValue::PrivateLinkage,
6889 IntConst);
6890 }
6891
6892 Value *getLocalVarDescription(AllocaInst &I) {
6893 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
6894 }
6895
6896 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
6897 if (PoisonStack && ClPoisonStackWithCall) {
6898 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
6899 } else {
6900 Value *ShadowBase, *OriginBase;
6901 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
6902 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
6903
6904 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
6905 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
6906 }
6907
6908 if (PoisonStack && MS.TrackOrigins) {
6909 Value *Idptr = getLocalVarIdptr(I);
6910 if (ClPrintStackNames) {
6911 Value *Descr = getLocalVarDescription(I);
6912 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
6913 {&I, Len, Idptr, Descr});
6914 } else {
6915 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
6916 }
6917 }
6918 }
6919
6920 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
6921 Value *Descr = getLocalVarDescription(I);
6922 if (PoisonStack) {
6923 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
6924 } else {
6925 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
6926 }
6927 }
6928
6929 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
6930 if (!InsPoint)
6931 InsPoint = &I;
6932 NextNodeIRBuilder IRB(InsPoint);
6933 const DataLayout &DL = F.getDataLayout();
6934 TypeSize TS = DL.getTypeAllocSize(I.getAllocatedType());
6935 Value *Len = IRB.CreateTypeSize(MS.IntptrTy, TS);
6936 if (I.isArrayAllocation())
6937 Len = IRB.CreateMul(Len,
6938 IRB.CreateZExtOrTrunc(I.getArraySize(), MS.IntptrTy));
6939
6940 if (MS.CompileKernel)
6941 poisonAllocaKmsan(I, IRB, Len);
6942 else
6943 poisonAllocaUserspace(I, IRB, Len);
6944 }
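// For example (a sketch of the logic above): for `%buf = alloca i32, i64 %n`
// Len becomes 4 * zext(%n) bytes, and depending on the compile mode and the
// poison-stack flags either a runtime callback (MS.MsanPoisonStackFn, or the
// KMSAN alloca hooks) or an inline memset over the alloca's shadow is
// emitted.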
6945
6946 void visitAllocaInst(AllocaInst &I) {
6947 setShadow(&I, getCleanShadow(&I));
6948 setOrigin(&I, getCleanOrigin());
6949 // We'll get to this alloca later unless it's poisoned at the corresponding
6950 // llvm.lifetime.start.
6951 AllocaSet.insert(&I);
6952 }
6953
6954 void visitSelectInst(SelectInst &I) {
6955 // a = select b, c, d
6956 Value *B = I.getCondition();
6957 Value *C = I.getTrueValue();
6958 Value *D = I.getFalseValue();
6959
6960 handleSelectLikeInst(I, B, C, D);
6961 }
6962
6963 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
6964 IRBuilder<> IRB(&I);
6965
6966 Value *Sb = getShadow(B);
6967 Value *Sc = getShadow(C);
6968 Value *Sd = getShadow(D);
6969
6970 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
6971 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
6972 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
6973
6974 // Result shadow if condition shadow is 0.
6975 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
6976 Value *Sa1;
6977 if (I.getType()->isAggregateType()) {
6978 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
6979 // an extra "select". This results in much more compact IR.
6980 // Sa = select Sb, poisoned, (select b, Sc, Sd)
6981 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
6982 } else {
6983 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
6984 // If Sb (condition is poisoned), look for bits in c and d that are equal
6985 // and both unpoisoned.
6986 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
6987
6988 // Cast arguments to shadow-compatible type.
6989 C = CreateAppToShadowCast(IRB, C);
6990 D = CreateAppToShadowCast(IRB, D);
6991
6992 // Result shadow if condition shadow is 1.
6993 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
6994 }
6995 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
6996 setShadow(&I, Sa);
6997 if (MS.TrackOrigins) {
6998 // Origins are always i32, so any vector conditions must be flattened.
6999 // FIXME: consider tracking vector origins for app vectors?
7000 if (B->getType()->isVectorTy()) {
7001 B = convertToBool(B, IRB);
7002 Sb = convertToBool(Sb, IRB);
7003 }
7004 // a = select b, c, d
7005 // Oa = Sb ? Ob : (b ? Oc : Od)
7006 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7007 }
7008 }
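// A worked example of the non-aggregate formula above (illustrative numbers
// only): with c = 0b1010, d = 0b1110, Sc = 0b0001, Sd = 0b0000 and a
// poisoned condition (Sb = 1):
//   Sa1 = (c ^ d) | Sc | Sd = 0b0100 | 0b0001 | 0b0000 = 0b0101
// Only the bits where c and d agree and are unpoisoned in both inputs remain
// clean, since those bits do not depend on the unknown condition.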
7009
7010 void visitLandingPadInst(LandingPadInst &I) {
7011 // Do nothing.
7012 // See https://github.com/google/sanitizers/issues/504
7013 setShadow(&I, getCleanShadow(&I));
7014 setOrigin(&I, getCleanOrigin());
7015 }
7016
7017 void visitCatchSwitchInst(CatchSwitchInst &I) {
7018 setShadow(&I, getCleanShadow(&I));
7019 setOrigin(&I, getCleanOrigin());
7020 }
7021
7022 void visitFuncletPadInst(FuncletPadInst &I) {
7023 setShadow(&I, getCleanShadow(&I));
7024 setOrigin(&I, getCleanOrigin());
7025 }
7026
7027 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7028
7029 void visitExtractValueInst(ExtractValueInst &I) {
7030 IRBuilder<> IRB(&I);
7031 Value *Agg = I.getAggregateOperand();
7032 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7033 Value *AggShadow = getShadow(Agg);
7034 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7035 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7036 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7037 setShadow(&I, ResShadow);
7038 setOriginForNaryOp(I);
7039 }
7040
7041 void visitInsertValueInst(InsertValueInst &I) {
7042 IRBuilder<> IRB(&I);
7043 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7044 Value *AggShadow = getShadow(I.getAggregateOperand());
7045 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7046 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7047 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7048 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7049 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7050 setShadow(&I, Res);
7051 setOriginForNaryOp(I);
7052 }
7053
7054 void dumpInst(Instruction &I) {
7055 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7056 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7057 } else {
7058 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7059 }
7060 errs() << "QQQ " << I << "\n";
7061 }
7062
7063 void visitResumeInst(ResumeInst &I) {
7064 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7065 // Nothing to do here.
7066 }
7067
7068 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7069 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7070 // Nothing to do here.
7071 }
7072
7073 void visitCatchReturnInst(CatchReturnInst &CRI) {
7074 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7075 // Nothing to do here.
7076 }
7077
7078 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7079 IRBuilder<> &IRB, const DataLayout &DL,
7080 bool isOutput) {
7081 // For each assembly argument, we check its value for being initialized.
7082 // If the argument is a pointer, we assume it points to a single element
7083 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7084 // Each such pointer is instrumented with a call to the runtime library.
7085 Type *OpType = Operand->getType();
7086 // Check the operand value itself.
7087 insertCheckShadowOf(Operand, &I);
7088 if (!OpType->isPointerTy() || !isOutput) {
7089 assert(!isOutput);
7090 return;
7091 }
7092 if (!ElemTy->isSized())
7093 return;
7094 auto Size = DL.getTypeStoreSize(ElemTy);
7095 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7096 if (MS.CompileKernel) {
7097 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7098 } else {
7099 // ElemTy, derived from elementtype(), does not encode the alignment of
7100 // the pointer. Conservatively assume that the shadow memory is unaligned.
7101 // When Size is large, avoid StoreInst as it would expand to many
7102 // instructions.
7103 auto [ShadowPtr, _] =
7104 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7105 if (Size <= 32)
7106 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7107 else
7108 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7109 SizeVal, Align(1));
7110 }
7111 }
7112
7113 /// Get the number of output arguments returned by pointers.
7114 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7115 int NumRetOutputs = 0;
7116 int NumOutputs = 0;
7117 Type *RetTy = cast<Value>(CB)->getType();
7118 if (!RetTy->isVoidTy()) {
7119 // Register outputs are returned via the CallInst return value.
7120 auto *ST = dyn_cast<StructType>(RetTy);
7121 if (ST)
7122 NumRetOutputs = ST->getNumElements();
7123 else
7124 NumRetOutputs = 1;
7125 }
7126 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7127 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7128 switch (Info.Type) {
7129 case InlineAsm::isOutput:
7130 NumOutputs++;
7131 break;
7132 default:
7133 break;
7134 }
7135 }
7136 return NumOutputs - NumRetOutputs;
7137 }
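// Example of the counting above (a sketch): for
//   asm("..." : "=r"(x), "=m"(*p) : "r"(y));
// ParseConstraints() reports two outputs. The "=r" output is returned via the
// call's return value (NumRetOutputs == 1), so this function returns 1, i.e.
// the single "=m" output that is passed to the CallInst by pointer.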
7138
7139 void visitAsmInstruction(Instruction &I) {
7140 // Conservative inline assembly handling: check for poisoned shadow of
7141 // asm() arguments, then unpoison the result and all the memory locations
7142 // pointed to by those arguments.
7143 // An inline asm() statement in C++ contains lists of input and output
7144 // arguments used by the assembly code. These are mapped to operands of the
7145 // CallInst as follows:
7146 // - nR register outputs ("=r") are returned by value in a single structure
7147 // (SSA value of the CallInst);
7148 // - nO other outputs ("=m" and others) are returned by pointer as first
7149 // nO operands of the CallInst;
7150 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7151 // remaining nI operands.
7152 // The total number of asm() arguments in the source is nR+nO+nI, and the
7153 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7154 // function to be called).
7155 const DataLayout &DL = F.getDataLayout();
7156 CallBase *CB = cast<CallBase>(&I);
7157 IRBuilder<> IRB(&I);
7158 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7159 int OutputArgs = getNumOutputArgs(IA, CB);
7160 // The last operand of a CallInst is the function itself.
7161 int NumOperands = CB->getNumOperands() - 1;
7162
7163 // Check input arguments. Doing so before unpoisoning output arguments, so
7164 // that we won't overwrite uninit values before checking them.
7165 for (int i = OutputArgs; i < NumOperands; i++) {
7166 Value *Operand = CB->getOperand(i);
7167 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7168 /*isOutput*/ false);
7169 }
7170 // Unpoison output arguments. This must happen before the actual InlineAsm
7171 // call, so that the shadow for memory published in the asm() statement
7172 // remains valid.
7173 for (int i = 0; i < OutputArgs; i++) {
7174 Value *Operand = CB->getOperand(i);
7175 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7176 /*isOutput*/ true);
7177 }
7178
7179 setShadow(&I, getCleanShadow(&I));
7180 setOrigin(&I, getCleanOrigin());
7181 }
7182
7183 void visitFreezeInst(FreezeInst &I) {
7184 // Freeze always returns a fully defined value.
7185 setShadow(&I, getCleanShadow(&I));
7186 setOrigin(&I, getCleanOrigin());
7187 }
7188
7189 void visitInstruction(Instruction &I) {
7190 // Everything else: stop propagating and check for poisoned shadow.
7191 if (ClDumpStrictInstructions)
7192 dumpInst(I);
7193 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7194 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7195 Value *Operand = I.getOperand(i);
7196 if (Operand->getType()->isSized())
7197 insertCheckShadowOf(Operand, &I);
7198 }
7199 setShadow(&I, getCleanShadow(&I));
7200 setOrigin(&I, getCleanOrigin());
7201 }
7202};
7203
7204struct VarArgHelperBase : public VarArgHelper {
7205 Function &F;
7206 MemorySanitizer &MS;
7207 MemorySanitizerVisitor &MSV;
7208 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7209 const unsigned VAListTagSize;
7210
7211 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7212 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7213 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7214
7215 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7216 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7217 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7218 }
7219
7220 /// Compute the shadow address for a given va_arg.
7221 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7222 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7223 Base = IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7224 return IRB.CreateIntToPtr(Base, MS.PtrTy, "_msarg_va_s");
7225 }
7226
7227 /// Compute the shadow address for a given va_arg.
7228 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7229 unsigned ArgSize) {
7230 // Make sure we don't overflow __msan_va_arg_tls.
7231 if (ArgOffset + ArgSize > kParamTLSSize)
7232 return nullptr;
7233 return getShadowPtrForVAArgument(IRB, ArgOffset);
7234 }
7235
7236 /// Compute the origin address for a given va_arg.
7237 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7238 Value *Base = IRB.CreatePointerCast(MS.VAArgOriginTLS, MS.IntptrTy);
7239 // getOriginPtrForVAArgument() is always called after
7240 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7241 // overflow.
7242 Base = IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7243 return IRB.CreateIntToPtr(Base, MS.PtrTy, "_msarg_va_o");
7244 }
7245
7246 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7247 unsigned BaseOffset) {
7248 // The tail of __msan_va_arg_tls is not large enough to fit the full
7249 // value shadow, but it will be copied to the backup anyway. Make it
7250 // clean.
7251 if (BaseOffset >= kParamTLSSize)
7252 return;
7253 Value *TailSize =
7254 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7255 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7256 TailSize, Align(8));
7257 }
7258
7259 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7260 IRBuilder<> IRB(&I);
7261 Value *VAListTag = I.getArgOperand(0);
7262 const Align Alignment = Align(8);
7263 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7264 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7265 // Unpoison the whole __va_list_tag.
7266 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7267 VAListTagSize, Alignment, false);
7268 }
7269
7270 void visitVAStartInst(VAStartInst &I) override {
7271 if (F.getCallingConv() == CallingConv::Win64)
7272 return;
7273 VAStartInstrumentationList.push_back(&I);
7274 unpoisonVAListTagForInst(I);
7275 }
7276
7277 void visitVACopyInst(VACopyInst &I) override {
7278 if (F.getCallingConv() == CallingConv::Win64)
7279 return;
7280 unpoisonVAListTagForInst(I);
7281 }
7282};
7283
7284/// AMD64-specific implementation of VarArgHelper.
7285struct VarArgAMD64Helper : public VarArgHelperBase {
7286 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7287 // See a comment in visitCallBase for more details.
7288 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7289 static const unsigned AMD64FpEndOffsetSSE = 176;
7290 // If SSE is disabled, fp_offset in va_list is zero.
7291 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
7292
7293 unsigned AMD64FpEndOffset;
7294 AllocaInst *VAArgTLSCopy = nullptr;
7295 AllocaInst *VAArgTLSOriginCopy = nullptr;
7296 Value *VAArgOverflowSize = nullptr;
7297
7298 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7299
7300 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7301 MemorySanitizerVisitor &MSV)
7302 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7303 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7304 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7305 if (Attr.isStringAttribute() &&
7306 (Attr.getKindAsString() == "target-features")) {
7307 if (Attr.getValueAsString().contains("-sse"))
7308 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7309 break;
7310 }
7311 }
7312 }
7313
7314 ArgKind classifyArgument(Value *arg) {
7315 // A very rough approximation of X86_64 argument classification rules.
7316 Type *T = arg->getType();
7317 if (T->isX86_FP80Ty())
7318 return AK_Memory;
7319 if (T->isFPOrFPVectorTy())
7320 return AK_FloatingPoint;
7321 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7322 return AK_GeneralPurpose;
7323 if (T->isPointerTy())
7324 return AK_GeneralPurpose;
7325 return AK_Memory;
7326 }
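// Examples of the classification above (illustrative): i32 and pointers map
// to AK_GeneralPurpose; float, double and <4 x float> map to
// AK_FloatingPoint; x86_fp80 (long double), i128 and aggregate types fall
// back to AK_Memory and therefore end up in the overflow area below.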
7327
7328 // For VarArg functions, store the argument shadow in an ABI-specific format
7329 // that corresponds to va_list layout.
7330 // We do this because Clang lowers va_arg in the frontend, and this pass
7331 // only sees the low level code that deals with va_list internals.
7332 // A much easier alternative (provided that Clang emits va_arg instructions)
7333 // would have been to associate each live instance of va_list with a copy of
7334 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7335 // order.
7336 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7337 unsigned GpOffset = 0;
7338 unsigned FpOffset = AMD64GpEndOffset;
7339 unsigned OverflowOffset = AMD64FpEndOffset;
7340 const DataLayout &DL = F.getDataLayout();
7341
7342 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7343 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7344 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7345 if (IsByVal) {
7346 // ByVal arguments always go to the overflow area.
7347 // Fixed arguments passed through the overflow area will be stepped
7348 // over by va_start, so don't count them towards the offset.
7349 if (IsFixed)
7350 continue;
7351 assert(A->getType()->isPointerTy());
7352 Type *RealTy = CB.getParamByValType(ArgNo);
7353 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7354 uint64_t AlignedSize = alignTo(ArgSize, 8);
7355 unsigned BaseOffset = OverflowOffset;
7356 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7357 Value *OriginBase = nullptr;
7358 if (MS.TrackOrigins)
7359 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7360 OverflowOffset += AlignedSize;
7361
7362 if (OverflowOffset > kParamTLSSize) {
7363 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7364 continue; // We have no space to copy shadow there.
7365 }
7366
7367 Value *ShadowPtr, *OriginPtr;
7368 std::tie(ShadowPtr, OriginPtr) =
7369 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
7370 /*isStore*/ false);
7371 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
7372 kShadowTLSAlignment, ArgSize);
7373 if (MS.TrackOrigins)
7374 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
7375 kShadowTLSAlignment, ArgSize);
7376 } else {
7377 ArgKind AK = classifyArgument(A);
7378 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7379 AK = AK_Memory;
7380 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7381 AK = AK_Memory;
7382 Value *ShadowBase, *OriginBase = nullptr;
7383 switch (AK) {
7384 case AK_GeneralPurpose:
7385 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
7386 if (MS.TrackOrigins)
7387 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
7388 GpOffset += 8;
7389 assert(GpOffset <= kParamTLSSize);
7390 break;
7391 case AK_FloatingPoint:
7392 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
7393 if (MS.TrackOrigins)
7394 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
7395 FpOffset += 16;
7396 assert(FpOffset <= kParamTLSSize);
7397 break;
7398 case AK_Memory:
7399 if (IsFixed)
7400 continue;
7401 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7402 uint64_t AlignedSize = alignTo(ArgSize, 8);
7403 unsigned BaseOffset = OverflowOffset;
7404 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7405 if (MS.TrackOrigins) {
7406 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7407 }
7408 OverflowOffset += AlignedSize;
7409 if (OverflowOffset > kParamTLSSize) {
7410 // We have no space to copy shadow there.
7411 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7412 continue;
7413 }
7414 }
7415 // Take fixed arguments into account for GpOffset and FpOffset,
7416 // but don't actually store shadows for them.
7417 // TODO(glider): don't call get*PtrForVAArgument() for them.
7418 if (IsFixed)
7419 continue;
7420 Value *Shadow = MSV.getShadow(A);
7421 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
7422 if (MS.TrackOrigins) {
7423 Value *Origin = MSV.getOrigin(A);
7424 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
7425 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
7426 std::max(kShadowTLSAlignment, kMinOriginAlignment));
7427 }
7428 }
7429 }
7430 Constant *OverflowSize =
7431 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
7432 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7433 }
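// Resulting layout of the va_arg shadow TLS written above, derived from the
// offsets used in this function:
//   [0, 48)     arguments passed in general-purpose registers (8 bytes each)
//   [48, 176)   arguments passed in SSE registers (16 bytes each; this region
//               collapses to nothing when SSE is disabled)
//   [176, ...)  arguments passed in memory (overflow area), 8-byte aligned
// finalizeInstrumentation() below copies this backup into the shadow of the
// va_list's reg_save_area and overflow_arg_area when instrumenting va_start.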
7434
7435 void finalizeInstrumentation() override {
7436 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7437 "finalizeInstrumentation called twice");
7438 if (!VAStartInstrumentationList.empty()) {
7439 // If there is a va_start in this function, make a backup copy of
7440 // va_arg_tls somewhere in the function entry block.
7441 IRBuilder<> IRB(MSV.FnPrologueEnd);
7442 VAArgOverflowSize =
7443 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7444 Value *CopySize = IRB.CreateAdd(
7445 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
7446 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7447 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7448 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7449 CopySize, kShadowTLSAlignment, false);
7450
7451 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7452 Intrinsic::umin, CopySize,
7453 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7454 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7455 kShadowTLSAlignment, SrcSize);
7456 if (MS.TrackOrigins) {
7457 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7458 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7459 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
7460 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
7461 }
7462 }
7463
7464 // Instrument va_start.
7465 // Copy va_list shadow from the backup copy of the TLS contents.
7466 for (CallInst *OrigInst : VAStartInstrumentationList) {
7467 NextNodeIRBuilder IRB(OrigInst);
7468 Value *VAListTag = OrigInst->getArgOperand(0);
7469
7470 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
7471 IRB.CreateAdd(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
7472 ConstantInt::get(MS.IntptrTy, 16)),
7473 MS.PtrTy);
7474 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7475 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7476 const Align Alignment = Align(16);
7477 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7478 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7479 Alignment, /*isStore*/ true);
7480 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7481 AMD64FpEndOffset);
7482 if (MS.TrackOrigins)
7483 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
7484 Alignment, AMD64FpEndOffset);
7485 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
7486 IRB.CreateAdd(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
7487 ConstantInt::get(MS.IntptrTy, 8)),
7488 MS.PtrTy);
7489 Value *OverflowArgAreaPtr =
7490 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
7491 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7492 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
7493 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
7494 Alignment, /*isStore*/ true);
7495 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
7496 AMD64FpEndOffset);
7497 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
7498 VAArgOverflowSize);
7499 if (MS.TrackOrigins) {
7500 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
7501 AMD64FpEndOffset);
7502 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
7503 VAArgOverflowSize);
7504 }
7505 }
7506 }
7507};
7508
7509/// AArch64-specific implementation of VarArgHelper.
7510struct VarArgAArch64Helper : public VarArgHelperBase {
7511 static const unsigned kAArch64GrArgSize = 64;
7512 static const unsigned kAArch64VrArgSize = 128;
7513
7514 static const unsigned AArch64GrBegOffset = 0;
7515 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7516 // Make VR space aligned to 16 bytes.
7517 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7518 static const unsigned AArch64VrEndOffset =
7519 AArch64VrBegOffset + kAArch64VrArgSize;
7520 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
7521
7522 AllocaInst *VAArgTLSCopy = nullptr;
7523 Value *VAArgOverflowSize = nullptr;
7524
7525 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7526
7527 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7528 MemorySanitizerVisitor &MSV)
7529 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7530
7531 // A very rough approximation of aarch64 argument classification rules.
7532 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7533 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7534 return {AK_GeneralPurpose, 1};
7535 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7536 return {AK_FloatingPoint, 1};
7537
7538 if (T->isArrayTy()) {
7539 auto R = classifyArgument(T->getArrayElementType());
7540 R.second *= T->getScalarType()->getArrayNumElements();
7541 return R;
7542 }
7543
7544 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
7545 auto R = classifyArgument(FV->getScalarType());
7546 R.second *= FV->getNumElements();
7547 return R;
7548 }
7549
7550 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7551 return {AK_Memory, 0};
7552 }
7553
7554  // The instrumentation stores the argument shadow in a non-ABI-specific
7555  // format because it does not know which arguments are named (as in the
7556  // x86_64 case, Clang lowers va_arg in the frontend and this pass only
7557  // sees the low-level code that deals with va_list internals).
7558  // The first eight GR registers (x0-x7) are saved in the first 64 bytes
7559  // of the va_arg TLS array, followed by the first eight FP/SIMD registers,
7560  // and then the remaining arguments.
7561  // Using constant offsets within the va_arg TLS array allows a fast copy
7562  // in the finalize instrumentation.
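  // Resulting layout of the va_arg TLS array used by this helper (sketch,
  // derived from the constants above; byte offsets within kParamTLSSize):
  //   [  0,  64)  shadow of GP register arguments (x0-x7, 8 bytes each)
  //   [ 64, 192)  shadow of FP/SIMD register arguments (v0-v7, 16 bytes each)
  //   [192, ...)  shadow of arguments passed on the stack (overflow area)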
7563 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7564 unsigned GrOffset = AArch64GrBegOffset;
7565 unsigned VrOffset = AArch64VrBegOffset;
7566 unsigned OverflowOffset = AArch64VAEndOffset;
7567
7568 const DataLayout &DL = F.getDataLayout();
7569 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7570 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7571 auto [AK, RegNum] = classifyArgument(A->getType());
7572 if (AK == AK_GeneralPurpose &&
7573 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
7574 AK = AK_Memory;
7575 if (AK == AK_FloatingPoint &&
7576 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
7577 AK = AK_Memory;
7578 Value *Base;
7579 switch (AK) {
7580 case AK_GeneralPurpose:
7581 Base = getShadowPtrForVAArgument(IRB, GrOffset);
7582 GrOffset += 8 * RegNum;
7583 break;
7584 case AK_FloatingPoint:
7585 Base = getShadowPtrForVAArgument(IRB, VrOffset);
7586 VrOffset += 16 * RegNum;
7587 break;
7588 case AK_Memory:
7589 // Don't count fixed arguments in the overflow area - va_start will
7590 // skip right over them.
7591 if (IsFixed)
7592 continue;
7593 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7594 uint64_t AlignedSize = alignTo(ArgSize, 8);
7595 unsigned BaseOffset = OverflowOffset;
7596 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
7597 OverflowOffset += AlignedSize;
7598 if (OverflowOffset > kParamTLSSize) {
7599 // We have no space to copy shadow there.
7600 CleanUnusedTLS(IRB, Base, BaseOffset);
7601 continue;
7602 }
7603 break;
7604 }
7605 // Count Gp/Vr fixed arguments to their respective offsets, but don't
7606 // bother to actually store a shadow.
7607 if (IsFixed)
7608 continue;
7609 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7610 }
7611 Constant *OverflowSize =
7612 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
7613 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7614 }
7615
7616 // Retrieve a va_list field of 'void*' size.
7617 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7618 Value *SaveAreaPtrPtr = IRB.CreateIntToPtr(
7619 IRB.CreateAdd(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
7620 ConstantInt::get(MS.IntptrTy, offset)),
7621 MS.PtrTy);
7622 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
7623 }
7624
7625 // Retrieve a va_list field of 'int' size.
7626 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7627 Value *SaveAreaPtr = IRB.CreateIntToPtr(
7628 IRB.CreateAdd(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
7629 ConstantInt::get(MS.IntptrTy, offset)),
7630 MS.PtrTy);
7631 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
7632 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
7633 }
7634
7635 void finalizeInstrumentation() override {
7636 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7637 "finalizeInstrumentation called twice");
7638 if (!VAStartInstrumentationList.empty()) {
7639 // If there is a va_start in this function, make a backup copy of
7640 // va_arg_tls somewhere in the function entry block.
7641 IRBuilder<> IRB(MSV.FnPrologueEnd);
7642 VAArgOverflowSize =
7643 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7644 Value *CopySize = IRB.CreateAdd(
7645 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
7646 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7647 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7648 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7649 CopySize, kShadowTLSAlignment, false);
7650
7651 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7652 Intrinsic::umin, CopySize,
7653 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7654 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7655 kShadowTLSAlignment, SrcSize);
7656 }
7657
7658 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
7659 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
7660
7661 // Instrument va_start, copy va_list shadow from the backup copy of
7662 // the TLS contents.
7663 for (CallInst *OrigInst : VAStartInstrumentationList) {
7664 NextNodeIRBuilder IRB(OrigInst);
7665
7666 Value *VAListTag = OrigInst->getArgOperand(0);
7667
7668      // The variadic ABI for AArch64 creates two areas to save the incoming
7669      // argument registers (one for the 64-bit general registers x0-x7 and
7670      // another for the 128-bit FP/SIMD registers v0-v7).
7671      // We then need to propagate the shadow arguments to both regions,
7672      // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
7673      // The remaining arguments have their shadow saved for 'va::stack'.
7674      // One caveat is that only the non-named arguments need to be
7675      // propagated; however, the call-site instrumentation saves 'all' the
7676      // arguments. So, to copy the shadow values from the va_arg TLS array,
7677      // we need to adjust the offsets for both the GR and VR fields based
7678      // on the __{gr,vr}_offs values (since they are stored based on the
7679      // incoming named arguments).
7680 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
7681
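      // For reference: the AAPCS64 va_list layout assumed by the fixed
      // offsets passed to getVAField64/getVAField32 below (sketch):
      //   struct __va_list {
      //     void *__stack;   // offset 0:  next stack (overflow) argument
      //     void *__gr_top;  // offset 8:  end of the GP register save area
      //     void *__vr_top;  // offset 16: end of the FP/SIMD save area
      //     int   __gr_offs; // offset 24: negative offset from __gr_top
      //     int   __vr_offs; // offset 28: negative offset from __vr_top
      //   };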
7682 // Read the stack pointer from the va_list.
7683 Value *StackSaveAreaPtr =
7684 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
7685
7686 // Read both the __gr_top and __gr_off and add them up.
7687 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
7688 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
7689
7690 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
7691 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
7692
7693 // Read both the __vr_top and __vr_off and add them up.
7694 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
7695 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
7696
7697 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
7698 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
7699
7700      // The pass does not know how many named arguments are used; at the
7701      // call site, all the arguments were saved. Since __gr_offs is defined
7702      // as '0 - ((8 - named_gr) * 8)', the idea is to propagate only the
7703      // variadic arguments by skipping the shadow bytes of named arguments.
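      // Worked example (illustrative): with two named GP arguments,
      // __gr_offs = -(8 - 2) * 8 = -48, so GrRegSaveAreaShadowPtrOff below is
      // 64 + (-48) = 16, i.e. the variadic GP shadow starts 16 bytes into the
      // TLS copy, and GrCopySize = 64 - 16 = 48 bytes (the x2-x7 slots).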
7704 Value *GrRegSaveAreaShadowPtrOff =
7705 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
7706
7707 Value *GrRegSaveAreaShadowPtr =
7708 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7709 Align(8), /*isStore*/ true)
7710 .first;
7711
7712 Value *GrSrcPtr =
7713 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
7714 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
7715
7716 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
7717 GrCopySize);
7718
7719 // Again, but for FP/SIMD values.
7720 Value *VrRegSaveAreaShadowPtrOff =
7721 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
7722
7723 Value *VrRegSaveAreaShadowPtr =
7724 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7725 Align(8), /*isStore*/ true)
7726 .first;
7727
7728 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
7729 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
7730 IRB.getInt32(AArch64VrBegOffset)),
7731 VrRegSaveAreaShadowPtrOff);
7732 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
7733
7734 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
7735 VrCopySize);
7736
7737 // And finally for remaining arguments.
7738 Value *StackSaveAreaShadowPtr =
7739 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
7740 Align(16), /*isStore*/ true)
7741 .first;
7742
7743 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
7744 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
7745
7746 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
7747 Align(16), VAArgOverflowSize);
7748 }
7749 }
7750};
7751
7752/// PowerPC64-specific implementation of VarArgHelper.
7753struct VarArgPowerPC64Helper : public VarArgHelperBase {
7754 AllocaInst *VAArgTLSCopy = nullptr;
7755 Value *VAArgSize = nullptr;
7756
7757 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
7758 MemorySanitizerVisitor &MSV)
7759 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
7760
7761 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7762    // For PowerPC, we need to deal with the alignment of stack arguments -
7763    // they are mostly aligned to 8 bytes, but vectors and i128 arrays are
7764    // aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
7765    // For that reason, we compute the current offset from the stack pointer
7766    // (which is always properly aligned) and the offset of the first vararg,
7767    // then subtract them.
7768 unsigned VAArgBase;
7769 Triple TargetTriple(F.getParent()->getTargetTriple());
7770    // The parameter save area starts 48 bytes from the frame pointer for
7771    // ABIv1 and 32 bytes for ABIv2. This is usually determined by the target
7772    // endianness, but in theory it could be overridden by a function attribute.
7773 if (TargetTriple.isPPC64ELFv2ABI())
7774 VAArgBase = 32;
7775 else
7776 VAArgBase = 48;
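    // Sketch of where these constants come from: the ELFv1 stack frame header
    // is 48 bytes (back chain, CR save, LR save, two reserved doublewords,
    // TOC save), while ELFv2 drops the two reserved doublewords for a 32-byte
    // header, so the parameter save area begins at offset 48 or 32.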
7777 unsigned VAArgOffset = VAArgBase;
7778 const DataLayout &DL = F.getDataLayout();
7779 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7780 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7781 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7782 if (IsByVal) {
7783 assert(A->getType()->isPointerTy());
7784 Type *RealTy = CB.getParamByValType(ArgNo);
7785 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7786 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
7787 if (ArgAlign < 8)
7788 ArgAlign = Align(8);
7789 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7790 if (!IsFixed) {
7791 Value *Base =
7792 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7793 if (Base) {
7794 Value *AShadowPtr, *AOriginPtr;
7795 std::tie(AShadowPtr, AOriginPtr) =
7796 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7797 kShadowTLSAlignment, /*isStore*/ false);
7798
7799 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7800 kShadowTLSAlignment, ArgSize);
7801 }
7802 }
7803 VAArgOffset += alignTo(ArgSize, Align(8));
7804 } else {
7805 Value *Base;
7806 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7807 Align ArgAlign = Align(8);
7808 if (A->getType()->isArrayTy()) {
7809 // Arrays are aligned to element size, except for long double
7810 // arrays, which are aligned to 8 bytes.
7811 Type *ElementTy = A->getType()->getArrayElementType();
7812 if (!ElementTy->isPPC_FP128Ty())
7813 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7814 } else if (A->getType()->isVectorTy()) {
7815 // Vectors are naturally aligned.
7816 ArgAlign = Align(ArgSize);
7817 }
7818 if (ArgAlign < 8)
7819 ArgAlign = Align(8);
7820 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7821 if (DL.isBigEndian()) {
7822          // Adjust the shadow for arguments with size < 8 to match the
7823          // placement of bits on a big-endian system.
7824 if (ArgSize < 8)
7825 VAArgOffset += (8 - ArgSize);
7826 }
7827 if (!IsFixed) {
7828 Base =
7829 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7830 if (Base)
7831 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7832 }
7833 VAArgOffset += ArgSize;
7834 VAArgOffset = alignTo(VAArgOffset, Align(8));
7835 }
7836 if (IsFixed)
7837 VAArgBase = VAArgOffset;
7838 }
7839
7840 Constant *TotalVAArgSize =
7841 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
7842    // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid adding a
7843    // new class member; the value stored is the total size of all varargs.
7844 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
7845 }
7846
7847 void finalizeInstrumentation() override {
7848 assert(!VAArgSize && !VAArgTLSCopy &&
7849 "finalizeInstrumentation called twice");
7850 IRBuilder<> IRB(MSV.FnPrologueEnd);
7851 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7852 Value *CopySize = VAArgSize;
7853
7854 if (!VAStartInstrumentationList.empty()) {
7855 // If there is a va_start in this function, make a backup copy of
7856 // va_arg_tls somewhere in the function entry block.
7857
7858 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7859 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7860 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7861 CopySize, kShadowTLSAlignment, false);
7862
7863 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7864 Intrinsic::umin, CopySize,
7865 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
7866 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7867 kShadowTLSAlignment, SrcSize);
7868 }
7869
7870 // Instrument va_start.
7871 // Copy va_list shadow from the backup copy of the TLS contents.
7872 for (CallInst *OrigInst : VAStartInstrumentationList) {
7873 NextNodeIRBuilder IRB(OrigInst);
7874 Value *VAListTag = OrigInst->getArgOperand(0);
7875 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
7876
7877 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
7878
7879 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7880 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7881 const DataLayout &DL = F.getDataLayout();
7882 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
7883 const Align Alignment = Align(IntptrSize);
7884 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7885 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7886 Alignment, /*isStore*/ true);
7887 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7888 CopySize);
7889 }
7890 }
7891};
7892
7893/// PowerPC32-specific implementation of VarArgHelper.
7894struct VarArgPowerPC32Helper : public VarArgHelperBase {
7895 AllocaInst *VAArgTLSCopy = nullptr;
7896 Value *VAArgSize = nullptr;
7897
7898 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
7899 MemorySanitizerVisitor &MSV)
7900 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
7901
7902 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7903 unsigned VAArgBase;
7904 // Parameter save area is 8 bytes from frame pointer in PPC32
7905 VAArgBase = 8;
7906 unsigned VAArgOffset = VAArgBase;
7907 const DataLayout &DL = F.getDataLayout();
7908 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
7909 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7910 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7911 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7912 if (IsByVal) {
7913 assert(A->getType()->isPointerTy());
7914 Type *RealTy = CB.getParamByValType(ArgNo);
7915 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7916 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
7917 if (ArgAlign < IntptrSize)
7918 ArgAlign = Align(IntptrSize);
7919 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7920 if (!IsFixed) {
7921 Value *Base =
7922 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7923 if (Base) {
7924 Value *AShadowPtr, *AOriginPtr;
7925 std::tie(AShadowPtr, AOriginPtr) =
7926 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7927 kShadowTLSAlignment, /*isStore*/ false);
7928
7929 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7930 kShadowTLSAlignment, ArgSize);
7931 }
7932 }
7933 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
7934 } else {
7935 Value *Base;
7936 Type *ArgTy = A->getType();
7937
7938        // On PPC32, floating-point variable arguments are stored in a
7939        // separate area: fp_save_area = reg_save_area + 4*8. We do not copy
7940        // shadow for them, as they will be found when checking call arguments.
7941 if (!ArgTy->isFloatingPointTy()) {
7942 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
7943 Align ArgAlign = Align(IntptrSize);
7944 if (ArgTy->isArrayTy()) {
7945 // Arrays are aligned to element size, except for long double
7946 // arrays, which are aligned to 8 bytes.
7947 Type *ElementTy = ArgTy->getArrayElementType();
7948 if (!ElementTy->isPPC_FP128Ty())
7949 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7950 } else if (ArgTy->isVectorTy()) {
7951 // Vectors are naturally aligned.
7952 ArgAlign = Align(ArgSize);
7953 }
7954 if (ArgAlign < IntptrSize)
7955 ArgAlign = Align(IntptrSize);
7956 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7957 if (DL.isBigEndian()) {
7958            // Adjust the shadow for arguments with size < IntptrSize to match
7959            // the placement of bits on a big-endian system.
7960 if (ArgSize < IntptrSize)
7961 VAArgOffset += (IntptrSize - ArgSize);
7962 }
7963 if (!IsFixed) {
7964 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
7965 ArgSize);
7966 if (Base)
7967              IRB.CreateAlignedStore(MSV.getShadow(A), Base,
7968                                     kShadowTLSAlignment);
7969          }
7970 VAArgOffset += ArgSize;
7971 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
7972 }
7973 }
7974 }
7975
7976 Constant *TotalVAArgSize =
7977 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
7978    // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid adding a
7979    // new class member; the value stored is the total size of all varargs.
7980 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
7981 }
7982
7983 void finalizeInstrumentation() override {
7984 assert(!VAArgSize && !VAArgTLSCopy &&
7985 "finalizeInstrumentation called twice");
7986 IRBuilder<> IRB(MSV.FnPrologueEnd);
7987 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
7988 Value *CopySize = VAArgSize;
7989
7990 if (!VAStartInstrumentationList.empty()) {
7991 // If there is a va_start in this function, make a backup copy of
7992 // va_arg_tls somewhere in the function entry block.
7993
7994 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7995 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7996 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7997 CopySize, kShadowTLSAlignment, false);
7998
7999 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8000 Intrinsic::umin, CopySize,
8001 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8002 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8003 kShadowTLSAlignment, SrcSize);
8004 }
8005
8006 // Instrument va_start.
8007 // Copy va_list shadow from the backup copy of the TLS contents.
8008 for (CallInst *OrigInst : VAStartInstrumentationList) {
8009 NextNodeIRBuilder IRB(OrigInst);
8010 Value *VAListTag = OrigInst->getArgOperand(0);
8011 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8012 Value *RegSaveAreaSize = CopySize;
8013
8014 // In PPC32 va_list_tag is a struct
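      // Assumed va_list_tag layout (sketch; consistent with the offsets 4 and
      // 8 used below):
      //   struct __va_list_tag {
      //     unsigned char gpr;         // offset 0: # of GPRs consumed
      //     unsigned char fpr;         // offset 1: # of FPRs consumed
      //     unsigned short reserved;   // offset 2: padding
      //     void *overflow_arg_area;   // offset 4: stack arguments
      //     void *reg_save_area;       // offset 8: register save area
      //   };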
8015 RegSaveAreaPtrPtr =
8016 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8017
8018      // On PPC32, reg_save_area can only hold 32 bytes of data (r3-r10).
8019 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8020 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8021
8022 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8023 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8024
8025 const DataLayout &DL = F.getDataLayout();
8026 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8027 const Align Alignment = Align(IntptrSize);
8028
8029 { // Copy reg save area
8030 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8031 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8032 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8033 Alignment, /*isStore*/ true);
8034 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8035 Alignment, RegSaveAreaSize);
8036
8037 RegSaveAreaShadowPtr =
8038 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8039 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8040 ConstantInt::get(MS.IntptrTy, 32));
8041 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8042        // We fill the FP shadow with zeroes, as uninitialized FP args
8043        // should have already been reported by the call-base check.
8044 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8045 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8046 }
8047
8048 { // Copy overflow area
8049 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8050 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8051
8052 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8053 OverflowAreaPtrPtr =
8054 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8055 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8056
8057 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8058
8059 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8060 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8061 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8062 Alignment, /*isStore*/ true);
8063
8064 Value *OverflowVAArgTLSCopyPtr =
8065 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8066 OverflowVAArgTLSCopyPtr =
8067 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8068
8069 OverflowVAArgTLSCopyPtr =
8070 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8071 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8072 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8073 }
8074 }
8075 }
8076};
8077
8078/// SystemZ-specific implementation of VarArgHelper.
8079struct VarArgSystemZHelper : public VarArgHelperBase {
8080 static const unsigned SystemZGpOffset = 16;
8081 static const unsigned SystemZGpEndOffset = 56;
8082 static const unsigned SystemZFpOffset = 128;
8083 static const unsigned SystemZFpEndOffset = 160;
8084 static const unsigned SystemZMaxVrArgs = 8;
8085 static const unsigned SystemZRegSaveAreaSize = 160;
8086 static const unsigned SystemZOverflowOffset = 160;
8087 static const unsigned SystemZVAListTagSize = 32;
8088 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8089 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
8090
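  // For reference, the s390x va_list and register save area layouts assumed
  // by the constants above (sketch):
  //   struct __va_list_tag {
  //     long __gpr;                 // offset 0:  # of GP registers consumed
  //     long __fpr;                 // offset 8:  # of FP registers consumed
  //     void *__overflow_arg_area;  // offset 16: stack arguments
  //     void *__reg_save_area;      // offset 24: 160-byte register save area
  //   };
  // Within the 160-byte register save area, the GP argument registers r2-r6
  // are saved at offsets [16, 56) and the FP argument registers f0/f2/f4/f6
  // at offsets [128, 160), matching the SystemZGp*/SystemZFp* values above.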
8091 bool IsSoftFloatABI;
8092 AllocaInst *VAArgTLSCopy = nullptr;
8093 AllocaInst *VAArgTLSOriginCopy = nullptr;
8094 Value *VAArgOverflowSize = nullptr;
8095
8096 enum class ArgKind {
8097 GeneralPurpose,
8098 FloatingPoint,
8099 Vector,
8100 Memory,
8101 Indirect,
8102 };
8103
8104 enum class ShadowExtension { None, Zero, Sign };
8105
8106 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8107 MemorySanitizerVisitor &MSV)
8108 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8109 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8110
8111 ArgKind classifyArgument(Type *T) {
8112 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8113 // only a few possibilities of what it can be. In particular, enums, single
8114 // element structs and large types have already been taken care of.
8115
8116 // Some i128 and fp128 arguments are converted to pointers only in the
8117 // back end.
8118 if (T->isIntegerTy(128) || T->isFP128Ty())
8119 return ArgKind::Indirect;
8120 if (T->isFloatingPointTy())
8121 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8122 if (T->isIntegerTy() || T->isPointerTy())
8123 return ArgKind::GeneralPurpose;
8124 if (T->isVectorTy())
8125 return ArgKind::Vector;
8126 return ArgKind::Memory;
8127 }
8128
8129 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8130 // ABI says: "One of the simple integer types no more than 64 bits wide.
8131 // ... If such an argument is shorter than 64 bits, replace it by a full
8132 // 64-bit integer representing the same number, using sign or zero
8133 // extension". Shadow for an integer argument has the same type as the
8134 // argument itself, so it can be sign or zero extended as well.
8135 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8136 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8137 if (ZExt) {
8138 assert(!SExt);
8139 return ShadowExtension::Zero;
8140 }
8141 if (SExt) {
8142 assert(!ZExt);
8143 return ShadowExtension::Sign;
8144 }
8145 return ShadowExtension::None;
8146 }
8147
8148 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8149 unsigned GpOffset = SystemZGpOffset;
8150 unsigned FpOffset = SystemZFpOffset;
8151 unsigned VrIndex = 0;
8152 unsigned OverflowOffset = SystemZOverflowOffset;
8153 const DataLayout &DL = F.getDataLayout();
8154 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8155 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8156 // SystemZABIInfo does not produce ByVal parameters.
8157 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8158 Type *T = A->getType();
8159 ArgKind AK = classifyArgument(T);
8160 if (AK == ArgKind::Indirect) {
8161 T = MS.PtrTy;
8162 AK = ArgKind::GeneralPurpose;
8163 }
8164 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8165 AK = ArgKind::Memory;
8166 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8167 AK = ArgKind::Memory;
8168 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8169 AK = ArgKind::Memory;
8170 Value *ShadowBase = nullptr;
8171 Value *OriginBase = nullptr;
8172 ShadowExtension SE = ShadowExtension::None;
8173 switch (AK) {
8174 case ArgKind::GeneralPurpose: {
8175 // Always keep track of GpOffset, but store shadow only for varargs.
8176 uint64_t ArgSize = 8;
8177 if (GpOffset + ArgSize <= kParamTLSSize) {
8178 if (!IsFixed) {
8179 SE = getShadowExtension(CB, ArgNo);
8180 uint64_t GapSize = 0;
8181 if (SE == ShadowExtension::None) {
8182 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8183 assert(ArgAllocSize <= ArgSize);
8184 GapSize = ArgSize - ArgAllocSize;
8185 }
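          // E.g. (illustrative): an unextended i32 vararg occupies the low
          // (rightmost) 4 bytes of its big-endian 8-byte slot, so GapSize is
          // 4 and its shadow is stored at GpOffset + 4.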
8186 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8187 if (MS.TrackOrigins)
8188 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8189 }
8190 GpOffset += ArgSize;
8191 } else {
8192 GpOffset = kParamTLSSize;
8193 }
8194 break;
8195 }
8196 case ArgKind::FloatingPoint: {
8197 // Always keep track of FpOffset, but store shadow only for varargs.
8198 uint64_t ArgSize = 8;
8199 if (FpOffset + ArgSize <= kParamTLSSize) {
8200 if (!IsFixed) {
8201 // PoP says: "A short floating-point datum requires only the
8202 // left-most 32 bit positions of a floating-point register".
8203 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8204 // don't extend shadow and don't mind the gap.
8205 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8206 if (MS.TrackOrigins)
8207 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8208 }
8209 FpOffset += ArgSize;
8210 } else {
8211 FpOffset = kParamTLSSize;
8212 }
8213 break;
8214 }
8215 case ArgKind::Vector: {
8216 // Keep track of VrIndex. No need to store shadow, since vector varargs
8217 // go through AK_Memory.
8218 assert(IsFixed);
8219 VrIndex++;
8220 break;
8221 }
8222 case ArgKind::Memory: {
8223 // Keep track of OverflowOffset and store shadow only for varargs.
8224 // Ignore fixed args, since we need to copy only the vararg portion of
8225 // the overflow area shadow.
8226 if (!IsFixed) {
8227 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8228 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8229 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8230 SE = getShadowExtension(CB, ArgNo);
8231 uint64_t GapSize =
8232 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8233 ShadowBase =
8234 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8235 if (MS.TrackOrigins)
8236 OriginBase =
8237 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8238 OverflowOffset += ArgSize;
8239 } else {
8240 OverflowOffset = kParamTLSSize;
8241 }
8242 }
8243 break;
8244 }
8245 case ArgKind::Indirect:
8246 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8247 }
8248 if (ShadowBase == nullptr)
8249 continue;
8250 Value *Shadow = MSV.getShadow(A);
8251 if (SE != ShadowExtension::None)
8252 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8253 /*Signed*/ SE == ShadowExtension::Sign);
8254 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8255 IRB.CreateStore(Shadow, ShadowBase);
8256 if (MS.TrackOrigins) {
8257 Value *Origin = MSV.getOrigin(A);
8258 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8259          MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8260                          kMinOriginAlignment);
8261        }
8262 }
8263 Constant *OverflowSize = ConstantInt::get(
8264 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8265 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8266 }
8267
8268 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8269 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8270 IRB.CreateAdd(
8271 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8272 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8273 MS.PtrTy);
8274 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8275 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8276 const Align Alignment = Align(8);
8277 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8278 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8279 /*isStore*/ true);
8280 // TODO(iii): copy only fragments filled by visitCallBase()
8281 // TODO(iii): support packed-stack && !use-soft-float
8282 // For use-soft-float functions, it is enough to copy just the GPRs.
8283 unsigned RegSaveAreaSize =
8284 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8285 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8286 RegSaveAreaSize);
8287 if (MS.TrackOrigins)
8288 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8289 Alignment, RegSaveAreaSize);
8290 }
8291
8292 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8293 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
8294 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8295 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8296 IRB.CreateAdd(
8297 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8298 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8299 MS.PtrTy);
8300 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8301 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8302 const Align Alignment = Align(8);
8303 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8304 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8305 Alignment, /*isStore*/ true);
8306 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8307 SystemZOverflowOffset);
8308 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8309 VAArgOverflowSize);
8310 if (MS.TrackOrigins) {
8311 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8312 SystemZOverflowOffset);
8313 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8314 VAArgOverflowSize);
8315 }
8316 }
8317
8318 void finalizeInstrumentation() override {
8319 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8320 "finalizeInstrumentation called twice");
8321 if (!VAStartInstrumentationList.empty()) {
8322 // If there is a va_start in this function, make a backup copy of
8323 // va_arg_tls somewhere in the function entry block.
8324 IRBuilder<> IRB(MSV.FnPrologueEnd);
8325 VAArgOverflowSize =
8326 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8327 Value *CopySize =
8328 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8329 VAArgOverflowSize);
8330 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8331 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8332 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8333 CopySize, kShadowTLSAlignment, false);
8334
8335 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8336 Intrinsic::umin, CopySize,
8337 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8338 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8339 kShadowTLSAlignment, SrcSize);
8340 if (MS.TrackOrigins) {
8341 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8342 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8343 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8344 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8345 }
8346 }
8347
8348 // Instrument va_start.
8349 // Copy va_list shadow from the backup copy of the TLS contents.
8350 for (CallInst *OrigInst : VAStartInstrumentationList) {
8351 NextNodeIRBuilder IRB(OrigInst);
8352 Value *VAListTag = OrigInst->getArgOperand(0);
8353 copyRegSaveArea(IRB, VAListTag);
8354 copyOverflowArea(IRB, VAListTag);
8355 }
8356 }
8357};
8358
8359/// i386-specific implementation of VarArgHelper.
8360struct VarArgI386Helper : public VarArgHelperBase {
8361 AllocaInst *VAArgTLSCopy = nullptr;
8362 Value *VAArgSize = nullptr;
8363
8364 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8365 MemorySanitizerVisitor &MSV)
8366 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
8367
8368 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8369 const DataLayout &DL = F.getDataLayout();
8370 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8371 unsigned VAArgOffset = 0;
8372 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8373 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8374 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8375 if (IsByVal) {
8376 assert(A->getType()->isPointerTy());
8377 Type *RealTy = CB.getParamByValType(ArgNo);
8378 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8379 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8380 if (ArgAlign < IntptrSize)
8381 ArgAlign = Align(IntptrSize);
8382 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8383 if (!IsFixed) {
8384 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8385 if (Base) {
8386 Value *AShadowPtr, *AOriginPtr;
8387 std::tie(AShadowPtr, AOriginPtr) =
8388 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8389 kShadowTLSAlignment, /*isStore*/ false);
8390
8391 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8392 kShadowTLSAlignment, ArgSize);
8393 }
8394 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8395 }
8396 } else {
8397 Value *Base;
8398 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8399 Align ArgAlign = Align(IntptrSize);
8400 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8401 if (DL.isBigEndian()) {
8402          // Adjust the shadow for arguments with size < IntptrSize to match
8403          // the placement of bits on a big-endian system.
8404 if (ArgSize < IntptrSize)
8405 VAArgOffset += (IntptrSize - ArgSize);
8406 }
8407 if (!IsFixed) {
8408 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8409 if (Base)
8410 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8411 VAArgOffset += ArgSize;
8412 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8413 }
8414 }
8415 }
8416
8417 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8418    // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid adding a
8419    // new class member; the value stored is the total size of all varargs.
8420 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8421 }
8422
8423 void finalizeInstrumentation() override {
8424 assert(!VAArgSize && !VAArgTLSCopy &&
8425 "finalizeInstrumentation called twice");
8426 IRBuilder<> IRB(MSV.FnPrologueEnd);
8427 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8428 Value *CopySize = VAArgSize;
8429
8430 if (!VAStartInstrumentationList.empty()) {
8431 // If there is a va_start in this function, make a backup copy of
8432 // va_arg_tls somewhere in the function entry block.
8433 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8434 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8435 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8436 CopySize, kShadowTLSAlignment, false);
8437
8438 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8439 Intrinsic::umin, CopySize,
8440 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8441 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8442 kShadowTLSAlignment, SrcSize);
8443 }
8444
8445 // Instrument va_start.
8446 // Copy va_list shadow from the backup copy of the TLS contents.
8447 for (CallInst *OrigInst : VAStartInstrumentationList) {
8448 NextNodeIRBuilder IRB(OrigInst);
8449 Value *VAListTag = OrigInst->getArgOperand(0);
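      // Note (sketch): on i386, va_list is essentially a 'char *' into the
      // caller's stack argument area, so the pointer we need is stored
      // directly at offset 0 of the va_list object; no offset arithmetic is
      // required here, unlike on the other targets.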
8450 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8451 Value *RegSaveAreaPtrPtr =
8452 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8453 PointerType::get(*MS.C, 0));
8454 Value *RegSaveAreaPtr =
8455 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8456 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8457 const DataLayout &DL = F.getDataLayout();
8458 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8459 const Align Alignment = Align(IntptrSize);
8460 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8461 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8462 Alignment, /*isStore*/ true);
8463 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8464 CopySize);
8465 }
8466 }
8467};
8468
8469/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8470/// LoongArch64.
8471struct VarArgGenericHelper : public VarArgHelperBase {
8472 AllocaInst *VAArgTLSCopy = nullptr;
8473 Value *VAArgSize = nullptr;
8474
8475 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8476 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8477 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8478
8479 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8480 unsigned VAArgOffset = 0;
8481 const DataLayout &DL = F.getDataLayout();
8482 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8483 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8484 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8485 if (IsFixed)
8486 continue;
8487 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8488 if (DL.isBigEndian()) {
8489        // Adjust the shadow for arguments with size < IntptrSize to match
8490        // the placement of bits on a big-endian system.
8491 if (ArgSize < IntptrSize)
8492 VAArgOffset += (IntptrSize - ArgSize);
8493 }
8494 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8495 VAArgOffset += ArgSize;
8496 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
8497 if (!Base)
8498 continue;
8499 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8500 }
8501
8502 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8503    // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid adding a
8504    // new class member; the value stored is the total size of all varargs.
8505 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8506 }
8507
8508 void finalizeInstrumentation() override {
8509 assert(!VAArgSize && !VAArgTLSCopy &&
8510 "finalizeInstrumentation called twice");
8511 IRBuilder<> IRB(MSV.FnPrologueEnd);
8512 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8513 Value *CopySize = VAArgSize;
8514
8515 if (!VAStartInstrumentationList.empty()) {
8516 // If there is a va_start in this function, make a backup copy of
8517 // va_arg_tls somewhere in the function entry block.
8518 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8519 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8520 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8521 CopySize, kShadowTLSAlignment, false);
8522
8523 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8524 Intrinsic::umin, CopySize,
8525 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8526 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8527 kShadowTLSAlignment, SrcSize);
8528 }
8529
8530 // Instrument va_start.
8531 // Copy va_list shadow from the backup copy of the TLS contents.
8532 for (CallInst *OrigInst : VAStartInstrumentationList) {
8533 NextNodeIRBuilder IRB(OrigInst);
8534 Value *VAListTag = OrigInst->getArgOperand(0);
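      // Note (sketch): for the targets sharing this helper (ARM32, MIPS,
      // RISCV, LoongArch64), va_list is, or begins with, a single pointer to
      // the saved argument area, so that pointer is read directly from
      // offset 0 of the va_list object.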
8535 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8536 Value *RegSaveAreaPtrPtr =
8537 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8538 PointerType::get(*MS.C, 0));
8539 Value *RegSaveAreaPtr =
8540 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8541 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8542 const DataLayout &DL = F.getDataLayout();
8543 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8544 const Align Alignment = Align(IntptrSize);
8545 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8546 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8547 Alignment, /*isStore*/ true);
8548 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8549 CopySize);
8550 }
8551 }
8552};
8553
8554 // ARM32, LoongArch64, MIPS, and RISCV share the same calling conventions
8555 // regarding varargs.
8556using VarArgARM32Helper = VarArgGenericHelper;
8557using VarArgRISCVHelper = VarArgGenericHelper;
8558using VarArgMIPSHelper = VarArgGenericHelper;
8559using VarArgLoongArch64Helper = VarArgGenericHelper;
8560
8561/// A no-op implementation of VarArgHelper.
8562struct VarArgNoOpHelper : public VarArgHelper {
8563 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8564 MemorySanitizerVisitor &MSV) {}
8565
8566 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8567
8568 void visitVAStartInst(VAStartInst &I) override {}
8569
8570 void visitVACopyInst(VACopyInst &I) override {}
8571
8572 void finalizeInstrumentation() override {}
8573};
8574
8575} // end anonymous namespace
8576
8577static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
8578 MemorySanitizerVisitor &Visitor) {
8579  // VarArg handling is implemented for the targets matched below; on any
8580  // other target the no-op helper is used, and false positives are possible.
8581 Triple TargetTriple(Func.getParent()->getTargetTriple());
8582
8583 if (TargetTriple.getArch() == Triple::x86)
8584 return new VarArgI386Helper(Func, Msan, Visitor);
8585
8586 if (TargetTriple.getArch() == Triple::x86_64)
8587 return new VarArgAMD64Helper(Func, Msan, Visitor);
8588
8589 if (TargetTriple.isARM())
8590 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8591
8592 if (TargetTriple.isAArch64())
8593 return new VarArgAArch64Helper(Func, Msan, Visitor);
8594
8595 if (TargetTriple.isSystemZ())
8596 return new VarArgSystemZHelper(Func, Msan, Visitor);
8597
8598 // On PowerPC32 VAListTag is a struct
8599 // {char, char, i16 padding, char *, char *}
8600 if (TargetTriple.isPPC32())
8601 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
8602
8603 if (TargetTriple.isPPC64())
8604 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
8605
8606 if (TargetTriple.isRISCV32())
8607 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8608
8609 if (TargetTriple.isRISCV64())
8610 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8611
8612 if (TargetTriple.isMIPS32())
8613 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8614
8615 if (TargetTriple.isMIPS64())
8616 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8617
8618 if (TargetTriple.isLoongArch64())
8619 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
8620 /*VAListTagSize=*/8);
8621
8622 return new VarArgNoOpHelper(Func, Msan, Visitor);
8623}
8624
8625bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
8626 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
8627 return false;
8628
8629 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
8630 return false;
8631
8632 MemorySanitizerVisitor Visitor(F, *this, TLI);
8633
8634  // Clear out memory attributes.
8635  AttributeMask B;
8636  B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
8637 F.removeFnAttrs(B);
8638
8639 return Visitor.runOnFunction();
8640}
Create a call to Masked Compress Store intrinsic.
Value * CreateInsertValue(Value *Agg, Value *Val, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2625
Value * CreateExtractElement(Value *Vec, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2559
IntegerType * getIntNTy(unsigned N)
Fetch the type representing an N-bit integer.
Definition IRBuilder.h:575
LoadInst * CreateAlignedLoad(Type *Ty, Value *Ptr, MaybeAlign Align, const char *Name)
Definition IRBuilder.h:1864
Value * CreateZExtOrTrunc(Value *V, Type *DestTy, const Twine &Name="")
Create a ZExt or Trunc from the integer value V to DestTy.
Definition IRBuilder.h:2100
CallInst * CreateMemCpy(Value *Dst, MaybeAlign DstAlign, Value *Src, MaybeAlign SrcAlign, uint64_t Size, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memcpy between the specified pointers.
Definition IRBuilder.h:687
LLVM_ABI CallInst * CreateAndReduce(Value *Src)
Create a vector int AND reduction intrinsic of the source vector.
Value * CreatePointerCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2251
Value * CreateExtractValue(Value *Agg, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2618
LLVM_ABI CallInst * CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Load intrinsic.
LLVM_ABI Value * CreateSelect(Value *C, Value *True, Value *False, const Twine &Name="", Instruction *MDFrom=nullptr)
BasicBlock::iterator GetInsertPoint() const
Definition IRBuilder.h:202
Value * CreateSExt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2094
Value * CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2199
Value * CreateLShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1513
IntegerType * getInt32Ty()
Fetch the type representing a 32-bit integer.
Definition IRBuilder.h:562
ConstantInt * getInt8(uint8_t C)
Get a constant 8-bit value.
Definition IRBuilder.h:512
IntegerType * getInt64Ty()
Fetch the type representing a 64-bit integer.
Definition IRBuilder.h:567
Value * CreateUDiv(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1454
Value * CreateICmpNE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2333
Value * CreateGEP(Type *Ty, Value *Ptr, ArrayRef< Value * > IdxList, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:1923
Value * CreateNeg(Value *V, const Twine &Name="", bool HasNSW=false)
Definition IRBuilder.h:1781
LLVM_ABI CallInst * CreateOrReduce(Value *Src)
Create a vector int OR reduction intrinsic of the source vector.
LLVM_ABI Value * CreateBinaryIntrinsic(Intrinsic::ID ID, Value *LHS, Value *RHS, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with 2 operands which is mangled on the first type.
LLVM_ABI CallInst * CreateIntrinsic(Intrinsic::ID ID, ArrayRef< Type * > Types, ArrayRef< Value * > Args, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with Args, mangled using Types.
ConstantInt * getInt32(uint32_t C)
Get a constant 32-bit value.
Definition IRBuilder.h:522
PHINode * CreatePHI(Type *Ty, unsigned NumReservedValues, const Twine &Name="")
Definition IRBuilder.h:2494
Value * CreateNot(Value *V, const Twine &Name="")
Definition IRBuilder.h:1805
Value * CreateICmpEQ(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2329
LLVM_ABI DebugLoc getCurrentDebugLocation() const
Get location information used by debugging information.
Definition IRBuilder.cpp:63
Value * CreateSub(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1420
Value * CreateBitCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2204
ConstantInt * getIntN(unsigned N, uint64_t C)
Get a constant N-bit value, zero extended or truncated from a 64-bit value.
Definition IRBuilder.h:533
LoadInst * CreateLoad(Type *Ty, Value *Ptr, const char *Name)
Provided to resolve 'CreateLoad(Ty, Ptr, "...")' correctly, instead of converting the string to 'bool...
Definition IRBuilder.h:1847
Value * CreateShl(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1492
CallInst * CreateMemSet(Value *Ptr, Value *Val, uint64_t Size, MaybeAlign Align, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memset to the specified pointer and the specified value.
Definition IRBuilder.h:630
Value * CreateZExt(Value *V, Type *DestTy, const Twine &Name="", bool IsNonNeg=false)
Definition IRBuilder.h:2082
Value * CreateShuffleVector(Value *V1, Value *V2, Value *Mask, const Twine &Name="")
Definition IRBuilder.h:2593
LLVMContext & getContext() const
Definition IRBuilder.h:203
Value * CreateAnd(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1551
StoreInst * CreateStore(Value *Val, Value *Ptr, bool isVolatile=false)
Definition IRBuilder.h:1860
LLVM_ABI CallInst * CreateMaskedStore(Value *Val, Value *Ptr, Align Alignment, Value *Mask)
Create a call to Masked Store intrinsic.
Value * CreateAdd(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1403
Value * CreatePtrToInt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2194
Value * CreateIsNotNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg != 0.
Definition IRBuilder.h:2651
CallInst * CreateCall(FunctionType *FTy, Value *Callee, ArrayRef< Value * > Args={}, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:2508
Value * CreateTrunc(Value *V, Type *DestTy, const Twine &Name="", bool IsNUW=false, bool IsNSW=false)
Definition IRBuilder.h:2068
PointerType * getPtrTy(unsigned AddrSpace=0)
Fetch the type representing a pointer.
Definition IRBuilder.h:605
Value * CreateBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:1708
Value * CreateICmpSLT(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2361
LLVM_ABI Value * CreateTypeSize(Type *Ty, TypeSize Size)
Create an expression which evaluates to the number of units in Size at runtime.
Value * CreateICmpUGE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2341
Value * CreateIntCast(Value *V, Type *DestTy, bool isSigned, const Twine &Name="")
Definition IRBuilder.h:2277
Value * CreateIsNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg == 0.
Definition IRBuilder.h:2646
void SetInsertPoint(BasicBlock *TheBB)
This specifies that created instructions should be appended to the end of the specified block.
Definition IRBuilder.h:207
Type * getVoidTy()
Fetch the type representing void.
Definition IRBuilder.h:600
StoreInst * CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align, bool isVolatile=false)
Definition IRBuilder.h:1883
LLVM_ABI CallInst * CreateMaskedExpandLoad(Type *Ty, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Expand Load intrinsic.
Value * CreateInBoundsPtrAdd(Value *Ptr, Value *Offset, const Twine &Name="")
Definition IRBuilder.h:2041
Value * CreateAShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1532
Value * CreateXor(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1599
Value * CreateICmp(CmpInst::Predicate P, Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2439
Value * CreateOr(Value *LHS, Value *RHS, const Twine &Name="", bool IsDisjoint=false)
Definition IRBuilder.h:1573
IntegerType * getInt8Ty()
Fetch the type representing an 8-bit integer.
Definition IRBuilder.h:552
Value * CreateMul(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1437
LLVM_ABI CallInst * CreateMaskedScatter(Value *Val, Value *Ptrs, Align Alignment, Value *Mask=nullptr)
Create a call to Masked Scatter intrinsic.
LLVM_ABI CallInst * CreateMaskedGather(Type *Ty, Value *Ptrs, Align Alignment, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Gather intrinsic.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition IRBuilder.h:2780
std::vector< ConstraintInfo > ConstraintInfoVector
Definition InlineAsm.h:123
void visit(Iterator Start, Iterator End)
Definition InstVisitor.h:87
const DebugLoc & getDebugLoc() const
Return the debug location for this node as a DebugLoc.
LLVM_ABI InstListType::iterator eraseFromParent()
This method unlinks 'this' from the containing basic block and deletes it.
MDNode * getMetadata(unsigned KindID) const
Get the metadata of given kind attached to this Instruction.
LLVM_ABI bool comesBefore(const Instruction *Other) const
Given an instruction Other in the same basic block as this instruction, return true if this instructi...
static LLVM_ABI IntegerType * get(LLVMContext &C, unsigned NumBits)
This static method is the primary way of constructing an IntegerType.
Definition Type.cpp:319
LLVM_ABI MDNode * createUnlikelyBranchWeights()
Return metadata containing two branch weights, with significant bias towards false destination.
Definition MDBuilder.cpp:48
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return an 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition Analysis.h:118
PreservedAnalyses & abandon()
Mark an analysis as abandoned.
Definition Analysis.h:171
bool remove(const value_type &X)
Remove an item from the set vector.
Definition SetVector.h:180
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:150
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
StringRef - Represent a constant reference to a string, i.e.
Definition StringRef.h:55
static LLVM_ABI StructType * get(LLVMContext &Context, ArrayRef< Type * > Elements, bool isPacked=false)
This static method is the primary way to create a literal StructType.
Definition Type.cpp:414
unsigned getNumElements() const
Random access to the elements.
Type * getElementType(unsigned N) const
Analysis pass providing the TargetLibraryInfo.
Provides information about what library functions are available for the current target.
AttributeList getAttrList(LLVMContext *C, ArrayRef< unsigned > ArgNos, bool Signed, bool Ret=false, AttributeList AL=AttributeList()) const
bool getLibFunc(StringRef funcName, LibFunc &F) const
Searches for a particular function name.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isMIPS64() const
Tests whether the target is MIPS 64-bit (little and big endian).
Definition Triple.h:1030
@ loongarch64
Definition Triple.h:65
bool isRISCV32() const
Tests whether the target is 32-bit RISC-V.
Definition Triple.h:1073
bool isPPC32() const
Tests whether the target is 32-bit PowerPC (little and big endian).
Definition Triple.h:1046
ArchType getArch() const
Get the parsed architecture type of this triple.
Definition Triple.h:411
bool isRISCV64() const
Tests whether the target is 64-bit RISC-V.
Definition Triple.h:1078
bool isLoongArch64() const
Tests whether the target is 64-bit LoongArch.
Definition Triple.h:1019
bool isMIPS32() const
Tests whether the target is MIPS 32-bit (little and big endian).
Definition Triple.h:1025
bool isARM() const
Tests whether the target is ARM (little and big endian).
Definition Triple.h:914
bool isPPC64() const
Tests whether the target is 64-bit PowerPC (little and big endian).
Definition Triple.h:1051
bool isAArch64() const
Tests whether the target is AArch64 (little and big endian).
Definition Triple.h:998
bool isSystemZ() const
Tests whether the target is SystemZ.
Definition Triple.h:1097
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:45
LLVM_ABI unsigned getIntegerBitWidth() const
bool isVectorTy() const
True if this is an instance of VectorType.
Definition Type.h:273
bool isArrayTy() const
True if this is an instance of ArrayType.
Definition Type.h:264
bool isIntOrIntVectorTy() const
Return true if this is an integer type or a vector of integer types.
Definition Type.h:246
bool isPointerTy() const
True if this is an instance of PointerType.
Definition Type.h:267
Type * getArrayElementType() const
Definition Type.h:408
bool isPPC_FP128Ty() const
Return true if this is powerpc long double.
Definition Type.h:165
static LLVM_ABI Type * getVoidTy(LLVMContext &C)
Definition Type.cpp:281
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition Type.h:352
LLVM_ABI TypeSize getPrimitiveSizeInBits() const LLVM_READONLY
Return the basic size of this type if it is a primitive type.
Definition Type.cpp:198
bool isSized(SmallPtrSetImpl< Type * > *Visited=nullptr) const
Return true if it makes sense to take the size of this type.
Definition Type.h:311
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
Definition Type.cpp:231
bool isFloatingPointTy() const
Return true if this is one of the floating-point types.
Definition Type.h:184
bool isIntOrPtrTy() const
Return true if this is an integer type or a pointer type.
Definition Type.h:255
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition Type.h:240
bool isFPOrFPVectorTy() const
Return true if this is a FP type or a vector of FP.
Definition Type.h:225
bool isVoidTy() const
Return true if this is 'void'.
Definition Type.h:139
Value * getOperand(unsigned i) const
Definition User.h:232
unsigned getNumOperands() const
Definition User.h:254
size_type count(const KeyT &Val) const
Return 1 if the specified key is in the map, 0 otherwise.
Definition ValueMap.h:156
Type * getType() const
All values are typed, get the type of this value.
Definition Value.h:256
LLVM_ABI void setName(const Twine &Name)
Change the name of the value.
Definition Value.cpp:390
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:322
ElementCount getElementCount() const
Return an ElementCount instance to represent the (possibly scalable) number of elements in the vector...
Type * getElementType() const
int getNumOccurrences() const
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:201
constexpr bool isScalable() const
Returns whether the quantity is scaled by a runtime quantity (vscale).
Definition TypeSize.h:169
An efficient, type-erasing, non-owning reference to a callable.
const ParentTy * getParent() const
Definition ilist_node.h:34
self_iterator getIterator()
Definition ilist_node.h:123
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
CallInst * Call
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ BasicBlock
Various leaf nodes.
Definition ISDOpcodes.h:81
initializer< Ty > init(const Ty &Val)
Function * Kernel
Summary of a kernel (=entry point for target offloading).
Definition OpenMPOpt.h:21
NodeAddr< FuncNode * > Func
Definition RDFGraph.h:393
friend class Instruction
Iterator for Instructions in a `BasicBlock.
Definition BasicBlock.h:73
This is an optimization pass for GlobalISel generic memory operations.
unsigned Log2_32_Ceil(uint32_t Value)
Return the ceil log base 2 of the specified value, 32 if the value is zero.
Definition MathExtras.h:355
FunctionAddr VTableAddr Value
Definition InstrProf.h:137
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
Definition STLExtras.h:1657
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B,...
Definition STLExtras.h:2452
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:644
@ Done
Definition Threading.h:60
bool isAligned(Align Lhs, uint64_t SizeInBytes)
Checks that SizeInBytes is a multiple of the alignment.
Definition Alignment.h:134
LLVM_ABI std::pair< Instruction *, Value * > SplitBlockAndInsertSimpleForLoop(Value *End, BasicBlock::iterator SplitBefore)
Insert a for (int i = 0; i < End; i++) loop structure (with the exception that End is assumed > 0,...
InnerAnalysisManagerProxy< FunctionAnalysisManager, Module > FunctionAnalysisManagerModuleProxy
Provide the FunctionAnalysisManager to Module proxy.
constexpr bool isPowerOf2_64(uint64_t Value)
Return true if the argument is a power of two > 0 (64 bit edition.)
Definition MathExtras.h:293
unsigned Log2_64(uint64_t Value)
Return the floor log base 2 of the specified value, -1 if the value is zero.
Definition MathExtras.h:348
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:754
LLVM_ABI std::pair< Function *, FunctionCallee > getOrCreateSanitizerCtorAndInitFunctions(Module &M, StringRef CtorName, StringRef InitName, ArrayRef< Type * > InitArgTypes, ArrayRef< Value * > InitArgs, function_ref< void(Function *, FunctionCallee)> FunctionsCreatedCallback, StringRef VersionCheckName=StringRef(), bool Weak=false)
Creates sanitizer constructor function lazily.
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:548
LLVM_ABI bool isKnownNonZero(const Value *V, const SimplifyQuery &Q, unsigned Depth=0)
Return true if the given value is known to be non-zero when defined.
LLVM_ABI raw_fd_ostream & errs()
This returns a reference to a raw_ostream for standard error.
AtomicOrdering
Atomic ordering for LLVM's memory model.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:71
IRBuilder(LLVMContext &, FolderTy, InserterTy, MDNode *, ArrayRef< OperandBundleDef >) -> IRBuilder< FolderTy, InserterTy >
@ Or
Bitwise or logical OR of integers.
@ And
Bitwise or logical AND of integers.
@ Add
Sum of integers.
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
DWARFExpression::Operation Op
RoundingMode
Rounding mode.
ArrayRef(const T &OneElt) -> ArrayRef< T >
constexpr unsigned BitWidth
LLVM_ABI void appendToGlobalCtors(Module &M, Function *F, int Priority, Constant *Data=nullptr)
Append F to the list of global ctors of module M with the given Priority.
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:560
iterator_range< df_iterator< T > > depth_first(const T &G)
LLVM_ABI Instruction * SplitBlockAndInsertIfThen(Value *Cond, BasicBlock::iterator SplitBefore, bool Unreachable, MDNode *BranchWeights=nullptr, DomTreeUpdater *DTU=nullptr, LoopInfo *LI=nullptr, BasicBlock *ThenBlock=nullptr)
Split the containing block at the specified instruction - everything before SplitBefore stays in the ...
LLVM_ABI void maybeMarkSanitizerLibraryCallNoBuiltin(CallInst *CI, const TargetLibraryInfo *TLI)
Given a CallInst, check if it calls a string function known to CodeGen, and mark it with NoBuiltin if...
Definition Local.cpp:3838
LLVM_ABI bool removeUnreachableBlocks(Function &F, DomTreeUpdater *DTU=nullptr, MemorySSAUpdater *MSSAU=nullptr)
Remove all blocks that can not be reached from the function's entry.
Definition Local.cpp:2883
LLVM_ABI bool checkIfAlreadyInstrumented(Module &M, StringRef Flag)
Check if module has flag attached, if not add the flag.
std::string itostr(int64_t X)
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
LLVM_ABI void printPipeline(raw_ostream &OS, function_ref< StringRef(StringRef)> MapClassName2PassName)
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
A CRTP mix-in to automatically provide informational APIs needed for passes.
Definition PassManager.h:70