Description
Android NDK r25c produces invalid code for the arm64-v8a target architecture when compiling with the `-Os` (optimize for size) compiler flag. In addition, the compiler itself crashes if we mark certain functions as `noinline`.
Below is a minimized/stripped sample that shows the problem (it uses functions from libtomcrypt).
Link to the repro repository: https://github.com/SanjaLV/NDK_r25c_repro
Prerequisites:
- Linux/macOS machine
- `ANDROID_HOME` env variable that points to the Android SDK root.
- `"ndk;25.2.9519653"` / `"ndk;25.1.8937393"` installed via sdkmanager
- System clang compiler with UBSAN/ASAN.
- arm64-v8a Android device/emulator (the emulator was tested only on an Apple M1 machine) connected to ADB
How to reproduce (invalid code):
- Run `test_manual.sh`
- Observe that `REMOTE_0s_LOG.txt` differs from `REMOVE_02_LOG.txt`
How to reproduce (compiler crash):
- Open `REPRO.c` in the editor of your choice
- Change the define on line 10 to `#define MAKE_COMPILER_CRASH 1`
- Run `test_manual.sh`
- Observe that clang crashes while trying to compile `REPRO.c` with `-Os`
Crash backtrace should look like:
Program received signal SIGSEGV, Segmentation fault.
0x00000000065bec3e in llvm::VPTransformState::get(llvm::VPValue*, llvm::VPIteration const&) ()
(gdb) bt
#0 0x00000000065bec3e in llvm::VPTransformState::get(llvm::VPValue*, llvm::VPIteration const&) ()
#1 0x00000000065be8b9 in llvm::InnerLoopVectorizer::scalarizeInstruction(llvm::Instruction*, llvm::VPReplicateRecipe*, llvm::VPIteration const&, bool, llvm::VPTransformState&) ()
#2 0x00000000065be760 in llvm::VPReplicateRecipe::execute(llvm::VPTransformState&) ()
#3 0x00000000065be31e in llvm::VPBasicBlock::execute(llvm::VPTransformState*) ()
#4 0x00000000065be047 in llvm::VPRegionBlock::execute(llvm::VPTransformState*) ()
#5 0x00000000065bdf6c in llvm::VPRegionBlock::execute(llvm::VPTransformState*) ()
#6 0x00000000066fabf1 in llvm::VPlan::execute(llvm::VPTransformState*) ()
#7 0x00000000062fd91e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*) ()
#8 0x000000000663a48f in llvm::LoopVectorizePass::processLoop(llvm::Loop*) ()
#9 0x0000000005edebea in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) ()
#10 0x0000000005eddbb9 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) ()
#11 0x0000000005edd89b in ?? ()
#12 0x0000000005c6168a in llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) ()
#13 0x0000000005c61521 in clang::TemplateDeclInstantiator::VisitDecl(clang::Decl*) ()
#14 0x0000000005ec7112 in llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) ()
#15 0x0000000005ec6dd1 in ?? ()
#16 0x00000000063538c6 in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) ()
#17 0x00000000065d66c8 in clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) ()
#18 0x00000000060524d5 in ?? ()
#19 0x0000000005ea25a9 in clang::ParseAST(clang::Sema&, bool, bool) ()
#20 0x00000000063c128d in clang::FrontendAction::Execute() ()
#21 0x00000000063c112d in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#22 0x00000000063c1541 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) ()
#23 0x00000000066a9f54 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) ()
#24 0x00000000066a6de3 in ?? ()
#25 0x00000000066754a5 in main ()
Context:
We originally discovered that upgrading the NDK from r25b to r25c changes the return values of certain cryptographic functions. After some investigation, we found that the first function whose return value changes with NDK r25c is `mp_montgomery_reduce`. We then minimized the code by fixing the input arguments to `mp_montgomery_reduce`. (These values are not unique; any valid random values will work too, as long as `P` is odd.)
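For readers unfamiliar with why `P` has to be odd: Montgomery reduction needs the inverse of the modulus modulo a power of two, and that inverse exists only for odd moduli. The sketch below is a single-word simplification written for illustration, not the multi-precision code from libtomcrypt or from the repro; the names `mont_setup` and `mont_reduce`, the 32-bit word size, and the restriction `p < 2^31` are all assumptions made here to keep the arithmetic inside 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper (hypothetical name): computes rho = -1/p mod 2^32.
 * Returns -1 when p is even, because no inverse exists in that case. */
static int mont_setup(uint32_t p, uint32_t *rho) {
    if ((p & 1u) == 0u) {
        return -1;
    }
    uint32_t x = p;      /* for odd p, p*p == 1 (mod 8): 3 correct bits */
    x *= 2u - p * x;     /* Newton step: 6 correct bits */
    x *= 2u - p * x;     /* 12 correct bits */
    x *= 2u - p * x;     /* 24 correct bits */
    x *= 2u - p * x;     /* 48 >= 32 correct bits */
    *rho = 0u - x;       /* negate the inverse modulo 2^32 */
    return 0;
}

/* Illustrative helper (hypothetical name): returns t * 2^-32 mod p.
 * Assumes p is odd, p < 2^31 and t < p * 2^32, so nothing overflows. */
static uint32_t mont_reduce(uint64_t t, uint32_t p, uint32_t rho) {
    assert((p & 1u) != 0u);
    assert(p < 0x80000000u);
    assert(t < ((uint64_t)p << 32));
    uint32_t m = (uint32_t)t * rho;            /* makes t + m*p divisible by 2^32 */
    uint64_t u = (t + (uint64_t)m * p) >> 32;  /* exact division by 2^32 */
    return (uint32_t)(u >= p ? u - p : u);     /* one conditional subtraction */
}
```

Calling `mont_setup` with an even modulus fails, which mirrors the "as long as `P` is odd" condition above; with an odd modulus, `mont_reduce((uint64_t)a << 32, p, rho)` gives back `a` for any `a < p`.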
We tried to compare the generated assembly, but clang inlines the majority of the calls, which complicates the investigation, so we tried applying the `noinline` attribute. This resulted in a compiler crash during loop vectorization.
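As a rough illustration of the technique (not the code from `REPRO.c`), the sketch below shows how a helper can be kept out of line with the GCC/Clang `__attribute__((noinline))` extension so that its call stays visible in the generated assembly. The `USE_NOINLINE` toggle, the `REPRO_NOINLINE` macro, and both function names are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toggle for this sketch; set to 0 to let the compiler inline. */
#ifndef USE_NOINLINE
#define USE_NOINLINE 1
#endif

#if USE_NOINLINE
#define REPRO_NOINLINE __attribute__((noinline)) /* GCC/Clang extension */
#else
#define REPRO_NOINLINE
#endif

/* Adds two 32-bit digits plus a carry-in; stores the carry-out in *carry. */
REPRO_NOINLINE static uint32_t add_digit(uint32_t a, uint32_t b, uint32_t *carry) {
    uint64_t s = (uint64_t)a + b + *carry;
    *carry = (uint32_t)(s >> 32);
    return (uint32_t)s;
}

/* Multi-word addition dst = x + y over n digits; returns the final carry.
 * With noinline in effect, each iteration contains a real call to add_digit. */
static uint32_t add_words(uint32_t *dst, const uint32_t *x,
                          const uint32_t *y, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = add_digit(x[i], y[i], &carry);
    }
    return carry;
}
```

Building such a file once with `-DUSE_NOINLINE=1` and once with `-DUSE_NOINLINE=0` and comparing the assembly output is one way to see how much inlining changes the code around a loop like this.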
Comparing the `clang_source_info.md` of both NDKs, I can spot a few patches regarding arm64 vectorization:
- [[AArch64] Use simd mov to materialize big fp constants](https://android.googlesource.com/toolchain/llvm_android/+/91fdeab43d29b1f228113859da8ee238bc8c2f16/patches/cherry/7a605ab7bfbc681c34335684f45b7da32d495db1.patch)
- [[AArch64] Emit vector FP cmp when LE is used with fast-math](https://android.googlesource.com/toolchain/llvm_android/+/91fdeab43d29b1f228113859da8ee238bc8c2f16/patches/cherry/bf268a05cd9294854ffccc3158c0e673069bed4a.patch)
- [Loop-Vectorizer-shouldMaximizeVectorBandwidth.patch](https://android.googlesource.com/toolchain/llvm_android/+/91fdeab43d29b1f228113859da8ee238bc8c2f16/patches/Loop-Vectorizer-shouldMaximizeVectorBandwidth.patch)
I don't have access to the patches, so I cannot verify this hypothesis. Clang with debug asserts enabled might also provide additional information, but I don't know how to build the NDK clang.
Feel free to ask for more information.
Many thanks,
Aleksandrs
Affected versions
r25
Canary version
No response
Host OS
Linux, Mac
Host OS version
Ubuntu 22.04
Affected ABIs
arm64-v8a
Build system
ndk-build
Other build system
No response
minSdkVersion
31 (not relevant)
Device API level
27