NEON 4x4 Rigid Matrix Inverse and Multiply

This project provides fast hand-written ARMv8-A NEON (AArch64) assembly routines for:

Inverting a rigid (rotation + translation) 4x4 matrix
Multiplying two 4x4 matrices

Both routines use single-precision floats in column-major order and are optimized for pipelines like Cortex-A53 (but suitable for all ARM64 NEON platforms).

How the Rigid Affine Property Optimizes the Inverse

A general 4x4 affine matrix (as used in 3D graphics) can include rotation, translation, scale, and shear. However, for most scene transforms (camera, object pose), only rotation and translation are present—this is called a rigid (or Euclidean) transform.

Such a matrix takes the form:

[ R | t ]    (R = 3x3 rotation, t = translation)
[ 0 | 1 ]

The inverse of a rigid affine matrix is mathematically simple and can be written as:

[ Rᵗ | -Rᵗ * t ]
[ 0  |    1    ]

The 3×3 rotation block is transposed (Rᵗ)—that's much cheaper than a full matrix inverse.
The new translation is -Rᵗ * t: each component is the negated dot product of one row of Rᵗ with the original translation column t.
No determinant, cofactors, or division required!
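
As a cross-check of the formula above, here is a minimal scalar C sketch of the same inverse for a column-major float[16]. The name ref_rigid_inverse is hypothetical and not part of neon_mat4.S. It assumes src really is a rigid matrix and that src and dst do not overlap.

```c
/* Scalar reference for the rigid inverse (hypothetical helper, not in neon_mat4.S).
 * Column-major layout: element (row r, col c) lives at index c * 4 + r. */
void ref_rigid_inverse(const float *src, float *dst)
{
    /* Transpose the 3x3 rotation block: dst(r, c) = src(c, r). */
    for (int c = 0; c < 3; ++c)
        for (int r = 0; r < 3; ++r)
            dst[c * 4 + r] = src[r * 4 + c];

    /* New translation = -(R^T * t). Row r of R^T is column r of R. */
    const float tx = src[12], ty = src[13], tz = src[14];
    for (int r = 0; r < 3; ++r)
        dst[12 + r] = -(src[r * 4 + 0] * tx +
                        src[r * 4 + 1] * ty +
                        src[r * 4 + 2] * tz);

    /* Bottom row of a rigid matrix is [0 0 0 1]. */
    dst[3] = dst[7] = dst[11] = 0.0f;
    dst[15] = 1.0f;
}
```

A helper like this can serve as a baseline when comparing results from the assembly routine, within a small floating-point tolerance.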

In this project, the NEON routine computes the inverse by:

Using NEON vector "unzip" and "zip" instructions to quickly transpose the 3×3 rotation.
Efficiently calculating and negating the new translation using NEON fused multiply-add, treating the 3 rows as parallel dot-products.
Writing out the new matrix in a tight, branchless sequence suited to modern ARM cores.
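
The steps above can be pictured in portable C by treating each column as a 4-lane vector. This is only an illustration of the data flow, not the code in neon_mat4.S: the vec4 type and both function names are hypothetical stand-ins, a plain loop replaces the zip/unzip transpose, and an ordinary multiply-add replaces the fused NEON instruction.

```c
/* Stand-in for one 128-bit NEON register holding four floats (hypothetical). */
typedef struct { float lane[4]; } vec4;

/* Per-lane acc + v * s, mimicking a multiply-accumulate by scalar. */
static vec4 vec4_mla_scalar(vec4 acc, vec4 v, float s)
{
    for (int i = 0; i < 4; ++i)
        acc.lane[i] += v.lane[i] * s;
    return acc;
}

/* Lane-oriented sketch of the rigid inverse; src and dst must not overlap. */
void sketch_rigid_inverse_lanes(const float *src, float *dst)
{
    vec4 col[3];

    /* Step 1: transpose. Column c of R^T is row c of R; lane 3 stays 0. */
    for (int c = 0; c < 3; ++c) {
        for (int r = 0; r < 3; ++r)
            col[c].lane[r] = src[r * 4 + c];
        col[c].lane[3] = 0.0f;
    }

    /* Step 2: three multiply-accumulates yield all three dot products at once:
     * acc = col0 * tx + col1 * ty + col2 * tz, i.e. R^T * t in lanes 0..2. */
    vec4 acc = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    acc = vec4_mla_scalar(acc, col[0], src[12]);
    acc = vec4_mla_scalar(acc, col[1], src[13]);
    acc = vec4_mla_scalar(acc, col[2], src[14]);

    /* Step 3: write the columns, negating the translation; no branches on data. */
    for (int c = 0; c < 3; ++c)
        for (int i = 0; i < 4; ++i)
            dst[c * 4 + i] = col[c].lane[i];
    dst[12] = -acc.lane[0];
    dst[13] = -acc.lane[1];
    dst[14] = -acc.lane[2];
    dst[15] = 1.0f;
}
```

Accumulating column by column is what lets the three row dot products run in parallel lanes instead of as three separate horizontal sums.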

This exploits the rigid property: inverting a pose matrix needs only a 3x3 transpose and three dot products, which is far cheaper than a general 4x4 inverse.

Files

neon_mat4.S: NEON assembly code for efficient matrix inverse and multiplication.
test.c: C test harness with correctness-checking for various rigid transforms.

Matrix Format

All matrices are 4x4, stored as float[16] in column-major order (matching OpenGL conventions):

| m00 m04 m08 m12 |
| m01 m05 m09 m13 |
| m02 m06 m10 m14 |
| m03 m07 m11 m15 |

For a rigid transform:

The upper-left 3x3 is a rotation matrix
The last column (excluding bottom-right) is translation
The bottom row is [0 0 0 1], so the bottom-right element is always 1
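
To make the layout concrete, the following small C sketch maps a (row, column) pair to its position in the float[16] and fills in a translation-only rigid matrix. Both mat4_index and make_rigid_translation are hypothetical helper names used for illustration; they are not provided by this project.

```c
/* Index of element (row, col) in a column-major float[16]. */
static int mat4_index(int row, int col)
{
    return col * 4 + row;
}

/* Fill m with identity rotation plus translation (tx, ty, tz). */
void make_rigid_translation(float *m, float tx, float ty, float tz)
{
    for (int i = 0; i < 16; ++i)
        m[i] = 0.0f;

    /* Diagonal: identity rotation and the bottom-right 1. */
    for (int d = 0; d < 4; ++d)
        m[mat4_index(d, d)] = 1.0f;

    /* Translation occupies rows 0..2 of the last column (m12, m13, m14). */
    m[mat4_index(0, 3)] = tx;
    m[mat4_index(1, 3)] = ty;
    m[mat4_index(2, 3)] = tz;
}
```

With this layout each column is four consecutive floats, so one 128-bit load fetches a whole column.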

Building & Testing

Prerequisites

GCC cross compiler for AArch64 (e.g., aarch64-linux-gnu-gcc)
QEMU user-mode emulator for AArch64 (qemu-aarch64)
Standard Linux tools

Building and Running

You can cross-compile the project on Linux with the GNU AArch64 toolchain and run the tests under QEMU user-mode emulation.

# assemble using the aarch64 toolchain
aarch64-linux-gnu-as neon_mat4.S -o mat4.o

# compile the c test file and link
aarch64-linux-gnu-gcc mat4.o test.c -o tests

# run under QEMU user-mode emulation (qemu-aarch64)
qemu-aarch64 -cpu cortex-a53 -L /usr/aarch64-linux-gnu ./tests

# you can also debug it on an x86-64 host
qemu-aarch64 -L /usr/aarch64-linux-gnu -g 1234 ./tests &
aarch64-linux-gnu-gdb -ex "target remote localhost:1234"

Usage from C

Declare prototypes:

extern void neon_mat4_affine_rigid_inverse(const float* src, float* dst);
extern void neon_mat4_mul(const float* a, const float* b, float* dst);
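
When calling these routines it can help to have a scalar baseline to compare against. The sketch below is one possible reference multiply plus a tolerance check. The names ref_mat4_mul and mat4_nearly_equal are hypothetical, and the operand order (dst = a * b, column vectors) is an assumption, since it is not stated here; check it against neon_mat4.S before relying on it.

```c
/* Scalar reference: dst = a * b for column-major 4x4 matrices.
 * ASSUMPTION: this operand order matches neon_mat4_mul; verify before use.
 * dst must not overlap a or b. */
void ref_mat4_mul(const float *a, const float *b, float *dst)
{
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + r] * b[c * 4 + k];  /* a(r, k) * b(k, c) */
            dst[c * 4 + r] = sum;
        }
    }
}

/* Returns 1 if all 16 elements differ by at most tol, else 0. */
int mat4_nearly_equal(const float *x, const float *y, float tol)
{
    for (int i = 0; i < 16; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > tol)
            return 0;
    }
    return 1;
}
```

One way to use this is to multiply a rigid matrix by the output of neon_mat4_affine_rigid_inverse and check that the product is nearly the identity.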

Note

The assembly routines follow the AAPCS64 calling convention.

All pointers must be at least 8-byte (preferably 16-byte) aligned.
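
One way to meet the alignment requirement from C11 is to declare matrices with an explicit 16-byte alignment. The mat4_t wrapper and is_aligned_16 helper below are hypothetical names for illustration; they are not part of this project.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical wrapper: a 4x4 matrix whose storage is 16-byte aligned. */
typedef struct {
    alignas(16) float m[16];
} mat4_t;

/* Returns 1 if p is 16-byte aligned, else 0. Handy in debug assertions. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

Stack, static, and array instances of mat4_t then carry the alignment automatically. Memory from malloc is aligned for any standard type, which on typical AArch64 Linux toolchains is 16 bytes, though aligned_alloc makes the intent explicit.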

vernizzi/neon_mat4
