General matrix multiplication

Introduction

This repository contains a simple implementation of general matrix multiplication (GEMM) using OpenMP for multithreading and the ARM NEON SIMD instruction set for vectorization. The goal is to demonstrate how parallel processing and SIMD instructions speed up matrix operations.
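The sketch below illustrates how the two techniques combine. It is not the code in gemm.c. OpenMP splits the rows of C across threads, and NEON intrinsics update four floats of a row per instruction. The function names `row_fma` and `matmul_neon` are hypothetical. The sketch assumes AArch64 for the `vfmaq_f32` intrinsic and falls back to scalar code on other targets.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

/* c[j] += a * b[j] for j in [0, n): the row update inside an i-k-j matmul loop. */
static void row_fma(float *c, float a, const float *b, size_t n) {
    size_t j = 0;
#if defined(__ARM_NEON) && defined(__aarch64__)
    float32x4_t va = vdupq_n_f32(a);              /* broadcast a to all 4 lanes */
    for (; j + 4 <= n; j += 4) {
        float32x4_t vc = vld1q_f32(c + j);
        vc = vfmaq_f32(vc, va, vld1q_f32(b + j)); /* vc += va * b[j..j+3] */
        vst1q_f32(c + j, vc);
    }
#endif
    for (; j < n; j++)                            /* scalar tail, or the whole row without NEON */
        c[j] += a * b[j];
}

/* C = A * B for row-major n x n float matrices; OpenMP hands each thread a set of rows. */
void matmul_neon(const float *A, const float *B, float *C, size_t n) {
    assert(A != NULL && B != NULL && C != NULL);
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = 0.0f;
        for (size_t k = 0; k < n; k++)
            row_fma(&C[i * n], A[i * n + k], &B[k * n], n);
    }
}
```

The i-k-j loop order makes the innermost loop walk contiguous rows of B and C. This is what lets four adjacent floats be loaded and updated with a single vector instruction.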

Requirements

A compatible ARM-based processor with NEON support
OpenMP installed and configured on your system
A C compiler (e.g., GCC or Clang)

Compilation

To compile the code, use the following command:

 gcc -o gemm gemm.c -O3 -ffast-math -fopenmp -march=native
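You can check whether the flags in the command above took effect with a small sketch like the one below. It reports whether the compiler defined `_OPENMP` (set by `-fopenmp`) and `__ARM_NEON` (defined when targeting a NEON-capable ARM CPU). The helper names are hypothetical. Only the two macros and `omp_get_max_threads()` come from the OpenMP and ARM C Language Extensions specifications.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* 1 if the compiler targeted an ARM CPU with NEON, else 0. */
int built_with_neon(void) {
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Threads OpenMP will use for a parallel region; 1 when built without -fopenmp. */
int openmp_threads(void) {
#ifdef _OPENMP
    int t = omp_get_max_threads();
#else
    int t = 1;   /* omp pragmas are ignored, so the code runs serially */
#endif
    assert(t >= 1);
    return t;
}
```

Built with the same command on an M2 Pro, you would typically expect `built_with_neon()` to return 1 and `openmp_threads()` to return more than 1, unless `OMP_NUM_THREADS` overrides it.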

Matrix Multiplication Benchmark on M2 Pro Processor

This benchmark compares the performance of four matrix multiplication implementations on an Apple M2 Pro processor. The implementations are:

Optimized Neon Parallel BLOCKED
Standard Neon Parallel BLOCKED
Normal Parallel NEON
Normal Parallel matmul
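The implementations themselves live in gemm.c and are not reproduced here. The sketch below contrasts a plain parallel matmul with a blocked one, under a few assumptions: row-major float matrices, and a tile size `bs` that plays the role of BLOCK_SIZE. The function names are hypothetical, and NEON is left out so the tiling stays visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive i-j-k triple loop, rows split across OpenMP threads. */
void matmul_naive(const float *A, const float *B, float *C, size_t n) {
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];  /* strided walk down a column of B */
            C[i * n + j] = sum;
        }
}

/* Blocked (tiled) version: combine bs x bs tiles so the tiles of A, B and C in
   use stay in cache. Each thread owns whole row-blocks of C, so no two threads
   write the same element. Handles n not divisible by bs. */
void matmul_blocked(const float *A, const float *B, float *C, size_t n, size_t bs) {
    assert(bs > 0);
    memset(C, 0, n * n * sizeof *C);
    #pragma omp parallel for
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t i_end = ii + bs < n ? ii + bs : n;
                size_t k_end = kk + bs < n ? kk + bs : n;
                size_t j_end = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        float a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Both functions perform the same multiply-adds. Blocking only reorders them, so each tile of B is reused from cache many times instead of the whole of B being streamed for every row of C. That locality is the usual reason blocked variants pull ahead as N grows.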

Results:

N = 1024, BLOCK_SIZE = 16

Optimized: 87.17 GFLOP/s
Standard: 69.49 GFLOP/s
Normal NEON: 76.44 GFLOP/s
Normal matmul: 4.85 GFLOP/s

N = 8192, BLOCK_SIZE = 16

Optimized: 122.27 GFLOP/s
Standard: 72.04 GFLOP/s
Normal NEON: 60.23 GFLOP/s
Normal matmul: Not applicable
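The figures above are throughput numbers. The benchmark likely uses the usual convention of 2·N³ floating-point operations per N×N multiply (one multiply and one add per inner-loop step). Assuming so, they convert to and from wall-clock time as in this sketch; the helper names are hypothetical.

```c
#include <assert.h>

/* An n x n by n x n matmul does n^3 multiply-adds, counted as 2 * n^3 FLOPs. */
double gemm_flops(double n) {
    return 2.0 * n * n * n;
}

/* Convert a measured wall-clock time into GFLOP/s. */
double gemm_gflops(double n, double seconds) {
    assert(seconds > 0.0);
    return gemm_flops(n) / seconds / 1e9;
}

/* Invert the relation: the run time implied by a reported GFLOP/s figure. */
double gemm_seconds(double n, double gflops) {
    assert(gflops > 0.0);
    return gemm_flops(n) / (gflops * 1e9);
}
```

For N = 1024, 2 · 1024³ = 2,147,483,648 operations, so 87.17 GFLOP/s corresponds to roughly 24.6 ms per multiply. For N = 8192, 2 · 8192³ ≈ 1.10 × 10¹² operations, so 122.27 GFLOP/s corresponds to roughly 9.0 s.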

Note

Our previous best was ~76 GFLOP/s; the optimized implementation now reaches 122.27 GFLOP/s at N = 8192, roughly a 1.6× improvement.

Bahaatbb/GEMM
