
JakeChampion · 2026-03-25T12:45:26Z

flatten: replace per-pixel float division with 256-entry LUTs for uchar alpha blending
composite: precompute inv_max_band[] reciprocals instead of dividing per pixel; initialize blend arrays with = {0} instead of per-pixel zero-fill loop
rot/flip/zoom/subsample: replace byte-at-a-time pixel copies with typed stores

Measured with hyperfine on an arm64 Apple M2, clang 21

Test images: 8000x8000 or 4000x4000 uchar, created with vips gaussnoise + cast + bandjoin

$ hyperfine --warmup 5 --runs 15 \
    -n master 'VIPS_CONCURRENCY=1 vips <op> input.v output.v' \
    -n branch 'VIPS_CONCURRENCY=1 vips <op> input.v output.v'

Operation	master	branch	speedup
flatten 8k RGBA	170.0 ms	96.0 ms	1.77x faster
rot90 4k RGBA	86.0 ms	52.4 ms	1.64x faster
rot90 4k RGB	80.6 ms	50.7 ms	1.59x faster
zoom 2x 4k RGB	121.1 ms	75.9 ms	1.59x faster
rot270 4k RGB	79.8 ms	55.0 ms	1.45x faster
flip horiz 4k RGBA	68.7 ms	47.6 ms	1.44x faster
flip horiz 4k RGB	63.3 ms	47.7 ms	1.33x faster
rot180 4k RGB	63.3 ms	48.8 ms	1.30x faster
subsample 2x 8k RGB	78.2 ms	66.6 ms	1.17x faster

All affected operations produced bit-identical output to master, verified by running each operation on a 512x512 RGBA/RGB test image and comparing raw .v files with cmp -s.

Test plan

meson test -C build passes
Output images are bit-identical to master
Test on x86_64 / gcc

lovell

Thanks for this, Jake. Those timings look great. It would be good to see the impact on x64 CPUs too. I've left a couple of comments inline.

Replace per-pixel double-precision division in vips_flatten for UCHAR input with precomputed 256-entry LUTs. This applies to both the black-background and arbitrary-background paths, with a special-case unrolled loop for the common RGBA (4-band) case. Flatten is on the hot path for every RGBA-to-JPEG conversion (PNG/WebP with alpha saved as JPEG).

Benchmarked on 4000x4000 RGBA (arm64, Apple M-series):
Black background: ~3.6% faster (82ms -> 79ms)
Colored background: ~2.8% faster (82ms -> 80ms)
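A minimal sketch of the LUT idea for flatten. It assumes interleaved 8-bit RGBA input, a maximum alpha of 255, RGB output and round-to-nearest conversion back to uchar. FlattenLut, flatten_lut_init and flatten_rgba are hypothetical names, not the code in this PR. Because alpha can only take 256 values, the two blend weights a/255 and 1 - a/255 can be computed once up front, and the per-pixel division disappears:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: blend weights indexed by an 8-bit alpha value. */
typedef struct {
	double alpha[256];     /* a / 255.0 */
	double inv_alpha[256]; /* 1.0 - a / 255.0 */
} FlattenLut;

/* Fill both tables once: 256 divisions in total, not one per pixel. */
static void
flatten_lut_init(FlattenLut *lut)
{
	for (int a = 0; a < 256; a++) {
		lut->alpha[a] = a / 255.0;
		lut->inv_alpha[a] = 1.0 - lut->alpha[a];
	}
}

/* Flatten n interleaved RGBA pixels onto a constant RGB background,
 * writing n RGB pixels. The alpha byte indexes the tables directly. */
static void
flatten_rgba(const FlattenLut *lut, const unsigned char *in,
	unsigned char *out, size_t n, const unsigned char bg[3])
{
	assert(lut && in && out && bg);

	for (size_t i = 0; i < n; i++) {
		const unsigned char *p = in + 4 * i;
		unsigned char *q = out + 3 * i;
		const double fa = lut->alpha[p[3]];
		const double fb = lut->inv_alpha[p[3]];

		/* Unrolled for the 3 color bands; + 0.5 rounds to nearest
		 * (an assumption of this sketch). */
		q[0] = (unsigned char) (p[0] * fa + bg[0] * fb + 0.5);
		q[1] = (unsigned char) (p[1] * fa + bg[1] * fb + 0.5);
		q[2] = (unsigned char) (p[2] * fa + bg[2] * fb + 0.5);
	}
}
```

The black-background path is the special case bg = {0, 0, 0}, where the second term vanishes. This sketch makes no promise of matching libvips' existing rounding bit for bit.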

Precompute inv_max_band = 1.0 / max_band once during build and use multiplication instead of division when scaling pixels to 0-1 in the composite blend loop. Applied to both the generic double path and the v4f SIMD vector path. On modern CPUs, double-precision division typically has roughly 3-5x the latency of multiplication, and lower throughput.
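A sketch of hoisting the division out of the loop. It assumes a plain array of doubles for one band. scale_to_unit is a hypothetical name and is not the composite.cpp code. Note that x * (1.0 / m) is not always bit-identical to x / m in double precision when m is not a power of two. End-to-end output comparisons, such as the cmp -s check in the PR description, therefore remain worthwhile:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: scale n samples of one band into 0..1. */
static void
scale_to_unit(const double *in, double *out, size_t n, double max_band)
{
	assert(max_band > 0.0);

	/* One division per call instead of one per sample. */
	const double inv_max_band = 1.0 / max_band;

	for (size_t i = 0; i < n; i++)
		out[i] = in[i] * inv_max_band;
}
```

With max_band = 255 (uchar) or 65535 (ushort), the reciprocal is inexact, so results can differ from true division in the last bit. With a power-of-two maximum they match exactly.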

Initialize A[] and f[] arrays with = {0} at declaration instead of zeroing unused entries in a per-pixel loop. Removes up to 63 double stores per pixel for images with few bands.
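A sketch of the zero-initialization change. It assumes the scratch arrays are declared once per call to a region-level blend function, and that the downstream blend reads the band count padded to a multiple of 4 (as a 4-lane vector path might). blend_region and MAX_BANDS are hypothetical, and the blend arithmetic is only a stand-in:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BANDS 64 /* hypothetical upper bound on bands */

/* Hypothetical: blend n_pixels pixels of `bands` doubles each. */
static void
blend_region(const double *in, double *out, int n_pixels, int bands)
{
	assert(bands > 0 && bands <= MAX_BANDS);

	/* Zeroed once per call, not once per pixel. Entries at index
	 * >= bands are never written below, so they stay zero. */
	double A[MAX_BANDS] = {0};
	double f[MAX_BANDS] = {0};

	/* Round up to a multiple of 4, like a 4-lane vector loop would. */
	const int padded = (bands + 3) & ~3;

	for (int i = 0; i < n_pixels; i++) {
		const double *p = in + (size_t) i * bands;

		/* Per pixel, only the live bands are written. */
		for (int b = 0; b < bands; b++) {
			A[b] = p[b];
			f[b] = 1.0 - p[b];
		}

		/* Stand-in blend: reads padding lanes, which are still zero. */
		double s = 0.0;
		for (int b = 0; b < padded; b++)
			s += A[b] * f[b];
		out[i] = s;
	}
}
```

If the arrays were instead declared inside the per-pixel loop, the = {0} initializer would itself run once per pixel, so the placement of the declaration matters.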

Replace byte-at-a-time pixel copy loops with a VIPS_MEMCPY macro that uses typed stores for common pixel sizes (1/2/3/4/8 bytes) and falls back to memcpy for others. Define the macro once in util.h.
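A sketch of a per-pixel copy helper in the spirit of VIPS_MEMCPY. It is written as a static inline function rather than a macro. Instead of pointer casts, it uses fixed-size memcpy calls, which compilers typically lower to single loads and stores, to avoid alignment and strict-aliasing problems. copy_pixel and flip_row are hypothetical names, and the actual macro in util.h may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: copy one pixel of ps bytes. Each case passes a
 * compile-time constant size, so the compiler can emit a single load
 * and store (or a short fixed sequence for 3 bytes) instead of a loop. */
static inline void
copy_pixel(unsigned char *restrict q, const unsigned char *restrict p,
	size_t ps)
{
	assert(ps > 0);

	switch (ps) {
	case 1: *q = *p; break;
	case 2: memcpy(q, p, 2); break;
	case 3: memcpy(q, p, 3); break;
	case 4: memcpy(q, p, 4); break;
	case 8: memcpy(q, p, 8); break;
	default: memcpy(q, p, ps); break; /* generic fallback */
	}
}

/* Mirror one row of `width` pixels, as a horizontal flip would. */
static void
flip_row(unsigned char *out, const unsigned char *in, int width, size_t ps)
{
	for (int x = 0; x < width; x++)
		copy_pixel(out + (size_t) x * ps,
			in + (size_t) (width - 1 - x) * ps, ps);
}
```

A horizontal flip of an RGB image would call flip_row once per row with ps = 3. Because ps does not change inside the row loop, the compiler can hoist the switch out of it.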

JakeChampion force-pushed the jake/perf-conversion branch from 66e28bb to 30fed1c on March 25, 2026 12:46

JakeChampion marked this pull request as ready for review March 25, 2026 12:46

lovell reviewed Mar 26, 2026


Comment thread libvips/conversion/composite.cpp

Comment thread libvips/conversion/flip.c Outdated

JakeChampion added 4 commits March 27, 2026 10:37

perf: remove per-pixel zero-fill loop in composite blend

d11c3af

perf: use typed stores for per-pixel copies in rot, flip, zoom, subsa…

e3293dc

JakeChampion force-pushed the jake/perf-conversion branch from 30fed1c to e3293dc on March 27, 2026 10:38

perf: optimize conversion module hot paths #4969
JakeChampion wants to merge 4 commits into libvips:master from JakeChampion:jake/perf-conversion