Conversation

@lczyk (Contributor) commented Jan 15, 2025

I wanted to switch one of my projects to cglm, but couldn't because I needed the Perlin noise implementation. Hence this PR.

So far this is just an implementation of float perlin_vec4(vec4 point), hence the draft status. It is based on glm::perlin. I'm currently planning to add perlin_vec3 and perlin_vec2 as well.

Also, I've come across some missing vec4-ext functions which, for now, I've put in perlin.h, but I'd like to move them to vec4-ext (and check whether any other vec-ext's want them). These are:

void glm_vec4_floor(vec4 x, vec4 dest) // and maybe ceil too
void glm_vec4_mods(vec4 x, float y, vec4 dest) // mod with scalar
void glm_vec4_steps(vec4 edge, float x, vec4 dest) // step with x as scalar
void glm_vec4_sets(vec4 v, float x) // and maybe glm_vec4_set too actually
void glm_vec4_muls(vec4 x, float y, vec4 dest) // mul with scalar
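For concreteness, here is a rough scalar sketch of how the first two could look, following the glm_vec4_* conventions (illustration only, not the final cglm code; names are suffixed _sketch, and mods assumes GLSL-style mod semantics, i.e. x - y * floor(x / y), which is what the noise code relies on):

```c
#include <math.h>
#include <cglm/cglm.h>

/* component-wise floor */
CGLM_INLINE
void
glm_vec4_floor_sketch(vec4 x, vec4 dest) {
  dest[0] = floorf(x[0]);
  dest[1] = floorf(x[1]);
  dest[2] = floorf(x[2]);
  dest[3] = floorf(x[3]);
}

/* component-wise GLSL-style mod with a scalar divisor */
CGLM_INLINE
void
glm_vec4_mods_sketch(vec4 x, float y, vec4 dest) {
  dest[0] = x[0] - y * floorf(x[0] / y);
  dest[1] = x[1] - y * floorf(x[1] / y);
  dest[2] = x[2] - y * floorf(x[2] / y);
  dest[3] = x[3] - y * floorf(x[3] / y);
}
```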

@lczyk (Contributor Author) commented Jan 15, 2025

Have a look at the glm_perlin_test folder in my perlin-wip branch. There is a small testing script for comparing glm::perlin and glm_perlin_vec4. Here is a screenshot:

[Screenshot 2025-01-15: comparison of glm::perlin and glm_perlin_vec4]

The difference is within GLM_FLT_EPSILON (as can also be seen from the tests).
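For context, the check in the tests is essentially of this shape (a hedged sketch, not the actual cglm test harness; the reference values come from glm::perlin on the C++ side, and the real tolerance macro lives in the test suite):

```c
#include <assert.h>
#include <math.h>
#include <cglm/cglm.h>   /* assumes the noise header is pulled in here */

#define PERLIN_TOL 1e-5f /* stand-in for the epsilon used by the tests */

void check_perlin_vec4(vec4 p, float reference) {
  float got = glm_perlin_vec4(p);
  assert(fabsf(got - reference) <= PERLIN_TOL);
}
```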

@lczyk (Contributor Author) commented Jan 17, 2025

Updated comparison, now including glm_perlin_vec3. The delta is smaller due to, I believe, the smaller number of flops.

[Screenshot 2025-01-17: updated comparison including glm_perlin_vec3]

@lczyk (Contributor Author) commented Jan 17, 2025

Updated comparison, now including glm_perlin_vec2:

[Screenshot 2025-01-17: updated comparison including glm_perlin_vec2]

@lczyk (Contributor Author) commented Jan 17, 2025

There is also a bit of a speed difference vs. glm::perlin (glm version 1.0.1):

Timing (in clock ticks)
GLM vec4:  106784
CGLM vec4: 68371 (x1.56 speedup)
GLM vec3:  55447
CGLM vec3: 14427 (x3.84 speedup)
GLM vec2:  22182
CGLM vec2: 8953 (x2.48 speedup)

Timed with a spiritual equivalent of:

#define N 1000000

clock_t start = clock();

for (size_t i = 0; i < N; i++) {
    vec3 p = {(float)i / N, (float)i / N, (float)i / N};
    glm_perlin_vec3(p);
}

clock_t end = clock();

Compiled with:

zig c++ \
        -std=c++11 -O3 \
        -Wall -Wextra -Wpedantic -Wno-null-conversion -Wno-unused-variable -Werror \
        -o glm_perlin_test glm_perlin_test.cpp -lglm -L/opt/homebrew/opt/glm/lib

@lczyk marked this pull request as ready for review on January 17, 2025, 20:36
lczyk added 7 commits January 18, 2025 20:10
_glm_noiseDetail_mod289
_glm_noiseDetail_permute
_glm_noiseDetail_fade_vec4
_glm_noiseDetail_fade_vec3
_glm_noiseDetail_fade_vec2
_glm_noiseDetail_taylorInvSqrt
_glm_noiseDetail_gradNorm_vec4
_glm_noiseDetail_gradNorm_vec3
_glm_noiseDetail_gradNorm_vec2
_glm_noiseDetail_i2gxyzw
_glm_noiseDetail_i2gxyz
_glm_noiseDetail_i2gxy
@lczyk (Contributor Author) commented Jan 18, 2025

Ok, so:

> 1. glm_vec4_scale() can be used instead of glm_vec4_muls()

done. also replaced _glm_vec4_sets with the already existing glm_vec4_fill.

> 2. Some missing useful vec functions can be moved to vec[2|3|4] or -ext.h files

done. moved _floor and _mods. renamed _steps to _stepr (steps thresholds a vector by a scalar edge; stepr thresholds a scalar by a vector of edges; see the sketch below, after these points) and moved both to ext. deprecated step_uni in favour of steps (with all the compatibility macros added) since, from what i can tell, the 's' suffix is much more prevalent than _uni for the scalar version of a function.

> 3. No // comment pls

done

> 4. Two spaces instead of 4

done

> 5. Not sure about _glm_ functions since they all will be visible to user, maybe macro then #undef macro at end of file? if there is no better way to handle them

done. had no better idea than #define and then #undef, so did that (static would confine the functions to a compilation unit, but they would still be visible in header-only mode). see the sketch at the end of this comment.

> 6. If possible, same coding style as other files (or I can do some small edits later)

unsure what bits you mean tbh. i've looked at formatting in #433, and as far as i can tell it matches. please feel free to point out bits and i'm more than happy to change them 👍

> EDIT: In the future additional optimizations can be made if it could be possible

unsure what you mean(?)
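To make the steps / stepr distinction from point 2 concrete, a rough sketch (names suffixed _sketch since this is only an illustration, and the exact argument order of the real functions is an assumption):

```c
#include <cglm/cglm.h>

/* steps: threshold the components of a vector by a single scalar edge */
CGLM_INLINE
void
glm_vec4_steps_sketch(float edge, vec4 x, vec4 dest) {
  dest[0] = x[0] < edge ? 0.0f : 1.0f;
  dest[1] = x[1] < edge ? 0.0f : 1.0f;
  dest[2] = x[2] < edge ? 0.0f : 1.0f;
  dest[3] = x[3] < edge ? 0.0f : 1.0f;
}

/* stepr: threshold a single scalar by a vector of edges */
CGLM_INLINE
void
glm_vec4_stepr_sketch(vec4 edge, float x, vec4 dest) {
  dest[0] = x < edge[0] ? 0.0f : 1.0f;
  dest[1] = x < edge[1] ? 0.0f : 1.0f;
  dest[2] = x < edge[2] ? 0.0f : 1.0f;
  dest[3] = x < edge[3] ? 0.0f : 1.0f;
}
```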

Also, while moving things to ext, i found a bug in the swizzle tests for vec3 and vec4 (they called glm_vec3_swizzle directly instead of the test macro GLM(vec3_swizzle), so they were not testing the export to the static lib). Fixed that.

Also also, found that vec2_step and vec2_swizzle were missing. added both, with tests. although they are not used in noise, this way the api is consistent between vec2 and vec3/vec4.
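And for point 5, the pattern being described is roughly the following (a sketch only; the real macro names and bodies in the noise header differ, this just shows the define-then-undef shape):

```c
#ifndef noise_sketch_h
#define noise_sketch_h

/* internal helper implemented as a macro so it can be removed again below;
   the fade polynomial is the one quoted later in this thread */
#define _glm_noiseDetail_fade_vec4(t, dest) {                            \
  dest[0] = (t[0]*t[0]*t[0]) * (t[0] * (t[0]*6.0f - 15.0f) + 10.0f);     \
  dest[1] = (t[1]*t[1]*t[1]) * (t[1] * (t[1]*6.0f - 15.0f) + 10.0f);     \
  dest[2] = (t[2]*t[2]*t[2]) * (t[2] * (t[2]*6.0f - 15.0f) + 10.0f);     \
  dest[3] = (t[3]*t[3]*t[3]) * (t[3] * (t[3]*6.0f - 15.0f) + 10.0f);     \
}

/* ... public glm_perlin_* functions that use the helper go here ... */

/* undefine at the end of the header so the helper does not leak to users */
#undef _glm_noiseDetail_fade_vec4

#endif /* noise_sketch_h */
```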

@recp (Owner) commented Jan 19, 2025

@MarcinKonowalczyk many thanks! Some tests (vec3) are failing on some platforms. Are they due to floating point errors, or is there something that can be improved?

[Screenshot 2025-01-19: failing vec3 test output]

> EDIT: In the future additional optimizations can be made if it could be possible
>
> unsure what you mean(?)

Even now, there is a lot of room for optimizations:

```c
(t*t*t) * (t * (t * 6 - 15) + 10)
```

can be re-written as:

```c
t * (t*t * (t*6 - 15) + 10)
```

which reduces one vector mul, and similar operations can be optimized even for the scalar version:

```c
  dest[0] = (t[0] * t[0] * t[0]) * (t[0] * (t[0] * 6.0f - 15.0f) + 10.0f); \
  dest[1] = (t[1] * t[1] * t[1]) * (t[1] * (t[1] * 6.0f - 15.0f) + 10.0f); \
  dest[2] = (t[2] * t[2] * t[2]) * (t[2] * (t[2] * 6.0f - 15.0f) + 10.0f); \
  dest[3] = (t[3] * t[3] * t[3]) * (t[3] * (t[3] * 6.0f - 15.0f) + 10.0f); \
```
Using glm_vec4_mul() may give better results since it is optimized with SIMD, though well-known compilers may auto-vectorize this anyway; I'm not sure about that.

In cglm, some operations may be grouped (if possible) so that the whole function can be optimized with SIMD, since vec3/vec4 are used a lot internally.

> _stepr

In _steps the s stands for scalar, but what does the r stand for?

My hope was to keep swizzle as a macro so it could use builtin shuffle / blend / permute ... to keep it lightweight, but 🤷‍♂️ anyway, thanks for the fixes.

> please feel free to point out bits and i'm more than happy to change them 👍

no prob. I can do some small style changes later, e.g. indents, declaring variables at the beginning of functions (or at least of the scope, even though C99+ doesn't require it) where possible... I'd probably prefer glm__ over _glm_ for internal macros, temp definitions, functions ...

After the tests pass, we can merge the PR.

@lczyk (Contributor Author) commented Jan 20, 2025

> Are they due to floating point errors, or is there something that can be improved?

looking into it.

> (t*t*t) * (t * (t * 6 - 15) + 10)
>
> can be re-written as:
>
> t * (t*t * (t*6 - 15) + 10)

i'm not sure it can.. (see this on wolfram alpha: just subtracting one eq from the other). i've also tried it in code and it produces a wrong noise pattern.
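For the record, expanding both sides shows why: (t*t*t) * (t * (t*6 - 15) + 10) = 6t^5 - 15t^4 + 10t^3, which is the Perlin fade polynomial, while t * (t*t * (t*6 - 15) + 10) = 6t^4 - 15t^3 + 10t, so the two are not equivalent.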

> using glm_vec4_mul() may give better results

i've tried

#define _glm_noiseDetail_fade_vec4(t, dest) { \
    glm_vec4_mul(t, t, dest); /* dest = t * t */ \
    glm_vec4_mul(dest, t, dest); /* dest *= t */ \
    vec4 temp; \
    glm_vec4_scale(t, 6.0f, temp); /* temp = t * 6.0f */ \
    glm_vec4_subs(temp, 15.0f, temp); /* temp -= 15.0f */ \
    glm_vec4_mul(t, temp, temp); /* temp *= t */ \
    glm_vec4_adds(temp, 10.0f, temp); /* temp += 10.0f */ \
    glm_vec4_mul(dest, temp, dest); /* dest *= temp */ \
}

but did not see any appreciable difference in speed. Happy to do a pass like this over the helper functions if you'd prefer though.

generally, so far i've not thought that much about optimisations, just about a correct / readable implementation faithful to glm::perlin. the rough benchmarking looked good, so i left it at that. maybe, given that the benchmarks look fine, let's keep this as the initial implementation and do benchmarking / optimisation as a separate piece of work?

> In _steps the s stands for scalar, but what does the r stand for?

'reverse' 😅 not the best name, i admit. i just named it something that made sense in my head and then forgot to go back to it. maybe, instead, steps -> stepsv and stepr -> stepvs? or, indeed, maybe stepr should just be an internal helper of noise as opposed to an ext func? happy with whatever you'd like there 👍

> probably prefer glm__ over _glm_ for internal macros

done


Looking at tests in more detail...

@lczyk (Contributor Author) commented Jan 20, 2025

I've managed to reproduce the CMake/ubuntu-22.04/clang-15 error in Docker. Dockerfile:

FROM ubuntu:22.04
WORKDIR /cglm

# Install dependencies
RUN apt-get update -y
RUN apt-get install -y cmake clang-15 ninja-build

# Copy source code, remove build dir if exists
COPY . .
RUN rm -rf build

RUN cmake \
    -B build \
    -GNinja \
    -DCMAKE_C_COMPILER=clang-15 \
    -DCMAKE_BUILD_TYPE=Release \
    -DCGLM_STATIC=ON \
    -DCGLM_USE_TEST=ON

RUN cmake --build build

CMD ["bash"]

which follows pretty much exactly what the pipeline does.

Then

DOCKER_DEFAULT_PLATFORM=linux/amd64 docker build -t cglm_test . && docker run --rm -it cglm_test /cglm/build/tests

fails and

DOCKER_DEFAULT_PLATFORM=linux/arm64 docker build -t cglm_test . && docker run --rm -it cglm_test /cglm/build/tests

succeeds (the only difference being amd64 vs arm64). This pretty much confirms that it's due to the fact that i'm working on an ARM Mac and generated the test values in test_noise.h with a native build.

I will investigate further to see how big the difference is and adjust the tests accordingly.

@lczyk (Contributor Author) commented Jan 20, 2025

So, this is wild...

[Screenshot 2025-01-20: per-platform comparison of deltas on amd64]

Yeah, there are big differences on amd64, but only sometimes! I think it might have something to do with vec3 having a slightly different implementation in glm::perlin than the vec2 and vec4 versions. Will have a deeper look into it. Might actually be a bug in glm::perlin, but I will try to match ours first, and then potentially submit a patch there.

For reference, this was compiled in the same kind of container as above (i.e. with clang-15).

@recp (Owner) commented Jan 20, 2025

> t * (t*t * (t*6 - 15) + 10)
>
> i'm not sure it can.. (see this on wolfram alpha: just subtracting one eq from the other). i've also tried it in code and it produces a wrong noise pattern.

What was I thinking when simplifying it 🫣 But the point was that some operations can be optimized by simplifying, some with SIMD and some with ILP....

As I mentioned before ("In the future additional optimizations can be made if it could be possible"), it is not expected in this PR :)

Let's skip micro-optimizations for now and merge the PR after the tests pass.

> maybe stepr should just be an internal helper of noise as opposed to an ext func?

makes sense for now 👍

> Will have a deeper look into it. Might actually be a bug in glm::perlin, but I will try to match ours first, and then potentially submit a patch there.

thanks

@lczyk (Contributor Author) commented Jan 22, 2025

@recp done. see the comment in the patch for an explanation, and the image below for the comparison.

i think this is something i will raise with glm, and if they end up changing it there i can do the check-and-match once again. 👍 I've ended up semi-automating the build for amd64 and arm64 with Docker + a Makefile, and then just going through the source of both noise.h and noise.inl (on the glm side), returning early with partial values and seeing where the delta is. See the perlin-wip branch for the code. tldr, this should show you the diff:

git clone https://github.com/MarcinKonowalczyk/cglm
cd cglm
git checkout 14d14be8fac739666bb48c61eda9eff97b8dfd3a  # commit on the perlin-wip branch
cd glm_perlin_test
make
python plot.py --suffix arm # or amd

there is also make test, which runs all the tests in the containers.

writing this here partially to document it for myself when i inevitably need to do this again 😅


[Screenshot 2025-01-22: comparison after the fix]

@lczyk (Contributor Author) commented Jan 22, 2025

so, while i had everything set up, i thought i'd do a comparison of the other intermediate values and found a couple more bugs / inconsistencies. you know how the delta for amd64 was just noise at ~1e-7 but for arm64 it had structure at ~1e-6? well, now they're both just noise (see below; that's on arm64). as part of that i've vectorised some of the intermediate functions, and found a missing SIMD intrinsic in glm_vec4_divs.

btw, tests pass in both the arm64 and amd64 containers 👍


[Screenshot 2025-01-22: deltas now just noise on both architectures]

@recp merged commit e8c791e into recp:master on Jan 22, 2025 (49 of 74 checks passed)
@recp (Owner) commented Jan 22, 2025

@MarcinKonowalczyk the PR is merged, many thanks for your contributions 🚀

@recp (Owner) commented Jan 22, 2025

> you know how the delta for amd64 was just noise at ~1e-7 but for arm64 it had structure at ~1e-6?

Hmm, in the tests I've used GLM_FLT_EPSILON, which is 1e-5 but can be configured thanks to these changes. Also, FP precision may decrease after a lot of FP ops; this is why I mentioned floating point errors before. Because of this, we may need a lower precision for the comparisons in the tests, I guess :/ Fused math may reduce the errors where available; cglm tries to take advantage of fused math where possible. Maybe by using glm_vec4_* internally instead of expecting auto-vectorization, the compiler can use fma to reduce FP errors where possible. A compiler which can't optimize inline functions may generate a lot of MOVs, which is why I think we should manually optimize each function with SIMD where possible in the future...
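(Small illustration of the fused-math point, not cglm code: fmaf computes a*b + c with a single rounding, whereas a separate multiply and add round twice.)

```c
#include <math.h>
#include <stdio.h>

int main(void) {
  float t = 0.73f;
  float fused   = fmaf(t, 6.0f, -15.0f); /* t*6 - 15, rounded once       */
  float unfused = t * 6.0f - 15.0f;      /* rounded after each operation */
  printf("%.9g %.9g\n", fused, unfused);
  return 0;
}
```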

> as part of that i've vectorised some of the intermediate functions

IIRC, there wasn't too much of a diff before, but okay 👍


Anyway, many thanks for your contributions 🚀

@lczyk (Contributor Author) commented Jan 22, 2025

nice 🥳

tomorrow / the day after, i will have another look over the commits and write up some bullet points for easy inclusion in the release notes for the next version (given that i've also fixed a couple of bugs and added a couple of ext functions).

@gottfriedleibniz commented

vdivq_f32 is available only on A64, so compiling for ARMv7 may lead to issues. In this instance it may be easier to use the existing definition of glmm_div.
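For illustration, the shape of the guard being suggested (a sketch only, not the actual glmm_div):

```c
#include <arm_neon.h>

/* vdivq_f32 exists on AArch64 only; plain ARMv7 NEON has no divide
   instruction, so a reciprocal estimate plus Newton-Raphson refinement
   is the usual fallback. */
static inline float32x4_t div_sketch(float32x4_t a, float32x4_t b) {
#if defined(__aarch64__)
  return vdivq_f32(a, b);
#else
  float32x4_t r = vrecpeq_f32(b);        /* initial estimate of 1/b */
  r = vmulq_f32(r, vrecpsq_f32(b, r));   /* refinement step 1       */
  r = vmulq_f32(r, vrecpsq_f32(b, r));   /* refinement step 2       */
  return vmulq_f32(a, r);
#endif
}
```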

@recp (Owner) commented Jan 24, 2025

Hi @gottfriedleibniz,

Many thanks for the catch.

EDIT: 441f265...70a1a94 should fix this.


We must add an ARMv7 build to CI asap to catch these more quickly (maybe -ffast-math too).

@lczyk (Contributor Author) commented Jan 25, 2025

cheers, good catch! having had a closer look at the code, i guess we should transition most of the ext stuff to call through glmm_ where possible, and then handle intrinsic selection in glmm_, right?

> ... (maybe -ffast-math too)

There might be some issues with that around the noise tests, given how small numerical differences can get amplified there (see the /7 vs *(1/7) case), but we could disable fast math for those particular tests. I could not find a convenient pre-defined flag, but something like -ffast-math -DCGLM_FAST_MATH=0 should work. To be clear, i mean this only for the numerically vulnerable tests; i think, as a baseline, the tests should pass under -ffast-math.
