Implement Perlin noise #441
|
Have a look at The difference is within |
|
Also there is a bit of a speed diff wrt Timed with a spiritual equivalent of:
#define N 1000000
clock_t start = clock();
for (size_t i = 0; i < N; i++) {
  vec3 p = {(float)i / N, (float)i / N, (float)i / N};
  glm_perlin_vec3(p);
}
clock_t end = clock();
Compiled with: |
_glm_noiseDetail_mod289 _glm_noiseDetail_permute _glm_noiseDetail_fade_vec4 _glm_noiseDetail_fade_vec3 _glm_noiseDetail_fade_vec2 _glm_noiseDetail_taylorInvSqrt _glm_noiseDetail_gradNorm_vec4 _glm_noiseDetail_gradNorm_vec3 _glm_noiseDetail_gradNorm_vec2 _glm_noiseDetail_i2gxyzw _glm_noiseDetail_i2gxyz _glm_noiseDetail_i2gxy
|
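For reference, the timing loop quoted above can be made self-contained along these lines. This is only a rough sketch: `stand_in_noise` and `time_noise_calls` are hypothetical names, and the stand-in function is not cglm's perlin implementation, so the real call would have to be swapped in to reproduce the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the function under test (e.g. a perlin call).
 * It only exists so that the loop has something to time. */
static float stand_in_noise(const float p[3]) {
    return p[0] * 0.5f + p[1] * 0.25f + p[2] * 0.125f;
}

/* Times n calls and returns the elapsed CPU time in seconds.
 * Results are accumulated into a volatile sink so the optimizer
 * cannot remove the loop. */
static double time_noise_calls(size_t n) {
    assert(n > 0); /* precondition: at least one call to time */
    volatile float sink = 0.0f;
    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        float x = (float)i / (float)n;
        float p[3] = {x, x, x};
        sink += stand_in_noise(p);
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Note that `clock()` measures CPU time rather than wall-clock time, which is usually what is wanted for a single-threaded micro-benchmark like this one.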
Ok, so:
done. also replaced
done. moved,
done
done
done. had no better idea than
unsure what bits you mean tbh. i've looked at formatting in #433, and as far as i can tell it matches. please feel free to point out bits and i'm more than happy to change them 👍
unsure what you mean(?) Also, while moving things to ext, i found a bug in swizzle tests for vec3 and vec4 (they used glm_vec3_swizzle as opposed to test macro GLM(vec3)swizzle) so they were not testing the export to static lib). Fixed that. Also also, found missing vec2_step and vec2_swizzle. added both with tests. although they are not used in noise, i thought that like this the api is consistent between vec2 and vec3/4. |
|
@MarcinKonowalczyk many thanks, some tests (vec3) are failing on some platforms. Are they about floating point errors or something can be improved?
Even now, there is a lot of room for optimizations: `(t*t*t) * (t * (t * 6 - 15) + 10)`
can be re-written as:
```C
t * (t*t * (t*6 - 15) + 10)
```
which reduces 1 vector mul and similar operations can be optimized even for scalar version.
dest[0] = (t[0] * t[0] * t[0]) * (t[0] * (t[0] * 6.0f - 15.0f) + 10.0f); \
dest[1] = (t[1] * t[1] * t[1]) * (t[1] * (t[1] * 6.0f - 15.0f) + 10.0f); \
dest[2] = (t[2] * t[2] * t[2]) * (t[2] * (t[2] * 6.0f - 15.0f) + 10.0f); \
dest[3] = (t[3] * t[3] * t[3]) * (t[3] * (t[3] * 6.0f - 15.0f) + 10.0f); \
using In cglm some operations may be grouped (if possible), then the whole function can be optimized with SIMD where vec3/vec4 are used a lot internally.
_steps My hope was to keep swizzle as macro to use builtin shuffle / blend / permute ... to make them lightweight but 🤷♂️ anyway thanks for fixes.
no prob. I can do some small style changes later e.g. indents, declare variables at the beginning of functions or scope at least ( even c99+ doesn't require it ) where possible... probably prefer after tests are passed we can merge the PR. |
looking into
i'm not sure it can.. (see this on wolfram alpha: just subtracting one eq from the other). i've also tried it in code and it produces the wrong noise pattern.
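To make the mismatch concrete, here is a small scalar sketch of the two expressions quoted in this thread. The function names are hypothetical and this is plain C rather than cglm code. Expanding the first gives 6t^5 - 15t^4 + 10t^3, while the proposed rewrite expands to 6t^4 - 15t^3 + 10t, so the two only agree at isolated points such as t = 0 and t = 1.

```c
#include <assert.h>

/* Quintic fade as quoted in the thread: t^3 * (t * (6t - 15) + 10). */
static float fade_original(float t) {
    return (t * t * t) * (t * (t * 6.0f - 15.0f) + 10.0f);
}

/* The proposed rewrite: t * (t^2 * (6t - 15) + 10).
 * This is a different polynomial, not a cheaper form of the one above. */
static float fade_proposed(float t) {
    return t * (t * t * (t * 6.0f - 15.0f) + 10.0f);
}

/* Small self-check: the two forms agree at the endpoints of [0, 1]
 * but not at the midpoint. */
static void check_fade_forms(void) {
    assert(fade_original(0.0f) == fade_proposed(0.0f));
    assert(fade_original(1.0f) == fade_proposed(1.0f));
    assert(fade_original(0.5f) != fade_proposed(0.5f));
}
```

At t = 0.5 the original form evaluates to 0.5, whereas the rewrite gives 3.5, which is outside the [0, 1] range a fade curve is expected to stay in.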
i've tried
#define _glm_noiseDetail_fade_vec4(t, dest) { \
glm_vec4_mul(t, t, dest); /* dest = t * t */ \
glm_vec4_mul(dest, t, dest); /* dest *= t */ \
vec4 temp; \
glm_vec4_scale(t, 6.0f, temp); /* temp = t * 6.0f */ \
glm_vec4_subs(temp, 15.0f, temp); /* temp -= 15.0f */ \
glm_vec4_mul(t, temp, temp); /* temp *= t */ \
glm_vec4_adds(temp, 10.0f, temp); /* temp += 10.0f */ \
glm_vec4_mul(dest, temp, dest); /* dest *= temp */ \
}
but did not see any appreciable difference in speed. Happy to do a pass like this over the helper functions if you'd prefer though.
generally, so far i've not thought that much about optimisations, just about correct / readable implementation faithful to glm::perlin. the rough benchmarking looked good so i left it at that. i think maybe, given the benchmarks work, let's leave it like that as the initial implementation and do benchmarking / optimisation in a separate piece of work?
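For anyone who wants to try the step-by-step version without pulling in cglm, the same sequence of operations can be sketched on plain `float[4]` arrays. The helpers `v4_mul`, `v4_scale` and `v4_adds` are hypothetical stand-ins that are assumed to behave element-wise, in the same way as the calls in the macro above. They are not the library's functions.

```c
#include <assert.h>

/* Hypothetical element-wise helpers on plain arrays (not cglm's API). */
static void v4_mul(const float a[4], const float b[4], float dest[4]) {
    for (int i = 0; i < 4; i++) dest[i] = a[i] * b[i];
}
static void v4_scale(const float a[4], float s, float dest[4]) {
    for (int i = 0; i < 4; i++) dest[i] = a[i] * s;
}
static void v4_adds(const float a[4], float s, float dest[4]) {
    for (int i = 0; i < 4; i++) dest[i] = a[i] + s;
}

/* Component-wise fade, following the same order of steps as the macro.
 * dest must not alias t, because t is still read after dest is written. */
static void fade_v4(const float t[4], float dest[4]) {
    float temp[4];
    assert(t != dest);
    v4_mul(t, t, dest);          /* dest = t^2 */
    v4_mul(dest, t, dest);       /* dest = t^3 */
    v4_scale(t, 6.0f, temp);     /* temp = 6t */
    v4_adds(temp, -15.0f, temp); /* temp = 6t - 15 */
    v4_mul(t, temp, temp);       /* temp = t * (6t - 15) */
    v4_adds(temp, 10.0f, temp);  /* temp = t * (6t - 15) + 10 */
    v4_mul(dest, temp, dest);    /* dest = t^3 * (t * (6t - 15) + 10) */
}
```

The aliasing restriction is the main thing to watch with a macro of this shape: passing the same vector as both input and output would overwrite `t` before the later steps read it.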
'reverse' 😅 not the best name, i admit. i just named it something which made sense in my head and then forgot to go back to it. maybe, instead,
done Looking at tests in more detail... |
|
I've managed to reproduce the
FROM ubuntu:22.04
WORKDIR /cglm
# Install dependencies
RUN apt-get update -y
RUN apt-get install -y cmake clang-15 ninja-build
# Copy source code, remove build dir if exists
COPY . .
RUN rm -rf build
RUN cmake \
-B build \
-GNinja \
-DCMAKE_C_COMPILER=clang-15 \
-DCMAKE_BUILD_TYPE=Release \
-DCGLM_STATIC=ON \
-DCGLM_USE_TEST=ON
RUN cmake --build build
CMD ["bash"]
which follows pretty much exactly what the pipeline does. Then
DOCKER_DEFAULT_PLATFORM=linux/amd64 docker build -t cglm_test . && docker run --rm -it cglm_test /cglm/build/tests
fails and
DOCKER_DEFAULT_PLATFORM=linux/arm64 docker build -t cglm_test . && docker run --rm -it cglm_test /cglm/build/tests
succeeds (the diff is
I will investigate further to see how big the difference is and adjust the tests accordingly. |
|
So, this is wild... Yeah, there are big differences on amd64, but only sometimes! I think it might be something to do with vec3 having a slightly different implementation in glm::perlin than the vec2 and vec4 versions. Will have a deeper look into it. Might actually be a bug in For reference, compiled in the same kind of container as above (aka with clang-15). |
What was I thinking when simplifying it 🫣 But the point was that some operations can be optimized by simplifying, some with SIMD and some with ILP... As I mentioned before, additional optimizations can be made in the future where possible; they are not expected in this PR :) Let's skip micro-optimizations for now and merge the PR after the tests pass.
makes sense for now 👍
thanks |
|
@recp done. see comment in the patch for explanation, and image below for the comparison. i think this is something i will raise with glm, and if they end up changing it there i can do the check-and-match once again. 👍 I've ended up semi-automating the build for amd64 and arm64 with docker + makefile and then just going through the source of both there is also writing this here partially to document it for myself when i inevitably need to do this again 😅 |
|
so, while i had everything set up, i thought i'd do a compare for other intermediate values and found a couple more bugs / inconsistencies. you know how the delta for amd was just noise ~1e-7 but for arm it had structure at ~1e-6 ? well, now they're both just noise (see below, that's on arm). as part of that i've vectorised some of the intermediate functions, and found a missing simd intrinsic from btw, tests passing in both arm and amd container 👍 |
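Given that the remaining deltas described above are noise at roughly the 1e-7 level, one way to keep such tests stable across platforms is to compare against reference values with an absolute tolerance instead of exact equality. A minimal sketch follows. `nearly_equal` is a hypothetical helper rather than the macro cglm's test suite actually uses, and the right epsilon is an assumption that would need to be chosen from the observed deltas.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when |a - b| <= eps.
 * Returns false if either value is NaN, since every comparison
 * involving NaN is false. */
static bool nearly_equal(float a, float b, float eps) {
    assert(eps >= 0.0f); /* precondition: tolerance must be non-negative */
    float diff = a - b;
    if (diff < 0.0f) diff = -diff;
    return diff <= eps;
}
```

With an epsilon of 1e-6, for example, a delta around 1e-7 passes while a delta around 1e-5 is still reported as a mismatch.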
|
@MarcinKonowalczyk the PR is merged, many thanks for your contributions 🚀 |
Hmm, in tests I've used
IIRC, there were not too much diff before but okay 👍 Anyway, many thanks for your contributions 🚀 |
|
nice 🥳 tomorrow/the-day-after, i will have another look over the commits and write up some bullet points for easy inclusion in release notes for the next v (given that i've also fixed a couple of bugs and added a couple of ext's). |
|
|
|
Many thanks for the catch. EDIT: 441f265...70a1a94 should fix this We must add |
|
cheers, good catch! having had a closer look at the code, i guess we should transition most of the ext stuff to call through
There might be some issues for that around the noise tests, given how small numerical differences can get amplified there (see the |
I wanted to switch one of my projects to cglm, but couldn't because I needed the Perlin noise implementation. Hence this PR.
So far this is just an implementation of float perlin_vec4(vec4 point), hence the draft version. It is based on the glm::perlin. I'm currently planning to add perlin_vec3 and perlin_vec2.
Also, I've come across some missing vec4-ext functions which, for now, i've put in perlin.h, but I'd like to move them to vec4-ext (and check whether any other vec-ext's want them). These are: