
ENH: Convert fp32 sin/cos from C universal intrinsics to C++ using Highway #25781


Merged
14 commits merged into numpy:main on Jun 26, 2024

Conversation

@r-devulap (Member)

This patch is to experiment with Highway and see how we can leverage its intrinsics using static dispatch. I would think these are the minimum requirements:

  • Passes CI on all the relevant platforms: AVX512_SKX, [AVX2, FMA3], VSX4, VSX3, VSX2, NEON_VFPV4, VXE2, VX. All tests pass on my local AVX512 and AVX2 machines.
  • No performance regressions.

On x86, both AVX-512 and AVX2 show performance regressions; I have yet to figure out where they are coming from.

AVX-512 benchmarks

These are about 1.5x slower even when built with -march=skylake-avx512. If we use just -mavx512f -mavx512bw, etc., then it's about 4x slower.

| Change   | Before [5867ee6b] <main>   | After [d5596c17] <sincos-hwy>   |   Ratio | Benchmark (Parameter)                                                   |
|----------|----------------------------|---------------------------------|---------|-------------------------------------------------------------------------|
| +        | 7.48±0μs                   | 11.5±0.01μs                     |    1.53 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 1, 1, 'f')        |
| +        | 7.70±0.06μs                | 11.5±0.02μs                     |    1.49 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 1, 1, 'f')        |
| +        | 11.8±0.1μs                 | 16.3±0.04μs                     |    1.39 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 1, 2, 'f')        |
| +        | 11.9±0.1μs                 | 16.3±0.09μs                     |    1.38 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 1, 2, 'f')        |
| +        | 23.2±0.06μs                | 28.0±0.01μs                     |    1.21 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 4, 2, 'f')        |
| +        | 23.3±0.2μs                 | 28.1±0.01μs                     |    1.21 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 4, 2, 'f')        |
| +        | 20.4±0.02μs                | 24.4±0.09μs                     |    1.2  | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 4, 1, 'f')        |
| +        | 20.3±0.01μs                | 24.5±0.02μs                     |    1.2  | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 4, 1, 'f')        |
| -        | 66.1±0.02μs                | 62.2±0.07μs                     |    0.94 | bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'cos'>, 4, 2, 'f') |
| -        | 67.2±0.01μs                | 62.2±0.03μs                     |    0.93 | bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'sin'>, 4, 2, 'f') |
| -        | 64.6±0.03μs                | 59.1±0.01μs                     |    0.91 | bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'cos'>, 4, 1, 'f') |
| -        | 65.2±0.02μs                | 59.1±0.04μs                     |    0.91 | bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'sin'>, 4, 1, 'f') |
| -        | 60.6±0.04μs                | 54.4±0.02μs                     |    0.9  | bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'cos'>, 1, 2, 'f') |
| -        | 59.0±0.05μs                | 51.6±0.04μs                     |    0.88 | bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'cos'>, 1, 1, 'f') |
| -        | 61.5±0.03μs                | 54.1±0.06μs                     |    0.88 | bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'sin'>, 1, 2, 'f') |
| -        | 59.9±0.01μs                | 51.8±0.05μs                     |    0.86 | bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'sin'>, 1, 1, 'f') |

AVX2 benchmarks

These are about 1.34x slower when built using -march=skylake. If we use -mavx2 or even -march=haswell, then these seem to be 4x slower.

| Change   | Before [5867ee6b] <main>   | After [d5596c17] <sincos-hwy>   |   Ratio | Benchmark (Parameter)                                            |
|----------|----------------------------|---------------------------------|---------|------------------------------------------------------------------|
| +        | 12.7±0.1μs                 | 17.1±0.1μs                      |    1.34 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 1, 1, 'f') |
| +        | 13.1±0.01μs                | 17.2±0.2μs                      |    1.31 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 1, 1, 'f') |
| +        | 40.5±0.02μs                | 49.7±0.2μs                      |    1.23 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 4, 2, 'f') |
| +        | 36.7±0.2μs                 | 45.1±0.2μs                      |    1.23 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 4, 1, 'f') |
| +        | 37.2±0.04μs                | 45.2±0.4μs                      |    1.22 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 4, 1, 'f') |
| +        | 40.3±0.01μs                | 49.3±0.2μs                      |    1.22 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 4, 2, 'f') |
| +        | 18.5±0.1μs                 | 22.2±0.2μs                      |    1.2  | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 1, 2, 'f') |
| +        | 18.5±0.02μs                | 21.8±0.2μs                      |    1.18 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 1, 2, 'f') |

@Mousius (Member) commented Feb 7, 2024

cc @jan-wassenberg

}
opmask_t nnan_mask = hn::Not(hn::IsNaN(x_in));
// Eliminate NaN to avoid FP invalid exception
x_in = hn::IfThenElse(nnan_mask, x_in, zerosf);
Member

This used to be wrapped in #if NPY_SIMD_CMPSIGNAL, which is 0 on AVX512 and AVX2.

Member Author

Yeah, I had test failures that were resolved when I got rid of it. I need to figure that out.

}
}
if (simd_maski != (npy_uint64)((1 << lanes) - 1)) {
float ip_fback[hn::Lanes(f32)];
@Mousius (Member), Feb 7, 2024
Unsure if the benchmarks hit this case often, but it'd be worth checking that this compiles to a vector without the alignment attributes (I think the Highway equivalent here would be HWY_ALIGN; see the sketch below).
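
A minimal sketch of the suggestion, assuming the PR's `f32` tag and the non-scalable targets NumPy dispatches to (where `hn::MaxLanes` is a compile-time constant):

```cpp
// HWY_ALIGN aligns the fallback buffer to the vector width, so the
// compiler can use aligned vector loads/stores instead of per-element ones.
HWY_ALIGN float ip_fback[hn::MaxLanes(f32)];
hn::Store(x_in, f32, ip_fback);  // spill lanes to the aligned buffer
```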

}

NPY_FINLINE vec_f32
GatherIndexN(const float* src, npy_intp ssrc, npy_intp len)
Member

I'm not massively familiar with x86, but it looks like npyv_loadn_tillz_s32 uses a gather instruction here rather than a loop; it might be worth trying that to see if it provides similar performance (a sketch follows).
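
For reference, a hedged sketch of what a Highway hardware gather could look like here; `src`, `ssrc`, `vec_f32`, and the `f32` tag are taken from the surrounding diff, and this is not necessarily the code the PR should use:

```cpp
// Build per-lane indices 0, ssrc, 2*ssrc, ... and gather in one op;
// on AVX2/AVX-512 this can compile to vgatherdps.
const hn::RebindToSigned<decltype(f32)> s32;
auto idx = hn::Mul(hn::Iota(s32, 0), hn::Set(s32, static_cast<int32_t>(ssrc)));
vec_f32 v = hn::GatherIndex(f32, src, idx);
```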

Member Author

The 1.5x slowdown actually happens in the non-strided case, which doesn't use Gather or Scatter. But even for the strided case, using gather is slower than the scalar method (ever since the Downfall CVE mitigations).

Member

Ahh, apologies, I wasn't sure whether the code generated for Gather was slow on x86 as well; it doesn't look as good as the ASIMD npyv_ function 😅 Too many architectures 😸

@Mousius (Member) commented Feb 7, 2024

Ok, so I played a bit of spot-the-difference and left some comments, and I quickly ran some benchmarks - it looks like with ASIMD there are regressions in these:

bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 4, 2, 'f')
bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 4, 2, 'f')
bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'sin'>, 4, 2, 'f')
bench_ufunc_strides.UnaryFPSpecial.time_unary(<ufunc 'cos'>, 4, 2, 'f')
bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 1, 2, 'f')

This indicates the gather/scatter aren't as optimal as the NumPy ones; I wonder if we can blend the NumPy loads and stores with the Highway code here 🤔

@@ -60,7 +60,7 @@ FMA3 = mod_features.new(
   test_code: files(source_root + '/numpy/distutils/checks/cpu_fma3.c')[0]
 )
 AVX2 = mod_features.new(
-  'AVX2', 25, implies: F16C, args: '-mavx2',
+  'AVX2', 25, implies: F16C, args: '-march=skylake',
Member

I wonder if this is more related to implying -mtune=skylake? Highway relies on the compiler to do some optimisations, and I do not know what -mtune=generic does with just -mavx2 🤔

Member

Just looked at this a bit more deeply, and it looks like with just -mavx2 you don't get HWY_AVX2; these flags worked to get HWY_AVX2:

-mpclmul -maes -mavx -mavx2 -mbmi -mbmi2 -mfma -mf16c

I assume they're all implied by -march=haswell, but -mavx2 alone is much more limiting. Is there a processor that supports AVX2 without all these things? 🤔
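
As a quick way to check which target a given flag set enables, a small probe like this can help (a sketch; `hwy::TargetName` and `HWY_STATIC_TARGET` come from Highway's public headers):

```cpp
#include <cstdio>
#include "hwy/highway.h"
#include "hwy/targets.h"  // hwy::TargetName

int main() {
  // HWY_STATIC_TARGET is the best target enabled by the compile flags,
  // e.g. HWY_AVX2 only if all the required feature flags are present.
  std::printf("static target: %s\n", hwy::TargetName(HWY_STATIC_TARGET));
  return 0;
}
```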

@r-devulap (Member, Author), Feb 7, 2024

Yet the surprising thing was that using -march=haswell makes the performance a lot worse (nearly 4x slower). I need some input from @jan-wassenberg to see what is happening here.

@jan-wassenberg (Contributor), Feb 7, 2024

You are exactly right: we require more flags than just -march=haswell, also -maes; then it is sufficient for HWY_AVX2.
Unfortunately there are a very few Haswell parts without AES, so we do check for that.

Member Author

Out of curiosity: why do you need -maes for HWY_AVX2 if you aren't using any AES related instructions?

Contributor

AESRound is a supported Highway op; we do not know whether users will use it. I suppose the options are to detect at runtime (but that would add considerable overhead), or add another target for AVX2 \ AES, but this is very rare, so not worthwhile, right?

@r-devulap (Member, Author)

This indicates the gather/scatter aren't as optimal as the NumPy ones; I wonder if we can blend the NumPy loads and stores with the Highway code here 🤔

That is extremely likely. My Gather/Scatter were just a quick and dirty way to make this work. I will eventually move to using the Highway implementation.

@seiko2plus (Member) commented Feb 7, 2024

Please explain to me: where did this conclusion come from? The current speed-up is related to special cases (the libm fallback has been improved somehow), which affects both contiguous and non-contiguous. So I think maybe the regression is related to that.

@r-devulap (Member, Author)

@seiko2plus I am seeing a slowdown for strided cases as well. I just meant this could be a result of my GatherIndexN and ScatterIndexN helpers, which just perform a simple scalar loop to load and store (a sketch of such a helper follows the table below). It's only a guess; I will need to take a deeper look into it.

| +        | 11.8±0.1μs                 | 16.3±0.04μs                     |    1.39 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 1, 2, 'f')        |
| +        | 11.9±0.1μs                 | 16.3±0.09μs                     |    1.38 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 1, 2, 'f')        |
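
For context, a hedged sketch of what such a scalar-loop gather helper might look like (the PR's actual helper may differ; `f32`, `vec_f32`, `NPY_FINLINE`, and `HWY_ALIGN` as in the surrounding diff):

```cpp
NPY_FINLINE vec_f32
GatherIndexN(const float* src, npy_intp ssrc, npy_intp len)
{
    // One scalar load per lane into an aligned staging buffer, then a
    // single vector load -- simple, but it serializes the memory accesses.
    HWY_ALIGN float buf[hn::MaxLanes(f32)] = {0};
    const npy_intp n = HWY_MIN(len, (npy_intp)hn::Lanes(f32));
    for (npy_intp i = 0; i < n; ++i) {
        buf[i] = src[i * ssrc];
    }
    return hn::Load(f32, buf);
}
```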

@jan-wassenberg (Contributor)

-march=skylake-avx512. If we use just -mavx512f -mavx512bw, etc., then it's about 4x slower.

Right, Highway checks for multiple CPU flags before using AVX3. -march=skx is sufficient here, but for the individual -mavx512, that would require a long list.

* these numbers
*/
if (!hn::AllFalse(f32, simd_mask)) {
vec_f32 x = hn::IfThenElse(hn::And(nnan_mask, simd_mask), x_in, zerosf);
Contributor
Could also use IfThenElseZero here and above if you like.
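
The suggested form, as a sketch against the quoted line:

```cpp
// IfThenElseZero(m, v) yields v where m is true and zero elsewhere,
// so the explicit zerosf operand is no longer needed.
vec_f32 x = hn::IfThenElseZero(hn::And(nnan_mask, simd_mask), x_in);
```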

cos = hn::IfThenElse(sine_mask, sin, cos);

// multiply by -1 for appropriate elements
opmask_t negate_mask = hn::RebindMask(f32, hn::Eq(hn::And(iquadrant, twos), twos));
Contributor

Is IfNegativeThenNegOrUndefIfZero faster than these two lines?

ScatterIndexN(cos, dst, sdst, len);
}
}
if (simd_maski != (npy_uint64)((1 << lanes) - 1)) {
Contributor
StoreMaskBits is not necessarily cheap. Recommend using CountTrue to get the count instead.
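
A sketch of the suggested check, assuming `simd_mask` and the `f32` tag from the surrounding code:

```cpp
// CountTrue returns the number of set lanes; comparing it against the
// lane count detects "all lanes valid" without serializing mask bits.
if (hn::CountTrue(f32, simd_mask) != hn::Lanes(f32)) {
    // scalar fallback for the lanes that need it
}
```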

Member Author

I am now using if (!hn::AllTrue(f32, simd_mask)) instead and moved the StoreMaskBits inside the if condition (computed only when required). I am afraid CountTrue won't be sufficient here: we also need to know which specific lanes have their bit set.
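
Roughly this shape (a sketch of the described restructuring, not the PR's exact code):

```cpp
if (!hn::AllTrue(f32, simd_mask)) {
    // Only materialize the per-lane bits on the rare fallback path.
    uint8_t bits[8];  // room for up to 64 lanes
    hn::StoreMaskBits(f32, simd_mask, bits);
    // ... inspect individual lanes via bits to run the scalar fallback ...
}
```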

@r-devulap (Member, Author) commented Feb 28, 2024

Moving the hn::StoreMaskBits to inside the if condition helped perf by a little bit; now we are just about 1.2x slower.

| +        | 7.47±0.01μs                | 9.11±0.05μs                     |    1.22 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 1, 1, 'f')        |
| +        | 7.68±0.03μs                | 9.32±0.04μs                     |    1.21 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 1, 1, 'f')        |


for (; len > 0; len -= lanes, src += ssrc*lanes, dst += sdst*lanes) {
vec_f32 x_in;
if (ssrc == 1) {
Contributor
LoadN should only be used in the tail of a loop. It is probably worthwhile replicating the body of the loop (e.g. by moving it to a helper function), with most iterations using a normal load, and only the last iteration using LoadN.
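
A sketch of the suggested structure for the contiguous (ssrc == 1) path; `ProcessAndStore` is a hypothetical helper standing in for the loop body:

```cpp
const size_t lanes = hn::Lanes(f32);
size_t i = 0;
// Main loop: full vectors, plain unaligned loads.
for (; i + lanes <= (size_t)len; i += lanes) {
    ProcessAndStore(hn::LoadU(f32, src + i), dst + i);
}
// Tail: at most one masked LoadN for the remaining < lanes elements.
if (i < (size_t)len) {
    ProcessAndStore(hn::LoadN(f32, src + i, (size_t)len - i), dst + i);
}
```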

} else {
x_in = GatherIndexN(src, ssrc, len);
}
opmask_t nnan_mask = hn::Not(hn::IsNaN(x_in));
Contributor

We can avoid the Not here by swapping the order of args to subsequent IfThenElse that use this, and also replacing And(nnan_mask, ..) with AndNot(nan_mask, ..).
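
Applied to the quoted lines, the transformation would look roughly like this (a sketch using the diff's `x_in`, `zerosf`, and `simd_mask`):

```cpp
// Before: nnan_mask = Not(IsNaN(x_in)); IfThenElse(nnan_mask, x_in, zerosf)
// After: keep the un-negated mask and swap the select arms.
opmask_t nan_mask = hn::IsNaN(x_in);
x_in = hn::IfThenElse(nan_mask, zerosf, x_in);
// ... and later, And(nnan_mask, m) becomes AndNot(nan_mask, m):
vec_f32 x = hn::IfThenElseZero(hn::AndNot(nan_mask, simd_mask), x_in);
```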

cos = hn::IfThenElse(nnan_mask, cos, hn::Set(f32, NPY_NANF));

if (sdst == 1) {
hn::StoreN(cos, f32, dst, len);
Contributor

As with LoadN, StoreN should only be called in the last iteration.

ScatterIndexN(cos, dst, sdst, len);
}
}
if (!hn::AllTrue(f32, simd_mask)) {
Contributor

Consider HWY_UNLIKELY annotation?
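
I.e., something like the following; HWY_UNLIKELY is Highway's branch-prediction hint macro:

```cpp
if (HWY_UNLIKELY(!hn::AllTrue(f32, simd_mask))) {
    // rarely-taken scalar fallback
}
```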

@jan-wassenberg (Contributor)

Moving the hn::StoreMaskBits to inside the if condition helped perf by a little bit; now we are just about 1.2x slower.

| +        | 7.47±0.01μs                | 9.11±0.05μs                     |    1.22 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'sin'>, 1, 1, 'f')        |
| +        | 7.68±0.03μs                | 9.32±0.04μs                     |    1.21 | bench_ufunc_strides.UnaryFP.time_unary(<ufunc 'cos'>, 1, 1, 'f')        |

Cool, I have added some additional suggestions :)

@Mousius (Member) commented Mar 1, 2024

@jan-wassenberg just looking at the CI failures, it seems there's some compiler incompatibility on PPC (https://github.com/numpy/numpy/actions/runs/8087779139/job/22100478961?pr=25781#step:8:504) and an abort on Z13 (https://github.com/numpy/numpy/actions/runs/8087779139/job/22100479504?pr=25781). Any ideas?

There's also a failure on armhf but I can take a look at that when I get a minute (https://github.com/numpy/numpy/actions/runs/8087779139/job/22100478151?pr=25781)

@jan-wassenberg (Contributor)

hm, the Z13 error is "realloc(): invalid next size" followed by "Fatal Python error: Aborted".
Seems unrelated to SIMD; are we possibly corrupting the heap?

The latter at least I can help with. We are missing HWY_ATTR:

Additionally, each function that calls Highway ops (such as Load) must either be prefixed with HWY_ATTR, OR reside between HWY_BEFORE_NAMESPACE() and HWY_AFTER_NAMESPACE(). Lambda functions currently require HWY_ATTR before their opening brace.
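
Concretely, either of these patterns satisfies that requirement (function and namespace names here are illustrative):

```cpp
// Option 1: per-function target attribute.
HWY_ATTR void SinCosContig(const float* src, float* dst, size_t n) {
    // ... Highway ops ...
}

// Option 2: bracket a whole region so every function inside it is
// compiled with the target's attributes.
HWY_BEFORE_NAMESPACE();
namespace np_hwy {
void SinCosStrided(const float* src, float* dst, size_t n) {
    // ... Highway ops ...
}
}  // namespace np_hwy
HWY_AFTER_NAMESPACE();
```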

@r-devulap (Member, Author)

The latter at least I can help with. We are missing HWY_ATTR:

Adding HWY_ATTR fixed the build errors on ppc64le. Why did it fail only for this platform though?

Need help with debugging 3 more failures:

  1. The crash on s390x platforms. Ping @seiko2plus
  2. armhf failure: looks like accuracy problems. Ping @Mousius
  3. @jan-wassenberg any idea why the cygwin CI fails with this error?
../numpy/_core/src/highway/hwy/ops/generic_ops-inl.h: In function 'void hwy::N_AVX2::StoreInterleaved3(hwy::N_AVX2::VFromD<D>, hwy::N_AVX2::VFromD<D>, hwy::N_AVX2::VFromD<D>, D, hwy::N_AVX2::TFromD<D>*)':
../numpy/_core/src/highway/hwy/ops/generic_ops-inl.h:1320:14: error: expected unqualified-id before numeric constant
 1320 |   const auto B0 = TableLookupBytesOr0(v0, shuf_B0);

@jan-wassenberg (Contributor)

hm, strange. Neither the x86 implementation of TableLookupBytesOr0, nor the quoted line and the one before it, have a numeric constant. Which compiler is cygwin using?

@r-devulap (Member, Author)

hm, strange. Neither the x86 implementation of TableLookupBytesOr0, nor the quoted line and the one before it, have a numeric constant. Which compiler is cygwin using?

From the logs:

C++ compiler for the host machine: c++ (gcc 11.4.0 "c++ (GCC) 11.4.0")
C++ linker for the host machine: c++ ld.bfd 2.42

@jan-wassenberg (Contributor)

hm, here we are able to compile StoreInterleaved3 using GCC 11.4. Are you able to repro the issue in godbolt?

@Mousius added the "component: SIMD" label on Mar 28, 2024
@Mousius (Member) left a review:

LGTM! 🥳

@Mousius merged commit c46a513 into numpy:main on Jun 26, 2024; 68 checks passed
@Mousius (Member) commented Jun 26, 2024

Huge milestone, thanks @r-devulap!

@rgommers (Member)

This was a big lift, thank you all and congrats on getting it over the finish line!

@seiko2plus (Member) commented Oct 22, 2024

@r-devulap, it's great to see progress on moving to Highway! I noticed that you've disabled dispatching for the CPU features of ppc64 and zSystem, which is not going to prevent the baseline capabilities. Have you opened any issue on Highway or NumPy to report this?

@r-devulap (Member, Author)

@seiko2plus good to hear from you again!

Have you opened any issue on Highway or NumPy to report this?

Good point, we haven't. Perhaps Highway is the right place? We aren't sure where the bug is coming from, though.

@jan-wassenberg (Contributor)

hm, what's the issue we were addressing by removing the VSX etc?

@r-devulap (Member, Author)

@seiko2plus #27627

@jan-wassenberg IIRC, it was a seg fault. We should see the failure in #27627

seberg pushed a commit that referenced this pull request Nov 22, 2024
For clarification: SIMD optimizations for the sine and cosine functions on both ppc64 and z/Architecture (IBM Z) were disabled by gh-25781 to get its CI tests passing. This PR aims to re-enable the optimizations for z/Architecture after addressing the following runtime errors, while gh-27627 re-enabled the ppc64 optimizations.

* Re-enable VXE for sin/cos HWY implementation

---------

Co-authored-by: Sayed Adel <[email protected]>
Labels: 01 - Enhancement, component: SIMD
7 participants