
ENH: use AVX for float32 and float64 implementation of sqrt, square, absolute, reciprocal, rint, floor, ceil and trunc #13885


Merged
merged 8 commits into from
Oct 15, 2019

Conversation

r-devulap
Member

By leveraging AVX gather instructions, this patch enables SIMD processing of strided arrays, which is currently handled in a scalar fashion. Performance of functions like sqrt improves by 9x. Detailed performance numbers are presented below (array size = 10000 for every benchmark):

       before           after         ratio
     [a14a8cef]       [de47213c]
     <master>         <sqrt-sq-rcp-abs-avx>
-     2.27±0.02ns      2.05±0.04ns     0.90  bench_avx.Custom.time_square_stride1_float32
-     3.49±0.04ns      3.11±0.06ns     0.89  bench_avx.Custom.time_square_stride1_float64
-     3.38±0.03ns      2.93±0.03ns     0.87  bench_avx.Custom.time_reciprocal_stride1_float32
-     2.04±0.02ns      1.75±0.02ns     0.86  bench_avx.Custom.time_absolute_stride1_float32
-     3.37±0.02ns      2.78±0.05ns     0.83  bench_avx.Custom.time_absolute_stride1_float64
-     7.15±0.03ns      5.52±0.02ns     0.77  bench_avx.Custom.time_square_stride4_float64
-     7.15±0.03ns      5.04±0.03ns     0.71  bench_avx.Custom.time_square_stride2_float64
-     7.56±0.04ns      4.46±0.03ns     0.59  bench_avx.Custom.time_square_stride4_float32
-     8.94±0.03ns      5.15±0.02ns     0.58  bench_avx.Custom.time_absolute_stride4_float64
-     13.6±0.03ns      7.49±0.03ns     0.55  bench_avx.Custom.time_reciprocal_stride4_float64
-     8.52±0.01ns      4.67±0.02ns     0.55  bench_avx.Custom.time_absolute_stride2_float64
-     13.7±0.05ns      7.48±0.02ns     0.55  bench_avx.Custom.time_reciprocal_stride2_float64
-     7.55±0.05ns      4.13±0.02ns     0.55  bench_avx.Custom.time_square_stride2_float32
-     8.41±0.03ns      4.13±0.01ns     0.49  bench_avx.Custom.time_absolute_stride4_float32
-     7.97±0.03ns      3.80±0.01ns     0.48  bench_avx.Custom.time_absolute_stride2_float32
-     10.6±0.04ns      4.46±0.03ns     0.42  bench_avx.Custom.time_reciprocal_stride4_float32
-     10.5±0.03ns      4.26±0.03ns     0.40  bench_avx.Custom.time_reciprocal_stride2_float32
-     40.6±0.02ns      7.97±0.03ns     0.20  bench_avx.Custom.time_sqrt_stride2_float64
-     40.6±0.05ns      7.97±0.05ns     0.20  bench_avx.Custom.time_sqrt_stride4_float64
-     37.5±0.03ns      4.24±0.02ns     0.11  bench_avx.Custom.time_sqrt_stride4_float32
-     37.5±0.02ns      4.06±0.02ns     0.11  bench_avx.Custom.time_sqrt_stride2_float32
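The gather-based strided loop can be modeled in pure Python (a simplified sketch, not the actual C implementation; a width of 8 mimics an AVX512 float64 or AVX float32 vector, and all names here are illustrative):

```python
import math

def strided_sqrt(ip, stride, n, width=8):
    """Model of a gather-based SIMD loop: process `width` strided
    elements per iteration instead of one element at a time."""
    op = [0.0] * n
    i = 0
    while i + width <= n:
        # AVX gather: fetch `width` elements that sit `stride` apart
        vec = [ip[(i + k) * stride] for k in range(width)]
        res = [math.sqrt(v) for v in vec]  # one vsqrtps on real hardware
        for k in range(width):
            op[i + k] = res[k]
        i += width
    while i < n:  # scalar tail for the remaining elements
        op[i] = math.sqrt(ip[i * stride])
        i += 1
    return op

data = [float(x) for x in range(40)]
assert strided_sqrt(data, 2, 20)[4] == math.sqrt(8.0)
```

The scalar fallback previously ran the tail loop over the whole array for any stride > 1, which is where the 9x gap comes from.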

@r-devulap
Member Author

The first 10 commits overlap with PR #13368, I will rebase if and when PR #13368 gets merged.

@r-devulap
Member Author

Wonder why I assumed npy_abs was a legit function :/ Will fix that error ...

@r-devulap
Member Author

r-devulap commented Jul 3, 2019

The CI failure comes from a test newly added by this patch, which fails on Windows Python 3.6 32-bit:

with np.errstate(invalid='raise'):
    assert_raises(FloatingPointError, np.sqrt, np.array(-100., dtype=dt))

Ideally, an FP exception should be raised when a negative input is passed to sqrt (as it is on other platforms). Does anyone know why this fails on Windows py3.6 32-bit?

@r-devulap r-devulap force-pushed the sqrt-sq-rcp-abs-avx branch 2 times, most recently from 76a620b to 2bfb768 on July 10, 2019 03:22
@r-devulap r-devulap changed the title ENH: use AVX for sqrt, square, absolute and reciprocal ENH: use AVX for float32 and float64 implementation of sqrt, square, absolute, reciprocal, rint, floor, ceil and trunc Jul 10, 2019
@r-devulap
Member Author

Added AVX implementations of floor, ceil, trunc and rint. As with the other AVX implementations, these handle strided arrays as well. These functions see up to a 14x speed up. Detailed numbers presented below:

before           after         ratio
[af5a1084]       [2bfb7685]
<master>         <sqrt-sq-rcp-abs-avx>
19.5±0.01ns      5.17±0.02ns     0.27  bench_avx.Custom.time_rint_stride4_f64
20.6±0.02ns      5.19±0.01ns     0.25  bench_avx.Custom.time_trunc_stride4_f64
19.5±0.02ns      4.71±0.01ns     0.24  bench_avx.Custom.time_rint_stride2_f64
17.6±0.01ns      4.13±0.02ns     0.24  bench_avx.Custom.time_rint_stride4_f32
20.5±0.01ns      4.70±0.02ns     0.23  bench_avx.Custom.time_trunc_stride2_f64
24.1±0.03ns      5.19±0.02ns     0.22  bench_avx.Custom.time_ceil_stride4_f64
17.6±0.01ns      3.77±0.01ns     0.21  bench_avx.Custom.time_rint_stride2_f32
20.5±0.01ns      4.12±0.01ns     0.20  bench_avx.Custom.time_trunc_stride4_f32
23.9±0.01ns      4.69±0.01ns     0.20  bench_avx.Custom.time_ceil_stride2_f64
26.4±0.02ns      5.18±0.02ns     0.20  bench_avx.Custom.time_floor_stride4_f64
20.4±0.01ns      3.78±0.01ns     0.19  bench_avx.Custom.time_trunc_stride2_f32
26.3±0.02ns      4.70±0.01ns     0.18  bench_avx.Custom.time_floor_stride2_f64
23.8±0.01ns      4.13±0.01ns     0.17  bench_avx.Custom.time_ceil_stride4_f32
23.8±0.01ns      3.81±0.01ns     0.16  bench_avx.Custom.time_ceil_stride2_f32
26.2±0.01ns      4.12±0.02ns     0.16  bench_avx.Custom.time_floor_stride4_f32
26.0±0.20ns      3.79±0.02ns     0.15  bench_avx.Custom.time_floor_stride2_f32
19.2±0.01ns      2.71±0.07ns     0.14  bench_avx.Custom.time_rint_stride1_f64
20.2±0.02ns      2.73±0.07ns     0.13  bench_avx.Custom.time_trunc_stride1_f64
23.6±0.01ns      2.72±0.07ns     0.12  bench_avx.Custom.time_ceil_stride1_f64
25.9±0.02ns      2.70±0.08ns     0.10  bench_avx.Custom.time_floor_stride1_f64
17.2±0.01ns      1.76±0.04ns     0.10  bench_avx.Custom.time_rint_stride1_f32
20.1±0.02ns      1.74±0.03ns     0.09  bench_avx.Custom.time_trunc_stride1_f32
23.5±0.02ns      1.73±0.03ns     0.07  bench_avx.Custom.time_ceil_stride1_f32
25.9±0.01ns      1.73±0.04ns     0.07  bench_avx.Custom.time_floor_stride1_f32
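The benchmarks above run the rounding ufuncs over strided views; a minimal reproduction of that setup (illustrative, not the actual benchmark code — the real benchmarks live in bench_avx.py) looks like:

```python
import numpy as np

# Strided views exercise the new strided inner loops (stride 4, float32,
# mirroring e.g. time_floor_stride4_f32 above).
n, stride = 10000, 4
a = np.linspace(-5, 5, n * stride, dtype=np.float32)
strided = a[::stride]                  # non-contiguous view of n elements
for ufunc in (np.floor, np.ceil, np.trunc, np.rint):
    out = ufunc(strided)
    # The strided loop must match the contiguous (stride-1) loop exactly
    assert np.array_equal(out, ufunc(strided.copy()))
```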

@r-devulap r-devulap force-pushed the sqrt-sq-rcp-abs-avx branch from 2bfb768 to 226ae20 on August 24, 2019 15:21
@r-devulap
Member Author

Rebased with master. Can someone help review the code, please?

@r-devulap
Member Author

@mattip @charris, any way forward with this patch?

@mattip
Member

mattip commented Sep 1, 2019

Should these have tests added to numpy/core/tests/test_umath_accuracy.py via some numpy/core/tests/data/umath-validation*? I guess abs, floor, ceil, truncate are quite straightforward, but maybe for reciprocal, sqrt, square?

@r-devulap
Member Author

I do not think that is necessary. There is no change to the way these functions are computed, and all of them directly use the hardware instructions anyway (sqrt is computed using the vsqrtps/vsqrtpd instruction, reciprocal is just vdivps, and square is just vmulps(x,x)). I did add a bunch of tests to validate special-value floats and strided arrays to make sure I didn't break functionality.
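The special-value expectations those tests encode can be sketched in pure Python (a model of the semantics only; the real tests run the NumPy ufuncs over strided arrays):

```python
import math

# Scalar models of the ufuncs under test; names are illustrative.
def square(x):
    return x * x

def reciprocal(x):
    return 1.0 / x

nan, inf = float('nan'), float('inf')
assert math.isnan(square(nan))                    # nan propagates
assert square(inf) == inf and square(-inf) == inf
assert reciprocal(inf) == 0.0
assert math.isnan(math.sqrt(nan))                 # sqrt(nan) -> nan
```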

@mattip mattip requested a review from juliantaylor September 1, 2019 18:45
@mattip
Member

mattip commented Sep 1, 2019

OK. The code, as much as I understand it, looks reasonable. Maybe someone else would like to take a look?

@r-devulap
Member Author

ping .. anyone else who can review the code?

@r-devulap
Member Author

@mattip how do you want to proceed with this patch?

@mattip
Member

mattip commented Sep 23, 2019

@r-devulap sorry this is taking so long.

I think this PR has the same problem with duplicate loops that gh-14554 cleaned up, could you take a quick look? Also, a rebase/merge with master is needed to clear the conflict.

FWIW, adding this code on my Ubuntu 18.04 dev system makes the wheel grow to 10_655_010 bytes, adding 45_101 bytes or about 0.5%, and it makes the _multiarray_umath shared object grow by 200_432 bytes to 17_778_296 bytes or about 1.1%.

I think this is acceptable for the speed boost, so I propose to merge it.

@r-devulap
Member Author

thanks @mattip, I will take a look and fix it.

(1) Workaround for a bug in clang6: added a missing GCC attribute to the
prototype of the ISA_sqrt_TYPE function, which otherwise leads to a weird
build failure in clang6 (gcc and clang7.0 don't have this issue)

(2) Changed np.float128 to np.longdouble in tests: NumPy on Windows
doesn't support the np.float128 dtype

(3) GCC 4.8/5.0 doesn't support the _mm512_abs_ps/pd intrinsic;
clang6 generates an invalid exception when computing the abs value of
+/-np.nan.
@r-devulap
Member Author

Rebased with master and added a commit to fix the duplicated inner loop for float16. Let me know if that looks correct.

@juliantaylor
Contributor

juliantaylor commented Sep 28, 2019

Please independently verify these benchmarks. We have often received PRs using AVX for e.g. sqrt, and we always rejected them because the benchmarks turned out to be wrong: there were no performance improvements, as the hardware did not actually use the wider registers in parallel.

I think the last time I verified this was on a Skylake Xeon Gold ... which should still be the latest Intel generation.

@juliantaylor
Contributor

Do we need manually written code for this?
The compiler should be able to vectorize these functions fine, like it does for our AVX integer code.

@r-devulap
Member Author

  1. This patch does not significantly improve performance of sqrt, reciprocal, square and absolute for stride = 1 (as can be seen in the performance numbers). These are memory-bound functions, and hence AVX provides no significant benefit over SSE. The patch I submitted a while back (ENH: Add AVX2 and AVX512 functionality for numpy.sqrt #12459) missed exactly this point.
  2. The main feature this patch brings is vectorizing strided arrays, which are currently processed in a scalar fashion (gather instructions were not supported with SSE). It would be great if someone else could corroborate the performance numbers for strided arrays, but since we are moving from scalar to vector processing, I am pretty confident the numbers should hold.
  3. floor, rint, ceil and trunc are currently processed scalar for all strides, and gcc refuses to auto-vectorize them. When I tried to compile a simple loop with gcc-9:
for (int ii = 0; ii < N; ii++)                                              
    op[ii] = floor(ip[ii]); 

gcc refuses to vectorize this and reports: `missed: not vectorized: relevant stmt not supported: _23 = __builtin_floorf (_22)`. And I doubt gcc can vectorize strided arrays.

@r-devulap
Member Author

@juliantaylor @mattip ping ...

@mattip
Member

mattip commented Oct 8, 2019

Did you commit the benchmarks you are showing above? I do not see them in the benchmarking results on a machine with AVX512.

@r-devulap
Member Author

I just added a commit for the benchmarks.

@mattip
Member

mattip commented Oct 10, 2019

Comparing the pre- and post-PR benchmarks, I see large changes. The bench_avx parameters are ufunc, stride, dtype.

% asv compare 68bd6e35 5ee46de5 --only-changed -f 1.3 --sort ratio
· `wheel_cache_size` has been renamed to `build_cache_size`. Update your \ 
`asv.conf.json` accordingly.
       before           after         ratio
     [68bd6e35]       [5ee46de5]
     <sqrt-sq-rcp-abs-avx~8>       <sqrt-sq-rcp-abs-avx~1>
-         277±1μs          204±1μs     0.73  bench_ufunc.UFunc.time_ufunc_types('trunc')
-         293±2μs          213±3μs     0.73  bench_ufunc.UFunc.time_ufunc_types('ceil')
-         307±2μs        218±0.9μs     0.71  bench_ufunc.UFunc.time_ufunc_types('floor')
-     7.94±0.03μs       5.33±0.2μs     0.67  bench_avx.AVX_UFunc.time_ufunc('square', 4, 'd')
-      7.40±0.5μs      4.83±0.01μs     0.65  bench_avx.AVX_UFunc.time_ufunc('square', 2, 'd')
-      12.3±0.5μs      7.14±0.04μs     0.58  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 4, 'd')
-     7.89±0.02μs       4.51±0.2μs     0.57  bench_avx.AVX_UFunc.time_ufunc('square', 4, 'f')
-      7.07±0.5μs      3.99±0.01μs     0.56  bench_avx.AVX_UFunc.time_ufunc('square', 2, 'f')
-     14.1±0.01μs       7.76±0.3μs     0.55  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 2, 'd')
-      9.35±0.5μs       5.05±0.2μs     0.54  bench_avx.AVX_UFunc.time_ufunc('absolute', 4, 'd')
-     7.44±0.03μs      4.00±0.02μs     0.54  bench_avx.AVX_UFunc.time_ufunc('absolute', 2, 'f')
-      7.59±0.5μs         3.96±0μs     0.52  bench_avx.AVX_UFunc.time_ufunc('absolute', 4, 'f')
-      10.8±0.3μs       4.87±0.2μs     0.45  bench_avx.AVX_UFunc.time_ufunc('absolute', 2, 'd')
-      10.2±0.7μs       4.34±0.2μs     0.43  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 4, 'f')
-      10.2±0.7μs       4.31±0.2μs     0.42  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 2, 'f')
-     15.8±0.01μs      4.39±0.01μs     0.28  bench_avx.AVX_UFunc.time_ufunc('rint', 4, 'f')
-     20.3±0.01μs       5.13±0.2μs     0.25  bench_avx.AVX_UFunc.time_ufunc('rint', 4, 'd')
-     20.3±0.02μs      5.04±0.03μs     0.25  bench_avx.AVX_UFunc.time_ufunc('trunc', 4, 'd')
-        21.1±2μs      5.00±0.02μs     0.24  bench_avx.AVX_UFunc.time_ufunc('rint', 2, 'd')
-     36.4±0.02μs       8.25±0.3μs     0.23  bench_avx.AVX_UFunc.time_ufunc('sqrt', 4, 'd')
-     18.4±0.01μs      4.04±0.02μs     0.22  bench_avx.AVX_UFunc.time_ufunc('trunc', 4, 'f')
-        23.3±2μs      5.11±0.03μs     0.22  bench_avx.AVX_UFunc.time_ufunc('ceil', 4, 'd')
-        21.1±2μs      4.59±0.01μs     0.22  bench_avx.AVX_UFunc.time_ufunc('trunc', 2, 'd')
-     23.2±0.01μs       4.99±0.2μs     0.21  bench_avx.AVX_UFunc.time_ufunc('floor', 2, 'd')
-     21.4±0.01μs      4.58±0.04μs     0.21  bench_avx.AVX_UFunc.time_ufunc('ceil', 2, 'd')
-     36.3±0.01μs       7.66±0.2μs     0.21  bench_avx.AVX_UFunc.time_ufunc('sqrt', 2, 'd')
-     18.3±0.04μs       3.77±0.1μs     0.21  bench_avx.AVX_UFunc.time_ufunc('rint', 2, 'f')
-      21.4±0.5μs       4.07±0.1μs     0.19  bench_avx.AVX_UFunc.time_ufunc('ceil', 4, 'f')
-     27.1±0.04μs      5.04±0.02μs     0.19  bench_avx.AVX_UFunc.time_ufunc('floor', 4, 'd')
-        20.9±1μs      3.74±0.02μs     0.18  bench_avx.AVX_UFunc.time_ufunc('trunc', 2, 'f')
-     22.9±0.03μs      4.03±0.02μs     0.18  bench_avx.AVX_UFunc.time_ufunc('floor', 4, 'f')
-        23.0±2μs       4.04±0.1μs     0.18  bench_avx.AVX_UFunc.time_ufunc('ceil', 2, 'f')
-        20.2±1μs      3.10±0.04μs     0.15  bench_avx.AVX_UFunc.time_ufunc('rint', 1, 'd')
-        26.5±1μs      4.04±0.02μs     0.15  bench_avx.AVX_UFunc.time_ufunc('floor', 2, 'f')
-     21.4±0.02μs      3.18±0.03μs     0.15  bench_avx.AVX_UFunc.time_ufunc('ceil', 1, 'd')
-        21.8±2μs      3.20±0.04μs     0.15  bench_avx.AVX_UFunc.time_ufunc('trunc', 1, 'd')
-      15.8±0.2μs      2.12±0.05μs     0.13  bench_avx.AVX_UFunc.time_ufunc('rint', 1, 'f')
-        33.6±3μs       4.01±0.2μs     0.12  bench_avx.AVX_UFunc.time_ufunc('sqrt', 2, 'f')
-     26.8±0.02μs      3.19±0.04μs     0.12  bench_avx.AVX_UFunc.time_ufunc('floor', 1, 'd')
-     18.3±0.01μs      2.12±0.02μs     0.12  bench_avx.AVX_UFunc.time_ufunc('trunc', 1, 'f')
-        36.5±3μs      4.08±0.01μs     0.11  bench_avx.AVX_UFunc.time_ufunc('sqrt', 4, 'f')
-        22.9±2μs      2.18±0.02μs     0.10  bench_avx.AVX_UFunc.time_ufunc('ceil', 1, 'f')
-     26.6±0.04μs      2.18±0.02μs     0.08  bench_avx.AVX_UFunc.time_ufunc('floor', 1, 'f')

I am not sure why the single strided cases are showing such a large speed-up. Do these results make sense?

@r-devulap
Member Author

I think these numbers make sense. NumPy's current implementation of the rounding functions ceil, floor, rint and trunc is scalar even for stride 1, so the 10x speed up for these functions is expected (my numbers are similar too). As expected, we do not see any significant speed up for sqrt, square, reciprocal and absolute at stride 1, since these are currently implemented with SSE.

@r-devulap
Member Author

just out of curiosity, what CPU did you run these benchmarks on?

@mattip
Member

mattip commented Oct 10, 2019

Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz

@mattip
Member

mattip commented Oct 10, 2019

@juliantaylor ping

Comment on lines +1692 to +1697
if (!run_unary_@isa@_sqrt_@TYPE@(args, dimensions, steps)) {
UNARY_LOOP {
const @type@ in1 = *(@type@ *)ip1;
*(@type@ *)op1 = npy_sqrt@typesub@(in1);
}
}
Member

Is the compiler able to generate the AVX code automatically if you use

if (IS_OUTPUT_BLOCKABLE_UNARY(sizeof(@type@), @REGISTER_SIZE@)) {
    UNARY_LOOP { ... }
}
else {
    // as above
    UNARY_LOOP { ... }
}

We use this trick in all sorts of places today to encourage it to generate optimized code.

Member Author

r-devulap commented Oct 10, 2019

I tried several options with GCC-9.2 and found the following:

  • Any compiler-generated vectorized loop for floating point seems to require extra compiler options like -ffast-math (see https://gcc.gnu.org/projects/tree-ssa/vectorization.html#using). Here is the code for an example of the sqrt loop with and without this option. There are several problems with this path: (1) -ffast-math obviously should not be used as a global compile option, and (2) the code generated with this option ends up using a combination of vrsqrt14ps and vmulps instructions to compute the square root, which is neither accurate nor fast (vrsqrt14ps is only accurate up to the 6th decimal place, and I have no idea why even the latest GCC won't use a simple vsqrtps instruction instead!)

  • The other problem is that no matter what option I tried, I could not get GCC to vectorize the strided array case (see an example here). Even if we were somehow able to properly vectorize the stride = 1 case, as far as I know we cannot auto-vectorize for general strided arrays.

Member Author

r-devulap commented Oct 11, 2019

I finally learnt why gcc won't use vsqrtps! The vrsqrt14ps instruction takes 1-3 cycles, whereas vsqrtps takes > 14 cycles. So it's basically faster to compute invsqrt, multiply it with the input and then correct it with one step of Newton-Raphson than to compute an accurate sqrt directly. -ffast-math obviously chooses speed over accuracy. This logic works for single precision but not for double precision, where gcc uses the vsqrtpd instruction (see code here) :)
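The fast-math trick gcc uses can be modeled in Python: start from a low-precision reciprocal-square-root estimate and refine it with one Newton-Raphson step (a sketch of the idea; the actual error profile of vrsqrt14ps differs):

```python
import math

def newton_rsqrt_step(x, y):
    """One Newton-Raphson refinement of y ~ 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)

x = 2.0
# Emulate a ~6-decimal-digit estimate, as vrsqrt14ps would return
y = (1.0 / math.sqrt(x)) * (1 + 1e-6)
y = newton_rsqrt_step(x, y)    # quadratic convergence: ~12 digits now
approx_sqrt = x * y            # sqrt(x) = x * rsqrt(x)
assert abs(approx_sqrt - math.sqrt(x)) < 1e-10
```

One cheap estimate plus one fused-multiply-heavy refinement step beats the long-latency vsqrtps, which is exactly the trade -ffast-math makes.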

@mattip mattip merged commit c7f532e into numpy:master Oct 15, 2019
@mattip
Member

mattip commented Oct 15, 2019

Thanks @r-devulap.

@r-devulap
Member Author

thanks @mattip !

/*
 * Replace masked elements with 1.0f to avoid divide by zero fp
 * exception in reciprocal
 */
x = @isa@_set_masked_lanes_ps(x, ones_f, inv_load_mask);
Member

I'm trying to understand this line: how and why?

Member

@r-devulap, you forgot to remove it, right?

Member Author

The masked load instruction loads 0 for elements where the mask is set to 0 (which happens for the trailing end of the array). For reciprocal, this causes a 1/0, which raises a divide-by-zero exception. This line replaces the zeros with ones to avoid that exception.
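In pure Python the masked-lane fix looks like this (a model of the C intrinsic sequence; function names are illustrative, not NumPy's):

```python
def masked_load(data, i, width, mask):
    # Masked-out lanes read as 0.0, like a maskz load intrinsic
    return [data[i + k] if mask[k] else 0.0 for k in range(width)]

def set_masked_lanes(vec, fill, mask):
    # Replace lanes whose mask bit is 0 with `fill`
    return [v if m else fill for v, m in zip(vec, mask)]

data = [4.0, 2.0, 8.0]            # trailing tail: only 3 of 8 lanes valid
mask = [1, 1, 1, 0, 0, 0, 0, 0]
x = masked_load(data, 0, 8, mask)
x = set_masked_lanes(x, 1.0, mask)   # zeros -> 1.0, so 1/x never hits 1/0
recip = [1.0 / v for v in x]         # only the first 3 lanes get stored back
assert recip[:3] == [0.25, 0.5, 0.125]
```

The garbage 1.0 results in the masked-out lanes are simply never written back to the output array.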

Member

@r-devulap, "the trailing end of the array", oh it makes sense now. thank you

@rgommers rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022
Labels
01 - Enhancement 03 - Maintenance component: numpy._core component: SIMD Issues in SIMD (fast instruction sets) code or machinery
7 participants