Description
Geometry and matrix manipulation.
Sometimes it is unavoidable that you have to do stuff on the CPU. Let's at least do it faster.
Geomancy
Linear is nice, but slow. Those are naughty, but a bit faster.
- All data types are monomorphic, unpacked and specialized.
- Mat4 and Vec4 are ByteArray#.
- Mat4 x Mat4 and Mat4 x Vec4 are done with SIMD.
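As a rough illustration of what "monomorphic and unpacked" means in GHC terms, the sketch below contrasts a polymorphic vector with a monomorphic one whose fields are stored inline. The names PolyV4, Vec4F, dotPoly and dotF are hypothetical, and this is not Geomancy's actual representation (which, per the list above, is ByteArray#-backed); it only shows the general technique.

```haskell
-- Hypothetical polymorphic vector: each field is a pointer to a boxed value,
-- and generic code over it takes a Num dictionary unless GHC specializes it.
data PolyV4 a = PolyV4 a a a a

dotPoly :: Num a => PolyV4 a -> PolyV4 a -> a
dotPoly (PolyV4 a b c d) (PolyV4 e f g h) = a*e + b*f + c*g + d*h

-- Hypothetical monomorphic vector: four strict Floats unpacked into the
-- constructor itself, so no per-field pointers and no dictionary passing.
data Vec4F = Vec4F
  {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float
  deriving (Eq, Show)

dotF :: Vec4F -> Vec4F -> Float
dotF (Vec4F a b c d) (Vec4F e f g h) = a*e + b*f + c*g + d*h
```

Both functions compute the same dot product; the difference is only in how the data sits in memory and how the arithmetic is dispatched.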
Matrix layout
CPU-side matrices compose in MVP order, optimized for the mconcat (local1 : local2 : ... : root) operation.
GPU-side, in GLSL, it is PVM * v.
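One way to read the two orders side by side is sketched below with a toy 2x2 matrix. Everything here (M2, mul, apply, Chain and the sample transforms) is a hypothetical stand-in, not Geomancy's API. The assumption is that the CPU-side (<>) lists transforms in the order they are applied, local first and root last, which corresponds to the right-to-left product used with column vectors in GLSL.

```haskell
-- Toy 2x2 matrix in row-major order; a stand-in for a real Mat4.
data M2 = M2 Double Double Double Double
  deriving (Eq, Show)

identity :: M2
identity = M2 1 0 0 1

-- Conventional matrix product p * q.
mul :: M2 -> M2 -> M2
mul (M2 a b c d) (M2 e f g h) =
  M2 (a*e + b*g) (a*f + b*h)
     (c*e + d*g) (c*f + d*h)

-- Matrix times column vector, as in GLSL's `m * v`.
apply :: M2 -> (Double, Double) -> (Double, Double)
apply (M2 a b c d) (x, y) = (a*x + b*y, c*x + d*y)

-- Wrapper whose (<>) lists transforms in application order:
-- `l <> r` applies l first, then r, i.e. the conventional product r * l.
newtype Chain = Chain { unChain :: M2 }
  deriving (Eq, Show)

instance Semigroup Chain where
  Chain l <> Chain r = Chain (r `mul` l)

instance Monoid Chain where
  mempty = Chain identity

-- Sample transforms: stretch along x, then rotate 90 degrees counter-clockwise.
scaleX2, rotate90 :: M2
scaleX2  = M2 2 0 0 1
rotate90 = M2 0 (-1) 1 0

-- Local transform first, root last, matching mconcat (local : ... : root).
composed :: M2
composed = unChain (mconcat [Chain scaleX2, Chain rotate90])
```

Under these assumptions, composed equals rotate90 `mul` scaleX2, so applying it to a point gives the same result as scaling first and rotating second.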
The Numbers
Storing a list of 1000 transformations (e.g. rendering instance data):
benchmarking 4x4 poke/1000/geomancy
time 11.76 μs (11.66 μs .. 11.92 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 11.75 μs (11.69 μs .. 11.86 μs)
std dev 283.4 ns (199.0 ns .. 399.0 ns)
variance introduced by outliers: 26% (moderately inflated)
With linear, if you're willing to adjust your shaders, it's only 2.4 times slower:
benchmarking 4x4 poke/1000/linear
time 28.29 μs (28.21 μs .. 28.38 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 28.40 μs (28.34 μs .. 28.50 μs)
std dev 267.4 ns (145.5 ns .. 419.9 ns)
Keeping your shaders straight makes the affair 6.1x slower:
benchmarking 4x4 poke/1000/linear/T
time 73.70 μs (73.06 μs .. 74.49 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 72.77 μs (72.50 μs .. 73.22 μs)
std dev 1.129 μs (793.5 ns .. 1.580 μs)
Folding down a gloss-style scene graph is where it all started:
benchmarking 4x4 multiply/1000/geomancy
time 20.79 μs (20.77 μs .. 20.83 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 20.80 μs (20.78 μs .. 20.83 μs)
std dev 76.71 ns (60.01 ns .. 99.06 ns)
benchmarking 4x4 multiply/1000/linear
time 173.9 μs (173.6 μs .. 174.4 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 173.5 μs (173.2 μs .. 174.4 μs)
std dev 1.733 μs (727.8 ns .. 3.422 μs)
Add that time to the poking that'll follow.
Sure, it is in the low microseconds range, but this budget can be used elsewhere.
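To put the timings in proportion, the ratios can be worked out directly from the point estimates in the benchmark output above. The helper name slowdown and the constant names are hypothetical; the numbers are copied from the "time" lines of the poke and multiply runs.

```haskell
-- How many times slower `other` is than `baseline` (same time unit for both).
slowdown :: Double -> Double -> Double
slowdown baseline other = other / baseline

-- Point estimates in microseconds, copied from the benchmark output.
pokeGeomancy, pokeLinear, multiplyGeomancy, multiplyLinear :: Double
pokeGeomancy     = 11.76
pokeLinear       = 28.29
multiplyGeomancy = 20.79
multiplyLinear   = 173.9
```

In GHCi, slowdown pokeGeomancy pokeLinear comes out at roughly 2.4, and slowdown multiplyGeomancy multiplyLinear at roughly 8.4.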