In the 4x4 code, some temporary variables, each the result of multiplying two entries, are defined once and used twice: first in the determinant and then in the output.
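For reference, one well-known way to get this kind of sharing for 4x4 is Laplace expansion along the top two rows. Twelve 2x2 sub-determinants (each a difference of two products) are computed once, combined into the determinant, and then reused for every entry of the inverse. The Python below is a minimal sketch of that scheme. It assumes the matrix is given row-major as nested sequences and that the "output" is the inverse. `inverse4` and the `s*`/`c*` names are hypothetical and are not taken from the library's actual 4x4 code.

```python
from typing import List, Optional, Sequence


def inverse4(m: Sequence[Sequence[float]]) -> Optional[List[List[float]]]:
    """Invert a row-major 4x4 matrix; returns None if it is singular."""
    ((m11, m12, m13, m14),
     (m21, m22, m23, m24),
     (m31, m32, m33, m34),
     (m41, m42, m43, m44)) = m

    # 2x2 sub-determinants of rows 1-2, one per pair of columns.
    s0 = m11 * m22 - m21 * m12
    s1 = m11 * m23 - m21 * m13
    s2 = m11 * m24 - m21 * m14
    s3 = m12 * m23 - m22 * m13
    s4 = m12 * m24 - m22 * m14
    s5 = m13 * m24 - m23 * m14

    # 2x2 sub-determinants of rows 3-4, one per pair of columns.
    c5 = m33 * m44 - m43 * m34
    c4 = m32 * m44 - m42 * m34
    c3 = m32 * m43 - m42 * m33
    c2 = m31 * m44 - m41 * m34
    c1 = m31 * m43 - m41 * m33
    c0 = m31 * m42 - m41 * m32

    # First use: Laplace expansion along rows 1-2, pairing each top minor
    # with the bottom minor on the complementary columns.
    det = s0 * c5 - s1 * c4 + s2 * c3 + s3 * c2 - s4 * c1 + s5 * c0
    if det == 0:
        return None
    inv = 1.0 / det

    # Second use: every adjugate entry is built from the same twelve temporaries.
    return [
        [( m22 * c5 - m23 * c4 + m24 * c3) * inv,
         (-m12 * c5 + m13 * c4 - m14 * c3) * inv,
         ( m42 * s5 - m43 * s4 + m44 * s3) * inv,
         (-m32 * s5 + m33 * s4 - m34 * s3) * inv],
        [(-m21 * c5 + m23 * c2 - m24 * c1) * inv,
         ( m11 * c5 - m13 * c2 + m14 * c1) * inv,
         (-m41 * s5 + m43 * s2 - m44 * s1) * inv,
         ( m31 * s5 - m33 * s2 + m34 * s1) * inv],
        [( m21 * c4 - m22 * c2 + m24 * c0) * inv,
         (-m11 * c4 + m12 * c2 - m14 * c0) * inv,
         ( m41 * s4 - m42 * s2 + m44 * s0) * inv,
         (-m31 * s4 + m32 * s2 - m34 * s0) * inv],
        [(-m21 * c3 + m22 * c1 - m23 * c0) * inv,
         ( m11 * c3 - m12 * c1 + m13 * c0) * inv,
         (-m41 * s3 + m42 * s1 - m43 * s0) * inv,
         ( m31 * s3 - m32 * s1 + m33 * s0) * inv],
    ]
```

In a compiled implementation the twelve temporaries would stay in registers or locals. The Python version only illustrates the data flow.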
The 3x3 case also has some computations that are used more than once (such as m.m11 * m.m22), and if you expand the determinant into products, there is more than one way to choose which of them to share.
It would be nice to have an optimized 3x3 version that reuses these computations.
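A minimal sketch of what an optimized 3x3 inverse could look like, written in Python and assuming a row-major matrix given as nested sequences. `inverse3` and its `expand_row` parameter are hypothetical names, not part of any existing API. All nine cofactors are computed once as 2x2 determinants. The determinant is then a cofactor expansion along one row, which reuses three of those cofactors at the cost of three extra multiplications instead of evaluating six triple products. The inverse is the transposed cofactor matrix (the adjugate) divided by the determinant.

```python
from typing import List, Optional, Sequence


def inverse3(m: Sequence[Sequence[float]], expand_row: int = 0) -> Optional[List[List[float]]]:
    """Invert a row-major 3x3 matrix, reusing cofactors for the determinant.

    `expand_row` (0, 1 or 2) selects the row the determinant is expanded
    along; every choice gives the same determinant but shares a different
    set of cofactors. Returns None if the matrix is singular.
    """
    ((m11, m12, m13),
     (m21, m22, m23),
     (m31, m32, m33)) = m

    # Signed cofactors: each is a 2x2 determinant (two products, one subtraction).
    c = [
        [m22 * m33 - m23 * m32, m23 * m31 - m21 * m33, m21 * m32 - m22 * m31],
        [m13 * m32 - m12 * m33, m11 * m33 - m13 * m31, m12 * m31 - m11 * m32],
        [m12 * m23 - m13 * m22, m13 * m21 - m11 * m23, m11 * m22 - m12 * m21],
    ]

    # First use: cofactor expansion along the chosen row.
    det = sum(m[expand_row][j] * c[expand_row][j] for j in range(3))
    if det == 0:
        return None
    inv = 1.0 / det

    # Second use: inverse[i][j] = cofactor[j][i] / det (transposed cofactors).
    return [[c[j][i] * inv for j in range(3)] for i in range(3)]
```

For example, `expand_row=2` expands along the third row, so the cofactor `m11 * m22 - m12 * m21`, which contains the m.m11 * m.m22 product, is shared between the determinant and the bottom-right entry of the inverse. Picking a different row or column trades that sharing for another set of three cofactors.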