Great work on the port!
In Java, the equivalent of countBits (`Long.bitCount`) is compiled by the JIT as an intrinsic, and that is crucial for good performance.
Although I have not done any benchmarking, I suspect that assembly versions of the performance-critical functions will be needed, as in Will's bitset library:
https://github.com/willf/bitset/blob/master/popcnt_amd64.s
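For comparison, here is a minimal Go sketch of two pure-Go population counts over a bitmap stored as `[]uint64`. The names `countBits` and `countBitsSWAR` and the `[]uint64` layout are hypothetical, not necessarily what the port uses. `countBits` calls `math/bits.OnesCount64`, which I believe recent Go compilers lower to a POPCNT instruction on amd64 (behind a runtime CPU check), much like Java's `Long.bitCount` intrinsic. `countBitsSWAR` is the classic portable SWAR fallback. Whether either removes the need for hand-written assembly is something only benchmarking would settle.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcountSWAR counts the set bits in x using the SWAR
// (SIMD-within-a-register) technique, with no special CPU instructions.
func popcountSWAR(x uint64) int {
	x = x - ((x >> 1) & 0x5555555555555555)                // 2-bit field counts
	x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333) // 4-bit field counts
	x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0f                // 8-bit field counts
	return int((x * 0x0101010101010101) >> 56)             // sum all bytes into the top byte
}

// countBits (hypothetical name) sums the set bits across all words,
// relying on math/bits.OnesCount64.
func countBits(words []uint64) int {
	n := 0
	for _, w := range words {
		n += bits.OnesCount64(w)
	}
	return n
}

// countBitsSWAR (hypothetical name) is the same count using the portable SWAR helper.
func countBitsSWAR(words []uint64) int {
	n := 0
	for _, w := range words {
		n += popcountSWAR(w)
	}
	return n
}

func main() {
	words := []uint64{0, 1, 0xFF, 0xFFFFFFFFFFFFFFFF}
	fmt.Println(countBits(words), countBitsSWAR(words)) // prints "73 73"
}
```

A `go test -bench` comparison of these two against an assembly routine would show which path is actually worth keeping.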