More comprehensive end-to-end performance benchmarking

I checked out the benchmarks on the website, and they feel a bit convoluted and don't give me a clear comparison. That is, I would prefer it if the performance of different (big) models were compared with the equivalent formulation in JAX/PyTorch/Flux, etc. I think that would drive wider adoption.

Is there such a comprehensive benchmark somewhere out there?