See here. Interestingly, the log std is not clipped or otherwise bounded in PPO, but it's also state-independent there. Maybe investigate the differences.
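For reference, a rough sketch of the two parameterizations being compared. It assumes the SAC side clips a state-dependent network output. The bounds and function names here are hypothetical, not taken from the codebase:

```python
import jax.numpy as jnp

# Hypothetical bounds, chosen for illustration only.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0

def sac_log_std(raw_log_std):
    # SAC-style: state-dependent network output, clipped to a fixed range.
    return jnp.clip(raw_log_std, LOG_STD_MIN, LOG_STD_MAX)

def ppo_log_std(log_std_param, batch_size):
    # PPO-style: one learned vector shared across all states, left unbounded.
    return jnp.broadcast_to(log_std_param, (batch_size,) + log_std_param.shape)
```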
On a different note, SAC.action_dist should also be made private, since returning a distrax distribution from a jitted function did not work the last time I checked.
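One possible workaround, sketched under the assumption that only the distribution parameters are needed across the jit boundary: return plain arrays from the jitted function and build the distribution outside. `action_dist_params` and its body are hypothetical stand-ins, not the existing method:

```python
import jax
import jax.numpy as jnp

@jax.jit
def action_dist_params(obs):
    # Hypothetical stand-in for the policy network: returns plain arrays only.
    mean = jnp.tanh(obs)
    log_std = jnp.zeros_like(obs)
    return mean, log_std

mean, log_std = action_dist_params(jnp.ones((2, 3)))
# The distribution can then be built outside the jitted function, e.g.:
# dist = distrax.MultivariateNormalDiag(mean, jnp.exp(log_std))
```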