An alternative to this is to always return zero and implement wrappers for each game that set the reward. That seems a bit nicer to me in terms of layering things.
Same goes for the observation spaces. The base class here could just return an image and everything else in the info dict, and the wrapper could parse things from info into the obs.