The grid provided to the GridSamplingOp has shape (*batchdim, z, y, x, 3) or (*batchdim, y, x, 2). The last dimension holds the components of the grid, i.e. the transformations along the different spatial directions.
Currently, these components need to be provided in the order x,y,z and x,y, respectively. This is the convention of torch. Would it make sense to change this to z,y,x (and y,x), as we use this convention everywhere else?
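
To make the proposal concrete, here is a minimal sketch of what accepting z,y,x-ordered components could amount to: reversing the last dimension before handing the grid to torch. The helper name `zyx_to_xyz` is hypothetical and not part of the existing code. The example calls `torch.nn.functional.grid_sample` directly rather than GridSamplingOp, on the assumption that the operator forwards the grid to it unchanged.

```python
import torch
import torch.nn.functional as F


def zyx_to_xyz(grid: torch.Tensor) -> torch.Tensor:
    """Reverse the component order in the last dimension (hypothetical helper).

    (..., 3) with components z,y,x becomes x,y,z; (..., 2) with y,x becomes x,y.
    """
    return grid.flip(-1)


# 2D identity grid whose last dimension is ordered y,x (the proposed convention)
ny, nx = 3, 4
y = torch.linspace(-1, 1, ny)
x = torch.linspace(-1, 1, nx)
grid_yx = torch.stack(torch.meshgrid(y, x, indexing="ij"), dim=-1)[None]  # (1, ny, nx, 2)

image = torch.arange(ny * nx, dtype=torch.float32).reshape(1, 1, ny, nx)

# torch expects x,y in the last dimension, so flip before sampling
resampled = F.grid_sample(image, zyx_to_xyz(grid_yx), mode="bilinear", align_corners=True)
```

With the flip applied, the identity grid should reproduce the input image up to floating-point error. If the change were adopted, the flip would presumably happen once inside the operator, so callers could keep z,y,x ordering throughout.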