We will eventually have a datasets module, but what we really have right now are "pipes" in the sense of https://github.com/pytorch/data#what-are-datapipes. We should disambiguate "dataset" (a static set of files) from the pipelines we use to load files from a dataset.
Note that torchdata development is on hold. I think we should just move our internals under a new namespace for now.
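
To make the distinction concrete, here is a rough sketch of what I mean. The names `Dataset` and `read_pipe` are hypothetical and only for illustration; they are not our current API or anything from torchdata. The dataset only describes which files exist, and the pipe is the thing that actually loads them.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical: a static set of files. It only knows where they live."""

    root: Path
    pattern: str = "*.txt"

    def files(self) -> List[Path]:
        # Sorted so the listing is deterministic across platforms.
        return sorted(self.root.glob(self.pattern))


def read_pipe(dataset: Dataset) -> Iterator[Tuple[str, str]]:
    """Hypothetical pipe: lazily yields (file name, contents) from a dataset."""
    for path in dataset.files():
        yield path.name, path.read_text()
```

With this split, several different pipes (decoding, filtering, batching) could be built on the same `Dataset` without the dataset knowing anything about how it is loaded.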