Tags: neilshastry/datasets
Update TFDS to 4.0.1
Fix `tfds.load` when generation code isn't present and improve GCS compatibility.
Thanks @carlthome for reporting and fixing the issue.
PiperOrigin-RevId: 336306487
Update TFDS version to 4.0.0
API changes, new features:
* Dataset-as-folder: A dataset can now be a self-contained module in a folder with checksums, dummy data,... This simplifies implementing datasets outside the TFDS repository.
* `tfds.load` can now load a dataset without using the generation class. So `tfds.load('my_dataset:1.0.0')` can work even if `MyDataset.VERSION == '2.0.0'` (See tensorflow#2493).
* Add a new TFDS CLI (see https://www.tensorflow.org/datasets/cli for details)
* `tfds.testing.mock_data` does not require metadata files anymore!
* Add `tfds.as_dataframe(ds, ds_info)` with custom visualisation ([example](https://www.tensorflow.org/datasets/overview#tfdsas_dataframe))
* Add `tfds.even_splits` to generate subsplits (e.g. `tfds.even_splits('train', n=3) == ['train[0%:33%]', 'train[33%:67%]', ...]`)
* Add new `DatasetBuilder.RELEASE_NOTES` property
* `tfds.features.Image` now supports PNG with 4 channels
* `tfds.ImageFolder` now supports custom shape, dtype
* Downloaded URLs are available through `MyDataset.url_infos`
* Add `skip_prefetch` option to `tfds.ReadConfig`
* `as_supervised=True` support for `tfds.show_examples`, `tfds.as_dataframe`
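
To illustrate the subsplit strings quoted in the `tfds.even_splits` entry above, here is a minimal pure-Python sketch. `percent_subsplits` is a hypothetical helper written for this note, not the TFDS implementation, and rounding each boundary to the nearest whole percent is an assumption. It reproduces the two slices shown in the notes; the final slice is simply what this sketch yields.

```python
from typing import List


def percent_subsplits(split: str, n: int) -> List[str]:
    """Hypothetical helper: cut `split` into `n` contiguous percent slices."""
    # Assumption: boundaries are rounded to the nearest whole percent.
    # Adjacent slices share a boundary, so they never overlap or leave gaps.
    bounds = [round(i * 100 / n) for i in range(n + 1)]
    return [f"{split}[{lo}%:{hi}%]" for lo, hi in zip(bounds, bounds[1:])]


subsplits = percent_subsplits("train", 3)
print(subsplits)
```

Strings of this form are the kind of split specification that can be passed as the `split` argument when loading a dataset.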
Breaking compatibility changes:
* `tfds.as_numpy()` now returns an iterable which can be iterated multiple times. To migrate, replace `next(ds)` with `next(iter(ds))`.
* Rename `tfds.features.text.Xyz` -> `tfds.deprecated.text.Xyz`
* Remove `DatasetBuilder.IN_DEVELOPMENT` property
* Remove `tfds.core.disallow_positional_args` (should use Py3 `*, ` instead)
* `tfds.features` can now be saved/loaded. You may have to override [FeatureConnector.from_json_content](https://www.tensorflow.org/datasets/api_docs/python/tfds/features/FeatureConnector?version=nightly#from_json_content) and `FeatureConnector.to_json_content` to support this feature.
* Stop testing against TF 1.15. Requires Python 3.6.8+.
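
The `next(ds)` -> `next(iter(ds))` migration above follows from the general Python distinction between an iterable and an iterator. The sketch below uses a stand-in class (`ReiterableExamples` is hypothetical and is not TFDS code) to show why the old call fails on a re-iterable object and why the new one works.

```python
class ReiterableExamples:
    """Stand-in for an object that is iterable but is not itself an iterator."""

    def __init__(self, examples):
        self._examples = list(examples)

    def __iter__(self):
        # Each call returns a fresh iterator, so the data can be traversed again.
        return iter(self._examples)


ds = ReiterableExamples([{"label": 0}, {"label": 1}])

# New style: build an iterator explicitly, then pull the first example.
first = next(iter(ds))

# Two full passes over the same object see the same examples.
first_pass = list(ds)
second_pass = list(ds)

# Old style: next() needs an iterator, and this object defines no __next__.
try:
    next(ds)
    old_style_works = True
except TypeError:
    old_style_works = False
```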
Other bug fixes:
* Better archive extension detection for `dl_manager.download_and_extract`
* Fix `tfds.__version__` in TFDS nightly to be PEP440 compliant
* Fix crash when GCS not available
* Add a script to detect dead URLs
* Improved open-source workflow, contributor guide, documentation
* Many other internal cleanups, bugs, dead code removal, py2->py3 cleanup, pytype annotations,...
And of course, new datasets and dataset updates.
A gigantic thanks to our community, which has helped us debug issues and implement many features, especially vijayphoenix@, who has been one of our main contributors for this release.
PiperOrigin-RevId: 335667395
Update TFDS version to 3.2.0
API:
* Add a `tfds.ImageFolder` and `tfds.TranslateFolder` to easily create custom datasets with your custom data.
* Add a `tfds.ReadConfig(input_context=)` to shard dataset, for better multi-worker compatibility (tensorflow#1426).
* The default `data_dir` can be controlled by the `TFDS_DATA_DIR` environment variable.
* Better usability when developing datasets outside TFDS
  * Downloads are always cached
  * Checksums are optional
* Added a `tfds.show_statistics(ds_info)` to display [FACETS OVERVIEW](https://pair-code.github.io/facets/). Note: This requires the dataset to have been generated with the statistics.
* Open source various scripts to help deployment/documentation (Generate catalog documentation, export all metadata files,...)
Documentation:
* Catalog displays images ([example](https://www.tensorflow.org/datasets/catalog/sun397#sun397standard-part2-120k))
* Catalog shows which datasets have been recently added and are only available in `tfds-nightly`
Breaking compatibility change:
* Fix deterministic example order on Windows when path was used as key (this only impacts a few datasets). Now example order should be the same on all platforms.
* Remove `tfds.load('image_label_folder')` in favor of the more user-friendly `tfds.ImageFolder`
Other:
* Various performance improvements for both generation and reading (e.g. use `__slots__`, fix parallelisation bug in `tf.data.TFRecordReader`,...)
* Various fixes (typo, types annotations, better error messages, fixing dead links, better windows compatibility,...)
PiperOrigin-RevId: 320672697
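
As a rough illustration of the `TFDS_DATA_DIR` entry in the 3.2.0 notes, the sketch below shows one way an environment variable can supply a default directory. `resolve_data_dir` is a hypothetical helper, not a TFDS function. Both the precedence order (explicit argument, then environment variable, then a built-in default) and the `~/tensorflow_datasets` fallback are assumptions made for this example.

```python
import os

# Assumption: fallback location used when nothing else is configured.
DEFAULT_DATA_DIR = os.path.join("~", "tensorflow_datasets")


def resolve_data_dir(explicit_dir=None, environ=None):
    """Hypothetical helper: choose the directory where datasets are stored."""
    environ = os.environ if environ is None else environ
    if explicit_dir:
        # An explicitly passed directory takes priority over everything else.
        return os.path.expanduser(explicit_dir)
    # Otherwise fall back to the environment variable, then to the default.
    env_dir = environ.get("TFDS_DATA_DIR")
    return os.path.expanduser(env_dir or DEFAULT_DATA_DIR)


print(resolve_data_dir(environ={"TFDS_DATA_DIR": "/data/tfds"}))
```

Passing `environ` as a plain dictionary keeps the helper easy to exercise without touching the real process environment.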
Update TFDS version
Breaking changes:
* Legacy mode `tfds.experiment.S3` has been removed
* New `tfds.image_classification` section; some datasets have moved there from `tfds.images`.
* `in_memory` argument removed from `as_dataset`/`tfds.load` (small datasets are auto-cached).
* `DownloadConfig` does not append the dataset name anymore (manual data should be in `<manual_dir>/` instead of `<manual_dir>/<dataset_name>/`)
* Tests now check that all `dl_manager.download` URLs have registered checksums. To opt out, add `SKIP_CHECKSUMS = True` to your `DatasetBuilderTestCase`.
* `tfds.load` now always returns `tf.compat.v2.Dataset`. If you're still using `tf.compat.v1`:
  * Use `tf.compat.v1.data.make_one_shot_iterator(ds)` rather than `ds.make_one_shot_iterator()`
  * Use `isinstance(ds, tf.compat.v2.Dataset)` instead of `isinstance(ds, tf.data.Dataset)`
* `tfds.Split.ALL` has been removed from the API.
Future breaking change:
* The `tfds.features.text` encoding API is deprecated. Please use [tensorflow_text](https://www.tensorflow.org/tutorials/tensorflow_text/intro) instead.
* `num_shards` argument of `tfds.core.SplitGenerator` is currently ignored and will be removed in the next version.
Features:
* `DownloadManager` is now picklable (can be used inside Beam pipelines)
* `tfds.features.Audio`:
  * Support float as returned value
  * Expose sample_rate through `info.features['audio'].sample_rate`
  * Support for encoding audio features from file objects
* Various bug fixes, better error messages, documentation improvements
* More datasets
Thank you to all our contributors for helping us make TFDS better for everyone!
PiperOrigin-RevId: 306768189