Philip's blog

https://blog.philip-huang.tech/?page=iterable-style-dataset-worker-setting 

 



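An iterable-style dataset can stream huge amounts of training data, but when a DataLoader uses multiple workers, each worker gets its own replica of the same dataset object. PyTorch leaves it to the developer to implement the logic that keeps workers from receiving duplicate data.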
> The data is usually produced by a generator, so even though every worker holds its own replica, the replicas do not take up much memory.
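To make the problem concrete, here is a minimal sketch, assuming a hypothetical `NaiveRangeDataset` that lazily yields integers and is not worker-aware. With `num_workers=2`, each worker iterates its own full replica, so every value comes out twice.

```python
from torch.utils.data import DataLoader, IterableDataset


class NaiveRangeDataset(IterableDataset):
    """Hypothetical example: streams integers in [start, end) lazily.

    It ignores which worker it runs in, so every worker yields the full range.
    """

    def __init__(self, start: int, end: int):
        super().__init__()
        self.start = start
        self.end = end

    def __iter__(self):
        # A generator: values are produced on demand, so each replica only
        # stores start/end, not the data itself.
        for i in range(self.start, self.end):
            yield i


def load_all(dataset, num_workers):
    # batch_size=None disables batching, so each item comes back as a plain int.
    loader = DataLoader(dataset, batch_size=None, num_workers=num_workers)
    return list(loader)


if __name__ == "__main__":
    ds = NaiveRangeDataset(0, 4)
    print(load_all(ds, num_workers=0))          # [0, 1, 2, 3]
    print(sorted(load_all(ds, num_workers=2)))  # [0, 0, 1, 1, 2, 2, 3, 3]
```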

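Following PyTorch's official recommendation, we can use `torch.utils.data.get_worker_info()` to configure each worker so that the workers split the data between them instead of duplicating it.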
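A minimal sketch of this approach, assuming the source is a generator whose length is not known in advance (`read_records` and `ShardedStream` are hypothetical names). Inside `__iter__`, each worker calls `get_worker_info()` and keeps every `num_workers`-th record starting at its own `id`. In the main process (`num_workers=0`) the call returns `None` and the full stream is returned.

```python
import itertools

from torch.utils.data import DataLoader, IterableDataset, get_worker_info


def read_records(n):
    # Hypothetical stand-in for a large data source (file lines, DB cursor, ...).
    for i in range(n):
        yield i


class ShardedStream(IterableDataset):
    def __init__(self, n: int):
        super().__init__()
        self.n = n

    def __iter__(self):
        stream = read_records(self.n)
        info = get_worker_info()
        if info is None:
            # Single-process loading: this is the only consumer, yield everything.
            return stream
        # Inside a worker: take records info.id, info.id + num_workers, ...
        # so the workers partition the stream without overlap.
        return itertools.islice(stream, info.id, None, info.num_workers)


if __name__ == "__main__":
    loader = DataLoader(ShardedStream(10), batch_size=None, num_workers=3)
    print(sorted(loader))  # each of 0..9 exactly once
```

The strided split works even when the stream length is unknown. The trade-off is that every worker still reads the whole source and discards the records that belong to other workers. If the source is already split into files or supports seeking, assigning whole shards to each worker avoids that extra reading.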

> For iterable-style datasets, since each worker process gets a replica of the dataset object, naive multi-process loading will often result in duplicated data. Using torch.utils.data.get_worker_info() and/or worker_init_fn, users may configure
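The `worker_init_fn` route can be sketched as follows, assuming a hypothetical `RangeDataset` with known `start`/`end` bounds and a hypothetical helper `split_range`. The init function runs once in each worker, looks up that worker's replica via `get_worker_info().dataset`, and narrows its bounds to a contiguous chunk that does not overlap any other worker's chunk.

```python
import math

from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class RangeDataset(IterableDataset):
    """Hypothetical dataset yielding integers in [start, end)."""

    def __init__(self, start: int, end: int):
        super().__init__()
        self.start = start
        self.end = end

    def __iter__(self):
        return iter(range(self.start, self.end))


def split_range(start, end, worker_id, num_workers):
    # Contiguous chunk [lo, hi) for one worker; chunks are disjoint and
    # together cover [start, end). Trailing workers may get an empty chunk.
    per_worker = math.ceil((end - start) / num_workers)
    lo = min(start + worker_id * per_worker, end)
    hi = min(lo + per_worker, end)
    return lo, hi


def worker_init_fn(worker_id):
    info = get_worker_info()
    ds = info.dataset  # this worker's own replica; edits do not affect others
    ds.start, ds.end = split_range(ds.start, ds.end, info.id, info.num_workers)


if __name__ == "__main__":
    loader = DataLoader(RangeDataset(3, 7), batch_size=None,
                        num_workers=2, worker_init_fn=worker_init_fn)
    print(sorted(loader))  # [3, 4, 5, 6], no duplicates
```

Because `worker_init_fn` modifies only each worker's replica, the dataset object in the main process keeps its full range.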
