Allow dataloader to accept a custom memory pinning function #16743

Closed
mcarilli wants to merge 2 commits into pytorch:master from mcarilli:simon_custom_batch_pin

Collaborator

@mcarilli mcarilli commented Feb 5, 2019

Renewed attempt at #14171

From the original PR:

Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.

This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.

The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader. @slayton58 suggested a cleaner approach: allow the user to define a pin_memory method on their custom types, and have pin_memory_batch check for the presence of that method in the incoming batch as a fallback. I've updated the test and docstrings accordingly.
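The fallback described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in this PR: `FakeTensor` is a hypothetical stand-in for `torch.Tensor` so the sketch runs without PyTorch or CUDA, `CustomBatch` is a hypothetical user-defined type, and the order of the branches is only a guess at how such a dispatcher might be arranged.

```python
from collections.abc import Mapping, Sequence


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor (no PyTorch or CUDA needed)."""

    def __init__(self, data, pinned=False):
        self.data = data
        self.pinned = pinned

    def pin_memory(self):
        # A real tensor returns a copy in page-locked memory; here we only flag it.
        return FakeTensor(self.data, pinned=True)


class CustomBatch:
    """Hypothetical custom batch type, as a custom collate_fn might return."""

    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

    def pin_memory(self):
        # The type itself knows which of its fields hold pinnable data.
        self.inputs = self.inputs.pin_memory()
        self.targets = self.targets.pin_memory()
        return self


def pin_memory_batch(batch):
    """Recursively pin known containers, then fall back to duck typing."""
    if isinstance(batch, FakeTensor):
        return batch.pin_memory()
    if isinstance(batch, (str, bytes)):
        return batch  # strings are sequences, but there is nothing to pin
    if isinstance(batch, Mapping):
        return {key: pin_memory_batch(value) for key, value in batch.items()}
    if isinstance(batch, Sequence):
        return [pin_memory_batch(item) for item in batch]
    if hasattr(batch, "pin_memory"):
        # Fallback: an unrecognized type that defines its own pin_memory method.
        return batch.pin_memory()
    # Anything else is returned unpinned, as before.
    return batch
```

The point of the design is that the loader needs no extra constructor argument: any type that defines `pin_memory` opts in by itself, and types without it keep the old pass-through behavior.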

The old PR was merged but then reverted due to unexplained CUDA OOM errors on Windows that may or may not have been related. I have no idea why my changes would cause such errors (then or now), but it's something to keep an eye out for.

cc @fmassa and @yf225, who were my POCs on the old PR.

Collaborator

@ssnl ssnl left a comment

Please fix the two nits. Looks reasonable otherwise.

Comment thread torch/utils/data/dataloader.py Outdated
into CUDA pinned memory before returning them.
into CUDA pinned memory before returning them. If your data elements
are a custom type, or your ``collate_fn`` returns a batch that is a custom type
see the Warning below.
Collaborator

nit: probably should use lowercase warning

Comment thread torch/utils/data/dataloader.py Outdated
or if each element of your batch is a custom type, the pinning logic will not
recognize them, and it will return that batch (or those elements)
without pinning the memory. To enable memory pinning for custom batch or data types,
define a pin_memory method on your custom type(s). See ``SimpleCustomBatch`` and
Collaborator

Users don't have direct access to these files. I would prefer writing them in an Example:: code block below.

Collaborator Author

Can I do both: an example of a custom batch class, and a link to the test script on GitHub? I'd like curious users to be able to find a fully worked example, which is too big to fit on the docs page itself.

Collaborator Author

@mcarilli mcarilli Feb 6, 2019

Ok, a minimal-but-complete in-place example doesn't actually look bad there imo. I've updated the PR and dropped the reference to the test script. Let me know what you think.
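For readers following the thread, a minimal in-place example of this kind might look roughly like the sketch below. It assumes PyTorch is installed. The class name `SimpleCustomBatch` comes from the docstring excerpt above; `collate_wrapper`, the attribute names, and the toy tensor shapes are illustrative assumptions rather than a quote of what landed in the docs. Pinning is only requested when CUDA is available, since pinned memory is only useful with a CUDA runtime.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


class SimpleCustomBatch:
    def __init__(self, samples):
        # samples is a list of (input, target) pairs drawn from the dataset.
        inputs, targets = zip(*samples)
        self.inp = torch.stack(inputs, 0)
        self.tgt = torch.stack(targets, 0)

    def pin_memory(self):
        # Custom pinning hook: the loader calls this on the custom batch type.
        self.inp = self.inp.pin_memory()
        self.tgt = self.tgt.pin_memory()
        return self


def collate_wrapper(samples):
    # Hypothetical collate_fn that returns the custom batch type.
    return SimpleCustomBatch(samples)


inps = torch.arange(10 * 5, dtype=torch.float32).view(10, 5)
tgts = torch.arange(10 * 5, dtype=torch.float32).view(10, 5)
dataset = TensorDataset(inps, tgts)

# Only ask for pinned memory when a CUDA device is actually present.
use_pinning = torch.cuda.is_available()
loader = DataLoader(
    dataset,
    batch_size=2,
    collate_fn=collate_wrapper,
    pin_memory=use_pinning,
)

first_batch = next(iter(loader))
```

On a CUDA machine, `first_batch.inp.is_pinned()` is the natural way to check that the custom `pin_memory` method was actually invoked.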

Member

@fmassa fmassa left a comment

This looks good to me, thanks!

One question I have: isn't it more general to allow passing a custom pin_memory function?
While I'm ok with this PR, the assumption about pin_memory is a bit hidden to me, and I'd think it would make sense to expose it to the user.

Was the reason for replacing it with the pin_memory attribute to circumvent the potential Windows timeout?

Collaborator Author

mcarilli commented Feb 6, 2019

@fmassa No, I don't think this has anything to do with the Windows OOM issue. I have no idea where the Windows OOM issue came from or if it's even relevant.

I personally like this approach because it seems more surgical/localized. Off the top of my head, I also can't think of a case where it might be less general, since the user has the ability to supply a custom collate function to return a fully customized batch type already. If you have any misgivings about the new approach I can revert it to the old approach.

Member

fmassa commented Feb 6, 2019

I'm ok with this new approach.

cc @gchanan to see if he has any preferences.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…16743)

Summary:
Renewed attempt at pytorch#14171

From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
> This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.

The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader.  slayton58 suggested a cleaner approach:  allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback.  I've updated the test and docstrings accordingly.

The old PR was merged but then reverted due to weird cuda OOM errors on windows that may or may not have been related.  I have no idea why my changes would cause such errors (then or now) but it's something to keep an eye out for.

fmassa and yf225 who were my POCs on the old PR.
Pull Request resolved: pytorch#16743

Differential Revision: D13991745

Pulled By: ezyang

fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17