Thanks to visit codestin.com
Credit goes to github.com

Skip to content

libshm retry on EINTR#15964

Closed
ssnl wants to merge 2 commits into
pytorch:masterfrom
ssnl:libshm_eintr
Closed

libshm retry on EINTR#15964
ssnl wants to merge 2 commits into
pytorch:masterfrom
ssnl:libshm_eintr

Conversation

@ssnl
Copy link
Copy Markdown
Collaborator

@ssnl ssnl commented Jan 11, 2019

fixes #14314

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@soumith
Copy link
Copy Markdown
Collaborator

soumith commented Jan 11, 2019

@ssnl a test errored out on OSX:

Jan 11 10:22:01 test_batch_sampler (__main__.TestDataLoader) ... ERROR

@ssnl
Copy link
Copy Markdown
Collaborator Author

ssnl commented Jan 12, 2019

@soumith Fixed!

Comment thread torch/lib/libshm/err.h
//
// All functions used in `libshm` (so far) indicate error by returning `-1`. If
// you want to use a function with a different error reporting mechanism, you
// need to port `SYSCHECK` from `torch/lib/c10d/Utils.hpp`.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refers to the modified macro in #15986

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Jan 15, 2019
Summary:
In #15964, I learned that `errno` is only meaningful if the function call fails. E.g., on some macos, a successful `fork()` sets `errno` to `EINVAL` in child process. This commit changes the `SYSCALL` macro so error checking is only done when an error happens. This means checking whether `rv == -1` for most calls, but is checking `rv == nullptr` for `inet_ntop`.

Now `SYSCALL` accepts a second argument `success_cond`, which should be an expression returning whether the call succeeded. `SYSCHECK_ERR_RETURN_NEG1` is the shorthand for checking if rv is `-1`.

Any suggestion on better macro names is welcomed.
Pull Request resolved: #15986

Reviewed By: janewangfb

Differential Revision: D13661790

Pulled By: pietern

fbshipit-source-id: 9551b14b9f88805454a7bfb8e4d39e0f3aed8131
@ssnl ssnl deleted the libshm_eintr branch January 16, 2019 16:04
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
fixes pytorch#14314
Pull Request resolved: pytorch#15964

Differential Revision: D13639034

Pulled By: soumith

fbshipit-source-id: 44592762aa46982e5d3616d55b5666a2c2ce9105
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
In pytorch#15964, I learned that `errno` is only meaningful if the function call fails. E.g., on some macos, a successful `fork()` sets `errno` to `EINVAL` in child process. This commit changes the `SYSCALL` macro so error checking is only done when an error happens. This means checking whether `rv == -1` for most calls, but is checking `rv == nullptr` for `inet_ntop`.

Now `SYSCALL` accepts a second argument `success_cond`, which should be an expression returning whether the call succeeded. `SYSCHECK_ERR_RETURN_NEG1` is the shorthand for checking if rv is `-1`.

Any suggestion on better macro names is welcomed.
Pull Request resolved: pytorch#15986

Reviewed By: janewangfb

Differential Revision: D13661790

Pulled By: pietern

fbshipit-source-id: 9551b14b9f88805454a7bfb8e4d39e0f3aed8131
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

libshm doesn't retry on EINTR at all

4 participants