Fix underflow issue with dirichlet sample#17488
Conversation
| BaseSampler<accscalar_t, decltype(normal_lambda)> standard_normal(normal_lambda); | ||
| auto sample = sample_gamma<scalar_t, accscalar_t, decltype(uniform_lambda), decltype(normal_lambda)>(alpha, standard_uniform, standard_normal); | ||
| auto min_value = std::numeric_limits<scalar_t>::lowest(); | ||
| auto min_value = std::numeric_limits<scalar_t>::min(); |
There was a problem hiding this comment.
I think this was a mistake - since lowest would be a negative value. I was seeing 0s on CUDA.
|
Looks like test failure wasn't totally fixed |
Thanks for pointing out. I couldn't see this on mac, but I am getting this error on my linux workstation. Will debug and push a fix for this. |
|
That was a legitimate error |
|
looks like the latest test failures are real: |
It seems like we might have a bias in the number of samples towards 0s and 1s for I have decreased the precision, but just noting that the imbalance issue reported in #15738 for GPU isn't fixed with this PR and still exists for certain builds. I'll need more help debugging that, because I can't seem to reproduce it. |
|
The recent failures are unrelated, and related to the build. |
|
@fritzo I'll merge this on your say so. |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
@li-roy We can't land this patch until BC is restored: @neerajprad You don't need to do anything, but this diff no longer applies on master due to a BC-breaking change to |
I see that this was recently changed in #17527. Should I push a fix for this? |
|
@ezyang in the meantime, this can just be changed to ret.scalar_type() i think |
|
@pytorchbot rebase this please |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
Good to go. |
Summary: Addresses #15738, using fritzo's suggestion. This adds a `torch._sample_dirichlet` method in `Distributions.cpp` and `Distributions.cu`. - For CPU, this leads to no perf hit since all we do is to promote the `alpha` to double when getting the gamma samples (the gamma sampler anyways uses `accscalar_t`(double for CPU)) and cast it back to float32 on return. - I have added an analogous method for CUDA as well, but the default sampler for CUDA uses scalar_t for efficiency, so I have kept it as that. With this, I do not see the bias towards 1 as reported in #15738 with `float32`, but there is a spurious mode at 0.5, as would be expected. Users would need to explicitly use `float64` for GPU to not see the spurious mode at 0.5. (EDIT: see note below, it appears that the bias issue is still there for certain builds). Added some tests and checked that there is no perf regression. My experience with C++ is very limited, so apologies in advance if I missed something basic. cc. ailzhang, fritzo, fmassa Pull Request resolved: pytorch/pytorch#17488 Differential Revision: D14410301 Pulled By: ezyang fbshipit-source-id: 62b2f694b4642685eab06db96d74ce28e05c3992
Summary: Addresses pytorch#15738, using fritzo's suggestion. This adds a `torch._sample_dirichlet` method in `Distributions.cpp` and `Distributions.cu`. - For CPU, this leads to no perf hit since all we do is to promote the `alpha` to double when getting the gamma samples (the gamma sampler anyways uses `accscalar_t`(double for CPU)) and cast it back to float32 on return. - I have added an analogous method for CUDA as well, but the default sampler for CUDA uses scalar_t for efficiency, so I have kept it as that. With this, I do not see the bias towards 1 as reported in pytorch#15738 with `float32`, but there is a spurious mode at 0.5, as would be expected. Users would need to explicitly use `float64` for GPU to not see the spurious mode at 0.5. (EDIT: see note below, it appears that the bias issue is still there for certain builds). Added some tests and checked that there is no perf regression. My experience with C++ is very limited, so apologies in advance if I missed something basic. cc. ailzhang, fritzo, fmassa Pull Request resolved: pytorch#17488 Differential Revision: D14410301 Pulled By: ezyang fbshipit-source-id: 62b2f694b4642685eab06db96d74ce28e05c3992
Addresses #15738, using @fritzo's suggestion. This adds a
torch._sample_dirichletmethod inDistributions.cppandDistributions.cu.alphato double when getting the gamma samples (the gamma sampler anyways usesaccscalar_t(double for CPU)) and cast it back to float32 on return.float32, but there is a spurious mode at 0.5, as would be expected. Users would need to explicitly usefloat64for GPU to not see the spurious mode at 0.5. (EDIT: see note below, it appears that the bias issue is still there for certain builds).Added some tests and checked that there is no perf regression. My experience with C++ is very limited, so apologies in advance if I missed something basic. cc. @ailzhang, @fritzo, @fmassa