Add tf.stop_gradient in tf.summary.histogram#5311
Conversation
Add tf.stop_gradient in tf.summary.histogram. The function is not differentiable, and building gradient graphs for its conds have been causing some issues. This should be a no-op with a tiny positive performance impact for most users, and will work around an error for those using XLA with multiple/persistent tf.GradientTapes (due to XlaDynamicUpdateSlice not having a gradient defined; there is a separate TensorFlow bug about the op with no gradient, which is the more satisfying medium-term solution here).
stephanwlee
left a comment
There was a problem hiding this comment.
The change seems logic. Wondering, is this unique to histograms or other Summary APIs as well? Also, would you mind giving test a stab?
we might be able to do a unit test with XLA+CPU in the TensorBoard repo
If this is not true, I am okay with just moving forward as is.
|
Sure, added a test. It's quite weird since summaries can't actually run with XLA CPU/GPU (no outside compilation support), but it does fail before the stop_gradient change. As far as I know the issue is limited to histograms, the combination of a particular slice op and tf.cond. But I haven't looked thoroughly. Like I say, the more satisfying solution is just to define a gradient for the op, this is just to buy some time. |
FWIW, we're trying to simplify this op for TPU-related reasons anyway (#2885) so that would presumably also address the issue if it removes the offending slice + cond combination. |
Add tf.stop_gradient in tf.summary.histogram. The function is not differentiable, and building gradient graphs for its conds have been causing some issues. This should be a no-op with a tiny positive performance impact for most users, and will work around an error for those using XLA with multiple/persistent tf.GradientTapes (due to XlaDynamicUpdateSlice not having a gradient defined; there is a separate TensorFlow bug about the op with no gradient, which is the more satisfying medium-term solution here).
Add tf.stop_gradient in tf.summary.histogram. The function is not differentiable, and building gradient graphs for its conds have been causing some issues. This should be a no-op with a tiny positive performance impact for most users, and will work around an error for those using XLA with multiple/persistent tf.GradientTapes (due to XlaDynamicUpdateSlice not having a gradient defined; there is a separate TensorFlow bug about the op with no gradient, which is the more satisfying medium-term solution here).
Working around a user issue. It's intended as a temporary fix, but keeping it long-term doesn't have any downsides that I know of.
Adds tf.stop_gradient in tf.summary.histogram. The function is not differentiable, and building gradient graphs for its conds have been causing some issues. This should be a no-op with a tiny positive performance impact for most users, and will work around an error for those using XLA with multiple/persistent tf.GradientTapes (due to XlaDynamicUpdateSlice not having a gradient defined; there is a separate TensorFlow bug about the op with no gradient, which is the more satisfying medium-term solution here).
N/A; this just changes the implementation of a summary API.
Two folks (at Alphabet) have confirmed that this works around an issue they reported for now. It's a bit tricky to reproduce; we might be able to do a unit test with XLA+CPU in the TensorBoard repo, although the original report relates to TPU usage.
The op should have a gradient defined, and will eventually.