Should make the doc of nn.CrossEntropyLoss() more clear #134853


Closed

hyperkai opened this issue Aug 30, 2024 · 1 comment
Labels
  • module: docs (Related to our documentation, both in docs/ and docblocks)
  • module: loss (Problem is related to loss function)
  • module: nn (Related to torch.nn)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


hyperkai commented Aug 30, 2024

πŸ“š The doc issue

The doc of nn.CrossEntropyLoss() explains the target tensor in a complicated way, as shown below, and it's difficult to understand:

[Screenshot 2024-08-30 195003: the target description from the nn.CrossEntropyLoss() doc]

So, from my understanding and experiments, the simple explanations below should be added to the doc; they are easy to understand:

  • A target tensor whose size is different from the input tensor's is treated as class indices.
  • A target tensor whose size is the same as the input tensor's is treated as class probabilities, which should be in the range [0, 1] (see the sketch after this list).
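A minimal sketch of the two cases (shapes and values chosen just for illustration):

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
input = torch.randn(3, 5)  # (batch, num_classes) raw logits

# Target of shape (3,), i.e. a different size from input:
# treated as class indices (integer dtype required).
target_idx = torch.tensor([1, 0, 4])
print(loss_fn(input, target_idx))

# Target of shape (3, 5), i.e. the same size as input:
# treated as class probabilities; each row should lie in
# [0, 1] and sum to 1.
target_prob = torch.softmax(torch.randn(3, 5), dim=1)
print(loss_fn(input, target_prob))
```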

And, from what the doc says below and from my experiments, when the target tensor is treated as class indices, softmax() is used internally for both the input and target tensors:

The target that this criterion expects should contain either:

  • Class indices in the range ...
    ...
    Note that this case is equivalent to applying LogSoftmax on an input, followed by NLLLoss.

But when the target tensor is treated as class probabilities, softmax() is used internally only for the input tensor. That's why the doc's example with a class-indices target doesn't apply softmax() externally, while its example with a class-probabilities target does, as shown below:

[Screenshot 2024-08-30 201043: the two usage examples from the nn.CrossEntropyLoss() doc]
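For reference, the two examples from the doc read roughly as follows (reconstructed from the nn.CrossEntropyLoss() docs); note that softmax() appears only in the class-probabilities example, and only on the target:

```python
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()

# Example of target with class indices -- no external softmax() at all.
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()

# Example of target with class probabilities -- softmax() is applied
# externally, and only to the target.
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()
```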

So the doc should also say something like the points below (the words class indices mode and class probabilities mode could also be used):

  • softmax() is used internally on the input tensor both when the target tensor is treated as class indices and when it is treated as class probabilities, so you don't need to apply softmax() to the input externally (a quick check follows this list).
  • softmax() is used internally on the target tensor only when the target tensor is treated as class indices, so you should apply softmax() to the target externally when the target tensor is treated as class probabilities.
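The first point can be checked directly: cross_entropy() on raw logits matches nll_loss() on log_softmax() of those logits, which is what the doc's "equivalent to applying LogSoftmax on an input, followed by NLLLoss" note says. A quick sketch, assuming the shapes above:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)
target = torch.tensor([0, 4, 2])

# cross_entropy() takes raw logits ...
a = F.cross_entropy(logits, target)
# ... and matches nll_loss() on log_softmax(logits), i.e. the
# (log-)softmax on the input happens internally, not in user code.
b = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(a, b))  # True
```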

Suggest a potential alternative/fix

No response

cc @svekars @brycebortree @tstatler @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki

@jbschlosser added the module: docs, module: nn, module: loss, and triaged labels Aug 30, 2024
mikaylagawarecki (Contributor) commented Aug 30, 2024

I don't think it is the case that softmax is applied internally to the target when the target is class indices. Please correct me if I missed this, though.
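One quick way to check this (a minimal sketch; shapes chosen for illustration): the class-indices loss matches the class-probabilities loss with a one-hot target, which suggests the indices are used directly and no softmax is applied to them:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)
idx = torch.tensor([0, 4, 2])

# Class-indices mode ...
a = F.cross_entropy(logits, idx)
# ... matches class-probabilities mode with a one-hot target,
# so no softmax is applied to the target internally.
b = F.cross_entropy(logits, F.one_hot(idx, num_classes=5).float())
print(torch.allclose(a, b))  # True
```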

Secondly, I'm not sure it is the case that you should use softmax externally on the target tensor when the target is treated as class probabilities. The words class probabilities imply a probability distribution; softmax is one way to generate a probability distribution (and probably the most common one, indeed).

That said, I agree that the docs might not be that intuitive, and I'm happy to review attempts to improve them.

@svekars closed this as completed May 1, 2025