Hi,
I had a question regarding the PyTorch implementation of LearnedMixin.
```python
class LearnedMixin(ClfDebiasLossFunction):
    def forward(self, hidden, logits, bias, labels):
        logits = logits.float()  # In case we were in fp16 mode
        logits = F.log_softmax(logits, 1)

        factor = self.bias_lin.forward(hidden)
        factor = factor.float()
        factor = F.softplus(factor)

        bias = bias * factor

        bias_lp = F.log_softmax(bias, 1)
        entropy = -(torch.exp(bias_lp) * bias_lp).sum(1).mean(0)

        loss = F.cross_entropy(logits + bias, labels) + self.penalty * entropy
        return loss
```
The forward function adds the logits and bias variables; however, logits has already been log-softmaxed, whereas bias has not (bias seems to be the raw logits from the bias-only model). Should we really apply log_softmax to logits before passing the sum into cross_entropy? Could you explain the reasoning behind this?
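For concreteness, here is a minimal sketch of the two variants I am asking about, using hypothetical random tensors (shapes and values are purely illustrative, not taken from the repo); printing both makes it easy to check whether the extra log_softmax changes the loss value:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes and values, purely for illustration.
torch.manual_seed(0)
batch, num_classes = 4, 3

logits = torch.randn(batch, num_classes)          # main model's raw logits
bias = torch.randn(batch, num_classes)            # stand-in for the (scaled) bias term
labels = torch.randint(0, num_classes, (batch,))

# Variant used in LearnedMixin.forward: log-softmax the logits before adding the bias.
loss_with_log_softmax = F.cross_entropy(F.log_softmax(logits, 1) + bias, labels)

# Variant without the extra log-softmax on the logits.
loss_without = F.cross_entropy(logits + bias, labels)

print(loss_with_log_softmax.item(), loss_without.item())
```

If I am reasoning correctly, log_softmax only subtracts the per-row logsumexp, and cross_entropy is invariant to per-row constant shifts, so the two values should coincide up to floating-point error, which makes me wonder what the extra log_softmax is meant to accomplish here.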