

Reduce mish error by an alternative without softplus op #2618

Open
ChinChangYang wants to merge 1 commit into apple:main from ChinChangYang:reduce-mish-error

Conversation

@ChinChangYang (Contributor)

Fixes the high numerical error in the mish activation (#2359).

Algorithm:

e = exp(x)
mish = x / (1 + 2 / (e * (e + 2)))
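
The rewrite follows from the identity mish(x) = x * tanh(softplus(x)): with e = exp(x) and t = 1 + e, tanh(ln t) = (t^2 - 1) / (t^2 + 1) = e(e + 2) / (e(e + 2) + 2), which simplifies to mish(x) = x / (1 + 2 / (e * (e + 2))). A minimal NumPy sketch (illustrative only, not the PR's MIL converter code) checking that the two formulations agree in float64:

```python
import numpy as np

def mish_reference(x):
    # Reference definition: mish(x) = x * tanh(softplus(x))
    return x * np.tanh(np.log1p(np.exp(x)))

def mish_alternative(x):
    # Softplus-free form: with e = exp(x),
    # tanh(log(1 + e)) == e*(e + 2) / (e*(e + 2) + 2),
    # so mish(x) == x / (1 + 2 / (e * (e + 2)))
    e = np.exp(x)
    return x / (1.0 + 2.0 / (e * (e + 2.0)))

x = np.linspace(-10.0, 10.0, 2001, dtype=np.float64)
assert np.allclose(mish_reference(x), mish_alternative(x))
```

The reformulation needs only exp, add, multiply, and divide, sidestepping the softplus op that triggers the high NE error.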

Evaluation:

In the following experiments, the mean absolute errors are evaluated using the method described in #2359 (comment).

Before this change, the Neural Engine (NE) produces a high numerical error:

Mean Absolute Errors Across Samples:
  var_17:
    NE:  2.955052
    GPU: 0.000998

With the new algorithm, the NE produces a low numerical error, comparable to the GPU:

Mean Absolute Errors Across Samples:
  var_17:
    NE:  0.001744
    GPU: 0.001516

A tester reported that the new mish function produces NaN only when x is -Inf in float16.
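
The -Inf corner case can be reproduced in plain float16 arithmetic (an illustrative NumPy sketch, not the Core ML kernel): exp(-inf) is 0, so the inner division 2 / (e * (e + 2)) overflows to inf, and -inf / inf yields NaN, while large-but-finite negative inputs merely underflow toward zero.

```python
import numpy as np

def mish_alt_fp16(x):
    # The alternative formulation evaluated entirely in float16,
    # mimicking half-precision hardware; this is a sketch, not the NE kernel.
    x = np.float16(x)
    e = np.exp(x)
    return x / (np.float16(1) + np.float16(2) / (e * (e + np.float16(2))))

with np.errstate(divide="ignore", invalid="ignore"):
    # x = -inf: e = 0, inner division is 2/0 = inf, then -inf/inf = NaN
    assert np.isnan(mish_alt_fp16(-np.inf))
    # Finite negative x stays finite, even when exp(x) underflows to 0
    assert np.isfinite(mish_alt_fp16(-20.0))
```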

Performance:

This change has been adopted in the KataGo Core ML backend (ChinChangYang/KataGo#7). Inference with the new mish activation (7.15 ms) performs similarly to the original implementation (7.03 ms).

Conclusion:

Overall, the change enhances the accuracy and reliability of the mish activation in Core ML models.

# Existing converter lowering of mish via the softplus op:
# mish(x) = x * tanh(softplus(x))
inputs = _get_inputs(context, node, expected=1)
x = inputs[0]

softplus = mb.softplus(x=x)
Collaborator

Looking at the PyTorch documentation, it seems the existing implementation is correct:
https://docs.pytorch.org/docs/stable/generated/torch.nn.Mish.html

Contributor (Author)

If the existing (software) implementation is correct, it must be a hardware precision issue in the Neural Engine. This PR provides a (software) workaround to circumvent the precision issue. I anticipate that Apple’s low-level (hardware) developers will investigate this issue.
