[Quant][X86] add an op to compute uint8 batch norm 2d #152811


Closed · 8 commits

Conversation

@Xia-Weiwen (Collaborator) commented May 5, 2025

Stack from ghstack (oldest at bottom):

Summary
This PR adds a new op, onednn.qbatch_norm2d, which accepts uint8 inputs on the CPU device (instead of QuantizedCPU).
The new op is implemented with AVX512 instructions and provides performance similar to its QuantizedCPU counterpart, quantized.batch_norm2d.
It also supports output dtypes other than uint8 (fp32, fp16, and bf16).
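
For context, a uint8 batch norm kernel conceptually dequantizes the input, applies inference-mode batch normalization with the running statistics, and requantizes the result. Below is a minimal pure-PyTorch sketch of those semantics; the function name, argument order, and quantization parameters are illustrative assumptions and do not reflect the actual signature of onednn.qbatch_norm2d:

```python
import torch
import torch.nn.functional as F

# Hypothetical reference for illustration only; not the real op's signature.
def uint8_batch_norm2d_reference(x_u8, x_scale, x_zp,
                                 weight, bias, running_mean, running_var,
                                 y_scale, y_zp, eps=1e-5):
    # Dequantize the plain uint8 NCHW input to fp32.
    x = (x_u8.to(torch.float32) - x_zp) * x_scale
    # Inference-mode batch norm using running statistics.
    y = F.batch_norm(x, running_mean, running_var, weight, bias,
                     training=False, eps=eps)
    # Requantize the fp32 result back to uint8.
    return torch.clamp(torch.round(y / y_scale) + y_zp, 0, 255).to(torch.uint8)

# Illustrative call; scale/zero-point values are chosen arbitrarily.
x = torch.randint(0, 256, (1, 3, 8, 8), dtype=torch.uint8)
y = uint8_batch_norm2d_reference(x, 0.1, 128,
                                 torch.ones(3), torch.zeros(3),
                                 torch.zeros(3), torch.ones(3),
                                 0.1, 128)
```

The AVX512 kernel in this PR presumably avoids materializing the intermediate fp32 tensor; the sketch only illustrates the numerics being computed.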

Test plan

pytest test/quantization/core/test_quantized_op.py -k test_int8_batch_norm_onednn

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168

pytorch-bot bot commented May 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152811

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ab2a1c8 with merge base f5e0806:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the module: cpu (CPU specific problem, e.g., perf, algorithm) and release notes: quantization labels May 5, 2025
Xia-Weiwen pushed ghstack updates to this pull request on May 5, 2025 (source IDs 949cb72, dfcf0db).

@Xia-Weiwen marked this pull request as draft May 5, 2025 08:13

Xia-Weiwen pushed further ghstack updates on May 6–7, 2025 (source IDs 3b393ae, 0c2d840, d3e06ef, f1f01f2, 1d1f4ec).

@Xia-Weiwen added the intel (This tag is for PRs from Intel) label May 6, 2025
@@ -407,4 +489,8 @@ TORCH_LIBRARY_IMPL(quantized, QuantizedCPU, m) {
   m.impl(TORCH_SELECTIVE_NAME("quantized::batch_norm3d_relu"), TORCH_FN(q_batch_norm3d_impl<true>));
 }
+
+TORCH_LIBRARY_IMPL(onednn, CPU, m) {
+  m.impl(TORCH_SELECTIVE_NAME("onednn::qbatch_norm2d"), TORCH_FN(int8_batch_norm2d_cpu_impl));
+}
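
Note that the registration above targets the plain CPU dispatch key, whereas the existing quantized ops are registered for QuantizedCPU. A small sketch of what that distinction means for inputs (shapes and quantization parameters are illustrative):

```python
import torch

# A plain uint8 tensor dispatches through the CPU key; this is the kind of
# input the new onednn::qbatch_norm2d kernel accepts, presumably with the
# quantization parameters passed as explicit op arguments.
x_plain = torch.randint(0, 256, (1, 3, 8, 8), dtype=torch.uint8)
print(x_plain.is_quantized)  # False

# A quantized tensor carries its scale/zero point internally and dispatches
# through the QuantizedCPU key (the existing quantized::batch_norm2d path).
x_quant = torch.quantize_per_tensor(torch.randn(1, 3, 8, 8),
                                    scale=0.1, zero_point=128,
                                    dtype=torch.quint8)
print(x_quant.is_quantized)  # True
```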
Collaborator:

Hmm, even though the op is put under the "onednn" namespace, it seems the implementation does not rely on oneDNN at all?

Collaborator Author:

Yes, that's true. It's because all the ops used for PT2E quantization are put under the onednn namespace, and they should only be used with the X86Inductor backend. We decided to keep that practice. Do you have any concerns? Thanks.

Collaborator:

Perhaps we can separate the op namespace from the quantization backend name, such as "onednn". For an op, "onednn" would mean the implementation relies on the oneDNN library, and we wouldn't have to put other quantized ops under the "onednn" namespace. On the other hand, the quantization support in the "onednn" quant backend could leverage ops implemented with both the oneDNN library and native ATen. What do you think?

Collaborator Author:

Thanks for the suggestion. It makes sense to me. How would you suggest we name the namespace? x86 or still quantized? cc @leslie-fang-intel @jerryzh168

Collaborator Author:

Hi @jerryzh168 @leslie-fang-intel Do you think it's OK to use the onednn namespace even if we don't use the oneDNN library for the implementation? If not, do you have any suggestions for the name of the namespace? Thanks.

Collaborator:

BTW, this could be long-term refactoring work, not necessarily blocking this PR from landing.

@jerryzh168 (Contributor) commented May 13, 2025:

It can be x86, I think, since it's used in the X86 Inductor backend. We are deprecating the quantized namespace as we deprecate eager-mode and FX graph mode quantization in PyTorch.

Collaborator Author:

@jgong5 @jerryzh168 Thanks for your suggestions. I will keep this PR as is and refactor this part in a future PR.

@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review May 9, 2025 01:14
@Xia-Weiwen Xia-Weiwen requested review from jgong5 and jerryzh168 May 9, 2025 01:14
Xia-Weiwen pushed another ghstack update May 15, 2025 (source ID d1bd152).
@Xia-Weiwen (Collaborator Author):

@pytorchbot merge

@pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label May 16, 2025
@pytorchmergebot (Collaborator):
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@github-actions github-actions bot deleted the gh/Xia-Weiwen/39/head branch June 19, 2025 02:19
Labels: ciflow/trunk, intel, Merged, module: cpu, open source, release notes: quantization
6 participants