Export QAT model is not performing as expected when compared to the original model and FX Graph QAT #150746
Labels: needs reproduction, oncall: export, oncall: pt2, oncall: quantization
🐛 Describe the bug
I'm trying to perform QAT on MobileNetV2 with the goal of converting it to TFLite. However, after training the model, I run a benchmarking script to compare it against the original model and see that its performance degrades greatly.
Here are the important code snippets (I only included what I thought was relevant, since I didn't want to add confusion from all of my helper functions):
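A minimal sketch of the Export (PT2E) QAT flow in question, assuming the standard `prepare_qat_pt2e`/`convert_pt2e` API with the `XNNPACKQuantizer`; the quantizer choice, capture call, and training loop are assumptions, not the exact code from this report:

```python
import torch
from torchvision.models import mobilenet_v2
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = mobilenet_v2(weights="DEFAULT").train()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Capture an aten-level graph for PT2E quantization
# (older releases use capture_pre_autograd_graph instead).
exported = torch.export.export_for_training(model, example_inputs).module()

# Symmetric quantization config with fake-quant observers enabled for QAT.
quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config(is_qat=True)
)
prepared = prepare_qat_pt2e(exported, quantizer)

# ... run the usual QAT fine-tuning loop on `prepared` ...

# Exported models don't support .eval(); use the dedicated helper instead.
prepared = torch.ao.quantization.move_exported_model_to_eval(prepared)
quantized = convert_pt2e(prepared)  # graph with explicit quantize/dequantize ops
```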
Actual vs expected behavior:
I would expect the quantized model to have better performance than the original model, but it does not.
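As a hypothetical stand-in for the benchmarking script, a comparison of average forward-pass latency might look like the sketch below; `benchmark` and its arguments are illustrative, and whether "performance" here means latency, accuracy, or both is an assumption:

```python
import time
import torch

def benchmark(model, example_inputs, iters=100):
    # Average wall-clock latency over `iters` forward passes.
    with torch.no_grad():
        model(*example_inputs)  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            model(*example_inputs)
    return (time.perf_counter() - start) / iters

# e.g. benchmark(float_model, example_inputs) vs. benchmark(quantized, example_inputs)
```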
This is even stranger because if I switch to FX Graph QAT, I get the expected behavior. However, I need to use Export quantization since I want to use the ai-edge-torch API to convert my model to TFLite.
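For comparison, a sketch of the FX Graph Mode QAT path that reportedly behaves as expected; the backend string and training loop are assumptions:

```python
import torch
from torchvision.models import mobilenet_v2
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_fx

model = mobilenet_v2(weights="DEFAULT").train()
example_inputs = (torch.randn(1, 3, 224, 224),)

# "qnnpack" is a guess at the backend; "x86" is the other common choice.
qconfig_mapping = get_default_qat_qconfig_mapping("qnnpack")
prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)

# ... same QAT fine-tuning loop ...

prepared.eval()
quantized_fx = convert_fx(prepared)  # fused modules, e.g. QuantizedConvReLU2d
```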
Additionally, when I print the resulting QAT model I get the following:
I would think it would be more similar to the QAT model produced by FX Graph quantization, which leads me to believe that it is not training correctly. The FX Graph model is added below:
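For reference, `convert_pt2e` returns a flat `torch.fx.GraphModule` of aten ops with explicit `quantize_per_tensor`/`dequantize_per_tensor` nodes, while FX Graph Mode's `convert_fx` swaps in fused quantized modules such as `QuantizedConvReLU2d`, so the two printouts are expected to look quite different even when both train correctly. The PT2E graph can be inspected with something like this (assuming `quantized` is the `convert_pt2e` output):

```python
# Readable source-like dump of the converted PT2E graph.
quantized.print_readable()

# Node-by-node table of the graph (requires the `tabulate` package).
quantized.graph.print_tabular()
```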
Versions
My system has an AMD Ryzen™ Threadripper™ 7960X × 48 and an NVIDIA GeForce RTX 4090.
Here is my virtual env:
cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @msaroufim @chauhang @penguinwu @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4