
Investigate alternative approach for Q4 quantization  #397

Closed
@ggerganov

Description


Currently, in Q4_0 quantization we choose the scaling factor for each group of 32 weights as max(abs(x_i)) / 7. It is easy to see that this is suboptimal.
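For reference, here is a minimal sketch of the current max-abs scheme applied to a single block. It works on plain Python floats rather than ggml's packed 4-bit layout, and the function names are hypothetical. It also assumes that the integer levels are clamped to the signed 4-bit range [-8, 7] and that halves are rounded away from zero, as C's `roundf` does.

```python
import math


def round_half_away(v: float) -> int:
    """Round to the nearest integer, halves away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def q4_0_scale(block: list[float]) -> float:
    """Current choice: the largest magnitude in the block maps to +/-7."""
    return max(abs(x) for x in block) / 7.0


def quantize(block: list[float], d: float) -> list[int]:
    """Map each value to the nearest integer level, clamped to [-8, 7] (hypothetical helper)."""
    if d == 0.0:
        return [0] * len(block)
    return [max(-8, min(7, round_half_away(x / d))) for x in block]


def dequantize(q: list[int], d: float) -> list[float]:
    """Reconstruct approximate values from integer levels and the block scale."""
    return [qi * d for qi in q]


block = [0.1, 0.2, 0.3, 0.6]
d = q4_0_scale(block)
q = quantize(block, d)
print(q)                 # [1, 2, 4, 7]
print(dequantize(q, d))  # reconstructed values
```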

Consider quantization of the following 4 numbers:

0.1 0.2 0.3 0.6

Currently, we would determine a scaling factor of 0.6 / 7 ~= 0.0857. The values map to the integer levels 1, 2, 4, 7, so the dequantized numbers will be:

0.0857 0.1714 0.3428 0.6

So the RMS error between the original and dequantized values will be non-zero:

sqrt(((0.1 - 0.0857)^2 + (0.2 - 0.1714)^2 + (0.3 - 0.3428)^2 + (0.6 - 0.6)^2) / 4) > 0.0

However, if we choose the scaling factor to be 0.1 instead, the original numbers are quantized perfectly, because they map exactly to the levels 1, 2, 3, 6.
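The two scales can be compared numerically with a short sketch. It quantizes the block with a given scale, dequantizes it, and reports the RMS error. `rms_error` is a hypothetical helper and is not part of ggml. As before, the levels are assumed to be clamped to [-8, 7] and halves are rounded away from zero.

```python
import math


def round_half_away(v: float) -> int:
    """Round to the nearest integer, halves away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def rms_error(block: list[float], d: float, qmin: int = -8, qmax: int = 7) -> float:
    """RMS error after quantizing `block` with scale d and dequantizing it again."""
    err2 = 0.0
    for x in block:
        q = max(qmin, min(qmax, round_half_away(x / d)))
        err2 += (x - q * d) ** 2
    return math.sqrt(err2 / len(block))


block = [0.1, 0.2, 0.3, 0.6]
print(rms_error(block, max(abs(x) for x in block) / 7))  # current max-abs scale
print(rms_error(block, 0.1))                              # hand-picked scale
```

With the max-abs scale the RMS error is about 0.0267. With d = 0.1 the only remaining error is floating-point round-off.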

So it is better to choose the scaling factor that minimises some error measure (e.g. RMS, or whatever is more meaningful and easy to compute). The current max(abs(x_i)) / 7 scale is one of the candidates, so this can only reduce the per-block error compared to the existing approach. The question is: how much better will the results be?
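A brute-force search is one simple way to pick the scale that minimises the error. The sketch below is an assumption and not the proposed implementation. It tries d = amax / k for k from 1 to 8 in steps of 1/20 (an arbitrary grid) and keeps the scale with the lowest RMS error. k = 7 is in the grid, so the result is never worse than the current choice. The names are hypothetical.

```python
import math


def round_half_away(v: float) -> int:
    """Round to the nearest integer, halves away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def rms_error(block: list[float], d: float, qmin: int = -8, qmax: int = 7) -> float:
    """RMS error after quantize/dequantize with scale d (levels clamped to [qmin, qmax])."""
    err2 = 0.0
    for x in block:
        q = max(qmin, min(qmax, round_half_away(x / d)))
        err2 += (x - q * d) ** 2
    return math.sqrt(err2 / len(block))


def best_scale_grid(block: list[float], steps: int = 20) -> tuple[float, float]:
    """Try d = amax / k for k in [1, 8] on a grid of 1/steps; return (best d, its RMS error)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0, 0.0                  # all-zero block is represented exactly
    best_d, best_err = amax / 7.0, math.inf
    for i in range(steps, 8 * steps + 1):
        d = amax / (i / steps)           # k = i / steps is the level the largest value maps to
        err = rms_error(block, d)
        if err < best_err:
            best_d, best_err = d, err
    return best_d, best_err


block = [0.1, 0.2, 0.3, 0.6]
d, err = best_scale_grid(block)
print(d, err)
```

For the example block this returns d ~= 0.1, and the RMS error is at the level of floating-point round-off. The cost is about 141 candidates per block, which may be acceptable for converting model tensors offline.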

The goal of this task is to implement the quantization described above and to evaluate perplexity with it. In simple terms, the approach boils down to a linear regression of the data with a fixed zero point (no intercept). This quantization might be somewhat heavier to compute than Q4_0, so to start we can apply it only to the model tensors. The intermediate tensors can remain quantized with the existing approach during evaluation, so that evaluation stays efficient. If the results look promising, we can put effort into optimising the new approach and replacing Q4_0 with it completely.
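The "linear regression with a fixed zero point" can be made concrete. Once the integer levels q_i are fixed, the least-squares scale for x_i ~= d * q_i with no intercept is d = sum(x_i * q_i) / sum(q_i^2). The sketch below alternates between re-quantizing and refitting d, starting from the current max-abs scale. This is a Lloyd-style iteration chosen here as an assumption, the function names are hypothetical, and the level range [-8, 7] is also assumed.

```python
import math


def round_half_away(v: float) -> int:
    """Round to the nearest integer, halves away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def fit_scale_through_origin(block, iters=10, qmin=-8, qmax=7):
    """Alternate between quantizing with scale d and refitting d by least squares (no intercept)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0, [0] * len(block)
    d = amax / 7.0                       # start from the current Q4_0 choice
    q = None
    for _ in range(iters):
        new_q = [max(qmin, min(qmax, round_half_away(x / d))) for x in block]
        if new_q == q:
            break                        # levels unchanged: fixed point reached
        q = new_q
        den = sum(qi * qi for qi in q)
        if den == 0:
            break                        # every value rounded to 0; keep the previous d
        # Regression through the origin: this d minimises sum((x - d*q)^2) for the given q.
        d = sum(x * qi for x, qi in zip(block, q)) / den
    return d, q


block = [0.1, 0.2, 0.3, 0.6]
d, q = fit_scale_through_origin(block)
print(d, q)
```

On the 4-value example the iteration stops after one refit, at d = 5.9 / 70 ~= 0.0843 with levels [1, 2, 4, 7]. This lowers the RMS error from about 0.0267 to about 0.0260, but it does not reach the exact d = 0.1. The alternation only finds a local optimum, so in practice it would likely be combined with several starting scales, such as a small grid of candidates.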

Whoever demonstrates the results of this quantization will get the chance to give it a name and publish a paper (just kidding 😆 )

A similar strategy for determining the scale factor and the offset can be applied to Q4_1.
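For Q4_1 the model becomes x_i ~= m + d * q_i with unsigned levels q_i in [0, 15]. With q fixed, the refit is therefore an ordinary least-squares line with an intercept. The sketch below assumes a min/max starting point (m = min, d = (max - min) / 15) and reuses the alternate-and-refit idea. It illustrates the idea under those assumptions and is not ggml's Q4_1 code. The function name is hypothetical.

```python
import math


def fit_scale_and_offset(block, iters=10, qmax=15):
    """Alternate between quantizing to levels 0..qmax and refitting x ~= m + d*q by least squares.

    Hypothetical helper; the min/max starting point is an assumption.
    """
    lo, hi = min(block), max(block)
    n = len(block)
    if hi == lo:
        return 0.0, lo, [0] * n          # constant block: the offset alone is exact
    d, m = (hi - lo) / qmax, lo          # start: min maps to level 0, max to level qmax
    q = None
    for _ in range(iters):
        # Round to the nearest level (halves go up) and clamp to [0, qmax].
        new_q = [max(0, min(qmax, math.floor((x - m) / d + 0.5))) for x in block]
        if new_q == q:
            break                        # levels unchanged: fixed point reached
        q = new_q
        # Ordinary least squares for slope d and intercept m, given q.
        sq, sx = sum(q), sum(block)
        sqq = sum(qi * qi for qi in q)
        sxq = sum(x * qi for x, qi in zip(block, q))
        det = n * sqq - sq * sq
        if det == 0:
            break                        # all levels equal: slope is undetermined
        d = (n * sxq - sq * sx) / det
        m = (sx - d * sq) / n
    return d, m, q


block = [0.1, 0.2, 0.3, 0.6]
d, m, q = fit_scale_and_offset(block)
print(d, m, q)
```

On the example block the min/max start already reproduces the values with levels [0, 3, 6, 15], so the refit leaves d and m essentially unchanged.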
