Investigate alternative approach for Q4 quantization 

Currently, in [Q4_0](https://github.com/ggerganov/ggml/pull/27) quantization we choose the scaling factor for each group of 32 weights as `max(abs(x_i))/7`. It is easy to see that this is suboptimal.

Consider quantization of the following 4 numbers:

`0.1 0.2 0.3 0.6`

Currently, we would determine a scaling factor of `0.6 / 7 ~= 0.0857` and the dequantized numbers will be:

`0.0857 0.1714 0.3428 0.6`
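The numbers above can be reproduced with a small sketch of the abs-max scheme. This is plain Python for illustration, not the ggml implementation: the names `quantize_absmax` and `dequantize` are hypothetical, and the integer levels are assumed to lie in `-7..7`. Note that `0.3` sits exactly halfway between levels 3 and 4 at this scale, so which one it lands on depends on the rounding rule and on floating-point noise.

```python
def quantize_absmax(xs, qmax=7):
    """Abs-max scheme: scale = max(|x|) / qmax, then round to the nearest level."""
    amax = max(abs(x) for x in xs)
    d = amax / qmax if amax > 0 else 1.0
    # Clamp so that every level stays inside [-qmax, qmax].
    qs = [max(-qmax, min(qmax, round(x / d))) for x in xs]
    return d, qs


def dequantize(d, qs):
    """Map integer levels back to floating-point values."""
    return [d * q for q in qs]


d, qs = quantize_absmax([0.1, 0.2, 0.3, 0.6])
print(d)                  # scaling factor, about 0.0857
print(qs)                 # integer levels
print(dequantize(d, qs))  # reconstructed values
```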

So the RMS error between the dequantized and original values will be non-zero:

`sqrt(((0.1 - 0.0857)^2 + (0.2 - 0.1714)^2 + (0.3 - 0.3428)^2 + (0.6 - 0.6)^2) / 4) ~= 0.0267 > 0.0`

However, if we choose the scaling factor to be `0.1` instead, then it is easy to see that the original numbers will be quantized perfectly.
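To make the comparison concrete, here is a minimal sketch that measures the error for both scaling factors. `rms_error` is a hypothetical helper, RMS is taken as the square root of the mean squared error, and the levels are assumed to be clamped to `-7..7`.

```python
import math


def rms_error(xs, d, qmax=7):
    """RMS error after quantizing xs with scale d and dequantizing again."""
    total = 0.0
    for x in xs:
        q = max(-qmax, min(qmax, round(x / d)))  # nearest level, clamped
        total += (x - d * q) ** 2
    return math.sqrt(total / len(xs))


xs = [0.1, 0.2, 0.3, 0.6]
print(rms_error(xs, 0.6 / 7))  # abs-max scale: roughly 0.027
print(rms_error(xs, 0.1))      # scale 0.1: zero up to floating-point noise
```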

So the scaling factor should be chosen as the one that minimises some error measure (e.g. RMS, or whatever is more meaningful and easy to compute). In terms of that measure, this cannot be worse than the existing approach, because the current scaling factor is itself one of the candidates. The question is: how much better?
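One simple way to explore this idea is a brute-force search over candidate scales. The sketch below is only an assumption about how such a search could look: the function names, the `0.5x..1.5x` search window and the number of steps are arbitrary choices for illustration, not something prescribed in this issue.

```python
import math

QMAX = 7  # assumed level range: -7..7


def rms_error(xs, d):
    """RMS error after quantizing xs with scale d and dequantizing again."""
    qs = [max(-QMAX, min(QMAX, round(x / d))) for x in xs]
    return math.sqrt(sum((x - d * q) ** 2 for x, q in zip(xs, qs)) / len(xs))


def search_scale(xs, steps=200):
    """Try scales around the abs-max scale and keep the one with the lowest RMS error."""
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return 1.0
    base = amax / QMAX
    best_d, best_err = base, rms_error(xs, base)  # the current choice is the baseline
    for i in range(1, steps + 1):
        d = base * (0.5 + i / steps)  # candidates from just above 0.5x up to 1.5x the baseline
        err = rms_error(xs, d)
        if err < best_err:
            best_d, best_err = d, err
    return best_d


xs = [0.1, 0.2, 0.3, 0.6]
d = search_scale(xs)
print(d, rms_error(xs, d))
```

Because the abs-max scale is the starting candidate, the result is never worse than the baseline on the chosen error measure. A grid like this only finds the best scale among the candidates it tries, so it will usually land near `0.1` for the example rather than exactly on it.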

The goal of this task is to implement the quantization described above and evaluate the perplexity using the new approach. In simple terms, the approach boils down to a linear regression of the data with a fixed zero point. The new quantization might be a bit heavier to compute than `Q4_0`, so to start we can apply it only to the model tensors. The intermediate tensors during the evaluation can remain quantized using the existing approach, so that the evaluation stays efficient. If the results look promising, we can put effort into optimising the new approach and completely replacing `Q4_0` with it.
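The regression view can be sketched as follows. For fixed integer levels `q_i`, the scale that minimises the squared error of `x_i ~ d * q_i` is `d = sum(x_i * q_i) / sum(q_i^2)`, which is the least-squares fit of a line through the origin. The code alternates between rounding with the current scale and refitting the scale. The names `refit_scale` and `fit_scale`, the `-7..7` range and the iteration cap are assumptions made for this illustration.

```python
def refit_scale(xs, qs):
    """Least-squares scale for fixed levels: d = sum(x*q) / sum(q*q)."""
    den = sum(q * q for q in qs)
    return sum(x * q for x, q in zip(xs, qs)) / den if den > 0 else 0.0


def fit_scale(xs, d0, qmax=7, iters=10):
    """Alternate between rounding with scale d and refitting d (d0 must be > 0)."""
    d = d0
    for _ in range(iters):
        qs = [max(-qmax, min(qmax, round(x / d))) for x in xs]
        d_new = refit_scale(xs, qs)
        if d_new == 0.0 or d_new == d:  # all levels zero, or converged
            break
        d = d_new
    return d


xs = [0.1, 0.2, 0.3, 0.6]
print(fit_scale(xs, 0.6 / 7))  # starts at the abs-max scale, settles nearby
print(fit_scale(xs, 0.099))    # starts near 0.1, settles at about 0.1
```

Neither step can increase the squared error, but the alternation only reaches a local optimum: starting from the abs-max scale it does not find `0.1` for the example. In practice it would probably need to be combined with several starting scales.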

Whoever demonstrates the results of this quantization will get the chance to give it a name and publish a paper (just kidding 😆 )

A similar strategy for determining the scale factor and the offset can be applied to `Q4_1`.
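For a scale-plus-offset format the same idea becomes an ordinary linear regression `x_i ~ d * q_i + m`. The sketch below assumes a min/max baseline (offset `m = min(x)`, scale `d = (max(x) - min(x)) / 15`, levels `0..15`) and then refits `d` and `m` by least squares for the levels that baseline produced. All function names are hypothetical, and the baseline is an assumption about how `Q4_1` picks its parameters rather than something stated in this issue.

```python
def quantize_minmax(xs, levels=15):
    """Baseline: offset = min(x), scale = (max - min) / levels."""
    lo, hi = min(xs), max(xs)
    d = (hi - lo) / levels if hi > lo else 1.0
    qs = [max(0, min(levels, round((x - lo) / d))) for x in xs]
    return d, lo, qs


def refit_scale_offset(xs, qs):
    """Ordinary least squares for x ~ d * q + m with the levels qs held fixed."""
    n = len(xs)
    mean_q, mean_x = sum(qs) / n, sum(xs) / n
    var_q = sum((q - mean_q) ** 2 for q in qs)
    if var_q == 0:  # all levels equal: any scale works, so pick 1.0
        return 1.0, mean_x - mean_q
    d = sum((q - mean_q) * (x - mean_x) for x, q in zip(xs, qs)) / var_q
    return d, mean_x - d * mean_q


def sq_error(xs, d, m, qs):
    """Sum of squared reconstruction errors."""
    return sum((x - (d * q + m)) ** 2 for x, q in zip(xs, qs))


xs = [0.1, 0.25, 0.33, 0.7, 0.9]
d0, m0, qs = quantize_minmax(xs)
d1, m1 = refit_scale_offset(xs, qs)
print(sq_error(xs, d0, m0, qs), sq_error(xs, d1, m1, qs))
```

For fixed levels the refit cannot be worse than the baseline in squared error. As with the zero-offset case, re-rounding with the new parameters and repeating the fit is a natural next step.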






Investigate alternative approach for Q4 quantization #397
