Block quantize axes?

Do you know of any instances, in either existing models or requests to add support, that you could point me to?

🤔 ONNX's elementwise operators support generic broadcasting between inputs (you don't specify any axes or size parameters, since the broadcasting follows from the tensor sizes), and ONNX's resampling operator supports arbitrary axes (1D/2D/3D/4D/ND..., not just a subset of them). In contrast, ONNX's DequantizeLinear operator has always felt awkwardly non-generic to me, with a less obvious interaction of parameters: an axis parameter that sometimes is or isn't used, a block size that sometimes is or isn't used, and scale and ZP inputs that might each be a scalar, a 1D tensor, or a block.

Compare that with WebNN's dequantizeLinear and DirectML's DEQUANTIZE, where there is no block size or axis parameter (it would be redundant): all inputs must be the same rank and be blockwise broadcastable, and the scale and ZP are simply upsampled to the output size. So it's a superset of ONNX, able to support blockwise broadcasting along more than one dimension and differing block sizes along each axis. If the backend has an optimized form (maybe it specially recognizes the scalar and 1D cases), then it can use that; otherwise the decomposition to expanded form is equivalent to calling Resize on the scale and ZP with nearest-neighbor mode for block-based upsampling, then Sub and Mul (using normal broadcasting rules).
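To make that decomposition concrete, here is a minimal NumPy sketch of the expanded form. It is not taken from WebNN, DirectML, or ONNX: the helper names `blockwise_expand` and `dequantize_blockwise` are hypothetical, and it assumes "blockwise broadcastable" means same rank with every scale/ZP dimension evenly dividing the matching input dimension. `np.repeat` stands in for a nearest-neighbor Resize.

```python
import numpy as np


def blockwise_expand(param, target_shape):
    """Upsample `param` to `target_shape` by repeating each element per block.

    Hypothetical helper: assumes same rank, and that each dimension of
    `param` evenly divides the matching dimension of `target_shape`.
    """
    param = np.asarray(param)
    if param.ndim != len(target_shape):
        raise ValueError("param must have the same rank as the target shape")
    for axis, (small, large) in enumerate(zip(param.shape, target_shape)):
        if small == 0 or large % small != 0:
            raise ValueError(f"axis {axis}: {small} does not evenly divide {large}")
        # Block size along this axis is implied by the shapes, not passed in.
        param = np.repeat(param, large // small, axis=axis)
    return param


def dequantize_blockwise(x, scale, zero_point):
    """Expanded form: upsample scale and ZP, then Sub and Mul."""
    x = np.asarray(x)
    scale_full = blockwise_expand(scale, x.shape).astype(np.float32)
    zp_full = blockwise_expand(zero_point, x.shape).astype(np.float32)
    return (x.astype(np.float32) - zp_full) * scale_full


# 4x4 input with 2x2 blocks along both axes (scale and ZP are 2x2).
x = np.arange(16, dtype=np.uint8).reshape(4, 4)
scale = np.array([[0.5, 1.0], [2.0, 4.0]], dtype=np.float32)
zero_point = np.array([[0, 1], [2, 3]], dtype=np.uint8)
out_2d_blocks = dequantize_blockwise(x, scale, zero_point)

# Per-row quantization falls out as the special case of a 4x1 scale and ZP.
row_scale = np.array([[1.0], [2.0], [3.0], [4.0]], dtype=np.float32)
row_zp = np.zeros((4, 1), dtype=np.uint8)
out_per_row = dequantize_blockwise(x, row_scale, row_zp)
```

Both calls go through the same code path with no axis or block size argument, which is the point of the shape-driven formulation: the per-axis and multi-axis block cases differ only in the shapes of the scale and ZP tensors.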

For cleaner generality, I'd value seeing that in ONNX too (currently ONNX defines unidirectional broadcasting and bidirectional broadcasting, so it would need a definition for blockwise broadcasting), but I don't currently know of any models that use multiple axes with the same or differing block sizes (which surprises me, because I'd figure there would be useful cases at least along W and H) 🟦. So my motivation would mainly be generality, future proofing, clearer parameter documentation, and simpler conformance testing (unless you've seen such a feature request for ONNX, which would add more motivation?).

[RFC] What is missing from ONNX's representation of quantized models? #7435
