[mlir][tosa] Fix integer bilinear (quantized) tosa.resize lowering to use floordivsi #193821

Open
HanchengWu wants to merge 1 commit into llvm:main from HanchengWu:upsampling2D

@HanchengWu
Contributor

Background

tosa.resize in bilinear integer (quantized) mode lowers to a linalg.generic
body that, for each output pixel, computes a corresponding input coordinate and
blends the four neighboring input pixels. The mapping is:

val   = out_coord * scale_d + offset
index = val / scale_n          // integer part — which input pixel to start from
delta = val - index * scale_n  // fractional part, scaled to [0, scale_n)

delta is the interpolation weight toward the next pixel. The bilinear formula
(integer path) is:

topAcc    = pixel[y0,x0] * (scale_x - dx) + pixel[y0,x1] * dx
bottomAcc = pixel[y1,x0] * (scale_x - dx) + pixel[y1,x1] * dx
result    = topAcc * (scale_y - dy) + bottomAcc * dy

For this to be a valid convex combination (interpolation, not extrapolation),
dx must be in [0, scale_x] and dy in [0, scale_y], where scale_x and scale_y are
the scale_n values of the respective axes.
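
As a rough illustration of the formula above, the following standalone C++ sketch evaluates the three accumulations and checks the weight range. The names bilinearAcc and weightsAreConvex are hypothetical helpers for this note, not functions in TosaToLinalg.cpp, and 32-bit accumulators are an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the integer bilinear formula; not MLIR code.
// p00, p01, p10, p11 stand for pixel[y0,x0], pixel[y0,x1], pixel[y1,x0],
// pixel[y1,x1]. scaleX and scaleY are the scale_n values of each axis.
std::int32_t bilinearAcc(std::int32_t p00, std::int32_t p01,
                         std::int32_t p10, std::int32_t p11,
                         std::int32_t dx, std::int32_t dy,
                         std::int32_t scaleX, std::int32_t scaleY) {
  assert(scaleX > 0 && scaleY > 0);
  std::int32_t topAcc = p00 * (scaleX - dx) + p01 * dx;
  std::int32_t bottomAcc = p10 * (scaleX - dx) + p11 * dx;
  return topAcc * (scaleY - dy) + bottomAcc * dy;
}

// True when both weights describe a convex combination (no extrapolation).
bool weightsAreConvex(std::int32_t dx, std::int32_t dy,
                      std::int32_t scaleX, std::int32_t scaleY) {
  return dx >= 0 && dx <= scaleX && dy >= 0 && dy <= scaleY;
}
```

When all four pixels hold the same value p, the result is p * scaleX * scaleY for any dx and dy, which is why collapsed neighbors make the weights irrelevant.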

The pixel indices y0, y1, x0, x1 are computed by getClampedIdxs:

y0 = clamp(iy,   0, H-1)
y1 = clamp(iy+1, 0, H-1)
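
The clamping step can be sketched in plain C++ as follows. clampedIdxs is a hypothetical stand-in for getClampedIdxs along a single axis, written only from the two formulas above. The real helper operates on MLIR values.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical one-axis model of getClampedIdxs: returns (i0, i1) where
// i0 = clamp(i, 0, size-1) and i1 = clamp(i+1, 0, size-1).
std::pair<int, int> clampedIdxs(int i, int size) {
  assert(size > 0);
  int i0 = std::max(0, std::min(i, size - 1));
  int i1 = std::max(0, std::min(i + 1, size - 1));
  return {i0, i1};
}
```

For an axis of extent 2, an index of -1 yields (0, 0), so both neighbors land on the first pixel, while an index of 0 yields (0, 1).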

The Bug

The integer path uses DivSIOp (truncation toward zero):

// getIndexAndDeltaInt — mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
index = arith::DivSIOp::create(b, val, scaleN);   // truncates toward zero
delta = arith::MulIOp::create(b, index, scaleN);
delta = arith::SubIOp::create(b, val, delta);      // = val - (val/scaleN)*scaleN

When val < 0 (which happens at boundary output pixels when offset is
negative), DivSIOp truncates toward zero instead of toward -∞. This produces
a negative remainder, i.e. a negative delta, which causes extrapolation.

Note: the code comment on line 2058 already says // ix = floor(x / scale_n),
but the code uses truncation — this mismatch is the root cause of the bug.

The float path (getIndexAndDeltaFp) uses FloorDivSIOp and is unaffected:
with floor division, r = val - floor(val/scaleN)*scaleN is always in
[0, scaleN-1].
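
To make the difference between the two division modes concrete, here is a small C++ sketch. C++ signed `/` truncates toward zero, which matches the behavior described for DivSIOp. The floor variant is a hand-written model of floor division for a positive divisor. IndexDelta, indexDeltaTrunc, and indexDeltaFloor are hypothetical names for this note.

```cpp
#include <cassert>
#include <cstdint>

struct IndexDelta {
  std::int32_t index;
  std::int32_t delta;
};

// Truncating division: the remainder takes the sign of val.
IndexDelta indexDeltaTrunc(std::int32_t val, std::int32_t scaleN) {
  assert(scaleN > 0);
  std::int32_t index = val / scaleN;
  return {index, val - index * scaleN};
}

// Floor division for scaleN > 0: step the quotient down by one whenever
// truncation left a negative remainder.
IndexDelta indexDeltaFloor(std::int32_t val, std::int32_t scaleN) {
  assert(scaleN > 0);
  std::int32_t index = val / scaleN;
  if (val % scaleN < 0)
    --index;
  return {index, val - index * scaleN};
}
```

For val = -1 and scaleN = 4, the truncating version gives index 0 and delta -1, while the floor version gives index -1 and delta 3. For non-negative val the two agree.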


Concrete Example

Setup: 2×2 input upsampled to 4×4, scale=[4,2,4,2], offset=[-1,-1]

  • scale_y_n=4, scale_y_d=2, scale_x_n=4, scale_x_d=2
  • offset_y=-1, offset_x=-1
  • Input: tensor<1x2x2x1xi8> with input[0,0,0,0]=100, all others 0

At output pixel (0,0):

val  = 0 * 2 + (-1) = -1

Without any fix (DivSIOp, buggy)

iy  = DivSIOp(-1, 4) = 0     // truncates -0.25 toward zero
dy  = -1 - 0*4 = -1          // OUT OF RANGE: should be in [0, 4]

y0 = clamp(0,   0, 1) = 0
y1 = clamp(0+1, 0, 1) = 1    // different rows

ix  = DivSIOp(-1, 4) = 0
dx  = -1                      // same issue

x0 = clamp(0,   0, 1) = 0
x1 = clamp(0+1, 0, 1) = 1

// pixels:
y0x0 = input[0,0,0,0] = 100
y0x1 = input[0,0,1,0] = 0
y1x0 = input[0,1,0,0] = 0
y1x1 = input[0,1,1,0] = 0

topAcc    = 100*(4-(-1)) + 0*(-1) = 500   // EXTRAPOLATION
bottomAcc = 0*(4-(-1))   + 0*(-1) = 0
result    = 500*(4-(-1)) + 0*(-1) = 2500  // WRONG

With the fix (FloorDivSIOp)

iy  = FloorDivSIOp(-1, 4) = -1   // floors -0.25 toward -∞
dy  = -1 - (-1)*4 = 3            // naturally in [0, scale_n-1]

y0 = clamp(-1,  0, 1) = 0
y1 = clamp(-1+1, 0, 1) = 0      // SAME row — both snap to boundary

dx  = 3 (same by symmetry)

// all four neighbors collapse to the same boundary pixel:
y0x0 = y0x1 = y1x0 = y1x1 = input[0,0,0,0] = 100

topAcc    = 100*(4-3) + 100*3 = 400
bottomAcc = 100*(4-3) + 100*3 = 400
result    = 400*(4-3) + 400*3 = 1600  // correct

y0=y1=0 means boundary replication is enforced by getClampedIdxs, making
dy irrelevant.
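
The whole walkthrough can be reproduced with a short C++ model of a single output pixel. This is a sketch under simplifying assumptions: one channel, a row-major image, and the same scale_n, scale_d, and offset on both axes, as in the example. resizePixel and its useFloor flag are hypothetical and exist only for this comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of one output pixel (oy, ox) of an H x W image.
// useFloor = false models truncating division, true models floor division.
std::int32_t resizePixel(const std::int32_t *img, int H, int W, int oy, int ox,
                         int scaleN, int scaleD, int offset, bool useFloor) {
  assert(H > 0 && W > 0 && scaleN > 0);
  auto split = [&](int out, int &index, int &delta) {
    int val = out * scaleD + offset;
    index = val / scaleN;                 // truncates toward zero
    if (useFloor && val % scaleN < 0)
      --index;                            // adjust to floor for negative val
    delta = val - index * scaleN;
  };
  auto clampIdx = [](int i, int size) {
    return std::max(0, std::min(i, size - 1));
  };
  int iy, dy, ix, dx;
  split(oy, iy, dy);
  split(ox, ix, dx);
  int y0 = clampIdx(iy, H), y1 = clampIdx(iy + 1, H);
  int x0 = clampIdx(ix, W), x1 = clampIdx(ix + 1, W);
  std::int32_t topAcc =
      img[y0 * W + x0] * (scaleN - dx) + img[y0 * W + x1] * dx;
  std::int32_t bottomAcc =
      img[y1 * W + x0] * (scaleN - dx) + img[y1 * W + x1] * dx;
  return topAcc * (scaleN - dy) + bottomAcc * dy;
}
```

With the image {100, 0, 0, 0}, scaleN = 4, scaleD = 2, and offset = -1, output pixel (0,0) evaluates to 2500 with truncation and 1600 with floor division. Output pixel (1,1), where val = 1 is non-negative, evaluates to 900 in both modes.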

Semantic Analysis of the fix

  • iy=-1 correctly signals "this position is before the first input pixel."
  • getClampedIdxs does its intended job: both y0 and y1 snap to the
    boundary pixel, enforcing replication explicitly.
  • dy=3 appears valid (it's in [0, scale_n-1]) but is semantically
    meaningless: the true position is -0.25, which is outside the image — there
    is no "3/4 toward the next pixel" to interpolate toward. It is harmless only
    because y0=y1.
  • Same analysis for dx=3 by symmetry.
  • The change fixes the root cause (the wrong division op), matching the
    existing code comment and mirroring the float path, but delta still
    carries a misleading value at out-of-bounds positions.

…or index computation

The integer path of tosa.resize bilinear lowering used divsi (truncation
toward zero) to compute the input pixel index, but the code comment says
"ix = floor(x / scale_n)" and the floating-point path already uses
floordivsi. With divsi, negative dividends (caused by negative offsets)
produce negative remainders, making the interpolation weights fall outside
[0, scale] and causing extrapolation instead of boundary replication.

Fix by replacing DivSIOp with FloorDivSIOp in getIndexAndDeltaInt. With
floor division, the remainder is always in [0, scaleN), so interpolation
weights are naturally non-negative and no post-hoc clamping is needed.
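
The range claim can be spot-checked by brute force. The C++ sketch below is an illustration rather than a proof, and both function names are hypothetical. It scans an interval of val and reports whether the remainder stays in [0, scaleN) under each division mode.

```cpp
#include <cassert>

// Checks val - floor(val / scaleN) * scaleN in [0, scaleN) for lo <= val <= hi.
bool floorRemainderInRange(int lo, int hi, int scaleN) {
  assert(scaleN > 0);
  for (int val = lo; val <= hi; ++val) {
    int index = val / scaleN;
    if (val % scaleN < 0)
      --index; // turn the truncated quotient into a floored one
    int delta = val - index * scaleN;
    if (delta < 0 || delta >= scaleN)
      return false;
  }
  return true;
}

// Same check using truncating division only.
bool truncRemainderInRange(int lo, int hi, int scaleN) {
  assert(scaleN > 0);
  for (int val = lo; val <= hi; ++val) {
    int delta = val - (val / scaleN) * scaleN;
    if (delta < 0 || delta >= scaleN)
      return false;
  }
  return true;
}
```

Over val in [-64, 64] with scaleN = 4, the floor version holds everywhere, while the truncating version fails as soon as val is negative and not a multiple of scaleN.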
@llvmbot
Member

llvmbot commented Apr 23, 2026

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-tosa

@llvm/pr-subscribers-mlir-linalg

Author: Henry Wu (HanchengWu)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/193821.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp (+1-1)
  • (modified) mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir (+5-4)
diff --git a/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp b/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
index 11b3aabcbfeb4..e9c9e17fe6274 100644
--- a/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
+++ b/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
@@ -2059,7 +2059,7 @@ class GenericResizeConverter : public OpRewritePattern<tosa::ResizeOp> {
         //  dx = x - ix * scale_n;
         Value val = arith::MulIOp::create(b, in, scaleD);
         val = arith::AddIOp::create(b, val, offset);
-        index = arith::DivSIOp::create(b, val, scaleN);
+        index = arith::FloorDivSIOp::create(b, val, scaleN);
         delta = arith::MulIOp::create(b, index, scaleN);
         delta = arith::SubIOp::create(b, val, delta);
       };
diff --git a/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir b/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
index 4900476b25dc5..2959cf59e953a 100644
--- a/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
+++ b/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
@@ -218,13 +218,13 @@ func.func @resize_nearest_int(%arg0: tensor<1x15x13x1xi8>) -> () {
 
   // CHECK: %[[TEMP_Y:.*]] = arith.muli %[[Y]], %[[SCALE_Y_D]]
   // CHECK: %[[Y:.*]] = arith.addi %[[TEMP_Y]], %[[OFFSET_Y]]
-  // CHECK: %[[I_Y:.*]] = arith.divsi %[[Y]], %[[SCALE_Y_N]]
+  // CHECK: %[[I_Y:.*]] = arith.floordivsi %[[Y]], %[[SCALE_Y_N]]
   // CHECK: %[[TEMP_Y:.*]] = arith.muli %[[I_Y]], %[[SCALE_Y_N]]
   // CHECK: %[[D_Y:.*]] = arith.subi %[[Y]], %[[TEMP_Y]]
 
   // CHECK: %[[TEMP_X:.*]] = arith.muli %[[X]], %[[SCALE_X_D]]
   // CHECK: %[[X:.*]] = arith.addi %[[TEMP_X]], %[[OFFSET_X]]
-  // CHECK: %[[I_X:.*]] = arith.divsi %[[X]], %[[SCALE_X_N]]
+  // CHECK: %[[I_X:.*]] = arith.floordivsi %[[X]], %[[SCALE_X_N]]
   // CHECK: %[[TEMP_X:.*]] = arith.muli %[[I_X]], %[[SCALE_X_N]]
   // CHECK: %[[D_X:.*]] = arith.subi %[[X]], %[[TEMP_X]]
 
@@ -285,13 +285,13 @@ func.func @resize_bilinear_int(%arg0: tensor<1x19x20x1xi8>) {
 
   // CHECK: %[[TEMP_Y:.*]] = arith.muli %[[Y]], %[[SCALE_Y_D]]
   // CHECK: %[[Y:.*]] = arith.addi %[[TEMP_Y]], %[[OFFSET_Y]]
-  // CHECK: %[[I_Y:.*]] = arith.divsi %[[Y]], %[[SCALE_Y_N]]
+  // CHECK: %[[I_Y:.*]] = arith.floordivsi %[[Y]], %[[SCALE_Y_N]]
   // CHECK: %[[TEMP_Y:.*]] = arith.muli %[[I_Y]], %[[SCALE_Y_N]]
   // CHECK: %[[D_Y:.*]] = arith.subi %[[Y]], %[[TEMP_Y]]
 
   // CHECK: %[[TEMP_X:.*]] = arith.muli %[[X]], %[[SCALE_X_D]]
   // CHECK: %[[X:.*]] = arith.addi %[[TEMP_X]], %[[OFFSET_X]]
-  // CHECK: %[[I_X:.*]] = arith.divsi %[[X]], %[[SCALE_X_N]]
+  // CHECK: %[[I_X:.*]] = arith.floordivsi %[[X]], %[[SCALE_X_N]]
   // CHECK: %[[TEMP_X:.*]] = arith.muli %[[I_X]], %[[SCALE_X_N]]
   // CHECK: %[[D_X:.*]] = arith.subi %[[X]], %[[TEMP_X]]
 
@@ -605,3 +605,4 @@ func.func @skip_interpolate_bilinear_f32(%arg0 : tensor<3x1x2x7xf32>) -> tensor<
   // CHECK:  return %[[GENERIC]]
   return %resize : tensor<3x1x4x7xf32>
 }
+
