DAGCombiner/SimplifyDemandedBits making ISel DAG more poisonous for bitcasted vector #138513

Open
bjope opened this issue May 5, 2025 · 3 comments · May be fixed by #139085
Assignees
Labels
llvm:codegen llvm:SelectionDAG SelectionDAGISel as well

Comments

@bjope
Collaborator

bjope commented May 5, 2025

Consider IR in this example:

define void @bar(i16 %a, ptr %p) {
entry:
  %.upto4 = insertelement <8 x i16> <i16 0, i16 1, i16 2, i16 3, i16 poison, i16 poison, i16 poison, i16 poison>, i16 %a, i64 4
  %.upto5 = insertelement <8 x i16> %.upto4, i16 5, i64 5
  %.upto6 = insertelement <8 x i16> %.upto5, i16 6, i64 6
  %.upto7 = insertelement <8 x i16> %.upto6, i16 7, i64 7
  %bitcast = bitcast <8 x i16> %.upto7 to i128
  %lshr = lshr i128 %bitcast, 48
  %trunc = trunc i128 %lshr to i32
  store i32 %trunc, ptr %p
  ret void
}

When running llc -debug ... on the above we see this:

Initial selection DAG: %bb.0 'bar:entry'
SelectionDAG has 33 nodes:
  t0: ch,glue = EntryToken
  t24: i128 = Constant<48>
  t28: i64 = Constant<0>
                    t11: v8i16 = BUILD_VECTOR Constant:i16<0>, Constant:i16<1>, Constant:i16<2>, Constant:i16<3>, poison:i16, poison:i16, poison:i16, poison:i16
                      t2: i32,ch = CopyFromReg t0, Register:i32 %0
                    t3: i16 = truncate t2
                  t13: v8i16 = insert_vector_elt t11, t3, Constant:i64<4>
                t16: v8i16 = insert_vector_elt t13, Constant:i16<5>, Constant:i64<5>
              t19: v8i16 = insert_vector_elt t16, Constant:i16<6>, Constant:i64<6>
            t22: v8i16 = insert_vector_elt t19, Constant:i16<7>, Constant:i64<7>
          t23: i128 = bitcast t22
        t26: i128 = srl t23, Constant:i8<48>
      t27: i32 = truncate t26
      t5: i64,ch = CopyFromReg t0, Register:i64 %1
    t30: ch = store<(store (s32) into %ir.p)> t0, t27, t5, undef:i64
  t32: ch = X86ISD::RET_GLUE t30, TargetConstant:i32<0>

...

Combining: t27: i32 = truncate t26

Replacing.2 t22: v8i16 = insert_vector_elt t19, Constant:i16<7>, Constant:i64<7>

With: t19: v8i16 = insert_vector_elt t16, Constant:i16<6>, Constant:i64<6>


Combining: t27: i32 = truncate t26

Replacing.2 t19: v8i16 = insert_vector_elt t16, Constant:i16<6>, Constant:i64<6>

With: t16: v8i16 = insert_vector_elt t13, Constant:i16<5>, Constant:i64<5>


Combining: t27: i32 = truncate t26

Replacing.2 t16: v8i16 = insert_vector_elt t13, Constant:i16<5>, Constant:i64<5>

With: t13: v8i16 = insert_vector_elt t11, t3, Constant:i64<4>

...

Optimized lowered selection DAG: %bb.0 'bar:entry'
SelectionDAG has 18 nodes:
  t0: ch,glue = EntryToken
                t2: i32,ch = CopyFromReg t0, Register:i32 %0
              t3: i16 = truncate t2
            t35: v8i16 = BUILD_VECTOR undef:i16, undef:i16, undef:i16, Constant:i16<3>, t3, poison:i16, poison:i16, poison:i16
          t23: i128 = bitcast t35
        t26: i128 = srl t23, Constant:i8<48>
      t27: i32 = truncate t26
      t5: i64,ch = CopyFromReg t0, Register:i64 %1
    t30: ch = store<(store (s32) into %ir.p)> t0, t27, t5, undef:i64
  t32: ch = X86ISD::RET_GLUE t30, TargetConstant:i32<0>

...

The above looks wrong!
Before the DAG combine rewrites, the bitcast casts a vector without poison elements into an i128. In the optimized selection DAG, the bitcast casts a vector with poison elements into an i128, so the result of the bitcast is now poison.
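To make the difference concrete, here is a minimal Python sketch of the two vectors seen in the dumps above. It is a toy model and not LLVM code: `POISON` is a hypothetical sentinel, undef lanes are modeled as one arbitrary concrete value (0), little-endian lane order is assumed (as on x86), and `0xABCD` is just a stand-in for the runtime argument `%a`.

```python
POISON = object()  # hypothetical sentinel standing in for a poison value
UNDEF = 0          # undef modeled as one arbitrary, non-poison choice

def bitcast_to_int(lanes, lane_bits=16):
    """Model `bitcast <N x iK> to i(N*K)`, assuming little-endian lane order.

    Any poison lane makes the whole scalar poison.
    """
    if any(lane is POISON for lane in lanes):
        return POISON
    value = 0
    for index, lane in enumerate(lanes):
        value |= (lane & ((1 << lane_bits) - 1)) << (index * lane_bits)
    return value

def lshr_trunc(value, shift, out_bits):
    """Model `lshr` followed by `trunc`; poison propagates through both."""
    if value is POISON:
        return POISON
    return (value >> shift) & ((1 << out_bits) - 1)

a = 0xABCD  # stand-in for %a

# Operand of the bitcast in the initial DAG (t22) and after combining (t35).
before = [0, 1, 2, 3, a, 5, 6, 7]
after = [UNDEF, UNDEF, UNDEF, 3, a, POISON, POISON, POISON]

result_before = lshr_trunc(bitcast_to_int(before), 48, 32)
result_after = lshr_trunc(bitcast_to_int(after), 48, 32)

print(hex(result_before))      # prints 0xabcd0003 (lanes 3 and 4 only)
print(result_after is POISON)  # prints True
```

Only lanes 3 and 4 contribute bits to the stored value, yet in this model the poison in lanes 5 to 7 still makes the whole i128, and therefore the store, poison.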

I suspect TargetLowering::SimplifyDemandedBits/SimplifyDemandedVectorElts somehow is to blame for this.
When dealing with BITCAST in SimplifyDemandedBits it may call SimplifyDemandedVectorElts. We need to make sure that all smaller source elements mapping to a larger element are demanded when looking at a bitcast from 'small element' src vector to a 'large element' vector.
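A rough sketch of the two ways of mapping demand through such a bitcast, again as a toy Python model. Both helper names are hypothetical, and little-endian lane order is assumed. The first helper returns the source lanes that contain at least one demanded bit; the second returns every source lane that is merged into a demanded destination lane, which is the set suggested above.

```python
def elts_touched_by_bits(demanded_bits, src_elt_bits, num_src_elts):
    """Source lanes that contain at least one demanded bit (little-endian)."""
    lane_mask = (1 << src_elt_bits) - 1
    return [i for i in range(num_src_elts)
            if (demanded_bits >> (i * src_elt_bits)) & lane_mask]

def elts_feeding_demanded_dst(demanded_dst_elts, num_src_elts, num_dst_elts):
    """Source lanes that are merged into any demanded destination lane."""
    scale = num_src_elts // num_dst_elts  # source lanes per destination lane
    return [i for i in range(num_src_elts)
            if (i // scale) in demanded_dst_elts]

# In the example, lshr by 48 followed by trunc to i32 demands bits 48..79.
demanded_bits = ((1 << 32) - 1) << 48

touched = elts_touched_by_bits(demanded_bits, 16, 8)   # lanes 3 and 4
needed = elts_feeding_demanded_dst({0}, 8, 1)          # all eight lanes

# For v8i16 -> v2i64 with only destination lane 0 demanded,
# source lanes 0..3 are needed and lanes 4..7 are not.
needed_v2i64 = elts_feeding_demanded_dst({0}, 8, 2)
```

For the scalar i128 case the destination has a single lane, so every source lane belongs to it even though the demanded bits only touch lanes 3 and 4.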

bjope added the llvm:SelectionDAG SelectionDAGISel as well label May 5, 2025
bjope self-assigned this May 5, 2025
@bjope
Collaborator Author

bjope commented May 5, 2025

My original idea of making sure all smaller elements are demanded when bitcasting from a vector with small elements to a vector/scalar with larger elements seems to result in lots of lit test regressions. It is often fine to simplify based on some elements not being demanded, as long as that does not make the vector more poisonous. So maybe the problem here really is how we deal with INSERT_VECTOR_ELT in TargetLowering::SimplifyDemandedVectorElts. It is only ok to skip the INSERT_VECTOR_ELT if either the inserted value is known to be POISON, or if the element in the source vector is known not to be POISON.

No idea if similar problems exist for other node types, or if only INSERT_VECTOR_ELT has such problems.
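The skip condition stated above can be written as a small predicate. This is a toy Python sketch, not the code in TargetLowering: `POISON` and `can_skip_insert` are hypothetical names, and lanes are modeled as fully known values, whereas the real combiner only has partial "known poison" / "known not poison" information.

```python
POISON = object()  # hypothetical sentinel standing in for a poison value

def can_skip_insert(src_lanes, inserted, index):
    """True if replacing insert(src, inserted, index) by src adds no poison.

    Skipping is fine when the inserted value is poison anyway, or when the
    lane it would overwrite in the source vector is not poison.
    """
    return inserted is POISON or src_lanes[index] is not POISON

# t19 from the example: lane 7 is still poison before the last insert.
t19 = [0, 1, 2, 3, "a", 5, 6, POISON]

skip_lane7 = can_skip_insert(t19, 7, 7)            # overwrites a poison lane
skip_lane2 = can_skip_insert(t19, 2, 2)            # overwrites a defined lane
skip_poison = can_skip_insert(t19, POISON, 7)      # inserts poison over poison
```

Under this rule the inserts into lanes 5, 6 and 7 in the example would all be kept, because each of them overwrites a poison lane with a defined constant.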

@bjope
Collaborator Author

bjope commented May 6, 2025

I'm struggling a bit with this, trying to find a suitable solution.

The base problem as I see it is that we can have a BITCAST of vectors that merges smaller elements into larger vector elements (or just a scalar). For a vector, individual elements can be poison without making the full vector poison (e.g. if only demanding certain elts). When doing SimplifyDemandedBits/SimplifyDemandedVectorElts in DAGCombiner we do not (always) demand all smaller elements when recursing through such a BITCAST.
Simply fixing this by making sure we demand all "not allowed to turn into poison" elements would result in several regressions in lit tests afaict.

One point of view would be that SimplifyDemanded* shouldn't be allowed to make things more poisonous when doing simplifications. More specifically, it should not even turn undemanded elements from non-poison into poison, which typically can happen when dealing with INSERT_VECTOR_ELT: if the inserted element isn't demanded, we simplify by just taking the source vector (which could have poison elements). This kind of "making the vector more poisonous" can even happen in SelectionDAG::getNode when trying to create an INSERT_VECTOR_ELT that inserts an UNDEF element (which otherwise could be a way of making sure the element isn't POISON).
Simply fixing this by not eliminating INSERT_VECTOR_ELT operations that overwrite a maybe-poison element (so that undemanded elements are not turned into poison) also results in several lit test regressions. A typical example would be

        t992: v4i32 = insert_vector_elt undef:v4i32, t943, Constant:i32<1>
      t1000: v4i32 = insert_vector_elt t992, undef:i32, Constant:i32<3>
    t318: i32 = extract_vector_elt t1000, Constant:i32<1>
  t293: i32,i32 = ARMISD::LSRL t957, t318, Constant:i32<28>

where the irrelevant t1000 node would block collapsing t318+t992 into t943, unless we add more advanced DAG combines to handle such situations.
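As an illustration of what such a more advanced combine might look like, here is a toy Python sketch that folds an extract by walking through inserts at other lanes. It is only a model of the idea: `InsertElt`, `UNDEF` and `extract_through_inserts` are hypothetical names, node names such as `t943` are plain strings, and nothing here reflects how DAGCombiner is actually structured.

```python
from dataclasses import dataclass

UNDEF = "undef"  # placeholder for an undef lane

@dataclass
class InsertElt:
    """Toy model of insert_vector_elt(src, value, index)."""
    src: object   # another InsertElt, or a list of lanes as the base vector
    value: object
    index: int

def extract_through_inserts(node, index):
    """Fold extract_vector_elt(node, index) by walking an insert chain."""
    while isinstance(node, InsertElt):
        if node.index == index:
            return node.value  # this insert defines the requested lane
        node = node.src        # an insert at another lane cannot affect it
    return node[index]         # reached the base vector

# The chain from the ARM example above.
t992 = InsertElt([UNDEF] * 4, "t943", 1)
t1000 = InsertElt(t992, UNDEF, 3)

folded = extract_through_inserts(t1000, 1)
print(folded)  # prints t943
```

With a fold like this the extract does not depend on t1000 being removed first, so keeping that insert to preserve "not poison" for lane 3 would no longer block the simplification.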

Maybe there is a bit of philosophy around this. Should we allow simplifications that make a vector more poisonous during SimplifyDemanded* or not? Is there perhaps a difference between SimplifyDemandedBits and SimplifyDemandedVectorElts that I haven't understood? Maybe the whole idea with SimplifyDemandedVectorElts is that undemanded elements really should be irrelevant for the result (such as not impacting the poisonousness of the result).

One idea I haven't tried yet is to propagate some kind of bool (MayPoisonUndemandedElts) or mask (UndemandedEltsThatMustNotBeMadeMorePoisonous) in the SimplifyDemanded* functions. This would indicate whether we have recursed through a BITCAST that actually depends on vector elements that aren't demanded (via the DemandedElts mask) but that aren't allowed to be turned into POISON. That would introduce lots of code changes just to propagate the flag when recursing, but could give us a way to use SimplifyDemanded* both in situations when we want to simplify through a BITCAST (when poison can be an issue even for elements that aren't used in the end) and, for example, when simplifying an EXTRACT_VECTOR_ELT (when poison for undemanded elts isn't an issue).
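A very small Python sketch of how such an extra mask could change the decision for a single INSERT_VECTOR_ELT. This is speculative and purely illustrative: no such parameter exists in LLVM as far as this thread states, and `may_drop_insert`, `demanded_elts` and `no_poison_elts` are hypothetical names.

```python
POISON = object()  # hypothetical sentinel standing in for a poison value

def may_drop_insert(src_lanes, inserted, index,
                    demanded_elts, no_poison_elts=frozenset()):
    """True if insert(src, inserted, index) may be replaced by src.

    demanded_elts:  lanes whose value is actually used.
    no_poison_elts: hypothetical extra mask of lanes that are not demanded
                    but must not become more poisonous, e.g. because a
                    BITCAST merges them into a wider demanded value.
    """
    if index in demanded_elts:
        return False  # the inserted value itself is used
    if index in no_poison_elts:
        # Dropping the insert exposes the source lane, so it must not add poison.
        return inserted is POISON or src_lanes[index] is not POISON
    return True       # lane is irrelevant, poison there is harmless

t19 = [0, 1, 2, 3, "a", 5, 6, POISON]

# Reached via an EXTRACT_VECTOR_ELT-like user: other lanes do not matter.
via_extract = may_drop_insert(t19, 7, 7, demanded_elts={4})

# Reached via the bitcast to i128: lanes 3 and 4 carry the demanded bits,
# and every other lane must stay free of new poison.
via_bitcast = may_drop_insert(t19, 7, 7, demanded_elts={3, 4},
                              no_poison_elts={0, 1, 2, 5, 6, 7})
```

The same insert is dropped in the first call and kept in the second, which is the distinction the extra flag or mask would be meant to express.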

Do you have any advice/recommendations regarding this (@RKSimon, @arsenm, @nikic, @dtcxzyw, or maybe someone else)?

@nikic
Contributor

nikic commented May 6, 2025

I haven't looked into your example in detail, but I believe the way it should work is that you can turn non-demanded elements poison, but you can't turn non-demanded bits poison. This also means that you cannot transfer non-demanded bits into non-demanded elements for bitcasts. At least this is the way it would work in the middle end.
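The elements-versus-bits distinction can be illustrated with a toy Python model. `POISON` and the helper names are hypothetical, and the model only encodes two facts: extracting one lane ignores the other lanes, while a scalar operation on a poison value yields poison regardless of which bits are later used.

```python
POISON = object()  # hypothetical sentinel standing in for a poison value

def extract_elt(lanes, index):
    """Poison in other lanes does not affect the extracted lane."""
    return lanes[index]

def and_mask(value, mask):
    """Masking away bits does not hide poison: the whole scalar is poison."""
    return POISON if value is POISON else value & mask

def bitcast_to_int(lanes, lane_bits=16):
    """Scalar bitcast, little-endian assumed; any poison lane poisons it."""
    if any(lane is POISON for lane in lanes):
        return POISON
    return sum((lane & ((1 << lane_bits) - 1)) << (i * lane_bits)
               for i, lane in enumerate(lanes))

lanes = [POISON, 42]

# Non-demanded element: lane 0 may be poison, lane 1 is still usable.
elt = extract_elt(lanes, 1)

# Non-demanded bits: only the high 16 bits are kept, yet poison remains,
# because the low lane's poison was transferred into the scalar's bits.
masked = and_mask(bitcast_to_int(lanes), 0xFFFF0000)
```

This is why a demanded-bits mask on the wide side of a bitcast cannot simply be translated into a demanded-elements mask on the narrow side.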

bjope added a commit to bjope/llvm-project that referenced this issue May 8, 2025
When we have a BITCAST and the source type is a vector with smaller
elements compared to the destination type, then we need to demand
all the source elements that make up the demanded elts for the
result when doing recursive calls to SimplifyDemandedBits,
SimplifyDemandedVectorElts and SimplifyMultipleUseDemandedBits.
Problem is that those simplifications are allowed to turn non-demanded
elements of a vector into POISON, so unless we demand all source
elements that make up the result there is a risk that the result
would be more poisonous (even for demanded elts) after the
simplification.

The patch fixes some bugs in SimplifyMultipleUseDemandedBits and
SimplifyDemandedBits for situations when we did not consider the
problem described above. Now we make sure that we also demand vector
elements that "must not be turned into poison" even if those elements
correspond to bits that do not need to be defined according to
the DemandedBits mask.

Fixes llvm#138513