[AArch64] recognize the shufflevector equivalent of a vector select #28904

Closed
rotateright opened this issue Jul 13, 2016 · 11 comments
Labels
backend:AArch64 bugzilla Issues migrated from bugzilla

Comments

@rotateright
Contributor

rotateright commented Jul 13, 2016

Bugzilla Link 28530
Version trunk
OS All
CC @aemerson

Extended Description

$ cat shufsel.ll 
define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) {
  %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
  ret <4 x i32> %sel
}

define <4 x i32> @goo(<4 x i32> %a, <4 x i32> %b) {
  %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
  ret <4 x i32> %shuf
}

I'm guessing that one of these is generally better than the other (and there may be a better way than either of these?):

$ ./llc shufsel.ll -o - -mtriple=aarch64
.LCPI0_0:
	.word	4294967295              // 0xffffffff
	.word	0                       // 0x0
	.word	0                       // 0x0
	.word	4294967295              // 0xffffffff
foo:
	adrp	x8, .LCPI0_0
	ldr	q2, [x8, :lo12:.LCPI0_0]
	bsl	v2.16b, v0.16b, v1.16b
	mov	v0.16b, v2.16b
	ret
goo:                                    // @goo
	ext	v1.16b, v0.16b, v1.16b, #12
	ext	v0.16b, v1.16b, v0.16b, #4
	ext	v1.16b, v1.16b, v1.16b, #8
	ext	v0.16b, v0.16b, v1.16b, #12
	ret

Note that in http://reviews.llvm.org/D22114 , there's a proposal to canonicalize to the shufflevector form of the IR.
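
As a rough illustration of why the two IR forms above are equivalent, here is a minimal Python sketch that maps a constant select condition to shufflevector indices. The helper names (`select_mask_to_shuffle_mask`, `shufflevector`, `vector_select`) are made up for this example and are not LLVM APIs; they only model the lane semantics.

```python
def select_mask_to_shuffle_mask(cond):
    """Map a constant <N x i1> select condition to shufflevector indices.

    Lane i reads the first operand (index i) when cond[i] is true,
    and the second operand (index N + i) otherwise.
    """
    n = len(cond)
    return [i if c else n + i for i, c in enumerate(cond)]


def shufflevector(a, b, mask):
    """Lane semantics of shufflevector with a constant mask."""
    concat = list(a) + list(b)
    return [concat[i] for i in mask]


def vector_select(cond, a, b):
    """Lane semantics of a vector select."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


cond = [True, False, False, True]
mask = select_mask_to_shuffle_mask(cond)
print(mask)  # [0, 5, 6, 3]

a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
print(shufflevector(a, b, mask) == vector_select(cond, a, b))  # True
```

The mask `<0, 5, 6, 3>` is the one used in @goo, so @foo and @goo compute the same value.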

@rotateright
Contributor Author

*** Bug llvm/llvm-bugzilla-archive#29125 has been marked as a duplicate of this bug. ***

@rotateright
Contributor Author

Another possibility that was in bug 29125 - use inserts (this was an example with constants, but the 'ins' will be the same for a variable operand)?

2 inserts:
fmov s1, #1.00000000
ins v0.s[1], v1.s[0]
fmov s1, #2.00000000
ins v0.s[2], v1.s[0]
ret

@rotateright
Contributor Author

The patch for canonicalization of vector select with constant condition to shuffle is here:
https://reviews.llvm.org/D24279

@aemerson
Contributor

aemerson commented Oct 8, 2017

Can this be closed, Sanjay?

@rotateright
Contributor Author

I filed this as a courtesy to AArch64 (unfortunately for me, I don't currently have an incentive to optimize that backend), so if you're ok with this:

ext	v1.16b, v0.16b, v1.16b, #12
ext	v0.16b, v1.16b, v0.16b, #4
ext	v1.16b, v1.16b, v1.16b, #8
ext	v0.16b, v0.16b, v1.16b, #12

...then we can close this. But wouldn't this be better as:

trn1	v1.4s, v0.4s, v1.4s
trn2	v0.4s, v1.4s, v0.4s

Or some other permute ops?
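
For anyone who wants to check what the four-instruction ext sequence actually computes, here is a small Python model. It assumes the usual EXT semantics (Vn supplies the low 16 bytes, Vm the high 16 bytes, and the result is 16 bytes starting at byte #imm); the function names are hypothetical and the bytes are symbolic tags rather than real data.

```python
def ext(vn, vm, imm):
    """Model of EXT Vd.16b, Vn.16b, Vm.16b, #imm.

    Vn forms the low half and Vm the high half of a 32-byte value;
    the result is the 16 bytes starting at byte offset imm.
    """
    assert len(vn) == 16 and len(vm) == 16 and 0 <= imm < 16
    return (vn + vm)[imm:imm + 16]


def goo_ext_sequence(v0, v1):
    """Replay the four ext instructions emitted for @goo."""
    v1 = ext(v0, v1, 12)
    v0 = ext(v1, v0, 4)
    v1 = ext(v1, v1, 8)
    v0 = ext(v0, v1, 12)
    return v0


# Symbolic bytes: ("a", k) is byte k of %a, ("b", k) is byte k of %b.
a = [("a", k) for k in range(16)]
b = [("b", k) for k in range(16)]

# shufflevector <0, 5, 6, 3>: 32-bit lanes 0 and 3 from %a, lanes 1 and 2 from %b.
expected = a[0:4] + b[4:12] + a[12:16]
print(goo_ext_sequence(a, b) == expected)  # True
```

So the sequence is correct; the question in this thread is only whether four dependent permutes are the cheapest way to get there.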

@rotateright
Contributor Author

trn1 v1.4s, v0.4s, v1.4s
trn2 v0.4s, v1.4s, v0.4s

Sorry, that trn1/trn2 sequence doesn't work. Is 'bsl' as shown in the description better?
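
A quick lane-level check shows why the trn1/trn2 pair does not produce the wanted shuffle. This is a sketch assuming the standard TRN1/TRN2 semantics for the .4s arrangement (even or odd lanes of the two sources, interleaved); `trn1` and `trn2` here are illustrative Python helpers, not real intrinsics.

```python
def trn1(vn, vm):
    """TRN1 Vd.4s: even lanes of Vn and Vm, interleaved."""
    return [vn[0], vm[0], vn[2], vm[2]]


def trn2(vn, vm):
    """TRN2 Vd.4s: odd lanes of Vn and Vm, interleaved."""
    return [vn[1], vm[1], vn[3], vm[3]]


v0 = ["a0", "a1", "a2", "a3"]
v1 = ["b0", "b1", "b2", "b3"]

v1 = trn1(v0, v1)   # trn1 v1.4s, v0.4s, v1.4s
v0 = trn2(v1, v0)   # trn2 v0.4s, v1.4s, v0.4s

wanted = ["a0", "b1", "b2", "a3"]
print(v0)            # ['b0', 'a1', 'b2', 'a3']
print(v0 == wanted)  # False
```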

@rotateright
Contributor Author

Or 2 of these:
C7.2.150
INS (element)
Insert vector element from another vector element.

?

@aemerson
Contributor

aemerson commented Oct 9, 2017

Ah sorry, I'd only skimmed the bug and thought a fix had already been committed. Looks like we still have work to do here, thanks for reporting.

@rotateright
Contributor Author

mentioned in issue llvm/llvm-bugzilla-archive#29125

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@c-rhodes
Collaborator

c-rhodes commented May 1, 2025

reproducer: https://godbolt.org/z/E1dhe61Ea

Codegen for shufflevector looks better now, using inserts:

; llc -mtriple=aarch64
define <4 x i32> @goo(<4 x i32> %a, <4 x i32> %b) {
  %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
  ret <4 x i32> %shuf
}

goo:                                    // @goo
        mov     v1.s[0], v0.s[0]
        mov     v1.s[3], v0.s[3]
        mov     v0.16b, v1.16b
        ret

don't think we can do any better than this.
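
The insert-based sequence can be checked the same way. The sketch below models `mov Vd.s[i], Vn.s[j]` (an alias of INS (element)) as a single-lane copy; the helper `ins` is a hypothetical name used only for this illustration.

```python
def ins(vd, d_idx, vn, n_idx):
    """Model of INS Vd.s[d_idx], Vn.s[n_idx]: copy one lane, keep the rest of Vd."""
    out = list(vd)
    out[d_idx] = vn[n_idx]
    return out


v0 = ["a0", "a1", "a2", "a3"]   # %a
v1 = ["b0", "b1", "b2", "b3"]   # %b

v1 = ins(v1, 0, v0, 0)   # mov v1.s[0], v0.s[0]
v1 = ins(v1, 3, v0, 3)   # mov v1.s[3], v0.s[3]
v0 = list(v1)            # mov v0.16b, v1.16b

print(v0)  # ['a0', 'b1', 'b2', 'a3']
```

Two lane copies plus a register move give the `<0, 5, 6, 3>` shuffle without a constant-pool load.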

Although for the select it seems canonicalization isn't happening:

; llc -mtriple=aarch64
define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) {
  %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
  ret <4 x i32> %sel
}

foo:                                    // @foo
        adrp    x8, .LCPI0_0
        ldr     q2, [x8, :lo12:.LCPI0_0]
        bif     v0.16b, v1.16b, v2.16b
        ret

not sure what's going wrong there. Running instcombine on its own, we get identical IR, so canonicalization (to shufflevector) is happening there: https://godbolt.org/z/xjf6WhrdP
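
For comparison, the two mask-based sequences in this thread (bsl in the original report, bif in the current output) can be modelled as below. This is a sketch assuming the usual bitwise definitions (BSL selects using the destination as the mask, BIF inserts Vn bits where Vm is 0), applied per 32-bit lane for readability; the lane values are arbitrary test data.

```python
LANE_MASK = 0xFFFFFFFF


def bsl(vd, vn, vm):
    """BSL Vd, Vn, Vm: Vd is the mask; take Vn bits where set, Vm bits where clear."""
    return [((d & n) | (~d & m)) & LANE_MASK for d, n, m in zip(vd, vn, vm)]


def bif(vd, vn, vm):
    """BIF Vd, Vn, Vm: insert Vn bits into Vd where the Vm bit is 0."""
    return [((d & m) | (n & ~m)) & LANE_MASK for d, n, m in zip(vd, vn, vm)]


a = [0x11111111, 0x22222222, 0x33333333, 0x44444444]      # %a in v0
b = [0xAAAAAAAA, 0xBBBBBBBB, 0xCCCCCCCC, 0xDDDDDDDD]      # %b in v1
const_mask = [0xFFFFFFFF, 0x0, 0x0, 0xFFFFFFFF]           # .LCPI0_0 in v2

wanted = [a[0], b[1], b[2], a[3]]
print(bsl(const_mask, a, b) == wanted)  # True: bsl v2.16b, v0.16b, v1.16b
print(bif(a, b, const_mask) == wanted)  # True: bif v0.16b, v1.16b, v2.16b
```

Both forms compute the select correctly, but each needs the mask loaded from the constant pool (adrp + ldr), which the insert-based shuffle lowering avoids.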

@c-rhodes
Collaborator

c-rhodes commented May 1, 2025

Argh, I realise my mistake now. InstCombine is happening, but it's a middle-end pass and is not run by llc! 🤦

So the select is canonicalized to a shuffle and the code-generation for the shuffle looks good, so this one can be closed.

@c-rhodes c-rhodes closed this as completed May 1, 2025