[AArch64] Failure to fold lsr or asr into cmp #122380

Open
Kmeakin opened this issue Jan 9, 2025 · 2 comments

Kmeakin commented Jan 9, 2025

https://godbolt.org/z/rMMbbfMdW

Instead of performing an lsr/asr and then comparing the result against zero, the shift can be performed as part of a cmp against xzr:

src:
        lsr     x8, x0, #32
        cmp     x8, #0
        cset    w0, ne
        ret

tgt:
        cmp     xzr, x0, lsr 32
        cset    w0, ne
        ret
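
The Compiler Explorer source is not reproduced in this thread, so the following is only a guess at the kind of C input that yields the src sequence above. The function names are hypothetical. The signed variant assumes that right-shifting a negative value is an arithmetic shift, which C leaves implementation-defined but Clang and GCC provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reproducer: logical shift right by 32, then compare with zero.
   Per the report, this is emitted as lsr + cmp #0 + cset instead of one cmp. */
bool lsr_ne_zero(uint64_t x) {
    return (x >> 32) != 0;
}

/* Hypothetical signed counterpart, which would use asr instead of lsr.
   Assumes an arithmetic right shift for negative values. */
bool asr_ne_zero(int64_t x) {
    return (x >> 32) != 0;
}
```

Both functions only ask whether anything remains after the shift, so the shifted value itself is never needed in a register.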

LLVM already performs this fold for shift amounts <= 31, and for lsl it finds a different single-instruction form using tst. It also appears to perform the fold when comparing against a variable instead of 0. My guess is that the fold fails for a comparison against 0 because that comparison can be represented either as cmp x0, #0 or as cmp xzr, x0.
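
For anyone wanting to compare the cases mentioned above side by side, here is a rough sketch of C inputs that should exercise them. The function names are hypothetical, and the instruction selection noted in the comments is what the report describes, not something verified here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shift amount <= 31: reported to fold into the compare already. */
bool lsr31_ne_zero(uint64_t x) {
    return (x >> 31) != 0;
}

/* Left shift compared with zero: reported to become a single tst. */
bool lsl32_ne_zero(uint64_t x) {
    return (x << 32) != 0;
}

/* Shift compared against a variable: reported to fold the shift into cmp. */
bool lsr32_ne_var(uint64_t x, uint64_t y) {
    return (x >> 32) != y;
}
```

Compiling these for an AArch64 target at -O2 and diffing the output against the failing case should show which patterns are already handled.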

llvmbot commented Jan 9, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Karl Meakin (Kmeakin)

@efriedma-quic

tst x0, #0xffffffff00000000 is better than cmp xzr, x0, lsr 32; the latter has an extra cycle of latency on many chips.
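
The tst form works because a 64-bit value shifted right by 32 is nonzero exactly when one of its upper 32 bits is set. A minimal sketch of that equivalence at the source level, with hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Shift form from the original report: (x >> 32) != 0. */
bool high_half_nonzero_shift(uint64_t x) {
    return (x >> 32) != 0;
}

/* Mask form: tests the upper 32 bits directly, which corresponds to
   tst x0, #0xffffffff00000000 followed by cset ne. */
bool high_half_nonzero_mask(uint64_t x) {
    return (x & UINT64_C(0xffffffff00000000)) != 0;
}
```

The same mask also covers the asr case when only equality with zero is tested, since an arithmetic shift by 32 yields zero only when the upper 32 bits are all clear.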
