Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a number of additional APIs to the various SIMD accelerated vector types #111179

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Jan 8, 2025

This resolves #103845 and #98055

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@tannergooding tannergooding marked this pull request as ready for review January 8, 2025 20:44
@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib there's some JIT side changes for review/approval here

The summary of changes is:

  • Vector2/3/4 continue to be implemented entirely in managed
  • Vector continues to be a thin redirect in importer to Vector128/256/512
  • Vector64/128/256/512 continue to be accelerated where applicable

I added support for the new Is* APIs to the JIT, keeping inline where we already had related handling/acceleration. I did not yet add the All/Any/Count/IndexOf/LastIndexOf/None APIs to the JIT yet as that is a slightly more involved change. The reason it's slightly more involved is because these are special APIs that we can optimize for Arm64 where no direct ExtractMostSignificantBits API exists; which is notably part of the reason these new helper APIs exist.

@vitek-karas
Copy link
Member

@jkurdek could you please review the mono parts? Do we need to update other codegens for this as well?

Copy link
Member

@jkurdek jkurdek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this change doesn't implement the new APIs on mono side, it just reverts to managed. @tannergooding do you plan to bring those in to mono in this PR?

@@ -6525,6 +6562,10 @@ emit_intrinsics (MonoCompile *cfg, MonoMethod *cmethod, MonoMethodSignature *fsi
MonoInst*
mono_emit_simd_intrinsics (MonoCompile *cfg, MonoMethod *cmethod, MonoMethodSignature *fsig, MonoInst **args)
{
if (!has_intrinsic_cattr (cmethod)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the impact of this is going to be. if we had some intrinsics for methods which weren't marked [Intrinsic] before now they gonna experience slow downs. Have you benchmarked this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it match the behavior that RyuJiT has. Anything that isn’t marked Intrinisic is not intended to be handled as such and is likely a Mono bug

If, after merge, there are regressions then we should look at those cases individually and determine if the attribute should be added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mono may have had intrinsics for a few non-Intrinsic methods previously for performance reasons, but I think it's valid to do this and see how it shakes out as long as we pay attention to the benchmarks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mono may have had intrinsics for a few non-Intrinsic methods previously for performance reasons

👍 We should definitely make sure such cases are known and mark them as [Intrinsic] in those scenarios instead. It massively simplifies things and should also improve Mono throughput, as we won't be doing unnecessary work for every single API anymore.

If it's truly unique to Mono, then we can even do the following; however I expect most places that Mono handles should also be handled on RyuJIT or at least get the inlining profitability boost from the attribute:

#if MONO
[Intrinsic]
#endif

Assuming this PR is merged before the weekend, we should catch any such regressions next Tuesday and be able to trivially root cause the smaller number of samples so we can know what should be attributed.

-- Notably this is being done in this PR for correctness reasons; there are cases where Mono has "bad assumptions" about signatures and where it was trying to treat an API named X as an intrinsic for X, even when the signature is a different overload all together and was designed to mean something different. This was the case for some APIs like Vector.Store which exposes extension methods for Vector<T> and also Vector2/3/4 as an example.

@tannergooding
Copy link
Member Author

Currently this change doesn't implement the new APIs on mono side, it just reverts to managed. @tannergooding do you plan to bring those in to mono in this PR?

It implements a few, but not all of them. I plan on finishing up the other new Is*() APIs before getting this merged.

For the other APIs, they would be handled at the same time as support is added to RyuJIT, which as per the top post is not being done yet.

@steveisok steveisok requested a review from kg January 9, 2025 16:52
@kg
Copy link
Member

kg commented Jan 9, 2025

Happy to see a lot of these additions, should cut down on the amount of per-architecture code necessary to solve problems.

@tannergooding
Copy link
Member Author

It implements a few, but not all of them. I plan on finishing up the other new Is*() APIs before getting this merged.

I've added handling in a few more cases where it is trivial to do so. However, there are still a couple of the helpers that are not so trivial to do in this PR.

Namely we're missing a centralized helper for some concepts like Abs(x), BroadcastConstant(cns), and Truncate(x). Given that making those centralized is a bit more involved, I'd prefer to log an issue and do that in a follow up PR (one that I'd immediately start working on) if that is alright with the Mono team -- CC. @kg, @jkurdek

-- Such internal helpers is notably what the gtNewSimdAbsNode and similar APIs are for in RyuJIT. They simply accelerating other helper intrinsics by ensuring that we don't need to duplicate the IR (or MonoInst) creation logic all over the place. Mono has a few of these, like emit_xones and I've added a few others ad-hoc as they've been needed, but it isn't nearly as "complete" yet; I'd like the follow up PR to make sure we have such helpers for all the foundational concepts to maximize our ability to accelerate Mono at the same time as RyuJIT moving forward.

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (non-Mono parts), although, it seems like managed implementations for these new APIs are trivial enough (for integers) to stay managed only?

@tannergooding
Copy link
Member Author

although, it seems like managed implementations for these new APIs are trivial enough (for integers) to stay managed only?

Potentially. But there was also pushback when I had previously tried to move all the code, whether trivial or not, into managed.

We had essentially decided at that time that cases that are trivial to handle in the JIT and which are expected to be fairly typical should be done in the JIT; which is why most of the paths were added for the Is* APIs directly (but a couple, namely IsEvenInteger/IsOddInteger for float, which are more complex and less likely remain in managed).

The All/Any/None and Count/IndexOf/LastIndexOf are then case we're not handling in this PR, but which we do want to accelerate because of specialized codegen we want on Arm64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[API Proposal]: Additional vectorized functions
5 participants