[MLIR] Switch the ABI to use bare pointers and not memrefs #1630

Merged
junliume merged 1 commit into develop from bare-ptr-api-miir on Aug 24, 2022

@krzysz00
Contributor

This is a companion PR to
ROCm/rocMLIR#690 and
should not be merged until that lands.

These changes will need to be revisited if the MLIR solver ever
supports dynamic shapes.

(I'm not entirely sure what to do about the unused args warnings)

@krzysz00 krzysz00 requested a review from jerryyin July 11, 2022 22:21
Member

@jerryyin jerryyin left a comment

Adding a blocker in case it gets merged prematurely.

On the one hand, this has to wait until MIOpen branches for ROCm 5.3.

On the other hand, this will break the existing kernel ABI and require MIOpen to regenerate the entire kernel database.

@atamazov Based on our past discussion on backward compatibility, it looks like we'd need to keep both implementations and macro out the old one.

@krzysz00
Contributor Author

My sense of the backwards compatibility situation is that a given MIOpen release is compatible with exactly the corresponding MLIR release and there are no guarantees outside of that?

@jerryyin
Member

@krzysz00 Refer to the discussion #1558

@krzysz00
Contributor Author

@jerryyin ... Ah, yeah, in which case, the change is that Miir.h defines something like MIIR_VERSION and we use a #if defined(MIIR_VERSION) && MIIR_VERSION >= 2 to enable the new ABI?
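
A minimal sketch of the version gate proposed above, for illustration only. It assumes the version macro is the `MIIR_VERSION_FLAT` mentioned later in this thread and that 6 is the first version with the bare-pointer ABI. The fallback `#define`, `MIIR_BARE_POINTER_ABI_VERSION`, and `KernelArgCount` are hypothetical names, not taken from Miir.h or MIOpen. The old-ABI count assumes MLIR's default expansion of a ranked memref descriptor into separate arguments.

```cpp
#include <cstddef>

// Hypothetical stand-in for the version macro that Miir.h would provide.
// In a real build the header defines it; this fallback exists only so the
// sketch compiles on its own.
#ifndef MIIR_VERSION_FLAT
#define MIIR_VERSION_FLAT 6
#endif

// Hypothetical name for the first version that uses the bare-pointer ABI.
#define MIIR_BARE_POINTER_ABI_VERSION 6

// Number of kernel arguments the invoker would pass for `numTensors`
// tensors of rank `rank` under whichever ABI the build selects.
inline std::size_t KernelArgCount(std::size_t numTensors, std::size_t rank)
{
#if defined(MIIR_VERSION_FLAT) && MIIR_VERSION_FLAT >= MIIR_BARE_POINTER_ABI_VERSION
    // New ABI: one bare pointer per tensor; the rank is not needed.
    (void)rank; // silences the unused-argument warning in this branch
    return numTensors;
#else
    // Old ABI (assumed): each memref expands to allocated pointer,
    // aligned pointer, offset, `rank` sizes, and `rank` strides.
    return numTensors * (3 + 2 * rank);
#endif
}
```

With the fallback value of 6, `KernelArgCount(3, 5)` takes the bare-pointer branch and returns 3. Building with `-DMIIR_VERSION_FLAT=5` would take the memref branch and return 39. The `(void)rank;` cast is one common way to handle an unused-argument warning in the branch that ignores a parameter.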

@jerryyin
Member

in which case, the change is that Miir.h defines something like MIIR_VERSION

Exactly, actually we already have that version defined. You just need to make sure to bump this in your MLIR PR. Then eventually we can delete the macroed code when ROCm 5.2 is deprecated.

Come to think of it, you don't need to delete most of the memref-related calculations; you only need to macro out the kernel-launching part. Those memref calculations could still be useful if we ever support dynamic kernels.

@krzysz00
Contributor Author

Won't we get dead code complaints if we don't delete it?

@krzysz00
Contributor Author

Update, yeah, #ifdef works. We could probably land this now, since MIOpen 5.3 will still see the old version?

@jerryyin
Member

jerryyin commented Jul 12, 2022

We could probably land this now, since MIOpen 5.3 will still see the old version?

I don't think there's enough time to re-tune the entire benchmark space and regenerate the kernel database. @JehandadKhan Could you correct me? How much time do you have until the ROCm 5.3 branch to rerun the MLIR solvers with all supported benchmarks?

@krzysz00 The answer is likely no. That means that even if we shipped this in the invoker, a customer would get a large number of segfaults by trying to launch an old kernel from the new invoker.

@krzysz00
Contributor Author

Fair enough, I've got no real objections to waiting until MIOpen fires off its release branch and so we can merge breaking changes again.

@krzysz00
Contributor Author

(Though to rephrase, what I meant was that, in ROCm 5.3, MIIR_VERSION_FLAT will still be 5, so none of this will matter.)

@jerryyin
Member

jerryyin commented Jul 12, 2022

@krzysz00 That's true, but then the newly added changes won't really get regression-tested at all from the MIOpen end (if you don't bump the MLIR commit). I'd be fine with you pushing this in (as ROCm 5.3) as long as:

  • You don't bump MLIR commit, so it won't break anything with our release branch
  • You finish whatever necessary for MIOpen repo to approve and merge it

@krzysz00
Contributor Author

We could also wait until MIOpen branches - feels much simpler.

@krzysz00 krzysz00 marked this pull request as draft July 12, 2022 18:50

@jerryyin
Member

jerryyin commented Aug 3, 2022

@junliume May I ask when MIOpen will branch for ROCm 5.3? This pull request is a breaking change, so we will proceed with the PR review once MIOpen is ready for the next release.

Comment on lines +40 to +44
Contributor

[Question] Are we going to deprecate MIIR version 5 (and older) and when?

Contributor Author

We can deprecate this, at the soonest, after the change lands on both sides, and at the latest, after 5.4 goes out, I'd say. Because 5.4 would have the v6 ABI, which means all kernels tuned for 5.4 would be using the new ABI and so we couldn't get weird memory faults from a kernel expecting the v5 ABI being cached somewhere.

@atamazov
Contributor

atamazov commented Aug 3, 2022

@jerryyin

This pull request is a breaking change...

Is it, really? Can you please show where the breakage happens, in our code? Thanks!

@atamazov
Contributor

atamazov commented Aug 3, 2022

@krzysz00

This is a companion PR to ROCmSoftwarePlatform/llvm-project-mlir#690 and should not be merged until that lands.

If we check the MIIR version and enable new features only when they are available, what can go wrong? Seems like I'm missing something...

@jerryyin
Member

jerryyin commented Aug 3, 2022

Is it, really? Can you please show where the breakage happens

This PR as-is does not break anything. But if we ship it as-is, a decent chunk of untested code will go into MIOpen. Are you okay with that?

If we check the MIIR version and enable new features only when they are available

That sounds fair. I think the goal was to ship the feature all together in the next release. But you are right that if we unblock this PR, it allows MLIR development to continue. Until the next release, we'd only need to bump the commit pointer.

jerryyin previously approved these changes Aug 3, 2022
Member

@jerryyin jerryyin left a comment

LGTM

@krzysz00 krzysz00 marked this pull request as ready for review August 3, 2022 18:05
@krzysz00
Contributor Author

krzysz00 commented Aug 3, 2022

The way things can break is if there's a compiled kernel binary from MLIR sitting around in a cache somewhere that was generated by a v5 MLIR commit and is being run by a MIOpen compiled against a v6 MLIR commit. Then the v6 code won't set the arguments to the v5 binary correctly, and segfaults happen.

(and this goes the other way - if you compile a binary under v6 and then run it under v5, you'll see breakage)
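
To make the mismatch concrete, here is a rough sketch of the two argument layouts. The `MemRefDescriptor` struct follows MLIR's standard ranked memref descriptor (allocated pointer, aligned pointer, offset, sizes, strides). The packing helpers `PackMemRefArgs`, `PackBarePointerArgs`, and `SlotAt` are hypothetical and only approximate what an invoker does; they are not MIOpen's actual launch code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Standard MLIR ranked memref descriptor layout.
template <std::size_t Rank>
struct MemRefDescriptor
{
    void* allocated;
    void* aligned;
    std::int64_t offset;
    std::int64_t sizes[Rank];
    std::int64_t strides[Rank];
};

// v5-style packing (assumed): one full descriptor per tensor.
template <std::size_t Rank>
std::vector<unsigned char> PackMemRefArgs(const std::vector<MemRefDescriptor<Rank>>& tensors)
{
    std::vector<unsigned char> buf(tensors.size() * sizeof(MemRefDescriptor<Rank>));
    if(!buf.empty())
        std::memcpy(buf.data(), tensors.data(), buf.size());
    return buf;
}

// v6-style packing: one bare pointer per tensor.
inline std::vector<unsigned char> PackBarePointerArgs(const std::vector<void*>& ptrs)
{
    std::vector<unsigned char> buf(ptrs.size() * sizeof(void*));
    if(!buf.empty())
        std::memcpy(buf.data(), ptrs.data(), buf.size());
    return buf;
}

// Reads the pointer-sized slot at index `slot`, the way a kernel would
// read its N-th pointer argument from the packed buffer.
inline void* SlotAt(const std::vector<unsigned char>& buf, std::size_t slot)
{
    assert((slot + 1) * sizeof(void*) <= buf.size());
    void* p = nullptr;
    std::memcpy(&p, buf.data() + slot * sizeof(void*), sizeof(void*));
    return p;
}
```

For two rank-1 tensors `a` and `b`, slot 1 of the v6 buffer holds `b`, while slot 1 of the v5 buffer holds the aligned pointer of `a`. A binary compiled for one layout but launched with the other therefore reads pointers from the wrong slots, or reads past the end of the shorter buffer, which matches the segfaults described above.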

@jerryyin
Member

jerryyin commented Aug 3, 2022

The way things can break is if there's a compiled kernel binary from MLIR sitting around in a cache

Yes, the kernel db will completely break for MLIR when MIOpen switches to a newer MLIR commit. However, it isn't a problem as long as MIOpen uses the old commit with the older compatibility number (which still uses the interface from the macroed implementation).

@krzysz00
Contributor Author

krzysz00 commented Aug 4, 2022

Merging the MLIR side of this requires #1673 to land first

@jerryyin jerryyin self-requested a review August 4, 2022 19:07
jerryyin previously approved these changes Aug 4, 2022
DrizztDoUrden previously approved these changes Aug 9, 2022
Contributor

@DrizztDoUrden DrizztDoUrden left a comment

LGTM

@jerryyin
Member

@krzysz00 In case you don't know, a PR can only merge in MIOpen once it has both a full CI pass and an approval. That means you'll have to babysit the MIOpen CI for your PR. I have helped you by kicking off a rerun of the stage. I'll let you monitor for a full pass so we are in the ready state.

@krzysz00 krzysz00 dismissed stale reviews from DrizztDoUrden and jerryyin via 0bb6948 August 18, 2022 16:41
@krzysz00 krzysz00 removed the ON_HOLD label Aug 19, 2022
@krzysz00
Contributor Author

@DrizztDoUrden @jerryyin , the approval seems to have disappeared and I have a CI pass - can one of y'all re-review and merge?

@jerryyin jerryyin self-requested a review August 19, 2022 20:03
@krzysz00
Contributor Author

@JehandadKhan , @junliume could one of y'all merge this?

@junliume
Contributor

@atamazov @JehandadKhan is the discussion on breaking changes concluded? Need your reviews here. For example, MLIR generated kernels in previous KDBs, will they be compatible before and after this PR?

@krzysz00
Contributor Author

If y'all're keeping around generated kernel binaries, then if you compile against a MLIR that's landed the corresponding ABI change PR and try to run those binaries, they'll be called incorrectly (pointers will be placed in unexpected places).

@krzysz00
Contributor Author

(This is somewhat related to @whchung 's work on a faster launch ABI, and so I think the cache invalidation is worth it)

@krzysz00
Contributor Author

@junliume @JehandadKhan @atamazov To clarify the cache invalidation

  1. The PR enabling the new ABI hasn't landed in MLIR and isn't in 5.3. So whatever y'all currently have will work
  2. This PR doesn't bump the MLIR commit to one that uses the new ABI (since there isn't one yet) and so merging it won't invalidate anything
  3. Once both PRs land (which I want to see happen soon) and once the commit pointer is updated to a commit including the new ABI, all cached kernel binaries (but not tuning settings, etc) will become incompatible, as the code will try to call them using the v6 calling convention even though they were generated by a MLIR that made them expect v5's argument layout.

It's my understanding ( @jerryyin ) that we'll need to re-tune the kernels for 5.4 anyway due to the changes you've made to reduction xdlops support, and so I don't think this introduces any additional workload.
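
As a loose illustration of the third point, the sketch below keys cached binaries on an ABI version so that a lookup under a different version misses instead of returning a stale binary. This is not how MIOpen's kernel database is implemented; `KernelBinaryCache`, its methods, and the kernel name in the usage note are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical cache that stores compiled kernel binaries keyed by
// (kernel name, ABI version).
class KernelBinaryCache
{
public:
    void Store(const std::string& kernelName, int abiVersion, const std::string& binary)
    {
        entries_[std::make_pair(kernelName, abiVersion)] = binary;
    }

    // Returns a pointer to the cached binary, or nullptr when no binary was
    // stored for this exact ABI version. A miss forces a recompile rather
    // than launching a binary with the wrong calling convention.
    const std::string* Lookup(const std::string& kernelName, int abiVersion) const
    {
        const auto it = entries_.find(std::make_pair(kernelName, abiVersion));
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::pair<std::string, int>, std::string> entries_;
};
```

After `Store("conv_fwd", 5, ...)`, a `Lookup("conv_fwd", 6)` returns `nullptr`, so a v6 invoker would never launch the v5 binary. Tuning settings would be stored separately and stay valid, consistent with the point above that only the binaries become incompatible.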

@junliume
Contributor

@junliume @JehandadKhan @atamazov To clarify the cache invalidation

  1. The PR enabling the new ABI hasn't landed in MLIR and isn't in 5.3. So whatever y'all currently have will work
  2. This PR doesn't bump the MLIR commit to one that uses the new ABI (since there isn't one yet) and so merging it won't invalidate anything
  3. Once both PRs land (which I want to see happen soon) and once the commit pointer is updated to a commit including the new ABI, all cached kernel binaries (but not tuning settings, etc) will become incompatible, as the code will try to call them using the v6 calling convention even though they were generated by a MLIR that made them expect v5's argument layout.

It's my understanding ( @jerryyin ) that we'll need to re-tune the kernels for 5.4 anyway due to the changes you've made to reduction xdlops support, and so I don't think this introduces any additional workload.

Hint to @iq136boy @cderb @JehandadKhan about ROCm 5.4 KDB re-generation due to backward incompatibility with ROCm 5.3

@junliume junliume changed the title [mlir] Switch the ABI to use bare pointers and not memrefs [MLIR] Switch the ABI to use bare pointers and not memrefs Aug 24, 2022
@junliume junliume merged commit e18c938 into develop Aug 24, 2022
@krzysz00 krzysz00 deleted the bare-ptr-api-miir branch August 24, 2022 16:16