
krzysz00 · 2022-07-11T22:21:54Z

This is a companion PR to ROCm/rocMLIR#690 and should not be merged until that lands.

These changes will need to be revisited if the MLIR solver ever supports dynamic shapes.

(I'm not entirely sure what to do about the unused-argument warnings.)

jerryyin

Adding a blocker in case it gets merged prematurely.

On the one hand, this has to wait until MIOpen branches for ROCm 5.3.

On the other hand, this will break the existing kernel ABI and require MIOpen to regenerate its entire kernel database.

@atamazov Based on our past discussion on backward compatibility, it looks like we'd need to keep both implementations and macro out the old one.

krzysz00 · 2022-07-12T01:15:49Z

My sense of the backwards compatibility situation is that a given MIOpen release is compatible with exactly the corresponding MLIR release and there are no guarantees outside of that?

jerryyin · 2022-07-12T14:03:42Z

@krzysz00 Refer to the discussion #1558

krzysz00 · 2022-07-12T15:27:56Z

@jerryyin ... Ah, yeah, in which case, the change is that Miir.h defines something like MIIR_VERSION and we use a #if defined(MIIR_VERSION) && MIIR_VERSION >= 2 to enable the new ABI?

jerryyin · 2022-07-12T15:35:31Z

in which case, the change is that Miir.h defines something like MIIR_VERSION

Exactly; we actually already have that version defined. You just need to make sure to bump it in your MLIR PR. Then we can eventually delete the macroed-out code once ROCm 5.2 is deprecated.

Come to think of it, you don't need to delete most of the memref-related calculations; you only need to macro out the kernel-launching part. Those memref calculations could still be useful if we ever get dynamic kernel support.
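As a rough illustration of that suggestion (keep the memref bookkeeping, guard only the launch-argument packing), here is a minimal C++ sketch. It assumes a version macro named `MIIR_VERSION_FLAT` (the name mentioned later in this thread) where a value of 6 or more selects the bare-pointer ABI, and that the old ABI passes each tensor as an unpacked MLIR memref descriptor. `MemRefDesc`, `MakeDescriptor`, and `PackKernelArgs` are hypothetical names, not MIOpen's actual invoker code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fallback; in a real build the version would come from Miir.h.
#ifndef MIIR_VERSION_FLAT
#define MIIR_VERSION_FLAT 6
#endif

// Unpacked memref descriptor. Still computed under the new ABI, since it
// could be reused if dynamic shapes are ever supported.
struct MemRefDesc {
    void* allocated;
    void* aligned;
    std::int64_t offset;
    std::vector<std::int64_t> sizes;
    std::vector<std::int64_t> strides;
};

// Builds a descriptor for a contiguous, row-major tensor.
MemRefDesc MakeDescriptor(void* ptr, const std::vector<std::int64_t>& sizes) {
    assert(!sizes.empty());
    MemRefDesc d{ptr, ptr, 0, sizes, std::vector<std::int64_t>(sizes.size(), 1)};
    // Innermost stride is 1; each outer stride is the product of inner sizes.
    for (std::size_t i = sizes.size() - 1; i > 0; --i)
        d.strides[i - 1] = d.strides[i] * sizes[i];
    return d;
}

// Packs kernel arguments as 8-byte slots. Only this part is version-guarded.
std::vector<std::uint64_t> PackKernelArgs(const std::vector<MemRefDesc>& tensors) {
    std::vector<std::uint64_t> args;
    for (const MemRefDesc& t : tensors) {
#if MIIR_VERSION_FLAT >= 6
        // Assumed new (v6) ABI: one bare data pointer per tensor.
        args.push_back(reinterpret_cast<std::uintptr_t>(t.aligned));
#else
        // Assumed old (v5) ABI: the whole descriptor, field by field.
        args.push_back(reinterpret_cast<std::uintptr_t>(t.allocated));
        args.push_back(reinterpret_cast<std::uintptr_t>(t.aligned));
        args.push_back(static_cast<std::uint64_t>(t.offset));
        for (std::int64_t s : t.sizes) args.push_back(static_cast<std::uint64_t>(s));
        for (std::int64_t s : t.strides) args.push_back(static_cast<std::uint64_t>(s));
#endif
    }
    return args;
}
```

With the fallback value of 6, each tensor contributes a single slot; building with `MIIR_VERSION_FLAT` set to 5 switches back to the full descriptor. `MakeDescriptor` stays live in both builds, which sidesteps the dead-code concern for the shared memref calculations.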

krzysz00 · 2022-07-12T16:07:52Z

Won't we get dead code complaints if we don't delete it?

krzysz00 · 2022-07-12T18:24:53Z

Update: yeah, #ifdef works. We could probably land this now, since MIOpen 5.3 will still see the old version?

jerryyin · 2022-07-12T18:32:40Z

We could probably land this now, since MIOpen 5.3 will still see the old version?

I don't think there's enough time to re-tune the entire benchmark space and regenerate the kernel database. @JehandadKhan Could you correct me if I'm wrong? How much time do you have before the ROCm 5.3 branch to rerun the MLIR solvers with all supported benchmarks?

@krzysz00 The answer is likely no. That means that even if we shipped this in the invoker, customers would hit a mass of segfaults from launching old kernels through the new invoker.

krzysz00 · 2022-07-12T18:33:57Z

Fair enough, I've got no real objections to waiting until MIOpen fires off its release branch and so we can merge breaking changes again.

krzysz00 · 2022-07-12T18:40:56Z

(Though to rephrase, what I meant was that, in ROCm 5.3, MIIR_VERSION_FLAT will still be 5, so none of this will matter.)

jerryyin · 2022-07-12T18:46:23Z

@krzysz00 That's true, but then the newly added changes aren't really regression-tested at all on the MIOpen end (if you don't bump the MLIR commit). I'd be fine with you pushing this in (for ROCm 5.3) as long as:

You don't bump the MLIR commit, so it won't break anything on our release branch
You finish whatever is necessary for the MIOpen repo to approve and merge it

krzysz00 · 2022-07-12T18:50:37Z

We could also wait until MIOpen branches - feels much simpler.

jerryyin · 2022-08-03T13:19:56Z

@junliume May I ask when MIOpen will branch for ROCm 5.3? This pull request is a breaking change, so we will proceed with the PR review once MIOpen is ready for the next release.

atamazov · 2022-08-03T15:22:09Z

src/conv/invokers/mlir_impl_gemm.cpp

[Question] Are we going to deprecate MIIR version 5 (and older) and when?

I'd say we can deprecate this at the soonest after the change lands on both sides, and at the latest after 5.4 goes out. 5.4 would have the v6 ABI, which means all kernels tuned for 5.4 would use the new ABI, so we couldn't get weird memory faults from a cached kernel that expects the v5 ABI.

atamazov · 2022-08-03T15:24:10Z

@jerryyin

This pull request is a breaking change...

Is it really? Can you please show where the breakage happens in our code? Thanks!

atamazov · 2022-08-03T15:30:23Z

@krzysz00

This is a companion PR to ROCmSoftwarePlatform/llvm-project-mlir#690 and should not be merged until that lands.

If we check the MIIR version and enable new features only when they are available, what can go wrong? Seems like I'm missing something...

jerryyin · 2022-08-03T15:37:14Z

Is it, really? Can you please show where the breakage happens

This PR as-is does not break anything. But if we ship it as-is, a decent chunk of untested code will go into MIOpen. Are you okay with that?

If we check the MIIR version and enable new features only when they are available

That sounds fair. I think the goal was to ship the feature altogether in the next release. But you are right that unblocking this PR allows MLIR development to continue. Until the next release, we'd only need to bump the commit pointer.

jerryyin

LGTM

krzysz00 · 2022-08-03T18:08:03Z

The way things can break is if there's a compiled kernel binary from MLIR sitting around in a cache somewhere that was generated by a v5 MLIR commit and is being run by an MIOpen compiled against a v6 MLIR commit. Then the v6 code won't set up the arguments for the v5 binary correctly, and segfaults happen.

(and this goes the other way - if you compile a binary under v6 and then run it under v5, you'll see breakage)
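To make the failure mode concrete, the sketch below counts 8-byte argument slots under each ABI. It assumes the v5 ABI passes every tensor as MLIR's default unpacked memref descriptor (allocated pointer, aligned pointer, offset, `rank` sizes, `rank` strides) and the v6 ABI passes a single bare pointer per tensor; `SlotsPerTensor` and `DataPtrSlot` are hypothetical helpers, and the rank-4 example is illustrative only.

```cpp
#include <cstddef>

// Number of 8-byte argument slots one tensor occupies (assumed layouts).
//   v6, bare-pointer ABI: a single data pointer.
//   v5, memref ABI: allocated ptr, aligned ptr, offset, `rank` sizes, `rank` strides.
constexpr std::size_t SlotsPerTensor(std::size_t rank, bool bare_ptr_abi) {
    return bare_ptr_abi ? 1 : 3 + 2 * rank;
}

// Slot from which a kernel built for the given ABI reads the (aligned)
// data pointer of tensor `k`, assuming all tensors share the same rank.
constexpr std::size_t DataPtrSlot(std::size_t k, std::size_t rank, bool bare_ptr_abi) {
    return k * SlotsPerTensor(rank, bare_ptr_abi) + (bare_ptr_abi ? 0 : 1);
}

// Worked example with rank-4 tensors.
static_assert(SlotsPerTensor(4, false) == 11, "v5: 3 + 2*4 slots per tensor");
static_assert(DataPtrSlot(1, 4, true) == 1, "v6 invoker puts tensor 1's pointer in slot 1");
static_assert(DataPtrSlot(0, 4, false) == 1, "...where a v5 kernel expects tensor 0's pointer");
static_assert(DataPtrSlot(1, 4, false) == 12, "v5 kernel expects tensor 1's pointer in slot 12");
```

Under these assumptions, a v5 kernel reads its first data pointer from slot 1, which is exactly where a v6 invoker stores the second tensor's pointer, and it looks for the second tensor's pointer at slot 12, well past the few slots a v6 invoker fills. The reverse mismatch (v6 kernel, v5 invoker) misreads the descriptor fields as pointers in the same way.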

jerryyin · 2022-08-03T18:25:06Z

The way things can break is if there's a compiled kernel binary from MLIR sitting around in a cache

Yes, the kernel DB will completely break for MLIR when MIOpen switches to a newer MLIR commit. However, it isn't a problem as long as MIOpen stays on an older commit with the older compatibility number (which still uses the interface from the macro-guarded implementation).

krzysz00 · 2022-08-04T18:54:33Z

Merging the MLIR side of this requires #1673 to land first

DrizztDoUrden

LGTM

jerryyin · 2022-08-11T18:33:59Z

@krzysz00 In case you don't know, for a PR to merge in MIOpen it has to get a full CI pass as well as approval. That means you'll have to babysit the MIOpen CI for your PR. I've kicked off a rerun of the stage for you; I'll leave it to you to monitor it until it fully passes so we're in a ready state.


krzysz00 · 2022-08-19T18:46:49Z

@DrizztDoUrden @jerryyin , the approval seems to have disappeared and I have a CI pass - can one of y'all re-review and merge?

krzysz00 · 2022-08-19T20:06:42Z

@JehandadKhan , @junliume could one of y'all merge this?

junliume · 2022-08-19T20:20:41Z

@atamazov @JehandadKhan Has the discussion on breaking changes concluded? We need your reviews here. For example, will MLIR-generated kernels in previous KDBs be compatible both before and after this PR?

krzysz00 · 2022-08-19T21:49:48Z

If y'all are keeping generated kernel binaries around and you compile against an MLIR that has landed the corresponding ABI change PR, then those binaries will be called incorrectly when you run them (pointers will be placed in unexpected places).

krzysz00 · 2022-08-19T21:51:16Z

(This is somewhat related to @whchung's work on a faster launch ABI, so I think the cache invalidation is worth it.)

krzysz00 · 2022-08-23T16:26:47Z

@junliume @JehandadKhan @atamazov To clarify the cache invalidation

The PR enabling the new ABI hasn't landed in MLIR and isn't in 5.3. So whatever y'all currently have will work
This PR doesn't bump the MLIR commit to one that uses the new ABI (since there isn't one yet) and so merging it won't invalidate anything
Once both PRs land (which I want to see happen soon) and the commit pointer is updated to a commit that includes the new ABI, all cached kernel binaries (but not tuning settings, etc.) will become incompatible, as the code will try to call them using the v6 calling convention even though they were generated by an MLIR that made them expect v5's argument layout.

It's my understanding (@jerryyin) that we'll need to re-tune the kernels for 5.4 anyway due to the changes you've made to reduction xdlops support, so I don't think this introduces any additional workload.

junliume · 2022-08-24T00:52:27Z

@junliume @JehandadKhan @atamazov To clarify the cache invalidation

The PR enabling the new ABI hasn't landed in MLIR and isn't in 5.3. So whatever y'all currently have will work

This PR doesn't bump the MLIR commit to one that uses the new ABI (since there isn't one yet) and so merging it won't invalidate anything

Once both PRs land (which I want to see happen soon) and once the commit pointer is updated to a commit including the new ABI, all cached kernel binaries (but not tuning settings, etc) will become incompatible, as the code will try to call them using the v6 calling convention even though they were generated by a MLIR that made them expect v5's argument layout.

It's my understanding ( @jerryyin ) that we'll need to re-tune the kernels for 5.4 anyway due to the changes you've made to reduction xdlops support, and so I don't think this introduces any additional workload.

Heads-up to @iq136boy @cderb @JehandadKhan about ROCm 5.4 KDB regeneration due to backward incompatibility with ROCm 5.3.

krzysz00 requested a review from jerryyin July 11, 2022 22:21

jerryyin requested changes Jul 11, 2022


krzysz00 marked this pull request as draft July 12, 2022 18:50

This comment was marked as off-topic.


junliume added the ON_HOLD label Jul 27, 2022

krzysz00 force-pushed the bare-ptr-api-miir branch from 68b0705 to b3fd9fa August 2, 2022 21:07

atamazov reviewed Aug 3, 2022


jerryyin previously approved these changes Aug 3, 2022


krzysz00 marked this pull request as ready for review August 3, 2022 18:05

krzysz00 dismissed jerryyin’s stale review via 871c036 August 4, 2022 15:46

krzysz00 force-pushed the bare-ptr-api-miir branch from b3fd9fa to 871c036 August 4, 2022 15:46

jerryyin self-requested a review August 4, 2022 19:07

jerryyin previously approved these changes Aug 4, 2022


DrizztDoUrden previously approved these changes Aug 9, 2022


zhanglx13 mentioned this pull request Aug 11, 2022

Include libMLIRMIOpen into all target ROCm/rocMLIR#725

Merged

[mlir] Switch the ABI to use bare pointers and not memrefs

0bb6948

This is a companion PR to ROCm/rocMLIR#690 and should not be merged until that lands. These changes will need to be revisited if the MLIR solver ever supports dynamic shapes.

krzysz00 dismissed stale reviews from DrizztDoUrden and jerryyin via 0bb6948 August 18, 2022 16:41

krzysz00 force-pushed the bare-ptr-api-miir branch from 84e7d4d to 0bb6948 August 18, 2022 16:41

krzysz00 removed the ON_HOLD label Aug 19, 2022

jerryyin self-requested a review August 19, 2022 20:03

jerryyin approved these changes Aug 19, 2022


jerryyin added tuning TESTING_CI_PASSED labels Aug 22, 2022

jerryyin requested review from DrizztDoUrden and atamazov August 23, 2022 17:14

junliume changed the title ~~[mlir] Switch the ABI to use bare pointers and not memrefs~~ [MLIR] Switch the ABI to use bare pointers and not memrefs Aug 24, 2022

junliume merged commit e18c938 into develop Aug 24, 2022

krzysz00 deleted the bare-ptr-api-miir branch August 24, 2022 16:16


