[MLIR] Switch the ABI to use bare pointers and not memrefs#1630
jerryyin
left a comment
Adding a blocker in case it gets merged prematurely.
For one thing, this has to wait until MIOpen branches for ROCm 5.3.
For another, this will break the existing kernel ABI and require MIOpen to regenerate the entire kernel database.
@atamazov Based on our past discussion on backward compatibility, it looks like we'd need to keep both implementations and macro out the old one.
|
My sense of the backwards compatibility situation is that a given MIOpen release is compatible with exactly the corresponding MLIR release and there are no guarantees outside of that? |
|
@jerryyin ... Ah, yeah, in which case, the change is that |
Exactly, actually we already have that version defined. You just need to make sure to bump it in your MLIR PR. Then eventually we can delete the macroed code when ROCm 5.2 is deprecated. Thinking about it, I don't think you need to delete most of the memref-related calculations; you only need to macro out the kernel launching part. Those memref calculations could still be useful if we ever get dynamic kernel support. |
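To make the "macro out the kernel launching part" idea concrete, here is a minimal sketch of a version-gated ABI switch. Everything named here is an assumption for illustration: `EXAMPLE_MIIR_VERSION`, `LaunchAbi`, `SelectLaunchAbi`, and `ArgBytesPerTensor` are hypothetical and are not MIOpen or rocMLIR identifiers. The only things taken from this thread are that version 5 is the old memref-based ABI and version 6 is the bare-pointer one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the version number exported by the MLIR side.
// The real macro name is not shown in this thread.
#ifndef EXAMPLE_MIIR_VERSION
#define EXAMPLE_MIIR_VERSION 6
#endif

enum class LaunchAbi
{
    MemRefDescriptors, // old convention (v5 and older in this thread)
    BarePointers       // new convention (v6 in this thread)
};

// Only the launch path is gated on the version; shape and stride
// calculations can stay in place for both conventions.
constexpr LaunchAbi SelectLaunchAbi()
{
#if EXAMPLE_MIIR_VERSION >= 6
    return LaunchAbi::BarePointers;
#else
    return LaunchAbi::MemRefDescriptors;
#endif
}

// Kernel-argument bytes per tensor, assuming a full strided memref
// descriptor is two pointers, one offset, and `rank` sizes and strides.
constexpr std::size_t ArgBytesPerTensor(LaunchAbi abi, std::size_t rank)
{
    return abi == LaunchAbi::BarePointers
               ? sizeof(void*)
               : 2 * sizeof(void*) + (1 + 2 * rank) * sizeof(std::int64_t);
}
```

With the default define above, `SelectLaunchAbi()` picks the bare-pointer path; building with `-DEXAMPLE_MIIR_VERSION=5` would select the descriptor path instead, which is roughly what keeping the old code behind a macro would look like.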
|
Won't we get dead code complaints if we don't delete it? |
|
Update, yeah, |
I don't think there's enough time to re-tune the entire benchmark space and regenerate the kernel database. @JehandadKhan Could you correct me? How much time do you have until the ROCm 5.3 branch to rerun the MLIR solvers with all supported benchmarks? @krzysz00 The answer is likely no. What that means is that even if we shipped this in the invoker, a customer would get a large number of segfaults by trying to launch an old kernel from the new invoker. |
|
Fair enough, I've got no real objections to waiting until MIOpen fires off its release branch and so we can merge breaking changes again. |
|
(Though to rephrase, what I meant was that, in ROCm 5.3, |
|
@krzysz00 That's true, but then the newly added changes wouldn't really get regression-tested at all from the MIOpen end (if you don't bump the MLIR commit). I'd be fine with you pushing this in (as ROCm 5.3) as long as:
|
|
We could also wait until MIOpen branches - feels much simpler. |
68b0705 to
b3fd9fa
|
@junliume May I ask when MIOpen will branch for ROCm 5.3? This pull request is a breaking change, so we will proceed with PR review once MIOpen is ready for the next release. |
src/conv/invokers/mlir_impl_gemm.cpp
Outdated
[Question] Are we going to deprecate MIIR version 5 (and older) and when?
We can deprecate this, at the soonest, after the change lands on both sides, and at the latest, after 5.4 goes out, I'd say. Because 5.4 would have the v6 ABI, which means all kernels tuned for 5.4 would be using the new ABI and so we couldn't get weird memory faults from a kernel expecting the v5 ABI being cached somewhere.
Is it, really? Can you please show where the breakage happens, in our code? Thanks! |
If we check the MIIR version and enable new features only when they are available, what can go wrong? Seems like I'm missing something... |
This PR as-is does not break anything. But if we ship it as-is, it means a decent chunk of untested code will go into MIOpen. Are you okay with that?
That sounds fair. I think the goal was to ship the feature all together in the next release. But you are right that if we unblock this PR, it allows MLIR development to continue. Until the next release, we'd only need to bump the commit pointer. |
|
The way things can break is if there's a compiled kernel binary from MLIR sitting around in a cache somewhere that was generated by a v5 MLIR commit and that is being run by a MIOpen compiled against a v6 MLIR commit. Then the v6 code won't set the arguments to the v5 binary correctly and segfaults happen. (And this goes the other way: if you compile a binary under v6 and then run it under v5, you'll see breakage.) |
Yes, the kernel db will completely break for MLIR when MIOpen switches to a newer MLIR commit. However, it isn't a problem as long as MIOpen uses the old commit with the older compatibility number (which still tries to use the interface from the macroed implementation). |
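The mismatch described above can be sketched with a toy argument buffer. This assumes the old ABI passes each tensor as a full strided memref descriptor in the layout MLIR's default LLVM lowering uses (allocated pointer, aligned pointer, offset, sizes, strides) and the new ABI passes one bare pointer per tensor. The struct and helper names (`MemRefDescriptor`, `PackDescriptors`, `PackBarePointers`, `ReadPointerArg`) are hypothetical and are not the code in `mlir_impl_gemm.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Rank-N strided memref descriptor (assumed layout, see lead-in).
template <std::size_t Rank>
struct MemRefDescriptor
{
    void* allocated;
    void* aligned;
    std::int64_t offset;
    std::int64_t sizes[Rank];
    std::int64_t strides[Rank];
};

// Old-style packing: one full descriptor per tensor, back to back.
template <std::size_t Rank>
std::vector<unsigned char> PackDescriptors(const std::vector<MemRefDescriptor<Rank>>& descs)
{
    std::vector<unsigned char> bytes(descs.size() * sizeof(MemRefDescriptor<Rank>));
    if(!descs.empty())
        std::memcpy(bytes.data(), descs.data(), bytes.size());
    return bytes;
}

// New-style packing: one bare pointer per tensor.
std::vector<unsigned char> PackBarePointers(const std::vector<void*>& ptrs)
{
    std::vector<unsigned char> bytes(ptrs.size() * sizeof(void*));
    if(!ptrs.empty())
        std::memcpy(bytes.data(), ptrs.data(), bytes.size());
    return bytes;
}

// What a kernel compiled for the bare-pointer ABI would read as its
// index-th argument: it just steps through the buffer pointer by pointer.
void* ReadPointerArg(const std::vector<unsigned char>& args, std::size_t index)
{
    assert((index + 1) * sizeof(void*) <= args.size());
    void* p = nullptr;
    std::memcpy(&p, args.data() + index * sizeof(void*), sizeof(void*));
    return p;
}
```

If a host that packs descriptors launches a kernel that expects bare pointers, the kernel's second "pointer" is really the first tensor's aligned pointer, and its third is an offset field. In the opposite direction, a kernel expecting descriptors reads past the three bare pointers entirely. Either way the kernel dereferences something that is not the buffer it expects, which would explain the segfaults discussed here.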
b3fd9fa to
871c036
|
Merging the MLIR side of this requires #1673 to land first |
|
@krzysz00 In case you don't know, if you want a PR to merge in MIOpen it has to satisfy |
This is a companion PR to ROCm/rocMLIR#690 and should not be merged until that lands. These changes will need to be revisited if the MLIR solver ever supports dynamic shapes.
84e7d4d to
0bb6948
|
@DrizztDoUrden @jerryyin , the approval seems to have disappeared and I have a CI pass - can one of y'all re-review and merge? |
|
@JehandadKhan , @junliume could one of y'all merge this? |
|
@atamazov @JehandadKhan is the discussion on breaking changes concluded? I need your reviews here. For example, will MLIR-generated kernels in previous KDBs be compatible before and after this PR? |
|
If y'all're keeping around generated kernel binaries, then if you compile against a MLIR that's landed the corresponding ABI change PR and try to run those binaries, they'll be called incorrectly (pointers will be placed in unexpected places). |
|
(This is somewhat related to @whchung 's work on a faster launch ABI, and so I think the cache invalidation is worth it) |
|
@junliume @JehandadKhan @atamazov To clarify the cache invalidation
It's my understanding ( @jerryyin ) that we'll need to re-tune the kernels for 5.4 anyway due to the changes you've made to reduction xdlops support, and so I don't think this introduces any additional workload. |
Hint to @iq136boy @cderb @JehandadKhan about ROCm 5.4 KDB re-generation due to backward incompatibility with ROCm 5.3 |
This is a companion PR to
ROCm/rocMLIR#690 and
should not be merged until that lands.
These changes will need to be revisited if the MLIR solver ever
supports dynamic shapes.
(I'm not entirely sure what to do about the unused args warnings)