Conversation

jjsjann123
Collaborator

No description provided.

@jjsjann123
Collaborator Author

!test

github-actions bot commented Sep 9, 2025

Review updated until commit d8db005

Description

  • Fix allocation and indexing for layout ops

  • Prevent scheduling when non-indexable ops are consumed

  • Add tests for layout op with scheduler support

  • Improve device split validation and error handling


Changes walkthrough 📝

Relevant files

Bug fix (5 files)
  • fusion_segmenter.cpp: Set allocation domain for layout op inputs (+5/-0)
  • indexing.cpp: Skip resize in layout padding for safety (+6/-0)
  • allocations.cpp: Handle allocation vs logical sizes in tensor output (+43/-9)
  • domain_map.cpp: Ignore layout op offsets in domain mapping (+4/-0)
  • vectorize_helper.cpp: Handle non-device splits in vectorization (+10/-1)

Enhancement (7 files)
  • utils.cpp: Identify index select uses in layout op (+6/-0)
  • utils.cpp: Update device split validation to return bool (+18/-17)
  • registry.cpp: Block scheduling on non-indexable op consumers (+9/-2)
  • registry_utils.cpp: Add check for consumers of non-indexable ops (+11/-0)
  • utils.cpp: Skip cacheFork for layout op outputs (+4/-2)
  • utils.h: Update validateDeviceSplit to isValidateDeviceSplit (+1/-1)
  • registry_utils.h: Declare hasConsumerOfNonIndexableOps function (+2/-0)

Tests (1 file)
  • test_layout_op.cpp: Add multiple layout op scheduler tests (+161/-0)

Formatting (1 file)
  • test_low_precision_recipe.cpp: Fix typo in test name (+1/-1)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Possible Issue

The code adds handling for PreprocessGroupedMatmulInputSf in makeFusion, but the comment indicates uncertainty about safety ("TODO: check all uses are safe"). This requires validation to ensure the allocation domain setting is correct and safe for this operation type.

} else if (inp->isDefinitionType<PreprocessGroupedMatmulInputSf>()) {
  // There's no point of replaying allocation domain if we cannot index into TV anyway.
  // TODO: check all uses are safe
  auto* tv_ptr = clone_tv->as<TensorView>();
  tv_ptr->setAllocationDomain(tv_ptr->getLogicalDomain(), true);
Possible Issue

The TODO comment suggests that PreprocessGroupedMatmulInputSf outputs need to be in global memory or fusion output, but this check is not implemented. This could lead to scheduling issues if outputs are used in intermediate computations.

// TODO: check PreprocessGroupedMatmulInputSf's output is in global memory / fusion output
Possible Issue

The test SchedulerKernelWithConsumer contains FIXME comments indicating that consuming PreprocessGroupedMatmulInputSf output is currently undefined behavior and should be validated or result in an error. This test case may be testing invalid usage.

// FIXME: this is undefined and we should error out.
// FIXME: add validation for relu_tv.
// TODO: consumer of output from PreprocessGroupedMatmulInputSf needs to be segmented, because indexing won't work on lowerSrcIndex. So this needs to be changed into some other operation that would go through expr_eval instead. Maybe a matmul or something like that.
auto relu_tv = relu(out_tv);
fusion.addOutput(relu_tv);
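
To make the concern above concrete, here is a minimal sketch of the kind of check the walkthrough describes as `hasConsumerOfNonIndexableOps`. It is not the PR's implementation: `Tensor`, `Expr`, and `isNonIndexableOp` are hypothetical stand-ins for the fusion IR, and treating only `PreprocessGroupedMatmulInputSf` as non-indexable is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the fusion IR (not the real classes).
struct Expr;

struct Tensor {
  std::string name;
  std::vector<const Expr*> uses; // expressions that read this tensor
};

struct Expr {
  std::string op_type;
  std::vector<const Tensor*> outputs;
};

// Assumption: only the layout op is treated as non-indexable in this sketch.
bool isNonIndexableOp(const Expr& e) {
  return e.op_type == "PreprocessGroupedMatmulInputSf";
}

// Returns true if any output of a non-indexable op is read by another
// expression in the same fusion. A scheduler could use such a predicate to
// reject the fusion so that the consumer ends up in a separate segment.
bool hasConsumerOfNonIndexableOps(const std::vector<const Expr*>& exprs) {
  for (const Expr* e : exprs) {
    assert(e != nullptr);
    if (!isNonIndexableOp(*e)) {
      continue;
    }
    for (const Tensor* out : e->outputs) {
      if (!out->uses.empty()) {
        return true;
      }
    }
  }
  return false;
}
```

In a fusion shaped like the `SchedulerKernelWithConsumer` test, where `relu` reads the layout op's output, this predicate would return true; with the layout op's output used only as a fusion output, it would return false.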

@jjsjann123 jjsjann123 force-pushed the jj/wip branch 2 times, most recently from f994f4e to dcdeff5 Compare September 10, 2025 21:12
@jjsjann123 jjsjann123 force-pushed the jj/layout_op_PR2_manual_kernel branch from a7f2636 to 6072fd3 Compare September 10, 2025 23:48
@jjsjann123 jjsjann123 force-pushed the jj/wip branch 2 times, most recently from 29889e8 to 856f291 Compare September 15, 2025 16:16
Base automatically changed from jj/layout_op_PR2_manual_kernel to main September 15, 2025 22:37
jjsjann123 and others added 8 commits September 17, 2025 02:09
test case added

wip

prevent cacheAndForkOutputs

disable cacheInputs for offsets TVs

change domain stuff in reference TV

revert unused changes

err something isn't working right

wip
misc fixes

adding quick note for me to pick up later

break consumer of layout op into separate fusion

err

quick fix on logic

refactor to use resize for transform replay

clear allocation domain

revert resize change for allocation domain; wipe out allocation for layout op on fusion segment boundaries

trying to patch allocation domain handling during allocation

fix cases where allocation domain isn't available

fixing allocation transform and fixing tests