update mla rope mcore>=0.18 (0.15-0.18 compat) #114
Conversation
Code Review
This pull request updates the README files to correct the categorization of Kimi models and introduces a new RoPE utility module (rope.py) supporting both conventional (bshd) and packed sequence (thd) formats. Feedback on the new RoPE implementation highlights several critical issues: a hardcoded 4D tensor slicing in _rotate_half that could cause runtime errors on other dimensionalities, potential AttributeError and ValueError exceptions when context parallel (cp_group) is not enabled or initialized, and performance degradation due to multiple synchronous host-device transfers when accessing cu_seqlens on the GPU.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
/gemini review
Code Review
This pull request updates the README documentation to categorize kimi_k25 as a VL model and adjusts the import path of apply_rotary_pos_emb in deepseek_v4.py. It also introduces a patch for rope_utils.apply_rotary_pos_emb in patcher.py to handle mla_rotary_interleaved. Feedback points out that the current patching mechanism can cause a TypeError on older Megatron-Core versions if unsupported keyword arguments are passed, and suggests a robust argument-filtering approach using signature inspection.
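The argument-filtering approach mentioned above might look roughly like the following sketch. It uses only `inspect.signature` from the standard library; the wrapper name `filter_supported_kwargs` and the stand-in function `old_apply_rotary_pos_emb` (with its parameter list) are hypothetical and do not reproduce the actual Megatron-Core signature or the patch in `patcher.py`.

```python
import functools
import inspect


def filter_supported_kwargs(func):
    """Wrap func so keyword arguments it does not declare are dropped.

    If func already accepts **kwargs, everything is forwarded unchanged.
    """
    params = inspect.signature(func).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    # Names that may legally be passed by keyword.
    keyword_names = {
        name
        for name, p in params.items()
        if p.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if accepts_var_kw:
            return func(*args, **kwargs)
        supported = {k: v for k, v in kwargs.items() if k in keyword_names}
        return func(*args, **supported)

    return wrapper


# Hypothetical stand-in for an older function that predates the
# mla_rotary_interleaved keyword.
def old_apply_rotary_pos_emb(t, freqs, config=None, cu_seqlens=None):
    return (t, freqs, config, cu_seqlens)


patched = filter_supported_kwargs(old_apply_rotary_pos_emb)
# The unsupported keyword is silently dropped instead of raising TypeError.
result = patched("t", "freqs", config="cfg", mla_rotary_interleaved=True)
```

One trade-off of this design is that dropped arguments are ignored silently, so a patch built this way may want to log or otherwise handle the case where a dropped flag changes behavior.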
No description provided.