Highlights
- RDMA over Thunderbolt with the JACCL backend (macOS >= 26.2) (some numbers)
- NAX kernels with JIT so that they can be used in MLX Swift
- CUDA improvements
- Many improvements to SDPA (masking, T_q != T_kv)
- Faster quantize/dequantize
- QQMM to make use of faster tensor cores
- Fix in col reduce speeds up training
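
For readers unfamiliar with the notation in the SDPA highlight: T_q is the number of query positions and T_kv is the number of key/value positions, and the two need not match (for example when decoding against a cache). The sketch below is a pure-Python, single-head reference written only to illustrate those shapes and a boolean array mask. It is not MLX's implementation, and the name `sdpa_reference` is hypothetical. It also assumes every query row has at least one unmasked key.

```python
import math


def sdpa_reference(q, k, v, mask=None, scale=None):
    """Single-head scaled dot-product attention on nested lists.

    q: T_q x D, k: T_kv x D, v: T_kv x D_v.
    mask: optional T_q x T_kv booleans, True means "may attend".
    Assumes each query row keeps at least one key unmasked.
    """
    d = len(q[0])
    scale = scale if scale is not None else 1.0 / math.sqrt(d)
    out = []
    for i, q_row in enumerate(q):
        # Scaled scores of query i against every key.
        scores = []
        for j, k_row in enumerate(k):
            s = scale * sum(a * b for a, b in zip(q_row, k_row))
            if mask is not None and not mask[i][j]:
                s = -math.inf  # masked keys get zero weight
            scores.append(s)
        # Numerically stable softmax over the T_kv axis.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted sum of value rows.
        out.append([
            sum(w * v_row[c] for w, v_row in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


# T_q = 2 queries attend over T_kv = 3 keys/values.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mask = [[True, False, False], [True, True, False]]

out = sdpa_reference(q, k, v, mask)
# The first query can only see key 0, so out[0] == [1.0, 0.0].
```

The output always has T_q rows, regardless of T_kv. In MLX itself the corresponding entry point is `mx.fast.scaled_dot_product_attention`, which operates on batched, multi-head arrays rather than nested lists.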
What's Changed
- patch + fix docs build by @awni in #2799
- Fix macos release target and linux arm release by @awni in #2802
- Fix cuda allocator copy condition by @awni in #2800
- [CUDA] Partly fix random for large sizes by @awni in #2798
- patch bump for future version by @awni in #2804
- Centralize NAX condition by @awni in #2811
- Tolerance for some ops tests on cuda by @awni in #2815
- Fix typo: refs/head/main => refs/heads/main by @zcbenz in #2818
- Add float64 Eig and complex64 SVD/Eig support (Fixes #2708) by @harsh-sutariya in #2737
- Fix `mx.core.load` type annotation by @CC-Yeh in #2819
- Force cudaGraphExec reinstantiation when clusters are used by @andportnoy in #2813
- Bump actions/checkout from 5 to 6 by @dependabot[bot] in #2828
- Fix `mx.core.linspace` type annotation by @CC-Yeh in #2820
- [CUDA] Exit on crash and more helpful errors by @awni in #2830
- [CUDA] Add debug env to save cuda graphs to dot files by @zcbenz in #2825
- [CUDA] Output of SDPA should have same layout with inputs by @zcbenz in #2826
- Merge build-cuda and build-linux actions by @zcbenz in #2783
- [CUDA] Support array mask in SDPA by @zcbenz in #2822
- [CUDA] Faster rms norm for small dimension by @awni in #2838
- Added clarification to apply_fn parameter of apply_to_modules by @yuchaoran2011 in #2831
- [CUDA] Use cuDNN attention when T_q != T_kv by @zcbenz in #2843
- [CUDA] Migrate conv code to new cuDNN APIs by @zcbenz in #2847
- Support more `Numpy` interfaces for `masked_scatter` by @CC-Yeh in #2832
- use thread local capture mode by @awni in #2850
- Fix export scatters by @awni in #2852
- Reduce JVP by @awni in #2854
- Fix graph updating by @awni in #2857
- Fix init from double by @awni in #2861
- Update gumbel function signature parameters by @tianenchong in #2868
- Added support for pytree types that inherit from tuple and typing.namedtuple by @romanoneg in #2845
- Layer norm throws on dimension mismatch by @awni in #2870
- fix compile copying by @awni in #2871
- Do a PyPi release for cuda on arm by @awni in #2866
- Add a 2-pass col reduce for CUDA by @angeloskath in #2863
- [CUDA] Faster general copy by @awni in #2873
- [CUDA] Release build for cuda 13 by @awni in #2872
- Make allocator::malloc throw on allocation failure by @zcbenz in #2874
- [Metal] No copy array init by @awni in #2875
- Try not to fail when there should be memory available by @awni in #2869
- [CUDA] Enable more graphs to be updatable by @awni in #2883
- Fix docs: replace nonexistent mx.random.randn with mx.random.normal by @Satyam12singh in #2890
- Allow events in sub graph to be updatable by @awni in #2886
- bump minimum required Python version by @ngoldbaum in #2891
- do not use simd neon intrinsics on x86 by @davidkoski in #2893
- Fix input buffer donation in compile by @CC-Yeh in #2897
- Update nanobind pin to most recent version by @ngoldbaum in #2896
- fp quantize by @nastya236 in #2892
- Fix grad in place updates by @awni in #2899
- [CUDA] Add host nodes to subgraph types for graph update by @awni in #2901
- fix: possible heap-buffer-overflow in RandomBits::eval_cpu (follow for new ASAN CI tests) by @incertum in #2877
- Fix ccache getting disabled by @zcbenz in #2905
- Fix attention for large sizes by @awni in #2903
- No VJP for mask or sinks in attention by @awni in #2909
- Bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #2911
- Bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #2912
- Use CUDA runtime headers from local python package by @zcbenz in #2906
- DOC : Add compile state example by @Satyam12singh in #2910
- qqmm by @nastya236 in #2789
- Thunderbolt RDMA communications backend by @angeloskath in #2808
- Add JIT support for NAX kernels by @jagrit06 in #2916
- Fix warnings for the NAX build by @angeloskath in #2921
New Contributors
- @dependabot[bot] made their first contribution in #2828
- @yuchaoran2011 made their first contribution in #2831
- @tianenchong made their first contribution in #2868
- @romanoneg made their first contribution in #2845
- @Satyam12singh made their first contribution in #2890
- @ngoldbaum made their first contribution in #2891
Full Changelog: v0.30.0...v0.30.1