v0.30.1

@awni

Highlights

RDMA over Thunderbolt with the JACCL backend (macOS >= 26.2) (some numbers)
NAX kernels with JIT so that they can be used in MLX Swift
CUDA improvements
- Many improvements to SDPA (masking, T_q != T_kv)
- Faster quantize/dequantize
- QQMM to make use of faster tensor cores
- Fix in column reduce that speeds up training
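
For readers unfamiliar with the SDPA items above, here is a minimal pure-Python sketch of what scaled dot-product attention computes when the query length differs from the key/value length (T_q != T_kv) and an array mask is supplied. It is a reference for the semantics only, not the CUDA or cuDNN implementation. The function name `sdpa_reference` and the nested-list layout are assumptions made for illustration; in MLX the operation is exposed as `mx.fast.scaled_dot_product_attention`.

```python
import math


def sdpa_reference(q, k, v, mask=None, scale=None):
    """Single-head scaled dot-product attention on nested lists.

    q: T_q x D, k: T_kv x D, v: T_kv x D_v (T_q and T_kv may differ).
    mask: optional T_q x T_kv booleans; False entries are excluded.
    Assumes every query row keeps at least one key.
    """
    d = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(d)
    out = []
    for i, q_row in enumerate(q):
        # Scores of this query against every key (T_kv entries).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if mask is not None:
            # Masked-out positions get -inf so they receive zero weight.
            scores = [s if keep else -math.inf for s, keep in zip(scores, mask[i])]
        # Numerically stable softmax over the key axis.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value rows.
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


# One query (T_q = 1) attending over three keys (T_kv = 3).
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0], [2.0], [3.0]]

unmasked = sdpa_reference(q, k, v)
masked = sdpa_reference(q, k, v, mask=[[True, False, False]])
```

With the mask keeping only the first key, the output equals the first value row; without a mask it is a weighted average of all three value rows.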

What's Changed

patch + fix docs build by @awni in #2799
Fix macos release target and linux arm release by @awni in #2802
Fix cuda allocator copy condition by @awni in #2800
[CUDA] Partly fix random for large sizes by @awni in #2798
patch bump for future version by @awni in #2804
Centralize NAX condition by @awni in #2811
Tolerance for some ops tests on cuda by @awni in #2815
Fix typo: refs/head/main => refs/heads/main by @zcbenz in #2818
Add float64 Eig and complex64 SVD/Eig support (Fixes #2708) by @harsh-sutariya in #2737
Fix mx.core.load type annotation by @CC-Yeh in #2819
Force cudaGraphExec reinstantiation when clusters are used by @andportnoy in #2813
Bump actions/checkout from 5 to 6 by @dependabot[bot] in #2828
Fix mx.core.linspace type annotation by @CC-Yeh in #2820
[CUDA] Exit on crash and more helpful errors by @awni in #2830
[CUDA] Add debug env to save cuda graphs to dot files by @zcbenz in #2825
[CUDA] Output of SDPA should have same layout with inputs by @zcbenz in #2826
Merge build-cuda and build-linux actions by @zcbenz in #2783
[CUDA] Support array mask in SDPA by @zcbenz in #2822
[CUDA] Faster rms norm for small dimension by @awni in #2838
Added clarification to apply_fn parameter of apply_to_modules by @yuchaoran2011 in #2831
[CUDA] Use cuDNN attention when T_q != T_kv by @zcbenz in #2843
[CUDA] Migrate conv code to new cuDNN APIs by @zcbenz in #2847
Support more Numpy interfaces for masked_scatter by @CC-Yeh in #2832
use thread local capture mode by @awni in #2850
Fix export scatters by @awni in #2852
Reduce JVP by @awni in #2854
Fix graph updating by @awni in #2857
Fix init from double by @awni in #2861
Update gumbel function signature parameters by @tianenchong in #2868
Added support for pytree types that inherit from tuple and typing.namedtuple by @romanoneg in #2845
Layer norm throws on dimension mismatch by @awni in #2870
fix compile copying by @awni in #2871
Do a PyPi release for cuda on arm by @awni in #2866
Add a 2-pass col reduce for CUDA by @angeloskath in #2863
[CUDA] Faster general copy by @awni in #2873
[CUDA] Release build for cuda 13 by @awni in #2872
Make allocator::malloc throw on allocation failure by @zcbenz in #2874
[Metal] No copy array init by @awni in #2875
Try not to fail when there should be memory available by @awni in #2869
[CUDA] Enable more graphs to be updatable by @awni in #2883
Fix docs: replace nonexistent mx.random.randn with mx.random.normal by @Satyam12singh in #2890
Allow events in sub graph to be updatable by @awni in #2886
bump minimum required Python version by @ngoldbaum in #2891
do not use simd neon intrinsics on x86 by @davidkoski in #2893
Fix input buffer donation in compile by @CC-Yeh in #2897
Update nanobind pin to most recent version by @ngoldbaum in #2896
fp quantize by @nastya236 in #2892
Fix grad in place updates by @awni in #2899
[CUDA] Add host nodes to subgraph types for graph update by @awni in #2901
fix: possible heap-buffer-overflow in RandomBits::eval_cpu (follow for new ASAN CI tests) by @incertum in #2877
Fix ccache getting disabled by @zcbenz in #2905
Fix attention for large sizes by @awni in #2903
No VJP for mask or sinks in attention by @awni in #2909
Bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #2911
Bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #2912
Use CUDA runtime headers from local python package by @zcbenz in #2906
DOC : Add compile state example by @Satyam12singh in #2910
qqmm by @nastya236 in #2789
Thunderbolt RDMA communications backend by @angeloskath in #2808
Add JIT support for NAX kernels by @jagrit06 in #2916
Fix warnings for the NAX build by @angeloskath in #2921

New Contributors

@dependabot[bot] made their first contribution in #2828
@yuchaoran2011 made their first contribution in #2831
@tianenchong made their first contribution in #2868
@romanoneg made their first contribution in #2845
@Satyam12singh made their first contribution in #2890
@ngoldbaum made their first contribution in #2891

Full Changelog: v0.30.0...v0.30.1
