Insights: pytorch/pytorch
Overview
2 Pull requests merged by 2 people
-
[dynamo][super variable] Fix bug to use correct source
#152774 merged
May 6, 2025 -
[cudagraphs] Fix issue in collecting static_input_idxs
#152768 merged
May 6, 2025
206 Pull requests opened by 117 people
-
[CI] Use cmake from pip instead of conda in CI docker images
#152537 opened
Apr 30, 2025 -
Use swap_tensors path in nn.Module.to for all subclasses that override __torch_dispatch__
#152539 opened
Apr 30, 2025 -
[CUDA] Reset peak memory stats before running `test_set_per_process_memory_fraction`
#152540 opened
Apr 30, 2025 -
strict multidimensional slicing
#152543 opened
Apr 30, 2025 -
ci: Switch benchmark dependency to use pip
#152545 opened
Apr 30, 2025 -
Remove Conda Instructions
#152546 opened
Apr 30, 2025 -
Implemented `Size.__radd__`
#152554 opened
Apr 30, 2025 -
[BE] Update numba versions
#152557 opened
Apr 30, 2025 -
xpu: rely on sycl/sycl.hpp to include bfloat16.hpp
#152562 opened
Apr 30, 2025 -
[c10d][fr] Make FR vendor neutral so that other backends can use it
#152563 opened
Apr 30, 2025 -
[ROCm] Update spack includes
#152569 opened
Apr 30, 2025 -
[Hierarchical Compile] Replace tracing alias and mutation check with dynamo impl
#152570 opened
Apr 30, 2025 -
[Dynamo] Fix typing in graph_deduplication.py
#152572 opened
May 1, 2025 -
Allow decomposeK to fuse
#152573 opened
May 1, 2025 -
[2/N] Use std::filesystem
#152586 opened
May 1, 2025 -
[WIP] suggest whitelist for dynamic shape recompilations
#152588 opened
May 1, 2025 -
[Dynamo] Optimize dedupe region ancestor tracking
#152589 opened
May 1, 2025 -
Fix #152280: add Literal[…] PaddingMode to Conv modules
#152590 opened
May 1, 2025 -
Fix: promote scalar to MPS device in exec_binary_kernel
#152591 opened
May 1, 2025 -
[c10d] Add support for ReduceOp::AVG in ProcessGroupMPI for FSDP2
#152594 opened
May 1, 2025 -
[not for review] benchmark script
#152596 opened
May 1, 2025 -
[multigraph] add backend_specialization kwarg to mark_dynamic
#152597 opened
May 1, 2025 -
[multigraph] use backend specializations in compile_and_call_fx_graph
#152601 opened
May 1, 2025 -
[BE] Delete `Module_CUDA_fix`
#152603 opened
May 1, 2025 -
[Testing] Is FindCUDA.cmake from `Modules_CUDA_fix` called at all?
#152604 opened
May 1, 2025 -
[Environment Variable] Use thread-safe getenv functions
#152609 opened
May 1, 2025 -
Update padding_mode type annotation to use Literal type (PaddingMode)
#152610 opened
May 1, 2025 -
Makefile: refactor build, setup and lint rules
#152611 opened
May 1, 2025 -
Revert "Cleanup VS 2019 refs in pytorch (#145863)"
#152613 opened
May 1, 2025 -
[WIP] Make FR vendor generic and try to enable it for gloo
#152614 opened
May 1, 2025 -
Stop proxy-ing autograd.Function.ctx into the graph
#152621 opened
May 1, 2025 -
Parameterized CUDA Graph Launch
#152622 opened
May 1, 2025 -
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 opened
May 1, 2025 -
[ROCm] Initial AITER Integration for mha_bwd asm kernels
#152630 opened
May 1, 2025 -
[ca] wrap flex attention tests with compiled autograd
#152633 opened
May 1, 2025 -
[CUTLASS][WIP] Gate rowwise matmul CUTLASS kernels by compute capability
#152642 opened
May 1, 2025 -
[BE]remove vulkan test
#152643 opened
May 1, 2025 -
[do-not-land][ca] default on for CI
#152646 opened
May 1, 2025 -
Add assert_fp8_close helper for FP8 tensor comparisons
#152651 opened
May 2, 2025 -
Refactor some common autotune-related utils into a new file
#152652 opened
May 2, 2025 -
cleanup, refactor and add missing self._dde_suppressed checks
#152657 opened
May 2, 2025 -
Fix the basic description of torch.min(), torch.max(), torch.all(), torch.any()
#152658 opened
May 2, 2025 -
Fix evaluate_expr to include suppress_guards_tls in cache key
#152661 opened
May 2, 2025 -
Re-enable FakeTensor caching for SymInts
#152662 opened
May 2, 2025 -
Raise error when no record on extra_files
#152664 opened
May 2, 2025 -
MXFP8 Fix broken bias support for mxfp8
#152665 opened
May 2, 2025 -
Added documentation for nonzero_static function (#152347)
#152669 opened
May 2, 2025 -
[export] Dynamo symint support
#152677 opened
May 2, 2025 -
Fix signature of torch.sparse_coo_tensor()
#152681 opened
May 2, 2025 -
Update the signature and test of torch.hamming_window()
#152682 opened
May 2, 2025 -
[ca][dtensor] run real PG dtensor tests under CA
#152689 opened
May 2, 2025 -
set CUDA_MODULE_LOADING for older drivers only
#152695 opened
May 2, 2025 -
Removing conda references from PyTorch Docs
#152702 opened
May 2, 2025 -
[Memento] Add PT2 to Memory Snapshot
#152707 opened
May 2, 2025 -
Scheduler Flops refactor
#152708 opened
May 2, 2025 -
remove conda from devcontainer
#152713 opened
May 2, 2025 -
[caffe2] Make c10::str works with scoped enum (#152705)
#152714 opened
May 2, 2025 -
[BE][CI] Merge regular and MPS test config shards
#152719 opened
May 2, 2025 -
try something
#152722 opened
May 2, 2025 -
[Dynamo] Guard serialization for NN_MODULE
#152725 opened
May 2, 2025 -
[Dynamo] Guard serialization for FUNCTION_MATCH
#152727 opened
May 2, 2025 -
[Dynamo] Guard serialization for CLOSURE_MATCH
#152728 opened
May 2, 2025 -
[Dynamo] Guard serialization for BUILTIN_MATCH
#152729 opened
May 2, 2025 -
[Dynamo] Guard serialization for SEQUENCE_LENGTH
#152730 opened
May 2, 2025 -
[Cutlass] Handle broadcasting in EVT python codegen
#152733 opened
May 2, 2025 -
docs: fix dead link in torch.compile docs
#152734 opened
May 2, 2025 -
[ca][ddp] loud error instead of silent incorrectness under C++ Reducer
#152735 opened
May 2, 2025 -
[BE][Cleanup][Dynamo] Stop logging entire_frame_compile_time_s
#152738 opened
May 2, 2025 -
[export] add serialized_artifact test
#152739 opened
May 2, 2025 -
[export][cond] support merging constant ints as unbacked symint
#152742 opened
May 2, 2025 -
[CUDA][cuDNN] Fix handling of `CPU` side input and target length tensors in `CTCLoss`
#152745 opened
May 3, 2025 -
Conditionally support experimental filesystem include in jit_opt_limit
#152748 opened
May 3, 2025 -
Handle fewer functions than the number of segments
#152753 opened
May 3, 2025 -
Allow ATen ops overloading
#152759 opened
May 3, 2025 -
[Easy][BE] update recommended VS Code settings
#152760 opened
May 3, 2025 -
added short integer for repeat_interleave_cpu, Fixes #151311
#152762 opened
May 3, 2025 -
[aoti] Add grid_sampler_3d to cshim
#152771 opened
May 4, 2025 -
[Inductor] Pattern matcher support for mutable ops with non-view inputs
#152775 opened
May 4, 2025 -
[WIP] Pattern matcher support for mutable ops with view inputs
#152776 opened
May 4, 2025 -
[BE]: Update cudnn to 9.9 for cu128
#152782 opened
May 4, 2025 -
Fix negative dim issue in the parallel loss context manager
#152785 opened
May 4, 2025 -
Update CMakeLists.txt
#152786 opened
May 4, 2025 -
Implement DeviceType.h as header-only
#152787 opened
May 4, 2025 -
Fixed rerr computation in lobpcg
#152789 opened
May 4, 2025 -
same test for guard_or_false 1
#152802 opened
May 5, 2025 -
same test for guard_or_false 2
#152803 opened
May 5, 2025 -
[invoke_subgraph] Force the output stride to be same as eager
#152806 opened
May 5, 2025 -
wip
#152807 opened
May 5, 2025 -
another try
#152808 opened
May 5, 2025 -
Upgrade to NCCL 2.26.5 for CUDA 12
#152810 opened
May 5, 2025 -
[Quant][X86] add an op to compute uint8 batch norm 2d
#152811 opened
May 5, 2025 -
[TEST][ATen][CUDA] Skip row-wise scaled matrix multiplication tests on sm_120+
#152814 opened
May 5, 2025 -
[Cutlass] E2E Tests for EVT
#152815 opened
May 5, 2025 -
[TEST][Quantization] Skip test_learnable due to hypothesis
#152819 opened
May 5, 2025 -
[DO NOT MERGE] update build tools version
#152820 opened
May 5, 2025 -
[Easy][Inductor] Adds safety checks in get_estimated_runtime
#152821 opened
May 5, 2025 -
Use gcc13 in Manylinux 2.28 images
#152825 opened
May 5, 2025 -
[MSVC] Enable updated lambda processor by setting compiler flag /Zc:lambda globally
#152828 opened
May 5, 2025 -
[BE]: Improve aten formatter with fmtlib
#152830 opened
May 5, 2025 -
Allow to set custom PYTHONPATH for torch.inductor
#152832 opened
May 5, 2025 -
[c10d] Fix extra CUDA context created by barrier
#152834 opened
May 5, 2025 -
[DRAFT] Test nccl
#152835 opened
May 5, 2025 -
[nativert] Move MPMCQueue to torch/nativert.
#152837 opened
May 5, 2025 -
[precompile] Add BundledAOTAutogradCacheEntry
#152840 opened
May 5, 2025 -
Add memory reporting for XPU to Memory Profiler
#152842 opened
May 5, 2025 -
ci: Remove conda-env-macOS-ARM64, prefer pip
#152843 opened
May 5, 2025 -
Fix HF loading when there's no metadata file to work with fsspec
#152856 opened
May 5, 2025 -
Add torch._C.Tag.needs_contiguous_strides
#152859 opened
May 5, 2025 -
[Graph Partition] remove weak dep from `partition_input_names`
#152863 opened
May 5, 2025 -
[Dynamo] Guard serialization for TUPLE_ITERATOR_LEN
#152865 opened
May 5, 2025 -
[Dynamo] Guard serialization for RANGE_ITERATOR_MATCH
#152872 opened
May 5, 2025 -
Clarify wrap_triton doc about optional triton_op usage
#152874 opened
May 5, 2025 -
[Graph Partition][Flex Attention] analyze symints from subgraph inputs and outputs
#152878 opened
May 5, 2025 -
xpu: support custom ops with torch.library on xpu backend
#152879 opened
May 5, 2025 -
[dynamo] Fix bug in hasattr(tensor, "size")
#152883 opened
May 6, 2025 -
[SDPA] Add testing to ensure stride order exactly matches
#152894 opened
May 6, 2025 -
Move link check jobs to pull to go with doc build
#152896 opened
May 6, 2025 -
[Inductor] Set correct baseline for decomposek test
#152897 opened
May 6, 2025 -
Remove `property` from python_type function
#152900 opened
May 6, 2025 -
[Set] Add set.symmetric_difference(_update)
#152901 opened
May 6, 2025 -
[Set] Add `set.issubset` and `set.issuperset`
#152902 opened
May 6, 2025 -
[Set] Raise `KeyError` if elem not contained in the set
#152903 opened
May 6, 2025 -
[Set] Raise TypeError if number of arguments mismatch
#152904 opened
May 6, 2025 -
[Set] Add `set.difference(_update)`
#152905 opened
May 6, 2025 -
[Set] Add `set.intersection(_update)`
#152906 opened
May 6, 2025 -
[Set] Raise KeyError on empty `set.pop()`
#152907 opened
May 6, 2025 -
[Set] Add correct set/frozenset __init__ behavior
#152908 opened
May 6, 2025 -
Add overall tensor similarity comparison (#152647)
#152920 opened
May 6, 2025 -
Upgrade to cuda 12.8.1 for docker builds
#152923 opened
May 6, 2025 -
include user stacks with constraint violation error message
#152924 opened
May 6, 2025 -
[WIP] Add unified memory APIs for torch.accelerator
#152932 opened
May 6, 2025 -
Allow Inductor backends to attest their own availability
#152933 opened
May 6, 2025 -
[Dynamo] Allow inlining into AO quantization modules
#152934 opened
May 6, 2025 -
Fix doc cosineannealinglr 152081
#152936 opened
May 6, 2025 -
[feature] Channel Wise Parallel API for Conv layers
#152937 opened
May 6, 2025 -
[Pipelining] Fix _batch_p2p bug for non-NCCL backends (#132644)
#152938 opened
May 6, 2025 -
get right function declaration on windows inductor
#152939 opened
May 6, 2025 -
[Don't merge] Debug
#152940 opened
May 6, 2025 -
Clean up of CUTLASS_VERSION
#152947 opened
May 6, 2025 -
[Linter] Add linter to detect device-bias hard code in test cases.
#152948 opened
May 6, 2025 -
[dtensor] add privateuse1 SDPA op support to DTensor
#152949 opened
May 6, 2025 -
Add NestedTensorHPU to to_padded_tensor in native_functions.yaml
#152950 opened
May 6, 2025 -
[ROCm] Ck gemm architecture guard
#152951 opened
May 6, 2025 -
[nativert] Move Placement to pytorch core
#152953 opened
May 6, 2025 -
[ROCm] unskip test_non_standard_bool except for failing ops
#152956 opened
May 6, 2025 -
Follow up to #152209, remove compat patch
#152958 opened
May 6, 2025 -
docs: Improve documentation for NCCL timeout / watchdog variables
#152959 opened
May 6, 2025 -
Change aoti cpp tests to run serially within file
#152960 opened
May 6, 2025 -
[Dynamo] Remove unused guard PYMODULE_MATCH
#152961 opened
May 6, 2025 -
WIP so many changes to generate non-as strided view
#152965 opened
May 6, 2025 -
[Memento] On-demand mode using without torch api
#152966 opened
May 6, 2025 -
[ATen][CUDA] Optimize 128 bit vectorization
#152967 opened
May 6, 2025 -
[inductor] Generate synthetic offsets appropriately for autotuning _scaled_grouped_mm
#152968 opened
May 6, 2025 -
[nativert] Move GraphSignature to pytorch core
#152969 opened
May 6, 2025 -
Adding XPU support to DTensor examples.
#152973 opened
May 6, 2025 -
[hop_schema] add HopSchemaGenerator to make it easier to create hop schema
#152974 opened
May 6, 2025 -
[dtensor] Extend Partial partition of replicated tensor for min/max reduce
#152975 opened
May 6, 2025 -
[MegaCache] Make MegaCache generic to allow external plugins registration
#152977 opened
May 6, 2025 -
[Pytorch] Add `torch.cuda.streams.Event` to save torch functions list
#152978 opened
May 6, 2025 -
Fix `'TensorBox' object has no attribute 'is_input_buffer'`
#152980 opened
May 6, 2025 -
Catch TypeError from ValueRanges
#152981 opened
May 6, 2025 -
[torch][ao] Properly strip tracking stats in _fold_conv_bn_qat for 1D
#152982 opened
May 6, 2025 -
compile_fx: make a compile event that corresponds to the fx_compile waitcounter
#152983 opened
May 6, 2025 -
[hop_schema] support gen_schema for invoke_subgraph
#152984 opened
May 6, 2025 -
[WIP] Add XPU support for FlightRecorder
#152986 opened
May 6, 2025 -
[Set] Handle exception in ConstantVariable operation
#152987 opened
May 6, 2025 -
[Set] Raise `TypeError` if argument is unhashable
#152988 opened
May 6, 2025 -
[Set] Update `set.union` and `set.update` to support *args
#152989 opened
May 6, 2025 -
[Set] Raise TypeError if set is called with the wrong number of arguments
#152990 opened
May 6, 2025 -
[FrozenSet] Fixes for FrozenSet
#152991 opened
May 6, 2025 -
[inductor] Fix ModularIndexing assumptions
#152993 opened
May 6, 2025 -
[inductor] dtype promotion error in cat decomp
#152995 opened
May 6, 2025 -
[export] Unflatten None
#153000 opened
May 6, 2025 -
[cutlass backend][test] re-enable test_cuda_compile_command for fbcode
#153001 opened
May 6, 2025 -
[CI] Use sccache installed in docker image in xla build
#153002 opened
May 6, 2025 -
[cutlass backend] Skip cuda lib path if it is torch/lib
#153003 opened
May 6, 2025 -
[WIP][Inductor-CPU] int8 WoQ concat linear
#153004 opened
May 6, 2025 -
[autograd][docs] Add more details on why save_for_backward is important in extending autograd note
#153005 opened
May 6, 2025 -
[cutlass backend] Use src code to generate cutlass gemm name
#153006 opened
May 6, 2025 -
Remove redundant type aliases of _device_t for torch.Device (#152952)
#153007 opened
May 7, 2025 -
Detect NVSHMEM location
#153010 opened
May 7, 2025 -
[WIP][dynamic shapes] unbacked safer cat, repeat
#153011 opened
May 7, 2025 -
c10d/gloo: add ibverbs backend
#153015 opened
May 7, 2025 -
Add a project section to pyproject.toml, making uv sync work
#153020 opened
May 7, 2025 -
Adding a generic attribute for easier checkpoint discrepancy debugging.
#153021 opened
May 7, 2025 -
[Typing] Apply `torch.types.Device` in `torch/cuda/memory.py`
#153027 opened
May 7, 2025 -
[Typing] Improve device typing for `torch.set_default_device()`
#153028 opened
May 7, 2025 -
WIP: Fix caching when output has unbacked
#153034 opened
May 7, 2025 -
Allow zero sized dimensions in padding operations
#153037 opened
May 7, 2025 -
Add CUDA support for Adagrad(fused=True)
#153038 opened
May 7, 2025 -
[Dynamo] Replace `unimplemented` with `unimplemented_v2` in `torch/_dynamo/variables/misc.py` [2/2]
#153039 opened
May 7, 2025 -
[AOTInductor] Generate kernels separately for const graph and main graph
#153040 opened
May 7, 2025 -
🌠 Add Muon optimizer
#153048 opened
May 7, 2025 -
Update docs of saved_tensors_hooks to avoid ref cycle
#153049 opened
May 7, 2025 -
[Intel GPU] empty-size tensor case handling in addmm, baddmm
#153051 opened
May 7, 2025 -
[BE]: Use undocumented temp shim to restore setuptools compat
#153052 opened
May 7, 2025 -
[BE]: Blacklist broken setuptools until we upgrade MSVC API
#153053 opened
May 7, 2025 -
[HOP] Reworked HOPs to use FunctionalizeCtxWrapper
#153054 opened
May 7, 2025 -
[BE]: Add PEP621 project section to pyproject.toml
#153055 opened
May 7, 2025 -
[BE] Update ruamel to 0.18.10
#153057 opened
May 7, 2025 -
Fix misleadingly high AOT Inductor dashboard performance
#153060 opened
May 7, 2025 -
Keep raw cubin file around in case it gets deleted underneath us
#153064 opened
May 7, 2025 -
[ONNX] dynamic_shapes uses DYNAMIC
#153065 opened
May 7, 2025 -
fix bug with TORCHINDUCTOR_DUMP_LAUNCH_PARAMS
#153066 opened
May 7, 2025
96 Issues closed by 42 people
-
manylinux_2_28 support
#114232 closed
May 7, 2025 -
[graph pickler] [inductor compile async] imprecise filter for non standard op?
#151904 closed
May 7, 2025 -
[inductor] cudagraph error for individually compiled transformer blocks
#152887 closed
May 7, 2025 -
DISABLED test_sdpa_compile_cuda_bfloat16 (__main__.TestNestedTensorSubclassCUDA)
#119903 closed
May 7, 2025 -
DISABLED test_host_memory_stats (__main__.TestCuda)
#148607 closed
May 7, 2025 -
[XPU] test_tensordot_out_kernel_errors_with_autograd_xpu_float32 UT failure
#152090 closed
May 7, 2025 -
Failed to visualize 1D DTensor
#152848 closed
May 7, 2025 -
[RFC] A device-agnostic Python runtime API design for stream-based accelerators
#128403 closed
May 7, 2025 -
DISABLED test_comprehensive_scatter_xpu_bool (__main__.TestInductorOpInfoXPU)
#153018 closed
May 7, 2025 -
DISABLED test_comprehensive_scatter_xpu_int64 (__main__.TestInductorOpInfoXPU)
#153017 closed
May 7, 2025 -
Add metal-flash-attention for MPS backend
#139668 closed
May 7, 2025 -
[MPS] Binary kernels produce incorrect results when one of the tensor arguments is from a wrapped scalar
#152582 closed
May 7, 2025 -
[ONNX] Create a message to suggest users setting dynamo=True when exporting
#152025 closed
May 6, 2025 -
[dynamo] register_module_forward_pre_hook lead to compiled model produce wrong inference results
#149502 closed
May 6, 2025 -
[CI] [anaconda] Review Devcontainer anaconda usage
#148341 closed
May 6, 2025 -
addmv bfloat16 accuracy issues on cpu
#147860 closed
May 6, 2025 -
Optimize printing sympy expressions during logging and cache key computation
#151823 closed
May 6, 2025 -
UNSTABLE pull / linux-docs / build-docs-functorch-false
#152955 closed
May 6, 2025 -
Parameters between models don't copy in the C++ PyTorch Frontend under Windows
#114485 closed
May 6, 2025 -
Unexpected result from `torch.xpu.is_bf16_supported()` when XPU is unavailable
#152301 closed
May 6, 2025 -
DISABLED test_dynamo_timed (__main__.TestDynamoTimed)
#148093 closed
May 6, 2025 -
Running `LazyModuleMixin` example throw errors
#150404 closed
May 6, 2025 -
can't build torch on WSL
#152763 closed
May 6, 2025 -
[AOTI] Package lowered with package_constants_in_so=False still uses lots of memory when loaded
#152356 closed
May 6, 2025 -
Throwing more specific errors for CrossEntropyLoss weights being on a different device than the input/target
#122757 closed
May 6, 2025 -
DISABLED test_matmul_layer_norm_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#151835 closed
May 6, 2025 -
DISABLED test_tmp_not_defined_issue2_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135219 closed
May 6, 2025 -
[XPU] Get [ZE]: 0x78000011 on torch.compile with new driver
#151898 closed
May 6, 2025 -
flex attention does not leverage masking, memory error
#152528 closed
May 5, 2025 -
Flex attention: batch-index-dependent block mask causes error with changing batch size
#152297 closed
May 5, 2025 -
torch.compile LLMs on MPS progress tracker
#150710 closed
May 5, 2025 -
[ued] Investigate diffuser pipeline transformer recompilations due to different width/height
#150702 closed
May 5, 2025 -
[binary builds] Anaconda. Remove dependency on conda libuv module in MacOS and Windows nightly builds
#145872 closed
May 5, 2025 -
Dynamo Unsupported: call_method UserDefinedObjectVariable(dict_items) __iter__ () {}
#147440 closed
May 5, 2025 -
[binary builds] Anaconda. Remove dependency on conda environment for Windows nightly builds
#146048 closed
May 5, 2025 -
torch.multinomial is not deterministic for large number of input probabilities when replacement=True
#152854 closed
May 5, 2025 -
AsyncCollectiveTensor doesn't trigger wait upon dtype cast
#152534 closed
May 5, 2025 -
Poor performance of torch.dot with float16 & bfloat16
#152798 closed
May 5, 2025 -
Segmentation fault (core dumped) in torch.nn.functional.max_unpool2d
#152804 closed
May 5, 2025 -
Mention of nondeterministic index_add when deterministic implementation is being used
#152817 closed
May 5, 2025 -
[RFC][PGNCCL] Add Float8 support
#148344 closed
May 5, 2025 -
False INTERNAL ASSERT FAILED
#152805 closed
May 5, 2025 -
The docstring linter should not force overridden methods to be documented
#151692 closed
May 5, 2025 -
[RFC] Intel GPU ATen Operations Upstreaming Options
#119682 closed
May 5, 2025 -
DISABLED test_variant_consistency_eager_nn_functional_conv3d_cuda_complex64 (__main__.TestCommonCUDA)
#114592 closed
May 5, 2025 -
[CPU][UT] 16 UT of test/inductor/test_cpu_select_algorithm.py failed with PyTorch 2025-04-28 nightly wheel
#152398 closed
May 5, 2025 -
Error with nccl + multiple RTX5090 in ddp training. CUDA error: an illegal memory access was encountered
#152780 closed
May 5, 2025 -
Keyword argument `dtype` is less relevant to the functions themselves
#145607 closed
May 5, 2025 -
cpu - gpu calculation results differs by far with torch.nn.functional.linear
#69969 closed
May 4, 2025 -
Inconsistent behavior between CPU and GPU implementations of `torch.Tensor.put_` method
#152755 closed
May 4, 2025 -
Performance Regression nightly 02/14→02/15, on nanogpt speedrun
#152761 closed
May 4, 2025 -
Note some limit in docstring of `padding` in Poolnd
#152156 closed
May 4, 2025 -
AOTInductor package can only be loaded on the first GPU (cuda:0) in C++ via AOTIModelPackageLoader
#152087 closed
May 4, 2025 -
Make scaler.step() return if step was skipped or not
#152279 closed
May 3, 2025 -
[Torch Profiler] Only two streams captured in CUDA graph but multiple streams shown in Torch Profiler
#152114 closed
May 3, 2025 -
[inductor][triton] Inductor is not compatible with the latest upstream Triton
#152531 closed
May 2, 2025 -
Performance Regression nightly 2025/02/08→02/09, on nanogpt speedrun
#147463 closed
May 2, 2025 -
DISABLED test_captured_scale_float16_cuda_float16 (__main__.TestFlexAttentionCUDA)
#152083 closed
May 2, 2025 -
DISABLED test_builtin_score_mods_float32_score_mod4_cuda_float32 (__main__.TestFlexAttentionCUDA)
#152082 closed
May 2, 2025 -
Update documentation to include insert and + methods to add layers in sequential
#146892 closed
May 2, 2025 -
test_reference_numerics_normal fails with certain versions of numpy/scipy
#148143 closed
May 2, 2025 -
inductor-periodic failures 5/2/2025
#152691 closed
May 2, 2025 -
static cuda launcher causes `RuntimeError: CUDA driver error: invalid device context` in torchtitan CI
#152639 closed
May 2, 2025 -
Triton Error [CUDA]: invalid device context when autograd.backward a triton kernel
#124565 closed
May 2, 2025 -
DISABLED test_cat_max_autotune_triton (__main__.TestMaxAutotune)
#145830 closed
May 2, 2025 -
DISABLED test_sparse_add_cuda_complex64 (__main__.TestSparseCSRCUDA)
#145069 closed
May 2, 2025 -
Some Performance Bug in `tol` of `torch.lobpcg()`
#152154 closed
May 2, 2025 -
DISABLED test_nvshmem
#152649 closed
May 2, 2025 -
py_limited_api=True in PyTorch2.7 will break the build of extensions
#152243 closed
May 2, 2025 -
[ONNX] Improve and sort out fallback mechanism
#151703 closed
May 2, 2025 -
Should make the doc of `nn.CrossEntropyLoss()` more clear
#134853 closed
May 1, 2025 -
torch.compile should not recompiles when `.requires_grad=True` under `torch.no_grad()` context
#131975 closed
May 1, 2025 -
compiled autograd + dynamic shapes fails with constraint violation
#133575 closed
May 1, 2025 -
Export QAT model is not performing as expected when compared to the original model and FX Graph QAT
#150746 closed
May 1, 2025 -
`torch.export` fails on `InstanceNorm1d`
#152467 closed
May 1, 2025 -
[CI] [anaconda] CI Perf Tests
#148342 closed
May 1, 2025 -
[Inductor] Dynamo hangs when processing an operator, seemingly depending on a logical argument value
#151743 closed
May 1, 2025 -
[export] Warn users when 0/1 specialization happens
#151582 closed
May 1, 2025 -
The test 'test_host_memory_stats' is failing in torch2.7.0+cu118
#152422 closed
May 1, 2025 -
How does torch.cudagraph capture a hybrid graph?
#152584 closed
May 1, 2025 -
`torch.randint` can't handle large `high` argument (and in general high range of `torch.uint64`)
#152564 closed
Apr 30, 2025 -
torch.randint should accept high=2**63
#81446 closed
Apr 30, 2025 -
pytorch index_select is too slow
#111247 closed
Apr 30, 2025 -
cuda graphs produce two additional kernel calls
#143572 closed
Apr 30, 2025 -
[regression] Not getting `CUDA error: device-side assert triggered` on main for CUDA_KERNEL_ASSERT2
#107396 closed
Apr 30, 2025 -
[CI] [anaconda] Benchmarks anaconda removal
#152123 closed
Apr 30, 2025 -
More logs to show why fx graph cache isn't hit / created?
#152065 closed
Apr 30, 2025 -
Mr
#152549 closed
Apr 30, 2025 -
Add Description of `validate_args` in `torch.distributions.`
#152165 closed
Apr 30, 2025 -
[ROCm] "No available kernel" when running EFFICIENT_ATTENTION sdpa
#138864 closed
Apr 30, 2025 -
difficulty creating magma tarball when new rocm or cuda versions are deployed
#151707 closed
Apr 30, 2025 -
[CUDA Graph tree] Cannot capture buffer allocation on side CUDA Streams
#151199 closed
Apr 30, 2025
151 Issues opened by 78 people
-
[FlexAttention] export fails to trace with functorch
#153063 opened
May 7, 2025 -
non-strict export should detect fake tensor leakage
#153062 opened
May 7, 2025 -
register_constant doesn't work on simple types
#153061 opened
May 7, 2025 -
DISABLED test_input_hooks_same (__main__.HooksTests)
#153059 opened
May 7, 2025 -
`cuda.Event` handling in dynamo is broken
#153058 opened
May 7, 2025 -
Export doesn't move embedding to correct device
#153056 opened
May 7, 2025 -
Process never ends when sending tensors through multiprocessing queues in Python 3.12+ on macOS
#153050 opened
May 7, 2025 -
DISABLED test_comprehensive_special_ndtri_cuda_int64 (__main__.TestInductorOpInfoCUDA)
#153047 opened
May 7, 2025 -
DISABLED test_comprehensive_trunc_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#153046 opened
May 7, 2025 -
DISABLED test_hook_with_nested_closure (__main__.HooksTests)
#153045 opened
May 7, 2025 -
Unexpected float32 overflow for amp training with torch.compile
#153044 opened
May 7, 2025 -
Pytorch 2.7 crashes when using flex attention with torch.amp
#153042 opened
May 7, 2025 -
using as_strided in torch compile generates a wrong result.
#153041 opened
May 7, 2025 -
Add Split Softmax
#153035 opened
May 7, 2025 -
misalignment with different shape in F.linear with bf16 dtype
#153033 opened
May 7, 2025 -
DISABLED test_hook_with_closure (__main__.HooksTests)
#153032 opened
May 7, 2025 -
DISABLED test_comprehensive_svd_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153031 opened
May 7, 2025 -
DISABLED test_comprehensive_amin_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153030 opened
May 7, 2025 -
DISABLED test_comprehensive_asinh_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153029 opened
May 7, 2025 -
Multiple CUDA graphs utilizing multiple CUDA GPUs encounter illegal memory access during replay
#153025 opened
May 7, 2025 -
[RFC] Enable XPU+FlexAttention on Intel GPU
#153024 opened
May 7, 2025 -
XPU inference output abnormal with device 'XPU:1'
#153022 opened
May 7, 2025 -
[RFC][API-Unstable]Enable A16W4 on XPU Device
#153019 opened
May 7, 2025 -
inconsistent grads between two types of `allgather`s
#153016 opened
May 7, 2025 -
Operations on different precision tensors in CPU lead to different outputs
#153014 opened
May 7, 2025 -
DISABLED test_comprehensive_scatter_xpu_bool (__main__.TestInductorOpInfoXPU)
#153009 opened
May 7, 2025 -
DISABLED test_comprehensive_scatter_xpu_int64 (__main__.TestInductorOpInfoXPU)
#153008 opened
May 7, 2025 -
`lintrunner init` fails
#152999 opened
May 6, 2025 -
DISABLED test_comprehensive_rsub_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152996 opened
May 6, 2025 -
[dynamo] Actually support functools.lru_cache
#152994 opened
May 6, 2025 -
conv2d with int8 on CUDA: GET was unable to find an engine to execute this computation
#152992 opened
May 6, 2025 -
`torch.load` can't deserialize `datetime` objects, even with the appropriate `safe_globals`
#152985 opened
May 6, 2025 -
FPE when using `torch.lcm_` with int32 tensor and int16 scalar
#152979 opened
May 6, 2025 -
Refactor MegaCache to make it generic
#152976 opened
May 6, 2025 -
avoid falling back to as_strided for non-contiguous in-place reshape.
#152972 opened
May 6, 2025 -
DISABLED test_comprehensive_scatter_xpu_int32 (__main__.TestInductorOpInfoXPU)
#152971 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_int64 (__main__.TestInductorOpInfoXPU)
#152970 opened
May 6, 2025 -
[FSDP2] need dummy forward/backward to stay SPMD
#152964 opened
May 6, 2025 -
DTensor support for dynamic shapes is soft
#152963 opened
May 6, 2025 -
TestNestedTensorOpInfoCUDA.test_compile_backward_matmul_cuda_float32 Test Failure
#152962 opened
May 6, 2025 -
DTensor placement propagation for `slice` fails during recompile due to SymInts
#152954 opened
May 6, 2025 -
Remove redundant type aliases of _device for torch.Device
#152952 opened
May 6, 2025 -
DISABLED test_compiler_collectives_automatic_dynamic_tensor (__main__.TestMultiProc)
#152944 opened
May 6, 2025 -
DISABLED test_comprehensive_ormqr_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152943 opened
May 6, 2025 -
aten._scaled_dot_product_efficient_attention returns LSE padded to next highest multiple of 32
#152942 opened
May 6, 2025 -
ROCm: no HIP device available if device is already initialized
#152941 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_bool (__main__.TestInductorOpInfoXPU)
#152931 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_int32 (__main__.TestInductorOpInfoXPU)
#152930 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_float16 (__main__.TestInductorOpInfoXPU)
#152929 opened
May 6, 2025 -
DISABLED test_comprehensive_triu_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152928 opened
May 6, 2025 -
DISABLED test_comprehensive_rot90_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152927 opened
May 6, 2025 -
DISABLED test_comprehensive_scatter_xpu_float16 (__main__.TestInductorOpInfoXPU)
#152925 opened
May 6, 2025 -
Enable 12.8.1
#152922 opened
May 6, 2025 -
We should include where specialization happens when we throw a constraint violation error
#152918 opened
May 6, 2025 -
UNSTABLE inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper)
#152916 opened
May 6, 2025 -
Segmentation fault (core dumped) in torch.nn.functional.max_unpool2d
#152913 opened
May 6, 2025 -
DISABLED test_comprehensive_scatter_xpu_float32 (__main__.TestInductorOpInfoXPU)
#152912 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_float32 (__main__.TestInductorOpInfoXPU)
#152911 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_float64 (__main__.TestInductorOpInfoXPU)
#152910 opened
May 6, 2025 -
DISABLED test_comprehensive_scatter_xpu_float64 (__main__.TestInductorOpInfoXPU)
#152898 opened
May 6, 2025 -
DISABLED test_comprehensive_nn_functional_conv3d_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152893 opened
May 6, 2025 -
DISABLED AotInductorTest.BasicPackageLoaderTestCpu (build.bin.test_aoti_inference)
#152891 opened
May 6, 2025 -
DISABLED test_comprehensive_sort_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152892 opened
May 6, 2025 -
DISABLED test_comprehensive_diagonal_copy_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152890 opened
May 6, 2025 -
DISABLED AotInductorTest.BasicTestCpu (build.bin.test_aoti_inference)
#152889 opened
May 6, 2025 -
DISABLED AotInductorTest.BasicTestCuda (build.bin.test_aoti_inference)
#152888 opened
May 6, 2025 -
UNSTABLE Lint / Link checks / Lint URLs / linux-job
#152884 opened
May 6, 2025 -
Motivate Pytorch's forward mode AD APIs with training examples
#152877 opened
May 5, 2025 -
Incorporate CUDA Memory Trimming Into DeviceCachingAllocator
#152875 opened
May 5, 2025 -
Docs Update `wrap_triton`
#152870 opened
May 5, 2025 -
DISABLED testAssertNotRegex (__main__.CPythonTest_Assertions)
#152869 opened
May 5, 2025 -
Have WrapTriton work w/ `TRITON_INTERPRET=1` in eager
#152868 opened
May 5, 2025 -
[dynamo] Improve final traceback frame format
#152867 opened
May 5, 2025 -
inductor-periodic rocm tests failing since at least 4/10
#152866 opened
May 5, 2025 -
torch.cuda.use_mem_pool is not thread safe
#152861 opened
May 5, 2025 -
DISABLED test_comprehensive___rmul___cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152851 opened
May 5, 2025 -
Clean up CUTLASS_VERSION post cutlass version update
#152847 opened
May 5, 2025 -
torch.distributions.Beta.entropy returns negative values
#152845 opened
May 5, 2025 -
Don't hardcoded support for DTensor to_local/from_local/redistribute into dynamo
#152829 opened
May 5, 2025 -
Pipeline Parallelism Fails when stage input does not produce gradients in all stages.
#152827 opened
May 5, 2025 -
`mypy` stage of `lintrunner -a` has intermittent but continuing crashes
#152824 opened
May 5, 2025 -
Performance Regression nightly 03/11→03/12, on nanogpt speedrun
#152823 opened
May 5, 2025 -
TorchRun: Option to specify which GPUs to run on
#152822 opened
May 5, 2025 -
Mismatch in dynamic quantization performance for torchao and torch.quantization
#152813 opened
May 5, 2025 -
DISABLED test_comprehensive_fliplr_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152797 opened
May 5, 2025 -
DISABLED test_comprehensive_rot90_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152796 opened
May 5, 2025 -
DISABLED test_comprehensive_unbind_copy_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152795 opened
May 5, 2025 -
DISABLED test_comprehensive_linalg_pinv_singular_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152793 opened
May 5, 2025 -
DISABLED test_comprehensive_slice_scatter_cuda_bool (__main__.TestInductorOpInfoCUDA)
#152794 opened
May 5, 2025 -
Inconsistent export behavior for nonzero+grid_sample between CUDA and CPU/MPS backends
#152791 opened
May 4, 2025 -
[CXX11ABI] torch 2.6.0-cu126 and cu124 have different exported symbols
#152790 opened
May 4, 2025 -
undefined symbol: __nvJitLinkCreate_12_8, version libnvJitLink.so.12
#152783 opened
May 4, 2025 -
Segmentation fault (core dumped) in torch.nn.functional.alpha_dropout
#152777 opened
May 4, 2025 -
Cuda-12.9 removed libnvToolsExt.so.* and is now purely header nvtx3
#152756 opened
May 3, 2025 -
Checkpoint sequential doesn't raise clear error when segments is greater than number of functions
#152752 opened
May 3, 2025 -
Error on padding 0-sized tensors
#152750 opened
May 3, 2025 -
torch.compile causes stride mismatch in SDPA with non-contiguous query in torch 2.7
#152747 opened
May 3, 2025 -
[FSDP2] fully_shard(mesh=(shard, shard)) for intra and inter node all-gathers
#152746 opened
May 3, 2025 -
[MPS] TensorIterator and accuracy
#152736 opened
May 2, 2025 -
Inconsistent float16 overflow behavior between CPU and CUDA devices
#152731 opened
May 2, 2025 -
When scoped_libary is destroyed the fake impls are not cleared
#152720 opened
May 2, 2025 -
[Mergebot] Adding ciflow/pull in PR without pull and lint workflows
#152718 opened
May 2, 2025 -
Cannot mask a DTensor
#152717 opened
May 2, 2025 -
dtensors TP+DP issues
#152712 opened
May 2, 2025 -
[FSDP2] NO_SHARD as fully_shard(mesh=(Replicate, Shard)) with shard of world size 1
#152710 opened
May 2, 2025 -
Gradient can be backpropagated through only certain distributions
#152703 opened
May 2, 2025 -
MPS internal assertion with jacfwd and concatenation
#152701 opened
May 2, 2025 -
DISABLED test_2d_mlp_with_nd_mesh (__main__.TestFullyShardNDTraining)
#152700 opened
May 2, 2025 -
[CI] [anaconda] Triton windows build
#152699 opened
May 2, 2025 -
CI workflows being skipped on PR
#152697 opened
May 2, 2025 -
torch._foreach_pow(DTensor, float) and torch._foreach_pow_(DTensor, float) do not work
#152696 opened
May 2, 2025 -
Add sm_86 (Ampere) and sm_89 (Ada) SASS in aarch64 builds
#152690 opened
May 2, 2025 -
torch.library.custom_op string support
#152685 opened
May 2, 2025 -
DISABLED test_comprehensive_select_scatter_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152684 opened
May 2, 2025 -
Flex attention strides
#152683 opened
May 2, 2025 -
[RFC] Universal Device Context and Safe GPU/CPU Execution Decorators
#152679 opened
May 2, 2025 -
DISABLED AotInductorTest.BasicPackageLoaderTestCuda (build.bin.test_aoti_inference)
#152674 opened
May 2, 2025 -
DISABLED test_comprehensive_std_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152673 opened
May 2, 2025 -
DISABLED test_comprehensive_cummin_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152672 opened
May 2, 2025 -
DISABLED test_comprehensive_polygamma_polygamma_n_0_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152671 opened
May 2, 2025 -
Torch BF16 group gemm hangs in backward pass - core issue isolated, needs proper resolution.
#152668 opened
May 2, 2025 -
DISABLED test_comprehensive_nansum_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152666 opened
May 2, 2025 -
UNSTABLE docker-cache-mi300 / docker-cache
#152655 opened
May 2, 2025 -
Check for if two tensors are overall similar instead of bitwise similar?
#152647 opened
May 2, 2025 -
ProcessGroupGloo.allgather_into_tensor_coalesced crashes with CUDA tensors
#152645 opened
May 1, 2025 -
TestFlexAttentionCUDA.test_GQA_score_mod7_cuda_float16 fails on h100
#152635 opened
May 1, 2025 -
Incorrect strides for `nonzero_static` compilation
#152634 opened
May 1, 2025 -
DISABLED test_torchvision_models_efficientnet_v2_l (__main__.TestVisionTracing)
#152632 opened
May 1, 2025 -
[v2.7.1] Release Tracker
#152627 opened
May 1, 2025 -
modded-nanogpt flaky NCCL hang starting 3/30 nightly
#152623 opened
May 1, 2025 -
Pytorch Profiler crashes while using it with Pytorch Lightning module
#152617 opened
May 1, 2025 -
Enable AOTI for Metal inductor
#152612 opened
May 1, 2025 -
[triton pin update] Run Inductor CI on pin updates for Triton and the PyTorch nightly branch
#152608 opened
May 1, 2025 -
Loops impacting output when utilizing hooks
#152607 opened
May 1, 2025 -
AOTI regression on SAM and tts-angular
#152606 opened
May 1, 2025 -
Flex Attention doesn't scale with custom bias
#152593 opened
May 1, 2025 -
[ratter-build] Cannot detect CUDA when building from source
#152592 opened
May 1, 2025 -
[Benchmark] High compilation time variance on benchmark dashboards
#152566 opened
Apr 30, 2025 -
DISABLED test_graph_partition_reorder_cpu_and_gpu_interleave (__main__.CudaGraphTreeTests)
#152561 opened
Apr 30, 2025 -
DISABLED test_pending_fusion_pro_and_epi (__main__.TestPrologueFusion)
#152560 opened
Apr 30, 2025 -
DISABLED test_comprehensive_signal_windows_hamming_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152559 opened
Apr 30, 2025 -
DISABLED test_comprehensive_amin_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152558 opened
Apr 30, 2025 -
PGO does not work on jobs for frameworks that copy code to different dirs at different attempts.
#152555 opened
Apr 30, 2025 -
MPS varying seq len SDPA memory leak
#152550 opened
Apr 30, 2025 -
FakeTensorUpdater does not trace nodes correctly
#152548 opened
Apr 30, 2025
534 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
auto functionalize base_hop
#151067 commented on
May 6, 2025 • 22 new comments -
[Accelerator] Fix Python typing in accelerator
#152394 commented on
May 6, 2025 • 16 new comments -
Random Batch Sampler Speedup
#147706 commented on
May 6, 2025 • 14 new comments -
[1/n][Optimus][Auto-AC] Support activation quantization without scaling
#148380 commented on
May 1, 2025 • 12 new comments -
[Intel GPU] Support f32 intermediate dtype, headdim size <=576 and f32 causal mask for SDPA
#152091 commented on
May 7, 2025 • 11 new comments -
convert guard_size_oblivious to runtime check in infer_size_impl
#148872 commented on
May 6, 2025 • 11 new comments -
Cache code generation during triton template expansion and enable it for mm_template.
#151773 commented on
May 5, 2025 • 9 new comments -
[cp] dispatch flex_attention to CP impl in TorchDispatchMode
#151497 commented on
May 3, 2025 • 9 new comments -
Mini tutorial for provenance tracking
#152211 commented on
May 6, 2025 • 8 new comments -
[Inductor-CPU] Faster int8 WoQ GEMM for small M with explicit prefetching
#149373 commented on
May 7, 2025 • 8 new comments -
[Cutlass] Integrate EVT into CUDACPPScheduling
#150906 commented on
May 6, 2025 • 6 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
May 3, 2025 • 6 new comments -
[WIP] DeadCodeEliminator Mark(block) improvement
#152348 commented on
May 2, 2025 • 5 new comments -
[Cutlass] Changes to gemm template for EVT
#150907 commented on
May 7, 2025 • 5 new comments -
[device_mesh] improve device selection logic
#150897 commented on
May 1, 2025 • 4 new comments -
[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices
#152341 commented on
May 7, 2025 • 4 new comments -
Fix `lr_scheduler` unexpectedly calls `step()` when init argument last_epoch is larger than -1
#149312 commented on
May 6, 2025 • 3 new comments -
[FP8][CUTLASS] xFail `honor_sm_carveout` on `sm100`
#152378 commented on
May 6, 2025 • 3 new comments -
[Hierarchical Compilation] Use universal flatten APIs
#152505 commented on
May 5, 2025 • 3 new comments -
`torch.tensordot`: performance improvements when contracting to a scalar.
#145936 commented on
May 5, 2025 • 3 new comments -
Add differentiable ops hint message in Module docs
#150291 commented on
May 6, 2025 • 2 new comments -
[BE]: Follow detach().clone() pattern for SGD
#144468 commented on
May 5, 2025 • 2 new comments -
[Quant][X86] add ops to compute uint8 pointwise add/add_relu
#152411 commented on
May 7, 2025 • 2 new comments -
Temp test
#148424 commented on
May 6, 2025 • 2 new comments -
[CI][CUDA] Move cu118 distributed pull jobs to cu126, move cu124-sm75 to cu126-sm75
#151594 commented on
May 2, 2025 • 2 new comments -
[pytorch][triton] flex attention fwd kernel with TMA loads (#151923)
#152460 commented on
May 7, 2025 • 2 new comments -
cpu: enable gemm-bf16f32 for SDPA BF16
#140159 commented on
May 7, 2025 • 2 new comments -
remove guard_size_oblivious from unbind.
#148815 commented on
May 2, 2025 • 2 new comments -
[export] Refactor pt2 save/load
#152495 commented on
May 6, 2025 • 2 new comments -
autograd: Add VJP and JVP rules for aten::aminmax
#151186 commented on
May 3, 2025 • 2 new comments -
Horizontal
#151780 commented on
May 7, 2025 • 1 new comment -
Move prologue_supported_inputs computations to def_kernal
#150869 commented on
May 2, 2025 • 1 new comment -
[Graph Partition] Pass all cudagraph tree tests
#152048 commented on
May 7, 2025 • 1 new comment -
Move mps_linear forward to use MPS kernels directly instead of MPSGraph
#152210 commented on
May 6, 2025 • 1 new comment -
[dynamic shapes] guard_or_false for infer_size
#152146 commented on
May 6, 2025 • 1 new comment -
update get_default_device to also respect torch.device ctx manager
#148621 commented on
May 6, 2025 • 1 new comment -
[inductor] lowering for fractional_max_pool3d
#148630 commented on
May 7, 2025 • 1 new comment -
Fix `InstanceNorm` wrong suggestion in warning message
#151534 commented on
May 2, 2025 • 1 new comment -
Change unsafe_marked_cacheable_functions to a dictionary, so that you can specify a static cache key
#152486 commented on
May 6, 2025 • 1 new comment -
Parallelize sort using libstdc++ parallel mode
#150195 commented on
May 1, 2025 • 1 new comment -
add device generalisation support for distributed tests
#152471 commented on
May 2, 2025 • 1 new comment -
[WIP][dynamic shapes] rewrite should_swap with guard_or_false
#150164 commented on
May 2, 2025 • 1 new comment -
[aotd] Support saved tensors hooks in aot_autograd
#150032 commented on
May 6, 2025 • 1 new comment -
Deprecate DataLoader pin_memory_device param
#146821 commented on
May 6, 2025 • 1 new comment -
[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path`
#150726 commented on
May 2, 2025 • 1 new comment -
Make `Adam`, `AdamW` work with nonzero-dim Tensor betas
#149939 commented on
May 3, 2025 • 1 new comment -
Add is_pinned to host allocator
#151439 commented on
May 6, 2025 • 1 new comment -
DRAFT: Add TMA opt for concat function target hopper and blackwell arch
#149893 commented on
May 7, 2025 • 1 new comment -
Skip fuse attention on fp32 if not tf32
#151924 commented on
May 5, 2025 • 1 new comment -
Use gather in index_select
#151715 commented on
May 7, 2025 • 1 new comment -
Inductor logging + analysis of torch.profile
#149697 commented on
May 3, 2025 • 1 new comment -
[ca] mark scalar int sizes as dynamic via tensor wrapping
#151731 commented on
May 7, 2025 • 1 new comment -
[DTensor] enable SimpleFSDP's composability with Tensor Parallel
#152286 commented on
May 6, 2025 • 1 new comment -
Rewrite autograd producer consumer stream sync logic
#151079 commented on
May 7, 2025 • 1 new comment -
Fix CUPTI lookup to include target directory
#148668 commented on
May 5, 2025 • 0 new comments -
[SGD] Add SGD capturable API and tests
#148647 commented on
May 5, 2025 • 0 new comments -
[dynamic shapes] guard_or_false for computeStorageNbytes
#150483 commented on
May 6, 2025 • 0 new comments -
Adjust CMake code for Eigen
#148628 commented on
May 7, 2025 • 0 new comments -
Optimize AOTInductor: Caching, Reduced Decompositions, and Improved JSON Handling
#148616 commented on
May 2, 2025 • 0 new comments -
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on
May 5, 2025 • 0 new comments -
[triton hash update] update the pinned triton hash
#148492 commented on
May 7, 2025 • 0 new comments -
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on
May 2, 2025 • 0 new comments -
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on
May 2, 2025 • 0 new comments -
Remove `torch.testing` from `MOD_SKIPLIST`
#148459 commented on
May 5, 2025 • 0 new comments -
Add 'x in {...}' patterns to perf_linter
#148417 commented on
May 4, 2025 • 0 new comments -
Add perf_linter to auto-fix some anti-patterns
#148416 commented on
May 3, 2025 • 0 new comments -
softmax: add device check for xpu with half_to_float
#150278 commented on
May 7, 2025 • 0 new comments -
Add cmake variable USE_ROCM_CK
#150245 commented on
May 6, 2025 • 0 new comments -
[c10d] Test multiple CUDA Graph captures
#150040 commented on
May 7, 2025 • 0 new comments -
Fixes detection of ArmPL on Linux platform
#150031 commented on
May 7, 2025 • 0 new comments -
AOTI freezing: fix test issues and enable by default
#149961 commented on
May 2, 2025 • 0 new comments -
[inductor] Add typing to _inductor/ir.py
#149958 commented on
May 7, 2025 • 0 new comments -
Enable XPU distributed test for PT2.8
#149916 commented on
May 5, 2025 • 0 new comments -
Refactoring FSDP2 (_composable/fsdp) test cases to be device agnostic
#149848 commented on
May 5, 2025 • 0 new comments -
Add SWA with a cyclical scheduler example
#149847 commented on
May 2, 2025 • 0 new comments -
Add x86-simd-sort accelerated sorting
#149362 commented on
May 7, 2025 • 0 new comments -
[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100`
#149282 commented on
May 1, 2025 • 0 new comments -
Make Subset dataset a true wrapper
#149272 commented on
May 5, 2025 • 0 new comments -
[Easy] update pip sources for CUDA in nightly pull tool
#149143 commented on
May 1, 2025 • 0 new comments -
Update the heuristic for AArch64 bmm/baddbmm
#149122 commented on
May 7, 2025 • 0 new comments -
[test] bigger runner
#149003 commented on
Apr 30, 2025 • 0 new comments -
[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging
#148981 commented on
May 6, 2025 • 0 new comments -
Move token linter code into tools/linter/adaptors/_linter/
#148959 commented on
May 5, 2025 • 0 new comments -
Fix AttributeError for `_get_vc_env` with setuptools>=75.9.0
#148847 commented on
May 7, 2025 • 0 new comments -
C++ support to print symbolic tensors as `Symbolic tensor: size=(...)`
#148846 commented on
May 4, 2025 • 0 new comments -
Set specialized representation string for meta/fake tensor with empty construction
#148794 commented on
May 7, 2025 • 0 new comments -
cpp_wrapper: build non-performance-sensitive code at O1
#148773 commented on
May 2, 2025 • 0 new comments -
Trunk workflow for Windows Arm64
#148753 commented on
May 2, 2025 • 0 new comments -
Fix calling torch.compile inside of a `__torch_dispatch__`
#148712 commented on
May 6, 2025 • 0 new comments -
[Just SCRTCH] no review
#148710 commented on
May 6, 2025 • 0 new comments -
Automated perf_linter changes: x in (...)
#148415 commented on
May 4, 2025 • 0 new comments -
Define USE_C10D_XCCL and USE_XCCL in pytorch
#147593 commented on
May 7, 2025 • 0 new comments -
[import][inductor] Simplify grid handling
#147583 commented on
May 3, 2025 • 0 new comments -
[ONNX][demo] Rotary embedding
#147576 commented on
May 4, 2025 • 0 new comments -
Update pybind11 submodule to 3.0.0-dev test
#147524 commented on
Apr 30, 2025 • 0 new comments -
removed zero dim cpu logic from fake_tensor.py
#147501 commented on
May 1, 2025 • 0 new comments -
[test] sccache log
#147470 commented on
May 6, 2025 • 0 new comments -
[Inductor] Avoid tensor slice overflow for large step
#147433 commented on
May 6, 2025 • 0 new comments -
[Inductor][CPP] Add float16 support for CppMicroGemmAMX
#147368 commented on
May 6, 2025 • 0 new comments -
Add the memory and dispatch to the logging module.
#147262 commented on
May 3, 2025 • 0 new comments -
Fix clang-tidy warnings in torch/jit
#147253 commented on
May 2, 2025 • 0 new comments -
logging: close handler after removing it
#147235 commented on
May 4, 2025 • 0 new comments -
Record the XPU and XCCL build settings in the compiled binary
#147161 commented on
May 7, 2025 • 0 new comments -
[fsdp] add an experimental allocator hook for buffers that participate in collective communication
#147146 commented on
May 5, 2025 • 0 new comments -
fake_tensor: Handle op errors more gracefully
#147049 commented on
May 7, 2025 • 0 new comments -
experimental proposal DCP v2
#146999 commented on
May 2, 2025 • 0 new comments -
[BE]: Try to remove unused type ignores - attempt 1
#146989 commented on
May 2, 2025 • 0 new comments -
[DO NOT MERGE][cuDNN][SDPA] Testing sm90/sm100 priority for cuDNN SDPA
#146947 commented on
May 6, 2025 • 0 new comments -
Support pin_memory() during CUDA stream capture.
#146924 commented on
May 7, 2025 • 0 new comments -
[DO NOT MERGE] ROCm sandbox PR
#146903 commented on
May 7, 2025 • 0 new comments -
Fix non-bitwise type annotations for Tensor operators (see #145838)
#146845 commented on
May 1, 2025 • 0 new comments -
Enable Windows tests
#146695 commented on
May 4, 2025 • 0 new comments -
Optimize LRScheduler docs
#146684 commented on
May 2, 2025 • 0 new comments -
[HOP] Mutation and alias rework
#146658 commented on
May 7, 2025 • 0 new comments -
[WIP] BaseSubclass
#146612 commented on
May 5, 2025 • 0 new comments -
clang-format CUDASymmetricMemory.cu
#146592 commented on
May 5, 2025 • 0 new comments -
Update quantile doc
#146485 commented on
May 2, 2025 • 0 new comments -
[WIP][dynamic shapes] mark backed size symbols as size-like
#146335 commented on
May 5, 2025 • 0 new comments -
[dcp] Minor improvements to filesystem writer
#146273 commented on
May 6, 2025 • 0 new comments -
Format tests by PYFMT
#146267 commented on
May 2, 2025 • 0 new comments -
docs: change log to ln in Softplus function and class
#146199 commented on
May 2, 2025 • 0 new comments -
Automated perf_linter changes: list constructors
#148414 commented on
May 4, 2025 • 0 new comments -
Automated perf_linter changes: generators
#148413 commented on
May 3, 2025 • 0 new comments -
Add api info for torch._C._nn.pyi [1/N]
#148410 commented on
May 3, 2025 • 0 new comments -
Enable `_lazy_clone` between CPU and MPS
#148408 commented on
May 1, 2025 • 0 new comments -
[Utilization] Add utilization monitor for linux build
#148375 commented on
May 6, 2025 • 0 new comments -
test index_put
#148357 commented on
May 3, 2025 • 0 new comments -
[ROCm][CI] Add support for gfx1100 in rocm workflow + test skips
#148355 commented on
May 7, 2025 • 0 new comments -
[pytree] simplify public API exposition with `__module__`
#148328 commented on
May 3, 2025 • 0 new comments -
Checking for cuda version to see if bf16 is natively supported or emulated
#148322 commented on
May 6, 2025 • 0 new comments -
```torch.as_strided``` negative stride SIGSEV fix when using ```torch.compile```
#148301 commented on
May 3, 2025 • 0 new comments -
Treat CUDA warnings as errors
#148294 commented on
May 3, 2025 • 0 new comments -
handle jk for emulation runs
#148240 commented on
May 2, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format`
#148185 commented on
May 1, 2025 • 0 new comments -
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on
May 3, 2025 • 0 new comments -
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(ZI)
#148173 commented on
May 4, 2025 • 0 new comments -
Disable cudnn to avoid creating guards that denies exporting
#148140 commented on
May 4, 2025 • 0 new comments -
Checks kv pair indexing in OrderedPreservingDictTest.test_range_insert
#148136 commented on
May 5, 2025 • 0 new comments -
[torch/elastic][upstream] Fix the wrong order when start_index is not 0
#147967 commented on
May 4, 2025 • 0 new comments -
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(Zi)
#147917 commented on
May 2, 2025 • 0 new comments -
[Draft] Enable cpu_offload for _distribute_state_dict
#147916 commented on
May 2, 2025 • 0 new comments -
[export][dynamic shapes] add Dim._OBLIVIOUS, _mark_oblivious()
#147881 commented on
May 2, 2025 • 0 new comments -
Set disable_clone=True when running opt_gm
#147845 commented on
May 2, 2025 • 0 new comments -
Use /permissive- for torch libraries in MSVC builds
#147825 commented on
May 4, 2025 • 0 new comments -
remove asserttion in expand_to_full_mesh_op_strategy
#147823 commented on
May 6, 2025 • 0 new comments -
[WIP][ptd][nccl] use current-stream as nccl-stream under async=False mode
#147820 commented on
May 2, 2025 • 0 new comments -
[cuda] Added a correctness test for layernorm backwards
#147763 commented on
May 5, 2025 • 0 new comments -
[DCP][OSS] Rank local checkpointing in DCP without collectives
#147758 commented on
Apr 30, 2025 • 0 new comments -
Modifications to RuntimeEstimator and SACEstimator
#147750 commented on
May 6, 2025 • 0 new comments -
Skip test_dtypes xpu test on bmm and addbmm
#147721 commented on
May 3, 2025 • 0 new comments -
[Dtensor] Pass device information in OffsetBasedRNGTracker
#147594 commented on
May 4, 2025 • 0 new comments -
Use std::apply for CPU code
#152526 commented on
May 2, 2025 • 0 new comments -
Add `padding="same"` for transposed convolution
#152228 commented on
May 5, 2025 • 0 new comments -
Add support for torch.cuda.FloatTensor()
#152208 commented on
May 1, 2025 • 0 new comments -
[submodule] Update ONNX to 1.18
#152200 commented on
May 7, 2025 • 0 new comments -
[inductor] propagate shapes in CSEVariable
#152198 commented on
May 7, 2025 • 0 new comments -
IGNORE: Testing OIDC
#152181 commented on
May 1, 2025 • 0 new comments -
Extend compute_global_tensor_shape to multi dimension sharding
#152166 commented on
May 3, 2025 • 0 new comments -
Add dynamo config to HOP-ify context managers
#152159 commented on
Apr 30, 2025 • 0 new comments -
Add runtime asserts to AOTI
#152125 commented on
May 6, 2025 • 0 new comments -
[dynamo][ca] support dynamic annotations on tensors in ListVariables/TupleVariables
#152119 commented on
May 7, 2025 • 0 new comments -
Update _torch_docs.py to Fix torch.bernoulli()
#152104 commented on
May 5, 2025 • 0 new comments -
Switch to standard pep517 sdist generation
#152098 commented on
May 6, 2025 • 0 new comments -
unbreak fb:operator_benchmark_test
#152049 commented on
May 1, 2025 • 0 new comments -
[map] always turn on dynamo for map
#152041 commented on
May 6, 2025 • 0 new comments -
Add CPython complex tests
#152015 commented on
May 6, 2025 • 0 new comments -
[Kineto] Upgrade the kineto commit to fb36cce
#152007 commented on
May 7, 2025 • 0 new comments -
[UniformValueConstantFolder] deduce value on CPU rather than on device
#151998 commented on
May 6, 2025 • 0 new comments -
Add torchcheck for replication_pad3d_backward
#151986 commented on
May 2, 2025 • 0 new comments -
Make `aten.embedding` not wrap negative indices
#151967 commented on
May 5, 2025 • 0 new comments -
[ca] hide unused scalar int sizes from dynamo
#151962 commented on
May 7, 2025 • 0 new comments -
[ROCm][CI] Update dockerfile to use centos9
#151929 commented on
May 6, 2025 • 0 new comments -
[BE] Upgrade XPU support package to 2025.1 in CICD
#151899 commented on
May 7, 2025 • 0 new comments -
Avoid differing results in `linalg.(tensor_)solve`
#151896 commented on
May 6, 2025 • 0 new comments -
[aot][ca] save bw_module in AOTAutogradCache
#151860 commented on
May 7, 2025 • 0 new comments -
[reland][ROCm] remove caffe2 from hipify
#151845 commented on
May 6, 2025 • 0 new comments -
[2/n][Optimus][Auto-AC] Support activation quantization with scaling
#151770 commented on
May 1, 2025 • 0 new comments -
Add adaptive_avg_pool2d input and output_size check
#151769 commented on
May 2, 2025 • 0 new comments -
[Don't merge] Upgrade oneDNN to v3.8 for XPU build
#151767 commented on
May 7, 2025 • 0 new comments -
Implement avg_pool3d for MPS backend
#151742 commented on
May 6, 2025 • 0 new comments -
[ROCm] Maxpool forward NHWC Perf Improvement targeting Resnet scenarios
#151727 commented on
May 7, 2025 • 0 new comments -
elastic: do not shutdown rendezvous on leaving workers
#152525 commented on
May 2, 2025 • 0 new comments -
[compile async] [cache] testing
#152523 commented on
May 7, 2025 • 0 new comments -
[inductor] [compile async] Don't compile in eager
#152507 commented on
May 6, 2025 • 0 new comments -
[Hierarchical Compile] Take into account mutation deps in cycle detection
#152506 commented on
May 1, 2025 • 0 new comments -
Fix flaky test in test_custom_ops
#152484 commented on
May 6, 2025 • 0 new comments -
[IR] Input Adapter refactor prototype
#152459 commented on
Apr 30, 2025 • 0 new comments -
fix: Update padding_mode to use Literal for type checking
#152458 commented on
May 2, 2025 • 0 new comments -
Add epoch to fake tensor cache key
#152453 commented on
May 2, 2025 • 0 new comments -
[ROCm] cpp_extension allow user to override default flags
#152432 commented on
May 4, 2025 • 0 new comments -
Relax tolerance for test_quick_baddbmm_cpu_complex64
#152424 commented on
May 6, 2025 • 0 new comments -
[Inductor][CPP] Enable vectorized fp8 quant dequant
#152418 commented on
Apr 30, 2025 • 0 new comments -
[Hierarchical Compile] Add mutation dependencies to topological sorting
#152410 commented on
May 1, 2025 • 0 new comments -
[Hierarchical Compilation] Track node mutations
#152389 commented on
May 1, 2025 • 0 new comments -
Add vec_reduce_all specialization for std::plus on AArch64
#152388 commented on
Apr 30, 2025 • 0 new comments -
fix: outdated contents in dynamo overview
#152382 commented on
May 6, 2025 • 0 new comments -
complex.pow(2) on GPU by replacing with complex * complex to avoid numerical instability
#152373 commented on
May 2, 2025 • 0 new comments -
[Relandx2] Rewrite the guts of torch::jit::Lexer to speed it up
#152372 commented on
May 1, 2025 • 0 new comments -
vec::map: directly process reduced-precision floats when reasonable
#152366 commented on
Apr 30, 2025 • 0 new comments -
add is_vec_specialized_for
#152365 commented on
May 2, 2025 • 0 new comments -
Format all headers under ATen/cpu/vec, not just top-level
#152364 commented on
May 6, 2025 • 0 new comments -
Add codeowner for merge rules
#152354 commented on
May 6, 2025 • 0 new comments -
[inductor][dynamo] Include operator name in size/stride/alignment assertion
#152353 commented on
May 7, 2025 • 0 new comments -
[cp] dispatch flex_attention_backward to CP impl in TorchDispatchMode
#152311 commented on
May 2, 2025 • 0 new comments -
Enable the AMP precision with freezing for CPU nightly test
#152298 commented on
May 6, 2025 • 0 new comments -
[CI] Add xpu inductor test into periodic workflow
#152281 commented on
May 7, 2025 • 0 new comments -
[ROCm] Maxpool backward NHWC Perf Improvement targeting Resnet scenarios
#152267 commented on
Apr 30, 2025 • 0 new comments -
[executorch hash update] update the pinned executorch hash
#152238 commented on
May 7, 2025 • 0 new comments -
At least one of ROCM_HOME or CUDA_HOME must be None
#152236 commented on
Apr 30, 2025 • 0 new comments -
_get_total_norm should use float64 to avoid rounding errors
#152234 commented on
May 1, 2025 • 0 new comments -
flex attention: fix dispatch order for tensor subclasses, avoid hardcoding call to faketensor impl in dynamo
#151719 commented on
May 2, 2025 • 0 new comments -
Do not cover up `__dunder`__ method type-hints from `.pyi` file
#150875 commented on
May 6, 2025 • 0 new comments -
Add CPython tests for iter/sort
#150797 commented on
May 6, 2025 • 0 new comments -
Add CPython generator/contextlib tests
#150796 commented on
May 6, 2025 • 0 new comments -
Add CPython int/float tests
#150795 commented on
May 6, 2025 • 0 new comments -
Add CPython math/cmath tests
#150794 commented on
May 6, 2025 • 0 new comments -
Add CPython string tests
#150793 commented on
May 6, 2025 • 0 new comments -
[Set] Add CPython set tests
#150792 commented on
May 7, 2025 • 0 new comments -
Add CPython dict tests
#150791 commented on
May 6, 2025 • 0 new comments -
Add CPython list/tuple tests
#150790 commented on
May 6, 2025 • 0 new comments -
Add CPython exception tests
#150789 commented on
May 6, 2025 • 0 new comments -
Add CPython tests for unittest
#150788 commented on
May 6, 2025 • 0 new comments -
Make device check error message more descriptive
#150750 commented on
May 7, 2025 • 0 new comments -
[BE][CI][Easy] Run `lintrunner` on generated `.pyi` stub files
#150732 commented on
May 2, 2025 • 0 new comments -
[BE] Resolve lint errors in `.pyi` stub files
#150731 commented on
May 2, 2025 • 0 new comments -
[BE] Ensure generated stub files by `gen_pyi` are properly formatted
#150730 commented on
May 2, 2025 • 0 new comments -
[BE] Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi`
#150729 commented on
May 2, 2025 • 0 new comments -
[BE] Update `.pyi` stub template to use Generic TypeAlias (PEP 585) and Union Type (PEP 604)
#150728 commented on
May 2, 2025 • 0 new comments -
[torchgen] Refactor and simplify `gen_pyi.py` to use Generic TypeAlias (PEP 585) and Union Type (PEP 604)
#150727 commented on
May 2, 2025 • 0 new comments -
Avoid overwriting COW data in MPS code
#150721 commented on
May 2, 2025 • 0 new comments -
[export] add runtime assert messages to python torch checks
#150719 commented on
May 6, 2025 • 0 new comments -
Support XPU in memory tracker
#150703 commented on
May 7, 2025 • 0 new comments -
[draft][distributed] add into 3d composability test at AMD CI test
#150694 commented on
May 6, 2025 • 0 new comments -
AOTI: add all fallback ops that are missing from C-shim
#150673 commented on
May 2, 2025 • 0 new comments -
[Inductor] Fix CUDA memory usage for CPU only compile
#150669 commented on
May 2, 2025 • 0 new comments -
Refactor `torch/utils/data/datapipes/gen_pyi.py` with `torchgen`
#150626 commented on
May 2, 2025 • 0 new comments -
Make LazyModuleMixin materialize after load_state_dict
#150593 commented on
May 2, 2025 • 0 new comments -
fix dynamic shapes for kwargs
#150583 commented on
Apr 30, 2025 • 0 new comments -
Enable lazy cloning in `Tensor.to` between CPU and MPS
#150569 commented on
May 2, 2025 • 0 new comments -
API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+
#150536 commented on
May 5, 2025 • 0 new comments -
Add a custom profiler configuration option
#151656 commented on
May 6, 2025 • 0 new comments -
[Intel GPU] Use user-friendly err msg in mm
#151655 commented on
May 7, 2025 • 0 new comments -
[Intel GPU][Inductor] Fallback embedding_dense_backward on XPU
#151637 commented on
May 7, 2025 • 0 new comments -
Add OIDC perms to windows-[build|test] workflows
#151596 commented on
May 1, 2025 • 0 new comments -
Add OIDC permissions to linux-test workflow
#151585 commented on
May 1, 2025 • 0 new comments -
Add OIDC permissions to linux-build workflow
#151581 commented on
May 1, 2025 • 0 new comments -
[bazel] Fix unusual reference to cpuinfo workspace
#151578 commented on
May 5, 2025 • 0 new comments -
Add device agnostic support for distributed tests
#151560 commented on
May 6, 2025 • 0 new comments -
Update OpenBLAS commit
#151547 commented on
May 6, 2025 • 0 new comments -
Add hint message when parameters is empty in clip_grad_norm_
#151529 commented on
May 6, 2025 • 0 new comments -
Allow to byteswap data when reading saved torch jit data
#151447 commented on
May 1, 2025 • 0 new comments -
[ez] Rewrite comment to be more friendly to non haskellers
#151421 commented on
May 4, 2025 • 0 new comments -
[Cutlass] Add epilogue inputs/outputs to def_kernel
#151406 commented on
May 5, 2025 • 0 new comments -
[ROCm] Upgrade ROCm CI to ROCm6.4
#151368 commented on
May 7, 2025 • 0 new comments -
Use Allocator API raw_allocate & raw_dealloc in CUDAAllocator
#151305 commented on
May 6, 2025 • 0 new comments -
[WIP] Generalize device caching allocator
#151298 commented on
May 6, 2025 • 0 new comments -
Remove outdated Android workarounds of nearbyintf
#151292 commented on
May 3, 2025 • 0 new comments -
[dynamo] Avoid unnecessary `.detach()` call in `_make_subclass` polyfill
#151265 commented on
May 6, 2025 • 0 new comments -
[aot autograd][logging] Profile large missing gaps in compile time tracing
#151256 commented on
May 6, 2025 • 0 new comments -
NCCL: Fix cmake file when cross compiling.
#151234 commented on
May 5, 2025 • 0 new comments -
[dynamo] keep C++ symbolic shape guards disabled for benchmarks
#151225 commented on
May 1, 2025 • 0 new comments -
Implement MKLGenerator
#151218 commented on
May 1, 2025 • 0 new comments -
Update slow tests
#151207 commented on
May 5, 2025 • 0 new comments -
[dynamo] Prevent lazy variable realization on STORE_FAST
#151184 commented on
May 2, 2025 • 0 new comments -
TESTING: IGNORE
#151116 commented on
May 1, 2025 • 0 new comments -
Update auto-tuning support for _scaled_grouped_mm
#150944 commented on
May 6, 2025 • 0 new comments -
[CI] Enable XCCL in XPU CI build
#150927 commented on
May 7, 2025 • 0 new comments -
fix shard tensor gather when a local tensor on certain ranks has zero elements
#150914 commented on
May 7, 2025 • 0 new comments -
[torch.compile] handle a custom __delattr__ method correctly
#150899 commented on
May 5, 2025 • 0 new comments -
An exported onnx model can't reduce on dim with value of 0 if 'keepdims' is false
#66200 commented on
May 4, 2025 • 0 new comments -
Install pytorch from pypi using local CUDA build
#150742 commented on
May 4, 2025 • 0 new comments -
pytorch pip install instructions: always include the cuda index
#150432 commented on
May 4, 2025 • 0 new comments -
[compile] DDPOptimizer + activation checkpointing not supported
#104674 commented on
May 4, 2025 • 0 new comments -
[ROCm] PyTorch slow on TTS
#150168 commented on
May 4, 2025 • 0 new comments -
torch.lobpcg producing different largest eigenvalue than scipy and np.linalg.eig
#101075 commented on
May 4, 2025 • 0 new comments -
Inconsistent `sum`/`dot`/`norm` behavior
#151761 commented on
May 5, 2025 • 0 new comments -
Logging when executing fx.Interpreter
#117351 commented on
May 5, 2025 • 0 new comments -
DISABLED test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_cuda_float16 (__main__.TestMatmulCudaCUDA)
#151661 commented on
May 5, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int8 (__main__.TestForeachCUDA)
#150407 commented on
May 5, 2025 • 0 new comments -
[C10D] Allow NCCL single P2P ops to use parent/collective communicator
#152220 commented on
May 5, 2025 • 0 new comments -
Unexpected behavior when using dist.all_reduce(x, op=dist.ReduceOp.SUM)
#152300 commented on
May 5, 2025 • 0 new comments -
can't reconstruct the communication group using PyTorch.
#152527 commented on
May 5, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_complex64 (__main__.TestForeachCUDA)
#150960 commented on
May 5, 2025 • 0 new comments -
DISABLED test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_cuda_float16 (__main__.TestMatmulCudaCUDA)
#151675 commented on
May 5, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_lu_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152520 commented on
May 5, 2025 • 0 new comments -
The 2.7.0 release tarball is missing `.ci/docker/ci_commit_pins/nccl-cu12.txt` required for building
#152532 commented on
May 1, 2025 • 0 new comments -
Newly added lint-urls jobs are very flaky
#152439 commented on
May 2, 2025 • 0 new comments -
associative_scan not composable with vmap in eager-mode
#134000 commented on
May 2, 2025 • 0 new comments -
Multiple Learning Rate Schedulers for Specific Parameter Groups
#101082 commented on
May 2, 2025 • 0 new comments -
Adam (fused=True) issues
#90752 commented on
May 2, 2025 • 0 new comments -
Unwanted Warning in lr_scheduler.step()
#117540 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_bitwise_right_shift_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152057 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_nansum_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#139710 commented on
May 2, 2025 • 0 new comments -
AdamW(fused=True) slower than unfused AdamW
#121857 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_floor_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152058 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_nansum_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140693 commented on
May 2, 2025 • 0 new comments -
`einsum` is about 40x slower on CUDA than manually multiplying and summing
#101249 commented on
May 2, 2025 • 0 new comments -
torch.nn.functional.ctc_loss raises cuDNN error in PyTorch versions >=2.5.0
#152421 commented on
May 3, 2025 • 0 new comments -
Deprecation of NVTX 2 (`nvToolsExt`): Recommended to move to NVTX 3
#147011 commented on
May 3, 2025 • 0 new comments -
A problem discovered when computing complex matrices in deep neural networks
#151182 commented on
May 3, 2025 • 0 new comments -
[dynamic shapes] data-dependent error when backed + unbacked expression resolves statically
#151491 commented on
May 3, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
May 4, 2025 • 0 new comments -
[dynamo] guard code generation triggers attribute error on DeviceMesh object
#152447 commented on
May 5, 2025 • 0 new comments -
[PT2] torch.layer_norm errors in eager but runs fine in backend=aot_eager_decomp_partition
#151478 commented on
May 5, 2025 • 0 new comments -
[inductor] [aot] `torch.linalg.lu` can't accept `slice operation`, behaving differently with eager
#151401 commented on
May 5, 2025 • 0 new comments -
[dynamo] Try tracing into einops
#152480 commented on
May 5, 2025 • 0 new comments -
[dynamo] `torch.compile` prevents fsdp warning from getting generated
#152451 commented on
May 5, 2025 • 0 new comments -
`torch.compile` causes assertion error in distributed checkpoint wrapper test
#152442 commented on
May 5, 2025 • 0 new comments -
[inductor] Improve codegen for argmax+max
#146643 commented on
May 5, 2025 • 0 new comments -
TORCH_COMPILE_DEBUG=1 does not consistently generate debug logs
#152374 commented on
May 5, 2025 • 0 new comments -
[ued] Slow start up time for `torch.compile` on GGUF Auraflow
#150706 commented on
May 5, 2025 • 0 new comments -
[inductor] [assertion error] `torch.select_scatter` crashes on inductor but passes on eager
#151296 commented on
May 5, 2025 • 0 new comments -
DISABLED AotInductorTest.FreeInactiveConstantBufferRuntimeConstantFoldingCuda (build.bin.test_aoti_inference)
#150299 commented on
May 5, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_float16 (__main__.TestForeachCUDA)
#150985 commented on
May 5, 2025 • 0 new comments -
Cannot override __add__ in NamedTuple with __new__ + torch.compile
#133762 commented on
May 5, 2025 • 0 new comments -
`randint(max)` causes a graph break, but not `rand().mul(max).floor().to(torch.long)` (on CPU)
#135664 commented on
May 5, 2025 • 0 new comments -
Investigate FlexAttention performance degradation on low precision inputs
#147336 commented on
May 5, 2025 • 0 new comments -
[AOTAutograd] tweak min-cut partitioner to avoid saving softmax output
#126348 commented on
May 6, 2025 • 0 new comments -
upstream `apex.normalization.FusedRMSNorm`
#72643 commented on
May 5, 2025 • 0 new comments -
[XPU] Upgrade the XPU support packages version to 2025.1 in CI/CD
#151097 commented on
May 5, 2025 • 0 new comments -
RFC: Torch Native Runtime
#152034 commented on
May 5, 2025 • 0 new comments -
[TensorDict - compile] dynamo generator compatibility
#129658 commented on
May 5, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float32 (__main__.TestForeachCUDA)
#149409 commented on
May 5, 2025 • 0 new comments -
RFC: The State of Custom CUDA extensions in PyTorch
#152032 commented on
May 5, 2025 • 0 new comments -
Illegal Instruction Caused by `grid_sample` Under Windows
#152385 commented on
May 5, 2025 • 0 new comments -
optree package status in PyTorch
#152535 commented on
May 5, 2025 • 0 new comments -
DISABLED test_pattern_matcher_multi_user_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134433 commented on
May 5, 2025 • 0 new comments -
DISABLED test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_cuda_float16 (__main__.TestMatmulCudaCUDA)
#151712 commented on
May 5, 2025 • 0 new comments -
Update quantization to make source files complient with /Zc:lambda
#92600 commented on
May 5, 2025 • 0 new comments -
Stop special-casing einops in Dynamo
#142486 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_index_select_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152416 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_polygamma_polygamma_n_0_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152469 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_polygamma_polygamma_n_1_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152470 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_repeat_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152500 commented on
May 5, 2025 • 0 new comments -
[Async TP] all-gather-matmuls not fusing properly when rowwise scales are used
#149990 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16 (__main__.TestForeachCUDA)
#150173 commented on
May 1, 2025 • 0 new comments -
Memory Leak in MPS Backend During LSTM Iterations (Out of Memory Error)
#145374 commented on
May 1, 2025 • 0 new comments -
Ability to do aot/inductor compilation from a jit model (or torch.exported model)
#127928 commented on
May 1, 2025 • 0 new comments -
DISABLED test_int64_upsample3d_cuda_bfloat16 (__main__.TestTorchDeviceTypeCUDA)
#146007 commented on
May 1, 2025 • 0 new comments -
Update `torch/nn/modules/conv.py` to use Literal for support padding modes
#152280 commented on
May 1, 2025 • 0 new comments -
AOTI packaged model fails with generic error when run in for loop but succeeds on individual sample
#146524 commented on
May 1, 2025 • 0 new comments -
`nn.CrossEntropyLoss` accepts negative target probabilities
#152437 commented on
May 1, 2025 • 0 new comments -
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
#127176 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cpu (__main__.CpuTests)
#151512 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32 (__main__.TestForeachCUDA)
#150208 commented on
May 1, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128 (__main__.TestForeachCUDA)
#149323 commented on
May 1, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_complex128 (__main__.TestForeachCUDA)
#150933 commented on
May 1, 2025 • 0 new comments -
DTensor slicing on sharded dimension leads to replication
#149447 commented on
May 1, 2025 • 0 new comments -
Status of pip wheels with _GLIBCXX_USE_CXX11_ABI=1
#51039 commented on
May 1, 2025 • 0 new comments -
Set `size` when `is_coalesced` is set in `torch.sparse_coo_tensor()`
#145371 commented on
May 2, 2025 • 0 new comments -
[ROCm] sdpa group query attention bf16 numeric error
#139352 commented on
Apr 30, 2025 • 0 new comments -
Profiler doesn't seem to work on AMD CPUs
#150052 commented on
Apr 30, 2025 • 0 new comments -
MPS: Conv1d fails with NotImplementedError for output_channels > 65536
#152278 commented on
Apr 30, 2025 • 0 new comments -
[ROCm] MI300X FP8 scaled_mm is extremely slow on nightly
#143465 commented on
Apr 30, 2025 • 0 new comments -
torch.compile on MPS progress tracker
#150121 commented on
Apr 30, 2025 • 0 new comments -
NotImplementedError: Could not run 'aten::index.Tensor' with arguments from the 'SparseCUDA' backend.
#152226 commented on
Apr 30, 2025 • 0 new comments -
Training/Fine-tuning fails with PyTorch 2.8 + 4x 5090 GPUs using DDP/FSDP/DeepSpeed
#150734 commented on
Apr 30, 2025 • 0 new comments -
enhance documentation around the developer build
#108406 commented on
Apr 30, 2025 • 0 new comments -
Quantile is limited to 16 million elements and has poor performance.
#64947 commented on
Apr 30, 2025 • 0 new comments -
missing docs for torch.Tag
#126518 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128 (__main__.TestForeachCUDA)
#150141 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on
May 1, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on
May 1, 2025 • 0 new comments -
DISABLED test_comprehensive_nanmean_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140339 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64 (__main__.TestForeachCUDA)
#150161 commented on
May 1, 2025 • 0 new comments -
[Tracker] Nested tensor op coverage requests
#118107 commented on
May 1, 2025 • 0 new comments -
Signature should be extended for `torch.hamming_window()`
#146590 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_special_xlog1py_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140648 commented on
May 2, 2025 • 0 new comments -
DISABLED AotInductorTest.FreeInactiveConstantBufferCuda (build.bin.test_aoti_inference)
#149495 commented on
May 2, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32 (__main__.TestForeachCUDA)
#150350 commented on
May 2, 2025 • 0 new comments -
[inductor] nan_asserts doesn't work for FP8, "RuntimeError: "isinf" not implemented for 'Float8_e4m3fn'"
#149002 commented on
May 2, 2025 • 0 new comments -
Major perf regression with `BatchNorm2d` + `torch.compile` with `reduce-overhead` + DDP
#139207 commented on
May 2, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex64 (__main__.TestForeachCUDA)
#149199 commented on
May 2, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64 (__main__.TestForeachCUDA)
#150392 commented on
May 2, 2025 • 0 new comments -
module.cuda() doesn't work under FakeTensorMode
#148977 commented on
May 2, 2025 • 0 new comments -
Support SDPA flash attention/ memory efficant attn on ROCm gfx908
#141958 commented on
May 2, 2025 • 0 new comments -
[CI] [anaconda] CI Build and Test scripts Windows
#148338 commented on
May 2, 2025 • 0 new comments -
Continuous calls to nn.Linear in fp32 on the 5090D cause severe performance degradation
#150725 commented on
May 2, 2025 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
May 2, 2025 • 0 new comments -
Pytorch DDP across nodes: `self._store = TCPStore(...)  # type: ignore[call-arg]` raises `RuntimeError: Stop_waiting response is expected`
#114357 commented on
May 2, 2025 • 0 new comments -
[CUDA][Compex] `test_reference_numerics_large_jiterator_unary_cuda_complex64` broken after updating to `numpy >= 1.25.0`
#125198 commented on
May 2, 2025 • 0 new comments -
fully_shard() for huggingface 72B model: pytorch caches too much GPU memory
#151936 commented on
May 2, 2025 • 0 new comments -
Dynamo unsupported: dynamic padding
#123855 commented on
May 1, 2025 • 0 new comments -
When using torch to convert to an ONNX model, testing the inference results with actual images shows a tensor mismatch
#152097 commented on
May 1, 2025 • 0 new comments -
dynamo cannot trace global op_set .__contains__
#145761 commented on
May 1, 2025 • 0 new comments -
[RFC] : Dynamically Quantized 8-bit Matrix Multiplication support
#149500 commented on
May 1, 2025 • 0 new comments -
LoadHIP.cmake should find_package(composable_kernel)
#149809 commented on
May 1, 2025 • 0 new comments -
`view()` + modify-in-place fails silently with DTensor
#147570 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on
May 1, 2025 • 0 new comments -
Context Parallel -- unsharded output doesn't match output without CP.
#152261 commented on
May 1, 2025 • 0 new comments -
[Feature request] Exclusive prefix sum, `torch.cumsum(input, dim=0, exclusive=True)`
#76191 commented on
May 1, 2025 • 0 new comments -
The state of sparse Tensors
#9674 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64 (__main__.TestForeachCUDA)
#150298 commented on
May 2, 2025 • 0 new comments -
Add description of several params in the basic usage of `torch.min()`, `torch.max()`, `torch.all()` and `torch.any()`
#152176 commented on
May 2, 2025 • 0 new comments -
Raise an Error when File Not Found in `torch.jit.load()`
#152178 commented on
May 2, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cuda (__main__.GPUTests)
#151541 commented on
May 2, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cpu (__main__.CpuTests)
#151540 commented on
May 2, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16 (__main__.TestForeachCUDA)
#150309 commented on
May 2, 2025 • 0 new comments -
[NCCL] Unordered destruction of `ProcessGroupNCCL` no longer supported
#137507 commented on
May 2, 2025 • 0 new comments -
[inductor] enable bf32 test for mkldnn conv
#127293 commented on
May 7, 2025 • 0 new comments -
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on
May 7, 2025 • 0 new comments -
[1/N] Update CI jobs to use CMake >= 3.25
#130522 commented on
May 6, 2025 • 0 new comments -
Make IPC features extendable on third-party devices
#133222 commented on
May 6, 2025 • 0 new comments -
add ranking for grouped benchmarks
#133287 commented on
May 2, 2025 • 0 new comments -
Add back DistributedDataParallel types that were lost when pyi was removed
#136835 commented on
May 6, 2025 • 0 new comments -
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on
May 6, 2025 • 0 new comments -
[POC][FX][pytree] cleanup fx pytree implementation
#138202 commented on
May 2, 2025 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
May 5, 2025 • 0 new comments -
[WIP] Add DeviceAllocator as the base device allocator
#138222 commented on
May 6, 2025 • 0 new comments -
Always produce XML
#138513 commented on
May 6, 2025 • 0 new comments -
[cuDNN] Add an option to force cuDNN usage (incl. SDPA)
#139699 commented on
May 5, 2025 • 0 new comments -
Fix warnings and simplify code in TensorShape
#141971 commented on
May 4, 2025 • 0 new comments -
Fix platform detection in MKLDNN CMake file
#142067 commented on
May 2, 2025 • 0 new comments -
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on
May 7, 2025 • 0 new comments -
Replacing explicit backend search with api call
#144944 commented on
May 5, 2025 • 0 new comments -
Wrong formula for CosineAnnealingLR
#152081 commented on
May 6, 2025 • 0 new comments -
`Aborted (core dumped)` in `torch.cuda.nccl.reduce`
#150836 commented on
May 7, 2025 • 0 new comments -
[RFC] zentorch Integration
#150296 commented on
May 7, 2025 • 0 new comments -
DISABLED test_comprehensive_signal_windows_general_cosine_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139682 commented on
May 7, 2025 • 0 new comments -
[inductor] [silent incorrectness] `torch.nn.PairwiseDistance(p=2)` outputs incorrect results with eager
#151198 commented on
May 7, 2025 • 0 new comments -
PyTorch VS2022 official build: Windows binary hits illegal instruction on AVX2 (max ISA level) CPU
#145702 commented on
May 7, 2025 • 0 new comments -
DISABLED test_foreach_check_stride_ignore_dims_of_one_cuda_float32 (__main__.TestForeachCUDA)
#150026 commented on
May 7, 2025 • 0 new comments -
`setup.py develop` command is disappearing soon from `setuptools`
#152276 commented on
May 7, 2025 • 0 new comments -
ImportError: dlopen: cannot load any more object with static TLS
#2575 commented on
May 7, 2025 • 0 new comments -
[ATen][Sparse] Use Third-Party Eigen for sparse addmm
#101814 commented on
May 5, 2025 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
May 2, 2025 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
May 5, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
May 7, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#125806 commented on
May 7, 2025 • 0 new comments -
refine fp32 precision api
#125888 commented on
May 7, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on
May 7, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on
May 7, 2025 • 0 new comments -
[test] fix unit test
#144977 commented on
May 2, 2025 • 0 new comments -
removed check for ConvTranspose3D on MPS
#145366 commented on
May 7, 2025 • 0 new comments -
Open up PT UTs to cover additional devices
#145589 commented on
May 6, 2025 • 0 new comments -
[micro_pipeline_tp] add logging for all-gather-matmul fusion
#145594 commented on
May 5, 2025 • 0 new comments -
[micro_pipeline_tp] support pattern matching row-wise scaled_mm with sharded scale
#145595 commented on
May 5, 2025 • 0 new comments -
[c10d] implement ReduceOp.unbox()
#145652 commented on
May 5, 2025 • 0 new comments -
Avoid data-dependent errors by runtime assert substitution.
#145681 commented on
May 2, 2025 • 0 new comments -
[Easy] update pip sources for ROCm in nightly pull tool
#145685 commented on
May 1, 2025 • 0 new comments -
[Async-TP] Port _fused_all_gather_matmul_native to cpp to reduce launching overhead
#145794 commented on
May 5, 2025 • 0 new comments -
[AsyncMM] preliminary tuning
#145795 commented on
May 5, 2025 • 0 new comments -
[Async-TP] _pipelined_multi_all_gather_and_consume reduce overhead
#145796 commented on
May 5, 2025 • 0 new comments -
[Async-TP] improve algo selection
#145797 commented on
May 5, 2025 • 0 new comments -
[will-not-merge] tuning
#145798 commented on
May 5, 2025 • 0 new comments -
Replace distutils.version with copied looseversion
#145819 commented on
May 6, 2025 • 0 new comments -
[CUDAEvent.h] support external cuda events in cudagraphs
#146145 commented on
May 5, 2025 • 0 new comments -
[CI] Get rid of UCC builds
#146173 commented on
May 2, 2025 • 0 new comments -
Add where_ ops
#143636 commented on
May 6, 2025 • 0 new comments -
Defaults to C++20 in CMake torch targets
#143959 commented on
May 7, 2025 • 0 new comments -
[Intel GPU] add tf32 support for matmul on XPU
#144240 commented on
May 7, 2025 • 0 new comments -
codecache: Remove cpp_prefix.h duplication per build, then precompile it
#144293 commented on
May 2, 2025 • 0 new comments -
[pytree][1/N] change pytree usages to implementation agnostic: `torch.distributed`
#144332 commented on
May 2, 2025 • 0 new comments -
[TorchInductor] Add ALiBi (Attention with Linear Biases) Fused Attention Pattern
#144338 commented on
May 5, 2025 • 0 new comments -
[BE][pytree][Easy] change imports `torch.utils._pytree` -> `torch.utils.pytree.python`
#144405 commented on
May 2, 2025 • 0 new comments -
Remove the `_stacklevel` arg from `log_softmax`, `softmax` and `softmin`
#144451 commented on
May 6, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `{torch,test}/{nn,optim}/**` to `ruff format`
#144548 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/_[a-h]*/` to `ruff format`
#144551 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[e-n]*/` to `ruff format`
#144553 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[a-c]*/` to `ruff format`
#144554 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on
May 1, 2025 • 0 new comments -
[Intel CPU] Fix issue #143482.
#144760 commented on
May 2, 2025 • 0 new comments -
[Intel CPU] Fix issue #143483.
#144854 commented on
May 2, 2025 • 0 new comments -
[export] check non-negative modulus, avoid unnecessary congruences, in export solver
#144925 commented on
May 3, 2025 • 0 new comments -
DISABLED test_comprehensive_rot90_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140773 commented on
May 6, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_float64 (__main__.TestForeachCUDA)
#151019 commented on
May 6, 2025 • 0 new comments -
DISABLED test_cublas_addmm_size_1000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA)
#151834 commented on
May 6, 2025 • 0 new comments -
Semi-Structured Sparsity unsupported for Windows
#125302 commented on
May 6, 2025 • 0 new comments -
Add switch to disable truncation to long list print
#152427 commented on
May 6, 2025 • 0 new comments -
DISABLED test_per_sample_api_compute_batch_size_not_pytreeable_cpu (__main__.TestExpandedWeightModuleCPU)
#146972 commented on
May 6, 2025 • 0 new comments -
DISABLED test_cublas_addmm_size_1000_cuda_float16 (__main__.TestMatmulCudaCUDA)
#151862 commented on
May 6, 2025 • 0 new comments -
Segmentation error for torch==2.2.1 on MacOs
#121101 commented on
May 6, 2025 • 0 new comments -
Expand Tag Set: views & reductions
#129020 commented on
May 6, 2025 • 0 new comments -
at::BlasBackend::Ck does not handle all ROCm BLAS gpus
#150187 commented on
May 6, 2025 • 0 new comments -
compile generates inefficient code for mutations on small slices of inputs
#152346 commented on
May 6, 2025 • 0 new comments -
DeepSeek: mixed precision optimizers (BF16AdamW)
#146542 commented on
May 6, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_single (__main__.CompileTest)
#147707 commented on
May 6, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#151054 commented on
May 6, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_uint8 (__main__.TestForeachCUDA)
#149858 commented on
May 6, 2025 • 0 new comments -
[t.compile][Functools] Cache decorator support for dynamo
#146598 commented on
May 6, 2025 • 0 new comments -
Attributeless FakeRootModule
#135696 commented on
May 5, 2025 • 0 new comments -
torch.onnx.export causes floating point exception with core dump for empty slice assignment
#110056 commented on
May 5, 2025 • 0 new comments -
torch.export with dynamic shapes on Static Cache HF LLama model fails
#152465 commented on
May 5, 2025 • 0 new comments -
torch.matrix_exp gets stuck on GPU
#149335 commented on
May 6, 2025 • 0 new comments -
welfordreduce slows down forward layernorm in a bunch of cases
#120184 commented on
May 6, 2025 • 0 new comments -
dist.barrier() hangs after calling async_save
#123447 commented on
May 6, 2025 • 0 new comments -
[inductor] `proxy_tensor.py` throws `SyntaxError` when using `.random_`
#151432 commented on
May 6, 2025 • 0 new comments -
[inductor] [silence] `nn.ConvTranspose2d-F.dropout` outputs inconsistent results with eager
#148061 commented on
May 6, 2025 • 0 new comments -
[inductor] [cuda] [fake tensor] `torch.triu_indices` throws `pointer argument` error when using `[0, 0]`
#151737 commented on
May 6, 2025 • 0 new comments -
Device Error on vmap
#151591 commented on
May 6, 2025 • 0 new comments -
Support Delay Loading of c10.dll when using libtorch as a third-party library.
#105058 commented on
May 6, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_float32 (__main__.TestForeachCUDA)
#151003 commented on
May 6, 2025 • 0 new comments -
[dynamo] Dynamo fails to run torch.cat() with FakeTensors because it can't confirm 's0 + s1*u0' is nonzero
#152473 commented on
May 6, 2025 • 0 new comments -
[DTensor] Calling .item() on DTensor with Partial placement results in local value
#152406 commented on
May 6, 2025 • 0 new comments -
Simplification of pruned models
#58846 commented on
May 6, 2025 • 0 new comments -
torch.compile fails in FSDP due to .data assignment with different floating type
#152162 commented on
May 6, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
May 6, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_complex128 (__main__.TestForeachCUDA)
#151093 commented on
May 6, 2025 • 0 new comments -
inductor `full_like` decompositions give incorrect strides
#144699 commented on
May 6, 2025 • 0 new comments -
[CI] No workflows scheduled on PRs
#151322 commented on
May 6, 2025 • 0 new comments -
NCCL out of memory error after updating to PyTorch 2.7
#152302 commented on
May 6, 2025 • 0 new comments -
[ued] HF diffusers pipeline `enable_cpu_offload` errors or graph breaks with a `torch.compile`-ed transformer
#150711 commented on
May 6, 2025 • 0 new comments -
Question about the support of torch.compile for a custom CUDA operator?
#152270 commented on
May 7, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_complex64 (__main__.TestForeachCUDA)
#151099 commented on
May 7, 2025 • 0 new comments -
AttributeError: type object 'torch._C._distributed_c10d.BackendType' has no attribute 'XCCL'.
#147059 commented on
May 7, 2025 • 0 new comments -
Windows inductor generated code without function declarations, and compilation failed on MSVC.
#152251 commented on
May 7, 2025 • 0 new comments -
Device assert throws a runtime error in cuda backend and results in a crash in xpu backend
#142135 commented on
May 7, 2025 • 0 new comments -
`context_parallel` fails for training with `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`
#149306 commented on
May 7, 2025 • 0 new comments -
Request to cherrypick a fix into v1.13.1 (v1.8 has a CVE)
#98115 commented on
May 7, 2025 • 0 new comments -
`RuntimeError: UR error` with XPU
#149953 commented on
May 7, 2025 • 0 new comments -
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 commented on
May 7, 2025 • 0 new comments -
DISABLED test_cublas_and_lt_reduced_precision_fp16_accumulate_cuda (__main__.TestMatmulCudaCUDA)
#151890 commented on
May 7, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float16 (__main__.TestForeachCUDA)
#151114 commented on
May 7, 2025 • 0 new comments -
[Manylinux 2.28] Migrate Docker container to use gcc 13, CUDA 12.6 and gcc14 CUDA 12.8
#152426 commented on
May 6, 2025 • 0 new comments -
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on
May 6, 2025 • 0 new comments -
[dynamo] torch._dynamo crashes on `self.value.__module__` inside SkipFunctionVariable.call_function() (PyTorch 2.7, works 2.6)
#152316 commented on
May 6, 2025 • 0 new comments -
`torch.compile()` produces incorrect results for `asinh_()` operation on large/small values
#152299 commented on
May 6, 2025 • 0 new comments -
Unusually slow draft_export time
#152337 commented on
May 6, 2025 • 0 new comments -
Silent incorrectness between static torch.compile vs eager
#152425 commented on
May 6, 2025 • 0 new comments -
Softmax Decomp Causes Incorrect Gradients when Using `torch.compile` with `F.multi_head_attention_forward`
#152309 commented on
May 6, 2025 • 0 new comments -
RMS norm causes NaNs when used with torch.compile + float8 with rowwise scales
#150859 commented on
May 6, 2025 • 0 new comments -
aot_eager produces wrong output with all_gather_tensor_autograd
#148701 commented on
May 6, 2025 • 0 new comments -
Significant precision error from torch.compile
#145213 commented on
May 6, 2025 • 0 new comments -
Composition of torch.compile and torch.func.grad silently produces a wrong result.
#136662 commented on
May 6, 2025 • 0 new comments -
Fx Graph cache hit generates guards that do not exist in the original cached program, causing recompilations only on cache hit.
#152435 commented on
May 6, 2025 • 0 new comments -
Invalid handling of nans in compiled torch.quantile / torch.nanquantile on cuda
#152423 commented on
May 6, 2025 • 0 new comments -
OptimizedModule __getattr__ may cause dead recursive call loop
#138157 commented on
May 6, 2025 • 0 new comments -
Graph break on .t() when Tensor._make_subclass
#151771 commented on
May 6, 2025 • 0 new comments -
torch.compile on MPS fails: generated Metal kernel uses loop-local variable out of scope
#152155 commented on
May 6, 2025 • 0 new comments -
MPS error on Sequoia 15.3: 'NDArray dimension length > INT_MAX'
#146769 commented on
May 6, 2025 • 0 new comments