Insights: pytorch/pytorch
Overview
2 Pull requests merged by 2 people
-
[dynamo][super variable] Fix bug to use correct source
#152774 merged
May 6, 2025 -
[cudagraphs] Fix issue in collecting static_input_idxs
#152768 merged
May 6, 2025
206 Pull requests opened by 117 people
-
[CI] Use cmake from pip instead of conda in CI docker images
#152537 opened
Apr 30, 2025 -
Use swap_tensors path in nn.Module.to for all subclasses that override __torch_dispatch__
#152539 opened
Apr 30, 2025 -
[CUDA] Reset peak memory stats before running `test_set_per_process_memory_fraction`
#152540 opened
Apr 30, 2025 -
strict multidimensional slicing
#152543 opened
Apr 30, 2025 -
ci: Switch benchmark dependency to use pip
#152545 opened
Apr 30, 2025 -
Remove Conda Instructions
#152546 opened
Apr 30, 2025 -
Implemented `Size.__radd__`
#152554 opened
Apr 30, 2025 -
[BE] Update numba versions
#152557 opened
Apr 30, 2025 -
xpu: rely on sycl/sycl.hpp to include bfloat16.hpp
#152562 opened
Apr 30, 2025 -
[c10d][fr] Make FR vendor neutral so that other backends can use it
#152563 opened
Apr 30, 2025 -
[ROCm] Update spack includes
#152569 opened
Apr 30, 2025 -
[Hierarchical Compile] Replace tracing alias and mutation check with dynamo impl
#152570 opened
Apr 30, 2025 -
[Dynamo] Fix typing in graph_deduplication.py
#152572 opened
May 1, 2025 -
Allow decomposeK to fuse
#152573 opened
May 1, 2025 -
[2/N] Use std::filesystem
#152586 opened
May 1, 2025 -
[WIP] suggest whitelist for dynamic shape recompilations
#152588 opened
May 1, 2025 -
[Dynamo] Optimize dedupe region ancestor tracking
#152589 opened
May 1, 2025 -
Fix #152280: add Literal[…] PaddingMode to Conv modules
#152590 opened
May 1, 2025 -
Fix: promote scalar to MPS device in exec_binary_kernel
#152591 opened
May 1, 2025 -
[c10d] Add support for ReduceOp::AVG in ProcessGroupMPI for FSDP2
#152594 opened
May 1, 2025 -
[not for review] benchmark script
#152596 opened
May 1, 2025 -
[multigraph] add backend_specialization kwarg to mark_dynamic
#152597 opened
May 1, 2025 -
[multigraph] use backend specializations in compile_and_call_fx_graph
#152601 opened
May 1, 2025 -
[BE] Delete `Module_CUDA_fix`
#152603 opened
May 1, 2025 -
[Testing] Is FindCUDA.cmake from `Modules_CUDA_fix` called at all?
#152604 opened
May 1, 2025 -
[Environment Variable] Use thread-safe getenv functions
#152609 opened
May 1, 2025 -
Update padding_mode type annotation to use Literal type (PaddingMode)
#152610 opened
May 1, 2025 -
Makefile: refactor build, setup and lint rules
#152611 opened
May 1, 2025 -
Revert "Cleanup VS 2019 refs in pytorch (#145863)"
#152613 opened
May 1, 2025 -
[WIP] Make FR vendor generic and try to enable it for gloo
#152614 opened
May 1, 2025 -
Stop proxy-ing autograd.Function.ctx into the graph
#152621 opened
May 1, 2025 -
Parameterized CUDA Graph Launch
#152622 opened
May 1, 2025 -
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 opened
May 1, 2025 -
[ROCm] Initial AITER Integration for mha_bwd asm kernels
#152630 opened
May 1, 2025 -
[ca] wrap flex attention tests with compiled autograd
#152633 opened
May 1, 2025 -
[CUTLASS][WIP] Gate rowwise matmul CUTLASS kernels by compute capability
#152642 opened
May 1, 2025 -
[BE]remove vulkan test
#152643 opened
May 1, 2025 -
[do-not-land][ca] default on for CI
#152646 opened
May 1, 2025 -
Add assert_fp8_close helper for FP8 tensor comparisons
#152651 opened
May 2, 2025 -
Refactor some common autotune-related utils into a new file
#152652 opened
May 2, 2025 -
cleanup, refactor and add missing self._dde_suppressed checks
#152657 opened
May 2, 2025 -
Fix the basic description of torch.min(), torch.max(), torch.all(), torch.any()
#152658 opened
May 2, 2025 -
Fix evaluate_expr to include suppress_guards_tls in cache key
#152661 opened
May 2, 2025 -
Re-enable FakeTensor caching for SymInts
#152662 opened
May 2, 2025 -
Raise error when no record on extra_files
#152664 opened
May 2, 2025 -
MXFP8 Fix broken bias support for mxfp8
#152665 opened
May 2, 2025 -
Added documentation for nonzero_static function (#152347)
#152669 opened
May 2, 2025 -
[export] Dynamo symint support
#152677 opened
May 2, 2025 -
Fix signature of torch.sparse_coo_tensor()
#152681 opened
May 2, 2025 -
Update the signature and test of torch.hamming_window()
#152682 opened
May 2, 2025 -
[ca][dtensor] run real PG dtensor tests under CA
#152689 opened
May 2, 2025 -
set CUDA_MODULE_LOADING for older drivers only
#152695 opened
May 2, 2025 -
Removing conda references from PyTorch Docs
#152702 opened
May 2, 2025 -
[Memento] Add PT2 to Memory Snapshot
#152707 opened
May 2, 2025 -
Scheduler Flops refactor
#152708 opened
May 2, 2025 -
remove conda from devcontainer
#152713 opened
May 2, 2025 -
[caffe2] Make c10::str works with scoped enum (#152705)
#152714 opened
May 2, 2025 -
[BE][CI] Merge regular and MPS test config shards
#152719 opened
May 2, 2025 -
try something
#152722 opened
May 2, 2025 -
[Dynamo] Guard serialization for NN_MODULE
#152725 opened
May 2, 2025 -
[Dynamo] Guard serialization for FUNCTION_MATCH
#152727 opened
May 2, 2025 -
[Dynamo] Guard serialization for CLOSURE_MATCH
#152728 opened
May 2, 2025 -
[Dynamo] Guard serialization for BUILTIN_MATCH
#152729 opened
May 2, 2025 -
[Dynamo] Guard serialization for SEQUENCE_LENGTH
#152730 opened
May 2, 2025 -
[Cutlass] Handle broadcasting in EVT python codegen
#152733 opened
May 2, 2025 -
docs: fix dead link in torch.compile docs
#152734 opened
May 2, 2025 -
[ca][ddp] loud error instead of silent incorrectness under C++ Reducer
#152735 opened
May 2, 2025 -
[BE][Cleanup][Dynamo] Stop logging entire_frame_compile_time_s
#152738 opened
May 2, 2025 -
[export] add serialized_artifact test
#152739 opened
May 2, 2025 -
[export][cond] support merging constant ints as unbacked symint
#152742 opened
May 2, 2025 -
[CUDA][cuDNN] Fix handling of `CPU` side input and target length tensors in `CTCLoss`
#152745 opened
May 3, 2025 -
Conditionally support experimental filesystem include in jit_opt_limit
#152748 opened
May 3, 2025 -
Handle fewer functions than the number of segments
#152753 opened
May 3, 2025 -
Allow ATen ops overloading
#152759 opened
May 3, 2025 -
[Easy][BE] update recommended VS Code settings
#152760 opened
May 3, 2025 -
added short integer for repeat_interleave_cpu, Fixes #151311
#152762 opened
May 3, 2025 -
[aoti] Add grid_sampler_3d to cshim
#152771 opened
May 4, 2025 -
[Inductor] Pattern matcher support for mutable ops with non-view inputs
#152775 opened
May 4, 2025 -
[WIP] Pattern matcher support for mutable ops with view inputs
#152776 opened
May 4, 2025 -
[BE]: Update cudnn to 9.9 for cu128
#152782 opened
May 4, 2025 -
Fix negative dim issue in the parallel loss context manager
#152785 opened
May 4, 2025 -
Update CMakeLists.txt
#152786 opened
May 4, 2025 -
Implement DeviceType.h as header-only
#152787 opened
May 4, 2025 -
Fixed rerr computation in lobpcg
#152789 opened
May 4, 2025 -
same test for guard_or_false 1
#152802 opened
May 5, 2025 -
same test for guard_or_false 2
#152803 opened
May 5, 2025 -
[invoke_subgraph] Force the output stride to be same as eager
#152806 opened
May 5, 2025 -
wip
#152807 opened
May 5, 2025 -
another try
#152808 opened
May 5, 2025 -
Upgrade to NCCL 2.26.5 for CUDA 12
#152810 opened
May 5, 2025 -
[Quant][X86] add an op to compute uint8 batch norm 2d
#152811 opened
May 5, 2025 -
[TEST][ATen][CUDA] Skip row-wise scaled matrix multiplication tests on sm_120+
#152814 opened
May 5, 2025 -
[Cutlass] E2E Tests for EVT
#152815 opened
May 5, 2025 -
[TEST][Quantization] Skip test_learnable due to hypothesis
#152819 opened
May 5, 2025 -
[DO NOT MERGE] update build tools version
#152820 opened
May 5, 2025 -
[Easy][Inductor] Adds safety checks in get_estimated_runtime
#152821 opened
May 5, 2025 -
Use gcc13 in Manylinux 2.28 images
#152825 opened
May 5, 2025 -
[MSVC] Enable updated lambda processor by setting compiler flag /Zc:lambda globally
#152828 opened
May 5, 2025 -
[BE]: Improve aten formatter with fmtlib
#152830 opened
May 5, 2025 -
Allow to set custom PYTHONPATH for torch.inductor
#152832 opened
May 5, 2025 -
[c10d] Fix extra CUDA context created by barrier
#152834 opened
May 5, 2025 -
[DRAFT] Test nccl
#152835 opened
May 5, 2025 -
[nativert] Move MPMCQueue to torch/nativert.
#152837 opened
May 5, 2025 -
[precompile] Add BundledAOTAutogradCacheEntry
#152840 opened
May 5, 2025 -
Add memory reporting for XPU to Memory Profiler
#152842 opened
May 5, 2025 -
ci: Remove conda-env-macOS-ARM64, prefer pip
#152843 opened
May 5, 2025 -
Fix HF loading when there's no metadata file to work with fsspec
#152856 opened
May 5, 2025 -
Add torch._C.Tag.needs_contiguous_strides
#152859 opened
May 5, 2025 -
[Graph Partition] remove weak dep from `partition_input_names`
#152863 opened
May 5, 2025 -
[Dynamo] Guard serialization for TUPLE_ITERATOR_LEN
#152865 opened
May 5, 2025 -
[Dynamo] Guard serialization for RANGE_ITERATOR_MATCH
#152872 opened
May 5, 2025 -
Clarify wrap_triton doc about optional triton_op usage
#152874 opened
May 5, 2025 -
[Graph Partition][Flex Attention] analyze symints from subgraph inputs and outputs
#152878 opened
May 5, 2025 -
xpu: support custom ops with torch.library on xpu backend
#152879 opened
May 5, 2025 -
[dynamo] Fix bug in hasattr(tensor, "size")
#152883 opened
May 6, 2025 -
[SDPA] Add testing to ensure stride order exactly matches
#152894 opened
May 6, 2025 -
Move link check jobs to pull to go with doc build
#152896 opened
May 6, 2025 -
[Inductor] Set correct baseline for decomposek test
#152897 opened
May 6, 2025 -
Remove `property` from python_type function
#152900 opened
May 6, 2025 -
[Set] Add set.symmetric_difference(_update)
#152901 opened
May 6, 2025 -
[Set] Add `set.issubset` and `set.issuperset`
#152902 opened
May 6, 2025 -
[Set] Raise `KeyError` if elem not contained in the set
#152903 opened
May 6, 2025 -
[Set] Raise TypeError if number of arguments mismatch
#152904 opened
May 6, 2025 -
[Set] Add `set.difference(_update)`
#152905 opened
May 6, 2025 -
[Set] Add `set.intersection(_update)`
#152906 opened
May 6, 2025 -
[Set] Raise KeyError on empty `set.pop()`
#152907 opened
May 6, 2025 -
[Set] Add correct set/frozenset __init__ behavior
#152908 opened
May 6, 2025 -
Add overall tensor similarity comparison (#152647)
#152920 opened
May 6, 2025 -
Upgrade to cuda 12.8.1 for docker builds
#152923 opened
May 6, 2025 -
include user stacks with constraint violation error message
#152924 opened
May 6, 2025 -
[WIP] Add unified memory APIs for torch.accelerator
#152932 opened
May 6, 2025 -
Allow Inductor backends to attest their own availability
#152933 opened
May 6, 2025 -
[Dynamo] Allow inlining into AO quantization modules
#152934 opened
May 6, 2025 -
Fix doc cosineannealinglr 152081
#152936 opened
May 6, 2025 -
[feature] Channel Wise Parallel API for Conv layers
#152937 opened
May 6, 2025 -
[Pipelining] Fix _batch_p2p bug for non-NCCL backends (#132644)
#152938 opened
May 6, 2025 -
get right function declaration on windows inductor
#152939 opened
May 6, 2025 -
[Don't merge] Debug
#152940 opened
May 6, 2025 -
Clean up of CUTLASS_VERSION
#152947 opened
May 6, 2025 -
[Linter] Add linter to detect device-bias hard code in test cases.
#152948 opened
May 6, 2025 -
[dtensor] add privateuse1 SDPA op support to DTensor
#152949 opened
May 6, 2025 -
Add NestedTensorHPU to to_padded_tensor in native_functions.yaml
#152950 opened
May 6, 2025 -
[ROCm] Ck gemm architecture guard
#152951 opened
May 6, 2025 -
[nativert] Move Placement to pytorch core
#152953 opened
May 6, 2025 -
[ROCm] unskip test_non_standard_bool except for failing ops
#152956 opened
May 6, 2025 -
Follow up to #152209, remove compat patch
#152958 opened
May 6, 2025 -
docs: Improve documentation for NCCL timeout / watchdog variables
#152959 opened
May 6, 2025 -
Change aoti cpp tests to run serially within file
#152960 opened
May 6, 2025 -
[Dynamo] Remove unused guard PYMODULE_MATCH
#152961 opened
May 6, 2025 -
WIP so many changes to generate non-as strided view
#152965 opened
May 6, 2025 -
[Memento] On-demand mode using without torch api
#152966 opened
May 6, 2025 -
[ATen][CUDA] Optimize 128 bit vectorization
#152967 opened
May 6, 2025 -
[inductor] Generate synthetic offsets appropriately for autotuning _scaled_grouped_mm
#152968 opened
May 6, 2025 -
[nativert] Move GraphSignature to pytorch core
#152969 opened
May 6, 2025 -
Adding XPU support to DTensor examples.
#152973 opened
May 6, 2025 -
[hop_schema] add HopSchemaGenerator to make it easier to create hop schema
#152974 opened
May 6, 2025 -
[dtensor] Extend Partial partition of replicated tensor for min/max reduce
#152975 opened
May 6, 2025 -
[MegaCache] Make MegaCache generic to allow external plugins registration
#152977 opened
May 6, 2025 -
[Pytorch] Add `torch.cuda.streams.Event` to save torch functions list
#152978 opened
May 6, 2025 -
Fix `'TensorBox' object has no attribute 'is_input_buffer'`
#152980 opened
May 6, 2025 -
Catch TypeError from ValueRanges
#152981 opened
May 6, 2025 -
[torch][ao] Properly strip tracking stats in _fold_conv_bn_qat for 1D
#152982 opened
May 6, 2025 -
compile_fx: make a compile event that corresponds to the fx_compile waitcounter
#152983 opened
May 6, 2025 -
[hop_schema] support gen_schema for invoke_subgraph
#152984 opened
May 6, 2025 -
[WIP] Add XPU support for FlightRecorder
#152986 opened
May 6, 2025 -
[Set] Handle exception in ConstantVariable operation
#152987 opened
May 6, 2025 -
[Set] Raise `TypeError` if argument is unhashable
#152988 opened
May 6, 2025 -
[Set] Update `set.union` and `set.update` to support *args
#152989 opened
May 6, 2025 -
[Set] Raise TypeError if set is called with the wrong number of arguments
#152990 opened
May 6, 2025 -
[FrozenSet] Fixes for FrozenSet
#152991 opened
May 6, 2025 -
[inductor] Fix ModularIndexing assumptions
#152993 opened
May 6, 2025 -
[inductor] dtype promotion error in cat decomp
#152995 opened
May 6, 2025 -
[export] Unflatten None
#153000 opened
May 6, 2025 -
[cutlass backend][test] re-enable test_cuda_compile_command for fbcode
#153001 opened
May 6, 2025 -
[CI] Use sccache installed in docker image in xla build
#153002 opened
May 6, 2025 -
[cutlass backend] Skip cuda lib path if it is torch/lib
#153003 opened
May 6, 2025 -
[WIP][Inductor-CPU] int8 WoQ concat linear
#153004 opened
May 6, 2025 -
[autograd][docs] Add more details on why save_for_backward is important in extending autograd note
#153005 opened
May 6, 2025 -
[cutlass backend] Use src code to generate cutlass gemm name
#153006 opened
May 6, 2025 -
Remove redundant type aliases of _device_t for torch.Device (#152952)
#153007 opened
May 7, 2025 -
Detect NVSHMEM location
#153010 opened
May 7, 2025 -
[WIP][dynamic shapes] unbacked safer cat, repeat
#153011 opened
May 7, 2025 -
c10d/gloo: add ibverbs backend
#153015 opened
May 7, 2025 -
Add a project section to pyproject.toml, making uv sync work
#153020 opened
May 7, 2025 -
Adding a generic attribute for easier checkpoint discrepancy debugging.
#153021 opened
May 7, 2025 -
[Typing] Apply `torch.types.Device` in `torch/cuda/memory.py`
#153027 opened
May 7, 2025 -
[Typing] Improve device typing for `torch.set_default_device()`
#153028 opened
May 7, 2025 -
WIP: Fix caching when output has unbacked
#153034 opened
May 7, 2025 -
Allow zero sized dimensions in padding operations
#153037 opened
May 7, 2025 -
Add CUDA support for Adagrad(fused=True)
#153038 opened
May 7, 2025 -
[Dynamo] Replace `unimplemented` with `unimplemented_v2` in `torch/_dynamo/variables/misc.py` [2/2]
#153039 opened
May 7, 2025 -
[AOTInductor] Generate kernels separately for const graph and main graph
#153040 opened
May 7, 2025 -
🌠 Add Muon optimizer
#153048 opened
May 7, 2025 -
Update docs of saved_tensors_hooks to avoid ref cycle
#153049 opened
May 7, 2025 -
[Intel GPU] empty-size tensor case handling in addmm, baddmm
#153051 opened
May 7, 2025 -
[BE]: Use undocumented temp shim to restore setuptools compat
#153052 opened
May 7, 2025 -
[BE]: Blacklist broken setuptools until we upgrade MSVC API
#153053 opened
May 7, 2025 -
[HOP] Reworked HOPs to use FunctionalizeCtxWrapper
#153054 opened
May 7, 2025 -
[BE]: Add PEP621 project section to pyproject.toml
#153055 opened
May 7, 2025 -
[BE] Update ruamel to 0.18.10
#153057 opened
May 7, 2025 -
Fix misleadingly high AOT Inductor dashboard performance
#153060 opened
May 7, 2025 -
Keep raw cubin file around in case it gets deleted underneath us
#153064 opened
May 7, 2025 -
[ONNX] dynamic_shapes uses DYNAMIC
#153065 opened
May 7, 2025 -
fix bug with TORCHINDUCTOR_DUMP_LAUNCH_PARAMS
#153066 opened
May 7, 2025
96 Issues closed by 42 people
-
manylinux_2_28 support
#114232 closed
May 7, 2025 -
[graph pickler] [inductor compile async] imprecise filter for non standard op?
#151904 closed
May 7, 2025 -
[inductor] cudagraph error for individually compiled transformer blocks
#152887 closed
May 7, 2025 -
DISABLED test_sdpa_compile_cuda_bfloat16 (__main__.TestNestedTensorSubclassCUDA)
#119903 closed
May 7, 2025 -
DISABLED test_host_memory_stats (__main__.TestCuda)
#148607 closed
May 7, 2025 -
[XPU] test_tensordot_out_kernel_errors_with_autograd_xpu_float32 UT failure
#152090 closed
May 7, 2025 -
Failed to visualize 1D DTensor
#152848 closed
May 7, 2025 -
[RFC] A device-agnostic Python runtime API design for stream-based accelerators
#128403 closed
May 7, 2025 -
DISABLED test_comprehensive_scatter_xpu_bool (__main__.TestInductorOpInfoXPU)
#153018 closed
May 7, 2025 -
DISABLED test_comprehensive_scatter_xpu_int64 (__main__.TestInductorOpInfoXPU)
#153017 closed
May 7, 2025 -
Add metal-flash-attention for MPS backend
#139668 closed
May 7, 2025 -
[MPS] Binary kernels produce incorrect results when one of the tensor arguments is from a wrapped scalar
#152582 closed
May 7, 2025 -
[ONNX] Create a message to suggest users setting dynamo=True when exporting
#152025 closed
May 6, 2025 -
[dynamo] register_module_forward_pre_hook lead to compiled model produce wrong inference results
#149502 closed
May 6, 2025 -
[CI] [anaconda] Review Devcontainer anaconda usage
#148341 closed
May 6, 2025 -
addmv bfloat16 accuracy issues on cpu
#147860 closed
May 6, 2025 -
Optimize printing sympy expressions during logging and cache key computation
#151823 closed
May 6, 2025 -
UNSTABLE pull / linux-docs / build-docs-functorch-false
#152955 closed
May 6, 2025 -
Parameters between models don't copy in the C++ PyTorch Frontend under Windows
#114485 closed
May 6, 2025 -
Unexpected result from `torch.xpu.is_bf16_supported()` when XPU is unavailable
#152301 closed
May 6, 2025 -
DISABLED test_dynamo_timed (__main__.TestDynamoTimed)
#148093 closed
May 6, 2025 -
Running `LazyModuleMixin` example throw errors
#150404 closed
May 6, 2025 -
can't build torch on WSL
#152763 closed
May 6, 2025 -
[AOTI] Package lowered with package_constants_in_so=False still uses lots of memory when loaded
#152356 closed
May 6, 2025 -
Throwing more specific errors for CrossEntropyLoss weights being on a different device than the input/target
#122757 closed
May 6, 2025 -
DISABLED test_matmul_layer_norm_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#151835 closed
May 6, 2025 -
DISABLED test_tmp_not_defined_issue2_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135219 closed
May 6, 2025 -
[XPU] Get [ZE]: 0x78000011 on torch.compile with new driver
#151898 closed
May 6, 2025 -
flex attention does not leverage masking, memory error
#152528 closed
May 5, 2025 -
Flex attention: batch-index-dependent block mask causes error with changing batch size
#152297 closed
May 5, 2025 -
torch.compile LLMs on MPS progress tracker
#150710 closed
May 5, 2025 -
[ued] Investigate diffuser pipeline transformer recompilations due to different width/height
#150702 closed
May 5, 2025 -
[binary builds] Anaconda. Remove dependency on conda libuv module in MacOS and Windows nightly builds
#145872 closed
May 5, 2025 -
Dynamo Unsupported: call_method UserDefinedObjectVariable(dict_items) __iter__ () {}
#147440 closed
May 5, 2025 -
[binary builds] Anaconda. Remove dependency on conda environment for Windows nightly builds
#146048 closed
May 5, 2025 -
torch.multinomial is not deterministic for large number of input probabilities when replacement=True
#152854 closed
May 5, 2025 -
AsyncCollectiveTensor doesn't trigger wait upon dtype cast
#152534 closed
May 5, 2025 -
Poor performance of torch.dot with float16 & bfloat16
#152798 closed
May 5, 2025 -
Segmentation fault (core dumped) in torch.nn.functional.max_unpool2d
#152804 closed
May 5, 2025 -
Mention of nondeterministic index_add when deterministic implementation is being used
#152817 closed
May 5, 2025 -
[RFC][PGNCCL] Add Float8 support
#148344 closed
May 5, 2025 -
False INTERNAL ASSERT FAILED
#152805 closed
May 5, 2025 -
The docstring linter should not force overridden methods to be documented
#151692 closed
May 5, 2025 -
[RFC] Intel GPU ATen Operations Upstreaming Options
#119682 closed
May 5, 2025 -
DISABLED test_variant_consistency_eager_nn_functional_conv3d_cuda_complex64 (__main__.TestCommonCUDA)
#114592 closed
May 5, 2025 -
[CPU][UT] 16 UT of test/inductor/test_cpu_select_algorithm.py failed with PyTorch 2025-04-28 nightly wheel
#152398 closed
May 5, 2025 -
Error with nccl + multiple RTX5090 in ddp training. CUDA error: an illegal memory access was encountered
#152780 closed
May 5, 2025 -
Keyword argument `dtype` is less relevant to the functions themselves
#145607 closed
May 5, 2025 -
cpu - gpu calculation results differs by far with torch.nn.functional.linear
#69969 closed
May 4, 2025 -
Inconsistent behavior between CPU and GPU implementations of `torch.Tensor.put_` method
#152755 closed
May 4, 2025 -
Performance Regression nightly 02/14→02/15, on nanogpt speedrun
#152761 closed
May 4, 2025 -
Note some limit in docstring of `padding` in Poolnd
#152156 closed
May 4, 2025 -
AOTInductor package can only be loaded on the first GPU (cuda:0) in C++ via AOTIModelPackageLoader
#152087 closed
May 4, 2025 -
Make scaler.step() return if step was skipped or not
#152279 closed
May 3, 2025 -
[Torch Profiler] Only two streams captured in CUDA graph but multiple streams shown in Torch Profiler
#152114 closed
May 3, 2025 -
[inductor][triton] Inductor is not compatible with the latest upstream Triton
#152531 closed
May 2, 2025 -
Performance Regression nightly 2025/02/08→02/09, on nanogpt speedrun
#147463 closed
May 2, 2025 -
DISABLED test_captured_scale_float16_cuda_float16 (__main__.TestFlexAttentionCUDA)
#152083 closed
May 2, 2025 -
DISABLED test_builtin_score_mods_float32_score_mod4_cuda_float32 (__main__.TestFlexAttentionCUDA)
#152082 closed
May 2, 2025 -
Update documentation to include insert and + methods to add layers in sequential
#146892 closed
May 2, 2025 -
test_reference_numerics_normal fails with certain versions of numpy/scipy
#148143 closed
May 2, 2025 -
inductor-periodic failures 5/2/2025
#152691 closed
May 2, 2025 -
static cuda launcher causes `RuntimeError: CUDA driver error: invalid device context` in torchtitan CI
#152639 closed
May 2, 2025 -
Triton Error [CUDA]: invalid device context when autograd.backward a triton kernel
#124565 closed
May 2, 2025 -
DISABLED test_cat_max_autotune_triton (__main__.TestMaxAutotune)
#145830 closed
May 2, 2025 -
DISABLED test_sparse_add_cuda_complex64 (__main__.TestSparseCSRCUDA)
#145069 closed
May 2, 2025 -
Some Performance Bug in `tol` of `torch.lobpcg()`
#152154 closed
May 2, 2025 -
DISABLED test_nvshmem
#152649 closed
May 2, 2025 -
py_limited_api=True in PyTorch2.7 will break the build of extensions
#152243 closed
May 2, 2025 -
[ONNX] Improve and sort out fallback mechanism
#151703 closed
May 2, 2025 -
Should make the doc of `nn.CrossEntropyLoss()` more clear
#134853 closed
May 1, 2025 -
torch.compile should not recompiles when `.requires_grad=True` under `torch.no_grad()` context
#131975 closed
May 1, 2025 -
compiled autograd + dynamic shapes fails with constraint violation
#133575 closed
May 1, 2025 -
Export QAT model is not performing as expected when compared to the original model and FX Graph QAT
#150746 closed
May 1, 2025 -
`torch.export` fails on `InstanceNorm1d`
#152467 closed
May 1, 2025 -
[CI] [anaconda] CI Perf Tests
#148342 closed
May 1, 2025 -
[Inductor] Dynamo hangs when processing an operator, seemingly depending on a logical argument value
#151743 closed
May 1, 2025 -
[export] Warn users when 0/1 specialization happens
#151582 closed
May 1, 2025 -
The test 'test_host_memory_stats' is failing in torch2.7.0+cu118
#152422 closed
May 1, 2025 -
How does torch.cudagraph capture a hybrid graph?
#152584 closed
May 1, 2025 -
`torch.randint` can't handle large `high` argument (and in general high range of `torch.uint64`)
#152564 closed
Apr 30, 2025 -
torch.randint should accept high=2**63
#81446 closed
Apr 30, 2025 -
pytorch index_select is too slow
#111247 closed
Apr 30, 2025 -
cuda graphs produce two additional kernel calls
#143572 closed
Apr 30, 2025 -
[regression] Not getting `CUDA error: device-side assert triggered` on main for CUDA_KERNEL_ASSERT2
#107396 closed
Apr 30, 2025 -
[CI] [anaconda] Benchmarks anaconda removal
#152123 closed
Apr 30, 2025 -
More logs to show why fx graph cache isn't hit / created?
#152065 closed
Apr 30, 2025 -
Mr
#152549 closed
Apr 30, 2025 -
Add Description of `validate_args` in `torch.distributions.`
#152165 closed
Apr 30, 2025 -
[ROCm] "No available kernel" when running EFFICIENT_ATTENTION sdpa
#138864 closed
Apr 30, 2025 -
difficulty creating magma tarball when new rocm or cuda versions are deployed
#151707 closed
Apr 30, 2025 -
[CUDA Graph tree] Cannot capture buffer allocation on side CUDA Streams
#151199 closed
Apr 30, 2025
151 Issues opened by 78 people
-
[FlexAttention] export fails to trace with functorch
#153063 opened
May 7, 2025 -
non-strict export should detect fake tensor leakage
#153062 opened
May 7, 2025 -
register_constant doesn't work on simple types
#153061 opened
May 7, 2025 -
DISABLED test_input_hooks_same (__main__.HooksTests)
#153059 opened
May 7, 2025 -
`cuda.Event` handling in dynamo is broken
#153058 opened
May 7, 2025 -
Export doesn't move embedding to correct device
#153056 opened
May 7, 2025 -
Process never ends when sending tensors through multiprocessing queues in Python 3.12+ on macOS
#153050 opened
May 7, 2025 -
DISABLED test_comprehensive_special_ndtri_cuda_int64 (__main__.TestInductorOpInfoCUDA)
#153047 opened
May 7, 2025 -
DISABLED test_comprehensive_trunc_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#153046 opened
May 7, 2025 -
DISABLED test_hook_with_nested_closure (__main__.HooksTests)
#153045 opened
May 7, 2025 -
Unexpected float32 overflow for amp training with torch.compile
#153044 opened
May 7, 2025 -
Pytorch 2.7 crashes when using flex attention with torch.amp
#153042 opened
May 7, 2025 -
using as_strided in torch compile generates a wrong result.
#153041 opened
May 7, 2025 -
Add Split Softmax
#153035 opened
May 7, 2025 -
misalignment with different shape in F.linear with bf16 dtype
#153033 opened
May 7, 2025 -
DISABLED test_hook_with_closure (__main__.HooksTests)
#153032 opened
May 7, 2025 -
DISABLED test_comprehensive_svd_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153031 opened
May 7, 2025 -
DISABLED test_comprehensive_amin_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153030 opened
May 7, 2025 -
DISABLED test_comprehensive_asinh_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#153029 opened
May 7, 2025 -
Multiple CUDA graphs utilizing multiple CUDA GPUs encounter illegal memory access during replay
#153025 opened
May 7, 2025 -
[RFC] Enable XPU+FlexAttention on Intel GPU
#153024 opened
May 7, 2025 -
XPU inference output abnormal with device 'XPU:1'
#153022 opened
May 7, 2025 -
[RFC][API-Unstable]Enable A16W4 on XPU Device
#153019 opened
May 7, 2025 -
inconsistent grads between two types of `allgather`s
#153016 opened
May 7, 2025 -
Operations on different precision tensors in CPU lead to different outputs
#153014 opened
May 7, 2025 -
DISABLED test_comprehensive_scatter_xpu_bool (__main__.TestInductorOpInfoXPU)
#153009 opened
May 7, 2025 -
DISABLED test_comprehensive_scatter_xpu_int64 (__main__.TestInductorOpInfoXPU)
#153008 opened
May 7, 2025 -
`lintrunner init` fails
#152999 opened
May 6, 2025 -
DISABLED test_comprehensive_rsub_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152996 opened
May 6, 2025 -
[dynamo] Actually support functools.lru_cache
#152994 opened
May 6, 2025 -
conv2d with int8 on CUDA: GET was unable to find an engine to execute this computation
#152992 opened
May 6, 2025 -
`torch.load` can't deserialize `datetime` objects, even with the appropriate `safe_globals`
#152985 opened
May 6, 2025 -
FPE when using `torch.lcm_` with int32 tensor and int16 scalar
#152979 opened
May 6, 2025 -
Refactor MegaCache to make it generic
#152976 opened
May 6, 2025 -
avoid falling back to as_strided for non-contiguous in-place reshape.
#152972 opened
May 6, 2025 -
DISABLED test_comprehensive_scatter_xpu_int32 (__main__.TestInductorOpInfoXPU)
#152971 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_int64 (__main__.TestInductorOpInfoXPU)
#152970 opened
May 6, 2025 -
[FSDP2] need dummy forward/backward to stay SPMD
#152964 opened
May 6, 2025 -
DTensor support for dynamic shapes is soft
#152963 opened
May 6, 2025 -
TestNestedTensorOpInfoCUDA.test_compile_backward_matmul_cuda_float32 Test Failure
#152962 opened
May 6, 2025 -
DTensor placement propagation for `slice` fails during recompile due to SymInts
#152954 opened
May 6, 2025 -
Remove redundant type aliases of _device for torch.Device
#152952 opened
May 6, 2025 -
DISABLED test_compiler_collectives_automatic_dynamic_tensor (__main__.TestMultiProc)
#152944 opened
May 6, 2025 -
DISABLED test_comprehensive_ormqr_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152943 opened
May 6, 2025 -
aten._scaled_dot_product_efficient_attention returns LSE padded to next highest multiple of 32
#152942 opened
May 6, 2025 -
ROCm: no HIP device available if device is already initialized
#152941 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_bool (__main__.TestInductorOpInfoXPU)
#152931 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_int32 (__main__.TestInductorOpInfoXPU)
#152930 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_float16 (__main__.TestInductorOpInfoXPU)
#152929 opened
May 6, 2025 -
DISABLED test_comprehensive_triu_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152928 opened
May 6, 2025 -
DISABLED test_comprehensive_rot90_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152927 opened
May 6, 2025 -
DISABLED test_comprehensive_scatter_xpu_float16 (__main__.TestInductorOpInfoXPU)
#152925 opened
May 6, 2025 -
Enable 12.8.1
#152922 opened
May 6, 2025 -
We should include where specialization happens when we throw a constraint violation error
#152918 opened
May 6, 2025 -
UNSTABLE inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper)
#152916 opened
May 6, 2025 -
Segmentation fault (core dumped) in torch.nn.functional.max_unpool2d
#152913 opened
May 6, 2025 -
DISABLED test_comprehensive_scatter_xpu_float32 (__main__.TestInductorOpInfoXPU)
#152912 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_float32 (__main__.TestInductorOpInfoXPU)
#152911 opened
May 6, 2025 -
DISABLED test_comprehensive_gather_xpu_float64 (__main__.TestInductorOpInfoXPU)
#152910 opened
May 6, 2025 -
DISABLED test_comprehensive_scatter_xpu_float64 (__main__.TestInductorOpInfoXPU)
#152898 opened
May 6, 2025 -
DISABLED test_comprehensive_nn_functional_conv3d_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152893 opened
May 6, 2025 -
DISABLED AotInductorTest.BasicPackageLoaderTestCpu (build.bin.test_aoti_inference)
#152891 opened
May 6, 2025 -
DISABLED test_comprehensive_sort_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152892 opened
May 6, 2025 -
DISABLED test_comprehensive_diagonal_copy_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152890 opened
May 6, 2025 -
DISABLED AotInductorTest.BasicTestCpu (build.bin.test_aoti_inference)
#152889 opened
May 6, 2025 -
DISABLED AotInductorTest.BasicTestCuda (build.bin.test_aoti_inference)
#152888 opened
May 6, 2025 -
UNSTABLE Lint / Link checks / Lint URLs / linux-job
#152884 opened
May 6, 2025 -
Motivate Pytorch's forward mode AD APIs with training examples
#152877 opened
May 5, 2025 -
Incorporate CUDA Memory Trimming Into DeviceCachingAllocator
#152875 opened
May 5, 2025 -
Docs Update `wrap_triton`
#152870 opened
May 5, 2025 -
DISABLED testAssertNotRegex (__main__.CPythonTest_Assertions)
#152869 opened
May 5, 2025 -
Have WrapTriton work w/ `TRITON_INTERPRET=1` in eager
#152868 opened
May 5, 2025 -
[dynamo] Improve final traceback frame format
#152867 opened
May 5, 2025 -
inductor-periodic rocm tests failing since at least 4/10
#152866 opened
May 5, 2025 -
torch.cuda.use_mem_pool is not thread safe
#152861 opened
May 5, 2025 -
DISABLED test_comprehensive___rmul___cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152851 opened
May 5, 2025 -
Clean up CUTLASS_VERSION post cutlass version update
#152847 opened
May 5, 2025 -
torch.distributions.Beta.entropy returns negative values
#152845 opened
May 5, 2025 -
Don't hardcoded support for DTensor to_local/from_local/redistribute into dynamo
#152829 opened
May 5, 2025 -
Pipeline Parallelism Fails when stage input does not produce gradients in all stages.
#152827 opened
May 5, 2025 -
`mypy` stage of `lintrunner -a` has intermittent but continuing crashes
#152824 opened
May 5, 2025 -
Performance Regression nightly 03/11→03/12, on nanogpt speedrun
#152823 opened
May 5, 2025 -
TorchRun: Option to specify which GPUs to run on
#152822 opened
May 5, 2025 -
Mismatch in dynamic quantization performance for torchao and torch.quantization
#152813 opened
May 5, 2025 -
DISABLED test_comprehensive_fliplr_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152797 opened
May 5, 2025 -
DISABLED test_comprehensive_rot90_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152796 opened
May 5, 2025 -
DISABLED test_comprehensive_unbind_copy_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152795 opened
May 5, 2025 -
DISABLED test_comprehensive_linalg_pinv_singular_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152793 opened
May 5, 2025 -
DISABLED test_comprehensive_slice_scatter_cuda_bool (__main__.TestInductorOpInfoCUDA)
#152794 opened
May 5, 2025 -
Inconsistent export behavior for nonzero+grid_sample between CUDA and CPU/MPS backends
#152791 opened
May 4, 2025 -
[CXX11ABI] torch 2.6.0-cu126 and cu124 have different exported symbols
#152790 opened
May 4, 2025 -
undefined symbol: __nvJitLinkCreate_12_8, version libnvJitLink.so.12
#152783 opened
May 4, 2025 -
Segmentation fault (core dumped) in torch.nn.functional.alpha_dropout
#152777 opened
May 4, 2025 -
Cuda-12.9 removed libnvToolsExt.so.* and is now purely header nvtx3
#152756 opened
May 3, 2025 -
Checkpoint sequential doesn't raise clear error when segments is greater than number of functions
#152752 opened
May 3, 2025 -
Error on padding 0-sized tensors
#152750 opened
May 3, 2025 -
torch.compile causes stride mismatch in SDPA with non-contiguous query in torch 2.7
#152747 opened
May 3, 2025 -
[FSDP2] fully_shard(mesh=(shard, shard)) for intra and inter node all-gathers
#152746 opened
May 3, 2025 -
[MPS] TensorIterator and accuracy
#152736 opened
May 2, 2025 -
Inconsistent float16 overflow behavior between CPU and CUDA devices
#152731 opened
May 2, 2025 -
When scoped_libary is destroyed the fake impls are not cleared
#152720 opened
May 2, 2025 -
[Mergebot] Adding ciflow/pull in PR without pull and lint workflows
#152718 opened
May 2, 2025 -
Cannot mask a DTensor
#152717 opened
May 2, 2025 -
dtensors TP+DP issues
#152712 opened
May 2, 2025 -
[FSDP2] NO_SHARD as fully_shard(mesh=(Replicate, Shard)) with shard of world size 1
#152710 opened
May 2, 2025 -
Gradient can be backpropagated through only certain distributions
#152703 opened
May 2, 2025 -
MPS internal assertion with jacfwd and concatenation
#152701 opened
May 2, 2025 -
DISABLED test_2d_mlp_with_nd_mesh (__main__.TestFullyShardNDTraining)
#152700 opened
May 2, 2025 -
[CI] [anaconda] Triton windows build
#152699 opened
May 2, 2025 -
CI workflows being skipped on PR
#152697 opened
May 2, 2025 -
torch._foreach_pow(DTensor, float) and torch._foreach_pow_(DTensor, float) do not work
#152696 opened
May 2, 2025 -
Add sm_86 (Ampere) and sm_89 (Ada) SASS in aarch64 builds
#152690 opened
May 2, 2025 -
torch.library.custom_op string support
#152685 opened
May 2, 2025 -
DISABLED test_comprehensive_select_scatter_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152684 opened
May 2, 2025 -
Flex attention strides
#152683 opened
May 2, 2025 -
[RFC] Universal Device Context and Safe GPU/CPU Execution Decorators
#152679 opened
May 2, 2025 -
DISABLED AotInductorTest.BasicPackageLoaderTestCuda (build.bin.test_aoti_inference)
#152674 opened
May 2, 2025 -
DISABLED test_comprehensive_std_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152673 opened
May 2, 2025 -
DISABLED test_comprehensive_cummin_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152672 opened
May 2, 2025 -
DISABLED test_comprehensive_polygamma_polygamma_n_0_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152671 opened
May 2, 2025 -
Torch BF16 group gemm hangs in backward pass - core issue isolated, needs proper resolution.
#152668 opened
May 2, 2025 -
DISABLED test_comprehensive_nansum_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152666 opened
May 2, 2025 -
UNSTABLE docker-cache-mi300 / docker-cache
#152655 opened
May 2, 2025 -
Check for if two tensors are overall similar instead of bitwise similar?
#152647 opened
May 2, 2025 -
ProcessGroupGloo.allgather_into_tensor_coalesced crashes with CUDA tensors
#152645 opened
May 1, 2025 -
TestFlexAttentionCUDA.test_GQA_score_mod7_cuda_float16 fails on h100
#152635 opened
May 1, 2025 -
Incorrect strides for `nonzero_static` compilation
#152634 opened
May 1, 2025 -
DISABLED test_torchvision_models_efficientnet_v2_l (__main__.TestVisionTracing)
#152632 opened
May 1, 2025 -
[v2.7.1] Release Tracker
#152627 opened
May 1, 2025 -
modded-nanogpt flaky NCCL hang starting 3/30 nightly
#152623 opened
May 1, 2025 -
Pytorch Profiler crashes while using it with Pytorch Lightning module
#152617 opened
May 1, 2025 -
Enable AOTI for Metal inductor
#152612 opened
May 1, 2025 -
[triton pin update] Run Inductor CI on pin updates for Triton and the PyTorch nightly branch
#152608 opened
May 1, 2025 -
Loops impacting output when utilizing hooks
#152607 opened
May 1, 2025 -
AOTI regression on SAM and tts-angular
#152606 opened
May 1, 2025 -
Flex Attention doesn't scale with custom bias
#152593 opened
May 1, 2025 -
[ratter-build] Cannot detect CUDA when building from source
#152592 opened
May 1, 2025 -
[Benchmark] High compilation time variance on benchmark dashboards
#152566 opened
Apr 30, 2025 -
DISABLED test_graph_partition_reorder_cpu_and_gpu_interleave (__main__.CudaGraphTreeTests)
#152561 opened
Apr 30, 2025 -
DISABLED test_pending_fusion_pro_and_epi (__main__.TestPrologueFusion)
#152560 opened
Apr 30, 2025 -
DISABLED test_comprehensive_signal_windows_hamming_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152559 opened
Apr 30, 2025 -
DISABLED test_comprehensive_amin_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152558 opened
Apr 30, 2025 -
PGO does not work on jobs for frameworks that copy code to different dirs at different attempts.
#152555 opened
Apr 30, 2025 -
MPS varying seq len SDPA memory leak
#152550 opened
Apr 30, 2025 -
FakeTensorUpdater does not trace nodes correctly
#152548 opened
Apr 30, 2025
534 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
auto functionalize base_hop
#151067 commented on
May 6, 2025 • 22 new comments -
[Accelerator] Fix Python typing in accelerator
#152394 commented on
May 6, 2025 • 16 new comments -
Random Batch Sampler Speedup
#147706 commented on
May 6, 2025 • 14 new comments -
[1/n][Optimus][Auto-AC] Support activation quantization without scaling
#148380 commented on
May 1, 2025 • 12 new comments -
[Intel GPU] Support f32 intermediate dtype, headdim size <=576 and f32 causal mask for SDPA
#152091 commented on
May 7, 2025 • 11 new comments -
convert guard_size_oblivious to runtime check in infer_size_impl
#148872 commented on
May 6, 2025 • 11 new comments -
Cache code generation during triton template expansion and enable it for mm_template.
#151773 commented on
May 5, 2025 • 9 new comments -
[cp] dispatch flex_attention to CP impl in TorchDispatchMode
#151497 commented on
May 3, 2025 • 9 new comments -
Mini tutorial for provenance tracking
#152211 commented on
May 6, 2025 • 8 new comments -
[Inductor-CPU] Faster int8 WoQ GEMM for small M with explicit prefetching
#149373 commented on
May 7, 2025 • 8 new comments -
[Cutlass] Integrate EVT into CUDACPPScheduling
#150906 commented on
May 6, 2025 • 6 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
May 3, 2025 • 6 new comments -
[WIP] DeadCodeEliminator Mark(block) improvement
#152348 commented on
May 2, 2025 • 5 new comments -
[Cutlass] Changes to gemm template for EVT
#150907 commented on
May 7, 2025 • 5 new comments -
[device_mesh] improve device selection logic
#150897 commented on
May 1, 2025 • 4 new comments -
[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices
#152341 commented on
May 7, 2025 • 4 new comments -
Fix `lr_scheduler` unexpectedly calls `step()` when init argument last_epoch is larger than -1
#149312 commented on
May 6, 2025 • 3 new comments -
[FP8][CUTLASS] xFail `honor_sm_carveout` on `sm100`
#152378 commented on
May 6, 2025 • 3 new comments -
[Hierarchical Compilation] Use universal flatten APIs
#152505 commented on
May 5, 2025 • 3 new comments -
`torch.tensordot`: performance improvements when contracting to a scalar.
#145936 commented on
May 5, 2025 • 3 new comments -
Add differentiable ops hint message in Module docs
#150291 commented on
May 6, 2025 • 2 new comments -
[BE]: Follow detach().clone() pattern for SGD
#144468 commented on
May 5, 2025 • 2 new comments -
[Quant][X86] add ops to compute uint8 pointwise add/add_relu
#152411 commented on
May 7, 2025 • 2 new comments -
Temp test
#148424 commented on
May 6, 2025 • 2 new comments -
[CI][CUDA] Move cu118 distributed pull jobs to cu126, move cu124-sm75 to cu126-sm75
#151594 commented on
May 2, 2025 • 2 new comments -
[pytorch][triton] flex attention fwd kernel with TMA loads (#151923)
#152460 commented on
May 7, 2025 • 2 new comments -
cpu: enable gemm-bf16f32 for SDPA BF16
#140159 commented on
May 7, 2025 • 2 new comments -
remove guard_size_oblivious from unbind.
#148815 commented on
May 2, 2025 • 2 new comments -
[export] Refactor pt2 save/load
#152495 commented on
May 6, 2025 • 2 new comments -
autograd: Add VJP and JVP rules for aten::aminmax
#151186 commented on
May 3, 2025 • 2 new comments -
Horizontal
#151780 commented on
May 7, 2025 • 1 new comment -
Move prologue_supported_inputs computations to def_kernal
#150869 commented on
May 2, 2025 • 1 new comment -
[Graph Partition] Pass all cudagraph tree tests
#152048 commented on
May 7, 2025 • 1 new comment -
Move mps_linear forward to use MPS kernels directly instead of MPSGraph
#152210 commented on
May 6, 2025 • 1 new comment -
[dynamic shapes] guard_or_false for infer_size
#152146 commented on
May 6, 2025 • 1 new comment -
update get_default_device to also respect torch.device ctx manager
#148621 commented on
May 6, 2025 • 1 new comment -
[inductor] lowering for fractional_max_pool3d
#148630 commented on
May 7, 2025 • 1 new comment -
Fix `InstanceNorm` wrong suggestion in warning message
#151534 commented on
May 2, 2025 • 1 new comment -
Change unsafe_marked_cacheable_functions to a dictionary, so that you can specify a static cache key
#152486 commented on
May 6, 2025 • 1 new comment -
Parallelize sort using libstdc++ parallel mode
#150195 commented on
May 1, 2025 • 1 new comment -
add device generalisation support for distributed tests
#152471 commented on
May 2, 2025 • 1 new comment -
[WIP][dynamic shapes] rewrite should_swap with guard_or_false
#150164 commented on
May 2, 2025 • 1 new comment -
[aotd] Support saved tensors hooks in aot_autograd
#150032 commented on
May 6, 2025 • 1 new comment -
Deprecate DataLoader pin_memory_device param
#146821 commented on
May 6, 2025 • 1 new comment -
[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path`
#150726 commented on
May 2, 2025 • 1 new comment -
Make `Adam`, `AdamW` work with nonzero-dim Tensor betas
#149939 commented on
May 3, 2025 • 1 new comment -
Add is_pinned to host allocator
#151439 commented on
May 6, 2025 • 1 new comment -
DRAFT: Add TMA opt for concat function target hopper and blackwell arch
#149893 commented on
May 7, 2025 • 1 new comment -
Skip fuse attention on fp32 if not tf32
#151924 commented on
May 5, 2025 • 1 new comment -
Use gather in index_select
#151715 commented on
May 7, 2025 • 1 new comment -
Inductor logging + analysis of torch.profile
#149697 commented on
May 3, 2025 • 1 new comment -
[ca] mark scalar int sizes as dynamic via tensor wrapping
#151731 commented on
May 7, 2025 • 1 new comment -
[DTensor] enable SimpleFSDP's composability with Tensor Parallel
#152286 commented on
May 6, 2025 • 1 new comment -
Rewrite autograd producer consumer stream sync logic
#151079 commented on
May 7, 2025 • 1 new comment -
Fix CUPTI lookup to include target directory
#148668 commented on
May 5, 2025 • 0 new comments -
[SGD] Add SGD capturable API and tests
#148647 commented on
May 5, 2025 • 0 new comments -
[dynamic shapes] guard_or_false for computeStorageNbytes
#150483 commented on
May 6, 2025 • 0 new comments -
Adjust CMake code for Eigen
#148628 commented on
May 7, 2025 • 0 new comments -
Optimize AOTInductor: Caching, Reduced Decompositions, and Improved JSON Handling
#148616 commented on
May 2, 2025 • 0 new comments -
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on
May 5, 2025 • 0 new comments -
[triton hash update] update the pinned triton hash
#148492 commented on
May 7, 2025 • 0 new comments -
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on
May 2, 2025 • 0 new comments -
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on
May 2, 2025 • 0 new comments -
Remove `torch.testing` from `MOD_SKIPLIST`
#148459 commented on
May 5, 2025 • 0 new comments -
Add 'x in {...}' patterns to perf_linter
#148417 commented on
May 4, 2025 • 0 new comments -
Add perf_linter to auto-fix some anti-patterns
#148416 commented on
May 3, 2025 • 0 new comments -
softmax: add device check for xpu with half_to_float
#150278 commented on
May 7, 2025 • 0 new comments -
Add cmake variable USE_ROCM_CK
#150245 commented on
May 6, 2025 • 0 new comments -
[c10d] Test multiple CUDA Graph captures
#150040 commented on
May 7, 2025 • 0 new comments -
Fixes detection of ArmPL on Linux platform
#150031 commented on
May 7, 2025 • 0 new comments -
AOTI freezing: fix test issues and enable by default
#149961 commented on
May 2, 2025 • 0 new comments -
[inductor] Add typing to _inductor/ir.py
#149958 commented on
May 7, 2025 • 0 new comments -
Enable XPU distributed test for PT2.8
#149916 commented on
May 5, 2025 • 0 new comments -
Refactoring FSDP2 (_composable/fsdp) test cases to be device agnostic
#149848 commented on
May 5, 2025 • 0 new comments -
Add SWA with a cyclical scheduler example
#149847 commented on
May 2, 2025 • 0 new comments -
Add x86-simd-sort accelerated sorting
#149362 commented on
May 7, 2025 • 0 new comments -
[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100`
#149282 commented on
May 1, 2025 • 0 new comments -
Make Subset dataset a true wrapper
#149272 commented on
May 5, 2025 • 0 new comments -
[Easy] update pip sources for CUDA in nightly pull tool
#149143 commented on
May 1, 2025 • 0 new comments -
Update the heuristic for AArch64 bmm/baddbmm
#149122 commented on
May 7, 2025 • 0 new comments -
[test] bigger runner
#149003 commented on
Apr 30, 2025 • 0 new comments -
[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging
#148981 commented on
May 6, 2025 • 0 new comments -
Move token linter code into tools/linter/adaptors/_linter/
#148959 commented on
May 5, 2025 • 0 new comments -
Fix AttributeError for `_get_vc_env` with setuptools>=75.9.0
#148847 commented on
May 7, 2025 • 0 new comments -
C++ support to print symbolic tensors as `Symbolic tensor: size=(...)`
#148846 commented on
May 4, 2025 • 0 new comments -
Set specialized representation string for meta/fake tensor with empty construction
#148794 commented on
May 7, 2025 • 0 new comments -
cpp_wrapper: build non-performance-sensitive code at O1
#148773 commented on
May 2, 2025 • 0 new comments -
Trunk workflow for Windows Arm64
#148753 commented on
May 2, 2025 • 0 new comments -
Fix calling torch.compile inside of a `__torch_dispatch__`
#148712 commented on
May 6, 2025 • 0 new comments -
[Just SCRTCH] no review
#148710 commented on
May 6, 2025 • 0 new comments -
Automated perf_linter changes: x in (...)
#148415 commented on
May 4, 2025 • 0 new comments -
Define USE_C10D_XCCL and USE_XCCL in pytorch
#147593 commented on
May 7, 2025 • 0 new comments -
[import][inductor] Simplify grid handling
#147583 commented on
May 3, 2025 • 0 new comments -
[ONNX][demo] Rotary embedding
#147576 commented on
May 4, 2025 • 0 new comments -
Update pybind11 submodule to 3.0.0-dev test
#147524 commented on
Apr 30, 2025 • 0 new comments -
removed zero dim cpu logic from fake_tensor.py
#147501 commented on
May 1, 2025 • 0 new comments -
[test] sccache log
#147470 commented on
May 6, 2025 • 0 new comments -
[Inductor] Avoid tensor slice overflow for large step
#147433 commented on
May 6, 2025 • 0 new comments -
[Inductor][CPP] Add float16 support for CppMicroGemmAMX
#147368 commented on
May 6, 2025 • 0 new comments -
Add the memory and dispatch to the logging module.
#147262 commented on
May 3, 2025 • 0 new comments -
Fix clang-tidy warnings in torch/jit
#147253 commented on
May 2, 2025 • 0 new comments -
logging: close handler after removing it
#147235 commented on
May 4, 2025 • 0 new comments -
Record the XPU and XCCL build settings in the compiled binary
#147161 commented on
May 7, 2025 • 0 new comments -
[fsdp] add an experimental allocator hook for buffers that participate in collective communication
#147146 commented on
May 5, 2025 • 0 new comments -
fake_tensor: Handle op errors more gracefully
#147049 commented on
May 7, 2025 • 0 new comments -
experimental proposal DCP v2
#146999 commented on
May 2, 2025 • 0 new comments -
[BE]: Try to remove unused type ignores - attempt 1
#146989 commented on
May 2, 2025 • 0 new comments -
[DO NOT MERGE][cuDNN][SDPA] Testing sm90/sm100 priority for cuDNN SDPA
#146947 commented on
May 6, 2025 • 0 new comments -
Support pin_memory() during CUDA stream capture.
#146924 commented on
May 7, 2025 • 0 new comments -
[DO NOT MERGE] ROCm sandbox PR
#146903 commented on
May 7, 2025 • 0 new comments -
Fix non-bitwise type annotations for Tensor operators (see #145838)
#146845 commented on
May 1, 2025 • 0 new comments -
Enable Windows tests
#146695 commented on
May 4, 2025 • 0 new comments -
Optimize LRScheduler docs
#146684 commented on
May 2, 2025 • 0 new comments -
[HOP] Mutation and alias rework
#146658 commented on
May 7, 2025 • 0 new comments -
[WIP] BaseSubclass
#146612 commented on
May 5, 2025 • 0 new comments -
clang-format CUDASymmetricMemory.cu
#146592 commented on
May 5, 2025 • 0 new comments -
Update quantile doc
#146485 commented on
May 2, 2025 • 0 new comments -
[WIP][dynamic shapes] mark backed size symbols as size-like
#146335 commented on
May 5, 2025 • 0 new comments -
[dcp] Minor improvements to filesystem writer
#146273 commented on
May 6, 2025 • 0 new comments -
Format tests by PYFMT
#146267 commented on
May 2, 2025 • 0 new comments -
docs: change log to ln in Softplus function and class
#146199 commented on
May 2, 2025 • 0 new comments -
Automated perf_linter changes: list constructors
#148414 commented on
May 4, 2025 • 0 new comments -
Automated perf_linter changes: generators
#148413 commented on
May 3, 2025 • 0 new comments -
Add api info for torch._C._nn.pyi [1/N]
#148410 commented on
May 3, 2025 • 0 new comments -
Enable `_lazy_clone` between CPU and MPS
#148408 commented on
May 1, 2025 • 0 new comments -
[Utilization] Add utilization monitor for linux build
#148375 commented on
May 6, 2025 • 0 new comments -
test index_put
#148357 commented on
May 3, 2025 • 0 new comments -
[ROCm][CI] Add support for gfx1100 in rocm workflow + test skips
#148355 commented on
May 7, 2025 • 0 new comments -
[pytree] simplify public API exposition with `__module__`
#148328 commented on
May 3, 2025 • 0 new comments -
Checking for cuda version to see if bf16 is natively supported or emulated
#148322 commented on
May 6, 2025 • 0 new comments -
```torch.as_strided``` negative stride SIGSEV fix when using ```torch.compile```
#148301 commented on
May 3, 2025 • 0 new comments -
Treat CUDA warnings as errors
#148294 commented on
May 3, 2025 • 0 new comments -
handle jk for emulation runs
#148240 commented on
May 2, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format`
#148185 commented on
May 1, 2025 • 0 new comments -
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on
May 3, 2025 • 0 new comments -
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(ZI)
#148173 commented on
May 4, 2025 • 0 new comments -
Disable cudnn to avoid creating guards that denies exporting
#148140 commented on
May 4, 2025 • 0 new comments -
Checks kv pair indexing in OrderedPreservingDictTest.test_range_insert
#148136 commented on
May 5, 2025 • 0 new comments -
[torch/elastic][upstream] Fix the wrong order when start_index is not 0
#147967 commented on
May 4, 2025 • 0 new comments -
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(Zi)
#147917 commented on
May 2, 2025 • 0 new comments -
[Draft] Enable cpu_offload for _distribute_state_dict
#147916 commented on
May 2, 2025 • 0 new comments -
[export][dynamic shapes] add Dim._OBLIVIOUS, _mark_oblivious()
#147881 commented on
May 2, 2025 • 0 new comments -
Set disable_clone=True when running opt_gm
#147845 commented on
May 2, 2025 • 0 new comments -
Use /permissive- for torch libraries in MSVC builds
#147825 commented on
May 4, 2025 • 0 new comments -
remove asserttion in expand_to_full_mesh_op_strategy
#147823 commented on
May 6, 2025 • 0 new comments -
[WIP][ptd][nccl] use current-stream as nccl-stream under async=False mode
#147820 commented on
May 2, 2025 • 0 new comments -
[cuda] Added a correctness test for layernorm backwards
#147763 commented on
May 5, 2025 • 0 new comments -
[DCP][OSS] Rank local checkpointing in DCP without collectives
#147758 commented on
Apr 30, 2025 • 0 new comments -
Modifications to RuntimeEstimator and SACEstimator
#147750 commented on
May 6, 2025 • 0 new comments -
Skip test_dtypes xpu test on bmm and addbmm
#147721 commented on
May 3, 2025 • 0 new comments -
[Dtensor] Pass device information in OffsetBasedRNGTracker
#147594 commented on
May 4, 2025 • 0 new comments -
Use std::apply for CPU code
#152526 commented on
May 2, 2025 • 0 new comments -
Add `padding="same"` for transposed convolution
#152228 commented on
May 5, 2025 • 0 new comments -
Add support for torch.cuda.FloatTensor()
#152208 commented on
May 1, 2025 • 0 new comments -
[submodule] Update ONNX to 1.18
#152200 commented on
May 7, 2025 • 0 new comments -
[inductor] propagate shapes in CSEVariable
#152198 commented on
May 7, 2025 • 0 new comments -
IGNORE: Testing OIDC
#152181 commented on
May 1, 2025 • 0 new comments -
Extend compute_global_tensor_shape to multi dimension sharding
#152166 commented on
May 3, 2025 • 0 new comments -
Add dynamo config to HOP-ify context managers
#152159 commented on
Apr 30, 2025 • 0 new comments -
Add runtime asserts to AOTI
#152125 commented on
May 6, 2025 • 0 new comments -
[dynamo][ca] support dynamic annotations on tensors in ListVariables/TupleVariables
#152119 commented on
May 7, 2025 • 0 new comments -
Update _torch_docs.py to Fix torch.bernoulli()
#152104 commented on
May 5, 2025 • 0 new comments -
Switch to standard pep517 sdist generation
#152098 commented on
May 6, 2025 • 0 new comments -
unbreak fb:operator_benchmark_test
#152049 commented on
May 1, 2025 • 0 new comments -
[map] always turn on dynamo for map
#152041 commented on
May 6, 2025 • 0 new comments -
Add CPython complex tests
#152015 commented on
May 6, 2025 • 0 new comments -
[Kineto] Upgrade the kineto commit to fb36cce
#152007 commented on
May 7, 2025 • 0 new comments -
[UniformValueConstantFolder] deduce value on CPU rather than on device
#151998 commented on
May 6, 2025 • 0 new comments -
Add torchcheck for replication_pad3d_backward
#151986 commented on
May 2, 2025 • 0 new comments -
Make `aten.embedding` not wrap negative indices
#151967 commented on
May 5, 2025 • 0 new comments -
[ca] hide unused scalar int sizes from dynamo
#151962 commented on
May 7, 2025 • 0 new comments -
[ROCm][CI] Update dockerfile to use centos9
#151929 commented on
May 6, 2025 • 0 new comments -
[BE] Upgrade XPU support package to 2025.1 in CICD
#151899 commented on
May 7, 2025 • 0 new comments -
Avoid differing results in `linalg.(tensor_)solve`
#151896 commented on
May 6, 2025 • 0 new comments -
[aot][ca] save bw_module in AOTAutogradCache
#151860 commented on
May 7, 2025 • 0 new comments -
[reland][ROCm] remove caffe2 from hipify
#151845 commented on
May 6, 2025 • 0 new comments -
[2/n][Optimus][Auto-AC] Support activation quantization with scaling
#151770 commented on
May 1, 2025 • 0 new comments -
Add adaptive_avg_pool2d input and output_size check
#151769 commented on
May 2, 2025 • 0 new comments -
[Don't merge] Upgrade oneDNN to v3.8 for XPU build
#151767 commented on
May 7, 2025 • 0 new comments -
Implement avg_pool3d for MPS backend
#151742 commented on
May 6, 2025 • 0 new comments -
[ROCm] Maxpool forward NHWC Perf Improvement targeting Resnet scenarios
#151727 commented on
May 7, 2025 • 0 new comments -
elastic: do not shutdown rendezvous on leaving workers
#152525 commented on
May 2, 2025 • 0 new comments -
[compile async] [cache] testing
#152523 commented on
May 7, 2025 • 0 new comments -
[inductor] [compile async] Don't compile in eager
#152507 commented on
May 6, 2025 • 0 new comments -
[Hierarchical Compile] Take into account mutation deps in cycle detection
#152506 commented on
May 1, 2025 • 0 new comments -
Fix flaky test in test_custom_ops
#152484 commented on
May 6, 2025 • 0 new comments -
[IR] Input Adapter refactor prototype
#152459 commented on
Apr 30, 2025 • 0 new comments -
fix: Update padding_mode to use Literal for type checking
#152458 commented on
May 2, 2025 • 0 new comments -
Add epoch to fake tensor cache key
#152453 commented on
May 2, 2025 • 0 new comments -
[ROCm] cpp_extension allow user to override default flags
#152432 commented on
May 4, 2025 • 0 new comments -
Relax tolerance for test_quick_baddbmm_cpu_complex64
#152424 commented on
May 6, 2025 • 0 new comments -
[Inductor][CPP] Enable vectorized fp8 quant dequant
#152418 commented on
Apr 30, 2025 • 0 new comments -
[Hierarchical Compile] Add mutation dependencies to topological sorting
#152410 commented on
May 1, 2025 • 0 new comments -
[Hierarchical Compilation] Track node mutations
#152389 commented on
May 1, 2025 • 0 new comments -
Add vec_reduce_all specialization for std::plus on AArch64
#152388 commented on
Apr 30, 2025 • 0 new comments -
fix: outdated contents in dynamo overview
#152382 commented on
May 6, 2025 • 0 new comments -
complex.pow(2) on GPU by replacing with complex * complex to avoid numerical instability
#152373 commented on
May 2, 2025 • 0 new comments -
[Relandx2] Rewrite the guts of torch::jit::Lexer to speed it up
#152372 commented on
May 1, 2025 • 0 new comments -
vec::map: directly process reduced-precision floats when reasonable
#152366 commented on
Apr 30, 2025 • 0 new comments -
add is_vec_specialized_for
#152365 commented on
May 2, 2025 • 0 new comments -
Format all headers under ATen/cpu/vec, not just top-level
#152364 commented on
May 6, 2025 • 0 new comments -
Add codeowner for merge rules
#152354 commented on
May 6, 2025 • 0 new comments -
[inductor][dynamo] Include operator name in size/stride/alignment assertion
#152353 commented on
May 7, 2025 • 0 new comments -
[cp] dispatch flex_attention_backward to CP impl in TorchDispatchMode
#152311 commented on
May 2, 2025 • 0 new comments -
Enable the AMP precision with freezing for CPU nightly test
#152298 commented on
May 6, 2025 • 0 new comments -
[CI] Add xpu inductor test into periodic workflow
#152281 commented on
May 7, 2025 • 0 new comments -
[ROCm] Maxpool backward NHWC Perf Improvement targeting Resnet scenarios
#152267 commented on
Apr 30, 2025 • 0 new comments -
[executorch hash update] update the pinned executorch hash
#152238 commented on
May 7, 2025 • 0 new comments -
At least one of ROCM_HOME or CUDA_HOME must be None
#152236 commented on
Apr 30, 2025 • 0 new comments -
_get_total_norm should use float64 to avoid rounding errors
#152234 commented on
May 1, 2025 • 0 new comments -
flex attention: fix dispatch order for tensor subclasses, avoid hardcoding call to faketensor impl in dynamo
#151719 commented on
May 2, 2025 • 0 new comments -
Do not cover up `__dunder`__ method type-hints from `.pyi` file
#150875 commented on
May 6, 2025 • 0 new comments -
Add CPython tests for iter/sort
#150797 commented on
May 6, 2025 • 0 new comments -
Add CPython generator/contextlib tests
#150796 commented on
May 6, 2025 • 0 new comments -
Add CPython int/float tests
#150795 commented on
May 6, 2025 • 0 new comments -
Add CPython math/cmath tests
#150794 commented on
May 6, 2025 • 0 new comments -
Add CPython string tests
#150793 commented on
May 6, 2025 • 0 new comments -
[Set] Add CPython set tests
#150792 commented on
May 7, 2025 • 0 new comments -
Add CPython dict tests
#150791 commented on
May 6, 2025 • 0 new comments -
Add CPython list/tuple tests
#150790 commented on
May 6, 2025 • 0 new comments -
Add CPython exception tests
#150789 commented on
May 6, 2025 • 0 new comments -
Add CPython tests for unittest
#150788 commented on
May 6, 2025 • 0 new comments -
Make device check error message more descriptive
#150750 commented on
May 7, 2025 • 0 new comments -
[BE][CI][Easy] Run `lintrunner` on generated `.pyi` stub files
#150732 commented on
May 2, 2025 • 0 new comments -
[BE] Resolve lint errors in `.pyi` stub files
#150731 commented on
May 2, 2025 • 0 new comments -
[BE] Ensure generated stub files by `gen_pyi` are properly formatted
#150730 commented on
May 2, 2025 • 0 new comments -
[BE] Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi`
#150729 commented on
May 2, 2025 • 0 new comments -
[BE] Update `.pyi` stub template to use Generic TypeAlias (PEP 585) and Union Type (PEP 604)
#150728 commented on
May 2, 2025 • 0 new comments -
[torchgen] Refactor and simplify `gen_pyi.py` to use Generic TypeAlias (PEP 585) and Union Type (PEP 604)
#150727 commented on
May 2, 2025 • 0 new comments -
Avoid overwriting COW data in MPS code
#150721 commented on
May 2, 2025 • 0 new comments -
[export] add runtime assert messages to python torch checks
#150719 commented on
May 6, 2025 • 0 new comments -
Support XPU in memory tracker
#150703 commented on
May 7, 2025 • 0 new comments -
[draft][distributed] add into 3d composability test at AMD CI test
#150694 commented on
May 6, 2025 • 0 new comments -
AOTI: add all fallback ops that are missing from C-shim
#150673 commented on
May 2, 2025 • 0 new comments -
[Inductor] Fix CUDA memory usage for CPU only compile
#150669 commented on
May 2, 2025 • 0 new comments -
Refactor `torch/utils/data/datapipes/gen_pyi.py` with `torchgen`
#150626 commented on
May 2, 2025 • 0 new comments -
Make LazyModuleMixin materialize after load_state_dict
#150593 commented on
May 2, 2025 • 0 new comments -
fix dynamic shapes for kwargs
#150583 commented on
Apr 30, 2025 • 0 new comments -
Enable lazy cloning in `Tensor.to` between CPU and MPS
#150569 commented on
May 2, 2025 • 0 new comments -
API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+
#150536 commented on
May 5, 2025 • 0 new comments -
Add a custom profiler configuration option
#151656 commented on
May 6, 2025 • 0 new comments -
[Intel GPU] Use user-friendly err msg in mm
#151655 commented on
May 7, 2025 • 0 new comments -
[Intel GPU][Inductor] Fallback embedding_dense_backward on XPU
#151637 commented on
May 7, 2025 • 0 new comments -
Add OIDC perms to windows-[build|test] workflows
#151596 commented on
May 1, 2025 • 0 new comments -
Add OIDC permissions to linux-test workflow
#151585 commented on
May 1, 2025 • 0 new comments -
Add OIDC permissions to linux-build workflow
#151581 commented on
May 1, 2025 • 0 new comments -
[bazel] Fix unusual reference to cpuinfo workspace
#151578 commented on
May 5, 2025 • 0 new comments -
Add device agnostic support for distributed tests
#151560 commented on
May 6, 2025 • 0 new comments -
Update OpenBLAS commit
#151547 commented on
May 6, 2025 • 0 new comments -
Add hint message when parameters is empty in clip_grad_norm_
#151529 commented on
May 6, 2025 • 0 new comments -
Allow to byteswap data when reading saved torch jit data
#151447 commented on
May 1, 2025 • 0 new comments -
[ez] Rewrite comment to be more friendly to non haskellers
#151421 commented on
May 4, 2025 • 0 new comments -
[Cutlass] Add epilogue inputs/outputs to def_kernel
#151406 commented on
May 5, 2025 • 0 new comments -
[ROCm] Upgrade ROCm CI to ROCm6.4
#151368 commented on
May 7, 2025 • 0 new comments -
Use Allocator API raw_allocate & raw_dealloc in CUDAAllocator
#151305 commented on
May 6, 2025 • 0 new comments -
[WIP] Generalize device caching allocator
#151298 commented on
May 6, 2025 • 0 new comments -
Remove outdated Android workarounds of nearbyintf
#151292 commented on
May 3, 2025 • 0 new comments -
[dynamo] Avoid unnecessary `.detach()` call in `_make_subclass` polyfill
#151265 commented on
May 6, 2025 • 0 new comments -
[aot autograd][logging] Profile large missing gaps in compile time tracing
#151256 commented on
May 6, 2025 • 0 new comments -
NCCL: Fix cmake file when cross compiling.
#151234 commented on
May 5, 2025 • 0 new comments -
[dynamo] keep C++ symbolic shape guards disabled for benchmarks
#151225 commented on
May 1, 2025 • 0 new comments -
Implement MKLGenerator
#151218 commented on
May 1, 2025 • 0 new comments -
Update slow tests
#151207 commented on
May 5, 2025 • 0 new comments -
[dynamo] Prevent lazy variable realization on STORE_FAST
#151184 commented on
May 2, 2025 • 0 new comments -
TESTING: IGNORE
#151116 commented on
May 1, 2025 • 0 new comments -
Update auto-tuning support for _scaled_grouped_mm
#150944 commented on
May 6, 2025 • 0 new comments -
[CI] Enable XCCL in XPU CI build
#150927 commented on
May 7, 2025 • 0 new comments -
fix shard tensor gather when a local tensor on certain ranks has zero elements
#150914 commented on
May 7, 2025 • 0 new comments -
[torch.compile] handle a custom __delattr__ method correctly
#150899 commented on
May 5, 2025 • 0 new comments -
An exported onnx model can't reduce on dim with value of 0 if 'keepdims' is false
#66200 commented on
May 4, 2025 • 0 new comments -
Install pytorch from pypi using local CUDA build
#150742 commented on
May 4, 2025 • 0 new comments -
pytorch pip install instructions: always include the cuda index
#150432 commented on
May 4, 2025 • 0 new comments -
[compile] DDPOptimizer + activation checkpointing not supported
#104674 commented on
May 4, 2025 • 0 new comments -
[ROCm] PyTorch slow on TTS
#150168 commented on
May 4, 2025 • 0 new comments -
torch.lobpcg producing different largest eigenvalue than scipy and np.linalg.eig
#101075 commented on
May 4, 2025 • 0 new comments -
Inconsistent `sum`/`dot`/`norm` behavior
#151761 commented on
May 5, 2025 • 0 new comments -
Logging when executing fx.Interpreter
#117351 commented on
May 5, 2025 • 0 new comments -
DISABLED test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_cuda_float16 (__main__.TestMatmulCudaCUDA)
#151661 commented on
May 5, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int8 (__main__.TestForeachCUDA)
#150407 commented on
May 5, 2025 • 0 new comments -
[C10D] Allow NCCL single P2P ops to use parent/collective communicator
#152220 commented on
May 5, 2025 • 0 new comments -
Unexpected behavior when using dist.all_reduce(x, op=dist.ReduceOp.SUM)
#152300 commented on
May 5, 2025 • 0 new comments -
can't reconstruct the communication group using PyTorch.
#152527 commented on
May 5, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_complex64 (__main__.TestForeachCUDA)
#150960 commented on
May 5, 2025 • 0 new comments -
DISABLED test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_cuda_float16 (__main__.TestMatmulCudaCUDA)
#151675 commented on
May 5, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_lu_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152520 commented on
May 5, 2025 • 0 new comments -
The 2.7.0 release tarball is missing `.ci/docker/ci_commit_pins/nccl-cu12.txt` required for building
#152532 commented on
May 1, 2025 • 0 new comments -
Newly added lint-urls jobs are very flaky
#152439 commented on
May 2, 2025 • 0 new comments -
associative_scan not composable with vmap in eager-mode
#134000 commented on
May 2, 2025 • 0 new comments -
Multiple Learning Rate Schedulers for Specific Parameter Groups
#101082 commented on
May 2, 2025 • 0 new comments -
Adam (fused=True) issues
#90752 commented on
May 2, 2025 • 0 new comments -
Unwanted Warning in lr_scheduler.step()
#117540 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_bitwise_right_shift_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152057 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_nansum_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#139710 commented on
May 2, 2025 • 0 new comments -
AdamW(fused=True) slower than unfused AdamW
#121857 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_floor_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152058 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_nansum_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140693 commented on
May 2, 2025 • 0 new comments -
`einsum` is about 40x slower on CUDA than manually multiplying and summing
#101249 commented on
May 2, 2025 • 0 new comments -
torch.nn.functional.ctc_loss raises cuDNN error in PyTorch versions >=2.5.0
#152421 commented on
May 3, 2025 • 0 new comments -
Deprecation of NVTX 2 (`nvToolsExt`): Recommended to move to NVTX 3
#147011 commented on
May 3, 2025 • 0 new comments -
A problem discovered when computing complex matrices in deep neural networks
#151182 commented on
May 3, 2025 • 0 new comments -
[dynamic shapes] data-dependent error when backed + unbacked expression resolves statically
#151491 commented on
May 3, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
May 4, 2025 • 0 new comments -
[dynamo] guard code generation triggers attribute error on DeviceMesh object
#152447 commented on
May 5, 2025 • 0 new comments -
[PT2] torch.layer_norm errors in eager but runs fine in backend=aot_eager_decomp_partition
#151478 commented on
May 5, 2025 • 0 new comments -
[inductor] [aot] `torch.linalg.lu` can't accept `slice operation`, behaving differently with eager
#151401 commented on
May 5, 2025 • 0 new comments -
[dynamo] Try tracing into einops
#152480 commented on
May 5, 2025 • 0 new comments -
[dynamo] `torch.compile` prevents fsdp warning from getting generated
#152451 commented on
May 5, 2025 • 0 new comments -
`torch.compile` causes assertion error in distributed checkpoint wrapper test
#152442 commented on
May 5, 2025 • 0 new comments -
[inductor] Improve codegen for argmax+max
#146643 commented on
May 5, 2025 • 0 new comments -
TORCH_COMPILE_DEBUG=1 does not consistently generate debug logs
#152374 commented on
May 5, 2025 • 0 new comments -
[ued] Slow start up time for `torch.compile` on GGUF Auraflow
#150706 commented on
May 5, 2025 • 0 new comments -
[inductor] [assertion error] `torch.select_scatter` crashes on inductor but passes on eager
#151296 commented on
May 5, 2025 • 0 new comments -
DISABLED AotInductorTest.FreeInactiveConstantBufferRuntimeConstantFoldingCuda (build.bin.test_aoti_inference)
#150299 commented on
May 5, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_float16 (__main__.TestForeachCUDA)
#150985 commented on
May 5, 2025 • 0 new comments -
Cannot override __add__ in NamedTuple with __new__ + torch.compile
#133762 commented on
May 5, 2025 • 0 new comments -
`randint(max)` causes a graph break, but not `rand().mul(max).floor().to(torch.long)` (on CPU)
#135664 commented on
May 5, 2025 • 0 new comments -
Investigate FlexAttention performance degradation on low precision inputs
#147336 commented on
May 5, 2025 • 0 new comments -
[AOTAutograd] tweak min-cut partitioner to avoid saving softmax output
#126348 commented on
May 6, 2025 • 0 new comments -
upstream `apex.normalization.FusedRMSNorm`
#72643 commented on
May 5, 2025 • 0 new comments -
[XPU] Upgrade the XPU support packages version to 2025.1 in CI/CD
#151097 commented on
May 5, 2025 • 0 new comments -
RFC: Torch Native Runtime
#152034 commented on
May 5, 2025 • 0 new comments -
[TensorDict - compile] dynamo generator compatibility
#129658 commented on
May 5, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float32 (__main__.TestForeachCUDA)
#149409 commented on
May 5, 2025 • 0 new comments -
RFC: The State of Custom CUDA extensions in PyTorch
#152032 commented on
May 5, 2025 • 0 new comments -
Illegal Instruction Caused by `grid_sample` Under Windows
#152385 commented on
May 5, 2025 • 0 new comments -
optree package status in PyTorch
#152535 commented on
May 5, 2025 • 0 new comments -
DISABLED test_pattern_matcher_multi_user_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134433 commented on
May 5, 2025 • 0 new comments -
DISABLED test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_cuda_float16 (__main__.TestMatmulCudaCUDA)
#151712 commented on
May 5, 2025 • 0 new comments -
Update quantization to make source files complient with /Zc:lambda
#92600 commented on
May 5, 2025 • 0 new comments -
Stop special-casing einops in Dynamo
#142486 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_index_select_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152416 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_polygamma_polygamma_n_0_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152469 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_polygamma_polygamma_n_1_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152470 commented on
May 5, 2025 • 0 new comments -
DISABLED test_comprehensive_repeat_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152500 commented on
May 5, 2025 • 0 new comments -
[Async TP] all-gather-matmuls not fusing properly when rowwise scales are used
#149990 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16 (__main__.TestForeachCUDA)
#150173 commented on
May 1, 2025 • 0 new comments -
Memory Leak in MPS Backend During LSTM Iterations (Out of Memory Error)
#145374 commented on
May 1, 2025 • 0 new comments -
Ability to do aot/inductor compilation from a jit model (or torch.exported model)
#127928 commented on
May 1, 2025 • 0 new comments -
DISABLED test_int64_upsample3d_cuda_bfloat16 (__main__.TestTorchDeviceTypeCUDA)
#146007 commented on
May 1, 2025 • 0 new comments -
Update `torch/nn/modules/conv.py` to use Literal for support padding modes
#152280 commented on
May 1, 2025 • 0 new comments -
AOTI packaged model fails with generic error when run in for loop but succeeds on individual sample
#146524 commented on
May 1, 2025 • 0 new comments -
`nn.CrossEntropyLoss` accepts negative target probabilities
#152437 commented on
May 1, 2025 • 0 new comments -
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
#127176 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cpu (__main__.CpuTests)
#151512 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32 (__main__.TestForeachCUDA)
#150208 commented on
May 1, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128 (__main__.TestForeachCUDA)
#149323 commented on
May 1, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_complex128 (__main__.TestForeachCUDA)
#150933 commented on
May 1, 2025 • 0 new comments -
DTensor slicing on sharded dimension leads to replication
#149447 commented on
May 1, 2025 • 0 new comments -
Status of pip wheels with _GLIBCXX_USE_CXX11_ABI=1
#51039 commented on
May 1, 2025 • 0 new comments -
Set `size` when `is_coalesced` is set in `torch.sparse_coo_tensor()`
#145371 commented on
May 2, 2025 • 0 new comments -
[ROCm] sdpa group query attention bf16 numeric error
#139352 commented on
Apr 30, 2025 • 0 new comments -
Profiler doesn't seem to work on AMD CPUs
#150052 commented on
Apr 30, 2025 • 0 new comments -
MPS: Conv1d fails with NotImplementedError for output_channels > 65536
#152278 commented on
Apr 30, 2025 • 0 new comments -
[ROCm] MI300X FP8 scaled_mm is extremely slow on nightly
#143465 commented on
Apr 30, 2025 • 0 new comments -
torch.compile on MPS progress tracker
#150121 commented on
Apr 30, 2025 • 0 new comments -
NotImplementedError: Could not run 'aten::index.Tensor' with arguments from the 'SparseCUDA' backend.
#152226 commented on
Apr 30, 2025 • 0 new comments -
Training/Fine-tuning fails with PyTorch 2.8 + 4x 5090 GPUs using DDP/FSDP/DeepSpeed
#150734 commented on
Apr 30, 2025 • 0 new comments -
enhance documentation around the developer build
#108406 commented on
Apr 30, 2025 • 0 new comments -
Quantile is limited to 16 million elements and has poor performance.
#64947 commented on
Apr 30, 2025 • 0 new comments -
missing docs for torch.Tag
#126518 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128 (__main__.TestForeachCUDA)
#150141 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on
May 1, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on
May 1, 2025 • 0 new comments -
DISABLED test_comprehensive_nanmean_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140339 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64 (__main__.TestForeachCUDA)
#150161 commented on
May 1, 2025 • 0 new comments -
[Tracker] Nested tensor op coverage requests
#118107 commented on
May 1, 2025 • 0 new comments -
Signature should be extended for `torch.hamming_window()`
#146590 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_special_xlog1py_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140648 commented on
May 2, 2025 • 0 new comments -
DISABLED AotInductorTest.FreeInactiveConstantBufferCuda (build.bin.test_aoti_inference)
#149495 commented on
May 2, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32 (__main__.TestForeachCUDA)
#150350 commented on
May 2, 2025 • 0 new comments -
[inductor] nan_asserts doesn't work for FP8, "RuntimeError: "isinf" not implemented for 'Float8_e4m3fn'"
#149002 commented on
May 2, 2025 • 0 new comments -
Major perf regression with `BatchNorm2d` + `torch.compile` with `reduce-overhead` + DDP
#139207 commented on
May 2, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex64 (__main__.TestForeachCUDA)
#149199 commented on
May 2, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64 (__main__.TestForeachCUDA)
#150392 commented on
May 2, 2025 • 0 new comments -
module.cuda() doesn't work under FakeTensorMode
#148977 commented on
May 2, 2025 • 0 new comments -
Support SDPA flash attention/ memory efficant attn on ROCm gfx908
#141958 commented on
May 2, 2025 • 0 new comments -
[CI] [anaconda] CI Build and Test scripts Windows
#148338 commented on
May 2, 2025 • 0 new comments -
Continuous calls to nn.Linear in fp32 on the 5090D cause severe performance degradation
#150725 commented on
May 2, 2025 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
May 2, 2025 • 0 new comments -
Pytorch DDP across nodes: `self._store = TCPStore(...)  # type: ignore[call-arg]` raises `RuntimeError: Stop_waiting response is expected`
#114357 commented on
May 2, 2025 • 0 new comments -
[CUDA][Compex] `test_reference_numerics_large_jiterator_unary_cuda_complex64` broken after updating to `numpy >= 1.25.0`
#125198 commented on
May 2, 2025 • 0 new comments -
fully_shard() for huggingface 72B model: pytorch caches too much GPU memory
#151936 commented on
May 2, 2025 • 0 new comments -
Dynamo unsupported: dynamic padding
#123855 commented on
May 1, 2025 • 0 new comments -
When using torch to convert to an ONNX model, testing the inference results with actual images shows a tensor mismatch
#152097 commented on
May 1, 2025 • 0 new comments -
dynamo cannot trace global op_set .__contains__
#145761 commented on
May 1, 2025 • 0 new comments -
[RFC] : Dynamically Quantized 8-bit Matrix Multiplication support
#149500 commented on
May 1, 2025 • 0 new comments -
LoadHIP.cmake should find_package(composable_kernel)
#149809 commented on
May 1, 2025 • 0 new comments -
`view()` + modify-in-place fails silently with DTensor
#147570 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on
May 1, 2025 • 0 new comments -
Context Parallel -- unsharded output doesn't match output without CP.
#152261 commented on
May 1, 2025 • 0 new comments -
[Feature request] Exclusive prefix sum, `torch.cumsum(input, dim=0, exclusive=True)`
#76191 commented on
May 1, 2025 • 0 new comments -
The state of sparse Tensors
#9674 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64 (__main__.TestForeachCUDA)
#150298 commented on
May 2, 2025 • 0 new comments -
Add description of several params in the basic usage of `torch.min()`, `torch.max()`, `torch.all()` and `torch.any()`
#152176 commented on
May 2, 2025 • 0 new comments -
Raise an Error when File Not Found in `torch.jit.load()`
#152178 commented on
May 2, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cuda (__main__.GPUTests)
#151541 commented on
May 2, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cpu (__main__.CpuTests)
#151540 commented on
May 2, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16 (__main__.TestForeachCUDA)
#150309 commented on
May 2, 2025 • 0 new comments -
[NCCL] Unordered destruction of `ProcessGroupNCCL` no longer supported
#137507 commented on
May 2, 2025 • 0 new comments -
[inductor] enable bf32 test for mkldnn conv
#127293 commented on
May 7, 2025 • 0 new comments -
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on
May 7, 2025 • 0 new comments -
[1/N] Update CI jobs to use CMake >= 3.25
#130522 commented on
May 6, 2025 • 0 new comments -
Make IPC features extendable on third-party devices
#133222 commented on
May 6, 2025 • 0 new comments -
add ranking for grouped benchmarks
#133287 commented on
May 2, 2025 • 0 new comments -
Add back DistributedDataParallel types that were lost when pyi was removed
#136835 commented on
May 6, 2025 • 0 new comments -
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on
May 6, 2025 • 0 new comments -
[POC][FX][pytree] cleanup fx pytree implementation
#138202 commented on
May 2, 2025 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
May 5, 2025 • 0 new comments -
[WIP] Add DeviceAllocator as the base device allocator
#138222 commented on
May 6, 2025 • 0 new comments -
Always produce XML
#138513 commented on
May 6, 2025 • 0 new comments -
[cuDNN] Add an option to force cuDNN usage (incl. SDPA)
#139699 commented on
May 5, 2025 • 0 new comments -
Fix warnings and simplify code in TensorShape
#141971 commented on
May 4, 2025 • 0 new comments -
Fix platform detection in MKLDNN CMake file
#142067 commented on
May 2, 2025 • 0 new comments -
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on
May 7, 2025 • 0 new comments -
Replacing explicit backend search with api call
#144944 commented on
May 5, 2025 • 0 new comments -
Wrong formula for CosineAnnealingLR
#152081 commented on
May 6, 2025 • 0 new comments -
`Aborted (core dumped)` in `torch.cuda.nccl.reduce`
#150836 commented on
May 7, 2025 • 0 new comments -
[RFC] zentorch Integration
#150296 commented on
May 7, 2025 • 0 new comments -
DISABLED test_comprehensive_signal_windows_general_cosine_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139682 commented on
May 7, 2025 • 0 new comments -
[inductor] [silent incorrectness] `torch.nn.PairwiseDistance(p=2)` outputs incorrect results with eager
#151198 commented on
May 7, 2025 • 0 new comments -
PyTorch VS2022 official build: Windows binary hits illegal instruction on AVX2 (max ISA level) CPU
#145702 commented on
May 7, 2025 • 0 new comments -
DISABLED test_foreach_check_stride_ignore_dims_of_one_cuda_float32 (__main__.TestForeachCUDA)
#150026 commented on
May 7, 2025 • 0 new comments -
`setup.py develop` command is disappearing soon from `setuptools`
#152276 commented on
May 7, 2025 • 0 new comments -
ImportError: dlopen: cannot load any more object with static TLS
#2575 commented on
May 7, 2025 • 0 new comments -
[ATen][Sparse] Use Third-Party Eigen for sparse addmm
#101814 commented on
May 5, 2025 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
May 2, 2025 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
May 5, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
May 7, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#125806 commented on
May 7, 2025 • 0 new comments -
refine fp32 precision api
#125888 commented on
May 7, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on
May 7, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on
May 7, 2025 • 0 new comments -
[test] fix unit test
#144977 commented on
May 2, 2025 • 0 new comments -
removed check for ConvTranspose3D on MPS
#145366 commented on
May 7, 2025 • 0 new comments -
Open up PT UTs to cover additional devices
#145589 commented on
May 6, 2025 • 0 new comments -
[micro_pipeline_tp] add logging for all-gather-matmul fusion
#145594 commented on
May 5, 2025 • 0 new comments -
[micro_pipeline_tp] support pattern matching row-wise scaled_mm with sharded scale
#145595 commented on
May 5, 2025 • 0 new comments -
[c10d] implement ReduceOp.unbox()
#145652 commented on
May 5, 2025 • 0 new comments -
Avoid data-dependent errors by runtime assert substitution.
#145681 commented on
May 2, 2025 • 0 new comments -
[Easy] update pip sources for ROCm in nightly pull tool
#145685 commented on
May 1, 2025 • 0 new comments -
[Async-TP] Port _fused_all_gather_matmul_native to cpp to reduce launching overhead
#145794 commented on
May 5, 2025 • 0 new comments -
[AsyncMM] preliminary tuning
#145795 commented on
May 5, 2025 • 0 new comments -
[Async-TP] _pipelined_multi_all_gather_and_consume reduce overhead
#145796 commented on
May 5, 2025 • 0 new comments -
[Async-TP] improve algo selection
#145797 commented on
May 5, 2025 • 0 new comments -
[will-not-merge] tuning
#145798 commented on
May 5, 2025 • 0 new comments -
Replace distutils.version with copied looseversion
#145819 commented on
May 6, 2025 • 0 new comments -
[CUDAEvent.h] support external cuda events in cudagraphs
#146145 commented on
May 5, 2025 • 0 new comments -
[CI] Get rid of UCC builds
#146173 commented on
May 2, 2025 • 0 new comments -
Add where_ ops
#143636 commented on
May 6, 2025 • 0 new comments -
Defaults to C++20 in CMake torch targets
#143959 commented on
May 7, 2025 • 0 new comments -
[Intel GPU] add tf32 support for matmul on XPU
#144240 commented on
May 7, 2025 • 0 new comments -
codecache: Remove cpp_prefix.h duplication per build, then precompile it
#144293 commented on
May 2, 2025 • 0 new comments -
[pytree][1/N] change pytree usages to implementation agnostic: `torch.distributed`
#144332 commented on
May 2, 2025 • 0 new comments -
[TorchInductor] Add ALiBi (Attention with Linear Biases) Fused Attention Pattern
#144338 commented on
May 5, 2025 • 0 new comments -
[BE][pytree][Easy] change imports `torch.utils._pytree` -> `torch.utils.pytree.python`
#144405 commented on
May 2, 2025 • 0 new comments -
Remove the `_stacklevel` arg from `log_softmax`, `softmax` and `softmin`
#144451 commented on
May 6, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `{torch,test}/{nn,optim}/**` to `ruff format`
#144548 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/_[a-h]*/` to `ruff format`
#144551 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[e-n]*/` to `ruff format`
#144553 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[a-c]*/` to `ruff format`
#144554 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on
May 1, 2025 • 0 new comments -
[Intel CPU] Fix issue #143482.
#144760 commented on
May 2, 2025 • 0 new comments -
[Intel CPU] Fix issue #143483.
#144854 commented on
May 2, 2025 • 0 new comments -
[export] check non-negative modulus, avoid unnecessary congruences, in export solver
#144925 commented on
May 3, 2025 • 0 new comments -
DISABLED test_comprehensive_rot90_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140773 commented on
May 6, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_float64 (__main__.TestForeachCUDA)
#151019 commented on
May 6, 2025 • 0 new comments -
DISABLED test_cublas_addmm_size_1000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA)
#151834 commented on
May 6, 2025 • 0 new comments -
Semi-Structured Sparsity unsupported for Windows
#125302 commented on
May 6, 2025 • 0 new comments -
Add switch to disable truncation to long list print
#152427 commented on
May 6, 2025 • 0 new comments -
DISABLED test_per_sample_api_compute_batch_size_not_pytreeable_cpu (__main__.TestExpandedWeightModuleCPU)
#146972 commented on
May 6, 2025 • 0 new comments -
DISABLED test_cublas_addmm_size_1000_cuda_float16 (__main__.TestMatmulCudaCUDA)
#151862 commented on
May 6, 2025 • 0 new comments -
Segmentation error for torch==2.2.1 on MacOs
#121101 commented on
May 6, 2025 • 0 new comments -
Expand Tag Set: views & reductions
#129020 commented on
May 6, 2025 • 0 new comments -
at::BlasBackend::Ck does not handle all ROCm BLAS gpus
#150187 commented on
May 6, 2025 • 0 new comments -
compile generates inefficient code for mutations on small slices of inputs
#152346 commented on
May 6, 2025 • 0 new comments -
DeepSeek: mixed precision optimizers (BF16AdamW)
#146542 commented on
May 6, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_single (__main__.CompileTest)
#147707 commented on
May 6, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#151054 commented on
May 6, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_uint8 (__main__.TestForeachCUDA)
#149858 commented on
May 6, 2025 • 0 new comments -
[t.compile][Functools] Cache decorator support for dynamo
#146598 commented on
May 6, 2025 • 0 new comments -
Attributeless FakeRootModule
#135696 commented on
May 5, 2025 • 0 new comments -
torch.onnx.export causes floating point exception with core dump for empty slice assignment
#110056 commented on
May 5, 2025 • 0 new comments -
torch.export with dynamic shapes on Static Cache HF LLama model fails
#152465 commented on
May 5, 2025 • 0 new comments -
torch.matrix_exp gets stuck on GPU
#149335 commented on
May 6, 2025 • 0 new comments -
welfordreduce slows down forward layernorm in a bunch of cases
#120184 commented on
May 6, 2025 • 0 new comments -
dist.barrier() hangs after calling async_save
#123447 commented on
May 6, 2025 • 0 new comments -
[inductor] `proxy_tensor.py` throws `SyntaxError` when using `.random_`
#151432 commented on
May 6, 2025 • 0 new comments -
[inductor] [silence] `nn.ConvTranspose2d-F.dropout` outputs inconsistent results with eager
#148061 commented on
May 6, 2025 • 0 new comments -
[inductor] [cuda] [fake tensor] `torch.triu_indices` throws `pointer argument` error when using `[0, 0]`
#151737 commented on
May 6, 2025 • 0 new comments -
Device Error on vmap
#151591 commented on
May 6, 2025 • 0 new comments -
Support Delay Loading of c10.dll when using libtorch as a third-party library.
#105058 commented on
May 6, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_float32 (__main__.TestForeachCUDA)
#151003 commented on
May 6, 2025 • 0 new comments -
[dynamo] Dynamo fails to run torch.cat() with FakeTensors because it can't confirm 's0 + s1*u0' is nonzero
#152473 commented on
May 6, 2025 • 0 new comments -
[DTensor] Calling .item() on DTensor with Partial placement results in local value
#152406 commented on
May 6, 2025 • 0 new comments -
Simplification of pruned models
#58846 commented on
May 6, 2025 • 0 new comments -
torch.compile fails in FSDP due to .data assignment with different floating type
#152162 commented on
May 6, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
May 6, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_complex128 (__main__.TestForeachCUDA)
#151093 commented on
May 6, 2025 • 0 new comments -
inductor `full_like` decompositions give incorrect strides
#144699 commented on
May 6, 2025 • 0 new comments -
[CI] No workflows scheduled on PRs
#151322 commented on
May 6, 2025 • 0 new comments -
NCCL out of memory error after updating to PyTorch 2.7
#152302 commented on
May 6, 2025 • 0 new comments -
[ued] HF diffusers pipeline `enable_cpu_offload` errors or graph breaks with a `torch.compile`-ed transformer
#150711 commented on
May 6, 2025 • 0 new comments -
Question about the support of torch.compile for a custom CUDA operator?
#152270 commented on
May 7, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_complex64 (__main__.TestForeachCUDA)
#151099 commented on
May 7, 2025 • 0 new comments -
AttributeError: type object 'torch._C._distributed_c10d.BackendType' has no attribute 'XCCL'.
#147059 commented on
May 7, 2025 • 0 new comments -
Windows inductor generated code without function declarations, and compilation failed on MSVC.
#152251 commented on
May 7, 2025 • 0 new comments -
Device assert throws a runtime error in cuda backend and results in a crash in xpu backend
#142135 commented on
May 7, 2025 • 0 new comments -
`context_parallel` fails for training with `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`
#149306 commented on
May 7, 2025 • 0 new comments -
Request to cherrypick a fix into v1.13.1 (v1.8 has a CVE)
#98115 commented on
May 7, 2025 • 0 new comments -
`RuntimeError: UR error` with XPU
#149953 commented on
May 7, 2025 • 0 new comments -
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 commented on
May 7, 2025 • 0 new comments -
DISABLED test_cublas_and_lt_reduced_precision_fp16_accumulate_cuda (__main__.TestMatmulCudaCUDA)
#151890 commented on
May 7, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float16 (__main__.TestForeachCUDA)
#151114 commented on
May 7, 2025 • 0 new comments -
[Manylinux 2.28] Migrate Docker container to use gcc 13, CUDA 12.6 and gcc14 CUDA 12.8
#152426 commented on
May 6, 2025 • 0 new comments -
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on
May 6, 2025 • 0 new comments -
[dynamo] torch._dynamo crashes on `self.value.__module__` inside SkipFunctionVariable.call_function() (PyTorch 2.7, works 2.6)
#152316 commented on
May 6, 2025 • 0 new comments -
`torch.compile()` produces incorrect results for `asinh_()` operation on large/small values
#152299 commented on
May 6, 2025 • 0 new comments -
Unusually slow draft_export time
#152337 commented on
May 6, 2025 • 0 new comments -
Silent incorrectness between static torch.compile vs eager
#152425 commented on
May 6, 2025 • 0 new comments -
Softmax Decomp Causes Incorrect Gradients when Using `torch.compile` with `F.multi_head_attention_forward`
#152309 commented on
May 6, 2025 • 0 new comments -
RMS norm causes NaNs when used with torch.compile + float8 with rowwise scales
#150859 commented on
May 6, 2025 • 0 new comments -
aot_eager produces wrong output with all_gather_tensor_autograd
#148701 commented on
May 6, 2025 • 0 new comments -
Significant precision error from torch.compile
#145213 commented on
May 6, 2025 • 0 new comments -
Composition of torch.compile and torch.func.grad silently produces a wrong result.
#136662 commented on
May 6, 2025 • 0 new comments -
Fx Graph cache hit generates guards that do not exist in the original cached program, causing recompilations only on cache hit.
#152435 commented on
May 6, 2025 • 0 new comments -
Invalid handling of nans in compiled torch.quantile / torch.nanquantile on cuda
#152423 commented on
May 6, 2025 • 0 new comments -
OptimizedModule __getattr__ may cause dead recursive call loop
#138157 commented on
May 6, 2025 • 0 new comments -
Graph break on .t() when Tensor._make_subclass
#151771 commented on
May 6, 2025 • 0 new comments -
torch.compile on MPS fails: generated Metal kernel uses loop-local variable out of scope
#152155 commented on
May 6, 2025 • 0 new comments -
MPS error on Sequoia 15.3: 'NDArray dimension length > INT_MAX'
#146769 commented on
May 6, 2025 • 0 new comments