SeanNaren · 2017-10-31T21:23:24Z

Enables Tensor Core operations for RNNs. Checks for cuDNN v7 and CUDA 9 or above. Only supported on Volta cards (AWS P3 instances work!).
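The three conditions in the description can be sketched as a single predicate. This is only an illustration, not the code merged in this PR: the function name `tensor_ops_supported` and its parameters are hypothetical, the cuDNN version is assumed to be the integer encoding in which 7.0.5 is reported as 7005, and "Volta" is assumed to mean compute capability 7.0 or newer.

```python
def tensor_ops_supported(cudnn_version, cuda_version, compute_capability):
    """Hypothetical helper: True when the PR's three stated conditions hold.

    cudnn_version      -- integer such as 7005 for cuDNN 7.0.5 (assumed encoding)
    cuda_version       -- (major, minor) tuple, e.g. (9, 0)
    compute_capability -- (major, minor) tuple, e.g. (7, 0) for Volta
    """
    has_cudnn_v7 = cudnn_version >= 7000
    has_cuda_9 = cuda_version >= (9, 0)
    # Assumption: treat Volta (7.0) and anything newer as eligible.
    is_volta_or_newer = compute_capability >= (7, 0)
    return has_cudnn_v7 and has_cuda_9 and is_volta_or_newer


if __name__ == "__main__":
    # Volta card with cuDNN 7.0.5 and CUDA 9.0
    print(tensor_ops_supported(7005, (9, 0), (7, 0)))
    # Pascal card (compute capability 6.1) with the same libraries
    print(tensor_ops_supported(7005, (9, 0), (6, 1)))
```

In a real setup the three inputs would come from the installed libraries and the device rather than being passed by hand.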

soumith · 2017-10-31T22:16:54Z

@pytorchbot test this please

soumith · 2017-11-01T09:51:30Z

thanks Sean!

SeanNaren · 2017-11-01T12:10:15Z

Oh, I should probably mention: Tensor Core ops are only activated when handling matrix sizes that are multiples of 8 (make your hidden sizes multiples of 8)! I'm not sure where the best place to make this obvious is...
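One way to follow the multiple-of-8 advice is to round a desired hidden size up before building the RNN. A minimal sketch; the helper name `round_up_to_multiple` is hypothetical and not part of PyTorch:

```python
def round_up_to_multiple(size, multiple=8):
    """Return the smallest multiple of `multiple` that is >= size."""
    if size <= 0 or multiple <= 0:
        raise ValueError("size and multiple must be positive")
    return ((size + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    for hidden in (500, 512, 1000):
        print(hidden, "->", round_up_to_multiple(hidden))
```

The rounded value would then be passed as the hidden size when constructing the RNN, at the cost of a few extra parameters.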

… .tolist() in sharding prop

Summary:

- DTensor sharding propagator's local-shape adjustment for view ops calls _StridedShard.local_shard_size_and_offset, which materialises offsets via .tolist() on a fake index tensor. Under FakeTensorMode that allocates ~131k unbacked SymInts per call, none bound to the returned DTensor's tensor_meta, tripping PendingUnbackedSymbolNotFound at the downstream compute_unbacked_bindings check.
- Fix: honor existing skip_offset flag inside _StridedShard.local_shard_size_and_offset to skip the .tolist() when offsets aren't needed (the only known leak source).
- Run 25 narrowed the fix per wconstab-style blocking review by removing the previously-included defensive ignore_fresh_unbacked_symbols() wrap from _sharding_prop.py.
- Run 26 addresses aditvenk-style blocking by capturing clean lint evidence inside the submitted artifact set (lint_evidence.txt + verbatim block in report.md).
- End-to-end verified on torchtitan gpt_oss MoE+EP+TP+FSDP compile: #3409 surface error is gone (separate downstream aten.histc-on-DTensor bug is unrelated).
- The in-tree regression test test_strided_shard_view_unbacked_local under test/distributed/tensor/test_dtensor_compile.py uses a fake PG (no GPU required) and bisect-verifies the fix.
- User re-issued the same fix-and-verify instruction. The Run 22-26 fix is in place in both the job's pytorch worktree (`~/.ptq_workspace/jobs/20260520-torchtitan-3409/pytorch/`) and the conda

Fixes pytorch/torchtitan#3409

The comment previously framed the unbacked-local / backed-global asymmetry as the precondition for the bug. The actual root cause is .tolist() in _StridedShard.local_shard_size_and_offset creating unbacked SymInts during compile-time sharding propagation for any _StridedShard view, regardless of whether the local tensor has backed or unbacked dims. torch.nonzero is just how torchtitan #3409 surfaced it.

Co-authored-by: Aditya Venkataraman <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>

… materialization

Root cause: DTensor's sharding propagator adjusts local shapes for view ops by calling _StridedShard.local_shard_size_and_offset, which materializes the offsets list via .tolist() on an index tensor. Under FakeTensorMode (e.g. when the op is compiled) that .tolist() allocates one unbacked SymInt per element, none of which are bound to the returned DTensor's tensor_meta. The downstream compute_unbacked_bindings check then trips PendingUnbackedSymbolNotFound. This surfaces on torchtitan #3409 (gpt_oss MoE+EP+TP+FSDP under torch.compile), but the leak is independent of whether local dims are backed or unbacked.

Fix: skip the .tolist() when the caller discards the offsets. The previous boolean return_first_offset flag could not express "no offsets needed", so it is replaced by a small _StridedShardOffsetMode enum (FIRST / ALL / NONE). _get_shard_size_and_offsets passes NONE when skip_offset is set, avoiding the unbacked-SymInt allocation entirely.

Also removes three stale xfails in test_dtensor_ops.py (linalg.tensorsolve, nn.functional.instance_norm, take_along_dim) that now compile and pass under TestCompiledDTensorOps and were reported as unexpected successes.

Fixes pytorch/torchtitan#3409
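The idea of replacing a two-state boolean with a three-state mode can be illustrated with a toy model. The sketch below is not the DTensor code: `OffsetMode` and `local_size_and_offsets` are hypothetical stand-ins, and the layout is assumed to be a simple round-robin one (element `i` belongs to rank `i % num_shards`) chosen only to keep the example short. The point it shows is that the `NONE` branch returns before any offsets list is built.

```python
from enum import Enum


class OffsetMode(Enum):
    """Hypothetical stand-in for the three modes named in the commit message."""
    FIRST = "first"  # caller wants only the first offset
    ALL = "all"      # caller wants every offset
    NONE = "none"    # caller discards offsets entirely


def local_size_and_offsets(dim_size, num_shards, rank, mode):
    """Toy round-robin sharding: element i lives on rank i % num_shards.

    Returns (local_size, offsets), where offsets depends on `mode`.
    """
    indices = range(rank, dim_size, num_shards)  # lazy, nothing materialized yet
    local_size = len(indices)
    if mode is OffsetMode.NONE:
        # Skip materializing offsets; this is the branch a boolean flag
        # with only "first" / "all" states could not express.
        return local_size, None
    if mode is OffsetMode.FIRST:
        return local_size, (indices[0] if local_size else None)
    return local_size, list(indices)  # ALL: build the full list


if __name__ == "__main__":
    for mode in OffsetMode:
        print(mode.name, local_size_and_offsets(10, 4, 1, mode))
```

A boolean could only choose between the `FIRST` and `ALL` behaviours, so a caller that needed just the size still paid for the offsets.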

Added tensor op check for cudnn rnns

52826ba

soumith approved these changes Oct 31, 2017

soumith merged commit cf256ee into pytorch:master Nov 1, 2017

SeanNaren deleted the rnn-volta branch November 1, 2017 10:19

ezyang added the open source label Jun 24, 2019

aditvenk mentioned this pull request May 22, 2026

[DTensor] Fix PendingUnbackedSymbolNotFound from _StridedShard offset .tolist() in sharding prop #184945

Closed

Add Tensor Core ops to RNNs for Volta #3409

soumith merged 1 commit into pytorch:master from SeanNaren:rnn-volta
