Support INT8 mixed-precision training from torchao? #578

@gau-nernst

Description

I recently worked on INT8 mixed-precision training in torchao; the relevant PR is pytorch/ao#748.

Preliminary results show that with torchtitan it improves training speed by 20% on 8x A100 GPUs, with no noticeable difference in the loss curve. See the PR for more details.

Would you be open to adding an experimental flag for this in torchtitan, similar to Float8 training? This would also make it possible to profile and improve INT8 training performance directly in torchtitan for future perf optimization.
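To make the request concrete, here is a hypothetical sketch of what such an experimental toggle could look like in a torchtitan TOML config. The `[int8]` section name and its keys are illustrative assumptions modeled on how Float8 training is toggled, not an existing torchtitan API:

```toml
# Hypothetical experimental section, mirroring the style of torchtitan's
# Float8 training flag. Section and key names are placeholders for discussion.
[int8]
enable_int8_mixed_precision = true
```

Under the hood, the flag would presumably apply torchao's INT8 mixed-precision transform (from pytorch/ao#748) to the model's linear layers, the same way the Float8 flag swaps in Float8 linears today.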

cc @msaroufim

Metadata

Labels: enhancement (New feature or request)
