@ChangseokSong ChangseokSong commented Nov 4, 2025

Motivation

This PR adds a feature that dumps both the auxiliary and the last hidden states once a request has finished.
It is mainly intended for SpecForge offline training, and it can also be useful for collecting training data from real user requests.

The existing --enable-return-hidden-states option returns only the last hidden state and causes slowdowns due to frequent device-to-host copies.

Modifications

  • Hidden states (aux and last) are concatenated and copied to the host only once, after the request is done.
  • Dumps are performed in a round-robin fashion across TP ranks.
  • Added a dump stream for the concatenation and device-to-host (DtoH) copy of hidden states, which can overlap with the main inference work.
  • Added a dump process pool that actually saves the data to disk.
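
The round-robin assignment and the background save step listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `owner_rank`, `save_dump`, `HiddenStateDumper`, and the pickle file layout are hypothetical names chosen for the example, plain Python lists stand in for the hidden-state tensors, and the sketch assumes every TP rank observes finished requests in the same order. The dump stream (concat and DtoH copy) is not modeled here.

```python
import os
import pickle
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional


def owner_rank(finished_index: int, tp_size: int) -> int:
    """Round-robin rule: the i-th finished request is dumped by rank i % tp_size."""
    return finished_index % tp_size


def save_dump(path: str, payload: dict) -> str:
    """Runs in a pool worker: write one request's hidden states to disk."""
    with open(path, "wb") as f:
        pickle.dump(payload, f)
    return path


class HiddenStateDumper:
    """Hypothetical per-rank helper; each TP rank would own one instance."""

    def __init__(self, rank: int, tp_size: int, out_dir: str,
                 executor: Optional[Executor] = None, num_workers: int = 2):
        self.rank = rank
        self.tp_size = tp_size
        self.out_dir = out_dir
        # Defaults to a process pool so that serialization and disk I/O stay
        # off the main process; a ThreadPoolExecutor can be passed in instead.
        self.pool = executor or ProcessPoolExecutor(max_workers=num_workers)
        self.finished = 0   # finished requests seen so far on this rank
        self.pending = []   # futures for saves that are still in flight

    def on_request_finished(self, request_id: str, hidden_states: dict):
        """Called once per finished request, with host-side hidden states."""
        index = self.finished
        self.finished += 1
        if owner_rank(index, self.tp_size) != self.rank:
            return None  # another rank is responsible for this request
        path = os.path.join(self.out_dir, f"{request_id}.pkl")
        future = self.pool.submit(save_dump, path, hidden_states)
        self.pending.append(future)
        return future

    def close(self) -> None:
        """Wait for outstanding saves (re-raising any error), then shut down."""
        for future in self.pending:
            future.result()
        self.pool.shutdown()
```

With `tp_size=2`, rank 0 would save requests 0, 2, 4, ... and rank 1 would save requests 1, 3, 5, ..., so each request is written exactly once and the disk work is spread evenly across ranks. When the default process pool is used from a script, the calling code should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.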

Accuracy Tests

Benchmarking and Profiling

Checklist

@ChangseokSong ChangseokSong changed the title dump hidden states for SpecForge eagle training dump hidden states for SpecForge eagle training (do not merge) Nov 4, 2025