
HPCNC (Hippocampus & Neocortex)

RWKV7 Training Benchmarks (1.5B and 2.9B)

Tested on an NVIDIA GeForce RTX 3060 (12 GB VRAM) with full-parameter training (no LoRA).

Training Results (10.5 GB VRAM limit)

| Model | Optimizer | Max Context (tokens) | Throughput | Memory Breakdown |
| --- | --- | --- | --- | --- |
| RWKV7-1.5B | Per-param BF16 AdamW | 3,072 | 1,174 tok/s | Model 2.9 GB + Opt 5.7 GB |
| RWKV7-1.5B | Per-param 8-bit AdamW | 7,168 | 1,320 tok/s | Model 2.9 GB + Opt ~3 GB |
| RWKV7-2.9B | Per-param SGD | 7,168 | 716 tok/s | Model 5.5 GB + Opt 0 GB |
| RWKV7-2.9B | Per-param 8-bit Lion | 3,072 | 598 tok/s | Model 5.5 GB + Opt ~2.8 GB |

Note: Standard AdamW (two FP32 moment buffers, i.e. 8 bytes per parameter) runs out of memory: its optimizer states alone require ~11.6 GB for the 1.5B model.
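
As a rough sanity check on the memory breakdown above, optimizer-state sizes follow from simple bytes-per-parameter arithmetic. The sketch below is only illustrative: the parameter counts are assumptions read off the BF16 model sizes in the table (model bytes / 2 ≈ parameter count), and the bytes-per-state figures are the usual values for each optimizer family.

```python
# Rough optimizer-state memory estimates; parameter counts are assumptions
# derived from the BF16 model sizes in the table (2 bytes per parameter).
def opt_state_gb(n_params: float, state_bytes_per_param: float) -> float:
    return n_params * state_bytes_per_param / 1e9

params_1b5 = 1.45e9  # ~2.9 GB of BF16 weights
params_2b9 = 2.75e9  # ~5.5 GB of BF16 weights

print(opt_state_gb(params_1b5, 8))  # FP32 AdamW (two 4-byte moments): ~11.6 GB -> OOM
print(opt_state_gb(params_1b5, 4))  # BF16 AdamW (two 2-byte moments): ~5.8 GB
print(opt_state_gb(params_1b5, 2))  # 8-bit AdamW (two 1-byte moments): ~2.9 GB
print(opt_state_gb(params_2b9, 0))  # plain SGD, no momentum:           0 GB
print(opt_state_gb(params_2b9, 1))  # 8-bit Lion (one 1-byte moment):   ~2.8 GB
```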

Memory-Efficient Training Techniques

  1. Gradient Checkpointing (grad_cp=1): Recompute activations during the backward pass instead of storing them (see the first sketch after this list)

    • Saves roughly 7x activation memory
    • Can even be faster under high memory utilization, because it reduces allocation overhead
  2. Per-Parameter Optimizer: Run the optimizer step during the backward pass via register_post_accumulate_grad_hook (see the second sketch after this list)

    • Each parameter is updated immediately once its gradient has been computed
    • The gradient is freed right after the update (param.grad = None)
    • The full set of gradients is never held in VRAM; at most one gradient exists at a time
  3. Infinite Context Mode (train_type="infctx"): This project always trains with unbounded context length (see the configuration sketch after this list)

    • Model names such as rwkv7-g1a-0.1b-20250728-ctx4096 only indicate the context length used in the original pretraining
    • That number does NOT limit inference or training context here; training always uses unbounded context
    • Both ctx_len=sys.maxsize and chunk_ctx=sys.maxsize are set, so arbitrarily long sequences are allowed
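
A minimal sketch of technique 1 (gradient checkpointing), assuming a generic stack of residual blocks; Block, Model, and the layer sizes are placeholders rather than the project's RWKV-7 code, and the grad_cp flag here only mirrors the setting named above.

```python
import torch
from torch.utils.checkpoint import checkpoint


class Block(torch.nn.Module):
    """Placeholder residual block standing in for an RWKV-7 layer."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(x)


class Model(torch.nn.Module):
    def __init__(self, dim: int = 1024, n_layers: int = 24, grad_cp: int = 1) -> None:
        super().__init__()
        self.grad_cp = grad_cp
        self.blocks = torch.nn.ModuleList(Block(dim) for _ in range(n_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            if self.grad_cp and self.training:
                # Drop this block's activations after the forward pass and
                # recompute them during backward, trading compute for memory.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x
```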
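
A minimal sketch of technique 2 (the per-parameter optimizer), assuming a plain PyTorch module; the per-parameter AdamW instances and the hook wiring are illustrative, not this project's exact implementation. Tensor.register_post_accumulate_grad_hook is the PyTorch hook named above (available since PyTorch 2.1).

```python
import torch

model = torch.nn.Linear(4096, 4096)  # stand-in for the full RWKV-7 model

# One single-parameter AdamW instance per tensor so each can step independently.
opts = {p: torch.optim.AdamW([p], lr=1e-4) for p in model.parameters()}


def attach(p: torch.nn.Parameter) -> None:
    def hook(param: torch.Tensor) -> None:
        # Fires as soon as this parameter's gradient has been fully accumulated
        # during backward(): update immediately, then drop the gradient.
        opts[param].step()
        param.grad = None

    p.register_post_accumulate_grad_hook(hook)


for p in model.parameters():
    attach(p)

# backward() now also performs the optimizer updates, so at most one
# parameter's gradient is resident in memory at any point in time.
x = torch.randn(8, 4096)
loss = model(x).pow(2).mean()
loss.backward()
```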
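
Technique 3 is ultimately a configuration choice. The Namespace below is a hypothetical illustration of the settings mentioned above (train_type, ctx_len, chunk_ctx, grad_cp), not the trainer's actual argument parsing.

```python
import sys
from argparse import Namespace

# Hypothetical training-args sketch; field names echo the settings described
# above, but this is not the real CLI of the trainer.
args = Namespace(
    train_type="infctx",    # stream arbitrarily long sequences chunk by chunk
    ctx_len=sys.maxsize,    # no cap on total training sequence length
    chunk_ctx=sys.maxsize,  # no cap on per-chunk length
    grad_cp=1,              # enable gradient checkpointing (technique 1)
)
```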
