Implement auxiliary-loss-free load balancing #11031
PR types
New features
PR changes
Models
Description
Implements the Auxiliary-Loss-Free Load Balancing mechanism described in the DeepSeek-V3 paper.
The mechanism adds a statistics-driven, expert-wise bias to the MoE gating scores: if an expert received more tokens than average in the previous batch, its bias is decreased; otherwise its bias is increased. This keeps the load balanced without an auxiliary loss.
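For reference, here is a minimal sketch of the gating step described above (an illustration, not the actual PaddleNLP implementation; the function name, tensor shapes, and the `e_score_correction_bias` parameter name are assumptions). The bias only influences which experts are selected, while the combine weights still come from the original scores, so no gradient flows through the bias.

```python
import paddle
import paddle.nn.functional as F

def noaux_tc_gate(logits, e_score_correction_bias, top_k):
    """Pick experts by (score + bias) but weight their outputs by the raw score."""
    scores = F.sigmoid(logits)                                    # [num_tokens, num_experts]
    # The bias-corrected scores are used only for the top-k selection.
    _, topk_idx = paddle.topk(scores + e_score_correction_bias, k=top_k, axis=-1)
    # Combine weights are taken from the original (uncorrected) scores.
    topk_weight = paddle.take_along_axis(scores, topk_idx, axis=-1)
    topk_weight = topk_weight / topk_weight.sum(axis=-1, keepdim=True)
    return topk_idx, topk_weight
```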
The parameters alpha and gamma are set according to the paper as follows:
Experimental results
Logs confirm that the bias is updated as expected: experts above the average load have their bias decreased, while experts below the average have it increased.
Expert access heatmap for a cold-start run with the PP4EP8, 29-layer, 32-expert configuration:

As the heatmaps show, the aux-loss-free method reaches a very well-balanced state starting from around step 300, whereas the default loss keeps showing a fairly severe imbalance.
Expert access heatmap for a warm-start run with the PP4EP8, 29-layer, 64-expert configuration:
Since the warm-start weights have already been trained, the imbalanced phase should be short: the aux-loss-free method reaches a balanced state within roughly 100 steps, whereas with the default loss the balance keeps getting worse and the run soon terminates with an OOM.
Implementation notes
Since the bias is not updated by gradients, it cannot go through the optimizer, so I implemented the update with a Callback.
The new MoECorrectionBiasAdjustCallback is designed for generality and is not limited to the DSV3 model: any MoE model that uses topk_method == "noaux_tc" can use it.
According to the DSV3 paper, the bias update rate is 0.001 for the first 14.3T tokens and 0.0 afterwards. Here I am less strict and fix it at 0.001; if a user really needs to train beyond 14.3T tokens, they can adjust it themselves.
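For illustration, a rough sketch of how such a callback could work, assuming an HF-style `TrainerCallback` with an `on_step_end` hook and a gate layer that keeps a per-expert token counter for the last batch. The class name, the `expert_usage` attribute, and the `update_rate` argument below are hypothetical, not the actual MoECorrectionBiasAdjustCallback interface.

```python
import paddle
from paddlenlp.trainer import TrainerCallback

class BiasAdjustCallbackSketch(TrainerCallback):
    """Illustrative only: after each step, lower the bias of overloaded experts
    and raise the bias of underloaded ones by a fixed update rate."""

    def __init__(self, update_rate=0.001):
        self.update_rate = update_rate  # gamma in the DeepSeek-V3 paper

    def on_step_end(self, args, state, control, model=None, **kwargs):
        for layer in model.sublayers():
            # Hypothetical interface: the gate exposes the bias tensor used by
            # the "noaux_tc" top-k plus a per-expert usage counter for the batch.
            if not hasattr(layer, "e_score_correction_bias"):
                continue
            load = layer.expert_usage.astype("float32")           # [num_experts]
            # A real implementation would all-reduce the counters across data /
            # expert parallel groups before comparing against the mean.
            err = load.mean() - load                              # > 0 for underloaded experts
            with paddle.no_grad():
                layer.e_score_correction_bias.add_(self.update_rate * paddle.sign(err))
            layer.expert_usage.zero_()                            # reset for the next batch
```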