This repository contains an implementation of our paper "Improving Alignment and Uniformity of Expert Representation with Contrastive Learning for Mixture-of-Experts Model".
1. tensorflow = 1.15
2. python = 3.6.8
EAR.py: Expert Agreement Regularization
EHP.py: Expert Homogeneity Penalty
data_augment.py: Data augmentation methods
project_head.py: Projection head function (see the sketch below for how it feeds the contrastive objective)
model: MoE-based models, including MMoE, PLE, etc.
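The sketch below is illustrative only, not the repository's exact code: it shows, assuming TensorFlow 1.15, how a projection head and an InfoNCE-style contrastive loss can be combined over two augmented views of an expert's representation. The names `projection_head` and `contrastive_loss` are placeholders, not the repository's API.

```python
import tensorflow as tf


def projection_head(h, proj_dim=64, scope="proj_head"):
    """Two-layer MLP projection head applied to an expert's hidden state
    before the contrastive loss (illustrative stand-in, not project_head.py)."""
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        z = tf.layers.dense(h, proj_dim, activation=tf.nn.relu)
        z = tf.layers.dense(z, proj_dim)
    return tf.nn.l2_normalize(z, axis=-1)


def contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE loss: row i of z1 and row i of z2 are two augmented views of
    the same sample (positives); all other rows in the batch are negatives."""
    batch_size = tf.shape(z1)[0]
    logits = tf.matmul(z1, z2, transpose_b=True) / temperature      # [B, B] similarities
    labels = tf.one_hot(tf.range(batch_size), depth=batch_size)
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
    return tf.reduce_mean(loss)


# Usage (illustrative): h1, h2 are one expert's outputs for two augmentations
# of the same batch (e.g. produced with data_augment.py):
#   z1 = projection_head(h1)
#   z2 = projection_head(h2)
#   aux_loss = contrastive_loss(z1, z2)
```

In a multi-task MoE setup, such an auxiliary loss would typically be added to the task loss with a small weight.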
bash MMOE.sh
bash PLE.sh
The paper "Improving Alignment and Uniformity of Expert Representation with Contrastive Learning for Mixture of Experts Model" is applied to the Deepseek MoE model (scheduled for release in August 2024). By introducing orthogonal constraints between the hidden states of different experts, it alleviates issues such as imbalanced expert utilization, representation degradation, and representation collapse.