FLAME-MoE is a transparent, end-to-end research platform for Mixture-of-Experts (MoE) language models. It is designed to facilitate scalable training, evaluation, and experimentation with MoE architectures.
Explore our publicly released checkpoints on Hugging Face (a download sketch follows the list):
- FLAME-MoE-1.7B-10.3B
- FLAME-MoE-721M-3.8B
- FLAME-MoE-419M-2.2B
- FLAME-MoE-290M-1.3B
- FLAME-MoE-115M-459M
- FLAME-MoE-98M-349M
- FLAME-MoE-38M-100M
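For example, a single checkpoint can be fetched with the Hugging Face CLI. This is a minimal sketch; the `cmu-flame` organization name is an assumption, so use the repository path shown on the model card of the checkpoint you want:

```bash
# Sketch: download one released checkpoint with the Hugging Face CLI.
# The "cmu-flame" organization is an assumption; copy the actual repository
# path from the model card on Hugging Face.
pip install -U "huggingface_hub[cli]"
huggingface-cli download cmu-flame/FLAME-MoE-1.7B-10.3B --local-dir checkpoints/FLAME-MoE-1.7B-10.3B
```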
Ensure you clone the repository recursively to include all submodules:

```bash
git clone --recursive https://github.com/cmu-flame/MoE-Research
cd MoE-Research
```

Set up the Conda environment using the provided script:

```bash
sbatch scripts/miscellaneous/install.sh
```

Note: This assumes you're using a SLURM-managed cluster. Adapt accordingly if running locally.
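If you are working outside SLURM, a minimal sketch of the local adaptation might look like the following; the environment name is hypothetical, so check install.sh for the one it actually creates:

```bash
# Sketch: run the install script directly instead of submitting it via sbatch.
# Assumes the script has no hard dependency on SLURM environment variables.
bash scripts/miscellaneous/install.sh
# "flame-moe" is a hypothetical environment name -- use the one install.sh creates.
conda activate flame-moe
```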
Use the following SLURM jobs to download and tokenize the dataset:

```bash
sbatch scripts/dataset/download.sh
sbatch scripts/dataset/tokenize.sh
```
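Tokenization reads the downloaded data, so it is worth confirming that the download job finished cleanly before submitting the tokenization job. A minimal sketch using standard SLURM accounting:

```bash
# Check that the download job completed successfully before tokenizing.
# Replace <jobid> with the ID printed by "sbatch scripts/dataset/download.sh".
sacct -j <jobid> --format=JobID,JobName,State,ExitCode
```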
Launch training jobs for the desired model configurations:

```bash
bash scripts/release/flame-moe-1.7b.sh
bash scripts/release/flame-moe-721m.sh
bash scripts/release/flame-moe-419m.sh
bash scripts/release/flame-moe-290m.sh
bash scripts/release/flame-moe-115m.sh
bash scripts/release/flame-moe-98m.sh
bash scripts/release/flame-moe-38m.sh
```
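The evaluation step below needs the SLURM job ID of the training run, so note it at submission time. A minimal sketch, assuming the release script submits a batch job that shows up in the queue under your user:

```bash
# Sketch: find the SLURM job ID of a running training job so it can be passed
# to the evaluation script later. The format codes print job ID, name, and state.
squeue -u "$USER" --format="%i %j %T"
```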
To evaluate a trained model, set the appropriate job ID and iteration number before submitting the evaluation script:

```bash
export JOBID=...   # Replace with your training job ID
export ITER=...    # Replace with the iteration to evaluate (e.g., 11029)
sbatch scripts/evaluate.sh
```

All training loss curves, for both the scaling-law studies and the final releases, are available in the following WandB workspace: https://wandb.ai/haok/flame-moe