FLAME-MoE is a transparent, end-to-end research platform for Mixture-of-Experts (MoE) language models. It is designed to facilitate scalable training, evaluation, and experimentation with MoE architectures.
Explore our publicly released checkpoints on Hugging Face (a download sketch follows the list):
- FLAME-MoE-1.7B-10.3B
- FLAME-MoE-721M-3.8B
- FLAME-MoE-419M-2.2B
- FLAME-MoE-290M-1.3B
- FLAME-MoE-115M-459M
- FLAME-MoE-98M-349M
- FLAME-MoE-38M-100M
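For example, a single checkpoint can be fetched with the Hugging Face CLI. This is a minimal sketch; the `cmu-flame` organization name is an assumption, so use the repository path shown on the model card of the checkpoint you want:

```bash
# Sketch: download one released checkpoint with the Hugging Face CLI.
# The "cmu-flame" organization is an assumption; copy the actual repository
# path from the model card on Hugging Face.
pip install -U "huggingface_hub[cli]"
huggingface-cli download cmu-flame/FLAME-MoE-1.7B-10.3B --local-dir checkpoints/FLAME-MoE-1.7B-10.3B
```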
Ensure you clone the repository recursively to include all submodules:

```bash
git clone --recursive https://github.com/cmu-flame/MoE-Research
cd MoE-Research
```

Set up the Conda environment using the provided script:

```bash
sbatch scripts/miscellaneous/install.sh
```

Note: This assumes you're using a SLURM-managed cluster. Adapt accordingly if running locally.
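If you are working outside SLURM, a minimal sketch of the local adaptation might look like the following; the environment name is hypothetical, so check install.sh for the one it actually creates:

```bash
# Sketch: run the install script directly instead of submitting it via sbatch.
# Assumes the script has no hard dependency on SLURM environment variables.
bash scripts/miscellaneous/install.sh
# "flame-moe" is a hypothetical environment name -- use the one install.sh creates.
conda activate flame-moe
```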
Use the following SLURM jobs to download and tokenize the dataset:

```bash
sbatch scripts/dataset/download.sh
sbatch scripts/dataset/tokenize.sh
```
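Tokenization reads the downloaded data, so it is worth confirming that the download job finished cleanly before submitting the tokenization job. A minimal sketch using standard SLURM accounting:

```bash
# Check that the download job completed successfully before tokenizing.
# Replace <jobid> with the ID printed by "sbatch scripts/dataset/download.sh".
sacct -j <jobid> --format=JobID,JobName,State,ExitCode
```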
Launch training jobs for the desired model configurations:

```bash
bash scripts/release/flame-moe-1.7b.sh
bash scripts/release/flame-moe-721m.sh
bash scripts/release/flame-moe-419m.sh
bash scripts/release/flame-moe-290m.sh
bash scripts/release/flame-moe-115m.sh
bash scripts/release/flame-moe-98m.sh
bash scripts/release/flame-moe-38m.sh
```
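The evaluation step below needs the SLURM job ID of the training run, so note it at submission time. A minimal sketch, assuming the release script submits a batch job that shows up in the queue under your user:

```bash
# Sketch: find the SLURM job ID of a running training job so it can be passed
# to the evaluation script later. The format codes print job ID, name, and state.
squeue -u "$USER" --format="%i %j %T"
```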
To evaluate a trained model, set the appropriate job ID and iteration number before submitting the evaluation script:

```bash
export JOBID=...   # Replace with your training job ID
export ITER=...    # Replace with the iteration to evaluate (e.g., 11029)
sbatch scripts/evaluate.sh
```

All training loss curves, for both the scaling-law studies and the final releases, are available in the following WandB workspace: https://wandb.ai/haok/flame-moe