Thanks to visit codestin.com
Credit goes to github.com

Skip to content

stepfun-ai/Step3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

StepFun: Cost-Effective Multimodal Intelligence

Chat Homepage
Hugging Face ModelScope Twitter Follow
Discord License
📰  Step3 Model Blog     |     📄  Step3 System Tech Report

Introduction

Step3 is our cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.

Step3 model card:

Config Value
Number of Layers (Dense layer included) 61
Number of Dense Layers 5
Hidden Dimension 7168
Attention Mechanism MFA
Low-rank Query Dimension 2048
Number of Query Heads 64
Head Dimension 256
Number of Experts 48
Selected Experts per Token 3
Number of Shared Experts 1
Max Context Length 65536
Tokenizer Deepseek V3
Total Parameters (LLM) 316B
Activated Params per Token 38B
Total Parameters (VLM) 321B

Evaluation Results

Deployment

You can access Step3's API on https://platform.stepfun.com/ , we provide OpenAI/Anthropic-compatible API for you.

Our model checkpoints are stored in bf16 and block-fp8 format, you can find it on Huggingface.

Currently, it is recommended to run Step3 on the following inference engines:

  • vLLM
  • SGLang

Deployment and Request examples for vLLM and SGLang can be found in the Model Deployment Guide.

Contact Us

If you have any questions, please reach out at [email protected] .

License

Both the code repository and the model weights are released under the Apache License (Version 2.0).

Citation

@misc{step3system,
      title={Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding}, 
      author={StepFun Team},
      year={2025},
      eprint={2507.19427},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.19427}, 
}

@misc{step3blog,
      title={Step3: Cost-Effective Multimodal Intelligence}, 
      author={StepFun Team},
      url={https://stepfun.ai/research/step3}, 
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •