Thanks to visit codestin.com
Credit goes to github.com

Skip to content

SmallDoges/small-doge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

SmallDoges

SmallDoge: Ultra-Fast Small Language Models

Train a 20M parameter language model in just 3 hours! ๐Ÿš€

SmallDoge is a family of dynamic, ultra-fast small language models designed for efficiency and accessibility.

โœจ Key Features

  • ๐Ÿš€ Ultra-Fast Training: 3-hour training for 20M models
  • ๐Ÿ’ก Innovative Architecture: Dynamic Mask Attention + Cross Domain MoE
  • ๐ŸŽ๏ธ Lightning Inference: 142 tokens/s on i7-11 CPU
  • ๐Ÿ”ง Complete Toolkit: Pre-training โ†’ Instruction Fine-tuning โ†’ Reasoning Fine-tuning
  • ๐ŸŒ Web Interface: Built-in chat interface and OpenAI-compatible API
Doge-60M-Instruct demo
Webui-Doge-320M-Instruct running on i7-11 CPU

๐Ÿš€ Quick Start

Installation

git clone https://github.com/SmallDoges/small-doge.git
cd small-doge
pip install -e .

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
model_name = "SmallDoge/Doge-60M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text
prompt = "Explain machine learning in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Web Interface

# Install WebUI
pip install -e '.[webui]'

# Launch interface
small-doge-webui

Access: http://localhost:7860 (Frontend) | http://localhost:8000 (API)

๐Ÿ“– Detailed guides: Quick Start | Installation | Training

๐Ÿ“Š Available Models

Model Size Speed (i7-11 CPU) MMLU Use Case
Doge-20M 20M 142 tok/s 25.4 Ultra-fast prototyping
Doge-60M 60M 62 tok/s 26.4 Balanced performance
Doge-160M 160M 28 tok/s 29.2 Better reasoning
Doge-320M 320M 16 tok/s 33.8 Production ready

Instruction Models: Add -Instruct to any model name for chat-optimized versions.

Checkpoints: Add -checkpoint for continued training (see Model Docs).

๐Ÿ—๏ธ Architecture

Doge Architecture

Key Innovations:

  • Dynamic Mask Attention: Dynamic attention mechanism for efficient long sequences
  • Cross Domain Mixture of Experts: Sparse experts with dense-to-sparse continuation training
  • WSD Scheduler: Warmup-Stable-Decay for seamless checkpoint resumption

๐ŸŽ“ Training Pipeline

SmallDoge supports complete three-stage training:

  1. Pre-training โ†’ Base models (Doge-Base)
  2. Instruction Fine-tuning โ†’ Chat models (Doge-Instruct)
  3. Reasoning Fine-tuning โ†’ Reasoning models (Doge-Reason)

Key Features:

  • ๐Ÿš€ One-stop processor: Unified data handling across all stages
  • ๐Ÿ”ง Flexible recipes: Pre-configured training configs
  • ๐Ÿ“Š Efficient training: Optimized for small models
  • ๐Ÿ”„ Seamless continuation: WSD scheduler for checkpoint resumption

Training Times (RTX 4090):

  • Doge-20M: 14 hours | Doge-60M: 128 hours | Doge-160M: 522 hours | Doge-320M: 1856 hours

๐Ÿ“š Learn more: Training Guide

๐Ÿ“ˆ Evaluation Results

Base Models

Model MMLU ARC PIQA HellaSwag Winogrande
Doge-20M 25.4 29.8 58.4 27.3 50.2
Doge-60M 26.4 37.9 61.4 31.5 50.8
Doge-160M 29.2 44.4 70.1 43.4 52.2
Doge-320M 33.8 52.1 73.9 52.7 55.0

Instruction Models

Model IFEval MMLU BBH Performance
Doge-20M-Instruct 7.3 26.3 18.3 Good for basic chat
Doge-60M-Instruct 7.4 27.5 27.7 Balanced chat model
Doge-160M-Instruct 16.8 29.7 29.1 Advanced reasoning

๐Ÿ” Evaluation toolkit: Evaluation Guide

๐Ÿ› ๏ธ Use Cases

  • ๐Ÿค– Edge AI: Deploy on resource-constrained devices
  • ๐ŸŽฎ Gaming: Real-time NPC dialogue and game mechanics
  • ๐Ÿ“ฑ Mobile Apps: On-device AI assistants
  • ๐Ÿ”ฌ Research: Fast prototyping and experimentation
  • ๐Ÿ“š Education: Learning AI/ML with manageable models
  • ๐Ÿญ Industry: Lightweight production deployments

๐Ÿ“ฆ Project Structure

small-doge/
โ”œโ”€โ”€ src/small_doge/          # Core implementation
โ”‚   โ”œโ”€โ”€ models/              # Model architectures  
โ”‚   โ”œโ”€โ”€ trainer/             # Training code
โ”‚   โ”œโ”€โ”€ processor/           # Data processing
โ”‚   โ””โ”€โ”€ webui/               # Web interface
โ”œโ”€โ”€ recipes/                 # Training recipes
โ”‚   โ””โ”€โ”€ doge/                # Doge model configs
โ”œโ”€โ”€ examples/                # Tutorials & examples
โ”œโ”€โ”€ evaluation/              # Evaluation toolkit
โ”œโ”€โ”€ docs/                    # Documentation
โ””โ”€โ”€ assets/                  # Images & resources

๐Ÿค Contributing

We welcome contributions! Here's how you can help:

  • ๐Ÿ› Report bugs: GitHub Issues
  • ๐Ÿ’ก Suggest features: Discussions
  • ๐Ÿ“š Improve docs: Submit PRs for documentation
  • ๐Ÿ‹๏ธ Share models: Contribute trained models and recipes
  • ๐Ÿ’ฌ Join community: Discord

๐Ÿ“š Documentation

๐Ÿ“„ Citation

@misc{smalldoges2025,
    title={SmallDoges: A Family of Dynamic Ultra-Fast Small Language Models}, 
    author={Jingze Shi and Yifan Wu and Bingheng Wu and Yuyu Luo},
    year={2025},
    month={March},
    url={https://github.com/SmallDoges/small-doge}
}

๐Ÿ“„ License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.


Built with โค๏ธ by the SmallDoge Team

Star History

Give us a โญ if you find SmallDoge helpful!