Genesis


A high-performance deep learning framework with educational clarity

📚 Documentation | 🚀 Quick Start | 📊 Performance


Overview

Genesis is a modern deep learning framework built from scratch, combining production-level performance with educational transparency. Featuring Triton-optimized kernels, automatic differentiation, and comprehensive neural network modules, Genesis serves both as a learning resource and a practical training framework.

Key Features

Core Capabilities

  • 🔥 High Performance: Triton-optimized GPU kernels achieving near-native performance
  • ⚡ Automatic Differentiation: Dynamic computational graph with full gradient support (see the sketch after this list)
  • 🧠 Neural Networks: Complete module library including transformers and attention mechanisms
  • 🎯 Mixed Precision: AMP support with FP16/BF16 training
  • 🚀 Distributed Training: Multi-GPU training with NCCL backend
  • 📦 Model Support: Built-in LLM implementations (Qwen) with training pipelines
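
What the dynamic graph means in practice, as a minimal hedged sketch; genesis.tensor, requires_grad, backward(), and .grad are assumptions modeled on PyTorch's autograd API:

import genesis

# Hypothetical API, assumed to mirror PyTorch autograd.
x = genesis.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = (x * x).sum()    # the graph is recorded as operations execute
y.backward()         # reverse-mode autodiff over the recorded graph
print(x.grad)        # dy/dx = 2x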

Technical Highlights

  • Modular backend system with clean CPU/CUDA separation
  • Advanced CUDA memory management with pooling and statistics
  • Unified operation dispatch routing to optimal implementations
  • Complete optimizer suite (Adam, AdamW, SGD) with schedulers
  • Production-ready training pipeline with checkpointing
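
Checkpointing in that pipeline could look like the following hedged sketch; genesis.save, genesis.load, and the state_dict methods are assumptions following PyTorch conventions:

# Hypothetical API, mirroring PyTorch checkpoint conventions.
state = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "epoch": epoch,
}
genesis.save(state, "checkpoint.pth")

# ...later, to resume training:
state = genesis.load("checkpoint.pth")
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])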

Performance

Genesis delivers competitive performance through hand-optimized Triton kernels:

Operation               Efficiency vs Reference
---------------------   -----------------------
Matrix Multiplication   ~95%
Softmax                 ~112%
LayerNorm               ~120%
Multi-Head Attention    ~97%
Benchmarked on NVIDIA A100 GPU
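
To reproduce a single data point, a minimal timing sketch; genesis.randn and the "cuda" device string are assumptions modeled on PyTorch, and exact numbers require a device synchronization call (whose name depends on the Genesis API) around the timed region:

import time
import genesis

# Hypothetical API: assumes genesis.randn and a "cuda" device string.
a = genesis.randn(4096, 4096, device="cuda")
b = genesis.randn(4096, 4096, device="cuda")

for _ in range(3):          # warm up so Triton compilation/autotuning is not timed
    c = a @ b

start = time.perf_counter()
for _ in range(10):
    c = a @ b
# NOTE: for accurate async-GPU timing, synchronize the device here.
elapsed = (time.perf_counter() - start) / 10

tflops = 2 * 4096**3 / elapsed / 1e12   # a square matmul does 2*N^3 FLOPs
print(f"matmul: {elapsed * 1e3:.2f} ms/iter, ~{tflops:.1f} TFLOP/s")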

Quick Start

Installation

# Clone repository
git clone https://github.com/phonism/genesis.git
cd genesis

# Install (CPU only)
pip install -e .

# Install with LLM support
pip install -e ".[llm]"

# Verify installation
python -c "import genesis; print(genesis.__version__)"

Basic Usage

import genesis
import genesis.nn as nn
import genesis.optim as optim

# Define model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = self.fc1(x).relu()
        x = self.dropout(x)
        return self.fc2(x)

# Training setup
model = Net()
optimizer = optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop (assumes a dataloader yielding (data, target) batches)
for data, target in dataloader:
    output = model(data)
    loss = criterion(output, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
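
Evaluation reuses the same forward pass; a hedged sketch, where eval() and genesis.no_grad() are assumptions mirroring PyTorch:

# Hypothetical: eval() and no_grad() assumed to mirror PyTorch.
model.eval()                    # switch dropout to inference behavior
with genesis.no_grad():         # skip graph construction during evaluation
    logits = model(data)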

Mixed Precision Training

from genesis.cuda import amp

scaler = amp.GradScaler()

for data, target in dataloader:
    with amp.autocast():                  # forward pass runs in reduced precision
        output = model(data)
        loss = criterion(output, target)

    scaler.scale(loss).backward()         # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)                # unscales gradients, skips the step on inf/NaN
    scaler.update()                       # adjust the loss scale for the next iteration
    optimizer.zero_grad()

Distributed Training

# Launch with one process per GPU:
#   torchrun --nproc_per_node=4 train.py

# Inside train.py:
import genesis.distributed as dist
from genesis.distributed import DistributedDataParallel as DDP

# Initialize the process group
dist.init_process_group(backend='nccl')

# Wrap the model
model = DDP(model)

# Train normally - gradients are synchronized automatically during backward()
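
Each torchrun worker must also pick its own GPU before building the model. LOCAL_RANK is set by torchrun itself, but genesis.cuda.set_device is an assumption mirroring torch.cuda.set_device; check the genesis.cuda utilities for the actual call:

import os
import genesis

# torchrun sets LOCAL_RANK to this process's GPU index on the node
local_rank = int(os.environ["LOCAL_RANK"])

# Hypothetical call, assumed to mirror torch.cuda.set_device
genesis.cuda.set_device(local_rank)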

Architecture

genesis/
├── tensor.py              # Core tensor with autograd
├── function.py            # Autodiff functions
├── backends/              # CPU/CUDA implementations
│   ├── cpu.py
│   ├── cuda.py
│   └── cuda_memory.py
├── ops/                   # Operation dispatch
├── nn/                    # Neural network modules
│   ├── modules/           # Layer implementations
│   ├── functional.py      # Functional operations
│   └── triton_ops/        # Optimized kernels
├── optim/                 # Optimizers
├── distributed/           # Multi-GPU support
└── cuda/                  # CUDA utilities & AMP
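
The ops/ package implements the unified operation dispatch mentioned above. As a conceptual illustration only (not Genesis's actual code), dispatch can be a registry keyed by operation and device:

# Conceptual sketch of device-keyed dispatch; illustrative, not Genesis's code.
_KERNELS = {}

def register(op, device):
    """Decorator that files an implementation under (op, device)."""
    def deco(fn):
        _KERNELS[(op, device)] = fn
        return fn
    return deco

@register("add", "cpu")
def add_cpu(a, b):
    return a + b        # stand-in for a CPU kernel

@register("add", "cuda")
def add_cuda(a, b):
    return a + b        # stand-in for a Triton kernel

def dispatch(op, device, *args):
    """Route a call to the registered implementation for the device."""
    return _KERNELS[(op, device)](*args)

print(dispatch("add", "cpu", 2, 3))   # -> 5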

Examples

Train Qwen LLM

cd apps/llm
python train_sft_qwen.py --amp --dtype fp16

Interactive Chat

cd apps/llm
python chat_qwen.py --checkpoint model.pth

Benchmarks

python benchmark/bench_matmul.py
python benchmark/bench_qwen_training.py

Documentation

Full guides and the API reference are available through the 📚 Documentation link at the top of this README.

Testing

# Run test suite
pytest tests/ -v

# With coverage
pytest tests/ --cov=genesis --cov-report=html
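
A new test is just a pytest function; a hedged sketch below, where genesis.randn and Tensor.numpy() are assumptions about the tensor API:

import numpy as np
import genesis

def test_matmul_matches_numpy():
    # Hypothetical API: genesis.randn and .numpy() assumed, PyTorch-style.
    a = genesis.randn(32, 64)
    b = genesis.randn(64, 16)
    out = (a @ b).numpy()
    ref = a.numpy() @ b.numpy()
    np.testing.assert_allclose(out, ref, rtol=1e-4, atol=1e-5)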

Contributing

We welcome contributions! Genesis is designed to be hackable and educational.

# Development setup
pip install -e ".[dev]"
black genesis/ && isort genesis/
pytest tests/

See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

Genesis builds on ideas from PyTorch, Triton, TinyGrad, and JAX. We thank these projects for their inspiration and the deep learning community for their support.


Built for deep learning researchers and practitioners

⭐ Star us on GitHub if you find Genesis useful!
