FemtoFlow is a lightweight deep learning library built entirely from scratch using Python and NumPy. It aims to provide a clear and understandable implementation of core neural network components, suitable for educational purposes and experimentation.
The library focuses on implementing essential building blocks like layers, optimizers, loss functions, and metrics, allowing users to construct and train various neural network architectures. It also includes implementations of more advanced concepts crucial for modern deep learning, particularly in Natural Language Processing.
- Pure NumPy Implementation: Built entirely using NumPy for numerical operations, making the underlying mechanics transparent.
- Modular Design: Easily extensible architecture based on `Layer` and `Network` classes.
- Core Components:
  - Layers: Base `Layer`, `MetaLayer` (sequential container), `Activation`, `Dense1D`, `Dense2D`, `Embedding`, `PositionalEmbedding`, `LayerNormalisation`, `InvertedDropout`.
  - Optimizers: Base optimizer interface integrated into the layers, with an ADAM implementation in `femto_flow.optimizers`.
  - Loss Functions: Interface for loss functions and their derivatives (CCE shown in the example).
  - Metrics: Interface for evaluation metrics (categorical accuracy shown in the example).
- Advanced Features:
  - Multi-Head Self-Attention: A `MultiHeadSelfAttention` layer, crucial for Transformer architectures (see the attention sketch after this list).
  - Byte-Pair Encoding (BPE): A `BytePairTokenizer` class for subword tokenization.
  - Vectorizer: Simple vocabulary mapping and sequence vectorization.
  - Positional Embeddings: Standard sinusoidal positional encodings combined with token embeddings (a sketch of the standard formulation also follows this list).
  - Layer Normalization: Standard layer normalization implementation.
  - Weight Initialization: Xavier/Glorot uniform initialization used in the Dense and Embedding layers.
  - Learning Rate Schedules: Support for dynamic learning rates (the example shows `ExponentialDecaySchedule` and the basic `LearningRateSchedule`).
- Training Loop: A flexible `fit` method in the `Network` class handling batching, epochs, validation, callbacks, and gradient clipping.
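For readers new to attention, the following is a minimal NumPy sketch of the standard scaled dot-product attention that multi-head self-attention is built from. It illustrates the math only; it is not femto_flow's `MultiHeadSelfAttention` implementation, and the function name is hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard scaled dot-product attention: softmax(q k^T / sqrt(d)) v.

    q, k, v have shape (seq_len, d). Illustrative sketch only, not
    femto_flow's MultiHeadSelfAttention.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # pairwise similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # attention weights sum to 1 per row
    return weights @ v                              # weighted sum of value vectors

# A multi-head layer applies this to several learned projections of its input
# in parallel and concatenates the results.
```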
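Likewise, the sinusoidal positional encoding mentioned above has a standard closed form; the sketch below shows the textbook version (sines on even dimensions, cosines on odd). femto_flow's `PositionalEmbedding` layer may differ in detail.

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Textbook sinusoidal positional encodings of shape (seq_len, d_model).

    Assumes an even d_model. Illustrative only; not femto_flow's exact code.
    """
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)   # (seq_len, d_model / 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                             # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                             # odd dimensions: cosine
    return pe

# A positional-embedding layer typically adds these encodings to the token embeddings.
```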
The project is structured around several key modules:
- `femto_flow/layers.py`: Contains definitions for all neural network layers. Each layer implements `forward` and `backward` methods (see the sketch below).
- `femto_flow/network.py`: Defines the `Network` class, responsible for assembling layers, managing the training process (`fit`), and making predictions (`predict`).
- `femto_flow/tokenizers.py`: Contains the `BytePairTokenizer` and `Vectorizer` classes for text processing.
- `femto_flow/optimizers.py`: Contains optimizer implementations (like ADAM) and learning rate schedules.
- `femto_flow/losses.py`: Defines loss functions (like CCE) and their derivatives (a sketch of the standard CCE pair also follows this list).
- `femto_flow/activations.py`: Defines activation functions (like Softmax and Swish) and their derivatives.
- `femto_flow/metrics.py`: Defines evaluation metrics (like categorical accuracy).
- `femto_flow/callbacks.py`: Contains callback classes for use during training (e.g., `PrintLRCallback`, `SaveOnProgressCallback`).
- `demos/`: Contains example scripts showcasing how to use the library (like the provided generative model).
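As a rough illustration of the layer interface, a femto_flow-style layer caches what it needs in `forward` and returns the input gradient from `backward`. The toy class below is schematic: in femto_flow it would subclass the base `Layer` class, and the way parameter gradients are handed to the optimizer is simplified away.

```python
import numpy as np

class ScaleLayer:
    """Toy layer that multiplies its input by one learnable scalar (illustration only)."""

    def __init__(self):
        self.w = np.array(1.0)   # single learnable parameter

    def forward(self, x):
        self.x = x               # cache the input for the backward pass
        return self.w * x

    def backward(self, output_gradient):
        self.dw = np.sum(output_gradient * self.x)  # gradient w.r.t. the parameter
        return self.w * output_gradient             # gradient w.r.t. the input (to the previous layer)
```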
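And for the loss interface, here is what a standard categorical cross-entropy and its derivative with respect to the predictions look like in NumPy. The actual `ff.losses.cce` / `d_cce` may use a different signature or normalization.

```python
import numpy as np

def cce(y_true, y_pred, eps=1e-12):
    """Categorical cross-entropy averaged over the batch (standard formulation)."""
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=-1))

def d_cce(y_true, y_pred, eps=1e-12):
    """Gradient of cce w.r.t. y_pred, for one-hot y_true of shape (batch, classes)."""
    return -y_true / (y_pred + eps) / y_true.shape[0]
```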
To install femto-flow from source:

- Clone the repository:

  ```bash
  git clone https://github.com/TQCB/femto-flow
  cd femto-flow
  ```

- Install dependencies:

  ```bash
  pip install numpy regex
  ```
Here's a simplified example based on the provided `main.py`, showing how to define and train a Transformer-like model:

```python
import numpy as np

import femto_flow as ff
import femto_flow.layers as l
# --- Configuration ---
output_size = 64
embed_size = 128
seq_len = 8
n_heads = 4
n_transformers = 3
dropout_rate = 0.1 # Using dropout now requires InvertedDropout
# --- Load Data (Example) ---
# x_train, y_train = load_your_batched_data(...)
# x_val, y_val = load_your_validation_data(...)
# Ensure data is NumPy arrays with shape (num_batches, batch_size, seq_len)
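# (Sketch) One possible way to build such batches from a flat list of token ids.
# `make_batches` is a hypothetical helper, not part of femto_flow; each input is a
# window of `seq_len` token ids and each target is the one-hot encoded next token.
def make_batches(token_ids, seq_len, batch_size, vocab_size):
    xs, ys = [], []
    for i in range(len(token_ids) - seq_len):
        xs.append(token_ids[i:i + seq_len])      # input window
        target = np.zeros(vocab_size)
        target[token_ids[i + seq_len]] = 1.0     # one-hot next token
        ys.append(target)
    n_batches = len(xs) // batch_size
    x = np.array(xs[:n_batches * batch_size]).reshape(n_batches, batch_size, seq_len)
    y = np.array(ys[:n_batches * batch_size]).reshape(n_batches, batch_size, vocab_size)
    return x, y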
# --- Model Definition ---
model = ff.network.Network()
def create_transformer():
    # Example Transformer block structure
    return l.MetaLayer([
        l.MultiHeadSelfAttention(input_dim=embed_size, n_dim=embed_size, n_heads=n_heads),
        l.LayerNormalisation(embed_size),
        # Feed-forward part
        l.Dense2D(input_dim=embed_size, output_dim=embed_size * 2),  # Example expansion
        l.Activation(ff.activations.Swish),  # Or ReLU, GeLU etc.
        l.InvertedDropout(dropout_rate),
        l.Dense2D(input_dim=embed_size * 2, output_dim=embed_size),
        l.InvertedDropout(dropout_rate),
        l.LayerNormalisation(embed_size),  # Often applied after the residual connection
    ])
# Input Embedding + Positional Encoding
model.add(l.PositionalEmbedding(seq_len=seq_len, output_dim=embed_size, vocab_size=output_size))
model.add(l.InvertedDropout(dropout_rate)) # Dropout after embedding
# Transformer Blocks
for _ in range(n_transformers):
    model.add(create_transformer())
# Final Layers for Classification/Generation
model.add(l.MultiHeadSelfAttention(input_dim=embed_size, n_dim=embed_size, n_heads=n_heads, return_sequences=False)) # Use last token output
model.add(l.LayerNormalisation(embed_size))
model.add(l.Dense1D(input_dim=embed_size, output_dim=output_size))
model.add(l.Activation(ff.activations.Softmax)) # Output probabilities
# --- Build & Train ---
lr_schedule = ff.optimizers.LearningRateSchedule(1e-3) # Or use a decay schedule
optimizer = ff.optimizers.AdamOptimizer # Or another optimizer
model.build(loss=ff.losses.cce,
            d_loss=ff.losses.d_cce,
            metric=ff.metrics.categorical_accuracy,
            optimizer=optimizer,
            learning_rate_schedule=lr_schedule)
print(f"Parameter count: {model.param_count:,.0f}")
# Define callbacks if needed
# save_cb = ff.callbacks.SaveOnProgressCallback('checkpoints')
# model.fit(x_train, y_train,
#           epochs=50,
#           x_val=x_val, y_val=y_val,
#           validation=True,
#           callbacks=[save_cb],
#           batch_print_steps=10)
```

Planned improvements and future work:

- Implement dropout layer: the standard `Dropout` layer currently raises `NotImplementedError`.
- Refine BPE tokenizer: the Trie-based `transform` method in `BytePairTokenizer` needs thorough testing and possibly refinement for edge cases and efficiency. Add encoding/decoding pipeline methods.
- Add more optimizers:
  - SGD (with momentum)
  - RMSprop
  - AdaGrad
- Expand loss functions
- Add more activation functions:
  - GELU
- Implement more layer types:
  - Convolutional layers (Conv1D, Conv2D)
  - Pooling layers (MaxPooling, AveragePooling)
  - Recurrent layers (RNN, LSTM, GRU): might be challenging with just NumPy, but possible.
- Model serialization: add functionality to save and load trained model weights and architecture.
- Unit testing: develop a comprehensive test suite to ensure correctness of layers, optimizers, and training process.
- Documentation: improve docstrings and potentially add Sphinx documentation.
- Input validation: add more robust checks for input shapes and types in layers.
- Regularization: implement L1/L2 weight regularization options.
- Explore multi-latent attention.
The project is not currently in need of contributions, but they are always welcome.