This is a simple, easy-to-implement autoregressive Transformer model for sequence generation tasks such as SMILES generation. It is written entirely in PyTorch with minimal dependencies.
Features:
- Clean and lightweight implementation
- Supports autoregressive (causal) sequence modeling
- Includes loss computation and sampling functions
- Automatically selects a device (MPS, CUDA, or CPU), as sketched below
- Easy to integrate with any custom vocabulary
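The device auto-selection mentioned above might look like the following sketch (the helper name is assumed, not taken from this repository):

```python
import torch

def get_device():
    """Pick MPS, then CUDA, then CPU; the function name is hypothetical."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")
```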
The model uses:
- Token and positional embeddings
- A Transformer encoder with causal masking
- A linear output layer projecting to vocabulary logits
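A minimal sketch of this architecture, assuming learned positional embeddings and PyTorch's built-in encoder stack (the class name and hyperparameters are illustrative, not the repository's actual code):

```python
import torch
import torch.nn as nn

class SketchTransformer(nn.Module):
    # Illustrative hyperparameters; the real model may differ.
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=4, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_emb = nn.Embedding(max_len, d_model)      # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)         # projects to vocabulary logits

    def forward(self, x):
        # x: (batch, seq_len) integer token ids
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok_emb(x) + self.pos_emb(pos)
        # The causal mask prevents each position from attending to future tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.encoder(h, mask=mask)
        return self.head(h)  # (batch, seq_len, vocab_size) logits
```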
The TransformerModel class wraps this architecture and provides:
- `compute_loss()` for training
- `sample()` for autoregressive generation
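As a rough illustration (the actual signatures in this repository may differ), these two methods plausibly boil down to shifted next-token cross-entropy and token-by-token sampling:

```python
import torch
import torch.nn.functional as F

def compute_loss_sketch(model, tokens):
    """Next-token cross-entropy with a one-position shift (illustrative)."""
    logits = model(tokens[:, :-1])                    # predict token t+1 from the prefix up to t
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),          # (batch*seq, vocab)
        tokens[:, 1:].reshape(-1),                    # targets shifted left by one
    )

@torch.no_grad()
def sample_sketch(model, start_token, max_len=64, temperature=1.0):
    """Autoregressive temperature sampling from a single start token (illustrative)."""
    tokens = torch.tensor([[start_token]])
    for _ in range(max_len - 1):
        logits = model(tokens)[0, -1] / temperature   # logits for the next token only
        next_tok = torch.multinomial(F.softmax(logits, dim=-1), 1)
        tokens = torch.cat([tokens, next_tok.view(1, 1)], dim=1)
    return tokens.squeeze(0).tolist()
```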
Requirements:
- Python 3.8+
- PyTorch
- RDKit (optional, for SMILES visualization)
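If RDKit is installed, a generated SMILES string can be rendered to an image; the string below is a placeholder for a decoded sample:

```python
from rdkit import Chem
from rdkit.Chem import Draw

smiles = "CCO"                    # placeholder; substitute a decoded sample
mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES is invalid
if mol is not None:
    Draw.MolToFile(mol, "molecule.png")
```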