Sultan Alrashed

Research engineer at KAUST (in Prof. Francesco Orabona's lab) working on pretraining bilingual Arabic-English LLMs. I've spent the last few years getting my hands dirty with large-scale distributed training! Most recently, I released AraMix, a SOTA Arabic pretraining dataset.

Previously at SDAIA as a founding member of the ALLaM team, where I helped build Saudi Arabia's flagship Arabic language model.

Experience

Research Engineer
King Abdullah University of Science & Technology (KAUST) Thuwal, SA

Research engineer in the OPTIMAL lab under Prof. Francesco Orabona. Leading development of a bilingual Arabic-English LLM at the 3B parameter and 5T token scale.

Artificial Intelligence Engineer
Saudi Data & Artificial Intelligence Authority (SDAIA) Riyadh, SA

Founding member of the ALLaM team; since we were initially understaffed, I worked across the full LLM pipeline.

Research Engineering Fellowship
KAUST & SDAIA Partnership Thuwal, SA

Selected for a fellowship program in which I focused on AI for education. Built an AI-based learning management system for the Ministry of Education, which I presented to the Minister at GAIN 2024; the system has since begun piloting in public schools.

Publications & Preprints

First Author EACL Findings 2026
Cards Against Contamination: TCG-Bench for Difficulty-Scalable Multilingual LLM Reasoning
A contamination-resistant English/Arabic text-game suite with scalable difficulty.
Co-Author AISTATS 2026 | CPAL 2026 Spotlight Track
Beyond the Ideal: Analyzing the Inexact Muon Update
Analysis of Muon's inexact orthogonalized update with performance bounds under an additive-error LMO framework.
First Author arXiv 2025
AraMix: Recycling, Refiltering, and Deduplicating to Deliver the Largest Arabic Pretraining Corpus
Introduces the largest and best-performing Arabic pretraining dataset at 178B tokens. Uses cross-dataset agreement as a quality signal.
First Author arXiv 2025
SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data
Introduces the largest multi-turn, tool-calling, reasoning-inclusive Arabic post-training dataset, at around 2B tokens.
Co-Author arXiv 2025
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
Contributed the Saudi Arabic dialect portion of this multilingual version of the PIQA benchmark.
Co-Author ICML 2025 World Models Workshop
ReviseQA: A Benchmark for Belief Revision in Multi-Turn Logical Reasoning
Benchmark testing logical consistency under iterative context updates.
First Author arXiv 2024
Fineweb-Edu-Ar: Machine-translated Corpus to Support Arabic Small Language Models
Introduces a 202B-token machine-translated Arabic corpus derived from FineWeb-Edu for training and evaluating Arabic language models, the largest such corpus at the time.
Solo Author arXiv 2024
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs
Shows that higher LR:batch-size ratios can boost reasoning in small LMs. Achieved the highest IFEval score of any sub-3B model at release.
Co-Author ICLR 2025
ALLaM: A Series of Large Language Models for Arabic and English
SDAIA's bilingual Arabic/English pretrained LLM series. I initially worked on pretraining, then focused on finetuning and alignment as the team scaled.
Co-Author ACL Main 2024
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards
A deep dive into the sensitivity of LLM benchmarks and evaluations to minor structural perturbations; many of the results were contributed to LM-Harness.

Projects & Volunteering

PyTorch Captum Contributions Improved LLM support in PyTorch's Captum library, extending it to a wider range of models and tasks.
Megatron-Deepspeed Contributions Fixed a backwards-compatibility bug.
Lighteval Contributions Added quantization support for vLLM models.
Nanotron Contributions Fixed bugs so the pretraining example in the docs runs correctly.
Next-Token Agent A project focused on pretraining and finetuning tiny language models to solve ASCII games by predicting each successive frame; it can perfectly solve mazes from FrozenLake.
Environment Encoder A proposal and implementation for training a reinforcement learning agent to play games from a vision-language model's embeddings instead of raw frames.
Reinforcement Learning Roguelike Solver For my university honours project, I wrote PPO, DQN, and A2C agents to compare the effects of perfect and imperfect information in a CLI game I built.
Cheatsheet A never-finished project implementing a new augmentation and training objective for image classification in computer vision models.