A from-scratch TypeScript implementation of Word2Vec Skip-Gram with Negative Sampling (Mikolov et al. 2013)
Word2Vec learns a dense vector (embedding) for every word in a corpus so that words used in similar contexts end up close together in vector space.
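"Close together" is usually measured with cosine similarity. The sketch below is a minimal standalone version; the `cosine` helper and the three toy vectors are made up for illustration and are not taken from the module or from a trained model.

```typescript
// Cosine similarity: dot product divided by the product of the vector norms.
// Returns a value in [-1, 1]; 1 means the vectors point in the same direction.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // guard against zero vectors
}

// Hypothetical 3-d "embeddings" (real ones would have d = 100 dimensions).
const king = [0.9, 0.1, 0.3];
const queen = [0.8, 0.2, 0.35];
const banana = [-0.2, 0.9, -0.4];

// Words from similar contexts should score higher than unrelated ones.
const related = cosine(king, queen);
const unrelated = cosine(king, banana);
```

With these toy values, `related` comes out higher than `unrelated`, which is the property a trained model aims for.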
The skip-gram variant works like this: for every word in the text, look at the surrounding words (the "context window") and train a model to predict those neighbors. Instead of a full softmax over the entire vocabulary (expensive), we use negative sampling — for each real (word, context) pair, we sample k random "negative" words and train the model to distinguish the real pair from the fake ones.
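The two ideas above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the module's actual training loop: `skipGramPairs` and `sgnsStep` are hypothetical names, `W` and `C` stand for the target and context embedding matrices, and the window is fixed at `L` on each side.

```typescript
type Vec = Float64Array;

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

const dot = (a: Vec, b: Vec): number => {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
};

// Yield every (target, context) pair of word ids within L positions of each other.
function* skipGramPairs(ids: number[], L: number): Generator<[number, number]> {
  for (let t = 0; t < ids.length; t++) {
    const lo = Math.max(0, t - L);
    const hi = Math.min(ids.length - 1, t + L);
    for (let c = lo; c <= hi; c++) {
      if (c !== t) yield [ids[t], ids[c]];
    }
  }
}

// One SGD step for a real pair plus its sampled negatives.
// Minimizes -log σ(w·c_pos) - Σ log σ(-w·c_neg) and returns the loss
// measured before the update.
function sgnsStep(
  W: Vec[], C: Vec[],
  target: number, context: number, negatives: number[],
  eta: number,
): number {
  const w = W[target];
  const grad = new Float64Array(w.length); // accumulated gradient for w
  let loss = 0;
  // Label 1 for the real context word, 0 for each negative sample.
  const samples: [number, number][] = [
    [context, 1],
    ...negatives.map((n): [number, number] => [n, 0]),
  ];
  for (const [idx, label] of samples) {
    const c = C[idx];
    const p = sigmoid(dot(w, c));
    loss += label === 1 ? -Math.log(p) : -Math.log(1 - p);
    const g = p - label; // derivative of the loss w.r.t. the score w·c
    for (let i = 0; i < w.length; i++) {
      grad[i] += g * c[i];    // uses c before it is updated
      c[i] -= eta * g * w[i]; // update the context vector in place
    }
  }
  for (let i = 0; i < w.length; i++) w[i] -= eta * grad[i];
  return loss;
}
```

Each step touches only 1 + k context vectors instead of the whole vocabulary, which is where the saving over a full softmax comes from.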
- Tokenize: split the text into lowercase words
- Build vocabulary
- Build an alias table (Walker 1977)
- Initialize embeddings
- Train
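The alias-table step is the least obvious one in this list, so here is a rough sketch of how it can work. It assumes negatives are drawn in proportion to `count^alpha` (the `alpha` parameter, 0.75 by default); `smoothedWeights`, `buildAliasTable`, and `sampleAlias` are hypothetical names and may not match what `alias.ts` exports.

```typescript
interface AliasTable {
  prob: Float64Array; // chance of keeping slot i instead of its alias
  alias: Int32Array;  // fallback index for slot i
}

// Raise raw counts to the power alpha to flatten the distribution,
// so rare words are sampled a little more often than their raw frequency.
function smoothedWeights(counts: number[], alpha: number): number[] {
  return counts.map((c) => Math.pow(c, alpha));
}

// Build the table in O(n) by pairing under-full slots with over-full ones.
function buildAliasTable(weights: number[]): AliasTable {
  const n = weights.length;
  const total = weights.reduce((s, w) => s + w, 0);
  const scaled = weights.map((w) => (w / total) * n); // mean of 1 per slot
  const prob = new Float64Array(n);
  const alias = new Int32Array(n);
  const small: number[] = [];
  const large: number[] = [];
  scaled.forEach((p, i) => { (p < 1 ? small : large).push(i); });
  while (small.length > 0 && large.length > 0) {
    const s = small.pop()!;
    const l = large.pop()!;
    prob[s] = scaled[s];
    alias[s] = l;                      // slot s is topped up by l
    scaled[l] = scaled[l] + scaled[s] - 1;
    (scaled[l] < 1 ? small : large).push(l);
  }
  // Whatever remains is 1 up to floating-point error.
  for (const i of [...small, ...large]) {
    prob[i] = 1;
    alias[i] = i;
  }
  return { prob, alias };
}

// O(1) draw: pick a slot uniformly, then keep it or jump to its alias.
function sampleAlias(table: AliasTable, rand: () => number = Math.random): number {
  const i = Math.floor(rand() * table.prob.length);
  return rand() < table.prob[i] ? i : table.alias[i];
}
```

Building the table costs O(n) once; after that, each negative sample takes two random numbers and one comparison regardless of vocabulary size.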
pnpm install
pnpm dev # starts the UI at http://localhost:5173
pnpm test # runs all module unit tests
pnpm typecheck # typechecks both module and ui

word2vec/
├── module/                  @word2vec/module — core library, zero dependencies
│   └── src/
│       ├── word2vec.ts      training loop (generator), mostSimilar, analogy
│       ├── math.ts          sigmoid, dot, cosineSimilarity
│       ├── tokenizer.ts     tokenize, countWords
│       ├── alias.ts         Walker's alias method for O(1) sampling
│       ├── projection.ts    PCA and t-SNE for 2D visualization
│       ├── types.ts         all type definitions
│       └── __tests__/       32 unit tests (vitest)
└── ui/                      @word2vec/ui — Vite + React 19 + Tailwind v4
    └── src/
        ├── workers/         Web Worker for off-thread training
        ├── hooks/           useTraining (state machine via useReducer)
        ├── pages/           TrainPage, ExplorePage
        └── components/      FileUpload, ConfigPanel, LossChart, EmbeddingMap, etc.
import {
  tokenize, countWords,
  train, mostSimilar, analogy, getEmbedding,
  pcaProject2D, tsneProject2D,
} from "@word2vec/module";

const tokens = tokenize("the king rules the kingdom ...");
const wordCounts = countWords(tokens);
// train() is a generator — iterate to drive training
const gen = train(tokens, wordCounts, {
  d: 100,       // embedding dimensions
  L: 5,         // context window half-size
  k: 10,        // negative samples per positive pair
  eta: 0.025,   // initial learning rate
  epochs: 5,    // full passes over the corpus
  minCount: 2,  // discard words with count below this
  debug: true,  // yield per-pair debug events
});
let model;
for (const event of gen) {
  if (event.type === "init") console.log(`Vocab: ${event.vocabSize}`);
  if (event.type === "epoch") console.log(`Epoch ${event.epoch} — loss: ${event.avgLoss.toFixed(4)}`);
  if (event.type === "done") model = event.model;
}

// Top 10 most similar words
mostSimilar(model, "king", 10);
// → [{ word: "queen", similarity: 0.69 }, ...]
// Analogy: king is to queen as man is to ???
analogy(model, "king", "queen", "man", 5);
// → [{ word: "woman", similarity: 0.58 }, ...]
// Raw embedding vector
getEmbedding(model, "king");
// → number[100]
// Sum of target + context embeddings (sometimes better)
getEmbedding(model, "king", true);

// PCA — fast, deterministic
const points: [number, number][] = pcaProject2D(model.W);
// t-SNE — slower, better local structure
const tsnePoints = tsneProject2D(model.W, 30, 500, 10);
// perplexity, iterations, learningRate

| Parameter | Default | Description |
|---|---|---|
| `d` | 100 | Embedding dimensions |
| `L` | 5 | Max context window half-size |
| `k` | 10 | Negative samples per positive pair |
| `alpha` | 0.75 | Smoothing exponent for P_alpha |
| `eta` | 0.025 | Initial learning rate |
| `epochs` | 5 | Full passes over the corpus |
| `minCount` | 2 | Min word frequency to keep |
| `debug` | false | Emit per-pair debug events |

- Mikolov et al. — Efficient Estimation of Word Representations in Vector Space (2013)
- Mikolov et al. — Distributed Representations of Words and Phrases and their Compositionality (2013)
- Goldberg & Levy — word2vec Explained (2014)
- Walker — An Efficient Method for Generating Discrete Random Variables with General Distributions (1977)
- van der Maaten & Hinton — Visualizing Data using t-SNE (2008)
