
maxsim-web

⚡ High-performance MaxSim scoring with WebAssembly and SIMD for ColBERT-style retrieval

npm version License: MIT

🚀 Live Demo · 📊 Performance Report · 📚 API Guide


Features

  • 🚀 3-5x faster than JavaScript - WASM+SIMD optimized implementation
  • 📦 45KB gzipped - Tiny bundle size, 12% smaller than v0.5.0
  • ⚡ Preloading API - Load documents once, search thousands of times
  • 🎯 Zero dependencies - Pure WASM implementation
  • 🔧 Simple API - Works with any L2-normalized embeddings


Performance (v0.6.0)

Latest benchmarks:

Scenario          Documents   Query Tokens   Performance   vs JavaScript
Variable Large    1000 docs   32 tokens      265ms         3.55x faster 🔥
Variable Medium   1000 docs   13 tokens      134ms         3.34x faster 🔥

With preloading:

  • Load documents once: ~230ms (one-time cost)
  • Each search: ~265ms (vs 479ms non-preloaded)
  • 1.81x faster per search + zero conversion overhead
  • Break-even after 2 searches (230ms load + 2 × 265ms = 760ms preloaded vs 2 × 479ms = 958ms without)

See Performance Report for detailed benchmarks and analysis.


What's New in v0.6.0

🚀 Major performance improvement: 2.8-5.2x faster!

  • Replaced cosine similarity with plain dot products (3-5x faster per operation)
  • Binary size: 51KB → 45KB (12% smaller)
  • WASM now 3-5x faster than JavaScript (was 1.0-1.2x)
  • No breaking changes - all APIs work the same

See Release Notes for details.


Installation

npm install maxsim-web

Quick Start

Basic Usage

import { createMaxSim } from 'maxsim-web';

// Initialize (auto-detects best: WASM+SIMD, WASM, or JS fallback)
const maxsim = await createMaxSim();

// Prepare embeddings (must be L2-normalized!)
const queryEmbedding = [[0.1, 0.2, ...], ...];  // [query_tokens, embedding_dim]
const docEmbeddings = [
  [[0.3, 0.4, ...], ...],  // Doc 1
  [[0.5, 0.6, ...], ...],  // Doc 2
];

// Compute MaxSim scores
const scores = maxsim.maxsimBatch(queryEmbedding, docEmbeddings);
console.log(scores);  // Float32Array of scores (one per document)

Preloading API (Recommended for Production)

Use case: Search the same document set repeatedly with different queries

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Step 1: Prepare documents as flat arrays (one-time conversion)
const embeddingDim = 256;
const docTokenCounts = new Uint32Array([doc1.length, doc2.length, ...]);

// Flatten all document embeddings into single Float32Array
const allEmbeddings = new Float32Array(totalTokens * embeddingDim);
// ... copy embeddings into allEmbeddings (see the flattening sketch below) ...

// Step 2: Load documents (one-time, ~230ms for 1000 docs)
await maxsim.loadDocuments(allEmbeddings, docTokenCounts, embeddingDim);

// Step 3: Search repeatedly (fast! ~265ms per search)
const queryFlat = new Float32Array(queryTokens * embeddingDim);
// ... copy query into queryFlat ...

const scores1 = maxsim.wasmInstance.search_preloaded(queryFlat, queryTokens);
const scores2 = maxsim.wasmInstance.search_preloaded(queryFlat2, queryTokens2);
// ... search 1000s of times with zero conversion overhead!

Performance benefit:

  • First search: 230ms (load) + 265ms (search) = 495ms
  • Subsequent searches: 265ms each (vs 479ms non-preloaded)
  • Recommended for 10+ searches on same document set
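
Step 1's flattening above is elided; one possible helper (a sketch, not part of the library API), assuming docs is an array of per-document 2D embeddings ([tokens][embedding_dim]) that are already L2-normalized:

// Hypothetical helper: flatten per-document 2D embeddings into the single
// Float32Array plus per-document token counts that loadDocuments() expects.
function flattenDocuments(docs, embeddingDim) {
  const docTokenCounts = new Uint32Array(docs.map(doc => doc.length));
  const totalTokens = docs.reduce((sum, doc) => sum + doc.length, 0);
  const allEmbeddings = new Float32Array(totalTokens * embeddingDim);

  let offset = 0;
  for (const doc of docs) {
    for (const token of doc) {
      allEmbeddings.set(token, offset);  // copy one token vector
      offset += embeddingDim;
    }
  }
  return { allEmbeddings, docTokenCounts };
}

// Usage:
// const { allEmbeddings, docTokenCounts } = flattenDocuments(docs, 256);
// await maxsim.loadDocuments(allEmbeddings, docTokenCounts, 256);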

API Reference

Main Methods

maxsimBatch(queryEmbedding, docEmbeddings)

Compute MaxSim scores for multiple documents (raw sum).

Parameters:

  • queryEmbedding: Array of query token embeddings [query_tokens][embedding_dim]
  • docEmbeddings: Array of document embeddings [num_docs][doc_tokens][embedding_dim]

Returns: Float32Array of scores (one per document)

Use case: Ranking documents for a single query

Example:

const query = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]];  // 2 tokens
const docs = [
  [[0.7, 0.8, 0.9]],           // Doc 1: 1 token
  [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  // Doc 2: 2 tokens
];
const scores = maxsim.maxsimBatch(query, docs);
// scores = Float32Array [score1, score2]
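
For reference, the score being computed is ColBERT's MaxSim: for each document, the sum over query tokens of the maximum dot product against any document token. A plain-JS sketch of the same computation for a single document (not the library's optimized WASM path):

// Reference-only implementation of the raw-sum MaxSim score for one document.
function maxsimScore(queryEmbedding, docEmbedding) {
  let score = 0;
  for (const q of queryEmbedding) {
    let best = -Infinity;
    for (const d of docEmbedding) {
      let dot = 0;
      for (let i = 0; i < q.length; i++) dot += q[i] * d[i];
      if (dot > best) best = dot;
    }
    score += best;  // best-matching document token for this query token
  }
  return score;
}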

maxsimBatch_normalized(queryEmbedding, docEmbeddings)

Same as maxsimBatch but returns averaged scores (score / query_tokens).

Use case: Comparing scores across queries with different lengths

Example:

// Query A: 10 tokens, raw score = 25.0 → normalized = 2.5
// Query B: 20 tokens, raw score = 40.0 → normalized = 2.0
// Query B wins on raw score, but Query A ranks higher once normalized
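
The call itself is identical to maxsimBatch; only the scale of the returned scores changes. A small sketch reusing the query and docs from the maxsimBatch example above:

const rawScores = maxsim.maxsimBatch(query, docs);
const normScores = maxsim.maxsimBatch_normalized(query, docs);
// normScores[i] ≈ rawScores[i] / query.length  (averaged per query token)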

loadDocuments(embeddingsFlat, docTokenCounts, embeddingDim)

Load documents for repeated searching (preloading API).

Parameters:

  • embeddingsFlat: Float32Array - all embeddings concatenated
  • docTokenCounts: Uint32Array - token count per document
  • embeddingDim: number - embedding dimension

Returns: Promise<void>

Use case: Search the same documents with 10+ different queries

Example:

const dim = 256;
const docs = [
  new Float32Array(10 * dim),  // Doc 1: 10 tokens
  new Float32Array(20 * dim),  // Doc 2: 20 tokens
];

const allEmbeddings = new Float32Array(30 * dim);
allEmbeddings.set(docs[0], 0);
allEmbeddings.set(docs[1], 10 * dim);

const docTokenCounts = new Uint32Array([10, 20]);

await maxsim.loadDocuments(allEmbeddings, docTokenCounts, dim);

search_preloaded(queryFlat, queryTokens)

Search preloaded documents (fast!).

Parameters:

  • queryFlat: Float32Array - query embeddings (flattened)
  • queryTokens: number - number of query tokens

Returns: Float32Array of scores

Use case: After calling loadDocuments(), search repeatedly with zero overhead

Example:

const queryFlat = new Float32Array(5 * 256);  // 5 tokens × 256 dims
// ... fill queryFlat with embeddings ...

const scores = maxsim.wasmInstance.search_preloaded(queryFlat, 5);

Utility Methods

numDocumentsLoaded()

Get number of preloaded documents.

Returns: number

Example:

console.log(`Loaded ${maxsim.numDocumentsLoaded()} documents`);

getInfo()

Get implementation details (SIMD support, version, etc.).

Returns: string

Example:

console.log(maxsim.getInfo());
// "MaxSim WASM v0.6.0 (SIMD: true, adaptive_batching: true, ...)"

Requirements

⚠️ Important: maxsim-web requires L2-normalized embeddings

Modern embedding models (ColBERT, BGE, E5, Jina, etc.) output normalized embeddings by default.

Why this matters:

For L2-normalized embeddings (unit vectors):

dot_product(a, b) === cosine_similarity(a, b)

Because cosine_similarity(a, b) = dot(a, b) / (|a| · |b|) and unit vectors have |a| = |b| = 1, the two are identical. This lets maxsim-web use plain dot product operations (3-5x faster than computing the full cosine similarity).

Verify your embeddings are normalized:

function isNormalized(embedding) {
  const magnitude = Math.sqrt(
    embedding.reduce((sum, val) => sum + val * val, 0)
  );
  return Math.abs(magnitude - 1.0) < 0.01;  // Within 1%
}

console.log(isNormalized(embedding));  // Should be true
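
If isNormalized returns false, you can L2-normalize each token vector yourself before passing it to maxsim-web; a minimal sketch:

// L2-normalize a single token embedding in place (plain array or Float32Array).
function normalize(embedding) {
  let sumSq = 0;
  for (const v of embedding) sumSq += v * v;
  const magnitude = Math.sqrt(sumSq);
  if (magnitude === 0) return embedding;  // avoid division by zero
  for (let i = 0; i < embedding.length; i++) embedding[i] /= magnitude;
  return embedding;
}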

Browser Compatibility

Browser        WASM+SIMD        WASM             JavaScript
Chrome 91+     3-5x faster ✅   2-3x faster ✅   Baseline
Edge 91+       3-5x faster ✅   2-3x faster ✅   Baseline
Firefox 89+    3-5x faster ✅   2-3x faster ✅   Baseline
Safari 16.4+   3-5x faster ✅   2-3x faster ✅   Baseline
Node.js 16+    3-5x faster ✅   2-3x faster ✅   Baseline

maxsim-web automatically detects and uses the best available implementation.


Use Cases

1. Dense Retrieval (ColBERT-style)

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Embed query and documents
const queryEmb = await embedModel.encode(query);
const docEmbs = await Promise.all(docs.map(doc => embedModel.encode(doc)));

// Rank by MaxSim similarity
const scores = maxsim.maxsimBatch(queryEmb, docEmbs);
const topK = Array.from(scores)   // copy to a plain array so .map can return objects
  .map((score, idx) => ({ score, idx }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 10);

console.log('Top 10 documents:', topK);

2. Re-ranking Search Results

import { createMaxSim } from 'maxsim-web';

const maxsim = await createMaxSim();

// Load candidate documents from first-stage retrieval
const candidates = await firstStageSearch(query, { topK: 100 });
const candidateEmbs = candidates.map(doc => doc.embedding);

// Flatten and load
await maxsim.loadDocuments(flattenEmbeddings(candidateEmbs), docTokens, dim);

// Re-rank for each user query variation
const queries = [originalQuery, expandedQuery1, expandedQuery2];
const allScores = queries.map(q =>
  maxsim.wasmInstance.search_preloaded(flattenQuery(q), q.length)
);

// Combine scores (e.g., max, average, weighted)
const finalScores = combineScores(allScores);
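
combineScores is not part of maxsim-web; a minimal max-fusion sketch, assuming every score array covers the same candidate list:

// Hypothetical helper: element-wise max across the per-query score arrays.
function combineScores(allScores) {
  const combined = Float32Array.from(allScores[0]);
  for (const scores of allScores.slice(1)) {
    for (let i = 0; i < combined.length; i++) {
      if (scores[i] > combined[i]) combined[i] = scores[i];
    }
  }
  return combined;
}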

3. Semantic Search at Scale

import express from 'express';
import { createMaxSim } from 'maxsim-web';

const app = express();
const maxsim = await createMaxSim();

// Preload document collection (once at startup)
const docs = await loadDocuments();  // e.g., 100K documents
await maxsim.loadDocuments(docs.embeddings, docs.tokenCounts, 256);

// Handle search requests (fast!)
app.get('/search', async (req, res) => {
  const queryEmb = await embedModel.encode(req.query.q);
  const scores = maxsim.wasmInstance.search_preloaded(
    flattenQuery(queryEmb),
    queryEmb.length
  );

  const topK = getTopK(scores, 10);  // getTopK sketched below
  res.json({ results: topK.map(idx => ({ idx, score: scores[idx] })) });
});

app.listen(3000);  // ~265ms per search for 1000 docs!
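
getTopK above is a hypothetical helper, not a library export; one simple implementation returning the indices of the k highest scores, best first:

// Hypothetical helper: indices of the k highest scores, sorted descending.
function getTopK(scores, k) {
  return Array.from(scores.keys())
    .sort((a, b) => scores[b] - scores[a])
    .slice(0, k);
}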

4. Batch Processing Pipeline

import { createMaxSim } from 'maxsim-web';

async function processQueries(queries, documents) {
  const maxsim = await createMaxSim();

  // Load documents once
  await maxsim.loadDocuments(documents.embeddings, documents.tokenCounts, 256);

  // Process all queries (fast!)
  const results = queries.map(query => {
    const scores = maxsim.wasmInstance.search_preloaded(
      query.embedding,
      query.tokens
    );
    return getTopK(scores, 10);
  });

  return results;
}

// Process 1000 queries in ~4.4 minutes (vs 12.4 minutes with v0.5.0)
const results = await processQueries(queries, documents);

Performance Tips

  1. Use preloading for 10+ searches on the same document set

    • Break-even after 2 searches
    • Maximum benefit at 100+ searches
  2. Pre-flatten embeddings to Float32Array

    • Avoid 2D array conversion overhead
    • Direct WASM memory access
  3. Use WASM+SIMD browsers for best performance

    • Chrome/Edge 91+, Firefox 89+, Safari 16.4+
    • 3-5x faster than JavaScript fallback
  4. Batch documents together

    • Process multiple documents per call
    • Amortize function call overhead
  5. Profile your specific use case (see the timing sketch after this list)

    • Use browser DevTools Performance tab
    • Measure actual query/document sizes
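
For tip 5, a quick way to time a preloaded search on your own data is performance.now() (available in browsers and Node.js 16+); queryFlat and queryTokens are your own flattened query, as in the preloading example:

// Measure one preloaded search; repeat and average for stable numbers.
const t0 = performance.now();
const scores = maxsim.wasmInstance.search_preloaded(queryFlat, queryTokens);
const t1 = performance.now();
console.log(`search_preloaded took ${(t1 - t0).toFixed(1)}ms for ${scores.length} docs`);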

Examples

See examples/ directory for complete working examples:

  • basic-usage.js - Simple MaxSim scoring
  • preloading-api.js - Preloading for repeated searches
  • colbert-integration.js - Integration with ColBERT models
  • batch-processing.js - Large-scale batch processing
  • nodejs-server.js - Express server with MaxSim

Benchmarks

Run benchmarks yourself:

git clone https://github.com/joe32140/maxsim-web
cd maxsim-web
npm install
npm run benchmark
# Open http://localhost:8080/benchmark/

Or view online: MaxSim Web Benchmarks


Development

Build from Source

# Install Rust and wasm-pack
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo install wasm-pack

# Clone repository
git clone https://github.com/joe32140/maxsim-web
cd maxsim-web

# Install dependencies
npm install

# Build WASM
cd src/rust
RUSTFLAGS="-C target-feature=+simd128" wasm-pack build --target web --out-dir ../../dist/wasm

# Run benchmarks
cd ../..
npm run benchmark

Project Structure

maxsim-web/
├── src/
│   ├── rust/           # Rust WASM implementation
│   │   ├── src/lib.rs  # Core MaxSim algorithm
│   │   └── Cargo.toml
│   └── js/             # JavaScript wrappers
│       ├── maxsim-wasm.js
│       ├── maxsim-baseline.js
│       └── maxsim-optimized.js
├── dist/               # Built artifacts
│   └── wasm/           # WASM binaries
├── docs/               # Documentation
├── benchmark/          # Browser benchmarks
└── examples/           # Usage examples

Contributing

Contributions welcome! Please:

  1. Add tests for new features
  2. Run benchmarks to verify performance
  3. Update documentation
  4. Follow code style (rustfmt, prettier)

See CONTRIBUTING.md for details.


License

MIT - see LICENSE file


Citation

If you use maxsim-web in your research, please cite:

@software{maxsim_web,
  title = {maxsim-web: High-performance MaxSim scoring with WebAssembly},
  author = {Hsu, Joe},
  year = {2025},
  email = {joe32140@gmail.com},
  url = {https://github.com/joe32140/maxsim-web},
  version = {0.6.0}
}

Related Projects


Acknowledgments

  • Inspired by ColBERT's MaxSim scoring algorithm (Khattab & Zaharia, 2020)
  • Built with Rust and wasm-bindgen
  • SIMD optimizations based on modern browser capabilities
  • Performance testing inspired by fast-plaid benchmarks

Support


Made with ⚡ by Joe Hsu (2025)
