A tool for generating compressed persistence files for Orama databases with vector embeddings, designed to be used with the @orama/plugin-data-persistence plugin in browser environments.
This project generates a .zip file containing a complete Orama database with vector embeddings that can be restored using Orama's data persistence plugin. The generated persistence file uses binary format (Orama's default) for optimal compression, reducing file size from ~12MB (JSON) to ~7MB (binary).
- Configurable Embedding Models: Support for OpenAI embedding models and Orama's built-in embeddings
- Binary Persistence: Generates compressed binary files for optimal browser performance
- Flexible Input: Processes HTML or NJK files from configurable local directories
- Browser-Ready: Generated files are optimized for browser environments
npm install# Copy the example file
cp env.example .env
# Edit .env and configure your API keys
OPENAI_API_KEY=your-openai-api-key-hereCreate a .env file with the following variables:
# OpenAI API Key (required for OpenAI embeddings)
OPENAI_API_KEY=your-api-key-here
# Embedding Model Configuration
EMBEDDING_MODEL=openai # Options: 'openai' or 'orama'
OPENAI_MODEL=text-embedding-ada-002 # OpenAI model for embeddings
# Processing Configuration
LANG=en
VERSION=local- OpenAI Models: Uses OpenAI's embedding API with configurable models
- Orama Built-in: Uses Orama's native embedding capabilities
npm run generate-embeddingsnode generate-embeddings.jsnode penpot_chunks_generator.js ../docs/user-guide "**/*.{html,njk}" --out ./public --lang en --version localThe script generates a compressed persistence file in the public/ directory:
penpotRagToolContents.zip: Binary persistence file containing the complete Orama database with embeddings
The generated persistence file contains:
{
"id": "page-slug",
"path": "relative/path/to/file.html",
"url": "https://example.com/page/",
"title": "Page Title",
"description": "Page description",
"lang": "en",
"version": "local",
"sectionCount": 5,
"headings": ["H2", "H3", ...],
"kind": "guide|tutorial|reference",
"searchableText": "full text for search"
}{
"id": "page-slug#section-id",
"pageId": "page-slug",
"url": "https://example.com/page/#section",
"sourcePath": "relative/path/to/file.html",
"lang": "en",
"version": "local",
"breadcrumbs": ["Page", "H2 Section", "H3 Section"],
"sectionLevel": 2,
"sectionId": "section-id",
"heading": "Section title",
"text": "Section content",
"summary": "Section summary",
"hasCode": true,
"codeLangs": ["javascript", "css"],
"links": [{"text": "Link", "href": "/url"}],
"images": [{"alt": "Description", "src": "/image.png"}],
"tokens": 150,
"embedding": [0.1, 0.2, ...], // Vector of configurable dimensions
"vectorDim": 1536,
"searchableText": "full text for embeddings"
}docsRoot: Root directory of documentation (default:../docs/user-guide)pattern: File pattern to process (default:**/*.{html,njk})--out: Output directory (default:./public)--lang: Language (default:en)--version: Version (default:local)--baseUrl: Base URL for links (default:https://example.com/)
EMBEDDING_MODEL=openai
OPENAI_MODEL=text-embedding-ada-002 # or text-embedding-3-small, text-embedding-3-largeEMBEDDING_MODEL=oramaTo use the generated persistence file in a browser environment:
import { create, insertMultiple } from '@orama/orama'
import { restore } from '@orama/plugin-data-persistence'
// Load the persistence file
const response = await fetch('./penpotRagToolContents.zip')
const arrayBuffer = await response.arrayBuffer()
// Restore the database
const db = await restore(arrayBuffer)
// Search the restored database
const results = await search(db, {
term: 'your search query',
limit: 10
})Configure the environment variable:
export OPENAI_API_KEY="your-api-key"Verify that the documentation is available in ../docs/user-guide/
Reinstall dependencies:
npm installThe binary format significantly reduces file size compared to JSON. If you need even smaller files, consider:
- Reducing the number of processed files
- Adjusting chunk sizes
- Using different embedding models
- Binary Format: Reduces file size by ~40% compared to JSON
- Browser Optimization: Generated files are optimized for browser environments
- Memory Efficient: Uses streaming for large file processing
- Configurable Compression: Supports different compression levels
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details