Orama Persistence Generator

A tool for generating compressed persistence files for Orama databases with vector embeddings, designed to be used with the @orama/plugin-data-persistence plugin in browser environments.

Overview

This project generates a .zip file containing a complete Orama database with vector embeddings, which can be restored with Orama's data persistence plugin. The persistence file uses the binary format (Orama's default), which is more compact than JSON: it reduces the file size from about 12 MB (JSON) to about 7 MB (binary), roughly 40% smaller.

Features

Configurable Embedding Models: Support for OpenAI embedding models and Orama's built-in embeddings
Binary Persistence: Generates compressed binary files for optimal browser performance
Flexible Input: Processes HTML or NJK files from configurable local directories
Browser-Ready: Generated files are optimized for browser environments

Installation

1. Install dependencies

npm install

2. Configure API Keys

# Copy the example file
cp env.example .env

# Edit .env and configure your API keys
OPENAI_API_KEY=your-openai-api-key-here

Configuration

Environment Variables

Create a .env file with the following variables:

# OpenAI API Key (required for OpenAI embeddings)
OPENAI_API_KEY=your-api-key-here

# Embedding Model Configuration
EMBEDDING_MODEL=openai  # Options: 'openai' or 'orama'
OPENAI_MODEL=text-embedding-ada-002  # OpenAI model for embeddings

# Processing Configuration
LANG=en
VERSION=local
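
As an illustration of how these variables might be consumed, here is a minimal sketch. The function name loadConfig and the fallback values are assumptions based on the example .env above, not the project's actual code, and the sketch assumes the variables are already present in process.env (for example, loaded by a .env loader).

```javascript
// Sketch: read generator settings from environment variables.
// loadConfig and its fallback values are hypothetical; they mirror the example .env.
const SUPPORTED_PROVIDERS = ['openai', 'orama'];

function loadConfig(env = process.env) {
  const embeddingModel = env.EMBEDDING_MODEL || 'openai';
  if (!SUPPORTED_PROVIDERS.includes(embeddingModel)) {
    throw new Error(`Unsupported EMBEDDING_MODEL: ${embeddingModel}`);
  }
  // The API key is only needed when OpenAI produces the embeddings.
  if (embeddingModel === 'openai' && !env.OPENAI_API_KEY) {
    throw new Error('Missing OPENAI_API_KEY');
  }
  return {
    embeddingModel,
    openaiModel: env.OPENAI_MODEL || 'text-embedding-ada-002',
    lang: env.LANG || 'en',
    version: env.VERSION || 'local',
  };
}
```

Note that LANG is often already set by the shell (for example to en_US.UTF-8). Loaders such as dotenv do not override existing variables by default, so it is worth checking which value the script actually sees.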

Embedding Model Options

OpenAI Models: Uses OpenAI's embedding API with configurable models
Orama Built-in: Uses Orama's native embedding capabilities

Usage

Generate Persistence File (Recommended)

npm run generate-embeddings

Run Directly

node generate-embeddings.js

Run with Custom Parameters

node penpot_chunks_generator.js ../docs/user-guide "**/*.{html,njk}" --out ./public --lang en --version local

Generated Files

The script generates a compressed persistence file in the public/ directory:

penpotRagToolContents.zip: Binary persistence file containing the complete Orama database with embeddings

Data Structure

The generated persistence file contains:

Pages Metadata

{
  "id": "page-slug",
  "path": "relative/path/to/file.html",
  "url": "https://example.com/page/",
  "title": "Page Title",
  "description": "Page description",
  "lang": "en",
  "version": "local",
  "sectionCount": 5,
  "headings": ["H2", "H3", ...],
  "kind": "guide|tutorial|reference",
  "searchableText": "full text for search"
}

Chunks with Embeddings

{
  "id": "page-slug#section-id",
  "pageId": "page-slug",
  "url": "https://example.com/page/#section",
  "sourcePath": "relative/path/to/file.html",
  "lang": "en",
  "version": "local",
  "breadcrumbs": ["Page", "H2 Section", "H3 Section"],
  "sectionLevel": 2,
  "sectionId": "section-id",
  "heading": "Section title",
  "text": "Section content",
  "summary": "Section summary",
  "hasCode": true,
  "codeLangs": ["javascript", "css"],
  "links": [{"text": "Link", "href": "/url"}],
  "images": [{"alt": "Description", "src": "/image.png"}],
  "tokens": 150,
  "embedding": [0.1, 0.2, ...], // Vector of configurable dimensions
  "vectorDim": 1536,
  "searchableText": "full text for embeddings"
}
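
A quick sanity check on chunk records can catch mismatched vectors before they are persisted. The sketch below is hypothetical: validateChunk is not part of this project, and the rule that id equals "<pageId>#<sectionId>" is inferred from the example above rather than from the generator's code.

```javascript
// Sketch: collect problems found in a single chunk record.
// validateChunk is a hypothetical helper; field names follow the example above.
function validateChunk(chunk) {
  const problems = [];
  for (const field of ['id', 'pageId', 'sectionId', 'text']) {
    if (typeof chunk[field] !== 'string' || chunk[field] === '') {
      problems.push(`missing or empty "${field}"`);
    }
  }
  // Convention inferred from the example record: id = "<pageId>#<sectionId>".
  if (chunk.id !== `${chunk.pageId}#${chunk.sectionId}`) {
    problems.push('id does not match "<pageId>#<sectionId>"');
  }
  const isNumeric = Array.isArray(chunk.embedding) &&
    chunk.embedding.every((value) => Number.isFinite(value));
  if (!isNumeric) {
    problems.push('embedding is not an array of finite numbers');
  } else if (chunk.embedding.length !== chunk.vectorDim) {
    problems.push(`embedding length ${chunk.embedding.length} != vectorDim ${chunk.vectorDim}`);
  }
  return problems; // an empty array means no problems were found
}
```

An empty result means the record passed these checks; it does not guarantee that the record matches the database schema.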

Advanced Configuration

Script Parameters

docsRoot: Root directory of documentation (default: ../docs/user-guide)
pattern: File pattern to process (default: **/*.{html,njk})
--out: Output directory (default: ./public)
--lang: Language (default: en)
--version: Version (default: local)
--baseUrl: Base URL for links (default: https://example.com/)
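
For orientation, the parameters above could be parsed roughly as follows. This is a sketch using Node's built-in util.parseArgs (default option values require Node 18.11 or later) and CommonJS require; the function name parseCliArgs is hypothetical, and the project's own parsing in penpot_chunks_generator.js may differ.

```javascript
// Sketch: parse the documented positionals and flags with Node's built-in parser.
// parseCliArgs is a hypothetical name; defaults are the ones listed above.
const { parseArgs } = require('node:util');

function parseCliArgs(argv = process.argv.slice(2)) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true,
    options: {
      out: { type: 'string', default: './public' },
      lang: { type: 'string', default: 'en' },
      version: { type: 'string', default: 'local' },
      baseUrl: { type: 'string', default: 'https://example.com/' },
    },
  });
  // The first two positionals are docsRoot and pattern, in that order.
  const [docsRoot = '../docs/user-guide', pattern = '**/*.{html,njk}'] = positionals;
  return {
    docsRoot,
    pattern,
    out: values.out,
    lang: values.lang,
    version: values.version,
    baseUrl: values.baseUrl,
  };
}
```

Quote the glob pattern on the command line, as in the usage example, so the shell does not expand it before the script sees it.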

Embedding Model Configuration

OpenAI Models

EMBEDDING_MODEL=openai
OPENAI_MODEL=text-embedding-ada-002  # or text-embedding-3-small, text-embedding-3-large
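
The chosen model determines the vector length, and an Orama schema declares vector fields with a fixed size such as vector[1536]. The sketch below maps the three models named above to their default output dimensions as documented by OpenAI; the helper names are hypothetical and not part of this project.

```javascript
// Sketch: default output dimensions of the OpenAI models named above.
const OPENAI_EMBEDDING_DIMS = new Map([
  ['text-embedding-ada-002', 1536],
  ['text-embedding-3-small', 1536],
  ['text-embedding-3-large', 3072],
]);

// Hypothetical helper: look up the vector length for a model name.
function vectorDimFor(model) {
  const dim = OPENAI_EMBEDDING_DIMS.get(model);
  if (dim === undefined) {
    throw new Error(`Unknown OpenAI embedding model: ${model}`);
  }
  return dim;
}

// Hypothetical helper: build the Orama schema type string for the vector field.
function vectorSchemaType(model) {
  return `vector[${vectorDimFor(model)}]`;
}
```

Switching models therefore means regenerating the persistence file, since vectors of different lengths cannot be mixed in the same vector field.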

Orama Built-in Embeddings

EMBEDDING_MODEL=orama

Browser Integration

To use the generated persistence file in a browser environment:

import { search } from '@orama/orama'
import { restore } from '@orama/plugin-data-persistence'

// Load the persistence file
const response = await fetch('./penpotRagToolContents.zip')
if (!response.ok) throw new Error(`Failed to load persistence file: ${response.status}`)
const arrayBuffer = await response.arrayBuffer()

// Restore the database; restore() takes the persistence format as its first argument
const db = await restore('binary', arrayBuffer)

// Search the restored database
const results = await search(db, {
  term: 'your search query',
  limit: 10
})
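
Because the restored database contains embeddings, it can also be queried by vector. The sketch below only builds the search parameters, following Orama's documented vector search options (mode, vector.value, vector.property, similarity, limit). The helper name buildVectorQuery, the property name embedding (taken from the chunk structure above), and the default threshold are assumptions; check them against the actual schema and the installed Orama version. The query embedding must come from the same model that generated the stored vectors.

```javascript
// Sketch: build parameters for an Orama vector search.
// buildVectorQuery is a hypothetical helper; 'embedding' is the assumed vector property.
function buildVectorQuery(queryEmbedding, options = {}) {
  const { expectedDim = 1536, similarity = 0.8, limit = 10 } = options;
  if (!Array.isArray(queryEmbedding) || queryEmbedding.length !== expectedDim) {
    throw new Error(`Query embedding must have ${expectedDim} dimensions`);
  }
  return {
    mode: 'vector',
    vector: { value: queryEmbedding, property: 'embedding' },
    similarity, // minimum similarity for a document to be returned
    limit,
  };
}

// Usage after restoring the database (queryEmbedding is produced elsewhere):
//   const results = await search(db, buildVectorQuery(queryEmbedding))
```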

Troubleshooting

Error: "Missing OPENAI_API_KEY"

Set the variable in your .env file, or export it in your shell:

export OPENAI_API_KEY="your-api-key"

Error: "Documentation not found"

Verify that the documentation is available in ../docs/user-guide/ (the default docsRoot), or pass a different docsRoot as the first script parameter.

Dependency Errors

Reinstall dependencies:

npm install

Large File Sizes

The binary format significantly reduces file size compared to JSON. If you need even smaller files, consider:

Reducing the number of processed files
Adjusting chunk sizes
Using different embedding models
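
To compare these options, a rough back-of-the-envelope estimate of the raw embedding payload can help. The sketch below is plain arithmetic, not a measurement: the function names are hypothetical, and the default of 8 bytes per value is an assumption (one 64-bit float per number). The real size depends on the serializer and on compression.

```javascript
// Sketch: estimate the uncompressed size of the embedding vectors alone.
// bytesPerValue = 8 is an assumption (64-bit floats); the real encoding may differ.
function estimateEmbeddingBytes(chunkCount, vectorDim, bytesPerValue = 8) {
  return chunkCount * vectorDim * bytesPerValue;
}

// Convert a byte count to mebibytes for easier reading.
function toMiB(bytes) {
  return bytes / (1024 * 1024);
}

// Example: compare 1000 chunks at 1536 and at 3072 dimensions.
const small = estimateEmbeddingBytes(1000, 1536);
const large = estimateEmbeddingBytes(1000, 3072);
console.log(toMiB(small).toFixed(1), 'MiB vs', toMiB(large).toFixed(1), 'MiB');
```

The estimate scales linearly with both the number of chunks and the vector dimension, so halving either one roughly halves the embedding payload.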

Performance Notes

Binary Format: Reduces file size by ~40% compared to JSON
Browser Optimization: Generated files are optimized for browser environments
Memory Efficient: Uses streaming for large file processing
Configurable Compression: Supports different compression levels

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

MIT License - see LICENSE file for details
