HyperView is an open‑source curation engine that lets teams explore, clean, and balance multimodal datasets at million‑sample scale on modest hardware.
🔴 Try the Interactive Visualization
Modern AI curation tools rely almost exclusively on Euclidean embeddings (cosine similarity, L2 distance). While effective for flat data, Euclidean space has a fundamental mathematical flaw when dealing with the complex, hierarchical data found in the real world (biology, medical imaging, social demographics): volume grows only polynomially with radius ($r^d$ in $d$ dimensions).
As datasets grow, the space fills up. To fit a massive "Majority" group, the embedding model is forced to crush "Minority" subgroups together, a phenomenon we term Representation Collapse.
The "Hidden Diagnosis" Problem (Example)
Imagine training an AI doctor on 10,000 chest X-rays:
- 9,000 Healthy (Majority)
- 900 Common Pneumonia (Minority)
- 100 Rare Early-Stage Tuberculosis (Rare Subgroup)
In Euclidean space, the model runs out of room. To fit the 9,000 healthy images, it crushes the 100 Tuberculosis cases into the middle of the Pneumonia cluster. To the AI (and the human curator), the rare cases just look like "noisy" Pneumonia. The result: The AI fails to diagnose the patients who need help the most.
HyperView leverages Hyperbolic Geometry (specifically the Poincaré disk model), where volume grows exponentially with radius (on the order of $e^{(d-1)r}$ in $d$ dimensions).
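As a rough numerical illustration of the polynomial versus exponential growth described above, the sketch below compares the area of a disk of radius r in the Euclidean plane (πr²) with the area of a disk of the same geodesic radius in the hyperbolic plane of curvature -1, which is 2π(cosh r - 1). The function names are hypothetical and are not part of the HyperView codebase.

```python
import math


def euclidean_disk_area(r: float) -> float:
    """Area of a disk of radius r in the flat plane: pi * r^2."""
    return math.pi * r ** 2


def hyperbolic_disk_area(r: float) -> float:
    """Area of a disk of geodesic radius r in the hyperbolic plane (curvature -1)."""
    return 2.0 * math.pi * (math.cosh(r) - 1.0)


if __name__ == "__main__":
    # Print both areas and their ratio for a few radii.
    for r in (1.0, 2.0, 5.0, 10.0):
        e = euclidean_disk_area(r)
        h = hyperbolic_disk_area(r)
        print(f"r={r:>4}: euclidean={e:10.1f}  hyperbolic={h:12.1f}  ratio={h / e:8.1f}")
```

For small radii the two areas nearly coincide; the gap opens up as r grows, which is the extra "room" the hyperbolic model offers.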
- Native Hyperbolic Embeddings: Utilizes the Poincaré ball model to naturally represent hierarchical data structures without distortion.
- Fairness-Aware Curation: Mathematically guarantees that long-tail and minority samples remain distinct and retrievable, preventing them from being "crushed" by majority classes.
- Million-Scale Performance: Designed with a Rust core extending Qdrant with custom non-Euclidean distance metrics (Proof of Concept).
- Hybrid Architecture: Seamless integration of Python (PyTorch/Geoopt) for model adaptation and WebGL for high-performance browser visualization.
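To make the projection into the Poincaré ball more concrete, here is a minimal, dependency-free sketch of one standard way to do it: the exponential map at the origin, exp_0(v) = tanh(√c ‖v‖) · v / (√c ‖v‖). Whether HyperView's adapter uses exactly this map is an assumption. The helper name `project_to_poincare_ball`, the curvature parameter `c`, and the clipping margin `eps` are illustrative, and the code deliberately avoids PyTorch and Geoopt so that it runs anywhere.

```python
import math
from typing import List, Sequence


def project_to_poincare_ball(
    v: Sequence[float], c: float = 1.0, eps: float = 1e-5
) -> List[float]:
    """Exponential map at the origin of the Poincare ball with curvature -c.

    Hypothetical helper (not HyperView's API). The result is clipped to stay
    strictly inside the ball, because tanh saturates to 1.0 in floating point.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]  # the origin maps to itself
    sqrt_c = math.sqrt(c)
    # Target radius in the ball, capped just below the boundary 1/sqrt(c).
    radius = min(math.tanh(sqrt_c * norm) / sqrt_c, (1.0 - eps) / sqrt_c)
    return [radius * x / norm for x in v]


if __name__ == "__main__":
    for vec in ([0.1, 0.0], [1.0, 1.0], [3.0, 4.0]):
        p = project_to_poincare_ball(vec)
        print(vec, "->", [round(x, 4) for x in p])
```

The map preserves direction and squashes the norm through tanh, so vectors of any length land inside the unit ball, with long vectors ending up close to the boundary.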
This repository serves as a Showcase for the HyperView technology stack.
- poc/bias_demonstration.py: A simulation script using geomstats to mathematically prove the "Representation Collapse" in Euclidean space.
- poc/hyperbolic_adapter.py: A minimal PyTorch implementation using geoopt to project standard Euclidean vectors (e.g., CLIP) into the Poincaré ball.
- docs/index.html: The source code for the interactive WebGL visualization.
- docs/architecture.md: Detailed system design for the full engine.
    git clone https://github.com/HackerRoomAI/HyperView.git
    cd HyperView
    uv venv
    uv pip install -r requirements.txt

To generate the comparison figure (Figure 1) locally:

    uv run poc/bias_demonstration.py

To run the hyperbolic adapter demo:

    uv run poc/hyperbolic_adapter.py

To run the interactive visualization locally:

    uv run python -m http.server 8000
    # Open http://localhost:8000/docs/index.html

HyperView employs a "Hybrid Engine" approach:
- Ingestion: HyperbolicAdapter (Python) projects raw embeddings to the manifold.
- Storage: Custom Rust-based vector engine (Qdrant fork) indexes data using Poincaré distance.
- Visualization: WebGL frontend renders the Poincaré disk directly using custom shaders.
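For readers unfamiliar with the Poincaré distance used for indexing, the following sketch implements the textbook formula for the unit ball with curvature -1, d(u, v) = arcosh(1 + 2‖u - v‖² / ((1 - ‖u‖²)(1 - ‖v‖²))). It is a plain-Python illustration, not the Rust engine's code, and the function name `poincare_distance` is an assumption.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points inside the unit Poincare ball (curvature -1)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


if __name__ == "__main__":
    # Two pairs with the same Euclidean gap (0.05) at different radii.
    near_center = poincare_distance([0.00, 0.0], [0.05, 0.0])
    near_edge = poincare_distance([0.90, 0.0], [0.95, 0.0])
    print(f"near center: {near_center:.3f}")
    print(f"near edge:   {near_edge:.3f}")
```

The same Euclidean gap corresponds to a noticeably larger hyperbolic distance near the boundary than near the center, which is why points pushed toward the edge of the disk remain separable.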
See docs/architecture.md for details.
- Poincaré Embeddings for Learning Hierarchical Representations (Nickel & Kiela, 2017) - The seminal paper demonstrating how hyperbolic space can represent hierarchies with significantly fewer dimensions than Euclidean space.
- Hyperbolic Neural Networks (Ganea et al., 2018) - Extends deep learning operations to hyperbolic space.
- Compositional Entailment Learning for Hyperbolic Vision-Language Models (Pal et al., 2024) - Defines a CLIP-style vision-language model in hyperbolic space.
This project is licensed under the MIT License - see the LICENSE file for details.
