FineVision Visual Diversity Metric (Reproducible Implementation)
Quantifying image dataset quality using SSCD embeddings and spectral diversity analysis
This repository provides a clean and reproducible implementation of the visual diversity evaluation metric used in HuggingFace FineVision for assessing MLLM (Multimodal Large Language Model) SFT datasets.
The method converts qualitative visual diversity into a single quantitative score by analyzing the global structure of SSCD embedding distributions.
Visual diversity is a first-order factor in large-scale vision and multimodal model training.
This metric enables:
- Objective dataset quality measurement
- Automatic bias and near-duplication detection
- Quantitative comparison between datasets
- Guidance for data augmentation strategies
- Core-set selection for active learning
- Model: SSCD (A Self-Supervised Descriptor for Image Copy Detection, Meta AI)
- Embedding Dimension: 512
- Key Property: Embeddings are trained for copy detection, so near-duplicate images map to nearby vectors
- Covariance Estimation: Captures the global variance structure of the embedding distribution.
- Eigenvalue Decomposition: Extracts principal directions and their associated variance magnitudes.
- Effective Rank (ER): Measures the effective dimensionality of the embedding space. Higher values indicate variance spread across many independent directions.
- Participation Ratio (PR): Measures how evenly variance is distributed across dimensions. Higher values indicate balanced usage rather than dominance by a few axes.
- Final Diversity Score: A normalized combination of ER and PR, reflecting both dimensional richness and variance balance.
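
The spectral steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's `DiversityCalculator`: it assumes the standard definitions (ER as the exponential of the Shannon entropy of the normalized eigenvalues, following Roy & Vetterli, and PR as the squared sum of eigenvalues divided by the sum of their squares), and the function name `spectral_diversity_components` is hypothetical.

```python
import numpy as np


def spectral_diversity_components(embeddings):
    """Return (effective_rank, participation_ratio) for an (n, d) embedding matrix.

    Hypothetical helper: a sketch of the standard ER/PR definitions,
    not the repository's own implementation.
    """
    x = np.asarray(embeddings, dtype=np.float64)

    # Covariance estimation: rows are samples, columns are embedding dimensions.
    cov = np.atleast_2d(np.cov(x, rowvar=False))

    # Eigenvalue decomposition: eigvalsh handles symmetric matrices;
    # clip tiny negative values caused by floating-point round-off.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)

    total = eigvals.sum()
    if total <= 0.0:
        # Degenerate case: all embeddings identical, so there is no variance.
        return 0.0, 0.0

    # Effective rank: exp of the Shannon entropy of the normalized spectrum.
    p = eigvals / total
    p = p[p > 0]
    effective_rank = float(np.exp(-np.sum(p * np.log(p))))

    # Participation ratio: (sum of eigenvalues)^2 / sum of squared eigenvalues.
    participation_ratio = float(total ** 2 / np.sum(eigvals ** 2))

    return effective_rank, participation_ratio
```

Both quantities lie between 1 and the embedding dimension `d` whenever there is any variance, so dividing each by `d` gives values in (0, 1]. How the repository weights or combines the two normalized terms into the final score is not stated in this README, so that step is left out of the sketch.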
- Scales to millions of images
- Multi-GPU inference via `torch.nn.DataParallel`
- CPU / Single GPU / Multi-GPU compatible
- Local `.npy` embedding cache for memory efficiency
- Fully deterministic and reproducible evaluation
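
To illustrate why a local `.npy` cache helps with memory, the sketch below writes embeddings to a disk-backed array batch by batch and then accumulates the covariance matrix in chunks, so the full embedding matrix never has to sit in RAM. It is an assumption about how such a cache could work, not a description of this repository's cache code; `write_embedding_cache` and `streaming_covariance` are hypothetical names.

```python
import numpy as np


def write_embedding_cache(path, batches, n_total, dim=512):
    """Write an iterable of (batch, dim) arrays into one disk-backed .npy file."""
    cache = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=(n_total, dim)
    )
    start = 0
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float32)
        cache[start:start + len(batch)] = batch
        start += len(batch)
    cache.flush()
    del cache  # release the memory map
    return start  # number of rows actually written


def streaming_covariance(path, chunk_size=65536):
    """Unbiased covariance of a cached (n, d) array, read chunk by chunk (needs n >= 2)."""
    data = np.load(path, mmap_mode="r")
    n, d = data.shape
    total = np.zeros(d, dtype=np.float64)
    outer = np.zeros((d, d), dtype=np.float64)
    for start in range(0, n, chunk_size):
        # Only this chunk is loaded into memory.
        chunk = np.asarray(data[start:start + chunk_size], dtype=np.float64)
        total += chunk.sum(axis=0)
        outer += chunk.T @ chunk
    mean = total / n
    # sum(x x^T) - n * mean mean^T, divided by (n - 1).
    return (outer - n * np.outer(mean, mean)) / (n - 1)
```

With 512-dimensional embeddings, the accumulators take about 2 MB regardless of dataset size. The one-pass formula used here can lose precision when the mean is large relative to the spread; accumulating in float64, as above, keeps that error small for typical normalized embeddings.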

```bash
pip install -r requirements.txt
```

- Python ≥ 3.8
- PyTorch ≥ 2.0.0
- torchvision ≥ 0.15.0
- numpy ≥ 1.24.0
- scipy ≥ 1.10.0
- Pillow ≥ 9.5.0
- tqdm ≥ 4.65.0
- pyyaml ≥ 6.0

```python
from embedders.sscd_embedding import SSCDEmbedder
from diversity.diversity_calculation import DiversityCalculator

# Extract SSCD embeddings for the images under the given directory
embedder = SSCDEmbedder(device="cuda", batch_size=32)
embeddings = embedder.extract("/path/to/images")

# Compute the diversity score from the embeddings
calculator = DiversityCalculator()
score = calculator.calculate(embeddings)
print(f"Diversity Score: {score:.4f}")
```

```bash
# CPU
python test.py --config configuration/config_cpu.yaml

# Single GPU
python test.py --config configuration/config_specific_gpu.yaml

# Multi-GPU
python test.py --config configuration/config_specific_multi_gpu.yaml

# Large-scale dataset with local cache
python test.py --config configuration/config_specific_gpu_local_cache.yaml
```

| Dataset | Images | Diversity Score | Rating |
|---|---|---|---|
| FineVision | 17.3M | 0.500 | ⭐⭐⭐⭐⭐ |
| Cambrian-7M | 5.4M | 0.458 | ⭐⭐⭐⭐ |
| M4-Instruct | 2.48M | 0.413 | ⭐⭐⭐⭐ |
| Cauldron | 2.0M | 0.400 | ⭐⭐⭐⭐ |
| LLaVA-Vision | 2.5M | 0.298 | ⭐⭐⭐ |

| Dataset | Task | Diversity | Interpretation |
|---|---|---|---|
| Pascal VOC | Classification | 0.885 | Very High |
| V3Det | Detection | 0.879 | Very High |
| WiderFace | Face Detection | 0.813 | Very High |
| CrowdHuman | Detection | 0.758 | Very High |
| RVSD | DeSnowing | 0.293 | Low |
| SeaDroneSee | Detection | 0.183 | Very Low |
| DanceTrack | Tracking | 0.145 | Very Low |
| R7_Tracking | Tracking | 0.071 | Extremely Low |
| Score Range | Meaning |
|---|---|
| ≥ 0.50 | FineVision-level diversity |
| 0.40 – 0.50 | Suitable for MLLM training |
| 0.30 β 0.40 | Augmentation recommended |
| 0.20 β 0.30 | Strong bias suspected |
| < 0.20 | Severe redundancy |
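
The score ranges above can be applied programmatically with a small lookup. This is a sketch only: `interpret_diversity` is a hypothetical helper, not part of the repository, and it assumes each range includes its lower bound (the table does not say which side a boundary value such as 0.40 belongs to).

```python
# (lower bound, meaning) pairs taken from the score range table, highest first.
_SCORE_BANDS = [
    (0.50, "FineVision-level diversity"),
    (0.40, "Suitable for MLLM training"),
    (0.30, "Augmentation recommended"),
    (0.20, "Strong bias suspected"),
]


def interpret_diversity(score):
    """Map a diversity score to the meaning given in the score range table."""
    for lower_bound, meaning in _SCORE_BANDS:
        # Assumption: lower bounds are inclusive.
        if score >= lower_bound:
            return meaning
    return "Severe redundancy"
```

For example, a dataset scoring 0.458 falls in the "Suitable for MLLM training" band, while one scoring 0.071 is reported as "Severe redundancy".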

```
visual-diversity-evaluation/
├── configuration/
│   ├── config_cpu.yaml
│   ├── config_specific_gpu.yaml
│   ├── config_specific_multi_gpu.yaml
│   └── config_specific_gpu_local_cache.yaml
├── data_loaders/
│   └── custom_dataset.py
├── embedders/
│   └── sscd_embedding.py
├── diversity/
│   └── diversity_calculation.py
├── utils.py
├── test.py
├── requirements.txt
└── README.md
```

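```python
# Overall diversity score for a dataset directory
score = evaluate_diversity("/path/to/dataset")

# Individual components behind the score
effective_rank, participation_ratio = get_diversity_components(embeddings)

# Indices of k samples chosen for diversity (core-set selection)
selected_indices = select_diverse_samples(embeddings, k=1000)
```

- Roy & Vetterli, The Effective Rank: A Measure of Effective Dimensionality, EUSIPCO 2007
- Morcos et al., On the Importance of Single Directions for Generalization, ICLR 2018
- Meta AI, SSCD: A Self-Supervised Descriptor for Image Copy Detection, CVPR 2022
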
This project is licensed under the MIT License.

- HuggingFace M4: FineVision
- Meta AI: SSCD
- Roy & Vetterli: Effective Rank
- Morcos et al.: Participation Ratio