
neverabandon80/visual-diversity-evaluation


📊 Visual Diversity Evaluation for Image Datasets

FineVision Visual Diversity Metric (Reproducible Implementation)
Quantifying image dataset quality using SSCD embeddings and spectral diversity analysis



📌 Overview

This repository provides a clean, reproducible implementation of the visual diversity metric used in Hugging Face FineVision to assess supervised fine-tuning (SFT) datasets for multimodal large language models (MLLMs).

The method converts qualitative visual diversity into a single quantitative score by analyzing the global structure of SSCD embedding distributions.


🎯 Motivation

Visual diversity is a first-order factor in large-scale vision and multimodal model training.

This metric enables:

  • Objective dataset quality measurement
  • Automatic bias and near-duplication detection
  • Quantitative comparison between datasets
  • Guidance for data augmentation strategies
  • Core-set selection for active learning

🔬 Methodology

1. SSCD Embedding Extraction

  • Model: SSCD (A Self-Supervised Descriptor for Image Copy Detection, Meta AI)
  • Embedding Dimension: 512
  • Key Property: Designed for copy detection, so near-duplicate and visually similar images map to nearby embeddings

2. Diversity Computation Pipeline

  1. Covariance Estimation
    Captures the global variance structure of the embedding distribution.

  2. Eigenvalue Decomposition
    Extracts principal directions and their associated variance magnitudes.

  3. Effective Rank (ER)
    Measures the effective dimensionality of the embedding space. Higher values indicate variance spread across many independent directions.

  4. Participation Ratio (PR)
    Measures how evenly variance is distributed across dimensions. Higher values indicate balanced usage rather than dominance by a few axes.

  5. Final Diversity Score
    A normalized combination of ER and PR, reflecting both dimensional richness and variance balance.
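
The steps above can be sketched with NumPy as follows. This is a minimal illustration rather than the repository's code: it assumes the standard definitions (effective rank as the exponential of the Shannon entropy of the normalized spectrum, applied here to covariance eigenvalues, and participation ratio as (Σλ)² / Σλ²), and the function names are hypothetical. How ER and PR are combined into the final score is not spelled out in this README, so the sketch stops at the two components, each normalized by the embedding dimension.

```python
import numpy as np

def eigen_spectrum(embeddings: np.ndarray) -> np.ndarray:
    """Eigenvalues of the embedding covariance matrix, clipped at zero."""
    # embeddings has shape (n_samples, dim); rowvar=False treats columns as variables
    cov = np.atleast_2d(np.cov(embeddings, rowvar=False))
    eigvals = np.linalg.eigvalsh(cov)        # symmetric matrix, so eigenvalues are real
    return np.clip(eigvals, 0.0, None)       # guard against tiny negative round-off

def effective_rank(eigvals: np.ndarray) -> float:
    """exp(Shannon entropy) of the spectrum normalized to sum to one."""
    p = eigvals / eigvals.sum()
    p = p[p > 0]                             # treat 0 * log(0) as 0
    return float(np.exp(-(p * np.log(p)).sum()))

def participation_ratio(eigvals: np.ndarray) -> float:
    """(sum of eigenvalues)^2 divided by the sum of squared eigenvalues."""
    return float(eigvals.sum() ** 2 / (eigvals ** 2).sum())

def diversity_components(embeddings: np.ndarray) -> tuple:
    """Return (ER / dim, PR / dim), both in (0, 1]. Normalizing by dim is an assumption."""
    eigvals = eigen_spectrum(embeddings)
    dim = eigvals.size
    return effective_rank(eigvals) / dim, participation_ratio(eigvals) / dim
```

Both quantities equal the number of dimensions when every eigenvalue is the same and drop to 1 when a single direction carries all the variance, which is why dividing by the dimension gives a value between 0 and 1.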


✨ Key Features

  • Scales to millions of images
  • Multi-GPU inference via torch.nn.DataParallel
  • CPU / Single GPU / Multi-GPU compatible
  • Local .npy embedding cache for memory efficiency
  • Fully deterministic and reproducible evaluation
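
To illustrate how a local .npy cache can keep memory use flat on large datasets, here is a minimal sketch. It is not the repository's implementation: the helper names `write_cache` and `streaming_covariance` and the chunk size are assumptions made for this example. Embeddings are written batch by batch into a memory-mapped .npy file, and the covariance is then accumulated chunk by chunk, so the full embedding matrix never has to sit in RAM.

```python
import numpy as np

def write_cache(path, batches, n_total, dim=512):
    """Write embedding batches into one .npy file without holding them all in RAM."""
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=(n_total, dim)
    )
    start = 0
    for batch in batches:
        out[start:start + len(batch)] = batch
        start += len(batch)
    out.flush()          # make sure everything is on disk
    del out              # release the memory map
    return path

def streaming_covariance(path, chunk=100_000):
    """Unbiased covariance of cached embeddings, read chunk by chunk."""
    emb = np.load(path, mmap_mode="r")       # memory-mapped, loaded lazily
    n, d = emb.shape
    total = np.zeros(d, dtype=np.float64)
    outer = np.zeros((d, d), dtype=np.float64)
    for i in range(0, n, chunk):
        x = np.asarray(emb[i:i + chunk], dtype=np.float64)
        total += x.sum(axis=0)               # running sum for the mean
        outer += x.T @ x                     # running sum of outer products
    mean = total / n
    return (outer - n * np.outer(mean, mean)) / (n - 1)
```

The resulting matrix matches `np.cov(embeddings, rowvar=False)` up to floating-point error, so it can feed the eigenvalue step directly.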

📦 Installation

Install Dependencies

pip install -r requirements.txt

Core Requirements

  • Python ≥ 3.8
  • PyTorch ≥ 2.0.0
  • torchvision ≥ 0.15.0
  • numpy ≥ 1.24.0
  • scipy ≥ 1.10.0
  • Pillow ≥ 9.5.0
  • tqdm ≥ 4.65.0
  • pyyaml ≥ 6.0

🚀 Quick Start

Minimal Example

from embedders.sscd_embedding import SSCDEmbedder
from diversity.diversity_calculation import DiversityCalculator

embedder = SSCDEmbedder(device="cuda", batch_size=32)
embeddings = embedder.extract("/path/to/images")

calculator = DiversityCalculator()
score = calculator.calculate(embeddings)

print(f"Diversity Score: {score:.4f}")

⚙️ Configuration-Based Execution

# CPU
python test.py --config configuration/config_cpu.yaml

# Single GPU
python test.py --config configuration/config_specific_gpu.yaml

# Multi-GPU
python test.py --config configuration/config_specific_multi_gpu.yaml

# Large-scale dataset with local cache
python test.py --config configuration/config_specific_gpu_local_cache.yaml

📊 Benchmark Results

FineVision-Scale Dataset Comparison

Dataset         Images    Diversity Score    Rating
FineVision      17.3M     0.500              ⭐⭐⭐⭐⭐
Cambrian-7M     5.4M      0.458              ⭐⭐⭐⭐
M4-Instruct     2.48M     0.413              ⭐⭐⭐⭐
Cauldron        2.0M      0.400              ⭐⭐⭐⭐
LLaVA-Vision    2.5M      0.298              ⭐⭐⭐

📈 Public Dataset Evaluation

Dataset        Task              Diversity    Interpretation
Pascal VOC     Classification    0.885        Very High
V3Det          Detection         0.879        Very High
WiderFace      Face Detection    0.813        Very High
CrowdHuman     Detection         0.758        Very High
RVSD           DeSnowing         0.293        Low
SeaDroneSee    Detection         0.183        Very Low
DanceTrack     Tracking          0.145        Very Low
R7_Tracking    Tracking          0.071        Extremely Low

🧭 Diversity Score Interpretation

Score Range    Meaning
≥ 0.50         FineVision-level diversity
0.40 – 0.50    Suitable for MLLM training
0.30 – 0.40    Augmentation recommended
0.20 – 0.30    Strong bias suspected
< 0.20         Severe redundancy
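
The interpretation bands above are easy to apply programmatically. The helper below is a small sketch, not part of the repository: the name `interpret_score` is hypothetical, and it assumes each band includes its lower bound (so a score of exactly 0.40 counts as "Suitable for MLLM training"), which the table itself does not specify.

```python
def interpret_score(score: float) -> str:
    """Map a diversity score to the interpretation bands listed in this README."""
    # (lower bound, meaning), checked from the highest band down.
    # Assumption: lower bounds are inclusive.
    bands = [
        (0.50, "FineVision-level diversity"),
        (0.40, "Suitable for MLLM training"),
        (0.30, "Augmentation recommended"),
        (0.20, "Strong bias suspected"),
    ]
    for lower, meaning in bands:
        if score >= lower:
            return meaning
    return "Severe redundancy"
```

For example, the Cambrian-7M score of 0.458 from the benchmark table falls in the "Suitable for MLLM training" band.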

🗂 Project Structure

visual-diversity-evaluation/
├── configuration/
│   ├── config_cpu.yaml
│   ├── config_specific_gpu.yaml
│   ├── config_specific_multi_gpu.yaml
│   └── config_specific_gpu_local_cache.yaml
├── data_loaders/
│   └── custom_dataset.py
├── embedders/
│   └── sscd_embedding.py
├── diversity/
│   └── diversity_calculation.py
├── utils.py
├── test.py
├── requirements.txt
└── README.md

🎯 Use Cases

Dataset Quality Assessment

score = evaluate_diversity("/path/to/dataset")

Diversity Component Analysis

effective_rank, participation_ratio = get_diversity_components(embeddings)

Active Learning Core-set Selection

selected_indices = select_diverse_samples(embeddings, k=1000)
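
This README does not describe how core-set selection is performed, so the following is only one plausible approach: greedy k-center (farthest-point) selection, a common baseline for picking diverse subsets. The function name `greedy_k_center` and the `seed` parameter are hypothetical stand-ins and are not taken from the repository.

```python
import numpy as np

def greedy_k_center(embeddings: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Pick k indices by repeatedly taking the point farthest from those already chosen."""
    n = embeddings.shape[0]
    k = min(k, n)
    if k <= 0:
        return np.array([], dtype=int)
    rng = np.random.default_rng(seed)
    first = int(rng.integers(n))             # random starting point
    selected = [first]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(embeddings - embeddings[first], axis=1)
    min_dist[first] = -np.inf                # never pick the same index twice
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))       # farthest point from the current set
        selected.append(nxt)
        dist = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist)
        min_dist[nxt] = -np.inf
    return np.array(selected)
```

Each step costs one pass over the embeddings, so selecting k samples from n embeddings of dimension d takes O(n · k · d) time.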

📚 References

  • Roy & Vetterli, The Effective Rank, EUSIPCO 2007
  • Morcos et al., On the Importance of Single Directions for Generalization, ICLR 2018
  • Pizzi et al. (Meta AI), A Self-Supervised Descriptor for Image Copy Detection (SSCD), CVPR 2022

📄 License

This project is licensed under the MIT License.


🙏 Acknowledgements

  • HuggingFace M4 – FineVision
  • Meta AI – SSCD
  • Roy & Vetterli – Effective Rank
  • Morcos et al. – Participation Ratio
