
starBoard

Photo-ID and morphometric analysis platform for Pycnopodia helianthoides (sunflower sea star) conservation research

Designed for both field and laboratory use

Python 3.9+ · PySide6 · PyTorch


Overview

Conservation Context

The sunflower sea star (Pycnopodia helianthoides) has experienced catastrophic population declines of over 90% since 2013 due to Sea Star Wasting Disease (SSWD). Once the largest sea star in the world—reaching up to 1 meter in diameter with 16-24 arms—this keystone predator played a crucial role in maintaining kelp forest ecosystems by controlling sea urchin populations.

Conservation efforts now require:

  • Individual identification to track survival and movement
  • Population monitoring without invasive marking techniques
  • Growth tracking to understand recovery dynamics
  • Re-sighting history to estimate survival rates

What This Tool Does

starBoard is a desktop application designed for both field and laboratory workflows, enabling researchers to:

Field Use (Photo-ID Pipeline):

  1. Identify individual sea stars from photographs using unique morphological features (arm patterns, colors, stripe characteristics, regenerating arms)
  2. Match new sightings against a gallery of known individuals using both manual annotation and AI-powered visual recognition
  3. Maintain encounter histories tracking where and when each individual was observed
  4. Make and review match decisions with full audit trails

Laboratory Use (Morphometric Analysis):

  5. Record calibrated morphometric measurements including body size, arm lengths, surface area, and estimated volume using webcam-based checkerboard calibration

Why Photo-ID Works for Sea Stars

Sunflower sea stars exhibit several characteristics that make them suitable for photo-identification:

Feature Description Persistence
Arm number 16-24 arms; exact count varies by individual Stable (unless regenerating)
Short/regenerating arms Position and relative size of small arms Semi-stable (changes over months)
Stripe patterns Radial stripes along arms with varying prominence Stable
Color variation Purple, orange, brown, pink; often multi-colored Stable
Madreporite position Sieve plate location relative to arm arrangement Stable
Rosette patterns Raised papulae clusters on dorsal surface Stable

Features

Field Use Features

Category Capability Description
Data Management Image Archive Organized storage of gallery (known) and query (unknown) individuals
Encounter Tracking Date-stamped observation sessions with location metadata
Batch Upload Import multiple images with automatic folder discovery
Merge/Revert Combine confirmed matches; undo operations with full history
Annotation 30+ Morphological Fields Numeric measurements, color categories, ordinal traits
Short Arm Coding Position-specific notation for regenerating arms
Extensible Vocabularies User-defined color and location terms
Best Photo Selection Mark representative images for each individual
Search & Matching First-Order Search Rank gallery by metadata similarity to query
Field-Weighted Scoring Customize importance of each annotation field
Visual Similarity Deep learning-based appearance matching
Fusion Ranking Blend metadata and visual scores with adjustable weighting
Deep Learning MegaStarID Model ConvNeXt-small backbone trained on 140k+ wildlife images
YOLO Segmentation Automatic star detection and background removal
Precomputation Offline embedding extraction for instant queries
Verification Model Pair-wise match confirmation scoring
Visualization Image Strips Fast preview of all photos per individual
Side-by-Side Compare Synchronized pan/zoom for detailed comparison
Match History Timeline and matrix views of past decisions
Interaction Logging Analytics on annotation workflow

Laboratory Use Features

Category Capability Description
Morphometrics Checkerboard Calibration Sub-millimeter measurement accuracy via webcam
Arm Detection Automatic arm tip localization with manual correction
Size Measurements Area, arm lengths, major/minor axes, estimated volume
Water Refraction Correction Calibrate through water for aquarium measurements
Depth Estimation Optional 3D volume calculation

System Architecture

flowchart TB
    subgraph UI [User Interface - PySide6/Qt]
        MainWindow --> TabSetup[Setup Tab]
        MainWindow --> TabMorph[Morphometric Tab - Lab]
        MainWindow --> TabFirst[First-Order Tab]
        MainWindow --> TabSecond[Second-Order Tab]
        MainWindow --> TabPast[Analytics & History Tab]
        MainWindow --> TabDL[Deep Learning Tab]
    end

    subgraph Field [Field Use Components]
        subgraph Data [Data Layer]
            Archive[(Archive Directory)]
            GalleryCSV[gallery_metadata.csv]
            QueriesCSV[queries_metadata.csv]
            ImageIndex[Image Index]
            IDRegistry[ID Registry]
        end

        subgraph Search [Search Engine]
            Engine[FirstOrderSearchEngine]
            NumericScorer[Numeric Gaussian]
            ColorScorer[Color LAB Distance]
            TextScorer[BGE Embeddings]
            ShortArmScorer[Position-Aware Matching]
        end

        subgraph DL [Deep Learning Module]
            ReIDAdapter[ReID Adapter]
            YOLOPreproc[YOLO Preprocessor]
            EmbedCache[Embedding Cache]
            SimLookup[Similarity Lookup]
            Verification[Verification Model]
        end
    end

    subgraph Lab [Laboratory Use Components]
        subgraph Morph [Morphometric Tool]
            Camera[Webcam Capture]
            Checkerboard[Calibration Module]
            YOLODetect[YOLO Detection]
            PolarAnalysis[Polar Profile Analysis]
        end
    end

    TabSetup --> Archive
    TabSetup --> GalleryCSV
    TabFirst --> Engine
    Engine --> NumericScorer
    Engine --> ColorScorer
    Engine --> TextScorer
    Engine --> ShortArmScorer
    TabFirst --> SimLookup
    TabDL --> ReIDAdapter
    ReIDAdapter --> YOLOPreproc
    ReIDAdapter --> EmbedCache
    TabMorph --> Morph
    TabSecond --> Verification

Component Overview

Component Location Purpose
Main Application main.py Entry point, logging setup, Qt application
UI Layer src/ui/ All tab implementations and widgets
Data Layer src/data/ Archive paths, CSV I/O, validators, annotation schema
Search Engine src/search/ Field scorers and ranking logic
Deep Learning src/dl/ Model loading, precomputation, similarity lookup
MegaStarID Training star_identification/megastarid/ Model training scripts
Morphometric Tool starMorphometricTool/ Laboratory measurement application (webcam-based)

Installation

Prerequisites

  • Python 3.9 or higher (3.9 recommended for best compatibility)
  • Anaconda or Miniconda (recommended for environment management)
  • 8GB+ RAM (16GB recommended for deep learning features)
  • NVIDIA GPU with CUDA (optional but recommended for deep learning)

Step 1: Install Anaconda

If you don't have Anaconda installed:

  1. Download from anaconda.com/download
  2. Run the installer for your operating system
  3. Verify installation:
conda --version
# Should output: conda 24.x.x or similar

Step 2: Create Environment

# Create a new environment with Python 3.9
conda create -n starboard python=3.9 -y

# Activate the environment
conda activate starboard

Step 3: Install Core Dependencies

# Core application dependencies
pip install PySide6 pandas numpy pillow scipy tqdm

# For text embedding search (optional but recommended)
pip install sentence-transformers

Step 4: Install Deep Learning Dependencies (Optional)

For GPU-accelerated visual re-identification:

# Install PyTorch with CUDA support (adjust cu121 for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install remaining DL dependencies
pip install -r requirements-dl.txt

requirements-dl.txt includes:

  • torch>=2.0.0 - Deep learning framework
  • torchvision>=0.15.0 - Image transforms
  • transformers>=4.30.0 - Transformer model support
  • ultralytics>=8.0.0 - YOLO instance segmentation
  • opencv-python>=4.8.0 - Image processing
  • albumentations>=1.3.0 - Data augmentation
  • scikit-learn>=1.2.0 - Clustering and metrics

For CPU-only installation:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements-dl.txt

Step 5: Download Model Weights

YOLO Segmentation Model: Place starseg_best.pt in:

star_identification/wildlife_reid_inference/starseg_best.pt

MegaStarID Re-ID Model: Place checkpoint files in:

star_identification/checkpoints/megastarid/

Contact the development team or check project releases for model weight downloads.

Step 6: Verify Installation

# Activate environment
conda activate starboard

# Launch the application
python main.py

The starBoard window should open. If you see the Deep Learning tab with status indicators, DL dependencies are correctly installed.


Quick Start Guide

1. Launch the Application

conda activate starboard
python main.py

2. Understand the Interface

starBoard uses a tabbed interface:

Tab Purpose Use Context
Setup Upload images, create new IDs, edit metadata Field & Lab
Morphometric Measure specimens via calibrated webcam Laboratory only
First-Order Search gallery by metadata/visual similarity Field & Lab
Second-Order Detailed side-by-side comparison Field & Lab
Analytics & History Review decision history, merge confirmed matches Field & Lab
Deep Learning Manage models, run precomputation Field & Lab

3. Upload Your First Images

  1. Go to Setup tab
  2. Select Single Upload Mode
  3. Choose target: Gallery (known individual) or Queries (unknown)
  4. Click Choose Files and select images
  5. Enter or select the Individual ID (e.g., "anchovy", "star_001")
  6. Set the Encounter Date (observation date)
  7. Click Save

4. Annotate Morphological Features

  1. Stay in Setup tab, switch to Metadata Edit Mode
  2. Select the ID you just created
  3. Fill in observable features:
    • Number of arms (apparent and total)
    • Tip-to-tip size (if measured)
    • Short arm codes (positions of regenerating arms)
    • Colors (stripe, arm, disc, rosette)
    • Stripe characteristics (order, prominence, extent)
    • Location and notes
  4. Click Save Edits

5. Run Deep Learning Precomputation

  1. Go to Deep Learning tab
  2. Ensure a model is registered and set as active
  3. Click Run Full Precomputation
  4. Wait for completion (progress bar shows ETA)

Precomputation extracts visual embeddings from all images. This is a one-time operation per dataset; only new images need processing later.

6. Search for Matches

  1. Go to First-Order tab
  2. Select a Query ID to search for
  3. Adjust search settings:
    • Check/uncheck annotation fields to include
    • Enable Visual checkbox for DL similarity
    • Adjust Fusion slider (0%=metadata only, 100%=visual only)
  4. Click Refresh to run the search
  5. Review ranked gallery candidates

7. Compare and Decide

  1. Click Pin for Compare on promising candidates
  2. Go to Second-Order tab
  3. Select the Query and Gallery IDs to compare
  4. Use synchronized pan/zoom viewers to examine details
  5. Record your decision: Yes (match), No (different), or Maybe (uncertain)
  6. Add notes and click Save

8. Merge Confirmed Matches

  1. Go to Analytics & History tab
  2. Review Yes decisions
  3. For confirmed matches, use Merge to combine Query into Gallery
  4. The Query images move to the Gallery individual's folder

Detailed Module Documentation

Archive Structure

starBoard uses a file-based archive with a standardized directory layout:

archive/
├── gallery/                          # Known individuals
│   ├── gallery_metadata.csv          # All annotations for gallery
│   ├── _embeddings/                   # BGE text embeddings cache
│   ├── anchovy/                       # Individual "anchovy"
│   │   ├── 03_15_24/                  # Encounter on March 15, 2024
│   │   │   ├── IMG_001.jpg
│   │   │   ├── IMG_002.jpg
│   │   │   └── ...
│   │   └── 06_22_24/                  # Another encounter
│   │       └── ...
│   ├── pepperoni/                     # Individual "pepperoni"
│   │   └── ...
│   └── ...
│
├── queries/                           # Unidentified individuals
│   ├── queries_metadata.csv
│   ├── Q_2024_001/
│   │   └── 04_10_24/
│   │       └── ...
│   └── ...
│
├── _dl_precompute/                    # Deep learning cache
│   ├── _dl_registry.json             # Model registry
│   └── <model_key>/                  # Per-model embeddings
│       ├── embeddings/
│       └── similarity/
│
├── reports/                           # Generated reports
│   └── past_matches_master.csv
│
└── starboard.log                      # Application log
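
Because the archive is plain files and folders, it is easy to script against. As a minimal sketch (assuming the default archive/ location), this lists every encounter folder per gallery individual:

from pathlib import Path

archive = Path("archive")  # adjust if you use a custom archive location
for individual in sorted((archive / "gallery").iterdir()):
    if not individual.is_dir() or individual.name.startswith("_"):
        continue  # skip the metadata CSV and the _embeddings cache
    encounters = sorted(e.name for e in individual.iterdir() if e.is_dir())
    print(f"{individual.name}: {len(encounters)} encounter(s) -> {encounters}")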

Encounter Naming Convention

Encounter folders follow the format: MM_DD_YY or MM_DD_YY_suffix

Component Format Example
Month 01-12 03 for March
Day 01-31 15
Year 2-digit 24 for 2024
Suffix Optional _morning, _dive2

Examples:

  • 03_15_24 - March 15, 2024
  • 06_22_24_dock - June 22, 2024 at dock site
  • 11_03_24_pm - November 3, 2024 afternoon session
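
Folder names in this format can be parsed programmatically. A small sketch (the parse_encounter helper is illustrative, not part of starBoard's API):

from datetime import datetime

def parse_encounter(folder_name: str):
    """Split an encounter folder like '06_22_24_dock' into (date, suffix)."""
    parts = folder_name.split("_")
    date = datetime.strptime("_".join(parts[:3]), "%m_%d_%y").date()
    suffix = "_".join(parts[3:]) or None
    return date, suffix

print(parse_encounter("03_15_24"))       # (datetime.date(2024, 3, 15), None)
print(parse_encounter("06_22_24_dock"))  # (datetime.date(2024, 6, 22), 'dock')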

Annotation Schema

The annotation system supports 30+ typed fields organized into groups:

Numeric Measurements

Field Type Range Description
num_apparent_arms Integer 0-30 Arms visible in photos
num_total_arms Integer 0-30 Total arms including hidden
tip_to_tip_size_cm Float 0-150 Maximum diameter

Short Arm Coding

Regenerating or abnormally short arms are encoded with position and severity:

Format: severity(position), severity(position), ...

Code Severity Description
short Normal short Noticeably shorter than neighbors
small Small short Very short, ~25-50% of normal
tiny Tiny Barely visible, <25% of normal

Examples:

  • short(3) - Arm 3 is short
  • tiny(7), small(12) - Arm 7 is tiny, arm 12 is small
  • short(1), short(2), tiny(3) - Three affected arms

Arm positions are numbered 1 to N clockwise from the madreporite.
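
A short sketch of parsing these codes into {position: severity} pairs, mirroring the format above (the helper is hypothetical, not starBoard's actual parser):

import re

def parse_short_arm_code(code: str) -> dict:
    """Parse e.g. 'tiny(7), small(12)' into {7: 'tiny', 12: 'small'}."""
    pairs = re.findall(r"(short|small|tiny)\((\d+)\)", code)
    return {int(pos): severity for severity, pos in pairs}

print(parse_short_arm_code("short(1), short(2), tiny(3)"))
# {1: 'short', 2: 'short', 3: 'tiny'}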

Color Fields

All color fields use an extensible vocabulary:

Field Description
overall_color General impression
arm_color Primary arm coloration
stripe_color Color of radial stripes
central_disc_color Central body region
papillae_central_disc_color Papillae on disc
rosette_color Raised papulae clusters
papillae_stripe_color Papillae in stripe regions
madreporite_color Sieve plate color

Default color vocabulary: white, yellow, orange, peach, pink, red, maroon, burgundy, purple, mauve, brown, tan, light-purple, dark-orange, burnt-orange, etc.

Ordinal Categorical Fields

Field Options
stripe_order None (0), Mixed (1), Irregular (2), Regular (3)
stripe_prominence None (0), Weak (1), Medium (2), Strong (3), Strongest (4)
stripe_extent None (0), Quarter (0.25), Halfway (0.5), Three-quarters (0.75), Full (1.0)
stripe_thickness None (0), Thin (1), Medium (2), Thick (3)
arm_thickness Thin (0), Medium (1), Thick (2)
rosette_prominence Weak (0), Medium (1), Strong (2)
reticulation_order None (0), Mixed (1), Meandering (2), Train tracks (3)

Image Quality Fields

Field Options Purpose
madreporite_visibility Not visible (0) to Excellent (3) Can you see the sieve plate?
anus_visibility Not visible (0) to Excellent (3) Can you see the anus (for orientation)?
postural_visibility Very poor (0) to Excellent (4) Is the star flat and fully visible?

Text Fields

Field Description
location Where the star was observed (auto-complete from history)
unusual_observation Any notable features or behaviors
health_observation Signs of wasting, lesions, or disease

Morphometric Auto-Fields

These fields are automatically populated when measurements are imported from the morphometric tool:

Field Unit Description
morph_num_arms count Arms detected by YOLO
morph_area_mm2 mm² Calibrated surface area
morph_major_axis_mm mm Fitted ellipse major axis
morph_minor_axis_mm mm Fitted ellipse minor axis
morph_mean_arm_length_mm mm Average arm length
morph_max_arm_length_mm mm Longest arm
morph_tip_to_tip_mm mm Maximum diameter
morph_volume_mm3 mm³ Estimated volume (requires depth)

Search Engine

The First-Order search engine computes similarity scores between a query and all gallery individuals.

Scoring Algorithm

For each gallery candidate:

  1. Per-field scoring: Each enabled field produces a similarity score in [0, 1]
  2. Presence filtering: Fields contribute only if both query and candidate have values
  3. Weighted average: Final score = weighted mean of contributing field scores
Score = Σ(weight_f × score_f × present_f) / Σ(weight_f × present_f)
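
In code, this weighted mean reduces to a few lines. A minimal sketch of the formula above (the field names and weights are illustrative):

def fuse_field_scores(scores: dict, weights: dict) -> float:
    """Weighted mean over contributing fields only.
    `scores` maps field -> similarity in [0, 1] for fields present on both sides."""
    num = sum(weights.get(f, 1.0) * s for f, s in scores.items())
    den = sum(weights.get(f, 1.0) for f in scores)
    return num / den if den else 0.0

# Three contributing fields, arm count weighted double
print(fuse_field_scores(
    {"num_total_arms": 0.95, "overall_color": 0.60, "location": 0.62},
    {"num_total_arms": 2.0},
))  # 0.78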

Field Scorers

Field Type Scorer Algorithm
Numeric NumericGaussianScorer Gaussian decay: exp(-(q - g)² / (2σ²))
Ordinal NumericGaussianScorer Same as numeric (order matters)
Color ColorSpaceScorer CIELAB perceptual distance with configurable threshold
Text TextEmbeddingBGEScorer BGE-small embedding cosine similarity
Text (fallback) TextNgramScorer Character 3-5 gram Jaccard similarity
Short Arm ShortArmCodeScorer Position-aware bipartite matching with Hungarian algorithm
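
To illustrate the numeric scorer row, Gaussian decay maps the absolute query-gallery difference into [0, 1]. The σ value below is an assumed tolerance, not starBoard's default:

import math

def gaussian_score(q: float, g: float, sigma: float = 2.0) -> float:
    """exp(-(q - g)^2 / (2 sigma^2)): 1.0 at equality, decaying with distance."""
    return math.exp(-((q - g) ** 2) / (2 * sigma ** 2))

print(gaussian_score(18, 18))  # 1.0
print(gaussian_score(18, 20))  # ~0.61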

Short Arm Code Matching Details

The short arm scorer handles the complex task of comparing regenerating arm patterns:

  1. Parse codes into {position: severity} pairs
  2. Position similarity: Gaussian decay on circular distance (σ=1.0)
  3. Severity similarity: Ordinal matching (same=1.0, off-by-one=0.5)
  4. Optimal pairing: Hungarian algorithm finds best arm-to-arm matching
  5. Normalization: Divide by max(|query arms|, |gallery arms|) to penalize missing arms
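
A condensed sketch of steps 2-5 using scipy's Hungarian solver. How starBoard combines position and severity similarities is simplified here; the multiplicative combination is an assumption:

import numpy as np
from scipy.optimize import linear_sum_assignment

SEVERITY = {"short": 0, "small": 1, "tiny": 2}

def short_arm_similarity(query: dict, gallery: dict, n_arms: int = 20) -> float:
    """query/gallery map arm position -> severity, e.g. {7: "tiny"}."""
    if not query or not gallery:
        return 0.0
    q_pos, g_pos = list(query), list(gallery)
    sim = np.zeros((len(q_pos), len(g_pos)))
    for i, qp in enumerate(q_pos):
        for j, gp in enumerate(g_pos):
            d = min(abs(qp - gp), n_arms - abs(qp - gp))  # circular distance
            pos_sim = np.exp(-d ** 2 / 2.0)               # Gaussian, sigma = 1.0
            gap = abs(SEVERITY[query[qp]] - SEVERITY[gallery[gp]])
            sev_sim = {0: 1.0, 1: 0.5}.get(gap, 0.0)      # same=1.0, off-by-one=0.5
            sim[i, j] = pos_sim * sev_sim
    rows, cols = linear_sum_assignment(-sim)              # maximize total similarity
    return sim[rows, cols].sum() / max(len(q_pos), len(g_pos))

print(short_arm_similarity({3: "short"}, {4: "short"}))   # ~0.61 (adjacent position)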

Deep Learning Pipeline

Architecture Overview

The deep learning system uses a precomputation-first architecture:

┌─────────────────────────────────────────────────────────────────┐
│                    OFFLINE (One-Time)                           │
├─────────────────────────────────────────────────────────────────┤
│  1. YOLO Preprocessing                                          │
│     - Detect star in image                                      │
│     - Segment from background                                   │
│     - Crop and resize to 640px                                  │
│     - Cache to precompute_cache/                                │
│                                                                 │
│  2. Embedding Extraction                                        │
│     - Load cached images                                        │
│     - Apply test transforms (384px, normalize)                  │
│     - Run through ConvNeXt + embedding head                     │
│     - Apply test-time augmentation (flip)                       │
│     - Aggregate with outlier rejection                          │
│                                                                 │
│  3. Similarity Matrix                                           │
│     - Compute all pairwise cosine similarities                  │
│     - Optional: k-reciprocal re-ranking                         │
│     - Save to archive/_dl_precompute/                           │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    ONLINE (At Query Time)                       │
├─────────────────────────────────────────────────────────────────┤
│  - Load precomputed similarity matrix                           │
│  - Look up query row                                            │
│  - Return sorted gallery scores                                 │
│  - No neural network inference required!                        │
└─────────────────────────────────────────────────────────────────┘
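
The offline similarity step reduces to one matrix multiply over L2-normalized embeddings. A numpy sketch with illustrative shapes and filename:

import numpy as np

# One embedding row per individual (shapes are illustrative)
embeddings = np.random.randn(100, 512).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # L2-normalize

similarity = embeddings @ embeddings.T        # pairwise cosine similarity
np.save("similarity.npy", similarity)         # query time is a row lookup + sort

query_row = similarity[0]
top5 = np.argsort(-query_row)[1:6]            # best matches, skipping self
print(top5)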

Model Architecture

MegaStarID uses a ConvNeXt-small backbone, which was found to provide the best performance for sea star re-identification:

Component Configuration
Backbone ConvNeXt-small
Input Size 384 × 384 pixels
Embedding Dim 512
Pooling GeM (Generalized Mean)
Training Loss Circle Loss (0.7) + Triplet Loss (0.3)

Why ConvNeXt?

  • Best empirical performance on sea star re-identification tasks
  • Modern CNN architecture with transformer-inspired design
  • Efficient inference on both GPU and CPU
  • Robust to the texture and pattern variations in sea star imagery

Alternative backbones available: DenseNet-121, DenseNet-169, SwinV2-tiny, ResNet-50

Training Data:

  • Wildlife10k: 140,000 images across 37 species for broad feature learning
  • star_dataset: 8,000 sunflower sea star images for domain-specific fine-tuning

Outlier Detection

When aggregating embeddings from multiple images of an individual:

  1. Compute pairwise similarities between all image embeddings
  2. For each image, find its nearest-neighbor similarity
  3. Compute median and MAD of NN similarities
  4. Flag images >3 MAD below median as outliers
  5. Aggregate only inlier embeddings into final centroid

This removes:

  • Wrong YOLO detections (different animal)
  • Severely corrupted images
  • Mislabeled images
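
A compact numpy sketch of this MAD-based rejection, assuming embeddings are already L2-normalized:

import numpy as np

def inlier_mask(embeddings: np.ndarray, n_mads: float = 3.0) -> np.ndarray:
    """Keep images whose nearest-neighbor similarity is not >n_mads MADs below the median."""
    sim = embeddings @ embeddings.T
    np.fill_diagonal(sim, -np.inf)        # exclude self-similarity
    nn = sim.max(axis=1)                  # nearest-neighbor similarity per image
    med = np.median(nn)
    mad = np.median(np.abs(nn - med))
    return nn >= med - n_mads * max(mad, 1e-6)

emb = np.random.randn(8, 512)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
centroid = emb[inlier_mask(emb)].mean(axis=0)  # aggregate only inliers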

Hardware Profiles

Profile Device Batch Size TTA Speed
GPU Quality CUDA 16 H+V flip ~50 img/s
GPU Fast CUDA 32 H flip only ~80 img/s
CPU Quality CPU 4 H flip only ~4 img/s
CPU Fast CPU 8 None ~8 img/s

Morphometric Tool (Laboratory Use)

The Morphometric tab provides calibrated measurements via webcam, designed for laboratory settings where specimens can be positioned under a fixed camera with a calibration checkerboard.

Laboratory Setup

Equipment needed:

  • USB webcam (tested with Logitech C270)
  • Flat checkerboard with a non-square corner grid (rows ≠ columns) and known square size (mm)
  • Stable camera mount (tripod or fixed position)
  • Consistent lighting (diffuse recommended)
  • Optional: shallow water container for aquatic measurements

Calibration Process

  1. Position checkerboard flat in camera view (or underwater if measuring through water)
  2. Enter parameters: rows, columns, square size (mm)
  3. Detect checkerboard to establish pixel-to-mm mapping
  4. Keep camera fixed for all subsequent measurements

The calibration corrects for:

  • Perspective distortion (oblique viewing angle)
  • Lens distortion (barrel/pincushion)
  • Water refraction (if measuring through water in an aquarium setting)
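
For intuition, here is a minimal OpenCV sketch of the pixel-to-mm step only; perspective, lens, and refraction corrections are omitted, and the board geometry and filename are examples:

import cv2
import numpy as np

rows, cols, square_mm = 6, 9, 25.0  # inner-corner grid and square size (example values)

frame = cv2.imread("calibration_frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, (cols, rows))
assert found, "checkerboard not detected"

# Corners are returned row by row; estimate the mean spacing of horizontally
# adjacent corners in pixels, then convert to a scale factor
corners = corners.reshape(rows, cols, 2)
px_per_square = np.linalg.norm(np.diff(corners, axis=1), axis=2).mean()
mm_per_px = square_mm / px_per_square
print(f"{mm_per_px:.4f} mm per pixel")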

Measurement Procedure

In the laboratory:

  1. Place specimen in center of checkerboard area (live or preserved)
  2. Ensure star is flat and all arms are visible
  3. Start YOLO detections
  4. Capture when detection box appears
  5. Run morphometrics analysis
  6. Adjust parameters:
    • Smoothing: Reduce contour noise
    • Prominence: Arm tip detection sensitivity
    • Distance: Minimum separation between arms
  7. Click to add/Shift+click to remove arm tips
  8. Save with initials and notes

Tips for best results:

  • Measurement accuracy decreases toward checkerboard edges—keep specimen centered
  • Use consistent, diffuse lighting to avoid shadows
  • For specimens with very short arms, manually adjust detection parameters
  • Recalibrate if camera position changes

Measurements Collected

Measurement Unit Method
Total area mm² Calibrated mask pixel count
Number of arms count Polar profile peak detection
Individual arm lengths mm Center to each tip
Mean arm length mm Average of all arms
Max arm length mm Longest arm
Major axis mm Fitted ellipse
Minor axis mm Fitted ellipse
Tip-to-tip diameter mm Maximum across tips
Volume mm³ Depth estimation (optional)
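
The arm count comes from peaks in a polar profile: the contour radius sampled as a function of angle around the center. A scipy sketch where prominence and distance mirror the UI parameters above (the 16-arm profile here is synthetic):

import numpy as np
from scipy.signal import find_peaks

# Synthetic polar profile: radius at 360 angles for a 16-armed star
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
profile = 100 + 40 * np.maximum(np.cos(16 * theta), 0)

# Pad circularly so arm tips near the 0°/360° seam are not missed
padded = np.concatenate([profile, profile[:30]])
peaks, _ = find_peaks(padded, prominence=10, distance=12)
arm_tips = np.unique(peaks % 360)
print(f"{len(arm_tips)} arms detected")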

Workflow Guide for Biologists

Field vs. Laboratory Workflows

starBoard supports two complementary workflows:

Workflow Location Primary Activities
Field Photo-ID Dive sites, docks, tidepools Photograph encounters, upload images, annotate features, search for matches, record decisions
Laboratory Morphometrics Lab with webcam setup Calibrated size measurements, arm length analysis, growth tracking

Typical research workflow:

  1. Field work → Collect photographs of individuals in situ
  2. Post-dive processing → Upload to starBoard, annotate, run matches
  3. Laboratory sessions → Bring specimens in for precise measurements (if applicable)
  4. Data integration → Morphometric data automatically populates annotation fields

Setting Up a New Study

1. Create Your Archive

The archive directory is automatically created on first launch at:

<project>/archive/

Or set a custom location by modifying src/data/archive_paths.py.

2. Establish Your Gallery

Start with known individuals from previous studies, captive animals, or distinctively marked wild individuals:

  1. Gather reference photos - Best images of each known individual
  2. Create consistent IDs - Use memorable names or systematic codes
  3. Batch upload - Use Setup tab's batch mode for efficient import
  4. Annotate thoroughly - Complete all observable fields for best matching

Recommended minimum per individual:

  • 3-5 images from different angles
  • At least one clear dorsal view
  • Complete arm count
  • Short arm codes (if applicable)
  • Primary colors noted

Daily Workflow: Processing New Encounters

Step 1: Upload Field Images

After returning from fieldwork:

  1. Transfer images from camera/phone
  2. Open starBoard, go to Setup tab
  3. For each new sighting:
    • Select Queries as target (unknown until confirmed)
    • Create new Query ID (e.g., Q_2024_047)
    • Set encounter date
    • Upload images

Tip: Use Batch Upload if you have images organized by individual in folders.

Step 2: Quick Visual Triage

Before detailed annotation:

  1. Review image quality - Are key features visible?
  2. Note obvious candidates - Does this look like a known individual?
  3. Flag problematic images - Poor lighting, partial views, etc.

Step 3: Annotate Key Features

Focus on the most discriminating features first:

  1. Arm count - Count carefully; short arms are easy to miss
  2. Short arm positions - Critical for matching
  3. Overall color - First impression
  4. Stripe prominence - None/weak/medium/strong
  5. Location - For spatial analysis

Quality annotations >> Quantity. A few well-annotated individuals are more valuable than many poorly annotated ones.

Step 4: Run Searches

  1. Go to First-Order tab
  2. Select your Query ID
  3. Initial search settings:
    • Enable: arm counts, short arm code, overall color
    • Visual: ON (if precomputed)
    • Fusion: 50% (balanced)
  4. Review top 10-20 candidates
  5. Pin promising matches for detailed comparison

Step 5: Detailed Comparison

  1. Go to Second-Order tab
  2. For each pinned candidate:
    • Synchronize viewers on same features
    • Compare arm-by-arm
    • Check stripe patterns
    • Note color consistency
  3. Record decision: Yes / No / Maybe
  4. Add notes explaining your reasoning

Interpreting Search Results

Understanding Scores

Score Range Interpretation
0.8 - 1.0 Strong match - Review carefully
0.6 - 0.8 Moderate match - Worth comparing
0.4 - 0.6 Weak match - Check if few fields contributed
0.0 - 0.4 Poor match - Unlikely to be same individual

Field Breakdown

Each candidate shows which fields contributed and their individual scores:

Score: 0.72 (5 fields)
├── num_total_arms: 0.95
├── short_arm_code: 0.88
├── overall_color: 0.60
├── stripe_prominence: 0.55
└── location: 0.62

Interpretation:

  • High arm/short-arm scores suggest morphological similarity
  • Lower color scores may indicate lighting differences
  • Check k_contrib (number of contributing fields) - more is better

When to Trust Visual vs. Metadata

Scenario Recommendation
Good photos, complete annotations Use 50/50 fusion
Poor photos, good annotations Weight metadata higher (fusion < 30%)
Good photos, sparse annotations Weight visual higher (fusion > 70%)
Query has few distinctive features Rely more on visual similarity
Gallery individual has changed over time Rely more on metadata (arm codes change less than appearance)
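
The fusion slider acts as a convex combination of the two scores. A sketch assuming a simple linear blend, consistent with the 0%/100% endpoints described in the Quick Start:

def fused_score(metadata: float, visual: float, fusion: float) -> float:
    """fusion=0.0 -> metadata only; fusion=1.0 -> visual only."""
    return (1.0 - fusion) * metadata + fusion * visual

print(fused_score(0.72, 0.91, 0.50))  # balanced: 0.815
print(fused_score(0.72, 0.91, 0.25))  # metadata-weighted: ~0.77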

Managing the Gallery

Merging Confirmed Matches

When you're confident a Query matches a Gallery individual:

  1. Go to Analytics & History tab
  2. Find the Yes decision
  3. Select the Gallery ID from the merge dropdown
  4. Click Merge

What happens:

  • Query images move to Gallery individual's folder
  • Query metadata is preserved in CSV
  • Decision is logged in audit trail
  • Query ID becomes inactive

Handling Uncertain Cases

For Maybe decisions:

  1. Wait for additional sightings
  2. Collect more photos at next encounter
  3. Focus annotation on distinguishing features
  4. Re-run comparison when more data available

Tips:

  • Don't rush to merge uncertain matches
  • A false positive (merging different individuals) is worse than a false negative (keeping them separate)
  • Use notes field to record what additional evidence would confirm the match

Reverting Mistakes

If you merged incorrectly:

  1. Go to Analytics & History tab → Revert section
  2. Select the Gallery ID
  3. Choose the batch to revert
  4. Click Revert Batch

Images return to original Query location.


Configuration Reference

Environment Variables

Variable Default Description
STARBOARD_LOG_LEVEL INFO Logging verbosity: DEBUG, INFO, WARNING, ERROR
STARBOARD_DUMP_RANK_CSV (unset) Set to 1 to export each ranking to CSV
STARBOARD_SESSION_ID (auto) Custom session ID for log correlation

Example:

# Windows PowerShell
$env:STARBOARD_LOG_LEVEL = "DEBUG"
python main.py

# Linux/macOS
STARBOARD_LOG_LEVEL=DEBUG python main.py

Fields Configuration

Access via First-Order tab → Config button:

Setting Description
Enable/Disable Fields Choose which fields contribute to ranking
Field Weights Adjust relative importance (1.0 = normal)
Numeric Offsets Shift query values for "what if" searches

Preset Configurations:

Preset Fields Included
Average (All) All non-empty fields equally weighted
Size Only num_total_arms, tip_to_tip_size_cm
Colors Only All color fields
Morphology Arms, short arm codes, stripe characteristics
Text Only Location and observation notes

Log Files

Log Location Contents
Main log archive/starboard.log Application events, search results
Interaction log archive/logs/*.csv User interaction analytics
Rank exports archive/logs/first_order_*.csv Per-query ranking details

Technical Reference

Programmatic Usage

The search engine can be used directly for scripting or testing:

from src.search.engine import FirstOrderSearchEngine

# Initialize and build indices
engine = FirstOrderSearchEngine()
engine.rebuild()

# Run a search
results = engine.rank(
    query_id="Q_2024_001",
    include_fields={"num_total_arms", "short_arm_code", "overall_color"},
    equalize_weights=True,
    top_k=20,
)

# Process results
for item in results:
    print(f"{item.gallery_id}: {item.score:.3f} ({item.k_contrib} fields)")
    for field, score in item.field_breakdown.items():
        print(f"  {field}: {score:.3f}")

Deep Learning API

from src.dl.similarity_lookup import get_visual_scores
from src.dl.registry import DLRegistry

# Get active model
registry = DLRegistry.load()
model_key = registry.active_model

# Get visual similarity scores for a query
if model_key:
    scores = get_visual_scores("Q_2024_001", model_key)
    # scores is Dict[str, float] mapping gallery_id -> similarity
    for gid, score in sorted(scores.items(), key=lambda x: -x[1])[:10]:
        print(f"{gid}: {score:.3f}")

Model Training

For advanced users who want to train custom models:

Temporal Re-ID Training

cd star_identification

# Train with temporal split (recommended)
python -m temporal_reid.train \
    --dataset-root ./star_dataset \
    --epochs 25 \
    --batch-size 48 \
    --gpus 0,1

# Grid search over loss configurations
python -m temporal_reid.grid_search \
    --dataset-root ./star_dataset \
    --epochs 25

MegaStarID Training

cd star_identification

# Pre-train on Wildlife10k
python -m megastarid.pretrain --epochs 50

# Fine-tune on star_dataset
python -m megastarid.finetune \
    --checkpoint checkpoints/megastarid/pretrain/best.pth \
    --epochs 100

# Or co-train on both
python -m megastarid.cotrain --epochs 100 --star-batch-ratio 0.3

See star_identification/megastarid/readme.md and star_identification/temporal_reid/readme.md for detailed training documentation.

Adding Custom Field Scorers

Implement the FieldScorer protocol:

from src.search.interfaces import FieldScorer
from typing import Dict, Any, Tuple

class MyCustomScorer:
    def __init__(self, name: str = "my_field"):
        self.name = name
    
    def build_gallery(self, gallery_rows_by_id: Dict[str, Dict]) -> None:
        """Build index from gallery data."""
        self._index = {}
        for gid, row in gallery_rows_by_id.items():
            value = row.get(self.name, "").strip()
            if value:
                self._index[gid] = self._parse(value)
    
    def prepare_query(self, query_row: Dict) -> Any:
        """Prepare query state from row data."""
        value = query_row.get(self.name, "").strip()
        return self._parse(value) if value else None
    
    def has_query_signal(self, query_state: Any) -> bool:
        """Check if query has usable data for this field."""
        return query_state is not None
    
    def score_pair(self, query_state: Any, gallery_id: str) -> Tuple[float, bool]:
        """Score query against gallery item. Returns (score, present)."""
        if gallery_id not in self._index:
            return 0.0, False
        gallery_state = self._index[gallery_id]
        score = self._compute_similarity(query_state, gallery_state)
        return score, True
    
    def _parse(self, value: str) -> Any:
        """Parse string value into internal representation."""
        return value
    
    def _compute_similarity(self, q: Any, g: Any) -> float:
        """Compute similarity in [0, 1]."""
        return 1.0 if q == g else 0.0

Register in src/search/engine.py:

def _build_scorers(self, use_bge: bool) -> None:
    # ... existing scorers ...
    self.scorers["my_field"] = MyCustomScorer("my_field")

Troubleshooting

Common Issues

"Failed to load model"

Symptoms: Error message when accessing DL features

Solutions:

  1. Check checkpoint path exists:
    star_identification/checkpoints/megastarid/best.pth
    
  2. Verify PyTorch version compatibility
  3. Check logs for specific error: archive/starboard.log

"YOLO preprocessor not available"

Symptoms: Precomputation fails at Phase 1

Solutions:

  1. Install ultralytics: pip install ultralytics
  2. Check model file exists:
    star_identification/wildlife_reid_inference/starseg_best.pt
    

Slow precomputation

Symptoms: Progress shows very low img/s rate

Solutions:

  1. Use "Fast" speed mode on CPU
  2. First run is slowest (building image cache)
  3. Subsequent runs skip Phase 1 if cache exists
  4. Consider GPU installation for 10x speedup

Deep Learning tab missing features

Symptoms: DL tab shows "Not Available" status

Solutions:

  1. Install DL dependencies: pip install -r requirements-dl.txt
  2. Verify PyTorch imports work:
    python -c "import torch; print(torch.__version__)"

Image strip stuttering

Symptoms: Slow/jerky image preview when scrolling

Solutions:

  1. Image strip uses scaled decoding; very large images may still be slow
  2. Reduce preview size in code if needed
  3. Use SSD storage for archive

Search returns no results

Symptoms: First-order tab shows empty lineup

Causes:

  1. Query has no values for enabled fields
  2. No gallery individuals have matching fields
  3. Engine not rebuilt after data changes

Solutions:

  1. Enable more fields in search settings
  2. Check query has annotations saved
  3. Click "Rebuild" button

Merge fails

Symptoms: Error when trying to merge Query into Gallery

Solutions:

  1. Ensure Query ID exists and has images
  2. Check Gallery ID is valid
  3. Verify no file permission issues
  4. Check logs for specific error

Getting Help

  1. Check logs: archive/starboard.log contains detailed error information
  2. Enable debug mode: Set STARBOARD_LOG_LEVEL=DEBUG
  3. File an issue: Include log excerpts and steps to reproduce

Contributing

Contributions are welcome! Areas of interest:

  • Field scorers for additional metadata types
  • UI improvements for annotation efficiency
  • Model architectures for improved re-identification
  • Documentation and tutorials

Development Setup

# Clone repository
git clone <repository-url>
cd starBoard

# Create development environment
conda create -n starboard-dev python=3.9 -y
conda activate starboard-dev

# Install all dependencies (core + deep learning)
pip install PySide6 pandas numpy pillow scipy tqdm
pip install -r requirements-dl.txt
pip install pytest black flake8

# Run application
python main.py

Code Style

  • Follow PEP 8 guidelines
  • Use type hints for function signatures
  • Document public APIs with docstrings
  • Log important operations with appropriate levels

Citation and Acknowledgments

Citation

If you use starBoard in your research, please cite:

@software{starboard2024,
  title = {starBoard: Photo-ID and Morphometric Analysis Platform for Sea Star Conservation},
  year = {2024},
  url = {https://github.com/[repository]}
}

Acknowledgments

  • Friday Harbor Laboratories - Field site and specimen access
  • Wildlife10k Dataset - Pre-training data for visual re-identification
  • Meta AI Research - ConvNeXt architecture
  • Ultralytics - YOLO implementation

Related Projects


License

[License information to be added]


starBoard is developed for sunflower sea star conservation research. For questions or collaborations, please contact the development team.
