Photo-ID and morphometric analysis platform for Pycnopodia helianthoides (sunflower sea star) conservation research
Designed for both field and laboratory use
- Overview
- Features
- System Architecture
- Installation
- Quick Start Guide
- Detailed Module Documentation
- Workflow Guide for Biologists
- Configuration Reference
- Technical Reference
- Troubleshooting
- Contributing
- Citation and Acknowledgments
The sunflower sea star (Pycnopodia helianthoides) has experienced catastrophic population declines of over 90% since 2013 due to Sea Star Wasting Disease (SSWD). Once the largest sea star in the world—reaching up to 1 meter in diameter with 16-24 arms—this keystone predator played a crucial role in maintaining kelp forest ecosystems by controlling sea urchin populations.
Conservation efforts now require:
- Individual identification to track survival and movement
- Population monitoring without invasive marking techniques
- Growth tracking to understand recovery dynamics
- Re-sighting history to estimate survival rates
starBoard is a desktop application designed for both field and laboratory workflows, enabling researchers to:
Field Use (Photo-ID Pipeline):
- Identify individual sea stars from photographs using unique morphological features (arm patterns, colors, stripe characteristics, regenerating arms)
- Match new sightings against a gallery of known individuals using both manual annotation and AI-powered visual recognition
- Maintain encounter histories tracking where and when each individual was observed
- Make and review match decisions with full audit trails
Laboratory Use (Morphometric Analysis):
- Record calibrated morphometric measurements including body size, arm lengths, surface area, and estimated volume using webcam-based checkerboard calibration
Sunflower sea stars exhibit several characteristics that make them suitable for photo-identification:
| Feature | Description | Persistence |
|---|---|---|
| Arm number | 16-24 arms; exact count varies by individual | Stable (unless regenerating) |
| Short/regenerating arms | Position and relative size of small arms | Semi-stable (changes over months) |
| Stripe patterns | Radial stripes along arms with varying prominence | Stable |
| Color variation | Purple, orange, brown, pink; often multi-colored | Stable |
| Madreporite position | Sieve plate location relative to arm arrangement | Stable |
| Rosette patterns | Raised papulae clusters on dorsal surface | Stable |
| Category | Capability | Description |
|---|---|---|
| Data Management | Image Archive | Organized storage of gallery (known) and query (unknown) individuals |
| | Encounter Tracking | Date-stamped observation sessions with location metadata |
| | Batch Upload | Import multiple images with automatic folder discovery |
| | Merge/Revert | Combine confirmed matches; undo operations with full history |
| Annotation | 30+ Morphological Fields | Numeric measurements, color categories, ordinal traits |
| | Short Arm Coding | Position-specific notation for regenerating arms |
| | Extensible Vocabularies | User-defined color and location terms |
| | Best Photo Selection | Mark representative images for each individual |
| Search & Matching | First-Order Search | Rank gallery by metadata similarity to query |
| | Field-Weighted Scoring | Customize importance of each annotation field |
| | Visual Similarity | Deep learning-based appearance matching |
| | Fusion Ranking | Blend metadata and visual scores with adjustable weighting |
| Deep Learning | MegaStarID Model | ConvNeXt-small backbone trained on 140k+ wildlife images |
| | YOLO Segmentation | Automatic star detection and background removal |
| | Precomputation | Offline embedding extraction for instant queries |
| | Verification Model | Pair-wise match confirmation scoring |
| Visualization | Image Strips | Fast preview of all photos per individual |
| | Side-by-Side Compare | Synchronized pan/zoom for detailed comparison |
| | Match History | Timeline and matrix views of past decisions |
| | Interaction Logging | Analytics on annotation workflow |
| Category | Capability | Description |
|---|---|---|
| Morphometrics | Checkerboard Calibration | Sub-millimeter measurement accuracy via webcam |
| | Arm Detection | Automatic arm tip localization with manual correction |
| | Size Measurements | Area, arm lengths, major/minor axes, estimated volume |
| | Water Refraction Correction | Calibrate through water for aquarium measurements |
| | Depth Estimation | Optional 3D volume calculation |
```mermaid
flowchart TB
subgraph UI [User Interface - PySide6/Qt]
MainWindow --> TabSetup[Setup Tab]
MainWindow --> TabMorph[Morphometric Tab - Lab]
MainWindow --> TabFirst[First-Order Tab]
MainWindow --> TabSecond[Second-Order Tab]
MainWindow --> TabPast[Analytics & History Tab]
MainWindow --> TabDL[Deep Learning Tab]
end
subgraph Field [Field Use Components]
subgraph Data [Data Layer]
Archive[(Archive Directory)]
GalleryCSV[gallery_metadata.csv]
QueriesCSV[queries_metadata.csv]
ImageIndex[Image Index]
IDRegistry[ID Registry]
end
subgraph Search [Search Engine]
Engine[FirstOrderSearchEngine]
NumericScorer[Numeric Gaussian]
ColorScorer[Color LAB Distance]
TextScorer[BGE Embeddings]
ShortArmScorer[Position-Aware Matching]
end
subgraph DL [Deep Learning Module]
ReIDAdapter[ReID Adapter]
YOLOPreproc[YOLO Preprocessor]
EmbedCache[Embedding Cache]
SimLookup[Similarity Lookup]
Verification[Verification Model]
end
end
subgraph Lab [Laboratory Use Components]
subgraph Morph [Morphometric Tool]
Camera[Webcam Capture]
Checkerboard[Calibration Module]
YOLODetect[YOLO Detection]
PolarAnalysis[Polar Profile Analysis]
end
end
TabSetup --> Archive
TabSetup --> GalleryCSV
TabFirst --> Engine
Engine --> NumericScorer
Engine --> ColorScorer
Engine --> TextScorer
Engine --> ShortArmScorer
TabFirst --> SimLookup
TabDL --> ReIDAdapter
ReIDAdapter --> YOLOPreproc
ReIDAdapter --> EmbedCache
TabMorph --> Morph
TabSecond --> Verification
```
| Component | Location | Purpose |
|---|---|---|
| Main Application | `main.py` | Entry point, logging setup, Qt application |
| UI Layer | `src/ui/` | All tab implementations and widgets |
| Data Layer | `src/data/` | Archive paths, CSV I/O, validators, annotation schema |
| Search Engine | `src/search/` | Field scorers and ranking logic |
| Deep Learning | `src/dl/` | Model loading, precomputation, similarity lookup |
| MegaStarID Training | `star_identification/megastarid/` | Model training scripts |
| Morphometric Tool | `starMorphometricTool/` | Laboratory measurement application (webcam-based) |
- Python 3.9 or higher (3.9 recommended for best compatibility)
- Anaconda or Miniconda (recommended for environment management)
- 8GB+ RAM (16GB recommended for deep learning features)
- NVIDIA GPU with CUDA (optional but recommended for deep learning)
If you don't have Anaconda installed:
- Download from anaconda.com/download
- Run the installer for your operating system
- Verify installation:

```bash
conda --version
# Should output: conda 24.x.x or similar
```

```bash
# Create a new environment with Python 3.9
conda create -n starboard python=3.9 -y

# Activate the environment
conda activate starboard
```

```bash
# Core application dependencies
pip install PySide6 pandas numpy pillow scipy tqdm

# For text embedding search (optional but recommended)
pip install sentence-transformers
```

For GPU-accelerated visual re-identification:

```bash
# Install PyTorch with CUDA support (adjust cu121 for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install remaining DL dependencies
pip install -r requirements-dl.txt
```

`requirements-dl.txt` includes:

- `torch>=2.0.0` - Deep learning framework
- `torchvision>=0.15.0` - Image transforms
- `transformers>=4.30.0` - Transformer model support
- `ultralytics>=8.0.0` - YOLO instance segmentation
- `opencv-python>=4.8.0` - Image processing
- `albumentations>=1.3.0` - Data augmentation
- `scikit-learn>=1.2.0` - Clustering and metrics

For CPU-only installation:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements-dl.txt
```

YOLO Segmentation Model:
Place `starseg_best.pt` in:

```
star_identification/wildlife_reid_inference/starseg_best.pt
```

MegaStarID Re-ID Model: Place checkpoint files in:

```
star_identification/checkpoints/megastarid/
```
Contact the development team or check project releases for model weight downloads.
```bash
# Activate environment
conda activate starboard

# Launch the application
python main.py
```

The starBoard window should open. If you see the Deep Learning tab with status indicators, DL dependencies are correctly installed.

```bash
conda activate starboard
python main.py
```

starBoard uses a tabbed interface:
| Tab | Purpose | Use Context |
|---|---|---|
| Setup | Upload images, create new IDs, edit metadata | Field & Lab |
| Morphometric | Measure specimens via calibrated webcam | Laboratory only |
| First-Order | Search gallery by metadata/visual similarity | Field & Lab |
| Second-Order | Detailed side-by-side comparison | Field & Lab |
| Analytics & History | Review decision history, merge confirmed matches | Field & Lab |
| Deep Learning | Manage models, run precomputation | Field & Lab |
- Go to Setup tab
- Select Single Upload Mode
- Choose target: Gallery (known individual) or Queries (unknown)
- Click Choose Files and select images
- Enter or select the Individual ID (e.g., "anchovy", "star_001")
- Set the Encounter Date (observation date)
- Click Save
- Stay in Setup tab, switch to Metadata Edit Mode
- Select the ID you just created
- Fill in observable features:
- Number of arms (apparent and total)
- Tip-to-tip size (if measured)
- Short arm codes (positions of regenerating arms)
- Colors (stripe, arm, disc, rosette)
- Stripe characteristics (order, prominence, extent)
- Location and notes
- Click Save Edits
- Go to Deep Learning tab
- Ensure a model is registered and set as active
- Click Run Full Precomputation
- Wait for completion (progress bar shows ETA)
Precomputation extracts visual embeddings from all images. This is a one-time operation per dataset; only new images need processing later.
- Go to First-Order tab
- Select a Query ID to search for
- Adjust search settings:
- Check/uncheck annotation fields to include
- Enable Visual checkbox for DL similarity
- Adjust Fusion slider (0%=metadata only, 100%=visual only)
- Click Refresh to run the search
- Review ranked gallery candidates
- Click Pin for Compare on promising candidates
- Go to Second-Order tab
- Select the Query and Gallery IDs to compare
- Use synchronized pan/zoom viewers to examine details
- Record your decision: Yes (match), No (different), or Maybe (uncertain)
- Add notes and click Save
- Go to Analytics & History tab
- Review Yes decisions
- For confirmed matches, use Merge to combine Query into Gallery
- The Query images move to the Gallery individual's folder
starBoard uses a file-based archive with a standardized directory layout:
```
archive/
├── gallery/                      # Known individuals
│   ├── gallery_metadata.csv      # All annotations for gallery
│   ├── _embeddings/              # BGE text embeddings cache
│   ├── anchovy/                  # Individual "anchovy"
│   │   ├── 03_15_24/             # Encounter on March 15, 2024
│   │   │   ├── IMG_001.jpg
│   │   │   ├── IMG_002.jpg
│   │   │   └── ...
│   │   └── 06_22_24/             # Another encounter
│   │       └── ...
│   ├── pepperoni/                # Individual "pepperoni"
│   │   └── ...
│   └── ...
│
├── queries/                      # Unidentified individuals
│   ├── queries_metadata.csv
│   ├── Q_2024_001/
│   │   └── 04_10_24/
│   │       └── ...
│   └── ...
│
├── _dl_precompute/               # Deep learning cache
│   ├── _dl_registry.json         # Model registry
│   └── <model_key>/              # Per-model embeddings
│       ├── embeddings/
│       └── similarity/
│
├── reports/                      # Generated reports
│   └── past_matches_master.csv
│
└── starboard.log                 # Application log
```
Encounter folders follow the format `MM_DD_YY` or `MM_DD_YY_suffix`:
| Component | Format | Example |
|---|---|---|
| Month | 01-12 | 03 for March |
| Day | 01-31 | 15 |
| Year | 2-digit | 24 for 2024 |
| Suffix | Optional | _morning, _dive2 |
Examples:
- `03_15_24` - March 15, 2024
- `06_22_24_dock` - June 22, 2024 at dock site
- `11_03_24_pm` - November 3, 2024 afternoon session
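For scripting against the archive, a minimal parsing sketch for these folder names (illustrative; the actual validator in `src/data/` may be stricter):

```python
# Hedged sketch: parse an encounter folder name like "06_22_24_dock".
import re
from datetime import date

ENCOUNTER_RE = re.compile(r"^(\d{2})_(\d{2})_(\d{2})(?:_(.+))?$")

def parse_encounter(folder: str):
    """Return (date, suffix) for an MM_DD_YY[_suffix] folder name."""
    m = ENCOUNTER_RE.match(folder)
    if not m:
        raise ValueError(f"Not an encounter folder: {folder!r}")
    mm, dd, yy, suffix = m.groups()
    return date(2000 + int(yy), int(mm), int(dd)), suffix

# parse_encounter("06_22_24_dock") -> (datetime.date(2024, 6, 22), "dock")
```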
The annotation system supports 30+ typed fields organized into groups:
| Field | Type | Range | Description |
|---|---|---|---|
| `num_apparent_arms` | Integer | 0-30 | Arms visible in photos |
| `num_total_arms` | Integer | 0-30 | Total arms including hidden |
| `tip_to_tip_size_cm` | Float | 0-150 | Maximum diameter |
Regenerating or abnormally short arms are encoded with position and severity:
Format: severity(position), severity(position), ...
| Code | Severity | Description |
|---|---|---|
| `short` | Normal short | Noticeably shorter than neighbors |
| `small` | Small short | Very short, ~25-50% of normal |
| `tiny` | Tiny | Barely visible, <25% of normal |
Examples:
- `short(3)` - Arm 3 is short
- `tiny(7), small(12)` - Arm 7 is tiny, arm 12 is small
- `short(1), short(2), tiny(3)` - Three affected arms
Arm positions are numbered 1 to N clockwise from the madreporite.
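A minimal sketch of turning a code string into the `{position: severity}` mapping used by the matcher (illustrative only; starBoard's own parser may differ):

```python
# Hedged sketch: parse short-arm codes like "tiny(7), small(12)".
import re

CODE_RE = re.compile(r"(short|small|tiny)\s*\(\s*(\d+)\s*\)")

def parse_short_arm_code(code: str) -> dict[int, str]:
    """Map arm position -> severity label."""
    return {int(pos): sev for sev, pos in CODE_RE.findall(code or "")}

# parse_short_arm_code("short(1), short(2), tiny(3)")
# -> {1: "short", 2: "short", 3: "tiny"}
```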
All color fields use an extensible vocabulary:
| Field | Description |
|---|---|
| `overall_color` | General impression |
| `arm_color` | Primary arm coloration |
| `stripe_color` | Color of radial stripes |
| `central_disc_color` | Central body region |
| `papillae_central_disc_color` | Papillae on disc |
| `rosette_color` | Raised papulae clusters |
| `papillae_stripe_color` | Papillae in stripe regions |
| `madreporite_color` | Sieve plate color |
Default color vocabulary:
white, yellow, orange, peach, pink, red, maroon, burgundy, purple, mauve, brown, tan, light-purple, dark-orange, burnt-orange, etc.
| Field | Options |
|---|---|
| `stripe_order` | None (0), Mixed (1), Irregular (2), Regular (3) |
| `stripe_prominence` | None (0), Weak (1), Medium (2), Strong (3), Strongest (4) |
| `stripe_extent` | None (0), Quarter (0.25), Halfway (0.5), Three-quarters (0.75), Full (1.0) |
| `stripe_thickness` | None (0), Thin (1), Medium (2), Thick (3) |
| `arm_thickness` | Thin (0), Medium (1), Thick (2) |
| `rosette_prominence` | Weak (0), Medium (1), Strong (2) |
| `reticulation_order` | None (0), Mixed (1), Meandering (2), Train tracks (3) |
| Field | Options | Purpose |
|---|---|---|
| `madreporite_visibility` | Not visible (0) to Excellent (3) | Can you see the sieve plate? |
| `anus_visibility` | Not visible (0) to Excellent (3) | Can you see the anus (for orientation)? |
| `postural_visibility` | Very poor (0) to Excellent (4) | Is the star flat and fully visible? |
| Field | Description |
|---|---|
| `location` | Where the star was observed (auto-complete from history) |
| `unusual_observation` | Any notable features or behaviors |
| `health_observation` | Signs of wasting, lesions, or disease |
These fields are automatically populated when measurements are imported from the morphometric tool:
| Field | Unit | Description |
|---|---|---|
| `morph_num_arms` | count | Arms detected by YOLO |
| `morph_area_mm2` | mm² | Calibrated surface area |
| `morph_major_axis_mm` | mm | Fitted ellipse major axis |
| `morph_minor_axis_mm` | mm | Fitted ellipse minor axis |
| `morph_mean_arm_length_mm` | mm | Average arm length |
| `morph_max_arm_length_mm` | mm | Longest arm |
| `morph_tip_to_tip_mm` | mm | Maximum diameter |
| `morph_volume_mm3` | mm³ | Estimated volume (requires depth) |
The First-Order search engine computes similarity scores between a query and all gallery individuals.
For each gallery candidate:
- Per-field scoring: Each enabled field produces a similarity score in [0, 1]
- Presence filtering: Fields contribute only if both query and candidate have values
- Weighted average: Final score = weighted mean of contributing field scores
Score = Σ(weight_f × score_f × present_f) / Σ(weight_f × present_f)
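In code form, this is a presence-filtered weighted mean; a minimal sketch (the production logic in `src/search/engine.py` is richer):

```python
# Hedged sketch of the weighted-average score aggregation.
from typing import Dict, Tuple

def aggregate(field_results: Dict[str, Tuple[float, bool]],
              weights: Dict[str, float]) -> float:
    """field_results maps field -> (score in [0, 1], present flag)."""
    num = den = 0.0
    for field, (score, present) in field_results.items():
        if not present:      # presence filtering: both sides must have values
            continue
        w = weights.get(field, 1.0)
        num += w * score
        den += w
    return num / den if den else 0.0

# Example: arm count weighted 2x; location absent, so it contributes nothing.
# aggregate({"num_total_arms": (0.95, True), "overall_color": (0.60, True),
#            "location": (0.00, False)}, {"num_total_arms": 2.0})  # -> ~0.83
```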
| Field Type | Scorer | Algorithm |
|---|---|---|
| Numeric | `NumericGaussianScorer` | Gaussian decay: `exp(-(q - g)² / (2σ²))` |
| Ordinal | `NumericGaussianScorer` | Same as numeric (order matters) |
| Color | `ColorSpaceScorer` | CIELAB perceptual distance with configurable threshold |
| Text | `TextEmbeddingBGEScorer` | BGE-small embedding cosine similarity |
| Text (fallback) | `TextNgramScorer` | Character 3-5 gram Jaccard similarity |
| Short Arm | `ShortArmCodeScorer` | Position-aware bipartite matching with Hungarian algorithm |
The short arm scorer handles the complex task of comparing regenerating arm patterns:
- Parse codes into `{position: severity}` pairs
- Position similarity: Gaussian decay on circular distance (σ=1.0)
- Severity similarity: Ordinal matching (same=1.0, off-by-one=0.5)
- Optimal pairing: Hungarian algorithm finds best arm-to-arm matching
- Normalization: Divide by max(|query arms|, |gallery arms|) to penalize missing arms
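A hedged sketch of this pipeline using SciPy's Hungarian solver; combining the position and severity similarities as a product is an assumption here, not the documented formula:

```python
# Illustrative short-arm matching; not starBoard's actual ShortArmCodeScorer.
import numpy as np
from scipy.optimize import linear_sum_assignment

SEVERITY_RANK = {"short": 0, "small": 1, "tiny": 2}

def arm_similarity(qp, qs, gp, gs, n_arms=20, sigma=1.0):
    d = abs(qp - gp)
    d = min(d, n_arms - d)                       # circular arm-position distance
    pos_sim = np.exp(-(d ** 2) / (2 * sigma ** 2))
    gap = abs(SEVERITY_RANK[qs] - SEVERITY_RANK[gs])
    sev_sim = {0: 1.0, 1: 0.5}.get(gap, 0.0)     # same=1.0, off-by-one=0.5
    return pos_sim * sev_sim

def short_arm_score(query: dict, gallery: dict, n_arms=20) -> float:
    """query/gallery: {position: severity} dicts, e.g. {3: "short", 7: "tiny"}."""
    if not query or not gallery:
        return 0.0
    q, g = list(query.items()), list(gallery.items())
    sim = np.array([[arm_similarity(qp, qs, gp, gs, n_arms)
                     for gp, gs in g] for qp, qs in q])
    rows, cols = linear_sum_assignment(-sim)     # Hungarian: maximize similarity
    return sim[rows, cols].sum() / max(len(q), len(g))  # penalize unmatched arms
```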
The deep learning system uses a precomputation-first architecture:
```
┌────────────────────────────────────────────────────┐
│                 OFFLINE (One-Time)                 │
├────────────────────────────────────────────────────┤
│ 1. YOLO Preprocessing                              │
│    - Detect star in image                          │
│    - Segment from background                       │
│    - Crop and resize to 640px                      │
│    - Cache to precompute_cache/                    │
│                                                    │
│ 2. Embedding Extraction                            │
│    - Load cached images                            │
│    - Apply test transforms (384px, normalize)      │
│    - Run through ConvNeXt + embedding head         │
│    - Apply test-time augmentation (flip)           │
│    - Aggregate with outlier rejection              │
│                                                    │
│ 3. Similarity Matrix                               │
│    - Compute all pairwise cosine similarities      │
│    - Optional: k-reciprocal re-ranking             │
│    - Save to archive/_dl_precompute/               │
└────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────┐
│               ONLINE (At Query Time)               │
├────────────────────────────────────────────────────┤
│ - Load precomputed similarity matrix               │
│ - Look up query row                                │
│ - Return sorted gallery scores                     │
│ - No neural network inference required!            │
└────────────────────────────────────────────────────┘
```
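To make the online step concrete, here is roughly what a lookup could reduce to, assuming the matrix and ID lists are stored as NumPy arrays (the actual on-disk layout under `archive/_dl_precompute/` is not documented here):

```python
# Hedged sketch of the online lookup; file names and layout are assumptions.
import numpy as np

def lookup_visual_scores(query_id: str, precompute_dir: str) -> dict[str, float]:
    sim = np.load(f"{precompute_dir}/similarity.npy")       # (n_query, n_gallery)
    meta = np.load(f"{precompute_dir}/ids.npz", allow_pickle=True)
    query_ids = list(meta["query_ids"])
    gallery_ids = list(meta["gallery_ids"])
    row = sim[query_ids.index(query_id)]                    # no model inference
    return {gid: float(s)
            for gid, s in sorted(zip(gallery_ids, row), key=lambda kv: -kv[1])}
```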
MegaStarID uses a ConvNeXt-small backbone, which was found to provide the best performance for sea star re-identification:
| Component | Configuration |
|---|---|
| Backbone | ConvNeXt-small |
| Input Size | 384 × 384 pixels |
| Embedding Dim | 512 |
| Pooling | GeM (Generalized Mean) |
| Training Loss | Circle Loss (0.7) + Triplet Loss (0.3) |
Why ConvNeXt?
- Best empirical performance on sea star re-identification tasks
- Modern CNN architecture with transformer-inspired design
- Efficient inference on both GPU and CPU
- Robust to the texture and pattern variations in sea star imagery
Alternative backbones available: DenseNet-121, DenseNet-169, SwinV2-tiny, ResNet-50
Training Data:
- Wildlife10k: 140,000 images across 37 species for broad feature learning
- star_dataset: 8,000 sunflower sea star images for domain-specific fine-tuning
When aggregating embeddings from multiple images of an individual:
- Compute pairwise similarities between all image embeddings
- For each image, find its nearest-neighbor similarity
- Compute median and MAD of NN similarities
- Flag images >3 MAD below median as outliers
- Aggregate only inlier embeddings into final centroid
This removes:
- Wrong YOLO detections (different animal)
- Severely corrupted images
- Mislabeled images
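A minimal sketch of this MAD filter, assuming L2-normalized embeddings (illustrative only, not the shipped implementation):

```python
# Hedged sketch: robust aggregation of per-image embeddings into a centroid.
import numpy as np

def aggregate_embeddings(embs: np.ndarray) -> np.ndarray:
    """embs: (n_images, dim), rows L2-normalized."""
    if len(embs) <= 2:
        return embs.mean(axis=0)
    sim = embs @ embs.T                    # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)
    nn_sim = sim.max(axis=1)               # nearest-neighbor similarity per image
    med = np.median(nn_sim)
    mad = np.median(np.abs(nn_sim - med)) or 1e-8
    inliers = nn_sim >= med - 3 * mad      # drop images >3 MAD below the median
    centroid = embs[inliers].mean(axis=0)
    return centroid / np.linalg.norm(centroid)
```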
| Profile | Device | Batch Size | TTA | Speed |
|---|---|---|---|---|
| GPU Quality | CUDA | 16 | H+V flip | ~50 img/s |
| GPU Fast | CUDA | 32 | H flip only | ~80 img/s |
| CPU Quality | CPU | 4 | H flip only | ~4 img/s |
| CPU Fast | CPU | 8 | None | ~8 img/s |
The Morphometric tab provides calibrated measurements via webcam, designed for laboratory settings where specimens can be positioned under a fixed camera with a calibration checkerboard:
Equipment needed:
- USB webcam (tested with Logitech C270)
- Flat, non-square checkerboard with known square dimensions
- Stable camera mount (tripod or fixed position)
- Consistent lighting (diffuse recommended)
- Optional: shallow water container for aquatic measurements
- Position checkerboard flat in camera view (or underwater if measuring through water)
- Enter parameters: rows, columns, square size (mm)
- Detect checkerboard to establish pixel-to-mm mapping
- Keep camera fixed for all subsequent measurements
The calibration corrects for:
- Perspective distortion (oblique viewing angle)
- Lens distortion (barrel/pincushion)
- Water refraction (if measuring through water in an aquarium setting)
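As a reference point, a minimal pixel-to-mm calibration sketch with OpenCV; it handles only the perspective part via a homography (lens-distortion and refraction corrections are omitted), and all parameter values are assumptions:

```python
# Hedged sketch of checkerboard calibration; not the tool's exact routine.
import cv2
import numpy as np

def calibrate_board_homography(frame_bgr, rows=6, cols=9, square_mm=10.0):
    """Return a homography mapping image pixels to mm on the board plane.

    rows/cols count INNER corners of the checkerboard, not squares.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (cols, rows), None)
    if not found:
        return None
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
    # Ideal inner-corner grid in millimeters (board-plane coordinates)
    obj_mm = np.array([[c * square_mm, r * square_mm]
                       for r in range(rows) for c in range(cols)], np.float32)
    H, _ = cv2.findHomography(corners.reshape(-1, 2), obj_mm)
    return H  # use cv2.perspectiveTransform to map pixel points -> mm
```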
In the laboratory:
- Place specimen in center of checkerboard area (live or preserved)
- Ensure star is flat and all arms are visible
- Start YOLO detection
- Capture when detection box appears
- Run morphometrics analysis
- Adjust parameters:
- Smoothing: Reduce contour noise
- Prominence: Arm tip detection sensitivity
- Distance: Minimum separation between arms
- Click to add/Shift+click to remove arm tips
- Save with initials and notes
Tips for best results:
- Measurement accuracy decreases toward checkerboard edges—keep specimen centered
- Use consistent, diffuse lighting to avoid shadows
- For specimens with very short arms, manually adjust detection parameters
- Recalibrate if camera position changes
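The Smoothing/Prominence/Distance controls above map naturally onto SciPy peak finding over a polar radius profile; a hedged sketch of the arm-counting idea (the tool's actual analysis may differ):

```python
# Illustrative polar-profile arm counting.
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import find_peaks

def count_arms(contour_xy: np.ndarray, smoothing=5, prominence=0.05, distance=10):
    """contour_xy: (n, 2) outline points of the segmented star."""
    d = contour_xy - contour_xy.mean(axis=0)
    radius = np.hypot(d[:, 0], d[:, 1])
    order = np.argsort(np.arctan2(d[:, 1], d[:, 0]))      # radius vs. angle
    profile = uniform_filter1d(radius[order], smoothing)  # "Smoothing" control
    profile = profile / profile.max()
    wrapped = np.concatenate([profile, profile[:distance]])  # handle 0/2π seam
    peaks, _ = find_peaks(wrapped, prominence=prominence, distance=distance)
    return int(np.sum(peaks < len(profile)))              # one peak per arm tip
```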
| Measurement | Unit | Method |
|---|---|---|
| Total area | mm² | Calibrated mask pixel count |
| Number of arms | count | Polar profile peak detection |
| Individual arm lengths | mm | Center to each tip |
| Mean arm length | mm | Average of all arms |
| Max arm length | mm | Longest arm |
| Major axis | mm | Fitted ellipse |
| Minor axis | mm | Fitted ellipse |
| Tip-to-tip diameter | mm | Maximum across tips |
| Volume | mm³ | Depth estimation (optional) |
starBoard supports two complementary workflows:
| Workflow | Location | Primary Activities |
|---|---|---|
| Field Photo-ID | Dive sites, docks, tidepools | Photograph encounters, upload images, annotate features, search for matches, record decisions |
| Laboratory Morphometrics | Lab with webcam setup | Calibrated size measurements, arm length analysis, growth tracking |
Typical research workflow:
- Field work → Collect photographs of individuals in situ
- Post-dive processing → Upload to starBoard, annotate, run matches
- Laboratory sessions → Bring specimens in for precise measurements (if applicable)
- Data integration → Morphometric data automatically populates annotation fields
The archive directory is automatically created on first launch at:
`<project>/archive/`

Or set a custom location by modifying `src/data/archive_paths.py`.
Start with known individuals from previous studies, captive animals, or distinctively marked wild individuals:
- Gather reference photos - Best images of each known individual
- Create consistent IDs - Use memorable names or systematic codes
- Batch upload - Use Setup tab's batch mode for efficient import
- Annotate thoroughly - Complete all observable fields for best matching
Recommended minimum per individual:
- 3-5 images from different angles
- At least one clear dorsal view
- Complete arm count
- Short arm codes (if applicable)
- Primary colors noted
After returning from fieldwork:
- Transfer images from camera/phone
- Open starBoard, go to Setup tab
- For each new sighting:
- Select Queries as target (unknown until confirmed)
- Create new Query ID (e.g., `Q_2024_047`)
- Set encounter date
- Upload images
Tip: Use Batch Upload if you have images organized by individual in folders.
Before detailed annotation:
- Review image quality - Are key features visible?
- Note obvious candidates - Does this look like a known individual?
- Flag problematic images - Poor lighting, partial views, etc.
Focus on the most discriminating features first:
- Arm count - Count carefully; short arms are easy to miss
- Short arm positions - Critical for matching
- Overall color - First impression
- Stripe prominence - None/weak/medium/strong
- Location - For spatial analysis
Quality annotations >> Quantity. A few well-annotated individuals are more valuable than many poorly annotated ones.
- Go to First-Order tab
- Select your Query ID
- Initial search settings:
- Enable: arm counts, short arm code, overall color
- Visual: ON (if precomputed)
- Fusion: 50% (balanced)
- Review top 10-20 candidates
- Pin promising matches for detailed comparison
- Go to Second-Order tab
- For each pinned candidate:
- Synchronize viewers on same features
- Compare arm-by-arm
- Check stripe patterns
- Note color consistency
- Record decision: Yes / No / Maybe
- Add notes explaining your reasoning
| Score Range | Interpretation |
|---|---|
| 0.8 - 1.0 | Strong match - Review carefully |
| 0.6 - 0.8 | Moderate match - Worth comparing |
| 0.4 - 0.6 | Weak match - Check if few fields contributed |
| 0.0 - 0.4 | Poor match - Unlikely to be same individual |
Each candidate shows which fields contributed and their individual scores:
```
Score: 0.72 (5 fields)
├── num_total_arms: 0.95
├── short_arm_code: 0.88
├── overall_color: 0.60
├── stripe_prominence: 0.55
└── location: 0.62
```
Interpretation:
- High arm/short-arm scores suggest morphological similarity
- Lower color scores may indicate lighting differences
- Check `k_contrib` (the number of contributing fields); more is better
| Scenario | Recommendation |
|---|---|
| Good photos, complete annotations | Use 50/50 fusion |
| Poor photos, good annotations | Weight metadata higher (fusion < 30%) |
| Good photos, sparse annotations | Weight visual higher (fusion > 70%) |
| Query has few distinctive features | Rely more on visual similarity |
| Gallery individual has changed over time | Rely more on metadata (arm codes change less than appearance) |
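Under the hood, the fusion slider plausibly reduces to a convex combination of the two normalized scores; a sketch under that assumption (the app defines the exact blend):

```python
def fused_score(meta_score: float, visual_score: float, alpha: float) -> float:
    """alpha = fusion slider: 0.0 -> metadata only, 1.0 -> visual only."""
    return (1.0 - alpha) * meta_score + alpha * visual_score
```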
When you're confident a Query matches a Gallery individual:
- Go to Analytics & History tab
- Find the Yes decision
- Select the Gallery ID from the merge dropdown
- Click Merge
What happens:
- Query images move to Gallery individual's folder
- Query metadata is preserved in CSV
- Decision is logged in audit trail
- Query ID becomes inactive
For Maybe decisions:
- Wait for additional sightings
- Collect more photos at next encounter
- Focus annotation on distinguishing features
- Re-run comparison when more data available
Tips:
- Don't rush to merge uncertain matches
- A false positive (merging different individuals) is worse than a false negative (keeping them separate)
- Use notes field to record what additional evidence would confirm the match
If you merged incorrectly:
- Go to Analytics & History tab → Revert section
- Select the Gallery ID
- Choose the batch to revert
- Click Revert Batch
Images return to original Query location.
| Variable | Default | Description |
|---|---|---|
| `STARBOARD_LOG_LEVEL` | `INFO` | Logging verbosity: DEBUG, INFO, WARNING, ERROR |
| `STARBOARD_DUMP_RANK_CSV` | (unset) | Set to 1 to export each ranking to CSV |
| `STARBOARD_SESSION_ID` | (auto) | Custom session ID for log correlation |
Example:
```bash
# Windows PowerShell
$env:STARBOARD_LOG_LEVEL = "DEBUG"
python main.py

# Linux/macOS
STARBOARD_LOG_LEVEL=DEBUG python main.py
```

Access via First-Order tab → Config button:
| Setting | Description |
|---|---|
| Enable/Disable Fields | Choose which fields contribute to ranking |
| Field Weights | Adjust relative importance (1.0 = normal) |
| Numeric Offsets | Shift query values for "what if" searches |
Preset Configurations:
| Preset | Fields Included |
|---|---|
| Average (All) | All non-empty fields equally weighted |
| Size Only | num_total_arms, tip_to_tip_size_cm |
| Colors Only | All color fields |
| Morphology | Arms, short arm codes, stripe characteristics |
| Text Only | Location and observation notes |
| Log | Location | Contents |
|---|---|---|
| Main log | `archive/starboard.log` | Application events, search results |
| Interaction log | `archive/logs/*.csv` | User interaction analytics |
| Rank exports | `archive/logs/first_order_*.csv` | Per-query ranking details |
The search engine can be used directly for scripting or testing:
```python
from src.search.engine import FirstOrderSearchEngine

# Initialize and build indices
engine = FirstOrderSearchEngine()
engine.rebuild()

# Run a search
results = engine.rank(
    query_id="Q_2024_001",
    include_fields={"num_total_arms", "short_arm_code", "overall_color"},
    equalize_weights=True,
    top_k=20,
)

# Process results
for item in results:
    print(f"{item.gallery_id}: {item.score:.3f} ({item.k_contrib} fields)")
    for field, score in item.field_breakdown.items():
        print(f"  {field}: {score:.3f}")
```

```python
from src.dl.similarity_lookup import get_visual_scores
from src.dl.registry import DLRegistry

# Get active model
registry = DLRegistry.load()
model_key = registry.active_model

# Get visual similarity scores for a query
if model_key:
    scores = get_visual_scores("Q_2024_001", model_key)
    # scores is Dict[str, float] mapping gallery_id -> similarity
    for gid, score in sorted(scores.items(), key=lambda x: -x[1])[:10]:
        print(f"{gid}: {score:.3f}")
```

For advanced users who want to train custom models:
```bash
cd star_identification

# Train with temporal split (recommended)
python -m temporal_reid.train \
    --dataset-root ./star_dataset \
    --epochs 25 \
    --batch-size 48 \
    --gpus 0,1

# Grid search over loss configurations
python -m temporal_reid.grid_search \
    --dataset-root ./star_dataset \
    --epochs 25
```

```bash
cd star_identification

# Pre-train on Wildlife10k
python -m megastarid.pretrain --epochs 50

# Fine-tune on star_dataset
python -m megastarid.finetune \
    --checkpoint checkpoints/megastarid/pretrain/best.pth \
    --epochs 100

# Or co-train on both
python -m megastarid.cotrain --epochs 100 --star-batch-ratio 0.3
```

See `star_identification/megastarid/readme.md` and `star_identification/temporal_reid/readme.md` for detailed training documentation.
Implement the FieldScorer protocol:
```python
from typing import Any, Dict, Tuple

from src.search.interfaces import FieldScorer  # protocol this class satisfies


class MyCustomScorer:
    def __init__(self, name: str = "my_field") -> None:
        self.name = name
        self._index: Dict[str, Any] = {}

    def build_gallery(self, gallery_rows_by_id: Dict[str, Dict]) -> None:
        """Build index from gallery data."""
        self._index = {}
        for gid, row in gallery_rows_by_id.items():
            value = row.get(self.name, "").strip()
            if value:
                self._index[gid] = self._parse(value)

    def prepare_query(self, query_row: Dict) -> Any:
        """Prepare query state from row data."""
        value = query_row.get(self.name, "").strip()
        return self._parse(value) if value else None

    def has_query_signal(self, query_state: Any) -> bool:
        """Check if query has usable data for this field."""
        return query_state is not None

    def score_pair(self, query_state: Any, gallery_id: str) -> Tuple[float, bool]:
        """Score query against gallery item. Returns (score, present)."""
        if gallery_id not in self._index:
            return 0.0, False
        gallery_state = self._index[gallery_id]
        score = self._compute_similarity(query_state, gallery_state)
        return score, True

    def _parse(self, value: str) -> Any:
        """Parse string value into internal representation."""
        return value

    def _compute_similarity(self, q: Any, g: Any) -> float:
        """Compute similarity in [0, 1]."""
        return 1.0 if q == g else 0.0
```

Register in `src/search/engine.py`:
```python
def _build_scorers(self, use_bge: bool) -> None:
    # ... existing scorers ...
    self.scorers["my_field"] = MyCustomScorer("my_field")
```

Symptoms: Error message when accessing DL features
Solutions:
- Check checkpoint path exists: `star_identification/checkpoints/megastarid/best.pth`
- Verify PyTorch version compatibility
- Check logs for specific error: `archive/starboard.log`
Symptoms: Precomputation fails at Phase 1
Solutions:
- Install ultralytics: `pip install ultralytics`
- Check model file exists: `star_identification/wildlife_reid_inference/starseg_best.pt`
Symptoms: Progress shows very low img/s rate
Solutions:
- Use "Fast" speed mode on CPU
- First run is slowest (building image cache)
- Subsequent runs skip Phase 1 if cache exists
- Consider GPU installation for 10x speedup
Symptoms: DL tab shows "Not Available" status
Solutions:
- Install DL dependencies: `pip install -r requirements-dl.txt`
- Verify PyTorch imports work: `python -c "import torch; print(torch.__version__)"`
Symptoms: Slow/jerky image preview when scrolling
Solutions:
- Image strip uses scaled decoding; very large images may still be slow
- Reduce preview size in code if needed
- Use SSD storage for archive
Symptoms: First-order tab shows empty lineup
Causes:
- Query has no values for enabled fields
- No gallery individuals have matching fields
- Engine not rebuilt after data changes
Solutions:
- Enable more fields in search settings
- Check query has annotations saved
- Click "Rebuild" button
Symptoms: Error when trying to merge Query into Gallery
Solutions:
- Ensure Query ID exists and has images
- Check Gallery ID is valid
- Verify no file permission issues
- Check logs for specific error
- Check logs: `archive/starboard.log` contains detailed error information
- Enable debug mode: Set `STARBOARD_LOG_LEVEL=DEBUG`
- File an issue: Include log excerpts and steps to reproduce
Contributions are welcome! Areas of interest:
- Field scorers for additional metadata types
- UI improvements for annotation efficiency
- Model architectures for improved re-identification
- Documentation and tutorials
```bash
# Clone repository
git clone <repository-url>
cd starBoard

# Create development environment
conda create -n starboard-dev python=3.9 -y
conda activate starboard-dev

# Install all dependencies
pip install -r requirements-dl.txt
pip install pytest black flake8

# Run application
python main.py
```

- Follow PEP 8 guidelines
- Use type hints for function signatures
- Document public APIs with docstrings
- Log important operations with appropriate levels
If you use starBoard in your research, please cite:
```bibtex
@software{starboard2024,
  title = {starBoard: Photo-ID and Morphometric Analysis Platform for Sea Star Conservation},
  year = {2024},
  url = {https://github.com/[repository]}
}
```

- Friday Harbor Laboratories - Field site and specimen access
- Wildlife10k Dataset - Pre-training data for visual re-identification
- Meta AI Research - ConvNeXt architecture
- Ultralytics - YOLO implementation
- MegaDetector - Wildlife detection in camera trap images
- Wildlife ReID-10k - Benchmark dataset for wildlife re-identification
- Wildbook - Individual animal identification platform
[License information to be added]
starBoard is developed for sunflower sea star conservation research. For questions or collaborations, please contact the development team.