LangRS

A modern, extensible Python package for zero-shot segmentation of aerial images using Rex-Omni or Grounding DINO with the Segment Anything Model (SAM).

Introduction

LangRS is a Python package for remote sensing image segmentation. It combines bounding box detection, semantic segmentation, and outlier rejection to segment geospatial images precisely and reliably. It follows modern Python practices and SOLID principles, and its modular architecture makes it easy to extend.

How it works

[Figure: How it works]

Performance Comparison

[Figure: Package performance vs. ground truth]

[Figure: Direct comparison with the SAMGEO package]
Features

  • Bounding Box Detection: Locate objects in remote sensing images with a sliding window approach.
  • Outlier Detection: Filter out anomalous detections based on bounding box area, using statistical and machine learning methods (z-score, IQR, SVM, robust covariance, LOF, isolation forest).
  • Non-Max Suppression: Apply NMS to the detected bounding boxes. This can reduce accuracy slightly, but it greatly increases inference speed and lowers memory usage.
  • Area Calculation: Compute and rank bounding boxes by their areas.
  • Image Segmentation: Detect and extract objects based on text prompts using Rex-Omni (default) or Grounding DINO, plus SAM.
  • Modern Architecture: Built with SOLID principles, dependency injection, and abstract base classes for easy extension.
  • Geospatial Support: Automatic CRS extraction and shapefile export for bounding boxes and masks.
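
To make the area-ranking and non-max suppression features concrete, here is a minimal, library-free sketch. It assumes boxes are (x_min, y_min, x_max, y_max) tuples with one confidence score each. The names box_area, iou, and nms are illustrative and are not part of the LangRS API, and the 0.5 IoU threshold is an arbitrary choice.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes get area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already-kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 70)]
scores = [0.9, 0.8, 0.7]

kept = nms(boxes, scores)  # indices of boxes that survive suppression
# Rank the surviving boxes from largest to smallest area.
ranked = sorted(kept, key=lambda i: box_area(boxes[i]), reverse=True)
```

The second box overlaps the first heavily and has a lower score, so it is suppressed. This is the trade-off the feature list describes: fewer boxes reach the segmentation step, at the risk of discarding a valid neighbouring object.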

Installation

PyPI installs

LangRS includes a direct Rex-Omni implementation under langrs/rex_omni/ (wrapper, parser, tasks, and visualization helpers). For standard LangRS detection workflows, you do not need to clone the upstream Rex-Omni repository separately. Optional heavy GPU dependencies (for example flash-attn and/or vLLM) may still be required depending on your environment and chosen backend. For that reason:

  • pip install langrs — core runtime.
  • pip install "langrs[rex-omni]" — optional heavy GPU extras for the default Rex-Omni detection path (e.g. flash-attn / vLLM).
  • pip install "langrs[dino]" — optional Grounding DINO only. Do not combine with [rex-omni] in the same environment if you hit version conflicts; use separate virtual environments.

Recommended CUDA install for Rex-Omni

Rex-Omni with the transformers backend is CUDA-only in LangRS. Install a compatible CUDA PyTorch build first, then install Rex-Omni extras:

pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install "langrs[rex-omni]"

If your cluster/driver stack is different, choose a matching CUDA wheel from the PyTorch index URL and keep langrs[rex-omni] as the second step.

Licensing / notices: see THIRD_PARTY_NOTICES.md.

Migration (previous single requirements.txt): Grounding DINO is no longer in the default dependency set. To keep the old detector, use pip install "langrs[dino]" and pass detection_model="grounding_dino" when constructing LangRS.

Install from source (development)

git clone https://github.com/MohanadDiab/langrs.git
cd langrs
pip install -r requirements.txt
pip install -e ".[rex-omni]"

Optional files: requirements-core.txt (runtime core), requirements-dino.txt (DINO-only pins), requirements-dev.txt (pytest).

Usage

Rex-Omni Prompt Format

When using the default detection_model="rex_omni", pass text_prompt as a comma-separated category list (for example "building, road, solar panel"). This aligns with Rex-Omni's category-driven detection prompt path.
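
If the categories live in a Python list, a small helper can build the comma-separated string. This is only a convenience sketch: build_category_prompt is a hypothetical name and not part of LangRS.

```python
from typing import Iterable


def build_category_prompt(categories: Iterable[str]) -> str:
    """Join category names into a comma-separated prompt.

    Blank entries are skipped and case-insensitive duplicates are dropped,
    keeping the first spelling seen.
    """
    seen = set()
    cleaned = []
    for name in categories:
        name = name.strip()
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            cleaned.append(name)
    return ", ".join(cleaned)


text_prompt = build_category_prompt(
    ["building", " road ", "solar panel", "Building", ""]
)
```

The resulting string can then be passed as text_prompt to the detection calls shown in the sections that follow.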

Quick Start

Here is the simplest way to use LangRS:

from langrs import LangRS

# Create LangRS with default settings
langrs = LangRS(output_path="output")

# Run the complete pipeline
masks = langrs.run_full_pipeline(
    image_source="path_to_your_tif_file",
    text_prompt="roof",
    window_size=600,
    overlap=300,
    box_threshold=0.25,
    text_threshold=0.25,
)

Step-by-Step Usage

For more control over the pipeline:

from langrs import LangRS

# Create LangRS
langrs = LangRS(output_path="output")

# Load image
langrs.load_image("path_to_your_tif_file")

# Detect objects
boxes = langrs.detect_objects(
    text_prompt="roof",
    window_size=600,
    overlap=300,
    box_threshold=0.25,
    text_threshold=0.25,
)

# Apply outlier rejection.
# Returns a dict keyed by method name:
# ['zscore', 'iqr', 'svm', 'svm_sgd', 'robust_covariance', 'lof', 'isolation_forest']
# Each value holds the boxes from the previous step that survive that method's filter.
bboxes_filtered = langrs.filter_outliers()

# Select the boxes kept by the z-score method
bboxes_zscore = bboxes_filtered['zscore']

# Generate segmentation masks for the filtered bounding boxes
masks = langrs.segment(boxes=bboxes_zscore)

Advanced Usage with Custom Configuration

from langrs import LangRS, LangRSConfig

# Create custom configuration
config = LangRSConfig()
config.detection.box_threshold = 0.25
config.detection.text_threshold = 0.25
config.detection.window_size = 600
config.detection.overlap = 300
config.outlier_detection.zscore_threshold = 2.5

# Create LangRS with custom settings
langrs = LangRS(
    output_path="output",
    device="cpu",  # or "cuda" for GPU
    config=config,
)

# Use LangRS
masks = langrs.run_full_pipeline("path_to_your_tif_file", "roof")

Input Parameters

LangRS() Initialization:

  • output_path: Directory to save output files
  • detection_model: Name of detection model (default: "rex_omni")
  • segmentation_model: Name of segmentation model (default: "sam")
  • device: Device to use ('cpu' or 'cuda', default: auto-detect)
  • config: Optional LangRSConfig object
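
For readers wondering what device auto-detection typically amounts to, the sketch below shows one common pattern: use CUDA when PyTorch reports it as available, otherwise fall back to the CPU. The pick_device helper is hypothetical, and LangRS's own detection logic may differ.

```python
import importlib.util
from typing import Optional


def pick_device(preferred: Optional[str] = None) -> str:
    """Return an explicit device if given, else 'cuda' when available, else 'cpu'."""
    if preferred is not None:
        return preferred
    # Without PyTorch installed there is nothing to query, so assume CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
```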

detect_objects():

  • text_prompt: Text description of objects to detect. For rex_omni, prefer comma-separated categories (for example "building, road").
  • window_size (int): Size of each chunk for processing. Default is 500.
  • overlap (int): Overlap size between chunks. Default is 200.
  • box_threshold (float): Confidence threshold for box detection. Default is 0.5.
  • text_threshold (float): Confidence threshold for text detection. Default is 0.5.
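
To see how window_size and overlap interact, the sketch below tiles an image using a stride of window_size - overlap. This is an assumption about how a sliding window usually advances, not a copy of LangRS's internal tiling, and sliding_windows is a hypothetical helper.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def _starts(length: int, window: int, stride: int) -> List[int]:
    """Start offsets along one axis, with the last window flush to the edge."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts


def sliding_windows(width: int, height: int,
                    window_size: int = 500, overlap: int = 200) -> Iterator[Window]:
    """Yield pixel windows that cover a width x height image, row by row."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    stride = window_size - overlap
    for y in _starts(height, window_size, stride):
        for x in _starts(width, window_size, stride):
            yield (x, y, min(x + window_size, width), min(y + window_size, height))


# A 1200 x 600 image with the Quick Start settings gives three tiles.
tiles = list(sliding_windows(1200, 600, window_size=600, overlap=300))
```

A larger overlap makes it less likely that an object is cut at a tile border, but it increases the number of tiles and therefore the number of detector calls.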

Advanced: Custom Rex-Omni Initialization

If you need backend-specific configuration (for example forcing CPU placement or using a non-default backend), initialize a custom detector and inject it into LangRS:

from langrs import LangRS
from langrs.models.detection.rex_omni import RexOmniDetector

detection_model = RexOmniDetector(
    model_path="IDEA-Research/Rex-Omni",
    backend="transformers",  # or "vllm" when environment supports it
    device="cpu",
)

langrs = LangRS(
    output_path="output",
    _detection_model_instance=detection_model,
)

filter_outliers():

  • method (optional): Specific method to apply. If None, applies all methods.
  • Returns a dictionary with keys: ['zscore', 'iqr', 'svm', 'svm_sgd', 'robust_covariance', 'lof', 'isolation_forest']
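
As a rough illustration of area-based outlier rejection, the following sketch implements only the z-score and IQR rules using the standard library, and returns a dictionary keyed by method name in the shape described above. The function names and thresholds are assumptions (2.5 for the z-score, as in the configuration example, and the conventional 1.5 x IQR fence); LangRS's own implementations may differ.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def zscore_filter(boxes: Sequence[Box], threshold: float = 2.5) -> List[Box]:
    """Keep boxes whose area lies within `threshold` standard deviations of the mean."""
    areas = [area(b) for b in boxes]
    if len(areas) < 2:
        return list(boxes)
    mean = statistics.fmean(areas)
    std = statistics.pstdev(areas)
    if std == 0:
        return list(boxes)
    return [b for b, a in zip(boxes, areas) if abs(a - mean) / std <= threshold]


def iqr_filter(boxes: Sequence[Box], k: float = 1.5) -> List[Box]:
    """Keep boxes whose area lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    areas = [area(b) for b in boxes]
    if len(areas) < 4:
        return list(boxes)
    q1, _, q3 = statistics.quantiles(areas, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [b for b, a in zip(boxes, areas) if low <= a <= high]


def filter_by_area(boxes: Sequence[Box]) -> Dict[str, List[Box]]:
    """Apply every sketched method and return the survivors per method name."""
    return {"zscore": zscore_filter(boxes), "iqr": iqr_filter(boxes)}


# Nine 10 x 10 boxes plus one 100 x 100 box that should be rejected.
boxes = [(i * 20, 0, i * 20 + 10, 10) for i in range(9)] + [(0, 100, 100, 200)]
result = filter_by_area(boxes)
```

Returning one list per method makes it easy to compare how aggressive each rule is on the same detections before choosing one to pass to segment().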

segment():

  • boxes (optional): List of bounding boxes. If None, uses detected boxes.
  • window_size (int): Window size for tiling. Default from config.
  • overlap (int): Overlap between windows. Default from config.

Output

When the code runs, it generates the following outputs:

  1. Original Image with Bounding Boxes: Shows the detected bounding boxes.
  2. Filtered Bounding Boxes: Bounding boxes after applying outlier rejection.
  3. Segmentation Masks: Overlays segmentation masks on the original image.
  4. Area Plot: A scatter plot of bounding box areas to visualize distributions.
  5. Geospatial Files: Shapefiles for bounding boxes and masks (if GeoTIFF input).

The results are saved in the specified output directory, organized with a timestamp to separate runs.
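
One way to get this kind of per-run separation is to create a timestamped subdirectory under the output path. The sketch below is illustrative only: the make_run_dir name and the %Y%m%d_%H%M%S folder format are assumptions, and the naming LangRS actually uses may differ.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(output_path: str, now: Optional[datetime] = None) -> Path:
    """Create and return output_path/<timestamp>/ for a single run."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_path) / stamp
    # parents=True creates output_path itself if it does not exist yet.
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

Calling make_run_dir("output") at the start of each run would keep the plots and shapefiles of different runs from overwriting one another.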

Examples

See the examples/ directory for:

  • basic_usage.py - Simple usage
  • advanced_usage.py - Advanced features
  • step_by_step.py - Step-by-step execution
  • custom_models.py - Custom model selection
  • configuration_example.py - Configuration management

Citation

@article{DIAB2025100105,
title = {Optimizing zero-shot text-based segmentation of remote sensing imagery using SAM and Grounding DINO},
journal = {Artificial Intelligence in Geosciences},
volume = {6},
number = {1},
pages = {100105},
year = {2025},
issn = {2666-5441},
doi = {https://doi.org/10.1016/j.aiig.2025.100105},
url = {https://www.sciencedirect.com/science/article/pii/S2666544125000012},
author = {Mohanad Diab and Polychronis Kolokoussis and Maria Antonia Brovelli},
keywords = {Foundation models, Multi-modal models, Vision language models, Semantic segmentation, Segment anything model, Earth observation, Remote sensing},
}

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines on:

  • Development setup
  • Code style
  • Testing
  • Pull request process

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

For any questions or issues, please open an issue on GitHub or contact the project maintainers.
