This repository contains the implementation of Memory-Guided Frame Skipping (MGFS) for SAM 2, a novel approach to accelerate video object segmentation while maintaining high accuracy. Our work extends Meta's SAM 2 (Segment Anything Model 2) with intelligent frame-skipping strategies that significantly improve inference speed.
Segment Anything Model 2 (SAM 2) is Meta AI's foundation model for promptable visual segmentation in images and videos. While SAM 2 achieves state-of-the-art accuracy, its computational demands limit real-time applications. This project introduces Memory-Guided Frame Skipping, which intelligently skips frames with minimal scene changes, achieving up to 4.3× speedup at aggressive thresholds and roughly 1.3× speedup with negligible accuracy loss at conservative ones (see the results table below).
- Memory-Guided Frame Skipping (MGFS): Intelligent frame-skipping based on temporal changes
- Multiple Strategies: Naive and Mask-Aware implementations with optional optical flow
- Comprehensive Evaluation: Tested on DAVIS 2017 dataset with J&F metrics
- Production-Ready: Drop-in replacement for SAM 2's video predictor
- Configurable Thresholds: Tunable skip thresholds to balance speed vs. accuracy
| Method | Threshold | FPS | J&F Mean | Speedup |
|---|---|---|---|---|
| Baseline | - | 4.1 | 0.419 | 1.0× |
| MGFS (Naive) | 0.05 | 5.23 | 0.415 | 1.28× |
| MGFS (Naive) | 0.15 | 17.81 | 0.380 | 4.34× |
| MGFS (Mask-Aware) | 0.05 | 4.23 | 0.417 | 1.03× |
| MGFS (Mask-Aware) | 0.10 | 6.28 | 0.388 | 1.53× |
Results on DAVIS 2017 validation set. See our paper for full details.
📄 Memory-Guided Frame Skipping for Real-Time SAM 2 Video Segmentation
Our paper presents a comprehensive analysis of frame-skipping strategies for SAM 2, including:
- Theoretical framework for memory-guided segmentation
- Comparison of naive vs. mask-aware skipping approaches
- Evaluation with and without optical flow
- Speed-accuracy tradeoff analysis across multiple thresholds
Comprehensive evaluation results and prediction masks are available at:
- Evaluation Repository: https://github.com/bchou9/davis2017eval
- Contains DAVIS 2017 predictions for all MGFS variants
- Includes baseline comparisons and detailed per-sequence metrics
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer
[SAM 2 Paper] [Project] [Demo] [Dataset] [Blog]
To use MGFS with SAM 2 for video segmentation:
```python
import torch
from sam2.build_sam import build_sam2_video_predictor

# Build predictor with frame skipping
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

# Configure skip threshold (default: 0.05)
# Lower = more conservative (fewer skips), Higher = more aggressive (more skips)
predictor.skip_mad_threshold = 0.10  # 10% change threshold

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(video_path="<your_video_dir>")

    # Add initial prompt (e.g., first frame mask)
    predictor.add_new_mask(state, frame_idx=0, obj_id=1, mask=<your_mask>)

    # Propagate with automatic frame skipping
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        # Process results - skipped frames reuse previous predictions
        ...
```

We provide scripts to reproduce our DAVIS 2017 evaluation:
```bash
# Run MGFS inference on all DAVIS sequences
python run_mgfs_davis.py

# Evaluate predictions against ground truth
python eval_davis_jf.py \
    --gt_root ./datasets/DAVIS/DAVIS2017/DAVIS \
    --pred1 ./predictions/DAVIS2017_baseline \
    --pred2 ./predictions/DAVIS2017
```

Or use the Jupyter notebook for interactive exploration:
```bash
jupyter notebook run_mgfs_davis.ipynb
```

This repository implements multiple frame-skipping approaches:
- Naive Frame Skipping: Skips frames based on whole-frame pixel difference
  - Fast and simple
  - Works well for static camera scenarios
  - Configured via `skip_mad_threshold`
- Mask-Aware Frame Skipping: Only analyzes regions of interest (see the sketch following this list)
  - Focuses on object regions
  - Better for dynamic backgrounds
  - Slightly slower but more accurate
- Optical Flow Enhancement: Uses dense optical flow for mask warping
  - Can improve accuracy in some scenarios
  - Higher computational cost
  - See `sam2/utils/optical_flow.py`
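For a concrete sense of the mask-aware variant, the sketch below restricts the change measure to a dilated region around the previous mask. The function and its arguments are illustrative only, not the exact API in `sam2/utils/change_detection.py`:

```python
import cv2
import numpy as np

def mask_aware_mad(prev_frame, curr_frame, prev_mask, dilate_px=16):
    """Mean absolute difference restricted to a dilated region around the object.

    prev_frame, curr_frame: uint8 HxWx3 frames; prev_mask: bool/uint8 HxW mask.
    Illustrative sketch only -- see sam2/utils/change_detection.py for the actual code.
    """
    # Dilate the previous mask so small object motion stays inside the region.
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    region = cv2.dilate(prev_mask.astype(np.uint8), kernel) > 0
    if not region.any():
        return 1.0  # no object region: never skip
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32)) / 255.0
    # Average the per-pixel difference over the region of interest only.
    return float(diff.mean(axis=-1)[region].mean())
```

A frame would then be skipped when this region-restricted MAD falls below `skip_mad_threshold`, so background motion outside the object no longer blocks skips.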
Choose your `skip_mad_threshold` based on your speed vs. accuracy requirements (a quick threshold-sweep sketch follows this list):
- 0.05 (Conservative): ~1.3× speedup, minimal accuracy loss (~0.4% J&F drop)
- 0.07 (Balanced): ~1.4× speedup, small accuracy loss (~0.7% J&F drop)
- 0.10 (Moderate): ~2.3× speedup, moderate accuracy loss (~1.8% J&F drop)
- 0.15 (Aggressive): ~4.3× speedup, larger accuracy loss (~3.9% J&F drop)
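If you want to pick a threshold empirically for your own footage, a sweep in the style of the usage example above can report the effective FPS per setting. This is a sketch; `<your_video_dir>` and `<your_mask>` are placeholders as elsewhere in this README:

```python
import time
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "./checkpoints/sam2.1_hiera_large.pt"
)

for threshold in (0.05, 0.07, 0.10, 0.15):
    predictor.skip_mad_threshold = threshold
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        state = predictor.init_state(video_path="<your_video_dir>")
        predictor.add_new_mask(state, frame_idx=0, obj_id=1, mask=<your_mask>)
        start, n_frames = time.perf_counter(), 0
        for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
            n_frames += 1
        fps = n_frames / (time.perf_counter() - start)
        print(f"threshold={threshold:.2f}: {fps:.1f} FPS")
```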
12/11/2024 -- full model compilation for a major VOS speedup and a new SAM2VideoPredictor to better handle multi-object tracking
- We now support `torch.compile` of the entire SAM 2 model on videos, which can be turned on by setting `vos_optimized=True` in `build_sam2_video_predictor`, leading to a major speedup for VOS inference (a minimal usage sketch follows these notes).
- We update the implementation of `SAM2VideoPredictor` to support independent per-object inference, allowing us to relax the assumption of prompting for multi-object tracking and to add new objects after tracking starts.
- See `RELEASE_NOTES.md` for full details.
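As a hedged illustration (checkpoint and config paths reused from the examples below; whether the compiled path composes with the MGFS `skip_mad_threshold` extension is not evaluated here), enabling the compiled VOS predictor looks like:

```python
from sam2.build_sam import build_sam2_video_predictor

# vos_optimized=True turns on torch.compile of the full model for VOS inference.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    vos_optimized=True,
)
```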
09/30/2024 -- SAM 2.1 Developer Suite (new checkpoints, training code, web demo) is released
- A new suite of improved model checkpoints (denoted as SAM 2.1) has been released. See Model Description for details.
- To use the new SAM 2.1 checkpoints, you need the latest model code from this repo. If you have installed an earlier version of this repo, please first uninstall the previous version via `pip uninstall SAM-2`, pull the latest code from this repo (with `git pull`), and then reinstall the repo following Installation below.
- The training (and fine-tuning) code has been released. See `training/README.md` on how to get started.
- The frontend + backend code for the SAM 2 web demo has been released. See `demo/README.md` for details.
SAM 2 with Frame Skipping needs to be installed before use. The code requires python>=3.10, as well as torch>=2.5.1 and torchvision>=0.20.1. Please follow the instructions here to install both PyTorch and TorchVision dependencies.
```bash
git clone https://github.com/bchou9/FrameSkipSAM.git && cd FrameSkipSAM
pip install -e .
```

The MGFS implementation requires OpenCV for change detection and optical flow:
```bash
pip install opencv-python numpy
```

If you are installing on Windows, it's strongly recommended to use Windows Subsystem for Linux (WSL) with Ubuntu.
To use the SAM 2 predictor and run the example notebooks, jupyter and matplotlib are required and can be installed by:
```bash
pip install -e ".[notebooks]"
```

To reproduce our DAVIS 2017 evaluation results:
```bash
# Install evaluation dependencies
pip install tqdm imageio

# Download DAVIS 2017 dataset (follow instructions at https://davischallenge.org/)
# Place in ./datasets/DAVIS/DAVIS2017/DAVIS/
```

Note:
- It's recommended to create a new Python environment via Anaconda for this installation and install PyTorch 2.5.1 (or higher) via `pip` following https://pytorch.org/. If you have a PyTorch version lower than 2.5.1 in your current environment, the installation command above will try to upgrade it to the latest PyTorch version using `pip`.
- The step above requires compiling a custom CUDA kernel with the `nvcc` compiler. If it isn't already available on your machine, please install the CUDA toolkits with a version that matches your PyTorch CUDA version.
- If you see a message like `Failed to build the SAM 2 CUDA extension` during installation, you can ignore it and still use SAM 2 (some post-processing functionality may be limited, but it doesn't affect the results in most cases).
Please see INSTALL.md for FAQs on potential issues and solutions.
First, we need to download a model checkpoint. All the model checkpoints can be downloaded by running:
```bash
cd checkpoints && \
./download_ckpts.sh && \
cd ..
```

or individually from:
(note that these are the improved checkpoints denoted as SAM 2.1; see Model Description for details.)
Then SAM 2 can be used in a few lines as follows for image and video prediction.
For promptable segmentation and tracking in videos with intelligent frame skipping, we provide an enhanced video predictor. The frame-skipping logic is built directly into the `propagate_in_video` method:
```python
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

# Configure frame skipping threshold
predictor.skip_mad_threshold = 0.05  # Skip if <5% pixel change

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video;
    # frame skipping happens automatically based on temporal changes
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        # Frames with minimal changes reuse previous predictions (much faster!)
        # You'll see console output: "Skipping frame X due to low MAD (Y)"
        ...
```

How it works: during propagation, each frame is compared to the previous frame using Mean Absolute Difference (MAD). If MAD is below `skip_mad_threshold`, the previous frame's masks are reused, skipping expensive inference. This dramatically speeds up videos with static scenes or slow camera motion.
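For intuition, the whole-frame skip test reduces to something like the sketch below (simplified; the actual check lives inside `propagate_in_video` in `sam2/sam2_video_predictor.py` and operates on the preprocessed frame tensors):

```python
import torch

def should_skip(prev_frame: torch.Tensor, curr_frame: torch.Tensor, threshold: float = 0.05) -> bool:
    """Return True when the mean absolute difference between frames is below threshold.

    Frames are float tensors scaled to [0, 1]; this mirrors the MAD test described above,
    not the exact code in the repository.
    """
    mad = (curr_frame - prev_frame).abs().mean().item()
    return mad < threshold
```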
Please refer to the examples in video_predictor_example.ipynb for details on how to add click or box prompts, make refinements, and track multiple objects in videos.
SAM 2 has all the capabilities of SAM on static images, and we provide image prediction APIs that closely resemble SAM for image use cases. The SAM2ImagePredictor class has an easy interface for image prompting.
```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```

Please refer to the examples in image_predictor_example.ipynb (also in Colab here) for static image use cases.
SAM 2 also supports automatic mask generation on images just like SAM. Please see automatic_mask_generator_example.ipynb (also in Colab here) for automatic mask generation in images.
Alternatively, models can also be loaded from Hugging Face (requires pip install huggingface_hub).
For image prediction:
```python
import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```

For video prediction:
```python
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-large")

# Enable frame skipping (MGFS extension)
predictor.skip_mad_threshold = 0.05  # Adjust threshold as needed

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video;
    # frame skipping is applied automatically
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
```

The Memory-Guided Frame Skipping implementation consists of several key components:
- `sam2/sam2_video_predictor.py`: Enhanced video predictor with frame-skipping logic
  - `skip_mad_threshold`: Configurable threshold parameter (default: 0.05)
  - `_mean_abs_diff()`: Computes Mean Absolute Difference between consecutive frames
  - Modified `propagate_in_video()`: Implements the frame-skipping decision logic
- `sam2/utils/change_detection.py`: Temporal change detection utilities
  - Frame comparison and threshold-based skip decisions
  - Handles both PyTorch tensors and NumPy arrays
- `sam2/utils/optical_flow.py`: Optical flow-based mask warping (optional; see the sketch after this list)
  - Dense optical flow computation using OpenCV
  - Forward mask warping for improved accuracy
- `run_mgfs_davis.py`: DAVIS 2017 evaluation script
  - Batch processing of video sequences
  - Automatic mask generation and saving
- `eval_davis_jf.py`: J&F metrics evaluation
  - Region similarity (J) and contour accuracy (F) computation
  - Comparative evaluation between baseline and MGFS
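To make the optical-flow component concrete, here is a minimal sketch of dense-flow mask warping with OpenCV. The function name and signature are illustrative, not the actual API in `sam2/utils/optical_flow.py`:

```python
import cv2
import numpy as np

def warp_mask_with_flow(prev_gray, curr_gray, prev_mask):
    """Warp prev_mask into the current frame using dense optical flow.

    prev_gray, curr_gray: uint8 HxW grayscale frames; prev_mask: bool/uint8 HxW mask.
    Illustrative sketch only, not the repository's exact implementation.
    """
    # Backward flow (current -> previous) lets us sample the previous mask with remap.
    # Positional args: pyr_scale=0.5, levels=3, winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(prev_mask.astype(np.uint8), map_x, map_y, cv2.INTER_NEAREST)
    return warped > 0
```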
During video propagation, the algorithm:
- Computes temporal change: calculates Mean Absolute Difference (MAD) between the current and previous frames
- Makes the skip decision: if MAD < `skip_mad_threshold`, the frame is skipped
- Reuses predictions: skipped frames use cached mask outputs from the previous frame
- Maintains memory: only non-skipped frames update the memory bank
This approach achieves significant speedups by avoiding expensive transformer inference on frames with minimal scene changes, while maintaining the temporal consistency benefits of SAM 2's streaming memory architecture.
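Put together, the propagation loop behaves roughly like the sketch below; `run_frame_inference` is a hypothetical callable standing in for SAM 2's full per-frame inference and memory-bank update, so this is a conceptual illustration rather than the repository's literal code:

```python
def propagate_with_skipping(frames, run_frame_inference, skip_mad_threshold=0.05):
    """Conceptual sketch of MGFS propagation over a sequence of float frame tensors."""
    prev_frame, cached_masks = None, None
    for frame_idx, frame in enumerate(frames):
        mad = None if prev_frame is None else float((frame - prev_frame).abs().mean())
        if cached_masks is not None and mad is not None and mad < skip_mad_threshold:
            # Low temporal change: reuse cached masks and leave the memory bank untouched.
            masks = cached_masks
        else:
            # Enough change: run full inference and let this frame update the memory.
            masks = run_frame_inference(frame_idx, frame)
            cached_masks = masks
        prev_frame = frame
        yield frame_idx, masks
```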
```
FrameSkipSAM/
├── sam2/                          # Core SAM 2 model with MGFS extensions
│   ├── sam2_video_predictor.py    # Enhanced video predictor with frame skipping
│   ├── utils/
│   │   ├── change_detection.py    # Temporal change detection
│   │   └── optical_flow.py        # Optical flow utilities
│   └── ...
├── run_mgfs_davis.py              # DAVIS evaluation script
├── run_mgfs_davis.ipynb           # Interactive notebook for DAVIS
├── eval_davis_jf.py               # J&F metrics evaluation
├── results.ipynb                  # Results visualization
├── convert.ipynb                  # Prediction format conversion
├── predictions/                   # Generated predictions
│   └── DAVIS2017/                 # DAVIS 2017 predictions
└── Memory_Guided_Frame_Skipping_for_Real_Time_SAM_2_Video_Segmentation.pdf
```
The table below shows the improved SAM 2.1 checkpoints released on September 29, 2024.
| Model | Size (M) | Speed (FPS) | SA-V test (J&F) | MOSE val (J&F) | LVOS v2 (J&F) |
|---|---|---|---|---|---|
| sam2.1_hiera_tiny (config, checkpoint) | 38.9 | 91.2 | 76.5 | 71.8 | 77.3 |
| sam2.1_hiera_small (config, checkpoint) | 46 | 84.8 | 76.6 | 73.5 | 78.3 |
| sam2.1_hiera_base_plus (config, checkpoint) | 80.8 | 64.1 | 78.2 | 73.7 | 78.2 |
| sam2.1_hiera_large (config, checkpoint) | 224.4 | 39.5 | 79.5 | 74.6 | 80.6 |
The previous SAM 2 checkpoints released on July 29, 2024 can be found as follows:
| Model | Size (M) | Speed (FPS) | SA-V test (J&F) | MOSE val (J&F) | LVOS v2 (J&F) |
|---|---|---|---|---|---|
| sam2_hiera_tiny (config, checkpoint) | 38.9 | 91.5 | 75.0 | 70.9 | 75.3 |
| sam2_hiera_small (config, checkpoint) | 46 | 85.6 | 74.9 | 71.5 | 76.4 |
| sam2_hiera_base_plus (config, checkpoint) | 80.8 | 64.8 | 74.7 | 72.8 | 75.8 |
| sam2_hiera_large (config, checkpoint) | 224.4 | 39.7 | 76.0 | 74.6 | 79.8 |
Speed measured on an A100 with torch 2.5.1, cuda 12.4. See benchmark.py for an example on benchmarking (compiling all the model components). Compiling only the image encoder can be more flexible and also provide (a smaller) speed-up (set compile_image_encoder: True in the config).
See sav_dataset/README.md for details.
You can train or fine-tune SAM 2 on custom datasets of images, videos, or both. Please check the training README on how to get started.
We have released the frontend + backend code for the SAM 2 web demo (a locally deployable version similar to https://sam2.metademolab.com/demo). Please see the web demo README for details.
The SAM 2 model checkpoints, SAM 2 demo code (front-end and back-end), and SAM 2 training code are licensed under Apache 2.0, however the Inter Font and Noto Color Emoji used in the SAM 2 demo code are made available under the SIL Open Font License, version 1.1.
See contributing and the code of conduct.
The Memory-Guided Frame Skipping extension was created by Henry Chou (Head Developer and Team Lead), Raymond Kang, Wei Shao, and Yiqiao Lin. For questions or contributions related to the frame-skipping implementation, please open an issue or submit a pull request.
The SAM 2 project was made possible with the help of many contributors (alphabetical):
Karen Bergan, Daniel Bolya, Alex Bosenberg, Kai Brown, Vispi Cassod, Christopher Chedeau, Ida Cheng, Luc Dahlin, Shoubhik Debnath, Rene Martinez Doehner, Grant Gardner, Sahir Gomez, Rishi Godugu, Baishan Guo, Caleb Ho, Andrew Huang, Somya Jain, Bob Kamma, Amanda Kallet, Jake Kinney, Alexander Kirillov, Shiva Koduvayur, Devansh Kukreja, Robert Kuo, Aohan Lin, Parth Malani, Jitendra Malik, Mallika Malhotra, Miguel Martin, Alexander Miller, Sasha Mitts, William Ngan, George Orlin, Joelle Pineau, Kate Saenko, Rodrick Shepard, Azita Shokrpour, David Soofian, Jonathan Torres, Jenny Truong, Sagar Vaze, Meng Wang, Claudette Ward, Pengchuan Zhang.
Third-party code: we use a GPU-based connected component algorithm adapted from cc_torch (with its license in LICENSE_cctorch) as an optional post-processing step for the mask predictions.
If you use the Memory-Guided Frame Skipping implementation in your research, please cite our work:
```bibtex
@misc{frameskipsam2025,
  title={Memory-Guided Frame Skipping for Real-Time SAM 2 Video Segmentation},
  author={Chou, Henry and Kang, Raymond and Shao, Wei and Lin, Yiqiao},
  year={2025},
  note={Available at: https://github.com/bchou9/FrameSkipSAM}
}
```

If you use SAM 2 or the SA-V dataset in your research, please use the following BibTeX entry.
```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}
```