This repository contains the implementation of Memory-Guided Frame Skipping (MGFS) for SAM 2, a novel approach to accelerate video object segmentation while maintaining high accuracy. Our work extends Meta's SAM 2 (Segment Anything Model 2) with intelligent frame-skipping strategies that significantly improve inference speed.
Segment Anything Model 2 (SAM 2) is Meta AI's foundation model for promptable visual segmentation in images and videos. While SAM 2 achieves state-of-the-art accuracy, its computational demands limit real-time applications. This project introduces Memory-Guided Frame Skipping, which intelligently skips frames with minimal scene changes, achieving up to 4.3× speedup at aggressive thresholds and roughly 1.3× speedup with negligible accuracy loss at conservative ones (see the results table below).
- Memory-Guided Frame Skipping (MGFS): Intelligent frame-skipping based on temporal changes
- Multiple Strategies: Naive and Mask-Aware implementations with optional optical flow
- Comprehensive Evaluation: Tested on DAVIS 2017 dataset with J&F metrics
- Production-Ready: Drop-in replacement for SAM 2's video predictor
- Configurable Thresholds: Tunable skip thresholds to balance speed vs. accuracy
| Method | Threshold | FPS | J&F Mean | Speedup |
|---|---|---|---|---|
| Baseline | - | 4.1 | 0.419 | 1.0× |
| MGFS (Naive) | 0.05 | 5.23 | 0.415 | 1.28× |
| MGFS (Naive) | 0.15 | 17.81 | 0.380 | 4.34× |
| MGFS (Mask-Aware) | 0.05 | 4.23 | 0.417 | 1.03× |
| MGFS (Mask-Aware) | 0.10 | 6.28 | 0.388 | 1.53× |
Results on DAVIS 2017 validation set. See our paper for full details.
📄 Memory-Guided Frame Skipping for Real-Time SAM 2 Video Segmentation
Our paper presents a comprehensive analysis of frame-skipping strategies for SAM 2, including:
- Theoretical framework for memory-guided segmentation
- Comparison of naive vs. mask-aware skipping approaches
- Evaluation with and without optical flow
- Speed-accuracy tradeoff analysis across multiple thresholds
Comprehensive evaluation results and prediction masks are available at:
- Evaluation Repository: https://github.com/bchou9/davis2017eval
- Contains DAVIS 2017 predictions for all MGFS variants
- Includes baseline comparisons and detailed per-sequence metrics
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer
[SAM 2 Paper] [Project] [Demo] [Dataset] [Blog]
To use MGFS with SAM 2 for video segmentation:
```python
import torch
from sam2.build_sam import build_sam2_video_predictor

# Build predictor with frame skipping
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

# Configure skip threshold (default: 0.05)
# Lower = more conservative (fewer skips), Higher = more aggressive (more skips)
predictor.skip_mad_threshold = 0.10  # 10% change threshold

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(video_path="<your_video_dir>")

    # Add initial prompt (e.g., first frame mask)
    predictor.add_new_mask(state, frame_idx=0, obj_id=1, mask=<your_mask>)

    # Propagate with automatic frame skipping
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        # Process results - skipped frames reuse previous predictions
        ...
```

We provide scripts to reproduce our DAVIS 2017 evaluation:
```bash
# Run MGFS inference on all DAVIS sequences
python run_mgfs_davis.py

# Evaluate predictions against ground truth
python eval_davis_jf.py \
    --gt_root ./datasets/DAVIS/DAVIS2017/DAVIS \
    --pred1 ./predictions/DAVIS2017_baseline \
    --pred2 ./predictions/DAVIS2017
```

Or use the Jupyter notebook for interactive exploration:
```bash
jupyter notebook run_mgfs_davis.ipynb
```

This repository implements multiple frame-skipping approaches:
- Naive Frame Skipping: Skips frames based on whole-frame pixel difference
  - Fast and simple
  - Works well for static camera scenarios
  - Configured via `skip_mad_threshold`
- Mask-Aware Frame Skipping: Only analyzes regions of interest (see the sketch following this list)
  - Focuses on object regions
  - Better for dynamic backgrounds
  - Slightly slower but more accurate
- Optical Flow Enhancement: Uses dense optical flow for mask warping
  - Can improve accuracy in some scenarios
  - Higher computational cost
  - See `sam2/utils/optical_flow.py`
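For a concrete sense of the mask-aware variant, the sketch below restricts the change measure to a dilated region around the previous mask. The function and its arguments are illustrative only, not the exact API in `sam2/utils/change_detection.py`:

```python
import cv2
import numpy as np

def mask_aware_mad(prev_frame, curr_frame, prev_mask, dilate_px=16):
    """Mean absolute difference restricted to a dilated region around the object.

    prev_frame, curr_frame: uint8 HxWx3 frames; prev_mask: bool/uint8 HxW mask.
    Illustrative sketch only -- see sam2/utils/change_detection.py for the actual code.
    """
    # Dilate the previous mask so small object motion stays inside the region.
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    region = cv2.dilate(prev_mask.astype(np.uint8), kernel) > 0
    if not region.any():
        return 1.0  # no object region: never skip
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32)) / 255.0
    # Average the per-pixel difference over the region of interest only.
    return float(diff.mean(axis=-1)[region].mean())
```

A frame would then be skipped when this region-restricted MAD falls below `skip_mad_threshold`, so background motion outside the object no longer blocks skips.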
Choose your `skip_mad_threshold` based on your speed vs. accuracy requirements (a quick threshold-sweep sketch follows this list):
- 0.05 (Conservative): ~1.3× speedup, minimal accuracy loss (~0.4% J&F drop)
- 0.07 (Balanced): ~1.4× speedup, small accuracy loss (~0.7% J&F drop)
- 0.10 (Moderate): ~2.3× speedup, moderate accuracy loss (~1.8% J&F drop)
- 0.15 (Aggressive): ~4.3× speedup, larger accuracy loss (~3.9% J&F drop)
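If you want to pick a threshold empirically for your own footage, a sweep in the style of the usage example above can report the effective FPS per setting. This is a sketch; `<your_video_dir>` and `<your_mask>` are placeholders as elsewhere in this README:

```python
import time
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "./checkpoints/sam2.1_hiera_large.pt"
)

for threshold in (0.05, 0.07, 0.10, 0.15):
    predictor.skip_mad_threshold = threshold
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        state = predictor.init_state(video_path="<your_video_dir>")
        predictor.add_new_mask(state, frame_idx=0, obj_id=1, mask=<your_mask>)
        start, n_frames = time.perf_counter(), 0
        for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
            n_frames += 1
        fps = n_frames / (time.perf_counter() - start)
        print(f"threshold={threshold:.2f}: {fps:.1f} FPS")
```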
12/11/2024 -- full model compilation for a major VOS speedup and a new SAM2VideoPredictor to better handle multi-object tracking
- We now support `torch.compile` of the entire SAM 2 model on videos, which can be turned on by setting `vos_optimized=True` in `build_sam2_video_predictor`, leading to a major speedup for VOS inference (a minimal usage sketch follows these notes).
- We update the implementation of `SAM2VideoPredictor` to support independent per-object inference, allowing us to relax the assumption of prompting for multi-object tracking and to add new objects after tracking starts.
- See `RELEASE_NOTES.md` for full details.
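As a hedged illustration (checkpoint and config paths reused from the examples below; whether the compiled path composes with the MGFS `skip_mad_threshold` extension is not evaluated here), enabling the compiled VOS predictor looks like:

```python
from sam2.build_sam import build_sam2_video_predictor

# vos_optimized=True turns on torch.compile of the full model for VOS inference.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    vos_optimized=True,
)
```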
09/30/2024 -- SAM 2.1 Developer Suite (new checkpoints, training code, web demo) is released
- A new suite of improved model checkpoints (denoted as SAM 2.1) has been released. See Model Description for details.
- To use the new SAM 2.1 checkpoints, you need the latest model code from this repo. If you have installed an earlier version of this repo, please first uninstall the previous version via `pip uninstall SAM-2`, pull the latest code from this repo (with `git pull`), and then reinstall the repo following Installation below.
- The training (and fine-tuning) code has been released. See `training/README.md` on how to get started.
- The frontend + backend code for the SAM 2 web demo has been released. See `demo/README.md` for details.
SAM 2 with Frame Skipping needs to be installed before use. The code requires python>=3.10, as well as torch>=2.5.1 and torchvision>=0.20.1. Please follow the instructions here to install both PyTorch and TorchVision dependencies.
```bash
git clone https://github.com/bchou9/FrameSkipSAM.git && cd FrameSkipSAM
pip install -e .
```

The MGFS implementation requires OpenCV for change detection and optical flow:
```bash
pip install opencv-python numpy
```

If you are installing on Windows, it's strongly recommended to use Windows Subsystem for Linux (WSL) with Ubuntu.
To use the SAM 2 predictor and run the example notebooks, jupyter and matplotlib are required and can be installed by:
```bash
pip install -e ".[notebooks]"
```

To reproduce our DAVIS 2017 evaluation results:
```bash
# Install evaluation dependencies
pip install tqdm imageio

# Download DAVIS 2017 dataset (follow instructions at https://davischallenge.org/)
# Place in ./datasets/DAVIS/DAVIS2017/DAVIS/
```

Note:
- It's recommended to create a new Python environment via Anaconda for this installation and install PyTorch 2.5.1 (or higher) via `pip` following https://pytorch.org/. If you have a PyTorch version lower than 2.5.1 in your current environment, the installation command above will try to upgrade it to the latest PyTorch version using `pip`.
- The step above requires compiling a custom CUDA kernel with the `nvcc` compiler. If it isn't already available on your machine, please install the CUDA toolkits with a version that matches your PyTorch CUDA version.
- If you see a message like `Failed to build the SAM 2 CUDA extension` during installation, you can ignore it and still use SAM 2 (some post-processing functionality may be limited, but it doesn't affect the results in most cases).
Please see INSTALL.md for FAQs on potential issues and solutions.
First, we need to download a model checkpoint. All the model checkpoints can be downloaded by running:
```bash
cd checkpoints && \
./download_ckpts.sh && \
cd ..
```

or individually from:
(note that these are the improved checkpoints denoted as SAM 2.1; see Model Description for details.)
Then SAM 2 can be used in a few lines as follows for image and video prediction.
For promptable segmentation and tracking in videos with intelligent frame skipping, we provide an enhanced video predictor. The frame-skipping logic is built directly into the `propagate_in_video` method:
```python
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

# Configure frame skipping threshold
predictor.skip_mad_threshold = 0.05  # Skip if <5% pixel change

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video;
    # frame skipping happens automatically based on temporal changes
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        # Frames with minimal changes reuse previous predictions (much faster!)
        # You'll see console output: "Skipping frame X due to low MAD (Y)"
        ...
```

How it works: during propagation, each frame is compared to the previous frame using Mean Absolute Difference (MAD). If MAD is below `skip_mad_threshold`, the previous frame's masks are reused, skipping expensive inference. This dramatically speeds up videos with static scenes or slow camera motion.
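For intuition, the whole-frame skip test reduces to something like the sketch below (simplified; the actual check lives inside `propagate_in_video` in `sam2/sam2_video_predictor.py` and operates on the preprocessed frame tensors):

```python
import torch

def should_skip(prev_frame: torch.Tensor, curr_frame: torch.Tensor, threshold: float = 0.05) -> bool:
    """Return True when the mean absolute difference between frames is below threshold.

    Frames are float tensors scaled to [0, 1]; this mirrors the MAD test described above,
    not the exact code in the repository.
    """
    mad = (curr_frame - prev_frame).abs().mean().item()
    return mad < threshold
```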
Please refer to the examples in video_predictor_example.ipynb for details on how to add click or box prompts, make refinements, and track multiple objects in videos.
SAM 2 has all the capabilities of SAM on static images, and we provide image prediction APIs that closely resemble SAM for image use cases. The SAM2ImagePredictor class has an easy interface for image prompting.
```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```

Please refer to the examples in image_predictor_example.ipynb (also in Colab here) for static image use cases.
SAM 2 also supports automatic mask generation on images just like SAM. Please see automatic_mask_generator_example.ipynb (also in Colab here) for automatic mask generation in images.
Alternatively, models can also be loaded from Hugging Face (requires pip install huggingface_hub).
For image prediction:
```python
import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```

For video prediction:
```python
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-large")

# Enable frame skipping (MGFS extension)
predictor.skip_mad_threshold = 0.05  # Adjust threshold as needed

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video;
    # frame skipping is applied automatically
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
```

The Memory-Guided Frame Skipping implementation consists of several key components:
- `sam2/sam2_video_predictor.py`: Enhanced video predictor with frame-skipping logic
  - `skip_mad_threshold`: Configurable threshold parameter (default: 0.05)
  - `_mean_abs_diff()`: Computes Mean Absolute Difference between consecutive frames
  - Modified `propagate_in_video()`: Implements the frame-skipping decision logic
- `sam2/utils/change_detection.py`: Temporal change detection utilities
  - Frame comparison and threshold-based skip decisions
  - Handles both PyTorch tensors and NumPy arrays
- `sam2/utils/optical_flow.py`: Optical flow-based mask warping (optional; see the sketch after this list)
  - Dense optical flow computation using OpenCV
  - Forward mask warping for improved accuracy
- `run_mgfs_davis.py`: DAVIS 2017 evaluation script
  - Batch processing of video sequences
  - Automatic mask generation and saving
- `eval_davis_jf.py`: J&F metrics evaluation
  - Region similarity (J) and contour accuracy (F) computation
  - Comparative evaluation between baseline and MGFS
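To make the optical-flow component concrete, here is a minimal sketch of dense-flow mask warping with OpenCV. The function name and signature are illustrative, not the actual API in `sam2/utils/optical_flow.py`:

```python
import cv2
import numpy as np

def warp_mask_with_flow(prev_gray, curr_gray, prev_mask):
    """Warp prev_mask into the current frame using dense optical flow.

    prev_gray, curr_gray: uint8 HxW grayscale frames; prev_mask: bool/uint8 HxW mask.
    Illustrative sketch only, not the repository's exact implementation.
    """
    # Backward flow (current -> previous) lets us sample the previous mask with remap.
    # Positional args: pyr_scale=0.5, levels=3, winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(prev_mask.astype(np.uint8), map_x, map_y, cv2.INTER_NEAREST)
    return warped > 0
```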
During video propagation, the algorithm:
- Computes temporal change: calculates Mean Absolute Difference (MAD) between the current and previous frames
- Makes the skip decision: if MAD < `skip_mad_threshold`, the frame is skipped
- Reuses predictions: skipped frames use cached mask outputs from the previous frame
- Maintains memory: only non-skipped frames update the memory bank
This approach achieves significant speedups by avoiding expensive transformer inference on frames with minimal scene changes, while maintaining the temporal consistency benefits of SAM 2's streaming memory architecture.
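Put together, the propagation loop behaves roughly like the sketch below; `run_frame_inference` is a hypothetical callable standing in for SAM 2's full per-frame inference and memory-bank update, so this is a conceptual illustration rather than the repository's literal code:

```python
def propagate_with_skipping(frames, run_frame_inference, skip_mad_threshold=0.05):
    """Conceptual sketch of MGFS propagation over a sequence of float frame tensors."""
    prev_frame, cached_masks = None, None
    for frame_idx, frame in enumerate(frames):
        mad = None if prev_frame is None else float((frame - prev_frame).abs().mean())
        if cached_masks is not None and mad is not None and mad < skip_mad_threshold:
            # Low temporal change: reuse cached masks and leave the memory bank untouched.
            masks = cached_masks
        else:
            # Enough change: run full inference and let this frame update the memory.
            masks = run_frame_inference(frame_idx, frame)
            cached_masks = masks
        prev_frame = frame
        yield frame_idx, masks
```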
```
FrameSkipSAM/
├── sam2/                          # Core SAM 2 model with MGFS extensions
│   ├── sam2_video_predictor.py    # Enhanced video predictor with frame skipping
│   ├── utils/
│   │   ├── change_detection.py    # Temporal change detection
│   │   └── optical_flow.py        # Optical flow utilities
│   └── ...
├── run_mgfs_davis.py              # DAVIS evaluation script
├── run_mgfs_davis.ipynb           # Interactive notebook for DAVIS
├── eval_davis_jf.py               # J&F metrics evaluation
├── results.ipynb                  # Results visualization
├── convert.ipynb                  # Prediction format conversion
├── predictions/                   # Generated predictions
│   └── DAVIS2017/                 # DAVIS 2017 predictions
└── Memory_Guided_Frame_Skipping_for_Real_Time_SAM_2_Video_Segmentation.pdf
```
The table below shows the improved SAM 2.1 checkpoints released on September 29, 2024.
| Model | Size (M) | Speed (FPS) | SA-V test (J&F) | MOSE val (J&F) | LVOS v2 (J&F) |
|---|---|---|---|---|---|
| sam2.1_hiera_tiny (config, checkpoint) | 38.9 | 91.2 | 76.5 | 71.8 | 77.3 |
| sam2.1_hiera_small (config, checkpoint) | 46 | 84.8 | 76.6 | 73.5 | 78.3 |
| sam2.1_hiera_base_plus (config, checkpoint) | 80.8 | 64.1 | 78.2 | 73.7 | 78.2 |
| sam2.1_hiera_large (config, checkpoint) | 224.4 | 39.5 | 79.5 | 74.6 | 80.6 |
The previous SAM 2 checkpoints released on July 29, 2024 can be found as follows:
| Model | Size (M) | Speed (FPS) | SA-V test (J&F) | MOSE val (J&F) | LVOS v2 (J&F) |
|---|---|---|---|---|---|
| sam2_hiera_tiny (config, checkpoint) | 38.9 | 91.5 | 75.0 | 70.9 | 75.3 |
| sam2_hiera_small (config, checkpoint) | 46 | 85.6 | 74.9 | 71.5 | 76.4 |
| sam2_hiera_base_plus (config, checkpoint) | 80.8 | 64.8 | 74.7 | 72.8 | 75.8 |
| sam2_hiera_large (config, checkpoint) | 224.4 | 39.7 | 76.0 | 74.6 | 79.8 |
Speed measured on an A100 with torch 2.5.1, cuda 12.4. See benchmark.py for an example on benchmarking (compiling all the model components). Compiling only the image encoder can be more flexible and also provide (a smaller) speed-up (set compile_image_encoder: True in the config).
See sav_dataset/README.md for details.
You can train or fine-tune SAM 2 on custom datasets of images, videos, or both. Please check the training README on how to get started.
We have released the frontend + backend code for the SAM 2 web demo (a locally deployable version similar to https://sam2.metademolab.com/demo). Please see the web demo README for details.
The SAM 2 model checkpoints, SAM 2 demo code (front-end and back-end), and SAM 2 training code are licensed under Apache 2.0, however the Inter Font and Noto Color Emoji used in the SAM 2 demo code are made available under the SIL Open Font License, version 1.1.
See contributing and the code of conduct.
The Memory-Guided Frame Skipping extension was created by Henry Chou (Head Developer and Team Lead), Raymond Kang, Wei Shao, and Yiqiao Lin. For questions or contributions related to the frame-skipping implementation, please open an issue or submit a pull request.
The SAM 2 project was made possible with the help of many contributors (alphabetical):
Karen Bergan, Daniel Bolya, Alex Bosenberg, Kai Brown, Vispi Cassod, Christopher Chedeau, Ida Cheng, Luc Dahlin, Shoubhik Debnath, Rene Martinez Doehner, Grant Gardner, Sahir Gomez, Rishi Godugu, Baishan Guo, Caleb Ho, Andrew Huang, Somya Jain, Bob Kamma, Amanda Kallet, Jake Kinney, Alexander Kirillov, Shiva Koduvayur, Devansh Kukreja, Robert Kuo, Aohan Lin, Parth Malani, Jitendra Malik, Mallika Malhotra, Miguel Martin, Alexander Miller, Sasha Mitts, William Ngan, George Orlin, Joelle Pineau, Kate Saenko, Rodrick Shepard, Azita Shokrpour, David Soofian, Jonathan Torres, Jenny Truong, Sagar Vaze, Meng Wang, Claudette Ward, Pengchuan Zhang.
Third-party code: we use a GPU-based connected component algorithm adapted from cc_torch (with its license in LICENSE_cctorch) as an optional post-processing step for the mask predictions.
If you use the Memory-Guided Frame Skipping implementation in your research, please cite our work:
```bibtex
@misc{frameskipsam2025,
  title={Memory-Guided Frame Skipping for Real-Time SAM 2 Video Segmentation},
  author={Chou, Henry and Kang, Raymond and Shao, Wei and Lin, Yiqiao},
  year={2025},
  note={Available at: https://github.com/bchou9/FrameSkipSAM}
}
```

If you use SAM 2 or the SA-V dataset in your research, please use the following BibTeX entry.
```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}
```