This is the course project platform for NYU ROB-GY 6203 Robot Perception. For more information, please reach out to AI4CE lab (cfeng at nyu dot edu).
- Install

```shell
conda update conda
git clone https://github.com/ai4ce/vis_nav_player.git
cd vis_nav_player
conda env create -f environment.yaml
conda activate game
```
- Play using the default keyboard player

```shell
python player.py
```
- Modify `player.py` to implement your own solution, unless you have a photographic memory!
- Download the exploration data and extract it to `./data`. Under your data folder, you should at least have:

```
data
├── data_info.json
└── images
```

- Run the baseline solution with `python source/baseline.py`. The first run may take longer because it needs to download the maze data and compute the features for localization and navigation.
- Press `q` to show the navigation panel.
The baseline (source/baseline.py) implements a visual place recognition pipeline:
- Feature Extraction — RootSIFT descriptors from exploration images
- Codebook — K-Means clustering (k=128) to build a visual vocabulary
- VLAD Encoding — Aggregate local descriptors into a global vector per image (with intra-normalization and power normalization)
- Graph Construction — Temporal edges (consecutive frames) + visual shortcut edges (top-K most similar non-adjacent frames)
- Localization & Planning — Match current FPV to database via VLAD similarity, then Dijkstra shortest path to goal node
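The feature-extraction step can be sketched as follows. This is a minimal illustration of the RootSIFT transform (L1-normalize each SIFT descriptor, then take the element-wise square root); it uses random stand-in data rather than real SIFT output so it runs without OpenCV or images, and the function name is ours, not the baseline's.

```python
import numpy as np

def rootsift(descriptors: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Convert raw SIFT descriptors (N x 128) to RootSIFT."""
    # L1-normalize each descriptor, then take the square root
    # (the Hellinger-kernel trick behind RootSIFT).
    desc = descriptors / (np.abs(descriptors).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(desc)

# Stand-in for real SIFT output (e.g. from cv2.SIFT_create()).
rng = np.random.default_rng(0)
raw = rng.random((10, 128)).astype(np.float32)
rs = rootsift(raw)
print(rs.shape)  # (10, 128); each row now has unit L2 norm
```

A convenient side effect: after the transform, the Euclidean distance between descriptors approximates the Hellinger distance on the originals, which tends to match better than plain L2 on SIFT.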
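The VLAD encoding step (residuals to codebook centers, intra-normalization, power normalization) might be sketched like this. The codebook is shrunk to k=4 and the descriptors are random stand-ins so the demo stays tiny; the real baseline uses k=128 K-Means centers over RootSIFT descriptors.

```python
import numpy as np

def vlad(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Aggregate local descriptors into one global VLAD vector."""
    k, d = codebook.shape
    # Hard-assign each descriptor to its nearest codebook center.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            # Sum of residuals to the center for this cluster.
            v[i] = (members - codebook[i]).sum(axis=0)
        # Intra-normalization: L2-normalize each cluster block.
        n = np.linalg.norm(v[i])
        if n > 0:
            v[i] /= n
    v = v.ravel()
    # Power normalization (signed square root), then global L2.
    v = np.sign(v) * np.sqrt(np.abs(v))
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

rng = np.random.default_rng(1)
desc = rng.normal(size=(50, 8))    # stand-in local descriptors
centers = rng.normal(size=(4, 8))  # stand-in K-Means centers (k=4)
g = vlad(desc, centers)
print(g.shape)  # (32,) = k * d
```

Intra-normalization keeps a few "bursty" clusters from dominating the vector, and power normalization further flattens peaky components before the final L2 normalization.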
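The last two steps (graph construction and Dijkstra planning) could look roughly like this. Everything here is illustrative: the frame count, `top_k`, `min_gap`, edge weights, and helper names are our assumptions, not the baseline's actual parameters, and the VLAD vectors are random stand-ins.

```python
import heapq
import numpy as np

def build_graph(vlads: np.ndarray, top_k: int = 2, min_gap: int = 2) -> dict:
    """Temporal edges between consecutive frames plus visual shortcut
    edges between the top-K most similar non-adjacent frames."""
    n = len(vlads)
    adj = {i: {} for i in range(n)}
    for i in range(n - 1):            # temporal edges, unit cost
        adj[i][i + 1] = 1.0
        adj[i + 1][i] = 1.0
    sims = vlads @ vlads.T            # rows are L2-normalized -> cosine
    for i in range(n):
        cand = [(sims[i, j], j) for j in range(n) if abs(i - j) >= min_gap]
        for s, j in sorted(cand, reverse=True)[:top_k]:
            w = max(1.0 - s, 0.0)     # more similar -> cheaper shortcut
            adj[i][j] = min(adj[i].get(j, np.inf), w)
            adj[j][i] = adj[i][j]
    return adj

def dijkstra(adj: dict, src: int, dst: int) -> list:
    """Shortest path from src to dst over the weighted adjacency dict."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, np.inf):
            continue
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, np.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

rng = np.random.default_rng(2)
v = rng.normal(size=(6, 8))           # stand-in VLAD database, 6 frames
v /= np.linalg.norm(v, axis=1, keepdims=True)
adj = build_graph(v)
path = dijkstra(adj, 0, 5)
print(path)
```

Localization then reduces to a nearest-neighbor lookup: encode the current FPV frame as a VLAD vector, pick the database frame with the highest cosine similarity, and run Dijkstra from that node to the goal node.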