🌐 Home Page 📄 ArXiv Paper 🤗 Hugging Face ☁️ Google Drive ☁️ Baidu Drive
OceanGym is a high-fidelity embodied underwater environment that simulates a realistic ocean setting with diverse scenes. As illustrated in the figure, OceanGym establishes a robust benchmark for evaluating autonomous agents through a series of challenging tasks, covering both perception analysis and decision-making navigation. The platform facilitates these evaluations by supporting multi-modal perception and providing action spaces for continuous control.
We have provided a teaching demonstration video here: bilibili
The OceanGym environment is built on Unreal Engine (UE) 5.3; certain components were developed drawing inspiration from, and are partially based on, HoloOcean. We sincerely acknowledge their valuable contribution.
- 10-2025, we released the initial version of OceanGym along with the accompanying paper.
- 04-2025, we launched the OceanGym project.
Contents:
- 💐 Acknowledgement
- 🔔 News
- 📺 Quick Start
- ⚙️ Set up Environment
- 🧠 Decision Task
- 👀 Perception Task
- ⏱️ Results
- 📚 Datasets
- 🚩 Citation
Install the experimental code environment using pip:
```bash
pip install -r requirements.txt
```
Only the Python environment is ready at this point! Build the simulation environment based on here.
Step 1: Run a Task Script
For example, to run task 4:
```
python decision\tasks\task4.py
```
Follow the keyboard instructions or switch to LLM mode for automatic decision-making.
Step 2: Keyboard Control Guide
| Key | Action |
|---|---|
| W | Move Forward |
| S | Move Backward |
| A | Move Left |
| D | Move Right |
| J | Turn Left |
| L | Turn Right |
| I | Move Up |
| K | Move Down |
| M | Switch to LLM Mode |
| Q | Exit |
You can use W/A/S/D for movement, J/L for turning, and I/K for moving up/down. Press M to switch to large language model mode (this may cause temporary lag). Press Q to exit.
Step 3: View Results
Logs and memory files are automatically saved in the log/ and memory/ directories.
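If you want a quick look at what a run produced, a small helper like the one below (not part of the repository) prints the newest memory file; it assumes memory files are named memory_{time_stamp}.json, as described in the Datasets section.

```python
import glob
import json
import os

# Find the most recently written memory file from the memory/ directory
# (file naming pattern is an assumption based on the dataset description).
latest = max(glob.glob("memory/memory_*.json"), key=os.path.getmtime)
with open(latest, encoding="utf-8") as f:
    memory = json.load(f)
print(f"{latest}: {len(memory)} entries")
```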
Step 4: Evaluate the results
Place the generated memory and important_memory files into the corresponding point folders.
Then, set the evaluation paths in the evaluate.py file.
We provide 6 experimental evaluation paths. In evaluate.py, you can configure them as follows:
```python
eval_roots = [
    os.path.join(eval_root, "main", "gpt4omini"),
    os.path.join(eval_root, "main", "gemini"),
    os.path.join(eval_root, "main", "qwen"),
    os.path.join(eval_root, "migration", "gpt4o"),
    os.path.join(eval_root, "migration", "qwen"),
    os.path.join(eval_root, "scale", "qwen"),
]
```
To run the evaluation:
```
python decision\utils\evaluate.py
```
The generated results will be saved under the eval\decision folder.
All of the following commands are written for Linux; if you are using Windows, you need to change the path representation accordingly (especially the slashes).
Step 1: Prepare the dataset
After downloading from Hugging Face or Google Drive, put it into the data/perception folder.
Step 2: Select model parameters
| parameter | function |
|---|---|
| model_template | The large language model message queue template you selected. |
| model_name_or_path | If it is an API model, it is the model name; if it is a local model, it is the path. |
| api_key | If it is an API model, enter your key. |
| base_url | If it is an API model, enter its base URL. |
Currently we only support OpenAI, Google Gemma, Qwen, and OpenBMB.
MODELS_TEMPLATE="Yours"
MODEL_NAME_OR_PATH="Yours"
API_KEY="Yours"
BASE_URL="Yours"Step 3: Run the experiments
| parameter | function |
|---|---|
| exp_name | Customize the name of the experiment to save the results. |
| exp_idx | Select the experiment number, or enter "all" to select all. |
| exp_json | JSON file containing the experiment label data. |
| images_dir | The folder where the experimental image data is stored. |
For the experimental types, we designed (1) a multi-view perception task and (2) a context-based perception task.
For the lighting conditions, we designed (1) high illumination and (2) low illumination.
For the auxiliary sonar, we designed (1) no sonar image, (2) zero-shot sonar image, and (3) sonar image with a few sonar examples.
For example, this command is used to evaluate the multi-view perception task under high illumination:
```bash
python perception/eval/mv.py \
    --exp_name Result_MV_highLight_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/highLight.json" \
    --images_dir "/data/perception/highLight" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
For more patterns about perception tasks, please read this part carefully.
This project is based on the HoloOcean environment. 💐
We have placed a simplified version here. If you encounter any detailed issues, please refer to the original installation document.
We have provided a teaching demonstration video here: bilibili
Download OceanGym_large.zip from ☁️ Google Drive or ☁️ Baidu Drive and extract it to the folder you want.
- Python Library
From the cloned repository, install the Python package by doing the following:
```bash
cd OceanGym_large/client
pip install .
```
- Worlds Packages
Install the package by running the following Python commands:
```python
import holoocean
holoocean.install("Ocean")
```
To do these steps in a single console command, use:
python -c "import holoocean; holoocean.install('Ocean')"Place the JSON config file from asset/decision/map_config or asset\perception\map_config into some place like:
(Windows)
```
C:\Users\Windows\AppData\Local\holoocean\2.0.0\worlds\Ocean
```
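If you prefer to script this step, the sketch below copies every JSON config from asset/decision/map_config into the worlds folder above; the destination depends on your user name and HoloOcean version, so adjust it to your installation.

```python
import shutil
from pathlib import Path

# Copy the scenario configs into HoloOcean's worlds folder.
# Source and destination mirror the paths mentioned above; adjust as needed.
src = Path("asset/decision/map_config")        # or asset/perception/map_config
dst = Path(r"C:\Users\Windows\AppData\Local\holoocean\2.0.0\worlds\Ocean")

for cfg in src.glob("*.json"):
    shutil.copy(cfg, dst / cfg.name)
    print(f"copied {cfg.name} -> {dst}")
```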
1. If you're using it for the first time, you have to compile it
1-1. Find the Holodeck.uproject in the engine folder
1-2. Right-click it and select: Generate Visual Studio project files
1-3. If the version is not 5.3.2, please choose Switch Unreal Engine Version
1-4. Then open the project
2. Then find the HAIDI map in the demo directory
3. Run the project
4. Run the code
When the UE editor shows the following message: "LogD3D12RHI: Cannot end block when stack is empty", it indicates that the scene has been loaded.
Then you can start the code, either directly from VS Code or by entering the following command in the command line:
```
python decision\tasks\task4.py
```
All of these commands are applicable to Windows only, because they require full support from the UE5 Engine.
The decision experiment can be run with reference to the Quick Start.
We have provided eight tasks. For specific task descriptions, please refer to the paper.
The following are the coordinates for each target object in the environment (in meters):
- MINING ROBOT: (-71, 149, -61), (325, -47, -83)
- OIL PIPELINE: (345, -165, -32), (539, -233, -42), (207, -30, -66)
- OIL DRUM: (447, -203, -98)
- SUNKEN SHIP: (429, -151, -69), (78, -11, -47)
- ELECTRICAL BOX: (168, 168, -65)
- WIND POWER STATION: (207, -30, -66)
- AIRCRAFT WRECKAGE: (40, -9, -54), (296, 78, -70), (292, -186, -67)
- H-MARKED LANDING PLATFORM: (267, 33, -80)
- If the target is not found, the final stopping position is used for evaluation.
- If the target is found, the closest distance to any target point is used.
- For found targets:
  - Minimum distance ≤ 30 m: full score
  - 30 m < distance < 100 m: the score decreases proportionally
  - Distance ≥ 100 m: score is 0
- Score composition (see the sketch after this list):
  - One point: 100
  - Two points: 60 / 40
  - Three points: 60 / 20 / 20
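The official scoring is implemented in decision/utils/evaluate.py; the sketch below is only one illustrative reading of the rules above. The linear fall-off between 30 m and 100 m and the pairing of the closest distances with the largest weights are assumptions, not the repository's code.

```python
# Weights assigned to the target points when an object has 1, 2, or 3 points.
POINT_WEIGHTS = {1: [100], 2: [60, 40], 3: [60, 20, 20]}

def point_score(distance_m: float, weight: float) -> float:
    """Score contributed by one target point given the closest distance reached (meters)."""
    if distance_m <= 30:
        return float(weight)      # within 30 m: full score for this point
    if distance_m >= 100:
        return 0.0                # 100 m or more: no score
    return weight * (100 - distance_m) / (100 - 30)   # proportional decrease in between

def task_score(closest_distances_m: list[float]) -> float:
    """Total score for an object with 1-3 target points."""
    weights = POINT_WEIGHTS[len(closest_distances_m)]
    dists = sorted(closest_distances_m)               # best distance paired with the largest weight
    return sum(point_score(d, w) for d, w in zip(dists, weights))

# Example: an object with two target points, reached at 25 m and 80 m.
print(task_score([25.0, 80.0]))   # 60 + 40 * (100 - 80) / 70 ≈ 71.4
```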
All of the following commands are written for Linux; if you are using Windows, you need to change the path representation accordingly (especially the slashes).
Currently we only support OpenAI, Google Gemma, Qwen, and OpenBMB. If you need to customize the model, please contact the authors.
First, you need to download our data from Hugging Face or Google Drive.
And then create a new data folder in the project root directory:
```bash
mkdir -p data/perception
```
Finally, put the downloaded data into the corresponding folder.
Just open a terminal in the root directory and set the following variables directly.
| parameter | function |
|---|---|
| model_template | The large language model message queue template you selected. |
| model_name_or_path | If it is an API model, it is the model name; if it is a local model, it is the path. |
| api_key | If it is an API model, enter your key. |
| base_url | If it is an API model, enter its base URL. |
MODELS_TEMPLATE="Yours"
MODEL_NAME_OR_PATH="Yours"
API_KEY="Yours"
BASE_URL="Yours"All of these scripts evaluate the perception task, and the parameters are as follows:
| parameter | function |
|---|---|
| exp_name | Customize the name of the experiment to save the results. |
| exp_idx | Select the experiment number, or enter "all" to select all. |
| exp_json | JSON file containing the experiment label data. |
| images_dir | The folder where the experimental image data is stored. |
This command is used to evaluate the multi-view perception task under high illumination:
```bash
python perception/eval/mv.py \
    --exp_name Result_MV_highLight_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/highLight.json" \
    --images_dir "/data/perception/highLight" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the context-based perception task under high illumination:
```bash
python perception/eval/mv.py \
    --exp_name Result_MV_highLightContext_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/highLightContext.json" \
    --images_dir "/data/perception/highLightContext" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the multi-view perception task under low illumination:
```bash
python perception/eval/mv.py \
    --exp_name Result_MV_lowLight_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/lowLight.json" \
    --images_dir "/data/perception/lowLight" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the context-based perception task under low illumination:
```bash
python perception/eval/mv.py \
    --exp_name Result_MV_lowLightContext_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/lowLightContext.json" \
    --images_dir "/data/perception/lowLightContext" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the multi-view perception task under high illumination with sonar image:
```bash
python perception/eval/mvs.py \
    --exp_name Result_MVwS_highLight_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/highLight.json" \
    --images_dir "/data/perception/highLight" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the context-based perception task under high illumination with sonar image:
```bash
python perception/eval/mvs.py \
    --exp_name Result_MVwS_highLightContext_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/highLightContext.json" \
    --images_dir "/data/perception/highLightContext" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the multi-view perception task under low illumination with sonar image:
```bash
python perception/eval/mvs.py \
    --exp_name Result_MVwS_lowLight_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/lowLight.json" \
    --images_dir "/data/perception/lowLight" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the context-based perception task under low illumination with sonar image:
```bash
python perception/eval/mvs.py \
    --exp_name Result_MVwS_lowLightContext_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/lowLightContext.json" \
    --images_dir "/data/perception/lowLightContext" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the multi-view perception task under high illumination with sonar image examples:
```bash
python perception/eval/mvsex.py \
    --exp_name Result_MVwSss_highLight_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/highLight.json" \
    --images_dir "/data/perception/highLight" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the context-based perception task under high illumination with sonar image examples:
```bash
python perception/eval/mvsex.py \
    --exp_name Result_MVwSss_highLightContext_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/highLightContext.json" \
    --images_dir "/data/perception/highLightContext" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the multi-view perception task under low illumination with sonar image examples:
```bash
python perception/eval/mvsex.py \
    --exp_name Result_MVwSss_lowLight_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/lowLight.json" \
    --images_dir "/data/perception/lowLight" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This command is used to evaluate the context-based perception task under low illumination with sonar image examples:
```bash
python perception/eval/mvsex.py \
    --exp_name Result_MVwSss_lowLightContext_00 \
    --exp_idx "all" \
    --exp_json "/data/perception/lowLightContext.json" \
    --images_dir "/data/perception/lowLightContext" \
    --model_template $MODELS_TEMPLATE \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --api_key $API_KEY \
    --base_url $BASE_URL
```
This part is optional. Only use it when you need to collect pictures by yourself.
The sample configuration files can be found in asset/perception/map_config. You need to copy them into your HoloOcean project's configuration.
This command is used to collect camera images only, and the parameters are as follows:
| parameter | function |
|---|---|
| scenario | The name of the json configuration file you want to replace. |
| task_name | Customize the name of the experiment to save the results. |
| rgbcamera | The camera directions you can choose. To select all, enter "all". |
```bash
python perception/task/init_map.py \
    --scenario without_sonar \
    --task_name "Exp_Camera_Only" \
    --rgbcamera "all"
```
This command is used to collect both camera images and sonar images at the same time:
```bash
python perception/task/init_map_with_sonar.py \
    --scenario with_sonar \
    --task_name "Exp_Add_Sonar" \
    --rgbcamera "FrontCamera"
```
We provide the trajectory data of OceanGym's various task evaluations in the next section, enabling readers to analyze and reproduce the results.
- This table shows the performance on decision tasks requiring autonomous completion by MLLM-driven agents.
- This table shows the performance on perception tasks across different models and conditions.
- Values represent accuracy percentages.
- Adding sonar means using both RGB and sonar images.
The link to the dataset is as follows:
☁️ Google Drive
- Decision Task
```
decision_dataset
├── main
│   ├── gpt4omini
│   │   ├── task1
│   │   │   ├── point1
│   │   │   │   ├── llm_output_...log
│   │   │   │   ├── memory_...json
│   │   │   │   └── important_memory_...json
│   │   │   └── ... (other data points like point2, point3...)
│   │   └── ... (other tasks like task2, task3...)
│   ├── gemini
│   │   └── ... (structure is the same as gpt4omini)
│   └── qwen
│       └── ... (structure is the same as gpt4omini)
│
├── migration
│   ├── gpt4o
│   │   └── ... (structure is the same as above)
│   └── qwen
│       └── ... (structure is the same as above)
│
└── scale
    ├── qwen
    └── gpt4omini
```
In the main folder, the three subfolders contain the data generated by the three models. Within each model folder there are task folders (task1-12), and within each task folder there are point1-3 folders, representing the results generated from different starting points. Point1 and point2 are fixed starting points, located at [144, -114, -63] and [350, -118, -7] respectively, while point3 is a random point.
In the scale experiment, point1-4 represent different task durations: point1 is 1 hour, point2 is 1.5 hours, point3 is 2 hours, and point4 is 3 hours. Note that the actual duration may vary to some extent due to large model calls, network fluctuations, and other factors.
If you want to evaluate files generated by yourself, please place the corresponding memory_{time_stamp}.json and important_memory_{time_stamp}.json files in the corresponding folders, as in the sketch below.
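For example, a small helper like this copies a run's memory files into the layout above; the model, task, and point names are placeholders, so pick the folder that matches your own run.

```python
import shutil
from pathlib import Path

# Illustrative only: copy your generated memory files into the dataset layout.
# "main/qwen/task1/point1" is a placeholder destination, not a required path.
run_files = sorted(Path("memory").glob("*.json"))   # memory_*.json and important_memory_*.json
target = Path("decision_dataset/main/qwen/task1/point1")
target.mkdir(parents=True, exist_ok=True)
for f in run_files:
    shutil.copy(f, target / f.name)
```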
- Perception Task
```
perception_dataset
├── data
│   ├── highLight
│   ├── highLightContext
│   ├── lowLight
│   ├── lowLightContext
│   └── ... (label files)
│
└── result
    └── ... (detailed result files)
```
In the main folder, data contains the test data of the perception task, and result contains the detailed results of this table.
Under the data folder, there are 4 folders and 4 JSON files. Each folder contains the test data for one perception setting, and each JSON file is the label file of its corresponding folder.
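As a quick sanity check after downloading, the snippet below counts label entries and files per setting; it assumes the data/perception layout from the Quick Start.

```python
import json
from pathlib import Path

# Assumes the data has been placed under data/perception as described above.
data_root = Path("data/perception")

for setting in ["highLight", "highLightContext", "lowLight", "lowLightContext"]:
    labels = json.loads((data_root / f"{setting}.json").read_text(encoding="utf-8"))
    images = [p for p in (data_root / setting).rglob("*") if p.is_file()]
    print(f"{setting}: {len(labels)} label entries, {len(images)} files")
```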
OceanGym supports custom scenarios. You can freely build on the scenarios we provide!
You can find the assets you need in the UE5 Fab marketplace and add them to OceanGym to test the robot's exploration ability!
Or modify parameters such as terrain and lighting to simulate different weather conditions in your scenarios!
Step 1: Find the DirectionalLight in the Outliner
Step 2: Select the DirectionalLight and open its Details panel
Step 3: Modify the light settings as per your requirements
Notice
In our paper, we simulate high-light and low-light environments: the light Intensity is 10.0 lux in the high-light environment and 1.5 lux in the low-light environment.
Step 1: Find the initial config file OceanGym.json in
```
C:\Users\Windows\AppData\Local\holoocean\2.0.0\worlds\Ocean
```
Step 2: Modify the location data as per your requirements
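A minimal sketch of that edit is shown below; it assumes OceanGym.json follows HoloOcean's standard scenario format (agent entries with a "location" field), so verify the field names against your actual file before running it.

```python
import json
from pathlib import Path

# Change the agent start location in OceanGym.json (field layout is an assumption
# based on HoloOcean's standard scenario format; check the existing file first).
cfg_path = Path(r"C:\Users\Windows\AppData\Local\holoocean\2.0.0\worlds\Ocean\OceanGym.json")
cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
cfg["agents"][0]["location"] = [144, -114, -63]   # e.g. the fixed starting point point1
cfg_path.write_text(json.dumps(cfg, indent=4), encoding="utf-8")
```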
If you want to develop more functionality, you can visit the official HoloOcean website.
If the OceanGym paper or benchmark is helpful, please kindly cite it as follows:
```bibtex
@misc{xue2025oceangymbenchmarkenvironmentunderwater,
      title={OceanGym: A Benchmark Environment for Underwater Embodied Agents},
      author={Yida Xue and Mingjun Mao and Xiangyuan Ru and Yuqi Zhu and Baochang Ren and Shuofei Qiao and Mengru Wang and Shumin Deng and Xinyu An and Ningyu Zhang and Ying Chen and Huajun Chen},
      year={2025},
      eprint={2509.26536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.26536},
}
```
💐 Thanks again!