LangSuit⋅E is a systematic, simulation-free testbed for evaluating the embodied capabilities of large language models (LLMs) across different tasks in embodied textual worlds. Its highlighted features include:
- Embodied Textual Environments: The testbed provides a general, simulation-free textual world that supports most embodied tasks, including navigation, manipulation, and communication. The environment is built on Gymnasium and inherits its design patterns (see the interaction sketch after this feature list).
- Embodied Observations and Actions: All agents' observations are designed to be embodied, with customizable `max_view_distance`, `max_manipulate_distance`, `focal_length`, etc.
- Customizable Embodied Agents: The agents in LangSuit⋅E are fully customizable w.r.t. their action spaces and communicative capabilities, i.e., one can easily adapt the communication and acting strategies from one task to another.
- Multi-agent Cooperation: The testbed supports planning, acting, and communication among multiple agents, where each agent can be customized with a different configuration.
- Human-agent Communication: Besides communication between agents, the testbed supports communication and cooperation between humans and agents.
- Full support for the LangChain library: The LangSuit⋅E testbed supports API-based language models, open-source language models, tool usage, Chain-of-Thought (CoT) strategies, etc.
- Expert Trajectory Generation: We provide expert trajectory generation algorithms for most tasks.
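Because the environments follow the Gymnasium interface, interaction is the familiar reset/step loop. The sketch below uses a stock Gymnasium environment purely to illustrate that pattern; the concrete LangSuit⋅E environment classes, textual observations, and action formats are defined by the task configurations shown later in this README.

```python
# Gymnasium-style interaction loop of the kind LangSuitE environments follow.
# "CartPole-v1" is only a stand-in: in LangSuitE the observations are text and
# an LLM agent, not random sampling, would choose each action.
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)

for _ in range(10):
    action = env.action_space.sample()  # an LLM agent would decide the action here
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```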
We form a benchmark by adapting existing annotations from simulated embodied engines, a by-product benefit of pursuing a general textual embodied world. The table below showcases 6 representative embodied tasks, which vary in the number of rooms, the number of agents, and the agents' action spaces (whether they can communicate with each other or ask humans).
| Task | Simulator | # of Scenes | # of Tasks | # of Actions | Multi-Room | Multi-Agent | Communicative |
|---|---|---|---|---|---|---|---|
| BabyAI | MiniGrid | 105 | 500 | 6 | ✓ | ✗ | ✗ |
| Rearrange | AI2Thor | 120 | 500 | 8 | ✗ | ✗ | ✗ |
| IQA | AI2Thor | 30 | 3,000 | 5 | ✗ | ✗ | ✓ |
| ALFRED | AI2Thor | 120 | 506 | 12 | ✗ | ✗ | ✗ |
| TEACh | AI2Thor | 120 | 200 | 13 | ✗ | ✓ | ✓ |
| CWAH | VirtualHome | 2 | 50 | 6 | ✓ | ✓ | ✓ |
- Clone this repository:
  ```bash
  git clone https://github.com/langsuite/langsuite.git
  cd langsuite
  ```
- Create a conda environment with `Python 3.8+` and install the Python requirements:
  ```bash
  conda create -n langsuite python=3.8
  conda activate langsuite
  pip install -e .
  ```
- Export your `OPENAI_API_KEY`:
  ```bash
  export OPENAI_API_KEY="your_api_key_here"
  ```
  Alternatively, customize your APIs by
  ```bash
  cp api.config.yml.example api.config.yml
  ```
  and then add or update your API configurations in that file. For the full list of supported API agents, please refer to LangChain Chat Models (a minimal LangChain usage sketch follows this list).
- Download the task dataset:
  ```bash
  bash ./data/download.sh <data name>
  ```
  Currently supported datasets include: `alfred`, `babyai`, `cwah`, `iqa`, `rearrange`.
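For reference, the sketch below shows a direct LangChain chat-model call of the kind the API configuration points to. It is plain LangChain, not LangSuit⋅E-specific code, and import paths differ across LangChain versions.

```python
# Plain LangChain chat-model call (not LangSuitE-specific); the ChatOpenAI
# referenced in the task configs below is a LangChain chat model.
# Import paths vary across LangChain releases; this follows the classic layout.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
reply = llm([HumanMessage(content="You see a mug on the counter. What is your next action?")])
print(reply.content)
```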
- Run a task:
  ```bash
  langsuite task <config-file.yml>
  ```
- Start the langsuite server:
  ```bash
  langsuite serve <config-file.yml>
  ```
- Start the web UI:
  ```bash
  langsuite webui
  ```
  The user interface will run on http://localhost:8501/
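Putting the steps together, a typical first run might look like the following. The dataset name comes from the supported list above; `my_babyai_task.yml` is only a placeholder for whichever task configuration file you use (such as the example configuration below).

```bash
# Illustrative end-to-end run; "my_babyai_task.yml" is a placeholder config
# name, not a file shipped with the repository.
export OPENAI_API_KEY="your_api_key_here"
bash ./data/download.sh babyai
langsuite task my_babyai_task.yml
```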
An example task configuration (the `<config-file.yml>` passed to the commands above) looks like:

```yaml
task: ExampleTask:Procthor2DEnv
template: ./langsuite/envs/ai2thor/templates/procthor_rearrange.json
env:
  type: Procthor2DEnv
  world:
    type: ProcTHORWorld
    id: test_world
    grid_size: 0.25
    asset_path: ./data/asset-database.json
    metadata_path: ./data/ai2thor-object-metadata.json
    receptacles_path: ./data/receptacles.json
agents:
  - type: ChatGPTAgent
    position: 'random'
    inventory_capacity: 1
    focal_length: 10
    max_manipulate_distance: 1
    max_view_distance: 2
    step_size: 0.25
    llm:
      llm_type: ChatOpenAI
```

The prompt template referenced by `template:` is a JSON file that maps prompting stages and feedback messages to template strings, for example:

```json
{
"intro": {
"default": [
"You are an autonomous intelligent agent tasked with navigating a vitual home. You will be given a household task. These tasks will be accomplished through the use of specific actions you can issue. [...]"
]
},
"example": {
"default": [
"Task: go to the red box. \nObs:You can see a blue key in front of you; You can see a red box on your right. \nManipulable object: A blue key.\n>Act: turn_right."
]
},
"InvalidAction": {
"failure.invalidObjectName": [
"Feedback: Action failed. There is no the object \"{object}\" in your view space. Please operate the object in sight.\nObs: {observation}"
],
...
},
...
}
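As an illustration only (LangSuit⋅E ships its own template handling), such a JSON template could be loaded and filled with Python's standard `str.format`, since the messages use `{object}`-style placeholders. The snippet assumes a template file following the structure shown above.

```python
# Illustrative sketch, not LangSuitE's actual template engine: load a JSON
# prompt template and fill its {object}/{observation} placeholders.
import json

with open("./langsuite/envs/ai2thor/templates/procthor_rearrange.json") as f:
    templates = json.load(f)

message = templates["InvalidAction"]["failure.invalidObjectName"][0].format(
    object="red box",
    observation="You can see a blue key in front of you.",
)
print(message)
```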
If you find our work useful, please cite:

```bibtex
@misc{langsuite2023,
  author    = {Zilong Zheng and Zixia Jia and Mengmeng Wang and Wentao Ding and Baichen Tong and Song-Chun Zhu},
  title     = {LangSuit⋅E: Controlling, Planning, and Interacting with Large Language Models in Embodied Text Environments},
  year      = {2023},
  publisher = {GitHub},
  url       = {https://github.com/bigai-nlco/langsuite}
}
```

For any questions and issues, please contact [email protected].
Some of the tasks in LangSuit⋅E are based on datasets and source code from prior work, including BabyAI, AI2Thor, ALFRED, TEACh, and CWAH.