A platform for benchmarking AI agents in Minecraft using Microsoft's Project Malmo and the PIANO cognitive architecture.
This project extends Microsoft's Project Malmo to create a standardized benchmarking environment for evaluating AI agent performance across five domains:
- Alignment - How well agents align with human/organizational goals
- Autonomy - Degree of independent operation
- Performance - Task completion, resource gathering, survival
- Social Intelligence - Multi-agent cooperation and communication
- Economic Utility - Practical value generation
System architecture:

```text
┌─────────────────────────────────────────────────────────────┐
│                     MCP Server (FastAPI)                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                  PIANO Architecture                   │  │
│  │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐   │  │
│  │ │Perception│ │  Social  │ │   Goal   │ │  Action  │   │  │
│  │ │          │ │Awareness │ │   Gen    │ │Awareness │   │  │
│  │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘   │  │
│  │      │            │            │            │         │  │
│  │      └────────────┴─────┬──────┴────────────┘         │  │
│  │                         ▼                             │  │
│  │              ┌──────────────────────┐                 │  │
│  │              │ Cognitive Controller │◄── LLM          │  │
│  │              │     (Bottleneck)     │    (Gemini/     │  │
│  │              └──────────────────────┘     Claude/etc) │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │    Malmo / Minecraft    │
                  │     (via MalmoEnv)      │
                  └─────────────────────────┘
```
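The diagram shows several modules running side by side and feeding a single Cognitive Controller through an information bottleneck. The sketch below illustrates that pattern with `asyncio`; it is a minimal illustration under stated assumptions, not the code in `piano_architecture/`. The module functions, the shared `state` dict, its keys, and the `bottleneck` helper are all hypothetical, and the controller returns a fixed decision instead of calling an LLM.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical shared agent state that every module writes into.
State = Dict[str, Any]
Module = Callable[[State], Awaitable[None]]


async def perception(state: State) -> None:
    # Stand-in for reading Malmo observations (a real module would await I/O).
    state["perception"] = "tree ahead"


async def social_awareness(state: State) -> None:
    state["social"] = "no other agents nearby"


async def goal_generation(state: State) -> None:
    state["goal"] = "collect wood"


async def action_awareness(state: State) -> None:
    state["last_action_ok"] = True


def bottleneck(state: State, keys: list[str]) -> str:
    # Information bottleneck: only a small, fixed subset of the state
    # reaches the controller; everything else (e.g. "social") is dropped.
    return "; ".join(f"{k}={state[k]}" for k in keys if k in state)


async def cognitive_controller(state: State) -> str:
    summary = bottleneck(state, ["perception", "goal", "last_action_ok"])
    # A real controller would send `summary` to an LLM; this returns a fixed action.
    return f"decide from [{summary}] -> move_forward"


async def piano_step(modules: list[Module], state: State) -> str:
    # Run every module concurrently, then make one bottlenecked decision.
    await asyncio.gather(*(m(state) for m in modules))
    return await cognitive_controller(state)


if __name__ == "__main__":
    modules = [perception, social_awareness, goal_generation, action_awareness]
    print(asyncio.run(piano_step(modules, {})))
```

Because the stubs never await anything, they run sequentially in practice; the concurrency only matters once modules await real observations or LLM calls.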
Supported LLM providers:

- Google Gemini (gemini-2.5-flash-lite) - Free tier available
- OpenRouter - Access to DeepSeek, GLM, Llama models
- Cerebras - Ultra-fast inference (1M tokens/day free)
- Cloudflare Workers AI - Reliable infrastructure
- Anthropic Claude - Premium option
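Agents are created with an `llm_type` string (`"gemini"` in the quick-start request), so one plausible way to route that string to a provider is a small adapter registry. This is a hypothetical sketch: `LLMAdapter`, `EchoAdapter`, `register`, `get_adapter`, and every `llm_type` value other than `"gemini"` are illustrative names, not the actual classes or keys in `llm_adapters/`, and no real provider SDK is called.

```python
from typing import Callable, Dict, Protocol


class LLMAdapter(Protocol):
    """Hypothetical interface each provider adapter would implement."""

    def generate(self, prompt: str) -> str: ...


class EchoAdapter:
    """Offline stand-in for a real provider client."""

    def __init__(self, provider: str) -> None:
        self.provider = provider

    def generate(self, prompt: str) -> str:
        return f"[{self.provider}] {prompt}"


# Maps an llm_type string to a zero-argument factory that builds the adapter.
_REGISTRY: Dict[str, Callable[[], LLMAdapter]] = {}


def register(llm_type: str, factory: Callable[[], LLMAdapter]) -> None:
    _REGISTRY[llm_type] = factory


def get_adapter(llm_type: str) -> LLMAdapter:
    try:
        return _REGISTRY[llm_type]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown llm_type {llm_type!r}; known: {known}") from None


# Hypothetical keys; bind `name` as a default so each factory keeps its own value.
for name in ("gemini", "openrouter", "cerebras", "cloudflare", "claude"):
    register(name, lambda name=name: EchoAdapter(name))

print(get_adapter("gemini").generate("hello"))  # [gemini] hello
```

Keeping provider selection in one registry means adding a provider is a single `register` call rather than a change at every call site.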
The PIANO cognitive architecture is based on Altera's Project Sid research:
- 5 concurrent processing modules
- Information bottleneck for decision-making
- Multi-timescale memory systems
- Action awareness for hallucination prevention
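One way to read "action awareness for hallucination prevention" is as a check that compares what the agent expected an action to change with what it actually observed, so the controller does not act on a success that never happened. The sketch below assumes inventory counts are the observable outcome; the `Expectation` type and `check_action` function are hypothetical and not part of this project's code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Expectation:
    """What the agent believes an action will change (hypothetical format)."""

    action: str
    inventory_delta: Dict[str, int]


def check_action(
    expected: Expectation,
    before: Dict[str, int],
    after: Dict[str, int],
) -> Dict[str, int]:
    """Return items whose observed change differs from the expected change.

    An empty dict means the observation confirms the action; anything else
    flags a possible hallucinated success the controller should not trust.
    """
    mismatches: Dict[str, int] = {}
    for item, want in expected.inventory_delta.items():
        got = after.get(item, 0) - before.get(item, 0)
        if got != want:
            mismatches[item] = got  # record what actually happened
    return mismatches


exp = Expectation(action="chop_tree", inventory_delta={"log": 1})
print(check_action(exp, {"log": 2}, {"log": 3}))  # {} -> action confirmed
print(check_action(exp, {"log": 2}, {"log": 2}))  # {'log': 0} -> action failed
```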
Benchmarking features:

- PostgreSQL-backed metrics storage
- Domain-specific scoring (Alignment, Autonomy, Performance, Social, Economic)
- Multi-agent performance tracking
- Human oversight dashboard
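Scores from the five domains can be combined into a single benchmark figure. The sketch below is one possible aggregation, assuming each domain is scored on a 0 to 1 scale and combined with a weighted mean; the domain keys, the scale, equal default weights, and `overall_score` itself are assumptions, not the scoring rules implemented in `benchmarking/`.

```python
from typing import Mapping, Optional

# Hypothetical keys for the five benchmark domains.
DOMAINS = ("alignment", "autonomy", "performance", "social", "economic")


def overall_score(
    scores: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted mean of per-domain scores, each assumed to lie in [0, 1]."""
    weights = weights or {d: 1.0 for d in DOMAINS}  # equal weights by default
    weighted_sum = 0.0
    total_weight = 0.0
    for domain in DOMAINS:
        if domain not in scores:
            raise KeyError(f"missing score for domain {domain!r}")
        value = scores[domain]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{domain} score {value} outside [0, 1]")
        w = weights.get(domain, 0.0)  # unlisted domains get zero weight
        weighted_sum += w * value
        total_weight += w
    if total_weight == 0:
        raise ValueError("at least one domain needs a positive weight")
    return weighted_sum / total_weight


agent = {"alignment": 0.9, "autonomy": 0.6, "performance": 0.7, "social": 0.5, "economic": 0.8}
print(round(overall_score(agent), 3))  # 0.7
```

Weights let a study emphasise, for example, alignment over economic utility without changing how each domain is measured.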
Prerequisites:

- Python 3.10+
- Docker Desktop (for Malmo/Minecraft)
- At least one LLM provider API key (Gemini is recommended because it has a free tier)
Quick start:

```bash
# Clone the repository
git clone https://github.com/ricardotwumasi/tt_malmo.git
cd tt_malmo/tt_malmo_mcp_server

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your API keys
```

Then load the keys and start the server:

```bash
# Load environment variables
export $(cat .env | grep -v '^#' | xargs)

# Start the server
python -m uvicorn mcp_server.server:app --reload --host 0.0.0.0 --port 8000
```

With the server running, use a second terminal to create and start an agent:

```bash
# Create an agent
curl -X POST http://localhost:8000/agents \
  -H "Content-Type: application/json" \
  -d '{"name":"Explorer","llm_type":"gemini","role":0,"traits":["curious"]}'

# Start the agent (replace AGENT_ID with the ID returned when the agent was created)
curl -X POST http://localhost:8000/agents/AGENT_ID/start

# View the interactive API docs (`open` is macOS-only; on Linux use xdg-open or a browser)
open http://localhost:8000/docs
```

For full Minecraft integration, use Docker:
```bash
cd tt_malmo_mcp_server

# Start Malmo in Docker
./start_docker_malmo.sh

# Or start the full stack
./start_docker_malmo.sh --full
```

See DOCKER_SETUP.md for detailed instructions.
Further documentation:

| Document | Description |
|---|---|
| QUICKSTART_APPLE_SILICON.md | Setup guide for M1/M2/M3 Macs |
| DOCKER_SETUP.md | Docker deployment guide |
| LINUX_DEPLOYMENT.md | Linux server deployment (Ubuntu/Debian) |
| MACOS_SETUP.md | macOS setup instructions |
| PROJECT_STATUS.md | Current development status |
| CHANGELOG.md | Version history |
Project structure:

```text
tt_malmo/
├── tt_malmo_mcp_server/      # Main application
│   ├── mcp_server/           # FastAPI server
│   ├── piano_architecture/   # PIANO cognitive modules
│   ├── llm_adapters/         # LLM provider integrations
│   ├── malmo_integration/    # Malmo/Minecraft bridge
│   ├── benchmarking/         # Metrics and evaluation
│   ├── deployment/           # Docker configurations
│   └── tests/                # Test suite (95+ tests)
├── malmo/                    # Microsoft Malmo (submodule)
├── CHANGELOG.md              # Version history
├── PROJECT_STATUS.md         # Development status
└── README.md                 # This file
```
Running the tests:

```bash
cd tt_malmo_mcp_server
source venv/bin/activate

# Run all tests
pytest tests/ -v

# Run with coverage (requires the pytest-cov plugin)
pytest tests/ -v --cov=. --cov-report=term-missing
```

API endpoints:

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Server health check |
| `/agents` | GET | List all agents |
| `/agents` | POST | Create a new agent |
| `/agents/{id}` | GET | Get agent details |
| `/agents/{id}` | DELETE | Delete an agent |
| `/agents/{id}/start` | POST | Start the agent's PIANO loop |
| `/agents/{id}/stop` | POST | Stop the agent |
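The endpoints above can also be called from Python instead of curl. This is a minimal standard-library sketch that builds the same requests as the quick-start curl commands, assuming the server listens on `http://localhost:8000` and accepts JSON bodies. `AgentClient` and its method names are hypothetical helpers, and since the response schema is not documented here, each method returns the decoded JSON without assuming particular field names.

```python
import json
import urllib.request
from typing import Any, Optional


class AgentClient:
    """Thin wrapper over the endpoint table (hypothetical helper, not part of the project)."""

    def __init__(self, base_url: str = "http://localhost:8000") -> None:
        self.base_url = base_url.rstrip("/")

    def build_request(
        self, method: str, path: str, body: Optional[dict] = None
    ) -> urllib.request.Request:
        # JSON-encode the body (if any) and set the matching Content-Type header.
        data = json.dumps(body).encode("utf-8") if body is not None else None
        headers = {"Content-Type": "application/json"} if body is not None else {}
        return urllib.request.Request(
            self.base_url + path, data=data, headers=headers, method=method
        )

    def _send(self, method: str, path: str, body: Optional[dict] = None) -> Any:
        req = self.build_request(method, path, body)
        with urllib.request.urlopen(req, timeout=10) as resp:
            raw = resp.read()
        return json.loads(raw) if raw else None

    def health(self) -> Any:
        return self._send("GET", "/health")

    def list_agents(self) -> Any:
        return self._send("GET", "/agents")

    def create_agent(self, name: str, llm_type: str, role: int, traits: list[str]) -> Any:
        body = {"name": name, "llm_type": llm_type, "role": role, "traits": traits}
        return self._send("POST", "/agents", body)

    def get_agent(self, agent_id: str) -> Any:
        return self._send("GET", f"/agents/{agent_id}")

    def start_agent(self, agent_id: str) -> Any:
        return self._send("POST", f"/agents/{agent_id}/start")

    def stop_agent(self, agent_id: str) -> Any:
        return self._send("POST", f"/agents/{agent_id}/stop")

    def delete_agent(self, agent_id: str) -> Any:
        return self._send("DELETE", f"/agents/{agent_id}")
```

With the server running, `AgentClient().create_agent("Explorer", "gemini", 0, ["curious"])` sends the same request as the quick-start curl command; pass the agent ID from the response to `start_agent`.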
Contributing:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:

- Microsoft Project Malmo - Minecraft AI platform
- Altera Project Sid - PIANO architecture research
- Google Gemini - LLM provider
Contact: Ricardo Twumasi - @ricardotwumasi
Project Link: https://github.com/ricardotwumasi/tt_malmo