This repository contains the full implementation of my MSc Robotics Individual Project at Kingโs College London:
Enhancing Metamorphic Legged Robot Locomotion Using Machine Learning and Nature-Inspired Design
MSc Robotics Individual Project | King's College London | August 2025
This project develops a unified autonomy framework combining:
- Hybrid CPG (Matsuoka + Hopf) biomechanical gait generation
- PPO reinforcement learning with domain randomization
- SLAM-based perception
- A(star) global path planning + DWA local planning
- Terrain-driven morphological reconfiguration
Everything is implemented in PyBullet.
Origaker is a cutting-edge autonomous quadruped robot that pioneering the integration of bio-inspired locomotion with artificial intelligence for robust navigation in complex environments. The system uniquely combines Central Pattern Generators (CPG) derived from neuroscience researchโspecifically Matsuoka and Hopf oscillatorsโwith deep reinforcement learning (PPO) to achieve energy-efficient, adaptive gaits that respond dynamically to terrain variations. Beyond locomotion, Origaker features autonomous morphology reconfiguration capabilities, allowing real-time switching between four distinct leg configurations based on environmental analysis through integrated SLAM perception systems. The robot demonstrates exceptional performance with <5% simulation-to-reality gap, 98% navigation success rate and 15% greater energy efficiency compared to traditional quadruped controllers, making it a valuable platform for advancing research in bio-inspired robotics, adaptive systems, continuous reinforcement learning and autonomous navigation in GPS-denied environments.
- Project Motivation
- System Architecture
- Simulation Environment
- Hybrid CPG Architecture
- Reinforcement Learning Framework
- SLAM & Planning Pipeline
- Morphology Reconfiguration
- Results
- Demonstrations
- Installation
- Future Work
- References
- Acknowledgements
Metamorphic robots promise superior adaptability through physical reconfiguration, yet current systems face critical limitations:
- Fixed Gaits: Pre-scripted locomotion patterns cannot adapt to dynamic terrain variations
- No Perception: Lack of real-time environmental awareness and mapping capabilities
- No Morphological Autonomy: Manual transitions between body configurations
- Dynamic Terrain Failures: High failure rates on unstructured surfaces
- Limited Real-World Deployment: Poor generalization beyond training conditions
|
2011 Fukushima Disaster |
ExoMars Mission |
According to the UN Office for Disaster Risk Reduction (2020):
- 300+ natural disasters annually affect 200M+ people
- Limited robotic assistance due to terrain-accessibility issues
- Critical need for autonomous, adaptive ground robots in:
- Search & rescue operations
- Planetary exploration
- Industrial inspection
- Hazardous environment navigation
This project presents a unified simulation-based framework enabling autonomous navigation and real-time morphological adaptation through:
- Bio-inspired rhythmic control (Hybrid CPG networks)
- Adaptive learning (PPO-based reinforcement learning)
- Environmental perception (SLAM-based mapping)
- Intelligent planning (A* global + DWA local)
- Dynamic reconfiguration (Terrain-aware morphology switching)
- Robust generalization (Domain randomization)
- Combines Matsuoka + Hopf oscillators for biologically plausible gaits
- PPO agent modulates CPG parameters for terrain adaptation
- 30% faster convergence vs. naive reward approaches
- Real-time SLAM with depth sensor and IMU fusion
- A* global path planning + DWA local trajectory control
- 84.3% mapping accuracy in complex environments
- 4 discrete modes: Crawler, Walker, Spreader, High-Step
- Terrain-aware switching based on obstacle height, corridor width, roughness
- 22% reduction in pose variance (stability improvement)
| Metric | Improvement |
|---|---|
| Task Success Rate | 92% (vs 68% baseline) |
| Cost of Transport | โ 15% |
| Pose Stability | โ 22% variance |
| Path Efficiency | โ 9-17% |
- Annealed domain randomization schedule
- ยฑ10% friction, ยฑ5% restitution, ยฑ15% compliance variation
- 25% improvement in terrain traversal under perturbations
Integrated simulation-based framework for autonomous morphological adaptation
- Matsuoka oscillators: Neuron-inspired adaptation dynamics
- Hopf oscillators: Stable limit-cycle generation
- Hybrid coupling: Hopf modulates Matsuoka tonic input
- Output: Phase-coordinated joint trajectories
- Algorithm: Proximal Policy Optimization (PPO)
- Observations: Joint states, body pose, oscillator phases
- Actions: CPG parameter modulation (scale, offset)
- Reward: Multi-objective (forward progress, energy, jerk)
- Inputs: Depth camera (640ร480), IMU (100Hz)
- Processing: Point cloud โ RANSAC ground removal โ Voxel filter
- Output: 2D occupancy grid (0.05m resolution)
- Update Rate: 10Hz
- Global: A* with Euclidean heuristic + obstacle inflation
- Local: Dynamic Window Approach (DWA) with clearance scoring
- Integration: Real-time waypoint tracking
- Inputs: Terrain features (elevation ฯ, corridor width, obstacle height)
- Logic: Rule-based classifier โ mode selection
- Execution: Joint-space interpolation (0.5s transition time)
Origaker URDF model in PyBullet
Simulation Parameters:
- Physics Engine: PyBullet 3.2.5
- Time Step: 1ms (1000 Hz)
- Gravity: -9.81 m/sยฒ
- Control Mode: Torque-based
- Solver: Featherstone algorithm
- Contact Model: Soft constraints
Model Specifications:
- DOF: 12 (3 per leg)
- Total Mass: 8.2 kg
- Base Dimensions: 350ร250ร120 mm
- Leg Length: 280 mm
URDF model validation - Link mass and inertia tensor comparison against CAD reference
Validation Process:
- Extract mass/inertia from
getDynamicsInfo() - Compare with CAD specifications
- Enforce <10% deviation threshold
- Correct URDF
<inertial>tags if needed
The annealed randomization schedule ensures robust policy generalization:
r_t = r_init * (1 - t/T) + r_final * (t/T)Where:
r_t: Randomized parameter at step tr_init: Initial perturbation range (wide)r_final: Final range (nominal)T: Total training steps (1M)
Randomized Parameters:
| Parameter | Initial Range | Final Range |
|---|---|---|
| Friction | ยฑ10% | ยฑ2% |
| Restitution | ยฑ5% | ยฑ1% |
| Link Mass | ยฑ8% | ยฑ2% |
| Terrain Slope | ยฑ15ยฐ | ยฑ5ยฐ |
| Sensor Latency | 0-50ms | 0-10ms |
Six coupled first-order ODEs representing mutual inhibition and adaptation:
แบแตข = -xแตข - wแตขโฑผyโฑผ - ฮฒvแตข + uแตข (membrane potential)
vฬแตข = -vแตข + yแตข (adaptation state)
yแตข = max(0, xแตข) (firing rate)
Parameters:
wแตขโฑผ: Inhibitory connection weightฮฒ: Adaptation gainuแตข: External tonic input โ Hopf modulates this
Two-dimensional system with stable limit cycle:
แบ = (ฮผ - xยฒ - yยฒ)x - ฯy (polar dynamics)
แบ = (ฮผ - xยฒ - yยฒ)y + ฯx
Parameters:
ฮผ: Amplitude controlฯ: Angular frequency
Comparative phase portraits - Hopf (circular limit cycle), Matsuoka (convergent) and hybrid ฮฑ-interpolations
Key Observations:
- Hopf: Perfect circular limit cycle โ stable rhythms
- Matsuoka: Fixed-point attractor โ adaptive bursting
- Hybrid ฮฑ=0.3: Slight spiral convergence (more Hopf-like)
- Hybrid ฮฑ=0.7: Straight trajectories (more Matsuoka-like)
โโโโโโโโโโโโโโโ modulation โโโโโโโโโโโโโโโโ
โ Hopf โโโโโโโโโโโโโโโโโโโโโโโถโ Matsuoka โ
โ Oscillator โ (tonic input) โ Oscillator โ
โ (ฮผ, ฯ) โ โ (w, ฮฒ, u) โ
โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ
โ โ
โ โ
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโ
โ
Phase-coordinated
joint trajectories
Grid Search Strategy:
- Search Space: 1000+ parameter combinations
- Biological Seeding: Based on quadruped gait data [Alexander, 2003]
- Objective: Pareto-optimal (energy, stability)
- Storage: JSON gait library for runtime retrieval
Optimized Parameter Ranges:
| Parameter | Range | Selected |
|---|---|---|
| Matsuoka ฮฒ | 0.5-2.5 | 1.2 |
| Matsuoka wแตขโฑผ | 1.0-5.0 | 2.8 |
| Hopf ฮผ | 0.1-1.0 | 0.5 |
| Hopf ฯ | 1.0-10.0 | 4.2 |
| Coupling ฮฑ | 0.0-1.0 | 0.6 |
Adaptive hybrid RL-CPG control architecture
Network Structure:
Observations (36-dim)
โ
โโ Joint positions (12)
โโ Joint velocities (12)
โโ Base pose (6: x,y,z,roll,pitch,yaw)
โโ CPG phases (4: one per leg)
โโ Terrain features (2: slope, roughness)
โ
โผ
โโโโโโโโโโโโโโโโโโโ
โ Actor Network โ 256โ256 (ReLU)
โ (Policy ฯ) โ โโโโโโโโโโโโโโถ Actions (8-dim)
โโโโโโโโโโโโโโโโโโโ - CPG scale (4)
- CPG offset (4)
โโโโโโโโโโโโโโโโโโโ
โ Critic Network โ 256โ256 (ReLU)
โ (Value V) โ โโโโโโโโโโโโโโถ State Value (1-dim)
โโโโโโโโโโโโโโโโโโโ
Multi-objective reward shaping balances speed, efficiency and smoothness:
R = wโยทฮx - wโยทโ(ฯแตขยทqฬแตข) - wโยทโqฬโโ
โ โ โ
Progress Energy Jerk
Cost Penalty
Reward component analysis over full gait cycle
Component Analysis:
| Term | Weight | Purpose | Impact |
|---|---|---|---|
| Forward Progress (ฮx) | wโ=1.0 | Encourage locomotion | Primary drive |
| Energy Cost (ฯยทqฬ) | wโ=0.01 | Minimize power | 15% COT reduction |
| Jerk Penalty (โqฬโโ) | wโ=0.005 | Smooth motion | 22% stability โ |
Hyperparameters:
Algorithm: PPO
Total Timesteps: 1,000,000
Learning Rate: 3e-4 (linear decay)
Batch Size: 64
n_epochs: 10
Clip Range: 0.3 โ 0.1 (annealed)
GAE Lambda: 0.95
Discount (ฮณ): 0.99
Value Coef: 0.5
Entropy Coef: 0.01
Max Grad Norm: 0.5Hardware:
- Platform: Windows 11, Intel i7, 16GB RAM
- Training Time: ~18 hours
- Checkpoint Interval: Every 20k steps
Key Milestones:
- 100k steps: Basic forward locomotion acquired
- 300k steps: Energy-efficient gait emerges
- 500k steps: Stable morphology transitions
- 1M steps: Convergence with 30% improvement vs. baseline
SLAM system - Front-end and back-end processing
Depth Camera (640ร480, 30Hz)
โ
โผ
Point Cloud
โ
โผ
RANSAC Ground Removal
โ
โผ
Voxel Downsampling
โ
โผ
2D Occupancy Grid (10Hz)
โ
โโโโโถ Global Planner (A*)
โ
โโโโโถ Local Planner (DWA)
Simulated SLAM system with multi-modal camera input
A* global path planning in (a) simple maze and (b) corridor maze environments
Algorithm Configuration:
- Heuristic: Euclidean distance
- Obstacle Inflation: 0.15m radius
- Cost Function: g(n) + h(n)
- Resolution: 0.05m grid cells
Dynamic Window Approach Parameters:
Velocity Search Space:
- Linear: [-0.5, 1.0] m/s
- Angular: [-ฯ/2, ฯ/2] rad/s
Sampling:
- dt: 0.1s
- prediction_horizon: 1.5s
- num_samples: 50
Scoring Weights:
- heading: 0.4
- clearance: 0.3
- velocity: 0.3
Discrete morphological modes - (a) Crawler, (b) Walker, (c) Spreader, (d) High-Step
| Mode | Use Case | Joint Config | Energy | Stability |
|---|---|---|---|---|
| Crawler | Narrow spaces, low clearance | Legs tucked (30ยฐ from body) | Low | High |
| Walker | Normal terrain, standard gait | Balanced stance (60ยฐ spread) | Medium | High |
| Spreader | Wide obstacles, lateral stability | Wide stance (90ยฐ spread) | Medium | Very High |
| High-Step | Tall obstacles, rough terrain | Extended legs (45ยฐ elevation) | High | Medium |
Decision Tree:
Input: Local terrain features
โโ Obstacle Height > 0.12m?
โ โโ YES โ High-Step Mode
โ
โโ Corridor Width < 0.4m?
โ โโ YES โ Crawler Mode
โ
โโ Surface Roughness ฯ > 0.08?
โ โโ YES โ Spreader Mode
โ
โโ ELSE โ Walker Mode (default)
Feature Extraction:
# From SLAM occupancy grid
elevation_variance = np.std(heightmap[local_window])
corridor_width = detect_lateral_clearance(occupancy_grid)
forward_obstacle = max_height_in_path(occupancy_grid, lookahead=1.0m)
Origaker morphology timeline over 40s navigation sequence
Transition Statistics:
- Total Transitions: 8 over 40s (0.2 trans/s)
- Most Frequent: Walker โ Spreader (stable terrain)
- Strategic: High-Step used in 2 short bursts (energy-intensive)
- Smooth: Zero failed transitions (kinematic continuity maintained)
Joint-Space Interpolation:
def interpolate_morphology(current_config, target_config, duration=0.5):
"""
Smooth transition between morphologies using cubic interpolation
"""
t = np.linspace(0, duration, num_steps)
interpolated_angles = []
for joint_idx in range(12):
q_start = current_config[joint_idx]
q_end = target_config[joint_idx]
# Cubic polynomial ensures smooth velocity profile
q_t = cubic_interpolate(q_start, q_end, t)
interpolated_angles.append(q_t)
return interpolated_anglesSafety Constraints:
- Transition Time: 0.5s (prevents dynamic instability)
- Max Angular Velocity: 2.0 rad/s
- Kinematic Limits: Joint angles within [โฯ, ฯ]
Controller performance comparison across key metrics
| Metric | Scripted CPG | PPO-Only | Hybrid PPO-CPG | Improvement |
|---|---|---|---|---|
| Cost of Transport โ | 2.1 | 1.8 | 1.6 | 24% โ |
| Jerk Index โ | 1.03 | 0.71 | 0.45 | 56% โ |
| Slip Ratio โ | 0.21 | 0.13 | 0.09 | 57% โ |
| Tracking Error โ | 0.12 m | 0.08 m | 0.05 m | 58% โ |
| Recovery Time โ | 1.8 s | 1.2 s | 0.8 s | 56% โ |
Full System (Hybrid + SLAM + Morphing): โโโโโโโโโโโโโโโโโโโโ 92%
Fixed-Mode CPG Baseline: โโโโโโโโโโโโโ 68%
No SLAM (Oracle Map): โโโโโโโโโโโโโโ 75%
No Domain Randomization: โโโโโโโโโโโโโโโ 81%
Key Finding: Integrated system achieves 36% relative improvement over baseline.
Per-Mode Energy Profile:
| Mode | Avg. Power (W) | Duration (s) | COT |
|---|---|---|---|
| Crawler | 8.2 | 12.5 | 1.42 |
| Walker | 10.5 | 18.0 | 1.55 |
| Spreader | 11.8 | 6.5 | 1.68 |
| High-Step | 15.3 | 3.0 | 2.12 |
Insight: Strategic mode selection minimizes High-Step usage (high energy) to critical moments.
Pose Variance (Roll/Pitch):
- Full System: ฯ = 0.08 rad
- Fixed-Mode: ฯ = 0.14 rad
- Improvement: 43% reduction in pose instability
Key Contributions:
| Component Removed | Success Rate โ | COT โ | Explanation |
|---|---|---|---|
| SLAM | -17% | +12% | Blind navigation fails obstacle avoidance |
| Morphology Switching | -14% | +8% | Fixed configuration limits versatility |
| Domain Randomization | -11% | +6% | Overfitting to training conditions |
| Hybrid CPG | -9% | +15% | Pure RL lacks rhythmic stability |
Metrics:
- Path Deviation: Mean = 0.05m, Max = 0.12m
- Goal Reach Accuracy: 0.03m (within tolerance)
- Completion Time: 38.2s (vs. 45.1s baseline)
Real-time autonomous navigation system visualization
Dashboard Components:
- SLAM Mapping: 84.3% coverage, real-time point cloud
- Terrain Classification: Confidence levels per region
- Morphology Distribution: Mode usage histogram
- Navigation Trajectory: Planned vs. executed path
- PPO Action Selection: Policy output distribution
- Performance Metrics: Live KPI monitoring
origaker.reconfiguration.locomotion.mp4
origaker_clip1.mp4
origaker_clip2.mp4
origaker_clip3.mp4
| Requirement | Minimum | Recommended |
|---|---|---|
| ๐ Python | 3.8+ | 3.9+ |
| ๐พ RAM | 8GB | 16GB+ |
| ๐ฎ GPU | Optional | CUDA-capable |
| ๐ฟ Storage | 2GB | 5GB+ |
Python >= 3.8
CUDA 11.7+ (optional, for GPU-accelerated training)pip install numpy scipy matplotlib pandas
pip install pybullet gym stable-baselines3[extra]
pip install tensorboard opencv-python open3d
pip install scikit-learn scikit-image torchgit clone https://github.com/Degas01/origaker_main.git
cd origaker_main# Using venv
python -m venv origaker_env
source origaker_env/bin/activate # Linux/Mac
origaker_env\Scripts\activate # Windows
# Or using conda
conda create -n origaker python=3.8
conda activate origakerpip install -r requirements.txtKey Dependencies:
pybullet==3.2.5
stable-baselines3==2.0.0
torch==2.0.1
numpy==1.24.3
scipy==1.10.1
matplotlib==3.7.1
opencv-python==4.7.0
open3d==0.17.0python scripts/smoke_test.pyExpected output:
โ PyBullet initialized
โ Origaker URDF loaded (12 joints)
โ Torque control enabled
โ Smoke test passed: Simulation stable
- System identification on physical Origaker platform
- Adaptive domain randomization refinement
- Real-time sensor noise characterization
- Contact dynamics calibration
- Power consumption validation
- RGB-D integration (currently depth-only)
- ORB feature tracking for loop closure
- Semantic segmentation for terrain classification
- Multi-modal sensor fusion (LiDAR + camera)
- Replace discrete modes with continuous joint-space optimization
- Online trajectory optimization (e.g., iLQR, DDP)
- Learned mode selection via RL (meta-learning)
- Energy-optimal configuration search
- Train hierarchical policy: meta-controller selects modes
- Multi-task learning across terrain types
- Transfer learning from simulation clusters
- Curriculum learning for progressively harder terrains
- Expand test suite: sand, mud, ice, gravel, vegetation
- Deformable terrain simulation (e.g., Taichi-MPM)
- Dynamic obstacles and moving platforms
- Outdoor field trials (unstructured environments)
- Failure recovery strategies (e.g., self-righting)
- Fault-tolerant control (leg damage scenarios)
- Battery-aware planning (energy-constrained missions)
- Communication loss resilience
- Fleet coordination for search & rescue
- Distributed SLAM and map merging
- Task allocation and role specialization
- Swarm behavior emergence
- King's College campus autonomous navigation trials
- Industrial inspection applications (nuclear, offshore)
- Disaster response scenario testing (UK Fire Service collaboration)
- Planetary analog missions (ESA partnership)
- ROS2 integration for broader compatibility
- Web-based simulation interface (JavaScript/WebAssembly)
- Benchmarking suite for locomotion research
- Educational modules for university courses
If you use this work in your research, please cite:
@mastersthesis{masone2025origaker,
title={Enhancing Metamorphic Legged Robot Locomotion Using Machine Learning and Nature-Inspired Design},
author={Masone, Giacomo Demetrio},
year={2025},
school={King's College London},
type={MSc Thesis},
department={Engineering Department},
supervisor={Spyrakos-Papastavridis, Emmanouil}
}Related Publications:
@article{tang2022origaker,
title={Origaker: A Novel Multi-Mimicry Quadruped Robot Based on a Metamorphic Mechanism},
author={Tang, Z. and Wang, K. and Spyrakos-Papastavridis, E. and Dai, J.S.},
journal={Journal of Mechanisms and Robotics},
volume={14},
number={6},
year={2022}
}This research was conducted at King's College London as part of the MSc Robotics program.
-
Prof./Dr. Emmanouil Spyrakos-Papastavridis โ Primary Supervisor
For invaluable guidance, expertise, and unwavering support throughout this project -
Dr. Taisir Elgorashi โ Degree Committee Member
For insightful feedback and scholarly input that enriched this work
-
MSc Robotics Cohort 2024-2025 โ Course Colleagues
For collaborative discussions, moral support, and friendship -
King's College London Engineering Department
For providing world-class resources, facilities, and academic environment
This project builds upon foundational work:
- Origaker Platform โ Tang et al. (2022)
- Stable-Baselines3 โ Raffin et al.
- PyBullet โ Erwin Coumans & team