This project involves creating and training a walker agent using Unity's ML-Agents toolkit. The walker learns to stand, walk, and balance through reinforcement learning, specifically using the Proximal Policy Optimization (PPO) algorithm.
The agent observes the positions, rotations, velocities, and angular velocities of its body parts, as well as the spring settings of its joints.
- Position: 16 body parts × 3 coordinates = 48 values
- Rotation: 16 body parts × 4 values = 64 values
- Velocity: 16 body parts × 3 values = 48 values
- Angular Velocity: 16 body parts × 3 values = 48 values

Total for Body Parts: 208 values

- Spring Value: 16 joints = 16 values

Total for Joints: 16 values

Overall Observation Size: 224 values
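As a rough sketch of how this vector could be assembled with the ML-Agents C# API, the override below adds the same 208 + 16 = 224 values; the `bodyParts` and `springValues` fields are hypothetical names standing in for the project's actual references.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class WalkerObservationSketch : Agent
{
    // Hypothetical references: 16 body-part rigidbodies and one spring value per joint.
    public Rigidbody[] bodyParts;   // length 16
    public float[] springValues;    // length 16

    public override void CollectObservations(VectorSensor sensor)
    {
        foreach (var rb in bodyParts)
        {
            sensor.AddObservation(rb.transform.localPosition); // 3 values
            sensor.AddObservation(rb.transform.localRotation); // 4 values (quaternion)
            sensor.AddObservation(rb.velocity);                // 3 values
            sensor.AddObservation(rb.angularVelocity);         // 3 values
        }
        foreach (var spring in springValues)
        {
            sensor.AddObservation(spring);                     // 1 value per joint
        }
        // 16 × (3 + 4 + 3 + 3) + 16 = 224 observations in total.
    }
}
```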
The agent applies torque to body parts to control movement, and the torques are clamped for stability. The agent also applies forward and upward forces to simulate walking and stepping. Random rotations are applied at the start of each episode to vary initial conditions.
- Torque per Body Part: 16 body parts × 3 directions = 48 values
- Spring Adjustment per Joint: 16 joints = 16 values
- Forward and Upward Movement: 2 values

Total Action Space Size: 66 values
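The sketch below illustrates one way such an action vector could be consumed in `OnActionReceived`, assuming hypothetical `bodyParts`, `joints`, and `hips` fields and an illustrative `maxTorque` clamp; the episode-start randomization mentioned above (handled in `OnEpisodeBegin`) is omitted.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class WalkerActionSketch : Agent
{
    public Rigidbody[] bodyParts;       // 16 body-part rigidbodies (hypothetical)
    public ConfigurableJoint[] joints;  // 16 configurable joints (hypothetical)
    public Rigidbody hips;              // root body receiving the walking forces (hypothetical)
    public float maxTorque = 150f;      // illustrative clamp value, not the project's setting

    public override void OnActionReceived(ActionBuffers actions)
    {
        var act = actions.ContinuousActions;
        int i = 0;

        // 16 body parts × 3 directions = 48 torque values, clamped for stability.
        foreach (var rb in bodyParts)
        {
            var torque = new Vector3(act[i++], act[i++], act[i++]) * maxTorque;
            rb.AddTorque(Vector3.ClampMagnitude(torque, maxTorque));
        }

        // 16 spring-strength adjustments, one per joint.
        foreach (var joint in joints)
        {
            var drive = joint.slerpDrive;
            drive.positionSpring = Mathf.Clamp01(act[i++]) * 10000f;
            joint.slerpDrive = drive;
        }

        // 2 remaining values: forward and upward forces to simulate walking and stepping.
        hips.AddForce(hips.transform.forward * act[i++] + Vector3.up * act[i++]);
        // Total: 48 + 16 + 2 = 66 continuous actions.
    }
}
```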
The reward system encourages the agent to stay upright, move forward efficiently, alternate leg movements, and avoid falling.

- Standing Reward: Stay upright by maintaining a high head position.
- Forward Movement Reward: Move forward efficiently by rewarding forward velocity.
- Leg Movement Reward: Alternate leg movements, simulating a walking character.
- Falling Penalty: Avoid falling or excessive rotation by penalizing low hip or head positions.
These rewards aim to incentivize the agent to stand upright and move forward, helping it learn effective walking behavior over time.
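A minimal sketch of how these reward terms could be combined each step is shown below; the field names, thresholds, and scaling factors are illustrative assumptions rather than the project's tuned values, and the leg-alternation term is only indicated in a comment.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class WalkerRewardSketch : Agent
{
    public Transform head;                 // hypothetical references to the relevant body parts
    public Transform hips;
    public Rigidbody hipsRigidbody;
    public float targetHeadHeight = 1.2f;  // illustrative threshold
    public float minHipHeight = 0.3f;      // illustrative threshold

    public override void OnActionReceived(ActionBuffers actions)
    {
        // (Torque and force application omitted; see the action sketch above.)

        // Standing reward: keep the head high.
        AddReward(0.01f * Mathf.Clamp01(head.position.y / targetHeadHeight));

        // Forward-movement reward: reward velocity along the agent's forward axis.
        float forwardSpeed = Vector3.Dot(hipsRigidbody.velocity, hips.forward);
        AddReward(0.01f * forwardSpeed);

        // A leg-alternation reward would compare the relative motion of the two feet (omitted here).

        // Falling penalty: end the episode with a penalty if the hips drop too low.
        if (hips.position.y < minHipHeight)
        {
            SetReward(-1f);
            EndEpisode();
        }
    }
}
```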
behaviors:
  walker-agent:                   # Identifier for the agent's behavior configuration
    trainer_type: ppo             # The RL algorithm to use; PPO stands for Proximal Policy Optimization
    hyperparameters:              # Parameters for the PPO algorithm
      batch_size: 2048            # Number of experiences per gradient-descent update
      buffer_size: 20480          # Number of experiences to collect before updating the policy
      learning_rate: 0.0001       # Learning rate for the optimizer
      beta: 0.001                 # Weight of the entropy regularization term in the loss
      epsilon: 0.15               # Clipping threshold for the PPO policy update
      lambd: 0.95                 # Lambda parameter for Generalized Advantage Estimation (GAE)
      num_epoch: 5                # Number of passes over the buffer per policy update
    network_settings:             # Configuration for the neural network used in the policy
      normalize: true             # Whether to normalize inputs to the network
      hidden_units: 256           # Number of hidden units per layer in the network
      num_layers: 2               # Number of hidden layers in the network
      vis_encode_type: simple     # Type of visual encoder (e.g., simple, nature_cnn)
    reward_signals:               # Configuration for reward signals used in training
      extrinsic:                  # Extrinsic reward signal configuration
        gamma: 0.99               # Discount factor for future rewards
        strength: 1.0             # Strength of the extrinsic reward signal
    max_steps: 5000000            # Maximum number of steps to run the training
    summary_freq: 10000           # Frequency (in steps) to save training summaries
This project uses:
- Unity 2022.3 LTS
- ML-Agents Release 21
Create a new Conda environment with Python 3.9.18:
conda create -n mlagents python=3.9.18 && conda activate mlagents

Install PyTorch 2.2.1 (or a compatible version) with CUDA support:

pip3 install torch~=2.2.1 --index-url https://download.pytorch.org/whl/cu121

Install ML-Agents and ML-Agents Environments from the local source:
cd /path/to/ml-agents
python -m pip install ./ml-agents-envs
python -m pip install ./ml-agents

- Open your Unity project.
- Go to Window > Package Manager.
- Click "+" > "Add package from disk..."
- Select com.unity.ml-agents and com.unity.ml-agents.extensions from your project root.
To verify the installation, run:

mlagents-learn --help

If the help information appears, your setup is complete.
To train the WalkerAgent from scratch, run the following command:
mlagents-learn config/walker-agent.yaml --run-id=walker-agent --force

If you want to continue training from a previously saved model, run:
mlagents-learn config/walker-agent.yaml --initialize-from=walker-agent --run-id=walker-agent --force