
WalkerAgent with Unity ML-Agents

This project involves creating and training a walker agent using Unity's ML-Agents toolkit. The walker learns to stand, walk, and balance through reinforcement learning, specifically using the Proximal Policy Optimization (PPO) algorithm.

Table of Contents

  • Initial Scenario (demo GIF: Initial)
  • Train to Stand (demo GIF: standing)
  • Train to Walk Forward (demo GIF: walking)

Observation Space

The agent observes the positions, rotations, velocities, and angular velocities of its body parts, as well as the spring settings of its joints.

Body Part Observations

  • Position: 16 body parts × 3 coordinates = 48 values
  • Rotation: 16 body parts × 4 values = 64 values
  • Velocity: 16 body parts × 3 values = 48 values
  • Angular Velocity: 16 body parts × 3 values = 48 values

Total for Body Parts: 208 values

Joint Spring Observations (stiffness)

  • Spring Value: 16 joints

Total for Joints: 16 values

Overall Observation Size: 224 values
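
As a quick sanity check, the totals above can be reproduced in Python (the counts are taken from the lists above; the function name is only illustrative):

NUM_BODY_PARTS = 16
NUM_JOINTS = 16

def observation_size() -> int:
    position = NUM_BODY_PARTS * 3          # 48 values
    rotation = NUM_BODY_PARTS * 4          # 64 values (quaternions)
    velocity = NUM_BODY_PARTS * 3          # 48 values
    angular_velocity = NUM_BODY_PARTS * 3  # 48 values
    spring = NUM_JOINTS * 1                # 16 joint stiffness values
    return position + rotation + velocity + angular_velocity + spring

assert observation_size() == 224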


Action Space

The agent applies torque to body parts to control movement, and the torques are clamped for stability. The agent also applies forward and upward forces to simulate walking and stepping. Random rotations are applied at the start of each episode to vary initial conditions.

Torque Actions

  • Torque per Body Part: 16 body parts × 3 directions = 48 values

Spring Actions (stiffness)

  • Spring Adjustment per Joint: 16 joints

Movement Actions

  • Forward and upward movement: 2 values

Total Action Space Size: 66 values
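
A minimal Python sketch of how a flat 66-value continuous action vector could be partitioned into the groups above (illustrative only; the project's actual agent code is C# inside Unity):

def split_actions(actions):
    """Partition a flat 66-value action vector into its three groups."""
    assert len(actions) == 66
    torques = actions[:48]       # 16 body parts x 3 torque axes
    springs = actions[48:64]     # 16 joint stiffness adjustments
    movement = actions[64:66]    # forward and upward force values
    return torques, springs, movement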


Reward System

The reward system encourages the agent to stay upright, move forward efficiently, alternate its leg movements, and avoid falling.

  • Standing Reward: Stay upright by maintaining a high head position.

  • Forward Movement Reward: Move forward efficiently by rewarding forward velocity.

  • Leg Movement Reward: Alternate leg movements to simulate a walking gait.

  • Falling Penalty: Avoid falling or excessive rotation; low hip or head positions are penalized.

These rewards aim to incentivize the agent to stand upright and move forward, helping it learn effective walking behavior over time.
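
For illustration, here is a hedged Python sketch of the kind of reward shaping described above; the weights, thresholds, and input names (head_height, forward_velocity, hips_height, legs_alternating) are hypothetical and not taken from the project's agent code:

def step_reward(head_height, forward_velocity, hips_height, legs_alternating):
    """Combine the four reward terms described above (illustrative weights)."""
    reward = 0.0
    reward += 0.1 * head_height          # standing: keep the head high
    reward += 1.0 * forward_velocity     # forward movement: reward forward speed
    if legs_alternating:
        reward += 0.05                   # leg movement: encourage alternating steps
    if hips_height < 0.3 or head_height < 0.5:
        reward -= 1.0                    # falling: penalize low hip or head positions
    return reward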


Training Configuration (config/walker-agent.yaml)

behaviors:
  walker-agent:                 # Identifier for the agent's behavior configuration
    trainer_type: ppo           # The RL algorithm to use; PPO stands for Proximal Policy Optimization

    hyperparameters:            # Parameters for the PPO algorithm
      batch_size: 2048          # Number of experiences per gradient-descent update
      buffer_size: 20480        # Number of experiences to collect before updating the policy
      learning_rate: 0.0001     # Learning rate for the optimizer
      beta: 0.001               # Weight for the entropy term in the loss function
      epsilon: 0.15             # Epsilon for the clipping function in PPO
      lambd: 0.95               # Lambda parameter for Generalized Advantage Estimation (GAE)
      num_epoch: 5              # Number of epochs to train on each batch

    network_settings:           # Configuration for the neural network used in the policy
      normalize: True           # Whether to normalize inputs to the network
      hidden_units: 256         # Number of hidden units per layer in the network
      num_layers: 2             # Number of layers in the network
      vis_encode_type: simple   # Type of visual encoding (e.g., simple, nature_cnn)

    reward_signals:             # Configuration for reward signals used in training
      extrinsic:                # Extrinsic reward signal configuration
        gamma: 0.99             # Discount factor for the reward signal
        strength: 1.0           # Strength of the extrinsic reward signal

    max_steps: 5000000          # Maximum number of steps to run the training
    summary_freq: 10000         # Frequency (in steps) to save training summaries
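
As an optional sanity check, the configuration above (saved as config/walker-agent.yaml, the path used by the training commands below) can be loaded with PyYAML to confirm it parses and to inspect the hyperparameters:

import yaml

with open("config/walker-agent.yaml") as f:
    config = yaml.safe_load(f)

walker = config["behaviors"]["walker-agent"]
print(walker["trainer_type"])                   # ppo
print(walker["hyperparameters"]["batch_size"])  # 2048
print(walker["max_steps"])                      # 5000000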



Using Unity 2022.3 LTS

ML-Agents Release 21

1. Create Conda Environment

Create a new Conda environment with Python 3.9.18:

conda create -n mlagents python=3.9.18 && conda activate mlagents

2. Install PyTorch with CUDA 12.1

Install PyTorch 2.2.1 (or compatible version) with CUDA support:

pip3 install torch~=2.2.1 --index-url https://download.pytorch.org/whl/cu121

3. Install ML-Agents from Source

Install ML-Agents and ML-Agents Environments from the local source:

cd /path/to/ml-agents
python -m pip install ./ml-agents-envs
python -m pip install ./ml-agents

4. Configure Unity for ML-Agents

  • Open your Unity project.
  • Go to Window > Package Manager.
  • Click "+" > "Add package from disk..."
  • Select the com.unity.ml-agents and com.unity.ml-agents.extensions folders from the root of the cloned ml-agents repository (choose each package's package.json).

5. Verify Installation

mlagents-learn --help

If the help information appears, your setup is complete.
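
You can also confirm that the Python packages import correctly from the same environment; the __version__ attributes below are present in recent ML-Agents releases, but treat this as an optional check:

import mlagents_envs
import mlagents.trainers

print("ml-agents-envs:", mlagents_envs.__version__)
print("ml-agents:", mlagents.trainers.__version__)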



To train the WalkerAgent from scratch, run the following command:

mlagents-learn config/walker-agent.yaml --run-id=walker-agent --force

To continue training from a previously trained model, initialize the run from its saved weights:

mlagents-learn config/walker-agent.yaml --initialize-from=walker-agent --run-id=walker-agent --force


