Shift in actions sequence between agent and simulator #238

elasriz · 2024-03-29T16:46:32Z

I'm working on a model-based reinforcement learning project on Upkie, and I have observed a shift between the action sequence provided by the agent and the action sequence executed by the simulator.

I think this is caused by the Spine::simulate(nb_substeps) function, where three actuation cycles are executed during the reset.

To illustrate, I ran the following code and printed the action (torque) and the observation (left_wheel_velocity) inside env.step().

import upkie.envs


def run(env: upkie.envs.UpkieGroundVelocity):
    action = env.get_neutral_action()

    # Position commands to keep the legs extended
    action["left_hip"]["position"] = 0.0
    action["left_knee"]["position"] = 0.0
    action["right_hip"]["position"] = 0.0
    action["right_knee"]["position"] = 0.0

    # Disable velocity feedback in the wheels
    # (we don't set kp_scale as the neutral action has no position command)
    action["left_wheel"]["kd_scale"] = 0.0
    action["right_wheel"]["kd_scale"] = 0.0
    action["left_wheel"]["maximum_torque"] = 1.0
    action["right_wheel"]["maximum_torque"] = 1.0

    state, info = env.reset()  # connects to the spine

    for step in range(16):
        force = env.action_space.sample()
        action["left_wheel"]["feedforward_torque"] = +force
        action["right_wheel"]["feedforward_torque"] = -force
        _, _, terminated, truncated, info = env.step(action)
        if step == 6:
            # Reset in the middle of the run
            state, info = env.reset()

I compared the results with the observation and action inputs of Spine::cycle_actuation() (Spine.cpp) and BulletInterface::cycle() (BulletInterface.cpp).
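One way to quantify such a shift from logged sequences is to search for the offset that best aligns them. The following is a minimal sketch on synthetic data, not output from the simulator: `estimate_shift`, `max_shift`, and the three-step offset used to build `executed` are all hypothetical choices made for illustration.

```python
import random


def estimate_shift(sent, executed, max_shift=10):
    """Return the shift k that minimizes the mean squared error
    between sent[i] and executed[i + k]."""
    best_shift, best_error = 0, float("inf")
    for k in range(max_shift + 1):
        pairs = list(zip(sent, executed[k:]))
        if not pairs:
            break
        error = sum((s - e) ** 2 for s, e in pairs) / len(pairs)
        if error < best_error:
            best_shift, best_error = k, error
    return best_shift


# Synthetic data (assumption): the executed sequence is the sent sequence
# pushed back by three steps and padded with zeros.
random.seed(0)
sent = [random.uniform(-1.0, 1.0) for _ in range(16)]
executed = ([0.0] * 3 + sent)[:16]

print("estimated shift:", estimate_shift(sent, executed))
```

With real logs, `sent` would be the torques passed to env.step() and `executed` the torques seen on the simulator side.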

I have summarized the results of the comparison in the following table:

Could you please help me with this point?

Additional information:

I am running the simulation with nb_substeps = 1
Frequency: 50 (I observed the same behaviour with frequency = 200.0 as well)
Env: UpkieGroundVelocity-v3

Answered by stephane-caron

stephane-caron (Maintainer) · 2024-04-02T13:42:07Z

Thank you for taking a look at the details here. That is indeed correct. With the Bullet spine, there is a delay of 3 substep durations between the observation dictionary and the internal simulation state.

Details on the three-cycle reset behavior

Upon reset, Spine::simulate cycles the actuation three times, but that is a consequence rather than a cause of how Spine::cycle_actuation is implemented. The internal state of the spine (looking at actuation_output_ and latest_replies_) during these three steps looks like this:

(Because actuation_output_ is a promise, I wrote between brackets values that will actually become available at the next call.)
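As a rough illustration of this kind of delay (a toy model, not the Spine implementation), one can picture a fixed-length delay line between the simulation and the observation dictionary. The class name `DelayedObservation` and the `None` placeholders are assumptions made for this sketch.

```python
from collections import deque


class DelayedObservation:
    """Toy delay line: a value pushed at cycle k is returned at cycle k + delay."""

    def __init__(self, delay, initial=None):
        # Pre-fill with placeholders standing for "no valid reply yet"
        self.buffer = deque([initial] * delay, maxlen=delay + 1)

    def cycle(self, new_observation):
        self.buffer.append(new_observation)
        return self.buffer.popleft()


line = DelayedObservation(delay=3)
outputs = [line.cycle(k) for k in range(6)]
print(outputs)  # the value pushed at cycle 0 comes out at cycle 3
```

In this toy model the first three outputs are placeholders, which is one way to read why a reset would need to cycle three times before a valid observation comes out.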

So far we haven't considered reducing this delay. For instance, in the PPO balancer we instead add lag to actions in training environments (although that is a lag rather than a delay, and it applies to actions rather than observations). The spine was mainly designed for 1 ms substeps (to match the real robot), but in the setting you describe substeps are much longer. So I guess your question is: can we reduce the delay between the simulation state and the observation dictionary?
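To make the lag versus delay distinction concrete, here is a minimal sketch comparing a first-order low-pass filter (lag) with a pure shift (delay) on a step input. It is only an illustration: the PPO balancer's actual lag implementation is not shown in this thread, and `alpha` and `n` are hypothetical parameters.

```python
def low_pass(actions, alpha):
    """First-order lag: y[k] = y[k-1] + alpha * (u[k] - y[k-1]), starting from 0."""
    output, y = [], 0.0
    for u in actions:
        y += alpha * (u - y)
        output.append(y)
    return output


def pure_delay(actions, n, fill=0.0):
    """Pure delay: the same sequence shifted by n steps, padded with `fill`."""
    return ([fill] * n + list(actions))[: len(actions)]


step_input = [1.0] * 6
lagged = low_pass(step_input, alpha=0.5)  # responds immediately, but smoothed
delayed = pure_delay(step_input, n=3)     # unchanged shape, but three steps late

print("lag:  ", lagged)
print("delay:", delayed)
```

The lagged signal starts moving at the first step and converges gradually, whereas the delayed signal stays at zero for three steps and then jumps.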

Reducing substeps delay within the Bullet interface

Looking at BulletInterface::cycle again, one straightforward way to reduce this number by one would be to step the simulation before reading sensors, whereas the current order is:

  read_joint_sensors();  // currently
  read_imu_data(imu_data_, bullet_, robot_, imu_link_index_, params_.dt);
  send_commands(data);
  bullet_.stepSimulation();

However, if the simulation step is $\delta t = 1$ ms, I'm concerned that stepping before would be less realistic than stepping after (as currently done). When a servo receives a CAN packet "here is your action, please report your observations" at time $t$, it will report the observation at time $t$ (which corresponds to stepping after), rather than wait for $\delta t$ and report the observation at time $t + \delta t$ (which corresponds to stepping before).
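The two orderings can be compared on a toy one-dimensional integrator. This is a sketch under simplifying assumptions, not the Bullet interface: `ToySimulator`, its `step_before_read` flag, and the `staleness` helper are hypothetical names introduced for illustration.

```python
class ToySimulator:
    """Toy 1D integrator with a configurable read/step order."""

    def __init__(self, dt, step_before_read=False):
        self.dt = dt
        self.step_before_read = step_before_read
        self.time = 0.0
        self.position = 0.0
        self.velocity = 0.0  # last applied command

    def _step(self):
        self.position += self.velocity * self.dt
        self.time += self.dt

    def cycle(self, velocity_command):
        if self.step_before_read:
            self._step()  # integrates the previous command
            observation = (self.time, self.position)
            self.velocity = velocity_command
        else:
            observation = (self.time, self.position)  # read first
            self.velocity = velocity_command
            self._step()  # step last, as in the current order
        return observation


def staleness(sim, command):
    """Simulation time minus observation time right after one cycle."""
    observed_time, _ = sim.cycle(command)
    return sim.time - observed_time


print(staleness(ToySimulator(dt=0.02), 1.0))
print(staleness(ToySimulator(dt=0.02, step_before_read=True), 1.0))
```

In this toy model, stepping after leaves the returned observation one step behind the internal state, while stepping before removes that one step at the cost of reporting the state at $t + \delta t$.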

1 reply

stephane-caron (Maintainer) · Oct 31, 2024

Following up on this, I posted some more thoughts in "Starting CAN cycles after sleep or after action", which is related to actuation cycles in general (both in simulation and on real robots). To link with that note:

In simulation we assume observations are made instantly ($T_o = 0$). If we made the change I mentioned above, it would correspond to $T_o = T$, but for the spine to run sustainably we want $T_o \leq T_c < T$.
The action sent to the simulator at step 2 was computed using the observation from step 0 (see the diagram above), which is consistent with the property $t_{send} - t_{obs} = 2 T + T_o$.
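As a worked example of the property above, here is a short sketch that evaluates $2 T + T_o$ for the two frequencies mentioned in the question. It assumes that $T$ is the cycle period, that is, the inverse of the frequency, and that $T_o = 0$ in simulation; the function name `send_minus_obs` is hypothetical.

```python
def send_minus_obs(frequency_hz, observation_time=0.0):
    """Evaluate t_send - t_obs = 2 T + T_o, assuming T = 1 / frequency."""
    period = 1.0 / frequency_hz
    return 2.0 * period + observation_time


for frequency in (50.0, 200.0):
    delay_ms = 1e3 * send_minus_obs(frequency)
    print(f"{frequency:.0f} Hz: t_send - t_obs = {delay_ms:.1f} ms")
```

Under these assumptions the gap is 40 ms at 50 Hz and 10 ms at 200 Hz.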
