Shift in actions sequence between agent and simulator #238
I'm working on a model-based reinforcement learning project on upkie, and I have observed a shift between the action sequence provided by the agent and the action sequence executed by the simulator. I think this is caused by the Spine::simulate(nb_substeps) function, where 3 actuation cycles are executed during the reset.

To illustrate, I ran the following code and displayed the action (torque) and the observation (left_wheel_velocity) inside env.step(). I compared the results with the observation and action inputs of Spine::cycle_actuation() (Spine.cpp) and BulletInterface::cycle() (BulletInterface.cpp), and summarized the comparison in the following table:

Could you please help me with this point?

Additional information:
Replies: 1 comment 1 reply
Thank you for taking a look at the details here. That is indeed correct. With the Bullet spine, there is a delay of 3 substep durations between the observation dictionary and the internal simulation state.

Details on the three-cycle reset behavior

Upon reset, Spine::simulate cycles the actuation three times, but that is a consequence rather than a cause of how Spine::cycle_actuation is implemented. The internal state of the spine (looking at actuation_output_ and latest_replies_) during these three steps looks like this:

(Because actuation_output_ is a promise, I wrote between brackets values that will actually become available at the next call.)

So far we haven't considered reducing this delay. For instance, in the PPO balancer we instead add lag to actions in training environments (although that is a lag rather than a delay, and it applies to actions rather than observations). The spine was mainly designed for 1 ms substeps (to match the real robot), but in the setting you describe substeps are much longer. So I guess your question is: can we reduce the delay between the simulation state and the observation dictionary?

Reducing substeps delay within the Bullet interface

Looking at

    read_joint_sensors();  // currently
    read_imu_data(imu_data_, bullet_, robot_, imu_link_index_, params_.dt);
    send_commands(data);
    bullet_.stepSimulation();

However, if the simulation step is …
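To make the distinction between a delay and a lag concrete, here is a minimal toy sketch in Python. It does not use the upkie API: the helper names `delayed` and `lagged` are hypothetical, the delay is modelled as a fixed-length FIFO buffer, and "lag" is assumed to mean a first-order low-pass filter, which may differ from what the PPO balancer environments actually do.

```python
from collections import deque


def delayed(sequence, nb_cycles, fill=0.0):
    """Pure delay: shift the sequence by nb_cycles steps (toy model)."""
    # The buffer starts filled with placeholder values, one per cycle of delay.
    buffer = deque([fill] * nb_cycles)
    output = []
    for value in sequence:
        buffer.append(value)
        output.append(buffer.popleft())
    return output


def lagged(sequence, alpha, initial=0.0):
    """Lag: first-order low-pass filter (assumed meaning of 'lag' here)."""
    output = []
    state = initial
    for value in sequence:
        # Move a fraction alpha of the way toward the new value.
        state = state + alpha * (value - state)
        output.append(state)
    return output


actions = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
delayed_actions = delayed(actions, nb_cycles=3)  # same values, three steps late
lagged_actions = lagged(actions, alpha=0.5)      # smoothed values, no fixed shift
```

With a three-cycle delay, the executed sequence is the commanded one shifted by three steps, which matches the kind of shift reported in the question. A lag changes the values themselves instead of shifting them by a fixed number of steps.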