Closed
Labels: bug (Something isn't working)
Description
Describe the bug
It is possible for envpool to return done=True while both infos['TimeLimit.truncated'] and infos["terminated"] are False.
To Reproduce
See the following snippet and download the model. The code is roughly
envs = envpool.make(
    "UpNDown-v5",
    env_type="gym",
    num_envs=1,
    episodic_life=True,  # Espeholt et al., 2018, Tab. G.1
    repeat_action_probability=0,  # Hessel et al., 2022 (Muesli) Tab. 10
    noop_max=30,  # Espeholt et al., 2018, Tab. C.1 "Up to 30 no-ops at the beginning of each episode."
    full_action_space=False,  # Espeholt et al., 2018, Appendix G., "Following related work, experts use game-specific action sets."
    max_episode_steps=int(108000 / 4),  # Hessel et al. 2018 (Rainbow DQN), Table 3, Max frames per episode
    reward_clip=True,
    seed=1,
)
next_obs = envs.reset()
step = 0
done = False
while not done:
    step += 1
    actions, key = get_action_and_value(network_params, actor_params, next_obs, key)
    next_obs, _, d, infos = envs.step(np.array(actions))
    episodic_return += infos["reward"][0]
    done = sum(infos["terminated"]) + sum(infos["TimeLimit.truncated"]) >= 1
    print(step, infos['TimeLimit.truncated'], infos["terminated"], infos["elapsed_step"], d)
    if step > int(108000 / 4) + 10:
        print("the environment is supposed to be done by now")
        break
print(f"eval_episode={len(episodic_returns)}, episodic_return={episodic_return}")
27000 [False] [0] [26997] [False]
27001 [False] [0] [26998] [False]
27002 [False] [0] [26999] [False]
27003 [False] [0] [27000] [ True]
27004 [False] [0] [0] [False]
27005 [False] [0] [1] [False]
27006 [False] [0] [2] [False]
27007 [False] [0] [3] [False]
27008 [False] [0] [4] [False]
27009 [False] [0] [5] [False]
27010 [False] [0] [6] [False]
27011 [False] [0] [7] [False]
the environment is supposed to be done by now
eval_episode=0, episodic_return=367860.0
As shown, done=True at step 27003 (elapsed_step=27000), but infos reports the environment as neither terminated nor truncated. This is a bug because the episode was in fact truncated by the time limit.
Expected behavior
infos['TimeLimit.truncated'] should be True at the step where elapsed_step reaches 27000, i.e. max_episode_steps=int(108000 / 4).
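Until the flag is fixed, one possible workaround is to infer truncation from the values that are reported correctly in the log above: done and infos["elapsed_step"]. The sketch below is an assumption based only on that log, not envpool's own logic, and `infer_truncated` is a hypothetical helper name. It treats an episode as truncated when done is True and elapsed_step has reached max_episode_steps.

```python
import numpy as np

# Same limit that is passed to envpool.make in the snippet above.
MAX_EPISODE_STEPS = int(108000 / 4)  # 27000


def infer_truncated(done, elapsed_step, max_episode_steps=MAX_EPISODE_STEPS):
    """Hypothetical helper: flag envs whose episode ended at the step limit.

    done and elapsed_step are per-env arrays, as returned by envs.step()
    and infos["elapsed_step"] in the reproduction script.
    """
    done = np.asarray(done, dtype=bool)
    elapsed_step = np.asarray(elapsed_step)
    return done & (elapsed_step >= max_episode_steps)


# Values copied from the log above (num_envs=1).
print(infer_truncated([False], [26999]))  # row for step 27002
print(infer_truncated([True], [27000]))   # row for step 27003
print(infer_truncated([False], [0]))      # row for step 27004
```

In the evaluation loop, `infer_truncated(d, infos["elapsed_step"])` could stand in for infos['TimeLimit.truncated'] when computing done. Note that an episode that genuinely ends on exactly the last allowed step would also be flagged, which is harmless for the stopping condition used here.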
System info
import envpool, numpy, sys
print(envpool.__version__, numpy.__version__, sys.version, sys.platform)
0.8.1 1.21.6 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] linux
Additional context
Might be related to #179
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have read the documentation (required)
- I have provided a minimal working example to reproduce the bug (required)