
[BUG] done=True, but infos['TimeLimit.truncated'] and infos["terminated"] are False #239

@vwxyzjn

Describe the bug

It is possible for envpool to return done=True while infos['TimeLimit.truncated'] and infos["terminated"] are both False.

To Reproduce

See the following snippet and download the model. The code is roughly:

import envpool
import numpy as np

envs = envpool.make(
    "UpNDown-v5",
    env_type="gym",
    num_envs=1,
    episodic_life=True,  # Espeholt et al., 2018, Tab. G.1
    repeat_action_probability=0,  # Hessel et al., 2022 (Muesli) Tab. 10
    noop_max=30,  # Espeholt et al., 2018, Tab. C.1 "Up to 30 no-ops at the beginning of each episode."
    full_action_space=False,  # Espeholt et al., 2018, Appendix G., "Following related work, experts use game-specific action sets."
    max_episode_steps=int(108000 / 4),  # Hessel et al. 2018 (Rainbow DQN), Table 3, Max frames per episode
    reward_clip=True,
    seed=1,
)
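next_obs = envs.reset()
step = 0
done = False
episodic_return = 0.0  # sum of infos["reward"] over the current episode
episodic_returns = []  # returns of finished episodes
while not done:
    step += 1
    # get_action_and_value, network_params, actor_params, and key come from the linked snippet and model
    actions, key = get_action_and_value(network_params, actor_params, next_obs, key)
    next_obs, _, d, infos = envs.step(np.array(actions))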
    episodic_return += infos["reward"][0]
    done = sum(infos["terminated"]) + sum(infos["TimeLimit.truncated"]) >= 1
    print(step, infos['TimeLimit.truncated'], infos["terminated"], infos["elapsed_step"], d)

    if step > int(108000 / 4) + 10:
        print("the environment is supposed to be done by now")
        break
print(f"eval_episode={len(episodic_returns)}, episodic_return={episodic_return}")

Output (final lines):
27000 [False] [0] [26997] [False]
27001 [False] [0] [26998] [False]
27002 [False] [0] [26999] [False]
27003 [False] [0] [27000] [ True]
27004 [False] [0] [0] [False]
27005 [False] [0] [1] [False]
27006 [False] [0] [2] [False]
27007 [False] [0] [3] [False]
27008 [False] [0] [4] [False]
27009 [False] [0] [5] [False]
27010 [False] [0] [6] [False]
27011 [False] [0] [7] [False]
the environment is supposed to be done by now
eval_episode=0, episodic_return=367860.0

As shown, done=True at loop step 27003 (elapsed_step=27000), but infos reports that the environment is neither terminated nor truncated. This is a bug because the episode was in fact truncated: elapsed_step had reached max_episode_steps=27000.
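
To make the inconsistency explicit, here is a minimal sketch that scans a few rows copied from the output above and reports the steps where done is True but neither flag accounts for it. The helper name `unexplained_done` is hypothetical and exists only for this illustration; it is not part of envpool.

```python
# Rows copied from the output above:
# (step, TimeLimit.truncated, terminated, elapsed_step, done)
rows = [
    (27002, False, 0, 26999, False),
    (27003, False, 0, 27000, True),
    (27004, False, 0, 0, False),
]


def unexplained_done(rows):
    """Return the steps where done is True but neither flag is set."""
    return [
        step
        for step, truncated, terminated, _elapsed, done in rows
        if done and not (truncated or terminated)
    ]


print(unexplained_done(rows))
```

For the rows above, the only step it reports is 27003, which is the step where elapsed_step hits the limit.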

Expected behavior

infos['TimeLimit.truncated'] should be True on the step where elapsed_step reaches 27000 (max_episode_steps), i.e. the step where done=True is returned.
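
Until the flag is reported correctly, one possible user-side workaround is to reconstruct the truncation signal from done and infos["elapsed_step"]. The sketch below is not envpool's fix; `infer_truncated` is a hypothetical helper, and it assumes (as in the output above) that elapsed_step equals max_episode_steps on the step where done is set.

```python
import numpy as np


def infer_truncated(done, elapsed_step, max_episode_steps):
    """Per-env truncation flags: done is set and the step limit was reached.

    Hypothetical workaround, not an envpool API.
    """
    done = np.asarray(done, dtype=bool)
    elapsed_step = np.asarray(elapsed_step)
    return done & (elapsed_step >= max_episode_steps)


max_steps = int(108000 / 4)  # same limit as in the reproduction
print(infer_truncated([False], [26999], max_steps))
print(infer_truncated([True], [27000], max_steps))
```

In the reproduction loop this would be called as infer_truncated(d, infos["elapsed_step"], max_steps). Because it also requires the step limit to be reached, a done=True caused by something else before the limit (for example a lost life with episodic_life=True) is not counted as a truncation.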

System info

import envpool, numpy, sys
print(envpool.__version__, numpy.__version__, sys.version, sys.platform)
0.8.1 1.21.6 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] linux

Additional context

Might be related to #179

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

Labels: bug
