Description
Without the primer, the collector does not feed any hidden state to the policy
In the RNN tutorial it is stated that the primer is optional and that it is only used to store the hidden states in the buffer.
This is not true in practice: without the primer, the collector does not feed the hidden states to the policy during execution, which silently causes the RNN to lose all recurrence.
To reproduce, comment out this line
rl/tutorials/sphinx-tutorials/dqn_with_rnn.py, line 269 (commit 0063741):
    env.append_transform(lstm.make_tensordict_primer())
and print the policy input at this line
rl/torchrl/collectors/collectors.py, line 733 (commit 0063741):
    policy_output = self.policy(policy_input)
You will see that no hidden state is fed to the RNN during execution, and no errors or warnings are thrown.
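A minimal sketch of how the missing hidden state can be observed (the CartPole env, the hidden size, and the default LSTMModule recurrent-state key names here are assumptions on my part; the actual tutorial pipeline differs):

```python
# Sketch (not the tutorial code): check whether reset() creates the
# recurrent-state entries that the collector would carry between steps.
from torchrl.envs import GymEnv, InitTracker, TransformedEnv
from torchrl.modules import LSTMModule

env = TransformedEnv(GymEnv("CartPole-v1"), InitTracker())
lstm = LSTMModule(
    input_size=env.observation_spec["observation"].shape[-1],
    hidden_size=64,
    in_key="observation",
    out_key="embed",
)

td = env.reset()
# Without the primer: no recurrent-state keys, so the policy never receives them.
print([k for k in td.keys() if "recurrent" in str(k)])  # []

env.append_transform(lstm.make_tensordict_primer())
td = env.reset()
# With the primer: the hidden-state entries exist and are fed to the policy.
print([k for k in td.keys() if "recurrent" in str(k)])  # e.g. ['recurrent_state_h', 'recurrent_state_c']
```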
The primer overwrites any nested spec
Consider an env with nested specs
env = VmasEnv(
    scenario="balance",
    num_envs=5,
)
Add to it a primer for a nested hidden state:
env = TransformedEnv(
    env,
    TensorDictPrimer(
        {
            "agents": CompositeSpec(
                {
                    "h": UnboundedContinuousTensorSpec(
                        shape=(*env.shape, env.n_agents, 2, 128)
                    )
                },
                shape=(*env.shape, env.n_agents),
            )
        }
    ),
)
The primer code at
rl/torchrl/envs/transforms/transforms.py, line 4649 (commit 0063741):
    observation_spec[key] = self.primers[key] = spec.to(device)
overwrites the whole existing "agents" entry of the observation spec instead of updating it, so any keys already nested under "agents" are lost.
The same result is obtained with
env = TransformedEnv(
    env,
    TensorDictPrimer(
        {
            ("agents", "h"): UnboundedContinuousTensorSpec(
                shape=(*env.shape, env.n_agents, 2, 128)
            )
        }
    ),
)
Updating the spec instead of overwriting it, at the line referenced above, should do the job.
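To make "updating instead of overwriting" concrete, here is a small self-contained sketch (the shapes and the "observation" key are made up; it only illustrates the difference between assigning to the nested entry and calling CompositeSpec.update on it):

```python
# Stand-ins for the env's observation spec and the primer spec (shapes are arbitrary).
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec

observation_spec = CompositeSpec(
    agents=CompositeSpec(
        observation=UnboundedContinuousTensorSpec(shape=(5, 4, 16)),
        shape=(5, 4),
    ),
    shape=(5,),
)
primer_spec = CompositeSpec(
    h=UnboundedContinuousTensorSpec(shape=(5, 4, 2, 128)),
    shape=(5, 4),
)

# Current behaviour (the referenced line): the whole "agents" entry is replaced,
# so ("agents", "observation") disappears from the spec.
overwritten = observation_spec.clone()
overwritten["agents"] = primer_spec
print(list(overwritten["agents"].keys()))  # ['h']

# Suggested behaviour: update the existing nested spec so both entries survive.
updated = observation_spec.clone()
updated["agents"].update(primer_spec)
print(list(updated["agents"].keys()))  # ['observation', 'h']
```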
The order of the primer in the transforms seems to have an impact
In the same VMAS environment as above, if I put the primer and then the reward sum:
env = TransformedEnv(
    env,
    Compose(
        TensorDictPrimer(
            {
                "agents": CompositeSpec(
                    {
                        "h": UnboundedContinuousTensorSpec(
                            shape=(*env.shape, env.n_agents, 2, 128)
                        )
                    },
                    shape=(*env.shape, env.n_agents),
                )
            }
        ),
        RewardSum(
            in_keys=[env.reward_key],
            out_keys=[("agents", "episode_reward")],
        ),
    ),
)
everything works well.
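With this order, a plain reset shows both the primed state and the episode-reward entry (a quick hedged check; the exact set of keys printed depends on the VMAS version and scenario):

```python
td = env.reset()
# Expect "h" (from the primer) and "episode_reward" (from RewardSum)
# alongside the scenario's own observation keys.
print(list(td["agents"].keys()))
```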
But the opposite order
env = TransformedEnv(
    env,
    Compose(
        RewardSum(
            in_keys=[env.reward_key],
            out_keys=[("agents", "episode_reward")],
        ),
        TensorDictPrimer(
            {
                "agents": CompositeSpec(
                    {
                        "h": UnboundedContinuousTensorSpec(
                            shape=(*env.shape, env.n_agents, 2, 128)
                        )
                    },
                    shape=(*env.shape, env.n_agents),
                )
            }
        ),
    ),
)
causes:
Traceback (most recent call last):
  File "/Users/Matteo/PycharmProjects/torchrl/sota-implementations/multiagent/mappo_ippo.py", line 302, in train
    collector = SyncDataCollector(
                ^^^^^^^^^^^^^^^^^^
  File "/Users/Matteo/PycharmProjects/torchrl/torchrl/collectors/collectors.py", line 644, in __init__
    self._make_shuttle()
  File "/Users/Matteo/PycharmProjects/torchrl/torchrl/collectors/collectors.py", line 661, in _make_shuttle
    self._shuttle = self.env.reset()
                    ^^^^^^^^^^^^^^^^
  File "/Users/Matteo/PycharmProjects/torchrl/torchrl/envs/common.py", line 2143, in reset
    tensordict_reset = self._reset(tensordict, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Matteo/PycharmProjects/torchrl/torchrl/envs/transforms/transforms.py", line 814, in _reset
    tensordict_reset = self.transform._reset(tensordict, tensordict_reset)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Matteo/PycharmProjects/torchrl/torchrl/envs/transforms/transforms.py", line 1129, in _reset
    tensordict_reset = t._reset(tensordict, tensordict_reset)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Matteo/PycharmProjects/torchrl/torchrl/envs/transforms/transforms.py", line 4722, in _reset
    value = self.default_value[key]
            ~~~~~~~~~~~~~~~~~~^^^^^
KeyError: ('agents', 'episode_reward')