refactor(project): using single env class #36
## Walkthrough
This update refactors the configuration management for all RL algorithms (A2C, DQN, PPO, REINFORCE, SARSA) by removing nested `EnvConfig` and `TrainConfig` dataclasses and replacing them with single, flattened config classes per algorithm. All code and tests are updated to use direct attribute access, simplifying instantiation and usage. Some test coverage is reduced or streamlined.
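To make the shape of the change concrete, here is an illustrative before/after sketch; the class names come from the summary above, but the field names and defaults are assumptions, not the repo's exact code:

```python
from dataclasses import dataclass

# Before (illustrative reconstruction): nested dataclasses, accessed as
# config.env_config.env_name, config.train_config.gamma, etc.
@dataclass
class EnvConfig:
    env_name: str = "CartPole-v1"

@dataclass
class TrainConfig:
    gamma: float = 0.99

# After: one flat dataclass per algorithm, accessed as config.env_name.
@dataclass
class ReinforceConfig:
    env_name: str = "CartPole-v1"
    gamma: float = 0.99
```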
## Changes
| File(s) | Change Summary |
|----------------------------------------------|---------------|
| `toyrl/a2c.py`, `toyrl/dqn.py`, `toyrl/ppo.py`, `toyrl/reinforce.py`, `toyrl/sarsa.py` | Refactored configuration structure: removed nested `EnvConfig` and `TrainConfig` dataclasses, introduced flat config dataclasses per algorithm (`A2CConfig`, `DqnConfig`, `PPOConfig`, `ReinforceConfig`, `SarsaConfig`). Updated all code to use direct attribute access. |
| `tests/test_a2c.py`, `tests/test_dqn.py`, `tests/test_ppo.py`, `tests/test_reinforce.py`, `tests/test_sarsa.py` | Updated all test suites to use flattened config classes and direct attribute access. Removed or simplified tests related to nested configs, agent internals, and replay buffer. Some tests removed or merged for brevity and clarity. |
| `tests/test_a2c.py` | Removed tests: `test_agent_net_update`, `test_trainer_evaluation`. Renamed variables for clarity, simplified assertions and test logic. |
| `tests/test_ppo.py` | Removed tests: `test_replay_buffer`, `test_agent_creation`, `test_agent_act`, `test_agent_net_update`. Updated config usage. |
| `tests/test_sarsa.py` | Merged and renamed agent and trainer test functions, improved replay buffer test coverage, and switched optimizer in tests. |
| `tests/test_dqn.py`, `tests/test_reinforce.py` | Removed imports of now-defunct nested config classes, updated all config usage to flat structure. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant FlatConfig
participant Trainer
participant Env
User->>FlatConfig: Instantiate with parameters
User->>Trainer: Pass FlatConfig to Trainer
Trainer->>Env: Create environment using FlatConfig.env_name, render_mode, etc.
Trainer->>Trainer: Access training params (gamma, learning_rate, etc.) directly from FlatConfig
Trainer->>Trainer: Run training loop using flat config fields
```
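In code, the flow above looks roughly like this (a sketch using the names from the diagram and tables; the constructor signature and `train()` entry point are assumptions):

```python
# Hypothetical usage mirroring the sequence diagram above.
from toyrl.reinforce import ReinforceConfig, ReinforceTrainer

config = ReinforceConfig(env_name="CartPole-v1", gamma=0.99)  # instantiate flat config
trainer = ReinforceTrainer(config)  # pass it straight to the trainer
trainer.train()  # the trainer reads config.env_name, config.gamma, ... directly
```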
## Pull Request Overview
This PR refactors the project configuration by consolidating the environment and training configuration classes into a single configuration class for each algorithm. Key changes include replacing nested EnvConfig/TrainConfig structures with flat configurations and updating all corresponding references in the SARSA, REINFORCE, PPO, DQN, and A2C implementations.
## Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| toyrl/sarsa.py | Replaced nested config classes with a consolidated SarsaConfig and updated attribute references, including wandb logging. |
| toyrl/reinforce.py | Replaced nested config classes with a consolidated ReinforceConfig and updated attribute references. |
| toyrl/ppo.py | Replaced nested config classes with a consolidated PPOConfig and updated attribute references in trainer methods. |
| toyrl/dqn.py | Replaced nested config classes with a consolidated DqnConfig and updated references in policy update and training loops. |
| toyrl/a2c.py | Replaced nested config classes with a consolidated A2CConfig and updated all corresponding trainer attribute references. |
| tests/* | Updated test files to validate the new flat configuration structure. |
## Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   98.67%   99.73%   +1.06%
==========================================
  Files           6        6
  Lines         376      376
==========================================
+ Hits          371      375       +4
+ Misses          5        1       -4
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🔭 Outside diff range comments (2)
tests/test_a2c.py (1)
1-148: 💡 Verification agent 🧩 Analysis chain
Verify test coverage after removing test functions
The test file has been simplified by removing `test_agent_net_update` and `test_trainer_evaluation`. The `test_replay_buffer` test has also been simplified by removing the second-experience checks and sampling.

Please verify that the critical functionality is still adequately tested somewhere else in the codebase.
🏁 Script executed:

```bash
#!/bin/bash
# Check if there are any remaining tests for agent network updates
# and trainer evaluation in other test files
echo "Searching for agent net update tests in other files..."
rg -l "test_agent_net_update|test_net_update" tests/
echo "Searching for trainer evaluation tests in other files..."
rg -l "test_trainer_evaluation|test_evaluation" tests/
```

Length of output: 353
I’ll search for any references to “update” and “evaluate” in the remaining tests to ensure coverage wasn’t moved or renamed elsewhere.
🏁 Script executed:

```bash
#!/bin/bash
echo "Searching for 'update' in test files..."
rg -n "update" tests/ || true
echo "Searching for 'evaluate' in test files..."
rg -n "evaluate" tests/ || true
```

Length of output: 1285
I’ll check whether `A2CTrainer` actually implements an `evaluate` method to confirm whether its removal from the tests is safe.
🏁 Script executed:

```bash
#!/bin/bash
# Locate the file defining A2CTrainer
trainer_file=$(rg -l "class A2CTrainer" -g "*.py")
echo "A2CTrainer defined in: $trainer_file"
# Inspect that file for an evaluate method
echo "Searching for evaluate method in $trainer_file"
rg -n "def evaluate" "$trainer_file" || true
```

Length of output: 339
Add tests for A2CTrainer.evaluate and Agent update logic
It looks like we’ve removed both `test_trainer_evaluation` and any tests covering the agent’s learning/update step for A2C, and no equivalent coverage exists elsewhere. To prevent regressions, please add:

- A test for `A2CTrainer.evaluate(num_episodes)` in toyrl/a2c.py (method at line 232).
- A test exercising the A2C agent’s update call (e.g. invoking whatever internal policy/value update the Agent performs during training).

This will ensure that both evaluation and the core learning step remain verified; a sketch of what such a test might look like follows below.
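As a rough illustration of the requested evaluation test (a sketch only: the `A2CConfig` field name, the trainer constructor, and the `evaluate` return type are assumptions, not the repo's confirmed API):

```python
# Hypothetical test sketch; names flagged below are assumptions.
from toyrl.a2c import A2CConfig, A2CTrainer  # assumed import path

def test_trainer_evaluation():
    config = A2CConfig(env_name="CartPole-v1")  # assumed field name
    trainer = A2CTrainer(config)
    rewards = trainer.evaluate(num_episodes=2)  # signature taken from the review note
    assert len(rewards) == 2  # assumes one total reward per evaluated episode
```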
tests/test_ppo.py (1)
1-139: 💡 Verification agent 🧩 Analysis chain
Verify implications of removed tests
According to the AI summary, multiple tests related to `ReplayBuffer`, `Experience`, and the PPO agent's methods have been removed. This is a significant reduction in test coverage.

Please verify that these components are still tested elsewhere or confirm that the removed tests are redundant.
🏁 Script executed:

```bash
#!/bin/bash
# Check if ReplayBuffer and Experience are tested in other files
echo "Searching for ReplayBuffer tests in other files..."
rg -l "test.*ReplayBuffer|ReplayBuffer.*test" tests/
echo "Searching for Experience tests in other files..."
rg -l "test.*Experience|Experience.*test" tests/
echo "Searching for PPO agent method tests in other files..."
rg -l "test.*PPOAgent|PPOAgent.*test" tests/
```

Length of output: 489
Missing tests for core PPO components
It looks like coverage for `ReplayBuffer`, `Experience`, and `PPOAgent` was removed and no equivalent tests exist elsewhere in `tests/`. To maintain robustness:

- Add tests for `ReplayBuffer` (e.g., pushing and sampling behavior, buffer overflow, empty-buffer edge cases).
- Cover the `Experience` data structure (correct fields, dtype checks, any helper methods).
- Exercise `PPOAgent` methods (action selection, log-prob calculations, policy/value updates).

If these classes have been intentionally deprecated, please remove their code or update the documentation to reflect the change. Otherwise, restore or rewrite tests to ensure these critical pieces remain covered; a sketch of a buffer test follows this list.
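Purely for illustration, a minimal buffer test along the lines suggested above (the constructor, `add` method, `Experience` fields, and `__len__` support are hypothetical names, not the repo's confirmed API):

```python
# Hypothetical sketch; ReplayBuffer/Experience signatures are assumptions.
from toyrl.ppo import Experience, ReplayBuffer  # assumed import path

def test_replay_buffer_push():
    buffer = ReplayBuffer()
    assert len(buffer) == 0  # empty-buffer edge case
    exp = Experience(observation=[0.0] * 4, action=0, reward=1.0, terminated=False)
    buffer.add(exp)  # hypothetical method name
    assert len(buffer) == 1
```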
🧹 Nitpick comments (3)
tests/test_a2c.py (3)
19-20: Variable naming inconsistency

The variables are named `in_dim` and `out_dim`, but the constructor parameters are `env_dim` and `action_num`. This mismatch could cause confusion for readers. Consider using consistent naming:

```diff
-in_dim, out_dim = 4, 2
-net = ActorCriticNet(env_dim=in_dim, action_num=out_dim)
+env_dim, action_num = 4, 2
+net = ActorCriticNet(env_dim=env_dim, action_num=action_num)
```

Or alternatively:

```diff
-in_dim, out_dim = 4, 2
-net = ActorCriticNet(env_dim=in_dim, action_num=out_dim)
+in_dim, out_dim = 4, 2
+net = ActorCriticNet(env_dim=in_dim, action_num=out_dim)
+# Variable names reflect the role (input/output dimensions) while
+# parameter names reflect the domain semantics (environment/action)
```
61-62: Variable naming inconsistency

Same naming inconsistency here as in the `test_actor_critic_net` function. Consider using consistent naming as suggested earlier.
75-76: Variable naming inconsistency

Same naming inconsistency here as in the previous functions.
Consider using consistent naming as suggested earlier.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- `tests/test_a2c.py` (5 hunks)
- `tests/test_dqn.py` (2 hunks)
- `tests/test_ppo.py` (2 hunks)
- `tests/test_reinforce.py` (2 hunks)
- `tests/test_sarsa.py` (3 hunks)
- `toyrl/a2c.py` (6 hunks)
- `toyrl/dqn.py` (5 hunks)
- `toyrl/ppo.py` (5 hunks)
- `toyrl/reinforce.py` (5 hunks)
- `toyrl/sarsa.py` (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
toyrl/a2c.py (2)
toyrl/dqn.py (1)
- `Agent` (117-180)

toyrl/reinforce.py (1)

- `Agent` (81-129)
tests/test_sarsa.py (1)
toyrl/sarsa.py (6)
- `PolicyNet` (33-51)
- `sample` (78-97)
- `Agent` (100-148)
- `act` (113-121)
- `SarsaConfig` (13-30)
- `SarsaTrainer` (151-219)
🪛 GitHub Check: codecov/patch
toyrl/dqn.py
[warning] 279-279: toyrl/dqn.py#L279
Added line #L279 was not covered by tests
🔇 Additional comments (47)
tests/test_dqn.py (4)
190-192: LGTM - Direct attribute access in tests

The updated tests now correctly access configuration parameters directly from the flattened `DqnConfig` structure instead of through nested objects, aligning with the refactoring changes.
195-209: LGTM - Custom configuration tests

The test for custom configuration correctly instantiates the flattened `DqnConfig` with direct attributes and verifies them appropriately. The test logic remains sound while adapting to the new structure.
215-222: LGTM - Trainer creation test

The test for trainer creation has been updated to use the flattened configuration structure, maintaining test coverage while adapting to the code changes.
237-246: LGTM - Minimal training test

The minimal training test correctly uses the flattened configuration structure with appropriate parameters, preserving the test's functionality.
tests/test_reinforce.py (4)
94-96: LGTM - Default configuration test

The test correctly verifies default values using direct attribute access on the flattened configuration object.
98-108: LGTM - Custom configuration test

The custom configuration test has been successfully updated to use the flattened `ReinforceConfig` structure with direct parameter access.
113-119: LGTM - Trainer creation test

The trainer creation test properly uses the flattened configuration structure, maintaining test functionality.
132-138: LGTM - Minimal training test

The minimal training test has been updated to use the flattened configuration structure correctly.
toyrl/sarsa.py (5)
12-31: LGTM - Well-structured flattened configuration

The new flattened `SarsaConfig` class is well-organized with clear grouping of parameters by functionality (environment, training, logging) and includes helpful comments. This simplifies configuration management and improves code readability.
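For illustration, the grouping pattern praised here looks roughly like this (field names and defaults are assumptions pieced together from the surrounding review comments, not the repo's exact code):

```python
from dataclasses import dataclass

@dataclass
class SarsaConfig:  # hypothetical reconstruction of the flat config
    # Environment
    env_name: str = "CartPole-v1"
    render_mode: str | None = None
    # Training
    learning_rate: float = 0.01
    gamma: float = 0.99
    max_training_steps: int = 100_000
    solved_threshold: float = 475.0
    # Logging
    use_wandb: bool = False
```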
160-165: LGTM - Direct config access in agent initialization

The code now correctly accesses learning rate, gamma, and solved threshold directly from the flattened config object.
175-176: LGTM - Environment creation with direct config access

Environment creation now properly uses direct attributes from the flattened config.
185-186: LGTM - Training loop with direct config access

The training loop condition now correctly uses the max training steps directly from the config.
223-230: LGTM - Default config instantiation

The code now correctly instantiates the default config with direct parameters instead of nested config objects.
toyrl/dqn.py (3)
13-53: LGTM - Comprehensive flattened configuration

The new flattened `DqnConfig` class is well-structured with clear grouping of parameters by functionality (environment, training, target network, logging) and includes detailed docstrings for each parameter. This improves code readability and maintenance.
193-204: LGTM - Agent initialization with direct config access

The agent initialization now correctly uses parameters directly from the flattened config object.
292-302: LGTM - Default config instantiation

The default config instantiation now correctly uses direct attributes instead of nested config objects, consistent with the refactoring.
tests/test_sarsa.py (7)
18-23: Good rename to improve readability.

Changing the variable names from numeric values to `in_dim` and `out_dim` makes the test more descriptive and easier to understand.
44-47: Improved test coverage for buffer sampling.

Testing the edge case where there is only one experience in the buffer and no next state-action pair is available is good practice.
48-64: Well-structured test for replay buffer sampling.

The test now properly verifies that next state-action pairs are linked correctly between experiences, which is crucial for the SARSA algorithm's functioning.
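For context (an illustrative aside, not repo code): SARSA is on-policy, so its TD target bootstraps from the next action actually taken, which is exactly why the buffer must link consecutive state-action pairs:

```python
def sarsa_target(reward: float, gamma: float, q_next: float, terminated: bool) -> float:
    """TD target r + gamma * Q(s', a'), with no bootstrap at terminal states."""
    return reward + (0.0 if terminated else gamma * q_next)

# Example: reward 1.0, gamma 0.99, Q(s', a') = 10.0 -> target 10.9
assert sarsa_target(1.0, 0.99, 10.0, terminated=False) == 1.0 + 0.99 * 10.0
```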
70-87: Good consolidation of agent tests.

Merging the agent creation and action selection tests makes the test suite more concise. The switch from RMSprop to Adam optimizer is also appropriate, as Adam often performs better in practice for reinforcement learning.
93-99: Test correctly updated for flattened config structure.

The test now properly checks all direct attributes of the SarsaConfig class instead of nested config objects, aligning with the refactoring goal.
102-116: Trainer test improved with better function name and assertions.

Renaming to `test_trainer()` is more concise, and the assertions directly check config equality and key parameters, which is clearer than the previous implementation.
122-128: Minimal training test properly updated.

The test configuration has been updated to use the flattened config structure correctly.
toyrl/a2c.py (6)
12-34: Well-structured flattened configuration class.

The new `A2CConfig` dataclass properly consolidates environment and training parameters with clear section comments, making the configuration more maintainable and easier to use.
156-173: A2CTrainer correctly updated to use flattened config.

The trainer initialization now properly accesses attributes directly from the flattened config, reducing unnecessary nesting and improving code clarity.
174-180: WandB logging updated appropriately.

The wandb initialization has been properly updated to work with the flattened config structure.
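One convenience of a flat config worth noting here (a sketch only; the `A2CConfig` stand-in and project name are hypothetical): the whole dataclass serializes into wandb's config dict in a single call.

```python
from dataclasses import asdict, dataclass

import wandb

@dataclass
class A2CConfig:  # hypothetical minimal stand-in for the flat config
    env_name: str = "CartPole-v1"
    learning_rate: float = 3e-4

config = A2CConfig()
# A flat dataclass maps directly onto wandb's config dict; nested configs
# would need flattening or custom serialization first.
wandb.init(project="toyrl", name=config.env_name, config=asdict(config))
```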
199-200: Rendering check correctly updated.

The conditional rendering check now properly uses the direct config attribute.
214-230: Evaluation logic correctly updated.

The evaluation interval and episodes parameters are now properly accessed from the flattened config.
245-253: Main block default config updated correctly.

The default configuration now uses the flattened structure and maintains the key parameters from the previous implementation.
toyrl/reinforce.py (5)
12-27: Well-structured REINFORCE configuration class.

The new `ReinforceConfig` dataclass properly organizes both environment and training parameters with clear section comments, making the configuration more maintainable.
135-148: ReinforceTrainer correctly updated to use flattened config.

The trainer initialization now properly accesses all configuration parameters directly from the flattened config.
149-155: WandB logging updated correctly.

The wandb initialization now properly uses the flattened config structure for naming and tracking.
187-194: WandB logging condition updated correctly.

The wandb logging condition now properly checks the flattened config attribute.
198-206: Main block default config updated correctly.

The default configuration now properly uses the flattened structure with appropriate parameters.
toyrl/ppo.py (7)
12-38: Well-structured PPO configuration class.

The new `PPOConfig` dataclass properly consolidates all environment and training parameters with helpful docstrings for key parameters, making the configuration more maintainable and self-documenting.
209-227: PPOTrainer initialization updated correctly.

The trainer initialization now properly uses the flattened config for optimizer creation and wandb setup.
228-235: Environment creation updated correctly.

The environment creation now properly uses direct attributes from the flattened config.
238-240: Training parameters correctly accessed.

Batch size and iteration calculations now properly use direct attributes from the flattened config.
244-247: Learning rate annealing updated correctly.

The learning rate annealing logic now properly checks and uses the flattened config attributes.
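For reference, a typical linear-annealing pattern driven by flat config fields (illustrative only; `anneal_lr`, `learning_rate`, and `num_iterations` are assumed names, not necessarily the repo's):

```python
import torch

# Hypothetical stand-ins for flat config fields and training state.
anneal_lr, learning_rate, num_iterations = True, 3e-4, 100
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=learning_rate)

for iteration in range(1, num_iterations + 1):
    if anneal_lr:
        # Linearly decay the LR from its initial value toward zero.
        frac = 1.0 - (iteration - 1) / num_iterations
        optimizer.param_groups[0]["lr"] = frac * learning_rate
```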
285-296: Policy update logic correctly updated.

The policy update loop now properly uses all parameters directly from the flattened config.
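As an aside, the core objective such a PPO update loop computes is the standard clipped surrogate (illustrative only; `clip_eps` is an assumed config field name):

```python
import torch

def ppo_clip_loss(ratio: torch.Tensor, advantage: torch.Tensor, clip_eps: float) -> torch.Tensor:
    """Standard PPO clipped surrogate loss, negated for gradient descent."""
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

# Example: a ratio of 1.5 is clipped to 1.2 before the element-wise min.
loss = ppo_clip_loss(torch.tensor([1.5]), torch.tensor([2.0]), clip_eps=0.2)
```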
309-323: Main block default config updated correctly.

The default configuration now uses the flattened structure with appropriate parameters for the PPO algorithm.
tests/test_a2c.py (3)
86-87: Good improvement on assertion

The assertion using `action in [0, 1]` is clearer and more direct than a range check.
91-109: Nicely simplified config testing

The flattened config structure makes the config creation and testing more straightforward and easier to understand.
113-139: Clear and simplified trainer testing

The refactored trainer tests using the flattened config structure are more readable and maintainable.
tests/test_ppo.py (3)
57-96: Well-structured config tests

The flattened config structure makes the tests more direct and easier to understand. All the attributes are now accessed directly from the config object instead of through nested objects.
98-114: Simplified trainer creation test

The trainer creation test is now cleaner with the flattened config structure.
116-139: Clear minimal training test

The minimal training test is now easier to understand with the flattened config structure. Good use of comments to explain the purpose of specific parameters.