paulo101977 (Contributor) commented Sep 23, 2025

Description

closes #80

Performance report: https://wandb.ai/openrlbenchmark/sbx/reports/PPO-CNN-Performance-report--VmlldzoxNDU2OTk2OA

Motivation and Context

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

paulo101977 (Contributor, Author) left a comment

Oops, I forgot to send a message here.

araffin requested a review from Copilot on September 26, 2025 at 10:27

This comment was marked as resolved.

araffin (Owner) commented Sep 26, 2025

Hello,
thanks for the PR, but the code you provided was actually not used.
I refactored, fixed the code and updated the tests.

araffin (Owner) commented Sep 26, 2025

A short performance report: I got good results on Breakout but could not get it to work on Pong yet.
After checking against SB3, the difference comes from sharing vs. not sharing the CNN between the actor and the critic.

Runs can be found on W&B: https://wandb.ai/openrlbenchmark/sbx

paulo101977 (Contributor, Author) commented Sep 26, 2025

> Hello, thanks for the PR, but the code you provided was actually not used. I refactored, fixed the code and updated the tests.

Don't worry!

I appreciate you looking at my code.

araffin (Owner) commented Sep 29, 2025

After sharing the CNN features extractor between the actor and the critic, the learning curves match the ones from SB3 =)
https://wandb.ai/openrlbenchmark/sbx/reports/PPO-CNN-Performance-report--VmlldzoxNDU2OTk2OA

I'll do a few more runs and then merge.
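
For readers following the shared vs. non-shared distinction discussed above, here is a minimal sketch of the two layouts. It is not the code merged in this PR: it uses plain NumPy, a single ReLU layer stands in for the CNN, and every name and size in it (`extract_features`, `OBS_SHAPE`, `N_FEATURES`, and so on) is a hypothetical chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return {"w": rng.normal(0.0, 0.1, size=(n_in, n_out)), "b": np.zeros(n_out)}


def linear(params, x):
    return x @ params["w"] + params["b"]


def extract_features(params, obs):
    """Stand-in for the CNN (assumption): flatten the image, then one ReLU layer."""
    flat = obs.reshape(obs.shape[0], -1)
    return np.maximum(linear(params, flat), 0.0)


def count_params(tree):
    """Count scalars in a (possibly nested) dict of arrays."""
    if isinstance(tree, dict):
        return sum(count_params(v) for v in tree.values())
    return int(tree.size)


# Hypothetical sizes, far smaller than real Atari inputs.
OBS_SHAPE = (4, 8, 8)  # channels, height, width
N_FEATURES = 16
N_ACTIONS = 3
n_in = int(np.prod(OBS_SHAPE))

# Shared layout: one extractor feeds both the actor head and the critic head.
shared = {
    "extractor": init_linear(n_in, N_FEATURES),
    "actor_head": init_linear(N_FEATURES, N_ACTIONS),
    "critic_head": init_linear(N_FEATURES, 1),
}

# Non-shared layout: the actor and the critic each own an extractor.
separate = {
    "actor_extractor": init_linear(n_in, N_FEATURES),
    "critic_extractor": init_linear(n_in, N_FEATURES),
    "actor_head": init_linear(N_FEATURES, N_ACTIONS),
    "critic_head": init_linear(N_FEATURES, 1),
}


def forward_shared(params, obs):
    # Features are computed once and reused by both heads.
    features = extract_features(params["extractor"], obs)
    return linear(params["actor_head"], features), linear(params["critic_head"], features)


def forward_separate(params, obs):
    # Each head sees features from its own extractor.
    actor_features = extract_features(params["actor_extractor"], obs)
    critic_features = extract_features(params["critic_extractor"], obs)
    return (
        linear(params["actor_head"], actor_features),
        linear(params["critic_head"], critic_features),
    )


obs = rng.normal(size=(2,) + OBS_SHAPE)  # batch of two fake observations
logits, values = forward_shared(shared, obs)
print("shared parameters:  ", count_params(shared))
print("separate parameters:", count_params(separate))
```

In the shared layout the extractor's parameters would receive gradients from both the policy loss and the value loss during training, while in the non-shared layout each extractor is trained by one loss only. The sketch shows the structure only and says nothing about why one layout learned Pong and the other did not.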

araffin merged commit d792a63 into araffin:master on Sep 29, 2025
Development

Successfully merging this pull request may close these issues.

[Feature Request] CnnPolicy for PPO

2 participants