
Correction of benchmark results #87

@kvas7andy

Description

Hi everyone,

I found several bugs while checking the code of the ipynb notebooks with benchmark results for the three environments: TinyToy, ToyCTF, and Chain.

I think these findings might be useful for the community that uses this nice cyberattack-simulation implementation.

MOVED TO SEPARATE ISSUE #115

  1. Issue 1: `learner.epsilon_greedy_search(...)` is used for training agents with different algorithms, including DQL in `dql_run`. However, `dql_exploit_run`, which takes the network trained in `dql_run` as the policy agent and an `eval_episode_count` parameter for the number of episodes, gives the impression that these runs evaluate the trained DQN. The only distinguishable difference between the two runs is epsilon equal to 0, which puts the agent in exploitation mode but does not disable training: during a run with `learner.epsilon_greedy_search`, `optimizer.step()` is executed on every step via the `learner.on_step(...)` call in `agent_dql.py` (see the first code sketch after this list).
  2. Issue 2: During training, each episode ends only when the maximum number of iterations is reached, due to a mistake in the `AttackerGoal` class. The default value of the parameter `own_atleast_percent: float = 1.0` is included as an AND condition for raising the `done = True` flag, which for TinyToy and ToyCTF (but not Chain) leads to long training runs, a wrong RL signal for estimating the Q function, and low sample efficiency (see the second code sketch after this list).

MOVED TO SEPARATE ISSUE #115
3. Issue 3: The ToyCTF benchmark is inaccurate: with a correct evaluation procedure, like the one used for the chain network configuration, the agent does not reach the goal of 6 owned nodes after 200 training episodes.
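
Regarding Issue 1, here is a minimal sketch of what a pure evaluation run could look like, assuming a Gym-style environment (old `reset`/`step` API) and a policy object exposing an `exploit(...)`-like method; `evaluate_policy`, the `exploit` signature, and the unpacked return values are illustrative assumptions, not the repository's actual API. The point is that evaluation should pick actions greedily without ever calling `learner.on_step(...)` and hence `optimizer.step()`:

```python
# Hedged sketch: evaluate a trained policy without any gradient updates.
# The names `policy.exploit` and its return values are illustrative stand-ins,
# not the exact baseline-agent API.
from typing import Any, List


def evaluate_policy(env: Any, policy: Any, eval_episode_count: int,
                    max_steps: int) -> List[float]:
    """Run pure-evaluation episodes: act greedily, never touch the optimizer."""
    episode_rewards: List[float] = []
    for _ in range(eval_episode_count):
        observation = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            # Greedy action selection only -- no epsilon, no learner.on_step(),
            # hence no optimizer.step() and no change to the Q-network weights.
            _, action, _ = policy.exploit(env, observation)  # hypothetical signature
            if action is None:
                break  # policy suggested no valid action; end the episode
            observation, reward, done, _ = env.step(action)
            total_reward += float(reward)
            if done:
                break
        episode_rewards.append(total_reward)
    return episode_rewards
```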
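
Regarding Issue 2, the snippet below paraphrases the termination logic in a self-contained form; `AttackerGoalSketch`, `goal_reached_buggy`, and `goal_reached_intended` are illustrative names, not the library's actual code. It shows why the AND-combination with the default `own_atleast_percent = 1.0` prevents `done` from being raised before every node is owned, and one possible way the condition could be relaxed so that a goal such as "own at least 6 nodes" terminates the episode:

```python
# Hedged sketch of the termination condition described in Issue 2 (paraphrased).
from dataclasses import dataclass


@dataclass
class AttackerGoalSketch:
    own_atleast: int = 0               # absolute number of nodes to own
    own_atleast_percent: float = 1.0   # fraction of all nodes to own


def goal_reached_buggy(goal: AttackerGoalSketch, owned: int, total: int) -> bool:
    # AND-combination: with the 1.0 default, this requires owning *all* nodes,
    # so `done` is never raised before the iteration cap in TinyToy/ToyCTF.
    return owned >= goal.own_atleast and (owned / total) >= goal.own_atleast_percent


def goal_reached_intended(goal: AttackerGoalSketch, owned: int, total: int) -> bool:
    # One possible relaxation: treat either condition as sufficient.
    return owned >= goal.own_atleast or (owned / total) >= goal.own_atleast_percent


# Example: goal of 6 owned nodes in a 10-node network.
goal = AttackerGoalSketch(own_atleast=6)
assert not goal_reached_buggy(goal, owned=6, total=10)   # never done before 100% owned
assert goal_reached_intended(goal, owned=6, total=10)    # terminates as expected
```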
