Batch records cannot be generated from resumed checkpoints when removes_subdirs=True #5

Description

@airallergy

_record_survival needs to iterate over the job records of every generation, but when removes_subdirs=True, the job records from the generations before the checkpoint have already been removed by _record_survival itself:

sober/sober/_evolver.py

Lines 104 to 105 in ab5201d

if self._output_manager._removes_subdirs:
    shutil.rmtree(batch_dir)

This is an issue when the checkpoint comes from a completed run, in which _record_survival has already been executed and batch records have already been generated. This typically happens on HPC systems that limit the resources available per request.
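The failure mode can be reproduced in isolation with a minimal sketch. Everything here is hypothetical and only mimics the behaviour described above: the directory layout (`B0`, `B1`, ...), the file names, and the helpers `write_jobs` and `record_survival` are illustrative assumptions, not sober's actual code.

```python
import shutil
import tempfile
from pathlib import Path


def write_jobs(root: Path, generation: int) -> None:
    # Hypothetical layout: one subdirectory of job records per generation.
    batch_dir = root / f"B{generation}"
    batch_dir.mkdir()
    (batch_dir / "job_records.csv").write_text(f"job,generation\nJ0,{generation}\n")


def record_survival(root: Path, n_generations: int, removes_subdirs: bool) -> None:
    # Stand-in for the described behaviour: iterate over ALL generations,
    # write a batch record for each, then optionally delete the subdirectory.
    for i in range(n_generations):
        batch_dir = root / f"B{i}"
        records = (batch_dir / "job_records.csv").read_text()
        (root / f"B{i}_batch_records.csv").write_text(records)
        if removes_subdirs:
            shutil.rmtree(batch_dir)


def reproduce() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # First (completed) run: two generations, subdirectories removed.
        write_jobs(root, 0)
        write_jobs(root, 1)
        record_survival(root, 2, removes_subdirs=True)
        # Resumed run: one more generation, then all generations are iterated again.
        write_jobs(root, 2)
        try:
            record_survival(root, 3, removes_subdirs=True)
        except FileNotFoundError:
            return "failed"
        return "ok"
```

In this sketch the second call fails on generation 0, because its job records were deleted at the end of the first run.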

Two possible remedies:

  • Comment out these two lines to leave job records untouched.
  • Detect from which generation the optimisation resumes, iterate over only the newly generated job records, and reuse the previous batch records.
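One way the second remedy might look is sketched below, assuming that an existing batch record file is a reliable sign that a generation was already processed. The layout, file names, and the functions `write_jobs` and `record_survival_resumable` are hypothetical and are not part of sober's API.

```python
import shutil
import tempfile
from pathlib import Path


def write_jobs(root: Path, generation: int) -> None:
    # Hypothetical layout: one subdirectory of job records per generation.
    batch_dir = root / f"B{generation}"
    batch_dir.mkdir()
    (batch_dir / "job_records.csv").write_text(f"job,generation\nJ0,{generation}\n")


def record_survival_resumable(
    root: Path, n_generations: int, removes_subdirs: bool
) -> list[int]:
    # Returns the generations processed in this call.
    processed = []
    for i in range(n_generations):
        batch_record = root / f"B{i}_batch_records.csv"
        if batch_record.exists():
            # Assumption: a batch record from a previous run can be reused as-is.
            continue
        batch_dir = root / f"B{i}"
        batch_record.write_text((batch_dir / "job_records.csv").read_text())
        if removes_subdirs:
            shutil.rmtree(batch_dir)
        processed.append(i)
    return processed


def demo() -> tuple[list[int], list[int]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # First (completed) run with two generations.
        write_jobs(root, 0)
        write_jobs(root, 1)
        first = record_survival_resumable(root, 2, removes_subdirs=True)
        # Resumed run adds one generation; only that one is processed.
        write_jobs(root, 2)
        second = record_survival_resumable(root, 3, removes_subdirs=True)
        return first, second
```

Checking for existing batch records avoids storing the resume generation separately, but it would treat a partially written batch record as complete, so a real implementation may prefer to read the generation from the checkpoint.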

Labels: enhancement (New feature or request)
