`_record_survival` needs to iterate through all job records. However, if `removes_subdirs=True`, the job records from generations before the checkpoint will already have been removed by `_record_survival` itself (`sober/sober/_evolver.py`, lines 104 to 105 in ab5201d):

```python
if self._output_manager._removes_subdirs:
    shutil.rmtree(batch_dir)
```
This is a problem when resuming from a checkpoint of a completed run, in which `_record_survival` has already executed and the batch records have already been generated. This typically happens on HPC systems that limit resource requests, so a run has to be resumed from a checkpoint.
Two possible remedies:
- Comment out these two lines to leave the job records untouched.
- Somehow detect the generation from which the optimisation resumes, iterate only over the newly generated job records, and reuse the previous batch records.
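A minimal sketch of the second remedy follows. It assumes that batch records can be keyed by generation and that a generation counts as finished once it has a batch record. `resume_generation`, `record_survival` and the dict layout are hypothetical and do not reflect sober's actual data structures. The sketch also ignores how survival itself is computed.

```python
from __future__ import annotations


def resume_generation(batch_records: dict[int, list[str]]) -> int:
    """Return the first generation without a batch record (hypothetical helper)."""
    return max(batch_records, default=-1) + 1


def record_survival(
    batch_records: dict[int, list[str]],
    job_records: dict[int, list[str]],
) -> list[str]:
    """Hypothetical stand-in for _record_survival after a resume.

    batch_records: generation -> rows recorded before the checkpoint.
    job_records: generation -> job records still on disk; earlier
    generations may be missing because their batch_dir was removed.
    """
    start = resume_generation(batch_records)
    # Reuse what was recorded before the checkpoint (copy, not mutate).
    merged = dict(batch_records)
    for gen, jobs in job_records.items():
        # Only iterate over job records generated after the resume point.
        if gen >= start:
            merged[gen] = list(jobs)
    # Flatten in generation order: reused records first, then new ones.
    return [job for gen in sorted(merged) for job in merged[gen]]


if __name__ == "__main__":
    # Job directories for generations 0 and 1 were already removed.
    old = {0: ["G0_0", "G0_1"], 1: ["G1_0"]}
    new = {2: ["G2_0", "G2_1"]}
    print(record_survival(old, new))
    # ['G0_0', 'G0_1', 'G1_0', 'G2_0', 'G2_1']
```

Any generation that is still on disk but already has a batch record (e.g. when `removes_subdirs=False`) is skipped rather than re-read, so the previous batch records stay authoritative.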