Install prerequisites
# Run from the directory that contains project/ (the paths below are relative).
for proj in nanoeval alcatraz nanoeval_alcatraz; do
    pip install -e project/"$proj"
done
Code Quality vs. Generation Time
During local testing, we observed that the quality of the generated code does not correlate directly with generation time: in some runs a complete environment was generated within 6 minutes, while in others no meaningful output had been produced even after 18 minutes.
The generation time is controlled by the configuration file located at:
/project/paperbench/paperbench/agents/aisi-basic-agent/config.yaml
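The key that holds this limit is not named here. A minimal sketch for locating it, assuming (not confirmed by this project) that the setting's name contains "time" or "hour"; the helper show_time_settings is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: list lines in the agent config that look time-related.
# Assumption: the generation-time key contains "time" or "hour" in its name.
config=/project/paperbench/paperbench/agents/aisi-basic-agent/config.yaml

show_time_settings() {
  # -n prints line numbers, -i ignores case, -E enables the alternation
  grep -n -i -E 'time|hour' "$1"
}

if [ -f "$config" ]; then
  show_time_settings "$config"
else
  echo "config not found: $config" >&2
fi
```

Back up the file before changing the value, since the printed line numbers make it easy to edit the wrong entry when several keys match.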
Environment-Only Generation Approach
The first version did not generate the environment alone and stop early. Instead, it generated the entire repository first and then, after a predefined time limit, extracted the environment-related parts. This was necessary because the original repository includes feedback loops in which generated code can modify the environment during generation.
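The extraction step can be pictured as copying environment-related files out of a fully generated repository. The sketch below is hypothetical: the function name crop_env_files and the file patterns are illustrative assumptions, not the list the first version actually used.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "generate everything, then keep the environment part".
# The patterns below are common dependency/setup files chosen for illustration.
crop_env_files() {
  local repo_dir=$1 out_dir=$2
  mkdir -p "$out_dir"
  # Find typical environment files anywhere in the generated repository.
  find "$repo_dir" -type f \( -name 'requirements*.txt' -o -name 'environment*.yml' \
       -o -name 'Dockerfile' -o -name 'setup.py' -o -name 'pyproject.toml' \
       -o -name '*.sh' \) |
  while IFS= read -r f; do
    # Keep each path relative to the repository so the layout is preserved.
    rel=${f#"$repo_dir"/}
    mkdir -p "$out_dir/$(dirname "$rel")"
    cp "$f" "$out_dir/$rel"
  done
}
```

Usage: crop_env_files path/to/generated_repo path/to/output. Keep the output directory outside the repository so that find does not pick up the copies.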
New Pipeline and Storage
- Added /project/paperbench/pipeline.py for a dedicated environment extraction pipeline.
- Added a new directory /project/paperbench/env_only/ to store the environment-related scripts and README files.
- To run:
      cd project/paperbench/
      python pipeline.py
  The generated files related to the environment will appear in the env_only folder.
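A cheap sanity check after a run is to confirm that env_only/ received any files at all. Minimal sketch; the helpers run_and_check and count_env_files are hypothetical, and it assumes the commands start from the directory that contains project/, as in the run steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: run the pipeline, then confirm env_only/ is not empty.

count_env_files() {
  # Count regular files under the given directory (prints 0 if it is missing).
  find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

run_and_check() {
  # The subshell keeps the cd from affecting the caller's shell.
  (
    cd project/paperbench/ || exit 1
    python pipeline.py || exit 1
    n=$(count_env_files env_only)
    if [ "$n" -eq 0 ]; then
      echo "pipeline finished but env_only/ is empty" >&2
      exit 1
    fi
    echo "env_only/ contains $n file(s)"
  )
}
```

An empty env_only/ is only a coarse signal: a non-empty folder does not guarantee that the extracted environment is correct.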
Rubric Tree Simplification
- Modified the default rubric tree at:
  /project/paperbench/data/papers/rice/rubric.json
- The rubric tree now contains only the branches and leaves related to environment setup.
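Because the rubric is pruned by hand, a syntax check after editing can catch broken JSON early. Sketch using Python's standard json.tool module; the helper check_rubric_json is hypothetical, and it checks JSON syntax only, not the rubric schema or whether the remaining nodes are environment-related.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that an edited rubric file still parses as JSON.
check_rubric_json() {
  local rubric=${1:-/project/paperbench/data/papers/rice/rubric.json}
  # json.tool exits non-zero and prints the parse error on invalid input.
  if python3 -m json.tool "$rubric" > /dev/null; then
    echo "valid JSON: $rubric"
  else
    echo "invalid JSON: $rubric" >&2
    return 1
  fi
}
```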
Customizable Dataset Split
- Updated the dataset split configuration file:
  /project/paperbench/experiments/splits/debug.txt
- By default, it uses the "rice" dataset.
- You can replace "rice" with any custom dataset name to adapt the pipeline to new data.
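Switching datasets amounts to rewriting the split file. Minimal sketch, assuming (not confirmed here) that debug.txt holds one dataset name per line and that each name needs a matching folder under data/papers/, as rice has. The helper set_debug_split is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the debug split at a different dataset.
# Assumptions: one name per line in debug.txt; data lives in data/papers/<name>/.
set_debug_split() {
  local name=$1
  local root=${2:-/project/paperbench}
  if [ ! -d "$root/data/papers/$name" ]; then
    echo "no data folder for '$name' under $root/data/papers/" >&2
    return 1
  fi
  # Overwrite the split so that it lists only the new dataset.
  printf '%s\n' "$name" > "$root/experiments/splits/debug.txt"
}
```

Example: set_debug_split my_dataset, where my_dataset is a placeholder name. The folder check runs before the write, so a typo leaves the existing split untouched.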
Input Adjustment:
Preprocess the GitHub repository dataset to match the current system:
- Replace paper.md with the repository's README.md.
- Replace addendum.md with the main project code files.
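The two replacements could be scripted per repository. Sketch only: prepare_repo_inputs is a hypothetical helper, the caller chooses which files count as main project code, and the per-file header in addendum.md is an illustrative layout rather than a format the system is known to expect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build paper.md and addendum.md from a GitHub repository.
# Arguments: repo dir, output dir, then the code files (relative to the repo).
prepare_repo_inputs() {
  local repo=$1 paper_dir=$2
  shift 2
  mkdir -p "$paper_dir"
  # paper.md <- the repository README
  cp "$repo/README.md" "$paper_dir/paper.md" || return 1
  # addendum.md <- the selected code files, each preceded by a header line
  : > "$paper_dir/addendum.md"
  for rel in "$@"; do
    {
      printf '# File: %s\n\n' "$rel"
      cat "$repo/$rel"
      printf '\n'
    } >> "$paper_dir/addendum.md"
  done
}
```

Example: prepare_repo_inputs ./my_repo /project/paperbench/data/papers/my_repo src/train.py src/model.py. The repository name and file paths are placeholders, and placing the output under data/papers/ is an assumption based on where rice's rubric.json lives.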
Output Definition:
- In the next version, use a new agent to directly generate a full Conda environment or Dockerfile from the environment-related outputs.
- Use an agent to automatically construct the rubric tree structure based on the repository contents.
- Determine a suitable generation time parameter for future demos and experiments.