Hi, I noticed that some of the experiments mentioned in the paper are not currently included in the repository. In particular:
- The code generation experiment on the HumanEval dataset, as well as the win-rate calculation with GPT-4 as the baseline for both the chat generation and code generation experiments.
- The Open LLM Leaderboard experiments on mitigating forgetting and reducing the alignment tax. I could only find the gsm8k task in the repo.
Would it be possible to share the code used for these parts of the work? It would be really helpful for anyone trying to replicate the results and build upon your research. Thanks again for making your work and code available!