Hi authors, the performance metrics of the code LLM are not explicitly stated in the paper. What is the score, and how is it calculated?