
Code Lingua Leaderboard

🚨 WIP: Artifacts for the leaderboard are expected to be completed soon 🚨

The Code Lingua leaderboard evaluates LLMs on programming language translation. While other leaderboards assess the ability of LLMs to understand natural language (NL) for code synthesis, the ultimate way to assess whether an LLM understands code syntax and semantics is code translation. Code Lingua serves as such a leaderboard: it compares the ability of LLMs to understand what code implements in a source language and to reproduce the same semantics in a target language.

Requirements

Execute the following to install all requirements:

pip3 install -r requirements.txt
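
If you want to keep the dependencies isolated, one common (and entirely optional) approach is to install them into a virtual environment first; the .venv directory name below is just a convention, not something the repository requires:

python3 -m venv .venv             # create a local virtual environment
source .venv/bin/activate         # activate it in the current shell
pip3 install -r requirements.txt  # install the requirements into it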

Docker

To create a Docker image, execute the following:

docker build -t codetlingua .
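
To work inside the image, you can then start a container from it. The entrypoint is not documented here, so the command below simply assumes the image provides a shell:

docker run -it --rm codetlingua bash   # open an interactive shell in the container (assumes bash is available in the image)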

Dataset

The datasets used in this study are available on HuggingFace. The current version of the leaderboard consists of the following datasets:

  1. CodeNet:
  • PLs: C, C++, Go, Java, Python
  • # Samples / Language: 200
  • # Tests / Sample: 1
  2. AVATAR:
  • PLs: Java, Python
  • # Samples / Language: 250
  • # Tests / Sample: ~50

Closed models API calls

In order to use GPT, Claude and Gemini, the following environment variables must be set before running the code.

  • GPT: OPENAI_API_KEY
  • Claude: ANTHROPIC_KEY
  • Gemini: GEMINI_KEY
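
For example, in a POSIX shell the keys can be exported before invoking the scripts (the values below are placeholders, not real keys):

export OPENAI_API_KEY="sk-..."   # for GPT models
export ANTHROPIC_KEY="..."       # for Claude models
export GEMINI_KEY="..."          # for Gemini models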

The current version has been tested with gpt3.5-turbo, gpt4, gpt-4-0125, gemini-pro (1.0), and claude-3-opus-20240229.

Evaluating a new model?

The artifacts of Code Lingua include multiple modules that can be used to evaluate new LLMs on our benchmarks. You can either use our artifacts to evaluate your model yourself, or file a request so that we can evaluate your model and add it to our leaderboard.

Translation

The first step is to use the model to generate raw translations. Please see the translate.sh script for details on how translations are generated. A sample translation command is provided below:

bash scripts/translate.sh deepseek-coder-1.3b-instruct codenet Java Python deepseek 0.2 10 16 1024 3 0
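
Based on the sample above, the arguments appear to be positional (model, dataset, source language, target language, followed by engine- and decoding-related settings). Assuming that layout, running the same model on AVATAR in the opposite direction (Python to Java) might look like the following; this is a sketch, not a command taken from the repository:

bash scripts/translate.sh deepseek-coder-1.3b-instruct avatar Python Java deepseek 0.2 10 16 1024 3 0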

Sanitization

The raw translations generated by LLMs contain extra template-related tokens and natural language. Please see the sanitize.sh script for details on how to sanitize the generated translations. A sample sanitization command is provided below:

bash scripts/sanitize.sh translations deepseek-coder-1.3b-instruct codenet Java Python 0.2 remove_prompt
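
The sanitization arguments appear to mirror the translation run (output directory, model, dataset, language pair, temperature, and sanitization mode). Under that assumption, the step matching the hypothetical AVATAR Python-to-Java run sketched above would be:

bash scripts/sanitize.sh translations deepseek-coder-1.3b-instruct avatar Python Java 0.2 remove_prompt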

Evaluation

The final step is to evaluate the correctness of the sanitized translations. Please check the evaluate.sh script for details on how to run the test suites against the translations. A sample evaluation command is given below:

bash scripts/evaluate.sh translations deepseek-coder-1.3b-instruct codenet Java Python 0.2 8
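
Putting the three steps together, a complete run for the CodeNet Java-to-Python example used throughout this README simply chains the commands above in order (translate, then sanitize, then evaluate):

bash scripts/translate.sh deepseek-coder-1.3b-instruct codenet Java Python deepseek 0.2 10 16 1024 3 0
bash scripts/sanitize.sh translations deepseek-coder-1.3b-instruct codenet Java Python 0.2 remove_prompt
bash scripts/evaluate.sh translations deepseek-coder-1.3b-instruct codenet Java Python 0.2 8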

Contact Us

The artifacts of the Code Lingua leaderboard are continuously being improved. If you see any inconsistencies, please feel free to open a PR or contact Ali ([email protected]).
