Performance Review of LLMs for Solving LeetCode Problems

1st Lun Wang*, Duke University, North Carolina, USA ([email protected])
2nd Chuanqi Shi, University of California San Diego, California, USA ([email protected])
3rd Shaoshuai Du, University of Amsterdam, Amsterdam, Netherlands ([email protected])
4th Yiyi Tao, Johns Hopkins University, Maryland, USA ([email protected])
5th Yixian Shen, University of Amsterdam, Amsterdam, Netherlands
6th Hang Zheng, University of California San Diego, California, USA
7th Xinyu Qiu, Northeastern University, Boston, USA
Abstract—This paper presents a comprehensive performance evaluation of Large Language Models (LLMs) in solving programming challenges from Leetcode, a widely used platform for algorithm practice and technical interviews. We began by crawling the Leetcode website to collect a diverse set of problems encompassing various difficulty levels and topics. Using this dataset, we generated solutions with multiple LLMs, including GPT-4 and GPT-3.5-turbo (ChatGPT-turbo). The generated solutions were systematically evaluated for correctness and efficiency. We employed the pass@k metric to assess the success rates within a given number of attempts and analyzed the runtime performance of the solutions. Our results highlight the strengths and limitations of current LLMs [10] in code generation and problem-solving tasks, providing insights into their potential applications and areas for improvement in automated programming assistance.

Index Terms—LLM, LLM performance evaluation, ChatGPT review.

I. INTRODUCTION

Large Language Models (LLMs) like ChatGPT (OpenAI, 2023) have revolutionized artificial intelligence, demonstrating remarkable capabilities in text and image generation. In software development, specialized code-focused LLMs—such as CodeGen (Nijkamp et al., 2022), StarCoder (Li et al., 2023), WizardCoder (Luo et al., 2023), CodeT5 (Wang et al., 2021), and InCoder (Fried et al., 2022)—assist developers by automating tasks like code generation, documentation, and unit testing. Additionally, LLMs have been integrated into Integrated Development Environments (IDEs) as code assistants [3], including GitHub Copilot, Amazon CodeWhisperer, and Tabnine. These tools aim to accelerate development by providing real-time code suggestions and automating routine coding tasks.

The integration of LLMs into software development offers significant benefits. Developers can save time, focus on higher-level design decisions, and potentially reduce time to market. LLMs help generate boilerplate code, suggest improvements, and assist with complex problem-solving, enhancing productivity and fostering innovation by leveraging the vast knowledge embedded within these models [4].

Despite their widespread adoption, research is increasingly focused on understanding the limitations and evaluating the performance of LLMs. Studies have highlighted security vulnerabilities in AI-generated code (Pearce et al., 2022; Sandoval et al., 2023; Perry et al., 2023) and the prevalence of bugs (Jesse, 2023), emphasizing the need for thorough code review and testing. Other research explores how developers interact with LLMs [12] and integrate them into their workflows (Vaithilingam et al., 2022; Barke et al., 2023), examining the dynamics between human creativity and AI assistance [13]. However, evaluating the runtime performance of LLM-generated code has received less attention. While correctness is crucial, the efficiency of code—how fast it runs and how optimally it uses resources—is a significant concern in software engineering [6]. Performance optimization is essential when resources are limited, scalability is needed, or energy consumption is a concern (Verdecchia et al., 2017; Acar et al., 2016). In areas like high-frequency trading, real-time data processing, or large-scale web applications, even minor execution time improvements can have substantial impacts [8].

To address this gap, our research evaluates the performance of code generated by LLMs on algorithmic challenges typical of programming contests and technical interviews. We conduct a comprehensive performance review using problems from Leetcode, a widely used platform offering a vast repository of algorithmic problems across various difficulty levels and topics [1]. Our key contributions are:
1. Performance Analysis of LLM-Generated Code: We analyze the performance of code generated by 18 LLMs on 204 Leetcode problems, investigating performance differences across models using a novel method for measuring and comparing runtime efficiency [11].
2. Comparison with Human-Written Code: We compare the performance of LLM-generated code with human-written code, providing insights into the current capabilities of LLMs in producing efficient code and highlighting areas where they may lag behind human expertise.
3. Evaluation of Leetcode as a Dataset: We assess the usability of Leetcode as a public repository of algorithmic problems for research purposes,
discussing its strengths and limitations to guide future research utilizing similar resources [14].

Our methodology involves generating solutions using multiple LLMs, including GPT-4 and GPT-3.5-turbo, and systematically assessing their correctness and efficiency. We utilize metrics such as the pass@k metric, which evaluates the probability of a model providing a correct solution within k attempts, and measure the runtime performance of the generated code [15]. By analyzing these metrics, we aim to understand the strengths and limitations of current LLMs in algorithmic problem-solving contexts. Our findings offer insights into how LLMs can assist developers in tackling complex programming challenges and identify areas where further advancements are needed to enhance their capabilities in generating efficient, high-performance code [17].

II. EXPERIMENT

This section outlines the experimental framework employed to evaluate the performance of Large Language Models (LLMs) in solving algorithmic problems from Leetcode. The experiment is structured into three primary phases: data collection, code generation, and solution evaluation.

A. Data Collection

To establish a comprehensive dataset for our evaluation, we crawled the Leetcode website and collected a total of 2,014 problems. These problems span various difficulty levels—Easy, Medium, and Hard—and encompass a wide range of topics including data structures, algorithms, mathematics, and system design. During data collection, we focused on extracting the essential components necessary for code generation:
• Problem Statements: The detailed descriptions of each problem, including the objective and any specific requirements.
• Function Signatures: The provided code frameworks or templates, specifying input and output formats.
• Code Comments: Any comments included in the code templates that provide additional guidance or constraints.
We standardized the problem data by removing any extraneous information such as solution discussions, hints, or previously submitted solutions. This preprocessing ensured that the input to the LLMs was consistent and contained only the information that a typical developer would have when attempting to solve the problem independently.
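For illustration, the sketch below shows one way the standardized problem data could be represented after preprocessing. The field names and the cleaning helper are simplified assumptions for this sketch, not the exact schema of our crawler.

# A minimal sketch of a standardized problem record used as LLM input.
# Field names and the cleaning heuristic are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ProblemRecord:
    slug: str            # e.g., "two-sum"
    difficulty: str      # "Easy", "Medium", or "Hard"
    statement: str       # detailed problem description
    code_framework: str  # provided function signature / class template
    code_comments: str   # guidance comments embedded in the template, if any

def strip_extraneous_sections(raw_text: str) -> str:
    """Drop hints, discussions, and prior solutions from crawled problem text."""
    drop_markers = ("Hint", "Discussion", "Related Topics", "Solution")
    kept_lines = []
    for line in raw_text.splitlines():
        if any(line.strip().startswith(marker) for marker in drop_markers):
            break  # everything after these markers is extraneous for prompting
        kept_lines.append(line)
    return "\n".join(kept_lines).strip()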
B. Code Generation

The code generation phase involved utilizing two categories of LLMs to generate solutions for the collected Leetcode problems: OpenAI models and the GitHub Copilot model. For each problem, we generated solutions using these models under varying levels of randomness and creativity, controlled by the temperature parameter in the models' settings. The temperature parameter influences the diversity of the output, with higher values producing more varied and creative responses. We utilized a Python framework to automate the code generation process [16]. This framework automatically sent requests to the OpenAI API, providing the standardized problem input (problem statement, code comments, and code framework) as prompts. The solutions returned by the LLM were then parsed and formatted into the required Leetcode submission format in Python code.
• Temperature Settings: We used five different temperature values: 0.2, 0.4, 0.6, 0.8, and 1.0.
• Solution Generation: At each temperature setting, we generated 10 distinct solutions per model for each problem.
• OpenAI Models: We interfaced with the OpenAI API, providing the standardized problem input (problem statement, code comments, and code framework) as prompts. We set the temperature parameter accordingly and generated multiple solutions by invoking the model repeatedly.
• GitHub Copilot: We integrated Copilot into a compatible code editor (e.g., Visual Studio Code) and input the problem's code framework. Copilot's suggestions were captured for each temperature setting by configuring its randomness settings if available or by inducing variability through prompt modifications.
By generating multiple solutions across different temperatures, we aimed to observe the impact of the temperature parameter on the correctness and efficiency of the generated code. This process also allowed us to assess the models' ability to produce diverse solutions and their propensity to generate optimal or suboptimal code under varying conditions [5].
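The sketch below illustrates the shape of such an automation loop, assuming the openai Python client (version 1 or later); the prompt template and the default model name are placeholders rather than the exact configuration of our framework.

# Illustrative generation loop over the five temperature settings.
# Assumes the `openai` package (v1+) and an OPENAI_API_KEY in the environment;
# the prompt wording and model name are placeholders.
from openai import OpenAI

client = OpenAI()
TEMPERATURES = [0.2, 0.4, 0.6, 0.8, 1.0]
SAMPLES_PER_TEMPERATURE = 10

def generate_solutions(statement: str, code_comments: str, code_framework: str,
                       model: str = "gpt-4") -> dict:
    """Return {temperature: [solution_code, ...]} for one problem."""
    prompt = (
        f"{statement}\n\n{code_comments}\n\n"
        f"Complete the following Python code framework:\n{code_framework}"
    )
    solutions = {}
    for temperature in TEMPERATURES:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            n=SAMPLES_PER_TEMPERATURE,  # 10 distinct samples per setting
        )
        solutions[temperature] = [choice.message.content for choice in response.choices]
    return solutions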
C. Solution Evaluation

The evaluation phase involved assessing the correctness and performance of the generated solutions by submitting them to Leetcode's online judge system. The Leetcode platform provides an automated environment that compiles and executes submitted code against a predefined set of test cases. For each submitted solution, we collected the following metrics:
• Number of Unit Tests Passed: The total number of test cases successfully passed by the solution.
• Overall Status: Whether the solution met all the problem requirements.
• Runtime: The execution time of the solution, measured by Leetcode's evaluation system.
• Memory Usage: The amount of memory consumed during execution.
The evaluation process was conducted systematically:
1) Automated Submission: Solutions were programmatically submitted to Leetcode using their API or through automated scripting to ensure consistency and efficiency [2].
2) Data Recording: All evaluation results were recorded in a structured format for subsequent analysis. This included capturing the raw output from Leetcode and parsing relevant information.
The collected data enabled us to analyze several aspects of the models' performance:
• Success Rate (Pass@k Metric): The probability of a model generating a correct solution within k attempts, considering the multiple solutions generated per problem.
• Error Analysis: Identification of common errors or misconceptions exhibited by the models, such as off-by-one errors, incorrect loop conditions, or misuse of data structures.
• Runtime Performance: Assessment of the efficiency of the solutions, with a focus on execution time and resource utilization.
By evaluating both correctness and performance, we aimed to understand not only whether the models could solve the problems but also how efficiently they could do so. This dual focus is critical in algorithmic problem-solving contexts, where optimal solutions are often required to meet time and space constraints.
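As an illustration of the structured format used for data recording, each submission could be captured as a record like the one below; the class and field names are hypothetical and simply mirror the metrics listed above.

# Hypothetical per-submission record; the fields mirror the collected metrics,
# while the exact storage schema used in our pipeline may differ.
from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass
class SubmissionResult:
    problem_slug: str
    model: str
    temperature: float
    tests_passed: int                    # number of unit tests passed
    tests_total: int
    accepted: bool                       # overall status: all requirements met
    runtime_ms: Optional[float]          # reported by Leetcode's judge
    memory_mb: Optional[float]
    runtime_percentile: Optional[float]  # "beats X%" ranking, 0-100

def append_result(result: SubmissionResult, path: str = "results.jsonl") -> None:
    """Append one evaluation record as a JSON line for later analysis."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(result)) + "\n")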
D. Data Analysis

In this section, we present the methods and metrics used to analyze the functional correctness and performance of the code generated by the Large Language Models (LLMs). Our analysis aims to assess not only whether the models can produce correct solutions but also how efficiently these solutions run compared to human-written code [9].

1) Functional Correctness: Functional correctness measures the extent to which the code generated by an LLM adheres to the specified problem requirements, effectively conforming to the “program contract” defined by the input prompt. To evaluate this aspect, we employed the pass@k metric, which calculates the probability that at least one of the k generated samples passes all the test cases for a given problem [7].

We computed the pass@k metrics for k = 1 and k = 10, utilizing the unbiased estimator proposed by Chen et al. (2021). This estimator accounts for the likelihood of obtaining a correct solution among multiple attempts and is defined as:

\[
\text{pass@}k = \mathbb{E}\left[ 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \right],
\]

where:
• n is the total number of generated samples,
• c is the number of correct samples (i.e., samples that pass all test cases),
• E denotes the expected value.
This formula provides an unbiased estimate of the pass@k metric by considering all possible combinations of correct and incorrect samples without replacement.
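For reference, the estimator can be implemented with the numerically stable product form popularized by Chen et al. (2021), as in the sketch below; the variable names are ours.

# Unbiased pass@k estimator (Chen et al., 2021), using the identity
# 1 - C(n-c, k) / C(n, k) = 1 - prod_{i = n-c+1}^{n} (1 - k / i).
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: samples generated, c: samples passing all tests, k: attempt budget."""
    if n - c < k:  # any draw of k samples must contain a correct one
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example with n = 10 samples per problem and c = 3 correct samples:
print(pass_at_k(10, 3, 1))   # ~0.30
print(pass_at_k(10, 3, 10))  # 1.0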
Following the methodology suggested by Chen et al. (2021), we calculated the pass@k for each temperature setting when evaluating an LLM's functional correctness. The temperature parameter influences the randomness and diversity of the generated solutions. By evaluating across different temperatures, we aimed to identify the optimal setting for each model. The best pass@k value observed across all temperatures was then considered the final pass@k metric for that LLM.

2) Code Performance: To assess the performance of the code generated by the LLMs, we considered three key metrics:
1) Memory Usage: We recorded the memory consumption reported by Leetcode's evaluation system for each submitted solution. Memory usage is a critical factor in code performance, especially for problems with large input sizes or when operating under memory constraints.
2) Runtime Performance: We measured the execution time of the generated solutions using pytest-benchmark, a Python benchmarking tool. For each solution, we conducted multiple runs to obtain a reliable estimate of its runtime performance. The median runtime was computed to mitigate the impact of outliers and variability in execution times (a minimal usage sketch follows this list).
3) Leetcode Runtime Percentile Rank: Upon submission, Leetcode provides a percentile ranking that indicates how a solution's runtime compares to other users' submissions for the same problem. This rank is a value between 0 and 100, representing the percentage of submissions that the current solution outperforms. For example, a rank of 90 implies that the solution is faster than 90% of all other submitted solutions. This metric allowed us to benchmark the LLM-generated code against human-written code at a global scale.
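A minimal sketch of such a benchmark is shown below. It uses the pytest-benchmark fixture; the two_sum function and its input stand in for a generated solution and a representative test case, and are not taken from our actual harness.

# Illustrative pytest-benchmark harness for one generated solution.
# Run with: pytest --benchmark-only test_generated_solution.py
# The solution and test input below are placeholders.
def two_sum(nums, target):
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i
    return []

def test_two_sum_runtime(benchmark):
    nums = list(range(10_000))
    # benchmark() repeats the call over many rounds; pytest-benchmark reports
    # min/median/mean per test, and we keep the median as the solution's runtime.
    result = benchmark(two_sum, nums, 19_997)
    assert result == [9998, 9999]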
III. RESULTS

TABLE I presents the pass@1 and pass@10 metrics for the evaluated models. Below is an analysis of the data.

A. Dataset Analysis

Our dataset analysis encompasses approximately 2,100 LeetCode problems, meticulously selected to provide a comprehensive evaluation of Large Language Models (LLMs) across a diverse range of algorithmic challenges. These problems are systematically categorized into three difficulty levels: Easy, Medium, and Hard, adhering to a distribution ratio of approximately 11:50:10, respectively. Furthermore, all solutions generated by the LLMs were implemented in Python, a language renowned for its readability and widespread use in coding competitions and technical interviews. Additionally, each problem was approached using LLMs configured with five different temperature settings—0.2, 0.4, 0.6, 0.8, and 1.0. The temperature parameter controls the creativity and variability of the generated solutions, allowing us to examine how different levels of randomness impact the correctness and efficiency of the code produced. All the experiment code and the dataset are published at: https://github.com/DHUer/LLMevaluationresults

B. LLM Solutions Compared with Humans

To facilitate a robust comparison between LLM-generated solutions and human-written code, we selected the o1-mini model tested on LeetCode for this analysis. The results of this comparison are depicted in Figure 3. Utilizing LeetCode's runtime percentile rankings—which assume that the majority of historical submissions originate from human programmers—we assessed the execution speed of the LLM-generated solutions relative to human-written counterparts. Our findings reveal that the solutions produced by the selected LLM achieve a mean runtime percentile rank of 63%, indicating that they are faster than 63% of all previous submissions.

Fig. 3. Distribution of the ranking for the runtime of the Leetcode solutions of o1-mini
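As a small illustration of how such a summary can be derived from the recorded submissions, the snippet below averages the per-solution percentile ranks; the file layout and key names follow the hypothetical record schema sketched in the Solution Evaluation subsection.

# Hypothetical aggregation of Leetcode "beats X%" ranks for accepted solutions.
# Assumes one JSON object per line with "accepted" and "runtime_percentile" keys.
import json
from statistics import mean, median

def summarize_percentile_ranks(path: str = "results.jsonl") -> dict:
    ranks = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            if record.get("accepted") and record.get("runtime_percentile") is not None:
                ranks.append(record["runtime_percentile"])
    return {
        "solutions": len(ranks),
        "mean_rank": mean(ranks) if ranks else None,    # e.g., about 63 for o1-mini
        "median_rank": median(ranks) if ranks else None,
    }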
C. Performance Overview

Top Performers:
• Canonical Solutions is the highest-performing entry, with near-perfect scores (97.94 and 98.04). This suggests it is tailored or highly optimized for the specific tasks.
• GPT-4-omni, GPT-4, and GPT-4-turbo follow, but with significantly lower scores, indicating strong performance yet a noticeable gap compared to Canonical Solutions.

Mid-Tier Performers:
• Models such as Copilot, CodeLlama-13B-Instruct, and WizardCoder-Python-7B show moderate performance (scores in the range of ∼4–19). This reflects some utility but highlights significant room for improvement compared to the top-tier models.

Lower Performers:
• Models like SantaCoder, InCoder-6B, and CodeT5-Large-NTP-PY perform poorly, with scores often below 5. These results suggest limited capability in handling the evaluated tasks effectively.

Fig. 1. Leetcode memory analysis
Fig. 2. Leetcode runtime analysis
REFERENCES
[1] Ehsan Aghapour, Yixian Shen, Dolly Sapra, Andy Pimentel, and Anuj
Pathania. Piqi: Partially quantized dnn inference on hmpsocs. In
Proceedings of the 29th ACM/IEEE International Symposium on Low
Power Electronics and Design, pages 1–6, 2024.
[2] Tristan Coignion, Clément Quinton, and Romain Rouvoy. A perfor-
mance study of llm-generated code on leetcode. In Proceedings of
the 28th International Conference on Evaluation and Assessment in
Software Engineering, pages 79–89, 2024.
[3] Shaoshuai Du, Kuangrong Hao, Haichao Zhang, Xue-song Tang, and
Bing Wei. Patch elastic deformation: An effective data augmentation
method. In 2022 China Automation Congress (CAC), pages 2079–2084,
2022.
[4] Yiru Gong, Qimin Zhang, Huili Zheng, Zheyan Liu, and Shaohan Chen.
Graphical structural learning of rs-fmri data in heavy smokers. In
2024 4th International Conference on Computer Science and Blockchain
(CCSB), pages 434–438, 2024.
[5] Jiacheng Hu, Zhen Qi, Jianjun Wei, Jiajing Chen, Runyuan Bao,
and Xinyu Qiu. Few-shot learning with adaptive weight masking in conditional GANs, 2024.
TABLE I
PASS@1 AND PASS@10 METRICS FOR VARIOUS MODELS
Model Pass@1 (%) Pass@10 (%)
Canonical Solutions 97.94 98.04
GPT-4-omni 43.36 61.95
GPT-4 31.68 67.79
GPT-4-turbo 31.25 50.00
Copilot 8.09 19.12
CodeLlama-13B-Instruct 4.17 14.71
WizardCoder-Python-7B 4.17 12.25
CodeLlama-13B-Python 3.58 14.71
CodeLlama-7B-Instruct 3.24 12.75
StarCoder 2.70 10.29
CodeLlama-7B-Python 2.60 10.78
CodeLlama-7B 2.11 10.29
CodeGen2-7B-Instruct 2.16 10.78
CodeGen2-7B-Mono 1.28 7.35
CodeGen-6B-Mono 1.08 5.39
CodeGen-2B-Mono 1.13 6.37
Replit-Code-v1-3B 0.98 4.90
SantaCoder 0.69 4.90
InCoder-6B 0.59 3.92
InCoder-1B 0.10 0.98
CodeGen-350M-Mono 0.39 2.94
CodeT5-Large-NTP-PY 0.25 1.96
[6] Daoming Li, Qiang Chen, and Lun Wang. Phishing attacks: Detection
and prevention techniques. Journal of Industrial Engineering and
Applied Science, 2(4):48–53, 2024.
[7] Keqin Li, Jiajing Chen, Denzhi Yu, Tao Dajun, Xinyu Qiu, Lian Jieting,
Sun Baiwei, Zhang Shengyuan, Zhenyu Wan, Ran Ji, Bo Hong, and
Fanghao Ni. Deep reinforcement learning-based obstacle avoidance for
robot movement in warehouse environments, 2024.
[8] Zheyan Liu, Qimin Zhang, Huili Zheng, Shaohan Chen, and Yiru Gong.
A comparative study of machine learning approaches for diabetes risk
prediction: Insights from shap and feature importance. In 2024 5th In-
ternational Conference on Machine Learning and Computer Application
(ICMLCA), pages 35–38, 2024.
[9] Yixian Shen, Qi Bi, Jia-Hong Huang, Hongyi Zhu, and Anuj Pathania.
Parameter-efficient fine-tuning via selective discrete cosine transform.
arXiv preprint arXiv:2410.09103, 2024.
[10] Yiyi Tao. Meta learning enabled adversarial defense. In 2023 IEEE
International Conference on Sensors, Electronics and Computer Engi-
neering (ICSECE), pages 1326–1330, 2023.
[11] Yiyi Tao. Meta learning enabled adversarial defense. In 2023 IEEE
International Conference on Sensors, Electronics and Computer Engi-
neering (ICSECE), pages 1326–1330. IEEE, 2023.
[12] Yiyi Tao, Yiling Jia, Nan Wang, and Hongning Wang. The fact:
Taming latent factor models for explainability with factorization trees.
In Proceedings of the 42nd international ACM SIGIR conference on
research and development in information retrieval, pages 295–304, 2019.
[13] Yiyi Tao, Zhuoyue Wang, Hang Zhang, and Lun Wang. Nevlp: Noise-
robust framework for efficient vision-language pre-training. arXiv
preprint arXiv:2409.09582, 2024.
[14] Chenxu Wang, Yixian Shen, Jia Jia, Yutong Lu, Zhiguang Chen, and
Bo Wang. Singlecaffe: an efficient framework for deep learning on a
single node. IEEE Access, 6:69660–69671, 2018.
[15] Lun Wang. Low-latency, high-throughput load balancing algorithms.
Journal of Computer Technology and Applied Mathematics, 1(2):1–9,
2024.
[16] Lun Wang, Wei Fang, and Yudi Du. Load balancing strategies in
heterogeneous environments. Journal of Computer Technology and
Applied Mathematics, 1(2):10–18, 2024.
[17] Lun Wang, Wentao Xiao, and Shan Ye. Dynamic multi-label learning
with multiple new labels. In Image and Graphics: 10th International
Conference, ICIG 2019, Beijing, China, August 23–25, 2019, Proceed-
ings, Part III 10, pages 421–431. Springer, 2019.