"eval time" and "prompt eval time" is 0.00ms after Ver0.3.0 #1830

Open
nai-kon opened this issue Nov 12, 2024 · 2 comments

nai-kon commented Nov 12, 2024

Since Ver0.3.0, the "eval time" and "prompt eval time" values of llama_print_timings are displayed as 0.00 ms.
At first I thought this was a problem in llama.cpp itself, but running the same model with llama.cpp directly prints the timings correctly (see the comparison below).

Here is the code and the results.

from llama_cpp import Llama

model = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf",
)
output = model(
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=128, # Generate up to 128 tokens; set to None to generate up to the end of the context window
    stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
    echo=True # Echo the prompt back in the output
)
print(output)

Environment:
  • Ubuntu 22.04
  • Python 3.10.12
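
To confirm which binding version is actually installed for each run below, the package version can be printed from Python (a small sketch; it assumes the llama_cpp package exposes __version__, which current releases do):

import llama_cpp

# Prints the installed llama-cpp-python version, e.g. "0.3.1"
print(llama_cpp.__version__)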

llama-cpp-python Ver0.2.90

llama_print_timings:      sample time =       2.79 ms /    33 runs   (    0.08 ms per token, 11819.48 tokens per second)
llama_print_timings: prompt eval time =   14805.69 ms /    13 tokens ( 1138.90 ms per token,     0.88 tokens per second)
llama_print_timings:        eval time =    3430.58 ms /    32 runs   (  107.21 ms per token,     9.33 tokens per second)
llama_print_timings:       total time =   18278.73 ms /    45 tokens

llama-cpp-python Ver0.3.0

llama_perf_context_print:        load time =   14788.07 ms
llama_perf_context_print: prompt eval time =       0.00 ms /    13 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /    48 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =   20017.62 ms /    61 tokens

llama-cpp-python Ver0.3.1

llama_perf_context_print:        load time =   14937.34 ms
llama_perf_context_print: prompt eval time =       0.00 ms /    13 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /    48 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =   20313.54 ms /    61 tokens

llama.cpp (latest master)

Command: ./llama-cli -m Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf -p "Q: Name the planets in the solar system? A: " -n 400 -e

llama_perf_context_print:        load time =    1450.48 ms
llama_perf_context_print: prompt eval time =     600.72 ms /    13 tokens (   46.21 ms per token,    21.64 tokens per second)
llama_perf_context_print:        eval time =   42424.41 ms /   399 runs   (  106.33 ms per token,     9.40 tokens per second)
llama_perf_context_print:       total time =   43197.46 ms /   412 tokens
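
As a stopgap until the built-in timings are fixed, overall throughput can still be measured with wall-clock timing around the call. Below is a minimal sketch using only the standard library and the token counts from the completion's "usage" field; it only gives end-to-end numbers, not the separate prompt eval / eval phases that llama_print_timings used to report.

import time

from llama_cpp import Llama

model = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf",
)

start = time.perf_counter()
output = model(
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=128,
    stop=["Q:", "\n"],
)
elapsed = time.perf_counter() - start

# The completion dict carries OpenAI-style token counts under "usage".
usage = output["usage"]
total_tokens = usage["prompt_tokens"] + usage["completion_tokens"]
print(f"total time: {elapsed * 1000:.2f} ms / {total_tokens} tokens "
      f"({total_tokens / elapsed:.2f} tokens per second, end-to-end)")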
nai-kon changed the title from "eval time" and "prompt eval time" shows 0.00ms after Ver0.3.0 to "eval time" and "prompt eval time" is 0.00ms after Ver0.3.0 on Nov 12, 2024

ddh0 (Contributor) commented Nov 28, 2024

Same issue here

nobelchowdary commented

Same issue here. Is there any resolution or fix for this?
