"eval time" and "prompt eval time" is 0.00ms after Ver0.3.0 #1830

Open
nai-kon opened this issue Nov 12, 2024 · 2 comments

nai-kon commented Nov 12, 2024

Since Ver0.3.0, the "eval time" and "prompt eval time" values of llama_print_timings are displayed as 0.00 ms.
At first I thought this was a problem in llama.cpp itself, but running the same model with llama.cpp directly prints the timings correctly (see the comparison below).

Here is the code and the results.

from llama_cpp import Llama

model = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf",
)
output = model(
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=128, # Generate up to 128 tokens; set to None to generate up to the end of the context window
    stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
    echo=True # Echo the prompt back in the output
)
print(output)

Environment:
  • Ubuntu 22.04
  • Python 3.10.12
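
To confirm which binding version is actually installed for each run below, the package version can be printed from Python (a small sketch; it assumes the llama_cpp package exposes __version__, which current releases do):

import llama_cpp

# Prints the installed llama-cpp-python version, e.g. "0.3.1"
print(llama_cpp.__version__)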

llama-cpp-python Ver0.2.90

llama_print_timings:      sample time =       2.79 ms /    33 runs   (    0.08 ms per token, 11819.48 tokens per second)
llama_print_timings: prompt eval time =   14805.69 ms /    13 tokens ( 1138.90 ms per token,     0.88 tokens per second)
llama_print_timings:        eval time =    3430.58 ms /    32 runs   (  107.21 ms per token,     9.33 tokens per second)
llama_print_timings:       total time =   18278.73 ms /    45 tokens

llama-cpp-python Ver0.3.0

llama_perf_context_print:        load time =   14788.07 ms
llama_perf_context_print: prompt eval time =       0.00 ms /    13 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /    48 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =   20017.62 ms /    61 tokens

llama-cpp-python Ver0.3.1

llama_perf_context_print:        load time =   14937.34 ms
llama_perf_context_print: prompt eval time =       0.00 ms /    13 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /    48 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =   20313.54 ms /    61 tokens

llama.cpp (latest master)

Command: ./llama-cli -m Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf -p "Q: Name the planets in the solar system? A: " -n 400 -e

llama_perf_context_print:        load time =    1450.48 ms
llama_perf_context_print: prompt eval time =     600.72 ms /    13 tokens (   46.21 ms per token,    21.64 tokens per second)
llama_perf_context_print:        eval time =   42424.41 ms /   399 runs   (  106.33 ms per token,     9.40 tokens per second)
llama_perf_context_print:       total time =   43197.46 ms /   412 tokens
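
As a stopgap until the built-in timings are fixed, overall throughput can still be measured with wall-clock timing around the call. Below is a minimal sketch using only the standard library and the token counts from the completion's "usage" field; it only gives end-to-end numbers, not the separate prompt eval / eval phases that llama_print_timings used to report.

import time

from llama_cpp import Llama

model = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf",
)

start = time.perf_counter()
output = model(
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=128,
    stop=["Q:", "\n"],
)
elapsed = time.perf_counter() - start

# The completion dict carries OpenAI-style token counts under "usage".
usage = output["usage"]
total_tokens = usage["prompt_tokens"] + usage["completion_tokens"]
print(f"total time: {elapsed * 1000:.2f} ms / {total_tokens} tokens "
      f"({total_tokens / elapsed:.2f} tokens per second, end-to-end)")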
nai-kon changed the title from "eval time" and "prompt eval time" shows 0.00ms after Ver0.3.0 to "eval time" and "prompt eval time" is 0.00ms after Ver0.3.0 on Nov 12, 2024

ddh0 (Contributor) commented Nov 28, 2024

Same issue here

nobelchowdary commented

Same issue here. Is there any resolution or fix for this?
