Since Ver0.3.0, the "eval time" and "prompt eval time" values of llama_print_timings are displayed as 0.00 ms.
At first I suspected a problem in llama.cpp itself, but the latest llama.cpp master prints the timings correctly.
Here are the code and results.
from llama_cpp import Llama

model = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf",
)
output = model(
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=128,     # Generate up to 128 tokens, set to None to generate up to the end of the context window
    stop=["Q:", "\n"],  # Stop generating just before the model would generate a new question
    echo=True,          # Echo the prompt back in the output
)
print(output)
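Until the built-in report is fixed, a rough workaround is to time the call above from Python and derive tokens per second from the token counts in the returned completion dict (the OpenAI-style "usage" field). This is only a wall-clock sketch that lumps prompt evaluation and generation together, not a replacement for llama.cpp's own timers:

import time
from llama_cpp import Llama

model = Llama(model_path="Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")

start = time.perf_counter()
output = model(
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=128,
    stop=["Q:", "\n"],
    echo=True,
)
elapsed = time.perf_counter() - start  # wall clock: prompt eval + generation combined

usage = output["usage"]  # prompt_tokens / completion_tokens / total_tokens
print(f"total time : {elapsed * 1000:.2f} ms")
print(f"tokens     : {usage['prompt_tokens']} prompt, {usage['completion_tokens']} generated")
print(f"throughput : {usage['total_tokens'] / elapsed:.2f} tokens per second (combined)")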
Ubuntu 22.04
Python 3.10.12
llama-cpp-python Ver0.2.90
llama_print_timings: sample time = 2.79 ms / 33 runs ( 0.08 ms per token, 11819.48 tokens per second)
llama_print_timings: prompt eval time = 14805.69 ms / 13 tokens ( 1138.90 ms per token, 0.88 tokens per second)
llama_print_timings: eval time = 3430.58 ms / 32 runs ( 107.21 ms per token, 9.33 tokens per second)
llama_print_timings: total time = 18278.73 ms / 45 tokens
llama-cpp-python Ver0.3.0
llama_perf_context_print: load time = 14788.07 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 13 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 48 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 20017.62 ms / 61 tokens
llama-cpp-python Ver0.3.1
llama_perf_context_print: load time = 14937.34 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 13 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 48 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 20313.54 ms / 61 tokens
llama.cpp (latest master)
exec command: ./llama-cli -m Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf -p "Q: Name the planets in the solar system? A: " -n 400 -e
llama_perf_context_print: load time = 1450.48 ms
llama_perf_context_print: prompt eval time = 600.72 ms / 13 tokens ( 46.21 ms per token, 21.64 tokens per second)
llama_perf_context_print: eval time = 42424.41 ms / 399 runs ( 106.33 ms per token, 9.40 tokens per second)
llama_perf_context_print: total time = 43197.46 ms / 412 tokens
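For reference, the per-token figures in these logs are simply the reported eval time divided by the number of generated tokens, so the Ver0.2.90 run and the llama.cpp master run can be compared directly; both land around 9.3 to 9.4 tokens per second on this machine, which is consistent with llama.cpp itself still measuring correctly. The numbers below are copied from the logs above:

# Re-derive the per-token eval figures from the logs above.
eval_runs = {
    "llama-cpp-python 0.2.90": (3430.58, 32),    # eval time in ms, generated tokens
    "llama.cpp latest master": (42424.41, 399),
}
for label, (ms, n_tokens) in eval_runs.items():
    print(f"{label}: {ms / n_tokens:.2f} ms per token, "
          f"{n_tokens / (ms / 1000):.2f} tokens per second")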