"eval time" and "prompt eval time" is 0.00ms after Ver0.3.0 #1830

Closed
@nai-kon

Description

After Ver0.3.0, "eval time" and "prompt eval time" of llama_print_timings are displayed as 0.00ms.
Firstly I thought it was a problem with llama.cpp, but it was displayed correctly.

Here is a code and results.

from llama_cpp import Llama

model = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf",
)
output = model(
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=128, # Generate up to 128 tokens; set to None to generate up to the end of the context window
    stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
    echo=True # Echo the prompt back in the output
)
print(output)
Environment:
  • Ubuntu 22.04
  • Python 3.10.12
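
For reference, measuring wall-clock time on the Python side still shows a plausible duration (the "total time" line below is also non-zero), so only the per-phase counters appear to be affected. A minimal cross-check sketch along these lines, reusing the model and prompt from the code above; the tokens/s figure is only approximate because the elapsed time also includes prompt processing:

import time
from llama_cpp import Llama

model = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf",
)

start = time.perf_counter()
output = model(
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=128,
    stop=["Q:", "\n"],
    echo=True,
)
elapsed = time.perf_counter() - start

# The returned dict follows the OpenAI completion format, so the token counts
# can still be read from its "usage" field even when the printed timings are zero.
usage = output["usage"]
print(f"completion tokens: {usage['completion_tokens']}")
print(f"wall-clock time:   {elapsed:.2f} s")
# Rough lower bound on decode speed (elapsed covers prompt eval + generation).
print(f"approx. speed:     {usage['completion_tokens'] / elapsed:.2f} tokens/s")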

llama-cpp-python Ver0.2.90

llama_print_timings:      sample time =       2.79 ms /    33 runs   (    0.08 ms per token, 11819.48 tokens per second)
llama_print_timings: prompt eval time =   14805.69 ms /    13 tokens ( 1138.90 ms per token,     0.88 tokens per second)
llama_print_timings:        eval time =    3430.58 ms /    32 runs   (  107.21 ms per token,     9.33 tokens per second)
llama_print_timings:       total time =   18278.73 ms /    45 tokens

llama-cpp-python Ver0.3.0

llama_perf_context_print:        load time =   14788.07 ms
llama_perf_context_print: prompt eval time =       0.00 ms /    13 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /    48 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =   20017.62 ms /    61 tokens

llama-cpp-python Ver0.3.1

llama_perf_context_print:        load time =   14937.34 ms
llama_perf_context_print: prompt eval time =       0.00 ms /    13 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /    48 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =   20313.54 ms /    61 tokens

llama.cpp (latest master)

Executed command: ./llama-cli -m Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf -p "Q: Name the planets in the solar system? A: " -n 400 -e

llama_perf_context_print:        load time =    1450.48 ms
llama_perf_context_print: prompt eval time =     600.72 ms /    13 tokens (   46.21 ms per token,    21.64 tokens per second)
llama_perf_context_print:        eval time =   42424.41 ms /   399 runs   (  106.33 ms per token,     9.40 tokens per second)
llama_perf_context_print:       total time =   43197.46 ms /   412 tokens
