The SYCL backend produces garbled results when the GPU memory allocation exceeds 4 GB.
main.exe -ngl 99 -n 512 -m "D:\models\mythomax-l2-13b.Q5_K_M.gguf" -f "G:\llama.cpp-vulkan\prompts\chat-with-bob.txt"
allocated a buffer of 8694.21 MiB and produced the following output:
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:
llama_print_timings: load time = 12957.00 ms
llama_print_timings: sample time = 42.91 ms / 512 runs ( 0.08 ms per token, 11932.78 tokens per second)
llama_print_timings: prompt eval time = 1798.23 ms / 99 tokens ( 18.16 ms per token, 55.05 tokens per second)
llama_print_timings: eval time = 33713.87 ms / 511 runs ( 65.98 ms per token, 15.16 tokens per second)
llama_print_timings: total time = 35658.27 ms / 610 tokens
Log end
With main.exe -ngl 8 -n 512 -m "D:\models\mythomax-l2-13b.Q5_K_M.gguf" -f "G:\llama.cpp-vulkan\prompts\chat-with-bob.txt"
this is the output I got:
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: Thank you very much, Bob.
Bob: You're welcome! Is there anything else I can assist you with? [end of text]
llama_print_timings: load time = 10251.58 ms
llama_print_timings: sample time = 3.63 ms / 26 runs ( 0.14 ms per token, 7154.65 tokens per second)
llama_print_timings: prompt eval time = 40286.66 ms / 99 tokens ( 406.94 ms per token, 2.46 tokens per second)
llama_print_timings: eval time = 6300.03 ms / 25 runs ( 252.00 ms per token, 3.97 tokens per second)
llama_print_timings: total time = 46600.27 ms / 124 tokens
Log end