
SYCL incoherent output on >4GB allocations of GPU memory #5250

Closed

Description

@Jacoby1218

SYCL is producing garbled results on GPU memory allocations above 4 GB.
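
This looks like it could be a per-allocation limit: many GPU drivers cap a single buffer at 4 GiB (SYCL exposes this as max_mem_alloc_size) even when total device memory is much larger, though I have not confirmed that this is what is being hit here. A minimal standalone SYCL sketch (not llama.cpp code) to check what the runtime reports:

#include <sycl/sycl.hpp>
#include <cstdint>
#include <iostream>

int main() {
    // Enumerate GPUs and print total memory next to the single-allocation cap.
    for (const auto &dev : sycl::device::get_devices(sycl::info::device_type::gpu)) {
        const auto mib = [](uint64_t bytes) { return bytes / (1024 * 1024); };
        std::cout << dev.get_info<sycl::info::device::name>() << "\n"
                  << "  global memory:    "
                  << mib(dev.get_info<sycl::info::device::global_mem_size>()) << " MiB\n"
                  << "  max single alloc: "
                  << mib(dev.get_info<sycl::info::device::max_mem_alloc_size>()) << " MiB\n";
    }
    return 0;
}

Built with icpx -fsycl, this shows whether the device caps single allocations well below the 8694.21 MiB buffer reported below.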

main.exe -ngl 99 -n 512 -m "D:\models\mythomax-l2-13b.Q5_K_M.gguf" -f "G:\llama.cpp-vulkan\prompts\chat-with-bob.txt" produced a buffer size of 8694.21 MiB, and this was the output:

Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:
llama_print_timings:        load time =   12957.00 ms
llama_print_timings:      sample time =      42.91 ms /   512 runs   (    0.08 ms per token, 11932.78 tokens per second)
llama_print_timings: prompt eval time =    1798.23 ms /    99 tokens (   18.16 ms per token,    55.05 tokens per second)
llama_print_timings:        eval time =   33713.87 ms /   511 runs   (   65.98 ms per token,    15.16 tokens per second)
llama_print_timings:       total time =   35658.27 ms /   610 tokens
Log end

With main.exe -ngl 8 -n 512 -m "D:\models\mythomax-l2-13b.Q5_K_M.gguf" -f "G:\llama.cpp-vulkan\prompts\chat-with-bob.txt", this is the output I got:

Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: Thank you very much, Bob.
Bob: You're welcome! Is there anything else I can assist you with? [end of text]

llama_print_timings:        load time =   10251.58 ms
llama_print_timings:      sample time =       3.63 ms /    26 runs   (    0.14 ms per token,  7154.65 tokens per second)
llama_print_timings: prompt eval time =   40286.66 ms /    99 tokens (  406.94 ms per token,     2.46 tokens per second)
llama_print_timings:        eval time =    6300.03 ms /    25 runs   (  252.00 ms per token,     3.97 tokens per second)
llama_print_timings:       total time =   46600.27 ms /   124 tokens
Log end
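
For what it's worth, a classic mechanism for "correct below 4 GB, garbage above" is a size or offset being narrowed to 32 bits somewhere in the buffer path; this is only a guess at the cause, not a confirmed diagnosis. A tiny illustration of the wraparound, using the buffer size from the -ngl 99 run:

#include <cstdint>
#include <iostream>

int main() {
    // The -ngl 99 run allocated an 8694.21 MiB buffer; in bytes:
    const uint64_t buf_bytes = static_cast<uint64_t>(8694.21 * 1024.0 * 1024.0);
    // If any size or offset on the path is narrowed to 32 bits, everything
    // past each 4 GiB boundary wraps around, so reads and writes land in
    // the wrong place, which would be consistent with garbled output.
    const uint32_t wrapped = static_cast<uint32_t>(buf_bytes);
    std::cout << "64-bit size: " << buf_bytes << " bytes\n"
              << "32-bit wrap: " << wrapped   << " bytes\n";
    return 0;
}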
