
Java tests failed when CUDA enabled on version 3.0.0 #54

Closed
RFYoung opened this issue Apr 8, 2024 · 5 comments · Fixed by #58 or #60

Comments

RFYoung commented Apr 8, 2024

Hello!

I really appreciate that you have upgraded this project!

However, two tests still fail: testGenerateInfill and testCompleteInfillCustom (they boil down to roughly the call sketched after the log below). The output looks something like this:

{"tid":"130286006306496","timestamp":1712589265,"level":"INFO","function":"update_slots","line":1772,"msg":"all slots are idle"}
{"tid":"130286006306496","timestamp":1712589265,"level":"INFO","function":"launch_slot_with_task","line":1066,"msg":"slot is processing task","id_slot":0,"id_task":21}
{"tid":"130286006306496","timestamp":1712589265,"level":"INFO","function":"update_slots","line":2082,"msg":"kv cache rm [p0, end)","id_slot":0,"id_task":21,"p0":0}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000767f016b3d0f, pid=20857, tid=20912
#
# JRE version: OpenJDK Runtime Environment (22.0+36) (build 22+36)
# Java VM: OpenJDK 64-Bit Server VM (22+36, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libllama.so+0x125d0f]  dequantize_row_q4_K+0x4f
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/yys/java-llama.cpp/core.20857)
#
# An error report file with more information is saved as:
# /home/yys/java-llama.cpp/hs_err_pid20857.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
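
For reference, the two failing tests boil down to roughly the following call. This is a rough sketch from memory rather than the exact test code; the model path is a placeholder and some setter names of the 3.0.0 API may differ slightly:

import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaOutput;
import de.kherud.llama.ModelParameters;

public class InfillCrashSketch {
    public static void main(String[] args) {
        // Placeholder path; the real unit tests download their own model.
        ModelParameters modelParams = new ModelParameters()
                .setModelFilePath("models/model.gguf")
                .setNGpuLayers(43);
        // Setting an input prefix/suffix routes the request through the infill code path.
        InferenceParameters inferParams = new InferenceParameters("")
                .setInputPrefix("def remove_non_ascii(s: str) -> str:\n    \"\"\" ")
                .setInputSuffix("\n    return result\n")
                .setTemperature(0.95f)
                .setNPredict(10)
                .setSeed(42);
        try (LlamaModel model = new LlamaModel(modelParams)) {
            for (LlamaOutput output : model.generate(inferParams)) {
                System.out.print(output);
            }
        }
    }
}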

I built with the command cmake .. -DBUILD_SHARED_LIBS=ON -DLLAMA_CUDA=ON -DLLAMA_CURL=ON.

Also, I tested vanilla llama.cpp at tag b2619 with the same build arguments and equivalent inference arguments (shown below), and it ran without crashing:

./server -m PATH_TO_LLAMA_CHAT -ngl 43 --embeddings

and

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{                                                                                                                                                                              
        "n_predict": 10,
        "input_prefix": "def remove_non_ascii(s: str) -> str:\n    \"\"\" ",
        "logit_bias": [[2, 2.0]],
        "stop": ["\"\"\""],
        "seed": 42,
        "input_suffix": "\n    return result\n",
        "temperature": 0.95,
        "prompt": ""
}'

All other Java tests pass, though.

Thanks!

RFYoung (author) commented Apr 8, 2024

Here is the log file:
hs_err_pid20857.log

kherud (owner) commented Apr 9, 2024

Damn, I didn't test thoroughly enough with CUDA, but I can reproduce the problems, thanks for reporting. It seems to be related to input_prefix and input_suffix being set, but I haven't found the reason yet. The strings are correctly transferred to C++ and tokenized equivalently. I think I've tracked the segmentation fault down to this line:

const int ret = llama_decode(ctx, batch_view);

kherud (owner) commented Apr 15, 2024

It turns out this is a bug in llama.cpp after all, and I've created an issue there (see ggml-org/llama.cpp#6672).

It didn't produce a crash in your vanilla test because the /infill endpoint has to be used instead of /completion to trigger it.

The problem only seems to occur with models that don't support infilling, which unfortunately is the case for the model used in the unit tests. However, everything works correctly with models that support infilling (e.g. codellama).
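
In the meantime, pointing your own code at an infill-capable GGUF avoids the crash, e.g. something along these lines (the file name is a placeholder and any CodeLlama-style model with infill tokens should work; I'm assuming the 3.0.0 ModelParameters setters here):

ModelParameters modelParams = new ModelParameters()
        .setModelFilePath("models/codellama-7b.Q2_K.gguf") // placeholder; any infill-capable GGUF
        .setNGpuLayers(43);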

kherud (owner) commented Apr 21, 2024

I changed the model used for testing to codellama, so there shouldn't be a segmentation fault anymore. However, I'm leaving this issue open until the underlying bug is fixed in llama.cpp.

josh-ramer commented

I think I've diagnosed the issue and pointed to the tag that fixed it in the related thread: ggml-org/llama.cpp#6672
