Interrupting inference? #62


Closed

jrtp opened this issue May 15, 2024 · 7 comments · Fixed by #63
Comments

jrtp commented May 15, 2024

Nice work :)
I did not find a way to cleanly interrupt inference; I could only suppress output from the iterable loop. Is this somehow possible?

kherud (Owner) commented May 15, 2024

Hey @jrtp, thanks for the issue; this was indeed not possible before. However, it can now be done like this:

// Get the iterator explicitly instead of using a for-each loop,
// so that cancel() can be called on it.
LlamaIterator iterator = model.generate(params).iterator();
while (iterator.hasNext()) {
    LlamaOutput output = iterator.next();
    System.out.println(output);

    // Stand-in stop condition; after cancel(), hasNext() returns false.
    if (Math.random() < 0.5) {
        iterator.cancel();
    }
}

Note that there was a slight API change from LlamaModel.Output to LlamaOutput.

Maven version 3.1.0 should soon be available.
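
In practice the Math.random() stand-in above would be replaced by a real stop signal, for example a flag toggled from another thread. Below is a minimal sketch of that pattern; the imports assume the binding's usual de.kherud.llama package and an InferenceParameters request object, and the CancellableGeneration class with its stopRequested flag is purely illustrative, not part of the binding:

import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaIterator;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaOutput;

class CancellableGeneration {
    // Illustrative stop flag, e.g. set by a UI "stop" button or a timeout thread.
    private volatile boolean stopRequested = false;

    void requestStop() {
        stopRequested = true;
    }

    void generate(LlamaModel model, InferenceParameters params) {
        LlamaIterator iterator = model.generate(params).iterator();
        while (iterator.hasNext()) {
            LlamaOutput output = iterator.next();
            System.out.print(output);
            if (stopRequested) {
                iterator.cancel(); // hasNext() returns false afterwards
                break;
            }
        }
    }
}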

kherud reopened this May 15, 2024
jrtp (Author) commented May 16, 2024

Awesome, thanks for the quick implementation. I just tried it. After the update, model loading failed with the previous llama.cpp I had compiled with GPU support on Windows; after recompiling, it seemed to work, but for some reason inference suddenly stops in the middle. Any ideas what it could be?

// token, canceled, and Main.logLine are defined elsewhere in the surrounding code
LlamaIterator iterator = model.generate(inferParams).iterator();
while (iterator.hasNext()) {
    token = String.valueOf(iterator.next());
    Main.logLine(token);
    if (canceled) {
        iterator.cancel();
        break;
    }
}

Same behaviour with the shipped llama.cpp (so no -Dde.kherud.llama.lib.path set): inference just stops, only a lot slower ;)

kherud (Owner) commented May 16, 2024

It's expected that the previous shared library doesn't work anymore, since I upgraded the binding to the latest available llama.cpp version in 3.1.0.

From the code you gave, it's hard to tell why it suddenly stops. If it isn't stopped on purpose via the canceled flag, maybe your inferParams are the reason.

If you can give more details, I can later try to reproduce the problem:

  • Which model
  • Which inferParams
  • Ideally also the prompt

jrtp (Author) commented May 16, 2024

Weird, I just switched back and forth between the dependencies and now it just flies without any obvious change. Thanks again!

jrtp (Author) commented May 16, 2024

OK, hopefully a last question. Everything seems to work now, except that exactly the same run.bat as before now throws this error: Could not find or load main class .kherud.llama.lib.path=out
The cmd option used is -Dde.kherud.llama.lib.path=out/
Without the option it works, but without GPU. Weirdly enough, it also works with GPU when started from IntelliJ with that option. Any ideas how that could be affected? I triple-checked; with 3.0.2 this option just worked.
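
Two general points apply to options like this: -D system properties only take effect when they appear before the main class or -jar argument of the java command, and cmd.exe splits batch-file arguments on = (among other delimiters), so the whole option may need quoting inside run.bat. The property can also be set programmatically, assuming the binding reads it when it first loads its native library (an assumption about load order, not something confirmed in this thread); a minimal sketch:

public class LibPathExample {
    public static void main(String[] args) {
        // Programmatic equivalent of -Dde.kherud.llama.lib.path=out/ on the
        // command line. Assumption: it must be set before the first
        // de.kherud.llama class is used, since the native library is loaded then.
        System.setProperty("de.kherud.llama.lib.path", "out/");

        // ... construct the LlamaModel and run inference as usual afterwards ...
    }
}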

jrtp (Author) commented May 16, 2024

FYI, user error: for some reason IntelliJ had both 3.0.2 and 3.1.0 in the artifacts. I didn't know this isn't updated automatically when changing Maven dependencies.

kherud (Owner) commented May 18, 2024

Great, glad to hear everything works now!

kherud closed this as completed May 18, 2024