Interrupting inference? #62
Nice work :) I did not find a way to cleanly interrupt inference; I could only suppress the output inside the iterable loop. Is this somehow possible?
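For illustration, the suppression workaround might have looked something like this sketch (assuming a loaded `model` and inference `params`, a hypothetical `shouldStop` condition, and that `generate(...)` can be iterated with for-each, as the answer below suggests):

```java
// Sketch of the workaround: breaking out of the loop suppresses further
// output, but it does not interrupt the inference running underneath.
for (LLamaOutput output : model.generate(params)) {
    if (shouldStop) {
        break; // stop printing; generation itself is not cancelled
    }
    System.out.println(output);
}
```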
Hey @jrtp, thanks for the issue. This was indeed not possible; however, it can now be done like this:

```java
LlamaIterator iterator = model.generate(params).iterator();
while (iterator.hasNext()) {
    LLamaOutput output = iterator.next();
    System.out.println(output);
    if (Math.random() < 0.5) {
        iterator.cancel();
    }
}
```

Note that there was a slight API change from Maven version …
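The `Math.random()` condition above is just for demonstration. A more realistic sketch, assuming the same `LlamaIterator` API and a hypothetical stop flag toggled from another thread (a UI button, a timeout, ...), could look like this:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical flag; another thread calls stopRequested.set(true) to interrupt.
AtomicBoolean stopRequested = new AtomicBoolean(false);

LlamaIterator iterator = model.generate(params).iterator();
while (iterator.hasNext()) {
    LLamaOutput output = iterator.next();
    System.out.print(output);
    if (stopRequested.get()) {
        iterator.cancel(); // per the example above, iteration ends after cancelling
    }
}
```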
Awesome, thanks for the quick implementation! I just tried it. After the update, model loading was interrupted with the previous llama.cpp I had compiled with GPU support on Windows; after recompiling, it seemed to work, but for some reason inference suddenly stops in the middle. Any ideas what it could be?
Same behaviour with the shipped llama.cpp (so no -Dde.kherud.llama.lib.path set): inference just stops, only a lot slower ;)
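As a side note, a quick way to see which native library a run is using is to print the system property that the flag sets (the property name `de.kherud.llama.lib.path` is taken from the thread; that an unset property falls back to the shipped build is an assumption):

```java
// Prints the configured native library path, or a fallback note when the
// property is unset (presumably meaning the bundled llama.cpp is used).
String libPath = System.getProperty("de.kherud.llama.lib.path");
System.out.println(libPath == null
        ? "de.kherud.llama.lib.path not set, using the bundled llama.cpp"
        : "Loading llama.cpp from: " + libPath);
```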
It's expected that the previous shared library doesn't work anymore, since I upgraded the binding to the latest available llama.cpp version. From the code you gave, it's hard to tell why it suddenly stops, if it's not done on purpose via iterator.cancel(). If you can give more details, I can later try to reproduce the problem.
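For anyone debugging the same symptom, one simple check (a sketch reusing `model` and `params` from above) is to consume the whole stream without ever calling `cancel()`; if generation still stops mid-sentence, the cancellation API is not the cause:

```java
// Consume the full stream without cancelling anything. If output still
// stops mid-generation, the early stop is unrelated to iterator.cancel().
int chunks = 0;
for (LLamaOutput output : model.generate(params)) {
    System.out.print(output);
    chunks++;
}
System.out.println("\nFinished normally after " + chunks + " chunks");
```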
Weird, I just switched back and forth between the dependency versions and now it just flies, without any obvious change. Thanks again!
OK, hopefully the last question: now everything seems to work, except that exactly the same run.bat as before now throws this error: `Could not find or load main class .kherud.llama.lib.path=out`
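For context, one plausible reading of this error: the `java` launcher treats the first argument it does not recognize as a JVM option as the main class name, so if the `-Dde` prefix of `-Dde.kherud.llama.lib.path=out` gets split off or mangled (for example by quoting in run.bat), the leftover `.kherud.llama.lib.path=out` is parsed as the class to run. As the next comment shows, the actual culprit here turned out to be something else.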
FYI, user error: for some reason IntelliJ had both 3.0.2 and 3.1.0 in the artifacts. I didn't know these aren't updated automatically when changing Maven dependencies.
Great, glad to hear everything works now!