guide : using the new WebUI of llama.cpp #16938
Replies: 17 comments 11 replies
-
Does anyone have a neat example to share for constrained output using the custom JSON option of the WebUI? Something that would be suitable for demonstration purposes.
-
I tried this one. Inside Developer / Custom JSON, with the prompt "you feel good ?", the model answered (on SvelteUI):
-
Not sure if this is a neat example, but something easy you can do with vision LLMs is extract data from images in a structured way. Add this to Developer / Custom JSON:

```json
{
  "json_schema": {
    "$defs": {
      "Address": {
        "properties": {
          "street": { "title": "Street", "type": "string" },
          "city": { "title": "City", "type": "string" },
          "state": { "title": "State", "type": "string" },
          "zip_code": { "title": "Zip Code", "type": "string" }
        },
        "required": ["street", "city", "state", "zip_code"],
        "title": "Address",
        "type": "object"
      },
      "BillTo": {
        "properties": {
          "company_name": { "title": "Company Name", "type": "string" },
          "address": { "$ref": "#/$defs/Address" },
          "attention": { "title": "Attention", "type": "string" }
        },
        "required": ["company_name", "address", "attention"],
        "title": "BillTo",
        "type": "object"
      },
      "Company": {
        "properties": {
          "name": { "title": "Name", "type": "string" },
          "address": { "$ref": "#/$defs/Address" },
          "phone": { "title": "Phone", "type": "string" },
          "email": { "title": "Email", "type": "string" }
        },
        "required": ["name", "address", "phone", "email"],
        "title": "Company",
        "type": "object"
      },
      "InvoiceLine": {
        "properties": {
          "description": { "title": "Description", "type": "string" },
          "quantity": { "title": "Quantity", "type": "integer" },
          "rate": { "anyOf": [{ "type": "number" }, { "type": "string" }], "title": "Rate" },
          "amount": { "anyOf": [{ "type": "number" }, { "type": "string" }], "title": "Amount" }
        },
        "required": ["description", "quantity", "rate", "amount"],
        "title": "InvoiceLine",
        "type": "object"
      },
      "PaymentMethods": {
        "properties": {
          "bank_account": { "anyOf": [{ "type": "string" }, { "type": "null" }], "default": null, "title": "Bank Account" },
          "routing_number": { "anyOf": [{ "type": "string" }, { "type": "null" }], "default": null, "title": "Routing Number" },
          "check_payable_to": { "anyOf": [{ "type": "string" }, { "type": "null" }], "default": null, "title": "Check Payable To" }
        },
        "title": "PaymentMethods",
        "type": "object"
      }
    },
    "properties": {
      "invoice_number": { "title": "Invoice Number", "type": "string" },
      "invoice_date": { "format": "date", "title": "Invoice Date", "type": "string" },
      "due_date": { "format": "date", "title": "Due Date", "type": "string" },
      "company": { "$ref": "#/$defs/Company" },
      "bill_to": { "$ref": "#/$defs/BillTo" },
      "lines": { "items": { "$ref": "#/$defs/InvoiceLine" }, "title": "Lines", "type": "array" },
      "subtotal": { "anyOf": [{ "type": "number" }, { "type": "string" }], "title": "Subtotal" },
      "tax_rate": { "anyOf": [{ "type": "number" }, { "type": "string" }], "title": "Tax Rate" },
      "tax_amount": { "anyOf": [{ "type": "number" }, { "type": "string" }], "title": "Tax Amount" },
      "total": { "anyOf": [{ "type": "number" }, { "type": "string" }], "title": "Total" },
      "payment_terms": { "title": "Payment Terms", "type": "string" },
      "payment_methods": { "$ref": "#/$defs/PaymentMethods" },
      "notes": { "anyOf": [{ "type": "string" }, { "type": "null" }], "default": null, "title": "Notes" }
    },
    "required": ["invoice_number", "invoice_date", "due_date", "company", "bill_to", "lines", "subtotal", "tax_rate", "tax_amount", "total", "payment_terms", "payment_methods"],
    "title": "Invoice",
    "type": "object"
  }
}
```

Then, with a model that supports vision (Qwen3-VL-8B should work), paste this image, and it will output the invoice data without requiring any instructions:

```json
{
  "invoice_number": "INV-2024-0847",
  "invoice_date": "2025-07-29",
  "due_date": "2025-08-28",
  "company": {
    "name": "Acme Corporation",
    "address": { "street": "123 Business Street", "city": "New York", "state": "NY", "zip_code": "10001" },
    "phone": "(555) 123-4567",
    "email": "[email protected]"
  },
  "bill_to": {
    "company_name": "Tech Solutions Inc.",
    "address": { "street": "456 Innovation Drive", "city": "San Francisco", "state": "CA", "zip_code": "94105" },
    "attention": "John Smith"
  },
  "lines": [
    { "description": "Web Development Services", "quantity": 40, "rate": 150.00, "amount": 6000.00 },
    { "description": "UI/UX Design", "quantity": 20, "rate": 125.00, "amount": 2500.00 },
    { "description": "Database Setup", "quantity": 8, "rate": 100.00, "amount": 800.00 },
    { "description": "Monthly Hosting", "quantity": 1, "rate": 250.00, "amount": 250.00 }
  ],
  "subtotal": 9550.00,
  "tax_rate": 8.5,
  "tax_amount": 811.75,
  "total": 10361.75,
  "payment_terms": "Net 30 days. 1.5% late fee per month on overdue balances.",
  "payment_methods": {
    "bank_account": "Account #123456789, Routing #987654321",
    "check_payable_to": "Acme Corporation"
  },
  "notes": "Thank you for your business!"
}
```

One problem with this is that the output is not wrapped in a fenced `json` Markdown block, so you get no syntax highlighting. This could be improved if the WebUI had native support for passing a JSON schema and, when enabled, displayed the output in a specialized JSON viewer, such as this one.
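When the schema is enforced server-side the output is guaranteed to parse, but it can still be useful to sanity-check required fields on the client. A minimal sketch in Python (stdlib only; the `check_required` helper and the trimmed schema are illustrative, not part of llama.cpp):

```python
def check_required(schema, data, defs=None):
    """Return a list of missing required keys, resolving local #/$defs references."""
    defs = defs if defs is not None else schema.get("$defs", {})
    missing = []
    for key in schema.get("required", []):
        if key not in data:
            missing.append(key)
            continue
        sub = schema["properties"][key]
        if "$ref" in sub:  # resolve "#/$defs/Name"
            sub = defs[sub["$ref"].rsplit("/", 1)[-1]]
        if sub.get("type") == "object" and isinstance(data[key], dict):
            missing += [f"{key}.{m}" for m in check_required(sub, data[key], defs)]
    return missing

# Trimmed stand-in for the invoice schema above
schema = {
    "$defs": {
        "Address": {
            "properties": {"street": {"type": "string"}, "city": {"type": "string"}},
            "required": ["street", "city"],
            "type": "object",
        }
    },
    "properties": {
        "invoice_number": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
    "required": ["invoice_number", "address"],
    "type": "object",
}

doc = {"invoice_number": "INV-2024-0847", "address": {"street": "123 Business Street"}}
print(check_required(schema, doc))  # → ['address.city']
```

This only walks `required` keys and nested objects; for full validation you would reach for a proper JSON Schema validator.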
-
I love the look of this. Could you add a "Continue Assistant Response" kind of button? It helps to steer the AI toward the specific formatting you want at the beginning of a conversation if you can edit its response and then have it continue the output.
-
How do you enable parallel conversations? Do I need to use a specific parameter when launching the server?
-
Congratulations guys, this looks absolutely amazing! :D Can't wait to use it.
-
πππ
-
Excellent work! It strikes the right balance between functionality, a simple user experience, and performance. Admittedly, this is outside the scope of the project, but I would appreciate the option of deploying this interface in standalone mode, separate from llama.cpp, with third-party OpenAI API support.
-
Implement more agents for the GUI, like mini-swe-agent, and/or make a GUI for trae: https://github.com/bytedance/trae-agent
-
Is there an option to add a search URL or something to search the web?
-
Kudos guys, this rocks!
-
I created a step-by-step installation and testing video for this llama.cpp WebUI: https://youtu.be/1H1gx2A9cww?si=bJwf8-QcVSCutelf Thanks.
-
Error: "the request exceeds the available context size, try increasing it". So I can only use a chat as long as it fits in the context size? Context-window shifting would be really nice, so that e.g. with a 16k context one can write on and on and the AI always knows the newest context (the earliest messages beyond context_size - max_output (e.g. 2048) are deleted from the KV cache). I tried using
Btw, I like that llama.cpp now has its own UI for chats (I'm switching from koboldcpp). I really like llama.cpp for its VRAM efficiency (using CUDA on an NVIDIA consumer card).
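The trimming this comment describes (drop the oldest messages once the prompt would exceed context_size - max_output) can also be approximated client-side before each request. A rough Python sketch; the `trim_history` helper and the crude 4-characters-per-token estimate are illustrative assumptions, not llama.cpp behavior:

```python
def trim_history(messages, ctx_size, max_output,
                 est_tokens=lambda m: len(m["content"]) // 4 + 1):
    """Keep the newest messages whose estimated token count fits in
    ctx_size - max_output; older messages are dropped."""
    budget = ctx_size - max_output
    kept, total = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        t = est_tokens(msg)
        if total + t > budget:
            break
        kept.append(msg)
        total += t
    return list(reversed(kept))    # restore chronological order

history = [{"role": "user", "content": "x" * 400}] * 10  # ~101 est. tokens each
print(len(trim_history(history, ctx_size=512, max_output=128)))  # → 3
```

A real implementation would use the server's tokenizer for exact counts instead of a character heuristic.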
-
Where can I get the whole list of commands?
-
Off the scale - thank you for all you do!
-
Still no video input support, I think; Qwen3-VL supports video understanding.
-
Does it have web search as a tool?

Overview
This guide highlights the key features of the new SvelteKit-based WebUI of llama.cpp. The new WebUI, in combination with the advanced backend capabilities of llama-server, delivers the ultimate local AI chat experience. A few characteristics that set this project ahead of the alternatives:
Getting started
Get llama.cpp: Install | Download | Build
Start the llama-server tool:

```shell
# sample server running gpt-oss-20b at http://127.0.0.1:8033
llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 0 --host 127.0.0.1 --port 8033
```

Open and start using the WebUI in your browser:
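The WebUI is one client among others: the same server also speaks an OpenAI-compatible HTTP API. A minimal sketch using only Python's standard library and the server address from the command above; the `build_payload`/`chat` helper names are mine, not part of llama.cpp:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8033"  # matches the llama-server command above

def build_payload(prompt):
    """Request body for the OpenAI-compatible chat endpoint."""
    return {"messages": [{"role": "user", "content": prompt}]}

def chat(prompt, base_url=BASE_URL):
    """Send a single-turn chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Say hello in one word."))` should return a short completion from gpt-oss-20b.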
Tip
For a simple, GUI-based setup of llama.cpp on Mac, try the new LlamaBarn application.
Features
The new WebUI is packed with many useful features to enhance your local AI experience. Following are a few examples.
Text document processing
Add multiple text files from disk or from the clipboard to the context of your conversation:
PDF document processing
Attach one or multiple PDFs to your conversation. By default, the contents of the PDFs will be converted to RAW text, excluding any visuals.
Optionally, the WebUI can process the PDFs as images when the AI model supports it.
Image inputs
When the selected AI model has vision input capabilities, the WebUI allows you to insert images into your conversation:
Images can be inserted in addition to a textual context:
Conversation branching
Branch from previous points of the conversation by editing or regenerating a message:
webui-edits-0-thumb-small.mp4
Parallel conversations
Run multiple chat conversations at the same time:
webui-parallel-0-thumb-small.mp4
Parallel image processing is also supported:
webui-parallel-1-thumb-small.mp4
Override default sampling parameters
Start the llama-server using a set of default sampling parameters:

```shell
# set the default Top-K to be 5 and the default Temperature to be 0.80
llama-server -hf ggml-org/gpt-oss-120b-GGUF --jinja -c 0 --port 8033 --alias gpt-oss-120b --top-k 5 --temp 0.80
```

These parameters will now become the default values in the WebUI settings:
webui-parameters-0-thumb-small.mp4
More info: #16515
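Command-line samplers act as server-side defaults, and individual requests can still override them. A sketch of such a request body for llama-server's native /completion endpoint; the `completion_payload` helper is mine, while the `prompt`, `n_predict`, `top_k`, and `temperature` fields follow the llama-server README:

```python
import json

def completion_payload(prompt, **sampling):
    """Native llama-server /completion request body;
    sampling keys (e.g. top_k, temperature) override the server defaults."""
    payload = {"prompt": prompt, "n_predict": 128}
    payload.update(sampling)
    return payload

body = completion_payload("Once upon a time", top_k=40, temperature=0.2)
print(json.dumps(body, indent=2))
```

POSTing this body to http://127.0.0.1:8033/completion would sample with Top-K 40 and temperature 0.2 regardless of the `--top-k 5 --temp 0.80` defaults set above.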
Render math expressions
The WebUI can render mathematical expressions:
Input via URL parameters
The WebUI supports passing input through the URL parameters:
webui-url-input-0-thumb-small.mp4
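Such links can be generated programmatically. A sketch, assuming Python; note the query-parameter name (`q` here) is an assumption on my part and should be verified against your WebUI version:

```python
from urllib.parse import urlencode

def webui_link(base, prompt):
    """Build a WebUI link that pre-fills the chat input via a URL parameter.
    NOTE: the parameter name 'q' is assumed, not confirmed from the source."""
    return f"{base}/?{urlencode({'q': prompt})}"

print(webui_link("http://127.0.0.1:8033", "Explain the KV cache in one paragraph"))
```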
HTML/JS preview
The WebUI supports inline rendering of generated HTML/JS code:
webui-js-0-thumb-small.mp4
More info: #16757
Constrained generation
Specify a custom JSON schema to constrain the generated output to a specific format. As an example, here is generic invoice data extraction from multiple documents:
webui-constrained-0-thumb-small.mp4
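Over the HTTP API the same constraint is expressed by attaching a schema to the request. A sketch assuming the native /completion endpoint's `json_schema` field; the `constrained_payload` helper is mine, and the schema is a trimmed stand-in for the invoice example:

```python
def constrained_payload(prompt, schema):
    """llama-server /completion request whose output is constrained
    to match the given JSON schema."""
    return {"prompt": prompt, "json_schema": schema}

invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}

body = constrained_payload("Extract the invoice data.", invoice_schema)
```

The server compiles the schema into a grammar, so the generated text is guaranteed to be valid JSON matching it.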
Import/Export
Use the Import/Export options to manage your private conversations directly through the WebUI:
Efficient SSM context management
The context management and prefix caching of State Space Models (SSMs, e.g. Mamba) can be tricky. llama-server solves this problem efficiently for one or multiple users with minimal reprocessing. Here is an example of context branching using a hybrid LLM:
webui-ssm-0-thumb-small.mp4
Mobile compatibility
The new WebUI is mobile friendly:
Sample commands
A few llama-server commands used for the examples above:
Acknowledgements