LM Studio Manual
lmstudio.js
📄️ Quick Start Guide: Minimal setup to get started with the LM Studio SDK.
📄️ Add to an Existing Project: Adding the LM Studio SDK to an existing project.
📄️ Code Examples: Examples of how to use the LM Studio JavaScript SDK.
Quick Start Guide
Download LM Studio
If you haven't already, download and install the latest version of LM Studio from the LM
Studio website.
Set up LM Studio CLI ( lms )

Run the bootstrap command for your platform:

Windows:
cmd /c %USERPROFILE%/.cache/lm-studio/bin/lms.exe bootstrap

Linux/macOS:
~/.cache/lm-studio/bin/lms bootstrap
Open up a new terminal and run the following command to verify that the CLI is
installed:
lms
Install the SDK in your project (TypeScript is recommended, but plain JavaScript works too):

npm install @lmstudio/sdk

Then start the local LLM server:

lms server start
Code!
Add the following code to your project's entrypoint file ( src/index.ts or
src/index.js ):
// index.js
const { LMStudioClient } = require("@lmstudio/sdk");

async function main() {
  // Connect to the LM Studio local server
  const client = new LMStudioClient();

  // Load a model
  const model = await client.llm.load("lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF");

  // Predict!
  const prediction = model.respond([
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: "What is the meaning of life?" },
  ]);
  for await (const text of prediction) {
    process.stdout.write(text);
  }
}

main();
🎉 That's it! You have successfully loaded a model and made a prediction from your own
JavaScript/TypeScript program.
Add to an Existing Project
Here you'll find the minimal steps to add the LM Studio SDK to an existing
TypeScript/JavaScript project.
Download LM Studio
If you haven't already, download and install the latest version of LM Studio from the LM
Studio website.
Set up LM Studio CLI ( lms )

Run the bootstrap command for your platform:

Windows:
cmd /c %USERPROFILE%/.cache/lm-studio/bin/lms.exe bootstrap

Linux/macOS:
~/.cache/lm-studio/bin/lms bootstrap
Open up a new terminal and run the following command to verify that the CLI is
installed:
lms
Add the SDK to Your Project

Install the SDK with npm or yarn:

npm:
npm install @lmstudio/sdk

yarn:
yarn add @lmstudio/sdk

Start the local LLM server:

lms server start

If you are developing a web application, you should start the server with the --cors flag
instead:

lms server start --cors
Code!
Add the following code where you want to use the LLM:
// index.js
const { LMStudioClient } = require("@lmstudio/sdk");

async function main() {
  // Connect to the LM Studio local server
  const client = new LMStudioClient();

  // Load a model
  const model = await client.llm.load("lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF");

  // Predict!
  const prediction = model.respond([
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: "What is the meaning of life?" },
  ]);
  for await (const text of prediction) {
    process.stdout.write(text);
  }
}

main();
Now, run your program and watch an LLM explaining the meaning of life.
🎉 That's it! You have successfully loaded a model and made a prediction from your own
JavaScript/TypeScript program.
Code Examples

Loading an LLM and Predicting with It
const { LMStudioClient } = require("@lmstudio/sdk");

async function main() {
  const client = new LMStudioClient();

  // Load a model
  const llama3 = await client.llm.load("lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF", {
    config: { gpuOffload: "max" },
  });

  // Predict and stream the output to the terminal
  const prediction = llama3.complete("The meaning of life is");
  for await (const text of prediction) {
    process.stdout.write(text);
  }
}

main();

ABOUT process.stdout.write
process.stdout.write is a Node.js method that prints text without appending a newline, which
is why it is used here to stream the prediction as it is generated.
To keep a model loaded after your client exits, pass the noHup option:

await client.llm.load("lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF", {
  config: { gpuOffload: "max" },
  noHup: true,
});

To give a loaded model a friendly name, pass the identifier option:

await client.llm.load("lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF", {
  config: { gpuOffload: "max" },
  identifier: "my-model",
});

To customize how loading progress is reported, disable the default logging and provide an
onProgress callback:

const llama3 = await client.llm.load("lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF", {
  config: { gpuOffload: "max" },
  verbose: false, // Disables the default progress logging
  onProgress: (progress) => {
    console.log(`Progress: ${(progress * 100).toFixed(1)}%`);
  },
});
Canceling a Load
You can cancel a load by using an AbortController.
const controller = new AbortController();

try {
  const llama3 = await client.llm.load("lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF", {
    signal: controller.signal,
  });
  // llama3.complete(...);
} catch (error) {
  console.error(error);
}

// Somewhere else in your code:
controller.abort();
ABOUT AbortController
AbortController is a standard JavaScript API for signaling that an asynchronous operation
should be canceled.
Unloading a Model
You can unload a model by calling the unload method.
const llama3 = await client.llm.load("lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF", {
  identifier: "my-model",
});

// ...Do stuff...

await client.llm.unload("my-model");
Note, by default, all models loaded by a client are unloaded when the client disconnects.
Therefore, unless you want to precisely control the lifetime of a model, you do not need
to unload them manually.
If you wish to keep a model loaded after disconnection, you can set the noHup
option to true when loading the model.
Text Completion
To perform text completion, use the complete method:
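For example (a minimal sketch: it assumes llama3 is a model handle obtained from
client.llm.load as shown above, and the prompt is only illustrative):

const prediction = llama3.complete("The meaning of life is");

// Stream the completion to the terminal as it is generated
for await (const text of prediction) {
  process.stdout.write(text);
}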
By default, the inference parameters in the preset are used for the prediction. You can
override them like this:
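For instance (a sketch: maxPredictedTokens is taken from the structured-output example
later on this page, while temperature is an assumed option name that may differ in your
SDK version):

const prediction = llama3.complete("The meaning of life is", {
  maxPredictedTokens: 100,
  temperature: 0.7, // assumed option name; check your SDK version
});

for await (const text of prediction) {
  process.stdout.write(text);
}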
Conversation
To perform a conversation, use the respond method:
Similarly, you can override the inference parameters for the conversation (note that the
available options differ from those for text completion):
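A sketch of both, reusing the message format from the Quick Start Guide (passing an
options object as the second argument, and the option names themselves, are assumptions
that may differ in your SDK version):

const prediction = llama3.respond(
  [
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: "What is the meaning of life?" },
  ],
  {
    maxPredictedTokens: 256, // assumed to be accepted by respond as well
  },
);

for await (const text of prediction) {
  process.stdout.write(text);
}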
LLMs are stateless. They do not remember or retain information from previous
inputs. Therefore, when predicting with an LLM, you should always provide the full
history/context.
Getting Prediction Stats
If you wish to get the prediction statistics, you can await on the prediction object to get a
PredictionResult , through which you can access the stats via the stats property.
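A sketch (it assumes llama3 is a loaded model handle; the exact fields inside stats depend
on your SDK version):

const prediction = llama3.complete("The meaning of life is");

// Consume the stream first
for await (const text of prediction) {
  process.stdout.write(text);
}

// Awaiting the prediction object yields a PredictionResult
const result = await prediction;
console.log(result.stats); // e.g. stop reason and token counts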
NO EXTRA WAITING
When you have already consumed the prediction stream, awaiting on the prediction
object will not cause any extra waiting, as the result is cached within the prediction
object.
On the other hand, if you only care about the final result, you don't need to iterate
through the stream. Instead, you can await on the prediction object directly to get
the final result.
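For example (a sketch; the content field holding the full generated text is an assumption):

// Or just:
const result = await llama3.complete("The meaning of life is");
console.log(result.content); // assumed field with the complete generated text
console.log(result.stats);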
Producing JSON (Structured Output)
LM Studio supports structured prediction, which will force the model to produce content
that conforms to a specific structure. To enable structured prediction, you should set the
structured field. It is available for both complete and respond methods.
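For example, to ask for any valid JSON without a particular schema (a sketch based on the
schema example below; passing type: "json" on its own is an assumption):

const prediction = llama3.complete("Here is a joke in JSON:", {
  maxPredictedTokens: 100,
  structured: { type: "json" },
});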
Sometimes, any JSON is not enough. You might want to enforce a specific JSON schema.
You can do this by providing a JSON schema to the structured field. Read more about
JSON schema at json-schema.org.
const schema = {
type: "object",
properties: {
setup: { type: "string" },
punchline: { type: "string" },
},
required: ["setup", "punchline"],
};
const prediction = llama3.complete("Here is a joke in JSON:", {
maxPredictedTokens: 100,
structured: { type: "json", jsonSchema: schema },
});
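The result can then be awaited and parsed like any other prediction (a sketch; the content
field is an assumption):

const result = await prediction;
const joke = JSON.parse(result.content); // content is assumed to hold the raw JSON text
console.log(joke.setup);
console.log(joke.punchline);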
Canceling a Prediction
A prediction may be canceled by calling the cancel method on the prediction object.
When a prediction is canceled, the prediction will stop normally but with stopReason set
to "userStopped" . You can detect cancellation like so:
LM Studio Documentation

Get started by downloading the latest release.

Learn about:
1. LM Studio OpenAI-like Server - /v1/chat/completions , /v1/completions ,
/v1/embeddings with Llama 3, Phi-3 or any other local LLM with a server running
on localhost.
2. Text Embeddings - Generate text embeddings locally using LM Studio's
embeddings server (useful for RAG applications)
3. Use LLMs programmatically from JS/TS/Node - Load and use LLMs programmatically
in your own code
Supported Platforms
Windows (x86, x64, AVX2)
macOS (Apple Silicon - M1/M2/M3)
Linux (x86, Ubuntu 22.04, AVX2)
Local LLM Server

You can use LLMs you load within LM Studio via an API server running on localhost.

Requests and responses follow OpenAI's API format.

Point any code that currently uses OpenAI to localhost:PORT to use a local LLM
instead.
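For example, a minimal TypeScript sketch using the official openai npm package (the
package, the placeholder API key, and the model identifier are illustrative assumptions;
any OpenAI-compatible client should work the same way):

import OpenAI from "openai";

// Point the OpenAI client at LM Studio's local server instead of api.openai.com
const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "lm-studio", // placeholder; the local server does not require a real key
});

async function main() {
  const completion = await client.chat.completions.create({
    // Use the identifier of a model you have loaded in LM Studio
    model: "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q4_k_m.gguf",
    messages: [
      { role: "system", content: "You are a helpful coding assistant." },
      { role: "user", content: "How do I init and update a git submodule?" },
    ],
    temperature: 0.7,
  });
  console.log(completion.choices[0].message.content);
}

main();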
Supported endpoints
GET /v1/models
POST /v1/chat/completions
POST /v1/embeddings
POST /v1/completions
Check which models are currently loaded

curl http://localhost:1234/v1/models
{
"data": [
{
"id": "TheBloke/phi-2-GGUF/phi-2.Q4_K_S.gguf",
"object": "model",
"owned_by": "organization-owner",
"permission": [
{}
]
},
{
"id": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q4_k_m.gguf",
"object": "model",
"owned_by": "organization-owner",
"permission": [
{}
]
}
],
"object": "list"
}
Make an inferencing request (using OpenAI's 'Chat
Completions' format)
In this example the local server is running on port 1234 . You can change it in the
server control bar in the app.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful coding assistant." },
      { "role": "user", "content": "How do I init and update a git submodule?" }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": true
  }'
Supported payload parameters

model
top_p
top_k
messages
temperature
max_tokens
stream
stop
presence_penalty
frequency_penalty
logit_bias
repeat_penalty
seed
Text Embeddings

Getting Text Embeddings from LM Studio's Local Server

Text embeddings are a way to represent text as a vector of numbers.

Embeddings are frequently used in Retrieval Augmented Generation (RAG) applications.

Read on to learn how to generate text embeddings fully locally using LM Studio's
embeddings server.
The request and response formats follow OpenAI's API format. Read about it here.
Example use-cases include RAG applications, code-search applications, and any other
application that requires text embeddings.
How-To
LM Studio 0.2.19 or newer is required. Download the beta version from
lmstudio.ai/beta-releases.html
1. Head to the Local Server tab ( <-> on the left) and start the server.
2. Load a text embedding model by choosing it from the Embedding Model Settings
dropdown.
3. Utilize the POST /v1/embeddings endpoint to get embeddings for your text.
Example request:
Assuming the server is listening on port 1234
Supported input types are string and string[] (array of strings)
curl http://localhost:1234/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "Your text string goes here",
"model": "model-identifier-here"
}'
Example response:
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [
-0.005118194036185741,
-0.05910402536392212,
... truncated ...
-0.02389773353934288
],
"index": 0
}
],
"model": "nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q5_K_
"usage": {
"prompt_tokens": 0,
"total_tokens": 0
}
}
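The same request can also be made from TypeScript; a minimal sketch using the built-in
fetch API (it assumes Node 18+ and mirrors the curl example above; the model identifier
is a placeholder):

async function getEmbedding(text: string): Promise<number[]> {
  const response = await fetch("http://localhost:1234/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      input: text,
      model: "model-identifier-here",
    }),
  });
  const json = await response.json();
  // The embedding vector lives at data[0].embedding, as in the example response above
  return json.data[0].embedding;
}

getEmbedding("Your text string goes here").then((embedding) => {
  console.log(`Received a vector with ${embedding.length} dimensions`);
});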
Featured models:
nomic-embed-text-v1.5
bge-large-en-v1.5
https://platform.openai.com/docs/guides/embeddings
https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-
embeddings/