Credit goes to docs.ollama.com

Ollama provides compatibility with parts of the OpenAI API to help connect existing applications to Ollama.

Usage

Simple v1/chat/completions example

basic.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='gpt-oss:20b',
)
print(chat_completion.choices[0].message.content)

Simple v1/responses example

responses.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

responses_result = client.responses.create(
    model='qwen3:8b',
    input='Write a short poem about the color blue',
)
print(responses_result.output_text)

v1/chat/completions with vision example

vision.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

response = client.chat.completions.create(
    model='qwen3-vl:8b',
    messages=[
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': "What's in this image?"},
                {
                    'type': 'image_url',
                    'image_url': '',  # image data is missing here; supply a base64 data URI, e.g. 'data:image/png;base64,...'
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)

Endpoints

/v1/chat/completions

Supported features

  • Chat completions
  • Streaming
  • JSON mode
  • Reproducible outputs
  • Vision
  • Tools
  • Logprobs

Supported request fields

  • model
  • messages
    • Text content
    • Image content
      • Base64 encoded image
      • Image URL
    • Array of content parts
  • frequency_penalty
  • presence_penalty
  • response_format
  • seed
  • stop
  • stream
  • stream_options
    • include_usage
  • temperature
  • top_p
  • max_tokens
  • tools
  • tool_choice
  • logit_bias
  • user
  • n
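
The stream and stream_options fields above change the response into a sequence of server-sent `data:` lines. The sketch below shows one way to build such a request body and reassemble the streamed text. It assumes the stream follows the usual OpenAI chunk format (`choices[0].delta.content`, terminated by `data: [DONE]`); the helper names `build_stream_request` and `iter_stream_text` are hypothetical and not part of Ollama or the OpenAI SDK.

```python
import json
from typing import Iterable, Iterator


def build_stream_request(model: str, prompt: str) -> dict:
    """Build a /v1/chat/completions body that streams and reports usage."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "stream_options": {"include_usage": True},
        "seed": 42,        # fixed seed, for reproducible outputs
        "temperature": 0,
    }


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from the `data:` lines of a streamed response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        # With include_usage, the final chunk is assumed to carry no choices.
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

To use it against a live server, POST the body to http://localhost:11434/v1/chat/completions and pass the decoded response lines to `iter_stream_text`.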

/v1/completions

Supported features

  • Completions
  • Streaming
  • JSON mode
  • Reproducible outputs
  • Logprobs

Supported request fields

  • model
  • prompt
  • frequency_penalty
  • presence_penalty
  • seed
  • stop
  • stream
  • stream_options
    • include_usage
  • temperature
  • top_p
  • max_tokens
  • suffix
  • best_of
  • echo
  • logit_bias
  • user
  • n

Notes

  • prompt currently only accepts a string
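
Because prompt only accepts a string, a client may want to reject lists before sending. This is a minimal sketch using only the standard library; `build_completion_payload` and `post_json` are hypothetical helper names, and the URL assumes the default local server used elsewhere on this page.

```python
import json
import urllib.request


def build_completion_payload(model: str, prompt: str, **options) -> dict:
    """Build a /v1/completions body, rejecting non-string prompts early."""
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a single string")
    return {"model": model, "prompt": prompt, **options}


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_completion_payload(
        "llama3.2", "Say this is a test", max_tokens=32, seed=1
    )
    result = post_json("http://localhost:11434/v1/completions", body)
    print(result["choices"][0]["text"])
```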

/v1/models

Notes

  • created corresponds to when the model was last modified
  • owned_by corresponds to the ollama username, defaulting to "library"

/v1/models/{model}

Notes

  • created corresponds to when the model was last modified
  • owned_by corresponds to the ollama username, defaulting to "library"
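
Since created holds the model's last-modified time, it can be turned into a readable timestamp. The sketch below assumes the response follows the OpenAI list shape (a `data` array of objects with `id`, `created` as Unix seconds, and `owned_by`); `summarize_models` is a hypothetical helper and the sample payload is made up for illustration.

```python
from datetime import datetime, timezone


def summarize_models(payload: dict) -> list[tuple[str, str, str]]:
    """Return (id, owner, last-modified ISO timestamp) for each listed model."""
    rows = []
    for model in payload.get("data", []):
        # `created` is assumed to be Unix seconds, as in the OpenAI API.
        modified = datetime.fromtimestamp(model["created"], tz=timezone.utc)
        rows.append((model["id"], model["owned_by"], modified.isoformat()))
    return rows


# Made-up sample response, for illustration only.
sample = {
    "object": "list",
    "data": [
        {"id": "llama3.2:latest", "object": "model",
         "created": 86400, "owned_by": "library"},
    ],
}

for model_id, owner, modified in summarize_models(sample):
    print(f"{model_id}  owner={owner}  modified={modified}")
```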

/v1/embeddings

Supported request fields

  • model
  • input
    • string
    • array of strings
    • array of tokens
    • array of token arrays
  • encoding_format
  • dimensions
  • user
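
A sketch of sending an array of strings and comparing the returned vectors. It assumes the response uses the OpenAI shape (`data[i].index` and `data[i].embedding`); the helper names are hypothetical, and the model name in the usage note is an assumption, so substitute any embedding model you have pulled.

```python
import math


def build_embeddings_payload(model: str, texts: list[str], dimensions=None) -> dict:
    """Build a /v1/embeddings body for one or more input strings."""
    payload = {"model": model, "input": texts}
    if dimensions is not None:
        payload["dimensions"] = dimensions
    return payload


def extract_vectors(response: dict) -> list[list[float]]:
    """Return embeddings ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For example, `build_embeddings_payload('all-minilm', ['why is the sky blue?', 'why is the grass green?'])` produces a body that can be POSTed to http://localhost:11434/v1/embeddings.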

/v1/images/generations (experimental)

Note: This endpoint is experimental and may change or be removed in future versions.
Generate images using image generation models.
images.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

response = client.images.generate(
    model='x/z-image-turbo',
    prompt='A cute robot learning to paint',
    size='1024x1024',
    response_format='b64_json',
)
print(response.data[0].b64_json[:50] + '...')

Supported request fields

  • model
  • prompt
  • size (e.g. “1024x1024”)
  • response_format (only b64_json supported)
  • n
  • quality
  • style
  • user
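
Because only b64_json is supported, the image arrives as a base64 string that the client must decode. A minimal sketch for writing it to disk follows; `save_b64_image` is a hypothetical helper, and the output file extension is an assumption since the image format is not stated here.

```python
import base64
from pathlib import Path


def save_b64_image(b64_json: str, path: str) -> int:
    """Decode a base64 image payload, write it to `path`, return the byte count."""
    data = base64.b64decode(b64_json)
    Path(path).write_bytes(data)
    return len(data)
```

With the images.py example above, this could be called as `save_b64_image(response.data[0].b64_json, 'robot.png')`.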

/v1/responses

Note: Added in Ollama v0.13.3
Ollama supports the OpenAI Responses API. Only the non-stateful flavor is supported (i.e., there is no previous_response_id or conversation support).

Supported features

  • Streaming
  • Tools (function calling)
  • Reasoning summaries (for thinking models)
  • Stateful requests (not supported)

Supported request fields

  • model
  • input
  • instructions
  • tools
  • stream
  • temperature
  • top_p
  • max_output_tokens
  • previous_response_id (stateful v1/responses not supported)
  • conversation (stateful v1/responses not supported)
  • truncation
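
Since previous_response_id and conversation are not supported, multi-turn state has to be kept by the client and resent on each call. The sketch below assumes input accepts a list of role/content messages, as the OpenAI Responses API does; `add_turn` and `build_responses_payload` are hypothetical helper names.

```python
def add_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Return a new history list with one more message appended."""
    return history + [{"role": role, "content": content}]


def build_responses_payload(model: str, history: list[dict], instructions=None) -> dict:
    """Build a stateless /v1/responses body that carries the full history."""
    payload = {"model": model, "input": history}
    if instructions:
        payload["instructions"] = instructions
    return payload


# Each request resends every earlier turn instead of a previous_response_id.
history = add_turn([], "user", "Write a short poem about the color blue")
history = add_turn(history, "assistant", "(previous model output goes here)")
history = add_turn(history, "user", "Now make it rhyme")
payload = build_responses_payload("qwen3:8b", history, instructions="Be concise.")
```

The resulting payload can be passed to `client.responses.create(**payload)` or POSTed to http://localhost:11434/v1/responses.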

Models

Before using a model, pull it locally with ollama pull:
ollama pull llama3.2

Default model names

For tooling that relies on default OpenAI model names such as gpt-3.5-turbo, use ollama cp to copy an existing model name to a temporary name:
ollama cp llama3.2 gpt-3.5-turbo
Afterwards, this new model name can be specified in the model field:
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

Setting the context size

The OpenAI API has no parameter for setting a model's context size. To change the context size, create a Modelfile like the following:
FROM <some model>
PARAMETER num_ctx <context size>
Use the ollama create mymodel command to create a new model with the updated context size. Call the API with the updated model name:
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mymodel",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
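
The Modelfile steps above can also be scripted. This is a sketch under the assumption that the ollama CLI is on the PATH; `render_modelfile` and `create_model` are hypothetical helper names, and the context size in the usage note is only an example value.

```python
import subprocess
from pathlib import Path


def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text that overrides the context size of a base model."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive integer")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def create_model(name: str, base_model: str, num_ctx: int, directory: str = ".") -> None:
    """Write the Modelfile and run `ollama create <name> -f <Modelfile>`."""
    modelfile = Path(directory) / "Modelfile"
    modelfile.write_text(render_modelfile(base_model, num_ctx))
    subprocess.run(["ollama", "create", name, "-f", str(modelfile)], check=True)
```

Calling `create_model('mymodel', 'llama3.2', 8192)` would then produce the mymodel name used in the request above.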