feat: add structured output to openai #603

bfdykstra · 2025-01-06T18:47:54Z

Description

Adds a StructuredOutputChatOpenAI class that returns responses as JSON matching a user-supplied schema (response_schema), so downstream applications can parse the output directly.

Simple example usage

import json
import os

from pydantic import BaseModel

from kotaemon.llms import StructuredOutputChatOpenAI


class StructuredAnswer(BaseModel):
    answer: str


structured_llm = StructuredOutputChatOpenAI(
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
    temperature=1,
    api_key=os.environ.get("OPENAI_API_KEY"),
    response_schema=StructuredAnswer,
)

# Top-level await works in a notebook; in a script, call this from an async function.
answer = await structured_llm.ainvoke("Hello how are you?")

print(json.loads(answer.content))
# -> {'answer': "I'm just a computer program, but I'm here and ready to help you! How can I assist you today?"}
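Since answer.content arrives as a JSON string, a caller may want to check its shape before using it. The following is a minimal sketch of that step using only the standard library. The dataclass is a stand-in for the Pydantic model above, and parse_structured_content and sample_content are hypothetical names introduced for illustration; they are not part of this PR.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class StructuredAnswer:
    # Stand-in for the Pydantic StructuredAnswer model (assumption for this sketch).
    answer: str


def parse_structured_content(content: str) -> StructuredAnswer:
    """Parse a JSON string and check that it has exactly the expected fields."""
    data = json.loads(content)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(StructuredAnswer)}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    if not isinstance(data["answer"], str):
        raise ValueError("'answer' must be a string")
    return StructuredAnswer(**data)


# Hypothetical content string, shaped like the output shown above.
sample_content = '{"answer": "Ready to help."}'
parsed = parse_structured_content(sample_content)
print(parsed.answer)
```

If Pydantic v2 is in use, StructuredAnswer.model_validate_json(answer.content) on the original model should perform the same parsing and validation in one call.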

Example usage in a retrieval pipeline

import json
import os

from pydantic import BaseModel

from kotaemon.storages.docstores import LanceDBDocumentStore
from kotaemon.storages.vectorstores import ChromaVectorStore
from kotaemon.embeddings.openai import OpenAIEmbeddings
from kotaemon.indices.qa.format_context import PrepareEvidencePipeline
from kotaemon.indices.qa.citation_qa import AnswerWithContextPipeline
from kotaemon.indices.rankings import LLMTrulensScoring
from kotaemon.llms.chats.openai import StructuredOutputChatOpenAI, ChatOpenAI

from ktem.ktem.index.file.pipelines import DocumentRetrievalPipeline
from ktem.ktem.reasoning.simple import FullQAPipeline

# document store
app_dir = "<path to your app data>/kotaemon/ktem_app_data/"
user_data_dir = app_dir + "user_data/"
doc_store_dir = user_data_dir + "docstore/"
doc_store = LanceDBDocumentStore(path=doc_store_dir, collection_name="index_1")

# vector store
vector_store_dir = user_data_dir + "vectorstore"
vector_store = ChromaVectorStore(path=vector_store_dir, collection_name="index_1")

llm = ChatOpenAI(
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.environ.get("OPENAI_API_KEY"),
)
llm_scorer = LLMTrulensScoring(llm=llm)

# embeddings
embedding = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model="text-embedding-ada-002",
    api_key=os.environ.get("OPENAI_API_KEY"),
    context_length=8191,
)

# document retrieval pipeline
document_retrieval = DocumentRetrievalPipeline(
    embedding=embedding,
    retrieval_mode="vector",  # can be "vector" or "text"
    vector_store=vector_store,
    doc_store=doc_store,
    top_k=5,
    rerankers=[],  # can provide rerankers, e.g. [cohere_reranking]
    llm_scorer=llm_scorer,
)

# pipeline that formats retrieved content
evidence_pipeline = PrepareEvidencePipeline()

class StructuredAnswer(BaseModel):
    answer: str


structured_llm = StructuredOutputChatOpenAI(
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
    temperature=1,
    api_key=os.environ.get("OPENAI_API_KEY"),
    response_schema=StructuredAnswer,
)

# answer questions with provided evidence
answer_pipeline = AnswerWithContextPipeline(
    llm=structured_llm,
    qa_template=(
        "Context: \n{context}\n\n"
        "{question}\n"
    ),
)

qa_pipeline = FullQAPipeline(
    retrievers=[document_retrieval],
    evidence_pipeline=evidence_pipeline,
    answering_pipeline=answer_pipeline
)

prompt = 'This is a prompt'

# fetch relevant document ids and implement invoke method
answer, scored_docs = qa_pipeline.invoke(prompt, document_ids=[])
        
parsed_answer = json.loads(answer.content)

Type of change

New features (non-breaking change).
Bug fix (non-breaking change).
Breaking change (fix or feature that would cause existing functionality not to work as expected).

Checklist

I have performed a self-review of my code.
I have added thorough tests if it is a core feature.
There is a reference to the original bug report and related work.
I have commented on my code, particularly in hard-to-understand areas.
The feature is well documented.

taprosoft · 2025-04-15T07:54:16Z

Sorry for the long wait, @bfdykstra. Thanks for the great contribution and documentation.

bfdykstra added 2 commits January 6, 2025 10:26

add structured output to openai

3ebc021

remove notebook, modify prepare output method

a016a28

bfdykstra changed the title ~~[Feature] add structured output to openai~~ feat: add structured output to openai Jan 9, 2025

fix: comfort precommit

6adaac7

taprosoft merged commit 9b05693 into Cinnamon:main Apr 15, 2025
5 checks passed

bfdykstra mentioned this pull request Jun 27, 2025

[REQUEST] Without WebUI, calling the kotaemon API interface locally for RAG #741

Open
