Conversation

@bfdykstra
Contributor

Description

  • Adds a StructuredOutputChatOpenAI class so that downstream applications can consume JSON output

Simple example usage

import json
import os

from pydantic import BaseModel

from kotaemon.llms import StructuredOutputChatOpenAI

class StructuredAnswer(BaseModel):
    answer: str

structured_llm = StructuredOutputChatOpenAI(
    base_url='https://api.openai.com/v1',
    model = 'gpt-4o-mini',
    temperature= 1,
    api_key = os.environ.get('OPENAI_API_KEY'),
    response_schema=StructuredAnswer
)

# Top-level await works in a notebook or IPython; in a script, run this inside an async function.
answer = await structured_llm.ainvoke('Hello how are you?')

print(json.loads(answer.content))
# -> {'answer': "I'm just a computer program, but I'm here and ready to help you! How can I assist you today?"}
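
Since `answer.content` is a JSON string, it can also be validated back into the Pydantic model instead of being read as a plain dict. The following is a minimal sketch, assuming Pydantic v2 (`model_validate_json`). The `parse_answer` helper and the `raw_content` value are hypothetical stand-ins for illustration, so no API call is made:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class StructuredAnswer(BaseModel):
    answer: str


def parse_answer(raw_content: str) -> Optional[StructuredAnswer]:
    """Validate a JSON string against StructuredAnswer.

    Returns None when the string is not valid JSON or does not match the schema.
    """
    try:
        return StructuredAnswer.model_validate_json(raw_content)
    except ValidationError:
        return None


# Hypothetical stand-in for answer.content
raw_content = '{"answer": "I am ready to help."}'
parsed = parse_answer(raw_content)
print(parsed.answer if parsed else "invalid response")
```

Returning `None` on failure is only one possible design; a caller that prefers to fail loudly could let the `ValidationError` propagate instead.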

Example usage in a retrieval pipeline

import json
import os

from pydantic import BaseModel

from kotaemon.storages.docstores import LanceDBDocumentStore
from kotaemon.storages.vectorstores import ChromaVectorStore
from kotaemon.embeddings.openai import OpenAIEmbeddings
from ktem.ktem.index.file.pipelines import DocumentRetrievalPipeline
from kotaemon.indices.qa.format_context import PrepareEvidencePipeline
from kotaemon.indices.qa.citation_qa import AnswerWithContextPipeline
from kotaemon.llms.chats.openai import StructuredOutputChatOpenAI, ChatOpenAI

from ktem.ktem.reasoning.simple import FullQAPipeline

from kotaemon.indices.rankings import LLMTrulensScoring

app_dir = "<path to your app data>/kotaemon/ktem_app_data/"
user_data_dir = app_dir + "user_data/"
doc_store_dir = user_data_dir + "docstore/"
doc_store = LanceDBDocumentStore(path = doc_store_dir, collection_name="index_1")

# vector store stuff
vector_store_dir = user_data_dir + "vectorstore"

vector_store = ChromaVectorStore(path = vector_store_dir, collection_name="index_1")

llm = ChatOpenAI(
    base_url='https://api.openai.com/v1',
    model = 'gpt-4o-mini',
    temperature= 0,
    api_key = os.environ.get('OPENAI_API_KEY'),
)
llm_scorer = LLMTrulensScoring( llm = llm )

#embeddings
embedding = OpenAIEmbeddings(
    base_url='https://api.openai.com/v1',
    model = 'text-embedding-ada-002',
    api_key=os.environ.get('OPENAI_API_KEY'),
    context_length=8191)


# document retrieval pipeline
document_retrieval = DocumentRetrievalPipeline(
    embedding = embedding,
    retrieval_mode = 'vector', # can be vector or text
    vector_store = vector_store,
    doc_store = doc_store,
    top_k=5,
    rerankers=[],  # optionally pass rerankers, e.g. [cohere_reranking]
    llm_scorer=llm_scorer,
)

# pipeline that formats retrieved content
evidence_pipeline = PrepareEvidencePipeline()

class StructuredAnswer(BaseModel):
    answer: str

structured_llm = StructuredOutputChatOpenAI(
    base_url='https://api.openai.com/v1',
    model = 'gpt-4o-mini',
    temperature= 1,
    api_key = os.environ.get('OPENAI_API_KEY'),
    response_schema=StructuredAnswer
)

# answer questions with provided evidence
answer_pipeline = AnswerWithContextPipeline(
    llm=structured_llm,
    qa_template= (
            "Context: \n{context}\n\n"
            "{question}\n"
        )
)

qa_pipeline = FullQAPipeline(
    retrievers=[document_retrieval],
    evidence_pipeline=evidence_pipeline,
    answering_pipeline=answer_pipeline
)

prompt = 'This is a prompt'

# fetch relevant document ids and implement invoke method
answer, scored_docs = qa_pipeline.invoke(prompt, document_ids=[])
        
parsed_answer = json.loads(answer.content)
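
To check that `parsed_answer` has the expected shape before handing it to downstream code, the required fields can be read from the schema that the Pydantic model implies. This is a rough sketch, assuming Pydantic v2 (`model_json_schema`). The JSON string is a hypothetical stand-in for `answer.content`, and how `StructuredOutputChatOpenAI` uses the schema internally is not described in this PR:

```python
import json

from pydantic import BaseModel


class StructuredAnswer(BaseModel):
    answer: str


# JSON Schema derived from the Pydantic model
schema = StructuredAnswer.model_json_schema()

# Hypothetical stand-in for answer.content returned by the pipeline
parsed_answer = json.loads('{"answer": "Paris"}')

# List any required fields that the response is missing
missing = [key for key in schema.get("required", []) if key not in parsed_answer]
print(missing)
```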

Type of change

  • New features (non-breaking change).
  • Bug fix (non-breaking change).
  • Breaking change (fix or feature that would cause existing functionality not to work as expected).

Checklist

  • I have performed a self-review of my code.
  • I have added thorough tests if it is a core feature.
  • There is a reference to the original bug report and related work.
  • I have commented on my code, particularly in hard-to-understand areas.
  • The feature is well documented.

@bfdykstra bfdykstra changed the title from "[Feature] add structured output to openai" to "feat: add structured output to openai" on Jan 9, 2025
@taprosoft
Contributor

Sorry for the long wait, @bfdykstra. Thanks for the great contribution and documentation.

@taprosoft taprosoft merged commit 9b05693 into Cinnamon:main Apr 15, 2025
5 checks passed