feat: support structured outputs in OpenAIChatGenerator
#9754
Conversation
Pull Request Test Coverage Report for Build 17760482097 (Coveralls)
releasenotes/notes/add-openai-structured-outputs-906be78a984a1079.yaml (review comments resolved)
I left some comments.
I also suggest merging main to see the actual coverage. My impression is that several new code paths are not covered by unit tests. I would like to have them covered, since this component is crucial.
if response_format:
    if is_streaming:
        raise ValueError(
            "OpenAI does not support `streaming_callback` with `response_format`, please choose one."
I would like to understand better. It seems that OpenAI supports streaming + structured outputs.
If we are making this choice for simplicity reasons, I would be more specific: "The OpenAIChatGenerator does not ..."
Hmm, interesting. I think because of the beta code example in the documentation I misunderstood that it's an unstable feature (not available in the Completions API). But you are right, it's supported. I'll update the function to enable this.
- I did not notice that it was beta.
- It might also be reasonable to skip it if it requires significantly rewriting the component. (Unfortunately, we have many other components depending on this implementation.)
- If there are stable ways to do this, let's do it. Otherwise, let's create an issue to track this once that API is no longer in beta.
- I looked into their stream function, and it looks like the stable version also supports `response_format`.
- I will test this locally and update the code.
@anakin87 Looked into this and here are some points:
- The `response_format` with streaming will be passed to `chat.completions.create`. This endpoint allows `response_format` to be either a JSON schema or `{ "type": "json_object" }`, but it cannot be a `pydantic.BaseModel`.
- From the documentation, it seems like the beta version supports Pydantic models with the `stream` endpoint, which I don't want to introduce for the reasons you mentioned above.

So for now, I believe we can support the first point and mention the limitation in the docstrings. In any case, the error is handled by OpenAI itself if the user passes a Pydantic model with streaming.
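For readers following along, here is a minimal sketch of the two call shapes discussed above. The model name and schema are illustrative only, not the PR's actual code; the `response_format` wire shape matches OpenAI's documented structured-outputs format.

```python
# Streaming structured outputs: response_format must be a JSON-schema dict
# (or {"type": "json_object"}) and goes to chat.completions.create.
json_schema_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

streaming_args = {
    "model": "gpt-4o-mini",  # illustrative model name
    "stream": True,
    "response_format": json_schema_format,
}
# client.chat.completions.create(messages=..., **streaming_args)   # OK: schema dict + stream

# Non-streaming: a pydantic.BaseModel can be passed to chat.completions.parse,
# but a BaseModel combined with stream=True on create is rejected by OpenAI.
# client.chat.completions.parse(messages=..., model=..., response_format=PersonModel)
```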
Agree... Let's do what you propose and mention in the docstrings that a Pydantic `response_format` won't work with streaming.
if "stream" in api_args.keys():
    chat_completion = self.client.chat.completions.create(**api_args)
I find this hard to understand. Would something similar to this work?
We could always return a dictionary with the same fields from `_prepare_api_call`.

if api_args.get("response_format"):
    # We cannot pass the stream param to the chat.completions.parse endpoint
    api_args.pop("stream", None)
    chat_completion = self.client.chat.completions.parse(**api_args)
else:
    ...

(I might be wrong; in any case, I'd appreciate it if we can make the code more intuitive to follow.)
Hmm, we allow passing `response_format` with the `stream` param to the `create` endpoint for streaming structured outputs, so this won't work.
Ok, I now understand, but it's hard to follow.
What I would recommend is to:
- include in `api_args` an item called `endpoint`/`method` containing "parse" or "create" (in `_prepare_api_call`)
- add comments where this value is set to explain why we are doing that
- reuse the value in `run`
@@ -5,7 +5,9 @@
import os
from typing import Any, Optional, Union

from openai.lib._pydantic import to_strict_json_schema
Is there a different way to import this function that doesn't go through a private file? I'm a little worried the import path is subject to break/change
Hmm, for now it seems like this is the only way to import this.
Hmm, I think we can do this a different way. We should be able to directly use the one from Pydantic, which looks like `parameters_schema = model.model_json_schema()`. This is from the `_create_tool_parameters_schema` function in `ComponentTool`.
OpenAI expects a stricter JSON Schema than Pydantic's default. For example, objects must set `additionalProperties`, and `Optional` keys are handled differently. As a result, `model_json_schema()` output often isn't accepted as-is.
It's also discussed here, where another solution is offered, but I prefer using the OpenAI method over some unpopular library.
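To illustrate the gap (this is a simplified sketch, not the actual `to_strict_json_schema` implementation from the `openai` package): OpenAI's strict structured-output mode requires every object schema to set `"additionalProperties": false` and to list every property as required, while a default Pydantic schema typically does neither.

```python
def strictify(schema: dict) -> dict:
    """Recursively add the constraints OpenAI's strict mode expects (simplified)."""
    out = dict(schema)
    if out.get("type") == "object":
        props = out.get("properties", {})
        out["additionalProperties"] = False      # forbid extra keys
        out["required"] = list(props)            # strict mode: every key is required
        out["properties"] = {k: strictify(v) for k, v in props.items()}
    elif out.get("type") == "array" and "items" in out:
        out["items"] = strictify(out["items"])
    return out

# A Pydantic-style default schema for something like:
#   class Person(BaseModel): name: str; age: int = 0
default_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],  # Pydantic only requires fields without defaults
}
strict_schema = strictify(default_schema)
```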
Nevertheless, I spotted a bug in `to_dict` where the schema wasn't stored properly. Fixing this.
Ahh okay, thanks for the info.
I left some comments/suggestions, also taking #9776 into account.
Looks good!
Once you've incorporated my final suggestions, feel free to merge.
Related Issues

Proposed Changes:
- Support `response_format` in `OpenAIChatGenerator` and `AzureOpenAIChatGenerator`.

How did you test it?
- Tested `response_format` using a Pydantic model and a JSON schema.

Notes for the reviewer
Checklist
- The PR title uses one of the conventional commit prefixes (`fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`) and adds `!` in case the PR includes breaking changes.