Conversation

@Amnah199 (Contributor) commented Sep 2, 2025

Related Issues

Proposed Changes:

  • Add support for structured outputs using response_format in OpenAIChatGenerator and AzureOpenAIChatGenerator.
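
A minimal usage sketch of what this enables (illustrative only: the CityInfo model is made up, and this assumes response_format is accepted via generation_kwargs like other OpenAI parameters):

```python
from pydantic import BaseModel

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Illustrative schema for the structured output
class CityInfo(BaseModel):
    city: str
    country: str

# Requires OPENAI_API_KEY to be set in the environment
generator = OpenAIChatGenerator(
    model="gpt-4o-mini",
    generation_kwargs={"response_format": CityInfo},
)
result = generator.run(messages=[ChatMessage.from_user("Tell me about Paris.")])
print(result["replies"][0].text)
```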

How did you test it?

  • Added tests for response_format using a Pydantic model and a JSON schema.

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@github-actions bot added the type:documentation (Improvements on the docs) label on Sep 2, 2025
@coveralls (Collaborator) commented Sep 2, 2025

Pull Request Test Coverage Report for Build 17578922709

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 39 unchanged lines in 6 files lost coverage.
  • Overall coverage decreased (-0.003%) to 92.063%

Files with Coverage Reduction (new missed lines, resulting coverage):
  • core/pipeline/utils.py: 1 (98.53%)
  • tools/component_tool.py: 2 (93.83%)
  • components/generators/chat/azure.py: 4 (93.44%)
  • components/generators/chat/openai.py: 7 (96.2%)
  • core/super_component/super_component.py: 8 (95.92%)
  • core/pipeline/breakpoint.py: 17 (86.67%)

Totals Coverage Status
  • Change from base Build 17465550683: -0.003%
  • Covered Lines: 12992
  • Relevant Lines: 14112

💛 - Coveralls

@Amnah199 Amnah199 marked this pull request as ready for review September 2, 2025 13:45
@Amnah199 Amnah199 requested a review from a team as a code owner September 2, 2025 13:45
@Amnah199 Amnah199 requested review from davidsbatista and removed request for a team September 2, 2025 13:45
@anakin87 (Member) left a comment:

I left some comments.

I also suggest merging main to see the actual coverage. My impression is that several new code paths are not covered by unit tests. I would like to have them covered, since this component is crucial.

if response_format:
    if is_streaming:
        raise ValueError(
            "OpenAI does not support `streaming_callback` with `response_format`, please choose one."
        )
Member:

I would like to understand better. It seems that OpenAI supports streaming + structured outputs.
If we are making this choice for simplicity reasons, I would be more specific: "The OpenAIChatGenerator does not ..."

Contributor (Author):

Hmm, interesting. I think because of the beta code example in the documentation I misunderstood that it's an unstable feature (not available in the completions API). But you are right, it's supported. I'll update the function to enable this.

@anakin87 (Member) commented Sep 4, 2025:

  • I did not notice that it was in beta.
  • It might also be reasonable to skip it if it requires significantly rewriting the component. (Unfortunately, we have many other components depending on this implementation.)
  • If there are stable ways to do this, let's do it. Otherwise, let's create an issue to track this once that API is no longer in beta.

Contributor (Author):

  • I looked into their stream function, and from here it looks like the stable version also supports response_format.
  • I will test this locally and update the code.

Contributor (Author):

@anakin87 Looked into this and here are some points:

  • The response_format with streaming will be passed to chat.completions.create. This endpoint allows response_format to be either a JSON schema or { "type": "json_object" }, but it cannot be a pydantic.BaseModel.
  • From the documentation, it seems like the beta version supports Pydantic models with the stream endpoint, which I don't want to introduce for the reasons you mentioned above.

So for now, I believe we can support the first point and mention the limitation in the docstrings. In any case, the error is handled by OpenAI itself if the user passes a Pydantic model together with streaming.
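
To illustrate the distinction (a sketch only: the CityInfo model and the schema contents are made up, and this assumes response_format is forwarded unchanged to the OpenAI client):

```python
from pydantic import BaseModel

# Pydantic form: per the discussion above, only usable on the non-streaming parse path
class CityInfo(BaseModel):
    city: str
    country: str

# JSON-schema form accepted by chat.completions.create, so it can be combined with streaming
json_schema_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "CityInfo",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "country": {"type": "string"}},
            "required": ["city", "country"],
            "additionalProperties": False,
        },
    },
}
```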

Member:

Agree... Let's do what you propose and mention in docstrings that Pydantic response_format won't work with streaming.

@anakin87 (Member) left a comment:

I like the progress (unit tests still missing)


(Since I'll be off, once the PR is ready, feel free to dismiss my review as stale and let David approve)

Comment on lines 304 to 305
if "stream" in api_args.keys():
chat_completion = self.client.chat.completions.create(**api_args)
@anakin87 (Member) commented Sep 5, 2025:

I find this hard to understand. Would something similar to this work?

We could always return a dictionary with the same fields from _prepare_api_call.

if api_args.get("response_format"):
    # We cannot pass stream param to chat.completions.parse endpoint
    api_args.pop("stream", None)
    chat_completion = self.client.chat.completions.parse(**api_args)
else:
    ...

(I might be wrong; in any case, I'd appreciate it if we could make the code more intuitive to follow.)

Contributor (Author):

Hmm, we allow passing response_format with the stream param to the create endpoint for streaming structured outputs, so this won't work.

Member:

OK, I understand now, but it's hard to follow.

What I would recommend is to:

  1. include in api_args an item called endpoint/method containing "parse" or "create" (in _prepare_api_call)
  2. (add comments where this value is set to explain why we are doing that.)
  3. reuse the value in run

if openai_endpoint == "create":
    chat_completion = await self.async_client.chat.completions.create(**api_args)
elif openai_endpoint == "parse":
    chat_completion = await self.async_client.chat.completions.parse(**api_args)
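
A rough sketch of how _prepare_api_call could set that value (the key name openai_endpoint and the selection rule below are assumptions for illustration, not the final implementation):

```python
def _prepare_api_call(messages, generation_kwargs=None, streaming_callback=None):
    # Hypothetical helper: collects the arguments for the OpenAI client call
    api_args = {"messages": messages, **(generation_kwargs or {})}
    if streaming_callback is not None:
        api_args["stream"] = True

    # The parse endpoint is needed for a Pydantic response_format but does not accept
    # the `stream` parameter, so streaming requests always stay on the create endpoint.
    response_format = api_args.get("response_format")
    use_parse = response_format is not None and not api_args.get("stream", False)
    api_args["openai_endpoint"] = "parse" if use_parse else "create"
    return api_args
```

run would then pop openai_endpoint from api_args before calling the client, matching the dispatch shown above.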
Contributor:

I think we should add an extra else here that raises an exception in case an unexpected value is popped? It would avoid all the typing issues below

Contributor (Author):

I updated the condition to use parse if it's passed as the endpoint, and otherwise always use create, which was the case before.

@@ -5,7 +5,9 @@
import os
from typing import Any, Optional, Union

from openai.lib._pydantic import to_strict_json_schema
Contributor:

Is there a different way to import this function that doesn't go through a private file? I'm a little worried the import path is subject to break or change.

Contributor (Author):

Hmm, for now it seems like this is the only way to import it.

Contributor:

Hmm, I think we can do this a different way. We should be able to directly use the one from Pydantic, which looks like this: parameters_schema = model.model_json_schema(). This is from the _create_tool_parameters_schema function in ComponentTool.

Contributor (Author):

OpenAI expects a stricter JSON Schema than Pydantic's default. For example, objects must set additionalProperties, and Optional keys are handled differently. As a result, model_json_schema() often isn't accepted as-is.
It's also discussed here, where another solution is offered, but I prefer using the OpenAI method over some unpopular library.
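
A small sketch of the difference (the CityInfo model is illustrative; exact output keys depend on the installed pydantic and openai versions):

```python
from typing import Optional

from pydantic import BaseModel
from openai.lib._pydantic import to_strict_json_schema  # private path discussed above

class CityInfo(BaseModel):
    city: str
    population: Optional[int] = None

# Plain Pydantic schema: no additionalProperties, and `population` is not required
print(CityInfo.model_json_schema())

# Strict variant for OpenAI structured outputs: additionalProperties is false
# and all keys are listed as required (optional fields become nullable)
print(to_strict_json_schema(CityInfo))
```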

@Amnah199 (Contributor, Author) commented Sep 9, 2025:

Nevertheless, I spotted a bug in to_dict where the schema wasn't stored properly. Fixing this.

Contributor:

Ahh okay, thanks for the info.

Labels
  • topic:tests
  • type:documentation (Improvements on the docs)
Projects
  • None yet
Development
Successfully merging this pull request may close these issues:
  • Option to enable structured outputs with OpenAI Generators

5 participants