fix(llm/google): fix image handling in first user message with system text #3513
base: main
Conversation
Refactor GoogleMessageSerializer to improve image handling and clarify comments.
1 issue found across 1 file
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="browser_use/llm/google/serializer.py">
<violation number="1" location="browser_use/llm/google/serializer.py:89">
Aggregating all user text into one part here reorders mixed content: text that originally followed an image now appears before it when include_system_in_user=True. Please keep iterating the parts in order so images remain in their original position relative to text.</violation>
</file>
browser_use/llm/google/serializer.py
Outdated
- # Add system text as the first part
- message_parts.append(Part.from_text(text=system_text))
- system_parts = [] # Clear after using
+ message_parts.append(Part.from_text(text=f'{system_text}\n\n{getattr(message, "text", "")}'))
Aggregating all user text into one part here reorders mixed content: text that originally followed an image now appears before it when include_system_in_user=True. Please keep iterating the parts in order so images remain in their original position relative to text.
Prompt for AI agents
Address the following comment on browser_use/llm/google/serializer.py at line 89:
<comment>Aggregating all user text into one part here reorders mixed content: text that originally followed an image now appears before it when include_system_in_user=True. Please keep iterating the parts in order so images remain in their original position relative to text.</comment>
<file context>
@@ -82,39 +86,31 @@ def serialize_messages(
- # Add system text as the first part
- message_parts.append(Part.from_text(text=system_text))
- system_parts = [] # Clear after using
+ message_parts.append(Part.from_text(text=f'{system_text}\n\n{getattr(message, "text", "")}'))
+ for part in message.content or []:
+ if part.type == 'image_url':
</file context>
✅ Addressed in a39cd7b
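The ordering concern in this comment can be illustrated with a minimal, self-contained sketch. It does not use the real `browser_use` or Google SDK types: `TextPart`, `ImagePart`, and both `serialize_*` functions are hypothetical stand-ins, and the output is a list of plain tuples rather than `Part` objects. The first function merges all text up front, which is the pattern the reviewer flagged; the second walks the parts in order.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for message content parts (not the real library classes).
@dataclass
class TextPart:
    text: str
    type: str = 'text'


@dataclass
class ImagePart:
    url: str
    type: str = 'image_url'


def serialize_aggregated(system_text, parts):
    """Flagged pattern: all text is merged into one leading part, so images end up after it."""
    joined = ' '.join(p.text for p in parts if p.type == 'text')
    out = [('text', f'{system_text}\n\n{joined}')]
    out += [('image', p.url) for p in parts if p.type == 'image_url']
    return out


def serialize_in_order(system_text, parts):
    """Order-preserving pattern: system text first, then every part in its original position."""
    out = [('text', system_text)]
    for p in parts:
        if p.type == 'text':
            out.append(('text', p.text))
        elif p.type == 'image_url':
            out.append(('image', p.url))
    return out


parts = [TextPart('before'), ImagePart('img-1'), TextPart('after')]
aggregated = serialize_aggregated('SYS', parts)
ordered = serialize_in_order('SYS', parts)
```

In the aggregated result the text "after" precedes the image even though it originally followed it; the in-order result keeps the image between the two text parts.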
Refactor message serialization to handle text parts more effectively by prepending system text only before the first text part and managing other part types accordingly.
Reviewed changes from recent commits (found 1 issue).
1 issue found across 1 file
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="browser_use/llm/google/serializer.py">
<violation number="1" location="browser_use/llm/google/serializer.py:91">
If the first user message contains no text parts (e.g., images only), this condition prevents the system instructions from being included at all. Please ensure the system text is still inserted even when there is no text part in the message.</violation>
</file>
browser_use/llm/google/serializer.py
Outdated
elif message.content is not None:
    first_text_inserted = False
    for part in message.content or []:
        if part.type == 'text' and not first_text_inserted:
If the first user message contains no text parts (e.g., images only), this condition prevents the system instructions from being included at all. Please ensure the system text is still inserted even when there is no text part in the message.
Prompt for AI agents
Address the following comment on browser_use/llm/google/serializer.py at line 91:
<comment>If the first user message contains no text parts (e.g., images only), this condition prevents the system instructions from being included at all. Please ensure the system text is still inserted even when there is no text part in the message.</comment>
<file context>
@@ -85,15 +85,22 @@ def serialize_messages(
+ first_text_inserted = False
for part in message.content or []:
- if part.type == 'image_url':
+ if part.type == 'text' and not first_text_inserted:
+ # Prepend system text only before the first text part
+ message_parts.append(Part.from_text(text=f'{system_text}\n\n{part.text}'))
</file context>
✅ Addressed in b61454d
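One way to cover the image-only case described in this comment is to track whether the system text was consumed and fall back to a standalone leading text part if it was not. The sketch below is an assumption-laden illustration, not the code from the commit: `merge_system_text` is a hypothetical name, and parts are modeled as simple `(kind, value)` tuples instead of SDK objects.

```python
def merge_system_text(system_text, parts):
    """Merge system text into a message given as (kind, value) tuples.

    kind is 'text' or 'image'. Hypothetical helper for illustration only.
    """
    out = []
    system_used = False
    for kind, value in parts:
        if kind == 'text' and not system_used:
            # Prepend the system text to the first text part only.
            out.append(('text', f'{system_text}\n\n{value}'))
            system_used = True
        else:
            # Every other part keeps its original position.
            out.append((kind, value))
    if not system_used:
        # No text part was found (e.g. an image-only message):
        # still emit the system text, as its own leading part.
        out.insert(0, ('text', system_text))
    return out


mixed = merge_system_text('SYS', [('image', 'a'), ('text', 'hello')])
image_only = merge_system_text('SYS', [('image', 'a'), ('image', 'b')])
```

With the fallback in place, `image_only` starts with the system text followed by both images, so the instructions are never silently dropped.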
Fix bug: if the first user message contained no text parts (e.g., images only), the system instructions were not included at all. Ensure the system text is still inserted even when the message has no text part.
No issues found across 1 file
Ensure proper formatting by adding a newline at the end of the file.
No issues found across 1 file
Ensure proper file termination by adding a newline.
When include_system_in_user=True, non-text content (images) in the first
user message was not being serialized. Now properly handles mixed content
by extracting image serialization logic into a helper method.
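As a rough idea of what a separate image-serialization helper might look like, the sketch below assumes images arrive as base64 `data:` URLs and returns the MIME type and raw bytes that a part constructor would need. Both the input format and the name `parse_image_data_url` are assumptions made for illustration; the actual helper in the PR is not shown here and may differ.

```python
import base64


def parse_image_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, raw_bytes).

    Hypothetical helper; assumes input like 'data:image/png;base64,<payload>'.
    """
    header, _, payload = url.partition(',')
    if not header.startswith('data:') or not header.endswith(';base64') or not payload:
        raise ValueError('expected a base64 data URL')
    # Strip the 'data:' prefix and ';base64' suffix to get the MIME type.
    mime_type = header[len('data:'):-len(';base64')] or 'application/octet-stream'
    return mime_type, base64.b64decode(payload)


example_url = 'data:image/png;base64,' + base64.b64encode(b'abc').decode()
mime_type, image_bytes = parse_image_data_url(example_url)
```

Keeping this logic in one function means the system-text branch and the regular branch of the serializer can call the same code for every image part instead of duplicating it.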
Summary by cubic
Fixes image handling in the Google Gemini serializer so images in the first user message are preserved when system text is included. Also ensures system text is inserted before the first part, including image-only messages.
Bug Fixes
Refactors
Written for commit 8982da7. Summary will update automatically on new commits.