Add pybindings for multimodal LLM runner #14285
base: main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14285
Note: Links to docs will display an error until the docs builds have been completed.
❌ 14 New Failures, 1 Unrelated Failure
As of commit 7f111bc with merge base d43cde5:
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
from executorch.extension.llm.runner._llm_runner import GenerationConfig  # noqa: F401


def load_image_from_file(
Should these methods in utils.py be prefixed with _? Otherwise, they look like an API we would support in the long term.
If you want to keep it, I'd suggest this location: extension/vision/preprocessing.py, which could then be used for general CV tasks. We already have extension/audio for audio preprocessing.
Will update
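For reference, Python's own import rules back the reviewer's point: a leading underscore, or leaving a name out of `__all__`, keeps a helper out of `from module import *` and signals that it is internal. The sketch below builds a throwaway module in memory to show this; the module name `hypothetical_llm_utils` and its contents are illustrative assumptions, not the PR's utils.py.

```python
import sys
import types

# Illustrative module source (assumption): public helpers are listed in
# __all__, internal helpers carry a leading underscore.
_SOURCE = '''
__all__ = ["create_generation_config"]


def create_generation_config(max_new_tokens=128):
    return {"max_new_tokens": max_new_tokens}


def _load_image_from_file(path):
    # Internal helper: free to change without a deprecation cycle.
    return path
'''

# Build the module in memory and register it so it can be imported by name.
utils = types.ModuleType("hypothetical_llm_utils")
exec(_SOURCE, utils.__dict__)
sys.modules["hypothetical_llm_utils"] = utils

# A star import pulls in only the names listed in __all__.
star_namespace = {}
exec("from hypothetical_llm_utils import *", star_namespace)

print(sorted(k for k in star_namespace if not k.startswith("__")))
```

The underscore helper is still reachable as `utils._load_image_from_file`; the prefix is a convention that marks support scope, not access control.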
    return image


def create_generation_config(
For this method (as well as estimate_tokens and format_stats), the current extension/llm/runner/utils.py location looks like a good choice.
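If these helpers stay in extension/llm/runner/utils.py, they would hold small pure-Python logic along the lines of the minimal sketch below. `GenerationConfigSketch`, its fields, every signature, and the four-characters-per-token heuristic are assumptions for illustration only; the PR's real GenerationConfig comes from the `_llm_runner` extension and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class GenerationConfigSketch:
    """Hypothetical stand-in for the native GenerationConfig (fields assumed)."""

    max_new_tokens: int = 128
    temperature: float = 0.8
    echo: bool = False


def create_generation_config(max_new_tokens=128, temperature=0.8, echo=False):
    """Hypothetical signature: validate in Python before handing off to C++."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return GenerationConfigSketch(max_new_tokens, temperature, echo)


def estimate_tokens(text, chars_per_token=4):
    """Rough token count; ~4 chars/token is an assumed English-text heuristic."""
    # Ceiling division, so any non-empty text counts as at least one token.
    return -(-len(text) // chars_per_token)


def format_stats(num_tokens, elapsed_s):
    """One-line throughput summary, e.g. for printing after generation."""
    rate = num_tokens / elapsed_s if elapsed_s > 0 else float("inf")
    return f"{num_tokens} tokens in {elapsed_s:.2f}s ({rate:.1f} tok/s)"
```

Validating arguments on the Python side can give users a readable ValueError rather than an error surfacing from the native layer.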
        ValueError: If the image format is not supported
        FileNotFoundError: If the image file doesn't exist
    """
    if isinstance(image, (str, Path)):
Shouldn't you use the CV preprocessing utils function?
Yeah, let me fix that. Recent updates made sure it works with Gemma3 exported using optimum-et.
Branch updated from d352449 to 0ad3c71.
Branch updated from 0ad3c71 to 02ac3bf.
This pull request introduces Python bindings for the ExecuTorch MultimodalRunner, enabling Python users to run multimodal LLM inference (supporting text, image, and audio inputs) and generate text outputs. The changes include new build system integration, a detailed implementation plan and documentation, and a high-level Python API with robust input handling and error management.
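The "robust input handling" mentioned above typically means accepting several input types without hard-requiring every optional library. A hypothetical sketch of that pattern (`classify_image_input` is not a function from the PR): NumPy and Pillow are imported optionally, the input is dispatched on its type, and a clear TypeError names any missing library. It runs even when neither library is installed.

```python
from pathlib import Path

# Optional dependencies: this module still imports if they are missing.
try:
    import numpy as np
except ImportError:
    np = None

try:
    from PIL import Image
except ImportError:
    Image = None


def classify_image_input(image):
    """Hypothetical dispatcher: name the input form of `image`.

    Returns "path", "ndarray", or "pil"; raises TypeError otherwise.
    """
    if isinstance(image, (str, Path)):
        return "path"
    if np is not None and isinstance(image, np.ndarray):
        return "ndarray"
    if Image is not None and isinstance(image, Image.Image):
        return "pil"
    # Point at a missing optional library when that may be the cause.
    missing = [name for name, mod in (("numpy", np), ("Pillow", Image)) if mod is None]
    hint = f" (not installed: {', '.join(missing)})" if missing else ""
    raise TypeError(f"unsupported image input {type(image).__name__}{hint}")
```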
Python Bindings Implementation:
- __init__.py: a high-level Python API for the MultimodalRunner, providing user-friendly methods for text and image input creation, text generation (with or without streaming callbacks), and resource management. The API includes comprehensive input validation, support for multiple image formats (file path, NumPy array, PIL), and fallback mechanisms if dependencies are missing.

Build System Integration:
- CMakeLists.txt: adds a pybind11-based Python extension module (_llm_runner) when EXECUTORCH_BUILD_PYBIND is set, linking all necessary dependencies and setting up include paths.

Documentation and Planning:
- README.md.

Utility and Extensibility:
- Utility functions (load_image_from_file, preprocess_image, create_generation_config) for easier input preprocessing and configuration from Python.

Testing and Examples (Planned):
- test_runner_pybindings.py.
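Text generation "with or without streaming callbacks" commonly takes this shape: the call always returns the full text and, when a callback is supplied, also forwards each token as it arrives. The sketch below assumes that shape; `generate_text` and the `token_stream` iterable (standing in for the native decode loop) are hypothetical, and the bindings' real `generate` signature is not reproduced here.

```python
from typing import Callable, Iterable, List, Optional


def generate_text(
    token_stream: Iterable[str],
    token_callback: Optional[Callable[[str], None]] = None,
) -> str:
    """Hypothetical wrapper: collect tokens, optionally streaming each one."""
    pieces: List[str] = []
    for token in token_stream:
        if token_callback is not None:
            token_callback(token)  # streaming mode: surface the token immediately
        pieces.append(token)
    return "".join(pieces)


# Non-streaming use: just take the final string.
print(generate_text(["The ", "cat ", "sat."]))

# Streaming use: the callback sees tokens one at a time.
seen = []
generate_text(["The ", "cat ", "sat."], token_callback=seen.append)
print(seen)
```

Returning the joined text in both modes keeps the non-streaming call a one-liner while letting streaming callers print tokens incrementally.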
.Code Snippet of How to Use:
Output from console:
cc @mergennachin @cccclai @helunwencser @jackzhxng