Currently, it appears that audio is handled as a binary attachment, which is then transcribed (ref):
```dart
Future<void> _onTranslateStt(XFile file) async {
```
For multimodal models such as Gemini, audio input is natively supported.
The expectation is that the audio itself is passed to the model as input, rather than first transcribing the attachment and sending only the transcription.
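A minimal sketch of what this could look like, assuming the `google_generative_ai` package is used to talk to Gemini; the `promptWithAudio` name, the prompt text, and the fallback MIME type are illustrative, not part of the existing code:

```dart
import 'package:cross_file/cross_file.dart';
import 'package:google_generative_ai/google_generative_ai.dart';

/// Illustrative only: sends the recorded audio bytes to Gemini directly
/// instead of transcribing them first.
Future<String?> promptWithAudio(GenerativeModel model, XFile file) async {
  final bytes = await file.readAsBytes();

  // Gemini accepts inline audio as a DataPart tagged with its MIME type,
  // alongside a text prompt in the same Content.
  final response = await model.generateContent([
    Content.multi([
      TextPart('Answer the question asked in this recording.'),
      DataPart(file.mimeType ?? 'audio/m4a', bytes), // fallback MIME type is an assumption
    ]),
  ]);

  return response.text;
}
```

This would replace the transcription step entirely, so the model can use tone, pauses, and other audio cues rather than just the recovered text.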