Is there a way to get response to image input? #1350

@oguzhankayaa

Hi there,

I am currently working on bidirectional (bidi) streaming over WebSockets, and I am trying to add an image streaming feature as well.
When I send images together with audio (or text), I get a response. But I couldn't find a way to get a response to image data alone. Is there a way to do that?

In main.py, I handle image data like this:

elif mime_type.startswith("image"):
    # Send image data (video frames)
    decoded_data = base64.b64decode(data)
    live_request_queue.send_realtime(Blob(data=decoded_data, mime_type=mime_type))
    print(f"[CLIENT TO AGENT]: {mime_type}: {len(decoded_data)} bytes")
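
For readers who want to try the image branch in isolation, here is a minimal, self-contained sketch of the decode-and-forward step. It assumes the client sends JSON messages with `mime_type` and `data` keys (as the `sendMessage` call in app.js suggests). `FakeBlob`, `FakeQueue`, and `handle_client_message` are hypothetical stand-ins written for this illustration; they are not part of ADK and only mimic the `Blob(...)` and `live_request_queue.send_realtime(...)` calls shown above.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class FakeBlob:
    """Hypothetical stand-in for the Blob type used in main.py."""
    data: bytes
    mime_type: str


@dataclass
class FakeQueue:
    """Hypothetical stand-in for live_request_queue; it only records blobs."""
    sent: list = field(default_factory=list)

    def send_realtime(self, blob: FakeBlob) -> None:
        self.sent.append(blob)


def handle_client_message(raw: str, queue: FakeQueue) -> int:
    """Decode one JSON message and forward image frames.

    Returns the number of decoded bytes forwarded, or 0 if the message
    was not an image.
    """
    message = json.loads(raw)
    mime_type = message["mime_type"]
    data = message["data"]
    if mime_type.startswith("image"):
        # The browser sends base64 text; the queue expects raw bytes.
        decoded_data = base64.b64decode(data)
        queue.send_realtime(FakeBlob(data=decoded_data, mime_type=mime_type))
        return len(decoded_data)
    return 0
```

Swapping `FakeQueue` for the real queue object should leave the routing logic unchanged, since only `send_realtime` is called on it.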

In app.js, I added:

const startCamButton = document.getElementById('startCamButton')
startCamButton.addEventListener('click', async () => {
  try {
    startCamButton.disabled = true
    startAudioButton.disabled = true
    startAudio()
    is_audio = true

    connectWebsocket()
    // Start video capture at fps FPS
    await startVideoCapture(videoFrameHandler, fps)
  } catch (error) {
    console.error('Failed to start camera:', error)
    startCamButton.disabled = false
    alert(
      'Failed to access camera. Please make sure you have granted camera permissions.'
    )
  }
})

// Video frame handler
function videoFrameHandler(frameData) {
  // Send the frame data as base64
  sendMessage({
    mime_type: 'image/jpeg',
    data: arrayBufferToBase64(frameData),
  })
  console.log(`[CLIENT TO AGENT] sent video frame: ${frameData.byteLength} bytes`)
}

With these additions, I don't get a response to image data alone. For example, when I say "tell me the number of fingers you see continuously", I only get a response while I am talking. I'd like to send webcam frames and get responses like "You are showing three fingers" without having to speak or type anything.
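
Since responses only seem to arrive when audio or text accompanies the frames, one possible direction (an assumption, not confirmed ADK behavior) is for the server to send a short text prompt at a throttled rate while frames keep arriving. The sketch below shows only the throttling logic. `FramePromptNudger` and its `send_text` callback are hypothetical names; in main.py, `send_text` would have to wrap whatever call already delivers text input to the agent.

```python
import time
from typing import Callable, Optional


class FramePromptNudger:
    """Calls send_text at most once per interval while frames keep arriving.

    Hypothetical helper for illustration; not part of ADK.
    """

    def __init__(
        self,
        send_text: Callable[[str], None],
        prompt: str,
        interval_s: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send_text = send_text
        self._prompt = prompt
        self._interval_s = interval_s
        self._clock = clock
        self._last_sent: Optional[float] = None

    def on_frame(self) -> bool:
        """Call once per received frame; returns True if a prompt was sent."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._interval_s:
            # Too soon since the last prompt: let this frame pass silently.
            return False
        self._send_text(self._prompt)
        self._last_sent = now
        return True
```

A possible usage would be to call `nudger.on_frame()` right after forwarding each image blob, with a prompt such as "How many fingers do you see?". The interval is a tuning knob: too short and prompts may interrupt responses still in progress, too long and the answers lag behind the video.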

Labels

bot triaged: [Bot] This issue is triaged by ADK bot
live: [Component] This issue is related to live, voice and video chat
question: [Component] This issue is asking a question or clarification
