This is a crude demo project that mimics the supposed live video ingestion capabilities of Google's multimodal Gemini LLM, built with the GPT-4 Vision API instead.
Demo: https://youtu.be/UxQb88gENeg
$ pip install -r requirements.txt
$ export OPENAI_API_KEY=YOUR_OPENAI_API_KEY

To run the voice-commanded terminal version, run the voice.py script.
$ python3 voice.py VIDEO_STREAM_URL

The assistant only reacts to voice commands.
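For context, here is a minimal sketch of how a single captured frame can be sent to the GPT-4 Vision API with the openai Python client. The prompt, model name and helper function below are illustrative assumptions, not code taken from voice.py:

```python
import base64
import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_frame(frame, question="What do you see?"):
    # Encode the raw OpenCV frame as JPEG, then as base64 for the API
    _, jpeg = cv2.imencode(".jpg", frame)
    b64 = base64.b64encode(jpeg.tobytes()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model name, may need updating
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

cap = cv2.VideoCapture("VIDEO_STREAM_URL")
ok, frame = cap.read()
if ok:
    print(describe_frame(frame))
```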
To run the motion-detecting version, run the motion.py script.

$ python3 motion.py VIDEO_STREAM_URL

The assistant reacts every time motion is detected in the video. A tripod is recommended.
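Motion detection like this is usually done with simple frame differencing in OpenCV. The snippet below is a generic sketch under that assumption (the thresholds are made up), not code taken from motion.py:

```python
import cv2

cap = cv2.VideoCapture("VIDEO_STREAM_URL")
prev_gray = None

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Grayscale + blur to reduce noise before differencing
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)

    if prev_gray is not None:
        # Threshold the per-pixel difference between consecutive frames
        diff = cv2.absdiff(prev_gray, gray)
        _, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        changed = cv2.countNonZero(thresh)

        if changed > 5000:  # tune for your camera and resolution
            print("Motion detected - this is where a frame would go to GPT-4V")

    prev_gray = gray
```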
To run the automatic version that detects both voice commands and motion in the video, run the auto.py script.
$ python3 auto.py VIDEO_STREAM_URL

The assistant reacts every time motion is detected in the video or a voice command is given. A tripod is recommended.
There is also a version with a "UI" made with CV2 (it sucks but kinda works). It both listens for voice commands and detects motion in the video, and automatically sends both to the GPT-4V API.
$ python3 auto_with_ui.py VIDEO_STREAM_URL

In my testing, I have used my phone camera as the video stream. For this, I used the IP Webcam app from the Play Store, with the camera set to 10 fps at 640x480 resolution.
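The stream URL shown in the IP Webcam app typically looks like http://PHONE_IP:8080/video, so the call ends up looking roughly like this (the IP address is just a placeholder):

$ python3 auto_with_ui.py http://192.168.1.123:8080/video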
The VIDEO_STREAM_URL is passed directly into cv2.VideoCapture(), so you should also be able to pass in a video file, or any other kind of video stream that OpenCV supports.
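If you want to sanity-check a source before pointing the scripts at it, a few lines of OpenCV are enough (this snippet is just for testing, it is not part of the repo):

```python
import cv2

# Anything cv2.VideoCapture accepts should work: a stream URL,
# a file path, or a local webcam index such as 0
cap = cv2.VideoCapture("video.mp4")
ok, frame = cap.read()
print("Opened:", cap.isOpened(), "- got a frame:", ok)
cap.release()
```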
There is a config.py file where you can tweak some settings if you are having trouble with the motion detection or speech detection.
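The exact setting names depend on config.py itself, but settings of this kind usually boil down to a handful of thresholds. The names and values below are purely illustrative, not necessarily what config.py contains:

```python
# Illustrative example only; check config.py for the real names and defaults.
MOTION_THRESHOLD = 5000   # how many changed pixels count as motion
SILENCE_THRESHOLD = 500   # audio energy below which input counts as silence
SILENCE_DURATION = 1.5    # seconds of silence that end a voice command
```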
- GPT-4V API is often slow
- Sometimes the assistant response is detected as a user message
- The CV2 UI sucks and should be built some other way
- The CV2 UI can only be closed by hitting Ctrl+C in the terminal