sshkeda/pi-read

pi-read

A pi extension that adds native PDF, video, and audio support to pi's built-in read tool, with provider-aware allowlisting so binary content is only sent to models that can actually consume it.

Install

# From git
pi install git:github.com/sshkeda/pi-read

# From local path
pi install /path/to/pi-read

For native OpenRouter audio/video/PDF transport on pi's OpenAI-compatible path, also apply the companion patches with pi-patches. pi-patches picks up this repo's patches/pi-patches.json through its sources.json.

Usage

pi -e /path/to/pi-read/src/index.ts

Then ask pi to read a file as usual:

read presentation.pdf
read demo.mp4
read podcast.mp3

Pi's built-in read tool supports text and images (jpg, png, gif, webp); this extension extends read to also handle:

| Format | MIME types |
| --- | --- |
| Video | video/mp4, video/webm, video/quicktime (.mov) |
| Audio | audio/mpeg (.mp3), audio/wav, audio/ogg (.ogg/.oga/.opus/.spx), audio/mp4 (.m4a) |
| PDF | application/pdf |

Text and images are delegated to the built-in read tool unchanged, including image resizing and truncation.
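The format list above amounts to a lookup from file extension to MIME type. The following is a minimal sketch of such a lookup, not pi-read's actual code: the names `MEDIA_TYPES` and `detectMedia` are hypothetical, and the extensions for entries the list gives without one (.mp4, .webm, .wav, .pdf) are assumed.

```typescript
// Hypothetical helper, not taken from pi-read's source.
type MediaKind = "video" | "audio" | "pdf";

interface MediaType {
  kind: MediaKind;
  mime: string;
}

// Extension -> MIME type, following the format list in this README.
// Assumed extensions: .mp4, .webm, .wav, .pdf (the list shows only the MIME type).
const MEDIA_TYPES: Record<string, MediaType> = {
  ".mp4": { kind: "video", mime: "video/mp4" },
  ".webm": { kind: "video", mime: "video/webm" },
  ".mov": { kind: "video", mime: "video/quicktime" },
  ".mp3": { kind: "audio", mime: "audio/mpeg" },
  ".wav": { kind: "audio", mime: "audio/wav" },
  ".ogg": { kind: "audio", mime: "audio/ogg" },
  ".oga": { kind: "audio", mime: "audio/ogg" },
  ".opus": { kind: "audio", mime: "audio/ogg" },
  ".spx": { kind: "audio", mime: "audio/ogg" },
  ".m4a": { kind: "audio", mime: "audio/mp4" },
  ".pdf": { kind: "pdf", mime: "application/pdf" },
};

// Returns the media type for a path, or undefined when the file should be
// left to the built-in read tool (text, images, anything unrecognized).
function detectMedia(filePath: string): MediaType | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) return undefined;
  const ext = filePath.slice(dot).toLowerCase();
  return MEDIA_TYPES[ext];
}
```

Returning `undefined` for unknown extensions keeps the delegation rule simple: anything the lookup does not recognize falls through to the built-in read tool.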

Provider safety

Before each LLM call, a context event handler checks the current provider/model route against an explicit allowlist. Binary content only goes to providers that can ingest it:

| Route | Behaviour |
| --- | --- |
| google, google-vertex, google-gemini-cli | ✅ Sent as inlineData |
| OpenRouter / google/gemini-3.1-pro-preview | ✅ Sent natively via pi-patches (file for PDFs, video_url for video, input_audio for audio) |
| OpenRouter / google/gemini-3-flash-preview | ✅ Sent natively via pi-patches |
| Anthropic / Claude, OpenAI / ChatGPT, other OpenRouter models | ⚠️ Stripped to a text fallback like [PDF file omitted — current model does not support inline PDF content] |
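The routes above can be read as an explicit allowlist check. Here is a rough sketch under stated assumptions: the `Route` shape, the function name `allowsBinary`, and the provider ids `"openrouter"`, `"anthropic"`, and `"openai"` are illustrative and may not match the identifiers pi uses internally.

```typescript
// Assumption: a route is a provider id plus a model id (field names are illustrative).
interface Route {
  provider: string;
  model: string;
}

// Providers listed in this README as receiving inlineData directly.
const NATIVE_PROVIDERS = new Set(["google", "google-vertex", "google-gemini-cli"]);

// OpenRouter models listed in this README as allowed (pi-patches must be applied).
const OPENROUTER_MODELS = new Set([
  "google/gemini-3.1-pro-preview",
  "google/gemini-3-flash-preview",
]);

// True when binary content may be sent on this route; every other route
// gets the text fallback instead.
function allowsBinary(route: Route): boolean {
  if (NATIVE_PROVIDERS.has(route.provider)) return true;
  if (route.provider === "openrouter") return OPENROUTER_MODELS.has(route.model);
  return false;
}
```

An allowlist fails closed: a provider or model that is not listed is treated as unable to ingest binary content, which is the safe default when new routes appear.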

Stripping operates on a structuredClone of the message history, so the persisted session stays small (lightweight local file refs, not giant base64 blobs in the transcript). Switching to an allowed model later rehydrates the original file.
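To illustrate why stripping a clone leaves the session intact, here is a simplified sketch. The `Part` and `Message` shapes and the name `stripBinary` are hypothetical, and the fallback wording for video and audio is assumed by analogy with the PDF example above.

```typescript
// Hypothetical message shapes: a file part holds a lightweight local path,
// not the base64 payload.
type Part =
  | { type: "text"; text: string }
  | { type: "file"; kind: "PDF" | "video" | "audio"; path: string };

interface Message {
  role: string;
  content: Part[];
}

// Returns a deep copy of the history with every file part replaced by a
// text fallback. The input array and its messages are never mutated.
function stripBinary(history: Message[]): Message[] {
  const copy = structuredClone(history);
  for (const message of copy) {
    message.content = message.content.map((part): Part =>
      part.type === "file"
        ? {
            type: "text",
            text: `[${part.kind} file omitted — current model does not support inline ${part.kind} content]`,
          }
        : part
    );
  }
  return copy;
}
```

Because only the copy is rewritten, the original history still carries the file reference, so a later turn on an allowed model can load the file again.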

Testing

cd ../pi-mock && npm run build
cd ../pi-read && node --test --test-force-exit test/test-pi-read.mjs

Tests live in test/test-pi-read.mjs and use the sibling pi-mock harness to simulate provider requests. They verify:

  • Google: PDF/MP4/audio sent with correct MIME type
  • Anthropic / OpenAI: binary stripped to text fallback
  • OpenRouter allowlist (with pi-patches applied) passes PDF / MP4 / M4A through as file / video_url / input_audio
  • Session safety: large audio reads persist local refs instead of base64 blobs; later turns rehydrate correctly
  • Text files always delegated to built-in read
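The OpenRouter check in the list above comes down to finding a content part of the expected type in the outgoing request. The sketch below works on a plain object and does not use pi-mock, whose API is not shown here; the `ChatRequestBody` shape and the name `carriesNativePart` are assumptions for illustration.

```typescript
// Hypothetical request shape: an OpenAI-compatible chat body whose message
// content is either a string or an array of typed parts. Only `type` is inspected.
interface ContentPart {
  type: string;
}

interface ChatMessage {
  role: string;
  content: string | ContentPart[];
}

interface ChatRequestBody {
  messages: ChatMessage[];
}

type MediaKind = "pdf" | "video" | "audio";

// Part types this README names for the patched OpenRouter path.
const OPENROUTER_PART_TYPE: Record<MediaKind, string> = {
  pdf: "file",
  video: "video_url",
  audio: "input_audio",
};

// True when some message in the body carries a part of the type expected
// for the given media kind.
function carriesNativePart(body: ChatRequestBody, kind: MediaKind): boolean {
  const wanted = OPENROUTER_PART_TYPE[kind];
  return body.messages.some(
    (message) =>
      Array.isArray(message.content) &&
      message.content.some((part) => part.type === wanted)
  );
}
```

A test for the stripped case would assert the opposite: no `file`, `video_url`, or `input_audio` part is present and a text fallback appears instead.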

Requirements

  • pi coding agent
  • An allowed multimodal Gemini route for native PDF/video/audio
  • pi-patches applied if you want native OpenRouter audio/video/PDF transport on the OpenAI-compatible path

License

MIT
