Heiji is a Raycast extension that turns any macOS text field into an instant voice-driven AI assistant. The MVP ships three commands:
- Dictate – records speech, transcribes via OpenAI STT, and inserts text (or routes wake phrases to an LLM).
- Settings – manages OpenAI credentials, language, silence detection, and dictionary override.
- Retranscribe Last – retries the most recent recording without re-capturing audio.
- Raycast for macOS with the Developer CLI enabled.
- Node.js 20.15.1 or newer (verified up to Node 24) and npm 10+.
- SoX for audio capture (`brew install sox`).
- An OpenAI API key with access to `gpt-4o-transcribe` and `gpt-4o-mini`.
- Clone the project

      git clone https://github.com/project-heiji/heiji.git
      cd heiji

- Install dependencies

      npm install

- Verify the build (optional but recommended)

      npm run lint
      npm run test
      npm run build

- Start the extension in Raycast

      npx ray dev

  The Raycast CLI will start a development session and open Raycast with the three Heiji commands available while the process is running. Keep this terminal window open for hot reloading. If the CLI is not found, ensure Raycast’s Developer CLI setting is enabled and the `ray` binary is on your `PATH` (`which ray` should return a path).

- Assign a hotkey (optional)
  - In Raycast Preferences → Extensions → Heiji, select the Dictate command.
  - Click Add Shortcut and bind it to a convenient key (e.g. `⌥⇧D`).
- Open Raycast and run `Heiji: Settings`.
- Enter your OpenAI API key. Raycast will store it securely.
- Adjust preferences if needed:
  - `Dictation Language` – ISO 639-1 code (e.g. `en`, `fr`).
  - `Silence Stop Seconds` – auto-stop delay after you stop speaking.
  - `Dictionary Path Override` – optional JSON file that biases transcription.
- Close the form; preferences save automatically and you’ll see a confirmation toast.
You’re ready to dictate. Launch the Dictate command and start speaking—Heiji handles transcription, wake commands, and clipboard restore automatically.
- `sox --version` should report 14.4 or later; if missing, install via `brew install sox`.
- `ray --version` should output the bundled Raycast CLI version.
- `node -v` should be ≥ 20.15.1 (tested through Node 24.x).
- If SoX is installed outside your default `PATH`, export `HEIJI_SOX_PATH=/full/path/to/sox` (e.g., `/opt/homebrew/bin/sox`) before launching Raycast or the CLI.
- If ambient noise requires tweaking silence detection, set `HEIJI_SILENCE_THRESHOLD` (e.g., `0.3%`) to adjust when recording stops.
- The transcription prompt explicitly mentions wake phrases (“Heiji”, “Hey G”, “Heyji”) so the STT model keeps them verbatim; avoid removing this guidance when modifying the prompt builder.
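
To make the two environment variables above concrete, here is a minimal sketch of how a recorder might turn them into a SoX invocation. The README does not show how Heiji actually calls SoX, so `buildSoxCommand`, the `1%` fallback threshold, and the 0.1 s start window are assumptions for illustration; only the `silence` effect syntax and the `-d` default-device flag are standard SoX.

```typescript
// Hypothetical helper: Heiji's real recorder module is not shown in this README.
type Env = Record<string, string | undefined>;

interface SoxCommand {
  binary: string;
  args: string[];
}

function buildSoxCommand(env: Env, silenceSeconds: number, outputPath: string): SoxCommand {
  // Fall back to whatever `sox` resolves to on PATH when no override is set.
  const binary = (env["HEIJI_SOX_PATH"] ?? "").trim() || "sox";
  // Assumed fallback; the README only gives 0.3% as an example value.
  const threshold = (env["HEIJI_SILENCE_THRESHOLD"] ?? "").trim() || "1%";
  return {
    binary,
    args: [
      "-d", // read from the default audio input device
      outputPath,
      "silence",
      "1", "0.1", threshold, // start once 0.1 s of audio exceeds the threshold
      "1", silenceSeconds.toFixed(1), threshold, // stop after N s below the threshold
    ],
  };
}

const exampleCommand = buildSoxCommand({ HEIJI_SILENCE_THRESHOLD: "0.3%" }, 2.0, "/tmp/heiji.wav");
console.log([exampleCommand.binary, ...exampleCommand.args].join(" "));
// prints: sox -d /tmp/heiji.wav silence 1 0.1 0.3% 1 2.0 0.3%
```

Taking the environment as a parameter rather than reading it globally keeps the function easy to test with different overrides.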
- Basic dictation
  - Trigger the `Heiji: Dictate` command.
  - Speak your text; Heiji stops recording after the configured silence window.
  - The transcript is pasted into the active application and your original clipboard is restored.
- Wake phrase actions
  - Start with “Heiji …” (e.g. “Heiji, summarize this into three bullet points”).
  - The remainder is sent to `gpt-4o-mini`; the LLM response replaces the current selection.
- Retrying a recording
  - Run `Heiji: Retranscribe Last` to re-transcribe the most recent audio without speaking again.
- Troubleshooting
  - Toasts surface recoverable issues such as missing transcription or wake response.
  - If you hit an error toast, re-run the Dictate command and try again.
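
The split between basic dictation and wake phrase actions can be sketched as a small routing function. The wake phrases come from this README; the matching rules (case-insensitive prefix, word boundary, punctuation trimming) and the names `routeTranscript` and `Routed` are assumptions, not Heiji's actual implementation.

```typescript
// Wake phrases are taken from this README; the matching rules below are an assumption.
const WAKE_PHRASES = ["Heiji", "Hey G", "Heyji"];

type Routed =
  | { mode: "wake"; prompt: string }
  | { mode: "dictation"; text: string };

function routeTranscript(transcript: string): Routed {
  const text = transcript.trim();
  for (const phrase of WAKE_PHRASES) {
    if (text.toLowerCase().startsWith(phrase.toLowerCase())) {
      const rest = text.slice(phrase.length);
      // Require a word boundary so "Hey Gary ..." stays plain dictation.
      if (rest === "" || /^[\s,.:;!?-]/.test(rest)) {
        // Strip the wake phrase and any separators before handing off to the LLM.
        const prompt = rest.replace(/^[\s,.:;!?-]+/, "");
        return { mode: "wake", prompt };
      }
    }
  }
  return { mode: "dictation", text };
}

console.log(routeTranscript("Heiji, summarize this into three bullet points"));
console.log(routeTranscript("Hey Gary went home early"));
```

In this sketch a bare wake phrase yields an empty prompt; a real command would probably want to treat that case as an error or as plain dictation.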
| Command | Purpose |
|---|---|
| `Heiji: Dictate` | Records audio and inserts the transcript or wake response into the front app |
| `Heiji: Settings` | Configures API key, language, silence timeout, and dictionary path |
| `Heiji: Retranscribe Last` | Reuses the cached audio file and re-runs transcription |
If you are contributing to the extension, the repository defines convenience scripts:
- `npm run lint` – runs ESLint (skips until dependencies are installed).
- `npm run test` – executes the ts-node test suites (dictionary, audio, OpenAI client, insertion, wake, dictate, retranscribe, logger).
- `npm run build` – type-checks the extension with `tsc --noEmit`.
Note: Commands are safe to run repeatedly; when dependencies are missing the script reports the skip explicitly.
- Trigger via Raycast (e.g., keyboard shortcut).
- HUD flow: `Listening…` → `Processing…` → `Inserted` or `Heiji Action complete` for wake requests.
- Wake phrases (`Heiji`, `Hey G`, `Heyji`) are stripped before being sent to `gpt-4o-mini`.
- Clipboard content is restored automatically after insertion; Cmd+Z reverts the paste.
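
The paste-then-restore behaviour can be illustrated with a small sketch. `ClipboardLike`, `insertAndRestore`, and `fakeClipboard` are hypothetical names; in a real extension the interface would presumably wrap Raycast's clipboard API, which is not reproduced here.

```typescript
// Hypothetical clipboard abstraction so the sketch runs without Raycast.
interface ClipboardLike {
  readText(): Promise<string | undefined>;
  copy(text: string): Promise<void>;
  paste(text: string): Promise<void>;
}

async function insertAndRestore(clipboard: ClipboardLike, transcript: string): Promise<void> {
  // Remember what the user had on the clipboard before we touch it.
  const previous = await clipboard.readText();
  try {
    await clipboard.paste(transcript);
  } finally {
    // Restore even if the paste throws; an empty clipboard is left as-is.
    if (previous !== undefined) {
      await clipboard.copy(previous);
    }
  }
}

// In-memory fake, used only to demonstrate the flow.
function fakeClipboard(initial: string | undefined) {
  const state = { content: initial, pasted: [] as string[] };
  const clipboard: ClipboardLike = {
    readText: async () => state.content,
    copy: async (text) => { state.content = text; },
    paste: async (text) => { state.content = text; state.pasted.push(text); },
  };
  return { clipboard, state };
}

const demoClipboard = fakeClipboard("original clipboard");
insertAndRestore(demoClipboard.clipboard, "Hello from Heiji").then(() => {
  console.log(demoClipboard.state.pasted, demoClipboard.state.content);
});
```

Putting the restore in a `finally` block is one way to keep the user's clipboard intact when insertion fails partway through.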
Exposes the four preferences required by the MVP:
| Preference | Notes |
|---|---|
| `OpenAI API Key` | Stored securely by Raycast. |
| `Dictation Language` | ISO 639-1 code controlling STT language (default `fr`). |
| `Silence Stop Seconds` | Controls voice-activity timeout in seconds (default `2.0`). |
| `Dictionary Path` | Leave blank to use the auto-generated path in `~/Library/Application Support`. |
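
As an illustration of the defaults in the table above, the following sketch normalises raw preference values. The field names, the `resolvePreferences` function, and the validation rules are assumptions; only the defaults (`fr`, `2.0`) and the blank-means-auto-generated behaviour come from this README.

```typescript
// Hypothetical shapes: the extension's real preference keys are not listed in this README.
interface RawPreferences {
  apiKey?: string;
  language?: string;
  silenceStopSeconds?: string;
  dictionaryPath?: string;
}

interface ResolvedPreferences {
  apiKey: string;
  language: string;
  silenceStopSeconds: number;
  dictionaryPath: string | null; // null means "use the auto-generated path"
}

function resolvePreferences(raw: RawPreferences): ResolvedPreferences {
  const apiKey = (raw.apiKey ?? "").trim();
  if (apiKey === "") {
    throw new Error("OpenAI API key is required");
  }

  // Default language per the table above.
  const language = (raw.language ?? "").trim().toLowerCase() || "fr";
  if (!/^[a-z]{2}$/.test(language)) {
    throw new Error(`Expected a two-letter ISO 639-1 code, got "${language}"`);
  }

  // Default silence window per the table above.
  const rawSeconds = (raw.silenceStopSeconds ?? "").trim();
  const silenceStopSeconds = rawSeconds === "" ? 2.0 : Number(rawSeconds);
  if (!Number.isFinite(silenceStopSeconds) || silenceStopSeconds <= 0) {
    throw new Error("Silence Stop Seconds must be a positive number");
  }

  const dictionaryPath = (raw.dictionaryPath ?? "").trim() || null;
  return { apiKey, language, silenceStopSeconds, dictionaryPath };
}

console.log(resolvePreferences({ apiKey: "sk-example", language: "EN" }));
```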
- Surfaces a `Retranscribing…` HUD while reusing the cached audio file from the last dictation.
- On success, inserts the new transcript and shows `Retranscribed`.
- On failure, displays a toast and leaves the clipboard untouched.
A sample mapping is provided in dictionary.sample.json:
{
"Billie": "Billie",
"Raycast": "Raycast",
"GPT-4": "GPT-4"
}

Drop a customised JSON file at the path set in Settings to bias the STT prompt.
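
One plausible way a dictionary like this could bias transcription is by listing its preferred spellings in the STT prompt alongside the wake phrases. The sketch below assumes that approach; `buildTranscriptionPrompt` and the prompt wording are hypothetical and do not reproduce Heiji's actual prompt builder.

```typescript
// Wake phrases come from this README; the prompt wording is an assumption.
const PROMPT_WAKE_PHRASES = ["Heiji", "Hey G", "Heyji"];

type Dictionary = Record<string, string>;

function buildTranscriptionPrompt(dictionary: Dictionary): string {
  // Use the mapped (right-hand) spellings, dropping blanks and duplicates while keeping order.
  const spellings = Object.keys(dictionary)
    .map((key) => dictionary[key].trim())
    .filter((term) => term !== "");
  const terms = Array.from(new Set(spellings));

  const lines = [`Keep these wake phrases verbatim: ${PROMPT_WAKE_PHRASES.join(", ")}.`];
  if (terms.length > 0) {
    lines.push(`Preferred spellings: ${terms.join(", ")}.`);
  }
  return lines.join("\n");
}

const sampleDictionary = JSON.parse(
  '{"Billie": "Billie", "Raycast": "Raycast", "GPT-4": "GPT-4"}'
) as Dictionary;
console.log(buildTranscriptionPrompt(sampleDictionary));
// prints:
// Keep these wake phrases verbatim: Heiji, Hey G, Heyji.
// Preferred spellings: Billie, Raycast, GPT-4.
```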
Each dictation run records transient metrics (timestamps t0–t4, total latency, transcript length, mode). Metrics live only in memory and can be inspected via a short Node snippet while the process is running:
    npx ts-node -e 'import { getMetrics, metricsAsMarkdown } from "./src/logger"; console.log(metricsAsMarkdown(getMetrics()));'

(Feel free to wrap this in a small Raycast debug command; metrics are never written to disk.)
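
For readers without the source at hand, here is a rough sketch of what an in-memory metrics logger with these two functions might look like. The names `getMetrics` and `metricsAsMarkdown` appear in this README, but their signatures, the `RunMetrics` shape, `recordMetrics`, and the reading of t0 as recording start and t4 as insertion complete are all assumptions.

```typescript
// Hypothetical in-memory logger; the real ./src/logger API may differ.
type Mode = "dictation" | "wake";

interface RunMetrics {
  // t0..t4 in milliseconds; assumed here: t0 = recording start, t4 = insertion done.
  timestamps: [number, number, number, number, number];
  transcriptLength: number;
  mode: Mode;
}

const runs: RunMetrics[] = []; // lives only in process memory, never written to disk

function recordMetrics(run: RunMetrics): void {
  runs.push(run);
}

function getMetrics(): readonly RunMetrics[] {
  return runs;
}

function metricsAsMarkdown(metrics: readonly RunMetrics[]): string {
  const header = ["| Run | Mode | Total latency (ms) | Transcript length |", "|---|---|---|---|"];
  const rows = metrics.map((run, index) => {
    const totalLatency = run.timestamps[4] - run.timestamps[0];
    return `| ${index + 1} | ${run.mode} | ${totalLatency} | ${run.transcriptLength} |`;
  });
  return [...header, ...rows].join("\n");
}

recordMetrics({ timestamps: [0, 900, 950, 1300, 1350], transcriptLength: 42, mode: "dictation" });
console.log(metricsAsMarkdown(getMetrics()));
```

Because the array lives in module scope, it is only populated inside the process that recorded the runs.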
- Run `npm run lint && npm run test && npm run build` after installing dependencies.
- Dictate command inserts text and routes wake phrases (<1.5 s for short utterances).
- Retranscribe command successfully retries the cached recording.
- Clipboard is restored and Cmd+Z undoes each paste.
- Metrics accessible via `metricsAsMarkdown` during the session.
Follow Raycast’s private publishing flow:
- Run `npx ray login` and `npx ray publish` from the project root.
- Select Private visibility and invite testers by Raycast handle.
- Ensure the README, icon, and command descriptions meet store guidelines before submission.
When you are ready to distribute a new build (private beta or public release), follow this checklist:
- Verify the release

      npm run lint
      npm run test
      npm run build
      npx ray lint
      npx ray build

  `ray build` outputs an optimized bundle in `./dist`.

- Prepare release notes
  - Summarise user-facing changes.
  - Capture any migration steps (e.g. new permissions, required settings).

- Publish through Raycast

      npx ray login    # once per machine
      npx ray publish  # choose Private or Public visibility

  Follow the interactive prompts; attach the `dist` bundle that `ray build` produced if requested.

- Tag the release (optional)

      git tag -a vX.Y.Z -m "Heiji vX.Y.Z"
      git push origin vX.Y.Z

- Notify testers/users
  - Share the Raycast store link.
  - Mention the hotkey and new capabilities to highlight in onboarding.