pezzos/heiji

Project Heiji

Heiji is a Raycast extension that turns any macOS text field into an instant voice-driven AI assistant. The MVP ships three commands:

  • Dictate – records speech, transcribes via OpenAI STT, and inserts text (or routes wake phrases to an LLM).
  • Settings – manages OpenAI credentials, language, silence detection, and dictionary override.
  • Retranscribe Last – retries the most recent recording without re-capturing audio.

Install & Run Locally (for testers)

Requirements

  • Raycast for macOS with the Developer CLI enabled.
  • Node.js 20.15.1 or newer (verified up to Node 24) and npm 10+.
  • SoX for audio capture (brew install sox).
  • An OpenAI API key with access to gpt-5-transcribe and gpt-4o-mini.

Step-by-step

  1. Clone the project

    git clone https://github.com/project-heiji/heiji.git
    cd heiji
  2. Install dependencies

    npm install
  3. Verify the build (optional but recommended)

    npm run lint
    npm run test
    npm run build
  4. Start the extension in Raycast

    npx ray dev

    The Raycast CLI starts a development session and opens Raycast with the three Heiji commands available for as long as the process runs. Keep this terminal window open for hot reloading. If the CLI is not found, ensure Raycast’s Developer CLI setting is enabled and the ray binary is on your PATH (running which ray should print its path).

  5. Assign a hotkey (optional)

    • In Raycast Preferences → Extensions → Heiji, select the Dictate command.
    • Click Add Shortcut and bind it to a convenient key (e.g. ⌥⇧D).

First-time Configuration

  1. Open Raycast and run Heiji: Settings.
  2. Enter your OpenAI API key. Raycast will store it securely.
  3. Adjust preferences if needed:
    • Dictation Language – ISO 639-1 code (e.g. en, fr).
    • Silence Stop Seconds – auto-stop delay after you stop speaking.
    • Dictionary Path Override – optional JSON file that biases transcription.
  4. Close the form; preferences save automatically and you’ll see a confirmation toast.

You’re ready to dictate. Launch the Dictate command and start speaking; Heiji handles transcription, wake commands, and clipboard restoration automatically.

Quick Technical Checks

  • sox --version should report 14.4 or later; if missing, install via brew install sox.
  • ray --version should output the bundled Raycast CLI version.
  • node -v should be ≥ 20.15.1 (tested through Node 24.x).
  • If SoX is installed outside your default PATH, export HEIJI_SOX_PATH=/full/path/to/sox (e.g., /opt/homebrew/bin/sox) before launching Raycast or the CLI.
  • If ambient noise requires tweaking silence detection, set HEIJI_SILENCE_THRESHOLD (e.g., 0.3%) to adjust when recording stops.
  • The transcription prompt explicitly mentions wake phrases (“Heiji”, “Hey G”, “Heyji”) so the STT model keeps them verbatim; avoid removing this guidance when modifying the prompt builder.
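The two environment overrides above could be resolved along the following lines. This is a minimal sketch, not Heiji's actual code: only the variable names HEIJI_SOX_PATH and HEIJI_SILENCE_THRESHOLD come from this README, while resolveSoxConfig, buildRecordArgs, and the 1% fallback threshold are hypothetical. The argument list uses SoX's standard silence effect (start on sound, stop after a quiet period).

```typescript
// Hypothetical helpers; only the two HEIJI_* variable names come from the README.
type Env = Record<string, string | undefined>;

interface SoxConfig {
  binary: string; // executable to spawn
  threshold: string; // SoX silence threshold, e.g. "0.3%"
}

// The "1%" default is an assumption made for this sketch.
function resolveSoxConfig(env: Env, defaultThreshold = "1%"): SoxConfig {
  const binary = env["HEIJI_SOX_PATH"]?.trim() || "sox";
  const raw = env["HEIJI_SILENCE_THRESHOLD"]?.trim();
  // Accept only values shaped like "<number>%"; anything else falls back.
  const threshold = raw && /^\d+(\.\d+)?%$/.test(raw) ? raw : defaultThreshold;
  return { binary, threshold };
}

// Arguments for `sox -d <file> silence ...`: begin recording once sound exceeds
// the threshold for 0.1 s, stop after `stopSeconds` of audio below it.
function buildRecordArgs(outFile: string, stopSeconds: number, threshold: string): string[] {
  return ["-d", outFile, "silence", "1", "0.1", threshold, "1", stopSeconds.toFixed(1), threshold];
}
```

In a Node process, resolveSoxConfig(process.env) would supply the binary and threshold, and the resulting arguments could be handed to child_process.spawn.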

Everyday Usage Guide

  • Basic dictation

    1. Trigger the Heiji: Dictate command.
    2. Speak your text; Heiji stops recording after the configured silence window.
    3. The transcript is pasted into the active application and your original clipboard is restored.
  • Wake phrase actions

    • Start with “Heiji …” (e.g. “Heiji, summarize this into three bullet points”).
    • The remainder is sent to gpt-4o-mini; the LLM response replaces the current selection.
  • Retrying a recording

    • Run Heiji: Retranscribe Last to re-transcribe the most recent audio without speaking again.
  • Troubleshooting

    • Toasts surface recoverable issues, such as a missing transcription or a missing wake response.
    • If you hit an error toast, re-run the Dictate command and try again.
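The paste-then-restore behaviour described under basic dictation can be illustrated with a small sketch. It assumes a hypothetical ClipboardLike interface and a synchronous in-memory fake so the example runs anywhere; a real extension would use the host's asynchronous clipboard API instead, and Heiji's own implementation may differ.

```typescript
// Hypothetical clipboard abstraction, synchronous for brevity.
interface ClipboardLike {
  read(): string;
  write(text: string): void;
  paste(): void; // stands in for pasting into the front app
}

// Save the clipboard, paste the transcript, then put the original content back.
function pasteAndRestore(clipboard: ClipboardLike, transcript: string): void {
  const saved = clipboard.read();
  try {
    clipboard.write(transcript);
    clipboard.paste();
  } finally {
    clipboard.write(saved); // restore even if pasting throws
  }
}

// In-memory fake that records what was pasted.
class FakeClipboard implements ClipboardLike {
  pasted: string[] = [];
  constructor(private content: string) {}
  read(): string {
    return this.content;
  }
  write(text: string): void {
    this.content = text;
  }
  paste(): void {
    this.pasted.push(this.content);
  }
}
```

The finally block is the design point: the user's clipboard is restored whether or not insertion succeeds.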

Commands overview

Command                    Purpose
Heiji: Dictate             Records audio and inserts the transcript or wake response into the front app
Heiji: Settings            Configures API key, language, silence timeout, and dictionary path
Heiji: Retranscribe Last   Reuses the cached audio file and re-runs transcription

Developer Workflow

If you are contributing to the extension, the repository defines convenience scripts:

  • npm run lint – runs ESLint (reports a skip if dependencies are not yet installed).
  • npm run test – executes the ts-node test suites (dictionary, audio, OpenAI client, insertion, wake, dictate, retranscribe, logger).
  • npm run build – type-checks the extension with tsc --noEmit.

Note: Commands are safe to run repeatedly; when dependencies are missing the script reports the skip explicitly.

Commands

Dictate

  1. Trigger via Raycast (e.g., keyboard shortcut).
  2. HUD flow: Listening… → Processing… → Inserted (or Heiji Action complete for wake requests).
  3. Wake phrases (Heiji, Hey G, Heyji) are stripped before being sent to gpt-4o-mini.
  4. Clipboard content is restored automatically after insertion; Cmd+Z reverts the paste.
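Step 3 implies a routing decision: either the transcript is plain dictation, or it starts with a wake phrase and the remainder becomes the LLM instruction. A minimal sketch follows, assuming case-insensitive prefix matching with a word-boundary check. The three phrases come from this README; the Route type, routeTranscript, and the exact matching rules are hypothetical.

```typescript
type Route =
  | { mode: "dictation"; text: string }
  | { mode: "wake"; instruction: string };

// Wake phrases listed in the README, lower-cased for comparison.
const WAKE_PHRASES = ["heiji", "hey g", "heyji"];

function routeTranscript(transcript: string): Route {
  const trimmed = transcript.trim();
  const lower = trimmed.toLowerCase();
  for (const phrase of WAKE_PHRASES) {
    if (!lower.startsWith(phrase)) continue;
    const rest = trimmed.slice(phrase.length);
    // Require whitespace or punctuation after the phrase so that
    // "Hey George" is not mistaken for the wake phrase "Hey G".
    if (rest.length > 0 && !/^[\s,.:;!?-]/.test(rest)) continue;
    // Strip the separator; what remains is the instruction for the LLM.
    const instruction = rest.replace(/^[\s,.:;!?-]+/, "");
    if (instruction.length > 0) return { mode: "wake", instruction };
  }
  // No wake phrase, or a wake phrase with nothing after it: treat as dictation.
  return { mode: "dictation", text: trimmed };
}
```

For example, routeTranscript("Heiji, summarize this into three bullet points") yields a wake route whose instruction is "summarize this into three bullet points".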

Settings

Exposes the four preferences required by the MVP:

Preference             Notes
OpenAI API Key         Stored securely by Raycast.
Dictation Language     ISO 639-1 code controlling STT language (default fr).
Silence Stop Seconds   Controls voice-activity timeout (default 2.0).
Dictionary Path        Leave blank to use the auto-generated path in ~/Library/Application Support.
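To make the constraints in the table concrete, here is a sketch of how raw preference values might be validated. The defaults (fr, 2.0) come from the table; the field names, the HeijiPreferences type, and parsePreferences are hypothetical, and the language check only verifies the two-letter shape rather than membership in ISO 639-1.

```typescript
interface HeijiPreferences {
  apiKey: string;
  language: string; // two-letter code, e.g. "en" or "fr"
  silenceStopSeconds: number;
  dictionaryPath: string | undefined; // undefined means "use the auto-generated path"
}

// Assumes raw values arrive as strings keyed by hypothetical preference names.
function parsePreferences(raw: Record<string, string | undefined>): HeijiPreferences {
  const apiKey = (raw["apiKey"] ?? "").trim();
  if (apiKey.length === 0) throw new Error("OpenAI API key is required");

  // Blank language falls back to the documented default.
  const language = (raw["language"]?.trim() || "fr").toLowerCase();
  if (!/^[a-z]{2}$/.test(language)) throw new Error(`Not a two-letter language code: ${language}`);

  // Blank timeout falls back to the documented default of 2.0 seconds.
  const seconds = Number(raw["silenceStopSeconds"]?.trim() || "2.0");
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error("Silence Stop Seconds must be a positive number");
  }

  const dictionaryPath = raw["dictionaryPath"]?.trim() || undefined;
  return { apiKey, language, silenceStopSeconds: seconds, dictionaryPath };
}
```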

Retranscribe Last

  • Shows a Retranscribing… HUD while reusing the cached audio file from the last dictation.
  • On success, inserts the new transcript and shows Retranscribed.
  • On failure, displays a toast and leaves the clipboard untouched.
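The three behaviours above amount to a small control flow, sketched here with injected dependencies so it can run without Raycast or OpenAI. The HUD strings come from this README; the Ui interface, the synchronous transcribe callback, and the failure messages are hypothetical simplifications.

```typescript
// Hypothetical UI surface; a real extension would call the host's HUD and toast APIs.
interface Ui {
  hud(message: string): void;
  toast(message: string): void;
  insert(text: string): void;
}

// Returns true when a new transcript was inserted.
function retranscribeLast(
  cachedAudioPath: string | undefined,
  transcribe: (audioPath: string) => string, // synchronous for brevity
  ui: Ui,
): boolean {
  if (cachedAudioPath === undefined) {
    ui.toast("No cached recording to retranscribe");
    return false;
  }
  ui.hud("Retranscribing…");
  try {
    const transcript = transcribe(cachedAudioPath).trim();
    if (transcript.length === 0) throw new Error("empty transcript");
    ui.insert(transcript);
    ui.hud("Retranscribed");
    return true;
  } catch (error) {
    // Nothing has been inserted at this point, so the clipboard is never touched.
    const reason = error instanceof Error ? error.message : String(error);
    ui.toast(`Retranscription failed: ${reason}`);
    return false;
  }
}
```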

Dictionary Sample

A sample mapping is provided in dictionary.sample.json:

{
  "Billie": "Billie",
  "Raycast": "Raycast",
  "GPT-4": "GPT-4"
}

Drop a customised JSON file at the path set in Settings to bias the STT prompt.
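One plausible way such a dictionary biases transcription is by folding its preferred spellings, together with the wake phrases, into the prompt sent with the STT request. The sketch below assumes exactly that; parseDictionary, buildTranscriptionPrompt, and the prompt wording are hypothetical and not taken from Heiji's prompt builder.

```typescript
type Dictionary = Record<string, string>;

// Parse the JSON file contents, keeping only string-to-string entries.
function parseDictionary(json: string): Dictionary {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("Dictionary must be a JSON object");
  }
  const out: Dictionary = {};
  for (const [key, value] of Object.entries(data)) {
    if (typeof value === "string") out[key] = value; // skip non-string values
  }
  return out;
}

// Assemble a short prompt: wake phrases first, then de-duplicated preferred spellings.
function buildTranscriptionPrompt(dictionary: Dictionary, wakePhrases: string[]): string {
  const terms = Array.from(new Set(Object.values(dictionary))).sort();
  const parts: string[] = [];
  if (wakePhrases.length > 0) {
    parts.push(`Keep these wake phrases verbatim: ${wakePhrases.join(", ")}.`);
  }
  if (terms.length > 0) {
    parts.push(`Preferred spellings: ${terms.join(", ")}.`);
  }
  return parts.join(" ");
}
```

With the sample dictionary and the three wake phrases, the result is a single line listing "Heiji, Hey G, Heyji" followed by "Billie, GPT-4, Raycast".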

Logging & Metrics

Each dictation run records transient metrics (timestamps t0–t4, total latency, transcript length, mode). Metrics live only in memory and can be inspected via a short Node snippet while the process is running:

npx ts-node -e 'import { getMetrics, metricsAsMarkdown } from "./src/logger"; console.log(metricsAsMarkdown(getMetrics()));'

(Feel free to wrap this in a small Raycast debug command; metrics are never written to disk.)
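For orientation, an in-memory metrics store of this kind might look like the sketch below. The names getMetrics and metricsAsMarkdown appear in this README, but their real signatures in src/logger are not shown here; the RunMetrics shape, recordRun, the table layout, and the assumption that total latency equals t4 minus t0 are all illustrative.

```typescript
type Mode = "dictation" | "wake";

interface RunMetrics {
  timestamps: [number, number, number, number, number]; // t0..t4 in milliseconds
  transcriptLength: number;
  mode: Mode;
}

// Module-level array: lives only in memory and is lost when the process exits.
const runs: RunMetrics[] = [];

function recordRun(run: RunMetrics): void {
  runs.push(run);
}

function getMetrics(): readonly RunMetrics[] {
  return runs;
}

// Render one table row per run; latency is assumed to be t4 - t0.
function metricsAsMarkdown(metrics: readonly RunMetrics[]): string {
  const header = "| Run | Mode | Latency (ms) | Chars |\n| --- | --- | --- | --- |";
  const rows = metrics.map((m, i) => {
    const latency = m.timestamps[4] - m.timestamps[0];
    return `| ${i + 1} | ${m.mode} | ${latency} | ${m.transcriptLength} |`;
  });
  return [header, ...rows].join("\n");
}
```

Because the array is module state, it is only visible to code running inside the same process that recorded the runs.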

Manual Verification Checklist

  • Run npm run lint && npm run test && npm run build after installing dependencies.
  • Dictate command inserts text and routes wake phrases, completing in under 1.5 s for short utterances.
  • Retranscribe command successfully retries the cached recording.
  • Clipboard is restored and Cmd+Z undoes each paste.
  • Metrics accessible via metricsAsMarkdown during the session.

Private Beta Publishing

Follow Raycast’s private publishing flow:

  1. Run npx ray login, then npx ray publish, from the project root.
  2. Select Private visibility and invite testers by Raycast handle.
  3. Ensure the README, icon, and command descriptions meet store guidelines before submission.

Shipping the Extension

When you are ready to distribute a new build (private beta or public release), follow this checklist:

  1. Verify the release

    npm run lint
    npm run test
    npm run build
    npx ray lint
    npx ray build

    ray build outputs an optimized bundle in ./dist.

  2. Prepare release notes

    • Summarise user-facing changes.
    • Capture any migration steps (e.g. new permissions, required settings).
  3. Publish through Raycast

    npx ray login          # once per machine
    npx ray publish        # choose Private or Public visibility

    Follow the interactive prompts; attach the dist bundle that ray build produced if requested.

  4. Tag the release (optional)

    git tag -a vX.Y.Z -m "Heiji vX.Y.Z"
    git push origin vX.Y.Z
  5. Notify testers/users

    • Share the Raycast store link.
    • Mention the hotkey and new capabilities to highlight in onboarding.

For background, see BRIEF.md, SPEC.md, and PLAN.md.
