ITO-177: Restructure grpc streaming #354

fulltimemike · 2025-10-20T22:13:00Z

No description provided.

github-actions · 2025-10-20T22:13:18Z

Resolves #177

fulltimemike · 2025-10-20T22:13:35Z

app/components/home/contents/NotesContent.tsx

+    const fmt = new Intl.DateTimeFormat('en-GB', {
+      hour: '2-digit',
+      minute: '2-digit',
+      second: '2-digit',
+      fractionalSecondDigits: 3,
+      hour12: false,
+    })
+
+    const timestamp = fmt.format(new Date())
+    console.log(`${timestamp}: Pasted content: ${content}`)


Utility log for getting rough timing data on when the transcript is fully done.

fulltimemike · 2025-10-20T22:14:06Z

app/components/pill/Pill.tsx

+          // Only clear volume history when recording stops
+          setVolumeHistory([])


Users can now switch ito modes mid-stream without the audio bars being cleared. The bars simply change color.

fulltimemike · 2025-10-20T22:45:00Z

lib/main/grammar/GrammarRulesService.ts

+  public setCursorContext(context: string): void {
+    this.cursorContext = context
+  }


Main change here is to set the cursor context when the transcription stream is kicked off, so that the context is available whenever a transcript is returned.

Why aren't we tracking the cursor context in the lifecycle of the transcription instead? Not a fan of storing this in the local memory of the class since this is a singleton used across the app

I agree, this isn't as clean as it could be. I think the grammarRulesService does not have to be a singleton. I would rather construct the class with the necessary context, and then we don't have to worry about using it elsewhere.

I don't want to put grammar rules context in itoSession though, as I feel that's a lower level than itoSession
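The per-session construction discussed in this thread could look roughly like the sketch below. It is only an illustration: the constructor signature, the `applyCapitalization` rule, and the `createGrammarRulesForSession` helper are hypothetical and are not taken from the actual `GrammarRulesService`.

```typescript
// Hypothetical sketch: the real GrammarRulesService has its own rules and API.
class GrammarRulesService {
  private readonly cursorContext: string

  // The context is fixed at construction, so it cannot leak between sessions.
  constructor(cursorContext: string) {
    this.cursorContext = cursorContext
  }

  // Illustrative rule only: capitalize the first letter when the text before
  // the cursor is empty or ends a sentence, otherwise lowercase it.
  public applyCapitalization(transcript: string): string {
    if (transcript.length === 0) return transcript
    const before = this.cursorContext.trimEnd()
    const startsSentence = before.length === 0 || /[.!?]$/.test(before)
    const first = transcript.charAt(0)
    const fixedFirst = startsSentence ? first.toUpperCase() : first.toLowerCase()
    return fixedFirst + transcript.slice(1)
  }
}

// One instance per transcription session instead of a shared singleton.
function createGrammarRulesForSession(cursorContext: string): GrammarRulesService {
  return new GrammarRulesService(cursorContext)
}

const midSentence = createGrammarRulesForSession('I went to the')
const newSentence = createGrammarRulesForSession('All done.')
console.log(midSentence.applyCapitalization('Store today')) // "store today"
console.log(newSentence.applyCapitalization('next item')) // "Next item"
```

Because each session owns its instance, nothing else in the app can observe or overwrite the cursor context while a transcript is pending.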

fulltimemike · 2025-10-20T22:46:11Z

lib/main/itoSession.ts

+import { getAdvancedSettings } from './store'
+import log from 'electron-log'
+
+export class ItoSession {


ItoSession is mainly a coordinator class that stays abstract. It orchestrates setting up audio streaming, running the grpc stream, pasting the resulting transcript, and sending updates to the pill window.

fulltimemike · 2025-10-20T22:46:51Z

lib/main/itoStreamController.ts

+ * ItoStreamController manages the lifecycle of a transcription stream using TranscribeStreamV2.
+ * It allows sending metadata/config, streaming audio, and updating settings during the stream.
+ */
+export class ItoStreamController {


This class primarily handles sending the audio and context streams in parallel

fulltimemike · 2025-10-20T22:48:48Z

lib/main/voiceInputService.ts

+   * Starts audio recording and handles system audio muting.
+   * Does NOT start the ItoStreamController - that should be done separately.
+   */
+  public startAudioRecording = () => {


VoiceInputService has been refactored so that it is no longer a pass-through call to kick off transcription streaming. It should primarily be responsible for audio at an abstract level. This class could probably be merged with another one (perhaps audio recorder?) if we want to.

fulltimemike · 2025-10-20T22:50:12Z

lib/media/keyboard.ts

-// Debouncing state
-let shortcutDebounceTimeout: NodeJS.Timeout | null = null
-let pendingShortcut: KeyboardShortcutConfig | null = null
-export const DEBOUNCE_TIME = 10
-


No more debounce! Users can change ito mode mid-stream, which means being slow to hit both keys necessary for intelligent mode is no longer a problem.

fulltimemike · 2025-10-20T22:51:42Z

lib/media/keyboard.ts

 import { BrowserWindow } from 'electron'
-import { audioRecorderService } from './audio'
-import { voiceInputService } from '../main/voiceInputService'
+import { itoSession } from '../main/itoSession'


keyboard now starts itoSession instead of interactionManager and voiceInputService. Both of those are now managed by itoSession.

fulltimemike · 2025-10-20T22:52:31Z

lib/main/itoSessionManager.ts

+    // Check actual audio duration (keyboard duration can be misleading due to latency)
+    const audioDurationMs = itoStreamController.getAudioDurationMs()
+
+    if (audioDurationMs < this.MINIMUM_AUDIO_DURATION_MS) {


I think it is cleaner to have the itoSession itself cancel its own stream, instead of buffering audio and only beginning to stream once we have met the minimum buffer.
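A minimal sketch of the cancel-on-short-audio idea follows. Only `getAudioDurationMs` and the `MINIMUM_AUDIO_DURATION_MS` name appear in the diff; the threshold value, the `StreamLike` interface, and the `cancel`/`finish` methods are assumptions made for illustration.

```typescript
// Hypothetical threshold: the diff does not show the real value.
const MINIMUM_AUDIO_DURATION_MS = 100

// Minimal stand-in for the stream controller (cancel/finish are assumed names).
interface StreamLike {
  getAudioDurationMs(): number
  cancel(): void
  finish(): void
}

type SessionOutcome = 'cancelled' | 'completed'

// Streaming starts immediately; the session decides at the end whether the
// recorded audio was long enough to keep.
function completeSession(
  stream: StreamLike,
  minimumMs: number = MINIMUM_AUDIO_DURATION_MS,
): SessionOutcome {
  if (stream.getAudioDurationMs() < minimumMs) {
    stream.cancel()
    return 'cancelled'
  }
  stream.finish()
  return 'completed'
}

// Small fake used to exercise the decision without any real audio.
function makeFakeStream(durationMs: number): StreamLike & { calls: string[] } {
  const calls: string[] = []
  return {
    calls,
    getAudioDurationMs: () => durationMs,
    cancel: () => {
      calls.push('cancel')
    },
    finish: () => {
      calls.push('finish')
    },
  }
}

console.log(completeSession(makeFakeStream(40))) // "cancelled"
console.log(completeSession(makeFakeStream(2500))) // "completed"
```

The trade-off is that a very short press still opens a stream that is then thrown away, in exchange for never delaying the first audio chunk.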

fulltimemike · 2025-10-20T22:56:56Z

server/src/services/ito/transcribeStreamV2Handler.ts

+  timestamp: number
+}
+
+export class TranscribeStreamV2Handler {


Now that we are handling mid-stream updates, this new handler has a few important changes over TranscribeV1 (left as a deprecated function that we should support until all customers are on the new version):

We merge context updates so that late-arriving context overwrites pre-existing context (this includes mode!).

There is a mode change grace period at the end of the stream, where a user releasing keys may trigger a mode change. If the change happens in a slim window at the end of the stream, we assume the user changed modes accidentally. Without this, a user with fn + ctl as intelligent mode might release ctl a fraction of a second before releasing fn, triggering a mode change to dictation at the very end.

Audio and context are processed via the stream instead of headers.
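The grace-period rule described above can be sketched as follows. The `ModeChangeRecord` name and its `timestamp` field come from the diff; the `mode` field, the string-union `ItoMode`, the 300 ms window, and `resolveFinalMode` are assumptions for illustration and may not match the real handler.

```typescript
// Assumed shapes: only the ModeChangeRecord name and `timestamp` appear in the diff.
type ItoMode = 'dictation' | 'intelligent'

interface ModeChangeRecord {
  mode: ItoMode
  timestamp: number // milliseconds
}

// Hypothetical window length; the PR does not state the real value.
const MODE_CHANGE_GRACE_PERIOD_MS = 300

// Picks the mode to use for the finished stream. Mode changes that arrived
// inside the grace window before the stream ended are treated as accidental
// key releases and ignored. The first record is always eligible so that a
// very short stream still resolves to its starting mode.
function resolveFinalMode(
  history: ModeChangeRecord[],
  streamEndTimestamp: number,
  gracePeriodMs: number = MODE_CHANGE_GRACE_PERIOD_MS,
): ItoMode | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const record = history[i]
    if (record === undefined) continue
    const age = streamEndTimestamp - record.timestamp
    if (i === 0 || age >= gracePeriodMs) return record.mode
  }
  return undefined
}

// ctl released 50 ms before fn: the late switch to dictation is ignored.
const accidental: ModeChangeRecord[] = [
  { mode: 'intelligent', timestamp: 0 },
  { mode: 'dictation', timestamp: 1950 },
]
console.log(resolveFinalMode(accidental, 2000)) // "intelligent"

// A switch made a full second before the end is treated as deliberate.
const deliberate: ModeChangeRecord[] = [
  { mode: 'intelligent', timestamp: 0 },
  { mode: 'dictation', timestamp: 1000 },
]
console.log(resolveFinalMode(deliberate, 2000)) // "dictation"
```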

lib/main/itoSession.ts

Co-authored-by: Julian <[email protected]>

fulltimemike · 2025-10-21T17:35:27Z

lib/main/itoSession.ts

+      }
+
+      // Send transcription result to main window
+      this.windowMessenger.sendTranscriptionResult(response)


doesn't do anything anymore, delete this entire windowMessenger class

fulltimemike · 2025-10-21T17:36:36Z

lib/main/itoSession.ts

+import { getAdvancedSettings } from './store'
+import log from 'electron-log'
+
+export class ItoSession {


Consider renaming to ItoSessionManager

fulltimemike · 2025-10-21T17:37:29Z

lib/main/itoSession.ts

+  public setMainWindow(mainWindow: any) {
+    this.windowMessenger.setMainWindow(mainWindow)
+  }
+}


fulltimemike · 2025-10-21T17:37:42Z

lib/main/main.ts

-  // Set main window for transcription service so it can send messages
-  transcriptionService.setMainWindow(mainWindow)
+  // Set main window for ito session so it can send messages
+  itoSession.setMainWindow(mainWindow)


julgmz · 2025-10-21T17:55:41Z

lib/main/itoStreamController.ts

+    }
+
+    this.stopStreaming()
+    this.audioStreamManager.clearInteractionAudio()


pull this into stopStreaming

julgmz · 2025-10-21T17:56:38Z

lib/main/itoStreamController.ts

+  private audioStreamManager = new AudioStreamManager()
+
+  private hasStartedGrpc = false
+  private currentMode: ItoMode | null = null


default to dictation

julgmz · 2025-10-21T17:58:01Z

lib/main/itoStreamController.ts

+            transcriptionPrompt:
+              context.advancedSettings.llm.transcriptionPrompt,
+            editingPrompt: context.advancedSettings.llm.editingPrompt,
+            asrModel: '',


Double-check whether duplicating some of these fields is necessary.

julgmz · 2025-10-21T18:12:42Z

lib/media/keyboard.ts

-      }, DEBOUNCE_TIME) // debounce
+  // Handle shortcut activation and mode changes
+  if (currentlyHeldShortcut) {
+    if (!isShortcutActive) {


refactor to remove isShortcutActive and replace it with ternary logic on activeShortcutId
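One way the suggested refactor might look is sketched below: the boolean flag disappears because "active" is simply `activeShortcutId !== null`. Only the `activeShortcutId` name comes from the comment; `ShortcutState`, `ShortcutAction`, and `nextAction` are hypothetical.

```typescript
// Hypothetical state shape: a null id means no shortcut is currently active.
interface ShortcutState {
  activeShortcutId: string | null
}

type ShortcutAction = 'start' | 'changeMode' | 'stop' | 'none'

// Derives the action from the previous and current shortcut ids alone,
// then records the new id on the state.
function nextAction(state: ShortcutState, heldShortcutId: string | null): ShortcutAction {
  const previous = state.activeShortcutId
  const action: ShortcutAction =
    heldShortcutId === null
      ? previous === null
        ? 'none'
        : 'stop'
      : previous === null
        ? 'start'
        : previous === heldShortcutId
          ? 'none'
          : 'changeMode'
  state.activeShortcutId = heldShortcutId
  return action
}

const state: ShortcutState = { activeShortcutId: null }
console.log(nextAction(state, 'dictation')) // "start"
console.log(nextAction(state, 'intelligent')) // "changeMode"
console.log(nextAction(state, null)) // "stop"
```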

CLAUDE.md

JohnDonavon · 2025-10-21T18:17:06Z

lib/main/voiceInputService.ts

-    const recordingStatePayload: RecordingStatePayload = {
-      isRecording: true,
+    // Start audio recorder
+    log.info(


Note: we should prefer using console.log over log.info as we have some logic to send console logs to our server.

julgmz · 2025-10-21T18:27:46Z

server/src/services/ito/transcribeStreamV2Handler.ts

+      llmSettings: undefined,
+      vocabulary: [],
+    })
+    const modeHistory: ModeChangeRecord[] = []


check edge case where we start intelligent mode then release the keys, sending an unintentional dictation mode update

julgmz · 2025-10-21T18:32:09Z

server/src/services/ito/transcribeStreamV2Handler.ts

+        : base.llmSettings,
+      vocabulary:
+        update.vocabulary.length > 0
+          ? [...base.vocabulary, ...update.vocabulary]


just take the update here, don't spread the base
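A small sketch of the suggested behavior: a non-empty update replaces the stored vocabulary instead of being appended to it, so a resent list does not accumulate duplicates. The `StreamContext` shape and `mergeContext` name are assumptions; the real merged context has more fields (for example `llmSettings`).

```typescript
// Assumed, simplified context shape for illustration only.
interface StreamContext {
  mode: string | undefined
  vocabulary: string[]
}

// Late-arriving values win. An empty or missing value in the update means
// "no change", so the base value is kept in that case.
function mergeContext(base: StreamContext, update: StreamContext): StreamContext {
  return {
    mode: update.mode !== undefined ? update.mode : base.mode,
    // Take the update as-is rather than spreading the base into it.
    vocabulary: update.vocabulary.length > 0 ? [...update.vocabulary] : base.vocabulary,
  }
}

const base: StreamContext = { mode: 'dictation', vocabulary: ['ito', 'grpc'] }
const merged = mergeContext(base, { mode: 'intelligent', vocabulary: ['ito', 'electron'] })
console.log(merged.mode) // "intelligent"
console.log(merged.vocabulary) // ["ito", "electron"]
```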

ITO-177: Restructure grpc streaming

be86823

fulltimemike commented Oct 20, 2025

update comments/order

9e4e412

fulltimemike commented Oct 20, 2025

deprecate and split out old handler

618d665

julgmz reviewed Oct 21, 2025

lib/main/itoSession.ts (outdated, resolved)

julgmz reviewed Oct 21, 2025

lib/main/itoSession.ts (outdated, resolved)

fulltimemike and others added 5 commits October 21, 2025 11:50

log error for canceled session

90fa5a1

Update lib/main/itoSession.ts

8e1bd97

Co-authored-by: Julian <[email protected]>

make grammarRules service a new instantiation each session

26a6e3e

fix test now that we dont pass in 4

9bb0a22

fix formatting

6d04d79

fulltimemike commented Oct 21, 2025

julgmz reviewed Oct 21, 2025

CLAUDE.md (resolved)

JohnDonavon reviewed Oct 21, 2025

julgmz reviewed Oct 21, 2025

julgmz approved these changes Oct 21, 2025

fulltimemike added 10 commits October 21, 2025 17:31

feedback

941e26f

console logs and feedback

eae63df

adjust tests

716d65a

keyboard suggestions

df28267

handle aborts

71ae672

fix mode change edge case

3b9eb69

refactor abort error

aa6bb9d

Split out audio processing

dc60991

remove unnecessary transcription settings

93418fa

handle ending stream as another one begins

73c55c0

fulltimemike merged commit bd6881a into dev Oct 23, 2025
4 checks passed

4 participants
