ITO-177: Restructure grpc streaming #354
Resolves #177
| const fmt = new Intl.DateTimeFormat('en-GB', { | ||
| hour: '2-digit', | ||
| minute: '2-digit', | ||
| second: '2-digit', | ||
| fractionalSecondDigits: 3, | ||
| hour12: false, | ||
| }) | ||
|
|
||
| const timestamp = fmt.format(new Date()) | ||
| console.log(`${timestamp}: Pasted content: ${content}`) |
Utility log for getting rough timing data on when the transcript is fully done.
| // Only clear volume history when recording stops | ||
| setVolumeHistory([]) |
Users can now switch ito modes mid-stream, but the audio bars will not be cleared. They'll simply change color.
| public setCursorContext(context: string): void { | ||
| this.cursorContext = context | ||
| } |
Main change here is to set the cursor context when the transcription stream is kicked off, so that the context is available whenever a transcript is returned.
Why aren't we tracking the cursor context in the lifecycle of the transcription instead? I'm not a fan of storing this in the local memory of the class, since it is a singleton used across the app.
I agree, this isn't as clean as it could be. I think the grammarRulesService does not have to be a singleton. I would rather construct the class with the necessary context, and then we don't have to worry about using it elsewhere.
I don't want to put grammar rules context in itoSession though, as I feel that's a lower level than itoSession.
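As a rough illustration of the "construct the class with the necessary context" idea, here is a minimal sketch. The real grammarRulesService API is not shown in this PR, so the constructor signature, the `applyRules` method, and the lowercase-mid-sentence rule are all hypothetical.

```typescript
// Hypothetical sketch: names and the rule itself are assumptions,
// not the actual grammarRulesService from the codebase.
class GrammarRulesService {
  private readonly cursorContext: string

  // Context is fixed at construction, so no state is shared between
  // transcriptions the way it would be with an app-wide singleton.
  constructor(cursorContext: string) {
    this.cursorContext = cursorContext
  }

  // Illustrative rule only: lowercase the first letter when the text
  // before the cursor does not end a sentence.
  public applyRules(transcript: string): string {
    const trimmed = this.cursorContext.trimEnd()
    const startsSentence = trimmed === '' || /[.!?]$/.test(trimmed)
    if (startsSentence || transcript === '') return transcript
    return transcript.charAt(0).toLowerCase() + transcript.slice(1)
  }
}

// One instance per transcription, created when the stream is kicked off.
const rules = new GrammarRulesService('I went to the')
console.log(rules.applyRules('Store yesterday'))
```

With this shape, the cursor context lives exactly as long as the transcription that needs it, and nothing has to be reset between sessions.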
lib/main/itoSession.ts
Outdated
| import { getAdvancedSettings } from './store' | ||
| import log from 'electron-log' | ||
|
|
||
| export class ItoSession { |
ItoSession is mainly a coordinator class that stays abstract. It orchestrates setting up audio streaming, running the gRPC stream, pasting the resulting transcript, and sending updates to the pill window.
| * ItoStreamController manages the lifecycle of a transcription stream using TranscribeStreamV2. | ||
| * It allows sending metadata/config, streaming audio, and updating settings during the stream. | ||
| */ | ||
| export class ItoStreamController { |
This class primarily handles sending the audio and context streams in parallel.
| * Starts audio recording and handles system audio muting. | ||
| * Does NOT start the ItoStreamController - that should be done separately. | ||
| */ | ||
| public startAudioRecording = () => { |
VoiceInputService has been refactored so that it is no longer a pass-through call to kick off transcription streaming. It should primarily be responsible for audio at an abstract level. This class could probably be merged with another one (perhaps audio recorder?) if we want to.
| // Debouncing state | ||
| let shortcutDebounceTimeout: NodeJS.Timeout | null = null | ||
| let pendingShortcut: KeyboardShortcutConfig | null = null | ||
| export const DEBOUNCE_TIME = 10 | ||
|
|
No more debounce! Users can change ito mode mid-stream, which means being slow to hit both keys necessary for intelligent mode is no longer a problem.
lib/media/keyboard.ts
Outdated
| import { BrowserWindow } from 'electron' | ||
| import { audioRecorderService } from './audio' | ||
| import { voiceInputService } from '../main/voiceInputService' | ||
| import { itoSession } from '../main/itoSession' |
keyboard now starts itoSession instead of interactionManager and voiceInputService. Both are now managed by itoSession.
| // Check actual audio duration (keyboard duration can be misleading due to latency) | ||
| const audioDurationMs = itoStreamController.getAudioDurationMs() | ||
|
|
||
| if (audioDurationMs < this.MINIMUM_AUDIO_DURATION_MS) { |
I think it is cleaner to have the itoSession cancel its own stream than to buffer audio and begin streaming only once we have met the minimum buffer.
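A minimal sketch of that cancel-on-short-audio decision follows. The `StreamLike` interface, the `completeOrCancel` helper, the fake stream, and the 100 ms threshold are all assumptions for illustration; only `getAudioDurationMs` and `MINIMUM_AUDIO_DURATION_MS` appear in the diff, and the real threshold value is not shown.

```typescript
// Assumed threshold; the PR does not show the real value.
const MINIMUM_AUDIO_DURATION_MS = 100

// Hypothetical minimal surface of a stream controller.
interface StreamLike {
  getAudioDurationMs(): number
  cancel(): void
  finish(): void
}

// The stream starts immediately; the session decides at the end whether
// enough audio was captured, rather than buffering before streaming.
function completeOrCancel(stream: StreamLike): 'cancelled' | 'finished' {
  if (stream.getAudioDurationMs() < MINIMUM_AUDIO_DURATION_MS) {
    stream.cancel()
    return 'cancelled'
  }
  stream.finish()
  return 'finished'
}

// Fake stream used only to exercise the helper.
function makeFakeStream(durationMs: number): StreamLike & { log: string[] } {
  const log: string[] = []
  return {
    log,
    getAudioDurationMs: () => durationMs,
    cancel: () => {
      log.push('cancel')
    },
    finish: () => {
      log.push('finish')
    },
  }
}

console.log(completeOrCancel(makeFakeStream(40)))
console.log(completeOrCancel(makeFakeStream(900)))
```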
| timestamp: number | ||
| } | ||
|
|
||
| export class TranscribeStreamV2Handler { |
Now that we handle mid-stream updates, this new handler has a few important changes over TranscribeV1 (left as a deprecated function that we should support until all customers are on the new version):
- We merge context updates so that late-arriving context overwrites pre-existing context (this includes mode!).
- There is a mode change grace period at the end of the stream, because a user releasing keys may trigger a mode change. If the change happens in a slim window at the end of the stream, we assume the user changed modes accidentally. Without this, a user with fn + ctrl as intelligent mode might release ctrl a fraction of a second before releasing fn, triggering a mode change to dictation at the very end.
- Audio and context are processed via the stream instead of headers.
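The grace period described above could be sketched roughly as follows. This is not the handler's actual code: the `ItoMode` string union, the `mode` field on `ModeChangeRecord`, the `resolveFinalMode` helper, and the 300 ms window are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real ItoMode and ModeChangeRecord may differ.
type ItoMode = 'dictation' | 'intelligent'

interface ModeChangeRecord {
  mode: ItoMode
  timestamp: number // milliseconds
}

// Assumed window length; the PR does not state the real value.
const MODE_CHANGE_GRACE_PERIOD_MS = 300

// Pick the mode to use for the transcript, ignoring changes that landed
// inside the grace window right before the stream ended.
function resolveFinalMode(
  history: ModeChangeRecord[],
  streamEndMs: number,
  graceMs: number = MODE_CHANGE_GRACE_PERIOD_MS,
): ItoMode | null {
  const cutoff = streamEndMs - graceMs
  for (let i = history.length - 1; i >= 0; i--) {
    const record = history[i]
    if (record && record.timestamp <= cutoff) return record.mode
  }
  // Every change fell inside the window: fall back to the earliest one.
  return history[0]?.mode ?? null
}

// fn + ctrl held from t=0, ctrl released 50 ms before the stream ends.
const modeHistory: ModeChangeRecord[] = [
  { mode: 'intelligent', timestamp: 0 },
  { mode: 'dictation', timestamp: 1950 },
]
console.log(resolveFinalMode(modeHistory, 2000))
```

In the example, the late switch to dictation is treated as accidental, so the stream is processed as intelligent mode.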
lib/main/itoSession.ts
Outdated
| } | ||
|
|
||
| // Send transcription result to main window | ||
| this.windowMessenger.sendTranscriptionResult(response) |
This doesn't do anything anymore; delete the entire windowMessenger class.
lib/main/itoSession.ts
Outdated
| import { getAdvancedSettings } from './store' | ||
| import log from 'electron-log' | ||
|
|
||
| export class ItoSession { |
Consider renaming to ItoSessionManager
lib/main/itoSession.ts
Outdated
| public setMainWindow(mainWindow: any) { | ||
| this.windowMessenger.setMainWindow(mainWindow) | ||
| } | ||
| } |
delete
lib/main/main.ts
Outdated
| // Set main window for transcription service so it can send messages | ||
| transcriptionService.setMainWindow(mainWindow) | ||
| // Set main window for ito session so it can send messages | ||
| itoSession.setMainWindow(mainWindow) |
delete
lib/main/itoStreamController.ts
Outdated
| } | ||
|
|
||
| this.stopStreaming() | ||
| this.audioStreamManager.clearInteractionAudio() |
pull this into stopStreaming
lib/main/itoStreamController.ts
Outdated
| private audioStreamManager = new AudioStreamManager() | ||
|
|
||
| private hasStartedGrpc = false | ||
| private currentMode: ItoMode | null = null |
default to dictation
lib/main/itoStreamController.ts
Outdated
| transcriptionPrompt: | ||
| context.advancedSettings.llm.transcriptionPrompt, | ||
| editingPrompt: context.advancedSettings.llm.editingPrompt, | ||
| asrModel: '', |
Double-check whether duplicating some of these fields is necessary.
lib/media/keyboard.ts
Outdated
| }, DEBOUNCE_TIME) // debounce | ||
| // Handle shortcut activation and mode changes | ||
| if (currentlyHeldShortcut) { | ||
| if (!isShortcutActive) { |
Refactor to remove isShortcutActive and replace it with ternary logic on activeShortcutId.
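One way this refactor might look, as a sketch only: the `ShortcutEvent` type and `nextShortcutEvent` helper are hypothetical, and the surrounding keyboard handler is not reproduced. The point is that "active" is derived from `activeShortcutId !== null`, so no separate boolean can drift out of sync with it.

```typescript
// Hypothetical event names for illustration.
type ShortcutEvent = 'start' | 'mode-change' | 'stop' | 'none'

// Derive the event from the previous and current shortcut ids alone.
// A null id means no shortcut is held.
function nextShortcutEvent(
  activeShortcutId: string | null,
  heldShortcutId: string | null,
): ShortcutEvent {
  if (heldShortcutId === activeShortcutId) return 'none'
  return activeShortcutId === null
    ? 'start'
    : heldShortcutId === null
      ? 'stop'
      : 'mode-change'
}

console.log(nextShortcutEvent(null, 'dictation'))
console.log(nextShortcutEvent('dictation', 'intelligent'))
console.log(nextShortcutEvent('intelligent', null))
```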
lib/main/voiceInputService.ts
Outdated
| const recordingStatePayload: RecordingStatePayload = { | ||
| isRecording: true, | ||
| // Start audio recorder | ||
| log.info( |
Note: we should prefer console.log over log.info, as we have logic that sends console logs to our server.
| llmSettings: undefined, | ||
| vocabulary: [], | ||
| }) | ||
| const modeHistory: ModeChangeRecord[] = [] |
Check the edge case where we start intelligent mode, then release the keys, sending an unintentional dictation mode update.
| : base.llmSettings, | ||
| vocabulary: | ||
| update.vocabulary.length > 0 | ||
| ? [...base.vocabulary, ...update.vocabulary] |
Just take the update here; don't spread the base.
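A minimal sketch of the suggested merge, assuming a simplified context shape: the `StreamContext` interface and `mergeContext` helper below are hypothetical, and only the `vocabulary` field name comes from the diff. A non-empty update replaces the base vocabulary outright instead of being appended to it.

```typescript
// Hypothetical, trimmed-down context shape for illustration.
interface StreamContext {
  mode: string | undefined
  vocabulary: string[]
}

// Late-arriving fields overwrite earlier ones; absent or empty fields
// leave the base value untouched.
function mergeContext(base: StreamContext, update: StreamContext): StreamContext {
  return {
    mode: update.mode !== undefined ? update.mode : base.mode,
    vocabulary:
      update.vocabulary.length > 0 ? [...update.vocabulary] : base.vocabulary,
  }
}

const merged = mergeContext(
  { mode: 'dictation', vocabulary: ['ito'] },
  { mode: 'intelligent', vocabulary: ['grpc', 'electron'] },
)
console.log(merged)
```

Replacing instead of spreading avoids duplicate entries when the same vocabulary is sent in more than one update during a stream.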