Conversation

@fulltimemike
Collaborator

No description provided.

@github-actions

Resolves #177

Comment on lines +86 to +95
const fmt = new Intl.DateTimeFormat('en-GB', {
  hour: '2-digit',
  minute: '2-digit',
  second: '2-digit',
  fractionalSecondDigits: 3,
  hour12: false,
})

const timestamp = fmt.format(new Date())
console.log(`${timestamp}: Pasted content: ${content}`)
Collaborator Author

Utility log for getting rough timing data on when the transcript is fully done.

Comment on lines +114 to +115
// Only clear volume history when recording stops
setVolumeHistory([])
Collaborator Author

Now users can switch ito modes mid-stream, and the audio bars will not be cleared; they'll simply change color.
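A minimal sketch of that behavior, with an assumed ItoMode union and illustrative colors (not the app's actual values):

type ItoMode = 'dictation' | 'intelligent'

// Hypothetical helper: keep the existing volume history and only change the
// color used to render each bar when the mode switches mid-stream.
function barColor(mode: ItoMode): string {
  return mode === 'intelligent' ? '#7c3aed' : '#2563eb'
}

function renderBarStyles(volumeHistory: number[], mode: ItoMode): string[] {
  // Each bar keeps its height; only its color reflects the current mode.
  return volumeHistory.map(v => `height:${v}px;background:${barColor(mode)}`)
}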

Comment on lines 10 to 12
public setCursorContext(context: string): void {
  this.cursorContext = context
}
Collaborator Author

Main change here is to set the cursor context when the transcription stream is kicked off, so that the context is available whenever a transcript is returned.
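A rough sketch of the flow, assuming the setter shown above plus a hypothetical source of the text around the cursor:

// Hypothetical shapes for illustration only.
interface GrammarRulesLike {
  setCursorContext(context: string): void
}

interface CursorContextSource {
  readSelectionAroundCursor(): string
}

function startTranscriptionStream(
  grammarRules: GrammarRulesLike,
  source: CursorContextSource,
): void {
  // Snapshot the cursor context when the stream kicks off, so it is already
  // in place whenever the transcript comes back.
  grammarRules.setCursorContext(source.readSelectionAroundCursor())
  // ...open the gRPC stream and start sending audio here...
}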

Collaborator

Why aren't we tracking the cursor context in the lifecycle of the transcription instead? Not a fan of storing this in the local memory of the class since this is a singleton used across the app

Collaborator Author

I agree, this isn't as clean as it could be. I think the grammarRulesService does not have to be a singleton. I would rather construct the class with the necessary context, and then we don't have to worry about using it elsewhere.

I don't want to put grammar rules context in itoSession though, as I feel that's a lower level than itoSession
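A rough sketch of the constructor-injection alternative; the names here are illustrative, not the actual refactor:

// Illustrative only: construct the service with its context instead of
// mutating a shared singleton from elsewhere in the app.
class GrammarRulesServiceSketch {
  constructor(private readonly cursorContext: string) {}

  public buildPrompt(transcript: string): string {
    // The context travels with the instance for the life of one transcription.
    return `${this.cursorContext}\n---\n${transcript}`
  }
}

// One instance per transcription, created wherever the context is known.
const grammarRules = new GrammarRulesServiceSketch('text around the cursor')
console.log(grammarRules.buildPrompt('hello world'))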

import { getAdvancedSettings } from './store'
import log from 'electron-log'

export class ItoSession {
Collaborator Author

ItoSession is mainly a coordinator class that stays abstract. It orchestrates setting up audio streaming, running the gRPC stream, pasting the resulting transcript, and sending updates to the pill window.
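A skeletal sketch of that coordinator shape, with assumed collaborator interfaces (the real class wires in the concrete services):

// Skeleton only: ItoSession as a thin coordinator over its collaborators.
interface AudioInput { start(): Promise<void>; stop(): Promise<void> }
interface StreamController { run(): Promise<string>; cancel(): void }
interface Paster { paste(text: string): Promise<void> }
interface PillWindow { setRecording(isRecording: boolean): void }

class ItoSessionSketch {
  constructor(
    private readonly audio: AudioInput,
    private readonly stream: StreamController,
    private readonly paster: Paster,
    private readonly pill: PillWindow,
  ) {}

  public async run(): Promise<void> {
    this.pill.setRecording(true)
    await this.audio.start()
    try {
      const transcript = await this.stream.run()
      await this.paster.paste(transcript)
    } finally {
      await this.audio.stop()
      this.pill.setRecording(false)
    }
  }
}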

/**
 * ItoStreamController manages the lifecycle of a transcription stream using TranscribeStreamV2.
 * It allows sending metadata/config, streaming audio, and updating settings during the stream.
 */
export class ItoStreamController {
Collaborator Author

This class primarily handles sending the audio and context streams in parallel
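A minimal sketch of sending audio and context in parallel over one outgoing stream; the writer interface is an assumption, not the generated TranscribeStreamV2 client:

// Assumed shapes for illustration; the real controller wraps TranscribeStreamV2.
interface OutgoingWriter {
  write(msg: { audio?: Uint8Array; context?: Record<string, unknown> }): Promise<void>
}

async function pumpAudio(writer: OutgoingWriter, chunks: AsyncIterable<Uint8Array>) {
  for await (const chunk of chunks) {
    await writer.write({ audio: chunk })
  }
}

async function pumpContext(writer: OutgoingWriter, updates: AsyncIterable<Record<string, unknown>>) {
  for await (const update of updates) {
    await writer.write({ context: update })
  }
}

// Both pumps share the stream and run concurrently until their sources end.
async function runStream(
  writer: OutgoingWriter,
  chunks: AsyncIterable<Uint8Array>,
  updates: AsyncIterable<Record<string, unknown>>,
) {
  await Promise.all([pumpAudio(writer, chunks), pumpContext(writer, updates)])
}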

/**
 * Starts audio recording and handles system audio muting.
 * Does NOT start the ItoStreamController - that should be done separately.
 */
public startAudioRecording = () => {
Collaborator Author

VoiceInputService has been refactored so that it is no longer a pass-through call that kicks off transcription streaming. It should primarily be responsible for audio at an abstract level. This class could probably be merged with another one (perhaps the audio recorder?) if we want to.
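Roughly the shape being described, with the audio recorder injected; names and methods are illustrative:

// Illustrative: VoiceInputService narrowed to audio concerns only.
interface AudioRecorder {
  start(): Promise<void>
  stop(): Promise<void>
  muteSystemAudio(): void
  unmuteSystemAudio(): void
}

class VoiceInputServiceSketch {
  constructor(private readonly recorder: AudioRecorder) {}

  public startAudioRecording = async () => {
    this.recorder.muteSystemAudio()
    await this.recorder.start()
  }

  public stopAudioRecording = async () => {
    await this.recorder.stop()
    this.recorder.unmuteSystemAudio()
  }
}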

Comment on lines -41 to -45
// Debouncing state
let shortcutDebounceTimeout: NodeJS.Timeout | null = null
let pendingShortcut: KeyboardShortcutConfig | null = null
export const DEBOUNCE_TIME = 10

Collaborator Author

No more debounce! Users can change ito mode mid-stream, which means being slow to hit both keys necessary for intelligent mode is no longer a problem.

import { BrowserWindow } from 'electron'
import { audioRecorderService } from './audio'
import { voiceInputService } from '../main/voiceInputService'
import { itoSession } from '../main/itoSession'
Collaborator Author

keyboard now starts itoSession instead of interactionManager and voiceInputService; both are now managed by itoSession.

// Check actual audio duration (keyboard duration can be misleading due to latency)
const audioDurationMs = itoStreamController.getAudioDurationMs()

if (audioDurationMs < this.MINIMUM_AUDIO_DURATION_MS) {
Collaborator Author

I think it is cleaner to have itoSession cancel its own stream than to buffer audio and only begin streaming once the minimum buffer is met.
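A hedged sketch of that check, assuming the getAudioDurationMs accessor shown above plus a hypothetical cancel() and threshold:

// Sketch only: let the session cancel its own stream for very short holds,
// rather than buffering audio until a minimum duration is reached.
interface StreamControllerLike {
  getAudioDurationMs(): number
  cancel(): void
}

const MINIMUM_AUDIO_DURATION_MS = 300 // assumed threshold for illustration

function finishOrCancel(controller: StreamControllerLike): boolean {
  if (controller.getAudioDurationMs() < MINIMUM_AUDIO_DURATION_MS) {
    controller.cancel() // too short to be a real utterance; drop it
    return false
  }
  return true // long enough; let the stream complete normally
}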

timestamp: number
}

export class TranscribeStreamV2Handler {
Collaborator Author

Now that we are handling mid-stream updates, this new handler has a few important changes over TranscribeV1 (left in place as a deprecated function we should support until all customers are on the new version):

  • We merge context updates so that late-arriving context overwrites pre-existing context (this includes mode!).
  • There is a mode-change grace period at the end, where a user releasing keys may trigger a mode change. If it happens in a slim window at the end of the stream, we assume the user changed modes accidentally. Without this, a user with fn + ctrl as intelligent mode might release ctrl a fraction of a second before releasing fn, triggering a mode change to dictation at the very end (a rough sketch of this follows the list).
  • Audio and context are processed via the stream instead of headers.
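A rough sketch of the merge and grace-period behavior under assumed field names and an assumed grace window; the handler's real types differ:

// Assumed shapes for illustration only.
type ItoMode = 'DICTATION' | 'INTELLIGENT'

interface StreamContext {
  mode: ItoMode
  vocabulary: string[]
  llmSettings?: Record<string, unknown>
}

interface ModeChangeRecord { mode: ItoMode; timestamp: number }

const MODE_CHANGE_GRACE_MS = 500 // assumed window at the end of the stream

// Late-arriving context overwrites what was already merged.
function mergeContext(base: StreamContext, update: Partial<StreamContext>): StreamContext {
  return {
    mode: update.mode ?? base.mode,
    vocabulary: update.vocabulary && update.vocabulary.length > 0 ? update.vocabulary : base.vocabulary,
    llmSettings: update.llmSettings ?? base.llmSettings,
  }
}

// Ignore a mode change that landed within the grace window before stream end,
// e.g. releasing ctrl a split second before fn at the end of intelligent mode.
function effectiveMode(history: ModeChangeRecord[], streamEndTs: number, fallback: ItoMode): ItoMode {
  const settled = history.filter(r => streamEndTs - r.timestamp > MODE_CHANGE_GRACE_MS)
  return settled.length > 0 ? settled[settled.length - 1].mode : fallback
}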

}

// Send transcription result to main window
this.windowMessenger.sendTranscriptionResult(response)
Collaborator Author

This doesn't do anything anymore; delete the entire windowMessenger class.

import { getAdvancedSettings } from './store'
import log from 'electron-log'

export class ItoSession {
Collaborator Author

Consider renaming to ItoSessionManager

Comment on lines 223 to 226
  public setMainWindow(mainWindow: any) {
    this.windowMessenger.setMainWindow(mainWindow)
  }
}
Collaborator Author

delete

lib/main/main.ts Outdated
// Set main window for transcription service so it can send messages
transcriptionService.setMainWindow(mainWindow)
// Set main window for ito session so it can send messages
itoSession.setMainWindow(mainWindow)
Collaborator Author

delete

}

this.stopStreaming()
this.audioStreamManager.clearInteractionAudio()
Collaborator

pull this into stopStreaming
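Roughly, so callers can't forget the cleanup (interface and class names here are illustrative):

interface AudioStreamManagerLike {
  clearInteractionAudio(): void
}

class StreamOwnerSketch {
  constructor(private readonly audioStreamManager: AudioStreamManagerLike) {}

  public stopStreaming(): void {
    // ...tear down the outgoing gRPC stream here...
    // Clearing interaction audio now always rides along with stopping.
    this.audioStreamManager.clearInteractionAudio()
  }
}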

private audioStreamManager = new AudioStreamManager()

private hasStartedGrpc = false
private currentMode: ItoMode | null = null
Collaborator

default to dictation
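i.e. something like the following, assuming an ItoMode enum with a DICTATION member:

enum ItoMode {
  DICTATION,
  INTELLIGENT,
}

class ItoStreamControllerSketch {
  // Default to dictation instead of null so there is always a valid mode.
  private currentMode: ItoMode = ItoMode.DICTATION

  public getMode(): ItoMode {
    return this.currentMode
  }
}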

transcriptionPrompt:
  context.advancedSettings.llm.transcriptionPrompt,
editingPrompt: context.advancedSettings.llm.editingPrompt,
asrModel: '',
Collaborator

Double-check whether duplicating some of these fields is necessary.

}, DEBOUNCE_TIME) // debounce
// Handle shortcut activation and mode changes
if (currentlyHeldShortcut) {
  if (!isShortcutActive) {
Collaborator

Refactor to remove isShortcutActive and replace it with ternary logic on activeShortcutId.
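A rough sketch of the suggestion, with assumed variable names:

type ShortcutId = string

let activeShortcutId: ShortcutId | null = null

function onShortcutHeld(heldShortcutId: ShortcutId): void {
  // Derive "is a shortcut active?" from activeShortcutId itself instead of a
  // separate isShortcutActive boolean.
  const action = activeShortcutId === null ? 'start' : 'modeChange'
  if (action === 'start') {
    activeShortcutId = heldShortcutId
    // ...start the session here...
  } else if (heldShortcutId !== activeShortcutId) {
    activeShortcutId = heldShortcutId
    // ...switch ito mode mid-stream here...
  }
}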

const recordingStatePayload: RecordingStatePayload = {
isRecording: true,
// Start audio recorder
log.info(
Collaborator

Note: we should prefer using console.log over log.info as we have some logic to send console logs to our server.

llmSettings: undefined,
vocabulary: [],
})
const modeHistory: ModeChangeRecord[] = []
Collaborator

Check the edge case where we start intelligent mode and then release the keys, sending an unintentional dictation mode update.

: base.llmSettings,
vocabulary:
update.vocabulary.length > 0
? [...base.vocabulary, ...update.vocabulary]
Collaborator

@julgmz julgmz Oct 21, 2025

Just take the update here; don't spread the base.
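i.e. roughly, under assumed types:

interface ContextSlice {
  vocabulary: string[]
}

// A non-empty vocabulary update replaces the base outright rather than being
// appended to it.
function mergeVocabulary(base: ContextSlice, update: ContextSlice): string[] {
  return update.vocabulary.length > 0 ? update.vocabulary : base.vocabulary
}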

@fulltimemike fulltimemike merged commit bd6881a into dev Oct 23, 2025
4 checks passed