fix(stage-ui,server-*): implement handling for input:text from anywhere, support for Discord
#928
Summary of Changes
Hello @skyline624, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the Discord integration by enabling full text chat functionality and ensuring a seamless experience for both Discord users and web interface users. It introduces mechanisms to process Discord text inputs, maintain session isolation for these conversations, and intelligently route AI responses back to their origin, all while keeping the main web chat UI uncluttered.
Code Review
This pull request does a great job of enabling text-based interactions with the Discord bot and isolating Discord sessions from the main web UI. The changes are well-structured, particularly the session handling in chatStore.ts and the event-driven communication in airi-adapter.ts. My review includes a few suggestions to improve type safety by removing as any casts, and to enhance maintainability by refactoring duplicated code and addressing a potential performance concern with deep cloning in high-frequency operations.
  return

- broadcastStreamEvent({ type: 'before-compose', message, sessionId: chatStore.activeSessionId, context })
+ broadcastStreamEvent({ type: 'before-compose', message, sessionId: chatStore.activeSessionId, context: JSON.parse(JSON.stringify(context)) })
Using JSON.parse(JSON.stringify(context)) for deep cloning is a common pattern, but it can have a performance impact, especially in high-frequency hooks like onTokenLiteral. This method is relatively slow for large objects and has limitations (e.g., it drops undefined values and converts Date objects to strings). Since this pattern is repeated for all broadcastStreamEvent calls in this file, consider creating a more performant deep-clone utility or using a lightweight library like rfdc if you notice any UI lag during message streaming. This would also improve maintainability by centralizing the cloning logic.
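As a rough illustration of centralizing the cloning logic, the sketch below wraps the JSON round-trip in a single function so that the strategy could later be swapped (for example to structuredClone or rfdc) in one place. The name cloneForBroadcast and the sample object are hypothetical and do not come from the PR.

```typescript
// Hypothetical helper: `cloneForBroadcast` is an illustrative name, not part of the PR.
function cloneForBroadcast<T>(value: T): T {
  // JSON round-trip: properties set to `undefined` are dropped and Date
  // values become ISO strings, so the `as T` cast is only approximately true.
  return JSON.parse(JSON.stringify(value)) as T
}

// Sample payload used only to show the limitations mentioned in the review.
const original = { text: 'hi', when: new Date(0), skipped: undefined, nested: { n: 1 } }
const copy = cloneForBroadcast(original)

// The nested object is a fresh copy, not a shared reference.
const isDeepCopy = copy.nested !== original.nested
```

Every broadcastStreamEvent call could then pass cloneForBroadcast(context) instead of repeating the JSON.parse(JSON.stringify(...)) expression.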
let discord
const inputContent = Array.isArray(context.input.content)
  ? context.input.content.map(c => c.type === 'text' ? c.text : '').join('')
  : context.input.content

if (typeof inputContent === 'string' && inputContent.includes('(From Discord user')) {
  discord = Object.values(context.contexts).flat().find(c => c.metadata?.discord)?.metadata?.discord
}
This logic to extract the discord metadata from the context is duplicated here and in the onChatTurnComplete hook below (lines 164-171). To improve maintainability and reduce redundancy, consider extracting this into a small helper function.
For example, you could define a helper function like this within the store:
function getDiscordMetadataFromContext(context: ChatStreamEventContext): any | undefined {
  const inputContent = Array.isArray(context.input.content)
    ? context.input.content.map(c => c.type === 'text' ? c.text : '').join('')
    : context.input.content;
  if (typeof inputContent === 'string' && inputContent.includes('(From Discord user')) {
    return Object.values(context.contexts).flat().find(c => c.metadata?.discord)?.metadata?.discord;
  }
  return undefined;
}
Then you can simply call this function in both places.
- let discord
- const inputContent = Array.isArray(context.input.content)
-   ? context.input.content.map(c => c.type === 'text' ? c.text : '').join('')
-   : context.input.content
- if (typeof inputContent === 'string' && inputContent.includes('(From Discord user')) {
-   discord = Object.values(context.contexts).flat().find(c => c.metadata?.discord)?.metadata?.discord
- }
+ const discord = getDiscordMetadataFromContext(context)
/gemini review
Having the getDiscord...(...) implementation here, with handling specific to Discord, is not extensible to future integrations. What approach would you suggest for integrating both the Discord context builder and other sources/modules? Perhaps we could ask input:text to build the essential messages/context for the consumer side in the module that initially emits it? What is the best option here?
You raise a very good point regarding the extendability of the getDiscordContextFrom helper. Relying on a string pattern in the message content ((From Discord user) and a hardcoded metadata key (discord) makes the solution less flexible for integrating other platforms like Telegram or Slack in the future.
Your suggestion to have the input:text event build and carry essential context from the module that initially emits it is indeed the best approach for a more generic and maintainable solution. The current implementation already does this by including the discord object in the input:text event data, which is then ingested into a ContextMessage with source: 'discord' and metadata: { discord: { ... } }.
To make the context extraction more generic and leverage this existing structure, we can modify the helper function to primarily use the source property of the ContextMessage objects within the ChatStreamEventContext. This separates the identification of the message's origin from the specific content of the message itself, allowing the (From Discord user X) marker to serve its intended purpose for the LLM, as mentioned in the PR description.
Here's a more generic helper function that you could use:
function getPlatformSpecificMetadata(context: ChatStreamEventContext, platformName: string): any | undefined {
  // Find a ContextMessage that explicitly states its source as the given platform
  const platformContextMessage = Object.values(context.contexts).flat().find(c => c.source === platformName);
  // Return the metadata associated with that platform, assuming it's structured as { platformName: { ... } }
  return platformContextMessage?.metadata?.[platformName];
}
Then, in onAssistantMessage and onChatTurnComplete, you would use it like this:
const discord = getPlatformSpecificMetadata(context, 'discord');
if (discord) {
  serverChannelStore.send({
    type: 'output:gen-ai:chat:message',
    data: {
      message,
      discord,
      // ... other data
    },
  });
}
This approach is more extendable because:
- Source Identification: It relies on the source field of the ContextMessage, which is a programmatic identifier for the platform, rather than a string pattern in the message content.
- Generic Metadata Retrieval: It retrieves platform-specific metadata based on a consistent key (platformName) within the ContextMessage.metadata object.
- Clearer Intent: The (From Discord user X) marker in the message content can then be understood purely as an instruction for the LLM, without being part of the routing logic within the context-bridge.
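One possible variation on the helper above avoids the any return type by asking the caller for a type guard. This is only a sketch: ContextMessage and ChatStreamEventContext are simplified stand-ins for the project's real types, and DiscordMetadata, isDiscordMetadata and getPlatformMetadata are hypothetical names.

```typescript
// Simplified stand-ins for the project's real types (assumptions, not the actual definitions).
interface ContextMessage {
  source: string
  metadata?: Record<string, unknown>
}
interface ChatStreamEventContext {
  contexts: Record<string, ContextMessage[]>
}

// Hypothetical metadata shape; only `channelId` is checked here.
interface DiscordMetadata {
  channelId: string
}

function isDiscordMetadata(value: unknown): value is DiscordMetadata {
  return typeof value === 'object' && value !== null
    && typeof (value as { channelId?: unknown }).channelId === 'string'
}

// Looks up metadata by `source`, then validates it at runtime instead of returning `any`.
function getPlatformMetadata<T>(
  context: ChatStreamEventContext,
  platform: string,
  isValid: (value: unknown) => value is T,
): T | undefined {
  for (const key of Object.keys(context.contexts)) {
    for (const message of context.contexts[key]) {
      if (message.source !== platform)
        continue
      const candidate = message.metadata?.[platform]
      if (isValid(candidate))
        return candidate
    }
  }
  return undefined
}

const exampleContext: ChatStreamEventContext = {
  contexts: {
    a: [{ source: 'web' }],
    b: [{ source: 'discord', metadata: { discord: { channelId: '123' } } }],
  },
}
const discordMeta = getPlatformMetadata(exampleContext, 'discord', isDiscordMetadata)
```

A future platform would then only need its own metadata type and guard; the lookup itself stays unchanged.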
// Handle output from AIRI system (IA response)
this.airiClient.onEvent('output:gen-ai:chat:message', async (event) => {
  try {
    const { message, discord } = event.data as any
Using as any bypasses TypeScript's type safety. It would be more robust to define a specific type for the data payload of the output:gen-ai:chat:message event and use that for casting. This improves code clarity and prevents potential runtime errors if the data structure changes.
For example, you could define a type like this (preferably in a shared location):
interface ChatMessageOutput {
  message: { content: string };
  discord?: { channelId: string };
}
- const { message, discord } = event.data as any
+ const { message, discord } = event.data as ChatMessageOutput
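A cast to ChatMessageOutput still trusts the payload at runtime. If stricter checking is wanted, a type guard along the following lines could validate the data before destructuring. This is a sketch under the assumption that the payload has exactly the shape of the interface suggested above; isRecord and isChatMessageOutput are hypothetical names.

```typescript
interface ChatMessageOutput {
  message: { content: string }
  discord?: { channelId: string }
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null
}

// Runtime check that narrows unknown event data to ChatMessageOutput.
function isChatMessageOutput(data: unknown): data is ChatMessageOutput {
  if (!isRecord(data))
    return false
  const message = data.message
  if (!isRecord(message) || typeof message.content !== 'string')
    return false
  const discord = data.discord
  if (discord === undefined)
    return true // `discord` is optional
  return isRecord(discord) && typeof discord.channelId === 'string'
}
```

In the event handler, an early return on !isChatMessageOutput(event.data) would let the destructuring proceed without any cast.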
this.airiClient.send({
  type: 'input:text',
  data: {
    text: content,
    discord: {
      channelId: message.channelId,
      guildId: message.guildId,
      guildMember: {
        id: message.author.id,
        displayName: message.member?.displayName ?? message.author.username,
        nickname: message.member?.nickname ?? message.author.username,
      },
    },
  } as any,
})
Casting the event data to any bypasses TypeScript's type safety. A better approach is to construct a fully typed event object before sending it. This ensures the payload matches the expected type for an input:text event and makes the code more robust.
Additionally, the nickname field is redundant because displayName already provides the server-specific nickname if one is set. The receiving end in context-bridge.ts only uses displayName.
You can define a type for the data payload and create a typed object, which resolves both issues.
Example type definition:
interface TextInputData {
  text: string;
  discord: {
    channelId: string;
    guildId: string | null;
    guildMember: {
      id: string;
      displayName: string;
    };
  };
}

// Annotating the object lets the compiler check the payload shape
const data: TextInputData = {
  text: content,
  discord: {
    channelId: message.channelId,
    guildId: message.guildId,
    guildMember: {
      id: message.author.id,
      displayName: message.member?.displayName ?? message.author.username,
    },
  },
};
this.airiClient.send({
  type: 'input:text',
  data,
})

chatStore.ingestContextMessage({
  id,
  contextId: id,
  source: event.source,
  strategy: ContextUpdateStrategy.AppendSelf,
  createdAt: Date.now(),
  content: `The input is coming from Discord channel ${discord.channelId} (Guild: ${discord.guildId}).`,
  metadata: {
    discord,
  },
})
I am so sorry that the types in server-shared/channel-server are not finalized and are still experimental. Is context:update OK for consuming the input:text event from Discord, or should we use spark:notify instead?
if (discord?.guildMember?.displayName) {
  messageText = `(From Discord user ${discord.guildMember.displayName}): ${text}`
  targetSessionId = 'discord'
}
Nice work here. This reminds me that having a context formatter is essential to all integrations.
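To make the context formatter idea concrete, here is a minimal sketch of a per-source formatter registry. Only the (From Discord user X): marker and the 'discord' session id come from the PR; InboundText, FormattedInput, formatters and formatInbound are hypothetical names, and the real integration would likely carry richer metadata.

```typescript
// Hypothetical shapes for illustration only.
interface InboundText {
  source: string
  text: string
  displayName?: string
}
interface FormattedInput {
  messageText: string
  targetSessionId?: string
}
type ContextFormatter = (input: InboundText) => FormattedInput

// One formatter per source; new integrations register their own entry.
const formatters: Record<string, ContextFormatter> = {
  discord: input => input.displayName
    ? { messageText: `(From Discord user ${input.displayName}): ${input.text}`, targetSessionId: 'discord' }
    : { messageText: input.text },
}

// Falls back to the raw text when no formatter is registered for the source.
function formatInbound(input: InboundText): FormattedInput {
  const hasFormatter = Object.prototype.hasOwnProperty.call(formatters, input.source)
  return hasFormatter ? formatters[input.source](input) : { messageText: input.text }
}
```

With this shape, the context bridge would call formatInbound once and pass messageText and targetSessionId on to chatStore.send(...), without branching on the platform itself.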
await chatStore.send(messageText, {
  model: activeModel.value,
  chatProvider,
}, targetSessionId)
Oh I see, so using it as context:update here is fine, since you use chatStore.send(...) to react on the stage.
Yes exactly, I used input:text to trigger a direct chat interaction which feels more natural for a direct reply pattern.
ac2f5c7 to a79585c
Done! I've applied the latest code review suggestions and fixed the remaining points. It should be ready for another look/approval.
fc42d16 to 730b8ac
if (discord?.channelId) {
  const channel = await this.discordClient.channels.fetch(discord.channelId)
  if (channel?.isTextBased() && 'send' in channel) {
    await (channel as any).send(message.content)
Can we avoid using any here?
Absolutely. I'll refactor this to use a property check instead of any to ensure the channel is sendable while keeping it type-safe.
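The property check mentioned above could take the form of a small type guard. The sketch below uses a hypothetical structural type, SendableChannel, rather than any discord.js type, so it only assumes that a sendable channel exposes a send function.

```typescript
// Hypothetical structural type; it is not imported from discord.js.
interface SendableChannel {
  send: (content: string) => Promise<unknown>
}

// Narrows an unknown channel to something with a callable `send`, without `any`.
function isSendableChannel(channel: unknown): channel is SendableChannel {
  return typeof channel === 'object'
    && channel !== null
    && 'send' in channel
    && typeof (channel as { send?: unknown }).send === 'function'
}
```

At the call site, `if (isSendableChannel(channel)) await channel.send(message.content)` would then type-check without the cast.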
// @ts-expect-error - the .crossws property wasn't extended in types
plugins: [ws({ resolve: async req => (await app.fetch(req)).crossws })],
port: env.PORT ? Number(env.PORT) : 6121,
hostname: env.HOST || '0.0.0.0',
hostname: env.HOST || '0.0.0.0',
… and token
- update DiscordAdapter to use 'module:configure' event and improve logging
- enhance DiscordAdapter with message handling and additional intents
- enhance DiscordAdapter to include guild member details in message context
- enhance context bridge to support target session ID for Discord messages
c532dac to dfe0fdb
Addressed all review comments:
Tests are passing locally. Ready for another look!
Description
This PR fixes the Discord bot which was ignoring text mentions and improves how external messages are handled in the web interface.
Key Changes
- […] input:text handling in context-bridge.ts. Previously, the bot sent the event but the stage-web client wasn't listening.
- […] displayName. Added a marker (From Discord user X): to messages so the LLM can identify the speaker.
- […] chatStore.ts to support background/targeted sessions (targetSessionId).
- […] 'discord' session.
Motivation
Makes the Discord integration fully functional for text chat without polluting the primary user's web interface experience.
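The background/targeted session idea from Key Changes can be pictured with a deliberately simplified store. This is not the actual chatStore.ts, which is not shown here; SessionStore, ChatMessage and the 'default' session id are assumptions made for the sketch, and only targetSessionId and the 'discord' session name come from the PR.

```typescript
// Hypothetical, simplified model of per-session message storage.
interface ChatMessage {
  role: 'user' | 'assistant'
  content: string
}

class SessionStore {
  activeSessionId = 'default'
  private sessions: Record<string, ChatMessage[]> = { default: [] }

  // Appends to `targetSessionId` when given, otherwise to the active session.
  // The active session is never switched, so the web UI stays uncluttered.
  send(content: string, targetSessionId?: string): string {
    const sessionId = targetSessionId ?? this.activeSessionId
    if (!this.sessions[sessionId])
      this.sessions[sessionId] = []
    this.sessions[sessionId].push({ role: 'user', content })
    return sessionId
  }

  messages(sessionId: string): ChatMessage[] {
    return this.sessions[sessionId] ?? []
  }
}

const store = new SessionStore()
store.send('hello from the web UI')
store.send('(From Discord user X): hi', 'discord')
```

After these two calls, the web session and the 'discord' session each hold one message, and activeSessionId is still 'default'.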