
Connecting realtime agents to Twilio

Twilio offers a Media Streams API that sends the raw audio of a phone call to a WebSocket server. This setup can be used to connect your voice agent to Twilio. You can use the default Realtime Session transport in websocket mode to wire the events coming from Twilio into your Realtime Session. However, this requires you to set the right audio format and adjust the interruption timing, since phone calls naturally introduce more latency than web-based conversations.

To make the setup easier, we created a dedicated transport layer that handles the connection to Twilio for you, including interruption handling and audio forwarding.
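For reference, a rough sketch of the manual route is below. It assumes the string 'websocket' transport option and illustrative audio-format fields (Twilio Media Streams carries 8 kHz G.711 µ-law); the exact config field names vary by SDK version, and you would still have to bridge Twilio's media frames and interruption messages yourself, which is exactly what the dedicated transport layer does for you.

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({ name: 'Phone Agent' });

// Manual route (sketch only): use the default websocket transport and tell the
// session to speak Twilio's codec. The audio-format field names below are
// illustrative assumptions; check your SDK version for the exact config shape.
const session = new RealtimeSession(agent, {
  transport: 'websocket',
  config: {
    inputAudioFormat: 'g711_ulaw', // what Twilio sends (8 kHz µ-law) -- assumed field name
    outputAudioFormat: 'g711_ulaw', // what Twilio expects back -- assumed field name
  },
});

// You would also have to forward Twilio's `media` frames into the session and
// send `clear` messages back to Twilio whenever the caller interrupts the agent.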

  1. Make sure you have a Twilio account and a Twilio phone number.

  2. Set up a WebSocket server that can receive events from Twilio.

    If you are developing locally, this requires configuring a local tunnel such as ngrok or Cloudflare Tunnel so that your local server is reachable by Twilio (see the terminal sketch after this list). You can use the TwilioRealtimeTransportLayer to connect to Twilio.

  3. Install the Twilio adapter by installing the extensions package:

    Terminal window
    npm install @openai/agents-extensions
  4. Import the adapter and the model to connect to your RealtimeSession:

    import { TwilioRealtimeTransportLayer } from '@openai/agents-extensions';
    import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

    const agent = new RealtimeAgent({
      name: 'My Agent',
    });

    // Create a new transport mechanism that will bridge the connection between Twilio and
    // the OpenAI Realtime API.
    const twilioTransport = new TwilioRealtimeTransportLayer({
      twilioWebSocket: websocketConnection,
    });

    const session = new RealtimeSession(agent, {
      // set your own transport
      transport: twilioTransport,
    });
  5. Connect your RealtimeSession to Twilio:

    session.connect({ apiKey: 'your-openai-api-key' });
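
As noted in step 2, your WebSocket server has to be reachable by Twilio. For local development, one way to do this is a tunnel; the sketch below assumes ngrok and the port 5050 used by the example server further down:

Terminal window
ngrok http 5050

ngrok prints a public https:// forwarding URL. Twilio will call the https:// URL for the incoming-call webhook and connect to the matching wss:// URL for the media stream.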

Any events and behavior that you expect from a RealtimeSession will work as expected, including tool calls, guardrails, and more. Read the voice agents overview to learn more about how to use a RealtimeSession with voice agents.

Tips and considerations:

  1. Speed matters.

    To receive all necessary events and audio from Twilio, you should create the TwilioRealtimeTransportLayer instance as soon as you have a reference to the WebSocket connection, and call session.connect() immediately afterwards.

  2. Access the raw Twilio events.

    If you want to access the raw events that Twilio sends, you can listen to the transport_event event on your RealtimeSession instance. Every event from Twilio has the type twilio_message and a message property that contains the raw event data (see the sketch after this list).

  3. Watch the debug logs.

    Sometimes you may run into issues where you want more information. Setting the DEBUG=openai-agents* environment variable will show all debug logs from the Agents SDK. Alternatively, you can enable debug logs for just the Twilio adapter: DEBUG=openai-agents:extensions:twilio*
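
A minimal sketch of tip 2, assuming a session that has already been created with the Twilio transport as shown above:

// Inspect the raw Twilio Media Streams frames flowing through the transport layer.
session.on('transport_event', (event) => {
  if (event.type === 'twilio_message') {
    // `message` carries the raw Twilio event data (e.g. start/media/stop frames).
    console.log('Raw Twilio event:', event.message);
  }
});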

Below is an example of an end-to-end WebSocket server that receives requests from Twilio and forwards them to a RealtimeSession.

Example server using Fastify
import Fastify from 'fastify';
import type { FastifyInstance, FastifyReply, FastifyRequest } from 'fastify';
import dotenv from 'dotenv';
import fastifyFormBody from '@fastify/formbody';
import fastifyWs from '@fastify/websocket';
import {
  RealtimeAgent,
  RealtimeSession,
  backgroundResult,
  tool,
} from '@openai/agents/realtime';
import { TwilioRealtimeTransportLayer } from '@openai/agents-extensions';
import { hostedMcpTool } from '@openai/agents';
import { z } from 'zod';
import process from 'node:process';

// Load environment variables from .env file
dotenv.config();

// Retrieve the OpenAI API key from environment variables. You must have OpenAI Realtime API access.
const { OPENAI_API_KEY } = process.env;

if (!OPENAI_API_KEY) {
  console.error('Missing OpenAI API key. Please set it in the .env file.');
  process.exit(1);
}

const PORT = +(process.env.PORT || 5050);

// Initialize Fastify
const fastify = Fastify();
fastify.register(fastifyFormBody);
fastify.register(fastifyWs);

const weatherTool = tool({
  name: 'weather',
  description: 'Get the weather in a given location.',
  parameters: z.object({
    location: z.string(),
  }),
  execute: async ({ location }: { location: string }) => {
    return backgroundResult(`The weather in ${location} is sunny.`);
  },
});

const secretTool = tool({
  name: 'secret',
  description: 'A secret tool to tell the special number.',
  parameters: z.object({
    question: z
      .string()
      .describe(
        'The question to ask the secret tool; mainly about the special number.',
      ),
  }),
  execute: async ({ question }: { question: string }) => {
    return `The answer to ${question} is 42.`;
  },
  needsApproval: true,
});

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions:
    'You are a friendly assistant. When you use a tool always first say what you are about to do.',
  tools: [
    hostedMcpTool({
      serverLabel: 'deepwiki',
      serverUrl: 'https://mcp.deepwiki.com/sse',
    }),
    secretTool,
    weatherTool,
  ],
});

// Root Route
fastify.get('/', async (_request: FastifyRequest, reply: FastifyReply) => {
  reply.send({ message: 'Twilio Media Stream Server is running!' });
});

// Route for Twilio to handle incoming and outgoing calls
// <Say> punctuation to improve text-to-speech translation
fastify.all(
  '/incoming-call',
  async (request: FastifyRequest, reply: FastifyReply) => {
    const twimlResponse = `
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>O.K. you can start talking!</Say>
  <Connect>
    <Stream url="wss://${request.headers.host}/media-stream" />
  </Connect>
</Response>`.trim();
    reply.type('text/xml').send(twimlResponse);
  },
);

// WebSocket route for media-stream
fastify.register(async (scopedFastify: FastifyInstance) => {
  scopedFastify.get(
    '/media-stream',
    { websocket: true },
    async (connection: any) => {
      const twilioTransportLayer = new TwilioRealtimeTransportLayer({
        twilioWebSocket: connection,
      });

      const session = new RealtimeSession(agent, {
        transport: twilioTransportLayer,
        model: 'gpt-realtime',
        config: {
          audio: {
            output: {
              voice: 'verse',
            },
          },
        },
      });

      session.on('mcp_tools_changed', (tools: { name: string }[]) => {
        const toolNames = tools.map((tool) => tool.name).join(', ');
        console.log(`Available MCP tools: ${toolNames || 'None'}`);
      });

      session.on(
        'tool_approval_requested',
        (_context: unknown, _agent: unknown, approvalRequest: any) => {
          console.log(
            `Approving tool call for ${approvalRequest.approvalItem.rawItem.name}.`,
          );
          session
            .approve(approvalRequest.approvalItem)
            .catch((error: unknown) =>
              console.error('Failed to approve tool call.', error),
            );
        },
      );

      session.on(
        'mcp_tool_call_completed',
        (_context: unknown, _agent: unknown, toolCall: unknown) => {
          console.log('MCP tool call completed.', toolCall);
        },
      );

      await session.connect({
        apiKey: OPENAI_API_KEY,
      });
      console.log('Connected to the OpenAI Realtime API');
    },
  );
});

fastify.listen({ port: PORT }, (err: Error | null) => {
  if (err) {
    console.error(err);
    process.exit(1);
  }
  console.log(`Server is listening on port ${PORT}`);
});

process.on('SIGINT', () => {
  fastify.close();
  process.exit(0);
});
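
To try the example end to end, you might run the server, open a tunnel, and point your Twilio number at it. The commands below assume the file is saved as server.ts and run with tsx; adjust them to your setup:

Terminal window
npx tsx server.ts
ngrok http 5050

Then, in the Twilio console, set your phone number's "A call comes in" webhook to https://<your-ngrok-domain>/incoming-call. The returned TwiML instructs Twilio to open the wss://<your-ngrok-domain>/media-stream connection that the WebSocket route above handles. Prepend DEBUG=openai-agents:extensions:twilio* to the server command when you need the adapter's debug logs.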