[Feature]: Phoenix Observability Integration plugin #727

@crivetimihai


Feature Request: Phoenix Observability Integration

Summary

Integrate Arize Phoenix as an observability plugin for MCP Gateway to provide comprehensive LLM tracing, evaluation, and monitoring capabilities.

Note: due to the licensing of Arize Phoenix, we should NOT include it as part of MCP Gateway, but should instead aim for compatibility with their SaaS service.

Motivation

As identified in the roadmap (Release 0.7.0), MCP Gateway requires core observability features. Phoenix provides a mature, open-source AI observability platform that aligns perfectly with the gateway's needs for:

  • OpenTelemetry-based tracing for MCP tool invocations
  • LLM performance evaluation and benchmarking
  • Request/response monitoring across virtual servers
  • Prompt management and optimization tracking

Proposed Implementation

1. Phoenix Observability Plugin

Create a new plugin at plugins/phoenix_observability/ that implements:

Hook Integration Points

  • tool_pre_invoke / tool_post_invoke: Capture tool execution traces with input/output data
  • prompt_pre_fetch / prompt_post_fetch: Track prompt template usage and performance
  • resource_pre_fetch / resource_post_fetch: Monitor resource access patterns
  • Request/Response hooks: Trace the complete MCP request lifecycle
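
A minimal sketch of how a pre/post hook pair could become one span. It assumes both hooks can be keyed on a shared request ID. `ToolSpanTracker`, `SpanRecord`, the hook signatures and the `tool.arg_names` attribute are all hypothetical stand-ins, not the MCP Gateway plugin API. To keep the example standard-library only, timing is recorded in a plain dataclass instead of an OpenTelemetry span.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SpanRecord:
    """Simplified stand-in for an OpenTelemetry span."""

    name: str
    start_ns: int
    end_ns: int | None = None
    attributes: dict[str, Any] = field(default_factory=dict)
    status: str = "UNSET"

    @property
    def duration_ms(self) -> float:
        if self.end_ns is None:
            raise ValueError("span has not ended yet")
        return (self.end_ns - self.start_ns) / 1_000_000


class ToolSpanTracker:
    """Hypothetical helper that turns a tool_pre_invoke/tool_post_invoke pair into one span."""

    def __init__(self) -> None:
        self._open: dict[str, SpanRecord] = {}
        self.finished: list[SpanRecord] = []

    def tool_pre_invoke(self, request_id: str, tool_name: str, args: dict[str, Any]) -> None:
        # Open the span when the call starts, keyed by request ID so the post hook can close it.
        # Only argument names are recorded here, not their (possibly sensitive) values.
        self._open[request_id] = SpanRecord(
            name=f"tool.{tool_name}",
            start_ns=time.monotonic_ns(),
            attributes={"tool.name": tool_name, "tool.arg_names": sorted(args)},
        )

    def tool_post_invoke(self, request_id: str, error: str | None = None) -> SpanRecord:
        # Close the matching span and mark it OK or ERROR.
        span = self._open.pop(request_id)
        span.end_ns = time.monotonic_ns()
        span.status = "ERROR" if error else "OK"
        if error:
            span.attributes["error.message"] = error
        self.finished.append(span)
        return span
```

A real plugin would call an OpenTelemetry tracer rather than build `SpanRecord` objects. It would also need to close spans whose post hook never fires, for example on timeouts or cancelled requests.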

Plugin Configuration (plugins/config.yaml)

- name: "PhoenixObservabilityPlugin"
  kind: "plugins.phoenix_observability.phoenix_plugin.PhoenixObservabilityPlugin"
  description: "AI observability with tracing, evaluation, and monitoring via Arize Phoenix"
  version: "1.0.0"
  author: "MCP Gateway Team"
  hooks: 
    - "tool_pre_invoke"
    - "tool_post_invoke"
    - "prompt_pre_fetch"
    - "prompt_post_fetch"
    - "resource_pre_fetch"
    - "resource_post_fetch"
  tags: ["observability", "tracing", "monitoring", "opentelemetry", "phoenix"]
  mode: "permissive"
  priority: 200  # Lower priority to not interfere with security plugins
  config:
    phoenix_endpoint: "${PHOENIX_ENDPOINT:-http://localhost:6006}"
    enable_tracing: true
    enable_evaluation: false  # Can be enabled for automatic quality checks
    sample_rate: 1.0  # Trace sampling rate
    export_batch_size: 100
    export_interval_ms: 5000
    # OpenTelemetry configuration
    otel_service_name: "mcp-gateway"
    otel_resource_attributes:
      deployment.environment: "${DEPLOYMENT_ENV:-development}"
      service.version: "${MCPGATEWAY_VERSION}"
    # Trace enrichment
    capture_request_headers: ["X-Request-ID", "X-Tenant-ID", "User-Agent"]
    capture_response_headers: ["X-Response-Time"]
    redact_sensitive_fields: ["password", "api_key", "secret", "token"]
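
The `redact_sensitive_fields` setting implies that payloads are scrubbed before they leave the gateway. Below is a minimal sketch under two assumptions: key names are compared case-insensitively with `-` treated as `_`, and matching is exact. The function names and the `[REDACTED]` placeholder are illustrative only. Exact matching is deliberate, because substring matching on `token` would also mask harmless fields such as `token_count`.

```python
from __future__ import annotations

from typing import Any

REDACTED = "[REDACTED]"


def _normalize(key: str) -> str:
    # Treat "API-Key", "api_key" and "Api_Key" as the same field name.
    return key.lower().replace("-", "_")


def redact(value: Any, sensitive_fields: list[str]) -> Any:
    """Return a copy of value with sensitive keys masked, recursing into dicts and lists."""
    sensitive = {_normalize(name) for name in sensitive_fields}
    return _redact(value, sensitive)


def _redact(value: Any, sensitive: set[str]) -> Any:
    if isinstance(value, dict):
        return {
            key: REDACTED
            if isinstance(key, str) and _normalize(key) in sensitive
            else _redact(item, sensitive)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [_redact(item, sensitive) for item in value]
    return value  # scalars are returned unchanged
```

For example, `redact({"api_key": "abc", "q": "x"}, ["api_key"])` returns `{"api_key": "[REDACTED]", "q": "x"}`. Only keys are matched. A secret embedded in a free-text value would still leak, so captured inputs and outputs may need separate handling.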

2. Core Features

A. Distributed Tracing

  • Implement OpenTelemetry instrumentation using Phoenix's OpenInference spec
  • Create spans for:
    • Virtual server requests
    • Individual tool invocations
    • Prompt rendering operations
    • Resource fetching
    • Federation calls between gateways
  • Include context propagation for distributed traces across federated gateways
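
Context propagation between federated gateways would most likely use the W3C Trace Context `traceparent` header, which the OpenTelemetry Python SDK propagates by default. The standard-library sketch below builds and parses that header to show what one gateway would forward to its peer. In the plugin itself, OpenTelemetry's propagator API would do this work; the helper names here are hypothetical.

```python
from __future__ import annotations

import re
import secrets
from dataclasses import dataclass

# version "00" - 32-hex trace-id - 16-hex parent-id - 2-hex flags, lowercase only
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class TraceContext:
    trace_id: str  # shared by every span in the distributed trace
    span_id: str   # the sender's current span, sent as the parent-id field
    sampled: bool


def new_root_context(sampled: bool = True) -> TraceContext:
    # Random 128-bit trace ID and 64-bit span ID as lowercase hex.
    return TraceContext(secrets.token_hex(16), secrets.token_hex(8), sampled)


def child_context(parent: TraceContext) -> TraceContext:
    # A downstream call keeps the trace ID and sampling decision but gets a new span ID.
    return TraceContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def format_traceparent(ctx: TraceContext) -> str:
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"


def parse_traceparent(header: str) -> TraceContext | None:
    """Return the context from a version-00 header, or None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return TraceContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))
```

On a federation call, the calling gateway would send `format_traceparent(child_context(ctx))`. The peer parses the header and parents its own spans under it, so Phoenix shows both gateways in a single trace.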

B. Performance Metrics

  • Tool execution latency by server/tool
  • Request throughput per virtual server
  • Error rates and success rates
  • Resource usage patterns
  • Cache hit/miss ratios
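
Most of these metrics can be derived from finished span records. A minimal sketch follows, assuming each record carries a server, tool, duration, success flag and optional cache-hit flag; the `CallRecord` shape is hypothetical. Percentiles use the nearest-rank method.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    server: str
    tool: str
    duration_ms: float
    ok: bool
    cache_hit: bool | None = None  # None when the call did not touch a cache


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list, pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(records: list[CallRecord]) -> dict[tuple[str, str], dict[str, float]]:
    # Group calls by (server, tool), then compute latency, error and cache statistics per group.
    groups: dict[tuple[str, str], list[CallRecord]] = defaultdict(list)
    for record in records:
        groups[(record.server, record.tool)].append(record)

    summary: dict[tuple[str, str], dict[str, float]] = {}
    for key, recs in groups.items():
        durations = [r.duration_ms for r in recs]
        cache = [r.cache_hit for r in recs if r.cache_hit is not None]
        summary[key] = {
            "count": len(recs),
            "p50_ms": percentile(durations, 50),
            "p95_ms": percentile(durations, 95),
            "error_rate": sum(not r.ok for r in recs) / len(recs),
            "cache_hit_ratio": sum(cache) / len(cache) if cache else math.nan,
        }
    return summary
```

Throughput per virtual server is then the `count` for a group divided by the length of the time window the records were collected over.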

C. LLM-Specific Observability

  • Token usage tracking (if available from MCP servers)
  • Prompt template effectiveness metrics
  • Tool selection patterns
  • Chain-of-thought execution traces
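
Phoenix interprets span attributes that follow the OpenInference semantic conventions. The sketch below shows the attributes a tool span might carry. It assumes the keys `openinference.span.kind`, `tool.name`, `input.value`/`output.value`, their `*.mime_type` companions and `llm.token_count.*`; verify these against the `openinference-semantic-conventions` package before relying on them. The `mcp.virtual_server` key is a hypothetical gateway-specific addition.

```python
from __future__ import annotations

import json
from typing import Any


def tool_span_attributes(
    tool_name: str,
    arguments: dict[str, Any],
    result: Any,
    virtual_server: str,
    prompt_tokens: int | None = None,
    completion_tokens: int | None = None,
) -> dict[str, str | int]:
    """Build OpenInference-style attributes for one MCP tool invocation span."""
    is_text = isinstance(result, str)
    attrs: dict[str, str | int] = {
        "openinference.span.kind": "TOOL",
        "tool.name": tool_name,
        # OpenTelemetry attribute values must be primitives, so structured data is JSON-encoded.
        "input.value": json.dumps(arguments, sort_keys=True),
        "input.mime_type": "application/json",
        "output.value": result if is_text else json.dumps(result, sort_keys=True),
        "output.mime_type": "text/plain" if is_text else "application/json",
        "mcp.virtual_server": virtual_server,  # hypothetical gateway-specific key
    }
    # Token counts are optional: only record what the MCP server actually reported.
    if prompt_tokens is not None:
        attrs["llm.token_count.prompt"] = prompt_tokens
    if completion_tokens is not None:
        attrs["llm.token_count.completion"] = completion_tokens
    if prompt_tokens is not None and completion_tokens is not None:
        attrs["llm.token_count.total"] = prompt_tokens + completion_tokens
    return attrs
```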

D. Evaluation Integration

  • Support for Phoenix's evaluation framework to:
    • Assess tool output quality
    • Monitor prompt effectiveness
    • Detect anomalies in responses
    • Track drift in tool behavior over time
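
Output-quality checks would go through Phoenix's evaluation framework. The simpler "detect anomalies / track drift" signals could start from a rolling z-score over a per-tool metric such as latency, output length or error count. The sketch below implements that generic baseline technique and is not Phoenix's API. The window size, `min_samples` and the 3-standard-deviation threshold are illustrative defaults.

```python
from __future__ import annotations

import statistics
from collections import deque


class RollingZScoreDetector:
    """Flags a value as anomalous when it lies more than `threshold` population
    standard deviations from the mean of the previous `window` values."""

    def __init__(self, window: int = 50, threshold: float = 3.0, min_samples: int = 10) -> None:
        self.values: deque[float] = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        is_anomaly = False
        # Only judge once there is enough history to form a baseline.
        if len(self.values) >= self.min_samples:
            mean = statistics.fmean(self.values)
            stdev = statistics.pstdev(self.values)
            if stdev > 0:
                is_anomaly = abs(value - mean) / stdev > self.threshold
            else:
                is_anomaly = value != mean  # flat baseline: any change is unusual
        # Every value joins the window, so a sustained shift becomes the new baseline.
        self.values.append(value)
        return is_anomaly
```

Adding flagged values to the window lets the detector adapt to gradual drift instead of alerting forever. The trade-off is that a long-lived regression stops being flagged once it fills the window.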

3. Integration with Existing Infrastructure

Alignment with Roadmap

Plugin Framework Utilization

  • Leverages existing PluginContext for request tracking
  • Uses GlobalContext for tenant/user attribution
  • Implements async operations for non-blocking trace export
  • Respects plugin priority system (runs after security plugins)
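
Non-blocking export usually means that hooks only enqueue finished spans, while a background task ships them in batches. A batch is flushed when `export_batch_size` spans are waiting or `export_interval_ms` has elapsed. OpenTelemetry's `BatchSpanProcessor` already does this on a worker thread (see its `max_export_batch_size` and `schedule_delay_millis` parameters), so the plugin would probably just configure it. The asyncio sketch below only illustrates the mechanics. `AsyncBatchExporter` is hypothetical, and `send` stands in for the OTLP exporter.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

_STOP = object()  # sentinel queued by shutdown()


class AsyncBatchExporter:
    """Hooks call submit(); a background task sends spans in batches."""

    def __init__(
        self,
        send: Callable[[list[Any]], Awaitable[None]],
        batch_size: int = 100,      # corresponds to export_batch_size
        interval_ms: int = 5000,    # corresponds to export_interval_ms
        max_queue: int = 10_000,
    ) -> None:
        self._send = send
        self._batch_size = batch_size
        self._interval = interval_ms / 1000
        self._queue: asyncio.Queue[Any] = asyncio.Queue(maxsize=max_queue)
        self._task: asyncio.Task[None] | None = None
        self.dropped = 0

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    def submit(self, span: Any) -> None:
        # Never blocks the request path: if the queue is full, count and drop the span.
        try:
            self._queue.put_nowait(span)
        except asyncio.QueueFull:
            self.dropped += 1

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        stopping = False
        while not stopping:
            first = await self._queue.get()
            if first is _STOP:
                break
            batch = [first]
            deadline = loop.time() + self._interval
            # Keep filling until the batch is full or the interval since its first span expires.
            while len(batch) < self._batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    item = await asyncio.wait_for(self._queue.get(), remaining)
                except asyncio.TimeoutError:
                    break
                if item is _STOP:
                    stopping = True
                    break
                batch.append(item)
            await self._send(batch)  # a production version would catch and log failures here

    async def shutdown(self) -> None:
        # Spans submitted before this call are still exported, then the worker exits.
        await self._queue.put(_STOP)
        if self._task is not None:
            await self._task
```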

4. Deployment Options

Docker Compose Configuration

Option A: PostgreSQL Backend (Recommended for Production)

# docker-compose.phoenix.yaml - Separate compose file for Phoenix stack
version: '3.8'

services:
  phoenix:
    image: arizephoenix/phoenix:v4.0.0  # Pin to specific version
    container_name: mcp-phoenix
    restart: unless-stopped
    depends_on:
      phoenix-db:
        condition: service_healthy  # Wait until pg_isready succeeds before starting Phoenix
    ports:
      - "6006:6006"  # Phoenix UI and OTLP HTTP collector
      - "4317:4317"  # OTLP gRPC collector
    environment:
      - PHOENIX_SQL_DATABASE_URL=postgresql://phoenix:phoenix_secret@phoenix-db:5432/phoenix
      - PHOENIX_ENABLE_AUTH=false  # Set to true for production
      - PHOENIX_SECRET_KEY=${PHOENIX_SECRET_KEY:-your-secret-key-here}
      - PHOENIX_LOG_LEVEL=info
      - PHOENIX_GRPC_PORT=4317
      - PHOENIX_HTTP_PORT=6006
    networks:
      - phoenix-net
      - mcpgateway-net  # Connect to MCP Gateway network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:6006/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  phoenix-db:
    image: postgres:15-alpine
    container_name: mcp-phoenix-db
    restart: unless-stopped
    environment:
      - POSTGRES_USER=phoenix
      - POSTGRES_PASSWORD=phoenix_secret
      - POSTGRES_DB=phoenix
      - POSTGRES_INITDB_ARGS=--encoding=UTF8
    volumes:
      - phoenix-postgres-data:/var/lib/postgresql/data
    networks:
      - phoenix-net
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U phoenix"]
      interval: 10s
      timeout: 5s
      retries: 5

networks:
  phoenix-net:
    driver: bridge
  mcpgateway-net:
    external: true  # Assuming MCP Gateway network exists

volumes:
  phoenix-postgres-data:
    driver: local

Option B: SQLite Backend (Development/Testing)
# docker-compose.phoenix-dev.yaml - Lightweight option for development
version: '3.8'

services:
  phoenix:
    image: arizephoenix/phoenix:latest
    container_name: mcp-phoenix-dev
    restart: unless-stopped
    ports:
      - "6006:6006"  # Phoenix UI and OTLP HTTP
      - "4317:4317"  # OTLP gRPC
    environment:
      - PHOENIX_WORKING_DIR=/mnt/data
      - PHOENIX_ENABLE_AUTH=false
      - PHOENIX_LOG_LEVEL=debug
    volumes:
      - phoenix-sqlite-data:/mnt/data
    networks:
      - mcpgateway-net

volumes:
  phoenix-sqlite-data:
    driver: local

networks:
  mcpgateway-net:
    external: true

MCP Gateway Integration Configuration

Update the main docker-compose.yaml to connect with Phoenix:

# Addition to mcpgateway service in main docker-compose.yaml
services:
  mcpgateway:
    # ... existing configuration ...
    environment:
      # Phoenix Observability
      - PHOENIX_ENDPOINT=http://mcp-phoenix:6006
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://mcp-phoenix:4317
      - OTEL_SERVICE_NAME=mcp-gateway
      - OTEL_TRACES_EXPORTER=otlp
      - OTEL_METRICS_EXPORTER=none  # Phoenix ingests traces only; send metrics to a separate backend if needed
    networks:
      - mcpgateway-net
    depends_on:
      - phoenix  # If running in same compose file

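networks:
  mcpgateway-net:
    name: mcpgateway-net  # Fixed name so the Phoenix compose files can attach to it as an external network
    driver: bridge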

Deployment Commands

# Development: Start Phoenix with SQLite
docker-compose -f docker-compose.phoenix-dev.yaml up -d

# Production: Start Phoenix with PostgreSQL
docker-compose -f docker-compose.phoenix.yaml up -d

# Start both MCP Gateway and Phoenix together
docker-compose -f docker-compose.yaml -f docker-compose.phoenix.yaml up -d

# View logs
docker-compose -f docker-compose.phoenix.yaml logs -f phoenix

# Stop services
docker-compose -f docker-compose.phoenix.yaml down

Kubernetes Deployment

# phoenix-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: phoenix
  namespace: mcp-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phoenix
  template:
    metadata:
      labels:
        app: phoenix
    spec:
      containers:
      - name: phoenix
        image: arizephoenix/phoenix:v4.0.0
        ports:
        - containerPort: 6006
          name: http
        - containerPort: 4317
          name: grpc
        env:
        - name: PHOENIX_SQL_DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: phoenix-db-secret
              key: connection-string
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: phoenix
  namespace: mcp-gateway
spec:
  selector:
    app: phoenix
  ports:
  - port: 6006
    targetPort: 6006
    name: http
  - port: 4317
    targetPort: 4317
    name: grpc

Cloud Deployment Options

  • Phoenix Cloud: Managed SaaS offering with API key authentication
  • AWS ECS: Container deployment with RDS PostgreSQL backend
  • GCP Cloud Run: Serverless deployment with Cloud SQL
  • Azure Container Instances: With Azure Database for PostgreSQL

5. Benefits

  1. Complete Visibility: End-to-end tracing of MCP requests across all components
  2. Performance Optimization: Identify bottlenecks in tool execution and federation
  3. Quality Assurance: Automatic evaluation of tool outputs and prompt effectiveness
  4. Debugging: Detailed traces for troubleshooting complex multi-tool workflows
  5. Compliance: Audit trail of all MCP operations with user/tenant attribution
  6. Scalability Insights: Understand system behavior under load with distributed tracing

6. Implementation Phases

Phase 1: Basic Tracing (2 weeks)

  • OpenTelemetry setup and configuration
  • Basic span creation for tool invocations
  • Phoenix endpoint integration

Phase 2: Enhanced Observability (2 weeks)

  • Prompt and resource tracking
  • Federation tracing
  • Custom attributes and metadata

Phase 3: Evaluation & Analytics (1 week)

  • LLM evaluation integration
  • Performance baselines
  • Anomaly detection setup

Phase 4: Production Hardening (1 week)

  • Error handling and retry logic
  • Performance optimization
  • Documentation and examples
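
For the error-handling and retry item, one common approach is capped exponential backoff with full jitter around each export attempt. The retry gives up after a fixed number of tries, so a Phoenix outage cannot stall the gateway. The OTLP exporters already retry some transient failures themselves, so this mainly matters for any custom export path. `retry_with_backoff`, its defaults, and retrying only on `ConnectionError`/`TimeoutError` are illustrative choices.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    attempt: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await attempt(), retrying transient errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for n in range(max_attempts):
        try:
            return await attempt()
        except retry_on:
            if n == max_attempts - 1:
                raise  # give up; the caller decides whether to log and drop the batch
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**n)].
            await asyncio.sleep(random.uniform(0, min(max_delay, base_delay * 2**n)))
    raise RuntimeError("unreachable")
```

Errors outside `retry_on`, such as a malformed payload, propagate immediately, because retrying them cannot succeed.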

Technical Requirements

Dependencies

# pyproject.toml additions
opentelemetry-api = "^1.20.0"
opentelemetry-sdk = "^1.20.0"
opentelemetry-exporter-otlp = "^1.20.0"
openinference-instrumentation = "^0.1.0"
arize-phoenix = "^4.0.0"  # Optional for embedded mode

Environment Variables

PHOENIX_ENDPOINT=http://localhost:6006
PHOENIX_API_KEY=<optional-for-cloud>
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=mcp-gateway
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
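
The plugin config refers to these variables through shell-style `${NAME:-default}` placeholders. `OTEL_RESOURCE_ATTRIBUTES` uses the OpenTelemetry `key=value,key=value` format, with percent-encoded values. It is an assumption here that the gateway's config loader does not already expand placeholders. The standard-library sketch below shows one way to do both; the function names are hypothetical.

```python
from __future__ import annotations

import os
import re
from collections.abc import Mapping
from urllib.parse import unquote

# Matches ${NAME} or ${NAME:-default}; the default may not contain "}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_placeholders(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Shell-style expansion: unset or empty variables fall back to the default (or "")."""

    def substitute(match: re.Match[str]) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        return value if value else (default or "")

    return _PLACEHOLDER.sub(substitute, text)


def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse OTEL_RESOURCE_ATTRIBUTES ("k1=v1,k2=v2"), percent-decoding values."""
    attrs: dict[str, str] = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, _, value = pair.partition("=")
        attrs[key.strip()] = unquote(value.strip())
    return attrs
```

Note that `${MCPGATEWAY_VERSION}` has no default, so it expands to an empty string when the variable is unset.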

Testing Strategy

  1. Unit Tests: Mock Phoenix client, verify span creation
  2. Integration Tests: Test with local Phoenix instance
  3. Load Tests: Validate that tracing adds less than 5% overhead
  4. E2E Tests: Complete trace verification across federated gateways

Documentation Requirements

  1. Setup Guide: Phoenix deployment and configuration
  2. User Guide: Interpreting traces and metrics
  3. Troubleshooting: Common issues and solutions
  4. Best Practices: Sampling strategies, sensitive data handling
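
On sampling strategy: `sample_rate: 1.0` traces every request. If the rate is lowered, basing the decision on the trace ID keeps it consistent across federated gateways that share a trace. OpenTelemetry's `TraceIdRatioBased` sampler works this way. The standard-library sketch below follows that idea but is not the SDK's code, and `should_sample` is a hypothetical name.

```python
from __future__ import annotations

_LOW_64_MASK = (1 << 64) - 1


def should_sample(trace_id_hex: str, rate: float) -> bool:
    """Deterministic ratio sampling on the low 64 bits of a 128-bit trace ID."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Sample when the low 64 bits fall below rate * 2**64; same trace ID, same answer everywhere.
    bound = round(rate * (1 << 64))
    return (int(trace_id_hex, 16) & _LOW_64_MASK) < bound
```

Child spans and downstream gateways should normally follow the sampled flag they receive rather than re-deciding. This is what OpenTelemetry's parent-based sampling does.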

Success Criteria

  • Complete request tracing with < 5% performance impact
  • Tool invocation visibility across all virtual servers
  • Federation trace correlation working
  • Phoenix UI showing meaningful insights
  • Documentation and examples complete
  • All tests passing with 90%+ coverage

Related Issues

References

Labels

enhancement, observability, plugins, python, triage