A FastAPI service that provides REST API endpoints for RAGAS evaluation capabilities, allowing other applications to request evaluations programmatically.
The service exposes this evaluation functionality through HTTP endpoints, enabling:
- Dataset creation and management
- Data insertion and updates
- Batch evaluation of datasets
- Single sample evaluation
- Metric management and configuration
- Result storage and retrieval

Dataset management:
- Create, read, update, delete datasets
- Bulk data insertion
- Dataset validation and schema enforcement
- Dataset versioning and history

Evaluation:
- Run evaluations with configurable metrics
- Support for both single-turn and multi-turn samples
- Async evaluation for large datasets
- Progress tracking and status updates

Metrics:
- List available metrics
- Configure metric parameters
- Custom metric registration
- Metric performance tracking

Results:
- Store and retrieve evaluation results
- Result aggregation and analysis
- Export results in various formats
- Historical result comparison

Model configuration:
- Manage LLM configurations
- Embedding model management
- API key management
- Model performance tracking

API endpoints:

POST /api/v1/datasets
Request Body:
{
"name": "string",
"description": "string",
"sample_type": "single_turn" | "multi_turn",
"metadata": {
"source": "string",
"version": "string",
"tags": ["string"]
}
}

Response:
{
"dataset_id": "uuid",
"name": "string",
"description": "string",
"sample_type": "string",
"created_at": "datetime",
"updated_at": "datetime",
"sample_count": 0,
"metadata": {}
}
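
A minimal client sketch for creating a dataset with the `requests` library. The base URL, the API key placeholder, and the example field values are assumptions; authentication follows the Bearer scheme described under Authentication below.

```python
import requests

BASE_URL = "http://localhost:8000"                        # assumed local deployment
HEADERS = {"Authorization": "Bearer your-api-key-here"}   # see Authentication below

payload = {
    "name": "faq-eval",                                   # illustrative values only
    "description": "FAQ bot regression set",
    "sample_type": "single_turn",
    "metadata": {"source": "support-tickets", "version": "1.0", "tags": ["faq"]},
}

resp = requests.post(f"{BASE_URL}/api/v1/datasets", json=payload, headers=HEADERS)
resp.raise_for_status()
dataset_id = resp.json()["dataset_id"]                    # used by sample and evaluation calls below
print("created dataset", dataset_id)
```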

GET /api/v1/datasets/{dataset_id}
Response:
{
"dataset_id": "uuid",
"name": "string",
"description": "string",
"sample_type": "string",
"created_at": "datetime",
"updated_at": "datetime",
"sample_count": 0,
"metadata": {},
"samples": [
{
"user_input": "string",
"retrieved_contexts": ["string"],
"reference_contexts": ["string"],
"response": "string",
"multi_responses": ["string"],
"reference": "string",
"rubrics": {}
}
]
}

GET /api/v1/datasets?page=1&size=10&sample_type=single_turn
Response:
{
"datasets": [
{
"dataset_id": "uuid",
"name": "string",
"description": "string",
"sample_type": "string",
"created_at": "datetime",
"updated_at": "datetime",
"sample_count": 0
}
],
"total": 100,
"page": 1,
"size": 10
}
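
A sketch of paging through the dataset list with the page/size query parameters shown above; the helper name and stop condition are illustrative assumptions.

```python
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}

def iter_datasets(sample_type="single_turn", size=10):
    """Yield dataset summaries page by page until the listing is exhausted."""
    page = 1
    while True:
        resp = requests.get(
            f"{BASE_URL}/api/v1/datasets",
            params={"page": page, "size": size, "sample_type": sample_type},
            headers=HEADERS,
        )
        resp.raise_for_status()
        body = resp.json()
        yield from body["datasets"]
        if page * size >= body["total"]:   # no more pages to fetch
            break
        page += 1

for ds in iter_datasets():
    print(ds["dataset_id"], ds["name"], ds["sample_count"])
```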

DELETE /api/v1/datasets/{dataset_id}
Response:
{
"message": "Dataset deleted successfully",
"dataset_id": "uuid"
}

POST /api/v1/datasets/{dataset_id}/samples
Request Body (Single Turn):
{
"user_input": "What is the capital of France?",
"retrieved_contexts": ["Paris is the capital of France."],
"reference_contexts": ["Paris is the capital and largest city of France."],
"response": "The capital of France is Paris.",
"reference": "Paris",
"rubrics": {
"accuracy": "high",
"completeness": "medium"
}
}

Request Body (Multi Turn):
{
"user_input": [
{
"role": "user",
"content": "What is the weather like?"
},
{
"role": "assistant",
"content": "I don't have access to real-time weather data."
},
{
"role": "user",
"content": "Can you check the weather for New York?"
}
],
"response": "I cannot provide real-time weather information.",
"reference": "The assistant should explain it cannot access weather data."
}

Response:
{
"sample_id": "uuid",
"dataset_id": "uuid",
"created_at": "datetime"
}
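
A sketch of inserting the single-turn sample from the request example above; the dataset_id is assumed to come from a prior dataset-creation call.

```python
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}
dataset_id = "your-dataset-id"   # returned by POST /api/v1/datasets

sample = {
    "user_input": "What is the capital of France?",
    "retrieved_contexts": ["Paris is the capital of France."],
    "reference_contexts": ["Paris is the capital and largest city of France."],
    "response": "The capital of France is Paris.",
    "reference": "Paris",
    "rubrics": {"accuracy": "high", "completeness": "medium"},
}

resp = requests.post(
    f"{BASE_URL}/api/v1/datasets/{dataset_id}/samples",
    json=sample,
    headers=HEADERS,
)
resp.raise_for_status()
print("inserted sample", resp.json()["sample_id"])
```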

POST /api/v1/datasets/{dataset_id}/samples/bulk
Request Body:
{
"samples": [
{
"user_input": "string",
"retrieved_contexts": ["string"],
"response": "string",
"reference": "string"
}
]
}

Response:
{
"inserted_count": 10,
"failed_count": 0,
"errors": []
}

PUT /api/v1/datasets/{dataset_id}/samples/{sample_id}
Request Body: same as single-sample insertion
Response:
{
"sample_id": "uuid",
"updated_at": "datetime"
}

DELETE /api/v1/datasets/{dataset_id}/samples/{sample_id}
Response:
{
"message": "Sample deleted successfully",
"sample_id": "uuid"
}

POST /api/v1/evaluate/dataset
Request Body:
{
"dataset_id": "uuid",
"metrics": [
{
"name": "answer_relevancy",
"parameters": {}
},
{
"name": "context_precision",
"parameters": {}
},
{
"name": "faithfulness",
"parameters": {}
},
{
"name": "context_recall",
"parameters": {}
}
],
"llm_config": {
"provider": "openai",
"model": "gpt-4o",
"api_key": "string"
},
"embeddings_config": {
"provider": "openai",
"model": "text-embedding-3-small",
"api_key": "string"
},
"experiment_name": "string",
"batch_size": 10,
"raise_exceptions": false
}

Response:
{
"evaluation_id": "uuid",
"status": "running",
"progress": 0.0,
"estimated_completion": "datetime",
"results_url": "/api/v1/evaluations/{evaluation_id}/results"
}
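
A sketch of submitting a batch evaluation. The provider API key is read from the environment rather than hard-coded, and the metric selection and experiment name are illustrative.

```python
import os
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}

request_body = {
    "dataset_id": "your-dataset-id",
    "metrics": [
        {"name": "answer_relevancy", "parameters": {}},
        {"name": "faithfulness", "parameters": {}},
    ],
    "llm_config": {
        "provider": "openai",
        "model": "gpt-4o",
        "api_key": os.environ["OPENAI_API_KEY"],   # avoid hard-coding provider keys
    },
    "embeddings_config": {
        "provider": "openai",
        "model": "text-embedding-3-small",
        "api_key": os.environ["OPENAI_API_KEY"],
    },
    "experiment_name": "baseline-run",
    "batch_size": 10,
    "raise_exceptions": False,
}

resp = requests.post(f"{BASE_URL}/api/v1/evaluate/dataset", json=request_body, headers=HEADERS)
resp.raise_for_status()
evaluation_id = resp.json()["evaluation_id"]       # poll the status endpoint with this id
print("evaluation started:", evaluation_id)
```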

GET /api/v1/evaluations/{evaluation_id}
Response:
{
"evaluation_id": "uuid",
"status": "completed" | "running" | "failed",
"progress": 0.85,
"started_at": "datetime",
"completed_at": "datetime",
"error_message": "string"
}

GET /api/v1/evaluations/{evaluation_id}/results
Response:
{
"evaluation_id": "uuid",
"dataset_id": "uuid",
"experiment_name": "string",
"metrics": {
"answer_relevancy": 0.874,
"context_precision": 0.817,
"faithfulness": 0.892,
"context_recall": 0.756
},
"sample_scores": [
{
"sample_id": "uuid",
"answer_relevancy": 0.9,
"context_precision": 0.8,
"faithfulness": 0.85,
"context_recall": 0.7
}
],
"cost_analysis": {
"total_tokens": 15000,
"total_cost": 0.045,
"currency": "USD"
},
"traces": [
{
"sample_id": "uuid",
"trace_url": "string"
}
],
"created_at": "datetime"
}
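
Because dataset evaluations run asynchronously, a client polls the status endpoint and then fetches the results. A sketch, assuming a fixed 10-second polling interval:

```python
import time
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}
evaluation_id = "your-evaluation-id"   # returned by POST /api/v1/evaluate/dataset

# Poll until the job leaves the "running" state.
while True:
    status = requests.get(
        f"{BASE_URL}/api/v1/evaluations/{evaluation_id}", headers=HEADERS
    ).json()
    print(f"status={status['status']} progress={status['progress']:.0%}")
    if status["status"] != "running":
        break
    time.sleep(10)

if status["status"] == "completed":
    results = requests.get(
        f"{BASE_URL}/api/v1/evaluations/{evaluation_id}/results", headers=HEADERS
    ).json()
    for metric, score in results["metrics"].items():
        print(f"{metric}: {score:.3f}")
else:
    print("evaluation failed:", status.get("error_message"))
```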

POST /api/v1/evaluate/single
Request Body:
{
"sample": {
"user_input": "What is the capital of France?",
"retrieved_contexts": ["Paris is the capital of France."],
"response": "The capital of France is Paris.",
"reference": "Paris"
},
"metrics": [
{
"name": "answer_relevancy",
"parameters": {}
},
{
"name": "faithfulness",
"parameters": {}
}
],
"llm_config": {
"provider": "openai",
"model": "gpt-4o",
"api_key": "string"
}
}

Response:
{
"sample_id": "uuid",
"scores": {
"answer_relevancy": 0.9,
"faithfulness": 0.85
},
"reasoning": {
"answer_relevancy": "The response directly answers the question about France's capital.",
"faithfulness": "The response is consistent with the provided context."
},
"cost": {
"tokens": 150,
"cost": 0.00045,
"currency": "USD"
}
}
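
A sketch of scoring a single sample synchronously, mirroring the request body above; the provider key is again taken from the environment.

```python
import os
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}

request_body = {
    "sample": {
        "user_input": "What is the capital of France?",
        "retrieved_contexts": ["Paris is the capital of France."],
        "response": "The capital of France is Paris.",
        "reference": "Paris",
    },
    "metrics": [
        {"name": "answer_relevancy", "parameters": {}},
        {"name": "faithfulness", "parameters": {}},
    ],
    "llm_config": {
        "provider": "openai",
        "model": "gpt-4o",
        "api_key": os.environ["OPENAI_API_KEY"],
    },
}

resp = requests.post(f"{BASE_URL}/api/v1/evaluate/single", json=request_body, headers=HEADERS)
resp.raise_for_status()
for metric, score in resp.json()["scores"].items():
    print(f"{metric}: {score}")
```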

GET /api/v1/metrics
Response:
{
"metrics": [
{
"name": "answer_relevancy",
"description": "Measures how relevant the answer is to the question",
"type": "llm_based",
"supported_sample_types": ["single_turn"],
"parameters": {
"llm_required": true,
"embeddings_required": false
}
},
{
"name": "context_precision",
"description": "Measures the precision of retrieved contexts",
"type": "embedding_based",
"supported_sample_types": ["single_turn"],
"parameters": {
"llm_required": false,
"embeddings_required": true
}
}
]
}

GET /api/v1/metrics/{metric_name}
Response:
{
"name": "answer_relevancy",
"description": "Measures how relevant the answer is to the question",
"type": "llm_based",
"supported_sample_types": ["single_turn"],
"parameters": {
"llm_required": true,
"embeddings_required": false
},
"default_config": {
"llm": "gpt-4o",
"embeddings": null
},
"example_usage": {
"sample": {
"user_input": "What is the capital of France?",
"response": "The capital of France is Paris."
},
"expected_score": 0.9
}
}
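
The metric catalogue can be used to build the metrics array for an evaluation request. A sketch that keeps only metrics supporting single-turn samples:

```python
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}

catalogue = requests.get(f"{BASE_URL}/api/v1/metrics", headers=HEADERS).json()["metrics"]

# Keep only metrics that can score single-turn samples, with default parameters.
single_turn_metrics = [
    {"name": m["name"], "parameters": {}}
    for m in catalogue
    if "single_turn" in m["supported_sample_types"]
]
print(single_turn_metrics)   # usable as the "metrics" field of an evaluation request
```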

GET /api/v1/config/llm-providers
Response:
{
"providers": [
{
"name": "openai",
"models": [
{
"name": "gpt-4o",
"max_tokens": 4096,
"cost_per_1k_tokens": 0.03
},
{
"name": "gpt-3.5-turbo",
"max_tokens": 4096,
"cost_per_1k_tokens": 0.002
}
]
},
{
"name": "anthropic",
"models": [
{
"name": "claude-3-opus",
"max_tokens": 4096,
"cost_per_1k_tokens": 0.015
}
]
}
]
}

GET /api/v1/config/embedding-providers
Response:
{
"providers": [
{
"name": "openai",
"models": [
{
"name": "text-embedding-3-small",
"dimensions": 1536,
"cost_per_1k_tokens": 0.00002
}
]
}
]
}

GET /api/v1/evaluations?dataset_id=uuid&page=1&size=10
Response:
{
"evaluations": [
{
"evaluation_id": "uuid",
"dataset_id": "uuid",
"dataset_name": "string",
"experiment_name": "string",
"status": "completed",
"metrics_count": 4,
"created_at": "datetime",
"completed_at": "datetime"
}
],
"total": 50,
"page": 1,
"size": 10
}

GET /api/v1/evaluations/{evaluation_id}/export?format=csv
Response: File download (CSV, JSON, or Excel)

POST /api/v1/evaluations/compare
Request Body:
{
"evaluation_ids": ["uuid1", "uuid2", "uuid3"],
"metrics": ["answer_relevancy", "faithfulness"]
}

Response:
{
"comparison": {
"evaluation_ids": ["uuid1", "uuid2", "uuid3"],
"metrics": {
"answer_relevancy": {
"uuid1": 0.874,
"uuid2": 0.892,
"uuid3": 0.856
},
"faithfulness": {
"uuid1": 0.892,
"uuid2": 0.901,
"uuid3": 0.878
}
},
"improvements": {
"uuid1_to_uuid2": {
"answer_relevancy": "+0.018",
"faithfulness": "+0.009"
}
}
}
}
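
A sketch combining the export and comparison endpoints: download one run as CSV, then compare three runs on two metrics. The evaluation ids and the output filename are placeholders.

```python
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}

# Download one evaluation as CSV.
evaluation_id = "your-evaluation-id"
export = requests.get(
    f"{BASE_URL}/api/v1/evaluations/{evaluation_id}/export",
    params={"format": "csv"},
    headers=HEADERS,
)
export.raise_for_status()
with open("evaluation.csv", "wb") as f:
    f.write(export.content)

# Compare several runs on selected metrics.
comparison = requests.post(
    f"{BASE_URL}/api/v1/evaluations/compare",
    json={
        "evaluation_ids": ["uuid1", "uuid2", "uuid3"],
        "metrics": ["answer_relevancy", "faithfulness"],
    },
    headers=HEADERS,
).json()
print(comparison["comparison"]["metrics"])
```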

Beyond the HTTP endpoints, the service includes the following supporting components.

Authentication and authorization:
- API key management
- User/tenant management
- Rate limiting
- Access control for datasets and evaluations

Data persistence:
- Dataset persistence (database + file storage)
- Evaluation result storage
- Backup and archival
- Data versioning

Background jobs:
- Async evaluation job processing
- Progress tracking
- Job scheduling and retry logic
- Resource management

Monitoring and observability:
- Request/response logging
- Performance metrics
- Error tracking and alerting
- Usage analytics

Caching:
- LLM response caching
- Embedding caching
- Evaluation result caching
- Configuration caching

Notifications:
- Evaluation completion notifications
- Error alerts
- Progress updates
- Webhook support
All endpoints return consistent error responses:
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid request parameters",
"details": {
"field": "user_input",
"issue": "Field is required"
}
},
"timestamp": "datetime",
"request_id": "uuid"
}

Common error codes:
- VALIDATION_ERROR: Invalid request parameters
- DATASET_NOT_FOUND: Dataset doesn't exist
- EVALUATION_NOT_FOUND: Evaluation doesn't exist
- METRIC_NOT_SUPPORTED: Metric not available
- LLM_ERROR: LLM API error
- RATE_LIMIT_EXCEEDED: Too many requests
- INTERNAL_ERROR: Server error
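
A sketch of handling the error envelope on the client side; it assumes failed requests also carry a non-2xx HTTP status, which the spec above does not state explicitly.

```python
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}

resp = requests.get(f"{BASE_URL}/api/v1/datasets/nonexistent-id", headers=HEADERS)
if not resp.ok:                          # assumes errors use non-2xx status codes
    error = resp.json()["error"]
    if error["code"] == "DATASET_NOT_FOUND":
        print("dataset is missing; create it first")
    elif error["code"] == "RATE_LIMIT_EXCEEDED":
        print("back off and retry later")
    else:
        print(f"{error['code']}: {error['message']}", error.get("details"))
```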

Rate limits:
- Dataset operations: 100 requests/minute
- Evaluation requests: 10 requests/minute
- Single sample evaluation: 50 requests/minute
- Configuration operations: 200 requests/minute
All endpoints require API key authentication:
Authorization: Bearer your-api-key-here

Configure webhooks for evaluation completion:
POST /api/v1/webhooks
Request Body:
{
"url": "https://your-app.com/webhook",
"events": ["evaluation.completed", "evaluation.failed"],
"secret": "webhook-secret"
}
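
A sketch of registering the webhook above from Python; the registration response schema is not specified here, so the example simply prints whatever comes back.

```python
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-api-key-here"}

resp = requests.post(
    f"{BASE_URL}/api/v1/webhooks",
    json={
        "url": "https://your-app.com/webhook",
        "events": ["evaluation.completed", "evaluation.failed"],
        "secret": "webhook-secret",   # used on your side to verify webhook deliveries
    },
    headers=HEADERS,
)
resp.raise_for_status()
print(resp.json())
```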

Deployment:

docker build -t ragas-service .
docker run -p 8000:8000 ragas-service

Environment variables:

DATABASE_URL=postgresql://user:pass@localhost/ragas
REDIS_URL=redis://localhost:6379
OPENAI_API_KEY=your-key
LANGCHAIN_API_KEY=your-key
ANTHROPIC_API_KEY=your-key
JWT_SECRET=your-secret
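
One way to pass these variables to the container, assuming they are kept in a local .env file; --env-file is standard docker run behavior.

```bash
# assumed: the variables above are stored in a local .env file
docker run -p 8000:8000 --env-file .env ragas-service
```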

Health check:

GET /health
Response:
{
"status": "healthy",
"services": {
"database": "connected",
"redis": "connected",
"llm_providers": "available"
},
"timestamp": "datetime"
}