ai-sdk-bench

ai-sdk-bench is an AI SDK benchmarking tool that tests AI agents with MCP (Model Context Protocol) integration through the Vercel AI Gateway. It automatically discovers and runs all tests in the tests/ directory and verifies LLM-generated Svelte components against their test suites.

Installation

To install dependencies:

./scripts/install.sh # installs the correct bun version
bun install

Setup

Configure your Vercel OIDC token using bun.secrets:

1. Install the Vercel CLI if you haven't already
2. Run bun run vercel:link and link the benchmark to a project that has AI Gateway enabled
3. Store your VERCEL_OIDC_TOKEN securely:

# Get your token from Vercel project settings
bun run secrets set VERCEL_OIDC_TOKEN your_token_here

Required API Keys

VERCEL_OIDC_TOKEN: Required for Vercel AI Gateway (stored in bun.secrets)
Other API keys (Anthropic, OpenAI, OpenRouter) are configured in the Vercel dashboard when using AI Gateway

Secrets Management

API keys are stored securely using your OS credential manager:

# Check if token is set
bun run secrets

# Set token
bun run secrets set VERCEL_OIDC_TOKEN your_token_here

# Get token
bun run secrets get VERCEL_OIDC_TOKEN

Security Benefits:

Encrypted storage using OS credential manager (Keychain, libsecret, Windows Credential Manager)
No plaintext API keys in files
User-level access control

Usage

To run the benchmark:

bun run index.ts

Interactive CLI

The benchmark features an interactive CLI that will prompt you for configuration:

1. Model Selection: Choose one or more models from the Vercel AI Gateway
   - Select from available models in your configured providers
   - Optionally add custom model IDs
   - Can test multiple models in a single run
2. MCP Integration: Choose your MCP configuration
   - No MCP Integration: Run without external tools
   - MCP over HTTP: Use HTTP-based MCP server (default: https://mcp.svelte.dev/mcp)
   - MCP over StdIO: Use local MCP server via command (default: npx -y @sveltejs/mcp)
   - Option to provide custom MCP server URL or command
3. TestComponent Tool: Enable/disable the testing tool for models
   - Allows models to run tests during component development
   - Enabled by default

Benchmark Workflow

After configuration, the benchmark will:

1. Discover all tests in the tests/ directory
2. For each selected model and test:
   - Run the AI agent with the test's prompt
   - Extract the generated Svelte component
   - Verify the component against the test suite
3. Generate a combined report with all results
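
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in index.ts: the TestCase and TestResult types, the runBenchmark function, and the generate/verify callbacks are hypothetical names standing in for the agent run and the test verification step.

```typescript
// Hypothetical types for illustration; the real names in index.ts may differ.
interface TestCase {
  name: string;
  prompt: string;
}

interface TestResult {
  model: string;
  test: string;
  passed: boolean;
}

// Stand-ins for the agent run and the test-suite verification step.
type GenerateComponent = (model: string, prompt: string) => Promise<string>;
type VerifyComponent = (test: TestCase, component: string) => Promise<boolean>;

async function runBenchmark(
  models: string[],
  tests: TestCase[],
  generate: GenerateComponent,
  verify: VerifyComponent,
): Promise<TestResult[]> {
  const results: TestResult[] = [];
  for (const model of models) {
    for (const test of tests) {
      // Run the agent with the test's prompt and take the generated component.
      const component = await generate(model, test.prompt);
      // Verify the component against the test suite.
      const passed = await verify(test, component);
      results.push({ model, test: test.name, passed });
    }
  }
  // The combined results feed a single report.
  return results;
}
```

Passing the two steps in as callbacks keeps the sketch independent of any particular model provider or test runner; with two models and three tests it would return six results.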

Results and Reports

Results are saved to the results/ directory with timestamped filenames:

results/result-2024-12-07-14-30-45.json - Full execution trace with all test results
results/result-2024-12-07-14-30-45.html - Interactive HTML report with expandable test sections
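
For reference, a file name in this format could be built as in the sketch below. The resultBaseName helper is hypothetical, and it assumes UTC; the tool itself may well use local time.

```typescript
// Hypothetical helper; assumes UTC, while the actual tool may use local time.
function resultBaseName(date: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const day = [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate()),
  ].join("-");
  const time = [
    pad(date.getUTCHours()),
    pad(date.getUTCMinutes()),
    pad(date.getUTCSeconds()),
  ].join("-");
  return `result-${day}-${time}`;
}

const base = resultBaseName(new Date(Date.UTC(2024, 11, 7, 14, 30, 45)));
console.log(`results/${base}.json`); // results/result-2024-12-07-14-30-45.json
console.log(`results/${base}.html`); // results/result-2024-12-07-14-30-45.html
```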

The HTML report includes:

Summary bar showing passed/failed/skipped counts
Expandable sections for each test
Step-by-step execution trace
Generated component code
Test verification results with pass/fail details
Token usage statistics
MCP status badge
Dark/light theme toggle

To regenerate an HTML report from a JSON file:

# Regenerate most recent result
bun run generate-report.ts

# Regenerate specific result
bun run generate-report.ts results/result-2024-12-07-14-30-45.json
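
How generate-report.ts picks the most recent result is not described here. One plausible approach, sketched below with a hypothetical latestResultFile helper, relies on the zero-padded timestamps sorting chronologically as plain strings.

```typescript
// Hypothetical helper: picks the newest result-*.json from a list of file names.
function latestResultFile(fileNames: string[]): string | undefined {
  // Matches names such as result-2024-12-07-14-30-45.json.
  const pattern = /^result-\d{4}(-\d{2}){5}\.json$/;
  const candidates = fileNames.filter((name) => pattern.test(name));
  // Zero-padded timestamps sort chronologically as plain strings.
  candidates.sort();
  return candidates[candidates.length - 1];
}
```

The function returns undefined when the list contains no matching JSON file, so a caller can report that there is nothing to regenerate.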

Test Structure

Each test in the tests/ directory should have:

tests/
  {test-name}/
    Reference.svelte  - Reference implementation (known-good solution)
    test.ts          - Vitest test file (imports "./Component.svelte")
    prompt.md        - Prompt for the AI agent
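
A quick way to check that a test directory follows this layout is sketched below. The file names come from the layout above; the missingTestFiles helper and its checking logic are assumptions for illustration, not part of the benchmark.

```typescript
// File names taken from the layout above; the check itself is an assumption.
const REQUIRED_FILES = ["Reference.svelte", "test.ts", "prompt.md"] as const;

// Returns the required files that are missing from a test directory's listing.
function missingTestFiles(filesInTestDir: string[]): string[] {
  const present = new Set(filesInTestDir);
  return REQUIRED_FILES.filter((file) => !present.has(file));
}

console.log(missingTestFiles(["prompt.md", "test.ts"])); // [ 'Reference.svelte' ]
```

The directory listing would typically come from a file system call; it is passed in as a plain array here to keep the sketch self-contained.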

The benchmark:

Reads the prompt from prompt.md
Asks the agent to generate a component
Writes the generated component to a temporary location
Runs the tests against the generated component
Reports pass/fail status

Verifying Reference Implementations

To verify that all reference implementations pass their tests:

bun run verify-tests

This copies each Reference.svelte to Component.svelte temporarily and runs the tests.

MCP Integration

The tool supports optional integration with MCP (Model Context Protocol) servers through the interactive CLI. When running the benchmark, you'll be prompted to choose:

- No MCP Integration: Run without external tools
- MCP over HTTP: Connect to an HTTP-based MCP server
  - Default: https://mcp.svelte.dev/mcp
  - Option to provide a custom URL
- MCP over StdIO: Connect to a local MCP server via command
  - Default: npx -y @sveltejs/mcp
  - Option to provide a custom command

MCP status, transport type, and server configuration are recorded in the JSON metadata and displayed as a badge in the HTML report.
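
The three options map naturally onto a small discriminated union. The sketch below is illustrative only: the McpConfig type, the mcpConfig and mcpBadge functions, and the badge wording are hypothetical, while the two default values are the ones listed above.

```typescript
// Hypothetical shape for the MCP choice; defaults are the ones listed above.
type McpConfig =
  | { transport: "none" }
  | { transport: "http"; url: string }
  | { transport: "stdio"; command: string };

const DEFAULT_MCP_URL = "https://mcp.svelte.dev/mcp";
const DEFAULT_MCP_COMMAND = "npx -y @sveltejs/mcp";

// Builds a config, falling back to the documented default when no custom value is given.
function mcpConfig(transport: McpConfig["transport"], custom?: string): McpConfig {
  switch (transport) {
    case "http":
      return { transport, url: custom ?? DEFAULT_MCP_URL };
    case "stdio":
      return { transport, command: custom ?? DEFAULT_MCP_COMMAND };
    case "none":
      return { transport };
  }
}

// Text for a status badge (the wording is illustrative).
function mcpBadge(config: McpConfig): string {
  switch (config.transport) {
    case "http":
      return `MCP: HTTP (${config.url})`;
    case "stdio":
      return `MCP: StdIO (${config.command})`;
    case "none":
      return "MCP: disabled";
  }
}
```

Because each variant carries only the field its transport needs, the same object can be serialized into result metadata and used to render a badge without extra checks.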

Exit Codes

0: All tests passed
1: One or more tests failed
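
In code, this rule reduces to a single check over the collected outcomes. The Outcome shape and the exitCodeFor function below are hypothetical, and how the tool treats skipped tests is not covered by this sketch.

```typescript
// Hypothetical result shape; only the pass/fail flag matters here.
interface Outcome {
  test: string;
  passed: boolean;
}

// 0 when every test passed, 1 when at least one failed.
function exitCodeFor(outcomes: Outcome[]): 0 | 1 {
  return outcomes.every((outcome) => outcome.passed) ? 0 : 1;
}
```

Note that an empty list yields 0 in this sketch, since there is no failing test to report. The non-zero code on failure is what lets a CI job fail when a benchmark run does.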

Documentation

See AGENTS.md for detailed documentation on:

Architecture and components
Environment variables and model configuration
MCP integration details
Development commands
Multi-test result format

This project was created using bun init in bun v1.3.3. Bun is a fast all-in-one JavaScript runtime.
