Deploy with Cloud Run

You can deploy Genkit flows as HTTPS endpoints using Cloud Run. Cloud Run has several deployment options, including container-based deployment; this page explains how to deploy your flows directly from code.

Before you begin

Install the Google Cloud CLI.
You should be familiar with Genkit’s concept of flows, and how to write them. This page assumes that you already have flows that you want to deploy.
Prior experience with Google Cloud and Cloud Run is helpful but not required.

1. Set up a Google Cloud project

If you don’t already have a Google Cloud project set up, follow these steps:

Create a new Google Cloud project using the Cloud console or choose an existing one.
Link the project to a billing account, which is required for Cloud Run.
Configure the Google Cloud CLI to use your project:
```
gcloud init
```

2. Prepare your Node project for deployment

For your flows to be deployable, you will need to make some small changes to your project code:

Add start and build scripts to package.json

When deploying a Node.js project to Cloud Run, the deployment tools expect your project to have a start script and, optionally, a build script. For a typical TypeScript project, the following scripts are usually adequate:

"scripts": {
  "start": "node lib/index.js",
  "build": "tsc"
},

Add code to configure and start the flow server

In the file that’s run by your start script, add a call to startFlowServer. This method will start an Express server set up to serve your flows as web endpoints.

When you make the call, specify the flows you want to serve:

import { startFlowServer } from '@genkit-ai/express';

startFlowServer({
  flows: [menuSuggestionFlow],
});

There are also some optional parameters you can specify:

port: the network port to listen on. If unspecified, the server listens on the port defined in the PORT environment variable, and if PORT is not set, defaults to 3400.
cors: the flow server’s CORS policy. If you will be accessing these endpoints from a web application, you likely need to specify this.
pathPrefix: an optional path prefix to add before your flow endpoints.
jsonParserOptions: options to pass to Express’s JSON body parser

Optional: Define an authorization policy

All deployed flows should require some form of authorization; otherwise, anyone could invoke your potentially expensive generative AI flows.

When you deploy your flows with Cloud Run, you have two options for authorization:

Cloud IAM-based authorization: Use Google Cloud’s native access management facilities to gate access to your endpoints. For information on providing these credentials, see Authentication in the Cloud Run docs.
Authorization policy defined in code: Use the authorization policy feature of the Genkit express plugin to verify authorization info using custom code. This is often, but not necessarily, token-based authorization.

If you want to define an authorization policy in code, pass the authPolicy option to expressHandler when you register the flow’s route:

// middleware for handling auth tokens in headers.
const authMiddleware = async (req, resp, next) => {
  // parse auth headers and convert to auth object.
  (req as RequestWithAuth).auth = {
    user: await verifyAuthToken(req.header('authorization')),
  };
  next();
};

app.post(
  '/simpleFlow',
  authMiddleware,
  expressHandler(simpleFlow, {
    authPolicy: ({ auth }) => {
      if (!auth.user) {
        throw new Error('not authorized');
      }
    },
  }),
);

The auth parameter of the authorization policy comes from the auth property of the request object. You typically set this property using Express middleware. See Authorization and integrity.

Refer to the Express plugin documentation for more details.

Make API credentials available to deployed flows

Once deployed, your flows need some way to authenticate with any remote services they rely on. Most flows will at a minimum need credentials for accessing the model API service they use.

For this example, do one of the following, depending on the model provider you chose:

Generate an API key for the Gemini API using Google AI Studio.
Make the API key available in the Cloud Run environment:
1. In the Cloud console, enable the Secret Manager API.
2. On the Secret Manager page, create a new secret containing your API key.
3. After you create the secret, on the same page, grant your default compute service account access to the secret with the Secret Manager Secret Accessor role. (You can look up the name of the default compute service account on the IAM page.)
In a later step, when you deploy your service, you will need to reference the name of this secret.

The only secret you need to set up for this tutorial is for the model provider, but in general, you must do something similar for each service your flow uses.

3. Deploy flows to Cloud Run

After you’ve prepared your project for deployment, you can deploy it using the gcloud tool.

Gemini (Google AI)

gcloud run deploy --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest

Gemini (Vertex AI)

gcloud run deploy

The deployment tool will prompt you for any information it requires.

When asked if you want to allow unauthenticated invocations:

Answer Y if you’re not using IAM and have instead defined an authorization policy in code.
Answer N to configure your service to require IAM credentials.

Optional: Try the deployed flow

After deployment finishes, the tool will print the service URL. You can test it with curl:

curl -X POST https://<service-url>/menuSuggestionFlow \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" -d '{"data": "banana"}'
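
The same request can also be sent from a script. The following is a minimal sketch in Python using only the standard library. The helper names build_flow_request and call_flow are hypothetical, and the {"data": ...} envelope simply mirrors the curl command above. The token argument is assumed to be an identity token such as the output of gcloud auth print-identity-token.

```python
import json
import urllib.request
from typing import Any, Optional


def build_flow_request(
    service_url: str,
    flow_name: str,
    data: Any,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request that wraps the flow input in a {"data": ...} envelope."""
    url = f"{service_url.rstrip('/')}/{flow_name}"
    body = json.dumps({"data": data}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when the service requires IAM credentials.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call_flow(
    service_url: str,
    flow_name: str,
    data: Any,
    token: Optional[str] = None,
    timeout: float = 60.0,
) -> Any:
    """Send the request and return the response body decoded from JSON."""
    request = build_flow_request(service_url, flow_name, data, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from the network call keeps the envelope logic easy to check without a deployed service.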

You can deploy Genkit flows as web services using Cloud Run. As an example, this page walks you through deploying the default sample flow.

Install the Google Cloud CLI if you haven’t already.
Create a new Google Cloud project using the Cloud console or choose an existing one. The project must be linked to a billing account.

After you create or choose a project, configure the Google Cloud CLI to use it:
```
gcloud auth login

gcloud init
```
Create a directory for the Genkit sample project:
```
mkdir -p ~/tmp/genkit-cloud-project

cd ~/tmp/genkit-cloud-project
```
If you’re going to use an IDE, open it to this directory.

Initialize a Go module in your project directory:

go mod init example/cloudrun

go get github.com/firebase/genkit/go

Create a sample app using Genkit:

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"

    "github.com/firebase/genkit/go/ai"
    "github.com/firebase/genkit/go/genkit"
    "github.com/firebase/genkit/go/plugins/googlegenai"
    "github.com/firebase/genkit/go/plugins/server"
)

func main() {
    ctx := context.Background()

    // Initialize Genkit with the Google AI plugin and Gemini 2.5 Flash.
    // Alternatively, use &googlegenai.VertexAI{} and "vertexai/gemini-2.5-flash"
    // to use Vertex AI as the provider instead.
    g := genkit.Init(ctx,
        genkit.WithPlugins(&googlegenai.GoogleAI{}),
        genkit.WithDefaultModel("googleai/gemini-2.5-flash"),
    )

    flow := genkit.DefineFlow(g, "jokesFlow", func(ctx context.Context, topic string) (string, error) {
        resp, err := genkit.Generate(ctx, g,
            ai.WithPrompt(`Tell a short joke about %s. Be creative!`, topic),
        )
        if err != nil {
            return "", fmt.Errorf("failed to generate joke: %w", err)
        }

        return resp.Text(), nil
    })

    mux := http.NewServeMux()
    mux.HandleFunc("POST /jokesFlow", genkit.Handler(flow))
    log.Fatal(server.Start(ctx, "0.0.0.0:"+os.Getenv("PORT"), mux))
}

Make API credentials available to your deployed service. Choose which credentials you need based on the provider you selected in the sample above:

Gemini (Google AI)
1. Make sure Google AI is available in your region.
2. Generate an API key for the Gemini API using Google AI Studio.
3. Make the API key available in the Cloud Run environment:
  1. In the Cloud console, enable the Secret Manager API.
  2. On the Secret Manager page, create a new secret containing your API key.
  3. After you create the secret, on the same page, grant your default compute service account access to the secret with the Secret Manager Secret Accessor role. (You can look up the name of the default compute service account on the IAM page.)
  In a later step, when you deploy your service, you will need to reference the name of this secret.
Gemini (Vertex AI)
1. In the Cloud console, enable the Vertex AI API for your project.
2. On the IAM page, ensure that the default compute service account is granted the Vertex AI User role.
The only secret you need to set up for this tutorial is for the model provider, but in general, you must do something similar for each service your flow uses.
Optional: Try your flow in the developer UI:
1. Set up your local environment for the model provider you chose:
  
  Gemini (Google AI)
```
export GEMINI_API_KEY=<your API key>
```
  Gemini (Vertex AI)
```
export GOOGLE_CLOUD_PROJECT=<your project ID>

export GOOGLE_CLOUD_LOCATION=us-central1

gcloud auth application-default login
```
2. Start the UI:
```
genkit start -- go run .
```
3. In the developer UI (http://localhost:4000/), run the flow:
  1. Click jokesFlow.
  2. On the Input JSON tab, provide a subject for the model:
    "bananas"
  3. Click Run.
If everything’s working as expected so far, you can build and deploy the flow:

Gemini (Google AI)
```
gcloud run deploy --port 3400 \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest
```
Gemini (Vertex AI)
```
gcloud run deploy --port 3400 \
  --set-env-vars GOOGLE_CLOUD_PROJECT=<your-gcloud-project> \
  --set-env-vars GOOGLE_CLOUD_LOCATION=us-central1
```
(GOOGLE_CLOUD_LOCATION configures the Vertex API region you want to use.)

Choose N when asked if you want to allow unauthenticated invocations. Answering N will configure your service to require IAM credentials. See Authentication in the Cloud Run docs for information on providing these credentials.

After deployment finishes, the tool will print the service URL. You can test it with curl:

curl -X POST https://<service-url>/jokesFlow \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" -d '{"data": "bananas"}'

You can deploy Genkit flows as HTTPS endpoints using Cloud Run. This page walks you through deploying a FastAPI-based Genkit application to Cloud Run with automatic scaling and containerization.

Before you begin

Install the Google Cloud CLI.
You should be familiar with Genkit’s concept of flows and how to write them.
Prior experience with Google Cloud and Cloud Run is helpful but not required.

1. Set up a Google Cloud project

If you don’t already have a Google Cloud project set up, follow these steps:

Create a new Google Cloud project using the Cloud console or choose an existing one.
Link the project to a billing account, which is required for Cloud Run.
Configure the Google Cloud CLI to use your project:
```
gcloud init
```

2. Prepare your Python project for deployment

Initialize your project with uv

Create a new project or navigate to your existing project:

# Create project directory
mkdir genkit-cloudrun
cd genkit-cloudrun

# Initialize with uv
uv init

# Add dependencies
uv add genkit genkit-plugin-google-genai fastapi uvicorn slowapi

Create your FastAPI application with Genkit

Genkit flows are async functions, so you can await them directly from FastAPI route handlers. Create a main.py file:

import os
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

# Initialize Genkit
ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)

# Define input/output schemas
class JokeRequest(BaseModel):
    """Request schema for joke generation."""
    topic: str = Field(description="Topic for the joke", min_length=1)

class JokeResponse(BaseModel):
    """Response schema for joke generation."""
    joke: str
    topic: str

class SummaryRequest(BaseModel):
    """Request schema for text summarization."""
    text: str = Field(description="Text to summarize", min_length=10)

# Application lifespan for startup/shutdown
@asynccontextmanager
async def lifespan(app: FastAPI):
    """Manage application lifespan."""
    print("🚀 Starting Genkit Cloud Run service")
    yield
    print("👋 Shutting down Genkit Cloud Run service")

# Create FastAPI app
app = FastAPI(
    title="Genkit Cloud Run Service",
    description="AI-powered API with Genkit and FastAPI",
    version="1.0.0",
    lifespan=lifespan,
)

# Root and health check endpoints
@app.get("/")
async def root():
    """Root endpoint with service info."""
    return {
        "service": "Genkit Cloud Run",
        "status": "running",
        "docs": "/docs"
    }

@app.get("/health")
async def health_check():
    """Health check endpoint for Cloud Run."""
    return {"status": "healthy"}

# Define Genkit flow
@ai.flow()
async def joke_flow(topic: str) -> str:
    """Generate a joke about the given topic.

    Args:
        topic: The topic for the joke.

    Returns:
        A funny joke about the topic.
    """
    response = await ai.generate(
        prompt=f'Tell a short, funny joke about {topic}. Be creative!',
    )
    return response.text

# FastAPI endpoint that uses the flow
@app.post("/joke", response_model=JokeResponse)
async def generate_joke(request: JokeRequest):
    """Generate a joke via REST API.

    Args:
        request: The joke request with topic.

    Returns:
        The generated joke.
    """
    try:
        joke = await joke_flow(request.topic)
        return JokeResponse(joke=joke, topic=request.topic)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to generate joke: {str(e)}")

@ai.flow()
async def summarize_flow(text: str) -> str:
    """Summarize the provided text.

    Args:
        text: The text to summarize.

    Returns:
        A concise summary.
    """
    response = await ai.generate(
        prompt=f'Summarize the following text in 2-3 sentences:\n\n{text}',
    )
    return response.text

@app.post("/summarize")
async def summarize_text(request: SummaryRequest):
    """Summarize text via REST API.

    Args:
        request: The text to summarize.

    Returns:
        The summary.
    """
    try:
        summary = await summarize_flow(request.text)
        return {"summary": summary}
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to summarize: {str(e)}")

if __name__ == "__main__":
    import uvicorn
    port = int(os.environ.get("PORT", 8080))
    uvicorn.run(app, host="0.0.0.0", port=port)

Optional: Add authorization

All deployed flows should require some form of authorization. You have two options:

Cloud IAM-based authorization: Use Google Cloud’s native access management to gate access to your endpoints. See Authentication in the Cloud Run docs.

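Custom authorization with FastAPI: Use FastAPI’s dependency injection to verify a JWT on each request. The example below uses the PyJWT package (import jwt), which is not in the dependency list above; add it with uv add pyjwt: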

from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import jwt

security = HTTPBearer()

async def verify_token(
    credentials: HTTPAuthorizationCredentials = Depends(security)
) -> dict:
    """Verify JWT token and return user info.

    Args:
        credentials: HTTP authorization credentials.

    Returns:
        User information from token.

    Raises:
        HTTPException: If token is invalid.
    """
    try:
        token = credentials.credentials
        # Replace with your actual token verification
        payload = jwt.decode(token, "your-secret-key", algorithms=["HS256"])
        return {
            "user_id": payload.get("user_id"),
            "email": payload.get("email"),
        }
    except jwt.InvalidTokenError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid authentication credentials",
            headers={"WWW-Authenticate": "Bearer"},
        )

@app.post("/protected-joke", response_model=JokeResponse)
async def protected_generate_joke(
    request: JokeRequest,
    user: dict = Depends(verify_token)
):
    """Generate a joke with authentication required.

    Args:
        request: The joke request.
        user: Authenticated user information.

    Returns:
        The generated joke.
    """
    joke = await joke_flow(request.topic)
    return JokeResponse(joke=joke, topic=request.topic)
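
To make clearer what the token verification step checks, here is a rough standard-library sketch of HS256 verification: recompute the HMAC over the header and payload, compare it in constant time, then check expiry. The function names sign_hs256 and verify_hs256 are hypothetical. This is an illustration of the idea under simplified assumptions, not a replacement for a maintained JWT library in production.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def sign_hs256(payload: dict, secret: str) -> str:
    """Create a token for local testing."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(payload).encode())
    signing_input = f"{header}.{body}"
    return f"{signing_input}.{_b64url_encode(_signature(signing_input, secret))}"


def verify_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature and expiry check out, else raise ValueError."""
    try:
        header_b64, body_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    # Reject tokens that declare any algorithm other than the one we expect.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _signature(f"{header_b64}.{body_b64}", secret)
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(body_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

In a deployed service, the signing secret would typically come from Secret Manager rather than a literal in the source.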

Create a Dockerfile for Cloud Run

Create a Dockerfile for containerized deployment:

FROM python:3.11-slim

WORKDIR /app

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Copy dependency files
COPY pyproject.toml uv.lock* ./

# Install dependencies
RUN uv sync --frozen --no-dev

# Copy application code
COPY . .

# Expose port
EXPOSE 8080

# Run with uvicorn
CMD ["uv", "run", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

Create .dockerignore

Create a .dockerignore file to exclude unnecessary files:

__pycache__
*.pyc
*.pyo
*.pyd
.Python
.venv
.uv
.git
.gitignore
*.md
.DS_Store

Make API credentials available to deployed flows

Gemini (Google AI)

Generate an API key for the Gemini API using Google AI Studio.
Store the API key in Secret Manager:
1. Enable the Secret Manager API.
2. Create a new secret containing your API key on the Secret Manager page.
3. Grant your default compute service account the Secret Manager Secret Accessor role.

Gemini (Vertex AI)

Enable the Vertex AI API for your project.
On the IAM page, ensure the Default compute service account has the Vertex AI User role.
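
A missing credential otherwise only surfaces on the first model call, so it can help to fail fast at startup. The sketch below assumes the environment variable names used by the deploy commands in this guide (GEMINI_API_KEY for Google AI, GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION for Vertex AI). The function name detect_model_provider is hypothetical.

```python
import os
from typing import Mapping, Optional


def detect_model_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "googleai" or "vertexai" depending on which settings are present.

    Raises RuntimeError when neither provider is configured.
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return "googleai"
    if env.get("GOOGLE_CLOUD_PROJECT") and env.get("GOOGLE_CLOUD_LOCATION"):
        return "vertexai"
    raise RuntimeError(
        "No model credentials found: set GEMINI_API_KEY (Google AI) or "
        "GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION (Vertex AI)."
    )
```

Calling this once during application startup, for example inside the lifespan handler, makes a misconfigured deployment fail immediately with a readable message.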

3. Deploy to Cloud Run

Deploy your application using the gcloud tool. Cloud Run will automatically build your container using the Dockerfile.

Gemini (Google AI)

gcloud run deploy genkit-service \
  --source . \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest \
  --allow-unauthenticated

Gemini (Vertex AI)

gcloud run deploy genkit-service \
  --source . \
  --set-env-vars GOOGLE_CLOUD_PROJECT=<your-project-id> \
  --set-env-vars GOOGLE_CLOUD_LOCATION=us-central1 \
  --allow-unauthenticated

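The commands above pass --allow-unauthenticated, so the service accepts requests without IAM credentials and gcloud does not prompt you. Choose the setting that matches your authorization approach:

Keep --allow-unauthenticated only if you’ve implemented custom authorization in code.
To require IAM credentials, omit the flag; if gcloud then asks whether to allow unauthenticated invocations, answer N.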

Alternative: Deploy with existing container

If you prefer to build and push the container separately:

# Build and push to Artifact Registry
gcloud builds submit --tag gcr.io/<your-project-id>/genkit-service

# Deploy the container
gcloud run deploy genkit-service \
  --image gcr.io/<your-project-id>/genkit-service \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest

4. Test the deployed flow

After deployment, the tool will print the service URL. Test your endpoints:

# Save the service URL
SERVICE_URL="https://<service-url>"

# Test health endpoint
curl $SERVICE_URL/health

# Test joke generation
curl -X POST $SERVICE_URL/joke \
  -H "Content-Type: application/json" \
  -d '{"topic": "programming"}'

# With IAM authentication (if required)
curl -X POST $SERVICE_URL/joke \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"topic": "artificial intelligence"}'

# Test summarization
curl -X POST $SERVICE_URL/summarize \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"text": "Cloud Run is a fully managed compute platform that automatically scales your stateless containers. It abstracts away infrastructure management so you can focus on building applications."}'

5. View automatic API documentation

FastAPI automatically generates interactive API documentation. After deployment, visit:

Swagger UI: https://<service-url>/docs
ReDoc: https://<service-url>/redoc

These provide interactive documentation where you can test your endpoints directly in the browser.

Advanced Configuration

Environment Variables

Set additional environment variables for your deployment:

gcloud run deploy genkit-service \
  --source . \
  --set-env-vars LOG_LEVEL=info \
  --set-env-vars MAX_WORKERS=4 \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest

Resource Limits

Configure CPU and memory allocation:

gcloud run deploy genkit-service \
  --source . \
  --cpu 2 \
  --memory 2Gi \
  --max-instances 10 \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest

Custom Domains

Add a custom domain to your Cloud Run service:

# Map your domain
gcloud run domain-mappings create \
  --service genkit-service \
  --domain api.yourdomain.com

Monitoring and Logging

View logs in Cloud Console or using gcloud:

# Stream logs (tail is available through the beta command group)
gcloud beta run services logs tail genkit-service

# View recent logs
gcloud run services logs read genkit-service --limit 50

Production Best Practices

1. Use Structured Logging

import logging
import json

logging.basicConfig(
    level=logging.INFO,
    format='%(message)s'
)
logger = logging.getLogger(__name__)

@app.post("/joke")
async def generate_joke(request: JokeRequest):
    logger.info(json.dumps({
        "event": "joke_request",
        "topic": request.topic
    }))
    joke = await joke_flow(request.topic)
    logger.info(json.dumps({
        "event": "joke_generated",
        "topic": request.topic,
        "length": len(joke)
    }))
    return JokeResponse(joke=joke, topic=request.topic)
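
Cloud Run forwards stdout and stderr to Cloud Logging, and JSON lines are parsed as structured entries in which a severity field sets the log level. As an alternative to calling json.dumps at every log site, a formatter can do this once. The sketch below is one way to do it. JsonFormatter, configure_logging, and the extra={"fields": ...} convention are hypothetical names chosen for this example.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each record as a single JSON line with a severity field."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Hypothetical convention: callers pass extra={"fields": {...}}.
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            entry.update(fields)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


def configure_logging(level: int = logging.INFO) -> None:
    """Send all log records to stdout as JSON lines."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=level, handlers=[handler], force=True)
```

With this in place, a handler can log with logger.info("joke generated", extra={"fields": {"topic": topic}}) and the topic appears as a top-level field in the log entry.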

2. Add Request Timing Middleware

from fastapi import Request
import time

@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    """Add processing time to response headers."""
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    response.headers["X-Process-Time"] = str(process_time)
    return response

3. Implement Rate Limiting

Use Cloud Armor or implement rate limiting in your application:

from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/joke")
@limiter.limit("10/minute")
async def generate_joke(request: Request, joke_request: JokeRequest):
    joke = await joke_flow(joke_request.topic)
    return JokeResponse(joke=joke, topic=joke_request.topic)
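
To illustrate what a limit such as "10/minute" means, here is a minimal in-memory sliding-window limiter written with the standard library. It is a sketch of the general technique, not slowapi's implementation, and the class name SlidingWindowLimiter is hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(
        self,
        limit: int,
        window: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record a call for `key` and report whether it is within the limit."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Note that in-memory counters like these are kept per container instance. When Cloud Run scales a service to several instances, each one enforces its own limit, so a global limit would need a shared store or an edge service such as Cloud Armor.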

4. Enable CORS for Web Applications

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["POST", "GET"],
    allow_headers=["*"],
)

Next Steps

Learn about FastAPI integration for more advanced patterns
Explore authorization options for securing your endpoints
Set up observability to monitor your deployed flows