Execute a Cloud Run job using Workflows

Workflows enables you to execute Cloud Run jobs as part of a workflow to perform more complex data processing or orchestrate a system of existing jobs.

This tutorial demonstrates how to use Workflows to execute a Cloud Run job that processes data passed as environment variables to the job, in response to an event from Cloud Storage.

You can also store the event data in a Cloud Storage bucket, which lets you encrypt the data using customer-managed encryption keys. For more information, see Execute a Cloud Run job that processes event data saved in Cloud Storage.

Create a Cloud Run job

This tutorial uses a sample Cloud Run job from GitHub. The job reads data from an input file in Cloud Storage, and performs some arbitrary processing for each line in the file.
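
When a job runs several tasks in parallel, each task usually needs to know which part of the input belongs to it. Cloud Run sets the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables for each task. The following is a minimal sketch of one way to split the lines of a file across tasks. It is not the sample's actual code, and the function name lines_for_task is hypothetical.

```python
import os


def lines_for_task(lines, task_index, task_count):
    """Return the contiguous slice of lines that one task should process.

    Hypothetical helper: splits the input into task_count chunks of
    (almost) equal size and returns the chunk for task_index.
    """
    chunk = -(-len(lines) // task_count)  # ceiling division
    start = task_index * chunk
    return lines[start:start + chunk]


# Cloud Run sets these variables for each task of a job execution.
# The defaults let the sketch run locally as a single task.
task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))

example_lines = [f"line-{i}" for i in range(25)]
my_lines = lines_for_task(example_lines, task_index, task_count)
print(f"task {task_index} of {task_count} handles {len(my_lines)} lines")
```

With this scheme the last tasks can receive fewer lines than the others, or none at all, when the line count does not divide evenly.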

  1. Get the sample code by cloning the sample app repository to your local machine:

    git clone https://github.com/GoogleCloudPlatform/jobs-demos.git

    Alternatively, you can download the sample as a ZIP file and extract it.

  2. Change to the directory that contains the sample code:

    cd jobs-demos/parallel-processing

  3. Create a Cloud Storage bucket to store the input file. Writing a file to this bucket triggers an event:

    Console

    1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    2. Click Create.
    3. On the Create a bucket page, enter a name for your bucket:
      input-PROJECT_ID
      Replace PROJECT_ID with the ID of your Google Cloud project.
    4. Retain the other defaults.
    5. Click Create.

    gcloud

    Run the gcloud storage buckets create command:

    gcloud storage buckets create gs://input-PROJECT_ID

    If the request is successful, the command returns the following message:

    Creating gs://input-PROJECT_ID/...

    Terraform

    To create a Cloud Storage bucket, use the google_storage_bucket resource and modify your main.tf file as shown in the following sample.

    To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

    Note that in a typical Terraform workflow, you apply the entire plan at once. However, for the purposes of this tutorial, you can target a specific resource. For example:

    terraform apply -target="random_id.bucket_name_suffix"
    and
    terraform apply -target="google_storage_bucket.default"

    # Cloud Storage bucket names must be globally unique
    resource "random_id" "bucket_name_suffix" {
      byte_length = 4
    }
    
    # Create a Cloud Storage bucket
    resource "google_storage_bucket" "default" {
      name                        = "input-${data.google_project.project.name}-${random_id.bucket_name_suffix.hex}"
      location                    = "us-central1"
      storage_class               = "STANDARD"
      force_destroy               = false
      uniform_bucket_level_access = true
    }

  4. Create an Artifact Registry standard repository where you can store your container image:

    Console

    1. In the Google Cloud console, go to the Artifact Registry Repositories page:

    2. Click Create repository.

    3. Enter a name for the repository—for example, my-repo. For each repository location in a project, repository names must be unique.

    4. Retain the default format, which should be Docker.

    5. Retain the default mode, which should be Standard.

    6. For the region, select us-central1 (Iowa).

    7. Retain all the other defaults.

    8. Click Create.

    gcloud

    Run the gcloud artifacts repositories create command:

    gcloud artifacts repositories create REPOSITORY \
        --repository-format=docker \
        --location=us-central1

    Replace REPOSITORY with a unique name for the repository—for example, my-repo. For each repository location in a project, repository names must be unique.

    Terraform

    To create an Artifact Registry repository, use the google_artifact_registry_repository resource and modify your main.tf file as shown in the following sample.

    Note that in a typical Terraform workflow, you apply the entire plan at once. However, for the purposes of this tutorial, you can target a specific resource. For example:

    terraform apply -target="google_artifact_registry_repository.default"

    # Create an Artifact Registry repository
    resource "google_artifact_registry_repository" "default" {
      location      = "us-central1"
      repository_id = "my-repo"
      format        = "docker"
    }

  5. Build the container image using a default Google Cloud buildpack:

    export SERVICE_NAME=parallel-job
    gcloud builds submit \
        --pack image=us-central1-docker.pkg.dev/PROJECT_ID/REPOSITORY/${SERVICE_NAME}

    Replace PROJECT_ID with the ID of your Google Cloud project and REPOSITORY with the name of your Artifact Registry repository.

    It can take a couple of minutes for the build to complete.

  6. Create a Cloud Run job that deploys the container image:

    Console

    1. In the Google Cloud console, go to the Cloud Run page:

    2. Click Create job to display the Create job form.

      1. In the form, select us-central1-docker.pkg.dev/PROJECT_ID/REPOSITORY/parallel-job:latest as the Artifact Registry container image URL.
      2. Optional: For the job name, enter parallel-job.
      3. Optional: For the region, select us-central1 (Iowa).
      4. For the number of tasks that you want to run in the job, enter 10. All of the tasks must succeed for the job to succeed. By default, the tasks execute in parallel.
    3. Expand the Container, Variables & Secrets, Connections, Security section and retain all the defaults with the exception of the following settings:

      1. Click the General tab.

        1. For the container command, enter python.
        2. For the container argument, enter process.py.
      2. Click the Variables & Secrets tab.

        1. Click Add variable, and enter INPUT_BUCKET for the name and input-PROJECT_ID for the value.
        2. Click Add variable, and enter INPUT_FILE for the name and input_file.txt for the value.
    4. To create the job, click Create.

    gcloud

    1. Set the default Cloud Run region:

      gcloud config set run/region us-central1
    2. Create the Cloud Run job:

      gcloud run jobs create parallel-job \
          --image us-central1-docker.pkg.dev/PROJECT_ID/REPOSITORY/parallel-job:latest \
          --command python \
          --args process.py \
          --tasks 10 \
          --set-env-vars=INPUT_BUCKET=input-PROJECT_ID,INPUT_FILE=input_file.txt

      Note that if you don't specify an image tag, Artifact Registry looks for the image with the default latest tag.

      For a full list of available options when creating a job, refer to the gcloud run jobs create command line documentation.

      Once the job is created, you should see a message that indicates success.

    Terraform

    To create a Cloud Run job, use the google_cloud_run_v2_job resource and modify your main.tf file as shown in the following sample.

    Note that in a typical Terraform workflow, you apply the entire plan at once. However, for the purposes of this tutorial, you can target a specific resource. For example:

    terraform apply -target="google_cloud_run_v2_job.default"

    # Create a Cloud Run job
    resource "google_cloud_run_v2_job" "default" {
      name     = "parallel-job"
      location = "us-central1"
    
      template {
        task_count = 10
        template {
          containers {
            image   = "us-central1-docker.pkg.dev/${data.google_project.project.name}/${google_artifact_registry_repository.default.repository_id}/parallel-job:latest"
            command = ["python"]
            args    = ["process.py"]
            env {
              name  = "INPUT_BUCKET"
              value = google_storage_bucket.default.name
            }
            env {
              name  = "INPUT_FILE"
              value = "input_file.txt"
            }
          }
        }
      }
    }
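
The job created above receives its input location through the INPUT_BUCKET and INPUT_FILE environment variables. The following is a minimal sketch of how a container entrypoint might read and validate those variables before doing any work. It is an illustration only, not the code in the sample repository; JobConfig and load_config are hypothetical names.

```python
import os
from typing import Mapping, NamedTuple, Optional


class JobConfig(NamedTuple):
    """Input location for one job execution (hypothetical type)."""
    bucket: str
    file_name: str


def load_config(env: Optional[Mapping[str, str]] = None) -> JobConfig:
    """Read INPUT_BUCKET and INPUT_FILE, failing fast if either is unset."""
    if env is None:
        env = os.environ
    missing = [key for key in ("INPUT_BUCKET", "INPUT_FILE") if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return JobConfig(bucket=env["INPUT_BUCKET"], file_name=env["INPUT_FILE"])


# Example with an explicit mapping instead of the real process environment.
config = load_config({
    "INPUT_BUCKET": "input-my-project",  # placeholder project ID
    "INPUT_FILE": "input_file.txt",
})
print(f"gs://{config.bucket}/{config.file_name}")
```

Failing fast on a missing variable makes a misconfigured execution show up as a failed task rather than as a task that silently processes nothing.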

Deploy a workflow that executes the Cloud Run job

Define and deploy a workflow that executes the Cloud Run job you just created. A workflow definition is made up of a series of steps described using the Workflows syntax.

Console

  1. In the Google Cloud console, go to the Workflows page:

  2. Click Create.

  3. Enter a name for the new workflow, such as cloud-run-job-workflow.

  4. For the region, select us-central1 (Iowa).

  5. In the Service account field, select the service account you created earlier.

    The service account serves as the workflow's identity. You should have already granted the Cloud Run Admin role to the service account so that the workflow can execute the Cloud Run job.

  6. Click Next.

  7. In the workflow editor, enter the following definition for your workflow:

    main:
        params: [event]
        steps:
            - init:
                assign:
                    - project_id: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                    - event_bucket: ${event.data.bucket}
                    - event_file: ${event.data.name}
                    - target_bucket: ${"input-" + project_id}
                    - job_name: parallel-job
                    - job_location: us-central1
            - check_input_file:
                switch:
                    - condition: ${event_bucket == target_bucket}
                      next: run_job
                    - condition: true
                      next: end
            - run_job:
                call: googleapis.run.v1.namespaces.jobs.run
                args:
                    name: ${"namespaces/" + project_id + "/jobs/" + job_name}
                    location: ${job_location}
                    body:
                        overrides:
                            containerOverrides:
                                env:
                                    - name: INPUT_BUCKET
                                      value: ${event_bucket}
                                    - name: INPUT_FILE
                                      value: ${event_file}
                result: job_execution
            - finish:
                return: ${job_execution}
  8. Click Deploy.

gcloud

  1. Create a source code file for your workflow:

    touch cloud-run-job-workflow.yaml
  2. Copy the following workflow definition to your source code file:

    main:
        params: [event]
        steps:
            - init:
                assign:
                    - project_id: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                    - event_bucket: ${event.data.bucket}
                    - event_file: ${event.data.name}
                    - target_bucket: ${"input-" + project_id}
                    - job_name: parallel-job
                    - job_location: us-central1
            - check_input_file:
                switch:
                    - condition: ${event_bucket == target_bucket}
                      next: run_job
                    - condition: true
                      next: end
            - run_job:
                call: googleapis.run.v1.namespaces.jobs.run
                args:
                    name: ${"namespaces/" + project_id + "/jobs/" + job_name}
                    location: ${job_location}
                    body:
                        overrides:
                            containerOverrides:
                                env:
                                    - name: INPUT_BUCKET
                                      value: ${event_bucket}
                                    - name: INPUT_FILE
                                      value: ${event_file}
                result: job_execution
            - finish:
                return: ${job_execution}
  3. Deploy the workflow by entering the following command:

    gcloud workflows deploy cloud-run-job-workflow \
        --location=us-central1 \
        --source=cloud-run-job-workflow.yaml \
        --service-account=SERVICE_ACCOUNT_NAME@PROJECT_ID.

    Replace the following:

    • SERVICE_ACCOUNT_NAME: the name of the service account you created earlier
    • PROJECT_ID: the ID of your Google Cloud project

    The service account serves as the workflow's identity. You should have already granted the roles/run.admin role to the service account so that the workflow can execute the Cloud Run job.

Terraform

To create a workflow, use the google_workflows_workflow resource and modify your main.tf file as shown in the following sample.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Note that in a typical Terraform workflow, you apply the entire plan at once. However, for the purposes of this tutorial, you can target a specific resource. For example:

terraform apply -target="google_workflows_workflow.default"

# Create a workflow
resource "google_workflows_workflow" "default" {
  name        = "cloud-run-job-workflow"
  region      = "us-central1"
  description = "Workflow that routes a Cloud Storage event and executes a Cloud Run job"

  deletion_protection = false # set to "true" in production

  # Note that $$ is needed for Terraform
  source_contents = <<EOF
  main:
      params: [event]
      steps:
          - init:
              assign:
                  - project_id: $${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                  - event_bucket: $${event.data.bucket}
                  - event_file: $${event.data.name}
                  - target_bucket: "${google_storage_bucket.default.name}"
                  - job_name: parallel-job
                  - job_location: us-central1
          - check_input_file:
              switch:
                  - condition: $${event_bucket == target_bucket}
                    next: run_job
                  - condition: true
                    next: end
          - run_job:
              call: googleapis.run.v1.namespaces.jobs.run
              args:
                  name: $${"namespaces/" + project_id + "/jobs/" + job_name}
                  location: $${job_location}
                  body:
                      overrides:
                          containerOverrides:
                              env:
                                  - name: INPUT_BUCKET
                                    value: $${event_bucket}
                                  - name: INPUT_FILE
                                    value: $${event_file}
              result: job_execution
          - finish:
              return: $${job_execution}
  EOF
}

The workflow does the following:

  1. init step—Accepts a Cloud Storage event as an argument and then sets necessary variables.

  2. check_input_file step—Checks if the Cloud Storage bucket specified in the event is the bucket used by the Cloud Run job.

    • If yes, the workflow proceeds to the run_job step.
    • If no, the workflow terminates, halting any further processing.
  3. run_job step—Uses the Cloud Run Admin API connector's googleapis.run.v1.namespaces.jobs.run method to execute the job. The Cloud Storage bucket and data file names are passed as override variables from the workflow to the job.

  4. finish step—Returns information about the job execution as the result of the workflow.
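
The decision logic in the steps above can be sketched in Python for readers who want to reason about it outside of Workflows. This sketch only builds the arguments that the run_job step passes to the connector; it does not call any Google Cloud API. The function name build_run_request is hypothetical, and the returned dictionary mirrors the structure of the workflow definition shown earlier.

```python
from typing import Optional


def build_run_request(event: dict, project_id: str,
                      job_name: str = "parallel-job",
                      job_location: str = "us-central1") -> Optional[dict]:
    """Mirror the init, check_input_file, and run_job steps.

    Returns the connector arguments, or None when the event comes from a
    bucket other than the one the job reads from (the workflow ends).
    """
    # init: pull the fields the workflow reads from the event.
    event_bucket = event["data"]["bucket"]
    event_file = event["data"]["name"]
    target_bucket = "input-" + project_id

    # check_input_file: only continue for the expected bucket.
    if event_bucket != target_bucket:
        return None

    # run_job: pass the bucket and file names as environment overrides.
    return {
        "name": f"namespaces/{project_id}/jobs/{job_name}",
        "location": job_location,
        "body": {
            "overrides": {
                "containerOverrides": {
                    "env": [
                        {"name": "INPUT_BUCKET", "value": event_bucket},
                        {"name": "INPUT_FILE", "value": event_file},
                    ]
                }
            }
        },
    }


# Placeholder event and project ID for illustration.
request = build_run_request(
    {"data": {"bucket": "input-my-project", "name": "input_file.txt"}},
    project_id="my-project",
)
print(request["name"])
```

Because the overrides carry the file name from the event, the job processes whichever object was just written rather than only the file name configured when the job was created.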

Create an Eventarc trigger for the workflow

To automatically execute the workflow, and in turn the Cloud Run job, whenever the input data file is updated, create an Eventarc trigger that responds to Cloud Storage events in the bucket containing the input data file.

Console

  1. In the Google Cloud console, go to the Workflows page:

  2. Click the name of your workflow, such as cloud-run-job-workflow.

  3. On the Workflow details page, click Edit.

  4. On the Edit workflow page, in the Triggers section, click Add new trigger > Eventarc.

    The Eventarc trigger pane opens.

  5. In the Trigger name field, enter a name for the trigger, such as cloud-run-job-workflow-trigger.

  6. From the Event provider list, select Cloud Storage.

  7. From the Event list, select google.cloud.storage.object.v1.finalized.

  8. In the Bucket field, select the bucket containing the input data file. The bucket name has the form input-PROJECT_ID.

  9. In the Service account field, select the service account you created earlier.

    The service account serves as the trigger's identity. You should have already granted the following roles to the service account:

    • Eventarc Event Receiver: to receive events
    • Workflows Invoker: to execute workflows
  10. Click Save trigger.

    The Eventarc trigger now appears in the Triggers section on the Edit workflow page.

  11. Click Next.

  12. Click Deploy.

gcloud

Create an Eventarc trigger by running the following command:

gcloud eventarc triggers create cloud-run-job-workflow-trigger \
    --location=us \
    --destination-workflow=cloud-run-job-workflow \
    --destination-workflow-location=us-central1 \
    --event-filters="type=google.cloud.storage.object.v1.finalized" \
    --event-filters="bucket=input-PROJECT_ID" \
    --service-account=SERVICE_ACCOUNT_NAME@PROJECT_ID.

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • SERVICE_ACCOUNT_NAME: the name of the service account you created earlier

The service account serves as the trigger's identity. You should have already granted the following roles to the service account:

  • roles/eventarc.eventReceiver: to receive events
  • roles/workflows.invoker: to execute workflows

Terraform

To create a trigger, use the google_eventarc_trigger resource and modify your main.tf file as shown in the following sample.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Note that in a typical Terraform workflow, you apply the entire plan at once. However, for the purposes of this tutorial, you can target a specific resource. For example:

terraform apply -target="google_eventarc_trigger.default"

# Create an Eventarc trigger that routes Cloud Storage events to Workflows
resource "google_eventarc_trigger" "default" {
  name     = "cloud-run-job-trigger"
  location = google_workflows_workflow.default.region

  # Capture objects changed in the bucket
  matching_criteria {
    attribute = "type"
    value     = "google.cloud.storage.object.v1.finalized"
  }
  matching_criteria {
    attribute = "bucket"
    value     = google_storage_bucket.default.name
  }

  # Send events to Workflows
  destination {
    workflow = google_workflows_workflow.default.id
  }

  service_account = google_service_account.workflows.email

}

Whenever a file is uploaded or overwritten in the Cloud Storage bucket containing the input data file, the workflow is executed with the corresponding Cloud Storage event as an argument.
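
The trigger defined above uses two event filters, type and bucket, and an event is routed only when it matches both. The following sketch models that matching rule locally. The event shown is a trimmed, illustrative example rather than a complete Cloud Storage event, and matches_all_filters is a hypothetical helper, not part of any Eventarc client library.

```python
def matches_all_filters(attributes: dict, filters: dict) -> bool:
    """Return True only if every filter equals the matching event attribute."""
    return all(attributes.get(key) == value for key, value in filters.items())


# The two filters used by the trigger in this tutorial
# (with a placeholder project ID in the bucket name).
trigger_filters = {
    "type": "google.cloud.storage.object.v1.finalized",
    "bucket": "input-my-project",
}

# Trimmed, illustrative event. The workflow itself reads only
# data.bucket and data.name from the event it receives.
event = {
    "type": "google.cloud.storage.object.v1.finalized",
    "bucket": "input-my-project",
    "data": {"bucket": "input-my-project", "name": "input_file.txt"},
}

print(matches_all_filters(event, trigger_filters))
```

An event for a different bucket, or for a different event type on the same bucket, fails one of the two comparisons and is not delivered to the workflow.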

Trigger the workflow

Test the end-to-end system by updating the input data file in Cloud Storage.

  1. Generate new data for the input file and upload it to Cloud Storage in the location expected by the Cloud Run job:

    base64 /dev/urandom | head -c 100000 >input_file.txt
    gcloud storage cp input_file.txt gs://input-PROJECT_ID/input_file.txt

    If you created a Cloud Storage bucket using Terraform, you can retrieve the name of the bucket by running the following command:

    gcloud storage buckets list gs://input*

    The Cloud Run job can take a few minutes to run.

  2. Confirm that the Cloud Run job ran as expected by viewing the job executions:

    gcloud config set run/region us-central1
    gcloud run jobs executions list --job=parallel-job

    You should see a successful job execution in the output indicating that 10/10 tasks have completed.
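
If the base64 and head utilities are not available on your machine, the input file from step 1 can be generated with a short Python script instead. This is a sketch of an equivalent, not part of the tutorial's sample code. The function names are hypothetical, and the 76-character line width is an assumption that matches the default wrapping of GNU base64.

```python
import base64
import os


def random_base64_text(size: int, line_width: int = 76) -> bytes:
    """Return exactly `size` bytes of newline-wrapped random base64 text."""
    out = bytearray()
    while len(out) < size:
        # 57 raw bytes encode to 76 base64 characters without padding.
        encoded = base64.b64encode(os.urandom(57 * 64))
        for start in range(0, len(encoded), line_width):
            out += encoded[start:start + line_width] + b"\n"
    return bytes(out[:size])


def write_input_file(path: str, size: int = 100000) -> None:
    """Write `size` bytes of random base64 text to `path`."""
    with open(path, "wb") as handle:
        handle.write(random_base64_text(size))


print(len(random_base64_text(100000)))
```

Call write_input_file("input_file.txt") to create the file, then upload it with the gcloud storage cp command from step 1.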

Learn more about triggering a workflow with events or Pub/Sub messages.