364 changes: 364 additions & 0 deletions ml/notebook_examples/functions/hosted_kfp_gcf.ipynb
@@ -0,0 +1,364 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using Google Cloud Functions to support event-based triggering of Cloud AI Platform Pipelines\n",
"\n",
"This example shows how you can run a [Cloud AI Platform Pipeline](https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines) from a [Google Cloud Function](https://cloud.google.com/functions/docs/), thus providing a way for Pipeline runs to be triggered by events (in the interim before this is supported by Pipelines itself). \n",
"\n",
"In this example, the function is triggered by the addition of or update to a file in a [Google Cloud Storage](https://cloud.google.com/storage/) (GCS) bucket, but Cloud Functions can have other triggers too (including [Pub/Sub](https://cloud.google.com/pubsub/docs/)-based triggers).\n",
"\n",
"The example is Google Cloud Platform (GCP)-specific, and requires a [Cloud AI Platform Pipelines](https://cloud.google.com/ai-platform/pipelines/docs) installation using Pipelines version >= 0.4.\n",
"\n",
"(If you are instead interested in how to do this with a Kubeflow-based pipelines installation, see [this notebook](https://github.com/amygdala/kubeflow-examples/blob/cookbook/cookbook/pipelines/notebooks/gcf_kfp_trigger.ipynb)).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Create a Cloud AI Platform Pipelines installation\n",
"\n",
"Follow the instructions in the [documentation](https://cloud.google.com/ai-platform/pipelines/docs) to create a Cloud AI Platform Pipelines installation. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Identify (or create) a Cloud Storage bucket to use for the example"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Before executing the next cell**, edit it to **set the `TRIGGER_BUCKET` environment variable** to a Google Cloud Storage bucket ([create a bucket first](https://console.cloud.google.com/storage/browser) if necessary). Do *not* include the `gs://` prefix in the bucket name.\n",
"\n",
"We'll deploy the GCF function so that it will trigger on new and updated files (blobs) in this bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%env TRIGGER_BUCKET=REPLACE_WITH_YOUR_GCS_BUCKET_NAME"
]
},
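{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't yet have a bucket, you can also create one from the notebook. The following is a minimal sketch using `gsutil`, assuming `gcloud`/`gsutil` are already configured for the project you want to use; skip it if the bucket already exists."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Sketch: create the trigger bucket if it doesn't already exist\n",
"# (assumes gsutil is authenticated against your project).\n",
"gsutil mb gs://${TRIGGER_BUCKET}"
]
},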
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Give Cloud Function's service account the necessary access\n",
"\n",
"First, make sure the Cloud Function API [is enabled](https://console.cloud.google.com/apis/library/cloudfunctions.googleapis.com?q=functions).\n",
"\n",
"Functions uses the project's 'appspot'acccount for its service account. It will have the form: \n",
"` [email protected]`. (This is also the App Engine service account).\n",
"\n",
"- Go to your project's [IAM - Service Account page](https://console.cloud.google.com/iam-admin/serviceaccounts).\n",
"- Find the ` [email protected]` account and copy its email address.\n",
"- Find the project's Compute Engine (GCE) default service account (this is the default account used for the Pipelines installation). It will have a form like this: `[email protected]`.\n",
" Click the checkbox next to the GCE service account, and in the 'INFO PANEL' to the right, click **ADD MEMBER**. Add the Functions service account (`[email protected]`) as a **Project Viewer** of the GCE service account. \n",
" \n",
"![Add the Functions service account as a project viewer of the GCE service account](https://storage.googleapis.com/amy-jo/images/kfp-deploy/hosted_kfp_setup1.png) "
]
},
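{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer the command line, the cell below sketches a roughly equivalent `gcloud` command. Treat it as an untested alternative to the console steps above, and replace `PROJECT_ID` and `PROJECT_NUMBER` with your project's values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Sketch: grant the Functions ('appspot') service account the Viewer role on the\n",
"# GCE default service account. Replace PROJECT_ID and PROJECT_NUMBER with your own values.\n",
"gcloud iam service-accounts add-iam-policy-binding PROJECT_NUMBER-compute@developer.gserviceaccount.com --member=\"serviceAccount:PROJECT_ID@appspot.gserviceaccount.com\" --role=\"roles/viewer\""
]
},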
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, configure your `TRIGGER_BUCKET` to allow the Functions service account access to that bucket. \n",
"\n",
"- Navigate in the console to your list of buckets in the [Storage Browser](https://console.cloud.google.com/storage/browser).\n",
"- Click the checkbox next to the `TRIGGER_BUCKET`. In the 'INFO PANEL' to the right, click **ADD MEMBER**. Add the service account (`[email protected]`) with `Storage Object Admin` permissions. (While not tested, giving both Object view and create permissions should also suffice).\n",
"\n",
"![add the app engine service account to the trigger bucket with view and edit permissions](https://storage.googleapis.com/amy-jo/images/kfp-deploy/hosted_kfp_setup2.png)"
]
},
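{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, if you prefer the command line, a sketch of the same grant using `gsutil iam ch` is below (untested here; replace `PROJECT_ID` with your project ID)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Sketch: give the Functions service account Storage Object Admin on the trigger bucket.\n",
"gsutil iam ch serviceAccount:PROJECT_ID@appspot.gserviceaccount.com:roles/storage.objectAdmin gs://${TRIGGER_BUCKET}"
]
},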
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a simple GCF function to test your configuration\n",
"\n",
"First we'll generate and deploy a simple GCF function, to test that the basics are properly configured. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"mkdir -p functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll first create a `requirements.txt` file, to indicate what packages the GCF code requires to be installed. (We won't actually need `kfp` for this first 'sanity check' version of a GCF function, but we'll need it below for the second function we'll create, that deploys a pipeline)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile functions/requirements.txt\n",
"kfp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we'll create a simple GCF function in the `functions/main.py` file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile functions/main.py\n",
"import logging\n",
"\n",
"def gcs_test(data, context):\n",
" \"\"\"Background Cloud Function to be triggered by Cloud Storage.\n",
" This generic function logs relevant data when a file is changed.\n",
"\n",
" Args:\n",
" data (dict): The Cloud Functions event payload.\n",
" context (google.cloud.functions.Context): Metadata of triggering event.\n",
" Returns:\n",
" None; the output is written to Stackdriver Logging\n",
" \"\"\"\n",
"\n",
" logging.info('Event ID: {}'.format(context.event_id))\n",
" logging.info('Event type: {}'.format(context.event_type))\n",
" logging.info('Data: {}'.format(data))\n",
" logging.info('Bucket: {}'.format(data['bucket']))\n",
" logging.info('File: {}'.format(data['name']))\n",
" file_uri = 'gs://%s/%s' % (data['bucket'], data['name'])\n",
" logging.info('Using file uri: %s', file_uri)\n",
"\n",
" logging.info('Metageneration: {}'.format(data['metageneration']))\n",
" logging.info('Created: {}'.format(data['timeCreated']))\n",
" logging.info('Updated: {}'.format(data['updated']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Deploy the GCF function as follows. (You'll need to **wait a moment or two for output of the deployment to display in the notebook**). You can also run this command from a notebook terminal window in the `functions` subdirectory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"cd functions\n",
"gcloud functions deploy gcs_test --runtime python37 --trigger-resource ${TRIGGER_BUCKET} --trigger-event google.storage.object.finalize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After you've deployed, test your deployment by adding a file to the specified `TRIGGER_BUCKET`. You can do this easily by visiting the **Storage** panel in the Cloud Console, clicking on the bucket in the list, and then clicking on **Upload files** in the bucket details view.\n",
"\n",
"Then, check in the logs viewer panel (https://console.cloud.google.com/logs/viewer) to confirm that the GCF function was triggered and ran correctly. You can select 'Cloud Function' in the first pulldown menu to filter on just those log entries."
]
},
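{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also trigger the function directly from the notebook. The sketch below uploads a small test file with `gsutil` and then reads the function's recent logs with `gcloud`; the log entries may take a minute or so to appear."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Upload a small test file to the trigger bucket, then fetch recent logs\n",
"# for the gcs_test function (entries may take a minute to show up).\n",
"echo \"hello\" > /tmp/gcf_test.txt\n",
"gsutil cp /tmp/gcf_test.txt gs://${TRIGGER_BUCKET}/\n",
"gcloud functions logs read gcs_test --limit 20"
]
},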
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy a Pipeline from a GCF function\n",
"\n",
"Next, we'll create a GCF function that deploys an AI Platform Pipeline when triggered. First, preserve your existing main.py in a backup file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"cd functions\n",
"mv main.py main.py.bak"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, **before executing the next cell**, **edit the `HOST` variable** in the code below. You'll replace `<your_endpoint>` with the correct value for your installation.\n",
"\n",
"To find this URL, visit the [Pipelines panel](https://console.cloud.google.com/ai-platform/pipelines/) in the Cloud Console. \n",
"From here, you can find the URL by clicking on the **SETTINGS** link for the Pipelines installation you want to use, and copying the 'host' string displayed in the client example code (prepend `https://` to that string in the code below). \n",
"You can alternately click on **OPEN PIPELINES DASHBOARD** for the Pipelines installation, and copy that URL, removing the `/#/pipelines` suffix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile functions/main.py\n",
"import logging\n",
"import datetime\n",
"import logging\n",
"import time\n",
" \n",
"import kfp\n",
"import kfp.compiler as compiler\n",
"import kfp.dsl as dsl\n",
" \n",
"import requests\n",
" \n",
"# TODO: replace with your Pipelines endpoint URL\n",
"HOST = 'https://<your_endpoint>.pipelines.googleusercontent.com'\n",
"\n",
"@dsl.pipeline(\n",
" name='Sequential',\n",
" description='A pipeline with two sequential steps.'\n",
")\n",
"def sequential_pipeline(filename='gs://ml-pipeline-playground/shakespeare1.txt'):\n",
" \"\"\"A pipeline with two sequential steps.\"\"\"\n",
" op1 = dsl.ContainerOp(\n",
" name='filechange',\n",
" image='library/bash:4.4.23',\n",
" command=['sh', '-c'],\n",
" arguments=['echo \"%s\" > /tmp/results.txt' % filename],\n",
" file_outputs={'newfile': '/tmp/results.txt'})\n",
" op2 = dsl.ContainerOp(\n",
" name='echo',\n",
" image='library/bash:4.4.23',\n",
" command=['sh', '-c'],\n",
" arguments=['echo \"%s\"' % op1.outputs['newfile']]\n",
" )\n",
" \n",
"def get_access_token():\n",
" url = 'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token'\n",
" r = requests.get(url, headers={'Metadata-Flavor': 'Google'})\n",
" r.raise_for_status()\n",
" access_token = r.json()['access_token']\n",
" return access_token\n",
" \n",
"def hosted_kfp_test(data, context):\n",
" logging.info('Event ID: {}'.format(context.event_id))\n",
" logging.info('Event type: {}'.format(context.event_type))\n",
" logging.info('Data: {}'.format(data))\n",
" logging.info('Bucket: {}'.format(data['bucket']))\n",
" logging.info('File: {}'.format(data['name']))\n",
" file_uri = 'gs://%s/%s' % (data['bucket'], data['name'])\n",
" logging.info('Using file uri: %s', file_uri)\n",
" \n",
" logging.info('Metageneration: {}'.format(data['metageneration']))\n",
" logging.info('Created: {}'.format(data['timeCreated']))\n",
" logging.info('Updated: {}'.format(data['updated']))\n",
" \n",
" token = get_access_token() \n",
" logging.info('attempting to launch pipeline run.')\n",
" ts = int(datetime.datetime.utcnow().timestamp() * 100000)\n",
" client = kfp.Client(host=HOST, existing_token=token)\n",
" compiler.Compiler().compile(sequential_pipeline, '/tmp/sequential.tar.gz')\n",
" exp = client.create_experiment(name='gcstriggered') # this is a 'get or create' op\n",
" res = client.run_pipeline(exp.id, 'sequential_' + str(ts), '/tmp/sequential.tar.gz',\n",
" params={'filename': file_uri})\n",
" logging.info(res)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, deploy the new GCF function. As before, **it will take a moment or two for the results of the deployment to display in the notebook**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"cd functions\n",
"gcloud functions deploy hosted_kfp_test --runtime python37 --trigger-resource ${TRIGGER_BUCKET} --trigger-event google.storage.object.finalize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Add another file to your `TRIGGER_BUCKET`. This time you should see both GCF functions triggered. The `hosted_kfp_test` function will deploy the pipeline. You'll be able to see it running at your Pipeline installation's endpoint, `https://<your_endpoint>.pipelines.googleusercontent.com/#/pipelines`, under the given Pipelines Experiment (`gcstriggered` as default)."
]
},
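{
"cell_type": "markdown",
"metadata": {},
"source": [
"As before, you can do this from the notebook if you prefer: upload a file to the trigger bucket and then check the `hosted_kfp_test` logs to confirm that a pipeline run was launched."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Upload another test file, then check the hosted_kfp_test function's logs.\n",
"echo \"hello again\" > /tmp/kfp_trigger_test.txt\n",
"gsutil cp /tmp/kfp_trigger_test.txt gs://${TRIGGER_BUCKET}/\n",
"gcloud functions logs read hosted_kfp_test --limit 20"
]
},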
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------------------------------------\n",
"Copyright 2020, Google, LLC.\n",
"Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"you may not use this file except in compliance with the License.\n",
"You may obtain a copy of the License at\n",
"\n",
" http://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
"Unless required by applicable law or agreed to in writing, software\n",
"distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"See the License for the specific language governing permissions and\n",
"limitations under the License."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}