364 changes: 364 additions & 0 deletions ml/notebook_examples/functions/hosted_kfp_gcf.ipynb
@@ -0,0 +1,364 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using Google Cloud Functions to support event-based triggering of Cloud AI Platform Pipelines\n",
"\n",
"This example shows how you can run a [Cloud AI Platform Pipeline](https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines) from a [Google Cloud Function](https://cloud.google.com/functions/docs/), thus providing a way for Pipeline runs to be triggered by events (in the interim before this is supported by Pipelines itself). \n",
"\n",
"In this example, the function is triggered by the addition of or update to a file in a [Google Cloud Storage](https://cloud.google.com/storage/) (GCS) bucket, but Cloud Functions can have other triggers too (including [Pub/Sub](https://cloud.google.com/pubsub/docs/)-based triggers).\n",
"\n",
"The example is Google Cloud Platform (GCP)-specific, and requires a [Cloud AI Platform Pipelines](https://cloud.google.com/ai-platform/pipelines/docs) installation using Pipelines version >= 0.4.\n",
"\n",
"(If you are instead interested in how to do this with a Kubeflow-based pipelines installation, see [this notebook](https://github.com/amygdala/kubeflow-examples/blob/cookbook/cookbook/pipelines/notebooks/gcf_kfp_trigger.ipynb)).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Create a Cloud AI Platform Pipelines installation\n",
"\n",
"Follow the instructions in the [documentation](https://cloud.google.com/ai-platform/pipelines/docs) to create a Cloud AI Platform Pipelines installation. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Identify (or create) a Cloud Storage bucket to use for the example"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Before executing the next cell**, edit it to **set the `TRIGGER_BUCKET` environment variable** to a Google Cloud Storage bucket ([create a bucket first](https://console.cloud.google.com/storage/browser) if necessary). Do *not* include the `gs://` prefix in the bucket name.\n",
"\n",
"We'll deploy the GCF function so that it will trigger on new and updated files (blobs) in this bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%env TRIGGER_BUCKET=REPLACE_WITH_YOUR_GCS_BUCKET_NAME"
]
},
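{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't yet have a bucket, you can also create one from the notebook. The following is a minimal sketch using `gsutil`, assuming `gcloud`/`gsutil` are already configured for the project you want to use; skip it if the bucket already exists."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Sketch: create the trigger bucket if it doesn't already exist\n",
"# (assumes gsutil is authenticated against your project).\n",
"gsutil mb gs://${TRIGGER_BUCKET}"
]
},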
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Give Cloud Function's service account the necessary access\n",
"\n",
"First, make sure the Cloud Function API [is enabled](https://console.cloud.google.com/apis/library/cloudfunctions.googleapis.com?q=functions).\n",
"\n",
"Functions uses the project's 'appspot'acccount for its service account. It will have the form: \n",
"` [email protected]`. (This is also the App Engine service account).\n",
"\n",
"- Go to your project's [IAM - Service Account page](https://console.cloud.google.com/iam-admin/serviceaccounts).\n",
"- Find the ` [email protected]` account and copy its email address.\n",
"- Find the project's Compute Engine (GCE) default service account (this is the default account used for the Pipelines installation). It will have a form like this: `[email protected]`.\n",
" Click the checkbox next to the GCE service account, and in the 'INFO PANEL' to the right, click **ADD MEMBER**. Add the Functions service account (`[email protected]`) as a **Project Viewer** of the GCE service account. \n",
" \n",
"![Add the Functions service account as a project viewer of the GCE service account](https://storage.googleapis.com/amy-jo/images/kfp-deploy/hosted_kfp_setup1.png) "
]
},
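{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer the command line, the cell below sketches a roughly equivalent `gcloud` command. Treat it as an untested alternative to the console steps above, and replace `PROJECT_ID` and `PROJECT_NUMBER` with your project's values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Sketch: grant the Functions ('appspot') service account the Viewer role on the\n",
"# GCE default service account. Replace PROJECT_ID and PROJECT_NUMBER with your own values.\n",
"gcloud iam service-accounts add-iam-policy-binding PROJECT_NUMBER-compute@developer.gserviceaccount.com --member=\"serviceAccount:PROJECT_ID@appspot.gserviceaccount.com\" --role=\"roles/viewer\""
]
},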
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, configure your `TRIGGER_BUCKET` to allow the Functions service account access to that bucket. \n",
"\n",
"- Navigate in the console to your list of buckets in the [Storage Browser](https://console.cloud.google.com/storage/browser).\n",
"- Click the checkbox next to the `TRIGGER_BUCKET`. In the 'INFO PANEL' to the right, click **ADD MEMBER**. Add the service account (`[email protected]`) with `Storage Object Admin` permissions. (While not tested, giving both Object view and create permissions should also suffice).\n",
"\n",
"![add the app engine service account to the trigger bucket with view and edit permissions](https://storage.googleapis.com/amy-jo/images/kfp-deploy/hosted_kfp_setup2.png)"
]
},
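{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, if you prefer the command line, a sketch of the same grant using `gsutil iam ch` is below (untested here; replace `PROJECT_ID` with your project ID)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Sketch: give the Functions service account Storage Object Admin on the trigger bucket.\n",
"gsutil iam ch serviceAccount:PROJECT_ID@appspot.gserviceaccount.com:roles/storage.objectAdmin gs://${TRIGGER_BUCKET}"
]
},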
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a simple GCF function to test your configuration\n",
"\n",
"First we'll generate and deploy a simple GCF function, to test that the basics are properly configured. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"mkdir -p functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll first create a `requirements.txt` file, to indicate what packages the GCF code requires to be installed. (We won't actually need `kfp` for this first 'sanity check' version of a GCF function, but we'll need it below for the second function we'll create, that deploys a pipeline)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile functions/requirements.txt\n",
"kfp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we'll create a simple GCF function in the `functions/main.py` file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile functions/main.py\n",
"import logging\n",
"\n",
"def gcs_test(data, context):\n",
" \"\"\"Background Cloud Function to be triggered by Cloud Storage.\n",
" This generic function logs relevant data when a file is changed.\n",
"\n",
" Args:\n",
" data (dict): The Cloud Functions event payload.\n",
" context (google.cloud.functions.Context): Metadata of triggering event.\n",
" Returns:\n",
" None; the output is written to Stackdriver Logging\n",
" \"\"\"\n",
"\n",
" logging.info('Event ID: {}'.format(context.event_id))\n",
" logging.info('Event type: {}'.format(context.event_type))\n",
" logging.info('Data: {}'.format(data))\n",
" logging.info('Bucket: {}'.format(data['bucket']))\n",
" logging.info('File: {}'.format(data['name']))\n",
" file_uri = 'gs://%s/%s' % (data['bucket'], data['name'])\n",
" logging.info('Using file uri: %s', file_uri)\n",
"\n",
" logging.info('Metageneration: {}'.format(data['metageneration']))\n",
" logging.info('Created: {}'.format(data['timeCreated']))\n",
" logging.info('Updated: {}'.format(data['updated']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Deploy the GCF function as follows. (You'll need to **wait a moment or two for output of the deployment to display in the notebook**). You can also run this command from a notebook terminal window in the `functions` subdirectory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"cd functions\n",
"gcloud functions deploy gcs_test --runtime python37 --trigger-resource ${TRIGGER_BUCKET} --trigger-event google.storage.object.finalize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After you've deployed, test your deployment by adding a file to the specified `TRIGGER_BUCKET`. You can do this easily by visiting the **Storage** panel in the Cloud Console, clicking on the bucket in the list, and then clicking on **Upload files** in the bucket details view.\n",
"\n",
"Then, check in the logs viewer panel (https://console.cloud.google.com/logs/viewer) to confirm that the GCF function was triggered and ran correctly. You can select 'Cloud Function' in the first pulldown menu to filter on just those log entries."
]
},
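{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also trigger the function directly from the notebook. The sketch below uploads a small test file with `gsutil` and then reads the function's recent logs with `gcloud`; the log entries may take a minute or so to appear."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Upload a small test file to the trigger bucket, then fetch recent logs\n",
"# for the gcs_test function (entries may take a minute to show up).\n",
"echo \"hello\" > /tmp/gcf_test.txt\n",
"gsutil cp /tmp/gcf_test.txt gs://${TRIGGER_BUCKET}/\n",
"gcloud functions logs read gcs_test --limit 20"
]
},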
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy a Pipeline from a GCF function\n",
"\n",
"Next, we'll create a GCF function that deploys an AI Platform Pipeline when triggered. First, preserve your existing main.py in a backup file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"cd functions\n",
"mv main.py main.py.bak"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, **before executing the next cell**, **edit the `HOST` variable** in the code below. You'll replace `<your_endpoint>` with the correct value for your installation.\n",
"\n",
"To find this URL, visit the [Pipelines panel](https://console.cloud.google.com/ai-platform/pipelines/) in the Cloud Console. \n",
"From here, you can find the URL by clicking on the **SETTINGS** link for the Pipelines installation you want to use, and copying the 'host' string displayed in the client example code (prepend `https://` to that string in the code below). \n",
"You can alternately click on **OPEN PIPELINES DASHBOARD** for the Pipelines installation, and copy that URL, removing the `/#/pipelines` suffix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile functions/main.py\n",
"import logging\n",
"import datetime\n",
"import logging\n",
"import time\n",
" \n",
"import kfp\n",
"import kfp.compiler as compiler\n",
"import kfp.dsl as dsl\n",
" \n",
"import requests\n",
" \n",
"# TODO: replace with your Pipelines endpoint URL\n",
"HOST = 'https://<your_endpoint>.pipelines.googleusercontent.com'\n",
"\n",
"@dsl.pipeline(\n",
" name='Sequential',\n",
" description='A pipeline with two sequential steps.'\n",
")\n",
"def sequential_pipeline(filename='gs://ml-pipeline-playground/shakespeare1.txt'):\n",
" \"\"\"A pipeline with two sequential steps.\"\"\"\n",
" op1 = dsl.ContainerOp(\n",
" name='filechange',\n",
" image='library/bash:4.4.23',\n",
" command=['sh', '-c'],\n",
" arguments=['echo \"%s\" > /tmp/results.txt' % filename],\n",
" file_outputs={'newfile': '/tmp/results.txt'})\n",
" op2 = dsl.ContainerOp(\n",
" name='echo',\n",
" image='library/bash:4.4.23',\n",
" command=['sh', '-c'],\n",
" arguments=['echo \"%s\"' % op1.outputs['newfile']]\n",
" )\n",
" \n",
"def get_access_token():\n",
" url = 'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token'\n",
" r = requests.get(url, headers={'Metadata-Flavor': 'Google'})\n",
" r.raise_for_status()\n",
" access_token = r.json()['access_token']\n",
" return access_token\n",
" \n",
"def hosted_kfp_test(data, context):\n",
" logging.info('Event ID: {}'.format(context.event_id))\n",
" logging.info('Event type: {}'.format(context.event_type))\n",
" logging.info('Data: {}'.format(data))\n",
" logging.info('Bucket: {}'.format(data['bucket']))\n",
" logging.info('File: {}'.format(data['name']))\n",
" file_uri = 'gs://%s/%s' % (data['bucket'], data['name'])\n",
" logging.info('Using file uri: %s', file_uri)\n",
" \n",
" logging.info('Metageneration: {}'.format(data['metageneration']))\n",
" logging.info('Created: {}'.format(data['timeCreated']))\n",
" logging.info('Updated: {}'.format(data['updated']))\n",
" \n",
" token = get_access_token() \n",
" logging.info('attempting to launch pipeline run.')\n",
" ts = int(datetime.datetime.utcnow().timestamp() * 100000)\n",
" client = kfp.Client(host=HOST, existing_token=token)\n",
" compiler.Compiler().compile(sequential_pipeline, '/tmp/sequential.tar.gz')\n",
" exp = client.create_experiment(name='gcstriggered') # this is a 'get or create' op\n",
" res = client.run_pipeline(exp.id, 'sequential_' + str(ts), '/tmp/sequential.tar.gz',\n",
" params={'filename': file_uri})\n",
" logging.info(res)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, deploy the new GCF function. As before, **it will take a moment or two for the results of the deployment to display in the notebook**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"cd functions\n",
"gcloud functions deploy hosted_kfp_test --runtime python37 --trigger-resource ${TRIGGER_BUCKET} --trigger-event google.storage.object.finalize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Add another file to your `TRIGGER_BUCKET`. This time you should see both GCF functions triggered. The `hosted_kfp_test` function will deploy the pipeline. You'll be able to see it running at your Pipeline installation's endpoint, `https://<your_endpoint>.pipelines.googleusercontent.com/#/pipelines`, under the given Pipelines Experiment (`gcstriggered` as default)."
]
},
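{
"cell_type": "markdown",
"metadata": {},
"source": [
"As before, you can do this from the notebook if you prefer: upload a file to the trigger bucket and then check the `hosted_kfp_test` logs to confirm that a pipeline run was launched."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"# Upload another test file, then check the hosted_kfp_test function's logs.\n",
"echo \"hello again\" > /tmp/kfp_trigger_test.txt\n",
"gsutil cp /tmp/kfp_trigger_test.txt gs://${TRIGGER_BUCKET}/\n",
"gcloud functions logs read hosted_kfp_test --limit 20"
]
},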
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------------------------------------\n",
"Copyright 2020, Google, LLC.\n",
"Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"you may not use this file except in compliance with the License.\n",
"You may obtain a copy of the License at\n",
"\n",
" http://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
"Unless required by applicable law or agreed to in writing, software\n",
"distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"See the License for the specific language governing permissions and\n",
"limitations under the License."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}