
Commit d186a6f

Tables e2e (#60)

kfp bug?

1 parent 7b326d6 commit d186a6f

File tree

7 files changed: +50 -111 lines changed


ml/automl/tables/kfp_e2e/README.md

Lines changed: 3 additions & 3 deletions
@@ -76,9 +76,9 @@ Once a Pipelines installation is running, we can upload the example AutoML Table
 Click on **Pipelines** in the left nav bar of the Pipelines Dashboard. Click on **Upload Pipeline**.
 
 - For Cloud AI Platform Pipelines, upload [`tables_pipeline_caip.py.tar.gz`][36], from this directory. This archive points to the compiled version of [this pipeline][37], specified and compiled using the [Kubeflow Pipelines SDK][38].
-- For Kubeflow Pipelines on a Kubeflow installation, upload [`tables_pipeline_kf.py.tar.gz`][39]. This archive points to the compiled version of [this pipeline][40].
+- For Kubeflow Pipelines on a Kubeflow installation, upload [`tables_pipeline_kf.py.tar.gz`][39]. This archive points to the compiled version of [this pipeline][40]. **To run this example on a KF installation, you will need to give the `<deployment-name>-user@<project-id>.iam.gserviceaccount.com` service account `AutoML Admin` privileges**.
 
-> Note: The difference between the two pipelines relates to how GCP authentication is handled. For the Kubeflow pipeline, we’ve added `.apply(gcp.use_gcp_secret('user-gcp-sa'))` annotations to the pipeline steps. This tells the pipeline to use the mounted _secret_—set up during the installation process— that provides GCP account credentials. With the Cloud AI Platform Pipelines installation, the GKE cluster nodes have been set up to use the `cloud-platform` scope. With an upcoming Kubeflow release, specification of the mounted secret will no longer be necessary.
+> Note: The difference between the two pipelines relates to how GCP authentication is handled. For the Kubeflow pipeline, we’ve added `.apply(gcp.use_gcp_secret('user-gcp-sa'))` annotations to the pipeline steps. This tells the pipeline to use the mounted _secret_—set up during the installation process— that provides GCP account credentials. With the Cloud AI Platform Pipelines installation, the GKE cluster nodes have been set up to use the `cloud-platform` scope. With recent Kubeflow releases, specification of the mounted secret is no longer necessary, but we include both versions for compatibility.
 
 The uploaded pipeline graph will look similar to this:
 
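As background for the note above, here is a minimal sketch of how that annotation is attached to a step, assuming the KFP v1 SDK; the `ContainerOp` shown is a placeholder for illustration, not one of this pipeline's actual steps:

```python
import kfp.dsl as dsl
import kfp.gcp as gcp

@dsl.pipeline(name='auth-sketch', description='Illustrates mounting the user-gcp-sa secret')
def auth_sketch():
  # A placeholder step; any ContainerOp in the pipeline is annotated the same way.
  op = dsl.ContainerOp(
      name='check-auth',
      image='google/cloud-sdk:slim',
      command=['sh', '-c', 'gcloud auth list'],
  )
  # On a Kubeflow installation, mount the 'user-gcp-sa' secret so the step can
  # reach GCP APIs; on Cloud AI Platform Pipelines the cluster nodes' own
  # cloud-platform scope makes this unnecessary.
  op.apply(gcp.use_gcp_secret('user-gcp-sa'))
```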
@@ -88,7 +88,7 @@ The uploaded pipeline graph will look similar to this:
 </figure>
 
 Click the **+Create Run** button to run the pipeline. You will need to fill in some pipeline parameters.
-Specifically, replace `YOUR_PROJECT_HERE` with the name of your project; replace `YOUR_DATASET_NAME` with the name you want to give your new dataset (make it unique, and use letters, numbers and underscores up to 32 characters); and replace `YOUR_BUCKET_NAME` with the name of a [GCS bucket][41]. This bucket should be in the [same _region_][42] as that specified by the `gcp_region` parameter. E.g., if you keep the default `us-central1` region, your bucket should also be a _regional_ (not multi-regional) bucket in the `us-central1` region. ++double check that this is necessary.++
+Specifically, replace `YOUR_PROJECT_HERE` with the name of your project; replace `YOUR_DATASET_NAME` with the name you want to give your new dataset (make it unique, and use letters, numbers and underscores up to 32 characters); and replace `YOUR_BUCKET_NAME` with the name of a [GCS bucket][41]. Do not include the `gs://` prefix— just enter the name. This bucket should be in the [same _region_][42] as that specified by the `gcp_region` parameter. E.g., if you keep the default `us-central1` region, your bucket should also be a _regional_ (not multi-regional) bucket in the `us-central1` region. ++double check that this is necessary.++
 
 If you want to schedule a recurrent set of runs, you can do that instead. If your data is in [BigQuery][43]— as is the case for this example pipeline— and has a temporal aspect, you could define a _view_ to reflect that, e.g. to return data from a window over the last `N` days or hours. Then, the AutoML pipeline could specify ingestion of data from that view, grabbing an updated data window each time the pipeline is run, and building a new model based on that updated window.
 
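To make the windowed-view idea above concrete, here is a hedged sketch using the `google-cloud-bigquery` client; the dataset, table, and timestamp-column names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client(project='YOUR_PROJECT_HERE')

# Define a view over the last 7 days of a (hypothetical) source table with a
# TIMESTAMP column `ts`. A pipeline that ingests from this view picks up a
# freshly updated data window on every run.
view = bigquery.Table('YOUR_PROJECT_HERE.your_dataset.recent_window')
view.view_query = """
    SELECT *
    FROM `YOUR_PROJECT_HERE.your_dataset.source_table`
    WHERE ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
"""
client.create_table(view)
```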

ml/automl/tables/kfp_e2e/create_model_for_tables/tables_eval_metrics_component.py

Lines changed: 17 additions & 34 deletions
@@ -19,46 +19,29 @@
 # An example of how the model eval info could be used to make decisions aboiut whether or not
 # to deploy the model.
 def automl_eval_metrics(
-    # gcp_project_id: str,
-    # gcp_region: str,
-    # model_display_name: str,
     eval_data_path: InputPath('evals'),
     mlpipeline_ui_metadata_path: OutputPath('UI_metadata'),
     mlpipeline_metrics_path: OutputPath('UI_metrics'),
-    # api_endpoint: str = None,
     # thresholds: str = '{"au_prc": 0.9}',
-    thresholds: str = '{"mean_absolute_error": 450}',
+    thresholds: str = '{"mean_absolute_error": 460}',
     confidence_threshold: float = 0.5 # for classification
 
-) -> NamedTuple('Outputs', [('deploy', bool)]):
+# ) -> NamedTuple('Outputs', [('deploy', str)]): # this gives the same result
+) -> NamedTuple('Outputs', [('deploy', 'String')]):
   import subprocess
   import sys
-  # we could build a base image that includes these libraries if we don't want to do
-  # the dynamic installation when the step runs.
-  # subprocess.run([sys.executable, '-m', 'pip', 'install', 'googleapis-common-protos==1.6.0',
-  #     '--no-warn-script-location'], env={'PIP_DISABLE_PIP_VERSION_CHECK': '1'}, check=True)
-  # subprocess.run([sys.executable, '-m', 'pip', 'install', 'google-cloud-automl==0.9.0',
-  #     'google-cloud-storage',
-  #     '--no-warn-script-location'], env={'PIP_DISABLE_PIP_VERSION_CHECK': '1'}, check=True)
+  subprocess.run([sys.executable, '-m', 'pip', 'install', 'googleapis-common-protos==1.6.0',
+      '--no-warn-script-location'], env={'PIP_DISABLE_PIP_VERSION_CHECK': '1'}, check=True)
+  subprocess.run([sys.executable, '-m', 'pip', 'install', 'google-cloud-automl==0.9.0',
+      'google-cloud-storage',
+      '--no-warn-script-location'], env={'PIP_DISABLE_PIP_VERSION_CHECK': '1'}, check=True)
 
-  # import google
+  import google
   import json
   import logging
   import pickle
-  # from google.api_core.client_options import ClientOptions
-  # from google.api_core import exceptions
-  # from google.cloud import automl_v1beta1 as automl
-  # from google.cloud import storage
 
   logging.getLogger().setLevel(logging.INFO)  # TODO: make level configurable
-  # TODO: we could instead check for region 'eu' and use 'eu-automl.googleapis.com:443'endpoint
-  # in that case, instead of requiring endpoint to be specified.
-  # if api_endpoint:
-  #   client_options = ClientOptions(api_endpoint=api_endpoint)
-  #   client = automl.TablesClient(project=gcp_project_id, region=gcp_region,
-  #       client_options=client_options)
-  # else:
-  #   client = automl.TablesClient(project=gcp_project_id, region=gcp_region)
 
   thresholds_dict = json.loads(thresholds)
   logging.info('thresholds dict: {}'.format(thresholds_dict))
@@ -78,12 +61,12 @@ def regression_threshold_check(eval_info):
       if eresults[k] > v:
         logging.info('{} > {}; returning False'.format(
            eresults[k], v))
-        return (False, eresults)
+        return ('False', eresults)
       elif eresults[k] < v:
         logging.info('{} < {}; returning False'.format(
            eresults[k], v))
-        return (False, eresults)
-    return (True, eresults)
+        return ('False', eresults)
+    return ('deploy', eresults)
 
   def classif_threshold_check(eval_info):
     eresults = {}
@@ -108,13 +91,13 @@ def classif_threshold_check(eval_info):
       if eresults[k] > v:
         logging.info('{} > {}; returning False'.format(
            eresults[k], v))
-        return (False, eresults)
+        return ('False', eresults)
       else:
         if eresults[k] < v:
           logging.info('{} < {}; returning False'.format(
              eresults[k], v))
-          return (False, eresults)
-    return (True, eresults)
+          return ('False', eresults)
+    return ('deploy', eresults)
 
   with open(eval_data_path, 'rb') as f:
     logging.info('successfully opened eval_data_path {}'.format(eval_data_path))
@@ -177,13 +160,13 @@ def classif_threshold_check(eval_info):
        mlpipeline_ui_metadata_file.write(json.dumps(metadata))
      logging.info('deploy flag: {}'.format(res))
      return res
-    return True
+    return 'deploy'
   except Exception as e:
     logging.warning(e)
     # If can't reconstruct the eval, or don't have thresholds defined,
     # return True as a signal to deploy.
     # TODO: is this the right default?
-    return True
+    return 'deploy'
 
 
 if __name__ == '__main__':
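The bool-to-string switch in this component reflects how KFP v1 serializes component outputs to strings before any downstream comparison, so the deploy flag is emitted as the string 'deploy' or 'False' rather than a Python bool. A simplified sketch of the pattern (not the component's exact logic; for `mean_absolute_error`, lower is better):

```python
import json

def regression_threshold_check(eresults, thresholds='{"mean_absolute_error": 460}'):
  # Return the string 'deploy' (rather than the bool True) when every metric
  # clears its threshold, and 'False' otherwise; the pipeline's dsl.Condition
  # then compares against a string value.
  thresholds_dict = json.loads(thresholds)
  for k, v in thresholds_dict.items():
    if k in eresults and eresults[k] > v:
      return ('False', eresults)
  return ('deploy', eresults)

print(regression_threshold_check({'mean_absolute_error': 440}))
# -> ('deploy', {'mean_absolute_error': 440})
```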

ml/automl/tables/kfp_e2e/create_model_for_tables/tables_eval_metrics_component.yaml

Lines changed: 28 additions & 62 deletions
@@ -4,7 +4,7 @@ inputs:
   type: evals
 - name: thresholds
   type: String
-  default: '{"mean_absolute_error": 450}'
+  default: '{"mean_absolute_error": 460}'
   optional: true
 - name: confidence_threshold
   type: Float
@@ -16,7 +16,7 @@ outputs:
 - name: mlpipeline_metrics
   type: UI_metrics
 - name: deploy
-  type: Boolean
+  type: String
 implementation:
   container:
     image: python:3.7
@@ -25,64 +25,35 @@ implementation:
     - -u
     - -c
     - |
-      class OutputPath:
-          '''When creating component from function, OutputPath should be used as function parameter annotation to tell the system that the function wants to output data by writing it into a file with the given path instead of returning the data from the function.'''
-          def __init__(self, type=None):
-              self.type = type
-
       def _make_parent_dirs_and_return_path(file_path: str):
           import os
          os.makedirs(os.path.dirname(file_path), exist_ok=True)
          return file_path
 
-      class InputPath:
-          '''When creating component from function, InputPath should be used as function parameter annotation to tell the system to pass the *data file path* to the function instead of passing the actual data.'''
-          def __init__(self, type=None):
-              self.type = type
-
-      from typing import NamedTuple
-
       def automl_eval_metrics(
-          # gcp_project_id: str,
-          # gcp_region: str,
-          # model_display_name: str,
-          eval_data_path: InputPath('evals'),
-          mlpipeline_ui_metadata_path: OutputPath('UI_metadata'),
-          mlpipeline_metrics_path: OutputPath('UI_metrics'),
-          # api_endpoint: str = None,
+          eval_data_path ,
+          mlpipeline_ui_metadata_path ,
+          mlpipeline_metrics_path ,
           # thresholds: str = '{"au_prc": 0.9}',
-          thresholds: str = '{"mean_absolute_error": 450}',
-          confidence_threshold: float = 0.5 # for classification
+          thresholds = '{"mean_absolute_error": 460}',
+          confidence_threshold = 0.5 # for classification
 
-      ) -> NamedTuple('Outputs', [('deploy', bool)]):
+      # ) -> NamedTuple('Outputs', [('deploy', str)]):
+      ) :
         import subprocess
         import sys
-        # we could build a base image that includes these libraries if we don't want to do
-        # the dynamic installation when the step runs.
-        # subprocess.run([sys.executable, '-m', 'pip', 'install', 'googleapis-common-protos==1.6.0',
-        #     '--no-warn-script-location'], env={'PIP_DISABLE_PIP_VERSION_CHECK': '1'}, check=True)
-        # subprocess.run([sys.executable, '-m', 'pip', 'install', 'google-cloud-automl==0.9.0',
-        #     'google-cloud-storage',
-        #     '--no-warn-script-location'], env={'PIP_DISABLE_PIP_VERSION_CHECK': '1'}, check=True)
+        subprocess.run([sys.executable, '-m', 'pip', 'install', 'googleapis-common-protos==1.6.0',
+            '--no-warn-script-location'], env={'PIP_DISABLE_PIP_VERSION_CHECK': '1'}, check=True)
+        subprocess.run([sys.executable, '-m', 'pip', 'install', 'google-cloud-automl==0.9.0',
+            'google-cloud-storage',
+            '--no-warn-script-location'], env={'PIP_DISABLE_PIP_VERSION_CHECK': '1'}, check=True)
 
-        # import google
+        import google
         import json
         import logging
         import pickle
-        # from google.api_core.client_options import ClientOptions
-        # from google.api_core import exceptions
-        # from google.cloud import automl_v1beta1 as automl
-        # from google.cloud import storage
 
         logging.getLogger().setLevel(logging.INFO)  # TODO: make level configurable
-        # TODO: we could instead check for region 'eu' and use 'eu-automl.googleapis.com:443'endpoint
-        # in that case, instead of requiring endpoint to be specified.
-        # if api_endpoint:
-        #   client_options = ClientOptions(api_endpoint=api_endpoint)
-        #   client = automl.TablesClient(project=gcp_project_id, region=gcp_region,
-        #       client_options=client_options)
-        # else:
-        #   client = automl.TablesClient(project=gcp_project_id, region=gcp_region)
 
         thresholds_dict = json.loads(thresholds)
         logging.info('thresholds dict: {}'.format(thresholds_dict))
@@ -102,12 +73,12 @@ implementation:
             if eresults[k] > v:
               logging.info('{} > {}; returning False'.format(
                  eresults[k], v))
-              return (False, eresults)
+              return ('False', eresults)
             elif eresults[k] < v:
               logging.info('{} < {}; returning False'.format(
                  eresults[k], v))
-              return (False, eresults)
-          return (True, eresults)
+              return ('False', eresults)
+          return ('deploy', eresults)
 
         def classif_threshold_check(eval_info):
           eresults = {}
@@ -132,13 +103,13 @@ implementation:
             if eresults[k] > v:
               logging.info('{} > {}; returning False'.format(
                  eresults[k], v))
-              return (False, eresults)
+              return ('False', eresults)
             else:
               if eresults[k] < v:
                 logging.info('{} < {}; returning False'.format(
                    eresults[k], v))
-                return (False, eresults)
-          return (True, eresults)
+                return ('False', eresults)
+          return ('deploy', eresults)
 
         with open(eval_data_path, 'rb') as f:
           logging.info('successfully opened eval_data_path {}'.format(eval_data_path))
@@ -201,20 +172,18 @@ implementation:
              mlpipeline_ui_metadata_file.write(json.dumps(metadata))
            logging.info('deploy flag: {}'.format(res))
            return res
-          return True
+          return 'deploy'
         except Exception as e:
           logging.warning(e)
           # If can't reconstruct the eval, or don't have thresholds defined,
           # return True as a signal to deploy.
           # TODO: is this the right default?
-          return True
+          return 'deploy'
 
-      def _serialize_bool(bool_value: bool) -> str:
-          if isinstance(bool_value, str):
-              return bool_value
-          if not isinstance(bool_value, bool):
-              raise TypeError('Value "{}" has type "{}" instead of bool.'.format(str(bool_value), str(type(bool_value))))
-          return str(bool_value)
+      def _serialize_str(str_value: str) -> str:
+          if not isinstance(str_value, str):
+              raise TypeError('Value "{}" has type "{}" instead of str.'.format(str(str_value), str(type(str_value))))
+          return str_value
 
       import argparse
       _parser = argparse.ArgumentParser(prog='Automl eval metrics', description='')
@@ -229,11 +198,8 @@ implementation:
 
       _outputs = automl_eval_metrics(**_parsed_args)
 
-      if not hasattr(_outputs, '__getitem__') or isinstance(_outputs, str):
-          _outputs = [_outputs]
-
       _output_serializers = [
-          _serialize_bool,
+          _serialize_str,
 
       ]
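Component YAML files like this one are generated from the Python function rather than edited by hand. A sketch of how such a spec is regenerated, assuming a recent KFP v1 SDK; `flag_op` is a toy stand-in for `automl_eval_metrics`, and the output file name is illustrative:

```python
from typing import NamedTuple
from kfp.components import create_component_from_func

def flag_op(value: float) -> NamedTuple('Outputs', [('deploy', str)]):
  # Mirror the component above: emit a string flag, which the generated YAML
  # declares as `type: String` and serializes via a helper like _serialize_str.
  return ('deploy',) if value < 460 else ('False',)

# Writing the spec produces a YAML with the same generated scaffolding
# (helper functions, argparse wrapper, output serializers) seen in this diff.
create_component_from_func(
    flag_op,
    base_image='python:3.7',
    output_component_file='flag_op_component.yaml')
```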

ml/automl/tables/kfp_e2e/tables_pipeline_caip.py

Lines changed: 1 addition & 6 deletions
@@ -126,16 +126,11 @@ def automl_tables( #pylint: disable=unused-argument
   )
 
   eval_metrics = eval_metrics_op(
-      # gcp_project_id=gcp_project_id,
-      # gcp_region=gcp_region,
-      # bucket_name=bucket_name,
-      # api_endpoint=api_endpoint,
-      # model_display_name=train_model.outputs['model_display_name'],
       thresholds=thresholds,
       eval_data=eval_model.outputs['eval_data'],
   )
 
-  with dsl.Condition(eval_metrics.outputs['deploy'] == True):
+  with dsl.Condition(eval_metrics.outputs['deploy'] == 'True'):
     deploy_model = deploy_model_op(
       gcp_project_id=gcp_project_id,
       gcp_region=gcp_region,
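Since the `deploy` output now arrives as a serialized string, the condition compares it to a string literal. A self-contained sketch of the pattern, assuming the KFP v1 SDK; both `ContainerOp`s are toy stand-ins for the real eval-metrics and deploy components:

```python
import kfp.dsl as dsl

@dsl.pipeline(name='conditional-deploy-sketch')
def conditional_deploy():
  # Toy eval step that writes its 'deploy' output as a plain string.
  check = dsl.ContainerOp(
      name='eval-metrics',
      image='python:3.7',
      command=['sh', '-c', 'echo -n True > /tmp/deploy'],
      file_outputs={'deploy': '/tmp/deploy'},
  )
  # Compare against a string literal: output parameter values are strings by
  # the time the condition is evaluated.
  with dsl.Condition(check.outputs['deploy'] == 'True'):
    dsl.ContainerOp(
        name='deploy-model',
        image='python:3.7',
        command=['echo', 'deploying model'],
    )
```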
ml/automl/tables/kfp_e2e/tables_pipeline_caip.py.tar.gz

-188 Bytes
Binary file not shown.

ml/automl/tables/kfp_e2e/tables_pipeline_kf.py

Lines changed: 1 addition & 6 deletions
@@ -126,16 +126,11 @@ def automl_tables( #pylint: disable=unused-argument
   ).apply(gcp.use_gcp_secret('user-gcp-sa'))
 
   eval_metrics = eval_metrics_op(
-      # gcp_project_id=gcp_project_id,
-      # gcp_region=gcp_region,
-      # bucket_name=bucket_name,
-      # api_endpoint=api_endpoint,
-      # model_display_name=train_model.outputs['model_display_name'],
       thresholds=thresholds,
       eval_data=eval_model.outputs['eval_data'],
   ).apply(gcp.use_gcp_secret('user-gcp-sa'))
 
-  with dsl.Condition(eval_metrics.outputs['deploy'] == True):
+  with dsl.Condition(eval_metrics.outputs['deploy'] == 'd'):
     deploy_model = deploy_model_op(
       gcp_project_id=gcp_project_id,
       gcp_region=gcp_region,
ml/automl/tables/kfp_e2e/tables_pipeline_kf.py.tar.gz

-215 Bytes
Binary file not shown.
