Google Vision
USER GUIDE
Blue Prism Version: 6.4
Document Revision: 1.1
For more information please contact:
[email protected] | UK: +44 (0) 870 879 3000 | US: +1 888 757 7476
www.blueprism.com
Contents
1. Introduction
2. Solution Overview and Configuration
2.1. Limitations
3. Pre-Requisites and Environment Configuration
3.1. Google Cloud Services Prerequisites
3.2. Blue Prism Configuration
4. Using the Skill
4.1. Common Parameters
4.2. Detect Face
4.3. Get Image Properties
4.4. Safe Search Classification
4.5. Label Entities
4.6. Text Detection
4.7. Logo Detection
4.8. Landmark Detection
4.9. Document Text Extraction (OCR)
The information contained in this document is the proprietary and confidential information of Blue Prism Limited and should not be
disclosed to a third party without the written consent of an authorised Blue Prism representative. No part of this document may be
reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying without the written
permission of Blue Prism Limited.
© Blue Prism Limited, 2001 – 2018
®Blue Prism is a registered trademark of Blue Prism Limited
All trademarks are hereby acknowledged and are used to the benefit of their respective owners.
Blue Prism is not responsible for the content of external websites referenced by this document.
Blue Prism Limited, Centrix House, Crow Lane East, Newton-le-Willows, WA12 9UY, United Kingdom
Registered in England: Reg. No. 4260035. Tel: +44 870 879 3000. Web: www.blueprism.com
Commercial in Confidence Page 2 of 14
1. Introduction
As the market for RPA grows, so does interest in what RPA can do and how easily it can integrate with other
ecosystems. With the arrival of Artificial Intelligence in the marketplace, interest has grown in capabilities
that integrate with pre-trained AI services in the cloud.
This document focuses on the design of the integration between Blue Prism and Google’s Vision Cognitive Service.
Google exposes this service as web services, which are consumed via RESTful APIs.
2. Solution Overview and Configuration
The Google Vision Skill encapsulates the AI Cognitive services offered by Google.
These integrations act as a bridge that connects the client’s processes to the AI services
developed by Google.
Blue Prism’s Google Vision Skill interacts with the Google Cognitive Services by using Blue Prism to construct a
REST call. Blue Prism then handles the response and converts it into easy-to-use outputs,
such as Text, Numbers, or Collections.
All of Google’s services require a service account, which is given to each party as part of their contract with Google’s
authentication server. When registering with Google’s Cloud Platform, you can create service accounts that are
restricted to specific API services. These service accounts are part of the OAuth 2.0 authentication layer, which
allows you to call the APIs seamlessly; Blue Prism handles the whole flow for you. Once a service account
has been saved in Blue Prism, the basic data flow of an API call is as follows:
[Figure: data flow. Textual Inputs -> JSON Request -> Google Cloud Platform -> JSON Response -> Textual Outputs]
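The request half of this data flow can be sketched in Python. This is a minimal illustration, not the Skill's internal implementation: the endpoint and body shape follow Google's public `images:annotate` REST method, while the function names `build_annotate_request` and `build_headers` are hypothetical helpers introduced here.

```python
import base64
import json

# Public REST endpoint of the Cloud Vision API (v1).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes: bytes, feature_type: str) -> str:
    """Return the JSON body for a single-image, single-feature annotate call."""
    body = {
        "requests": [
            {
                # The API expects the raw image bytes as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature_type}],
            }
        ]
    }
    return json.dumps(body)


def build_headers(bearer_token: str) -> dict:
    """Headers for the POST; the token comes from the OAuth 2.0 exchange."""
    return {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json; charset=utf-8",
    }
```

The body and headers would be sent as an HTTP POST to `VISION_ENDPOINT`. The JSON that comes back contains a `responses` array with one entry per request.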
2.1. Limitations
The following limitations should be understood before attempting to use these integrations:
• The customer or partner is responsible for the configuration and maintenance of the relevant cloud
subscriptions and services. Blue Prism cannot provide any support on the configuration of the cloud
environment itself.
• Use of the APIs may incur additional costs, depending on usage.
• There is always a possibility with external services that the APIs will change. This Skill is provided as-is
without warranties, and support is provided by Blue Prism on a best endeavors basis and is not subject to
formal SLAs.
3. Pre-Requisites and Environment Configuration
This section outlines the pre-requisites that are required to use the integrations. Note that Blue Prism is not able to
provide any support in configuring the Google Cloud Services themselves.
3.1. Google Cloud Services Prerequisites
To implement the Google Cognitive Services integration, the following are required:
• A subscription to Google Cloud Platform
• The Vision API, enabled for that subscription
• A service account with access to the Vision API
3.2. Blue Prism Configuration
Before importing the Skill, which has been downloaded from the Digital Exchange, obtain the following
information:
1. Service Account with access to the Vision API
The credential requirements are explained in the next subsection.
If any conflict or overwrite messages appear during import, then please refer to the Release Manager section in the
Product Help.
3.2.1. Credentials
An individual credential, defined and stored in Credential Manager, holds the Service Account information
needed to form an OAuth 2.0 request, to which Google responds with a bearer token. Each action has a common parameter
named “OAuth 2 (JWT Bearer Token) Authentication Credential Name”, which is required to authenticate against the Google
Vision API.
The credential will be imported along with the skill and can then be configured for your environment. The credential
will be of type “OAuth 2.0 (JWT Bearer Token)” and will be named “GoogleJWT”. Once imported, you could even
restrict the credential by robot, but that side of the configuration is down to you.
Figure 6.2.1.A
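To illustrate what an OAuth 2.0 JWT bearer request involves, the sketch below builds the unsigned part of the token and the form fields for the token exchange. It assumes Google's documented service-account flow (token endpoint, `cloud-platform` scope, RS256 header). What Blue Prism sends internally is not stated in this guide, and the helper names are hypothetical. Signing with the private key needs an RSA library and is deliberately left out.

```python
import base64
import json
import time
from typing import Optional

TOKEN_URI = "https://oauth2.googleapis.com/token"
SCOPE = "https://www.googleapis.com/auth/cloud-platform"  # assumed scope
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:jwt-bearer"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_signing_input(issuer: str, now: Optional[int] = None) -> str:
    """Return '<header>.<claims>', the text that is then signed with RS256."""
    now = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": issuer,      # the service account email (the "issuer")
        "scope": SCOPE,
        "aud": TOKEN_URI,
        "iat": now,
        "exp": now + 3600,  # Google accepts at most one hour
    }
    return ".".join(b64url(json.dumps(part).encode("utf-8")) for part in (header, claims))


def build_token_request_form(signed_jwt: str) -> dict:
    """Form fields POSTed to TOKEN_URI; the reply carries the bearer token."""
    return {"grant_type": GRANT_TYPE, "assertion": signed_jwt}
```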
3.2.2. Configure Credential Details
The issuer is the email address listed in the IAM section of Google Cloud Platform -> Service Accounts. It is also
listed in the .json file that was downloaded to your local machine when you originally created the Service Account.
Finally, the private key is the private key listed in that same .json file. When copying and pasting the private
key, you must include the following information:
• -----BEGIN PRIVATE KEY-----\n
• \n-----END PRIVATE KEY-----\n
Save the credential edits. The Google Vision Skill has now been correctly configured.
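As a rough aid when copying these values, the sketch below reads the issuer and private key out of the downloaded key file and checks that the BEGIN and END markers are present. It assumes the standard Google service account key layout (`client_email` and `private_key` fields). The function name is hypothetical. Note that parsing the JSON turns the `\n` escapes into real newline characters; whether the Blue Prism credential field expects the escaped form exactly as it appears in the file is not verified here.

```python
import json

BEGIN = "-----BEGIN PRIVATE KEY-----\n"
END = "\n-----END PRIVATE KEY-----\n"


def read_credential_fields(key_file_text: str) -> dict:
    """Extract the issuer and private key from a service account key file."""
    key_file = json.loads(key_file_text)
    private_key = key_file["private_key"]
    # Reject keys that were truncated and lost a marker line.
    if not (private_key.startswith(BEGIN) and private_key.endswith(END)):
        raise ValueError("private key is missing its BEGIN/END marker lines")
    return {"issuer": key_file["client_email"], "private_key": private_key}
```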
4. Using the Skill
The following section outlines the individual configuration and usage of each action in the Google Vision Skill. In
total, this Skill contains 8 actions:
• Detect Face
• Get Image Properties
• Safe Search Classification
• Label Entities
• Text Detection
• Logo Detection
• Landmark Detection
• Document Text Extraction (OCR)
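For orientation, each action appears to correspond to one feature type of the Vision API. The mapping below is an assumption based on the action names and Google's public feature list, not something confirmed by the Skill itself. The second value is the field in which the API returns that feature's result.

```python
# Assumed mapping: Skill action -> (Vision feature type, response field).
ACTION_TO_FEATURE = {
    "Detect Face": ("FACE_DETECTION", "faceAnnotations"),
    "Get Image Properties": ("IMAGE_PROPERTIES", "imagePropertiesAnnotation"),
    "Safe Search Classification": ("SAFE_SEARCH_DETECTION", "safeSearchAnnotation"),
    "Label Entities": ("LABEL_DETECTION", "labelAnnotations"),
    "Text Detection": ("TEXT_DETECTION", "textAnnotations"),
    "Logo Detection": ("LOGO_DETECTION", "logoAnnotations"),
    "Landmark Detection": ("LANDMARK_DETECTION", "landmarkAnnotations"),
    "Document Text Extraction (OCR)": ("DOCUMENT_TEXT_DETECTION", "fullTextAnnotation"),
}


def feature_for(action_name: str) -> str:
    """Return the Vision feature type assumed for a Skill action."""
    return ACTION_TO_FEATURE[action_name][0]


def response_field_for(action_name: str) -> str:
    """Return the response field in which that feature's result arrives."""
    return ACTION_TO_FEATURE[action_name][1]
```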
4.1. Common Parameters
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Image to Analyse | In | Binary | The image that is sent to Google’s Vision API to be analysed, in Binary format. |
| OAuth 2 (JWT Bearer Token) Authentication Credential Name | In | Text | The name of the credential which has the OAuth 2.0 information used for authentication with Google. |
4.2. Detect Face
This action detects faces within an image and returns the facial properties found for each one, such as the
likelihood of particular emotions.
4.2.1. Request
Request image is sent via the common parameter listed in section 4.1.
4.2.2. Response
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Facial Properties | Out | Collection | A collection which has all the analysed information related to the image’s facial properties (if present). |
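To give a feel for the data behind the Facial Properties collection, the sketch below summarises the `faceAnnotations` field of a raw Vision API response. The field names are Google's. The function name and the choice to report only emotions rated LIKELY or above are illustrative assumptions, and the exact columns of the Blue Prism collection may differ.

```python
# Emotion fields that the Vision API attaches to each detected face.
LIKELIHOOD_FIELDS = (
    "joyLikelihood",
    "sorrowLikelihood",
    "angerLikelihood",
    "surpriseLikelihood",
)


def summarise_faces(response: dict) -> list:
    """Return one summary per face; `response` is one entry of `responses`."""
    return [
        {
            "confidence": face.get("detectionConfidence", 0.0),
            # Strip the "Likelihood" suffix, e.g. "joyLikelihood" -> "joy".
            "likely_emotions": [
                name[: -len("Likelihood")]
                for name in LIKELIHOOD_FIELDS
                if face.get(name) in ("LIKELY", "VERY_LIKELY")
            ],
        }
        for face in response.get("faceAnnotations", [])
    ]
```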
4.3. Get Image Properties
This action extracts general attributes of the supplied image, such as dominant colours.
4.3.1. Request
Request image is sent via the common parameter listed in section 4.1.
4.3.2. Response
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Dominant Images | Out | Collection | A collection which has all the analysed information related to the dominant colours of the specified image. |
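The dominant colour data in a raw Vision API response can be flattened into rows, as in this sketch. It assumes the `imagePropertiesAnnotation.dominantColors.colors` layout of Google's REST response. The function name and the hex formatting are illustrative choices, not part of the Skill.

```python
def dominant_colours(response: dict) -> list:
    """Return dominant colours as hex strings, largest pixel share first."""
    colours = (
        response.get("imagePropertiesAnnotation", {})
        .get("dominantColors", {})
        .get("colors", [])
    )
    rows = []
    for entry in colours:
        rgb = entry.get("color", {})
        # Channels equal to zero may be omitted from the JSON, so default to 0.
        r, g, b = (int(round(rgb.get(ch, 0))) for ch in ("red", "green", "blue"))
        rows.append(
            {
                "hex": f"#{r:02x}{g:02x}{b:02x}",
                "score": entry.get("score", 0.0),
                "pixel_fraction": entry.get("pixelFraction", 0.0),
            }
        )
    return sorted(rows, key=lambda row: row["pixel_fraction"], reverse=True)
```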
4.4. Safe Search Classification
This action detects explicit content such as adult content or violent content within an image.
4.4.1. Request
Request image is sent via the common parameter listed in section 4.1.
4.4.2. Response
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Safe Search Annotation | Out | Collection | A collection which has all the analysed information related to the potential non-safe imagery (if present). |
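The Vision API reports safe search results as likelihood labels rather than numbers, so a process usually needs a threshold. The sketch below shows one way to apply it, assuming the `safeSearchAnnotation` field and likelihood names from Google's REST response. The function name and the default threshold are hypothetical.

```python
# Likelihood labels in ascending order of certainty.
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]
CATEGORIES = ("adult", "spoof", "medical", "violence", "racy")


def flagged_categories(response: dict, threshold: str = "LIKELY") -> list:
    """Return the categories rated at or above `threshold`."""
    annotation = response.get("safeSearchAnnotation", {})
    minimum = LIKELIHOOD_ORDER.index(threshold)
    return [
        category
        for category in CATEGORIES
        if LIKELIHOOD_ORDER.index(annotation.get(category, "UNKNOWN")) >= minimum
    ]
```

Lowering the threshold to `"POSSIBLE"` flags more images for review, at the cost of more false positives.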
4.5. Label Entities
This action detects broad sets of categories within an image, which range from modes of transportation to animals.
4.5.1. Request
Request image is sent via the common parameter listed in section 4.1.
4.5.2. Response
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Labelled Annotations | Out | Collection | A collection which has all the analysed information related to the image’s entity annotations (if present). |
4.6. Text Detection
This action performs Optical Character Recognition. It detects and extracts text within an image with support for a
broad range of languages. It also features automatic language identification.
4.6.1. Request
Request image is sent via the common parameter listed in section 4.1.
4.6.2. Response
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Detected Text | Out | Text | A collated text data item which has all found text in one sentence. |
| Text Annotations | Out | Collection | A collection which has all the analysed information related to the line-by-line detected text (if present). |
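The two outputs above resemble the shape of the raw `textAnnotations` array, in which the first entry holds the complete detected text and the remaining entries hold individual fragments. The sketch below separates the two. How the Skill itself fills Detected Text and Text Annotations is an assumption, and the function name is hypothetical.

```python
def split_text_annotations(response: dict) -> tuple:
    """Return (full_text, fragments) from one entry of `responses`."""
    annotations = response.get("textAnnotations", [])
    if not annotations:
        return "", []
    # Entry 0 is the whole text; later entries are the individual pieces.
    full_text = annotations[0].get("description", "")
    fragments = [item.get("description", "") for item in annotations[1:]]
    return full_text, fragments
```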
4.7. Logo Detection
This action detects popular product logos within an image.
4.7.1. Request
Request image is sent via the common parameter listed in section 4.1.
4.7.2. Response
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Logo Annotations | Out | Collection | A collection which has all the analysed information related to any found logos (if present). |
4.8. Landmark Detection
This action detects popular natural and man-made structures within an image.
4.8.1. Request
Request image is sent via the common parameter listed in section 4.1.
4.8.2. Response
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Landmark Annotations | Out | Collection | A collection which has all the analysed information related to any detected landmarks (if present). |
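In the raw Vision API response, each landmark carries a name, a score, and a list of geographic locations. The sketch below flattens `landmarkAnnotations` into simple rows, taking the first location only. The field names are Google's. The function name and row layout are illustrative and may not match the columns of the Blue Prism collection.

```python
def landmark_rows(response: dict) -> list:
    """Return name, score and coordinates for each detected landmark."""
    rows = []
    for landmark in response.get("landmarkAnnotations", []):
        locations = landmark.get("locations", [])
        # Use the first reported location, if the API supplied one.
        lat_lng = locations[0].get("latLng", {}) if locations else {}
        rows.append(
            {
                "name": landmark.get("description", ""),
                "score": landmark.get("score", 0.0),
                "latitude": lat_lng.get("latitude"),
                "longitude": lat_lng.get("longitude"),
            }
        )
    return rows
```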
4.9. Document Text Extraction (OCR)
This action performs Optical Character Recognition. This feature detects dense document text in an image.
4.9.1. Request
Request image is sent via the common parameter listed in section 4.1.
4.9.2. Response
| Parameter | Direction | Data Type | Description |
| --- | --- | --- | --- |
| Detected Text | Out | Text | A collated text data item which has all found text in one sentence. |
| Document Text Annotations | Out | Collection | A collection which has all the analysed information related to the line-by-line detected text (if present). |
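For dense documents, the raw Vision API response nests the text as pages, blocks, paragraphs, words, and symbols under `fullTextAnnotation`. The sketch below walks that hierarchy and rebuilds one string per paragraph. It is a simplification: words are joined with single spaces and detected line breaks are ignored. The function name is hypothetical, and the Skill's own collection layout may differ.

```python
def paragraphs_from_full_text(response: dict) -> tuple:
    """Return (full_text, paragraphs) from one entry of `responses`."""
    full = response.get("fullTextAnnotation", {})
    paragraphs = []
    for page in full.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                # A word is a list of symbols, each holding one character.
                words = [
                    "".join(symbol.get("text", "") for symbol in word.get("symbols", []))
                    for word in paragraph.get("words", [])
                ]
                paragraphs.append(" ".join(words))
    return full.get("text", ""), paragraphs
```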