Exercise 2

The document outlines a demonstration for finding and identifying clues necessary for creating extractors in IBM's BigInsights Text Analytics. It details the steps for navigating the Web UI, creating a project, loading data files, and identifying positive and negative clues related to IBM Watson technology. The goal is to equip users with the skills to extract relevant information from documents effectively.

Uploaded by

amine Mokhtar

Unit 2 Task analysis

Demonstration 1
Finding and identifying clues

Positive clues: Watson, IBM, Technology, Solutions, Computer, System

False positive clues: Todd Watson, Research

Task Analysis © Copyright IBM Corporation 2015


This material is meant for IBM Academic Initiative use only. NOT FOR RESALE

© Copyright IBM Corp. 2012, 2015

Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

Demonstration 1:
Finding and identifying clues

Purpose:
This demonstration shows you how to find and identify the clues that are
needed for the extractor. In real life, this process would typically be done with
assistance from a subject matter expert, or someone who is familiar with the
documents that you are examining. Before starting this demonstration,
ensure that all the necessary Ambari services are up. If you have just
completed the first demonstration, the services are already running. Otherwise,
refer to the first demonstration to set them up.
User ids / Passwords
OS: biadmin/biadmin
Root: root/dalvm3
Ambari: admin/admin
BigInsights Home: guest/guest-password

Ambari Services Required:

- HDFS
- MapReduce2
- YARN
- Knox (also start the Demo LDAP service)
- BigInsights - Text Analytics
- BigInsights - Home

Task 1. Finding your way around the Web UI.

1. With the required services started, open up a new browser (or a new tab).
2. Go to the BigInsights - Home page. Use the bookmark saved in the Firefox
browser, or this URL:
https://ibmclass.localdomain:8443/gateway/default/BigInsightsWeb/index.html#/welcome
3. Click Text Analytics to load the Web UI.
4. You used the Web UI in the first demo, but spend a little more time on it
now to make sure you know your way around. If you already feel comfortable
with it, you may skip this task. The left side of the UI shows your Projects and
Extractors. Click the Ed Demo project to load it (if it is not already loaded).
This loads all of its extractors onto the canvas, along with the documents that
were used in that project.

5. Click on the Extractor tab to see the list of the pre-built and custom-built
extractors. You can drag and drop these directly onto the canvas to start using
them.
6. On the canvas, select the Degree extractor.
7. Expand the Extractor Properties pane to see its settings. You may need to
resize the pane by clicking and dragging its edge. Play around with this to get
comfortable resizing the panes.
Note: You can only resize a pane while it is expanded.
8. Under the Extractor Properties, there are three sub-tabs: General, Settings,
and Output. Click the General tab (if it isn't already on it).
9. On the General tab, you can edit the name, provide a description, or define
tags that make the extractor easier to find in the Extractor catalogs. You do
not need to change anything here; this is just for your information.
10. Click on the Settings tab. Here, you can modify the terms in the dictionary
(in this case). For a different type of extractor, this tab shows that extractor's
settings instead.
11. Click on the Output tab. Here you can specify the output columns of the
extractor.
12. On the canvas, click on the Education History extractor and run it.
13. Go ahead and collapse the Extractor Properties and expand and resize the
Results pane so that it is more visible.
14. Each tab on the Results pane comes from a single extractor. In our case, we
have a single union of multiple extractors, so we have a single tab. Within that
one tab, however, we have multiple results, one for each of the extractors that
make up the union. Examine the results to see the various columns.
15. Click on any row and you will see that the results are highlighted within the
document on the Documents pane (on the right).
16. Remember, you have the option to export your results as a CSV file for further
analysis with a different tool.
17. On the Documents pane, you can toggle between single document view and
multiple document views. Go ahead and click on it to see it in action.


18. Next to that is another button, Show Extractor Name. This is a nice little
feature that tells you which extractor found the results. For example, select one
of the rows from the Results pane.

19. Now click the Show Extractor Name button to see which extractor found it.

In this case, there is only one extractor, but if you run multiple
extractors, you can use this button to find out which one captured a given
result. This can help with debugging if you find terms that should or should
not be part of the result set.
20. Finally, the third button is the Remove tag / Remap tag. This is used for
documents where you may have tags, such as XML documents.
21. If you need additional help, at the upper right corner, there is a dropdown icon.
Click on that and you can visit the help section for Text Analytics.


Task 2. Creating the Watson project.

1. On the Project pane, click the green plus.
2. Specify the name Watson for the project.

Task 3. Loading the data files.

1. On the Documents pane, click the green plus.

2. Specify the file type as Text files and the file location as Local files. Click
Browse to select the files.


3. Navigate to /home/biadmin/labfiles/WatsonData/Data/.

4. Press CTRL + A to select all the files, then click Open.


5. Click Add to add the files.

6. The documents are loaded.
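
To cross-check from a terminal which files the Browse dialog should offer, a short Python sketch such as the one below could list the text files in the data directory. The directory path is the one given in step 3. The function name `list_text_files` is hypothetical, and the sketch assumes the data files carry a .txt extension.

```python
from pathlib import Path


def list_text_files(directory):
    """Return the sorted names of the .txt files directly inside directory."""
    return sorted(p.name for p in Path(directory).glob("*.txt") if p.is_file())


if __name__ == "__main__":
    # Path taken from step 3 of this task.
    data_dir = "/home/biadmin/labfiles/WatsonData/Data/"
    names = list_text_files(data_dir)
    print(f"{len(names)} text files found")
    # Show only the first few names to keep the output short.
    for name in names[:5]:
        print(name)
```

The count printed here can be compared with the number of documents shown in the Documents pane after the upload.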


Task 4. Identifying and creating a list of the clues.

In this task, you create a list of clues that you will use to build your
extractor. Your test data consists of a number of files containing a collection
of blogs and news posts retrieved from various social media sites using the
BigInsights sample Boardreader application. Each post is stored in an
XML-encoded format. You use this test data to find examples of the type of
information that you want to extract, and you build your extractors based on
those examples.
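
Because the posts are XML-encoded, the markup gets in the way when you read them. The Web UI handles this with its Remove tag button. Outside the Web UI, a rough equivalent might look like the following Python sketch. It assumes a simple regular-expression approach, which is not necessarily what the tool does internally. The function name `strip_tags` and the sample markup are hypothetical; the actual Boardreader XML layout is not shown in this guide.

```python
import html
import re

# Matches anything that looks like an XML tag, e.g. <post> or </title>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(raw):
    """Remove XML-style tags, decode entities, and collapse whitespace."""
    without_tags = TAG_PATTERN.sub(" ", raw)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


# Hypothetical sample; the real files may use different element names.
sample = (
    "<post><title>Case competition</title>"
    "<body>IBM &amp; UR announced winners.</body></post>"
)
print(strip_tags(sample))
```

A regular expression is enough for skimming text in this way, but it is not a full XML parser and will not handle constructs such as CDATA sections.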
1. Locate the file SM001.txt. Select that file and choose the Single View to show
only one document at a time.

2. Next, make it easier to read by removing the tags by clicking on the Remove
tag icon.

3. This is a copy of the text:

The University of Rochester (UR) Simon School of Business and IBM
today announced winners of the first Watson academic case competition.
Part of a series for students studying a variety of academic
concentrations, the competition develops new ideas for harnessing IBM
Watson technology to solve daunting societal and business challenges
while helping students advance technology and business skills for jobs of
the future.
4. Since the goal of the task is to find social data that references IBM Watson, the
first snippet of interest would naturally be the word Watson. Make a note of this
word in a Notepad or a text editor of your choice. We'll keep a running note
here:
Positive clues: Watson
5. It is easy for you, as a human being, to scan through these files and find those
that are referencing the Watson technology as opposed to someone’s name or
a place. But that same innate capability does not exist for a computer. You are
going to have to give the computer both positive and negative clues for it to be
able to recognize the appropriate Watson reference.


The first reference to Watson in the text is related to a competition. The
second reference is IBM Watson technology, which is a reference of interest.
It carries two valuable clues: IBM and technology. It is the word Watson in
context with these clue words that allows us to infer the meaning of Watson
as it is used here.
Positive clues: Watson, IBM, Technology
6. Locate the SM010.txt file.
7. Examine the file and take note of the words Solutions and computer. These
clues also relate to the Watson technology and will help the computer figure
out if the Watson within the document is the Watson we want.
Positive clues: Watson, IBM, Technology, Solutions, Computer
8. Locate the SM005.txt file.
9. Examine this file and take note of the word System.
Positive clues: Watson, IBM, Technology, Solutions, Computer, System
10. Locate the SM011.txt file.
11. Examine the document and take note of the word Jeopardy.
Positive clues: Watson, IBM, Technology, Solutions, Computer, System,
Jeopardy
12. Locate the SM063.txt file.
13. Here we look for some negative clues, that is, clues that may give false
positives (for example, matching Watson where it has nothing to do with the
technology but is instead a person's name or something of that nature).
False positive clues: Todd Watson
14. Locate the SM121.txt file. It's on page 25 if you are searching by page number.
15. This file mentions the Watson Research Center in Yorktown Heights. Research
is another good false positive clue:
False positive clues: Todd Watson, Research
16. At this point, we have enough information to work with to demonstrate the
capability of BigInsights Text Analytics.
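
The clue lists collected above can be turned into a simple context check. The following is a minimal Python sketch and not the BigInsights Text Analytics extractor itself. The function name `classify_watson_mentions` and the 60-character context window are assumptions made for illustration. The sketch labels a Watson mention as relevant when a positive clue appears nearby, and as a false positive when a false positive clue appears nearby.

```python
import re

# Clue lists taken from the running notes in this task.
POSITIVE_CLUES = ["IBM", "Technology", "Solutions", "Computer", "System", "Jeopardy"]
FALSE_POSITIVE_CLUES = ["Todd Watson", "Research"]


def classify_watson_mentions(text, window=60):
    """Return a list of (offset, label) pairs, one per occurrence of 'Watson'.

    label is 'relevant', 'false positive', or 'unknown'. The window size is
    an illustrative assumption, not a BigInsights setting.
    """
    results = []
    for match in re.finditer(r"\bWatson\b", text):
        # Look at a fixed number of characters on each side of the mention.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        context = text[start:end].lower()
        # False positive clues take precedence over positive clues.
        if any(clue.lower() in context for clue in FALSE_POSITIVE_CLUES):
            label = "false positive"
        elif any(clue.lower() in context for clue in POSITIVE_CLUES):
            label = "relevant"
        else:
            label = "unknown"
        results.append((match.start(), label))
    return results


for sentence in [
    "harnessing IBM Watson technology to solve business challenges",
    "Watson Research Center in Yorktown Heights",
]:
    print([label for _, label in classify_watson_mentions(sentence)])
```

Checking the false positive clues first is a design choice in this sketch: a mention near both kinds of clue is rejected rather than accepted. A fixed character window is also crude, so a clue from a neighboring sentence can influence the label.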
Results:
At the end of this demo, you should be able to identify clues that are needed
for the extractors. You understand that typically, this process would involve
someone who is familiar with the documents, such as a subject matter expert.
