Introduction to
Machine Learning
1
Objectives
• Understand machine learning
• Identify types of machine learning
• Explain terminology related to machine learning
• Understand SAS Viya for machine learning
2
What is Machine Learning?
Machine learning allows computers to learn
and infer from data.
Machine Learning gives computers the ability
to learn without being explicitly programmed.
3
Types of Machine Learning Algorithms
Supervised Learning
• data points have known outcome
• models relationships and dependencies between the target prediction output and the input features
• main types: regression and classification
Unsupervised Learning
• data points have unknown outcome
• trained with unlabeled data
• used in pattern detection and descriptive modeling
• uses techniques on the input data to mine for rules, detect patterns, and summarize and group the data points
• main types: clustering algorithms and association rule learning algorithms
Semi-supervised Learning
• falls in between supervised and unsupervised
• useful for model building when the cost to label is quite high, since labeling requires skilled human experts
Reinforcement Learning
• aims at using observations gathered from the interaction with the environment to take actions that would maximize the reward or minimize the risk
• agent in the algorithm continuously learns from the environment in an iterative fashion
• agent learns from its experiences of the environment until it explores the full range of possible states
• used in computer-played board games (Chess, Go), robotic hands, and self-driving cars
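To make the unsupervised case concrete, here is a minimal sketch of clustering, one of the main types named above. It runs a one-dimensional k-means with two clusters on the petal length values from the iris table used later in this deck. The function name `kmeans_1d`, the choice of two clusters, and the starting centers are assumptions made for illustration, not part of the original slides.

```python
# Minimal 1-D k-means sketch (k = 2); illustrative only.
# Petal lengths copied from the iris table in this deck; species is NOT used.
petal_lengths = [5.2, 5.6, 1.4, 4.9, 1.4, 1.4, 5.1, 1.3, 1.4, 1.7]


def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and re-averaging."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Starting centers are an arbitrary choice: the smallest and largest value.
centers, clusters = kmeans_1d(petal_lengths, [min(petal_lengths), max(petal_lengths)])
print(centers)
print(clusters)
```

The algorithm never sees the species column, yet it separates the short-petal flowers from the long-petal ones, which is the "group the data points" idea from the slide.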
4
Types of Supervised Learning
• Regression: outcome is continuous (numerical)
• Classification: outcome is a category
5
Machine Learning Vocabulary
• Target: predicted category or value of the data (column to predict). Synonyms: Response, Output, Dependent Variable, Labels
• Features: properties of the data used for prediction (non-target columns). Synonyms: Predictors, Input, Independent Variables, Attributes
• Example: a single data point within the data (one row). Synonyms: Observation, Record, Instance, Datapoint, Row
• Label: the target value for a single data point. Synonyms: Answer, y-value, Category
6
Machine Learning Vocabulary
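Features: sepal length, sepal width, petal length, petal width
Target: species

sepal length   sepal width   petal length   petal width   species
6.7            3.0           5.2            2.3           virginica
6.4            2.8           5.6            2.1           virginica
4.6            3.4           1.4            0.3           setosa
6.9            3.1           4.9            1.5           versicolor
4.4            2.9           1.4            0.2           setosa
4.8            3.0           1.4            0.1           setosa
5.9            3.0           5.1            1.8           virginica
5.4            3.9           1.3            0.4           setosa
4.9            3.0           1.4            0.2           setosa
5.4            3.4           1.7            0.2           setosa

Example: one row of the table, for instance 6.9, 3.1, 4.9, 1.5, versicolor
Label: the target value of a single example, for instance versicolor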
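The vocabulary can be mapped onto code as a short sketch. The rows are copied from the iris table above. The variable names (`X`, `y`, `feature_names`, and so on) are a common convention assumed here for illustration, and the choice of the fourth row as the example is arbitrary.

```python
# Rows copied from the iris table on the slide.
columns = ["sepal length", "sepal width", "petal length", "petal width", "species"]
rows = [
    (6.7, 3.0, 5.2, 2.3, "virginica"),
    (6.4, 2.8, 5.6, 2.1, "virginica"),
    (4.6, 3.4, 1.4, 0.3, "setosa"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (4.4, 2.9, 1.4, 0.2, "setosa"),
    (4.8, 3.0, 1.4, 0.1, "setosa"),
    (5.9, 3.0, 5.1, 1.8, "virginica"),
    (5.4, 3.9, 1.3, 0.4, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (5.4, 3.4, 1.7, 0.2, "setosa"),
]

# Target: the column to predict. Features: every non-target column.
target_name = "species"
feature_names = [name for name in columns if name != target_name]

X = [list(row[:-1]) for row in rows]  # feature values, one list per example
y = [row[-1] for row in rows]         # labels, one per example

# Example: a single row. Label: the target value of that row.
example = rows[3]  # arbitrary choice of row for illustration
label = y[3]

print(feature_names)
print(example, "->", label)
```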
10
Supervised Learning Overview
Training: data with answers + model -> fit -> model
Test/Actual Application: data without answers + model -> predict -> predicted answers
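The fit and predict steps can be sketched in a few lines. This is a toy 1-nearest-neighbor model, chosen only because it is short; the class name `NearestNeighborModel` and the two unlabeled points are hypothetical. The training values are petal length and petal width taken from rows of the iris table in this deck.

```python
import math


class NearestNeighborModel:
    """Toy 1-nearest-neighbor model with the fit/predict shape from the slide."""

    def fit(self, features, answers):
        # "Fitting" here simply memorizes the training data.
        self.features = list(features)
        self.answers = list(answers)
        return self

    def predict(self, features):
        predictions = []
        for point in features:
            distances = [math.dist(point, seen) for seen in self.features]
            # Answer of the closest training example.
            predictions.append(self.answers[distances.index(min(distances))])
        return predictions


# Training: data with answers, (petal length, petal width) -> species.
train_features = [(5.2, 2.3), (5.6, 2.1), (1.4, 0.3), (4.9, 1.5), (1.4, 0.2), (5.1, 1.8)]
train_answers = ["virginica", "virginica", "setosa", "versicolor", "setosa", "virginica"]
model = NearestNeighborModel().fit(train_features, train_answers)

# Test/actual application: data without answers (these two points are made up).
new_features = [(1.5, 0.2), (5.0, 2.0)]
predicted_answers = model.predict(new_features)
print(predicted_answers)
```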
11
Regression: Numeric Answers
Training: movie data with revenue + model -> fit -> model
Test/Actual Application: movie data (unknown revenue) + model -> predict -> predicted revenue
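A minimal regression sketch, assuming a single feature: the slide does not say which movie properties are used, so production budget is a hypothetical choice, and all numbers below are invented for illustration. The model is a straight line fitted by ordinary least squares.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_revenue(line, budget):
    """Apply the fitted line to a movie whose revenue is unknown."""
    slope, intercept = line
    return slope * budget + intercept


# Training: movie data with revenue (made-up values, in millions).
budgets = [10, 20, 30, 40]
revenues = [25, 45, 65, 85]
line = fit_line(budgets, revenues)

# Test/actual application: a movie with unknown revenue.
predicted = predict_revenue(line, 50)
print(line, predicted)
```

The prediction is a number on a continuous scale, which is what distinguishes regression from classification.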
12
Classification: Categorical Answers
Training: labeled data + model -> fit -> model
Test/Actual Application: unlabeled data + model -> predict -> labels
13
Classification: Categorical Answers
Training: emails labeled as spam/not spam + model -> fit -> model
Test/Actual Application: unlabeled emails + model -> predict -> spam or not spam
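A minimal sketch of the spam example, assuming a very simple word-count score rather than any particular production method. The emails are invented, and the function names `fit_word_counts` and `predict_label` are hypothetical. Each word votes for the label it appeared under more often during training.

```python
from collections import Counter


def fit_word_counts(labeled_emails):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in labeled_emails:
        counts[label].update(text.lower().split())
    return counts


def predict_label(counts, text):
    """Sum per-word votes; a positive score means more spam evidence."""
    score = 0
    for word in text.lower().split():
        score += counts["spam"][word] - counts["not spam"][word]
    # Ties (score of 0) default to "not spam" in this sketch.
    return "spam" if score > 0 else "not spam"


# Training: emails labeled as spam/not spam (made-up examples).
training_emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday", "not spam"),
]
counts = fit_word_counts(training_emails)

# Test/actual application: unlabeled emails.
predictions = [
    predict_label(counts, "claim your free money"),
    predict_label(counts, "agenda for lunch meeting"),
]
print(predictions)
```

The output is one of a fixed set of categories, which is what makes this a classification problem.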
14
Model Fitting in Machine Learning
Machine Learning Pipeline Engine
Machine Learning in Our Daily Lives
• Spam Filtering
• Web Search
• Postal Mail Routing
• Movie Recommendations
• Fraud Detection
• Vehicle Driver Assistance
• Web Advertisements
• Social Networks
• Speech Recognition
16