Artificial Intelligence
Lecture 8
Bicol University College of Science
1st Semester 2021-2022
Machine Learning Overview
Paradigm
• Traditional approach
• Machine learning approach

Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
Machine Learning (ML)
• ML is a branch of artificial intelligence:
• Uses computing-based systems to make sense of data
• Extracting patterns, fitting data to functions, classifying data, etc.
• ML systems can learn and improve
• With historical data, time, and experience
• Bridges theoretical computer science and real, noisy data.
ML in real-life
ML in a Nutshell
• Tens of thousands of machine learning algorithms
• Hundreds of new ones every year
• Every machine learning algorithm has three
components:
– Representation
– Evaluation
– Optimization
ML Components
• Representation
  – Numerical functions
    ● Linear regression
    ● Neural networks
    ● Support vector machines
  – Symbolic functions
    ● Decision trees
    ● Sets of rules / Logic programs
  – Instance-based functions
    ● Nearest-neighbor
    ● Case-based
  – Probabilistic Graphical Models
    ● Naïve Bayes
    ● Bayesian networks
    ● Hidden Markov Models (HMMs)
    ● Probabilistic Context-Free Grammars (PCFGs)
    ● Markov networks
ML Components
• Various Search/Optimization Algorithms
  – Gradient descent (see the sketch after this list)
    ● Perceptron
    ● Backpropagation
  – Dynamic Programming
    ● HMM Learning
    ● PCFG Learning
  – Divide and Conquer
    ● Decision tree induction
    ● Rule learning
  – Evolutionary Computation
    ● Genetic Algorithms (GAs)
    ● Genetic Programming (GP)
    ● Neuro-evolution
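A minimal sketch of plain gradient descent, the optimizer named above; the function f(w) = (w - 3)^2, learning rate, and iteration count are illustrative choices, not from the slides:

    # Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
    w = 0.0          # initial guess
    lr = 0.1         # learning rate (illustrative)
    for _ in range(50):
        grad = 2 * (w - 3)   # derivative f'(w)
        w -= lr * grad       # step against the gradient
    print(w)                 # converges close to 3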
ML Components
• Evaluation
– Accuracy
– Precision and recall
– Squared error
– Likelihood
– Posterior probability
– Cost / Utility
– Margin
– Entropy
– K-L divergence
– Etc. (a few of these are computed in the sketch below)
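A minimal sketch computing some of these evaluation measures with scikit-learn; the toy labels are illustrative:

    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, mean_squared_error)

    y_true = [1, 0, 1, 1, 0]      # toy ground-truth labels
    y_pred = [1, 0, 0, 1, 1]      # toy predictions
    print(accuracy_score(y_true, y_pred))    # accuracy
    print(precision_score(y_true, y_pred))   # precision
    print(recall_score(y_true, y_pred))      # recall
    # Squared error applies to numeric (regression) predictions:
    print(mean_squared_error([1.5, 2.0], [1.0, 2.5]))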
Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
– regression: predict numerical values
– classification: predict categorical values, i.e., labels
• Unsupervised learning
– Training data does not include desired outputs
– clustering: group data according to "distance"
– association: find frequent co-occurrences
– link prediction: discover relationships in data
– data reduction: project features to fewer features
• Reinforcement learning
– Rewards from sequence of actions
Classification
Object recognition
https://ai.googleblog.com/2014/09/building-deeper-understanding-of-images.html
Reinforcement learning
Learning to play Breakout
https://www.youtube.com/watch?v=V1eYniJ0Rnk
Clustering
Crime prediction using k-means clustering
http://www.grdjournals.com/uploads/article/GRDJE/V02/I05/0176/GRDJEV02I050176.pdf
Machine learning algorithms
• Regression:
Ridge regression, Support Vector Machines, Random Forest, Multilayer Neural Networks, Deep Neural Networks, ...
• Classification:
Naive Bayes, Support Vector Machines, Random Forest, Multilayer Neural Networks, Deep Neural Networks, ...
• Clustering:
k-Means, Hierarchical Clustering, ...
Issues
• Many machine learning/AI projects fail (Gartner claims 85%)
• Ethics, e.g., Amazon has/had an AI system that automatically fired sub-par employees
Reasons for failure
• Asking the wrong question
• Trying to solve the wrong problem
• Not having enough data
• Not having the right data
• Having too much data
• Hiring the wrong people
• Using the wrong tools
• Not having the right model
• Not having the right yardstick
Frameworks
• Programming languages (fast-evolving ecosystem!)
  – Python
  – R
  – C++
  – ...
• Many libraries
  – scikit-learn (classic machine learning)
  – PyTorch (deep learning framework)
  – TensorFlow (deep learning framework)
  – Keras (deep learning framework)
  – …
scikit-learn
• Nice end-to-end framework
  – data exploration (+ pandas + holoviews)
  – data preprocessing (+ pandas)
    ● cleaning/missing values
    ● normalization
  – training
  – testing
  – application
• "Classic" machine learning only
• https://scikit-learn.org/stable/
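A minimal end-to-end sketch of that workflow; the dataset, model, and parameter choices are illustrative, not prescribed by the slides:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Load data and hold out a test set.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    # Chain preprocessing (normalization) with a "classic" ML model.
    model = Pipeline([
        ("scale", StandardScaler()),
        ("svm", SVC(kernel="rbf")),
    ])
    model.fit(X_train, y_train)           # training
    print(model.score(X_test, y_test))    # testing: mean accuracy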
Keras
• High-level framework for deep learning
• TensorFlow backend
• Layer types
– dense
– convolutional
– pooling
– embedding
– recurrent
– activation
– …
• https://keras.io/
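A sketch of a Keras Sequential model combining several of the layer types above; the input shape and layer sizes are illustrative:

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu",
                      input_shape=(28, 28, 1)),   # convolutional
        layers.MaxPooling2D((2, 2)),              # pooling
        layers.Flatten(),
        layers.Dense(64, activation="relu"),      # dense (+ activation)
        layers.Dense(10, activation="softmax"),   # output layer
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()   # prints the layer stack and parameter counts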
Supervised and Unsupervised Learning
• Unsupervised Learning
• There is no predefined, known set of outcomes
• Look for hidden patterns and relations in the data
• A typical example: Clustering (see the sketch below)

[Figure: k-means clusters of the iris data (irisCluster$cluster), Petal.Width vs. Petal.Length]
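A minimal k-means sketch on the iris petal features, similar in spirit to the figure; k = 3 is an assumption:

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    X = load_iris().data[:, 2:4]   # Petal.Length, Petal.Width
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_[:10])       # cluster assignment per sample
    print(kmeans.cluster_centers_)   # one center per cluster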
Supervised and Unsupervised Learning
• Supervised Learning
• For every example in the data there is always a predefined outcome
• Models the relations between a set of descriptive features and a target (fits data to a function)
• 2 groups of problems:
  • Classification
  • Regression
Supervised Learning
• Classification
• Predicts which class a given sample of data (sample of descriptive features) is part of (discrete value).

Example confusion matrix (iris, percent):

Predicted \ Actual | setosa | versicolor | virginica
setosa             | 100.0  | 0.0        | 0.0
versicolor         | 0.0    | 96.0       | 4.0
virginica          | 0.0    | 4.0        | 96.0

• Regression
• Predicts continuous values.
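A sketch producing such a confusion matrix; the classifier choice and data split are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    # Rows = actual class, columns = predicted class (counts, not percent).
    print(confusion_matrix(y_test, clf.predict(X_test)))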
Machine Learning as a Process
• Define Objectives
  – Define measurable and quantifiable goals
  – Use this stage to learn about the problem
• Data Preparation
  – Normalization
  – Transformation
  – Missing Values
  – Outliers
• Model Building
  – Data Splitting
  – Features Engineering
  – Estimating Performance
  – Evaluation and Model Selection
• Model Evaluation
  – Study model accuracy
  – Work better than the naïve approach or previous system
  – Do the results make sense in the context of the problem
• Model Deployment
ML as a Process: Data Preparation
• Needed for several reasons
  • Some models have strict data requirements
    • Scale of the data, data point intervals, etc.
  • Some characteristics of the data may dramatically impact model performance
• Time spent on data preparation should not be underestimated
• Raw Data issues: Missing Values, Error Values, Different Scales, Skewness, Dimensionality, Types Problems, Outliers, many others
• Data Transformation: Scaling, Centering, Missing Values, Errors
• Result: Modeling-Ready Data for the Modeling phase
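A minimal sketch of two common preparation steps (imputing missing values, then centering and scaling); the toy array is illustrative:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    X = np.array([[1.0, 200.0],
                  [2.0, np.nan],    # missing value
                  [3.0, 600.0]])    # features on very different scales

    X = SimpleImputer(strategy="mean").fit_transform(X)  # fill missing values
    X = StandardScaler().fit_transform(X)                # centering + scaling
    print(X)   # each column now has zero mean and unit variance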
ML as a Process: Feature engineering
• Determining the predictors (features) to be used is one of the most critical questions
• Sometimes we need to add predictors
• Reduce their number:
  • Fewer predictors → a more interpretable and less costly model
  • Most models are affected by high dimensionality, especially by non-informative predictors
• Two families of selection methods:
  – Wrappers: algorithms that evaluate multiple models, adding and removing predictors, with models as input and a performance parameter as output (e.g., Genetic Algorithms)
  – Filters: evaluate the relevance of each predictor, normally based on correlations (see the sketch below)
• Binning predictors
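A sketch of a simple filter method in scikit-learn: score each predictor independently against the target and keep the k best (k = 2 is an arbitrary choice):

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_iris(return_X_y=True)
    selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
    print(selector.scores_)                     # relevance score per feature
    print(selector.get_support(indices=True))   # indices of selected features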
View of Std ML Datasets - a Single Table (2D array)

          | Feature 1 | Feature 2 | ... | Feature N | Output Category
Example 1 | 0.0       | small     | ... | red       | true
Example 2 | 9.3       | medium    | ... | red       | false
Example 3 | 8.2       | small     | ... | blue      | false
...
Example M | 5.7       | medium    | ... | green     | true
ML as a Process: Model Building
• Data Splitting
  • Allocate data to different tasks
    • model training
    • performance evaluation
  • Define Training, Validation and Test sets (see the sketch below)
• Feature Selection (review the decisions made previously)
• Estimating Performance
  • Visualization of results – discovering interesting areas of the problem space
  • Statistics and performance measures
• Evaluation and Model selection
  • The 'no free lunch' theorem: no a priori assumptions can be made
  • Avoid defaulting to favorite models
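A sketch of splitting data into training, validation, and test sets; the 60/20/20 proportions are an assumption, not from the slides:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    # First split off 40%, then halve it into validation and test sets.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, random_state=0)
    print(len(X_train), len(X_val), len(X_test))   # 90 30 30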
Nearest Neighbors: Basic Algorithm for Classification
• Find the K nearest neighbors to the test-set example
  • Or find all examples within radius R
• Combine their 'votes'
  – Most common category
  – Average value (real-valued prediction)
  – Can also weight votes by distance
  – Lots of variations on the basic theme
Simple Example: 1-NN
(1-NN ≡ one nearest neighbor)

Training Set
1. a=0, b=0, c=1 → +
2. a=0, b=0, c=0 → -
3. a=1, b=1, c=1 → -

Test Example
a=0, b=1, c=0 → ?

"Hamming Distance" (# of different bits)
Ex 1 = 2
Ex 2 = 1 ← nearest, so output -
Ex 3 = 2
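The same example as a tiny Python sketch:

    def hamming(a, b):
        """Number of positions where two bit vectors differ."""
        return sum(x != y for x, y in zip(a, b))

    train = [((0, 0, 1), "+"), ((0, 0, 0), "-"), ((1, 1, 1), "-")]
    test = (0, 1, 0)

    # 1-NN: take the label of the training example at minimal distance.
    label = min(train, key=lambda ex: hamming(ex[0], test))[1]
    print(label)   # "-" (example 2 is nearest, at distance 1)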
From neurons to ANNs
• Inspired by biological neurons
• Inputs $x_1, \dots, x_N$, weights $w_1, \dots, w_N$, bias $b$ (fed by a constant input $+1$)
• Output: $y = \sigma\left(\sum_{i=1}^{N} w_i x_i + b\right)$, where $\sigma(x)$ is the activation function
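A direct transcription of that formula for a single neuron; the input values, weights, and choice of sigmoid activation are illustrative:

    import numpy as np

    def sigmoid(z):
        """A common choice for the activation function sigma."""
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0, 2.0])   # inputs x_1..x_N
    w = np.array([0.8, 0.2, -0.5])   # weights w_1..w_N
    b = 0.1                          # bias

    y = sigmoid(np.dot(w, x) + b)    # y = sigma(sum_i w_i x_i + b)
    print(y)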
Multilayer network
How to determine the weights?
Training: backpropagation
• Initialize weights "randomly"
• For all training epochs
  • for all input–output pairs in the training set
    • using the input, compute the output (forward pass)
    • compare the computed output with the training output
    • adapt the weights (backward pass) to improve the output
  • if accuracy is good enough, stop
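In practice a framework runs this loop for you; a Keras sketch in which the toy data, architecture, and epoch count are all illustrative:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    X = np.random.rand(100, 4)              # toy inputs
    y = (X.sum(axis=1) > 2).astype(int)     # toy binary targets

    model = keras.Sequential([
        layers.Dense(8, activation="relu", input_shape=(4,)),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="sgd", loss="binary_crossentropy",
                  metrics=["accuracy"])
    # fit() runs the epoch loop: forward pass, compare, backward update.
    model.fit(X, y, epochs=20, verbose=0)
    print(model.evaluate(X, y, verbose=0))   # [loss, accuracy]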
Task: handwritten digit recognition
• Input data
  • grayscale image
• Output data
  • digit 0, 1, ..., 9
• Training examples
• Test examples
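The standard dataset for this task, MNIST, ships with Keras; a sketch of loading it (the printed shapes are what the dataset provides):

    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    print(x_train.shape)   # (60000, 28, 28) grayscale training images
    print(x_test.shape)    # (10000, 28, 28) test images
    print(y_train[:10])    # digit labels 0..9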
Deep neural networks
• Many layers
• Features are learned, not given
• Low-level features combined into
high-level features
• Special types of layers
• convolutional
• drop-out
• recurrent
• ...
Convolutional neural networks
• Convolutional layers slide a small kernel matrix over the input, e.g. the diagonal kernel $\begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix}$

Convolution examples
• Diagonal kernels: $\begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix}$
• Anti-diagonal kernels: $\begin{bmatrix} 0 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 0 \end{bmatrix}$
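A sketch of applying such a kernel; the 5×5 toy image and the 2×2 diagonal kernel are illustrative:

    import numpy as np
    from scipy.signal import convolve2d

    image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
    kernel = np.array([[1.0, 0.0],
                       [0.0, 1.0]])   # small diagonal kernel

    # Each output pixel is a weighted sum of the pixels under the kernel.
    print(convolve2d(image, kernel, mode="valid"))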
Task: sentiment classification
• Input data
  • movie review (English), e.g.: "<start> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it ..."
• Output data
  • sentiment: positive / negative
• Training examples
• Test examples
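The review shown appears to be the first example of the IMDB dataset that ships with Keras; a sketch of loading it (num_words=10000 is an arbitrary vocabulary cap):

    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(
        num_words=10000)
    print(len(x_train), len(x_test))     # 25000 training, 25000 test reviews
    print(x_train[0][:10], y_train[0])   # word indices of one review, label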
Word embedding
• Represent words as one-hot vectors
  – length = vocabulary size
  – Issues: unwieldy, no semantics
• Word embeddings
  – dense vector
  – vector distance ≈ semantic distance
• Training
  – use context
  – discover relations with surrounding words
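A sketch contrasting the two representations; the vocabulary size and embedding dimension are arbitrary choices:

    import numpy as np
    from tensorflow.keras import layers

    vocab_size, embed_dim = 10000, 32

    # One-hot: a vocabulary-size vector with a single 1; unwieldy, no semantics.
    one_hot = np.zeros(vocab_size)
    one_hot[42] = 1.0
    print(one_hot.shape)   # (10000,) per word

    # Embedding: maps a word index to a dense 32-dimensional vector.
    embed = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
    print(embed(np.array([[42]])).shape)   # (1, 1, 32)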
End