ECE 498: ST:
Introduction to Applied Machine Learning
• Tao Han, Ph.D.
• Associate Professor
• Electrical and Computer Engineering
• Newark College of Engineering
• New Jersey Institute of Technology
• https://tao-han-njit.netlify.app
These slides are adapted from Prof. Hung-yi Lee's Machine Learning courses at National Taiwan University
Transfer Learning
• Example: a Dog/Cat classifier trained on labeled cat and dog images
• Can we use data not directly related to the task considered?
  • Similar domain, different tasks: e.g., photos of elephants and tigers
  • Different domains, same task: e.g., cartoon images of dogs and cats
Transfer Learning - Overview
• Source data (not directly related to the task): labelled
• Target data: labelled → Model Fine-tuning
• Warning: different terminology is used in different literature
Model Fine-tuning
• Task description
  • Source data: $(x^s, y^s)$ — a large amount
  • Target data: $(x^t, y^t)$ — very little
• One-shot learning: only a few examples in the target domain
• Example: (supervised) speaker adaptation
  • Source data: audio data and transcriptions from many speakers
  • Target data: audio data and its transcriptions of a specific user
• Idea: train a model on the source data, then fine-tune the model on the target data (a sketch follows)
• Challenge: only limited target data, so be careful about overfitting
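The idea fits in a few lines of PyTorch. This is a minimal sketch under assumptions of my own (the architecture, shapes, learning rate, and toy tensors are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

# Stand-in classifier; in practice, load weights trained on the large
# source dataset, e.g. model.load_state_dict(torch.load("source.pt")).
model = nn.Sequential(
    nn.Linear(40, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Toy stand-in for the very small target set (x^t, y^t).
x_t = torch.randn(32, 40)
y_t = torch.randint(0, 10, (32,))

# Fine-tune with a small learning rate and few epochs; both choices
# guard against overfitting the limited target data.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(x_t), y_t)
    loss.backward()
    optimizer.step()
```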
Conservative Training
• Train on the source data (e.g., audio data of many speakers); use the resulting parameters to initialize the target model
• Fine-tune on the target data (a little data from the target speaker)
• Constraint: keep the fine-tuned model close to the source model, e.g., keep the two models' outputs on the same input close, or keep their parameters close (a sketch follows)
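One common way to realize the "parameters close" constraint is an L2 penalty pulling the fine-tuned weights back toward the source weights; a hedged sketch (the penalty weight and data are assumptions, and the "outputs close" variant would instead penalize the distance between the two models' outputs):

```python
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 10))
# Freeze a snapshot of the source-trained parameters; fine-tuning
# starts from them (initialization) and is pulled back toward them.
source = copy.deepcopy(model)
for p in source.parameters():
    p.requires_grad_(False)

x_t = torch.randn(16, 40)              # a little target-speaker data
y_t = torch.randint(0, 10, (16,))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
lam = 0.1                              # strength of the conservative penalty

for step in range(10):
    optimizer.zero_grad()
    task_loss = loss_fn(model(x_t), y_t)
    # Conservative term: squared L2 distance to the source parameters.
    prox = sum((p - q).pow(2).sum()
               for p, q in zip(model.parameters(), source.parameters()))
    (task_loss + lam * prox).backward()
    optimizer.step()
```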
Layer Transfer
• Copy some parameters of the model trained on the source data into the target model
• 1. Train only the remaining layers on the target data (prevents overfitting)
• 2. Fine-tune the whole network (if there is sufficient data)
Layer Transfer
• Which layers can be transferred (copied)?
• Images: usually copy the first few layers
• (Figure: a network from input pixels x1 … xN through Layer 1, Layer 2, …, Layer L to an output class such as "elephant"; a sketch follows)
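In code, layer transfer amounts to copying the early layers and freezing them while training the rest; a sketch with made-up layer sizes:

```python
import copy
import torch
import torch.nn as nn

# Network trained on the source task; the first layers tend to be generic.
source = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),   # layer 1 (transferable)
    nn.Linear(512, 256), nn.ReLU(),   # layer 2 (transferable)
    nn.Linear(256, 100),              # source-task output layer
)

# Target network: copy the first layers, attach a fresh output layer.
target = nn.Sequential(
    copy.deepcopy(source[0]), nn.ReLU(),
    copy.deepcopy(source[2]), nn.ReLU(),
    nn.Linear(256, 10),               # new head for the target task
)

# 1. Train only the rest: freeze the copied layers (prevents overfitting).
for layer in (target[0], target[2]):
    for p in layer.parameters():
        p.requires_grad_(False)
stage1_opt = torch.optim.Adam(
    [p for p in target.parameters() if p.requires_grad], lr=1e-3)

# 2. If there is sufficient target data, unfreeze and fine-tune everything.
for p in target.parameters():
    p.requires_grad_(True)
stage2_opt = torch.optim.Adam(target.parameters(), lr=1e-4)
```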
Transfer Learning - Overview
• Source data (not directly related to the task): labelled
• Target data: labelled → Model Fine-tuning, Multitask Learning
• Warning: different terminology is used in different literature
Multitask Learning
• The multi-layer structure makes NNs suitable for multitask learning
• (Figure, two variants: (a) Tasks A and B share the same input feature and the lower layers, then split into task-specific upper layers; (b) Tasks A and B have different input features for task A and task B, but share some intermediate layers; a sketch of variant (a) follows)
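A sketch of the shared-lower-layers variant (all sizes and the two toy tasks are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    """Shared lower layers plus one output head per task."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(40, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.head_a = nn.Linear(256, 10)   # Task A: 10 classes
        self.head_b = nn.Linear(256, 5)    # Task B: 5 classes

    def forward(self, x):
        h = self.shared(x)                 # representation shared by both tasks
        return self.head_a(h), self.head_b(h)

model = MultitaskNet()
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(8, 40)
y_a = torch.randint(0, 10, (8,))
y_b = torch.randint(0, 5, (8,))

out_a, out_b = model(x)
# Train jointly on the (possibly weighted) sum of the per-task losses,
# so the shared layers benefit from both tasks' data.
loss = loss_fn(out_a, y_a) + loss_fn(out_b, y_b)
loss.backward()
```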
Multitask Learning - Multilingual Speech Recognition
• One network with shared lower layers on top of the acoustic features, and one output layer per language predicting the states of French, German, Spanish, Italian, and Mandarin
• This works because human languages share some common characteristics.
• Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu and Haifeng Wang, "Multi-task learning for multiple language translation", ACL 2015
Multitask Learning - Multilingual
• (Figure: character error rate vs. hours of Mandarin training data, from 1 to 1000 on a log scale; the "Mandarin only" model is compared with the model trained together with European languages, which reaches a lower error rate)
• Huang, Jui-Ting, et al., "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers", ICASSP 2013
Transfer Learning - Overview
• Source data (not directly related to the task): labelled
• Target data: labelled → Model Fine-tuning, Multitask Learning
• Target data: unlabeled → Domain-adaptation
• Warning: different terminology is used in different literature
You have learned a lot about ML. Training a classifier is not a big deal for you. ☺
• But a classifier that reaches 99.5% accuracy on data like its training data can drop to 57.5% on testing data drawn from a different distribution
• The results are from: http://proceedings.mlr.press/v37/ganin15.pdf
• Domain shift: training and testing data have different distributions → domain adaptation
Domain Shift
• Training data come from the source domain; testing data come from the target domain
• (Figure: digit images from both domains; the same digits, e.g., "0" and "1", look visibly different in the two domains)
Domain Adaptation (with labeled data)
• Source domain: labeled data (e.g., digit images labeled "4", "0", "1", …)
• Knowledge of the target domain: little data, but labeled (e.g., an image labeled "8")
• Idea: train a model on the source data, then fine-tune the model on the target data
• Challenge: only limited target data, so be careful about overfitting
Domain Adaptation (with unlabeled data)
• Source domain: labeled data (e.g., digit images labeled "4", "0", "1", "8", …)
• Knowledge of the target domain: a large amount of unlabeled data, rather than little but labeled data
Basic Idea
• Pass both source and target images through the same feature extractor (network)
• The raw inputs follow different distributions, but the extracted features should follow the same distribution
• To achieve this, the feature extractor must discard domain-specific properties, e.g., learn to ignore colors
Domain Adversarial Training
• A feature extractor followed by a label predictor maps an image to a class distribution (e.g., predicting "4")
• Trained on the labeled source data alone, the features of the source data (blue points) and the target data (red points) form clearly separated clusters
Domain Adversarial Training
• Label predictor (parameters $\theta_p$): minimizes the classification loss $L$ on labeled source data: $\theta_p^* = \min_{\theta_p} L$
• Domain classifier (parameters $\theta_d$): takes the features and predicts source vs. target, minimizing the domain loss $L_d$: $\theta_d^* = \min_{\theta_d} L_d$ (it plays the role of a discriminator)
• Feature extractor (parameters $\theta_f$): $\theta_f^* = \min_{\theta_f} (L - L_d)$ (it plays the role of a generator): it learns to "fool" the domain classifier, but also needs to support the label predictor
• Could the extractor cheat, e.g., by outputting features that are always zero so the domain classifier can never tell the domains apart? Such features would also be useless to the label predictor, so the $L$ term rules out this trivial solution.
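The objectives above are usually implemented with a gradient reversal layer, as in Ganin & Lempitsky (2015): the backward pass flips the sign of the gradient flowing from $L_d$ into the feature extractor, so one ordinary training step updates all three parts. A minimal sketch with made-up shapes and data:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates the gradient in the backward
    pass, so the feature extractor ascends L_d while theta_d descends it."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

feature_extractor = nn.Sequential(nn.Linear(40, 64), nn.ReLU())  # theta_f
label_predictor = nn.Linear(64, 10)    # theta_p, minimizes L
domain_classifier = nn.Linear(64, 2)   # theta_d, minimizes L_d
loss_fn = nn.CrossEntropyLoss()

x_s = torch.randn(16, 40)                      # labeled source batch
y_s = torch.randint(0, 10, (16,))
x_t = torch.randn(16, 40)                      # unlabeled target batch

f_s, f_t = feature_extractor(x_s), feature_extractor(x_t)
L = loss_fn(label_predictor(f_s), y_s)         # classification loss

feats = torch.cat([f_s, f_t])
domain = torch.cat([torch.zeros(16, dtype=torch.long),   # 0 = source
                    torch.ones(16, dtype=torch.long)])   # 1 = target
L_d = loss_fn(domain_classifier(GradReverse.apply(feats)), domain)

# One backward pass: theta_p gets dL, theta_d gets dL_d, and the
# reversal layer hands theta_f the gradient of (L - L_d).
(L + L_d).backward()
```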
Domain Adversarial Training
• Yaroslav Ganin, Victor Lempitsky, "Unsupervised Domain Adaptation by Backpropagation", ICML 2015
• Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, "Domain-Adversarial Training of Neural Networks", JMLR 2016
Limitation
• (Figure: class 1 (source), class 2 (source), and target data whose classes are unknown)
• Domain adversarial training aligns the source and target feature distributions, but the decision boundaries are learned from the source domain only
• Source and target data are aligned, but aligned target points can still fall close to those boundaries; ideally the unlabeled target data should lie far from the decision boundary
Considering Decision Boundary
• Feed an unlabeled target example through the feature extractor and the label predictor, and inspect the entropy of the predicted distribution over the classes (e.g., classes 1–5)
• Small entropy (a confident, peaked prediction): the example is far from the decision boundary — good
• Large entropy (a flat prediction): the example is near the boundary — train the model to produce low-entropy predictions on target data (a sketch follows)
• Used in Decision-boundary Iterative Refinement Training with a Teacher (DIRT-T): https://arxiv.org/abs/1802.08735
• Maximum Classifier Discrepancy: https://arxiv.org/abs/1712.02560
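The entropy criterion is cheap to add as an extra loss on an unlabeled target batch; a sketch (the weighting against the source classification loss is an assumption):

```python
import torch
import torch.nn.functional as F

def entropy_loss(logits):
    """Mean entropy of the predicted class distributions: small when the
    classifier is confident, i.e. when points sit far from the boundary."""
    log_p = F.log_softmax(logits, dim=1)
    return -(log_p.exp() * log_p).sum(dim=1).mean()

# Usage sketch: total_loss = source_cls_loss + w * entropy_loss(target_logits)
target_logits = torch.randn(16, 5)   # label-predictor outputs on target data
print(entropy_loss(target_logits))
```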
Transfer Learning - Overview
• Source data (not directly related to the task): labelled
• Target data: labelled → Model Fine-tuning, Multitask Learning
• Target data: unlabeled → Domain-adaptation, Zero-shot learning
• Warning: different terminology is used in different literature
Zero-shot Learning
• Source data: $(x^s, y^s)$ — the training data
• Target data: $x^t$ — the testing data; the training and testing tasks are different
• Example: $x^s$ = images of cats and dogs with labels $y^s$ (cat/dog); at test time, $x^t$ is an image of an alpaca, a class never seen during training
• How do we solve this problem?
Zero-shot Learning
• Represent each class by its attributes, stored in a database:

  class   furry   4 legs   tail
  Dog       O       O       O
  Fish      X       X       O
  Chimp     O       X       X

• Training: the NN learns to predict the attribute vector of the input image rather than its class, e.g., a chimp image → (1, 0, 0) and a dog image → (1, 1, 1) for (furry, 4 legs, tail)
• There must be sufficient attributes for a one-to-one mapping between classes and attribute vectors
Zero-shot Learning
• Representing each class by its attributes
• Testing: the NN predicts the attributes of the test image, e.g., (0, 0, 1) for (furry, 4 legs, tail), and we output the class with the most similar attributes in the database (here: Fish)
• Again, sufficient attributes are needed for a one-to-one mapping (a sketch follows)
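A sketch of both phases. The attribute table is the one above; the network, its input size, and the training details are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Attribute database (furry, 4 legs, tail); O = 1, X = 0.
classes = ["Dog", "Fish", "Chimp"]
attrs = torch.tensor([[1., 1., 1.],    # Dog
                      [0., 0., 1.],    # Fish
                      [1., 0., 0.]])   # Chimp

# The NN outputs one probability per attribute, not one score per class;
# training would use, e.g., binary cross-entropy against the true attributes.
net = nn.Sequential(nn.Linear(40, 64), nn.ReLU(),
                    nn.Linear(64, 3), nn.Sigmoid())

# Testing: predict attributes, then return the class whose attribute
# vector is most similar (smallest squared distance).
x = torch.randn(1, 40)
pred = net(x)                                  # e.g. roughly (0, 0, 1)
dist = (attrs - pred).pow(2).sum(dim=1)
print(classes[dist.argmin()])                  # nearest-attribute class
```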
Zero-shot Learning - Attribute Embedding
• Embed images and class attribute vectors into the same embedding space: image $x^n$ is mapped to $f(x^n)$ and the attribute vector $y^n$ of its class to $g(y^n)$
• $f^*$ and $g^*$ can be NNs; training target: make $f(x^n)$ and $g(y^n)$ as close as possible
• (Figure: in the embedding space, $f(x^1)$ lies near $g(y^1)$ (attributes of chimp) and $f(x^2)$ near $g(y^2)$ (attributes of dog); a test image $x^3$ then embeds near $g(y^3)$ (attributes of alpaca), so the unseen class can be recognized)
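A sketch of training $f$ and $g$ with a plain squared-distance objective between matching pairs. Everything here is an illustrative assumption; note that with matching pairs only, both networks could collapse to a constant embedding, which is why the cited works add ranking/margin terms over non-matching pairs (omitted here):

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(40, 32), nn.ReLU(), nn.Linear(32, 16))  # image -> embedding
g = nn.Sequential(nn.Linear(3, 16))                  # attribute vector -> embedding
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

x = torch.randn(8, 40)                      # images x^n
y = torch.randint(0, 2, (8, 3)).float()     # attribute vectors y^n of their classes

for step in range(10):
    opt.zero_grad()
    # Training target: f(x^n) and g(y^n) as close as possible.
    loss = (f(x) - g(y)).pow(2).sum(dim=1).mean()
    loss.backward()
    opt.step()

# Zero-shot testing: embed a new image with f and predict the class whose
# embedded attribute vector g(y) is nearest, even if that class (e.g. the
# alpaca) never appeared in training.
```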
More about Zero-shot Learning
• Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell, "Zero-shot Learning with Semantic Output Codes", NIPS 2009
• Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid, "Label-Embedding for Attribute-Based Classification", CVPR 2013
• Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov, "DeViSE: A Deep Visual-Semantic Embedding Model", NIPS 2013
• Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean, "Zero-Shot Learning by Convex Combination of Semantic Embeddings", arXiv preprint 2013
• Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko, "Captioning Images with Diverse Objects", arXiv preprint 2016