Machine Learning
Zhi-Hua Zhou
Nanjing University
Nanjing, Jiangsu, China
Translated by
Shaowu Liu
University of Technology Sydney
Ultimo, NSW, Australia
ISBN 978-981-15-1966-6 ISBN 978-981-15-1967-3 (eBook)
https://doi.org/10.1007/978-981-15-1967-3
Translation from the Chinese language edition: Machine Learning by Zhi-Hua Zhou and Shaowu Liu, © Tsinghua University Press 2016. Published by Tsinghua University Press. All Rights Reserved.
© Springer Nature Singapore Pte Ltd. 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This is an introductory-level machine learning textbook. To make the content accessible to a wider readership, the author has tried to reduce the use of mathematics. However, to gain a decent understanding of machine learning, basic knowledge of probability, statistics, algebra, optimization, and logic seems unavoidable. Therefore, this book is more appropriate for advanced undergraduate or graduate students in science and engineering, as well as practitioners and researchers with equivalent background knowledge.
The book has 16 chapters that can be roughly divided into three parts. The first part includes Chapters 1–3, which introduce the basics of machine learning. The second part includes Chapters 4–10, which present some classic and popular machine learning methods. The third part includes Chapters 11–16, which cover advanced topics. As a textbook, Chapters 1–10 can be taught in one semester at the undergraduate level, while the whole book could be used at the graduate level.
This introductory textbook aims to cover the core topics of machine learning in one semester, and hence is unable to provide detailed discussions of many important frontier research works. The author believes that, for readers new to this field, it is more important to gain a broad view than to drill down into the fine details; in-depth discussions are therefore left to advanced courses. However, readers who wish to explore particular topics of interest are encouraged to follow the further reading section at the end of each chapter.
The book was originally published in Chinese and enjoyed a wide readership in the Chinese community. The author would like to thank Dr. Shaowu Liu for his great effort in translating the book into English, and to thank Springer for the publication.
Zhi-Hua Zhou
Nanjing, China
Contents
1 Introduction
  1.1 Introduction
  1.2 Terminology
  1.3 Hypothesis Space
  1.4 Inductive Bias
  1.5 Brief History
  1.6 Application Status
  1.7 Further Reading
  Exercises
  Break Time
  References
2 Model Selection and Evaluation
  2.1 Empirical Error and Overfitting
  2.2 Evaluation Methods
    2.2.1 Hold-Out
    2.2.2 Cross-Validation
    2.2.3 Bootstrapping
    2.2.4 Parameter Tuning and Final Model
  2.3 Performance Measure
    2.3.1 Error Rate and Accuracy
    2.3.2 Precision, Recall, and F1
    2.3.3 ROC and AUC
    2.3.4 Cost-Sensitive Error Rate and Cost Curve
  2.4 Comparison Test
    2.4.1 Hypothesis Testing
    2.4.2 Cross-Validated t-Test
    2.4.3 McNemar's Test
    2.4.4 Friedman Test and Nemenyi Post-hoc Test
  2.5 Bias and Variance
  2.6 Further Reading
  Exercises
  Break Time
  References
3 Linear Models
  3.1 Basic Form
  3.2 Linear Regression
  3.3 Logistic Regression
  3.4 Linear Discriminant Analysis
  3.5 Multiclass Classification
  3.6 Class Imbalance Problem
  3.7 Further Reading
  Exercises
  Break Time
  References
4 Decision Trees
  4.1 Basic Process
  4.2 Split Selection
    4.2.1 Information Gain
    4.2.2 Gain Ratio
    4.2.3 Gini Index
  4.3 Pruning
    4.3.1 Pre-pruning
    4.3.2 Post-pruning
  4.4 Continuous and Missing Values
    4.4.1 Handling Continuous Values
    4.4.2 Handling Missing Values
  4.5 Multivariate Decision Trees
  4.6 Further Reading
  Exercises
  Break Time
  References
5 Neural Networks
  5.1 Neuron Model
  5.2 Perceptron and Multi-layer Network
  5.3 Error Backpropagation Algorithm
  5.4 Global Minimum and Local Minimum
  5.5 Other Common Neural Networks
    5.5.1 RBF Network
    5.5.2 ART Network
    5.5.3 SOM Network
    5.5.4 Cascade-Correlation Network
    5.5.5 Elman Network
    5.5.6 Boltzmann Machine
  5.6 Deep Learning
  5.7 Further Reading
  Exercises
  Break Time
  References
6 Support Vector Machine
  6.1 Margin and Support Vector
  6.2 Dual Problem
  6.3 Kernel Function
  6.4 Soft Margin and Regularization
  6.5 Support Vector Regression
  6.6 Kernel Methods
  6.7 Further Reading
  Exercises
  Break Time
  References
7 Bayes Classifiers
  7.1 Bayesian Decision Theory
  7.2 Maximum Likelihood Estimation
  7.3 Naïve Bayes Classifier
  7.4 Semi-Naïve Bayes Classifier
  7.5 Bayesian Network
    7.5.1 Network Structure
    7.5.2 Learning
    7.5.3 Inference
  7.6 EM Algorithm
  7.7 Further Reading
  Exercises
  Break Time
  References
8 Ensemble Learning
  8.1 Individual and Ensemble
  8.2 Boosting
  8.3 Bagging and Random Forest
    8.3.1 Bagging
    8.3.2 Random Forest
  8.4 Combination Strategies
    8.4.1 Averaging
    8.4.2 Voting
    8.4.3 Combining by Learning
  8.5 Diversity
    8.5.1 Error-Ambiguity Decomposition
    8.5.2 Diversity Measures
    8.5.3 Diversity Generation
  8.6 Further Reading
  Exercises
  Break Time
  References
9 Clustering
  9.1 Clustering Problem
  9.2 Performance Measure
  9.3 Distance Calculation
  9.4 Prototype Clustering
    9.4.1 k-Means Clustering
    9.4.2 Learning Vector Quantization
    9.4.3 Mixture-of-Gaussian Clustering
  9.5 Density Clustering
  9.6 Hierarchical Clustering
  9.7 Further Reading
  Exercises
  Break Time
  References
10 Dimensionality Reduction and Metric Learning
  10.1 k-Nearest Neighbor Learning
  10.2 Low-Dimensional Embedding
  10.3 Principal Component Analysis
  10.4 Kernelized PCA
  10.5 Manifold Learning
    10.5.1 Isometric Mapping
    10.5.2 Locally Linear Embedding
  10.6 Metric Learning
  10.7 Further Reading
  Exercises
  Break Time
  References
11 Feature Selection and Sparse Learning
  11.1 Subset Search and Evaluation
  11.2 Filter Methods
  11.3 Wrapper Methods
  11.4 Embedded Methods and L1 Regularization
  11.5 Sparse Representation and Dictionary Learning
  11.6 Compressed Sensing
  11.7 Further Reading
  Exercises
  Break Time
  References
12 Computational Learning Theory
  12.1 Basic Knowledge
  12.2 PAC Learning
  12.3 Finite Hypothesis Space
    12.3.1 Separable Case
    12.3.2 Non-separable Case
  12.4 VC Dimension
  12.5 Rademacher Complexity
  12.6 Stability
  12.7 Further Reading
  Exercises
  Break Time
  References
13 Semi-Supervised Learning
  13.1 Unlabeled Samples
  13.2 Generative Methods
  13.3 Semi-Supervised SVM
  13.4 Graph-Based Semi-Supervised Learning
  13.5 Disagreement-Based Methods
  13.6 Semi-Supervised Clustering
  13.7 Further Reading
  Exercises
  Break Time
  References
14 Probabilistic Graphical Models
  14.1 Hidden Markov Model
  14.2 Markov Random Field
  14.3 Conditional Random Field
  14.4 Learning and Inference
    14.4.1 Variable Elimination
    14.4.2 Belief Propagation
  14.5 Approximate Inference
    14.5.1 MCMC Sampling
    14.5.2 Variational Inference
  14.6 Topic Model
  14.7 Further Reading
  Exercises
  Break Time
  References
15 Rule Learning
  15.1 Basic Concepts
  15.2 Sequential Covering
  15.3 Pruning Optimization
  15.4 First-Order Rule Learning
  15.5 Inductive Logic Programming
    15.5.1 Least General Generalization
    15.5.2 Inverse Resolution
  15.6 Further Reading
  Exercises
  Break Time
  References
16 Reinforcement Learning
  16.1 Task and Reward
  16.2 K-Armed Bandit
    16.2.1 Exploration Versus Exploitation
    16.2.2 ε-Greedy
    16.2.3 Softmax
  16.3 Model-Based Learning
    16.3.1 Policy Evaluation
    16.3.2 Policy Improvement
    16.3.3 Policy Iteration and Value Iteration
  16.4 Model-Free Learning
    16.4.1 Monte Carlo Reinforcement Learning
    16.4.2 Temporal Difference Learning
  16.5 Value Function Approximation
  16.6 Imitation Learning
    16.6.1 Direct Imitation Learning
    16.6.2 Inverse Reinforcement Learning
  16.7 Further Reading
  Exercises
  Break Time
  References
Appendix A: Matrix
Appendix B: Optimization
Appendix C: Probability Distributions
Index
Symbols

x                Scalar
x                Vector
x                Variable set
A                Matrix
I                Identity matrix
X                Sample space or state space
D                Probability distribution
D                Data set
H                Hypothesis space
H                Hypothesis set
L                Learning algorithm
(·, ·, ·)        Row vector
(·; ·; ·)        Column vector
(·)^T            Transpose of vector or matrix
{· · ·}          Set
|{· · ·}|        Number of elements in set {· · ·}
‖·‖_p            L_p norm; L_2 norm when p is absent
P(·), P(·|·)     Probability mass function, conditional probability mass function
p(·), p(·|·)     Probability density function, conditional probability density function
E_{·∼D}[f(·)]    Expectation of function f(·) with respect to · over distribution D; D and/or · are omitted when the context is clear
sup(·)           Supremum
I(·)             Indicator function; returns 1 if · is true and 0 if · is false
sign(·)          Sign function; returns −1 if · < 0, 0 if · = 0, and 1 if · > 0
1 Introduction
Table of Contents
1.1 Introduction
1.2 Terminology
1.3 Hypothesis Space
1.4 Inductive Bias
1.5 Brief History
1.6 Application Status
1.7 Further Reading
References
1.1 Introduction
After a drizzle, we take a walk on the wet street. Feeling the gentle breeze and seeing the sunset glow, we bet the weather must be nice tomorrow. Walking to a fruit stand, we pick up a green watermelon with a curly root and a muffled sound; while hoping the watermelon is ripe, we also expect some good academic marks this semester after all the hard work on our studies. We wish readers to share the same confidence in their studies, but to begin with, let us have an informal discussion of what machine learning is.
Taking a closer look at the scenario described above, we notice that it involves many experience-based predictions. For example, why would we expect beautiful weather tomorrow after observing the gentle breeze and sunset glow? We expect this beautiful weather because, from our experience, the weather on the following day is often beautiful when we experience such a scene in the present day. Also, why do we pick the watermelon with green color, curly root, and muffled sound? It is because we have eaten and enjoyed many watermelons, and those satisfying the above criteria are usually ripe. Similarly, our learning experience tells us that hard work leads to good academic marks. We are confident in our predictions because we learned from experience and made experience-based decisions.
While humans learn from experience, can computers do the same? The answer is "yes", and machine learning is what we need. Machine learning is the technique that improves system performance by learning from experience via computational methods. In computer systems, experience exists in the form of data, and the main task of machine learning is to develop learning algorithms that build models from data. By feeding the learning algorithm with experience data, we obtain a model that can make predictions (e.g., the watermelon is ripe) on new observations (e.g., an uncut watermelon). If we consider computer science as the subject of algorithms, then machine learning is the subject of learning algorithms. Mitchell (1997) provides a more formal definition: "A computer program is said to learn from experience E for some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

In this book, we use "model" as a general term for the outcome learned from data. In some other literature, e.g., Hand et al. (2001), the term "model" may refer to the global outcome (e.g., a decision tree), while the term "pattern" refers to the local outcome (e.g., a single rule).
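
To make this experience-to-model-to-prediction pipeline concrete, here is a minimal sketch (not from the book) in Python using scikit-learn, with a hypothetical numeric encoding of the watermelon attributes; the specific algorithm and encoding are illustrative assumptions, not the book's method:

    # Hypothetical encoding (for illustration only):
    # color: 0=light, 1=green, 2=dark; root: 0=straight, 1=curly;
    # sound: 0=crisp, 1=dull, 2=muffled.
    from sklearn.tree import DecisionTreeClassifier

    # "Experience": past watermelon records with known outcomes.
    X_train = [[2, 1, 2],   # dark, curly, muffled
               [1, 1, 1],   # green, curly, dull
               [0, 0, 0]]   # light, straight, crisp
    y_train = ["ripe", "ripe", "unripe"]

    # Learning algorithm + experience data -> model.
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # The model makes a prediction for a new, uncut watermelon.
    print(model.predict([[1, 1, 2]]))

Any learning algorithm could stand in for the decision tree here; the point is only the flow from experience data to a model that predicts on new observations.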
1.2 Terminology
To conduct machine learning, we must have data first. Suppose we have collected a set of watermelon records, for example: (color = dark; root = curly; sound = muffled), (color = green; root = curly; sound = dull), (color = light; root = straight; sound = crisp), . . . , where each pair of parentheses encloses one record and "=" means "takes value".
Collectively, the records form a data set, where each record contains the description of an event or object, e.g., a watermelon. A record, also called an instance or a sample, describes some attributes of the event or object, e.g., the color, root, and sound of a watermelon. These descriptions are often called attributes or features, and their values, such as green and dark, are called attribute values. The space spanned by the attributes is called an attribute space, sample space, or input space. (The entire data set may also be seen as a "sample" sampled from the sample space; hence, depending on the context, a "sample" can refer to either an individual data instance or a data set.) For example, if we consider color, root, and sound as three axes, then they span a three-dimensional space describing watermelons, and we can position every watermelon in this space. Since every point in the space corresponds to a position vector, an instance is also called a feature vector.

More generally, let D = {x_1, x_2, . . . , x_m} be a data set containing m instances, where each instance is described by d attributes. For example, we use three attributes to describe watermelons. Each instance x_i = (x_i1; x_i2; . . . ; x_id) ∈ X is a vector in the d-dimensional sample space X, where d is called the dimensionality of the instance x_i, and x_ij is the value of the jth attribute of the instance x_i. For example, at the beginning of this section, the second attribute of the third watermelon takes the value straight.
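
As a sketch of this notation (not from the book), the data set D can be stored as an m × d matrix whose rows are the feature vectors x_i, reusing the hypothetical numeric encoding from the previous example:

    import numpy as np

    # D holds m = 3 instances, each described by d = 3 attributes
    # (color, root, sound), so each row is one feature vector x_i.
    D = np.array([[2, 1, 2],    # x_1: dark,  curly,    muffled
                  [1, 1, 1],    # x_2: green, curly,    dull
                  [0, 0, 0]])   # x_3: light, straight, crisp

    m, d = D.shape    # m = 3 instances, d = 3 dimensions
    x_3 = D[2]        # the third instance
    print(x_3[1])     # its 2nd attribute (root): 0, i.e., straight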
The process of using machine learning algorithms to build models from data is called learning or training. The data used in the training phase is called training data, in which each sample is a training example (also called a training instance), and the set of all training examples is called a training set. Since a learned model corresponds to the underlying rules about the data, it is also called a hypothesis, and the actual underlying rules are called the facts or ground-truth. The objective of machine learning is then to find or approximate the ground-truth. In this book, models are sometimes called learners, which are machine learning algorithms instantiated with data and parameters. (Learning algorithms often have parameters, and different parameter settings and training data lead to different learning outcomes.)

Nevertheless, the samples in our watermelon example are not sufficient for learning a model that can determine the ripeness of uncut watermelons. In order to train an effective prediction model, the outcome information must also be available, e.g., ripe in ((color = green; root = curly; sound = muffled), ripe). The outcome of a sample, such as ripe or unripe, is often called a label, and a sample with a label is called an example. More generally, we can write the ith sample as (x_i, y_i), where y_i ∈ Y is the label of the sample x_i, and Y is the set of all labels, also called the label space or output space. (If we consider the label as part of the data sample, then example and sample can be used interchangeably.)

When the prediction output is discrete, such as ripe and unripe, the learning task is called a classification problem; when the prediction output is continuous, such as the degree of ripeness, it is called a regression problem.
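
As a small illustration (not from the book), labeled examples (x_i, y_i) differ between the two task types only in the label space Y; the ripeness degrees below are made up for illustration:

    # Classification: Y = {"ripe", "unripe"}, a discrete label space.
    classification_examples = [([2, 1, 2], "ripe"),
                               ([0, 0, 0], "unripe")]

    # Regression: Y is continuous, e.g., a hypothetical
    # ripeness degree in [0, 1].
    regression_examples = [([2, 1, 2], 0.95),
                           ([0, 0, 0], 0.20)]

    print(classification_examples[0])   # (x_1, y_1) = ([2, 1, 2], 'ripe')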