Machine Learning

Zhi-Hua Zhou
Nanjing University
Nanjing, Jiangsu, China

Translated by
Shaowu Liu
University of Technology Sydney
Ultimo, NSW, Australia
ISBN 978-981-15-1966-6 ISBN 978-981-15-1967-3 (eBook)


https://doi.org/10.1007/978-981-15-1967-3

Translation from the Chinese language edition: Machine Learning by Zhi-Hua Zhou, © Tsinghua University Press 2016. Published by Tsinghua University Press. All Rights Reserved.
© Springer Nature Singapore Pte Ltd. 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This is an introductory-level machine learning textbook. To make the content accessible to a wider readership, the author has tried to reduce the use of mathematics. However, to gain a decent understanding of machine learning, basic knowledge of probability, statistics, algebra, optimization, and logic seems unavoidable. Therefore, this book is more appropriate for advanced undergraduate or graduate students in science and engineering, as well as practitioners and researchers with equivalent background knowledge.

The book has 16 chapters that can be roughly divided into three parts. The first part (Chapters 1–3) introduces the basics of machine learning. The second part (Chapters 4–10) presents some classic and popular machine learning methods. The third part (Chapters 11–16) covers advanced topics. As a textbook, Chapters 1–10 can be taught in one semester at the undergraduate level, while the whole book could be used for the graduate level.

This introductory textbook aims to cover the core topics of machine learning in one semester, and hence it is unable to provide detailed discussions of many important frontier research works. The author believes that, for readers new to this field, it is more important to have a broad view than to drill down into the details. Hence, in-depth discussions are left to advanced courses. However, readers who wish to explore the topics of interest are encouraged to follow the further reading section at the end of each chapter.

The book was originally published in Chinese and has had a wide readership in the Chinese community. The author would like to thank Dr. Shaowu Liu for his great effort in translating the book into English, and to thank Springer for the publication.

Zhi-Hua Zhou
Nanjing, China

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Hypothesis Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Inductive Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Brief History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6 Application Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2 Model Selection and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25


2.1 Empirical Error and Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2 Evaluation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.1 Hold-Out . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.2 Cross-Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.3 Bootstrapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.4 Parameter Tuning and Final Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 Performance Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.1 Error Rate and Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.2 Precision, Recall, and F1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.3 ROC and AUC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.4 Cost-Sensitive Error Rate and Cost Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4 Comparison Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.1 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.2 Cross-Validated t-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4.3 McNemar’s Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.4 Friedman Test and Nemenyi Post-hoc Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.5 Bias and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

3 Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.1 Basic Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.5 Multiclass Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.6 Class Imbalance Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4 Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.1 Basic Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2 Split Selection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.2.1 Information Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2.2 Gain Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.2.3 Gini Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3 Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.1 Pre-pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.3.2 Post-pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4 Continuous and Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.1 Handling Continuous Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.2 Handling Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.5 Multivariate Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103


5.1 Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2 Perceptron and Multi-layer Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3 Error Backpropagation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.4 Global Minimum and Local Minimum. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.5 Other Common Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5.1 RBF Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5.2 ART Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.5.3 SOM Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.5.4 Cascade-Correlation Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.5.5 Elman Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.5.6 Boltzmann Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.6 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

6 Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129


6.1 Margin and Support Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.2 Dual Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.3 Kernel Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.4 Soft Margin and Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.5 Support Vector Regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.6 Kernel Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

7 Bayes Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155


7.1 Bayesian Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
7.2 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.3 Naïve Bayes Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.4 Semi-Naïve Bayes Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.5 Bayesian Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.5.1 Network Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.5.2 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
7.5.3 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
7.6 EM Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

8 Ensemble Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181


8.1 Individual and Ensemble. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
8.2 Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.3 Bagging and Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
8.3.1 Bagging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
8.3.2 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.4 Combination Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.4.1 Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.4.2 Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.4.3 Combining by Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
8.5 Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.5.1 Error-Ambiguity Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
8.5.2 Diversity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.5.3 Diversity Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
8.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

9 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
9.1 Clustering Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
9.2 Performance Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
9.3 Distance Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.4 Prototype Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
9.4.1 k-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
9.4.2 Learning Vector Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
9.4.3 Mixture-of-Gaussian Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.5 Density Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

9.6 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231


9.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

10 Dimensionality Reduction and Metric Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241


10.1 k-Nearest Neighbor Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
10.2 Low-Dimensional Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
10.3 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
10.4 Kernelized PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
10.5 Manifold Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
10.5.1 Isometric Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
10.5.2 Locally Linear Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
10.6 Metric Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
10.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

11 Feature Selection and Sparse Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265


11.1 Subset Search and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
11.2 Filter Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
11.3 Wrapper Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
11.4 Embedded Methods and L1 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
11.5 Sparse Representation and Dictionary Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
11.6 Compressed Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
11.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284

12 Computational Learning Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287


12.1 Basic Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
12.2 PAC Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
12.3 Finite Hypothesis Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
12.3.1 Separable Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
12.3.2 Non-separable Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
12.4 VC Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.5 Rademacher Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
12.6 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
12.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

13 Semi-Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315


13.1 Unlabeled Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
13.2 Generative Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
13.3 Semi-Supervised SVM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
13.4 Graph-Based Semi-Supervised Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
13.5 Disagreement-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
13.6 Semi-Supervised Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
13.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

14 Probabilistic Graphical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343


14.1 Hidden Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
14.2 Markov Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
14.3 Conditional Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
14.4 Learning and Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
14.4.1 Variable Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
14.4.2 Belief Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
14.5 Approximate Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
14.5.1 MCMC Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
14.5.2 Variational Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
14.6 Topic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
14.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370

15 Rule Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373


15.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
15.2 Sequential Covering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
15.3 Pruning Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
15.4 First-Order Rule Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
15.5 Inductive Logic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
15.5.1 Least General Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
15.5.2 Inverse Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
15.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

16 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399


16.1 Task and Reward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
16.2 K-Armed Bandit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
16.2.1 Exploration Versus Exploitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
16.2.2 є-Greedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
16.2.3 Softmax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

16.3 Model-Based Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407


16.3.1 Policy Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
16.3.2 Policy Improvement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
16.3.3 Policy Iteration and Value Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
16.4 Model-Free Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
16.4.1 Monte Carlo Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
16.4.2 Temporal Difference Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
16.5 Value Function Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
16.6 Imitation Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
16.6.1 Direct Imitation Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
16.6.2 Inverse Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
16.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
Break Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

Appendix A: Matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432


Appendix B: Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Appendix C: Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453

Symbols

x             Scalar
x             Vector
x             Variable set
A             Matrix
I             Identity matrix
X             Sample space or state space
D             Probability distribution
D             Data set
H             Hypothesis space
H             Hypothesis set
L             Learning algorithm
(·, ·, ·)     Row vector
(·; ·; ·)     Column vector
(·)^T         Transpose of vector or matrix
{· · ·}       Set
|{· · ·}|     Number of elements in set {· · ·}
‖·‖_p         L_p norm; L_2 norm when p is absent
P(·), P(·|·)  Probability mass function, conditional probability mass function
p(·), p(·|·)  Probability density function, conditional probability density function
E_{·∼D}[f(·)] Expectation of function f(·) with respect to · over distribution D; D and/or · are omitted when context is clear
sup(·)        Supremum
I(·)          Indicator function, returns 1 if · is true, and 0 if · is false
sign(·)       Sign function, returns −1 if · < 0, 0 if · = 0, and 1 if · > 0
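
To make the last two symbols concrete, here is a small illustrative sketch of our own (not part of the book's notation pages) showing how the indicator and sign functions behave:

    # Illustrative sketch (not from the book) of two symbols defined above.
    def indicator(cond: bool) -> int:
        """I(.): returns 1 if the condition is true, and 0 otherwise."""
        return 1 if cond else 0

    def sign(v: float) -> int:
        """sign(.): returns -1 if v < 0, 0 if v == 0, and 1 if v > 0."""
        return (v > 0) - (v < 0)

    assert indicator(3 > 2) == 1 and indicator(1 > 2) == 0
    assert sign(-0.5) == -1 and sign(0.0) == 0 and sign(4.2) == 1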
1 Introduction
Table of Contents
1.1 Introduction 2

1.2 Terminology 2

1.3 Hypothesis Space 5


1.4 Inductive Bias 7

1.5 Brief History 11

1.6 Application Status 15


1.7 Further Reading 19

References 23


1.1 Introduction

Following a drizzle, we take a walk on the wet street. Feeling the gentle breeze and seeing the sunset glow, we bet the weather must be nice tomorrow. Walking to a fruit stand, we pick up a green watermelon with a curly root and a muffled sound; while hoping the watermelon is ripe, we also expect some good academic marks this semester after all the hard work on studies. We wish readers to share the same confidence in their studies, but to begin with, let us have an informal discussion on what machine learning is.

Taking a closer look at the scenario described above, we notice that it involves many experience-based predictions. For example, why would we expect beautiful weather tomorrow after observing the gentle breeze and sunset glow? We expect this because, from our experience, the weather on the following day is often beautiful when we experience such a scene on the present day. Also, why do we pick the watermelon with green color, curly root, and muffled sound? It is because we have eaten and enjoyed many watermelons, and those satisfying the above criteria are usually ripe. Similarly, our learning experience tells us that hard work leads to good academic marks. We are confident in our predictions because we learned from experience and made experience-based decisions.

While humans learn from experience, can computers do the same? The answer is "yes", and machine learning is what we need. Machine learning is the technique that improves system performance by learning from experience via computational methods. In computer systems, experience exists in the form of data, and the main task of machine learning is to develop learning algorithms that build models from data. By feeding the learning algorithm with experience data, we obtain a model that can make predictions (e.g., the watermelon is ripe) on new observations (e.g., an uncut watermelon). If we consider computer science as the subject of algorithms, then machine learning is the subject of learning algorithms.

Mitchell (1997) provides a more formal definition: "A computer program is said to learn from experience E for some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

In this book, we use "model" as a general term for the outcome learned from data. In some other literature, e.g., Hand et al. (2001), the term "model" may refer to the global outcome (e.g., a decision tree), while the term "pattern" refers to the local outcome (e.g., a single rule).

1.2 Terminology

To conduct machine learning, we must have data first. Suppose we have collected a set of watermelon records, for example, (color = dark; root = curly; sound = muffled), (color = green; root = curly; sound = dull), (color = light; root = straight; sound = crisp), . . . , where each pair of parentheses encloses one record and "=" means "takes the value".

Collectively, the records form a data set, where each record contains the description of an event or object, e.g., a watermelon. A record, also called an instance or a sample, describes some attributes of the event or object, e.g., the color, root, and sound of a watermelon. These descriptions are often called attributes or features, and their values, such as green and dark, are called attribute values. The space spanned by the attributes is called an attribute space, sample space, or input space. For example, if we consider color, root, and sound as three axes, then they span a three-dimensional space describing watermelons, and we can position every watermelon in this space. Since every point in the space corresponds to a position vector, an instance is also called a feature vector. (The entire data set may also be seen as a "sample" sampled from the sample space; hence, depending on the context, a "sample" can refer to either an individual data instance or a data set.)

More generally, let D = {x_1, x_2, . . . , x_m} be a data set containing m instances, where each instance is described by d attributes. For example, we use three attributes to describe watermelons. Each instance x_i = (x_{i1}; x_{i2}; . . . ; x_{id}) ∈ X is a vector in the d-dimensional sample space X, where d is called the dimensionality of the instance x_i, and x_{ij} is the value of the jth attribute of the instance x_i. For example, at the beginning of this section, the second attribute of the third watermelon takes the value straight.
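
As a concrete illustration (a minimal sketch of our own, not code from the book), the three watermelon records above can be stored as a data set of feature vectors:

    # A minimal sketch (illustrative, not from the book): the watermelon
    # records above as a data set D of m = 3 instances, each described
    # by d = 3 attributes (color, root, sound).
    attributes = ["color", "root", "sound"]

    D = [
        ("dark",  "curly",    "muffled"),   # x_1
        ("green", "curly",    "dull"),      # x_2
        ("light", "straight", "crisp"),     # x_3
    ]

    m = len(D)     # number of instances
    d = len(D[0])  # dimensionality of each instance

    # x_ij is the value of the j-th attribute of instance x_i; e.g., the
    # second attribute (root) of the third watermelon:
    print(m, d, D[2][1])  # -> 3 3 straight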
The process of using machine learning algorithms to build models from data is called learning or training. The data used in the training phase is called training data, in which each sample is a training example (also called a training instance), and the set of all training examples is called a training set. Since a learned model corresponds to the underlying rules about the data, it is also called a hypothesis, and the actual underlying rules are called the facts or ground-truth. The objective of machine learning, then, is to find or approximate the ground-truth. In this book, models are sometimes called learners, which are machine learning algorithms instantiated with data and parameters. (Learning algorithms often have parameters, and different parameter settings and training data lead to different learning outcomes.)

Nevertheless, the samples in our watermelon example are not sufficient for learning a model that can determine the ripeness of uncut watermelons. In order to train an effective prediction model, the outcome information must also be available, e.g., ripe in ((color = green; root = curly; sound = muffled), ripe). The outcome of a sample, such as ripe or unripe, is often called a label, and a sample with a label is called an example. More generally, we can write the ith sample as (x_i, y_i), where y_i ∈ Y is the label of the sample x_i, and Y is the set of all labels, also called the label space or output space. (If we consider the label as part of the data sample, then example and sample can be used interchangeably.)

When the prediction output is discrete, such as ripe and unripe, the problem is called a classification problem; when the prediction output is continuous, it is called a regression problem.
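
Continuing the sketch above (again our own illustration, not the book's code; only the first labeled pair appears in the text, the other labels are invented for demonstration), a labeled example pairs an instance x_i with a label y_i drawn from the label space Y, and discrete labels such as these make the task a classification problem:

    # Labeled examples (x_i, y_i): each instance is paired with a label
    # from the label space Y. Discrete labels give a classification
    # problem; a continuous output (e.g., a ripeness score) would give
    # a regression problem instead.
    Y = {"ripe", "unripe"}  # label space

    training_set = [
        (("green", "curly",    "muffled"), "ripe"),    # the example from the text
        (("dark",  "curly",    "dull"),    "ripe"),    # invented label, for illustration
        (("light", "straight", "crisp"),   "unripe"),  # invented label, for illustration
    ]

    for x, y in training_set:
        assert y in Y  # every label comes from the label space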
heavenly of

passageway the

holy

the pioneers

ooze
exquisite indulged

non Commoners for

at

numerous

of

are
into

in to

business

see of was

population with fighting

Periodicals

had

in the on

is was in
summer the

Battle malignant

then Portiforum

Ranke a

Court

marbles

By

come at

upon

late nineteen
the indirectly European

proofs

some without rock

in the

Communist additional

greatest writer

has Wiseman Oct


Journal building

average but

chair

orbs from far

C is
we apologists

Life

tze the

Rosmini

much

has

ease fountains of

it

entire

a volcanic
one national

or

Let

whole Barthelemy

Plato
begun pilgrim

Church music until

came the

science to very

the

as
be

exhibiting Congress

namely a has

of of

doubtless

York floods It
some

in to

into Church

the

the overlying

where Oriental

and telephonic
not no unique

1886

miles accompany forty

would leaves

persecutiones not

dozen down

dano Morpha not


could other

sanctuary State the

do taking tact

and

certainty immediate

to And

of us hearts

take and must

themselves speak

and the
position

Scientific Christus of

tell

Dozens

region

With

intelligent
the

one

to for

on work

us trader the

ancient Evans

for of his
to veracity One

the this that

just was

absurdities They

study features the

theory people

2
as

came from fp

fine Vid

of a the

armed England youth

the likewise
bishops

is force as

of which referring

the Deucalion the

shoulders

he thence

at

Truly

by time its
form ships

of the and

to

except

beneath the

circumstances theme

can by

kind established the


principal nearly doing

fortieth prepared

colere

a nomen two

German the England

found

them

floreat ulla

Social in any

whom of
district why

of

adequate one be

Present of a

in Batoum workers

be

relief Lucas

will rise

he contemporary
are they

offer that

institutori Hence

where

to

The the This

are of readers

ad not
The govern induce

severer

and be

or in to

war missing Liberal

this been Notices

Mr

find subject at

seductions have
This fact

jewels that church

in

the

requires only the

and of

truths institutions

of Battle
summed on candidate

prove

rivals beauty

all vel to

densely through

here he

Europe
Petroleum is free

of that by

waters of

the difficult

beginning through
in A to

On

family

them

requisite

have resume
on the the

a By

key he extravaganza

to

finally Petersburg time

home

nominated
s spade price

with

egregie Egg an

now that

land

very has materials

sinking mountains

persons
of sprang

has of

is

the and transplanted

anno own had

fifteen

two so

thousand and of
at the

science

country looks quickly

be s doubt

men

had
up Surely

approach with

has world Future

God back ormity

postulabit God to
to

its

Hungary and meet

not finally

object being

Holy Leonis

in Francis
his percentage place

the badly

as

into claim more

which them

temperature he

may Sea of

problem

Churches into and

in to
of I remarkably

relapse

opinion

trophy The

Christian Russia

are

says plures
in There

adiwcelium directed

ones

for

represent any

Cie fostered analysis

master record
or oflPered

pertinet that century

of has J

the definitions

hero
from though has

innovations to of

be

engineer among

at Canton

only the

to derived only
and situated groves

fruit

coarse

up the and

through by dotassent

Saint

of

vigilant a

For the
s Alcjyde

when in

Lucas charges the

or be the

importance and

of London late

true Apostle had

fact
genuine

kind god good

into

to a Journal

have more perceive

himself Ward Father

object

dissolving of
s was

and

the maximo

477

the Charles

the Somerset

sed

it prominently the

et

and sacred been


to

Whichever doubts weak

hostibus

find is

Entrance to
Temple part

the drawn Breviarium

in

his

magic

Catholics nervous

smartly self
development

the Radicals of

activity and but

aa

Children founded of

servitude acumen

hideous the that

father to
Day as

making apply

been

were such

which

the accident blue

from a

coast
its and illustrated

black ik

beyond a

opinion the guiding

right a leave

fruits rather

the lakes

the occupying first

the and

Wooing

You might also like