Machine Learning Course
Ing 4 SI, 2022
Pr. Khadija SLIMANI
03/01/2022
What is this course about
Learn about Data Science
Learn about machine learning and its applications
How to build machine learning systems
How the algorithms behind them work
How to use those algorithms
Course planning
A Case study approach:
Course
Practical work (case study)
Week 1 Week 2 Week 3 Week 4 Week 5
Assignment
Course overview …..
1. Week 1: Introduction to Data Science and Machine Learning
2. Week 2: Univariate & Multivariate Linear Regression
3. Week 3: Logistic Regression (Classification)
4. Week 4: Decision Trees (Regression & Classification)
5. Week 5: Model evaluation (overfitting, bias-variance trade-off, cross-validation, ...)
Course overview
1. Week 1: Introduction to Data Science and Machine Learning
   1.1 Introduction to Data Science
   1.2 Introduction to Machine Learning
   1.3 Machine Learning Tools
2. ….
1.1 Introduction to Data Science
The Era of Big Data
It is often claimed that 90% of the data ever generated was produced in the
last two years
This growing torrent of data + growing storage and computation
capacity (cloud) ⇒ Big Data Era
What is Data Science?
The term "Data Science" goes back a little further than 2004, which is simply
where Google search-term history begins
Data Science is not just limited to tech companies
Almost every company is turning to data science to better understand how
to build products, serve customers and leverage new opportunities
Data Science is used in many disciplines: computer science,
behavioural sciences, law, business, etc.
All of these disciplines need data-driven methodologies to support
discovery, from statistical analysis, machine learning and text mining to
information visualization
Data Science is an umbrella term and it's basically the marriage of
many different fields.
Definition of Data Science according to “Drew Conway”
1.2 Introduction to Machine Learning
What is Machine Learning?
Artificial Intelligence (AI) and Machine Learning (ML) are closely related
parts of computer science.
Both are among the most popular technologies currently used for
building intelligent systems.
Researchers interested in artificial intelligence wanted to see if
computers could learn from data.
ML is not a new science: many machine learning algorithms have
been around for a long time
However, it is a science that is gaining fresh momentum: the ability to
automatically apply complex mathematical calculations to big data, over and
over, faster and faster, is a recent development
Google Trends for the term "Machine Learning"
Definition of Machine Learning
Machine learning is the subfield of computer science that "gives computers the
ability to learn without being explicitly programmed" (Arthur Samuel, 1959)
A more modern definition by Tom Mitchell: "A computer program is said to
learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by
P, improves with experience E."
Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game
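The E / T / P vocabulary can be made concrete with a toy sketch. Everything below is an assumption chosen for illustration and does not come from the slides: the task T is predicting f(x) = x² on [0, 1], the experience E is a set of n labelled examples, the learner is a 1-nearest-neighbour rule, and the performance measure P is the mean absolute error on a fixed test grid (lower is better).

```python
# Toy illustration of Mitchell's definition (all choices are assumptions).
# T: predict f(x) = x**2 on [0, 1]
# E: n labelled examples (x, f(x)) on an evenly spaced grid
# P: mean absolute error on a fixed test grid (lower is better)

def target(x):
    return x ** 2

def make_experience(n):
    """Return n evenly spaced labelled examples on [0, 1]."""
    return [(i / (n - 1), target(i / (n - 1))) for i in range(n)]

def predict(examples, x):
    """1-nearest-neighbour rule: copy the label of the closest example."""
    _, nearest_y = min(examples, key=lambda ex: abs(ex[0] - x))
    return nearest_y

def performance(examples):
    """Mean absolute error over 101 fixed test points."""
    test_xs = [i / 100 for i in range(101)]
    errors = [abs(predict(examples, x) - target(x)) for x in test_xs]
    return sum(errors) / len(errors)

# More experience E should give a better (lower) value of P on task T.
errors_by_experience = {n: performance(make_experience(n)) for n in (3, 11, 51)}
for n, err in errors_by_experience.items():
    print(f"E = {n:2d} examples -> P (mean abs error) = {err:.4f}")
```

In this sketch the error shrinks as the number of examples grows, which is what "learning from experience" means in Mitchell's sense.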
Old View of Machine Learning
Machine Learning in Intelligent Applications
The pipeline of Machine Learning
Types of Machine Learning
Machine learning tasks are typically classified into three broad
categories, depending on the nature of the learning "signal" or
"feedback" available to the learning system
Supervised Learning
The program is given a data set for which we already know what the correct
output should look like
We assume that there is a relationship between the input and the output
The goal is to learn a general rule that maps inputs to outputs.
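A minimal sketch of supervised learning, assuming scikit-learn (listed among the course tools) and a made-up data set that follows the rule y = 2x + 1. The data and variable names are hypothetical; the point is only that the model is shown inputs together with their known outputs and learns a rule that it can apply to new inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Labelled training set: inputs X and their known correct outputs y.
# The rule y = 2x + 1 is invented for this example, not course data.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X.ravel() + 1.0

# Learn a general rule that maps inputs to outputs.
model = LinearRegression()
model.fit(X, y)

# Apply the learned rule to an input that was not in the training set.
x_new = np.array([[10.0]])
prediction = model.predict(x_new)[0]

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction for x = 10:", prediction)
```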
Unsupervised Learning
No labels are given to the learning algorithm, leaving it on its own to
find structure in its input.
Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end (feature learning)
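A minimal sketch of unsupervised learning, assuming scikit-learn's KMeans and six made-up 2-D points that form two obvious groups. No labels are passed to the algorithm; it has to discover the grouping on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled data: two well-separated groups of points (hypothetical values).
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # group near (1, 1)
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],   # group near (8, 8)
])

# Ask for two clusters; random_state is fixed only for reproducibility.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# The numeric label given to each cluster is arbitrary; what matters is
# which points end up grouped together.
print("cluster labels:", labels)
print("cluster centres:", kmeans.cluster_centers_)
```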
Reinforcement Learning
A computer program interacts with a dynamic environment in which it
must achieve a certain goal, without a teacher explicitly telling it
whether it has come close to its goal.
Learning to drive a car (Google Car)
Learning to play a game by playing against an opponent (AlphaGo)
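The examples above are far too large to fit on a slide, so here is a much smaller sketch of the same idea, under stated assumptions: a hypothetical corridor of five states where the agent starts on the left and is rewarded only when it reaches the rightmost state. It uses tabular Q-learning, the algorithm named in the course overview diagram; the environment, constants and names are invented for illustration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: return (next_state, reward); reward 1 only at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]

for _ in range(500):                        # 500 training episodes
    state = 0
    while state != N_STATES - 1:
        # Explore with random actions; Q-learning is off-policy, so it can
        # still learn the values of the greedy policy from this behaviour.
        action = random.choice(ACTIONS)
        next_state, reward = step(state, action)
        best_next = max(Q[next_state])
        Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
        state = next_state

# Greedy policy for the non-terminal states: the action with the highest Q-value.
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print("greedy policy (0 = left, 1 = right):", greedy_policy)
```

Nobody tells the agent which action is correct in each state; it only sees the reward at the goal and propagates that information backwards through the Q-table.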
Machine Learning Algorithms
Machine Learning Applications
Machine Learning in this course
Regression
Classification (Logistic Regression)
Classification (Decision Trees)
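A rough sketch of the two classification approaches named above, assuming scikit-learn and a made-up one-feature data set in which small values belong to class 0 and large values to class 1. The data and variable names are hypothetical; both models are trained on the same labelled examples and then asked to classify two new inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Labelled examples (invented): one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# New inputs to classify.
X_new = np.array([[1.5], [11.5]])

# Logistic regression: learns a smooth probability curve with one threshold.
log_reg = LogisticRegression().fit(X, y)
log_reg_pred = log_reg.predict(X_new)

# Decision tree: learns explicit "feature <= value" splitting rules.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_pred = tree.predict(X_new)

print("logistic regression:", log_reg_pred)
print("decision tree:     ", tree_pred)
```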
Machine Learning in this course
Machine Learning
  Supervised learning
    Regression
    Classification: Logistic regression, SVM, Random forest, k-NN
  Unsupervised learning
    Clustering: K-means, Hierarchical, Fuzzy means
    Dimensionality reduction: SVD, PCA, ICA
  Reinforcement learning
    Q-learning
1.3 Machine Learning Tools
Python
Python is a high-level language
It is optimized for reading by people rather than by machines
Python is also an interpreted language, which means it is not compiled
ahead of time into machine code
It is commonly used interactively
Java & C: write code, compile, run, and then look at the output
Python: write and run line by line with the interpreter
Interactive use is very helpful for tasks that require a lot of investigation
(e.g. data cleaning), as opposed to tasks that require a lot of up-front design
Unlike C++ and Java, Python is a dynamically typed language
(like JavaScript): you do not declare a variable's type, you simply assign
a value to a name
This makes it quick to set, and later change, a variable's type and content
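A short sketch of dynamic typing in practice. The variable names are arbitrary; the same name is bound in turn to an integer, a string and a float, with no type declaration anywhere.

```python
observed_types = []

value = 42                      # no declaration: the name is bound to an int
observed_types.append(type(value).__name__)

value = "forty-two"             # the same name can be rebound to a str
observed_types.append(type(value).__name__)

value = 4.2                     # ... and then to a float
observed_types.append(type(value).__name__)

print(observed_types)
```

The type belongs to the value, not to the variable name, which is why the same name can refer to objects of different types over time.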
Why Python for Machine Learning?
Python is easy to learn
It is the introductory teaching language in 8 of the top 10 US computer
science departments (Philip Guo, CACM)
Full featured
Not just a statistics language, but has full capabilities for data acquisition,
cleaning, databases, high performance computing, and more
Strong Data Science Libraries
The SciPy Ecosystem
Tools to be used in this Course
Programming language to be used in this course: Python
Libraries:
Pandas
Numpy
Scipy
Scikit-Learn
Interactive tools:
Spyder: IDE for python
Jupyter Notebook: a web application for creating and sharing documents
that contain live code, equations, visualizations and explanatory text.
Uses include data cleaning and transformation, numerical simulation,
statistical modeling, machine learning and much more.
Pandas
Created in 2008 by Wes McKinney
Open source, New BSD license
100 different contributors
https://pandas.pydata.org/pandas-docs/stable/
Pandas Series
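A Series is a one-dimensional array of values with an index of labels. The sketch below uses invented exam scores and student names to show label-based access, boolean filtering and a simple aggregate.

```python
import pandas as pd

# Hypothetical data: exam scores indexed by student name.
scores = pd.Series(
    [14.5, 12.0, 17.0],
    index=["amina", "youssef", "sara"],
    name="score",
)

print(scores["sara"])            # access one value by its label
print(scores[scores > 13])       # keep only the values above 13
print(scores.mean())             # aggregate over all values
```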
Pandas DataFrame
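A DataFrame is a two-dimensional table whose columns are Series sharing the same index. The sketch below uses made-up housing data (the column names and values are hypothetical) to show construction from a dictionary, a derived column and row filtering.

```python
import pandas as pd

# Hypothetical data: one row per house, one column per attribute.
df = pd.DataFrame({
    "surface_m2": [50, 80, 120],
    "rooms": [2, 3, 5],
    "price": [100_000, 160_000, 240_000],
})

# Derived column computed element-wise from two existing columns.
df["price_per_m2"] = df["price"] / df["surface_m2"]

# Boolean filtering: keep only the rows that satisfy a condition.
large = df[df["surface_m2"] > 60]

print(df.shape)     # (number of rows, number of columns)
print(large)
```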
Thank you for your attention
Practical work
LAB1: Back in 15min!