Environment Setup for Machine Learning Class
Windows
Follow all the steps exactly as shown in the video below:
https://www.youtube.com/watch?v=uOwCiZKj2rg&ab_channel=MichaelGalarnyk
MAC
https://www.youtube.com/watch?v=E4k38RIUKvo
Ubuntu
https://www.youtube.com/watch?v=R2PWuaR_rZg
Lecture
Lecture 1
Python Tutorial Links
1. Microsoft : https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
2. Bangla Tutorial : https://www.youtube.com/playlist?list=PLGPedopQSAJAoVkMxbENx99s2I4DKYdj7
Lecture 2
1. See the attached file INTRO.pdf at the link below:
https://drive.google.com/file/d/1064U9EDhMjVdpw0WjNDxWJsipldIqfy8/view?usp=sharing
[reference: http://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/01_intro.pdf ]
Lecture 3
1. Some Terminology:
https://developers.google.com/machine-learning/crash-course/framing/ml-terminology
2. TB2: Book page 7- 14 [Skip details of Batch and Online Learning]
NOTE: Page numbers correspond to the printed page numbers in the PDF
Lecture 4
TB2: Book page 15-23 [Skip details of Batch and Online Learning]
Try and practice the Chapter 1 code from here:
https://github.com/ageron/handson-ml2/blob/master/01_the_machine_learning_landscape.ipynb
Lecture 5
TB2 : Book page 23-30
Except [for the moment; we will come back to these later]: Regularization and Hyperparameter Tuning
Lecture 6
Understanding the pandas library:
https://www.youtube.com/watch?v=CmorAWRsCAw
https://www.youtube.com/watch?v=F6kmIpWWEdU
Dataframe Basics: http://jalammar.github.io/gentle-visual-intro-to-data-analysis-python-pandas/
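After going through the tutorials above, the basic DataFrame operations can be sketched as follows; the data here is made up purely for illustration and is not from any course dataset:

```python
import pandas as pd

# Made-up example data, just to exercise the basics from the links above.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan"],
    "score": [85, 92, 78, 90],
    "group": ["A", "B", "A", "B"],
})

print(df.head())                             # first rows
print(df.describe())                         # summary stats for numeric columns
high = df[df["score"] > 80]                  # boolean filtering: rows with score > 80
print(high)
means = df.groupby("group")["score"].mean()  # aggregation: mean score per group
print(means)
```

These four operations (inspect, summarize, filter, group) cover most of what the Lecture 7 Iris project needs from pandas.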
Lecture 7
First Machine Learning Project using Iris Data set:
https://medium.com/gft-engineering/start-to-learn-machine-learning-with-the-iris-flower-classification-challenge-4859a920e5e3
Data Explanation : https://raqueeb.gitbook.io/scikit-learn/iris-dataset/scikit-learn-iris
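Before working through the article, a minimal end-to-end sketch of the Iris task may help orient you. The classifier choice below is an illustrative assumption; the linked tutorial's own approach may differ:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# A simple baseline classifier; any classifier from later lectures works here too.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

The load / split / fit / evaluate steps shown here repeat in every project in this course.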
Lecture 8
TB2: Chapter 2 (End to End Machine Learning Project)
Page: 35-37, 46-51
Lecture 9
TB2: Chapter 2 (End to End Machine Learning Project)
Page: 51-55
Lecture 10
TB2: Chapter 2 (End to End Machine Learning Project)
Page: 56-62
Lecture 11
TB2: Chapter 2 (End to End Machine Learning Project)
Page: 62-64
Lecture 12 and 13
TB2: Chapter 2 (End to End Machine Learning Project)
Page: 65-73
Lecture 14
TB2: Chapter 2 (End to End Machine Learning Project)
Page: 73 - 80 (Without Grid Search)
Cross Validation:
https://drive.google.com/file/d/1eUdbQaxgzFMpUVuYYbpTBEAV3z8IsoRg/view
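The cross-validation idea can be sketched as follows; the classifier and dataset are illustrative choices, not necessarily those used in the lecture:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: the data is split into 5 parts, and each part
# takes one turn as the validation set while the rest train the model.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # average over folds
```

Averaging over folds gives a more stable estimate of generalization than a single train/test split.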
Lecture 15 , 16 , 17
1. ML terms for evaluating machine learning models (all the links should be
considered as part of the syllabus)
Key Terms and Definitions
Accuracy:
https://developers.google.com/machine-learning/crash-course/classification/accuracy
a. https://towardsdatascience.com/accuracy-recall-precision-f-score-specificity-which-to-optimize-on-867d3f11124
b. https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall
Confusion Matrix
c. https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/#:~:text=A%20confusion%20matrix%20is%20a,related%20terminology%20can%20be%20confusing.
d. https://manisha-sirsat.blogspot.com/2019/04/confusion-matrix.html
e. Exercise (very important):
https://developers.google.com/machine-learning/crash-course/classification/check-your-understanding-accuracy-precision-recall
2. Lecture of Sensitivity, Specificity and Area Under Curve
a. https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
b. https://www.youtube.com/watch?v=un6KTYMSzd4
c. https://www.youtube.com/watch?v=HXkrLmxNzUA
Exercise
https://developers.google.com/machine-learning/crash-course/classification/check-your-understanding-roc-and-auc
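The metrics covered in these lectures can be tried on made-up labels; counting the TP/FP/TN/FN cells by hand and comparing against the printed values is a useful exercise:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)   # rows = actual, columns = predicted
acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)
print(cm)
print(acc, prec, rec)
```

Here TP = 3, FN = 1, FP = 1, TN = 5, so accuracy is 0.8 and both precision and recall are 0.75.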
Lecture 18
Titanic Problem Description
Lecture 19
Grid Search --> TB2: Chapter 2 (End to End Machine Learning Project)
Page: 75-78
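The grid search idea from TB2 can be sketched with scikit-learn's GridSearchCV; the model, dataset, and parameter grid below are illustrative assumptions, not the book's own example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# An illustrative grid; GridSearchCV fits every C/gamma combination with
# 5-fold cross-validation and keeps the combination that scores best.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # the winning combination
print(search.best_score_)   # its mean cross-validation accuracy
```

Because every combination is cross-validated, the cost grows multiplicatively with the number of values per parameter.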
Lecture 20 [KNN]
1. https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn
2. https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_knn_algorithm_finding_nearest_neighbors.htm
More on KNN training and testing phase
https://stackoverflow.com/questions/54505375/what-does-the-knn-algorithm-do-in-the-training-phase
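A minimal KNN sketch, echoing the Stack Overflow point that KNN's "training" phase mostly just stores the data; the dataset and split below are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# fit() for KNN essentially stores the training data; the real work happens
# at predict time, when the 5 nearest stored points vote on each label.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
acc = knn.score(X_test, y_test)
print(acc)
```

This is why KNN is called a lazy learner: cheap training, comparatively expensive prediction.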
Lecture 21 [Decision Tree]
Decision Tree
1. https://www.datacamp.com/community/tutorials/decision-tree-classification-python
2. https://www.youtube.com/watch?v=PHxYNGo8NcI&t=124s&ab_channel=codebasics
3. https://www.bogotobogo.com/python/scikit-learn/scikt_machine_learning_Decision_Tree_Learning_Informatioin_Gain_IG_Impurity_Entropy_Gini_Classification_Error.php
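A short decision-tree sketch tying in the Gini-impurity idea from the third link; the dataset and depth limit are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="gini" selects splits by Gini impurity (entropy is the other
# common choice); max_depth limits how deep the if/else tree can grow.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)
print(export_text(tree))  # the learned splits as readable if/else rules
print(tree.score(X, y))   # accuracy on the training data
```

Limiting max_depth is a simple guard against overfitting; an unlimited tree can memorize the training set.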
Lecture 22 [Random Forest and PCA]
RANDOM FOREST
1. https://www.javatpoint.com/machine-learning-random-forest-algorithm
2. As Regressor: https://www.geeksforgeeks.org/random-forest-regression-in-python/
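A minimal random-forest sketch in scikit-learn, under illustrative dataset and parameter choices (the linked tutorials use their own examples):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A random forest trains many decision trees, each on a bootstrap sample
# of the rows with a random subset of features per split, then combines
# their votes.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
acc = forest.score(X_test, y_test)
print(acc)
```

Averaging many decorrelated trees reduces the variance that a single deep tree would have.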
PCA
What is PCA?
Principal Component Analysis, or PCA, is a statistical method used to reduce the
number of variables in a dataset. It does so by lumping highly correlated variables
together. Naturally, this comes at the expense of accuracy. However, if you have 50
variables and realize that 40 of them are highly correlated, you will gladly trade a little
accuracy for simplicity.
High dimensionality means that the dataset has a large number of features. The primary
problem associated with high dimensionality in machine learning is model overfitting,
which reduces the model's ability to generalize beyond the examples in the training
set. Richard Bellman described this phenomenon in 1961 as the Curse of
Dimensionality: many algorithms that work well in low dimensions become
intractable when the input is high-dimensional.
1. Also read from here:
https://datascienceplus.com/principal-component-analysis-pca-with-python/
2. sample code to show effectiveness of PCA:
https://colab.research.google.com/drive/1-6a02Ir87BNLSkM8uZ_-QZM23zZ2_WU0?usp=sharing
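To complement the links above, a minimal PCA sketch in scikit-learn; the digits dataset and the choice of 10 components are illustrative assumptions, not from the course materials:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# The digits dataset has 64 pixel features per image; PCA projects them
# onto the 10 directions of greatest variance.
X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                      # far fewer columns than the original 64
print(pca.explained_variance_ratio_.sum())  # fraction of total variance kept
```

The explained-variance ratio quantifies the accuracy-for-simplicity trade described above: the closer its sum is to 1, the less information the reduction discards.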
Lecture 23 and 24 [Support Vector Machine, Ensemble Learning]
1. https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/
2. https://www.youtube.com/watch?v=N1vOgolbjSc&ab_channel=AliceZhao
Idea of the C and Gamma Parameters in SVM
C is the cost of misclassification.
A large C gives you low bias and high variance: low bias because you penalize the cost of
misclassification heavily, so the model bends to fit the training points.
A small C gives you higher bias and lower variance.
Gamma is the parameter of the Gaussian (RBF) kernel, used to handle non-linear classification. Consider these points:
If the two classes are not linearly separable in 2D, you want to transform them to a higher dimension
where they will be linearly separable. Imagine "raising" the green points; then you can separate
them from the red points with a plane (hyperplane).
To "raise" the points you use the RBF kernel; gamma controls the shape of the "peaks" where
you raise the points. In scikit-learn's convention, a large gamma gives you a pointed, narrow bump
in the higher dimensions, while a small gamma gives you a softer, broader bump.
So a large gamma will give you low bias and high variance, while a small gamma will give you
higher bias and low variance.
You usually find the best C and Gamma hyper-parameters using Grid-Search.
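The effect of C and gamma described above can be seen on a toy non-linear problem; everything below (dataset, parameter values) is illustrative, not a recommended setting:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable in 2D.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# Larger C and gamma let the RBF boundary bend to fit the training data
# more tightly (lower bias, higher variance). In practice you would pick
# these values with grid search, not by hand.
scores = {}
for C, gamma in [(0.1, 0.1), (1.0, 1.0), (10.0, 10.0)]:
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    scores[(C, gamma)] = clf.score(X, y)  # accuracy on the training data
    print(C, gamma, scores[(C, gamma)])
```

Note that these are training scores: the tightest fit is not necessarily the best generalizer, which is exactly the bias-variance trade-off above.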
3. ENSEMBLE LEARNING: https://towardsdatascience.com/basic-ensemble-learning-random-forest-adaboost-gradient-boosting-step-by-step-explained-95d49d1e2725
Textbook
[ TB1 ] শূন্য থেকে পাইথন মেশিন লার্নিং : হাতেকলমে সাইকিট-লার্ন (দ্বিতীয় সংস্করণ) [Python Machine Learning from Scratch: Hands-on Scikit-Learn, 2nd Edition]
Book website : https://raqueeb.gitbook.io/scikit-learn/
https://www.rokomari.com/book/187277/shunno-theke-python-machine-learning--hate-kalame-scikit-learn--hatekolome-machine-learning-series--iris-dataset-project-
[ TB2 ] Hands-on Machine Learning with Scikit-Learn, Keras &
TensorFlow
https://drive.google.com/file/d/1sW8D9m30QYqmdou9ZwOp4Mo8YhY7exbh/view?usp=sharing
[ TB3 ] Machine Learning for Absolute Beginners
https://drive.google.com/file/d/1D43PKTrAZG6z6V43k2SZDRoeJQxw8N70/view?usp=sharing
Python Basics:
1. Microsoft Tutorial: https://www.youtube.com/watch?v=jFCNu1-Xdsw&list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
2. Python in Bangla: https://www.youtube.com/watch?v=4QmifmQ7rHY&list=PLGPedopQSAJAoVkMxbENx99s2I4DKYdj7