
MACHINE LEARNING

DR. PHẠM MINH HOÀN – [email protected]


OBJECTIVES OF CHAPTER 3
• Understanding different types of data sources and how to access and manipulate them.
• Data analysis is all about extracting meaningful insights from your data.
• Data exploration is about getting familiar with your data and identifying patterns and trends.
• Data visualization is about creating visual representations of your data to communicate insights effectively.
• Most data analysis tasks involve using specialized libraries that provide functions and tools for working with data.
CONTENTS
3.1. Machine Learning models
3.2. Regression
3.3. Classification
3.4. Clustering
MACHINE LEARNING MODELS
• Machine Learning is making the computer learn from studying data and statistics.
• Machine Learning is a step in the direction of artificial intelligence (AI).
• Machine Learning is a program that analyses data and learns to predict the outcome.
REGRESSION
• The term regression is used when you try to find the relationship between variables.
• In Machine Learning, and in statistical modeling, that relationship is used to predict the outcome of future events.
REGRESSION
• Ex:
import matplotlib.pyplot as plt
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# Fit a straight line y = slope*x + intercept to the data
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

# Predicted y value for every observed x value
mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
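• The fitted model can also be used to predict values that are not in the data set. A minimal sketch, reusing slope, intercept, r and myfunc from the example above (the value x = 10 is chosen only for illustration):
# r measures how well x and y fit a straight line:
# 0 means no linear relationship, -1 or 1 means a perfect one.
print(r)

# Predict the y value for a new x value (here x = 10)
print(myfunc(10))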
CLUSTERING
• K-means is an unsupervised learning method for clustering data points.
• The algorithm iteratively divides the data points into K clusters by minimizing the variance within each cluster.
• First estimate the best value for K using the elbow method, then use K-means clustering to group the data points into clusters.
CLUSTERING
• Ex: Visualizing some data points.
# Import the modules
import matplotlib.pyplot as plt
# Create arrays
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
plt.scatter(x, y)
plt.show()
CLUSTERING
• Ex: Utilize the elbow method to visualize the inertia for different values of K.
# Import the modules
from sklearn.cluster import KMeans

# Turn the data into a set of (x, y) points
data = list(zip(x, y))

# Fit K-means for K = 1..10 and record the inertia of each model
inertias = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)
CLUSTERING
• Ex: Utilize the elbow method to visualize the inertia for different values of K.
To find the best value for K, run K-means across the data for a range of possible values.
There are 10 data points, so the maximum number of clusters is 10 (the K value cannot exceed the number of data points). So for each value of K in range(1, 11), train a K-means model and plot the inertia at that number of clusters:
CLUSTERING
• Ex: Utilize the elbow method to visualize the inertia for different values of K.
plt.plot(range(1,11), inertias, marker='o')
plt.title('Elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()
CLUSTERING
• Ex: The elbow method shows that 2 is a good value for K (where the inertia starts to decrease in a more linear fashion), so we retrain and visualize the result.
kmeans = KMeans(n_clusters=2)
kmeans.fit(data)

plt.scatter(x, y, c=kmeans.labels_)
plt.show()
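• After fitting, the cluster assignment of each point and the cluster centres can be inspected on the trained model (labels_ and cluster_centers_ are standard scikit-learn attributes):
# Cluster index (0 or 1) assigned to each data point
print(kmeans.labels_)

# Coordinates of the two cluster centres
print(kmeans.cluster_centers_)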
EXAMPLE WITH REGRESSION AND CLUSTERING
• Ex: Read the data from the Sales.csv file (a sketch is given below).
• Use linear regression to show the relationship between the number of orders and the total sales amount.
• Cluster the data based on total sales amount and total number of orders with K-means.
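• A minimal sketch of this combined example, assuming Sales.csv contains columns named Orders and Sales (the actual column names in the file may differ):
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.cluster import KMeans

# Read the data (the column names 'Orders' and 'Sales' are assumptions)
df = pd.read_csv("Sales.csv")
x = df["Orders"]
y = df["Sales"]

# Linear regression: number of orders vs. total sales amount
slope, intercept, r, p, std_err = stats.linregress(x, y)
plt.scatter(x, y)
plt.plot(x, slope * x + intercept)
plt.show()

# K-means clustering on (orders, sales); K = 2 is chosen only for illustration
data = list(zip(x, y))
kmeans = KMeans(n_clusters=2)
kmeans.fit(data)
plt.scatter(x, y, c=kmeans.labels_)
plt.show()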
CLASSIFICATION
• A classification technique or model attempts to draw a conclusion from observed values.
• DecisionTreeClassifier is a class capable of performing multi-class classification on a dataset.
• A Decision Tree is a flow chart, and can help you make decisions based on previous experience.
CLASSIFICATION
• Example: decide whether or not to go to a comedy show, based on the following data about comedians (a code sketch follows the table).
Age Experience Rank Nationality Go
36 10 9 UK NO
42 12 4 USA NO
23 4 6 N NO
52 4 4 USA NO
43 21 8 USA YES
44 14 5 UK NO
66 3 7 N YES
35 14 9 UK YES
52 13 7 N YES
35 5 9 N YES
24 3 5 USA NO
18 3 7 UK YES
45 9 9 UK YES
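• The explanation below refers to the root node of a fitted decision tree. A minimal sketch of how such a tree could be built from the table above with scikit-learn (the numeric codes used for Nationality and Go are assumptions):
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

df = pd.DataFrame({
    "Age":         [36, 42, 23, 52, 43, 44, 66, 35, 52, 35, 24, 18, 45],
    "Experience":  [10, 12,  4,  4, 21, 14,  3, 14, 13,  5,  3,  3,  9],
    "Rank":        [ 9,  4,  6,  4,  8,  5,  7,  9,  7,  9,  5,  7,  9],
    "Nationality": ["UK", "USA", "N", "USA", "USA", "UK", "N", "UK", "N", "N", "USA", "UK", "UK"],
    "Go":          ["NO", "NO", "NO", "NO", "YES", "NO", "YES", "YES", "YES", "YES", "NO", "YES", "YES"],
})

# Decision trees need numerical data, so map the text columns to numbers
# (the particular codes are assumptions, not part of the slides)
df["Nationality"] = df["Nationality"].map({"UK": 0, "USA": 1, "N": 2})
df["Go"] = df["Go"].map({"NO": 0, "YES": 1})

features = ["Age", "Experience", "Rank", "Nationality"]
X = df[features]
y = df["Go"]

# Fit the tree and draw it; the root node reports the split rule,
# gini, samples and value discussed on the next slides
dtree = DecisionTreeClassifier()
dtree.fit(X, y)
plot_tree(dtree, feature_names=features)
plt.show()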
CLASSIFICATION
• Explain:
• Rank <= 6.5 means that every comedian with a rank of 6.5 or lower will
follow the True arrow (to the left), and the rest will follow the False arrow (to
the right).
• gini = 0.497 refers to the quality of the split, and is always a number between
0.0 and 0.5, where 0.0 would mean all of the samples got the same result, and
0.5 would mean that the split is done exactly in the middle.
• samples = 13 means that there are 13 comedians left at this point in the
decision, which is all of them since this is the first step.
• value = [6, 7] means that of these 13 comedians, 6 will get a "NO", and 7 will get a "YES".
CLASSIFICATION
• Explain:
• There are many ways to split the samples; we use the GINI method in this tutorial.
• The Gini method uses this formula:
Gini = 1 - (x/n)^2 - (y/n)^2
• Where x is the number of positive answers ("YES"), n is the number of samples, and y is the number of negative answers ("NO"), which gives us this calculation:
1 - (7/13)^2 - (6/13)^2 = 0.497
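• The calculation can be checked directly in Python:
# Gini impurity of the root node: 7 positive and 6 negative answers out of 13 samples
gini = 1 - (7/13)**2 - (6/13)**2
print(round(gini, 3))  # 0.497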
SUMMARY
