Introduction:
Python is a popular programming language. It was created by Guido van
Rossum, and released in 1991.
It is used for:
web development (server-side),
software development,
mathematics,
system scripting.
Application of Python:
Python can be used on a server to create web applications.
Python can be used alongside software to create workflows.
Python can connect to database systems. It can also read and modify
files.
Python can be used to handle big data and perform complex
mathematics.
Python can be used for rapid prototyping, or for production-ready
software development.
List of some different variable types:
x = 123 # integer
x = 3.14 # float
x = "hello" # string
x = [0,1,2] # list
x = (0,1,2) # tuple
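The built-in type() function can be used to confirm which type a value has. A minimal sketch follows; the names samples and type_names are illustrative only and not part of the original listing:

```python
# Each literal below creates an object of a different built-in type.
samples = [123, 3.14, "hello", [0, 1, 2], (0, 1, 2)]

# type(value).__name__ gives the name of the value's type as a string.
type_names = [type(value).__name__ for value in samples]

for value, name in zip(samples, type_names):
    print(value, "->", name)
```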
1. Python program to add two numbers using the "int" data type
Program code:
# Python3 program to add two numbers in int data types
num1 = 15
num2 = 12
# Adding the two numbers ("total" avoids shadowing the built-in sum())
total = num1 + num2
# Printing the values
print("Sum of {0} and {1} is {2}".format(num1, num2, total))
output:
Sum of 15 and 12 is 27
Assignment problem:
Finding the area of a rectangle using the "float" data type
Program code:
length = 1.10
width = 2.20
area = length * width
# round() hides the floating-point representation error (2.4200000000000004)
print("The area is:", round(area, 2))
Output:
The area is: 2.42
Strings, lists, tuples in Python
Accessing Values in Lists:
To access values in a list, use square brackets with an index to obtain a
single element, or with a slice (start:stop) to obtain a sub-list. Indices
start at 0, and the stop index of a slice is excluded.
Program code:
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5, 6, 7 ]
print("list1[0]:", list1[0])
print("list2[1:5]:", list2[1:5])
Output:
list1[0]: physics
list2[1:5]: [2, 3, 4, 5]
Basic List Operations:
Lists respond to the + and * operators much like strings: + means
concatenation and * means repetition, except that the result is a new list,
not a string.
In fact, lists support all of the general sequence operations that strings
do, such as indexing, slicing, len() and membership tests with in.
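As a short illustration of concatenation and repetition on lists (the variable names are chosen here only for the example):

```python
a = [1, 2, 3]
b = [4, 5]

joined = a + b      # concatenation builds a new list
repeated = b * 2    # repetition builds a new list

print(joined)
print(repeated)
print(a)            # the original list is left unchanged
```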
Tuples:
A tuple is an immutable sequence of Python objects. Tuples are sequences,
just like lists; the main difference is that a tuple cannot be changed after
it is created, whereas a list can. Tuples are written with parentheses and
lists with square brackets. To create a tuple, write a series of
comma-separated values; enclosing them in parentheses is optional but
customary.
Program code:
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7 )
print("tup1[0]:", tup1[0])
print("tup2[1:5]:", tup2[1:5])
Output:
tup1[0]: physics
tup2[1:5]: (2, 3, 4, 5)
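The difference between mutable lists and immutable tuples can be seen by trying to assign to an element. This is a minimal sketch; the flag tuple_changed is a hypothetical name used only to record what happened:

```python
tup = (1, 2, 3)
lst = [1, 2, 3]

lst[0] = 99              # lists can be modified in place

try:
    tup[0] = 99          # tuples cannot: this raises TypeError
    tuple_changed = True
except TypeError as err:
    tuple_changed = False
    print("TypeError:", err)

print(lst, tup)
```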
Strings:
Program code:
var1 = 'Hello World!'
var2 = "Python Programming"
print("var1[0:5]:", var1[0:5])
print("var2[1:5]:", var2[1:5])
Output:
var1[0:5]: Hello
var2[1:5]: ytho
Importing libraries for Machine learning applications in Python
Best Python libraries for Machine Learning:
Machine Learning, as the name suggests, is the science of programming
computers so that they can learn from different kinds of data. A more
general definition given by Arthur Samuel is: “Machine Learning is the field
of study that gives computers the ability to learn without being explicitly
programmed.” It is typically used to solve a wide range of real-life problems.
In the past, people performed Machine Learning tasks by manually coding all
the algorithms and the mathematical and statistical formulas, which made the
process time-consuming, tedious and inefficient. Today the same work is much
easier and more efficient thanks to various Python libraries, frameworks and
modules. Python is one of the most popular programming languages for this
task and has replaced many other languages in the industry; one of the
reasons is its vast collection of libraries.
Python libraries that are used in Machine Learning include:
Math
Numpy
Pandas
Matplotlib
TensorFlow
Scipy
Keras
PyTorch
Scikit-learn
Theano
1. Python math function
sqrt() is a built-in function of Python's math module that returns the square
root of a number.
Syntax:
math.sqrt(x)
Parameter:
x is any number such that x >= 0
Returns:
The square root of x, as a float.
Program:
# Python3 program to demonstrate the
# sqrt() method
# import the math module
import math
# print the square root of 0
print(math.sqrt(0))
# print the square root of 4
print(math.sqrt(4))
# print the square root of 3.5
print(math.sqrt(3.5))
Output:
0.0
2.0
1.8708286933869707
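Because the parameter must satisfy x >= 0, math.sqrt() raises a ValueError for negative input. One possible way to guard against that is sketched below; safe_sqrt is a hypothetical helper written for this illustration, not part of the math module:

```python
import math

def safe_sqrt(x):
    """Return math.sqrt(x), or None when x is negative (hypothetical helper)."""
    try:
        return math.sqrt(x)
    except ValueError:
        # math.sqrt() rejects negative numbers with ValueError
        return None

print(safe_sqrt(9))
print(safe_sqrt(-1))
```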
2. Python numpy function
NumPy is a very popular Python library for processing large multi-dimensional
arrays and matrices, supported by a large collection of high-level
mathematical functions. It is very useful for fundamental scientific
computations in Machine Learning, particularly for linear algebra, Fourier
transforms and random number generation. High-end libraries like TensorFlow
use NumPy internally for manipulation of tensors.
Program:
# Python program using NumPy
# for some basic mathematical
# operations
import numpy as np
# Creating two arrays of rank 2
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
# Creating two arrays of rank 1
v = np.array([9, 10])
w = np.array([11, 12])
# Inner product of vectors
print(np.dot(v, w), "\n")
# Matrix and Vector product
print(np.dot(x, v), "\n")
# Matrix and matrix product
print(np.dot(x, y))
Output:
219
[29 67]
[[19 22]
[43 50]]
3. Python pandas function
Pandas is a popular Python library for data analysis. It is not directly
related to Machine Learning, but a dataset must be prepared before training,
and Pandas comes in handy here because it was developed specifically for data
extraction and preparation. It provides high-level data structures and a wide
variety of tools for data analysis, including many built-in methods for
grouping, combining and filtering data.
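The filtering and grouping methods mentioned above can be sketched as follows. The column names and values are invented for illustration and do not come from any dataset used in this manual:

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "sales": [10, 20, 5, 15],
})

# Filtering: keep only the rows where sales exceed 8.
high = df[df["sales"] > 8]

# Grouping: total sales per city.
totals = df.groupby("city")["sales"].sum()

print(high)
print(totals)
```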
Program:
# Python program using Pandas for
# arranging a given set of data
# into a table
# importing pandas as pd
import pandas as pd
data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}
data_table = pd.DataFrame(data)
print(data_table)
Output:
4. Python Matplotlib function
Matplotlib is a very popular Python library for data visualization. Like Pandas, it is not directly
related to Machine Learning, but it comes in handy when a programmer wants to visualize the
patterns in the data. It is a 2D plotting library used for creating 2D graphs and plots. A module
named pyplot makes plotting easy, as it provides features to control line styles, font properties,
axis formatting, etc. It provides various kinds of graphs and plots for data visualization, such as
histograms, error charts and bar charts.
Program:
# Python program using Matplotlib
# for forming a linear plot
# importing the necessary packages and modules
import matplotlib.pyplot as plt
import numpy as np
# Prepare the data
x = np.linspace(0, 10, 100)
# Plot the data
plt.plot(x, x, label ='linear')
# Add a legend
plt.legend()
# Show the plot
plt.show()
Output:
Data pre-processing in Python:
Pre-processing refers to the transformations applied to the data before
feeding it to the algorithm.
Data preprocessing is a technique used to convert raw data into a clean
data set. In other words, data gathered from different sources is collected
in a raw format that is not suitable for analysis.
Program:
# Data Preprocessing
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values
output:
Data Frame:
Interpolate and extrapolate missing data and categorical data
Data in the real world are rarely clean and homogeneous. Typically they are incomplete, noisy and
inconsistent, and an important task of a data scientist is to preprocess the data by filling in
missing values. Missing values must be handled because they can lead to wrong predictions or
classifications for any model being used.
Missing values could be: NaN, empty string, ?,-1,-99,-999
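One way to deal with such markers is to map them to NaN first and then interpolate. The sketch below assumes a small made-up series in which "?" and -999 mark missing readings; whether a value such as -1 or -999 is really a sentinel depends on the dataset, so the list passed to isin() is an assumption:

```python
import pandas as pd

# Hypothetical readings; "?" and -999 stand in for missing values.
raw = pd.Series([1.0, "?", 3.0, -999, 5.0])

# Non-numeric markers such as "?" become NaN.
clean = pd.to_numeric(raw, errors="coerce")

# Numeric sentinels are masked out (assumed here to be -999 only).
clean = clean.mask(clean.isin([-999]))

# Linear interpolation fills each gap from its neighbours.
filled = clean.interpolate(method="linear")

print(filled.tolist())
```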
Program:
# Data Preprocessing
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values
# Taking care of missing data
# (SimpleImputer replaces the Imputer class removed in scikit-learn 0.22)
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
Output:
Linear Regression
Aim: To perform linear regression on a given dataset
Theory:
Linear Regression is a machine learning algorithm based on supervised
learning. It performs a regression task: it models a target prediction value
based on independent variables, and is mostly used for finding the
relationship between variables and for forecasting. Regression models differ
in the kind of relationship they assume between the dependent and
independent variables, and in the number of independent variables used.
Linear regression predicts the value of a dependent variable (Y) from a
given independent variable (X). The technique therefore finds a linear
relationship between X (input) and Y (output):
Y = a+bX
X: input training data (univariate – one input variable(parameter))
Y: labels to data (supervised learning)
a: intercept
b: coefficient of X
For example: X is years of experience based on which Y (Salary) may be
estimated by a linear relationship.
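For a single input variable, the intercept a and coefficient b can be estimated with the standard least-squares formulas. The sketch below uses made-up data generated from Y = 2 + 3X, so the fit should recover those two values; it is an illustration of the formula, not the scikit-learn implementation used in the program that follows:

```python
import numpy as np

# Hypothetical data generated from Y = 2 + 3X with no noise.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = 2.0 + 3.0 * X

# Closed-form least-squares estimates for Y = a + bX.
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

print("intercept a:", a)
print("coefficient b:", b)
```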
File1: simple_linear_regression.py
# Simple Linear Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3,
random_state = 0)
# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""
# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)
# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
# Visualising the Test set results
plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
File2: Salary_Data.csv
Results:
VIVA QUESTIONS:
1. Can we use Simple Linear Regression to predict the winner of a football
game?
2. Which class is used in Python to create a simple linear regressor?
3. What is the formula used in a simple linear regression model?
4. What does parameter B in Y = Ax + B indicate in a simple linear regression
model?
5. What is the ideal ratio for dividing training and testing data in a simple
linear regression model?
6. What is the difference between test data, training data and prediction
data in linear regression?
7. What is regression?
8. What is the use of the simple linear regression algorithm?
9. What is the use of the Pandas library, and when do we use it?
10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Multiple linear regression
Aim: To perform multiple linear regression on a given data set
Theory:
The goal of any data analysis is to extract accurate estimates from raw
information. One of the most important and common questions is whether there
is a statistical relationship between a response variable (Y) and
explanatory variables (Xi).
Multiple linear regression is one of the simplest Machine Learning
algorithms. It belongs to the class of supervised learning algorithms, i.e.,
those for which a training dataset is provided.
Example: A data scientist wants to buy a car and uses a multivariate
regression model to estimate its price as a function of engine size,
horsepower, peak RPM, length, width and height.
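With several explanatory variables, the same least-squares idea applies to a matrix of inputs. The sketch below fits made-up data generated from y = 1 + 2*x1 - 3*x2 using NumPy's lstsq; the data and variable names are assumptions for illustration and are unrelated to the 50_Startups.csv file used below:

```python
import numpy as np

# Hypothetical data: y = 1 + 2*x1 - 3*x2 exactly (no noise).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Add a column of ones so that the first coefficient is the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ coeffs = y.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)
```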
Files:
50_Startups.csv
multiple_linear_regression.py
# Multiple Linear Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
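# Encoding categorical data
# (ColumnTransformer replaces the categorical_features argument removed in scikit-learn 0.22;
# the one-hot columns are placed first, followed by the untouched columns)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer([('encoder', OneHotEncoder(), [3])],
                       remainder = 'passthrough', sparse_threshold = 0)
X = np.array(ct.fit_transform(X), dtype = float)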
# Avoiding the Dummy Variable Trap
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,
random_state = 0)
# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""
# Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)
Results:
Prediction values of y: (for test data of x)
1 103015
2 132582
3 132448
4 71976.1
5 178537
6 116161
7 67851.7
8 98791.7
9 113969
10 167921
VIVA QUESTIONS:
1. Can we use Multiple Linear Regression to predict the winner of a football
game?
2. Which class is used in Python to create a multiple linear regressor?
3. What is the formula used in a multiple linear regression model?
5. What is the ideal ratio for dividing training and testing data in a
multiple linear regression model?
6. What is the difference between test data, training data and prediction
data in linear regression?
7. What is regression?
8. What is the use of the multiple linear regression algorithm?
9. What is the use of the NumPy library, and when do we use it?
10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Polynomial Regression
Aim: To perform polynomial regression on a given data set
Theory: Polynomial regression is a form of regression analysis in which the
relationship between the independent variable x and the dependent variable y
is modelled as an nth-degree polynomial in x. Polynomial regression fits a
nonlinear relationship between the value of x and the corresponding
conditional mean of y, denoted E(y | x), and has been used to describe
nonlinear phenomena.
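Although the fitted curve is nonlinear in x, the model is still linear in its coefficients, so ordinary least squares can be applied to the columns 1, x, x^2, and so on. The sketch below uses made-up data generated from y = 1 + 2x + 3x^2 and is only an illustration of that idea, separate from the scikit-learn program that follows:

```python
import numpy as np

# Hypothetical data from y = 1 + 2x + 3x^2 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# Degree-2 design matrix with columns 1, x, x^2.
A = np.vander(x, 3, increasing=True)

# Linear least squares on the polynomial features.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)
```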
Polynomial vs linear regression
Files:
Position_Salaries.csv
polynomial_regression.py
# Polynomial Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
X=pd.DataFrame(X)
y = dataset.iloc[:, 2].values
y=pd.DataFrame(y)
# Splitting the dataset into the Training set and Test set
"""from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"""
# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)"""
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
# Visualising the Linear Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X), color = 'blue')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Visualising the Polynomial Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Visualising the Polynomial Regression results (for higher resolution and smoother curve)
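# np.asarray(X) is needed because min()/max() on a DataFrame iterate over its column labels
X_grid = np.arange(np.asarray(X).min(), np.asarray(X).max(), 0.1)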
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, lin_reg_2.predict(poly_reg.fit_transform(X_grid)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Predicting a new result with Linear Regression
# (predict() expects a 2D array: one row per sample, one column per feature)
lin_reg.predict([[6.5]])
# Predicting a new result with Polynomial Regression
lin_reg_2.predict(poly_reg.fit_transform([[6.5]]))
Results:
VIVA QUESTIONS:
1. Can we use Polynomial Linear Regression to predict the winner of a dice
game?
2. Which class is used in Python to create a polynomial linear regressor?
3. What is the formula used in a polynomial linear regression model?
4. What does parameter B in Y = Ax^2 + B indicate in a polynomial linear
regression model?
5. What is the ideal ratio for dividing training and testing data in a
polynomial linear regression model?
6. What is the difference between supervised and unsupervised machine
learning?
7. What is regression?
8. What is the use of the polynomial linear regression algorithm?
9. What is the use of the Pandas library, and when do we use it?
10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Naïve Bayes algorithm (classification)
Aim: To classify a given data set using the Naïve Bayes algorithm
Theory:
Naïve Bayes is a classification technique based on Bayes’ Theorem with an
assumption of independence among predictors. In simple terms, a Naive Bayes
classifier assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature. For example, a fruit may be
considered to be an apple if it is red, round, and about 3 inches in diameter.
Even if these features depend on each other or upon the existence of the other
features, the classifier treats each of them as contributing independently to
the probability that this fruit is an apple, which is why it is known as
‘Naive’.
A Naive Bayes model is easy to build and particularly useful for very large
data sets. Despite its simplicity, Naive Bayes can perform competitively with
much more sophisticated classification methods on some problems.
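The independence assumption means that the score for each class is its prior probability multiplied by the likelihood of each feature taken separately. The sketch below works through the apple example with invented probabilities; every number in it is an assumption made for illustration only:

```python
# Hypothetical probabilities, invented for illustration only.
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9},
    "other": {"red": 0.3, "round": 0.4},
}

# Naive assumption: multiply per-feature likelihoods as if independent.
scores = {}
for label in priors:
    score = priors[label]
    for feature in ("red", "round"):
        score *= likelihoods[label][feature]
    scores[label] = score

# Normalise so that the posterior probabilities sum to 1.
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}
print(posteriors)
```

The class with the largest posterior is the predicted class.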
Files:
Social_Network_Ads.csv
naive_bayes.py
# Naive Bayes
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25,
random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(
    np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    # The loop body must be indented; color= sets one colour per class
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(
    np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    # The loop body must be indented; color= sets one colour per class
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Results:
Viva questions:
1. Can we use the Naïve Bayes algorithm to predict the winner of a dice game?
2. Which class is used in Python to create a Naïve Bayes classifier?
3. What is the formula used in a Naïve Bayes model?
4. What is the ideal ratio for dividing training and testing data in a Naïve
Bayes model?
6. Is classification supervised or unsupervised machine learning?
7. What is the Naïve Bayes algorithm?
8. What is the use of the Pandas library, and when do we use it?
9. What is the use of the NumPy library, and when do we need it in a Naïve
Bayes model?
10. What is supervised learning?