DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
AD3411: DATA SCIENCE AND ANALYTICS LABORATORY
SYLLABUS
List of Experiments:
Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
1. Working with Numpy arrays
2. Working with Pandas data frames
3. Basic plots using Matplotlib
4. Frequency distributions, Averages, Variability
5. Normal curves, Correlation and scatter plots, Correlation coefficient
6. Regression
7. Z-test
8. T-test
9. ANOVA
10. Building and validating linear models
11. Building and validating logistic models
12. Time series analysis
EX. No.1 IMPLEMENTATION OF NUMPY ARRAYS
Aim:
To write a Python program to implement arrays using the NumPy package.
Algorithm:
1. Start.
2. Import Numpy.
3. Define two arrays.
4. Add a number to each array and print it.
5. Perform unary and binary operations and print it.
6. Stop
Program:
import numpy as np
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
print("Addition")
print(array1 + array2)
print("-" * 20)
print("Subtraction")
print(array1 - array2)
print("-" * 20)
print("Multiplication")
print(array1 * array2)
print("-" * 20)
print("Division")
print(array2 / array1)
print("-" * 40)
print(array1 ** array2)
print("-" * 40)
a = np.array([1, 2, 5, 3])
print("Adding 1 to every element:", a+1)
print("Subtracting 3 from each element:", a-3)
print("Multiplying each element by 10:", a*10)
print("Squaring each element:", a**2)
a *= 2
print("Doubled each element of original array:", a)
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print("\nOriginal array:\n", a)
print("Transpose of array:\n", a.T)
Output:
Addition
[[ 8 10 12]
[14 16 18]]
Subtraction
[[-6 -6 -6]
[-6 -6 -6]]
Multiplication
[[ 7 16 27]
[40 55 72]]
Division
[[7. 4. 3. ]
[2.5 2.2 2. ]]
[[ 1 256 19683]
[ 1048576 48828125 -2118184960]]
Adding 1 to every element: [2 3 6 4]
Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]
Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]
Result:
Thus a python program to implement arrays using Numpy package was written and executed successfully.
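The negative value at the end of the power output above (-2118184960) deserves a note. 6 ** 12 = 2176782336 is larger than the biggest 32-bit signed integer (2147483647), so the result wraps around when the arrays hold 32-bit integers, which appears to be the case on the machine that produced that output. The following is a minimal sketch under that assumption; it sets the dtype explicitly so the behaviour is the same on every platform.

```python
import numpy as np

# Same values as array1 and array2 in the program, with an explicit 32-bit dtype
base = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
exponent = np.array([[7, 8, 9], [10, 11, 12]], dtype=np.int32)

# 32-bit arithmetic: 6 ** 12 does not fit and silently wraps around
wrapped = base ** exponent

# 64-bit arithmetic: every result fits, so the values are exact
exact = base.astype(np.int64) ** exponent.astype(np.int64)

print("int32 result:", wrapped[1, 2])   # -2118184960
print("int64 result:", exact[1, 2])     # 2176782336
```

On systems where NumPy's default integer is already 64-bit, the original program prints 2176782336 in that position.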
EX. No.2 IMPLEMENTATION OF PANDAS DATAFRAMES
Aim:
To write a Python program that works with Pandas DataFrames using some of the basic functions of the
pandas package.
Algorithm:
1. Start
2. Import pandas package.
3. Create a CSV file of student data (type the data in an Excel sheet and save it in .csv format).
4. Load the file into a dataframe.
5. Use the various basic functions of pandas package
a. Filtering data – Selects only the rows that satisfy a condition
b. head() – Displays the first few rows
c. tail() – Displays the last few rows
d. describe() – Gives summary statistics of the numeric columns
e. info() – Gives information about the columns, data types and memory usage
f. iloc[] – Displays selected rows and columns by position
g. sort_values('Name', ascending=False) – Sorting in descending order
h. sort_values('Name', ascending=True) – Sorting in ascending order
6. Print result of each function.
7. Stop
Program:
import pandas as pd
df = pd.read_csv("stud_data.csv")
filtered_data = df[df['Age'] > 19]
print("Filtered DataFrame:")
print(filtered_data)
print("Printing first few data")
print(df.head())
print("Printing last few data ")
print(df.tail())
print("Table Description")
print(df.describe())
print("Table Information")
print(df.info())
print("Printing first few data")
print(df.iloc[:2,:2])
print("Descending order")
print(df.sort_values("Name", ascending=False))
print("Ascending order")
print(df.sort_values("Name", ascending=True))
Sample csv file : stud_data.csv (draw it in the left side of the notebook)
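If no spreadsheet is at hand, the sample file can also be generated from Python. The sketch below rebuilds the ten rows that appear in the head() and tail() output of this experiment and writes them as CSV. The temporary-folder path is an assumption made for this sketch only; replace it with "stud_data.csv" to create the file next to the lab program.

```python
import os
import tempfile
import pandas as pd

# Rows copied from the head()/tail() output shown in this experiment
students = pd.DataFrame({
    "Roll No.": list(range(1, 11)),
    "Name": ["Ajith", "Arun", "Ben", "Colen", "Dishone",
             "Evin", "Esha", "Fino", "Gene", "Harry"],
    "Age": [17, 18, 19, 17, 18, 18, 17, 19, 19, 19],
})

# Assumption: write to a temporary folder; use "stud_data.csv" for the lab program
csv_path = os.path.join(tempfile.gettempdir(), "stud_data.csv")
students.to_csv(csv_path, index=False)

# Read the file back, exactly as the lab program does
df = pd.read_csv(csv_path)
print(df.shape)

# No student is older than 19, so this filter returns an empty DataFrame
print(df[df["Age"] > 19])
```

Because the maximum age in the sample is 19, the filter `Age > 19` selects nothing, which is why the output starts with "Empty DataFrame".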
Output:
Filtered DataFrame:
Empty DataFrame
Columns: [Roll No., Name, Age]
Index: []
Printing first few data
Roll No. Name Age
0 1 Ajith 17
1 2 Arun 18
2 3 Ben 19
3 4 Colen 17
4 5 Dishone 18
Printing last few data
Roll No. Name Age
5 6 Evin 18
6 7 Esha 17
7 8 Fino 19
8 9 Gene 19
9 10 Harry 19
Table Description
Roll No. Age
count 10.00000 10.000000
mean 5.50000 18.100000
std 3.02765 0.875595
min 1.00000 17.000000
25% 3.25000 17.250000
50% 5.50000 18.000000
75% 7.75000 19.000000
max 10.00000 19.000000
Table Information
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Roll No. 10 non-null int64
1 Name 10 non-null object
2 Age 10 non-null int64
dtypes: int64(2), object(1)
memory usage: 368.0+ bytes
None
Printing first few data
Roll No. Name
0 1 Ajith
1 2 Arun
Descending order
Roll No. Name Age
9 10 Harry 19
8 9 Gene 19
7 8 Fino 19
5 6 Evin 18
6 7 Esha 17
4 5 Dishone 18
3 4 Colen 17
2 3 Ben 19
1 2 Arun 18
0 1 Ajith 17
Ascending order
Roll No. Name Age
0 1 Ajith 17
1 2 Arun 18
2 3 Ben 19
3 4 Colen 17
4 5 Dishone 18
6 7 Esha 17
5 6 Evin 18
7 8 Fino 19
8 9 Gene 19
9 10 Harry 19
Result:
Thus the program to work with Pandas DataFrames using the pandas package was written and executed, and the
output was verified successfully.
Ex. No.3 IMPLEMENTATION OF BASIC PLOTS USING MATPLOTLIB
Aim:
To write a Python program to implement basic plots in Python using Matplotlib.
Algorithm:
1. Start
2. Import the numpy and matplotlib packages.
3. Create two arrays for points x and y.
4. Design the plots using the method
a. Dashed plot: Use plot() method.
b. Scatter plot: Use scatter() method.
c. Bar plot: Use bar() method.
d. Histogram: Generate x using the random.normal() method and use hist() method.
e. Pie chart: Use pie() method.
5. Display the graph using show() method. Assign labels/legends if needed.
6. Stop
Program:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95])
y = np.array([240, 250, 260, 310])
# Dashed line plot
plt.plot(x, y, linestyle='dashed')
plt.xlabel("Temperature")
plt.ylabel("Fahrenheit")
plt.title("Measure")
plt.show()
#Scatter plot
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])
plt.scatter(x, y)
plt.show()
#Bar Graph
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])
plt.bar(x, y, color='red')
plt.show()
#Histogram
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
# Pie Charts
y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels=mylabels)
plt.legend()
plt.show()
Result:
Thus the python program to implement basic plots using Matplotlib was written, executed and verified
successfully.
EX. No.4. IMPLEMENTATION OF FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY
4a) Average and Variability
Aim:
To implement Average and Variability using Python’s packages.
Algorithm:
1. Import the necessary Python packages.
2. Create a list of elements.
3. Find the average and variance using the predefined functions.
4. Print the result.
Program:
import numpy as np
list1= [2, 4, 4, 4, 5, 5, 7, 9]
print("Average:", np.average(list1))
print("Variance:", np.var(list1))
Output:
Average: 5.0
Variance: 4.0
Result:
Thus the program to implement Average and Variability using Python’s packages was written and executed
successfully.
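np.var() divides by n by default, so the value 4.0 above is the population variance. The sketch below, using the same list, shows the related measures of variability side by side; the variable names are chosen for illustration only.

```python
import numpy as np

list1 = [2, 4, 4, 4, 5, 5, 7, 9]

mean = np.mean(list1)
population_variance = np.var(list1)        # divides by n (ddof=0, the default)
sample_variance = np.var(list1, ddof=1)    # divides by n - 1
population_std = np.std(list1)             # square root of the population variance
value_range = np.max(list1) - np.min(list1)

print("Mean:", mean)
print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
print("Population standard deviation:", population_std)
print("Range:", value_range)
```

Use ddof=1 when the list is a sample drawn from a larger population and the variance of that population is being estimated.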
4b) Frequency Distribution
Aim:
To implement Frequency Distribution using Python’s packages.
Algorithm:
1. Import the necessary packages.
2. Read a dataset
3. Note the frequency of occurrence of any item in the table.
4. Display the frequency table.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('iris.csv')
freq_table = pd.crosstab(data['species'], 'no_of_species')
print(freq_table)
Output
Result:
Thus the program to implement Frequency Distribution using Python’s packages was written and executed
successfully.
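When iris.csv is not available, the same idea can be tried on a small in-memory column. The sketch below uses a hypothetical list of six species labels (not taken from the iris dataset) and builds a frequency table with value_counts(), which is an alternative to pd.crosstab() for a single column.

```python
import pandas as pd

# Hypothetical labels for illustration; not the real iris data
species = pd.Series(
    ["setosa", "versicolor", "setosa", "virginica", "versicolor", "setosa"],
    name="species",
)

# Absolute frequency of each category
freq = species.value_counts().sort_index()

# Relative frequency (proportion of the total)
rel = species.value_counts(normalize=True).sort_index()

freq_table = pd.DataFrame({"frequency": freq, "relative_frequency": rel})
print(freq_table)
```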
EX. NO.5 IMPLEMENTATION OF NORMAL CURVES AND CORRELATION AND SCATTER PLOTS
Aim :
To implement Python programs for normal curves, correlation and scatter plots using different Python
libraries.
5a) Normal Curves
Algorithm
Step1: Import the needed modules
Step2: Create data points
Step3: Calculate the mean and standard deviation
Step4: Calculate normal probability density
Step5: Plot using above calculated values
Step6: Display plot
Program
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
# Plot between -20 and 20 with 0.01 steps.
x_axis = np.arange(-20, 20, 0.01)
# Calculating mean and standard deviation
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()
Output
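The curve drawn by norm.pdf() follows the normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The sketch below evaluates that formula directly and compares it with SciPy. It assumes a standard normal (mu = 0, sigma = 1) instead of the mean and standard deviation of the plotting grid used in the program above.

```python
import numpy as np
from scipy.stats import norm

# Assumed parameters for this sketch: the standard normal distribution
mu, sigma = 0.0, 1.0
step = 0.01
x = np.arange(-4, 4, step)

# Normal probability density written out from its formula
pdf_manual = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# The same density from SciPy
pdf_scipy = norm.pdf(x, mu, sigma)
print("Formula matches SciPy:", np.allclose(pdf_manual, pdf_scipy))

# The area under the curve over [-4, 4) should be very close to 1
area = np.sum(pdf_manual) * step
print("Approximate area under the curve:", round(area, 4))
```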
5b) Correlation and scatter plots
Algorithm
Step 1: Importing the libraries.
Step 2: Finding the Correlation between two variables.
Step 3: Plotting the graph using scatter plot.
Step 4: Add the title and labelling
Program
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
# Pearson correlation coefficient between x and y
correlation = y.corr(x)
print(correlation)
# plot the data
plt.scatter(x, y)
# fit the best-fitting line to the data and draw it
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')
# add the title
plt.title('Correlation')
# label the axes
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.show()
Output
Result:
Thus the Python programs for normal curves, correlation and scatter plots using different Python libraries were
implemented and verified successfully.
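The correlation coefficient returned by corr() is Pearson's r, the sum of the products of the deviations of x and y divided by the square root of the product of their sums of squared deviations. A minimal sketch that computes it by hand for the same two series and compares the result with pandas:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5, 6, 7])
y = pd.Series([1, 2, 3, 4, 3, 5, 4])

# Deviations of each value from its mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson's r from its definition
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same coefficient from pandas
r_pandas = y.corr(x)

print("Manual r:", round(r_manual, 4))
print("pandas r:", round(r_pandas, 4))
```

For this data the sum of cross-deviations is 15 and the product of the two sums of squares is 304, so r = 15 / sqrt(304), a strong positive correlation.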
Ex.No.6 REGRESSION
Aim : To implement python programs for Regression using python libraries.
Linear Regression:
- Simple linear regression is an approach for predicting a response using a single feature.
- It is assumed that the two variables are linearly related.
- Find a linear function that predicts the response value (y) as accurately as possible as a function of the
feature or independent variable (x).
Algorithm
Step 1: Importing the libraries.
Step 2: Assign the mean for x and y.
Step 3: Calculating cross-deviation and deviation about x
Step 4: Calculating regression coefficients
Step 5: Plot the actual points as scatter plot
Step 6: Predict response vector
Step 7: plot the regression line
Step 8: Assign the labels for x and y
Step 9: Assign the values for observations / data
Step 10: Estimating the coefficients
Step 11: Plot the Regression line
Program
import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
Result:
Thus the Python code for regression using Python libraries was implemented and verified successfully.
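The coefficients estimated above can be cross-checked with NumPy's built-in least-squares fit. The sketch below uses np.polyfit() on the same data and also computes the coefficient of determination (R squared), which is not part of the original program but gives a quick measure of how well the line fits.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Degree-1 polynomial fit returns the slope first, then the intercept
slope, intercept = np.polyfit(x, y, 1)
print("b_0 =", intercept)
print("b_1 =", slope)

# R squared: share of the variance of y explained by the fitted line
y_pred = intercept + slope * x
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot
print("R squared =", round(r_squared, 4))
```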
Ex.No.7 Z-TEST
Aim : To implement python programs for Z-test.
Algorithm
Step 1: Import the libraries
Step 2: Generate a random array of 50 numbers having mean 110 and sd 15/sqrt(50)
Step 3: Print mean and sd
Step 4: Perform the test.
Step 5: Pass the mean value under the null hypothesis; in the alternative hypothesis, check
whether the mean is larger
Step 6: The function outputs a z-score and the corresponding p-value; compare the p-value
with alpha. If it is not less than alpha, we fail to reject the null
hypothesis
Step 7: Else we reject it.
Program
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest
# Generate a random array of 50 numbers with mean 110.
# Note: the standard deviation used below is 15/sqrt(50), not 15.
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50)+mean_iq
# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))
# Now perform the test. We pass the data, the null-hypothesis mean in the
# value parameter, and alternative='larger' to check whether the mean is larger.
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')
# The function outputs a z-score and the corresponding p-value. We compare the
# p-value with alpha: if it is less than alpha we reject the null hypothesis,
# otherwise we fail to reject it.
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
Output
mean=110.22 stdv=2.37
Reject Null Hypothesis
Result:
Thus the Python code for Z-test was implemented and verified successfully.
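The decision rule used in this experiment can be traced by hand: z = (sample mean - null mean) / (s / sqrt(n)), and for the alternative "larger" the p-value is the upper-tail probability P(Z > z). The sketch below is an illustration with a small hypothetical sample (eight made-up scores, chosen so the arithmetic is easy to follow) and uses only the standard library and NumPy. It uses the sample standard deviation (ddof=1), which to my understanding is also what ztest() uses by default.

```python
import math
import numpy as np

# Hypothetical scores for illustration; not the random data of the experiment
data = np.array([108, 112, 109, 111, 110, 113, 107, 110], dtype=float)
null_mean = 100
alpha = 0.05

n = data.size
# Standard error of the mean, using the sample standard deviation
standard_error = np.std(data, ddof=1) / math.sqrt(n)

# z-score of the sample mean under the null hypothesis
z_score = (np.mean(data) - null_mean) / standard_error

# Upper-tail probability of the standard normal distribution, P(Z > z)
p_value = 0.5 * math.erfc(z_score / math.sqrt(2))

print("z =", round(z_score, 4))
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```

A z-test normally assumes a large sample or a known population standard deviation; eight values are used here only to keep the numbers readable.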
Ex.No.8 T-TEST
Aim : To implement python programs for T test.
Algorithm
Step 1: Import the library
Step 2: Create the data groups
Step 3: Perform the two sample t-test with equal variances
Program
# Python program to demonstrate how to
# perform a two-sample t-test
# Import the libraries
import numpy as np
import scipy.stats as stats
# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])
# Perform the two-sample t-test with equal variances and print the result
print(stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True))
Output
Ttest_indResult(statistic=-0.6337397070250238, pvalue=0.5300471010405257)
Result:
Thus the Python code for T-test was implemented and verified successfully.
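The statistic reported above can be reproduced from the equal-variance (pooled) formula: t = (mean1 - mean2) / sqrt(sp^2 (1/n1 + 1/n2)), where sp^2 is the pooled sample variance. A minimal sketch with NumPy only, on the same two groups:

```python
import numpy as np

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

n1, n2 = data_group1.size, data_group2.size

# Pooled variance: weighted average of the two sample variances
pooled_variance = (
    (n1 - 1) * np.var(data_group1, ddof=1) + (n2 - 1) * np.var(data_group2, ddof=1)
) / (n1 + n2 - 2)

# t statistic for the difference between the two means
t_statistic = (np.mean(data_group1) - np.mean(data_group2)) / np.sqrt(
    pooled_variance * (1 / n1 + 1 / n2)
)
degrees_of_freedom = n1 + n2 - 2

print("t =", t_statistic)
print("degrees of freedom =", degrees_of_freedom)
```

Since the p-value in the output (about 0.53) is far above 0.05, the test does not reject the hypothesis that the two group means are equal.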
Ex.No.9 ANOVA
Aim: To implement a python code for ANOVA
Algorithm
Step 1: Import python library packages
Step 2: Create the data groups
Step 3: Conduct the one-way ANOVA
Program
# Importing library
from scipy.stats import f_oneway
# Performance when each of the engine
# oil is applied
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]
# Conduct the one-way ANOVA
print(f_oneway(performance1, performance2, performance3, performance4))
Output
F_onewayResult(statistic=4.625000000000002, pvalue=0.01633645983978022)
Result:
Thus the python code for ANOVA was implemented and verified successfully.
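The F statistic reported by f_oneway() is the ratio of the between-group mean square to the within-group mean square. The sketch below recomputes it step by step for the same four groups, using NumPy only.

```python
import numpy as np

groups = [
    np.array([89, 89, 88, 78, 79]),
    np.array([93, 92, 94, 89, 88]),
    np.array([89, 88, 89, 93, 90]),
    np.array([81, 78, 81, 92, 82]),
]

k = len(groups)                          # number of groups
n_total = sum(g.size for g in groups)    # total number of observations
grand_mean = np.mean(np.concatenate(groups))

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of the values around their own group mean
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_statistic = ms_between / ms_within

print("F =", f_statistic)
```

Here the between-group mean square is 81.4 and the within-group mean square is 17.6, giving F = 4.625. With a p-value of about 0.016, the group means differ significantly at the 5% level.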
Ex.No.10 BUILDING AND VALIDATING LINEAR MODEL
Aim:
To implement a program to build and validate a linear model using Python libraries.
Algorithm:
Step1: Import the packages and classes that you need.
Step2: Provide data to work with, and eventually do appropriate transformations.
Step3: Create a regression model and fit it with existing data.
Step4: Check the results of model fitting to know whether the model is satisfactory.
Step5: Apply the model for predictions.
Program
import openturns as ot
import openturns.viewer as viewer
from matplotlib import pylab as plt
ot.Log.Show(ot.Log.NONE)
N = 1000
Xsample = ot.Triangular(1.0, 5.0, 10.0).getSample(N)
Ysample = Xsample * 3.0 + ot.Normal(0.5, 1.0).getSample(N)
particularXSample = ot.Triangular(1.0, 5.0, 10.0).getSample(N)
result = ot.LinearModelAlgorithm(Xsample, Ysample).getResult()
# Get the coefficients ai
print("coefficients of the linear regression model = ", result.getCoefficients())
# Get the confidence intervals of the ai coefficients
print(
"confidence intervals of the coefficients = ",
ot.LinearModelAnalysis(result).getCoefficientsConfidenceInterval(0.9),)
graph = ot.VisualTest.DrawLinearModel(Xsample, Ysample, result)
view = viewer.View(graph)
graph = ot.VisualTest.DrawLinearModelResidual(Xsample, Ysample, result)
view = viewer.View(graph)
resultLinearModelFisher = ot.LinearModelTest.LinearModelFisher(
    Xsample, Ysample, result, 0.10
)
print("Test Success ? ", resultLinearModelFisher.getBinaryQualityMeasure())
print("p-value of the LinearModelFisher Test = ", resultLinearModelFisher.getPValue())
print("p-value threshold = ", resultLinearModelFisher.getThreshold())
resultLinearModelResidualMean = ot.LinearModelTest.LinearModelResidualMean(
    Xsample, Ysample, result, 0.10
)
print("Test Success ? ", resultLinearModelResidualMean.getBinaryQualityMeasure())
print(
    "p-value of the LinearModelResidualMean Test = ",
    resultLinearModelResidualMean.getPValue(),
)
print("p-value threshold = ", resultLinearModelResidualMean.getThreshold())
plt.show()
Output:
coefficients of the linear regression model = [0.592409,2.98159]
confidence intervals of the coefficients = [0.435545, 0.749274]
[2.95382, 3.00935]
Test Success ? False
p-value of the LinearModelFisher Test = 0.0
p-value threshold = 0.1
Test Success ? True
p-value of the LinearModelResidualMean Test = 0.9999999999998426
p-value threshold = 0.1
Result:
Thus the python program to build and validate a linear model was implemented successfully.
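Where OpenTURNS is not installed, the build-and-validate steps of the algorithm can be illustrated with NumPy alone. The sketch below is not the method of the program above: it assumes a similar synthetic setup (y = 3x plus noise with mean 0.5), a fixed random seed, and a simple hold-out split, and it judges the fit by R squared and RMSE on the held-out part.

```python
import numpy as np

# Assumed synthetic data, similar in spirit to the experiment
rng = np.random.default_rng(0)
n = 200
x = rng.triangular(1.0, 5.0, 10.0, size=n)
y = 3.0 * x + rng.normal(0.5, 1.0, size=n)

# Hold out the last quarter of the points for validation
split = int(0.75 * n)
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]

# Build: fit a straight line on the training part
slope, intercept = np.polyfit(x_train, y_train, 1)

# Validate: predict the held-out points and measure the error
y_pred = intercept + slope * x_test
residuals = y_test - y_pred
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y_test - y_test.mean()) ** 2)
rmse = np.sqrt(np.mean(residuals ** 2))

print("slope =", round(slope, 3), "intercept =", round(intercept, 3))
print("R squared on held-out data =", round(r_squared, 3))
print("RMSE on held-out data =", round(rmse, 3))
```

A slope near 3, an R squared close to 1 and an RMSE near the noise standard deviation (1.0) indicate that the model generalises to data it was not fitted on.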
Ex.No.11 BUILDING AND VALIDATING LOGISTIC MODEL
Aim: To implement a logistic regression model using python libraries .
Algorithm
Step 1: Import the required libraries.
Step 2: Get the data.
Step 3: Create a model and train it
Step 4: Evaluate the model of Logistic Regression
Program:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
# Step 2: Get data
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1])
# Step 3: Create a model and train it
model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)
model.fit(x, y)
# Step 4: Evaluate the model
p_pred = model.predict_proba(x)
y_pred = model.predict(x)
score_ = model.score(x, y)
conf_m = confusion_matrix(y, y_pred)
report = classification_report(y, y_pred)
print('x:', x, sep='\n')
print('intercept:', model.intercept_)
print('coef:', model.coef_, end='\n\n')
print('p_pred:', p_pred, sep='\n', end='\n\n')
print('y_pred:', y_pred, end='\n\n')
print('score_:', score_, end='\n\n')
print('conf_m:', conf_m, sep='\n', end='\n\n')
print('report:', report, sep='\n')
Output:
x:
[[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]]
intercept: [-1.51632619]
coef: [[0.703457]]
p_pred:
[[0.81999686 0.18000314]
[0.69272057 0.30727943]
[0.52732579 0.47267421]
[0.35570732 0.64429268]
[0.21458576 0.78541424]
[0.11910229 0.88089771]
[0.06271329 0.93728671]
[0.03205032 0.96794968]
[0.0161218 0.9838782 ]
[0.00804372 0.99195628]]
y_pred: [0 0 0 1 1 1 1 1 1 1]
score_: 0.8
conf_m:
[[2 1]
[1 6]]
report:
precision recall f1-score support
0 0.67 0.67 0.67 3
1 0.86 0.86 0.86 7
accuracy 0.80 10
macro avg 0.76 0.76 0.76 10
weighted avg 0.80 0.80 0.80 10
Result: Thus the python program for logistic regression model using python libraries was
implemented successfully.
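The probabilities in the second column of p_pred can be reproduced from the printed intercept and coefficient, since logistic regression models P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 x))). The sketch below plugs in the rounded values from the output above, so the results agree with p_pred only to about five decimal places.

```python
import numpy as np

# Values copied from the printed output of the experiment (rounded)
intercept = -1.51632619
coef = 0.703457

x = np.arange(10)
y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1])

# Sigmoid of the linear predictor gives the probability of class 1
p_class1 = 1.0 / (1.0 + np.exp(-(intercept + coef * x)))

# Predict class 1 when the probability is at least 0.5
y_pred = (p_class1 >= 0.5).astype(int)

# Accuracy: share of correct predictions
accuracy = np.mean(y_pred == y)

print("P(y=1):", np.round(p_class1, 4))
print("y_pred:", y_pred)
print("accuracy:", accuracy)
```

The two misclassified points (x = 1 and x = 3) are the off-diagonal entries of the confusion matrix.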
Ex.No.12 TIME SERIES ANALYSIS
Aim:
To implement Time series Analysis using python libraries.
Algorithm:
Step1: Import the packages and classes that you need.
Step2: Read data.
Step3: Convert to dataframe.
Step4: Plot the model
Program:
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('path_to_file/stock.csv')
df = pd.DataFrame(data, columns = ['ValueDate', 'Price'])
# Set the Date as Index
df['ValueDate'] = pd.to_datetime(df['ValueDate'])
df.index = df['ValueDate']
del df['ValueDate']
df.plot(figsize=(15, 6))
plt.show()
Output:
Result: Thus the python program for time series analysis using python libraries
was implemented successfully.
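Once the date column is the index, pandas offers time-aware operations beyond plotting. The sketch below uses six hypothetical prices (stock.csv from the experiment is not reproduced here) to show a 3-day rolling mean and the day-to-day change. The column names RollingMean3 and DailyChange are chosen for illustration.

```python
import pandas as pd

# Hypothetical prices for illustration; not the contents of stock.csv
df = pd.DataFrame({
    "ValueDate": pd.date_range("2023-01-02", periods=6, freq="D"),
    "Price": [100.0, 102.0, 101.0, 105.0, 107.0, 110.0],
}).set_index("ValueDate")

# Smooth the series with a 3-day rolling mean (first two rows are NaN)
df["RollingMean3"] = df["Price"].rolling(window=3).mean()

# Difference between each price and the previous day's price
df["DailyChange"] = df["Price"].diff()

print(df)
```

Calling df[["Price", "RollingMean3"]].plot() afterwards would draw the raw and smoothed series on the same axes.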