Data Science & Analytics Lab Manual

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

AD3411: DATA SCIENCE AND ANALYTICS LABORATORY

SYLLABUS

List of Experiments:

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh

1. Working with Numpy arrays

2. Working with Pandas data frames

3. Basic plots using Matplotlib

4. Frequency distributions, Averages, Variability

5. Normal curves, Correlation and scatter plots, Correlation coefficient

6. Regression

7. Z-test

8. T-test

9. ANOVA

10. Building and validating linear models

11. Building and validating logistic models

12. Time series analysis

EX. No.1 IMPLEMENTATION OF NUMPY ARRAYS


Aim:

To write a Python program to implement arrays using the NumPy package.

Algorithm:

1. Start.
2. Import Numpy.
3. Define two arrays.
4. Add a number to each array and print it.
5. Perform unary and binary operations and print it.
6. Stop

Program:

import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])

array2 = np.array([[7, 8, 9], [10, 11, 12]])

print("Addition")

print(array1 + array2)

print("-" * 20)

print("Subtraction")

print(array1 - array2)

print("-" * 20)

print("Multiplication")

print(array1 * array2)

print("-" * 20)

print("Division")

print(array2 / array1)

print("-" * 40)

print(array1 ** array2)

print("-" * 40)

a = np.array([1, 2, 5, 3])

print ("Adding 1 to every element:", a+1)

print ("Subtracting 3 from each element:", a-3)


print ("Multiplying each element by 10:", a*10)

print ("Squaring each element:", a**2)

a *= 2

print ("Doubled each element of original array:", a)

a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])

print ("\nOriginal array:\n", a)

print ("Transpose of array:\n", a.T)

Output:

Addition

[[ 8 10 12]

[14 16 18]]

Subtraction

[[-6 -6 -6]

[-6 -6 -6]]

Multiplication

[[ 7 16 27]

[40 55 72]]

Division

[[7. 4. 3. ]

[2.5 2.2 2. ]]

[[ 1 256 19683]

[ 1048576 48828125 -2118184960]]

Adding 1 to every element: [2 3 6 4]

Subtracting 3 from each element: [-2 -1 2 0]


Multiplying each element by 10: [10 20 50 30]

Squaring each element: [ 1 4 25 9]

Doubled each element of original array: [ 2 4 10 6]

Original array:

[[1 2 3]

[3 4 5]

[9 6 0]]

Transpose of array:

[[1 3 9]

[2 4 6]

[3 5 0]]
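Note: the large negative value in the exponentiation output above is an integer overflow, since NumPy arrays use fixed-width integers (32-bit by default on some platforms). A minimal sketch that avoids the overflow by computing the powers in floating point, assuming the same arrays as above:

# Sketch: promote to float64 so large powers do not overflow
print(array1.astype(np.float64) ** array2)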

Result:

Thus a Python program to implement arrays using the NumPy package was written and executed successfully.

EX. No.2 IMPLEMENTATION OF PANDAS DATAFRAMES

Aim:
To write a Python program to implement Pandas DataFrames using some of the basic functions of Python's pandas package.

Algorithm:

1. Start
2. Import pandas package.
3. Create a CSV file of student data. [Type the data in an Excel sheet and save it in .csv format.]
4. Load the file into a DataFrame.
5. Use the various basic functions of the pandas package:
a. Filtering data – selects only the rows that satisfy a condition
b. head() – displays the first few rows
c. tail() – displays the last few rows
d. describe() – gives summary statistics of the table
e. info() – gives information about the table
f. iloc[] – selects particular rows and columns by position
g. sort_values('Name', ascending=False) – sorting in descending order
h. sort_values('Name', ascending=True) – sorting in ascending order
6. Print result of each function.
7. Stop

Program:

import pandas as pd

df = pd.read_csv("stud_data.csv")

filtered_data = df[df['Age'] > 19]

print("Filtered DataFrame:")

print(filtered_data)

print("Printing first few data")

print(df.head())

print("Printing last few data ")

print(df.tail())

print("Table Description")

print(df.describe())

print("Table Information")

print(df.info())

print("Printing first few data")

print(df.iloc[:2,:2])

print("Descending order")
print(df.sort_values("Name", ascending=False))

print("Ascending order")

print(df.sort_values("Name", ascending=True))

Sample CSV file: stud_data.csv (draw it on the left side of the notebook)
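For reference, the contents of stud_data.csv can be inferred from the output below; a minimal sketch that generates an equivalent file:

import pandas as pd

# Recreate the student data shown in the output below and save it as stud_data.csv
students = pd.DataFrame({
    "Roll No.": list(range(1, 11)),
    "Name": ["Ajith", "Arun", "Ben", "Colen", "Dishone", "Evin", "Esha", "Fino", "Gene", "Harry"],
    "Age": [17, 18, 19, 17, 18, 18, 17, 19, 19, 19],
})
students.to_csv("stud_data.csv", index=False)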

Output:

Filtered DataFrame:

Empty DataFrame

Columns: [Roll No., Name, Age]

Index: []

Printing first few data

Roll No. Name Age

0 1 Ajith 17

1 2 Arun 18

2 3 Ben 19

3 4 Colen 17

4 5 Dishone 18

Printing last few data

Roll No. Name Age

5 6 Evin 18
6 7 Esha 17

7 8 Fino 19

8 9 Gene 19

9 10 Harry 19

Table Description

Roll No. Age

count 10.00000 10.000000

mean 5.50000 18.100000

std 3.02765 0.875595

min 1.00000 17.000000

25% 3.25000 17.250000

50% 5.50000 18.000000

75% 7.75000 19.000000

max 10.00000 19.000000

Table Information

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 10 entries, 0 to 9

Data columns (total 3 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Roll No. 10 non-null int64

1 Name 10 non-null object

2 Age 10 non-null int64

dtypes: int64(2), object(1)

memory usage: 368.0+ bytes

None

Printing first few data

Roll No. Name

0 1 Ajith
1 2 Arun

Descending order

Roll No. Name Age

9 10 Harry 19

8 9 Gene 19

7 8 Fino 19

5 6 Evin 18

6 7 Esha 17

4 5 Dishone 18

3 4 Colen 17

2 3 Ben 19

1 2 Arun 18

0 1 Ajith 17

Ascending order

Roll No. Name Age

0 1 Ajith 17

1 2 Arun 18

2 3 Ben 19

3 4 Colen 17

4 5 Dishone 18

6 7 Esha 17

5 6 Evin 18

7 8 Fino 19

8 9 Gene 19

9 10 Harry 19

Result:
Thus the program to work with Pandas DataFrames using Python's pandas package was written, executed and the output verified successfully.

Ex. No.3 IMPLEMENTATION OF BASIC PLOTS USING MATPLOTLIB

Aim:
To write a Python program to implement basic plots in Python using Matplotlib.
Algorithm:
1. Start
2. Import the numpy and matplotlib packages.
3. Create two arrays for points x and y.
4. Design the plots using the following methods:
a. Dashed plot: use the plot() method.
b. Scatter plot: use the scatter() method.
c. Bar plot: use the bar() method.
d. Histogram: generate x using np.random.normal() and use the hist() method.
e. Pie chart: use the pie() method.
5. Display the graph using the show() method. Assign labels/legends if needed.
6. Stop

Program:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95])

y = np.array([240, 250, 260, 310])

# Dashed plot

plt.plot(x, y, linestyle='dashed')
plt.xlabel("Temperature")
plt.ylabel("Fahrenheit")
plt.title("Measure")
plt.show()
#Scatter plot

x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])

y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

plt.scatter(x, y)
plt.show()
#Bar Graph
x = np.array(["A", "B", "C", "D"])

y = np.array([3, 8, 1, 10])

plt.bar(x, y, color='red')

plt.show()

#Histogram

x = np.random.normal(170, 10, 250)
plt.hist(x)


plt.show()

# Pie chart

y = np.array([35, 25, 25, 15])

mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels=mylabels)
plt.legend()
plt.show()
Result:

Thus the Python program to implement basic plots using Matplotlib was written, executed and verified successfully.
EX. No.4. IMPLEMENTATION OF FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY

4a) Average and Variability

Aim:

To implement average and variability using Python's packages.

Algorithm:

1. Import the necessary Python packages.

2. Create a list of elements.

3. Find the average and variance using the predefined functions.

4. Print the result.

Program:

import numpy as np

list1= [2, 4, 4, 4, 5, 5, 7, 9]

print("Average:", np.average(list1))

print("Variance:", np.var(list1))

Output:

Average: 5.0

Variance: 4.0
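As a cross-check, the variance is the mean of squared deviations from the average (a sketch, assuming list1 from the program above):

# Manual cross-check of the variance reported above
arr = np.array(list1)
print("Variance (manual):", ((arr - arr.mean()) ** 2).mean())   # 4.0, matching np.var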

Result:

Thus the program to implement average and variability using Python's packages was written and executed successfully.

4b) Frequency Distribution

Aim:

To implement Frequency Distribution using Python’s packages.

Algorithm:

1. Import the necessary packages.


2. Read a dataset
3. Note the frequency of occurrence of any item in the table.
4. Display the frequency table.

Program:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

data = pd.read_csv('iris.csv')

freq_table = pd.crosstab(data['species'], 'no_of_species')

freq_table
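An equivalent frequency count can also be obtained with value_counts() (a sketch, assuming the same iris.csv with a 'species' column):

# Alternative sketch: count how often each species occurs
print(data['species'].value_counts())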

Output

Result:

Thus the program to implement Frequency Distribution using Python’s packages was written and executed
successfully.
EX. NO.5 IMPLEMENTATION OF NORMAL CURVES AND CORRELATION AND SCATTER PLOTS

Aim :

To implement Python programs for normal curves, correlation and scatter plots using different Python libraries.

5a) Normal Curves

Algorithm

Step 1: Import the needed modules.

Step 2: Create data points.

Step 3: Calculate the mean and standard deviation.

Step 4: Calculate the normal probability density.

Step 5: Plot using the above calculated values.

Step 6: Display the plot.

Program

import numpy as np

import matplotlib.pyplot as plt

from scipy.stats import norm


import statistics

# Plot between -20 and 20 with 0.01 steps.

x_axis = np.arange(-20, 20, 0.01)

# Calculating mean and standard deviation

mean = statistics.mean(x_axis)

sd = statistics.stdev(x_axis)

plt.plot(x_axis, norm.pdf(x_axis, mean, sd))

plt.show()
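For clarity, norm.pdf evaluates the Gaussian probability density; the same curve can be computed manually from the formula (a sketch, assuming x_axis, mean and sd from the program above):

# Sketch: the same bell curve computed from the Gaussian density formula
manual_pdf = (1 / (sd * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x_axis - mean) / sd) ** 2)
plt.plot(x_axis, manual_pdf)
plt.show()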

Output

5b) Correlation and scatter plots

Algorithm

Step 1: Import the libraries.

Step 2: Find the correlation between the two variables.

Step 3: Plot the graph using a scatter plot.

Step 4: Add the title and axis labels.

Step 5: Display the plot.

Program
import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

y = pd.Series([1, 2, 3, 4, 3, 5, 4])

x = pd.Series([1, 2, 3, 4, 5, 6, 7])

# correlation between the two variables
correlation = y.corr(x)

print("Correlation:", correlation)

# plotting the data as a scatter plot
plt.scatter(x, y)

# fit and draw the best fitting (least-squares) line
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')

# adding the title and axis labels
plt.title('Correlation')

plt.xlabel('x axis')

plt.ylabel('y axis')

plt.show()

Output

Result:

Thus the Python programs for normal curves, correlation and scatter plots using different Python libraries were implemented and verified successfully.
Ex.No.6 REGRESSION

Aim: To implement Python programs for regression using Python libraries.

Linear Regression:

- Simple linear regression is an approach for predicting a response using a single feature.

- It is assumed that the two variables are linearly related.

- The goal is to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature or independent variable (x).
Algorithm

Step 1: Importing the libraries.

Step 2: Assign the mean for x and y.

Step 3: Calculating cross-deviation and deviation about x

Step 4: Calculating regression coefficients

Step 5: Plot the actual points as scatter plot

Step 6: Predict response vector

Step 7: plot the regression line

Step 8: Assign the labels for x and y

Step 9: Assign the values for observations / data

Step 10: Estimating the coefficients

Step 11: Plot the Regression line

Program
import numpy as np

import matplotlib.pyplot as plt


def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)


def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()


def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)


if __name__ == "__main__":
    main()

Output

Estimated coefficients:

b_0 = 1.2363636363636363

b_1 = 1.1696969696969697
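As a quick validation, NumPy's polyfit gives the same least-squares coefficients (a sketch, assuming the x and y arrays used in main()):

# Cross-check: degree-1 polyfit returns the slope and intercept
slope, intercept = np.polyfit(x, y, 1)
print("polyfit slope and intercept:", slope, intercept)   # about 1.1697 and 1.2364, matching b_1 and b_0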
Result:
Thus the Python code for regression using various Python libraries was implemented and verified successfully.

Ex.No.7 Z-TEST

Aim: To implement a Python program for the Z-test.


Algorithm

Step 1: Import the libraries.

Step 2: Generate a random array of 50 numbers having mean 110 and sd 15.

Step 3: Print the mean and sd.

Step 4: Perform the test.

Step 5: Pass the mean value of the null hypothesis; in the alternative hypothesis we check whether the mean is larger.

Step 6: The function outputs a p-value and a z-score; compare the p-value with alpha. If it is greater than alpha, we do not reject the null hypothesis.

Step 7: Else we reject it.

Program

import math

import numpy as np

from numpy.random import randn

from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers having mean 110 and sd 15

# similar to the IQ scores data we assume above

mean_iq = 110

sd_iq = 15/math.sqrt(50)
alpha =0.05

null_mean =100

data = sd_iq*randn(50)+mean_iq

# print mean and sd

print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# now we perform the test. In this function, we pass the data; in the value parameter
# we pass the mean value of the null hypothesis, and in the alternative hypothesis
# we check whether the mean is larger

ztest_Score, p_value= ztest(data,value = null_mean, alternative='larger')

# the function outputs a p_value and z-score corresponding to that value; we compare the
# p-value with alpha, and if it is greater than alpha we do not reject the null hypothesis,
# else we reject it.

if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")


Output

mean=110.22 stdv=2.37
Reject Null Hypothesis
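As a cross-check, the z-score reported by ztest can be recomputed from its definition (a sketch, assuming the data array and null_mean from the program above; exact values vary because the data are random):

# Manual z-statistic: (sample mean - hypothesised mean) / standard error of the mean
z_manual = (np.mean(data) - null_mean) / (np.std(data, ddof=1) / np.sqrt(len(data)))
print("manually computed z-score:", z_manual)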

Result:

Thus the Python code for the Z-test was implemented and verified successfully.

Ex.No.8 T-TEST

Aim: To implement a Python program for the T-test.

Algorithm

Step 1: Import the library

Step 2: Create the data groups


Step 3: Perform the two sample t-test with equal variances

Program

# Python program to demonstrate how to

# perform two sample T-test

# Import the libraries

import numpy as np

import scipy.stats as stats

# Creating data groups

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,

17, 16, 14, 19, 20, 21, 15,

15, 16, 16, 13, 14, 12])

data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,

19, 19, 14, 17, 22, 24, 16,

13, 16, 13, 18, 15, 13])

# Perform the two sample t-test with equal variances

stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)

Output

Ttest_indResult(statistic=-0.6337397070250238, pvalue=0.5300471010405257)
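Since the p-value (about 0.53) is greater than the usual alpha of 0.05, we fail to reject the null hypothesis that the two group means are equal. A minimal decision sketch, assuming the same data groups:

# Compare the p-value against alpha to make the decision explicit
alpha = 0.05
t_stat, p_value = stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")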

Result:
Thus the Python code for the T-test was implemented and verified successfully.

Ex.No.9 ANOVA

Aim: To implement a python code for ANOVA

Algorithm

Step 1: Import the Python library packages

Step 2: Create the data groups

Step 3: Conduct the one-way ANOVA

Program

# Importing library
from scipy.stats import f_oneway

# Performance when each type of engine oil is applied
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]

# Conduct the one-way ANOVA


f_oneway(performance1, performance2, performance3, performance4)

Output

F_onewayResult(statistic=4.625000000000002, pvalue=0.01633645983978022)
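Since the p-value (about 0.016) is below 0.05, the ANOVA indicates a significant difference between at least two of the engine-oil groups. A minimal decision sketch, assuming the same performance lists:

# Unpack the F-statistic and p-value and compare against alpha = 0.05
f_stat, p_value = f_oneway(performance1, performance2, performance3, performance4)
if p_value < 0.05:
    print("Reject the null hypothesis: the group means differ")
else:
    print("Fail to reject the null hypothesis")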

Result:
Thus the Python code for ANOVA was implemented and verified successfully.

Ex.No.10 BUILDING AND VALIDATING LINEAR MODEL

Aim:

To implement a program to build and validate a linear model using Python libraries.

Algorithm:

Step1: Import the packages and classes that you need.

Step2: Provide data to work with, and eventually do appropriate transformations.

Step3: Create a regression model and fit it with existing data.

Step4: Check the results of model fitting to know whether the model is satisfactory.

Step5: Apply the model for predictions.

Program

import openturns as ot

import openturns.viewer as viewer

from matplotlib import pylab as plt

ot.Log.Show(ot.Log.NONE)

N = 1000

Xsample = ot.Triangular(1.0, 5.0, 10.0).getSample(N)

Ysample = Xsample * 3.0 + ot.Normal(0.5, 1.0).getSample(N)

particularXSample = ot.Triangular(1.0, 5.0, 10.0).getSample(N)

result = ot.LinearModelAlgorithm(Xsample, Ysample).getResult()


# Get the coefficients ai

print("coefficients of the linear regression model = ", result.getCoefficients())

# Get the confidence intervals of the ai coefficients

print(

"confidence intervals of the coefficients = ",

ot.LinearModelAnalysis(result).getCoefficientsConfidenceInterval(0.9),)

graph = ot.VisualTest.DrawLinearModel(Xsample, Ysample, result)

view = viewer.View(graph)

graph = ot.VisualTest.DrawLinearModelResidual(Xsample, Ysample, result)

view = viewer.View(graph)

resultLinearModelFisher = ot.LinearModelTest.LinearModelFisher(
    Xsample, Ysample, result, 0.10
)

print("Test Success ? ", resultLinearModelFisher.getBinaryQualityMeasure())

print("p-value of the LinearModelFisher Test = ", resultLinearModelFisher.getPValue())

print("p-value threshold = ", resultLinearModelFisher.getThreshold())

resultLinearModelResidualMean = ot.LinearModelTest.LinearModelResidualMean(
    Xsample, Ysample, result, 0.10
)

print("Test Success ? ", resultLinearModelResidualMean.getBinaryQualityMeasure())

print(
    "p-value of the LinearModelResidualMean Test = ",
    resultLinearModelResidualMean.getPValue(),
)

print("p-value threshold = ", resultLinearModelResidualMean.getThreshold())


plt.show()

Output:

coefficients of the linear regression model = [0.592409,2.98159]

confidence intervals of the coefficients = [0.435545, 0.749274]

[2.95382, 3.00935]

Test Success ? False

p-value of the LinearModelFisher Test = 0.0

p-value threshold = 0.1

Test Success ? True

p-value of the LinearModelResidualMean Test = 0.9999999999998426

p-value threshold = 0.1
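Step 5 of the algorithm (applying the model for predictions) is not shown in the program; a minimal sketch that uses the fitted coefficients directly, where x_new is a hypothetical input value:

# Sketch (assumption): predict y for a new x from the fitted coefficients a0 + a1*x
coeffs = result.getCoefficients()
x_new = 4.0   # hypothetical input
print("predicted y for x =", x_new, ":", coeffs[0] + coeffs[1] * x_new)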


Result:

Thus the Python program to build and validate a linear model was implemented successfully.

Ex.No.11 BUILDING AND VALIDATING LOGISTIC MODEL

Aim: To implement a logistic regression model using Python libraries.

Algorithm

Step 1: Import the required libraries.


Step 2: Get the data.

Step 3: Create a model and train it

Step 4: Evaluate the model of Logistic Regression

Program:

import numpy as np

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import classification_report, confusion_matrix

# Step 2: Get data

x = np.arange(10).reshape(-1, 1)

y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1])

# Step 3: Create a model and train it

model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)

model.fit(x, y)

# Step 4: Evaluate the model

p_pred = model.predict_proba(x)

y_pred = model.predict(x)

score_ = model.score(x, y)
conf_m = confusion_matrix(y, y_pred)
report = classification_report(y, y_pred)

print('x:', x, sep='\n')

print('intercept:', model.intercept_)

print('coef:', model.coef_, end='\n\n')

print('p_pred:', p_pred, sep='\n', end='\n\n')

print('y_pred:', y_pred, end='\n\n')

print('score_:', score_, end='\n\n')

print('conf_m:', conf_m, sep='\n', end='\n\n')

print('report:', report, sep='\n')

Output:

x:

[[0]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]]
intercept: [-1.51632619]

coef: [[0.703457]]

p_pred:

[[0.81999686 0.18000314]

[0.69272057 0.30727943]

[0.52732579 0.47267421]

[0.35570732 0.64429268]

[0.21458576 0.78541424]

[0.11910229 0.88089771]

[0.06271329 0.93728671]

[0.03205032 0.96794968]

[0.0161218 0.9838782 ]

[0.00804372 0.99195628]]

y_pred: [0 0 0 1 1 1 1 1 1 1]

score_: 0.8

conf_m:

[[2 1]
[1 6]]

report:

precision recall f1-score support

0 0.67 0.67 0.67 3

1 0.86 0.86 0.86 7

accuracy 0.80 10

macro avg 0.76 0.76 0.76 10

weighted avg 0.80 0.80 0.80 10
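A short usage sketch: the trained model can also score new, hypothetical inputs with the standard scikit-learn predict and predict_proba methods (new_x here is an assumed example, not part of the original data):

# Hypothetical new feature values to classify with the fitted model
new_x = np.array([[2.5], [7.5]])
print(model.predict(new_x))         # predicted classes
print(model.predict_proba(new_x))   # class probabilities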

Result: Thus the Python program for the logistic regression model using Python libraries was implemented successfully.
Ex.No.12 TIME SERIES ANALYSIS

Aim:

To implement Time series Analysis using python libraries.

Algorithm:

Step1: Import the packages and classes that you need.

Step2: Read data.

Step3: Convert to dataframe.

Step4: Plot the model

Program:

from datetime import datetime

import pandas as pd

import matplotlib.pyplot as plt

data = pd.read_csv('path_to_file/stock.csv')

df = pd.DataFrame(data, columns = ['ValueDate', 'Price'])

# Set the Date as Index


df['ValueDate'] = pd.to_datetime(df['ValueDate'])

df.index = df['ValueDate']

del df['ValueDate']

df.plot(figsize=(15, 6))

plt.show()
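If a stock.csv file is not available, a synthetic file with the expected ValueDate and Price columns can be generated first (a sketch with assumed dates and random-walk prices; adjust the file name to match the path used in read_csv above):

import numpy as np
import pandas as pd

# Sketch (assumption): build 100 days of synthetic prices and save them as stock.csv
dates = pd.date_range("2023-01-01", periods=100, freq="D")
prices = 100 + np.cumsum(np.random.randn(100))
pd.DataFrame({"ValueDate": dates, "Price": prices}).to_csv("stock.csv", index=False)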

Output:

Result: Thus the Python program for time series analysis using Python libraries was implemented successfully.
