AD8412 - DATA ANALYTICS LAB
1. Implement the following functions on a list of BMI values for people living in a rural area:
bmi_list = [29, 18, 20, 22, 19, 25, 30, 28, 22, 21, 18, 19, 20, 20, 22, 23]
(i) random.choice()
(ii) random.sample()
(iii) random.randint()
PROGRAM:
import random
from random import sample
def BMI(height, weight):
    # BMI = weight (kg) divided by height (m) squared
    bmi = weight/(height**2)
    return bmi
bmi_list = [29, 18, 20, 22, 19, 25, 30, 28, 22, 21, 18, 19, 20, 20, 22, 23]
height = 1.79832   # metres
weight = 70        # kilograms
bmi = BMI(height, weight)
print("The BMI is", format(bmi), "so ", end='')
# each branch only runs if the previous thresholds were not met
if bmi < 18.5:
    print("underweight")
elif bmi < 24.9:
    print("Healthy")
elif bmi < 30:
    print("overweight")
else:
    print("Suffering from Obesity")
output:
The BMI is 21.64532402096181 so Healthy
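The program above classifies a single BMI value. A minimal sketch of applying the same thresholds to every entry of bmi_list, assuming the cut-offs 18.5, 24.9 and 30 used above; the helper name bmi_category is hypothetical:

```python
from collections import Counter

bmi_list = [29, 18, 20, 22, 19, 25, 30, 28, 22, 21, 18, 19, 20, 20, 22, 23]

def bmi_category(bmi):
    """Return the category label for one BMI value (hypothetical helper)."""
    if bmi < 18.5:
        return "underweight"
    elif bmi < 24.9:
        return "Healthy"
    elif bmi < 30:
        return "overweight"
    else:
        return "Suffering from Obesity"

# Classify every value and count how many people fall into each category
categories = [bmi_category(b) for b in bmi_list]
counts = Counter(categories)
for label, n in counts.items():
    print(label, n)
```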
(i) random.choice()
print(random.choice(bmi_list))
output:
30
(ii) random.sample()
print(sample(bmi_list,3))
output: [18, 25, 22]
(iii) random.randint()
print(random.randint(0, 12))
output: 9
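The three calls can be combined in one run. This sketch seeds the generator (the seed 42 is an arbitrary choice) so the picks are reproducible, and uses random.randint() to draw a valid index into bmi_list. Note that randint(a, b) includes both end points, so the upper bound must be len(bmi_list) - 1:

```python
import random

bmi_list = [29, 18, 20, 22, 19, 25, 30, 28, 22, 21, 18, 19, 20, 20, 22, 23]

random.seed(42)  # fix the seed so repeated runs give the same picks

one_value = random.choice(bmi_list)          # a single element
three_values = random.sample(bmi_list, 3)    # three distinct positions (values may still repeat)
idx = random.randint(0, len(bmi_list) - 1)   # randint includes BOTH end points
value_at_idx = bmi_list[idx]

print(one_value, three_values, idx, value_at_idx)
```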
2. Use the random.choices() function to select multiple random items from a sequence with
repetition.
For example, you have a list of names and want to choose four names from it at random,
and it is acceptable if a name repeats.
names = ["Roger", "Nadal", "Novac", "Andre", "Sarena", "Mariya", "Martina", "Kumar"]
PROGRAM:
import random
names = ["Roger", "Nadal", "Novac", "Andre", "Sarena", "Mariya", "Martina", "Kumar"]
# choose four random names with replacement (repetition allowed)
sample_list3 = random.choices(names, k=4)
print(sample_list3)
Output:
['Novac', 'Novac', 'Martina', 'Sarena']
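A short sketch contrasting random.choices() (with replacement) and random.sample() (without replacement) on the same list. The seed and the weights are illustrative values, not part of the original exercise:

```python
import random

names = ["Roger", "Nadal", "Novac", "Andre", "Sarena", "Mariya", "Martina", "Kumar"]

random.seed(7)
with_repeats = random.choices(names, k=4)     # with replacement: a name may repeat
without_repeats = random.sample(names, k=4)   # without replacement: all four differ

# Optional relative weights: "Kumar" is three times as likely as each other name
weights = [1, 1, 1, 1, 1, 1, 1, 3]
weighted = random.choices(names, weights=weights, k=4)

print(with_repeats)
print(without_repeats)
print(weighted)
```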
3. Write a Python program to demonstrate the use of the sample() function with string and tuple
types.
PROGRAM:
import random
string = "Welcome World"
print("With string:", random.sample(string, 4))
output: With string: ['r', 'm', 'W', 'W']
tuple1 = ("Selshia", "AI", "computer", "science", "Jansons", "Engineering", "btech")
print("With tuple:", random.sample(tuple1, 4))
output:
With tuple: ['Jansons', 'Selshia', 'btech', 'Engineering']
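random.sample() returns a list regardless of the input sequence type. A sketch of converting the results back to a string or tuple, assuming that is what the caller wants:

```python
import random

string = "Welcome World"
tuple1 = ("Selshia", "AI", "computer", "science", "Jansons", "Engineering", "btech")

# sample() always returns a list, whatever the input sequence type
chars = random.sample(string, 4)
words = random.sample(tuple1, 4)

# Convert the results back to the original types if needed
shuffled_string = "".join(random.sample(string, len(string)))  # shuffled copy of the string
sampled_tuple = tuple(words)

print(chars, shuffled_string, sampled_tuple)
```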
4. Write a Python script to implement the Z-Test for the following problem:
A school principal claims that the students of the school are more intelligent than the average student.
On calculating the IQ scores of 50 students, the average turns out to be 110. The mean of
the population IQ is 100 and the standard deviation is 15. Check whether the principal's
claim is right or not at a 5% significance level.
PROGRAM:
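import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest
mean_iq = 110                  # sample mean IQ of the 50 students
sd_iq = 15/math.sqrt(50)       # standard error of the mean: sigma / sqrt(n)
alpha = 0.05                   # significance level
null_mean = 100                # population mean under the null hypothesis
# simulate 50 scores around the sample mean (spread set to the standard error)
data = sd_iq*randn(50) + mean_iq
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))
# one-sided test: the alternative hypothesis is that the mean is larger than 100
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")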
OUTPUT:
mean=109.65 stdv=2.06
Reject Null Hypothesis
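Because the problem gives summary statistics rather than raw scores, the z statistic can also be computed directly as z = (sample_mean - pop_mean) / (pop_sd / sqrt(n)) without simulating any data. A minimal sketch under that assumption; the one-sided p-value is the standard normal upper-tail probability written with math.erfc:

```python
import math

sample_mean = 110   # mean IQ of the 50 students
pop_mean = 100      # population mean under the null hypothesis
pop_sd = 15         # known population standard deviation
n = 50
alpha = 0.05

# z = (sample mean - population mean) / (sigma / sqrt(n))
standard_error = pop_sd / math.sqrt(n)
z = (sample_mean - pop_mean) / standard_error

# One-sided (right-tail) p-value: P(Z > z) for a standard normal Z
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"z = {z:.3f}, p = {p_value:.2e}")
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```

Here z is about 4.71, well above the one-sided 5% critical value of 1.645, so the null hypothesis is rejected, in agreement with the simulated version.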
5. Write a Python program to demonstrate the ‘T-Test’ with suitable libraries on sample
student data. (Create and use dataset of your own)
PROGRAM:
import pandas as pd
from scipy import stats
# paired data: the same subjects measured under two conditions (Brand 1 and Brand 2)
df = pd.read_csv("paired_ttest - paired_ttest.csv")
tscore, pvalue = stats.ttest_rel(df['Brand 1'], df['Brand 2'])
alpha = 0.20
print(tscore, pvalue)
if pvalue > alpha:
    print("Fail to reject the null hypothesis")
else:
    print("Reject null hypothesis")
output:
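The program above depends on a CSV file that is not included here. A self-contained sketch of the same paired t-test on a small made-up dataset (hypothetical marks of 8 students before and after a revision class), with the t statistic also computed by hand as a check; alpha = 0.05 is an assumption:

```python
import numpy as np
from scipy import stats

# Hypothetical marks of the same 8 students before and after a revision class
before = np.array([62, 70, 58, 75, 66, 80, 55, 68])
after = np.array([65, 74, 57, 79, 70, 83, 60, 71])
alpha = 0.05

# ttest_rel(a, b) tests whether the mean of a - b is zero
t_scipy, p_scipy = stats.ttest_rel(after, before)

# Same statistic by hand: t = mean(d) / (sd(d) / sqrt(n)), with d = after - before
d = after - before
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

print(f"t = {t_scipy:.3f}, p = {p_scipy:.4f}")
if p_scipy < alpha:
    print("Reject null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```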
6. Import the necessary libraries in Python for implementing the ‘One-Way ANOVA Test’ on a
sample dataset. (Create and use dataset of your own)
Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
# load data file
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/anova/onewayanova.txt",
                 sep="\t")
# reshape the DataFrame into long format, suitable for the statsmodels package
df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C', 'D'])
# replace column names
df_melt.columns = ['index', 'treatments', 'value']
# generate a boxplot to see the data distribution by treatments. Using boxplot, we can
# easily detect the differences between different treatments
ax = sns.boxplot(x='treatments', y='value', data=df_melt)
ax = sns.swarmplot(x="treatments", y="value", data=df_melt)
plt.show()
# one-way ANOVA: test whether the four treatment means are equal
fvalue, pvalue = stats.f_oneway(df['A'], df['B'], df['C'], df['D'])
print(fvalue, pvalue)
output:
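A self-contained sketch of a one-way ANOVA on made-up data (hypothetical exam scores for three teaching methods), computing the F statistic both with scipy.stats.f_oneway and by hand from the between-group and within-group sums of squares:

```python
import numpy as np
from scipy import stats

# Hypothetical exam scores of students taught with three different methods
method_a = np.array([78, 82, 85, 80, 79])
method_b = np.array([72, 75, 70, 74, 73])
method_c = np.array([88, 90, 85, 87, 91])
groups = [method_a, method_b, method_c]

f_scipy, p_value = stats.f_oneway(*groups)

# Same F statistic by hand: F = (SS_between / (k - 1)) / (SS_within / (N - k))
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, N = len(groups), len(all_scores)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))

print(f"F = {f_scipy:.3f}, p = {p_value:.2e}")
```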
7. Import the necessary libraries in Python for implementing the ‘Two-Way ANOVA Test’ on a
sample dataset. (Create and use dataset of your own)
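A minimal sketch using statsmodels (ols from statsmodels.formula.api and anova_lm from statsmodels.stats.anova) on a small made-up balanced dataset. The column names fertilizer, water and height and all the values are hypothetical:

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical plant heights (cm): 2 fertilizers x 2 watering levels, 3 plants per cell
df = pd.DataFrame({
    "fertilizer": ["A"] * 6 + ["B"] * 6,
    "water": (["low"] * 3 + ["high"] * 3) * 2,
    "height": [20, 22, 21, 25, 27, 26, 23, 24, 22, 30, 31, 29],
})

# Linear model with both main effects and their interaction
model = ols("height ~ C(fertilizer) + C(water) + C(fertilizer):C(water)", data=df).fit()

# Type II ANOVA table: sum of squares, df, F and p-value for each term
anova_table = anova_lm(model, typ=2)
print(anova_table)
```

The interaction row tests whether the effect of watering depends on the fertilizer; for a balanced design like this one, Type I and Type II sums of squares coincide.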
8. Consider a dataset that has a value of the response y for every value of the feature x.
Generate a regression line for this sample data using Python.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)
def plot_regression_line(x, y, b):
    # plot the actual points as a scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plot the regression line
    plt.plot(x, y_pred, color = "g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
def main():
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)
if __name__ == "__main__":
    main()
OUTPUT:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
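As a cross-check of estimate_coef(), numpy.polyfit with degree 1 fits the same least-squares line. This sketch also computes the coefficient of determination R², which is an addition and not part of the original program:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# polyfit with degree 1 returns [slope, intercept] (highest power first)
b_1, b_0 = np.polyfit(x, y, 1)

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
y_pred = b_0 + b_1 * x
r_squared = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"b_0 = {b_0:.4f}, b_1 = {b_1:.4f}, R^2 = {r_squared:.3f}")
```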
9. Import scipy and draw the line of Linear Regression for the following data:
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
Here the x-axis represents the age of each car and the y-axis its speed: the age and speed
of 13 cars were recorded as they passed a tollbooth.
PROGRAM:
import matplotlib.pyplot as plt
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    # y value on the fitted line for a given x
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.xlabel('Age')
plt.ylabel('Speed Of Cars')
plt.show()
OUTPUT:
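stats.linregress() also returns the correlation coefficient, and the fitted line can be used for prediction. A sketch, assuming one wants the predicted speed of a 10-year-old car; the helper predict_speed is hypothetical:

```python
import numpy as np
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# The result object exposes slope, intercept, rvalue, pvalue and stderr
result = stats.linregress(x, y)

def predict_speed(age):
    """Speed predicted by the fitted line for a car of the given age (hypothetical helper)."""
    return result.slope * age + result.intercept

print(f"r = {result.rvalue:.3f}")
print(f"Predicted speed of a 10-year-old car: {predict_speed(10):.2f}")
```

A negative r indicates that older cars in this sample tend to pass the tollbooth more slowly.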
10. Implement the time series analysis concept for a sample dataset using Pandas.
(Create and use dataset of your own)
Refer to program 12.
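A minimal sketch of common pandas time series operations on a made-up daily series (the dates, values and seed are assumptions): resampling, rolling windows, differencing, lagging and date-based indexing:

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for the first 90 days of 2023 (upward trend plus noise)
dates = pd.date_range("2023-01-01", periods=90, freq="D")
rng = np.random.default_rng(0)
sales = pd.Series(100 + np.arange(90) * 0.5 + rng.normal(0, 5, 90), index=dates)

monthly_total = sales.resample("MS").sum()     # aggregate to calendar months
weekly_mean = sales.rolling(window=7).mean()   # 7-day moving average
day_to_day = sales.diff()                      # change from the previous day
lag_1 = sales.shift(1)                         # yesterday's value aligned with today

print(monthly_total)
print(sales.loc["2023-02"].describe())         # select February by partial date string
```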
11. Write a Python program to visualize the time series concepts using Matplotlib.
(Create and use dataset of your own)
Refer to program 12.
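A minimal Matplotlib sketch on a made-up monthly series with a trend and a yearly seasonal pattern, overlaying a 12-month moving average as a trend estimate. The data and the output file name visitors.png are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly visitor counts over 3 years: linear trend plus yearly seasonality
dates = pd.date_range("2021-01-01", periods=36, freq="MS")
months = np.arange(36)
visitors = pd.Series(500 + 5 * months + 80 * np.sin(2 * np.pi * months / 12), index=dates)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(visitors.index, visitors, label="Monthly visitors", color="tab:blue")
# A centred 12-month window averages out the seasonal cycle, leaving the trend
ax.plot(visitors.index, visitors.rolling(12, center=True).mean(),
        label="12-month moving average (trend)", color="tab:red")
ax.set(title="Visitors per month", xlabel="Date", ylabel="Visitors")
ax.legend()
fig.tight_layout()
fig.savefig("visitors.png")  # in an interactive session, plt.show() can be used instead
```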
12. Demonstrate various time series models using Python. (Create and use dataset of your own)
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
# load the monthly sales data, using the 'date' column as a DatetimeIndex
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()
plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
OUTPUT:
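The program above only plots the series. A sketch of three simple forecasting models (naive, moving average, and simple exponential smoothing via pandas ewm) on made-up monthly data, compared by mean absolute error; the window of 3 and alpha = 0.3 are arbitrary choices:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly demand over 2 years (trend plus noise)
rng = np.random.default_rng(1)
dates = pd.date_range("2022-01-01", periods=24, freq="MS")
demand = pd.Series(200 + 3 * np.arange(24) + rng.normal(0, 10, 24), index=dates)

# 1. Naive model: forecast for next month = this month's value
naive = demand.shift(1)

# 2. Moving-average model: forecast = mean of the previous 3 months
moving_avg = demand.rolling(window=3).mean().shift(1)

# 3. Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_{t-1}
alpha = 0.3
ses = demand.ewm(alpha=alpha, adjust=False).mean().shift(1)

# Compare one-step-ahead forecasts with mean absolute error (MAE)
for name, forecast in [("Naive", naive), ("Moving average", moving_avg), ("Exp. smoothing", ses)]:
    mae = (demand - forecast).abs().mean()   # NaN rows (no forecast yet) are skipped
    print(f"{name:15s} MAE = {mae:.2f}")
```

The models are scored on slightly different numbers of months, because the moving average needs three prior values before it can produce a forecast.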