
B.Tech - Artificial Intelligence and Data Science

AD3411 - DATA SCIENCE AND ANALYTICS LABORATORY

II Year/IV Semester

LAB MANUAL
EXP NO: 1 Working with Pandas data frames
Date:

AIM: To work with Pandas data frames.

ALGORITHM:

Step1: Start

Step2: import pandas module

Step3: Create a dataframe using the dictionary

Step4: Print the output

Step5: Stop

PROGRAM:

import pandas as pd

# Create a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df.head())

# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)

# Add a derived Boolean column
df['Senior'] = df['Age'] > 30
print(df)

# Group by City and compute the mean Age
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)
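Note: In practice the data usually comes from a file rather than a hard-coded dictionary. The sketch below is illustrative only; the file name students.csv is a hypothetical example assumed to contain the same Name, Age, and City columns.

import pandas as pd

# Hypothetical file: students.csv with columns Name, Age, City
df = pd.read_csv('students.csv')

# The same operations from the program above work unchanged
print(df.head())
print(df[df['Age'] > 30])
print(df.groupby('City')['Age'].mean())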
OUTPUT:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

      Name  Age     City
2  Charlie   35  Chicago

      Name  Age         City  Senior
0    Alice   25     New York   False
1      Bob   30  Los Angeles   False
2  Charlie   35      Chicago    True

City
Chicago        35.0
Los Angeles    30.0
New York       25.0
Name: Age, dtype: float64

RESULT:

Thus working with Pandas data frames was successfully completed.
EXP NO: 2 Basic plots using Matplotlib
Date:

AIM:

To draw basic plots in a Python program using Matplotlib.

ALGORITHM:

Step1: Start

Step2: import Matplotlib module

Step3: Create a basic plot using Matplotlib

Step4: Print the output

Step5: Stop

PROGRAM:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

y = [2, 3, 5, 7, 11]

plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=10)

plt.xlabel('X-axis')

plt.ylabel('Y-axis')

plt.title('Customized Line Plot')

plt.show()
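Note: The plot window produced by plt.show() is not captured in this manual. If a copy of the figure is needed for the record, it can be saved to an image file before calling plt.show(). The sketch below is a minimal illustration; the file name line_plot.png is only an example.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=10)
plt.title('Customized Line Plot')

# Save the figure to a PNG file (example name) before displaying it
plt.savefig('line_plot.png', dpi=150, bbox_inches='tight')
plt.show()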
OUTPUT:

RESULT:

Thus the basic plots using Matplotlib in a Python program were successfully completed.
EXP NO: 3A Frequency distributions, Averages, variability
Date:

AIM:

To write a Python program to find the frequency distribution, averages, and variability in a Jupyter notebook.

ALGORITHM:

Step 1: Start the Program

Step 2: Import the python library modules

Step 3: Write the code to compute the frequency distribution, mean, and variability

Step 4: Print the result

Step 5: Stop the program

PROGRAM:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

#sample data

data=[12, 15, 12, 19, 15, 13, 14, 12, 16, 17, 15, 18, 19, 20, 16, 15]

#1. frequency distribution using pandas for convenience

frequency_distribution = pd.Series(data).value_counts().sort_index()

# 2. average(mean)

mean=np.mean(data)
#3. variability(variance and standard deviation)
variance=np.var(data)
std_deviation=np.std(data)

#printing results
print("frequency distribution:")
print(frequency_distribution)
print("\nAverage(mean):",mean)
print("Variance:",variance)
print("Standard deviation:", std_deviation)

#4. plotting the frequency distribution as a bar chart


plt.figure(figsize=(5,3))
frequency_distribution.plot(kind='bar',color='skyblue')
plt.title("frequency distribution")
plt.xlabel('value')
plt.ylabel('frequency')
plt.xticks(rotation=0)
plt.show()
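Note: The same frequency distribution and summary statistics can be cross-checked without Pandas or NumPy, using only the Python standard library. The sketch below is an alternative check on the same data list defined above; the population variance and standard deviation match the np.var and np.std defaults.

from collections import Counter
import statistics

data = [12, 15, 12, 19, 15, 13, 14, 12, 16, 17, 15, 18, 19, 20, 16, 15]

# Frequency distribution with the standard library
counts = Counter(data)
for value in sorted(counts):
    print(value, counts[value])

# Mean and population variance/standard deviation
print("Mean:", statistics.mean(data))
print("Population variance:", statistics.pvariance(data))
print("Population std deviation:", statistics.pstdev(data))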

OUTPUT:

Frequency distribution:
12 3
13 1
14 1
15 4
16 2
17 1
18 1
19 2
20 1
Name: count, dtype: int64

Average (mean): 15.5


Variance: 6.25
Standard deviation: 2.5
RESULT:

Thus the computation of frequency distribution, averages, and variability was successfully completed.
EXP NO: 4A Normal curves, Correlation and scatter plots, Correlation coefficient
Date:

AIM:

To create normal curves, correlation and scatter plots, and the correlation coefficient using a Python program.

ALGORITHM:

Step 1: Start the program

Step 2: Import the numpy, matplotlib, seaborn, and scipy.stats packages

Step 3: Create the distribution

Step 4: Visualizing the distribution

Step 5: Stop the program

PROGRAM:

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from scipy.stats import pearsonr

np.random.seed(42)

data = np.random.normal(loc=0, scale=1, size=1000)  # Normal distribution with mean=0, std=1

#Plot the normal curve

plt.figure(figsize=(5, 3))

sns.histplot(data, kde=True, color='blue', stat='density', linewidth=0)

plt.title('Normal Distribution Curve')

plt.xlabel('x')
plt.ylabel('Density')

plt.show()

#2. Scatter Plot and Correlation Coefficient

# Generating two sets of data that have a linear relationship

x = np.random.rand(100) * 100  # Random data for X

y = 2 * x + 5 + np.random.randn(100) * 10  # Linear relationship with some noise

# Scatter plot

plt.figure(figsize=(5, 3))

plt.scatter(x, y, color='green')

plt.title('Scatter Plot of X vs Y')

plt.xlabel('x')

plt.ylabel('Y')

plt.show()

#Calculate the correlation coefficient

corr_coefficient, _ = pearsonr(x, y)

print(f"Correlation Coefficient between X and Y: {corr_coefficient:.2f}")


OUTPUT:

RESULT:

Thus the normal curves, correlation and scatter plots, and correlation coefficient using a Python program were successfully completed.
EXP NO: 5
Date:
Simple Linear Regression

AIM:

To write a Python program for Simple Linear Regression

ALGORITHM:

Step 1: Start the Program

Step 2: Import numpy and matplotlib package

Step 3: Define coefficient function

Step 4: Calculate cross-deviation and deviation about x

Step 5: Calculate regression coefficients

Step 6: Plot the Linear regression and define main function

Step 7: Print the result

Step 8: Stop the process

PROGRAM:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

np.random.seed(0)

X = np.random.rand(100) * 10

Y = 2.5 * X + np.random.normal(0, 2, 100)

plt.scatter(X, Y, color='blue', alpha=0.7)

plt.xlabel('X')

plt.ylabel('Y')

plt.title('Scatter Plot: X vs Y')

plt.show()
X = X.reshape(-1, 1)

model = LinearRegression()

model.fit(X, Y)

slope = model.coef_[0]

intercept = model.intercept_

print(f"Slope (beta_1): {slope}")

print(f"Intercept (beta_0): {intercept}")

Y_pred = model.predict(X)

plt.scatter(X, Y, color='blue', alpha=0.7, label='Data')

plt.plot(X, Y_pred, color='red', label='Fitted Line')

plt.xlabel('X')

plt.ylabel('Y')

plt.title('Simple Linear Regression: Fitted Line')

plt.legend()

plt.show()

r_squared = model.score(X, Y)

print(f"R-squared: {r_squared}")

X_new = np.array([[15]])

Y_new = model.predict(X_new)

print(f"Predicted Y for X = 15: {Y_new[0]}")


OUTPUT:

Slope (beta_1): 2.487387004280408


Intercept (beta_0): 0.4443021548944568

R-squared: 0.928337996765404
Predicted Y for X = 15: 37.75510721910058

RESULT:

Thus the computation for Simple Linear Regression was successfully completed.
EXP NO: 6
Date:
Z-test

AIM:

To write a Python program for the Z-test

ALGORITHM:

Step 1: Start the Program

Step 2: Import the numpy and scipy.stats packages

Step 3: Define the sample means, standard deviations, and sample sizes

Step 4: Calculate the Z-score using the two-sample Z-test formula and obtain the p-value

Step 5: Print the result

Step 6: Stop the process

PROGRAM:

import numpy as np

import scipy.stats as stats

mean_1 = 50

mean_2 = 45

std_1 = 10

std_2 = 12

size_1 = 40

size_2 = 35

z_score_two_sample = (mean_1 - mean_2) / np.sqrt((std_1**2 / size_1) + (std_2**2 / size_2))

p_value_two_sample = 2 * (1 - stats.norm.cdf(abs(z_score_two_sample)))

print(f"Z-Score: {z_score_two_sample}")

print(f"P-value: {p_value_two_sample}")
OUTPUT:

Z-Score: 1.9441444452997994
P-value: 0.051878034893831915

RESULT:

Thus the computation for Z-test was successfully completed.


EXP NO: 7
Date:
T-test

AIM:

To write a Python program for the T-test

ALGORITHM:

Step 1: Start the Program

Step 2: Import the numpy and scipy.stats packages

Step 3: Define the sample data and the population mean

Step 4: Calculate the T-statistic and p-value using a one-sample T-test

Step 5: Print the result

Step 6: Stop the process

PROGRAM:

import scipy.stats as stats

import numpy as np

sample_data = np.array([52, 55, 48, 49, 53, 54, 51, 50, 55, 58, 56, 57, 52, 51, 54, 53, 59, 61, 50,
52, 54, 53, 49, 47, 52, 51, 50, 48, 56, 55])

population_mean = 50

t_stat, p_value = stats.ttest_1samp(sample_data, population_mean)

print(f"T-statistic: {t_stat}")

print(f"P-value: {p_value}")

OUTPUT:

T-statistic: 4.571679054413011
P-value: 8.327654458471987e-05

RESULT:

Thus the computation for T-test was successfully completed.


EXP NO: 8
Date:
ANOVA

AIM:

To write a Python program for ANOVA

ALGORITHM:

Step 1: Start the Program

Step 2: Import package

Step 3: Prepare the Data

Step 4: Perform ANOVA

Step 5: Calculate the F-statistic

Step 6: Calculate the P-value

Step 7: Print the result

Step 8: Stop the process

PROGRAM:

import numpy as np

import scipy.stats as stats

group_1 = np.array([23, 45, 67, 32, 45, 34, 43, 45, 56, 42])

group_2 = np.array([45, 32, 23, 43, 46, 32, 21, 22, 43, 43])

group_3 = np.array([65, 78, 56, 67, 82, 73, 74, 65, 68, 74])

f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)

print(f"F-statistic: {f_stat}")

print(f"P-value: {p_value}")

if p_value < 0.05:

print("There is a significant difference between the group means.")


else:

print("There is no significant difference between the group means.")

OUTPUT:

F-statistic: 32.6259618124822
P-value: 6.255218731829188e-08
There is a significant difference between the group means.

RESULT:

Thus the computation for ANOVA was successfully completed.


EXP NO: 9
Date:
Building and validating linear models

AIM:

To write a Python program to build and validate linear models using a Jupyter notebook.

ALGORITHM:

Step 1: Start the Program

Step 2: Import package

Step 3: Prepare the Data

Step 4: Build the Model

Step 5: Evaluate the Model

Step 6: Model Diagnostics

Step 7: Print the result

Step 8: Stop the process

PROGRAM:

import numpy as np

import pandas as pd

import statsmodels.api as sm

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_squared_error, r2_score

np.random.seed(0)

X = np.random.rand(100, 1) * 10

y = 2.5 * X.squeeze() + np.random.randn(100) * 2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train_sm = sm.add_constant(X_train)
X_test_sm = sm.add_constant(X_test)

model = sm.OLS(y_train, X_train_sm).fit()

y_pred = model.predict(X_test_sm)

print(model.summary())

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')

print(f'R-squared: {r2}')
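Note: Step 6 of the algorithm mentions model diagnostics. A residual-versus-fitted plot is one common check; the sketch below is illustrative, reuses the fitted statsmodels model from the program above, and only adds a matplotlib import.

import matplotlib.pyplot as plt

# Residuals of the training fit; a roughly structureless band around zero
# suggests the linear model is adequate
fitted_values = model.fittedvalues
residuals = model.resid

plt.figure(figsize=(6, 4))
plt.scatter(fitted_values, residuals, alpha=0.7)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted')
plt.show()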

OUTPUT:

OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.932
Model:                            OLS   Adj. R-squared:                  0.931
Method:                 Least Squares   F-statistic:                     1074.
Date:                Thu, 19 Dec 2024   Prob (F-statistic):           2.29e-47
Time:                        14:52:46   Log-Likelihood:                -169.42
No. Observations:                  80   AIC:                             342.8
Df Residuals:                      78   BIC:                             347.6
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.4127      0.417      0.990      0.325      -0.417       1.242
x1             2.4961      0.076     32.776      0.000       2.344       2.648
==============================================================================
Omnibus:                        8.580   Durbin-Watson:                   2.053
Prob(Omnibus):                  0.014   Jarque-Bera (JB):                3.170
Skew:                           0.107   Prob(JB):                        0.205
Kurtosis:                       2.048   Cond. No.                         10.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Mean Squared Error: 3.6710129878857174
R-squared: 0.896480483165161

RESULT:

Thus the computation for building and validating linear models was successfully completed.
EXP NO: 10
Date:
Building and validating logistic models

AIM:

To write a Python program to build and validate logistic models using a Jupyter notebook.

ALGORITHM:

Step 1: Start the Program

Step 2: Import python libraries

Step 3: Generate synthetic data

Step 4: Split the data

Step 5: Build the logistic regression model

Step 6: Make predictions and Evaluate the model

Step 7: Print evaluation metrics and Print the result

Step 8: Stop the process

PROGRAM:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

np.random.seed(0)

X = np.random.rand(100, 2)

y = (X[:, 0] + X[:, 1] > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


model = LogisticRegression()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

conf_matrix = confusion_matrix(y_test, y_pred)

class_report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')

print('Confusion Matrix:')

print(conf_matrix)

print('Classification Report:')

print(class_report)

plt.figure(figsize=(10, 6))

plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm', edgecolors='k', s=100,
            label='True Labels')

plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, marker='x', cmap='coolwarm', s=100,
            label='Predicted Labels')

plt.title('Logistic Regression Predictions')

plt.xlabel('Feature 1')

plt.ylabel('Feature 2')

plt.legend()

plt.show()
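Note: Accuracy alone can be misleading on imbalanced data, so another common validation metric for logistic models is the ROC AUC. The sketch below is illustrative and reuses the fitted model and the test split from the program above.

from sklearn.metrics import roc_auc_score

# Predicted probabilities for the positive class (column 1)
y_prob = model.predict_proba(X_test)[:, 1]

auc = roc_auc_score(y_test, y_prob)
print(f'ROC AUC: {auc}')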

OUTPUT:

Accuracy: 0.9
Confusion Matrix:
[[ 8 2]
[ 0 10]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.80      0.89        10
           1       0.83      1.00      0.91        10

    accuracy                           0.90        20
   macro avg       0.92      0.90      0.90        20
weighted avg       0.92      0.90      0.90        20

RESULT:

Thus the computation for building and validating logistic models was successfully completed.
EXP NO: 11
Date:
Time series analysis

AIM:

To write a Python program to perform time series analysis using a Jupyter notebook.

ALGORITHM:

Step 1: Start the Program

Step 2: Import python libraries

Step 3: Generate a time series data

Step 4: Create a DataFrame

Step 5: Print the result

Step 6: Stop the process

PROGRAM:

import matplotlib.pyplot as plt

import pandas as pd

import numpy as np

date_range = pd.date_range(start='1/1/2020', periods=100)

data = np.random.randn(100).cumsum()

time_series_data = pd.DataFrame(data, index=date_range, columns=['Value'])

plt.figure(figsize=(12, 6))

plt.plot(time_series_data.index, time_series_data['Value'], label='Random Data', color='blue')

plt.title('Time Series Analysis')

plt.xlabel('Date')

plt.ylabel('Value')

plt.legend()

plt.grid()
plt.show()
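Note: A common next step in time series analysis is smoothing the series with a rolling mean. The sketch below reuses the time_series_data DataFrame from the program above; the 7-day window is chosen purely for illustration.

# 7-day rolling mean of the simulated series (window size is illustrative)
time_series_data['Rolling_Mean'] = time_series_data['Value'].rolling(window=7).mean()

plt.figure(figsize=(12, 6))
plt.plot(time_series_data.index, time_series_data['Value'], label='Original', color='blue')
plt.plot(time_series_data.index, time_series_data['Rolling_Mean'], label='7-day Rolling Mean', color='orange')
plt.title('Time Series with Rolling Mean')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid()
plt.show()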

OUTPUT:

RESULT:

Thus the computation for time series analysis was successfully completed.
