B.Tech - Artificial Intelligence and Data Science
AD3411 - DATA SCIENCE AND ANALYTICS LABORATORY
II Year/IV Semester
LAB MANUAL
EXP NO: 1 Working with Pandas data frames
Date:
AIM: To work with Pandas data frames.
ALGORITHM:
Step1: Start
Step2: import pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop
PROGRAM:
import pandas as pd
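# Build a DataFrame from a dictionary of columns
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df.head())
# Keep only the rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
# Add a boolean column that flags ages above 30
df['Senior'] = df['Age'] > 30
print(df)
# Mean age per city
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)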
OUTPUT:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
      Name  Age     City
2  Charlie   35  Chicago
      Name  Age         City  Senior
0    Alice   25     New York   False
1      Bob   30  Los Angeles   False
2  Charlie   35      Chicago    True
City
Chicago        35.0
Los Angeles    30.0
New York       25.0
Name: Age, dtype: float64
RESULT:
Thus, working with Pandas data frames was successfully completed.
EXP NO: 2 Basic plots using Matplotlib
Date:
AIM:
To draw basic plots in Python program using Matplotlib.
ALGORITHM:
Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop
PROGRAM:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=10)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Line Plot')
plt.show()
OUTPUT:
RESULT:
Thus, basic plots were successfully drawn in a Python program using Matplotlib.
EXP NO: 3A Frequency distributions, Averages, variability
Date:
AIM:
To write a Python program to find the frequency distribution, averages, and variability in Jupyter Notebook.
ALGORITHM:
Step 1: Start the Program
Step 2: Import the python library modules
Step 3: Write the code to compute the frequency distribution, mean, variance and standard deviation
Step 4: Print the result
Step 5: Stop the program
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#sample data
data=[12, 15, 12, 19, 15, 13, 14, 12, 16, 17, 15, 18, 19, 20, 16, 15]
#1. frequency distribution using pandas for convenience
frequency_distribution = pd.Series(data).value_counts().sort_index()
# 2. average(mean)
mean=np.mean(data)
#3. variability (population variance and standard deviation; np.var and np.std use ddof=0 by default)
variance=np.var(data)
std_deviation=np.std(data)
#printing results
print("Frequency distribution:")
print(frequency_distribution)
print("\nAverage (mean):",mean)
print("Variance:",variance)
print("Standard deviation:", std_deviation)
#4. plotting the frequency distribution as a bar chart
plt.figure(figsize=(5,3))
frequency_distribution.plot(kind='bar',color='skyblue')
plt.title("frequency distribution")
plt.xlabel('value')
plt.ylabel('frequency')
plt.xticks(rotation=0)
plt.show()
OUTPUT:
Frequency distribution:
12 3
13 1
14 1
15 4
16 2
17 1
18 1
19 2
20 1
Name: count, dtype: int64
Average (mean): 15.5
Variance: 6.25
Standard deviation: 2.5
RESULT:
Thus the computation of frequency distribution, Averages, variability was successfully completed.
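NOTE: The program above reports only the mean as an average, and np.var / np.std return the population variance and standard deviation by default. The following is a minimal sketch, using only Python's standard statistics module on the same data, of the other common averages (median and mode) and of the difference between population and sample variance. The variable names are chosen here for illustration.

```python
import statistics

# Same sample data as in the program above
data = [12, 15, 12, 19, 15, 13, 14, 12, 16, 17, 15, 18, 19, 20, 16, 15]

# Other common averages
median = statistics.median(data)
mode = statistics.mode(data)

# Range as a simple measure of spread
data_range = max(data) - min(data)

# Population variance divides by n (the default behaviour of np.var);
# sample variance divides by n - 1
population_variance = statistics.pvariance(data)
sample_variance = statistics.variance(data)

print("Median:", median)
print("Mode:", mode)
print("Range:", data_range)
print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
```

The sample variance is slightly larger than the population variance because it divides by n - 1 instead of n.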
EXP NO: 4A Normal curves, Correlation and scatter plots, Correlation coefficient
Date:
AIM:
To create normal curves and scatter plots, and to compute the correlation coefficient, using a Python program.
ALGORITHM:
Step 1: Start the program
Step 2: Import the numpy, matplotlib, seaborn and scipy packages
Step 3: Create the distribution
Step 4: Visualize the distribution, draw the scatter plot and compute the correlation coefficient
Step 5: Stop the program
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=1000)  # Normal distribution with mean=0, std=1
# 1. Plot the normal curve
plt.figure(figsize=(5, 3))
sns.histplot(data, kde=True, color='blue', stat='density', linewidth=0)
plt.title('Normal Distribution Curve')
plt.xlabel('x')
plt.ylabel('Density')
plt.show()
# 2. Scatter plot and correlation coefficient
# Generate two sets of data that have a linear relationship
x = np.random.rand(100) * 100  # Random data for X
y = 2 * x + 5 + np.random.randn(100) * 10  # Linear relationship with some noise
# Scatter plot
plt.figure(figsize=(5, 3))
plt.scatter(x, y, color='green')
plt.title('Scatter Plot of X vs Y')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
# Calculate the correlation coefficient
corr_coefficient, _ = pearsonr(x, y)
print(f"Correlation Coefficient between X and Y: {corr_coefficient:.2f}")
OUTPUT:
RESULT:
Thus, the normal curve and scatter plot were drawn and the correlation coefficient was computed successfully using a Python program.
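NOTE: The program above obtains the correlation coefficient from scipy's pearsonr. As a rough illustration of what that number measures, the sketch below computes the Pearson coefficient directly from its definition (sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the small data lists are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / math.sqrt(ss_x * ss_y)

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # perfectly increasing with x
y_down = [10, 8, 6, 4, 2]   # perfectly decreasing with x
y_noisy = [2, 5, 5, 9, 10]  # increasing with some scatter

print("Increasing:", pearson_r(x, y_up))
print("Decreasing:", pearson_r(x, y_down))
print("Noisy:", round(pearson_r(x, y_noisy), 2))
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.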
EXP NO: 5 Simple Linear Regression
Date:
AIM:
To write a Python program for Simple Linear Regression.
ALGORITHM:
Step 1: Start the Program
Step 2: Import the numpy, matplotlib and scikit-learn packages
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
np.random.seed(0)
X = np.random.rand(100) * 10
Y = 2.5 * X + np.random.normal(0, 2, 100)
plt.scatter(X, Y, color='blue', alpha=0.7)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot: X vs Y')
plt.show()
X = X.reshape(-1, 1)
model = LinearRegression()
model.fit(X, Y)
slope = model.coef_[0]
intercept = model.intercept_
print(f"Slope (beta_1): {slope}")
print(f"Intercept (beta_0): {intercept}")
Y_pred = model.predict(X)
plt.scatter(X, Y, color='blue', alpha=0.7, label='Data')
plt.plot(X, Y_pred, color='red', label='Fitted Line')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Simple Linear Regression: Fitted Line')
plt.legend()
plt.show()
r_squared = model.score(X, Y)
print(f"R-squared: {r_squared}")
X_new = np.array([[15]])
Y_new = model.predict(X_new)
print(f"Predicted Y for X = 15: {Y_new[0]}")
OUTPUT:
Slope (beta_1): 2.487387004280408
Intercept (beta_0): 0.4443021548944568
R-squared: 0.928337996765404
Predicted Y for X = 15: 37.75510721910058
RESULT:
Thus the computation for Simple Linear Regression was successfully completed.
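NOTE: The algorithm above refers to the cross-deviation and the deviation about x, whereas the program relies on scikit-learn's LinearRegression. The following is a minimal sketch of the formula-based computation, where slope = SS_xy / SS_xx and intercept = mean(y) - slope * mean(x). The function name estimate_coefficients and the data are hypothetical and are not taken from the program above.

```python
def estimate_coefficients(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-deviation of x and y
    ss_xy = sum(a * b for a, b in zip(x, y)) - n * mean_x * mean_y
    # Deviation about x
    ss_xx = sum(a * a for a in x) - n * mean_x * mean_x
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data, chosen only for illustration
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]

intercept, slope = estimate_coefficients(x, y)
print(f"Intercept (beta_0): {intercept:.4f}")
print(f"Slope (beta_1): {slope:.4f}")
```

For a single predictor, these formulas should give the same coefficients as fitting LinearRegression on the same data, up to floating-point rounding.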
EXP NO: 6 Z-test
Date:
AIM:
To write a Python program for the Z-test.
ALGORITHM:
Step 1: Start the Program
Step 2: Import the numpy and scipy.stats packages
Step 3: Define the sample means, standard deviations and sizes
Step 4: Calculate the Z-score and the two-tailed P-value using the formula
Step 5: Print the result
Step 6: Stop the process
PROGRAM:
import numpy as np
import scipy.stats as stats
mean_1 = 50
mean_2 = 45
std_1 = 10
std_2 = 12
size_1 = 40
size_2 = 35
z_score_two_sample = (mean_1 - mean_2) / np.sqrt((std_1**2 / size_1) + (std_2**2 / size_2))
p_value_two_sample = 2 * (1 - stats.norm.cdf(abs(z_score_two_sample)))
print(f"Z-Score: {z_score_two_sample}")
print(f"P-value: {p_value_two_sample}")
OUTPUT:
Z-Score: 1.9441444452997994
P-value: 0.051878034893831915
RESULT:
Thus the computation for Z-test was successfully completed.
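NOTE: The same two-sample calculation can be wrapped in a reusable function. The sketch below is one possible way to do this with only the standard math module, using the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)) for the two-tailed P-value. The function name two_sample_z_test is hypothetical. The inputs are the values used in the program above.

```python
import math

def two_sample_z_test(mean_1, mean_2, std_1, std_2, size_1, size_2):
    """Return (z, two-tailed p-value) for two independent samples."""
    # Standard error of the difference between the two means
    standard_error = math.sqrt(std_1 ** 2 / size_1 + std_2 ** 2 / size_2)
    z = (mean_1 - mean_2) / standard_error
    # Two-tailed p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z_score, p_value = two_sample_z_test(50, 45, 10, 12, 40, 35)
print(f"Z-Score: {z_score}")
print(f"P-value: {p_value}")

# Compare against a 5% significance level
if p_value < 0.05:
    print("Reject the null hypothesis of equal means.")
else:
    print("Fail to reject the null hypothesis of equal means.")
```

With these inputs the P-value is slightly above 0.05, so the difference between the means is not significant at the 5% level.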
EXP NO: 7 T-test
Date:
AIM:
To write a Python program for the T-test.
ALGORITHM:
Step 1: Start the Program
Step 2: Import the scipy.stats and numpy packages
Step 3: Define the sample data and the population mean
Step 4: Calculate the T-statistic and P-value using ttest_1samp
Step 5: Print the result
Step 6: Stop the process
PROGRAM:
import scipy.stats as stats
import numpy as np
sample_data = np.array([52, 55, 48, 49, 53, 54, 51, 50, 55, 58, 56, 57, 52, 51, 54, 53, 59, 61, 50,
52, 54, 53, 49, 47, 52, 51, 50, 48, 56, 55])
population_mean = 50
t_stat, p_value = stats.ttest_1samp(sample_data, population_mean)
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")
OUTPUT:
T-statistic: 4.571679054413011
P-value: 8.327654458471987e-05
RESULT:
Thus the computation for T-test was successfully completed.
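NOTE: The T-statistic reported by ttest_1samp can be checked by hand with the one-sample formula t = (sample mean - population mean) / (s / sqrt(n)), where s is the sample standard deviation (divisor n - 1). The following is a minimal sketch using only the standard library on the same data. It does not compute the P-value, which requires the t-distribution with n - 1 degrees of freedom.

```python
import math
import statistics

# Same sample data and population mean as in the program above
sample_data = [52, 55, 48, 49, 53, 54, 51, 50, 55, 58, 56, 57, 52, 51, 54,
               53, 59, 61, 50, 52, 54, 53, 49, 47, 52, 51, 50, 48, 56, 55]
population_mean = 50

n = len(sample_data)
sample_mean = statistics.mean(sample_data)
# Sample standard deviation (divides by n - 1)
sample_std = statistics.stdev(sample_data)
# Standard error of the mean
standard_error = sample_std / math.sqrt(n)

t_stat = (sample_mean - population_mean) / standard_error
degrees_of_freedom = n - 1

print(f"T-statistic: {t_stat}")
print(f"Degrees of freedom: {degrees_of_freedom}")
```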
EXP NO: 8 ANOVA
Date:
AIM:
To write a Python program for ANOVA.
ALGORITHM:
Step 1: Start the Program
Step 2: Import package
Step 3: Prepare the Data
Step 4: Perform ANOVA
Step 5: Calculate the F-statistic
Step 6: Calculate the P-value
Step 7: Print the result
Step 8: Stop the process
PROGRAM:
import numpy as np
import scipy.stats as stats
group_1 = np.array([23, 45, 67, 32, 45, 34, 43, 45, 56, 42])
group_2 = np.array([45, 32, 23, 43, 46, 32, 21, 22, 43, 43])
group_3 = np.array([65, 78, 56, 67, 82, 73, 74, 65, 68, 74])
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")
if p_value < 0.05:
    print("There is a significant difference between the group means.")
else:
    print("There is no significant difference between the group means.")
OUTPUT:
F-statistic: 32.6259618124822
P-value: 6.255218731829188e-08
There is a significant difference between the group means.
RESULT:
Thus the computation for ANOVA was successfully completed.
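NOTE: The algorithm above lists the calculation of the F-statistic as a separate step, which f_oneway performs internally. As an illustration, the sketch below computes the one-way ANOVA F-statistic from its definition: the mean square between groups divided by the mean square within groups. The function name one_way_anova_f is hypothetical. The groups are the ones used in the program above.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F-statistic for the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum(sum((v - m) ** 2 for v in g)
                    for g, m in zip(groups, group_means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

group_1 = [23, 45, 67, 32, 45, 34, 43, 45, 56, 42]
group_2 = [45, 32, 23, 43, 46, 32, 21, 22, 43, 43]
group_3 = [65, 78, 56, 67, 82, 73, 74, 65, 68, 74]

f_stat = one_way_anova_f(group_1, group_2, group_3)
print(f"F-statistic: {f_stat}")
```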
EXP NO: 9 Building and validating linear models
Date:
AIM:
To write a Python program to build and validate linear models using Jupyter Notebook.
ALGORITHM:
Step 1: Start the Program
Step 2: Import package
Step 3: Prepare the Data
Step 4: Build the Model
Step 5: Evaluate the Model
Step 6: Model Diagnostics
Step 7: Print the result
Step 8: Stop the process
PROGRAM:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.squeeze() + np.random.randn(100) * 2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train_sm = sm.add_constant(X_train)
X_test_sm = sm.add_constant(X_test)
model = sm.OLS(y_train, X_train_sm).fit()
y_pred = model.predict(X_test_sm)
print(model.summary())
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
OUTPUT:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.932
Model:                            OLS   Adj. R-squared:                  0.931
Method:                 Least Squares   F-statistic:                     1074.
Date:                Thu, 19 Dec 2024   Prob (F-statistic):           2.29e-47
Time:                        14:52:46   Log-Likelihood:                -169.42
No. Observations:                  80   AIC:                             342.8
Df Residuals:                      78   BIC:                             347.6
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
const          0.4127      0.417      0.990      0.325      -0.417       1.242
x1             2.4961      0.076     32.776      0.000       2.344       2.648
==============================================================================
Omnibus:                        8.580   Durbin-Watson:                   2.053
Prob(Omnibus):                  0.014   Jarque-Bera (JB):                3.170
Skew:                           0.107   Prob(JB):                        0.205
Kurtosis:                       2.048   Cond. No.                         10.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 3.6710129878857174
R-squared: 0.896480483165161
RESULT:
Thus the computation for building and validating linear models was successfully completed.
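NOTE: The validation step above relies on mean_squared_error and r2_score from scikit-learn. The sketch below shows, under the usual definitions, what these two metrics compute: MSE is the average squared residual, and R-squared is 1 - SS_res / SS_tot. The function names and the small lists are hypothetical and are not the test data from the program above.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares (unexplained variation)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares (variation around the mean)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical actual and predicted values, chosen only for illustration
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mse = mean_squared_error_manual(y_true, y_pred)
r2 = r2_score_manual(y_true, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2:.4f}")
```

An R-squared measured on held-out test data, as in the program above, can be lower than the training R-squared shown in the OLS summary.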
EXP NO: 10 Building and validating logistic models
Date:
AIM:
To write a Python program to build and validate logistic models using Jupyter Notebook.
ALGORITHM:
Step 1: Start the Program
Step 2: Import python libraries
Step 3: Generate synthetic data
Step 4: Split the data
Step 5: Build the logistic regression model
Step 6: Make predictions and Evaluate the model
Step 7: Print evaluation metrics and Print the result
Step 8: Stop the process
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
np.random.seed(0)
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)
plt.figure(figsize=(10, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm', edgecolors='k', s=100,
label='True Labels')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, marker='x', cmap='coolwarm', s=100,
label='Predicted Labels')
plt.title('Logistic Regression Predictions')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
OUTPUT:
Accuracy: 0.9
Confusion Matrix:
[[ 8 2]
[ 0 10]]
Classification Report:
              precision    recall  f1-score   support
           0       1.00      0.80      0.89        10
           1       0.83      1.00      0.91        10
    accuracy                           0.90        20
   macro avg       0.92      0.90      0.90        20
weighted avg       0.92      0.90      0.90        20
RESULT:
Thus the computation for building and validating logistic models was successfully completed.
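NOTE: The figures in the classification report follow directly from the confusion matrix. The following is a minimal sketch, assuming scikit-learn's layout in which rows are true labels and columns are predicted labels, that recomputes accuracy, precision, recall and F1-score for class 1 from the matrix shown in the output above.

```python
# Confusion matrix from the output above: rows = true labels, columns = predicted labels
conf_matrix = [[8, 2],
               [0, 10]]

tn, fp = conf_matrix[0]  # true class 0: predicted 0, predicted 1
fn, tp = conf_matrix[1]  # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of predicted 1s that are truly 1
recall = tp / (tp + fn)     # share of true 1s that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision (class 1): {precision:.2f}")
print(f"Recall (class 1): {recall:.2f}")
print(f"F1-score (class 1): {f1:.2f}")
```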
EXP NO: 11 Time series analysis
Date:
AIM:
To write a Python program to perform time series analysis using Jupyter Notebook.
ALGORITHM:
Step 1: Start the Program
Step 2: Import python libraries
Step 3: Generate a time series data
Step 4: Create a DataFrame
Step 5: Print the result
Step 6: Stop the process
PROGRAM:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
date_range = pd.date_range(start='1/1/2020', periods=100)
data = np.random.randn(100).cumsum()
time_series_data = pd.DataFrame(data, index=date_range, columns=['Value'])
plt.figure(figsize=(12, 6))
plt.plot(time_series_data.index, time_series_data['Value'], label='Random Data', color='blue')
plt.title('Time Series Analysis')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid()
plt.show()
OUTPUT:
RESULT:
Thus the computation for time series analysis was successfully completed.
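NOTE: The program above only plots the raw series. One common next step, shown here purely as an illustrative sketch and not as part of the prescribed experiment, is to smooth the series with a simple moving average so that the underlying trend is easier to see. The function name moving_average and the short list of values are hypothetical.

```python
def moving_average(values, window):
    """Return the simple moving average of values over the given window size."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Each output point is the mean of `window` consecutive observations
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical observations, chosen only for illustration
values = [3, 5, 4, 6, 8, 7, 9]
smoothed = moving_average(values, 3)
print("Original:", values)
print("3-point moving average:", smoothed)
```

The smoothed series is shorter than the original by window - 1 points because the first full window ends at the third observation.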