B.E / B.Tech.
MODEL PRACTICAL END SEMESTER EXAMINATIONS, APRIL 2025
Fourth Semester
AD3411 - DATA SCIENCE AND ANALYTICS LABORATORY
(Regulations 2021)
Time: 3 Hours          Answer any one Question          Max. Marks: 100

AIM & PROCEDURE   PROGRAM   OUTPUT & RESULTS   VIVA   RECORD   TOTAL
      20             30            30           10      10      100
1. i. Using the Iris dataset, perform the following:
(a) Compute frequency distributions and descriptive statistics (mean, median, std) for all
numerical features.
(b) Plot scatter plots for SepalLength vs PetalLength and compute their correlation coefficient.
(c) Perform linear regression to predict PetalLength based on SepalLength and
visualize the regression line.
ii. Using the Iris dataset, perform the following t-tests:
(a) Compare sepal length between Setosa and Versicolor species using an independent t-test.
(b) Use a paired t-test by adding random noise to petal length and checking the significance
of change.
(c) Test if petal width differs significantly between Setosa and Virginica.
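One possible approach to the three t-tests in Question 1 (ii) is sketched here. It assumes the copy of Iris bundled with scikit-learn (`load_iris`), whose column names such as `sepal length (cm)` differ from the names used in the question. The noise standard deviation of 0.1, the seed, and the choice of Welch's unequal-variance form of the independent test are illustrative assumptions, not requirements of the paper.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the lab's copy.
iris = load_iris(as_frame=True)
df = iris.frame.copy()
# Map numeric target codes to species names (setosa, versicolor, virginica).
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

def column_for(species, column):
    """Return one measurement column for a single species as a NumPy array."""
    return df.loc[df["species"] == species, column].to_numpy()

# (a) Independent t-test on sepal length: Setosa vs Versicolor.
# equal_var=False selects Welch's test (an assumption; Student's test is also valid).
ind_sepal = stats.ttest_ind(
    column_for("setosa", "sepal length (cm)"),
    column_for("versicolor", "sepal length (cm)"),
    equal_var=False,
)

# (b) Paired t-test: petal length before and after adding random noise.
rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
petal = df["petal length (cm)"].to_numpy()
noisy_petal = petal + rng.normal(loc=0.0, scale=0.1, size=petal.size)
paired_petal = stats.ttest_rel(petal, noisy_petal)

# (c) Independent t-test on petal width: Setosa vs Virginica.
ind_width = stats.ttest_ind(
    column_for("setosa", "petal width (cm)"),
    column_for("virginica", "petal width (cm)"),
    equal_var=False,
)

for label, res in [("sepal length", ind_sepal),
                   ("petal length (paired)", paired_petal),
                   ("petal width", ind_width)]:
    print(f"{label}: t = {res.statistic:.3f}, p = {res.pvalue:.4g}")
```

A p-value under the chosen significance level (commonly 0.05) would be read as evidence of a difference in means. Because the noise in (b) has zero mean, a non-significant result there is the expected outcome.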
2. i. On the Pima Indians Diabetes dataset:
(a) Generate frequency distributions for Outcome and calculate mean/variance for
BMI and Glucose.
(b) Create scatter plots for Glucose vs Insulin and find their Pearson correlation.
(c) Build a regression model to predict BMI using Glucose and Insulin.
ii. Using the Penguins dataset (seaborn), perform the following t-tests:
(a) Test if the mean body mass differs significantly between male and female penguins.
(b) Compare flipper length between Adelie and Gentoo species using an independent t-test.
(c) Simulate a paired test by adjusting the flipper length (e.g., subtract 5 mm) and run a
paired t-test.
3. i. Using the Wine dataset from UCI:
(a) Display frequency and variability measures for class labels and Alcohol content.
(b) Plot a correlation matrix and scatter plot between Alcohol and Malic acid.
(c) Use regression to model Alcohol as a function of Color intensity and Proline.
ii. Using a simulated Education dataset, perform the following t-tests:
(a) A class of students takes a math test before and after a workshop. Use a paired t-test to
check if the workshop helped.
(b) Compare exam scores of students from urban vs rural backgrounds using an independent
t-test.
(c) Simulate GPA data and check if students with part-time jobs have significantly different
GPAs.
4. i. Using the Titanic dataset:
(a) Compute average, range, and standard deviation of Age grouped by gender.
(b) Plot scatter plots between Fare and Age with correlation coefficients.
(c) Fit a logistic regression model to predict Survived using Age and Fare.
ii. Using the Titanic dataset:
(a) Use an independent t-test to check whether the mean Fare differs significantly between
male and female passengers.
(b) Use a paired t-test by simulating a 10% discount on fares. Test if this change is
statistically significant.
(c) Compare the average age of passengers who survived and those who did not using an
independent t-test.
5. i. Using a weather dataset (weather.csv):
(a) Generate frequency distribution of daily temperatures and calculate central tendency
measures.
(b) Plot temperature vs humidity with correlation coefficient.
(c) Build and evaluate a linear regression model to predict temperature from humidity.
ii. Using the weather dataset, perform ANOVA:
(a) Use one-way ANOVA to test if the average temperature differs significantly among
different cities.
(b) Test if the humidity levels vary significantly by weather type (e.g., Sunny, Rainy,
Cloudy).
(c) Create a boxplot showing wind speed variation across different months.
6. i. Using the Pima Indians Diabetes dataset:
(a) Display descriptive statistics for Insulin and BloodPressure.
(b) Create a correlation heatmap for all numeric variables.
(c) Perform multiple linear regression to predict BloodPressure using Age and Glucose.
ii. Using the Pima Indians Diabetes dataset, perform ANOVA:
(a) Use ANOVA to test if the mean glucose level differs significantly across groups of
pregnancies (e.g., 0–1, 2–3, 4+).
(b) Test if BMI varies significantly with diabetes outcome (0 = No diabetes, 1 = Diabetes).
(c) Create a bar plot of mean Age across different Pregnancies groups with error bars.
7. i. Using the Iris dataset:
(a) Calculate frequency and variation (standard deviation) for PetalWidth per species.
(b) Plot SepalLength vs SepalWidth and compute their correlation.
(c) Fit a regression model to predict SepalLength using SepalWidth and PetalWidth.
ii. Using a simulated Education dataset, perform the following:
(a) Test if students from different teaching methods (traditional, online, hybrid) score
differently on a standardized test.
(b) Use one-way ANOVA to analyze GPA differences across three departments (Science,
Arts, Commerce).
(c) Visualize score differences using a bar plot with error bars.
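A minimal sketch of the one-way ANOVA in Question 7 (ii), under stated assumptions: the data are simulated with NumPy, and the group means (70, 68, 75), the standard deviation of 8, and the 40 students per teaching method are invented purely for illustration. The column names `method` and `score` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed for a repeatable simulation

# Hypothetical group means; nothing in the question fixes these values.
methods = ["traditional", "online", "hybrid"]
assumed_means = {"traditional": 70, "online": 68, "hybrid": 75}
students_per_group = 40

scores = pd.DataFrame({
    "method": np.repeat(methods, students_per_group),
    "score": np.concatenate(
        [rng.normal(assumed_means[m], 8, students_per_group) for m in methods]
    ),
})

# One-way ANOVA: the null hypothesis is that all group means are equal.
groups = [g["score"].to_numpy() for _, g in scores.groupby("method")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.4g}")

# Per-group mean and standard error, the inputs a bar plot with error bars needs.
summary = scores.groupby("method")["score"].agg(["mean", "sem"])
print(summary)
```

The `summary` table holds the bar heights and error-bar lengths for part (c). Passing its `mean` column as heights and its `sem` column as `yerr` to matplotlib's `bar` is one way to draw the plot. The same pattern applies to part (b) with departments in place of teaching methods.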
8. i. Using the Wine dataset:
(a) Summarize statistics for Flavanoids and compute variability.
(b) Plot scatter and correlation matrix for Flavanoids vs Color intensity.
(c) Perform polynomial regression to predict Flavanoids from Color intensity.
ii. Using the Penguins dataset (seaborn), perform the following:
(a) Use one-way ANOVA to test whether body mass differs significantly across species
(species).
(b) Use ANOVA to test if flipper length varies significantly by island (island).
(c) Create boxplots to show the body mass distribution by species.
9. i. Using Titanic dataset:
(a) Generate frequency distributions for Pclass and compute age variability by class.
(b) Plot correlation matrix for Age, Fare, and SibSp.
(c) Run regression to estimate Fare using Age and Pclass.
ii. Using the Tips dataset (seaborn), build and validate a linear model:
(a) Build a linear model to predict tip using total_bill, size, and smoker.
(b) Convert day and time to dummy variables and include them in the model.
(c) Use seaborn’s lmplot() to visualize the relationship between tip and total_bill.
10. i. Using a custom Sales dataset:
(a) Calculate daily sales frequency, average, and standard deviation.
(b) Create scatter plot of Units Sold vs Revenue and compute correlation.
(c) Build linear regression model to predict Revenue based on Units Sold.
ii. Using the Wine dataset, build and validate a linear model:
(a) Build a linear model to predict alcohol content using malic_acid, ash, and proline.
(b) Use feature scaling and re-fit the model to check for changes in coefficients.
(c) Evaluate model performance using adjusted R² and residual analysis.
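The steps in Question 10 (ii) could be sketched as follows. This assumes the Wine data bundled with scikit-learn (`load_wine`), whose column names match those in the question, and it fits on the full data without a train/test split to keep the example short. The helper `adjusted_r2` is written here for illustration and is not a scikit-learn function.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Wine data stands in for the lab's copy.
wine = load_wine(as_frame=True).data
features = ["malic_acid", "ash", "proline"]
X = wine[features]
y = wine["alcohol"]

# (a) Linear model on the raw features.
raw_model = LinearRegression().fit(X, y)

# (b) Standardize the features (zero mean, unit variance) and re-fit.
X_scaled = StandardScaler().fit_transform(X)
scaled_model = LinearRegression().fit(X_scaled, y)

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (hypothetical helper)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# (c) Adjusted R^2 and residuals for the raw-feature model.
r2 = raw_model.score(X, y)
adj_r2 = adjusted_r2(r2, n=len(y), p=len(features))
residuals = y - raw_model.predict(X)

print("raw coefficients:   ", dict(zip(features, raw_model.coef_.round(4))))
print("scaled coefficients:", dict(zip(features, scaled_model.coef_.round(4))))
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
print(f"residual mean = {residuals.mean():.2e}, std = {residuals.std():.4f}")
```

Scaling changes the size of each coefficient (each is multiplied by its feature's standard deviation) but leaves the fitted values and R² unchanged, so the scaled coefficients are mainly useful for comparing the relative influence of the predictors. For the residual analysis, plotting `residuals` against the predicted values is a common next step.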
11. i. On the Diabetes dataset:
(a) Show frequency distribution of BMI ranges and compute variance.
(b) Visualize Glucose vs Age scatter plot and compute correlation.
(c) Build regression model to predict Glucose using Age and BMI.
ii. Using the Iris dataset, build and validate a linear model:
(a) Build a linear regression model to predict sepal_length using petal_length and petal_width.
(b) Evaluate the model’s accuracy and plot the actual vs predicted values.
(c) Use species as a categorical variable (one-hot encode it) and include it in the model.
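A hedged sketch of Question 11 (ii), assuming scikit-learn's bundled Iris data with its columns renamed to the snake_case names used in the question. The 70/30 split, the seed, and the helper `fit_and_score` are illustrative choices. The actual-vs-predicted plot from part (b) is left out so the example has no plotting dependency.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: rename scikit-learn's columns to the names used in the question.
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={
    "sepal length (cm)": "sepal_length",
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
})
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

def fit_and_score(X, y):
    """Fit on a 70% split and report R^2 and MAE on the held-out 30%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    return model, r2_score(y_test, pred), mean_absolute_error(y_test, pred)

y = df["sepal_length"]

# (a), (b) Baseline model with the two petal measurements.
X_base = df[["petal_length", "petal_width"]]
base_model, base_r2, base_mae = fit_and_score(X_base, y)

# (c) One-hot encode species; drop_first avoids perfectly collinear dummies.
dummies = pd.get_dummies(df["species"], prefix="species", drop_first=True, dtype=float)
X_full = pd.concat([X_base, dummies], axis=1)
full_model, full_r2, full_mae = fit_and_score(X_full, y)

print(f"baseline:     R^2 = {base_r2:.3f}, MAE = {base_mae:.3f}")
print(f"with species: R^2 = {full_r2:.3f}, MAE = {full_mae:.3f}")
```

With `drop_first=True`, setosa becomes the reference level, so each species coefficient is read as an offset relative to setosa.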
12. i. Using Iris dataset:
(a) Calculate frequency tables for PetalLength intervals and variability.
(b) Plot scatter plot between PetalLength and SepalWidth and compute Pearson correlation.
(c) Fit and visualize a linear regression model to predict PetalLength using SepalWidth.
ii. Using the Pima Indians Diabetes dataset, build and validate a linear model:
(a) Build a linear regression model to predict Glucose using Age, BMI, and Pregnancies.
(b) Evaluate the model using R², MAE, and residual plots.
(c) Add interaction terms (e.g., Age × BMI) and evaluate if they improve the model.
13. i. Using Wine dataset:
(a) Display frequency of wine classes and variability in Ash content.
(b) Create correlation plots between Ash and Magnesium.
(c) Use multiple regression to model Ash using Alcohol and Magnesium.
ii. Using the Pima Indians Diabetes dataset, build and validate a logistic model:
(a) Build a logistic regression model to predict Outcome using Glucose, BMI, and Age.
(b) Evaluate the model using accuracy, confusion matrix, and ROC-AUC score.
(c) Use statsmodels to view odds ratios and p-values of coefficients.
14. i. Titanic dataset tasks:
(a) Summarize statistics (mean, median, std) for Fare by Embarked port.
(b) Plot scatter between Age and SibSp, and find correlation.
(c) Predict Age using Pclass and SibSp via regression.
ii. Using the Titanic dataset, build and validate a logistic model:
(a) Build a logistic regression model to predict Survived using Pclass, Sex, and Fare.
(b) Convert Sex and Pclass to dummy variables before training the model.
(c) Evaluate the model using confusion matrix and classification report.
15. i. Weather dataset:
(a) Generate frequency of temperature ranges and calculate mean/std.
(b) Plot temperature vs wind speed scatter with regression line.
(c) Predict wind speed based on temperature using regression.
ii. Using the Tips dataset (seaborn), build and validate a logistic model:
(a) Create a new binary column HighTip based on whether tip > 3.
(b) Build a logistic regression model to predict HighTip using total_bill, size, and smoker.
(c) Assess the model using precision, recall, and ROC-AUC.
16. i. Using Diabetes dataset:
(a) Calculate frequency of high glucose values and descriptive stats.
(b) Scatter plot and correlation between Age and SkinThickness.
(c) Fit regression model: SkinThickness ~ Age + BMI.
ii. Using the Heart Disease dataset (UCI), build and validate a logistic model:
(a) Build a logistic regression model using age, chol, thalach, and cp (chest pain type).
(b) Encode categorical variables and scale features appropriately.
(c) Evaluate the model using ROC curve, precision, recall, and accuracy.
17. i. With Iris dataset:
(a) Generate descriptive stats for SepalLength per species.
(b) Scatter plot and correlation coefficient for PetalWidth and SepalWidth.
(c) Perform regression: SepalWidth ~ PetalWidth + PetalLength.
ii. Using the weather dataset, perform time series analysis:
(a) Convert the Date column to datetime format and set it as the index. Resample to monthly
average Temperature.
(b) Decompose the time series using seasonal decomposition to identify trend, seasonality,
and residuals.
(c) Fit an ARIMA or SARIMA model to forecast future temperatures.
18. i. Using Wine dataset:
(a) Calculate mean, mode, std of Hue by wine class.
(b) Correlation and scatter plot for Hue and Proline.
(c) Linear regression: Hue ~ Alcohol + Proline.
ii. Using the AirPassengers dataset, perform time series analysis:
(a) Load and visualize the time series of monthly passenger counts.
(b) Perform seasonal decomposition to analyze trend and seasonality.
(c) Apply log transformation and fit an ARIMA model to forecast future values.
19. i. Titanic dataset analysis:
(a) Frequency and variability of Fare by Sex.
(b) Correlation between Age and Fare using seaborn.
(c) Regression model: Fare ~ Age + Pclass + Sex.
ii. Using a stock market dataset, perform time series analysis:
(a) Select one stock (e.g., 'AAPL') and plot the closing price over time.
(b) Calculate and plot the 30-day moving average of the closing price.
(c) Fit an ARIMA model to forecast future Close prices.
20. i. Using Pima Indians Diabetes:
(a) Frequency of Age groups and descriptive stats.
(b) Scatter plots and correlation of Glucose vs BMI.
(c) Fit regression model: Outcome ~ BMI + Glucose.
ii. Using an electricity consumption dataset, perform time series analysis:
(a) Load the dataset and convert Date to datetime index.
(b) Resample data to weekly totals and visualize.
(c) Use ETS (Exponential Smoothing) to forecast consumption.
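A rough sketch of the workflow in Question 20 (ii), under several assumptions. The paper does not specify the dataset, so 140 days of data are simulated, and the column name `Consumption` is hypothetical. In place of a library ETS model, the forecast uses a hand-written simple exponential smoothing recursion (level only, no trend or seasonal component) with an arbitrarily chosen smoothing factor of 0.3.

```python
import numpy as np
import pandas as pd

# Simulated stand-in for the dataset: 140 daily readings with dates as strings.
rng = np.random.default_rng(1)
dates = pd.date_range("2024-01-01", periods=140, freq="D")
raw = pd.DataFrame({
    "Date": dates.strftime("%Y-%m-%d"),
    "Consumption": 100 + rng.normal(0, 5, len(dates)),
})

# (a) Convert Date to datetime and use it as the index.
raw["Date"] = pd.to_datetime(raw["Date"])
series = raw.set_index("Date")["Consumption"]

# (b) Resample to weekly totals (weeks ending on Sunday by default).
weekly = series.resample("W").sum()

# (c) Simple exponential smoothing: level_t = alpha * x_t + (1 - alpha) * level_{t-1}.
def simple_exp_smoothing(values, alpha):
    """Return the final smoothed level, used as a flat forecast for future periods."""
    level = values[0]
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

forecast = simple_exp_smoothing(weekly.to_numpy(), alpha=0.3)
print(weekly.tail())
print(f"forecast for each future week: {forecast:.2f}")
```

Level-only smoothing gives the same forecast for every future week. Data with a clear trend or seasonal pattern would call for the trend and seasonal variants of exponential smoothing, for example those provided by statsmodels.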