Data Mining Regression and Classification

This homework assignment involves analyzing a 'Restaurant Orders' dataset using regression techniques, divided into three parts: single-feature linear regression, multiple-feature linear regression, and polynomial regression. Students are required to implement models in PyTorch, visualize results, and provide detailed analyses of their findings. Additionally, there are sections on binary and multi-class classification, emphasizing the importance of clear organization, correctness, and insightful interpretation in the final report.


Homework: Description

Overview
In this assignment, you will revisit the pre-processed “Restaurant Orders” dataset from
Homework #1 and apply regression techniques to uncover relationships between different
variables (features) in the dataset. The assignment is divided into three main parts:
1. Single-Feature Linear Regression

2. Multiple Linear Regression (3 Features)

3. Polynomial Regression
Sample rows from the dataset:

| TableNumber | WaiterID | OrderDateTime | ItemsOrdered | NumberOfGuests | BillAmount | PaymentMethod | DiscountUsed | WaitTime | Tip | CustomerSatisfaction |
| 5 | W005 | 2025-01-06 13:38 | [Water,Soup,Pizza] | 3 | 41.34 | CreditCard | Yes | 14 | 2.94 | Neutral |
| 7 | W005 | 2025-01-05 03:03 | [Fries,Soup] | 1 | 44.18 | PayPal | No | 18 | 5.49 | Satisfied |
| 4 | W003 | 2025-01-02 01:12 | [Bread,Beer,Soup,Coke] | 5 | 60.16 | Cash | Yes | 49 | 4.72 | Neutral |
| 5 | W007 | 2025-01-08 22:41 | [Salad,Fries] | 1 | 40.75 | Cash | Yes | 42 | 6.4 | Satisfied |
| 2 | W003 | 2025-01-01 18:21 | [Soup,Burger,Pasta,Bread] | 1 | 71.96 | Cash | No | 58 | 10.14 | Satisfied |
| 2 | W006 | 2025-01-06 21:43 | [Burger,Salad] | 4 | 33.28 | PayPal | Yes | 15 | 0 | Unsatisfied |
| 10 | W003 | 2025-01-05 16:15 | [Juice,Pizza,Pasta,Water,Salad] | 2 | 92.57 | PayPal | Yes | 29 | 14.35 | Satisfied |
| 2 | W005 | 2025-01-03 22:05 | [Wine] | 4 | 35.06 | PayPal | Yes | 29 | 5.49 | Satisfied |


1. Single-Feature Linear Regression
Objective

• Select one dependent variable (output) and one independent variable (feature) from the
restaurant dataset.

• Train a simple linear regression model to predict the output from the single chosen feature.

Deliverables

• A plot with data points and the regression line.

• A short write-up explaining your training procedure, final parameters, final loss, and
observations.
1. Single-Feature Linear Regression
Tasks/Steps
Data Selection:
Justify which single feature and which output you chose.
Implementation in PyTorch:
Initialize model parameters.
Forward pass and loss function (MSE).
Optimization algorithm (gradient descent).
Visualization:
Plot the data points (scatter plot).
Plot the best-fit line learned by your model on the same figure.
Analysis:
Summarize the training process.
Discuss any difficulties or anomalies you observed when fitting the line.
Interpret how well the linear model fits the data visually and numerically (final loss, etc.).
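The steps above can be sketched as follows. This is a minimal example on synthetic stand-in data (here imagining WaitTime as the feature and Tip as the output); in the real assignment the tensors would be built from the dataset columns, and the scatter plot plus fitted line would be drawn with matplotlib from `x_raw` and the model's predictions.

```python
import torch

# Synthetic stand-in for one feature/output pair (e.g., WaitTime -> Tip);
# replace with columns loaded from the real CSV.
torch.manual_seed(0)
x_raw = torch.rand(100, 1) * 60                      # wait times in minutes
y = 0.15 * x_raw + 1.0 + torch.randn(100, 1) * 0.5   # noisy linear relation

# Scale the feature so plain gradient descent behaves well.
x = (x_raw - x_raw.mean()) / x_raw.std()

# Initialize model parameters.
w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for epoch in range(2000):
    y_hat = x @ w + b                    # forward pass
    loss = ((y_hat - y) ** 2).mean()     # MSE loss
    loss.backward()
    with torch.no_grad():                # gradient descent step
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(loss.item(), w.item(), b.item())   # final loss and parameters
```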
2. Multiple-Feature Linear Regression
Objective

• Select 3 features and 1 output from the restaurant dataset (may be the same output
variable as in Part 1 or a different one).

• Train a linear regression model to predict the output from the three chosen features.

Deliverables

• If the task is achievable, the deliverables should be similar to those for the
Single-Feature Linear Regression; follow the tasks on the next page.

• If you think the task is not achievable, provide an analysis explaining why; you
may then ignore the tasks on the next page.
2. Multiple-Feature Linear Regression
Tasks/Steps
Data Selection:
Justify which three features and which output you chose; Provide a brief rationale.
Implementation in PyTorch:
Build a multi-feature linear model.
Train and optimize (MSE; gradient descent).
Results:
Show the final loss (training error; MSE).
If the model successfully converges, report your final set of learned weights and bias; if the
model fails to converge or you encounter difficulties, analyze and explain potential reasons.
Interpretation:
Discuss whether the multi-feature regression model appears to be a better fit than a single-
feature model.
Reflect on any new challenges that arose when using multiple features.
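If the task is attempted, the training loop is nearly identical to Part 1; a minimal sketch using `torch.nn.Linear`, again on synthetic stand-ins for three features (e.g., NumberOfGuests, WaitTime, number of ItemsOrdered) and one output (e.g., BillAmount):

```python
import torch

# Synthetic stand-ins for three features and one output;
# replace with (scaled) columns from the real dataset.
torch.manual_seed(0)
X = torch.randn(200, 3)
true_w = torch.tensor([[12.0], [0.3], [8.0]])
y = X @ true_w + 20.0 + torch.randn(200, 1)

model = torch.nn.Linear(3, 1)            # multi-feature linear model
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(1000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)          # forward pass + MSE
    loss.backward()
    opt.step()                           # gradient descent step

# Final training MSE, learned weights, and bias.
print(loss.item(), model.weight.data, model.bias.data)
```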
3. Polynomial Regression
Objective

• Using the same 3 features and 1 output from Part 2, implement polynomial regression of at
least three different polynomial degrees (e.g., degree=2, degree=4, degree=6).

• Train a polynomial regression model to predict the output from the chosen features.

Deliverables

• A summary table or short discussion comparing performance for each chosen polynomial
degree.

• Plots or numeric results illustrating how well each polynomial model fits.
• A reflection on potential risks of higher-degree polynomials (e.g., overfitting).
3. Polynomial Regression
Tasks/Steps
Feature Transformation:
Explain how you generated polynomial terms (e.g., by manually expanding each feature or
using a PyTorch mechanism for polynomial features).
Decide how you handle interactions (only single-feature powers vs. cross-terms).
Training & Model Comparison:
Train a polynomial regression model for each degree (≥ 3 degrees).
Compare the training losses across different degrees.
Analysis:
Discuss any overfitting or underfitting you observe.
Identify which polynomial degree produced the most favorable result based on loss or
other metrics.
Provide any insights into runtime or complexity differences.
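One possible sketch of the comparison, assuming the simpler choice of single-feature powers (no cross-terms) and a hand-rolled expansion helper (`poly_expand` is a hypothetical name, not a PyTorch built-in):

```python
import torch

def poly_expand(X, degree):
    # Single-feature powers only (no cross-terms):
    # [x, x^2, ..., x^degree] for each original column.
    return torch.cat([X ** d for d in range(1, degree + 1)], dim=1)

# Synthetic stand-in data, features scaled to [-1, 1] so high powers stay tame.
torch.manual_seed(0)
X = torch.rand(200, 3) * 2 - 1
y = 3 * X[:, :1] ** 2 + X[:, 1:2] - 0.5 + torch.randn(200, 1) * 0.1

losses = {}
for degree in (2, 4, 6):
    Xp = poly_expand(X, degree)
    model = torch.nn.Linear(Xp.shape[1], 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(3000):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(Xp), y)
        loss.backward()
        opt.step()
    losses[degree] = loss.item()

print(losses)  # training MSE per polynomial degree
```

A table of these losses per degree, plus test-set losses, is the kind of summary the deliverables ask for; note that training loss alone tends to favor higher degrees even when they overfit.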
4. Binary Classification
Tasks/Steps
Choose a Binary Label:
Construct a binary classification label from the dataset, e.g., “Satisfied” vs.
“Unsatisfied”, or any other appropriate yes/no outcome such as “Satisfied” vs.
“Not Satisfied” (Unsatisfied + Neutral).
Implement Logistic Regression in PyTorch:
Loss function: Binary Cross-Entropy (BCE).
Data Splitting & Preprocessing:
Clearly split your data into training and testing (or validation) sets.
Model Training & Evaluation:
Train on the training set for a certain number of epochs or until convergence.
Report & Visualization
Summarize final training loss, test performance metrics, and any interesting findings.
(Optional) Provide a decision boundary plot if feasible (for a single or two-feature
scenario), or a confusion matrix heatmap to illustrate predictions vs. ground truth.
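A minimal sketch of the pipeline, using `BCEWithLogitsLoss` (the numerically stable form of BCE) on synthetic stand-in data with a simple train/test split:

```python
import torch

# Synthetic stand-in: 2 features, binary label
# (imagine "Satisfied" vs. "Not Satisfied").
torch.manual_seed(0)
X = torch.randn(300, 2)
y = (X[:, :1] + 0.5 * X[:, 1:2] > 0).float()

# Train/test split.
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

model = torch.nn.Linear(2, 1)                 # logistic regression = linear + sigmoid
loss_fn = torch.nn.BCEWithLogitsLoss()        # BCE applied to raw logits
opt = torch.optim.SGD(model.parameters(), lr=0.5)

for epoch in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()

# Evaluate on the held-out split.
with torch.no_grad():
    pred = (torch.sigmoid(model(X_test)) > 0.5).float()
    acc = (pred == y_test).float().mean().item()
print(loss.item(), acc)
```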
5. Multi-Class Classification
Tasks/Steps
Label Selection:
Identify a multi-class label from your dataset (≥ 3 classes); if your data does not inherently have three
or more distinct classes, you can derive one.
Strategy:
Implement one-vs-all (OvA) and one-vs-one (OvO) logistic regression.
OvA: Train a separate logistic regression classifier for each class vs. “all others.”
OvO: Train pairwise classifiers for each possible pair of classes.
Implementation Details:
In any case, the core idea is to manually handle the multi-class scenario within PyTorch, rather than
relying on built-in high-level methods.
Training & Evaluation
Train each classifier on the training data.
On the test set, produce predictions by combining the outputs of your sub-classifiers (OvA and OvO
logic).
Analysis & Discussion
How does OvA compare to OvO (in terms of code complexity, training time, or performance)?
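The OvA half, for example, could be sketched as below on synthetic three-class data (OvO is analogous: one classifier per class pair, combined by majority vote). For brevity this evaluates on the training data; the assignment expects a proper test split.

```python
import torch

# Synthetic 3-class stand-in (imagine Unsatisfied / Neutral / Satisfied).
torch.manual_seed(0)
centers = torch.tensor([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
labels = torch.randint(0, 3, (300,))
X = torch.randn(300, 2) + centers[labels]

# One-vs-all: one binary logistic classifier per class.
classifiers = []
for c in range(3):
    y_c = (labels == c).float().unsqueeze(1)   # this class vs. "all others"
    clf = torch.nn.Linear(2, 1)
    opt = torch.optim.SGD(clf.parameters(), lr=0.5)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(500):
        opt.zero_grad()
        loss_fn(clf(X), y_c).backward()
        opt.step()
    classifiers.append(clf)

# Combine: predict the class whose classifier scores highest.
with torch.no_grad():
    scores = torch.cat([clf(X) for clf in classifiers], dim=1)
    preds = scores.argmax(dim=1)
acc = (preds == labels).float().mean().item()
print(acc)
```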
Implementation Requirements
1. PyTorch Only
• You must implement the regression logic (forward pass, gradient updates, etc.) in
PyTorch.
• Do not use scikit-learn or other high-level ML libraries to handle the model training.
2. Data Handling
• You may use pandas or plain Python to load the dataset from CSV or other formats.
• Feel free to do any necessary feature engineering or transformations to handle missing
values, scaling, etc.
3. Plots & Visualization
• matplotlib or seaborn is recommended for plotting.
• Clearly label axes, legend, and titles for each figure.
4. Written Report
• Provide your observations, interpretations, and analysis for each part.
• Discuss any difficulties or additional experiments you performed.
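Requirement 2 (data handling) might look like the sketch below. The inline CSV snippet is a stand-in, and the filename in the comment is an assumption since the assignment does not fix one.

```python
import io
import pandas as pd

# Hypothetical CSV snippet standing in for the real file; in practice you
# would call e.g. pd.read_csv("restaurant_orders.csv") (filename assumed).
csv = io.StringIO(
    "TableNumber,NumberOfGuests,BillAmount,WaitTime,Tip,CustomerSatisfaction\n"
    "5,3,41.34,14,2.94,Neutral\n"
    "7,1,44.18,18,5.49,Satisfied\n"
    "4,5,60.16,49,4.72,Neutral\n"
)
df = pd.read_csv(csv)

# Simple feature engineering: fill missing values, then scale numeric columns.
num_cols = ["NumberOfGuests", "BillAmount", "WaitTime", "Tip"]
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()

# Encode a binary label for the classification parts.
df["Satisfied"] = (df["CustomerSatisfaction"] == "Satisfied").astype(int)
print(df.head())
```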
Report & Grading
1. Organization (20%)
• Is your submission clearly structured? Are code, plots, and analysis sections logically
presented?
2. Correctness & Implementation (40%)
• Proper usage of PyTorch for linear and polynomial regression.
• Evidence of correct gradient-based training for each part.
3. Analysis & Interpretation (40%)
• Clarity in explaining results, including final losses, potential reasons for success or
failure.
• Depth of insight into overfitting, data distribution, or hyperparameter choices.
4. Extra Credit / Deep Thinking (up to +10%)
• If your report is well-organized, provides deeper insights or additional experiments
(e.g., trying different regularization, comparing different subsets of features,
exploring other polynomial expansions), you may receive extra points.
