Section A – Question 1: Vehicle Price Analysis
Introduction:
Understanding the factors that influence vehicle sale prices is crucial for both buyers and
sellers in the automotive market. By applying statistical modeling techniques, we can
quantify the impact of various features on sale price, identify key drivers, and make
informed predictions. In this section, we use multiple linear regression to analyze a dataset
of vehicle sales, focusing on how sale date, model age, proximity to urban centres, and the
number of dealerships nearby affect the final sale price. The analysis also includes
diagnostic checks for model validity and an exploration of potential nonlinear effects.
1.1 Linear Regression Model and Interpretation
(a) Model Building:
A multiple linear regression model was constructed to predict vehicle sale price using the
following predictors: sale date, model age, proximity to urban centres, and number of
dealerships nearby. The regression equation is:
Vehicle Sale Price = −17,858.70 + 8.90 × Sale Date − 0.41 × Model Age − 0.0087 × Proximity + 1.98 × Dealerships
This equation allows us to estimate the expected sale price for any vehicle in the dataset,
given its characteristics. Each coefficient represents the average change in sale price
associated with a one-unit increase in the corresponding predictor, holding all other
variables constant.
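For concreteness, the sketch below shows how such a model could be fitted in Python with statsmodels. The file name and column names (sale_price, sale_date, model_age, proximity, dealerships) are illustrative assumptions, not the original dataset's labels.

```python
# A minimal sketch of the model fit, assuming the data sit in a CSV file
# "vehicle_sales.csv" with the hypothetical column names used below.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("vehicle_sales.csv")  # hypothetical file name

# Multiple linear regression of sale price on the four predictors
model = smf.ols(
    "sale_price ~ sale_date + model_age + proximity + dealerships", data=df
).fit()

print(model.summary())       # coefficients, R-squared, F-statistic
print(model.conf_int(0.05))  # 95% confidence intervals for each coefficient
```

The summary output reports the coefficient estimates, R², and F-statistic analogous to the tables below, and model.predict can be applied to new rows of the same DataFrame to obtain expected prices.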
(b) Interpretation:
• Model Fit: The model’s R² is 0.558, meaning about 56% of the variance in vehicle sale price is explained by the predictors. The F-statistic (137.06, p<0.001) indicates the model is highly significant, suggesting that the predictors collectively provide substantial explanatory power. This level of R² is typical for real-world economic data, where many unmeasured factors can influence price.
• Predictor Effects:
– Sale Date: Each unit increase in sale date (i.e., newer sales) increases price by 8.90 units, holding other factors constant (p<0.001). This likely reflects inflation or market trends over time, where newer sales tend to fetch higher prices.
– Model Age: Each additional year of model age reduces price by 0.41 units (p<0.001), reflecting the well-known effect of depreciation. Older vehicles are generally less valuable due to wear and technological obsolescence.
– Proximity to Urban Centres: Each unit increase in proximity (further from urban centres) reduces price by 0.0087 units (p<0.001), suggesting vehicles located closer to cities are more desirable, possibly due to better access to services and higher demand.
– Number of Dealerships Nearby: Each additional dealership nearby increases price by 1.98 units (p<0.001), possibly due to increased competition, better service options, or greater buyer confidence in areas with more dealerships.
• Intercept: The intercept is not directly interpretable in this context, as it represents
the expected price when all predictors are zero, which is not realistic for actual
vehicles.
Regression Statistics:
Statistic Value
Multiple R 0.7471
R Square 0.5581
Adjusted R Square 0.5541
Standard Error 14.0989
Observations 439
These statistics confirm that the model fits the data reasonably well, with a moderate
standard error and a large sample size, which increases the reliability of the estimates.
ANOVA Table:
Source        df     SS            MS           F          Significance F
Regression    4      108,978.01    27,244.50    137.0584   1.29 × 10⁻⁷⁵
Residual      434    86,270.63     198.78
Total         438    195,248.64
The ANOVA table shows that the regression model explains a significant portion of the total
variance in sale price, with a very small significance value indicating strong evidence
against the null hypothesis of no relationship.
Coefficients Table:
Variable                 Coeff.       Std. Err.   t Stat    P-value        Lower 95%    Upper 95%
Intercept                -17,858.70   4,770.62    -3.74     0.00021        -27,235.10   -8,482.30
sale date                8.90         2.37        3.76      0.00020        4.25         13.56
Model age                -0.41        0.06        -6.98     1.12 × 10⁻¹¹   -0.53        -0.30
proximity                -0.0087      0.0007      -12.59    3.50 × 10⁻³¹   -0.0101      -0.0073
number of dealerships    1.98         0.29        6.86      2.34 × 10⁻¹¹   1.42         2.55
Scatter plots show clear relationships between predictors and price, especially for model age
(negative) and proximity (negative). These visualizations help confirm the linear
relationships assumed by the model and provide intuitive support for the statistical findings.
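The scatter plots referred to above could be produced along the following lines, reusing the hypothetical DataFrame df from the earlier sketch (the column names remain assumptions):

```python
import matplotlib.pyplot as plt

# One scatter panel per predictor against sale price (df as in the earlier sketch)
predictors = ["sale_date", "model_age", "proximity", "dealerships"]
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, col in zip(axes, predictors):
    ax.scatter(df[col], df["sale_price"], s=10, alpha=0.5)
    ax.set_xlabel(col)
    ax.set_ylabel("sale price")
fig.tight_layout()
plt.show()
```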
1.2 Heteroskedasticity and Multicollinearity
(a) Concepts:
• Heteroskedasticity refers to non-constant variance of residuals across levels of the
predicted value, which can lead to inefficient estimates and invalid significance
tests. In regression analysis, we assume that the spread of residuals is roughly the
same for all fitted values; violations of this assumption can undermine the reliability
of confidence intervals and hypothesis tests.
• Multicollinearity occurs when predictors are highly correlated with each other,
inflating standard errors and making it difficult to assess individual predictor
effects. Severe multicollinearity can make coefficient estimates unstable and
sensitive to small changes in the data.
(b) Evidence from Results:
• Heteroskedasticity: The residuals vs. fitted values plot does not display a clear
pattern or funnel shape, suggesting residual variance is roughly constant. This
indicates no strong evidence of heteroskedasticity, and the model’s standard errors
and significance tests are likely valid. If heteroskedasticity were present, we might
see a fan or cone shape, with residuals spreading out as fitted values increase.
• Multicollinearity: The Variance Inflation Factors (VIFs) for all predictors are low:
Predictor VIF
sale date 1.00
Model age 1.01
proximity 1.59
number of dealerships 1.59
VIFs below 5 (or even 2) indicate negligible multicollinearity. Thus, the model does not suffer from this issue, and the estimated effects of each predictor can be interpreted with confidence. A sketch of how both diagnostics can be produced follows this list.
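A brief sketch of both diagnostics, assuming the fitted model and DataFrame df from the earlier regression snippet (column names are still the hypothetical ones):

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Residuals vs. fitted values: a funnel/cone shape would indicate heteroskedasticity
plt.scatter(model.fittedvalues, model.resid, s=10, alpha=0.5)
plt.axhline(0, color="grey", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# Variance Inflation Factors for each predictor (a constant column is added first)
X = sm.add_constant(df[["sale_date", "model_age", "proximity", "dealerships"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))
```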
(c) Remedies (if needed):
• Heteroskedasticity: Use robust standard errors, transform the dependent variable (e.g., log transformation), or apply weighted least squares; a sketch of the robust-error option appears after this list. These approaches help correct for non-constant variance and yield more reliable inference.
• Multicollinearity: Remove or combine correlated predictors, use principal
component analysis, or apply regularization techniques (e.g., ridge regression). In
this case, such remedies are unnecessary due to the low VIFs.
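Although no correction is required here, the robust-error remedy mentioned above is straightforward to apply; a minimal sketch, assuming the same hypothetical formula and DataFrame as before:

```python
import statsmodels.formula.api as smf

# Refit with heteroskedasticity-consistent (HC3) standard errors; the
# coefficients are unchanged, only the standard errors and p-values differ.
robust_fit = smf.ols(
    "sale_price ~ sale_date + model_age + proximity + dealerships", data=df
).fit(cov_type="HC3")
print(robust_fit.summary())
```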
1.3 Nonlinear Model Evaluation
A quadratic (nonlinear) model was fitted for model age to check for improvement over the
linear model. The quadratic regression coefficients were small, and visual inspection of the
scatter plot and quadratic fit did not show a substantial improvement in fit or pattern over
the linear model. The R² of the linear model is already moderate (0.56), and the residuals
do not show a clear nonlinear pattern. This suggests that the relationship between model
age and price is adequately captured by a linear term, and adding complexity does not yield
meaningful gains.
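The quadratic check can be reproduced with a small extension of the earlier sketch; I(model_age ** 2) adds the squared term within the (hypothetical) formula:

```python
import statsmodels.formula.api as smf

# Add a squared model-age term and compare adjusted R-squared with the linear fit
quad_model = smf.ols(
    "sale_price ~ sale_date + model_age + I(model_age ** 2) + proximity + dealerships",
    data=df,
).fit()
print(model.rsquared_adj, quad_model.rsquared_adj)  # little to no improvement expected
print(quad_model.pvalues["I(model_age ** 2)"])      # significance of the quadratic term
```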
Conclusion: The linear model is appropriate for this data. There is no strong evidence that
a nonlinear model would provide a significantly better fit. The analysis demonstrates the
value of checking for nonlinearity, but also the importance of parsimony in model selection.
Summary
The regression analysis reveals that newer vehicles, those closer to urban centres, and
those with more dealerships nearby tend to have higher sale prices, while older vehicles
are less valuable. The model is statistically sound, with no evidence of heteroskedasticity or
multicollinearity, and a linear approach is justified for this dataset. These findings can
inform pricing strategies and highlight the key factors that buyers and sellers should
consider in the vehicle market.
Section A – Question 2: Iris Classification with K-NN
Introduction:
Classification is a fundamental task in data science, with applications ranging from medical
diagnosis to species identification. The K-Nearest Neighbour (K-NN) algorithm is a simple
yet powerful non-parametric method for classifying observations based on their similarity
to known examples. In this section, we apply K-NN to the classic iris dataset, focusing on
distinguishing between the versicolor and virginica species using four flower
measurements. We also explore the impact of different values of K on classification
performance and discuss best practices for model selection.
2.1 K-NN Classification of a New Observation
A K-Nearest Neighbour (K-NN) algorithm using the Euclidean distance was applied to
classify a new iris observation with features (6.6, 3.2, 5.1, 1.5) (sepal length, sepal width,
petal length, petal width). The algorithm computes the distance from the test point to all
samples in the dataset and selects the K closest neighbors. This approach is intuitive: it
assumes that similar flowers (in terms of measurements) are likely to belong to the same
species.
Nearest Neighbors and Predicted Class: For K=5, 7, 9, the nearest neighbors and their classes are summarized below:

K    Versicolor Count    Virginica Count    Predicted Class
5    2                   3                  Iris-virginica
7    2                   5                  Iris-virginica
9    2                   7                  Iris-virginica
The majority of the nearest neighbors for each K are of the class Iris-virginica, so the model
predicts this class for the new observation. The use of Euclidean distance is standard for
continuous features and ensures that the closest points in feature space are selected. The
results are robust across different odd values of K, indicating that the prediction is not
sensitive to the exact choice of K within this range. This stability is desirable in practical
applications, as it suggests the model is not overfitting to noise.
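A minimal sketch of this prediction using scikit-learn is shown below. Since the report's working dataset is not reproduced here, the bundled iris data (restricted to versicolor and virginica) is used as a stand-in, so the neighbour counts may differ slightly from the table above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Keep only versicolor (target 1) and virginica (target 2)
iris = load_iris()
mask = iris.target > 0
X, y = iris.data[mask], iris.target[mask]

# The new observation: sepal length, sepal width, petal length, petal width
new_point = np.array([[6.6, 3.2, 5.1, 1.5]])

for k in (5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    knn.fit(X, y)
    print(k, iris.target_names[knn.predict(new_point)[0]])
```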
2.2 The Problem with Even K and Solutions
When K is set to an even number (e.g., K=4), the algorithm may encounter a tie between classes. For the test observation, the 4 nearest neighbors are split evenly:
K    Versicolor Count    Virginica Count    Tie?
4    2                   2                  Yes
This tie makes the prediction ambiguous. The standard solution is to use an odd value for K, which reduces the likelihood of ties. Alternatively, a tie-breaking rule (such as always choosing the class with the closest neighbor, or random assignment) can be implemented, as sketched after this paragraph, but using odd K is preferred for interpretability and consistency. In real-world applications, ties can lead to inconsistent or arbitrary predictions, so careful selection of K is important for reliable classification.
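As an illustration, the sketch below implements one such tie-breaking rule by hand: a plain majority vote that falls back to the single closest neighbour's class when the leading classes are tied. X and y are assumed to be NumPy arrays of measurements and labels, as in the previous snippet; knn_predict is a hypothetical helper, not part of any library.

```python
from collections import Counter

import numpy as np


def knn_predict(X, y, point, k):
    """Majority vote over the k nearest neighbours (Euclidean distance),
    breaking a tie in favour of the single closest neighbour's class."""
    dists = np.linalg.norm(X - point, axis=1)   # distance to every training sample
    order = np.argsort(dists)[:k]               # indices of the k nearest samples
    votes = Counter(y[i] for i in order).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:  # top two classes tied
        return y[order[0]]                             # nearest neighbour decides
    return votes[0][0]


# Example: the tied case with K=4 for the new observation
# knn_predict(X, y, np.array([6.6, 3.2, 5.1, 1.5]), 4)
```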
2.3 Model Performance: Confusion Matrices and Accuracy
The K-NN model was evaluated using leave-one-out cross-validation for K=5, 7, 9. This
method involves using each observation in turn as a test case, with the remaining data
serving as the training set. The confusion matrices and accuracies are as follows:
K=5
Actual ∖ Predicted    Iris-versicolor    Iris-virginica
Iris-versicolor       48                 9
Iris-virginica        9                  40
Accuracy: 0.83

K=7
Actual ∖ Predicted    Iris-versicolor    Iris-virginica
Iris-versicolor       51                 6
Iris-virginica        8                  41
Accuracy: 0.87

K=9
Actual ∖ Predicted    Iris-versicolor    Iris-virginica
Iris-versicolor       52                 5
Iris-virginica        6                  43
Accuracy: 0.90
The accuracy improves as K increases from 5 to 9. For K=9, the model achieves the
highest accuracy (0.90), with the lowest number of misclassifications for both classes. The
confusion matrices show that most errors are between the two classes, with very few false
positives or negatives. The full prediction results for each instance (actual, predicted,
correct/incorrect) are available in the output, allowing for detailed error analysis and
identification of borderline cases.
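The leave-one-out evaluation can be sketched with scikit-learn as follows, reusing the X and y arrays from the earlier snippet (so the exact counts may differ from the tables above):

```python
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

for k in (5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    # One held-out prediction per observation
    preds = cross_val_predict(knn, X, y, cv=LeaveOneOut())
    print(k, round(accuracy_score(y, preds), 2))
    print(confusion_matrix(y, preds))
```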
Best Model Choice: Based on the confusion matrices and accuracy, K=9 provides the best
balance of sensitivity and specificity for both classes in this dataset. However, it is
important to note that the optimal K may vary with different datasets, and cross-validation
is essential for robust model selection.
Summary
The K-NN algorithm, using Euclidean distance, effectively classifies iris varieties in this
dataset. For the new observation, the model robustly predicts Iris-virginica for all odd K
values tested. Using an odd K avoids ties, and K=9 yields the highest accuracy in cross-
validation. These results support the use of K-NN with odd K and highlight the importance
of model validation in classification tasks. In summary, K-NN is a flexible and interpretable
method for classification, but careful attention must be paid to parameter selection and
validation to ensure reliable results.