Model Evaluation

Uploaded by

thanhhoai2244

Model Evaluation

5.1 Problem in the Project

Our research highlights limitations in popular car valuation guides, which typically avoid
machine learning techniques and instead rely on average prices from local sales. While this
approach works for common cars with standard features, it fails to accurately value unique
vehicles. Traditional methods often overlook the nuanced insights provided by different
makes and models. In contrast, machine learning utilizes the entire dataset to generate
more precise predictions, maximizing the use of all available data and features.
New cars of the same make, model, location, and features are identical in condition,
function, and price. However, once sold, they transition into used cars whose value changes
over time due to factors like aging (depreciation), inflation, and obsolescence (revaluation).
For simplicity, these changes are collectively referred to as depreciation. Additionally,
attributes such as engine size influence value: larger engines may become less desirable as
fuel prices increase.
Used cars acquire unique service histories that impact their condition and market value.
Traditional methods often simplify these details into a broad “condition” category,
considering repairs or modifications only if they have a substantial effect on the car's value.
We propose using machine learning models, such as Random Forest and Linear Regression,
to better account for uncommon features, leading to more accurate vehicle valuations. This
approach provides consumers with a reliable and equitable tool for buying or selling cars
with non-standard features.
5.2 Application of the model
5.2.1 Evaluation method
1. MSE

Significance:
• MSE measures the average squared difference between the predicted values and the
actual values, indicating the model's overall error.
• A smaller MSE reflects better predictive accuracy, suggesting that the model is
performing well.
• Since MSE involves squaring the errors, it penalizes larger errors more heavily,
making it highly sensitive to outliers. This can help identify and address significant
deviations in the data.
Application:
• MSE is well-suited for evaluating regression models, particularly in scenarios like
price prediction, where understanding and minimizing the overall prediction error is
crucial.
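As a minimal sketch of this definition, the snippet below shows how a single large error dominates MSE. The function name `mse` and the sample prices are illustrative assumptions, not values from the project.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical prices in millions of VND (made up for illustration).
actual = [500, 650, 720, 900]
close_predictions = [510, 640, 730, 890]  # every error has size 10
one_outlier = [510, 640, 730, 700]        # the last error has size 200

mse_close = mse(actual, close_predictions)  # (4 * 10**2) / 4
mse_outlier = mse(actual, one_outlier)      # (3 * 10**2 + 200**2) / 4
```

In this made-up example one prediction that is off by 200 raises the MSE from 100 to 10075, even though the other three predictions are unchanged.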
2. RMSE

Significance:
• RMSE is the square root of MSE, so it is expressed in the same unit as the target
variable (here, the car price).
• A small RMSE shows that the model has good predictive ability and makes few large
errors.
• Like MSE, RMSE is sensitive to outliers, so it is useful for assessing how strongly
they influence the model.
Application:
• RMSE is suitable for problems where large errors must be penalized strictly, such as
price prediction or weather forecasting.
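For reference, the standard definitions of MSE and RMSE can be written as follows. The symbols are our own notation rather than the report's: n is the number of test samples, y_i the actual price, and \hat{y}_i the predicted price.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

Because of the square root, RMSE is expressed in the same unit as the price, which usually makes it easier to interpret than MSE.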
3. MAE
MAE measures the average absolute error between the predicted value and the actual value.
It does not consider the direction of the error (whether the prediction is larger or smaller
than the actual value).

Significance:
• If MAE is small, the model's predictions are close to reality on average, which makes
MAE suitable for cases where the typical error needs to be assessed.
• However, MAE weights each error in proportion to its size instead of squaring it, so it
is less sensitive to outliers than MSE or RMSE and does not clearly reflect their
influence.
Application:
• MAE is suitable when you want a simple and easy-to-understand estimate of the
error.
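The difference between MAE and RMSE can be illustrated with a small sketch. The helper names `mae` and `rmse` and the sample prices are assumptions made up for this example.

```python
import math

def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the mean squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical prices in millions of VND: three errors of 10 and one of 200.
actual = [500, 650, 720, 900]
predicted = [510, 640, 730, 700]

mae_value = mae(actual, predicted)    # (10 + 10 + 10 + 200) / 4
rmse_value = rmse(actual, predicted)  # sqrt((3 * 100 + 40000) / 4)
```

Here the MAE is 57.5 while the RMSE is about 100.4: the single large error pulls RMSE up much more than MAE, which is why a wide gap between the two can hint at outliers.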
4. R²
R² = 1 - (RSS/TSS)
RSS is the residual sum of squares (the sum of squared prediction errors)
TSS is the total sum of squares (the sum of squared deviations of the actual values from their mean)
Significance:
• If R² is high, it indicates that the model fits the data well, meaning that the factors in
the model have explained most of the variation in the target variable.
• If R² is low, it indicates that the model has not taken advantage of the information
from the data well or that important factors are still missing.
Application:
• R² is used to evaluate the overall suitability of the model.
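A minimal sketch of R², computed as one minus the ratio of the residual sum of squares (the sum of squared prediction errors) to the total sum of squares. The function name `r2_manual` and the data are assumptions for illustration.

```python
def r2_manual(y_true, y_pred):
    """R² = 1 - RSS / TSS for paired actual and predicted values."""
    mean_true = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1 - rss / tss

# Hypothetical prices in millions of VND (made up for illustration).
actual = [500, 650, 720, 900]
predicted = [510, 640, 730, 890]

r2_good = r2_manual(actual, predicted)

# Predicting the mean for every car is the usual baseline and scores zero.
mean_price = sum(actual) / len(actual)
r2_baseline = r2_manual(actual, [mean_price] * len(actual))
```

A model that does worse than always predicting the mean gets a negative R², so values should be read relative to that baseline.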

After calculating the above indices for each model (Linear Regression and Random Forest),
analyze the results obtained:
• RMSE: The model with the lower RMSE is considered to produce more accurate
forecasts. If the RMSE of Random Forest is much lower than that of Linear
Regression, it may indicate that Random Forest captures non-linear relationships
better.
• MAE: MAE gives a view of the average deviation of the forecasts that is less
affected by outliers. If the Random Forest MAE is lower, its typical absolute error
is smaller.
• R²: A high R² value indicates that the model explains a large part of the
variation in the target variable. If the Random Forest R² is much higher than that of
Linear Regression, it indicates that Random Forest handles complex
relationships in the data better.
After evaluating the models, decide which one is better suited to the car price
prediction problem.
5.2.2 Input
Data Collection and Exploration
• Collect a detailed dataset containing car listings, including key attributes and the
target variable (car price).
• Analyze the dataset to understand the distribution of variables, detect missing data,
identify outliers, and resolve any inconsistencies.
• Conduct exploratory data analysis (EDA) to discover meaningful patterns and
relationships between features and the target variable.
Data Preprocessing
• Handle missing values: Use suitable methods such as mean or median imputation
based on the pattern and extent of missing data.
• Encode categorical variables: Transform categorical features (e.g., make, model,
transmission type) into numerical formats using methods like one-hot encoding.
• Scale numerical features: Bring numerical data to a similar scale, applying either
standardization (zero mean, unit variance) or normalization (min-max scaling).
• Feature engineering: Generate new features from existing ones to extract more
information and enhance the model's predictive capabilities. For example, split the
"engine" column into two new columns: "fuel_type" for fuel type and
"engine_capacity" for engine capacity.

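The "engine" split described above could be sketched as follows. The report does not show the raw format of the column, so the pattern assumed here (a fuel name followed by a capacity in litres, such as "Petrol 2.0 L") and the helper name `split_engine` are hypothetical.

```python
import re

# Assumed raw format: "<fuel> <capacity> L", e.g. "Petrol 2.0 L" (hypothetical).
ENGINE_PATTERN = re.compile(r"\s*([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*L?\s*")

def split_engine(engine):
    """Return (fuel_type, engine_capacity) parsed from a raw engine string."""
    match = ENGINE_PATTERN.fullmatch(engine)
    if match is None:
        return None, None  # keep unparseable values as missing for later imputation
    return match.group(1).lower(), float(match.group(2))

rows = ["Petrol 2.0 L", "Diesel 2.5 L", "unknown"]
parsed = [split_engine(row) for row in rows]
```

Values that do not match the assumed pattern are returned as missing rather than guessed, so they can be handled by the imputation step.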
Model Training and Evaluation

Linear Regression
• Split the cleaned and preprocessed data into training and testing sets, ensuring that
the test set remains unseen during the model's training phase to provide an
unbiased evaluation.
• Train the Linear Regression model using the training set, with car features serving as
input variables and the car price as the target variable.
• Evaluate the model's performance on the test set using metrics such as Mean
Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE),
and R-squared (R²). These metrics help assess the model's accuracy and ability to
generalize.
• Visualize the relationship between actual and predicted car prices through a scatter
plot, allowing for an intuitive evaluation of the model's predictions.
• Review sample predictions, formatted in Vietnamese currency (VND), to provide
practical insights into the model's output and highlight its real-world application.
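The Linear Regression steps above can be sketched with scikit-learn as follows. This is not the project's actual code: the data is synthetic, and the column names and the price formula are invented purely so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names and the price formula are invented.
rng = np.random.default_rng(42)
n = 200
cars = pd.DataFrame({
    "make": rng.choice(["Toyota", "Honda", "Ford"], size=n),
    "age_years": rng.integers(0, 15, size=n).astype(float),
    "engine_capacity": rng.choice([1.5, 2.0, 2.5], size=n),
})
cars.loc[::25, "age_years"] = np.nan  # simulate some missing values
price = (900 - 40 * cars["age_years"].fillna(7) + 100 * cars["engine_capacity"]
         + rng.normal(0, 20, size=n))  # millions of VND, made up

# Hold out a test set that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    cars, price, test_size=0.2, random_state=42)

# Median imputation and scaling for numeric columns, one-hot encoding for categories.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]),
     ["age_years", "engine_capacity"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["make"]),
])
model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
metrics = {
    "MSE": mse,
    "RMSE": float(np.sqrt(mse)),
    "MAE": mean_absolute_error(y_test, y_pred),
    "R2": r2_score(y_test, y_pred),
}
```

Wrapping the preprocessing in a pipeline means the imputer, scaler, and encoder are fitted on the training split only, which keeps the test set unseen as the first step requires.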
Random Forest
• Split the cleaned and preprocessed data into training and testing sets, ensuring that
the test set remains unseen during the model's training phase to provide an
unbiased evaluation.
• Create a Random Forest Regressor with 50 decision trees (n_estimators=50) and a
fixed random seed (random_state=42) to ensure reproducibility. Train it on the
training set (X_train and y_train) so that it learns the relationships between the
features and the target values, then use it to make predictions on the test set
(X_test).
• Evaluate the model's performance on the test set using metrics such as Mean
Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE),
and R-squared (R²). These metrics help assess the model's accuracy and ability to
generalize.
• Visualize the relationship between actual and predicted car prices through a scatter
plot, allowing for an intuitive evaluation of the model's predictions.
• Review sample predictions, formatted in Vietnamese currency (VND), to provide
practical insights into the model's output and highlight its real-world application.
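A comparable sketch for the Random Forest steps, using the settings named in the text (n_estimators=50, random_state=42). The features and the non-linear price formula are synthetic assumptions, not the project's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic, already-numeric features; invented for illustration only.
rng = np.random.default_rng(0)
age = rng.uniform(0, 15, size=300)
engine = rng.choice([1.5, 2.0, 2.5], size=300)
X = np.column_stack([age, engine])
# Made-up non-linear depreciation curve, in VND.
y = 8e8 * np.exp(-0.15 * age) + 1e8 * engine + rng.normal(0, 1e7, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Settings reported in the text: 50 trees and a fixed random seed.
forest = RandomForestRegressor(n_estimators=50, random_state=42)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = float(np.sqrt(mse))
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Sample predictions formatted as Vietnamese currency.
samples = [f"{value:,.0f} VND" for value in y_pred[:3]]
```

Tree-based models do not need feature scaling, so the sketch feeds the numeric features in directly. Categorical columns would still need encoding first.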

5.2.3 Output
The output of the project is the predicted car price for each input car, produced separately
by the trained Random Forest model and the trained Linear Regression model.
5.3 Conclusion

Model                      MSE           MAE           RMSE
Linear Regression          2.35E+16      111625851     153176192.8
Random Forest Regressor    5.9189E+11    201230.155    769343.6023
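As a quick consistency check on the reported figures, RMSE should equal the square root of MSE. The sketch below uses only the values from the table, and the variable names are our own.

```python
import math

# Values copied from the results table above.
reported = {
    "Linear Regression": {"MSE": 2.35e16, "RMSE": 153176192.8},
    "Random Forest Regressor": {"MSE": 5.9189e11, "RMSE": 769343.6023},
}

# Relative gap between sqrt(MSE) and the reported RMSE for each model.
relative_gap = {
    name: abs(math.sqrt(values["MSE"]) - values["RMSE"]) / values["RMSE"]
    for name, values in reported.items()
}
```

Both gaps are well under one percent, and the small difference for Linear Regression is consistent with its MSE being rounded to three significant figures.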

The results show that Random Forest outperforms Linear Regression on all three reported
metrics: its RMSE is roughly 200 times lower and its MAE roughly 550 times lower. This
disparity likely stems from the data's potential non-linear nature, which Random Forest,
with its ensemble of decision trees, is better equipped to capture. Even so, the Random
Forest RMSE is almost four times its MAE, suggesting that a small number of predictions
carry large errors, possibly because of outliers. The very large errors of Linear Regression
may likewise reflect outliers or relationships that a linear model cannot represent. Whether
overfitting also plays a role could be checked by comparing training and test errors. To
enhance model performance, strategies such as regularization, hyperparameter tuning,
feature engineering, and data cleaning should be explored.
This study successfully developed a predictive model for car prices in Vietnam, offering
valuable insights into the factors influencing car valuations. By utilizing machine learning
and thorough data analysis, the research provides practical recommendations for
consumers, dealers, and policymakers. The results highlight the necessity of considering
various car attributes and market conditions to achieve fair and competitive pricing.
Looking ahead, ongoing research and continuous model improvement will be essential to
keeping pace with the dynamic automotive market. Incorporating more diverse data
sources and applying advanced analytical methods can further enhance predictive accuracy
and understanding. Ultimately, this progress will contribute to a more transparent and
efficient automotive market in Vietnam and beyond.
