Uploaded by

thanhhoai2244

Model Evaluation

5.1 Problem in the Project


Our research highlights limitations in popular car valuation guides, which typically avoid
machine learning techniques and instead rely on average prices from local sales. While this
approach works for common cars with standard features, it fails to accurately value unique
vehicles. Traditional methods often overlook the nuanced insights provided by different
makes and models. In contrast, machine learning utilizes the entire dataset to generate
more precise predictions, maximizing the use of all available data and features.
New cars of the same make, model, location, and features are identical in condition,
function, and price. However, once sold, they transition into used cars whose value changes
over time due to factors like aging (depreciation), inflation, and obsolescence (revaluation).
For simplicity, these changes are collectively referred to as depreciation. Additionally,
attributes such as engine size influence value: larger engines may become less desirable as
fuel prices increase.
Used cars acquire unique service histories that impact their condition and market value.
Traditional methods often simplify these details into a broad “condition” category,
considering repairs or modifications only if they have a substantial effect on the car's value.
We propose using machine learning models, such as Random Forest and Linear Regression,
to better account for uncommon features, leading to more accurate vehicle valuations. This
approach provides consumers with a reliable and equitable tool for buying or selling cars
with non-standard features.
5.2 Application of the Model
5.2.1 Evaluation Method
1. MSE

Significance:
• MSE measures the average squared difference between the predicted values and the
actual values, indicating the model's overall error.
• A smaller MSE reflects better predictive accuracy, suggesting that the model is
performing well.
• Since MSE involves squaring the errors, it penalizes larger errors more heavily,
making it highly sensitive to outliers. This can help identify and address significant
deviations in the data.
Application:
• MSE is well-suited for evaluating regression models, particularly in scenarios like
price prediction, where understanding and minimizing the overall prediction error is
crucial.
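For reference, the standard definition of MSE can be written as below. The symbols are our own notation rather than the report's: n is the number of test samples, y_i the actual price, and \hat{y}_i the predicted price.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

As a small worked case, two predictions with errors of 1 and 3 give MSE = (1 + 9) / 2 = 5, so the larger error contributes nine times as much as the smaller one.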
2. RMSE
RMSE is the square root of MSE, so it is expressed in the same units as the target variable
(here, the car price).

Significance:
• A small RMSE indicates good predictive ability and, because large errors dominate the
metric, that the model makes few large errors.
• Like MSE, RMSE is sensitive to outliers, so it is useful for assessing how strongly
they influence the model.
Application:
• RMSE is suitable for problems where large errors must be penalized strictly, such as
price prediction or weather forecasting.
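The standard definition of RMSE, written in our own notation (n test samples, actual price y_i, predicted price \hat{y}_i), is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

For two predictions with errors of 1 and 3, the mean squared error is 5 and RMSE = \sqrt{5} \approx 2.24, which is a value on the same scale as the errors themselves.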
3. MAE
MAE measures the average absolute difference between the predicted value and the actual
value. It does not consider the direction of the error (whether the prediction is larger or
smaller than the actual value).

Significance:
• A small MAE indicates that predictions are close to the actual values on average,
which suits cases where the typical error needs to be assessed.
• However, MAE weights every error in proportion to its size instead of penalizing large
errors more heavily, so it does not clearly reflect the influence of outliers.
Application:
• MAE is suitable when you want a simple and easy-to-understand estimate of the
error.
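The difference in outlier sensitivity between MAE and RMSE can be illustrated with a minimal sketch. The prices below are invented for illustration (they are not from the project's dataset), and the helper names `mae` and `rmse` are our own.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical prices in millions of VND, invented for illustration only.
actual = [500, 600, 700, 800]
close_preds = [510, 590, 710, 790]      # every prediction is off by 10
outlier_preds = [510, 590, 710, 1000]   # the last prediction is off by 200

print("no outlier:  ", mae(actual, close_preds), rmse(actual, close_preds))
print("with outlier:", mae(actual, outlier_preds), rmse(actual, outlier_preds))
```

With one large miss, MAE rises from 10 to 57.5 while RMSE rises from 10 to roughly 100, which is why a large gap between the two metrics hints at a few badly predicted cars.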
4. R²
R² = 1 - (RSS/TSS)
RSS is the residual sum of squares
TSS is the total sum of squares
Significance:
• If R² is high, it indicates that the model fits the data well, meaning that the factors in
the model have explained most of the variation in the target variable.
• If R² is low, it indicates that the model has not taken advantage of the information
from the data well or that important factors are still missing.
Application:
• R² is used to evaluate the overall suitability of the model.
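As a minimal sketch, R² can be computed directly from the standard definition R² = 1 - RSS/TSS, where RSS is the residual sum of squares and TSS the total sum of squares. The function name `r_squared` and the sample prices are assumptions made for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_actual = sum(actual) / len(actual)
    # RSS: squared differences between actual and predicted values.
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # TSS: squared differences between actual values and their mean.
    tss = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - rss / tss

# Hypothetical prices in millions of VND, invented for illustration only.
actual = [500, 600, 700, 800]
good_preds = [510, 590, 710, 790]
mean_preds = [650, 650, 650, 650]  # always predicting the mean price

print(r_squared(actual, good_preds))
print(r_squared(actual, mean_preds))
```

A model that always predicts the mean price scores R² = 0, so R² can be read as the improvement over that baseline.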

After calculating the above metrics for each model (Linear Regression and Random Forest),
analyze the results as follows:
• RMSE: The model with the lower RMSE is considered to produce more accurate
forecasts. If the RMSE of Random Forest is much lower than that of Linear
Regression, this may indicate that Random Forest is better at capturing non-linear
relationships.
• MAE: MAE gives a view of the average deviation of the forecasts without being
strongly affected by outliers. If the Random Forest MAE is lower, the model is
doing a better job of minimizing the absolute error.
• R²: A high R² value indicates that the model explains a large part of the
variation in the target value. If the Random Forest R² is much higher than that of
Linear Regression, it indicates that Random Forest handles complex
relationships in the data better.
After evaluating the models, decide which one is better suited to the car price
prediction problem.
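The comparison described above can be sketched with the metric functions in `sklearn.metrics`. The helper `evaluate`, the sample prices, and the two sets of predictions are hypothetical and only show the mechanics; they are not the project's results.

```python
import math
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """Return the four metrics discussed in this section as a dict."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }

# Hypothetical actual prices and predictions (millions of VND), for illustration.
y_true = [500, 600, 700, 800]
predictions = {
    "Linear Regression": [540, 570, 730, 760],
    "Random Forest": [510, 590, 710, 790],
}

reports = {name: evaluate(y_true, y_pred) for name, y_pred in predictions.items()}
for name, report in reports.items():
    print(name, {metric: round(value, 3) for metric, value in report.items()})

# Choose the model with the lowest RMSE (one possible selection rule).
best = min(reports, key=lambda name: reports[name]["RMSE"])
print("Lower RMSE:", best)
```

Selecting on RMSE alone is a design choice; when the metrics disagree, the decision depends on whether occasional large errors or the typical error matters more.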
5.2.2 Input
Data Collection and Exploration
• Collect a detailed dataset containing car listings, including key attributes and the
target variable (car price).
• Analyze the dataset to understand the distribution of variables, detect missing data,
identify outliers, and resolve any inconsistencies.
• Conduct exploratory data analysis (EDA) to discover meaningful patterns and
relationships between features and the target variable.
Data Preprocessing
• Handle missing values: Use suitable methods such as mean or median imputation,
chosen according to the pattern and extent of the missing data.
• Encode categorical variables: Transform categorical features (e.g., make, model,
transmission type) into numerical formats using methods like one-hot encoding.
• Scale numerical features: Bring numerical data onto a similar scale, applying
either standardization (zero mean, unit variance) or normalization (min-max
scaling).
• Feature engineering: Generate new features from existing ones to extract more
information and enhance the model's predictive capabilities. For example, split the
"engine" column into two new columns: "fuel_type" for the fuel type and
"engine_capacity" for the engine capacity.

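The preprocessing steps above could be sketched with pandas and scikit-learn as follows. The sample rows, the column names other than "engine", "fuel_type" and "engine_capacity", and the assumed text format of the "engine" column (for example "Petrol 2.0 L") are all hypothetical; the real dataset may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical listings; values and the "engine" text format are assumptions.
df = pd.DataFrame({
    "make": ["Toyota", "Honda", "Toyota", "Ford"],
    "transmission": ["Automatic", "Manual", "Automatic", "Automatic"],
    "mileage_km": [30000.0, None, 85000.0, 52000.0],
    "engine": ["Petrol 2.0 L", "Petrol 1.5 L", "Diesel 2.4 L", "Petrol 1.5 L"],
})

# Feature engineering: split "engine" into fuel type and engine capacity.
parts = df["engine"].str.extract(r"^(\w+)\s+(\d+(?:\.\d+)?)")
df["fuel_type"] = parts[0]
df["engine_capacity"] = parts[1].astype(float)
df = df.drop(columns=["engine"])

numeric_cols = ["mileage_km", "engine_capacity"]
categorical_cols = ["make", "transmission", "fuel_type"]

preprocess = ColumnTransformer([
    # Median imputation followed by standardization for numeric columns.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # One-hot encoding for categorical columns.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X = preprocess.fit_transform(df)
print(X.shape)
```

In a full workflow the transformer would normally be fitted on the training split only and then applied to the test split, so that no test-set statistics leak into training.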
Model Training and Evaluation


Linear Regression
• Split the cleaned and preprocessed data into training and testing sets, ensuring that
the test set remains unseen during the model's training phase to provide an
unbiased evaluation.
• Train the Linear Regression model using the training set, with car features serving as
input variables and the car price as the target variable.
• Evaluate the model's performance on the test set using metrics such as Mean
Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE),
and R-squared (R²). These metrics help assess the model's accuracy and ability to
generalize.
• Visualize the relationship between actual and predicted car prices through a scatter
plot, allowing for an intuitive evaluation of the model's predictions.
• Review sample predictions, formatted in Vietnamese currency (VND), to provide
practical insights into the model's output and highlight its real-world application.
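A minimal sketch of the Linear Regression workflow is shown below. It runs on synthetic data generated inside the example (the features `age` and `mileage`, the price formula, and the 80/20 split are assumptions), so the numbers it prints say nothing about the project's dataset.

```python
import math
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: price (millions of VND) falls linearly with age and mileage.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(0, 15, n)
mileage = rng.uniform(0, 200_000, n)
price = 1200 - 35 * age - 0.002 * mileage + rng.normal(0, 20, n)
X = np.column_stack([age, mileage])

# Hold out 20% of the rows as an unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = math.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.1f} RMSE={rmse:.1f} MAE={mae:.1f} R2={r2:.3f}")
```

For the scatter plot step, `matplotlib.pyplot.scatter(y_test, y_pred)` gives the actual-versus-predicted view; points close to the diagonal correspond to accurate predictions.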
Random Forest
• Split the cleaned and preprocessed data into training and testing sets, ensuring that
the test set remains unseen during the model's training phase to provide an
unbiased evaluation.
• Create a Random Forest Regressor with 50 decision trees (n_estimators=50) and a
fixed random seed (random_state=42) to ensure reproducibility. Train the model on
the training dataset (X_train and y_train) so that it learns the patterns and
relationships between the features and their target values, then use it to make
predictions on the test dataset (X_test).
• Evaluate the model's performance on the test set using metrics such as Mean
Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE),
and R-squared (R²). These metrics help assess the model's accuracy and ability to
generalize.
• Visualize the relationship between actual and predicted car prices through a scatter
plot, allowing for an intuitive evaluation of the model's predictions.
• Review sample predictions, formatted in Vietnamese currency (VND), to provide
practical insights into the model's output and highlight its real-world application.
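The Random Forest step can be sketched in the same way, using the parameters stated above (n_estimators=50, random_state=42). The data is synthetic and deliberately non-linear (an assumed exponential depreciation curve); the feature names and the price formula are our own invention.

```python
import math
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic, non-linear data: price decays exponentially with age.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(0, 15, n)
engine_capacity = rng.choice([1.5, 2.0, 2.5, 3.0], n)
price = 500 * engine_capacity * np.exp(-0.15 * age) + rng.normal(0, 10, n)
X = np.column_stack([age, engine_capacity])

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

# 50 trees and a fixed seed, as described in the text.
rf = RandomForestRegressor(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = math.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.1f} RMSE={rmse:.1f} MAE={mae:.1f} R2={r2:.3f}")
```

Fixing random_state makes the bootstrap samples and feature choices repeatable, so reruns on the same split give the same metrics.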

5.2.3 Output
The output of the project is the predicted car price for each input car, produced by the
trained Random Forest model and, separately, by the trained Linear Regression model.
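Predicted prices are raw numbers, so they need formatting before being shown in VND. The sketch below assumes the common Vietnamese convention of dots as thousands separators; the function name `format_vnd` and the sample values are hypothetical.

```python
def format_vnd(amount):
    """Format a number as VND, using dots as thousands separators."""
    whole = int(round(amount))
    return f"{whole:,}".replace(",", ".") + " VND"

# Hypothetical model outputs, invented for illustration only.
predicted_prices = [1_250_000_000.4, 485_500_000.0]
for price in predicted_prices:
    print(format_vnd(price))
```

Rounding to whole units is a presentation choice; it does not affect the metrics, which are computed on the unformatted predictions.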
5.3 Conclusion

Model                      MSE           MAE           RMSE
Linear Regression          2.35E+16      111625851     153176192.8
Random Forest Regressor    5.9189E+11    201230.155    769343.6023

The results show that Random Forest outperforms Linear Regression on all three metrics:
its RMSE is roughly 200 times lower (about 7.7E+05 versus 1.5E+08) and its MAE is roughly
550 times lower, indicating superior predictive accuracy. Random Forest's RMSE is nearly
four times its MAE, however, which suggests that a small number of instances still carry
significant prediction errors. The gap between the two models likely stems from the data's
potential non-linear nature, which Random Forest, with its ensemble of decision trees, is
better equipped to capture. The remaining large errors may point to overfitting or the
influence of outliers; comparing training and test errors would help tell the two apart.
To enhance model performance, strategies such as regularization, hyperparameter tuning,
feature engineering, and data cleaning should be explored.
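The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the values from the results table; the 0.5% tolerance is our own choice to allow for the rounding of the reported MSE values.

```python
import math

# Values copied from the results table above.
results = {
    "Linear Regression": {"MSE": 2.35e16, "MAE": 111625851, "RMSE": 153176192.8},
    "Random Forest Regressor": {"MSE": 5.9189e11, "MAE": 201230.155, "RMSE": 769343.6023},
}

for name, m in results.items():
    # RMSE should equal sqrt(MSE), up to rounding of the reported MSE.
    consistent = math.isclose(math.sqrt(m["MSE"]), m["RMSE"], rel_tol=5e-3)
    # A ratio well above 1 means a few large errors dominate the RMSE.
    ratio = m["RMSE"] / m["MAE"]
    print(f"{name}: RMSE matches sqrt(MSE): {consistent}, RMSE/MAE = {ratio:.2f}")

rmse_gap = results["Linear Regression"]["RMSE"] / results["Random Forest Regressor"]["RMSE"]
mae_gap = results["Linear Regression"]["MAE"] / results["Random Forest Regressor"]["MAE"]
print(f"Linear Regression RMSE is {rmse_gap:.0f}x higher; MAE is {mae_gap:.0f}x higher")
```

The RMSE/MAE ratio is about 1.4 for Linear Regression and about 3.8 for Random Forest, so the Random Forest errors are small on average but include a few comparatively large misses.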
This study successfully developed a predictive model for car prices in Vietnam, offering
valuable insights into the factors influencing car valuations. By utilizing machine learning
and thorough data analysis, the research provides practical recommendations for
consumers, dealers, and policymakers. The results highlight the necessity of considering
various car attributes and market conditions to achieve fair and competitive pricing.
Looking ahead, ongoing research and continuous model improvement will be essential to
keeping pace with the dynamic automotive market. Incorporating more diverse data
sources and applying advanced analytical methods can further enhance predictive accuracy
and understanding. Ultimately, this progress will contribute to a more transparent and
efficient automotive market in Vietnam and beyond.
