Introduction To Artificial Intelligence

Uploaded by

saadqureshiksa

MODULE TITLE: Introduction to Artificial Intelligence

MODULE NUMBER: CIS2205

DEPARTMENT: School of Computing Engineering

COURSEWORK: Report

MODULE LEADER: Quratul-Ain-Mahesar

Abstract
Machine learning techniques can increase the accuracy of real estate valuation predictions
and are therefore important in real estate market analysis. The pre-processed dataset
contains numerical and categorical variables, including dimensions, geographic location,
and property features. To preserve data integrity, missing numerical values are replaced
with the median and missing categorical values with the mode. ColumnTransformer in
Scikit-learn automates the preprocessing by combining OneHotEncoder for encoding
categorical variables with StandardScaler for standardising numeric variables. Both linear
regression and random forest models are trained on 70% of the data and tested on the
remaining 30%. Performance is evaluated with the mean absolute error (MAE), root mean
square error (RMSE), and coefficient of determination (R2). The results indicate that the
random forest model outperforms linear regression because it models complex interactions
and non-linear relationships well. Scatter plots and feature importance plots show how well
each model fits and which features are the key drivers of real estate prices. The report thus
highlights the value of comprehensive preprocessing combined with ensemble methods in
predictive analytics. It provides a basis for developing machine learning models in the field
of real estate that could be further optimised with additional features and hyperparameter
tuning.

List of Contents
Abstract
List of Contents
Table of Figures
1. Objective
2. Introduction
3. Approach
4. Key Insights
4.1. Data Preprocessing
4.2. Performance of models
4.3. Feature Analysis
4.4. Visualisations
4.5. Optimisation
5. Code Guidelines
5.1. Needed Imports
6. Loading the dataset
7. Addressing Missing Values
8. Divide the dataset
9. Preprocessor for Data
10. Divide the data into testing and training
10.1. Linear Regression Model
10.2. Random Forest Model
11. Metrics for Evaluation
12. Comparison of Expected and Real Prices
13. Significance of Features
14. Linear Regression: Forecasted vs Real Costs
14.1. Results of Linear Regression
14.2. Results of Random Forest
15. Findings
16. Recommendations
17. Conclusion
18. References

Table of Figures
Figure 1 Actual vs Predicted prices using linear regression model
Figure 2 Actual vs Predicted prices using random forest model

1. Objective
This research aims to predict house prices using the Kaggle dataset "House Prices using
Advanced Regression Techniques." Both Linear Regression (LR) and Random Forest (RF)
algorithms are used to estimate the target variable SalePrice. The workflow comprises
data preprocessing, training, evaluation, and visualisation.

2. Introduction
Linear regression is one of the most fundamental and widely applied techniques in machine
learning (Maulud & Abdulazeez, 2020). It is a mathematical model used to determine the
relationship between the variables involved in an analysis (Akgün & Öğüdücü, 2015;
Maulud & Abdulazeez, 2020). Linear regression (Lim, 2019) is also frequently used in
mathematical research methods to predict the value of an outcome from several input
variables. The method analyses and models the data to determine the linear relationship
between the independent and dependent variables. The process carries these relationships
from analysis and learning through to results on real-world data. Details of each process,
such as the algorithms used, databases, precision, and performance, are usually hidden
(Sarkar et al., 2015). The rapid development of remote sensing technologies, especially
concerning platforms, sensors, and information infrastructure, has significantly increased
the accessibility of Earth observation data for geospatial analysis (Sarkar et al., 2015; Seo
et al., 2017). Land use and land cover mapping applications are expanding, along with the
need to update the available maps, which creates new avenues for innovative land use
classification methods in various land management sectors. These advancements aim to
address challenges that are local, regional, and global in nature (Karan & Samadder, 2018;
Ramo & Chuvieco, 2017).

3. Approach
In the data preprocessing stage, missing values in numerical features were imputed with
the median and missing values in categorical features with the mode. Numerical features
were then standardised with StandardScaler, and categorical variables were encoded with
OneHotEncoder. A ColumnTransformer combined these operations into a single
preprocessing step. The data were split into a training set containing 70% of the data and a
testing set containing the remaining 30%. Two machine learning techniques, linear
regression and random forest, were then fitted as Scikit-learn pipelines that include both
the preprocessing and the modelling steps. The models were compared using the mean
absolute error, the root mean square error, and the coefficient of determination (R2). This
systematic procedure yields meaningful evidence of how well each model predicts real
estate values (Dimopoulos et al., 2018; Mathotaarachchi et al., 2024).

4. Key Insights

4.1. Data Preprocessing

• Preprocessing is a basic step that ensures the quality and reliability of the data used in
an analysis or model.
• Feature scaling and encoding make the data compatible with machine learning algorithms.

4.2. Performance of models

• The Random Forest model performs better than linear regression: its accuracy and
generalisation are better because it can represent nonlinear relations more accurately.
• The Random Forest model achieved a lower mean absolute error and root mean square
error and a higher coefficient of determination (R2).

4.3. Feature Analysis

• The Random Forest algorithm makes it easy to identify the factors that influence a
property's price the most, which is valuable information for real estate professionals
(Jui et al., 2020).

4.4. Visualisations
• The scatter plots show that the random forest predictions track the actual values more
closely than the linear regression predictions.

4.5. Optimisation
• More advanced techniques such as gradient boosting and XGBoost, together with
hyperparameter optimisation and further feature engineering, could improve the
results (Rey-Blanco et al., 2024).

5. Code Guidelines

5.1. Needed Imports

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

Goal: Import machine learning (scikit-learn), data handling (pandas), and visualisation
(matplotlib, seaborn) libraries.

6. Loading the dataset

Dataset link: dataset

data = pd.read_csv('/content/drive/MyDrive/AiLabs/house_prices.csv')

• Goal: Load the dataset into a DataFrame for analysis.

• Note: Replace the file path with the actual path to your dataset.

7. Addressing Missing Values

# 'SalePrice' is the target variable, so it is removed from the numeric feature list.
numeric_features = data.select_dtypes(include=['number']).drop(columns=['SalePrice']).columns
categorical_features = data.select_dtypes(include=['object']).columns
# Fill categorical gaps with the mode and numeric gaps with the median.
data[categorical_features] = data[categorical_features].fillna(
    data[categorical_features].mode().iloc[0])
data[numeric_features] = data[numeric_features].fillna(
    data[numeric_features].median())

• Numerical Columns: The median of each column is used to replace missing values.
• Categorical Columns: The mode, or most frequent value, of each column is used to
replace missing data.
• Why: For numerical data, the median is less sensitive to outliers than the mean.
For categorical data, the mode fills gaps with the most common category.
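
The effect of an outlier on the fill value can be illustrated with a minimal sketch. The prices below are made-up numbers chosen for illustration, not values from the dataset; a single extreme value shifts the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical sale prices (illustrative only); the last value is an outlier.
prices = [120_000, 135_000, 150_000, 160_000, 1_500_000]

mean_fill = mean(prices)      # pulled upward by the single outlier
median_fill = median(prices)  # stays at the middle value

print("mean fill value:", mean_fill)
print("median fill value:", median_fill)
```

Here the median remains a typical price, while the mean exceeds four of the five observations, which is why the median is the safer default for skewed numeric columns.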

8. Divide the dataset

X = data.drop('SalePrice', axis=1)  # Features
y = data['SalePrice']  # Target

• Goal: Isolate the target variable (y) from the features (X) for supervised learning.

9. Preprocessor for Data

• Numerical Features: StandardScaler was used to standardise the data (mean = 0, std = 1).
• Categorical Features: Categories were converted to binary indicator columns using
OneHotEncoder.

• Goal: Ensure that every feature is represented numerically and appropriately scaled
for machine learning techniques.

10. Divide the data into testing and training

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

• Goal: Separate the data into training and testing subsets.

• Training Set: Models are trained using 70% of the data. The testing set contains the
remaining 30% of the data, which is used to assess the model's performance.
• Random Seed: Guarantees repeatability of outcomes.
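
The role of the random seed can be checked with a short sketch on synthetic data (an assumption for illustration; it is not the housing dataset). Two calls with the same random_state should return identical subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows with 3 features each.
X_demo = np.arange(300).reshape(100, 3)
y_demo = np.arange(100)

# Split twice with the same seed.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)

print(len(X_train_a), len(X_test_a))       # sizes of the two subsets
print(np.array_equal(y_test_a, y_test_b))  # whether both splits match
```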

10.1. Linear Regression Model

# Linear Regression Model
lr_pipeline = Pipeline(steps=[('preprocessing', preprocessor),
                              ('model', LinearRegression())])
lr_pipeline.fit(X_train, y_train)
lr_pred = lr_pipeline.predict(X_test)

• Pipeline: Integrates the model (LinearRegression) and preprocessing (preprocessor).

• Fit: Use the training data to train the pipeline.
• Predict: Generate predictions for the test data.

10.2. Random Forest Model

# Random Forest Model
rf_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('model', RandomForestRegressor(random_state=42))])
rf_pipeline.fit(X_train, y_train)
rf_pred = rf_pipeline.predict(X_test)

• Pipeline: Combines the RandomForestRegressor model with preprocessing. A Random
Forest is a reliable model for capturing nonlinear interactions.

11. Metrics for Evaluation

print("Linear Regression:")
print("MAE:", mean_absolute_error(y_test, lr_pred))
# RMSE is the square root of the MSE; taking the root explicitly avoids the
# squared=False argument, which is deprecated in newer scikit-learn releases.
print("RMSE:", mean_squared_error(y_test, lr_pred) ** 0.5)
print("R²:", r2_score(y_test, lr_pred))
print("Random Forest:")
print("MAE:", mean_absolute_error(y_test, rf_pred))
print("RMSE:", mean_squared_error(y_test, rf_pred) ** 0.5)
print("R²:", r2_score(y_test, rf_pred))

• MAE: Measures the average magnitude of the errors.

• RMSE: Penalises larger errors more severely.
• R2: Shows how much of the variance in the data the model accounts for.
• Goal: Evaluate how well the Random Forest and Linear Regression models perform.
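
To make the three definitions concrete, the sketch below computes them by hand in plain Python. The function names and the small price lists are hypothetical and chosen for illustration; in the report itself the scikit-learn functions are used.

```python
import math

def mae(y_true, y_pred):
    # Mean of the absolute errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Square root of the mean squared error.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    # One minus the ratio of residual variation to total variation.
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up values; the last prediction carries one large error.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 340]

print("MAE:", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("R2:", r2(actual, predicted))
```

Because the errors are squared before averaging, the single large error raises the RMSE above the MAE, which is the penalty on larger errors described in the list.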

12. Comparison of Expected and Real Prices

plt.scatter(y_test, rf_pred, alpha=0.5)
# Diagonal reference line (y = x) for perfect predictions.
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Predicted vs. Actual Prices in Random Forest')
plt.show()

• Scatter Plot: Shows the relationship between actual and predicted values. Points on the
diagonal line (y = x) represent perfect predictions.

13. Significance of Features

importances = rf_pipeline.named_steps['model'].feature_importances_
# Assumes the numeric transformer is listed first and OneHotEncoder second.
num_features = preprocessor.transformers_[0][2]
cat_features = list(
    preprocessor.transformers_[1][1].get_feature_names_out(categorical_features))
features = list(num_features) + cat_features
sns.barplot(x=importances, y=features)
plt.title('Feature Importance')
plt.show()

• Purpose: Identify which features have the greatest predictive power.

14. Linear Regression: Forecasted vs Real Costs

plt.figure(figsize=(8, 6))
plt.scatter(y_test, lr_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Predicted vs. Actual Prices in Linear Regression')
plt.show()

• Scatter plot with reference line: Shows how close the predictions lie to the actual values.
The results can be seen in Figure 1 and Figure 2.

Figure 1 Comparison of Actual Prices to Predicted Prices Using a Linear Regression Model.

14.1. Results of Linear Regression:

MAE: 18247.00205131123

RMSE: 28007.160173400105

R²: 0.8875909312985639

Figure 2 Comparing Actual and Predicted Prices with the Random Forest Model.

14.2. Results of Random Forest:
MAE: 17028.310616438357

RMSE: 27234.863837233843

R²: 0.8937048091293024

15. Findings
The study examined property price forecasting using machine learning-based predictive
modelling techniques (Khobragade et al., 2018). The data was analysed using both a linear
regression model and a random forest model, each of which has specific advantages and
disadvantages. Linear regression provides an explicit and easily interpretable model that
emphasises linear relationships between the explanatory and dependent variables. Because
it captures only linear dynamics, it reached an MAE of 18,247 and an R2 of 0.887. The
random forest model, which can represent complex patterns and interactions between the
variables, reduced the MAE to 17,028 and raised the R2 to 0.894. The results indicate that
the Random Forest method was more accurate than linear regression on every metric,
although by a modest margin, and was therefore the better predictor of real estate values
(Mathotaarachchi et al., 2024).
Furthermore, the feature importance analysis helped identify the attributes that most
influence property pricing. These insights are relevant to participants in the real estate
market, notably developers, investors, and property owners. The scatter plots illustrated
each model's accuracy, and together with the feature importance plot they highlight the
practical utility of Random Forest methods in real-world scenarios.
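
The size of the difference between the two models can be worked out directly from the metrics reported in sections 14.1 and 14.2. The sketch below is plain arithmetic on those reported values; the variable names are chosen here for illustration.

```python
# Test-set metrics as reported in sections 14.1 and 14.2.
lr = {"MAE": 18247.00205131123, "RMSE": 28007.160173400105, "R2": 0.8875909312985639}
rf = {"MAE": 17028.310616438357, "RMSE": 27234.863837233843, "R2": 0.8937048091293024}

# Relative error reduction of Random Forest over Linear Regression, in percent.
mae_reduction_pct = 100 * (lr["MAE"] - rf["MAE"]) / lr["MAE"]
rmse_reduction_pct = 100 * (lr["RMSE"] - rf["RMSE"]) / lr["RMSE"]

# Absolute gain in the coefficient of determination.
r2_gain = rf["R2"] - lr["R2"]

print(f"MAE reduced by {mae_reduction_pct:.1f}%")
print(f"RMSE reduced by {rmse_reduction_pct:.1f}%")
print(f"R2 increased by {r2_gain:.4f}")
```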

16. Recommendations
To further improve the model:

• Perform hyperparameter tuning for the Random Forest model (Probst et al., 2019).
• Explore additional ensemble methods like Gradient Boosting or XGBoost.
• Engineer new features that capture interactions between existing variables.
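
As one possible starting point for the first recommendation, the sketch below tunes a Random Forest inside a pipeline with GridSearchCV. It is not part of the original code: the synthetic data, the small parameter grid, the 3-fold setting, and the use of StandardScaler as the only preprocessing step are assumptions chosen to keep the example short.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the housing features.
X_demo, y_demo = make_regression(
    n_samples=120, n_features=5, noise=10.0, random_state=42)

pipeline = Pipeline(steps=[
    ("preprocessor", StandardScaler()),
    ("model", RandomForestRegressor(random_state=42)),
])

# Hypothetical search grid; the "model__" prefix routes each parameter
# to the pipeline step named "model".
param_grid = {
    "model__n_estimators": [20, 50],
    "model__max_depth": [None, 5],
}

# Cross-validated search that scores each setting by (negated) MAE.
search = GridSearchCV(pipeline, param_grid, cv=3,
                      scoring="neg_mean_absolute_error")
search.fit(X_demo, y_demo)

print(search.best_params_)
print(-search.best_score_)  # cross-validated MAE of the best setting
```

Tuning inside the pipeline means the preprocessing is refitted on each training fold, so the validation folds do not influence the scaling.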

17. Conclusion
The study assesses the effectiveness of two machine learning algorithms, Random Forest
and Linear Regression, in estimating residential property values. The code lays a strong
foundation for predictive modelling through proper data preparation: handling missing
values and applying relevant transformations such as one-hot encoding for categorical
data and standardisation for numerical attributes. The performance metrics quantify how
well each model predicts. The RF model outperforms the LR model on all of them: its R2 is
higher, and its MAE and RMSE are lower. This is because the RF model, as an ensemble of
decision trees, can identify complex relationships and non-linear associations between
features, so it predicts home values more accurately. The scatter plots provide a visual
comparison of the models by showing the difference between the actual and the predicted
prices for each of them. The Random Forest model predicts better because its data points
lie closer to the ideal prediction line. The feature importance plot is a useful tool for
explaining which attributes matter most for residential property values. It is particularly
helpful for home owners, investors, and real estate professionals in understanding the
primary drivers of property valuations. While informative, the machine learning models
are not flawless. More advanced algorithms such as Gradient Boosting or XGBoost, and
more diverse or complete training datasets, could further improve the performance of this
approach. The solution nevertheless forms a good foundation for a predictive model in the
field of real estate, because it shows the capacity of data science methods to transform raw
housing data into meaningful and actionable property valuation insights. This code
illustrates:

1. A complete workflow for predictive modelling from start to finish.
2. Handling of missing values and transformation of features, which are essential for
successful data analysis.
3. Training and comparison of two models (Linear Regression and Random Forest).
4. Visualisation of model performance and feature importance.

Both models provide insights into the dataset, but Random Forest performs better at
capturing complex patterns.

18. References
Akgün, B., & Öğüdücü, Ş. G. (2015). Streaming linear regression on spark MLlib and
MOA. Proceedings of the 2015 IEEE/ACM International Conference on Advances
in Social Networks Analysis and Mining 2015,

Dimopoulos, T., Tyralis, H., Bakas, N. P., & Hadjimitsis, D. (2018). Accuracy
measurement of Random Forests and Linear Regression for mass appraisal models
that estimate the prices of residential apartments in Nicosia, Cyprus. Advances in
Geosciences, 45, 377-382.
Jui, J. J., Imran Molla, M., Bari, B. S., Rashid, M., & Hasan, M. J. (2020). Flat price
prediction using linear and random forest regression based on machine learning
techniques. Embracing Industry 4.0: Selected Articles from MUCET 2019,
Karan, S. K., & Samadder, S. R. (2018). Improving accuracy of long-term land-use change
in coal mining areas using wavelets and Support Vector Machines. International
Journal of Remote Sensing, 39(1), 84-100.
Khobragade, A. N., Maheswari, N., & Sivagami, M. (2018). Analyzing the housing rate in
a real estate informative system: A prediction analysis. Int. J. Civil Engine. Technol,
9(5), 1156-1164.
Mathotaarachchi, K. V., Hasan, R., & Mahmood, S. (2024). Advanced Machine Learning
Techniques for Predictive Modeling of Property Prices. Information, 15(6), 295.
Maulud, D., & Abdulazeez, A. M. (2020). A review on linear regression comprehensive in
machine learning. Journal of Applied Science and Technology Trends, 1(2), 140-
147.
Probst, P., Wright, M. N., & Boulesteix, A. L. (2019). Hyperparameters and tuning
strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery, 9(3), e1301.
Ramo, R., & Chuvieco, E. (2017). Developing a random forest algorithm for MODIS
global burned area classification. Remote Sensing, 9(11), 1193.
Rey-Blanco, D., Zofío, J. L., & González-Arias, J. (2024). Improving hedonic housing
price models by integrating optimal accessibility indices into regression and
random forest analyses. Expert Systems with Applications, 235, 121059.
Sarkar, M. R., Rabbani, M. G., Khan, A. R., & Hossain, M. M. (2015). Electricity demand
forecasting of Rajshahi City in Bangladesh using fuzzy linear regression model.
2015 International Conference on Electrical Engineering and Information
Communication Technology (ICEEICT),
Seo, D. K., Kim, Y. H., Eo, Y. D., Park, W. Y., & Park, H. C. (2017). Generation of
radiometric, phenological normalized image based on random forest regression for
change detection. Remote Sensing, 9(11), 1163.

KIIT Deemed To Be University: A Project Report
No ratings yet
KIIT Deemed To Be University: A Project Report
33 pages
MCQ'S - Business Analytics
No ratings yet
MCQ'S - Business Analytics
42 pages
Machine Learning Internshala: Mini Project / Internship Report
100% (1)
Machine Learning Internshala: Mini Project / Internship Report
28 pages
Advanced Regression Techniques Based Housing Price Prediction Model
No ratings yet
Advanced Regression Techniques Based Housing Price Prediction Model
11 pages
SSRN Id3565512
No ratings yet
SSRN Id3565512
5 pages
DK Mlsummary
No ratings yet
DK Mlsummary
3 pages
Data Science Assignment Chapter 1
No ratings yet
Data Science Assignment Chapter 1
5 pages
ML Project CLG
No ratings yet
ML Project CLG
62 pages
Linear Regression vs Decision Trees for Housing Prices
No ratings yet
Linear Regression vs Decision Trees for Housing Prices
8 pages
10082-Article Text-40794-1-10-20230326
No ratings yet
10082-Article Text-40794-1-10-20230326
9 pages
Project - Synopsis - Format (1) (1) (1) Copy 2
No ratings yet
Project - Synopsis - Format (1) (1) (1) Copy 2
33 pages
House Price Prediction 1
No ratings yet
House Price Prediction 1
27 pages
Main Content (1) - Merged
No ratings yet
Main Content (1) - Merged
50 pages
Real Estate Cost Estimation Using Data Mining
No ratings yet
Real Estate Cost Estimation Using Data Mining
15 pages
ES205 Researchpaper
No ratings yet
ES205 Researchpaper
17 pages
PBL-1 Research Paper
No ratings yet
PBL-1 Research Paper
5 pages
House Price Prediction Project
No ratings yet
House Price Prediction Project
55 pages
Aastha Mahajan Python File
No ratings yet
Aastha Mahajan Python File
17 pages
House Price Prediction With Analysis
No ratings yet
House Price Prediction With Analysis
9 pages
Real Estate Price Prediction Using A Logistic Regression Model
No ratings yet
Real Estate Price Prediction Using A Logistic Regression Model
8 pages
HOUSE PREDICTION (1) (1) New
No ratings yet
HOUSE PREDICTION (1) (1) New
24 pages
Real Estate Price Prediction Guide
No ratings yet
Real Estate Price Prediction Guide
10 pages
House Prices
No ratings yet
House Prices
5 pages
Main Content (1) - Merged
No ratings yet
Main Content (1) - Merged
50 pages
Oral Presentation
No ratings yet
Oral Presentation
9 pages
Phase 5
No ratings yet
Phase 5
5 pages
Extended House Price Prediction Synopsis
No ratings yet
Extended House Price Prediction Synopsis
16 pages
Real Estate Price Prediction
No ratings yet
Real Estate Price Prediction
7 pages
House Price Prediction Using Machine Learning and Neural Networks
No ratings yet
House Price Prediction Using Machine Learning and Neural Networks
4 pages
Synopsis of Predicting House Prices Using Decison Tree
No ratings yet
Synopsis of Predicting House Prices Using Decison Tree
14 pages
Real Estate Project PDF
No ratings yet
Real Estate Project PDF
8 pages
Machine Learning for Real Estate
No ratings yet
Machine Learning for Real Estate
9 pages
House-Price-Prediction-Using-Regression-Techniques Retouch - Removed
No ratings yet
House-Price-Prediction-Using-Regression-Techniques Retouch - Removed
14 pages
Bangalore House Price Prediction
No ratings yet
Bangalore House Price Prediction
4 pages
House Prices Prediction
100% (1)
House Prices Prediction
51 pages
House Price Prediction Using Machine Learning
No ratings yet
House Price Prediction Using Machine Learning
6 pages
Report On Java Chatting
No ratings yet
Report On Java Chatting
10 pages
Real Estate Price Prediction Using Machine Learning
No ratings yet
Real Estate Price Prediction Using Machine Learning
7 pages
B4 Boston House Pricing
No ratings yet
B4 Boston House Pricing
63 pages
House Report
No ratings yet
House Report
26 pages
Price Prediction
No ratings yet
Price Prediction
16 pages
Housing Prices AI
No ratings yet
Housing Prices AI
10 pages
Main
No ratings yet
Main
35 pages
Sameeksha Mishra Project Report
No ratings yet
Sameeksha Mishra Project Report
28 pages
House Price Prediction with ML
No ratings yet
House Price Prediction with ML
5 pages
Minor Project Report
No ratings yet
Minor Project Report
23 pages
Rep Project Journal
No ratings yet
Rep Project Journal
10 pages
Seminar Ppt4
No ratings yet
Seminar Ppt4
19 pages
A Comparative Study For Predicting House Price Based On Machine Learning
No ratings yet
A Comparative Study For Predicting House Price Based On Machine Learning
7 pages
Ijcse Icter P113
No ratings yet
Ijcse Icter P113
5 pages
House Price Prediction Using Machine Learning: Bachelor of Technology
No ratings yet
House Price Prediction Using Machine Learning: Bachelor of Technology
20 pages
Real Estate Price Prediction Model
No ratings yet
Real Estate Price Prediction Model
3 pages
House Price Prediction
No ratings yet
House Price Prediction
17 pages
Bda Report
No ratings yet
Bda Report
27 pages
Mini-Project Report
No ratings yet
Mini-Project Report
14 pages
Project Report
No ratings yet
Project Report
15 pages
Fyp Proposal
No ratings yet
Fyp Proposal
3 pages
Real Estate Price Prediction Tool
No ratings yet
Real Estate Price Prediction Tool
36 pages
Machine Learning Approaches To Real Estate Market Prediction Problem - A Case Study
No ratings yet
Machine Learning Approaches To Real Estate Market Prediction Problem - A Case Study
20 pages
A Database-Driven Web Site
No ratings yet
A Database-Driven Web Site
7 pages
Renewable Energy Finance & Law
No ratings yet
Renewable Energy Finance & Law
20 pages
Solar Energy Technology Modelling and Analysis
No ratings yet
Solar Energy Technology Modelling and Analysis
18 pages
Critically Examine Environmental Laws Along With Case Studies
No ratings yet
Critically Examine Environmental Laws Along With Case Studies
13 pages
Ism Research Assessment 3
No ratings yet
Ism Research Assessment 3
27 pages
Data Science Selection Questions and Their Answers 2022
No ratings yet
Data Science Selection Questions and Their Answers 2022
5 pages
Tree-Based Methods Explained
No ratings yet
Tree-Based Methods Explained
68 pages
Lecture 2
No ratings yet
Lecture 2
98 pages
Voice Recognition Using Machine Learning PDF
No ratings yet
Voice Recognition Using Machine Learning PDF
16 pages
2 Customer Churning Analysis Using Machine Learning Algorithms
No ratings yet
2 Customer Churning Analysis Using Machine Learning Algorithms
10 pages
Cervical Cancer Prediction Using Machine Learning
No ratings yet
Cervical Cancer Prediction Using Machine Learning
10 pages
Loan Prediction Using Machine Learning
No ratings yet
Loan Prediction Using Machine Learning
89 pages
Horse Pologne
No ratings yet
Horse Pologne
37 pages
Clevered Brochure 6-8 Years
No ratings yet
Clevered Brochure 6-8 Years
24 pages
Data Science 面试必备指南 + 面试真题
No ratings yet
Data Science 面试必备指南 + 面试真题
54 pages
Thesis Full 1 To 6
No ratings yet
Thesis Full 1 To 6
138 pages
Mattheus Lim - Data Scientist CV
No ratings yet
Mattheus Lim - Data Scientist CV
1 page
Predictiveanalysis of PSL Match Winners Using Machine Learning Techniques
No ratings yet
Predictiveanalysis of PSL Match Winners Using Machine Learning Techniques
12 pages
Wang Honyangproject
No ratings yet
Wang Honyangproject
26 pages
Automatic Mood Classification of Indian Popular Music
No ratings yet
Automatic Mood Classification of Indian Popular Music
64 pages
Final Paper - Group14
No ratings yet
Final Paper - Group14
5 pages
KNN and Decision Tree Algorithms
No ratings yet
KNN and Decision Tree Algorithms
50 pages
Fake Job Detection Using Machine Learning
No ratings yet
Fake Job Detection Using Machine Learning
8 pages
Applsci 11 07987 v2
No ratings yet
Applsci 11 07987 v2
37 pages
Maximizing Campus Placement Through Machine Learni
No ratings yet
Maximizing Campus Placement Through Machine Learni
7 pages
THESIS TOPIC Soil Prediction Arsenic
No ratings yet
THESIS TOPIC Soil Prediction Arsenic
5 pages
Ajanah, Hakeema Ize Final Project
No ratings yet
Ajanah, Hakeema Ize Final Project
97 pages
Machine Learning: Chapter 2 Clustering
No ratings yet
Machine Learning: Chapter 2 Clustering
23 pages
Ensemble Solar Forecasting Review
No ratings yet
Ensemble Solar Forecasting Review
15 pages
Efficacy of Customer Churn Prediction System
No ratings yet
Efficacy of Customer Churn Prediction System
8 pages
Machine Learning Approach For Intrusion Detection On Cloud Virtual Machines
No ratings yet
Machine Learning Approach For Intrusion Detection On Cloud Virtual Machines
10 pages
A Tour of The Oil Industry - Kaggle
No ratings yet
A Tour of The Oil Industry - Kaggle
19 pages

Introduction To Artificial Intelligence

Uploaded by

Introduction To Artificial Intelligence

Uploaded by

List of Contents

Table of Figures

4. Key Insights

4.1. Data Preprocessing

4.2. Performance of models

4.3. Feature Analysis

4.4. Visualisations

4.5. Optimisation

5. Code Guidelines

5.1. Needed Imports

6. Loading the dataset

7. Addressing Missing Values

8. Divide the dataset

9. Preprocessor for Data

10. Divide the data into testing and training

10.1. Linear Regression Model

10.2. Random Forest Model

11. Metrics for Evaluation

12. Comparison of Expected and Real Prices

13. Significance of Features

14.1. Results of Linear Regression

14.2. Results of Random Forest

16. Recommendations

17. Conclusion

18. References

4.1. Data Preprocessing

4.2. Performance of models

4.3. Feature Analysis

5.1. Needed Imports

The pandas library is imported under the alias pd.

6. Loading the dataset

Dataset link dataset data =

• Goal: Open a DataFrame and load the dataset for analysis.
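A minimal sketch of this loading step, assuming the dataset is a CSV file. The report's dataset link is not recoverable here, so the example reads a small in-memory CSV instead; the column names (area, location, price) are hypothetical and only serve the illustration.

```python
import io

import pandas as pd

# Stand-in for the real dataset file, whose location the report does not give.
# The column names are hypothetical.
csv_text = """area,location,price
120,north,250000
85,south,180000
150,north,320000
"""

# pd.read_csv accepts a file path, a URL, or a file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())  # first rows, to confirm the load worked
print(data.shape)   # (rows, columns)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path or URL of the dataset.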

7. Addressing Missing Values
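The abstract describes this step as replacing missing numerical values with the median and missing categorical values with the mode. One way this could look in pandas is sketched below on a hypothetical toy frame; the column names and values are assumptions, not the report's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in a numeric and a categorical column.
data = pd.DataFrame({
    "area": [120.0, np.nan, 150.0, 90.0],
    "location": ["north", "south", None, "north"],
})

numeric_cols = data.select_dtypes(include="number").columns
categorical_cols = data.select_dtypes(exclude="number").columns

# Numeric gaps are filled with the column median.
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].median())

# Categorical gaps are filled with the most frequent value (the mode).
for col in categorical_cols:
    data[col] = data[col].fillna(data[col].mode()[0])

print(data)
```

To keep test-set information out of training, the fill values would normally be computed on the training split only, for example by placing scikit-learn's SimpleImputer inside the preprocessing pipeline.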

8. Divide the dataset

9. Preprocessor for Data

10. Divide the data into testing and training

• Goal: Separate the data into training and testing sets.
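A minimal sketch of the two dividing steps, assuming scikit-learn's train_test_split and the 70/30 ratio stated in the abstract. The toy data, the column names, and the random_state value are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; "price" plays the role of the target column.
data = pd.DataFrame({
    "area": [120, 85, 150, 90, 110, 70, 130, 95, 160, 100],
    "location": ["north", "south"] * 5,
    "price": [250, 180, 320, 190, 240, 150, 280, 200, 340, 210],
})

X = data.drop(columns="price")  # features
y = data["price"]               # target

# test_size=0.3 gives the 70/30 split; random_state is an arbitrary seed
# that makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))
```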

10.1. Linear Regression Model

• Pipeline: Integrates the model (LinearRegression) and preprocessing (preprocessor).
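A sketch of how such a pipeline might be assembled, assuming the ColumnTransformer, StandardScaler and OneHotEncoder combination described in the abstract. The step names ("preprocessor", "regressor"), the variable lr_model, and the toy columns are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy features and target.
X = pd.DataFrame({
    "area": [120, 85, 150, 90, 110, 70],
    "location": ["north", "south", "north", "south", "north", "south"],
})
y = pd.Series([250, 180, 320, 190, 240, 150])

# Scale numeric columns and one-hot encode categorical ones.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["area"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["location"]),
])

# Chain preprocessing and the model so both are fitted in a single call.
lr_model = Pipeline([
    ("preprocessor", preprocessor),
    ("regressor", LinearRegression()),
])

lr_model.fit(X, y)
predictions = lr_model.predict(X)
print(predictions)
```

Because the preprocessing lives inside the pipeline, the scaler and encoder are fitted only on the data passed to fit, which helps keep test data from influencing training.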

10.2. Random Forest Model

• Pipeline: combines the RandomForestRegressor model with preprocessing. A reliable

11. Metrics for Evaluation

• Measures: mean absolute error (MAE), the average magnitude of the prediction errors.
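The abstract names MAE, RMSE and R2 as the evaluation metrics. A small worked sketch using scikit-learn's metric functions is given here; the three price values are invented for illustration and are not results from the report.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Invented example values, not results from the report.
y_true = [100, 200, 300]
y_pred = [110, 190, 330]

# MAE: mean of the absolute errors |10|, |10|, |30|.
mae = mean_absolute_error(y_true, y_pred)

# RMSE: square root of the mean squared error.
rmse = math.sqrt(mean_squared_error(y_true, y_pred))

# R2: 1 minus (residual sum of squares / total sum of squares).
r2 = r2_score(y_true, y_pred)

print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

RMSE penalises the single large error (30) more heavily than MAE does, which is why it comes out higher on the same predictions.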

12. Comparison of Expected and Real Prices

13. Significance of Features

• Helpful: Determining which traits have the greatest predictive power.

14. Linear Regression: Forecasted vs Real Costs

14.1. Results of Linear Regression:

1. A complete workflow for predictive modelling from start to finish.
