Lecture 18 | Evaluation Metrics for Different Models

Our previous topic was Linear Regression. Linear Regression is used to
predict a continuous number from nominal or numeric values. The number
we predict can be any number: positive or negative, high or low. A
graphical representation of linear regression was also shown. In the
graphical representation there is a plot on a 2D plane, with data
points and a line that should have most of the data points close to
it. To measure how well the line fits, we also discussed the Mean
Squared Error, which squares each error so that positive and negative
errors do not cancel out to zero, and the Root Mean Squared Error.
In Colab we trained a model on a data set. An issue arose during its
execution: the resulting Mean Squared Error was very high. Today's
topic starts with finding the reasons for this high Mean Squared
Error.

We will find the reasons for the high Mean Squared Error and then the
actions that minimize the problem. Remember that a small amount of
loss will remain even after we make every effort to reduce it. We can
apply a number of regression techniques to train the model, so how do
we find which model performs better? It is the Mean Squared Error
that helps us select the better model: the model with the lower Mean
Squared Error performs better, and vice versa.

Prepared by: Tahir Mahmood 0321-5111997, What up 0312-8536439

The line in linear regression that we discussed lies in two dimensions.
In three-dimensional Linear Regression the line becomes a flat surface
like a whiteboard, which is termed a plane in mathematics. Linear
regression in more than three dimensions cannot be visualized by the
human brain. Even with n independent variables and one dependent
variable, linear regression is still possible, and we get a
whiteboard-like structure (a hyperplane) extending in n directions.
One thing we can do to reduce the Mean Squared Error is to look at the
ranges of the different attributes given to the model. As we saw while
training the model, the values of some attributes (features, columns)
were very high and the values of others were very low. With such
values, the model tends to focus more on the high values and learns
less from attributes whose values lie between 0 and 1. If the model
had given due importance to these small values too, it could have
performed better.
What is the cure for this problem? When we discussed data
preprocessing we mentioned data transformation and data reduction. We
applied data reduction rigorously to the Titanic data set. Using
correlation, we separated the columns that were suitable for model
training from those that were not. So, dimensionality reduction in
data preprocessing means we drop the unnecessary columns from the
input data set, enabling our model to learn from the useful data.
Now we put data transformation into practice. As discussed earlier,
our data set contains some high values, in the thousands, and some low
values between 0 and 1. Because of this huge difference, our model
does not give due weight to the low values between 0 and 1. So we need
to transform the data in such a way that the model gives equal
importance to all attributes of the data set. When the model pays
equal attention to all attributes, it has more opportunity to learn.
Data transformation is therefore used to put all features of the data
set on an equal footing.
Suppose we have to measure the performance of players in a team. We
have data on the weight of the players (60 kg, 65 kg, 55 kg, 50 kg)
and on their height (163 cm, 165 cm, 173 cm, etc.). The model uses
only the values to train itself and pays no attention to the units. It
will give more weight to the high values and ignore, or pay less
attention to, the low values. This affects the model badly. To rectify
this, we limit the values to a range and confine the model to values
within that range.

Let us discuss a few methods of data transformation.

1. Z-score is a variation of scaling that represents the number of
standard deviations a value lies away from the mean. We use the
z-score to ensure our feature distributions have mean = 0 and
std = 1. It is useful when there are a few outliers, but not so
extreme that we need clipping.
2. Min-max scaling, also known as rescaling or min-max normalization,
is the simplest method and consists of rescaling the range of each
feature to [0, 1] or [−1, 1]. Selecting the target range depends on
the nature of the data.

Now the question arises whether scaling or standardization should be
applied only to the high-value features or to all features in the data
set, and whether the target variable must also be part of that scaling
or standardization.
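
The two transformations above can be sketched in plain Python so the arithmetic is visible. This is only an illustration: the function names z_score and min_max are made up for this example, the weights are the ones from the player example, and the standard deviation divides by n (the population form).

```python
import math

def z_score(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    # Population standard deviation (divide by n)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def min_max(values, low=0.0, high=1.0):
    """Rescale values linearly into the range [low, high]."""
    v_min, v_max = min(values), max(values)
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]

weights_kg = [60, 65, 55, 50]  # player weights from the example above

weights_z = z_score(weights_kg)    # centered on 0, spread of 1
weights_01 = min_max(weights_kg)   # smallest value -> 0, largest -> 1

print([round(z, 3) for z in weights_z])
print([round(m, 3) for m in weights_01])
```

After either transformation the weights and the heights would live on comparable scales, so neither column dominates only because of its units.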
We now discuss different regression algorithms.

Suppose we have to predict the score of a batsman from a data set. We
divide the data set by ranges of values and distribute the parts to
different groups of people to predict. Suppose rows with scores from
1-10 are given to a group of 25 people, rows with scores from 11-50 to
another group, and so on. We define the range of scores that each
group will predict.
The same happens in a Decision Tree Regressor. Each branch of the
tree predicts a different set of values: it covers a range of values
from which the resulting value is predicted.
For now, the only thing to understand is that decision trees can be
used for both regression and classification.
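
The idea of giving each range of values its own prediction can be sketched with the smallest possible regression tree: a single split. This is a simplified illustration, not scikit-learn's implementation; fit_stump, predict_stump and the data are hypothetical.

```python
def fit_stump(xs, ys):
    """Find the single threshold on x that minimizes the squared error
    when each side predicts the mean target of its own rows."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    """Return the mean of the range that x falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean

# Hypothetical data: one input feature (x) and the runs scored (y)
feature = [1, 2, 3, 4, 10, 11, 12, 13]
runs = [5, 7, 6, 8, 40, 42, 45, 41]

stump = fit_stump(feature, runs)
print(stump)
print(predict_stump(stump, 2), predict_stump(stump, 12))
```

A full decision tree repeats this splitting inside each side, so the data ends up divided into many small ranges, each with its own predicted value.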

Random Forest regression is a form of decision tree regression that
uses many trees. It can be explained with an example: suppose we face
a problem in a forest and each tree in the forest is like a friend. We
take a suggestion from each tree, or friend, about the problem. We can
simply say that it is like voting: we make the decision after
considering the opinion of the majority of friends. For regression,
the forest averages the predictions of its trees.
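
The way a forest combines its trees can be shown with a few lines of Python. The individual tree outputs below are invented for illustration; only the combining step is the point.

```python
from collections import Counter
from statistics import mean

# Classification: each tree votes for a label and the majority wins
tree_votes = ["rain", "no rain", "rain", "rain", "no rain"]
majority_label = Counter(tree_votes).most_common(1)[0][0]

# Regression: each tree predicts a number and the forest averages them
tree_predictions = [1_200_000, 1_350_000, 1_280_000, 1_150_000, 1_320_000]
forest_prediction = mean(tree_predictions)

print(majority_label)
print(forest_prediction)
```

Because the trees make different mistakes, their average is usually steadier than the answer of any single tree.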

Gradient Boosting, which we use later in the notebook, is another
extension of the Decision Tree Regressor. In this regressor we take
action on the basis of the opinion given, and if it does not work, we
come back, take the opinion again and avoid the actions that produced
the wrong results.
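
The "come back and correct the mistakes" idea can be sketched as follows. This is a minimal illustration of boosting with squared error, where every round fits a one-split tree to the errors left by the previous rounds. The helper names, the data, the number of rounds and the learning rate are all assumptions chosen for the example.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split; each side predicts its mean residual."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    return best[1:]

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the errors left by the previous rounds."""
    base = sum(ys) / len(ys)            # start from the mean target
    preds = [base] * len(ys)
    history = []                        # training MSE after each round
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, lm, rm = fit_stump(xs, residuals)
        # Move each prediction a fraction of the way toward its error
        preds = [p + learning_rate * (lm if x < threshold else rm)
                 for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return preds, history

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 7, 6, 8, 20, 22, 40, 41]
preds, history = boost(xs, ys)
print(round(history[0], 2), round(history[-1], 2))
```

The training error shrinks round after round because every new small tree concentrates on what the earlier ones got wrong.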

Now we move to the Colab notebook. Our practice session starts with
the question of which features scaling should be applied to. Normally
scaling is applied to all features, including the target feature.
However, in this notebook scaling is applied only to the input
features; scaling all features including the target will be our task
in the assignment.

OBJECTIVE 2: MACHINE LEARNING

Next, I will feed these features into various regression
algorithms to determine the best performance using a simple
framework: Split, Fit, Predict, Score It.

Target Variable Splitting

We will split the full dataset into input and target variables.
The inputs are also called feature variables; the output is the
target variable.

# Split data to be used in the models

# Create matrix of features
x = full_data.drop('Price', axis=1)  # grabs everything else but 'Price'

# Create target variable
y = full_data['Price']  # y is the column we're trying to predict

Before we train the models, it's essential to split our data into
training and testing sets. This ensures that we have a separate
dataset to evaluate the performance of our trained models. The
common practice is to use a certain portion of our data for
training (e.g., 70-80%) and the remaining portion for testing
(e.g., 20-30%)

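from sklearn import preprocessing

# Fit the scaler once to learn each column's mean and standard deviation
pre_process = preprocessing.StandardScaler().fit(x)
# Apply the learned scaling to the feature matrix
x_transform = pre_process.transform(x)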

Now we apply feature scaling to our feature matrix x using the
StandardScaler from scikit-learn. Feature scaling is a common
preprocessing step in machine learning that standardizes the
features so that each has a mean of 0 and a standard deviation of
1. This can be helpful, especially for algorithms that are
sensitive to the scale of the input features.

In the code above, we first create an instance of StandardScaler
and fit it to our data (x) using the fit method. After that, we
apply the transformation to the feature matrix using the
transform method, which gives us x_transform with scaled features.

Remember, using the same scaling parameters for both the training
and testing sets ensures that the features are scaled
consistently, which is crucial for accurate evaluation of model
performance.
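
Reusing the same scaling parameters can be illustrated in plain Python. In this sketch the mean and standard deviation are learned from the training rows only and then reused for the test rows, which is a common variant; the notebook itself fits the scaler on the whole matrix x before splitting. The function names and the numbers are hypothetical.

```python
import math

def fit_scaler(column):
    """Learn the mean and (population) standard deviation of one column."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return mean, std

def apply_scaler(column, mean, std):
    """Scale a column with parameters learned earlier."""
    return [(v - mean) / std for v in column]

# Hypothetical values of one feature, already split into train and test
train_values = [1000, 1500, 2000, 2500]
test_values = [1750, 3000]

mean, std = fit_scaler(train_values)                # learned from training rows
train_scaled = apply_scaler(train_values, mean, std)
test_scaled = apply_scaler(test_values, mean, std)  # same parameters reused

print([round(v, 3) for v in train_scaled])
print([round(v, 3) for v in test_scaled])
```

A test value equal to the training mean maps to 0, and a test value larger than anything seen in training maps beyond the training range, which is exactly what consistent scaling should do.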

# pipe = make_pipeline(StandardScaler(), LogisticRegression())
# pipe.fit(X_train, y_train)

The commented-out code above creates a machine learning pipeline
using scikit-learn's make_pipeline function. This pipeline
combines feature scaling using StandardScaler() with a logistic
regression model. Note that LogisticRegression is a classifier;
for a continuous target such as Price, a regressor such as
LinearRegression would take its place.

1. make_pipeline(StandardScaler(), LogisticRegression()): This
function creates a pipeline that first applies
StandardScaler() for feature scaling and then fits a
LogisticRegression model on the scaled features. The pipeline
ensures that the feature scaling is consistently applied to
both the training and testing data.
2. pipe.fit(X_train, y_train): This line of code fits the created
pipeline to our training data X_train and corresponding
target variable y_train. This means that the feature scaling
and the model are trained together as part of the pipeline.

In summary, fitting the pipeline scales the features in X_train
using StandardScaler and then fits the model to the scaled
features with the corresponding target variable y_train.

After we run the fit method, our pipeline (pipe) is trained and
ready to make predictions on new data or to be evaluated on the
test set (X_test and y_test).

# x_transform holds the scaled features
x_transform.shape
x_transform

Here we check the shape and the contents of the feature matrix
x_transform after applying the StandardScaler transformation.
x_transform should be the scaled version of our original feature
matrix x.

The output shows the shape of x_transform, a 2-dimensional NumPy
array with the same number of rows as the original feature matrix
x and one column per feature.

The contents of x_transform are the scaled values of our original
features, where each column has a mean of 0 and a standard
deviation of 1. The exact values depend on the distribution of
our original features.

Output of the above code

Now our values have been standardized to a mean of 0 and a standard
deviation of 1, so no very high values remain.

y # y represents the Target

y.shape
(5000,)

# Use x and y variables to split the data into train and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    x_transform, y, test_size=0.10, random_state=101)

We use the transformed feature matrix x_transform and the target
variable y to split our data into training and testing sets. The
train_test_split function from scikit-learn is commonly used for
this purpose. This gives us separate datasets for training and
evaluating our machine learning models.

The code above splits the data into training and testing sets,
with 90% of the data used for training and 10% for testing.

After running this code, we have the following datasets:

• x_train: the transformed feature matrix for training our
machine learning models.
• x_test: the transformed feature matrix for evaluating our
trained models.
• y_train: the target variable corresponding to the training
data.
• y_test: the target variable corresponding to the testing
data.

Now we can use x_train and y_train to train our models and then
evaluate their performance on x_test and y_test. The test_size
parameter controls the proportion of data that goes into the
testing set. In this case, it's set to 0.10, meaning 10% of the
data will be used for testing, while the remaining 90% will be
used for training. The random_state parameter is set to 101, which
is an arbitrary seed to ensure reproducibility. We can change it
to any other value or set it to None for a random split.
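
What test_size and the seed do can be sketched with the standard library. This is not how train_test_split is implemented internally, and the rows it picks will differ from scikit-learn's; the function simple_train_test_split is a hypothetical stand-in that only shows the idea of a seeded shuffle followed by a cut.

```python
import random

def simple_train_test_split(rows, test_size=0.10, seed=101):
    """Shuffle row indices with a fixed seed and cut off the test share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(5000))            # stand-in for the 5000 data rows
train, test = simple_train_test_split(rows)
print(len(train), len(test))

# The same seed gives the same split every time
train_again, test_again = simple_train_test_split(rows)
print(test == test_again)
```

With 5000 rows and a 10% test share, 4500 rows are left for training, which agrees with the y.shape and y_train.shape values shown in the notebook.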

LINEAR REGRESSION

Model Training

# Fit
# Import model
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Create instance of model
lin_reg = LinearRegression()
# Pass training data into model
lin_reg.fit(x_train, y_train)
# pipe = make_pipeline(StandardScaler(), LinearRegression())
# pipe.fit(x_train, y_train)

LinearRegression
LinearRegression()

We are fitting a linear regression model to our training data
using scikit-learn.

The above code creates an instance of the LinearRegression model
and then fits it to the scaled training data (x_train) along with
the corresponding target variable (y_train). This process trains
the model to learn the relationship between the features and the
target variable.

We can now use the trained lin_reg model to make predictions on
new data or evaluate its performance on the test set.

The commented-out code shows the same model wrapped in a pipeline
with make_pipeline, which would combine the feature scaling step
(StandardScaler) and the linear regression model. Since x_train
has already been scaled and lin_reg has already been trained on
it, there is no need to fit the pipeline on the same data; we use
the lin_reg model directly for further analysis.
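
What "learning the relationship" means can be shown for the simplest case of one feature, where the best-fitting line has a closed-form solution. This is a sketch of ordinary least squares for a single input, not the code scikit-learn runs; fit_line and the data are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]               # generated from y = 1 + 2x

intercept, slope = fit_line(xs, ys)
print(intercept, slope)

# Predict for a new input with the learned line
print(intercept + slope * 6)
```

With several features the same idea applies, except that the model learns one coefficient per feature plus an intercept.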

Prediction
# Predict
y_pred = lin_reg.predict(x_test)
print(y_pred.shape)
print(y_pred)

We've used the trained linear regression model (lin_reg) to make
predictions on the test data (x_test). The predicted values are
stored in the variable y_pred.

The first print statement shows the shape of the y_pred array, a
1-dimensional NumPy array containing the predicted value for each
sample in the test data. The number of elements in y_pred equals
the number of samples in x_test.

The second print statement displays the predicted values
themselves.

Keep in mind that the y_pred array contains the predictions made
by the linear regression model for the corresponding samples in
the test set. We can use these predicted values to evaluate the
model's performance and compare them to the true target values
(y_test).

import matplotlib.pyplot as plt
import seaborn as sns

# Each point pairs an actual value (x) with its prediction (y)
sns.scatterplot(x=y_test, y=y_pred, color='blue', label='Actual vs. predicted')
# Points on this line have predicted value == actual value
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)],
         color='red', label='Ideal Line')
plt.legend()
plt.show()

We use the Seaborn library to create a scatter plot comparing the
actual target values (y_test) with the predicted values (y_pred)
from our linear regression model. Additionally, we add a red line
representing the ideal line, where the predicted values perfectly
match the actual values.

In the scatter plot, each data point represents a sample from the
test set. The x-axis represents the actual target values
(y_test), and the y-axis represents the predicted values (y_pred)
from our linear regression model. Each blue point therefore pairs
an actual value with its prediction, while the red line
represents the ideal line.

If the points align closely along the red line, it suggests that
the model's predictions are close to the actual values,
indicating a good fit. However, if the points are scattered away
from the red line, it indicates that the model's predictions
deviate from the actual values.

A visualization like this gives us a quick visual understanding
of how well our linear regression model is performing. For a more
quantitative evaluation, we can use metrics such as Mean Squared
Error (MSE), Mean Absolute Error (MAE), or R-squared (coefficient
of determination). These metrics provide insights into how well
the model is capturing the variation in the data.
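
Two of the metrics named above, MAE and R-squared, can be written out in plain Python to show what they measure. The functions mae and r_squared and the small sample are hypothetical; in the notebook the corresponding scikit-learn functions would normally be used.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, ignoring sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """R-squared: share of the variation in the target that is explained."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 380]

sample_mae = mae(actual, predicted)
sample_r2 = r_squared(actual, predicted)
print(sample_mae, round(sample_r2, 4))
```

An R-squared close to 1 means the predictions follow the actual values closely, while a value near 0 means the model does no better than always predicting the mean.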

import numpy as np

# Combine actual and predicted values side by side
results = np.column_stack((y_test, y_pred))

# Print the results as a two-column table
print("Actual Values | Predicted Values")
print("-----------------------------")
for actual, predicted in results:
    print(f"{actual:14.2f} | {predicted:12.2f}")

We've combined the actual target values (y_test) and the
predicted values (y_pred) side by side using NumPy's
column_stack function. The code above prints the results in a
formatted table, showing the actual values in one column and the
corresponding predicted values in another column.

The output is a table with two columns, displaying the actual
target values in the left column and the corresponding predicted
values in the right column. The numbers are formatted with two
decimal places for better readability.

This kind of side-by-side comparison allows us to visually assess
how well our model's predictions align with the actual target
values. If the predicted values are close to the actual values,
we should see similar numbers in both columns. However, if there
are substantial differences, it indicates that the model may not
be performing well on certain samples or might need further
improvements.

Residual Analysis
Residual analysis in linear regression is a way to check how well
the model fits the data. It involves looking at the differences
(residuals) between the actual data points and the predictions
from the model.
In a good model, the residuals should be randomly scattered
around zero on a plot. If there are patterns or a fan-like shape,
it suggests the model may not be the best fit. Outliers, points
far from the others, can also affect the model.

Residual analysis helps ensure the model's accuracy and whether
it meets the assumptions of linear regression. If issues are
found, adjustments to the model may be needed to improve its
performance.

actual = np.asarray(y_test)  # actual target values, aligned with y_pred
residual = actual - y_pred.reshape(-1)  # actual minus predicted
print(residual)

We've computed the residuals by subtracting the predicted values (y_pred)
from the actual target values (actual) and stored the result in the residual
array. Residuals represent the differences between the actual values and
the corresponding predicted values.

In the code, y_pred.reshape(-1) is used to ensure that the predicted values
are in the same shape as actual, so they can be directly subtracted. The
result is an array of residuals, where each element corresponds to the
difference between the actual value and the predicted value for a specific
sample.

By examining the values in the residual array, we can gain insights into
how well the model is performing. Ideally, the residuals should be close
to zero, indicating that the model's predictions are accurate. Positive
residuals indicate that the model underestimates the target variable,
while negative residuals suggest overestimation.

Analysing the distribution of residuals and looking for patterns can help
us identify areas where the model might be performing poorly and guide us
in making further improvements to our model or data preprocessing.
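
The sign convention for residuals can be checked on a tiny made-up sample. The numbers below are hypothetical; the only rule used is residual = actual - predicted, as in the notebook code.

```python
actual = [300, 250, 400]
predicted = [280, 270, 400]

# Residual = actual value - predicted value
residuals = [a - p for a, p in zip(actual, predicted)]

verdicts = []
for r in residuals:
    if r > 0:
        verdicts.append("underestimated")   # prediction was too low
    elif r < 0:
        verdicts.append("overestimated")    # prediction was too high
    else:
        verdicts.append("exact")

for a, p, r, v in zip(actual, predicted, residuals, verdicts):
    print(a, p, r, v)
```

Reading the residuals this way makes it easy to see whether the model tends to err in one direction.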

If we want additional information about the overall performance of
the model, we may consider computing metrics such as Mean Squared Error
(MSE) or Mean Absolute Error (MAE) using the residuals. These metrics
provide a quantitative measure of how well the model is predicting the
target variable across the entire dataset.

# Distribution plot for the residuals (difference between actual and
# predicted values); histplot replaces the deprecated distplot
sns.histplot(residual, kde=True)

We use Seaborn's histplot function to create a distribution plot
of the residuals. The residuals represent the differences between
the actual target values and the corresponding predicted values
from our linear regression model.

The histplot function creates a histogram of the residuals, and
by setting kde=True, it also overlays a kernel density estimate
(KDE) to visualize the shape of the distribution.

In the plot, the x-axis represents the range of residual values,
and the y-axis shows how often each residual value occurs. The
distribution plot provides insights into the distribution of
errors made by our linear regression model. Ideally, we would
want the residuals to be centered around zero with a symmetric
distribution, indicating that the model's predictions are
unbiased and accurate. Deviations from this pattern might
indicate areas where the model is not performing well.

Common patterns to look for in the distribution plot include:

1. Symmetry: A symmetric distribution around zero suggests the
model is making unbiased predictions.
2. Skewness: If the distribution is skewed, it indicates that
the model is systematically overestimating or
underestimating the target variable.
3. Outliers: Unusually large or small residuals (outliers) may
indicate specific data points that the model is struggling
to predict accurately.

By examining the distribution of residuals, we can gain insights
into the overall performance of our model and identify potential
areas of improvement.
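
Symmetry and skewness can also be checked with numbers rather than by eye. The sketch below uses a simple population skewness (the average cubed z-score); the function name and the two residual samples are invented for illustration.

```python
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the average cubed z-score."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

symmetric_residuals = [-2, -1, 0, 1, 2]
right_skewed_residuals = [-1, -1, 0, 0, 7]   # one large positive residual

print(round(skewness(symmetric_residuals), 3))
print(round(skewness(right_skewed_residuals), 3))
```

A value near zero matches a symmetric plot, while a clearly positive or negative value points to a long tail on one side, often caused by a few outliers.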

The plot shows that our model's residuals are not skewed, as the
distribution is centered. But note the values on the X and Y axes:
they are expressed in powers of ten with exponent 6. This means the
difference between the actual and predicted values is still high,
although it has been reduced to some extent, which is good.

Model Evaluation

Linear Regression

# Score It
import numpy as np
from sklearn.metrics import mean_squared_error

print('Linear Regression Model')
print('--' * 30)

# Results
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

# Print evaluation metrics
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)

We've evaluated the performance of our linear regression model
using the Mean Squared Error (MSE) and Root Mean Squared Error
(RMSE). These metrics are common measures used to assess how well
a regression model's predictions match the actual target values.

The mean_squared_error function from scikit-learn calculates the
MSE between the actual target values (y_test) and the predicted
values (y_pred). MSE measures the average squared difference
between predicted and actual values, penalizing larger errors
more heavily.

The RMSE is simply the square root of the MSE. It represents the
typical magnitude of the errors in the same units as the target
variable. RMSE is a commonly used metric for regression tasks
because it is more interpretable and easier to relate to the
scale of the original target variable.

By displaying both the MSE and RMSE, we get a sense of how well
the model is performing on the test data. Smaller values for
these metrics indicate better performance, as they suggest that
the model's predictions are closer to the actual values.

Now we have quantified the performance of our linear regression
model using the evaluation metrics, which can help us compare
this model to other models or assess its suitability for our
specific task.

Linear Regression Model

------------------------------------------------------------
Mean Squared Error: 9839952411.801708
Root Mean Squared Error: 99196.53427313732
# Linear Regression Model
# ------------------------------------------------------------
# Mean Squared Error: 10100187858.864885
# Root Mean Squared Error: 100499.69083964829
# 10170939558
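
The relationship between MSE and RMSE can be verified by hand. The sketch below computes both on a small made-up sample and then takes the square root of the MSE printed above for the linear regression model; the helper name mse_of and the sample are assumptions for illustration.

```python
import math

def mse_of(actual, predicted):
    """Mean Squared Error: average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 380]

sample_mse = mse_of(actual, predicted)
sample_rmse = math.sqrt(sample_mse)
print(sample_mse, round(sample_rmse, 2))

# Square root of the MSE reported in the output above
reported_rmse = math.sqrt(9839952411.801708)
print(round(reported_rmse, 2))
```

Squaring makes the MSE look enormous when the target is a large number such as a price, which is why the RMSE is usually the easier figure to interpret.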

Based on the output above, the evaluation metrics for our linear
regression model are as follows:

• Mean Squared Error (MSE): 9,839,952,411.80
• Root Mean Squared Error (RMSE): 99,196.53

The commented-out lines record another run, with an MSE of
10,100,187,858.86 and an RMSE of 100,499.69.

The MSE is a measure of the average squared difference between
the actual target values and the predicted values. In this case,
it suggests that, on average, the squared difference between the
predicted and actual values is quite large.

The RMSE, which is the square root of the MSE, gives an estimate
of the typical error in the same units as the target variable. In
this case, it indicates that the model's predictions have an
error of approximately 99,196.53 in the same units as the target
variable.

Please note that these error values depend on the scale and units
of the target variable. If our target variable is measured in
larger units, it is not unusual to have larger values for the
error metrics.

The last value "10170939558" is a standalone number with no
context provided.

# Difference between the two MSE values recorded above
s = 10100187858 - 9839952411
print(s)

260235447
y_train.shape
(4500,)

Decision Tree
from sklearn.tree import DecisionTreeRegressor

# Create and train the decision tree (the variable names are reused
# in the later sections)
rf_regressor = DecisionTreeRegressor()
rf_regressor.fit(x_train, y_train)

# Predict the prices for the test set
y_pred_rf = rf_regressor.predict(x_test)

# Decision tree regression MSE on the test set
DTr = mean_squared_error(y_test, y_pred_rf)
print('Decision Tree Regression : ', DTr)

We've created a Decision Tree Regressor and fitted it to our
training data (x_train and y_train). We then used it to predict
the target values for the test set (x_test) and stored the
predictions in y_pred_rf.

Finally, we calculated the Mean Squared Error (MSE) between the
predicted values (y_pred_rf) and the actual target values (y_test)
using scikit-learn's mean_squared_error function, and assigned
the result to DTr.

The DecisionTreeRegressor is a regression model that builds a
decision tree to predict the target variable. The printed output
shows the Mean Squared Error for Decision Tree Regression on the
test set.

Keep in mind that Decision Trees have some limitations, such as
the tendency to overfit, and may not always provide the best
performance for regression tasks. Random Forests, on the other
hand, are an extension of Decision Trees that can often offer
better generalization and predictive performance. If we want to
try Random Forest Regression, we can use RandomForestRegressor from
scikit-learn in a similar manner.

Decision Tree Regression : 31316806651.827576

Random Forest

from sklearn.ensemble import RandomForestRegressor

# Create and train the random forest
rf_regressor = RandomForestRegressor()
rf_regressor.fit(x_train, y_train)

# Predict the prices for the test set
y_pred_rf = rf_regressor.predict(x_test)

# Random forest regression MSE on the test set
RFr = mean_squared_error(y_test, y_pred_rf)
print('Random Forest Regression : ', RFr)

Now we've created a Random Forest Regressor, fitted it to our
training data (x_train and y_train), and used it to predict the
target values for the test set (x_test). We stored the
predictions in y_pred_rf.

Next, we calculated the Mean Squared Error (MSE) between the
predicted values (y_pred_rf) and the actual target values (y_test)
using scikit-learn's mean_squared_error function, and assigned
the result to RFr.

Random Forest Regression is an ensemble learning method that
combines multiple decision trees to improve predictive
performance and reduce overfitting. The RandomForestRegressor class
in scikit-learn implements the Random Forest algorithm for
regression tasks.

The printed output shows the Mean Squared Error for Random
Forest Regression on the test set.

Comparing the MSE values for Decision Tree Regression and Random
Forest Regression gives us insight into which model performs
better on this specific task. Generally, Random Forest Regression
tends to perform better than a single Decision Tree, especially
when the dataset is complex and prone to overfitting. In this
run the Random Forest MSE is less than half the Decision Tree
MSE.

Random Forest Regression : 14315329749.65445

Gradient Boosting Regression

from sklearn.ensemble import GradientBoostingRegressor

# Create and train the gradient boosting model
rf_regressor = GradientBoostingRegressor()
rf_regressor.fit(x_train, y_train)

# Predict the prices for the test set
y_pred_rf = rf_regressor.predict(x_test)

# Gradient boosting regression MSE on the test set
GBr = mean_squared_error(y_test, y_pred_rf)
print('Gradient Boosting Regression : ', GBr)

Great! Now we've created a Gradient Boosting Regressor, fitted it

to our training data (x_train and y_train), and used it to predict
the target values for the test set (x_test). We stored the
predictions in y_pred_rf.

Next, we calculated the Mean Squared Error (MSE) between the
predicted values (y_pred_rf) and the actual target values (y_test)
using scikit-learn's mean_squared_error function, and we assigned
the result to GBr.

Here's the code breakdown:

Gradient Boosting is another ensemble learning method that
combines multiple weak learners (typically decision trees) to
create a strong predictive model. The GradientBoostingRegressor
class in scikit-learn implements the Gradient Boosting algorithm
for regression tasks.

The printed output will show the Mean Squared Error for Gradient
Boosting Regression with the test set.

Comparing the MSE values for Decision Tree Regression, Random
Forest Regression, and Gradient Boosting Regression can give us
insights into which model performs better on this specific task.
Gradient Boosting, like Random Forests, tends to perform well on
various tasks, making it a powerful choice for regression
problems.

Gradient Boosting Regression : 12029643835.717766
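
The idea of combining weak learners can be sketched in a few lines of plain Python. The version below is a simplified illustration under stated assumptions, not scikit-learn's implementation: it starts from the mean of the targets and, in each round, fits a one-split tree (a stump) to the current residuals, then adds a fraction of that stump's prediction. For squared error, the residuals are the negative gradient of the loss. The helper names, the toy data, the learning rate and the number of rounds are all made up for this example.

```python
def fit_stump(x, residuals):
    """Find the single threshold split on x that minimises squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def mse(y_true, y_pred):
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, for illustration only
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
y_demo = [5, 7, 9, 11, 20, 22, 24, 26]

learning_rate = 0.5   # assumed value for the sketch
n_rounds = 20         # assumed value for the sketch

# Round 0: predict the mean of the targets for every row
predictions = [sum(y_demo) / len(y_demo)] * len(y_demo)
mse_history = [mse(y_demo, predictions)]
stumps = []

for _ in range(n_rounds):
    # Residuals are what the current ensemble still gets wrong
    residuals = [a - p for a, p in zip(y_demo, predictions)]
    stump = fit_stump(x_demo, residuals)
    stumps.append(stump)
    # Add a damped correction from the new weak learner
    predictions = [p + learning_rate * predict_stump(stump, xi)
                   for xi, p in zip(x_demo, predictions)]
    mse_history.append(mse(y_demo, predictions))

print("Training MSE before boosting:", mse_history[0])
print("Training MSE after boosting: ", mse_history[-1])
```

Each stump on its own is a poor model, but because every new stump is fitted to the errors left by the previous ones, the training MSE shrinks round by round. Note that this is training error on toy data; the lecture's comparison uses the test set.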

# Sample model scores (replace these with our actual model scores)
model_scores = {
    "Linear Regression": 9839952411.801708,
    "Decision Tree": 29698988724.82603,
    "Random Forest": 14315329749.65445,
    "Gradient Boosting": 12029643835.717766
}

# Sort the model scores in ascending order (lower values first)
sorted_scores = sorted(model_scores.items(), key=lambda x: x[1])

# Display the ranking of the models
print("Model Rankings (lower values are better):")
for rank, (model_name, score) in enumerate(sorted_scores, start=1):
    print(f"{rank}. {model_name}: {score}")

The code above stores the scores of the different regression
models in a dictionary, sorts them in ascending order based on
their values (lower values first), and then displays the rankings.


The code first sorts the model scores in ascending order based on
their values, and then it displays the ranking of the models from
best to worst. In this ranking, models with lower scores (e.g.,
MSE values) are considered better performers, as they indicate
closer predictions to the actual values.

Model Rankings (lower values are better):

1. Linear Regression: 9839952411.801708
2. Gradient Boosting: 12029643835.717766
3. Random Forest: 14315329749.65445
4. Decision Tree: 29698988724.82603
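
Since MSE is measured in squared units, the raw scores are hard to interpret. The sketch below takes the square root of each score to get the Root Mean Squared Error (RMSE), which is in the same units as the target. It assumes the target is a sale price, as the code comment in the Gradient Boosting section suggests; the dictionary simply repeats the MSE values listed above.

```python
import math

# MSE values copied from the ranking above
model_mse = {
    "Linear Regression": 9839952411.801708,
    "Decision Tree": 29698988724.82603,
    "Random Forest": 14315329749.65445,
    "Gradient Boosting": 12029643835.717766,
}

# RMSE is the square root of MSE, so it is in the units of the target
model_rmse = {name: math.sqrt(mse) for name, mse in model_mse.items()}

for name, rmse in sorted(model_rmse.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {rmse:,.0f}")
```

Taking the square root does not change the order of the models, so the ranking stays the same. It only makes the size of a typical error easier to read: roughly 99,000 for Linear Regression and about 172,000 for the Decision Tree.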
