Write a snippet to download data from different repositories
External source
import pandas as pd
url = "https://forstoringfiles.000webhostapp.com/vault/uploads/Iris.csv"
housing = pd.read_csv(url)
housing.head()
github
import pandas as pd
# Load the dataset
url = "https://raw.githubusercontent.com/ageron/handson-
ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)
housing.head()
Write a snippet to load and read the data
import pandas as pd
url = "https://raw.githubusercontent.com/ageron/handson-
ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)
housing.head()
Code for custom transformers
Although Scikit-Learn provides many useful transformers, you will need to write your
own for tasks such as custom cleanup operations or combining specific attributes. You
will want your transformer to work seamlessly with Scikit-Learn functionalities (such
as pipelines), and since Scikit-Learn relies on duck typing (not inheritance), all you
need is to create a class and implement three methods: fit() (returning self), transform(),
and fit_transform(). You can get the last one for free by simply adding TransformerMixin
as a base class. Also, if you add BaseEstimator as a base class (and avoid *args and
**kwargs in your constructor) you will get two extra methods (get_params() and
set_params()) that will be useful for automatic hyperparameter tuning. For example,
here is a small transformer class that adds the combined attributes we discussed
earlier:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):  # no *args or **kwargs
        self.add_bedrooms_per_room = add_bedrooms_per_room
    def fit(self, X, y=None):
        return self  # nothing else to do
    def transform(self, X, y=None):
        rooms_per_household = X[:, rooms_ix] / X[:, households_ix]
        population_per_household = X[:, population_ix] / X[:, households_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
In this example the transformer has one hyperparameter, add_bedrooms_per_room, set
to True by default (it is often helpful to provide sensible defaults). This hyperparameter
will allow you to easily find out whether adding this attribute helps the Machine Learning
algorithms or not. More generally, you can add a hyperparameter to gate any data
preparation step that you are not 100% sure about. The more you automate these data
preparation steps, the more combinations you can automatically try out, making it
much more likely that you will find a great combination (and saving you a lot of time).
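As a quick illustration of how the flag gates the extra attribute, here is a minimal sketch, assuming the CombinedAttributesAdder class above and a small synthetic array standing in for housing.values (so that column indices 3 to 6 exist):

import numpy as np

# Hypothetical stand-in for housing.values: 5 rows, 9 columns
X = np.random.rand(5, 9) + 1.0  # +1 avoids division by zero in the ratios

with_ratio = CombinedAttributesAdder(add_bedrooms_per_room=True).transform(X)
without_ratio = CombinedAttributesAdder(add_bedrooms_per_room=False).transform(X)

print(with_ratio.shape)     # (5, 12): two ratio columns plus bedrooms_per_room
print(without_ratio.shape)  # (5, 11): only the two always-on ratio columns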
Code for transformer pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

housing_num_tr = num_pipeline.fit_transform(housing_num)
The Pipeline constructor in Scikit-Learn takes a list of name/estimator pairs to define a
sequence of steps, where all but the last estimator must be transformers with a
fit_transform() method. When the pipeline's fit() method is called, it sequentially applies
fit_transform() on all transformers and then calls fit() on the final estimator. The pipeline
inherits the methods of the final estimator. To handle both categorical and numerical
columns within a single transformer, Scikit-Learn's ColumnTransformer can be used.
Introduced in version 0.20, ColumnTransformer works well with Pandas DataFrames to
apply appropriate transformations to each column of the dataset.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])

housing_prepared = full_pipeline.fit_transform(housing)
Here's how to use the ColumnTransformer: first, import the ColumnTransformer class.
Then, get lists of the numerical and categorical column names. Construct a
ColumnTransformer with a list of tuples, where each tuple contains a name, a
transformer, and a list of column names (or indices) that the transformer applies to. In
this example, numerical columns use the pre-defined num_pipeline, and categorical
columns use a OneHotEncoder. Finally, apply the ColumnTransformer to the housing
data; this applies each transformer to the appropriate columns and concatenates the
outputs along the second axis (the transformers must therefore return the same number
of rows).
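To verify what the combined pipeline produced, you can inspect the output shape and the fitted sub-transformers. A minimal sketch, assuming the full_pipeline fitted above (named_transformers_ is ColumnTransformer's standard attribute for accessing fitted transformers by name):

# One row per input row; the column count grows by the added ratio
# attributes plus one one-hot column per ocean_proximity category
print(housing_prepared.shape)

# The fitted OneHotEncoder records the categories it found
cat_encoder = full_pipeline.named_transformers_["cat"]
print(cat_encoder.categories_)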
Train and test data code
import pandas as pd
from sklearn.model_selection import train_test_split
url = "https://raw.githubusercontent.com/ageron/handson-
ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)
train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)
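The purely random split above works well for large datasets, but on smaller ones it risks sampling bias. A common refinement, sketched below assuming the housing DataFrame loaded above, is to stratify the split on an income category so both sets stay representative of the income distribution:

import numpy as np

# Bucket median_income into 5 income categories
housing["income_cat"] = pd.cut(housing["median_income"],
                               bins=[0., 1.5, 3.0, 4.5, 6., np.inf],
                               labels=[1, 2, 3, 4, 5])

# Stratified split: each category keeps the same proportion in both sets
strat_train_set, strat_test_set = train_test_split(
    housing, test_size=0.2, stratify=housing["income_cat"], random_state=42)

# Drop the helper column once the split is done
for set_ in (strat_train_set, strat_test_set):
    set_.drop("income_cat", axis=1, inplace=True)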
Explain performance measures: explain RMSE and MSE
When building and evaluating machine learning models, especially regression models,
performance measurement is critical. It helps us understand how well the model
predicts continuous target variables (e.g., housing prices, sales figures).
Here are two popular performance measures:
Mean Square Error (MSE):
Mean Square Error (MSE) is a common measure used to evaluate the accuracy of a model. It
measures the average of the squares of the errors, which are the differences between the
observed and predicted values. The formula for MSE is:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Where:
n is the number of observations.
y_i is the actual value of the i-th observation.
\hat{y}_i is the predicted value of the i-th observation.
∑ denotes the summation over all observations from i=1 to n.

Root Mean Square Error (RMSE):
RMSE is simply the square root of the MSE: RMSE = \sqrt{MSE}. Taking the square root
expresses the error in the same units as the target variable (e.g., dollars for housing
prices), which makes it easier to interpret. Like MSE, it penalizes large errors more
heavily than small ones, and it is the typical performance measure for regression tasks.
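A short sketch of computing both measures with Scikit-Learn, using hypothetical actual and predicted values (mean_squared_error is the standard metric function):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical actual values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

mse = mean_squared_error(y_true, y_pred)  # average of squared errors
rmse = np.sqrt(mse)                       # back in the target's units
print(mse, rmse)  # 0.375 0.612...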
Code snippet for building a model for linear regression, decision tree, and random forest
Linear Regression
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)
Decision tree
from sklearn.tree import DecisionTreeRegressor
tree_reg = DecisionTreeRegressor()
tree_reg.fit(housing_prepared, housing_labels)
Random forest
from sklearn.ensemble import RandomForestRegressor
forest_reg = RandomForestRegressor()
forest_reg.fit(housing_prepared, housing_labels)
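Tying these models back to the performance measures above, here is a sketch of measuring each fitted model's RMSE on the training set (assuming the housing_prepared and housing_labels objects from the pipeline section):

import numpy as np
from sklearn.metrics import mean_squared_error

for name, model in [("linear", lin_reg), ("tree", tree_reg), ("forest", forest_reg)]:
    predictions = model.predict(housing_prepared)
    rmse = np.sqrt(mean_squared_error(housing_labels, predictions))
    # Note: this is training error; use cross-validation for a fair estimate
    print(name, rmse)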
Explain fine-tuning of the model and give a code snippet to find parameters (grid search)
Fine-tuning a Model
Fine-tuning involves adjusting a model's hyperparameters to optimize its performance for a
specific task. These hyperparameters are settings that control the model's behavior but aren't
directly learned from the data. Examples include the number of trees in a Random Forest or
the learning rate in a neural network. By tweaking these hyperparameters, you can
significantly improve a model's ability to fit your data and make accurate predictions.
Finding Best Parameters with Grid Search
Manually trying out different hyperparameter combinations can be tedious and time-
consuming. Grid search automates this process by systematically evaluating a predefined set
of hyperparameter values. Here's how it works:
1. Define the Hyperparameter Grid: You specify a dictionary (param_grid) where
each key represents a hyperparameter and the corresponding value is a list of values to
try. In the example, the grid explores different combinations of n_estimators
(number of trees) and max_features (number of features considered) for the Random
Forest model.
2. Create the Model: You define the machine learning model you want to fine-tune
(e.g., RandomForestRegressor).
3. Perform Grid Search: Scikit-Learn's GridSearchCV class is used to perform the grid
search. You provide the model, the hyperparameter grid (param_grid), the number of
folds for cross-validation (cv), a scoring metric (scoring), and optionally, a flag to
return training scores (return_train_score).
4. Fit the Grid Search: Call the fit method of grid_search on your training data
(housing_prepared) and target labels (housing_labels). This trains the model with
all defined hyperparameter combinations using cross-validation and evaluates their
performance based on the scoring metric.
5. Access Results: After fitting, you can access the best hyperparameter combination
using grid_search.best_params_. This provides a dictionary containing the
hyperparameter names and their corresponding best values identified by the grid
search.
6. Retrieve Best Model: The grid_search.best_estimator_ attribute stores the
model instance trained with the best hyperparameters found during the search.
Code Snippet (Python):
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Define the hyperparameter grid (refer to the explanation above)
param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

# Create a RandomForestRegressor model (refer to the explanation above)
forest_reg = RandomForestRegressor()

# Create a GridSearchCV object (refer to the explanation above)
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error',
                           return_train_score=True)

# Fit the grid search to the training data
grid_search.fit(housing_prepared, housing_labels)

# Access the best hyperparameters (refer to the explanation above)
print(grid_search.best_params_)

# Access the best model (refer to the explanation above)
print(grid_search.best_estimator_)
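Beyond the best combination, GridSearchCV also records every combination it tried. A sketch of listing each one's cross-validated RMSE (cv_results_ is the standard results attribute; the scores are negated MSEs because of the neg_mean_squared_error scoring):

import numpy as np

cvres = grid_search.cv_results_
for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    # mean_test_score is a negative MSE, so negate it and take the square root
    print(np.sqrt(-mean_score), params)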
Data visualization and gaining insights code snippet
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import scatter_matrix
# Load the dataset
housing = pd.read_csv('housing.csv')
# Visualizing Geographical Data
# Note: pandas' plot() creates its own figure, so pass figsize directly
# instead of calling plt.figure() first (which would leave a blank figure)
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population", figsize=(10, 7),
             c="median_house_value", cmap="jet", colorbar=True)
plt.legend()
plt.show()
# Calculate Correlation Matrix (numeric columns only; ocean_proximity is text)
corr_matrix = housing.select_dtypes(include="number").corr()
# Display Correlation Matrix as a Heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()
# Visualize Correlations with Scatter Matrix
attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
plt.show()
# Zoom in on the most promising correlation: median_income vs median_house_value
housing.plot(kind="scatter", x="median_income", y="median_house_value",
             alpha=0.1, figsize=(10, 7))
plt.title('Correlation between Median Income and Median House Value')
plt.show()