Train-Test Split

The document explains the concept of train-test split in machine learning, where data is divided into a training set for model training and a testing set for model evaluation. It emphasizes the importance of representative sampling and reproducibility through the random_state parameter. An example using the iris dataset illustrates how to implement the train-test split using Python's sklearn library.

Train-test split in ML

Ashima Tyagi
Assistant Professor
School of Computer Science & Engineering

Outline

• Train-test split
• Working example

Prepared by: Ashima Tyagi (Asst. Prof. SCSE)

Two-Way Splitting: Train-Test Split
• A train-test split divides your data into a training set and a testing set.
• The training set is used to train the model, and the testing set is used to test it.
• This lets you fit your models on the training set and then measure their accuracy on the unseen testing set.
• A common choice is 80% of the rows for training and 20% for testing. When the rows are assigned at random, both sets tend to be representative of the entire dataset, which gives you a reasonable way to measure the accuracy of your models.


Here's how the train-test split works:

1. Splitting the Data: The dataset is divided into two subsets: the training set and the test
set. The training set is used to train the model, while the test set is used to evaluate its
performance.

2. Training the Model: The model is trained on the training set using a machine learning
algorithm. The model learns patterns and relationships in the data to make predictions.

3. Evaluating the Model: Once the model is trained, it is evaluated on the test set. This
provides an estimate of how well the model will perform on new, unseen data.
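
The three steps above can be sketched in a few lines. This is only an illustration under some assumptions: the iris dataset, the k-nearest-neighbors classifier, and accuracy as the metric are choices made here for the sketch, not something the slides prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load features and labels (dataset chosen for illustration)
X, y = load_iris(return_X_y=True)

# 1. Splitting the data: 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Training the model: the classifier only ever sees the training set
#    (KNeighborsClassifier is an assumed example model)
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 3. Evaluating the model: predict on the held-out test set and score it
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because the test rows were never used in `fit`, the printed accuracy is an estimate of performance on unseen data rather than a measure of how well the model memorized its training rows.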


Syntax of Train Test Split

train_test_split is part of scikit-learn, so you must import it before you can use it:
from sklearn.model_selection import train_test_split

After importing the function as above, call it as train_test_split().
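
As a rough sketch of what a call looks like, the snippet below passes a small made-up feature array and label array (both hypothetical, built with NumPy just for this illustration) and unpacks the result. No test_size is given here, so scikit-learn falls back to its default proportion.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for illustration: row i is [2*i, 2*i + 1], label is i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# One call splits every array passed in, using the same row assignment
result = train_test_split(X, y)
X_train, X_test, y_train, y_test = result

print(len(result))                   # two input arrays -> four output arrays
print(X_train.shape, X_test.shape)

# Features and labels stay paired: each training row still matches its label
rows_still_aligned = np.array_equal(X_train[:, 0] // 2, y_train)
print(rows_still_aligned)
```

The function returns a train part and a test part for each array it receives, in the order the arrays were passed.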


If test_size is 0.2, then:

The data set is split into two pieces: a training set and a testing set.
About 80 percent of the rows (you can vary this) are drawn by random
sampling without replacement and put into your training set. The
remaining 20 percent are put into your test set. Note that the colors
in “Features” and “Target” indicate where their data will go
(“X_train,” “X_test,” “y_train,” “y_test”) for a particular
train-test split.
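
To make the 80/20 arithmetic concrete, the sketch below splits an array of row numbers instead of real data, so the sizes and the "without replacement" property can be checked directly. The 150-row count and the variable names are assumptions chosen for the illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 150                    # assumed dataset size for this illustration
row_ids = np.arange(n_rows)     # stand-in for the rows of a real dataset

# Passing a single array returns just its train part and its test part
train_ids, test_ids = train_test_split(row_ids, test_size=0.2, random_state=0)

print(len(train_ids), len(test_ids))   # 120 30

# Sampling without replacement: no row appears in both sets,
# and together the two sets cover every row exactly once
overlap = set(train_ids) & set(test_ids)
print(len(overlap))                    # 0
```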


Random State: random_state is a parameter that seeds the pseudo-random
number generator, which lets you reproduce the same train-test split
each time you run the code.

The image above shows that if you select a different value for
random_state, different rows go to “X_train,” “X_test,”
“y_train” and “y_test”.
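
The effect of random_state can be checked with a small experiment. The sketch below uses made-up arrays and arbitrarily chosen seed values (42 and 7) to compare splits produced with the same seed and with a different one.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for illustration
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# Two splits with the same seed, one with a different seed
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)
split_c = train_test_split(X, y, test_size=0.2, random_state=7)

# Compare X_train, X_test, y_train, y_test pairwise
same_seed_matches = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
different_seed_matches = all(np.array_equal(a, c) for a, c in zip(split_a, split_c))

print(same_seed_matches)        # True: the same seed reproduces the split
print(different_seed_matches)   # a different seed shuffles the rows differently
```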

Which random number to choose?

In machine learning, the choice of value for the random_state
parameter is arbitrary. You can use any non-negative integer, for
example random_state=0, random_state=42, or any other integer value.
The specific value does not matter; the important thing is to use the
same value consistently if you want your splits, and therefore your
results, to be reproducible.


Example
Let's consider a dataset of iris flowers with features such as sepal length, sepal
width, petal length, and petal width. We want to predict the species of the iris
flower based on these features.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 'X_train' and 'y_train' are used to train the model
# 'X_test' and 'y_test' are used to evaluate the model's performance
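
One way to check how representative the two sets are is to count the species in each. The sketch below repeats the split from the example and then, as an optional variation not covered in the slides, passes train_test_split's stratify parameter so that class proportions are preserved on both sides. Variable names ending in `_s` are just labels chosen here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Plain random split, as in the example above
_, _, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
random_counts = (np.bincount(y_train, minlength=3), np.bincount(y_test, minlength=3))
print("random:    ", *random_counts)

# Stratified split: each species keeps its share in both sets
_, _, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
stratified_counts = (np.bincount(y_train_s, minlength=3), np.bincount(y_test_s, minlength=3))
print("stratified:", *stratified_counts)   # [40 40 40] [10 10 10]
```

With a purely random split the per-species counts are close to equal but not guaranteed to be; stratifying removes that variation, which can matter more on small or imbalanced datasets.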

Thank You
