What - Why: Dummy Variables

Dummy variables are used to represent categorical variables in machine learning models that cannot process categorical data directly. There are two main techniques for converting categorical variables to dummy variables: label encoding and one-hot encoding. Label encoding assigns integer values to each category, but this can incorrectly imply an ordering between categories. One-hot encoding creates a new binary feature for each unique category value and avoids any ordering implications. It is preferable when the categorical variable is nominal rather than ordinal.

Uploaded by

Naing Naing


Dummy Variables

• What - A dummy variable is a numerical variable that represents a categorical variable.
• Why - Many machine learning algorithms cannot work with categorical variables directly, so the categories need to be converted to numbers.
• How - There are multiple ways of handling categorical variables:

1. Label Encoding
2. One-Hot Encoding

Label Encoding
• Each categorical label is simply assigned a unique integer.

Before encoding:

Country  Age  Salary
India    44   32000
US       34   33400
Japan    43   45000
US       23   23000
Japan    23   67000

After label encoding:

Country  Age  Salary
0        44   32000
2        34   33400
1        43   45000
2        23   23000
1        23   67000

• An effective technique when the categorical data is ordinal, provided the integers are assigned in the order of the categories.
• Challenge: Country is a nominal variable with no inherent ordering, but label encoding imposes a ranking on the countries. For example, here: India < Japan < US.
• This affects model interpretation, because a model may treat the integer codes as magnitudes.
• We can use one-hot encoding to overcome this.
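
The label-encoding table above can be reproduced with a minimal sketch in plain Python. The function name `label_encode` is hypothetical, and the sketch assumes the integer codes follow the alphabetical order of the categories, which matches the mapping in the table (India = 0, Japan = 1, US = 2).

```python
def label_encode(values):
    """Map each distinct category to an integer code.

    Assumption: codes are assigned in sorted (alphabetical) order.
    """
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


# Country column from the example table
countries = ["India", "US", "Japan", "US", "Japan"]
codes, mapping = label_encode(countries)

print(mapping)  # {'India': 0, 'Japan': 1, 'US': 2}
print(codes)    # [0, 2, 1, 2, 1]
```

The codes are only identifiers, but a numeric model sees 0 < 1 < 2, which is the unintended ranking described above.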
One-Hot Encoding
• One-hot encoding is a representation of categorical variables as binary vectors.
• It creates additional features based on the number of unique labels in the categorical feature.

Before encoding:

Country  Age  Salary
India    44   32000
US       34   33400
Japan    43   45000
US       23   23000
Japan    23   67000

After one-hot encoding:

Country.India  Country.Japan  Country.US  Age  Salary
1              0              0           44   32000
0              0              1           34   33400
0              1              0           43   45000
0              0              1           23   23000
0              1              0           23   67000

• 3 new features are added in place of Country.
• This solves the ranking problem, as each category is represented by its own binary vector.
• Apply this technique when the categorical data is not ordinal.
• Challenge: if the number of categories is high, one-hot encoding can lead to high dimensionality.
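
As an illustration of the one-hot table above, here is a minimal sketch in plain Python. The function name `one_hot_encode` is hypothetical. The `Country.<label>` column names follow the table, and the alphabetical column order is an assumption.

```python
def one_hot_encode(values, prefix):
    """Return (column_names, rows), with one 0/1 column per distinct category.

    Assumption: columns are ordered alphabetically by category.
    """
    categories = sorted(set(values))
    columns = [f"{prefix}.{category}" for category in categories]
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return columns, rows


# Country column from the example table
countries = ["India", "US", "Japan", "US", "Japan"]
columns, rows = one_hot_encode(countries, "Country")

print(columns)  # ['Country.India', 'Country.Japan', 'Country.US']
for row in rows:
    print(row)
# [1, 0, 0]
# [0, 0, 1]
# [0, 1, 0]
# [0, 0, 1]
# [0, 1, 0]
```

The number of new columns equals the number of distinct categories, which is why a high-cardinality feature inflates the dimensionality.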
Note: For One-Hot Encoding

• A regression model does not actually need all the dummy variables.
• It does not need the final dummy variable, because that information can be deduced from the combination of all the other dummy variables: when every other dummy is 0, the observation must belong to the remaining category.
• To avoid multicollinearity, drop one dummy variable (use n-1 of them for model building).
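
The n-1 rule can be sketched in plain Python as follows. The function name `dummy_encode` and its `drop_first` parameter are hypothetical, and dropping the alphabetically first category (India) is an arbitrary choice made for illustration. Any one category could serve as the baseline.

```python
def dummy_encode(values, prefix, drop_first=True):
    """One-hot encode, optionally dropping the first category as the baseline."""
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    columns = [f"{prefix}.{category}" for category in kept]
    rows = [[1 if v == category else 0 for category in kept] for v in values]
    return columns, rows


countries = ["India", "US", "Japan", "US", "Japan"]

# With all n dummies, every row sums to 1, so the dummies duplicate
# the information carried by a regression intercept (a column of ones).
full_columns, full_rows = dummy_encode(countries, "Country", drop_first=False)
print([sum(row) for row in full_rows])  # [1, 1, 1, 1, 1]

# With n-1 dummies, the dropped category (India) is the all-zero row.
columns, rows = dummy_encode(countries, "Country")
print(columns)  # ['Country.Japan', 'Country.US']
print(rows)     # [[0, 0], [0, 1], [1, 0], [0, 1], [1, 0]]
```

Libraries offer comparable options. For example, `pandas.get_dummies` accepts a `drop_first` argument.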
