dataset = pd.get_dummies(dataset, drop_first=True)
dataset.head()
- This code performs one-hot encoding on the categorical variables in the pandas DataFrame
'dataset', converting each category into a binary dummy variable; drop_first=True drops the first
level of each categorical column so the resulting dummies are not perfectly collinear.
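As a quick illustration (the 'Sex' column and its 'M'/'F'/'I' codes are assumptions about this abalone dataset, shown on toy data rather than the real frame), drop_first=True keeps only two of the three sex indicators because the third is implied:
import pandas as pd
demo = pd.DataFrame({'Sex': ['M', 'F', 'I', 'M']})   # toy data, not the actual dataset
pd.get_dummies(demo, drop_first=True)                # produces Sex_I and Sex_M; Sex_F is dropped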
X = dataset.drop(columns = 'Rings (+1.5=Years)')
X.head()
- This code creates a new DataFrame X by dropping the column 'Rings (+1.5=Years)' from the
original dataset.
import seaborn as sns
sns.heatmap(X.corr(),
annot = True,
fmt = '.1g',
center = 0,
cmap = 'coolwarm',
linewidths = 1,
linecolor='black')
- This code creates a heatmap using the Seaborn library to visualize the pairwise correlations
between the columns of the DataFrame 'X'; the annotations and diverging colormap make strongly
correlated feature pairs easy to spot.
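To go beyond eyeballing the heatmap, a minimal sketch like the following (the 0.9 cutoff is an arbitrary assumption) lists the most strongly correlated feature pairs, which is where multicollinearity concerns would come from:
import numpy as np
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))   # keep each pair only once
pairs = upper.stack().sort_values(ascending=False)                  # strongest pairs first
print(pairs[pairs > 0.9])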
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)  # rescale the data to the [0, 1] range
X_scaled = pd.DataFrame(X_scaled, columns = X.columns)
X_scaled.head()
- This code imports the MinMaxScaler class from the scikit-learn library, fits it to X, and rescales
every feature to the [0, 1] range; the result is wrapped back into a DataFrame with the original
column names.
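For reference, MinMaxScaler applies (x - min) / (max - min) to each column; a quick sanity check (checking only the first column, as an example) that X_scaled matches that formula:
col = X.columns[0]                                       # first feature column
manual = (X[col] - X[col].min()) / (X[col].max() - X[col].min())
print((manual - X_scaled[col]).abs().max())              # should be ~0 up to floating-point error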
from sklearn.decomposition import PCA   # imports needed for this step
import matplotlib.pyplot as plt

model = PCA(random_state=1503).fit(X_scaled)
plt.plot(model.explained_variance_ratio_, linewidth = 4)
plt.xlabel('Component')
plt.ylabel('Explained Variance')
plt.show()
- This code fits PCA (Principal Component Analysis) on the scaled data 'X_scaled' using
scikit-learn's 'PCA' class with a fixed random state, then plots the explained variance ratio of
each component (a scree plot) to help decide how many components to keep.
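A common companion to this scree plot, sketched here as an optional check (the 90% threshold is an arbitrary choice), is the cumulative explained variance, which says how many components are needed to retain a given share of the variance:
import numpy as np
cumulative = np.cumsum(model.explained_variance_ratio_)   # running total of explained variance
print(cumulative)
print(int(np.argmax(cumulative >= 0.90)) + 1, 'components reach 90% of the variance')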
model = PCA(n_components=3,
random_state=1108).fit(X_scaled)
- This code refits principal component analysis (PCA) on the scaled data 'X_scaled', this time
creating a PCA object that keeps only 3 principal components (n_components=3) using scikit-learn's
'PCA' class.
model_interpretation = pd.DataFrame(model.components_,
columns = X.columns)
model_interpretation
- This code creates a Pandas DataFrame named 'model_interpretation' that holds the PCA loadings:
each row corresponds to one principal component and each column shows how strongly the
corresponding original feature contributes to it, which is what allows the components to be
interpreted.
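To make the loadings easier to read, a small optional sketch (keeping the top 3 features per component is an arbitrary choice) that lists the features with the largest absolute weights in each component:
for i, row in model_interpretation.iterrows():
    top = row.abs().sort_values(ascending=False).head(3)   # 3 strongest loadings for this component
    print(f'Component {i}:', list(top.index))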
components = model.transform(X_scaled)
components = pd.DataFrame(components,
columns =['small size and weight', 'sex','big size and weight'])
components.head()
- This code uses the trained PCA model to transform the scaled dataset 'X_scaled' into a new
DataFrame called 'components', which contains the three principal components under the
interpretable names chosen from the loadings above.
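A useful property to verify, as a quick optional check not in the original code, is that the principal components are (near-)uncorrelated with each other, unlike the original features:
print(components.corr().round(3))   # off-diagonal entries should be approximately 0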
final_dataset = pd.concat([components,dataset], axis = 1)
final_dataset.head()
- This code creates a new DataFrame named 'final_dataset' by concatenating the DataFrame
'components' with the original dataset 'dataset' column-wise (axis = 1).
from sklearn.manifold import TSNE
model = TSNE(n_components = 2,
random_state = 1108)
components = model.fit_transform(X)
components
- This code uses the t-distributed stochastic neighbor embedding (t-SNE) algorithm to reduce the
dimensionality of the dataset X to 2 dimensions.
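t-SNE is stochastic and sensitive to its perplexity parameter and to feature scaling; a hedged variant to experiment with (perplexity=50 is just an alternative to the default of 30, and running it on X_scaled rather than X is an assumption, not what the original code does):
model_alt = TSNE(n_components = 2, perplexity = 50, random_state = 1108)
components_alt = model_alt.fit_transform(X_scaled)   # 2-D embedding of the scaled features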
plt.scatter(components[:,0],
components[:,1],
cmap='hsv',
c = dataset["Rings (+1.5=Years)"])
plt.title("t-SNE scatter plot")
plt.show()
- This code plots the two t-SNE components as a scatter plot, coloring each point by its
'Rings (+1.5=Years)' value, so any age-related grouping in the two-dimensional projection becomes
visible.
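An optional tweak, not in the original plot, is to keep a handle on the scatter and add a colorbar so the ring counts behind the colors are readable:
sc = plt.scatter(components[:,0], components[:,1], cmap='hsv', c = dataset["Rings (+1.5=Years)"])
plt.colorbar(sc, label='Rings (+1.5=Years)')   # legend for the ring counts
plt.title("t-SNE scatter plot")
plt.show()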
CONCLUSION: From the dimension reduction challenge above, we can infer that the original dataset had
a large number of features that were potentially correlated with each other, leading to the possibility of
multicollinearity issues. By using dimension reduction techniques such as Principal Component Analysis
(PCA) or t-SNE, we were able to reduce the number of features in the dataset while still preserving most
of the variance in the data. This can help simplify the modeling process and potentially improve the
model's performance. In addition, by visualizing the data in a scatter plot using t-SNE, we can observe
the grouping or separation of data points in a lower-dimensional space, which can provide insights into
the underlying patterns or relationships in the data.