Seaborn Plotting in Titanic

The document contains a Python script that utilizes pandas, numpy, and seaborn for data analysis and visualization of the Titanic dataset. It includes data loading, descriptive statistics, handling missing values, and various plots to analyze passenger survival based on different features such as age, fare, and class. The analysis highlights trends in survival rates and suggests further steps for data imputation and categorization.

Uploaded by haridivya6650

In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

In [2]: df=pd.read_csv('titan.csv')
df.head()

Out[2]:
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     ...  Embarked  WikiId
0  1            0.0       3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   ...  S         691.0
1  2            1.0       1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  ...  C         90.0
2  3            1.0       3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   ...  S         865.0
3  4            1.0       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  ...  S         127.0
4  5            0.0       3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   ...  S         627.0

5 rows × 21 columns

In [3]: df.describe()

Out[3]:
       PassengerId     Survived       Pclass          Age        SibSp        Parch         Fare       WikiId
count  1309.000000   891.000000  1309.000000  1046.000000  1309.000000  1309.000000  1308.000000  1304.000000  13
mean    655.000000     0.383838     2.294882    29.881138     0.498854     0.385027    33.295479   658.534509
std     378.020061     0.486592     0.837836    14.413493     1.041658     0.865560    51.758668   380.377373
min       1.000000     0.000000     1.000000     0.170000     0.000000     0.000000     0.000000     1.000000
25%     328.000000     0.000000     2.000000    21.000000     0.000000     0.000000     7.895800   326.750000
50%     655.000000     0.000000     3.000000    28.000000     0.000000     0.000000    14.454200   661.500000
75%     982.000000     1.000000     3.000000    39.000000     1.000000     0.000000    31.275000   987.250000
max    1309.000000     1.000000     3.000000    80.000000     8.000000     9.000000   512.329200  1314.000000

In [4]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 1309 non-null int64
1 Survived 891 non-null float64
2 Pclass 1309 non-null int64
3 Name 1309 non-null object
4 Sex 1309 non-null object
5 Age 1046 non-null float64
6 SibSp 1309 non-null int64
7 Parch 1309 non-null int64
8 Ticket 1309 non-null object
9 Fare 1308 non-null float64
10 Cabin 295 non-null object
11 Embarked 1307 non-null object
12 WikiId 1304 non-null float64
13 Name_wiki 1304 non-null object
14 Age_wiki 1302 non-null float64
15 Hometown 1304 non-null object
16 Boarded 1304 non-null object
17 Destination 1304 non-null object
18 Lifeboat 502 non-null object
19 Body 130 non-null object
20 Class 1304 non-null float64
dtypes: float64(6), int64(4), object(11)
memory usage: 214.9+ KB

In [5]: df.shape

Out[5]: (1309, 21)

In [6]: df.isnull().sum()

Out[6]:
PassengerId 0
Survived 418
Pclass 0
Name 0
Sex 0
Age 263
SibSp 0
Parch 0
Ticket 0
Fare 1
Cabin 1014
Embarked 2
WikiId 5
Name_wiki 5
Age_wiki 7
Hometown 5
Boarded 5
Destination 5
Lifeboat 807
Body 1179
Class 5
dtype: int64
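Raw counts are easier to compare as a share of all rows. The helper below is a small sketch (the name `missing_report` and the toy frame are hypothetical, not part of the original notebook) that turns `isnull().sum()` into counts plus percentages and keeps only the columns that actually have gaps.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first (hypothetical helper)."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Drop columns with no gaps and sort the rest by how much is missing
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy stand-in frame with made-up values; on the notebook's data you would call missing_report(df)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})
print(missing_report(toy))
```

Applied to `df`, the counts above work out to roughly 20% missing for Age (263 of 1309) and about 77% for Cabin (1014 of 1309).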

In [7]: df.sample(10)

Out[7]:
      PassengerId  Survived  Pclass  Name                                              Sex     Age    SibSp  Parch  Ticket    Fare     ...  Embarked
797   798          1.0       3       Osman, Mrs. Mara                                  female  31.00  0      0      349244    8.6833   ...  S
296   297          0.0       3       Hanna, Mr. Mansour                                male    23.50  0      0      2693      7.2292   ...  C
644   645          1.0       3       Baclini, Miss. Eugenie                            female  0.75   2      1      2666      19.2583  ...  C
1115  1116         NaN       1       Candee, Mrs. Edward (Helen Churchill Hungerford)  female  53.00  0      0      PC 17606  27.4458  ...  C
335   336          0.0       3       Denkoff, Mr. Mitto                                male    NaN    0      0      349225    7.8958   ...  S
169   170          0.0       3       Ling, Mr. Lee                                     male    28.00  0      0      1601      56.4958  ...  S
977   978          NaN       3       Barry, Miss. Julia                                female  27.00  0      0      330844    7.8792   ...  Q
366   367          1.0       1       Warren, Mrs. Frank Manley (Anna Sophia Atkinson)  female  60.00  1      0      110813    75.2500  ...  C
532   533          0.0       3       Elias, Mr. Joseph Jr                              male    17.00  1      1      2690      7.2292   ...  C
5     6            0.0       3       Moran, Mr. James                                  male    NaN    0      0      330877    8.4583   ...  Q

10 rows × 21 columns

UNIVARIATE ANALYSIS

KDE PLOT

In [8]: plt.figure(figsize=(4,3))
# PassengerId is only a sequential row identifier, so its density is roughly flat
sns.kdeplot(data=df.PassengerId)
plt.show()

In [9]: plt.figure(figsize=(4,3))
sns.kdeplot(data=df.Age)
plt.show()

In [10]: plt.figure(figsize=(4,3))
sns.kdeplot(data=df.Fare)
plt.show()

HISTPLOT

In [11]: sns.histplot(df.Fare)
plt.show()

BOX PLOT

In [12]: sns.boxplot(df.Age)
plt.show()

In [13]: sns.boxplot(x='Embarked', y='Age', data=df)
plt.title("Age distribution as a function of embarkation port")
plt.show()

In [14]: sns.boxplot(x='Embarked', y='Fare', data=df)
plt.title("Fare distribution as a function of embarkation port")
plt.show()

MULTIVARIATE ANALYSIS

LINE PLOT

In [15]: sns.lineplot(x='Age', y='Fare', data=df)
plt.title('Age vs Fare')
plt.show()

PIE CHART

In [16]: df.columns

Out[16]:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'WikiId', 'Name_wiki',
       'Age_wiki', 'Hometown', 'Boarded', 'Destination', 'Lifeboat', 'Body',
       'Class'],
      dtype='object')

In [17]: pclass_survived=df.groupby(['Pclass'])['Survived'].sum()

In [18]: pclass_survived

Out[18]:
Pclass
1 136.0
2 87.0
3 119.0
Name: Survived, dtype: float64
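Because Survived is coded 0/1 and is NaN for the 418 unlabeled rows, `sum()` above counts survivors per class, which is what the pie chart shows. Those counts also reflect how many passengers each class had, so a per-class rate can be more informative. A minimal sketch on a made-up mini-sample (the column names match the notebook; the values are invented):

```python
import numpy as np
import pandas as pd

# Invented mini-sample; NaN marks a passenger with no Survived label
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 3, 3, 3, 3],
    "Survived": [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, np.nan],
})

by_class = toy.groupby("Pclass")["Survived"]
summary = pd.DataFrame({
    "survivors": by_class.sum(),   # same quantity as pclass_survived in the notebook
    "labelled": by_class.count(),  # count() ignores NaN, so unlabeled rows drop out
    "rate": by_class.mean(),       # mean of a 0/1 column = survival rate
})
print(summary)
```

On the notebook's frame, `df.groupby('Pclass')['Survived'].mean()` would give the per-class survival rate directly.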

In [19]: sns.set_style('ticks')
pclass_survived.plot.pie()
plt.legend()
plt.show()

In [20]: pclass_sex_Survived=df.groupby(['Pclass','Sex'])['Survived'].sum()

In [21]: pclass_sex_Survived

Out[21]:
Pclass  Sex
1       female    91.0
        male      45.0
2       female    70.0
        male      17.0
3       female    72.0
        male      47.0
Name: Survived, dtype: float64

In [22]: pclass_sex_Survived.plot.pie(autopct='%1.2f%%')
plt.legend(bbox_to_anchor=(1.5,1), loc='upper left', borderaxespad=0)
plt.show()

BAR CHART

In [23]: sns.countplot(x='Sex',data=df)

Out[23]: <Axes: xlabel='Sex', ylabel='count'>

In [24]: sns.catplot(x="Sex", hue="Survived",
                kind="count", data=df);

COUNT PLOT

In [25]: sns.countplot(x='Embarked', hue='Pclass', data=df)
plt.title("Count of passengers as a function of embarkation port")
plt.show()

In [26]: plt.figure(figsize=(4,3))
sns.set_style('darkgrid')
sns.countplot(x='Pclass',hue='Survived',data=df)
plt.title('Pclass: Survived vs Dead')
plt.show()

In [27]: plt.figure(figsize=(4,3))
sns.set_style('darkgrid')
sns.countplot(x='Pclass',hue='Sex',data=df)
# This plot counts passengers by sex within each class; it carries no survival information
plt.title('Pclass: Male vs Female')
plt.show()

VIOLIN PLOT

In [28]: # Violin plot: displays the distribution of data
# across all levels of a category.
sns.violinplot(x="Sex", y="Age", hue="Survived",
               data=df, split=True)

Out[28]: <Axes: xlabel='Sex', ylabel='Age'>

This graph compares the age distributions of passengers who survived and those who did not, separately for men and women. It suggests that the survival rate is:

Good for children.

High for women in the age range 20-50.

Lower for men as age increases.
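The statements above are read off the violin plot; they can be checked numerically by tabulating the survival rate by sex and age group. This sketch assumes a child cutoff of 16 years, an illustrative choice not taken from the notebook, and the helper name and toy data are hypothetical.

```python
import pandas as pd


def survival_by_sex_and_age_group(frame: pd.DataFrame, child_age: float = 16) -> pd.DataFrame:
    """Survival rate per Sex x (child/adult) cell; child_age is an assumed cutoff."""
    # Only rows with both a label and a known age can be placed in a cell
    labelled = frame.dropna(subset=["Survived", "Age"])
    age_group = (labelled["Age"] < child_age).map({True: "child", False: "adult"}).rename("age_group")
    # Rows = Sex, columns = age group, values = mean of the 0/1 Survived column
    return labelled.groupby(["Sex", age_group])["Survived"].mean().unstack()


# Invented toy rows, not Titanic passengers
toy = pd.DataFrame({
    "Sex":      ["male", "male", "female", "female", "male", "female"],
    "Age":      [5.0, 40.0, 8.0, 30.0, 50.0, 25.0],
    "Survived": [1.0, 0.0, 1.0, 1.0, 0.0, 0.0],
})
print(survival_by_sex_and_age_group(toy))
```

On the real data, only rows where both Survived and Age are present contribute, so the table reflects the labelled passengers with a recorded age.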

Since the Age column is important, its missing values need to be filled, either by using the Name column (inferring age from the salutation: Mr, Mrs, etc.) or by using a regression model. After this step, another column, Age_Range (binned from the Age column), can be created and the data analyzed again.
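A minimal sketch of the salutation approach, assuming the title can be parsed as the text between the comma and the first period of Name (e.g. "Braund, Mr. Owen Harris" gives "Mr"). Missing ages are filled with the median age of passengers sharing that title; this median fill is a simple swapped-in choice, and the regressor option is not shown. The helper name, the Age_Range bin edges and labels, and the toy names are all hypothetical.

```python
import numpy as np
import pandas as pd


def impute_age_by_title(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Title, imputed Age and an Age_Range column (hypothetical helper)."""
    out = frame.copy()
    # "Braund, Mr. Owen Harris" -> "Mr": capture the text between ", " and the first "."
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    # Median age per title (NaN ignored), broadcast back to every row of that title
    title_median = out.groupby("Title")["Age"].transform("median")
    out["Age"] = out["Age"].fillna(title_median)
    # Illustrative age bins; right edges are inclusive, e.g. (12, 18] is "teen"
    out["Age_Range"] = pd.cut(
        out["Age"],
        bins=[0, 12, 18, 35, 60, 120],
        labels=["child", "teen", "young_adult", "adult", "senior"],
    )
    return out


# Made-up names and ages in the Titanic "Surname, Title. Given" format
toy = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Jones, Mr. Paul", "Brown, Mr. Tom",
             "Green, Miss. Ann", "White, Miss. Eve", "Black, Mrs. Jane (Jane Doe)"],
    "Age": [22.0, 35.0, np.nan, 26.0, np.nan, 40.0],
})
print(impute_age_by_title(toy)[["Title", "Age", "Age_Range"]])
```

A title whose members all lack Age would keep NaN after this step, so it is worth re-checking `isnull().sum()` on Age afterwards.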

BAR PLOT

In [29]: plt.figure(figsize=(8,4))
sns.barplot(x='SibSp',y='Survived',data=df)
plt.title('SibSp & Survived')
plt.show()

In [30]: # Divide Fare into 4 quantile-based bins (quartiles)
df['Fare_Range'] = pd.qcut(df['Fare'], 4)

# Barplot: each bar's height is the mean of y (here, the survival rate) within that bin
sns.barplot(x='Fare_Range', y='Survived', data=df)

Out[30]: <Axes: xlabel='Fare_Range', ylabel='Survived'>
Fare is the fare paid by a passenger. Because the values in this column are continuous, they are grouped into bins (the same idea suggested above for the Age feature) to make the pattern clearer. The plot suggests that passengers who paid a higher fare had a higher survival rate.
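The bar heights in the Fare_Range plot are per-bin means of Survived, and the same numbers can be printed with `groupby`. This sketch labels the quartile bins Q1 to Q4 and drops unlabeled rows first; the labels, helper name and toy data are assumptions, not taken from the notebook.

```python
import pandas as pd


def survival_by_fare_quartile(frame: pd.DataFrame) -> pd.Series:
    """Survival rate within each fare quartile (hypothetical helper)."""
    labelled = frame.dropna(subset=["Survived", "Fare"])
    # qcut with q=4 cuts at the 25th/50th/75th percentiles, so bins hold roughly equal counts
    quartile = pd.qcut(labelled["Fare"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
    return labelled.groupby(quartile, observed=True)["Survived"].mean()


# Invented fares and outcomes for illustration only
toy = pd.DataFrame({
    "Fare":     [5.0, 6.0, 10.0, 12.0, 20.0, 25.0, 80.0, 100.0],
    "Survived": [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0],
})
print(survival_by_fare_quartile(toy))
```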

PAIR PLOT
In [31]: sns.pairplot(data=df)
plt.show()
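
HEAT MAP

In [32]: # numeric_only=True restricts the correlation to numeric columns;
# pandas 2.x raises an error if text columns such as Name are included
heat_map=df.corr(numeric_only=True)
sns.heatmap(heat_map)
plt.show()
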
In [33]: plt.scatter(df.Fare,df.Age);

STRIP PLOT

In [34]: sns.stripplot(x='Fare',y='Age',data=df)
plt.show()
In [35]: sns.stripplot(x='Fare',y='Age',data=df,size=4)
plt.show()
