
Titanic Data Analysis

The document outlines the process of performing exploratory data analysis (EDA) on the Titanic dataset using Python libraries such as pandas, numpy, matplotlib, and seaborn. It includes steps for loading the dataset, displaying its structure, handling missing values, and visualizing survival rates based on various factors like gender and passenger class. Key insights include the average age of passengers, survival rates by class, and the impact of gender on survival outcomes.

Uploaded by

sudeep shah


Exploratory Data Analysis with Titanic dataset
In [1]: #Import the required libraries
import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline

In [2]: #Load the data set

train_data=pd.read_csv('titanic_train_dataset.csv')

In [3]: #Display first 10 rows of dataset.

train_data.head(10)
Out[3]:
   PassengerId  Survived  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket            Fare
0  1            0         3       Braund, Mr. Owen Harris                             male    22.0  1      0      A/5 21171         7.250
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...   female  38.0  1      0      PC 17599          71.283
2  3            1         3       Heikkinen, Miss. Laina                              female  26.0  0      0      STON/O2. 3101282  7.925
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)        female  35.0  1      0      113803            53.100
4  5            0         3       Allen, Mr. William Henry                            male    35.0  0      0      373450            8.050
5  6            0         3       Moran, Mr. James                                    male    NaN   0      0      330877            8.458
6  7            0         1       McCarthy, Mr. Timothy J                             male    54.0  0      0      17463             51.862
7  8            0         3       Palsson, Master. Gosta Leonard                      male    2.0   3      1      349909            21.075
8  9            1         3       Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)   female  27.0  0      2      347742            11.133
9  10           1         2       Nasser, Mrs. Nicholas (Adele Achem)                 female  14.0  1      0      237736            30.070

In [6]: #Display last 5 rows of dataset.

train_data.tail()
Out[6]:
     PassengerId  Survived  Pclass  Name                                      Sex     Age   SibSp  Parch  Ticket      Fare
886  887          0         2       Montvila, Rev. Juozas                     male    27.0  0      0      211536      13.00
887  888          1         1       Graham, Miss. Margaret Edith              female  19.0  0      0      112053      30.00
888  889          0         3       Johnston, Miss. Catherine Helen "Carrie"  female  NaN   1      2      W./C. 6607  23.45
889  890          1         1       Behr, Mr. Karl Howell                     male    26.0  0      0      111369      30.00
890  891          0         3       Dooley, Mr. Patrick                       male    32.0  0      0      370376      7.75

In [5]: #Get the features of data set.

train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

Survived - Survival (0 = No; 1 = Yes)
Pclass - Passenger class (1 = 1st; 2 = 2nd; 3 = 3rd)
Name - Name of the passenger
Sex - Sex
Age - Age in years
SibSp - Number of siblings/spouses aboard. The dataset defines family relations as: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife.
Parch - Number of parents/children aboard. The dataset defines family relations as: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore Parch = 0 for them.
Ticket - Ticket number
Fare - Passenger fare
Cabin - Cabin number
Embarked - Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)

In [6]: #4. Find the total number of rows and columns in the dataset.
train_data.shape

Out[6]: (891, 12)

In [21]: #Find the mean,std,count,min,max and percentiles of dataset.

train_data.describe()

Out[21]:
       PassengerId  Survived    Pclass      Age         SibSp       Parch       Fare
count  891.000000   891.000000  891.000000  891.000000  891.000000  891.000000  891.000000
mean   446.000000   0.383838    2.308642    29.856532   0.523008    0.381594    32.204208
std    257.353842   0.486592    0.836071    13.292134   1.102743    0.806057    49.693429
min    1.000000     0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%    223.500000   0.000000    2.000000    22.000000   0.000000    0.000000    7.910400
50%    446.000000   0.000000    3.000000    30.000000   0.000000    0.000000    14.454200
75%    668.500000   1.000000    3.000000    36.000000   1.000000    0.000000    31.000000
max    891.000000   1.000000    3.000000    80.000000   8.000000    6.000000    512.329200

In [7]: #Check the null values in the data set.

train_data.isnull().sum()

Out[7]:
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
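Raw counts are easier to judge as a share of the 891 rows: Cabin is missing for 687 of 891 passengers (about 77%), Age for 177 (about 20%) and Embarked for only 2. A minimal sketch of computing those shares, run on a small made-up frame rather than the real CSV:

```python
import numpy as np
import pandas as pd

# Toy frame with the same kinds of gaps as the Titanic data (values are made up)
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", np.nan],
})

missing_count = df.isnull().sum()                   # number of NaNs per column
missing_pct = df.isnull().mean().mul(100).round(1)  # share of rows that are NaN, in %

print(pd.DataFrame({"missing": missing_count, "percent": missing_pct}))
```

On the real data, `train_data.isnull().mean() * 100` gives the same view for all twelve columns.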

In [23]: #Draw a count plot showing how many passengers survived and how many did not.

#A countplot counts the categories and returns a count of their occurrences.

#As it only counts the values of a single categorical column,
#we only need to specify the x parameter (hue='Survived' just colours each bar).

sb.countplot(x='Survived', hue='Survived', data=train_data)
plt.show()

In [26]: #Count the male and female passengers who survived and who did not.

train_data.groupby(['Sex', 'Survived'])['Survived'].count()

Out[26]:
Sex     Survived
female  0            81
        1           233
male    0           468
        1           109
Name: Survived, dtype: int64
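Because Survived is coded 0/1, its mean within a group is that group's survival rate. The sketch below rebuilds a two-column frame from the counts in Out[26] (an assumption made only so the example runs without the CSV) and shows roughly 74% of women versus 19% of men surviving.

```python
import pandas as pd

# (Sex, Survived) -> number of passengers, copied from Out[26]
counts = {("female", 0): 81, ("female", 1): 233, ("male", 0): 468, ("male", 1): 109}

# Expand the counts back into one row per passenger
rows = [(sex, surv) for (sex, surv), n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["Sex", "Survived"])

# The mean of a 0/1 column within each group is the survival rate
rate_by_sex = df.groupby("Sex")["Survived"].mean()
print(rate_by_sex.round(3))
```

With the real data the same rates come from `train_data.groupby('Sex')['Survived'].mean()`.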

In [12]: #Plot to show whether sex has any impact on survived vs dead.
#train_data[['Sex','Survived']].groupby(['Sex']).mean().plot.bar()
sb.countplot(x='Sex', hue='Survived', data=train_data)
plt.show()

In [9]: sb.countplot(x='Pclass', hue='Survived', data=train_data)

plt.title('Pclass: Survived vs Dead')
plt.show()
In [ ]: #It is clearly seen that passengers of Class 1 were given high priority during
#rescue. There were more passengers in Class 3 than in Class 1 and Class 2, but very
#few of them, only about 24%, survived. In Class 2 the survival and non-survival
#rates are roughly 47% and 53%, while in Class 1 about 63% of passengers survived.
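The per-class survival rates can be checked by dividing survivors by each class total. The totals below are summed over both sexes from the crosstab in In [10]; with the real data, `train_data.groupby('Pclass')['Survived'].mean()` should give the same ratios.

```python
import pandas as pd

# Died/survived totals per class, summed over both sexes from the crosstab Out[10]
counts = pd.DataFrame(
    {"died": [80, 97, 372], "survived": [136, 87, 119]},
    index=pd.Index([1, 2, 3], name="Pclass"),
)

# Survival rate = survivors / all passengers in that class
rate = counts["survived"] / counts.sum(axis=1)
print((rate * 100).round(1))  # roughly 63.0, 47.3 and 24.2 percent
```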

In [10]: #Count the male and female passengers who survived or died in each passenger
#class.

#pd.crosstab computes a simple cross-tabulation of two (or more) factors. By default
#it computes a frequency table of the factors unless an array of values and an
#aggregation function are passed.

pd.crosstab([train_data.Sex, train_data.Survived], train_data.Pclass)

Out[10]:
Pclass            1    2    3
Sex     Survived
female  0         3    6   72
        1        91   70   72
male    0        77   91  300
        1        45   17   47
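pd.crosstab can also return proportions instead of counts through its `normalize` argument. The sketch below rebuilds one row per passenger from the counts in Out[10] (so it runs without the CSV) and normalises each Sex/Pclass row, which turns the table into survival shares.

```python
import pandas as pd

# Counts from Out[10]: (Sex, Survived) -> passengers in Pclass 1, 2, 3
table = {
    ("female", 0): [3, 6, 72],
    ("female", 1): [91, 70, 72],
    ("male", 0): [77, 91, 300],
    ("male", 1): [45, 17, 47],
}
rows = [
    {"Sex": sex, "Survived": surv, "Pclass": pclass}
    for (sex, surv), per_class in table.items()
    for pclass, n in zip([1, 2, 3], per_class)
    for _ in range(n)
]
df = pd.DataFrame(rows)

# normalize="index" divides each (Sex, Pclass) row by its total
rates = pd.crosstab([df.Sex, df.Pclass], df.Survived, normalize="index")
print(rates.round(3))
```

The result makes the combined effect visible: about 97% of first-class women survived, against about 16% of second-class men.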

In [11]: #Find the age of the oldest and youngest passengers and the average age.
print('Age of oldest person travelled :',train_data['Age'].max())
print('Age of youngest person travelled :',train_data['Age'].min())
print('Average Age of person travelled :',train_data['Age'].mean())

Age of oldest person travelled : 80.0

Age of youngest person travelled : 0.42
Average Age of person travelled : 29.69911764705882
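The average of 29.699 is taken over the 714 non-null ages only, because pandas' `mean()` skips NaN by default (`skipna=True`). A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan])

# mean() ignores NaN: (22 + 38 + 35) / 3
print(ages.mean())              # 31.666666666666668
print(ages.count())             # 3 non-null values
print(ages.mean(skipna=False))  # nan
```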

In [27]: #Extract the title (Mr, Mrs, Miss, ...) that precedes the first dot in each name
train_data['Initial'] = train_data.Name.str.extract(r'([A-Za-z]+)\.', expand=False)
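A quick check of the regular expression on a few names taken from the head/tail output above. The pattern matches the first run of letters that is immediately followed by a dot, which in this dataset is the title after the surname.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Palsson, Master. Gosta Leonard",
    "Montvila, Rev. Juozas",
])

# expand=False returns a Series instead of a one-column DataFrame
titles = names.str.extract(r"([A-Za-z]+)\.", expand=False)
print(titles.tolist())  # ['Mr', 'Miss', 'Master', 'Rev']
```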

In [28]: pd.crosstab(train_data.Initial, train_data.Sex)

Out[28]:
Sex       female  male
Initial
Capt           0     1
Col            0     2
Countess       1     0
Don            0     1
Dr             1     6
Jonkheer       0     1
Lady           1     0
Major          0     2
Master         0    40
Miss         182     0
Mlle           2     0
Mme            1     0
Mr             0   517
Mrs          125     0
Ms             1     0
Rev            0     6
Sir            0     1

In [14]: train_data.groupby('Initial')['Age'].mean()

Out[14]:
Initial
Capt        70.000000
Col         58.000000
Countess    33.000000
Don         40.000000
Dr          42.000000
Jonkheer    38.000000
Lady        48.000000
Major       48.500000
Master       4.574167
Miss        21.773973
Mlle        24.000000
Mme         24.000000
Mr          32.368090
Mrs         35.898148
Ms          28.000000
Rev         43.166667
Sir         49.000000
Name: Age, dtype: float64

In [15]: train_data['Initial'].replace(['Capt','Col','Countess','Don','Dr','Jonkhe
'Mr','Miss','Mr','Other','Mr','Mrs','Mr','


In [16]: train_data['Initial'].replace(['Mlle','Mme','Ms','Dr','Major','Lady','Cou
'Miss','Miss','Mr','Mr','Mrs','Mrs','Othe
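The exact replacement lists in In [15] and In [16] are cut off in this copy, so the mapping below is purely hypothetical. It sketches how rare titles can be folded into a few groups with a dict passed to `Series.replace`; assigning the result back (for example `train_data['Initial'] = train_data['Initial'].replace(title_map)`) avoids relying on `inplace=True` on a selected column, which newer pandas versions warn about.

```python
import pandas as pd

titles = pd.Series(["Mr", "Mlle", "Dr", "Capt", "Mme", "Miss", "Countess"])

# Hypothetical mapping for illustration only, not the notebook's own lists
title_map = {
    "Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs",
    "Lady": "Mrs", "Countess": "Mrs",
    "Capt": "Other", "Col": "Other", "Major": "Other", "Dr": "Other",
    "Rev": "Other", "Don": "Other", "Sir": "Other", "Jonkheer": "Other",
}

# Exact-value replacement; titles not in the dict (Mr, Miss) are left unchanged
grouped = titles.replace(title_map)
print(grouped.tolist())  # ['Mr', 'Miss', 'Other', 'Other', 'Mrs', 'Miss', 'Mrs']
```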
In [17]: #Average age based on initials
train_data.groupby('Initial')['Age'].mean()

Out[17]:
Initial
Master     4.574167
Miss      21.879195
Mr        32.891990
Mrs       35.828829
Other     42.000000
Name: Age, dtype: float64

In [19]: #Fill the missing Age values with the average age for each initial
train_data.loc[(train_data.Age.isnull()) & (train_data.Initial=='Mr'),'Ag
train_data.loc[(train_data.Age.isnull()) & (train_data.Initial=='Mrs'),'A
train_data.loc[(train_data.Age.isnull()) & (train_data.Initial=='Master')
train_data.loc[(train_data.Age.isnull()) & (train_data.Initial=='Miss'),'
train_data.loc[(train_data.Age.isnull()) & (train_data.Initial=='Other'),
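The cell above fills each title's missing ages with a fixed value (the values are cut off in this copy). An alternative sketch, on a small made-up frame, computes the per-title mean with `groupby(...).transform('mean')` and fills every gap in one step; it uses the exact group means rather than any rounded values the notebook may have typed in.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Initial": ["Mr", "Mr", "Mr", "Miss", "Miss", "Master"],
    "Age": [30.0, 40.0, np.nan, 20.0, np.nan, 4.0],
})

# Mean age per title (NaN ignored), broadcast back to every row of that title
group_mean = df.groupby("Initial")["Age"].transform("mean")

# Only the missing ages are replaced; existing ages are kept
df["Age"] = df["Age"].fillna(group_mean)
print(df)
```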

In [27]: train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 13 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 891 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
Initial 891 non-null object
dtypes: float64(2), int64(5), object(6)
memory usage: 90.6+ KB

In [28]: train_data.Age.isnull().any()

Out[28]: False

In [20]: f,ax=plt.subplots(1,2,figsize=(20,20))
train_data[train_data['Survived']==0].Age.plot.hist(ax=ax[0],bins=20,edge
ax[0].set_title('Survived = 0')
x1=list(range(0,85,5))
ax[0].set_xticks(x1)
train_data[train_data['Survived']==1].Age.plot.hist(ax=ax[1],bins=20,edge
x2=list(range(0,85,5))
ax[1].set_xticks(x2)
ax[1].set_title('Survived = 1')
plt.show()
In [ ]: #Observations: (1) First priority during rescue was given to children and …,
#as passengers under 5 were saved in large numbers. (2) The oldest passenger saved
#was 80 years old. (3) Most deaths were in the 30-40 age group.
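Observations like these can be backed by numbers rather than read off the histograms: bin the ages with `pd.cut` and compute the survival rate per band. A sketch on made-up data; the band edges are an arbitrary choice.

```python
import pandas as pd

# Toy data (made up) to show the technique
df = pd.DataFrame({
    "Age": [2, 4, 15, 23, 31, 36, 38, 45, 62, 80],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0, 0, 1],
})

bins = [0, 5, 18, 30, 40, 60, 100]
# right=False gives left-closed bands: [0, 5), [5, 18), ..., [60, 100)
df["AgeBand"] = pd.cut(df["Age"], bins=bins, right=False)

# Passengers per band and the share who survived
summary = df.groupby("AgeBand", observed=False)["Survived"].agg(["count", "mean"])
print(summary)
```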

In [31]: #Count the passengers who died or survived by family size
#(using the SibSp feature) and also draw the plot.
pd.crosstab([train_data.SibSp], train_data.Survived)

Out[31]:
Survived    0    1
SibSp
0         398  210
1          97  112
2          15   13
3          12    4
4          15    3
5           5    0
8           7    0

In [29]: #Identify the number of passenger died based on size of the family
#(using SibSp feature) and also draw the plot..
pd.crosstab([train_data.SibSp],train_data.Survived).style.background_grad
Out[29]:
Survived    0    1
SibSp
0         398  210
1          97  112
2          15   13
3          12    4
4          15    3
5           5    0
8           7    0

In [41]: sb.barplot(x='SibSp', y='Survived', data=train_data)

Out[41]: <matplotlib.axes._subplots.AxesSubplot at 0x24c61b50748>


In [39]: f,ax=plt.subplots(1,2,figsize=(20,8))
sb.barplot(x='SibSp', y='Survived', data=train_data, ax=ax[0])
ax[0].set_title('SibSp vs Survived in BarPlot')
plt.show()
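In [40]: f,ax=plt.subplots(1,2,figsize=(20,8))
sb.barplot(x='SibSp', y='Survived', data=train_data, ax=ax[0])
ax[0].set_title('SibSp vs Survived in BarPlot')
#The deprecated factorplot drew a point plot by default but opened its own figure;
#the axes-level pointplot draws the same chart straight into ax[1].
sb.pointplot(x='SibSp', y='Survived', data=train_data, ax=ax[1])
ax[1].set_title('SibSp vs Survived in PointPlot')
plt.show()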

In [33]: pd.crosstab(train_data.SibSp,train_data.Pclass).style.background_gradient

Out[33]:
Pclass    1    2    3
SibSp
0       137  120  351
1        71   55   83
2         5    8   15
3         3    1   12
4         0    0   18
5         0    0    5
8         0    0    7

In [ ]: #The barplot and crosstab show that a passenger travelling with no siblings or
#spouse had a survival rate of about 34.5%. The rate rises to about 54% for
#passengers with one sibling or spouse aboard and then falls as the number increases.
#This is interesting because, if I have family on board, I will try to save them
#instead of saving myself. But something looks wrong: the survival rate for
#passengers with 5 or 8 siblings/spouses aboard is 0%. Is this because of Pclass?
#Yes: the crosstab shows that every passenger with SibSp > 3 was in Pclass 3, and
#nearly all of these large Pclass 3 families died (only 3 of 30 such passengers survived).
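The survival rates by SibSp can be reproduced directly from the counts in Out[31]; the frame below simply copies that crosstab and divides survivors by the row total.

```python
import pandas as pd

# Died (0) / survived (1) counts per SibSp value, copied from Out[31]
ct = pd.DataFrame(
    {0: [398, 97, 15, 12, 15, 5, 7], 1: [210, 112, 13, 4, 3, 0, 0]},
    index=pd.Index([0, 1, 2, 3, 4, 5, 8], name="SibSp"),
)

# Survival rate per SibSp value = survivors / all passengers with that SibSp
survival_rate = ct[1] / ct.sum(axis=1)
print((survival_rate * 100).round(1))
```

The printed rates are about 34.5% for SibSp = 0, 53.6% for SibSp = 1, and then decline to 0% for SibSp = 5 and 8.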

In [64]: train_data.corr(method='pearson', numeric_only=True)

Out[64]:
             PassengerId  Survived   Pclass     Age        SibSp      Parch      Fare
PassengerId   1.000000    -0.005007  -0.035144   0.041709  -0.057527  -0.001652   0.012658
Survived     -0.005007     1.000000  -0.338481  -0.091497  -0.035322   0.081629   0.257307
Pclass       -0.035144    -0.338481   1.000000  -0.339582   0.083081   0.018443  -0.549500
Age           0.041709    -0.091497  -0.339582   1.000000  -0.267734  -0.198712   0.089087
SibSp        -0.057527    -0.035322   0.083081  -0.267734   1.000000   0.414838   0.159651
Parch        -0.001652     0.081629   0.018443  -0.198712   0.414838   1.000000   0.216225
Fare          0.012658     0.257307  -0.549500   0.089087   0.159651   0.216225   1.000000

In [ ]: #From the correlation table above we can see that Survived is inversely correlated
#with the Pclass value (r = -0.34). Since Class 1 has the lowest numerical value, it
#has a better survival rate than the other classes.

#We also see that Age and Survived are only weakly (negatively) correlated (r = -0.09).
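Pearson's r is the covariance of two columns divided by the product of their standard deviations. The sketch below checks that formula against `Series.corr` on a few made-up Pclass/Survived values; as in the table, a higher class number goes with a lower survival value, so r comes out negative.

```python
import numpy as np
import pandas as pd

# Made-up values, not the real passengers
pclass = pd.Series([1, 1, 2, 3, 3, 3])
survived = pd.Series([1, 1, 1, 0, 0, 1])

dx = pclass - pclass.mean()
dy = survived - survived.mean()

# r = cov(x, y) / (std(x) * std(y)); the 1/(n-1) factors cancel out
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(round(manual, 4), round(pclass.corr(survived), 4))
```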

In [70]: train_data.groupby(['Survived']).hist()

#catplot replaces the deprecated factorplot
sb.catplot(x='Survived', data=train_data, kind='count')

Out[70]: <seaborn.axisgrid.FacetGrid at 0x27a34bf3c50>
In [78]: #Plot the age distribution of the passengers aboard.
#histplot replaces the deprecated distplot; it draws no KDE curve by default.
sb.histplot(train_data['Age'].dropna(), bins=15)

Out[78]: <matplotlib.axes._subplots.AxesSubplot at 0x27a35629b70>

In [ ]: #Note: Many passengers are aged 15-40 years. But this is not complete data.

In [83]: #Age wise Distribution of Male and Female passengers

plt.hist(train_data['Age'][(train_data['Sex'] == 'female')].dropna(), bin
plt.hist(train_data['Age'][(train_data['Sex'] == 'male')].dropna(), bins=
plt.xlabel('Age')
plt.ylabel('Count')
plt.title('Age wise Distribution of Male and Female passengers')
plt.show()

In [ ]: #There were many more male passengers aboard (577) than female passengers (314).

In [85]: #survival based on passenger’s class for both genders.

grouped_by_pclass = train_data.groupby(['Pclass', 'Survived', 'Sex'])
grouped_by_pclass.size()

Out[85]:
Pclass  Survived  Sex
1       0         female      3
                  male       77
        1         female     91
                  male       45
2       0         female      6
                  male       91
        1         female     70
                  male       17
3       0         female     72
                  male      300
        1         female     72
                  male       47
dtype: int64

