Data Science Fundamentals

The document discusses various data science concepts and algorithms including working with packages in Python, NumPy arrays, Pandas data frames, and univariate analysis on diabetes data. Example code is provided to work with NumPy, Pandas, and perform frequency, mean, median, mode, variance, standard deviation, skewness and kurtosis on a diabetes dataset.

Uploaded by thilakraj.a0321

DATA SCIENCE FUNDAMENTALS

LAB EXERCISE
PROGRAMS AND OUTPUTS
EX:1 PACKAGES FOR DATA SCIENCE IN PYTHON

AIM : TO DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF PYTHON FOR DATA ANALYTICS.
ALGORITHM :
STEP 1 :
STEP 2 :
STEP 3 :
STEP 4 :
STEP 5 :
PROGRAM :
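A minimal sketch of how the installed packages might be explored, assuming they were installed with pip (for example `pip install numpy pandas matplotlib seaborn`). The helper name `installed_versions` is hypothetical and not part of the original record; it only reports which of the named packages are present and in which version.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package list is an assumption based on the imports used later in this record
    wanted = ["numpy", "pandas", "matplotlib", "seaborn"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Any package reported as not installed can then be added with `pip install <name>` before running the later exercises.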

OUTPUT :
AIM : WORKING WITH NUMPY ARRAYS

ALGORITHM :
STEP 1 :
STEP 2 :
STEP 3 :
STEP 4 :
STEP 5 :

PROGRAM :
import numpy as np

list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
list_3 = [9, 10, 11, 12]

# Build a 3 x 4 two-dimensional array from three lists
sample_array = np.array([list_1,
                         list_2,
                         list_3])
print("Numpy multi dimensional array in python\n",
      sample_array)
OUTPUT :
Numpy multi dimensional array in python
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
AIM : WORKING WITH PANDAS DATA FRAMES
ALGORITHM :
STEP 1 :
STEP 2 :
STEP 3 :
STEP 4 :
STEP 5 :
PROGRAM :
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
print(df[['Name', 'Qualification']])
OUTPUT :
     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd
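
Beyond selecting columns, rows of a data frame can be filtered, summarised and sorted in much the same way. A minimal sketch on the same data, added as an illustration; the variable names `older_than_25`, `mean_age` and `sorted_by_age` are not from the original program.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)

# Boolean mask: keep only the rows where Age is greater than 25
older_than_25 = df[df['Age'] > 25]

# Summary statistic of a single column
mean_age = df['Age'].mean()

# Rows ordered by Age, youngest first
sorted_by_age = df.sort_values('Age')

print(older_than_25[['Name', 'Age']])
print("Mean age:", mean_age)
print(sorted_by_age['Name'].tolist())
```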
EX.NO 5. USE THE DIABETES DATA SET FROM UCI AND PIMA INDIANS DIABETES DATA SET FOR PERFORMING THE FOLLOWING:
DATE:

A) UNIVARIATE ANALYSIS: FREQUENCY, MEAN, MEDIAN, MODE, VARIANCE, STANDARD DEVIATION, SKEWNESS AND KURTOSIS.

AIM:
To explore various commands for performing univariate analysis on the UCI and Pima Indians Diabetes data set.
ALGORITHM:
STEP 1: Start the program.
STEP 2: Download the UCI and Pima Indians Diabetes data set from Kaggle.
STEP 3: Read the data from the UCI and Pima Indians Diabetes data set.
STEP 4: Find the mean, median, mode, variance, standard deviation, skewness and kurtosis of the given CSV data set.
STEP 5: Display the output.
STEP 6: Stop the program.
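
The statistics named in Step 4 can be sketched on a small list using only the Python standard library, which makes the underlying formulas visible. The function name `univariate_summary` and the sample values are hypothetical. Skewness and kurtosis below use the bias-corrected sample estimators (kurtosis as excess kurtosis, so a normal distribution gives 0), which is how pandas documents `skew()` and `kurtosis()`; treat the exact correspondence as an assumption to verify against your pandas version.

```python
import math
import statistics
from collections import Counter


def univariate_summary(values):
    """Return basic univariate statistics; needs at least 4 values with non-zero variance."""
    n = len(values)
    mean = statistics.mean(values)

    # Central moments about the mean (population form, divided by n)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    g1 = m3 / m2 ** 1.5          # biased skewness
    g2 = m4 / m2 ** 2 - 3        # biased excess kurtosis

    # Bias-corrected sample estimators
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

    return {
        "frequency": dict(Counter(values)),       # count of each distinct value
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "std": statistics.stdev(values),          # square root of the sample variance
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Illustrative values only; not taken from the diabetes data set
summary = univariate_summary([2, 2, 3, 5, 8])
for name, value in summary.items():
    print(f"{name}: {value}")
```

A positive skewness, as in this sample, indicates a longer tail on the right of the distribution.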
PROGRAM:
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FormatStrFormatter

sns.set_style('darkgrid')
%matplotlib inline
warnings.filterwarnings('ignore')

# Load the data set and inspect its structure
df = pd.read_csv('C:/Users/kirub/Documents/Learning/Untitled Folder/diabetes.csv')
df.head()
df.shape
df.dtypes
df['Outcome'] = df['Outcome'].astype('bool')
df.dtypes['Outcome']
df.info()
df.describe().T

# Frequency: count of each unique value of Outcome
df1 = df['Outcome'].value_counts()
# Display df1
print(df1)

# Mean
df.mean()
# Median
df.median()
# Mode
df.mode()
# Variance
df.var()
# Standard deviation
df.std()

# Kurtosis
df.kurtosis(axis=0, skipna=True)
df['Outcome'].kurtosis(axis=0, skipna=True)

# Skewness along the index axis (one value per column), skipping NA values
df.skew(axis=0, skipna=True)
# Skewness in each row
df.skew(axis=1, skipna=True)

# Pregnancy variable: count and percentage of each value
preg_proportion = np.array(df['Pregnancies'].value_counts())
preg_month = np.array(df['Pregnancies'].value_counts().index)
preg_proportion_perc = np.array(
    np.round(preg_proportion / sum(preg_proportion), 3) * 100, dtype=int)

preg = pd.DataFrame({'month': preg_month,
                     'count_of_preg_prop': preg_proportion,
                     'percentage_proportion': preg_proportion_perc})
preg.set_index(['month'], inplace=True)
preg.head(10)

# Count of each Outcome class
sns.countplot(x=df['Outcome'])
plt.show()

# Histogram with a density curve (distplot is deprecated in recent seaborn releases)
sns.histplot(df['Pregnancies'], kde=True)
plt.show()

# Box plot of the Pregnancies variable
sns.boxplot(x=df['Pregnancies'])
plt.show()
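
The frequency table built for the Pregnancies variable can also be produced with `value_counts(normalize=True)`, which returns proportions directly. A minimal sketch on a small made-up series; the values and the names `pregnancies` and `freq_table` are illustrative and not taken from diabetes.csv.

```python
import pandas as pd

# Made-up values for illustration only
pregnancies = pd.Series([0, 1, 1, 2, 1, 0, 3, 1], name='Pregnancies')

# Absolute frequency of each distinct value
counts = pregnancies.value_counts()

# Relative frequency as a percentage, rounded to one decimal place
percent = (pregnancies.value_counts(normalize=True) * 100).round(1)

# Both series share the same index, so they align row by row
freq_table = pd.DataFrame({'count': counts, 'percentage': percent})
freq_table.index.name = 'pregnancies'
print(freq_table)
```

Because the percentages are kept as floats here, no precision is lost to integer truncation.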
OUTPUT:
