Python Code
Importing dataset into Jupyter Notebook
• Before you can import a dataset, you must import the Pandas
library/package.
• Import Pandas:
import pandas as pd
# read a dataset from tip.csv file and store it in a dataframe named df
df = pd.read_csv('tip.csv')
# three ways to display the content of the dataset
df
display(df)
print(df)
# to load a dataset that uses a separator other than a comma
# read a dataset from revenue-profit.csv file and store it in a dataframe named df
df = pd.read_csv('revenue-profit.csv', sep=';')
display(df)
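As a small illustration of the sep argument, the sketch below reads semicolon-separated text from memory instead of from a file. The column names and values are made up for this example, and io.StringIO simply stands in for a file such as revenue-profit.csv.

```python
import io
import pandas as pd

# Hypothetical sample data; it stands in for the contents of a
# semicolon-separated file such as revenue-profit.csv.
sample_text = "month;revenue;profit\nJan;1000;200\nFeb;1500;350\n"

# sep=';' tells read_csv that the values are separated by semicolons
df = pd.read_csv(io.StringIO(sample_text), sep=';')
print(df)
```

Without sep=';', each row would be read as one single column, because read_csv expects commas by default.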
Additional code:
# to show all rows and columns of the dataset
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# to check the data type of each column
df.info()
# to see summary statistics for the columns that contain numerical values
df.describe()
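The additional code can be tried on a small dataframe built by hand. This is only a sketch: the day and tip columns are hypothetical sample data, not the contents of tip.csv, and the reset_option calls at the end are an optional extra that restores the default display limits.

```python
import pandas as pd

# Hypothetical sample data (not the real tip.csv)
df = pd.DataFrame({
    'day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    'tip': [2.0, 3.5, 3.5, 5.0, 6.0],
})

# show all rows and columns when a dataframe is displayed
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# column names, non-null counts and data types
df.info()

# count, mean, std, min, quartiles and max of the numerical columns
summary = df.describe()
print(summary)

# restore the default display limits (optional)
pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')
```

Here describe() reports only on the tip column, because day holds text rather than numbers.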
[By: Madam Azimah, ICT Department, CFSIIUM]
Calculating Measures of Central Tendency (Mean, Median, Mode)
• Before you can calculate the mean, median or mode, you must import
the Statistics or NumPy library/package.
• Import Statistics or NumPy:
import statistics as st
import numpy as np
• Choose whether you want to use Statistics or NumPy.
Using Statistics:
mean = st.mean(df.column_name)
median = st.median(df.column_name)
mode = st.mode(df.column_name)
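A minimal worked example of the three Statistics calls is sketched below. The tip column and its values are hypothetical sample data. Converting the column to a plain Python list with tolist() is an assumption of this sketch, made so that the functions receive ordinary Python numbers; it is not a step required by the code above.

```python
import statistics as st
import pandas as pd

# Hypothetical sample data (not the real tip.csv)
df = pd.DataFrame({'tip': [2.0, 3.5, 3.5, 5.0, 6.0]})

# convert the column to a plain Python list (an assumption of this sketch)
tips = df.tip.tolist()

mean = st.mean(tips)      # sum of the values divided by how many there are
median = st.median(tips)  # middle value after sorting
mode = st.mode(tips)      # most frequent value

print(mean, median, mode)
```

For this sample the mean is 20.0 / 5 = 4.0, the median is the third of the five sorted values (3.5), and the mode is 3.5 because it appears twice.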
Using NumPy: (Note: NumPy does not provide a mode function)
mean = np.mean(df.column_name)
median = np.median(df.column_name)
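The NumPy calls can be tried the same way on hypothetical sample data. Because NumPy has no mode function, the sketch falls back on the mode() method that pandas provides on a column. That fallback is one possible alternative suggested here, not something taken from the notes above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the real tip.csv)
df = pd.DataFrame({'tip': [2.0, 3.5, 3.5, 5.0, 6.0]})

mean = np.mean(df.tip)
median = np.median(df.tip)

# pandas' Series.mode() returns every most-frequent value,
# so .iloc[0] picks the first one
mode = df.tip.mode().iloc[0]

print(mean, median, mode)
```

st.mode(df.column_name) from the Statistics library remains another option when only the mode is missing.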