Practical Question – 1
PANDAS SERIES QUESTION - 1
Study the following series –
1 Sunday
2 Monday
3 Tuesday
4 Wednesday
5 Thursday
Create the same series using
(a) a NumPy array
(b) a dictionary
Input:
(a) import pandas as pd
import numpy as np
days =['Sunday','Monday','Tuesday','Wednesday','Thursday']
array = np.array(days)
s1 = pd.Series (array, index = [1, 2, 3, 4, 5])
print (s1)
(b)import pandas as pd
dict1 = {1:'Sunday', 2:'Monday', 3:'Tuesday',4:'Wednesday', 5:'Thursday'}
s2 = pd.Series (dict1)
print (s2)
Output
(a)
(b)
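As a quick check, both routes should produce the same Series. A minimal sketch, assuming pandas and NumPy are installed (the names from_array and from_dict are illustrative), builds it both ways and compares labels and values:

```python
import numpy as np
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday']

# (a) from a NumPy array, with explicit index labels 1 to 5
from_array = pd.Series(np.array(days), index=[1, 2, 3, 4, 5])

# (b) from a dictionary: the keys become the index, the values become the data
from_dict = pd.Series(dict(zip(range(1, 6), days)))

print(from_array.tolist() == from_dict.tolist())              # same values
print(list(from_array.index) == list(from_dict.index))        # same labels
```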
Practical Question – 2
PANDAS SERIES QUESTION - 2
A series that stores the average marks scored by 10 students is as follows –
[90, 89, 78, 91, 80, 88, 95, 98, 75, 97]
Write code to:
(a) Create a series using the given dataset with index values 1 to 10 generated using arange().
(b) Name the series ‘AVERAGES’ and its index ‘ROLL NUMBER’.
(c)Display the top three averages.
(d) Display all mark averages less than 80.
(e) Update the average of roll number (index) 5 to 82 and display the series.
(f) Display mark detail of roll number 7, 8 and 9.
Input:
import pandas as pd
import numpy as np
dataset = [90, 89, 78, 91, 80, 88, 95, 98, 75, 97]
index_array = np.arange(1, 11, 1)
s1 = pd.Series (dataset, index = index_array)
(a)print(s1)
(b)s1.name = 'AVERAGES' ; s1.index.name = 'ROLL NUMBER'
print (s1)
(c) print (s1.nlargest(3))   # three highest averages; head(3) would only show the first three rows
(d) print(s1[s1<80])
(e) s1[5] = 82
print (s1)
(f) print(s1.loc[7 : 9])   # labels 7, 8 and 9; iloc[7 : 10] would give roll numbers 8 to 10
Output
(a)
(b)
(c)
(d)
(e)
(f)
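Parts (c) and (f) hinge on two easy-to-confuse distinctions: the first rows versus the largest values, and label slicing versus position slicing. A short sketch, assuming pandas and NumPy, makes the difference visible:

```python
import numpy as np
import pandas as pd

marks = pd.Series([90, 89, 78, 91, 80, 88, 95, 98, 75, 97],
                  index=np.arange(1, 11), name='AVERAGES')

first_three = marks.head(3)        # first three rows in index order
top_three = marks.nlargest(3)      # three highest averages, largest first
by_label = marks.loc[7:9]          # labels 7, 8, 9 (end label included)
by_position = marks.iloc[6:9]      # positions 6, 7, 8 (end position excluded)

print(first_three.tolist())            # [90, 89, 78]
print(top_three.tolist())              # [98, 97, 95]
print(by_label.equals(by_position))    # True: both select roll numbers 7 to 9
```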
Practical Question – 3
PANDAS SERIES QUESTION - 3
Write a program to store employees’ salary data of one year.
Write code to do the following:
Salary_data = [120000, 120000, 130000, 115000, 300000, 150000, 100000, 250000, 160000,
400000, 250000, 350000]
Index = Jan, Feb, March, April, May, June, July, Aug, Sep, Oct, Nov, Dec
(a) Display the salary data in 4 parts (quarters) using slicing.
(b) Display the salary for April.
(c) Apply a 10% increment to all salary values.
(d) Give an arrear of 2400 to employees in April.
Input:
import pandas as pd
salary_data = [120000, 120000, 130000, 115000, 300000,
150000, 100000, 250000, 160000, 400000, 250000, 350000]
index_data = ['Jan', 'Feb', 'March', 'April', 'May','June', 'July', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
s1 = pd.Series (salary_data, index = index_data)
print (s1)
(a)print ("First Quarter Salary Details : ")
print (s1.loc['Jan' : 'March'])
print ("Second Quarter Salary Details : ")
print (s1.loc['April' : 'June'])
print ("Third Quarter Salary Details : ")
print (s1.loc['July' : 'Sep'])
print ("Fourth Quarter Salary Details : ")
print (s1.loc['Oct' : 'Dec'])
(b)print (s1.loc['April':'April'])
(c)s2 = s1 + (s1*0.1)
print (s2)
(d) s1.loc['April'] = s1.loc['April'] + 2400
print (s1)
Output
(a)
(b)
(c)
(d)
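Part (a) hard-codes the month labels of each quarter. An alternative sketch, assuming pandas, slices by position in a loop so every quarter is exactly three months; it also shows that a single label returns a plain value rather than a Series:

```python
import pandas as pd

months = ['Jan', 'Feb', 'March', 'April', 'May', 'June',
          'July', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
salary = pd.Series([120000, 120000, 130000, 115000, 300000, 150000,
                    100000, 250000, 160000, 400000, 250000, 350000], index=months)

# Quarter q (0-based) covers positions 3q, 3q + 1 and 3q + 2
quarters = {}
for q in range(4):
    quarters[f"Q{q + 1}"] = salary.iloc[3 * q: 3 * q + 3]
    print(f"Quarter {q + 1}:")
    print(quarters[f"Q{q + 1}"])

april = salary.loc['April']        # single label: a scalar, not a Series
incremented = salary * 1.10        # same result as salary + salary * 0.1
```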
Practical Question – 4
PANDAS SERIES QUESTION - 4
Create a Series as follows :
1 5
2 10
3 15
4 20
5 25
(a)Create a similar series but with indices 5, 4, 3, 2, 1
(b)Remove the entry with index 5
Input:
import pandas as pd
import numpy as np
a = np.arange(5, 26, 5)
s1 = pd.Series(a, index = [1, 2, 3, 4, 5])
print (s1)
(a)s2 = s1.reindex ([5, 4, 3, 2, 1])
print (s2)
(b)s3 = s1.drop(5)
print (s3)
Output
(a)
(b)
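reindex() keeps every value attached to its original label and only changes the order; it does not renumber the values. The sketch below, assuming pandas and NumPy, contrasts it with building a new Series on the same values, and shows that reindexing to a label that does not exist inserts NaN:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(5, 26, 5), index=[1, 2, 3, 4, 5])

reordered = s1.reindex([5, 4, 3, 2, 1])                       # each label keeps its value
relabelled = pd.Series(s1.to_numpy(), index=[5, 4, 3, 2, 1])  # same values, new labels
padded = s1.reindex([1, 2, 6])                                # label 6 is missing -> NaN

print(reordered.tolist())   # [25, 20, 15, 10, 5]
print(relabelled.tolist())  # [5, 10, 15, 20, 25]
print(padded)
```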
Practical Question – 5
DATAFRAME - CREATION
Study the following Data Frame representing quarterly sales data of 2017, 2018 and 2019 and
create the same using
(a)Dictionary of Series
(b)List of Dictionaries.
Input:
(a) import pandas as pd
s1 = pd.Series([400000, 350000, 470000, 450000], index = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])
s2 = pd.Series([420000, 370000, 490000, 470000], index = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])
s3 = pd.Series([430000, 380000, 500000, 480000], index = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])
data = {2017 : s1, 2018 : s2, 2019 : s3}
df1 = pd.DataFrame (data)
print (df1)
(b)L = [{2017 : 400000, 2018 : 420000, 2019 : 430000},
{2017 : 350000, 2018 : 370000, 2019 : 380000},
{2017 : 470000, 2018 : 490000, 2019 : 500000},
{2017 : 450000, 2018 : 470000, 2019 : 480000}]
df2 = pd.DataFrame (L, index = ['Qtr1', 'Qtr2', 'Qtr3',
'Qtr4'])
print (df2)
Output
(a)
(b)
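A third common route is a two-dimensional NumPy array with explicit row and column labels. A sketch, assuming pandas and NumPy, that builds the same table that way and from a dictionary of lists, checks that they match, and totals each year (dtype is fixed to int64 so both frames have the same type on every platform):

```python
import numpy as np
import pandas as pd

quarters = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4']
sales = np.array([[400000, 420000, 430000],
                  [350000, 370000, 380000],
                  [470000, 490000, 500000],
                  [450000, 470000, 480000]], dtype=np.int64)
df_from_array = pd.DataFrame(sales, index=quarters, columns=[2017, 2018, 2019])

df_from_lists = pd.DataFrame({2017: [400000, 350000, 470000, 450000],
                              2018: [420000, 370000, 490000, 470000],
                              2019: [430000, 380000, 500000, 480000]},
                             index=quarters)

print(df_from_array.equals(df_from_lists))   # same labels and values
print(df_from_array.sum())                    # total sales per year
```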
Practical Question – 6
DATAFRAME – ADD AND REMOVE OPERATIONS
Create a Data Frame showing details of employees with Name, Department, Salary, and
Bonus amount. The employee codes should be the indices of the Data Frame as shown.
Perform the following operations –
(a)Create the data frame using dictionary of lists and display the same.
(b) Sort the data frame in alphabetical order of name and display it.
(c) Add a new column ‘Total Salary’, where ‘Total Salary = Salary + Bonus’. Display the updated data frame.
(d) Remove all details of E103, since he has left the company. Display the modified dataset.
Input:
(a) import pandas as pd
s1 = {'Name' : ['Rohith', 'Ajay', 'Pankaj', 'Anumod'],
'Dept' : ['HR', 'Admin', 'Sales', 'Sales'],
'Salary': [5000, 4000, 3500, 3500],
'Bonus' : [3000, 2000, 1500, 1500]}
df1 = pd.DataFrame (s1, index = ['E101', 'E102', 'E103','E104'])
print (df1)
(b) df2 = df1.sort_values(by=['Name'])
print (df2)
(c) df1['Total Salary'] = df1['Salary'] + df1['Bonus']
print (df1)
(d) df1.drop('E103', inplace = True)
print (df1)
Output
(a)
(b)
(c)
(d)
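drop() removes rows or columns depending on the argument, and sort_values() accepts a direction. A sketch, assuming pandas, with a descending sort on the new column; kind='stable' keeps tied rows (E103 and E104) in their original order:

```python
import pandas as pd

emp = pd.DataFrame({'Name': ['Rohith', 'Ajay', 'Pankaj', 'Anumod'],
                    'Dept': ['HR', 'Admin', 'Sales', 'Sales'],
                    'Salary': [5000, 4000, 3500, 3500],
                    'Bonus': [3000, 2000, 1500, 1500]},
                   index=['E101', 'E102', 'E103', 'E104'])
emp['Total Salary'] = emp['Salary'] + emp['Bonus']

by_name = emp.sort_values(by='Name')                                        # A to Z
by_total = emp.sort_values(by='Total Salary', ascending=False, kind='stable')
no_e103 = emp.drop(index='E103')        # removes a row by its index label
no_bonus = emp.drop(columns='Bonus')    # removes a column
print(by_total)
```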
Practical Question – 7
DATAFRAME - ITERATIONS
Create a Data Frame as shown using list of dictionaries.
Iterate over rows and columns and display results.
Input:
import pandas as pd
data = [{'Name': 'Aparna', 'Degree' : 'MBA', 'Score' :90},
{'Name' : 'Pankaj', 'Degree' : 'BCA', 'Score' : 40},
{'Name' : 'Sudhir', 'Degree' : 'M. Tech', 'Score' : 80},
{'Name' : 'Geeku' , 'Degree' : 'MBA', 'Score' : 98}]
df1 = pd.DataFrame (data)
print (df1)
for (x, y) in df1.iterrows():
    print ("Iterating ROWS")
    print ('Row Index', "\n", x)
    print ('Row Values', "\n", y)
# iteritems() was removed in pandas 2.0; items() yields the same (column name, Series) pairs
for (a, b) in df1.items():
    print ("Iterating COLUMNS")
    print ('Column Name', "\n", a)
    print ('Column Value', "\n", b)
Output
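iterrows() builds a new Series for every row, which is convenient but slow on large frames. itertuples() is a common lighter alternative; a sketch, assuming pandas, that walks the same data with itertuples() and items():

```python
import pandas as pd

df = pd.DataFrame([{'Name': 'Aparna', 'Degree': 'MBA', 'Score': 90},
                   {'Name': 'Pankaj', 'Degree': 'BCA', 'Score': 40},
                   {'Name': 'Sudhir', 'Degree': 'M. Tech', 'Score': 80},
                   {'Name': 'Geeku', 'Degree': 'MBA', 'Score': 98}])

# itertuples(): one namedtuple per row; fields are Index plus the column names
names = []
for row in df.itertuples():
    print(row.Index, row.Name, row.Score)
    names.append(row.Name)

# items(): (column label, column Series) pairs
columns = []
for label, column in df.items():
    print(label, column.tolist())
    columns.append(label)
```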
Practical Question – 8
DATAFRAME – ITERATION AND UPDATION
Write a program to iterate over a Data Frame containing names and marks, calculate a grade from each student's marks as per the guidelines (90 and above: A+, 70 to 89: A, 60 to 69: B, 50 to 59: C, 40 to 49: D, below 40: F) and store it in the Grade column.
Input:
import pandas as pd
data = {'Name' : ['Sajeev', 'Rajeev', 'Sanjay', 'Abhay'],
        'Marks': [76, 86, 55, 54],
        'Grade': [None, None, None, None]}   # empty text column, filled in by the loop
print ("**** DataFrame before updation ****")
df1 = pd.DataFrame (data); print (df1)
for (x, y) in df1.iterrows():
    marks = y['Marks']   # read by column name; y[1] relies on deprecated positional access
    if marks >= 90:
        df1.loc[x, 'Grade'] = 'A+'
    elif marks >= 70:
        df1.loc[x, 'Grade'] = 'A'
    elif marks >= 60:
        df1.loc[x, 'Grade'] = 'B'
    elif marks >= 50:
        df1.loc[x, 'Grade'] = 'C'
    elif marks >= 40:
        df1.loc[x, 'Grade'] = 'D'
    else:
        df1.loc[x, 'Grade'] = 'F'
print ("**** DataFrame after updation ****")
print (df1)
Output
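The loop grades one row at a time. The same rule can be applied to the whole column at once with pd.cut(); a sketch, assuming pandas and NumPy, with the bin edges taken from the thresholds in the loop (left-closed, so 90 is A+ and 40 is D):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Sajeev', 'Rajeev', 'Sanjay', 'Abhay'],
                   'Marks': [76, 86, 55, 54]})

# right=False makes each bin [left, right): [0, 40) -> F, [40, 50) -> D, ..., [90, inf) -> A+
bins = [0, 40, 50, 60, 70, 90, np.inf]
labels = ['F', 'D', 'C', 'B', 'A', 'A+']
df['Grade'] = pd.cut(df['Marks'], bins=bins, labels=labels, right=False).astype(str)
print(df)
```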
Practical Question – 9
DATAFRAME – STATISTICAL FUNCTIONS
Create a Data Frame named ‘Cricket’ and apply the common statistical functions to it (maximum, minimum, sum, mean, mode, standard deviation and variance).
Input:
import pandas as pd
data = {'Name' : ['Sachin', 'Dhoni', 'Virat', 'Rohit','Shikhar'],
'Age' : [26, 25, 25, 24, 31], 'Score' : [87,67, 89, 55, 47]}
df1 = pd.DataFrame (data); print (df1)
print ("Max score : ", df1['Score'].max())
print ("Min score : ", df1['Score'].min())
print ("Sum of score : ", df1['Score'].sum())
print ("Mean/Avg of score : ", df1['Score'].mean())
print ("Mode of score : ", df1['Score'].mode())
print ("Standard deviation of score : ",df1['Score'].std())
print ("Variance of score : ", df1['Score'].var())
Output
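describe() reports several of these statistics in one call, and median() adds one that the list above leaves out. A sketch, assuming pandas; note that no score repeats in this data, so mode() returns all five scores:

```python
import pandas as pd

cricket = pd.DataFrame({'Name': ['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar'],
                        'Age': [26, 25, 25, 24, 31],
                        'Score': [87, 67, 89, 55, 47]})

print(cricket[['Age', 'Score']].describe())   # count, mean, std, min, quartiles, max
print("Median score:", cricket['Score'].median())           # 67.0
# mode() returns every most-frequent value; when nothing repeats, that is all of them
print("Mode of age:", cricket['Age'].mode().tolist())       # [25]
print("Mode of score:", cricket['Score'].mode().tolist())   # [47, 55, 67, 87, 89]
```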
Practical Question –10
DATAFRAME – ADDITION OF ROWS AND COLUMNS
Consider the following Data Frame.
Create the above Data Frame, add the rows for Mike and Saahil using the append() function, and then add job information as follows: Engr, Engr, Dr, Dr, HR, Analyst, HR.
Input:
import pandas as pd
data = {'Name' : ['Jack', 'Riti', 'Vikas', 'Neelu','John'],
'Age' : [34, 30, 31, 32, 16],
'City' : ['Sydney','Delhi', 'Mumbai', 'Bangalore', 'New York'],
'Country' :['Australia', 'India', 'India', 'India', 'US']}
df1 = pd.DataFrame (data); print (df1)
# DataFrame.append() was removed in pandas 2.0; pd.concat() adds the new rows instead
new_rows = pd.DataFrame([{'Name' : 'Mike', 'Age' : 17, 'City' : 'Las Vegas', 'Country' : 'US'},
                         {'Name' : 'Saahil', 'Age' : 12, 'City' : 'Mumbai', 'Country' : 'India'}])
df3 = pd.concat([df1, new_rows], ignore_index = True)
print (df3)
df3 ['Job'] = ['Engr', 'Engr', 'Dr', 'Dr', 'HR','Analyst', 'HR']
print (df3)
Output
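Another way to add single rows without append() is label-based assignment with .loc. A sketch, assuming pandas and the default 0 to n-1 index; it also uses insert() to place the Job column at a chosen position instead of at the end:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Jack', 'Riti', 'Vikas', 'Neelu', 'John'],
                       'Age': [34, 30, 31, 32, 16],
                       'City': ['Sydney', 'Delhi', 'Mumbai', 'Bangalore', 'New York'],
                       'Country': ['Australia', 'India', 'India', 'India', 'US']})

# With the default integer index, len(people) is the next unused label,
# so assigning to it through .loc appends a row in place
people.loc[len(people)] = ['Mike', 17, 'Las Vegas', 'US']
people.loc[len(people)] = ['Saahil', 12, 'Mumbai', 'India']

# insert(position, name, values) adds the column as the second column here
people.insert(1, 'Job', ['Engr', 'Engr', 'Dr', 'Dr', 'HR', 'Analyst', 'HR'])
print(people)
```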
Practical Question –11
DATAFRAME – REMOVAL OF ROWS AND COLUMNS
Consider the following Data Frame.
(a)Create the data frame
(b)Remove all details of Alpa
(c)Remove English and IP columns
(d) Display Physics and Chemistry marks of Suman and Gayatri only.
Input:
import pandas as pd
data = {'Name' : ['Suman', 'Gayatri', 'Vishruti', 'Alpa','Hetal'],
'English' : [74, 79, 48, 53, 68], 'Physics' :[76, 78, 80, 76, 73],
'Chemistry' : [57, 74, 55, 89, 70],'Biology' : [76, 85, 63, 68, 59],
'IP' : [82, 93, 69, 98,79]}
df1 = pd.DataFrame (data); print(df1)
df1.drop(index = 3, inplace = True); print (df1)
df1.drop(columns = ['English', 'IP'], inplace = True);
print (df1)
print(df1.loc[0 : 1, 'Name' : 'Chemistry'])
Output
(a)
(b)
(c)
(d)
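The code above removes Alpa by her position (index 3) and picks rows 0 and 1 by number. A sketch, assuming pandas, that selects by value instead, so it keeps working if the row order changes:

```python
import pandas as pd

marks = pd.DataFrame({'Name': ['Suman', 'Gayatri', 'Vishruti', 'Alpa', 'Hetal'],
                      'English': [74, 79, 48, 53, 68], 'Physics': [76, 78, 80, 76, 73],
                      'Chemistry': [57, 74, 55, 89, 70], 'Biology': [76, 85, 63, 68, 59],
                      'IP': [82, 93, 69, 98, 79]})

# (b) filter on the name instead of relying on Alpa being at index 3
marks = marks[marks['Name'] != 'Alpa']
# (c) drop() with columns= returns a new frame without those columns
marks = marks.drop(columns=['English', 'IP'])
# (d) make Name the index so rows can be picked by student name
subset = marks.set_index('Name').loc[['Suman', 'Gayatri'], ['Physics', 'Chemistry']]
print(subset)
```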
Practical Question –12
DATAFRAME – ACCESSING ROWS AND COLUMNS
Consider the given Data Frame :
(a) Display the Accountancy and BST marks of Vijay and Deepak.
(b)Display all details of Rajat.
(c)Display Eco marks of all Students.
(d) Display all marks of Deepak and Ravi.
Input:
import pandas as pd
data = {'Eco' : [89, 45, 77, 62],
'Acc' : [79, 56,73, 42], 'BST' : [83, 39, 48, 72],
'House' : ['Mars','Mars', 'Saturn', 'Jupiter']}
df1 = pd.DataFrame (data, index = ['Vijay', 'Deepak','Ravi', 'Rajat'])
print (df1)
print (df1.loc['Vijay' : 'Deepak', 'Acc' : 'BST'])
print (df1.loc['Rajat'])
print (df1['Eco'])
print (df1.loc['Deepak' : 'Ravi', 'Eco' : 'BST'])
Output
(a)
(b)
(c)
(d)
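Label slices such as 'Vijay' : 'Deepak' only work for neighbouring rows. A sketch, assuming pandas, of three other accessors: lists with .loc, positions with .iloc (end position excluded), and .at for a single cell:

```python
import pandas as pd

df1 = pd.DataFrame({'Eco': [89, 45, 77, 62], 'Acc': [79, 56, 73, 42],
                    'BST': [83, 39, 48, 72],
                    'House': ['Mars', 'Mars', 'Saturn', 'Jupiter']},
                   index=['Vijay', 'Deepak', 'Ravi', 'Rajat'])

non_adjacent = df1.loc[['Vijay', 'Ravi'], ['Eco', 'BST']]  # lists pick rows/columns that are not side by side
by_position = df1.iloc[1:3, 0:3]    # rows 1-2 (Deepak, Ravi), columns 0-2 (Eco, Acc, BST)
ravi_eco = df1.at['Ravi', 'Eco']    # one cell by row and column label
print(non_adjacent)
print(by_position)
print(ravi_eco)   # 77
```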
Practical Question –13
DATAFRAME – ACCESSING ELEMENTS USING OPERATIONS
Create the Data Frame shown :
(a)Select rows where age is greater than 28
(b)Select all cases where age is greater than 28 and grade is “A”
(c)Select the degree cell where age is greater than 28 and grade is “A”
(d) Display details of MBA and MS graduates
(e) Update Robin’s grade to B.
Input:
import pandas as pd; import numpy as np
data = {'first_name': ['Sam', 'Ziva', 'Kia', 'Robin','Kim'],
'degree': ["MBA", "MS", "Graduate", "Arts","MS"],
'nationality': ["USA", "India", "UK", "France","Canada"],
'age': [25, 29, 19, 21, 33], 'grade':['A+','A', 'C', np.nan, 'B-']}   # np.NaN was removed in NumPy 2.0
df1 = pd.DataFrame(data, columns = ['first_name','degree','nationality',
'age','grade'])
print(df1)
print(df1[df1['age']>28])
print (df1[(df1['age']>28) & (df1['grade'] =='A')])
print (df1[(df1['age']>28) & (df1['grade'] =='A')]['degree'])
print (df1[(df1['degree'] == 'MBA') | (df1['degree'] =='MS')])
df1.iat [3,4] = 'B'; print (df1)
Output
(a)
(b)
(c)
(d)
(e)
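Boolean masks can also be written with query() and isin(). A sketch, assuming pandas and NumPy, that repeats parts (b), (d) and (e); selecting Robin by name avoids depending on his row and column numbers, as iat[3, 4] does:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'first_name': ['Sam', 'Ziva', 'Kia', 'Robin', 'Kim'],
                    'degree': ['MBA', 'MS', 'Graduate', 'Arts', 'MS'],
                    'nationality': ['USA', 'India', 'UK', 'France', 'Canada'],
                    'age': [25, 29, 19, 21, 33],
                    'grade': ['A+', 'A', 'C', np.nan, 'B-']})

older_a = df1.query("age > 28 and grade == 'A'")        # same filter as (b), written as a string
mba_ms = df1[df1['degree'].isin(['MBA', 'MS'])]          # same rows as (d)
df1.loc[df1['first_name'] == 'Robin', 'grade'] = 'B'     # (e) by name, not by cell position
print(older_a)
print(mba_ms)
print(df1)
```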
Practical Question –14
DATAFRAME – BOOLEAN INDEXING
Create a Data Frame containing online classes information as follows :
(a)Display all details of online classes.
(b)Display all records of True index.
(c)Display all records of False index.
Input:
import pandas as pd
data = {'Days' : ['Sunday', 'Monday', 'Tuesday','Wednesday', 'Thursday'],
'Noofclasses' : [6, 0, 3, 0,8]}
df1 = pd.DataFrame (data, index = [True, False, True,False, True])
print (df1)
print (df1.loc[True])
print (df1.loc[False])
Output
(a)
(b)
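In this practical the True/False labels are typed by hand, and they match the days that have classes. A sketch, assuming pandas, that derives the same mask from the Noofclasses column instead:

```python
import pandas as pd

classes = pd.DataFrame({'Days': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'],
                        'Noofclasses': [6, 0, 3, 0, 8]})

has_class = classes['Noofclasses'] > 0   # True where at least one class is held
print(classes[has_class])                # the rows labelled True above
print(classes[~has_class])               # ~ flips True and False
```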
Practical Question –15
CSV IMPORT AND EXPORT
Import the following data from the CSV File “PriceList”.
Increase the price of all items by 2% and export the updated data to another CSV File
“PriceList_Updated”.
Input:
import pandas as pd
df = pd.read_csv("PriceList.csv")
print (df)
df['Price'] = df['Price'] + (df['Price']*0.02)
print(df)
df.to_csv(r"PriceList_Updated.csv", index = False)   # index=False leaves out the row-number column
Output
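The read, update and write steps can be tried without the actual files by using io.StringIO in place of a file path. The sketch assumes pandas; the two items and their prices are made-up placeholders, and only the Price column name comes from the code above:

```python
import io
import pandas as pd

# Hypothetical two-item price list; the real PriceList.csv is not reproduced here
csv_text = "Item,Price\nPen,10\nNotebook,50\n"
df = pd.read_csv(io.StringIO(csv_text))

df['Price'] = (df['Price'] * 1.02).round(2)   # 2% increase, rounded to 2 decimals

out = io.StringIO()
df.to_csv(out, index=False)                   # without index=False an extra unnamed column is written
reloaded = pd.read_csv(io.StringIO(out.getvalue()))
print(reloaded)
```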
Practical Question –16
DATA HANDLING USING CSV FILES
Create a menu driven program to perform the following:
(a) Add details and create a file “Stu.csv”.
(b) Update details and modify csv.
(c)Delete details.
(d) View details.
(e) Display Graph.
Input:
import pandas as pd; import matplotlib.pyplot as plt
print ("**** MENU ****")
print ("1. Add Details\n2. Update Details\n3. Delete Details\n4. View all Details\n5. Display Graph\n")
choice = int(input("Enter choice (1-5) : "))
if choice == 1:
    df = pd.DataFrame (columns = ['Roll No.', 'Name', 'Marks'])
    n = int(input ("Enter no. of students : "))
    for i in range (n):
        rn = int(input("Roll number : "))
        name = input ("Enter name : ")
        marks = float(input("Enter marks : "))
        df.loc[i] = [rn, name, marks]
    print (df)
    df.to_csv (r"Stu.csv", index = False)
elif choice == 2 :
    df = pd.read_csv(r"Stu.csv")
    rn = int(input("Roll number : "))
    name = input ("Enter name : ")
    marks = float(input("Enter marks : "))
    # Find the row by its roll number instead of assuming it sits at position rn - 1
    row = df['Roll No.'] == rn
    df.loc[row, 'Name'] = name
    df.loc[row, 'Marks'] = marks
    print (df)
    df.to_csv(r"Stu.csv", index = False)
elif choice == 3:
    df = pd.read_csv(r"Stu.csv")
    rn = int(input("Roll number : "))
    df1 = df[df['Roll No.'] != rn]   # keep every row except the given roll number
    print (df1)
    df1.to_csv(r"Stu.csv", index = False)   # index=False, as in options 1 and 2
elif choice == 4:
    df = pd.read_csv(r"Stu.csv")
    print (df)
elif choice == 5:
    df1 = pd.read_csv(r"Stu.csv")
    print ("**** Your Graph ****")
    x = df1['Name'].values.tolist()
    y = df1['Marks'].values.tolist()
    plt.bar(x, y, width = 0.5)
    plt.xlabel ("Name"); plt.ylabel ("Marks")
    plt.title("Students vs. Marks")
    plt.show()
Output
(a)
(b)
(c)
(d)
(e)
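The menu mixes input(), file access and data changes, which makes the update and delete steps hard to check on their own. A sketch of those two steps as plain functions, assuming pandas; update_student and delete_student are hypothetical names, and the sample roll numbers, names and marks are placeholders. Matching on the Roll No. column, rather than on row position rn - 1, keeps them correct when roll numbers are not 1, 2, 3 and so on:

```python
import pandas as pd

# Hypothetical helpers: they work on a DataFrame, so they can be tested without input() or files
def update_student(df, roll_no, name, marks):
    """Return a copy with the Name and Marks of the given roll number replaced."""
    df = df.copy()
    match = df['Roll No.'] == roll_no
    if not match.any():
        raise KeyError(f"Roll number {roll_no} not found")
    df.loc[match, 'Name'] = name
    df.loc[match, 'Marks'] = marks
    return df

def delete_student(df, roll_no):
    """Return a copy without the row(s) for the given roll number."""
    return df[df['Roll No.'] != roll_no].reset_index(drop=True)

students = pd.DataFrame({'Roll No.': [3, 7, 9], 'Name': ['A', 'B', 'C'],
                         'Marks': [55.0, 71.0, 64.0]})
students = update_student(students, 7, 'B', 80.0)
students = delete_student(students, 3)
print(students)
```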
Practical Question –17
DATA VISUALISATION – LINE AND BAR CHARTS
Consider the data given below. Using this data, plot the following:
(a)A line chart depicting price of apps.
(b)A bar chart depicting download of apps.
(c) Divide the download values by 1000 and create a multi-bar chart depicting price and converted download values.
The charts should have appropriate titles, legends and labels.
Input:
(a)
import matplotlib.pyplot as plt
apps = ['Angry Birds', 'Teen Titan', 'Marvel Comics','ColorMe', 'Fun Run',
'Crazy Taxi']
price = [75, 120, 190, 245, 550, 55]
plt.plot(apps, price, 'royalblue',
ls = '-.', linewidth= 0.7, marker = '*', ms = 10,
mec = 'black', mfc ='midnightblue')
plt.xlabel ('App Name'); plt.ylabel ('Price')
plt.title ('Apps and Their Prices')
plt.show()
(b)
import matplotlib.pyplot as plt
apps = ['Angry Birds', 'Teen Titan', 'Marvel Comics','ColorMe', 'Fun Run',
'Crazy Taxi']
downloads = [197000, 209000, 414000, 196000, 272000,311000]
plt.bar(apps, downloads, width = 0.5, color ='royalblue')
plt.xlabel ('Apps'); plt.ylabel ('No. of downloads')
plt.title ('Apps and Their Number of Downloads')
plt.show()
(c)
import matplotlib.pyplot as plt; import numpy as np
Apps = ['Angry Birds', 'Teen Titan', 'Marvel Comics','ColorMe', 'Fun Run',
'Crazy Taxi']
p = [75, 120, 190, 245, 550, 55]
downloads = [197000, 209000, 414000, 196000, 272000, 311000]
d = [v / 1000 for v in downloads]   # downloads in thousands, as part (c) asks
a = np.arange(len(Apps))
plt.bar(a-0.2, p, color = 'b', width = 0.4, label ='Price')
plt.bar(a + 0.2, d, color = 'k', width = 0.4, label ='Downloads (thousands)')
plt.xlabel ('Apps'); plt.ylabel ('Price & No. of downloads (thousands)')
plt.xticks(a, Apps)
plt.title ('Apps, Prices and Number of Downloads')
plt.legend(loc = 'upper left')
plt.show()
Output
(a)
(b)
(c)
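The multi-bar chart works by shifting each series sideways by a fraction of the bar width (a - 0.2 and a + 0.2 here). A small helper, offered as a sketch and assuming NumPy, computes those shifts for any number of series so each group stays centred on its tick; group_offsets is a hypothetical name:

```python
import numpy as np

def group_offsets(n_series, width):
    """Sideways shift for each of n bar series so that every group is centred on its tick."""
    return (np.arange(n_series) - (n_series - 1) / 2) * width

print(group_offsets(2, 0.4).tolist())    # [-0.2, 0.2], the shifts used in part (c)
print(group_offsets(3, 0.25).tolist())   # [-0.25, 0.0, 0.25]
```

With offsets = group_offsets(2, 0.4), series i would be drawn with plt.bar(a + offsets[i], values, width = 0.4).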
Practical Question –18
DATA VISUALISATION – MULTILINE AND MULTIBAR CHARTS
Consider the data given below. Using this data, plot the following:
(a)A line chart depicting rainfall trend from Jan to July.
(b) A multi-bar chart representing the rainfall measurements for the first quarter of the year for each of the North, South and Central regions.
The charts should have appropriate titles, legends, and labels.
Input:
(a)
import numpy as np
import matplotlib.pyplot as plt
month = ['Jan', 'Feb', 'March', 'April', 'May', 'June','July']
north = [140, 130, 130, 190, 160, 200, 150]
south = [160, 200, 130, 200, 200, 170, 110]
east = [140, 180, 150, 170, 190, 140, 170]
west = [180, 150, 200, 120, 180, 140, 110]
central = [110, 160, 130, 110, 120, 170, 130]
x_axis = np.arange (len (central))
plt.plot (x_axis, north, label = 'North')
plt.plot (x_axis, south, label = 'South')
plt.plot (x_axis, east, label = 'East')
plt.plot (x_axis, west, label = 'West')
plt.plot (x_axis, central, label = 'Central region')
plt.xlabel ('Months')
plt.ylabel ('Rainfall (in mm)')
plt.xticks (x_axis, month)
plt.title ('Trends in Rainfall from Jan to July')
plt.legend (loc = 'upper left')
plt.show()
(b)
import numpy as np
import matplotlib.pyplot as plt
quarter=['Jan','Feb','Mar']
north = [140, 130, 130]
south = [160, 200, 130]
central = [110, 160, 130]
x_axis = np.arange (len(north))
plt.bar(x_axis, north, width = 0.25, label = 'North region', color = 'springgreen')
plt.bar(x_axis-0.25, south, width = 0.25, label = 'South region', color = 'crimson')
plt.bar(x_axis+0.25, central, width = 0.25, label = 'Central region', color = 'gold')
plt.xlabel ('Months'); plt.ylabel ('Rainfall (in mm)')
plt.title ('Trends in Rainfall of First Quarter (Jan/Feb/March)')
plt.legend(loc = 'upper left')
plt.xticks (x_axis, quarter)
plt.grid (True)
plt.show()
Output
(a)
(b)
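Part (b) retypes the first-quarter values by hand. A sketch, assuming pandas, that keeps all five regions in one DataFrame and slices the first quarter from it, so both charts are drawn from the same numbers:

```python
import pandas as pd

months = ['Jan', 'Feb', 'March', 'April', 'May', 'June', 'July']
rain = pd.DataFrame({'North': [140, 130, 130, 190, 160, 200, 150],
                     'South': [160, 200, 130, 200, 200, 170, 110],
                     'East': [140, 180, 150, 170, 190, 140, 170],
                     'West': [180, 150, 200, 120, 180, 140, 110],
                     'Central': [110, 160, 130, 110, 120, 170, 130]},
                    index=months)

# First quarter for three regions, sliced from the same table used in part (a)
first_quarter = rain.loc['Jan':'March', ['North', 'South', 'Central']]
print(first_quarter)
print(first_quarter.sum())   # total first-quarter rainfall per region, in mm
```

With matplotlib installed, first_quarter.plot(kind='bar') draws a grouped bar chart directly from this frame.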
Practical Question –19
DATA VISUALISATION – SCHOOL RESULT ANALYSIS
Given the school result data, analyze the performance of students using data visualization techniques.
(a) Draw a bar chart to represent the above data with appropriate labels and title.
(b) Given the subject average data for 3 years, draw a multi-bar chart to represent it with appropriate labels, title, and legend.
(c)Plot a histogram for Marks data of 20 students of a class. The data
(Marks of 20 students) is as follows:
[90, 99, 95, 92, 92, 90, 85, 82, 75, 78, 83, 82, 85, 90, 92, 98, 99, 100]
Input:
(a)
import matplotlib.pyplot as plt; import numpy as np
avg20 = [85, 88, 87, 73, 80, 90]; x_axis = np.arange(len(avg20))
plt.bar(x_axis, avg20, width = 0.2, color = 'khaki')
plt.xticks (x_axis, ['Eng', 'Eco', 'Bst', 'Acc','Entre', 'Eco'])
plt.title ('Subject-wise Average Marks in 2020')
plt.xlabel ('Subjects'); plt.ylabel ('Average Marks')
plt.grid (True)
plt.show()
(b)
import matplotlib.pyplot as plt; import numpy as np
avg18 = [87, 88, 90, 76, 82, 90]
avg19 = [86, 87, 87, 74, 81, 91]
avg20 = [85, 88, 87, 73, 80, 90]
x_axis = np.arange(len(avg18)) #[0, 1, 2, 3, 4, 5]
plt.bar(x_axis, avg18, width = 0.2, label = '2018')
plt.bar(x_axis+0.2, avg19, width = 0.2, label = '2019')
plt.bar(x_axis+0.4, avg20, width = 0.2, label = '2020')
plt.legend(loc = 'upper left')
plt.xticks(x_axis + 0.2, ['Eng', 'Eco', 'Bst', 'Acc','Entre', 'Eco'])   # call xticks(); +0.2 centres each label under its three bars
plt.title ('Subjects Average Marks of 3 Years')
plt.xlabel ('Subjects'); plt.ylabel ('Average Marks')
plt.grid (True)
plt.show()
(c)
import matplotlib.pyplot as plt
class1 = [90, 99, 95, 92, 92, 90, 85, 82, 75, 78, 83,82, 85, 90, 92, 98, 99, 100]
plt.hist(class1, bins = [75, 80, 85, 90, 95, 100],
color = "springgreen", edgecolor = "darkslategrey",linewidth = 2)
plt.xlabel('Marks')
plt.ylabel ('Frequency')
plt.show()
Output
(a)
(b)
(c)
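np.histogram() computes the same counts that plt.hist() draws, which is a quick way to check the bins. A sketch, assuming NumPy; note that the list as given contains 18 values, not 20:

```python
import numpy as np

class1 = [90, 99, 95, 92, 92, 90, 85, 82, 75, 78, 83, 82, 85, 90, 92, 98, 99, 100]
counts, edges = np.histogram(class1, bins=[75, 80, 85, 90, 95, 100])

# Every bin is [left, right) except the last, which also includes 100
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {n}")
print("values counted:", counts.sum())   # 18
```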
Practical Question – 20
COVID DATA ANALYSIS
Import the CSV file “CovidData.csv” and plot a bar chart of Country vs. Total Confirmed Cases.
Input:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv ("CovidData.csv")
country = df['Country'].tolist()
confirmedcases = df['Total Confirmed Cases'].tolist()
plt.bar(country, confirmedcases, width = 0.4, align ='center', color =
'midnightblue')
plt.title ('Total Number of Covid Cases')
plt.xlabel ('Country')
plt.ylabel ('Total Cases')
plt.tick_params(axis='x', labelrotation = 90)
plt.show()
Output
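With many countries the bars become crowded. One option, sketched here assuming pandas, is to sort by case count and keep the top few before building the two lists; the rows and numbers below are placeholders, not real data, and only the two column names come from the code above:

```python
import io
import pandas as pd

# Placeholder rows with made-up numbers; CovidData.csv itself is not reproduced here
csv_text = """Country,Total Confirmed Cases
Country A,500
Country B,1200
Country C,300
Country D,900
"""
df = pd.read_csv(io.StringIO(csv_text))

# Largest first, then keep the top N so the chart stays readable
top = df.sort_values('Total Confirmed Cases', ascending=False).head(3)
country = top['Country'].tolist()
confirmedcases = top['Total Confirmed Cases'].tolist()
print(country, confirmedcases)
```

The resulting lists can be passed to plt.bar() in the same way as in the code above.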
Practical Question – 21
COMPANY SALES DATA ANALYSIS
Import the CSV file “CompanySalesData.csv” and plot a line chart of month number against total profit.
Input:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("CompanySalesData.csv"); print (df)
x = df['month_num'].tolist()
y = df['total_profit'].tolist()
plt.plot (x, y, color = 'royalblue', linewidth = 1.0,
marker = '*', ms = 20, mec = 'black',
mfc ='midnightblue')
plt.xlabel ('Month Number')
plt.ylabel('Total_profit')
plt.title ('Company Sales Data')
plt.show()
Output
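Besides plotting, the same two columns answer simple questions directly. A sketch, assuming pandas, that finds the most profitable month with idxmax(); the rows are placeholder values, and only the column names month_num and total_profit come from the code above:

```python
import io
import pandas as pd

# Placeholder values; CompanySalesData.csv itself is not reproduced here
csv_text = "month_num,total_profit\n1,210000\n2,250000\n3,190000\n4,300000\n"
df = pd.read_csv(io.StringIO(csv_text))

best = df.loc[df['total_profit'].idxmax()]   # row with the highest profit
print("Best month:", best['month_num'], "profit:", best['total_profit'])
print("Total profit:", df['total_profit'].sum())
```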