Al-Beruni City: Python Data Analysis MCQs (Practical Scenarios)
Section 1: Data Loading and Initial Inspection
1. Scenario: You need to load Departments.csv into a DataFrame df1 and Tools.csv into df2. Which library
must be imported first?
o a) import matplotlib.pyplot as plt
o b) import numpy as np
o c) import pandas as pd
o d) import csv Answer: c) import pandas as pd
2. Scenario: After loading df1 and df2 using pd.read_csv(), you want to verify the first row of df1. Which
command achieves this?
o a) print(df1.first())
o b) print(df1.loc[1])
o c) print(df1.head(1))
o d) print(df1.iloc[1]) Answer: c) print(df1.head(1))
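A minimal sketch of the load-and-inspect step, assuming Departments.csv and Tools.csv sit in the working directory as the scenario states:

    import pandas as pd

    # Load both files named in the scenario (paths assumed)
    df1 = pd.read_csv('Departments.csv')
    df2 = pd.read_csv('Tools.csv')

    # head(1) returns the first row as a one-row DataFrame;
    # loc[1] / iloc[1] would select the *second* row under the default index
    print(df1.head(1))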
3. Scenario: Imagine Tools.csv has 10 columns and 500 rows (excluding the header). After loading it into df2,
what would df2.shape return?
o a) (10, 500)
o b) (500, 10)
o c) (501, 10)
o d) (500, 11) Answer: b) (500, 10)
4. Scenario: If Departments.csv failed to load correctly due to an incorrect file path, what type of error would
Python typically raise?
o a) TypeError
o b) ValueError
o c) FileNotFoundError
o d) KeyError Answer: c) FileNotFoundError
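A short sketch of the failure mode, using a deliberately wrong (hypothetical) path:

    import pandas as pd

    try:
        df1 = pd.read_csv('wrong_folder/Departments.csv')  # path does not exist
    except FileNotFoundError as err:
        # read_csv raises FileNotFoundError for a missing file or directory
        print('Could not load file:', err)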
Section 2: Merging and Missing Values
5. Scenario: You need to merge df1 (Departments) and df2 (Tools) keeping all rows from df2. Which
pd.merge parameter is crucial for this?
o a) how='inner'
o b) how='left', left_on='Abb', right_on='Abb' (assuming df1 is left)
o c) how='outer', on='Abb'
o d) how='right', left_on='Abb', right_on='Abb' (assuming df1 is left) Answer: d) how='right',
left_on='Abb', right_on='Abb' (Keeps all rows from the right DataFrame, df2)
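A sketch of the right join on toy data (the rows are invented for illustration):

    import pandas as pd

    df1 = pd.DataFrame({'Abb': ['HD', 'ED'], 'Department': ['Health', 'Education']})
    df2 = pd.DataFrame({'Abb': ['HD', 'PS'], 'Tool': ['Scanner', 'Dispatcher']})

    # how='right' keeps every row of df2; the unmatched key 'PS'
    # gets NaN in the columns that come from df1
    merged_df = pd.merge(df1, df2, how='right', left_on='Abb', right_on='Abb')
    print(merged_df)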
6. Scenario: After merging, you run merged_df.isnull().sum(). The output includes the line 'Department    50'. What does this indicate?
o a) The 'Department' column has 50 unique values.
o b) 50 rows have missing values across all columns.
o c) The 'Department' column contains the string "50" in some rows.
o d) There are 50 missing (NaN) values specifically in the 'Department' column. Answer: d) There
are 50 missing (NaN) values specifically in the 'Department' column.
7. Scenario: You need to create a Python dictionary dept_map where keys are 'Abb' and values are
'Department' from df1 to help fill missing values. Assuming 'Abb' is unique in df1, which code snippet
works?
o a) dept_map = df1.groupby('Abb')['Department'].to_dict()
o b) dept_map = dict(df1[['Abb', 'Department']].values)
o c) dept_map = df1.set_index('Abb')['Department'].to_dict()
o d) dept_map = {row['Abb']: row['Department'] for index, row in df1.iterrows()} Answer: c)
dept_map = df1.set_index('Abb')['Department'].to_dict() (Options 'b' and 'd' also build the correct
dictionary when 'Abb' is unique, but 'd' iterates row by row and is far slower than the vectorized 'c';
'a' fails because a GroupBy object has no to_dict() method.)
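A sketch of the mapping step on toy data (values invented):

    import pandas as pd

    df1 = pd.DataFrame({'Abb': ['HD', 'ED'], 'Department': ['Health', 'Education']})

    # set_index('Abb') makes 'Abb' the Series index, so to_dict() yields Abb -> Department
    dept_map = df1.set_index('Abb')['Department'].to_dict()
    print(dept_map)  # {'HD': 'Health', 'ED': 'Education'}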
8. Scenario: To fill missing 'Department' values in merged_df using the dept_map created previously, which is
the most appropriate Pandas method?
o a) merged_df['Department'].fillna(merged_df['Abb'].apply(lambda x: dept_map.get(x)),
inplace=True)
o b) merged_df['Department'] = merged_df['Department'].replace(np.nan,
merged_df['Abb'].map(dept_map))
o c) merged_df['Department'].fillna(merged_df['Abb'].map(dept_map), inplace=True)
o d) merged_df.loc[merged_df['Department'].isnull(), 'Department'] =
merged_df['Abb'].map(dept_map) Answer: c)
merged_df['Department'].fillna(merged_df['Abb'].map(dept_map), inplace=True) (Option 'd' also
works; 'a' takes a slower element-wise apply route; 'b' misuses replace, which is not the standard way
to fill NaNs from a mapping. On recent pandas, inplace fillna on a single column can trigger a
chained-assignment warning, so plain assignment is the safer pattern.)
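A sketch of the fill step on toy data; plain assignment is used instead of inplace=True to stay warning-free on recent pandas:

    import numpy as np
    import pandas as pd

    dept_map = {'HD': 'Health', 'PS': 'Public Safety'}
    merged_df = pd.DataFrame({'Abb': ['HD', 'PS'],
                              'Department': ['Health', np.nan]})

    # map() translates each 'Abb' to its department; fillna() uses it only where NaN
    merged_df['Department'] = merged_df['Department'].fillna(
        merged_df['Abb'].map(dept_map))
    print(merged_df)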
9. Scenario: After filling missing values, you want to confirm there are no NaNs left in the entire DataFrame
merged_df. Which command returns True if there are absolutely no missing values?
o a) merged_df.isnull().sum().sum() == 0
o b) merged_df.notnull().all().all()
o c) merged_df.isna().any().any() == False
o d) All of the above Answer: d) All of the above
10. Scenario: You need to save the cleaned merged DataFrame to a CSV file, excluding the DataFrame index.
Which parameter in to_csv() achieves this?
o a) index=False
o b) header=False
o c) save_index=False
o d) no_index=True Answer: a) index=False
Section 3: Removing Duplicates
11. Scenario: You need to remove duplicate rows based only on the combination of 'Abb' and 'Tool' columns in
merged_df. Which command is correct?
o a) unique_df = merged_df.drop_duplicates()
o b) unique_df = merged_df.drop_duplicates(subset=['Abb', 'Tool'])
o c) unique_df = merged_df.remove_duplicates(on=['Abb', 'Tool'])
o d) unique_df = merged_df[~merged_df.duplicated(subset=['Abb', 'Tool'])] Answer: b) unique_df =
merged_df.drop_duplicates(subset=['Abb', 'Tool']) (Option 'd' also works but 'b' is more direct).
12. Scenario: merged_df has shape (1000, 12). After running unique_df =
merged_df.drop_duplicates(subset=['Abb', 'Tool']), unique_df.shape is (950, 12). How many rows were
identified as duplicates and removed?
o a) 950
o b) 12
o c) 1000
o d) 50 Answer: d) 50 (1000 - 950)
13. Scenario: When removing duplicates using drop_duplicates(subset=['Abb', 'Tool']), which duplicate row is
kept by default?
o a) The last occurring row.
o b) The first occurring row.
o c) A randomly selected row.
o d) No rows are kept if duplicates exist. Answer: b) The first occurring row (controlled by the keep
parameter, which defaults to 'first').
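A sketch showing the default keep='first' behaviour (toy rows):

    import pandas as pd

    merged_df = pd.DataFrame({'Abb':  ['HD', 'HD', 'ED'],
                              'Tool': ['Scanner', 'Scanner', 'Tutor']})

    # The first ('HD', 'Scanner') row survives; the second is dropped
    unique_df = merged_df.drop_duplicates(subset=['Abb', 'Tool'])
    print(merged_df.shape, '->', unique_df.shape)  # (3, 2) -> (2, 2)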
14. Scenario: You save the unique_df to a CSV named '12345-Unique.csv', where 12345 is your CRN. Which
pandas function call achieves this?
o a) unique_df.save_csv('12345-Unique.csv', index=False)
o b) unique_df.to_excel('12345-Unique.csv', index=False)
o c) unique_df.to_csv('12345-Unique.csv', index=False)
o d) pd.write_csv(unique_df, '12345-Unique.csv', index=False) Answer: c) unique_df.to_csv('12345-Unique.csv', index=False)
Section 4: Data Analysis
15. Scenario: You need to find the department with the highest variety (count of unique values) of 'Analysis'
types using the unique_df. Which code snippet finds the count of unique analysis types per department?
o a) unique_df.groupby('Department')['Analysis'].count()
o b) unique_df.groupby('Department')['Analysis'].value_counts()
o c) unique_df.groupby('Department')['Analysis'].nunique()
o d) unique_df['Department'].nunique() Answer: c) unique_df.groupby('Department')['Analysis'].nunique()
16. Scenario: Following the previous question, how do you get the name of the department with the maximum
unique count? Let the result of the previous step be stored in a Series analysis_variety.
o a) analysis_variety.max()
o b) analysis_variety.idxmax()
o c) analysis_variety.sort_values(ascending=False).index[0]
o d) Both b and c Answer: d) Both b and c
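A sketch combining questions 15 and 16 (toy data):

    import pandas as pd

    unique_df = pd.DataFrame({'Department': ['Health', 'Health', 'Education'],
                              'Analysis':   ['Predictive', 'Descriptive', 'Predictive']})

    # nunique() counts distinct analysis types per department;
    # idxmax() returns the index label (department name) of the largest count
    analysis_variety = unique_df.groupby('Department')['Analysis'].nunique()
    print(analysis_variety.idxmax())  # 'Health'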
17. Scenario: Task (d)(ii) asks for the "percentage of updating of each tool". Assuming the 'Updated' column
contains Boolean values (True/False) or 1/0, how could you calculate the percentage of entries for each
unique tool name that are marked as 'Updated' (True/1)?
o a) unique_df.groupby('Tool')['Updated'].mean() * 100
o b) unique_df['Updated'].value_counts(normalize=True) * 100
o c) unique_df.groupby('Tool')['Updated'].sum() / unique_df.groupby('Tool')['Updated'].count() *
100
o d) Both a and c Answer: d) Both a and c (mean() on boolean/1-0 data calculates the proportion of
True/1s).
18. Scenario: If a specific tool 'ToolX' appears 10 times in unique_df, and 3 of these entries have Updated ==
True, what would unique_df.groupby('Tool')['Updated'].mean().loc['ToolX'] return?
o a) 3
o b) 0.3
o c) 30
o d) 7 Answer: b) 0.3 (The mean of [True, True, True, False, False, False, False, False, False, False]
treated as [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] is 3/10 = 0.3).
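A sketch reproducing the 'ToolX' arithmetic (data constructed to match the scenario):

    import pandas as pd

    unique_df = pd.DataFrame({'Tool': ['ToolX'] * 10,
                              'Updated': [True] * 3 + [False] * 7})

    # mean() over booleans is the proportion of True values: 3/10 = 0.3
    pct_updated = unique_df.groupby('Tool')['Updated'].mean() * 100
    print(pct_updated.loc['ToolX'])  # 30.0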
Section 5: Data Visualization
19. Scenario: To create a bar chart showing the count of tools per department, you first need to calculate these
counts. Which code prepares the data counts_per_dept for plotting?
o a) counts_per_dept = unique_df.groupby('Department')['Tool'].nunique()
o b) counts_per_dept = unique_df['Department'].value_counts()
o c) counts_per_dept = unique_df.groupby('Department').size()
o d) Both b and c Answer: d) Both b and c (value_counts() on the Department column or grouping by
Department and using size() or count() will give the number of rows/tool entries per department).
20. Scenario: You have the counts_per_dept Series. Which command, using the pandas plotting interface,
generates the required vertical bar chart?
o a) counts_per_dept.plot(kind='pie')
o b) counts_per_dept.plot(kind='bar')
o c) counts_per_dept.plot(kind='line')
o d) counts_per_dept.plot.barh() Answer: b) counts_per_dept.plot(kind='bar')
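A plotting sketch with invented counts:

    import matplotlib.pyplot as plt
    import pandas as pd

    counts_per_dept = pd.Series({'Health': 12, 'Education': 8, 'Public Safety': 5})

    counts_per_dept.plot(kind='bar')  # one vertical bar per department
    plt.ylabel('Number of tools')
    plt.tight_layout()
    plt.show()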
21. Scenario: For the pie chart of the 'Analysis' column distribution, what data does the size of each slice
represent?
o a) The number of departments using that analysis type.
o b) The average usage date of that analysis type.
o c) The relative frequency (percentage) of each analysis type in the dataset.
o d) The number of tools performing that analysis type. Answer: c) The relative frequency
(percentage) of each analysis type in the dataset.
22. Scenario: Which code generates the data needed for the 'Analysis' pie chart?
o a) analysis_counts = unique_df.groupby('Analysis').size()
o b) analysis_counts = unique_df['Analysis'].value_counts()
o c) analysis_counts = unique_df['Analysis'].unique()
o d) Both a and b Answer: d) Both a and b
23. Scenario: Task (e)(iii) requires a bar plot showing the number of tools marked as "Updated". If you
interpret this as comparing the total count of 'Updated' entries vs 'Not Updated' entries, what data source
(Series) would you plot?
o a) unique_df['Updated'].value_counts()
o b) unique_df.groupby('Updated').size()
o c) unique_df.groupby('Tool')['Updated'].sum()
o d) Both a and b Answer: d) Both a and b
24. Scenario: Given the data updated_counts = unique_df['Updated'].value_counts(), which command
generates the bar plot comparing the counts of True/False (or Yes/No) values?
o a) updated_counts.plot(kind='pie')
o b) updated_counts.plot(kind='barh')
o c) updated_counts.plot(kind='line')
o d) updated_counts.plot(kind='bar') Answer: d) updated_counts.plot(kind='bar')
Section 6: Python/Pandas Concepts & Advanced Scenarios
25. Scenario: If the 'Date' column was loaded as strings instead of datetime objects, which Pandas function is
used to convert it correctly?
o a) pd.to_datetime(unique_df['Date'])
o b) unique_df['Date'].astype('datetime64[ns]')
o c) pd.convert_dtypes(unique_df['Date'])
o d) Both a and b Answer: d) Both a and b
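A conversion sketch on invented ISO-format strings:

    import pandas as pd

    unique_df = pd.DataFrame({'Date': ['2020-01-15', '2021-06-30']})

    # Parses strings into datetime64[ns]; astype('datetime64[ns]') behaves the
    # same for well-formed inputs, but to_datetime also offers errors= handling
    unique_df['Date'] = pd.to_datetime(unique_df['Date'])
    print(unique_df['Date'].dtype)  # datetime64[ns]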
26. Scenario: You want to calculate the standard deviation of the number of tools used per department. Which
sequence of operations is needed?
o a) Calculate value_counts() on 'Department', then apply .std().
o b) Group by 'Department', count 'Tool' (size()), then apply .std() to the resulting Series.
o c) Calculate standard deviation directly on the 'Tool' column.
o d) Use unique_df.describe() and find the 'std' row for 'Department'. Answer: b) Group by
'Department', count 'Tool' (size()), then apply .std() to the resulting Series.
27. Scenario: Suppose you want to find if there's a correlation between the number of tools a department uses
and the variety of analysis types it employs. Which correlation method in Pandas would be suitable after
calculating these two series (tools_count and analysis_variety)?
o a) tools_count.corr(analysis_variety)
o b) pd.DataFrame({'tools': tools_count, 'variety': analysis_variety}).corr()
o c) np.correlate(tools_count, analysis_variety)
o d) Both a and b provide the correlation coefficient between the two measures. Answer: d) Both a
and b provide the correlation coefficient between the two measures.
28. Scenario: Which NumPy function could be used to efficiently check if any value in the 'Updated' column
(once converted to boolean) is True?
o a) np.sum(unique_df['Updated'])
o b) np.any(unique_df['Updated'])
o c) np.all(unique_df['Updated'])
o d) np.mean(unique_df['Updated']) Answer: b) np.any(unique_df['Updated'])
29. Scenario: Imagine you want to select all rows from unique_df where the 'Tool desc' column contains the
word "AI". Which Pandas string method is appropriate?
o a) unique_df[unique_df['Tool desc'].contains('AI')]
o b) unique_df[unique_df['Tool desc'].str.contains('AI')]
o c) unique_df[unique_df['Tool desc'].find('AI') != -1]
o d) unique_df[unique_df['Tool desc'].match('AI')] Answer: b) unique_df[unique_df['Tool desc'].str.contains('AI')]
30. Scenario: To calculate the median 'first used' Date for tools within each 'Analysis' type, you would group
by 'Analysis' and then apply which aggregation function to the 'Date' column (assuming it's datetime)?
o a) .mean()
o b) .count()
o c) .median()
o d) .mode() Answer: c) .median()
31. Scenario: If you create a new column 'Years Since First Use' based on the 'Date' column (datetime) and the
current date (pd.Timestamp.now()), which expression calculates this approximately?
o a) (pd.Timestamp.now() - unique_df['Date']).dt.years
o b) (pd.Timestamp.now().year - unique_df['Date'].dt.year)
o c) (pd.Timestamp.now() - unique_df['Date']) / np.timedelta64(1, 'Y')
o d) Both b and c provide valid ways to estimate years (b is a simpler integer difference, c is a more
precise float). Answer: d) Both b and c provide valid ways to estimate years. (Caveat: recent pandas
versions reject the ambiguous 'Y' unit in timedelta arithmetic, so dividing by a fixed-length year such
as pd.Timedelta(days=365.25) is the safer form of c.)
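A sketch of both estimates, with the float version using a fixed-length year because of the 'Y'-unit caveat above (dates invented):

    import pandas as pd

    unique_df = pd.DataFrame({'Date': pd.to_datetime(['2018-03-01', '2021-09-15'])})

    # Option b: integer difference of calendar years
    unique_df['Years Since First Use'] = (pd.Timestamp.now().year
                                          - unique_df['Date'].dt.year)

    # Float approximation: divide the timedelta by an average-length year
    years_float = (pd.Timestamp.now() - unique_df['Date']) / pd.Timedelta(days=365.25)
    print(years_float.round(1))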
32. Scenario: Applying a function row-by-row using .apply(..., axis=1) is generally:
o a) More efficient than vectorized operations in Pandas/NumPy.
o b) Less efficient than vectorized operations in Pandas/NumPy.
o c) The only way to perform complex row-wise calculations.
o d) Primarily used for merging DataFrames. Answer: b) Less efficient than vectorized operations in
Pandas/NumPy.
33. Scenario: If df1 had 10 departments and df2 had tools used by only 8 of these departments, what would be
the result of an inner merge on 'Abb'?
o a) Rows corresponding to all 10 departments.
o b) Rows corresponding to only the 8 departments present in df2.
o c) Rows corresponding to the 2 departments only present in df1.
o d) An error because not all keys match. Answer: b) Rows corresponding to only the 8 departments
present in df2.
34. Scenario: Which Python data structure is returned by df1.set_index('Abb')['Department'] before calling
.to_dict()?
o a) A NumPy array
o b) A Python list
o c) A Pandas DataFrame
o d) A Pandas Series Answer: d) A Pandas Series
35. Scenario: To find tools used only by the 'Education' department, which approach is most direct using
pandas filtering and grouping?
o a) unique_df.groupby('Tool').filter(lambda x: (x['Department'] == 'Education').all() and
len(x['Department'].unique()) == 1)
o b) tool_counts = unique_df.groupby('Tool')['Department'].nunique(); single_dept_tools =
tool_counts[tool_counts == 1].index; unique_df[(unique_df['Tool'].isin(single_dept_tools)) &
(unique_df['Department'] == 'Education')]
o c) unique_df[unique_df['Department'] == 'Education']['Tool'].unique() (This gets tools used by
Education, not only Education)
o d) Both a and b achieve the goal (b is often more readable). Answer: d) Both a and b achieve the
goal (b is often more readable).
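A sketch of approach b on toy rows (tool 'B' is the only one used solely by Education):

    import pandas as pd

    unique_df = pd.DataFrame({'Tool':       ['A', 'A', 'B', 'C'],
                              'Department': ['Education', 'Health', 'Education', 'Health']})

    # Tools whose rows span exactly one department...
    tool_counts = unique_df.groupby('Tool')['Department'].nunique()
    single_dept_tools = tool_counts[tool_counts == 1].index

    # ...restricted to those whose single department is 'Education'
    only_education = unique_df[unique_df['Tool'].isin(single_dept_tools)
                               & (unique_df['Department'] == 'Education')]
    print(only_education['Tool'].unique())  # ['B']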
36. Scenario: If the 'Updated' column contained strings like 'Yes', 'No', 'YES', 'no', what's the first step using
pandas string methods before mapping to Boolean?
o a) unique_df['Updated'].str.upper()
o b) unique_df['Updated'].str.lower()
o c) unique_df['Updated'].str.capitalize()
o d) unique_df['Updated'].str.strip() Answer: b) unique_df['Updated'].str.lower() (Or upper, to
standardize case).
37. Scenario: Displaying the .shape before and after removing duplicates directly quantifies:
o a) The number of missing values handled.
o b) The number of rows removed due to duplication based on the specified columns.
o c) The number of columns remaining after cleaning.
o d) The change in memory usage of the DataFrame. Answer: b) The number of rows removed due to
duplication based on the specified columns.
38. Scenario: Which library is most commonly used alongside Pandas for numerical operations and underpins
many Pandas functionalities?
o a) Matplotlib
o b) SciPy
o c) NumPy
o d) Scikit-learn Answer: c) NumPy
39. Scenario: If you were asked to build a function that takes a department name as input and returns a list of
unique tools used by that department from unique_df, which code structure would be appropriate?
o a) def get_tools(dept_name): return unique_df[unique_df['Department'] ==
dept_name]['Tool'].tolist()
o b) def get_tools(dept_name): return unique_df[unique_df['Department'] ==
dept_name]['Tool'].unique().tolist()
o c) def get_tools(dept_name): return
unique_df.groupby('Department')['Tool'].unique().loc[dept_name].tolist()
o d) Both b and c Answer: d) Both b and c
40. Scenario: To quickly get summary statistics (count, mean, std, min, max, quartiles) for numerical columns
potentially present in unique_df (if any existed), which Pandas method is used?
o a) .info()
o b) .describe()
o c) .head()
o d) .corr() Answer: b) .describe()
41. Scenario: You want to add a column IsHealthDept which is True if Department is 'Health' and False
otherwise. Which is a correct way?
o a) unique_df['IsHealthDept'] = unique_df['Department'] == 'Health'
o b) unique_df['IsHealthDept'] = unique_df['Department'].apply(lambda x: True if x == 'Health' else
False)
o c) unique_df['IsHealthDept'] = np.where(unique_df['Department'] == 'Health', True, False)
o d) All of the above Answer: d) All of the above
42. Scenario: You want to filter unique_df to show only the tools used by the 'Health' department OR the
'Education' department. Which code works?
o a) unique_df[(unique_df['Department'] == 'Health') & (unique_df['Department'] == 'Education')]
o b) unique_df[unique_df['Department'].isin(['Health', 'Education'])]
o c) unique_df.query("Department == 'Health' | Department == 'Education'")
o d) Both b and c Answer: d) Both b and c
43. Scenario: After finding the department with the highest variety of 'Analysis' types (let's say its name is
stored in dept_max_variety), how would you filter unique_df to show only the rows corresponding to this
specific department?
o a) unique_df[unique_df['Department'] == dept_max_variety]
o b) unique_df.loc[dept_max_variety] (Incorrect indexing for this)
o c) unique_df.filter(like=dept_max_variety, axis=0) (Incorrect use of filter)
o d) unique_df.groupby('Department').get_group(dept_max_variety)
o e) Both a and d Answer: e) Both a and d
44. Scenario: Imagine the 'Date' column is already converted to datetime objects. How would you find the
number of tools first used specifically in the year 2020?
o a) unique_df[unique_df['Date'].dt.year == 2020].shape[0]
o b) unique_df['Date'].dt.year.value_counts().loc[2020] (Might raise KeyError if no tools from 2020)
o c) sum(unique_df['Date'].dt.year == 2020)
o d) All of the above (with a note about potential KeyError for b) Answer: d) All of the above (with a
note about potential KeyError for b) - a and c are generally safer.
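A sketch of the safe counting patterns (dates invented):

    import pandas as pd

    unique_df = pd.DataFrame({'Date': pd.to_datetime(['2020-02-01', '2020-11-20',
                                                      '2019-05-05'])})

    # A boolean mask never raises; sum() counts the True entries
    n_2020 = (unique_df['Date'].dt.year == 2020).sum()
    print(n_2020)  # 2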
45. Scenario: You want to count how many unique tools have a description ('Tool desc') longer than 50
characters. Which is the correct approach?
o a) unique_df[unique_df['Tool desc'].str.len() > 50]['Tool'].count() (Counts rows, not unique tools)
o b) unique_df[unique_df['Tool desc'].str.len() > 50]['Tool'].nunique()
o c) sum(unique_df['Tool desc'].str.len() > 50) (Counts rows)
o d) len(unique_df[unique_df['Tool desc'].str.len() > 50]) (Counts rows) Answer: b)
unique_df[unique_df['Tool desc'].str.len() > 50]['Tool'].nunique()
46. Scenario: How would you calculate the total number of tool entries that are both for the 'Public Safety'
department and have the 'Analysis' type 'Predictive'?
o a) unique_df.query("Department == 'Public Safety' and Analysis == 'Predictive'").shape[0]
o b) len(unique_df[(unique_df['Department'] == 'Public Safety') & (unique_df['Analysis'] ==
'Predictive')])
o c) sum((unique_df['Department'] == 'Public Safety') & (unique_df['Analysis'] == 'Predictive'))
o d) All of the above Answer: d) All of the above
47. Scenario: You need to create a Series showing the most frequent 'Analysis' type used by each department.
Which combination of Pandas methods is most suitable?
o a) unique_df.groupby('Department')['Analysis'].mode().reset_index(level=1, drop=True) (Mode can
return multiple values if tied, needs handling)
o b) unique_df.groupby('Department')['Analysis'].value_counts().idxmax() (This finds the overall max
combo, not per dept)
o c) unique_df.groupby('Department')['Analysis'].describe()['top']
o d) Both a (with handling for ties) and c provide a way to get the most frequent type per department.
Answer: d) Both a (with handling for ties) and c provide a way to get the most frequent type per
department. (c is often simpler if only one mode is needed).
48. Scenario: What pandas command would select all columns except for 'Tool desc' and 'Output' from
unique_df?
o a) unique_df.drop(columns=['Tool desc', 'Output'])
o b) unique_df.select(lambda col: col not in ['Tool desc', 'Output'], axis=1) (Select is not standard)
o c) unique_df.loc[:, ~unique_df.columns.isin(['Tool desc', 'Output'])]
o d) Both a and c Answer: d) Both a and c
49. Scenario: If you wanted to see if any 'Tool' name appears within its own 'Tool desc' column (e.g., tool
'Analyzer' is mentioned in its description), how might you check this for the first 10 rows? (Requires
combining columns row-wise)
o a) unique_df.head(10).apply(lambda row: row['Tool'] in row['Tool desc'], axis=1)
o b) unique_df['Tool'].head(10).isin(unique_df['Tool desc'].head(10)) (Checks if Tool name equals
description)
o c) [unique_df['Tool'][i] in unique_df['Tool desc'][i] for i in range(10)] (Assuming default integer
index)
o d) Both a and c (a is more robust to index changes) Answer: d) Both a and c (a is more robust to
index changes)
50. Scenario: You want to replace the 'Analysis' type 'Descriptive' with 'Summary' and 'Predictive' with
'Forecast' only in the 'Analysis' column of unique_df. Which command works best?
o a) unique_df['Analysis'].replace({'Descriptive': 'Summary', 'Predictive': 'Forecast'}, inplace=True)
o b) unique_df['Analysis'].map({'Descriptive': 'Summary', 'Predictive': 'Forecast'}) (map() turns any
value missing from the mapping into NaN, so it is unsafe here)
o c) unique_df.replace({'Analysis': {'Descriptive': 'Summary', 'Predictive': 'Forecast'}}, inplace=True)
o d) Both a and c Answer: d) Both a and c (a targets the specific column, c targets the value within the
specified column dictionary).
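A sketch contrasting replace() with map() (rows invented):

    import pandas as pd

    unique_df = pd.DataFrame({'Analysis': ['Descriptive', 'Predictive', 'Diagnostic']})

    # replace() touches only the listed values; 'Diagnostic' passes through,
    # whereas map() with this incomplete mapping would turn it into NaN
    unique_df['Analysis'] = unique_df['Analysis'].replace(
        {'Descriptive': 'Summary', 'Predictive': 'Forecast'})
    print(unique_df['Analysis'].tolist())  # ['Summary', 'Forecast', 'Diagnostic']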
Python Practical MCQs — Al-Beruni City Case Study Context
Data Loading & Library Import MCQs
MCQ 1
Which command correctly imports the Pandas library for data manipulation?
A) import panda as pd
B) import pandas as pd
C) import pandas.dataframe
D) from pandas import *
Answer: B
MCQ 2
If the file Departments.csv is located in the working directory, which code will load it into DataFrame df1?
A) df1 = pd.readfile('Departments.csv')
B) df1 = pd.read_csv('Departments.csv')
C) df1 = pd.load_csv('Departments.csv')
D) df1 = pd.readfile_csv('Departments.csv')
Answer: B
MCQ 3
After loading Tools.csv into df2, which command views the first row only?
A) df2.head()
B) df2.loc[0]
C) df2.iloc[0]
D) df2.head(1)
Answer: D
MCQ 4
Which Python library is essential for numerical analysis and array operations in this assignment?
A) Matplotlib
B) Pandas
C) NumPy
D) Seaborn
Answer: C
MCQ 5
Which of the following is the correct command to display the column names of df1?
A) df1.column_names()
B) df1.columns
C) df1.column()
D) df1.col_names
Answer: B
Data Cleaning & Merging MCQs
MCQ 6
To merge df1 and df2 so that all rows of df2 appear in the merged data:
A) pd.merge(df1, df2, how="inner")
B) pd.merge(df1, df2, how="outer")
C) pd.merge(df1, df2, how="right")
D) pd.merge(df1, df2, how="left")
Answer: C
MCQ 7
To count the total missing values in each column of a DataFrame:
A) df.isnull().sum()
B) df.isnull.count()
C) df.checknull()
D) df.isna()
Answer: A
MCQ 8
If the 'Department' column has missing values and a dictionary mapping is available, which combination of
methods should be used to fill it?
A) fillna()
B) map().fillna()
C) replace()
D) dropna()
Answer: B
MCQ 9
After filling missing values, which expression confirms that no missing values remain?
A) df.isna().sum()
B) df.isnull().sum() == 0
C) df.dropna()
D) df.notnull().sum()
Answer: B (strictly, df.isnull().sum() == 0 yields a per-column Boolean Series; chain .all() for a single True/False)
MCQ 10
To save the updated DataFrame to a new CSV file without the index:
A) df.to_csv('filename.csv')
B) df.to_csv('filename.csv', index=True)
C) df.save_csv('filename.csv')
D) df.to_csv('filename.csv', index=False)
Answer: D
Duplicate Handling MCQs
MCQ 11
Which command removes duplicates based on the 'Abb' and 'Tool Name' fields?
A) df.drop_duplicates(['Abb', 'Tool Name'], inplace=True)
B) df.drop_duplicates()
C) df.unique()
D) df.dropna()
Answer: A
MCQ 12
To check the shape of a DataFrame:
A) df.shape()
B) df.size()
C) df.shape
D) df.count()
Answer: C
MCQ 13
To save the DataFrame after removing duplicates, as the assignment requires:
A) df.to_csv('CRN-Unique.csv', index=False)
B) df.save('CRN-Unique.csv')
C) df.save_csv('CRN-Unique.csv')
D) df.to_csv('CRN_Unique.csv')
Answer: A
Data Analysis MCQs
MCQ 14
To find the department with the highest variety of 'Analysis' types:
A) df['Analysis'].value_counts()
B) df.groupby('Department')['Analysis'].nunique().idxmax()
C) df.groupby('Analysis')['Department'].count()
D) df['Department'].nunique()
Answer: B
MCQ 15
To calculate the percentage of tools updated (with 'Updated' holding 'Yes'/'No' strings):
A) (df['Updated']=='Yes').sum() / len(df) * 100
B) df['Updated'].mean()
C) df['Updated'].value_counts() / df['Updated'].sum()
D) df['Updated'].count() / df['Updated'].sum()
Answer: A
Data Visualization MCQs
MCQ 16
To plot the count of tools per department using Matplotlib:
A) df['Department'].plot(kind='bar')
B) df['Department'].value_counts().plot(kind='bar')
C) sns.barplot(x='Department', y='Tool Name', data=df)
D) plt.bar(df['Department'], df['Tool Name'])
Answer: B
MCQ 17
To generate a pie chart for the 'Analysis' column:
A) df['Analysis'].value_counts().plot.pie()
B) plt.pie(df['Analysis'])
C) df['Analysis'].plot(kind='pie')
D) sns.pieplot(df['Analysis'])
Answer: A
MCQ 18
To create a bar plot for tools marked as "Updated" using Seaborn:
A) sns.countplot(x='Updated', data=df)
B) sns.barplot(x='Updated', y='Tool Name', data=df)
C) df['Updated'].plot(kind='bar')
D) plt.bar('Updated', 'Tool Name')
Answer: A
MCQ 19
To rotate x-axis labels by 45 degrees for better readability:
A) plt.xticks(rotation=45)
B) plt.xlabels(45)
C) plt.xlabel(rotation=45)
D) plt.rotate(45)
Answer: A
MCQ 20
To show percentage values inside the slices of a pie chart in Matplotlib:
A) autopct='%1.1f%%'
B) data_label='inside'
C) plt.labels(inside=True)
D) df.plot.pie(labels='inside')
Answer: A
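A pie-chart sketch with invented counts:

    import matplotlib.pyplot as plt
    import pandas as pd

    analysis_counts = pd.Series({'Predictive': 40, 'Descriptive': 35, 'Diagnostic': 25})

    # autopct formats each slice's share as a percentage label inside the slice
    analysis_counts.plot.pie(autopct='%1.1f%%')
    plt.ylabel('')  # drop the default Series-name axis label
    plt.show()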
MCQ 21
To find the total number of tools used per department after removing duplicates:
A) df['Department'].value_counts()
B) df.groupby('Department')['Tool Name'].count()
C) df.groupby('Tool Name')['Department'].count()
D) df['Tool Name'].count()
Answer: B
MCQ 22
To calculate the mean of the numerical column 'Population':
A) df['Population'].mean()
B) df.mean('Population')
C) np.mean(df['Population'])
D) Both A and C
Answer: D
MCQ 23
To calculate the standard deviation of a numerical column:
A) df['Population'].std()
B) df.std('Population')
C) np.std(df['Population'])
D) Both A and C
Answer: D (caveat: np.std defaults to ddof=0, population std, while pandas .std() uses ddof=1, sample std, so the values differ slightly)
MCQ 24
To check the correlation between numerical columns in a DataFrame:
A) df.corr()
B) df.correlation()
C) df.cov()
D) df.group_corr()
Answer: A
MCQ 25
To filter and show records where 'Updated' is 'No':
A) df[df['Updated'] == 'No']
B) df.loc[df['Updated'] == 'No']
C) df.query("Updated == 'No'")
D) All of the above
Answer: D
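A sketch confirming the three filters agree (toy data):

    import pandas as pd

    df = pd.DataFrame({'Updated': ['Yes', 'No', 'No']})

    a = df[df['Updated'] == 'No']
    b = df.loc[df['Updated'] == 'No']
    c = df.query("Updated == 'No'")
    print(a.equals(b) and b.equals(c))  # True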
MCQ 26
In an ETL process, which of these is part of the "Extract" step in Python?
A) Reading csv using pandas
B) Using API to fetch data
C) SQL Query execution
D) All of the above
Answer: D
MCQ 27
In "Transform" step of ETL, you perform:
A) Handling Missing Values
B) Data Cleaning
C) Column Renaming
D) All of the above
Answer: D
MCQ 28
To calculate the median of a numerical column:
A) df['Population'].median()
B) df.median('Population')
C) np.median(df['Population'])
D) Both A and C
Answer: D
MCQ 29
To remove unwanted whitespace from string columns:
A) df['Column'] = df['Column'].str.strip()
B) df['Column'] = df['Column'].strip()
C) df['Column'] = df.strip('Column')
D) df['Column'].remove_whitespace()
Answer: A
MCQ 30
To replace missing values with 'Unknown':
A) df.fillna('Unknown')
B) df.replace(np.nan, 'Unknown')
C) df['Column'] = df['Column'].fillna('Unknown')
D) All of the above
Answer: D
MCQ 31
Which pandas function returns basic statistics like count, mean, std, min, max?
A) df.stats()
B) df.describe()
C) df.summary()
D) df.explain()
Answer: B
MCQ 32
To reset the index after dropping rows:
A) df.reset_index()
B) df.reset_index(drop=True, inplace=True)
C) df.index_reset()
D) df.drop_index()
Answer: B
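A sketch of why drop=True matters (toy data):

    import pandas as pd

    df = pd.DataFrame({'Population': [500, 2000, 1500]})
    df = df[df['Population'] > 1000]  # index is now [1, 2]

    # drop=True discards the old index instead of adding it as a column
    df.reset_index(drop=True, inplace=True)
    print(df.index.tolist())  # [0, 1]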
MCQ 33
Which visualization best shows the distribution of a numerical column?
A) Line Chart
B) Histogram
C) Pie Chart
D) Bar Chart
Answer: B
MCQ 34
To export only selected columns to CSV:
A) df[['col1', 'col2']].to_csv('output.csv', index=False)
B) df.select(['col1','col2']).to_csv('output.csv')
C) df[['col1','col2']].save('output.csv')
D) df['col1','col2'].export_csv()
Answer: A
MCQ 35
In Market Basket Analysis using Python, which library is commonly used?
A) pandas
B) numpy
C) mlxtend
D) seaborn
Answer: C
MCQ 36
For a predictive maintenance model, which technique is preferred?
A) Regression
B) Clustering
C) Classification
D) Time Series Forecasting
Answer: D
MCQ 37
Which syntax creates a new calculated column in a DataFrame?
A) df.create_column()
B) df['New_Column'] = ...
C) df.new_column()
D) df.add_column()
Answer: B
MCQ 38
Which method provides the highest-level summary of a DataFrame?
A) df.info()
B) df.summary()
C) df.describe()
D) df.columns
Answer: A
MCQ 39
To draw a scatter plot showing the correlation between two numerical columns:
A) sns.scatterplot(x='col1', y='col2', data=df)
B) plt.scatter(df['col1'], df['col2'])
C) Both A and B
D) df.plot.scatter('col1','col2')
Answer: C (note: option D also works in current pandas, where x and y can be passed positionally to df.plot.scatter)
MCQ 40
To group data by Department and calculate the sum of 'Population':
A) df.groupby('Department')['Population'].sum()
B) df.sum('Population').groupby('Department')
C) df.group('Department').sum('Population')
D) groupby(df['Department'])
Answer: A
MCQ 41
To create a bar chart using Seaborn:
A) sns.barplot(x='Department', y='Population', data=df)
B) sns.histplot(x='Department', y='Population', data=df)
C) sns.countplot(x='Department', data=df)
D) plt.bar(x='Department', height='Population')
Answer: A
MCQ 42
For a fraud detection model in Python, which technique is most suitable?
A) Clustering
B) Classification
C) Time Series
D) PCA
Answer: B
MCQ 43
To drop rows where all values are NaN:
A) df.dropna(how='all')
B) df.dropna(all=True)
C) df.drop_allna()
D) df.remove_blank_rows()
Answer: A
MCQ 44
To create a pivot table in pandas:
A) df.pivot_table(index='Department', values='Population', aggfunc='sum')
B) df.pivot('Department', 'Population')
C) df.create_pivot('Department')
D) df.group_pivot()
Answer: A
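A pivot-table sketch on invented figures:

    import pandas as pd

    df = pd.DataFrame({'Department': ['Health', 'Health', 'Education'],
                       'Population': [100, 200, 300]})

    # One row per Department, values aggregated with sum
    pivot = df.pivot_table(index='Department', values='Population', aggfunc='sum')
    print(pivot)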
MCQ 45
To calculate the variance of a numerical column:
A) df['Population'].var()
B) np.var(df['Population'])
C) df.var('Population')
D) Both A and B
Answer: D (note: np.var defaults to ddof=0, population variance, while pandas .var() uses ddof=1, sample variance)
MCQ 46
To filter a DataFrame where Population > 1000:
A) df[df['Population'] > 1000]
B) df.loc[df['Population'] > 1000]
C) df.query("Population > 1000")
D) All of the above
Answer: D
MCQ 47
To show the last 5 rows of a DataFrame:
A) df.tail()
B) df.head()
C) df[-5:]
D) Both A and C
Answer: D
MCQ 48
For a supply chain optimization model in Python, which technique is used?
A) Linear Programming
B) Clustering
C) Regression
D) Market Basket
Answer: A
MCQ 49
To check the datatypes of all columns:
A) df.dtypes
B) df.types()
C) df.datatypes()
D) df.columns.dtypes
Answer: A
MCQ 50
To convert a column to datetime format:
A) pd.to_datetime(df['Date'])
B) df['Date'].astype('datetime')
C) df['Date'].convert('datetime')
D) Both A and B
Answer: A