TRICHY ENGINEERING COLLEGE
(A Unit of SS Group of Institutions)
Approved by AICTE & Affiliated to Anna University, Chennai
An ISO 9001:2015 Certified Institution
Sivagnanam Nagar, Trichy-Chennai NH, Konalai, Trichy - 621 105.
UNIT II
EDA USING PYTHON
1. What is the primary purpose of Exploratory Data Analysis (EDA)?
A. To create machine learning models
B. To summarize the main characteristics of the data
C. To clean the data
D. To reduce the dimensionality of the data
Answer: B. To summarize the main characteristics of the data
2. Which Python library is most commonly used for data manipulation in EDA?
A. Matplotlib
B. Seaborn
C. Pandas
D. NumPy
Answer: C. Pandas
3. Which function is used in Pandas to display the first few rows of a DataFrame?
A. df.tail()
B. df.show()
C. df.head()
D. df.display()
Answer: C. df.head()
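As a minimal sketch of the answer above, the following uses a small made-up DataFrame (the column names and values are illustrative assumptions, not course data) to show df.head() next to df.tail():

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E", "F", "G"],
    "marks": [72, 85, 64, 90, 58, 77, 81],
})

first_rows = df.head()   # first 5 rows by default
first_two = df.head(2)   # first 2 rows only
last_rows = df.tail(3)   # last 3 rows, for comparison

print(first_rows)
```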
4. What is the primary use of the describe() function in Pandas?
A. To plot a graph
B. To generate summary statistics
C. To merge DataFrames
D. To group data
Answer: B. To generate summary statistics
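A short sketch of describe() on a made-up numeric column (the data is an assumption for illustration). For numeric columns the result includes count, mean, std, min, the quartiles, and max:

```python
import pandas as pd

# Hypothetical marks, chosen so the statistics are easy to check by hand.
df = pd.DataFrame({"marks": [60, 70, 80, 90]})

summary = df.describe()  # summary statistics, one column per numeric field
print(summary)

mean_marks = summary.loc["mean", "marks"]
median_marks = summary.loc["50%", "marks"]
```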
5. Which of the following Python libraries is primarily used for data visualization in EDA?
A. Scikit-learn
B. Pandas
C. Seaborn
D. TensorFlow
Answer: C. Seaborn
6. Which method is used to check for missing values in a Pandas DataFrame?
A. df.isnull()
B. df.missing()
C. df.nancheck()
D. df.missvals()
Answer: A. df.isnull()
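One possible way to use df.isnull() in practice, sketched on made-up data (column names and values are assumptions): the method returns a Boolean mask, and chaining .sum() gives the number of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries.
df = pd.DataFrame({
    "age": [21, np.nan, 19],
    "city": ["Trichy", None, None],
})

mask = df.isnull()                     # True where a value is missing
missing_per_column = df.isnull().sum() # count of missing values per column

print(missing_per_column)
```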
7. In Seaborn, which plot is most suitable for visualizing the relationship between two
continuous variables?
A. Boxplot
B. Histogram
C. Scatterplot
D. Barplot
Answer: C. Scatterplot
8. Which function is used to plot a histogram in Matplotlib?
A. plt.bar()
B. plt.hist()
C. plt.plot()
D. plt.scatter()
Answer: B. plt.hist()
9. How do you calculate the correlation matrix of a Pandas DataFrame?
A. df.corr()
B. df.cov()
C. df.corrcoef()
D. df.describe()
Answer: A. df.corr()
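A minimal sketch of df.corr() on made-up columns (names and values are assumptions). By default it computes pairwise Pearson correlation between numeric columns:

```python
import pandas as pd

# Hypothetical data: y rises with x, z falls as x rises.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
})

corr_matrix = df.corr()  # Pearson correlation by default
print(corr_matrix)
```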
10. Which of the following is a common method to handle missing data in a DataFrame?
A. Drop missing values
B. Impute missing values with mean/median/mode
C. Leave the missing values as they are
D. All of the above
Answer: D. All of the above
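The first two options above can be sketched as follows, on made-up data (column names and values are assumptions). Which option is appropriate depends on the dataset; this only illustrates the calls:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row is missing both values.
df = pd.DataFrame({
    "score": [10.0, np.nan, 30.0, 50.0],
    "grade": ["A", None, "A", "B"],
})

# Option A: drop rows that contain any missing value.
dropped = df.dropna()

# Option B: impute with mean or median (numeric) or mode (categorical).
filled_mean = df["score"].fillna(df["score"].mean())
filled_median = df["score"].fillna(df["score"].median())
filled_mode = df["grade"].fillna(df["grade"].mode()[0])

print(dropped)
```

Neither dropna() nor fillna() modifies df here, because the results are assigned to new variables.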
11. Which Seaborn function is used to plot a heatmap of the correlation matrix?
A. sns.corrplot()
B. sns.heatmap()
C. sns.pairplot()
D. sns.lineplot()
Answer: B. sns.heatmap()
12. Which plot is useful for detecting outliers in a dataset?
A. Line plot
B. Box plot
C. Bar plot
D. Pie chart
Answer: B. Box plot
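A box plot conventionally draws its whiskers at 1.5 times the interquartile range (IQR) beyond the quartiles and marks points outside them as outliers. The same rule can be sketched numerically, without plotting, on made-up values (the data and variable names are assumptions):

```python
import pandas as pd

# Hypothetical measurements with one obviously extreme value.
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(outliers)
```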
13. What does the value_counts() method do when called on a column of a Pandas DataFrame?
A. Displays unique values of a column
B. Counts the frequency of unique values in a column
C. Counts missing values in a column
D. Plots the unique values
Answer: B. Counts the frequency of unique values in a column
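A short sketch contrasting value_counts() with unique() on a made-up column (the column name and values are assumptions):

```python
import pandas as pd

# Hypothetical department column.
df = pd.DataFrame({"dept": ["CSE", "ECE", "CSE", "IT", "CSE", "ECE"]})

counts = df["dept"].value_counts()  # frequency of each unique value, most frequent first
uniques = df["dept"].unique()       # just the distinct values, no counts

print(counts)
```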
14. Which of the following is NOT a type of join operation in Pandas?
A. Inner Join
B. Outer Join
C. Middle Join
D. Left Join
Answer: C. Middle Join
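The valid join types can be sketched with pd.merge and its how parameter, here on two made-up tables (table and column names are assumptions):

```python
import pandas as pd

# Hypothetical tables sharing an "id" key; ids 2 and 3 appear in both.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
right = pd.DataFrame({"id": [2, 3, 4], "marks": [85, 64, 90]})

inner = pd.merge(left, right, on="id", how="inner")  # keys present in both
outer = pd.merge(left, right, on="id", how="outer")  # keys present in either
left_join = pd.merge(left, right, on="id", how="left")    # all keys from left
right_join = pd.merge(left, right, on="id", how="right")  # all keys from right

print(outer)
```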
15. Which of the following methods can be used for data normalization?
A. Min-Max Scaling
B. Z-score Standardization
C. Log Transformation
D. All of the above
Answer: D. All of the above
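The three methods above can be sketched with NumPy on a made-up array (the values are assumptions; base-10 logarithm is one common choice for the log transformation):

```python
import numpy as np

# Hypothetical positive values spanning several orders of magnitude.
x = np.array([1.0, 10.0, 100.0, 1000.0])

# Min-Max scaling: rescale to the range [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: zero mean, unit standard deviation.
z_score = (x - x.mean()) / x.std()

# Log transformation: compresses large values (requires positive inputs).
logged = np.log10(x)

print(min_max, z_score, logged)
```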