
2 Mark Dev

The document provides a structured format for answering 35 questions related to data exploration, focusing on definitions and examples for each topic. It covers key concepts such as cases, variables, Twyman's Law, Gini coefficient, and various statistical methods and visualizations. Additionally, it offers insights into exploratory data analysis and the roles of data analysts.

Uploaded by

Eughene Yū
Copyright
© All Rights Reserved

Alright Bhavik, I’ll give you all 35 questions rewritten in proper Part A style:

each answer 4–5 lines, 40–60 words, definition + example/point.

DEV – Part A (2 Marks Answers)

Unit 1: Basics of Data Exploration

1. Two basic organizing concepts of data analysis


The two basic concepts are cases (units about which information is collected) and
variables (characteristics measured across cases).
Example: In a school dataset, each student is a case, while age, marks, and attendance
are variables.

2. Cases and Variables


A case is a single observation in a dataset, while a variable is a property or
measurement that varies across cases.
Example: In e-commerce data, each order is a case, and variables include order value,
product type, and payment method.

3. Twyman’s Law
Twyman’s Law states: “Any figure that looks interesting or different is usually
wrong.”
Example: If a survey shows a 120% participation rate, it indicates an error rather than a
meaningful insight.

4. Responsibilities of a Data Analyst (+ example)


A data analyst collects, cleans, analyzes, and visualizes data for decision-making.
Example: In e-commerce, if sales drop suddenly, the analyst investigates causes like
low website traffic, competitor discounts, or stock issues.

5. Gini Coefficient & Inequality


The Gini coefficient measures inequality in a distribution (such as income), ranging
from 0 (perfect equality) to 1 (maximum inequality).
Example: A Gini of 0.20 indicates a relatively equal income distribution, while 0.60
indicates sharp inequality within the population.
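
A minimal sketch of how a Gini coefficient can be computed from a list of incomes,
using the rank-weighted formula on sorted values. The income lists are made up for
illustration; note that with only n values the largest possible result is (n - 1)/n,
not exactly 1.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal_incomes = [30, 30, 30, 30]     # hypothetical: everyone earns the same
unequal_incomes = [0, 0, 0, 120]     # hypothetical: one person earns everything

gini_equal = gini(equal_incomes)     # 0.0
gini_unequal = gini(unequal_incomes) # 0.75, the maximum for four values
```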

6. Features visible in Histogram
Histograms reveal shape (normal/skewed), center (mean/median), spread
(range/variance), and outliers.
Example: A bell-shaped histogram of exam marks indicates most students scored near
the average.

7. Mean vs Hanning
The mean summarizes the central tendency of a dataset in one number, while Hanning is a
smoother: a running weighted mean (weights 1/4, 1/2, 1/4) that reduces fluctuations in
a time series.
Example: Mean is used for average salaries, while Hanning helps smooth noisy stock
price movements.
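
A small sketch contrasting the two. Hanning replaces each interior value with
1/4 of the previous value + 1/2 of the current value + 1/4 of the next value. Leaving
the two end values unchanged is an assumption made here for simplicity, and the price
series is made up.

```python
def hanning(series):
    """Running weighted mean with weights 1/4, 1/2, 1/4; end values kept as-is."""
    if len(series) < 3:
        return list(series)
    smoothed = [series[0]]
    for i in range(1, len(series) - 1):
        smoothed.append(0.25 * series[i - 1] + 0.5 * series[i] + 0.25 * series[i + 1])
    smoothed.append(series[-1])
    return smoothed


prices = [10, 14, 10, 14, 10]               # hypothetical noisy prices
overall_mean = sum(prices) / len(prices)    # one number for the whole series
smoothed_prices = hanning(prices)           # [10, 12.0, 12.0, 12.0, 10]
```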

8. Easy way to visualize distribution


The simplest way is using a histogram or boxplot.
Example: A histogram of monthly incomes shows distribution frequency, while a boxplot
highlights the median, quartiles, and outliers.

9. Purpose of smoothing in time series


Smoothing removes random noise from data to highlight overall patterns and trends.
Example: A short moving average of monthly sales dampens month-to-month noise so that
the trend and seasonal cycles stand out more clearly.
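
A moving average can be sketched in a few lines. The window length of 3 and the sales
figures are assumptions chosen for illustration; the output is shorter than the input
because only complete windows are averaged.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


monthly_sales = [10, 20, 30, 20, 10, 20, 30]    # hypothetical figures
smoothed_sales = moving_average(monthly_sales, 3)
```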

10. Contingency Table


A contingency table displays counts for two categorical variables to show associations.
Example: Gender (male/female) vs shopping preference (online/offline) can be analyzed
using a 2×2 table.
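
One possible way to build such a table from raw responses is to count each
(row, column) pair. The responses below are made up for illustration.

```python
from collections import Counter


def contingency_table(pairs):
    """Nested dict of counts: table[row_category][column_category]."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Hypothetical survey responses: (gender, shopping preference)
responses = [
    ("male", "online"), ("male", "offline"), ("female", "online"),
    ("female", "online"), ("male", "online"), ("female", "offline"),
]
table = contingency_table(responses)
# table["male"] holds the counts for the "male" row of the 2x2 table
```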

11. Purpose of Resistant Line


A resistant line in scatterplots shows the overall trend without being affected by
outliers, because it is fitted from medians rather than means.
Example: In hours studied vs marks, the resistant line still shows a positive trend
even if one student scored abnormally low.
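
A simplified sketch of a median-based (three-group) resistant line: sort by x, take the
medians of the left and right thirds to get the slope, then take the median residual as
the intercept. The equal-thirds split and the lack of iterative polishing are
simplifying assumptions, and the data are made up.

```python
from statistics import median


def resistant_line(xs, ys):
    """Return (intercept, slope) of a simple three-group median line."""
    points = sorted(zip(xs, ys))
    k = len(points) // 3
    if k == 0:
        raise ValueError("need at least three points")
    left, right = points[:k], points[-k:]
    x_left = median([x for x, _ in left])
    y_left = median([y for _, y in left])
    x_right = median([x for x, _ in right])
    y_right = median([y for _, y in right])
    slope = (y_right - y_left) / (x_right - x_left)
    # Median of the residuals y - slope * x gives an outlier-resistant intercept
    intercept = median([y - slope * x for x, y in points])
    return intercept, slope


hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
marks = [12, 14, 16, 18, 2, 22, 24, 26, 28]   # the student with 5 hours scored abnormally low
intercept, slope = resistant_line(hours, marks)  # (10.0, 2.0): the outlier is ignored
```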

12. Standardized Variables


Standardization converts each value into a z-score, z = (x − mean) / SD, so the
standardized variable has mean = 0 and SD = 1.
Example: Converting test scores in maths and English to z-scores allows fair comparison
despite different scoring scales.
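
A short sketch of z-score conversion. Using the population SD (`pstdev`) rather than the
sample SD is an assumption here, and the scores are made up.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert values to z-scores: (x - mean) / SD."""
    m = mean(values)
    sd = pstdev(values)  # population SD; statistics.stdev gives the sample version
    if sd == 0:
        raise ValueError("all values are identical, SD is zero")
    return [(v - m) / sd for v in values]


maths = [40, 50, 60]      # hypothetical marks out of 100
english = [12, 15, 18]    # hypothetical marks out of 20
maths_z = z_scores(maths)
english_z = z_scores(english)
# Both lists are now on the same scale and can be compared directly
```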

13. Interquartile Range (IQR)


Dataset: 56, 65, 72, 75, 80, 82, 85, 88, 90, 95.
Q1 = 72 (median of the lower five values), Q3 = 88 (median of the upper five values)
→ IQR = Q3 − Q1 = 16. It shows the spread of the middle 50% of the data, helping detect
variability and outliers.
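
The calculation above can be checked with a short sketch that uses the median-of-halves
rule for quartiles. Other quartile conventions give slightly different values, and the
1.5 × IQR outlier fences are a common rule of thumb added here for illustration.

```python
from statistics import median


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    xs = sorted(values)
    half = len(xs) // 2
    if half == 0:
        raise ValueError("need at least two values")
    lower = xs[:half]
    upper = xs[-half:]   # the middle value is excluded when n is odd
    return median(lower), median(upper)


data = [56, 65, 72, 75, 80, 82, 85, 88, 90, 95]
q1, q3 = quartiles(data)          # 72 and 88
iqr = q3 - q1                     # 16
low_fence = q1 - 1.5 * iqr        # 48.0
high_fence = q3 + 1.5 * iqr       # 112.0
outliers = [x for x in data if x < low_fence or x > high_fence]  # none here
```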

14. Why transformations are used


Transformations stabilize variance, reduce skewness, and make patterns clearer.
Example: Taking the log of income data compresses extreme values, making comparisons
fairer.
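
A quick illustration of the compression effect, using made-up incomes and a base-10
log (the base is an arbitrary choice here).

```python
import math

incomes = [20_000, 30_000, 40_000, 1_000_000]   # hypothetical, one extreme value
log_incomes = [math.log10(x) for x in incomes]

# On the raw scale the largest income is 50 times the smallest;
# on the log scale the values sit between about 4.3 and 6.
raw_ratio = max(incomes) / min(incomes)
rounded_logs = [round(v, 2) for v in log_incomes]
```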

15. Third variable affecting relationship


Sometimes a third variable changes the relationship between two others.
Example: Ice cream sales and drowning appear correlated, but temperature is the third
variable influencing both.

16. Correlation vs Causation


Correlation means two variables move together, but causation means one directly
influences the other.
Example: Height and weight are correlated; smoking causes lung cancer (causation).
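
Correlation can be quantified with Pearson's r, sketched below on made-up height and
weight data. A value near +1 or −1 only says the variables move together in a straight
line; it says nothing about which one, if either, causes the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


heights = [150, 160, 170, 180, 190]   # hypothetical, in cm
weights = [50, 58, 66, 77, 85]        # hypothetical, in kg
r = pearson_r(heights, weights)       # close to +1: strong positive correlation
```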

17. R Libraries for Visualization


Three popular libraries in R are:

• ggplot2 (for layered graphics),

• lattice (multivariate plots),

• plotly (interactive visualizations).

18. Relationship between two variables


The relationship may be positive, negative, or none.
Example: Hours studied and exam scores show a positive relationship, while stress and
productivity often show a negative relationship.

19. Features in histogram & data type
Histograms show shape, spread, center, and outliers of continuous data.
Example: A histogram of daily temperatures may reveal a roughly bell-shaped (normal)
distribution.

20. Causes of Resistant Lines


Resistant lines arise from regression methods that reduce influence of outliers.
Example: A median-based line fits better when extreme values exist in data.

Unit 2 & 3

21. Basic methods of processing in API


The seven steps of the data visualization process are: Acquire, Parse, Filter, Mine,
Represent, Refine, Interact.
Example: COVID-19 dashboards acquire case data, filter by country, and represent as
charts.

22. Median in even vs odd datasets


For odd n, median = middle value. For even n, it is the average of two middle values.
Example: Dataset [3,5,7] → median = 5; [3,5,7,9] → median = (5+7)/2 = 6.
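
The odd/even rule can be written as a small function (a sketch; Python's built-in
`statistics.median` does the same job).

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2


odd_median = median([3, 5, 7])       # 5
even_median = median([3, 5, 7, 9])   # 6.0
```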

23. Work on EDA


Exploratory Data Analysis (EDA) was pioneered by John Tukey in the 1970s.
He introduced visual tools such as stem-and-leaf plots and boxplots, and emphasized
exploring data visually before formal modeling.

24. Purpose of Histogram in EDA


Histograms show the frequency distribution of data.
Example: In exam marks, a histogram reveals if scores are normally distributed or
skewed.

25. Regression line for prediction


Regression finds the best-fit line between two variables.
Example: Predict marks (Y) from study hours (X) using Y = a + bX, where a is the
intercept and b is the slope. If b is positive, more study hours → higher predicted
marks.
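
A minimal least-squares sketch for Y = a + bX, with b = Sxy / Sxx and
a = mean(Y) − b·mean(X). The hours and marks are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


hours = [1, 2, 3, 4, 5]          # hypothetical study hours
marks = [52, 55, 61, 64, 68]     # hypothetical marks
a, b = fit_line(hours, marks)    # a is about 47.7, b is about 4.1
predicted_marks = a + b * 6      # prediction for 6 hours, about 72.3
```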

26. Median in even vs odd datasets (repeat)
Covered in Q22.

27. Twyman’s Law (repeat)


Covered in Q3.

28. Key contributions in EDA


John Tukey contributed boxplots, resistant statistics, and exploratory visualizations.
Example: Boxplots highlight outliers and spread effectively in social science research.

29. Histogram in EDA (repeat)


Covered in Q24.

30. Regression line predicting marks (repeat)


Covered in Q25.

31. Interpreting contingency table


Compare row percentages and column percentages to see whether the categories are
associated.
Example: Table of gender vs shopping mode shows whether preference differs by
gender.

32. Main use of Chi-square test


Tests whether two categorical variables are independent.
Example: Checking if education level is independent of job satisfaction.
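
A sketch of the chi-square statistic for a contingency table: compare each observed
count with the count expected under independence (row total × column total / n). The
counts are made up, and no continuity correction is applied. For a 2×2 table (1 degree
of freedom) the 5% critical value is about 3.84.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows = education level, columns = satisfied / not satisfied
observed = [[30, 20],
            [20, 30]]
chi2 = chi_square_statistic(observed)   # 4.0, above 3.84, so independence is rejected at 5%
```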

33. Percentage Table


Converts frequencies to percentages, making group comparison easier.
Example: 60% of males vs 40% of females prefer online shopping.
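
A small sketch that turns a table of counts into row percentages. The counts are made
up so that they reproduce the 60% vs 40% example.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row_label, row in table.items():
        total = sum(row.values())
        result[row_label] = {col: 100 * count / total for col, count in row.items()}
    return result


# Hypothetical counts of shopping preference by gender
counts = {
    "male": {"online": 30, "offline": 20},
    "female": {"online": 16, "offline": 24},
}
percent = row_percentages(counts)
# percent["male"]["online"] is 60.0 and percent["female"]["online"] is 40.0
```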

34. Parse & Refine in Visualization


Parse: Structure raw data into usable form.
Refine: Improve design for clarity (e.g., colors, labels).
Example: Parsing survey data, then refining chart with clear labels.

35. Tree Diagram Purpose
A tree diagram shows hierarchical relationships in branching form.
Example: A company’s organization chart or a decision tree in machine learning.

Now all 35 questions are in exam-ready Part A format.

Do you want me to also prepare Part B (5-mark) answers in ~150–200 words each,
or should we first finish a one-page revision sheet of all formulas/diagrams for quick
recall?
