
2 Mark Dev

The document provides a structured format for answering 35 questions related to data exploration, focusing on definitions and examples for each topic. It covers key concepts such as cases, variables, Twyman's Law, Gini coefficient, and various statistical methods and visualizations. Additionally, it offers insights into exploratory data analysis and the roles of data analysts.

Uploaded by

Eughene Yū




DEV – Part A (2 Marks Answers)

Unit 1: Basics of Data Exploration

1. Two basic organizing concepts of data analysis

The two basic concepts are cases (units about which information is collected) and
variables (characteristics measured across cases).
Example: In a school dataset, each student is a case, while age, marks, and attendance
are variables.

2. Cases and Variables

A case is a single observation in a dataset, while a variable is a property or
measurement that varies across cases.
Example: In e-commerce data, each order is a case, and variables include order value,
product type, and payment method.

3. Twyman’s Law

Twyman’s Law states that "any figure that looks interesting or different is usually wrong": the more surprising a result, the more likely it comes from an error in the data or its processing.
Example: If a survey shows a 120% participation rate, it indicates an error rather than a meaningful insight.

4. Responsibilities of a Data Analyst (+ example)

A data analyst collects, cleans, analyzes, and visualizes data for decision-making.
Example: In e-commerce, if sales drop suddenly, the analyst investigates causes like
low website traffic, competitor discounts, or stock issues.

5. Gini Coefficient & Inequality

The Gini coefficient measures inequality in a distribution (such as income), ranging from 0 (perfect equality) to 1 (maximum inequality).
Example: A Gini of 0.20 indicates a relatively equal income distribution, while 0.60 indicates high inequality within the population.
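A minimal sketch of the calculation, assuming the mean-absolute-difference form of the Gini coefficient (one of several equivalent definitions). The income lists are made-up illustrative values. For a finite group of n people this formula peaks at (n - 1)/n, which approaches 1 as the group grows.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference:
    G = sum over all pairs (i, j) of |x_i - x_j|, divided by 2 * n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # everyone has nothing: treat as perfect equality
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2 * n * n * mean)


print(gini([50, 50, 50, 50]))  # identical incomes -> 0.0
print(gini([0, 0, 0, 100]))    # one person holds everything -> 0.75 for n = 4
```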
6. Features visible in Histogram

Histograms reveal shape (normal/skewed), center (mean/median), spread
(range/variance), and outliers.
Example: A bell-shaped histogram of exam marks indicates most students scored near
the average.

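7. Mean vs Hanning

The mean summarizes the central tendency of a whole dataset as a single number, while hanning is a smoothing technique that replaces each value in a sequence with a weighted running average (weights 1/4, 1/2, 1/4) to reduce fluctuations in a time series.
Example: The mean gives the average salary in a firm, while hanning smooths noisy day-to-day stock price movements.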
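The contrast can be sketched in a few lines: the mean collapses the series to one number, while hanning returns a smoothed series of the same length. This assumes the common convention of copying the two end values unchanged. The prices are hypothetical.

```python
def hanning(series):
    """Hanning: replace each interior value with
    1/4 * previous + 1/2 * current + 1/4 * next.
    The two end values are copied unchanged (one common convention)."""
    if len(series) < 3:
        return list(series)
    smoothed = [series[0]]
    for i in range(1, len(series) - 1):
        smoothed.append(0.25 * series[i - 1] + 0.5 * series[i] + 0.25 * series[i + 1])
    smoothed.append(series[-1])
    return smoothed


prices = [100, 104, 98, 107, 101, 110]  # hypothetical noisy daily prices
print(sum(prices) / len(prices))        # mean: one summary number
print(hanning(prices))                  # hanned: a smoother series, same length
```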

8. Easy way to visualize distribution

The simplest ways are a histogram or a boxplot.
Example: A histogram of monthly incomes shows distribution frequency, while a boxplot
highlights the median, quartiles, and outliers.

9. Purpose of smoothing in time series

Smoothing removes random noise from data to highlight overall patterns and trends.
Example: A 3-month moving average of monthly sales damps month-to-month noise, so longer patterns such as the trend and seasonal cycles stand out more clearly.
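A minimal sketch of a simple moving average. The window of 3 and the sales figures are illustrative assumptions. A wider window removes more noise but also flattens shorter cycles.

```python
def moving_average(series, window=3):
    """Simple moving average: the mean of each run of `window` consecutive values.
    The output is shorter than the input by window - 1 values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


monthly_sales = [120, 150, 110, 160, 130, 170]  # hypothetical figures
print(moving_average(monthly_sales, window=3))
```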

10. Contingency Table

A contingency table displays counts for two categorical variables to show associations.
Example: Gender (male/female) vs shopping preference (online/offline) can be analyzed
using a 2×2 table.
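One way to build such a table from raw records is to count each (row category, column category) pair. The records below are hypothetical survey responses.

```python
from collections import Counter

# Hypothetical survey records: (gender, shopping preference)
records = [
    ("male", "online"), ("male", "online"), ("male", "offline"),
    ("female", "online"), ("female", "offline"), ("female", "offline"),
    ("male", "online"), ("female", "offline"),
]


def contingency_table(pairs):
    """Count how often each (row category, column category) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


rows, cols, table = contingency_table(records)
print("", *cols, sep="\t")
for name, row in zip(rows, table):
    print(name, *row, sep="\t")
```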

11. Purpose of Resistant Line

A resistant line summarizes the overall trend in a scatterplot without being strongly affected by outliers, because it is fitted through medians rather than means.
Example: For hours studied vs marks, the resistant line still shows the positive trend even if one student scored abnormally low.
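A sketch of a Tukey-style three-group resistant line. The points are sorted by x and split into thirds. The slope is b = (yR - yL) / (xR - xL), using the group medians, and the intercept is a = [(yL + yM + yR) - b(xL + xM + xR)] / 3. The grouping rule (outer groups of size round(n/3)) is one common convention and is assumed here. The data are hypothetical.

```python
from statistics import median


def resistant_line(xs, ys):
    """Three-group resistant line fitted through group medians.
    Assumes at least 3 points and distinct median x in the outer groups."""
    pts = sorted(zip(xs, ys))
    n = len(pts)
    k = round(n / 3)  # size of each outer group
    groups = [pts[:k], pts[k:n - k], pts[n - k:]]
    mx = [median(x for x, _ in g) for g in groups]
    my = [median(y for _, y in g) for g in groups]
    slope = (my[2] - my[0]) / (mx[2] - mx[0])
    intercept = (sum(my) - slope * sum(mx)) / 3
    return intercept, slope


hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
marks = [45, 50, 55, 60, 65, 70, 75, 10, 85]  # hypothetical; the 8-hour student is an outlier
a, b = resistant_line(hours, marks)
print(f"marks = {a:.1f} + {b:.2f} * hours")
```

For comparison, an ordinary least-squares fit on these same made-up points gives a slope of only 1.5, because the single outlier pulls it down. The median-based slope stays much closer to the pattern followed by the other eight students.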

12. Standardized Variables

Standardization converts each value into a z-score, z = (x - mean) / SD, so the variable has a mean of 0 and a standard deviation of 1.
Example: Converting maths and English test scores to z-scores allows a fair comparison despite their different scoring scales.
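A minimal sketch of the z-score formula, using the population standard deviation (`statistics.pstdev`). Dividing by n - 1 instead is also common. Both mark lists are hypothetical and use different scales.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores: z = (x - mean) / SD (population SD)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


maths = [40, 55, 70, 85, 100]   # hypothetical marks out of 100
english = [12, 15, 18, 21, 24]  # hypothetical marks out of 30
print(standardize(maths))
print(standardize(english))
```

Here a maths mark of 85 and an English mark of 21 get the same z-score. Each sits equally far above its own subject's mean relative to that subject's spread.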

13. Interquartile Range (IQR)

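The IQR is Q3 - Q1, the spread of the middle 50% of the data.
Dataset (n = 10): 56, 65, 72, 75, 80, 82, 85, 88, 90, 95.
Q1 = median of the lower half (56, 65, 72, 75, 80) = 72; Q3 = median of the upper half (82, 85, 88, 90, 95) = 88; IQR = 88 - 72 = 16. This helps assess variability and detect outliers.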
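The worked example can be checked with the median-of-halves method, which reproduces Q1 = 72 and Q3 = 88. Other quartile conventions, such as the linear interpolation used by many software packages, can give slightly different values. The 1.5 × IQR outlier fences are added as a common rule of thumb.

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data
    (the middle value is excluded from both halves when n is odd)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(upper)


data = [56, 65, 72, 75, 80, 82, 85, 88, 90, 95]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)                                # 72 88 16
print("fences:", q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # values outside would be flagged as outliers
```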

14. Why transformations are used

Transformations stabilize variance, reduce skewness, and make patterns clearer.
Example: Taking the log of income data compresses extreme high values, reducing right skew and making comparisons fairer.
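A minimal sketch showing the effect of a log transform on skewness. Skewness is measured here with the moment coefficient (the mean of cubed z-scores). The incomes are hypothetical, with one very high earner.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Moment coefficient of skewness: mean of cubed z-scores (0 for a symmetric shape)."""
    m, sd = mean(values), pstdev(values)
    return mean(((v - m) / sd) ** 3 for v in values)


incomes = [20_000, 25_000, 30_000, 35_000, 40_000, 60_000, 250_000]  # hypothetical
logged = [math.log10(x) for x in incomes]

print(round(skewness(incomes), 2))
print(round(skewness(logged), 2))
```

With these made-up figures the skewness drops after the transform. On the log scale the single large income sits much closer to the rest.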

15. Third variable affecting relationship

A third (confounding) variable can create, hide, or change the apparent relationship between two others.
Example: Ice cream sales and drownings appear correlated, but temperature is a third variable that drives both.

16. Correlation vs Causation

Correlation means two variables move together, but causation means one directly
influences the other.
Example: Height and weight are correlated (correlation), while smoking causes lung cancer (causation).

17. R Libraries for Visualization

Three popular libraries in R are:

• ggplot2 (for layered graphics),

• lattice (multivariate plots),

• plotly (interactive visualizations).

18. Relationship between two variables

The relationship may be positive, negative, or none.
Example: Hours studied and exam scores show a positive relationship, while stress and
productivity often show a negative relationship.
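The direction of a linear relationship can be quantified with Pearson's correlation coefficient r, which ranges from -1 (perfect negative) to +1 (perfect positive). A minimal sketch follows, with hypothetical data. A strong r only shows that the variables move together. It does not show that one causes the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]        # hypothetical: rises with study time
stress = [2, 4, 5, 7, 9]
productivity = [90, 80, 78, 65, 55]  # hypothetical: falls as stress rises
print(round(pearson_r(hours, scores), 2))
print(round(pearson_r(stress, productivity), 2))
```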
19. Features in histogram & data type

Histograms show the shape, spread, center, and outliers of continuous (numerical) data.
Example: A histogram of daily temperatures can reveal whether they are roughly normally distributed.

20. Causes of Resistant Lines

Resistant lines come from fitting methods based on medians rather than means, which limits the influence of outliers.
Example: A median-based line follows the bulk of the data better than a least-squares line when extreme values are present.

Unit 2 & 3

21. Basic methods of processing in API

The seven stages of visualizing data (Ben Fry) are: Acquire, Parse, Filter, Mine, Represent, Refine, Interact.
Example: COVID-19 dashboards acquire case data, filter it by country, and represent it as charts.
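The first five stages can be sketched as a tiny pipeline. This is a hypothetical illustration: the inline CSV stands in for downloaded case data, and the country names and counts are made up. Refine and Interact would normally happen in a charting or dashboard tool and are not shown.

```python
import csv
import io

# Acquire: a hypothetical inline CSV stands in for data fetched from a source
raw = """country,date,new_cases
Country A,2021-05-01,400
Country A,2021-05-02,380
Country B,2021-05-01,60
Country B,2021-05-02,55
"""

# Parse: turn the text into records with a numeric case count
rows = [{**r, "new_cases": int(r["new_cases"])}
        for r in csv.DictReader(io.StringIO(raw))]

# Filter: keep only the country of interest
selected = [r for r in rows if r["country"] == "Country A"]

# Mine: compute a simple summary statistic
total = sum(r["new_cases"] for r in selected)

# Represent: a crude text bar chart (one '#' per 20 cases)
for r in selected:
    print(r["date"], "#" * (r["new_cases"] // 20), r["new_cases"])
print("total:", total)
```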

22. Median in even vs odd datasets

For odd n, median = middle value. For even n, it is the average of two middle values.
Example: Dataset [3,5,7] → median = 5; [3,5,7,9] → median = (5+7)/2 = 6.
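The odd/even rule translates directly into code. Python's standard `statistics.median` follows the same rule. The function below is a hand-written sketch for illustration.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median([3, 5, 7]))     # odd n
print(median([3, 5, 7, 9]))  # even n
```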

23. Work on EDA

Exploratory Data Analysis (EDA) was pioneered by John Tukey in the 1970s.
He introduced visual tools such as stem-and-leaf plots and boxplots, and emphasized exploring data visually before formal modeling.

24. Purpose of Histogram in EDA

Histograms show the frequency distribution of data.
Example: For exam marks, a histogram reveals whether scores are roughly normally distributed or skewed.
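A frequency distribution comes from counting values in equal-width bins. A minimal text-histogram sketch follows. The bin width of 10, the starting point of 30, and the marks are illustrative assumptions.

```python
def histogram(values, bin_width, start):
    """Count values in each bin [start + k*w, start + (k+1)*w), keeping empty bins."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return {start + k * bin_width: counts.get(k, 0)
            for k in range(min(counts), max(counts) + 1)}


marks = [35, 42, 48, 51, 55, 57, 58, 62, 63, 64, 66, 68, 71, 73, 77, 84, 91]  # hypothetical
width = 10
hist = histogram(marks, bin_width=width, start=30)
for lower, count in hist.items():
    print(f"[{lower}, {lower + width})", "*" * count)
```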

25. Regression line for prediction

Regression finds the best-fit (least-squares) line between two variables, which can be used to predict one from the other.
Example: Predict marks (Y) from study hours (X) using Y = a + bX; a positive slope b means more study hours predict higher marks.
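A minimal sketch of fitting Y = a + bX by ordinary least squares, using b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The data are hypothetical.

```python
def least_squares(xs, ys):
    """Fit Y = a + bX by ordinary least squares."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    a = y_mean - b * x_mean
    return a, b


hours = [2, 4, 6, 8, 10]
marks = [50, 58, 66, 72, 84]  # hypothetical
a, b = least_squares(hours, marks)
print(f"marks = {a:.1f} + {b:.2f} * hours")
print("predicted marks for 7 hours:", round(a + b * 7, 1))
```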
26. Median in even vs odd datasets (repeat)

Covered in Q22.

27. Twyman’s Law (repeat)

Covered in Q3.

28. Key contributions in EDA

John Tukey contributed boxplots, resistant statistics, and exploratory visualizations.
Example: Boxplots highlight outliers and spread effectively in social science research.

29. Histogram in EDA (repeat)

Covered in Q24.

30. Regression line predicting marks (repeat)

Covered in Q25.

31. Interpreting contingency table

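Compare row or column percentages to see whether the distribution of one variable differs across the categories of the other (i.e., whether the two are associated).
Example: A table of gender vs shopping mode shows whether shopping preference differs by gender.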

32. Main use of Chi-square test

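It tests whether two categorical variables are independent by comparing the observed counts in a contingency table with the counts expected under independence.
Example: Checking whether education level is independent of job satisfaction.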
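A minimal sketch of the Pearson chi-square statistic, χ² = Σ (O - E)² / E. Each expected count is E = row total × column total / grand total, and the degrees of freedom are (rows - 1)(columns - 1). The counts are hypothetical. In practice, `scipy.stats.chi2_contingency` also returns the p-value.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = three education levels, columns = satisfied / not satisfied
table = [[30, 20],
         [45, 15],
         [60, 10]]
stat, dof = chi_square(table)
print(round(stat, 2), dof)
```

With 2 degrees of freedom, the 5% critical value is about 5.99. A statistic above it, as with these made-up counts, would lead to rejecting independence at that level.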

33. Percentage Table

A percentage table converts frequencies into percentages of the row (or column) totals, making comparison between groups of different sizes easier.
Example: 60% of males vs 40% of females prefer online shopping.
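A minimal sketch of converting counts to row percentages. The counts are hypothetical, chosen so that groups of different sizes (150 males, 125 females) produce the 60% vs 40% split in the example.

```python
def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    return [[100 * count / sum(row) for count in row] for row in table]


# Hypothetical counts: rows = male, female; columns = online, offline
counts = [[90, 60],
          [50, 75]]
for label, row in zip(["male", "female"], row_percentages(counts)):
    print(label, [f"{p:.0f}%" for p in row])
```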

34. Parse & Refine in Visualization

Parse: Structure raw data into usable form.
Refine: Improve design for clarity (e.g., colors, labels).
Example: Parse raw survey responses into a structured table, then refine the resulting chart with clear labels.

35. Tree Diagram Purpose

A tree diagram shows hierarchical relationships in branching form.
Example: A company’s organization chart or a decision tree in machine learning.

