Data Analytics Questions and Solutions

The document discusses key concepts in data analytics, including definitions of elements, variables, and data categorization. It explains levels of measurement, hypothesis testing, data wrangling processes, and highlights three Python data visualization libraries. Additionally, it distinguishes between data lakes and data warehouses and describes the role of Apache Spark in big data processing.

Uploaded by

mani manish
Copyright
© All Rights Reserved

Data Analytics: Important Questions and Solutions

Q1. Define elements, variables, and data categorization.

Elements are individual entities on which data is collected, like people or products.
Variables are characteristics of elements, such as age, height, or color.

Data categorization:
- Qualitative: Descriptive (e.g., gender, color)
- Quantitative:
  - Discrete: Countable (e.g., number of books)
  - Continuous: Measurable (e.g., weight)
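
As a rough illustration of the categories above, the sketch below sorts the variables of one element by the Python type of their values. The records, the variable names, and the `categorize` helper are hypothetical, and the type-based rule is only a heuristic: a numeric ID code, for instance, is still qualitative.

```python
# Hypothetical records: each dict is one element, each key is a variable.
elements = [
    {"name": "Asha", "color": "red", "books": 3, "weight_kg": 61.5},
    {"name": "Ravi", "color": "blue", "books": 7, "weight_kg": 72.0},
]


def categorize(value):
    """Heuristic only: guess the data category from the Python type."""
    if isinstance(value, (bool, str)):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    return "unknown"


# Categorize every variable using the first element as a sample.
categories = {var: categorize(val) for var, val in elements[0].items()}
for var, cat in categories.items():
    print(f"{var}: {cat}")
```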

Q2. Explain the levels of measurement with examples.

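1. Nominal - Categories with no inherent order (e.g., gender)
2. Ordinal - Ordered categories whose gaps are not necessarily equal (e.g., rankings)
3. Interval - Numeric scale with equal intervals but no true zero (e.g., temperature in Celsius or Fahrenheit)
4. Ratio - Numeric scale with a true zero, so ratios are meaningful (e.g., income)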
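
The level of measurement determines which summaries make sense. The sketch below uses made-up values and Python's standard `statistics` module to show one typical summary per level: the mode for nominal data, the median rank for ordinal data, differences for interval data, and ratios for ratio data.

```python
from statistics import median, mode

# Nominal: only counting categories is meaningful, so use the mode.
colors = ["red", "blue", "red", "green"]
most_common_color = mode(colors)

# Ordinal: order matters, so the median of the ranks is meaningful.
size_order = {"small": 0, "medium": 1, "large": 2}
rank_to_size = {rank: size for size, rank in size_order.items()}
sizes = ["small", "large", "medium", "small", "medium"]
median_size = rank_to_size[median(size_order[s] for s in sizes)]

# Interval: differences are meaningful, ratios are not.
# 20 degrees C is 10 degrees warmer than 10 degrees C, but not "twice as hot":
# on the Kelvin scale (which has a true zero) the ratio is only about 1.035.
temps_c = [10.0, 20.0]
temp_difference = temps_c[1] - temps_c[0]
kelvin_ratio = (temps_c[1] + 273.15) / (temps_c[0] + 273.15)

# Ratio: a true zero exists, so "twice as much" is meaningful.
incomes = [30000, 60000]
income_ratio = incomes[1] / incomes[0]

print(most_common_color, median_size, temp_difference,
      round(kelvin_ratio, 3), income_ratio)
```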

Q3. What is hypothesis testing? Mention two examples.

Hypothesis testing is a statistical method that uses sample data to decide whether to reject a null hypothesis in favor of an alternative hypothesis, or to fail to reject it.

Examples:
- Testing whether the average score of students is above 70.
- Comparing sales performance between two regions using a two-sample t-test.
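
The first example can be worked through by hand as a one-sample, one-sided t-test. The sketch below is a minimal illustration under stated assumptions: the ten scores are invented, the significance level is assumed to be 0.05, and the critical value 1.833 is the standard t-table entry for 9 degrees of freedom at that level.

```python
from math import sqrt
from statistics import mean, stdev

# Invented sample of ten student scores.
scores = [72, 75, 68, 80, 77, 74, 71, 79, 73, 76]

mu0 = 70            # H0: population mean is 70; H1: mean is above 70
t_critical = 1.833  # one-sided t-table value for df = 9, alpha = 0.05

n = len(scores)
sample_mean = mean(scores)
standard_error = stdev(scores) / sqrt(n)  # stdev uses the n - 1 denominator

# Test statistic: how many standard errors the sample mean lies above mu0.
t_statistic = (sample_mean - mu0) / standard_error
reject_null = t_statistic > t_critical

print(f"mean = {sample_mean}, t = {t_statistic:.3f}, reject H0: {reject_null}")
```

With a library such as SciPy available, the same test would normally be run through its t-test functions rather than a hand-coded table lookup.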

Q4. Describe the process of data wrangling.

Data wrangling includes:
1. Gathering Data - Collecting data from sources
2. Assessing Data - Checking for issues
3. Cleaning Data - Fixing or removing errors or inconsistencies
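
The three steps can be sketched with the standard library alone. The CSV text, column names, and cleaning rules below are hypothetical; here rows with a missing age are dropped, though other policies (such as filling in a default) are equally valid.

```python
import csv
import io

# 1. Gathering: read rows from a source (an in-memory CSV stands in for a file).
raw_csv = """name,age,city
Asha,29,Pune
Ravi,,Delhi
Asha,29,Pune
Meera,41, mumbai
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# 2. Assessing: count missing values and exact duplicate rows.
missing_age = sum(1 for row in rows if not row["age"])
duplicates = len(rows) - len({tuple(row.items()) for row in rows})

# 3. Cleaning: drop duplicates and rows without an age, fix types and spacing.
seen = set()
cleaned = []
for row in rows:
    key = tuple(row.items())
    if key in seen or not row["age"]:
        continue
    seen.add(key)
    cleaned.append({
        "name": row["name"].strip(),
        "age": int(row["age"]),
        "city": row["city"].strip().title(),
    })

print(f"missing ages: {missing_age}, duplicate rows: {duplicates}")
print(cleaned)
```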

Q5. List and explain any three data visualization libraries in Python.

1. Matplotlib - Basic plotting library for line, bar, and scatter plots.
2. Seaborn - Built on Matplotlib, supports statistical visualizations like boxplots and heatmaps.
3. Plotly - Interactive plots with zoom and hover support.
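
As a minimal sketch of the first library, the example below draws a line plot with Matplotlib. It assumes Matplotlib is installed; the month and sales values are made up, and the non-interactive "Agg" backend is selected so the script also runs on machines without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no GUI window is needed
import matplotlib.pyplot as plt

# Made-up data for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")  # one line with a marker at each point
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales")

# Save the figure as PNG into memory; pass a filename instead to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Seaborn and Plotly follow a similar pattern of passing data to a plotting function, but each has its own API.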

Q6. What are data lakes and how do they differ from data warehouses?

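Data lakes store raw data of any kind (structured, semi-structured, or unstructured) in its original format, at any scale.

Difference:
- Data Warehouses store structured data, shaped to a predefined schema before loading, for analysis.
- Data Lakes keep all types of data as-is and apply structure later, when the data is processed.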
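
The contrast can be imitated on a small scale. In the sketch below, an in-memory SQLite table stands in for a warehouse (structure is fixed before data is loaded) and a plain list of raw strings stands in for a lake (structure is applied only when reading). The table, the records, and the `read_sales` helper are hypothetical, and real systems are far larger than this toy.

```python
import json
import sqlite3

# Warehouse-style: the schema is declared first, and rows must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 95.5)],
)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Lake-style: anything is stored as-is, whatever its shape.
raw_objects = [
    '{"region": "north", "amount": 120.0}',
    '{"event": "click", "page": "/home"}',
    "free-text log line: user 42 logged in",
]


def read_sales(objects):
    """Apply structure at read time: keep only JSON records with an amount."""
    for blob in objects:
        try:
            record = json.loads(blob)
        except json.JSONDecodeError:
            continue  # not JSON; left in the lake for other uses
        if isinstance(record, dict) and "amount" in record:
            yield record


lake_sales = list(read_sales(raw_objects))
print(total, lake_sales)
```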

Q7. Explain the role of Spark in big data processing.

Apache Spark is a distributed computing engine that processes big data in memory where possible, which often makes it much faster than disk-based tools like Hadoop MapReduce.

It supports:
- Batch processing
- Stream processing (Structured Streaming)
- Machine learning (MLlib)
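
The batch-processing idea can be pictured with a word count. The sketch below is plain Python, not Spark code: it only imitates, on one machine, the pattern an engine like Spark distributes across a cluster, where data is split into partitions, each partition is processed independently, and the partial results are then merged. The partitions and the `count_words` helper are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical data already split into partitions, as a cluster would hold it.
partitions = [
    ["spark processes data", "data in memory"],
    ["spark supports streaming"],
]


def count_words(lines):
    """Per-partition step: count the words seen in this partition only."""
    return Counter(word for line in lines for word in line.split())


# "Map" phase: each partition is processed on its own (in parallel on a cluster).
partial_counts = [count_words(partition) for partition in partitions]

# "Reduce" phase: merge the partial results into one total, kept in memory.
total_counts = reduce(lambda left, right: left + right, partial_counts, Counter())

print(total_counts.most_common(2))
```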
