CRASH COURSE DATA SCIENCE
(BEGINNER LEVEL)
DATA COLLECTION
1) Data collection is the process of gathering relevant
information from various sources to analyze and derive insights.
2) In data science, the quality of collected data directly impacts
the accuracy of the resulting analysis and models.
3) A well-defined sampling strategy ensures that collected data
is representative of the larger population.
4) Surveys, interviews, and questionnaires are common
methods for collecting primary data directly from individuals.
5) Web scraping involves extracting information from websites
and is often used to collect data from online sources (see the
sketch after this list).
6) Sensor networks and Internet of Things (IoT) devices
contribute to the collection of real-time data in various
applications.
7) Secondary data refers to data collected by someone else for
a different purpose but can still be useful for analysis.
8) The bias present in collected data can lead to skewed
insights and inaccurate conclusions.
9) Data curation involves organizing, cleaning, and preparing
collected data for analysis.
10) The process of data collection should follow ethical
guidelines to ensure privacy and respect for individuals' rights.
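A minimal web-scraping sketch in Python, assuming the requests and
beautifulsoup4 packages are installed; the URL and the choice of <h2>
tags are hypothetical placeholders:

    import requests
    from bs4 import BeautifulSoup

    # Fetch a (hypothetical) page and fail fast on HTTP errors.
    response = requests.get("https://example.com/articles", timeout=10)
    response.raise_for_status()

    # Parse the HTML and collect the text of every second-level heading.
    soup = BeautifulSoup(response.text, "html.parser")
    titles = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
    print(titles)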
DESCRIPTIVE STATISTICS
1) Descriptive statistics summarize and describe the main features of a dataset.
2) Descriptive statistics can be used to summarize both categorical and
numerical variables.
3) Range is a measure of dispersion that represents the difference between the
maximum and minimum values in a dataset.
4) The range is a measure of dispersion, not of central tendency; it is
the median that represents the middle value in a dataset.
5) The interquartile range (IQR) is a measure of spread that represents the range
between the first quartile (Q1) and the third quartile (Q3).
6) The mode is the value that occurs most frequently in a dataset.
7) The median is less affected by outliers than the mean.
8) The median is less influenced by extreme values in the dataset, making it a
more robust measure of central tendency compared to the mean.
9) Standard deviation measures the average distance of values from the mean.
10) Standard deviation quantifies the dispersion or spread of data by measuring
the average distance between each data point and the mean.
11) Variance is the square of the standard deviation, not its square root.
12) Equivalently, the standard deviation is the square root of the variance.
13) Skewness is a measure of the symmetry of a distribution.
14) Skewness indicates the extent to which a distribution is skewed or
asymmetrical.
15) Correlation measures the strength and direction of the linear relationship
between two numerical variables.
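A short Python sketch of the measures listed above, using pandas and
scipy on a toy series (the numbers are made up):

    import pandas as pd
    from scipy.stats import skew

    data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

    print("mean:", data.mean())
    print("median:", data.median())        # robust to outliers
    print("mode:", data.mode().tolist())   # most frequent value(s)
    print("range:", data.max() - data.min())
    q1, q3 = data.quantile(0.25), data.quantile(0.75)
    print("IQR:", q3 - q1)
    print("std:", data.std())              # spread around the mean
    print("variance:", data.var())         # square of the std
    print("skewness:", skew(data))         # 0 means symmetric

    # Correlation between two numerical variables:
    other = pd.Series([1, 2, 2, 3, 4, 4, 6, 8])
    print("correlation:", data.corr(other))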
EXPLORATORY DATA ANALYSIS
1) Exploratory data analysis involves summarizing and visualizing data to
gain insights and understand patterns.
2) Exploratory data analysis is typically performed after data cleaning and
preprocessing to ensure the data is in a suitable format for analysis.
3) Exploratory data analysis includes identifying outliers (extreme values) and
missing values in the dataset, which can impact the validity of the analysis.
4) Descriptive statistics, such as mean, median, and standard deviation, are
commonly calculated during exploratory data analysis to summarize the
central tendency and dispersion of the data.
5) Exploratory data analysis is a flexible and iterative process, not a
rigid, one-pass procedure.
6) Exploratory data analysis can help detect relationships and correlations
between variables, which can provide valuable insights into the dataset.
7) The primary goal of exploratory data analysis is to gain an understanding
of the data rather than to perform formal hypothesis testing and statistical
inference.
8) Exploratory data analysis can reveal potential data quality issues, such as
inconsistent or erroneous values, and identify data anomalies that require
further investigation.
9) Graphical techniques, such as histograms, scatter plots, and box plots,
are commonly used in exploratory data analysis to visualize the distribution,
relationships, and outliers in the data.
10) Exploratory data analysis is an ongoing process that is revisited as
new questions arise during an analysis.
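A minimal EDA sketch with pandas; the DataFrame and its column names are
hypothetical:

    import pandas as pd

    df = pd.DataFrame({
        "age": [23, 25, 25, 28, 31, 90, None],
        "income": [30, 32, 35, 40, 41, 250, 38],
    })

    print(df.describe())       # central tendency and dispersion
    print(df.isna().sum())     # missing values per column

    # Flag outliers in "income" using the 1.5 * IQR rule.
    q1, q3 = df["income"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = df[(df["income"] < q1 - 1.5 * iqr) |
                  (df["income"] > q3 + 1.5 * iqr)]
    print(outliers)

    print(df.corr(numeric_only=True))  # pairwise correlations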
DATA VISUALISATIONS
1) Data visualisation is the presentation of data in a graphical or pictorial
format
2) Bar charts, line charts, and pie charts are among the most common types
of visualisation
3) A line chart is a data visualisation technique suited to displaying
trends over time
4) A heat map is used to represent the distribution of values with colours
5) A tree map is used to show hierarchical data using nested rectangles
6) A box plot is used to show the distribution of data
7) A choropleth map is used to represent geographic data with colour
variations
8) The points on a scatter plot show the relationship between two
variables
9) In a bar chart, the y-axis shows the dependent variable while the
x-axis shows the independent variable
10) Python is among the most commonly used programming languages for
creating interactive data visualisations
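A sketch of three of these chart types with matplotlib; the data is
made up:

    import matplotlib.pyplot as plt

    months = ["Jan", "Feb", "Mar", "Apr"]
    sales = [10, 14, 9, 17]

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))

    axes[0].bar(months, sales)               # bar chart: categorical comparison
    axes[0].set_title("Bar chart")

    axes[1].plot(months, sales, marker="o")  # line chart: trend over time
    axes[1].set_title("Line chart")

    axes[2].scatter([1, 2, 3, 4], sales)     # scatter: relationship between variables
    axes[2].set_title("Scatter plot")

    plt.tight_layout()
    plt.show()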
DATA CLEANING
1) Imputation techniques are used to fill in missing values
2) Outlier detection is used to identify and handle unusual data points
3) Standardization is used to bring all variables to a common scale
4) Deduplication is used to identify and handle duplicate records
5) Regular Expressions are used for pattern matching and extraction
6) One-Hot Encoding is used for handling categorical variables
7) Scaling is used to re-scale numerical variables
8) Trimming is used to remove unnecessary white spaces
9) Mean imputation: replacing missing values with the mean of the variable
10) Forward filling: filling missing values with the value before them
11) Interpolation: estimating missing values based on the adjacent values
12) Deleting rows: removing rows with missing values
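A sketch of several of the cleaning steps above, using pandas on a toy
DataFrame (the values and column names are made up):

    import pandas as pd

    df = pd.DataFrame({
        "city": [" London", "Paris ", "Paris ", None],
        "temp": [12.0, None, None, 14.0],
    })

    df["city"] = df["city"].str.strip()   # trimming stray white space
    df = df.drop_duplicates()             # deduplication (rows 2 and 3 now match)

    df["temp_mean"] = df["temp"].fillna(df["temp"].mean())  # mean imputation
    df["temp_ffill"] = df["temp"].ffill()                   # forward filling
    df["temp_interp"] = df["temp"].interpolate()            # interpolation

    df = df.dropna(subset=["city"])            # deleting rows with missing values
    df = pd.get_dummies(df, columns=["city"])  # one-hot encoding
    print(df)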
MACHINE LEARNING
1) The two main categories of machine learning models are supervised and
unsupervised.
2) Labeled data in supervised learning provides correct answers for training
the model to learn relationships between input features and output labels.
3) Precision is the ratio of correctly predicted positive observations to the
total predicted positives, while recall is the ratio of correctly predicted
positive observations to the total actual positives (see the metrics sketch
after this list).
4) Accuracy might not be suitable for imbalanced datasets because it can be
dominated by the majority class and may not reflect the true model
performance.
5) Cross-validation assesses a machine learning model's performance by
dividing the dataset into subsets, training and evaluating the model on
different combinations, and providing insight into its generalization
capability (see the cross-validation sketch below).
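A precision/recall sketch with scikit-learn; the labels below are made up:

    from sklearn.metrics import accuracy_score, precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

    print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
    print("recall:", recall_score(y_true, y_pred))        # TP / (TP + FN)
    print("accuracy:", accuracy_score(y_true, y_pred))    # can mislead on imbalanced data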
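A 5-fold cross-validation sketch using scikit-learn's built-in iris dataset
and a logistic-regression model (one illustrative choice, not the only one):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Train and evaluate on 5 different train/test splits.
    scores = cross_val_score(model, X, y, cv=5)
    print("fold accuracies:", scores)
    print("mean accuracy:", scores.mean())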