Foundation of Data Science Imp

The document provides an overview of key concepts in Data Science, including definitions, applications, and essential skills for data scientists. It covers topics such as data preprocessing, exploratory data analysis, statistical foundations, and machine learning, detailing processes like data cleaning, hypothesis testing, and different learning algorithms. Each section includes questions and answers to facilitate understanding of the material.

Uploaded by

Tejas suryawanshi

Unit 1: Introduction to Data Science

1 Mark Questions:

1. Define Data Science.

Ans: Data Science is an interdisciplinary field that uses scientific methods,
algorithms, and systems to extract insights from structured and unstructured
data.

2. Mention one application of Data Science.

Ans: Fraud detection in banking.

2 Mark Questions:

1. List any two skills required for a data scientist.

Ans: Programming (e.g., Python) and knowledge of statistics.

2. What is the difference between data and information?

Ans: Data are raw facts and figures, while information is data that has been
processed and organized so that it is meaningful.

3 Mark Questions:

1. Explain the lifecycle of data science.

Ans: It includes:

○ Data collection

○ Data cleaning

○ Data exploration

○ Modeling

○ Evaluation

○ Deployment

4 Mark Questions (Long Answer):

1. Explain the role and responsibilities of a data scientist.
Ans:
A data scientist analyzes large sets of data to find actionable insights. Their role
includes data cleaning, statistical analysis, model building, and interpreting
results to support decision-making. They work closely with business teams to
identify problems and provide data-driven solutions using tools like Python, R,
SQL, and machine learning algorithms.

Unit 2: Data Preprocessing and Data Wrangling

1 Mark Questions:

1. What is data cleaning?

Ans: It is the process of detecting and correcting errors in data.

2. Name any one technique used for handling missing data.
Ans: Mean imputation.

2 Mark Questions:

1. What is normalization?

Ans: Normalization is scaling data to fall within a small, specified range like [0,1].
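A common way to perform this kind of normalization is min-max scaling, x' = (x - min) / (max - min). It maps the smallest value to 0 and the largest to 1. The sketch below is a minimal plain-Python illustration that assumes a non-empty list of numbers. The function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale numbers so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant data: avoid division by zero by mapping everything to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

ages = [18, 25, 40, 60]
print([round(v, 3) for v in min_max_normalize(ages)])  # [0.0, 0.167, 0.524, 1.0]
```

Mapping constant data to `new_min` is one arbitrary way to handle the zero-range case. Libraries may treat that case differently.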

2. Define data wrangling.

Ans: Data wrangling is the process of transforming and mapping raw data into a
more usable format.

3 Mark Questions:

1. Explain outlier detection.

Ans: Outliers are extreme values that differ markedly from the rest of the data.
They can be detected using:

○ Z-score method

○ Box plot analysis

○ IQR method
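The Z-score and IQR methods can be sketched in plain Python as follows. This illustrative version assumes a list of numbers, and the function names are hypothetical. The thresholds (|z| > 3, and fences at 1.5 × IQR beyond the quartiles) are common conventions, not fixed rules. Box plot whiskers are commonly drawn at the same 1.5 × IQR fences, so points plotted beyond the whiskers correspond to the IQR outliers.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / sd > threshold]

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))                    # [95]
print(zscore_outliers(data, threshold=2.0))  # [95]
```

The example lowers the z threshold because this sample is very small. A single extreme value inflates the standard deviation, and with the population standard deviation |z| can never exceed sqrt(n - 1), about 2.45 for n = 7. The IQR method is less affected because quartiles are robust to extreme values.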
4 Mark Questions (Long Answer):

1. Discuss various techniques for handling missing data.

Ans:

○ Deletion Methods: Remove rows or columns with missing values.

○ Imputation Methods: Fill missing values using:

■ Mean/median/mode imputation

■ Regression imputation

■ KNN imputation

○ Advanced Techniques: Use ML models to predict missing values.
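The deletion and mean/median imputation methods can be illustrated with a small plain-Python sketch. It assumes records stored as dictionaries, with `None` marking a missing value. The data and function names are made up for illustration. In practice, pandas (`DataFrame.dropna`, `DataFrame.fillna`) and scikit-learn (`SimpleImputer`, `KNNImputer`) provide these operations.

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": None},
    {"age": 35, "income": 58000},
]

def drop_missing(records):
    """Deletion method: keep only rows that have no missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def impute(records, column, strategy="mean"):
    """Imputation method: fill missing values in one column with its mean, median, or mode."""
    observed = [r[column] for r in records if r[column] is not None]
    fill_functions = {"mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}
    fill = fill_functions[strategy](observed)
    # Build new dicts so the original records are left unchanged.
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]

complete = drop_missing(rows)                                  # 2 rows remain
filled = impute(impute(rows, "age", "mean"), "income", "median")
print(filled[1]["age"], filled[2]["income"])
```

Deletion is simple but discards information (here, half the rows). Imputation keeps every row but can understate the variability of the filled column.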

Unit 3: Exploratory Data Analysis (EDA)

1 Mark Questions:

1. What is EDA?

Ans: It is the process of analyzing data sets to summarize their main
characteristics.

2. Name any one graphical tool used in EDA.

Ans: Histogram.

2 Mark Questions:

1. What is the purpose of a box plot?

Ans: To visualize the distribution and detect outliers in the data.

2. Mention two summary statistics used in EDA.

Ans: Mean and standard deviation.

3 Mark Questions:

1. Explain the importance of correlation analysis.

Ans: It helps in identifying the strength and direction of relationships between
variables, which is crucial for model selection and feature engineering.
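Correlation is most often measured with Pearson's coefficient r, which ranges from -1 to +1. The sign gives the direction of a linear relationship and the magnitude gives its strength. A minimal sketch follows, assuming two equal-length numeric lists. The name `pearson_r` and the example data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 68, 70]
print(round(pearson_r(hours, scores), 3))  # 0.986: strong positive relationship
```

Pearson's r captures only linear association. A value near 0 does not rule out a strong non-linear relationship, and correlation alone does not establish causation.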

4 Mark Questions (Long Answer):

1. Explain the different types of visualizations used in EDA.

Ans:

○ Histogram: Shows the frequency distribution.

○ Box Plot: Displays median, quartiles, and outliers.

○ Scatter Plot: Shows relationships between two numeric variables.

○ Bar Chart: Used for categorical data.

○ Heatmap: Used to show correlation matrices.

Unit 4: Statistical Foundations

1 Mark Questions:

1. Define population.

Ans: Population is the entire set of individuals or items that we are interested in
studying.

2. What is a hypothesis?

Ans: A hypothesis is a testable assumption or claim, typically about a population
parameter, made for the purpose of statistical testing.

2 Mark Questions:

1. Differentiate between population and sample.

Ans: A population includes all members of a group, while a sample is a subset of
the population.

2. What is p-value?

Ans: It is the probability of obtaining test results at least as extreme as the
observed results, assuming the null hypothesis is true.

3 Mark Questions:
1. Explain Type I and Type II errors.
Ans:

○ Type I Error (False Positive): Rejecting a true null hypothesis.

○ Type II Error (False Negative): Failing to reject a false null hypothesis.

4 Mark Questions (Long Answer):

1. Describe the steps involved in hypothesis testing.

Ans:

○ Formulate null and alternative hypotheses.

○ Select the significance level (alpha).

○ Choose the appropriate test.

○ Compute the test statistic.

○ Determine the p-value.

○ Compare the p-value with alpha.

○ Make a decision: reject or fail to reject the null hypothesis.
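These steps can be traced in a one-sample, two-sided z-test, sketched below with Python's standard `statistics.NormalDist`. The sketch assumes the population standard deviation σ is known. When it is not, a t-test (for example `scipy.stats.ttest_1samp`) is the usual choice instead. The function name and the numbers in the example are illustrative only.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 against H1: mu != mu0."""
    # Steps 1-2: the hypotheses (via mu0) and alpha are supplied by the caller.
    # Step 3: a z-test is appropriate here because sigma is assumed known.
    # Step 4: compute the test statistic.
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Step 5: two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Steps 6-7: compare the p-value with alpha and decide.
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision

z, p, decision = one_sample_z_test(sample_mean=52, mu0=50, sigma=8, n=64)
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")  # z = 2.00, p = 0.0455, reject H0
```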

Unit 5: Introduction to Machine Learning

1 Mark Questions:

1. Define machine learning.

Ans: Machine learning is a method of data analysis that automates analytical
model building: algorithms learn patterns from data instead of following
explicitly programmed rules.

2. Name one supervised learning algorithm.

Ans: Linear Regression.

2 Mark Questions:
1. Differentiate between supervised and unsupervised learning.
Ans: Supervised learning uses labeled data; unsupervised learning uses
unlabeled data.

2. What is overfitting?

Ans: Overfitting occurs when a model performs well on training data but poorly
on new, unseen data.

3 Mark Questions:

1. Explain the K-Nearest Neighbors algorithm.

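Ans: KNN is a simple, instance-based algorithm, used mainly for classification, in
which a new point is assigned the majority label among its k nearest training
points (distance is typically Euclidean). It can also be used for regression by
averaging the values of the k nearest neighbors.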
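Below is a minimal KNN classifier in plain Python. It assumes numeric feature tuples and Euclidean distance via `math.dist` (Python 3.8+). The function name and toy data are illustrative.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # most_common breaks ties in first-seen order, i.e. in favour of nearer points.
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2, 2)))  # A
print(knn_predict(points, labels, (6, 5)))  # B
```

KNN does no training beyond storing the data, so every prediction scans all training points. Choosing an odd k for two-class problems helps avoid tied votes.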

4 Mark Questions (Long Answer):

1. Compare and contrast supervised, unsupervised, and reinforcement learning.

Ans:

○ Supervised Learning: Input-output pairs provided; used for classification and regression.

○ Unsupervised Learning: Only input data; used for clustering and association.

○ Reinforcement Learning: Learning through trial and error using feedback from actions (rewards or penalties).

Each technique is suited to different types of problems and data availability.
