Foundation of Data Science Imp

The document provides an overview of key concepts in Data Science, including definitions, applications, and essential skills for data scientists. It covers topics such as data preprocessing, exploratory data analysis, statistical foundations, and machine learning, detailing processes like data cleaning, hypothesis testing, and different learning algorithms. Each section includes questions and answers to facilitate understanding of the material.
Unit 1: Introduction to Data Science

1 Mark Questions:

1.​ Define Data Science.​


Ans: Data Science is an interdisciplinary field that uses scientific methods,
algorithms, and systems to extract insights from structured and unstructured
data.​

2.​ Mention one application of Data Science.​


Ans: Fraud detection in banking.​

2 Mark Questions:

1.​ List any two skills required for a data scientist.​


Ans: Programming (e.g., Python), and knowledge of statistics.​

2.​ What is the difference between data and information?​


Ans: Data are raw facts and figures, while information is processed data that is
meaningful.​

3 Mark Questions:

1.​ Explain the lifecycle of data science.​


Ans: The data science lifecycle consists of the following stages, usually carried out in this order:

○​ Data collection​

○​ Data cleaning​

○​ Data exploration​

○​ Modeling​

○​ Evaluation​

○​ Deployment​

4 Mark Questions (Long Answer):


1.​ Explain the role and responsibilities of a data scientist.​
Ans:​
A data scientist analyzes large sets of data to find actionable insights. Their role
includes data cleaning, statistical analysis, model building, and interpreting
results to support decision-making. They work closely with business teams to
identify problems and provide data-driven solutions using tools like Python, R,
SQL, and machine learning algorithms.​

Unit 2: Data Preprocessing and Data Wrangling

1 Mark Questions:

1.​ What is data cleaning?​


Ans: It is the process of detecting and correcting errors in data.​

2.​ Name any one technique used for handling missing data.​
Ans: Mean imputation.​

2 Mark Questions:

1.​ What is normalization?​


Ans: Normalization rescales numeric data to fall within a small, specified range such as [0, 1], so that features measured on different scales become comparable.
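
As an illustration of the answer above, here is a minimal sketch of min-max normalization, one common way to scale values into [0, 1]. The function name `min_max_normalize` and the sample values are assumptions made for this example, not part of the original notes.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


ages = [18, 25, 40, 60]  # made-up sample data
normalized = min_max_normalize(ages)
print(normalized)  # smallest age maps to 0, largest to (approximately) 1
```

Min-max scaling is sensitive to extreme values, because a single outlier stretches the range and squeezes the remaining points together.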

2.​ Define data wrangling.​


Ans: Data wrangling is the process of transforming and mapping raw data into a
more usable format.​

3 Mark Questions:

1.​ Explain outlier detection.​


Ans: Outliers are extreme values that differ markedly from the rest of the data. They can be detected
using:

○​ Z-score method​

○​ Box plot analysis​

○​ IQR method​
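
The Z-score and IQR methods listed above can be sketched as follows. This is an illustrative example only: the function names, the thresholds (3 standard deviations and 1.5 × IQR, both common conventions), and the sample data are assumptions, not taken from the original notes.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]  # made-up sample with one extreme value
print(zscore_outliers(data))
print(iqr_outliers(data))
```

A box plot shows the same IQR rule graphically: points drawn beyond the whiskers are the values the IQR method would flag.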

4 Mark Questions (Long Answer):

1.​ Discuss various techniques for handling missing data.​


Ans:​

○​ Deletion Methods: Remove rows or columns with missing values.​

○​ Imputation Methods: Fill missing values using:​

■​ Mean/median/mode imputation​

■​ Regression imputation​

■​ KNN imputation​

○​ Advanced Techniques: Use ML models to predict missing values.​
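
The deletion and simple imputation methods above can be sketched in plain Python as follows. Missing values are represented by `None`; the helper names `drop_missing` and `impute` and the sample records are hypothetical, chosen only for illustration.

```python
import statistics


def drop_missing(rows):
    """Deletion: keep only the rows (dicts) that have no missing (None) fields."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute(values, strategy="mean"):
    """Imputation: replace None with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    summarize = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = summarize(observed)
    return [fill if v is None else v for v in values]


rows = [{"age": 20, "city": "Pune"}, {"age": None, "city": "Delhi"}]  # made-up records
ages = [20, None, 30, 70, None]  # made-up column with two missing entries

print(drop_missing(rows))
print(impute(ages, "mean"))
print(impute(ages, "median"))
```

Deletion is simple but discards information, while mean imputation keeps every row at the cost of shrinking the variance of the column. Regression and KNN imputation follow the same fill-in pattern but predict each missing value from the other columns.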

Unit 3: Exploratory Data Analysis (EDA)

1 Mark Questions:

1.​ What is EDA?​


Ans: It is the process of analyzing data sets to summarize their main
characteristics, often using summary statistics and visual methods.

2.​ Name any one graphical tool used in EDA.​


Ans: Histogram.​

2 Mark Questions:

1.​ What is the purpose of a box plot?​


Ans: To visualize the distribution and detect outliers in the data.​

2.​ Mention two summary statistics used in EDA.​


Ans: Mean and standard deviation.​

3 Mark Questions:

1.​ Explain the importance of correlation analysis.​


Ans: It helps in identifying the strength and direction of relationships between
variables, which is crucial for model selection and feature engineering.​
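
To make "strength and direction" concrete, here is a minimal sketch of the Pearson correlation coefficient, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship). The function name `pearson_r` and the sample data are assumptions for illustration; the original notes do not name a specific coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]        # made-up hours studied
scores = [52, 58, 65, 70, 78]  # made-up exam scores
print(round(pearson_r(hours, scores), 3))
```

The sign of the result gives the direction of the relationship and its magnitude gives the strength. Note that Pearson correlation only captures linear relationships.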

4 Mark Questions (Long Answer):

1.​ Explain the different types of visualizations used in EDA.​


Ans:​

○​ Histogram: Shows the frequency distribution.​

○​ Box Plot: Displays median, quartiles, and outliers.​

○​ Scatter Plot: Shows relationships between two numeric variables.​

○​ Bar Chart: Used for categorical data.​

○​ Heatmap: Used to show correlation matrices.​

Unit 4: Statistical Foundations

1 Mark Questions:

1.​ Define population.​


Ans: Population is the entire set of individuals or items that we are interested in
studying.​

2.​ What is a hypothesis?​


Ans: A hypothesis is a testable assumption or claim about a population, made so that it can be checked against sample data.

2 Mark Questions:

1.​ Differentiate between population and sample.​


Ans: A population includes all members of a group, while a sample is a subset of
the population.​

2.​ What is p-value?​


Ans: It is the probability of obtaining test results at least as extreme as the
observed results, assuming the null hypothesis is true.​

3 Mark Questions:

1.​ Explain Type I and Type II errors.​
Ans:​

○​ Type I Error (False Positive): Rejecting a true null hypothesis.​

○​ Type II Error (False Negative): Failing to reject a false null hypothesis.​

4 Mark Questions (Long Answer):

1.​ Describe the steps involved in hypothesis testing.​


Ans:​

○​ Formulate null and alternative hypotheses.​

○​ Select significance level (alpha).​

○​ Choose the appropriate test.​

○​ Compute the test statistic.​

○​ Determine the p-value.​

○​ Compare p-value with alpha.​

○​ Make a decision: reject or fail to reject the null hypothesis.​
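
The steps above can be sketched with a two-sided one-sample z-test. This is only one possible choice of test, and it assumes the population standard deviation is known (otherwise a t-test is the usual choice). The function name, the sample values, and the hypothesized mean are all made up for this example.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0 against H1: mean != mu0, with known sigma."""
    n = len(sample)
    sample_mean = sum(sample) / n
    # Test statistic: distance of the sample mean from mu0 in standard-error units.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # p-value: probability of a statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision: reject H0 when the p-value falls below the significance level.
    reject = p_value < alpha
    return z, p_value, reject


sample = [52, 55, 49, 58, 54, 53, 57, 56, 51]  # made-up measurements
z, p_value, reject = one_sample_z_test(sample, mu0=50, sigma=4, alpha=0.05)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Here the hypotheses and alpha are fixed before looking at the data, the test statistic and p-value are computed from the sample, and the final comparison with alpha yields the decision.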

Unit 5: Introduction to Machine Learning

1 Mark Questions:

1.​ Define machine learning.​


Ans: Machine learning is a method of data analysis that automates analytical
model building: the model learns patterns from data instead of following explicitly programmed rules.

2.​ Name one supervised learning algorithm.​


Ans: Linear Regression.​

2 Mark Questions:

1.​ Differentiate between supervised and unsupervised learning.​
Ans: Supervised learning uses labeled data; unsupervised learning uses
unlabeled data.​

2.​ What is overfitting?​


Ans: Overfitting occurs when a model performs well on training data but poorly
on new, unseen data.​

3 Mark Questions:

1.​ Explain the K-Nearest Neighbors algorithm.​


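Ans: KNN is a supervised learning algorithm, most often used for classification, where the output for a new
point is the majority label among the k nearest training points under a chosen distance measure (e.g., Euclidean distance). For regression, the neighbors' values are averaged instead.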
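
A minimal sketch of KNN classification follows, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the toy data set are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    if k < 1 or k > len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])  # nearest first
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote over the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2-D training data forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2), k=3))
print(knn_predict(points, labels, (6, 5), k=3))
```

KNN has no explicit training phase: it stores the data and defers all computation to prediction time. The choice of k matters, since a very small k is sensitive to noise while a very large k blurs class boundaries.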

4 Mark Questions (Long Answer):

1.​ Compare and contrast supervised, unsupervised, and reinforcement learning.​


Ans:​

○ Supervised Learning: Input-output pairs provided; used for classification
and regression.

○ Unsupervised Learning: Only input data; used for clustering and
association.

○ Reinforcement Learning: Learning through trial and error using feedback
from actions (rewards or penalties).

Each technique is suited to different types of problems and data
availability.
