Credit goes to www.scribd.com


Unit 3 Data Science

The document outlines the phases of a data science project, starting from business understanding to feedback collection. Each phase includes definitions, key objectives, tasks involved, and examples, emphasizing the importance of aligning data science efforts with business goals. The process covers everything from identifying business problems to deploying models and continuously improving them based on feedback.

Uploaded by sanyogbiswal22
Copyright © All Rights Reserved

UNIT 3 DATA SCIENCE

🔹 1. Business Understanding
✅ Definition:

This is the first and most critical phase of a data science project. It involves gaining a clear
understanding of the business context, goals, objectives, and problems to be solved.

✅ Key Objectives:

• Understand the domain and industry.
• Translate business problems into data science problems.
• Define success criteria from a business perspective.

✅ Tasks Involved:

• Meeting stakeholders (e.g., marketing, finance, sales)
• Identifying pain points (e.g., churn, low sales)
• Formulating problem statements (e.g., “Predict customer churn”)
• Understanding constraints (budget, time, data availability)

✅ Example:

If a retail company has declining sales, the business problem might be:
“Identify key reasons for sales drop and predict future sales to improve stock management.”

🔹 2. Analytics Approach
✅ Definition:

This phase outlines the analytical techniques and methodologies suitable for solving the
defined business problem.

✅ Key Objectives:

• Decide whether the problem is classification, regression, clustering, recommendation, etc.
• Select modeling techniques: statistical models, machine learning, deep learning, etc.
• Define performance metrics: accuracy, precision, recall, RMSE, etc.

✅ Tasks Involved:

• Mapping the problem to candidate algorithms
• Selecting evaluation strategies (cross-validation, A/B testing)
• Considering data types (structured/unstructured)

✅ Example:

If the task is to identify customer churn, logistic regression or decision trees could be
appropriate approaches.
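
The step of mapping a target column to a problem type can be sketched in Python. This is a rough heuristic under stated assumptions, not a rule from these notes: `infer_problem_type`, the `max_classes=10` cut-off, and the `CANDIDATES` table are hypothetical names and values chosen for illustration.

```python
from typing import Optional, Sequence

def infer_problem_type(target: Optional[Sequence], max_classes: int = 10) -> str:
    """Suggest a problem type from a target column (hypothetical heuristic)."""
    if target is None:
        # No labelled outcome to predict: group similar records instead.
        return "clustering"
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column has no non-missing values")
    # Text or boolean labels are categories.
    if all(isinstance(v, (str, bool)) for v in values):
        return "classification"
    # A few distinct whole-number codes (e.g. 0/1) are also treated as categories.
    if all(float(v).is_integer() for v in values) and len(set(values)) <= max_classes:
        return "classification"
    # Otherwise assume a continuous quantity such as sales or revenue.
    return "regression"

# Hypothetical lookup of candidate algorithms and metrics per problem type.
CANDIDATES = {
    "classification": (["logistic regression", "decision tree"], ["accuracy", "precision", "recall"]),
    "regression": (["linear regression", "random forest"], ["RMSE"]),
    "clustering": (["k-means"], ["silhouette score"]),
}

# A churn label stored as "yes"/"no" maps to classification.
algorithms, metrics = CANDIDATES[infer_problem_type(["yes", "no", "no", "yes"])]
```

For the churn example this returns the logistic regression and decision tree options named above. In a real project the choice would still be confirmed against the business question, not inferred from the data type alone.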

🔹 3. Data Requirements
✅ Definition:

In this step, the data scientist defines what kind of data is needed to perform the analysis or
modeling.

✅ Key Objectives:

• Identify data sources and required attributes.
• Determine volume, velocity, and variety of data.
• Understand data granularity and frequency.

✅ Tasks Involved:

• Listing required data columns (e.g., age, purchase history)
• Identifying data sampling or aggregation needs
• Noting privacy/security constraints

✅ Example:

For customer behavior analysis, data might be required on demographics, browsing history,
purchase history, and support tickets.
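
A list of data requirements like this one can be kept as a small, checkable spec. The sketch below is illustrative only: the `DataRequirement` fields, the column names, and the source labels are assumptions, not a standard format.

```python
from dataclasses import dataclass

@dataclass
class DataRequirement:
    """One required attribute agreed with stakeholders (fields are illustrative)."""
    name: str
    source: str                 # where the data is expected to come from
    granularity: str            # level of detail, e.g. per customer or per transaction
    sensitive: bool = False     # needs privacy/security handling

# Hypothetical requirements for the customer behaviour example.
REQUIREMENTS = [
    DataRequirement("age", "CRM", "per customer", sensitive=True),
    DataRequirement("browsing_history", "web analytics", "per session"),
    DataRequirement("purchase_history", "orders database", "per transaction"),
    DataRequirement("support_tickets", "help desk", "per ticket"),
]

def check_coverage(requirements, available_columns):
    """Return (required names that are missing, sensitive names that are present)."""
    available = set(available_columns)
    missing = [r.name for r in requirements if r.name not in available]
    sensitive = [r.name for r in requirements if r.sensitive and r.name in available]
    return missing, sensitive

missing, sensitive = check_coverage(REQUIREMENTS, ["age", "purchase_history", "email"])
# missing == ["browsing_history", "support_tickets"], sensitive == ["age"]
```

Writing the requirements down this way makes gaps visible before collection starts and flags which attributes need privacy review.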

🔹 4. Data Collection
✅ Definition:

This is the process of gathering the required data from internal or external sources.

✅ Key Objectives:

• Acquire data from reliable sources.
• Ensure data accessibility and availability.
• Maintain data integrity during collection.

✅ Tasks Involved:

• Extracting data from databases, APIs, and files, or through web scraping
• Working with data engineers to access data lakes/warehouses
• Storing data securely

✅ Example:

Collect customer transaction data from SQL databases and social media engagement data via
APIs.
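
The SQL half of this example can be sketched with Python's built-in `sqlite3` module. An in-memory database and a hypothetical `transactions(customer_id, order_date, amount)` table stand in for the real warehouse; the API half is omitted because it depends on the specific platform's endpoints and authentication.

```python
import sqlite3

def extract_transactions(conn: sqlite3.Connection, since: str) -> list[dict]:
    """Per-customer order counts and totals on or after `since` (ISO 'YYYY-MM-DD')."""
    query = """
        SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total_spent
        FROM transactions
        WHERE order_date >= ?
        GROUP BY customer_id
        ORDER BY customer_id
    """
    conn.row_factory = sqlite3.Row          # rows can be read by column name
    # The date is passed as a parameter, never formatted into the SQL string.
    return [dict(row) for row in conn.execute(query, (since,))]

# In-memory stand-in for the real source system (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2024-01-05", 40.0), (1, "2024-02-10", 25.5),
     (2, "2023-12-30", 99.0), (2, "2024-03-01", 10.0)],
)
rows = extract_transactions(conn, "2024-01-01")
# [{'customer_id': 1, 'n_orders': 2, 'total_spent': 65.5},
#  {'customer_id': 2, 'n_orders': 1, 'total_spent': 10.0}]
```

Against a production database the connection would change, and possibly the placeholder style too, since DB-API drivers differ (`sqlite3` uses `?`).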

🔹 5. Data Understanding
✅ Definition:

This phase involves exploratory data analysis (EDA) to understand data quality, patterns,
anomalies, and relationships.

✅ Key Objectives:

• Understand distributions, missing values, and data types.
• Identify trends, outliers, or inconsistencies.
• Assess whether the data is suitable for analysis.

✅ Tasks Involved:

• Statistical summaries (mean, median, standard deviation)
• Visualization (histograms, box plots, scatter plots)
• Correlation analysis

✅ Example:

Plotting customer age distribution, checking if older users buy more, spotting missing income
values.

🔹 6. Data Preparation
✅ Definition:

Also known as data wrangling or data preprocessing, this phase prepares data for analysis
by cleaning and transforming it.

✅ Key Objectives:

• Improve data quality.
• Format data correctly for algorithms.
• Create new features or variables.

✅ Tasks Involved:

• Handling missing values (imputation or removal)
• Removing duplicates
• Feature scaling (normalization, standardization)
• Encoding categorical variables
• Creating derived variables

✅ Example:

Convert gender column to 0/1, scale income values, and fill missing ages using median age.
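
The three steps in this example, plus duplicate removal, can be sketched in plain Python. The column names, the F -> 0 / M -> 1 coding, and the choice of min-max normalization (one of the two scaling options listed above) are assumptions; the sketch also assumes `income` has no missing values.

```python
import statistics

def prepare(rows):
    """Drop duplicate rows, encode gender as 0/1, fill missing ages with the
    median age, and min-max scale income to [0, 1]. Column names and the
    F -> 0 / M -> 1 coding are hypothetical."""
    # Remove exact duplicates, keeping the first occurrence and the row order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    median_age = statistics.median(r["age"] for r in unique if r["age"] is not None)
    lo = min(r["income"] for r in unique)
    hi = max(r["income"] for r in unique)

    for r in unique:
        r["gender"] = {"F": 0, "M": 1}[r["gender"]]       # encode categorical column
        if r["age"] is None:
            r["age"] = median_age                         # median imputation
        r["income"] = (r["income"] - lo) / (hi - lo) if hi > lo else 0.0  # min-max scaling
    return unique

customers = [
    {"gender": "F", "age": 30, "income": 40000},
    {"gender": "M", "age": None, "income": 60000},
    {"gender": "F", "age": 50, "income": 80000},
    {"gender": "F", "age": 30, "income": 40000},   # exact duplicate of the first row
]
prepared = prepare(customers)
# [{'gender': 0, 'age': 30, 'income': 0.0},
#  {'gender': 1, 'age': 40.0, 'income': 0.5},
#  {'gender': 0, 'age': 50, 'income': 1.0}]
```

In a real pipeline the median and the min/max would be computed on the training split only and then reused for test and production data, so that no information leaks from the evaluation data into the model.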

🔹 7. Modeling
✅ Definition:

This is the phase where machine learning or statistical models are trained using the
prepared data.

✅ Key Objectives:

• Select suitable algorithms.
• Train models using training data.
• Tune model hyperparameters for the best performance.

✅ Tasks Involved:

• Model training and testing
• Cross-validation
• Hyperparameter tuning (e.g., using grid search)
• Comparing different models

✅ Example:

Train a decision tree and a random forest to predict customer churn and compare their accuracy.
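
Cross-validation and grid search can be sketched without any libraries. Note the swap: the example above uses a decision tree and a random forest, which in Python usually come from scikit-learn (`DecisionTreeClassifier`, `RandomForestClassifier`, tuned with `GridSearchCV`); to keep this sketch self-contained it tunes a k-nearest-neighbours classifier instead, with `k` as the hyperparameter. The churn data is synthetic and the feature names are hypothetical.

```python
import math
import random

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` folds; row i goes to fold i % folds."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds

def grid_search(data, k_values, folds=5):
    """Try every candidate k and keep the one with the best cross-validated accuracy."""
    scores = {k: cross_val_accuracy(data, k, folds) for k in k_values}
    return max(scores, key=scores.get), scores

# Synthetic churn data (hypothetical features): churn is likelier with many
# complaints and short tenure, plus random noise. Rows are generated in random
# order, so index-based folds behave like random folds.
rng = random.Random(0)
data = []
for _ in range(200):
    tenure = rng.uniform(0, 60)        # months as a customer
    complaints = rng.uniform(0, 5)     # support complaints per month
    churned = int(complaints - tenure / 20 + rng.gauss(0, 0.5) > 1)
    # Scale both features to [0, 1] so neither dominates the distance.
    data.append(((tenure / 60, complaints / 5), churned))

# Odd k values avoid tied votes between the two classes.
best_k, scores = grid_search(data, k_values=[1, 3, 5, 7, 9])
```

The same loop applies to any model: replace `knn_predict` with another training-and-prediction routine and the grid with that model's hyperparameters (for a decision tree, for example, its maximum depth).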

🔹 8. Evaluation
✅ Definition:

This phase assesses how well the model performs against the defined metrics and business expectations.

✅ Key Objectives:

• Check whether the model solves the business problem effectively.
• Measure performance on unseen (test) data.
• Verify with stakeholders whether the results are actionable.

✅ Tasks Involved:

• Calculating metrics (accuracy, precision, recall, AUC, RMSE, etc.)
• Confusion matrix analysis
• Business validation: “Is this useful?”

✅ Example:

Your churn model achieves 85% accuracy, but the business asks, “Can it detect high-value customers who might leave?”
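
Both the standard metrics and the stakeholder's question can be answered from confusion-matrix counts. The labels below are made-up toy values chosen so that overall accuracy is 85%; 1 means the customer churned, and the `high_value` flags are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means the customer churned."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,   # share of churners caught
    }

def segment_recall(y_true, y_pred, in_segment):
    """Recall computed only over customers where in_segment is True."""
    caught = missed = 0
    for t, p, s in zip(y_true, y_pred, in_segment):
        if s and t == 1:
            caught += (p == 1)
            missed += (p == 0)
    return caught / (caught + missed) if caught + missed else 0.0

# Toy labels for 20 customers: 4 churned, and 2 of those are high-value.
y_true = [1, 1, 1, 1] + [0] * 16
y_pred = [0, 0, 1, 1, 1] + [0] * 15
high_value = [True, True, False, False, False, True] + [False] * 14

overall = classification_metrics(y_true, y_pred)   # accuracy 0.85, precision ≈ 0.67, recall 0.5
high_value_recall = segment_recall(y_true, y_pred, high_value)   # 0.0
```

Here the high accuracy mostly reflects the many customers who stay; the model still misses every high-value churner, which is exactly the gap the business question exposes.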

🔹 9. Deployment
✅ Definition:

The process of putting the model into production so it can be used in real-world scenarios.

✅ Key Objectives:

• Make the model accessible (as an app, an API, or embedded in software).
• Ensure integration with existing systems.
• Plan for scalability and monitoring.

✅ Tasks Involved:

• Exporting and hosting the model (Flask, FastAPI, AWS, Azure)
• Creating dashboards or user interfaces
• Scheduling model retraining if needed

✅ Example:

Deploy the churn prediction model in a CRM tool so sales teams get churn alerts for follow-up.
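
Exporting and serving a model can be sketched with the standard library alone; Flask or FastAPI would normally replace the `http.server` part. The model is assumed to be a logistic regression stored as JSON weights, and the feature names, the 0.7 alert threshold, and all function names are hypothetical. A CRM integration would POST a customer's features and read back the `alert` flag.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

def export_model(model: dict, path: str) -> None:
    """Save model parameters (here: logistic-regression weights) as JSON."""
    with open(path, "w") as f:
        json.dump(model, f)

def load_model(path: str) -> dict:
    with open(path) as f:
        return json.load(f)

def churn_probability(model: dict, features: dict) -> float:
    """Logistic function of bias + weighted features; absent features count as 0."""
    z = model["bias"] + sum(w * features.get(name, 0.0) for name, w in model["weights"].items())
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)            # numerically safe form for negative z
    return e / (1 + e)

def make_handler(model: dict, threshold: float = 0.7):
    """Build a request handler that scores JSON feature payloads sent by POST."""
    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            features = json.loads(self.rfile.read(length))
            p = churn_probability(model, features)
            body = json.dumps({"churn_probability": p, "alert": p >= threshold}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return PredictHandler

def serve(model: dict, port: int = 8000) -> None:
    """Serve predictions on localhost until interrupted (blocks)."""
    HTTPServer(("127.0.0.1", port), make_handler(model)).serve_forever()

# Hypothetical trained model: more complaints raise churn risk, longer tenure lowers it.
MODEL = {"bias": -1.0, "weights": {"complaints": 0.8, "tenure_years": -0.5}}
```

Calling `serve(MODEL)` blocks and answers POST requests on port 8000. The Python documentation warns that `http.server` is not recommended for production, which is one reason the hosted options listed above exist.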

🔹 10. Feedback
✅ Definition:

Collecting and analyzing feedback to improve the model or system continuously.

✅ Key Objectives:

• Measure real-world effectiveness.
• Track changes in the data (data drift).
• Adapt the model to new business conditions.

✅ Tasks Involved:

• Monitor predictions and accuracy over time.
• Collect user/stakeholder feedback.
• Plan versioning and retraining.

✅ Example:

If the churn model's accuracy drops after 3 months because customer behavior has changed, retrain it with recent data.
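
Monitoring for this scenario can be sketched as two checks: an accuracy drop against a baseline, and a drift score on an input feature. The drift measure used here, the Population Stability Index (PSI), is one common choice swapped in as an assumption; the bin edges, the 0.05 tolerance, and the 0.25 drift limit (a frequently quoted rule of thumb) are illustrative values, not fixed standards.

```python
import bisect
import math

def psi(reference, recent, edges):
    """Population Stability Index of `recent` against `reference`, binned by sorted `edges`."""
    def bin_shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_left(edges, v)] += 1   # bin index for v
        # A tiny floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]
    ref, cur = bin_shares(reference), bin_shares(recent)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def needs_retraining(monthly_accuracy, baseline, tolerance=0.05, drift=None, drift_limit=0.25):
    """Flag retraining if the latest accuracy falls below baseline - tolerance,
    or if a drift score exceeds drift_limit."""
    dropped = monthly_accuracy[-1] < baseline - tolerance
    drifted = drift is not None and drift > drift_limit
    return dropped or drifted

# Illustrative values: accuracy for each month since deployment, and one
# feature's reference vs. recent values (e.g. monthly complaints per customer).
accuracy_by_month = [0.85, 0.84, 0.78]
drift_score = psi(reference=[0, 1, 1, 2, 2, 3], recent=[3, 4, 4, 5, 5, 5], edges=[1, 2, 3])
retrain = needs_retraining(accuracy_by_month, baseline=0.85, drift=drift_score)
# retrain is True: 0.78 is below the 0.80 floor, and drift_score is well above 0.25
```

Checking drift as well as accuracy matters because true churn labels often arrive weeks late, while shifts in the input data can be seen immediately.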
