Data Analyst Workflow (Day-to-Day / Project-Based)
1. Understand the Problem / Objective
• Purpose: Pin down what the business wants to measure or improve (e.g., sales trends, customer churn).
• Tools: None specifically; mainly meetings, notes, or documentation.
2. Data Collection / Extraction
• Purpose: Gather relevant data from databases, spreadsheets, APIs, or other sources.
• Tools & When to Use:
o SQL: When the data lives in a relational database (e.g., MySQL, PostgreSQL); use it to pull exactly the rows and columns you need.
o Python / R: For web scraping, pulling data from APIs, or datasets too large for a spreadsheet.
o Excel: For small datasets or ad-hoc data from CSV files and manual reports.
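To illustrate the SQL option from Python, here is a minimal sketch using the standard-library sqlite3 module, with an in-memory database standing in for MySQL or PostgreSQL. The orders table, its columns, and the sample rows are hypothetical. The column list and WHERE clause do the filtering inside the database, so only the data the analysis needs is pulled out.

```python
import sqlite3


def extract_orders(conn: sqlite3.Connection, region: str, since: str) -> list[tuple]:
    """Pull only the needed columns and rows from the hypothetical orders table.

    The ? placeholders keep user inputs out of the SQL string itself.
    """
    query = """
        SELECT order_id, order_date, amount
        FROM orders
        WHERE region = ? AND order_date >= ?
        ORDER BY order_date
    """
    return conn.execute(query, (region, since)).fetchall()


# Hypothetical sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-05", "North", 120.0),
        (2, "2024-02-10", "South", 80.0),
        (3, "2024-03-15", "North", 200.0),
        (4, "2023-12-30", "North", 50.0),
    ],
)

# ISO-8601 date strings compare correctly as text, so >= works as a date filter.
rows = extract_orders(conn, region="North", since="2024-01-01")
print(rows)  # [(1, '2024-01-05', 120.0), (3, '2024-03-15', 200.0)]
```

Drivers for PostgreSQL (e.g., psycopg) and MySQL (e.g., mysql-connector-python) follow the same connect/execute/fetch pattern from Python's DB-API, but use %s placeholders instead of ?.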
3. Data Cleaning / Preprocessing
• Purpose: Make the data usable by fixing errors, handling missing values, removing duplicates, and standardizing formats.
• Tools & When to Use:
o Excel: For small datasets or simple tasks such as removing duplicates, correcting typos, or applying quick filters.
o SQL: When cleaning data inside a database before exporting it (e.g., filtering rows, joining tables, aggregating).
o Python (pandas) / R (dplyr): For large datasets, complex transformations, automated cleaning, and reproducibility.
• Tip: If it’s a one-off small dataset, Excel is fine; for repeated, large, or multi-table cleaning, use SQL or Python/R.
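The cleaning tasks above map onto a handful of pandas operations. This is a minimal sketch on a hypothetical customer table: the column names are made up, and filling missing spend with 0 is an illustrative assumption (whether to fill, drop, or impute depends on what a missing value means in your data).

```python
import pandas as pd


def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a hypothetical customer table; df is not modified."""
    out = df.copy()
    # Standardize text: strip stray whitespace, then apply consistent title case.
    for col in ["name", "city"]:
        out[col] = out[col].str.strip().str.title()
    # Parse dates; invalid values (e.g. "2024-02-30") become NaT instead of raising.
    out["signup_date"] = pd.to_datetime(out["signup_date"], errors="coerce")
    # Illustrative assumption: a missing spend means no purchases yet.
    out["spend"] = out["spend"].fillna(0.0)
    # Drop rows that are exact duplicates once the text has been standardized.
    return out.drop_duplicates().reset_index(drop=True)


raw = pd.DataFrame(
    {
        "name": [" alice smith", "Alice Smith ", "BOB JONES", "carol lee"],
        "city": ["london", "London", " paris", "berlin"],
        "signup_date": ["2024-01-05", "2024-01-05", "2024-02-30", "2024-03-01"],
        "spend": [100.0, 100.0, None, 42.5],
    }
)

clean = clean_customers(raw)
print(clean)
```

Standardizing before de-duplicating matters here: the first two rows only become exact duplicates once whitespace and casing are fixed, so calling drop_duplicates() on the raw frame would keep both.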
4. Exploratory Data Analysis (EDA)
• Purpose: Understand patterns, distributions, trends, and anomalies in the data.
• Tools:
o Python / R: For plotting histograms, scatterplots, and boxplots, and for correlation analysis.
o Excel / Power BI / Tableau: Quick visual summaries, pivot tables, and basic charts.
• Tip: Python/R is better for deeper statistical understanding; dashboards are better for business storytelling.
5. Analysis / Modeling
• Purpose: Derive insights, test hypotheses, and predict trends.
• Tools:
o Python / R: Regression, clustering, forecasting, hypothesis testing.
o Excel: Simple calculations, trendlines, correlations, or basic pivot-table analysis.
o Power BI / Tableau: Visualize insights and KPIs, and build dashboards.
• Tip: Complex analysis → Python/R; simple analysis → Excel; storytelling → dashboards.
6. Reporting / Visualization
• Purpose: Present actionable insights to stakeholders clearly and visually.
• Tools:
o Power BI / Tableau: Interactive dashboards.
o Excel: Static reports, charts, pivot tables.
o Python (matplotlib, seaborn) / R (ggplot2): Custom, reproducible charts for technical reports.
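A minimal sketch of a reproducible chart with matplotlib: the figure is built entirely from code, so rerunning the script after a data refresh regenerates it the same way. The data and the output file name monthly_sales.png are hypothetical; the non-interactive Agg backend is selected so the script also runs on servers without a display, and bar_label requires matplotlib 3.4 or later.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render straight to files, no window
import matplotlib.pyplot as plt


def plot_monthly_sales(months: list[str], sales: list[float], path: str) -> None:
    """Save a labeled bar chart to path (file format inferred from the extension)."""
    fig, ax = plt.subplots(figsize=(6, 3.5))
    bars = ax.bar(months, sales, color="steelblue")
    ax.bar_label(bars, fmt="%.0f")  # print each value on top of its bar
    ax.set_title("Monthly sales (hypothetical data)")
    ax.set_xlabel("Month")
    ax.set_ylabel("Units sold")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)  # release memory when generating many charts in a loop


plot_monthly_sales(["Jan", "Feb", "Mar", "Apr"], [198, 222, 244, 256], "monthly_sales.png")
```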
7. Documentation & Archiving
• Purpose: Ensure your work can be replicated or audited.
• Tools:
o Git / GitHub: Version control for code.
o Excel / CSV / Database: Store cleaned datasets.
o Markdown / Confluence / Notion: Document methodology, assumptions, and transformations.
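One lightweight way to make an archived dataset auditable is to store a small manifest beside it recording when it was produced, how many rows it has, a checksum, and the transformations applied. The sketch below does this with the standard library only; the file names, the .manifest.json naming convention, and the manifest fields are illustrative assumptions, not a standard format. Committing the manifest to Git alongside the code lets a reviewer later confirm that the archived file has not changed.

```python
import csv
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def _manifest_path(csv_path: Path) -> Path:
    # Hypothetical naming convention: data.csv -> data.manifest.json
    return csv_path.with_name(csv_path.stem + ".manifest.json")


def archive_dataset(rows: list[dict], csv_path: str, steps: list[str]) -> dict:
    """Write rows to CSV plus a JSON manifest recording how the file was produced."""
    path = Path(csv_path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    manifest = {
        "file": path.name,
        "created_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "row_count": len(rows),
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        "transformations": steps,
    }
    _manifest_path(path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def verify_dataset(csv_path: str) -> bool:
    """Recompute the checksum and compare it with the one stored in the manifest."""
    path = Path(csv_path)
    manifest = json.loads(_manifest_path(path).read_text(encoding="utf-8"))
    return hashlib.sha256(path.read_bytes()).hexdigest() == manifest["sha256"]


rows = [{"customer_id": 1, "spend": 100.0}, {"customer_id": 2, "spend": 0.0}]
m = archive_dataset(
    rows, "customers_clean.csv", ["trimmed and title-cased names", "filled missing spend with 0"]
)
print(m["row_count"], verify_dataset("customers_clean.csv"))  # 2 True
```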