Data Analysis Portfolio

- By Tejas Sapkal
Professional Background:-

I am currently working with Tata Consultancy Services and have 3.5 years of experience in
Data Analytics in the banking domain (HDFC Bank), transforming complex data into actionable
insights with strong skills in statistical analysis, data visualization, and client
collaboration. I developed and executed detailed management reports using advanced Excel
functions and Power BI, leading to the identification of five key inefficiencies and a
30% improvement in data processing timelines.

Junior Data Analyst | Jan 2023 – Present
Tata Consultancy Services
HDFC Web Application | Banking Domain | Mumbai, India
Technologies: Power BI | Python | Excel | SQL | DAX | EDA | Data Cleaning | Data
Transformation
● Leveraged advanced Excel features (VLOOKUP, Pivot Tables, conditional formatting)
and Power BI for data visualization to create detailed management reports.
● Improved data accuracy by 30% and reduced report generation time by 20%, enabling
faster, data-driven decisions and boosting organizational efficiency.
● Developed advanced SQL queries to extract actionable insights, reducing data
processing time by 40% and strengthening the marketing team's decision-making.
● Maintained strong client relationships by delivering personalized support, saving
approximately 150 client hours annually. Achieved a 98% satisfaction rate and
increased client retention by 25% through swift responses (under 12 hours) and
seamless deliverable execution.
● Delivered actionable insights to internal teams by developing comprehensive, data-
driven reports and dashboards, enabling faster, informed decisions on marketing
campaigns, customer engagement, and operational improvements.

Production Support Engineer | Oct 2021 – Dec 2022
Tata Consultancy Services
Mumbai, India
Technologies: SQL | Excel | JIRA | AppDynamics | ServiceNow
● Monitored and enhanced portal performance and stability through application
monitoring.
● Key responsibilities included incident management, application monitoring,
collaboration, troubleshooting and debugging, documentation, and reporting.
Table of Contents:-

1. Professional Background
2. Table of Contents
3. Project 1:- Data Analytics Process
4. Project 2:- Instagram User Analytics
5. Project 3:- Operation & Metric Analytics
6. Project 4:- Hiring Process Analytics
7. Project 5:- IMDB Movie Analysis
8. Project 6:- Bank Loan Case Study
9. Project 7:- Impact of Car Features
10. Project 8:- ABC Call Volume Trend
11. Conclusion
12. Appendix
Project 1:- Data Analytics Process

This project demonstrates the stepwise Data Analytics Process, covering the aspects of
planning, preparation, processing, analysis, sharing the findings, and acting on them.

Below is an example of the same:-

1. Plan

• Launch Operation Sindhoor to eliminate terrorist training camps in Pakistan.

2. Prepare

• Obtain permission for the operation from higher authorities.

• Plan the operation in detail, with full security and the involvement of all three services.

3. Process

• Guided munitions and specialized forces were mobilized for the operation

• Extensive surveillance and reconnaissance were conducted to identify and confirm the
locations of nine terrorist camps.

• Used precision-guided missiles.

• The operation resulted in the complete destruction of the identified terror camps, with no
reported casualties among Indian forces.

4. Analyze

• Analyzed the plan and ensured the secrecy of the operation.

• The operation successfully neutralized key terrorist infrastructure, disrupting planned attacks.

• Ensured that no civilians were harmed.

5. Share

• Prime Minister Narendra Modi addressed the nation, emphasizing the operation's success
and India's commitment to national security.

• The operation was extensively covered by national and international media, highlighting
India's strategic response to terrorism.

6. Act

• Suspended the Indus Waters Treaty after Pakistan's attack.

• This was considered a non-retaliatory act, and Pakistan was strictly warned.
Project 2:- Instagram User Analytics

As a data analyst for Instagram’s product team, my goal is to provide insights on user
behaviour and engagement using SQL queries in MySQL Workbench, and to assist marketing
teams, product managers, developers, and investors in making informed strategic decisions.

This analysis is organized into two main domains:

A) Marketing Analysis

B) Investor Metrics

A) Marketing Analysis

1. Loyal User Reward

Goal: Identify the five oldest users on the platform to reward them.

Approach:

• Query the users table, order by registration date in ascending order.

• Use LIMIT 5 to fetch the earliest users.
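A minimal sketch of this query in MySQL (assuming the users table has id, username, and created_at columns; the column names are assumptions):

SELECT id, username, created_at
FROM users
ORDER BY created_at ASC   -- earliest registration dates first
LIMIT 5;                  -- keep only the five oldest users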

2. Inactive User Engagement

Goal: Identify users who have never posted a photo.

Approach:

• Use a LEFT JOIN between users and photos.

• Filter users where photo_id IS NULL.
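A sketch of this check, assuming photos.user_id references users.id:

SELECT u.username
FROM users u
LEFT JOIN photos p ON p.user_id = u.id   -- keep users even if they have no photos
WHERE p.id IS NULL;                      -- no matching photo row means the user never posted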

Output: List of inactive users (e.g., Aniya_Hackett, David.Osinski47, etc.)


3. Contest Winner Declaration

Goal: Identify the user with the most likes on a single photo.

Approach:

• Join likes with photos and group by photo_id.

• Use COUNT to find total likes and ORDER BY to get top 1.
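A sketch of the join and aggregation, assuming likes(user_id, photo_id) and photos(id, user_id) as the table structures:

SELECT u.username, p.id AS photo_id, COUNT(l.user_id) AS total_likes
FROM likes l
JOIN photos p ON p.id = l.photo_id   -- attach each like to its photo
JOIN users u ON u.id = p.user_id     -- attach the photo to its owner
GROUP BY p.id, u.username
ORDER BY total_likes DESC
LIMIT 1;                             -- the single photo with the most likes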

Output:

• Winner: Zack_Kemmer93

4. Ad Campaign Launch

Goal: Find the best day for new user registrations (to optimize ad placement).

Approach:

• Extract DAYNAME() from registration timestamps.

• GROUP BY day of week, COUNT registrations, and ORDER BY.
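A sketch of this aggregation, assuming created_at is the registration timestamp:

SELECT DAYNAME(created_at) AS day_of_week,
COUNT(*) AS registrations
FROM users
GROUP BY day_of_week          -- one row per weekday
ORDER BY registrations DESC;  -- busiest registration day first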

Output:

• Highest registrations on Thursday and Sunday.

• Recommended ad launches on either of these days.

B) Investor Metrics

1. User Engagement

Goal: Measure platform health by understanding user content contribution.

Approach:

• Count total number of posts and total users.

• Average = Total Posts / Total Users.
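A one-line version of this calculation (a sketch, using the same assumed photos and users tables):

SELECT (SELECT COUNT(*) FROM photos) / (SELECT COUNT(*) FROM users) AS avg_posts_per_user;  -- total posts divided by total users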

Output:

• Average posts per user = 2.5700


2. Bots & Fake Accounts

Goal: Detect users likely to be bots by suspicious behavior.

Approach:

• Join likes with photos and count total photos on the site.

• Identify users who have liked every single photo (suspicious activity).
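A sketch of this detection query, under the same assumed schema:

SELECT u.username, COUNT(l.photo_id) AS photos_liked
FROM users u
JOIN likes l ON l.user_id = u.id
GROUP BY u.id, u.username
HAVING photos_liked = (SELECT COUNT(*) FROM photos);  -- liked every photo on the platform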
Project 3:- Operation & Metric Analytics

Used SQL to analyze performance, workflow efficiency, and user engagement, including
detecting metric spikes and improving business operations.

Tech Stack: MySQL Workbench


Key Focus Areas:

• Operational Metrics

• User Growth & Retention

• Device & Email Engagement

• Language Trends

• Data Quality (duplicates/outliers)

Case Study 1:- Job Data Analysis

This focuses on workflow productivity and quality through the job_data table.

1. Jobs Reviewed Over Time

Objective: Calculate number of jobs reviewed per hour for each day in November 2020.

SQL Strategy:

• Filter by date range '2020-11-01' to '2020-11-30'

• Extract hour from job timestamp

• Group by date and hour
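A sketch following this strategy, assuming job_data has a ds date column and a hypothetical reviewed_at timestamp column:

SELECT ds,
HOUR(reviewed_at) AS review_hour,   -- hypothetical timestamp column
COUNT(job_id) AS jobs_reviewed
FROM job_data
WHERE ds BETWEEN '2020-11-01' AND '2020-11-30'
GROUP BY ds, review_hour
ORDER BY ds, review_hour;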

Insight:

• Identifies peak working hours and daily load patterns for operational planning

2. Throughput Analysis

Objective: Calculate 7-day rolling average of throughput (events per second).

SQL Strategy:

• Calculate daily throughput using COUNT(events) / SUM(time_spent)

• Use a window function (AVG() OVER (ORDER BY ...)) to compute the 7-day average, as sketched below

We chose daily metrics for timely operational decisions and visibility into day-specific
anomalies.
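A sketch of the rolling average, assuming MySQL 8+ window functions and job_data columns ds, event, and time_spent (in seconds):

SELECT ds,
AVG(daily_throughput) OVER (ORDER BY ds ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7day_throughput
FROM (
SELECT ds, COUNT(event) / SUM(time_spent) AS daily_throughput   -- events per second, per day
FROM job_data
GROUP BY ds
) AS daily;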
3. Language Share Analysis

Objective: Compute language usage percentage over the last 30 days.

SQL Strategy:

• Filter rows by ds >= CURDATE() - INTERVAL 30 DAY

• Count per language, then compute percentage over total
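A sketch of the percentage calculation over the last 30 days:

SELECT language,
100 * COUNT(*) / (SELECT COUNT(*) FROM job_data WHERE ds >= CURDATE() - INTERVAL 30 DAY) AS pct_share
FROM job_data
WHERE ds >= CURDATE() - INTERVAL 30 DAY
GROUP BY language
ORDER BY pct_share DESC;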

Insight:

• Flags dominant content languages — critical for multilingual team allocation or AI model training.

4. Duplicate Row Detection

Objective: Identify duplicate job entries.

SQL Strategy:

• Group all columns and count records > 1
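A sketch of this check; the column list shown is an assumption about the exact job_data schema:

SELECT ds, job_id, actor_id, event, language, time_spent, org,
COUNT(*) AS occurrences
FROM job_data
GROUP BY ds, job_id, actor_id, event, language, time_spent, org   -- group on every column
HAVING COUNT(*) > 1;                                               -- rows appearing more than once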

Insight:

• Pinpoints data duplication issues that could skew analytics or inflate metrics.

Case Study 2: Investigating Metric Spike

Analyzes user activity trends and identifies behavioral shifts based on three tables: users,
events, email_events.

5. Weekly User Engagement

Objective: Calculate active users per week.

SQL Strategy:

• Group events by WEEK(event_time) and count distinct users
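A sketch of the weekly count, assuming events has occurred_at and user_id columns (column names are assumptions):

SELECT WEEK(occurred_at) AS week_number,
COUNT(DISTINCT user_id) AS weekly_active_users
FROM events
GROUP BY week_number
ORDER BY week_number;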

Insight:

• Highlights usage patterns, event surges, or slowdowns that may relate to product
changes or issues.
6. User Growth Analysis

Objective: Monitor new user acquisition trends.

SQL Strategy:

• Group sign-ups by week
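A sketch of this grouping, assuming users has a created_at signup timestamp:

SELECT WEEK(created_at) AS signup_week,
COUNT(*) AS new_users
FROM users
GROUP BY signup_week
ORDER BY signup_week;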

Insight:

• Reveals adoption momentum, correlates with marketing pushes or seasonal effects.

7. Weekly Retention Analysis

Objective: Track weekly retention by signup cohort.

SQL Strategy:

• Build a cohort by signup_week

• Join with events table

• Compare event week to signup week
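A sketch of the cohort join, assuming users(user_id, created_at) and events(user_id, occurred_at):

SELECT WEEK(u.created_at) AS signup_week,
WEEK(e.occurred_at) - WEEK(u.created_at) AS weeks_since_signup,
COUNT(DISTINCT u.user_id) AS retained_users
FROM users u
JOIN events e ON e.user_id = u.user_id
GROUP BY signup_week, weeks_since_signup
ORDER BY signup_week, weeks_since_signup;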

Insight:

• Assesses product stickiness and engagement curve after onboarding.

8. Weekly Engagement per Device

Objective: Compare activity across devices weekly.

SQL Strategy:

• Group by device_type and week
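A sketch of the device split, assuming a device column in events:

SELECT WEEK(occurred_at) AS week_number,
device,
COUNT(DISTINCT user_id) AS active_users
FROM events
GROUP BY week_number, device
ORDER BY week_number, device;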

Insight:

• Prioritizes platform optimization (e.g., mobile vs desktop).

9. Email Engagement Analysis

Objective: Measure user behavior from email campaigns.

SQL Strategy:

• Count recipients, opens, and clicks by campaign or user
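A sketch of the email funnel counts, assuming email_events has user_id and an action column describing the email event type (assumed names):

SELECT action,
COUNT(*) AS total_events,
COUNT(DISTINCT user_id) AS unique_users
FROM email_events
GROUP BY action;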

Insight:

• Improves email performance, guiding timing/content refinement.


Project 4:- Hiring Process Analytics

This project focuses on analyzing employee-related data using Excel, aiming to derive
insights on hiring trends, salary distribution, departmental representation, and position
tiers. By using Excel functions, PivotTables, and visualization tools, we transform raw
data into meaningful business insights.

Approach:-

1. Data Preparation

o Download and clean the dataset.

o Check for missing values and inconsistencies.

2. Analysis & Calculations

o Apply PivotTables, AVERAGE(), and FREQUENCY() functions for analysis.

o Summarize department-wise employee counts.

3. Visualization

o Generate pie charts, bar graphs, and column charts for better insights.

o Use Excel’s Insert Chart feature to represent salary tiers and departmental distributions.

A) Hiring Analysis

Goal: Determine gender distribution in hiring.

Excel Technique:

• Use PivotTable with "Gender" in rows and count of "Employee ID".

• Alternatively, use =COUNTIF(GenderRange, "Male") and =COUNTIF(GenderRange, "Female").

Insight:

• Males Hired: 2,563

• Females Hired: 1,856

• Balanced gender hiring or areas for improvement can be explored.


B) Salary Analysis

Goal: Calculate average salary offered.

Excel Technique:

• Use =AVERAGE(SalaryRange)

Insight:

• Average Salary: ₹49,983 — gives a benchmark for compensation planning.

C) Salary Distribution

Goal: Classify salaries into intervals to understand distribution.

Excel Technique:

• Create bins (e.g., ₹30K–₹40K, ₹40K–₹50K, etc.).

• Use =FREQUENCY(SalaryRange, BinRange) or a Histogram chart under the Data Analysis ToolPak.

Insight:

• Most employees earn between ₹40,000–₹50,000, indicating mid-tier salary dominance.

D) Departmental Analysis

Goal: Analyze employee distribution by department.

Excel Technique:

• Create a PivotTable: "Department" in rows, "Employee ID" in values (count).

• Insert a Pie Chart or Bar Chart to visualize.

Insight:

• Service Department: Highest staff

• HR Department: Lowest staff — may indicate lean operations or potential resourcing needs.
E) Position Tier Analysis

Goal: Visualize employee count across job grades/titles (e.g., m6, n9).

Excel Technique:

• Use PivotTable with "Position Tier" and count of employees.

• Insert Column Chart to show tier distribution.

Insight:

• Tiers like m6, m7, n10, n6, n9 have fewer people — likely senior-level roles with
selective hiring.
Project 5:- IMDB Movie Analysis

Understand what factors influence a movie's success on IMDB, measured by high ratings, to
guide producers, directors, and investors.

Step 1: Data Cleaning & Preparation

Goals:

• Ensure data quality for meaningful analysis

• Prepare dataset for genre/language parsing, numeric operations

Actions:

• Identified and handled missing values using mean or mode

• Removed duplicate entries

• Standardized data types (e.g., converted durations to minutes, numeric formats)

Step 2: Exploratory Data Analysis

A) Movie Genre Analysis

Task: Analyze frequency and influence of genres on IMDB score.

Excel Functions Used:

• COUNTIF, AVERAGE, MEDIAN, MODE, MIN, MAX, VAR, STDEV

Findings:

• Most common genre: Comedy with ~244 movies

• Analyzed descriptive stats for each genre

Example metrics for Comedy:

• Mean IMDB: ~6.4
• Median IMDB: ~6.3
• Range: 4.1–8.2
• Variance: 0.82
• Standard Dev: 0.9


Insight:

• Genre alone doesn’t guarantee success — though genres like Drama or Documentary
tend to show slightly higher averages.

B) Movie Duration Analysis

Task: Investigate relation between duration and IMDB score

Excel Techniques:

• AVERAGE, MEDIAN, STDEV, Scatter Plot with Trendline

Findings:

• Most durations: 60–120 minutes

• Highest-rated movie: 142 mins

• No strong correlation between runtime and score

Visualization:

• Scatter plot shows a slightly upward trend, but not statistically significant.

C) Language Analysis

Task: Study how language influences ratings

Excel Tools:

• COUNTIF, AVERAGE, STDEV per language

Insight:

• Niche or culturally rich languages (Persian, Hebrew) tend to perform better — possibly due to high-quality independent cinema or festival films.
D) Director Analysis

Task: Identify top-performing directors by rating percentile

Excel Strategy:

• Pivot Table to average IMDB by director

• Applied PERCENTILE.EXC() to extract 85th percentile and above

Top Directors Identified:

• Akira Kurosawa

• Tony Kaye

• Ron Fricke

Insight:

• These directors consistently deliver high-rated films and contribute significantly (~84%) to top-tier films in the dataset.

E) Budget vs Gross Earnings Analysis

Task: Explore financial correlation + find top profit margin films

Excel Tools:

• CORREL(Budget, Gross)

• Profit = Gross - Budget

• MAX() to get highest profit margin

Findings:

• No strong correlation between budget and earnings

• Top profit margin: ₹523,505,847

Insight:

• High budgets don’t ensure high returns. Lean-budget films with strong storytelling
or viral traction outperform expectations.
Step 4: Reporting & Data Storytelling

Summary:

• Genre trends reveal no dominant winner in quality — success depends on execution, not category

• Duration doesn’t impact ratings — focus on pacing and content, not length

• Languages with fewer movies may be underrated gems

• Top Directors shape outcomes — investing in visionary creatives pays off

• High budget ≠ high profits — choose stories with strong appeal, not just scale.
Project 6:- Bank Loan Case Study

Used Exploratory Data Analysis (EDA) to uncover patterns in customer and loan attributes that
predict loan default, enabling the company to:

• Minimize financial loss from defaults

• Maximize approval of reliable customers

Step 1: Missing Data Identification & Handling

Task:

Detect missing values, determine severity, and decide on handling techniques.

Approach:

• Used =COUNTBLANK(), =ISBLANK(), and filters in Excel

• Removed columns with >25% missing values (e.g., apartment size, certain document
flags)

• Imputed other missing values using:

o Mean or median (for numeric fields like income)

o Mode (for categorical variables like occupation type)

Insight:

Removing high-null columns prevents skewed analysis; imputing ensures data completeness for reliable modeling.

Step 2: Outlier Detection

Task:

Identify and flag unusually high/low values in financial and demographic data.

Approach:

• Calculated Q1, Q3, and IQR for fields like income, credit, goods price, and annuity
• Used Excel formulas: =QUARTILE(range, 1) and =QUARTILE(range, 3), with IQR = Q3 - Q1
• Flagged values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR


Insight:

• Massive outliers in income and credit (some > ₹120M)

• Such anomalies may indicate data errors or exceptional cases that require separate handling or exclusion in modelling.

Step 3: Data Imbalance Check

Task:

Verify if target variable (loan repayment status) is imbalanced.

Approach:

• Used =COUNTIF() to compare frequency of classes:

• 0: No default
• 1: Default

Insight:

• Strong class imbalance: Significantly more non-defaults than defaults

• Models may need resampling techniques (like SMOTE or stratification) during prediction modeling

Step 4: Exploratory Data Analysis (EDA)

A. Univariate Analysis

Analyzed distributions of key attributes:

• Credit Amount, Income, Children, Annuity

Insight:

• Most applicants fall in the credit band ₹200K–₹400K

• Income shows a long-tail distribution (typical for financial data)

B. Segmented Univariate Analysis

Compared variables across defaulters vs. non-defaulters.

Insight:

• High credit amounts and higher annuities are strong predictors of default
• Income and number of children are weakly correlated, thus less reliable for risk
evaluation
Project 7:- Impact of Car Features
The objective is to help a car manufacturer optimize pricing strategies and refine product
development by analyzing how car features, market categories, and manufacturers
influence consumer preferences and profitability.

Step 1: Data Cleaning & Preparation

• Removed missing values, duplicates, and formatted categorical features (e.g., fuel
type, car category).

• Normalized numeric variables like engine size, MPG, and price.

• Parsed multi-category fields (e.g., “Crossover, Flex Fuel”) for accurate segmentation.

Step 2: Market Category Popularity Analysis

Insight:

• Traditional models such as "Crossover" and "Crossover, Diesel" have low popularity
ratings (~2), suggesting consumer fatigue or lack of unique features.

• Enhanced models like "Crossover, Factory Tune" and "Crossover, Flex Fuel" show
exceptional popularity (ratings 60–100+), indicating a demand for performance-
tuned, fuel-flexible options.

Interpretation:

• Focus product development and marketing efforts on innovative or enhanced variants to drive consumer engagement and sales.

Step 3: Price vs. Engine Power

Insight:

• Clear positive correlation: as engine power increases, the car's price increases.

Interpretation:

• Indicates that powerful engines are positioned as premium offerings—ideal for performance-driven segments.
Step 4: Feature Importance in Pricing

Insight:

• Engine Cylinders: Strongly linked to higher prices.

• City MPG: Mild influence—possibly tied to eco-conscious buyers.

• Number of Doors: No noticeable impact on pricing.

Interpretation:

• Highlight engine upgrades in high-end product lines.

• Avoid overemphasizing body configurations (like door count) in pricing strategies.

Step 5: Manufacturer-Wise Average Price Analysis

Insight:

• Average prices vary widely across brands.

• Bugatti tops the chart with the highest average prices, consistent with its luxury
branding.

Interpretation:

• Reaffirms the strong role of brand equity and perceived luxury in pricing—an
essential variable in product positioning.

Step 6: Fuel Efficiency vs. Engine Cylinders

Insight:

• As engine cylinder count increases, fuel efficiency (highway MPG) decreases.

Interpretation:

• Fewer cylinders = better mileage

• Important trade-off: performance vs. fuel economy—key in targeting eco-conscious vs performance buyers.
Project 8:- ABC Call Volume Trend
This project focuses on optimizing contact center staffing by analyzing 23 days of inbound
call data to reduce abandon rates from 30% to below 10%.

Approach:-

1. Data Preparation

o Cleaned and formatted the dataset.

o Created time buckets (hourly slots from 9 AM to 9 PM).

2. Descriptive Analysis

o Counted total calls per time bucket with COUNTIFS.

o Visualized call volume trends using Excel charts.

3. Manpower Modeling (Day Shift)

o Computed each agent’s effective daily talk time (60% of 7.5 hrs).

o Estimated total call time per bucket.

o Derived the minimum number of agents required to keep the abandon rate below 10%.

4. Night Shift Projection

o Allocated projected calls across night-time buckets.

o Calculated agent requirements using the same productivity assumptions.

5. Scenario Planning & Recommendations

Data Analytics Tasks:

1. Average Call Duration:-

Insights:- The highest average duration is during 10–11 AM (203.33s) and 7–8 PM (203.41s).
The lowest average is during 12–1 PM (192.89s), possibly indicating quicker resolutions or
agents trying to manage call load before lunch.

2. Call Volume Analysis:-

Insights:- The 11 AM–12 PM window has the highest call volume with 14,626 calls, indicating
that most customers are active in the morning; after that the volume drops. The chart shows
that volume is high in the morning and declines as the day progresses.

3. Manpower Planning:-

The current rate of abandoned calls is approximately 30%.

We can calculate the average agent work per hour (at 60% productivity) from the total time
spent on calls and the total number of agents. The calculated figures are:

Work of all agents per hour = 4573.088611
Average agent work per hour at 60% = 25.80050597
Average number of agents required for 90% = 38.70075895

From this we conclude that we need 39 agents to handle the workload at a 90% answer rate.

Now, let's distribute these 39 agents according to the call volume throughout the day. We
calculated the call volume per time bucket and allocated the agents accordingly to reduce
the abandon rate to 10%.

4. Night Shift Manpower Planning:-

Calculation assumptions:- An agent works 6 days a week; on average, each agent takes 4
unplanned leaves per month; an agent's total working hours are 9 hours, of which 1.5 hours
are spent on lunch and snacks in the office. On average, an agent spends 60% of the actual
working hours (i.e., 60% of 7.5 hours) on calls with customers/users. The total number of
days in a month is 30.

Total days: 30 (4 weeks)
Agent's working days: 30 days - 4 weekly offs - 4 leaves
Agent's actual working days: 22
Working hours: 9 hrs - 1.5 hrs = 7.5 hrs; time on calls = 60% of 7.5 hrs
Actual time on calls: 4.5 hrs = 270 minutes
Availability coefficient = (30 - 8) / 30 = 0.733
Each agent is therefore available for 270 * 0.733 = 198 minutes of call time daily.

Insights:-

There is a sharp rise in call volume in the early morning between 5 AM and 9 AM, demanding
up to 77 agents to maintain a 90% answer rate. The midnight-to-4 AM window experiences the
lowest activity, allowing lean staffing with just 15 agents per hour.

Suggestions:-

• We can design 3 overlapping shifts—Late Evening (9 PM–1 AM), Deep Night (1–5
AM), and Pre-Morning (5–9 AM).

• Shared agents from email/social channels can reinforce the night shift during low-demand
hours, and we can staff the team with a mix of experienced and new agents.

• We can implement an AI module and set up on-call reserves for the 1–5 AM window to provide
support during ultra-low volume slots with the least number of agents.

• We can give a night shift allowance to agents working at night and use rotational
night shift scheduling to avoid health issues.

Strategic Recommendations:

• Create overlapping shift windows to support 24/7 demand.

• Use AI bots and on-call support during ultra-low volume (1–4 AM).

• Assign cross-functional agents from other channels at low-traffic times.

• Implement rotational night shift incentives to balance wellness and coverage.


Conclusion:-
I have worked through a rich and diverse range of analytical case studies, and each one
sharpened a specific facet of my data analyst skill set. Here's a consolidated conclusion
that highlights the core learnings, analytical techniques, and strategic insights drawn
from all projects:

1. Structured, End-to-End Thinking

Across every project—whether analyzing Instagram user behavior, operational job reviews,
IMDB movie ratings, loan risk profiles, automotive pricing, employee records, or contact
center call volumes—I learned to:

• Define business problems clearly

• Design logical, multi-step approaches

• Use Excel, SQL, and data visualization to drive insight

• Connect technical outputs to real-world decision-making

2. Strategic & Business Insight Development

• Instagram Analysis → Revealed which users/posts drive engagement, aiding feature prioritization

• Operational Analytics → Quantified reviewer load/performance to improve throughput planning

• HR Hiring Analysis → Understood workforce patterns to guide recruitment & compensation

• Loan Risk Modeling → Identified risk signals to refine approval strategies

• Car Feature Study → Connected specs to price, guiding product positioning

• IMDB Ratings → Showed how features like genre and language influence perception

• Call Center Planning → Transformed call logs into real-world staffing plans to reduce
customer dissatisfaction and to improve the services.
Appendix:-

Key Excel Functions & Features Applied

• Data Cleaning & Missing Value Handling: ISBLANK(), IF(), COUNTBLANK()

• Statistical Analysis: AVERAGE(), MEDIAN(), MODE(), STDEV(), VAR(), PERCENTILE.EXC()

• Aggregation & Filtering: COUNTIFS(), SUMIFS(), PivotTables, Slicers

• Outlier Detection: QUARTILE(), IQR calculation (Q3 – Q1)

• Regression/Trend Analysis: Trendlines in scatter plots, correlation via CORREL()

• Frequency Distribution: FREQUENCY(), custom binning

Modeling Assumptions & Parameters

ABC Call Center Project

• Agent productivity = 60% of 7.5 working hours per day

• Target SLA: ≤10% call abandonment

• Average call duration: 139.53 seconds (~2.33 min)

• Availability coefficient (night shifts): 0.733

• Agent working days/month = 22 (accounting for leaves/off)

Loan Default Risk Model

• Removed variables with >25% missing values

• Outliers treated based on IQR thresholds

• Retention analyzed by sign-up cohorts using time segmentation

• Correlation thresholds >±0.3 flagged for risk indicators

Automotive Analysis

• Popularity measured via frequency or model ratings

• Regression included predictors: engine power, fuel type, cylinders, price

• Market categories disaggregated from hybrid genre strings
