Professional Elective - III
Data Analytics
2025-26
Semester - I
Presented by: Dr. Bhakti S. Ahirwadkar
UNIT - I
What is Data Analytics?
• Data Analytics is the science of analyzing raw
data to find trends, patterns, and insights.
Why is Data Analytics Important?
• Data-driven decisions
• Improved operational efficiency
• Personalization & customer insights
• Fraud detection
• Innovation in various sectors (healthcare, finance, agriculture)
Purpose of Data Analytics
• Purpose:
• Support decision-making
• Improve business performance
• Predict future outcomes
• Improve efficiency
• Help in problem solving and identifying opportunities
Data Analytics Applications
• Business Intelligence: Analyzing sales data to identify top-
selling products and understand customer preferences.
• Marketing Analytics: Tracking campaign performance to
optimize marketing spend and improve return on investment
• Financial Analysis: Analyzing financial data to identify trends
and anomalies, detect fraud, and manage risk
• Healthcare Analytics: Analyzing patient data to improve
treatment outcomes and reduce healthcare costs
• Social Media Analytics: Monitoring social media trends to
understand public opinion and sentiment
Types of Data Analytics
Descriptive Analytics
• Descriptive data analytics helps to summarize
and understand past data.
• It shows what has happened by using tables,
charts and averages.
• Companies use it to compare results, find
strengths and weaknesses and spot any
unusual patterns.
Descriptive Analytics
• Examples:
• Sales reports
• Website traffic stats
• Average rainfall data
• Techniques:
• Data aggregation
• Data mining
• Summary statistics
• Tools: Excel, Power BI, Tableau, SQL
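Two of the techniques above, data aggregation and summary statistics, can be illustrated with a minimal sketch that uses only Python's standard library. The sales records are made-up sample data, and the product and month names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical monthly sales records: (month, product, units_sold)
sales = [
    ("Jan", "Pen", 120), ("Jan", "Notebook", 80),
    ("Feb", "Pen", 100), ("Feb", "Notebook", 95),
    ("Mar", "Pen", 130), ("Mar", "Notebook", 70),
]

# Data aggregation: total units sold per product
totals = defaultdict(int)
for _month, product, qty in sales:
    totals[product] += qty

# Summary statistics over all records
units = [qty for _, _, qty in sales]
summary = {
    "count": len(units),
    "mean": mean(units),
    "median": median(units),
    "min": min(units),
    "max": max(units),
}

print(dict(totals))  # what happened per product
print(summary)       # what happened overall
```

The same summaries are what a sales report in Excel, Power BI, or Tableau would display as a table or chart.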
Diagnostic Analytics
• Diagnostic data analytics looks at why
something happened in the past.
• It uses tools like correlation, regression or
comparison to find the cause of a problem.
• This helps companies understand the reason
behind a drop in sales or a sudden change in
performance.
Diagnostic Analytics
• Examples:
• Sales drop in a specific region
• Website bounce rate spike analysis
• Techniques:
• Drill-down
• Correlation analysis
• Data discovery
• Tools: SQL, Python (Pandas, Seaborn), R
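Drill-down and correlation analysis can be sketched as follows, assuming hypothetical regional sales figures and hypothetical page-load and bounce-rate measurements. The drill-down breaks an overall change into per-region changes; the Pearson coefficient is computed by hand here, although Python 3.10+ also offers `statistics.correlation`. A strong correlation only points to a possible cause and does not prove one.

```python
from math import sqrt

# Hypothetical sales by (region, quarter)
sales = {
    ("North", "Q1"): 500, ("North", "Q2"): 480,
    ("South", "Q1"): 450, ("South", "Q2"): 300,
    ("East", "Q1"): 400, ("East", "Q2"): 410,
}

def change_by_region(data, before, after):
    """Drill-down: split the overall change into per-region changes."""
    regions = {region for region, _ in data}
    return {r: data[(r, after)] - data[(r, before)] for r in sorted(regions)}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

changes = change_by_region(sales, "Q1", "Q2")
worst = min(changes, key=changes.get)
print(changes, "largest drop:", worst)

# Correlation analysis: page load time (s) vs bounce rate (%), hypothetical data
load_time = [1.0, 1.5, 2.0, 3.0, 4.0]
bounce_rate = [30, 33, 38, 45, 52]
print(round(pearson(load_time, bounce_rate), 3))
```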
Predictive Analytics
• Predictive data analytics is used to estimate what is
likely to happen in the future.
• It looks at current and past data to find patterns
and make forecasts.
• Businesses use it to predict things like customer
behavior, future sales or possible risks.
Predictive Analytics
• Examples:
• Customer churn prediction
• Weather forecasting
• Stock market trend analysis
• Techniques:
• Machine learning
• Regression analysis
• Time-series forecasting
• Tools: Python (Scikit-learn, Statsmodels), R, SAS, IBM SPSS
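As a minimal illustration of regression-based forecasting, the sketch below fits an ordinary least-squares trend line to hypothetical quarterly revenue and extrapolates one quarter ahead. It is only a linear trend; real time-series forecasting (for example with Statsmodels) also models seasonality and uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical quarterly revenue for quarters 1..4
quarters = [1, 2, 3, 4]
revenue = [10.0, 12.0, 13.5, 15.5]

a, b = fit_line(quarters, revenue)
forecast_q5 = a + b * 5  # predict the next (unseen) quarter
print(f"trend: revenue = {a:.2f} + {b:.2f} * quarter")
print(f"forecast for quarter 5: {forecast_q5:.2f}")
```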
Prescriptive Analytics
• Prescriptive data analytics helps to choose the
best action or solution.
• It evaluates different options and recommends what
should be done next.
• Companies use it for decisions such as loan approval,
pricing, and managing machines or schedules.
Prescriptive Analytics
• Examples:
• Recommending marketing strategies
• Route optimization in logistics
• Techniques:
• Optimization algorithms
• Simulation
• Decision analysis
• Tools: Python (SciPy, Pyomo), R, IBM Decision
Optimization, Gurobi
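A pricing decision can serve as a toy prescriptive example: evaluate each candidate action and recommend the one with the best outcome. The demand model and unit cost below are hypothetical assumptions, and the exhaustive search only suits a handful of options; larger problems are usually handed to optimization solvers such as those listed above.

```python
# Hypothetical demand model: units sold fall linearly as price rises
def demand(price):
    return max(0, 1000 - 20 * price)

UNIT_COST = 10  # hypothetical cost per unit

def profit(price):
    return (price - UNIT_COST) * demand(price)

# Evaluate every candidate action (price) and recommend the most profitable one
candidate_prices = range(10, 51, 5)
best_price = max(candidate_prices, key=profit)
print(best_price, profit(best_price))
```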
Types of Analytics
Method | Key Question | Example Use Case
Descriptive | What happened? | Monthly sales report
Diagnostic | Why did it happen? | Sales drop analysis
Predictive | What will happen? | Forecasting next quarter's revenue
Prescriptive | What should we do? | Choosing the best marketing strategy
Approaches of Data Analytics
• There are two approaches to data analytics: Quantitative and
Qualitative
Quantitative Analytics
• Uses numerical and measurable data to uncover patterns,
relationships, or trends.
• Focus:
– Numbers, metrics, statistics
– Objective and data-driven
• Techniques:
– Statistical modeling
– Correlation and regression analysis
– Forecasting
– Machine Learning
• Examples:
– "Sales increased by 15% in Q2."
– "Customer churn rate is 8.4%."
– Predicting loan defaults using credit score data.
• Tools:
– Python (NumPy, Pandas), R, SQL, Excel, Power BI
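The first two example statements are simple quantitative calculations. The sketch below shows the arithmetic with hypothetical figures chosen to reproduce them; it uses one common churn definition (customers lost divided by customers at the start of the period), and organizations may define churn differently.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

def churn_rate(customers_at_start, customers_lost):
    """Share of customers lost during the period, as a percentage."""
    return customers_lost / customers_at_start * 100

# Hypothetical figures chosen to reproduce the example statements above
q1_sales, q2_sales = 200_000, 230_000
print(f"Sales increased by {percent_change(q1_sales, q2_sales):.0f}% in Q2.")
print(f"Customer churn rate is {churn_rate(5_000, 420):.1f}%.")
```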
Qualitative Analytics
• Uses non-numerical data to understand behavior, motivations, or
meanings behind actions.
• Focus:
– Text, audio, video, observations
– Subjective and interpretive
• Techniques:
– Content analysis
– Thematic analysis
– Sentiment analysis
– Interviews, open-ended surveys
• Examples:
– "Customers feel our app is hard to navigate."
– "Reviews show frustration with delivery delays."
• Tools:
– NVivo, ATLAS.ti, Excel (for categorization), Python (NLTK, spaCy), R (tm package)
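A very small lexicon-based sentiment analysis illustrates how qualitative text can be turned into labels. The word lists are hypothetical and tiny; real tools (for example NLTK's VADER analyzer) use far larger lexicons and handle negation, intensity, and emojis.

```python
import re

# Tiny hypothetical sentiment lexicon (illustration only)
POSITIVE = {"good", "great", "easy", "fast", "love"}
NEGATIVE = {"hard", "slow", "late", "delay", "delays", "frustrating", "confusing"}

def sentiment(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "The app is hard to navigate and confusing.",
    "Great product, fast checkout!",
    "Delivery delays again, very frustrating.",
]
for review in reviews:
    print(sentiment(review), "-", review)
```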
Approaches of Analytics
Aspect | Quantitative Analytics | Qualitative Analytics
Data Type | Numbers, measurements | Text, images, audio
Focus | What, when, how much | Why, how
Approach | Objective | Subjective
Common Methods | Statistics, ML, regression | Interviews, text analysis
Tools | Python, R, SQL, Excel | NVivo, ATLAS.ti, NLTK, spaCy
Types of Data
• Structured Data: Tabular data, e.g., tables in relational
databases queried with SQL
• Unstructured Data: Text, emails, images, audio,
videos, social media posts
• Semi-Structured Data: JSON, XML, HTML, log files
Structured Data
• Definition: Data that is organized in rows and columns,
typically stored in relational databases or spreadsheets.
• Characteristics:
– Follows a fixed schema (predefined format)
– Easy to store, query, and analyze using SQL
– Machine-readable
• Examples:
– Excel sheets
– SQL databases (MySQL, PostgreSQL)
– Tables with customer names, IDs, sales amounts
• Use Cases:
– Sales analysis
– Employee records
– Financial transactions
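The "fixed schema" property can be sketched with Python's `csv` module: every row has the same columns, so it can be checked against a declared schema and typed automatically. The column names and values are hypothetical.

```python
import csv
import io

# Hypothetical structured data: every row follows the same fixed schema
CSV_TEXT = """customer_id,name,sales_amount
101,Asha,2500.00
102,Ravi,1800.50
103,Meera,3200.00
"""

# Declared schema: column name -> type used to convert the text value
SCHEMA = {"customer_id": int, "name": str, "sales_amount": float}

def load_rows(text):
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != list(SCHEMA):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    # Convert each field to the type declared in the schema
    return [{col: SCHEMA[col](row[col]) for col in SCHEMA} for row in reader]

rows = load_rows(CSV_TEXT)
total_sales = sum(r["sales_amount"] for r in rows)
print(len(rows), total_sales)
```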
Structured Data
• Storage Options:
• Relational Databases (RDBMS)
– Examples: MySQL, PostgreSQL, Oracle DB, Microsoft SQL Server
– Best for transactional data, reporting, and analytics
• Cloud Relational Databases
– Examples: Amazon RDS, Google Cloud SQL, Azure SQL Database
Unstructured Data
• Definition: Data that does not follow a fixed structure or
format, making it difficult to organize in traditional
databases.
• Characteristics:
– Rich in information but hard to analyze
– Requires advanced techniques like NLP, image processing
– Often textual or multimedia in nature
• Examples:
– Emails, PDFs, Word documents
– Images, videos, audio files
– Social media posts, reviews, chat messages
• Use Cases:
– Sentiment analysis from tweets
– Text mining from customer support chats
– Image recognition in security systems
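Text mining from support chats can start with something as simple as term frequencies. The sketch below tokenizes hypothetical chat messages, drops a small illustrative stop-word list (libraries such as NLTK and spaCy ship fuller lists), and counts the most frequent remaining terms.

```python
import re
from collections import Counter

# Hypothetical support-chat messages (unstructured free text)
chats = [
    "My refund has not arrived yet, please check the refund status.",
    "The app crashes when I open the payment page.",
    "Still waiting for my refund. Payment was taken twice!",
]

# Small illustrative stop-word list
STOP_WORDS = {"my", "has", "not", "yet", "please", "the", "when", "i",
              "was", "for", "still", "a", "an", "and", "is", "to"}

def top_terms(messages, k=3):
    """Tokenize, drop stop words, and return the k most frequent terms."""
    words = []
    for msg in messages:
        words += [w for w in re.findall(r"[a-z]+", msg.lower()) if w not in STOP_WORDS]
    return Counter(words).most_common(k)

print(top_terms(chats))
```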
Unstructured Data
• Storage Options:
– NoSQL databases (such as MongoDB and Cassandra)
– Object Storage
• AWS S3, Azure Blob Storage, Google Cloud Storage
• Ideal for storing and retrieving large amounts of binary or text data
– Distributed File Systems
• Hadoop Distributed File System (HDFS)
• GlusterFS, CephFS
– Content Management Systems / Repositories
• Alfresco, SharePoint, Elasticsearch (for search on unstructured data)
Semi-Structured Data
• Definition: Data that is not in a tabular format, but has some
organizational properties like tags or metadata.
• Characteristics:
– Falls between structured and unstructured
– Usually not stored in traditional RDBMS tables, but can still be queried
– Uses flexible formats like XML or JSON
• Examples:
– JSON or XML files
– HTML pages
– NoSQL databases (MongoDB, Cassandra)
• Use Cases:
– Web API responses
– Logs and event tracking systems
– Application configuration files
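A web API response is a typical semi-structured record: fields are tagged with names, but records need not share one schema. The sketch below parses a hypothetical JSON response with Python's `json` module and flattens it into uniform rows, filling missing fields with `None`.

```python
import json

# Hypothetical web API response: tagged fields, but not every record has the same keys
RESPONSE = """
{
  "status": "ok",
  "events": [
    {"user": "u1", "action": "login", "device": {"os": "Android"}},
    {"user": "u2", "action": "purchase", "amount": 499},
    {"user": "u1", "action": "logout"}
  ]
}
"""

data = json.loads(RESPONSE)

# Flatten nested, irregular records into rows; missing keys become None
rows = [
    {
        "user": e.get("user"),
        "action": e.get("action"),
        "os": e.get("device", {}).get("os"),
        "amount": e.get("amount"),
    }
    for e in data["events"]
]
for row in rows:
    print(row)
```

Once flattened this way, the data is effectively structured and can be loaded into a table or DataFrame.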
Semi-Structured Data
• Storage Options:
– NoSQL Databases
• Document stores: MongoDB, Couchbase
• Key-value stores: Redis, Amazon DynamoDB
• Wide-column stores: Apache Cassandra, HBase
– Data Lakes with Schema-on-Read
• AWS S3 + AWS Athena
• Azure Data Lake
• Google Cloud Storage + BigQuery
Summary – Types of Data
Type | Structure | Examples | Tools Used
Structured | Tabular | Excel, SQL tables | SQL, Excel, Pandas
Unstructured | No fixed form | Emails, images, videos, social media | NLP, image processing, AI
Semi-Structured | Partial tags | JSON, XML, HTML | NoSQL, Regex, Python (json)
Data Types Based on Nature & Content
Data & Data Types (Nominal Data)
• Categories or names that cannot be ordered or ranked
• Used to categorize observations into groups that have no natural
order or ranking.
• Examples:
• Gender (Male, Female),
• Race (White, Black, Asian),
• Religion (Hinduism, Christianity, Islam, Judaism), and
• Blood Group (A, B, AB, O).
• Representation: frequency tables and bar charts (showing the number or
proportion of observations in each category).
Nominal Data
• Analysis: using non-parametric tests (which do not make any
assumptions about the underlying distribution of the data).
• Common non-parametric tests for nominal data: the Chi-squared
test and Fisher’s exact test.
• These tests are used to compare the frequency or proportion
of observations in different categories.
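The Chi-squared test of independence compares observed category counts with the counts expected if the two variables were unrelated. The sketch below computes it for a hypothetical 2x2 table without Yates' continuity correction; with 1 degree of freedom the p-value equals erfc(√(χ²/2)). SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its p-value would differ from this sketch.

```python
from math import erfc, sqrt

def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table (no Yates correction).

    Returns (statistic, p_value); with 1 degree of freedom, p = erfc(sqrt(stat / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts: customer region (rows) vs preferred payment mode (columns)
table = [[20, 30],
         [30, 20]]
stat, p = chi_squared_2x2(table)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

With p below 0.05, these hypothetical counts would suggest that payment preference differs between the two regions.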
Data & Data Types (Ordinal)
• Categories that can be ordered or ranked.
• However, the distance between categories is not necessarily equal.
• Used to measure subjective attributes or opinions, where there is a natural
order to the responses.
• Examples:
• Education level (Elementary, Middle, High School, College),
• Job position (Manager, Supervisor, Employee), etc.
• Representation: bar charts and line charts, which preserve the order or ranking of the
categories.
• Analysis: using non-parametric tests
• Common non-parametric tests: the Wilcoxon signed-rank test (for paired samples) and the
Mann-Whitney U test (for two independent samples).
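Because ordinal categories can be ranked, rank-based tests such as the Mann-Whitney U test apply. The sketch below encodes a hypothetical satisfaction scale by its order, assigns average ranks to ties, and computes U for two independent groups. It returns only the U statistic; SciPy's `scipy.stats.mannwhitneyu` also reports a p-value.

```python
# Hypothetical ordinal scale: order matters, but gaps between levels are not equal
LEVELS = ["Poor", "Fair", "Good", "Excellent"]
CODE = {label: i for i, label in enumerate(LEVELS)}

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic for group_a (ties handled by average ranks)."""
    ranks = average_ranks(group_a + group_b)
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2

# Hypothetical ratings from customers of two independent stores
store_1 = [CODE[r] for r in ["Good", "Excellent", "Good", "Fair"]]
store_2 = [CODE[r] for r in ["Poor", "Fair", "Good", "Poor"]]
u = mann_whitney_u(store_1, store_2)
print(u, len(store_1) * len(store_2) - u)  # U for each group
```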
Data & Data Types (Quantitative / Numeric)
(Discrete Data)
• Discrete Data: countable values, usually whole numbers or integers.
The scale is quantitative, but there are gaps between possible values
(e.g., a family can have 2 or 3 children, but not 2.5).
• Cannot be subdivided into smaller parts
• Because its values are distinct and countable, discrete data is
straightforward to summarize (e.g., with counts and frequencies) and is
used in many types of statistical analysis.
Quantitative / Numeric
(Discrete Data)
• Examples:
• Size of your department’s workforce,
• Number of new clients acquired in a quarter,
• Number of tickets sold per day,
• Number of students attending a class, number of children in a family
• This data is typically visualized using bar graphs
Data & Data Types (Quantitative / Numeric)
(Continuous Data)
• Continuous Data: values lie on a continuous scale, so any value
within a range is possible.
• It can take fractional values, and between any two values there is
always another possible value.
• Example: time can be measured in days, hours, seconds,
milliseconds, and so on.
• Examples: daily wind speeds, freezer temperatures, weight of
newborn babies, height of children, speed of a car.
Parameter | Discrete Data | Continuous Data
Meaning | Clear spaces between values | Falls on a continuous scale
Can you count the data? | Yes; data is usually counted in whole-number units | Generally, no
Can you measure the data? | No | Yes
Values | Finite (or countable) number of possible values | Infinite number of possible values
Subdivision | Cannot be subdivided into smaller pieces | Can be subdivided into smaller and smaller pieces
Graphical representation | Bar charts | Histograms
Examples | No. of students in a class, no. of workers in a company | Height, weight, etc.
Mathematical operations | Arithmetic operations like addition and counting | A wide range of operations, including addition, subtraction, multiplication, and division
Statistical analysis | Descriptive characteristics such as frequency counts and proportions are more suited | Mean, median, mode, range, standard deviation, etc.
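The difference in graphical representation follows from how each type is summarized: discrete values are counted individually (one bar per value), while continuous values are grouped into intervals (histogram bins). The data and bin settings below are hypothetical.

```python
from collections import Counter

# Discrete data (hypothetical): number of children per family -> one count per value (bar chart)
children = [0, 2, 1, 2, 3, 2, 1, 0, 2]
bar_counts = sorted(Counter(children).items())

# Continuous data (hypothetical): newborn weights in kg -> counts per interval (histogram)
weights = [2.45, 2.9, 3.1, 3.25, 3.4, 3.55, 3.8, 4.05]

def histogram(values, start, width, n_bins):
    """Count values in the half-open bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

print(bar_counts)
print(histogram(weights, start=2.0, width=0.5, n_bins=5))
```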
Data Analytics Life Cycle
• 1. Understand the Business
• 2. Understand Data Requirements
• 3. Data Preparation
• 4. Exploring and Visualizing the Data
• 5. Modeling and Analyzing the Data
• 6. Model Evaluation
• 7. Operationalize
Data Analytics Life Cycle
• Business Understanding
• Focuses on defining project goals and requirements from the business perspective.
– Determine Business Objectives
– Determine Analytics Outcome
– Assess Situation
– Assess IT Infrastructure
– Produce Project Plan
• Data Understanding
• Involves collecting, describing, and exploring data.
– Select Data Sources
– Select Initial Data Set
– Describe Data
– Explore Data
– Verify Data Quality
– Select Final Data Set
• Data Preparation
– Cleans, transforms, and organizes data for modeling.
– Clean Data
– Construct Data
– Integrate Data
– Format Data
– Add Labels (if supervised learning)
– Split Data into Training and Testing Sets
• Modeling
– Building and testing analytical models.
– Select Suitable Modeling Techniques
– Generate Test Design
– Apply Selected Modeling Techniques
– Assess Model
• Evaluation
– Ensures the model is valid and meets business objectives.
– Apply Test Set
– Interpret Results
– Cross-Check Results
– Final Model Training
– Review Process
– Determine Next Steps
• Deployment
– Implements results into the business process.
– Analyze Customer’s Environment
– Plan Deployment
– Plan Monitoring & Maintenance
– Produce Final Report
– Review Project
• Model Management (Extended Phase)
– Validate Model Performance
– Monitor Input Data
– Interpret Results
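Several of the steps above (Clean Data, Split Data into Training and Testing Sets, Apply Selected Modeling Techniques, Apply Test Set) can be traced in one small sketch. The records are hypothetical, dropping incomplete rows is only one possible cleaning choice (imputation is another), and the model is a plain least-squares line fitted on the training set only.

```python
import random
from statistics import mean

# Hypothetical raw records: (advertising_spend, sales); None marks a missing value
raw = [(10, 25), (12, None), (15, 34), (None, 30), (20, 45),
       (22, 48), (25, 55), (28, 60), (30, 66), (35, 74)]

# Data Preparation / Clean Data: drop records with any missing field
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Data Preparation / Split Data: 75% train, 25% test (fixed seed for repeatability)
rng = random.Random(42)
shuffled = clean[:]
rng.shuffle(shuffled)
cut = int(0.75 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]

# Modeling: fit y = a + b*x by least squares on the training set only
mx = mean(x for x, _ in train)
my = mean(y for _, y in train)
b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
a = my - b * mx

# Evaluation / Apply Test Set: mean absolute error on records the model never saw
mae = mean(abs(y - (a + b * x)) for x, y in test)
print(f"train={len(train)} test={len(test)} MAE={mae:.2f}")
```

Keeping the test records out of the fitting step is what makes the MAE an honest estimate; using them during training would be a form of data leakage.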
Issues in Model Predictions
• If the model makes incorrect predictions, check the:
– "Data Preparation" and
– "Modeling" phases of the Data Analytics Lifecycle
• Data Preprocessing: This is one of the most common causes of poor model performance or
incorrect predictions.
– Incomplete or incorrect data cleaning
– Missing values not handled properly
– Incorrect feature encoding or scaling
– Data leakage (information from the future or from the test set used in training)
• Revisit if:
– Features aren't contributing meaningfully
– There's high bias or high variance in predictions
Issues in Predictions
• Issues in the model design or tuning can hurt predictions.
– Wrong model for the problem type
– Underfitting or overfitting
– Poor hyperparameter tuning
– Lack of cross-validation
• Revisit if:
– Data seems clean, but predictions are off
– Model has high training accuracy but poor generalization
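Cross-validation addresses the "lack of cross-validation" issue by evaluating the model on several different held-out folds instead of a single split. The sketch below builds contiguous k-fold index splits (scikit-learn offers `sklearn.model_selection.KFold` for the same purpose) and scores a deliberately simple hypothetical model that predicts the training-set mean. For data without a time order, the rows would normally be shuffled before splitting.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; yield (train_idx, test_idx) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

# Hypothetical target values; the "model" just predicts the training-set mean
y = [3.0, 4.0, 5.0, 4.5, 6.0, 5.5, 7.0, 6.5, 8.0, 7.5]

fold_errors = []
for train_idx, test_idx in k_fold_indices(len(y), k=5):
    prediction = sum(y[i] for i in train_idx) / len(train_idx)
    fold_errors.append(sum(abs(y[i] - prediction) for i in test_idx) / len(test_idx))

print([round(e, 2) for e in fold_errors])
print("mean CV error:", round(sum(fold_errors) / len(fold_errors), 2))
```

A large spread between fold errors is itself a warning sign that performance depends heavily on which data the model happens to see.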
Issues in Predictions
• Check how you're evaluating the model.
• Are you using the right metrics?
• Is your test set representative?
• Any data leakage in evaluation?
• Revisit if:
• You’re unsure how "bad" the predictions are
• Metrics don’t align with the business goal
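A metric can look good while missing the business goal. In the hypothetical fraud example below, a model that never flags fraud reaches 95% accuracy yet catches no fraud at all (recall 0). In this sketch, precision is reported as 0.0 when the model flags nothing, which is a convention chosen here rather than a universal rule.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical fraud labels: 5 frauds in 100 transactions; the model never flags fraud
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(metrics(y_true, y_pred))
```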
Data Analytics Tools
• Python: Data manipulation, machine
learning, visualization
• R: Statistical analysis, data visualization
• SQL: Data querying and extraction
Python in Data Analytics
• Libraries:
- NumPy, Pandas: Data manipulation
- Matplotlib, Seaborn: Visualization
- Scikit-learn: Machine Learning
• Open-source, flexible, large community
R in Data Analytics
• Strong statistical capabilities
• Libraries: ggplot2, dplyr, caret
• Suitable for academic and statistical research
• Tools: RStudio, Shiny
SQL in Data Analytics
• Structured Query Language for databases
• Perform data extraction, filtering, aggregation
• Essential for working with relational databases
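Extraction, filtering, and aggregation can be tried without a database server by using Python's built-in `sqlite3` module with an in-memory database. The table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical sales table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Pen", 120.0), ("North", "Notebook", 300.0),
     ("South", "Pen", 80.0), ("South", "Notebook", 150.0),
     ("East", "Pen", 200.0)],
)

# Extraction (SELECT) + filtering (WHERE) + aggregation (GROUP BY / SUM)
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 100
    GROUP BY region
    ORDER BY total DESC
"""
results = conn.execute(query).fetchall()
print(results)
conn.close()
```

The same SQL would run largely unchanged on MySQL or PostgreSQL; only the connection code differs.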
Comparison: Python vs R vs SQL
• Python: Versatile, good for ML and large-scale
applications
• R: Great for statistical analysis and visualization
• SQL: Efficient for data extraction and
manipulation
Applications of Data Analytics
• Healthcare: Disease prediction, diagnostics
• Agriculture: Crop yield prediction, stress analysis
• Banking: Fraud detection, credit scoring
• Retail: Customer segmentation, recommendation systems
• Manufacturing: Predictive maintenance
Future of Data Analytics
• Integration with AI and machine learning
• Real-time and streaming analytics
• Edge computing and IoT
• Cloud-based analytics platforms