Week1 1

week 1- database


Data Visualization

A comprehensive course on creating effective visual representations of data to support decision-making using Excel, Tableau, and Python.

Instructor: Dr. Jia Uddin

AI and Big Data Department

Endicott College, Woosong University

https://sites.google.com/view/drjiauddin/home
Course Overview
This course provides students with a comprehensive understanding of data visualization techniques and their practical applications in data-d

Hands-on Learning

Create effective visual representations of both quantitative and qualitative data using Excel, Tableau, and Python

Advanced Concepts

Explore visualization integrated with Machine Learning and Explainable AI to support managerial insights

Real-world Application

Design meaningful and interactive visualizations that enhance data interpretation and support strategic decision-making

By the end of this course, you'll have both theoretical knowledge and technical skills to create impactful visualizations across various organiz
Course Learning Outcomes
Interpret
Understand the history and evolution of data visualization

Describe
Identify key design principles and techniques for visualizing data effectively

Develop
Build fundamental communication skills required for effective data presentation

Apply
Gain introductory competency with visualization software tools

Create
Identify, understand, analyze, prepare, and present effective visualizations on various topics
Course Textbooks
Primary Texts

• AI Visualization: From Fundamentals to Practical Applications by Jia Uddin (Woosong Publisher, 2024)
• Introduction to Data Analytics: A Complete Guide to Beginners by Jia Uddin (Woosong Publisher, 2021)

Supplementary Resources

• Jumpstart Tableau: A Step-By-Step Guide to Better Data Visualization by Arshad Khan
• Visual Analytics with Tableau by Alexander Loth
• Tableau for Beginners by Nurul Haszeli Ahmad
• Getting Started with Tableau 2019.2 by Tristan Guillevin
Course Schedule: First Half
Week 1: Introduction to Data and Data Visualization
Understanding Data Handling Fundamentals

Week 2: Getting Started with Excel as a Visualization Tool
Basic Chart Creation and Formatting

Week 3: Introduction to Tableau as a Visualization Tool
Basic Libraries and Functions

Week 4: Aggregate Functions, Calculated Fields, and Parameters
Advanced Data Manipulation

Week 5: Charting Data and Map Visualizations
Interactive Dashboards and Stories in Tableau

Week 6: Tableau Group, Set, and Cluster Analysis
Advanced Data Organization

Week 7: Live Data Visualization
Google Form, Google Cloud, Tableau Desktop, and Tableau Online

Course Schedule: Second Half
Week 8: Midterm Examination
No Lecture

Week 9: Data Visualization with Matplotlib
Advanced Python Visualization

Week 10: Data Manipulations using NumPy Library
Histogram, Boxplot, Percentile, and Data Pre-Processing

Week 11: Tabular Data Classification using Machine Learning
Building and Visualizing Classification Models

Week 12: Applying Explainable AI to Classification Results
Understanding and Visualizing Model Decisions

Week 13: Tabular Data Prediction using Machine Learning Regressors
Building and Visualizing Regression Models

Week 14: Applying Explainable AI to Regression Results
Interpreting Complex Models Visually

Week 15: Final Exam/Project Presentation
No Lecture
Grading Scale
Grade   GPA Range     Definition                   Percent
A+      4.25 ~ 4.50   Exceptional Achievement      95 - 100
A       3.75 ~ 4.24   Outstanding Achievement      90 - 94
B+      3.25 ~ 3.74   Very Good Achievement        85 - 89
B       2.75 ~ 3.24   Good Achievement             80 - 84
C+      2.25 ~ 2.74   Satisfactory Achievement     75 - 79
C       1.75 ~ 2.24   Average Achievement          70 - 74
D+      1.25 ~ 1.74   Minimal Achievement          65 - 69
D       0.75 ~ 1.24   Minimal Achievement          40 - 64
F       0 ~ 0.74      Unsatisfactory Achievement   0 - 39

Grades are assigned based on overall performance in exams, assignments, projects, and class participation.
Grading Curve
Grade     Freshman      Sophomores    Junior        Graduate
A+ ~ A    20% or less   30% or less   30% or less   50% or less
B+ ~ B    50% or less   40% or less   40% or less   50% or less
C+ ~ D    30% or more   30% or more   30% or more   ~
F         Not included in calculations

Example Distribution (Class of 20 students):

A+ ~ A0: 30% (6 students)
B+ ~ B0: 40% (8 students)
C+ ~ D0: 30% (6 students)
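
The arithmetic behind the example distribution can be checked with a short Python sketch. The function name `grade_band_counts` is a hypothetical helper introduced here for illustration; the shares and class size come from the example above.

```python
def grade_band_counts(class_size, shares):
    """Convert percentage shares per grade band into student counts."""
    return {band: round(class_size * pct / 100) for band, pct in shares.items()}

# Shares and class size taken from the example distribution.
example_shares = {"A+ ~ A0": 30, "B+ ~ B0": 40, "C+ ~ D0": 30}
counts = grade_band_counts(20, example_shares)
print(counts)  # {'A+ ~ A0': 6, 'B+ ~ B0': 8, 'C+ ~ D0': 6}
```

For class sizes where the shares do not divide evenly, simple rounding may not sum back to the class size, so the counts would need a manual adjustment.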
Understanding Data vs. Information

Data
• Collection of discrete objects, numbers, words, events, facts, measurements, observations, or descriptions
• Raw facts that have not been processed
• Lacks context and meaning on its own

Information
• Result of processing raw data to reveal meaning
• Transformed data with context and relationships
• Facilitates decision making
• Organized to answer specific questions
Types of Data Sources

Textual Data

Sourced from social media platforms like Facebook and Twitter, newspapers, research papers, and online publications

Numeric Data

Business organization metrics, national databases, personal identification numbers, phone numbers, zip codes, and financial records

Audio Data

Speech recordings, industrial sounds, machine operations, music, and environmental noise measurements

Visual Data

Images and videos from cameras, surveillance systems, medical imaging equipment, satellite imagery, and mobile devices
The Data-Information Relationship

Data
Data represents raw, unprocessed facts collected from various sources. It has little value until organized and analyzed. Data is the foundation upon which information is built.

Characteristics of Data:
• Discrete and unorganized
• Requires context to be meaningful
• Often voluminous and varied
• Can be structured or unstructured

Information
Information emerges when data is processed, organized, and presented in a way that makes it meaningful and useful for specific purposes.

Characteristics of Information:
• Contextual and relevant
• Has purpose and meaning
• Supports decision-making
• Adds value to raw data
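
As a minimal sketch of this relationship, the Python snippet below turns a list of raw readings (data) into per-city averages that answer a specific question (information). The city names, temperature values, and the helper `average_by_group` are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: (city, temperature in degrees Celsius) readings.
raw_readings = [
    ("Daejeon", 21.0), ("Seoul", 19.5), ("Daejeon", 23.0),
    ("Seoul", 20.5), ("Busan", 24.0), ("Busan", 26.0),
]

def average_by_group(records):
    """Group raw (key, value) pairs and average the values per key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}

# Processing adds context: which city is warmest on average?
info = average_by_group(raw_readings)
warmest = max(info, key=info.get)
print(info)
print(f"Warmest city on average: {warmest}")
```

The raw list alone answers no question; the grouped averages are organized around one, which is what makes them information in the sense described above.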
Data Sources in the Modern World

Web & E-commerce

Click patterns, browsing behavior, purchase history, product reviews, shopping cart data

Financial Transactions

Bank records, credit card usage, stock market trades, digital payments, cryptocurrency tr

Social Networks

User profiles, connections, engagement metrics, content sharing patterns, sentiment data

IoT Sensors

Environmental readings, smart home data, industrial sensors, wearable device measurem
What is Data Science?
Data science is a multidisciplinary field that manages, manipulates, extracts, and interprets knowledge from vast amounts of data to address challenges in big data environments.

According to Harvard Business Review:

"Data Scientist: The Sexiest Job of the 21st Century"

While data science principles primarily tackle big data challenges, they equally apply to smaller datasets, using similar methodologies at different scales.

Data Science Integrates:

Computer Science

Pattern recognition, visualization, data warehousing, high-performance computing, databases, artificial intelligence

Mathematics

Mathematical modeling, algorithm development, optimization techniques

Statistics

Statistical and stochastic modeling, probability theory, hypothesis testing
Data Science: A Multidisciplinary Approach
Data science utilizes theories and techniques from many fields to investigate and analyze large amounts of data, helping decision makers across dive

Applied Across Industries:

• Science & Engineering: Research data analysis, experimental design, simulation modeling
• Economics & Finance: Market prediction, risk assessment, algorithmic trading
• Politics & Public Policy: Voter analysis, policy impact prediction, social program evaluation
• Education: Learning analytics, personalized education, institutional performance
• Healthcare: Disease prediction, treatment optimization, patient outcome analysis
• Retail & E-commerce: Customer behavior analysis, recommendation systems, inventory optimization

Core Components:

Domain Expertise

Industry-specific knowledge to frame problems appropriately

Technical Skills

Programming, database management, machine learning

Statistical Thinking

Hypothesis testing, experimental design, inference


The Data Science Ecosystem

Data science combines domain expertise, programming skills, and statistical knowledge to extract meaningful insights from data.
The visualizations above illustrate how these different disciplines interact within the data science ecosystem to transform raw data
into actionable intelligence through a systematic process of collection, cleaning, analysis, and visualization.
The Data Science Process

01 Business Understanding

Define objectives, requirements, and success criteria for the data science project

02 Data Acquisition

Collect relevant data from multiple sources and formats

03 Data Preparation

Clean, transform, and preprocess data to make it suitable for analysis

04 Exploratory Analysis

Discover patterns, trends, and relationships through visualization and statistical methods

05 Modeling

Apply algorithms and build models to extract insights or make predictions

06 Evaluation & Deployment

Assess model performance and implement solutions in production environments
Data Science in Action
Video Resource

To better understand what data science encompasses and how it's applied in real-world scenarios, watch this informative video:

Video link: https://www.youtube.com/watch?v=X3paOmcrTjQ

Key Takeaways from the Video:

• Data science combines multiple disciplines to extract knowledge from data


• The field requires both technical and analytical skills
• Real-world applications span numerous industries
• Data visualization plays a crucial role in communicating insights
• Modern tools and technologies facilitate complex data analysis
Case Study: Diabetes Prevention
What if we could predict diabetes occurrence and take preventive measures?

This case study demonstrates how data science can be applied to healthcare
to predict and prevent diabetes. By analyzing patient data, we can identify
risk factors and patterns that may lead to diabetes development.

Source: https://www.edureka.co/blog/what-is-data-science/

Business Value:

• Early intervention for high-risk patients


• Personalized prevention strategies
• Reduced healthcare costs
• Improved patient outcomes
• Population health management
Diabetes Prediction: Step 1 - Data Collection
Attributes Collected:

Attribute Description

npreg Number of times pregnant

glucose Plasma glucose concentration

bp Blood pressure measurement

skin Triceps skinfold thickness

bmi Body mass index

ped Diabetes pedigree function

age Patient age

income Patient income level


Diabetes Prediction: Step 2 - Data Cleaning
Before analysis can begin, we must clean and prepare the data by addressing inconsistencies and issues that would otherwise compromise our results.

Common Data Issues

• Missing values
• Blank columns
• Abrupt or outlier values
• Incorrect data formats

Cleaning Process

• Identify and handle missing values
• Remove duplicate records
• Standardize formats
• Normalize numerical data

Organization

• Structure data into tables
• Define appropriate attributes
• Ensure data consistency
• Prepare for analysis
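
The cleaning steps listed above can be sketched in plain Python. Everything specific here is an assumption for illustration: the sample rows, the `id` key used to detect duplicates, the choice of mean imputation for missing `bmi` values, and min-max scaling for normalization. A real project would pick these based on the data.

```python
# Hypothetical raw patient records; None marks a missing value.
raw_rows = [
    {"id": 1, "glucose": "148", "bmi": 33.6},
    {"id": 2, "glucose": "85", "bmi": None},
    {"id": 2, "glucose": "85", "bmi": None},      # duplicate record
    {"id": 3, "glucose": " 183 ", "bmi": 23.3},   # inconsistent formatting
]

def clean(rows):
    # Remove duplicate records, keeping the first row seen for each id.
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))

    # Standardize formats: glucose arrives as text, store it as a float.
    for row in unique:
        row["glucose"] = float(row["glucose"].strip())

    # Handle missing values: fill missing bmi with the mean of observed values.
    observed = [r["bmi"] for r in unique if r["bmi"] is not None]
    fill_value = sum(observed) / len(observed)
    for row in unique:
        if row["bmi"] is None:
            row["bmi"] = fill_value

    # Normalize numerical data: min-max scale glucose into [0, 1].
    # Assumes at least two distinct glucose values (otherwise hi == lo).
    lo = min(r["glucose"] for r in unique)
    hi = max(r["glucose"] for r in unique)
    for row in unique:
        row["glucose_norm"] = (row["glucose"] - lo) / (hi - lo)
    return unique

cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Mean imputation is only one way to handle missing values; dropping incomplete rows or using a median are equally common and may suit skewed data better.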
Data Preprocessing Results
Before vs. After Preprocessing

Before Preprocessing:

The raw data contains inconsistencies such as:

• Missing values (blank cells)
• Formatting inconsistencies
• Potential outliers
• Disorganized structure

After Preprocessing:

The clean data now features:

• Consistent formatting
• No missing values
• Organized structure
• Standardized measurements
• Ready for analysis

Proper preprocessing is critical for accurate analysis and modeling, as it directly impacts the quality of insights and predictions.
Step 3: Model Planning
Analytical Sandbox Preparation

In this phase, we load the cleaned data into an analytical environment and apply various statistical functions to better understand its characteristics.

Statistical Analysis Functions

• describe(): Provides summary statistics, identifies missing values, and counts unique values
• summary(): Generates statistical information including mean, median, range, minimum and maximum values
• correlation(): Measures relationships between variables
• distribution analysis: Examines how values are distributed

Visualization Techniques

• Histograms
• Line graphs
• Box plots
• Scatter plots
• Heat maps
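
The kind of output these statistical functions report can be approximated with the Python standard library. This is a rough sketch, not the implementation of any particular tool: `summarize` and `pearson` are hypothetical helpers, and the glucose and BMI values are made up for the example.

```python
from statistics import mean, median

def summarize(values):
    """Summary of one numeric column; None counts as a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        "min": min(present),
        "max": max(present),
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical columns for illustration.
glucose_raw = [90, 110, None, 150, 170]
glucose = [90, 110, 150, 170]
bmi = [22.0, 25.0, 31.0, 34.0]

stats = summarize(glucose_raw)
r = pearson(glucose, bmi)
print(stats)
print(f"correlation(glucose, bmi) = {r:.2f}")
```

A coefficient near +1 or -1 suggests a strong linear relationship worth plotting as a scatter plot; a value near 0 suggests no linear relationship, though a non-linear one may still exist.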
Step 4: Model Building
Decision Tree Approach

For this diabetes prediction case, we're using a decision tree model that identifies and prioritizes the most important factors.

Key Findings:

• Glucose level is the most important predictor (root node)


• Each node determines the next significant parameter
• The model follows branches until reaching a conclusion (pos/neg)
• Multiple pathways can lead to diabetes prediction
• The model enables personalized risk assessment

The decision tree provides both predictive power and interpretability, making
it ideal for healthcare applications where understanding the "why" behind
predictions is critical.
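
To make the branch-following idea concrete, here is a minimal sketch of a hand-written tree in Python. The structure places glucose at the root as described above, but the second feature (`bmi`) and both thresholds are illustrative assumptions, not values learned from patient data, and `predict` is a hypothetical helper.

```python
# Hypothetical tree: thresholds are illustrative, not learned from real data.
TREE = {
    "feature": "glucose", "threshold": 125,
    "low": {"label": "neg"},
    "high": {
        "feature": "bmi", "threshold": 30,
        "low": {"label": "neg"},
        "high": {"label": "pos"},
    },
}

def predict(node, patient, path=None):
    """Follow branches from the root until a leaf label (pos/neg) is reached."""
    path = [] if path is None else path
    if "label" in node:
        return node["label"], path
    feature, threshold = node["feature"], node["threshold"]
    if patient[feature] > threshold:
        branch, op = "high", ">"
    else:
        branch, op = "low", "<="
    path.append(f"{feature} {op} {threshold}")
    return predict(node[branch], patient, path)

label, path = predict(TREE, {"glucose": 150, "bmi": 34.0})
print(label, path)
```

Returning the path alongside the label is what gives the model its interpretability: each prediction comes with the sequence of conditions that produced it.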
Step 5: Operationalize
Pilot Project Implementation

The final step is to test our diabetes prediction model in a real-world environment through a small pilot
project, ensuring its accuracy and effectiveness before full-scale deployment.

Validation Process

• Run the model on a sample patient population
• Compare predictions with actual outcomes
• Measure accuracy, precision, and recall

Performance Evaluation

• Identify any constraints or limitations
• Assess computational requirements
• Evaluate response time for real-time use

Iterative Improvement
• If results are not accurate, return to model planning
• Refine features and parameters
• Consider alternative modeling approaches
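
Comparing predictions with actual outcomes comes down to counting agreement. The sketch below computes accuracy, precision, and recall from two label lists; the pilot outcomes are hypothetical, and `classification_metrics` is a helper named here for illustration.

```python
def classification_metrics(actual, predicted, positive="pos"):
    """Accuracy, precision, and recall for a binary pos/neg prediction task."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical pilot results for eight patients.
actual = ["pos", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "neg", "pos", "pos", "neg", "pos"]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

In a screening setting, recall usually deserves the closest attention, because a false negative means a high-risk patient goes unflagged.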
What is Data Visualization?
Data visualization is the graphical representation of information and data that enables decision-makers to see analytics presented visually, making it easier to identify patterns, trends, and outliers within large data sets.
Why Visualization Matters:

• Raw data rarely tells a compelling story on its own


• The human brain processes visual information more efficiently
• Complex relationships become apparent through visual patterns
• Effective visualization leads to faster, more informed decisions

Businesses leverage data visualization for reporting, forecasting, marketing strategy, customer analysis, and operational monitoring, transforming numbers into actionable insights.
The Power of Data Visualization
Visual Representation of Data
Data visualization is the graphical representation of information that makes complex data more accessible, understandable, and usable.

Identifying Patterns
Visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data that might go unnoticed in text-based formats.

Enhanced Comprehension
Visualizations make data more understandable by leveraging the brain's ability to process visual information more efficiently than text or numbers.

Data-Driven Decisions
Effective visualizations are essential to analyze massive amounts of information and make data-driven decisions quickly and accurately.
Goals of Data Visualization
The primary goal of data visualization is to communicate information clearly and
effectively through graphical means.

01 Clear Communication

Present data in a way that is immediately understood by the audience

02 Engagement

Make data engaging and easily digestible, even for non-technical users

03 Pattern Recognition

Identify trends and outliers within data sets that might be missed in tables

04 Storytelling

Tell a compelling narrative found within the data

05 Focus Attention

Highlight the important parts of a data set to drive decision-making


Data Visualization Applications

Making Data Engaging

Transform complex data into accessible, digestible formats that capture audience attention and facilitate understanding

Identifying Trends

Recognize patterns, outliers, and relationships within data sets that might be invisible in raw numerical form

Telling Stories

Craft compelling narratives from data that connect with audiences on both intellectual and emotional levels

Highlighting Importance

Direct attention to critical aspects of data that require focus, action, or further investigation

Supporting Arguments

Provide visual evidence to reinforce conclusions, recommendations, or strategic decisions

The right visualization technique transforms raw data into actionable intelligence
Common Types of Data Visualization
Charts
• Bar Charts
• Line Charts
• Pie Charts
• Scatter Plots
• Area Charts

Tables & Graphs


• Data Tables
• Heat Maps
• Network Graphs
• Tree Maps
• Flow Charts

Maps & Geo-visualizations


• Choropleth Maps
• Point Maps
• Heat Maps
• Cartograms
• Flow Maps

Advanced Visuals
• Infographics
• Dashboards
• Interactive Visualizations
• 3D Visualizations
• Animation

Each visualization type serves specific purposes and is suitable for different types of data and analytical goals.
Bar Charts
Overview

Bar charts use rectangular bars to represent data values, with the length or height of each bar proportional to the value
it represents. They are ideal for comparing discrete categories.

Key Applications:

• Comparing values across categories


• Showing frequency distributions
• Displaying time series data (with time on one axis)
• Illustrating part-to-whole relationships
• Representing survey results

Variations:

• Vertical bar charts


• Horizontal bar charts
• Grouped bar charts
• Stacked bar charts
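
The defining property of a bar chart, bar length proportional to value, can be shown without any plotting library. The sketch below renders a horizontal bar chart as text; the survey counts and the helper `text_bar_chart` are hypothetical, and it assumes positive values measured from a zero baseline.

```python
def text_bar_chart(data, width=40, mark="#"):
    """Return one line per category; bar length is proportional to the value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = mark * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines

# Hypothetical survey counts per tool.
survey = {"Excel": 12, "Tableau": 18, "Python": 24}
chart = text_bar_chart(survey, width=24)
for line in chart:
    print(line)
```

Because every bar is scaled against the same maximum and starts at zero, the visual ratio between two bars matches the ratio between their values, which is what makes category comparison reliable.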
Line Charts
Overview

Line charts display information as a series of data points connected by straight line segments. They excel at showing trends over time and continuous data relationships.
Key Applications:

• Tracking changes over time


• Showing trends and patterns
• Comparing multiple series of data
• Forecasting future values
• Identifying correlations

Variations:

• Simple line charts


• Multiple line charts
• Area charts (filled areas below lines)
• Stacked area charts
• Step line charts

Line charts are particularly effective for visualizing time series data and continuous variables, making them ideal for financial data, temperature readings, or any data c
Pie Charts
Overview

Pie charts display data as a circular graphic divided into slices, with each slice
representing a proportion of the whole. The entire circle represents 100% of
the data.
Key Applications:
• Showing part-to-whole relationships
• Displaying percentage or proportional data
• Comparing composition of different groups
• Illustrating survey results

Best Practices:
• Limit to 5-7 slices for clarity
• Order slices by size (largest to smallest)
• Use clear labels and percentages
• Consider alternative charts for comparing multiple categories
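
Two of these practices, ordering slices by size and limiting their number, are easy to apply before plotting. The sketch below prepares slice percentages and angles and folds the smallest categories into an "Other" slice; the category counts and the helper `pie_slices` are hypothetical.

```python
def pie_slices(counts, max_slices=6):
    """Order categories largest-first, fold the tail into 'Other',
    and return (label, percent, angle_in_degrees) for each slice."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) > max_slices:
        head, tail = ordered[:max_slices - 1], ordered[max_slices - 1:]
        ordered = head + [("Other", sum(value for _, value in tail))]
    total = sum(value for _, value in ordered)
    return [(label, 100 * value / total, 360 * value / total)
            for label, value in ordered]

# Hypothetical category counts.
responses = {"A": 40, "B": 25, "C": 15, "D": 10, "E": 5, "F": 3, "G": 2}
slices = pie_slices(responses, max_slices=5)
for label, percent, angle in slices:
    print(f"{label:<6} {percent:5.1f}%  {angle:6.1f} degrees")
```

Each slice's angle is its share of 360 degrees, so the angles always sum to a full circle. The "Other" slice is kept last by convention even when it is larger than the slice before it.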
Scatter and Bubble Charts
Scatter Charts

Scatter plots display individual data points on a two-dimensional graph, using the position of
points to show the relationship between two variables. They excel at revealing correlations,
clusters, and outliers.
Bubble Charts

Bubble charts are an extension of scatter plots that add a third dimension through the size of each bubble, representing an additional variable. Color can be used as a fourth dimension.
Key Applications:

• Identifying correlations between variables


• Detecting patterns and clusters in data
• Finding outliers in a dataset
• Analyzing distribution of data points
• Comparing multiple variables simultaneously (bubble charts)
Visualization Types Overview

Charts

Bar, line, pie, scatter, and other chart types for comparing values, showing trends, and displaying proportions

Tables

Structured rows and columns for precise data values, detailed information, and orderly comparisons

Graphs

Network diagrams, flow charts, and tree structures for showing relationships, hierarchies, and connections

Maps

Geographic visualizations to display spatial data, regional trends, and location-based information

Infographics

Visual representations combining images, charts, and minimal text to tell a data story in an engaging format

Dashboards

Collections of multiple visualizations organized on a single screen for comprehensive monitoring and analysis

Choosing the right visualization type depends on your data characteristics, audience needs, and the specific insights you want to communicate.
Data Tables
Overview
Data tables organize information in rows and columns, providing a structured format for presenting detailed information. Tables excel at showing precise values
and supporting detailed comparisons.

Key Applications:
• Presenting exact numerical values
• Organizing multidimensional data
• Enabling lookups of specific information
• Supporting detailed analysis and comparisons
• Documenting comprehensive datasets

Best Practices:
• Use clear headers and consistent formatting
• Align numerical data consistently (typically right-aligned)
• Apply subtle highlighting for important values
• Limit columns to maintain readability
• Consider alternative visualizations for pattern recognition
Tables vs. Charts: When to Use Each
Tables Excel At:

• Presenting precise values
• Handling multiple variables simultaneously
• Supporting lookup of specific information
• Organizing structured data sets
• Allowing detailed record examination

Use tables when your audience needs to see exact numbers or look up specific values. Tables work well for detailed analysis and reference purposes.

Charts Excel At:

• Showing trends and patterns
• Making comparisons between categories
• Highlighting relationships in data
• Communicating insights quickly
• Creating visual impact for presentations

Use charts when you want to communicate patterns, trends, or relationships at a glance. Charts are ideal for presentations and executive summaries.
Tables vs. Frequency Distributions
Raw Data Table

Raw data tables display individual records with multiple variables. They provide complete information but can be difficult to interpret for patterns or trends.

Frequency Distribution

Frequency distributions organize data into groups or intervals, showing how often values occur within each group. They reveal patterns and distributions that are difficult to see in raw data.

While raw data tables preserve all original information, frequency distributions and their visualizations make patterns immediately apparent, aiding quick analysis and decision-making.
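
A frequency distribution can be built from raw values in a few lines of Python. This sketch groups hypothetical ages into fixed-width intervals; the helper `frequency_distribution` and the bin width of 10 are choices made for the example.

```python
from collections import Counter

def frequency_distribution(values, bin_width):
    """Count how many values fall into each [start, start + bin_width) interval."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return {(start, start + bin_width): counts[start] for start in sorted(counts)}

# Hypothetical raw records: one age per patient.
ages = [21, 25, 33, 38, 39, 42, 47, 51, 58, 59]
distribution = frequency_distribution(ages, 10)
for (low, high), n in distribution.items():
    print(f"{low}-{high - 1}: {'*' * n}")
```

Intervals with no values are simply absent from the result. The printed rows of asterisks already form a rough text histogram, which is the visualization a frequency distribution most naturally leads to.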
Infographics: Visual Storytelling with Data
What Are Infographics?

Infographics combine data visualizations, illustrations, text, and images to tell a story and communicate complex information quickly and clearly. They transform data into visually engaging stories.
Key Characteristics:

• Blend of visuals and text in a cohesive design


• Focus on a central theme or message
• Simplified presentation of complex information
• Strategic use of color, typography, and imagery
• Logical flow of information (visual hierarchy)

Effective Uses:

• Explaining complex processes or concepts
• Summarizing research findings
• Presenting statistics in a memorable way
• Comparing options or approaches
• Making data more accessible to general audiences

Example: The COVID-19 dashboard from Worldometers combines multiple visualization types to present comprehensive pandemic data.
https://www.worldometers.info/coronavirus/#countries
Dashboards: Integrated Data Visualization
What Are Dashboards?

Dashboards are visual displays that organize and present multiple related visualizations on a single screen,
providing a comprehensive view of data for monitoring, analysis, and decision-making.

Key Features:

• Multiple visualizations unified by a common theme


• Real-time or near-real-time data updates
• Interactive elements for data exploration
• Customizable views for different users/needs
• Performance indicators and summary metrics
• Visual alerts for critical thresholds

Common Applications:

• Business performance monitoring


• Financial analysis and reporting
• Marketing campaign tracking
• Operations and logistics management
• Website analytics and user behavior
• Healthcare metrics and patient monitoring
Geographic Data Visualization
Map-Based Visualizations

Geographic visualizations display data in relation to physical locations, allowing for spatial analysis and regional comparisons. They're particularly effective for showing patterns across different geographic areas.
Common Map Types:

• Choropleth maps (shaded regions)


• Symbol maps (points/markers)
• Heat maps (density visualization)
• Flow maps (movement/connections)
• Cartograms (distorted based on values)

Geographic visualizations help answer questions like "Where are our customers located?", "How do sales vary by region?", or "Which areas are most affected by a particular phenomenon?"

The example above shows a choropleth map with regions colored based on data values. This visualization style makes it easy to identify geographic patterns and regional variations at a glance.

As we conclude our overview of data visualization types, remember that the best visualization is one that effectively communicates your
specific data story to your target audience.
