Analyzing and Visualizing Data
Chapter 4
Working With Data
Data Assets and Tabulation Types
• Two main categories
o Data that exist in tables: datasets
o Data that exist as isolated values
• Data Types: levels of data, or scales of measurement, which shape
o The type of exploratory data analysis you can undertake
o The editorial thinking you establish
o The specific chart types you might use
o Color choices and layout decisions around composition
Data Assets and Tabulation Types cont.
• Textual (Qualitative)
o Unstructured streams of words
o Descriptive details of a weather forecast for a given city
o The full title of an academic research project
o The description of a product on Amazon
Data Assets and Tabulation Types cont.
• Ordinal (Qualitative)
o Ordinal data is still categorical and qualitative in nature
o Categories have a characteristic order
o The response to a survey question based on a scale of 1 (unhappy) to 5 (very happy)
o The general weather forecast expressed as Very Hot, Hot, Mild, Cold, Freezing
Data Assets and Tabulation Types cont.
• Interval (Quantitative)
o Interval data is the less common form of quantitative data
o Numeric measurements on a scale with equal intervals but no true zero
o Temperature measured in °C or °F: 0 °C does not mean an absence of temperature
Data Assets and Tabulation Types cont.
• Ratio (Quantitative)
o The most common form of quantitative data
o Age of a survey participant in years
o Forecasted amount of rainfall in millimetres
o Unlike interval data, ratio data has a true zero: 0 mm of rainfall means no rain, so 20 mm is genuinely twice as much as 10 mm
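The difference between interval and ratio data can be checked with simple arithmetic. This minimal sketch uses made-up illustrative values (20 and 10, not real forecast data): dividing Celsius readings gives a misleading "twice as hot", while the same temperatures on the Kelvin scale, which has a true zero, give a ratio of only about 1.035. Rainfall, a ratio variable, divides meaningfully as it stands.

```python
# Illustrative values only (hypothetical, not from a real forecast).
CELSIUS_TO_KELVIN_OFFSET = 273.15

def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale, arbitrary zero) to Kelvin (true zero)."""
    return c + CELSIUS_TO_KELVIN_OFFSET

# Interval data: 20 °C looks like "twice" 10 °C, but that ratio depends on
# where the scale's zero happens to be placed, so it has no physical meaning.
naive = 20.0 / 10.0

# Measured from absolute zero, the same two temperatures are much closer.
physical = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Ratio data: 0 mm means no rain, so 20 mm really is twice 10 mm.
rain_ratio = 20.0 / 10.0

print(f"Celsius ratio:  {naive:.2f}")       # 2.00
print(f"Kelvin ratio:   {physical:.3f}")    # 1.035
print(f"Rainfall ratio: {rain_ratio:.2f}")  # 2.00
```

The practical consequence for charting is that ratio data supports statements such as "double" or "half", whereas interval data only supports differences.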
Data Assets and Tabulation Types cont.
• Temporal Data
o Time-based data
o Textual: ‘Four o’clock in the afternoon on Saturday, 12 March 2016’
o Ordinal: ‘PM’, ‘Afternoon’, ‘March’, ‘Q1’
o Interval: ‘12’, ‘12/03/2016’, ‘2016’
o Ratio: ‘16:00’
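One timestamp can be expressed at each of the levels listed above. The sketch below derives those representations with Python's standard `datetime` module; the part-of-day boundaries in `part_of_day` are an assumption made for illustration, since a real project would define its own. Month and weekday names assume the default (C/English) locale.

```python
from datetime import datetime

# One moment in time: 4 pm on 12 March 2016.
moment = datetime(2016, 3, 12, 16, 0)

def part_of_day(hour: int) -> str:
    # Assumed boundaries, for illustration only.
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 22:
        return "Evening"
    return "Night"

representations = {
    # Textual: a readable description of the moment (weekday included).
    "textual": moment.strftime("%A %d %B %Y, %H:%M"),
    # Ordinal: ordered categories derived from the timestamp.
    "ordinal": [
        "PM" if moment.hour >= 12 else "AM",
        part_of_day(moment.hour),
        moment.strftime("%B"),
        f"Q{(moment.month - 1) // 3 + 1}",
    ],
    # Interval: day of month, full date and year.
    "interval": [str(moment.day), moment.strftime("%d/%m/%Y"), str(moment.year)],
    # Ratio: the 24-hour clock reading, as listed above.
    "ratio": [moment.strftime("%H:%M")],
}

for level, value in representations.items():
    print(f"{level:>8}: {value}")
```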
Data Assets and Tabulation Types cont.
• Discrete
o No ‘in-between’ state
o Days of the week
o Heads or tails for a coin toss
o 1, 2, 3, 4, 5, 6, etc.
• Continuous
o Has in-between states
o Height and weight
o Temperature
o Time
o 1.1, 1.2, 1.3, 1.4, 1.5, etc.
Data Acquisition
• What data do you need and why?
• From where, how, and by whom will the data be acquired?
• When can you obtain it?
Data Acquisition cont.
• Curated by You
o Primary data collection
o Manual collection and data foraging
o Extraction from PDF files
o Web scraping (also known as web harvesting)
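Web scraping usually has two steps: fetching a page and then parsing the values out of its HTML. This is a minimal sketch of the parsing step only, using the standard-library `html.parser.HTMLParser`; the class name `TableCellParser` and the sample table (cities and rainfall figures) are hypothetical. Fetching is left out so the example runs offline; in practice the HTML might come from `urllib.request.urlopen(url)`, subject to the site's terms of use.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built, if any
        self._in_cell = False
        self._cell_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._cell_text = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Trim leading/trailing spaces as the cell is captured.
            self._row.append("".join(self._cell_text).strip())
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._cell_text.append(data)

# Hypothetical page fragment standing in for a downloaded web page.
html = """
<table>
  <tr><th>City</th><th>Rainfall (mm)</th></tr>
  <tr><td>Leeds</td><td> 12.5 </td></tr>
  <tr><td>York</td><td>8.0</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(html)
print(parser.rows)
# [['City', 'Rainfall (mm)'], ['Leeds', '12.5'], ['York', '8.0']]
```

The scraped values are still text at this point; converting them to numbers belongs to the transformation stage.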
Data Acquisition cont.
• Curated by Others
o Issued to you
o Downloaded from the Web
o System report or export
o Third-party services
o APIs (application programming interfaces)
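Data curated by others is often obtained from an API that returns JSON. The sketch below separates the network request from the reshaping step so the latter can be tried offline. The endpoint `API_URL` and the payload shape (`forecast`, `day`, `rain_mm`) are hypothetical; a real API documents its own URL, parameters and response format.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/forecast?city=Leeds"

def fetch_json(url: str, timeout: float = 10.0):
    """Request a URL and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)

def to_rows(payload: dict) -> list:
    """Flatten an assumed {"forecast": [{"day": ..., "rain_mm": ...}]} payload."""
    return [(item["day"], float(item["rain_mm"])) for item in payload["forecast"]]

# Offline example using a sample payload in the assumed shape.
sample = json.loads(
    '{"forecast": [{"day": "Mon", "rain_mm": 2.4}, {"day": "Tue", "rain_mm": 0}]}'
)
rows = to_rows(sample)
print(rows)  # [('Mon', 2.4), ('Tue', 0.0)]

# With a real endpoint: rows = to_rows(fetch_json(API_URL))
```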
Data Examination
• Data Properties
o Data types
o Size
o Condition
▪ Missing values
▪ Erroneous values
▪ Inconsistencies
▪ Duplicate records
▪ Out-of-date values
▪ Uncommon system characters or line breaks
▪ Leading or trailing spaces
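Several of the condition problems listed above can be counted automatically before any charting starts. This minimal sketch covers only missing values, leading or trailing spaces, uncommon characters such as line breaks, and exact duplicate records. Erroneous, inconsistent or out-of-date values usually need domain knowledge and are not checked. The function name `condition_report` and the sample records are hypothetical.

```python
from collections import Counter

def condition_report(records, fields):
    """Count common condition problems in a list of dict records."""
    report = {"missing": 0, "untrimmed": 0, "control_chars": 0, "duplicates": 0}
    for record in records:
        for field in fields:
            value = record.get(field)
            # Missing: absent, None, or an empty/blank string.
            if value is None or (isinstance(value, str) and not value.strip()):
                report["missing"] += 1
                continue
            if isinstance(value, str):
                # Leading or trailing spaces (or other whitespace).
                if value != value.strip():
                    report["untrimmed"] += 1
                # Line breaks, tabs and other non-printable characters.
                if not value.isprintable():
                    report["control_chars"] += 1
    # Duplicates: identical records beyond the first copy.
    counts = Counter(tuple(sorted(r.items())) for r in records)
    report["duplicates"] = sum(n - 1 for n in counts.values() if n > 1)
    return report

# Hypothetical records with deliberate problems.
records = [
    {"city": "Leeds", "rain_mm": 12.5},
    {"city": " York", "rain_mm": 8.0},
    {"city": "Hull\n", "rain_mm": None},
    {"city": "Leeds", "rain_mm": 12.5},
]
report = condition_report(records, ["city", "rain_mm"])
print(report)
# {'missing': 1, 'untrimmed': 2, 'control_chars': 1, 'duplicates': 1}
```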
Data Examination cont.
• How to Approach This?
o Inspect and scan
o Data operations
o Statistical methods
▪ Frequency counts
▪ Frequency distribution
▪ Measures of central tendency (mean, median, mode)
▪ Measures of spread: maximum, minimum and range; percentiles; standard deviation
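The statistical methods above map directly onto Python's standard `statistics` and `collections` modules. In this sketch the ages are hypothetical survey values and the 10-year bin width is an arbitrary choice. `statistics.quantiles` (Python 3.8+) returns the 25th, 50th and 75th percentiles when `n=4`, and `statistics.stdev` is the sample standard deviation (dividing by n − 1).

```python
import statistics
from collections import Counter

def frequency_distribution(values, width):
    """Group values into equal-width bins keyed by each bin's lower edge."""
    return Counter((v // width) * width for v in values)

def describe(values):
    """Central tendency and spread for a list of numbers."""
    low, high = min(values), max(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": low,
        "max": high,
        "range": high - low,
        # Quartiles: the 25th, 50th and 75th percentiles.
        "quartiles": statistics.quantiles(values, n=4),
        # Sample standard deviation.
        "stdev": statistics.stdev(values),
    }

# Hypothetical ages of ten survey participants (illustrative only).
ages = [23, 35, 35, 41, 29, 35, 52, 47, 29, 38]

counts = Counter(ages)                   # frequency counts per value
bins = frequency_distribution(ages, 10)  # frequency distribution by decade
summary = describe(ages)

print(counts.most_common(2))  # [(35, 3), (29, 2)]
print(sorted(bins.items()))   # [(20, 3), (30, 4), (40, 2), (50, 1)]
print(summary)
```

Comparing the mean (36.4) with the median (35.0) is a quick first check for skew before choosing a chart type.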
Influence on Process
• Moving forward
o Purpose map ‘tone’
o Editorial angles
o Physical properties influence scale
Data Transformation
• Potential Activities
o Transform to clean
o Transform to convert
o Transform to create
o Transform to consolidate
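The four transformation activities can be chained as small steps. This sketch is one possible reading of each activity, not a prescribed method. Cleaning means trimming text, standardising case and dropping duplicates. Converting means turning text into integers and dates. Creating means deriving an age variable. Consolidating means joining in a second dataset. All field names, records and the `regions` lookup are hypothetical.

```python
from datetime import date

# Hypothetical raw survey rows (all values arrive as text).
raw = [
    {"id": "1", "city": " leeds ", "born": "1990-05-01"},
    {"id": "2", "city": "York", "born": "1985-11-20"},
    {"id": "2", "city": "York", "born": "1985-11-20"},  # duplicate record
]
# Hypothetical second dataset used for consolidation.
regions = {"Leeds": "Yorkshire", "York": "Yorkshire"}

def clean(rows):
    """Transform to clean: trim spaces, standardise case, drop exact duplicates."""
    seen, out = set(), []
    for row in rows:
        tidy = {k: v.strip() for k, v in row.items()}
        tidy["city"] = tidy["city"].title()
        key = tuple(sorted(tidy.items()))
        if key not in seen:
            seen.add(key)
            out.append(tidy)
    return out

def convert(rows):
    """Transform to convert: turn text into proper types (int, date)."""
    return [{**r, "id": int(r["id"]), "born": date.fromisoformat(r["born"])} for r in rows]

def create(rows, on):
    """Transform to create: derive age in whole years on a given date."""
    out = []
    for r in rows:
        b = r["born"]
        # Subtract one if the birthday has not yet occurred that year.
        age = on.year - b.year - ((on.month, on.day) < (b.month, b.day))
        out.append({**r, "age": age})
    return out

def consolidate(rows, lookup):
    """Transform to consolidate: join in data from another source."""
    return [{**r, "region": lookup.get(r["city"])} for r in rows]

result = consolidate(create(convert(clean(raw)), on=date(2016, 3, 12)), regions)
for row in result:
    print(row)
```

Keeping each activity as a separate function makes it easier to rerun the pipeline when the raw data is refreshed.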
Data Exploration
• Exploratory Data Analysis
o Instinct of the analyst
o Reasoning
▪ Deductive
▪ Inductive
o Chart types
o Research
o Statistical methods
o Nothings: finding no notable pattern can itself be a finding
o Exploration is not always needed