Simple and Cross Tabulation
Abu Bashar
Data exploration
- Graphical plots of the data: to get a first overview of the main characteristics of the data-set, especially the distribution of the original variables across the whole sample and for subsamples
- Univariate descriptive statistics and one-way tabulation: to synthesize the main characteristics of each of the variables in the data-set
- Multivariate descriptive statistics and cross-tabulation: to get a first understanding of the relationships existing between different variables, enabling the joint examination of two or more variables
Graphs
- Univariate plots of qualitative or discrete data
- Univariate plots of quantitative data
- Bivariate and multivariate plots of quantitative data
- Bivariate and multivariate plots of quantitative versus qualitative data
Univariate qualitative or discrete data
Bar chart: Number of sampled households by household size (x-axis: Household size; y-axis: Count)
Line chart: Sampled households by size (x-axis: Household size; y-axis: Count)
Pie chart: Sampled households by household size (pies show counts; household sizes 1, 2, 3, 4, 5, 6 and 9; slice counts n=177, n=149, n=70, n=67, n=24, n=12, n=1)
Univariate continuous data (1)
Histogram: Number of sampled households by household income (x-axis: Anonymised hhold inc + allowances; y-axis: Count)
Error bar chart: Household income by quartile (x-axis: Quartile, with groups Low income, Medium-low income, Medium-high income, High income; y-axis: Anonymised hhold inc + allowances; error bars show mean +/- 1.0 SD)
Box-Whiskers Diagram: Weekly household income plus allowances (marks from top to bottom: Maximum, Upper quartile, Median, Lower quartile, Minimum)
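The five values marked on the box-whiskers diagram, and the mean +/- 1 SD limits of the error bar chart, can be sketched with the Python standard library. The income figures below are hypothetical, and the "inclusive" quartile convention is an assumption: statistical packages differ in how they interpolate quartiles, so the cut points may not match SPSS output exactly.

```python
from statistics import mean, quantiles, stdev


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    # method="inclusive" assumes the sample contains the extreme values of
    # the population; other conventions give slightly different quartiles.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical weekly household incomes, for illustration only.
incomes = [120, 250, 310, 400, 480, 560, 700, 950, 1400]

summary = five_number_summary(incomes)
print("min, Q1, median, Q3, max:", summary)

# Error-bar limits as on the error bar chart: mean +/- 1 SD.
centre, spread = mean(incomes), stdev(incomes)
print("mean - SD, mean + SD:", centre - spread, centre + spread)
```

The box spans the lower to the upper quartile, the line inside it is the median, and the whiskers reach the minimum and maximum.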
Univariate continuous data (2)
Pareto charts
- Bars ordered in decreasing order of the frequencies they represent
- The line indicates the cumulative proportion
- Useful for quality control (ANALYZE/QUALITY CONTROL in SPSS)
Pareto Chart: Total Revenues by Income Quartile (x-axis: Anonymised hhold inc + allowances (Banded); left axis: Anonymised hhold inc + allowances; right axis: Percent; bar values in decreasing order: High income 170054.19, Medium-high income 78867.66, Medium-low income 45640.06, Low income 23488.32)
Q-Q plots
- Compare the empirical (observed) data distribution and some theoretical distribution
- When the observed distribution is close to the theoretical one, the plotted values tend to lie on a straight line
Normal Q-Q Plot of EFS: Total Food & non-alcoholic beverage (x-axis: Observed Value; y-axis: Expected Normal Value)
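As a rough illustration of how the points of a normal Q-Q plot are obtained, the sketch below pairs each sorted observation with the value expected under a normal distribution fitted to the sample. The data are hypothetical, and the plotting position (rank - 0.375) / (n + 0.25), known as Blom's formula, is one common choice assumed here; other positions are also in use.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Pair each observed value with its expected normal value."""
    ordered = sorted(values)
    n = len(ordered)
    # Normal distribution with the sample mean and standard deviation.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    points = []
    for rank, observed in enumerate(ordered, start=1):
        # Blom's plotting position (an assumed convention).
        p = (rank - 0.375) / (n + 0.25)
        points.append((observed, fitted.inv_cdf(p)))
    return points


# Hypothetical weekly food expenditures, for illustration only.
sample = [12.0, 25.5, 31.0, 38.2, 44.9, 52.3, 61.7, 75.0, 110.4]

qq = normal_qq_points(sample)
for observed, expected in qq:
    print(f"observed {observed:7.1f}   expected normal {expected:7.1f}")
```

If the observed and expected columns rise together at a roughly constant rate, the points fall near a straight line and the normal distribution is a reasonable description of the data.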
Bivariate and multivariate plots
Clustered Bar Chart: Average household expenditure for selected categories by income range (x-axis: Anonymised hhold inc + allowances (Banded); y-axis: Mean; series: EFS: Total Food & non-alcoholic beverage, EFS: Total Clothing and Footwear, EFS: Total Recreation, EFS: Total Restaurants and Hotels)
Scatterplot: Beer and sausage expenditure (axes: Sausages; Beer and lager (brought home))
Multi-variable Line Chart: Mean Weekly Household Expenditure by Category with Confidence Intervals (y-axis: Value; one point per EFS expenditure category)
Multi-variable Pie Chart: Household expenditure by category (slices show the percentage share of each EFS expenditure category; cases weighted by Annual weight)
EFS expenditure categories shown in these charts: Food & non-alcoholic beverage; Alcoholic Beverages, Tobacco; Clothing and Footwear; Housing, Water, Electricity; Furnishings, HH Equipment, Carpets; Health expenditure; Transport costs; Communication; Recreation; Education; Restaurants and Hotels
Bivariate and multivariate plots
Pareto Chart: Total Weekly Expenditure for Selected Categories (left axis: Count; right axis: Percent; bar values in decreasing order: EFS: Total Food & non-alcoholic beverage 22,725, EFS: Total Restaurants and Hotels 18,110, EFS: Total Alcoholic Beverages, Tobacco 6,335)
Stacked Pareto Chart: Soft Drink and Fruit Juice Consumption (x-axis: Number of children; left axis: Count; right axis: Percent; series: Soft drinks, Fruit juices, Cumulative)
Clustered Bar Chart: Alcohol expenditure away from home, means by income quartile (x-axis: Anonymised hhold inc + allowances (Banded); y-axis: Mean; series: Wine from grape or other fruit (away from home), Ciders and Perry (away from home), Beer and lager (away from home))
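The construction behind a Pareto chart (bars in decreasing order plus a cumulative percentage line) can be sketched as follows. The three totals are the bar labels read from the chart of total weekly expenditure for selected categories; the function name `pareto_table` is a hypothetical helper, not part of any package.

```python
def pareto_table(totals):
    """Return (label, value, cumulative percent) rows, largest value first."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(value for _, value in ordered)
    rows = []
    running = 0.0
    for label, value in ordered:
        running += value
        rows.append((label, value, 100.0 * running / grand_total))
    return rows


# Bar values as read from the Pareto chart of total weekly expenditure.
weekly_totals = {
    "Alcoholic Beverages, Tobacco": 6335,
    "Food & non-alcoholic beverage": 22725,
    "Restaurants and Hotels": 18110,
}

rows = pareto_table(weekly_totals)
for label, value, cumulative in rows:
    print(f"{label:32s} {value:8,d} {cumulative:6.1f}%")
```

The bars use the second column and the line uses the third, which always ends at 100%.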
Frequency table
Frequency Table for variable q1 in the Trust dataset
How many people do you regularly buy food for home consumption (including yourself)?

Response category          Count       %
1 - Extremely unlikely        91    18.3
2                            176    35.4
3                            100    20.1
4 - Neither                   94    18.9
5                             21     4.2
6                             13     2.6
7 - Extremely likely           2     0.4
Total                        497   100.0
Missing values                 3
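A one-way frequency table of this kind can be rebuilt from the category counts. The sketch below uses the counts shown for q1 and computes each percentage on the valid (non-missing) responses, which is how the percentages in the table appear to be defined; the cumulative column is an addition for illustration.

```python
def frequency_table(counts):
    """Return (label, count, percent, cumulative percent) per category."""
    valid_total = sum(counts.values())
    rows = []
    running = 0
    for label, count in counts.items():
        running += count
        rows.append((label,
                     count,
                     round(100.0 * count / valid_total, 1),
                     round(100.0 * running / valid_total, 1)))
    return rows


# Counts taken from the q1 frequency table; 3 further cases are missing.
q1_counts = {
    "1 - Extremely unlikely": 91,
    "2": 176,
    "3": 100,
    "4 - Neither": 94,
    "5": 21,
    "6": 13,
    "7 - Extremely likely": 2,
}
missing = 3

table = frequency_table(q1_counts)
for label, count, pct, cum_pct in table:
    print(f"{label:24s} {count:5d} {pct:6.1f} {cum_pct:6.1f}")
print("Valid:", sum(q1_counts.values()), "Missing:", missing)
```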
Descriptive statistics
Variables:
(a) In a typical week how much fresh or frozen chicken do you buy for your household consumption (Kg.)?
(b) In a typical week how much do you spend on fresh or frozen chicken (Euro)?
(c) Age

Statistic                    (a)        (b)        (c)
N Valid                      446        443        500
N Missing                     54         57          0
Mean                      1.0582     5.6677     45.582
Std. Error of Mean        .06843     .19640      .7100
Median                     .9100     5.0000     45.000
Mode                        1.00       3.00       45.0
Std. Deviation           1.44514    4.13383    15.8763
Variance                   2.088     17.089    252.055
Minimum                      .00        .00       18.0
Maximum                    25.03      30.00       87.0
Percentile 25              .5000     3.0000     32.000
Percentile 50              .9100     5.0000     45.000
Percentile 75             1.3600     7.5000     57.000
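Two rows of the descriptive statistics table follow directly from others: the variance is the square of the standard deviation, and the standard error of the mean is the standard deviation divided by the square root of the number of valid cases. The sketch below checks this with the figures reported in the table; the short variable labels are shorthand introduced here.

```python
from math import sqrt


def standard_error(sd, n_valid):
    """Standard error of the mean from the standard deviation and valid N."""
    return sd / sqrt(n_valid)


# (valid N, standard deviation) as reported in the table.
reported = {
    "Chicken bought (Kg.)": (446, 1.44514),
    "Chicken spend (Euro)": (443, 4.13383),
    "Age": (500, 15.8763),
}

derived = {}
for name, (n_valid, sd) in reported.items():
    derived[name] = (standard_error(sd, n_valid), sd ** 2)
    se, variance = derived[name]
    print(f"{name:22s} SE of mean = {se:.5f}   variance = {variance:.3f}")
```

Small differences in the last digit can appear because the reported standard deviations are themselves rounded.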
Cross-tabulation
Food & non-alcoholic beverage (Binned) * Anonymised hhold inc + allowances (Banded) Crosstabulation

Columns: Anonymised hhold inc + allowances (Banded)

Food & non-alcoholic                 Low       Medium-low   Medium-high   High
beverage (Binned)                    income    income       income        income    Total
20 or less        Count                 47         19           18            4        88
                  % of Total          9.4%       3.8%         3.6%          .8%     17.6%
From 20 to 40     Count                 57         48           24           22       151
                  % of Total         11.4%       9.6%         4.8%         4.4%     30.2%
From 40 to 60     Count                 17         31           45           40       133
                  % of Total          3.4%       6.2%         9.0%         8.0%     26.6%
More than 60      Count                  4         27           38           59       128
                  % of Total           .8%       5.4%         7.6%        11.8%     25.6%
Total             Count                125        125          125          125       500
                  % of Total         25.0%      25.0%        25.0%        25.0%    100.0%
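The margins and percentages of a cross-tabulation follow from the cell counts alone. The sketch below uses the counts of the food expenditure by income table to recompute the row totals, column totals and percentages of total; the column percentages at the end are an addition for illustration and are not shown in the original table.

```python
food_bands = ["20 or less", "From 20 to 40", "From 40 to 60", "More than 60"]
income_bands = ["Low income", "Medium-low income",
                "Medium-high income", "High income"]

# Cell counts: one row per food expenditure band, one column per income band.
counts = [
    [47, 19, 18, 4],
    [57, 48, 24, 22],
    [17, 31, 45, 40],
    [4, 27, 38, 59],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Percentage of the whole sample in each cell ("% of Total").
pct_of_total = [[round(100.0 * cell / n, 1) for cell in row] for row in counts]

# Percentage within each income band (column percentages).
col_pct = [[round(100.0 * cell / col_totals[j], 1)
            for j, cell in enumerate(row)] for row in counts]

for band, row, total in zip(food_bands, pct_of_total, row_totals):
    print(f"{band:14s} {row} total count {total}")
```

Column percentages make the pattern easier to read: 37.6% of low income households fall in the lowest food expenditure band, against 3.2% of high income households.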
3-variables frequency table
Children, income and age of HRP (Table %)

Cells are defined by Number of children (Banded): No children, One child, Two children, More than two children. Non-empty cells are listed in column order for each row.

Anonymised hhold inc +   Age of HRP anonymised
allowances (Banded)      (Binned)                Non-empty cells (Table %)     Total
Low income               Less than 30 years      1.4%   .8%   .2%               2.4%
                         From 30 to 55 years     2.6%  1.0%   .4%   .2%         4.2%
                         More than 55 years     18.4%                          18.4%
Medium-low income        Less than 30 years      1.0%   .2%   .2%               1.4%
                         From 30 to 55 years     4.8%  2.2%  2.6%  1.6%        11.2%
                         More than 55 years     12.0%   .4%                    12.4%
Medium-high income       Less than 30 years       .6%   .4%   .8%   .2%         2.0%
                         From 30 to 55 years     7.6%  2.6%  3.8%  2.4%        16.4%
                         More than 55 years      6.4%   .2%                     6.6%
High income              Less than 30 years      1.6%   .4%                     2.0%
                         From 30 to 55 years     9.2%  2.6%  4.6%  1.6%        18.0%
                         More than 55 years      4.8%   .2%                     5.0%
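A three-variable frequency table counts the cases in every combination of three categorical variables and expresses each count as a percentage of the whole table. The sketch below shows the idea on a handful of hypothetical household records; the records are invented for illustration and do not come from the survey.

```python
from collections import Counter

# Hypothetical records: (income band, age of HRP band, number of children).
records = [
    ("Low income", "More than 55 years", "No children"),
    ("Low income", "More than 55 years", "No children"),
    ("Low income", "From 30 to 55 years", "One child"),
    ("High income", "From 30 to 55 years", "No children"),
    ("High income", "From 30 to 55 years", "Two children"),
]

# One counter key per combination of the three variables.
cell_counts = Counter(records)
n = len(records)

# Table %: each cell as a share of all cases in the table.
table_pct = {cell: 100.0 * count / n for cell, count in cell_counts.items()}

for (income, age, children), pct in sorted(table_pct.items()):
    print(f"{income:12s} {age:20s} {children:13s} {pct:5.1f}%")
```

Combinations that never occur simply have no entry, which corresponds to the empty cells of the table.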
Quantitative by categorical
Mean (Standard Deviation) by Age of HRP - anonymised (Binned)

                              Less than 30 years   From 30 to 55 years   More than 55 years
Books                         .589 (1.59)          2.701 (8.26)          1.112 (3.78)
Ice cream                     .000 (.00)           .067 (.39)            .010 (.11)
Internet subscription fees    .263 (.96)           .396 (1.80)           .139 (.84)
Cinemas                       1.240 (5.78)         .644 (3.37)           .107 (.84)
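A quantitative by categorical table reports a summary of a numeric variable within each category of another variable. The sketch below computes the mean and sample standard deviation per age band for hypothetical expenditure values; the figures are invented for illustration, and `summarise_by_group` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_group(records):
    """Return {group: (mean, sample standard deviation)}."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    # stdev needs at least two values per group.
    return {group: (mean(values), stdev(values))
            for group, values in groups.items()}


# Hypothetical (age band, weekly expenditure on books) pairs.
book_spend = [
    ("Less than 30 years", 0.0),
    ("Less than 30 years", 2.0),
    ("From 30 to 55 years", 1.0),
    ("From 30 to 55 years", 5.0),
    ("More than 55 years", 0.0),
    ("More than 55 years", 3.0),
]

summary_by_age = summarise_by_group(book_spend)
for band, (avg, sd) in summary_by_age.items():
    print(f"{band:20s} mean {avg:.3f} (SD {sd:.2f})")
```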