6107BEUG- Engineering Research Project
6205CIV- Research Project
Lecture 8
Survey & Interview and Data Analysis
In this session….
• We will discuss what survey research is and how to
conduct an interview.
Survey Research
Introduction- What is a survey?
• A system for collecting information
• Purpose is to produce quantitative or numerical
descriptions, trends and patterns about some aspects of
a particular study population
• The main way of collecting survey information is by
administering a questionnaire
Survey Design
• Taking a view on the entire survey process is critical to
the success of the research project- “total survey design”
• The procedures used to conduct a survey have a major
effect on the accuracy of the resulting data
Sampling
• A census survey is when we gather information about
everyone in a target population
• A sample survey is when we select a small subset of a
population representative of the whole population
• The ideal sampling method gives all members of your
target population the same chance of being
selected to complete your survey
Key Sampling Decisions
The Sample Frame
• A carefully selected group that represents a target
population who have the chance to be selected to
participate in the survey
• Comprehensiveness- How completely does it cover the target
population?
• Accessibility- How likely is it that you can obtain details of
the desired sample frame to enable you to conduct a
sample survey?
Non- probability sampling
• Inclusion in the sample is deliberate: participants
are not randomly selected from a sample frame
• If you choose this option- you must evaluate the positives
and negatives associated with a non-probability sample,
including rationale for your chosen sample
Probability sampling
The probability of inclusion is determined by chance, usually by
randomly selecting participants from a sampling frame
• Simple random sampling
• Systematic sampling
• Stratified sampling
Simple Random Sampling
• Assigning a number to each participant in sample frame
and randomising the sample frame list.
• Example:
You have a sample frame (population) of 1000 and you
want to randomly sample 100 participants. Obtain a list of
all participants and assign each participant in the sample
frame with a number. Randomise the sample frame so the
list is sorted into a randomised order. This can be achieved
using Microsoft Excel. Select the first 100 participants in
the randomised list.
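The example above can also be carried out in a few lines of Python instead of Excel. This is a minimal sketch: the function name `simple_random_sample`, the numbered frame and the `seed` argument are illustrative assumptions, not part of the lecture.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Return sample_size members drawn from frame without replacement."""
    rng = random.Random(seed)       # seed only makes the draw repeatable
    shuffled = list(frame)          # copy so the original frame is untouched
    rng.shuffle(shuffled)           # randomise the order of the sample frame
    return shuffled[:sample_size]   # keep the first sample_size entries

# Hypothetical sample frame: participants numbered 1 to 1000
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 100, seed=42)
print(len(sample))
```

Every participant has the same chance of ending up in the first 100 positions, which is the property the slide asks for.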
Systematic Sampling
• Work out a fraction based on desired sample size and the
total sample frame
Example:
You have a sample frame (population) of 1000 and you
want to randomly sample 100 participants. Obtain a list
of all participants and assign each participant a number.
Work out a fraction based on the sample frame and desired
sample size (100/1000). Hence, you would select 1 out of
every 10 persons within the sample frame
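A minimal Python sketch of the same procedure follows. The random starting point within the first interval is an assumption added here (the slide only specifies the 1-in-10 fraction), and the function name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every k-th member of frame, where k = len(frame) // sample_size."""
    interval = len(frame) // sample_size   # 1000 // 100 = 10
    rng = random.Random(seed)
    start = rng.randrange(interval)        # assumed: random start in the first interval
    return [frame[start + i * interval] for i in range(sample_size)]

# Hypothetical sample frame: participants numbered 1 to 1000
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=1)
print(sample[:5])
```

Each selected participant is exactly 10 places after the previous one in the list.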
Stratified sampling
• Useful if you want a proportionate representation of key
variables (e.g. gender, age, etc.)
• Example
Your sample frame has 400 participants and your desired sample size
is 100. However, 300 are male and 100 are female. Males therefore
represent 75% of the population and females 25%. Firstly, divide the
sample frame into two separate sub-sample frames (300 and 100).
Then, work out the sample size needed for each, based on the
representative percentages. Hence we would want our sample size to
consist of 75 males (75/100) and 25 females (25/100) to ensure that
the percentage of the sample subgroups are representative of the
percentages of the sample frame (population). You can then use either
a simple random sample or systematic sample method within each
sub-sample frame to randomly select the participants that make up this 75/25 split.
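The proportional allocation in this example can be sketched in Python as below. The participant labels (`M0`, `F0`, ...) and the function name `stratified_sample` are hypothetical; simple random sampling is used within each sub-sample frame, which is one of the two options the slide allows.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Draw a proportionate simple random sample from each stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Quota is the stratum's share of the frame applied to the sample size
        quota = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, quota)
    return sample

# Hypothetical sample frame of 400: 300 males and 100 females
strata = {
    "male": [f"M{i}" for i in range(300)],
    "female": [f"F{i}" for i in range(100)],
}
sample = stratified_sample(strata, 100, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```

With other group sizes the rounded quotas may not add up exactly to the desired sample size, so the totals should be checked before use.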
Saturation sampling
• Saturation sampling is an attempt to conduct a population
census (i.e. give everyone in the sample frame the
chance to complete the survey).
• Very common now for online surveys- able to overcome
traditional barriers of survey implementation
• Ideal if you have access to every member of the target population
(e.g. email address list)
• Non-response error can, however, be much higher
Sample size
• How big should my sample be?
• Common misconception that it should be a fraction of the
sample frame
• The size of the target population has virtually no impact
on how well a sample of a given size is likely to describe
the population
• A sample of 150 will describe a population of 15000 or 15
million with virtually the same accuracy
Sample size
• Useful to calculate the margin of error you are prepared to
accept from your sample
• Allows you to state the level of confidence in your
sample, usually using a 95% confidence level
• A margin of error of around +/- 5% is usually acceptable
for many national polls and surveys
• Generally the margin of error decreases quite significantly up
to 150-200 responses. After that point, further reductions
in the margin of error are much smaller
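To see how the margin of error changes with the number of responses, the sketch below uses the standard normal-approximation formula for a proportion, z * sqrt(p(1 - p)/n). The choice of p = 0.5 (the worst case) and z = 1.96 (95% confidence) are assumptions, as is the function name `margin_of_error`; the lecture does not state which formula it uses.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion p from n responses (large population)."""
    return z * math.sqrt(p * (1 - p) / n)

# Print the margin of error, in percent, for a range of sample sizes
for n in (25, 50, 100, 150, 200, 400, 1000):
    print(n, round(100 * margin_of_error(n), 1))
```

Note that the population size does not appear in this formula at all, and that quadrupling the sample size only halves the margin of error.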
Sample size- Example
• Example of margin of error at a 95% confidence
level:
From a sample of 100 collected from a total sample
frame of 200, a margin of error of +/- 7% was
calculated
This means that if 50% of respondents answer “yes” to a
yes/no question, we can be 95% certain that the views of
the total population answering “yes” (including those who
did not participate in the survey) will lie between 43% and
57%
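The slide does not say how the 7% figure was calculated. One formula that reproduces it is the normal approximation with a finite population correction, sketched below; treating that as the method used is an assumption, as is the function name `margin_of_error_fpc`.

```python
import math

def margin_of_error_fpc(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion, corrected for a small sample frame."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: shrinks the error when n is a large
    # share of the sample frame
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

moe = margin_of_error_fpc(n=100, population=200)
low, high = 0.5 - moe, 0.5 + moe
print(f"margin of error {moe:.1%}, interval {low:.0%} to {high:.0%}")
```

For 100 responses from a frame of 200 this gives a margin of roughly 7%, hence the 43% to 57% range around a 50% "yes" result.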
How many surveys should I distribute?
• Rule of 25%- Assume a response rate of about 25%. Therefore for every 1 questionnaire
you need returned you should send out 4 questionnaires
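As a quick arithmetic check, the rule can be written as a one-line calculation. The target of 150 returned questionnaires is an illustrative figure, and `surveys_to_send` is a hypothetical helper name.

```python
import math

def surveys_to_send(target_returns, response_rate=0.25):
    """Number of questionnaires to distribute for a desired number of returns."""
    return math.ceil(target_returns / response_rate)

print(surveys_to_send(150))  # 600
```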
Question design
• Survey research takes a reductionist approach using
questions as measures
• Ensuring appropriate wording and structuring of
questions will increase the effectiveness of the resulting
data
• Making sure questions are well understood and answers
are meaningful
Type of questions
• Closed questions – a list of acceptable responses is
given to the respondent, reducing/limiting choice of
answer
• Open questions- acceptable responses are not provided
to the respondent, giving freedom of choice to their
answer
Closed questions
Respondent answers more reliable when response
alternatives are given
Researcher can perform more reliably in interpreting
meaning of answers as they are pre-planned
Makes findings more analytically interesting as more
people will have answered particular responses
Open questions take time to quantify, input and analyse
Levels of measurement
Refers to how the categories of a question relate to each other
There are three main levels of measurement: nominal, ordinal and interval/ratio
Levels of measurement
• Nominal:
• Used to distinguish between categories of a variable but
cannot rank categories in any order
• E.g. Country of birth, sex, ethnicity
• Interval/ ratio data:
• Used when categories can be naturally ranked and
quantified
• E.g. age (24 or under; 25-34; 35-44; 45-54; 55-64; 65 and over)
• Ordinal:
• Used when it is appropriate to order/rank categories
along a single dimension.
• Likert (1932) had a major impact on introducing scaling
techniques to measure questions- introduced the “Likert
scale”
• Very common in survey research to use a 3 or 5 point
Likert Scale, e.g.:
Levels of satisfaction of services – “Very satisfied”, “Fairly satisfied”,
“Neither”, “Fairly dissatisfied”, “Very dissatisfied”
Creating Questions
• Come up with as many questions as possible; the list will
be cut down later
• Make sure no questions are repeated
• It may be worthwhile having negatively worded questions
to see if participants respond to these consistently or not
• Ask others for their opinion on your questions- do they
understand what you are asking them?
Reliability and accuracy- Relevance
• What am I trying to find out and does this question help?
• Who is my intended audience and is this question
relevant?
• Are my questions meaningful and understandable to all of
my sample frame?
• Is the information returned useful or is it just “nice to
know”?
• The crucial test of this may be to think “Can I act on, or
do anything with, the information returned?”
Reliability and accuracy- Importance
• Should a question be mandatory or optional?
• If it is a mandatory question, can everyone answer it?
• You might need to add an option such as:
“ Not Applicable”, “Don’t Know”
...but use these with caution!
Reliability and accuracy- Readability
• Are the possible responses to your questions consistent?
• Will it confuse the respondent?
• E.g. don’t mix possible responses such as:
• “Very good”, “good”, “not very helpful”, “Not at all helpful”
Reliability and accuracy- Ethics
• Have I phrased my questions appropriately? Are the terms I
have used politically correct?
• Have I ensured that my possible responses are equally
balanced e.g. using 5-point scale?
Ethics
• If questions are of a sensitive nature then you will need to
gain ethical approval from the University’s Research
Ethics Committee
• For further information, please refer to the University’s
ethics pages where all the codes of practice are available
• There is also lots of useful information to help you
formulate your questions accordingly
• Please attend Ethics Testing in CANVAS
Data Collection Methods
• There are various types of data collection modes in order
to undertake survey research:
• Mail
• Telephone
• Internet
• Email
• Face-to-face
Online surveys
• Speed: can be sent to many people from a selected
distribution list and posted on web page
• Economy: there is usually a cost to buy software, but it is free at the
university. Economical if targeting a large and wide
population
• Added content options: potential to add graphics such as
images/video clips
• Expanded question types: provide a wide variety of
question types to help you when designing
• Anonymity is preserved: there is no email address linked
to a web response unless you ask for it
• Minimise data inputting: accepted directly into a database
avoiding the need for subsequent data-entry as with
traditional methods
• Minimise data validation: real-time analysis means that
invalid responses can be easily monitored and captured
• Sampling: potential to use saturation sampling, although
be cautious about the level of non-response
Online survey- disadvantages
• Limited population: user must be able to access the
internet to complete the questionnaire
• Abandonment of survey: respondents can quit before
finishing
• Dependence on software: requires researchers to use
software to create and deploy questionnaires
Interviewing
• The collection of non-numerical data
• Can refer to an inductive approach, where theory is
essentially generated through research
• Questions are open allowing interviewees to provide their
own answers that are not restricted to specific choices
Interviewing process
Thematising
• Process of bringing attention to the subject area
• Formulating research questions and clarifying the theory
of the theme investigated
Face to face interviews
• Often most effective mode of interview inquiry
• Create an interpersonal situation where trust is
established and disclosure becomes possible
• Using a Dictaphone can improve the accuracy and
legibility of the data
Research Methodology
Quantitative Analysis
Research Type
Quantitative
Research that produces Continuous
Numerical Data
Qualitative
Research that produces Non-Numerical
Data
Research Type
• Quantitative- Numerical Data- Generated by
Experimental Research
Laboratory based.
Field Work Based.
Numerical Meta-Analysis Research
• Qualitative- Non-Numerical Data-Generated by
Surveys
Case Studies
Non-Numerical Meta-Analysis
Observational Research
Quantitative Analysis
Examines relationships among variables
Variable- a quantity that can be measured and can take different
values.
Analysis is conducted using statistical procedures
Levels of Quantitative Analysis
Univariate (One-dimensional)
Can be presented by a Histogram (frequency plot)
Bivariate (Two-dimensional)
Presented by a scatter plot of the dependent
and independent variables.
Multivariate (Multi-dimensional)
A scatter plot showing all the variables is
produced.
Can take the form of a 3D plot.
Levels of Quantitative Analysis
Univariate (One-dimensional)
[Figure: univariate plot of voltage]
Bivariate (Two-dimensional)
[Figure: scatter plots of voltage (V) against time (T). A straight line indicates a dependent relationship; a +ve slope shows an increasing relationship and a -ve slope a decreasing relationship.]
Multivariate (Multi-dimensional)
[Figure: scatter plot of voltage (V), temperature and time (T)]
The effects of a third variable (temperature) on
the dependent (voltage) and independent (time)
variables.
Bivariate (Two-dimensional)
The strength of relationships in a scatter diagram can be
measured using a Correlation Coefficient (r, often reported as its square, R²)
[Figure: scatter plots showing a strong positive correlation and a zero or very weak correlation]
Correlation Coefficient
The magnitude of the coefficient is between 0 (No
Relationship) and 1 (Perfect Relationship)
– this shows the strength of the relationship
The closer the magnitude is to 1, the stronger
the relationship; the closer it is to
0, the weaker the relationship
The sign of the coefficient (positive or
negative) shows the direction of the
relationship
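A minimal Python sketch of the Pearson correlation coefficient r is given below to make strength and direction concrete. The voltage and time readings are made-up illustrative values, and `pearson_r` is a hand-written helper rather than a library function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings: time in seconds, voltage in volts
time_s = [0, 1, 2, 3, 4, 5]
voltage_up = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]     # increasing relationship
voltage_down = [6.0, 5.1, 3.9, 3.1, 2.0, 0.9]   # decreasing relationship

r_up = pearson_r(time_s, voltage_up)
r_down = pearson_r(time_s, voltage_down)
print(round(r_up, 3), round(r_down, 3))
print(round(r_up ** 2, 3))   # R squared is never negative, so it loses the direction
```

Both values have a magnitude close to 1 (strong relationship); only the sign differs.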
Types of Trendlines
Linear
Exponential
Logarithmic
Power
etc
Data Distribution
Normally Distributed Data-
Parametric
Non-Normally Distributed Data-
Non-Parametric
Parametric Data-
Normally Distributed
Correlation
Pearson correlation coefficient
Calculates the linear correlation coefficient for
each pair of variables
Produces a P-value: P<0.05 indicates a statistically
significant correlation between the
variables.
Parametric Data-
Normally Distributed
ANOVA
One-way & two-way analysis of variance (F-test)
One-way: compares the means of a number of group samples (three or more)
for differences.
Two-way: compares groups classified by two different factors.
P<0.05 indicates a significant difference between at least two of the
group means.
Parametric Data-
Normally Distributed
Chi-square test
Tests whether there is an association between variable categories
Used for the evaluation of unpaired groups (unrelated
samples).
Uses contingency tables.
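The chi-square statistic for a contingency table can be computed by hand as sketched below. The 2x2 counts are invented for illustration (for example, two company types against holding or not holding a safety certificate), `chi_square` is a hypothetical helper, and no continuity correction is applied.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows are two groups, columns are yes / no
table = [[30, 20], [15, 35]]
statistic, dof = chi_square(table)
print(round(statistic, 2), dof)
```

For 1 degree of freedom the critical value at the 0.05 level is 3.841, so a statistic above that corresponds to P<0.05. Statistical packages such as Minitab or SPSS report the P-value directly.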
Non-Parametric Data-
Non-Normally Distributed
Correlation
Spearman Rank correlation coefficient
Calculates the rank (monotonic) correlation coefficient for each
pair of variables
Produces a P-value: P<0.05 indicates a statistically
significant correlation between the
variables.
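Spearman's coefficient is the Pearson coefficient applied to the ranks of the data rather than the raw values. The sketch below shows this with hand-written helpers (`average_ranks`, `pearson_r`, `spearman_rho` are illustrative names) and made-up data that rise steadily but not in a straight line.

```python
import math

def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always increases with x, but not linearly
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)
print(round(rho, 3), round(pearson_r(x, y), 3))
```

Because the ranks agree perfectly, the Spearman coefficient is 1 even though the Pearson coefficient on the raw values is below 1.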
Non-Parametric Data-
Non-Normally Distributed
Mann-Whitney U Test
Compares the distributions (rank sums) of two independent sample groups.
P<0.05 indicates a significant difference between the two groups.
Non-parametric equivalent to the unpaired t-test
Wilcoxon Signed-Rank test
Non-parametric equivalent to paired t-test.
P<0.05 indicates a significant difference between paired
measurements on the same samples.
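To show what the Mann-Whitney test actually counts, the sketch below computes the U statistic directly from pairwise comparisons. The two groups are made-up measurements and `mann_whitney_u` is an illustrative helper; it returns only the statistic, not a P-value.

```python
def mann_whitney_u(group_a, group_b):
    """Return the Mann-Whitney U statistic (the smaller of U_a and U_b)."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0      # a outranks b
            elif a == b:
                u_a += 0.5      # ties count as half
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)

# Hypothetical measurements from two independent groups
a = [12, 15, 11, 18, 14]
b = [22, 25, 19, 24, 21]
u = mann_whitney_u(a, b)
print(u)
```

A small U means one group's values sit almost entirely below the other's. The statistic is then compared with a tabulated critical value, or a package such as Minitab or SPSS is used to obtain the P-value.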
Non-Parametric Data-
Non-Normally Distributed
Kruskal-Wallis Test
Compares the distributions (mean ranks) of a number of group samples (three or more) for
differences
P<0.05 indicates a significant difference between at least two of the
groups.
Non-parametric equivalent to one-way ANOVA
Used in place of the Mann-Whitney U Test when there are more than two groups
Non-Parametric Data-
Non-Normally Distributed
Chi-square test- as with parametric data
Test whether there is association between variable categories
Used for the evaluation of Un-paired groups (unrelated samples).
Uses contingency tables.
Fisher’s Exact test
Alternative to Chi-square test for 2x2 contingency tables and small
sample size.
Statistical Software
Minitab
User friendly - based on easy drop down menus
Data are entered in worksheet and can be copied from Excel.
Available on LJMU AppPlayer
SPSS
Popular statistical software.
Also based on drop down Menus and copying data from Excel.
Research Methodology
Qualitative Analysis
Qualitative Research
Research that produces Non-Numerical Data-
Non-Numerical Data-Collected by
Surveys
Case Studies
Non-Numerical Meta-Analysis
Observational Research
Qualitative Approaches
Traditionally engineering and scientific research has relied
on Quantitative (experimental) Research.
In recent years Qualitative Research methods have been
increasingly recognised and applied in engineering and
science to-
Advance the understanding of basic
causes, principles, and behaviours.
Qualitative Approaches-continued
Three categories of approach to the analysis of Qualitative
Data.
Language based- focuses on how language is used and its
meaning.
Example- conversation analysis.
Descriptive or Interpretive- develops view of the participants and
subjects investigated.
Theory building- seeks to develop theory from the data collected
during study.
Analysis of Data
The researcher needs to establish categories,
groups, and relationships between them from
the data collected.
This can be achieved using-
Cluster analysis
Divides data into groups (clusters)
based on similarity
Qualitative Analysis
Types of Categorical Data
Nominal
Ordinal
Interval
Ratio
Qualitative Analysis - Types of Data
Nominal data
Variables that use labelling, without any value
Example-
- Do you have site safety certification?
□ Yes □ No □ Do not know
- What materials do you mainly use?
□ Timber
□ Steel
□ Concrete
Qualitative Analysis - Types of Data
Ordinal data
With ordinal variables, it is the order of the values
that is important.
The difference between values is not known.
The sequence makes sense in one order, or in
exactly the opposite order.
Qualitative Analysis - Types of Data
Example-
How do you rate energy saving progress?
Category:   Excellent  Good  Moderate  Poor  Very Bad
Responses:  29         243   117       86    25
Can summarize numerically by giving scores to the categories
Category:   Excellent  Good  Moderate  Poor  Very Bad
Score:      5          4     3         2     1
- In each case a score of 4 is better than 3 or 2, but we don’t know, and
cannot quantify, how much better it is.
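The scored responses from this example can be summarised with the median and mode, which only rely on the order of the categories. This is a minimal sketch using Python's `statistics` module; the variable names are illustrative.

```python
from statistics import median_low, mode

scores = {"Excellent": 5, "Good": 4, "Moderate": 3, "Poor": 2, "Very Bad": 1}
counts = {"Excellent": 29, "Good": 243, "Moderate": 117, "Poor": 86, "Very Bad": 25}

# Expand the counts into one score per respondent
responses = [scores[cat] for cat, n in counts.items() for _ in range(n)]

label = {score: cat for cat, score in scores.items()}
median_category = label[median_low(responses)]   # middle response when ranked
modal_category = label[mode(responses)]          # most frequent response
print(len(responses), median_category, modal_category)
```

`median_low` is used so that the result is always one of the actual scores rather than an average of two neighbouring scores.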
Qualitative Analysis - Types of Data
Interval data
Interval data are numeric values in which we
know both the order and the exact differences
between the values.
Example-
Temperature- the difference between each value is the same. The
difference between 60 and 50 degrees is a measurable 10 degrees,
as is the difference between 80 and 70 degrees.
Time- is an interval scale in which the intervals are known, consistent,
and measurable.
Qualitative Analysis - Types of Data
Ratio data
Numeric values in which the order and the exact
differences between the values are known
(interval) and have an absolute zero.
Example-
- Measurements of Height
- Measurements of Weight.
Qualitative Statistical
Analysis
Descriptive statistics
mean
The mean (average) is the most popular statistic.
It is found by adding the values for all the (non-
missing) cases and dividing by the number of (non-
missing) cases. Take care: the mean is sensitive to
extreme high or low values.
Example-
Five people take a test. Their scores are
60, 62, 65, 68, and 95
The mean is 70
Qualitative Statistical
Analysis
Descriptive statistics
median
The median provides a measure of central
tendency - half the sample will be above it and
half the sample will be below it.
Example-
Five people take a test. Their scores are
60, 62, 65, 68, and 95
The median is 65
Qualitative Statistical
Analysis
Descriptive statistics
mode
Is the most common value or score- the one
that occurs most frequently.
It is possible to have more than one mode.
Example-
The following set of data has two modes: 12
and 16.
12 12 12 13 14 15 15 16 16 16 17 18
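The three worked examples above (mean, median and mode) can be checked with Python's built-in `statistics` module. This is only a quick verification sketch; `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

test_scores = [60, 62, 65, 68, 95]
print(mean(test_scores))     # 70
print(median(test_scores))   # 65

data = [12, 12, 12, 13, 14, 15, 15, 16, 16, 16, 17, 18]
print(multimode(data))       # [12, 16]
```

The single high score of 95 pulls the mean up to 70, while the median stays at 65.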
Qualitative Statistical Analysis
Descriptive statistics application
Nominal data
mode
Ordinal data
median and mode
Interval data
mean, median and mode
Ratio data
mean, median and mode
Qualitative Data Collection
Survey Research
Survey is a system for collecting
information.
Its purpose is to produce numerical
descriptions, trends and patterns about
some aspects of a particular study
population.
Generally information is collected
through a sample of the population.
Qualitative Data Collection
Survey Conducted using
Questionnaire - a predefined series of questions is used to
collect information
Paper questionnaire
Postal delivery
Handouts
Online (web-based) questionnaire
Interview – researcher completes survey based on what
the respondent says
Interview in person.
Interview by phone.
Qualitative Data Collection
Closed-Ended Questions-
–Provide a list of predetermined responses
from which to choose an answer.
–The list of responses should cover all
possible responses and their meanings should
not overlap.
Open-Ended Questions-
–Survey respondents are asked to answer
each question in their own words.
–Responses are usually categorized into a
smaller list for statistical analysis.
Sampling
A census survey is when information is gathered
about everyone in a target population
A sample survey is when we select a sample
of a population representative of the whole
population
Ideal sampling method is to allow all members
of the target population to have the same
chance of being selected to complete the
survey
Sampling
Random samples
Each member of the population has an equal chance of
being selected.
Selected members are excluded from further re-
selection.
Non-random samples- obtained by
Systematic sampling
Stratified sampling
Cluster sampling
Convenience sampling
Snowball sampling
Sampling
Non-random samples
Systematic sampling
Every xth member of the population is sampled
x is the interval and is kept constant.
The interval can be determined by
population size/sample size (e.g. 1000/100 gives every 10th member)
Stratified sampling
Used when population occurs in distinct groups- example type
of company or construction
Sample is divided between the different groups
Sampling
Non-random samples
Cluster sampling
The population is divided into clusters (groups)
The clusters are selected randomly
The sample is represented by a Cluster
Each cluster can represent the population
Convenience sampling
Data collected from a sample that can be accessed
readily and conveniently
Used when there is no obvious way to draw a sample from the population
Snowball sampling
Researcher collects data from a small source and
asks for further sources to build up a sample
Sample size
The larger the sample size the more
representative of the population
Sample size can be limited by the number of
respondents.
Useful to calculate the margin of error you are prepared
to accept from the sample.
Can aim for a confidence level of 95%
means 95 out of 100 samples will have the true population value
within range of precision.
Sampling error is the level of precision- the range
in which the true value of the population is
estimated to be.
Sample size
Example
With a sample size of 100 and a calculated
margin of error of +/- 7% at the 95% confidence level
Then if 50% of respondents answer “yes” to a
yes/no question, one can be 95% certain that
the views of the total population answering
“yes” (including those who did not participate
in the survey) will lie between 43% and 57%
Survey Guidelines
Introduction
Explain the reason for the survey
Instructions
Information to the participants on how to complete
the survey.
Statements
Clear wording and structure of questions will
increase the effectiveness of the resulting data
Should be short statements without lengthy
explanations
Survey Guidelines-
continued
Should not have multiple themes
Respondents need to provide a single response
Statements should not impose the researcher's
views on respondents
Next Lecture
• During next lecture we will discuss Dissertation Writing.