An Introduction To Research Methods
Research is a logical and systematic search for new and useful information on a particular
topic. The words how and what essentially summarize what research is about. It is an
investigation aimed at finding solutions to scientific and social problems through objective and
systematic analysis. It is a search for knowledge, that is, a discovery of hidden truths. Here
knowledge means information about matters. The information might be collected from
different sources such as experience, human beings, books, journals, nature, etc. Research can
lead to new contributions to existing knowledge. Only through research is it possible to
make progress in a field. Research is indeed civilization and determines the economic, social
and political development of a nation. The results of scientific research very often force a
change in the philosophical view of problems which extend far beyond the restricted domain
of science itself.
(1) to get a research degree (Doctor of Philosophy (Ph.D.)) along with its benefits like
better employment, promotion, increment in salary, etc.
(2) to get a research degree and then to get a teaching position in a college or university
or become a scientist in a research institution
(3) to get a research position in countries like U.S.A., Canada, Germany, England, Japan,
Australia, etc. and settle there
(4) to solve the unsolved and challenging problems
(5) to get joy of doing some creative work
(6) to acquire respectability
(7) to get recognition
(8) curiosity to find out the unknown facts of an event
(9) curiosity to find new things
(10) to serve the society by solving social problems.
Some students undertake research without any aim, possibly because they cannot
think of anything else to do. Such students can also become good researchers by motivating
themselves toward a respectable goal. As Prof. Rajesh Kasturirangan (NIAS, IISc) has pointed
out, even if you work in a company or run one, a mind inclined towards research
does better than a mind not trained for it, much as in the story of the hare and the
tortoise. If you have a mind trained for research, you will be the tortoise: the climb will be
slow and steady, but eventually you will win the race.
C. Importance of Research
Research is important in both scientific and non-scientific fields. In our life new problems,
events, phenomena and processes occur every day. Practical, implementable solutions and
suggestions are required for tackling the new problems that arise. Scientists have to undertake
research on them and find their causes, solutions, explanations and applications. In short,
research helps us to understand nature and natural phenomena.
needs and desires of the people and on the availability of revenues to meet the needs helps a
government to prepare a budget.
(5) It is important in industry and business for higher gain and productivity and to improve
the quality of products.
(6) Mathematical and logical research on business and industry optimizes the problems in
them.
(7) It leads to the identification and characterization of new materials, new living things, new
stars, etc.
(8) Only through research can inventions be made; for example, novel phenomena
and processes such as superconductivity and cloning have been discovered only through
research.
(9) Social research helps find answers to social problems. It explains social phenomena
and seeks solutions to social problems.
(10) Research leads to a new style of life and makes it delightful and glorious.
Emphasizing the importance of research, Louis Pasteur said: “I beseech you to take interest in
these sacred domains called laboratories. Ask that there be more and that they be adorned for
these are the temples of the future, wealth and well-being. It is here that humanity will learn
to read progress and individual harmony in the works of nature, while humanity’s own works
are all too often those of barbarism, fanaticism and destruction.” (Louis Pasteur, article by S.
Mahanti, Dream 2047, p.29–34 (May 2003)). In order to know what it means to do research
one may read scientific autobiographies and biographies such as Richard Feynman’s Surely
You’re Joking, Mr. Feynman!, Jim Watson’s The Double Helix and Science as a way of life – A
biography of C.N.R. Rao by Mohan Sundararajan.
TYPES OF RESEARCH
A. Basic Research
B. Applied Research
In applied research one solves certain problems employing well known and accepted
theories and principles. Most experimental research, case studies and inter-disciplinary
research are essentially applied research. Applied research is helpful for basic research.
Research whose outcome has immediate application is also termed applied research.
Such research is of practical use to current activity. For example, research on social
problems has immediate use. Applied research is concerned with real-life problems such as
increasing the efficiency of a machine, increasing the gain factor of production of a
material, pollution control, preparing a vaccine for a disease, etc. Obviously, these have
immediate potential applications. Some of the differences between basic and applied research
are summarized in the table below.
Thus, the central aim of applied research is to find a solution for a practical problem which
warrants solution for immediate use, whereas basic research is directed towards finding
information that has a broad base of applications and thus adds new information to the already
existing scientific knowledge.
Research Methods
Research methods are split broadly into quantitative and qualitative methods.
Qualitative research seeks to answer questions about why and how people behave in the
way that they do. It provides in-depth information about human behaviour.
* Taken from: Aliaga and Gunderson, ‘Interactive Statistics’, 3rd Edition (2005)
Quantitative Research
Quantitative research is perhaps the simpler to define and identify.
The data produced are always numerical, and they are analysed using mathematical and
statistical methods. If there are no numbers involved, then it’s not quantitative research.
Some phenomena obviously lend themselves to quantitative analysis because they are already
available as numbers. Examples include changes in achievement at various stages of
education, or the increase in number of senior managers holding management degrees.
However, even phenomena that are not obviously numerical in nature can be examined using
quantitative methods.
If you wish to carry out statistical analysis of the opinions of a group of people about a
particular issue or element of their lives, you can ask them to express their relative agreement
with statements and answer on a five- or seven-point scale, where 1 is strongly disagree, 2 is
disagree, 3 is neutral, 4 is agree and 5 is strongly agree (the seven-point scale also has slightly
agree/disagree).
Such scales are called Likert scales, and enable statements of opinion to be directly translated
into numerical data.
The development of Likert scales and similar techniques means that most phenomena can be
studied using quantitative techniques.
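To make the translation from opinion to number concrete, here is a minimal Python sketch of coding five-point Likert answers as described above. The answers themselves and the names LIKERT_5 and encode_responses are hypothetical, invented purely for illustration.

```python
from collections import Counter
from statistics import mean, median

# Five-point scale as described in the text:
# 1 = strongly disagree ... 5 = strongly agree.
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def encode_responses(responses):
    """Translate text answers into their numeric Likert codes."""
    return [LIKERT_5[r.strip().lower()] for r in responses]


# Hypothetical answers to a single statement (illustration only).
answers = ["Agree", "Strongly agree", "Neutral", "Agree", "Disagree"]
scores = encode_responses(answers)

print(scores)           # numeric codes, in the order the answers were given
print(Counter(scores))  # how many respondents chose each point on the scale
print(median(scores))   # a summary that only relies on the order of the points
print(mean(scores))     # often reported, but assumes the points are evenly spaced
```

Whether the mean is a meaningful summary of such codes is a judgment call, since it treats the gap between ‘agree’ and ‘strongly agree’ as equal to every other gap on the scale.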
This is particularly useful if you are in an environment where numbers are highly valued and
numerical data is considered the ‘gold standard’.
However, it is important to note that quantitative methods are not necessarily the most
suitable methods for investigation. They are unlikely to be very helpful when you want to
understand the detailed reasons for particular behaviour in depth. It is also possible that
assigning numbers to fairly abstract constructs such as personal opinions risks making them
spuriously precise.
• Surveys, whether conducted online, by phone or in person. These rely on the same
questions being asked in the same way to a large number of people;
• Observations, which may either involve counting the number of times that a
particular phenomenon occurs, such as how often a particular word is used in
interviews, or coding observational data to translate it into numbers; and
• Secondary data, such as company accounts.
Qualitative Research
Qualitative research is any research that does not involve numbers or numerical data.
It often involves words or language, but may also use pictures or photographs and
observations.
Almost any phenomenon can be examined in a qualitative way, and it is often the preferred
method of investigation in the UK and the rest of Europe; US studies tend to use quantitative
methods, although this distinction is by no means absolute.
Qualitative analysis results in rich data that gives an in-depth picture and it is particularly
useful for exploring how and why things have happened.
• If respondents do not see a value for them in the research, they may provide
inaccurate or false information. They may also say what they think the researcher
wishes to hear. Qualitative researchers therefore need to take the time to build
relationships with their research subjects and always be aware of this potential.
• Although ethics are an issue for any type of research, there may be particular
difficulties with qualitative research because the researcher may be party to
confidential information. It is important always to bear in mind that you must do no
harm to your research subjects.
• It is generally harder for qualitative researchers to remain apart from their
work. By the nature of their study, they are involved with people. It is therefore
helpful to develop habits of reflecting on your part in the work and how this may
affect the research.
Although qualitative data is much more general than quantitative, there are still a number of
common techniques for gathering it. These include:
• Secondary data, including diaries, written accounts of past events, and company
reports; and
• Observations, which may be on site, or under ‘laboratory conditions’, for example,
where participants are asked to role-play a situation to show what they might do.
There are a variety of ways to select your sample, and to make sure that it gives you results
that will be reliable and credible.
Ideally, research would collect information from every single member of the population that
you are studying. However, most of the time that would take too long and so you have to
select a suitable sample: a subset of the population.
The idea behind selecting a sample is to be able to generalise your findings to the whole
population, which means that your sample must be both representative of the population and
large enough to give precise results.
If your sample is not representative, you can introduce bias into the study. If it is not large
enough, the study will be imprecise.
However, if you get the relationship between sample and population right, then you can draw
strong conclusions about the nature of the population.
How large should your sample be? It depends on how precise you want the answer to be. Larger
samples generally give more precise answers.
Your desired sample size depends on what you are measuring and the size of the error that
you’re prepared to accept.
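As a rough illustration of how sample size follows from the error you are prepared to accept, the sketch below uses one common textbook formula for estimating a proportion, n = z² p(1 − p) / e². It is not taken from the source and rests on several assumptions: a simple random sample, a large population, and a worst-case expected proportion of 0.5. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95,
                               expected_proportion=0.5):
    """Approximate sample size needed to estimate a proportion.

    Assumes simple random sampling from a large population.
    """
    # z is the standard normal value that leaves (1 - confidence) in the two tails.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_proportion
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n)


# Halving the acceptable error roughly quadruples the required sample.
for error in (0.10, 0.05, 0.03):
    print(error, sample_size_for_proportion(error))
```

Other kinds of measurement (means, differences between groups) need different formulas, which is one reason to ask a statistician when in doubt.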
If you’re not very confident about this kind of thing, then the best way to deal with it is to
find a friendly statistician and ask for some help. Most of them will be delighted to help you
make sense of their specialty.
It is better to be imprecisely right than precisely wrong. Imprecisely right means that you
know broadly what the correct answer is. Precisely wrong means that you think you know the
answer, but you don’t. In other words, if you can only worry about one, worry about bias.
Selecting a Sample
Probability sampling is where the probability of each person or thing being part of the sample
is known. Non-probability sampling is where it is not.
Probability Sampling
Probability sampling methods allow the researcher to be precise about the relationship
between the sample and the population.
This means that you can be confident about whether your sample is representative
or not, and you can also put a number on how certain you are about your findings (this
number is called the significance).
In simple random sampling, every member of the
population has an equal chance of being chosen. The drawback is that the sample may not be
genuinely representative. Small but important sub-sections of the population may not be
included.
Proportional stratified random sampling takes the same proportion from each stratum, but
again suffers from the disadvantage that rare groups will be badly represented. Non-
proportional stratified sampling therefore takes a larger sample from the smaller strata, to
ensure that there is a large enough sample from each stratum.
Systematic random sampling relies on having a list of the population, which should ideally
be randomly ordered. The researcher then takes every nth name from the list.
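The three probability sampling methods described above can be sketched in a few lines of Python. This is only an illustration: the population, the strata and the function names are hypothetical, and the fixed seed is there solely to make the example repeatable.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member of the population has an equal chance of being chosen."""
    return rng.sample(population, k)


def proportional_stratified_sample(strata, fraction, rng):
    """Take the same fraction from each stratum.

    As a simplification, at least one member is taken from every stratum.
    """
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def systematic_sample(ordered_list, step, rng):
    """Take every step-th name from the list, starting at a random offset."""
    start = rng.randrange(step)
    return ordered_list[start::step]


rng = random.Random(42)  # fixed seed so the illustration is repeatable
population = [f"person_{i}" for i in range(100)]
strata = {"majority": population[:90], "minority": population[90:]}

srs = simple_random_sample(population, 10, rng)
stratified = proportional_stratified_sample(strata, 0.1, rng)
systematic = systematic_sample(population, 10, rng)

print(srs)
print(stratified)
print(systematic)
```

With a 10% fraction the minority stratum contributes a single person, which is exactly the weakness of proportional sampling noted above; a non-proportional design would pass a larger fraction for the small stratum.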
There are many different methods of selecting ‘random samples’. If you are the lead
researcher for a project and instructing others to ‘take a random sample’, or indeed asked to
take a ‘random sample’, make sure you are all using the same method!
in urban areas in the Cameroon, you could randomly select just two or three cities, and then
sample fully from within these. It is, of course, possible to combine all these in several stages,
which is often done for large-scale studies.
Non-Probability Sampling
Using non-probability sampling methods, it is not possible to state the probability of
any particular member of the population being sampled. Although this does not make the
sample ‘bad’, researchers using such samples cannot be as confident in drawing conclusions
about the whole population.
Convenience sampling selects a sample on the basis of how easy it is to access. Such
samples are extremely easy to organise, but there is no way to guarantee whether they are
representative.
Quota sampling divides the population into categories, and then selects from within
categories until a sample of the chosen size is obtained within that category. Some market
research is this type, which is why researchers often ask for your age: they are checking
whether you will help them meet their quotas for particular age groups.
Purposive sampling is where the researcher only approaches people who meet certain
criteria, and then checks whether they meet other criteria. Again, market researchers out and
about with clipboards often use this approach: for example, if they are looking to examine the
shopping habits of men aged between 20 and 40, they would only approach men, and then
ask their age.
Snowball sampling is where the researcher starts with one person who meets their criteria,
and then uses that person to identify others. This works well when your sample has very
specific criteria: for example, if you want to talk to workers with a particular set of
responsibilities, you might approach one person with that set, and ask them to introduce you
to others.
Of course, if you already have an established relationship, this won’t be a problem, but this is
often not the case in research, and five minutes at the start of the interview can pay dividends
later on in the quality of the data that you obtain.
In qualitative research, with semi-structured interviews, the way that you ask the questions is
much less likely to lead to bias than in straightforward surveys. However, there is still a
danger of bias if you are tempted consciously or subconsciously to impose your frame of
reference onto your interviewee. To avoid this:
• Try to use open questions whenever possible as they are least likely to bias answers
(see our pages Questioning Skills and Techniques and Types of Questions for more
information); and
• Use techniques of reflection and other clarification techniques to ensure that you
have fully understood your interviewee’s meaning, and also to prompt them to say
more.
Interviewers may, however, need to probe deeper into a subject and, for this, specific
questioning techniques can be useful.
Types of Probe
• The basic probe is repeating the initial question, which reminds the interviewee what
you asked. This is useful if they have wandered off the subject.
• Explanatory probes are questions like ‘What did you mean by that?’ and ‘What makes
you say that?’ and are useful for exploring meaning further.
• Focused probes include questions like ‘What sort of…?’
• The silent probe is where the interviewer simply remains silent and waits for the
interviewee to say more.
• Drawing out is useful when the interviewee seems to have stopped mid-sentence or
mid-idea. Repeat the last few words that they said with an upward inflexion, like a
question, or add ‘Tell me more about that’.
• Giving ideas or suggestions would use questions like ‘Have you thought about x?’ or
‘Have you tried…?’
Laddering
This is a powerful technique, but you need to be aware that people may be uncomfortable
with this until you have developed at least a superficial rapport and relationship. You can also
‘ladder’ in the opposite direction, where you get more specific until you reach examples, by
asking questions like ‘Can you give me a specific example of that?’ or ‘When was the last
time that you remember something like that happening?’.
Perhaps most crucially, interviewers need to have very good listening skills. They need to
listen to what their interviewee is saying, and also be aware of what they are not saying, but
without imposing their own views.
Finally…
Interviews are a very good way of gathering rich qualitative information from a limited
number of people.
While you need enough data to make the research worthwhile, don’t try to interview too
many people. The quality lies in the depth of exploration, not necessarily in the breadth of
views.
Focus groups are widely used in market research and in politics, but perhaps less often in
research situations. This may be because researchers lack the necessary skills to make them
work, but also because they are not sure when such techniques would be most useful.
• When you are short of time and you need to gather views from a group of people, not
just one or two individuals;
• To review a process or event and gather different opinions about it and how to
improve it in future;
• When opinions are not likely to be sensitive, and the subject is one that can be freely
discussed in a group without embarrassment or concerns;
• When you know that there is a range of views;
• When you want to gather reactions from several people to an event, especially as it
happens. This is often used in politics, for example to find out what people think
about a party conference speech, party political broadcast, or debate, where events are
often televised live.
Secondly, consider the venue and timing. It sounds obvious, but if you want to chat to
working people, you may want to hold the focus group at lunchtime. Nobody’s going to want
to stay late after work to talk to you. However, if you want to talk to mums of young children,
evening might be best, when their partners are home and the baby is in bed. Alternatively, if
you need to hold the event during the day, you might need to consider providing a crèche.
Consideration of these aspects avoids eliminating whole sections of your sample population.
You also need to think about the location: is it accessible by public transport? If you’re
holding an evening event, will people feel safe walking there? And getting home? Will you
pay for travel costs? Will you provide a meal, light refreshments, or nothing at all? Ideally,
you want them to feel comfortable and relaxed, but will you achieve this best with the group
sitting round a table, like a formal business meeting, or in comfortable chairs, as if they were
chatting to friends?
All these will affect the ‘feel’ of the focus group and therefore the participants’ willingness to
contribute. The important thing to remember is that you can’t necessarily predict how. You
just need to be aware of it and take what steps you can to avoid any problems. However, if
any issues do occur, you’ll have to react to them on the day.
Finally, consider how you are going to record the event. Will you take notes? Will the
group record views on a flip chart? Or will you video/tape the whole event and review it
later? You will need informed consent from the group for whichever method you choose.
Like semi-structured interviews, focus groups will need a broad structure, including some
starting questions. If you wish to explore several different areas, make sure that you manage
the discussion to cover them all. This means that you may need to move the conversation on
from an area of interest to the participants to one that is more interesting to you, but without
alienating anyone. You’ll also want to have room for the discussion to expand into areas that
develop in the course of the event.
Getting Help...
If you’re anxious about running a focus group, you may find it helpful to talk to your
supervisor or your sponsor to discuss whether you can draft in some expert help as there are
plenty of consultancies who can provide this service at reasonable prices.
Possible Problems
There are a number of criticisms that can be made of focus groups. These include:
• Concerns that people may not feel comfortable airing their views in public. This
may be made worse both if they do not know the other participants and,
paradoxically, if they do. For example, if one participant is senior to another in the
same organisation, the more junior person may feel unable to express different views,
but greater trust between participants usually leads to more openness.
• In public, people often express the views that they feel that they ought to have,
rather than their real views. This is known as social pressure, and may mean that the
group’s real views are more or less extreme than the views expressed.
However, these concerns can be overcome by good facilitation at the event, including careful
design and outline of the topics to be covered, together with triangulation of your research
using other techniques and research methods.
A Concluding Thought
Focus groups are not ideal for every situation or every piece of research, and there are serious
questions to address in designing one. However, there is no question that, if used well, focus
groups are a strong tool for researchers to explore diverse views, especially if time is too tight
to allow sufficient in-depth one-to-one interviews to be held. They are well worth exploring if
you have the necessary facilitation skills, or access to support to run such an event.
Sometimes you may wish to use one single method, whether quantitative or qualitative, and
sometimes you may want to use several, whether all one type or a mixture. It is your research
and only you can decide which methods will suit both your research questions and your
For example, you may have collected data from or in written texts, or through in-depth
interviews or transcripts of meetings. According to Easterby-Smith, Thorpe and Jackson, in
their book Management Research, there are six main systems of analysis for language-
based data, which may also be used for other types of data.
1. Content Analysis
Here, you start with some ideas about hypotheses or themes that might emerge, and
look for them in the data that you have collected. You might, for example, use a colour-
coding or numbering system to identify text about the different themes, grouping together
ideas and gathering evidence about views on each theme.
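A very simple form of this coding can be automated. The sketch below groups passages under themes that were fixed in advance, using keyword matching. The themes, keywords and passages are all hypothetical, and crude substring matching is an assumption made for brevity: real coding usually relies on the researcher's reading of each passage.

```python
from collections import defaultdict

# Hypothetical themes and keywords, decided before looking at the data.
THEMES = {
    "workload": ["overtime", "deadline", "busy"],
    "management": ["manager", "boss", "supervisor"],
}


def code_passages(passages, themes):
    """Group passages under every theme whose keywords they mention."""
    coded = defaultdict(list)
    for passage in passages:
        text = passage.lower()
        for theme, keywords in themes.items():
            # Crude substring match; a passage may fall under several themes.
            if any(word in text for word in keywords):
                coded[theme].append(passage)
    return dict(coded)


# Hypothetical interview extracts (illustration only).
passages = [
    "My manager keeps moving the deadline.",
    "We are always busy in December.",
    "The canteen food is good.",
]
coded = code_passages(passages, THEMES)

for theme, evidence in coded.items():
    print(theme, len(evidence), evidence)
```

Passages that match no theme, like the third one here, are worth reviewing by hand, since they may point to themes that were not anticipated.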
2. Grounded Analysis
This is similar to content analysis, in that it uses similar techniques for coding. However, in
grounded analysis, you do not start from a defined point. Instead, you allow the data to
‘speak for itself’, with themes emerging from the discussions and conversations. In
practice, this may be much harder to achieve because it requires you to put aside what you
have read and simply concentrate on the data.
These first two approaches are not really as distinct as the description might lead you to
believe.
Instead, the pure approaches lie at opposite ends of a spectrum. For example, a pure content
analysis approach would have fixed themes. However, if more information emerges from the
data that does not fit with the pre-identified themes, you may want to update and adapt your
themes in the course of the research. This approach is moving towards a hybrid approach, and
perhaps a more pragmatic approach than either pure system.
3. Social Network Analysis
This form of analysis examines the links between individuals as a way of understanding
what motivates behaviour.
It has been used, for example, as a way of understanding why some people are more
successful at work than others, and why some children were more likely to run away from
home. This type of analysis may be most useful in combination with other methods, for
example after some kind of content or grounded analysis to identify common themes about
relationships. It’s often helpful to use a visual approach to this kind of analysis to generate a
network diagram showing the relationships between members of a network.
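Before drawing a network diagram, it can help to tabulate who is linked to whom. The following sketch builds such a table and counts each person's direct contacts. The names and links are hypothetical, and treating every link as two-way is an assumption that will not suit every study.

```python
from collections import defaultdict

# Hypothetical "who talks to whom" links (illustration only).
links = [("Ana", "Ben"), ("Ana", "Chen"), ("Ben", "Chen"), ("Dee", "Ana")]


def build_network(pairs):
    """Build a two-way contact map from pairs of linked individuals."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


network = build_network(links)

# Degree = number of direct contacts each person has.
degree = {person: len(contacts) for person, contacts in network.items()}
most_connected = max(degree, key=degree.get)

for person in sorted(degree, key=degree.get, reverse=True):
    print(person, degree[person], sorted(network[person]))
```

The contact map is the raw material for a diagram: each person becomes a point and each link a line between two points.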
4. Discourse Analysis
This approach not only analyses conversation, but also takes into account the social
context in which the conversation occurs, including previous conversations, power
relationships and the concept of individual identity. It may also include analysis of written
sources, such as emails or letters, and body language to give a rich source of data surrounding
the actual words used.
5. Narrative Analysis
This looks at the way in which stories are told within an organisation or society to try to
understand more about the way in which people think and are organised within groups.
• bureaucratic, which is highly structured and logical, and often about imposing
control;
• quest, where the ambition is to have the most compelling story and lead others to
success;
• chaos, where the story is lived, rather than told; and
• postmodern, which is rather like chaos narratives, in that it is lived, but where the
‘narrator’ is aware of the story and what they are trying to achieve.
6. Conversation Analysis
This is largely used in ethnographic research. It assumes that conversations are all
governed by rules and patterns which remain the same whoever is talking. It also
assumes that what is said can only be understood by looking at what went before and after.
Conversation analysis requires a detailed examination of the data, including exactly which
words are used, in what order, whether speakers overlap their speech, and where the
emphasis is placed. There are therefore detailed conventions used in transcribing for
conversation analysis.
Like content and grounded analysis, discourse, narrative and conversation analysis can be
considered as on a spectrum of systems for analysing forms of language. Which you use will
depend on what you want to achieve from the analysis.
This page provides a brief summary of some of the most common techniques for
summarising your data, and explains when you would use each one.
The starting point is usually to group the raw data into categories, and/or to visualise it. For
example, if you think you may be interested in differences by age, the first thing to do is
probably to group your data in age categories, perhaps ten- or five-year chunks.
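Grouping raw values into age bands is straightforward to do programmatically. The sketch below uses hypothetical ages and a hypothetical helper, age_band, and prints a rough text chart of the counts.

```python
from collections import Counter


def age_band(age, width=10):
    """Return a label such as '20-29' for the band that contains age."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical respondent ages (illustration only).
ages = [23, 27, 31, 38, 39, 44, 52, 58, 59]
counts = Counter(age_band(a) for a in ages)

# Sorting the labels as text works here because all ages have two digits.
for band in sorted(counts):
    print(f"{band}: {'#' * counts[band]}")
```

Passing width=5 gives five-year bands instead; the choice of width changes how much detail the summary keeps.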
One of the most common techniques used for summarising is using graphs, particularly bar
charts, which show every data point in order, or histograms, which are bar charts grouped
into broader categories.
An example is shown below, which uses three sets of data, grouped by four categories. This
might, for example, be men, women, and ‘no gender specified’, grouped by age categories
20–29, 30–39, 40–49 and 50–59.
An alternative to a histogram is a line chart, which plots each data point and joins them up
with a line. The same data as in the bar chart are displayed in a line graph below.
It is not hard to draw a histogram or a line graph by hand, as you may remember from school,
but spreadsheets will draw one quickly and easily once you have input the data into a table,
saving you any trouble. They will even walk you through the process.
The important thing about drawing a graph is that it gives you an immediate ‘picture’ of the
data. This is important because it shows you straight away whether your data are grouped
together, spread about, tending towards high or low values, or clustered around a central
point. It will also show you whether you have any ‘outliers’, that is, very high or very low
data values, which you may want to exclude from the analysis, or at least revisit to check that
they are correct.
It is always worth drawing a graph before you start any further analysis, just to have a look at
your data.
You can also display grouped data in a pie chart, such as this one.
Pie charts are best used when you are interested in the relative size of each group, and what
proportion of the total fits into each category, as they illustrate very clearly which groups are
bigger.
The average gives you information about the size of the effect of whatever you are testing, in
other words, whether it is large or small. There are three measures of average: mean, median
and mode.
See our page on Averages for more about calculating each one, and for a quick calculator.
When most people say average, they are talking about the mean. It has the advantage that it
uses all the data values obtained and can be used for further statistical analysis. However, it
can be skewed by ‘outliers’, values which are atypically large or small.
As a result, researchers sometimes use the median instead. This is the mid-point of all the
data. The median is not skewed by extreme values, but it is harder to use for further statistical
analysis.
The mode is the most common value in a data set. It cannot be used for further statistical
analysis.
The values of mean, median and mode are not the same, which is why it is really important to
be clear which ‘average’ you are talking about.
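A small worked example shows how far the three averages can diverge when a data set contains an outlier. The figures are hypothetical; the functions come from Python's standard statistics module.

```python
from statistics import mean, median, mode

# Hypothetical incomes in thousands; the last value is an outlier.
incomes = [21, 23, 23, 25, 27, 30, 180]

print("mean:  ", mean(incomes))    # pulled upwards by the outlier
print("median:", median(incomes))  # the middle value once the data are sorted
print("mode:  ", mode(incomes))    # the most common value
```

Here the mean is 47, the median 25 and the mode 23: only one respondent earns more than the mean, so quoting it alone as ‘the average’ would give a misleading picture.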
Researchers often want to look at the spread of the data, that is, how widely the data are
spread across the whole possible measurement scale.
There are three measures which are often used for this:
The range is the difference between the largest and smallest values. Researchers often quote
the interquartile range, which is the range of the middle half of the data, from 25%, the
lower quartile, up to 75%, the upper quartile, of the values (the median is the 50% value). To
find the quartiles, use the same procedure as for the median, but take the quarter- and three-
quarter-point instead of the mid-point.
The standard deviation measures the average spread around the mean, and therefore gives a
sense of the ‘typical’ distance from the mean.
The variance is the square of the standard deviation. They are calculated by:
To calculate the standard deviation, take the square root of the variance.
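The measures of spread described above can be computed as in the following sketch. Two assumptions are worth flagging: the variance here divides by n - 1 (the usual choice when the data are a sample rather than the whole population), and the quartiles use the default method of Python's statistics.quantiles, which is only one of several conventions and may differ slightly from a hand calculation. The data are hypothetical.

```python
import math
from statistics import mean, quantiles

# Hypothetical measurements (illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]


def sample_variance(values):
    """Average squared distance from the mean, dividing by n - 1 (assumed)."""
    m = mean(values)
    squared_diffs = [(x - m) ** 2 for x in values]
    return sum(squared_diffs) / (len(values) - 1)


value_range = max(data) - min(data)

# quantiles(..., n=4) returns the lower quartile, median and upper quartile.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

variance = sample_variance(data)
std_dev = math.sqrt(variance)  # standard deviation = square root of variance

print("range:", value_range)
print("interquartile range:", iqr)
print("variance:", variance)
print("standard deviation:", std_dev)
```

Squaring the differences stops values above and below the mean from cancelling each other out, which is why the variance comes first and the standard deviation is derived from it.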
Skew
The skew measures how symmetrical the data set is, or whether it has a longer tail of high
values or of low values. A sample with a tail of unusually low values is described as negatively
skewed and a sample with a tail of unusually high values as positively skewed.
Generally speaking, the more skewed the sample, the less the mean, median and mode will
coincide.
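Skew can also be expressed as a number. The sketch below uses one common definition, the moment coefficient of skewness (the average cubed deviation from the mean, divided by the standard deviation cubed). This particular formula is an assumption, not something the source specifies, and the data sets are hypothetical. Under this definition the result is positive when the longer tail lies towards high values, negative when it lies towards low values, and zero for perfectly symmetrical data.

```python
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness (population form, dividing by n)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - m) ** 3 for x in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5


# Hypothetical data sets (illustration only).
high_tail = [1, 2, 2, 3, 3, 3, 4, 20]   # one unusually high value
low_tail = [-x for x in high_tail]      # mirror image: one unusually low value
symmetric = [1, 2, 3, 4, 5]

print(skewness(high_tail))   # positive
print(skewness(low_tail))    # negative
print(skewness(symmetric))   # zero
```

Statistical packages often apply a small-sample correction to this figure, so their results may differ slightly from the value computed here.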
Once you have calculated some basic values of location, such as mean or median, spread,
such as range and variance, and established the level of skew, you can move to more
advanced statistical analysis, and start to look for patterns in the data.
This is in order to draw lessons from the sample that can be generalised to the wider
population.
Relationships vs differences
For example:
Comparing Groups
Your first step is to identify your two or more groups. This will obviously depend on your
research question or hypothesis.
So if your hypothesis was that men are more likely to like ice cream than women, your two
groups are men and women, and your data is likely to be something like self-expressed liking
for ice cream on a scale of 1 to 5, or perhaps the number of times that ice creams are
consumed each week in the summer months.
You then need to produce summary data for each group, usually mean and standard
deviation. These may or may not look quite similar.
In order to decide whether there is a genuine difference between the two groups, you have to
use a reference distribution against which to measure the values from the two groups.
The most common source of reference distributions is a standard distribution such as the
normal distribution or t- distribution. These two are the same except that the standard
deviation of the t-distribution is estimated from the sample, and that of the normal
distribution is known.
You then compare the summary data from the two groups and work out the probability of
achieving that difference by chance. The lower the probability, the less likely it is that the
difference is due to chance alone. This is referred to as statistical significance.
Types of Error
• The groups are different, and you conclude that they are different (correct result)
• The groups are different, but you conclude that they are not (Type II error)
• The groups are the same, but you conclude that they are different (Type I error)
• The groups are the same, and you conclude that they are the same (correct result).
Type I errors are generally considered more important than Type II, because they have the
potential to change the status quo.
For example, if you wrongly conclude that a new medical treatment is effective, doctors are
likely to move to providing that treatment. Patients may receive the treatment instead of an
alternative that could have fewer side effects, and pharmaceutical companies may stop
looking for an alternative treatment.
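A small simulation can make the idea of a Type I error concrete. The sketch below is illustrative only: it assumes two groups drawn from the same normal distribution with a known standard deviation of 1, tests them at the 5% level with a two-tailed critical value of 1.96, and counts how often they are wrongly declared different. The function name, sample sizes and seed are all arbitrary choices.

```python
import random
from statistics import fmean

def false_positive_rate(trials=2000, n=20, critical_z=1.96, seed=1):
    """Share of trials in which two identical groups are declared different."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    se_diff = (2 / n) ** 0.5   # both groups have a known standard deviation of 1
    false_positives = 0
    for _ in range(trials):
        group1 = [rng.gauss(0, 1) for _ in range(n)]
        group2 = [rng.gauss(0, 1) for _ in range(n)]
        z = (fmean(group1) - fmean(group2)) / se_diff
        if abs(z) >= critical_z:
            false_positives += 1  # Type I error: the groups are really the same
    return false_positives / trials

rate = false_positive_rate()
print(rate)  # expected to be in the region of 0.05
```

The rate should settle near the chosen significance level, which is why choosing a stricter level (1% rather than 5%) is the usual way to reduce Type I errors.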
Surveys and Survey Design explains that there are two types of answer scale, continuous and
categorical. Age, for example, is a continuous scale, although it can also be grouped into
categories.
• For a continuous scale, you can use the means of the two groups that you are
comparing.
• For a category scale, you need to use the medians.
If you are not very confident about the quality of the data collected, for example because the
inputting was done quickly and cheaply, or because the data have not been checked, then you
may prefer to use the median even if the data are continuous to avoid any problems with
outliers. This makes the tests more robust, and you can rely on the results more.
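The effect of a single outlier on the two averages can be seen in a short sketch. The ages are invented, with one deliberate data-entry error.

```python
import statistics

# Hypothetical ages with one data-entry error (230 typed instead of 23).
ages = [21, 22, 23, 24, 230]

mean_age = statistics.mean(ages)      # 64: pulled far upwards by the one bad value
median_age = statistics.median(ages)  # 23: still the middle value

print(mean_age, median_age)
```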
What Test?
Purpose                       Data Scale   Average  Test                          Test Statistic  Reference Distribution
Compare two groups            Continuous   Mean     t-test                        t               t
Compare two groups            Category     Median   Mann-Whitney U test           U statistic     All combinations of ranks
Compare three or more groups  Continuous   Mean     Analysis of Variance (ANOVA)  F-ratio         F
Compare three or more groups  Category     Median   Kruskal-Wallis Test           W statistic     All combinations of ranks
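The table amounts to a two-way lookup on the number of groups and the data scale. A minimal sketch of that lookup follows; the function and dictionary names are hypothetical and simply restate the table.

```python
# Keyed on (more than two groups?, data scale), restating the table above.
TESTS = {
    (False, "continuous"): "t-test",
    (False, "category"): "Mann-Whitney U test",
    (True, "continuous"): "Analysis of Variance (ANOVA)",
    (True, "category"): "Kruskal-Wallis test",
}

def choose_test(n_groups, scale):
    """Return the name of the test suggested for this number of groups and data scale."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    return TESTS[(n_groups > 2, scale)]

print(choose_test(2, "continuous"))
print(choose_test(4, "category"))
```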
The other thing that you have to decide is whether you are confident of the direction of the
difference. In practice, this boils down to whether your research hypothesis is expressed as ‘x is
likely to be more than y’, or ‘x is likely to be different from y’. If you are confident of the
direction of the difference, then your test will be one-tailed. If not, it will be two-tailed.
For each type of test, there is a standard formula for the test statistic. For example, for the
t-test, it is:
(M1-M2)/SE(diff)
M1 and M2 are the means of the two groups. SE(diff) is the standard error of the difference,
which is calculated from the standard deviation and the sample size of each group.
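The t statistic can be sketched in a few lines. The text does not say exactly how SE(diff) is built, so the unpooled form sqrt(s1²/n1 + s2²/n2) used here is an assumption (it is the form used in Welch's t-test), and the weekly ice cream counts are invented.

```python
import math
import statistics

def t_statistic(group1, group2):
    """(M1 - M2) / SE(diff), using the unpooled standard error of the difference."""
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    se_diff = math.sqrt(
        statistics.variance(group1) / len(group1)
        + statistics.variance(group2) / len(group2)
    )
    return (m1 - m2) / se_diff

# Hypothetical data: ice creams eaten per week in the summer months.
men = [4, 5, 3, 4, 4, 5]
women = [3, 4, 2, 3, 3, 3]

t = t_statistic(men, women)
print(round(t, 2))
```

The resulting value is then compared with the critical value of the t-distribution for the chosen significance level.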
The final part of the test is to compare the test statistic to that required to meet the desired
level of significance (usually 5% or 1%). This value is available from published statistical
tables. If the test statistic is that value or more, then the difference between groups is said to
be statistically significant at the 5% or 1% level.
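NOTE: significance is usually reported using the p value, which is the probability of seeing a
difference at least this large by chance. A result that is significant at the 5% or 1% level is
expressed as p < 0.05 or p < 0.01.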
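Instead of looking up a published table, the comparison can be sketched in code. This example assumes a standard normal reference distribution, which is reasonable only for large samples or a known standard deviation; the test statistic of 1.80 is a made-up value. It also shows why the choice between a one-tailed and a two-tailed test matters.

```python
from statistics import NormalDist

def p_value(z, tails=2):
    """p value for test statistic z under a standard normal reference distribution.

    With tails=1 this assumes the difference lies in the direction that was predicted.
    """
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return tails * upper_tail

z = 1.80  # hypothetical test statistic

print(p_value(z, tails=1) < 0.05)  # True: significant at 5% if the direction was predicted
print(p_value(z, tails=2) < 0.05)  # False: not significant at 5% for a two-tailed test
```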
Comparing Variables
Sometimes, you may want to know if there is a link between two variables. If so, you can
predict someone’s response to one variable by their response to the other.
• A positive association means that high scores for one variable tend to occur with high
scores for the other.
• A negative association means that high scores for one variable tend to occur with low
scores for the other.
• There is no association when the score for one variable does not predict the score for
the other.
Seeing an Association
One of the best ways of checking for an association is to draw a scatter graph of the data,
with one variable on the x axis and the other on the y axis. Broadly speaking, if there is an
association, you will see it from the graph.
Drawing a graph will also help you identify if there is a peculiar relationship, such as a
positive association for part of the data and a negative for the rest, as shown below. This will
show in a test as no correlation, but there is clearly some sort of a relationship in this case.
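The case of a relationship that a correlation test misses can be reproduced in a short sketch. It uses the Pearson correlation coefficient, written out by hand, on invented data: one set rises steadily, the other falls and then rises in a V shape.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

x = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 * v + 1 for v in x]   # steady positive association
v_shaped = [abs(v) for v in x]    # negative for half the data, positive for the rest

print(pearson_r(x, rising))    # essentially 1
print(pearson_r(x, v_shaped))  # essentially 0, although y clearly depends on x
```

The second result is why it is worth drawing the graph before trusting a coefficient.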
Again, there are specific tests depending on whether you are using continuous, categorical or
ranked data.
• For categorical data, you use the chi-squared test (also written χ²)
• For continuous data it is the Pearson product-moment correlation
• For ranks, use the Kendall rank order correlation.
Again, you need to work out the test statistic, and compare that with the value needed to
obtain the desired level of significance.
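For categorical data, the chi-squared test statistic can be sketched as follows. The counts are invented, and the example stops at the test statistic; the critical value quoted in the comment (3.841 for one degree of freedom at the 5% level) is the standard published figure.

```python
def chi_squared(table):
    """Chi-squared statistic for a table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts. Rows: men, women. Columns: likes ice cream, does not.
observed = [[30, 20],
            [20, 30]]

stat = chi_squared(observed)
print(stat)           # 4.0
print(stat >= 3.841)  # True: exceeds the 5% critical value for one degree of freedom
```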
A correlation is an association between two variables. It does not necessarily imply that one
causes the other. Both could be caused by something completely different, or it could simply
be that people who show one characteristic often show the other.
For example, it could be that people who shop for groceries online buy more ready-made
meals than those who shop in store. However, it is unlikely that the act of buying online
causes the purchase of more ready-meals. It is more likely that those who shop online are
short of time, and so buy more convenience food, or possibly simply that younger people are
both more likely to shop online and more likely to buy convenience food.
A Word of Advice
There are statistical packages available, such as SPSS, which will carry out all these tests for
you. However, if you have never studied statistics, and you’re not very confident about what
you’re doing, you are probably best off discussing it with a statistician or consulting a
detailed statistical textbook.
Badly-done statistical analysis can invalidate very good research. It is much better to find
someone to help you.
Multivariate Analysis
This section explains some of the more advanced techniques used for statistical analysis,
which involve several variables rather than just one or two.
In real life, as opposed to laboratory research, you are likely to find that your data are
affected by many things other than the variable that you wish to test. There are correlations
between items that you’ve never considered, and the world is complex.
The purpose of advanced statistical analysis is to simplify some of the relationships, while
making a more effective model of what you are seeing.
There are four main ways to do this:
• Design
• Using Sub-Samples
• Using Statistical Controls
• Multivariate Analysis
1. Design
You can design your research so that causal factors are made independent of each
other. For example, if you think that there may be a link between age and salary, then a
random sample of employees will risk combining the effects of both. If, however, you divide
the population into groups by age, and then randomly sample equal numbers from each
group, you have made age and salary independent.
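The design described above, dividing the population into age groups and sampling equal numbers from each, can be sketched as follows. The workforce, the age bands and the function name are all hypothetical.

```python
import random
from collections import Counter

def equal_sample_by_group(people, group_of, per_group, seed=0):
    """Randomly draw the same number of people from every group."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    groups = {}
    for person in people:
        groups.setdefault(group_of(person), []).append(person)
    sample = []
    for label in sorted(groups):
        sample.extend(rng.sample(groups[label], per_group))
    return sample

# Hypothetical workforce in which younger employees heavily outnumber older ones.
bands = ["20-39"] * 50 + ["40-59"] * 30 + ["60+"] * 10
employees = [{"id": i, "age_band": band} for i, band in enumerate(bands)]

sample = equal_sample_by_group(employees, lambda e: e["age_band"], per_group=5)
print(Counter(e["age_band"] for e in sample))  # five employees from each age band
```

A simple random sample of 15 from this workforce would probably be dominated by the youngest band, which is the problem the design is meant to avoid.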
2. Using Sub-Samples
Here, you select your sample to be equal on any potentially confounding factors. For
example, job type may affect pay, so if you want to study the effects of another factor on pay,
you could select only people doing the same job.
3. Using Statistical Controls
If you suspect that three variables may be linked, you can control for one to test for
correlations between the other two. Effectively, you adjust the statistical value of the
control to be constant, and test whether there is still a relationship between the other two
variables. You may find that the observed relationship remains high (it is real), or reduces
considerably (there is probably no real relationship). There is a third case: where there is no
relationship until you control the third variable, which means that the control variable is
masking the relationship between the other two.
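One way to control for a third variable is the first-order partial correlation. The text does not name a specific method, so this formula is an assumption, and the correlation values are invented to reproduce the three cases described above.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with the control variable z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Case 1: the relationship remains high once z is controlled (it is real).
print(partial_correlation(0.60, 0.10, 0.10))

# Case 2: the relationship vanishes, so z accounted for the observed correlation.
print(partial_correlation(0.60, 0.80, 0.75))

# Case 3: no relationship is visible until z is controlled (z was masking it).
print(partial_correlation(0.00, 0.50, -0.50))
```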
4. Multivariate Analysis
Multivariate Analysis includes many statistical methods that are designed to allow you to
include multiple variables and examine the contribution of each.
The factors that you include in your multivariate analysis will still depend on what you want
to study. Some studies will want to look at the contribution of certain factors, and other
studies to control for those factors as (more or less) a nuisance.
In multivariate analysis, the first thing to decide is the role of the variables.
A variable's role, as a dependent variable or as a predictor, is a function of your model, not
of the variable itself, and the same variable may play either role in different studies.
The relationships between variables are usually represented by a picture with arrows:
Variables may be observed directly, or inferred from what is happening. Inferred variables
are known as latent variables.
For example, you might decide that the latent variable 'success at school' consists of
academic success, together with some measure of social success (perhaps average duration of
friendships, or size of ‘friendship group’) plus one of effort put in. These are your observed
variables.
The measurement model examines the relationship between the observed and latent variables.
The idea behind such models is that there are correlations between the observed variables.
These correlations are assumed to be caused by common factors. The greater the influence of
the common factors (the factor loading), the higher the correlations between the variables.
Cronbach’s alpha is used to measure how strongly the observed variables correlate with one
another. A value of 0.70 or more is generally taken to indicate a good level of reliability.
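Cronbach's alpha can be sketched with the usual formula, alpha = k/(k-1) × (1 - sum of item variances / variance of the total score), where k is the number of observed variables. The scores below are invented and reuse the 'success at school' example.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per observed variable, for the same respondents in the same order."""
    k = len(items)
    sum_item_variances = sum(statistics.variance(scores) for scores in items)
    totals = [sum(per_person) for per_person in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - sum_item_variances / statistics.variance(totals))

# Hypothetical scores for five pupils on three observed variables.
academic = [2, 3, 4, 4, 5]
social = [1, 3, 3, 4, 5]
effort = [2, 2, 4, 5, 5]

alpha = cronbach_alpha([academic, social, effort])
print(round(alpha, 2))  # 0.95, above the 0.70 threshold
```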
Methods of Analysis
There are a variety of methods of analysis for measurement models like this. They include
Confirmatory Factor Analysis and Exploratory Factor Analysis, and are usually carried
out by computer.
The details of how to carry out each one are beyond the scope of this page, but the basic idea
is that they measure how much of the variation seen in the overall construct is caused by each
factor.
Causal Models
Causal models look at the way in which variables relate to each other. While it is not possible
to prove causality beyond doubt, causal models allow you to say whether the suggested
relationship fits the data, and how well.
The strength or weakness of any causal model is the selection of the variables. If you miss out
a major causal factor, then your conclusions will be either limited or incorrect. It is therefore
worth taking time on defining your model as carefully as possible.
There is a balance to be struck between simplicity and including more variables to obtain a
better fit. Obviously you do not want to miss out a major causal variable, and including more
variables will always give a better fit. But you need to consider whether the additional
complexity is worth it for the gain in quality of the model.
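One common way to weigh fit against complexity is the adjusted R², which penalises each extra predictor. It is not mentioned in the text above and is offered only as an illustration; the R² values, sample size and predictor counts are invented.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """R-squared penalised for the number of predictors in the model."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r_squared) * penalty

# Hypothetical models fitted to the same 30 observations.
simple_model = adjusted_r_squared(0.60, n_observations=30, n_predictors=3)
complex_model = adjusted_r_squared(0.61, n_observations=30, n_predictors=8)

print(round(simple_model, 3), round(complex_model, 3))
print(complex_model < simple_model)  # True: five extra variables bought almost no fit
```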
Suitable analysis methods for causal models tend to be what are called generalised linear
models, which include logistic regression analysis, multiple regression analysis, multivariate
analysis of covariance (MANCOVA) and multivariate analysis of variance (MANOVA).
All these methods give you a measure of how much of the variation in the dependent
variables is explained by the predictors, and thus whether your model is any good.
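For the simplest case, one predictor and one dependent variable, this measure is R²: the share of variation in the dependent variable that the fitted line accounts for. The sketch below uses ordinary least squares on invented data; the methods listed above extend the same idea to several variables.

```python
def r_squared(xs, ys):
    """Share of the variation in ys accounted for by a least-squares line on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - residual / total

# Hypothetical data.
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]

print(round(r_squared(predictor, outcome), 2))  # 0.6
```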
Again, there are computer packages, such as SPSS, which can do these analyses for you, but
do make sure that you understand what you’re doing and are interpreting the results correctly.
Structural Equation Modelling brings together measurement models and causal models. It
is a computer-modelling technique that fits a structural equation to the model. This technique
is pretty complicated, but in essence compares possible models and identifies the one that
best fits the data.
A Complex Area
The world is a complex place, and sometimes the only way to understand what’s going on is
to use advanced statistical techniques for modelling.
However, these too are complex and you should not embark on them without understanding
the basics. If you don’t, then it’s a good idea to consult someone who does, usually a
statistician. Even if you’ve used the technique before, it’s still a good idea to get a statistician
to have a look at what you’re planning to do and check your results afterwards in case of any
glaring errors.