Data collection tool development, Sampling methods, & Quality Assurance
MED 410
Dr. Ola Soudah
Data collection tool
• Data: information collected as images, text, vital signs, numbers, or figures.
• Data sources are classified into two types:
• Primary data: first-hand information collected by an investigator for the first time. Pros: more reliable, original, and directly related to your research.
• Secondary data: second-hand information that the investigator did not collect originally but obtained from already published or unpublished sources.
• The main sources of health data are:
• Surveys.
• Electronic medical records (EMR).
• Claims data or administrative data.
• Vital records.
• Surveillance.
• Disease registries.
• Peer-reviewed literature.
Surveys
• Surveys are an important means of collecting health and social science
information from a sample of people in a standardized way to better
understand a larger population.
• Pros:
• Collect empirical data in a relatively short period of time.
• Surveys can collect data on a representative sample of people, particularly when
samples are randomized or purposive nonprobability sampling is used.
• There are many methods used to conduct surveys, including questionnaires and in-depth interviews by phone, mail, email, and in person.
Electronic Medical records
• Electronic medical records (EMR) or electronic health records (EHR) track events and transactions between patients and health care providers, providing real-time data.
• EMRs contain a patient’s medical history, diagnoses, medications,
treatment plans, immunization dates, allergies, radiology images, and
laboratory and test results.
• Medical records help us measure and analyze trends in health care use,
patient characteristics, and quality of care.
• Pros: The data are automatically collected and usually accurate and
detailed because they come from health care providers.
Claims data or administrative data
• Claims databases collect information on millions of doctors’
appointments, bills, insurance information, and other patient-provider
communications.
• Pros:
• Like other medical records, they come directly from notes made by the health care provider and provide real-time data.
• Because of the large sample size of claims data, researchers can analyze groups of patients with rare illnesses and medical conditions.
• Cons: there may be low validity due to certain illegal billing practices,
like ordering unnecessary tests or billing for services that were not
provided.
Vital Records
• Vital records are collected by the National Vital Statistics System, and are
maintained by state and local governments.
• Vital records include births, deaths, marriages, divorces, and fetal deaths.
They also record information about the cause of death, or details of the birth.
• Pros: Vital records are useful because they offer very detailed information
and include information about rare disorders that end in death.
• Cons: Vital records only provide information on diseases and illnesses that
end in death.
Surveillance
• Public health surveillance is “the ongoing systematic collection, analysis,
and interpretation of data, closely integrated with the timely
dissemination of these data to those responsible for preventing and
controlling disease and injury.”
• These systems function through the efforts of local and state health
departments, working in tandem with a variety of health care
providers (laboratories, hospitals, private providers), who are
mandated by law to report cases of certain diseases.
Disease registries
• Disease registries are another type of public health surveillance.
• Registries are systems that allow people to collect, store, retrieve,
analyze, and disseminate information about people with a specific
disease or condition.
• Pros:
• Disease registries let researchers estimate how large a health problem is,
determine the incidence of the disease, study trends over time, and evaluate
the effects of certain environmental exposures.
• Surveillance data have higher validity than survey data because they come from lab tests, diagnoses, and other patient records.
Peer-reviewed literature
• Peer-reviewed journals may make the data behind their articles available in a data repository, such as the Harvard COVID-19 Dataverse.
• Peer-reviewed articles present the research of scholars who have collected their own data using an experimental study design, a survey, or another study methodology.
• They also present the work of researchers who have performed novel
analyses of existing data sources.
Selecting, designing, and developing your questionnaire
Questionnaires offer an objective means
of collecting information about people’s
knowledge, beliefs, attitudes, and
behavior.
Steps in questionnaire development
• Ask yourself the following questions:
• What information are you trying to collect? Answering this requires deep knowledge of the research topic.
• Is a questionnaire appropriate? Questionnaires are sometimes used inappropriately.
Rule of thumb: don’t use a questionnaire to assess sensitive topics (those associated with stigma), topics that can be mixed up with people’s perceptions, or a broad issue that can’t be assessed by a single question.
• Could you use an existing instrument? Using a previously validated
and published questionnaire will save you time and resources; you will
be able to compare your own findings with those from other studies,
you need only give outline details of the instrument when you write up
your work, and you may find it easier to get published.
• Is the questionnaire valid and reliable?
• A valid questionnaire measures what it claims to measure.
• A reliable questionnaire yields consistent results from repeated
samples and different researchers over time.
Questions
• Types of questions:
• Open-ended (usually answered as text)
• Closed-ended
Open-ended vs. closed-ended questions
• Open-ended, pros: allow free expression; capture new response options, emotions, and thoughts.
• Open-ended, cons: take longer; take more effort to finish (exhausting); need coding and analysis; rely on handwriting skills and clarity.
• Closed-ended, pros: easy and quick; clear and complete; suitable for self-completion; easy to standardize.
• Closed-ended, cons: answers may be guessed; ceiling and floor effects; errors in filling them in; can’t capture wider options.
How should you present your questions? (Question formatting and response options)
A Likert scale presents ordered responses to a questionnaire item that asks participants to rank preferences numerically, such as by using a scale on which 1 indicates strong disagreement and 5 indicates strong agreement.
Most scales with a neutral option list 5 or 7 categories.
Most scales without a neutral option list 4 or 6 categories.
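Before analysis, Likert responses are usually converted to their numeric codes. The sketch below is only an illustration: the label wording in `LIKERT_5` and the helper name `code_responses` are assumptions, not part of any standard instrument.

```python
# Hypothetical 5-point agreement scale with a neutral midpoint.
# The label wording is an assumption made for illustration.
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def code_responses(responses, scale=LIKERT_5):
    """Map text responses to numeric codes; blank or unknown answers become None."""
    return [scale.get(answer.strip().lower()) for answer in responses]


# Example: four answers to one item, the last one left blank.
coded = code_responses(["Agree", "strongly disagree", "Neutral", ""])
print(coded)
```

Keeping missing answers as `None` rather than 0 avoids treating a skipped item as strong disagreement.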
Wording
• Clarity. Make questions as clear and specific as possible.
• For example, asking, “How much exercise do you usually get?” is less clear than asking,
“During a typical week, how many hours do you spend in vigorous walking?”
• Simplicity. Use simple, common words and grammar that convey the
idea, and avoid technical terms and jargon.
• For example, it is clearer to ask about “drugs you can buy without a prescription from a
doctor” than to ask about “over-the-counter medications.”
• Neutrality. Avoid “loaded” words and stereotypes that suggest a
desirable answer.
• Asking “During the last month, how often did you drink too much alcohol?” may
discourage respondents from admitting that they drink a lot of alcohol.
Setting the Time Frame
• To measure the frequency of the behavior it is essential to have the
respondent describe it in terms of some unit of time.
During the last 7 days, how many cigarettes did you smoke (one pack is
equal to 20 cigarettes)?
[ ] cigarettes in the last 7 days
• The investigator must first decide what aspect of the behavior is most
important to the study: the average or the extremes.
Avoid Pitfalls
• Double-barreled questions. Each question should contain only one
concept.
• “How many cups of coffee or tea do you drink during a day?”
• Hidden assumptions. Sometimes questions make assumptions that
may not apply to all people who participate in the study.
• For example, a standard depression item asks how often, in the
past week: “I felt that I could not be happy even with help from my
family.” This assumes that the respondent has a family.
What should the questionnaire look like?
• In general, questions should be short and to the point (around 12 words or fewer).
• Pay attention to the physical layout of the questionnaire, such as font size, color, and question order.
Good questionnaire checklist
• After drafting the questionnaire, check each question for clarity and
confirm that the responses are also carefully worded. For example:
• Does each question ask what it is intended to ask?
• Is the language of each question clear and neutral?
• Will members of the study population understand the language?
• Do questions about sensitive topics use language acceptable to the source
population?
• Are the response options clearly presented?
• For scaled questions, is the rank order clear? (For example, is it obvious that 1
is “strongly disagree” and 5 is “strongly agree”? Or, alternatively, that 1 is
“excellent” and 7 is “poor”?)
• For questions with unranked categories, is the order of possible responses
alphabetical or otherwise neutral?
For new questions and scales
• Pretest
• Pretest the instrument to ensure the clarity and timing of the questions.
• Validate
• Validity means correctness or accuracy: the instrument measures what it is meant to measure.
• Check reliability
• Reliability means consistency: the instrument gives similar results when it is used by different individuals or in different settings.
• Questionnaires and interviews can be assessed for validity and reliability by conducting a pilot study on a small number of people and/or seeking expert review.
Pilot Testing
• A pilot test, or pretest, is a small-scale preliminary study conducted to
evaluate the feasibility of a full-scale research project.
• A pilot test of a questionnaire is helpful for checking:
• The wording and clarity of the questions
• The order of the questions
• The ability and willingness of participants to answer the questions
• The responses given, and whether the responses match the intended types
of responses
• The amount of time it takes to complete the survey
• The reliability of the instrument (internal consistency, Cronbach’s α)
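Cronbach’s α is commonly computed as α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The following is a minimal sketch under stated assumptions: the function name `cronbach_alpha` and the pilot scores are made up for illustration, and sample variances are used throughout.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: one list of item scores per respondent, all with the same
    number of items (k >= 2) and no missing values.
    """
    k = len(rows[0])
    items = list(zip(*rows))  # regroup scores: one tuple per item
    item_variance_sum = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical pilot data: 5 respondents answering 3 Likert items.
pilot = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(pilot)
print(round(alpha, 2))  # 0.96
```

Values closer to 1 indicate that the items vary together, which is what internal consistency means here.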
Using available scales
• One way to improve validity is to include survey questions or modules
that are identical to the ones used in previous research projects.
• Such as:
• The Beck Depression Inventory and the General Health Questionnaire (GHQ), which assess psychological status
• The Mini-Mental State Examination (MMSE), which evaluates cognitive function
• The SF-36 and SF-12, which both measure health-related quality of life, capturing an individual’s perceived physical, mental, emotional, and social well-being.
Translation
• One way to ensure that the correct meaning is being conveyed is to
use back translation, or double translation, in which one person
translates the questionnaire from the original language to a new
language and then a second person translates the survey instrument
in the new language back into the original language.
Response rate
• Response rates tend to be low, particularly when surveys are used without a well-defined target population or sample.
• Here are some helpful guidelines for surveys of health professionals,
students, and educators.
• An introductory statement.
• The survey instrument must have well designed content and be visually
appealing.
• A respondent should be able to complete the survey in a short period.
• Some inducement or gift may be included.
• A follow-up request may spur some respondents.
Questionnaire administration methods
Types of surveys
• Self-reported, by mail or online.
• Interview: face-to-face, telephone-based, or virtual.
• Group or focus group interview.
Population Sampling
• We can’t include the entire population in a study, so we take a sample.
• The source population,
sometimes called a sampling
frame, is a well-defined subset of
individuals from the target
population from which potential
study participants will be
sampled.
• Sampling methods classification:
• Non-Probability Sampling Methods
• Probability Sampling Methods
Non-Probability Sampling Methods
• Convenience sampling
• Quota sampling
• Purposive Sampling
• Snowball sampling
Convenience sampling
• Select anyone who is available.
Quota sampling
Purposive Sampling
Snowball sampling
Existing subjects are
asked to nominate
further subjects known
to them, so the sample
increases in size like a
rolling snowball.
Probability Sampling Methods
• Simple random sampling
• Systematic sampling
• Stratified sampling
• Clustered sampling
• Multi-stage sampling
Simple random sampling
In this case each individual is chosen
entirely by chance and each member of
the population has an equal chance, or
probability, of being selected.
Methods for choosing by chance:
• Toss a coin.
• Random number tables or random number generator software.
Random number generator
Have a numbered list of the individuals’ names (total N).
Determine your sample size (n).
Press generate, and pick the person with the number shown; repeat until n people are selected.
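The same procedure can be done in software. A minimal sketch using Python’s standard `random` module; the roster of 100 names and the function name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(names, n, seed=None):
    """Draw n distinct individuals, each with an equal chance of selection.

    A seed is optional; fixing it makes the draw reproducible for auditing.
    """
    rng = random.Random(seed)
    return rng.sample(names, n)  # sampling without replacement


# Hypothetical sampling frame with N = 100 individuals.
roster = [f"Person {i}" for i in range(1, 101)]
sample = simple_random_sample(roster, n=10, seed=42)
print(sample)
```

Sampling without replacement guarantees that no person is picked twice, which repeated presses of a generator button do not.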
Systematic sampling
Stratified sampling
Clustered sampling
• In a clustered sample, subgroups of the population are used as the
sampling unit, rather than individuals.
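As a rough illustration of sampling whole subgroups instead of individuals, the sketch below randomly picks clusters and then includes every member of each chosen cluster. The school names, member labels, and the function name `cluster_sample` are assumptions made for the example.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; everyone in a chosen cluster is included.

    clusters: dict mapping a cluster name to the list of its members.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # the cluster is the sampling unit
    return {name: clusters[name] for name in chosen}


# Hypothetical population grouped into four schools.
schools = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1", "d2"],
}
picked = cluster_sample(schools, n_clusters=2, seed=1)
participants = [member for members in picked.values() for member in members]
print(sorted(picked), len(participants))
```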
Multi-stage sampling
Sample Quality
• Sampling bias: the sample is not representative of the population; its characteristics differ from those of the population.
• Sampling bias may be introduced by:
• Poor sampling methodology
• Poor recruitment methodology
• A high rate of withdrawal from the study
• How to avoid it: stick to probability-based sampling methods and add 20% extra to the required sample size.
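The 20% allowance is simple arithmetic: multiply the required sample size by 1.2 and round up. A small sketch, in which the function name `inflate_sample_size` and the example value of 384 are illustrative assumptions:

```python
def inflate_sample_size(required_n, extra_percent=20):
    """Add an allowance (default 20%) to the required sample size, rounding up.

    Integer arithmetic is used so the result is not affected by
    floating-point rounding.
    """
    scaled = required_n * (100 + extra_percent)
    return -(-scaled // 100)  # ceiling division


# Example: a calculated requirement of 384 participants.
print(inflate_sample_size(384))  # 461
```

Rounding up rather than to the nearest integer keeps the recruited number from falling below the intended allowance.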
Questions