What is data collection?
The traditional definition of data collection might lead us to think of gathering information through
surveys, observations, or interviews. However, the modern-age definition of data collection
extends beyond conducting surveys and observations. It encompasses the systematic gathering and
recording of any kind of information through digital or manual methods. Data collection can be as
routine as a doctor logging a patient’s information into an electronic medical record system during
each clinic visit, or as specific as keeping a record of mosquito nets delivered to a rural household.
How to collect data: Getting started
Before starting your data collection process, you must clearly understand what you aim to achieve
and how you’ll get there. Below are some actionable steps to help you get started.
1. Define your goals
Defining your goals is a crucial first step. Engage relevant stakeholders and team members in an
iterative and collaborative process to establish clear goals. It’s important that projects start with
the identification of key questions and desired outcomes to ensure you focus your efforts on
gathering the right information.
Start by understanding the purpose of your project: what problem are you trying to solve, or what
change do you want to bring about? Think about your project’s potential outcomes and obstacles
and try to anticipate what kind of data would be useful in these scenarios. Consider who will be
using the data you collect and what data would be the most valuable to them. Think about the long-
term effects of your project and how you will measure these over time. Lastly, leverage any
historical data from previous projects to help you refine key questions that may have been
overlooked previously.
Once questions and outcomes are established, your data collection goals may still vary based on
the context of your work. To demonstrate, let’s use the example of an international organization
working on a healthcare project in a remote area.
If you’re a researcher, your goal will revolve around collecting primary data to answer specific
questions. This could involve designing a survey or conducting interviews to collect first-hand
data on patient improvement, disease or illness prevalence, and behavior changes (such as an
increase in patients seeking healthcare).
If you’re part of the monitoring and evaluation (M&E) team, your goal will revolve around
measuring the success of your healthcare project. This could involve collecting primary data
through surveys or observations and developing a dashboard to display real-time metrics like
the number of patients treated, the percentage reduction in disease incidence, and average
patient wait times. Your focus would be using this data to implement any needed program
changes and ensure your project meets its objectives.
If you’re part of a field team, your goal will center around the efficient and accurate execution
of project plans. You might be responsible for using data collection tools to capture pertinent
information in different settings, such as in interviews conducted in person within the sample
community or over the phone. The data you collect and manage will directly influence the
operational efficiency of the project and assist in achieving the project’s overarching objectives.
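To make the M&E metrics mentioned above concrete, here is a minimal Python sketch of how they might be computed from visit records. The record fields (`patient_id`, `wait_minutes`), the function names, and the sample figures are all hypothetical and chosen only for illustration; a real dashboard would read from your own data source.

```python
from statistics import mean


def percent_reduction(baseline_cases: int, current_cases: int) -> float:
    """Percentage reduction in cases relative to a baseline period."""
    if baseline_cases <= 0:
        raise ValueError("baseline_cases must be positive")
    return (baseline_cases - current_cases) / baseline_cases * 100


def summarize_visits(visits: list) -> dict:
    """Compute two dashboard-style metrics from a list of visit records."""
    # A patient who visits twice is still counted as one patient treated.
    patient_ids = {v["patient_id"] for v in visits}
    return {
        "patients_treated": len(patient_ids),
        "average_wait_minutes": mean(v["wait_minutes"] for v in visits),
    }


# Hypothetical sample data.
visits = [
    {"patient_id": "P1", "wait_minutes": 30},
    {"patient_id": "P2", "wait_minutes": 50},
    {"patient_id": "P1", "wait_minutes": 40},
]
summary = summarize_visits(visits)
reduction = percent_reduction(baseline_cases=200, current_cases=150)
```

Counting unique patient IDs rather than rows is a design choice: it keeps repeat visits from inflating the number of patients treated.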
2. Identify your data collection types
Essentially, there are two main data collection types to choose from: primary and secondary.
Primary data is the information you collect directly from first-hand engagements. It’s gathered
specifically for your research and tailored to your research question. Primary data collection
methods can range from surveys and interviews to focus groups and observations. Because you
design the data collection process, primary data can offer precise, context-specific information
directly related to your research objectives. For example, if you are investigating the impact of
a new education policy, primary data might be collected through surveys distributed to teachers
or interviews with school administrators dealing directly with the policy’s implementation.
Secondary data, on the other hand, is derived from resources that already exist. This can
include information gathered for other research projects, administrative records, historical
documents, statistical databases, and more. While not originally collected for your specific
study, secondary data can offer valuable insights and background information that complement
your primary data. For instance, continuing with the education policy example, secondary data
might involve academic articles about similar policies, government reports on education, or
previous survey data about teachers’ opinions on educational reforms.
While both types of data have their strengths, this guide will predominantly focus on primary
data and the methods to collect it. Primary data is often emphasized in research because it
provides fresh, first-hand insights that directly address your research questions. Primary data also
allows for more control over the data collection process, ensuring data is relevant, accurate, and
up-to-date.
However, secondary data can offer critical context, allow for longitudinal analysis, save time and
resources, and provide a comparative framework for interpreting your primary data. It can be a
crucial backdrop against which your primary data can be understood and analyzed. While we focus
on primary data collection methods in this guide, we encourage you not to overlook the value of
incorporating secondary data into your research design where appropriate.
3. Choose your data collection methods
When choosing your data collection method, there are many options at your disposal. Data
collection is not limited to methods like surveys and interviews. In fact, many of the processes in
our daily lives serve the goal of collecting data, from intake forms to automated endpoints, such
as payment terminals and mass transit card readers. Let us dive into some common types of data
collection methods:
Surveys and Questionnaires
Surveys and questionnaires are tools for gathering information about a group of individuals,
typically by asking them predefined questions. They can be used to collect quantitative and
qualitative data and be administered in various ways, including online, over the phone, in person
(offline), or by mail.
Advantages: Surveys allow researchers to reach many participants quickly and cost-effectively,
making them ideal for large-scale studies. The structured format of survey questions can also
make analysis easier than with other methods.
Disadvantages: Survey data collection may not capture complex or nuanced information well,
as participants are limited to predefined response choices. Also, there can be issues with
response bias, where participants might provide socially desirable answers rather than honest
ones.
Interviews
Interviews involve a one-on-one conversation between the researcher and the participant. The
interviewer asks open-ended questions to gain detailed information about the participant’s
thoughts, feelings, experiences, and behaviors.
Advantages: They allow for an in-depth understanding of the topic at hand. The researcher can
adapt the questioning in real time based on the participant’s responses, allowing for more
flexibility.
Disadvantages: They can be time-consuming and resource-intensive, as they require trained
interviewers and a significant amount of time for both conducting and analyzing responses.
They may also introduce interviewer bias if not conducted carefully, due to how an interviewer
presents questions and perceives the respondent, and how the respondent perceives the
interviewer.
Observations
Observations involve directly observing and recording behavior or other phenomena as they occur
in their natural settings.
Advantages: Observations can provide valuable contextual information, as researchers can
study behavior in the environment where it naturally occurs, reducing the risk of artificiality
associated with laboratory settings or self-reported measures.
Disadvantages: Observational studies may suffer from observer bias, where the observer’s
expectations or biases could influence their interpretation of the data. Also, some behaviors
might be altered if subjects are aware they are being observed.
Focus Groups
Focus groups are guided discussions among selected individuals to gain information about their
views and experiences.
Advantages: Focus groups allow for interaction among participants, which can generate a
diverse range of opinions and ideas. They are good for exploring new topics where there is little
pre-existing knowledge.
Disadvantages: Dominant voices in the group can sway the discussion, potentially silencing
less assertive participants. Focus groups also require skilled facilitators to moderate the discussion
effectively.
Forms
Forms are standardized documents with blank fields for collecting data in a systematic manner.
They are often used in fields like Customer Relationship Management (CRM) or Electronic
Medical Records (EMR) data entry. Surveys may also be referred to as forms.
Advantages: Forms are versatile, easy to use, and efficient for data collection. They can
streamline workflows by standardizing the data entry process.
Disadvantages: They may not provide in-depth insights as the responses are typically
structured and limited. There is also potential for errors in data entry, especially when done
manually.
Selecting the right data collection method should be an intentional process, taking into
consideration the unique requirements of your project. The method selected should align with your
goals, available resources, and the nature of the data you need to collect.
If you aim to collect quantitative data, surveys, questionnaires, and forms can be excellent tools,
particularly for large-scale studies. These methods are suited to providing structured responses that
can be analyzed statistically, delivering solid numerical data.
However, if you’re looking to uncover a deeper understanding of a subject, qualitative data might
be more suitable. In such cases, interviews, observations, and focus groups can provide richer,
more nuanced insights. These methods allow you to explore experiences, opinions, and behaviors
deeply. Some surveys can also include open-ended questions that provide qualitative data.
The cost of data collection is also an important consideration. If you have budget constraints, in-
depth, in-person conversations with every member of your target population may not be practical.
In such cases, distributing questionnaires or forms can be a cost-saving approach.
Additional considerations include language barriers and connectivity issues. If your respondents
speak different languages, consider translation services or multilingual data collection tools. If
your target population resides in areas with limited connectivity and your method will be to collect
data using mobile devices, ensure your tool provides offline data collection, which will allow you
to carry out your data collection plan without internet connectivity.
4. Determine your sampling method
Now that you’ve established your data collection goals and how you’ll collect your data, the next
step is deciding whom to collect your data from. Sampling involves carefully selecting a
representative group from a larger population. Choosing the right sampling method is crucial for
gathering representative and relevant data that aligns with your data collection goal.
Consider the following guidelines to choose the appropriate sampling method for your research
goal and data collection method:
Understand Your Target Population: Start by conducting thorough research of your target
population. Understand who they are, their characteristics, and subgroups within the
population.
Anticipate and Minimize Biases: Anticipate and address potential biases within the target
population to help minimize their impact on the data. For example, will your sampling method
accurately reflect all ages, genders, cultures, etc., of your target population? Are there barriers
to participation for any subgroups? Your sampling method should allow you to capture the most
accurate representation of your target population.
Maintain Cost-Effective Practices: Consider the cost implications of your chosen sampling
methods. Some sampling methods will require more resources, time, and effort. Your chosen
sampling method should balance the cost factors with the ability to collect your data effectively
and accurately.
Consider Your Project’s Objectives: Tailor the sampling method to meet your specific
objectives and constraints. For example, an M&E team may need real-time impact data, while
researchers may need representative samples for statistical analysis.
By adhering to these guidelines, you can make informed choices when selecting a sampling
method, maximizing the quality and relevance of your data collection efforts.
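As one illustration of how subgroups can be kept represented, the sketch below draws a proportional stratified random sample in Python. This is only one of several possible sampling approaches, and the guide does not prescribe it. The function name, the `age_group` field, and the 10% sampling fraction are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_key, fraction, seed=0):
    """Draw roughly the same fraction of people from every subgroup (stratum)."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for person in population:
        strata[person[stratum_key]].append(person)

    sample = []
    for members in strata.values():
        # Take at least one person per stratum so small subgroups are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population of 100 people in three age groups.
population = (
    [{"id": i, "age_group": "18-34"} for i in range(60)]
    + [{"id": i, "age_group": "35-59"} for i in range(60, 90)]
    + [{"id": i, "age_group": "60+"} for i in range(90, 100)]
)
sample = stratified_sample(population, "age_group", fraction=0.1)
```

With these assumed numbers, the sample contains six, three, and one person from the three age groups, mirroring their share of the population. A simple random draw of ten people could miss the smallest group entirely.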
5. Identify and train data collectors
Not every data collection use case requires data collectors, but training individuals responsible for
data collection becomes crucial in scenarios involving field presence.
The SurveyCTO platform supports both self-response survey modes and surveys that require a
human field worker to do in-person interviews. Whether you’re hiring and training data collectors,
utilizing an existing team, or training existing field staff, we offer comprehensive guidance and
the right tools to ensure effective data collection practices.
Here are some common training approaches for data collectors:
In-Class Training: Comprehensive sessions covering protocols, survey instruments, and best
practices empower data collectors with skills and knowledge.
Tests and Assessments: Assessments evaluate collectors’ understanding and competence,
highlighting areas where additional support is needed.
Mock Interviews: Simulated interviews refine collectors’ techniques and communication
skills.
Pre-Recorded Training Sessions: Accessible reinforcement and self-paced learning to refresh
and stay updated.
Training data collectors is vital for successful data collection. Your training should
focus on proper instrument usage and effective interaction with respondents, including
communication skills, cultural literacy, and ethical considerations.
Remember, training is an ongoing process. Knowledge gaps and issues may arise in the field,
necessitating further training.
Once you’ve established the preliminary elements of your data collection process, you’re ready to
start your data collection journey. In this section, we’ll delve into the specifics of designing and
testing your instruments, collecting data, and organizing data while embracing the iterative nature
of the data collection process, which requires diligent monitoring and making adjustments when
needed.
6. Design and test your instruments
Designing effective data collection instruments like surveys and questionnaires is key. It’s crucial
to prioritize respondent consent and privacy to ensure the integrity of your research. Thoughtful
design and careful testing of survey questions are essential for optimizing research insights. Other
critical considerations are:
Clear and Unbiased Question Wording: Craft unambiguous, neutral questions free from bias
to gather accurate and meaningful data. For example, instead of asking, “Shouldn’t we invest
more into renewable energy that will combat the effects of climate change?”, ask a neutral
question that lets respondents voice their own thoughts, such as “What are your thoughts on
investing more in renewable energy?”
Logical Ordering and Appropriate Response Format: Arrange questions logically and
choose response formats (such as multiple-choice, Likert scale, or open-ended) that suit the
nature of the data you aim to collect.
Coverage of Relevant Topics: Ensure that your instrument covers all topics pertinent to your
data collection goals while respecting cultural and social sensitivities. Make sure your
instrument avoids assumptions, stereotypes, and language or topics that could be considered
offensive or taboo in certain contexts. The goal is to avoid marginalizing or offending
respondents based on their social or cultural background.
Collect Only Necessary Data: Design survey instruments that focus solely on gathering the
data required for your research objectives, avoiding unnecessary information.
Language(s) of the Respondent Population: Tailor your instruments to accommodate the
languages your target respondents speak, offering translated versions if needed. Similarly, take
into account accessibility for respondents who can’t read by offering alternative formats like
images in place of text.
Desired Length of Time for Completion: Respect respondents’ time by designing instruments
that can be completed within a reasonable timeframe, balancing thoroughness with
engagement. Having an expected completion time will also help you identify low-quality
responses. For example, a response completed much faster than your expected timeframe may
have been rushed and may need to be excluded.
Collecting and Documenting Respondents’ Consent and Privacy: Ensure a robust consent
process, transparent data usage communication, and privacy protection throughout data
collection.
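The completion-time check described in the list above can be sketched in a few lines of Python. The field names and the five-to-thirty-minute window are assumptions for illustration; the right window depends on your own instrument and pilot results.

```python
def split_by_duration(responses, min_seconds, max_seconds):
    """Separate responses inside the expected completion window from those outside it."""
    kept, flagged = [], []
    for response in responses:
        if min_seconds <= response["duration_seconds"] <= max_seconds:
            kept.append(response)
        else:
            flagged.append(response)
    return kept, flagged


# Hypothetical responses with their completion times.
responses = [
    {"response_id": "r1", "duration_seconds": 840},
    {"response_id": "r2", "duration_seconds": 95},    # much faster than expected
    {"response_id": "r3", "duration_seconds": 7200},  # far longer than expected
]
kept, flagged = split_by_duration(responses, min_seconds=300, max_seconds=1800)
```

Flagged responses are set aside for review rather than deleted, since an unusual duration is a warning sign and not proof of a bad response.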
Perform Cognitive Interviewing
Cognitive interviewing is a method used to refine survey instruments and improve the accuracy of
survey responses by evaluating how respondents understand, process, and respond to the
instrument’s questions. In practice, cognitive interviewing involves an interview with the
respondent, asking them to verbalize their thoughts as they interact with the instrument. By
actively probing and observing their responses, you can identify and address ambiguities, ensuring
accurate data collection.
Thoughtful question wording, well-organized response options, and logical sequencing enhance
comprehension, minimize biases, and ensure accurate data collection. Iterative testing and
refinement based on respondent feedback improve the validity, reliability, and actionability of
insights obtained.
Put Your Instrument to the Test
Through rigorous testing, you can uncover flaws, ensure reliability, maximize accuracy, and
validate your instrument’s performance. This can be achieved by:
Conducting pilot testing to enhance the reliability and effectiveness of data collection.
Administer the instrument, identify difficulties, gather feedback, and assess performance in
real-world conditions.
Making revisions based on pilot testing to enhance clarity, accuracy, usability, and participant
satisfaction. Refine questions, instructions, and format for effective data collection.
Continuously iterating and refining your instrument based on feedback and real-world
testing. This keeps your data collection methods reliable, accurate, and aligned with your
audience, and helps your instrument adapt to changes, incorporate new insights, and remain
effective over time.
7. Collect your data
Now that you have your well-designed survey, interview questions, observation plan, or form, it’s
time to implement it and gather the needed data. Data collection is not a one-and-done deal; it’s
an ongoing process that demands attention to detail. Imagine spending weeks collecting data, only
to discover later that a significant portion is unusable due to incomplete responses, improper
collection methods, or falsified responses. To avoid such setbacks, adopt an iterative approach.
Leverage data collection tools with real-time monitoring to proactively identify outliers and issues.
Take immediate action by fine-tuning your instruments, optimizing the data collection process,
providing additional training, or reevaluating personnel responsible for inaccurate data (for
example, a field worker who sits in a coffee shop entering fake responses rather than doing the
work of knocking on doors).
SurveyCTO’s Data Explorer was specifically designed to fulfill this requirement, empowering
you to monitor incoming data, gain valuable insights, and know where changes may be needed.
Embracing this iterative approach ensures ongoing improvement in data collection, resulting in
more reliable and precise results.
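To illustrate the kind of check that ongoing monitoring makes possible, the Python sketch below compares each data collector's typical interview length with the overall median and lists collectors whose interviews look unusually short. It is independent of any particular tool and does not represent how SurveyCTO's Data Explorer works. The field names and the 0.5 ratio are assumptions.

```python
from collections import defaultdict
from statistics import median


def collectors_to_review(submissions, ratio=0.5):
    """Return collectors whose median interview time is far below the overall median."""
    overall = median(s["duration_seconds"] for s in submissions)
    by_collector = defaultdict(list)
    for s in submissions:
        by_collector[s["collector"]].append(s["duration_seconds"])
    return sorted(
        name
        for name, durations in by_collector.items()
        if median(durations) < ratio * overall
    )


# Hypothetical submissions from three field workers.
submissions = [
    {"collector": "A", "duration_seconds": 600},
    {"collector": "A", "duration_seconds": 650},
    {"collector": "A", "duration_seconds": 700},
    {"collector": "B", "duration_seconds": 580},
    {"collector": "B", "duration_seconds": 620},
    {"collector": "C", "duration_seconds": 120},
    {"collector": "C", "duration_seconds": 100},
    {"collector": "C", "duration_seconds": 140},
]
needs_review = collectors_to_review(submissions)
```

A collector appearing in this list is a prompt for follow-up, such as a conversation or refresher training, and not evidence of falsified data on its own.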
8. Clean and organize your data
After data collection, the next step is to clean and organize the data to ensure its integrity and
usability.
Data Cleaning: This stage involves sifting through your data to identify and rectify any errors,
inconsistencies, or missing values. It’s essential to maintain the accuracy of your data and
ensure that it’s reliable for further analysis. Data cleaning can uncover duplicates, outliers, and
gaps that could skew your results if left unchecked. With real-time data monitoring, this
continuous cleaning process keeps your data precise and current throughout the data collection
period. Similarly, review and corrections workflows allow you to monitor the quality of your
incoming data.
Organizing Your Data: Post-cleaning, it’s time to organize your data for efficient analysis and
interpretation. Labeling your data using appropriate codes or categorizations can simplify
navigation and streamline the extraction of insights. When you use a survey or form, labeling
your data is often not necessary because you can design the instrument to collect data in the right
categories or return the right codes. An organized dataset is easier to manage, analyze, and
interpret, ensuring that your collection efforts are not wasted but lead to valuable, actionable
insights.
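The two stages above can be sketched in Python as follows. This is a minimal illustration and not a complete cleaning pipeline: the field names, the rule of keeping the first copy of a duplicate, and the codebook are all assumptions made for the example.

```python
def clean_records(records, required_fields):
    """Drop duplicate submissions and separate out records with missing values."""
    seen_ids = set()
    complete, incomplete = [], []
    for rec in records:
        if rec["response_id"] in seen_ids:
            continue  # duplicate submission: keep only the first copy
        seen_ids.add(rec["response_id"])
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        (incomplete if missing else complete).append(rec)
    return complete, incomplete


def apply_codes(records, field, codebook):
    """Add a coded copy of a field; answers not in the codebook are coded as None."""
    return [
        {**rec, f"{field}_code": codebook.get(str(rec[field]).strip().lower())}
        for rec in records
    ]


# Hypothetical codebook and raw records.
CODEBOOK = {"yes": 1, "no": 0}
records = [
    {"response_id": "r1", "uses_net": "Yes"},
    {"response_id": "r1", "uses_net": "Yes"},  # duplicate of r1
    {"response_id": "r2", "uses_net": ""},     # missing value
    {"response_id": "r3", "uses_net": "no "},
]
complete, incomplete = clean_records(records, required_fields=["uses_net"])
coded = apply_codes(complete, "uses_net", CODEBOOK)
```

Incomplete records are returned separately rather than discarded, so you can decide whether to follow up with the respondent, correct the entry, or exclude it.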