Data Collection and Analysis Tool
Tool Description
Options and recommendations for data collection and for creating and reporting
statistics
V.1 3/2024 2
Table of contents
About this tool
What are primary data?
When to collect primary vs. secondary data?
Scoping a primary data collection exercise
Getting started
Should you run a census or a survey?
Survey design
Defining a sample
Sources of bias
Questionnaire design
Overview
Questionnaire best practices
Testing
Customization
Privacy
Choosing software
Communication strategy
Key considerations
Communications methods
Incentives
Enumeration
Enumeration methods
Enumeration planning
Survey enumeration
Processing data
Data cleaning
Weighting
Analysis
Reporting
External support and RFP guidelines
References
About this tool
This tool outlines key considerations for effective primary data collection. It is closely
related to the Indicator Guide & Recommended Indicators Tool, which describes a suite
of indicators developed by SGIGs to measure well-being, many of which may need to be
informed by primary data.
Additional resources
Some of the content included in this tool has been adapted from Statistics
Canada (2010), Survey Methods and Practices (Catalogue no. 12-587-X. ISBN
978-1-100-16410-6), which provides detailed guidance on primary data collection
and can serve as a useful companion guide for users of this tool.
difficult, but when it’s required, the benefits can be tremendous! After SGIGs have gone
through a proper primary data collection exercise, they will have moved along the data
use continuum towards expertise, and primary data collection exercises can serve as a
catalyst for organizational change more broadly.
The benefit of administrative data is that they already exist and therefore
significant efficiencies are gained in utilizing these data. The drawback of
administrative data sources is that they aren’t designed specifically to create
indicators and measure well-being. This means that they may not be precisely
aligned with the ideal indicators governments have in mind, and there may be
variability in data quality and completeness that will need to be addressed. Here
are some questions to ask when considering using a specific set of administrative
data for an indicator(s) of interest:
1. Accuracy: Are there systems in place that ensure data are accurate? Do
the data reflect reality? How are mistakes addressed?
2. Completeness: Does the data set have a lot of missing values? How will this
affect analysis?
3. Consistency: Are there any instances of conflicting information within the
dataset?
4. Timeliness: Does the timeline on which the data can be accessed align with
project objectives and reporting plan?
5. Validity: Is there consistent formatting that makes the data usable? For
example, are birth dates reported in a consistent format?
6. Uniqueness: Does each unique person only appear once in the dataset? For
example, if someone is listed as “Robin A. Mackey” in one instance and “Robin
Mackey” in another, will this show up as two entries?
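Some of these questions can be checked with a few lines of code before deciding whether an administrative data set is usable. The following is a minimal sketch, not a complete data quality audit: the records, field names, and expected date format are invented for illustration, and the duplicate check (same first name, last name, and birth date) is only one possible matching rule.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"name": "Robin A. Mackey", "birth_date": "1985-04-12"},
    {"name": "Robin Mackey", "birth_date": "1985-04-12"},
    {"name": "Jo Tallman", "birth_date": "12/04/1990"},
    {"name": "Sam Reed", "birth_date": ""},
]


def missing_share(rows, field):
    """Completeness: share of rows where the field is empty."""
    return sum(1 for r in rows if not r[field]) / len(rows)


def invalid_dates(rows, field, fmt="%Y-%m-%d"):
    """Validity: non-empty values that do not match the expected date format."""
    bad = []
    for r in rows:
        value = r[field]
        if not value:
            continue  # empty values are counted under completeness instead
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            bad.append(value)
    return bad


def possible_duplicates(rows):
    """Uniqueness: keys (first name, last name, birth date) seen more than once."""
    def key(r):
        parts = r["name"].lower().split()
        return (parts[0], parts[-1], r["birth_date"])

    counts = Counter(key(r) for r in rows)
    return [k for k, n in counts.items() if n > 1]


print(missing_share(records, "birth_date"))   # 0.25
print(invalid_dates(records, "birth_date"))   # ['12/04/1990']
print(possible_duplicates(records))           # [('robin', 'mackey', '1985-04-12')]
```

Flagged records still need human review: two different people can share a name and birth date, and one person can appear under spellings that this simple rule would miss.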
1. What are our priority areas for data collection?
a. Are there specific domains of well-being that are the priority for
measurement? Or is this project specifically about a comprehensive
picture of well-being?
2. What do we have the capacity (people, time, and funding) to undertake?
For guidance on answering the latter question, a workplan and costing tool is available.
Surveys are used when it is not feasible or necessary to collect information or observations
from the entire population of interest. A survey is a smaller scale operation than a
census, and as such it provides a faster and more economical way of collecting
information. A census might not be feasible if the population is very large, geographically
dispersed, or difficult to reach, or if limited resources are available, including people,
time, and funding. From a technical perspective, if a complete sample frame (a full list of
the population of interest) exists, a census is only required when there are important
strategic reasons to conduct one. For example, people may feel that their voices
weren’t heard if they were excluded from the sample in a survey. Otherwise, the
size of the sample in a survey can always be adjusted to accommodate the information
needs in any primary data collection exercise. If the information needs are very detailed
(e.g. an SGIG wants to be able to understand, very precisely, some things about small
sub-populations) then the size of the sample may be almost the same as the entire
population and therefore the exercise essentially becomes a census. Even the long-form
questionnaire of Statistics Canada's Census of Population is actually a survey, sampling
about 1 in 4 households.
Table 1. Census vs. survey

Scope
Census: Data are collected from the entire population. If the population is small, or
there are strategic considerations, a census may be preferable.
Survey: Data are collected from a subset of the population. If the population is large,
accurate results can usually be derived from a relatively small sample.

Cost
Census: More costly. Primary data collection exercise costs typically scale with the size
of the sample because enumeration is usually a main cost-driver.
Survey: Less costly overall, and more resources can be devoted to each individual
response.
It is entirely possible to conduct both a census and a survey. For example, SGIGs may
consider conducting a census on-lands in tandem with a survey off-lands. Several
considerations inform whether to run a census or survey on-lands only, or both on-lands
and off-lands. Running a census or survey on-lands is easier because enumeration is
easier when the population is close together. For example, enumerators can actually go
door-to-door to reach the population of interest. When running a census or survey
off-lands, it is likely that the population will be very dispersed, forcing a reliance on
electronic enumeration methods that can introduce errors in the sample. Generally, the
enumeration requirements for people living off-lands will be greater and more
expensive, so current personnel, time frame, and funding should be assessed to
determine the feasibility of enumerating off-lands. Additionally, if the content of the
questionnaire is program-focused and those programs are primarily or exclusively
offered on-lands, including citizens or members living off-lands may not be necessary.
Scenario: Decision to undertake a census
To make this decision, the Sulingituk team connects with both leadership and
senior administration about their desired outcomes of this exercise. Leadership
emphasizes their interest in connecting with all members, as the last time a
similar effort was undertaken was in the decision to assume self-government
seven years previously. Administration is clear that they have interests in
understanding the needs of smaller groups – including those with unstable
housing and gender diverse persons – who are otherwise not well represented in
the planning and programming of the Sulingituk Government. As both leadership
and senior administration confirm the availability of budget for a comprehensive
project, the decision is made to undertake a census, attempting to reach all
Sulingituk members and those living in their households.
Survey design
Defining a sample
The goal in running a survey is to ensure that the sample is representative of the
population of interest. In other words, the data collected should accurately reflect the
characteristics of the entire population. The process of defining a sample can be broken
down into four main steps: (1) identify the survey frame, (2) decide the sample design,
(3) determine the sample size, and (4) produce the sample randomly.
A survey frame is the means of identifying and contacting the units of the survey
population (e.g., individuals or households). It is essentially the list or database from
which the sample is drawn for the survey. The frame can be a physical list such as a data
file of contact information, a geographic list which includes units within specific areas, or
a conceptual list that provides conditions delineating who is included. For SGIGs, the
survey frame will often be their citizenship or membership list. The frame chosen
ultimately determines the definition of the survey population and can affect the methods
of data collection, sample selection, and estimation, the cost of the survey, and the
quality of its outputs. It should ideally include all (and only) the units (e.g. people,
households) of interest in the population.
For SGIGs, their citizenship or membership list will be a critical input to conducting
either a census or a survey. Evaluating the list’s data quality will be a first step to
understanding what kind of data collection is possible and at what cost. If there
are many errors or discrepancies in the list, addressing these before getting
started may be necessary.
Once the survey frame has been established, the next step is to plan the sampling design.
This step in the survey design determines how the units from the frame will be selected
for participation in the survey. The sampling design should be chosen to ensure that the
sample is representative of the population of interest and that the survey results will
have the desired level of precision. Broadly speaking, there are two types of sampling
designs: probability and non-probability sampling. At a high level, probability sampling
employs randomness when selecting a sample from a population, whereas
non-probability sampling refers to a wide range of sampling techniques that do not
employ randomness. The choice of sampling design can be influenced by a variety of
factors, including the nature of the population, the objectives of the survey, the statistical
methods that will be employed during analysis, and the resources available. Generally,
non-probability sampling is faster, easier, and less expensive, but that comes at a cost:
the risk of introducing uncertainty and bias into findings. Table 2 presents common types
of sampling designs and their descriptions.
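To make the idea of employing randomness concrete, the sketch below draws two probability samples from a survey frame using Python's standard library. The frame, the "residence" field, and the 300/700 split are invented for illustration. The stratified variant is one possible design under these assumptions, not a recommendation for any particular survey.

```python
import random

# Hypothetical survey frame: 1,000 member IDs with an invented residence flag.
frame = [
    {"id": i, "residence": "on-lands" if i < 300 else "off-lands"}
    for i in range(1000)
]


def simple_random_sample(units, n, seed=None):
    """Select n units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(units, n)


def stratified_sample(units, n, key, seed=None):
    """Split units into strata by `key`, then sample each stratum in
    proportion to its size. Rounding can make the total differ slightly
    from n when the proportions are not whole numbers."""
    rng = random.Random(seed)
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, k))
    return sample


srs = simple_random_sample(frame, 100, seed=1)
strat = stratified_sample(frame, 100, "residence", seed=1)
on_lands = sum(1 for u in strat if u["residence"] == "on-lands")
print(len(srs), len(strat), on_lands)  # 100 100 30
```

Fixing the seed makes the draw reproducible, which helps if the selection ever needs to be documented or audited.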
Once the sampling design has been chosen, the next step is to determine the sample size.
Sample size refers to the number of participants or data points selected from a larger
population to be studied. It is a critical factor that impacts the precision of research
findings. A larger sample size generally leads to more accurate and generalizable results,
while a smaller sample size may introduce more uncertainty into the findings. The
sample size should be large enough to provide the desired level of precision for the
survey estimates, but not so large as to be unnecessarily costly or difficult to manage.
Technically, the size of the sample should be calculated for each statistic of interest.
Since there is often interest in many statistics that will be informed by the set of
questions in a questionnaire, the sample size needs to be set for the type of statistic that
will require the largest sample and then used for the entire survey.
In practice, the formula rarely needs to be worked through by hand, as it is possible to
simply enter the population size, statistic of interest, margin of error, and confidence
level into one of the many online sample size calculators, such as the Qualtrics sample
size calculator.
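For readers who want to see how these inputs interact, here is a minimal sketch of one commonly used calculation for estimating a proportion: Cochran's formula with a finite population correction. This tool does not specify a formula, so treat this one as an assumption; online calculators may use a variant. The default of 0.5 for the expected proportion is the most conservative choice, and z = 1.96 corresponds to a 95% confidence level.

```python
import math


def sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Completed responses needed to estimate a proportion.

    n0 is the size for a very large population; the second step shrinks
    it for a finite population of the given size.
    """
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(1000))    # 278
print(sample_size(10000))   # 370
```

The result is a number of completed responses, so the number of people invited needs to be larger to allow for non-response. The example also shows why small populations push a survey towards a census: a population of 1,000 needs more than a quarter of its members to respond, while a population ten times larger needs only about a third more responses.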
Table 2. Common types of sampling designs
Sampling Design Types Description
Probability Sampling
Strengths
● Simple to implement, analyze, and understand.
● Population members are all assigned to the
sample in one step.
Weaknesses
● Limited range of use cases. If the population is
non-homogeneous, some members may be
harder to include in the sample than others,
and a simple random sample will give biased
results.

Strengths
● Wider range of use cases
● Accounts for some types of under- and
over-representation
Weaknesses
● Requires additional setup and planning
Strengths
● Effective for studying hard-to-reach or hidden
populations.
● Can result in a more diverse and
knowledgeable sample as referrals are made
by initial participants.
Weaknesses
● Higher risk of inaccuracy due to selection,
response bias, and non-response error, caused
by survey respondents not representing the
population of interest.
Strengths
● Most realistic scenario for any sort of opinion
survey.
● Quick data collection, saving time and
resources.
● Easy implementation, cost-effective.
Weaknesses
● Without adjustments, the sample will
over-represent people with the time and
inclination to answer the survey.
○ Will not represent people who never
answer their phones.
● Higher risk of inaccuracy due to selection,
response bias, and non-response error, caused
by survey respondents not representing the
population of interest.
○ Can be made more representative with
adjustments for known variables like age
and sex.
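The adjustments for known variables mentioned in the table can be illustrated with a simple post-stratification weight: each group's share of the population divided by its share of the sample. The sketch below is a minimal example under invented assumptions. The age groups and counts are hypothetical, and real weighting schemes often adjust for several variables at once.

```python
def poststratification_weights(population_counts, sample_counts):
    """Weight per group = population share / sample share.

    Groups that are under-represented in the sample get a weight above 1,
    and over-represented groups get a weight below 1.
    """
    pop_total = sum(population_counts.values())
    samp_total = sum(sample_counts.values())
    return {
        group: (population_counts[group] / pop_total)
        / (sample_counts[group] / samp_total)
        for group in sample_counts
    }


# Hypothetical counts by age group (population from the membership list,
# sample from completed questionnaires).
population = {"18-34": 400, "35-54": 350, "55+": 250}
sample = {"18-34": 20, "35-54": 40, "55+": 40}

weights = poststratification_weights(population, sample)
for group, weight in weights.items():
    print(group, round(weight, 3))
# 18-34 2.0
# 35-54 0.875
# 55+ 0.625
```

A weight only corrects for the variable it is built on. If younger respondents who answered differ from younger members who did not, the weighted results will still be biased.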
Sources of bias
When selecting a sampling design, it is important to keep in mind the types of error that
commonly arise in the sampling process. The two most common types of errors are
non-response bias and selection bias. Non-response bias arises when the individuals who
respond to a survey or are observed in a study differ significantly from those who
do not respond or opt not to participate. Selection bias occurs when the selection of
the sample is non-random and some individuals have a higher or lower probability of
being selected. When conducting either a census or a survey, non-response bias is the
form of bias that is most concerning. High response rates will tend to reduce
non-response bias, which is part of the reason to consider a sample over a census: with
fewer units to reach, more follow-up effort can be devoted to each one.
Questionnaire design
Overview
A questionnaire (or form) is one of the survey instruments through which primary data
are systematically collected. It is a group or sequence of questions, prompts, or
statements designed to obtain information from a respondent. A questionnaire can
include open-ended questions, closed-ended questions, or a combination of both.
Questionnaires are critically important to data quality, and they also create an
impression on the survey respondents. The goal is to make sure respondents understand what they are
being asked and can provide the answers easily in a form that is suitable for subsequent
data processing and analysis. This is achieved through thoughtful questionnaire design,
where question topics are determined and precisely worded and ordered. SGIGs that are
also referencing the Conceptual Well-being Framework Tool and the Indicator Guide &
Recommended Indicators Tool —including the starting point questions they provide—
may use these as guidance in the planning, development, and customization of a
questionnaire.
Questionnaire best practices
A well-designed questionnaire should be efficient, consistent, and respondent-friendly.
An efficient questionnaire increases engagement by taking the minimum time necessary
for a respondent to complete and reduces costs by making responses easy for data
analysts to process. A consistent questionnaire reduces errors by ensuring that
respondents and analysts share a clear interpretation of the questions and answers. A
respondent-friendly questionnaire considers the experience of survey respondents in its
content, layout, and design, and aims to increase engagement and leave a good
impression on survey respondents. For example, a respondent-friendly questionnaire
considers the applicability of different questions for respondents living on- or off-lands.
Questionnaire layout
When designing a questionnaire, use the layout to make it easy for respondents to
understand and complete. Before presenting the questions, include an introduction that may
answer some frequently asked questions. For example, the introduction may include
project background and purpose, privacy and security information, the consent
statement, and details about timelines or incentives. It is important to also thank
participants for their time and make sure they understand the value and purpose of their
contributions.
Visually, consistency in formatting, effective use of white space, and well-aligned page
layouts contribute to the user experience, whether in print or online. Include headers with
the questionnaire title, footers with contact information, and branding elements, if
applicable, to help maintain a cohesive visual identity. Care should be taken with the use
of colour to ensure readability for all respondents, including those with colour vision
deficiencies. Usability testing, mobile-friendly design for online surveys, progress
indicators, and numbered questions aid in questionnaire effectiveness. Additionally,
response validation, clear language, and adherence to accessibility standards
contribute to a well-structured and user-friendly questionnaire layout, ultimately
enhancing data quality.
● Ensure all questions are phrased as true questions and not statements
● Avoid negatives
○ The word NOT is easy to overlook even when highlighted, and
processing negatives costs attention
● Be exact
○ Don’t assume knowledge - provide necessary context / detail to
answer accurately/consistently. E.g. Use terms like “in the last 7 days”
instead of “in a typical week”
● Pre-empt guessing
○ e.g. Use bins like “Between $20,000 and $30,000” for large quantities
that people are unlikely to know exactly
● Use skip patterns to justify assumptions
○ e.g. Ask “Do you use any time management software?”, and if the
answer is no, skip over “What time management software do you use?”
○ Provide respondents with the opportunity to skip a question or (better
yet) choose a response that indicates the question does not apply to
them, they do not know, or prefer not to answer
● Take into account the applicability of questions for on-lands and off-lands
residents
○ Consider whether all survey respondents have the information or
experience they need to answer a particular question and use skip
logic if they don’t
○ E.g., Be mindful about asking off-lands residents questions about
programs, services, or events that are only available on-lands
○ Define terms that might be ambiguous. For example, define
"community" and whether it refers to the community of the SGIG or the
respondent’s community of residency
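Skip patterns like the time management example above are usually configured in the survey software, but the underlying logic is simple. The sketch below is a minimal illustration only. The question IDs, the third question, and the "ask_if" convention are invented and do not correspond to any particular survey platform.

```python
# Hypothetical question list; IDs and the "ask_if" key are invented.
QUESTIONS = [
    {"id": "uses_software",
     "text": "Do you use any time management software?"},
    {"id": "which_software",
     "text": "What time management software do you use?",
     # Only asked when the earlier answer makes the question applicable.
     "ask_if": lambda answers: answers.get("uses_software") == "yes"},
    {"id": "lives_on_lands",
     "text": "Do you currently live on-lands?"},
]


def questions_to_ask(answers):
    """Return the IDs of questions whose skip condition (if any) is met."""
    return [
        q["id"]
        for q in QUESTIONS
        if q.get("ask_if", lambda a: True)(answers)
    ]


print(questions_to_ask({"uses_software": "no"}))
# ['uses_software', 'lives_on_lands']
print(questions_to_ask({"uses_software": "yes"}))
# ['uses_software', 'which_software', 'lives_on_lands']
```

The same pattern applies to on-lands and off-lands applicability: a question about an on-lands program can carry a condition on the residence answer so that off-lands respondents never see it.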
Testing
Effective questionnaire design requires thorough, iterative testing to collect meaningful
data. Designers should first self-administer the draft questionnaire, including reading it
out loud, to assess flow, clarity, accuracy, and alignment to analysis needs. To enhance
data quality and survey effectiveness, each question should then be evaluated from the
perspective of multiple individuals with diverse backgrounds and circumstances to
reveal insights on question interpretation and barriers to precise responses. Refine
question wording to eliminate confusing language, leading phrasing, double
negatives, repetition, and absolutes. Finally, review the responses against
expected outcomes and refine questions to align with analysis needs.
If feasible given time and budget, designers should next test the questionnaire with a
small group of data users and target respondents directly to assess flow, clarity,
accuracy, and alignment to analysis needs. This will help identify any problems with the
questionnaire that were not apparent during the initial review stage before finalizing it
and getting it ready for distribution.
Customization
Pre-made questions are readily available and can serve as a useful starting point when
designing a questionnaire. However, questionnaires should also be customized to ensure
relevance to each SGIG’s cultural and community context and priorities. The
customization section in the Indicator Guide & Recommended Indicators Tool includes
further information on customization relevant to questionnaire design.
Privacy
Data collection carries risks to privacy that must be considered carefully. Best practices
dictate that data collectors and analysts should be transparent about how information
will be used, limit collection of personal details, restrict data access, and remove
identifying details from aggregate data. Data collection requires balancing analysis
benefits with privacy protections. All data privacy and protection policies should be
aligned with SGIGs’ data governance program. See for reference the Data Governance
Framework Tool.
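One common way to keep identifying details out of aggregate data is to suppress counts for very small groups, since a reported count of one or two people can point to specific individuals in a small community. The sketch below illustrates the idea. The threshold of 5, the "housing" field, and the response data are assumptions made for the example, and the actual rule should come from the SGIG's data governance program.

```python
from collections import Counter

# Assumed threshold for illustration; set it according to the applicable
# data governance policy.
MIN_CELL_SIZE = 5


def suppressed_counts(responses, field, min_cell=MIN_CELL_SIZE):
    """Count responses per category, replacing small counts with None."""
    counts = Counter(r[field] for r in responses)
    return {
        category: (n if n >= min_cell else None)
        for category, n in counts.items()
    }


# Hypothetical responses to an invented housing question.
responses = (
    [{"housing": "owned"}] * 12
    + [{"housing": "rented"}] * 7
    + [{"housing": "unstable"}] * 2
)

print(suppressed_counts(responses, "housing"))
# {'owned': 12, 'rented': 7, 'unstable': None}
```

Suppression on its own is not always enough: if the total is also published, a single suppressed cell can be recovered by subtraction, so totals or other cells may need to be adjusted as well.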
Choosing software
When choosing software to conduct a survey or census, there are several important
considerations to keep in mind to ensure that the selected tool aligns with the project’s
specific needs and objectives. Some key considerations include:
● Survey Complexity: Some software options are better suited for simple
questionnaires with basic question types, while others offer advanced features for
complex questionnaires with branching logic, skip patterns, and randomization.
● Budget: Some software options offer free plans with limitations, while others
require a paid subscription. Consider the cost in relation to the features needed
and the value these bring to the project.
● Ease of Use: Consider the user-friendliness of the software. Will the platform be
easy to use for both administrators and respondents? Intuitive interfaces can
save time and reduce the risk of errors.
● Reporting and Analytics: Evaluate the reporting and analytics capabilities of the
software. Is in-depth data analysis required within the tool, or does it only need to
produce basic reports? The tool selected should align with analysis and reporting needs.
● Mobile Responsiveness: In today's mobile-centric world, make sure the software
offers mobile responsiveness so that respondents can complete surveys on
various devices.
● Qualtrics
● SurveyMonkey Enterprise
● QuestionPro Enterprise
● LimeSurvey (self-hosted)
Scenario: Drafting a questionnaire
The decision to conduct a census has sparked significant interest, with numerous staff
members putting forth their information needs in the hopes it will be incorporated into
the questionnaire. However, given all of these incoming information needs, the team
anticipates a challenge in creating an efficient, consistent, and respondent-friendly
questionnaire.
To address the multitude of competing interests, the team initiates the process by
prioritizing questions related to the previously agreed-upon indicators chosen from the
Indicator Guide & Recommended Indicators Tool. This step is crucial, as these indicators
form the core of their previously agreed-upon information needs.
The team then establishes a set of guiding principles endorsed by the senior
administrator to facilitate the systematic evaluation of other proposed questions and
information requirements in a transparent manner. These principles encompass:
Once the draft questionnaire is loaded into the survey software, the Sulingituk team
seeks to test the questionnaire to ensure it meets the four guiding principles and the
‘rules of thumb’, and to ensure that their decisions on question selection aren’t leaving
any major gaps in the eyes of the community. They also want to ensure that the
technology, links, and survey structure work as intended.
Given that many staff of the Sulingituk Government are also members, they decide to
use this group to test the questionnaire. Senior administration provides one hour for
Sulingituk members on staff to complete the questionnaire, and prizes are offered for
their participation. Participants are asked to identify any questions that are unclear or
uncomfortable to answer. They are also asked to point out any broken links or other
technical snags in the survey. Finally, they are asked to think about whether the
questionnaire covers the big things that the community will want to share or talk about
with the Sulingituk Government.
Based on the feedback, the Sulingituk team makes a number of changes to the
questionnaire:
● Some terminology is adjusted to reflect the community's worldview and common
language;
● Additional information is built into the introduction to the questionnaire to explain
privacy and confidentiality for all respondents;
● One question is added to reflect a gap in the questionnaire related to child care;
● Some unclear questions are refined to enhance accuracy, reliability, and clarity.
Communication strategy
Key considerations
Effective communication when conducting primary data collection ensures potential
survey participants understand the importance of the data collection exercise, how the
data will be used and safeguarded, and important timelines. This is the opportunity to
connect with potential survey participants, express gratitude for their time and
participation, and let them know if you are offering any incentives. The communication
strategy may also address concerns, build trust, and maximize participation by providing
transparency to participants. When the respondents feel that their input is valued and
their concerns are being addressed, they may be more likely to participate fully in the
survey. To achieve this, consider these guiding questions while creating the
communication strategy:
● What is the primary goal of the data collection efforts and why is it important?
● Who is the target audience for the collected data and what are their information
needs and preferences?
● What is the level of trust amongst participants in data collection and in the SGIG
holding citizen data? How can trust be built or reinforced?
● Who are the opinion leaders in the community and within families that can
encourage people to participate?
● Who are trusted messengers or people with large followings that respondents will
be connected to?
● What potential barriers or misconceptions may participants have about the data
collection and how can the concerns be addressed proactively? What are the
messages people need to hear about how the information will be treated?
● How can the messaging be tailored to resonate with the values and interests of
participants?
● What are the key events and times of year to target the messaging and
audience?
● What are the likely questions people will ask? What feedback mechanisms can be
implemented to continually gather input and insights from participants to refine
the communication strategy?
Think of the communication timeline as being composed of multiple waves, each with its
own time frame, target responses, and audience. As the project moves through each
wave, there is the opportunity to refine the approach based on successes and
challenges.
Communications methods
There are many possible communication products to develop and pathways to leverage.
These should be chosen based on the answers to the questions in the section above and
the key preferences of the target audiences. Options to consider:
● Production of a video that describes why this project is important and why people
should participate, with messaging delivered by key opinion leaders in the SGIG.
● A strong social media campaign that reflects the platforms used most by the
SGIG members (e.g. Facebook). This may include paid advertising where
appropriate. Another effective strategy is to identify any members or pages with
large followings among the target respondent group, and work with those
influencers to promote the project (with appropriate compensation for their time).
Partnering with other organizations with large followings among the SGIG’s
members should be considered to leverage their channels to promote
participation.
Various materials are useful to share at community events or door-to-door. This includes
materials such as posters, presentation slides, door knockers, and Frequently Asked
Questions (FAQs).
Respondent FAQs
Data anonymity
● Will my responses be anonymous?
● How will my data be aggregated and reported?
Data usage
● How will the collected data be used?
Contact information
● Who can I contact if I have questions or concerns about the survey?
● Is there a dedicated support team for respondents?
Survey deadline
● Is there a deadline for completing the survey?
● Can I request an extension if needed?
Incentives
Financial incentives are an effective way to increase survey responses. Develop an
approach to incentives that aligns with the specific goals of the project and caters to the
preferences of the audience. Consider: the types of incentives to offer; how frequently to
issue incentives; and, whether there are ways for respondents to increase their chances
of winning.
● Most likely to be motivated by a large prize. If so, consider offering a top
three prize draw at the end of the data collection process, with major incentives
(e.g. trips, electronics).
Consider if there are ways to increase the response or completion rates by establishing
different levels of incentives, or increasing opportunities for respondents to win. For
example, respondents could earn more entries in a prize draw through completing more
sections of the survey in a comprehensive way, or earn more entries through referring
eligible friends and family members.
Scenario: Engagement strategies
A key reason for the Sulingituk Government to undertake a census rather than a survey
was that they are interested in hearing from groups that typically do not otherwise
participate in community engagement efforts. These groups are challenging to reach, as
many feel stigmatized and have low trust in sharing their challenges and needs with the
Sulingituk Government, and some have very limited access to technology and no fixed
address. The Sulingituk Government develops tailored outreach and engagement
strategies to achieve accurate representation of these populations:
Enumeration
Enumeration methods
Enumeration refers to the process of collecting information from survey participants,
usually through questionnaires or interviews. Having varied enumeration methods allows
people to participate in the way most convenient and comfortable for them. Consider
providing various options, such as:
When selecting an enumeration method, SGIGs should consider the trade-offs between
respondent preference and cost, data quality, and timely responses. Depending on the
project objectives and constraints, SGIGs may find certain methods are more suitable
than others. For example, respondents often prefer mail-in questionnaires for their
convenience and privacy, as individuals can complete them at their own pace and in the
comfort of their own space. However, they are not always recommended, for several
reasons, including:
● Low response rates: Mail-in questionnaires typically have lower response rates
than other data collection methods, which can result in a less representative
sample and may require additional effort and cost to increase participation.
● Limited control and lengthy response times: There is little control over the timing
and completion of mail-in questionnaires. Respondents may take longer to
complete mail-in questionnaires, leading to delays in obtaining data, which may
not be suitable for the projected timeline.
● Costs: Printing, postage, and data entry costs can be significant when using
mail-in questionnaires, especially for large-scale surveys, possibly straining the
budget.
While mail-in questionnaires can be a valuable data collection method in specific
contexts, other methods like online surveys, phone interviews, or in-person interviews
may be more fitting alternatives for many projects.
Enumeration planning
Recruitment
In the pre-enumeration phase, the first step is to hire enumerators. The hiring process
involves developing job descriptions, specifying requirements and qualifications, and
outlining the necessary skills. To recruit suitable candidates, the job may be posted on
local job boards or shared through network connections. Conduct interviews to assess
the candidates' suitability for the role. SGIGs can consider these questions to shape their
enumerator recruitment strategy:
Community knowledge
● Are there specific dynamics within the community that may be better managed
by ensuring a diverse group of enumerators representing different groups within
the community?
Trust factors
● Are there trust issues within the community that may affect the level of trust
respondents have in enumerators?
Community employment opportunities
● Is it important to prioritize hiring enumerators from the community, and what are
the potential benefits or challenges associated with this approach?
● How can the recruitment strategy balance the need for community
representation with potential issues of trust, and/or the need for specific skills or
qualifications?
● Are there opportunities to utilize existing staff or resources within the organization
to support the enumerator workforce, such as training or logistical support?
● How can existing staff be integrated into the recruitment and training process
effectively?
● What age and gender mix among enumerators might be most effective for
engaging with the target population?
● Are there cultural norms or preferences within the community that should guide
the composition of the enumerator team in terms of age and gender?
Enumerators should also be trained on the survey instrument, data collection techniques,
and any specific protocols or instructions related to the project. In this training,
enumerators should be aware of any potential biases, cultural sensitivities, or ethical
issues that may arise during data collection and how to handle them appropriately.
Lastly, verify that enumerators have the necessary equipment, such as tablets or paper
surveys, and that this equipment is in good working condition.
Survey enumeration
In the survey enumeration phase, accurate enumeration lists need to be produced and
maintained to allow the enumeration supervisor to track the overall progress in logging
attempts and completions of eligible respondents. To produce the enumeration lists, the
membership list will be cleaned by removing duplicate entries and restructuring the data
set. Additional information from other sources, such as an in-community housing
list, may be added. The list will then be divided into smaller lists organized by
location and the presence of contact information—members without any contact details
will be put into a separate list. A template will be utilized to format the lists into a more
usable layout for the enumerators. The initial versions of the lists will then be refined
based on feedback and input from the enumerators. The lists will be consistently
updated and reproduced as surveys are completed in order to track progress.
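The list preparation described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the field names (`member_id`, `location`, `phone`, `email`) and the `NO_CONTACT` label are assumptions made for the example, and a real membership list will likely need more careful matching of duplicates.

```python
from collections import defaultdict

def build_enumeration_lists(members):
    """Deduplicate a membership list and split it into enumeration lists.

    Each member is a dict with hypothetical keys:
    'member_id', 'name', 'location', 'phone', 'email'.
    """
    seen = set()
    unique = []
    for m in members:
        if m["member_id"] in seen:
            continue  # drop repeated entries for the same member
        seen.add(m["member_id"])
        unique.append(m)

    # One list per location; members with no contact details go to their own list.
    lists = defaultdict(list)
    for m in unique:
        has_contact = bool(m.get("phone") or m.get("email"))
        key = m["location"] if has_contact else "NO_CONTACT"
        lists[key].append(m)
    return dict(lists)

# Illustrative records only
members = [
    {"member_id": 1, "name": "A", "location": "North", "phone": "555-0101", "email": ""},
    {"member_id": 1, "name": "A", "location": "North", "phone": "555-0101", "email": ""},
    {"member_id": 2, "name": "B", "location": "South", "phone": "", "email": ""},
    {"member_id": 3, "name": "C", "location": "North", "phone": "", "email": "c@example.org"},
]
lists = build_enumeration_lists(members)
for name, entries in sorted(lists.items()):
    print(name, [m["member_id"] for m in entries])
```

Re-running the same function on the updated membership data is one simple way to reproduce the lists as surveys are completed.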
When assigning lists to enumerators, consider their familiarity with the area, language
proficiency, and capacity to manage the workload effectively. The enumeration
supervisor will provide ongoing monitoring to redeploy enumerators, consider new
incentive approaches, and make recommendations to increase productivity and the
response rate.
Enumerators should log attempts, including details such as the date and time of the
attempt, the specific location involved, reasons for unsuccessful attempts, and any
additional comments or noteworthy observations. Additionally, enumerators should
execute any required follow-up communications, such as sending follow-up emails, and
log those as well. Real-time reporting or regular check-ins with the enumeration
supervisor are encouraged. Daily checks can confirm that surveys are complete and
validate participant responses, removing records from those who decline to participate
or provide invalid information. To gauge overall progress, track response rates and the
advancement of enumeration on a weekly basis. This comprehensive approach ensures
the smooth execution of the survey distribution process.
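As one possible way to structure the attempt log and the response rate tracking described above, the sketch below records each attempt and computes the share of eligible respondents with a completed survey. The `Attempt` fields and the outcome codes (`completed`, `no_answer`, `declined`) are hypothetical, and the dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Attempt:
    member_id: int
    when: datetime      # date and time of the attempt
    location: str
    outcome: str        # hypothetical codes: "completed", "no_answer", "declined"
    comment: str = ""   # reasons for unsuccessful attempts, other observations

def response_rate(attempts, eligible_count):
    """Share of eligible respondents with at least one completed survey."""
    completed = {a.member_id for a in attempts if a.outcome == "completed"}
    return len(completed) / eligible_count

log = [
    Attempt(1, datetime(2024, 3, 4, 10, 30), "North", "no_answer", "try evenings"),
    Attempt(1, datetime(2024, 3, 5, 18, 0), "North", "completed"),
    Attempt(2, datetime(2024, 3, 5, 11, 15), "South", "declined"),
]
rate = response_rate(log, eligible_count=4)
print(f"Response rate: {rate:.0%}")
```

Counting respondents rather than attempts keeps repeated visits to the same person from inflating the rate.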
Supervisors should monitor progress by tracking the number of completed surveys and
identifying areas where enumeration may be falling behind schedule or experiencing
excess capacity. To ensure effective data collection, they should be prepared to pivot
and redistribute resources, such as reallocating enumerators from areas with lower
workloads to those in need or adjusting the deployment schedule as required. Since
supervisors work closely with enumerators, they are well placed to motivate the team,
which fosters a positive work environment and enhances data quality and efficiency.
Careful quality control throughout enumeration provides a strong foundation for
analysis and mitigates the risk of inaccuracy.
● Seek commitment or follow-up if the respondent is unavailable
● Consider asking neighbors for advice on availability
Interviewing
Effective interviewing during data collection relies on the enumerator’s confidence,
listening skills, empathy, and speech. When meeting respondents, the enumerator should
introduce themselves and the project courteously, confirm the address if relevant, and
ask for the respondent's full name. The enumerator should assure respondents that their
information is confidential. Language barriers can be addressed through translation by
another household member, so long as the respondent feels comfortable. If respondents
seem uncomfortable, the enumerator can emphasize the incentives, the option to skip
questions, and the importance of participation. With reluctant respondents, the
enumerator can leave contact information so that they can complete the survey online
at a different time. Instruct enumerators to keep
accurate records of their calls or visits, including date, time, location (if relevant), and
any issues encountered during data collection.
Safety
Enumerator safety is a top priority and enumerators should be aware of potential
hazards and take steps to avoid and mitigate risks, especially when conducting
door-to-door enumeration. Interviews should only occur where the enumerator feels
safe and comfortable, including with respect to their mental health and safety.
Scenario: Trust building
The Sulingituk team knows that many members have trust issues with sharing their
personal data and information with the Sulingituk Government – and this was
emphasized strongly by participants in the questionnaire testing phase.
Accordingly, Sulingituk implements several measures to address these concerns to
build trust and encourage participation in the Census.
● Training and strict protocols: Census staff are trained to follow strict
protocols to protect respondents’ privacy. This training emphasizes building
trust with the community and adhering to ethical data collection practices.
● Policy: The data collection policy is made public and shared with any
interested respondent. This policy describes controls for data access and
measures to ensure privacy and confidentiality.
Processing data
Once the data have been collected and the enumeration phase is closed, analysts
should “clean” the data. Data cleaning means identifying and correcting errors and
inconsistencies. Then, analysts can move on to data processing, which transforms survey
responses obtained during collection into a form that is suitable for tabulation and data
analysis.
Data cleaning
Before analyzing data, it is important to ensure that they contain no obvious errors or
unusable responses. Some of the most common forms of erroneous survey data are:
● Duplicated entries,
For example, if a survey is enumerated online then a user may accidentally submit their
responses twice, leading to a duplicate entry in the data. It is important to remove
duplicate entries before analysis as otherwise the opinions of the person who
accidentally submitted their survey twice will have an inordinately large impact on the
results. Duplicated entries are easy to deal with: simply remove one of the copies of the
response from the data.
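A minimal sketch of removing accidental double submissions follows. It assumes each response carries a hypothetical `respondent_id` and a dictionary of `answers`, and it treats two records as duplicates only when both match exactly. Repeat submissions with different answers are left in place here, since deciding which copy to keep is a judgment call.

```python
def remove_exact_duplicates(responses):
    """Keep the first copy of each identical (respondent, answers) record."""
    seen = set()
    kept = []
    for r in responses:
        # A hashable summary of who answered and what they answered
        fingerprint = (r["respondent_id"], tuple(sorted(r["answers"].items())))
        if fingerprint in seen:
            continue  # an exact copy was already kept
        seen.add(fingerprint)
        kept.append(r)
    return kept

# Illustrative records only
submissions = [
    {"respondent_id": "R1", "answers": {"q1": "yes", "q2": "3"}},
    {"respondent_id": "R2", "answers": {"q1": "no", "q2": "5"}},
    {"respondent_id": "R1", "answers": {"q1": "yes", "q2": "3"}},  # accidental resubmission
]
cleaned = remove_exact_duplicates(submissions)
print(len(submissions), "->", len(cleaned))
```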
Incomprehensible and missing responses can cause more difficulties. It may be possible
to remove such responses from the data, but removing data increases the margin of
error and, therefore, leads to less precise results. In many cases it is preferable to impute
missing values, i.e. to insert an informed guess of what the missing value should be. A
common form of imputation uses the respondent's answers to other questions to find
similar respondents and infer the most likely response. For example, one could
randomly select a response from among respondents with the same age, gender, and city
of residence as the respondent who is missing a response. There are many more complex
means of imputing data, with entire textbooks devoted to the subject. One such textbook
is freely available online for those interested in the many imputation methods available.
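The example of borrowing a response from a similar respondent can be sketched as follows. This approach is often called hot-deck imputation. The field names (`age_group`, `gender`, `city`, `owns_home`) are assumptions for illustration, and the fixed random seed is there only to make the result repeatable.

```python
import random

def hot_deck_impute(records, field, match_on=("age_group", "gender", "city"), seed=0):
    """Fill missing values of `field` with a random value from matching respondents."""
    rng = random.Random(seed)

    # Group the observed values by the matching characteristics (the "donors").
    donors = {}
    for r in records:
        if r.get(field) is not None:
            key = tuple(r[k] for k in match_on)
            donors.setdefault(key, []).append(r[field])

    completed = []
    for r in records:
        r = dict(r)  # copy so the original records are left untouched
        if r.get(field) is None:
            pool = donors.get(tuple(r[k] for k in match_on))
            if pool:  # leave the value missing if no similar respondent exists
                r[field] = rng.choice(pool)
                r[field + "_imputed"] = True  # flag imputed values for transparency
        completed.append(r)
    return completed

# Illustrative records only
records = [
    {"id": 1, "age_group": "18-29", "gender": "F", "city": "A", "owns_home": "no"},
    {"id": 2, "age_group": "18-29", "gender": "F", "city": "A", "owns_home": "no"},
    {"id": 3, "age_group": "18-29", "gender": "F", "city": "A", "owns_home": None},
    {"id": 4, "age_group": "50+", "gender": "M", "city": "B", "owns_home": None},
]
completed = hot_deck_impute(records, "owns_home")
for r in completed:
    print(r["id"], r["owns_home"], r.get("owns_home_imputed", False))
```

Flagging imputed values makes it possible to report how much of a result rests on imputation rather than on observed responses.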
During data processing, the data cleaning team encounters a situation where
they discover duplicate names with different birthdates and the use of nicknames
that match birth dates but do not align with legal names on the membership list.
These discrepancies raise concerns about the accuracy of the data, as the data
cleaning team lacks the necessary community knowledge to verify the identity of
these individuals in the membership list.
To address this issue, the data cleaning team compiles a list of questionable
entries and organizes a meeting with the enumeration team, all of whom are
Sulingituk community members. During the meeting, they collaboratively work
through the list, confirming the legal names of each person and resolving
discrepancies. This effort aims to ensure the accuracy and integrity of the
membership data, fostering a more reliable foundation for the analysis phase.
Weighting
It is common, particularly in a census, to not fully reach the target population or sample
size. When processing data where the target population was not reached,
post-stratification or weighting techniques can be used. Post-stratification techniques
adjust survey data to align with the known target population demographics, reducing
bias and improving generalizability. Weighting techniques, which assign differential
weights to survey responses based on specific criteria, such as age, gender, and
geographic location, aim to ensure results reflect the overall target population.
Weighting survey data is slightly more expensive than using unweighted data. Moreover,
it introduces additional uncertainty into the survey results. A survey that is focused on
soliciting specific feedback may not need to be weighted because it is safe to assume
that people who feel strongly about the subject are quite likely to respond. Nonetheless,
there are many cases in which weighting is worthwhile.
● Response rates are much higher in one demographic group than in others.
○ Suppose two-thirds of respondents are over the age of 50 but more
than half of those being surveyed are less than 50 years old. Weighting
data by age ensures that the views of younger people are represented
fairly.
● A reliable and complete source of data that overlaps with survey responses is
available.
○ Ultimately, weighting relies upon comparing survey data to another
data source. If it is not possible to compare demographic responses in
the survey to an alternative, trusted source of demographic data then
it is not possible to apply weights.
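To illustrate the age example above, the sketch below computes simple post-stratification weights as each group's share of the target population divided by its share of the sample. The population shares (45% aged 50 and over, 55% under 50) and the responses are invented for the example. In practice, the shares would come from a trusted source such as a membership list.

```python
from collections import Counter

def post_stratification_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}

# Two-thirds of respondents are 50 or over (illustrative data)
sample = ["50+"] * 4 + ["under 50"] * 2
answers = [1, 1, 1, 0, 0, 0]  # 1 = agrees with a statement, 0 = does not

# Assumed shares of the target population, taken from a trusted source
population = {"50+": 0.45, "under 50": 0.55}

weights = post_stratification_weights(sample, population)
respondent_weights = [weights[g] for g in sample]

unweighted = sum(answers) / len(answers)
weighted = sum(w * a for w, a in zip(respondent_weights, answers)) / sum(respondent_weights)
print(f"Unweighted: {unweighted:.1%}  Weighted: {weighted:.1%}")
```

Because the over-represented group agreed more often in this example, the weighted estimate is lower than the unweighted one.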
Analysis
Data analysis involves summarizing the data and interpreting their meaning in a way
that provides clear answers to the questions that initiated the survey. Often, it consists of
interpreting tables and various summary measures, such as frequencies, means and
ranges, or more complex statistics when relevant.
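The summary measures mentioned above (frequencies, means, and ranges) can be computed with Python's standard library alone. The variable names and values below are illustrative only.

```python
from collections import Counter
from statistics import mean

# Illustrative responses only
ages = [23, 35, 35, 41, 58, 62]
tenure = ["own", "rent", "rent", "own", "rent", "rent"]

# Frequencies and shares for a categorical question
frequencies = Counter(tenure)
shares = {answer: count / len(tenure) for answer, count in frequencies.items()}

# Mean and range for a numeric question
summary = {
    "mean": mean(ages),
    "min": min(ages),
    "max": max(ages),
    "range": max(ages) - min(ages),
}

print(dict(frequencies))
print(shares)
print(summary)
```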
The analysis of the data should follow an analysis plan. The analysis plan is produced
prior to enumeration and shows how the data generated by the survey will meet the
information needs. It includes how the data generated by the survey will be processed,
which relationships will be examined, which statistical methods will be used to examine
those relationships, what criteria will be used to interpret the results, and how the results
will be reported. A successful analysis plan ensures that each aspect of the survey works
together to meet the objectives of the survey: that the variables used in the survey meet
the needs of the analysis, that the planned survey sample will meet the needs of the
statistical methods, that the outputs of the statistical methods have objective criteria
they can be evaluated against, and that the results will meet the information needs of
the survey.
As part of the analysis plan, drafting mock-up tables that show how the final results will
be reported is a good step to focus the team on the results that will be generated. These
tables serve as a visual representation of the expected outcomes and provide a clear
structure for presenting the survey findings. They can be designed to display the key
variables and relationships that will be examined in the analysis, as well as any subgroup
or demographic breakdowns that will be included. The mock-up tables also help to
ensure that the planned analyses will indeed produce the desired information, and they
can be used to identify any additional data processing or analysis steps that may be
needed. By visualizing the end product of the analysis in this way, the survey designers
can work more efficiently and effectively towards the survey objectives, and stakeholders
can have a clearer understanding of what to expect from the survey results.
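A mock-up table can be as simple as an empty shell with the planned rows and columns. The sketch below prints one as plain text. The table title, the breakdowns, and the `--` placeholder are assumptions chosen for the example.

```python
def mock_table(title, row_labels, col_labels, placeholder="--"):
    """Return a plain-text table shell with a placeholder in every cell."""
    width = max(len(s) for s in row_labels + col_labels + [placeholder]) + 2
    lines = [title]
    # Header row: a blank corner cell followed by the column labels
    lines.append("".join(s.ljust(width) for s in [""] + col_labels))
    for label in row_labels:
        cells = [label] + [placeholder] * len(col_labels)
        lines.append("".join(s.ljust(width) for s in cells))
    return "\n".join(lines)

shell = mock_table(
    "Table 1 (mock-up): Housing tenure by age group",
    row_labels=["Under 30", "30 to 49", "50 and over"],
    col_labels=["Own", "Rent"],
)
print(shell)
```

Reviewing shells like this before enumeration helps confirm that the questionnaire collects every variable the tables will need.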
In general, the development of the analysis plan and the data analysis itself should be
conducted by an external service provider that specializes in data analysis. Ideally, this
service provider will have been present throughout the process and so would have
guided other parts of the project to ensure consistency all the way through to data
analysis. There are certainly instances when SGIGs will have internal capacity to analyze
the data, but even then it’s not a bad idea to have external support to validate the
approach taken to analyzing the data.
● Does the design of your research project actually answer what you set out to
understand?
● Are there relevant comparison populations?
● What context is important for understanding this result?
● Does the presentation of the data/analysis articulate the most important
information?
Scenario: Weighting
While analyzing the census results, a situation arises where the youth population
significantly outperformed other age groups in terms of census participation. This
has implications for data analysis, policy formulation, and community
engagement. The overrepresentation of youth is linked to their active use of social
media, where census-related posts garnered attention and motivated many
young residents to participate. This now affects indicators related to education,
employment, and substance use, because results are skewed due to their high
participation.
The Sulingituk Government goes back to the core purpose of their census – to
obtain data to inform planning, decision-making, and budgeting. From this
perspective, they recognize that a representative report would provide the more
wholistic understanding that is crucial for effective planning, policy
formulation, and resource allocation.
Reporting
Reporting is about taking all of the work that’s been done and making it useful for
different audiences. Importantly, reporting is one of the major cost-drivers of the project
overall, so understanding your reporting requirements and then translating those into a
set of products that meets those requirements and budget is key. The initial set of reports
that will be generated from the dataset should largely be determined by the work done
earlier in the project timeline. The questionnaire was developed with a specific set of
statistics in mind, and the impetus for the project overall would have been clearly
articulated. The initial set of reporting products should speak to both, as well as to the
government's specific priorities that led to the desire to conduct the data collection
exercise in the first place. That said, new reporting requirements will invariably emerge through the
course of the project. Those new reporting requirements should be scoped through a
planning process. The planning process does not need to be too involved, but at a
minimum, the following questions should be answered:
● What are the key objectives and intended uses for reporting?
● Who are the target audiences and what formats are likely to resonate most?
● What is the budget and timeline that can be allocated to this product?
Reporting products end up being the most substantive legacy of the entire project. When
people talk about the project in the future, they’ll be thinking about the reporting
products that were generated.
Summary reporting
Written report
Usually, the form of the final report will be confirmed in the course of the project. It could
include one final major report composed of subsections that can stand alone, multiple
shorter summary reports concisely compiling key findings on major topics into brief
documents, or some combination of the two. Regardless of the form of the final report,
topic-based summaries are useful tools for gathering insights into findings on core
topics important to the SGIG. Summary reports can provide vital evidence to support
funding requests, development projects, and supportive programming across
departments. For instance, these summaries could cover topic areas such as cultural
vitality and identity, governance and jurisdiction, health and wellness, labour market
indicators, housing, and environmental stewardship, depending on the content of the
questionnaire and interest.
Interactive web tools
Immersive and interactive web-based tools are a way to present findings in a visually
appealing and user-friendly manner. These tools can provide custom insights and allow
users to explore specific indicators through charts, graphs, gauges, maps, and filters.
Some of the interactive web tools that can be produced using collected data to support
SGIG priorities are:
Internal analytics portal: A business intelligence platform providing staff with powerful
tools to quickly answer research questions related to key planning needs. The platform
would allow staff to gain insights into the SGIG’s needs and priorities, identify service
gaps, and make data-driven decisions.
The deliverables could range from basic cross-tabulation summaries exported into Excel
to more advanced integrated data visualizations and reports. While more complex ad
hoc requests may require deeper research paired with the existing data inputs, creating
a centralized process can facilitate rapid turnaround to address emergent needs.
The knowledge transfer workshops could also focus on practical applications of the
insights gained from the project. Attendees can learn how to use the insights to inform
program planning and decision-making processes. The knowledge transfer workshops
aim to provide SGIG staff with the necessary skills and knowledge to use data for
program planning and decision-making.
Scenario: Reporting
At the beginning of the census project, the Sulingituk Government chose a set of
key health and wellness indicators, which included asking respondents about their
rates of substance use. The data showed elevated rates of substance use among
older adult males. Sulingituk Government wants to report these data to the
community, but does not want to reinforce stereotypes that perpetuate racism, or
create stigma that disincentivizes people from seeking help.
● Presents clearly the rates of substance use among all age groups in a factual
manner.
● Describes the intention for using these data to tailor programs to the
demographics in highest need for those programs.
● Invites older adult males who use substances to review the draft report,
incorporating their views and perspectives prior to releasing the report to the
community.
External support and RFP guidelines
Should you recruit external support?
When considering the hiring of external support for primary data collection, several
factors come into play, including the project’s scope and objectives, the time and
capacity of any in-house expertise, budget, project timelines, and the existence and
availability of trusted service providers, among others. Primary data collection is made
up of several activities, and it’s possible to go to market on some, but not all, of these
activities, using one or several service providers:
● Scoping/Statement of objectives
● Questionnaire design
● Communications strategy
● Survey enumeration
Objectives
Background
● Detail the main activities or tasks that need to be completed as part of the project.
● Specify the project's timeline, including start and end dates, as well as any key
milestones.
Deliverables
Mandatory criteria
● Define the criteria that will be used to evaluate service provider proposals and
indicate the relative importance of each criterion.
● Include any legal or contractual terms and conditions that service providers must
adhere to.
Evaluating proposals
SGIGs may follow their pre-existing evaluation process for RFPs while considering the
specific requirements for data collection. This includes evaluating the proposed
methods, including sampling techniques, data collection tools, and data quality
measures. Given the sensitivity of data in many projects, SGIGs should inquire about the
proposed data security measures. Questions should be raised regarding how data
privacy and confidentiality will be maintained throughout the data lifecycle, from
collection to analysis and storage, and regarding who will own the data.
Similarly, assess the proposed risk mitigation strategies and contingency plans. SGIGs
may also inquire about the proposed data quality assurance processes, including data
cleaning, validation, and verification. Understand how potential biases will be addressed
and how data integrity will be maintained throughout the analysis. Regarding the
interview process, questions might revolve around the experience and expertise of the
proposing team, their track record in similar projects, and their approach to handling
potential data collection and analysis challenges.
● What software or tools do you use for data collection and data analysis, and
why do you choose these particular tools?
● What measures do you have in place to ensure the quality and integrity of the
data throughout data collection and analysis?
● How do you handle and store data securely during the project?