TOPIC 3: DEMOGRAPHIC DATA AND METHODS
WHY IS DEMOGRAPHIC DATA IMPORTANT?
Demographic data collection
1. Research
a. Demographic context is often an important determinant (or control variable); e.g. household
composition in studies of consumption, health, fertility
b. What causes demographic change?
2. Policy
a. How many schools?
b. How many teachers to train?
c. How to organize retirement plans?
d. Where to focus resources? E.g. where to build roads, or how to assess damage after
natural catastrophes
e. Update population projections
3. Business
a. Where to build a supermarket?
b. How much to charge for life insurance?
Democratization of data
❖ Nowadays, anyone with internet access can use demographic data.
❖ Some of the most important repositories of this kind of data are: United
Nations Population Division (Demographic Yearbook), national statistics institutes,
CIA, World Bank, UNAIDS (HIV/AIDS Epidemic Update), United Nations
Development Programme (Human Development Report), United Nations
Environment Programme, universities (e.g. World Wealth and Income Database), etc.
For micro-level data (samples from censuses), important sources are
IPUMS and IPUMS International.
❖ You can even find historical data on websites like Gapminder (which stores data
for the past 200 years), ClioInfra, the Maddison Project (which has estimated
GDP for all countries of the world for the past 200 years), etc.
❖ But where does this information come from? From many different sources. The most
important are: censuses, vital registration, sample surveys and population registers.
Data and visualization
OUTLINE FOR TODAY
1) Types of demographic data collections
● Census
● Administrative data
○ Population registry;
○ Vital registrations;
● Sample Survey
2) Coverage and errors in demographic data
3) Advanced demographic techniques
● Indirect methods
● Capture-recapture
4) Data sources for you to use
TYPES OF DEMOGRAPHIC DATA COLLECTION
1. Types of demographic data collection
A) DEMOGRAPHIC CENSUS
The primary source of data on population size, distribution, structure and characteristics is the
census of population. The major source of information on the population processes of births
and deaths is the registration of vital statistics, complemented by population registers (in a few
countries) or sample surveys (in developing nations). Administrative and historical data also
provide a lot of information (population changes at a local level, mobility, etc.).
A demographic census has the following characteristics:
❖ A count of everyone in an area
❖ At the same time
❖ Periodic (e.g. the UN recommends taking one every ten years; in Spain the census is
decennial, while yearly figures come from the population register)
❖ Compulsory participation: if you are asked to take part in a census, you are
legally required to do so.
❖ Not restricted to demographic data, also social and economic
Carrying out a census is a massive and expensive operation. It involves mapping
households, mobilizing and training enumerators (e.g. 6.5 million enumerators in China),
public campaigning (so that people are aware that a census will take place), compiling
questionnaires (a dedicated bureau usually decides the questions, and public and
private actors may be involved), as well as publishing, analyzing and disseminating the data.
The United Nations Statistics Division (2008) noted that "it requires mapping the entire
country, mobilizing and training enumerators (…) and analyzing the data". In practice, this
does not mean that every person is seen and interviewed, but that one adult in the household
answers for all the people living there.
The questions asked may vary from census to census, and they depend in part on the political
interests of the party in government. Questions that are usually asked include:
name, year of birth, sex, race/religion/ethnicity, marital status, education, occupation,
employment, migration, housing, etc. The data collection can be carried out in different ways:
❖ Enumerator completion: a census taker goes through the questionnaire with the
respondent and asks the questions directly, which helps make sure that the
questions are understood.
❖ Respondent completion (self-enumeration): people are handed the
questionnaires and fill them out themselves. The questionnaire may be answered by:
➢ ‘Head of household’
➢ Any adult
➢ Every adult
❖ Anonymized: the information is separated from the name (name and ID are erased
after the census). This is important for confidentiality.
❖ New: completion via internet (nowadays in developed world)
Short history of census taking
The earliest governments to carry out population censuses were those of the ancient
civilizations of Egypt, Babylonia, China and Rome. In Rome, people were counted for
taxation and for military purposes (a count of men and wealth). Censuses were usually carried
out to see how many men were available for work, or to know how much money
or wealth was available to be taxed. Also interesting are the censuses taken by the pre-Columbian
Incas with the help of quipus, devices made of many knotted strings used to record the data.
Modern national censuses of Spain:
● 1768 census by Conde de Aranda: 9,308,804 people counted
● 1787 census by Conde de Floridablanca and 1797 census by Manuel Godoy: information on
sex, age and marital status, occupation, whether the household head paid taxes
● Since the general commission of statistics was created in 1857, censuses have taken
place regularly: in 1857, 1860, 1877, 1887, 1897.
● Since 1900: national statistics offices were created and censuses have taken
place every 10 years.
Spain
The last census was carried out in 2011, the same year in which every other European country
carried one out. They all agreed, more or less, on a common set of questions (to make studies
comparable across Europe). It was an initiative to promote a common census for Europe.
Even though censuses can be a source of national pride, they have traditionally generated a lot
of resistance from the population. In Germany, for example, the last census
before 2011 was carried out in 1987, with very unreliable results.
Censuses were then suspended because they were so unreliable, and resuming them required a
change in the law. Once the law was changed, the results of 2011 were reliable.
B) VITAL STATISTICS/REGISTRATION
● Registro Civil: births, deaths, marriages and divorces, adoptions, name changes,
naturalizations. However, how do we get information on migration? The civil register
captures it only imperfectly, if at all.
Additional information (read)
Registration of vital events started as a chore of the church. Priests often recorded baptisms,
marriages and deaths, and historical demographers have used the surviving records to
reconstruct the demographic history of parts of Europe. At first, death records only included the
cause of death; later, in the 18th century, the age of those dying was also included, although
people remained skeptical about the data recorded. Civil registration of births and deaths
then became compulsory, and an office of vital statistics was officially established by the English
government. In 1900 these certificates were standardized.
Although most nations have a system of birth and death registration that is separate from
census activities, a lot of countries maintain population registers, which are lists of all people
in the country and which also record births, deaths, marriages and changes of residence. These
exist basically for administrative purposes, but they are also helpful for demographic research.
C) POPULATION REGISTER
Registering in the Padrón Municipal gives you access to the public healthcare system in the place
where you live, as well as the right to vote in that municipality. Because people have an incentive
to register, it is a very important source of demographic data, especially for migration. The main
problem is that the information is limited by law. It includes:
• Family name, first name, gender, ID number, address
• (Highest degree / education)
• Birth date and place
• Date of settlement in a municipality
• Previous places of residence
• Date of moving out of a municipality
• Date of death
Data collections
Would you rather have a census or a population register? Why?
Probably a combination of both is better. A census tends to include more
information than a register, but it is typically carried out only every 10 years, whereas a
register is updated continuously.
Recent developments
• The Netherlands now runs a ‘virtual census’: instead of a separate enumeration, it brings
together and combines all available data from existing sources. One of its main advantages is
lower cost. A possible drawback is that many resources are needed to set it up → other
countries may need to complement the data with other tools (e.g. information obtained
directly from the poorer layers of society)
• The statistics office (CBS) links all government data sources
• Population register; taxation; benefits; retirement; student loans; CBS surveys; etc.
EU census-taking
The way the census is carried out varies across European states.
• Traditional census: the whole population fills in a questionnaire (e.g. UK).
• Register-based: the Scandinavian countries usually don’t carry out an additional census, but
rather use their own register data.
• Rolling census (FR): a questionnaire is handed to about 5% of the population every year, so
coverage accumulates over a multi-year cycle instead of on a single census day.
Count of total population in Spain
Between censuses, and after the last one, the population is obtained by inter-censal
calculation: starting from the last census count (Ptc), add the counts of births and deaths and
the migration registers (out-migration and new nationals):
Pt = Ptc + B - D + IN - OUT
where B is births, D is deaths, IN is in-migrants and OUT is out-migrants since the last census.
*OUT: out-migrants are identified when they de-register from the municipality where they had
been living
• Produced twice per year and published on INE website: 1 January and 1 July
• Counter-check with next census
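The balancing equation Pt = Ptc + B - D + IN - OUT can be illustrated with a minimal Python sketch. The function name and all figures are hypothetical and chosen only for illustration; they are not INE data.

```python
def update_population(p_last_census, births, deaths, in_migrants, out_migrants):
    """Roll a census count forward with the balancing equation.

    Pt = Ptc + B - D + IN - OUT
    All inputs are counts for the period since the last census.
    """
    return p_last_census + births - deaths + in_migrants - out_migrants


# Hypothetical municipality: 1,000 people at the last census,
# then 50 births, 30 deaths, 20 arrivals and 10 departures.
estimate = update_population(1000, births=50, deaths=30,
                             in_migrants=20, out_migrants=10)
print(estimate)
```

When the next census arrives, the difference between its count and this rolled-forward estimate indicates how much error has accumulated in the registers.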
D) SAMPLE SURVEYS
● They are often used for research, either by a private institution or by a researcher.
● They can contain any sort of question that the organization is interested in asking.
● They involve a sample of the population, which needs to be representative (of the
geographic unit under study).
● They are often carried out as part of the census (as in the US and Spain) to obtain
more extensive and continuous information.
● This is how most demographic data for developing countries are collected (e.g. data
on diseases, child mortality, economic factors). In Western countries, by contrast,
this information is mostly obtained from the census.
● Some examples are the Continuous Household Survey (INE), the Generations and
Gender Programme (GGP), the European Social Survey or the Demographic and Health
Surveys (DHS).
→ The DHS programme is funded by the US Agency for International Development (USAID)
and is carried out only in selected countries. It may concentrate on the spread of diseases
(mainly AIDS) or on other areas of study.
European social survey
ADVANTAGES AND DISADVANTAGES OF CENSUS
COVERAGE OF CENSUS: DE JURE VERSUS DE FACTO POPULATION
De jure vs de facto population
• De jure: the population that legally belongs to an area (living there, owning land), whether
present or absent on census day. People are counted at their usual residence (where they
spend most of the year).
• De facto: everyone present on census day, whether resident, transient or tourist; this is what
the census enumerator counts. It covers all the population in a territory at that precise moment.
It can be difficult to measure because of population movements (e.g. during a holiday the de
facto population may be much higher or lower than usual).
Errors and inconsistencies
1. Coverage errors: these include undercount (people who are not counted) and overcount
(people who are counted more than once). Some situations in which they arise are:
● People missed by the census => often a differential undercount (particular groups
of people are more likely to be undercounted; e.g. in the US, Black Americans are
more likely to be undercounted, which is often attributed to greater distrust of the
government; this leads to biases in the data)
● People missing from registers (e.g. children in China).
● Survey non-responders (a problem if those who don’t respond differ systematically
from those who do, e.g. if people with very high incomes refuse to respond because
of it). Other examples are newborns until they are first
registered; migrants who don’t want to be found; the homeless.
Problem: How do you know what you do not know or know wrong? It is useful to compare
data sources:
- Estimate what population should have been based on registers and compare with
actual census.
- Match people in census with surveyed people.
2. Content errors:
– These include inaccurate responses (e.g. because people do not understand the
questions) and nonresponses to particular questions.
– An example of an inaccurate response is age heaping: when people are not sure of their
real age, they report a rounded figure, so reported ages pile up on multiples of 10 and of 5.
Demographers encounter this problem regularly and try to minimize its negative effects.
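One standard way to quantify age heaping is Whipple's index. It is not named in these notes and is introduced here only as an illustration. It compares the number of people reporting ages ending in 0 or 5 (between 25 and 60) with one fifth of everyone reporting ages 23 to 62. A value of 100 means no preference for rounded ages; 500 means every reported age ends in 0 or 5. The sketch below is a minimal version with hypothetical data.

```python
def whipple_index(counts_by_age):
    """Whipple's index of age heaping.

    counts_by_age: dict mapping single year of age -> number of people
    reporting that age. Ages missing from the dict count as zero.
    """
    # People reporting ages 25, 30, ..., 60 (ending in 0 or 5)
    heaped = sum(counts_by_age.get(age, 0) for age in range(25, 61, 5))
    # Everyone reporting ages 23 to 62 inclusive
    total = sum(counts_by_age.get(age, 0) for age in range(23, 63))
    if total == 0:
        raise ValueError("no population reported between ages 23 and 62")
    # 8 of the 40 ages end in 0 or 5, hence the factor of 5
    return 100 * 5 * heaped / total


# Hypothetical data: 10 people at every age, so no heaping at all.
uniform = {age: 10 for age in range(0, 100)}
print(whipple_index(uniform))

# Hypothetical data: everyone reports an age ending in 0 or 5.
fully_heaped = {age: 10 for age in range(0, 100, 5)}
print(whipple_index(fully_heaped))
```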
3. Inconsistencies: If any of the data in a census are collected on a sample basis, sampling
error is introduced into the results. With any sample, differences are likely to exist between
the sampled group and the larger group. However, in a scientific sample, sampling error can be
measured using the mathematics of probability. Error can be controlled to a certain extent:
the larger the sample, the smaller the error. Other sources of inconsistency:
– Definitional: work, gender, race
– Demographic labels as social constructs
Administrative data = often politics
What is measured, and how it is measured, has historical roots but is also related to
politics. Administrative data are usually tied to
political interests, and high stakes can lead to deliberate errors: for example, the US
Census determines budget allocations and seats in the House of Representatives; censuses
carried out under dictatorships or during civil unrest raise similar concerns. Governments and
groups with political interests are often accused of misusing data.
Along the same lines, demography consists of categorizations. Such categories are not always
naturally 'given' but are also socially produced 'artefacts', such as religion, race/ethnicity (as
used in the US, it is not a biological concept but rather a subjective and social one: how is
Hispanic a race?), kinship, employment or gender. The example of Peru during the 18th century
is very clear: difficulties appeared when trying to categorize people into races, which are
socially constructed.
Demography consists of categorizations
We do not always fit into a category, since categories are not always naturally ‘given’ but are
also socially produced ‘artefacts’: religion; race/ethnicity; kinship; employment; gender
→ Race/ethnicity: race (as used in the US) is not a biological concept but rather a social one
(is Hispanic a race? or should we say Latinos?). It is also a subjective label used to
categorize;
e.g. people of mixed origin must choose one race (or more)
Demographic labels
Demographic labels are related to political sensitivities in a context-specific setting. This
means that in some countries asking about religion, ethnicity or language is banned
because of the values or principles of the country where the census is carried out. The choice
of labels may also be politically biased.
● In France, questions on religion, ethnicity and language in the census are forbidden by
law
● The 2006 Nigerian census included no questions on religion and ethnicity, after protests
● In the US, the census asks about ethnicity and race, but questions on religion are
not asked (forbidden)
3. ADVANCED DEMOGRAPHIC TECHNIQUES
How do we estimate demographic statistics when we don’t have good data/don’t trust the
data we have/we want to contrast the results we have?
Statistics have different data requirements!
● Inhabitants = Number of people in country X at a given date in the year
● Crude birth rate = Number of births per 1,000 population per year
○ Births in a year (flow data)
○ Average number of people in that year, usually at midyear (stock data)
● Life expectancy at age x = Expected years to live at age x assuming constant mortality
rates at each age
○ Deaths by ages x to x+n
○ People alive at age x to x+n
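The different data requirements listed above can be made concrete with a small Python sketch. It computes a crude birth rate (flow data over stock data) and an age-specific death rate, the basic ingredient of a life table. Function names and figures are hypothetical, for illustration only.

```python
def crude_birth_rate(births_in_year, midyear_population):
    """Births per 1,000 population per year (flow / stock * 1000)."""
    return 1000 * births_in_year / midyear_population


def age_specific_death_rate(deaths_in_age_group, people_in_age_group):
    """Deaths at ages x to x+n divided by people alive at ages x to x+n."""
    return deaths_in_age_group / people_in_age_group


# Hypothetical country: 15,000 births and a midyear population of 1,000,000.
print(crude_birth_rate(15_000, 1_000_000))

# Hypothetical age group: 50 deaths among 10,000 people aged x to x+n.
print(age_specific_death_rate(50, 10_000))
```

Note that the birth rate needs both a register of events (births) and a population count, while life expectancy additionally needs deaths and population broken down by age.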
Number of inhabitants: census data vs vital registration vs population register
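• What if only census data are available? --> e.g. linear interpolation (basic demographic technique):
P = Pc0 + (y - yc0) * (Pc1 - Pc0) / (yc1 - yc0)
where c0 is the first census and c1 the second census, Pc0 and Pc1 are the populations counted
in them, yc0 and yc1 are the census years, and P is the population in year y (in between censuses)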
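The interpolation formula translates directly into code. The following is a minimal sketch; the function name and the census figures are hypothetical and are not real counts.

```python
def interpolate_population(y, y_c0, p_c0, y_c1, p_c1):
    """Linear interpolation of population between two censuses.

    P = Pc0 + (y - yc0) * (Pc1 - Pc0) / (yc1 - yc0)
    """
    if y_c1 == y_c0:
        raise ValueError("the two censuses must be taken in different years")
    return p_c0 + (y - y_c0) * (p_c1 - p_c0) / (y_c1 - y_c0)


# Hypothetical censuses: 40.0 million in 2001 and 46.0 million in 2011.
p_2005 = interpolate_population(2005, 2001, 40_000_000, 2011, 46_000_000)
print(p_2005)
```

Linear interpolation assumes the population changed by the same absolute amount every year between the two censuses, which is a simplification.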
Advanced demographic techniques are commonly used when some data on the topic of interest
are already available but are inconsistent or incomplete (surveys or
censuses do not provide enough information; many countries lack good vital registration or
census data, have no population register, etc.), or when there are hidden populations that are
not covered by the data. The two main techniques are indirect estimation and capture-recapture.
1. Indirect estimation
Indirect estimation relies on collecting data, in special surveys (e.g. DHS), on the demographic
events of respondents and of the people around them. Respondents are asked questions about
themselves but also about their relatives (partner, siblings,
children, and/or parents). As a result, fewer people are needed in the sample. Examples are
the orphanhood method and sibling histories.
Historical sources of demographic information include censuses and vital statistics, but the
lack of good data requires special work to locate birth records in churches and death records
in graveyards. A complete set of good local records for a small village may allow a
researcher to reconstruct the demographic profile over a period of years. Another source of
information is family genealogies, the compilation of which has become increasingly
common.
The results of these labors can be of considerable importance in testing our knowledge about
how the world used to work. By quantifying our knowledge of past patterns of demographic
events we are also better able to interpret historical events in a meaningful fashion.
2. Capture-recapture to estimate hidden populations (Srebrenica article)
The authors use this method to estimate how many people died in the Srebrenica genocide.
They matched two different sources and used the overlap between them to estimate the total
number of deaths more accurately.
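The logic of capture-recapture with two sources can be sketched with the Lincoln-Petersen estimator, the simplest two-list version of the method. This is an assumption for illustration and not necessarily the model used in the Srebrenica article. If list A has n1 names, list B has n2 names, and m names appear on both, the estimated total is N = n1 * n2 / m. The identifiers below are hypothetical.

```python
def lincoln_petersen(list_a, list_b):
    """Two-source capture-recapture estimate of total population size.

    N = n1 * n2 / m, where m is the number of individuals on both lists.
    Assumes the two lists are independent and records are matched correctly.
    """
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between lists: estimate is undefined")
    return len(a) * len(b) / overlap


# Hypothetical example: list A records persons 1-60, list B persons 41-90,
# so 20 persons (41-60) appear on both lists.
list_a = range(1, 61)
list_b = range(41, 91)
print(lincoln_petersen(list_a, list_b))
```

The two lists together name only 90 distinct people; the estimate is higher because it also accounts for people who appear on neither list.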