This note explains how to work with Danish CPR numbers in R. Most of the information here is based on this wonderful Wikipedia article.
I've made an R package to handle validation of Danish CPR numbers. It can be installed as follows:
# Requires the devtools package to be installed in order to work
devtools::install_github("ekstroem/DKcpr")
The first 6 digits of the CPR number represent date-of-birth in the
format DDMMYY. Since some people live longer than 100 years, these
digits do not uniquely specify the year in which a person was born. For
example, the string 101010 could represent either October 10th, 2010 or
October 10th, 1910.
The 7th digit of the CPR number, combined with the two-digit year,
determines the century, but the cut is not trivial. The date_of_birth()
function handles this and returns the date-of-birth as an R Date object
(printed in the format YYYY-MM-DD). As input it accepts a vector of strings.
For example:
library("DKcpr")
cpr <- c("1010104321", "1010978726", "2310450637", "1010978726")
date_of_birth(cpr)
## [1] "2010-10-10" "1897-10-10" "1945-10-23" "1897-10-10"
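To illustrate why the cut is not trivial, here is a minimal sketch of the century rule as I understand it from the published CPR conventions: a 7th digit of 0-3 means 1900-1999; 4 or 9 means 2000-2036 for years 00-36 and 1937-1999 otherwise; 5-8 means 2000-2057 for years 00-57 and 1858-1899 otherwise. The helper cpr_birth_year() is hypothetical and not part of DKcpr. It assumes well-formed 10-digit strings and does no validation.

```r
# Hypothetical helper (NOT part of DKcpr): full birth year from a 10-digit CPR string.
# Assumes the input has already been checked to be 10 digits.
cpr_birth_year <- function(x) {
  yy <- as.integer(substr(x, 5, 6))   # two-digit year (digits 5-6)
  d7 <- as.integer(substr(x, 7, 7))   # 7th digit selects the century
  century <- ifelse(d7 <= 3, 1900L,
             ifelse(d7 %in% c(4, 9),
                    ifelse(yy <= 36, 2000L, 1900L),   # 7th digit 4 or 9
                    ifelse(yy <= 57, 2000L, 1800L)))  # 7th digit 5-8
  century + yy
}

cpr_birth_year(c("1010104321", "1010978726", "2310450637"))
# 2010 1897 1945
```

The years agree with the dates returned by date_of_birth() for the same numbers.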
We get NA if we enter CPR numbers that refer to illegal dates, do not match the format, or contain text.
date_of_birth(c("3510104321", "2902191234", "1111111", "Curious George"))
## Warning: 4 failed to parse.
## [1] NA NA NA NA
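The four inputs above fail for two different reasons, which a rough base R sketch can separate. The helper has_cpr_format() is hypothetical and not part of DKcpr, and the century digits in the date strings are filled in by hand for illustration.

```r
# Hypothetical helper (NOT part of DKcpr): TRUE if x consists of exactly 10 digits
has_cpr_format <- function(x) grepl("^[0-9]{10}$", x)

x <- c("3510104321", "2902191234", "1111111", "Curious George")
has_cpr_format(x)
# TRUE TRUE FALSE FALSE

# The first two have the right format but are not legal dates:
# there is no day 35, and 1919 was not a leap year.
is.na(as.Date(c("35102010", "29021919"), format = "%d%m%Y"))
# TRUE TRUE
```

The last two inputs never reach the date step because they are not 10-digit strings.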
Working with the exact dates is easily done with the lubridate
package. For example, to extract the year we can use the year()
function:
library("lubridate")
dob <- date_of_birth(cpr)
year(dob)
## [1] 2010 1897 1945 1897
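Another common task is computing age on a given date. The sketch below uses lubridate's interval() and years(); the dates of birth are typed in by hand so that it runs without DKcpr, and the reference date is an arbitrary choice for illustration.

```r
library("lubridate")

# Dates of birth entered by hand (same values as the output above)
dob <- as.Date(c("2010-10-10", "1897-10-10", "1945-10-23"))
ref <- as.Date("2020-01-01")   # arbitrary reference date, assumed for this example

# Age in completed years on the reference date
age <- interval(dob, ref) %/% years(1)
age
# 9 122 74
```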
The is_cpr() function determines whether a string is a valid
CPR number. It returns TRUE if it is a valid CPR number, FALSE if
it is a legal date but does not fulfill the modulo 11 check, and NA if it is not a legal 10-digit number or date.
is_cpr(cpr)
## [1] TRUE FALSE TRUE FALSE
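For reference, the classic modulo 11 check multiplies the ten digits by the weights 4, 3, 2, 7, 6, 5, 4, 3, 2, 1 and requires the sum to be divisible by 11. The helper mod11_ok() below is a hypothetical sketch of that checksum only, not the implementation used by is_cpr(). As far as I know, numbers issued since 2007 are no longer guaranteed to satisfy the checksum, so a package may apply additional rules and its results can differ from this plain check.

```r
# Hypothetical helper (NOT part of DKcpr): plain modulo 11 checksum.
# Assumes every element of x is a string of exactly 10 digits.
mod11_ok <- function(x) {
  weights <- c(4, 3, 2, 7, 6, 5, 4, 3, 2, 1)
  vapply(x, function(s) {
    digits <- as.integer(strsplit(s, "")[[1]])   # split into single digits
    sum(digits * weights) %% 11 == 0             # weighted sum divisible by 11?
  }, logical(1), USE.NAMES = FALSE)
}

mod11_ok(c("2310450637", "1010978726"))
# TRUE FALSE
```

For 2310450637 the weighted sum is 99, which is divisible by 11; for 1010978726 it is 158, which is not.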
The gender() function returns the gender of each individual, coded as 0 = female and 1 = male.
gender(cpr)
## [1] 1 0 1 0
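The gender is encoded in the last digit of the CPR number: odd for males, even for females. A minimal sketch of that rule, using a hypothetical helper cpr_gender() that is not part of DKcpr and assumes well-formed 10-digit strings:

```r
# Hypothetical helper (NOT part of DKcpr): 1 if the last digit is odd (male),
# 0 if it is even (female), matching the coding used by gender() above.
cpr_gender <- function(x) as.integer(substr(x, 10, 10)) %% 2L

cpr_gender(c("1010104321", "1010978726", "2310450637", "1010978726"))
# 1 0 1 0
```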