Output of discord_data() may return multiple rows per kin-pair if 'id' column has non-unique values
#6
The sample data built into the discord package contains 1200 rows of single-entered data from the NlsyLinks package, with height and weight for kin pairs. The column ‘extended_id’ is not a unique identifier for kin pairs, but rather a family (or similar grouping) identifier. For a family with three kin, we would see something like the following (from NlsyLinks):
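A toy sketch of the same structure, using made-up subject ids rather than actual NlsyLinks rows (the column names `subject_1` and `subject_2` are hypothetical), shows how one family of three kin produces three pairs that all share one extended id:

```r
# Hypothetical family: one extended_id, three kin (made-up subject ids).
kin <- c(101, 102, 103)

# Every unordered pair of kin within the family (one row per pair).
pair_matrix <- t(combn(kin, 2))

family_pairs <- data.frame(
  extended_id = 1,
  subject_1   = pair_matrix[, 1],
  subject_2   = pair_matrix[, 2]
)

# Three rows, all carrying the same extended_id.
print(family_pairs)
```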
discord_data() requires the id variable to be a “unique kinship pair identifier”, meaning the extended id from NlsyLinks will not work. This causes an issue where the output of discord_data() could return multiple rows per kin-pair. Consider the ‘Gen1Housemates’ subset of the sample_data. This has 233 pairs with overlapping extended ids:
Calling discord_data() leads to additional rows being returned:
However, if we specify a unique id ourselves, we get the expected output (note the number of rows in the print-out):
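As a rough illustration of the workaround on toy data (not the actual sample_data; the columns `subject_1`, `subject_2` and `pair_id` are hypothetical names), a minimal base-R sketch of adding a unique pair identifier might look like this. It assumes the data are single-entered, so each row is exactly one pair:

```r
# Hypothetical pair-level data: family 1 has three kin (three pairs),
# family 2 has two kin (one pair).
pairs <- data.frame(
  extended_id = c(1, 1, 1, 2),
  subject_1   = c(101, 101, 102, 201),
  subject_2   = c(102, 103, 103, 202)
)

# The family identifier repeats, so it cannot identify a pair on its own.
any(duplicated(pairs$extended_id))

# Assumed workaround: one id per row, since each row is one pair.
pairs$pair_id <- seq_len(nrow(pairs))

# The new column has no repeated values.
any(duplicated(pairs$pair_id))
```

A column built this way could then be supplied wherever discord_data() expects its id variable.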