Description
While using the boston_housing data set, which is hosted by the scikit-learn package and used to demo models for house price prediction, I came across a feature titled 'B'. This struck me as odd because all the other features have descriptive names such as 'AGE' or 'TAX'. It turns out that B = 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town. I naively assumed that, because the data set is hosted by a prestigious package, this feature was included because it offers significant explanatory value, which would point to a strongly pervasive racist mentality in the population at the time. However, after reading the blog post linked below, it appears that the data in the B feature of the Boston housing data set were manufactured in an attempt to encourage segregation of the races. If true, this would be strong evidence of systemic institutional racism, and by continuing to use these fraudulent data we would be perpetuating the effect desired by the author. I hope you will agree that we would be doing the scientific literature a service by investigating this issue further and ultimately consigning these data to historical reference archives, rather than encouraging their use in modern research by hosting them.
I look forward to your response,
Jamie R. Sykes
https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8