The data set seems to be missing some very common words, such as "metal", as well as many simple words like "it" and "the".