Description
The textstat implementation of the Dale-Chall formula classifies several words as difficult that the original formula would not. For example, Scotland, returned, giants, giant's, and strongest are all returned by textstat.difficult_words_list(text), even though the base forms return, giant, and strong are on the easy words list.
Dale and Chall (1948, pp. 38-49) suggest that the following word forms should be considered familiar:
- names of persons and places
- regular plurals and possessives of words on the list
- the third-person, singular forms (s or ies from y), present-participle forms (ing), past-participle forms (n), and past-tense forms (ed or ied from y), when these are added to verbs appearing on the list
- comparatives and superlatives of adjectives appearing on the list
- adverbs formed by adding ly to a word on the list
The complete list of rules can be found in Dale & Chall (1948).
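To make the suffix rules listed above concrete, here is a rough Python sketch of how a word could be reduced to candidate base forms before the easy-list lookup. It is not how textstat works. EASY_WORDS, candidate_bases, and is_familiar are hypothetical names, and the tiny word set stands in for the real list. Proper names are deliberately not handled, and the suffix stripping is naive, so it can both miss forms (doubled consonants as in "running") and accept words the original rules would not.

```python
# Hypothetical stand-in for the Dale-Chall easy word list (assumption:
# the real list is much larger and would be loaded from a file).
EASY_WORDS = {"return", "giant", "strong", "carry", "happy", "quick", "take"}

# (suffix, replacement) pairs approximating the rules quoted above.
# This table is an illustrative assumption, not the complete rule set.
SUFFIX_RULES = [
    ("ies", "y"), ("ied", "y"), ("ier", "y"), ("iest", "y"), ("ily", "y"),
    ("es", ""), ("s", ""),            # plurals, third-person singular
    ("ing", ""), ("ing", "e"),        # present participle
    ("ed", ""), ("d", ""),            # past tense
    ("en", ""), ("n", ""),            # past participle
    ("er", ""), ("est", ""),          # comparative, superlative
    ("ly", ""),                       # adverbs
]


def candidate_bases(word):
    """Yield possible base forms of `word`, starting with the word itself."""
    w = word.lower()
    yield w
    # Strip a possessive ending first: giant's -> giant, giants' -> giants.
    for suffix in ("'s", "'"):
        if w.endswith(suffix):
            w = w[: -len(suffix)]
            yield w
            break
    # Then try each suffix rule once on the (possibly de-possessed) word.
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            yield w[: -len(suffix)] + replacement


def is_familiar(word, easy_words=EASY_WORDS):
    """Return True if the word or one of its candidate bases is on the list."""
    return any(base in easy_words for base in candidate_bases(word))


if __name__ == "__main__":
    for sample in ["returned", "giants", "giant's", "strongest", "Scotland"]:
        print(sample, is_familiar(sample))
```

With this sketch, returned, giants, giant's, and strongest are treated as familiar, while Scotland is still flagged because recognising names of persons and places would need a separate mechanism such as a gazetteer or named-entity recognition.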
I understand that most of these rules are not easy to implement in the textstat package. To avoid confusion, and perhaps to prompt users to check the list returned by textstat.difficult_words_list(text), could the README point out the deviation from the original Dale & Chall formula?
Source: Dale, E., & Chall, J. (1948). A Formula for Predicting Readability: Instructions. Educational Research Bulletin, 27(2), 37-54. Retrieved August 11, 2021, from http://www.jstor.org/stable/1473669