Closed
Description
The OneHotEncoder should have an option to group infrequent categories into a single category, or we should have a separate transformer that does this beforehand.
Sensible thresholds would be a maximum number of categories or a minimum frequency per category. This is similar to what we're doing in CountVectorizer, but I think it is common enough for categorical variables that we should explicitly make it easy to do.
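To make the request concrete, here is a minimal sketch of the "separate step beforehand" variant in plain Python. Everything in it is hypothetical and only for illustration: the function name `group_infrequent`, the parameters `min_frequency`, `max_categories`, and `other`, and the choice that `max_categories` counts only the kept labels (not the placeholder) are assumptions, not an existing or proposed scikit-learn API.

```python
from collections import Counter


def group_infrequent(values, min_frequency=None, max_categories=None,
                     other="infrequent"):
    """Replace rare categories in `values` with a single placeholder label.

    Hypothetical helper: names and semantics are assumptions for illustration.
    - min_frequency: keep only labels that occur at least this many times.
    - max_categories: keep at most this many labels (placeholder not counted).
    """
    counts = Counter(values)

    # Rank labels by descending count; break ties by label for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))

    # Apply the minimum-frequency threshold first.
    keep = [label for label, n in ranked
            if min_frequency is None or n >= min_frequency]

    # Then cap the number of kept labels, dropping the least frequent ones.
    if max_categories is not None:
        keep = keep[:max_categories]

    kept = set(keep)
    return [v if v in kept else other for v in values]


pets = ["cat", "dog", "cat", "bird", "dog", "cat", "fish"]

# "bird" and "fish" each occur once, so both fall below min_frequency=2.
by_frequency = group_infrequent(pets, min_frequency=2)

# Only the single most frequent label ("cat") is kept.
by_count = group_infrequent(pets, max_categories=1)
```

The output of a step like this could then be passed to the encoder, so that all rare categories share one column instead of each getting their own.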