aradzie/keybr.com-corpus

keybr.com-corpus

This repository contains a collection of scripts for building word frequency dictionaries.

To build a word frequency dictionary, one needs a corpus of text. Various corpora can be obtained from the OPUS project at https://opus.nlpl.eu/.
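
As a rough illustration of the counting step, the sketch below tallies word occurrences over lines of text using Python's `collections.Counter`. It is a minimal sketch and not the repository's actual scripts. The tokenizer regex and the helper names `word_frequencies` and `top_words` are hypothetical, and real corpora need more careful normalization.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not this repository's rule):
# runs of Unicode letters, optionally joined by an internal apostrophe
# as in "didn't". Digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def word_frequencies(lines):
    """Count how often each lowercased word occurs across text lines."""
    counts = Counter()
    for line in lines:
        counts.update(m.group(0).lower() for m in WORD_RE.finditer(line))
    return counts


def top_words(counts, limit):
    """Return the `limit` most frequent (word, count) pairs."""
    return counts.most_common(limit)


if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "The dog didn't sit on the mat.",
    ]
    freq = word_frequencies(corpus)
    print(top_words(freq, 3))
```

Lowercasing merges "The" and "the" into a single entry. Whether that is desirable depends on the language and the intended use of the dictionary.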

We prefer:

  • Contemporary, simple, everyday language.
  • Neutral language that is not focused on any particular topic, such as politics or technology.
  • Language that is not vulgar, obscene or otherwise triggering.

The word frequency dictionaries are often built in collaboration with native speakers, who manually and carefully review the lists to remove offensive or otherwise unsuitable words.
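
Once reviewers have produced a list of rejected words, applying it amounts to a simple filter over the ranked list. The sketch below assumes the review results are available as a collection of words. `apply_review` is a hypothetical helper, not part of this repository, and "rudeword" is a placeholder.

```python
def apply_review(ranked_words, rejected):
    """Return ranked_words without the reviewer-rejected entries.

    Hypothetical helper: assumes `rejected` is a collection of words
    flagged during manual review. Matching is case-insensitive, and the
    original frequency order of the remaining words is preserved.
    """
    rejected_lower = {word.lower() for word in rejected}
    return [word for word in ranked_words if word.lower() not in rejected_lower]


if __name__ == "__main__":
    ranked = ["the", "and", "rudeword", "house"]
    print(apply_review(ranked, {"RudeWord"}))
```

The filter keeps the rest of the list in its original order, so the frequency ranking stays intact.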