Open
Description
ngram2vec/ngram2vec/corpus2pairs.py
Line 58 in 6966b1c
subsampler = dict([(word, 1 - sqrt(subsample / count)) for word, count in six.iteritems(vocab) if count > subsample])  # subsampling technique
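I am confused about the subsampler in corpus2pairs.py. I think 1 - sqrt(subsample / count) should be replaced with 1 - sqrt(subsample / (count / total_word_count_in_vocab)), so that the threshold is compared against each word's relative frequency rather than its raw count.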
P.S. I may have misunderstood your implementation. Also, in the original word2vec.c, the discard probability is actually 1 - (sqrt(subsample / (count / total_word_count_in_vocab)) + subsample / (count / total_word_count_in_vocab)).
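
To make the comparison concrete, here is a minimal Python sketch of the two discard probabilities discussed above, computed from relative frequency. It is an illustration only, not the repository's code: the function names, the `build_subsampler` helper, and the toy vocabulary are hypothetical, and `t` stands in for `subsample`. Note that sqrt(t / (count / total)) equals sqrt(t * total / count), so if `subsample` were already multiplied by the total token count before line 58, the existing expression would match the first formula; the quoted line alone does not show whether that is the case.

```python
from math import sqrt


def discard_prob_paper(count, total, t):
    """Discard probability 1 - sqrt(t / f), where f = count / total."""
    f = count / total  # relative frequency of the word
    return max(0.0, 1.0 - sqrt(t / f))


def discard_prob_word2vec_c(count, total, t):
    """Discard probability as described for word2vec.c in this issue:
    1 - (sqrt(t / f) + t / f), clipped at 0."""
    f = count / total
    keep = sqrt(t / f) + t / f  # probability of keeping the token
    return max(0.0, 1.0 - keep)


def build_subsampler(vocab, t):
    """Hypothetical helper: map each frequent word to its discard probability.

    Only words whose relative frequency exceeds t are included, mirroring
    the `count > subsample` filter but on frequencies instead of raw counts.
    """
    total = sum(vocab.values())
    return {
        word: discard_prob_paper(count, total, t)
        for word, count in vocab.items()
        if count / total > t
    }


# Toy vocabulary (made-up counts, for illustration only).
toy_vocab = {"the": 900, "cat": 95, "sat": 5}
toy_subsampler = build_subsampler(toy_vocab, t=0.01)
```

In this sketch, rare words (relative frequency at or below `t`) are never discarded, and the word2vec.c variant always discards slightly less often than the square-root-only formula because of the extra t / f term in the keep probability.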