question about subsampler #7

@majortomz

subsampler = dict([(word, 1 - sqrt(subsample / count)) for word, count in six.iteritems(vocab) if count > subsample]) #subsampling technique

I am confused about the subsampler in corpus2pairs. I think 1 - sqrt(subsample / count) should be replaced with 1 - sqrt(subsample / (count / total_word_count_in_vocab)), so that the threshold is applied to the word's relative frequency rather than to its raw count.

P.S. I might be misunderstanding your implementation. In the original word2vec.c, the subsampling (discard) probability is 1 - (sqrt(subsample / (count / total_word_count_in_vocab)) + subsample / (count / total_word_count_in_vocab)).
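To make the comparison concrete, here is a minimal sketch of the three discard probabilities discussed above. The function names and the example numbers are hypothetical, chosen only for illustration, and the third function follows the word2vec.c formula as I stated it, clamped at zero. One thing the sketch shows: the count-based form and the frequency-based form give the same value if subsample has already been multiplied by the total word count before the quoted line runs. Whether corpus2pairs does that is an assumption I have not verified from the snippet alone.

```python
from math import sqrt, isclose


def discard_prob_count(count, subsample):
    """Form used in the quoted line: threshold divided by the raw count."""
    return 1 - sqrt(subsample / count)


def discard_prob_freq(count, total, subsample):
    """Proposed form: threshold divided by the relative frequency."""
    freq = count / total
    return 1 - sqrt(subsample / freq)


def discard_prob_word2vec_c(count, total, subsample):
    """Form described for word2vec.c above, with an extra linear term.

    Clamped at 0.0 because the expression goes negative for rare words,
    which simply means such words are never discarded.
    """
    ratio = subsample / (count / total)
    return max(0.0, 1 - (sqrt(ratio) + ratio))


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any real corpus.
    count, total, subsample = 50_000, 1_000_000, 1e-3

    # Raw-count form with an unscaled threshold: almost everything is dropped.
    print("count-based, unscaled:", discard_prob_count(count, subsample))

    # Raw-count form with the threshold pre-multiplied by the corpus size
    # (assumed rescaling) matches the frequency-based form.
    print("count-based, rescaled:", discard_prob_count(count, subsample * total))
    print("frequency-based:      ", discard_prob_freq(count, total, subsample))

    # The word2vec.c-style form discards slightly less often.
    print("word2vec.c-style:     ", discard_prob_word2vec_c(count, total, subsample))
```

If the rescaling assumption holds, the only remaining difference from word2vec.c would be the additional subsample / frequency term.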
