gosom/go-minhash


Introduction
======
This is an implementation of the MinHash algorithm as described
in chapter 3 of Mining Massive Datasets ( http://infolab.stanford.edu/~ullman/mmds/ch3.pdf ).

The implementation is inspired by the Python repository https://github.com/ekzhu/datasketch .
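
For readers new to the technique, here is a minimal sketch of the MinHash idea itself. It is not this repository's API: the names `Signature`, `NewSignature` and `Similarity` are hypothetical, and seeded FNV-1a hashes stand in for the random permutations described in the book chapter. Each set is reduced to the minimum hash value under `k` hash functions, and the fraction of positions where two signatures agree estimates the Jaccard similarity of the sets.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/fnv"
	"math"
)

// Signature holds one minimum hash value per hash function.
// Hypothetical type for illustration; not taken from this repository.
type Signature []uint64

// NewSignature computes a MinHash signature of length k for a set of tokens.
// Assumption: the i-th hash function is FNV-1a seeded with the index i.
func NewSignature(tokens []string, k int) Signature {
	sig := make(Signature, k)
	for i := range sig {
		sig[i] = math.MaxUint64 // start above any possible hash value
	}
	var seed [8]byte
	for _, tok := range tokens {
		for i := 0; i < k; i++ {
			binary.LittleEndian.PutUint64(seed[:], uint64(i))
			h := fnv.New64a()
			h.Write(seed[:])
			h.Write([]byte(tok))
			// Keep the smallest hash seen so far for hash function i.
			if v := h.Sum64(); v < sig[i] {
				sig[i] = v
			}
		}
	}
	return sig
}

// Similarity estimates the Jaccard similarity of the two underlying sets
// as the fraction of signature positions that hold the same value.
func Similarity(a, b Signature) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("signatures must be non-empty and of equal length")
	}
	matches := 0
	for i := range a {
		if a[i] == b[i] {
			matches++
		}
	}
	return float64(matches) / float64(len(a)), nil
}

func main() {
	a := NewSignature([]string{"minhash", "is", "a", "sketch", "of", "a", "set"}, 128)
	b := NewSignature([]string{"minhash", "is", "a", "summary", "of", "a", "set"}, 128)
	s, err := Similarity(a, b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("estimated Jaccard similarity: %.3f\n", s)
}
```

A larger `k` gives a tighter estimate at the cost of more hashing per token; 128 is only an illustrative choice.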

Usage
=====
Please see the example folder.

There is also a naive benchmark comparing the datasketch Python library with this
implementation:

Go:
----
Similarity: 1, took 21.876983ms

Python:
----
Similarity: 1.0, took 668.7448024749756 ms

In this run the Go version is around 30 times faster (668.74 ms / 21.88 ms ≈ 30.6).

Of course, this is not meant as a comparison of Python with Go; I was just curious.


TODO
====

- Add documentation comments
- Implementation of LSH
- Implementation of the SuperMinHash algorithm as defined in https://arxiv.org/pdf/1706.05698.pdf
- Maybe parallelize the computation
