MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Updated Jan 20, 2026 - Python
Locality Sensitive Hashing using MinHash in Python/Cython to detect near-duplicate text documents
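As a rough illustration of the MinHash + LSH idea for near-duplicate text, here is a minimal pure-Python sketch. It is not the code of any repository listed here: the character 5-shingles, the seeded `blake2b` hashes standing in for random permutations, and the 32 bands × 4 rows split are all assumptions chosen for readability rather than speed.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Set of character k-shingles of a lower-cased, whitespace-normalised text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _seeded_hash(seed, shingle):
    """64-bit hash of a shingle; each seed acts as a different hash function (assumption)."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_perm=128):
    """Keep the minimum hash per seed; each minimum simulates one random permutation."""
    return [min(_seeded_hash(seed, s) for s in shingle_set) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing slots estimates the Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands=32, rows=4):
    """Banding: two documents become candidates if any whole band matches exactly."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError("signature length must equal bands * rows")
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        # Every pair of documents sharing a bucket is a candidate.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy corpus.
docs = {
    "a": "The quick brown fox jumps over the lazy dog",
    "b": "The quick brown fox jumped over the lazy dog",
    "c": "Completely unrelated sentence about hashing",
}
sigs = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
candidates = lsh_candidate_pairs(sigs)
print(candidates)
print(estimated_jaccard(sigs["a"], sigs["b"]))
```

Documents "a" and "b" share 34 of 45 distinct 5-shingles (Jaccard about 0.76), so they should land in a common bucket, while "c" shares none with either. The band/row split sets the similarity threshold of the LSH S-curve at roughly (1/b)^(1/r), about 0.42 for 32 bands of 4 rows; more rows per band raise it and cut false positives.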
Dice.com repo accompanying the Dice.com 'Vectors in Search' talk by Simon Hughes at the Activate 2018 search conference and the 'Searching with Vectors' talk at Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015.
Near-duplicate image detection using Locality Sensitive Hashing
Locality Sensitive Hashing, fuzzy-hash, min-hash, simhash, aHash, pHash, dHash: hash-based image similarity and text similarity
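To show what a perceptual hash such as dHash computes, here is a minimal sketch. It assumes the image has already been converted to grayscale and resized to 9 columns × 8 rows (for example with an imaging library) and is passed in as nested lists, so the sketch needs no dependencies. The bit convention (1 when brightness increases left to right) is one common choice; other implementations compare in the opposite direction.

```python
def dhash(pixels):
    """Difference hash: one bit per horizontally adjacent pixel pair.

    `pixels` is assumed to be a grayscale image already resized to
    8 rows x 9 columns: a list of 8 lists of 9 ints in 0..255.
    """
    if len(pixels) != 8 or any(len(row) != 9 for row in pixels):
        raise ValueError("expected an 8x9 grayscale grid")
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # Bit is 1 when brightness increases from left to right.
            bits = (bits << 1) | (1 if right > left else 0)
    return bits  # 8 rows x 8 comparisons = 64-bit integer


def hamming_distance(h1, h2):
    """Number of differing bits; small distances suggest visually similar images."""
    return bin(h1 ^ h2).count("1")


# Hypothetical pre-resized grids standing in for real images.
gradient = [[10 * col for col in range(9)] for _ in range(8)]
brighter = [[value + 5 for value in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]

print(hamming_distance(dhash(gradient), dhash(brighter)))  # 0
print(hamming_distance(dhash(gradient), dhash(mirrored)))  # 64
```

Because dHash only records the sign of neighbouring differences, a uniform brightness shift leaves the hash unchanged, while a horizontal mirror flips every bit. Near-duplicate search then reduces to finding hashes within a small Hamming radius.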
Implementation of vector quantization algorithms, including code for the paper 'Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search'.
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Serverless, lightweight, and fast vector database on top of DynamoDB
Locality Sensitive Hashing for semantic similarity (Python 3.x)
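One common LSH family for semantic similarity is random-hyperplane hashing (SimHash for vectors), which approximates cosine similarity between embeddings. The sketch below assumes documents are already embedded as dense vectors; the listing does not say which LSH family that repository uses, and the names `random_hyperplanes`, `signature`, and `estimated_cosine` are hypothetical.

```python
import math
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits Gaussian normal vectors; each defines a hyperplane through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes
    )


def estimated_cosine(sig_a, sig_b):
    """P[bit differs] = theta / pi, so invert the observed disagreement rate."""
    differing = sum(a != b for a, b in zip(sig_a, sig_b))
    theta = math.pi * differing / len(sig_a)
    return math.cos(theta)


planes = random_hyperplanes(dim=3, n_bits=256, seed=42)
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]     # same direction as u
w = [-1.0, -2.0, -3.0]  # opposite direction

print(estimated_cosine(signature(u, planes), signature(v, planes)))  # 1.0
print(estimated_cosine(signature(u, planes), signature(w, planes)))  # -1.0
```

Since the hash depends only on direction, scaled copies of a vector get identical signatures. For an index, signatures would typically be split into bands and bucketed, as with MinHash, so only vectors sharing a band are compared exactly.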
Locality Sensitive Hashing (LSH) based recommendation system. Integrates with Redis and your own database.
Fast fuzzy text search
Cross-architecture binary comparison database
Hashing-based network discovery from time series
An implementation of Locality Sensitive Hashing
Select a set of pairs to check in link prediction from the vast, sparse space of possible pairs
Recommending movies using Collaborative Filtering and Locality Sensitive Hashing in PySpark
Quora Question Pairs detection with Semantic Similarity
Hashing-based network alignment based on structural features
First story detection using shingling, LSH and graphical methods