Add semantic similarity to top level interface #1063
2cc14b3 to 2787739
Pull request overview
Connects the top-level query_search “similarity threshold” parameter to the semantic-memory search call so callers can filter semantic results by similarity/distance.
Changes:
- Passes score_threshold through to semantic memory search as min_distance
- Updates the mock-based test to assert the new threshold is forwarded to both episodic and semantic managers
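A minimal sketch of the forwarding and mock-based assertion described in the changes list. MemMachineSketch, its query_search signature, and the search(...) calls on each manager are hypothetical stand-ins, not memmachine's actual API; only the score_threshold and min_distance names come from this PR.

```python
from unittest.mock import MagicMock


class MemMachineSketch:
    """Hypothetical stand-in for the top-level interface (not the real class)."""

    def __init__(self, episodic_manager, semantic_manager):
        self.episodic_manager = episodic_manager
        self.semantic_manager = semantic_manager

    def query_search(self, query, limit, score_threshold):
        # Forward the same threshold to both backends. The semantic side
        # receives it as min_distance, mirroring this revision of the PR.
        episodic = self.episodic_manager.search(
            query=query, limit=limit, score_threshold=score_threshold
        )
        semantic = self.semantic_manager.search(
            query=query, limit=limit, min_distance=score_threshold
        )
        return {"episodic": episodic, "semantic": semantic}


def test_threshold_forwarded():
    # MagicMock records every call, so we can assert on the kwargs received.
    episodic = MagicMock()
    semantic = MagicMock()
    machine = MemMachineSketch(episodic, semantic)

    machine.query_search("favorite color", limit=5, score_threshold=0.7)

    episodic.search.assert_called_once_with(
        query="favorite color", limit=5, score_threshold=0.7
    )
    semantic.search.assert_called_once_with(
        query="favorite color", limit=5, min_distance=0.7
    )


test_threshold_forwarded()
```

Asserting on both mocks is what catches a regression where the threshold silently stops reaching one of the backends.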
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/memmachine/main/test_memmachine_mock.py | Strengthens assertions to verify the threshold is forwarded into both memory backends. |
| src/memmachine/main/memmachine.py | Wires the threshold into the semantic search call via min_distance. |
2787739 to 4aed52f
Currently, score is used for the reranker score in episodic memory.
min_distance, read as a minimum distance, would keep all of the worst results and drop all of the best ones: lower distance is better, while higher similarity is better. Please rename.
To share score thresholds between memory types backed by different databases, we should convert all distances and similarities to the same type of score. Either use the same embedder (and, for embedders, normalize the results of database queries so they use the same score formula) or use the same reranker (easier).
See #1066 for a reference of formulas used for score computation.
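To make the direction problem concrete, here is a small sketch under stated assumptions: results are bare distance values, and "cosine distance" means 1 minus cosine similarity. The helper names are hypothetical, and the [0, 1] mapping below is one possible normalization, not necessarily the formulas referenced in #1066.

```python
def keep_by_min_distance(distances, threshold):
    # Keeps results whose distance is AT LEAST the threshold. Since lower
    # distance means a closer match, this keeps the worst results.
    return [d for d in distances if d >= threshold]


def keep_by_max_distance(distances, threshold):
    # Keeps results at most `threshold` away: the best matches.
    return [d for d in distances if d <= threshold]


def cosine_distance_to_score(distance):
    # Assumption: cosine distance = 1 - cosine similarity, so it lies in [0, 2].
    # Map it to a higher-is-better score in [0, 1] so it can share a
    # threshold with other higher-is-better scores (e.g. a 0-1 reranker).
    similarity = 1.0 - distance
    return (similarity + 1.0) / 2.0


distances = [0.1, 0.4, 0.9]
print(keep_by_min_distance(distances, 0.5))  # [0.9] (only the worst match)
print(keep_by_max_distance(distances, 0.5))  # [0.1, 0.4] (the best matches)
```

Converting every backend to one higher-is-better scale first is what makes a single shared threshold meaningful; comparing a raw distance against a similarity threshold mixes units and directions.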
4aed52f to 00bff33
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
session_data=session_data,
set_metadata=set_metadata,
limit=limit,
distance_threshold=score_threshold,
Right now we interpret score as higher-is-better.
Please either rename or redefine score or distance, and make sure it works with rerankers, which return scores between 0 and 1.
A higher_is_better bool is now on SimilarityMetric.
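A sketch of how a higher_is_better flag lets one comparison routine handle both distances and 0-1 reranker scores. Only the SimilarityMetric name and the higher_is_better flag come from this thread; the name field, the passes() method, the filter_results helper, and the metric instances are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimilarityMetric:
    """Hypothetical shape; only higher_is_better comes from the review thread."""

    name: str
    higher_is_better: bool

    def passes(self, value: float, threshold: float) -> bool:
        # Higher-is-better metrics (similarity, reranker score) keep values
        # >= threshold; lower-is-better metrics (distance) keep values <= threshold.
        if self.higher_is_better:
            return value >= threshold
        return value <= threshold


# Hypothetical metric instances for illustration.
COSINE_SIMILARITY = SimilarityMetric("cosine_similarity", higher_is_better=True)
EUCLIDEAN_DISTANCE = SimilarityMetric("euclidean_distance", higher_is_better=False)
RERANKER_SCORE = SimilarityMetric("reranker_score", higher_is_better=True)


def filter_results(
    results: List[Tuple[str, float]], metric: SimilarityMetric, threshold: float
) -> List[Tuple[str, float]]:
    # Keep (item, value) pairs whose value passes the metric-aware threshold.
    return [(item, v) for item, v in results if metric.passes(v, threshold)]


results = [("a", 0.9), ("b", 0.2)]
print(filter_results(results, RERANKER_SCORE, 0.5))      # [('a', 0.9)]
print(filter_results(results, EUCLIDEAN_DISTANCE, 0.5))  # [('b', 0.2)]
```

The flag fixes only the comparison direction. The threshold value is still expressed in each metric's own units, so sharing one threshold across backends still depends on normalizing scores to a common scale.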
Suggestions were opened regarding unifying the similarity system.
Connects the similarity threshold of memmachine to the similarity argument for semantic memory. This allows users to filter by similarity when using semantic memory.