WIP: Adding functions for tracking recall metrics #4
Not needed anymore
dhagberg-sf
left a comment
Sanity check looks reasonable. I'd have to study internals to really dig in here on ownership/exception paths/etc.
README.md
Outdated
Optionally, you can enable recall tracking to monitor the recall of your indexes.
These statistics are based on sampled queries and persist in memory, so they will
be lost on server restart.
can this setting be changed while the server is running? Or does it require a restart to apply?
It can be changed at runtime. It doesn't require any shared libraries or modifying any settings that require a restart. I wanted to make sure it could be enabled/disabled easily and not incur any interruptions.
if (recall_context == NULL)
{
	recall_context = AllocSetContextCreate(TopMemoryContext,
		"Vector Recall Tracking",
nit: tab alignment weird in PR view, prob not in editor, not sure if project has standardized on space- or tab-indentation.
It uses tabs and I've run into some weird formatting issues as well.
src/vector_recall.c
Outdated
void
VectorRecallTrackerInit(VectorRecallTracker *tracker)
{
	tracker->query_value = (Datum) 0;
it seems like a pretty small set of operations, but is it worth having a null tracker and making this a noop if pg_track_recall is false? Same for UpdateDistance below?
	bool found;

	/* Don't proceed if recall tracking is disabled or no hash table */
	if (!pgvector_track_recall || recall_stats_hash == NULL)
I see - this is the real potential overhead and we noop when false.
Yeah, I'd rather keep the checks minimal and not litter them in every function. Seems better to noop during the expensive things.
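The trade-off discussed in this thread can be sketched in plain C. This is a minimal illustration, not the PR's code: `track_recall_enabled`, `RecallTracker`, `tracker_init`, `tracker_update`, and `track_query` are hypothetical stand-ins for `pgvector_track_recall`, `VectorRecallTracker`, and the functions in the diff. The cheap per-tuple helpers stay unguarded, and the single flag check sits at the entry to the expensive step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pgvector_track_recall setting. */
static bool track_recall_enabled = false;

/* Counts how often the costly path actually runs (for illustration only). */
static int expensive_calls = 0;

/* Hypothetical, simplified tracker state. */
typedef struct RecallTracker
{
	double	max_distance;	/* largest distance seen among returned results */
	int		result_count;	/* number of results returned so far */
} RecallTracker;

/* Cheap: a couple of assignments, so no flag check here. */
static void
tracker_init(RecallTracker *t)
{
	t->max_distance = 0.0;
	t->result_count = 0;
}

/* Cheap: one comparison and one increment per result, also unguarded. */
static void
tracker_update(RecallTracker *t, double distance)
{
	if (distance > t->max_distance)
		t->max_distance = distance;
	t->result_count++;
}

/*
 * Expensive step: the only place the flag is checked.
 * Returns true if the costly work ran, false if it was skipped.
 */
static bool
track_query(const RecallTracker *t)
{
	if (!track_recall_enabled || t == NULL)
		return false;

	expensive_calls++;		/* placeholder for the table scan and stats update */
	return true;
}
```

With the flag off, `tracker_init` and `tracker_update` still run but `track_query` returns immediately, which matches the "noop during the expensive things" approach described above.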
src/vector_recall.c
Outdated
	if (isnull)
		continue;

	distanceDatum = FunctionCall2Coll(distance_proc, collation, tracker->query_value, value);
is a null distance_proc already handled in FunctionCall2Coll?
It does not. I'll add some safety checks and logging in case that happens. As best I can tell distance_proc shouldn't ever be null but I'd rather be safe.
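A safety check of the kind mentioned here might look roughly like the following. This is a standalone sketch with hypothetical names (`DistanceProc`, `safe_distance`, `abs_diff`), not PostgreSQL's `FmgrInfo` or the PR's actual fix; in the extension itself the failure branch would presumably also log a message before skipping the tuple.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a distance function looked up at runtime. */
typedef double (*distance_fn) (double a, double b);

/* Hypothetical, much-simplified analogue of a function-info struct. */
typedef struct DistanceProc
{
	distance_fn fn;
} DistanceProc;

/*
 * Call the distance function only if both the struct and the function
 * pointer are present. Returns false (leaving *out untouched) otherwise,
 * so the caller can skip the tuple instead of crashing.
 */
static bool
safe_distance(const DistanceProc *proc, double a, double b, double *out)
{
	if (proc == NULL || proc->fn == NULL)
		return false;

	*out = proc->fn(a, b);
	return true;
}

/* Example distance: absolute difference of two scalars. */
static double
abs_diff(double a, double b)
{
	return a > b ? a - b : b - a;
}
```

The caller decides what a `false` return means; for recall tracking, skipping the sample is likely safer than raising an error in the middle of a user query.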
| * Track a vector query with safe recall estimation | ||
| */ | ||
| void | ||
| TrackVectorQuery(Relation index, VectorRecallTracker *tracker, FmgrInfo *distance_proc, Oid collation) |
I haven't done full Rust-style ownership checking in here but it does look like all resources allocated are properly closed/de-allocated in here.
They won't return results but we are at least exercising some of the code.
This PR adds recall tracking functionality to monitor the quality of approximate vector indexes without the performance overhead of exact searches. It is disabled by default and must be explicitly enabled by a user. Recall tracking can also be disabled in the event of bugs or performance impacts.
What it does
Configuration
Usage
View recall summary with index names:
Get raw statistics:
Get statistics for a specific index:
SELECT pg_vector_recall_get(index_oid);
Reset stats for a specific index:
SELECT pg_vector_recall_reset(index_oid);
The recall estimation works by counting how many tuples in the table fall within the distance of the K-th result, providing a practical approximation of search quality without the cost of brute-force comparison.
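The counting idea can be illustrated with a small standalone sketch. The description does not state how the count is turned into a recall figure, so the ratio used here (K divided by the number of tuples within the K-th distance, capped at 1.0) is an assumption, and `count_within` and `estimate_recall` are hypothetical names rather than functions from the PR.

```c
#include <stddef.h>

/*
 * Count how many of the n examined distances fall within the distance
 * of the K-th (farthest) result returned by the index.
 */
static size_t
count_within(const double *distances, size_t n, double kth_distance)
{
	size_t		count = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (distances[i] <= kth_distance)
			count++;
	}
	return count;
}

/*
 * Assumed formula: if `within` tuples are at least as close as the K-th
 * result, an exact search would have had that many candidates for the K
 * slots, so recall is estimated as K / within, capped at 1.0.
 */
static double
estimate_recall(size_t k, size_t within)
{
	if (within <= k)
		return 1.0;
	return (double) k / (double) within;
}
```

For example, if the index returned K = 2 results and 4 tuples turn out to lie within the second result's distance, this sketch reports an estimated recall of 0.5.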