fix: Adding metadata to document chunks #3184
Hey @dmartinol! In reference to your comment about
Thanks for pointing this out, @courtneypacheco!
@jwm4 updated the list of excluded fields as per your request.
This seems like a substantial improvement. It would be nice to get this in as soon as possible.
cdoern
left a comment
Looks good to me. Do we want a test for this? Or is that already covered?
Signed-off-by: Daniele Martinoli <[email protected]>
Added a unit test, thanks.
Adding metadata to document chunks, following guidance from the `docling-haystack` package. Reference code here.
Note: we cannot integrate the package as-is because it depends on `docling = "^2.9.0"`, while `instructlab-sdg` constrains us to `docling>=2.4.2,<=2.8.3`.
Metadata fields:
All the DocMeta fields apart from `schema_name`, `version` and `doc_items`.
Issue resolved by this Pull Request:
Closes #3192
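The field selection described under "Metadata fields" can be illustrated with a minimal sketch. It is not the code from this PR: it assumes the chunk metadata is already available as a plain dictionary, and the helper name `filter_chunk_meta` and the sample values are hypothetical. Only the three excluded field names come from the description above.

```python
from typing import Any

# Fields the PR description says are left out of the chunk metadata.
EXCLUDED_META_FIELDS = frozenset({"schema_name", "version", "doc_items"})


def filter_chunk_meta(meta: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `meta` without the excluded fields (hypothetical helper)."""
    return {k: v for k, v in meta.items() if k not in EXCLUDED_META_FIELDS}


# Hypothetical input: a plain dict standing in for dumped chunk metadata.
# The values are made up for illustration only.
sample_meta = {
    "schema_name": "example-schema",
    "version": "0.0.0",
    "doc_items": [],
    "headings": ["Introduction"],
    "origin": {"filename": "example.pdf"},
}

filtered = filter_chunk_meta(sample_meta)
print(sorted(filtered))  # ['headings', 'origin']
```

Using a deny-list rather than an allow-list means any field added to the metadata later would be carried over automatically, which matches the "all fields apart from" wording of the description.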
Verifying the generated schema:
Sequence of commands to validate the schema of the default in-memory store:
Sample output (edited to show only the relevant fields):
And a snippet of a chunk's metadata from the JSON document:
Checklist:
conventional commits.