Opensearch kwargs fix #914

baitsguy · 2024-10-11T22:59:56Z

When using @context_params, all kwargs specified in the global Context were passed into the reader.opensearch method, which in turn forwarded them to the opensearchpy client. I changed the method to explicitly accept query_kwargs instead. I don't quite know why we have kwargs at all, but I don't want to mess with that.
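The failure mode described above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: context_params here is a toy decorator that merges assumed global parameters into a method's kwargs (not sycamore's real implementation), FakeSearchClient stands in for the opensearchpy client and rejects unknown options, and the opensearch_before/opensearch_after method names are invented for the comparison. The point is only the contrast between forwarding **kwargs wholesale and forwarding an explicit query_kwargs dict.

```python
import functools
from typing import Any, Callable, Optional

# Hypothetical global context parameters that have nothing to do with
# the search client (assumed for illustration only).
GLOBAL_PARAMS: dict[str, Any] = {"llm": "some-llm", "parallelism": 4}


def context_params(func: Callable) -> Callable:
    """Toy decorator: fills in keyword arguments from GLOBAL_PARAMS.

    Explicitly passed kwargs take precedence over the global values.
    This is an assumption, not sycamore's actual @context_params.
    """

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        merged = {**GLOBAL_PARAMS, **kwargs}
        return func(*args, **merged)

    return wrapper


class FakeSearchClient:
    """Hypothetical client that raises on options it does not recognize."""

    ALLOWED = {"size", "scroll"}

    def search(self, index: str, body: dict, **params: Any) -> dict:
        unknown = set(params) - self.ALLOWED
        if unknown:
            raise TypeError(f"unexpected search options: {sorted(unknown)}")
        return {"index": index, "body": body, "params": params}


class Reader:
    def __init__(self, client: FakeSearchClient) -> None:
        self.client = client

    @context_params
    def opensearch_before(self, index: str, query: dict, **kwargs: Any) -> dict:
        # Bug: every kwarg, including the injected context params,
        # is forwarded to the client.
        return self.client.search(index, query, **kwargs)

    @context_params
    def opensearch_after(
        self,
        index: str,
        query: dict,
        query_kwargs: Optional[dict] = None,
        **kwargs: Any,
    ) -> dict:
        # Fix: only the explicit query_kwargs dict reaches the client;
        # injected context params stay in **kwargs and are ignored here.
        return self.client.search(index, query, **(query_kwargs or {}))
```

With this sketch, calling opensearch_before("docs", query, size=10) raises TypeError because llm and parallelism leak into the client call, while opensearch_after("docs", query, query_kwargs={"size": 10}) forwards only size. Keeping **kwargs in the fixed signature mirrors the description's choice not to remove it.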

karanataryn

A few comments and suggestions. I'm not entirely sure this is the correct fix for the problem.

lib/sycamore/sycamore/tests/unit/connectors/opensearch/test_opensearch.py

lib/sycamore/sycamore/tests/integration/query/test_query_opensearch.py

lib/sycamore/sycamore/reader.py

* Fix kwargs in opensearch reader
* simplify test assertion
* lint
* pr comments

* added ability to read schema from file
* small typo
  Co-authored-by: Matt Welsh
* fixed two function refs that were modified
* reformatted file with black
* fixed schema file format (was json), added more exception handling
* Fix anonymous reading in materialize and add rate limited logging. (#898)
  * Fix anonymous reading in materialize and add rate limited logging.
  * In materialize, try reading using the credentials, but if it doesn't work, fall back to reading anonymously if that seems to be working.
  * Add rate limited logging to reading via materialize in local mode.
  * Check for no root before checking if a source since that makes more sense.
  * switch ntsb_loader_materialized.py over to read in local mode, it was working (with the anonymous fix), but was very slow hence the logging.
* Bump version to v0.1.23. (#903)
* fix asdict in the reader too. duh (#907)
  Signed-off-by: Henry Lindeman
* Add text representation for empty tables (#909)
* Refactor logical plan serialization. (#905)
  * Working on this.
  * Working on refactoring.
  * Tests pass - is such a thing even possible?
  * Fix tests.
  * Fix mypy.
  * Cleanup.
  * Fix NTSB examples.
  * A few tweaks to the query planner prompt, and a workaround in queryui/util.py.
  * Fix mypy.
* seriously small performance improvement that matters when you're processing tens of thousands of tables (from training code) (#906)
  Signed-off-by: Henry Lindeman
* Handle opensearch reader doc reconstruction when no parent doc in results (#908)
* Fix bug in entity extraction. (#911)
  * Notebooks like default-prep-script.ipynb would fail because the wrong way of generating the prompt would be used.
  * Rename test to match with name of file being tested.
  * Fix existing tests to verify parameters on all branches; the reason the tests were passing was that it was taking the default branch in the test cases.
  * Update all of the tests to directly call run rather than route everything through ray.
* Enable copying of the hash context. (#910)
  * Enable copying of the hash context.
  * Address comments.
* Add option to extract line-based bounding boxes from pdfminer. (#874)
  We have been using pdfminer's layout detection to group text into boxes. This can cause issues, especially with table extraction, when the boxes don't line up with cells or what we detect with the DETR model. This change adds support for an object_type parameter to the PdfMinerExtractor that can be set to "boxes" (the current behavior), or "lines", which groups characters into lines, but does not group them further. To avoid an explosion of options, we introduce a "text_extractor_options" dict as a parameter, and refactor the TextExtractor class hierarchy a bit to support it.
* Support random sample in local mode. (#913)
  This transform isn't widely used, but still worth supporting in local mode to bring it to parity.
* Opensearch kwargs fix (#914)
  * Fix kwargs in opensearch reader
  * simplify test assertion
  * lint
  * pr comments
* fix typo (#917)
* Update using_jupyter.md (#902)
  * Update using_jupyter.md
    Update link
  * Fixed path
  ---------
  Co-authored-by: dtecuci
* Rebased. Added ability to read schema from file
* rebased. small typo
  Co-authored-by: Matt Welsh
* rebased. reformatted file with black
* resolved conflicts
* changed schema file format to yaml
* removed unused import
* small typos fixed
* fixed spacing

---------

Signed-off-by: Henry Lindeman
Co-authored-by: Matt Welsh
Co-authored-by: Eric Anderson
Co-authored-by: Ben Sowell
Co-authored-by: Henry Lindeman
Co-authored-by: Dhruv Kaliraman
Co-authored-by: Vinayak Thapliyal
Co-authored-by: Alex Meyer
Co-authored-by: Karan Sampath
Co-authored-by: jonfritz

baitsguy added 2 commits October 11, 2024 15:57

Fix kwargs in opensearch reader

73cd41b

simplify test assertion

87468b9

baitsguy requested a review from karanataryn October 11, 2024 22:59

baitsguy marked this pull request as ready for review October 11, 2024 23:00

lint

e94d6cf

karanataryn reviewed Oct 11, 2024


pr comments

15374ac

karanataryn approved these changes Oct 12, 2024


baitsguy enabled auto-merge (squash) October 12, 2024 00:36

baitsguy merged commit 3c7b3a3 into main Oct 12, 2024
10 of 11 checks passed

dtecuci pushed a commit that referenced this pull request Oct 14, 2024

Opensearch kwargs fix (#914)

0783984

* Fix kwargs in opensearch reader
* simplify test assertion
* lint
* pr comments


2 participants
