Commit bc44583

claudevdm and Claude authored
Add rag enrichment and ingestion. (#33413)
* Add base VectorDatabaseTransform.
* Add BigQueryVectorWriterConfig.
* Add BigQueryVectorWriterConfig tests.
* Allow overriding joinfn and custom types from EnrichmentSourceHandler.
* Add BigQueryVectorSearchEnrichmentHandler.
* Fix streaming test.
* Add licence.
* Fix project in vector search test and pydocs.
* Fix bigquery streaming test.
* Resolve open comments. Also fix batching logic when metadata restrictions are applied.
* Resolve comments.
* Fix bigquery ingestion default schema to work with Avro file loads.
* Assert that embedding set in vector write
* Call out RAG changes in CHANGES.md.
* Add PR links to CHANGES.md.

---------

Co-authored-by: Claude <[email protected]>
1 parent 1ac764a commit bc44583

13 files changed, +1760 -7 lines changed

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 {
   "comment": "Modify this file in a trivial way to cause this test suite to run",
-  "modification": 2
+  "modification": 3
 }
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 {
   "comment": "Modify this file in a trivial way to cause this test suite to run",
-  "modification": 2
+  "modification": 3
 }

CHANGES.md

Lines changed: 2 additions & 0 deletions
@@ -68,6 +68,7 @@
 ## New Features / Improvements

 * X feature added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
+* Add BigQuery vector/embedding ingestion and enrichment components to apache_beam.ml.rag (Python) ([#33413](https://github.com/apache/beam/pull/33413)).
 * Upgraded to protobuf 4 (Java) ([#33192](https://github.com/apache/beam/issues/33192)).
 * [GCSIO] Added retry logic to each batch method of the GCS IO (Python) ([#33539](https://github.com/apache/beam/pull/33539))

@@ -110,6 +111,7 @@
 * Support OnWindowExpiration in Prism ([#32211](https://github.com/apache/beam/issues/32211)).
   * This enables initial Java GroupIntoBatches support.
 * Support OrderedListState in Prism ([#32929](https://github.com/apache/beam/issues/32929)).
+* Add apache_beam.ml.rag package with RAG types, base chunking, LangChain chunking and HuggingFace embedding components (Python) ([#33364](https://github.com/apache/beam/pull/33364)).

 ## Breaking Changes

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Enrichment components for RAG pipelines.
+This module provides components for vector search enrichment in RAG pipelines.
+"""
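
Note: the module docstring only names its purpose. As a rough illustration of what "vector search enrichment" means in a RAG pipeline, the sketch below attaches the nearest stored chunks to a query by cosine similarity. It is plain Python and is not the API added by this commit: `cosine`, `enrich_with_neighbors`, and the dict layout (`id`, `embedding`, `neighbors`) are hypothetical names chosen for the example, and a real handler would issue the similarity search against BigQuery rather than an in-memory list.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enrich_with_neighbors(query: Dict, index: List[Dict], top_k: int = 2) -> Dict:
    """Return a copy of `query` with the ids of its top_k nearest rows attached."""
    ranked = sorted(
        index,
        key=lambda row: cosine(query["embedding"], row["embedding"]),
        reverse=True,
    )
    enriched = dict(query)  # shallow copy, so the input element is not mutated
    enriched["neighbors"] = [row["id"] for row in ranked[:top_k]]
    return enriched


# Hypothetical in-memory "vector store" standing in for a database table.
index = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
    {"id": "c", "embedding": [0.9, 0.1]},
]
query = {"id": "q", "embedding": [1.0, 0.0]}
result = enrich_with_neighbors(query, index)
```

Returning a copy rather than mutating the input mirrors how pipeline elements are usually treated as immutable; that is a choice made for this sketch, not something this diff states.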
