Added vector datatype support in Oracle dialect #12321
@CaselIT I have created a new PR. This has only the changes related to the vector datatype. For fetch_type I will create a separate PR.
Thank you, we will take a look soon!
a41d709 to c3fed5d
Hi, is there any update on this? Thanks!
I would recommend using dataclasses + enums for this index configuration. All the dictionary / hardcoded keywords / checks are too verbose for modern Python and place the burden on the developer to copy and paste from the docs, rather than allowing their IDE to help with most of it.
I think we will need some more passes through the tests as well to clean things up
@CaselIT any comments?
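A minimal sketch of what the dataclass + enum approach could look like, using only the standard library. The names `IndexType`, `DistanceMetric` and `IndexConfig`, and the choice of fields, are placeholders for illustration and not the API this PR settles on; the enum values are the keywords discussed in this PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IndexType(Enum):
    """Hypothetical enum of vector index types."""

    HNSW = "HNSW"
    IVF = "IVF"


class DistanceMetric(Enum):
    """Hypothetical enum of distance metrics."""

    EUCLIDEAN = "EUCLIDEAN"
    COSINE = "COSINE"
    DOT = "DOT"
    MANHATTAN = "MANHATTAN"


@dataclass
class IndexConfig:
    """Hypothetical typed replacement for the dictionary-based config."""

    index_type: IndexType = IndexType.HNSW
    distance: Optional[DistanceMetric] = None
    accuracy: Optional[int] = None
    parallel: Optional[int] = None


# Field names and enum members are now discoverable through IDE completion.
config = IndexConfig(index_type=IndexType.IVF, distance=DistanceMetric.COSINE)
```

With this shape a misspelled field name fails at construction time with a `TypeError` instead of being silently ignored as an unknown dictionary key.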
if using == "HNSW":
    parts.append("ORGANIZATION INMEMORY NEIGHBOR GRAPH")
elif using == "IVF":
    parts.append("ORGANIZATION NEIGHBOR PARTITIONS")
if using == "XYZ" what happens?
all of this validation stuff I think should be handled using dataclasses + enums up front for this config since this is very complex config
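To make the "XYZ" question concrete, here is a rough sketch of up-front validation through an enum. `IndexType` and `organization_clause` are hypothetical names; only the two ORGANIZATION strings come from the diff under review.

```python
from enum import Enum


class IndexType(Enum):
    """Hypothetical enum of vector index types."""

    HNSW = "HNSW"
    IVF = "IVF"


# Keyed on enum members, so an unknown string has no branch to fall through.
_ORGANIZATION = {
    IndexType.HNSW: "ORGANIZATION INMEMORY NEIGHBOR GRAPH",
    IndexType.IVF: "ORGANIZATION NEIGHBOR PARTITIONS",
}


def organization_clause(index_type):
    """Return the ORGANIZATION clause for an enum member or its string value."""
    # IndexType("XYZ") raises ValueError rather than silently emitting nothing.
    return _ORGANIZATION[IndexType(index_type)]
```

Here `organization_clause("XYZ")` raises `ValueError`, whereas the `if` / `elif` chain would skip both branches and append no clause.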
test/dialect/oracle/test_types.py
Outdated
    Column("c1", VECTOR),
)

if testing.against("oracle>23.4"):
this would be a @testing.fails_if("oracle<=23.4") decorator on top, and likely worked into test/requirements.py as something that can be re-used
test/dialect/oracle/test_types.py
Outdated
        (1, [6, 7]),
    )
else:
    with expect_raises_message(exc.DatabaseError, "ORA-03060"):
don't need this part, fails_if handles it
@@ -591,6 +591,228 @@ def _remove_clob(inputsizes, cursor, statement, parameters, context):

.. versionadded:: 2.0.0 added support for the python-oracledb driver.

VECTOR Datatype
this is the wrong file for general documentation, this should be in oracle/base.py
even if this only works with "oracledb" and not "cx_oracle", this is a SQL construct and is not driver-specific (it can be made to work with cx_oracle as well)
**top-level keys** and **nested keys**. This structure applies to both HNSW and IVF VECTOR indexes.

Top-Level Keys
==============
this is almost definitely the wrong indentation level, ==== is likely a level one header in these files
I would recommend doing a doc build:
cd doc/build
python -m venv .venv
.venv/bin/pip install -r requirements.txt
source .venv/bin/activate
make clean autobuild
* ``distance``:
  - Specifies the metric for calculating distance between VECTORS.
  - **Valid Values**: ``"EUCLIDEAN"``, ``"COSINE"``, ``"DOT"``, ``"MANHATTAN"``.
if you really want valid values like this I recommend using an enum.Enum
Top-Level Keys
==============

These keys are specified directly under the ``oracle_vector`` dictionary.
why not use a dataclass for this? This is a highly complex configuration, I'd use modern python constructs
@CaselIT I've incorporated the suggested feedback and updated the code accordingly.
OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 5505f00 of this pull request into gerrit so we can run tests and reviews and stuff
New Gerrit review created for change 5505f00: https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
just needs pep8. I will need to review more closely since this is a big change
sorry we are moving slowly on this, can you install pre-commit and re-commit so pep8 fixes are implemented?
5505f00 to 53d1fc4
@zzzeek I have made the changes and I hope the pep8 issues are fixed.
OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 53d1fc4 of this pull request into gerrit so we can run tests and reviews and stuff
Patchset 53d1fc4 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
I'm proposing moving the parameter-specific documentation to be in VectorIndexConfig and related classes, so that it will be much more cross-linked.
As this will greatly increase the line length of these classes, I propose that all of: VECTOR, VectorIndexConfig, VectorDistanceType, VectorIndexType be moved to a new source file lib/sqlalchemy/dialects/oracle/vector.py since this is a big new subject of its own
When using Oracle VECTOR indexes, the configuration parameters are divided based on index type
**HNSW** and **IVF**. This structure applies to both HNSW and IVF VECTOR indexes.

Comman Attributes
typo
Also I think 99% of this section should be in the docstring for VectorIndexConfig and not here in the main docstring
}


class VectorIndexType(Enum):
this needs a docstring
    IVF = "IVF"


class VectorDistanceType(Enum):
needs a docstring, docstrings for elements as well
@dataclass
class VectorIndexConfig:
this needs a docstring also, but it also has to list out these params. since there is no __init__ method, let's try documenting the params here. Additionally, as they seem to be specific to various VectorIndexType combinations, that should be here as well:
@dataclass
class VectorIndexConfig:
    """Defines the configuration for < accurate terminology here >

    :param index_type: the <XYZ> - applies to :attr:`.VectorIndexType.HNSW`, <others>
    :param distance: <QPR> - applies to :attr:`.VectorIndexType.FOO`, :attr:`.VectorIndexType.BAR`
    # ... etc.
)


def build_vector_index_config(vector_index_config: VectorIndexConfig) -> str:
this should be an underscored method on OracleCompiler
* ``distance``:
  - Specifies the metric for calculating distance between VECTORS.
  - **Valid Values**: Enum values from `VectorDistanceType` (`EUCLIDEAN`, `COSINE`, `DOT`, `MANHATTAN`).
so throughout all these docs, the classes have to be referred to like this, using sphinx paramlinks directives:
- **Valid Values**: Enum values from :class:`.VectorDistanceType`
- Specifies the indexing method. For HNSW, this must be :attr:`.VectorIndexType.HNSW`
Then these classes need to be added to https://github.com/sqlalchemy/sqlalchemy/blob/main/doc/build/dialects/oracle.rst using .. autoclass:
- VECTOR should be added to the types section, then the index constructs will be in a new section "Oracle DDL Constructs" which will have a dotted line underscore ------. Then they will be linked from these docs
* ``parallel``:
  - Specifies degree of parallelism

HNSW Parameters
So the descriptions of these parameters will be in the docstring for VectorIndexConfig, they don't need to be fully expanded upon here
}


class VectorIndexType(Enum):
    HNSW = "HNSW"
    IVF = "IVF"
the elements can have docstrings also
class VectorIndexType(Enum):
    HNSW = "HNSW"
    """the HNSW index type.

    Configuration parameters for this type include :paramref:`.VectorIndexConfig.distance`, :paramref:`.VectorIndexConfig.foobar`
    """
    IVF = "IVF"
    """The IVF index type

    etc
    """
it looks like a class called |
yes this line was missing from base.py file |
53d1fc4 to a0b3fa8
OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision a0b3fa8 of this pull request into gerrit so we can run tests and reviews and stuff
Patchset a0b3fa8 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Michael Bayer (zzzeek) wrote:
code review left on gerrit
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
- lib/sqlalchemy/dialects/oracle/base.py (line 1414): I just realized this method is copied from base compiler.py. All it does is add APPROX to the final result. Can we just use the super() implementation and simply inject APPROX right after the ROWS indicator
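As a rough illustration of the "call super() and patch the result" idea, the sketch below uses two hypothetical stand-in classes rather than the real SQLAlchemy compilers. The keyword placement is an assumption, following Oracle's `FETCH APPROX FIRST n ROWS ONLY` form; the substitution would need adjusting to whatever text the base compiler actually emits.

```python
import re


class BaseCompiler:
    """Hypothetical stand-in for the generic compiler."""

    def fetch_clause(self, count):
        return f"FETCH FIRST {count} ROWS ONLY"


class OracleLikeCompiler(BaseCompiler):
    """Hypothetical subclass that only decorates the inherited output."""

    def fetch_clause(self, count, approximate=False):
        # Reuse the inherited rendering instead of copying its body.
        text = super().fetch_clause(count)
        if approximate:
            # Assumed placement of the APPROX keyword, see the note above.
            text = re.sub(
                r"\bFETCH FIRST\b", "FETCH APPROX FIRST", text, count=1
            )
        return text
```

The subclass then carries only the Oracle-specific difference, so later changes to the base implementation are picked up automatically.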
Michael Bayer (zzzeek) wrote:
code review left on gerrit
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
- lib/sqlalchemy/dialects/oracle/base.py (line 1414): Done
Federico Caselli (CaselIT) wrote:
some docs and test suggestions.
also reflection is not tested
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
- lib/sqlalchemy/dialects/oracle/base.py (line 740): 8bit unsigned is called binary? it may be worth mentioning it in parentheses? something like: 8-bit unsigned integers (binary)
- lib/sqlalchemy/dialects/oracle/base.py (line 1433): no need of r here
- lib/sqlalchemy/dialects/oracle/vector.py (line 247): since this is used in a bind processor it may make sense to cache this mapping, either in the VECTOR class or at the module level
- lib/sqlalchemy/sql/selectable.py (line 3894): @mike I guess this kinda overlaps with Select.ext, but we don't have it available for 2.0.
I guess moving forward simple boolean modifiers of existing select features are fine as dialect kwargs, but more complex things would need to be extension, otherwise it could seem a bit arbitrary what is a dialect kwargs and what's an extension
vector_data_8 = [1, 2, 3]
statement = insert(t1)
with engine.connect() as conn:
Federico Caselli (CaselIT) wrote:
these 3 sections above would be nicer if word wrapped to ~ 80 chars.
it's just a preference, the text is otherwise fine
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
:param distance: Enum values from :class:`.VectorDistanceType`
 specifies the metric for calculating distance between VECTORS.
Federico Caselli (CaselIT) wrote:
I guess reflecting these is not easy, but we could just test that it doesn't break anything?
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
:param parallel: integer. Specifies degree of parallelism.

:param hnsw_neighbors: interger. Should be in the range 0 to
Federico Caselli (CaselIT) wrote:
values -> value
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
 2048. Specifies the number of nearest neighbors considered
 during the search The attribute :attr:`VectorIndexConfig.hnsw_neighbors`
 is HNSW index specific.
Federico Caselli (CaselIT) wrote:
values -> value
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
 0 to 10,000,000. Specifies the number of partitions used to
 divide the dataset. The attribute
 :attr:`VectorIndexConfig.ivf_neighbour_partitions` is IVF index
 specific.
Federico Caselli (CaselIT) wrote:
missing period after search
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
"""

index_type: VectorIndexType = VectorIndexType.HNSW
distance: Optional[VectorDistanceType] = None
Federico Caselli (CaselIT) wrote:
missing _ in the param name
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
distance: Optional[VectorDistanceType] = None
accuracy: Optional[int] = None
hnsw_neighbors: Optional[int] = None
hnsw_efconstruction: Optional[int] = None
Federico Caselli (CaselIT) wrote:
missing _ in the param name
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
ivf_sample_per_partition: Optional[int] = None
ivf_min_vectors_per_partition: Optional[int] = None
parallel: Optional[int] = None
Federico Caselli (CaselIT) wrote:
I'm guessing this is missing: The attribute :attr:`VectorIndexConfig.ivf_min_vectors_per_partition` is IVF index specific.
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
 should be an integer value.

:param storage_format: VectorStorageFormat. The VECTOR storage
 type format. This may be Enum values form
Federico Caselli (CaselIT) wrote:
do we need to make some changes to support reflection? if it already works a test that leverages it would be nice to have
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Michael Bayer (zzzeek) wrote:
i think we definitely would need additional logic to support reflection since the type accepts arguments, and one of them is quite complex. The VECTOR would have to be added to the map of type lookups to even come back from reflection so yes, this is not done at all
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
test/dialect/oracle/test_types.py
Outdated
@@ -951,6 +958,194 @@ def test_longstring(self, metadata, connection):
        finally:
            exec_sql(connection, "DROP TABLE Z_TEST")

    @testing.fails_if("oracle<=23.4")
Federico Caselli (CaselIT) wrote:
would only_on be better here (also in the others)? it makes little sense to try it on older versions since we know vector is not there
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
@suraj-ora-2020 the change was updated on gerrit, so either Mike or I will address the comments. Thanks for the contribution
Added new datatype :class:`_oracle.VECTOR` and accompanying DDL and DQL support to fully support this type for Oracle Database. This change includes the base :class:`_oracle.VECTOR` type that adds new type-specific methods ``l2_distance``, ``cosine_distance``, ``inner_product`` as well as new parameters ``oracle_vector`` for the :class:`.Index` construct, allowing vector indexes to be configured, and ``oracle_fetch_approximate`` for the :meth:`.Select.fetch` clause. Pull request courtesy Suraj Shaw.

Fixes: sqlalchemy#12317
Fixes: sqlalchemy#12341
Closes: sqlalchemy#12321
Pull-request: sqlalchemy#12321
Pull-request-sha: a0b3fa8
Change-Id: I6f3af4623ce439d0820c14582cd129df293f0ba8
OK, this is sqla-tester setting up my work on behalf of CaselIT to try to get revision 0a471c3 of this pull request into gerrit so we can run tests and reviews and stuff
Patchset 0a471c3 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
0a471c3 to 22c34f6
OK, this is sqla-tester setting up my work on behalf of CaselIT to try to get revision 22c34f6 of this pull request into gerrit so we can run tests and reviews and stuff
Patchset 22c34f6 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Michael Bayer (zzzeek) wrote:
code review left on gerrit
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
@dataclass
class VectorIndexConfig:
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
    f"{field} must be an integer if"
    f"provided, got {type(value).__name__}"
)
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
updated to static link
22c34f6 to 78e1eb6
.. seealso::

    `CREATE VECTOR INDEX <https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-B396C369-54BB-4098-A0DD-7C54B3A0D66F>`_ - in the Oracle documentation
is that a stable link? the previous link looked a lot better
Yes, it's a redirecting lookup link, so the link does not need to be updated with each new db release. The earlier one had to be updated with every db release.
Michael Bayer (zzzeek) wrote:
code review left on gerrit
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Oracle Database 23ai introduced a new VECTOR datatype for artificial intelligence
and machine learning search operations. The VECTOR datatype is a homogeneous array
of 8-bit signed integers, 8-bit unsigned integers (binary), 32-bit floating-point numbers,
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
from sqlalchemy import insert, select

with engine.begin() as conn:
    conn.execute(
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
        use_literal_execute_for_simple_int
    ),
    **kw,
)
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
.. versionadded:: 2.0.41

:param index_type: Enum value from :class:`.VectorIndexType`
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
 Specifies the indexing method. For HNSW, this must be
 :attr:`.VectorIndexType.HNSW`.

:param distance: Enum value from :class:`.VectorDistanceType`
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
:param ivf_sample_per_partition: integer. Should be between 1
 and ``num_vectors / neighbor partitions``. Specifies the
 number of samples used per partition. The attribute
 :attr:`VectorIndexConfig.ivf_sample_per_partition` is IVF index
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
:param ivf_min_vectors_per_partition: integer. From 0 (no trimming)
 to the total number of vectors (results in 1 partition). Specifies
 the minimum number of vectors per partition. The attribute
Michael Bayer (zzzeek) wrote:
this needs to be formatted as:
:attr:`.VectorIndexConfig.ivf_min_vectors_per_partition`
note period and quotes
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Which one is right?
:attr:`.VectorIndexConfig.ivf_min_vectors_per_partition`
:attr:`VectorIndexConfig.ivf_min_vectors_per_partition`
In the first there is a period before VectorIndexConfig. If that one is correct then the other places also need to be modified.
the period allows sphinx to do a search to get the correct namespace for the object, which in this case would be something like sqlalchemy.dialects.oracle.VectorIndexConfig. that is, the period tells sphinx to substitute the full module path. if the parent document of this doc already sets up :module: to be sqlalchemy.dialects.oracle, then that's the default prefix in any case, but I tend not to rely on that in most cases since sphinx can be very unreliable.
background is at https://www.sphinx-doc.org/en/master/usage/domains/python.html#target-resolution
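For reference, a small sketch of how the leading-period form would sit inside the dataclass docstring. The class body is cut down to a single field for illustration and the wording of the description is only a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorIndexConfig:
    """Cut-down sketch showing only the cross-reference style.

    :param ivf_min_vectors_per_partition: integer. The attribute
     :attr:`.VectorIndexConfig.ivf_min_vectors_per_partition` is IVF
     index specific.

    """

    ivf_min_vectors_per_partition: Optional[int] = None
```

The same leading period would apply to the other `:attr:` and `:class:` references in these docstrings.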
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
def _array_typecode(self, typecode):
    """
    Map storage format to array typecode.
Michael Bayer (zzzeek) wrote:
move "typecode_map" to be a class level variable of the VECTOR class and name it with a leading underscore
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
    VectorStorageFormat.FLOAT32: "f",  # Float
    VectorStorageFormat.FLOAT64: "d",  # Double
}
return typecode_map.get(typecode, "d")
Michael Bayer (zzzeek) wrote:
so this would read:
return self._typecode_map.get(typecode, "d")
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
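A minimal sketch of the class-level mapping, assuming a cut-down enum and a stand-in class named `VectorSketch`. Only the FLOAT32 and FLOAT64 entries are visible in the diff; the enum values and the `to_array` helper are assumptions added for illustration.

```python
import array
from enum import Enum


class VectorStorageFormat(Enum):
    # Only the two members visible in the diff; values are assumed.
    FLOAT32 = "FLOAT32"
    FLOAT64 = "FLOAT64"


class VectorSketch:
    """Hypothetical stand-in for the VECTOR type."""

    # Built once when the class is created, not on every call.
    _typecode_map = {
        VectorStorageFormat.FLOAT32: "f",  # Float
        VectorStorageFormat.FLOAT64: "d",  # Double
    }

    def __init__(self, storage_format=None):
        self.storage_format = storage_format

    def _array_typecode(self, typecode):
        """Map storage format to array typecode, defaulting to double."""
        return self._typecode_map.get(typecode, "d")

    def to_array(self, values):
        # Hypothetical helper showing the lookup in use.
        return array.array(self._array_typecode(self.storage_format), values)
```

Because the dictionary lives on the class, a bind processor that calls `_array_typecode` for every value no longer rebuilds it each time.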
@@ -951,6 +958,194 @@ def test_longstring(self, metadata, connection):
        finally:
            exec_sql(connection, "DROP TABLE Z_TEST")

    @testing.only_on("oracle>=23.4")
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
some doc modification
78e1eb6 to a72a18a
OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision a72a18a of this pull request into gerrit so we can run tests and reviews and stuff
Patchset a72a18a added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Michael Bayer (zzzeek) wrote:
code review left on gerrit
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
"""

def process(value):
    if isinstance(value, array.array):
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
def _array_typecode(self, typecode):
    """
    Map storage format to array typecode.
    """
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
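For readers following the two snippets above, here is a minimal standalone sketch of how a storage format could map to an `array.array` typecode and how a bind processor could normalize incoming values. The format names, the mapping table, and the helper names `array_typecode` and `make_bind_processor` are assumptions made for illustration; they are not copied from the pull request, and the merged code may differ.

```python
import array

# Assumed mapping from VECTOR storage format names to array.array typecodes.
# Illustrative only; not taken from the pull request.
STORAGE_FORMAT_TYPECODES = {
    "INT8": "b",     # signed 8-bit integer
    "FLOAT32": "f",  # single-precision float
    "FLOAT64": "d",  # double-precision float
    "BINARY": "B",   # unsigned 8-bit integer
}


def array_typecode(storage_format):
    """Return the array typecode for a storage format name (hypothetical helper)."""
    try:
        return STORAGE_FORMAT_TYPECODES[storage_format]
    except KeyError:
        raise ValueError(
            f"unsupported storage format: {storage_format!r}"
        ) from None


def make_bind_processor(storage_format):
    """Build a function that normalizes a bound value to array.array."""
    typecode = array_typecode(storage_format)

    def process(value):
        # None and ready-made arrays pass through unchanged.
        if value is None or isinstance(value, array.array):
            return value
        # Plain lists are converted using the typecode for this format.
        if isinstance(value, list):
            return array.array(typecode, value)
        raise TypeError("expected a list or array.array")

    return process
```

Calling `make_bind_processor("FLOAT32")([1.0, 2.5, 3.0])` would return an `array.array` with typecode `"f"`, while an unknown format name fails early with `ValueError` instead of silently producing a wrong array type.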
.. seealso::

    `Using VECTOR Data
    <https://python-oracledb.readthedocs.io/en/latest/user_guide/vector_data_type.html>`_ - in the Oracle documentation
OK so we're going to point to the oracledb driver docs? OK. I will qualify the description here
Is there anything I need to do regarding this?
Michael Bayer (zzzeek) wrote:
code review left on gerrit
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
deprecated.
Michael Bayer (zzzeek) wrote:
artifact
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
* :ref:`Index <genindex>` - Index for easy lookup of documentation topics
Michael Bayer (zzzeek) wrote:
artifact
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Michael Bayer (zzzeek) wrote:
Done
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Michael Bayer (zzzeek) wrote:
code review left on gerrit
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Federico Caselli (CaselIT) wrote:
Thanks for the update. Looks good for me now, thanks
View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838 has been merged. Congratulations! :)
Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5844 has been merged. Congratulations! :)
Added new datatype :class:`_oracle.VECTOR` and accompanying DDL and DQL support to fully support this type for Oracle Database. This change includes the base :class:`_oracle.VECTOR` type that adds new type-specific methods ``l2_distance``, ``cosine_distance``, ``inner_product`` as well as new parameters ``oracle_vector`` for the :class:`.Index` construct, allowing vector indexes to be configured, and ``oracle_fetch_approximate`` for the :meth:`.Select.fetch` clause. Pull request courtesy Suraj Shaw.

Fixes: #12317
Closes: #12321
Pull-request: #12321
Pull-request-sha: a72a18a
Change-Id: I6f3af4623ce439d0820c14582cd129df293f0ba8
(cherry picked from commit 1b780ce)
Description
This PR adds support for the VECTOR datatype in the Oracle dialect of SQLAlchemy, which corresponds to the vector data type introduced in Oracle Database 23ai. This new VECTOR type allows for efficient storage and retrieval of high-dimensional vector data, enabling SQLAlchemy users to leverage vector-based queries. By treating VECTOR as a "first-class" type, this integration makes it easier to work with Oracle’s vector-based capabilities directly in Python.
Fixes: #12317
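To make the three distance methods named in the changelog entry (``l2_distance``, ``cosine_distance``, ``inner_product``) concrete, the sketch below computes the usual textbook definitions in plain Python. It assumes the database functions follow these standard definitions; it does not call SQLAlchemy or Oracle, and the function names here only mirror the method names for readability.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity; undefined if either vector is all zeros."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```

For example, `l2_distance([0, 0], [3, 4])` gives `5.0`, and two orthogonal unit vectors such as `[1, 0]` and `[0, 1]` have a cosine distance of `1.0`. Ordering query results by one of these values is what makes nearest-neighbor search over a VECTOR column possible.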
Checklist
This pull request is:
must include a complete example of the issue. one line code fixes without an
issue and demonstration will not be accepted.
Fixes: #<issue number>
in the commit message
include a complete example of how the feature would look.
Fixes: #<issue number>
in the commit message
Have a nice day!