Added vector datatype support in Oracle dialect #12321


Closed


suraj-ora-2020
Contributor

@suraj-ora-2020 suraj-ora-2020 commented Feb 6, 2025

Description

This PR adds support for the VECTOR datatype in the Oracle dialect of SQLAlchemy, which corresponds to the vector data type introduced in Oracle Database 23ai. This new VECTOR type allows for efficient storage and retrieval of high-dimensional vector data, enabling SQLAlchemy users to leverage vector-based queries. By treating VECTOR as a "first-class" type, this integration makes it easier to work with Oracle’s vector-based capabilities directly in Python.

Fixes: #12317

Checklist

This pull request is:

  • A documentation / typographical / small typing error fix
    • Good to go, no issue or tests are needed
  • A short code fix
    • please include the issue number, and create an issue if none exists, which
      must include a complete example of the issue. one line code fixes without an
      issue and demonstration will not be accepted.
    • Please include: Fixes: #<issue number> in the commit message
    • please include tests. one line code fixes without tests will not be accepted.
  • A new feature implementation
    • please include the issue number, and create an issue if none exists, which must
      include a complete example of how the feature would look.
    • Please include: Fixes: #<issue number> in the commit message
    • please include tests.

Have a nice day!

@suraj-ora-2020
Contributor Author

@CaselIT I have created a new PR. This one only has the changes related to the vector datatype; I will create a separate PR for fetch_type.

@CaselIT
Member

CaselIT commented Feb 6, 2025

Thank you, we will take a look soon!

@edcuba

edcuba commented Mar 3, 2025

Hi, is there any update on this? Thanks!

Member

@zzzeek zzzeek left a comment

I would recommend using dataclasses + enums for this index configuration, since all the dictionary / hardcoded keywords / checks are too verbose for modern Python and place a burden on the developer to copy and paste from docs, rather than allowing their IDE to help with most of it.

I think we will need some more passes through the tests as well to clean things up

@CaselIT any comments?

if using == "HNSW":
parts.append("ORGANIZATION INMEMORY NEIGHBOR GRAPH")
elif using == "IVF":
parts.append("ORGANIZATION NEIGHBOR PARTITIONS")
Member

if using == "XYZ" what happens?

Member

all of this validation stuff I think should be handled using dataclasses + enums up front for this config since this is very complex config
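A minimal sketch of the enum approach for the `using == "XYZ"` question, assuming only the two index types from the if/elif snippet above. `IndexType` and `organization_clause()` are hypothetical names for illustration, not the dialect's code; the point is that an unknown value fails when the enum is constructed, instead of silently falling through the `if`/`elif`.

```python
from enum import Enum


class IndexType(Enum):
    """Hypothetical stand-in for the index-type enum discussed in this review."""

    HNSW = "HNSW"
    IVF = "IVF"


# One ORGANIZATION clause per index type, taken from the if/elif snippet above.
_ORGANIZATION = {
    IndexType.HNSW: "ORGANIZATION INMEMORY NEIGHBOR GRAPH",
    IndexType.IVF: "ORGANIZATION NEIGHBOR PARTITIONS",
}


def organization_clause(index_type: IndexType) -> str:
    # Every enum member has an entry, so there is no silent fall-through branch.
    return _ORGANIZATION[index_type]
```

With this shape, `IndexType("XYZ")` raises `ValueError` at the point where the configuration is built, and an IDE can offer `IndexType.HNSW` / `IndexType.IVF` as completions.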

Column("c1", VECTOR),
)

if testing.against("oracle>23.4"):
Member

this would be a @testing.fails_if("oracle<=23.4") decorator on top, and likely worked into test/requirements.py as something that can be re-used

(1, [6, 7]),
)
else:
with expect_raises_message(exc.DatabaseError, "ORA-03060"):
Member

don't need this part, fails_if handles it

@@ -591,6 +591,228 @@ def _remove_clob(inputsizes, cursor, statement, parameters, context):

.. versionadded:: 2.0.0 added support for the python-oracledb driver.

VECTOR Datatype
Member

this is the wrong file for general documentation, this should be in oracle/base.py

even if this only works with "oracledb" and not "cx_oracle", this is a SQL construct and is not driver-specific (it can be made to work with cx_oracle as well)

**top-level keys** and **nested keys**. This structure applies to both HNSW and IVF VECTOR indexes.

Top-Level Keys
==============
Member

this is almost definitely the wrong indentation level, ==== is likely level one header in these files

I would recommend doing a doc build

cd doc/build
python -m venv .venv
.venv/bin/pip install -r requirements.txt
source .venv/bin/activate
make clean autobuild


* ``distance``:
- Specifies the metric for calculating distance between VECTORS.
- **Valid Values**: ``"EUCLIDEAN"``, ``"COSINE"``, ``"DOT"``, ``"MANHATTAN"``.
Member

if you really want valid values like this I recommend using an enum.Enum
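For the distance setting, a sketch of what an `enum.Enum` buys over a documented list of valid strings, using the four values from the docs above. `DistanceMetric`, `parse_distance()` and `distance_clause()` are hypothetical names; accepting a case-insensitive string as a convenience is an assumption, not something the PR specifies.

```python
from enum import Enum
from typing import Union


class DistanceMetric(Enum):
    """Hypothetical distance enum; values copied from the 'Valid Values' list."""

    EUCLIDEAN = "EUCLIDEAN"
    COSINE = "COSINE"
    DOT = "DOT"
    MANHATTAN = "MANHATTAN"


def parse_distance(value: Union[str, DistanceMetric]) -> DistanceMetric:
    """Accept an enum member or a case-insensitive string; reject anything else."""
    if isinstance(value, DistanceMetric):
        return value
    try:
        return DistanceMetric(value.upper())
    except ValueError:
        valid = ", ".join(m.value for m in DistanceMetric)
        raise ValueError(
            f"invalid distance {value!r}; expected one of: {valid}"
        ) from None


def distance_clause(metric: DistanceMetric) -> str:
    # Rendered as a DDL fragment such as "DISTANCE COSINE".
    return f"DISTANCE {metric.value}"
```

The error message is generated from the enum itself, so the list of valid values cannot drift out of sync with the docs.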

Top-Level Keys
==============

These keys are specified directly under the ``oracle_vector`` dictionary.
Member

why not use a dataclass for this? This is a highly complex configuration, I'd use modern python constructs
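A minimal sketch of the dataclass suggestion, assuming a frozen dataclass whose fields are enums or optional integers, validated once in `__post_init__`. `IndexKind`, `IndexConfigSketch` and the field subset are hypothetical (loosely modelled on parameters discussed in this PR), not the eventual `VectorIndexConfig`; rejecting parameters that belong to the other index type is also an assumed design choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IndexKind(Enum):
    HNSW = "HNSW"
    IVF = "IVF"


# Hypothetical list of fields that must be plain integers when supplied.
_INT_FIELDS = ("accuracy", "parallel", "hnsw_neighbors", "ivf_neighbor_partitions")


@dataclass(frozen=True)
class IndexConfigSketch:
    """Hypothetical vector index configuration; an IDE can complete every field."""

    index_type: IndexKind = IndexKind.HNSW
    accuracy: Optional[int] = None
    parallel: Optional[int] = None
    hnsw_neighbors: Optional[int] = None
    ivf_neighbor_partitions: Optional[int] = None

    def __post_init__(self) -> None:
        if not isinstance(self.index_type, IndexKind):
            raise TypeError("index_type must be an IndexKind member")
        for name in _INT_FIELDS:
            value = getattr(self, name)
            # bool is a subclass of int, so exclude it explicitly.
            if value is not None and (
                not isinstance(value, int) or isinstance(value, bool)
            ):
                # Trailing space matters: adjacent f-strings join with no separator.
                raise TypeError(
                    f"{name} must be an integer if "
                    f"provided, got {type(value).__name__}"
                )
        # Assumed rule: reject parameters specific to the other index type.
        if self.index_type is IndexKind.HNSW and self.ivf_neighbor_partitions is not None:
            raise ValueError("ivf_neighbor_partitions applies only to IVF indexes")
        if self.index_type is IndexKind.IVF and self.hnsw_neighbors is not None:
            raise ValueError("hnsw_neighbors applies only to HNSW indexes")
```

Because validation runs at construction, a bad configuration fails where it is written rather than later during DDL compilation.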

@suraj-ora-2020
Contributor Author

@CaselIT I've incorporated the suggested feedback and updated the code accordingly.

@suraj-ora-2020 suraj-ora-2020 requested a review from zzzeek April 8, 2025 09:31
@zzzeek zzzeek requested review from CaselIT and sqla-tester April 8, 2025 11:15
Collaborator

@sqla-tester sqla-tester left a comment

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 5505f00 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Collaborator

New Gerrit review created for change 5505f00: https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

@zzzeek
Member

zzzeek commented Apr 8, 2025

Just needs pep8. I will need to review more closely since this is a big change.

@zzzeek
Member

zzzeek commented Apr 10, 2025

sorry we are moving slowly on this, can you install pre-commit and re-commit so pep8 fixes are implemented?

@suraj-ora-2020
Contributor Author

suraj-ora-2020 commented Apr 10, 2025

@zzzeek I have made the changes and I hope pep8 issues are fixed.

@zzzeek zzzeek requested a review from sqla-tester April 10, 2025 14:29
Collaborator

@sqla-tester sqla-tester left a comment

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 53d1fc4 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Collaborator

Patchset 53d1fc4 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

Member

@zzzeek zzzeek left a comment

I'm proposing moving the parameter-specific documentation to be in VectorIndexConfig and related classes, so that it will be much more cross-linked.

As this will greatly increase the line length of these classes, I propose that all of:

VECTOR, VectorIndexConfig, VectorDistanceType, VectorIndexType be moved to a new source file lib/sqlalchemy/dialects/oracle/vector.py since this is a big new subject of its own

When using Oracle VECTOR indexes, the configuration parameters are divided based on index type
**HNSW** and **IVF**. This structure applies to both HNSW and IVF VECTOR indexes.

Comman Attributes
Member

typo

Also I think 99% of this section should be in the docstring for VectorIndexConfig and not here in the main docstring

}


class VectorIndexType(Enum):
Member

this needs a docstring

IVF = "IVF"


class VectorDistanceType(Enum):
Member

needs a docstring, docstrings for elements as well



@dataclass
class VectorIndexConfig:
Member

this needs a docstring also, but it also has to list out these params. since there is no __init__ method, let's try documenting the params here. Additionally, as they seem to be specific to various VectorIndexType combinations, that should be here as well:

@dataclass
class VectorIndexConfig:
    """Defines the configuration for < accurate terminology here >

    :param index_type: the <XYZ> - applies to :attr:`.VectorIndexType.HNSW`, <others>
    :param distance: <QPR> - applies to :attr:`.VectorIndexType.FOO`, :attr:`.VectorIndexType.BAR`
    # ... etc.

    """

)


def build_vector_index_config(vector_index_config: VectorIndexConfig) -> str:
Member

this should be an underscored method on OracleCompiler
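A sketch of the shape this could take once the function becomes an underscored method on the compiler. `CompilerSketch`, `_build_vector_index_config()` and the small `IndexConfig` dataclass are hypothetical stand-ins; the real `OracleCompiler` has its own structure that this does not reproduce. The emitted clauses (ORGANIZATION, DISTANCE, WITH TARGET ACCURACY, PARALLEL) follow my reading of Oracle's CREATE VECTOR INDEX syntax and should be checked against the Oracle documentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class IndexKind(Enum):
    HNSW = "HNSW"
    IVF = "IVF"


@dataclass(frozen=True)
class IndexConfig:
    index_type: IndexKind = IndexKind.HNSW
    distance: Optional[str] = None  # e.g. "COSINE"; an enum in a fuller version
    accuracy: Optional[int] = None
    parallel: Optional[int] = None


class CompilerSketch:
    """Hypothetical stand-in for a dialect compiler; only the index helper is shown."""

    _organization = {
        IndexKind.HNSW: "ORGANIZATION INMEMORY NEIGHBOR GRAPH",
        IndexKind.IVF: "ORGANIZATION NEIGHBOR PARTITIONS",
    }

    def _build_vector_index_config(self, config: IndexConfig) -> str:
        # Leading underscore: an implementation detail of the compiler,
        # not a public module-level function users would import.
        parts: List[str] = [self._organization[config.index_type]]
        if config.distance is not None:
            parts.append(f"DISTANCE {config.distance}")
        if config.accuracy is not None:
            parts.append(f"WITH TARGET ACCURACY {config.accuracy}")
        if config.parallel is not None:
            parts.append(f"PARALLEL {config.parallel}")
        return " ".join(parts)
```

Keeping the helper on the compiler means a dialect subclass could override it, which a free function would not allow.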


* ``distance``:
- Specifies the metric for calculating distance between VECTORS.
- **Valid Values**: Enum values from `VectorDistanceType` (`EUCLIDEAN`, `COSINE`, `DOT`, `MANHATTAN`).
Member

so throughout all these docs, the classes have to be referred to like this, using sphinx paramlinks directives:

- **Valid Values**: Enum values from :class:`.VectorDistanceType`

- Specifies the indexing method. For HNSW, this must be :attr:`.VectorIndexType.HNSW`

Then these classes need to be added to https://github.com/sqlalchemy/sqlalchemy/blob/main/doc/build/dialects/oracle.rst using .. autoclass:: - VECTOR should be added to the types section, then the index constructs will be in a new section, Oracle DDL Constructs, which will have a dashed underline ------. Then they will be linked from these docs.

* ``parallel``:
- Specifies degree of parallelism

HNSW Parameters
Member

So the descriptions of these parameters will be in the docstring for VectorIndexConfig; they don't need to be fully expanded upon here.

}


class VectorIndexType(Enum):
HNSW = "HNSW"
IVF = "IVF"
Member

the elements can have docstrings also

class VectorIndexType(Enum):
    HNSW = "HNSW"
    """The HNSW index type.

    Configuration parameters for this type include :paramref:`.VectorIndexConfig.distance`, :paramref:`.VectorIndexConfig.foobar`

    """

    IVF = "IVF"
    """The IVF index type

    etc

    """

@zzzeek
Member

zzzeek commented Apr 10, 2025

it looks like a class called VectorStorageFormat has gotten lost

@suraj-ora-2020
Contributor Author

suraj-ora-2020 commented Apr 10, 2025

it looks like a class called VectorStorageFormat has gotten lost

Yes, this line was missing from the base.py file: from .types import VectorStorageFormat. And thanks for reviewing, I will update the PR as per your suggestion.

Collaborator

@sqla-tester sqla-tester left a comment

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision a0b3fa8 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Collaborator

Patchset a0b3fa8 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

Collaborator

@sqla-tester sqla-tester left a comment

Michael Bayer (zzzeek) wrote:

code review left on gerrit

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

  • lib/sqlalchemy/dialects/oracle/base.py (line 1414): I just realized this method is copied from base compiler.py. All it does is add APPROX to the final result. Can we just use the super() implementation and simply inject APPROX right after the ROWS indicator
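A sketch of the "reuse `super()` and inject APPROX" idea, using stand-in classes rather than SQLAlchemy's real compiler. `BaseCompilerSketch.fetch_clause()` only mimics the general shape of an ANSI `OFFSET n ROWS FETCH FIRST n ROWS ONLY` clause, and placing `APPROX` directly after `FETCH` is an assumption based on Oracle's `FETCH APPROX FIRST n ROWS ONLY` form; the exact position should be checked against the Oracle docs and the base compiler's real output.

```python
from typing import Optional


class BaseCompilerSketch:
    """Hypothetical base class producing an ANSI-style row-limiting clause."""

    def fetch_clause(self, limit: int, offset: Optional[int] = None) -> str:
        text = ""
        if offset is not None:
            text += f" OFFSET {offset} ROWS"
        text += f" FETCH FIRST {limit} ROWS ONLY"
        return text


class ApproxCompilerSketch(BaseCompilerSketch):
    """Reuses the parent's clause and only adds the APPROX keyword."""

    def fetch_clause(
        self, limit: int, offset: Optional[int] = None, approximate: bool = False
    ) -> str:
        text = super().fetch_clause(limit, offset)
        if approximate:
            # Insert once, right after FETCH; everything else is inherited.
            text = text.replace(" FETCH ", " FETCH APPROX ", 1)
        return text
```

The override stays a few lines long, and any future fix to the parent's clause is picked up automatically instead of living in a copied method.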

Collaborator

@sqla-tester sqla-tester left a comment

Michael Bayer (zzzeek) wrote:

code review left on gerrit

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

  • lib/sqlalchemy/dialects/oracle/base.py (line 1414): Done

Collaborator

@sqla-tester sqla-tester left a comment

Federico Caselli (CaselIT) wrote:

some docs and test suggestions.

also reflection is not tested

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

  • lib/sqlalchemy/dialects/oracle/base.py (line 740): 8-bit unsigned is called binary? It may be worth mentioning that in parentheses,
    something like 8-bit unsigned integers (binary)
  • lib/sqlalchemy/dialects/oracle/base.py (line 1433): no need of r here
  • lib/sqlalchemy/dialects/oracle/vector.py (line 247): since this is used in a bind processor it may make sense to cache this mapping, either in the VECTOR class or at the module level
  • lib/sqlalchemy/sql/selectable.py (line 3894): @mike I guess this kinda overlaps with Select.ext, but we don't have it available for 2.0.

I guess moving forward simple boolean modifiers of existing select features are fine as dialect kwargs, but more complex things would need to be extension, otherwise it could seem a bit arbitrary what is a dialect kwargs and what's an extension


vector_data_8 = [1, 2, 3]
statement = insert(t1)
with engine.connect() as conn:
Collaborator

Federico Caselli (CaselIT) wrote:

these 3 sections above would be nicer if word wrapped to ~ 80 chars.

it's just a preference, the text is otherwise fine

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838


:param distance: Enum values from :class:`.VectorDistanceType`
specifies the metric for calculating distance between VECTORS.

Collaborator

Federico Caselli (CaselIT) wrote:

I guess reflecting these is not easy, but we could just test that it doesn't break anything?

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838


:param parallel: integer. Specifies degree of parallelism.

:param hnsw_neighbors: integer. Should be in the range 0 to
Collaborator

Federico Caselli (CaselIT) wrote:

values -> value

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

2048. Specifies the number of nearest neighbors considered
during the search The attribute :attr:`VectorIndexConfig.hnsw_neighbors`
is HNSW index specific.

Collaborator

Federico Caselli (CaselIT) wrote:

values -> value

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

0 to 10,000,000. Specifies the number of partitions used to
divide the dataset. The attribute
:attr:`VectorIndexConfig.ivf_neighbour_partitions` is IVF index
specific.
Collaborator

Federico Caselli (CaselIT) wrote:

missing period after search

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

"""

index_type: VectorIndexType = VectorIndexType.HNSW
distance: Optional[VectorDistanceType] = None
Collaborator

Federico Caselli (CaselIT) wrote:

missing _ in the param name

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

distance: Optional[VectorDistanceType] = None
accuracy: Optional[int] = None
hnsw_neighbors: Optional[int] = None
hnsw_efconstruction: Optional[int] = None
Collaborator

Federico Caselli (CaselIT) wrote:

missing _ in the param name

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

ivf_sample_per_partition: Optional[int] = None
ivf_min_vectors_per_partition: Optional[int] = None
parallel: Optional[int] = None

Collaborator

Federico Caselli (CaselIT) wrote:

I'm guessing this is missing

The attribute :attr:VectorIndexConfig.ivf_min_vectors_per_partition is IVF index specific.

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

should be an integer value.

:param storage_format: VectorStorageFormat. The VECTOR storage
type format. This may be Enum values from
Collaborator

Federico Caselli (CaselIT) wrote:

do we need to make some changes to support reflection? if it already works a test that leverages it would be nice to have

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

Collaborator

Michael Bayer (zzzeek) wrote:

i think we definitely would need additional logic to support reflection since the type accepts arguments, and one of them is quite complex. The VECTOR would have to be added to the map of type lookups to even come back from reflection so yes, this is not done at all

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

@@ -951,6 +958,194 @@ def test_longstring(self, metadata, connection):
finally:
exec_sql(connection, "DROP TABLE Z_TEST")

@testing.fails_if("oracle<=23.4")
Collaborator

Federico Caselli (CaselIT) wrote:

would only_on be better here (also in the others)? it makes little sense to try it on older versions since we know vector is not there

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

@CaselIT
Member

CaselIT commented Apr 27, 2025

@suraj-ora-2020 the change was modified on gerrit, so either Mike or I will address the comments. Thanks for the contribution!

suraj-ora-2020 added a commit to suraj-ora-2020/sqlalchemy that referenced this pull request Apr 28, 2025
Added new datatype :class:`_oracle.VECTOR` and accompanying DDL and DQL
support to fully support this type for Oracle Database. This change
includes the base :class:`_oracle.VECTOR` type that adds new type-specific
methods ``l2_distance``, ``cosine_distance``, ``inner_product`` as well as
new parameters ``oracle_vector`` for the :class:`.Index` construct,
allowing vector indexes to be configured, and ``oracle_fetch_approximate``
for the :meth:`.Select.fetch` clause.  Pull request courtesy Suraj Shaw.

Fixes: sqlalchemy#12317
Fixes: sqlalchemy#12341
Closes: sqlalchemy#12321
Pull-request: sqlalchemy#12321
Pull-request-sha: a0b3fa8

Change-Id: I6f3af4623ce439d0820c14582cd129df293f0ba8
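For readers new to vector search, a pure-Python sketch of what the three comparison methods named in this commit message compute mathematically. This is not how the dialect implements them (there they are rendered as SQL and evaluated by the database), and the database's own conventions may differ, for example in the sign used for dot-product distance. The helper functions below are hypothetical.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: sqrt(sum((a_i - b_i)^2))."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: sum(a_i * b_i)."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 2 for opposite directions."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / norm
```

Cosine distance ignores vector length (`[1, 2]` and `[2, 4]` are at distance 0), while L2 distance does not, which is why the choice of metric matters for a vector index.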
@CaselIT CaselIT requested a review from sqla-tester April 30, 2025 13:47
Collaborator

@sqla-tester sqla-tester left a comment

OK, this is sqla-tester setting up my work on behalf of CaselIT to try to get revision 0a471c3 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Collaborator

Patchset 0a471c3 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

suraj-ora-2020 added a commit to suraj-ora-2020/sqlalchemy that referenced this pull request Apr 30, 2025
Added new datatype :class:`_oracle.VECTOR` and accompanying DDL and DQL
support to fully support this type for Oracle Database. This change
includes the base :class:`_oracle.VECTOR` type that adds new type-specific
methods ``l2_distance``, ``cosine_distance``, ``inner_product`` as well as
new parameters ``oracle_vector`` for the :class:`.Index` construct,
allowing vector indexes to be configured, and ``oracle_fetch_approximate``
for the :meth:`.Select.fetch` clause.  Pull request courtesy Suraj Shaw.

Fixes: sqlalchemy#12317
Fixes: sqlalchemy#12341
Closes: sqlalchemy#12321
Pull-request: sqlalchemy#12321
Pull-request-sha: a0b3fa8

Change-Id: I6f3af4623ce439d0820c14582cd129df293f0ba8

Added vector datatype support in Oracle dialect sqlalchemy#12321

Added vector datatype support in Oracle dialect#12321

Added vector datatype support in Oracle dialect#12321
@suraj-ora-2020
Contributor Author

@CaselIT @zzzeek Thanks for the feedback! I've made all the requested changes. Let me know if anything else needs updating.

@CaselIT CaselIT requested a review from sqla-tester April 30, 2025 19:07
Collaborator

@sqla-tester sqla-tester left a comment

OK, this is sqla-tester setting up my work on behalf of CaselIT to try to get revision 22c34f6 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Collaborator

Patchset 22c34f6 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

Collaborator

@sqla-tester sqla-tester left a comment

Michael Bayer (zzzeek) wrote:

code review left on gerrit

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838



@dataclass
class VectorIndexConfig:
Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

f"{field} must be an integer if "
f"provided, got {type(value).__name__}"
)

Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

suraj-ora-2020 added a commit to suraj-ora-2020/sqlalchemy that referenced this pull request May 2, 2025
Added new datatype :class:`_oracle.VECTOR` and accompanying DDL and DQL
support to fully support this type for Oracle Database. This change
includes the base :class:`_oracle.VECTOR` type that adds new type-specific
methods ``l2_distance``, ``cosine_distance``, ``inner_product`` as well as
new parameters ``oracle_vector`` for the :class:`.Index` construct,
allowing vector indexes to be configured, and ``oracle_fetch_approximate``
for the :meth:`.Select.fetch` clause.  Pull request courtesy Suraj Shaw.

Fixes: sqlalchemy#12317
Fixes: sqlalchemy#12341
Closes: sqlalchemy#12321
Pull-request: sqlalchemy#12321
Pull-request-sha: a0b3fa8

Change-Id: I6f3af4623ce439d0820c14582cd129df293f0ba8

Added vector datatype support in Oracle dialect sqlalchemy#12321

Added vector datatype support in Oracle dialect#12321

Added vector datatype support in Oracle dialect#12321

updated to static link

.. seealso::

`CREATE VECTOR INDEX <https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-B396C369-54BB-4098-A0DD-7C54B3A0D66F>`_ - in the Oracle documentation
Member

is that a stable link? the previous link looked a lot better

Contributor Author

Yes, it's a redirecting lookup link, so it does not need to be updated with each new database release. The earlier link had to be updated with every release.

Collaborator

@sqla-tester sqla-tester left a comment

Michael Bayer (zzzeek) wrote:

code review left on gerrit

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838


Oracle Database 23ai introduced a new VECTOR datatype for artificial intelligence
and machine learning search operations. The VECTOR datatype is a homogeneous array
of 8-bit signed integers, 8-bit unsigned integers (binary), 32-bit floating-point numbers,
Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
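To make "homogeneous array" concrete: Python's standard `array` module enforces the same one-element-type rule, and its typecodes line up with the element formats listed above (`"b"` for 8-bit signed, `"B"` for 8-bit unsigned, `"f"` for 32-bit float, `"d"` for 64-bit float). This is an illustration only; `FORMAT_TYPECODES` and the helper functions are hypothetical, and how the dialect or driver actually converts values is not shown.

```python
from array import array

# Hypothetical mapping from the element formats named above to array typecodes.
FORMAT_TYPECODES = {
    "int8": "b",     # 8-bit signed integers
    "binary": "B",   # 8-bit unsigned integers
    "float32": "f",  # 32-bit floating point
    "float64": "d",  # 64-bit floating point
}


def to_homogeneous_array(values, fmt: str) -> array:
    """Pack values into a single-typed array; out-of-range values raise OverflowError."""
    return array(FORMAT_TYPECODES[fmt], values)


def element_bits(fmt: str) -> int:
    """Size of one element in bits, as reported by the array module."""
    return array(FORMAT_TYPECODES[fmt]).itemsize * 8
```

For instance, packing `[1, 200]` as `"int8"` fails because 200 does not fit in a signed byte, which mirrors the range limits of an INT8 vector.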

from sqlalchemy import insert, select

with engine.begin() as conn:
conn.execute(
Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

use_literal_execute_for_simple_int
),
**kw,
)
Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838


.. versionadded:: 2.0.41

:param index_type: Enum value from :class:`.VectorIndexType`
Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

Specifies the indexing method. For HNSW, this must be
:attr:`.VectorIndexType.HNSW`.

:param distance: Enum value from :class:`.VectorDistanceType`
Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

:param ivf_sample_per_partition: integer. Should be between 1
and ``num_vectors / neighbor partitions``. Specifies the
number of samples used per partition. The attribute
:attr:`VectorIndexConfig.ivf_sample_per_partition` is IVF index
Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838


:param ivf_min_vectors_per_partition: integer. From 0 (no trimming)
to the total number of vectors (results in 1 partition). Specifies
the minimum number of vectors per partition. The attribute
Collaborator

Michael Bayer (zzzeek) wrote:

this needs to be formatted as:

:attr:`.VectorIndexConfig.ivf_min_vectors_per_partition`

note period and quotes

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

Contributor Author

Which one is right?

:attr:.VectorIndexConfig.ivf_min_vectors_per_partition

:attr:VectorIndexConfig.ivf_min_vectors_per_partition

In the first one there is a period before VectorIndexConfig. If that one is correct, then other places also need to be modified.

Member

the period allows sphinx to do a search to get the correct namespace for the object, which in this case would be something like sqlalchemy.dialects.oracle.VectorIndexConfig. that is, the period tells sphinx to substitute the full module path. if the parent document of this doc already sets up :module: to be sqlalchemy.dialects.oracle, then that's the default prefix in any case, but I tend not to rely on that in most cases since sphinx can be very unreliable.

background is at https://www.sphinx-doc.org/en/master/usage/domains/python.html#target-resolution

Collaborator

Michael Bayer (zzzeek) wrote:

Done

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838


def _array_typecode(self, typecode):
"""
Map storage format to array typecode.
Collaborator

Michael Bayer (zzzeek) wrote:

move "typecode_map" to be a class level variable of the VECTOR class and name it with a leading underscore

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

VectorStorageFormat.FLOAT32: "f", # Float
VectorStorageFormat.FLOAT64: "d", # Double
}
return typecode_map.get(typecode, "d")
Collaborator

Michael Bayer (zzzeek) wrote:

so this would read:

return self._typecode_map.get(typecode, "d")

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838
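A sketch of the suggested refactor: the mapping lives once on the class as `_typecode_map` instead of being rebuilt on every call, which matters because the lookup feeds a bind processor that runs for each parameter. `StorageFormat` and `VectorSketch` are hypothetical stand-ins for `VectorStorageFormat` and `VECTOR`; the FLOAT32/FLOAT64 entries and the `"d"` fallback follow the fragment above, while the INT8/BINARY typecodes and the simplified processor wiring are assumptions.

```python
from array import array
from enum import Enum
from typing import Callable, Optional, Sequence


class StorageFormat(Enum):
    """Hypothetical stand-in for VectorStorageFormat."""

    INT8 = "INT8"
    BINARY = "BINARY"
    FLOAT32 = "FLOAT32"
    FLOAT64 = "FLOAT64"


class VectorSketch:
    """Hypothetical stand-in for the VECTOR type; only bind handling is shown."""

    # Built once when the class is defined, shared by every instance and call.
    _typecode_map = {
        StorageFormat.INT8: "b",     # signed char (assumed)
        StorageFormat.BINARY: "B",   # unsigned char (assumed)
        StorageFormat.FLOAT32: "f",  # float
        StorageFormat.FLOAT64: "d",  # double
    }

    def __init__(self, storage_format: Optional[StorageFormat] = None) -> None:
        self.storage_format = storage_format

    def _array_typecode(self, typecode: Optional[StorageFormat]) -> str:
        # Unknown or unspecified formats fall back to 64-bit floats ("d").
        return self._typecode_map.get(typecode, "d")

    def bind_processor(
        self, dialect: object = None
    ) -> Callable[[Optional[Sequence[float]]], Optional[array]]:
        # dialect is unused here; SQLAlchemy passes it to real bind processors.
        typecode = self._array_typecode(self.storage_format)

        def process(value: Optional[Sequence[float]]) -> Optional[array]:
            # Lists become array.array values; None passes through unchanged.
            return None if value is None else array(typecode, value)

        return process
```

The typecode is also resolved once per processor rather than once per value, so the per-row work is just the `array(...)` conversion.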

@@ -951,6 +958,194 @@ def test_longstring(self, metadata, connection):
finally:
exec_sql(connection, "DROP TABLE Z_TEST")

@testing.only_on("oracle>=23.4")
Collaborator

Michael Bayer (zzzeek) wrote:

Done

Added new datatype :class:`_oracle.VECTOR` and accompanying DDL and DQL
support to fully support this type for Oracle Database. This change
includes the base :class:`_oracle.VECTOR` type that adds new type-specific
methods ``l2_distance``, ``cosine_distance``, ``inner_product`` as well as
new parameters ``oracle_vector`` for the :class:`.Index` construct,
allowing vector indexes to be configured, and ``oracle_fetch_approximate``
for the :meth:`.Select.fetch` clause.  Pull request courtesy Suraj Shaw.

Fixes: sqlalchemy#12317
Fixes: sqlalchemy#12341
Closes: sqlalchemy#12321
Pull-request: sqlalchemy#12321
Pull-request-sha: a0b3fa8

Change-Id: I6f3af4623ce439d0820c14582cd129df293f0ba8
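The three comparison methods named in this commit message (`l2_distance`, `cosine_distance`, `inner_product`) build SQL expressions, and the database computes the result. For readers unfamiliar with the measures, here is a plain-Python sketch of the textbook formulas the methods are named after. It is not SQLAlchemy or Oracle code, and it is an unverified assumption that the database returns exactly these values: in particular, Oracle may report the dot product with a negated sign so that smaller always means closer, so check the Oracle documentation before comparing numbers.

```python
import math


def _check_dims(a, b):
    # Both inputs must have the same number of dimensions.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def inner_product(a, b):
    """Textbook dot product: sum of element-wise products."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity; raises ZeroDivisionError for zero vectors."""
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norms


print(l2_distance([0, 0], [3, 4]))  # prints 5.0
```

Cosine distance ignores vector length (parallel vectors score 0), while L2 distance does not, which is why an index or query usually picks one metric to match how the embeddings were trained.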

Added vector datatype support in Oracle dialect sqlalchemy#12321

Added vector datatype support in Oracle dialect#12321

updated to static link

some doc modification
@zzzeek zzzeek requested a review from sqla-tester May 5, 2025 15:14
Collaborator

@sqla-tester sqla-tester left a comment

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision a72a18a of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Collaborator

Patchset a72a18a added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838

Collaborator

@sqla-tester sqla-tester left a comment

Michael Bayer (zzzeek) wrote:

code review left on gerrit

"""

def process(value):
if isinstance(value, array.array):
Collaborator

Michael Bayer (zzzeek) wrote:

Done
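The `process()` function in the diff fragment above appears to be a bind processor, the hook SQLAlchemy uses to convert a Python value before it is handed to the driver. A minimal sketch of how such a processor could treat vector values, assuming `make_bind_processor` and the mapping below are hypothetical helpers rather than the dialect's code: `None` and `array.array` values pass through, plain lists are converted with the storage format's typecode, and anything else is rejected.

```python
import array
import enum


class VectorStorageFormat(enum.Enum):
    """Hypothetical stand-in for the dialect's storage-format enum."""

    FLOAT32 = "FLOAT32"
    FLOAT64 = "FLOAT64"


# Hypothetical module-level copy of the typecode mapping from the diff.
_TYPECODE_MAP = {
    VectorStorageFormat.FLOAT32: "f",  # Float
    VectorStorageFormat.FLOAT64: "d",  # Double
}


def make_bind_processor(storage_format=None):
    """Return a function that prepares a Python value for a VECTOR bind."""
    # Unknown or unset formats fall back to "d" (double), as in the diff.
    typecode = _TYPECODE_MAP.get(storage_format, "d")

    def process(value):
        # NULLs and values that are already array.array pass through as-is.
        if value is None or isinstance(value, array.array):
            return value
        # Plain lists are converted to array.array using the chosen typecode.
        if isinstance(value, list):
            return array.array(typecode, value)
        raise TypeError("VECTOR accepts a list or array.array()")

    return process


process = make_bind_processor(VectorStorageFormat.FLOAT32)
bound = process([1.0, 2.0, 3.0])
```

Passing an existing `array.array` through untouched lets callers who already chose an element type keep it, while lists are converted once at bind time.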

    def _array_typecode(self, typecode):
        """
        Map storage format to array typecode.
        """
Collaborator

Michael Bayer (zzzeek) wrote:

Done

.. seealso::

    `Using VECTOR Data
    <https://python-oracledb.readthedocs.io/en/latest/user_guide/vector_data_type.html>`_ - in the Oracle documentation
Member

OK so we're going to point to the oracledb driver docs? OK. I will qualify the description here

Contributor Author

Is there anything I need to do regarding this?

Collaborator

@sqla-tester sqla-tester left a comment

Michael Bayer (zzzeek) wrote:

code review left on gerrit

deprecated.
Collaborator

Michael Bayer (zzzeek) wrote:

artifact

Collaborator

Michael Bayer (zzzeek) wrote:

Done

* :ref:`Index <genindex>` - Index for easy lookup of documentation topics
Collaborator

Michael Bayer (zzzeek) wrote:

artifact

Collaborator

Michael Bayer (zzzeek) wrote:

Done

@sqla-tester
Collaborator

Michael Bayer (zzzeek) wrote:

code review left on gerrit

@sqla-tester
Collaborator

Federico Caselli (CaselIT) wrote:

Thanks for the update. Looks good to me now, thanks

@sqla-tester
Collaborator

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5838 has been merged. Congratulations! :)

@sqla-tester
Collaborator

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5844 has been merged. Congratulations! :)

sqlalchemy-bot pushed a commit that referenced this pull request May 6, 2025
Added new datatype :class:`_oracle.VECTOR` and accompanying DDL and DQL
support to fully support this type for Oracle Database. This change
includes the base :class:`_oracle.VECTOR` type that adds new type-specific
methods ``l2_distance``, ``cosine_distance``, ``inner_product`` as well as
new parameters ``oracle_vector`` for the :class:`.Index` construct,
allowing vector indexes to be configured, and ``oracle_fetch_approximate``
for the :meth:`.Select.fetch` clause.  Pull request courtesy Suraj Shaw.

Fixes: #12317
Closes: #12321
Pull-request: #12321
Pull-request-sha: a72a18a

Change-Id: I6f3af4623ce439d0820c14582cd129df293f0ba8
(cherry picked from commit 1b780ce)
Development

Successfully merging this pull request may close these issues.

Add Support for Oracle VECTOR Datatype
5 participants