Proposal: Python SDK V2 #10122

Open
N-o-Z wants to merge 6 commits into master from design/lakefs-sdk-v2-10004

Conversation

Member

@N-o-Z N-o-Z commented Feb 10, 2026

Closes #10004

@N-o-Z N-o-Z requested review from a team and nopcoder February 10, 2026 19:47
@N-o-Z N-o-Z self-assigned this Feb 10, 2026
@N-o-Z N-o-Z added the proposal and exclude-changelog labels Feb 10, 2026
@N-o-Z N-o-Z force-pushed the design/lakefs-sdk-v2-10004 branch 2 times, most recently from 82d6577 to 3b16f8f February 10, 2026 19:51
@N-o-Z N-o-Z requested a review from ozkatz February 10, 2026 20:00
Comment on lines +64 to +68
| Model serialization | `.dict()`, `.json()` | `.model_dump()`, `.model_dump_json()` |
| Model deserialization | `.parse_obj()`, `.parse_raw()` | `.model_validate()`, `.model_validate_json()` |
| Model config | `class Config:` inner class | `model_config = ConfigDict(...)` |
| Validators | `@validator` | `@field_validator` |
| Argument validation | `@validate_arguments` | `@validate_call` |
Contributor

@skshetry skshetry Feb 11, 2026

In V2, `dict()`, `json()`, `parse_obj()`, `parse_raw()` and `construct()` still function, but they are deprecated and emit a `DeprecationWarning`.

The `Config` inner class is also deprecated (and still works), but many settings have been renamed or removed. The same goes for `@validator`.

`conint`/`constr` may also work (I can see them in the Pydantic V2 code).

Member Author

While Pydantic V2 does provide backward-compatibility for these methods, they are deprecated and subject to removal in a future release.
This document intentionally treats them as breaking changes to ensure we don't rely on transitional compatibility.

Contributor

> While Pydantic V2 does provide backward-compatibility for these methods, they are deprecated and subject to removal in a future release.

They are planning to drop them in the V3 release, not in V2 (pydantic/pydantic#10033).
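A quick way to check the behavior described in this thread, as a minimal sketch assuming Pydantic 2.x is installed (`Repository` is a hypothetical model, not the generated lakeFS class): the V1-style `.dict()` still returns the same data as `.model_dump()`, but it emits a warning whose category subclasses `DeprecationWarning`.

```python
import warnings

from pydantic import BaseModel


class Repository(BaseModel):
    # Hypothetical stand-in for a generated lakeFS model.
    id: str
    default_branch: str = "main"


repo = Repository(id="example-repo")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeated ones
    legacy = repo.dict()  # V1-style API: still works under Pydantic V2, but deprecated

modern = repo.model_dump()  # V2 replacement, no deprecation warning
```

Both calls produce the same dictionary; only the V1-style call records a deprecation warning.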

Contributor

Naive question: Does this mean the same lakefs_sdk code works with Pydantic V2 out of the box? I'm wondering if we can skip all of this at the price of working with deprecated methods.

Member Author

@N-o-Z N-o-Z Feb 12, 2026

`lakefs_sdk` does NOT work with Pydantic V2: its package requirements specify `pydantic < 2`.
This is a constraint of the openapi-generator version we are using, and we can't change it.
As the document explains in detail, our current workaround (WA) is a post-generation script that relies on the Pydantic V2 compatibility layer to support Pydantic V1 via the Pydantic V2 package. As of release 2.12 we can NO LONGER rely on this compatibility layer.
As long as we are tied to openapi-generator for creating the SDK, we will always be dependent on a specific Pydantic major version. So yes: when users request support for Pydantic V3, or when V2 is deprecated, we will need to deal with this again.
Unless we want to have a discussion on how we generate our SDK (perhaps we should implement it ourselves), I suggest we put this discussion aside.
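For reference, the config, validator and argument-validation rows of the table at the top of this thread look roughly like this in native Pydantic V2 code. This is a hand-written sketch assuming Pydantic 2.x; `BranchCreation` and `branch_ref` are hypothetical and simplified, not the code openapi-generator actually emits.

```python
from pydantic import BaseModel, ConfigDict, field_validator, validate_call


class BranchCreation(BaseModel):
    # V2 replaces the inner `class Config:` with a `model_config` attribute.
    model_config = ConfigDict(validate_assignment=True, extra="forbid")

    name: str
    source: str

    # V2 replaces `@validator` with `@field_validator`, applied to a classmethod.
    @field_validator("name")
    @classmethod
    def name_not_blank(cls, value: str) -> str:
        if not value.strip():
            raise ValueError("branch name must not be blank")
        return value


# V2 replaces `@validate_arguments` with `@validate_call`.
@validate_call
def branch_ref(repository: str, branch: str) -> str:
    """Hypothetical helper: arguments are validated before the body runs."""
    return f"lakefs://{repository}/{branch}"
```

Validation failures surface as `pydantic.ValidationError`, which subclasses `ValueError`.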

This means the current SDK will not work on Python 3.14+. At the same time, we cannot simply regenerate the existing `lakefs-sdk` package with Pydantic V2 code, because that would be a breaking change for users who depend on the current V1-based API (`.dict()`, `.json()`, `@validator`, etc.).

## Goals
1. Publish a new Python SDK package (`lakefs-sdk-v2` / `lakefs_sdk_v2`) generated with OpenAPI Generator v7.9.0, producing native Pydantic V2 code
Contributor

Would it be possible to release this as a major version bump of the existing package (i.e., lakefs-sdk v2) instead of publishing a separate package?

I see we have a guard in place in the lakeFS HL client:

`"lakefs-sdk>=1.50,< 2",`

Member Author

Unfortunately, the coupling between lakeFS and SDK versions prevents us from releasing a major SDK version without also releasing a major lakeFS version.

Member Author

Added clarification to the doc

Contributor

@skshetry skshetry Feb 12, 2026

I guess this means that when Pydantic releases V3 at the end of this year, or some other library makes a breaking change, we'll have to create a lakefs-sdk-v3.

Member Author

As long as we rely on openapi-generator for SDK creation, we are chained to the package dependencies it creates.
But that is no different from having our own package dependency and then getting a request to support a newer version of that package: we would be in the same position and would need to find a solution without breaking compatibility.
The underlying issue is the coupling of lakeFS and SDK versions, which, as far as I'm aware, we don't intend to solve at the moment.

Contributor

@nopcoder nopcoder left a comment

I want to confirm my understanding: switching to the new generated SDK does not change the names of the functions, and the main change is in how we convert the generated structures to JSON? User scripts that call the API will continue to work without any change, unless they use the async option or the structure conversion methods?

Contributor

@arielshaqed arielshaqed left a comment

Thanks! This will be a good step forwards. The current proposal explains the plan, and what the internal changes would look like. I would like a better understanding of what the user of the new generated lakeFS SDK will see.

Contributor

I am not a fan of the name: having a lakeFS SDK v2 for lakeFS API version 1.76 will be confusing.

Does the user-visible API of the SDK change? If so, we probably need to change the current name lakefs-sdk.

If it will go away, let's call it lakefs-sdk-exp or something.

There is still the probable issue that old programs may start failing Pydantic - but then I would argue that they were already broken, and as such we do not need to support them.

Member Author

@N-o-Z N-o-Z Feb 11, 2026

This is described as part of the sunsetting plan.
I agree we should probably rename it to something else other than V2.
However we should be careful with the naming. This has implications on the pypi package and we won't be able to change it once we decide on it. Therefore lakefs-sdk-exp is not a very good choice.
For reference, the previous legacy version was called lakefs-client. We will need to come up with a new name which is not a temporary one.

Member Author

Just throwing a bad idea in the air - it is possible to revive the lakefs-client package with the new version.
Technically it's been enough time for the deprecated package to be forgotten (I doubt there's any user that uses it)


## Non-Goals
1. Changing the API surface of the high-level Python SDK wrapper (`lakefs` package) - the wrapper should continue to work identically from the user's perspective
2. Supporting Pydantic V1 in the new SDK - the new package requires Pydantic >= 2.0
Contributor

We will need to communicate this.

Member Author

It will be part of the Python project requirements (Pydantic >= 2.0). I'm not sure we actually need to communicate it explicitly in other ways.

Contributor

If it has user-visible effect we may want to poll a few users, see if they can make it. Otherwise we will be stuck with both versions (deprecated lakefs-sdk that works for some users, new improved lakefs-sdk-better that cannot replace it until users let us).

Member Author

I don't think we have a commitment to support deprecated versions while providing new functionality.
Users can continue using the old SDK for the existing functionality if they are locked in a deprecated dependency. We have the version guardrails to make sure we are not introducing any breaking changes in lakeFS which will prevent them from using it with the old SDK. However, if they want to use any new functionality they will have to update their environment.
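The requirement itself lives in the package metadata, but a fail-fast runtime check could make the message explicit for users who are stuck on Pydantic V1. The sketch below is hypothetical (this proposal does not say the new SDK will ship such a check; `major_version` and `require_pydantic_v2` are illustrative names) and uses only `importlib.metadata` from the standard library:

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '2.12.3'."""
    return int(version_string.split(".", 1)[0])


def require_pydantic_v2() -> str:
    """Hypothetical import-time guard: fail early with a readable message."""
    try:
        installed = version("pydantic")
    except PackageNotFoundError as exc:
        raise ImportError("this SDK requires pydantic>=2.0, but pydantic is not installed") from exc
    if major_version(installed) < 2:
        raise ImportError(
            f"this SDK requires pydantic>=2.0, found pydantic {installed}; "
            "keep using the old lakefs-sdk package or upgrade Pydantic"
        )
    return installed
```

Calling such a guard at import time turns an obscure attribute error deep inside the generated models into a single actionable message.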


## Breaking Changes Between Old and New SDK
The following changes are introduced by the Pydantic V1 → V2 migration in the generated code:
Contributor

Please add a section on changes in the user-visible API.

Member Author

Done

Member Author

N-o-Z commented Feb 11, 2026

> I want to confirm my understanding: switching to the new generated SDK does not change the names of the functions, and the main change is in how we convert the generated structures to JSON? User scripts that call the API will continue to work without any change, unless they use the async option or the structure conversion methods?

Basically, yes.
I added a new section for user-visible changes.

Contributor

@arielshaqed arielshaqed left a comment

Thanks!

Ideally I would like this to override the lakefs-sdk package name. If we can pull that off, my ideal naming plan would be to call the new one lakefs-sdk-exp, keep both alive for a while, then rename the new one lakefs-sdk - and perhaps rename the old one lakefs-sdk-legacy if users scream.

When we had lakefs-client and lakefs-sdk, users would regularly mix them up. And people continued to use lakefs-client well after we stopped updating it. So if at all possible I don't want to have both.

The blocking issue right now is the old async API. The thing is, it is not particularly usable. Perhaps go through our community and ask if anyone cares about it? Alternatively, how much work would it take to hack it back into the code using the template?


`.dict()`, `.json()`, `.parse_obj()`, `.parse_raw()` are replaced by `.model_dump()`, `.model_dump_json()`, `.model_validate()`, `.model_validate_json()`. While Pydantic V2 still supports the old methods for backward compatibility, they are deprecated (they emit a `DeprecationWarning`) and Pydantic plans to remove them in V3, so code should be migrated to the V2 methods.
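A minimal round-trip sketch of the renamed methods, assuming Pydantic 2.x (`CommitInfo` is a hypothetical model, not the generated one); the V1 name each call replaces is noted in a comment:

```python
from typing import List

from pydantic import BaseModel


class CommitInfo(BaseModel):
    # Hypothetical model standing in for a generated lakeFS model.
    id: str
    message: str
    parents: List[str] = []


commit = CommitInfo(id="abc123", message="initial import")

as_dict = commit.model_dump()        # was .dict()
as_json = commit.model_dump_json()   # was .json()

from_dict = CommitInfo.model_validate(as_dict)       # was .parse_obj()
from_json = CommitInfo.model_validate_json(as_json)  # was .parse_raw()
```

In most user code the migration is a mechanical rename of these four calls.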

**Internal parameters are now explicit instead of `**kwargs`**
The old SDK accepted parameters like `_request_timeout` and `_headers` via `**kwargs`. The new SDK declares them as explicit keyword arguments with proper type annotations. Existing code that passes these parameters by name continues to work unchanged.
Contributor

This is actually cool: we just pass these parameters through, so I feel safe leaving the decision about them to the underlying packages.
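A small illustration of what explicit parameters give callers. `get_repository` below is a hypothetical stub, not the generated method; it only mimics the shape described above (annotated `_request_timeout` and `_headers` parameters instead of `**kwargs`), and the timeout type is an assumption:

```python
import inspect
from typing import Dict, Optional, Tuple, Union

# Assumed timeout shape: a single total value or a (connect, read) pair.
Timeout = Union[None, float, Tuple[float, float]]


def get_repository(
    repository: str,
    _request_timeout: Timeout = None,
    _headers: Optional[Dict[str, str]] = None,
) -> dict:
    """Hypothetical stub shaped like a generated endpoint method."""
    return {"id": repository, "timeout": _request_timeout, "headers": _headers or {}}


# Callers that already pass these parameters by name keep working unchanged.
result = get_repository("example-repo", _request_timeout=(3.0, 30.0), _headers={"X-Trace": "1"})

# Unlike **kwargs, the parameters are now visible to IDEs, type checkers and introspection.
params = inspect.signature(get_repository).parameters
```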


**`async_req` is removed**
The old SDK offered thread-pool-based async via `async_req=True`, returning a thread handle. This is removed in the new SDK. Users who rely on this can use `concurrent.futures.ThreadPoolExecutor` or `asyncio.to_thread()` instead.
Contributor

Do we have any way to estimate how many people use this?
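For users who relied on `async_req=True`, the two alternatives named above could look like this. A minimal sketch using only the standard library (Python 3.9+ for `asyncio.to_thread`); `get_repository` is a hypothetical stand-in for a blocking SDK call:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def get_repository(repository: str) -> dict:
    """Hypothetical stand-in for a blocking, synchronous SDK call."""
    return {"id": repository}


# Old style (removed in the new SDK):
#   handle = api.get_repository("repo", async_req=True); handle.get()

# Option 1: submit the blocking call to a thread pool and collect the futures.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(get_repository, name) for name in ("repo-a", "repo-b")]
    pooled = [f.result() for f in futures]


# Option 2: from asyncio code, run each blocking call in a worker thread.
async def fetch_all(names):
    return await asyncio.gather(*(asyncio.to_thread(get_repository, n) for n in names))


gathered = asyncio.run(fetch_all(["repo-a", "repo-b"]))
```

The thread pool keeps calling code synchronous, while `asyncio.to_thread` suits callers already running an event loop. Neither gives true non-blocking I/O; that would need a client generated for asyncio.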


**New `*_without_preload_content()` method variants**
Each endpoint gains a third method variant (in addition to the existing `*_with_http_info()`) that returns the raw HTTP response without deserializing the body, useful for streaming large responses.
Contributor

Yay, this is very useful!
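A sketch of consuming such a raw response in chunks. It assumes the variant returns a `urllib3`-style response exposing `stream()` and `release_conn()` (as `urllib3.HTTPResponse` does); `copy_response`, the method name in the comment, and the `FakeResponse` test double are hypothetical:

```python
import io
from typing import BinaryIO, Iterator


def copy_response(response, dest: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy a streamed HTTP body to `dest` without loading it all into memory."""
    written = 0
    try:
        for chunk in response.stream(chunk_size):
            dest.write(chunk)
            written += len(chunk)
    finally:
        # Return the connection to the pool even if writing fails.
        response.release_conn()
    return written


class FakeResponse:
    """Test double mimicking the two urllib3.HTTPResponse methods used above."""

    def __init__(self, body: bytes):
        self._body = body
        self.released = False

    def stream(self, amt: int) -> Iterator[bytes]:
        for start in range(0, len(self._body), amt):
            yield self._body[start:start + amt]

    def release_conn(self) -> None:
        self.released = True


# Hypothetical real usage:
#   resp = objects_api.get_object_without_preload_content(repository, ref, path)
#   with open("local.bin", "wb") as f:
#       copy_response(resp, f)
fake = FakeResponse(b"x" * 150_000)
buffer = io.BytesIO()
copied = copy_response(fake, buffer)
```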

Contributor

Suggested change
The old SDK offered thread-pool-based async via `async_req=True`, returning a thread handle. This is removed in the new SDK. Users who rely on this can use `concurrent.futures.ThreadPoolExecutor` or `asyncio.to_thread()` instead.
The old SDK offered thread-pool-based async via `async_req=True`, returning a thread handle. This is removed in the new SDK. Users who rely on this can use `concurrent.futures.ThreadPoolExecutor` or `asyncio.to_thread()` instead.
_Breaks backwards compatibility._


## Target Architecture
Contributor

After the sunsetting of lakefs_sdk, right?
I'm assuming there will be a transition period which is not manifested here

Member Author

Absolutely


## Implementation Plan
### Phase 1: New SDK alongside old (this branch)
**Status: Done** (`claude/update-python-sdk-KyIG4`)
Contributor

Done because we already published it?!

Member Author

Ignore this; I used Claude to create a new Python package to research the changes to the API.

| Risk | Mitigation |
|-----------------------------------------------------------------|------------------------------------------------------------------------------------------------|
| HL wrapper breakage during migration | Run full unit + integration test suite against `lakefs-sdk-v2` before merging Phase 2 |
| Users on Pydantic V1 cannot upgrade | Old `lakefs-sdk` continues to work; no forced upgrade |
Contributor

It continues to work, but only with the old APIs. We can always get a requirement for a new API that some customer X wants while they are stuck on the older version.
I'm not saying it's a showstopper, just that the risk is not fully mitigated.

Member Author

When we create a new client and declare the sunsetting of the old client it gives us 2 things:

  1. We are not required to be backward compatible (BC) with the old client: we don't have to make any guarantees, and we are allowed to change or break APIs and package dependencies.
  2. We inform users of the old client that we will not support it for much longer (including adding new functionality), and we give them time to prepare to transition their workflows to the new client.
    We've done this once before (lakefs-client) without much friction, and I believe this is still the least bad option available.

| Risk | Mitigation |
|-----------------------------------------------------------------|------------------------------------------------------------------------------------------------|
| OpenAPI Generator v7.9.0 generates subtly different API surface | Functional parity testing in Phase 1; diff generated code against old SDK |
Contributor

Maybe we can accept it? Since users are required to upgrade packages anyhow, the migration isn't seamless. Maybe we can accept some changes that are easy for the users to change, instead of trying to align the API surface which stays in the code forever.

Member Author

I'm not sure what you are suggesting here?
We are not trying to align any API - the only thing we want to solve here is the pydantic v1 dependency. All the other things are the consequence of updating the openapi-generator version for that purpose.

| Validators | `@validator` | `@field_validator` |
| Argument validation | `@validate_arguments` | `@validate_call` |
| Constrained types | `conint()`, `constr()` | `Annotated[int, Field(ge=...)]` |
| Async support | `async_req=True` (thread pool) | Native `asyncio` (`library=asyncio`) |
Contributor

@skshetry skshetry Feb 18, 2026

Reading the following issues, it seems that there is no way to create both a sync and an async client in the same project. (Please correct me if I am wrong.)

Are you proposing to create a native asyncio client? library=asyncio only generates async client, and library=urllib3 (default) only generates sync client AFAIU.
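Returning to the constrained-types row above: a minimal sketch, assuming Pydantic 2.x, of how a V1 `conint`/`constr` field is written with `Annotated` and `Field`. `PaginationParams` and its bounds are hypothetical and only illustrate the pattern:

```python
from typing import Annotated, Optional

from pydantic import BaseModel, Field, ValidationError


class PaginationParams(BaseModel):
    # V1: amount: conint(ge=0, le=1000) = 100
    # V2: the constraint moves into Annotated metadata via Field().
    amount: Annotated[int, Field(ge=0, le=1000)] = 100
    # V1: prefix: Optional[constr(min_length=1)] = None
    prefix: Optional[Annotated[str, Field(min_length=1)]] = None


try:
    PaginationParams(amount=1001)
except ValidationError as err:
    print(err.error_count(), "validation error")
```

Keeping the base type (`int`, `str`) visible in the annotation is also friendlier to type checkers than the V1 `conint`/`constr` call syntax.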

Contributor

@nopcoder nopcoder left a comment

Would it be possible to update the SDK to v7.1, which has a Pydantic V2 implementation that is supported by Python 3.14+?

Users who require Pydantic V1 can import the V1 compatibility layer, as described in https://docs.pydantic.dev/latest/migration/.

Our code (the Python wrapper) will need to be updated to use V2 and/or import the V1 compatibility layer.

And users will probably only need to understand that we upgraded.

@@ -0,0 +1,173 @@
# Python SDK V2 (V3...)
Contributor

Pydentic

@N-o-Z N-o-Z force-pushed the design/lakefs-sdk-v2-10004 branch from 2be0f11 to a08cfec March 2, 2026 21:29

Labels

exclude-changelog (PR description should not be included in next release changelog), mostly-human, proposal

Development

Successfully merging this pull request may close these issues.

Python SDK incompatible with Python 3.14+ due to Pydantic V1 compatibility layer

5 participants