Conversation
82d6577 to 3b16f8f
> | Model serialization | `.dict()`, `.json()` | `.model_dump()`, `.model_dump_json()` |
> | Model deserialization | `.parse_obj()`, `.parse_raw()` | `.model_validate()`, `.model_validate_json()` |
> | Model config | `class Config:` inner class | `model_config = ConfigDict(...)` |
> | Validators | `@validator` | `@field_validator` |
> | Argument validation | `@validate_arguments` | `@validate_call` |
In V2, dict(), json(), parse_obj(), parse_raw() and construct() still function, but they are deprecated and raise a DeprecationWarning.
The Config inner class is also deprecated (and still works), but a lot of settings have been renamed or removed. The same goes for @validator.
conint/constr may also work (I can see that in the pydantic v2 code).
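The deprecation behavior described above can be sketched as follows (assumes pydantic >= 2 is installed; the `Repo` model is illustrative, not a real SDK model):

```python
# V1 spellings still run under Pydantic V2 but emit a DeprecationWarning;
# the V2 spellings are the supported replacements.
import warnings
from pydantic import BaseModel

class Repo(BaseModel):
    name: str
    default_branch: str = "main"

repo = Repo(name="example")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = repo.dict()  # V1 spelling: still works, but warns
assert any(issubclass(w.category, DeprecationWarning) for w in caught)

current = repo.model_dump()  # V2 spelling, no warning
assert current == legacy == {"name": "example", "default_branch": "main"}
```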
While Pydantic V2 does provide backward-compatibility for these methods, they are deprecated and subject to removal in a future release.
This document intentionally treats them as breaking changes to ensure we don't rely on transitional compatibility.
> While Pydantic V2 does provide backward-compatibility for these methods, they are deprecated and subject to removal in a future release.
They are planning to drop these in the V3 release, not in V2: pydantic/pydantic#10033
Naive question: Does this mean the same lakefs_sdk code works with Pydantic V2 out of the box? I'm wondering if we can skip all of this at the price of working with deprecated methods.
lakefs_sdk DOES NOT work with pydantic v2. lakefs_sdk's package requirements specify pydantic < 2.
This is a constraint of the openapi-generator version we are using and we cannot change it.
As the document explains in detail - our current workaround was to run a post-generation script that relies on the pydantic v2 compatibility layer to support pydantic v1 via the pydantic v2 package. As of release 2.12 we can NO LONGER rely on this compatibility layer.
As long as we are tied to the openapi-generator for creating the SDK we will always be dependent on a specific pydantic major version. This means that yes - when users request support for pydantic v3, or when v2 is deprecated, we will need to deal with it again.
Unless we want to have a discussion on how we generate our SDK (perhaps we should implement it ourselves), I suggest we put this discussion aside.
design/open/python-sdk-v2.md (outdated)
> This means the current SDK will not work on Python 3.14+. At the same time, we cannot simply regenerate the existing `lakefs-sdk` package with Pydantic V2 code, because that would be a breaking change for users who depend on the current V1-based API (`.dict()`, `.json()`, `@validator`, etc.).
>
> ## Goals
> 1. Publish a new Python SDK package (`lakefs-sdk-v2` / `lakefs_sdk_v2`) generated with OpenAPI Generator v7.9.0, producing native Pydantic V2 code
Would it be possible to release this as a major version bump of the existing package (i.e., lakefs-sdk v2) instead of publishing a separate package?
I see we have a guard in place in the lakeFS HL client:
Unfortunately, the coupling of lakeFS and SDK versions prevents us from releasing a major SDK version without also releasing a major lakeFS version.
Added clarification to the doc
I guess this means that when Pydantic releases a V3 at the end of this year, or some other library makes a breaking change, we'll have to create a lakefs-sdk-v3.
As long as we are relying on openapi-generator for the SDK creation, we are chained to the package dependencies it creates.
But that is no different from having our own package dependency and then getting a request to support a newer version of that package. We would be in the same position and would need to find a solution without breaking compatibility.
The underlying issue is the coupling of lakeFS and SDK versions, which we don't intend to solve at the moment, as far as I'm aware.
nopcoder
left a comment
I want to confirm my understanding: switching to the new generated SDK does not modify function names, and the main change is in how the generated structures are converted to JSON? User scripts that call the API will continue to work without any change, unless they used the async option or the structure-conversion methods?
arielshaqed
left a comment
Thanks! This will be a good step forward. The current proposal explains the plan and what the internal changes would look like. I would like a better understanding of what the user of the new generated lakeFS SDK will see.
design/open/python-sdk-v2.md (outdated)
> This means the current SDK will not work on Python 3.14+. At the same time, we cannot simply regenerate the existing `lakefs-sdk` package with Pydantic V2 code, because that would be a breaking change for users who depend on the current V1-based API (`.dict()`, `.json()`, `@validator`, etc.).
>
> ## Goals
> 1. Publish a new Python SDK package (`lakefs-sdk-v2` / `lakefs_sdk_v2`) generated with OpenAPI Generator v7.9.0, producing native Pydantic V2 code
I am not a fan of the name: having a lakeFS SDK v2 for lakeFS API version 1.76 will be confusing.
Does the user-visible API of the SDK change? If so, we probably need to change the current name lakefs-sdk.
If it will eventually go away, let's call it lakefs-sdk-exp or something.
There is still the probable issue that old programs may start failing on Pydantic - but then I would argue that they were already broken, and as such we do not need to support them.
This is described as part of the sunsetting plan.
I agree we should probably rename it to something else other than V2.
However we should be careful with the naming. This has implications on the pypi package and we won't be able to change it once we decide on it. Therefore lakefs-sdk-exp is not a very good choice.
For reference, the previous legacy version was called lakefs-client. We will need to come up with a new name which is not a temporary one.
Just throwing a bad idea in the air - it is possible to revive the lakefs-client package with the new version.
Technically, enough time has passed for the deprecated package to be forgotten (I doubt there's any user still using it).
> ## Non-Goals
> 1. Changing the API surface of the high-level Python SDK wrapper (`lakefs` package) - the wrapper should continue to work identically from the user's perspective
> 2. Supporting Pydantic V1 in the new SDK - the new package requires Pydantic >= 2.0
We will need to communicate this.
It will be part of the Python project requirements (pydantic >= 2.0). I'm not sure we actually need to communicate it explicitly in other ways.
If it has user-visible effect we may want to poll a few users, see if they can make it. Otherwise we will be stuck with both versions (deprecated lakefs-sdk that works for some users, new improved lakefs-sdk-better that cannot replace it until users let us).
I don't think we have a commitment to support deprecated versions while providing new functionality.
Users can continue using the old SDK for existing functionality if they are locked into a deprecated dependency. We have version guardrails to make sure we are not introducing breaking changes in lakeFS that would prevent them from using it with the old SDK. However, if they want any new functionality, they will have to update their environment.
> ## Breaking Changes Between Old and New SDK
> The following changes are introduced by the Pydantic V1 → V2 migration in the generated code:
Please add a section on changes in the user-visible API.
Basically yes.
arielshaqed
left a comment
Thanks!
Ideally I would like this to override the lakefs-sdk package name. If we can pull that off, my ideal naming plan would be to call the new one lakefs-sdk-exp, keep both alive for a while, then rename the new one lakefs-sdk - and perhaps rename the old one lakefs-sdk-legacy if users scream.
When we had lakefs-client and lakefs-sdk, users would regularly mix them up. And people continued to use lakefs-client well after we stopped updating it. So if at all possible I don't want to have both.
The blocking issue right now is the old async API. The thing is, it is not particularly usable. Perhaps go through our community and ask if anyone cares about it? Alternatively, how much work would it take to hack it back into the code using the template?
> `.dict()`, `.json()`, `.parse_obj()`, `.parse_raw()` are replaced by `.model_dump()`, `.model_dump_json()`, `.model_validate()`, `.model_validate_json()`. While Pydantic V2 still supports these old methods for backward compatibility, they are deprecated and subject to removal in a future release - code should be migrated to the V2 methods.
>
> **Internal parameters are now explicit instead of `**kwargs`**
> The old SDK accepted parameters like `_request_timeout` and `_headers` via `**kwargs`. The new SDK declares them as explicit keyword arguments with proper type annotations. Existing code that passes these parameters by name continues to work unchanged.
This is actually cool - we pass-through on these parameters, I feel safe punting on them to the actual packages making a decision.
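The signature change discussed above can be sketched like this (the function names and bodies are made up for illustration, not the real SDK methods):

```python
# V1-generated methods swallowed client options via **kwargs; V2-generated
# methods declare the same options as typed, keyword-only arguments, so
# IDEs and type checkers can see them.
from typing import Dict, Optional

def get_object_old(path: str, **kwargs):
    # Old style: options hidden inside **kwargs.
    timeout = kwargs.get("_request_timeout")
    return path, timeout

def get_object_new(path: str, *,
                   _request_timeout: Optional[float] = None,
                   _headers: Optional[Dict[str, str]] = None):
    # New style: same parameters, now explicit and annotated.
    return path, _request_timeout

# Call sites that pass the options by name are unchanged:
assert (get_object_old("main/a.txt", _request_timeout=5.0)
        == get_object_new("main/a.txt", _request_timeout=5.0))
```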
> The old SDK accepted parameters like `_request_timeout` and `_headers` via `**kwargs`. The new SDK declares them as explicit keyword arguments with proper type annotations. Existing code that passes these parameters by name continues to work unchanged.
>
> **`async_req` is removed**
> The old SDK offered thread-pool-based async via `async_req=True`, returning a thread handle. This is removed in the new SDK. Users who rely on this can use `concurrent.futures.ThreadPoolExecutor` or `asyncio.to_thread()` instead.
Do we have any way to estimate how many people use this?
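A minimal sketch of the suggested replacement for `async_req=True` (the `list_repositories` function below is a stand-in for any blocking SDK call, not a real method):

```python
# Submit the blocking call to a thread pool and collect the result later,
# which is what async_req=True did behind the scenes in the old SDK.
from concurrent.futures import ThreadPoolExecutor

def list_repositories(prefix: str = ""):
    # Stand-in for a blocking SDK method.
    return [r for r in ("repo-a", "repo-b", "other") if r.startswith(prefix)]

with ThreadPoolExecutor(max_workers=4) as pool:
    future = pool.submit(list_repositories, prefix="repo")
    # ... do other work concurrently ...
    result = future.result()  # was: .get() on the old async_req thread handle

assert result == ["repo-a", "repo-b"]
```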
> The old SDK offered thread-pool-based async via `async_req=True`, returning a thread handle. This is removed in the new SDK. Users who rely on this can use `concurrent.futures.ThreadPoolExecutor` or `asyncio.to_thread()` instead.
>
> **New `*_without_preload_content()` method variants**
> Each endpoint gains a third method variant (in addition to the existing `*_with_http_info()`) that returns the raw HTTP response without deserializing the body, useful for streaming large responses.
Yay, this is very useful!
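To illustrate the three-variant pattern, here is a toy client modeled on the naming convention described above (the class, endpoint, and payload are all fake; only the method-name pattern reflects the generated code):

```python
# Each endpoint gets three variants: deserialized value, full HTTP info
# tuple, and a raw response whose body is left for the caller to stream.
import json

class _RawResponse:
    def __init__(self, data: bytes):
        self.data = data

class ToyRepositoriesApi:
    _body = b'{"id": "repo-a"}'

    def get_repository(self):
        # Variant 1: deserialized return value.
        return json.loads(self._body)

    def get_repository_with_http_info(self):
        # Variant 2: (data, status_code, headers) tuple.
        return json.loads(self._body), 200, {"Content-Type": "application/json"}

    def get_repository_without_preload_content(self):
        # Variant 3 (new): raw response, body not deserialized.
        return _RawResponse(self._body)

api = ToyRepositoriesApi()
assert api.get_repository() == {"id": "repo-a"}
assert api.get_repository_with_http_info()[1] == 200
assert api.get_repository_without_preload_content().data == b'{"id": "repo-a"}'
```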
> The old SDK accepted parameters like `_request_timeout` and `_headers` via `**kwargs`. The new SDK declares them as explicit keyword arguments with proper type annotations. Existing code that passes these parameters by name continues to work unchanged.
>
> **`async_req` is removed**
> The old SDK offered thread-pool-based async via `async_req=True`, returning a thread handle. This is removed in the new SDK. Users who rely on this can use `concurrent.futures.ThreadPoolExecutor` or `asyncio.to_thread()` instead.
Suggested change:
> The old SDK offered thread-pool-based async via `async_req=True`, returning a thread handle. This is removed in the new SDK. Users who rely on this can use `concurrent.futures.ThreadPoolExecutor` or `asyncio.to_thread()` instead.
> _Breaks backwards compatibility._
> ## Target Architecture
After the sunsetting of lakefs_sdk, right?
I'm assuming there will be a transition period, which is not reflected here.
design/open/python-sdk-v2.md (outdated)
> ## Implementation Plan
> ### Phase 1: New SDK alongside old (this branch)
> **Status: Done** (`claude/update-python-sdk-KyIG4`)
Done because we already published it?!
Ignore this - I used Claude to create a new Python package to research the API changes.
> | Risk | Mitigation |
> |------|------------|
> | HL wrapper breakage during migration | Run full unit + integration test suite against `lakefs-sdk-v2` before merging Phase 2 |
> | Users on Pydantic V1 cannot upgrade | Old `lakefs-sdk` continues to work; no forced upgrade |
It continues to work only for the old APIs. We can always get a requirement for a new API that some customer wants while being stuck on the older version.
I'm not saying it's a show stopper - just that the risk is not fully mitigated.
When we create a new client and declare the sunsetting of the old client, it gives us two things:
- We are not required to remain backward compatible with the old client - we don't have to make any guarantees, and we are allowed to change/break APIs and package dependencies
- We are informing the users of the old client that we are not going to support it for much longer (including adding new functionality), and giving them time to prepare to transition their workflows to the new client
We've done that once before (lakefs-client), without a lot of friction, and I believe this is still the least bad solution available.
> | HL wrapper breakage during migration | Run full unit + integration test suite against `lakefs-sdk-v2` before merging Phase 2 |
> | Users on Pydantic V1 cannot upgrade | Old `lakefs-sdk` continues to work; no forced upgrade |
> | OpenAPI Generator v7.9.0 generates subtly different API surface | Functional parity testing in Phase 1; diff generated code against old SDK |
Maybe we can accept it? Since users are required to upgrade packages anyhow, the migration isn't seamless. Maybe we can accept some changes that are easy for users to adapt to, instead of trying to align the API surface, which stays in the code forever.
I'm not sure what you are suggesting here?
We are not trying to align any API - the only thing we want to solve here is the Pydantic V1 dependency. Everything else is a consequence of updating the openapi-generator version for that purpose.
> | Validators | `@validator` | `@field_validator` |
> | Argument validation | `@validate_arguments` | `@validate_call` |
> | Constrained types | `conint()`, `constr()` | `Annotated[int, Field(ge=...)]` |
> | Async support | `async_req=True` (thread pool) | Native `asyncio` (`library=asyncio`) |
Reading the following issues, it seems that there is no way to create both a sync and an async client in the same project (please correct me if I am wrong):
- Python: Creating an async client OpenAPITools/openapi-generator#18407
- Doesn't support sync/async generation for the Python client, something like httpx OpenAPITools/openapi-generator#19255

Are you proposing to create a native asyncio client? `library=asyncio` only generates an async client, and `library=urllib3` (the default) only generates a sync client, AFAIU.
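The constrained-types row in the table quoted above translates as follows (a sketch assuming pydantic >= 2; the `PageSize` model is illustrative, not a real SDK model):

```python
# V1 used conint()/constr() factory functions; V2 expresses the same
# constraints as Annotated metadata on a plain type.
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError

class PageSize(BaseModel):
    # V1 equivalent: amount: conint(ge=1, le=1000)
    amount: Annotated[int, Field(ge=1, le=1000)]

assert PageSize(amount=10).amount == 10

try:
    PageSize(amount=0)  # violates ge=1
    raised = False
except ValidationError:
    raised = True
assert raised
```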
nopcoder
left a comment
Will it be possible to update the SDK to v7.1, which has a Pydantic V2 implementation that is supported by Python 3.14+?
Users that require Pydantic V1 can import the V1 compatibility layer as described in https://docs.pydantic.dev/latest/migration/.
Our code (the Python wrapper) will need to be updated to use V2 and/or import the V1 compatibility layer.
And users will probably only need to understand that we upgraded.
> @@ -0,0 +1,173 @@
> # Python SDK V2 (V3...)
Co-authored-by: Ariel Shaqed (Scolnicov) <[email protected]>
2be0f11 to a08cfec
Closes #10004