Conversation
I understand why this change is needed. But ideally we would recall why we had this in the first place, and what the cost of removing it will be.
The implication is that physical addresses that are uploaded but not linked within the 6-hour grace period are in danger of being removed by the GC process. @nopcoder I have a proposal for mitigation - since a big part of the risk here is linking a non-existing object, can we add a check that there's actually an object at the physical location before linking it?
@N-o-Z I think that checking the object during link will prevent use of the API while working with minimal permissions - as far as I know, the user can link data when the client has permission to upload it but lakeFS doesn't have permission to read it. @arielshaqed should we document this in the OpenAPI spec like we did for copy object?
In this case I think we need to discuss this change further.
|
📚 Documentation preview at https://pr-10180.docs-lakefs-preview.io/ (Updated: 2/27/2026, 8:06:01 AM - Commit: 6383a03)
|
@N-o-Z made it configurable and updated the description with the findings.
N-o-Z
left a comment
Thanks!
Can we add this to the lakeFS config please (maybe in capabilities)?
This solution is not complete until we use this configuration in the GC client
Can you add the reasoning for making it configurable? I thought we wanted to remove it completely.
Force-pushed from 81db273 to 6383a03
Updated - thanks!
Added it to the notes of the PR.
itaiad200
left a comment
I think we should reduce complexity, not add guardrails, configuration, and more functionality that is not really needed. I believe users shouldn't be aware of this, shouldn't be allowed to configure it, and we should not validate timestamps while linking.
The current guardrail prevents long-running uploads from being deleted. Unlike other operations, where you can run GC within a specific timeframe, uploads of large data are not easy to coordinate.
N-o-Z
left a comment
Thanks,
I'm only blocking because I think disabling the grace time altogether will be a major issue.
If it is already configurable - let the user configure it to as long as they wish.
Disabling it completely without understanding the GC implications could be disastrous.
Also, think about what the expected behavior for GC will be when the grace is 0.
pkg/api/controller.go
Outdated
```diff
 	writeResponse(w, r, http.StatusOK, apigen.GarbageCollectionConfig{
-		GracePeriod: swag.Int(int(catalog.LinkAddressTime.Seconds())),
+		GracePeriod: swag.Int(int(c.Catalog.LinkAddressExpiration.Seconds())),
```

Suggested change:

```diff
-		GracePeriod: swag.Int(int(c.Catalog.LinkAddressExpiration.Seconds())),
+		GarbageCollectionGracePeriod: swag.Int(int(c.Catalog.LinkAddressExpiration.Seconds())),
```

Or:

```diff
-		GracePeriod: swag.Int(int(c.Catalog.LinkAddressExpiration.Seconds())),
+		LinkAddressGracePeriod: swag.Int(int(c.Catalog.LinkAddressExpiration.Seconds())),
```
docs/src/reference/configuration.md
Outdated
* `graveler.ensure_readable_root_namespace` `(bool: true)` - When creating a new repository use this to verify that lakeFS has access to the root of the underlying storage namespace. Set `false` only if lakeFS should not have access (i.e. pre-sign mode only).
* `graveler.max_batch_delay` `(duration : 3ms)` - Controls the server batching period for references store operations.
* `graveler.background.rate_limit` `(int : 0)` - Requests per second limit on background work performed (default: 0 - unlimited), like deleting committed staging tokens.
* `graveler.link_address_expiration` `(duration : "6h")` - How long a pre-signed staging upload address remains valid. Set to `0` to disable the expiration check.
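If the setting ships, extending the grace period would presumably look like this in the lakeFS configuration file (the key follows the docs entry above; the 24h value is just an example):

```yaml
graveler:
  # How long a pre-signed staging upload address remains valid.
  # 0 would disable the expiration check, with the unresolved GC
  # implications discussed in this thread.
  link_address_expiration: 24h
```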
If we decided to make this configurable, I suggest having a minimal grace that makes sense (let's say 6 hours) and if a user needs more than that they can configure this to be as big as they want.
Disabling the grace altogether still creates an issue with GC
So let's extend the grace period from 6 hours to 24 hours and be done with it. I don't see the value of making this configurable.
Force-pushed from 6383a03 to c50eb05
@itaiad200 updated the code to set the time verification limit to 24h.
I just want to clarify the implications of this change.
Force-pushed from c50eb05 to 5d5de83
Thanks @N-o-Z, updated the client code too.
itaiad200
left a comment
Thanks.
I think this change is reasonable. 24 hours to complete a single file write to the storage and update it in lakeFS should be more than enough. The only downside of extending the grace period is that objects are deleted 18 hours later; I think it's worth the reduced complexity.
I still don't understand why we need to verify long-running multipart uploads. Are we concerned about a multipart upload taking more than 24 hours, completed in the object store (uncompleted MPUs are not GC'd) so it could be seen by the GC, but not registered in lakeFS? I find this race condition practically impossible: it could only occur if someone deliberately chooses not to complete MPUs, and we can document the implications when GC is running.
We can do this in a separate PR though.
Closes #10099