Add lakectl fs cp and lakectl fs mv commands#10083
nside wants to merge 4 commits into treeverse:master
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ae8cc65867
ae8cc65 to ea0cb71
Add two new commands to the lakectl CLI:
- `lakectl fs cp` - Copy objects within a repository
- `lakectl fs mv` - Move objects within a repository (copy + delete)

Both commands support:
- Single object and recursive (-r) operations
- Force flag (-f) to overwrite existing objects
- Configurable parallelism (-p) for concurrent operations
- Progress bar (disable with --no-progress)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
ea0cb71 to 35eaf09
arielshaqed left a comment
Thanks! Given the number of rough edges around the copyObject API, I will leave the overall CLI design to others who are more expert.
Right now I'm concerned about:
- Some very strange possible behaviours.
- (Error flooding when something goes wrong...).
- The -f flag seems to do something other than documented.
- This seems heavily AI-generated. The sheer number of lines of code will raise the maintenance burden. I would like to request more oversight; among other things:
- Unify significant part of copy and move behaviours.
- Use atomics where necessary, avoid where unnecessary.
- Readability of the tests - again, they are very hard to read. It took me a while to understand where to find tests for "mv". I still don't understand a method called mockCopyClient.DeleteObjectsWithResponse. Etc.
cmd/lakectl/cmd/fs_cp.go
Outdated
| errorsWg.Add(1) | ||
| go func() { |
Use errorsWg.Go. Even better use errgroup.
Done - switched to errorsWg.Go(func() {...}) pattern, matching the style in fs_rm.go.
cmd/lakectl/cmd/fs_cp.go
Outdated
| go func() { | ||
| defer errorsWg.Done() | ||
| for err := range errorCh { | ||
| fmt.Fprintln(os.Stderr, "Error:", err) |
This can flood the screen, and it is poorly formatted.
Fixed - now limits error output to 10 messages, then shows "(additional errors suppressed)".
cmd/lakectl/cmd/fs_cp.go
Outdated
| defer errorsWg.Done() | ||
| for err := range errorCh { | ||
| fmt.Fprintln(os.Stderr, "Error:", err) | ||
| atomic.AddInt64(&errors, 1) |
Why is this atomic? It will only be accessed after WaitGroup.Wait surely?
Removed the atomic - you're right, errorCount is only modified in the single error handler goroutine and read after Wait().
| {{ if .Errors }}Errors: {{ .Errors }} object(s){{ end }} | ||
| ` | ||
| var fsMvCmd = &cobra.Command{ |
This command is implemented quite similarly to the "copy" command above. Can we share code between them?
Done - unified into a single recursiveCopyMove() function with a deleteSource parameter. Reduced ~170 lines of duplication.
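To illustrate the idea of one code path with a `deleteSource` switch, here is a sketch for a single object. It does not reproduce `recursiveCopyMove`: the `objectStore` interface, the `memStore` type, and `copyOrMove` are hypothetical stand-ins, and the real commands go through the lakeFS API client rather than an in-memory map.

```go
package main

import "fmt"

// objectStore is a hypothetical stand-in for the API client used by the commands.
type objectStore interface {
	Copy(src, dst string) error
	Delete(path string) error
}

// memStore is an in-memory objectStore used only for this illustration.
type memStore map[string]string

func (m memStore) Copy(src, dst string) error {
	data, ok := m[src]
	if !ok {
		return fmt.Errorf("source %q not found", src)
	}
	m[dst] = data
	return nil
}

func (m memStore) Delete(path string) error {
	delete(m, path)
	return nil
}

// copyOrMove copies src to dst. When deleteSource is true it then deletes src,
// which gives move semantics. The source is deleted only after a successful copy.
func copyOrMove(store objectStore, src, dst string, deleteSource bool) error {
	if err := store.Copy(src, dst); err != nil {
		return fmt.Errorf("copy %s to %s: %w", src, dst, err)
	}
	if !deleteSource {
		return nil
	}
	if err := store.Delete(src); err != nil {
		return fmt.Errorf("delete %s after copy: %w", src, err)
	}
	return nil
}

func main() {
	store := memStore{"data/a.csv": "1,2,3"}
	if err := copyOrMove(store, "data/a.csv", "archive/a.csv", true); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("objects after move:", len(store))
}
```

Deleting only after the copy succeeds means a failed move leaves the source intact, at the cost of a possible leftover destination object.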
cmd/lakectl/cmd/fs_cp.go
Outdated
| func init() { | ||
| withRecursiveFlag(fsCpCmd, "recursively copy all objects under the specified path") | ||
| withParallelismFlag(fsCpCmd) | ||
| fsCpCmd.Flags().BoolP("force", "f", false, "overwrite existing objects at destination") |
Are you sure that this is correct? AFAICS this flag ends up on the option graveler.SetOptions.Force, which has this godoc:
// Force set to true will bypass repository read-only protection.
Force bool
You're right - the Force flag bypasses read-only protection, not "overwrite existing". Removed the -f/--force flag entirely since it doesn't match user expectations.
- Remove misleading -f/--force flag (was bypassing read-only protection, not overwriting existing objects as documented)
- Unify copy and move logic into shared recursiveCopyMove function
- Use sync.WaitGroup.Go() pattern instead of Add(1) + go func()
- Remove unnecessary atomic operations (errorCount accessed after Wait())
- Limit error output to 10 messages to avoid flooding stderr

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
arielshaqed left a comment
Thanks!
I find your continued use of AI in this PR perplexing: it appears that Claude has done little to scan the existing code, and it adds huge portions of new code (e.g. a new batched deleter) while failing to address the flagged issues (e.g. using sync.WaitGroup.Go). Please manually go over the comments and over the generated code. While we obviously appreciate code contributions, as a maintainer I need to ensure that accepted code is maintainable. This is particularly important in an open-source project.
| // deleteObjectsBatch deletes objects in batches of deleteChunkSize | ||
| func deleteObjectsBatch(ctx context.Context, client apigen.ClientWithResponsesInterface, repository, branch string, paths []string) []error { |
We already have deleteObjectWorker in fs_rm. It even supports concurrent operations. Why add another way to do it?
| } | ||
| // recursiveCopyMove handles recursive copy or move operations. | ||
| // When deleteSource is true, source objects are deleted after successful copy (move behavior). |
So why not call it move?
| var wg sync.WaitGroup | ||
| wg.Add(parallelism) | ||
| for range parallelism { | ||
| go func() { |
| copiedMu.Lock() | ||
| copiedPaths = append(copiedPaths, task.srcPath) | ||
| copiedMu.Unlock() |
Not blocking, but curious why this uses a lock rather than a channel.
| // Skip directory markers | ||
| if strings.HasSuffix(obj.Path, uri.PathSeparator) { | ||
| continue | ||
| } |
I don't understand:
- Looking at the suffix of the path means this does not copy directory markers - empty objects which are named for the "directory", which some systems (notably Spark) seem to like.
- Why not look at obj.PathType?
- Is this a recursive copy or not? If recursive, why are you even getting these bad objects?
| continue | ||
| } | ||
| // Transform path: replace source prefix with dest prefix | ||
| relPath := strings.TrimPrefix(obj.Path, srcPrefix) |
Use CutPrefix and check.
Summary
- `lakectl fs cp` command to copy objects within a repository
- `lakectl fs mv` command to move objects within a repository (copy + delete)

Test plan
Description
This PR adds two frequently requested commands to lakectl for managing objects within a repository:
lakectl fs cp <source URI> <dest URI>
Copies objects from source to destination within the same repository.
lakectl fs mv <source URI> <dest URI>
Moves objects (copy + delete source) within the same repository.
Flags (both commands)
- -r, --recursive
- -p, --parallelism
- --no-progress

Example usage
Implementation notes
- CopyObject API for copies
- ... fs rm)

🤖 Generated with Claude Code