vine: explicitly specify file type for put/puturl/puturl_now #4233
Ok, it makes sense to pass the file type with But it's not clear to me why the type has been added to

For Input files are less of a concern, since they are only used on the worker and do not impact the manager’s state. But output files can be problematic, because their types are sent back through
Pull Request Overview
This PR fixes a bug where the cache-update mechanism returned a file type value of 0 instead of a value in the valid range (1-5), which prevented temporary file replication. The fix ensures that file type information is explicitly passed from manager to worker for all file transfer operations rather than being inferred by the worker.
Key changes:
- Update protocol messages to include explicit file type parameters for `put`, `puturl`, and `puturl_now` operations
- Pass file type information through the cache system and task file operations
- Resolve file types for URLs after materialization via cache-update
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| taskvine/src/worker/vine_worker.c | Updates message parsing to extract file type and passes it to cache operations |
| taskvine/src/worker/vine_sandbox.c | Passes file type when adding output files to cache |
| taskvine/src/worker/vine_cache.h | Updates function signatures to include file type parameter |
| taskvine/src/worker/vine_cache.c | Stores and uses explicit file type in cache operations |
| taskvine/src/manager/vine_task.h | Updates task file functions to accept file type parameter |
| taskvine/src/manager/vine_task.c | Creates files with explicit type instead of using generic local file function |
| taskvine/src/manager/vine_protocol.h | Increments protocol version to reflect message format changes |
| taskvine/src/manager/vine_manager_put.c | Updates protocol messages to include file type in communication |
| taskvine/src/manager/vine_manager.c | Resolves URL file types to original types after materialization |
    + } else if (sscanf(line, "infile %s %s %d %d", localname, taskname_encoded, &flags, &original_type)) {
          url_decode(taskname_encoded, taskname, VINE_LINE_MAX);
          vine_hack_do_not_compute_cached_name = 1;
    -     vine_task_add_input_file(task, localname, taskname, flags);
    - } else if (sscanf(line, "outfile %s %s %d", localname, taskname_encoded, &flags)) {
    +     vine_task_add_input_file(task, localname, taskname, flags, original_type);
    + } else if (sscanf(line, "outfile %s %s %d %d", localname, taskname_encoded, &flags, &original_type)) {
          url_decode(taskname_encoded, taskname, VINE_LINE_MAX);
          vine_hack_do_not_compute_cached_name = 1;
    -     vine_task_add_output_file(task, localname, taskname, flags);
    +     vine_task_add_output_file(task, localname, taskname, flags, original_type);

Copilot
AI
Sep 22, 2025
The sscanf format strings expect 4 parameters but the protocol change may cause compatibility issues if older clients send messages with only 3 parameters. Consider adding backward compatibility handling or validation that all 4 parameters are successfully parsed.
      if (sscanf(line, "task %" SCNd64, &task_id) == 1) {
          r = do_task(manager, task_id, time(0) + options->active_timeout);
    - } else if (sscanf(line, "put %s %d %" SCNd64, filename_encoded, &cache_level, &length) == 3) {
    + } else if (sscanf(line, "put %s %d %" SCNd64 " %d", filename_encoded, &cache_level, &length, &original_type) == 4) {

Copilot
AI
Sep 22, 2025
The sscanf calls properly check the return value for the expected number of parsed parameters, which is good practice for the protocol change. However, the infile/outfile parsing above doesn't check return values - consider adding similar validation there.
    +     r = do_put(manager, filename, cache_level, length, original_type);
          reset_idle_timer();
    - } else if (sscanf(line, "puturl %s %s %d %" SCNd64 " %o %s", source_encoded, filename_encoded, &cache_level, &length, &mode, transfer_id) == 6) {
    + } else if (sscanf(line, "puturl %s %s %d %" SCNd64 " %d %o %s", source_encoded, filename_encoded, &cache_level, &length, &original_type, &mode, transfer_id) == 7) {

Copilot
AI
Sep 22, 2025
The sscanf calls properly check the return value for the expected number of parsed parameters, which is good practice for the protocol change. However, the infile/outfile parsing above doesn't check return values - consider adding similar validation there.
          reset_idle_timer();
          hash_table_insert(current_transfers, filename, strdup(transfer_id));
    - } else if (sscanf(line, "puturl_now %s %s %d %" SCNd64 " %o %s", source_encoded, filename_encoded, &cache_level, &length, &mode, transfer_id) == 6) {
    + } else if (sscanf(line, "puturl_now %s %s %d %" SCNd64 " %d %o %s", source_encoded, filename_encoded, &cache_level, &length, &original_type, &mode, transfer_id) == 7) {

Copilot
AI
Sep 22, 2025
The sscanf calls properly check the return value for the expected number of parsed parameters, which is good practice for the protocol change. However, the infile/outfile parsing above doesn't check return values - consider adding similar validation there.
The worker doesn't need the
Proposed Changes
The issue was that `cache-update` returned a file type value of 0, whereas valid types are in the range 1 to 5. Because of the incorrect type, temp file replication could not proceed.

To fix this, the manager should control the file type and append it whenever it sends a file to a worker, rather than letting the worker infer the file type from the message.

For whatever file the manager sends to any worker, a `file_type` field must be explicitly specified in that message.

If the file was a `VINE_URL` (e.g., a file that needs to be downloaded from another worker), we will resolve its actual type once it is materialized and the manager receives the related `cache-update`.

Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
- `make test` Run local tests prior to pushing.
- `make format` Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
- `make lint` Run lint on source code prior to pushing.