Stateless support and switch to new deeplake-api. #3130
Claude finished @khustup2's task in 2m 29s.
Review Complete
I've reviewed the PR and found several issues that should be addressed.
Issues Found:
All issues have been posted as inline comments on the specific lines.
Test Coverage:
| const auto meta_path = join_path(path, k_meta_name); | ||
| meta_table = deeplake_api::open_catalog_table(meta_path, std::move(creds)).get_future().get(); | ||
| } | ||
| return meta_table; |
Thread safety issue: The cache uses thread_local, but credentials are passed by value and are never compared on a cache hit, and the path comparison doesn't account for root_path changing across requests.
Fix: Add mutex protection or check credentials equality on cache hits. At minimum, take the credentials by const reference and copy them only on a cache miss:
| return meta_table; | |
| std::shared_ptr<deeplake_api::catalog_table> get_meta_table(const std::string& path, const icm::string_map<>& creds) | |
| { | |
| if (path != root_path || !meta_table) { | |
| // Cache miss or path changed - open and cache | |
| root_path = path; | |
| const auto meta_path = join_path(path, k_meta_name); | |
| meta_table = deeplake_api::open_catalog_table(meta_path, icm::string_map<>(creds)).get_future().get(); | |
| } | |
| return meta_table; | |
| } |
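A credentials-aware variant of the cache could look roughly like the sketch below. It is self-contained and does not use the real API: `catalog_table_stub`, `creds_map`, and the injected `opener` are hypothetical stand-ins for `deeplake_api::catalog_table`, `icm::string_map<>`, and `deeplake_api::open_catalog_table`. It assumes the credentials type is equality-comparable.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for deeplake_api::catalog_table and icm::string_map<>.
struct catalog_table_stub
{
    std::string path;
};
using creds_map = std::map<std::string, std::string>;

class meta_table_cache
{
public:
    using opener_fn =
        std::function<std::shared_ptr<catalog_table_stub>(const std::string&, const creds_map&)>;

    // The opener is injected so the expensive remote open can be replaced in tests.
    explicit meta_table_cache(opener_fn opener)
        : opener_(std::move(opener))
    {
    }

    std::shared_ptr<catalog_table_stub> get(const std::string& path, const creds_map& creds)
    {
        // Reopen when nothing is cached, or when either part of the key changed.
        if (!table_ || path != path_ || creds != creds_) {
            // Open first: if the opener throws, the previous cache state is kept.
            table_ = opener_(path, creds);
            path_ = path;
            creds_ = creds;
        }
        return table_;
    }

    void invalidate()
    {
        table_.reset();
    }

private:
    opener_fn opener_;
    std::string path_;
    creds_map creds_;
    std::shared_ptr<catalog_table_stub> table_;
};
```

Keeping the credentials in the key costs one map comparison per lookup, which should be negligible next to a remote open.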
| try { | ||
| auto table = open_catalog_table(root_path, k_meta_name, std::move(creds)); | ||
| // Use cached meta table handle to avoid repeated S3 opens | ||
| auto table = catalog_table_cache::instance().get_meta_table(root_path, std::move(creds)); |
Cache invalidation on every error: Invalidating the cache on read errors may cause performance degradation if transient errors occur frequently (e.g., network issues).
Fix: Consider invalidating only on specific error types or after N consecutive failures to avoid unnecessary cache churn.
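One way to express the "N consecutive failures" idea is a small counter that the read path consults before dropping the cached handle. This is only a sketch: `failure_gate` and its threshold are hypothetical names, not part of the PR.

```cpp
#include <cstddef>

// Hypothetical policy object: signal cache invalidation only after
// `threshold` read failures in a row, so a single transient error
// does not force a reopen.
class failure_gate
{
public:
    explicit failure_gate(std::size_t threshold)
        : threshold_(threshold)
    {
    }

    // Record a failed read. Returns true when the caller should invalidate
    // the cache; the counter restarts after that.
    bool on_failure()
    {
        if (++consecutive_ >= threshold_) {
            consecutive_ = 0;
            return true;
        }
        return false;
    }

    // Any successful read resets the streak.
    void on_success()
    {
        consecutive_ = 0;
    }

private:
    std::size_t threshold_;
    std::size_t consecutive_ = 0;
};
```

The catch block would call `on_failure()` and invalidate only when it returns true, while the success path calls `on_success()`.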
| "Catalog path %s is a non-directory filesystem artifact. Removing it before catalog initialization.", | ||
| path.c_str()); | ||
| if (!std::filesystem::remove(path, ec) && ec) { | ||
| elog(ERROR, "Failed to migrate local catalog path %s: %s", path.c_str(), ec.message().c_str()); |
Error handling uses elog(ERROR) which aborts transaction: In the migration path, using elog(ERROR) will abort the entire transaction and may prevent catalog initialization.
Fix: Consider using elog(WARNING) instead and continuing:
| elog(ERROR, "Failed to migrate local catalog path %s: %s", path.c_str(), ec.message().c_str()); | |
| elog(WARNING, "Failed to migrate local catalog path %s: %s - continuing anyway", path.c_str(), ec.message().c_str()); |
| is_catalog = deeplake_api::is_catalog_table(path, icm::string_map<>(creds)).get_future().get(); | ||
| deeplake_api::delete_dataset(path, icm::string_map<>(creds)).get_future().get(); | ||
| } catch (const std::exception& e) { | ||
| elog(ERROR, "Failed to migrate remote catalog path %s: %s", path.c_str(), e.what()); |
Potential data loss: Deleting a remote dataset without user confirmation could cause data loss if the wrong path is configured.
Fix: Add a GUC setting to control migration behavior or require explicit confirmation.
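The opt-in guard could be shaped like the following sketch. It is not wired to PostgreSQL: `migration_options::allow_destructive_migration` is a hypothetical flag that a GUC would back in the extension, and `delete_dataset` is an injected callable standing in for `deeplake_api::delete_dataset`.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical opt-in switch; in the extension a GUC would back this flag.
struct migration_options
{
    bool allow_destructive_migration = false;
};

// Deletes a legacy dataset at `path` only when explicitly allowed.
// Returns true if the delete ran, false if there was nothing to migrate.
// Throws when a delete is needed but the operator has not opted in.
inline bool migrate_remote_path(const std::string& path,
                                bool is_legacy_dataset,
                                const migration_options& opts,
                                const std::function<void(const std::string&)>& delete_dataset)
{
    if (!is_legacy_dataset) {
        return false;
    }
    if (!opts.allow_destructive_migration) {
        throw std::runtime_error("Refusing to delete " + path +
                                 ": destructive migration is disabled");
    }
    delete_dataset(path);
    return true;
}
```

Defaulting the flag to false means a misconfigured root_path fails loudly instead of silently removing data.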
| // Ensure catalog exists | ||
| pg::dl_catalog::ensure_catalog(root_path, creds); | ||
| // Only ensure catalog on first call or when root_path changes | ||
| if (!catalog_ensured || root_path != last_root_path) { |
Race condition on root_path change: When root_path changes, the cache is not explicitly invalidated before get_catalog_version is called, so it may return a stale version from the old path.
Fix: Explicitly invalidate cache when path changes:
| if (!catalog_ensured || root_path != last_root_path) { | |
| if (!catalog_ensured || root_path != last_root_path) { | |
| pg::dl_catalog::ensure_catalog(root_path, creds); | |
| catalog_ensured = true; | |
| last_root_path = root_path; | |
| last_catalog_version = 0; // Reset version when path changes | |
| } |
Note: The cache uses thread_local so each worker thread should be isolated, but verify this is the intended behavior.
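The reset-on-change logic can also be grouped into one state object so the three variables cannot drift apart. A minimal sketch, assuming the variable names from the suggestion above; `catalog_session_state` and `on_root_path` are hypothetical, and the actual `ensure_catalog` call stays with the caller.

```cpp
#include <cstdint>
#include <string>

// Hypothetical per-session state; field names mirror the suggestion above.
struct catalog_session_state
{
    bool catalog_ensured = false;
    std::string last_root_path;
    std::int64_t last_catalog_version = 0;

    // Returns true when the caller must (re)ensure the catalog and drop any
    // cached handles, i.e. on the first call or when root_path changed.
    bool on_root_path(const std::string& root_path)
    {
        if (catalog_ensured && root_path == last_root_path) {
            return false; // same path as before: cached version stays valid
        }
        catalog_ensured = true;
        last_root_path = root_path;
        last_catalog_version = 0; // forget the version seen under the old path
        return true;
    }
};
```

A caller would run the ensure step and invalidate the table cache whenever `on_root_path` returns true, then compare the freshly read version against `last_catalog_version`.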
| // auto-discovered when pointing to a shared root_path | ||
| pg::table_storage::instance().force_load_table_metadata(); | ||
| // Track the previous root_path to detect actual changes | ||
| static thread_local std::string last_root_path; |
Thread-local static initialization race: Using static thread_local inside a function called from a utility hook may not be thread-safe during initialization when SET commands run concurrently.
Fix: Consider moving to a proper session-level state or add explicit initialization guards.