Rc1 #1765
Closed
I refactored the deepfreeze class into four action classes and externalized all the common routines they might share because DRY, of course...
This now correctly invokes the right action classes. Some required params aren't being enforced, but that's a job for tomorrow.
Added a save_settings method to persist global settings to the deepfreeze status index.
This wasn't working when trying to map with filters.
I added several new options and adjusted others so that we can now specify --rotate_by and choose bucket or path. The suffix then gets applied to either the bucket name or the path name, depending on that choice. The repo name will always get the suffix.
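As a rough illustration of the --rotate_by behavior described above, here is a minimal sketch. The function name `apply_suffix`, the `RotatedNames` container, and the `-` separator are assumptions made for the example and are not taken from the PR.

```python
from dataclasses import dataclass


@dataclass
class RotatedNames:
    """Hypothetical container for the names produced by one rotation."""

    repo_name: str
    bucket_name: str
    base_path: str


def apply_suffix(repo_prefix, bucket_prefix, base_path_prefix, suffix, rotate_by):
    """Apply the suffix to the bucket or the path, and always to the repo name.

    The "-" separator is an assumption for illustration only.
    """
    if rotate_by not in ("bucket", "path"):
        raise ValueError(f"rotate_by must be 'bucket' or 'path', got {rotate_by!r}")

    # The repo name always gets the suffix.
    repo_name = f"{repo_prefix}-{suffix}"

    if rotate_by == "bucket":
        # New bucket per rotation; the path inside the bucket stays the same.
        return RotatedNames(repo_name, f"{bucket_prefix}-{suffix}", base_path_prefix)

    # Same bucket; a new path per rotation.
    return RotatedNames(repo_name, bucket_prefix, f"{base_path_prefix}-{suffix}")
```

With `rotate_by="path"`, all rotations share one bucket and differ only in the path, which may be preferable where the number of buckets is limited.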
Switched most settings to being part of a Settings object. Completed updating Rotate up through ILM changes. Fully implemented style.
Verified and fixed code for removing old repositories.
For oneup, at least. Need to ensure this works for date-based rotation too.
Removed commented-out code now that I know it's safe
Finally got black configured and disabled Flake. Much happier now.
Templated these, which we'll use to track repos and thawsets inside of the status index in Elasticsearch.
Unit tests for utility classes used by DeepFreeze.
These tests cover all remaining utility (module-level) functions. They could perhaps be collected into a single file.
I plan to do this wherever possible, and anywhere it doesn't cause more problems than it solves.
This is almost certainly incomplete, but I'll add to it as we go along.
This completely breaks a number of things, but I wanted to capture it mid-stream so as not to lose it. Flaky network at BAH.
Set defaults for this code formatter, which is faster than black but can format just as well and to the same standard.
Switched to Ruff. It really wants " instead of '.
Added s3client.py to encapsulate S3 client code for various providers under a consistent interface. Includes classes S3Client and its implementation classes, plus a factory method to return a client object for a particular provider.
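The shape of such a module might look like the sketch below. Only the `S3Client` name comes from the commit message; the `AwsS3Client` class, the `s3_client_factory` function, the `create_bucket`/`bucket_exists` methods, and the in-memory bucket set (standing in for real provider calls) are hypothetical.

```python
from abc import ABC, abstractmethod


class S3Client(ABC):
    """Provider-neutral interface; the method set here is illustrative only."""

    @abstractmethod
    def create_bucket(self, bucket_name: str) -> None:
        ...

    @abstractmethod
    def bucket_exists(self, bucket_name: str) -> bool:
        ...


class AwsS3Client(S3Client):
    """Hypothetical AWS implementation.

    A real version would delegate to a provider SDK; an in-memory set
    stands in for it here so the sketch stays self-contained.
    """

    def __init__(self) -> None:
        self._buckets = set()

    def create_bucket(self, bucket_name: str) -> None:
        self._buckets.add(bucket_name)

    def bucket_exists(self, bucket_name: str) -> bool:
        return bucket_name in self._buckets


# Registry of known providers; other providers would register here.
_PROVIDERS = {"aws": AwsS3Client}


def s3_client_factory(provider: str) -> S3Client:
    """Return a client object for the named provider (hypothetical name)."""
    try:
        return _PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"Unsupported S3 provider: {provider!r}") from None
```

Callers depend only on the `S3Client` interface, so adding a provider should only require a new implementation class and a registry entry.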
Users can still list all by adding --include-completed or -c
1. Added Status Constants (constants.py)
   - Added THAW_STATUS_IN_PROGRESS, THAW_STATUS_COMPLETED, THAW_STATUS_FAILED, and THAW_STATUS_REFROZEN constants
   - Created THAW_REQUEST_STATUSES list for validation
2. Updated Refreeze Action (refreeze.py)
   - Changed status from "completed" to THAW_STATUS_REFROZEN when refreeze completes
   - Now properly indicates that thawed data has been cleaned up and returned to frozen state
3. Added Retention Setting (helpers.py)
   - Added thaw_request_retention_days_refrozen setting (default: 35 days)
   - This aligns with the 30-day max for data to return to Glacier, plus 5 days buffer
4. Updated Cleanup Logic (cleanup.py)
   - Added handling for "refrozen" status in both _cleanup_old_thaw_requests() and dry-run mode
   - Refrozen requests are automatically deleted after 35 days
5. Updated Thaw List Filtering (thaw.py - do_list_requests())
   - Now excludes both "completed" AND "refrozen" requests by default
   - Use --include-completed or -c flag to see all requests
   - Updated help messages to reflect "completed/refrozen" filtering
6. Updated Status Checking (thaw.py)
   - do_check_status(): Skips refrozen requests with helpful message
   - do_check_all_status(): Filters out refrozen requests before processing

Status Lifecycle
The complete thaw request lifecycle is now:
1. in_progress → Thaw operation is actively running
2. completed → Thaw succeeded, data is available and mounted
3. refrozen → Data has been cleaned up via refreeze (new!)
4. failed → Thaw operation failed

Retention Periods (Cleanup)
- Completed: 7 days (default)
- Failed: 30 days (default)
- Refrozen: 35 days (new!)

All syntax validation passed! The new status properly distinguishes between "thaw completed and data available" vs "thaw was completed but has been cleaned up."
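The status constants, retention periods, and default list filtering above can be sketched roughly as follows. The constant names and default day counts come from the commit message; the string values, the `RETENTION_DAYS` mapping, the `is_expired` and `visible_requests` helpers, and the choice of `created_at` as the timestamp that ages a request are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Constant names are from the commit message; the string values are assumed.
THAW_STATUS_IN_PROGRESS = "in_progress"
THAW_STATUS_COMPLETED = "completed"
THAW_STATUS_FAILED = "failed"
THAW_STATUS_REFROZEN = "refrozen"

THAW_REQUEST_STATUSES = [
    THAW_STATUS_IN_PROGRESS,
    THAW_STATUS_COMPLETED,
    THAW_STATUS_FAILED,
    THAW_STATUS_REFROZEN,
]

# Default retention in days per status; in_progress requests are never cleaned up.
RETENTION_DAYS = {
    THAW_STATUS_COMPLETED: 7,
    THAW_STATUS_FAILED: 30,
    THAW_STATUS_REFROZEN: 35,
}


def is_expired(status, created_at, now=None):
    """Return True if a request with this status is past its retention period."""
    if status not in THAW_REQUEST_STATUSES:
        raise ValueError(f"Unknown thaw request status: {status!r}")
    days = RETENTION_DAYS.get(status)
    if days is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - created_at > timedelta(days=days)


def visible_requests(requests, include_completed=False):
    """Hide completed and refrozen requests unless include_completed is set."""
    if include_completed:
        return list(requests)
    hidden = {THAW_STATUS_COMPLETED, THAW_STATUS_REFROZEN}
    return [r for r in requests if r["status"] not in hidden]
```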
Added descriptions of all actions in markdown.
Due to issues in rotate, not all repos were being marked 'frozen'. This necessitated adding repair_metadata, which can be used should this ever occur again and serves as a foundation for other potential repair work in the future. Updated integration tests and fixes revealed by testing.
1. Parallelized AWS S3 API Calls (10-15x speedup on S3 checks)
   File: curator/actions/deepfreeze/utilities.py
   - Modified check_restore_status() to use ThreadPoolExecutor with 15 concurrent workers
   - Instead of checking objects sequentially (one by one), now checks up to 15 objects in parallel
   - This is the biggest win - transforms sequential 10,000 API calls from 16+ minutes to ~1 minute
   Technical details:
   - boto3 client is thread-safe, making this safe to implement
   - Separates instant-access objects (no check needed) from Glacier objects (need parallel checking)
   - Uses concurrent.futures.as_completed() to process results as they arrive
2. Eliminated Redundant Status Checks (2x speedup on overall flow)
   Files: curator/actions/deepfreeze/thaw.py
   - Added status caching in both do_check_status() and do_check_all_status()
   - Modified _display_thaw_status() to accept optional cache parameter
   - Previously called check_restore_status() twice per repository (once for logic, once for display)
   - Now caches results from first check and reuses for display
3. Added Progress Indicators (UX improvement)
   Files: curator/actions/deepfreeze/thaw.py
   - Shows "Checking repository X of Y..." as each repository is processed
   - Gives users real-time feedback instead of appearing frozen
   - Uses existing rich library for clean terminal output
4. Code Quality
   - All changes pass black formatting
   - All changes pass ruff linting
   - Backward compatible - no API changes

Expected Performance Improvement
Before: ~11 minutes (660 seconds)
After: ~1-2 minutes (60-120 seconds)
Overall speedup: 5-10x faster!

Breakdown:
- S3 API calls: 16 minutes → ~1 minute (15x faster)
- Redundant checks eliminated: Cut remaining time in half
- Total: 11 minutes → 1-2 minutes

The exact improvement depends on:
- Number of thaw requests
- Number of repositories per request
- Number of objects per repository
- Network latency to AWS S3
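The parallel check described in item 1 could look something like this sketch. It is not the PR's actual check_restore_status(): the signature, the returned counts, and the injected `head_object` callable (standing in for a boto3 `head_object` call bound to a bucket) are assumptions. The `Restore` header format and the `GLACIER`/`DEEP_ARCHIVE` storage class names follow S3's documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Storage classes that require a restore before the object can be read.
GLACIER_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def check_restore_status(objects, head_object, max_workers=15):
    """Count restore states for a list of S3 object listings.

    objects: dicts with "Key" and optionally "StorageClass" (as in a listing).
    head_object: callable taking a key and returning a HeadObject-style dict;
        a hypothetical stand-in for the real S3 call.
    """
    counts = {"total": len(objects), "restored": 0, "in_progress": 0, "not_restored": 0}

    # Instant-access objects need no API call; only Glacier objects are checked.
    glacier_keys = []
    for obj in objects:
        if obj.get("StorageClass", "STANDARD") in GLACIER_CLASSES:
            glacier_keys.append(obj["Key"])
        else:
            counts["restored"] += 1

    def classify(key):
        restore = head_object(key).get("Restore")
        if restore is None:
            return "not_restored"
        if 'ongoing-request="true"' in restore:
            return "in_progress"
        return "restored"

    # Up to max_workers HEAD requests in flight; tally results as they arrive.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(classify, key) for key in glacier_keys]
        for future in as_completed(futures):
            counts[future.result()] += 1

    return counts
```

Tallying happens only in the calling thread, so the shared `counts` dict needs no lock.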
Summary of Changes
1. CLI Command (curator/cli_singletons/deepfreeze.py:344-370)
   Added the -f/--refrozen-retention-days option to the cleanup command:
   - Short flag: -f (mnemonic for "refrozen")
   - Long flag: --refrozen-retention-days
   - Type: integer
   - Default: None (uses config setting, typically 35 days)
2. Cleanup Action (curator/actions/deepfreeze/cleanup.py)
   - Updated __init__ to accept refrozen_retention_days parameter
   - Modified _cleanup_old_thaw_requests() to use CLI override if provided, otherwise fall back to settings value
   - Applied same logic to do_dry_run() method for consistent behavior
   - Updated class docstring to document the new parameter
3. Schema Validation
   Added validation in two places:
   - option_defaults.py: Created refrozen_retention_days() function with validation (1-365 days range, None allowed)
   - validators/options.py: Added the option to cleanup's validation schema
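The override-then-fallback rule and the 1-365 range check can be illustrated with plain Python. This sketch omits the CLI and schema wiring, and the helper names `validate_refrozen_retention_days` and `effective_retention_days` are hypothetical.

```python
def validate_refrozen_retention_days(value):
    """Accept None (no override) or an integer from 1 to 365 inclusive."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("refrozen_retention_days must be an integer or None")
    if not 1 <= value <= 365:
        raise ValueError("refrozen_retention_days must be between 1 and 365")
    return value


def effective_retention_days(cli_value, settings_value=35):
    """Use the CLI override when given, otherwise the configured setting."""
    validated = validate_refrozen_retention_days(cli_value)
    return settings_value if validated is None else validated
```

Using the same helper in both the real run and the dry run is one way to keep the two code paths consistent.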
1. Added NotFoundError import (line 7) - imported the specific exception type from elasticsearch8 to handle repository not found errors
2. Added specific exception handling (lines 210-223) - added a new exception handler that:
   - Specifically catches NotFoundError before the generic exception handler
   - Detects when the error is a repository_missing_exception (indicating the repository has already been unmounted)
   - Logs an INFO level message instead of ERROR: "Repository {name} has already been unmounted, no indices to delete"
   - Returns gracefully with no indices deleted
   - For other NotFoundError cases, logs a WARNING instead of ERROR
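The handler ordering described above might be sketched like this. To keep the example runnable on its own, a local `NotFoundError` class stands in for the exception imported from elasticsearch8. The `indices_to_delete` function, the `list_indices` callable, detection by matching the exception text, and the behavior of the generic handler are all assumptions.

```python
import logging


class NotFoundError(Exception):
    """Local stand-in for elasticsearch8.NotFoundError, for illustration only."""


def indices_to_delete(list_indices, repo_name, logger=None):
    """Return the indices to delete for a repository, tolerating a missing repo.

    list_indices: hypothetical callable that queries the cluster and may raise.
    """
    logger = logger or logging.getLogger(__name__)
    try:
        return list_indices(repo_name)
    except NotFoundError as err:
        # The specific handler must come before the generic one below.
        if "repository_missing_exception" in str(err):
            logger.info(
                "Repository %s has already been unmounted, no indices to delete",
                repo_name,
            )
        else:
            logger.warning("Not found while listing indices for %s: %s", repo_name, err)
        return []
    except Exception as err:
        # Assumed behavior: other failures are still logged at ERROR level.
        logger.error("Failed to list indices for %s: %s", repo_name, err)
        return []
```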
Show counts in thaw list output
Detect and fix situation where a thaw request is submitted, acted upon by AWS, but ignored by the requestor. If check-status is run after the data is refrozen by AWS, this detects that and fixes the metadata to show the request as being refrozen so it doesn't languish as a pending request.
Updated test description to reflect the integration tests' unreliable nature.
Closed and replaced.