[WiP] Add parity module and default implementation based on PAR2 #4574
Conversation
Also add support for parity of index files
Listed some concerns below
<Project>{7E119745-1F62-43F0-936C-F312A1912C0B}</Project>
<Name>Duplicati.Library.AutoUpdater</Name>
</ProjectReference>
<ProjectReference Include="..\Library\Backend\OneDrive\Duplicati.Library.Backend.OneDrive.csproj">
I always encountered a runtime error about being unable to find System.Net.Http when running with the OneDrive backend, so I temporarily disabled the dependency here.
if (reupload)
{
    // TODO(cmpute): ensure this is the correct way to re-upload
It's quite tricky to implement the re-uploading process with the renaming strategy. Here I just re-encrypt the data and upload it again. Any suggestions on improving this are welcome!
The main functionality is ready; however, tests have not yet been added. Please take a look; any suggestions are welcome :)
I will leave several other features for separate future PRs to prevent this PR from being too big:
Thanks @cmpute. Sorry for the delay. We are quite resource constrained, so it may take some time before this gets some proper review. We are also trying to balance introducing large new features with transitioning to .NET 5. I did try to run a few simple tests (using Linux) and noticed the following:
Thanks for the work on this!
@warwickmm I'm not sure what the reason is here; it seems that somewhere the inputs to the parity module are null. Could you please share a test case?
Also, I haven't implemented the related options in the GUIs yet, so I only added the dependency to the command line project.
I haven't had time to look into it, but my guess is that the issue is in duplicati/Duplicati/Library/Parity/Par2Parity.cs, lines 196 to 202 (commit 0990068).
I get the error when running the following script on Linux:
This yields the output:
This pull request has been mentioned on the Duplicati forum. There might be relevant details there: https://forum.duplicati.com/t/the-compacting-process-is-very-dangerous/10832/17
Sorry for not updating this anymore, as I don't have much spare time to dig into the issue. My personal tests on my computer work fine, so at least this branch can be tested by more people.
Thanks @cmpute for your work in getting this started. We can leave this as a draft and see if there are others willing to continue the work.
Summary
This PR adds the functionality to create parity files for the backup (fixes #314). Parity data can be useful when the network connection is not stable or the file names get messed up in the backend. Part of the implementation is based on #3879. Any suggestions are welcome.
A module interface is defined for parity providers, similar to the ICompression and IEncrypt interfaces, and a PAR2 module is implemented that utilizes the PAR2 standard. Other parity implementations can be integrated through the interface. A requirement for a parity module is that it can produce a single file storing the parity data. It is critical to prevent additional zipping of the parity files, because there is no way to ensure parity protection on the zipped archive.
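For illustration, here is a minimal sketch of what such a parity provider interface could look like. The interface and member names below are assumptions for this sketch, not necessarily the ones used in the PR:

```csharp
// Illustrative sketch only; the actual interface and member names in this PR may differ.
namespace Duplicati.Library.Interface
{
    /// <summary>
    /// A parity provider produces a single parity file for a data volume,
    /// and can later use that file to repair a damaged copy of the volume.
    /// </summary>
    public interface IParity
    {
        /// <summary>The filename extension used for parity files, e.g. "par2".</summary>
        string FilenameExtension { get; }

        /// <summary>Creates a single parity file for the given data volume.</summary>
        void Create(string datafile, string parityfile);

        /// <summary>Attempts to repair a damaged volume using its parity file.
        /// Returns true if the repair succeeded.</summary>
        bool Repair(string datafile, string parityfile);
    }
}
```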
Added options
The options for parity modules:
- parity-redundancy-level: redundancy level in percentage, default = 0
- parity-module: the module used to create parity, defaults to the par2 implementation, default = par2

The options for the par2 module:
- par2-block-count-small-file: specify the block count for small files, default = 25
- par2-block-count-large-file: specify the block count for large files, default = 500
- par2-file-size-threshold: specify the file size threshold between small and large files, default = 4MB
- par2-program-path: specify the path to the PAR2 program. The Windows version is bundled; on Linux, par2 can be installed via the package manager
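As a rough usage sketch, the options above might be combined like this (assuming the standard duplicati-cli backup invocation; the target URL and source path are placeholders):

```
duplicati-cli backup "file:///mnt/backups/target" /home/user/data \
    --parity-module=par2 \
    --parity-redundancy-level=10 \
    --par2-block-count-large-file=500 \
    --par2-program-path=/usr/bin/par2
```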
Implementation details
The parity module acts transparently between the main library and the backends.
Upload (Backup, Compact)
The parity file is created after the data volume is compressed and encrypted, and before it is uploaded. The suffix for parity files will be +.par2, to make it easier to distinguish the encryption suffix from the parity suffix.
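For example, the parity file name could be derived from the volume name like this (the volume name below is made up for illustration):

```csharp
// Illustrative only: deriving the parity file name from a remote volume name.
// The "+." separator makes the parity suffix easy to tell apart from the encryption suffix.
string volumeName = "duplicati-b1234.dblock.zip.aes";   // hypothetical remote volume name
string parityName = volumeName + "+.par2";              // "duplicati-b1234.dblock.zip.aes+.par2"
```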
Download (Compact, Restore, Find)
When a downloaded volume has a hash mismatch, the backend manager will try to find its parity file on the remote based on the file name. If present, the parity file will be downloaded and used to attempt a repair of the volume. The parity file should be able to protect itself from bit rot.
The repaired volume will be immediately re-uploaded without changing the file name. (This behavior is not finalized.)
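To make that flow concrete, here is a minimal sketch, with hypothetical helper delegates standing in for the real backend operations (none of the names below are actual Duplicati APIs):

```csharp
// Illustrative sketch of the repair-on-download flow; the delegates stand in for
// the real backend operations that live in Duplicati's backend manager.
using System;
using System.IO;

public static class ParityRepairSketch
{
    public static void DownloadWithRepair(
        string volumeName,
        Func<string, string> download,           // downloads a remote file, returns the local path
        Func<string, bool> remoteExists,         // checks whether a remote file exists
        Action<string, string> upload,           // re-uploads a local file under a remote name
        Func<string, bool> hashMatches,          // verifies the downloaded volume hash
        Func<string, string, bool> parityRepair) // repairs a volume using its parity file
    {
        var localFile = download(volumeName);
        if (hashMatches(localFile))
            return;

        // Hash mismatch: look for the matching parity file on the remote, based on the name.
        var parityName = volumeName + "+.par2";
        if (!remoteExists(parityName) || !parityRepair(localFile, download(parityName)))
            throw new IOException($"Volume {volumeName} is damaged and could not be repaired");

        // Repaired successfully: re-upload under the same name (behavior not finalized).
        upload(volumeName, localFile);
    }
}
```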
Remove (Compact, Purge, Delete)
When a data volume is removed from the remote, its parity file will also be removed.
Repair
Small amounts of bit rot will be automatically repaired during the download process. It is also possible to utilize the parity data during disaster recovery.
Par2 files are able to identify file name changes, so if the file names are messed up on the remote, the data volumes can still be repaired using all the parity files. However, this operation requires downloading all parity files, so it should be triggered manually.
Create parity for existing backups
It's feasible to enable parity for old backups. However, the data volumes have to be downloaded in order to create parity files, so this operation will be integrated with the TEST command: whenever a data volume is downloaded for checking, a parity file will be created and uploaded if the volume is not damaged and the parity functionality is enabled.
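A rough sketch of where that could hook in, assuming the illustrative IParity interface above and hypothetical helper delegates (not the actual TEST handler code):

```csharp
// Illustrative only: create and upload a missing parity file while a volume
// has already been downloaded and verified by the TEST command.
using System;
using System.IO;

public static class TestParityBackfillSketch
{
    public static void BackfillParity(
        string volumeName,
        string localVolumePath,
        bool volumePassedVerification,
        Func<string, bool> remoteExists,     // checks whether a remote file exists
        Action<string, string> upload,       // uploads a local file under a remote name
        Action<string, string> createParity) // e.g. IParity.Create(datafile, parityfile)
    {
        var parityName = volumeName + "+.par2";
        if (!volumePassedVerification || remoteExists(parityName))
            return;

        // The volume is intact and has no parity file yet: create one and upload it.
        var parityFile = Path.GetTempFileName();
        createParity(localVolumePath, parityFile);
        upload(parityName, parityFile);
    }
}
```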
PAR2 block count / block size
Here's a short guide to choosing the block size and block count for par2. Basically, block size * block count ≈ data file size. Par2 supports specifying either the block size or the block count; for simplicity, the block count option is implemented. Generally (ref Parchive/par2cmdline#151 (comment)), a larger block count leads to more parity size overhead (more headers), while a larger block size leads to less granularity and a lower probability of successful repair.
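A quick worked example (the numbers are illustrative): with a 50 MB data volume and par2-block-count-large-file = 500, the block size works out to roughly 50 MB / 500 = 100 KB. With parity-redundancy-level = 10, par2 would then create on the order of 50 recovery blocks, i.e. roughly 5 MB of parity data plus header overhead.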