@cmpute cmpute commented Jul 2, 2021

Summary

This PR adds the functionality to create parity files for the backup (fixes #314). Parity data can be useful when the network connection is not stable or when the file names have been mixed up in the backend. Part of the implementation is based on #3879. Any suggestions are welcome.

A module interface is defined for parity providers, similar to the ICompression and IEncrypt interfaces, and a PAR2 module is implemented to utilize the PAR2 standard. Other parity implementations can be integrated through the interface. A requirement for a parity module is that it can produce a single file storing the parity data. It is critical to prevent additional zipping of the parity files, because there is no way to ensure parity protection on the zipped archive.

Added options

The options for parity modules:

  • parity-redundancy-level: redundancy level in percentage, default = 0
  • parity-module: the module used to create parity files, default = par2

The options for par2 module:

  • par2-block-count-small-file: specify the block count for small files, default = 25
  • par2-block-count-large-file: specify the block count for large files, default = 500
  • par2-file-size-threshold: specify the file size threshold between small and large files, default = 4MB
  • par2-program-path: specify the path to the PAR2 program. Windows version is bundled, in Linux par2 can be installed via package manager
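
How these options might combine can be sketched in Python (the PR itself is C#, so this is only an illustration, not the PR's code). The sketch assumes that files below the threshold get the small-file block count, that "4MB" means 4 MiB, and that the options map onto the `par2 create -q -r… -b… -n1` invocation that appears in the log output later in this thread. The names `pick_block_count` and `build_par2_command` are hypothetical.

```python
from typing import List

# Defaults taken from the option list above.
SMALL_FILE_BLOCK_COUNT = 25
LARGE_FILE_BLOCK_COUNT = 500
FILE_SIZE_THRESHOLD = 4 * 1024 * 1024  # "4MB", assumed here to mean 4 MiB


def pick_block_count(file_size: int,
                     threshold: int = FILE_SIZE_THRESHOLD,
                     small: int = SMALL_FILE_BLOCK_COUNT,
                     large: int = LARGE_FILE_BLOCK_COUNT) -> int:
    """Return the block count for a volume of the given size (hypothetical helper).

    Assumption: files strictly smaller than the threshold count as "small".
    """
    return small if file_size < threshold else large


def build_par2_command(input_name: str, output_name: str, file_size: int,
                       redundancy_percent: int,
                       program: str = "par2") -> List[str]:
    """Build an argument list shaped like the command logged in this thread."""
    block_count = pick_block_count(file_size)
    return [
        program, "create", "-q",
        f"-r{redundancy_percent}",  # redundancy level in percent
        f"-b{block_count}",         # block count (not block size)
        "-n1",                      # a single recovery file
        output_name,
        input_name,
    ]


if __name__ == "__main__":
    cmd = build_par2_command("volume.dblock.zip", "volume.dblock.zip.par2",
                             file_size=1900, redundancy_percent=10)
    print(" ".join(cmd))
```

Returning an argument list rather than a single string avoids quoting problems when the command is handed to a process launcher; `program` would come from par2-program-path.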

Implementation details

The parity module will act transparently between the main library and backends.

Upload (Backup, Compact)

The parity file is created after the data volume has been compressed and encrypted, and before it is uploaded. The suffix for parity files will be like +.par2 to make it easier to distinguish the encryption suffix from the parity suffix.

Download (Compact, Restore, Find)

When a downloaded volume has a hash mismatch, the backend manager will try to find its parity file on the remote based on the file name. If present, the parity file will be downloaded and used to attempt a repair of the volume. The parity file should be able to protect itself from bit rot.

The repaired volume will be immediately re-uploaded without changing file name. (this behavior is not finalized)
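
The download-side behavior described above can be sketched as a small control flow in Python. This is a sketch under stated assumptions, not the PR's implementation: `download`, `repair`, and `parity_name` are hypothetical stand-ins passed in by the caller, and SHA-256 is used only as an example hash.

```python
import hashlib
from typing import Callable, Optional


def sha256_of(data: bytes) -> str:
    """Hex digest used to compare a volume against its recorded hash."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(name: str,
                   expected_hash: str,
                   download: Callable[[str], Optional[bytes]],
                   repair: Callable[[bytes, bytes], bytes],
                   parity_name: Callable[[str], str]) -> bytes:
    """Download a volume, falling back to parity repair on a hash mismatch.

    All callables are hypothetical stand-ins:
      download(name)    -> file content, or None if the file does not exist
      repair(vol, par)  -> repaired volume content
      parity_name(name) -> remote name of the parity file for a volume
    """
    volume = download(name)
    if volume is None:
        raise FileNotFoundError(name)
    if sha256_of(volume) == expected_hash:
        return volume  # common case: nothing to repair

    # Hash mismatch: look for the parity file by name on the remote.
    parity = download(parity_name(name))
    if parity is None:
        raise IOError(f"hash mismatch for {name} and no parity file found")

    repaired = repair(volume, parity)
    # A repair only counts if the result matches the recorded hash.
    if sha256_of(repaired) != expected_hash:
        raise IOError(f"repair of {name} did not restore the expected hash")
    return repaired
```

In this shape, the caller decides what to do with a successfully repaired volume, such as re-uploading it under the same name, which keeps the not-yet-finalized re-upload policy out of the verification path.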

Remove (Compact, Purge, Delete)

When a data volume is removed in the remote, its parity file will also be removed.

Repair

Small amounts of bit rot are repaired automatically during the download process. It is also possible to use the parity data during disaster recovery.

Par2 files can identify file name changes, so if the file names are messed up on the remote, the data volumes can still be repaired with all the parity files. However, this operation requires downloading all parity files, so it should be triggered manually.

Create parity for existing backups

It's feasible to enable parity for old backups. However, data volumes have to be downloaded in order to create parity files, so this operation will be integrated with the TEST command. Whenever a data volume is downloaded for checking, a parity file will be created and uploaded if the volume is not damaged and the parity functionality is enabled.

PAR2 block count / block size

Here's a short guide to choosing the block size and block count for par2. Basically, block size * block count ≈ data file size. Par2 supports specifying either the block size or the block count; for simplicity, only the block count option is implemented. Generally (ref Parchive/par2cmdline#151 (comment)), a larger block count leads to a larger parity size overhead (more headers), while a larger block size leads to coarser granularity and a lower probability of successful repair.
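
The trade-off can be made concrete with a little arithmetic, sketched here in Python. The helper names are hypothetical, the 50 MiB volume size is chosen only for illustration, and the estimate of repairable blocks (block count times redundancy, rounded down) is an assumption; the real tool may round differently.

```python
def block_size_for(file_size: int, block_count: int) -> int:
    """Approximate block size so that block_size * block_count ≈ file_size."""
    # Integer ceiling division, so the blocks always cover the whole file.
    return (file_size + block_count - 1) // block_count


def repairable_blocks(block_count: int, redundancy_percent: int) -> int:
    """Rough number of damaged blocks the parity data can cover.

    Assumption: recovery blocks ≈ block_count * redundancy / 100, rounded down.
    """
    return block_count * redundancy_percent // 100


if __name__ == "__main__":
    size = 50 * 1024 * 1024  # a 50 MiB volume, for illustration only
    for count in (25, 500):
        print(count, block_size_for(size, count), repairable_blocks(count, 10))
```

Under these assumptions, 25 blocks at 10% redundancy gives blocks of about 2 MiB and roughly 2 repairable blocks, so a few scattered bit flips can already exhaust the parity data. With 500 blocks, each damaged spot costs only about 100 KiB and roughly 50 blocks can be repaired, at the price of more per-block header overhead.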

@cmpute cmpute marked this pull request as ready for review July 4, 2021 01:00
Author

@cmpute cmpute left a comment

Listed some concerns below

<Project>{7E119745-1F62-43F0-936C-F312A1912C0B}</Project>
<Name>Duplicati.Library.AutoUpdater</Name>
</ProjectReference>
<ProjectReference Include="..\Library\Backend\OneDrive\Duplicati.Library.Backend.OneDrive.csproj">
Author

I always encountered a runtime error about being unable to find System.Net.Http when running with the OneDrive backend, so I temporarily disabled the dependency here.


if (reupload)
{
// TODO(cmpute): ensure this is the correct way to re-upload
Author

It's quite tricky to implement the re-upload process with the renaming strategy. For now I just re-encrypt the data and upload it again. Any suggestions for improving this are welcome!

@cmpute
Author

cmpute commented Jul 4, 2021

The main functionality is ready, but tests have not been added yet. Please take a look; any suggestions are welcome :)

@cmpute
Author

cmpute commented Jul 4, 2021

I will leave several other features for separate future PRs to keep this PR from getting too big:

  1. Follow the renaming strategy when uploading block volumes or index volumes fails.
  2. Utilize parity data in the recovery command-line tool.
  3. Make sure the parity function works in the GUI.

@warwickmm
Member

Thanks @cmpute. Sorry for the delay. We are quite resource constrained, so it may take some time before this gets some proper review. We are also trying to balance introducing large new features with transitioning to .NET 5.

I did try to run a few simple tests (using Linux) and noticed the following:

  1. There appear to be a few missing project dependencies. Duplicati.Library.Parity should at least be added to the Duplicati.GUI.TrayIcon and Duplicati.Server projects.
  2. When trying to run a command-line backup with parity enabled, I encountered the following error:
Fatal error => File name cannot be null.
Parameter name: sourceFileName

System.AggregateException: One or more errors occurred. (File name cannot be null.
Parameter name: sourceFileName (File name cannot be null.
Parameter name: sourceFileName) (One or more errors occurred. (File name cannot be null.
Parameter name: sourceFileName))) ---> System.AggregateException: File name cannot be null.
Parameter name: sourceFileName (File name cannot be null.
Parameter name: sourceFileName) (One or more errors occurred. (File name cannot be null.
Parameter name: sourceFileName)) ---> System.ArgumentNullException: File name cannot be null.
Parameter name: sourceFileName
at System.IO.File.Move (System.String sourceFileName, System.String destFileName) [0x00003] in <62b430b945fa49a19a75382ef03e7bed>:0 
at Duplicati.Library.Parity.Par2Parity.Create (System.String inputfile, System.String outputfile, System.String inputname) [0x00027] in <607aa00fded44e43a2d0aee67fa9f238>:0 
at Duplicati.Library.Main.Operation.Common.BackendHandler+FileEntryItem.CreateParity (Duplicati.Library.Main.Options options) [0x00060] in <cc86451d7e164a24b846eebe5812c36b>:0 
at Duplicati.Library.Main.Operation.Backup.BackendUploader.UploadBlockAndIndexAsync (Duplicati.Library.Main.Operation.Backup.VolumeUploadRequest upload, Duplicati.Library.Main.Operation.Backup.BackendUploader+Worker worker, System.Threading.CancellationToken cancelToken) [0x003ab] in <cc86451d7e164a24b846eebe5812c36b>:0 
at CoCoL.AutomationExtensions.RunTask[T] (T channels, System.Func`2[T,TResult] method, System.Boolean catchRetiredExceptions) [0x000d4] in <9a758ff4db6c48d6b3d4d0e5c2adf6d1>:0 
at Duplicati.Library.Main.Operation.BackupHandler.FlushBackend (Duplicati.Library.Main.BackupResults result, CoCoL.IWriteChannel`1[T] uploadtarget, System.Threading.Tasks.Task uploader) [0x00222] in <cc86451d7e164a24b846eebe5812c36b>:0 
at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x01003] in <cc86451d7e164a24b846eebe5812c36b>:0 
 --- End of inner exception stack trace ---
at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x014fc] in <cc86451d7e164a24b846eebe5812c36b>:0 
 --- End of inner exception stack trace ---
at CoCoL.ChannelExtensions.WaitForTaskOrThrow (System.Threading.Tasks.Task task) [0x0005d] in <9a758ff4db6c48d6b3d4d0e5c2adf6d1>:0 
at Duplicati.Library.Main.Operation.BackupHandler.Run (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x0000a] in <cc86451d7e164a24b846eebe5812c36b>:0 
at Duplicati.Library.Main.Controller+<>c__DisplayClass14_0.<Backup>b__0 (Duplicati.Library.Main.BackupResults result) [0x0004d] in <cc86451d7e164a24b846eebe5812c36b>:0 
at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String[]& paths, Duplicati.Library.Utility.IFilter& filter, System.Action`1[T] method) [0x0029a] in <cc86451d7e164a24b846eebe5812c36b>:0 
at Duplicati.Library.Main.Controller.Backup (System.String[] inputsources, Duplicati.Library.Utility.IFilter filter) [0x00091] in <cc86451d7e164a24b846eebe5812c36b>:0 
at Duplicati.CommandLine.Commands.Backup (System.IO.TextWriter outwriter, System.Action`1[T] setup, System.Collections.Generic.List`1[T] args, System.Collections.Generic.Dictionary`2[TKey,TValue] options, Duplicati.Library.Utility.IFilter filter) [0x0013b] in <06e68cc1737e4d018f5428961e44dc27>:0 
at (wrapper delegate-invoke) System.Func`6[System.IO.TextWriter,System.Action`1[Duplicati.Library.Main.Controller],System.Collections.Generic.List`1[System.String],System.Collections.Generic.Dictionary`2[System.String,System.String],Duplicati.Library.Utility.IFilter,System.Int32].invoke_TResult_T1_T2_T3_T4_T5(System.IO.TextWriter,System.Action`1<Duplicati.Library.Main.Controller>,System.Collections.Generic.List`1<string>,System.Collections.Generic.Dictionary`2<string, string>,Duplicati.Library.Utility.IFilter)
at Duplicati.CommandLine.Program.ParseCommandLine (System.IO.TextWriter outwriter, System.Action`1[T] setup, System.Boolean& verboseErrors, System.String[] args) [0x003df] in <06e68cc1737e4d018f5428961e44dc27>:0 
at Duplicati.CommandLine.Program.RunCommandLine (System.IO.TextWriter outwriter, System.IO.TextWriter errwriter, System.Action`1[T] setup, System.String[] args) [0x00004] in <06e68cc1737e4d018f5428961e44dc27>:0 

@kenkendk
Member

Thanks for the work on this!
It has been on my wishlist for quite some time.
I like that you wrap the PAR2 tool and leave it open to change implementation later.

@warwickmm warwickmm marked this pull request as draft August 11, 2021 17:38
@cmpute
Author

cmpute commented Aug 12, 2021

@warwickmm I'm not sure what the cause is here; it seems the inputs to the parity module are null somewhere? Could you please share a test case?

@cmpute
Author

cmpute commented Aug 12, 2021

Also, I haven't implemented the related options in the GUI yet, so I only added the dependency to the command-line project.

@warwickmm
Member

> @warwickmm I'm not sure what the cause is here; it seems the inputs to the parity module are null somewhere? Could you please share a test case?

I haven't had time to look into it, but my guess is that inputname is null here:

public void Create(string inputfile, string outputfile, string inputname = null)
{
    // Move input to working directory
    if (string.IsNullOrEmpty(inputname))
        inputname = Path.GetFileName(inputfile);
    var movedfile = Path.Combine(m_work_dir, inputname);
    File.Move(inputfile, movedfile);

@warwickmm
Member

I get the error when running the following script on Linux:

source_dir=$(mktemp -d)
local_db=$(mktemp)
mono Duplicati/CommandLine/bin/Debug/Duplicati.CommandLine.exe "backup" "file:///tmp/tmpgqcuflet" "${source_dir}" --backup-name="Local" --dbpath="${local_db}" --no-encryption --parity-redundancy-level=10

This yields the output:

Backup started at 8/13/2021 9:29:21 AM
Checking remote backup ...
  Listing remote folder ...
  Listing remote folder ...
Scanning local files ...
  0 files need to be examined (0 bytes)
Running PAR2 command: par2 create -q -r10 -b25 -n1 "duplicati-b54e86b550ab04716beb4f55c3fc2ffdc.dblock.zip.par2" "duplicati-b54e86b550ab04716beb4f55c3fc2ffdc.dblock.zip"

Opening: duplicati-b54e86b550ab04716beb4f55c3fc2ffdc.dblock.zip
Done
  Uploading file (530 bytes) ...
  Uploading file (565 bytes) ...
  Uploading file (1.86 KB) ...
Fatal error => File name cannot be null.
Parameter name: sourceFileName

System.AggregateException: One or more errors occurred. (File name cannot be null.
Parameter name: sourceFileName (File name cannot be null.
Parameter name: sourceFileName) (One or more errors occurred. (File name cannot be null.
Parameter name: sourceFileName))) ---> System.AggregateException: File name cannot be null.
Parameter name: sourceFileName (File name cannot be null.
Parameter name: sourceFileName) (One or more errors occurred. (File name cannot be null.
Parameter name: sourceFileName)) ---> System.ArgumentNullException: File name cannot be null.
Parameter name: sourceFileName
...

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/the-compacting-process-is-very-dangerous/10832/17

@cmpute
Author

cmpute commented Jan 20, 2022

Sorry for the lack of updates; I don't have much spare time to dig into the issue. My own tests on my computer work fine, so at least this branch can be tested by more people.

@warwickmm
Member

Thanks @cmpute for your work in getting this started. We can leave this as a draft and see if there are others willing to continue the work.

@kenkendk changed the title from "Add parity module and default implementation based on PAR2" to "[WiP] Add parity module and default implementation based on PAR2" on Jun 13, 2025

Development

Successfully merging this pull request may close these issues.

Add parity file support
