ryansimmen
Member

@ryansimmen ryansimmen commented Mar 31, 2020

Execute long-running backup tasks in parallel by leveraging moreutils parallel.

This resolves https://github.com/github/ghes-infrastructure/issues/386

@ryansimmen ryansimmen requested a review from dbussink April 1, 2020 00:23
@ryansimmen ryansimmen force-pushed the ryansimmen/parallel-backup branch 6 times, most recently from 1aa484c to 80f7862 Compare April 1, 2020 02:30
@lildude
Member

lildude commented Apr 1, 2020

Two issues come to mind with this, and only because I've thought about this before when considering parallelisation in the past:

  1. This makes the assumption that parallel in the path is GNU parallel.
    This is a problem because GNU parallel is not the same as, and not compatible with, parallel as shipped with moreutils. These changes don't distinguish between the two and thus we could have some weird and unexpected failures.
  2. This is a breaking change with the only indication of it being a change hidden away in the requirements.md.
    This new requirement is going to need to be made explicitly clear in the release notes, and should probably be documented in the README.md too, especially as it might not be easy to install both on the same system on some operating systems. I know Debian/Ubuntu play nice, but I don't know about other OSes.

With both in mind, I think we need a guard really early in the backup and restore steps (can bung it in share/github-backup-utils/ghe-backup-config along with the other early checks) that checks for GNU parallel and fails early if it can't be found.
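A minimal sketch of the kind of early guard described above, assuming GNU parallel can be recognised by the string "GNU parallel" in its `--version` output. The function name `require_gnu_parallel` and the error messages are hypothetical and do not come from backup-utils.

```shell
# Sketch of an early guard: fail unless the given command is GNU parallel.
# The function name and messages are illustrative, not taken from backup-utils.
require_gnu_parallel() {
  rgp_cmd=${1:-parallel}

  # Fail if the command cannot be resolved at all.
  if ! command -v "$rgp_cmd" >/dev/null 2>&1; then
    echo "Error: $rgp_cmd not found in PATH." >&2
    return 1
  fi

  # GNU parallel identifies itself in its --version output; other
  # implementations are assumed not to print this string.
  if ! "$rgp_cmd" --version 2>/dev/null </dev/null | grep -q 'GNU parallel'; then
    echo "Error: $rgp_cmd is not GNU parallel." >&2
    return 1
  fi
}
```

A script could call `require_gnu_parallel || exit 1` alongside its other early checks so that the run stops before any backup or restore work begins.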

@lildude
Member

lildude commented Apr 1, 2020

As an aside, as this is a breaking change, it should not ship until Backup Utils 2.21.0 at the earliest.

@ryansimmen ryansimmen force-pushed the ryansimmen/parallel-backup branch 2 times, most recently from 868a1dc to 7ee1099 Compare April 1, 2020 15:58
@ryansimmen ryansimmen force-pushed the ryansimmen/parallel-backup branch 8 times, most recently from fa53e9d to 6c3e4ad Compare April 2, 2020 02:27
Member

@snh snh left a comment

Thank you for making these changes @ryansimmen ❤️

It looks like there are some test failures that need to be resolved here.

As @lildude mentioned, we should also hold off merging this for now until closer to 2.21. This will also give us some time to exercise this under more varied testing scenarios.

Member

@lildude lildude left a comment

A few recommendations. The test failures are legit too.

@ryansimmen ryansimmen force-pushed the ryansimmen/parallel-backup branch from 6c3e4ad to 27511e0 Compare April 2, 2020 15:00
@ryansimmen
Member Author

  1. This makes the assumption that parallel in the path is GNU parallel.
    This is a problem because GNU parallel is not the same as, and not compatible with, parallel as shipped with moreutils. These changes don't distinguish between the two and thus we could have some weird and unexpected failures.

With both in mind, I think we need a guard really early in the backup and restore steps (can bung it in share/github-backup-utils/ghe-backup-config along with the other early checks) that checks for GNU parallel and fails early if it can't be found.

This PR now uses moreutils parallel instead of GNU parallel. If both are installed, the code checks the path for the presence of parallel and chooses the appropriate executable.
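For illustration only, here is one way such a selection could work. This is a sketch, not the code merged in this PR: the function name `pick_moreutils_parallel` is hypothetical, the candidate names are supplied by the caller, and it assumes that only GNU parallel prints "GNU parallel" in response to `--version`.

```shell
# Sketch only: print the first candidate that exists and is NOT GNU parallel.
# Candidate names or paths are supplied by the caller; none come from the PR.
pick_moreutils_parallel() {
  for pmp_candidate in "$@"; do
    # Skip candidates that do not resolve to something runnable.
    pmp_path=$(command -v "$pmp_candidate" 2>/dev/null) || continue

    # GNU parallel identifies itself in its --version output, so skip it.
    if "$pmp_path" --version 2>/dev/null </dev/null | grep -q 'GNU parallel'; then
      continue
    fi

    printf '%s\n' "$pmp_path"
    return 0
  done

  echo "Error: no non-GNU parallel executable found." >&2
  return 1
}
```

A caller might try a distribution-specific name first and fall back to plain `parallel`, for example `pick_moreutils_parallel some-renamed-parallel parallel`, where the first name is a placeholder.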

@ryansimmen ryansimmen force-pushed the ryansimmen/parallel-backup branch from 66ae7f5 to 7561b14 Compare April 4, 2020 11:57
@ryansimmen
Member Author

Here is a thread describing why gawk is needed over mawk: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=593504

@ryansimmen ryansimmen force-pushed the ryansimmen/parallel-backup branch from 6a47bf3 to d263c8a Compare April 5, 2020 17:12
@ryansimmen ryansimmen force-pushed the ryansimmen/parallel-backup branch 4 times, most recently from 01cabcd to 53912d8 Compare April 6, 2020 01:10
@ryansimmen ryansimmen force-pushed the ryansimmen/parallel-backup branch from 53912d8 to c8eccf8 Compare April 6, 2020 02:25
@ryansimmen ryansimmen merged commit 9449255 into master Apr 7, 2020
This was referenced Jun 9, 2020
@randyr505

Glad to see my POC has come to fruition! I can't wait to test this out in our environment! @ryansimmen is there a limit for the parallel rsyncs? In my POC I ran 7000 at a time, lol. Thanks for your great work on this!

@ryansimmen
Member Author

Glad to see my POC has come to fruition! I can't wait to test this out in our environment! @ryansimmen is there a limit for the parallel rsyncs? In my POC I ran 7000 at a time, lol. Thanks for your great work on this!

@randyr505 yes, you may simply set GHE_PARALLEL_RSYNC_MAX_JOBS. For reference please see https://github.com/github/backup-utils/blob/master/backup.config-example#L72
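As a rough illustration, assuming the backup configuration file is read as shell variable assignments, the setting might look like the first line below. The value 3 is only an example, and the helper `rsync_max_jobs` is hypothetical: it sketches how a script could fall back to a default when the variable is missing or invalid.

```shell
# Config fragment (illustrative value, not a recommendation):
GHE_PARALLEL_RSYNC_MAX_JOBS=3

# Hypothetical helper: resolve the job cap, falling back to a default when
# the variable is unset, empty, non-numeric, or 0.
rsync_max_jobs() {
  rmj_default=${1:-3}
  rmj_value=${GHE_PARALLEL_RSYNC_MAX_JOBS:-$rmj_default}
  case "$rmj_value" in
    ''|*[!0-9]*|0) rmj_value=$rmj_default ;;
  esac
  printf '%s\n' "$rmj_value"
}
```

The resolved number could then be handed to whatever launches the jobs, for example the `-j` option of moreutils parallel.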

timreimherr pushed a commit that referenced this pull request Oct 24, 2023
timreimherr pushed a commit that referenced this pull request Oct 24, 2023
amildahl pushed a commit that referenced this pull request Nov 28, 2023
6 participants