Contributor

@maclarel maclarel commented Sep 8, 2020

This introduces a parallelized restore of storage data, with a number of rsync threads equal to the number of storage nodes. This is the same logic used for ghe-restore-repositories, simply ported over to ghe-restore-storage.

For customers that are heavy users of LFS in a clustered environment, this can improve performance significantly, cutting run time by a factor of up to the number of storage nodes. Specifically, each rsync invocation only utilizes a single thread of sshd, so when high transfer speeds are possible sshd is likely to become CPU bound, which limits the transfer speed.
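The fan-out idea described above can be sketched with plain bash job control. This is only an illustration: the tooling in this PR relies on parallel from moreutils rather than background jobs, and `transfer_to_node`, `restore_in_parallel`, and the node names below are hypothetical stand-ins, not code from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: start one transfer per storage node, then wait for all of them.
# transfer_to_node is a hypothetical placeholder for the per-node rsync; the
# real script's rsync invocation and options are not reproduced here.
transfer_to_node() {
  local node="$1"
  echo "start $node"
  sleep 1            # placeholder for the actual rsync run
  echo "done $node"
}

# restore_in_parallel NODE...
# Launches every transfer in the background and returns 1 if any of them failed.
restore_in_parallel() {
  local pids=() failed=0 node pid
  for node in "$@"; do
    transfer_to_node "$node" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
  done
  return "$failed"
}

restore_in_parallel storage-node-1 storage-node-2 storage-node-3
```

Because each transfer gets its own ssh connection, each one can be served by a separate sshd process, which is where the expected speedup comes from.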

For example, restoring ~5TB of data across 5 storage nodes would take approximately 14 hours at a transfer speed of 100MB/s (roughly where we see sshd become CPU bound), as the restores would run sequentially. Given sufficient bandwidth for transfers at 500MB/s (achievable on a 10Gbit connection), this could drop to approximately 3 hours: all 5 rsync invocations would run simultaneously, using 1 thread per server and effectively quintupling performance.
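As a back-of-the-envelope check of these figures, the arithmetic can be sketched as follows. It assumes decimal units (1TB = 1,000,000MB) and no per-file or connection overhead, so real restores will take longer; `estimate_hours` is a hypothetical helper, not part of backup-utils.

```shell
#!/usr/bin/env bash
# Rough restore-time estimate. Assumptions: decimal units and zero overhead.

# estimate_hours DATA_TB RATE_MB_PER_SEC
# Prints the hours needed to move DATA_TB terabytes at RATE_MB_PER_SEC.
estimate_hours() {
  LC_ALL=C awk -v tb="$1" -v rate="$2" \
    'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 3600 }'
}

sequential=$(estimate_hours 5 100)   # one rsync at a time at ~100MB/s
concurrent=$(estimate_hours 5 500)   # five rsyncs at once, ~100MB/s each
echo "sequential: ${sequential}h, concurrent: ${concurrent}h"
# prints: sequential: 13.9h, concurrent: 2.8h
```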

The verbose log confirms that all 3 rsync invocations are kicked off at the same time, which aligns with the behaviour seen for ghe-restore-repositories:

Sep 08 17:38:45 ghe-restore-storage: sent 3,418 bytes  received 73 bytes  2,327.33 bytes/sec
Sep 08 17:38:45 ghe-restore-storage: total size is 14,652,931  speedup is 4,197.34
Sep 08 17:38:45 ghe-restore-storage: sending incremental file list
Sep 08 17:38:45 ghe-restore-storage:
Sep 08 17:38:45 ghe-restore-storage: sent 3,418 bytes  received 73 bytes  2,327.33 bytes/sec
Sep 08 17:38:45 ghe-restore-storage: total size is 14,652,931  speedup is 4,197.34
Sep 08 17:38:45 ghe-restore-storage: sending incremental file list
Sep 08 17:38:45 ghe-restore-storage:
Sep 08 17:38:45 ghe-restore-storage: sent 3,421 bytes  received 76 bytes  2,331.33 bytes/sec
Sep 08 17:38:45 ghe-restore-storage: total size is 14,652,931  speedup is 4,190.14

@maclarel maclarel requested a review from omgitsads September 8, 2020 17:49
@maclarel maclarel marked this pull request as ready for review September 14, 2020 13:57
Member

@omgitsads omgitsads left a comment

Sorry for the delay on this. I think it looks good to me 👍 .

Have you tried this out with a large amount of data, to get an idea of the improvement? Given this is just rsync'ing data from /data/user/storage, you could generate a decent chunk of random data with dd and confirm it's doing what you expect.

@maclarel
Contributor Author

maclarel commented Sep 17, 2020

I was originally bottlenecked by my home connection speed when testing this, but wound up spinning up an EC2 instance with a 5Gbit NIC to remove that problem :) That said, disk performance on my backup host will likely still be a bottleneck, as I'm seeing transfer and disk speeds plummet the more I use my test instance.

For the storage restore, I added six 1GB files (6GB total) at random locations under the storage directory of the current backup:

dd if=/dev/zero of=data/current/storage/0/06/7a/067a71e0a4e77d17cf5e1ab4test1.dat bs=1G count=1
dd if=/dev/zero of=data/current/storage/6/61/5f/615fd57d6491104c28e6e106test2.dat bs=1G count=1
dd if=/dev/zero of=data/current/storage/f/f3/98/f3985f87c02780d404a7ffe6test3.dat bs=1G count=1
dd if=/dev/zero of=data/current/storage/3/3c/70/3c7075f9865d684417eb5289test4.dat bs=1G count=1
dd if=/dev/zero of=data/current/storage/d/d3/c7/d3c70c81ace70adc9afe304test5.dat bs=1G count=1
dd if=/dev/zero of=data/current/storage/7/76/31/7631588f512e56afb1405c8test6.dat bs=1G count=1
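The commands above write zero-filled files. For random content, as suggested in the review, a small helper along these lines could be used. It is a sketch assuming GNU dd; `make_test_files`, the `testN.dat` names, and the flat directory layout are hypothetical and do not mirror the sharded storage paths used above.

```shell
#!/usr/bin/env bash
# make_test_files DIR COUNT SIZE_MB
# Creates COUNT files of SIZE_MB MiB each in DIR, filled with random bytes.
make_test_files() {
  local dir="$1" count="$2" size_mb="$3" i
  mkdir -p "$dir"
  for ((i = 1; i <= count; i++)); do
    # bs=1M keeps dd's buffer small, count sets the size in MiB, and
    # iflag=fullblock (GNU dd) makes dd retry short reads so blocks are full.
    dd if=/dev/urandom of="$dir/test$i.dat" bs=1M count="$size_mb" \
      iflag=fullblock 2>/dev/null
  done
}

# Example (not run here): six 1GiB files in a scratch directory.
#   make_test_files /tmp/storage-test 6 1024
```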

Time without parallelism (~9 min):

Sep 17 16:43:50 ghe-restore-storage: * Transferring data to storage-server-0b2851a6-3883-11ea-8074-0e9b925ad0bd ...
Sep 17 16:45:39 ghe-restore-storage: sending incremental file list
Sep 17 16:45:39 ghe-restore-storage: 0/06/7a/
Sep 17 16:45:39 ghe-restore-storage: 0/06/7a/067a71e0a4e77d17cf5e1ab4test1.dat
Sep 17 16:45:39 ghe-restore-storage: 3/3c/70/
Sep 17 16:45:39 ghe-restore-storage: 3/3c/70/3c7075f9865d684417eb5289test4.dat
Sep 17 16:45:39 ghe-restore-storage: 6/61/5f/
Sep 17 16:45:39 ghe-restore-storage: 6/61/5f/615fd57d6491104c28e6e106test2.dat
Sep 17 16:45:39 ghe-restore-storage: 7/76/31/
Sep 17 16:45:39 ghe-restore-storage: 7/76/31/7631588f512e56afb1405c8test6.dat
Sep 17 16:45:39 ghe-restore-storage: d/d3/c7/
Sep 17 16:45:39 ghe-restore-storage: d/d3/c7/d3c70c81ace70adc9afe304test5.dat
Sep 17 16:45:39 ghe-restore-storage: f/f3/98/
Sep 17 16:45:39 ghe-restore-storage: f/f3/98/f3985f87c02780d404a7ffe6test3.dat
Sep 17 16:45:39 ghe-restore-storage:
Sep 17 16:45:39 ghe-restore-storage: sent 6,444,027,991 bytes  received 234 bytes  58,849,572.83 bytes/sec
Sep 17 16:45:39 ghe-restore-storage: total size is 6,457,562,678  speedup is 1.00
Sep 17 16:45:40 ghe-restore-storage: * Transferring data to storage-server-0b81cab0-3883-11ea-8376-021bb6bfffc5 ...
Sep 17 16:47:25 ghe-restore-storage: sending incremental file list
Sep 17 16:47:25 ghe-restore-storage: 0/06/7a/
Sep 17 16:47:25 ghe-restore-storage: 0/06/7a/067a71e0a4e77d17cf5e1ab4test1.dat
Sep 17 16:47:25 ghe-restore-storage: 3/3c/70/
Sep 17 16:47:25 ghe-restore-storage: 3/3c/70/3c7075f9865d684417eb5289test4.dat
Sep 17 16:47:25 ghe-restore-storage: 6/61/5f/
Sep 17 16:47:25 ghe-restore-storage: 6/61/5f/615fd57d6491104c28e6e106test2.dat
Sep 17 16:47:25 ghe-restore-storage: 7/76/31/
Sep 17 16:47:25 ghe-restore-storage: 7/76/31/7631588f512e56afb1405c8test6.dat
Sep 17 16:47:25 ghe-restore-storage: d/d3/c7/
Sep 17 16:47:25 ghe-restore-storage: d/d3/c7/d3c70c81ace70adc9afe304test5.dat
Sep 17 16:47:25 ghe-restore-storage: f/f3/98/
Sep 17 16:47:25 ghe-restore-storage: f/f3/98/f3985f87c02780d404a7ffe6test3.dat
Sep 17 16:47:25 ghe-restore-storage:
Sep 17 16:47:25 ghe-restore-storage: sent 6,444,027,991 bytes  received 234 bytes  61,080,836.26 bytes/sec
Sep 17 16:47:25 ghe-restore-storage: total size is 6,457,562,678  speedup is 1.00
Sep 17 16:47:26 ghe-restore-storage: * Transferring data to storage-server-0e0139d8-3883-11ea-b6f4-0ee2882f8799 ...
Sep 17 16:52:52 ghe-restore-storage: sending incremental file list
Sep 17 16:52:52 ghe-restore-storage: 0/06/7a/
Sep 17 16:52:52 ghe-restore-storage: 0/06/7a/067a71e0a4e77d17cf5e1ab4test1.dat
Sep 17 16:52:52 ghe-restore-storage: 3/3c/70/
Sep 17 16:52:52 ghe-restore-storage: 3/3c/70/3c7075f9865d684417eb5289test4.dat
Sep 17 16:52:52 ghe-restore-storage: 6/61/5f/
Sep 17 16:52:52 ghe-restore-storage: 6/61/5f/615fd57d6491104c28e6e106test2.dat
Sep 17 16:52:52 ghe-restore-storage: 7/76/31/
Sep 17 16:52:52 ghe-restore-storage: 7/76/31/7631588f512e56afb1405c8test6.dat
Sep 17 16:52:52 ghe-restore-storage: d/d3/c7/
Sep 17 16:52:52 ghe-restore-storage: d/d3/c7/d3c70c81ace70adc9afe304test5.dat
Sep 17 16:52:52 ghe-restore-storage: f/f3/98/
Sep 17 16:52:52 ghe-restore-storage: f/f3/98/f3985f87c02780d404a7ffe6test3.dat
Sep 17 16:52:52 ghe-restore-storage:
Sep 17 16:52:52 ghe-restore-storage: sent 6,444,027,991 bytes  received 234 bytes  19,736,686.75 bytes/sec
Sep 17 16:52:52 ghe-restore-storage: total size is 6,457,562,678  speedup is 1.00
Sep 17 16:52:52 ghe-restore-storage: Finalizing routes

Time with parallelism (3 min 40 sec):

GHE_PARALLEL_ENABLED=yes
GHE_PARALLEL_MAX_JOBS=3
GHE_PARALLEL_RSYNC_MAX_JOBS=3
GHE_PARALLEL_MAX_LOAD=75
Sep 17 18:25:36 ghe-restore-storage: * Transferring data to storage-server-0b2851a6-3883-11ea-8074-0e9b925ad0bd ...
Sep 17 18:25:36 ghe-restore-storage: * Transferring data to storage-server-0e0139d8-3883-11ea-b6f4-0ee2882f8799 ...
Sep 17 18:25:36 ghe-restore-storage: * Transferring data to storage-server-0b81cab0-3883-11ea-8376-021bb6bfffc5 ...
Sep 17 18:29:15 ghe-restore-storage: sending incremental file list
Sep 17 18:29:15 ghe-restore-storage: 0/06/7a/
Sep 17 18:29:15 ghe-restore-storage: 0/06/7a/067a71e0a4e77d17cf5e1ab4test1.dat
Sep 17 18:29:15 ghe-restore-storage: 3/3c/70/
Sep 17 18:29:15 ghe-restore-storage: 3/3c/70/3c7075f9865d684417eb5289test4.dat
Sep 17 18:29:15 ghe-restore-storage: 6/61/5f/
Sep 17 18:29:15 ghe-restore-storage: 6/61/5f/615fd57d6491104c28e6e106test2.dat
Sep 17 18:29:15 ghe-restore-storage: 7/76/31/
Sep 17 18:29:15 ghe-restore-storage: 7/76/31/7631588f512e56afb1405c8test6.dat
Sep 17 18:29:15 ghe-restore-storage: d/d3/c7/
Sep 17 18:29:15 ghe-restore-storage: d/d3/c7/d3c70c81ace70adc9afe304test5.dat
Sep 17 18:29:15 ghe-restore-storage: f/f3/98/
Sep 17 18:29:15 ghe-restore-storage: f/f3/98/f3985f87c02780d404a7ffe6test3.dat
Sep 17 18:29:15 ghe-restore-storage:
Sep 17 18:29:15 ghe-restore-storage: sent 6,444,027,999 bytes  received 234 bytes  29,357,759.60 bytes/sec
Sep 17 18:29:15 ghe-restore-storage: total size is 6,457,562,678  speedup is 1.00
Sep 17 18:29:15 ghe-restore-storage: sending incremental file list
Sep 17 18:29:15 ghe-restore-storage: 0/06/7a/
Sep 17 18:29:15 ghe-restore-storage: 0/06/7a/067a71e0a4e77d17cf5e1ab4test1.dat
Sep 17 18:29:15 ghe-restore-storage: 3/3c/70/
Sep 17 18:29:15 ghe-restore-storage: 3/3c/70/3c7075f9865d684417eb5289test4.dat
Sep 17 18:29:15 ghe-restore-storage: 6/61/5f/
Sep 17 18:29:15 ghe-restore-storage: 6/61/5f/615fd57d6491104c28e6e106test2.dat
Sep 17 18:29:15 ghe-restore-storage: 7/76/31/
Sep 17 18:29:15 ghe-restore-storage: 7/76/31/7631588f512e56afb1405c8test6.dat
Sep 17 18:29:15 ghe-restore-storage: d/d3/c7/
Sep 17 18:29:15 ghe-restore-storage: d/d3/c7/d3c70c81ace70adc9afe304test5.dat
Sep 17 18:29:15 ghe-restore-storage: f/f3/98/
Sep 17 18:29:15 ghe-restore-storage: f/f3/98/f3985f87c02780d404a7ffe6test3.dat
Sep 17 18:29:15 ghe-restore-storage:
Sep 17 18:29:15 ghe-restore-storage: sent 6,444,027,999 bytes  received 234 bytes  29,357,759.60 bytes/sec
Sep 17 18:29:15 ghe-restore-storage: total size is 6,457,562,678  speedup is 1.00
Sep 17 18:29:16 ghe-restore-storage: sending incremental file list
Sep 17 18:29:16 ghe-restore-storage: 0/06/7a/
Sep 17 18:29:16 ghe-restore-storage: 0/06/7a/067a71e0a4e77d17cf5e1ab4test1.dat
Sep 17 18:29:16 ghe-restore-storage: 3/3c/70/
Sep 17 18:29:16 ghe-restore-storage: 3/3c/70/3c7075f9865d684417eb5289test4.dat
Sep 17 18:29:16 ghe-restore-storage: 6/61/5f/
Sep 17 18:29:16 ghe-restore-storage: 6/61/5f/615fd57d6491104c28e6e106test2.dat
Sep 17 18:29:16 ghe-restore-storage: 7/76/31/
Sep 17 18:29:16 ghe-restore-storage: 7/76/31/7631588f512e56afb1405c8test6.dat
Sep 17 18:29:16 ghe-restore-storage: d/d3/c7/
Sep 17 18:29:16 ghe-restore-storage: d/d3/c7/d3c70c81ace70adc9afe304test5.dat
Sep 17 18:29:16 ghe-restore-storage: f/f3/98/
Sep 17 18:29:16 ghe-restore-storage: f/f3/98/f3985f87c02780d404a7ffe6test3.dat
Sep 17 18:29:16 ghe-restore-storage:
Sep 17 18:29:16 ghe-restore-storage: sent 6,444,027,999 bytes  received 234 bytes  29,224,617.84 bytes/sec
Sep 17 18:29:16 ghe-restore-storage: total size is 6,457,562,678  speedup is 1.00
Sep 17 18:29:16 ghe-restore-storage: Finalizing routes
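The run times quoted above can be recomputed from the first and last timestamps of each log. A minimal sketch, assuming every line starts with a syslog-style "Mon DD HH:MM:SS" prefix, all lines fall on the same day, and they appear in chronological order (`elapsed_seconds` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# elapsed_seconds: reads log lines on stdin and prints the number of seconds
# between the first and the last timestamp (third whitespace-separated field).
elapsed_seconds() {
  awk '
    NF >= 3 {
      split($3, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (!seen) { first = now; seen = 1 }
      last = now
    }
    END { print last - first }
  '
}

printf '%s\n' \
  "Sep 17 16:43:50 ghe-restore-storage: * Transferring data ..." \
  "Sep 17 16:52:52 ghe-restore-storage: Finalizing routes" | elapsed_seconds
# prints: 542
```

Applied to the parallel run (18:25:36 to 18:29:16) this gives 220 seconds, so the parallel restore finished roughly 2.5 times faster than the sequential one in this test.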

I'll note that I burned a fair bit of time getting moreutils installed. A package that includes parallel isn't available for RHEL-based distros (e.g. RHEL, CentOS, Amazon Linux), so you have to compile it from source: we can't use GNU Parallel, which is the only version available through a package manager as far as I can tell. We should consider either requiring a Debian-based system or providing the required binaries as part of backup-utils. cc @ryansimmen
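Since GNU Parallel and the moreutils tool both install a binary named parallel, it can help to check which one is on the PATH before enabling parallelism. The sketch below relies on GNU Parallel identifying itself in its `--version` output; treating anything else as "other" (for example the moreutils tool) is an assumption, and `parallel_flavor` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# parallel_flavor VERSION_OUTPUT
# Classifies the text printed by `parallel --version`. GNU Parallel names
# itself in that output; everything else is reported as "other".
parallel_flavor() {
  case "$1" in
    *"GNU parallel"*) echo "gnu" ;;
    *) echo "other" ;;
  esac
}

if command -v parallel >/dev/null 2>&1; then
  parallel_flavor "$(parallel --version 2>&1 </dev/null | head -n 1)"
else
  echo "parallel not found"
fi
```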

I've also verified that restores to an HA environment still function (with parallelism enabled or disabled) and are otherwise unaffected by these updates 🎉

@ryansimmen
Member

@maclarel I think you can just install moreutils-parallel for RHEL instead of compiling from source.

@maclarel maclarel merged commit d95134c into master Sep 17, 2020
@maclarel maclarel deleted the parallelize_storage_restore branch September 17, 2020 20:02
@jianghao0718 jianghao0718 changed the title Add parallelized restore capability to ghe-restore-storage Add Parallelized Restore Capability to ghe-restore-storage Sep 23, 2020
@jianghao0718 jianghao0718 changed the title Add Parallelized Restore Capability to ghe-restore-storage Add parallelized restore capability to ghe-restore-storage Sep 23, 2020