HDFS-17818. Fix serial fsimage transfer during checkpoint with multiple namenodes #7862
@Hexiaoqiao @ayushtkn @tomscut Do you think the fsimage upload during a checkpoint, when observer NameNodes are present, should be changed from serial to parallel?
In our cluster, each namespace has four NameNodes: one active, one standby, and two observers. When the standby NameNode performs a checkpoint, it transfers the fsimage to the other three NameNodes. However, we found that these transfers are performed serially.
The reason is that the corePoolSize of the ThreadPoolExecutor is 0 and the transfer tasks never fill the LinkedBlockingQueue. A ThreadPoolExecutor starts threads beyond corePoolSize only when the queue rejects a task, so only one thread transfers the fsimage at a time. This greatly increases the checkpoint time.
    ExecutorService executor = new ThreadPoolExecutor(0, activeNNAddresses.size(), 100,
        TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(activeNNAddresses.size()),
        uploadThreadFactory);
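
The behaviour described above can be reproduced outside HDFS with a small standalone sketch. The class `UploadPoolDemo` and the method `largestPoolSize` are hypothetical names used only for illustration, and the sleeping task stands in for an fsimage upload. Raising corePoolSize to the number of targets is shown as one possible remedy; it is an assumption here, not necessarily the change this PR makes.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Hadoop.
public class UploadPoolDemo {

  /**
   * Submits `tasks` short jobs to a pool configured like the one above
   * (maximumPoolSize == queue capacity == tasks) and returns the largest
   * number of threads the pool ever held at the same time.
   */
  public static int largestPoolSize(int corePoolSize, int tasks) {
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        corePoolSize, tasks, 100, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(tasks));
    try {
      for (int i = 0; i < tasks; i++) {
        executor.execute(() -> {
          try {
            Thread.sleep(200); // stand-in for one fsimage transfer
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      executor.shutdown();
      executor.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      executor.shutdownNow();
    }
    return executor.getLargestPoolSize();
  }

  public static void main(String[] args) {
    // corePoolSize = 0: tasks are queued and a single worker drains them.
    System.out.println("corePoolSize=0 -> " + largestPoolSize(0, 3)); // prints 1
    // corePoolSize = 3: each submission starts its own worker.
    System.out.println("corePoolSize=3 -> " + largestPoolSize(3, 3)); // prints 3
  }
}
```

With corePoolSize set to 0 the three tasks run one after another on a single thread, which matches the serial transfers observed during checkpoint. With corePoolSize equal to the number of targets, each task gets its own thread.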