HDDS-1713. ReplicationManager fail to find proper node topology based… #1008
… on Datanode details from heartbeat. Contributed by Xiaoyu Yao.
💔 -1 overall
This message was automatically generated.
/retest
} else {
  // Get the datanode details again from node manager with the topology info
  // for registered datanodes.
  datanodeDetails = nodeManager.getNode(datanodeDetails.getIpAddress());
We should not rely on the datanode's IP address in the NetworkTopology path; we should use the datanode UUID instead. More than one datanode process may be running on the same machine.
Multiple DN instances on the same machine most likely come from a test/dev environment such as MiniOzoneCluster. In production, even containers in K8S have dedicated IPs.
But the IP address can change for the same datanode. In fact, we have a Jira to remove it from the yaml file in the future: HDDS-1480
The property "dfs.datanode.use.datanode.hostname" controls whether the IP address or the hostname is used. With either one, the existing hadoop/hdfs/yarn topology tools and customer management scripts can be reused, which would make it easy for users to adopt Ozone. @xiaoyuyao, I can take over this if you are fully occupied.
"Multiple DN instances on the same machine most likely come from a test/dev environment such as MiniOzoneCluster. In production, even containers in K8S have dedicated IPs."
I agree, but the problem is that after this change a test/dev environment with more than one datanode process running on the same machine will not work properly at all. Heartbeats from the different datanode processes on that machine will be mapped to a single process, and all the other datanode processes will be marked as dead even though they are heartbeating.
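The collision described here can be reproduced with a minimal sketch. `NodeId` and `HeartbeatKeyDemo` are hypothetical stand-ins invented for illustration, not classes from the patch; the sketch only assumes that heartbeats end up in a map keyed by either IP address or UUID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical, simplified datanode identity; not the real Ozone class. */
final class NodeId {
  final UUID uuid;
  final String ipAddress;

  NodeId(UUID uuid, String ipAddress) {
    this.uuid = uuid;
    this.ipAddress = ipAddress;
  }
}

/** Hypothetical helper comparing the two keying choices. */
final class HeartbeatKeyDemo {
  /** Number of distinct nodes tracked when heartbeats are keyed by IP address. */
  static int distinctByIp(Iterable<NodeId> heartbeats) {
    Map<String, NodeId> byIp = new HashMap<>();
    for (NodeId node : heartbeats) {
      // Nodes sharing an IP overwrite each other here.
      byIp.put(node.ipAddress, node);
    }
    return byIp.size();
  }

  /** Number of distinct nodes tracked when heartbeats are keyed by UUID. */
  static int distinctByUuid(Iterable<NodeId> heartbeats) {
    Map<UUID, NodeId> byUuid = new HashMap<>();
    for (NodeId node : heartbeats) {
      byUuid.put(node.uuid, node);
    }
    return byUuid.size();
  }
}
```

With three datanode processes that all report the same IP, the IP-keyed map tracks one node while the UUID-keyed map tracks three, which matches the "marked as dead even though they are heartbeating" symptom.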
@nandakumar131, yes. We will need to handle this case for the minicluster based tests.
The current topology awareness is based on a map of ip/dns -> location. I think changing it to uuid -> location should work as long as we maintain a mapping from uuid -> ip/dns.
+1 on changing it to uuid -> location and maintaining a map for uuid -> ip/dns.
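One way to picture the proposed mapping is the sketch below. It is an illustration under assumptions, not the code from the follow-up patch: `UuidTopologyIndex`, its method names, and location strings such as `/rack1` are all hypothetical. It resolves a location in two steps, uuid -> ip/dns and then ip/dns -> location, so existing address-based topology data can stay as it is.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical two-level index: uuid -> ip/dns -> location. */
final class UuidTopologyIndex {
  private final Map<UUID, String> uuidToAddress = new ConcurrentHashMap<>();
  private final Map<String, String> addressToLocation = new ConcurrentHashMap<>();

  /** Records the topology location for an IP address or DNS name. */
  void setLocation(String address, String location) {
    addressToLocation.put(address, location);
  }

  /** Records (or updates) the address a datanode UUID is currently using. */
  void register(UUID uuid, String address) {
    uuidToAddress.put(uuid, address);
  }

  /** Resolves a location by UUID; empty if the UUID or its address is unknown. */
  Optional<String> locationOf(UUID uuid) {
    return Optional.ofNullable(uuidToAddress.get(uuid))
        .map(addressToLocation::get);
  }
}
```

Two datanodes on one machine keep separate UUID entries that resolve to the same location, and a datanode whose IP changes only needs its uuid -> ip/dns entry updated.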
} else {
  // Get the datanode details again from node manager with the topology info
  // for registered datanodes.
  datanodeDetails = nodeManager.getNode(datanodeDetails.getIpAddress());
Xiaoyu, a node can use either the IP address or the hostname as its network topology name.
Maybe we should refactor the nodeManager.getNode function to take datanodeDetails as its argument, and make the choice between IP address and hostname as the network topology name internal logic of getNode.
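A rough sketch of that refactoring follows. `DnInfo` and `SketchNodeManager` are hypothetical names invented for illustration and do not mirror the real NodeManager interface; the `useHostname` flag is assumed to be read from a setting like the one mentioned elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified datanode details; not the real Ozone class. */
final class DnInfo {
  final String ipAddress;
  final String hostName;
  final String networkLocation; // null until topology info is attached

  DnInfo(String ipAddress, String hostName, String networkLocation) {
    this.ipAddress = ipAddress;
    this.hostName = hostName;
    this.networkLocation = networkLocation;
  }
}

/** Hypothetical node manager that hides the IP-vs-hostname choice. */
final class SketchNodeManager {
  private final boolean useHostname; // assumed to come from configuration
  private final Map<String, DnInfo> registered = new HashMap<>();

  SketchNodeManager(boolean useHostname) {
    this.useHostname = useHostname;
  }

  /** The single place that decides which field is the topology name. */
  private String networkName(DnInfo details) {
    return useHostname ? details.hostName : details.ipAddress;
  }

  void register(DnInfo details) {
    registered.put(networkName(details), details);
  }

  /** Callers pass the details object; returns null if not registered. */
  DnInfo getNode(DnInfo details) {
    return registered.get(networkName(details));
  }
}
```

A caller would then write `nodeManager.getNode(datanodeDetails)` and no longer needs to know which field serves as the topology name.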
New patch to support the datanode uuid -> ip/hostname -> network path mapping. It also resolves the heartbeat issue.
💔 -1 overall
This message was automatically generated.
The JIRA has been taken over with a different PR.