Hadock is a fork of big-data-europe/docker-hadoop (an Apache Hadoop Docker image), enhanced for developing Hadoop itself. For basic usage, refer to the README.md of big-data-europe/docker-hadoop.
- Copy or hard-link a Hadoop distribution into the base directory as hadoop.tar.gz. The directory should then contain:
  .rw-r--r-- root root 963 B    Wed Oct 21 17:23:22 2020 Dockerfile
  .rw-r--r-- root root 5.1 KB   Wed Oct 21 18:25:18 2020 entrypoint.sh
  .rw-r--r-- root root 434.4 MB Tue Jul 14 08:59:46 2020 hadoop.tar.gz
- Build the Docker images.
  make
- Expose the directory containing the Hadoop jar files on the local machine.
  export HADOOP_LOCAL_JAR=
- Start the containers via docker-compose.
  docker-compose up
Exposing the local jar directory allows a significantly faster development cycle: compile Hadoop on your local machine and restart the cluster, and your changes are reflected in Hadock. HADOOP_LOCAL_JAR is generally $HADOOP_HOME/hadoop-dist/target/hadoop-$VERSION/share.
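As a rough illustration of the path convention above, the sketch below assembles the share directory from a source root and a version, then warns if the directory is missing. The helper name `hadoop_local_jar_dir` and the example values for `HADOOP_HOME` and `VERSION` are hypothetical, and it assumes `HADOOP_HOME` points at the Hadoop source checkout you build from.

```shell
#!/bin/sh
# Hypothetical helper: builds the share directory path from the convention above.
# Arguments: $1 = Hadoop source root, $2 = Hadoop version being built.
hadoop_local_jar_dir() {
  printf '%s/hadoop-dist/target/hadoop-%s/share\n' "$1" "$2"
}

# Example values only; substitute your own checkout and version.
HADOOP_HOME="${HADOOP_HOME:-$HOME/src/hadoop}"
VERSION="${VERSION:-3.3.0}"

HADOOP_LOCAL_JAR="$(hadoop_local_jar_dir "$HADOOP_HOME" "$VERSION")"
export HADOOP_LOCAL_JAR

# Warn early if the build output is missing (for example, Hadoop not compiled yet).
if [ ! -d "$HADOOP_LOCAL_JAR" ]; then
  echo "warning: $HADOOP_LOCAL_JAR does not exist; build Hadoop first" >&2
fi
echo "$HADOOP_LOCAL_JAR"
```

Checking the directory before starting the cluster can save a restart when the local build has not been run yet.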
Remote debugging is enabled by setting $HADOOP_OPTS:
export HADOOP_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9999
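To make the individual JDWP options easier to read, here is a small sketch that builds the same agent string from a port number. The helper name `jdwp_opts` is hypothetical; the option meanings in the comments are the standard JDWP ones.

```shell
#!/bin/sh
# Hypothetical helper: build the JDWP agent string for a given port.
jdwp_opts() {
  port="$1"
  # transport=dt_socket : the debugger connects over a TCP socket
  # server=y            : the JVM listens for a debugger to attach
  # suspend=n           : the JVM starts immediately instead of waiting for a debugger
  # address=<port>      : the port the JVM listens on
  printf '%s%s\n' '-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=' "$port"
}

HADOOP_OPTS="$(jdwp_opts 9999)"
export HADOOP_OPTS
echo "$HADOOP_OPTS"
```

Changing `suspend=n` to `suspend=y` would make the JVM wait for a debugger before running, which can help when debugging startup code. On JDK 9 and later a bare port binds to localhost only, so depending on the JDK in the image an address such as `*:9999` may be needed for the port to be reachable from outside the container.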
Each role has its own debugging port. By default, Docker exposes them as follows:
Container | Port |
---|---|
resourcemanager | 9999 |
nodemanager | 9998 |
nodemanager2 | 9997 |
nodemanager3 | 9996 |
apphistoryserver | 9995 |
jobhistoryserver | 9994 |
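The table can be turned into a small lookup so that scripts or IDE launchers do not hard-code port numbers. This is only a sketch: the function names are hypothetical, the ports are the defaults listed above, and `localhost` assumes the debugger runs on the Docker host.

```shell
#!/bin/sh
# Map a container name to its default debug port, per the table above.
default_debug_port() {
  case "$1" in
    resourcemanager)  echo 9999 ;;
    nodemanager)      echo 9998 ;;
    nodemanager2)     echo 9997 ;;
    nodemanager3)     echo 9996 ;;
    apphistoryserver) echo 9995 ;;
    jobhistoryserver) echo 9994 ;;
    *) echo "unknown container: $1" >&2; return 1 ;;
  esac
}

# Print the host:port a debugger would attach to (localhost is an assumption).
debug_endpoint() {
  port="$(default_debug_port "$1")" || return 1
  echo "localhost:$port"
}

debug_endpoint nodemanager2
```

With the cluster running, something like `jdb -attach "$(debug_endpoint resourcemanager)"` should connect the JDK's command-line debugger; an IDE remote-debug configuration would use the same host and port.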
To change the debug port of a component, override the corresponding environment variable:
export RESOURCEMANAGER_DEBUG=9001
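The override behaves like a default-with-fallback lookup. The sketch below mimics that in plain shell, assuming the compose file resolves the port with a `${RESOURCEMANAGER_DEBUG:-9999}` style substitution (a syntax docker-compose supports; whether Hadock's compose file uses exactly this form is an assumption). The function name `rm_debug_port` is hypothetical.

```shell
#!/bin/sh
# Resolve the ResourceManager debug port: the override if set, otherwise 9999.
rm_debug_port() {
  echo "${RESOURCEMANAGER_DEBUG:-9999}"
}

# Without an override, the default from the table applies.
unset RESOURCEMANAGER_DEBUG
echo "default:  $(rm_debug_port)"

# With an override exported, the new port wins.
RESOURCEMANAGER_DEBUG=9001
export RESOURCEMANAGER_DEBUG
echo "override: $(rm_debug_port)"
```

Because the value is read from the environment, the export has to happen in the same shell session that later runs docker-compose.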
Remote debugging can be disabled by deleting the appropriate DEBUG fields in hadoop.env.
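Before deleting anything, it may help to list the candidate lines. The following sketch prints every line of an env file that mentions DEBUG, with line numbers. Matching on the substring `DEBUG` is an assumption about how the fields are named, the function name is hypothetical, and the demo file contains made-up entries rather than Hadock's real hadoop.env.

```shell
#!/bin/sh
# List lines in an env file that mention DEBUG, with line numbers, for review.
# Matching on the substring "DEBUG" is an assumption about the field names.
list_debug_fields() {
  grep -n 'DEBUG' "$1"
}

# Demo on a throwaway file with made-up entries (not the real hadoop.env).
demo_env="$(mktemp)"
printf '%s\n' 'EXAMPLE_SETTING=1' 'EXAMPLE_ROLE_DEBUG=placeholder' > "$demo_env"
list_debug_fields "$demo_env"
rm -f "$demo_env"
```

Running `list_debug_fields hadoop.env` in the Hadock directory would show which lines to review; the deletion itself is left as a manual step so that unrelated settings are not removed by accident.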
To enable ResourceManager High Availability (RMHA), run Hadock with:
docker-compose -f docker-compose-rmha.yml up
WARNING: RMHA requires additional resources, because a ZooKeeper instance and a second ResourceManager instance are added.
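If you switch between the regular and the RMHA setup often, a tiny wrapper can select the compose file. This is a sketch only: the mode names `rmha` and `default` and the function name are hypothetical, and it prints the command instead of running it.

```shell
#!/bin/sh
# Pick the compose file for a mode: "rmha" or "default" (hypothetical mode names).
compose_file_for() {
  case "${1:-default}" in
    rmha)    echo "docker-compose-rmha.yml" ;;
    default) echo "docker-compose.yml" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print the command rather than running it, so the sketch has no side effects.
echo "docker-compose -f $(compose_file_for rmha) up"
```

Replacing the final `echo` with the command itself would start the chosen setup directly.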
The cluster runs two additional NodeManagers (nodemanager2 and nodemanager3) alongside the original one, which makes it a 3-node cluster. Feel free to adjust the size by deleting or copying the NodeManager services in docker-compose.yml.
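After editing docker-compose.yml, a quick count of the NodeManager services can confirm the cluster size. The sketch below counts names starting with `nodemanager` on standard input. It assumes the service names match the container names in the table above, and the function name is hypothetical.

```shell
#!/bin/sh
# Count service names on stdin that start with "nodemanager".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_nodemanagers() {
  grep -c '^nodemanager' || true
}

# Demo with the default service names instead of a live compose project.
printf '%s\n' resourcemanager nodemanager nodemanager2 nodemanager3 | count_nodemanagers
```

Against a real checkout, `docker-compose config --services | count_nodemanagers` would feed the actual service names into the same function.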