brumi1024/docker-hadoop-dev

 
 

Table of Contents

  1. Hadock
    1. Getting started
    2. New improvements
      1. Jar files mounted as volume
      2. Remote debugging enabled
      3. Optional ResourceManager High Availability
      4. Extended cluster with additional nodes

Hadock

Hadock is a fork of big-data-europe/docker-hadoop (Apache Hadoop docker image), enhanced for Hadoop development. For basic usage, please refer to the README.md at big-data-europe/docker-hadoop.

Getting started

  1. Copy or hard link a Hadoop distribution into the base directory as hadoop.tar.gz. The contents of the directory should look like:

    .rw-r--r-- root root 963 B    Wed Oct 21 17:23:22 2020 Dockerfile
    .rw-r--r-- root root 5.1 KB   Wed Oct 21 18:25:18 2020 entrypoint.sh
    .rw-r--r-- root root 434.4 MB Tue Jul 14 08:59:46 2020 hadoop.tar.gz

  2. Build the docker images.

    make
    
  3. Expose the directory of the hadoop jar files on the local machine.

    export HADOOP_LOCAL_JAR=
    
  4. Start the containers via docker-compose

    docker-compose up
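
Before starting the containers, it can help to confirm that HADOOP_LOCAL_JAR points at an existing directory. The following is a minimal sketch of such a check; the function name check_hadoop_local_jar is hypothetical and not part of Hadock.

```shell
# Hypothetical helper (not part of Hadock): verify HADOOP_LOCAL_JAR before startup.
check_hadoop_local_jar() {
  # Fail with status 1 if the variable is unset or empty.
  if [ -z "${HADOOP_LOCAL_JAR:-}" ]; then
    echo "HADOOP_LOCAL_JAR is not set" >&2
    return 1
  fi
  # Fail with status 2 if the value is not an existing directory.
  if [ ! -d "$HADOOP_LOCAL_JAR" ]; then
    echo "HADOOP_LOCAL_JAR is not a directory: $HADOOP_LOCAL_JAR" >&2
    return 2
  fi
  echo "Using jar directory: $HADOOP_LOCAL_JAR"
}

# Usage: start the cluster only if the check passes.
# check_hadoop_local_jar && docker-compose up
```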
    

New improvements

Jar files mounted as volume

This feature allows a significantly faster development cycle. Simply compile Hadoop on your local machine and restart the cluster to have your changes reflected in Hadock. HADOOP_LOCAL_JAR is generally set to $HADOOP_HOME/hadoop-dist/target/hadoop-$VERSION/share.
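
The path pattern above can be assembled from a source checkout location and a version string. This is a sketch only; the function name hadoop_share_dir and the example arguments are assumptions for illustration, not part of Hadock.

```shell
# Hypothetical helper: build the share directory path from the pattern
# $HADOOP_HOME/hadoop-dist/target/hadoop-$VERSION/share.
#   $1: Hadoop source checkout (plays the role of HADOOP_HOME)
#   $2: Hadoop version string of the local build
hadoop_share_dir() {
  printf '%s/hadoop-dist/target/hadoop-%s/share\n' "$1" "$2"
}

# Usage (example values are placeholders, adjust to your build):
# export HADOOP_LOCAL_JAR="$(hadoop_share_dir "$HOME/src/hadoop" "3.3.0")"
```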

Remote debugging enabled

Remote debugging is enabled by setting $HADOOP_OPTS:

export HADOOP_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9999
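
In this option string, server=y makes the JVM listen for a debugger, suspend=n lets it start without waiting for one to attach, and address is the listening port. A small sketch that builds the same string for an arbitrary port is shown here; the function name jdwp_opts is hypothetical.

```shell
# Hypothetical helper: build the JDWP agent option for a given port.
#   $1: port the JVM should listen on
#   $2: "y" to make the JVM wait for a debugger at startup (default "n")
jdwp_opts() {
  port="$1"
  suspend="${2:-n}"
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}

# Usage:
# export HADOOP_OPTS="$(jdwp_opts 9999)"
```

Setting suspend to y is useful when the code under investigation runs during daemon startup, since the JVM then blocks until a debugger connects.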

Each role has its own debugging port. By default, Docker exposes them as follows:

Container           Port
resourcemanager     9999
nodemanager         9998
nodemanager2        9997
nodemanager3        9996
apphistoryserver    9995
jobhistoryserver    9994
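
To attach a debugger from the host, the default ports listed above can be looked up by container name. The sketch below hard-codes those defaults; the function name default_debug_port is hypothetical, and the values no longer apply once a port has been overridden.

```shell
# Hypothetical helper: map a container name to its default debug port.
default_debug_port() {
  case "$1" in
    resourcemanager)  echo 9999 ;;
    nodemanager)      echo 9998 ;;
    nodemanager2)     echo 9997 ;;
    nodemanager3)     echo 9996 ;;
    apphistoryserver) echo 9995 ;;
    jobhistoryserver) echo 9994 ;;
    *)
      echo "unknown container: $1" >&2
      return 1
      ;;
  esac
}

# Usage: attach the JDK command-line debugger to the first NodeManager.
# jdb -attach "localhost:$(default_debug_port nodemanager)"
```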

To change the debug port of a component, override the corresponding environment variable:

export RESOURCEMANAGER_DEBUG=9001

Remote debugging can be disabled by deleting the appropriate DEBUG fields in hadoop.env.

Optional ResourceManager High Availability

In order to enable RMHA, run Hadock with:

docker-compose -f docker-compose-rmha.yml up

WARNING: RMHA requires additional resources, because a ZooKeeper instance and a second ResourceManager instance are added.
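
When switching between the default and the RMHA setup often, the choice of compose file can be wrapped in a small function. This is a sketch under the assumption that both compose files sit in the current directory; the function name compose_file and the mode argument "rmha" are hypothetical.

```shell
# Hypothetical helper: pick the compose file for the requested mode.
#   $1: "rmha" for ResourceManager High Availability, anything else for the default
compose_file() {
  if [ "${1:-}" = "rmha" ]; then
    echo "docker-compose-rmha.yml"
  else
    echo "docker-compose.yml"
  fi
}

# Usage:
# docker-compose -f "$(compose_file rmha)" up
```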

Extended cluster with additional nodes

The cluster includes two additional NodeManager nodes, which makes it a 3-node cluster. Feel free to adjust the size by deleting or copying the NodeManager services in docker-compose.yml.
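
After editing docker-compose.yml, a quick count of the NodeManager services can confirm the resulting cluster size. The sketch below assumes the services are named nodemanager, nodemanager2, and so on (as in the debug port table) and that each service key sits on its own indented line; the function name count_nodemanagers is hypothetical.

```shell
# Hypothetical helper: count service keys named nodemanager, nodemanager2, ...
#   $1: compose file to inspect (defaults to docker-compose.yml)
count_nodemanagers() {
  # Match indented keys such as "  nodemanager2:" with nothing after the colon,
  # so values like "container_name: nodemanager" are not counted.
  grep -cE '^[[:space:]]+nodemanager[0-9]*:[[:space:]]*$' "${1:-docker-compose.yml}"
}

# Usage:
# count_nodemanagers docker-compose.yml
```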
