-
Notifications
You must be signed in to change notification settings - Fork 177
Using Boto3 to create EMR cluster. #8
Description
Hi All,
I am trying to automate the EMR cluster creation using Boto3. Which i am using to create the EMR cluster. I need a cluster created with Impala configured.
Here is the parmas i passed to run_job_flow
Name='AutmateEMR',
ReleaseLabel='emr-4.6.0',
Instances={
'InstanceGroups': [{'InstanceCount':4,'InstanceRole':'CORE','InstanceType':'r3.8xlarge','Name':'slave'},{'InstanceCount':1,'InstanceRole':'MASTER','InstanceType':'r3.8xlarge','Name':'master'}],
'Ec2KeyName': 'MyKey',
'KeepJobFlowAliveWhenNoSteps': True,
'TerminationProtected': False,
'Ec2SubnetId': 'id',
'EmrManagedMasterSecurityGroup': 'value',
'EmrManagedSlaveSecurityGroup': 'value',
'ServiceAccessSecurityGroup': 'value',
},
BootstrapActions=[{'Name': 'Install Impala2','ScriptBootstrapAction': {'Path': 's3://coeus/bigtop/impala/impala-install'}}],
Applications=[{'Name':'Hadoop','Name':'Spark','Name':'Ganglia','Name':'Hive','Name':'Presto-Sandbox'}],
JobFlowRole='EMR_EC2_DefaultRole',
ServiceRole='EMR_DefaultRole',
VisibleToAllUsers=True|False,
Tags=[{"Key":"owner","Value":"myname"}],
Configurations=[{"Classification":"hadoop-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"},"Configurations":[]}]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"},"Configurations":[]}]}]
This code successfully creates the cluster but when i try to run the MapR jobs like distcp on the cluster it throws this error
"Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster"
I created the cluster using the console and passing same parameters the cluster gets created and I am able to run the MapR commands (Distcp) without having any issues. I am not sure why does EMR cluster created with Boto3 has the issues with hadoop config.
Here is the cli export of the cluster i created using the console.
aws emr create-cluster --applications Name=Hadoop Name=Spark Name=Ganglia Name=Presto-Sandbox Name=Hive --bootstrap-actions '[{"Path":"s3://coeus/bigtop/impala/impala-install","Name":"Custom action"}]' --tags 'owner=myname' --ec2-attributes '{"KeyName":"mykey","InstanceProfile":"EMR_EC2_DefaultRole","ServiceAccessSecurityGroup":"","SubnetId":"","EmrManagedSlaveSecurityGroup":"","EmrManagedMasterSecurityGroup":""}' --service-role EMR_DefaultRole --release-label emr-4.6.0 --log-uri ' ' --name 'automate' --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"r3.8xlarge","Name":"master"},{"InstanceCount":4,"InstanceGroupType":"CORE","InstanceType":"r3.8xlarge","Name":"slave"}]' --configurations '[{"Classification":"hadoop-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"},"Configurations":[]}]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"},"Configurations":[]}]}]' --region
I am out of ideas why it should be happening. any help is highly appreciated.