This repository was archived by the owner on May 29, 2025. It is now read-only.

Using Boto3 to create EMR cluster. #8

@rahul22022

Hi All,

I am trying to automate EMR cluster creation using Boto3. I need the cluster created with Impala configured.
Here are the params I passed to run_job_flow:
Name='AutmateEMR',
ReleaseLabel='emr-4.6.0',
Instances={
'InstanceGroups': [{'InstanceCount':4,'InstanceRole':'CORE','InstanceType':'r3.8xlarge','Name':'slave'},{'InstanceCount':1,'InstanceRole':'MASTER','InstanceType':'r3.8xlarge','Name':'master'}],
'Ec2KeyName': 'MyKey',
'KeepJobFlowAliveWhenNoSteps': True,
'TerminationProtected': False,
'Ec2SubnetId': 'id',
'EmrManagedMasterSecurityGroup': 'value',
'EmrManagedSlaveSecurityGroup': 'value',
'ServiceAccessSecurityGroup': 'value',
},
BootstrapActions=[{'Name': 'Install Impala2','ScriptBootstrapAction': {'Path': 's3://coeus/bigtop/impala/impala-install'}}],
Applications=[{'Name':'Hadoop','Name':'Spark','Name':'Ganglia','Name':'Hive','Name':'Presto-Sandbox'}],
JobFlowRole='EMR_EC2_DefaultRole',
ServiceRole='EMR_DefaultRole',
VisibleToAllUsers=True|False,
Tags=[{"Key":"owner","Value":"myname"}],
Configurations=[{"Classification":"hadoop-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"},"Configurations":[]}]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"},"Configurations":[]}]}]

This code successfully creates the cluster, but when I try to run MapReduce jobs such as distcp on the cluster, it throws this error:
"Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster"

When I create the cluster using the console and pass the same parameters, the cluster gets created and I am able to run MapReduce commands (distcp) without any issues. I am not sure why an EMR cluster created with Boto3 has issues with the Hadoop config.

Here is the CLI export of the cluster I created using the console:

aws emr create-cluster --applications Name=Hadoop Name=Spark Name=Ganglia Name=Presto-Sandbox Name=Hive --bootstrap-actions '[{"Path":"s3://coeus/bigtop/impala/impala-install","Name":"Custom action"}]' --tags 'owner=myname' --ec2-attributes '{"KeyName":"mykey","InstanceProfile":"EMR_EC2_DefaultRole","ServiceAccessSecurityGroup":"","SubnetId":"","EmrManagedSlaveSecurityGroup":"","EmrManagedMasterSecurityGroup":""}' --service-role EMR_DefaultRole --release-label emr-4.6.0 --log-uri ' ' --name 'automate' --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"r3.8xlarge","Name":"master"},{"InstanceCount":4,"InstanceGroupType":"CORE","InstanceType":"r3.8xlarge","Name":"slave"}]' --configurations '[{"Classification":"hadoop-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"},"Configurations":[]}]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"JAVA_HOME":"/usr/lib/jvm/java-1.8.0"},"Configurations":[]}]}]' --region

I am out of ideas as to why this is happening. Any help is highly appreciated.
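
One difference between the two requests that may be worth checking: the CLI export lists five separate applications, while the `Applications` argument in the Boto3 params is a single dict literal with the key `'Name'` repeated five times. Python keeps only the last value for a repeated key, so that literal collapses to one entry. The sketch below only demonstrates that Python behavior. It does not call AWS, `applications_from_names` is a hypothetical helper introduced for illustration, and whether this explains the `MRAppMaster` error is an assumption, not something confirmed in this thread.

```python
# The Applications value as written in the params above:
# one dict, with the key 'Name' repeated five times.
as_written = [{'Name': 'Hadoop', 'Name': 'Spark', 'Name': 'Ganglia',
               'Name': 'Hive', 'Name': 'Presto-Sandbox'}]

# A dict literal keeps only the last value for a repeated key,
# so only one application name survives.
print(as_written)  # [{'Name': 'Presto-Sandbox'}]


def applications_from_names(names):
    """Hypothetical helper: build one {'Name': ...} dict per application."""
    return [{'Name': name} for name in names]


# One dict per application, matching the five Name=... entries
# in the CLI export.
applications = applications_from_names(
    ['Hadoop', 'Spark', 'Ganglia', 'Hive', 'Presto-Sandbox'])
print(len(applications))  # 5

# Separately, True|False is Python's bitwise OR on two bools,
# so VisibleToAllUsers=True|False passes True.
print(True | False)  # True
```

The resulting `applications` list could then be passed as `Applications=applications` in the same `run_job_flow` call, with the other arguments unchanged.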
