
Add a subcommand to mrjob/tools for counting active emr jobflows and instance counts and print out collected stats #947

Merged
anusha-r merged 20 commits into Yelp:master from paiweilai:paiwei-infra-2229 on Oct 7, 2014

Conversation

@paiweilai


Adds a subcommand, 'collect-emr-active-stats', to mrjob that counts active EMR clusters (job flows) and their instances, and prints the stats.

Active job flows are those in the BOOTSTRAPPING, RUNNING, WAITING, or STARTING state.

The output stats include:

  • timestamp: datetime.utcnow().isoformat()
  • num_jobflows: the number of job flows in an active state
  • total_instance_count: the sum of the instance counts of all active job flows
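To make these definitions concrete, here is a minimal sketch of how the three stats could be computed. It is not the PR's actual code: it assumes each job flow has already been fetched from EMR and reduced to a dict with the hypothetical keys 'state' and 'instance_count', and it makes no EMR API calls.

```python
from datetime import datetime

# The states the PR description treats as "active".
ACTIVE_STATES = frozenset(['STARTING', 'BOOTSTRAPPING', 'RUNNING', 'WAITING'])


def collect_active_stats(job_flows, now=None):
    """Summarize the active job flows in `job_flows`.

    job_flows: iterable of dicts with the hypothetical keys 'state' and
        'instance_count' (fetching them from EMR is out of scope here).
    now: optional datetime, injectable for testing; defaults to utcnow().
    """
    active = [jf for jf in job_flows if jf['state'] in ACTIVE_STATES]
    if now is None:
        now = datetime.utcnow()
    return {
        'timestamp': now.isoformat(),
        'num_jobflows': len(active),
        'total_instance_count': sum(jf['instance_count'] for jf in active),
    }


# Example: the TERMINATED job flow is not counted.
flows = [
    {'state': 'RUNNING', 'instance_count': 10},
    {'state': 'WAITING', 'instance_count': 3},
    {'state': 'TERMINATED', 'instance_count': 50},
]
stats = collect_active_stats(flows, now=datetime(2014, 9, 26))
```

Passing `now` explicitly keeps the timestamp deterministic in tests; in normal use it falls back to `datetime.utcnow()`, matching the description above.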

@paiweilai changed the title from "Paiwei infra 2229" to "Add 'collect-emr-active-stats' subcommand to mrjob" on Sep 26, 2014
@paiweilai changed the title from "Add 'collect-emr-active-stats' subcommand to mrjob" to "Add a subcommand to mrjob/tools for counting active emr jobflows and instance counts and print out collected stats" on Sep 26, 2014
@yalinhuang

Contributor

Is this tool meant to report EMR usage stats only for the moment the query is made, or for any point in time?

As for the output, could you support more formats? For example, it could output JSON by default (which makes it easy to pipe the output into other programs) and switch to a more human-friendly format when given a "--pretty-output" option.
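A minimal sketch of the suggested option, under stated assumptions: the helper `format_stats` is hypothetical, the option is parsed with the standard-library `argparse` module (the real tool may use a different parser, such as `optparse`), and the stats values are placeholders rather than a live EMR query. JSON stays the default; `--pretty-output` switches to aligned `key: value` lines.

```python
import argparse
import json


def format_stats(stats, pretty=False):
    """Render stats as JSON by default, or as aligned 'key: value' lines."""
    if not pretty:
        # Compact JSON with sorted keys is easy to pipe into other programs.
        return json.dumps(stats, sort_keys=True)
    # Pad keys to the longest key so the values line up.
    width = max([len(key) for key in stats] or [0])
    return '\n'.join(
        '%s: %s' % (key.ljust(width), stats[key]) for key in sorted(stats))


def main(argv=None):
    parser = argparse.ArgumentParser(
        description='Print active EMR stats (illustrative sketch).')
    # Flag name taken from the review suggestion; not an existing mrjob option.
    parser.add_argument('--pretty-output', action='store_true',
                        help='print human-friendly output instead of JSON')
    args = parser.parse_args(argv)
    # Placeholder data; a real tool would query EMR here.
    stats = {'timestamp': '2014-09-26T00:00:00', 'num_jobflows': 2,
             'total_instance_count': 13}
    print(format_stats(stats, pretty=args.pretty_output))


if __name__ == '__main__':
    main()
```

With `--pretty-output`, the sketch prints one aligned line per stat instead of a single JSON object, so the default output remains machine-readable.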

@paiweilai

Author
  1. Currently, it collects the EMR stats at the moment the query is made.
  2. The default output format is JSON, and yes, a pretty output format can be added (I will work on it).

Comment thread on mrjob/tools/emr/collect_emr_stats.py (outdated)

Contributor


Could you describe the functionality in more detail? For example, what are the EMR stats, and what is the definition of "active"?

Comment thread on mrjob/tools/emr/collect_emr_stats.py (outdated)

Contributor


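    # Prefer simplejson when it is installed; fall back to the stdlib json.
    try:
        import simplejson as json
    except ImportError:
        import json

    ...

    # print() with a single argument works on both Python 2 and Python 3.
    print(json.dumps(stats))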



3 participants