
Add Apache Airflow #1221


Merged: 1 commit, Jan 28, 2019


@duyet commented Jan 28, 2019

What is this Python project?

Apache Airflow: Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command-line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
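To make "workflows as DAGs of tasks" concrete, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's API: the task names and the `run_order` helper are hypothetical, and the batching only illustrates how a scheduler can respect dependencies while leaving independent tasks free to run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def run_order(deps):
    """Group tasks into batches; every task in a batch has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that could run in parallel
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so dependents unlock
    return batches


print(run_order(dependencies))
# → [['extract'], ['transform'], ['load', 'report']]
```

In this sketch `load` and `report` land in the same batch because both depend only on `transform`, which is the kind of ordering constraint the description above refers to.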

What's the difference between this Python project and similar ones?

Airflow vs. Luigi:

Airflow

  • Easy-to-use UI (+)
  • Built-in scheduler (+)
  • Easy testing of DAGs (+)
  • Separates output data and task state (+)
  • Strong and active community (+)

Luigi

  • Creating and testing tasks is difficult (-)
  • The UI is challenging to navigate (-)
  • Not scalable due to tight coupling with cron jobs; the number of worker processes is bounded by the number of cron workers assigned to a job (-)
  • Re-running pipelines is not possible (-)
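The "Separates output data and task state" point can be sketched as follows. This is an illustration under stated assumptions, not code from either project: `done_by_output` mimics an output-based completion check (Luigi's default treats a task as complete when its outputs exist), while the hypothetical `StateStore` class uses an in-memory dict as a stand-in for a separate task-state store such as Airflow's metadata database.

```python
import os
import tempfile


def done_by_output(path):
    """Output-based check: the task counts as complete once its output exists."""
    return os.path.exists(path)


class StateStore:
    """State-based check: completion is recorded per (task, run), apart from the data.

    Hypothetical stand-in for a metadata database; not a real Airflow class.
    """

    def __init__(self):
        self._states = {}

    def mark(self, task_id, run_id, state):
        self._states[(task_id, run_id)] = state

    def is_done(self, task_id, run_id):
        return self._states.get((task_id, run_id)) == "success"


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "report.csv")
    open(out, "w").close()  # the task wrote its output

    store = StateStore()
    store.mark("report", "2019-01-28", "success")

    # To request a re-run, clear the recorded state; the output file is untouched.
    store.mark("report", "2019-01-28", "cleared")

    output_says_done = done_by_output(out)
    state_says_done = store.is_done("report", "2019-01-28")

print(output_says_done, state_says_done)
```

With the output-based check, requesting a re-run means removing or renaming the output itself; with a separate state store, the run can be reset without touching the data.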

Airflow vs. Oozie:

Airflow

  • Python code for DAGs (+)
  • Has connectors for most major services and cloud providers (+)
  • More versatile (+)
  • Advanced metrics (+)
  • Better UI and API (+)
  • Capable of creating extremely complex workflows (+)
  • Jinja Templating (+)
  • Can be parallelized (=)
  • Native connections to HDFS, Hive, Pig, etc. (=)
  • Graph as DAG (=)

Oozie

  • Java or XML for DAGs (-)
  • Hard to build complex pipelines (-)
  • Smaller, less active community (-)
  • Weaker web GUI (-)
  • Java API (-)
  • Can be parallelized (=)
  • Native connections to HDFS, Hive, Pig, etc. (=)
  • Graph as DAG (=)

--

Anyone who agrees with this pull request could vote for it by adding a 👍 to it, and usually, the maintainer will merge it when votes reach 20.

@duyet changed the title from "Update README.md" to "Add Apache Airflow" on Jan 28, 2019
@vinta merged commit e780eb6 into vinta:master on Jan 28, 2019