
Airflow vs. Oozie


We often get questions about the differences between Airflow and Oozie. Below you'll find a summary of the two tools and a comparison of the communities behind them.

TL;DR

Airflow leverages the growing popularity of Python to let you build extremely complex workflows, while Oozie has you write your workflows in Java and XML. The open source community supporting Airflow is roughly 20x the size of the community supporting Oozie.

Airflow Overview

Created by Airbnb data engineer Maxime Beauchemin, Airflow is an open source workflow management system for authoring, scheduling, and monitoring workflows as DAGs (directed acyclic graphs). All workflows are written in Python, and Airflow is currently the most popular open source workflow management tool on the market.
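To make "workflows as DAGs" concrete, here is a minimal plain-Python sketch. It does not use Airflow's API: the task names are hypothetical, and it relies only on the standard library's `graphlib` module (Python 3.9+) to confirm the graph has no cycles and to derive one order in which the tasks could run.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
    "archive": {"extract"},
}

def run_order(graph):
    """Return one valid execution order; raises CycleError if the graph is not a DAG."""
    return list(TopologicalSorter(graph).static_order())

order = run_order(pipeline)
print(order)

# A cycle makes the graph invalid as a workflow definition.
try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected")
```

In Airflow itself, the same relationships are declared between task objects in the DAG file, for example with `extract >> clean`.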

Oozie Overview

Oozie is an open source workflow scheduling system for Hadoop, written in Java. Its coordinator lets jobs be triggered by time, event, or data availability, and you can schedule jobs via the command line, a Java API, or a GUI. Oozie supports XML property files and uses an SQL database to store metadata about task orchestration.

Although some teams use Oozie successfully, users have reported that it struggles with complex pipelines and that its GUI is underdeveloped and hard to navigate.
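As a rough illustration of how a coordinator combines trigger conditions, the sketch below fires a job once a new scheduled slot has arrived and all of its required input files exist. It is a hypothetical, simplified model, not Oozie's actual interface: the function names, parameters, and the `_SUCCESS` marker path are illustrative assumptions, and only the time and data-availability conditions are modeled.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical, simplified coordinator-style trigger; NOT Oozie's API.

def latest_nominal_time(start, frequency, now):
    """Return the most recent scheduled time at or before `now`, or None before `start`."""
    if now < start:
        return None
    periods = (now - start) // frequency  # timedelta // timedelta -> int
    return start + periods * frequency

def should_trigger(start, frequency, now, last_run, input_paths, exists=Path.exists):
    """Fire when a new scheduled slot has arrived (time) and every input exists (data)."""
    nominal = latest_nominal_time(start, frequency, now)
    if nominal is None or (last_run is not None and nominal <= last_run):
        return False  # schedule not started yet, or this slot already ran
    # Data-availability condition: every required input must be present.
    return all(exists(Path(p)) for p in input_paths)

# Usage: an hourly job that waits on a (hypothetical) success marker file.
start = datetime(2024, 1, 1)
now = datetime(2024, 1, 1, 3, 30)
print(should_trigger(start, timedelta(hours=1), now, last_run=None,
                     input_paths=["/data/2024-01-01/_SUCCESS"]))
```

Passing `exists` as a parameter keeps the filesystem check swappable, for example with a function that checks HDFS instead.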

Key Differences

Python vs. Java

As mentioned above, Airflow lets you write DAGs in Python, while Oozie uses Java or XML. According to a recent report from Codecademy, the Python community has grown rapidly in recent years, and Python even rose to become the most active programming language on Stack Overflow in 2017:

[Graph: growth of Python]
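To show the contrast concretely, here is an abridged Oozie-style XML workflow (action bodies omitted, names hypothetical) next to the same two-step chain written as plain Python data. The `happy_path` helper is a hypothetical illustration, not part of Oozie. It uses the standard library's `xml.etree.ElementTree` to walk from `<start>` through each action's `<ok>` transition until it reaches `<end>`.

```python
import xml.etree.ElementTree as ET

# Abridged Oozie-style workflow: real actions also contain an action-type element
# (for example a shell or Java action), omitted here for brevity.
WORKFLOW_XML = """
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="extract"/>
  <action name="extract">
    <ok to="load"/>
    <error to="fail"/>
  </action>
  <action name="load">
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

NS = {"wf": "uri:oozie:workflow:0.5"}

def happy_path(xml_text):
    """Follow <start> and each action's <ok> transition until <end>; return action names."""
    root = ET.fromstring(xml_text.strip())
    ok_next = {a.get("name"): a.find("wf:ok", NS).get("to")
               for a in root.findall("wf:action", NS)}
    end = root.find("wf:end", NS).get("name")
    node = root.find("wf:start", NS).get("to")
    path = []
    while node != end:
        path.append(node)
        node = ok_next[node]
    return path

# The same dependency expressed directly as Python data.
python_pipeline = {"extract": set(), "load": {"extract"}}

print(happy_path(WORKFLOW_XML))
```

The XML encodes control flow as explicit transitions between named nodes, while the Python version states only the dependencies.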

Community

Airflow is the most active workflow management tool on the market, with 8,636 stars and 491 active contributors on GitHub. The image below shows code changes from recent commits to the project.

[Image: Airflow code changes from recent commits]

Oozie has 386 stars and 16 active contributors on GitHub. The image below shows code changes from recent commits to the project.

[Image: Oozie code changes from recent commits]
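Working from the GitHub counts quoted in this section (which reflect a single snapshot in time), the "20x" figure in the TL;DR is the conservative end of the comparison:

```latex
\frac{8{,}636\ \text{stars}}{386\ \text{stars}} \approx 22.4,
\qquad
\frac{491\ \text{contributors}}{16\ \text{contributors}} \approx 30.7
```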

With open source projects, community contributions matter: they reflect the community's confidence in the project's future and show that features are actively being developed.

Other Features

Stack Overflow user Michele De Simoni points out several reasons the community prefers Airflow over Oozie for workflow management. In the lists below, (+) marks an advantage, minus signs mark disadvantages, and (=) marks a feature both tools share.

Airflow

  • Python code for DAGs (+)
  • Has connectors for every major service/cloud provider (+)
  • More versatile (+)
  • Advanced metrics (+)
  • Better UI and API (+)
  • Capable of creating extremely complex workflows (+)
  • Jinja templating (+)
  • Can be parallelized (=)
  • Native connections to HDFS, Hive, Pig, etc. (=)
  • Graph as DAG (=)

Oozie

  • Java or XML for DAGs (---)
  • Hard to build complex pipelines (-)
  • Smaller, less active community (-)
  • Worse web GUI (-)
  • Java API (-)
  • Can be parallelized (=)
  • Native connections to HDFS, Hive, Pig, etc. (=)
  • Graph as DAG (=)
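Both lists rate parallelism as (=). As a rough, tool-agnostic sketch of what "can be parallelized" means for a DAG, the code below groups every task whose upstream dependencies have finished and runs each group concurrently on a thread pool. It is not how Airflow or Oozie schedule work internally; the task names and `run_task` are hypothetical placeholders, and only the standard library is used.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical DAG: task -> set of upstream tasks it waits on.
DAG = {
    "extract_a": set(),
    "extract_b": set(),
    "join": {"extract_a", "extract_b"},
    "report": {"join"},
    "export": {"join"},
}

def run_task(name):
    # Placeholder for real work (a query, a file copy, ...).
    return name

def run_in_parallel_batches(dag, max_workers=4):
    """Run all currently-ready tasks concurrently; return the batches in execution order."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # validates the graph (raises CycleError on a cycle)
    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # tasks in this batch do not depend on each other
            finished = list(pool.map(run_task, ready))
            sorter.done(*finished)  # unlocks downstream tasks
            batches.append(sorted(finished))
    return batches

print(run_in_parallel_batches(DAG))
```

Waiting for a whole batch before starting the next is a simplification; a real scheduler would typically start each task as soon as its own upstream tasks succeed.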

Ready to build your data workflows with Airflow?

Astronomer is the data engineering platform built by developers for developers. Send data anywhere with automated Apache Airflow workflows, built in minutes...