Apache Oozie
May 2015
Content level: Beginner to intermediate
272 pages
7h 22m
English

Chapter 1. Introduction to Oozie

In this chapter, we cover some of the background and motivations that led to the creation of Oozie, explaining the challenges developers faced as they started building complex applications running on Hadoop. We also introduce you to a simple Oozie application. The chapter wraps up by covering the different Oozie releases, their main features, their timeline, compatibility considerations, and some interesting statistics from large Oozie deployments.

Big Data Processing

Within a very short period of time, Apache Hadoop, an open source implementation of the designs described in Google’s MapReduce and Google File System papers, has become the de facto platform for processing and storing big data.

Higher-level domain-specific languages (DSLs) implemented on top of Hadoop’s MapReduce, such as Pig and Hive, quickly followed, making it simpler to write applications running on Hadoop.

A Recurrent Problem

Hadoop, Pig, Hive, and many other projects provide the foundation for storing and processing large amounts of data in an efficient way. Most of the time, it is not possible to perform all required processing with a single MapReduce, Pig, or Hive job. Multiple MapReduce, Pig, or Hive jobs often need to be chained together, producing and consuming intermediate data and coordinating their flow of execution.
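
To make the chaining problem concrete, here is a minimal sketch in plain Python of running dependent jobs in order and passing intermediate data between them. It is not Oozie's API or workflow format. The job names, the `run_job` stand-in, and the output paths are all hypothetical, and a real pipeline would submit actual Hadoop jobs instead of returning a string.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each job maps to the set of jobs whose output it consumes.
DEPENDENCIES = {
    "clean_logs": set(),
    "sessionize": {"clean_logs"},
    "daily_report": {"sessionize"},
}


def run_job(name, inputs):
    """Stand-in for submitting a Hadoop job (assumption, not a real API).

    Returns the made-up path where the job would write its output.
    """
    return f"/tmp/{name}/output"


def run_chain(deps):
    """Run every job after the jobs it depends on, tracking intermediate outputs."""
    outputs = {}
    order = []
    for job in TopologicalSorter(deps).static_order():
        # Each job consumes the intermediate data produced by its predecessors.
        inputs = [outputs[dep] for dep in sorted(deps[job])]
        outputs[job] = run_job(job, inputs)
        order.append(job)
    return order, outputs


order, outputs = run_chain(DEPENDENCIES)
print(order)
```

Even this toy version has to track execution order and intermediate locations by hand, and it has no answer for failures, retries, or scheduling. Those are the coordination concerns the paragraph above describes.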

Tip

Throughout the book, when referring to a MapReduce, Pig, Hive, or any other type of job that runs one or more MapReduce jobs on a Hadoop cluster, we refer to it as a Hadoop job. We ...

ISBN: 9781449369910