
Book: Hadoop, The Definitive Guide (O’Reilly)


This book gave me the tools I needed to get started with Hadoop. The author uses easy-to-understand language to describe a very complicated set of tools. The example code is easy to download, so you can get started right away.

I was working on 3 IT projects, fixing stuff at home, starting a new job, and trying to stay fit, and I thought I needed something to do in my spare time, so I decided to learn Hadoop. This book came to our OKC SQL User Group as part of O’Reilly’s user group outreach, and it deserves a review, so I read it. I read the 3rd edition, but the 4th is out now for early preview.

The appendix does a good job explaining how to set up a test environment or download a virtual machine. I went the easy route and grabbed a VM from Cloudera. The only thing I would add is step-by-step getting-started instructions on how to run commands, but I found those on a Cloudera tutorial website. After brushing the dust and rust off my UNIX ‘mad skills’ from the ’80s, I was Hadooping in about an hour.

Chapter 1 Meet Hadoop. If you are not familiar with Hadoop, this chapter gives the background on its development and helps you navigate the terminology.

Chapter 2 MapReduce. This chapter helps you understand MapReduce, the programming model. Map and reduce are the two basic functions of a MapReduce job. The author walks through examples of each with a small set of weather data and shows visually how jobs and tasks get done in Hadoop.
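To make the map and reduce steps concrete, here is a minimal Python sketch of the idea. It is not the book's code (the book's examples are mainly Java), and it runs on one machine with no Hadoop involved. I am assuming a made-up input format of `year,temperature` lines, and the function names are mine, purely for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: turn each raw 'year,temperature' line into a (year, temp) pair."""
    for line in lines:
        year, temp = line.strip().split(",")
        yield year, int(temp)


def shuffle(pairs):
    """Group values by key; a real framework does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each year's list of temperatures to its maximum."""
    return {year: max(temps) for year, temps in grouped.items()}


def max_temperature_by_year(lines):
    """Chain the three steps together on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample data in the assumed 'year,temperature' format.
    sample = ["1949,111", "1949,78", "1950,0", "1950,22", "1950,-11"]
    print(max_temperature_by_year(sample))  # {'1949': 111, '1950': 22}
```

The point of the split is that the map step looks at one line at a time and the reduce step looks at one key at a time, which is what lets a cluster spread the work across many machines.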

Chapter 3 The Hadoop Distributed Filesystem (HDFS). The author explains how Hadoop uses HDFS to spread data out across multiple machines. Start with the basic file operations in the examples and it will start to make sense.
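Here is a toy Python sketch of what "spreading data out across multiple machines" means: a file is cut into fixed-size blocks, and each block is copied to several nodes. This is my own simplification, not how HDFS actually chooses nodes; the round-robin placement, the node names, and the block size and replication values in the usage lines are all assumptions for illustration.

```python
def split_into_blocks(file_size, block_size):
    """Return the size in bytes of each block for a file of file_size bytes."""
    count = (file_size + block_size - 1) // block_size  # integer ceiling
    return [min(block_size, file_size - i * block_size) for i in range(count)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a stand-in for HDFS's real placement policy, which this
    sketch does not attempt to model.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    # Illustrative numbers only: a 300-unit file, 128-unit blocks, 3 copies each.
    blocks = split_into_blocks(300, 128)
    print(blocks)  # [128, 128, 44]
    print(place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], 3))
```

Because every block lives on more than one node, losing a single machine does not lose the file, and different machines can work on different blocks at the same time.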

This is as far as I’ve made it in this book, but I love it and had to share.

If you enjoy it, please get a copy at

Here’s the Table of Contents for the rest of the book:
Chapter 4 Hadoop I/O

Chapter 5 Developing a MapReduce Application

Chapter 6 How MapReduce Works

Chapter 7 MapReduce Types and Formats

Chapter 8 MapReduce Features

Chapter 9 Setting Up a Hadoop Cluster

Chapter 10 Administering Hadoop

Chapter 11 Pig

Chapter 12 Hive

Chapter 13 HBase

Chapter 14 ZooKeeper

Chapter 15 Sqoop. This is the tool for moving data between relational databases and Hadoop. I already skipped ahead to this chapter, and its examples help Hadoop make sense to a SQL guy.

Chapter 16 Case Studies

Appendix Installing Apache Hadoop

Image Credit:  O’Reilly publishing.
