Hadoop Basics

Hadoop Architecture

  1. Hadoop divides a file into chunks (typically 64 MB in size) and stores each chunk on a DataNode.

  2. Each chunk is replicated multiple times (typically 3 times) to guard against node failure.
    If a node fails, every chunk it held is automatically re-copied from the surviving replicas on other nodes, so the replication factor stays the same as before.

  3. One node in the Hadoop cluster is called the NameNode.
    This node stores only the meta-data for chunks of files and keeps this information in memory.
    This helps the NameNode to respond very quickly when it is asked about the whereabouts of a file.

  4. When chunks are needed, the NameNode only provides the location.
    Accessing the chunks happens directly from the DataNodes.
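
The four points above can be sketched as a toy model. This is only an illustration, not Hadoop's actual code: the class ToyNameNode, its methods, the node names dn1..dn5 and the round-robin placement are all assumptions made up for this example (real HDFS placement is more involved).

```python
import math

BLOCK_SIZE_MB = 64   # typical chunk size mentioned in the text
REPLICATION = 3      # typical replication factor mentioned in the text


class ToyNameNode:
    """Hypothetical stand-in for a NameNode: it holds only metadata, in memory."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        # (filename, chunk index) -> names of the DataNodes holding a replica
        self.locations = {}

    def add_file(self, filename, size_mb):
        """Split a file into chunks and record where each replica lives."""
        n_chunks = math.ceil(size_mb / BLOCK_SIZE_MB)
        for i in range(n_chunks):
            # Assumed round-robin placement, chosen only to keep the sketch short.
            replicas = [self.datanodes[(i + r) % len(self.datanodes)]
                        for r in range(REPLICATION)]
            self.locations[(filename, i)] = replicas
        return n_chunks

    def locate(self, filename, chunk_index):
        """Answer only 'where is it?'; the client then reads from a DataNode."""
        return list(self.locations[(filename, chunk_index)])

    def handle_failure(self, dead_node):
        """Restore the replication factor for chunks that lived on dead_node."""
        self.datanodes.remove(dead_node)
        for replicas in self.locations.values():
            if dead_node in replicas:
                replicas.remove(dead_node)
                # Pick any live node that does not already hold this chunk.
                candidates = [d for d in self.datanodes if d not in replicas]
                replicas.append(candidates[0])


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4", "dn5"])
chunk_count = namenode.add_file("big.log", 200)     # 200 MB file
first_chunk_nodes = namenode.locate("big.log", 0)   # replicas of chunk 0
namenode.handle_failure("dn2")                      # simulate a node failure
after_failure = namenode.locate("big.log", 0)       # still 3 replicas
```

Note that the model never stores file contents in the NameNode, only a small dictionary of locations, which is why it can live in memory and answer lookups quickly.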

Why huge block-sizes?

Let's say HDFS is storing a 1000 MB file.
With a 4 KB block size, 256,000 requests would be required to get that file (1 request per block).
In HDFS, those requests go across a network and come with a lot of overhead.
Additionally, each request is processed by the NameNode to figure out the block's physical location.
With 64 MB blocks, the number of requests goes down to 16, which is far more efficient for network traffic.
It also reduces the load on the NameNode and shrinks the meta-data for the entire file, allowing that meta-data to be stored in memory.
Thus, for large files, a bigger block size in HDFS is a boon.
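
The arithmetic behind those two numbers can be checked with a few lines. This is just a back-of-the-envelope sketch; the helper name block_requests is made up for this example, and it assumes 1 MB = 1024 KB.

```python
KB = 1024
MB = 1024 * KB


def block_requests(file_size_bytes, block_size_bytes):
    """Number of blocks (one request per block), rounding the last block up."""
    return -(-file_size_bytes // block_size_bytes)  # integer ceiling division


file_size = 1000 * MB
small_blocks = block_requests(file_size, 4 * KB)    # 4 KB blocks
large_blocks = block_requests(file_size, 64 * MB)   # 64 MB blocks

print(small_blocks)  # 256000
print(large_blocks)  # 16
```

The last 64 MB block is only partly filled (1000 / 64 = 15.625), which is why the count rounds up to 16.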


Conceptually, map-reduce functions look like:

map (key1, value1) ----> list<key2, value2>

reduce (key2, list<value2>) ----> list<key3, value3>

That is, map takes one key/value pair as input and emits a list of key-value pairs.
Hadoop collects all these emitted key-value pairs, groups them by key and calls reduce once for each group.
That's why the input to the "reduce" function is one key but multiple values.
The reduce function is free to emit whatever it wants, since its output is simply written to HDFS.
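
The flow above can be imitated in plain Python with the classic word-count example. This is a single-process sketch of the idea only, not the Hadoop API: the names map_fn, reduce_fn and run_job are assumptions for this example, and real Hadoop distributes the grouping step across the cluster.

```python
from collections import defaultdict


def map_fn(key, value):
    """(key1, value1) -> list<key2, value2>: emit (word, 1) for every word."""
    # key is the line number here and is not used by word count.
    return [(word, 1) for word in value.split()]


def reduce_fn(key, values):
    """(key2, list<value2>) -> list<key3, value3>: sum the counts for one word."""
    return [(key, sum(values))]


def run_job(records):
    """Run map on every record, group the output by key, then run reduce."""
    grouped = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in map_fn(key1, value1):
            grouped[key2].append(value2)   # the "group by key" step
    output = []
    for key2 in sorted(grouped):
        output.extend(reduce_fn(key2, grouped[key2]))
    return output


lines = [(0, "to be or not to be"), (1, "to do")]
result = run_job(lines)
print(result)  # [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```

Here reduce_fn is called once per distinct word and receives all the 1s emitted for that word, which mirrors the "one key, multiple values" input described above.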

Each individual map or reduce execution is called a Task.
All the tasks for one map-reduce computation together make up one Job.

Site Owner: Sachin Goyal