Since I’ve used Hadoop for a while, I’m concentrating on the interesting points
from Google’s implementation.
“MapReduce exploits a restricted programming model to parallelize the user
program automatically and to provide transparent fault-tolerance”
I believe this is emblematic of a lot of Google worker. Restricting the API to
give greater guarantees.
- Intermediate keys and values must be the same type as the output keys and
values. This is not the case with the current Hadoop implementation.
- The authors also mention that MapReduce could be applied to a shared memory
system or NUMA multi-processor system
- Notes on Execution
- Intermediate values are written to local disk periodically, partitioned
by key
- It sends these locations to the master
- Reducer learns about the locations from the master and reads the appropriate
partitions via RPC directly from the Mapper
- Once all of the data is received, the reducer sorts the data, possibly with
an external sort
- I wonder if some speculative partial reduction work could be done or some
sort of heap style sort that could start immediately
- However, it’s probably better use of the machine to have it running other
work instead
- Reducer output goes to GFS
- Once the reducer finishes, it atomically renames the output file
- Master does not currently checkpoint, the user must rerun the job
- The authors say it would be easy
- At the time, network bandwidth was scarce
- They had the cluster scheduler assign map tasks on the same machines that
held the data in GFS or very close to the same machine
- Now that Google has a better network fabric, I wonder how it changes things
- They have the ability to skip bad records
- When an error occurs (i.e. segfault), they have a handler that sends a quick
UDP packet to the master with the offset
- If the master receives this message from two workers, when it reschedules
the task, it will notify the worker to skip the record
- This is super cool!
- I didn’t know this existed, but
it does!
- Interestingly, the JavaDoc steals directly from the paper in the
description
- They recognize the benefit of having a local version where the user can debug
with GDB
- Counter values are piggybacked on the heartbeat response