Stephen Holiday
RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
- Facebook, 2011
[Paper] [Mirror]
Conventional Data Placement
Row Store
Must read entire rows; unneeded columns can't be skipped
Compression ratio is lower
Column Store
Expensive to reconstruct records
The columns could be on different machines
Can avoid by creating materialized views of columns accessed together frequently
Hybrid PAX (Column store in each disk page)
Designed to optimize the CPU cache
Still need to read the whole page from disk
RCFile takes after hybrid PAX, but uses much larger units: row groups within an HDFS block rather than single disk pages
Also adds lazy decompression: columns that are not used are not decompressed until they are actually needed
Consider a scan with a WHERE clause: a column outside the predicate only needs to be decompressed for row groups where the predicate matches
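A minimal sketch of that lazy-decompression idea, not the paper's implementation: column chunks stay compressed per row group, and a non-predicate column is only decompressed for groups where the predicate matched at least one row. The function names, the use of zlib in place of the Gzip codec, and the comma-joined string encoding are all illustrative assumptions.

```python
import zlib

def scan(row_groups, predicate_col, other_col, predicate):
    """Hypothetical lazy-decompression scan over RCFile-style row groups.

    Each row group is a dict mapping column name -> compressed chunk of
    comma-joined values (a stand-in for the real serialization)."""
    results = []
    for group in row_groups:
        # The predicate column must always be decompressed.
        pred_values = zlib.decompress(group[predicate_col]).decode().split(",")
        matches = [i for i, v in enumerate(pred_values) if predicate(v)]
        if not matches:
            # No row matched: other_col is never decompressed for this group.
            continue
        other_values = zlib.decompress(group[other_col]).decode().split(",")
        results.extend((pred_values[i], other_values[i]) for i in matches)
    return results
```

With two row groups where only the second contains a match, the first group's non-predicate column is never touched, which is the saving lazy decompression is after.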
Each HDFS block contains a series of
row groups
Each row group contains:
Sync marker
Metadata Header
Offsets to the start of each column
The header itself is compressed with run-length encoding
Compressed columns
Currently Gzip at its highest compression level
At some point, increasing the row group size provides diminishing compression returns
They use 4MB
A large row group size also makes lazy decompression less effective
With more rows per group, it is more likely that some row matches the predicate, so the other columns must be decompressed anyway
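The row group layout above can be sketched as a toy writer/reader: a sync marker, a header recording each column chunk's length (from which per-column offsets follow), then the compressed chunks. The sync bytes, header format, and string encoding are invented for illustration; the real RCFile format differs (e.g. its header is itself RLE-compressed).

```python
import struct
import zlib

SYNC = b"RCF0SYNC"  # hypothetical sync marker, not the real one

def write_row_group(columns):
    """Serialize one row group column-by-column: sync marker, a header
    with the compressed length of each column chunk, then the chunks.
    `columns` is a list of lists, one list of string values per column."""
    chunks = [zlib.compress(",".join(col).encode()) for col in columns]
    header = struct.pack(">I", len(chunks))
    for chunk in chunks:
        header += struct.pack(">I", len(chunk))
    return SYNC + header + b"".join(chunks)

def read_column(buf, col_index):
    """Use the header lengths to seek straight to one column chunk,
    skipping (and never decompressing) the others."""
    pos = len(SYNC)
    (ncols,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    lengths = [struct.unpack_from(">I", buf, pos + 4 * i)[0] for i in range(ncols)]
    pos += 4 * ncols
    offset = pos + sum(lengths[:col_index])
    chunk = buf[offset : offset + lengths[col_index]]
    return zlib.decompress(chunk).decode().split(",")
```

Because the header gives every chunk's extent, a reader can fetch just the columns a query touches, which is the point of the columnar layout inside each row group.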
Other notes about storage
Erasure Coding in Windows Azure Storage
[Microsoft, 2012]
Finding a needle in Haystack: Facebook’s photo storage
[Facebook, 2010]
GFS: Evolution on Fast-forward
[Google, 2009]
The Google File System
[Google, 2003]
XORing Elephants: Novel Erasure Codes for Big Data
[USC & Facebook, 2013]