Erasure Coding in Windows Azure Storage
- Microsoft, 2012
Given a file, if we break it into 6 chunks we can achieve reliability with 3 parity chunks.
The overhead is then 1.5x
(6 + 3)/6 = 1.5
We can break the file into 12 chunks and we need 4 parity chunks.
The overhead is then 1.33x
(12 + 4)/12 = 1.33
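The overhead arithmetic above can be checked with a one-liner (the function name `overhead` is my own, not from the paper):

```python
# Storage overhead = (data chunks + parity chunks) / data chunks
def overhead(data_chunks, parity_chunks):
    return (data_chunks + parity_chunks) / data_chunks

print(overhead(6, 3))   # 1.5
print(overhead(12, 4))  # 1.333...
```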
Now, when a block is missing, you need to retrieve 12 blocks instead of 6 to reconstruct it.
This means 2x disk and network IO.
They note that traditional erasure codes treat all failures the same, as if 1 failure were as likely as 2 failures.
In practice, a single failure is far more likely than two failures at the same time, so the single-failure case is the one worth optimizing.
Their solution is to break the file into two groups of 6 chunks.
They create 2 global parity blocks spanning all 12 chunks.
Then they create 1 local parity block for each group, so the total is still 4 parity blocks and the overhead stays at 1.33x.
This means, when there is a single block missing, they only need to retrieve 6 blocks instead of 12.
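A minimal sketch of local reconstruction, assuming the local parity is a plain XOR of its group (the paper's 2 global parities are Reed-Solomon coded and are omitted here):

```python
import os

CHUNK = 4  # bytes per chunk, tiny for illustration

def xor_blocks(blocks):
    # XOR a list of equal-length byte blocks together.
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Split 12 data chunks into two local groups of 6.
data = [os.urandom(CHUNK) for _ in range(12)]
group0, group1 = data[:6], data[6:]

# One local parity per group (assumed XOR here).
local0 = xor_blocks(group0)
local1 = xor_blocks(group1)

# Single failure in group 0: rebuild chunk 2 from the other
# 5 chunks plus the local parity -- 6 reads instead of 12.
lost = data[2]
survivors = [c for i, c in enumerate(group0) if i != 2]
rebuilt = xor_blocks(survivors + [local0])
assert rebuilt == lost
```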
Other notes about storage
Finding a needle in Haystack: Facebook’s photo storage
GFS: Evolution on Fast-forward
RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
The Google File System
XORing Elephants: Novel Erasure Codes for Big Data
[USC & Facebook, 2013]