Erasure Coding in Windows Azure Storage - Microsoft, 2012

[Paper] [Mirror]
[ATC’12] [Video]
Given a file, if we break it into 6 chunks we can achieve reliability with 3 parity chunks.
- The overhead is then 1.5x (6 + 3)/6 = 1.5.
We can break the file into 12 chunks and we need 4 parity chunks.
- The overhead is then 1.33x (12 + 4)/12 = 1.33.
Now, when there is a missing block, you need to retrieve 12 blocks instead of 6.
- This means 2x disk and network IO.
They note that traditional erasure codes assume that 1 failure is as likely as 2 failures.
- However, 1 failure is a lot less likely than 2 failures at the same time.
There solution is to break the file into two block of 6 chunks.
- They create 2 file parity blocks as before.
- Then they create 1 parity block for each half.
- This means, when there is a single block missing, they only need to retrieve 6 blocks instead of 12.