# Erasure Coding in Windows Azure Storage - Microsoft, 2012

• [Paper] [Mirror]
• [ATC’12] [Video]

• With Reed-Solomon coding, a file split into 6 data chunks can be protected with 3 parity chunks.
• The storage overhead is then (6 + 3)/6 = 1.5x.
• Alternatively, we can split the file into 12 data chunks, which then needs 4 parity chunks.
• The storage overhead drops to (12 + 4)/12 = 1.33x.
• The downside: when a single chunk is missing, reconstruction must read 12 chunks instead of 6.
• That means 2x the disk and network I/O per reconstruction.
• They note that traditional erasure codes treat a single failure the same way as multiple simultaneous failures.
• However, a single failure is far more likely than 2 failures at the same time, so the common case deserves a cheaper repair path.
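• Their solution is to split the 12 data chunks into two groups of 6.
• They create 2 global parity chunks computed over all 12 data chunks.
• Then they create 1 local parity chunk for each group of 6.
• The storage overhead stays at (12 + 2 + 2)/12 = 1.33x.
• When a single chunk is missing, they only need to read 6 chunks (the 5 surviving chunks in its group plus that group's local parity) instead of 12.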
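
The arithmetic and the single-failure repair described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the local parity is modeled as a plain bytewise XOR of the 6 chunks in a group, the global parities are not modeled at all, and the names `storage_overhead` and `xor_chunks` are made up for this sketch. The real code uses Galois-field arithmetic.

```python
from functools import reduce


def storage_overhead(data_chunks, parity_chunks):
    """Total chunks stored divided by data chunks."""
    return (data_chunks + parity_chunks) / data_chunks


def xor_chunks(chunks):
    """Bytewise XOR of equal-length byte strings (toy stand-in for a parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Overheads for the three layouts discussed in the notes.
print(f"6 data + 3 parity:          {storage_overhead(6, 3):.2f}x")
print(f"12 data + 4 parity:         {storage_overhead(12, 4):.2f}x")
print(f"12 data + 2 local + 2 global: {storage_overhead(12, 2 + 2):.2f}x")

# Toy file: 12 data chunks of 4 bytes each, split into two groups of 6.
data = [bytes([i] * 4) for i in range(1, 13)]
groups = [data[:6], data[6:]]

# Assumption: one XOR local parity per group.
local_parities = [xor_chunks(group) for group in groups]

# Lose one chunk in the first group and rebuild it locally.
lost_index = 2
survivors = [c for i, c in enumerate(groups[0]) if i != lost_index]
reads = survivors + [local_parities[0]]  # 5 data chunks + 1 local parity
rebuilt = xor_chunks(reads)

print("chunks read for repair:", len(reads))
print("repair correct:", rebuilt == data[lost_index])
```

The repair touches only the affected group, so it reads 6 chunks; the other group and the global parities are left alone. Failures that a single local parity cannot cover (for example, two lost chunks in the same group) would need the global parities, which this sketch does not implement.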