This is a discussion with Google engineer Sean Quinlan on GFS.
- Single Master
- 64 MB Chunk Size
- Throughput vs. Latency
  - GFS was designed for high throughput; high latency is acceptable
  - BigTable, built on top of GFS, keeps its commit log on GFS
  - To mask intermittent delays when writing to the log, BigTable keeps two
    commit logs open and switches to the other if one is slow
  - Gmail uses a multihomed approach across data centers (DCs)
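
The two-commit-log idea can be sketched roughly as follows. This is only an illustration of the switching pattern, not BigTable's implementation: `InMemoryLog`, `DualLogWriter`, `replay`, the timeout value, and the sequence-number scheme are all hypothetical names and assumptions introduced here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class InMemoryLog:
    """Hypothetical stand-in for a log file; `delay` simulates a slow write."""

    def __init__(self, name, delay=0.0):
        self.name = name
        self.delay = delay
        self.records = []

    def append(self, record):
        time.sleep(self.delay)
        self.records.append(record)


class DualLogWriter:
    """Keeps two logs open and switches to the other when a write is slow."""

    def __init__(self, log_a, log_b, timeout=0.05):
        self.logs = [log_a, log_b]
        self.active = 0
        self.timeout = timeout  # assumed threshold for "slow", in seconds
        self.next_seq = 0
        self.pool = ThreadPoolExecutor(max_workers=4)

    def write(self, payload):
        # Tag each record with a sequence number so that a record landing
        # in both logs can be recognized as a duplicate on replay.
        record = (self.next_seq, payload)
        self.next_seq += 1
        for _ in range(len(self.logs)):
            log = self.logs[self.active]
            future = self.pool.submit(log.append, record)
            try:
                future.result(timeout=self.timeout)
                return log.name
            except FutureTimeout:
                # The slow write may still finish later; move to the other log.
                self.active = 1 - self.active
        raise TimeoutError("both logs were slow")


def replay(*logs):
    """Merge the logs in sequence order, keeping one copy of each record."""
    merged = {}
    for log in logs:
        for seq, payload in log.records:
            merged.setdefault(seq, payload)
    return [merged[seq] for seq in sorted(merged)]
```

Because an abandoned slow write can still complete, the same record may end up in both logs, which is why this sketch merges by sequence number during replay.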
- Consistency
  - GFS does not guarantee that all replicas of a chunk are byte-wise
    identical
  - Duplicate records or half-written records can appear
  - GFS deals with half-written records
  - The application has to deal with duplicates
  - When you read, you aren’t guaranteed to get the latest data
  - People did not expect this behavior, so it was surprising
  - Quinlan believes the right approach is to have just one writer per file
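
One way a reader could cope with the records described above is to frame each record with a checksum and a unique ID, then skip anything that fails the checksum and drop IDs it has already seen. The sketch below is an assumption-laden illustration, not the GFS record format: the `REC1` magic marker, the header layout, and the functions `encode_record` and `read_records` are all invented here. It also does the checksum filtering in the reader for the sake of a self-contained example, whereas the notes say GFS itself deals with half-written records.

```python
import struct
import zlib

MAGIC = b"REC1"                   # hypothetical marker for the start of a record
HEADER = struct.Struct(">4sII")   # magic, body length, CRC32 of body
ID = struct.Struct(">Q")          # unique record id stored at the start of the body


def encode_record(record_id, payload):
    """Frame a payload as: header + (record id + payload)."""
    body = ID.pack(record_id) + payload
    return HEADER.pack(MAGIC, len(body), zlib.crc32(body)) + body


def read_records(data):
    """Return payloads in order, skipping padding, torn records, and duplicates."""
    seen_ids = set()
    payloads = []
    pos = data.find(MAGIC)
    while pos != -1 and pos + HEADER.size <= len(data):
        _, length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        body = data[start:start + length]
        valid = (
            length >= ID.size
            and len(body) == length
            and zlib.crc32(body) == crc
        )
        if valid:
            (record_id,) = ID.unpack_from(body)
            if record_id not in seen_ids:      # the application-level dedup step
                seen_ids.add(record_id)
                payloads.append(body[ID.size:])
            pos = data.find(MAGIC, start + length)
        else:
            # Checksum or length mismatch: treat as a half-written record
            # and resynchronize on the next marker.
            pos = data.find(MAGIC, pos + 1)
    return payloads
```

A writer that retries a failed append reuses the same record ID, so the retry shows up to the reader as a duplicate rather than as new data.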
- Snapshot
  - They worked hard on a system to do great snapshots (really clones)
  - Quinlan notes that the feature is not used that often, despite it being
    really hard to build