This is a discussion with Google engineer Sean Quinlan on GFS.
- Single Master
- 64 MB chunk size
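
To make the 64 MB chunk size concrete, here is a minimal sketch of how a byte offset in a file maps onto fixed-size chunks. The function names `locate` and `chunks_for_range` are hypothetical and chosen for illustration; they are not part of any GFS client API.

```python
from typing import Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size noted above


def locate(offset: int) -> Tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset within that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset: int, length: int) -> range:
    """Return the chunk indices touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first, _ = locate(offset)
    last, _ = locate(offset + length - 1)
    return range(first, last + 1)
```

For example, `locate(200 * 1024 * 1024)` lands in chunk 3, 8 MB past the start of that chunk, and a 10 MB read starting at the 60 MB mark spans chunks 0 and 1.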
    
  
- Throughput vs. Latency
  - GFS was designed for high throughput; high latency is acceptable
  - BigTable, which is built on top of GFS, keeps its commit log on GFS
  - To ride out intermittent delays when writing to the log, BigTable keeps two commit logs open and switches to the other one if the current one is slow
  - Gmail uses a multihomed approach across data centers
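
The two-commit-log idea can be sketched roughly as follows. This is an illustration under stated assumptions, not BigTable's actual implementation: `InMemoryLog` and `DualLogWriter` are hypothetical names, the in-memory log stands in for a log file on GFS, the latency threshold is arbitrary, and the sequence numbers used to merge the two logs afterwards are an assumption added so that record order can be recovered.

```python
import time
from typing import List, Sequence, Tuple


class InMemoryLog:
    """Hypothetical stand-in for a commit log file stored on GFS."""

    def __init__(self, name: str, delay_s: float = 0.0) -> None:
        self.name = name
        self.delay_s = delay_s  # simulated write latency
        self.records: List[Tuple[int, bytes]] = []

    def append(self, seq: int, payload: bytes) -> None:
        time.sleep(self.delay_s)
        self.records.append((seq, payload))


class DualLogWriter:
    """Keeps two logs open and switches to the other when a write is slow."""

    def __init__(self, logs: Sequence[InMemoryLog], slow_threshold_s: float) -> None:
        if len(logs) != 2:
            raise ValueError("exactly two logs are required")
        self.logs = list(logs)
        self.active = 0
        self.slow_threshold_s = slow_threshold_s
        self.next_seq = 0

    def append(self, payload: bytes) -> str:
        """Write one record and return the name of the log that received it."""
        log = self.logs[self.active]
        start = time.monotonic()
        log.append(self.next_seq, payload)
        elapsed = time.monotonic() - start
        self.next_seq += 1
        if elapsed > self.slow_threshold_s:
            # The write was slow: send subsequent records to the other log.
            self.active = 1 - self.active
        return log.name

    def merged(self) -> List[Tuple[int, bytes]]:
        """Recombine both logs in sequence order (assumed recovery step)."""
        combined = [rec for log in self.logs for rec in log.records]
        return sorted(combined, key=lambda rec: rec[0])
```

In this sketch a slow write is still completed on the original log, and only later writes move to the other one. A real system would have to decide what to do with a write that is still in flight when it gives up on a slow log.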
 
- Consistency
  - GFS does not guarantee that all replicas of a chunk are byte-wise identical
  - Duplicate records or half-written records can appear
    - GFS deals with half-written records
    - The application has to deal with duplicates
  - When you read, you aren’t guaranteed to get the latest data
  - People did not expect this behavior, so it was surprising
  - Quinlan believes the right approach is to have just one writer per file
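
Since the application has to deal with duplicate records itself, one common approach is to tag each record with a unique ID and have the reader drop IDs it has already seen. The sketch below assumes such an ID exists. The `(record_id, payload)` layout and the `dedupe` helper are hypothetical and not something GFS provides.

```python
from typing import Iterable, Iterator, Set, Tuple

# (record_id, payload); the ID scheme is an assumption made for illustration.
Record = Tuple[str, bytes]


def dedupe(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each record once, keeping the first occurrence of every ID."""
    seen: Set[str] = set()
    for record_id, payload in records:
        if record_id in seen:
            continue  # duplicate of a record already delivered
        seen.add(record_id)
        yield record_id, payload
```

The `seen` set grows with the number of distinct records, so a long-running reader would need to bound it, for example by only tracking IDs within a recent window.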
 
- Snapshot
  - They worked hard on a system to do great snapshots (really clones)
  - Quinlan notes that the feature is not used that often, despite being really hard to build