Spanner: Google's Globally-Distributed Database - Google, 2012

  • [Paper] [Mirror]
  • [OSDI’12] [Video]
  • Features:
    • Designed for cross-datacenter replication.
    • Spanner is a multi-versioned database.
      • Each piece of data is assigned a timestamp.
    • Spanner provides external consistency.
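      • That is, if one transaction commits before another starts in real time, it gets the smaller commit timestamp, so the serialization order matches the order in which transactions happened in the real world.
    • SQL-based query language.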
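
To make the "multi-versioned" feature concrete, here is a minimal sketch of a store that keeps every version of a key tagged with its commit timestamp and answers reads as of a given timestamp. The class and method names (MultiVersionStore, write, read) are hypothetical and are not Spanner's storage layer or API.

```python
import bisect
from collections import defaultdict


class MultiVersionStore:
    """Toy multi-versioned key-value store (illustrative only)."""

    def __init__(self):
        # For each key, two parallel lists kept sorted by commit timestamp.
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, value, commit_ts):
        """Record a new version of key at commit_ts without overwriting old ones."""
        i = bisect.bisect_right(self._timestamps[key], commit_ts)
        self._timestamps[key].insert(i, commit_ts)
        self._values[key].insert(i, value)

    def read(self, key, read_ts):
        """Return the newest value whose commit timestamp is <= read_ts, else None."""
        i = bisect.bisect_right(self._timestamps.get(key, []), read_ts)
        return self._values[key][i - 1] if i > 0 else None


store = MultiVersionStore()
store.write("x", "a", commit_ts=10)
store.write("x", "b", commit_ts=20)
print(store.read("x", read_ts=15))  # sees the version committed at timestamp 10
```

Because old versions are never overwritten, a read at an older timestamp still sees a consistent snapshot of the data as of that time.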
  • TrueTime: Exposes uncertainty in the clock by representing time as an interval.
    • TT.now() returns [earliest, latest]
      • earliest - The earliest possible timestamp for the current time.
      • latest - The latest possible timestamp for the current time.
    • This means that TrueTime guarantees that the current time is somewhere in the interval [earliest, latest].
    • They use a combination of GPS clocks and atomic clocks.
    • The size of the interval is generally less than 10 ms.
      • Relative to the latency of distributed consensus across the world, 10 ms is pretty small.
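
The interval API can be sketched as a small simulation. This assumes a fixed error bound epsilon around the local clock, whereas the real system derives its bound from GPS and atomic clock references; the class names TTInterval and TrueTime and the clock parameter are hypothetical. The after and before helpers mirror the TT.after(t) and TT.before(t) methods described in the paper.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # the current time is no earlier than this
    latest: float    # the current time is no later than this


class TrueTime:
    """Simulated TrueTime with an assumed fixed uncertainty of +/- epsilon seconds."""

    def __init__(self, epsilon=0.005, clock=time.time):
        self.epsilon = epsilon
        self._clock = clock  # injectable so the sketch can be tested deterministically

    def now(self):
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True only if t has definitely passed."""
        return self.now().earliest > t

    def before(self, t):
        """True only if t has definitely not arrived yet."""
        return self.now().latest < t


tt = TrueTime(epsilon=0.005)
interval = tt.now()
print(interval.latest - interval.earliest)  # interval width, roughly 2 * epsilon
```

With epsilon = 5 ms the interval is about 10 ms wide, matching the bound quoted above.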
  • Transactions with TrueTime:

    “We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”

    • The system delays the transaction's commit long enough to ‘wait out’ the uncertainty of the clock.
    • Example of a transaction (2PL)
      1. Acquire locks
      2. Call TT.now() and let s = TT.now().latest.
      3. Do work.
      4. Wait until TT.now().earliest > s (commit wait).
      5. Release locks.
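
The steps above can be sketched as follows. This is a single-process simulation under stated assumptions: TrueTime is a hypothetical stand-in with a fixed uncertainty epsilon, one threading.Lock stands in for the transaction's lock set, and commit and do_work are illustrative names rather than Spanner's API.

```python
import threading
import time


class TrueTime:
    """Simulated clock with an assumed fixed uncertainty of +/- epsilon seconds."""

    def __init__(self, epsilon=0.005):
        self.epsilon = epsilon

    def now(self):
        t = time.time()
        return (t - self.epsilon, t + self.epsilon)  # (earliest, latest)


def commit(tt, lock, do_work):
    """Run do_work under lock and return the commit timestamp s."""
    with lock:                        # 1. acquire locks
        _, s = tt.now()               # 2. s = TT.now().latest
        do_work()                     # 3. do work
        while tt.now()[0] <= s:       # 4. commit wait: until TT.now().earliest > s
            time.sleep(tt.epsilon / 10)
    return s                          # the lock is released when the with-block exits


tt = TrueTime(epsilon=0.005)
lock = threading.Lock()
s1 = commit(tt, lock, lambda: None)
s2 = commit(tt, lock, lambda: None)
print(s1 < s2)
```

The wait lasts roughly 2 * epsilon, since s starts epsilon ahead of the local clock and earliest trails it by epsilon. Because the locks are held until s has definitely passed, any transaction that starts afterwards picks a larger timestamp.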
    • Example of a transaction (with 2PC)
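      1. Acquire locks.
      2. Compute s for each participant.
      3. Participants start logging.
      4. Participants finish logging and send Prepared to the Coordinator with their s.
      5. Coordinator computes the overall s, which is at least as large as every participant's s.
      6. Coordinator waits until TT.now().earliest > overall s (commit wait), releases its locks, and sends Committed and the overall s value to Participants.
      7. Participants release locks.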
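
The coordinator's timestamp choice in the steps above can be sketched as follows. This is a simplification under stated assumptions: logging and messaging are omitted, the overall s is taken as the maximum of the participants' prepare timestamps and the coordinator's own TT.now().latest, and all names (Participant, choose_commit_ts, run_two_phase_commit, commit_wait) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Participant:
    name: str
    prepare_ts: float                 # the s sent with the Prepared message
    locked: bool = True
    commit_ts: Optional[float] = None


def choose_commit_ts(prepare_timestamps: List[float], coordinator_latest: float) -> float:
    """Overall s: no smaller than any prepare timestamp or the coordinator's latest."""
    return max(max(prepare_timestamps), coordinator_latest)


def run_two_phase_commit(participants: List[Participant],
                         coordinator_latest: float,
                         commit_wait: Callable[[float], None]) -> float:
    s = choose_commit_ts([p.prepare_ts for p in participants], coordinator_latest)
    commit_wait(s)                    # coordinator waits out the clock uncertainty
    for p in participants:            # Committed message carries the overall s
        p.commit_ts = s
        p.locked = False              # participants release their locks
    return s


parts = [Participant("a", prepare_ts=10.0), Participant("b", prepare_ts=12.5)]
overall = run_two_phase_commit(parts, coordinator_latest=11.0, commit_wait=lambda s: None)
print(overall)  # 12.5
```

Taking the maximum means every participant applies the transaction at one timestamp that is no earlier than the point at which it prepared.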