Originally designed for joining search query data with ad click data
These are two separate streams that do not have a guaranteed global order
It’s not unusual to have one servers clicks be severely delayed
The streams come from many front end servers via logs
It’s not uncommon for one stream to be delayed
They didn’t want to send the query data with the URL, just an ID
Giant URLs
Have to trust the user data
They can never charge an advertiser twice, but they want to make sure they
join all of the results eventually, leading to:
at most-once at any time
near-exact in real-time
exactly-once eventually
They achieve fault tolerance partially but running the join in multiple
DCs
They use a system called the IdRegistry which stops multiple outputs of
the join in a read-modify-write transaction
Each cluster retries the join until their is a success
IdRegistry
KV store, PaxosDB
Before an output is written, the node attempts to commit in IdRegisty using
the join key
If the record already exists, it skips it, assuming it has been joined
successfully
While the IdRegisty commit is atomic, the node could still fail after
writing the value
There is a system that periodically checks and re-injects events that were
not present in the output but present in the IdRegisty
For performance, the node checks the IdRegistry before performing the join
lookup, but confirms at the end with the atomic transaction
Since the IdRegisty is located in geographically distance DCs, there can
only be a few transactions per second (less than 10) as the latency between
DCs is high
They batch many transactions into one Paxos transaction
They make sure two transactions don’t conflict in one batch at the
application level
They also shard the IdRegisty
The partitioning can be adjusted dynamically
They can put a new map in PaxosDB which is effective from a specified
timestamp onwards