Unicorn: A System for Searching the Social Graph - Facebook, 2013

  • [Paper] [Mirror]

  • In-memory index of the social graph
  • The Graph
    • thousands of edge types
    • many edges have tightly coupled inverse edges
      • ex. People like a page and the page has likers
    • Lots of nodes but generally sparse
      • Thus an adjacency list makes sense
    • Served up by TAO
  • Data Model
    • You can ask Unicorn for the adjacency list for a particular source vertex by its ID and the edge type:

      likes:5

      Returns a sorted list of pages the user likes; each entry is called a hit.

    • Hit - a (DocId, HitData) pair
      • DocId is a pair of (sort-key, id)
        • The vertex must have the same sort key in all lists
      • HitData is a byte array for app specific data
        • filtering results / scoring
        • I like how this gives you the flexibility to bring data into the posting list or leave it in TAO. However, you’d need to be careful not to make the posting list large. Even if you have enough RAM you still have small caches and limited RAM bandwidth.

          You could also store data in the forward index described later.
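A minimal sketch of the hit structure as I read it (names are mine, not the paper's):

```python
from typing import NamedTuple

class DocId(NamedTuple):
    """Sorts by (sort_key, id); a vertex must use the same sort key
    in every posting list so the lists stay mutually sorted."""
    sort_key: int
    id: int

class Hit(NamedTuple):
    doc_id: DocId
    hit_data: bytes  # app-specific payload for filtering/scoring

# Posting list for likes:5 — pages user 5 likes, kept sorted by DocId.
posting_list = sorted([
    Hit(DocId(10, 42), b""),
    Hit(DocId(3, 7), b""),
])
```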

    • Sharded by document (i.e. all hits for a given vertex are on the same node)
      • Great for partitions
      • Many set ops can be done on the leaf (since all instances of the result will be on that shard)
      • I note that you can skip waiting for the long tail of results
      • I also note that this effectively splits the source vertex across multiple servers
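Since sharding is by document, every posting list on a shard covers only that shard's documents, so an intersection can run entirely at the leaf and the aggregator just merges disjoint result sets. A hypothetical sketch:

```python
def leaf_and(list_a, list_b):
    """Intersect two sorted posting lists of doc ids on one shard."""
    out, i, j = [], 0, 0
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            out.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1
        else:
            j += 1
    return out

# Each shard intersects locally; shard result sets are disjoint,
# so the aggregator only has to merge them.
shard_results = [leaf_and([1, 4, 9], [4, 9]), leaf_and([12, 15], [15])]
merged = sorted(x for shard in shard_results for x in shard)
```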
    • More terms than a text index, but the postings lists are generally much shorter
    • Updates go to a mutable index layer superimposed on the immutable layer
      • They also need to handle deletions
      • Probably with tombstones, though the paper doesn’t mention it
    • Posting lists are referenced by <edge-type>:<id>
    • They optimize for decoding posting lists that exceed a CPU’s cache-line
      • Most of the time is spent in memory accesses
    • Posting lists are all loaded in memory
    • They split the indices into verticals
      • A vertical holds all the posting lists with the same result type
        • For example, friends and likers are in the same vertical (both yield user IDs), while likes is not necessarily (it yields page IDs)
      • Top-aggregators can combine results from multiple verticals
      • IDs that will never be intersected or merged need not be in the same leaf
  • Architecture
    • Client library chooses a cluster close geographically
    • Life of a query:
      • Send to a top-aggregator
      • Top-aggregator sends the query to one rack-aggregator per rack
      • Rack-aggregator sends the query to all servers in the rack (each is responsible for a shard)
      • Rack-aggregator combines and truncates the results
        • Can’t necessarily truncate the results at the shard because of re-ranking
      • Top-aggregator combines and truncates the results
    • Rack-aggregators are used because bandwidth within a rack is greater than BW between servers on different racks
      • I think I’ve heard of some datacenters using a network fabric that gives close to equal inter-rack and intra-rack BW. Locality is still probably helpful.
    • Rack-aggregators know how to forward the request to another replica shard if one of the shards in the rack is down
    • Rack-aggregators are actually run on every machine on the rack (stateless)
    • A single replica can have “thousands of machines”
    • Sometimes they build smaller subsets of verticals to help deal with the load
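The fan-out above can be sketched roughly like this (shard servers modeled as plain functions; the details are my guess, not the paper's):

```python
def rack_aggregate(query, shards, limit):
    """Query every shard server in the rack, combine, truncate."""
    hits = []
    for shard in shards:
        hits.extend(shard(query))
    # Truncation happens here, not at the shard, since re-ranking
    # can reorder results across shards.
    return sorted(hits)[:limit]

def top_aggregate(query, racks, limit):
    """One rack-aggregator per rack; combine and truncate again."""
    hits = []
    for shards in racks:
        hits.extend(rack_aggregate(query, shards, limit))
    return sorted(hits)[:limit]

racks = [
    [lambda q: [3, 9], lambda q: [1]],  # rack 0: two shard servers
    [lambda q: [2], lambda q: [7, 8]],  # rack 1
]
top = top_aggregate("likes:5", racks, limit=3)
```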
  • Query Language
    • Designed for composition
    • Operators:
      • AND - Intersect
      • OR - Union
      • DIFFERENCE - Results from the first set not in the second
      • WEAKAND - “Allows operands to be missing from some fraction of the results within an index shard”
        • Clients can specify an optional_count or optional_weight
          • i.e. absolute number for a shard or a fraction of the results
        • Example of WEAKAND searching for people named “Melanie Mars*” with a preference to friends of the user.

            (weak-and
              (term friend:3 :optional-hits 2)
              (term melanie)
              (term mars*))
        • Used to enforce diversity in the results
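My reading of the WEAKAND semantics, as a hypothetical per-shard sketch (the real operator works over posting-list iterators, not materialized sets):

```python
def weak_and(candidates, required, optional, optional_count):
    """Keep hits matching `required`; a hit may miss `optional`
    for at most `optional_count` results in this shard."""
    out, misses = [], 0
    for doc in candidates:
        if doc not in required:        # name terms are mandatory
            continue
        if doc in optional:            # friend:3 matched
            out.append(doc)
        elif misses < optional_count:  # let a few non-friends through
            misses += 1
            out.append(doc)
    return out

# Users matching both name terms; only 2 and 4 are friends of user 3.
hits = weak_and(
    candidates=[1, 2, 3, 4],
    required={1, 2, 3, 4},
    optional={2, 4},
    optional_count=1,  # at most one non-friend in this shard's results
)
```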
      • APPLY - Similar to apply in JavaScript

        apply(prefix, innerQuery)

        Allows for queries like:

        (apply tagged: friend:5)

        Instead of:

        (or tagged:<id0> ... tagged:<idn>)
        • Obviously the client could do this itself, but this keeps the work in the cluster, where latency is smaller
        • Since the client is showing “semantic intent” there are opportunities for query planning
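APPLY can be thought of as running the inner query and then OR-ing the prefixed terms together, all inside the cluster. A toy sketch (the index shape here is my invention):

```python
def apply_op(index, prefix, inner_results):
    """index: 'term:id' -> sorted posting list of doc ids."""
    out = set()
    for vid in inner_results:
        out.update(index.get(f"{prefix}{vid}", []))
    return sorted(out)  # equivalent to (or tagged:10 tagged:11)

index = {
    "tagged:10": [100, 101],
    "tagged:11": [101, 102],
}
friends_of_5 = [10, 11]  # pretend result of (term friend:5)
photos = apply_op(index, "tagged:", friends_of_5)
```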
      • EXTRACT - Get data from the forward index
        • For some relationships there would be “billions of ‘one-to-few’ mappings”, which would require more RAM per shard than is feasible
        • Instead they partially denormalize and put the data in the forward index
        • They give the example of looking for “People tagged in photos of Jon Jones”

          Instead of looking up photos of Jon Jones and then people tagged in the photos, they store the ids of people tagged in the forward index of the photos.

        • “Typically, the forward index entry has already been fetched in order to score the source document”
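The EXTRACT idea, sketched with an invented forward-index shape: rather than indexing billions of tiny tagged-in posting lists, the tagged ids live in each photo's forward-index entry and get pulled out at result time.

```python
# Hypothetical forward-index entries for two photo documents.
forward_index = {
    100: {"owner": 20, "tagged_ids": [31, 32]},
    101: {"owner": 20, "tagged_ids": [32, 33]},
}

def extract(doc_ids, field):
    """Pull a denormalized field out of the forward index."""
    out = set()
    for doc in doc_ids:
        out.update(forward_index[doc][field])
    return sorted(out)

# "People tagged in photos of Jon Jones": the photo search yields
# [100, 101]; EXTRACT then returns the tagged user ids directly.
people = extract([100, 101], "tagged_ids")
```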
  • Scoring
    • There is a forward index on each shard
    • The client request can select a scoring function that resides on the shard
    • Scoring function can use the forward index data
    • The paper references a discussion on maintaining result diversity with the aggregators here
  • Privacy
    • They don’t want to do the privacy checks inside Unicorn
      • The calling application already has all the privacy business logic written
    • Each result has a JSON or Thrift struct that describes the possible ways to get to the result, termed the lineage
    • Sometimes on the order of several KB, but it hasn’t been a problem
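The paper doesn't give the lineage schema; a hypothetical shape, just to make the idea concrete:

```python
# Every edge path that could have produced the result, so the calling
# application can run its existing privacy checks outside Unicorn.
lineage = {
    "result": 104,  # photo id returned to the caller
    "paths": [
        [{"edge": "friend", "source": 5}, {"edge": "tagged", "source": 10}],
        [{"edge": "friend", "source": 5}, {"edge": "tagged", "source": 11}],
    ],
}
```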