PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
- CMU, 2012
Partitioning a natural graph is hard.
Random partitioning of vertices leads to most edges crossing machine boundaries.
This means that there is a ton of communication between machines.
GAS Decomposition - They recognized most graph algos had a similar steps.
Gather: Accumulate information about the neighborhood
Apply: Apply the accumulated value to center vertex
Scatter: Update adjacent edges and vertices
Replicate the vertex across the nodes such that no edge crosses a machine boundary.
Now we can run the vertex program in parallel, only needing to synchronize at the apply step.
Other notes about graph
GraphChi: Large-Scale Graph Computation on Just a PC
Pregel: A System for Large-Scale Graph Processing