How Degree-of-Separation Search Works

The degree-of-separation search works by building a network map displaying the linking structure of a list of Web sites returned in response to a Google query. For example, a search to get the betweenness of “Hillary Clinton” works as follows:


1. Starts by entering the search string “Hillary Clinton” into Google.
2. Take the top ten, or another small number of Web sites returned to query “Hillary Clinton”.
3. Get the top ten, or another small number of Web sites pointing to each of returned Web sites in step 2 by executing a “link:URL” query, where URL is one of the top ten Web sites returned in step 2. The Google “link” query returns the “significant” Web sites linking to a specific URL. For Google “significant” means that the linking Web sites themselves are linked by other Web sites with a page rank larger than 0.
4. Get the top ten Web sites pointing to each of returned Web sites in step 3. Repeat step 4 up to desired degree of separation from the original top ten Web sites collected in step 2. Usually it is sufficient, however, to stop here at step 4.

Figure 1. Degree-of-separation Web search for query “Hillary Clinton”

Figure 1 illustrates the network map returned to the query “Hillary Clinton”. The level-1 nodes are the ones connected directly to the query, i.e. the original search results. Level-2 nodes are the most highly ranked search results returned by the “link” query, to each of the top ten level-1 nodes. Level-3 nodes are the most highly Google-ranked nodes returned by the “link” queries of each of the level-2 nodes. Figure 1 already gives a visual overview of the betweenness of each of the level-1 and level-2 nodes. The more links a node has pointing to it, the more between it is. For example the node labeled http://clinton.senate.gov is linked by a group of level 2 nodes which themselves are linked by groups of level-3 nodes. This indicates that the node http://clinton.senate.gov will have fairly high betweenness itself.

Betweenness ranges from 0, for nodes which are totally peripheral, to 1, for nodes which are on all shortest paths. The most between node in figure 2 is the search query “Hillary Clinton” itself, with a value of 0.61. The second most between node is indeed, as figure 2 illustrates, http://clinton.senate.gov with a betweenness value of 0.36. Some other high-betweenness nodes are www.ovaloffice2008.com and www.hillaryclinton.com.

Figure 2. Betweenness of Web Sites indicated by size of square or circle

Top Google search results do not necessarily have highest betweenness centrality. Google sorts search results by its patented “Page Rank” algorithm, which looks at what Web pages link back to a particular page. It also weights the links to the page by the page rank of the originating page. In terms of social network analysis Google measures the in-degree of a page, that is the number of incoming links. Page rank is a local algorithm, because it only factors in all the nearest neighbors of the page it is measuring. It includes page-rank of the neighbors, weighting incoming links higher from sites that themselves have a high page rank. Betweeness, on the other hand, is basically a global concept, because it looks at all the shortest paths within the entire network that are going through a particular site. Frequently, a node which has a high page rank will also have high betweenness, but this is not necessarily the case.