-
PageRank and rank-reversal dependence on the damping factor
Authors:
Seung-Woo Son,
Claire Christensen,
Peter Grassberger,
Maya Paczuski
Abstract:
PageRank (PR) is an algorithm originally developed by Google to evaluate the importance of web pages. Considering how central Google's PR algorithm is to gathering relevant information and to the success of modern businesses, the question of rank stability and the choice of the damping factor (a parameter in the algorithm) is clearly important. We investigate PR as a function of the damping factor d on a network obtained from a domain of the World Wide Web, finding that rank-reversal happens frequently over a broad range of PR (and of d). We use three different correlation measures, Pearson, Spearman, and Kendall, to study rank-reversal as d changes, and show that the correlation of PR vectors drops rapidly as d changes from its frequently cited value, $d_0=0.85$. Rank-reversal is also observed by measuring the Spearman and Kendall rank correlations, which evaluate relative ranks rather than absolute PR. Rank-reversal happens not only in directed networks containing rank-sinks but also in a single strongly connected component, which by definition does not contain any sinks. We relate rank-reversals to rank-pockets and bottlenecks in the directed network structure. For the network studied, the relative rank is more stable by our measures around $d=0.65$ than at $d=d_0$.
Submitted 23 January, 2012;
originally announced January 2012.
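The dependence of ranks on the damping factor described in this abstract can be illustrated with a minimal sketch. It assumes a plain power-iteration PageRank in which dangling nodes spread their rank uniformly, and a tie-unaware Kendall tau-a; the function names, the toy graph `toy_web`, and the chosen values of d are hypothetical and not taken from the paper.

```python
from itertools import combinations


def pagerank(out_links, d=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; out_links maps every node to its successors."""
    nodes = sorted(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: dangling nodes (no out-links) spread their rank uniformly.
        dangling = sum(pr[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += d * pr[v] / len(out_links[v])
        if sum(abs(new[v] - pr[v]) for v in nodes) < tol:
            return new
        pr = new
    return pr


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    score = 0
    pairs = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
        pairs += 1
    return score / pairs


def tau_vs_damping(out_links, d_values, d0=0.85):
    """Kendall tau between the PR vector at d0 and the PR vector at each d."""
    nodes = sorted(out_links)
    ref = pagerank(out_links, d0)
    ref_vec = [ref[v] for v in nodes]
    result = {}
    for d in d_values:
        pr = pagerank(out_links, d)
        result[d] = kendall_tau(ref_vec, [pr[v] for v in nodes])
    return result


# Hypothetical toy graph; every node must appear as a key.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"], "e": []}
for d, tau in tau_vs_damping(toy_web, [0.5, 0.65, 0.85, 0.95]).items():
    print(f"d = {d:.2f}  tau = {tau:.3f}")
```

A tau below 1 for some d means that at least one pair of nodes swaps order relative to the ranking at $d_0$, which is the rank-reversal the abstract refers to. On a graph this small the effect may be weak or absent.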
-
Sampling properties of directed networks
Authors:
Seung-Woo Son,
Claire Christensen,
Golnoosh Bizhani,
David V. Foster,
Peter Grassberger,
Maya Paczuski
Abstract:
For many real-world networks only a small "sampled" version of the original network may be investigated; those results are then used to draw conclusions about the actual system. Variants of breadth-first search (BFS) sampling, which are based on epidemic processes, are widely used. Although it is well established that BFS sampling fails, in most cases, to capture the IN-component(s) of directed networks, a description of the effects of BFS sampling on other topological properties is all but absent from the literature. To systematically study the effects of sampling biases on directed networks, we compare BFS sampling to random sampling on complete large-scale directed networks. We present new results and a thorough analysis of the topological properties of seven different complete directed networks (prior to sampling), including three versions of Wikipedia, three different sources of sampled World Wide Web data, and an Internet-based social network. We detail the differences that sampling method and coverage can make to the structural properties of sampled versions of these seven networks. Most notably, we find that sampling method and coverage affect both the bow-tie structure and the number and structure of strongly connected components in sampled networks. In addition, at low sampling coverage (i.e., less than 40%), the values of average degree, variance of out-degree, degree auto-correlation, and link reciprocity are overestimated by 30% or more in BFS-sampled networks, and only attain values within 10% of the corresponding values in the complete networks when sampling coverage is in excess of 65%. These results may cause us to rethink what we know about the structure, function, and evolution of real-world directed networks.
Submitted 13 October, 2012; v1 submitted 6 January, 2012;
originally announced January 2012.
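The contrast between BFS sampling and random sampling can be sketched as follows. This is only an illustration under simple assumptions: BFS follows out-links from a single seed, random sampling draws nodes uniformly without replacement, and the sampled network is the subgraph induced on the chosen nodes. The function names and the chain graph are hypothetical; the exact sampling variants used in the paper are not specified in the abstract.

```python
import random
from collections import deque


def bfs_sample(out_links, seed_node, n_target):
    """Follow out-links breadth-first from seed_node until n_target nodes are seen."""
    visited = {seed_node}
    queue = deque([seed_node])
    while queue and len(visited) < n_target:
        v = queue.popleft()
        for w in out_links[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
                if len(visited) == n_target:
                    break
    return visited


def random_sample(out_links, n_target, rng):
    """Draw n_target distinct nodes uniformly at random."""
    return set(rng.sample(sorted(out_links), n_target))


def induced_mean_out_degree(out_links, nodes):
    """Mean out-degree of the subgraph induced on the sampled nodes."""
    edges = sum(1 for v in nodes for w in out_links[v] if w in nodes)
    return edges / len(nodes)


# Hypothetical directed chain 0 -> 1 -> 2 -> 3 -> 4.
chain = {0: [1], 1: [2], 2: [3], 3: [4], 4: []}

downstream = bfs_sample(chain, seed_node=0, n_target=3)
stuck = bfs_sample(chain, seed_node=4, n_target=3)  # cannot move upstream
uniform = random_sample(chain, 3, random.Random(1))

print(sorted(downstream), induced_mean_out_degree(chain, downstream))
print(sorted(stuck), induced_mean_out_degree(chain, stuck))
print(sorted(uniform), induced_mean_out_degree(chain, uniform))
```

Seeding the search at node 4 never reaches the nodes that link into it, a small-scale version of BFS failing to capture the IN-component.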
-
Random Sequential Renormalization and Agglomerative Percolation in Networks: Application to Erdős-Rényi and Scale-free Graphs
Authors:
Golnoosh Bizhani,
Peter Grassberger,
Maya Paczuski
Abstract:
We study the statistical behavior under random sequential renormalization (RSR) of several network models including Erdős-Rényi (ER) graphs, scale-free networks and an annealed model (AM) related to ER graphs. In RSR the network is locally coarse-grained by choosing at each renormalization step a node at random and joining it to all its neighbors. Compared to previous (quasi-)parallel renormalization methods [C. Song et al.], RSR allows a more fine-grained analysis of the renormalization group (RG) flow, and reveals new features that were not discussed in the previous analyses. In particular we find that all networks exhibit a second-order transition in their RG flow. This phase transition is associated with the emergence of a giant hub and can be viewed as a new variant of percolation, called agglomerative percolation. We claim that this transition exists also in previous graph renormalization schemes and explains some of the scaling laws seen there. For critical trees it happens as N/N0 -> 0 in the limit of large systems (where N0 is the initial size of the graph and N its size at a given RSR step). In contrast, it happens at finite N/N0 in sparse ER graphs and in the annealed model, while it happens for N/N0 -> 1 on scale-free networks. Critical exponents seem to depend on the type of the graph but not on the average degree and obey usual scaling relations for percolation phenomena. For the annealed model they agree with the exponents obtained from a mean-field theory. At late times, the networks exhibit a star-like structure in agreement with the results of Radicchi et al. While degree distributions are of main interest when regarding the scheme as network renormalization, mass distributions (which are more relevant when considering 'supernodes' as clusters) are much easier to study using the fast Newman-Ziff algorithm for percolation, allowing us to obtain very high statistics.
Submitted 12 December, 2011; v1 submitted 21 September, 2011;
originally announced September 2011.
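The RSR step described in this abstract (pick a node at random and join it to all its neighbors) can be sketched on an undirected graph stored as a dictionary of neighbor sets. This is a minimal illustration, not the authors' implementation: the function names and the path graph are hypothetical, and the merged supernode simply keeps the label of the chosen node.

```python
import random


def merge_with_neighbors(adj, target):
    """Absorb all neighbors of target into target, modifying adj in place."""
    absorbed = set(adj[target])
    # The supernode inherits every neighbor of the absorbed nodes.
    new_neighbors = set()
    for u in absorbed:
        new_neighbors |= adj[u]
    new_neighbors -= absorbed | {target}
    for u in absorbed:
        for w in adj[u]:
            if w != target and w not in absorbed:
                adj[w].discard(u)
        del adj[u]
    for w in new_neighbors:
        adj[w].add(target)
    adj[target] = new_neighbors


def rsr_step(adj, rng):
    """One RSR step: choose a node uniformly at random and merge it."""
    target = rng.choice(sorted(adj))
    merge_with_neighbors(adj, target)
    return target


# Hypothetical example: a path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
rng = random.Random(0)
sizes = [len(path)]
while len(path) > 1:
    rsr_step(path, rng)
    sizes.append(len(path))
print(sizes)  # N after each step; N / N0 is the flow variable in the abstract
```

On a connected graph every node has at least one neighbor, so each step removes at least one node and the loop terminates with a single supernode.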
-
Clustering Drives Assortativity and Community Structure in Ensembles of Networks
Authors:
David V. Foster,
Jacob G. Foster,
Peter Grassberger,
Maya Paczuski
Abstract:
Clustering, assortativity, and communities are key features of complex networks. We probe dependencies between these attributes and find that ensembles with strong clustering display both high assortativity by degree and prominent community structure, while ensembles with high assortativity are much less biased towards clustering or community structure. Further, clustered networks can amplify small homophilic bias for trait assortativity. This marked asymmetry suggests that transitivity, rather than homophily, drives the standard nonsocial/social network dichotomy.
Submitted 5 January, 2011; v1 submitted 10 December, 2010;
originally announced December 2010.
-
Random Sequential Renormalization of Networks I: Application to Critical Trees
Authors:
Golnoosh Bizhani,
Vishal Sood,
Maya Paczuski,
Peter Grassberger
Abstract:
We introduce the concept of Random Sequential Renormalization (RSR) for arbitrary networks. RSR is a graph renormalization procedure that locally aggregates nodes to produce a coarse-grained network. It is analogous to the (quasi-)parallel renormalization schemes introduced by C. Song et al. (Nature 433, 392 (2005)) and studied more recently by F. Radicchi et al. (Phys. Rev. Lett. 101, 148701 (2008)), but much simpler and easier to implement. In this first paper we apply RSR to critical trees and derive analytical results consistent with numerical simulations. Critical trees exhibit three regimes in their evolution under RSR: (i) An initial regime $N_0^{\nu}\lesssim N<N_0$, where $N$ is the number of nodes at some step in the renormalization and $N_0$ is the initial size. RSR in this regime is described by a mean field theory and fluctuations from one realization to another are small. The exponent $\nu=1/2$ is derived using random walk arguments. The degree distribution becomes broader under successive renormalization -- reaching a power law, $p_k\sim 1/k^{\gamma}$ with $\gamma=2$ and a variance that diverges as $N_0^{1/2}$ at the end of this regime. Both of these results are derived based on a scaling theory. (ii) An intermediate regime for $N_0^{1/4}\lesssim N \lesssim N_0^{1/2}$, in which hubs develop, and fluctuations between different realizations of the RSR are large. Crossover functions exhibiting finite size scaling, in the critical region $N\sim N_0^{1/2} \to \infty$, connect the behaviors in the first two regimes. (iii) The last regime, for $1 \ll N\lesssim N_0^{1/4}$, is characterized by the appearance of star configurations with a central hub surrounded by many leaves. The distribution of sizes where stars first form is found numerically to be a power law up to a cutoff that scales as $N_0^{\nu_{star}}$ with $\nu_{star}\approx 1/4$.
Submitted 23 March, 2011; v1 submitted 20 September, 2010;
originally announced September 2010.
-
Edge direction and the structure of networks
Authors:
Jacob G. Foster,
David V. Foster,
Peter Grassberger,
Maya Paczuski
Abstract:
Directed networks are ubiquitous and are necessary to represent complex systems with asymmetric interactions---from food webs to the World Wide Web. Despite the importance of edge direction for detecting local and community structure, it has been disregarded in studying a basic type of global diversity in networks: the tendency of nodes with similar numbers of edges to connect. This tendency, called assortativity, affects crucial structural and dynamic properties of real-world networks, such as error tolerance or epidemic spreading. Here we demonstrate that edge direction has profound effects on assortativity. We define a set of four directed assortativity measures and assign statistical significance by comparison to randomized networks. We apply these measures to three network classes---online/social networks, food webs, and word-adjacency networks. Our measures (i) reveal patterns common to each class, (ii) separate networks that have been previously classified together, and (iii) expose limitations of several existing theoretical models. We reject the standard classification of directed networks as purely assortative or disassortative. Many display a class-specific mixture, likely reflecting functional or historical constraints, contingencies, and forces guiding the system's evolution.
Submitted 7 November, 2010; v1 submitted 28 August, 2009;
originally announced August 2009.
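One way to picture a set of four directed assortativity measures is sketched below. The abstract does not define them, so this sketch assumes each measure is the Pearson correlation, taken over all edges, between the in- or out-degree of the source node and the in- or out-degree of the target node. The function names and the example edge list are hypothetical, and the comparison to randomized networks mentioned in the abstract is not included.

```python
from math import sqrt


def degrees(edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    in_deg, out_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    return in_deg, out_deg


def pearson(xs, ys):
    """Pearson correlation; returns None when either variance is zero."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def directed_assortativity(edges):
    """Assumed form: r[(a, b)] correlates source a-degree with target b-degree."""
    in_deg, out_deg = degrees(edges)
    deg = {"in": in_deg, "out": out_deg}
    result = {}
    for a in ("in", "out"):
        for b in ("in", "out"):
            xs = [deg[a].get(u, 0) for u, _ in edges]
            ys = [deg[b].get(v, 0) for _, v in edges]
            result[(a, b)] = pearson(xs, ys)
    return result


# Hypothetical edge list (source, target).
example_edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 2)]
for key, r in directed_assortativity(example_edges).items():
    print(key, r)
```

Keeping the four combinations separate makes it possible for a network to appear assortative in one and disassortative in another, the kind of mixture the abstract describes.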
-
Correlated dynamics in human printing behavior
Authors:
Uli Harder,
Maya Paczuski
Abstract:
Arrival times of requests to print in a student laboratory were analyzed. Inter-arrival times between subsequent requests follow a universal scaling law relating time intervals and the size of the request, indicating a scale invariant dynamics with respect to the size. The cumulative distribution of file sizes is well-described by a modified power law often seen in non-equilibrium critical systems. For each user, waiting times between their individual requests show long range dependence and are broadly distributed from seconds to weeks. All results are incompatible with Poisson models, and may provide evidence of critical dynamics associated with voluntary thought processes in the brain.
Submitted 7 December, 2004;
originally announced December 2004.
-
A dynamical model of a GRID market
Authors:
Uli Harder,
Peter Harrison,
Maya Paczuski,
Tejas Shah
Abstract:
We discuss potential market mechanisms for the GRID. A complete dynamical model of a GRID market is defined with three types of agents. Providers, middlemen and users exchange universal GRID computing units (GCUs) at varying prices. Providers and middlemen have strategies aimed at maximizing profit while users are 'satisficing' agents, and only change their behavior if the service they receive is sufficiently poor or overpriced. Preliminary results from a multi-agent numerical simulation of the market model show that the distribution of price changes has a power law tail.
Submitted 2 October, 2004;
originally announced October 2004.