-
Introducing Total Harmonic Resistance for Graph Robustness under Edge Deletions
Authors:
Lukas Berner,
Henning Meyerhenke
Abstract:
Assessing and improving the robustness of a graph $G$ are critical steps in network design and analysis. To this end, we consider the optimization problem of removing $k$ edges from $G$ such that the resulting graph has minimal robustness, simulating attacks or failures. In this paper, we propose total harmonic resistance as a new robustness measure for this purpose and compare it to the recently proposed forest index [Zhu et al., IEEE Trans. Inf. Forensics and Security, 2023]. Both measures are related to the established total effective resistance measure, but their advantage is that they can handle disconnected graphs. This matters even for graphs that are initially connected, because removing the $k$ edges can disconnect them. To compare our measure with the forest index, we first investigate exact solutions for small examples. The best $k$ edges to select when optimizing for the forest index lie at the periphery. Our proposed measure, in turn, prioritizes more central edges, which should be beneficial for most applications. Furthermore, we adapt a generic greedy algorithm to our optimization problem with the total harmonic resistance. With this algorithm, we perform a case study on the Berlin road network and also apply the algorithm to established benchmark graphs. The results are similar to those for the small example graphs and indicate the higher suitability of the new measure.
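To make the measure concrete, here is a minimal Python sketch for tiny simple graphs. It assumes that total harmonic resistance is the sum of $1/r(u,v)$ over unordered vertex pairs, with pairs in different components contributing 0 (the paper's exact normalization may differ). Each effective resistance is computed exactly by grounding one vertex of the component and solving the reduced Laplacian system with `Fraction`. The greedy loop is a naive version of a generic greedy scheme that re-evaluates the measure from scratch for every candidate; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical helper names; an illustrative sketch for tiny simple graphs only.

def components(n, edges):
    """Connected components via iterative DFS."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s], stack, comp = True, [s], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        comps.append(comp)
    return comps

def solve(a, b):
    """Exact Gauss-Jordan elimination for a small nonsingular system a x = b."""
    m = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        p = next(r for r in range(c, m) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c and aug[r][c] != 0:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def total_harmonic_resistance(n, edges):
    """Assumed definition: sum of 1/r(u, v) over pairs u < v; disconnected pairs add 0."""
    total = Fraction(0)
    for comp in components(n, edges):
        idx = {v: i for i, v in enumerate(comp)}
        c = len(comp)
        lap = [[Fraction(0)] * c for _ in range(c)]
        for u, v in edges:
            if u in idx:  # then both endpoints lie in this component
                i, j = idx[u], idx[v]
                lap[i][i] += 1
                lap[j][j] += 1
                lap[i][j] -= 1
                lap[j][i] -= 1
        for s, t in combinations(range(c), 2):
            # Ground t: the reduced Laplacian is nonsingular, and x_s = r(s, t).
            keep = [q for q in range(c) if q != t]
            reduced = [[lap[row][col] for col in keep] for row in keep]
            rhs = [Fraction(int(q == s)) for q in keep]
            x = solve(reduced, rhs)
            total += 1 / x[keep.index(s)]
    return total

def greedy_delete(n, edges, k):
    """Naive greedy: k times, remove the edge whose deletion lowers the measure most."""
    current, removed = list(edges), []
    for _ in range(k):
        best = min(current, key=lambda e: total_harmonic_resistance(
            n, [f for f in current if f != e]))
        current.remove(best)
        removed.append(best)
    return removed, total_harmonic_resistance(n, current)
```

On the path 0-1-2-3, this sketch removes the central edge (1, 2), which leaves two components and a measure of 2, whereas removing an end edge leaves 5/2. This is consistent with the abstract's observation that the measure favors central edges.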
Submitted 16 July, 2024;
originally announced July 2024.
-
Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms
Authors:
Svetlana Kulagina,
Henning Meyerhenke,
Anne Benoit
Abstract:
Scientific workflows are often represented as directed acyclic graphs (DAGs), where vertices correspond to tasks and edges represent the dependencies between them. Since these graphs are often large in both the number of tasks and their resource requirements, it is important to schedule them efficiently on parallel or distributed compute systems. Typically, each task requires a certain amount of memory to be executed and needs to communicate data to its successor tasks. The goal is thus to execute the workflow as fast as possible (i.e., to minimize its makespan) while satisfying the memory constraints. Hence, we investigate the partitioning and mapping of DAG-shaped workflows onto heterogeneous platforms where each processor can have a different speed and a different memory size. We first propose a baseline algorithm in the absence of existing memory-aware solutions. As our main contribution, we then present a four-step heuristic. Its first step is to partition the input DAG into smaller blocks with an existing DAG partitioner. The next two steps adapt the resulting blocks of the DAG to fit the processor memories and optimize for the overall makespan by further splitting and merging these blocks. Finally, we use local search via block swaps to further improve the makespan. Our experimental evaluation on real-world and simulated workflows with up to 30,000 tasks shows that exploiting the heterogeneity with the four-step heuristic reduces the makespan by a factor of 2.44 on average (even more on large workflows), compared to the baseline that ignores heterogeneity.
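Every step of such a heuristic needs to evaluate a candidate mapping. The following hypothetical sketch does so under a deliberately simplified model that is an assumption, not the paper's: a processor's memory constraint counts as satisfied when the summed memory of its tasks fits its capacity, each processor runs its tasks sequentially in one topological order, and an edge between tasks on different processors costs its data volume divided by a uniform bandwidth.

```python
from graphlib import TopologicalSorter

# Hypothetical function name and cost model; a simplified illustration only.

def evaluate_mapping(work, mem, comm, mapping, speed, capacity, bandwidth=1.0):
    """Return the makespan of `mapping` (task -> processor), or None if memory is exceeded."""
    used = {}
    for task, proc in mapping.items():
        used[proc] = used.get(proc, 0) + mem[task]
    if any(used[p] > capacity[p] for p in used):
        return None  # simplified memory model: summed task memory per processor
    preds = {t: {} for t in work}
    for (u, v), volume in comm.items():
        preds[v][u] = volume
    order = TopologicalSorter({t: set(preds[t]) for t in work}).static_order()
    proc_free = {p: 0.0 for p in speed}
    finish = {}
    for t in order:
        p = mapping[t]
        # A task is ready once all predecessors finished and their data arrived.
        ready = max((finish[u] + (0.0 if mapping[u] == p else vol / bandwidth)
                     for u, vol in preds[t].items()), default=0.0)
        start = max(ready, proc_free[p])
        finish[t] = start + work[t] / speed[p]  # faster processors shorten tasks
        proc_free[p] = finish[t]
    return max(finish.values())
```

Split/merge and swap steps could then compare the returned makespans of neighboring mappings, treating `None` as infeasible.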
Submitted 12 July, 2024;
originally announced July 2024.
-
Methodology of Algorithm Engineering
Authors:
Jan Mendling,
Henrik Leopold,
Henning Meyerhenke,
Benoît Depaire
Abstract:
Research on algorithms has drastically increased in recent years. Various sub-disciplines of computer science investigate algorithms according to different objectives and standards. This plurality of the field has led to various methodological advances that have not yet been transferred to neighboring sub-disciplines. The central roadblock for a better knowledge exchange is the lack of a common methodological framework integrating the perspectives of these sub-disciplines. It is the objective of this paper to develop a research framework for algorithm engineering. Our framework builds on three areas discussed in the philosophy of science: ontology, epistemology and methodology. In essence, ontology describes algorithm engineering as being concerned with algorithmic problems, algorithmic tasks, algorithm designs and algorithm implementations. Epistemology describes the body of knowledge of algorithm engineering as a collection of prescriptive and descriptive knowledge, residing in World 3 of Popper's Three Worlds model. Methodology refers to the steps by which we can systematically enhance our knowledge of specific algorithms. The framework helps us to identify and discuss various validity concerns relevant to any algorithm engineering contribution. In this way, our framework has important implications for researching algorithms in various areas of computer science.
Submitted 29 October, 2023;
originally announced October 2023.
-
Greedy Optimization of Resistance-based Graph Robustness with Global and Local Edge Insertions
Authors:
Maria Predari,
Lukas Berner,
Robert Kooij,
Henning Meyerhenke
Abstract:
The total effective resistance, also called the Kirchhoff index, provides a robustness measure for a graph $G$. We consider two optimization problems of adding $k$ new edges to $G$ such that the resulting graph has minimal total effective resistance (i.e., is most robust) -- one where the new edges can be anywhere in the graph and one where the new edges need to be incident to a specified focus node. The total effective resistance and effective resistances between nodes can be computed using the pseudoinverse of the graph Laplacian. The pseudoinverse may be computed explicitly via pseudoinversion; yet, this takes cubic time in practice and quadratic space. We instead exploit combinatorial and algebraic connections to speed up gain computations in an established generic greedy heuristic. Moreover, we leverage existing randomized techniques to boost the performance of our approaches by introducing a sub-sampling step. Our different graph- and matrix-based approaches are indeed significantly faster than the state-of-the-art greedy algorithm, while their quality remains reasonably high and is often quite close to that of the state-of-the-art algorithm. Our experiments show that we can now process larger graphs for which the application of the state-of-the-art greedy approach was impractical before.
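For reference, the explicit-pseudoinverse greedy baseline that the paper speeds up can be sketched as follows (hypothetical names, dense pure-Python linear algebra, connected unweighted graphs only). It uses $L^\dagger = (L + J/n)^{-1} - J/n$, the Kirchhoff index $n\,\mathrm{tr}(L^\dagger)$, and the Sherman-Morrison update: inserting edge $\{u,v\}$ with $x = L^\dagger(e_u - e_v)$ lowers the index by $n\,x^\top x / (1 + r(u,v))$. This is not the authors' faster graph- or matrix-based method.

```python
from itertools import combinations

# Hypothetical helper names; cubic-time baseline for small connected graphs.

def invert(a):
    """Gauss-Jordan inversion with partial pivoting (dense, small inputs only)."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def laplacian_pinv(n, edges):
    """L^+ = (L + J/n)^{-1} - J/n, valid for connected graphs."""
    a = [[1.0 / n] * n for _ in range(n)]
    for u, v in edges:
        a[u][u] += 1.0
        a[v][v] += 1.0
        a[u][v] -= 1.0
        a[v][u] -= 1.0
    inv = invert(a)
    return [[inv[i][j] - 1.0 / n for j in range(n)] for i in range(n)]

def kirchhoff_index(lp):
    """Total effective resistance n * tr(L^+)."""
    n = len(lp)
    return n * sum(lp[i][i] for i in range(n))

def greedy_insert(n, edges, k):
    """Add k edges, each with the largest drop in the Kirchhoff index."""
    lp = laplacian_pinv(n, edges)
    present = {frozenset(e) for e in edges}
    added = []
    for _ in range(k):
        best = None
        for u, v in combinations(range(n), 2):
            if frozenset((u, v)) in present:
                continue
            x = [lp[i][u] - lp[i][v] for i in range(n)]  # L^+ (e_u - e_v)
            r = x[u] - x[v]                              # effective resistance r(u, v)
            gain = n * sum(xi * xi for xi in x) / (1.0 + r)
            if best is None or gain > best[0]:
                best = (gain, (u, v), x, r)
        if best is None:
            break  # the graph is already complete
        _, e, x, r = best
        # Sherman-Morrison rank-one update of the pseudoinverse after inserting e.
        lp = [[lp[i][j] - x[i] * x[j] / (1.0 + r) for j in range(n)] for i in range(n)]
        present.add(frozenset(e))
        added.append(e)
    return added, kirchhoff_index(lp)
```

On the path 0-1-2-3 (Kirchhoff index 10), the sketch closes the cycle with edge (0, 3), giving index 5; the quadratic memory of `lp` is exactly the bottleneck the abstract mentions.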
Submitted 15 September, 2023;
originally announced September 2023.
-
Network Sparsification via Degree- and Subgraph-based Edge Sampling
Authors:
Zhen Su,
Jürgen Kurths,
Henning Meyerhenke
Abstract:
Network (or graph) sparsification compresses a graph by removing inessential edges. By reducing the data volume, it accelerates or even facilitates many downstream analyses. Still, the accuracy of many sparsification methods, with filtering-based edge sampling being the most typical one, heavily relies on an appropriate definition of edge importance. Instead, we propose a different perspective with a generalized local-property-based sampling method, which preserves (scaled) local \emph{node} characteristics. Apart from degrees, the local node characteristics we use are the expected (scaled) numbers of wedges and triangles a node belongs to. By preserving these characteristics, the main complex structural properties are preserved implicitly. We adapt a game-theoretic framework from uncertain graph sampling by including a threshold that speeds up convergence to approximate solutions (empirically at least $4$ times faster). Extensive experimental studies on functional climate networks show the effectiveness of this method in preserving structural properties at the macroscopic, mesoscopic, and microscopic levels.
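The preserved quantities themselves are easy to compute. The following hypothetical sketch returns each node's degree, the number of wedges centered at it (an assumed convention, since a node can also be a wedge endpoint) and the number of triangles containing it, plus a hypothetical deviation score that compares a sparsified graph with a scaled target. For intuition: keeping each edge independently with probability $p$ scales the expected degree by $p$, the expected centered wedges by $p^2$ and the expected triangles by $p^3$. The game-theoretic sampler itself is not reproduced.

```python
from itertools import combinations

# Hypothetical helper names; simple undirected graphs on vertices 0..n-1.

def local_characteristics(n, edges):
    """Per-node degree, wedges centered at the node, and triangles containing the node."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = [len(nbrs) for nbrs in adj]
    wedges = [d * (d - 1) // 2 for d in degree]  # unordered pairs of neighbors
    triangles = [sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
                 for u in range(n)]              # adjacent neighbor pairs
    return degree, wedges, triangles

def max_relative_deviation(original, sparsified, scale):
    """Largest |s - scale * o| / (scale * o) over nodes with o > 0."""
    return max((abs(s - scale * o) / (scale * o)
                for o, s in zip(original, sparsified) if o > 0), default=0.0)
```

A sampler aiming at the abstract's goal would try to keep such deviations small for all three characteristics simultaneously.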
Submitted 10 January, 2023; v1 submitted 8 January, 2023;
originally announced January 2023.
-
Algorithms for Large-scale Network Analysis and the NetworKit Toolkit
Authors:
Eugenio Angriman,
Alexander van der Grinten,
Michael Hamann,
Henning Meyerhenke,
Manuel Penschuck
Abstract:
The abundance of massive network data in a plethora of applications makes scalable analysis algorithms and software tools necessary to generate knowledge from such data in reasonable time. Addressing scalability as well as other requirements such as good usability and a rich feature set, the open-source software NetworKit has established itself as a popular tool for large-scale network analysis. This chapter provides a brief overview of the contributions to NetworKit made by the DFG Priority Programme SPP 1736 Algorithms for Big Data. Algorithmic contributions in the areas of centrality computations, community detection, and sparsification are the focus, but we also mention several other aspects -- such as current software engineering principles of the project and ways to visualize network data within a NetworKit-based workflow.
Submitted 20 September, 2022;
originally announced September 2022.
-
More Recent Advances in (Hyper)Graph Partitioning
Authors:
Ümit V. Çatalyürek,
Karen D. Devine,
Marcelo Fonseca Faraj,
Lars Gottesbüren,
Tobias Heuer,
Henning Meyerhenke,
Peter Sanders,
Sebastian Schlag,
Christian Schulz,
Daniel Seemaier,
Dorothea Wagner
Abstract:
In recent years, significant advances have been made in the design and evaluation of balanced (hyper)graph partitioning algorithms. We survey trends of the last decade in practical algorithms for balanced (hyper)graph partitioning together with future research directions. Our work serves as an update to a previous survey on the topic. In particular, it extends the previous survey by also covering hypergraph partitioning and streaming algorithms, and it has an additional focus on parallel algorithms.
Submitted 30 June, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Interactive Visualization of Protein RINs using NetworKit in the Cloud
Authors:
Eugenio Angriman,
Fabian Brandt-Tumescheit,
Leon Franke,
Alexander van der Grinten,
Henning Meyerhenke
Abstract:
Network analysis has been applied in diverse application domains. In this paper, we consider an example from protein dynamics, specifically residue interaction networks (RINs). In this context, we use NetworKit -- an established package for network analysis -- to build a cloud-based environment that enables domain scientists to run their visualization and analysis workflows on large compute servers, without requiring extensive programming and/or system administration knowledge. To demonstrate the versatility of this approach, we use it to build a custom Jupyter-based widget for RIN visualization. In contrast to existing RIN visualization approaches, our widget can easily be customized through simple modifications of Python code, while supporting a good feature set and providing near real-time speed. It is also easily integrated into analysis pipelines (e.g., pipelines that use Python to feed RIN data into downstream machine learning tasks).
Submitted 2 March, 2022;
originally announced March 2022.
-
Fast Dynamic Updates and Dynamic SpGEMM on MPI-Distributed Graphs
Authors:
Alexander van der Grinten,
Geert Custers,
Duy Le Thanh,
Henning Meyerhenke
Abstract:
Sparse matrix-matrix multiplication (SpGEMM) is a fundamental kernel used in many diverse application areas, both numerical and discrete. For example, many algebraic graph algorithms rely on SpGEMM in the tropical semiring to compute shortest paths in graphs. Recently, SpGEMM has received growing attention regarding implementations for specific (parallel) architectures. Yet, this concerns only the static problem, where neither input matrix changes. In many applications, however, matrices (or their corresponding graphs) change over time. Although recomputing from scratch is very expensive, we are not aware of any dynamic SpGEMM algorithms in the literature. In this paper, we thus propose a batch-dynamic algorithm for MPI-based parallel computing. Building on top of a distributed graph/matrix data structure that allows for fast updates, our dynamic SpGEMM reduces the communication volume significantly. It does so by exploiting the fact that updates change far fewer matrix entries than there are non-zeros in the input operands. Our experiments with popular benchmark graphs show that our approach pays off. For batches of insertions or removals of matrix entries, our dynamic SpGEMM is substantially faster than the static algorithms in the state-of-the-art competitors CombBLAS, CTF and PETSc.
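The algebraic identity that makes small update batches cheap can be shown in a few lines. This sketch assumes the ordinary $(+,\times)$ semiring, dictionary-of-keys matrices and hypothetical function names: with $A' = A + \Delta A$ and $B' = B + \Delta B$, the new product is $C' = C + \Delta A\,B + A'\,\Delta B$, so only products involving the update batches are formed, and removals are encoded as negative deltas. Because removals rely on subtraction here, they do not carry over directly to the tropical semiring, which has no additive inverse; the MPI distribution is omitted as well.

```python
from collections import defaultdict

# Hypothetical helper names; matrices are dicts {(row, col): value} over (+, *).

def spgemm(a, b):
    """Sparse product; work happens only for inner indices k present in both operands."""
    a_cols, b_rows = defaultdict(list), defaultdict(list)
    for (i, k), v in a.items():
        a_cols[k].append((i, v))
    for (k, j), v in b.items():
        b_rows[k].append((j, v))
    c = defaultdict(int)
    for k in a_cols.keys() & b_rows.keys():
        for i, av in a_cols[k]:
            for j, bv in b_rows[k]:
                c[(i, j)] += av * bv
    return {ij: v for ij, v in c.items() if v != 0}

def madd(x, y):
    """Entry-wise sum that drops explicit zeros (so negative deltas remove entries)."""
    out = dict(x)
    for ij, v in y.items():
        out[ij] = out.get(ij, 0) + v
    return {ij: v for ij, v in out.items() if v != 0}

def dynamic_update(a, b, c, da, db):
    """Given C = A B and update batches dA, dB, return A', B' and C' = C + dA B + A' dB."""
    a_new, b_new = madd(a, da), madd(b, db)
    c_new = madd(madd(c, spgemm(da, b)), spgemm(a_new, db))
    return a_new, b_new, c_new
```

For simplicity, `spgemm` rebuilds its column index of `A'` on every call; a practical dynamic data structure would maintain such indices incrementally so that the cost tracks the size of the update batch.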
Submitted 31 May, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters
Authors:
Jonathan Bader,
Lauritz Thamsen,
Svetlana Kulagina,
Jonathan Will,
Henning Meyerhenke,
Odej Kao
Abstract:
Scientific workflow management systems like Nextflow support large-scale data analysis by abstracting away the details of scientific workflows. In these systems, workflows consist of several abstract tasks whose instances run in parallel and transform input partitions into output partitions. Resource managers like Kubernetes execute such workflow tasks on cluster infrastructures. However, these resource managers only consider the number of CPUs and the amount of available memory when assigning tasks to resources; they do not consider hardware differences beyond these numbers, even though computational speed and memory access rates can differ significantly.
We propose Tarema, a system for allocating task instances to heterogeneous cluster resources during the execution of scalable scientific workflows. First, Tarema profiles the available infrastructure with a set of benchmark programs and groups cluster nodes with similar performance. Second, Tarema uses online monitoring data of tasks, assigning labels to tasks depending on their resource usage. Third, Tarema uses the node groups and task labels to dynamically assign task instances evenly to resources based on resource demand. Our evaluation of a prototype implementation for Kubernetes, using five real-world Nextflow workflows from the popular nf-core framework and two 15-node clusters consisting of different virtual machines, shows a mean reduction of isolated job runtimes by 19.8% compared to popular schedulers in widely-used resource managers and 4.54% compared to the heuristic SJFN, while providing a better cluster usage. Moreover, executing two long-running workflows in parallel and on restricted resources shows that Tarema is able to reduce the runtimes even more while providing a fair cluster usage.
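A toy version of the three steps is sketched below. Everything concrete here is a hypothetical stand-in rather than Tarema's actual logic: nodes are grouped by cutting the sorted benchmark scores into equal-size bins, a task's label is the number of usage thresholds its observed CPU usage exceeds, and each task instance goes to a free slot in the group matching its label (falling back to the nearest group), preferring the node with the most free slots to spread the load. Tarema itself considers several resources and runs inside Kubernetes; none of that is modeled.

```python
# Hypothetical stand-ins for Tarema's profiling, labeling and assignment steps.

def group_nodes(scores, n_groups):
    """Cut nodes sorted by benchmark score into n_groups equal-size bins (0 = slowest)."""
    ranked = sorted(scores, key=scores.get)
    size = -(-len(ranked) // n_groups)  # ceiling division
    return {node: rank // size for rank, node in enumerate(ranked)}

def label_task(usage, thresholds):
    """Label = number of thresholds the observed usage exceeds (0 = light task)."""
    return sum(usage > t for t in thresholds)

def assign_tasks(task_usage, groups, free_slots, thresholds):
    """Place each task instance, preferring its matching group, then the freest node."""
    placement = {}
    for task, usage in task_usage.items():
        want = label_task(usage, thresholds)
        candidates = [node for node, free in free_slots.items() if free > 0]
        if not candidates:
            raise RuntimeError("no free slots left")
        # Closest group first, then most free slots, then node name for determinism.
        node = min(candidates,
                   key=lambda nd: (abs(groups[nd] - want), -free_slots[nd], nd))
        free_slots[node] -= 1
        placement[task] = node
    return placement
```

With two groups and one threshold, heavy tasks land on the faster half of the nodes and light tasks on the slower half, as long as slots remain.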
Submitted 19 January, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
The climatic interdependence of extreme-rainfall events around the globe
Authors:
Zhen Su,
Henning Meyerhenke,
Jürgen Kurths
Abstract:
The identification of regions of similar climatological behavior can be utilized for the discovery of spatial relationships over long-range scales, including teleconnections. In this regard, the global picture of the interdependence patterns of extreme rainfall events (EREs) still needs to be further explored. To this end, we propose a top-down complex-network-based clustering workflow, with the combination of consensus clustering and mutual correspondences. Consensus clustering provides a reliable community structure for each dataset, while mutual correspondences build a matching relationship between different community structures obtained from different datasets. This approach ensures the robustness of the identified structures when multiple datasets are available. By applying it simultaneously to two satellite-derived precipitation datasets, we identify consistent synchronized structures of EREs around the globe during boreal summer. Two of them show independent spatiotemporal characteristics, uncovering the primary compositions of different monsoon systems. They explicitly manifest the primary intraseasonal variability in the context of the global monsoon, in particular the 'monsoon jump' over both East Asia and West Africa and the mid-summer drought over Central America and southern Mexico. Through a case study related to the Asian summer monsoon (ASM), we verify that the intraseasonal changes of upper-level atmospheric conditions are preserved by significant connections within the global synchronization structure. Our work advances network-based clustering methodology for (i) decoding the spatiotemporal configuration of interdependence patterns of natural variability and for (ii) the intercomparison of these patterns, especially regarding their spatial distributions over different datasets.
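The mutual-correspondence step can be illustrated with a hypothetical sketch that assumes Jaccard similarity as the matching score: community $i$ from one dataset and community $j$ from another correspond when each is the other's best match. The paper's exact criterion may differ, and the consensus-clustering step is not shown.

```python
# Hypothetical helper names; Jaccard similarity is an assumed matching score.

def jaccard(a, b):
    """Jaccard similarity of two sets of grid cells."""
    return len(a & b) / len(a | b)

def mutual_correspondences(parts_x, parts_y):
    """Pairs (i, j) whose communities are each other's best Jaccard match."""
    best_for_x = [max(range(len(parts_y)), key=lambda j: jaccard(x, parts_y[j]))
                  for x in parts_x]
    best_for_y = [max(range(len(parts_x)), key=lambda i: jaccard(parts_x[i], y))
                  for y in parts_y]
    return [(i, j) for i, j in enumerate(best_for_x) if best_for_y[j] == i]
```

Communities without a mutual partner, such as a small fragment whose best match prefers a larger community, are simply left unmatched, which is one way to retain only structures that are consistent across datasets.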
Submitted 2 November, 2021;
originally announced November 2021.
-
An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures
Authors:
Maria Predari,
Charilaos Tzovas,
Christian Schulz,
Henning Meyerhenke
Abstract:
Processing massive application graphs on distributed memory systems requires mapping the graphs onto the system's processing elements (PEs). This task becomes all the more important when PEs have non-uniform communication costs or the input is highly irregular. Typically, mapping is addressed using partitioning, in a two-step approach or an integrated one. Parallel partitioning tools do exist; yet, corresponding mapping algorithms or their public implementations all have major sequential parts or other severe scaling limitations. In this paper, we propose a parallel algorithm that maps graphs onto the PEs of a hierarchical system. Our solution integrates partitioning and mapping; it models the system hierarchy in a concise way as an implicit labeled tree. The vertices of the application graph are labeled as well, and these vertex labels induce the mapping. The mapping optimization follows the basic idea of parallel label propagation, but we tailor the gain computations of label changes to quickly account for the induced communication costs. Our MPI-based code is the first public implementation of a parallel graph mapping algorithm; to this end, we extend the partitioning library ParHIP. To evaluate our algorithm's implementation, we perform comparative experiments with complex networks in the million- and billion-scale range. In general, our mapping tool shows good scalability on up to a few thousand PEs. Compared to other MPI-based competitors, our algorithm achieves the best speed-to-quality trade-off, and its quality results are even better than those of non-parallel mapping tools.
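The two core ideas, hierarchy-encoding labels and gain-driven label moves, can be sketched sequentially. In this hypothetical example, a PE label is a tuple read from the root of the hierarchy to a leaf (e.g., compute node, core), the cost between two PEs is an assumed per-level value taken at the first level where their labels differ, and one sweep moves each vertex to a neighbor's PE when that lowers its communication cost and the PE has spare capacity. This is not ParHIP code and omits the MPI parallelization.

```python
from collections import Counter

# Hypothetical helper names; labels are tuples from the hierarchy root to a leaf PE.

def pe_distance(a, b, level_cost):
    """Cost between two PEs, set by the first hierarchy level where their labels differ."""
    for depth, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return level_cost[depth]
    return 0

def comm_cost(edges, label, level_cost):
    """Total weighted communication cost induced by the vertex labels."""
    return sum(w * pe_distance(label[u], label[v], level_cost) for u, v, w in edges)

def label_propagation_sweep(n, edges, label, capacity, level_cost):
    """One sequential sweep; mutates `label` and returns the number of moved vertices."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    load = Counter(label)

    def local_cost(v, pe):
        return sum(w * pe_distance(pe, label[x], level_cost) for x, w in adj[v])

    moved = 0
    for v in range(n):
        best, best_cost = label[v], local_cost(v, label[v])
        for x, _ in adj[v]:  # candidate PEs are those of the neighbors
            pe = label[x]
            if pe != label[v] and load[pe] < capacity:
                cost = local_cost(v, pe)
                if cost < best_cost:
                    best, best_cost = pe, cost
        if best != label[v]:
            load[label[v]] -= 1
            load[best] += 1
            label[v] = best
            moved += 1
    return moved
```

With `level_cost = [10, 1]`, crossing compute nodes is ten times as expensive as crossing cores within a node, so the sweep pulls tightly connected vertices under a common subtree first.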
Submitted 6 July, 2021;
originally announced July 2021.
-
Fully-dynamic Weighted Matching Approximation in Practice
Authors:
Eugenio Angriman,
Henning Meyerhenke,
Christian Schulz,
Bora Uçar
Abstract:
Finding large or heavy matchings in graphs is a ubiquitous combinatorial optimization problem. In this paper, we engineer the first non-trivial implementations for approximating the dynamic weighted matching problem. Our first algorithm is based on random walks/paths combined with dynamic programming. The second algorithm was introduced by Stubbs and Williams without an implementation. Roughly speaking, their algorithm uses dynamic unweighted matching algorithms as a subroutine (within a multilevel approach); this allows us to use previous work on dynamic unweighted matching algorithms as a black box in order to obtain a fully-dynamic weighted matching algorithm. We empirically study the algorithms on an extensive set of dynamic instances and compare them with optimal weighted matchings. Our experiments show that the random walk algorithm typically fares much better than Stubbs/Williams (regarding the time/quality tradeoff), and its results are often not far from the optimum.
Submitted 27 April, 2021;
originally announced April 2021.
-
New Approximation Algorithms for Forest Closeness Centrality -- for Individual Vertices and Vertex Groups
Authors:
Alexander van der Grinten,
Eugenio Angriman,
Maria Predari,
Henning Meyerhenke
Abstract:
The emergence of massive graph data sets requires fast mining algorithms. Centrality measures, which identify important vertices, are among the most popular analysis methods in graph mining. A measure that is gaining attention is forest closeness centrality; it is closely related to electrical measures using current flow but can also handle disconnected graphs. Recently, [Jin et al., ICDM'19] proposed an algorithm to approximate this measure probabilistically. Their algorithm processes small inputs quickly, but does not scale well beyond hundreds of thousands of vertices.
In this paper, we first propose a different approximation algorithm; it is up to two orders of magnitude faster and more accurate in practice. Our method exploits the strong connection between uniform spanning trees and forest distances by adapting and extending recent approximation algorithms for related single-vertex problems. This results in a nearly-linear time algorithm with an absolute probabilistic error guarantee. In addition, we are the first to consider the problem of finding an optimal group of vertices w.r.t. forest closeness. We prove that this latter problem is NP-hard; to approximate it, we adapt a greedy algorithm by [Li et al., WWW'19], which is based on (partial) matrix inversion. Moreover, our experiments show that on disconnected graphs, group forest closeness outperforms existing centrality measures in the context of semi-supervised vertex classification.
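For small graphs, the exact quantity that the approximation targets can be computed directly, which is handy for validating an approximation. This hypothetical sketch uses the forest matrix $\Omega = (I + L)^{-1}$ and the forest distance $\rho(u,v) = \Omega_{uu} + \Omega_{vv} - 2\Omega_{uv}$; it takes forest farness as $\sum_v \rho(u,v)$ and forest closeness as $n$ divided by farness, an assumed normalization. Since $I + L$ is always invertible, isolated vertices and disconnected components pose no problem. This is the cubic-time exact computation, not the paper's nearly-linear approximation.

```python
from fractions import Fraction

# Hypothetical helper names; exact O(n^3) reference computation for small graphs.

def invert(a):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of Fractions."""
    n = len(a)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def forest_farness(n, edges):
    """Sum of forest distances rho(u, v) = O_uu + O_vv - 2 O_uv, with O = (I + L)^{-1}."""
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for u, v in edges:  # add the Laplacian L to the identity
        m[u][u] += 1
        m[v][v] += 1
        m[u][v] -= 1
        m[v][u] -= 1
    omega = invert(m)
    return [sum(omega[u][u] + omega[v][v] - 2 * omega[u][v] for v in range(n))
            for u in range(n)]

def forest_closeness(n, edges):
    """Assumed normalization: closeness = n / farness."""
    return [n / f for f in forest_farness(n, edges)]
```

For a single edge plus an isolated vertex, every score stays finite, and the isolated vertex gets the largest farness and hence the lowest closeness.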
Submitted 15 January, 2021;
originally announced January 2021.
-
Distributing Sparse Matrix/Graph Applications in Heterogeneous Clusters -- an Experimental Study
Authors:
Charilaos Tzovas,
Maria Predari,
Henning Meyerhenke
Abstract:
Many problems in scientific and engineering applications contain sparse matrices or graphs as main input objects, e.g., numerical simulations on meshes. Large inputs are abundant these days and require parallel processing for memory size and speed. To optimize the execution of such simulations on cluster systems, the input problem needs to be distributed suitably onto the processing units (PUs). More and more frequently, such clusters contain different CPUs or a combination of CPUs and GPUs. This heterogeneity makes the load distribution problem quite challenging. Our study is motivated by the observation that established partitioning tools do not handle such heterogeneous distribution problems as well as homogeneous ones.
In this paper, we first formulate the problem of balanced load distribution for heterogeneous architectures as a multi-objective, single-constraint optimization problem. We then split the problem into two phases and propose a greedy approach to determine optimal block sizes for each PU. These block sizes are then fed into numerous existing graph partitioners, which lets us examine how well they handle the above problem. One of the tools we consider is an extension of our own previous work (von Looz et al., ICPP'18) called Geographer. Our experiments on well-known benchmark meshes indicate that only two of the tools under consideration yield good quality. These two are Parmetis (both the geometric and the combinatorial variant) and Geographer. While Parmetis is faster, Geographer yields better quality on average.
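The block-size phase can be illustrated with a hypothetical greedy sketch under assumed inputs (unit-cost work items, a relative speed and a memory capacity per PU): each chunk of work goes to the PU whose completion time after receiving it would be smallest, among the PUs whose memory still has room. It shows the idea of speed- and memory-aware block sizes; it is not claimed to be the paper's exact procedure.

```python
# Hypothetical sketch: greedy, speed- and memory-aware block sizes.

def block_sizes(n_units, speed, memory, chunk=1):
    """Assign work units chunk by chunk to the PU that would finish earliest."""
    load = [0] * len(speed)
    remaining = n_units
    while remaining > 0:
        step = min(chunk, remaining)
        fits = [p for p in range(len(speed)) if load[p] + step <= memory[p]]
        if not fits:
            raise ValueError("total PU memory is too small for the input")
        # Completion time of PU p after taking this chunk, assuming time = load / speed.
        p = min(fits, key=lambda q: (load[q] + step) / speed[q])
        load[p] += step
        remaining -= step
    return load
```

Without memory limits, the block sizes come out proportional to the speeds (400 units on PUs with speeds 1 and 3 yield blocks of 100 and 300); a tight memory limit on the fast PU shifts the surplus to the slower one.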
Submitted 20 November, 2020; v1 submitted 3 November, 2020;
originally announced November 2020.
-
Group-Harmonic and Group-Closeness Maximization -- Approximation and Engineering
Authors:
Eugenio Angriman,
Ruben Becker,
Gianlorenzo D'Angelo,
Hugo Gilbert,
Alexander van der Grinten,
Henning Meyerhenke
Abstract:
Centrality measures characterize important nodes in networks. Efficiently computing such nodes has received a lot of attention. When considering the generalization of computing central groups of nodes, challenging optimization problems occur. In this work, we study two such problems, group-harmonic maximization and group-closeness maximization, both from a theoretical and from an algorithm engineering perspective.
On the theoretical side, we obtain the following results. For group-harmonic maximization, unless $P=NP$, there is no polynomial-time algorithm that achieves an approximation factor better than $1-1/e$ (directed) and $1-1/(4e)$ (undirected), even for unweighted graphs. On the positive side, we show that a greedy algorithm achieves an approximation factor of $\lambda(1-2/e)$ (directed) and $\lambda(1-1/e)/2$ (undirected), where $\lambda$ is the ratio of minimal and maximal edge weights. For group-closeness maximization, the undirected case is $NP$-hard to approximate within a factor better than $1-1/(e+1)$, and a constant approximation factor is achieved by a local-search algorithm. For the directed case, however, we show that, for any $\varepsilon<1/2$, the problem is $NP$-hard to approximate within a factor of $4|V|^{-\varepsilon}$.
From the algorithm engineering perspective, we provide efficient implementations of the above greedy and local search algorithms. In our experimental study we show that, on small instances where an optimum solution can be computed in reasonable time, the quality of both the greedy and the local search algorithms comes very close to the optimum. On larger instances, our local search algorithms yield results with superior quality compared to existing greedy and local search solutions, at the cost of additional running time. We thus advocate local search for scenarios where solution quality is of highest concern.
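A naive version of the greedy algorithm for group-harmonic maximization is sketched below, assuming unweighted, undirected graphs (so BFS gives distances and $\lambda = 1$) and the definition $H(S) = \sum_{v \notin S} 1/d(S, v)$, with unreachable vertices contributing 0. Each step re-runs BFS for every candidate, which is exactly the cost that engineered implementations avoid; all names are hypothetical.

```python
from collections import deque
from fractions import Fraction

# Hypothetical helper names; naive greedy for unweighted, undirected graphs.

def bfs_distances(adj, sources):
    """Multi-source BFS: hop distance from the nearest source to every reachable vertex."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def group_harmonic(adj, group):
    """H(S) = sum over v outside S of 1/d(S, v); unreachable vertices contribute 0."""
    dist = bfs_distances(adj, group)
    return sum((Fraction(1, d) for v, d in dist.items() if v not in group), Fraction(0))

def greedy_group_harmonic(adj, k):
    """k times, add the vertex maximizing H(S + v), i.e. the largest marginal gain."""
    group = set()
    for _ in range(k):
        best = max((v for v in range(len(adj)) if v not in group),
                   key=lambda v: (group_harmonic(adj, group | {v}), -v))  # ties: smallest id
        group.add(best)
    return group, group_harmonic(adj, group)
```

On a path with five vertices, the first greedy pick is the middle vertex, with $H = 1/2 + 1 + 1 + 1/2 = 3$.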
Submitted 29 October, 2020;
originally announced October 2020.
-
Approximation of the Diagonal of a Laplacian's Pseudoinverse for Complex Network Analysis
Authors:
Eugenio Angriman,
Maria Predari,
Alexander van der Grinten,
Henning Meyerhenke
Abstract:
The ubiquity of massive graph data sets in numerous applications requires fast algorithms for extracting knowledge from these data. We are motivated here by three electrical measures for the analysis of large small-world graphs $G = (V, E)$ -- i.e., graphs with diameter in $O(\log |V|)$, which are abundant in complex network analysis. From a computational point of view, the three measures have in common that their crucial component is the diagonal of the graph Laplacian's pseudoinverse, $L^\dagger$. Computing diag$(L^\dagger)$ exactly by pseudoinversion, however, is as expensive as dense matrix multiplication -- and the standard tools in practice even require cubic time. Moreover, the pseudoinverse requires quadratic space -- hardly feasible for large graphs. Resorting to approximation by, e.g., using the Johnson-Lindenstrauss transform, requires the solution of $O(\log |V| / ε^2)$ Laplacian linear systems to guarantee a relative error, which is still very expensive for large inputs.
In this paper, we present a novel approximation algorithm that requires the solution of only one Laplacian linear system. The remaining parts are purely combinatorial -- mainly sampling uniform spanning trees, which we relate to diag$(L^\dagger)$ via effective resistances. For small-world networks, our algorithm obtains a $\pm ε$-approximation with high probability, in a time that is nearly-linear in $|E|$ and quadratic in $1 / ε$. Another positive aspect of our algorithm is its parallel nature due to independent sampling. We thus provide two parallel implementations of our algorithm: one using OpenMP, one MPI + OpenMP. In our experiments against the state of the art, our algorithm (i) yields more accurate results, (ii) is much faster and more memory-efficient, and (iii) obtains good parallel speedups, in particular in the distributed setting.
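The link between diag$(L^\dagger)$ and effective resistances can be illustrated with the standard identity $r(u,v) = L^\dagger_{uu} + L^\dagger_{vv} - 2L^\dagger_{uv}$: once a single column $L^\dagger e_u$ is known for a pivot $u$ (one Laplacian linear system), every diagonal entry follows as $L^\dagger_{vv} = r(u,v) - L^\dagger_{uu} + 2L^\dagger_{uv}$. The sketch below assumes a connected, unweighted graph and solves $Lx = e_u - \frac{1}{n}\mathbf{1}$ with plain conjugate gradients, whose solution starting from zero is $L^\dagger e_u$. The effective resistances are passed in exactly, whereas the paper estimates them by sampling uniform spanning trees; that sampling step, the pivot choice, and all names here are assumptions of this sketch.

```python
def laplacian_matvec(adj, x):
    """Compute L x for an unweighted graph given as adjacency lists (vertices 0..n-1)."""
    return [len(adj[v]) * x[v] - sum(x[w] for w in adj[v]) for v in range(len(adj))]


def pinv_column(adj, u, iters=200, tol=1e-12):
    """Solve L x = e_u - (1/n) 1 by conjugate gradients; for a connected graph x = L^+ e_u."""
    n = len(adj)
    b = [(1.0 if v == u else 0.0) - 1.0 / n for v in range(n)]
    x = [0.0] * n
    r = b[:]
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        if rs < tol:  # squared residual norm small enough
            break
        Ap = laplacian_matvec(adj, p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    mean = sum(x) / n
    return [xi - mean for xi in x]  # project out the all-ones kernel direction


def diag_from_pivot(column_u, resistances_u, u):
    """Recover L^+_vv = r(u, v) - L^+_uu + 2 L^+_uv for every vertex v."""
    return [r_uv - column_u[u] + 2.0 * c_uv
            for r_uv, c_uv in zip(resistances_u, column_u)]
```

On the path 0-1-2 (a tree, so $r(0, v)$ is simply the hop distance), pivot $u = 0$ yields the diagonal $(5/9, 2/9, 5/9)$.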
Submitted 8 February, 2021; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Combined Centrality Measures for an Improved Characterization of Influence Spread in Social Networks
Authors:
Mehmet Simsek,
Henning Meyerhenke
Abstract:
Influence Maximization (IM) aims at finding the most influential users in a social network, i. e., users who maximize the spread of an opinion within a certain propagation model. Previous work investigated the correlation between influence spread and nodal centrality measures to bypass more expensive IM simulations. The results were promising but incomplete, since these studies investigated the performance (i. e., the ability to identify influential users) of centrality measures only in restricted settings, e. g., in undirected/unweighted networks and/or within a propagation model less common for IM. In this paper, we first show that good results within the Susceptible-Infected-Removed (SIR) propagation model for unweighted and undirected networks do not necessarily transfer to directed or weighted networks under the popular Independent Cascade (IC) propagation model. Then, we identify a set of centrality measures with good performance for weighted and directed networks within the IC model. Our main contribution is a new way to combine the centrality measures in a closed formula to yield even better results. Additionally, we extend gravitational centrality (GC) with the proposed combined centrality measures. Our experiments on 50 real-world data sets show that our proposed centrality measures significantly outperform well-known centrality measures and the state-of-the-art GC measure.
Keywords: social networks, influence maximization, centrality measures, IC propagation model, influential spreaders
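Centrality measures are judged against the influence spread observed in simulations of the propagation model. A minimal Monte Carlo estimator for the expected spread of a seed set under the Independent Cascade model could look as follows; the graph representation, the per-edge activation probabilities, and the trial count are hypothetical inputs, and the paper's experimental setup (for example, how edge weights become probabilities) is not reproduced.

```python
import random


def ic_spread(out_edges, seeds, trials=1000, rng_seed=0):
    """Monte Carlo estimate of the expected spread under the Independent Cascade model.

    out_edges maps a node to a list of (neighbor, activation_probability) pairs.
    """
    rng = random.Random(rng_seed)
    total = 0
    for _ in range(trials):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            new_frontier = []
            for u in frontier:
                for w, p in out_edges.get(u, []):
                    # a newly activated node gets exactly one chance per inactive out-neighbor
                    if w not in active and rng.random() < p:
                        active.add(w)
                        new_frontier.append(w)
            frontier = new_frontier
        total += len(active)
    return total / trials
```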
Submitted 11 March, 2020;
originally announced March 2020.
-
High-Quality Hierarchical Process Mapping
Authors:
Marcelo Fonseca Faraj,
Alexander van der Grinten,
Henning Meyerhenke,
Jesper Larsson Träff,
Christian Schulz
Abstract:
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs on a parallel computer. When the topology of a distributed system is known, an important task is then to map the blocks of the partition onto the processors such that the overall communication cost is reduced. We present novel multilevel algorithms that integrate graph partitioning and process mapping. Important ingredients of our algorithm include fast label propagation, more localized local search, initial partitioning, as well as a compressed data structure to compute processor distances without storing a distance matrix. Experiments indicate that our algorithms speed up the overall mapping process and, due to the integrated multilevel approach, also find much better solutions in practice. For example, one configuration of our algorithm yields better solutions than the previous state of the art in terms of mapping quality while being a factor of 62 faster. Compared to the currently fastest iterated multilevel mapping algorithm, Scotch, we obtain 16% better solutions while investing slightly more running time.
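The mapping objective and the idea of computing processor distances without a distance matrix can be sketched under one assumption: a homogeneous hierarchy (for instance, cores per socket, sockets per node, nodes per rack) with one communication cost per level, so that the distance between two PEs is the cost of the highest level on which their indices differ. The cost of a mapping is then the sum, over pairs of communicating blocks, of communication volume times PE distance. The names `hierarchy` and `distances` and this encoding are hypothetical; the paper's compressed data structure may differ.

```python
def pe_distance(a, b, hierarchy, distances):
    """Distance between PEs a and b in a homogeneous hierarchy, without an n x n matrix.

    hierarchy[i]: number of children per node on level i (level 0 = innermost).
    distances[i]: cost of communicating across level i.
    """
    d = 0
    for level, fanout in enumerate(hierarchy):
        if a == b:  # both PEs already lie in the same subtree
            break
        d = distances[level]
        a //= fanout
        b //= fanout
    return d


def mapping_cost(comm_edges, block_to_pe, hierarchy, distances):
    """Sum over communication edges (x, y, volume) of volume * PE distance."""
    return sum(vol * pe_distance(block_to_pe[x], block_to_pe[y], hierarchy, distances)
               for x, y, vol in comm_edges)
```

With `hierarchy = [2, 2]` and `distances = [1, 10]`, PEs 0 and 1 share a level-1 node (distance 1), while PEs 0 and 2 only meet at the top level (distance 10).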
Submitted 22 January, 2020; v1 submitted 20 January, 2020;
originally announced January 2020.
-
Local Search for Group Closeness Maximization on Big Graphs
Authors:
Eugenio Angriman,
Alexander van der Grinten,
Henning Meyerhenke
Abstract:
In network analysis and graph mining, closeness centrality is a popular measure to infer the importance of a vertex. Computing closeness efficiently for individual vertices has received considerable attention. The NP-hard problem of group closeness maximization, in turn, is more challenging: the objective is to find a vertex group that is central as a whole, and state-of-the-art heuristics for it do not yet scale to very big graphs.
In this paper, we present new local search heuristics for group closeness maximization. By using randomized approximation techniques and dynamic data structures, our algorithms are often able to make locally optimal decisions efficiently. The final result is a group with high (but not optimal) closeness centrality.
We compare our algorithms to the current state-of-the-art greedy heuristic on both weighted and unweighted real-world graphs. For graphs with hundreds of millions of edges, our local search algorithms take only around ten minutes, while greedy requires more than ten hours. Overall, our new algorithms are between one and two orders of magnitude faster, depending on the desired group size and solution quality. For example, on weighted graphs and $k = 10$, our algorithms yield solutions of $12.4\%$ higher quality, while also being $793.6\times$ faster. For unweighted graphs and $k = 10$, we achieve solutions within $99.4\%$ of the state-of-the-art quality while being $127.8\times$ faster.
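A swap-based local search for group closeness can be sketched as follows: repeatedly exchange a group member with a non-member as long as the group's farness (the sum of distances from every vertex to its nearest group member) decreases, which for a fixed group size is equivalent to increasing group closeness. This minimal version assumes an unweighted, connected graph and evaluates every candidate swap with an exact BFS; the paper's algorithms instead rely on randomized approximation and dynamic data structures to make each decision cheap. All names are hypothetical.

```python
from collections import deque
from itertools import product


def group_farness(adj, group):
    """Sum of hop distances from each vertex to its nearest group member (connected graph)."""
    dist = {s: 0 for s in group}
    queue = deque(group)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())


def swap_local_search(adj, group):
    """Swap a member with a non-member while this strictly lowers farness."""
    group = set(group)
    best = group_farness(adj, group)
    improved = True
    while improved:
        improved = False
        for out, inn in product(sorted(group), sorted(set(adj) - group)):
            candidate = (group - {out}) | {inn}
            f = group_farness(adj, candidate)
            if f < best:  # take the first improving swap, then rescan
                group, best, improved = candidate, f, True
                break
    return group, best
```

Starting from {0} on the path 0-1-2-3-4, the search moves to {1} and then to {2}, ending with farness 6.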
Submitted 8 November, 2019;
originally announced November 2019.
-
Group Centrality Maximization for Large-scale Graphs
Authors:
Eugenio Angriman,
Alexander van der Grinten,
Aleksandar Bojchevski,
Daniel Zügner,
Stephan Günnemann,
Henning Meyerhenke
Abstract:
The study of vertex centrality measures is a key aspect of network analysis. Naturally, such centrality measures have been generalized to groups of vertices; for popular measures it was shown that the problem of finding the most central group is $\mathcal{NP}$-hard. As a result, approximation algorithms to maximize group centralities were introduced recently. Despite a nearly-linear running time, approximation algorithms for group betweenness and (to a lesser extent) group closeness are rather slow on large networks due to high constant overheads.
We therefore introduce GED-Walk centrality, a new submodular group centrality measure inspired by Katz centrality. In contrast to closeness and betweenness, it considers walks of any length rather than shortest paths, with shorter walks having a higher contribution. We define algorithms that (i) efficiently approximate the GED-Walk score of a given group and (ii) efficiently approximate the (provably $\mathcal{NP}$-hard) problem of finding a group with the highest GED-Walk score.
Experiments on several real-world datasets show that scores obtained by GED-Walk improve performance on common graph mining tasks such as collective classification and graph-level classification. An evaluation of empirical running times demonstrates that maximizing GED-Walk (in approximation) is two orders of magnitude faster compared to group betweenness approximation and for group sizes $\leq 100$ one to two orders faster than group closeness approximation. For graphs with tens of millions of edges, approximate GED-Walk maximization typically needs less than one minute. Furthermore, our experiments suggest that the maximization algorithms scale linearly with the size of the input graph and the size of the group.
Submitted 30 October, 2019;
originally announced October 2019.
-
Scaling Betweenness Approximation to Billions of Edges by MPI-based Adaptive Sampling
Authors:
Alexander van der Grinten,
Henning Meyerhenke
Abstract:
Betweenness centrality is one of the most popular vertex centrality measures in network analysis. Hence, many (sequential and parallel) algorithms to compute or approximate betweenness have been devised. Recent algorithmic advances have made it possible to approximate betweenness very efficiently on shared-memory architectures. Yet, the best shared-memory algorithms can still take hours of running time for large graphs, especially for graphs with a high diameter or when a small relative error is required.
In this work, we present an MPI-based generalization of the state-of-the-art shared-memory algorithm for betweenness approximation. This algorithm is based on adaptive sampling; our parallelization strategy can be applied in the same manner to adaptive sampling algorithms for other problems. In experiments on a 16-node cluster, our MPI-based implementation is 16.1x faster than the state-of-the-art shared-memory implementation when considering only our parallelization focus, the adaptive sampling phase. For the complete algorithm, we obtain an average (geometric mean) speedup factor of 7.4x over the state of the art. For some previously very challenging inputs, this speedup is much higher. As a result, our algorithm is the first to approximate betweenness centrality on graphs with several billion edges in less than ten minutes with high accuracy.
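The sampling principle behind this kind of betweenness approximation can be illustrated sequentially: draw a random vertex pair, pick one of its shortest paths uniformly at random (by walking back along BFS predecessors, weighted by their path counts), and credit the inner vertices of that path. The sketch below uses a fixed sample count; the adaptive stopping condition of the state-of-the-art algorithm, any bidirectional search, and the MPI parallelization are deliberately omitted, and the function names are hypothetical.

```python
import random
from collections import deque


def sample_shortest_path(adj, s, t, rng):
    """Return a uniformly random shortest s-t path (list of vertices), or None if unreachable."""
    dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], sigma[w], preds[w] = dist[u] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # number of shortest s-w paths
                preds[w].append(u)
    if t not in dist:
        return None
    path = [t]
    while path[-1] != s:
        v = path[-1]
        # choose a predecessor with probability proportional to its path count
        path.append(rng.choices(preds[v], weights=[sigma[p] for p in preds[v]])[0])
    return path[::-1]


def approx_betweenness(adj, samples, seed=0):
    """Fixed-size sampling: fraction of sampled shortest paths having v as an inner vertex."""
    rng = random.Random(seed)
    nodes = list(adj)
    score = dict.fromkeys(nodes, 0.0)
    for _ in range(samples):
        s, t = rng.sample(nodes, 2)
        path = sample_shortest_path(adj, s, t, rng)
        if path:
            for v in path[1:-1]:
                score[v] += 1.0 / samples
    return score
```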
Submitted 24 October, 2019;
originally announced October 2019.
-
Guidelines for Experimental Algorithmics in Network Analysis
Authors:
Eugenio Angriman,
Alexander van der Grinten,
Moritz von Looz,
Henning Meyerhenke,
Martin Nöllenburg,
Maria Predari,
Charilaos Tzovas
Abstract:
The field of network science is a highly interdisciplinary area; for the empirical analysis of network data, it draws algorithmic methodologies from several research fields. Hence, research procedures and descriptions of the technical results often differ, sometimes widely. In this paper we focus on methodologies for the experimental part of algorithm engineering for network analysis -- an important ingredient for a research area with empirical focus. More precisely, we unify and adapt existing recommendations from different fields and propose universal guidelines -- including statistical analyses -- for the systematic evaluation of network analysis algorithms. This way, the behavior of newly proposed algorithms can be properly assessed and comparisons to existing solutions become meaningful. Moreover, as the main technical contribution, we provide SimexPal, a highly automated tool to perform and analyze experiments following our guidelines. To illustrate the merits of SimexPal and our guidelines, we apply them in a case study: we design, perform, visualize and evaluate experiments of a recent algorithm for approximating betweenness centrality, an important problem in network analysis. In summary, both our guidelines and SimexPal shall modernize and complement previous efforts in experimental algorithmics; they are not only useful for network analysis, but also in related contexts.
Submitted 25 March, 2019;
originally announced April 2019.
-
Parallel Adaptive Sampling with almost no Synchronization
Authors:
Alexander van der Grinten,
Eugenio Angriman,
Henning Meyerhenke
Abstract:
Approximation via sampling is a widespread technique whenever exact solutions are too expensive. In this paper, we present techniques for an efficient parallelization of adaptive (a. k. a. progressive) sampling algorithms on multi-threaded shared-memory machines. Our basic algorithmic technique requires no synchronization except for atomic load-acquire and store-release operations. It does, however, require $O(n)$ memory per thread, where $n$ is the size of the sampling state. We present variants of the algorithm that either reduce this memory consumption to $O(1)$ or ensure that deterministic results are obtained. Using KADABRA, an algorithm for approximating betweenness centrality (a popular measure in network analysis), as a case study, we demonstrate the empirical performance of our techniques. In particular, on a 32-core machine, our best algorithm is 2.9x faster than what we could achieve using a straightforward OpenMP-based parallelization and 65.3x faster than the existing implementation of KADABRA.
Submitted 22 March, 2019;
originally announced March 2019.
-
Scalable Katz Ranking Computation in Large Static and Dynamic Graphs
Authors:
Alexander van der Grinten,
Elisabetta Bergamini,
Oded Green,
David A. Bader,
Henning Meyerhenke
Abstract:
Network analysis defines a number of centrality measures to identify the most central nodes in a network. Fast computation of those measures is a major challenge in algorithmic network analysis. Aside from closeness and betweenness, Katz centrality is one of the established centrality measures. In this paper, we consider the problem of computing rankings for Katz centrality. In particular, we propose upper and lower bounds on the Katz score of a given node. While previous approaches relied on numerical approximation or heuristics to compute Katz centrality rankings, we construct an algorithm that iteratively improves those upper and lower bounds until a correct Katz ranking is obtained. We extend our algorithm to dynamic graphs while maintaining its correctness guarantees. Experiments demonstrate that our static graph algorithm outperforms both numerical approaches and heuristics with speedups between 1.5x and 3.5x, depending on the desired quality guarantees. Our dynamic graph algorithm improves upon the static algorithm for update batches of fewer than 10,000 edges. We provide efficient parallel CPU and GPU implementations of our algorithms that enable near real-time Katz centrality computation for graphs with hundreds of millions of nodes in fractions of seconds.
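The bound-based idea can be illustrated for undirected graphs, using the Katz score $c(v) = \sum_{\ell \ge 1} \alpha^\ell \omega_\ell(v)$, where $\omega_\ell(v)$ is the number of walks of length $\ell$ starting at $v$. After $r$ iterations, the partial sum is a lower bound; since each additional step multiplies walk counts by at most the maximum degree $d_{\max}$, adding $\alpha^r \omega_r(v) \cdot \frac{\alpha d_{\max}}{1 - \alpha d_{\max}}$ gives an upper bound whenever $\alpha d_{\max} < 1$. This particular bound is a simple choice made for illustration and may be looser than the paper's; the sketch only certifies the top-1 vertex, and all names are hypothetical.

```python
def katz_top1(adj, alpha, max_iters=100):
    """Iterate walk counts until the top Katz vertex is certified by lower/upper bounds.

    Assumes an undirected graph (adjacency lists) and alpha * max_degree < 1.
    Returns (best_vertex, iterations, lower, upper) or None if not separated in time.
    """
    deg_max = max(len(nbrs) for nbrs in adj.values())
    assert alpha * deg_max < 1, "the tail bound requires alpha * deg_max < 1"
    tail = alpha * deg_max / (1 - alpha * deg_max)
    walks = {v: 1 for v in adj}  # walks of length 0 starting at v
    lower = dict.fromkeys(adj, 0.0)
    for r in range(1, max_iters + 1):
        # walks of length r from v = sum over neighbours w of walks of length r-1 from w
        walks = {v: sum(walks[w] for w in adj[v]) for v in adj}
        for v in adj:
            lower[v] += alpha ** r * walks[v]
        # remaining terms are bounded by a geometric series in alpha * deg_max
        upper = {v: lower[v] + alpha ** r * walks[v] * tail for v in adj}
        best = max(adj, key=lower.get)
        if all(lower[best] > upper[v] for v in adj if v != best):
            return best, r, lower, upper
    return None
```

On a star with three leaves and $\alpha = 0.2$, a single iteration suffices: the center's lower bound ($0.6$) already exceeds every leaf's upper bound ($0.5$).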
Submitted 10 July, 2018;
originally announced July 2018.
-
Balanced k-means for Parallel Geometric Partitioning
Authors:
Moritz von Looz,
Charilaos Tzovas,
Henning Meyerhenke
Abstract:
Mesh partitioning is an indispensable tool for efficient parallel numerical simulations. Its goal is to minimize communication between the processes of a simulation while achieving load balance. Established graph-based partitioning tools yield a high solution quality; however, their scalability is limited. Geometric approaches usually scale better, but their solution quality may be unsatisfactory for 'non-trivial' mesh topologies.
In this paper, we present a scalable version of $k$-means that is adapted to yield balanced clusters. Balanced $k$-means constitutes the core of our new partitioning algorithm Geographer. Bootstrapping of initial centers is performed with space-filling curves, leading to fast convergence of the subsequent balanced $k$-means algorithm.
Our experiments with up to 16384 MPI processes on numerous benchmark meshes show the following: (i) Geographer produces partitions with a lower communication volume than state-of-the-art geometric partitioners from the Zoltan package; (ii) Geographer scales well on large inputs; (iii) a Delaunay mesh with a few billion vertices and edges can be partitioned in a few seconds.
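Bootstrapping centers along a space-filling curve can be sketched as follows: sort the points by their curve index, cut the sorted sequence into $k$ contiguous chunks of (almost) equal size, and use each chunk's mean as an initial center. Because the chunks are balanced by construction, the subsequent balanced $k$-means starts close to a balanced state. For simplicity, this sketch uses the Z-order (Morton) curve on non-negative integer coordinates in 2D; the curve Geographer actually uses and its handling of real-valued coordinates may differ, and the names are hypothetical.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of non-negative integer coordinates (x on even bit positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def sfc_initial_centers(points, k):
    """Sort points along the Z-order curve, cut into k contiguous chunks, return chunk means.

    Assumes 1 <= k <= len(points) so that no chunk is empty.
    """
    order = sorted(points, key=lambda p: morton_key(*p))
    n = len(order)
    centers = []
    for i in range(k):
        chunk = order[i * n // k:(i + 1) * n // k]
        centers.append((sum(p[0] for p in chunk) / len(chunk),
                        sum(p[1] for p in chunk) / len(chunk)))
    return centers
```

On a 4 × 4 grid with $k = 4$, the chunks coincide with the four 2 × 2 quadrants, giving centers (0.5, 0.5), (2.5, 0.5), (0.5, 2.5), and (2.5, 2.5).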
Submitted 3 May, 2018;
originally announced May 2018.
-
Topology-induced Enhancement of Mappings
Authors:
Roland Glantz,
Maria Predari,
Henning Meyerhenke
Abstract:
In this paper we propose a new method to enhance a mapping $μ(\cdot)$ of a parallel application's computational tasks to the processing elements (PEs) of a parallel computer.
The idea behind our method \mswap is to enhance such a mapping by drawing on the observation that many topologies take the form of a partial cube.
This class of graphs includes all rectangular and cubic meshes, any such torus with even extensions in each dimension, all hypercubes, and all trees.
Following previous work, we represent the parallel application and the parallel computer by graphs $G_a = (V_a, E_a)$ and $G_p = (V_p, E_p)$.
$G_p$ being a partial cube allows us to label its vertices, the PEs, by bitvectors such that the cost of exchanging one unit of information between two vertices $u_p$ and $v_p$ of $G_p$ amounts to the Hamming distance between the labels of $u_p$ and $v_p$.
By transferring these bitvectors from $V_p$ to $V_a$ via $μ^{-1}(\cdot)$ and extending them to be unique on $V_a$, we can enhance $μ(\cdot)$ by swapping labels of $V_a$ in a new way.
Pairs of swapped labels are local with respect to the PEs, but not with respect to $G_a$. Moreover, permutations of the bitvectors' entries give rise to a plethora of hierarchies on the PEs. Through these hierarchies we turn \mswap into a hierarchical method for improving $μ(\cdot)$ that is complementary to state-of-the-art methods for computing $μ(\cdot)$ in the first place.
In our experiments we use \mswap to enhance mappings of complex networks onto rectangular meshes and tori with 256 and 512 nodes, as well as hypercubes with 256 nodes. It turns out that common quality measures of mappings derived from state-of-the-art algorithms can be improved considerably.
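For rectangular meshes, a partial-cube labeling can be written down directly: encode each coordinate $c$ of a dimension with extent $m$ as a thermometer code of length $m - 1$ ($c$ ones followed by zeros) and concatenate the codes. The Hamming distance between two labels then equals the Manhattan (hop) distance between the PEs, which is what allows communication cost to be evaluated on bitvectors. The sketch below shows only this textbook construction and the resulting mapping cost; the labeling for tori, the extension of labels to $V_a$, and the swapping method itself are not shown, and the names are hypothetical.

```python
def mesh_label(coords, dims):
    """Partial-cube bitvector of a PE in a rectangular mesh.

    Each coordinate c of a dimension with extent m becomes a thermometer code of
    length m - 1 (c ones followed by zeros); the codes are concatenated.
    """
    bits = []
    for c, m in zip(coords, dims):
        bits.extend([1] * c + [0] * (m - 1 - c))
    return tuple(bits)


def hamming(a, b):
    """Number of positions in which two equal-length bitvectors differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_mapping_cost(app_edges, mu, dims):
    """Sum over application edges (u, v, w) of w * Hamming distance of the hosting PEs."""
    return sum(w * hamming(mesh_label(mu[u], dims), mesh_label(mu[v], dims))
               for u, v, w in app_edges)
```

In a 3 × 2 mesh, for example, PEs (0, 0) and (2, 1) receive the labels 000 and 111, whose Hamming distance 3 equals their hop distance.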
Submitted 19 April, 2018;
originally announced April 2018.
-
Updating Dynamic Random Hyperbolic Graphs in Sublinear Time
Authors:
Moritz von Looz,
Henning Meyerhenke
Abstract:
Generative network models play an important role in algorithm development, scaling studies, network analysis, and realistic system benchmarks for graph data sets. A complex network model gaining considerable popularity builds random hyperbolic graphs, generated by distributing points within a disk in the hyperbolic plane and then adding edges between points with a probability depending on their hyperbolic distance.
We present a dynamic extension to model gradual network change, while preserving at each step the point position probabilities. To process the dynamic changes efficiently, we formalize the concept of a probabilistic neighborhood: Let $P$ be a set of $n$ points in Euclidean or hyperbolic space, $q$ a query point, $\operatorname{dist}$ a distance metric, and $f : \mathbb{R}^+ \rightarrow [0,1]$ a monotonically decreasing function. Then, the probabilistic neighborhood $N(q, f)$ of $q$ with respect to $f$ is a random subset of $P$ and each point $p \in P$ belongs to $N(q,f)$ with probability $f(\operatorname{dist}(p,q))$. We present a fast, sublinear-time query algorithm to sample probabilistic neighborhoods from planar point sets. For certain distributions of planar $P$, we prove that our algorithm answers a query in $O((|N(q,f)| + \sqrt{n})\log n)$ time with high probability. This enables us to process a node movement in random hyperbolic graphs in sublinear time, resulting in a speedup of about one order of magnitude in practice compared to the fastest previous approach. Apart from that, our query algorithm is also applicable to Euclidean geometry, making it of independent interest for other sampling or probabilistic spreading scenarios.
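As a correctness reference for the definition of $N(q, f)$, the naive linear-time sampler below tests every point independently, which is exactly the $\Theta(n)$ baseline that the sublinear algorithm avoids. For hyperbolic space it uses native polar coordinates $(r, \theta)$ and the hyperbolic law of cosines, $\cosh d = \cosh r_1 \cosh r_2 - \sinh r_1 \sinh r_2 \cos(\theta_1 - \theta_2)$. The point representation and function names are assumptions of this sketch.

```python
import math
import random


def hyperbolic_distance(p, q):
    """Distance between points (r, theta) given in native polar coordinates of the hyperbolic plane."""
    (r1, t1), (r2, t2) = p, q
    arg = math.cosh(r1) * math.cosh(r2) - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2)
    return math.acosh(max(1.0, arg))  # guard against rounding just below 1


def probabilistic_neighborhood(points, q, f, dist, rng):
    """Linear-time reference: include each point p independently with probability f(dist(p, q))."""
    return [p for p in points if rng.random() < f(dist(p, q))]
```

For instance, `probabilistic_neighborhood(points, q, lambda d: math.exp(-d), hyperbolic_distance, random.Random(0))` draws one neighborhood with an exponentially decaying $f$.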
Submitted 8 February, 2018;
originally announced February 2018.
-
Scaling up Group Closeness Maximization
Authors:
Elisabetta Bergamini,
Tanya Gonser,
Henning Meyerhenke
Abstract:
Closeness is a widely used centrality measure in social network analysis. For a node, it indicates the reciprocal of the average shortest-path distance to the other nodes of the network. While the identification of the k nodes with highest closeness has received significant attention, many applications are actually interested in finding a group of nodes that is central as a whole. For this problem, a greedy algorithm has been proposed only recently [Chen et al., ADC 2016]. The approximation factor of (1 - 1/e) claimed by Chen et al. for this algorithm does not hold, though, as we show in this version of our paper. Since their implementation of the greedy algorithm was still too slow for large networks, Chen et al. also proposed a heuristic without approximation guarantee.
In the present paper we develop new techniques to speed up the greedy algorithm. Compared to the previous implementation, our approach is orders of magnitude faster and, compared to the heuristic proposed by Chen et al., we always find a solution with better quality in a comparable running time in our experiments. Our method Greedy++ allows us to estimate the group with maximum closeness on networks with up to hundreds of millions of edges in minutes or at most a few hours. The greedy approach by [Chen et al., ADC 2016] would take several days already on networks with hundreds of thousands of edges. Our experiments show that the solution found by Greedy++ is actually very close to the optimum (...)
Note: This paper version fixes the issue of relying on the presumed (but incorrect) submodularity of group closeness. While this has implications on the theoretical assessment of the greedy algorithm, our algorithm variant and its implementation remain unaffected. The reason is that Greedy++ relies (among others) on the supermodularity of farness, which does hold.
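Supermodularity of farness means that the decrease in farness obtained by adding a vertex $v$ can only shrink as the group grows. A standard way to exploit such a property in a greedy algorithm is lazy evaluation: stale marginal gains kept in a priority queue remain valid upper bounds, so many candidates never need re-evaluation. The sketch below shows this lazy-greedy pattern on an unweighted, connected graph with exact BFS; it illustrates the principle and is not Greedy++ itself, so whether Greedy++ uses exactly this mechanism is an assumption. Names are hypothetical.

```python
import heapq
from collections import deque


def farness(adj, group):
    """Sum of BFS distances to the nearest group member (unweighted, connected graph)."""
    dist = {s: 0 for s in group}
    queue = deque(group)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())


def lazy_greedy_farness(adj, k):
    """Greedily minimize farness, re-evaluating marginal gains only when needed."""
    nodes = list(adj)
    first = min(nodes, key=lambda v: farness(adj, [v]))
    group = [first]
    current = farness(adj, group)
    # stale gains; by supermodularity they can only shrink as the group grows
    heap = [(-(current - farness(adj, group + [v])), v) for v in nodes if v != first]
    heapq.heapify(heap)
    while len(group) < k and heap:
        _, v = heapq.heappop(heap)
        gain = current - farness(adj, group + [v])
        if not heap or gain >= -heap[0][0]:  # fresh gain beats every stale upper bound
            group.append(v)
            current -= gain
        else:
            heapq.heappush(heap, (-gain, v))
    return group, current
```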
Submitted 15 May, 2019; v1 submitted 3 October, 2017;
originally announced October 2017.
-
Computing Top-k Closeness Centrality in Fully-dynamic Graphs
Authors:
Patrick Bisenius,
Elisabetta Bergamini,
Eugenio Angriman,
Henning Meyerhenke
Abstract:
Closeness is a widely-studied centrality measure. Since it requires all pairwise distances, computing closeness for all nodes is infeasible for large real-world networks. However, for many applications, it is only necessary to find the k most central nodes and not all closeness values. Prior work has shown that computing the top-k nodes with highest closeness can be done much faster than computing closeness for all nodes in real-world networks. However, for networks that evolve over time, no dynamic top-k closeness algorithm exists that improves on static recomputation. In this paper, we present several techniques that allow us to efficiently compute the k nodes with highest (harmonic) closeness after an edge insertion or an edge deletion. Our algorithms use information obtained during earlier computations to omit unnecessary work. However, they do not require asymptotically more memory than the static algorithms (i. e., linear in the number of nodes). We propose separate algorithms for complex networks (which exhibit the small-world property) and networks with large diameter such as street networks, and we compare them against static recomputation on a variety of real-world networks. On many instances, our dynamic algorithms are two orders of magnitude faster than recomputation; on some large graphs, we even reach average speedups between $10^3$ and $10^4$.
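One simple pruning rule for edge insertions in unweighted, undirected graphs illustrates how earlier information can save work: after inserting $\{u, v\}$, a shortest path from $w$ can only become shorter if it uses the new edge, which requires $|d(w,u) - d(w,v)| \ge 2$ (distances measured before the insertion). Only vertices passing this test can change their (harmonic) closeness. The sketch computes this affected set with two BFS runs; it is a standard observation shown for illustration, not the paper's full algorithm, and the names are hypothetical.

```python
from collections import deque


def bfs(adj, s):
    """Hop distances from s to all reachable vertices."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def affected_by_insertion(adj, u, v):
    """Vertices some of whose distances can shrink when the edge {u, v} is added to adj."""
    du, dv = bfs(adj, u), bfs(adj, v)
    inf = float("inf")
    # vertices reachable from neither endpoint give nan here, which correctly fails the test
    return {w for w in adj if abs(du.get(w, inf) - dv.get(w, inf)) >= 2}
```

For the path 0-1-2-3 and the new edge {0, 3}, only the endpoints 0 and 3 are affected; vertex 1, for instance, still reaches 3 in two hops either way.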
Submitted 3 October, 2017;
originally announced October 2017.
-
Maxent-Stress Optimization of 3D Biomolecular Models
Authors:
Michael Wegner,
Oskar Taubert,
Alexander Schug,
Henning Meyerhenke
Abstract:
Knowing a biomolecule's structure is inherently linked to and a prerequisite for any detailed understanding of its function. Significant effort has gone into developing technologies for structural characterization. These technologies do not directly provide 3D structures; instead they typically yield noisy and erroneous distance information between specific entities such as atoms or residues, which have to be translated into consistent 3D models.
Here we present an approach for this translation process based on maxent-stress optimization. Our new approach extends the original graph drawing method for the new application's specifics by introducing additional constraints and confidence values as well as algorithmic components. Extensive experiments demonstrate that our approach infers structural models (i. e., sensible 3D coordinates for the molecule's atoms) that correspond well to the distance information, can handle noisy and error-prone data, and is considerably faster than established tools. Our results promise to allow domain scientists nearly-interactive structural modeling based on distance constraints.
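For reference, the maxent-stress objective from graph drawing can be evaluated as sketched below: pairs with a known target distance $d_{ij}$ contribute $w_{ij}(\lVert x_i - x_j\rVert - d_{ij})^2$, and all remaining pairs contribute an entropy term $-\alpha \ln \lVert x_i - x_j\rVert$ that spreads unconstrained atoms apart. The weight $w_{ij} = d_{ij}^{-2}$ and the logarithmic entropy term are common choices assumed here; the paper's additional constraints, confidence values, and optimizer are not modeled, and the names are hypothetical.

```python
import math


def maxent_stress(coords, dists, alpha, weight=lambda d: d ** -2):
    """Maxent-stress energy of a 3D layout (lower is better).

    coords: {atom: (x, y, z)}, all positions assumed distinct (log(0) is undefined).
    dists: {(i, j): target distance} for the pairs with known distance information.
    """
    atoms = sorted(coords)
    energy = 0.0
    for a in range(len(atoms)):
        for b in range(a + 1, len(atoms)):
            i, j = atoms[a], atoms[b]
            actual = math.dist(coords[i], coords[j])
            target = dists.get((i, j), dists.get((j, i)))
            if target is not None:
                # stress term: weighted squared deviation from the target distance
                energy += weight(target) * (actual - target) ** 2
            else:
                # entropy term: rewards spreading pairs without distance information
                energy -= alpha * math.log(actual)
    return energy
```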
Submitted 21 June, 2017;
originally announced June 2017.
-
Shared Memory Parallel Subgraph Enumeration
Authors:
Raphael Kimmig,
Henning Meyerhenke,
Darren Strash
Abstract:
The subgraph enumeration problem asks us to find all subgraphs of a target graph that are isomorphic to a given pattern graph. Determining whether even one such isomorphic subgraph exists is NP-complete, so finding all such subgraphs (if they exist) is a time-consuming task. Subgraph enumeration has applications in many fields, including biochemistry and social networks, and, interestingly, the fastest algorithms for solving the problem on biochemical inputs are sequential. Since they depend on depth-first tree traversal, an efficient parallelization is far from trivial. Nevertheless, since important applications produce increasingly difficult data sets, parallelism seems beneficial.
We thus present a shared-memory parallelization of the state-of-the-art subgraph enumeration algorithms RI and RI-DS (a variant of RI for dense graphs) by Bonnici et al. [BMC Bioinformatics, 2013]. Our strategy uses work stealing, and our implementation demonstrates a significant speedup on real-world biochemical data, despite a highly irregular data access pattern. We also improve RI-DS by pruning the search space more effectively; this further improves the empirical running times compared to the already highly tuned RI-DS.
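To make the depth-first search tree concrete, here is a minimal sequential backtracking enumerator. It is not RI or RI-DS: it uses a naive vertex order and only a degree filter, whereas RI relies on a more careful matching order and pruning. The function name `enumerate_subgraphs`, the adjacency-dict input format, and the choice of non-induced matching are illustrative assumptions.

```python
def enumerate_subgraphs(pattern, target):
    """Yield every injective mapping of pattern vertices to target vertices that
    maps each pattern edge onto a target edge (non-induced subgraph isomorphism).

    Both graphs are undirected adjacency dicts: vertex -> set of neighbours.
    """
    order = list(pattern)  # naive static order; RI chooses this order more carefully
    mapping = {}
    used = set()

    def extend(depth):
        if depth == len(order):
            yield dict(mapping)  # copy, because mapping is mutated during backtracking
            return
        u = order[depth]
        for v in target:
            if v in used or len(target[v]) < len(pattern[u]):
                continue  # already used, or degree too small to host u
            # every already-mapped pattern neighbour of u must map to a neighbour of v
            if all(mapping[w] in target[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                yield from extend(depth + 1)
                del mapping[u]
                used.discard(v)

    yield from extend(0)
```

Each recursive call extends one partial mapping, so the calls form a search tree; roughly speaking, a work-stealing scheme hands unexplored subtrees of such a recursion to idle threads.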
Submitted 25 May, 2017;
originally announced May 2017.
-
Faster Betweenness Centrality Updates in Evolving Networks
Authors:
Elisabetta Bergamini,
Henning Meyerhenke,
Mark Ortmann,
Arie Slobbe
Abstract:
Finding central nodes is a fundamental problem in network analysis. Betweenness centrality is a well-known measure that quantifies the importance of a node based on the fraction of shortest paths going through it. Due to the dynamic nature of many of today's networks, algorithms that quickly update centrality scores have become a necessity. For betweenness, several dynamic algorithms have been proposed over the years, targeting different update types (incremental-only, decremental-only, and fully dynamic). In this paper we introduce a new dynamic algorithm for updating betweenness centrality after an edge insertion or an edge weight decrease. Our method combines two independent contributions: a faster algorithm for updating pairwise distances as well as numbers of shortest paths, and a faster algorithm for updating dependencies. Although the worst-case running time of our algorithm is the same as that of recomputation, our techniques considerably reduce the number of operations performed by existing dynamic betweenness algorithms.
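For reference, the static computation that a dynamic algorithm tries to avoid repeating is Brandes' algorithm. The sketch below assumes unweighted, undirected graphs stored as adjacency dicts (the function name is illustrative) and computes the two quantities the abstract refers to: per-source distances with shortest-path counts `sigma`, and dependencies `delta` accumulated in reverse BFS order. It is a baseline only, not the paper's dynamic update.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {v: set(neighbours)}.
    Returns unnormalised scores; each unordered pair is counted in both directions."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: distances, shortest-path counts sigma, and predecessor lists
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # dependency accumulation, farthest vertices first
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```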
Submitted 27 April, 2017;
originally announced April 2017.
-
Computing top-k Closeness Centrality Faster in Unweighted Graphs
Authors:
Elisabetta Bergamini,
Michele Borassi,
Pierluigi Crescenzi,
Andrea Marino,
Henning Meyerhenke
Abstract:
Given a connected graph $G=(V,E)$ with $n = |V|$, the closeness centrality of a vertex $v$ is defined as $\frac{n-1}{\sum_{w \in V} d(v,w)}$. This measure is widely used in the analysis of real-world complex networks, and the problem of selecting the $k$ most central vertices has been deeply analysed in the last decade. However, this problem is computationally demanding, especially for large networks: in the first part of the paper, we prove that it is not solvable in time $O(|E|^{2-\epsilon})$ on directed graphs, for any constant $\epsilon>0$, under reasonable complexity assumptions. Furthermore, we propose a new algorithm for selecting the $k$ most central nodes in a graph: we experimentally show that this algorithm significantly improves on both the textbook algorithm, which is based on computing the distance between all pairs of vertices, and the state of the art. For example, we are able to compute the top $k$ nodes in a few dozen seconds on real-world networks with millions of nodes and edges. Finally, as a case study, we compute the $10$ most central actors in the IMDB collaboration network, where two actors are linked if they played together in a movie, and in the Wikipedia citation network, which contains a directed edge from a page $p$ to a page $q$ if $p$ contains a link to $q$.
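A minimal sketch of the closeness definition above and of the textbook top-$k$ baseline that the abstract compares against: one BFS per vertex (i.e., all-pairs distances), then a sort. It assumes an unweighted, connected graph stored as an adjacency dict; the function names are illustrative, and the paper's faster algorithm is not reproduced.

```python
from collections import deque


def closeness(adj, v):
    """(n - 1) / sum of BFS distances from v; assumes a connected, unweighted graph."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return (len(adj) - 1) / sum(dist.values())


def top_k_closeness(adj, k):
    """Textbook baseline: one full BFS per vertex, then sort; O(|V| * |E|) time."""
    scores = {v: closeness(adj, v) for v in adj}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```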
Submitted 27 April, 2017; v1 submitted 4 April, 2017;
originally announced April 2017.
-
Improving the betweenness centrality of a node by adding links
Authors:
Elisabetta Bergamini,
Pierluigi Crescenzi,
Gianlorenzo D'Angelo,
Henning Meyerhenke,
Lorenzo Severini,
Yllka Velaj
Abstract:
Betweenness is a well-known centrality measure that ranks the nodes according to their participation in the shortest paths of a network. In several scenarios, having a high betweenness can have a positive impact on the node itself. Hence, in this paper we consider the problem of determining how much a vertex can increase its centrality by creating a limited number of new edges incident to it. In particular, we study the problem of maximizing the betweenness score of a given node (Maximum Betweenness Improvement, MBI) and that of maximizing the ranking of a given node (Maximum Ranking Improvement, MRI). We show that MBI cannot be approximated in polynomial time within a factor $(1-\frac{1}{2e})$ and that MRI does not admit any polynomial-time constant-factor approximation algorithm, both unless $P=NP$. We then propose a simple greedy approximation algorithm for MBI with an almost tight approximation ratio and test its performance on several real-world networks. We experimentally show that our algorithm substantially increases both the betweenness score and the ranking of a given node and that it outperforms several competitive baselines. To speed up the computation of our greedy algorithm, we also propose a new dynamic algorithm for updating the betweenness of one node after an edge insertion, which might be of independent interest. Using the dynamic algorithm, we are now able to compute an approximation of MBI on networks with up to $10^5$ edges, in most cases within seconds or a few minutes.
Submitted 1 August, 2018; v1 submitted 17 February, 2017;
originally announced February 2017.
-
Generating realistic scaled complex networks
Authors:
Christian L. Staudt,
Michael Hamann,
Alexander Gutfraind,
Ilya Safro,
Henning Meyerhenke
Abstract:
Research on generative models is a central project in the emerging field of network science; it studies how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks, and for verification and simulation studies. During the last two decades, a variety of models has been proposed with the ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how ReCoN and some existing models can be fitted to an original network to produce a structurally similar replica; (c) use ReCoN to produce networks much larger than the original exemplar; and finally (d) discuss open problems and promising research directions. In a comparative experimental study, we find that ReCoN is often superior to many other state-of-the-art network generation methods. We argue that ReCoN is a scalable and effective tool for modeling a given network while preserving important properties at both micro- and macroscopic scales, and for scaling the exemplar data by orders of magnitude in size.
Submitted 23 March, 2017; v1 submitted 7 September, 2016;
originally announced September 2016.
-
Estimating Current-Flow Closeness Centrality with a Multigrid Laplacian Solver
Authors:
Elisabetta Bergamini,
Michael Wegner,
Dimitar Lukarski,
Henning Meyerhenke
Abstract:
Matrices associated with graphs, such as the Laplacian, lead to numerous interesting graph problems expressed as linear systems. One field where Laplacian linear systems play a role is network analysis, e.g., for certain centrality measures that indicate whether a node (or an edge) is important in the network. One such centrality measure is current-flow closeness. To allow network analysis workflows to profit from a fast Laplacian solver, we provide an implementation of the LAMG multigrid solver in the NetworKit package, facilitating the computation of current-flow closeness values or related quantities. Our main contribution consists of two algorithms that significantly accelerate the current-flow computation for one node or a reasonably small node subset. One, sampling-based, provides an unbiased estimate of the related electrical farness; the other is based on the Johnson-Lindenstrauss transform. Our inexact algorithms lead to very accurate results in practice. Thanks to them, one can now estimate the current-flow closeness of one node on networks with tens of millions of nodes and edges within seconds or a few minutes. From a network-analytical point of view, our experiments indicate that current-flow closeness can discriminate among different nodes significantly better than traditional shortest-path closeness and is also considerably more resistant to noise; we thus show that two known drawbacks of shortest-path closeness are alleviated by the current-flow variant.
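To show which quantity the fast algorithms estimate, the following sketch computes current-flow closeness exactly, assuming the normalisation $(n-1)/\sum_{w} r(v,w)$ analogous to shortest-path closeness, where $r(v,w)$ is the effective resistance and the sum is the electrical farness. It uses the fact that, after grounding $v$ (deleting its row and column from the Laplacian), $r(v,w)$ is the $w$-th diagonal entry of the inverse of the reduced Laplacian. This is a dense $O(n^3)$ computation for small connected graphs only; it uses neither LAMG nor the sampling or Johnson-Lindenstrauss estimators, and the function name is illustrative.

```python
def current_flow_closeness(adj, v):
    """Exact current-flow closeness (n - 1) / sum_w r(v, w) of vertex v.

    adj: connected, unweighted, undirected graph as {vertex: set(neighbours)}.
    Grounding v: r(v, w) is the w-th diagonal entry of the inverse of the
    Laplacian with v's row and column removed (dense Gauss-Jordan, O(n^3)).
    """
    nodes = [u for u in adj if u != v]
    idx = {u: i for i, u in enumerate(nodes)}
    m = len(nodes)  # m = n - 1
    # Augmented matrix [L_v | I] for Gauss-Jordan inversion.
    a = [[0.0] * (2 * m) for _ in range(m)]
    for u in nodes:
        i = idx[u]
        a[i][i] = float(len(adj[u]))
        for w in adj[u]:
            if w != v:
                a[i][idx[w]] -= 1.0
        a[i][m + i] = 1.0
    for col in range(m):
        # partial pivoting for numerical stability
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for row in range(m):
            if row != col and a[row][col] != 0.0:
                factor = a[row][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    farness = sum(a[i][m + i] for i in range(m))  # sum of r(v, w) over w != v
    return m / farness
```

On a path 0-1-2, for example, the endpoint has resistances 1 and 2 to the other vertices, so its value is 2/3.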
Submitted 6 November, 2020; v1 submitted 11 July, 2016;
originally announced July 2016.
-
Generating massive complex networks with hyperbolic geometry faster in practice
Authors:
Moritz von Looz,
Mustafa Özdayi,
Sören Laue,
Henning Meyerhenke
Abstract:
Generative network models play an important role in algorithm development, scaling studies, network analysis, and realistic system benchmarks for graph data sets. The commonly used graph-based benchmark model R-MAT has some drawbacks concerning realism and the scaling behavior of network properties. A complex network model gaining considerable popularity builds random hyperbolic graphs, generated by distributing points within a disk in the hyperbolic plane and then adding edges between points whose hyperbolic distance is below a threshold.
We present in this paper a fast generation algorithm for such graphs. Our experiments show that our new generator achieves speedup factors of 3-60 over the best previous implementation. One billion edges can now be generated in under one minute on a shared-memory workstation. Furthermore, we present a dynamic extension to model gradual network change, while preserving at each step the point position probabilities.
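The generative model described above can be stated in a few lines. The sketch below is the naive $O(n^2)$ pairwise generator, not the paper's fast algorithm. Assumptions: angles are uniform, radii follow the density $\alpha\sinh(\alpha r)/(\cosh(\alpha R)-1)$ on $[0,R]$ (sampled by inverting its CDF), and the distance threshold equals the disk radius $R$; the function name and parameters are illustrative.

```python
import math
import random


def random_hyperbolic_graph(n, R, alpha=1.0, seed=None):
    """Naive O(n^2) threshold random hyperbolic graph in a disk of radius R.
    Returns (points as (r, theta) pairs, list of edges (i, j) with i < j)."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        u = rng.random()
        # inverse of the CDF F(r) = (cosh(alpha r) - 1) / (cosh(alpha R) - 1)
        r = math.acosh(1.0 + u * (math.cosh(alpha * R) - 1.0)) / alpha
        pts.append((r, theta))
    edges = []
    cosh_R = math.cosh(R)
    for i in range(n):
        ri, ti = pts[i]
        for j in range(i + 1, n):
            rj, tj = pts[j]
            # hyperbolic law of cosines: cosh d = cosh ri cosh rj - sinh ri sinh rj cos(dtheta)
            cosh_d = (math.cosh(ri) * math.cosh(rj)
                      - math.sinh(ri) * math.sinh(rj) * math.cos(ti - tj))
            if cosh_d < cosh_R:  # cosh is increasing on [0, inf), so this means d < R
                edges.append((i, j))
    return pts, edges
```

Comparing $\cosh d$ with $\cosh R$ avoids an `acosh` per pair; the quadratic pair loop is exactly the cost a fast generator has to avoid.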
Submitted 30 June, 2016;
originally announced June 2016.
-
Mathematical Foundations of the GraphBLAS
Authors:
Jeremy Kepner,
Peter Aaltonen,
David Bader,
Aydın Buluc,
Franz Franchetti,
John Gilbert,
Dylan Hutchison,
Manoj Kumar,
Andrew Lumsdaine,
Henning Meyerhenke,
Scott McMillan,
Jose Moreira,
John D. Owens,
Carl Yang,
Marcin Zalewski,
Timothy Mattson
Abstract:
The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze, while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
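The statement that adjacency and incidence matrices are connected by matrix multiplication can be checked on a small directed graph: with out- and in-incidence matrices $E_{\text{out}}$ and $E_{\text{in}}$ (one row per edge, one column per vertex), the adjacency matrix is $A = E_{\text{out}}^{\mathsf{T}} E_{\text{in}}$. This plain-Python sketch uses ordinary $(+,\times)$ arithmetic rather than any GraphBLAS API (GraphBLAS generalises such products to other semirings), and the helper names are hypothetical.

```python
def incidence_matrices(n, edges):
    """Out- and in-incidence matrices: one row per directed edge, one column per vertex."""
    e_out = [[0] * n for _ in edges]
    e_in = [[0] * n for _ in edges]
    for k, (u, v) in enumerate(edges):
        e_out[k][u] = 1  # edge k leaves u
        e_in[k][v] = 1   # edge k enters v
    return e_out, e_in


def matmul_transpose_left(x, y):
    """Compute x^T y for matrices stored as lists of rows (same number of rows)."""
    rows, cols = len(x[0]), len(y[0])
    return [[sum(x[k][i] * y[k][j] for k in range(len(x))) for j in range(cols)]
            for i in range(rows)]


edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
e_out, e_in = incidence_matrices(3, edges)
adjacency = matmul_transpose_left(e_out, e_in)  # A[i][j] = number of edges i -> j
```

With parallel edges, the entries count the edges between each ordered vertex pair rather than being 0/1.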
Submitted 13 July, 2016; v1 submitted 18 June, 2016;
originally announced June 2016.
-
Better partitions of protein graphs for subsystem quantum chemistry
Authors:
Moritz von Looz,
Mario Wolter,
Christoph R. Jacob,
Henning Meyerhenke
Abstract:
Determining the interaction strength between proteins and small molecules is key to analyzing their biological function. Quantum-mechanical calculations such as \emph{Density Functional Theory} (DFT) give accurate and theoretically well-founded results. With common implementations the running time of DFT calculations increases quadratically with molecule size. Thus, numerous subsystem-based approaches have been developed to accelerate quantum-chemical calculations. These approaches partition the protein into different fragments, which are treated separately. Interactions between different fragments are approximated and introduce inaccuracies in the calculated interaction energies.
To minimize these inaccuracies, we represent the amino acids and their interactions as a weighted graph in order to apply graph partitioning. No existing graph partitioning method can be used directly, though, due to the unique constraints of partitioning such protein graphs. We therefore present and evaluate several algorithms, partially building upon established concepts but adapted to handle the new constraints. For the special case of partitioning a protein along the main chain, we also present an efficient dynamic programming algorithm that yields provably optimal results. In the general scenario, our algorithms usually improve significantly on the previous approach and take at most a few seconds.
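The main-chain special case can be illustrated with a standard dynamic program over contiguous fragments. This is a sketch under simplifying assumptions, not the paper's algorithm or its exact constraints: residues are numbered along the chain, each fragment may contain at most `max_size` residues, and the objective is the total weight of interaction edges cut between fragments. `best[end]` is the cheapest cost of splitting the first `end` residues, and each cut edge is charged exactly once, when the later of its two fragments is formed. The function name is hypothetical.

```python
def partition_chain(n, weights, max_size):
    """Split residues 0..n-1 into contiguous fragments of at most max_size residues,
    minimising the total weight of edges between different fragments.

    weights: {(a, b): w} interaction weights. Returns (cost, fragment start indices).
    """
    INF = float("inf")
    # incident[j] lists (i, w) for every edge whose larger endpoint is j
    incident = [[] for _ in range(n)]
    for (a, b), w in weights.items():
        i, j = min(a, b), max(a, b)
        incident[j].append((i, w))
    best = [INF] * (n + 1)  # best[end]: min cut weight for residues 0..end-1
    best[0] = 0
    start = [0] * (n + 1)   # start[end]: first residue of the last fragment
    for end in range(1, n + 1):
        for begin in range(max(0, end - max_size), end):
            # fragment [begin, end) pays for its edges back to residues before begin
            cut = sum(w for j in range(begin, end) for i, w in incident[j] if i < begin)
            if best[begin] + cut < best[end]:
                best[end] = best[begin] + cut
                start[end] = begin
    starts, end = [], n
    while end > 0:  # walk back through the chosen fragment boundaries
        starts.append(start[end])
        end = start[end]
    return best[n], starts[::-1]
```

Because fragments are contiguous, every candidate partition is a sequence of cut points, which is what makes an exact dynamic program possible in this special case.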
Submitted 10 June, 2016;
originally announced June 2016.
-
Many-to-many Correspondences between Partitions: Introducing a Cut-based Approach
Authors:
Roland Glantz,
Henning Meyerhenke
Abstract:
Let $\mathcal{P}$ and $\mathcal{P}'$ be finite partitions of the set $V$. Finding good correspondences between the parts of $\mathcal{P}$ and those of $\mathcal{P}'$ is helpful in classification, pattern recognition, and network analysis. Unlike common similarity measures for partitions that yield only a single value, we provide specifics on how $\mathcal{P}$ and $\mathcal{P}'$ correspond to each other.
To this end, we first define natural collections of best correspondences under three constraints $C_1$, $C_2$, and $C_3$. In the case of $C_1$, the best correspondences form a minimum cut basis of a certain bipartite graph, whereas the other two lead to minimum cut bases of $\mathcal{P}$ with respect to $\mathcal{P}'$. We also introduce a constraint, $C_4$, which tightens $C_3$; both are useful for finding consensus partitions. We then develop branch-and-bound algorithms for finding minimum $P_s$-$P_t$ cuts of $\mathcal{P}$ and thus $\vert \mathcal{P} \vert -1$ best correspondences under $C_2$, $C_3$, and $C_4$, respectively.
In a case study, we use the correspondences to gain insight into a community detection algorithm. The results suggest, among other things, that only very minor losses in the quality of the correspondences occur if the branch-and-bound algorithm is restricted to its greedy core. Thus, even for graphs with more than half a million nodes and hundreds of communities, we can find hundreds of best or almost best correspondences in less than a minute.
Submitted 17 January, 2018; v1 submitted 15 March, 2016;
originally announced March 2016.
-
An Empirical Comparison of Big Graph Frameworks in the Context of Network Analysis
Authors:
Jannis Koch,
Christian L. Staudt,
Maximilian Vogel,
Henning Meyerhenke
Abstract:
Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frameworks (GraphLab, Apache Giraph, Giraph++ and Apache Flink) are used to implement algorithms for the representative problems Connected Components, Community Detection, PageRank and Clustering Coefficients. The implementations are executed on a computer cluster to evaluate the frameworks' suitability in practice and to compare their performance to that of the single-machine, shared-memory parallel network analysis package NetworKit. Of the distributed frameworks, GraphLab and Apache Giraph generally show the best performance. In our experiments, a cluster of eight computers running Apache Giraph enables the analysis of a network with about 2 billion edges, which is too large for a single machine of the same type. However, for networks that fit into the memory of a single machine, the performance of the shared-memory parallel implementation is far better than that of the distributed ones. The study provides experimental evidence for selecting the appropriate framework depending on the task and data volume.
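The Pregel-style vertex-centric model used by Apache Giraph can be illustrated with the Connected Components problem: every vertex repeatedly adopts the smallest label among itself and its neighbours, in synchronous supersteps, until no label changes. This single-machine sketch only simulates the supersteps; it does not use any framework API, and the function name is illustrative.

```python
def connected_components_label_propagation(adj):
    """Simulated synchronous min-label propagation on {vertex: set(neighbours)}.
    Returns (component label per vertex, number of supersteps executed)."""
    label = {v: v for v in adj}  # each vertex starts with its own id as label
    supersteps = 0
    changed = True
    while changed:
        changed = False
        new_label = {}
        for v in adj:
            # in a distributed framework, neighbours' labels would arrive as messages
            m = min([label[v]] + [label[u] for u in adj[v]])
            new_label[v] = m
            if m != label[v]:
                changed = True
        label = new_label
        supersteps += 1
    return label, supersteps
```

The number of supersteps grows with the graph's diameter, which is one reason such distributed implementations can lag behind a shared-memory implementation on graphs that fit into one machine.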
Submitted 3 January, 2016;
originally announced January 2016.
-
Structure-Preserving Sparsification Methods for Social Networks
Authors:
Michael Hamann,
Gerd Lindner,
Henning Meyerhenke,
Christian L. Staudt,
Dorothea Wagner
Abstract:
Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of \textit{edge sparsification} methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally or locally by these scores. We show that applying a local filtering technique improves the preservation of all kinds of properties. In addition, we propose a new sparsification method (\textit{Local Degree}) which preserves edges leading to local hub nodes. All methods are evaluated on a set of social networks from Facebook, Google+, Twitter and LiveJournal with respect to network properties including diameter, connected components, community structure, multiple node centrality measures and the behavior of epidemic simulations. In order to assess the preservation of the community structure, we also include experiments on synthetically generated networks with ground truth communities. Experiments with our implementations of the sparsification methods (included in the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20\% of the original set of edges for sparse graphs with a reasonable density. The experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. While our Local Degree method is best for preserving connectivity and short distances, other newly introduced local variants are best for preserving the community structure.
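A minimal sketch of the Local Degree idea, assuming the rule that every node $v$ keeps the edges to its $\lfloor \deg(v)^e \rfloor$ highest-degree neighbours, for an exponent $0 \le e \le 1$ that controls how much is kept, and that an edge survives if either endpoint keeps it. The paper expresses sparsification as edge scoring followed by filtering; this sketch applies the assumed rule directly. Node ids are assumed to be integers (they break degree ties), and the function name is illustrative.

```python
import math


def local_degree_sparsify(adj, e):
    """Return the kept edges (u, v) with u < v of an undirected graph {v: set(neighbours)}."""
    kept = set()
    for v, nbrs in adj.items():
        if not nbrs:
            continue
        budget = math.floor(len(nbrs) ** e)  # >= 1 for any node with neighbours
        # neighbours sorted by descending degree, ties broken by node id
        ranked = sorted(nbrs, key=lambda u: (-len(adj[u]), u))
        for u in ranked[:budget]:
            kept.add((min(u, v), max(u, v)))
    return kept
```

With `e = 0` each node keeps only its link to its highest-degree neighbour, which tends to keep hubs connected; with `e = 1` every edge survives.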
Submitted 3 January, 2016;
originally announced January 2016.
-
k-way Hypergraph Partitioning via n-Level Recursive Bisection
Authors:
Sebastian Schlag,
Vitali Henne,
Tobias Heuer,
Henning Meyerhenke,
Peter Sanders,
Christian Schulz
Abstract:
We develop a multilevel algorithm for hypergraph partitioning that contracts the vertices one at a time. Using several caching and lazy-evaluation techniques during coarsening and refinement, we reduce the running time by up to two orders of magnitude compared to a naive $n$-level algorithm that would be adequate for ordinary graph partitioning. The overall performance is even better than that of the widely used hMetis hypergraph partitioner, which uses a classical multilevel algorithm with few levels. Aided by a portfolio-based approach to initial partitioning and adaptive budgeting of imbalance within recursive bipartitioning, we achieve very high quality. We assembled a large benchmark set of 310 hypergraphs stemming from application areas such as VLSI, SAT solving, social networks, and scientific computing. We achieve significantly smaller cuts than hMetis and PaToH, while being faster than hMetis. Considerably larger improvements are observed for some instance classes such as social networks, for bipartitioning, and for partitions with an allowed imbalance of 10%. The algorithm presented in this work forms the basis of our hypergraph partitioning framework KaHyPar (Karlsruhe Hypergraph Partitioning).
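The abstract reports smaller cuts without fixing the objective in this summary. Two standard hypergraph objectives are the cut-net metric (total weight of hyperedges that span more than one block) and the connectivity metric $\sum_e w(e)(\lambda(e)-1)$, where $\lambda(e)$ is the number of blocks that hyperedge $e$ touches. The sketch below computes both for a given partition; which of them the paper reports is not stated here, and the function name is illustrative.

```python
def hypergraph_cut_metrics(hyperedges, part, weights=None):
    """Return (cut-net, connectivity) objectives of a partition.

    hyperedges: list of vertex lists (pins); part: vertex -> block id;
    weights: optional hyperedge weights (default 1 each).
    """
    if weights is None:
        weights = [1] * len(hyperedges)
    cut_net = connectivity = 0
    for pins, w in zip(hyperedges, weights):
        lam = len({part[v] for v in pins})  # lambda(e): number of blocks touched
        if lam > 1:
            cut_net += w
        connectivity += w * max(lam - 1, 0)  # max(...) guards empty hyperedges
    return cut_net, connectivity
```

For bipartitions ($k = 2$) the two metrics coincide, since a cut hyperedge always touches exactly two blocks.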
Submitted 10 November, 2015;
originally announced November 2015.
-
Approximating Betweenness Centrality in Fully-dynamic Networks
Authors:
Elisabetta Bergamini,
Henning Meyerhenke
Abstract:
Betweenness is a well-known centrality measure that ranks the nodes of a network according to their participation in shortest paths. Since an exact computation is prohibitive in large networks, several approximation algorithms have been proposed. Besides that, recent years have seen the publication of dynamic algorithms for efficient recomputation of betweenness in networks that change over time. In this paper we propose the first betweenness centrality approximation algorithms with a provable guarantee on the maximum approximation error for dynamic networks. Several new intermediate algorithmic results contribute to the respective approximation algorithms: (i) new upper bounds on the vertex diameter, (ii) the first fully-dynamic algorithm for updating an approximation of the vertex diameter in undirected graphs, and (iii) an algorithm with lower time complexity for updating single-source shortest paths in unweighted graphs after a batch of edge actions. Using approximation, our algorithms are the first to make in-memory computation of betweenness in dynamic networks with millions of edges feasible. Our experiments show that our algorithms can achieve substantial speedups compared to recomputation, up to several orders of magnitude. Moreover, the approximation accuracy is usually significantly better than the theoretical guarantee in terms of absolute error. More importantly, for reasonably small approximation error thresholds, the rank of nodes is well preserved, in particular for nodes with high betweenness.
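The role of the vertex diameter becomes clearer with the static path-sampling estimator of Riondato and Kornaropoulos, whose sample size depends on an upper bound on it. Assuming that this is the kind of approximation the dynamic algorithms maintain (this summary does not say so explicitly), here is a minimal static sketch for unweighted, undirected graphs: each sample picks a random ordered pair $(s,t)$ and a uniformly random shortest $s$-$t$ path, and credits $1/r$ to the path's interior vertices. The constant `c = 0.5`, the requirement `vd_bound >= 3`, and the function name are assumptions.

```python
import math
import random
from collections import deque


def approx_betweenness(adj, eps, delta, vd_bound, c=0.5, seed=None):
    """Sampling estimate of normalised betweenness on {v: set(neighbours)}.
    Uses r = ceil(c / eps^2 * (floor(log2(vd_bound - 2)) + 1 + ln(1 / delta))) samples."""
    rng = random.Random(seed)
    nodes = list(adj)
    r = math.ceil(c / eps ** 2 * (math.floor(math.log2(vd_bound - 2)) + 1 + math.log(1 / delta)))
    score = {v: 0.0 for v in nodes}
    for _ in range(r):
        s, t = rng.sample(nodes, 2)
        # BFS from s counting shortest paths (sigma) and recording predecessors
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        if t not in dist:
            continue  # disconnected pair: no path, no contribution
        # walk back from t, choosing predecessor p with probability sigma[p] / sigma[v]
        v = t
        while True:
            p = rng.choices(preds[v], weights=[sigma[u] for u in preds[v]])[0]
            if p == s:
                break
            score[p] += 1.0 / r  # p is an interior vertex of the sampled path
            v = p
    return score
```

Every sample recomputes a BFS from scratch here; updating these sampled paths after edge changes, instead of resampling, is where a dynamic algorithm would save work.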
Submitted 27 October, 2015;
originally announced October 2015.
-
Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently
Authors:
Moritz von Looz,
Henning Meyerhenke
Abstract:
In this paper we define the notion of a probabilistic neighborhood in spatial data: Let a set $P$ of $n$ points in $\mathbb{R}^d$, a query point $q \in \mathbb{R}^d$, a distance metric $\operatorname{dist}$, and a monotonically decreasing function $f : \mathbb{R}^+ \rightarrow [0,1]$ be given. Then a point $p \in P$ belongs to the probabilistic neighborhood $N(q, f)$ of $q$ with respect to $f$ with probability $f(\operatorname{dist}(p,q))$. We envision applications in facility location, sensor networks, and other scenarios where a connection between two entities becomes less likely with increasing distance. A straightforward query algorithm would determine a probabilistic neighborhood in $\Theta(n\cdot d)$ time by probing each point in $P$.
To answer the query in sublinear time for the planar case, we augment a quadtree suitably and design a corresponding query algorithm. Our theoretical analysis shows that, for certain distributions of planar $P$, our algorithm answers a query in $O((|N(q,f)| + \sqrt{n})\log n)$ time with high probability (whp). This matches, up to a logarithmic factor, the cost induced by quadtree-based algorithms for deterministic queries and is asymptotically faster than the straightforward approach whenever $|N(q,f)| \in o(n / \log n)$.
As practical proofs of concept we use two applications, one in the Euclidean and one in the hyperbolic plane. In particular, our results yield the first generator for random hyperbolic graphs with arbitrary temperatures in subquadratic time. Moreover, our experimental data show the usefulness of our algorithm even if the point distribution is unknown or not uniform: The running time savings over the pairwise probing approach constitute at least one order of magnitude already for a modest number of points and queries.
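The straightforward $\Theta(n \cdot d)$ query from the first paragraph of this abstract is short enough to state directly; the quadtree-based algorithm is not reproduced. This sketch assumes Euclidean distance and an independent coin flip per point, and the function name is illustrative.

```python
import math
import random


def probabilistic_neighborhood(points, q, f, rng=None):
    """Return the points p of `points` that are included, each independently
    with probability f(dist(p, q)), where dist is Euclidean distance."""
    if rng is None:
        rng = random.Random()
    # one O(d) distance evaluation and one coin flip per point: Theta(n * d) overall
    return [p for p in points if rng.random() < f(math.dist(p, q))]
```

For example, `f = lambda d: math.exp(-d)` makes nearby points likely members and distant ones rare; every query still touches all $n$ points, which is the cost the quadtree approach avoids.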
Submitted 16 August, 2016; v1 submitted 7 September, 2015;
originally announced September 2015.
-
Drawing Large Graphs by Multilevel Maxent-Stress Optimization
Authors:
Henning Meyerhenke,
Martin Nöllenburg,
Christian Schulz
Abstract:
Drawing large graphs appropriately is an important step for the visual analysis of data from real-world networks. Here we present a novel multilevel algorithm to compute a graph layout with respect to a recently proposed metric that combines layout stress and entropy. As opposed to previous work, we do not solve the linear systems of the maxent-stress metric with a typical numerical solver. Instead we use a simple local iterative scheme within a multilevel approach. To accelerate local optimization, we approximate long-range forces and use shared-memory parallelism. Our experiments validate the high potential of our approach, which is particularly appealing for dynamic graphs. In comparison to the previously best maxent-stress optimizer, which is sequential, our parallel implementation is on average 30 times faster already for static graphs (and still faster if executed on one thread) while producing a comparable solution quality.
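As a rough guide to what is being optimized, the following sketch evaluates a maxent-stress objective for a given layout. It assumes the form proposed by Gansner, Hu and North: a stress term $\sum_{\{i,j\}\in S} d_{ij}^{-2}(\lVert x_i-x_j\rVert-d_{ij})^2$ over a pair set $S$ with target distances, minus $\alpha\sum_{\{i,j\}\notin S}\ln\lVert x_i-x_j\rVert$ over all other pairs. The weights $d_{ij}^{-2}$ and the function name are assumptions, and the multilevel optimizer itself is not shown.

```python
import math
from itertools import combinations


def maxent_stress(pos, dist, alpha):
    """Maxent-stress objective (to be minimised) of a layout.

    pos:   node -> coordinate tuple
    dist:  {(i, j): d_ij} target distances for the pair set S (e.g. graph edges)
    alpha: weight of the entropy term over all pairs not in S
    """
    in_s = {frozenset(pair) for pair in dist}
    stress = sum((math.dist(pos[i], pos[j]) - d) ** 2 / d ** 2
                 for (i, j), d in dist.items())
    # entropy term spreads out pairs without a target distance (O(n^2) pairs here)
    entropy = sum(math.log(math.dist(pos[i], pos[j]))
                  for i, j in combinations(pos, 2)
                  if frozenset((i, j)) not in in_s)
    return stress - alpha * entropy
```

The entropy sum ranges over almost all pairs, which is why approximating long-range forces, as the abstract describes, matters for large graphs; coincident points outside $S$ make the logarithm undefined.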
Submitted 10 August, 2015; v1 submitted 14 June, 2015;
originally announced June 2015.
-
n-Level Hypergraph Partitioning
Authors:
Vitali Henne,
Henning Meyerhenke,
Peter Sanders,
Sebastian Schlag,
Christian Schulz
Abstract:
We develop a multilevel algorithm for hypergraph partitioning that contracts the vertices one at a time and thus allows very high quality. This includes a rating function that avoids nonuniform vertex weights, an efficient "semi-dynamic" hypergraph data structure, a very fast coarsening algorithm, and two new local search algorithms. One is a $k$-way hypergraph adaptation of Fiduccia-Mattheyses local search and gives high quality at reasonable cost. The other is an adaptation of size-constrained label propagation to hypergraphs. Comparisons with hMetis and PaToH indicate that the new algorithm yields better quality on several benchmark sets and has a running time comparable to that of hMetis. With label propagation local search, the algorithm is several times faster than hMetis and gives better quality than PaToH on a VLSI benchmark set.
Submitted 4 May, 2015;
originally announced May 2015.
-
Structure-Preserving Sparsification of Social Networks
Authors:
Gerd Lindner,
Christian L. Staudt,
Michael Hamann,
Henning Meyerhenke,
Dorothea Wagner
Abstract:
Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of \textit{edge sparsification} methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally by these scores. In addition, we propose a new sparsification method (\textit{Local Degree}) which preserves edges leading to local hub nodes. All methods are evaluated on a set of 100 Facebook social networks with respect to network properties including diameter, connected components, community structure, and multiple node centrality measures. Experiments with our implementations of the sparsification methods (using the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20\% of the original set of edges. Furthermore, the experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. Our Local Degree method is fast enough for large-scale networks and performs well across a wider range of properties than previously proposed methods.
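The idea behind Local Degree (keeping edges that lead to local hubs) can be sketched as a per-node rule. In this minimal sketch, which is an assumption about the details, each node $u$ ranks its neighbours by degree and keeps the edges to its top $\lfloor \deg(u)^\alpha \rfloor$ of them; an edge survives if either endpoint keeps it. The conversion into global edge scores described in the abstract is not reproduced, and `local_degree_sparsify` is a hypothetical name.

```python
import math


def local_degree_sparsify(adj, alpha=0.5):
    """Local Degree-style sparsification (sketch).

    adj: dict node -> list of neighbours (undirected, no self-loops)
    Each node u keeps edges to its floor(deg(u) ** alpha) highest-degree
    neighbours; an edge is kept if at least one endpoint selects it.
    Returns a set of edges as (min, max) tuples.
    """
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    kept = set()
    for u, nbrs in adj.items():
        if not nbrs:
            continue
        budget = math.floor(deg[u] ** alpha)
        # Rank neighbours by degree, descending; ties broken by node id.
        ranked = sorted(nbrs, key=lambda v: (-deg[v], v))
        for v in ranked[:budget]:
            kept.add((min(u, v), max(u, v)))
    return kept
```

For a star with centre 0 and leaves 1 to 4 plus an extra leaf-to-leaf edge (1, 2), `alpha=0.5` keeps all four edges to the hub and drops (1, 2); larger values of `alpha` keep more edges, and `alpha=1.0` keeps everything.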
Submitted 4 May, 2015;
originally announced May 2015.
-
Fully-dynamic Approximation of Betweenness Centrality
Authors:
Elisabetta Bergamini,
Henning Meyerhenke
Abstract:
Betweenness is a well-known centrality measure that ranks the nodes of a network according to their participation in shortest paths. Since an exact computation is prohibitive in large networks, several approximation algorithms have been proposed. Besides that, recent years have seen the publication of dynamic algorithms for efficient recomputation of betweenness in evolving networks. In previous work we proposed the first semi-dynamic algorithms that recompute an approximation of betweenness in connected graphs after batches of edge insertions.
In this paper, we propose the first fully-dynamic approximation algorithms (for weighted and unweighted undirected graphs that need not be connected) with a provable guarantee on the maximum approximation error. The transfer to fully-dynamic and disconnected graphs raises additional algorithmic problems that could be of independent interest. In particular, we propose a new upper bound on the vertex diameter for weighted undirected graphs. For both weighted and unweighted graphs, we also propose the first fully-dynamic algorithms that keep track of such an upper bound. In addition, we extend our former algorithm for semi-dynamic BFS to batches of both edge insertions and deletions.
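The abstract does not state the new weighted bound. For orientation only, here is a minimal sketch of a simple, classical bound for the unweighted case, not the paper's contribution. In each connected component, one BFS from any node $s$ gives $VD \le d_1 + d_2 + 1$, where $d_1 \ge d_2$ are the two largest BFS distances, because $d(a, b) \le d(a, s) + d(s, b)$. Taking the maximum over components handles disconnected graphs. The adjacency-dict input and function name are hypothetical.

```python
from collections import deque


def vertex_diameter_upper_bound(adj):
    """Upper bound on the vertex diameter (number of nodes on a longest
    shortest path) of an unweighted, undirected, possibly disconnected graph.

    adj: dict node -> list of neighbours.
    """
    seen = set()
    bound = 0
    for s in sorted(adj):
        if s in seen:
            continue
        # One BFS from an arbitrary node s of a new component.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        seen.update(dist)
        # Two largest BFS distances bound the component's diameter (in edges);
        # the padding 0 covers single-node components.
        top = sorted(dist.values(), reverse=True) + [0]
        bound = max(bound, top[0] + top[1] + 1)  # +1 converts edges to nodes
    return bound
```

On a path with five nodes plus an isolated node, a BFS from an end node yields the bound 8, while the true vertex diameter is 5; the bound is loose but cheap, which is what makes keeping it up to date under edge updates attractive.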
Using approximation, our algorithms are the first to make in-memory computation of betweenness in fully-dynamic networks with millions of edges feasible. Our experiments show that they can achieve substantial speedups compared to recomputation, up to several orders of magnitude.
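The static estimator that such approximation schemes maintain can be sketched as shortest-path sampling in the style of Riondato and Kornaropoulos (an assumption here; the dynamic update logic is not shown). The sketch draws $r = \lceil \frac{c}{\varepsilon^2}(\lfloor \log_2(VD - 2) \rfloor + 1 + \ln\frac{1}{\delta}) \rceil$ ordered node pairs, picks one shortest path uniformly at random for each reachable pair, and credits $1/r$ to its interior nodes. The constant `c = 0.5`, the function names and the adjacency-dict input are hypothetical choices.

```python
import math
import random
from collections import deque


def rk_sample_size(vertex_diameter, eps, delta, c=0.5):
    """Number of samples from the vertex-diameter-based bound (c = 0.5 assumed)."""
    vd_term = math.floor(math.log2(max(vertex_diameter - 2, 1))) + 1
    return math.ceil((c / eps ** 2) * (vd_term + math.log(1 / delta)))


def approx_betweenness(adj, vertex_diameter, eps=0.1, delta=0.1, seed=0):
    """Estimate normalized betweenness of an unweighted undirected graph
    by sampling one uniformly random shortest path per sampled node pair."""
    rng = random.Random(seed)
    nodes = list(adj)
    r = rk_sample_size(vertex_diameter, eps, delta)
    bc = {v: 0.0 for v in nodes}
    for _ in range(r):
        u, v = rng.sample(nodes, 2)  # ordered pair of distinct nodes
        # BFS from u, counting shortest paths (sigma) and predecessors.
        dist, sigma, preds = {u: 0}, {u: 1}, {u: []}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    sigma[y] = 0
                    preds[y] = []
                    queue.append(y)
                if dist[y] == dist[x] + 1:
                    sigma[y] += sigma[x]
                    preds[y].append(x)
        if v not in dist:
            continue  # unreachable pair (disconnected graph): no contribution
        # Walk back from v; picking predecessor p with probability
        # sigma[p] / sigma[w] yields a uniformly random shortest u-v path.
        w = v
        while w != u:
            p = rng.choices(preds[w], weights=[sigma[q] for q in preds[w]])[0]
            if p != u:
                bc[p] += 1.0 / r  # p is an interior node of the sampled path
            w = p
    return bc
```

On a star with four leaves, the leaves always get exactly 0, and the centre's estimate fluctuates around 0.6, the fraction of ordered node pairs whose unique shortest path passes through it.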
Submitted 3 July, 2015; v1 submitted 27 April, 2015;
originally announced April 2015.