-
Computational music analysis from first principles
Authors:
Dmitri Tymoczko,
Mark Newman
Abstract:
We use coupled hidden Markov models to automatically annotate the 371 Bach chorales in the Riemenschneider edition, a corpus containing approximately 100,000 notes and 20,000 chords. We give three separate analyses that achieve progressively greater accuracy at the cost of making increasingly strong assumptions about musical syntax. Although our method makes almost no use of human input, we are ab…
▽ More
We use coupled hidden Markov models to automatically annotate the 371 Bach chorales in the Riemenschneider edition, a corpus containing approximately 100,000 notes and 20,000 chords. We give three separate analyses that achieve progressively greater accuracy at the cost of making increasingly strong assumptions about musical syntax. Although our method makes almost no use of human input, we are able to identify both chords and keys with an accuracy of 85% or greater when compared to an expert human analysis, resulting in annotations accurate enough to be used for a range of music-theoretical purposes, while also being free of subjective human judgments. Our work bears on longstanding debates about the objective reality of the structures postulated by standard Western harmonic theory, as well as on specific questions about the nature of Western harmonic syntax.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
Mutual information and the encoding of contingency tables
Authors:
Maximilian Jerdee,
Alec Kirkley,
M. E. J. Newman
Abstract:
Mutual information is commonly used as a measure of similarity between competing labelings of a given set of objects, for example to quantify performance in classification and community detection tasks. As argued recently, however, the mutual information as conventionally defined can return biased results because it neglects the information cost of the so-called contingency table, a crucial compon…
▽ More
Mutual information is commonly used as a measure of similarity between competing labelings of a given set of objects, for example to quantify performance in classification and community detection tasks. As argued recently, however, the mutual information as conventionally defined can return biased results because it neglects the information cost of the so-called contingency table, a crucial component of the similarity calculation. In principle the bias can be rectified by subtracting the appropriate information cost, leading to the modified measure known as the reduced mutual information, but in practice one can only ever compute an upper bound on this information cost, and the value of the reduced mutual information depends crucially on how good a bound is established. In this paper we describe an improved method for encoding contingency tables that gives a substantially better bound in typical use cases, and approaches the ideal value in the common case where the labelings are closely similar, as we demonstrate with extensive numerical results.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
Using Deep Learning to Identify Initial Error Sensitivity for Interpretable ENSO Forecasts
Authors:
Kinya Toride,
Matthew Newman,
Andrew Hoell,
Antonietta Capotondi,
Jakob Schlör,
Dillon Amaya
Abstract:
We introduce an interpretable-by-design method, optimized model-analog, that integrates deep learning with model-analog forecasting, a straightforward yet effective approach that generates forecasts from similar initial climate states in a repository of model simulations. This hybrid framework employs a convolutional neural network to estimate state-dependent weights to identify initial analog sta…
▽ More
We introduce an interpretable-by-design method, optimized model-analog, that integrates deep learning with model-analog forecasting, a straightforward yet effective approach that generates forecasts from similar initial climate states in a repository of model simulations. This hybrid framework employs a convolutional neural network to estimate state-dependent weights to identify initial analog states that lead to shadowing target trajectories. The advantage of our method lies in its inherent interpretability, offering insights into initial-error-sensitive regions through estimated weights and the ability to trace the physically-based evolution of the system through analog forecasting. We evaluate our approach using the Community Earth System Model Version 2 Large Ensemble to forecast the El Niño-Southern Oscillation (ENSO) on a seasonal-to-annual time scale. Results show a 10% improvement in forecasting equatorial Pacific sea surface temperature anomalies at 9-12 months leads compared to the original (unweighted) model-analog technique. Furthermore, our model demonstrates improvements in boreal winter and spring initialization when evaluated against a reanalysis dataset. Our approach reveals state-dependent regional sensitivity linked to various seasonally varying physical processes, including the Pacific Meridional Modes, equatorial recharge oscillator, and stochastic wind forcing. Additionally, disparities emerge in the sensitivity associated with El Niño versus La Niña events. El Niño forecasts are more sensitive to initial uncertainty in tropical Pacific sea surface temperatures, while La Niña forecasts are more sensitive to initial uncertainty in tropical Pacific zonal wind stress. This approach has broad implications for forecasting diverse climate phenomena, including regional temperature and precipitation, which are challenging for the original model-analog approach.
△ Less
Submitted 14 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
$q$-ary Sequential Locally Recoverable Codes from the Product Construction
Authors:
Akram Baghban,
Marc Newman,
Anna-Lena Horlemann,
Mehdi Ghiyasvand
Abstract:
This work focuses on sequential locally recoverable codes (SLRCs), a special family of locally repairable codes, capable of correcting multiple code symbol erasures, which are commonly used for distributed storage systems. First, we construct an extended $q$-ary family of non-binary SLRCs using code products with a novel maximum number of recoverable erasures $t$ and a minimal repair alternativity…
▽ More
This work focuses on sequential locally recoverable codes (SLRCs), a special family of locally repairable codes, capable of correcting multiple code symbol erasures, which are commonly used for distributed storage systems. First, we construct an extended $q$-ary family of non-binary SLRCs using code products with a novel maximum number of recoverable erasures $t$ and a minimal repair alternativity $A$. Second, we study how MDS and BCH codes can be used to construct $q$-ary SLRCs. Finally, we compare our codes to other LRCs.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Luck, skill, and depth of competition in games and social hierarchies
Authors:
Maximilian Jerdee,
M. E. J. Newman
Abstract:
Patterns of wins and losses in pairwise contests, such as occur in sports and games, consumer research and paired comparison studies, and human and animal social hierarchies, are commonly analyzed using probabilistic models that allow one to quantify the strength of competitors or predict the outcome of future contests. Here we generalize this approach to incorporate two additional features: an el…
▽ More
Patterns of wins and losses in pairwise contests, such as occur in sports and games, consumer research and paired comparison studies, and human and animal social hierarchies, are commonly analyzed using probabilistic models that allow one to quantify the strength of competitors or predict the outcome of future contests. Here we generalize this approach to incorporate two additional features: an element of randomness or luck that leads to upset wins, and a "depth of competition" variable that measures the complexity of a game or hierarchy. Fitting the resulting model to a large collection of data sets we estimate depth and luck in a range of games, sports, and social situations. In general, we find that social competition tends to be "deep," meaning it has a pronounced hierarchy with many distinct levels, but also that there is often a nonzero chance of an upset victory, meaning that dominance challenges can be won even by significant underdogs. Competition in sports and games, by contrast, tends to be shallow and in most cases there is little evidence of upset wins, beyond those already implied by the shallowness of the hierarchy.
△ Less
Submitted 7 December, 2023;
originally announced December 2023.
-
Learning to Decode the Surface Code with a Recurrent, Transformer-Based Neural Network
Authors:
Johannes Bausch,
Andrew W Senior,
Francisco J H Heras,
Thomas Edlich,
Alex Davies,
Michael Newman,
Cody Jones,
Kevin Satzinger,
Murphy Yuezhen Niu,
Sam Blackwell,
George Holland,
Dvir Kafri,
Juan Atalaya,
Craig Gidney,
Demis Hassabis,
Sergio Boixo,
Hartmut Neven,
Pushmeet Kohli
Abstract:
Quantum error-correction is a prerequisite for reliable quantum computation. Towards this goal, we present a recurrent, transformer-based neural network which learns to decode the surface code, the leading quantum error-correction code. Our decoder outperforms state-of-the-art algorithmic decoders on real-world data from Google's Sycamore quantum processor for distance 3 and 5 surface codes. On di…
▽ More
Quantum error-correction is a prerequisite for reliable quantum computation. Towards this goal, we present a recurrent, transformer-based neural network which learns to decode the surface code, the leading quantum error-correction code. Our decoder outperforms state-of-the-art algorithmic decoders on real-world data from Google's Sycamore quantum processor for distance 3 and 5 surface codes. On distances up to 11, the decoder maintains its advantage on simulated data with realistic noise including cross-talk, leakage, and analog readout signals, and sustains its accuracy far beyond the 25 cycles it was trained on. Our work illustrates the ability of machine learning to go beyond human-designed algorithms by learning from data directly, highlighting machine learning as a strong contender for decoding in quantum computers.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion
Authors:
James Z. Wang,
Sicheng Zhao,
Chenyan Wu,
Reginald B. Adams,
Michelle G. Newman,
Tal Shafir,
Rachelle Tsachor
Abstract:
The emergence of artificial emotional intelligence technology is revolutionizing the fields of computers and robotics, allowing for a new level of communication and understanding of human behavior that was once thought impossible. While recent advancements in deep learning have transformed the field of computer vision, automated understanding of evoked or expressed emotions in visual media remains…
▽ More
The emergence of artificial emotional intelligence technology is revolutionizing the fields of computers and robotics, allowing for a new level of communication and understanding of human behavior that was once thought impossible. While recent advancements in deep learning have transformed the field of computer vision, automated understanding of evoked or expressed emotions in visual media remains in its infancy. This foundering stems from the absence of a universally accepted definition of "emotion", coupled with the inherently subjective nature of emotions and their intricate nuances. In this article, we provide a comprehensive, multidisciplinary overview of the field of emotion analysis in visual media, drawing on insights from psychology, engineering, and the arts. We begin by exploring the psychological foundations of emotion and the computational principles that underpin the understanding of emotions from images and videos. We then review the latest research and systems within the field, accentuating the most promising approaches. We also discuss the current technological challenges and limitations of emotion analysis, underscoring the necessity for continued investigation and innovation. We contend that this represents a "Holy Grail" research problem in computing and delineate pivotal directions for future inquiry. Finally, we examine the ethical ramifications of emotion-understanding technologies and contemplate their potential societal impacts. Overall, this article endeavors to equip readers with a deeper understanding of the domain of emotion analysis in visual media and to inspire further research and development in this captivating and rapidly evolving field.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Normalized mutual information is a biased measure for classification and community detection
Authors:
Maximilian Jerdee,
Alec Kirkley,
M. E. J. Newman
Abstract:
Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurio…
▽ More
Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
△ Less
Submitted 29 August, 2024; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Trauma-Informed Social Media: Towards Solutions for Reducing and Healing Online Harm
Authors:
Carol F. Scott,
Gabriela Marcu,
Riana Elyse Anderson,
Mark W. Newman,
Sarita Schoenebeck
Abstract:
Social media platforms exacerbate trauma, and many users experience various forms of trauma unique to them (e.g., doxxing and swatting). Trauma is the psychological and physical response to experiencing a deeply disturbing event. Platforms' failures to address trauma threaten users' well-being globally, especially amongst minoritized groups. Platform policies also expose moderators and designers t…
▽ More
Social media platforms exacerbate trauma, and many users experience various forms of trauma unique to them (e.g., doxxing and swatting). Trauma is the psychological and physical response to experiencing a deeply disturbing event. Platforms' failures to address trauma threaten users' well-being globally, especially amongst minoritized groups. Platform policies also expose moderators and designers to trauma through content they must engage with as part of their jobs (e.g., child sexual abuse). We consider how a trauma-informed approach might help address or decrease the likelihood of (re)experiencing trauma online. A trauma-informed approach to social media recognizes that everyone likely has a trauma history and that trauma is experienced at the individual, secondary, collective, and cultural levels. This paper proceeds by detailing trauma and its impacts. We then describe how the six trauma-informed principles can be applied to social media design, content moderation, and companies. We conclude by offering recommendations that balance platform responsibility and accountability with well-being and healing for all.
△ Less
Submitted 10 February, 2023;
originally announced February 2023.
-
Hierarchical core-periphery structure in networks
Authors:
Austin Polanco,
M. E. J. Newman
Abstract:
We study core-periphery structure in networks using inference methods based on a flexible network model that allows for traditional onion-like cores within cores, but also for hierarchical tree-like structures and more general non-nested types of structure. We propose an efficient Monte Carlo scheme for fitting the model to observed networks and report results for a selection of real-world data se…
▽ More
We study core-periphery structure in networks using inference methods based on a flexible network model that allows for traditional onion-like cores within cores, but also for hierarchical tree-like structures and more general non-nested types of structure. We propose an efficient Monte Carlo scheme for fitting the model to observed networks and report results for a selection of real-world data sets. Among other things, we observe an empirical distinction between networks showing traditional core-periphery structure with a dense core weakly connected to a sparse periphery, and an alternative structure in which the core is strongly connected both within itself and to the periphery. Networks vary in whether they are better represented by one type of structure or the other. We also observe structures that are a hybrid between core-periphery structure and community structure, in which networks have a set of non-overlapping cores that correspond roughly to communities, surrounded by a single undifferentiated periphery. Computer code implementing our methods is available.
△ Less
Submitted 9 January, 2023;
originally announced January 2023.
-
Message passing methods on complex networks
Authors:
M. E. J. Newman
Abstract:
Networks and network computations have become a primary mathematical tool for analyzing the structure of many kinds of complex systems, ranging from the Internet and transportation networks to biochemical interactions and social networks. A common task in network analysis is the calculation of quantities that reside on the nodes of a network, such as centrality measures, probabilities, or model st…
▽ More
Networks and network computations have become a primary mathematical tool for analyzing the structure of many kinds of complex systems, ranging from the Internet and transportation networks to biochemical interactions and social networks. A common task in network analysis is the calculation of quantities that reside on the nodes of a network, such as centrality measures, probabilities, or model states. In this review article we discuss message passing methods, a family of techniques for performing such calculations, based on the propagation of information between the nodes of a network. We introduce the message passing approach with a series of examples, give some illustrative applications and results, and discuss the deep connections between message passing and phase transitions in networks. We also point out some limitations of the message passing approach and describe some recently-introduced methods that address these limitations.
△ Less
Submitted 9 November, 2022;
originally announced November 2022.
-
20 years of network community detection
Authors:
Santo Fortunato,
M. E. J. Newman
Abstract:
A fundamental technical challenge in the analysis of network data is the automated discovery of communities - groups of nodes that are strongly connected or that share similar features or roles. In this commentary we review progress in the field over the last 20 years.
A fundamental technical challenge in the analysis of network data is the automated discovery of communities - groups of nodes that are strongly connected or that share similar features or roles. In this commentary we review progress in the field over the last 20 years.
△ Less
Submitted 2 August, 2022; v1 submitted 29 July, 2022;
originally announced August 2022.
-
Efficient computation of rankings from pairwise comparisons
Authors:
M. E. J. Newman
Abstract:
We study the ranking of individuals, teams, or objects, based on pairwise comparisons between them, using the Bradley-Terry model. Estimates of rankings within this model are commonly made using a simple iterative algorithm first introduced by Zermelo almost a century ago. Here we describe an alternative and similarly simple iteration that provably returns identical results but does so much faster…
▽ More
We study the ranking of individuals, teams, or objects, based on pairwise comparisons between them, using the Bradley-Terry model. Estimates of rankings within this model are commonly made using a simple iterative algorithm first introduced by Zermelo almost a century ago. Here we describe an alternative and similarly simple iteration that provably returns identical results but does so much faster -- over a hundred times faster in some cases. We demonstrate this algorithm with applications to a range of example data sets and derive a number of results regarding its convergence.
△ Less
Submitted 7 June, 2023; v1 submitted 30 June, 2022;
originally announced July 2022.
-
Ranking with multiple types of pairwise comparisons
Authors:
M. E. J. Newman
Abstract:
The task of ranking individuals or teams, based on a set of comparisons between pairs, arises in various contexts, including sporting competitions and the analysis of dominance hierarchies among animals and humans. Given data on which competitors beat which others, the challenge is to rank the competitors from best to worst. Here we study the problem of computing rankings when there are multiple,…
▽ More
The task of ranking individuals or teams, based on a set of comparisons between pairs, arises in various contexts, including sporting competitions and the analysis of dominance hierarchies among animals and humans. Given data on which competitors beat which others, the challenge is to rank the competitors from best to worst. Here we study the problem of computing rankings when there are multiple, potentially conflicting modes of comparison, such as multiple types of dominance behaviors among animals. We assume that we do not know a priori what information each behavior conveys about the ranking, or even whether they convey any information at all. Nonetheless we show that it is possible to compute a ranking in this situation and present a fast method for doing so, based on a combination of an expectation-maximization algorithm and a modified Bradley-Terry model. We give a selection of example applications to both animal and human competition.
△ Less
Submitted 19 October, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Cutting Through the Noise to Infer Autonomous System Topology
Authors:
Kirtus G. Leyba,
Joshua J. Daymude,
Jean-Gabriel Young,
M. E. J. Newman,
Jennifer Rexford,
Stephanie Forrest
Abstract:
The Border Gateway Protocol (BGP) is a distributed protocol that manages interdomain routing without requiring a centralized record of which autonomous systems (ASes) connect to which others. Many methods have been devised to infer the AS topology from publicly available BGP data, but none provide a general way to handle the fact that the data are notoriously incomplete and subject to error. This…
▽ More
The Border Gateway Protocol (BGP) is a distributed protocol that manages interdomain routing without requiring a centralized record of which autonomous systems (ASes) connect to which others. Many methods have been devised to infer the AS topology from publicly available BGP data, but none provide a general way to handle the fact that the data are notoriously incomplete and subject to error. This paper describes a method for reliably inferring AS-level connectivity in the presence of measurement error using Bayesian statistical inference acting on BGP routing tables from multiple vantage points. We employ a novel approach for counting AS adjacency observations in the AS-PATH attribute data from public route collectors, along with a Bayesian algorithm to generate a statistical estimate of the AS-level network. Our approach also gives us a way to evaluate the accuracy of existing reconstruction methods and to identify advantageous locations for new route collectors or vantage points.
△ Less
Submitted 18 January, 2022;
originally announced January 2022.
-
Clustering of heterogeneous populations of networks
Authors:
Jean-Gabriel Young,
Alec Kirkley,
M. E. J. Newman
Abstract:
Statistical methods for reconstructing networks from repeated measurements typically assume that all measurements are generated from the same underlying network structure. This need not be the case, however. People's social networks might be different on weekdays and weekends, for instance. Brain networks may differ between healthy patients and those with dementia or other conditions. Here we desc…
▽ More
Statistical methods for reconstructing networks from repeated measurements typically assume that all measurements are generated from the same underlying network structure. This need not be the case, however. People's social networks might be different on weekdays and weekends, for instance. Brain networks may differ between healthy patients and those with dementia or other conditions. Here we describe a Bayesian analysis framework for such data that allows for the fact that network measurements may be reflective of multiple possible structures. We define a finite mixture model of the measurement process and derive a fast Gibbs sampling procedure that samples exactly from the full posterior distribution of model parameters. The end result is a clustering of the measured networks into groups with similar structure. We demonstrate the method on both real and synthetic network populations.
△ Less
Submitted 23 January, 2022; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Representative community divisions of networks
Authors:
Alec Kirkley,
M. E. J. Newman
Abstract:
Methods for detecting community structure in networks typically aim to identify a single best partition of network nodes into communities, often by optimizing some objective function, but in real-world applications there may be many competitive partitions with objective scores close to the global optimum and one can obtain a more informative picture of the community structure by examining a repres…
▽ More
Methods for detecting community structure in networks typically aim to identify a single best partition of network nodes into communities, often by optimizing some objective function, but in real-world applications there may be many competitive partitions with objective scores close to the global optimum and one can obtain a more informative picture of the community structure by examining a representative set of such high-scoring partitions than by looking at just the single optimum. However, such a set can be difficult to interpret since its size can easily run to hundreds or thousands of partitions. In this paper we present a method for analyzing large partition sets by dividing them into groups of similar partitions and then identifying an archetypal partition as a representative of each group. The resulting set of archetypal partitions provides a succinct, interpretable summary of the form and variety of community structure in any network. We demonstrate the method on a range of example networks.
△ Less
Submitted 17 February, 2022; v1 submitted 10 May, 2021;
originally announced May 2021.
-
The friendship paradox in real and model networks
Authors:
George T. Cantwell,
Alec Kirkley,
M. E. J. Newman
Abstract:
The friendship paradox is the observation that the degrees of the neighbors of a node in any network will, on average, be greater than the degree of the node itself. In common parlance, your friends have more friends than you do. In this paper we develop the mathematical theory of the friendship paradox, both in general as well as for specific model networks, focusing not only on average behavior…
▽ More
The friendship paradox is the observation that the degrees of the neighbors of a node in any network will, on average, be greater than the degree of the node itself. In common parlance, your friends have more friends than you do. In this paper we develop the mathematical theory of the friendship paradox, both in general as well as for specific model networks, focusing not only on average behavior but also on variation about the average and using generating function methods to calculate full distributions of quantities of interest. We compare the predictions of our theory with measurements on a large number of real-world network data sets and find remarkably good agreement. We also develop equivalent theory for the generalized friendship paradox, which compares characteristics of nodes other than degree to those of their neighbors.
△ Less
Submitted 7 December, 2020;
originally announced December 2020.
-
Belief propagation for networks with loops
Authors:
Alec Kirkley,
George T. Cantwell,
M. E. J. Newman
Abstract:
Belief propagation is a widely used message passing method for the solution of probabilistic models on networks such as epidemic models, spin models, and Bayesian graphical models, but it suffers from the serious shortcoming that it works poorly in the common case of networks that contain short loops. Here we provide a solution to this long-standing problem, deriving a belief propagation method th…
▽ More
Belief propagation is a widely used message passing method for the solution of probabilistic models on networks such as epidemic models, spin models, and Bayesian graphical models, but it suffers from the serious shortcoming that it works poorly in the common case of networks that contain short loops. Here we provide a solution to this long-standing problem, deriving a belief propagation method that allows for fast calculation of probability distributions in systems with short loops, potentially with high density, as well as giving expressions for the entropy and partition function, which are notoriously difficult quantities to compute. Using the Ising model as an example, we show that our approach gives excellent results on both real and synthetic networks, improving significantly on standard message passing methods. We also discuss potential applications of our method to a variety of other problems.
△ Less
Submitted 24 April, 2021; v1 submitted 23 September, 2020;
originally announced September 2020.
-
Bayesian inference of network structure from unreliable data
Authors:
Jean-Gabriel Young,
George T. Cantwell,
M. E. J. Newman
Abstract:
Most empirical studies of complex networks do not return direct, error-free measurements of network structure. Instead, they typically rely on indirect measurements that are often error-prone and unreliable. A fundamental problem in empirical network science is how to make the best possible estimates of network structure given such unreliable data. In this paper we describe a fully Bayesian method…
▽ More
Most empirical studies of complex networks do not return direct, error-free measurements of network structure. Instead, they typically rely on indirect measurements that are often error-prone and unreliable. A fundamental problem in empirical network science is how to make the best possible estimates of network structure given such unreliable data. In this paper we describe a fully Bayesian method for reconstructing networks from observational data in any format, even when the data contain substantial measurement error and when the nature and magnitude of that error is unknown. The method is introduced through pedagogical case studies using real-world example networks, and specifically tailored to allow straightforward, computationally efficient implementation with a minimum of technical input. Computer code implementing the method is publicly available.
△ Less
Submitted 9 March, 2021; v1 submitted 7 August, 2020;
originally announced August 2020.
-
Classical Coding Problem from Transversal $T$ Gates
Authors:
Narayanan Rengaswamy,
Robert Calderbank,
Michael Newman,
Henry D. Pfister
Abstract:
Universal quantum computation requires the implementation of a logical non-Clifford gate. In this paper, we characterize all stabilizer codes whose code subspaces are preserved under physical $T$ and $T^{-1}$ gates. For example, this could enable magic state distillation with non-CSS codes and, thus, provide better parameters than CSS-based protocols. However, among non-degenerate stabilizer codes…
▽ More
Universal quantum computation requires the implementation of a logical non-Clifford gate. In this paper, we characterize all stabilizer codes whose code subspaces are preserved under physical $T$ and $T^{-1}$ gates. For example, this could enable magic state distillation with non-CSS codes and, thus, provide better parameters than CSS-based protocols. However, among non-degenerate stabilizer codes that support transversal $T$, we prove that CSS codes are optimal. We also show that triorthogonal codes are, essentially, the only family of CSS codes that realize logical transversal $T$ via physical transversal $T$. Using our algebraic approach, we reveal new purely-classical coding problems that are intimately related to the realization of logical operations via transversal $T$. Decreasing monomial codes are also used to construct a code that realizes logical CCZ. Finally, we use Ax's theorem to characterize the logical operation realized on a family of quantum Reed-Muller codes. This result is generalized to finer angle $Z$-rotations in arXiv:1910.09333.
△ Less
Submitted 18 August, 2021; v1 submitted 14 January, 2020;
originally announced January 2020.
-
On Optimality of CSS Codes for Transversal $T$
Authors:
Narayanan Rengaswamy,
Robert Calderbank,
Michael Newman,
Henry D. Pfister
Abstract:
In order to perform universal fault-tolerant quantum computation, one needs to implement a logical non-Clifford gate. Consequently, it is important to understand codes that implement such gates transversally. In this paper, we adopt an algebraic approach to characterize all stabilizer codes for which transversal $T$ and $T^{-1}$ gates preserve the codespace. Our Heisenberg perspective reduces this…
▽ More
In order to perform universal fault-tolerant quantum computation, one needs to implement a logical non-Clifford gate. Consequently, it is important to understand codes that implement such gates transversally. In this paper, we adopt an algebraic approach to characterize all stabilizer codes for which transversal $T$ and $T^{-1}$ gates preserve the codespace. Our Heisenberg perspective reduces this to a finite geometry problem that translates to the design of certain classical codes. We prove three corollaries: (a) For any non-degenerate $[[ n,k,d ]]$ stabilizer code supporting a physical transversal $T$, there exists an $[[ n,k,d ]]$ CSS code with the same property; (b) Triorthogonal codes are the most general CSS codes that realize logical transversal $T$ via physical transversal $T$; (c) Triorthogonality is necessary for physical transversal $T$ on a CSS code to realize the logical identity. The main tool we use is a recent efficient characterization of certain diagonal gates in the Clifford hierarchy (arXiv:1902.04022). We refer to these gates as Quadratic Form Diagonal (QFD) gates. Our framework generalizes all existing code constructions that realize logical gates via transversal $T$. We provide several examples and briefly discuss connections to decreasing monomial codes, pin codes, generalized triorthogonality and quasitransversality. We partially extend these results towards characterizing all stabilizer codes that support transversal $π/2^{\ell}$ $Z$-rotations. In particular, using Ax's theorem on residue weights of polynomials, we provide an alternate characterization of logical gates induced by transversal $π/2^{\ell}$ $Z$-rotations on a family of quantum Reed-Muller codes. We also briefly discuss a general approach to analyze QFD gates that might lead to a characterization of all stabilizer codes that support any given physical transversal $1$- or $2$-local diagonal gate.
△ Less
Submitted 18 August, 2021; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Consistency of community structure in complex networks
Authors:
Maria A. Riolo,
M. E. J. Newman
Abstract:
The most widely used techniques for community detection in networks, including methods based on modularity, statistical inference, and information theoretic arguments, all work by optimizing objective functions that measure the quality of network partitions. There is a good case to be made, however, that one should not look solely at the single optimal community structure under such an objective f…
▽ More
The most widely used techniques for community detection in networks, including methods based on modularity, statistical inference, and information theoretic arguments, all work by optimizing objective functions that measure the quality of network partitions. There is a good case to be made, however, that one should not look solely at the single optimal community structure under such an objective function, but rather at a selection of high-scoring structures. If one does this one typically finds that the resulting structures show considerable variation, and this has been taken as evidence that these community detection methods are unreliable, since they do not appear to give consistent answers. Here we argue that, upon closer inspection, the structures found are in fact consistent in a certain way. Specifically, we show that they can all be assembled from a set of underlying "building blocks", groups of network nodes that are usually found together in the same community. Different community structures correspond to different arrangements of blocks, but the blocks themselves are largely invariant. We propose an information theoretic method for discovering the building blocks in specific networks and demonstrate it with several example applications. We conclude that traditional community detection is not the failure some have suggested it is, and that in fact it gives a significant amount of insight into network structure, although perhaps not in exactly the way previously imagined.
△ Less
Submitted 26 August, 2019;
originally announced August 2019.
-
Improved mutual information measure for classification and community detection
Authors:
M. E. J. Newman,
George T. Cantwell,
Jean-Gabriel Young
Abstract:
The information theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the stan…
▽ More
The information theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We demonstrate how to correct this error and define a mutual information that works in all cases. We discuss practical implementation of the new measure and give some example applications.
△ Less
Submitted 29 July, 2019;
originally announced July 2019.
-
Message passing on networks with loops
Authors:
George T. Cantwell,
M. E. J. Newman
Abstract:
In this paper we offer a solution to a long-standing problem in the study of networks. Message passing is a fundamental technique for calculations on networks and graphs. The first versions of the method appeared in the 1930s and over the decades it has been applied to a wide range of foundational problems in mathematics, physics, computer science, statistics, and machine learning, including Bayes…
▽ More
In this paper we offer a solution to a long-standing problem in the study of networks. Message passing is a fundamental technique for calculations on networks and graphs. The first versions of the method appeared in the 1930s and over the decades it has been applied to a wide range of foundational problems in mathematics, physics, computer science, statistics, and machine learning, including Bayesian inference, spin models, coloring, satisfiability, graph partitioning, network epidemiology, and the calculation of matrix eigenvalues. Despite its wide use, however, it has long been recognized that the method has a fundamental flaw: it only works on networks that are free of short loops. Loops introduce correlations that cause the method to give inaccurate answers at best, and to fail completely in the worst cases. Unfortunately, almost all real-world networks contain many short loops, which limits the usefulness of the message passing approach. In this paper we demonstrate how to rectify this shortcoming and create message passing methods that work on any network. We give two example applications, one to the percolation properties of networks and the other to the calculation of the spectra of sparse matrices.
△ Less
Submitted 18 July, 2019;
originally announced July 2019.
-
Structure of online dating markets in US cities
Authors:
Elizabeth E. Bruch,
M. E. J. Newman
Abstract:
We study the structure of heterosexual dating markets in the United States through an analysis of the interactions of several million users of a large online dating web site, applying recently developed network analysis methods to the pattern of messages exchanged among users. Our analysis shows that the strongest driver of romantic interaction at the national level is simple geographic proximity,…
▽ More
We study the structure of heterosexual dating markets in the United States through an analysis of the interactions of several million users of a large online dating web site, applying recently developed network analysis methods to the pattern of messages exchanged among users. Our analysis shows that the strongest driver of romantic interaction at the national level is simple geographic proximity, but at the local level other demographic factors come into play. We find that dating markets in each city are partitioned into submarkets along lines of age and ethnicity. Sex ratio varies widely between submarkets, with younger submarkets having more men and fewer women than older ones. There is also a noticeable tendency for minorities, especially women, to be younger than the average in older submarkets, and our analysis reveals how this kind of racial stratification arises through the messaging decisions of both men and women. Our study illustrates how network techniques applied to online interactions can reveal the aggregate effects of individual behavior on social structure.
△ Less
Submitted 3 April, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Explicit lower bounds on strong simulation of quantum circuits in terms of $T$-gate count
Authors:
Cupjin Huang,
Michael Newman,
Mario Szegedy
Abstract:
We investigate Clifford+$T$ quantum circuits with a small number of $T$-gates. Using the sparsification lemma, we identify time complexity lower bounds in terms of $T$-gate count below which a strong simulator would improve on the state-of-the-art $3$-SAT solving.
We investigate Clifford+$T$ quantum circuits with a small number of $T$-gates. Using the sparsification lemma, we identify time complexity lower bounds in terms of $T$-gate count below which a strong simulator would improve on the state-of-the-art $3$-SAT solving.
△ Less
Submitted 25 February, 2019; v1 submitted 13 February, 2019;
originally announced February 2019.
-
Spectra of networks containing short loops
Authors:
M. E. J. Newman
Abstract:
The spectrum of the adjacency matrix plays several important roles in the mathematical theory of networks and in network data analysis, for example in percolation theory, community detection, centrality measures, and the theory of dynamical systems on networks. A number of methods have been developed for the analytic computation of network spectra, but they typically assume that networks are local…
▽ More
The spectrum of the adjacency matrix plays several important roles in the mathematical theory of networks and in network data analysis, for example in percolation theory, community detection, centrality measures, and the theory of dynamical systems on networks. A number of methods have been developed for the analytic computation of network spectra, but they typically assume that networks are locally tree-like, meaning that the local neighborhood of any node takes the form of a tree, free of short loops. Empirically observed networks, by contrast, often have many short loops. Here we develop an approach for calculating the spectra of networks with short loops using a message passing method. We give example applications to some previously studied classes of networks.
△ Less
Submitted 12 February, 2019;
originally announced February 2019.
-
Spectra of random networks with arbitrary degrees
Authors:
M. E. J. Newman,
Xiao Zhang,
Raj Rao Nadakuditi
Abstract:
We derive a message passing method for computing the spectra of locally tree-like networks and an approximation to it that allows us to compute closed-form expressions or fast numerical approximates for the spectral density of random graphs with arbitrary node degrees -- the so-called configuration model. We find the latter approximation to work well for all but the sparsest of networks. We also d…
▽ More
We derive a message passing method for computing the spectra of locally tree-like networks and an approximation to it that allows us to compute closed-form expressions or fast numerical approximates for the spectral density of random graphs with arbitrary node degrees -- the so-called configuration model. We find the latter approximation to work well for all but the sparsest of networks. We also derive bounds on the position of the band edges of the spectrum, which are important for identifying structural phase transitions in networks.
△ Less
Submitted 7 January, 2019;
originally announced January 2019.
-
Mixing patterns and individual differences in networks
Authors:
George T. Cantwell,
M. E. J. Newman
Abstract:
We study mixing patterns in networks, meaning the propensity for nodes of different kinds to connect to one another. The phenomenon of assortative mixing, whereby nodes prefer to connect to others that are similar to themselves, has been widely studied, but here we go further and examine how and to what extent nodes that are otherwise similar can have different preferences. Many individuals in a f…
▽ More
We study mixing patterns in networks, meaning the propensity for nodes of different kinds to connect to one another. The phenomenon of assortative mixing, whereby nodes prefer to connect to others that are similar to themselves, has been widely studied, but here we go further and examine how and to what extent nodes that are otherwise similar can have different preferences. Many individuals in a friendship network, for instance, may prefer friends who are roughly the same age as themselves, but some may display a preference for older or younger friends. We introduce a network model that captures this behavior and a method for fitting it to empirical network data. We propose metrics to characterize the mean and variation of mixing patterns and show how to infer their values from the fitted model, either using maximum-likelihood estimates of model parameters or in a Bayesian framework that does not require fixing any parameters.
△ Less
Submitted 17 April, 2019; v1 submitted 2 October, 2018;
originally announced October 2018.
-
Balance in signed networks
Authors:
Alec Kirkley,
George T. Cantwell,
M. E. J. Newman
Abstract:
We consider signed networks in which connections or edges can be either positive (friendship, trust, alliance) or negative (dislike, distrust, conflict). Early literature in graph theory theorized that such networks should display "structural balance," meaning that certain configurations of positive and negative edges are favored and others are disfavored. Here we propose two measures of balance i…
▽ More
We consider signed networks in which connections or edges can be either positive (friendship, trust, alliance) or negative (dislike, distrust, conflict). Early literature in graph theory theorized that such networks should display "structural balance," meaning that certain configurations of positive and negative edges are favored and others are disfavored. Here we propose two measures of balance in signed networks based on the established notions of weak and strong balance, and compare their performance on a range of tasks with each other and with previously proposed measures. In particular, we ask whether real-world signed networks are significantly balanced by these measures compared to an appropriate null model, finding that indeed they are, by all the measures studied. We also test our ability to predict unknown signs in otherwise known networks by maximizing balance. In a series of cross-validation tests we find that our measures are able to predict signs substantially better than chance.
△ Less
Submitted 13 September, 2018;
originally announced September 2018.
-
ARBEE: Towards Automated Recognition of Bodily Expression of Emotion In the Wild
Authors:
Yu Luo,
Jianbo Ye,
Reginald B. Adams, Jr.,
Jia Li,
Michelle G. Newman,
James Z. Wang
Abstract:
Humans are arguably innately prepared to comprehend others' emotional expressions from subtle body movements. If robots or computers can be empowered with this capability, a number of robotic applications become possible. Automatically recognizing human bodily expression in unconstrained situations, however, is daunting given the incomplete understanding of the relationship between emotional expre…
▽ More
Humans are arguably innately prepared to comprehend others' emotional expressions from subtle body movements. If robots or computers can be empowered with this capability, a number of robotic applications become possible. Automatically recognizing human bodily expression in unconstrained situations, however, is daunting given the incomplete understanding of the relationship between emotional expressions and body movements. The current research, as a multidisciplinary effort among computer and information sciences, psychology, and statistics, proposes a scalable and reliable crowdsourcing approach for collecting in-the-wild perceived emotion data for computers to learn to recognize body languages of humans. To accomplish this task, a large and growing annotated dataset with 9,876 video clips of body movements and 13,239 human characters, named BoLD (Body Language Dataset), has been created. Comprehensive statistical analysis of the dataset revealed many interesting insights. A system to model the emotional expressions based on bodily movements, named ARBEE (Automated Recognition of Bodily Expression of Emotion), has also been developed and evaluated. Our analysis shows the effectiveness of Laban Movement Analysis (LMA) features in characterizing arousal, and our experiments using LMA features further demonstrate computability of bodily expression. We report and compare results of several other baseline methods which were developed for action recognition based on two different modalities, body skeleton, and raw image. The dataset and findings presented in this work will likely serve as a launchpad for future discoveries in body language understanding that will enable future robots to interact and collaborate more effectively with humans.
△ Less
Submitted 9 July, 2019; v1 submitted 28 August, 2018;
originally announced August 2018.
-
Aspirational pursuit of mates in online dating markets
Authors:
Elizabeth E. Bruch,
M. E. J. Newman
Abstract:
Romantic courtship is often described as taking place in a dating market where men and women compete for mates, but the detailed structure and dynamics of dating markets have historically been difficult to quantify for lack of suitable data. In recent years, however, the advent and vigorous growth of the online dating industry has provided a rich new source of information on mate pursuit. Here we…
▽ More
Romantic courtship is often described as taking place in a dating market where men and women compete for mates, but the detailed structure and dynamics of dating markets have historically been difficult to quantify for lack of suitable data. In recent years, however, the advent and vigorous growth of the online dating industry has provided a rich new source of information on mate pursuit. Here we present an empirical analysis of heterosexual dating markets in four large US cities using data from a popular, free online dating service. We show that competition for mates creates a pronounced hierarchy of desirability that correlates strongly with user demographics and is remarkably consistent across cities. We find that both men and women pursue partners who are on average about 25% more desirable than themselves by our measures and that they use different messaging strategies with partners of different desirability. We also find that the probability of receiving a response to an advance drops markedly with increasing difference in desirability between the pursuer and the pursued. Strategic behaviors can improve one's chances of attracting a more desirable mate, though the effects are modest.
△ Less
Submitted 14 August, 2018;
originally announced August 2018.
-
Explicit lower bounds on strong quantum simulation
Authors:
Cupjin Huang,
Michael Newman,
Mario Szegedy
Abstract:
We consider the problem of strong (amplitude-wise) simulation of $n$-qubit quantum circuits, and identify a subclass of simulators we call monotone. This subclass encompasses almost all prominent simulation techniques. We prove an unconditional (i.e. without relying on any complexity theoretic assumptions) and explicit $(n-2)(2^{n-3}-1)$ lower bound on the running time of simulators within this su…
▽ More
We consider the problem of strong (amplitude-wise) simulation of $n$-qubit quantum circuits, and identify a subclass of simulators we call monotone. This subclass encompasses almost all prominent simulation techniques. We prove an unconditional (i.e. without relying on any complexity theoretic assumptions) and explicit $(n-2)(2^{n-3}-1)$ lower bound on the running time of simulators within this subclass. Assuming the Strong Exponential Time Hypothesis (SETH), we further remark that a universal simulator computing any amplitude to precision $2^{-n}/2$ must take at least $2^{n - o(n)}$ time. Finally, we compare strong simulators to existing SAT solvers, and identify the time-complexity below which a strong simulator would improve on state-of-the-art SAT solving.
△ Less
Submitted 30 April, 2018; v1 submitted 27 April, 2018;
originally announced April 2018.
-
Estimating network structure from unreliable measurements
Authors:
M. E. J. Newman
Abstract:
Most empirical studies of networks assume that the network data we are given represent a complete and accurate picture of the nodes and edges in the system of interest, but in real-world situations this is rarely the case. More often the data only specify the network structure imperfectly -- like data in essentially every other area of empirical science, network data are prone to measurement error…
▽ More
Most empirical studies of networks assume that the network data we are given represent a complete and accurate picture of the nodes and edges in the system of interest, but in real-world situations this is rarely the case. More often the data only specify the network structure imperfectly -- like data in essentially every other area of empirical science, network data are prone to measurement error and noise. At the same time, the data may be richer than simple network measurements, incorporating multiple measurements, weights, lengths or strengths of edges, node or edge labels, or annotations of various kinds. Here we develop a general method for making estimates of network structure and properties using any form of network data, simple or complex, when the data are unreliable, and give example applications to a selection of social and biological networks.
△ Less
Submitted 18 December, 2018; v1 submitted 6 March, 2018;
originally announced March 2018.
-
Efficient method for estimating the number of communities in a network
Authors:
Maria A. Riolo,
George T. Cantwell,
Gesine Reinert,
M. E. J. Newman
Abstract:
While there exist a wide range of effective methods for community detection in networks, most of them require one to know in advance how many communities one is looking for. Here we present a method for estimating the number of communities in a network using a combination of Bayesian inference with a novel prior and an efficient Monte Carlo sampling scheme. We test the method extensively on both r…
▽ More
While there exist a wide range of effective methods for community detection in networks, most of them require one to know in advance how many communities one is looking for. Here we present a method for estimating the number of communities in a network using a combination of Bayesian inference with a novel prior and an efficient Monte Carlo sampling scheme. We test the method extensively on both real and computer-generated networks, showing that it performs accurately and consistently, even in cases where groups are widely varying in size or structure.
△ Less
Submitted 7 June, 2017;
originally announced June 2017.
-
Limitations on Transversal Computation through Quantum Homomorphic Encryption
Authors:
Michael Newman,
Yaoyun Shi
Abstract:
Transversality is a simple and effective method for implementing quantum computation fault-tolerantly. However, no quantum error-correcting code (QECC) can transversally implement a quantum universal gate set (Eastin and Knill, Phys. Rev. Lett., 102, 110502). Since reversible classical computation is often a dominating part of useful quantum computation, whether or not it can be implemented transv…
▽ More
Transversality is a simple and effective method for implementing quantum computation fault-tolerantly. However, no quantum error-correcting code (QECC) can transversally implement a quantum universal gate set (Eastin and Knill, Phys. Rev. Lett., 102, 110502). Since reversible classical computation is often a dominating part of useful quantum computation, whether or not it can be implemented transversally is an important open problem. We show that, other than a small set of non-additive codes that we cannot rule out, no binary QECC can transversally implement a classical reversible universal gate set. In particular, no such QECC can implement the Toffoli gate transversally.
We prove our result by constructing an information theoretically secure (but inefficient) quantum homomorphic encryption (ITS-QHE) scheme inspired by Ouyang et al. (arXiv:1508.00938). Homomorphic encryption allows the implementation of certain functions directly on encrypted data, i.e. homomorphically. Our scheme builds on almost any QECC, and implements that code's transversal gate set homomorphically. We observe a restriction imposed by Nayak's bound (FOCS 1999) on ITS-QHE, implying that any ITS quantum fully homomorphic scheme (ITS-QFHE) implementing the full set of classical reversible functions must be highly inefficient. While our scheme incurs exponential overhead, any such QECC implementing Toffoli transversally would still violate this lower bound through our scheme.
△ Less
Submitted 13 August, 2017; v1 submitted 25 April, 2017;
originally announced April 2017.
-
Network structure from rich but noisy data
Authors:
M. E. J. Newman
Abstract:
Driven by growing interest in the sciences, industry, and among the broader public, a large number of empirical studies have been conducted in recent years of the structure of networks ranging from the internet and the world wide web to biological networks and social networks. The data produced by these experiments are often rich and multimodal, yet at the same time they may contain substantial me…
▽ More
Driven by growing interest in the sciences, industry, and among the broader public, a large number of empirical studies have been conducted in recent years of the structure of networks ranging from the internet and the world wide web to biological networks and social networks. The data produced by these experiments are often rich and multimodal, yet at the same time they may contain substantial measurement error. In practice, this means that the true network structure can differ greatly from naive estimates made from the raw data, and hence that conclusions drawn from those naive estimates may be significantly in error. In this paper we describe a technique that circumvents this problem and allows us to make optimal estimates of the true structure of networks in the presence of both richly textured data and significant measurement uncertainty. We give example applications to two different social networks, one derived from face-to-face interactions and one from self-reported friendships.
△ Less
Submitted 6 February, 2018; v1 submitted 21 March, 2017;
originally announced March 2017.
-
Probabilistic Multigraph Modeling for Improving the Quality of Crowdsourced Affective Data
Authors:
Jianbo Ye,
Jia Li,
Michelle G. Newman,
Reginald B. Adams Jr.,
James Z. Wang
Abstract:
We proposed a probabilistic approach to joint modeling of participants' reliability and humans' regularity in crowdsourced affective studies. Reliability measures how likely a subject will respond to a question seriously; and regularity measures how often a human will agree with other seriously-entered responses coming from a targeted population. Crowdsourcing-based studies or experiments, which r…
▽ More
We proposed a probabilistic approach to joint modeling of participants' reliability and humans' regularity in crowdsourced affective studies. Reliability measures how likely a subject will respond to a question seriously; and regularity measures how often a human will agree with other seriously-entered responses coming from a targeted population. Crowdsourcing-based studies or experiments, which rely on human self-reported affect, pose additional challenges as compared with typical crowdsourcing studies that attempt to acquire concrete non-affective labels of objects. The reliability of participants has been massively pursued for typical non-affective crowdsourcing studies, whereas the regularity of humans in an affective experiment in its own right has not been thoroughly considered. It has been often observed that different individuals exhibit different feelings on the same test question, which does not have a sole correct response in the first place. High reliability of responses from one individual thus cannot conclusively result in high consensus across individuals. Instead, globally testing consensus of a population is of interest to investigators. Built upon the agreement multigraph among tasks and workers, our probabilistic model differentiates subject regularity from population reliability. We demonstrate the method's effectiveness for in-depth robust analysis of large-scale crowdsourced affective data, including emotion and aesthetic assessments collected by presenting visual stimuli to human subjects.
△ Less
Submitted 5 January, 2017; v1 submitted 4 January, 2017;
originally announced January 2017.
-
Random graph models for dynamic networks
Authors:
Xiao Zhang,
Cristopher Moore,
M. E. J. Newman
Abstract:
We propose generalizations of a number of standard network models, including the classic random graph, the configuration model, and the stochastic block model, to the case of time-varying networks. We assume that the presence and absence of edges are governed by continuous-time Markov processes with rate parameters that can depend on properties of the nodes. In addition to computing equilibrium pr…
▽ More
We propose generalizations of a number of standard network models, including the classic random graph, the configuration model, and the stochastic block model, to the case of time-varying networks. We assume that the presence and absence of edges are governed by continuous-time Markov processes with rate parameters that can depend on properties of the nodes. In addition to computing equilibrium properties of these models, we demonstrate their use in data analysis and statistical inference, giving efficient algorithms for fitting them to observed network data. This allows us, for instance, to estimate the time constants of network evolution or infer community structure from temporal network data using cues embedded both in the probabilities over time that node pairs are connected by edges and in the characteristic dynamics of edge appearance and disappearance. We illustrate our methods with a selection of applications, both to computer-generated test networks and real-world examples.
△ Less
Submitted 26 July, 2016;
originally announced July 2016.
-
Community detection in networks: Modularity optimization and maximum likelihood are equivalent
Authors:
M. E. J. Newman
Abstract:
We demonstrate an exact equivalence between two widely used methods of community detection in networks, the method of modularity maximization in its generalized form which incorporates a resolution parameter controlling the size of the communities discovered, and the method of maximum likelihood applied to the special case of the stochastic block model known as the planted partition model, in whic…
▽ More
We demonstrate an exact equivalence between two widely used methods of community detection in networks, the method of modularity maximization in its generalized form which incorporates a resolution parameter controlling the size of the communities discovered, and the method of maximum likelihood applied to the special case of the stochastic block model known as the planted partition model, in which all communities in a network are assumed to have statistically similar properties. Among other things, this equivalence provides a mathematically principled derivation of the modularity function, clarifies the conditions and assumptions of its use, and gives an explicit formula for the optimal value of the resolution parameter.
△ Less
Submitted 7 June, 2016;
originally announced June 2016.
-
Estimating the number of communities in a network
Authors:
M. E. J. Newman,
Gesine Reinert
Abstract:
Community detection, the division of a network into dense subnetworks with only sparse connections between them, has been a topic of vigorous study in recent years. However, while there exist a range of powerful and flexible methods for dividing a network into a specified number of communities, it is an open question how to determine exactly how many communities one should use. Here we describe a…
▽ More
Community detection, the division of a network into dense subnetworks with only sparse connections between them, has been a topic of vigorous study in recent years. However, while there exist a range of powerful and flexible methods for dividing a network into a specified number of communities, it is an open question how to determine exactly how many communities one should use. Here we describe a mathematically principled approach for finding the number of communities in a network using a maximum-likelihood method. We demonstrate the approach on a range of real-world examples with known community structure, finding that it is able to determine the number of communities correctly in every case.
△ Less
Submitted 23 August, 2016; v1 submitted 9 May, 2016;
originally announced May 2016.
-
Community detection in networks with unequal groups
Authors:
Pan Zhang,
Cristopher Moore,
M. E. J. Newman
Abstract:
Recently, a phase transition has been discovered in the network community detection problem below which no algorithm can tell which nodes belong to which communities with success any better than a random guess. This result has, however, so far been limited to the case where the communities have the same size or the same average degree. Here we consider the case where the sizes or average degrees a…
▽ More
Recently, a phase transition has been discovered in the network community detection problem below which no algorithm can tell which nodes belong to which communities with success any better than a random guess. This result has, however, so far been limited to the case where the communities have the same size or the same average degree. Here we consider the case where the sizes or average degrees are different. This asymmetry allows us to assign nodes to communities with better-than- random success by examining their local neighborhoods. Using the cavity method, we show that this removes the detectability transition completely for networks with four groups or fewer, while for more than four groups the transition persists up to a critical amount of asymmetry but not beyond. The critical point in the latter case coincides with the point at which local information percolates, causing a global transition from a less-accurate solution to a more-accurate one.
△ Less
Submitted 10 September, 2015; v1 submitted 31 August, 2015;
originally announced September 2015.
-
Multiway spectral community detection in networks
Authors:
Xiao Zhang,
M. E. J. Newman
Abstract:
One of the most widely used methods for community detection in networks is the maximization of the quality function known as modularity. Of the many maximization techniques that have been used in this context, some of the most conceptually attractive are the spectral methods, which are based on the eigenvectors of the modularity matrix. Spectral algorithms have, however, been limited by and large…
▽ More
One of the most widely used methods for community detection in networks is the maximization of the quality function known as modularity. Of the many maximization techniques that have been used in this context, some of the most conceptually attractive are the spectral methods, which are based on the eigenvectors of the modularity matrix. Spectral algorithms have, however, been limited by and large to the division of networks into only two or three communities, with divisions into more than three being achieved by repeated two-way division. Here we present a spectral algorithm that can directly divide a network into any number of communities. The algorithm makes use of a mapping from modularity maximization to a vector partitioning problem, combined with a fast heuristic for vector partitioning. We compare the performance of this spectral algorithm with previous approaches and find it to give superior results, particularly in cases where community sizes are unbalanced. We also give demonstrative applications of the algorithm to two real-world networks and find that it produces results in good agreement with expectations for the networks studied.
△ Less
Submitted 22 June, 2015;
originally announced July 2015.
-
Structure and inference in annotated networks
Authors:
M. E. J. Newman,
Aaron Clauset
Abstract:
For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network, geographic location of nodes in the Internet, or cellular function of nodes in a gene regulatory network. Here we demonstrate how this "metadata" can be used to improve our analysis and understanding of network s…
▽ More
For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network, geographic location of nodes in the Internet, or cellular function of nodes in a gene regulatory network. Here we demonstrate how this "metadata" can be used to improve our analysis and understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. The learned correlations are also of interest in their own right, allowing us to make predictions about the community membership of nodes whose network connections are unknown. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological, and technological domains.
△ Less
Submitted 14 July, 2015;
originally announced July 2015.
-
Structural inference for uncertain networks
Authors:
Travis Martin,
Brian Ball,
M. E. J. Newman
Abstract:
In the study of networked systems such as biological, technological, and social networks the available data are often uncertain. Rather than knowing the structure of a network exactly, we know the connections between nodes only with a certain probability. In this paper we develop methods for the analysis of such uncertain data, focusing particularly on the problem of community detection. We give a…
▽ More
In the study of networked systems such as biological, technological, and social networks the available data are often uncertain. Rather than knowing the structure of a network exactly, we know the connections between nodes only with a certain probability. In this paper we develop methods for the analysis of such uncertain data, focusing particularly on the problem of community detection. We give a principled maximum-likelihood method for inferring community structure and demonstrate how the results can be used to make improved estimates of the true structure of the network. Using computer-generated benchmark networks we demonstrate that our methods are able to reconstruct known communities more accurately than previous approaches based on data thresholding. We also give an example application to the detection of communities in a protein-protein interaction network.
△ Less
Submitted 17 June, 2015;
originally announced June 2015.
-
Generalized communities in networks
Authors:
M. E. J. Newman,
Tiago P. Peixoto
Abstract:
A substantial volume of research has been devoted to studies of community structure in networks, but communities are not the only possible form of large-scale network structure. Here we describe a broad extension of community structure that encompasses traditional communities but includes a wide range of generalized structural patterns as well. We describe a principled method for detecting this ge…
▽ More
A substantial volume of research has been devoted to studies of community structure in networks, but communities are not the only possible form of large-scale network structure. Here we describe a broad extension of community structure that encompasses traditional communities but includes a wide range of generalized structural patterns as well. We describe a principled method for detecting this generalized structure in empirical network data and demonstrate with real-world examples how it can be used to learn new things about the shape and meaning of networks.
△ Less
Submitted 27 May, 2015;
originally announced May 2015.
-
Identification of core-periphery structure in networks
Authors:
Xiao Zhang,
Travis Martin,
M. E. J. Newman
Abstract:
Many networks can be usefully decomposed into a dense core plus an outlying, loosely-connected periphery. Here we propose an algorithm for performing such a decomposition on empirical network data using methods of statistical inference. Our method fits a generative model of core-periphery structure to observed data using a combination of an expectation--maximization algorithm for calculati…
▽ More
Many networks can be usefully decomposed into a dense core plus an outlying, loosely-connected periphery. Here we propose an algorithm for performing such a decomposition on empirical network data using methods of statistical inference. Our method fits a generative model of core-periphery structure to observed data using a combination of an expectation--maximization algorithm for calculating the parameters of the model and a belief propagation algorithm for calculating the decomposition itself. We find the method to be efficient, scaling easily to networks with a million or more nodes and we test it on a range of networks, including real-world examples as well as computer-generated benchmarks, for which it successfully identifies known core-periphery structure with low error rate. We also demonstrate that the method is immune from the detectability transition observed in the related community detection problem, which prevents the detection of community structure when that structure is too weak. There is no such transition for core-periphery structure, which is detectable, albeit with some statistical error, no matter how weak it is.
△ Less
Submitted 16 September, 2014;
originally announced September 2014.
-
Equitable random graphs
Authors:
M. E. J. Newman,
Travis Martin
Abstract:
Random graph models have played a dominant role in the theoretical study of networked systems. The Poisson random graph of Erdos and Renyi, in particular, as well as the so-called configuration model, have served as the starting point for numerous calculations. In this paper we describe another large class of random graph models, which we call equitable random graphs and which are flexible enough…
▽ More
Random graph models have played a dominant role in the theoretical study of networked systems. The Poisson random graph of Erdos and Renyi, in particular, as well as the so-called configuration model, have served as the starting point for numerous calculations. In this paper we describe another large class of random graph models, which we call equitable random graphs and which are flexible enough to represent networks with diverse degree distributions and many nontrivial types of structure, including community structure, bipartite structure, degree correlations, stratification, and others, yet are exactly solvable for a wide range of properties in the limit of large graph size, including percolation properties, complete spectral density, and the behavior of homogeneous dynamical systems, such as coupled oscillators or epidemic models.
△ Less
Submitted 6 May, 2014;
originally announced May 2014.
-
Percolation on sparse networks
Authors:
Brian Karrer,
M. E. J. Newman,
Lenka Zdeborová
Abstract:
We study percolation on networks, which is used as a model of the resilience of networked systems such as the Internet to attack or failure and as a simple model of the spread of disease over human contact networks. We reformulate percolation as a message passing process and demonstrate how the resulting equations can be used to calculate, among other things, the size of the percolating cluster an…
▽ More
We study percolation on networks, which is used as a model of the resilience of networked systems such as the Internet to attack or failure and as a simple model of the spread of disease over human contact networks. We reformulate percolation as a message passing process and demonstrate how the resulting equations can be used to calculate, among other things, the size of the percolating cluster and the average cluster size. The calculations are exact for sparse networks when the number of short loops in the network is small, but even on networks with many short loops we find them to be highly accurate when compared with direct numerical simulations. By considering the fixed points of the message passing process, we also show that the percolation threshold on a network with few loops is given by the inverse of the leading eigenvalue of the so-called non-backtracking matrix.
△ Less
Submitted 7 October, 2014; v1 submitted 2 May, 2014;
originally announced May 2014.