Search | arXiv e-print repository

The CLRS-Text Algorithmic Reasoning Language Benchmark

Authors: Larisa Markeeva, Sean McLeish, Borja Ibarz, Wilfried Bounsi, Olga Kozlova, Alex Vitvitskyi, Charles Blundell, Tom Goldstein, Avi Schwarzschild, Petar Veličković

Abstract: Eliciting reasoning capabilities from language models (LMs) is a critical direction on the path towards building intelligent systems. Most recent studies dedicated to reasoning focus on out-of-distribution performance on procedurally-generated synthetic benchmarks, bespoke-built to evaluate specific skills only. This trend makes results hard to transfer across publications, slowing down progress.… ▽ More Eliciting reasoning capabilities from language models (LMs) is a critical direction on the path towards building intelligent systems. Most recent studies dedicated to reasoning focus on out-of-distribution performance on procedurally-generated synthetic benchmarks, bespoke-built to evaluate specific skills only. This trend makes results hard to transfer across publications, slowing down progress. Three years ago, a similar issue was identified and rectified in the field of neural algorithmic reasoning, with the advent of the CLRS benchmark. CLRS is a dataset generator comprising graph execution traces of classical algorithms from the Introduction to Algorithms textbook. Inspired by this, we propose CLRS-Text -- a textual version of these algorithmic traces. Out of the box, CLRS-Text is capable of procedurally generating trace data for thirty diverse, challenging algorithmic tasks across any desirable input distribution, while offering a standard pipeline in which any additional algorithmic tasks may be created in the benchmark. We fine-tune and evaluate various LMs as generalist executors on this benchmark, validating prior work and revealing a novel, interesting challenge for the LM reasoning community. Our code is available at https://github.com/google-deepmind/clrs/tree/master/clrs/_src/clrs_text. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Preprint, under review. Comments welcome

arXiv:2406.02362 [pdf, other]

Temporal Graph Rewiring with Expander Graphs

Authors: Katarina Petrović, Shenyang Huang, Farimah Poursafaei, Petar Veličković

Abstract: Evolving relations in real-world networks are often modelled by temporal graphs. Graph rewiring techniques have been utilised on Graph Neural Networks (GNNs) to improve expressiveness and increase model performance. In this work, we propose Temporal Graph Rewiring (TGR), the first approach for graph rewiring on temporal graphs. TGR enables communication between temporally distant nodes in a contin… ▽ More Evolving relations in real-world networks are often modelled by temporal graphs. Graph rewiring techniques have been utilised on Graph Neural Networks (GNNs) to improve expressiveness and increase model performance. In this work, we propose Temporal Graph Rewiring (TGR), the first approach for graph rewiring on temporal graphs. TGR enables communication between temporally distant nodes in a continuous time dynamic graph by utilising expander graph propagation to construct a message passing highway for message passing between distant nodes. Expander graphs are suitable candidates for rewiring as they help overcome the oversquashing problem often observed in GNNs. On the public tgbl-wiki benchmark, we show that TGR improves the performance of a widely used TGN model by a significant margin. Our code repository is accessible at https://github.com/kpetrovicc/TGR.git . △ Less

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 10 pages, 2 figures

arXiv:2402.15332 [pdf, ps, other]

Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

Authors: Bruno Gavranović, Paul Lessard, Andrew Dudzik, Tamara von Glehn, João G. M. Araújo, Petar Veličković

Abstract: We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building a such a bridge, we propose to apply category theory -- precisely, the univers… ▽ More We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building a such a bridge, we propose to apply category theory -- precisely, the universal algebra of monads valued in a 2-category of parametric maps -- as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory. △ Less

Submitted 5 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: To appear in ICML 2024. Comments welcome. More info at categoricaldeeplearning.com

arXiv:2402.08871 [pdf, other]

Position: Topological Deep Learning is the New Frontier for Relational Learning

Authors: Theodore Papamarkou, Tolga Birdal, Michael Bronstein, Gunnar Carlsson, Justin Curry, Yue Gao, Mustafa Hajij, Roland Kwitt, Pietro Liò, Paolo Di Lorenzo, Vasileios Maroulas, Nina Miolane, Farzana Nasrin, Karthikeyan Natesan Ramamurthy, Bastian Rieck, Simone Scardapane, Michael T. Schaub, Petar Veličković, Bei Wang, Yusu Wang, Guo-Wei Wei, Ghada Zamzmi

Abstract: Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning setting… ▽ More Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field. △ Less

Submitted 6 August, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

arXiv:2310.10553 [pdf, other]

TacticAI: an AI assistant for football tactics

Authors: Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls

Abstract: Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing co… ▽ More Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI's model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning. △ Less

Submitted 17 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 32 pages, 10 figures

arXiv:2307.08874 [pdf, other]

Latent Space Representations of Neural Algorithmic Reasoners

Authors: Vladimir V. Mirjanić, Razvan Pascanu, Petar Veličković

Abstract: Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, usually by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed during the execution of the algorithm. In this work… ▽ More Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, usually by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed during the execution of the algorithm. In this work we perform a detailed analysis of the structure of the latent space induced by the GNN when executing algorithms. We identify two possible failure modes: (i) loss of resolution, making it hard to distinguish similar values; (ii) inability to deal with values outside the range observed during training. We propose to solve the first issue by relying on a softmax aggregator, and propose to decay the latent space in order to deal with out-of-range values. We show that these changes lead to improvements on the majority of algorithms in the standard CLRS-30 benchmark when using the state-of-the-art Triplet-GMPNN processor. Our code is available at https://github.com/mirjanic/nar-latent-spaces △ Less

Submitted 29 April, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 24 pages, 19 figures; Accepted at the Second Learning on Graphs Conference (LoG 2023); updated layout, reorganized content, added journal reference

Journal ref: PMLR 231:10:1-10:24, 2024

arXiv:2306.03589 [pdf, other]

How does over-squashing affect the power of GNNs?

Authors: Francesco Di Giovanni, T. Konstantin Rusch, Michael M. Bronstein, Andreea Deac, Marc Lackenby, Siddhartha Mishra, Petar Veličković

Abstract: Graph Neural Networks (GNNs) are the state-of-the-art model for machine learning on graph-structured data. The most popular class of GNNs operate by exchanging information between adjacent nodes, and are known as Message Passing Neural Networks (MPNNs). Given their widespread use, understanding the expressive power of MPNNs is a key question. However, existing results typically consider settings w… ▽ More Graph Neural Networks (GNNs) are the state-of-the-art model for machine learning on graph-structured data. The most popular class of GNNs operate by exchanging information between adjacent nodes, and are known as Message Passing Neural Networks (MPNNs). Given their widespread use, understanding the expressive power of MPNNs is a key question. However, existing results typically consider settings with uninformative node features. In this paper, we provide a rigorous analysis to determine which function classes of node features can be learned by an MPNN of a given capacity. We do so by measuring the level of pairwise interactions between nodes that MPNNs allow for. This measure provides a novel quantitative characterization of the so-called over-squashing effect, which is observed to occur when a large volume of messages is aggregated into fixed-size vectors. Using our measure, we prove that, to guarantee sufficient communication between pairs of nodes, the capacity of the MPNN must be large enough, depending on properties of the input graph structure, such as commute times. For many relevant scenarios, our analysis results in impossibility statements in practice, showing that over-squashing hinders the expressive power of MPNNs. We validate our theoretical findings through extensive controlled experiments and ablation studies. △ Less

Submitted 12 February, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 36 pages; Published in Transactions on Machine Learning Research (TMLR)

arXiv:2302.10258 [pdf, other]

Neural Algorithmic Reasoning with Causal Regularisation

Authors: Beatrice Bevilacqua, Kyriacos Nikiforou, Borja Ibarz, Ioana Bica, Michela Paganini, Charles Blundell, Jovana Mitrovic, Petar Veličković

Abstract: Recent work on neural algorithmic reasoning has investigated the reasoning capabilities of neural networks, effectively demonstrating they can learn to execute classical algorithms on unseen data coming from the train distribution. However, the performance of existing neural reasoners significantly degrades on out-of-distribution (OOD) test data, where inputs have larger sizes. In this work, we ma… ▽ More Recent work on neural algorithmic reasoning has investigated the reasoning capabilities of neural networks, effectively demonstrating they can learn to execute classical algorithms on unseen data coming from the train distribution. However, the performance of existing neural reasoners significantly degrades on out-of-distribution (OOD) test data, where inputs have larger sizes. In this work, we make an important observation: there are many different inputs for which an algorithm will perform certain intermediate computations identically. This insight allows us to develop data augmentation procedures that, given an algorithm's intermediate trajectory, produce inputs for which the target algorithm would have exactly the same next trajectory step. We ensure invariance in the next-step prediction across such inputs, by employing a self-supervised objective derived by our observation, formalised in a causal graph. We prove that the resulting method, which we call Hint-ReLIC, improves the OOD generalisation capabilities of the reasoner. We evaluate our method on the CLRS algorithmic reasoning benchmark, where we show up to 3$\times$ improvements on the OOD test data. △ Less

Submitted 3 July, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: ICML 2023, Camera Ready; 17 pages, 7 figures

arXiv:2301.08210 [pdf, other]

doi 10.1016/j.sbi.2023.102538

Everything is Connected: Graph Neural Networks

Authors: Petar Veličković

Abstract: In many ways, graphs are the main modality of data we receive from nature. This is due to the fact that most of the patterns we see, both in natural and artificial systems, are elegantly representable using the language of graph structures. Prominent examples include molecules (represented as graphs of atoms and bonds), social networks and transportation networks. This potential has already been s… ▽ More In many ways, graphs are the main modality of data we receive from nature. This is due to the fact that most of the patterns we see, both in natural and artificial systems, are elegantly representable using the language of graph structures. Prominent examples include molecules (represented as graphs of atoms and bonds), social networks and transportation networks. This potential has already been seen by key scientific and industrial groups, with already-impacted application areas including traffic forecasting, drug discovery, social network analysis and recommender systems. Further, some of the most successful domains of application for machine learning in previous years -- images, text and speech processing -- can be seen as special cases of graph representation learning, and consequently there has been significant exchange of information between these areas. The main aim of this short survey is to enable the reader to assimilate the key concepts in the area, and position graph representation learning in a proper context with related fields. △ Less

Submitted 19 January, 2023; originally announced January 2023.

Comments: To appear in Current Opinion in Structural Biology. 14 pages, 1 figure

Journal ref: Current Opinion in Structural Biology Volume 79, April 2023, 102538

arXiv:2212.08541 [pdf, other]

Learnable Commutative Monoids for Graph Neural Networks

Authors: Euan Ong, Petar Veličković

Abstract: Graph neural networks (GNNs) have been shown to be highly sensitive to the choice of aggregation function. While summing over a node's neighbours can approximate any permutation-invariant function over discrete inputs, Cohen-Karlik et al. [2020] proved there are set-aggregation problems for which summing cannot generalise to unbounded inputs, proposing recurrent neural networks regularised towards… ▽ More Graph neural networks (GNNs) have been shown to be highly sensitive to the choice of aggregation function. While summing over a node's neighbours can approximate any permutation-invariant function over discrete inputs, Cohen-Karlik et al. [2020] proved there are set-aggregation problems for which summing cannot generalise to unbounded inputs, proposing recurrent neural networks regularised towards permutation-invariance as a more expressive aggregator. We show that these results carry over to the graph domain: GNNs equipped with recurrent aggregators are competitive with state-of-the-art permutation-invariant aggregators, on both synthetic benchmarks and real-world problems. However, despite the benefits of recurrent aggregators, their $O(V)$ depth makes them both difficult to parallelise and harder to train on large graphs. Inspired by the observation that a well-behaved aggregator for a GNN is a commutative monoid over its latent space, we propose a framework for constructing learnable, commutative, associative binary operators. And with this, we construct an aggregator of $O(\log V)$ depth, yielding exponential improvements for both parallelism and dependency length while achieving performance competitive with recurrent aggregators. Based on our empirical observations, our proposed learnable commutative monoid (LCM) aggregator represents a favourable tradeoff between efficient and expressive aggregators. △ Less

Submitted 16 December, 2022; originally announced December 2022.

Comments: Accepted to the proceedings of the First Learning on Graphs Conference (LoG 2022)

arXiv:2210.02997 [pdf, other]

Expander Graph Propagation

Authors: Andreea Deac, Marc Lackenby, Petar Veličković

Abstract: Deploying graph neural networks (GNNs) on whole-graph classification or regression tasks is known to be challenging: it often requires computing node features that are mindful of both local interactions in their neighbourhood and the global context of the graph structure. GNN architectures that navigate this space need to avoid pathological behaviours, such as bottlenecks and oversquashing, while… ▽ More Deploying graph neural networks (GNNs) on whole-graph classification or regression tasks is known to be challenging: it often requires computing node features that are mindful of both local interactions in their neighbourhood and the global context of the graph structure. GNN architectures that navigate this space need to avoid pathological behaviours, such as bottlenecks and oversquashing, while ideally having linear time and space complexity requirements. In this work, we propose an elegant approach based on propagating information over expander graphs. We leverage an efficient method for constructing expander graphs of a given size, and use this insight to propose the EGP model. We show that EGP is able to address all of the above concerns, while requiring minimal effort to set up, and provide evidence of its empirical utility on relevant graph classification datasets and baselines in the Open Graph Benchmark. Importantly, using expander graphs as a template for message passing necessarily gives rise to negative curvature. While this appears to be counterintuitive in light of recent related work on oversquashing, we theoretically demonstrate that negatively curved edges are likely to be required to obtain scalable message passing without bottlenecks. To the best of our knowledge, this is a previously unstudied result in the context of graph representation learning, and we believe our analysis paves the way to a novel class of scalable methods to counter oversquashing in GNNs. △ Less

Submitted 21 December, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: Presented at LoG 2022. Best Paper Award at the NeurIPS 2022 Workshop on New Frontiers in Graph Learning (GLFrontiers). 18 pages, 1 figure

arXiv:2209.11142 [pdf, other]

A Generalist Neural Algorithmic Learner

Authors: Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova, Andreea Deac, Beatrice Bevilacqua, Yaroslav Ganin, Charles Blundell, Petar Veličković

Abstract: The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms… ▽ More The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models. △ Less

Submitted 3 December, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: To appear at LoG 2022 (Spotlight talk). 23 pages, 11 figures

arXiv:2206.11941 [pdf, other]

Affinity-Aware Graph Networks

Authors: Ameya Velingker, Ali Kemal Sinop, Ira Ktena, Petar Veličković, Sreenivas Gollapudi

Abstract: Graph Neural Networks (GNNs) have emerged as a powerful technique for learning on relational data. Owing to the relatively limited number of message passing steps they perform -- and hence a smaller receptive field -- there has been significant interest in improving their expressivity by incorporating structural aspects of the underlying graph. In this paper, we explore the use of affinity measure… ▽ More Graph Neural Networks (GNNs) have emerged as a powerful technique for learning on relational data. Owing to the relatively limited number of message passing steps they perform -- and hence a smaller receptive field -- there has been significant interest in improving their expressivity by incorporating structural aspects of the underlying graph. In this paper, we explore the use of affinity measures as features in graph neural networks, in particular measures arising from random walks, including effective resistance, hitting and commute times. We propose message passing networks based on these features and evaluate their performance on a variety of node and graph property prediction tasks. Our architecture has lower computational complexity, while our features are invariant to the permutations of the underlying graph. The measures we compute allow the network to exploit the connectivity properties of the graph, thereby allowing us to outperform relevant benchmarks for a wide variety of tasks, often with significantly fewer message passing steps. On one of the largest publicly available graph regression datasets, OGB-LSC-PCQM4Mv1, we obtain the best known single-model validation MAE at the time of writing. △ Less

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2205.15659 [pdf, other]

The CLRS Algorithmic Reasoning Benchmark

Authors: Petar Veličković, Adrià Puigdomènech Badia, David Budden, Razvan Pascanu, Andrea Banino, Misha Dashevskiy, Raia Hadsell, Charles Blundell

Abstract: Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms. Several important works have investigated whether neural networks can effectively reason like algorithms, typically by learning to execute them. The common trend in the area, however, is to generate targeted kinds of algorithmic data to evaluate… ▽ More Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms. Several important works have investigated whether neural networks can effectively reason like algorithms, typically by learning to execute them. The common trend in the area, however, is to generate targeted kinds of algorithmic data to evaluate specific hypotheses, making results hard to transfer across publications, and increasing the barrier of entry. To consolidate progress and work towards unified evaluation, we propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook. Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms. We perform extensive experiments to demonstrate how several popular algorithmic reasoning baselines perform on these tasks, and consequently, highlight links to several open challenges. Our library is readily available at https://github.com/deepmind/clrs. △ Less

Submitted 4 June, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: To appear in ICML 2022. 19 pages, 4 figures

arXiv:2203.15544 [pdf, other]

Graph Neural Networks are Dynamic Programmers

Authors: Andrew Dudzik, Petar Veličković

Abstract: Recent advances in neural algorithmic reasoning with graph neural networks (GNNs) are propped up by the notion of algorithmic alignment. Broadly, a neural network will be better at learning to execute a reasoning task (in terms of sample complexity) if its individual components align well with the target algorithm. Specifically, GNNs are claimed to align with dynamic programming (DP), a general pr… ▽ More Recent advances in neural algorithmic reasoning with graph neural networks (GNNs) are propped up by the notion of algorithmic alignment. Broadly, a neural network will be better at learning to execute a reasoning task (in terms of sample complexity) if its individual components align well with the target algorithm. Specifically, GNNs are claimed to align with dynamic programming (DP), a general problem-solving strategy which expresses many polynomial-time algorithms. However, has this alignment truly been demonstrated and theoretically quantified? Here we show, using methods from category theory and abstract algebra, that there exists an intricate connection between GNNs and DP, going well beyond the initial observations over individual algorithms such as Bellman-Ford. Exposing this connection, we easily verify several prior findings in the literature, produce better-grounded GNN architectures for edge-centric tasks, and demonstrate empirical results on the CLRS algorithmic reasoning benchmark. We hope our exposition will serve as a foundation for building stronger algorithmically aligned GNNs. △ Less

Submitted 10 October, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: To appear at NeurIPS 2022. 18 pages, 4 figures

arXiv:2202.11097 [pdf, other]

Message passing all the way up

Authors: Petar Veličković

Abstract: The message passing framework is the foundation of the immense success enjoyed by graph neural networks (GNNs) in recent years. In spite of its elegance, there exist many problems it provably cannot solve over given input graphs. This has led to a surge of research on going "beyond message passing", building GNNs which do not suffer from those limitations -- a term which has become ubiquitous in r… ▽ More The message passing framework is the foundation of the immense success enjoyed by graph neural networks (GNNs) in recent years. In spite of its elegance, there exist many problems it provably cannot solve over given input graphs. This has led to a surge of research on going "beyond message passing", building GNNs which do not suffer from those limitations -- a term which has become ubiquitous in regular discourse. However, have those methods truly moved beyond message passing? In this position paper, I argue about the dangers of using this term -- especially when teaching graph representation learning to newcomers. I show that any function of interest we want to compute over graphs can, in all likelihood, be expressed using pairwise message passing -- just over a potentially modified graph, and argue how most practical implementations subtly do this kind of trick anyway. Hoping to initiate a productive discussion, I propose replacing "beyond message passing" with a more tame term, "augmented message passing". △ Less

Submitted 22 February, 2022; originally announced February 2022.

Comments: 10 pages, 3 figures

arXiv:2110.05442 [pdf, other]

Neural Algorithmic Reasoners are Implicit Planners

Authors: Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

Abstract: Implicit planning has emerged as an elegant technique for combining learned models of the world with end-to-end model-free reinforcement learning. We study the class of implicit planners inspired by value iteration, an algorithm that is guaranteed to yield perfect policies in fully-specified tabular environments. We find that prior approaches either assume that the environment is provided in such… ▽ More Implicit planning has emerged as an elegant technique for combining learned models of the world with end-to-end model-free reinforcement learning. We study the class of implicit planners inspired by value iteration, an algorithm that is guaranteed to yield perfect policies in fully-specified tabular environments. We find that prior approaches either assume that the environment is provided in such a tabular form -- which is highly restrictive -- or infer "local neighbourhoods" of states to run value iteration over -- for which we discover an algorithmic bottleneck effect. This effect is caused by explicitly running the planning algorithm based on scalar predictions in every state, which can be harmful to data efficiency if such scalars are improperly predicted. We propose eXecuted Latent Value Iteration Networks (XLVINs), which alleviate the above limitations. Our method performs all planning computations in a high-dimensional latent space, breaking the algorithmic bottleneck. It maintains alignment with value iteration by carefully leveraging neural graph-algorithmic reasoning and contrastive self-supervised learning. Across eight low-data settings -- including classical control, navigation and Atari -- XLVINs provide significant improvements to data efficiency against value iteration-based implicit planners, as well as relevant model-free baselines. Lastly, we empirically verify that XLVINs can closely align with value iteration. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: To appear at NeurIPS 2021 (Spotlight talk). 20 pages, 10 figures. arXiv admin note: text overlap with arXiv:2010.13146

arXiv:2109.04173 [pdf, other]

Relating Graph Neural Networks to Structural Causal Models

Authors: Matej Zečević, Devendra Singh Dhami, Petar Veličković, Kristian Kersting

Abstract: Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable, thus causal inference tries leveraging the exposed. Graph neural networks (GNN) as universal approximators on structured input pose a viable candidate for ca… ▽ More Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable, thus causal inference tries leveraging the exposed. Graph neural networks (GNN) as universal approximators on structured input pose a viable candidate for causal learning, suggesting a tighter integration with SCM. To this effect we present a theoretical analysis from first principles that establishes a more general view on neural-causal models, revealing several novel connections between GNN and SCM. We establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustration on simulations and standard benchmarks validate our theoretical proofs. △ Less

Submitted 22 October, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

Comments: Main paper: 12 pages, References: 2 pages, Appendix: 13 pages; Main paper: 4 figures, Appendix: 2 figures

arXiv:2107.09422 [pdf, other]

Large-scale graph representation learning with very deep GNNs and self-supervision

Authors: Ravichandra Addanki, Peter W. Battaglia, David Budden, Andreea Deac, Jonathan Godwin, Thomas Keck, Wai Lok Sibon Li, Alvaro Sanchez-Gonzalez, Jacklynn Stott, Shantanu Thakoor, Petar Veličković

Abstract: Effectively and efficiently deploying graph neural networks (GNNs) at scale remains one of the most challenging aspects of graph representation learning. Many powerful solutions have only ever been validated on comparatively small datasets, often with counter-intuitive outcomes -- a barrier which has been broken by the Open Graph Benchmark Large-Scale Challenge (OGB-LSC). We entered the OGB-LSC wi… ▽ More Effectively and efficiently deploying graph neural networks (GNNs) at scale remains one of the most challenging aspects of graph representation learning. Many powerful solutions have only ever been validated on comparatively small datasets, often with counter-intuitive outcomes -- a barrier which has been broken by the Open Graph Benchmark Large-Scale Challenge (OGB-LSC). We entered the OGB-LSC with two large-scale GNNs: a deep transductive node classifier powered by bootstrapping, and a very deep (up to 50-layer) inductive graph regressor regularised by denoising objectives. Our models achieved an award-level (top-3) performance on both the MAG240M and PCQM4M benchmarks. In doing so, we demonstrate evidence of scalable self-supervised graph representation learning, and utility of very deep GNNs -- both very important open issues. Our code is publicly available at: https://github.com/deepmind/deepmind-research/tree/master/ogb_lsc. △ Less

Submitted 20 July, 2021; originally announced July 2021.

Comments: To appear at KDD Cup 2021. 13 pages, 3 figures. All authors contributed equally

arXiv:2107.08881 [pdf, other]

Reasoning-Modulated Representations

Authors: Petar Veličković, Matko Bošnjak, Thomas Kipf, Alexander Lerchner, Raia Hadsell, Razvan Pascanu, Charles Blundell

Abstract: Neural networks leverage robust internal representations in order to generalise. Learning them is difficult, and often requires a large training set that covers the data distribution densely. We study a common setting where our task is not purely opaque. Indeed, very often we may have access to information about the underlying system (e.g. that observations must obey certain laws of physics) that… ▽ More Neural networks leverage robust internal representations in order to generalise. Learning them is difficult, and often requires a large training set that covers the data distribution densely. We study a common setting where our task is not purely opaque. Indeed, very often we may have access to information about the underlying system (e.g. that observations must obey certain laws of physics) that any "tabula rasa" neural network would need to re-learn from scratch, penalising performance. We incorporate this information into a pre-trained reasoning module, and investigate its role in shaping the discovered representations in diverse self-supervised learning settings from pixels. Our approach paves the way for a new class of representation learning, grounded in algorithmic priors. △ Less

Submitted 3 December, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: To appear at LoG 2022. 17 pages, 5 figures

arXiv:2105.02761 [pdf, other]

doi 10.1016/j.patter.2021.100273

Neural Algorithmic Reasoning

Authors: Petar Veličković, Charles Blundell

Abstract: Algorithms have been fundamental to recent global technological advances and, in particular, they have been the cornerstone of technical advances in one field rapidly being applied to another. We argue that algorithms possess fundamentally different qualities to deep learning methods, and this strongly suggests that, were deep learning methods better able to mimic algorithms, generalisation of the… ▽ More Algorithms have been fundamental to recent global technological advances and, in particular, they have been the cornerstone of technical advances in one field rapidly being applied to another. We argue that algorithms possess fundamentally different qualities to deep learning methods, and this strongly suggests that, were deep learning methods better able to mimic algorithms, generalisation of the sort seen with algorithms would become possible with deep learning -- something far out of the reach of current machine learning methods. Furthermore, by representing elements in a continuous space of learnt algorithms, neural networks are able to adapt known algorithms more closely to real-world problems, potentially finding more efficient and pragmatic solutions than those proposed by human computer scientists. Here we present neural algorithmic reasoning -- the art of building neural networks that are able to execute algorithmic computation -- and provide our opinion on its transformative potential for running classical algorithms on inputs previously considered inaccessible to them. △ Less

Submitted 6 May, 2021; originally announced May 2021.

Comments: Accepted as an Opinion paper in Patterns. 7 pages, 1 figure

arXiv:2104.13478 [pdf, other]

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Authors: Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković

Abstract: The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simpl… ▽ More The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provide principled way to build future architectures yet to be invented. △ Less

Submitted 2 May, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: 156 pages. Work in progress -- comments welcome!

arXiv:2103.01043 [pdf, other]

Persistent Message Passing

Authors: Heiko Strathmann, Mohammadamin Barekatain, Charles Blundell, Petar Veličković

Abstract: Graph neural networks (GNNs) are a powerful inductive bias for modelling algorithmic reasoning procedures and data structures. Their prowess was mainly demonstrated on tasks featuring Markovian dynamics, where querying any associated data structure depends only on its latest state. For many tasks of interest, however, it may be highly beneficial to support efficient data structure queries dependen… ▽ More Graph neural networks (GNNs) are a powerful inductive bias for modelling algorithmic reasoning procedures and data structures. Their prowess was mainly demonstrated on tasks featuring Markovian dynamics, where querying any associated data structure depends only on its latest state. For many tasks of interest, however, it may be highly beneficial to support efficient data structure queries dependent on previous states. This requires tracking the data structure's evolution through time, placing significant pressure on the GNN's latent representations. We introduce Persistent Message Passing (PMP), a mechanism which endows GNNs with capability of querying past state by explicitly persisting it: rather than overwriting node representations, it creates new nodes whenever required. PMP generalises out-of-distribution to more than 2x larger test inputs on dynamic temporal range queries, significantly outperforming GNNs which overwrite states. △ Less

Submitted 27 April, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: 7 pages, 2 figures. Published as a workshop paper at ICLR 2021 SimDL Workshop. Accepted at the ICLR 2021 Workshop on Geometrical and Topological Representation Learning

arXiv:2102.09544 [pdf, ps, other]

Combinatorial optimization and reasoning with graph neural networks

Authors: Quentin Cappart, Didier Chételat, Elias Khalil, Andrea Lodi, Christopher Morris, Petar Veličković

Abstract: Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building bloc… ▽ More Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input due to their invariance to permutations and awareness of input sparsity. This paper presents a conceptual review of recent key advancements in this emerging field, aiming at optimization and machine learning researchers. △ Less

Submitted 23 September, 2022; v1 submitted 18 February, 2021; originally announced February 2021.

Journal ref: Journal of Machine Learning Research, 24(130):1-61, 2023

arXiv:2102.06514 [pdf, other]

Large-Scale Representation Learning on Graphs via Bootstrapping

Authors: Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Veličković, Michal Valko

Abstract: Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootst… ▽ More Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input. BGRL uses only simple augmentations and alleviates the need for contrasting with negative examples, and is thus scalable by design. BGRL outperforms or matches prior methods on several established benchmarks, while achieving a 2-10x reduction in memory costs. Furthermore, we show that BGRL can be scaled up to extremely large graphs with hundreds of millions of nodes in the semi-supervised regime - achieving state-of-the-art performance and improving over supervised baselines where representations are shaped only through label information. In particular, our solution centered on BGRL constituted one of the winning entries to the Open Graph Benchmark - Large Scale Challenge at KDD Cup 2021, on a graph orders of magnitudes larger than all previously available benchmarks, thus demonstrating the scalability and effectiveness of our approach. △ Less

Submitted 20 February, 2023; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: Published as a conference paper at ICLR 2022

arXiv:2010.13146 [pdf, other]

XLVIN: eXecuted Latent Value Iteration Nets

Authors: Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

Abstract: Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics. This came with several limitations, however: the model is not incentivised in any way to perform meaningful planning computations, the underlying s… ▽ More Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics. This came with several limitations, however: the model is not incentivised in any way to perform meaningful planning computations, the underlying state space is assumed to be discrete, and the Markov decision process (MDP) is assumed fixed and known. We propose eXecuted Latent Value Iteration Networks (XLVINs), which combine recent developments across contrastive self-supervised learning, graph representation learning and neural algorithmic reasoning to alleviate all of the above limitations, successfully deploying VIN-style models on generic environments. XLVINs match the performance of VIN-like models when the underlying MDP is discrete, fixed and known, and provides significant improvements to model-free baselines across three general MDP setups. △ Less

Submitted 6 December, 2020; v1 submitted 25 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020 Deep Reinforcement Learning Workshop

arXiv:2007.12804 [pdf, other]

Hierarchical Protein Function Prediction with Tail-GNNs

Authors: Stefan Spalević, Petar Veličković, Jovana Kovačević, Mladen Nikolić

Abstract: Protein function prediction may be framed as predicting subgraphs (with certain closure properties) of a directed acyclic graph describing the hierarchy of protein functions. Graph neural networks (GNNs), with their built-in inductive bias for relational data, are hence naturally suited for this task. However, in contrast with most GNN applications, the graph is not related to the input, but to th… ▽ More Protein function prediction may be framed as predicting subgraphs (with certain closure properties) of a directed acyclic graph describing the hierarchy of protein functions. Graph neural networks (GNNs), with their built-in inductive bias for relational data, are hence naturally suited for this task. However, in contrast with most GNN applications, the graph is not related to the input, but to the label space. Accordingly, we propose Tail-GNNs, neural networks which naturally compose with the output space of any neural network for multi-task prediction, to provide relationally-reinforced labels. For protein function prediction, we combine a Tail-GNN with a dilated convolutional network which learns representations of the protein sequence, making significant improvement in F_1 score and demonstrating the ability of Tail-GNNs to learn useful representations of labels and exploit them in real-world problem solving. △ Less

Submitted 24 July, 2020; originally announced July 2020.

Journal ref: ICML 2020 - Workshops - Graph RepresGraph Representation Learning and Beyond (GRL+)

arXiv:2006.06380 [pdf, other]

Pointer Graph Networks

Authors: Petar Veličković, Lars Buesing, Matthew C. Overlan, Razvan Pascanu, Oriol Vinyals, Charles Blundell

Abstract: Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. In absence of reliable domain expertise, one might resort to inferring the latent graph structure, which is often difficult due… ▽ More Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. In absence of reliable domain expertise, one might resort to inferring the latent graph structure, which is often difficult due to the vast search space of possible graphs. Here we introduce Pointer Graph Networks (PGNs) which augment sets or graphs with additional inferred edges for improved model generalisation ability. PGNs allow each node to dynamically point to another node, followed by message passing over these pointers. The sparsity of this adaptable graph structure makes learning tractable while still being sufficiently expressive to simulate complex algorithms. Critically, the pointing mechanism is directly supervised to model long-term sequences of operations on classical data structures, incorporating useful structural inductive biases from theoretical computer science. Qualitatively, we demonstrate that PGNs can learn parallelisable variants of pointer-based data structures, namely disjoint set unions and link/cut trees. PGNs generalise out-of-distribution to 5x larger test inputs on dynamic graph connectivity tasks, outperforming unrestricted GNNs and Deep Sets. △ Less

Submitted 18 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: To appear at NeurIPS 2020 (Spotlight talk)

arXiv:2004.05718 [pdf, other]

Principal Neighbourhood Aggregation for Graph Nets

Authors: Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, Petar Veličković

Abstract: Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data. Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces. We extend this theoretical framework to include continuous features - which occur regularly in real-world input domains and within the hidden layers of GNNs - and we demonstr… ▽ More Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data. Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces. We extend this theoretical framework to include continuous features - which occur regularly in real-world input domains and within the hidden layers of GNNs - and we demonstrate the requirement for multiple aggregation functions in this context. Accordingly, we propose Principal Neighbourhood Aggregation (PNA), a novel architecture combining multiple aggregators with degree-scalers (which generalize the sum aggregator). Finally, we compare the capacity of different models to capture and exploit the graph structure via a novel benchmark containing multiple tasks taken from classical graph theory, alongside existing benchmarks from real-world domains, all of which demonstrate the strength of our model. With this work, we hope to steer some of the GNN research towards new aggregation methods which we believe are essential in the search for powerful and robust models. △ Less

Submitted 31 December, 2020; v1 submitted 12 April, 2020; originally announced April 2020.

Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

arXiv:1910.10593 [pdf, other]

Neural Execution of Graph Algorithms

Authors: Petar Veličković, Rex Ying, Matilde Padovano, Raia Hadsell, Charles Blundell

Abstract: Graph Neural Networks (GNNs) are a powerful representational tool for solving problems on graph-structured inputs. In almost all cases so far, however, they have been applied to directly recovering a final solution from raw inputs, without explicit guidance on how to structure their problem-solving. Here, instead, we focus on learning in the space of algorithms: we train several state-of-the-art G… ▽ More Graph Neural Networks (GNNs) are a powerful representational tool for solving problems on graph-structured inputs. In almost all cases so far, however, they have been applied to directly recovering a final solution from raw inputs, without explicit guidance on how to structure their problem-solving. Here, instead, we focus on learning in the space of algorithms: we train several state-of-the-art GNN architectures to imitate individual steps of classical graph algorithms, parallel (breadth-first search, Bellman-Ford) as well as sequential (Prim's algorithm). As graph algorithms usually rely on making discrete decisions within neighbourhoods, we hypothesise that maximisation-based message passing neural networks are best-suited for such objectives, and validate this claim empirically. We also demonstrate how learning in the space of algorithms can yield new opportunities for positive transfer between tasks---showing how learning a shortest-path algorithm can be substantially improved when simultaneously learning a reachability algorithm. △ Less

Submitted 15 January, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: To appear at ICLR 2020. 13 pages, 4 figures

arXiv:1905.00534 [pdf, other]

Drug-Drug Adverse Effect Prediction with Graph Co-Attention

Authors: Andreea Deac, Yu-Hsiang Huang, Petar Veličković, Pietro Liò, Jian Tang

Abstract: Complex or co-existing diseases are commonly treated using drug combinations, which can lead to higher risk of adverse side effects. The detection of polypharmacy side effects is usually done in Phase IV clinical trials, but there are still plenty which remain undiscovered when the drugs are put on the market. Such accidents have been affecting an increasing proportion of the population (15% in th… ▽ More Complex or co-existing diseases are commonly treated using drug combinations, which can lead to higher risk of adverse side effects. The detection of polypharmacy side effects is usually done in Phase IV clinical trials, but there are still plenty which remain undiscovered when the drugs are put on the market. Such accidents have been affecting an increasing proportion of the population (15% in the US now) and it is thus of high interest to be able to predict the potential side effects as early as possible. Systematic combinatorial screening of possible drug-drug interactions (DDI) is challenging and expensive. However, the recent significant increases in data availability from pharmaceutical research and development efforts offer a novel paradigm for recovering relevant insights for DDI prediction. Accordingly, several recent approaches focus on curating massive DDI datasets (with millions of examples) and training machine learning models on them. Here we propose a neural network architecture able to set state-of-the-art results on this task---using the type of the side-effect and the molecular structure of the drugs alone---by leveraging a co-attentional mechanism. In particular, we show the importance of integrating joint information from the drug pairs early on when learning each drug's representation. △ Less

Submitted 1 May, 2019; originally announced May 2019.

Comments: 8 pages, 5 figures

arXiv:1904.06316 [pdf, other]

Spatio-Temporal Deep Graph Infomax

Authors: Felix L. Opolka, Aaron Solomon, Cătălina Cangea, Petar Veličković, Pietro Liò, R Devon Hjelm

Abstract: Spatio-temporal graphs such as traffic networks or gene regulatory systems present challenges for the existing deep learning methods due to the complexity of structural changes over time. To address these issues, we introduce Spatio-Temporal Deep Graph Infomax (STDGI)---a fully unsupervised node representation learning approach based on mutual information maximization that exploits both the tempor… ▽ More Spatio-temporal graphs such as traffic networks or gene regulatory systems present challenges for the existing deep learning methods due to the complexity of structural changes over time. To address these issues, we introduce Spatio-Temporal Deep Graph Infomax (STDGI)---a fully unsupervised node representation learning approach based on mutual information maximization that exploits both the temporal and spatial dynamics of the graph. Our model tackles the challenging task of node-level regression by training embeddings to maximize the mutual information between patches of the graph, at any given time step, and between features of the central nodes of patches, in the future. We demonstrate through experiments and qualitative studies that the learned representations can successfully encode relevant information about the input graph and improve the predictive performance of spatio-temporal auto-regressive forecasting models. △ Less

Submitted 12 April, 2019; originally announced April 2019.

Comments: 6 pages, 2 figures, Representation Learning on Graphs and Manifolds Workshop of the International Conference on Learning Representations (ICLR)

arXiv:1901.03906 [pdf, other]

doi 10.1371/journal.pone.0228962

ChronoMID - Cross-Modal Neural Networks for 3-D Temporal Medical Imaging Data

Authors: Alexander G. Rakowski, Petar Veličković, Enrico Dall'Ara, Pietro Liò

Abstract: ChronoMID builds on the success of cross-modal convolutional neural networks (X-CNNs), making the novel application of the technique to medical imaging data. Specifically, this paper presents and compares alternative approaches - timestamps and difference images - to incorporate temporal information for the classification of bone disease in mice, applied to micro-CT scans of mouse tibiae. Whilst m… ▽ More ChronoMID builds on the success of cross-modal convolutional neural networks (X-CNNs), making the novel application of the technique to medical imaging data. Specifically, this paper presents and compares alternative approaches - timestamps and difference images - to incorporate temporal information for the classification of bone disease in mice, applied to micro-CT scans of mouse tibiae. Whilst much previous work on diseases and disease classification has been based on mathematical models incorporating domain expertise and the explicit encoding of assumptions, the approaches given here utilise the growing availability of computing resources to analyse large datasets and uncover subtle patterns in both space and time. After training on a balanced set of over 75000 images, all models incorporating temporal features outperformed a state-of-the-art CNN baseline on an unseen, balanced validation set comprising over 20000 images. The top-performing model achieved 99.54% accuracy, compared to 73.02% for the CNN baseline. △ Less

Submitted 12 January, 2019; originally announced January 2019.

Comments: 8 pages, 7 figures, 2 tables

ACM Class: J.3; I.4.0; I.2.6; G.3

arXiv:1811.01287 [pdf, other]

Towards Sparse Hierarchical Graph Classifiers

Authors: Cătălina Cangea, Petar Veličković, Nikola Jovanović, Thomas Kipf, Pietro Liò

Abstract: Recent advances in representation learning on graphs, mainly leveraging graph convolutional networks, have brought a substantial improvement on many graph-based benchmark tasks. While novel approaches to learning node embeddings are highly suitable for node classification and link prediction, their application to graph classification (predicting a single label for the entire graph) remains mostly… ▽ More Recent advances in representation learning on graphs, mainly leveraging graph convolutional networks, have brought a substantial improvement on many graph-based benchmark tasks. While novel approaches to learning node embeddings are highly suitable for node classification and link prediction, their application to graph classification (predicting a single label for the entire graph) remains mostly rudimentary, typically using a single global pooling step to aggregate node features or a hand-designed, fixed heuristic for hierarchical coarsening of the graph structure. An important step towards ameliorating this is differentiable graph coarsening---the ability to reduce the size of the graph in an adaptive, data-dependent manner within a graph neural network pipeline, analogous to image downsampling within CNNs. However, the previous prominent approach to pooling has quadratic memory requirements during training and is therefore not scalable to large graphs. Here we combine several recent advances in graph neural network design to demonstrate that competitive hierarchical graph classification results are possible without sacrificing sparsity. Our results are verified on several established graph classification benchmarks, and highlight an important direction for future research in graph-based neural networks. △ Less

Submitted 3 November, 2018; originally announced November 2018.

Comments: To appear in the Workshop on Relational Representation Learning (R2L) at NIPS 2018. 6 pages, 3 figures

arXiv:1809.10341 [pdf, other]

Deep Graph Infomax

Authors: Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R Devon Hjelm

Abstract: We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs---both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs ce… ▽ More We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs---both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to unsupervised learning with GCNs, DGI does not rely on random walk objectives, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning. △ Less

Submitted 21 December, 2018; v1 submitted 27 September, 2018; originally announced September 2018.

Comments: To appear at ICLR 2019. 17 pages, 8 figures

arXiv:1806.04398 [pdf, other]

doi 10.1089/cmb.2018.0175

Attentive cross-modal paratope prediction

Authors: Andreea Deac, Petar Veličković, Pietro Sormanni

Abstract: Antibodies are a critical part of the immune system, having the function of directly neutralising or tagging undesirable objects (the antigens) for future destruction. Being able to predict which amino acids belong to the paratope, the region on the antibody which binds to the antigen, can facilitate antibody design and contribute to the development of personalised medicine. The suitability of dee… ▽ More Antibodies are a critical part of the immune system, having the function of directly neutralising or tagging undesirable objects (the antigens) for future destruction. Being able to predict which amino acids belong to the paratope, the region on the antibody which binds to the antigen, can facilitate antibody design and contribute to the development of personalised medicine. The suitability of deep neural networks has recently been confirmed for this task, with Parapred outperforming all prior physical models. Our contribution is twofold: first, we significantly outperform the computational efficiency of Parapred by leveraging à trous convolutions and self-attention. Secondly, we implement cross-modal attention by allowing the antibody residues to attend over antigen residues. This leads to new state-of-the-art results on this task, along with insightful interpretations. △ Less

Submitted 12 June, 2018; originally announced June 2018.

Comments: To appear at the 2018 ICML/IJCAI Workshop on Computational Biology. 5 pages, 6 figures

arXiv:1805.00987 [pdf, other]

doi 10.1007/978-3-319-92537-0_7

Automatic Inference of Cross-modal Connection Topologies for X-CNNs

Authors: Laurynas Karazija, Petar Veličković, Pietro Liò

Abstract: This paper introduces a way to learn cross-modal convolutional neural network (X-CNN) architectures from a base convolutional network (CNN) and the training data to reduce the design cost and enable applying cross-modal networks in sparse data environments. Two approaches for building X-CNNs are presented. The base approach learns the topology in a data-driven manner, by using measurements perform… ▽ More This paper introduces a way to learn cross-modal convolutional neural network (X-CNN) architectures from a base convolutional network (CNN) and the training data to reduce the design cost and enable applying cross-modal networks in sparse data environments. Two approaches for building X-CNNs are presented. The base approach learns the topology in a data-driven manner, by using measurements performed on the base CNN and supplied data. The iterative approach performs further optimisation of the topology through a combined learning procedure, simultaneously learning the topology and training the network. The approaches were evaluated agains examples of hand-designed X-CNNs and their base variants, showing superior performance and, in some cases, gaining an additional 9% of accuracy. From further considerations, we conclude that the presented methodology takes less time than any manual approach would, whilst also significantly reducing the design complexity. The application of the methods is fully automated and implemented in Xsertion library. △ Less

Submitted 2 May, 2018; originally announced May 2018.

Comments: 10 pages, 3 figures, 2 tables, to appear in ISNN 2018

arXiv:1711.09159 [pdf, ps, other]

Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders

Authors: Momchil Peychev, Petar Veličković, Pietro Liò

Abstract: The notion of disentangled autoencoders was proposed as an extension to the variational autoencoder by introducing a disentanglement parameter $β$, controlling the learning pressure put on the possible underlying latent representations. For certain values of $β$ this kind of autoencoders is capable of encoding independent input generative factors in separate elements of the code, leading to a more… ▽ More The notion of disentangled autoencoders was proposed as an extension to the variational autoencoder by introducing a disentanglement parameter $β$, controlling the learning pressure put on the possible underlying latent representations. For certain values of $β$ this kind of autoencoders is capable of encoding independent input generative factors in separate elements of the code, leading to a more interpretable and predictable model behaviour. In this paper we quantify the effects of the parameter $β$ on the model performance and disentanglement. After training multiple models with the same value of $β$, we establish the existence of consistent variance in one of the disentanglement measures, proposed in literature. The negative consequences of the disentanglement to the autoencoder's discriminative ability are also asserted while varying the amount of examples available during training. △ Less

Submitted 24 November, 2017; originally announced November 2017.

Comments: Accepted to the Workshop on Learning Disentangled Representations at the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), 5 pages, 2 figures

arXiv:1710.10903 [pdf, other]

Graph Attention Networks

Authors: Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

Abstract: We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights t… ▽ More We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training). △ Less

Submitted 4 February, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Comments: To appear at ICLR 2018. 12 pages, 2 figures

arXiv:1709.08073 [pdf, other]

doi 10.1145/3240925.3240937

Cross-modal Recurrent Models for Weight Objective Prediction from Multimodal Time-series Data

Authors: Petar Veličković, Laurynas Karazija, Nicholas D. Lane, Sourav Bhattacharya, Edgar Liberis, Pietro Liò, Angela Chieh, Otmane Bellahsen, Matthieu Vegreville

Abstract: We analyse multimodal time-series data corresponding to weight, sleep and steps measurements. We focus on predicting whether a user will successfully achieve his/her weight objective. For this, we design several deep long short-term memory (LSTM) architectures, including a novel cross-modal LSTM (X-LSTM), and demonstrate their superiority over baseline approaches. The X-LSTM improves parameter eff… ▽ More We analyse multimodal time-series data corresponding to weight, sleep and steps measurements. We focus on predicting whether a user will successfully achieve his/her weight objective. For this, we design several deep long short-term memory (LSTM) architectures, including a novel cross-modal LSTM (X-LSTM), and demonstrate their superiority over baseline approaches. The X-LSTM improves parameter efficiency by processing each modality separately and allowing for information flow between them by way of recurrent cross-connections. We present a general hyperparameter optimisation technique for X-LSTMs, which allows us to significantly improve on the LSTM and a prior state-of-the-art cross-modal approach, using a comparable number of parameters. Finally, we visualise the model's predictions, revealing implications about latent variables in this task. △ Less

Submitted 29 November, 2017; v1 submitted 23 September, 2017; originally announced September 2017.

Comments: To appear in NIPS ML4H 2017 and NIPS TSW 2017

arXiv:1709.00572 [pdf, other]

doi 10.1109/TNNLS.2019.2945992

XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification

Authors: Cătălina Cangea, Petar Veličković, Pietro Liò

Abstract: In recent years, there have been numerous developments towards solving multimodal tasks, aiming to learn a stronger representation than through a single modality. Certain aspects of the data can be particularly useful in this case - for example, correlations in the space or time domain across modalities - but should be wisely exploited in order to benefit from their full predictive potential. We p… ▽ More In recent years, there have been numerous developments towards solving multimodal tasks, aiming to learn a stronger representation than through a single modality. Certain aspects of the data can be particularly useful in this case - for example, correlations in the space or time domain across modalities - but should be wisely exploited in order to benefit from their full predictive potential. We propose two deep learning architectures with multimodal cross-connections that allow for dataflow between several feature extractors (XFlow). Our models derive more interpretable features and achieve better performances than models which do not exchange representations, usefully exploiting correlations between audio and visual data, which have a different dimensionality and are nontrivially exchangeable. Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for performing cross-modality (before features are learned from individual modalities) and (2) extends the previously proposed cross-connections which only transfer information between streams that process compatible data. Illustrating some of the representations learned by the connections, we analyse their contribution to the increase in discrimination ability and reveal their compatibility with a lip-reading network intermediate representation. We provide the research community with Digits, a new dataset consisting of three data types extracted from videos of people saying the digits 0-9. Results show that both cross-modal architectures outperform their baselines (by up to 11.5%) when evaluated on the AVletters, CUAVE and Digits datasets, achieving state-of-the-art results. △ Less

Submitted 12 April, 2019; v1 submitted 2 September, 2017; originally announced September 2017.

Comments: Accepted at the IEEE ICDL-EPIROB 2017 Workshop on Computational Models for Crossmodal Learning (CMCML), 4 pages, 6 figures

arXiv:1610.00163 [pdf, other]

doi 10.1109/SSCI.2016.7849978

X-CNN: Cross-modal Convolutional Neural Networks for Sparse Datasets

Authors: Petar Veličković, Duo Wang, Nicholas D. Lane, Pietro Liò

Abstract: In this paper we propose cross-modal convolutional neural networks (X-CNNs), a novel biologically inspired type of CNN architectures, treating gradient descent-specialised CNNs as individual units of processing in a larger-scale network topology, while allowing for unconstrained information flow and/or weight sharing between analogous hidden layers of the network---thus generalising the already we… ▽ More In this paper we propose cross-modal convolutional neural networks (X-CNNs), a novel biologically inspired type of CNN architectures, treating gradient descent-specialised CNNs as individual units of processing in a larger-scale network topology, while allowing for unconstrained information flow and/or weight sharing between analogous hidden layers of the network---thus generalising the already well-established concept of neural network ensembles (where information typically may flow only between the output layers of the individual networks). The constituent networks are individually designed to learn the output function on their own subset of the input data, after which cross-connections between them are introduced after each pooling operation to periodically allow for information exchange between them. This injection of knowledge into a model (by prior partition of the input data through domain knowledge or unsupervised methods) is expected to yield greatest returns in sparse data environments, which are typically less suitable for training CNNs. For evaluation purposes, we have compared a standard four-layer CNN as well as a sophisticated FitNet4 architecture against their cross-modal variants on the CIFAR-10 and CIFAR-100 datasets with differing percentages of the training data being removed, and find that at lower levels of data availability, the X-CNNs significantly outperform their baselines (typically providing a 2--6% benefit, depending on the dataset size and whether data augmentation is used), while still maintaining an edge on all of the full dataset tests. △ Less

Submitted 17 October, 2016; v1 submitted 1 October, 2016; originally announced October 2016.

Comments: To appear in the 7th IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2016), 8 pages, 6 figures. Minor revisions, in response to reviewers' comments

Showing 1–42 of 42 results for author: Veličković, P