STAGE: Simplified Text-Attributed Graph Embeddings Using Pre-trained LLMs

Authors: Aaron Zolnai-Lucas, Jack Boylan, Chris Hokamp, Parsa Ghaffari

Abstract: We present Simplified Text-Attributed Graph Embeddings (STAGE), a straightforward yet effective method for enhancing node features in Graph Neural Network (GNN) models that encode Text-Attributed Graphs (TAGs). Our approach leverages Large-Language Models (LLMs) to generate embeddings for textual attributes. STAGE achieves competitive results on various node classification benchmarks while also maintaining a simplicity in implementation relative to current state-of-the-art (SoTA) techniques. We show that utilizing pre-trained LLMs as embedding generators provides robust features for ensemble GNN training, enabling pipelines that are simpler than current SoTA approaches which require multiple expensive training and prompting stages. We also implement diffusion-pattern GNNs in an effort to make this pipeline scalable to graphs beyond academic benchmarks.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2405.03726 [pdf]

sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures

Authors: Andac Demir, Elizaveta Solovyeva, James Boylan, Mei Xiao, Fabrizio Serluca, Sebastian Hoersch, Jeremy Jenkins, Murthy Devarakonda, Bulent Kiziltan

Abstract: Influenced by breakthroughs in LLMs, single-cell foundation models are emerging. While these models show successful performance in cell type clustering, phenotype classification, and gene perturbation response prediction, it remains to be seen if a simpler model could achieve comparable or better results, especially with limited data. This is important, as the quantity and quality of single-cell data typically fall short of the standards in textual data used for training LLMs. Single-cell sequencing often suffers from technical artifacts, dropout events, and batch effects. These challenges are compounded in a weakly supervised setting, where the labels of cell states can be noisy, further complicating the analysis. To tackle these challenges, we present sc-OTGM, streamlined with less than 500K parameters, making it approximately 100x more compact than the foundation models, offering an efficient alternative. sc-OTGM is an unsupervised model grounded in the inductive bias that the scRNAseq data can be generated from a combination of the finite multivariate Gaussian distributions. The core function of sc-OTGM is to create a probabilistic latent space utilizing a GMM as its prior distribution and distinguish between distinct cell populations by learning their respective marginal PDFs. It uses a Hit-and-Run Markov chain sampler to determine the OT plan across these PDFs within the GMM framework. We evaluated our model against a CRISPR-mediated perturbation dataset, called CROP-seq, consisting of 57 one-gene perturbations.
Our results demonstrate that sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification through a recommender system. It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.

Submitted 6 May, 2024; originally announced May 2024.

Comments: ICLR 2024, Machine Learning for Genomics Explorations Workshop

arXiv:2404.15923 [pdf, other]

KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction

Authors: Jack Boylan, Shashank Mangla, Dominic Thorn, Demian Gholipour Ghalandari, Parsa Ghaffari, Chris Hokamp

Abstract: This study explores the use of Large Language Models (LLMs) for automatic evaluation of knowledge graph (KG) completion models. Historically, validating information in KGs has been a challenging task, requiring large-scale human annotation at prohibitive cost. With the emergence of general-purpose generative AI and LLMs, it is now plausible that human-in-the-loop validation could be replaced by a generative agent. We introduce a framework for consistency and validation when using generative models to validate knowledge graphs. Our framework is based upon recent open-source developments for structural and semantic validation of LLM outputs, and upon flexible approaches to fact checking and verification, supported by the capacity to reference external knowledge sources of any kind. The design is easy to adapt and extend, and can be used to verify any kind of graph-structured data through a combination of model-intrinsic knowledge, user-supplied context, and agents capable of external knowledge retrieval.

Submitted 24 April, 2024; originally announced April 2024.

Comments: Text2KG 2024, ESWC 2024

arXiv:2303.14217 [pdf, other]

doi 10.1080/01605682.2023.2253852

Operational Research: Methods and Applications

Authors: Fotios Petropoulos, Gilbert Laporte, Emel Aktas, Sibel A. Alumur, Claudia Archetti, Hayriye Ayhan, Maria Battarra, Julia A. Bennell, Jean-Marie Bourjolly, John E. Boylan, Michèle Breton, David Canca, Laurent Charlin, Bo Chen, Cihan Tugrul Cicek, Louis Anthony Cox Jr, Christine S. M. Currie, Erik Demeulemeester, Li Ding, Stephen M. Disney, Matthias Ehrgott, Martin J. Eppler, Güneş Erdoğan, Bernard Fortz, L. Alberto Franco, et al. (57 additional authors not shown)

Abstract: Throughout its history, Operational Research has evolved to include a variety of methods, models and algorithms that have been applied to a diverse and wide range of contexts. This encyclopedic article consists of two main sections: methods and applications. The first aims to summarise the up-to-date knowledge and provide an overview of the state-of-the-art methods and key developments in the various subdomains of the field. The second offers a wide-ranging list of areas where Operational Research has been applied. The article is meant to be read in a nonlinear fashion. It should be used as a point of reference or first-port-of-call for a diverse pool of readers: academics, researchers, students, and practitioners. The entries within the methods and applications sections are presented in alphabetical order. The authors dedicate this paper to the 2023 Turkey/Syria earthquake victims. We sincerely hope that advances in OR will play a role towards minimising the pain and suffering caused by this and future catastrophes.

Submitted 13 January, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

Journal ref: Journal of the Operational Research Society (2024) 75(3)

arXiv:2107.14512 [pdf, ps, other]

Forecasting and its Beneficiaries

Authors: Bahman Rostami-Tabar, John E. Boylan

Abstract: This chapter addresses the question of who benefits from forecasting, using Forecasting for Social Good as a motivating framework. Barriers to broadening the base of beneficiaries are identified, and some parallels are drawn with similar concerns that were expressed in the Operational Research literature some years ago. A recent initiative, called Democratising Forecasting, is discussed, highlighting its achievements, challenges, limitations and future agenda. Communication issues between the major forecasting stakeholders are also examined, with pointers being given for more effective communications, in order to gain the greatest benefits.

Submitted 30 July, 2021; originally announced July 2021.

Comments: 24

arXiv:2012.03854 [pdf, other]

doi 10.1016/j.ijforecast.2021.11.001

Forecasting: theory and practice

Authors: Fotios Petropoulos, Daniele Apiletti, Vassilios Assimakopoulos, Mohamed Zied Babai, Devon K. Barrow, Souhaib Ben Taieb, Christoph Bergmeir, Ricardo J. Bessa, Jakub Bijak, John E. Boylan, Jethro Browell, Claudio Carnevale, Jennifer L. Castle, Pasquale Cirillo, Michael P. Clements, Clara Cordeiro, Fernando Luiz Cyrino Oliveira, Shari De Baets, Alexander Dokumentov, Joanne Ellison, Piotr Fiszeder, Philip Hans Franses, David T. Frazier, Michael Gilliland, M. Sinan Gönül, et al. (55 additional authors not shown)

Abstract: Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

Submitted 5 January, 2022; v1 submitted 4 December, 2020; originally announced December 2020.

Showing 1–6 of 6 results for author: Boylan, J