Search | arXiv e-print repository

BioBricks.ai: A Versioned Data Registry for Life Sciences Data Assets

Authors: Yifan Gao, Zakariyya Mughal, Jose A. Jaramillo-Villegas, Marie Corradi, Alexandre Borrel, Ben Lieberman, Suliman Sharif, John Shaffer, Karamarie Fecho, Ajay Chatrath, Alexandra Maertens, Marc A. T. Teunis, Nicole Kleinstreuer, Thomas Hartung, Thomas Luechtefeld

Abstract: Researchers in biomedical research, public health, and the life sciences often spend weeks or months discovering, accessing, curating, and integrating data from disparate sources, significantly delaying the onset of actual analysis and innovation. Instead of countless developers creating redundant and inconsistent data pipelines, BioBricks.ai offers a centralized data repository and a suite of dev… ▽ More Researchers in biomedical research, public health, and the life sciences often spend weeks or months discovering, accessing, curating, and integrating data from disparate sources, significantly delaying the onset of actual analysis and innovation. Instead of countless developers creating redundant and inconsistent data pipelines, BioBricks.ai offers a centralized data repository and a suite of developer-friendly tools to simplify access to scientific data. Currently, BioBricks.ai delivers over ninety biological and chemical datasets. It provides a package manager-like system for installing and managing dependencies on data sources. Each 'brick' is a Data Version Control git repository that supports an updateable pipeline for extraction, transformation, and loading data into the BioBricks.ai backend at https://biobricks.ai. Use cases include accelerating data science workflows and facilitating the creation of novel data assets by integrating multiple datasets into unified, harmonized resources. In conclusion, BioBricks.ai offers an opportunity to accelerate access and use of public data through a single open platform. △ Less

Submitted 30 August, 2024; originally announced August 2024.

Comments: 23 pages, 2 figures

arXiv:2406.06150 [pdf, other]

Physics-Informed Bayesian Optimization of Variational Quantum Circuits

Authors: Kim A. Nicoli, Christopher J. Anders, Lena Funcke, Tobias Hartung, Karl Jansen, Stefan Kühn, Klaus-Robert Müller, Paolo Stornati, Pan Kessel, Shinichi Nakajima

Abstract: In this paper, we propose a novel and powerful method to harness Bayesian optimization for Variational Quantum Eigensolvers (VQEs) -- a hybrid quantum-classical protocol used to approximate the ground state of a quantum Hamiltonian. Specifically, we derive a VQE-kernel which incorporates important prior information about quantum circuits: the kernel feature map of the VQE-kernel exactly matches th… ▽ More In this paper, we propose a novel and powerful method to harness Bayesian optimization for Variational Quantum Eigensolvers (VQEs) -- a hybrid quantum-classical protocol used to approximate the ground state of a quantum Hamiltonian. Specifically, we derive a VQE-kernel which incorporates important prior information about quantum circuits: the kernel feature map of the VQE-kernel exactly matches the known functional form of the VQE's objective function and thereby significantly reduces the posterior uncertainty. Moreover, we propose a novel acquisition function for Bayesian optimization called Expected Maximum Improvement over Confident Regions (EMICoRe) which can actively exploit the inductive bias of the VQE-kernel by treating regions with low predictive uncertainty as indirectly ``observed''. As a result, observations at as few as three points in the search domain are sufficient to determine the complete objective function along an entire one-dimensional subspace of the optimization landscape. Our numerical experiments demonstrate that our approach improves over state-of-the-art baselines. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 36 pages, 17 figures, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2404.11760 [pdf, other]

Predictive Model Development to Identify Failed Healing in Patients after Non-Union Fracture Surgery

Authors: Cedric Donié, Marie K. Reumann, Tony Hartung, Benedikt J. Braun, Tina Histing, Satoshi Endo, Sandra Hirche

Abstract: Bone non-union is among the most severe complications associated with trauma surgery, occurring in 10-30% of cases after long bone fractures. Treating non-unions requires a high level of surgical expertise and often involves multiple revision surgeries, sometimes even leading to amputation. Thus, more accurate prognosis is crucial for patient well-being. Recent advances in machine learning (ML) ho… ▽ More Bone non-union is among the most severe complications associated with trauma surgery, occurring in 10-30% of cases after long bone fractures. Treating non-unions requires a high level of surgical expertise and often involves multiple revision surgeries, sometimes even leading to amputation. Thus, more accurate prognosis is crucial for patient well-being. Recent advances in machine learning (ML) hold promise for developing models to predict non-union healing, even when working with smaller datasets, a commonly encountered challenge in clinical domains. To demonstrate the effectiveness of ML in identifying candidates at risk of failed non-union healing, we applied three ML models (logistic regression, support vector machine, and XGBoost) to the clinical dataset TRUFFLE, which includes 797 patients with long bone non-union. The models provided prediction results with 70% sensitivity, and the specificities of 66% (XGBoost), 49% (support vector machine), and 43% (logistic regression). These findings offer valuable clinical insights because they enable early identification of patients at risk of failed non-union healing after the initial surgical revision treatment protocol. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: To be presented at the 46th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC 2024)

ACM Class: J.3; I.5.4

arXiv:2302.14082 [pdf, other]

Detecting and Mitigating Mode-Collapse for Flow-based Sampling of Lattice Field Theories

Authors: Kim A. Nicoli, Christopher J. Anders, Tobias Hartung, Karl Jansen, Pan Kessel, Shinichi Nakajima

Abstract: We study the consequences of mode-collapse of normalizing flows in the context of lattice field theory. Normalizing flows allow for independent sampling. For this reason, it is hoped that they can avoid the tunneling problem of local-update MCMC algorithms for multi-modal distributions. In this work, we first point out that the tunneling problem is also present for normalizing flows but is shifted… ▽ More We study the consequences of mode-collapse of normalizing flows in the context of lattice field theory. Normalizing flows allow for independent sampling. For this reason, it is hoped that they can avoid the tunneling problem of local-update MCMC algorithms for multi-modal distributions. In this work, we first point out that the tunneling problem is also present for normalizing flows but is shifted from the sampling to the training phase of the algorithm. Specifically, normalizing flows often suffer from mode-collapse for which the training process assigns vanishingly low probability mass to relevant modes of the physical distribution. This may result in a significant bias when the flow is used as a sampler in a Markov-Chain or with Importance Sampling. We propose a metric to quantify the degree of mode-collapse and derive a bound on the resulting bias. Furthermore, we propose various mitigation strategies in particular in the context of estimating thermodynamic observables, such as the free energy. △ Less

Submitted 3 November, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 10 pages, 7 figures, 6 pages of supplement material

arXiv:2201.07372 [pdf, other]

Prospective Learning: Principled Extrapolation to the Future

Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller , et al. (18 additional authors not shown)

Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenari… ▽ More Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenarios evolve over multiple spatiotemporal scales with partially predictable dynamics. Here we reformulate the learning problem to one that centers around this idea of dynamic futures that are partially learnable. We conjecture that certain sequences of tasks are not retrospectively learnable (in which the data distribution is fixed), but are prospectively learnable (in which distributions may be dynamic), suggesting that prospective learning is more difficult in kind than retrospective learning. We argue that prospective learning more accurately characterizes many real world problems that (1) currently stymie existing artificial intelligence solutions and/or (2) lack adequate explanations for how natural intelligences solve them. Thus, studying prospective learning will lead to deeper insights and solutions to currently vexing challenges in both natural and artificial intelligences. △ Less

Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

arXiv:2111.11303 [pdf, ps, other]

doi 10.22323/1.396.0338

Machine Learning of Thermodynamic Observables in the Presence of Mode Collapse

Authors: Kim A. Nicoli, Christopher Anders, Lena Funcke, Tobias Hartung, Karl Jansen, Pan Kessel, Shinichi Nakajima, Paolo Stornati

Abstract: Estimating the free energy, as well as other thermodynamic observables, is a key task in lattice field theories. Recently, it has been pointed out that deep generative models can be used in this context [1]. Crucially, these models allow for the direct estimation of the free energy at a given point in parameter space. This is in contrast to existing methods based on Markov chains which generically… ▽ More Estimating the free energy, as well as other thermodynamic observables, is a key task in lattice field theories. Recently, it has been pointed out that deep generative models can be used in this context [1]. Crucially, these models allow for the direct estimation of the free energy at a given point in parameter space. This is in contrast to existing methods based on Markov chains which generically require integration through parameter space. In this contribution, we will review this novel machine-learning-based estimation method. We will in detail discuss the issue of mode collapse and outline mitigation techniques which are particularly suited for applications at finite temperature. △ Less

Submitted 30 November, 2021; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: 10 pages, 2 figures, Proceedings of the 38th International Symposium on Lattice Field Theory, 26th-30th July 2021, Zoom/Gather@Massachusetts Institute of Technology

Report number: MIT-CTP/5353

arXiv:2011.05451 [pdf, other]

doi 10.1016/j.jcp.2021.110527

Lattice meets lattice: Application of lattice cubature to models in lattice gauge theory

Authors: Tobias Hartung, Karl Jansen, Frances Y. Kuo, Hernan Leövey, Dirk Nuyens, Ian H. Sloan

Abstract: High dimensional integrals are abundant in many fields of research including quantum physics. The aim of this paper is to develop efficient recursive strategies to tackle a class of high dimensional integrals having a special product structure with low order couplings, motivated by models in lattice gauge theory from quantum field theory. A novel element of this work is the potential benefit in us… ▽ More High dimensional integrals are abundant in many fields of research including quantum physics. The aim of this paper is to develop efficient recursive strategies to tackle a class of high dimensional integrals having a special product structure with low order couplings, motivated by models in lattice gauge theory from quantum field theory. A novel element of this work is the potential benefit in using lattice cubature rules. The group structure within lattice rules combined with the special structure in the physics integrands may allow efficient computations based on Fast Fourier Transforms. Applications to the quantum mechanical rotor and compact $U(1)$ lattice gauge theory in two and three dimensions are considered. △ Less

Submitted 29 June, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

MSC Class: 65D30; 65D32; 65T50; 65Z05; 81T80

Journal ref: Journal of Computational Physics Volume 443, 15 October 2021, 110527

arXiv:2007.07115 [pdf, other]

doi 10.1103/PhysRevLett.126.032001

Estimation of Thermodynamic Observables in Lattice Field Theories with Deep Generative Models

Authors: Kim A. Nicoli, Christopher J. Anders, Lena Funcke, Tobias Hartung, Karl Jansen, Pan Kessel, Shinichi Nakajima, Paolo Stornati

Abstract: In this work, we demonstrate that applying deep generative machine learning models for lattice field theory is a promising route for solving problems where Markov Chain Monte Carlo (MCMC) methods are problematic. More specifically, we show that generative models can be used to estimate the absolute value of the free energy, which is in contrast to existing MCMC-based methods which are limited to o… ▽ More In this work, we demonstrate that applying deep generative machine learning models for lattice field theory is a promising route for solving problems where Markov Chain Monte Carlo (MCMC) methods are problematic. More specifically, we show that generative models can be used to estimate the absolute value of the free energy, which is in contrast to existing MCMC-based methods which are limited to only estimate free energy differences. We demonstrate the effectiveness of the proposed method for two-dimensional $φ^4$ theory and compare it to MCMC-based methods in detailed numerical experiments. △ Less

Submitted 5 January, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: 8 figures

Journal ref: Phys. Rev. Lett. 126, 032001 (2021)

Showing 1–8 of 8 results for author: Hartung, T