-
Decoding the Molecular Universe -- Workshop Report
Authors:
Thomas O. Metz,
Joshua N. Adkins,
Peter B. Armentrout,
Patrick Chain,
Fanny Chu,
Courtney D Corley,
John R. Cort,
Elizabeth Denis,
Daniel Drell,
Katherine R. Duncan,
Robert G. Ewing,
Facundo M. Fernandez,
Oliver Fiehn,
Neha Garg,
Stefan Grimme,
Christopher Henry,
Robert L. Hettich,
Tobias Kind,
Roger G. Linington,
Gary W. Miller,
Trent Northen,
Kirsten Overdahl,
Ari Patrinos,
Daniel Raftery,
Paul Rigor,
et al. (8 additional authors not shown)
Abstract:
On August 9-10, 2023, a workshop was convened at the Pacific Northwest National Laboratory (PNNL) in Richland, WA, that brought together a group of internationally recognized experts in metabolomics, natural products discovery, chemical ecology, chemical and biological threat assessment, cheminformatics, computational chemistry, cloud computing, artificial intelligence, and novel technology development. These experts were invited to assess the value and feasibility of a grand-scale project to create new technologies that would allow the identification and quantification of all small molecules, that is, to decode the molecular universe. The Decoding the Molecular Universe project would extend and complement the success of the Human Genome Project by developing new capabilities and technologies to measure small molecules (defined as non-protein, non-polymer molecules of less than 1500 Daltons) of any origin, whether generated in biological systems or produced abiotically. Workshop attendees 1) explored what new understanding of biological and environmental systems could be revealed through the lens of small molecules; 2) characterized the similarities in current needs and technical challenges across science and mission areas for unambiguous and comprehensive determination of the composition and quantities of small molecules in any sample; 3) determined the extent to which technologies or methods currently exist for unambiguously and comprehensively determining the small-molecule composition of any sample in a reasonable time; and 4) identified the attributes of the ideal technology or approach for universal small-molecule measurement and identification. The workshop concluded with a discussion of how a project of this scale could be undertaken, possible thrusts for the project, early proof-of-principle applications, and similar efforts upon which the project could be modeled.
Submitted 19 November, 2023;
originally announced November 2023.
-
Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions
Authors:
Gabriel Figueiredo Miller,
Juan Camilo Vásquez-Correa,
Juan Rafael Orozco-Arroyave,
Elmar Nöth
Abstract:
This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types, such as wideband and narrowband spectrograms and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. The quantification criteria include the model's ability to classify different pathologies and to assess the associated disease severity. Additionally, this paper proposes a novel fusion strategy called multi-spectral fusion that combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders. The proposed models classify speech from Parkinson's disease patients with an accuracy of up to 95%. They also assess the dysarthria severity of Parkinson's disease patients with a Spearman correlation of up to 0.75. These results outperform those reported in the literature for the same problem on the same corpus.
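The difference between wideband and narrowband spectrograms comes down to the length of the analysis window. The sketch below is a minimal, dependency-free illustration, not the authors' feature pipeline: it computes a magnitude short-time Fourier transform with a short (4 ms) and a long (32 ms) Hann window at an assumed 8 kHz sample rate, on a made-up signal of two tones 125 Hz apart. The function names and all parameter values are illustrative assumptions.

```python
import cmath
import math


def hann(win_len):
    # Symmetric Hann window of length win_len.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (win_len - 1)) for n in range(win_len)]


def stft_magnitude(signal, win_len, hop, n_fft):
    """Magnitude STFT via a naive DFT; each frame is implicitly zero-padded to n_fft points."""
    window = hann(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(win_len)]
        # Summing only the win_len real samples is equivalent to zero-padding to n_fft.
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(win_len)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames  # shape: [n_frames][n_fft // 2 + 1]


FS = 8000    # sample rate in Hz (assumed)
N_FFT = 256  # common DFT size, so both spectrograms share one frequency axis
# Two tones 125 Hz apart (1000 Hz and 1125 Hz), 64 ms of signal.
x = [math.sin(2 * math.pi * 1000 * n / FS) + math.sin(2 * math.pi * 1125 * n / FS)
     for n in range(512)]

wide = stft_magnitude(x, win_len=32, hop=32, n_fft=N_FFT)     # 4 ms window: wideband
narrow = stft_magnitude(x, win_len=256, hop=32, n_fft=N_FFT)  # 32 ms window: narrowband
print(len(wide), len(narrow))
```

With a bin spacing of 8000/256 = 31.25 Hz, the long window shows the two tones as separate peaks at bins 32 and 36 with a deep dip at bin 34, while the short window yields more frames (16 versus 9) but is too short to separate them. Zero-padding both to the same `N_FFT` gives them the same number of frequency bins, which is one possible (assumed) way to place both resolutions on a common frequency axis before any fusion step.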
Submitted 17 September, 2022;
originally announced September 2022.
-
Objective hearing threshold identification from auditory brainstem response measurements using supervised and self-supervised approaches
Authors:
Dominik Thalmeier,
Gregor Miller,
Elida Schneltzer,
Anja Hurt,
Martin Hrabě de Angelis,
Lore Becker,
Christian L. Müller,
Holger Maier
Abstract:
Hearing loss is a major health problem and psychological burden in humans. Mouse models offer a way to elucidate genes involved in the underlying developmental and pathophysiological mechanisms of hearing impairment. To this end, large-scale mouse phenotyping programs include auditory phenotyping of single-gene knockout mouse lines. Using the auditory brainstem response (ABR) procedure, the German Mouse Clinic and similar facilities worldwide have produced large, uniform data sets of averaged ABR raw data of mutant and wildtype mice. In the course of standard ABR analysis, hearing thresholds are assessed visually by trained staff from series of signal curves of increasing sound pressure level. This is time-consuming and prone to bias from the reader, as well as from the graphical display quality and scale. To reduce workload and improve quality and reproducibility, we developed and compared two methods for automated hearing threshold identification from averaged ABR raw data: a supervised approach involving two combined neural networks trained on human-generated labels, and a self-supervised approach that exploits the signal power spectrum and combines random forest sound level estimation with a piece-wise curve fitting algorithm for threshold finding. We show that both models work well, outperform human threshold detection, and are suitable for fast, reliable, and unbiased hearing threshold detection and quality control. In a high-throughput mouse phenotyping environment, both methods perform well as part of an automated end-to-end screening pipeline to detect candidate genes for hearing involvement. Code for both models as well as the data used for this work are freely available.
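The threshold-finding step of a piece-wise curve fit can be illustrated with a two-segment "hinge" model: a flat floor below the threshold and a linear rise above it, with the breakpoint taken as the hearing threshold. This is a hypothetical sketch, not the authors' algorithm: the per-level `scores` are made-up stand-ins for whatever response estimate precedes the fit, candidate breakpoints are restricted to the measured sound pressure levels, and `fit_hinge` and `estimate_threshold` are illustrative names.

```python
def fit_hinge(levels, scores, b):
    """Least-squares fit of y = a + s * max(0, x - b) for a fixed breakpoint b.

    Returns (sse, a, s): residual sum of squares, floor level, and slope above b.
    """
    z = [max(0.0, x - b) for x in levels]
    n = len(levels)
    sz, sy = sum(z), sum(scores)
    szz = sum(v * v for v in z)
    szy = sum(v * y for v, y in zip(z, scores))
    det = n * szz - sz * sz
    if det == 0:
        # No level lies above b, so the model degenerates to a constant.
        a, s = sy / n, 0.0
    else:
        # Closed-form solution of the 2x2 normal equations for (a, s).
        s = (n * szy - sz * sy) / det
        a = (sy - s * sz) / n
    sse = sum((a + s * v - y) ** 2 for v, y in zip(z, scores))
    return sse, a, s


def estimate_threshold(levels, scores):
    """Return the measured level whose hinge fit has the smallest residual error."""
    return min(levels, key=lambda b: fit_hinge(levels, scores, b)[0])


# Toy, noise-free data (dB SPL): flat response up to 40 dB, then a linear rise.
levels = [0, 10, 20, 30, 40, 50, 60, 70, 80]
scores = [1.0, 1.0, 1.0, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0]
print(estimate_threshold(levels, scores))  # → 40
```

Because the fit is linear in the floor and slope once the breakpoint is fixed, a simple grid search over candidate breakpoints suffices; real ABR-derived scores would be noisy, so a finer grid or a continuous breakpoint search may be preferable.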
Submitted 16 December, 2021;
originally announced December 2021.
-
A Bayesian model of acquisition and clearance of bacterial colonization
Authors:
Marko Järvenpää,
Mohamad R. Abdul Sater,
Georgia K. Lagoudas,
Paul C. Blainey,
Loren G. Miller,
James A. McKinnell,
Susan S. Huang,
Yonatan H. Grad,
Pekka Marttinen
Abstract:
Bacterial populations that colonize a host play important roles in host health, including serving as a reservoir that transmits to other hosts and from which invasive strains emerge, thus emphasizing the importance of understanding rates of acquisition and clearance of colonizing populations. Studies of colonization dynamics have been based on assessment of whether serial samples represent a single population or distinct colonization events. A common approach to estimating acquisition and clearance rates is to use a fixed genetic distance threshold. However, this approach often fails to account for the diversity of the underlying within-host evolving population, the time intervals between consecutive measurements, and the uncertainty in the estimated acquisition and clearance rates. Here, we summarize recently submitted work \cite{jarvenpaa2018named} and present a Bayesian model that provides probabilities of whether two strains should be considered the same, allowing one to determine bacterial clearance and acquisition from genomes sampled over time. We explicitly model the within-host variation using population genetic simulation, and perform inference by combining information from multiple data sources through a combination of Approximate Bayesian Computation (ABC) and Markov Chain Monte Carlo (MCMC). We use the method to analyse a collection of methicillin-resistant Staphylococcus aureus (MRSA) isolates.
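The core idea of ABC, accepting parameter draws whose simulated data resemble the observed data, can be sketched with plain rejection sampling. This is a hypothetical illustration, not the authors' model: it replaces their population genetic simulation of within-host diversity with a simple Poisson model in which the SNP distance between two samples taken dt days apart has mean rate × dt, it uses rejection ABC instead of the paper's ABC/MCMC combination, and it runs on made-up data. The names `poisson` and `abc_rejection`, the uniform prior, and the tolerance `eps` are all assumptions.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def abc_rejection(observed, n_draws, eps, prior_max, seed=0):
    """Rejection ABC for a per-day SNP accumulation rate.

    observed: (days_between_samples, snp_distance) pairs assumed to come from the
    same colonizing population. Returns the accepted rate draws.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, prior_max)  # draw from a uniform prior
        simulated = [poisson(rng, rate * dt) for dt, _ in observed]
        # Discrepancy between simulated and observed SNP distances.
        distance = sum(abs(s - d) for s, (_, d) in zip(simulated, observed))
        if distance <= eps:
            accepted.append(rate)
    return accepted


# Made-up data: (days between samples, SNP distance) for four pairs of isolates.
observed = [(30, 1), (60, 2), (90, 3), (120, 3)]
posterior = abc_rejection(observed, n_draws=20000, eps=2, prior_max=0.1)
print(len(posterior), sum(posterior) / len(posterior))
```

The accepted draws approximate a posterior over the rate, so the uncertainty in the estimate is carried along rather than collapsed into a single fixed genetic distance threshold; a smaller `eps` gives a closer approximation at the cost of fewer accepted draws.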
Submitted 27 November, 2018;
originally announced November 2018.
-
Clustering Coefficients of Protein-Protein Interaction Networks
Authors:
Gerald A. Miller,
Yi Y. Shi,
Hong Qian,
Karol Bomsztyk
Abstract:
The properties of certain networks are determined by hidden variables that are not explicitly measured. The conditional probability (propagator) that a vertex with a given value of the hidden variable is connected to k other vertices determines all measurable properties. We study hidden variable models and find an averaging approximation that enables us to obtain a general analytical result for the propagator. We also obtain analytic results showing the validity of the approximation. We apply hidden variable models to protein-protein interaction networks (PINs) in which the hidden variable is the association free energy, determined by distributions that depend on biochemistry and evolution. We compute degree distributions as well as clustering coefficients of several PINs of different species; good agreement with measured data is obtained. For the human interactome, two different parameter sets give the same degree distributions, but the computed clustering coefficients differ by a factor of about two. This shows that degree distributions are not sufficient to determine the properties of PINs.
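A hidden variable network model can be made concrete with a small simulation: each vertex carries a hidden value, and each pair is linked independently with a probability that depends only on the two hidden values. The sketch below then reports the degree distribution and the average local clustering coefficient. The Fermi-like connection probability and the Gaussian hidden-variable distribution are placeholder assumptions, not the free-energy distributions used in the paper, and the helper names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def hidden_variable_graph(n, connect_prob, draw_hidden, seed=0):
    """Random graph where pair (i, j) is linked with probability connect_prob(h_i, h_j)."""
    rng = random.Random(seed)
    h = [draw_hidden(rng) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < connect_prob(h[i], h[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are linked to each other."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: vertices with fewer than two neighbours count as 0
    links = sum(1 for a, b in itertools.combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Placeholder hidden variable (a "free energy" per vertex): lower values link more readily.
adj = hidden_variable_graph(
    200,
    connect_prob=lambda hi, hj: 1.0 / (1.0 + math.exp(hi + hj)),
    draw_hidden=lambda rng: rng.gauss(2.0, 1.0),
)
degree_counts = Counter(len(nbrs) for nbrs in adj.values())
print(sorted(degree_counts.items()))
print(average_clustering(adj))
```

Counting low-degree vertices as 0 is one common convention; other definitions exclude them from the average, or use the global ratio of closed to connected triples, so reported clustering values should state which definition is used.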
Submitted 27 April, 2007;
originally announced April 2007.
-
Free-energy distribution of binary protein-protein binding suggests cross-species interactome differences
Authors:
Yi Y. Shi,
Gerald A. Miller,
Hong Qian,
Karol Bomsztyk
Abstract:
Major advances in large-scale yeast two-hybrid (Y2H) screening have provided a global view of binary protein-protein interactions across species as dissimilar as human, yeast, and bacteria. Remarkably, these analyses have revealed that all species studied have a degree distribution of protein-protein binding that is approximately scale-free (varies as a power law), even though their evolutionary divergence times differ by billions of years. The universal power law reveals only the surface of the rich information harbored by these high-throughput data. We develop a detailed mathematical model of the protein-protein interaction network based on association free energy, the biochemical quantity that determines protein-protein interaction strength. This model reproduces the degree distribution of all of the large-scale Y2H data sets available and allows us to extract the distribution of free energy, the likelihood that a pair of proteins of a given species will bind. We find that interactomes differ significantly across species in ways that reflect the strengths of protein-protein interactions. Our results identify a global evolutionary shift: more evolved organisms have weaker binary protein-protein binding. This result is consistent with the evolution of increased protein unfoldedness and challenges the dogma that only specific protein-protein interactions can be biologically functional.
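"Approximately scale-free" means the fraction of proteins with k binding partners behaves like P(k) ~ k^(-gamma). A crude way to read off gamma is a least-squares line on a log-log plot, sketched below on made-up counts that follow a power law exactly. This is an illustration only, not the paper's method (the authors instead model the degree distribution through association free energies); for real degree data, log-log regression is known to be biased and maximum-likelihood estimators are generally preferred. The function name is hypothetical.

```python
import math


def loglog_exponent(degree_counts):
    """Least-squares slope of log P(k) versus log k; returns gamma for P(k) ~ k**-gamma."""
    total = sum(degree_counts.values())
    pts = [(math.log(k), math.log(c / total))
           for k, c in degree_counts.items() if k > 0 and c > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return -slope


# Toy counts following an exact power law: count(k) proportional to k**-2.5.
toy = {k: 1000.0 * k ** -2.5 for k in range(1, 11)}
print(loglog_exponent(toy))  # approximately 2.5
```
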
Submitted 25 July, 2006;
originally announced July 2006.