Global Human-guided Counterfactual Explanations for Molecular Properties via Reinforcement Learning

Authors: Danqing Wang, Antonis Antoniades, Kha-Dinh Luong, Edwin Zhang, Mert Kosan, Jiachen Li, Ambuj Singh, William Yang Wang, Lei Li

Abstract: Counterfactual explanations of Graph Neural Networks (GNNs) offer a powerful way to understand data that can naturally be represented by a graph structure. Furthermore, in many domains, it is highly desirable to derive data-driven global explanations or rules that can better explain the high-level properties of the models and data in question. However, evaluating global counterfactual explanations is hard in real-world datasets due to a lack of human-annotated ground truth, which limits their use in areas like molecular sciences. Additionally, the increasing scale of these datasets provides a challenge for random search-based methods. In this paper, we develop a novel global explanation model RLHEX for molecular property prediction. It aligns the counterfactual explanations with human-defined principles, making the explanations more interpretable and easy for experts to evaluate. RLHEX includes a VAE-based graph generator to generate global explanations and an adapter to adjust the latent representation space to human-defined principles. Optimized by Proximal Policy Optimization (PPO), the global explanations produced by RLHEX cover 4.12% more input graphs and reduce the distance between the counterfactual explanation set and the input set by 0.47% on average across three molecular datasets. RLHEX provides a flexible framework to incorporate different human-designed principles into the counterfactual explanation generation process, aligning these explanations with domain expertise. The code and data are released at https://github.com/dqwang122/RLHEX.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted by KDD 2024

arXiv:2401.12661 [pdf, other]

Analysis of a detailed multi-stage model of stochastic gene expression using queueing theory and model reduction

Authors: Muhan Ma, Juraj Szavits-Nossan, Abhyudai Singh, Ramon Grima

Abstract: We introduce a biologically detailed, stochastic model of gene expression describing the multiple rate-limiting steps of transcription, nuclear pre-mRNA processing, nuclear mRNA export, cytoplasmic mRNA degradation and translation of mRNA into protein. The processes in sub-cellular compartments are described by an arbitrary number of processing stages, thus accounting for a significantly finer molecular description of gene expression than conventional models such as the telegraph, two-stage and three-stage models of gene expression. We use two distinct tools, queueing theory and model reduction using the slow-scale linear-noise approximation, to derive exact or approximate analytic expressions for the moments or distributions of nuclear mRNA, cytoplasmic mRNA and protein fluctuations, as well as lower bounds for their Fano factors in steady-state conditions. We use these to study the phase diagram of the stochastic model; in particular we derive parametric conditions determining three types of transitions in the properties of mRNA fluctuations: from sub-Poissonian to super-Poissonian noise, from high noise in the nucleus to high noise in the cytoplasm, and from a monotonic increase to a monotonic decrease of the Fano factor with the number of processing stages. In contrast, protein fluctuations are always super-Poissonian and show weak dependence on the number of mRNA processing stages. Our results delineate the region of parameter space where conventional models give qualitatively incorrect results and provide insight into how the number of processing stages, e.g. the number of rate-limiting steps in initiation, splicing and mRNA degradation, shapes stochastic gene expression by modulation of molecular memory.
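For readers unfamiliar with the terminology in the abstract above, the Fano factor of a count distribution is its variance divided by its mean; a value below 1 is called sub-Poissonian and a value above 1 super-Poissonian. The following is a minimal illustrative sketch only, not the paper's analytic method: the function names `fano_factor` and `noise_regime`, the tolerance, and the use of the population variance are assumptions made for this example.

```python
from statistics import mean, pvariance


def fano_factor(counts):
    """Return variance / mean for a sample of molecule counts.

    Assumption for this sketch: the population variance is used.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("Fano factor is undefined when the mean is zero")
    return pvariance(counts) / m


def noise_regime(counts, tol=1e-9):
    """Classify a sample by comparing its Fano factor with 1 (hypothetical helper)."""
    f = fano_factor(counts)
    if f < 1 - tol:
        return "sub-Poissonian"
    if f > 1 + tol:
        return "super-Poissonian"
    return "Poissonian"
```

As a usage example, `noise_regime([4, 5, 6, 5, 4, 6])` classifies a narrowly spread sample as sub-Poissonian, while `noise_regime([0, 10, 0, 10])` classifies a bursty sample with the same mean as super-Poissonian.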

Submitted 23 January, 2024; originally announced January 2024.

Comments: 49 pages, 4 figures

arXiv:2310.19744 [pdf, other]

Multistable protocells can aid the evolution of prebiotic autocatalytic sets

Authors: Angad Yuvraj Singh, Sanjay Jain

Abstract: We present a simple mathematical model that captures the evolutionary capabilities of a prebiotic compartment or protocell. In the model the protocell contains an autocatalytic set whose chemical dynamics is coupled to the growth-division dynamics of the compartment. Bistability in the dynamics of the autocatalytic set results in a protocell that can exist with two distinct growth rates. Stochasticity in chemical reactions plays the role of mutations and causes transitions from one growth regime to another. We show that the system exhibits 'natural selection', where a 'mutant' protocell in which the autocatalytic set is active arises by chance in a population of inactive protocells, and then takes over the population because of its higher growth rate or 'fitness'. The work integrates three levels of dynamics: intracellular chemical, single protocell, and population (or ecosystem) of protocells.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 28 pages, 12 figures, includes Supplementary Material

arXiv:2308.00717 [pdf]

The Study and Optimization Of Production/Fermentation Processes In Biofuel Production

Authors: Amardeep Singh

Abstract: The production process involved in the creation of biofuels consists of a number of operations and steps that require a meticulous understanding of the parameters and metrics. The production techniques again differ depending on the pre-treatment systems, source material, the methods used for extraction, types of nutrients used, cell cultures employed, time undertaken and temperature. Due to the strategic and crucial role that bioethanol holds in supporting the energy demands of the future, it becomes important to run such processes to a highly optimized extent. One of the frontiers of leading such optimized designs is by studying the data from the production processes, formulating design experiments from said data and correlating the results with the parameters using analytical tools. While the case examples analyzed relate to bioethanol mostly, an additional analysis has been performed for data on biodiesel. Coupled with confirmatory methods such as Principal Component Analysis, researchers can help narrow down the extent or degree to which the parameters affect the final outcome and even configure inputs that may not play a definitive role in greater outputs. The project first works through some conventional case studies involving biofuel production using an FIS (Fuzzy Inference System) and provides certain insights into the ways in which fuel yields can be enhanced depending on the particular cases. For the purpose of analysis, tools such as MATLAB, Python and WEKA have been employed. Python and WEKA have been used extensively in building principal component analysis reviews for the purpose of this project while MATLAB has been used for building the FIS models.

Submitted 31 July, 2023; originally announced August 2023.

arXiv:2306.10993 [pdf]

Endothelial Cell-specific Loss of Breast Cancer Susceptibility Gene 2 Exacerbates Atherosclerosis

Authors: David C. R. Michels, Sepideh Nikfarjam, Berk Rasheed, Margi Patel, Shuhan Bu, Mehroz Ehsan, Hien C. Nguyen, Aman Singh, Biao Feng, John McGuire, Robert Gros, Jefferson C. Frisbee, Krishna K. Singh

Abstract: The BReast CAncer type 2 susceptibility protein (BRCA2) responds to DNA damage by participating in homology-directed repair. BRCA2 deficiency culminates in defective DNA damage repair (DDR) that when prolonged leads to the accumulation of DNA damage causing cancer or apoptosis. Oxidative stress promotes DNA damage and apoptosis and is a common mechanism through which cardiovascular risk factors lead to endothelial dysfunction and atherosclerosis. Herein, we show that endothelial BRCA2 plays a protective role against atherosclerosis under hypercholesterolemic stress. We successfully generated and characterized endothelial cell (EC)-specific BRCA2 knockout (BRCA2endo) mice. To study the effect of EC-specific BRCA2-loss in atherosclerosis, we generated and characterized BRCA2endo mice on apolipoprotein E null background (ApoE-/-), fed them with high-fat diet (HFD) and evaluated atherosclerosis. Baseline phenotyping of BRCA2endo mice did not show any adverse effects in terms of DNA damage and apoptosis as well as cardiac and metabolic function. However, using HFD-fed apolipoprotein E knockout (ApoE-/-) background, we demonstrated that EC-specific loss of BRCA2 resulted in aortic plaque deposition and splenomegaly. Comparison of RNA sequencing data from aortas of EC-specific BRCA2-deficient ApoE-/- and BRCA2-intact ApoE-/- mice revealed a total of 530 significantly differentially expressed genes with Protein Folding Response and Lipid Metabolism as the most affected pathways. This study provides foundational knowledge regarding BRCA2 status and function in the cardiovascular system, and highlights the potential of BRCA2 as a novel therapeutic target in prevention and treatment of atherosclerosis. Our data indicate that BRCA2 mutation carriers may be at a previously unrecognized risk of atherosclerosis in addition to breast and ovarian cancer.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2301.00948 [pdf, other]

Understanding EEG signals for subject-wise Definition of Armoni Activities

Authors: Kislay Raj, Aditya Singh, Abhishek Mandal, Teerath Kumar, Arunabha M. Roy

Abstract: In a growing world of technology, psychological disorders have become a challenge to be solved. The methods used for cognitive stimulation are very conventional and based on one-way communication, which relies only on the material or method used for training an individual. They do not use any kind of feedback from the individual to analyze the progress of the training process. We have proposed a closed-loop methodology to improve the cognitive state of a person with ID (intellectual disability). We have used a platform named 'Armoni' to provide training to intellectually disabled individuals. The learning is performed in a closed loop by using feedback in the form of change in affective state. For feedback to Armoni, an EEG (electroencephalograph) headband is used. All the changes in EEG are observed and classified against the change in the mean and standard deviation of all frequency bands of the signal. This comparison helps in defining every activity with respect to the change in brain signals. In this paper, we have discussed the process of treatment of the EEG signal and its definition against the different activities of Armoni. We have tested it on 6 different systems with different age groups and cognitive levels.

Submitted 26 April, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

Comments: Submitted to SN Computer Science journal

arXiv:2210.09085 [pdf]

Perspectives for self-driving labs in synthetic biology

Authors: Hector Garcia Martin, Tijana Radivojevic, Jeremy Zucker, Kristofer Bouchard, Jess Sustarich, Sean Peisert, Dan Arnold, Nathan Hillson, Gyorgy Babnigg, Jose Manuel Marti, Christopher J. Mungall, Gregg T. Beckham, Lucas Waldburger, James Carothers, ShivShankar Sundaram, Deb Agarwal, Blake A. Simmons, Tyler Backman, Deepanwita Banerjee, Deepti Tanjore, Lavanya Ramakrishnan, Anup Singh

Abstract: Self-driving labs (SDLs) combine fully automated experiments with artificial intelligence (AI) that decides the next set of experiments. Taken to their ultimate expression, SDLs could usher in a new paradigm of scientific research, where the world is probed, interpreted, and explained by machines for human benefit. While there are functioning SDLs in the fields of chemistry and materials science, we contend that synthetic biology provides a unique opportunity since the genome provides a single target for affecting the incredibly wide repertoire of biological cell behavior. However, the level of investment required for the creation of biological SDLs is only warranted if directed towards solving difficult and enabling biological questions. Here, we discuss challenges and opportunities in creating SDLs for synthetic biology.

Submitted 1 November, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 17 pages, 3 figures. Submitted for publication in Current Opinion in Biotechnology. Updated figure 3 in this version

arXiv:2210.04479 [pdf, other]

Classification of cow diet based on milk mid infrared spectra: a data analysis competition at the "International workshop of spectroscopy and chemometrics 2022"

Authors: Maria Frizzarin, Giulio Visentin, Alessandro Ferragina, Elena Hayes, Antonio Bevilacqua, Bhaskar Dhariyal, Katarina Domijan, Hussain Khan, Georgiana Ifrim, Thach Le Nguyen, Joe Meagher, Laura Menchetti, Ashish Singh, Suzy Whoriskey, Robert Williamson, Martina Zappaterra, Alessandro Casa

Abstract: In April 2022, the Vistamilk SFI Research Centre organized the second edition of the "International Workshop on Spectroscopy and Chemometrics - Applications in Food and Agriculture". Within this event, a data challenge was organized among participants of the workshop. The competition aimed at developing a prediction model to discriminate dairy cows' diet based on milk spectral information collected in the mid-infrared region. In fact, the development of an accurate and reliable discriminant model for dairy cows' diet can provide important authentication tools for dairy processors to guarantee product origin for dairy food manufacturers from grass-fed animals. Different statistical and machine learning modelling approaches have been employed during the workshop, with different pre-processing steps involved and different degrees of complexity. The present paper aims to describe the statistical methods adopted by participants to develop such a classification model.

Submitted 10 October, 2022; originally announced October 2022.

Comments: 27 pages, 9 figures

arXiv:2210.01769 [pdf, other]

Mind Reader: Reconstructing complex images from brain activities

Authors: Sikun Lin, Thomas Sprague, Ambuj K Singh

Abstract: Understanding how the brain encodes external stimuli and how these stimuli can be decoded from the measured brain activities are long-standing and challenging questions in neuroscience. In this paper, we focus on reconstructing the complex image stimuli from fMRI (functional magnetic resonance imaging) signals. Unlike previous works that reconstruct images with single objects or simple shapes, our work aims to reconstruct image stimuli that are rich in semantics, closer to everyday scenes, and can reveal more perspectives. However, data scarcity of fMRI datasets is the main obstacle to applying state-of-the-art deep learning models to this problem. We find that incorporating an additional text modality is beneficial for the reconstruction problem compared to directly translating brain signals to images. Therefore, the modalities involved in our method are: (i) voxel-level fMRI signals, (ii) observed images that trigger the brain signals, and (iii) textual description of the images. To further address data scarcity, we leverage an aligned vision-language latent space pre-trained on massive datasets. Instead of training models from scratch to find a latent space shared by the three modalities, we encode fMRI signals into this pre-aligned latent space. Then, conditioned on embeddings in this space, we reconstruct images with a generative model. The reconstructed images from our pipeline balance both naturalness and fidelity: they are photo-realistic and capture the ground truth image contents well.

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2205.11648 [pdf, other]

doi 10.1145/3534678.3539301

Deep Representations for Time-varying Brain Datasets

Authors: Sikun Lin, Shuyun Tang, Scott Grafton, Ambuj Singh

Abstract: Finding an appropriate representation of dynamic activities in the brain is crucial for many downstream applications. Due to its highly dynamic nature, temporally averaged fMRI (functional magnetic resonance imaging) can only provide a narrow view of underlying brain activities. Previous works lack the ability to learn and interpret the latent dynamics in brain architectures. This paper builds an efficient graph neural network model that incorporates both region-mapped fMRI sequences and structural connectivities obtained from DWI (diffusion-weighted imaging) as inputs. We find good representations of the latent brain dynamics through learning sample-level adaptive adjacency matrices and performing a novel multi-resolution inner cluster smoothing. We also attribute inputs with integrated gradients, which enables us to infer (1) highly involved brain connections and subnetworks for each task, (2) temporal keyframes of imaging sequences that characterize tasks, and (3) subnetworks that discriminate between individual subjects. This ability to identify critical subnetworks that characterize signal states across heterogeneous tasks and individuals is of great importance to neuroscience and other scientific domains. Extensive experiments and ablation studies demonstrate our proposed method's superiority and efficiency in spatial-temporal graph signal modeling with insightful interpretations of brain dynamics.

Submitted 16 August, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Journal ref: ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022

arXiv:2111.08496 [pdf, other]

doi 10.1073/pnas.2121302119

Sensing the shape of a cell with reaction-diffusion and energy minimization

Authors: Amit R. Singh, Travis Leadbetter, Brian A. Camley

Abstract: Some dividing cells sense their shape by becoming polarized along their long axis. Cell polarity is controlled in part by polarity proteins like Rho GTPases cycling between active membrane-bound forms and inactive cytosolic forms, modeled as a "wave-pinning" reaction-diffusion process. Does shape sensing emerge from wave-pinning? We show that wave pinning senses the cell's long axis. Simulating wave-pinning on a curved surface, we find that high-activity domains migrate to peaks and troughs of the surface. For smooth surfaces, a simple rule of minimizing the domain perimeter while keeping its area fixed predicts the final position of the domain and its shape. However, when we introduce roughness to our surfaces, shape sensing can be disrupted, and high-activity domains can become localized to locations other than the global peaks and valleys of the surface. On rough surfaces, the domains of the wave-pinning model are more robust in finding the peaks and troughs than the minimization rule, though both can become trapped in steady states away from the peaks and valleys. We can control the robustness of shape sensing by altering the Rho GTPase diffusivity and the domain size. We also find that the shape sensing properties of cell polarity models can explain how domains localize to curved regions of deformed cells. Our results help to understand the factors that allow cells to sense their shape - and the limits that membrane roughness can place on this process.

Submitted 8 July, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Journal ref: Proc. Nat. Acad. Sci. 119 (31) e2121302119, 2022

arXiv:2110.12392 [pdf, other]

Variation is the Norm: Brain State Dynamics Evoked By Emotional Video Clips

Authors: Ashutosh Singh, Christiana Westlin, Hedwig Eisenbarth, Elizabeth A. Reynolds Losin, Jessica R. Andrews-Hanna, Tor D. Wager, Ajay B. Satpute, Lisa Feldman Barrett, Dana H. Brooks, Deniz Erdogmus

Abstract: For the last several decades, emotion research has attempted to identify a "biomarker" or consistent pattern of brain activity to characterize a single category of emotion (e.g., fear) that will remain consistent across all instances of that category, regardless of individual and context. In this study, we investigated variation rather than consistency during emotional experiences while people watched video clips chosen to evoke instances of specific emotion categories. Specifically, we developed a sequential probabilistic approach to model the temporal dynamics in a participant's brain activity during video viewing. We characterized brain states during these clips as distinct state occupancy periods between state transitions in blood oxygen level dependent (BOLD) signal patterns. We found substantial variation in the state occupancy probability distributions across individuals watching the same video, supporting the hypothesis that when it comes to the brain correlates of emotional experience, variation may indeed be the norm.

Submitted 24 October, 2021; originally announced October 2021.

arXiv:2107.14139 [pdf, other]

Vaccination Worldwide: Strategies, Distribution and Challenges

Authors: Chirag Samal, Kasia Jakimowicz, Krishnendu Dasgupta, Aniket Vashishtha, Francisco O., Arunakiry Natarajan, Haris Nazir, Alluri Siddhartha Varma, Tejal Dahake, Amitesh Anand Pandey, Ishaan Singh, John Sangyeob Kim, Mehrab Singh Gill, Saurish Srivastava, Orna Mukhopadhyay, Parth Patwa, Qamil Mirza, Sualeha Irshad, Sheshank Shankar, Rohan Iyer, Rohan Sukumaran, Ashley Mehra, Anshuman Sharma, Abhishek Singh, Maurizio Arseni, et al. (4 additional authors not shown)

Abstract: The Coronavirus 2019 (Covid-19) pandemic caused by the SARS-CoV-2 virus represents an unprecedented crisis for our planet. It is a bane of the über connected world that we live in that this virus has affected almost all countries and caused mortality and economic upheaval at a scale whose effects are going to be felt for generations to come. While we can all be buoyed at the pace at which vaccines have been developed and brought to market, there are still challenges ahead for all countries to get their populations vaccinated equitably and effectively. This paper provides an overview of ongoing immunization efforts in various countries. In this early draft, we have identified a few key factors that we use to review different countries' current COVID-19 immunization strategies and their strengths and draw conclusions so that policymakers worldwide can learn from them. Our paper focuses on processes related to vaccine approval, allocation and prioritization, distribution strategies, population to vaccine ratio, vaccination governance, accessibility and use of digital solutions, and government policies. The statistics and numbers are dated as per the draft date [June 24th, 2021].

Submitted 21 July, 2021; originally announced July 2021.

arXiv:2107.02906 [pdf, other]

doi 10.1016/j.chemolab.2021.104442

Mid infrared spectroscopy and milk quality traits: a data analysis competition at the "International Workshop on Spectroscopy and Chemometrics 2021"

Authors: Maria Frizzarin, Antonio Bevilacqua, Bhaskar Dhariyal, Katarina Domijan, Federico Ferraccioli, Elena Hayes, Georgiana Ifrim, Agnieszka Konkolewska, Thach Le Nguyen, Uche Mbaka, Giovanna Ranzato, Ashish Singh, Marco Stefanucci, Alessandro Casa

Abstract: A chemometric data analysis challenge has been arranged during the first edition of the "International Workshop on Spectroscopy and Chemometrics", organized by the Vistamilk SFI Research Centre and held online in April 2021. The aim of the competition was to build a calibration model in order to predict milk quality traits exploiting the information contained in mid-infrared spectra only. Three different traits have been provided, presenting heterogeneous degrees of prediction complexity thus possibly requiring trait-specific modelling choices. In this paper the different approaches adopted by the participants are outlined and the insights obtained from the analyses are critically discussed.

Submitted 19 September, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: 17 pages, 6 figures, 6 tables

Journal ref: Chemometrics and Intelligent Laboratory Systems, 2021, Volume 219, 104442

arXiv:2103.06145 [pdf]

GraphBreak: Tool for Network Community based Regulatory Medicine, Gene co-expression, Linkage Disequilibrium analysis, functional annotation and more

Authors: Abhishek Narain Singh

Abstract: Graph network science is becoming increasingly popular, notably from a big-data perspective, where understanding individual entities for their individual functional roles is complex and time consuming. It is likely that when a set of genes is regulated by a set of genetic variants, the gene set is recruited for a common or related functional purpose. Grouping and extracting communities from a network of associations becomes critical to understanding system complexity, thus prioritizing genes for disease and functional associations. Workload is reduced when studying entities one at a time. For this, we present GraphBreak, a suite of tools for community detection applications, such as gene co-expression, protein interaction, regulation networks, etc. Although it was developed for the use case of eQTL regulatory genomic network community study (results are shown from our analysis of sample eQTL data), GraphBreak can be deployed for other studies if input data is fed in the requisite format, including but not limited to gene co-expression networks, protein-protein interaction networks, signaling pathways and metabolic networks. GraphBreak showed critical use case value in its downstream analysis for disease association of the communities detected. If all independent steps of community detection and analysis are a step-by-step sub-part of the algorithm, GraphBreak can be considered a new algorithm for community-based functional characterization. The combination of various algorithmic implementation modules into a single script for this purpose illustrates GraphBreak's novelty. Compared to other similar tools, with GraphBreak we can better detect communities with over-representation of their member genes for statistical association with diseases, and therefore target genes which can be prioritized for drug-positioning or drug-re-positioning, as the case may be.

Submitted 24 February, 2021; originally announced March 2021.

arXiv:2103.03667 [pdf]

SasCsvToolkit -- A versatile parallel 'bag-of-tasks' job submission application on heterogeneous and homogeneous platforms for Big Data Analytics such as for Biomedical Informatics

Authors: Abhishek Narain Singh

Abstract: Background: Big data analysis requires being able to process large data sets that are held fine-tuned for corporate use. Only very recently has the need for big data caught the attention of low-budget corporate groups and academia, who typically do not have the money and resources to buy expensive licenses for big data analysis platforms such as SAS. Corporations continue to work with the SAS data format largely because of systemic organizational history and because prior code has been built on it. Data providers thus continue to provide data in SAS formats. An acute need has arisen from this gap between data being in SAS format and coders not having SAS expertise or training, as the economic and inertial forces that have shaped these two classes of people have been different. Method: We analyze the differences and thus the need for SasCsvToolkit, which helps to generate a CSV file from SAS-format data so that data scientists can then make use of their skills in other tools that can process CSVs, such as R, SPSS, or even Microsoft Excel. At the same time, it also provides conversion of CSV files to SAS format. Apart from this, a SAS database programmer always struggles to find the right method for database search, exact match, substring match, except condition, filters, unique values, table joins and data mining, for which the toolbox also provides template scripts to modify and use from the command line. Results: The toolkit has been implemented on the SLURM scheduler platform as a `bag-of-tasks` algorithm for parallel and distributed workflows, though a serial version has also been incorporated.

Submitted 24 February, 2021; originally announced March 2021.
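
The `bag-of-tasks` pattern named in the abstract above can be illustrated with a minimal sketch. It is not the toolkit's code: a local thread pool stands in for the SLURM scheduler, and `run_bag_of_tasks` and `count_csv_fields` are hypothetical names introduced only for this example. The point it shows is that the tasks are independent, so workers can pick them up in any order and the results are collected as each one finishes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_bag_of_tasks(tasks, worker, max_workers=4):
    """Run independent tasks in any order and return {task: result}.

    Assumption: a local thread pool stands in for a cluster scheduler.
    Tasks must be hashable because they are used as dictionary keys.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every task up front; no task depends on another.
        futures = {pool.submit(worker, task): task for task in tasks}
        # Collect results in completion order, not submission order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


def count_csv_fields(line):
    """Hypothetical stand-in for a real per-file job: count comma-separated fields."""
    return len(line.split(","))


if __name__ == "__main__":
    bag = ["id,name,age", "a,b", "x"]
    print(run_bag_of_tasks(bag, count_csv_fields))
```

On a real cluster each task would typically become its own scheduler job rather than a thread; the structure of "submit everything, gather as completed" stays the same.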

arXiv:2102.13470 [pdf]

Feature set optimization by clustering, univariate association, Deep & Machine learning omics Wide Association Study (DMWAS) for Biomarkers discovery as tested on GTEx pilot dataset for death due to heart attack

Authors: Abhishek Narain Singh

Abstract: Univariate and multivariate methods for association of the genomic variations with the end-or-endo phenotype have been widely used for genome wide association studies. In addition to encoding the SNPs, we advocate usage of clustering as a novel method to encode the structural variations, SVs, in genomes, such as the deletions and insertions polymorphism (DIPs), Copy Number Variations (CNVs), translocation, inversion, etc., that can be used as an independent feature variable value for downstream computation by artificial intelligence methods to predict the endo-or-end phenotype. We introduce a clustering based encoding scheme for structural variations and omics based analysis. We conducted a complete all genomic variants association with the phenotype using deep learning and other machine learning techniques, though other methods such as genetic algorithms can also be applied. Applying this encoding of SVs and one-hot encoding of SNPs on the GTEx V7 pilot DNA variation dataset, we were able to get high accuracy using various methods of DMWAS, and particularly found logistic regression to work the best for the death due to heart-attack (MHHRTATT) phenotype. The genomic variants acting as feature sets were then arranged in descending order of power of impact on the disease or trait phenotype, which we call optimization and which also takes the top univariate associations into account. Variant Id P1_M_061510_3_402_P at chromosome 3 & position 192063195 was found to be most highly associated to MHHRTATT. We present here the top ten optimized genomic variant feature set for the MHHRTATT phenotypic cause of death.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2102.13469 [pdf]

The unmasking of Mitochondrial Adam and Structural Variants larger than point mutations as stronger candidates for traits, disease phenotype and sex determination

Authors: Abhishek Narain Singh

Abstract: Background: Structural Variations, SVs, in a genome can be linked to a disease or characteristic phenotype. The variations come in many types and it is a challenge, not only determining the variations accurately, but also conducting the downstream statistical and analytical procedure. Method: Structural variations, SVs, with size 1 base-pair to 1000s of base-pairs with their precise breakpoints and single-nucleotide polymorphisms, SNPs, were determined for members of a family. The genome was assembled using optimal metrics of ABySS and SOAPdenovo assembly tools using paired-end DNA sequence. Results: An interesting discovery was that the mitochondrial DNA could have paternal leakage of inheritance or that the mutations could be high from maternal inheritance. It is also discovered that the mitochondrial DNA is less prone to SV re-arrangements than SNPs, which proposes better standards for determining ancestry and divergence between races and species over a long time frame. Sex determination of an individual is found to be strongly confirmed using calls of nucleotide bases of SVs to the Y chromosome, more strongly determined than SNPs. We note that in general there is a larger variance, and thus standard deviation, in the sum of SV nucleotides compared to the sum of SNPs of an individual when compared to the reference sequence, and thus SVs serve as a stronger means to characterize an individual for a given trait or phenotype or to determine sex. The SVs and SNPs in HLA loci would also serve as a medical transformation method for determining the success of an organ transplant for a patient, and predisposition to diseases a priori.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2101.09158 [pdf, other]

SUTRA: A Novel Approach to Modelling Pandemics with Applications to COVID-19

Authors: Manindra Agrawal, Madhuri Kanitkar, Deepu Phillip, Tanima Hajra, Arti Singh, Avaneesh Singh, Prabal Pratap Singh, Mathukumalli Vidyasagar

Abstract: The Covid-19 pandemic has two key properties: (i) asymptomatic cases (both detected and undetected) that can result in new infections, and (ii) time-varying characteristics due to new variants, Non-Pharmaceutical Interventions etc. We develop a model called SUTRA (Susceptible, Undetected though infected, Tested positive, and Removed Analysis) that takes into account both of these two key properties. While applying the model to a region, two parameters of the model can be learnt from the number of daily new cases found in the region. Using the learnt values of the parameters the model can predict the number of daily new cases so long as the learnt parameters do not change substantially. Whenever any of the two parameters changes due to the key property (ii) above, the SUTRA model can detect that the values of one or both of the parameters have changed. Further, the model has the capability to relearn the changed parameter values, and then use these to carry out the prediction of the trajectory of the pandemic for the region of concern. The SUTRA approach can be applied at various levels of granularity, from an entire country to a district, more specifically, to any large enough region for which the data of daily new cases are available. We have applied the SUTRA model to thirty-two countries, covering more than half of the world's population. Our conclusions are: (i) The model is able to capture the past trajectories very well. Moreover, the parameter values, which we can estimate robustly, help quantify the impact of changes in the pandemic characteristics. (ii) Unless the pandemic characteristics change significantly, the model has good predictive capability. (iii) Natural immunity provides significantly better protection against infection than the currently available vaccines.

Submitted 25 October, 2022; v1 submitted 22 January, 2021; originally announced January 2021.

Comments: 38 pages, 20 figures, 5 tables

arXiv:2011.08977 [pdf, other]

Classification Of Sleep-Wake State In A Ballistocardiogram System Based On Deep Learning

Authors: Nemath Ahmed, Aashit Singh, Srivyshnav KS, Gulshan Kumar, Gaurav Parchani, Vibhor Saran

Abstract: Sleep state classification is vital in managing and understanding sleep patterns and is generally the first step in identifying acute or chronic sleep disorders. However, it is essential to do this without affecting the natural environment or conditions of the subject during their sleep. Techniques such as Polysomnography (PSG) are obtrusive and are not convenient for regular sleep monitoring. Fortunately, the rise of novel technologies and advanced computing has given a recent resurgence to sleep monitoring techniques. One such contactless and unobtrusive monitoring technique is Ballistocardiography (BCG), in which vitals are monitored by measuring the body's reaction to the cardiac ejection of blood. In this study, we propose a Multi-Head 1D-Convolution based Deep Neural Network to classify sleep-wake state and predict sleep-wake time accurately using the signals coming from a BCG sensor. Our method achieves a sleep-wake classification score of 95.5%, which is on par with research based on the PSG system. We further conducted two independent studies in a controlled and uncontrolled environment to test the sleep-wake prediction accuracy. We achieve a score of 94.16% in a controlled environment on 115 subjects and 94.90% in an uncontrolled environment on 350 subjects. The high accuracy and contactless nature of the proposed system make it a convenient method for long term monitoring of sleep states.

Submitted 10 November, 2020; originally announced November 2020.

Comments: 11 pages, 4 figures, 4 tables

arXiv:2011.04202 [pdf, other]

Clinical Landscape of COVID-19 Testing: Difficult Choices

Authors: Darshan Gandhi, Sanskruti Landage, Joseph Bae, Sheshank Shankar, Rohan Sukumaran, Parth Patwa, Sethuraman T V, Priyanshi Katiyar, Shailesh Advani, Rohan Iyer, Sunaina Anand, Aryan Mahindra, Rachel Barbar, Abhishek Singh, Ramesh Raskar

Abstract: The coronavirus disease 2019 (COVID-19) pandemic has spread rapidly across the world, leading to enormous amounts of human death and economic loss. Until definitive preventive or curative measures are developed, policies regarding testing, contact tracing, and quarantine remain the best public health tools for curbing viral spread. Testing is a crucial component of these efforts, enabling the identification and isolation of infected individuals. Differences in testing methodologies, time frames, and outcomes can have an impact on their overall efficiency, usability and efficacy. In this early draft, we draw a comparison between the various types of diagnostic tests including PCR, antigen, and home tests in relation to their relative advantages, disadvantages, and use cases. We also look into alternative and unconventional methods. Further, we analyze the short-term and long-term impacts of the virus and its testing on various verticals such as business, government laws, policies, and healthcare.

Submitted 15 November, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: 9 pages, 12 figures

arXiv:2009.07103 [pdf]

Machine learning predicts early onset of fever from continuous physiological data of critically ill patients

Authors: Aditya Singh, Akram Mohammed, Lokesh Chinthala, Rishikesan Kamaleswaran

Abstract: Fever can provide valuable information for diagnosis and prognosis of various diseases such as pneumonia, dengue, sepsis, etc.; therefore, predicting fever early can help in the effectiveness of treatment options and in expediting the treatment process. This study aims to develop novel algorithms that can accurately predict fever onset in critically ill patients by applying machine learning techniques to continuous physiological data. We analyzed continuous physiological data collected every 5 minutes from a cohort of over 200,000 critically ill patients admitted to an Intensive Care Unit (ICU) over a 2-year period. Each episode of fever from the same patient was considered as an independent event, with separations of at least 24 hours. We extracted descriptive statistical features from six physiological data streams, including heart rate, respiration, systolic and diastolic blood pressure, mean arterial pressure, and oxygen saturation, and used these features to independently predict the onset of fever. Using a bootstrap aggregation method, we created a balanced dataset of 7,801 afebrile and febrile patients and analyzed features up to 4 hours before the fever onset. We found that supervised machine learning methods can predict fever up to 4 hours before onset in critically ill patients with high recall, precision, and F1-score. This study demonstrates the viability of using machine learning to predict fever among hospitalized adults. The discovery of salient physiomarkers through machine learning and deep learning techniques has the potential to further accelerate the development and implementation of innovative care delivery protocols and strategies for medically vulnerable patients.

Submitted 14 September, 2020; originally announced September 2020.

arXiv:2008.07625 [pdf, other]

Liquid-liquid Phase Separation as the Second Step of Complex Coacervation

Authors: Aditya N. Singh, Arun Yethiraj

Abstract: Liquid-liquid phase separation (LLPS) mediated by pi-cation bonds between tyrosine and arginine residues is of biological importance. To understand the interactions between proteins in the condensed phase in close analogy to complex coacervation, we run multiple umbrella calculations between oligomers containing tyrosine (pY) and arginine (pR). We find pR-pY complexation to be energetically driven. Metadynamics simulations reveal that this energy of complexation comes primarily from pi-cation bonds. On running free energy calculations for the second binding step of complex coacervation, we find striking similarities between this process and pi-mediated LLPS. These calculations lead us to believe that, contrary to the common notion, complex coacervation as a whole, which involves an entropic complexation followed by an energetic aggregation, is not invoked by proteins containing arginine and tyrosine residues. Rather, the latter step in itself, in which neutral polyion pairs aggregate together, is the correct mechanism for pi-cation mediated LLPS.

Submitted 17 August, 2020; originally announced August 2020.

arXiv:2008.06276 [pdf]

doi 10.5334/jors.342

Simple RGC: ImageJ plugins for counting retinal ganglion cells and determining the transduction efficiency of viral vectors in retinal wholemounts

Authors: Tiger Cross, Rasika Navarange, Joon-Ho Son, William Burr, Arjun Singh, Kelvin Zhang, Miruna Rusu, Konstantinos Gkoutzis, Andrew Osborne, Bart Nieuwenhuis

Abstract: Simple RGC consists of a collection of ImageJ plugins to assist researchers investigating retinal ganglion cell (RGC) injury models in addition to helping assess the effectiveness of treatments. The first plugin named RGC Counter accurately calculates the total number of RGCs from retinal wholemount images. The second plugin named RGC Transduction measures the co-localisation between two channels making it possible to determine the transduction efficiencies of viral vectors and transgene expression levels. The third plugin named RGC Batch is a batch image processor to deliver fast analysis of large groups of microscope images. These ImageJ plugins make analysis of RGCs in retinal wholemounts easy, quick, consistent, and less prone to unconscious bias by the investigator. The plugins are freely available from the ImageJ update site https://sites.imagej.net/Sonjoonho/.

Submitted 21 April, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

Comments: Authors: Tiger Cross, Rasika Navarange, Joon-Ho Son, William Burr, Arjun Singh, Kelvin Zhang. Comment: These authors have contributed equally to this work. Authors: Andrew Osborne and Bart Nieuwenhuis. Comment: These authors share senior authorship and correspondence

Journal ref: Journal of Open Research Software, 9(1), 2021, p.15

arXiv:2006.14707 [pdf, other]

Machine-Learning Driven Drug Repurposing for COVID-19

Authors: Semih Cantürk, Aman Singh, Patrick St-Amant, Jason Behrmann

Abstract: The integration of machine learning methods into bioinformatics provides particular benefits in identifying how therapeutics effective in one context might have utility in an unknown clinical context or against a novel pathology. We aim to discover the underlying associations between viral proteins and antiviral therapeutics that are effective against them by employing neural network models. Using the National Center for Biotechnology Information virus protein database and the DrugVirus database, which provides a comprehensive report of broad-spectrum antiviral agents (BSAAs) and viruses they inhibit, we trained ANN models with virus protein sequences as inputs and antiviral agents deemed safe-in-humans as outputs. Model training excluded SARS-CoV-2 proteins and included only Phases II, III, IV and Approved level drugs. Using sequences for SARS-CoV-2 (the coronavirus that causes COVID-19) as inputs to the trained models produces outputs of tentative safe-in-human antiviral candidates for treating COVID-19. Our results suggest multiple drug candidates, some of which complement recent findings from noteworthy clinical studies. Our in-silico approach to drug repurposing has promise in identifying new drug candidates and treatments for other viruses.

Submitted 25 June, 2020; originally announced June 2020.

Comments: Submitted to NeurIPS 2020. 11 pages, 3 figures, 5 tables, 12 pages of appendices

MSC Class: 68T07 (Primary); 68T10 (Secondary) ACM Class: I.2.6

arXiv:2004.04935 [pdf]

A single-cell RNA expression map of coronavirus receptors and associated factors in developing human embryos

Authors: Stacy Colaco, Karisma Chhabria, Domdatt Singh, Anshul Bhide, Neha Singh, Abhishek Singh, Atahar Husein, Anuradha Mishra, Richa Sharma, Nancy Ashary, Deepak Modi

Abstract: To predict if developing human embryos are permissive to coronaviruses, we analyzed publicly available single cell RNA-seq datasets of zygotes, 4-cell, 8-cell, morula, inner cell mass, epiblast, primitive endoderm and trophectoderm for the coronavirus receptors (ACE2, BSG, DPP4 and ANPEP) and the Spike protein cleavage enzymes (TMPRSS2, CTSL). We also analyzed the presence of host genes involved in viral replication, the endosomal sorting complexes required for transport (ESCRT) and SARS-CoV-2 interactions. The results reveal that ACE2, BSG, DPP4 and ANPEP are expressed in cells from the zygote to the blastocyst, including the trophectodermal lineage. ACE2, TMPRSS, BSG and CTSL are co-transcribed in a proportion of epiblast cells and most cells of the trophectoderm. The embryonic and trophectodermal cells also express genes for proteins ESCRT, viral replication and those that interact with SARS-CoV-2. We identified 1985 genes in epiblast and 1452 genes in the trophectoderm that are enriched in the ACE2 and TMPRSS2 co-expressing cells; 216 of these genes are common to both cell types. These genes are associated with lipid metabolism, lysosome, peroxisome and oxidative phosphorylation pathways. Together our results suggest that developing human embryos could be permissive to coronavirus entry by both canonical and non-canonical mechanisms and they also express the genes for proteins involved in viral endocytosis and replication. This knowledge will be useful for evidence-based patient management for IVF during the COVID-19 pandemic.

Submitted 23 July, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: 33 pages, 6 main figures, 3 main tables, 2 supplementary figures

arXiv:1912.13005 [pdf, other]

Global redistribution and local migration in semi-discrete host-parasitoid population dynamic models

Authors: Brooks Emerick, Abhyudai Singh

Abstract: Host-parasitoid population dynamics is often probed using a semi-discrete/hybrid modeling framework. Here, the update functions in the discrete-time model connecting year-to-year changes in the population densities are obtained by solving ordinary differential equations that mechanistically describe interactions when hosts become vulnerable to parasitoid attacks. We use this semi-discrete formalism to study two key spatial effects: local movement (migration) of parasitoids between patches during the vulnerable period; and yearly redistribution of populations across patches outside the vulnerable period. Our results show that in the absence of any redistribution, constant density-independent migration and parasitoid attack rates are unable to stabilize an otherwise unstable host-parasitoid population dynamics. Interestingly, inclusion of host redistribution (but not parasitoid redistribution) before the start of the vulnerable period can lead to stable coexistence of both species. Next, we consider a Type-III functional response (parasitoid attack rate increases with host density), where the absence of any spatial effects leads to a neutrally stable host-parasitoid equilibrium. As before, density-independent parasitoid migration by itself is again insufficient to stabilize the population dynamics and host redistribution provides a stabilizing influence. Finally, we show that a Type-III functional response combined with density-dependent parasitoid migration leads to stable coexistence, even in the absence of population redistributions. In summary, we have systematically characterized parameter regimes leading to stable/unstable population dynamics with different forms of spatial heterogeneity coupled to the parasitoid's functional response using mechanistically formulated semi-discrete models.

Submitted 30 December, 2019; originally announced December 2019.

Comments: 27 pages, 8 figures

MSC Class: 92B05 (primary) 39A60 (secondary)
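
The semi-discrete formalism described in the abstract above can be illustrated with a minimal single-patch sketch. It is not the authors' model: it assumes a constant attack rate, no parasitoid mortality during the vulnerable period, and no migration or redistribution, which is the textbook case where the within-season ODEs dL/dt = -cPL and dI/dt = cPL integrate to the Nicholson-Bailey update. All function and parameter names are hypothetical.

```python
import math


def vulnerable_period(host_larvae, parasitoids, attack_rate=1.0,
                      duration=1.0, steps=10_000):
    """Forward-Euler integration of dL/dt = -c*P*L and dI/dt = c*P*L.

    Assumption: the parasitoid density P stays constant within the season.
    Returns (unparasitized larvae, parasitized larvae) at the season's end.
    """
    dt = duration / steps
    larvae, parasitized = host_larvae, 0.0
    for _ in range(steps):
        attacked = attack_rate * parasitoids * larvae * dt
        larvae -= attacked
        parasitized += attacked
    return larvae, parasitized


def yearly_update(hosts, parasitoids, growth=2.0, k=1.0, **ode_kwargs):
    """Discrete year-to-year map built from the within-season ODE solution.

    growth: larvae produced per adult host; k: parasitoids per parasitized larva.
    """
    larvae = growth * hosts
    surviving, parasitized = vulnerable_period(larvae, parasitoids, **ode_kwargs)
    return surviving, k * parasitized


def closed_form_update(hosts, parasitoids, growth=2.0, k=1.0,
                       attack_rate=1.0, duration=1.0):
    """Exact solution of the same ODEs (the Nicholson-Bailey map)."""
    escape = math.exp(-attack_rate * parasitoids * duration)
    return growth * hosts * escape, k * growth * hosts * (1.0 - escape)


if __name__ == "__main__":
    print(yearly_update(1.0, 0.5))
    print(closed_form_update(1.0, 0.5))
```

Replacing the right-hand side inside `vulnerable_period` (for example, making the attack rate depend on host density) changes the yearly map without changing the overall structure, which is the appeal of the semi-discrete approach.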

arXiv:1911.04046 [pdf, other]

Network Inference in Systems Biology: Recent Developments, Challenges, and Applications

Authors: Michael M. Saint-Antoine, Abhyudai Singh

Abstract: One of the most interesting, difficult, and potentially useful topics in computational biology is the inference of gene regulatory networks (GRNs) from expression data. Although researchers have been working on this topic for more than a decade and much progress has been made, it remains an unsolved problem and even the most sophisticated inference algorithms are far from perfect. In this paper, we review the latest developments in network inference, including state-of-the-art algorithms like PIDC, Phixer, and more. We also discuss unsolved computational challenges, including the optimal combination of algorithms, integration of multiple data sources, and pseudo-temporal ordering of static expression data. Lastly, we discuss some exciting applications of network inference in cancer research, and provide a list of useful software tools for researchers hoping to conduct their own network inference analyses.

Submitted 10 November, 2019; originally announced November 2019.

arXiv:1910.09112 [pdf, other]

The Driving Force for the Complexation of Charged Polypeptides

Authors: Aditya N. Singh, Arun Yethiraj

Abstract: The phase separation of oppositely-charged polyelectrolytes in solution is of current interest. In this work we study the driving force for polyelectrolyte complexation using molecular dynamics simulations. We calculate the potential of mean force between poly(lysine) and poly(glutamate) oligomers using three different force fields, an atomistic force field and two coarse-grained force fields. There is qualitative agreement between all force fields, i.e., the sign and magnitude of the free energy and the nature of the driving force are similar, which suggests that the molecular nature of water does not play a significant role. For fully charged peptides, we find that the driving force for association is entropic in all cases when small ions either neutralize the poly-ions, or are in excess. The removal of all counterions switches the driving force, making complexation energetic. This suggests that the entropy of complexation is dominated by the counterions. When only 6 residues of an 11-mer are charged, however, the driving force is enthalpic in salt-free conditions. The simulations shed insight into the mechanism of complex coacervation and the importance of realistic models for the polyions.

Submitted 7 January, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

arXiv:1910.04868 [pdf, other]

Estimating localized complexity of white-matter wiring with GANs

Authors: Haraldur T. Hallgrimsson, Richika Sharan, Scott T. Grafton, Ambuj K. Singh

Abstract: In-vivo examination of the physical connectivity of axonal projections through the white matter of the human brain is made possible by diffusion weighted magnetic resonance imaging (dMRI). Analysis of dMRI commonly considers derived scalar metrics such as fractional anisotropy as proxies for "white matter integrity," and differences of such measures have been observed as significantly correlating with various neurological diagnoses and clinical measures such as executive function, presence of multiple sclerosis, and genetic similarity. The analysis of such voxel measures is confounded in areas of more complicated fiber wiring due to crossing, kissing, and dispersing fibers. Recently, Volz et al. introduced a simple probabilistic measure of the count of distinct fiber populations within a voxel, which was shown to reduce variance in group comparisons. We propose a complementary measure that considers the complexity of a voxel in context of its local region, with an aim to quantify the localized wiring complexity of every part of white matter. This allows, for example, identification of particularly ambiguous regions of the brain for tractographic approaches of modeling global wiring connectivity. Our method builds on recent advances in image inpainting, in which the task is to plausibly fill in a missing region of an image. Our proposed method builds on a Bayesian estimate of heteroscedastic aleatoric uncertainty of a region of white matter by inpainting it from its context. We define the localized wiring complexity of white matter as how accurately and confidently a well-trained model can predict the missing patch. In our results, we observe low aleatoric uncertainty along major neuronal pathways which increases at junctions and towards cortex boundaries. This directly quantifies the difficulty of lesion inpainting of dMRI images at all parts of white matter.

Submitted 30 November, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

Comments: Three page extended abstract, accepted to Medical Imaging meets NeurIPS 2019 workshop

arXiv:1906.09410 [pdf, ps, other]

doi 10.1098/rsif.2022.0877

A reaction network scheme which implements inference and learning for Hidden Markov Models

Authors: Abhinav Singh, Carsten Wiuf, Abhishek Behera, Manoj Gopalkrishnan

Abstract: With a view towards molecular communication systems and molecular multi-agent systems, we propose the Chemical Baum-Welch Algorithm, a novel reaction network scheme that learns parameters for Hidden Markov Models (HMMs). Each reaction in our scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every fixed point of the Baum-Welch algorithm for HMMs is a fixed point of our reaction network scheme, and every positive fixed point of our scheme is a fixed point of the Baum-Welch algorithm. We prove that the "Expectation" step and the "Maximization" step of our reaction network separately converge exponentially fast. We simulate mass-action kinetics for our network on an example sequence, and show that it learns the same parameters for the HMM as the Baum-Welch algorithm.

Submitted 18 August, 2019; v1 submitted 22 June, 2019; originally announced June 2019.

Comments: Accepted at 25th International Conference on DNA Computing and Molecular Programming

arXiv:1902.06028 [pdf, other]

Evaluating Pruning Methods in Gene Network Inference

Authors: Michael M. Saint-Antoine, Abhyudai Singh

Abstract: One challenge in gene network inference is distinguishing between direct and indirect regulation. Some algorithms, including ARACNE and Phixer, approach this problem by using pruning methods to eliminate redundant edges in an attempt to explain the observed data with the simplest possible network structure. However, we hypothesize that there may be a cost in accuracy to simplifying the predicted networks in this way, especially due to the prevalence of redundant connections, such as feed forward loops, in gene networks. In this paper, we evaluate the pruning methods of ARACNE and Phixer, and score their accuracy using receiver operating characteristic curves and precision-recall curves. Our results suggest that while pruning can be useful in some situations, it may have a negative effect on overall accuracy that has not been previously studied. Researchers should be aware of both the advantages and disadvantages of pruning when inferring networks, in order to choose the best inference strategy for their experimental context.

Submitted 29 May, 2019; v1 submitted 15 February, 2019; originally announced February 2019.

arXiv:1808.00996 [pdf]

Genetic control and geo-climate adaptation of pod dehiscence provide novel insights into the soybean domestication and expansion

Authors: Jiaoping Zhang, Asheesh K. Singh

Abstract: Loss of pod dehiscence is a key step during soybean [Glycine max (L.) Merr.] domestication. Genome-wide association analysis for soybean shattering identified loci harboring Pdh1, NST1A and SHAT1-5. Pairwise epistatic interactions were observed, and the dehiscent Pdh1 overcomes the resistance conferred by NST1A or SHAT1-5 locus, indicating that Pdh1 predominates pod dehiscence expression. Further candidate gene association analysis identified a nonsense mutation in NST1A associated with pod dehiscence. Allele composition and population differential analyses unraveled that Pdh1 and NST1A, but not SHAT1-5, underwent domestication and modern breeding selections. Geographic analysis showed that in Northeast China (NEC), indehiscence at both Pdh1 and NST1A was required by cultivated soybean, while indehiscent Pdh1 alone is capable of coping with shattering in the Huang-Huai-Hai (HHH) valleys where it originated, and no specific indehiscence was required in Southern China (SC). Geo-climatic investigation revealed strong correlation between relative humidity and frequency of indehiscent Pdh1 across China. This study demonstrates that the epistatic interaction between Pdh1 and NST1A fulfills a pivotal role in determining the level of resistance against pod dehiscence. Humidity shapes the distribution of indehiscent alleles. Our results also suggest that the HHH valleys, not NEC, were at least one of the origin centers of cultivated soybean.

Submitted 2 August, 2018; originally announced August 2018.

Comments: 17 pages 8 figures

arXiv:1804.08154 [pdf, other]

Local White Matter Architecture Defines Functional Brain Dynamics

Authors: Yo Joong Choe, Sivaraman Balakrishnan, Aarti Singh, Jean M. Vettel, Timothy Verstynen

Abstract: Large bundles of myelinated axons, called white matter, anatomically connect disparate brain regions together and compose the structural core of the human connectome. We recently proposed a method of measuring the local integrity along the length of each white matter fascicle, termed the local connectome. If communication efficiency is fundamentally constrained by the integrity along the entire length of a white matter bundle, then variability in the functional dynamics of brain networks should be associated with variability in the local connectome. We test this prediction using two statistical approaches that are capable of handling the high dimensionality of data. First, by performing statistical inference on distance-based correlations, we show that similarity in the local connectome between individuals is significantly correlated with similarity in their patterns of functional connectivity. Second, by employing variable selection using sparse canonical correlation analysis and cross-validation, we show that segments of the local connectome are predictive of certain patterns of functional brain dynamics. These results are consistent with the hypothesis that structural variability along axon bundles constrains communication between disparate brain regions.

Submitted 16 September, 2018; v1 submitted 22 April, 2018; originally announced April 2018.

Comments: Accepted to the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2018)

arXiv:1802.05342 [pdf, other]

doi 10.1016/j.neuroimage.2018.01.050

Spatial Coherence of Oriented White Matter Microstructure: Applications to White Matter Regions Associated with Genetic Similarity

Authors: Haraldur T. Hallgrímsson, Matthew Cieslak, Luca Foschini, Scott T. Grafton, Ambuj K. Singh

Abstract: We present a method to discover differences between populations with respect to the spatial coherence of their oriented white matter microstructure in arbitrarily shaped white matter regions. This method is applied to diffusion MRI scans of a subset of the Human Connectome Project dataset: 57 pairs of monozygotic and 52 pairs of dizygotic twins. After controlling for morphological similarity between twins, we identify 3.7% of all white matter as being associated with genetic similarity (35.1k voxels, $p < 10^{-4}$, false discovery rate 1.5%), 75% of which spatially clusters into twenty-two contiguous white matter regions. Furthermore, we show that the orientation similarity within these regions generalizes to a subset of 47 pairs of non-twin siblings, and show that these siblings are on average as similar as dizygotic twins. The regions are located in deep white matter including the superior longitudinal fasciculus, the optic radiations, the middle cerebellar peduncle, the corticospinal tract, and within the anterior temporal lobe, as well as the cerebellum, brain stem, and amygdalae. These results extend previous work using undirected fractional anisotropy for measuring putative heritable influences in white matter. Our multidirectional extension better accounts for crossing fiber connections within voxels. This bottom-up approach has at its basis a novel measurement of coherence within neighboring voxel dyads between subjects, and avoids some of the fundamental ambiguities encountered with tractographic approaches to white matter analysis that estimate global connectivity.

Submitted 14 February, 2018; originally announced February 2018.

Journal ref: NeuroImage (2018)

arXiv:1711.07383 [pdf, other]

The Linear-Noise Approximation and moment-closure approximations for stochastic chemical kinetics

Authors: Abhyudai Singh, Ramon Grima

Abstract: This is a short review of two common approximations in stochastic chemical and biochemical kinetics. It will appear as Chapter 6 in the book "Quantitative Biology: Theory, Computational Methods and Examples of Models" edited by Brian Munsky, Lev Tsimring and Bill Hlavacek (to be published in late 2017/2018 by MIT Press). All chapter references in this article refer to chapters in the aforementioned book.

Submitted 24 November, 2017; v1 submitted 20 November, 2017; originally announced November 2017.

Comments: 24 pages, 4 figures. To be published as a chapter in the book "Quantitative Biology: Theory, Computational Methods and Examples of Models" edited by Brian Munsky, Lev Tsimring and Bill Hlavacek (MIT Press)

arXiv:1708.03951 [pdf, other]

Optimization of Ensemble Supervised Learning Algorithms for Increased Sensitivity, Specificity, and AUC of Population-Based Colorectal Cancer Screenings

Authors: Anirudh Kamath, Aditya Singh, Raj Ramnani, Ayush Vyas, Jay Shenoy

Abstract: Over 150,000 new people in the United States are diagnosed with colorectal cancer each year. Nearly a third die from it (American Cancer Society). The only approved noninvasive diagnosis tools currently involve fecal blood count tests (FOBTs) or stool DNA tests. Fecal blood count tests take only five minutes and are available over the counter for as low as \$15. They are highly specific, yet not nearly as sensitive, yielding a high percentage (25%) of false negatives (Colon Cancer Alliance). Moreover, FOBT results are far too generalized, meaning that a positive result could mean much more than just colorectal cancer, and could just as easily mean hemorrhoids, anal fissure, proctitis, Crohn's disease, diverticulosis, ulcerative colitis, rectal ulcer, rectal prolapse, ischemic colitis, angiodysplasia, rectal trauma, proctitis from radiation therapy, and others. Stool DNA tests, the modern benchmark for CRC screening, have a much higher sensitivity and specificity, but also cost \$600, take two weeks to process, and are not for high-risk individuals or people with a history of polyps. To yield a cheap and effective CRC screening alternative, a unique ensemble-based classification algorithm is put in place that considers the FIT result, BMI, smoking history, and diabetic status of patients. This method is tested under ten-fold cross validation to have a .95 AUC, 92% specificity, 89% sensitivity, .88 F1, and 90% precision. Once clinically validated, this test promises to be cheaper, faster, and potentially more accurate when compared to a stool DNA test.

Submitted 14 August, 2017; v1 submitted 13 August, 2017; originally announced August 2017.

Comments: 7 pages, 3 figures

arXiv:1702.00203 [pdf]

Change in flexibility of DNA with binding ligands

Authors: Anurag Singh, Amar Nath Gupta

Abstract: The percentage and sequence of AT and GC base pairs and charges on the DNA backbone contribute significantly to the stiffness of DNA. This elastic property of DNA also changes with small interacting ligands. The single-molecule force spectroscopy technique shows different interaction modes by measuring the mechanical properties of DNA bound with small ligands. When a ds-DNA molecule is overstretched in the presence of ligands, it undergoes a co-operative structural transition based on the externally applied force, the mode of binding of the ligands, the binding constant of the ligands to the DNA, the concentration of the ligands and the ionic strength of the supporting medium. This leads to changes in the regions (up to 60 pN, the cooperative structural transition region, and the overstretched region) compared to that of the FEC in the absence of any binding ligand. The cooperative structural transitions were studied by the extended and twistable worm-like chain model. Here we have depicted these changes in persistence length and the elastic modulus constant as a function of binding constant and the concentration of the bound ligands, which vary with time. Therefore, besides ionic strength, interacting proteins and content of AT and GC base pairs, the ligand binding or intercalation with the ligands is an important parameter which changes the stiffness of DNA.

Submitted 1 February, 2017; originally announced February 2017.

Comments: 9 pages

arXiv:1612.09518 [pdf, ps, other]

Bounds on stationary moments in stochastic chemical kinetics

Authors: Khem Raj Ghusinga, Cesar A. Vargas-Garcia, Andrew Lamperski, Abhyudai Singh

Abstract: In the stochastic formulation of chemical kinetics, the stationary moments of the population count of species can be described via a set of linear equations. However, except for some specific cases such as systems with linear reaction propensities, the moment equations are underdetermined as a lower order moment might depend upon a higher order moment. Here, we propose a method to find lower and upper bounds on stationary moments of molecular counts in a chemical reaction system. The method exploits the fact that statistical moments of any positive-valued random variable must satisfy some constraints. Such constraints can be expressed as nonlinear inequalities on moments in terms of their lower order moments, and solving them in conjunction with the stationary moment equations results in bounds on the moments. Using two examples of biochemical systems, we illustrate that not only does one obtain upper and lower bounds on a given stationary moment, but these bounds also improve as one uses more moment equations and utilizes the inequalities for the corresponding higher order moments. Our results provide avenues for development of moment approximations that provide explicit bounds on moment dynamics for systems whose dynamics are otherwise intractable.

Submitted 30 December, 2016; originally announced December 2016.

arXiv:1609.07461 [pdf, other]

Effect of gene-expression bursts on stochastic timing of cellular events

Authors: Khem Raj Ghusinga, Abhyudai Singh

Abstract: Gene expression is inherently a noisy process which manifests as cell-to-cell variability in time evolution of proteins. Consequently, events that trigger at critical threshold levels of regulatory proteins exhibit stochasticity in their timing. An important contributor to the noise in gene expression is translation bursts which correspond to randomness in number of proteins produced in a single mRNA lifetime. Modeling timing of an event as a first-passage time (FPT) problem, we explore the effect of burst size distribution on event timing. Towards this end, the probability density function of FPT is computed for a gene expression model with burst size drawn from a generic non-negative distribution. Analytical formulas for FPT moments are provided in terms of known vectors and inverse of a matrix. The effect of burst size distribution is investigated by looking at how the feedback regulation strategy that minimizes noise in timing around a given time deviates from the case when burst is deterministic. Interestingly, results show that the feedback strategy for deterministic burst case is quite robust to change in burst size distribution, and deviations from it are confined to about 20% of the optimal value. These findings facilitate an improved understanding of noise regulation in event timing.

Submitted 23 September, 2016; originally announced September 2016.

Comments: submitted to American Control Conference 2017

arXiv:1606.08223 [pdf, ps, other]

doi 10.1063/1.4964285

Sufficient minimal model for DNA denaturation: Integration of harmonic scalar elasticity and bond energies

Authors: Amit Raj Singh, Rony Granek

Abstract: We study DNA denaturation by integrating elasticity -- as described by the Gaussian network model -- with bond binding energies, distinguishing between different base-pair and stacking energies. We use exact calculation, within the model, of the Helmholtz free-energy of any partial denaturation state, which implies that the entropy of all formed bubbles ("loops") is accounted for. Considering base-pair bond removal single events, the bond designated for opening is chosen by minimizing the free-energy difference for the process, over all remaining base-pair bonds. Despite its great simplicity, for several known DNA sequences our results are in accord with available theoretical and experimental studies. Moreover, we report free-energy profiles along the denaturation pathway, which allow detection of stable or meta-stable partial denaturation states, composed of "bubbles", as local free-energy minima separated by barriers. Our approach allows the study of very long DNA strands with commonly available computational power, as we demonstrate for a few random sequences in the range 200-800 base-pairs. For the latter we also elucidate the self-averaging property of the system. Implications for the well known breathing dynamics of DNA are elucidated.

Submitted 27 June, 2016; originally announced June 2016.

arXiv:1606.00535 [pdf, ps, other]

Conditions for cell size homeostasis: A stochastic hybrid systems approach

Authors: César Augusto Vargas-García, Mohammad Soltani, Abhyudai Singh

Abstract: A ubiquitous feature of living cells is their growth over time followed by division into daughter cells. How isogenic cell populations maintain size homeostasis, i.e., a narrow distribution of cell size, is an intriguing fundamental problem. We model cell size using a stochastic hybrid system, where a cell grows exponentially in size (volume) over time and probabilistic division events are triggered at discrete time intervals. Moreover, whenever division events occur, size is randomly partitioned among daughter cells. We first consider a scenario, where a timer (i.e., cell-cycle clock) that measures the time since the last division event regulates both the cellular growth and division rates. Analysis reveals that such a timer-controlled system cannot achieve size homeostasis, in the sense that the cell-to-cell size variation grows unboundedly with time. To explore biologically meaningful mechanisms for controlling size we consider two classes of regulation: a size-dependent growth rate and a size-dependent division rate. Our results show that these strategies can provide bounded intercellular variation in cell size, and exact mathematical conditions on the form of regulation needed for size homeostasis are derived. Different known forms of size control strategies, such as the adder and the sizer, are shown to be consistent with these results. Interestingly, for timer-based division mechanisms, the mean cell size depends on the noise in the cell-cycle duration but is independent of errors incurred in partitioning of volume among daughter cells. In contrast, the mean cell size decreases with increasing partitioning errors for size-based division mechanisms. Finally, we discuss how organisms ranging from bacteria to mammalian cells have adopted different control approaches for maintaining size homeostasis.

Submitted 2 June, 2016; originally announced June 2016.

arXiv:1605.02251 [pdf, other]

Cell-cycle coupled expression minimizes random fluctuations in gene product levels

Authors: Mohammad Soltani, Abhyudai Singh

Abstract: Expression of many genes varies as a cell transitions through different cell-cycle stages. How coupling between stochastic expression and cell cycle impacts cell-to-cell variability (noise) in the level of protein is not well understood. We analyze a model, where a stable protein is synthesized in random bursts, and the frequency with which bursts occur varies within the cell cycle. Formulas quantifying the extent of fluctuations in the protein copy number are derived and decomposed into components arising from the cell cycle and stochastic processes. The latter stochastic component represents contributions from bursty expression and errors incurred during partitioning of molecules between daughter cells. These formulas reveal an interesting trade-off: cell-cycle dependencies that amplify the noise contribution from bursty expression also attenuate the contribution from partitioning errors. We investigate existence of optimum strategies for coupling expression to the cell cycle that minimize the stochastic component. Intriguingly, results show that a zero production rate throughout the cell cycle, with expression only occurring just before cell division minimizes noise from bursty expression for a fixed mean protein level. In contrast, the optimal strategy in the case of partitioning errors is to make the protein just after cell division. We provide examples of regulatory proteins that are expressed only towards the end of cell cycle, and argue that such strategies enhance robustness of cell-cycle decisions to the intrinsic stochasticity of gene expression.

Submitted 7 May, 2016; originally announced May 2016.

Comments: 28 pages, 3 figures

arXiv:1602.08568 [pdf, other]

Gene expression noise is affected differentially by feedback in burst frequency and burst size

Authors: Pavol Bokes, Abhyudai Singh

Abstract: Inside individual cells, expression of genes is stochastic across organisms ranging from bacterial to human cells. A ubiquitous feature of stochastic expression is burst-like synthesis of gene products, which drives considerable intercellular variability in protein levels across an isogenic cell population. One common mechanism by which cells control such stochasticity is negative feedback regulation, where a protein inhibits its own synthesis. For a single gene that is expressed in bursts, negative feedback can affect the burst frequency or the burst size. In order to compare these feedback types, we study a piecewise deterministic model for gene expression of a self-regulating gene. Mathematically tractable steady-state protein distributions are derived and used to compare the noise suppression abilities of the two feedbacks. Results show that in the low noise regime, both feedbacks are similar in terms of their noise buffering abilities. Intriguingly, feedback in burst size outperforms the feedback in burst frequency in the high noise regime. Finally, we discuss various regulatory strategies by which cells implement feedback to control burst sizes of expressed proteins at the level of single cells.

Submitted 11 September, 2016; v1 submitted 27 February, 2016; originally announced February 2016.

Comments: 27 pages, 11 figures

MSC Class: 92C42

arXiv:1512.07864 [pdf, other]

doi 10.1038/srep30229

A mechanistic first-passage time framework for bacterial cell-division timing

Authors: Khem Raj Ghusinga, Cesar A. Vargas-Garcia, Abhyudai Singh

Abstract: How exponentially growing cells maintain size homeostasis is an important fundamental problem. Recent single-cell studies in prokaryotes have uncovered the adder principle, where cells, on average, add a fixed size (volume) from birth to division. Interestingly, this added volume differs considerably among genetically-identical newborn cells with similar sizes, suggesting a stochastic component in the timing of cell-division. To mechanistically explain the adder principle, we consider a time-keeper protein that begins to get stochastically expressed after cell birth at a rate proportional to the volume. Cell-division time is formulated as the first-passage time for protein copy numbers to hit a fixed threshold. Consistent with data, the model predicts that while the mean cell-division time decreases with increasing size of newborns, the noise in timing increases with size at birth. Intriguingly, our results show that the distribution of the volume added between successive cell-division events is independent of the newborn cell size. This was dramatically seen in experimental studies, where histograms of the added volume corresponding to different newborn sizes collapsed on top of each other. The model provides further insights consistent with experimental observations: the distributions of the added volume and the cell-division time when scaled by their respective means become invariant of the growth rate. Finally, we discuss various modifications to the proposed model that lead to deviations from the adder principle. In summary, our simple yet elegant model explains key experimental findings and suggests a mechanism for regulating both the mean and fluctuations in cell-division timing for size control.

Submitted 24 December, 2015; originally announced December 2015.

Journal ref: Scientific Reports 6: 30229 (2016)

arXiv:1510.00658 [pdf, other]

doi 10.1109/ACC.2016.7524951

Optimal regulation of protein degradation to schedule cellular events with precision

Authors: Khem Raj Ghusinga, Abhyudai Singh

Abstract: An important occurrence in many cellular contexts is the crossing of a prescribed threshold by a regulatory protein. The timing of such events is stochastic as a consequence of the innate randomness in gene expression. A question of interest is to understand how gene expression is regulated to achieve precision in event timing. To address this, we model event timing using the first-passage time framework, a mathematical tool to analyze the time when a stochastic process first crosses a specific threshold. The protein evolution is described via a simple stochastic model of gene expression. Moreover, we consider the feedback regulation of protein degradation to be a possible noise control mechanism employed to achieve the precision. Exact analytical formulas are developed for the distribution and moments of the first-passage time. Using these expressions, we investigate the optimal feedback strategy such that noise (coefficient of variation squared) in event timing is minimized around a given fixed mean time. Our results show that the minimum noise is achieved when the protein degradation rate is zero for all protein levels. Lastly, the implications of this finding are discussed.

Submitted 2 October, 2015; originally announced October 2015.

arXiv:1509.09192 [pdf, other]

Stochastic Analysis Of An Incoherent Feedforward Genetic Motif

Authors: Thierry Platini, Mohammad Soltani, Abhyudai Singh

Abstract: Gene products (RNAs, proteins) often occur at low molecular counts inside individual cells, and hence are subject to considerable random fluctuations (noise) in copy number over time. Not surprisingly, cells encode diverse regulatory mechanisms to buffer noise. One such mechanism is the incoherent feedforward circuit. We analyze a simplistic version of this circuit, where an upstream regulator X affects both the production and degradation of a protein Y. Thus, any random increase in X's copy numbers would increase both production and degradation, keeping Y levels unchanged. To study its stochastic dynamics, we formulate this network into a mathematical model using the Chemical Master Equation formulation. We prove that if the functional dependence of Y's production and degradation on X is similar, then the steady-state distribution of Y's copy numbers is independent of X. To investigate how fluctuations in Y propagate downstream, a protein Z whose production rate depends only on Y is introduced. Intriguingly, results show that the extent of noise in Z increases with noise in X, in spite of the fact that the magnitude of noise in Y is invariant of X. Such counterintuitive results arise because X enhances the time-scale of fluctuations in Y, which amplifies fluctuations in downstream processes. In summary, while feedforward systems can buffer a protein from noise in its upstream regulators, noise can propagate downstream due to changes in the time-scale of fluctuations.

Submitted 30 September, 2015; originally announced September 2015.

Comments: 8 pages

arXiv:1509.04559 [pdf, other]

Decomposing variability in protein levels from noisy expression, genome duplication and partitioning errors during cell-divisions

Authors: Mohammad Soltani, Cesar Augusto Vargas-Garcia, Duarte Antunes, Abhyudai Singh

Abstract: Inside individual cells, expression of genes is inherently stochastic and manifests as cell-to-cell variability or noise in protein copy numbers. Since protein half-lives can be comparable to the cell-cycle length, randomness in cell-division times generates additional intercellular variability in protein levels. Moreover, as many mRNA/protein species are expressed at low copy numbers, errors incurred in partitioning of molecules between the mother and daughter cells are significant. We derive analytical formulas for the total noise in protein levels for a general class of cell-division time and partitioning error distributions. Using a novel hybrid approach, the total noise is decomposed into components arising from i) stochastic expression; ii) partitioning errors at the time of cell-division and iii) random cell-division events. These formulas reveal that random cell-division times not only generate additional extrinsic noise but also critically affect the mean protein copy numbers and intrinsic noise components. Counterintuitively, in some parameter regimes noise in protein levels can decrease as cell-division times become more stochastic. Computations are extended to consider genome duplication, where the gene dosage is increased by two-fold at a random point in the cell-cycle. We systematically investigate how the timing of genome duplication influences different protein noise components. Intriguingly, results show that noise contribution from stochastic expression is minimized at an optimal genome duplication time.
Our theoretical results motivate new experimental methods for decomposing protein noise levels from single-cell expression data. Characterizing the contributions of individual noise mechanisms will lead to precise estimates of gene expression parameters and techniques for altering stochasticity to change phenotype of individual cells.

Submitted 2 October, 2015; v1 submitted 5 September, 2015; originally announced September 2015.

Comments: 40 pages, 10 figures

arXiv:1503.01843 [pdf, ps, other]

Host-feeding enhances stability of discrete-time host-parasitoid population dynamic models

Authors: Brooks Emerick, Abhyudai Singh

Abstract: Discrete-time models are the traditional approach for capturing population dynamics of a host-parasitoid system. Recent work has introduced a semi-discrete framework for obtaining model update functions that connect host-parasitoid population levels from year to year. In particular, this framework uses differential equations to describe the host-parasitoid interaction during the time of year when they come in contact, allowing specific behaviors to be mechanistically incorporated into the model. We use the semi-discrete approach to study the effects of host-feeding, which occurs when a parasitoid consumes a potential host larva without ovipositing. Our results show that host-feeding by itself cannot stabilize the system, and both the host and parasitoid populations exhibit diverging oscillations similar to the Nicholson-Bailey model. However, when combined with other stabilizing mechanisms such as density-dependent host mortality or density-dependent parasitoid attack rate, host-feeding expands the region of parameter space that allows for a stable host-parasitoid equilibrium. Finally, our results show that host-feeding causes inefficiency in the parasitoid population, which yields a higher population of hosts per generation. This suggests that host-feeding may have limited long-term impact in terms of suppressing host levels for biological control applications.

Submitted 5 March, 2015; originally announced March 2015.

Comments: 18 pages, 4 figures

arXiv:1501.05575 [pdf, ps, other]

Ordering Dynamics in Neuron Activity Pattern Model: An insight to Brain Functionality

Authors: Awaneesh Singh, Jasleen Gundh, R. K. Brojen Singh

Abstract: We study the ordering kinetics in $d=2$ ferromagnets, which correspond to populated neuron activities with long-ranged interactions, $V(r)\sim r^{-n}$, associated with short-ranged interaction. We present the results from comprehensive Monte Carlo (MC) simulations for the nonconserved Ising model with $n\ge 2$. Our results for long-ranged neuron kinetics are consistent with the dynamical behavior of the short-ranged case ($n > 4$). The calculated characteristic length scale in long-ranged interaction is found to be $n$-dependent ($L(t)\sim t^{1/(n-2)}$), whereas short-ranged interaction follows the $L(t)\sim t^{1/2}$ law and approximately preserves universality in domain kinetics. Further, we carry out a comparative study of phase ordering near the critical temperature, which shows different domain-ordering behaviours near and far from the critical temperature but follows a universal scaling law.

Submitted 22 January, 2015; originally announced January 2015.

Showing 1–50 of 61 results for author: Singh, A