
Exploring Latent Space for Generating Peptide Analogs Using Protein Language Models

Authors: Po-Yu Liang, Xueting Huang, Tibo Duran, Andrew J. Wiemer, Jun Bai

Abstract: Generating peptides with desired properties is crucial for drug discovery and biotechnology. Traditional sequence-based and structure-based methods often require extensive datasets, which limits their effectiveness. In this study, we propose a novel method that uses autoencoder-shaped models to explore the protein embedding space and generate novel peptide analogs by leveraging protein language models. The proposed method requires only a single sequence of interest, avoiding the need for large datasets. Our results show significant improvements over baseline models in similarity indicators of peptide structures, descriptors and bioactivities. Validation through Molecular Dynamics simulations on TIGIT inhibitors demonstrates that our method produces peptide analogs with similar yet distinct properties, highlighting its potential to enhance peptide screening processes.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2407.16584 [pdf]

The need to implement FAIR principles in biomolecular simulations

Authors: Rommie Amaro, Johan Åqvist, Ivet Bahar, Federica Battistini, Adam Bellaiche, Daniel Beltran, Philip C. Biggin, Massimiliano Bonomi, Gregory R. Bowman, Richard Bryce, Giovanni Bussi, Paolo Carloni, David Case, Andrea Cavalli, Chie-En A. Chang, Thomas E. Cheatham III, Margaret S. Cheung, Cris Chipot, Lillian T. Chong, Preeti Choudhary, Gerardo Andres Cisneros, Cecilia Clementi, Rosana Collepardo-Guevara, Peter Coveney, Roberto Covino, et al. (101 additional authors not shown)

Abstract: This letter presents the view of the molecular dynamics (MD) community on the need to adopt a new FAIR paradigm for the use of molecular simulations. It highlights the necessity of a collaborative effort to create, establish, and sustain a database that allows findability, accessibility, interoperability, and reusability of molecular dynamics simulation data. Such a development would democratize the field and significantly improve the impact of MD simulations on life science research. This would transform our working paradigm, pushing the field to a new frontier. We invite you to support our initiative at the MDDB community (https://mddbr.eu/community/).

Submitted 30 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

arXiv:2406.18535 [pdf, other]

DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs

Authors: Jinzhe Liu, Xiangsheng Huang, Zhuo Chen, Yin Fang

Abstract: Large Language Models (LLMs) encounter challenges with the unique syntax of specific domains, such as biomolecules. Existing fine-tuning or modality alignment techniques struggle to bridge the domain knowledge gap and understand complex molecular data, limiting LLMs' progress in specialized fields. To overcome these limitations, we propose an expandable and adaptable non-parametric knowledge injection framework named Domain-specific Retrieval-Augmented Knowledge (DRAK), aimed at enhancing reasoning capabilities in specific domains. Utilizing knowledge-aware prompts and gold label-induced reasoning, DRAK has developed profound expertise in the molecular domain and the capability to handle a broad spectrum of analysis tasks. We evaluated two distinct DRAK variants, showing that DRAK exceeds previous benchmarks on six molecular tasks within the Mol-Instructions dataset. Extensive experiments underscore DRAK's strong performance and its potential to unlock molecular insights, offering a unified paradigm for LLMs to tackle knowledge-intensive tasks in specific domains. Our code will be available soon.

Submitted 4 March, 2024; originally announced June 2024.

Comments: Ongoing work; 11 pages, 6 Figures, 2 Tables

arXiv:2406.08980 [pdf, other]

From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

Authors: Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability of the Vina docking score, the current standard for assessing binding abilities, is increasingly questioned due to its susceptibility to overfitting. To address these limitations, we propose a comprehensive evaluation framework that includes assessing the similarity of generated molecules to known active compounds, introducing a virtual screening-based metric for practical deployment capabilities, and re-evaluating binding affinity more rigorously. Our experiments reveal that while current SBDD models achieve high Vina scores, they fall short in practical usability metrics, highlighting a significant gap between theoretical predictions and real-world applicability. Our proposed metrics and dataset aim to bridge this gap, enhancing the practical applicability of future SBDD models and aligning them more closely with the needs of pharmaceutical research and development.

Submitted 13 June, 2024; originally announced June 2024.
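The abstract does not state which measure the framework uses to compare generated molecules with known actives. As a minimal sketch, assuming Tanimoto similarity over binary fingerprints (a common cheminformatics choice, not confirmed by the abstract), the comparison could look like this; fingerprints are represented as hypothetical sets of on-bit indices rather than output from any real toolkit, and both function names are illustrative.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention assumed here: two empty fingerprints score 0
    return len(fp_a & fp_b) / len(union)


def max_similarity_to_actives(generated: set[int], actives: list[set[int]]) -> float:
    """Highest similarity between one generated molecule and any known active."""
    return max((tanimoto(generated, ref) for ref in actives), default=0.0)


if __name__ == "__main__":
    generated = {1, 2, 3}
    actives = [{2, 3, 4}, {7, 8}]
    print(tanimoto(generated, actives[0]))                # 0.5 (2 shared bits / 4 bits in the union)
    print(max_similarity_to_actives(generated, actives))  # 0.5
```

Scoring each generated molecule by its best match against the active set is one simple way to ask whether a model produces "active-like" chemistry; the paper's actual metric may aggregate differently.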

arXiv:2404.12973 [pdf, other]

Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

Authors: Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li

Abstract: Recent advances in spatial transcriptomics (ST) allow spatial gene expression to be characterized within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, it remains a challenge to integrate histology images and gene expression for super-resolved ST maps. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationship of multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.

Submitted 27 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2403.01702 [pdf]

Hill Function-based Model of Transcriptional Response: Impact of Nonspecific Binding and RNAP Interactions

Authors: Wenjia Shi, Yao Ma, Peilin Hu, Mi Pang, Xiaona Huang, Yiting Dang, Yuxin Xie, Danni Wu

Abstract: The Hill function is one of the most widely used models of gene transcription regulation. Because it is primarily a fitting function, it may lack an underlying physical picture, yet its fitting parameters can provide information about biochemical reactions, such as the number of transcription factors (TFs) and the binding energy between regulatory elements. However, it remains unclear when, and how much, biochemical information the Hill function can provide beyond fitting. Here, starting from the interactions between TFs and RNA polymerase during transcription regulation and from the association-dissociation reactions of both at specific/nonspecific sites on DNA, we deduced the regulatory effect of TFs as a fold change. We found that, for a weak promoter, the fold change can reduce to the regulatory factor (Freg), which is closely correlated with the Hill function. By directly comparing and fitting with the Hill function, we analyzed and discussed the fitting parameters and the corresponding biochemical reaction parameters in Freg, considering both a single TF and multiple TFs with cooperativity and basic logic effects. We conclude that promoter strength and interactions between TFs determine whether the Hill function can reflect the corresponding biochemical information. Our findings highlight the role of the Hill function in modeling/fitting transcriptional regulation, which also benefits the preparation of synthetic regulatory elements.

Submitted 3 March, 2024; originally announced March 2024.
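For reference, the standard activating and repressing Hill functions are f(x) = x^n / (K^n + x^n) and K^n / (K^n + x^n), where x is the TF concentration, K the half-saturation constant and n the Hill coefficient. The sketch below is a minimal illustration of that textbook form, not the authors' Freg derivation: it evaluates both functions and recovers K and n from activation data by least squares on the linearized Hill plot, log(f / (1 - f)) = n log x - n log K. The function names are hypothetical.

```python
import math


def hill_activation(x: float, K: float, n: float) -> float:
    """Fraction of maximal activation at TF concentration x."""
    return x**n / (K**n + x**n)


def hill_repression(x: float, K: float, n: float) -> float:
    """Fraction of maximal expression remaining under a repressor at concentration x."""
    return K**n / (K**n + x**n)


def fit_hill_plot(xs: list[float], fs: list[float]) -> tuple[float, float]:
    """Estimate (K, n) from activation data via the linearized Hill plot.

    log(f / (1 - f)) = n * log(x) - n * log(K), so an ordinary least-squares
    line through the transformed points has slope n and intercept -n * log(K).
    Points with x <= 0 or f outside (0, 1) are skipped: the transform is undefined there.
    """
    pts = [(math.log(x), math.log(f / (1.0 - f)))
           for x, f in zip(xs, fs) if x > 0 and 0.0 < f < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two usable points")
    m = len(pts)
    mean_u = sum(u for u, _ in pts) / m
    mean_v = sum(v for _, v in pts) / m
    s_uu = sum((u - mean_u) ** 2 for u, _ in pts)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    n = s_uv / s_uu                      # slope of the Hill plot
    intercept = mean_v - n * mean_u      # equals -n * log(K)
    K = math.exp(-intercept / n)
    return K, n


if __name__ == "__main__":
    xs = [0.5, 1.0, 2.0, 4.0, 8.0]
    fs = [hill_activation(x, K=2.0, n=2.5) for x in xs]
    print(fit_hill_plot(xs, fs))  # noise-free data: recovers K close to 2.0 and n close to 2.5
```

On real, noisy data a nonlinear fit is usually preferred, because the log-odds transform amplifies noise near f = 0 and f = 1; the linear form is used here only because it makes the roles of K and n explicit.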

arXiv:2310.11791 [pdf, other]

STW-MD: A Novel Spatio-Temporal Weighting and Multi-Step Decision Tree Method for Considering Spatial Heterogeneity in Brain Gene Expression Data

Authors: Shanjun Mao, Xiao Huang, Runjiu Chen, Chenyang Zhang, Yizhu Diao, Zongjin Li, Qingzhe Wang, Shan Tang, Shuixia Guo

Abstract: Motivation: Gene expression during normal or abnormal brain development is a biological process that is highly dynamic in space and time. Due to the lack of comprehensive integration of the spatial and temporal dimensions of brain gene expression data, previous studies have mainly focused on individual brain regions or a single developmental stage. Our motivation is to address this gap by incorporating spatio-temporal information to gain a more complete understanding of the mechanisms underlying brain development or disorders associated with abnormal brain development, such as Alzheimer's disease (AD), and to identify potential determinants of response. Results: In this study, we propose a novel two-step framework based on spatio-temporal information weighting and multi-step decision trees. This framework can effectively exploit the spatial similarity and temporal dependence between different stages and different brain regions, and facilitate differential gene analysis in brain regions with high heterogeneity. We focus on two datasets: the AD dataset, which includes gene expression data from early, middle, and late stages, and the brain development dataset, spanning fetal development to adulthood. Our findings highlight the advantages of the proposed framework in discovering gene classes and elucidating their impact on brain development and AD progression across diverse brain regions and stages. These findings align with existing studies and provide insights into the processes of normal and abnormal brain development.
Availability: The code of STW-MD is available at https://github.com/tsnm1/STW-MD.

Submitted 18 October, 2023; originally announced October 2023.

Comments: 11 pages, 6 figures

arXiv:2309.15867 [pdf]

Identifying factors associated with fast visual field progression in patients with ocular hypertension based on unsupervised machine learning

Authors: Xiaoqin Huang, Asma Poursoroush, Jian Sun, Michael V. Boland, Chris Johnson, Siamak Yousefi

Abstract: Purpose: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression based on unsupervised machine learning and to discover factors associated with fast VF progression. Participants: A total of 3133 eyes of 1568 Ocular Hypertension Treatment Study (OHTS) participants with at least five follow-up VF tests were included in the study. Methods: We used a latent class mixed model (LCMM) to identify OHT subtypes using standard automated perimetry (SAP) mean deviation (MD) trajectories. We characterized the subtypes based on demographic, clinical, ocular, and VF factors at baseline. We then identified factors driving fast VF progression using generalized estimating equations (GEE) and justified findings qualitatively and quantitatively. Results: The LCMM discovered four clusters (subtypes) of eyes with different trajectories of MD worsening. The numbers of eyes in the clusters were 794 (25%), 1675 (54%), 531 (17%) and 133 (4%). We labelled the clusters as Improvers, Stables, Slow progressors, and Fast progressors based on their mean rates of MD change, which were 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Eyes with fast VF progression had higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD) and refractive error (RE), but lower central corneal thickness (CCT). Fast progression was associated with calcium channel blockers, being male, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.15226 [pdf]

Network Pharmacology, Molecular Docking, and MR Analysis: Targets and Mechanisms of Gegen Qinlian Decoction for Helicobacter pylori

Authors: Ruotong Lu, Xiaozhe Huang, Sihuan Deng, Haikun Du

Abstract: Objective: The study explored therapeutic targets and mechanisms of Gegen Qinlian Decoction for Helicobacter pylori infection and related gastric cancer using network pharmacology, molecular docking, and Mendelian randomization. Methods: Medicinal components of Gegen Qinlian Decoction were extracted from the TCMSP and HERB databases. Disease treatment targets were sourced from DisGeNET and PubChem. Interaction networks were constructed via the STRING database and visualized using Cytoscape 3.9.1. Enrichment analysis of intersected targets was performed using DAVID and Metascape. Molecular docking employed AutoDock Tools 1.5.6 and PyMOL 2.5.2. Mendelian randomization was based on the ukb-b-531 sample from UK Biobank. Results: 146 active components and 248 targets from Gegen Qinlian Decoction were identified. 66 targets overlapped with Helicobacter pylori infection genes. Molecular docking highlighted interactions between primary drug components, such as quercetin, wogonin, and kaempferol, and the target genes PTGS1, PTGS2, and MAPK14. Mendelian randomization pinpointed genes such as IGF2, PIK3CG, GJA1, and PLAU associated with Helicobacter pylori infection. Conclusion: Gegen Qinlian Decoction's active components target Helicobacter pylori infection through diverse targets and pathways, presenting potential research avenues.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 16 pages
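The abstract does not say which Mendelian randomization estimator was applied to the UK Biobank data. As a hedged illustration of the general idea only, assuming per-variant summary statistics (SNP effect on the exposure, SNP effect on the outcome, and the outcome standard error), the single-variant Wald ratio and the fixed-effect inverse-variance-weighted (IVW) estimate can be computed as below. The function names and the input numbers are made up for illustration and are not from the study.

```python
import math


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Causal effect estimate from a single genetic variant (outcome effect / exposure effect)."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_x: list[float], beta_y: list[float],
                 se_y: list[float]) -> tuple[float, float]:
    """Fixed-effect inverse-variance-weighted MR estimate and its standard error.

    Equivalent to a weighted average of per-variant Wald ratios with
    first-order weights beta_x**2 / se_y**2.
    """
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


if __name__ == "__main__":
    # Illustrative inputs in which every variant implies the same causal effect (0.5).
    bx = [0.10, 0.20, 0.30]
    by = [0.05, 0.10, 0.15]
    se = [0.01, 0.02, 0.03]
    print(ivw_estimate(bx, by, se))
```

In practice IVW is usually reported alongside robustness checks (for example MR-Egger or weighted-median estimates) because it assumes all instruments are valid.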

arXiv:2309.14404 [pdf]

pLMFPPred: a novel approach for accurate prediction of functional peptides integrating embedding from pre-trained protein language model and imbalanced learning

Authors: Zebin Ma, Yonglin Zou, Xiaobin Huang, Wenjin Yan, Hao Xu, Jiexin Yang, Ying Zhang, Jinqi Huang

Abstract: Functional peptides have the potential to treat a variety of diseases. Their good therapeutic efficacy and low toxicity make them ideal therapeutic agents. Artificial intelligence-based computational strategies can help quickly identify new functional peptides from collections of protein sequences and discover their different functions. Using protein language model-based embeddings (ESM-2), we developed a tool called pLMFPPred (Protein Language Model-based Functional Peptide Predictor) for predicting functional peptides and identifying toxic peptides. We also introduced SMOTE-TOMEK data synthesis sampling and Shapley value-based feature selection techniques to relieve data imbalance issues and reduce computational costs. On a validated independent test set, pLMFPPred achieved accuracy, area under the receiver operating characteristic curve (AUC-ROC), and F1-score values of 0.974, 0.99, and 0.974, respectively. Comparative experiments show that pLMFPPred outperforms existing methods for predicting functional peptides in terms of accuracy, AUC-ROC, and F1-score. pLMFPPred thus represents a new computational method for predicting functional peptides.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 20 pages, 5 figures, under review
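The abstract reports accuracy, AUC-ROC and F1-score. As a minimal pure-Python sketch of how these metrics are conventionally defined for a binary task (this is not pLMFPPred's evaluation code, and the 0.5 decision threshold in the example is an assumption), AUC-ROC is computed here in its rank form: the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: list[int], y_pred: list[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold
    print(accuracy(y_true, y_pred), f1(y_true, y_pred), roc_auc(y_true, scores))  # 0.5 0.5 0.75
```

Note that AUC-ROC depends only on the ranking of scores, while accuracy and F1 depend on the chosen threshold, which is why a model can report a higher AUC than accuracy, as in the figures above.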

arXiv:2303.08015 [pdf, ps, other]

Molecular Communication for Quorum Sensing Inspired Cooperative Drug Delivery

Authors: Yuting Fang, Stuart T. Johnston, Matt Faria, Xinyu Huang, Andrew W. Eckford, Jamie Evans

Abstract: A cooperative drug delivery system is proposed, where quorum sensing (QS), a density-dependent bacterial behavior coordination mechanism, is employed by synthetic bacterium-based nanomachines (B-NMs) for controllable drug delivery. In our proposed system, drug delivery is only triggered when there are enough QS molecules, which in turn only happens when there are enough B-NMs. This allows the proposed system to achieve a high release rate of drug molecules from a large number of B-NMs even when the population density of B-NMs may not be known. Analytical expressions for i) the expected activation probability of a B-NM due to randomly-distributed B-NMs and ii) the expected aggregate absorption rate of drug molecules due to randomly-distributed QS-activated B-NMs are derived. Analytical results are verified by particle-based simulations. Because rigorous diffusion-based molecular channels are considered, the derived results can help to predict and control the impact of environmental factors (e.g. diffusion coefficient and degradation rate) on the absorption rate of drug molecules. Our results show that the activation probability of a B-NM increases the closer it is located to the center of the B-NM population, and that the aggregate absorption rate of the drug molecules increases non-linearly as the population density increases.

Submitted 14 February, 2023; originally announced March 2023.

Comments: 9 pages; 9 figures

arXiv:2212.10614 [pdf, other]

MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning

Authors: Cameron Diao, Kaixiong Zhou, Zirui Liu, Xiao Huang, Xia Hu

Abstract: Molecular representation learning is crucial for the problem of molecular property prediction, where graph neural networks (GNNs) serve as an effective solution due to their structure modeling capabilities. Since labeled data is often scarce and expensive to obtain, it is a great challenge for GNNs to generalize in the extensive molecular space. Recently, the training paradigm of "pre-train, fine-tune" has been leveraged to improve the generalization capabilities of GNNs. It uses self-supervised information to pre-train the GNN, and then performs fine-tuning to optimize the downstream task with just a few labels. However, pre-training does not always yield statistically significant improvement, especially for self-supervised learning with random structural masking. In fact, the molecular structure is characterized by motif subgraphs, which are frequently occurring and influence molecular properties. To leverage the task-related motifs, we propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT). MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt. The prompt effectively augments the molecular graph with meaningful motifs in the continuous representation space; this provides more structural patterns to aid the downstream classifier in identifying molecular properties.
Extensive experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction, with or without a few fine-tuning steps.

Submitted 22 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.03329 [pdf, other]

Enhancing Low-Density EEG-Based Brain-Computer Interfaces with Similarity-Keeping Knowledge Distillation

Authors: Xin-Yao Huang, Sung-Yu Chen, Chun-Shu Wei

Abstract: Electroencephalogram (EEG) has been one of the most common neuromonitoring modalities for real-world brain-computer interfaces (BCIs) because of its non-invasiveness, low cost, and high temporal resolution. Recently, lightweight and portable EEG wearable devices based on low-density montages have increased the convenience and usability of BCI applications. However, a loss of EEG decoding performance is often inevitable due to the reduced number of electrodes and scalp coverage of a low-density EEG montage. To address this issue, we introduce knowledge distillation (KD), a learning mechanism developed for transferring knowledge/information between neural network models, to enhance the performance of low-density EEG decoding. Our framework includes a newly proposed similarity-keeping (SK) teacher-student KD scheme that encourages a low-density EEG student model to acquire the inter-sample similarity of a teacher model pre-trained on high-density EEG data. The experimental results validate that our SK-KD framework consistently improves motor-imagery EEG decoding accuracy when the number of electrodes decreases for the input EEG data. For both common low-density headphone-like and headband-like montages, our method outperforms state-of-the-art KD methods across various EEG decoding model architectures. As the proposed SK-KD framework is the first KD scheme developed for enhancing EEG decoding, we anticipate that it will improve the practicality of low-density EEG-based BCI in real-world applications.

Submitted 6 December, 2022; originally announced December 2022.
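The abstract describes the SK scheme only at a high level. As an assumption-laden sketch, the idea of making a student "acquire the inter-sample similarity" of a teacher can be illustrated with a loss in the style of similarity-preserving knowledge distillation: build a b x b similarity matrix from each model's per-sample features in a batch, row-normalize it, and penalize the squared difference. The function names are hypothetical, plain Python lists stand in for the tensors a real training loop would use, and the paper's exact SK-KD loss may be defined differently.

```python
import math


def gram(features: list[list[float]]) -> list[list[float]]:
    """b x b matrix of dot products between the per-sample feature vectors in a batch."""
    return [[sum(a * c for a, c in zip(u, v)) for v in features] for u in features]


def row_normalize(m: list[list[float]]) -> list[list[float]]:
    """Scale each row to unit L2 norm (all-zero rows are left as zeros)."""
    out = []
    for row in m:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm if norm > 0 else 0.0 for x in row])
    return out


def similarity_keeping_loss(student: list[list[float]],
                            teacher: list[list[float]]) -> float:
    """Mean squared difference between normalized inter-sample similarity matrices.

    Student and teacher features may have different widths (e.g. a low-density
    vs a high-density montage model); only the batch size b must match, because
    both similarity matrices are b x b.
    """
    b = len(student)
    if b != len(teacher):
        raise ValueError("student and teacher batches must have the same size")
    gs = row_normalize(gram(student))
    gt = row_normalize(gram(teacher))
    return sum((gs[i][j] - gt[i][j]) ** 2 for i in range(b) for j in range(b)) / (b * b)


if __name__ == "__main__":
    student = [[1.0, 0.0], [0.0, 1.0]]               # student sees the two samples as dissimilar
    teacher = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]     # teacher sees them as identical
    print(similarity_keeping_loss(student, teacher))  # positive: similarity structures disagree
    print(similarity_keeping_loss(student, [[2 * x for x in v] for v in student]))  # 0.0
```

Because only relative similarities within a batch are compared, this kind of loss does not require the student's feature dimension to match the teacher's, which suits distilling from a high-density to a low-density electrode setup. In training it would typically be added, with a weighting factor, to the student's ordinary classification loss.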

arXiv:2210.13114 [pdf]

A study on the transmission dynamics of COVID-19 considering the impact of asymptomatic infection

Authors: ZH. Zhang, XT. Huang, KD. Cheng, CQ. Xu, SB. Guo, XJ. Wang

Abstract: The COVID-19 epidemic has been spreading around the world for nearly three years, and asymptomatic infections have exacerbated its spread. To evaluate the role of asymptomatic infections in the spread of the epidemic, we develop mathematical models to assess the proportion of asymptomatic infections caused by the main COVID-19 variants. The analysis shows that when the control reproduction number is less than 1, the disease-free equilibrium point of the model is globally asymptotically stable; and when the control reproduction number is greater than 1, the endemic equilibrium point exists, is unique, and is locally asymptotically stable. We fit the epidemic data in four time periods corresponding to the selected 614G, Alpha, Delta and Omicron variants. A comparison of the fitting results across the four periods shows that the proportion of asymptomatic persons among infected persons gradually increased. We also predict the peak time and peak value for the four time periods, and the results indicate that the transmission speed and transmission intensity of the variant strains increased to some extent. Finally, we discuss the impact of the detection ratio of symptomatic infections on the spread of the epidemic. The results show that as the detection ratio increases, the cumulative number of cases drops significantly, but the proportion of asymptomatic infections does not decline noticeably.
Therefore, in view of the hidden transmission of asymptomatic infections, coordination among various epidemic prevention and control policies is required to effectively curb the spread of the epidemic.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 22 pages, 8 figures and 2 tables

MSC Class: 37Nxx

arXiv:2206.02788 [pdf]

doi 10.1073/pnas.2118836119

Accurate Virus Identification with Interpretable Raman Signatures by Machine Learning

Authors: Jiarong Ye, Yin-Ting Yeh, Yuan Xue, Ziyang Wang, Na Zhang, He Liu, Kunyan Zhang, RyeAnne Ricker, Zhuohang Yu, Allison Roder, Nestor Perea Lopez, Lindsey Organtini, Wallace Greene, Susan Hafenstein, Huaguang Lu, Elodie Ghedin, Mauricio Terrones, Shengxi Huang, Sharon Xiaolei Huang

Abstract: Rapid identification of newly emerging or circulating viruses is an important first step toward managing the public health response to potential outbreaks. A portable virus capture device coupled with label-free Raman Spectroscopy holds the promise of fast detection by rapidly obtaining the Raman signature of a virus followed by a machine learning approach applied to recognize the virus based on its Raman spectrum, which is used as a fingerprint. We present such a machine learning approach for analyzing Raman spectra of human and avian viruses. A Convolutional Neural Network (CNN) classifier specifically designed for spectral data achieves very high accuracy for a variety of virus type or subtype identification tasks. In particular, it achieves 99% accuracy for classifying influenza virus type A vs. type B, 96% accuracy for classifying four subtypes of influenza A, 95% accuracy for differentiating enveloped and non-enveloped viruses, and 99% accuracy for differentiating avian coronavirus (infectious bronchitis virus, IBV) from other avian viruses. Furthermore, interpretation of neural net responses in the trained CNN model using a full-gradient algorithm highlights Raman spectral ranges that are most important to virus identification.
By correlating ML-selected salient Raman ranges with the signature ranges of known biomolecules and chemical functional groups (for example, amide, amino acid, carboxylic acid), we verify that our ML model effectively recognizes the Raman signatures of proteins, lipids and other vital functional groups present in different viruses and uses a weighted combination of these signatures to identify viruses.

Submitted 5 June, 2022; originally announced June 2022.

Comments: 23 pages, 8 figures

Journal ref: Proceedings of the National Academy of Sciences of the United States of America (2022)

arXiv:2202.10921 [pdf]

A Deep Learning Approach to Predicting Ventilator Parameters for Mechanically Ventilated Septic Patients

Authors: Zhijun Zeng, Zhen Hou, Ting Li, Lei Deng, Jianguo Hou, Xinran Huang, Jun Li, Meirou Sun, Yunhan Wang, Qiyu Wu, Wenhao Zheng, Hua Jiang, Qi Wang

Abstract: We develop a deep learning approach to predicting a set of ventilator parameters for a mechanically ventilated septic patient using a long short-term memory (LSTM) recurrent neural network (RNN) model. We focus on short-term predictions of a set of ventilator parameters for the septic patient in the emergency intensive care unit (EICU). The model's short-term predictions give attending physicians early warnings so that they can adjust the patient's treatment in the EICU in a timely manner. The patient-specific deep learning model can be trained on any given critically ill patient, making it an intelligent aid for physicians in emergent medical situations.

Submitted 20 February, 2022; originally announced February 2022.

arXiv:2112.08051 [pdf, ps, other]

Complementary Theory of Evolutionary Genetics

Authors: Xiaoqiu Huang

Abstract: This theory seeks to define species and to explore evolutionary forces and genetic elements in speciation and species maintenance. The theory explains how speciation and species maintenance are caused by natural selection acting on non-Mendelian and Mendelian variation, respectively. The emergence and maintenance of species as groups of populations are balanced by evolutionary forces including complementary mechanisms of gene flow within and between populations at population-specific rates: sexual and asexual reproduction, recombining and nonrecombining genome regions, vertical and horizontal DNA transfer, and transposon proliferation and control. While recombining genome regions carry conserved genes and are subjected to meiotic recombination, nonrecombining genome regions carry accessory genes and are not subjected to such structural restraint. Sexual reproduction, vertical DNA transfer, recombining genome regions and transposon control keep species in existence by maintaining recombining chromosome number and structure, while asexual reproduction, horizontal DNA transfer, nonrecombining genome regions and transposon proliferation help species emerge by promoting reproductive isolation and changes in chromosome number and structure. The theory is based on the analysis of the genome sequences of isolates in the Fusarium oxysporum complex. The rate of horizontal supernumerary chromosome transfer in this complex was estimated to be 0.1 per genome per year.

Submitted 15 December, 2021; originally announced December 2021.

Comments: A single Latex file

arXiv:2110.08048 [pdf, other]

Multi-Layer Pseudo-Supervision for Histopathology Tissue Semantic Segmentation using Patch-level Classification Labels

Authors: Chu Han, Jiatai Lin, Jinhai Mai, Yi Wang, Qingling Zhang, Bingchao Zhao, Xin Chen, Xipeng Pan, Zhenwei Shi, Xiaowei Xu, Su Yao, Lixu Yan, Huan Lin, Zeyan Xu, Xiaomei Huang, Guoqiang Han, Changhong Liang, Zaiyi Liu

Abstract: Tissue-level semantic segmentation is a vital step in computational pathology. Fully supervised models have already achieved outstanding performance with dense pixel-level annotations. However, drawing such labels on gigapixel whole slide images is extremely expensive and time-consuming. In this paper, we use only patch-level classification labels to achieve tissue semantic segmentation on histopathology images, thereby reducing the annotation effort. We propose a two-step model consisting of a classification phase and a segmentation phase. In the classification phase, we propose a CAM-based model that generates pseudo masks from patch-level labels. In the segmentation phase, we achieve tissue semantic segmentation with our proposed Multi-Layer Pseudo-Supervision. Several technical novelties are proposed to reduce the information gap between pixel-level and patch-level annotations. As part of this paper, we introduce a new weakly supervised semantic segmentation (WSSS) dataset for lung adenocarcinoma (LUAD-HistoSeg). We conducted several experiments to evaluate our proposed model on two datasets. Our proposed model outperforms two state-of-the-art WSSS approaches. Notably, we achieve quantitative and qualitative results comparable to those of the fully supervised model, with only around a 2% gap in MIoU and FwIoU. Compared with manual labeling, our model can reduce annotation time from hours to minutes. The source code is available at: https://github.com/ChuHan89/WSSS-Tissue.
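MIoU (mean intersection over union) and FwIoU (frequency-weighted IoU) are standard segmentation metrics. The sketch below computes both from a confusion matrix under the common definitions. This is an assumption: it is not necessarily the evaluation code in the linked repository, and the function name `iou_metrics` is hypothetical.

```python
import numpy as np
from typing import Tuple


def iou_metrics(conf: np.ndarray) -> Tuple[float, float]:
    """Return (MIoU, FwIoU) from a K x K confusion matrix.

    conf[i, j] counts pixels whose ground-truth class is i and whose
    predicted class is j. This follows one common convention; the
    paper's evaluation may differ in details such as ignored classes.
    """
    conf = conf.astype(float)
    tp = np.diag(conf)            # correctly labelled pixels per class
    gt = conf.sum(axis=1)         # ground-truth pixels per class
    pred = conf.sum(axis=0)       # predicted pixels per class
    union = gt + pred - tp
    iou = np.divide(tp, union, out=np.zeros_like(tp), where=union > 0)
    valid = union > 0             # skip classes absent from both label maps
    miou = float(iou[valid].mean())
    freq = gt / gt.sum()          # class frequency in the ground truth
    fwiou = float((freq * iou).sum())
    return miou, fwiou


conf = np.array([[3, 1],
                 [1, 5]])
print(iou_metrics(conf))  # MIoU = (3/5 + 5/7) / 2, FwIoU = 0.4 * 3/5 + 0.6 * 5/7
```

FwIoU weights each class's IoU by how often that class occurs. A model that does well on the dominant tissue class can therefore score higher on FwIoU than on MIoU.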

Submitted 14 October, 2021; originally announced October 2021.

Comments: 15 pages, 10 figures, journal

MSC Class: 68U10 ACM Class: I.4.6

arXiv:2107.02935 [pdf]

doi 10.5121/ijbb.2021.11201

Sramm: short read alignment mapping metrics

Authors: Alvin Chon, Xiaoqiu Huang

Abstract: Short Read Alignment Mapping Metrics (SRAMM) is an efficient and versatile command-line tool providing additional short read mapping metrics, filtering, and graphs. Short read aligners report MAPping Quality (MAPQ), but these methods are generally neither standardized nor well described in the literature or software manuals. Additionally, third-party mapping quality programs are typically computationally intensive or designed for specific applications. SRAMM efficiently generates multiple concept-based mapping scores to support informative post-alignment examination and filtering of aligned short reads for various downstream applications. SRAMM is compatible with Python 2.6+ and Python 3.6+ on all operating systems. It works with any short read aligner that generates SAM/BAM/CRAM output and reports 'AS' tags. It is freely available under the MIT license at http://github.com/achon/sramm.
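The SAM specification defines MAPQ as -10 log10 of the probability that the reported mapping position is wrong, with 255 meaning "not available". 'AS' is the optional alignment-score tag that the aligner generates. The sketch below shows how both values are read from one SAM record. It does not reproduce SRAMM's own concept-based scores, and the helper names and the MAPQ threshold are hypothetical.

```python
import math
from typing import Dict, Optional, Union


def mapq_to_error_prob(mapq: int) -> Optional[float]:
    """Convert a Phred-scaled MAPQ to the probability that the mapping is wrong."""
    if mapq == 255:          # SAM: 255 means mapping quality is not available
        return None
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p: float) -> int:
    """Inverse conversion, rounded to the nearest integer."""
    p = max(p, 1e-25)        # hypothetical floor to avoid log10(0)
    return round(-10 * math.log10(p))


def parse_record(line: str) -> Dict[str, Union[int, str]]:
    """Extract MAPQ and the optional TAG:TYPE:VALUE fields from one SAM line."""
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, Union[int, str]] = {"MAPQ": int(fields[4])}
    for opt in fields[11:]:                 # optional fields start at column 12
        tag, typ, value = opt.split(":", 2)
        record[tag] = int(value) if typ == "i" else value
    return record


line = "read1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:-3\tNM:i:1"
rec = parse_record(line)
keep = rec["MAPQ"] >= 20                    # hypothetical filtering threshold
print(rec, keep)
```

Each aligner computes MAPQ differently, so a fixed threshold like the one above does not mean the same thing across aligners. This lack of standardization is the problem the abstract raises.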

Submitted 6 July, 2021; originally announced July 2021.

Comments: 7 pages, 2 figures

Journal ref: Vol. 11, No.1/2, June 2021

arXiv:2104.09307 [pdf, other]

Monitoring urban ecosystem service value using dynamic multi-level grids

Authors: Zhenfeng Shao, Yong Li, Xiao Huang, Bowen Cai, Lin Ding, Wenkang Pan, Ya Zhang

Abstract: Ecosystem services are the direct and indirect contributions of an ecosystem to human well-being and survival. Ecosystem valuation is a method of assigning a monetary value to an ecosystem and its goods and services, often referred to as ecosystem service value (ESV). With the rapid expansion of cities, a mismatch arises between urban development and ecological development, and it is increasingly urgent to establish a valid ecological assessment method. In this study, we propose an ecological evaluation standard framework by designing an ESV monitoring workflow based on multi-level grids. The proposed method captures multi-scale features, facilitates multi-level spatial expression, and can effectively reveal the spatial heterogeneity of ESV. Taking Haian city in Jiangsu province as the study case, we implemented the proposed dynamic multi-level grids-based (DMLG) method to calculate its urban ESV in 2016 and 2019. We found that the ESV of Haian city grew considerably (an increase of 24.54 million RMB). Negative ESVs are concentrated in the central city and showed a rapid trend of outward expansion. The results illustrate that the ongoing urban expansion does not reduce the ecological value in the study area. The proposed unified grid framework can be applied to other geographical regions and is expected to benefit future studies in ecosystem service evaluation by capturing multi-level spatial heterogeneity.
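The abstract does not state its valuation formula. A common approach, assumed here rather than taken from the paper, multiplies each land-cover area by a per-hectare value coefficient and sums the results within a grid cell. Coarser grid levels are then formed by aggregating child cells. The coefficients, the land-cover classes, and the 2 x 2 parent-cell rule below are all hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (row, col) index of a grid cell

# Hypothetical value coefficients in RMB per hectare per year (not from the paper).
# A negative coefficient for built-up land is one way a cell's ESV can become negative.
VALUE_COEFF = {"cropland": 6000.0, "forest": 20000.0, "water": 40000.0, "built_up": -500.0}


def esv_by_cell(parcels: Iterable[Tuple[Cell, str, float]]) -> Dict[Cell, float]:
    """Sum area (ha) x value coefficient over all parcels falling in each cell."""
    totals: Dict[Cell, float] = defaultdict(float)
    for cell, land_type, area_ha in parcels:
        totals[cell] += area_ha * VALUE_COEFF[land_type]
    return dict(totals)


def aggregate_to_parent(esv: Dict[Cell, float]) -> Dict[Cell, float]:
    """Merge each 2 x 2 block of cells into one cell at the next coarser level."""
    parent: Dict[Cell, float] = defaultdict(float)
    for (row, col), value in esv.items():
        parent[(row // 2, col // 2)] += value
    return dict(parent)


parcels = [((0, 0), "forest", 2.0), ((0, 1), "built_up", 10.0), ((1, 1), "water", 0.5)]
fine = esv_by_cell(parcels)
coarse = aggregate_to_parent(fine)
print(fine, coarse)
```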

Submitted 15 April, 2021; originally announced April 2021.

arXiv:2002.03173 [pdf]

doi 10.1021/acs.jproteome.0c00129

Protein structure and sequence re-analysis of 2019-nCoV genome does not indicate snakes as its intermediate host or the unique similarity between its spike protein insertions and HIV-1

Authors: Chengxin Zhang, Wei Zheng, Xiaoqiang Huang, Eric W. Bell, Xiaogen Zhou, Yang Zhang

Abstract: As the 2019-nCoV coronavirus infection quickly develops into a global pneumonia epidemic, careful analysis of its transmission and cellular mechanisms is sorely needed. In this report, we re-analyzed the computational approaches and findings presented in two recent manuscripts by Ji et al. (https://doi.org/10.1002/jmv.25682) and by Pradhan et al. (https://doi.org/10.1101/2020.01.30.927871), which concluded that snakes are the intermediate hosts of 2019-nCoV and that the 2019-nCoV spike protein insertions share a unique similarity with HIV-1. Results from our re-implementation of the analyses, built on larger-scale datasets using state-of-the-art bioinformatics methods and databases, do not support the conclusions proposed by these manuscripts. Based on our analyses and existing data on coronaviruses, we conclude that the intermediate hosts of 2019-nCoV are more likely to be mammals and birds than snakes, and that the "novel insertions" observed in the spike protein evolved naturally from bat coronaviruses.

Submitted 8 February, 2020; originally announced February 2020.

Comments: Structure models for 2019-nCoV proteins are available at https://zhanglab.ccmb.med.umich.edu/C-I-TASSER/2019-nCov/

Journal ref: J. Proteome Res. 2020, 19, 4, 1351-1360

arXiv:1905.10705 [pdf, other]

Modeling treatment events in disease progression

Authors: Guanyang Wang, Yumeng Zhang, Yong Deng, Xuxin Huang, Łukasz Kidziński

Abstract: The ability to quantify and predict the progression of a disease is fundamental for selecting an appropriate treatment. Many clinical metrics cannot be acquired frequently, either because of their cost (e.g. MRI, gait analysis) or because they are inconvenient or harmful to a patient (e.g. biopsy, x-ray). In such scenarios, to estimate individual trajectories of disease progression, it is advantageous to leverage similarities between patients, i.e. the covariance of trajectories, and to find a latent representation of progression. Most existing methods for estimating trajectories do not account for events between observations, which dramatically reduces their suitability for clinical practice. In this study, we develop a machine learning framework named Coordinatewise-Soft-Impute (CSI) for analyzing disease progression from sparse observations in the presence of confounding events. CSI is guaranteed to converge to the global minimum of the corresponding optimization problem. Experimental results on both simulated and real datasets demonstrate the effectiveness of CSI.
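CSI builds on the Soft-Impute family of matrix-completion methods. The sketch below is plain Soft-Impute, which alternates between filling in missing entries and soft-thresholding the singular values. It is included only for illustration and does not include CSI's coordinatewise treatment of confounding events. Rows can be read as patients and columns as visit times, with a hypothetical boolean `mask` marking the observed entries.

```python
import numpy as np


def soft_impute(X, mask, lam, n_iter=500, tol=1e-6):
    """Plain Soft-Impute: fill missing entries with a low-rank estimate.

    X    : 2-D array; values where mask is False are ignored (may be NaN).
    mask : boolean array, True where X is observed.
    lam  : amount subtracted from every singular value (larger -> lower rank).
    """
    X = np.asarray(X, dtype=float)
    Z = np.zeros_like(X)
    for _ in range(n_iter):
        filled = np.where(mask, X, Z)                  # observed data + current guesses
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt    # singular-value soft-thresholding
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z


rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=6), rng.normal(size=5))  # rank-1 "trajectories"
mask = rng.random(truth.shape) > 0.3                      # roughly 70% observed
estimate = soft_impute(np.where(mask, truth, np.nan), mask, lam=0.1)
```

The parameter `lam` trades fit against rank. Soft-Impute's convergence result concerns this penalized nuclear-norm objective, and the abstract's global-minimum guarantee refers to CSI's own objective, which is not shown here.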

Submitted 25 May, 2019; originally announced May 2019.

arXiv:1812.06574 [pdf, other]

doi 10.1016/j.neunet.2019.09.007

A Biologically Plausible Supervised Learning Method for Spiking Neural Networks Using the Symmetric STDP Rule

Authors: Yunzhe Hao, Xuhui Huang, Meng Dong, Bo Xu

Abstract: Spiking neural networks (SNNs) have the potential to be energy-efficient owing to event-based computation. However, supervised training of SNNs remains a challenge because spike activities are non-differentiable. Previous SNN training methods can generally be categorized into two basic classes: backpropagation-like training methods and plasticity-based learning methods. The former depend on energy-inefficient real-valued computation and non-local transmission, as also required in artificial neural networks (ANNs), whereas the latter are either considered biologically implausible or exhibit poor performance. Hence, biologically plausible (bio-plausible) high-performance supervised learning (SL) methods for SNNs are still lacking. In this paper, we propose a novel bio-plausible SNN model for SL based on the symmetric spike-timing dependent plasticity (sym-STDP) rule found in neuroscience. By combining the sym-STDP rule with bio-plausible synaptic scaling and intrinsic plasticity of the dynamic threshold, our SNN model implemented SL well and achieved good performance on a benchmark recognition task (the MNIST dataset). To reveal the underlying mechanism of our SL model, we visualized both layer-based activities and synaptic weights after training using t-distributed stochastic neighbor embedding (t-SNE) and found that they were well clustered, demonstrating excellent classification ability. Furthermore, to verify the robustness of our model, we trained it on another, more realistic dataset (Fashion-MNIST), on which it also performed well. As the learning rules are bio-plausible and based purely on local spike events, our model could easily be applied to neuromorphic hardware for online training and may help in understanding SL information processing at the synaptic level in biological neural systems.
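A symmetric STDP window strengthens a synapse whenever pre- and postsynaptic spikes fall close together in time, regardless of which spike comes first. The sketch below assumes a simple exponential window and a multiplicative synaptic-scaling step that keeps each neuron's summed input weight fixed. The amplitude, time constant, weight cap, and function names are hypothetical, not the paper's parameters. The dynamic-threshold intrinsic plasticity is omitted.

```python
import math
from typing import List, Optional


def sym_stdp_dw(t_pre: float, t_post: float, a_plus: float = 0.01, tau_ms: float = 20.0) -> float:
    """Weight change for one pre/post spike pair; depends only on |t_post - t_pre|."""
    return a_plus * math.exp(-abs(t_post - t_pre) / tau_ms)


def apply_sym_stdp(weights: List[float], last_pre: List[Optional[float]], t_post: float,
                   a_plus: float = 0.01, tau_ms: float = 20.0, w_max: float = 1.0) -> List[float]:
    """Update all synapses onto one neuron after it fires at t_post.

    last_pre[i] is the last spike time of presynaptic neuron i (None if it never fired).
    """
    updated = []
    for w, t_pre in zip(weights, last_pre):
        if t_pre is not None:
            w += sym_stdp_dw(t_pre, t_post, a_plus, tau_ms)
        updated.append(min(w, w_max))   # keep weights bounded
    return updated


def synaptic_scaling(weights: List[float], target_sum: float) -> List[float]:
    """Multiplicatively rescale incoming weights so that they sum to target_sum."""
    total = sum(weights)
    return [w * target_sum / total for w in weights] if total > 0 else list(weights)


w = apply_sym_stdp([0.2, 0.5, 0.3], [95.0, None, 110.0], t_post=100.0)
w = synaptic_scaling(w, target_sum=1.0)
```

The rule only ever increases weights. The scaling step is what keeps them from growing without bound, which is one reason the abstract pairs the rule with synaptic scaling.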

Submitted 6 October, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

Comments: 29 pages, 6 figures

Journal ref: Neural Networks 121C (2020) pp. 387-395

arXiv:1810.08887 [pdf]

Modeling Oral Multispecies Biofilm Recovery After Antibacterial Treatment

Authors: Xiaobo Jing, Xiangya Huang, Markus Haapasalo, Ya Shen, Qi Wang

Abstract: Recovery of multispecies oral biofilms following treatment with chlorhexidine gluconate (CHX), iodine-potassium iodide (IPI) and sodium hypochlorite (NaOCl) is investigated both experimentally and theoretically. Experimentally, biofilms taken from two donors were each exposed to the three antibacterial solutions (irrigants) for 10 minutes. We observe that (a) live bacterial cell ratios decline for a week after the exposure and the trend reverses beyond a week; after fifteen weeks, live bacterial cell ratios in biofilms fully return to their pretreatment levels; (b) NaOCl is the strongest antibacterial agent for the oral biofilms; (c) multispecies oral biofilms from different donors showed no difference in their susceptibility to any of the antibacterial solutions. Guided by the experiment, we develop a mathematical model for biofilm dynamics that accounts for multiple bacterial phenotypes, quorum sensing, and growth factor proteins to describe the nonlinear time evolution of the biofilms. The model captures the time evolution of biofilms before and after antibacterial treatment very well. It reveals the crucial role played by quorum sensing molecules and growth factors in biofilm recovery and verifies that the source of the biofilms has a minimal effect on their recovery. The model is also applied to describe the state of biofilms of various ages, taken from different donors and treated with CHX, IPI and NaOCl. The model's predictions agree well with the experimental data, confirming its applicability to modeling biofilm dynamics in general.

Submitted 20 October, 2018; originally announced October 2018.

arXiv:1802.01756 [pdf]

Highly accurate model for prediction of lung nodule malignancy with CT scans

Authors: Jason Causey, Junyu Zhang, Shiqian Ma, Bo Jiang, Jake Qualls, David G. Politte, Fred Prior, Shuzhong Zhang, Xiuzhen Huang

Abstract: Computed tomography (CT) examinations are commonly used to predict lung nodule malignancy in patients and have been shown to improve noninvasive early diagnosis of lung cancer. It remains challenging for computational approaches to achieve performance comparable to that of experienced radiologists. Here we present NoduleX, a systematic approach to predicting lung nodule malignancy from CT data based on deep learning convolutional neural networks (CNN). For training and validation, we analyze >1000 lung nodules in images from the LIDC/IDRI cohort. All nodules were identified and classified by four experienced thoracic radiologists who participated in the LIDC project. NoduleX achieves high accuracy for nodule malignancy classification, with an AUC of ~0.99. This is commensurate with the analysis of the dataset by experienced radiologists. Our approach, NoduleX, provides an effective framework for highly accurate nodule malignancy prediction with a model trained on a large patient population. Our results are replicable with software available at http://bioinformatics.astate.edu/NoduleX.

Submitted 5 February, 2018; originally announced February 2018.

arXiv:1801.03039 [pdf, ps, other]

doi 10.1093/bioinformatics/bty401

EBIC: an evolutionary-based parallel biclustering algorithm for pattern discovery

Authors: Patryk Orzechowski, Moshe Sipper, Xiuzhen Huang, Jason H. Moore

Abstract: In this paper, a novel biclustering algorithm based on artificial intelligence (AI) is introduced. The method, called EBIC, aims to detect biologically meaningful, order-preserving patterns in complex data. The proposed algorithm is probably the first capable of discovering multiple complex patterns in real gene expression datasets with accuracy exceeding 50%. It is also one of the very few biclustering methods designed for parallel environments with multiple graphics processing units (GPUs). We demonstrate that EBIC outperforms state-of-the-art biclustering methods, in terms of recovery and relevance, on both synthetic and genetic datasets. EBIC also yields results over 12 times faster than the most accurate reference algorithms. The proposed algorithm is anticipated to be added to the repertoire of unsupervised machine learning algorithms for the analysis of datasets, including those from large-scale genomic studies.
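In an order-preserving bicluster, every selected row ranks the selected columns in the same order. The check below illustrates only this pattern definition, which is inferred from the abstract's wording. EBIC's evolutionary search, scoring, and GPU parallelism are not shown, and `is_order_preserving` is a hypothetical helper.

```python
from typing import Sequence


def is_order_preserving(matrix: Sequence[Sequence[float]],
                        rows: Sequence[int], cols: Sequence[int]) -> bool:
    """True if every selected row sorts the selected columns into the same order.

    Ties are broken by the column order given in `cols` (sorted() is stable).
    """
    reference = None
    for r in rows:
        order = tuple(sorted(cols, key=lambda c: matrix[r][c]))
        if reference is None:
            reference = order
        elif order != reference:
            return False
    return True


data = [[1.0, 5.0, 3.0],
        [2.0, 9.0, 4.0],
        [7.0, 1.0, 8.0]]
print(is_order_preserving(data, rows=[0, 1], cols=[0, 1, 2]))     # True: both rows rank col 0 < col 2 < col 1
print(is_order_preserving(data, rows=[0, 1, 2], cols=[0, 1, 2]))  # False: row 2 ranks col 1 lowest
```

The values in rows 0 and 1 differ in scale, but the two rows share the same trend. This is why order-preserving patterns can capture co-regulated genes whose absolute expression levels differ.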

Submitted 26 July, 2018; v1 submitted 9 January, 2018; originally announced January 2018.

Comments: 9 pages, 7 figures

MSC Class: 68; 92 ACM Class: I.5.2; I.2.11; I.5.3; J.3

arXiv:1702.00493 [pdf]

Information-theoretic interpretation of tuning curves for multiple motion directions

Authors: Wentao Huang, Xin Huang, Kechen Zhang

Abstract: We have developed an efficient information-maximization method for computing the optimal shapes of tuning curves of sensory neurons by optimizing the parameters of the underlying feedforward network model. When applied to the problem of population coding of visual motion with multiple directions, our method yields several types of tuning curves with both symmetric and asymmetric shapes that resemble those found in the visual cortex. Our result suggests that the diversity or heterogeneity of tuning curve shapes observed in neurophysiological experiments might actually constitute an optimal population representation of visual motion with multiple components.

Submitted 1 February, 2017; originally announced February 2017.

Comments: The 51st Annual Conference on Information Sciences and Systems (CISS), 2017

arXiv:1507.07422 [pdf]

doi 10.5121/ijma.2015.7203

Analysis of Pain Hemodynamic Response Using Near-Infrared Spectroscopy (NIRS)

Authors: Raul Fernandez Rojas, Xu Huang, Keng Liang Ou, Dat Tran, Sheikh Md. Rabiul Islam

Abstract: Despite recent advances in brain research, understanding the various signals for pain and pain intensities in the brain cortex is still a complex task due to temporal and spatial variations of brain hemodynamics. In this paper we investigate pain based on cerebral hemodynamics via near-infrared spectroscopy (NIRS). This study presents a pain stimulation experiment that uses three acupuncture manipulation techniques to safely induce pain in healthy subjects. The acupuncture pain response is presented, and hemodynamic pain signal analysis shows the presence of dominant channels and their relationship with surrounding channels, which contributes to further pain research.

Submitted 23 July, 2015; originally announced July 2015.

Comments: 11 pages, 11 figures

Journal ref: The International Journal of Multimedia & Its Applications (IJMA) Vol. 7, No. 2, April 2015

arXiv:1501.06058 [pdf, ps, other]

Flow Distances on Open Flow Networks

Authors: Liangzhu Guo, Xiaodan Lou, Peiteng Shi, Jun Wang, Xiaohan Huang, Jiang Zhang

Abstract: An open flow network is a weighted directed graph with a source and a sink, depicting flux distributions on networks in the steady state of an open flow system. Energetic food webs, economic input-output networks, and international trade networks are open flow network models of energy flows between species, money or value flows between industrial sectors, and goods flows between countries, respectively. Flow distances (first-passage or total) between any two given nodes $i$ and $j$ are defined as the average number of transition steps a random walker takes along the network from $i$ to $j$ under some conditions. They differ from the conventional random walk distance on a closed directed graph because they account for the openness of the flow network. In this paper, flow distances are expressed explicitly in terms of the underlying Markov matrix of a flow system. With this novel theoretical concept, we can visualize open flow networks, calculate the centrality of each node, and cluster nodes into groups. We apply flow distances to two kinds of empirical open flow networks: energetic food webs and an economic input-output network. In the energetic food web example, we visualize the trophic level of each species and compare flow distances with other distance metrics on graphs. In the input-output network, we rank sectors according to their average distances from other sectors and cluster sectors into different groups. Some other potential applications and mathematical properties are also discussed. To summarize, flow distance is a useful and powerful tool for studying open flow systems.
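The sketch below shows one standard way to compute a first-passage step count on an open flow network. Transition probabilities are fluxes normalized by each node's outflow, and walks are counted only if they reach $j$ before exiting through the sink. The expected step count is taken under that condition, using the absorbing Markov chain fundamental matrix. This is an assumption made for illustration, and the paper's exact flow-distance formula may differ. The function name and example fluxes are hypothetical.

```python
import numpy as np


def conditional_first_passage(flows, i, j, sink):
    """Mean number of steps from node i to first reach node j, counting only
    walks that reach j before leaving the system through `sink`.

    flows[a][b] is the flux from node a to node b. Transition probabilities are
    each row's fluxes divided by that node's total outflow. Every non-sink node
    is assumed to drain eventually into j or the sink, so I - Q is invertible.
    """
    if i == j:
        return 0.0
    F = np.asarray(flows, dtype=float)
    outflow = F.sum(axis=1, keepdims=True)
    P = np.divide(F, outflow, out=np.zeros_like(F), where=outflow > 0)
    transient = [k for k in range(F.shape[0]) if k not in (j, sink)]
    Q = P[np.ix_(transient, transient)]    # moves that avoid both j and the sink
    r = P[transient, j]                    # one-step probability of hitting j
    I = np.eye(len(transient))
    h = np.linalg.solve(I - Q, r)          # h[k] = P(reach j before sink | start at k)
    g = np.linalg.solve(I - Q, h)          # g[k] = E[steps * 1{reach j}]
    k = transient.index(i)
    return float(g[k] / h[k]) if h[k] > 0 else float("inf")


# Nodes: 0 = source, 1 and 2 = intermediate, 3 = sink (hypothetical fluxes).
flows = [[0, 10, 0, 0],
         [0, 0, 6, 4],
         [0, 0, 0, 6],
         [0, 0, 0, 0]]
print(conditional_first_passage(flows, 0, 2, sink=3))  # 2.0: the only route is 0 -> 1 -> 2
```

Dividing by $h_i$ makes the result an average over walkers that actually arrive at $j$. Without this conditioning, walkers that leave through the sink would make the expected hitting time infinite in an open network.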

Submitted 24 January, 2015; originally announced January 2015.

arXiv:1501.03258 [pdf]

Yeast caspase 1 suppresses the burst of reactive oxygen species and maintains mitochondrial stability in Saccharomyces cerevisiae

Authors: Lin Du, Xiaodan Huang, Jian Tan, Yongjun Lu, Shining Zhou

Abstract: Caspases are a family of cysteine proteases that play essential roles during apoptosis, and we presume that some of them may also protect the cell from oxidative stress. We found that the absence of yeast caspase 1 (Yca1) in Saccharomyces cerevisiae leads to a more intense burst of mitochondrial reactive oxygen species (ROS). In addition, compared to wild-type yeast cells, the ability of yca1 mutant cells to maintain mitochondrial activity is significantly reduced after either oxidative stress treatment or aging. During the mitochondrial ROS burst, deletion of the yca1 gene delayed structural damage of a green fluorescent protein (GFP) reporter bound in the inner mitochondrial membrane. This work implies that yeast caspase 1 is closely connected to the oxidative stress response. We speculate that Yca1 can discriminate proteins damaged by oxidation and accelerate their hydrolysis to attenuate the ROS burst.

Submitted 14 January, 2015; originally announced January 2015.

arXiv:1306.6010 [pdf, ps, other]

Association between Experiences and Representations: Memory, Dreaming, Dementia and Consciousness

Authors: Xiaoqiu Huang

Abstract: The mechanisms underlying major aspects of the human brain remain a mystery. It is unknown how verbal episodic memory is formed and integrated with sensory episodic memory. There is no consensus on the function and nature of dreaming. Here we present a theory governing neural activity in the human brain. The theory describes the mechanisms for building memory traces for entities and explains how verbal memory is integrated with sensory memory. We infer that a core function of dreaming is to move charged particles such as calcium ions from the hippocampus to association areas to primary areas. We link a high concentration of calcium ions to Alzheimer's disease. We present a more precise definition of consciousness. Our results are a step forward in understanding the function and health of the human brain and provide the public with ways to keep a healthy brain.

Submitted 14 June, 2013; originally announced June 2013.

Comments: 23 pages, 1 figure

arXiv:1301.0974 [pdf, ps, other]

doi 10.1063/1.4802007

Hierarchical Nystrom Methods for Constructing Markov State Models for Conformational Dynamics

Authors: Yuan Yao, Raymond Z. Cui, Gregory R. Bowman, Daniel Silva, Jian Sun, Xuhui Huang

Abstract: Markov state models (MSMs) have become a popular approach for investigating the conformational dynamics of proteins and other biomolecules. MSMs are typically built from numerous molecular dynamics simulations by dividing the sampled configurations into a large number of microstates based on geometric criteria. The resulting microstate model can then be coarse-grained into a more understandable macrostate model by lumping rapidly mixing microstates together into larger, metastable aggregates. However, finite sampling often results in the creation of many poorly sampled microstates. During coarse-graining, these states are mistakenly identified as kinetically important because transitions to and from them appear to be slow. In this paper we propose a formalism based on an algebraic principle for matrix approximation, namely the Nystrom method, to deal with such poorly sampled microstates. Our scheme builds a hierarchy of microstates from high to low populations and progressively applies spectral clustering on sets of microstates within each level of the hierarchy. This helps spectral clustering identify metastable aggregates with highly populated microstates rather than being distracted by sparsely populated states. We demonstrate the ability of this algorithm to discover the major metastable states on two model systems, the alanine dipeptide and TrpZip2.
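The Nystrom method approximates a large symmetric positive semidefinite matrix from a subset of its columns. For a landmark index set S, K is approximated by C W^+ C^T, where C = K[:, S], W = K[S, S], and W^+ is the pseudo-inverse of W. The sketch below shows only this basic approximation. The paper's hierarchical ordering by microstate population and its spectral clustering step are not reproduced, and the landmark choice is an assumption.

```python
import numpy as np


def nystrom(K: np.ndarray, landmarks) -> np.ndarray:
    """Nystrom approximation C @ pinv(W) @ C.T of a PSD matrix K."""
    idx = np.asarray(landmarks)
    C = K[:, idx]                      # sampled columns
    W = K[np.ix_(idx, idx)]            # landmark-by-landmark block
    return C @ np.linalg.pinv(W) @ C.T


A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.0, 3.0]])
K = A @ A.T                            # a rank-2, 5 x 5 PSD matrix
K_hat = nystrom(K, landmarks=[0, 1])
print(np.allclose(K, K_hat))           # True: two suitable landmarks recover a rank-2 matrix exactly
```

How good the approximation is depends on which landmarks are chosen. This motivates the paper's idea of building the hierarchy from well-populated (well-sampled) microstates first.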

Submitted 5 January, 2013; originally announced January 2013.

arXiv:1012.3274 [pdf]

Preliminary Functional-Structural Modeling on Poplar (Salicaceae)

Authors: Dongxiang Liu, Meng Zhen Kang, Véronique Letort, Meijun Xing, Yang Gang, Xinyuan Huang, Weiqun Cao

Abstract: Poplar is one of the best fast-growing tree species in the world and is widely used for windbreaks and wood products. Although the architecture of poplar has a direct impact on its applications, it has not been described in previous poplar models, probably because of the difficulties raised by measurement, data processing and parameterization. In this paper, the functional-structural model GreenLab is calibrated using data from poplar trees aged 3, 4, 5 and 6 years. The data were acquired through simplified measurements. The architecture was also simplified by classifying the branches into several types (physiological ages) using clustering analysis, which decreases the number of parameters. By multi-fitting the sampled data of each tree, the model parameters were identified and the plant architectures at different tree ages were simulated.

Submitted 15 December, 2010; originally announced December 2010.

Journal ref: Plant growth modeling, Simulation, Visualization and Applications. - PMA09, Beijing, China (2009)

arXiv:0908.0563 [pdf, ps, other]

doi 10.1088/1367-2630/11/10/103001

Emergence of target waves in paced populations of cyclically competing species

Authors: Luo-Luo Jiang, Tao Zhou, Matjaz Perc, Xin Huang, Bing-Hong Wang

Abstract: We investigate the emergence of target waves in a cyclic predator-prey model incorporating a periodic current of the three competing species in a small area situated at the center of the square lattice. The periodic current acts as a pacemaker, trying to impose its rhythm on the overall spatiotemporal evolution of the three species. We show that the pacemaker is able to nucleate target waves that eventually spread across the whole population, whereby three routes leading to this phenomenon can be distinguished depending on the mobility of the three species and the oscillation period of the localized current. First, target waves can emerge due to the synchronization between the periodic current and oscillations of the density of the three species on the spatial grid. The second route is similar to the first, the difference being that the synchronization sets in only intermittently. Finally, the third route towards target waves is realized when the frequency of the pacemaker is much higher than that characterizing the oscillations of the overall density of the three species. By considering mobility and the frequency of the current as variable parameters, we thus provide insights into the mechanisms of pattern formation resulting from the interplay between local and global dynamics in systems governed by cyclically competing species.
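A paced cyclic-competition lattice of the kind described above can be sketched in Python under stated assumptions: sites are empty (0) or hold species 1, 2, 3 with cyclic dominance 1 > 2 > 3 > 1; each elementary update either swaps two neighbours (mobility, with probability p_swap) or lets them interact by predation or by reproduction into an empty site; and a hypothetical pacemaker fills a small central square with species 1, 2, 3 in turn over each period. These rules, parameter values and function names are illustrative assumptions, not the model specification from the paper.

```python
import numpy as np

EMPTY = 0                                   # sites hold 0 (empty) or species 1, 2, 3
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def beats(a, b):
    """Cyclic dominance (assumed): 1 beats 2, 2 beats 3, 3 beats 1."""
    return a != EMPTY and b != EMPTY and b == a % 3 + 1


def monte_carlo_step(grid, rng, p_swap):
    """One Monte Carlo step: L*L random neighbour-pair updates, periodic boundaries."""
    L = grid.shape[0]
    for _ in range(L * L):
        x, y = rng.integers(L, size=2)
        dx, dy = NEIGHBORS[rng.integers(4)]
        nx, ny = (x + dx) % L, (y + dy) % L
        a, b = grid[x, y], grid[nx, ny]
        if rng.random() < p_swap:           # mobility: exchange the two sites
            grid[x, y], grid[nx, ny] = b, a
        elif beats(a, b):                   # predation: the prey's site is emptied
            grid[nx, ny] = EMPTY
        elif a != EMPTY and b == EMPTY:     # reproduction into an empty neighbour
            grid[nx, ny] = a


def apply_pacemaker(grid, t, period, radius):
    """Hypothetical pacemaker: set a central square to species 1, 2, 3 in turn,
    each for one third of the period (t counted in Monte Carlo steps)."""
    c = grid.shape[0] // 2
    species = 1 + (3 * (t % period)) // period
    grid[c - radius:c + radius + 1, c - radius:c + radius + 1] = species


def simulate(L=30, steps=40, period=30, radius=1, p_swap=0.3, seed=0):
    """Run the paced lattice from a random initial mix of empty sites and species."""
    rng = np.random.default_rng(seed)
    grid = rng.integers(0, 4, size=(L, L))
    for t in range(steps):
        apply_pacemaker(grid, t, period, radius)
        monte_carlo_step(grid, rng, p_swap)
    return grid


if __name__ == "__main__":
    final = simulate()
    print([float((final == s).mean()) for s in (1, 2, 3)])  # final species densities
```

Varying p_swap and period in such a sketch corresponds to the two control parameters the abstract names, mobility and the frequency of the current.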

Submitted 5 August, 2009; originally announced August 2009.

Comments: 14 pages, 7 figures; accepted for publication in New Journal of Physics [supplementary material available at http://www.matjazperc.com/njp/target.html]

Journal ref: New J. Phys. 11 (2009) 103001

arXiv:0812.3426 [pdf, ps, other]

doi 10.1063/1.3103496

Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways

Authors: Yuan Yao, Jian Sun, Xuhui Huang, Gregory R. Bowman, Gurjeet Singh, Michael Lesnick

Abstract: Characterization of transient intermediate or transition states is crucial for the description of biomolecular folding pathways, yet it is difficult in both experiments and computer simulations. Such transient states typically have low populations in simulation samples. Even for simple systems such as RNA hairpins, there has recently been mounting debate over the existence of multiple intermediate states. In this paper, we develop a computational approach to explore the relatively low-population transition or intermediate states in biomolecular folding pathways, based on a topological data analysis tool, Mapper, applied to simulation data from large-scale distributed computing. The method is inspired by classical Morse theory in mathematics, which characterizes the topology of high-dimensional shapes via functional level sets. Here we exploit a conditional density filter that enables us to focus on the structures on pathways, followed by clustering analysis on its level sets, which helps separate low-population intermediates from highly populated, uninteresting structures. A successful application of this method is given on a motivating example, an RNA hairpin with a GCAA tetraloop, where we provide structural evidence from computer simulations for multiple intermediate states and reveal different pictures of the unfolding and refolding pathways.
The method is effective in dealing with a high degree of heterogeneity in the distribution, captures structural features in multiple pathways, and is less sensitive to the distance metric than nonlinear dimensionality reduction or geometric embedding methods. It provides a systematic tool to explore low-density intermediate states in complex biomolecular folding systems.
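The Mapper construction named in the abstract can be sketched in Python: compute a filter value for every sample, cover the filter range with overlapping intervals, cluster the samples inside each interval (here by single linkage with a distance threshold eps), and connect clusters that share samples. A plain Gaussian kernel density estimate stands in for the paper's conditional density filter; the function names and default parameters are assumptions for illustration.

```python
import numpy as np
from itertools import combinations


def density_filter(X, bandwidth):
    """Gaussian kernel density estimate at each sample (a stand-in filter)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1)


def epsilon_components(X, idx, eps):
    """Single-linkage clusters: connected components of the eps-neighbourhood graph."""
    unvisited = set(idx)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, comp = [seed], {seed}
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if np.linalg.norm(X[i] - X[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                comp.add(j)
                stack.append(j)
        clusters.append(comp)
    return clusters


def mapper(X, f, n_intervals=5, overlap=0.3, eps=0.5):
    """Return Mapper nodes (sets of sample indices) and edges (pairs of node ids)."""
    lo, hi = f.min(), f.max()
    width = (hi - lo) / n_intervals or 1.0   # guard against a constant filter
    nodes = []
    for k in range(n_intervals):
        a = lo + (k - overlap) * width        # interval widened on both sides
        b = lo + (k + 1 + overlap) * width
        in_level = np.where((f >= a) & (f <= b))[0]
        nodes.extend(epsilon_components(X, in_level, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]          # clusters sharing samples are linked
    return nodes, edges
```

For example, with X an (n, d) array of conformation features, `nodes, edges = mapper(X, density_filter(X, 0.5))` returns the Mapper graph; by construction, nodes built from the lowest filter intervals contain the least dense samples.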

Submitted 17 December, 2008; originally announced December 2008.

Comments: 23 pages, 6 figures

arXiv:0807.4390 [pdf, ps, other]

How target waves emerge in population dynamics

Authors: Luo-Luo Jiang, Tao Zhou, Xin Huang, Bing-Hong Wang

Abstract: Based on a multi-agent model, we investigate how target waves emerge in population dynamics with cyclical interactions among three species. We show that a periodically injecting source in a small central area can generate target waves in a two-dimensional lattice system. By detecting the temporal period of the species' concentrations in the central area, three modes of target waves can be distinguished. These modes result from the competition between local and global oscillations induced by cyclical interactions: Mode A corresponds to a synchronization of local and global oscillations, Mode B results from intermittent synchronization, and Mode C corresponds to the case in which the frequency of the local oscillation is much higher than that of the global oscillation. This work provides insights into pattern formation in biological and ecological systems, which are fundamentally different from the extensively studied diffusion systems driven by chemical reactions.
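Detecting the temporal period of the central concentration, as described above, can be sketched with a discrete Fourier transform: subtract the mean, take the power spectrum, and report the period of the strongest non-zero frequency. This is an assumed, generic estimator in Python/NumPy, not necessarily the procedure used in the paper; the function name and the `dt` parameter (the sampling interval, for example in Monte Carlo steps) are hypothetical.

```python
import numpy as np


def dominant_period(series, dt=1.0):
    """Estimate the dominant oscillation period of a sampled time series from
    the largest non-zero-frequency peak of its power spectrum."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                          # remove the zero-frequency (mean) part
    power = np.abs(np.fft.rfft(x)) ** 2       # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=dt)     # bin k has frequency k / (N * dt)
    k = 1 + np.argmax(power[1:])              # skip the DC bin
    return 1.0 / freqs[k]


if __name__ == "__main__":
    t = np.arange(200)
    central_density = 0.33 + 0.1 * np.sin(2 * np.pi * t / 20)  # synthetic signal
    print(dominant_period(central_density))
```

The estimate can then be compared with the period of the injecting source and with that of the global density; note that for a series of N samples the resolvable periods are of the form N*dt/k.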

Submitted 28 July, 2008; originally announced July 2008.

Comments: 10 pages, 5 figures

Showing 1–36 of 36 results for author: Huang, X