Fragment-Masked Molecular Optimization

Authors: Kun Li, Xiantao Cai, Jia Wu, Bo Du, Wenbin Hu

Abstract: Molecular optimization is a crucial aspect of drug discovery, aimed at refining molecular structures to enhance drug efficacy and minimize side effects, ultimately accelerating the overall drug development process. Many target-based molecular optimization methods have been proposed, significantly advancing drug discovery. These methods primarily rely on understanding the specific drug target structures or their hypothesized roles in combating diseases. However, challenges such as the limited number of available targets and the difficulty of capturing clear structures hinder innovative drug development. In contrast, phenotypic drug discovery (PDD) does not depend on clear target structures and can identify hits with novel and unbiased polypharmacology signatures. As a result, PDD-based molecular optimization can reduce potential safety risks while optimizing phenotypic activity, thereby increasing the likelihood of clinical success. Therefore, we propose a fragment-masked molecular optimization method based on PDD (FMOP). FMOP employs a regression-free diffusion model to conditionally optimize the molecular masked regions without training, effectively generating new molecules with similar scaffolds. On the large-scale drug response dataset GDSCv2, we optimize the potential molecules across all 945 cell lines. The overall experiments demonstrate that the in-silico optimization success rate reaches 94.4%, with an average efficacy increase of 5.3%. Additionally, we conduct extensive ablation and visualization experiments, confirming that FMOP is an effective and robust molecular optimization method. The code is available at: https://anonymous.4open.science/r/FMOP-98C2.

Submitted 17 August, 2024; originally announced August 2024.

Comments: 11 pages, 5 figures, 2 tables

arXiv:2401.10334 [pdf, other]

DrugAssist: A Large Language Model for Molecule Optimization

Authors: Geyan Ye, Xibao Cai, Houtim Lai, Xing Wang, Junhong Huang, Longyue Wang, Wei Liu, Xiangxiang Zeng

Abstract: Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has attracted an increasing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, is currently an area that has seen little involvement from LLMs. Most existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback. These non-interactive approaches overlook the fact that the drug discovery process requires the integration of expert experience and iterative refinement. To address this gap, we propose DrugAssist, an interactive molecule optimization model which performs optimization through human-machine dialogue by leveraging the strong interactivity and generalizability of LLMs. DrugAssist has achieved leading results in both single and multiple property optimization, simultaneously showcasing immense potential in transferability and iterative optimization. In addition, we publicly release a large instruction-based dataset called MolOpt-Instructions for fine-tuning language models on molecule optimization tasks. We have made our code and data publicly available at https://github.com/blazerye/DrugAssist, which we hope will pave the way for future research on the application of LLMs to drug discovery.

Submitted 28 December, 2023; originally announced January 2024.

Comments: Geyan Ye and Xibao Cai are equal contributors; Longyue Wang is corresponding author

arXiv:2310.12996 [pdf, other]

Zero-shot Learning of Drug Response Prediction for Preclinical Drug Screening

Authors: Kun Li, Yong Luo, Xiantao Cai, Wenbin Hu, Bo Du

Abstract: Conventional deep learning methods typically employ supervised learning for drug response prediction (DRP). This entails dependence on labeled response data from drugs for model training. However, practical applications in the preclinical drug screening phase demand that DRP models predict responses for novel compounds, often with unknown drug responses. This presents a challenge, rendering supervised deep learning methods unsuitable for such scenarios. In this paper, we propose a zero-shot learning solution for the DRP task in preclinical drug screening. Specifically, we propose a Multi-branch Multi-Source Domain Adaptation Test Enhancement Plug-in, called MSDA. MSDA can be seamlessly integrated with conventional DRP methods, learning invariant features from the prior response data of similar drugs to enhance real-time predictions of unlabeled compounds. We conducted experiments using the GDSCv2 and CellMiner datasets. The results demonstrate that MSDA efficiently predicts drug responses for novel compounds, leading to a general performance improvement of 5-10% in the preclinical drug screening phase. The significance of this solution resides in its potential to accelerate the drug discovery process, improve drug candidate assessment, and facilitate the success of drug discovery.

Submitted 5 October, 2023; originally announced October 2023.

Comments: 16 pages, 3 figures, 3 tables

arXiv:2308.05864 [pdf, other]

doi 10.1038/s41592-024-02233-6

The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

Authors: Jun Ma, Ronald Xie, Shamini Ayyadhury, Cheng Ge, Anubha Gupta, Ritu Gupta, Song Gu, Yao Zhang, Gihun Lee, Joonkee Kim, Wei Lou, Haofeng Li, Eric Upschulte, Timo Dickscheid, José Guilherme de Almeida, Yixin Wang, Lin Han, Xin Yang, Marco Labagnara, Vojislav Gligorovski, Maxime Scheder, Sahand Jamal Rahi, Carly Kempster, Alice Pollitt, Leon Espinosa, et al. (15 additional authors not shown)

Abstract: Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.

Submitted 1 April, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: NeurIPS22 Cell Segmentation Challenge: https://neurips22-cellseg.grand-challenge.org/ . Nature Methods (2024)

arXiv:2201.09647 [pdf, other]

AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor

Authors: Feng Ren, Xiao Ding, Min Zheng, Mikhail Korzinkin, Xin Cai, Wei Zhu, Alexey Mantsyzov, Alex Aliper, Vladimir Aladinskiy, Zhongying Cao, Shanshan Kong, Xi Long, Bonnie Hei Man Liu, Yingtao Liu, Vladimir Naumov, Anastasia Shneyderman, Ivan V. Ozerov, Ju Wang, Frank W. Pun, Alan Aspuru-Guzik, Michael Levitt, Alex Zhavoronkov

Abstract: The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered a remarkable breakthrough both in artificial intelligence (AI) application and structural biology. Despite their varying confidence levels, these predicted structures could still significantly contribute to structure-based drug design of novel targets, especially the ones with no or limited structural information. In this work, we successfully applied AlphaFold in our end-to-end AI-powered drug discovery engines, comprising the biocomputational platform PandaOmics and the generative chemistry platform Chemistry42, to identify a first-in-class hit molecule for a novel target without an experimental structure, from target selection through hit identification, in a cost- and time-efficient manner. PandaOmics provided the targets of interest and Chemistry42 generated the molecules based on the AlphaFold predicted structure, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small molecule hit compound for CDK20 with a Kd value of 8.9 +/- 1.6 μM (n = 4) within 30 days from target selection and after synthesizing only 7 compounds. Based on the available data, a second round of AI-powered compound generation was conducted, through which a more potent hit molecule, ISM042-2-048, was discovered with a Kd value of 210.0 +/- 42.4 nM (n = 2), within 30 days of the discovery of the first hit ISM042-2-001 and after synthesizing 6 compounds. To the best of our knowledge, this is the first reported small molecule targeting CDK20 and, more importantly, this work is the first demonstration of AlphaFold application in the hit identification process in early drug discovery.

Submitted 12 February, 2022; v1 submitted 21 January, 2022; originally announced January 2022.

Comments: 9 pages, 6 figures

arXiv:2012.12968 [pdf, ps, other]

doi 10.1063/5.0041354

A single-shot measurement of time-dependent diffusion over sub-millisecond timescales using static field gradient NMR

Authors: Teddy X. Cai, Nathan H. Williamson, Velencia J. Witherspoon, Rea Ravin, Peter J. Basser

Abstract: Time-dependent diffusion behavior is probed over sub-millisecond timescales in a single shot using an NMR static gradient, time-incremented echo train acquisition (SG-TIETA) framework. The method extends the Carr-Purcell-Meiboom-Gill (CPMG) cycle under a static field gradient by discretely incrementing the $π$-pulse spacings to simultaneously avoid off-resonance effects and probe a range of timescales ($50 - 500$ microseconds). Pulse spacings are optimized based on a derived ruleset. The remaining effects of pulse inaccuracy are examined and found to be consistent across pure liquids of different diffusivities: water, decane, and octanol-1. A pulse accuracy correction is developed. Instantaneous diffusivity, $D_{\mathrm{inst}}(t)$, curves (i.e., half of the time derivative of the mean-squared displacement in the gradient direction), are recovered from pulse accuracy-corrected SG-TIETA decays using a model-free, log-linear least squares inversion method validated by Monte Carlo simulations. A signal-averaged, 1-minute experiment is described. A flat $D_{\mathrm{inst}}(t)$ is measured on pure dodecamethylcyclohexasiloxane whereas decreasing $D_{\mathrm{inst}}(t)$ are measured on yeast suspensions, consistent with the expected short-time $D_{\mathrm{inst}}(t)$ behavior for confining microstructural barriers on the order of microns.

Submitted 3 March, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

Comments: 7 pages, 6 figures + Supplementary Material

Journal ref: Journal of Chemical Physics, Vol. 154, Iss. 11, 2021, Pages 111105
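
The verbal definition of instantaneous diffusivity in the abstract above can be written out as a formula. This is only a restatement of that definition; the symbols $x$ (displacement along the gradient direction) and $D_0$ (a constant free diffusivity) are notation assumed here, not taken from the listing.

```latex
% Instantaneous diffusivity: half the time derivative of the
% mean-squared displacement along the gradient direction.
D_{\mathrm{inst}}(t) = \frac{1}{2}\,\frac{\mathrm{d}}{\mathrm{d}t}\,\langle x^{2}(t) \rangle

% Free (unrestricted) diffusion: \langle x^{2}(t) \rangle = 2 D_{0} t,
% so D_inst is constant in time.
D_{\mathrm{inst}}(t) = \frac{1}{2}\,\frac{\mathrm{d}}{\mathrm{d}t}\,\bigl(2 D_{0} t\bigr) = D_{0}
```

The constant free-diffusion case matches the flat $D_{\mathrm{inst}}(t)$ the abstract reports for a pure liquid, while confining barriers make the mean-squared displacement grow more slowly than linearly, so $D_{\mathrm{inst}}(t)$ decreases.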

arXiv:2002.03034 [pdf]

Population pharmacokinetics and dosing regimen optimization of tacrolimus in Chinese lung transplant recipients

Authors: Xiaojun Cai, Huizhu Song, Zheng Jiao, Hang Yang, Min Zhu, Chengyu Wang, Dong Wei, Lingzhi Shi, Bo Wu, Jinyu Chen

Abstract: We aimed to develop a population pharmacokinetic model of tacrolimus in Chinese lung transplant recipients, and propose model-based dosing regimens for individualized treatment. We obtained 807 tacrolimus whole blood concentrations from 52 lung transplant patients and genotyped CYP3A5*3. Population pharmacokinetic analysis was performed using nonlinear mixed effects modeling. Monte Carlo simulations were employed to design initial dosing regimens. Tacrolimus pharmacokinetics was described by a one-compartment model with first-order absorption and elimination. The mean estimated apparent clearance was 13.1 l/h with 20.1% inter-subject variability in CYP3A5*3/*3 70 kg patients with 30% hematocrit and voriconazole-free therapy, which is lower than that in Caucasians (17.5 to 36.5 l/h). Hematocrit, postoperative days, tacrolimus daily dose, voriconazole cotherapy, and CYP3A5*3 genotype were identified as significant covariates for tacrolimus clearance. To achieve the target trough concentration (10 to 15 ng/ml) on the 8th day after transplantation, a higher initial dosage than the current regimen of 0.04 mg/kg q12h should be recommended for CYP3A5*1/*3 patients on voriconazole-free therapy. Given the nonlinear kinetics of tacrolimus and its large variability, the population pharmacokinetic model should be combined with therapeutic drug monitoring to optimize individualized therapy.

Submitted 31 January, 2020; originally announced February 2020.
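
As an illustration of the model class named in the abstract above (one compartment, first-order absorption and elimination), the following is a minimal single-dose sketch using the standard closed-form solution. Only the apparent clearance (13.1 l/h) and the 0.04 mg/kg dose for a 70 kg patient come from the abstract; the apparent volume `V_F` and absorption rate constant `KA` are hypothetical placeholders, and the sketch ignores the covariates, repeated q12h dosing, and nonlinear kinetics that the paper discusses.

```python
import math


def concentration(t_h, dose_mg, cl_f, v_f, ka):
    """Concentration (mg/L) at t_h hours after one oral dose.

    One-compartment model with first-order absorption (ka, 1/h) and
    first-order elimination (ke = CL/F divided by V/F, 1/h).
    """
    ke = cl_f / v_f
    if math.isclose(ka, ke):
        raise ValueError("closed form requires ka != ke")
    coeff = dose_mg * ka / (v_f * (ka - ke))
    return coeff * (math.exp(-ke * t_h) - math.exp(-ka * t_h))


CL_F = 13.1           # apparent clearance in L/h, quoted in the abstract
V_F = 600.0           # apparent volume in L: HYPOTHETICAL, not from the abstract
KA = 1.0              # absorption rate constant in 1/h: HYPOTHETICAL
DOSE_MG = 0.04 * 70   # 0.04 mg/kg for a 70 kg patient, from the abstract

if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0, 4.0, 8.0, 12.0):
        c_ng_ml = 1000.0 * concentration(t, DOSE_MG, CL_F, V_F, KA)  # mg/L to ng/mL
        print(f"t = {t:4.1f} h  C = {c_ng_ml:.2f} ng/mL")
```

The printed values depend entirely on the placeholder `V_F` and `KA`, so they should not be read as predictions for any patient.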

arXiv:1912.04151 [pdf, other]

Identification of causal intervention effects under contagion

Authors: Xiaoxuan Cai, Wen Wei Loh, Forrest W. Crawford

Abstract: Defining and identifying causal intervention effects for transmissible infectious disease outcomes is challenging because a treatment -- such as a vaccine -- given to one individual may affect the infection outcomes of others. Epidemiologists have proposed causal estimands to quantify effects of interventions under contagion using a two-person partnership model. These simple conceptual models have helped researchers develop causal estimands relevant to clinical evaluation of vaccine effects. However, many of these partnership models are formulated under structural assumptions that preclude realistic infectious disease transmission dynamics, limiting their conceptual usefulness in defining and identifying causal treatment effects in empirical intervention trials. In this paper, we propose causal intervention effects in two-person partnerships under arbitrary infectious disease transmission dynamics, and give nonparametric identification results showing how effects can be estimated in empirical trials using time-to-infection or binary outcome data. The key insight is that contagion is a causal phenomenon that induces conditional independencies on infection outcomes that can be exploited for the identification of clinically meaningful causal estimands. These new estimands are compared to existing quantities, and results are illustrated using a realistic simulation of an HIV vaccine trial.

Submitted 10 December, 2019; v1 submitted 9 December, 2019; originally announced December 2019.

arXiv:1911.12909 [pdf]

doi 10.1016/j.ejps.2020.105237

Systematic external evaluation of published population pharmacokinetic models for tacrolimus in adult liver transplant recipients

Authors: Xiaojun Cai, Ruidong Li, Changcheng Sheng, Yifeng Tao, Quanbao Zhang, Xiaofei Zhang, Juan Li, Conghuan Shen, Xiaoyan Qiu, Zhengxin Wang, Zheng Jiao

Abstract: Background: Diverse tacrolimus population pharmacokinetic models in adult liver transplant recipients have been established to describe the PK characteristics of tacrolimus in the last two decades. However, their extrapolated predictive performance remains unclear. Therefore, in this study, we aimed to evaluate their external predictability and identify their potential influencing factors. Methods: The external predictability of each selected popPK model was evaluated using an independent dataset of 84 patients with 572 trough concentrations prospectively collected from Huashan Hospital. Prediction- and simulation-based diagnostics and Bayesian forecasting were conducted to evaluate model predictability. Furthermore, the effect of model structure on the predictive performance was investigated. Results: Sixteen published popPK models were assessed. In prediction-based diagnostics, the prediction error within 30% was below 50% in all the published models. The simulation-based normalised prediction distribution error test and visual predictive check indicated large discrepancies between the observations and simulations in most of the models. Bayesian forecasting showed improvement in model predictability with two to three prior observations. Additionally, the predictive performance of the nonlinear Michaelis-Menten model was superior to that of linear compartment models, indicating the underlying nonlinear kinetics of tacrolimus in liver transplant recipients. Conclusions: The published models performed inadequately in prediction- and simulation-based diagnostics. Bayesian forecasting may improve the predictive performance of the models. Furthermore, nonlinear kinetics of tacrolimus may be mainly caused by the properties of the drug itself, and incorporating nonlinear kinetics may be considered to improve model predictability.

Submitted 28 November, 2019; originally announced November 2019.

Report number: EJPS-D-19-01454

Journal ref: Eur. J. Pharm. Sci. 145 (2020) 105237
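
The "prediction error within 30%" figure in the abstract above is a prediction-based diagnostic. The sketch below shows one common way such a metric is computed, assuming a relative prediction error of (predicted - observed) / observed expressed in percent; the paper's exact definition is not given in the listing, and the function names and sample concentrations are hypothetical.

```python
def prediction_errors(observed, predicted):
    """Relative prediction error in percent for each observation."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]


def fraction_within(errors, threshold_pct=30.0):
    """Fraction of prediction errors whose magnitude is within the threshold."""
    if not errors:
        raise ValueError("no prediction errors supplied")
    return sum(abs(e) <= threshold_pct for e in errors) / len(errors)


if __name__ == "__main__":
    # HYPOTHETICAL trough concentrations (ng/mL), for illustration only.
    observed = [10.0, 10.0, 10.0, 10.0]
    predicted = [12.0, 14.0, 7.5, 10.0]
    errors = prediction_errors(observed, predicted)
    print(f"fraction within 30%: {fraction_within(errors):.2f}")
```

Under this definition, the abstract's result means that for every published model fewer than half of the predicted concentrations fell within 30% of the observed values.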

arXiv:1908.04752 [pdf, other]

Identification of relevant diffusion MRI metrics impacting cognitive functions using a novel feature selection method

Authors: Tongda Xu, Xiyan Cai, Yao Wang, Xiuyuan Wang, Sohae Chung, Els Fieremans, Joseph Rath, Steven Flanagan, Yvonne W Lui

Abstract: Mild Traumatic Brain Injury (mTBI) is a significant public health problem. The most troubling symptoms after mTBI are cognitive complaints. Studies show measurable differences between patients with mTBI and healthy controls with respect to tissue microstructure using diffusion MRI. However, it remains unclear which diffusion measures are the most informative with regard to cognitive functions in both the healthy state and after injury. In this study, we use diffusion MRI to formulate a predictive model for performance on working memory based on the most relevant MRI features. The key challenge is to identify relevant features over a large feature space with high accuracy in an efficient manner. To tackle this challenge, we propose a novel improvement of the best-first search approach with crossover operators inspired by genetic algorithms. Compared against other heuristic feature selection algorithms, the proposed method achieves significantly more accurate predictions and yields clinically interpretable selected features.

Submitted 11 November, 2019; v1 submitted 10 August, 2019; originally announced August 2019.

arXiv:physics/0003084 [pdf, ps, other]

doi 10.1103/PhysRevE.61.7243

Spatial-temporal correlations in the process to self-organized criticality

Authors: C. B. Yang, X. Cai, Z. M. Zhou

Abstract: A new type of spatial-temporal correlation in the process of approaching self-organized criticality is investigated for two simple models of biological evolution. The changing behaviors of the position with the minimum barrier are shown to be quantitatively different in the two models. Different results of the correlation are given for the two models. We argue that the correlation can be used, together with the power-law distributions, as criteria for self-organized criticality.

Submitted 28 March, 2000; originally announced March 2000.

Comments: 3 pages in RevTeX, 3 eps figures

Showing 1–11 of 11 results for author: Cai, X