-
SynAsk: Unleashing the Power of Large Language Models in Organic Synthesis
Authors:
Chonghuan Zhang,
Qianghua Lin,
Biwei Zhu,
Haopeng Yang,
Xiao Lian,
Hao Deng,
Jiajun Zheng,
Kuangbiao Liao
Abstract:
The field of natural language processing (NLP) has witnessed a transformative shift with the emergence of large language models (LLMs), revolutionizing various language tasks and applications. The integration of LLMs into specialized domains enhances their capabilities for domain-specific applications. Notably, NLP has made significant strides in organic chemistry, particularly in predicting synthetic tasks, paving the way for the development of LLMs tailored to the organic chemistry field. In this work, we introduce SynAsk, a comprehensive organic chemistry domain-specific LLM platform developed by AIChemEco Inc. By fine-tuning an LLM with domain-specific data and integrating it with a chain-of-thought approach, SynAsk seamlessly accesses our knowledge base and advanced chemistry tools in a question-and-answer format. This includes functionalities such as a basic chemistry knowledge base, molecular information retrieval, reaction performance prediction, retrosynthesis prediction, chemical literature acquisition, and more. This novel methodology synergizes fine-tuning techniques with external resource integration, resulting in an organic chemistry-specific model poised to facilitate research and discovery in the field. Accessible via http://synask.aichemeco.com, SynAsk represents a significant advancement in leveraging NLP for synthetic applications.
Submitted 13 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Federated attention consistent learning models for prostate cancer diagnosis and Gleason grading
Authors:
Fei Kong,
Xiyue Wang,
Jinxi Xiang,
Sen Yang,
Xinran Wang,
Meng Yue,
Jun Zhang,
Junhan Zhao,
Xiao Han,
Yuhan Dong,
Biyue Zhu,
Fang Wang,
Yueping Liu
Abstract:
Artificial intelligence (AI) holds significant promise in transforming medical imaging, enhancing diagnostics, and refining treatment strategies. However, the reliance on extensive multicenter datasets for training AI models poses challenges due to privacy concerns. Federated learning provides a solution by facilitating collaborative model training across multiple centers without sharing raw data. This study introduces a federated attention-consistent learning (FACL) framework to address challenges associated with large-scale pathological images and data heterogeneity. FACL enhances model generalization by maximizing attention consistency between local clients and the server model. To ensure privacy and validate robustness, we incorporated differential privacy by introducing noise during parameter transfer. We assessed the effectiveness of FACL in cancer diagnosis and Gleason grading tasks using 19,461 whole-slide images of prostate cancer from multiple centers. In the diagnosis task, FACL achieved an area under the curve (AUC) of 0.9718, outperforming seven centers with an average AUC of 0.9499 when categories are relatively balanced. For the Gleason grading task, FACL attained a Kappa score of 0.8463, surpassing the average Kappa score of 0.7379 from six centers. In conclusion, FACL offers a robust, accurate, and cost-effective AI training model for prostate cancer pathology while maintaining effective data safeguards.
Submitted 28 March, 2024; v1 submitted 12 February, 2023;
originally announced February 2023.
-
One-Way Matching of Datasets with Low Rank Signals
Authors:
Shuxiao Chen,
Sizun Jiang,
Zongming Ma,
Garry P. Nolan,
Bokai Zhu
Abstract:
We study one-way matching of a pair of datasets with low rank signals. Under a stylized model, we first derive information-theoretic limits of matching under a mismatch proportion loss. We then show that linear assignment with projected data achieves fast rates of convergence and sometimes even minimax rate optimality for this task. The theoretical error bounds are corroborated by simulated examples. Furthermore, we illustrate practical use of the matching procedure on two single-cell data examples.
Submitted 3 October, 2022; v1 submitted 28 April, 2022;
originally announced April 2022.
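The abstract above mentions linear assignment with projected data. A minimal sketch of that general idea follows, under assumptions that are not taken from the paper: the simulated model, the dimensions, the noise level, the choice of projecting onto the top right singular vectors of the stacked data, and the helper names `simulate` and `match_projected` are all hypothetical. It uses NumPy and SciPy's `linear_sum_assignment`, and it is an illustration rather than the authors' procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def simulate(n=50, p=20, rank=3, sigma=0.01, seed=0):
    """Hypothetical toy model: two noisy views of one low-rank signal,
    with the rows of the second view shuffled."""
    rng = np.random.default_rng(seed)
    signal = 3.0 * rng.standard_normal((n, rank)) @ rng.standard_normal((rank, p))
    perm = rng.permutation(n)
    X = signal + sigma * rng.standard_normal((n, p))
    Y = signal[perm] + sigma * rng.standard_normal((n, p))
    truth = np.argsort(perm)  # truth[i] is the row of Y that matches row i of X
    return X, Y, truth


def match_projected(X, Y, rank):
    """Match each row of X to a distinct row of Y (assumes len(X) <= len(Y)).

    Both datasets are projected onto the top `rank` right singular vectors
    of the stacked data; the assignment minimizes the total squared distance
    between matched rows in the projected space.
    """
    _, _, vt = np.linalg.svd(np.vstack([X, Y]), full_matrices=False)
    basis = vt[:rank].T                      # shape (p, rank)
    Xp, Yp = X @ basis, Y @ basis
    cost = ((Xp[:, None, :] - Yp[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    match = np.empty(len(X), dtype=int)
    match[rows] = cols
    return match


if __name__ == "__main__":
    X, Y, truth = simulate()
    match = match_projected(X, Y, rank=3)
    # Mismatch proportion: fraction of rows of X assigned to the wrong row of Y.
    print("mismatch proportion:", np.mean(match != truth))
```

The mismatch proportion printed at the end is the fraction of rows matched incorrectly, which is the kind of loss the abstract refers to. Raising `sigma` in the toy model is a simple way to see how matching quality degrades with noise.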
-
An End-to-End AI-Based Framework for Automated Discovery of CEST/MT MR Fingerprinting Acquisition Protocols and Quantitative Deep Reconstruction (AutoCEST)
Authors:
Or Perlman,
Bo Zhu,
Moritz Zaiss,
Matthew S. Rosen,
Christian T. Farrar
Abstract:
Purpose: To develop an automated machine-learning-based method for the discovery of rapid and quantitative chemical exchange saturation transfer (CEST) MR fingerprinting acquisition and reconstruction protocols.
Methods: An MR physics governed AI system was trained to generate optimized acquisition schedules and the corresponding quantitative reconstruction neural-network. The system (termed AutoCEST) is composed of a CEST saturation block, a spin dynamics module, and a deep reconstruction network, all differentiable and jointly connected. The method was validated using a variety of chemical exchange phantoms and an in-vivo mouse brain at 9.4T.
Results: The acquisition times for AutoCEST optimized schedules ranged from 35-71 s, with a quantitative image reconstruction time of only 29 ms. The resulting exchangeable proton concentration maps for the phantoms were in good agreement with the known solute concentrations for AutoCEST sequences (mean absolute error = 2.42 mM; Pearson's r = 0.992, p < 0.0001), but not for an unoptimized sequence (mean absolute error = 65.19 mM; Pearson's r = -0.161, p = 0.522). Similarly, improved exchange rate agreement was observed between AutoCEST and quantification of exchange using saturation power (QUESP) methods (mean absolute error = 35.8 Hz; Pearson's r = 0.971, p < 0.0001) compared to an unoptimized schedule and QUESP (mean absolute error = 58.2 Hz; Pearson's r = 0.959, p < 0.0001). The AutoCEST in-vivo mouse brain semi-solid proton volume-fractions were lower in the cortex (12.21 ± 1.37%) compared to the white-matter (19.73 ± 3.30%), as expected, and the amide proton volume-fraction and exchange rates agreed with previous reports.
Conclusion: AutoCEST can automatically generate optimized CEST/MT acquisition protocols that can be rapidly reconstructed into quantitative exchange parameter maps.
Submitted 9 July, 2021;
originally announced July 2021.
-
Stretching single RNAs: exact numerical and stochastic simulation methods
Authors:
Fei Liu,
Bi-hui Zhu,
Zhong-can Ou-Yang
Abstract:
Exact numerical methods and stochastic simulation methods are developed to study the force stretching of single RNA molecules at the secondary-structure level in equilibrium. By computing the force-extension curves in the constant-force and constant-extension ensembles, we find that the two independent methods agree well with each other. To show the precision of our methods in predicting unfolding experiments, the unfolding forces of different RNA molecules under different experimental conditions are calculated. We find that the ionic corrections to the RNA free energies alone might not account for the apparent differences between the theoretical calculations and the experimental data; an ionic correction to the persistence length of single-stranded RNA is likely necessary as well.
Submitted 2 November, 2004;
originally announced November 2004.