
Showing 1–8 of 8 results for author: Xu, R X

Searching in archive cs.
  1. arXiv:2401.06066 [pdf, other]

    cs.CL

    DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

    Authors: Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, Wenfeng Liang

    Abstract: In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-$K$ out of $N$ experts, face challenges in ensuring expert specialization, i.e. each expert acquires non-overlapping and focused knowledge. In response, we propose the…

    Submitted 11 January, 2024; originally announced January 2024.

  2. arXiv:2401.02954 [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  3. arXiv:2312.08935 [pdf, other]

    cs.AI cs.CL cs.LG

    Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

    Authors: Peiyi Wang, Lei Li, Zhihong Shao, R. X. Xu, Damai Dai, Yifei Li, Deli Chen, Y. Wu, Zhifang Sui

    Abstract: In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of…

    Submitted 19 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Add Step-by-Step reinforcement learning results

  4. arXiv:2307.14588 [pdf]

    eess.IV cs.CV cs.LG

    MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation

    Authors: Liang Xu, Mingxiao Chen, Yi Cheng, Pengfei Shao, Shuwei Shen, Peng Yao, Ronald X. Xu

    Abstract: The UNet architecture, based on Convolutional Neural Networks (CNN), has demonstrated its remarkable performance in medical image analysis. However, it faces challenges in capturing long-range dependencies due to the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome t…

    Submitted 26 July, 2023; originally announced July 2023.

  5. arXiv:2205.12117 [pdf, other]

    cs.LG

    Phased Progressive Learning with Coupling-Regulation-Imbalance Loss for Imbalanced Data Classification

    Authors: Liang Xu, Yi Cheng, Fan Zhang, Bingxuan Wu, Pengfei Shao, Peng Liu, Shuwei Shen, Peng Yao, Ronald X. Xu

    Abstract: Deep convolutional neural networks often perform poorly when faced with datasets that suffer from quantity imbalances and classification difficulties. Despite advances in the field, existing two-stage approaches still exhibit dataset bias or domain shift. To counter this, a phased progressive learning schedule has been proposed that gradually shifts the emphasis from representation learning to tra…

    Submitted 15 March, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

  6. arXiv:2107.13200 [pdf]

    eess.IV cs.CV cs.LG

    An explainable two-dimensional single model deep learning approach for Alzheimer's disease diagnosis and brain atrophy localization

    Authors: Fan Zhang, Bo Pan, Pengfei Shao, Peng Liu, Shuwei Shen, Peng Yao, Ronald X. Xu

    Abstract: Early and accurate diagnosis of Alzheimer's disease (AD) and its prodromal period mild cognitive impairment (MCI) is essential for the delayed disease progression and the improved quality of patients' life. The emerging computer-aided diagnostic methods that combine deep learning with structural magnetic resonance imaging (sMRI) have achieved encouraging results, but some of them are limit of issue…

    Submitted 28 July, 2021; originally announced July 2021.

  7. Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion Classification

    Authors: Peng Yao, Shuwei Shen, Mengjuan Xu, Peng Liu, Fan Zhang, Jinyu Xing, Pengfei Shao, Benjamin Kaffenberger, Ronald X. Xu

    Abstract: Deep convolutional neural network (DCNN) models have been widely explored for skin disease diagnosis and some of them have achieved the diagnostic outcomes comparable or even superior to those of dermatologists. However, broad implementation of DCNN in skin disease detection is hindered by small size and data imbalance of the publicly accessible skin lesion datasets. This paper proposes a novel…

    Submitted 11 February, 2022; v1 submitted 1 February, 2021; originally announced February 2021.

    Journal ref: IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021

  8. arXiv:2101.02353 [pdf]

    cs.CV cs.AI

    Low-cost and high-performance data augmentation for deep-learning-based skin lesion classification

    Authors: Shuwei Shen, Mengjuan Xu, Fan Zhang, Pengfei Shao, Honghong Liu, Liang Xu, Chi Zhang, Peng Liu, Zhihong Zhang, Peng Yao, Ronald X. Xu

    Abstract: Although deep convolutional neural networks (DCNNs) have achieved significant accuracy in skin lesion classification comparable or even superior to those of dermatologists, practical implementation of these models for skin cancer screening in low resource settings is hindered by their limitations in computational cost and training dataset. To overcome these limitations, we propose a low-cost and h…

    Submitted 6 January, 2021; originally announced January 2021.

    Comments: 8 pages, 5 figures