
Showing 1–8 of 8 results for author: Yun, L

Searching in archive cs.
  1. arXiv:2404.02852 [pdf, other]

    cs.LG

    Toward Inference-optimal Mixture-of-Expert Large Language Models

    Authors: Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang

    Abstract: Mixture-of-Expert (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling model size without suffering from the quadratic growth of training cost of dense transformers. Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation on the model size and number of token…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

  2. arXiv:2310.01271 [pdf, other]

    cs.CL cs.IR

    LEEC: A Legal Element Extraction Dataset with an Extensive Domain-Specific Label System

    Authors: Xue Zongyue, Liu Huanghai, Hu Yiran, Kong Kangle, Wang Chenlu, Liu Yun, Shen Weixing

    Abstract: As a pivotal task in natural language processing, element extraction has gained significance in the legal domain. Extracting legal elements from judicial documents helps enhance the interpretative and analytical capacities of legal cases, thereby facilitating a wide array of downstream applications in various domains of law. Yet existing element extraction datasets are limited by their restricted…

    Submitted 10 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  3. arXiv:2309.05127 [pdf, other]

    cs.IR

    Learning Personalized User Preference from Cold Start in Multi-turn Conversations

    Authors: Deguang Kong, Abhay Jha, Lei Yun

    Abstract: This paper presents a novel teachable conversation interaction system that is capable of learning users' preferences from cold start by gradually adapting to personal preferences. In particular, the TAI system is able to automatically identify and label user preference in live interactions, manage dialogue flows for interactive teaching sessions, and reuse learned preference for preference elicitat…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: preference, personalization, cold-start, dialogue, LLM, embedding

  4. arXiv:2304.14365 [pdf, other]

    cs.CV

    Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving

    Authors: Xiaoyu Tian, Tao Jiang, Longfei Yun, Yucheng Mao, Huitong Yang, Yue Wang, Yilun Wang, Hang Zhao

    Abstract: Robotic perception requires the modeling of both 3D geometry and semantics. Existing methods typically focus on estimating 3D bounding boxes, neglecting finer geometric details and struggling to handle general, out-of-vocabulary objects. 3D occupancy prediction, which estimates the detailed occupancy states and semantics of a scene, is an emerging task to overcome these limitations. To support 3D…

    Submitted 13 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Accepted to NeurIPS 2023

  5. arXiv:2301.12055 [pdf, other]

    cs.LG

    TIDo: Source-free Task Incremental Learning in Non-stationary Environments

    Authors: Abhinit Kumar Ambastha, Leong Tze Yun

    Abstract: This work presents an incremental learning approach for autonomous agents to learn new tasks in a non-stationary environment. Updating a DNN model-based agent to learn new target tasks requires us to store past training data and needs a large labeled target task dataset. Few-shot task incremental learning methods overcome the limitation of labeled target datasets by adapting trained models to lear…

    Submitted 27 January, 2023; originally announced January 2023.

  6. arXiv:2301.12054 [pdf, other]

    cs.LG

    Adversarial Learning Networks: Source-free Unsupervised Domain Incremental Learning

    Authors: Abhinit Kumar Ambastha, Leong Tze Yun

    Abstract: This work presents an approach for incrementally updating deep neural network (DNN) models in a non-stationary environment. DNN models are sensitive to changes in input data distribution, which limits their application to problem settings with stationary input datasets. In a non-stationary environment, updating a DNN model requires parameter re-training or model fine-tuning. We propose an unsuperv…

    Submitted 27 January, 2023; originally announced January 2023.

  7. arXiv:2210.10421 [pdf]

    cs.CV cs.LG

    Multi-view Gait Recognition based on Siamese Vision Transformer

    Authors: Yanchen Yang, Lijun Yun, Ruoyu Li, Feiyan Cheng

    Abstract: While the Vision Transformer has been used in gait recognition, its application in multi-view gait recognition is still limited. Different views significantly affect the extraction and identification accuracy of the characteristics of gait contour. To address this, this paper proposes a Siamese Mobile Vision Transformer (SMViT). This model not only focuses on the local characteristics of the human…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: 13 pages, 9 figures, 1 table

  8. arXiv:1904.07695 [pdf, other]

    cs.IR cs.CL

    Short Text Topic Modeling Techniques, Applications, and Performance: A Survey

    Authors: Qiang Jipeng, Qian Zhenyu, Li Yun, Yuan Yunhao, Wu Xindong

    Abstract: Inferring discriminative and coherent latent topics from short texts is a critical and fundamental task, since many real-world applications require semantic understanding of short texts. Traditional long text topic modeling algorithms (e.g., PLSA and LDA) based on word co-occurrences cannot solve this problem very well, since only very limited word co-occurrence information is available in sh…

    Submitted 13 April, 2019; originally announced April 2019.

    Comments: arXiv admin note: text overlap with arXiv:1808.02215 by other authors