Showing 1–9 of 9 results for author: Tam, W L

  1. arXiv:2406.12793

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, et al. (34 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 29 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2402.15810

    cs.DL cs.CL cs.LG

    OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining

    Authors: Fanjin Zhang, Shijie Shi, Yifan Zhu, Bo Chen, Yukuo Cen, Jifan Yu, Yelin Chen, Lulu Wang, Qingfei Zhao, Yuqing Cheng, Tianyi Han, Yuwei An, Dan Zhang, Weng Lam Tam, Kun Cao, Yunhe Pang, Xinyu Guan, Huihui Yuan, Jian Song, Xiaoyan Li, Yuxiao Dong, Jie Tang

    Abstract: With the rapid proliferation of scientific literature, versatile academic knowledge services increasingly rely on comprehensive academic graph mining. Despite the availability of public academic graphs, benchmarks, and datasets, these resources often fall short in multi-aspect and fine-grained annotations, are constrained to specific task types and domains, or lack underlying real academic graphs.…

    Submitted 20 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: KDD'24, 9 pages, 5 appendix pages

    Journal ref: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25–29, 2024, Barcelona, Spain

  3. arXiv:2311.18743

    cs.CL cs.AI cs.LG

    AlignBench: Benchmarking Chinese Alignment of Large Language Models

    Authors: Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Xiaotao Gu, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang

    Abstract: Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants. However, the effective evaluation of alignment for emerging Chinese LLMs is still largely unexplored. To fill in this gap, we introduce AlignBench, a comprehensive multi-dimensional benchmark for evaluating LLMs' alignment in Chinese. We design a human-in-the-loop data curation pip…

    Submitted 25 August, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted to ACL 2024

  4. arXiv:2306.06629

    cs.CL cs.AI

    GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

    Authors: Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shu Zhao, Peng Zhang, Jie Tang

    Abstract: Currently, the reduction in the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their widespread deployment on various devices. However, the deployment of knowledge distillation systems faces great challenges in real-world industrial-strength applications, which require the use of complex distillation methods on even larger-s…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted to the ACL 2023 Industry Track

  5. arXiv:2306.06625

    cs.CL cs.AI

    Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

    Authors: Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shu Zhao, Peng Zhang, Jie Tang

    Abstract: The large scale of pre-trained language models poses a challenge for their deployment on various devices, with a growing emphasis on methods to compress these models, particularly knowledge distillation. However, current knowledge distillation methods rely on the model's intermediate layer features and the golden labels (also called hard labels), which usually require aligned model architecture an…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted to Findings of ACL 2023

  6. arXiv:2305.08316

    q-bio.MN cs.AI cs.CE cs.LG

    SemiGNN-PPI: Self-Ensembling Multi-Graph Neural Network for Efficient and Generalizable Protein-Protein Interaction Prediction

    Authors: Ziyuan Zhao, Peisheng Qian, Xulei Yang, Zeng Zeng, Cuntai Guan, Wai Leong Tam, Xiaoli Li

    Abstract: Protein-protein interactions (PPIs) are crucial in various biological processes and their study has significant implications for drug development and disease diagnosis. Existing deep learning methods suffer from significant performance degradation under complex real-world scenarios due to various factors, e.g., label scarcity and domain shift. In this paper, we propose a self-ensembling multigraph…

    Submitted 14 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCAI 2023

  7. arXiv:2210.02414

    cs.CL cs.AI cs.LG

    GLM-130B: An Open Bilingual Pre-trained Model

    Authors: Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang

    Abstract: We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous unexpected technical and engineering challenges, particularly on loss spikes and…

    Submitted 25 October, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted to ICLR 2023

  8. arXiv:2207.07087

    cs.CL cs.IR cs.LG

    Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers

    Authors: Weng Lam Tam, Xiao Liu, Kaixuan Ji, Lilong Xue, Xingjian Zhang, Yuxiao Dong, Jiahua Liu, Maodi Hu, Jie Tang

    Abstract: Prompt tuning attempts to update few task-specific parameters in pre-trained models. It has achieved comparable performance to fine-tuning of the full parameter set on both language understanding and generation tasks. In this work, we study the problem of prompt tuning for neural text retrievers. We introduce parameter-efficient prompt tuning for text retrieval across in-domain, cross-domain, and…

    Submitted 14 July, 2022; originally announced July 2022.

  9. arXiv:2110.07602

    cs.CL

    P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

    Authors: Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang

    Abstract: Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work reveals that prompt tuning does not perform well for normal-sized pretrained models. We also find that existing methods of prompt tuning cannot handle hard sequence labeling tasks, indicating a lack of unive…

    Submitted 20 March, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022