Showing 151–200 of 713 results for author: Wei, F

  1. arXiv:2303.08518  [pdf, other]

    cs.CL

    UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation

    Authors: Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang

    Abstract: Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization. We propose UPRISE (Universal Prompt Retrieval for Improving zero-Shot Evaluation), which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. Specifical…

    Submitted 16 December, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: EMNLP 2023 Main Conference

  2. arXiv:2303.07678  [pdf, other]

    cs.IR cs.CL

    Query2doc: Query Expansion with Large Language Models

    Authors: Liang Wang, Nan Yang, Furu Wei

    Abstract: This paper introduces a simple yet effective query expansion approach, denoted as query2doc, to improve both sparse and dense retrieval systems. The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs), and then expands the query with generated pseudo-documents. LLMs are trained on web-scale text corpora and are adept at knowledge memorization. The ps…

    Submitted 11 October, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted to EMNLP 2023

  3. Robust phase metrology with hybrid quantum interferometers against particle losses

    Authors: X. N. Feng, D. He, L. F. Wei

    Abstract: Entanglement is an important quantum resource to achieve high sensitive quantum metrology. However, the rapid decoherence of quantum entangled states, due to the unavoidable environment noise, result in practically the unwanted sharp drop of the measurement sensitivity. To overcome such a difficulty, here we propose a spin-oscillator hybrid quantum interferometer to achieve the desirable precise e…

    Submitted 12 March, 2023; originally announced March 2023.

  4. arXiv:2303.03926  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

    Authors: Ziqiang Zhang, Long Zhou, Chengyi Wang, Sanyuan Chen, Yu Wu, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

    Abstract: We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis. Specifically, we extend VALL-E and train a multi-lingual conditional codec language model to predict the acoustic token sequences of the target language speech by using both the source language speech and the target language text as prompts. VALL-E X inherits strong in-context learning capabilitie…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: We encourage readers to listen to the audio samples on our demo page: https://aka.ms/vallex

  5. arXiv:2303.01421  [pdf, other]

    cs.CL cs.LG

    Semiparametric Language Models Are Scalable Continual Learners

    Authors: Guangyue Peng, Tao Ge, Si-Qing Chen, Furu Wei, Houfeng Wang

    Abstract: Semiparametric language models (LMs) have shown promise in continuously learning from new text data by combining a parameterized neural LM with a growable non-parametric memory for memorizing new content. However, conventional semiparametric LMs will finally become prohibitive for computing and storing if they are applied to continual learning over streaming data, because the non-parametric memory…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Work in progress

  6. arXiv:2303.00579  [pdf, other]

    cs.LG

    Are More Layers Beneficial to Graph Transformers?

    Authors: Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei

    Abstract: Despite that going deep has proven successful in many neural architectures, the existing graph transformers are relatively shallow. In this work, we explore whether more layers are beneficial to graph transformers, and find that current graph transformers suffer from the bottleneck of improving performance by increasing depth. Our further analysis reveals the reason is that deep graph transformers…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: ICLR 2023

  7. arXiv:2302.14771  [pdf, other]

    cs.CV

    Generic-to-Specific Distillation of Masked Autoencoders

    Authors: Wei Huang, Zhiliang Peng, Li Dong, Furu Wei, Jianbin Jiao, Qixiang Ye

    Abstract: Large vision Transformers (ViTs) driven by self-supervised pre-training mechanisms achieved unprecedented progress. Lightweight ViT models limited by the model capacity, however, benefit little from those pre-training mechanisms. Knowledge distillation defines a paradigm to transfer representations from large (teacher) models to small (student) ones. However, the conventional single-stage distilla…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted by CVPR2023

  8. arXiv:2302.14045  [pdf, other]

    cs.CL cs.CV

    Language Is Not All You Need: Aligning Perception with Language Models

    Authors: Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

    Abstract: A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal co…

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  9. arXiv:2302.12242  [pdf, other]

    cs.CV cs.AI

    Side Adapter Network for Open-Vocabulary Semantic Segmentation

    Authors: Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai

    Abstract: This paper presents a new framework for open-vocabulary semantic segmentation with the pre-trained vision-language model, named Side Adapter Network (SAN). Our approach models the semantic segmentation task as a region recognition problem. A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is app…

    Submitted 22 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: CVPR2023 Highlight

  10. HanoiT: Enhancing Context-aware Translation via Selective Context

    Authors: Jian Yang, Yuwei Yin, Shuming Ma, Liqun Yang, Hongcheng Guo, Haoyang Huang, Dongdong Zhang, Yutao Zeng, Zhoujun Li, Furu Wei

    Abstract: Context-aware neural machine translation aims to use the document-level context to improve translation quality. However, not all words in the context are helpful. The irrelevant or trivial words may bring some noise and distract the model from learning the relationship between the current sentence and the auxiliary context. To mitigate this problem, we propose a novel end-to-end encoder-decoder mo…

    Submitted 17 January, 2023; originally announced January 2023.

  11. arXiv:2301.02111  [pdf, other]

    cs.CL cs.SD eess.AS

    Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

    Authors: Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

    Abstract: We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called Vall-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: Working in progress

  12. arXiv:2301.02010  [pdf, other]

    cs.CL

    HIT-SCIR at MMNLU-22: Consistency Regularization for Multilingual Spoken Language Understanding

    Authors: Bo Zheng, Zhouyang Li, Fuxuan Wei, Qiguang Chen, Libo Qin, Wanxiang Che

    Abstract: Multilingual spoken language understanding (SLU) consists of two sub-tasks, namely intent detection and slot filling. To improve the performance of these two sub-tasks, we propose to use consistency regularization based on a hybrid data augmentation strategy. The consistency regularization enforces the predicted distributions for an example and its semantically equivalent augmentation to be consis…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: Accepted by EMNLP2022 MMNLU-22 Workshop. The winner of the MMNLU-22 Competition Full Dataset Task. Code is available at https://github.com/bozheng-hit/MMNLU-22-HIT-SCIR

  13. arXiv:2301.01296  [pdf, other]

    cs.CV

    TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

    Authors: Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu

    Abstract: Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different option…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Code is available at https://github.com/OliverRensu/TinyMIM

  14. arXiv:2212.10923  [pdf, other]

    cs.CL cs.AI

    Language Models as Inductive Reasoners

    Authors: Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei

    Abstract: Inductive reasoning is a core component of human intelligence. In the past research of inductive reasoning within computer science, formal language is used as representations of knowledge (facts and rules, more specifically). However, formal language can cause systematic problems for inductive reasoning such as disability of handling raw input such as natural language, sensitiveness to mislabeled…

    Submitted 5 February, 2024; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: Accepted by EACL 2024

  15. arXiv:2212.10559  [pdf, other]

    cs.CL

    Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers

    Authors: Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei

    Abstract: Large pretrained language models have shown surprising in-context learning (ICL) ability. With a few demonstration input-label pairs, they can predict the label for an unseen input without parameter updates. Despite the great success in performance, its working mechanism still remains an open question. In this paper, we explain language models as meta-optimizers and understand in-context learning…

    Submitted 15 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023 findings

  16. arXiv:2212.10554  [pdf, other]

    cs.CL

    A Length-Extrapolatable Transformer

    Authors: Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation. Then we propose two designs to improve the above metric of Transformers. Specifically, we introduce a relative position embedding to explicitly maximize attention r…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 9 pages

  17. arXiv:2212.10218  [pdf, other]

    cs.CL

    GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

    Authors: Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li

    Abstract: Pre-trained models have achieved remarkable success in natural language processing (NLP). However, existing pre-training methods underutilize the benefits of language understanding for generation. Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language u…

    Submitted 9 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  18. arXiv:2212.10190  [pdf, other]

    cs.CL

    Pay Attention to Your Tone: Introducing a New Dataset for Polite Language Rewrite

    Authors: Xun Wang, Tao Ge, Allen Mao, Yuki Li, Furu Wei, Si-Qing Chen

    Abstract: We introduce PoliteRewrite, a dataset for polite language rewrite which is a novel sentence rewrite task. Compared with previous text style transfer tasks that can be mostly addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits over an offensive and impolite sentence to deliver the same message euphemisti…

    Submitted 20 December, 2022; originally announced December 2022.

  19. arXiv:2212.09611  [pdf, other]

    cs.CL cs.CV

    Optimizing Prompts for Text-to-Image Generation

    Authors: Yaru Hao, Zewen Chi, Li Dong, Furu Wei

    Abstract: Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretr…

    Submitted 29 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted by NeurIPS-23

  20. arXiv:2212.09058  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    BEATs: Audio Pre-Training with Acoustic Tokenizers

    Authors: Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

    Abstract: The massive growth of self-supervised learning (SSL) has been witnessed in language, vision, speech, and audio domains over the past few years. While discrete label prediction is widely adopted for other modalities, the state-of-the-art audio SSL models still employ reconstruction loss for pre-training. Compared with reconstruction loss, semantic-rich discrete label prediction encourages the SSL m…

    Submitted 18 December, 2022; originally announced December 2022.

  21. arXiv:2212.08653  [pdf, other]

    cs.CV eess.IV

    Attentive Mask CLIP

    Authors: Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang

    Abstract: Image token removal is an efficient augmentation strategy for reducing the cost of computing image features. However, this efficient augmentation strategy has been found to adversely affect the accuracy of CLIP-based training. We hypothesize that removing a large portion of image tokens may improperly discard the semantic content associated with a given text description, thus constituting an incor…

    Submitted 9 October, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2771-2781

  22. arXiv:2212.07752  [pdf, other]

    cs.CL

    Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models

    Authors: Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei

    Abstract: Despite the success of multilingual sequence-to-sequence pre-training, most existing approaches rely on document-level monolingual corpora in many different languages, sentence-level bilingual corpora,\footnote{In this paper, we use `bilingual corpora' to denote parallel corpora with `bilingual translation pairs' in many different language pairs, each consisting of two sentences/documents with the…

    Submitted 13 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  23. arXiv:2212.06713  [pdf, other]

    cs.CL

    Structured Prompting: Scaling In-Context Learning to 1,000 Examples

    Authors: Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei

    Abstract: Large language models have exhibited intriguing in-context learning capability, achieving promising zero- and few-shot performance without updating the parameters. However, conventional in-context learning is usually restricted by length constraints, rendering it ineffective to absorb supervision from a large number of examples. In order to go beyond few shots, we introduce structured prompting th…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: 14 pages

  24. arXiv:2212.04257  [pdf, other]

    cs.CL cs.LG

    Momentum Calibration for Text Generation

    Authors: Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei

    Abstract: The input and output of most text generation tasks can be transformed to two sequences of tokens and they can be modeled using sequence-to-sequence learning modeling tools such as Transformers. These models are usually trained by maximizing the likelihood the output text sequence and assumes the input sequence and all gold preceding tokens are given during training, while during inference the mode…

    Submitted 8 December, 2022; originally announced December 2022.

  25. arXiv:2212.03533  [pdf, other]

    cs.CL cs.IR

    Text Embeddings by Weakly-Supervised Contrastive Pre-training

    Authors: Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

    Abstract: This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clu…

    Submitted 22 February, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: 17 pages, v2 fixes the SummEval numbers

  26. arXiv:2212.00616  [pdf, other]

    cs.CL

    Extensible Prompts for Language Models on Zero-shot Language Style Customization

    Authors: Tao Ge, Jing Hu, Li Dong, Shaoguang Mao, Yan Xia, Xun Wang, Si-Qing Chen, Furu Wei

    Abstract: We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL). X-Prompt instructs an LLM with not only NL but also an extensible vocabulary of imaginary words. Registering new imaginary words allows us to instruct the LLM to comprehend concepts that are difficult to describe with NL words, thereby making a prompt more descriptive. Also, these imagi…

    Submitted 30 November, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: Accepted by NeurIPS 2023

  27. arXiv:2211.13184  [pdf, other]

    cs.LG cs.CL

    TorchScale: Transformers at Scale

    Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Work in progress

  28. arXiv:2211.11275  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.SD

    VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

    Authors: Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei

    Abstract: Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision, text. How to design a unified framework to integrate different modal information and leverage different resources (e.g., visual-audio pairs, audio-text pairs, unlabeled speech, and unlabeled text) to facilitate speech rep…

    Submitted 19 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 11 pages, Accepted by IEEE Transactions on Multimedia

  29. arXiv:2211.06722  [pdf, ps, other]

    math.CO

    An Asymptotically Sharp Bound on the Maximum Number of Independent Transversals

    Authors: Jake Ruotolo, Kevin Wang, Fan Wei

    Abstract: Let $G$ be a multipartite graph with partition $V_1, V_2,\ldots, V_k$ of $V(G)$. Let $d_{i,j}$ denote the edge density of the pair $(V_i, V_j)$. An independent transversal is an independent set of $G$ with exactly one vertex in each $V_i$. In this paper, we prove an asymptotically sharp upper bound on the maximum number of independent transversals given the $d_{i,j}$'s.

    Submitted 31 January, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

    Comments: 15 pages, 1 figure

    MSC Class: 05C35; 05C69

  30. arXiv:2211.01837  [pdf, other]

    cs.CL

    Latent Prompt Tuning for Text Summarization

    Authors: Yubo Zhang, Xingxing Zhang, Xun Wang, Si-qing Chen, Furu Wei

    Abstract: Prompts with different control signals (e.g., length, keywords, etc.) can be used to control text summarization. When control signals are available, they can control the properties of generated summaries and potentially improve summarization quality (since more information are given). Unfortunately, control signals are not already available during inference time. In this paper, we propose Lotus (s…

    Submitted 19 December, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

  31. arXiv:2211.01367  [pdf, other]

    cs.CV

    Two-Stream Network for Sign Language Recognition and Translation

    Authors: Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, Brian Mak

    Abstract: Sign languages are visual languages using manual articulations and non-manual elements to convey information. For sign language recognition and translation, the majority of existing approaches directly encode RGB videos into hidden representations. RGB videos, however, are raw signals with substantial visual redundancy, leading the encoder to overlook the key information for sign language understa…

    Submitted 22 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted by NeurIPS 2022. Code and models are available at: https://github.com/FangyunWei/SLRT

  32. Dynamics of an HIV/AIDS transmission model with protection awareness and fluctuations

    Authors: Xuanpei Zhai, Wenshuang Li, Fengying Wei, Xuerong Mao

    Abstract: We establish a stochastic HIV/AIDS model for the individuals with protection awareness and reveal how the protection awareness plays its important role in the control of AIDS. We firstly show that there exists a global positive solution for the stochastic model. By constructing Lyapunov functions, the ergodic stationary distribution when $R_{0}^{s}>1$ and the extinction when $R_{0}^{e}<1$ for the…

    Submitted 1 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

  33. arXiv:2210.17027  [pdf, other]

    cs.SD cs.CL eess.AS

    Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

    Authors: Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei

    Abstract: Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST. However, direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language are very rare. To address this issue, we propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speec…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  34. arXiv:2210.15461  [pdf, other]

    cs.CL cs.AI

    LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation

    Authors: Hongcheng Guo, Jiaheng Liu, Haoyang Huang, Jian Yang, Zhoujun Li, Dongdong Zhang, Zheng Cui, Furu Wei

    Abstract: Multimodal Machine Translation (MMT) focuses on enhancing text-only translation with visual features, which has attracted considerable attention from both natural language processing and computer vision communities. Recent advances still struggle to train a separate model for each language pair, which is costly and unaffordable when the number of languages increases in the real world. In other wor…

    Submitted 28 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022

  35. arXiv:2210.14867  [pdf, other]

    cs.CL cs.LG

    Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

    Authors: Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

    Abstract: In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications. We show that going beyond English-centric bitexts, coupled with a novel sampling strategy aimed at reducing…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Work in progress

  36. arXiv:2210.10615  [pdf, other]

    cs.CV

    A Unified View of Masked Image Modeling

    Authors: Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei

    Abstract: Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks. In this work, we propose a unified view of masked image modeling after revisiting existing methods. Under the unified view, we introduce a simple yet effective method, termed as MaskDistill, which rec…

    Submitted 19 October, 2022; originally announced October 2022.

  37. arXiv:2210.09304  [pdf, other]

    cs.CV

    Non-Contrastive Learning Meets Language-Image Pre-Training

    Authors: Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei

    Abstract: Contrastive language-image pre-training (CLIP) serves as a de-facto standard to align images and texts. Nonetheless, the loose correlation between images and texts of web-crawled data renders the contrastive objective data inefficient and craving for a large training batch size. In this work, we explore the validity of non-contrastive language-image pre-training (nCLIP), and study whether nice pro…

    Submitted 17 October, 2022; originally announced October 2022.

  38. arXiv:2210.07022  [pdf, other]

    cs.CL

    CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation

    Authors: Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei

    Abstract: Named entity recognition (NER) suffers from the scarcity of annotated training data, especially for low-resource languages without labeled data. Cross-lingual NER has been proposed to alleviate this issue by transferring knowledge from high-resource languages to low-resource languages via aligned cross-lingual representations or machine translation results. However, the performance of cross-lingua…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: 10 pages

  39. arXiv:2210.06465  [pdf, other]

    cs.CV

    AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars

    Authors: Yue Wu, Yu Deng, Jiaolong Yang, Fangyun Wei, Qifeng Chen, Xin Tong

    Abstract: Although 2D generative models have made great progress in face image generation and animation, they often suffer from undesirable artifacts such as 3D inconsistency when rendering images from different camera viewpoints. This prevents them from synthesizing video animations indistinguishable from real ones. Recently, 3D-aware GANs extend 2D GANs for explicit disentanglement of camera pose by lever…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted by NeurIPS 2022. Project Page: https://yuewuhkust.github.io/AniFaceGAN

  40. arXiv:2210.06423  [pdf, other]

    cs.LG cs.CL cs.CV

    Foundation Transformers

    Authors: Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

    Abstract: A big convergence of model architectures across language, vision, speech, and multimodal is emerging. However, under the same name "Transformers", the above areas use different implementations for better performance, e.g., Post-LayerNorm for BERT, and Pre-LayerNorm for GPT and vision Transformers. We call for the development of Foundation Transformer for true general-purpose modeling, which serves…

    Submitted 19 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Work in progress

  41. arXiv:2210.03730  [pdf, other]

    cs.CL eess.AS

    SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

    Authors: Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei

    Abstract: The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder. Leveraging hidden-unit as an interface to align speech and text, we can decomp…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: 14 pages, accepted by EMNLP 2022

  42. arXiv:2210.02849  [pdf, other]

    cs.CL

    XDoc: Unified Pre-training for Cross-Format Document Understanding

    Authors: Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

    Abstract: The surge of pre-training has witnessed the rapid development of document understanding recently. Pre-training and fine-tuning framework has been effectively used to tackle texts in various formats, including plain texts, document texts, and web texts. Despite achieving promising performance, existing pre-trained models usually target one specific document format at one time, making it difficult t…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  43. arXiv:2209.15329  [pdf, other]

    cs.CL cs.AI eess.AS

    SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

    Authors: Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei

    Abstract: How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation. Specifically, we introduce two alternative discret…

    Submitted 15 June, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: We have corrected the errors in the pre-training data for SpeechLM-P Base models, new results are updated

  44. arXiv:2209.13940  [pdf, other]

    cs.CL

    Revamping Multilingual Agreement Bidirectionally via Switched Back-translation for Multilingual Neural Machine Translation

    Authors: Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Wai Lam

    Abstract: Despite the fact that multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT), current methodologies in the field have two shortages: (i) require parallel data between multiple language pairs, which is not always realistic and (ii) optimize the agreement in an ambiguous direction, which hampers the translation performance. We present \textbf{B}idirec…

    Submitted 15 May, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

  45. arXiv:2209.07703  [pdf]

    astro-ph.SR astro-ph.HE physics.space-ph

    Analyses of Flight Time During Solar Proton Events and Solar Flares

    Authors: X. H. Xu, Y. Wang, F. S. Wei, X. S. Feng, M. H. Bo, H. W. Tang, D. S. Wang, B. Lei, B. Y. Wang, P. B. Zuo, C. W. Jiang, X. J. Xu, Z. L. Zhou, Z. Li, P. Zou, L. D. Wang, Y. X. Gu, Y. L. Chen, W. Y. Zhang, P. Sun

    Abstract: Analyzing the effects of space weather on aviation is a new and developing topic. It has been commonly accepted that the flight time of the polar flights may increase during solar proton events because the flights have to change their route to avoid the high-energy particles. However, apart from such phenomenon, researches related to the flight time during space weather events is very rare. Based…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: submitted to Scientific Reports

  46. arXiv:2209.07701  [pdf]

    astro-ph.SR physics.space-ph

    Characteristics of Flight Delays during Solar Flares

    Authors: X. H. Xu, Y. Wang, F. S. Wei, X. S. Feng, M. H. Bo, H. W. Tang, D. S. Wang, L. Bian, B. Y. Wang, W. Y. Zhang, Y. S. Huang, Z. Li, J. P. Guo, P. B. Zuo, C. W. Jiang, X. J. Xu, Z. L. Zhou, P. Zou

    Abstract: Solar flare is one of the severest solar activities on the sun, and it has many important impacts on the near-earth space. It has been found that flight arrival delays will increase during solar flare. However, the detailed intrinsic mechanism of how solar flares influence the delays is still unknown. Based on 5-years huge amount of flight data, here we comprehensively analyze the flight departure…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: submitted to APJL

  47. arXiv:2209.07700  [pdf]

    astro-ph.SR physics.space-ph

    The Effects of Space Weather on Flight Delays

    Authors: Y. Wang, X. H. Xu, F. S. Wei, X. S. Feng, M. H. Bo, H. W. Tang, D. S. Wang, L. Bian, B. Y. Wang, W. Y. Zhang, Y. S. Huang, Z. Li, J. P. Guo, P. B. Zuo, C. W. Jiang, X. J. Xu, Z. L. Zhou, P. Zou

    Abstract: Although the sun is really far away from us, some solar activities could still influence the performance and reliability of space-borne and ground-based technological systems on Earth. Those time-varying conditions in space caused by the sun are also called space weather, as the atmospheric conditions that can affect weather on the ground. It is known that aviation activities can be affected durin…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: submitted to Science Advances

  48. arXiv:2209.04260  [pdf, other]

    astro-ph.HE hep-ex hep-ph physics.space-ph

    Search for relativistic fractionally charged particles in space

    Authors: DAMPE Collaboration, F. Alemanno, C. Altomare, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De-Benedittis, I. De Mitri, F. de Palma, M. Deliyergiyev, A. Di Giovanni, M. Di Santo, et al. (126 additional authors not shown)

    Abstract: More than a century after the performance of the oil drop experiment, the possible existence of fractionally charged particles FCP still remains unsettled. The search for FCPs is crucial for some extensions of the Standard Model in particle physics. Most of the previously conducted searches for FCPs in cosmic rays were based on experiments underground or at high altitudes. However, there have been…

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: 19 pages, 6 figures, accepted by PRD

    Report number: 106, 063026

    Journal ref: Physical Review D 106.6 (2022): 063026

  49. arXiv:2208.10442  [pdf, other]

    cs.CV cs.CL

    Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

    Authors: Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

    Abstract: A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and vision-language tasks. Specifically, we advance the big convergence from three aspects: backbone architecture, pretraining task, and model scaling up. We introduce Mult…

    Submitted 30 August, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: 18 pages

  50. arXiv:2208.09595  [pdf, ps, other]

    cs.CR cs.IT cs.LG math.ST

    The Saddle-Point Accountant for Differential Privacy

    Authors: Wael Alghamdi, Shahab Asoodeh, Flavio P. Calmon, Juan Felipe Gomez, Oliver Kosut, Lalitha Sankar, Fei Wei

    Abstract: We introduce a new differential privacy (DP) accountant called the saddle-point accountant (SPA). SPA approximates privacy guarantees for the composition of DP mechanisms in an accurate and fast manner. Our approach is inspired by the saddle-point method -- a ubiquitous numerical technique in statistics. We prove rigorous performance guarantees by deriving upper and lower bounds for the approximat…

    Submitted 19 August, 2022; originally announced August 2022.

    Comments: 31 pages, 4 figures