Showing 1–50 of 57 results for author: Si, C

  1. arXiv:2407.05638  [pdf, other]

    cs.CV

    HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion

    Authors: Junhao Su, Chenghao He, Feiyu Zhu, Xiaojie Xu, Dongzhi Guan, Chenyang Si

    Abstract: Traditional deep learning relies on end-to-end backpropagation for training, but it suffers from drawbacks such as high memory consumption and not aligning with biological neural networks. Recent advancements have introduced locally supervised learning, which divides networks into modules with isolated gradients and trains them locally. However, this approach can lead to performance lag due to lim…

    Submitted 8 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  2. arXiv:2407.05623  [pdf, other]

    cs.CV

    Momentum Auxiliary Network for Supervised Local Learning

    Authors: Junhao Su, Changpeng Cai, Feiyu Zhu, Chenghao He, Xiaojie Xu, Dongzhi Guan, Chenyang Si

    Abstract: Deep neural networks conventionally employ end-to-end backpropagation for their training process, which lacks biological credibility and triggers a locking dilemma during network parameter updates, leading to significant GPU memory use. Supervised local learning, which segments the network into multiple local blocks updated by independent auxiliary networks. However, these methods cannot replace e…

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  3. arXiv:2407.05417  [pdf, other]

    cs.LG cs.AI cs.CV

    See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition

    Authors: Chongjie Si, Xiaokang Yang, Wei Shen

    Abstract: The rapid expansion of large foundation models within the pre-training and fine-tuning framework has underscored that larger models often yield better results. However, the scaling up of large foundation models has led to soaring costs in fine-tuning and parameter storage, rendering extensive adaptations impractical. This challenge has sparked the development of parameter-efficient fine-tuning (PE…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Codes in https://github.com/Chongjie-Si/Subspace-Tuning

  4. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 56 pages

  5. arXiv:2406.06608  [pdf, other]

    cs.CL cs.AI

    The Prompt Report: A Systematic Survey of Prompting Techniques

    Authors: Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, et al. (6 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a p…

    Submitted 14 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  6. arXiv:2406.03172  [pdf, other]

    cs.LG

    Initialization-enhanced Physics-Informed Neural Network with Domain Decomposition (IDPINN)

    Authors: Chenhao Si, Ming Yan

    Abstract: We propose a new physics-informed neural network framework, IDPINN, based on the enhancement of initialization and domain decomposition to improve prediction accuracy. We train a PINN using a small dataset to obtain an initial network structure, including the weighted matrix and bias, which initializes the PINN for each subdomain. Moreover, we leverage the smoothness condition on the interface to…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 20 pages, 14 figures

  7. arXiv:2405.14739  [pdf, other]

    cs.CV

    FLoRA: Low-Rank Core Space for N-dimension

    Authors: Chongjie Si, Xuehui Wang, Xue Yang, Zhengqin Xu, Qingyun Li, Jifeng Dai, Yu Qiao, Xiaokang Yang, Wei Shen

    Abstract: Adapting pre-trained foundation models for various downstream tasks has been prevalent in artificial intelligence. Due to the vast number of tasks and high costs, adjusting all parameters becomes unfeasible. To mitigate this, several fine-tuning techniques have been developed to update the pre-trained model weights in a more resource-efficient manner, such as through low-rank adjustments. Yet, alm…

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.10329   

    stat.AP cs.AI

    Causal inference approach to appraise long-term effects of maintenance policy on functional performance of asphalt pavements

    Authors: Lingyun You, Nanning Guo, Zhengwu Long, Fusong Wang, Chundi Si, Aboelkasim Diab

    Abstract: Asphalt pavements as the most prevalent transportation infrastructure, are prone to serious traffic safety problems due to functional or structural damage caused by stresses or strains imposed through repeated traffic loads and continuous climatic cycles. The good quality or high serviceability of infrastructure networks is vital to the urbanization and industrial development of nations. In order…

    Submitted 2 July, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: The arXiv version needs to be withdrawn since the model needs to be validated and updated with advanced machine learning technologies to enhance the accuracy of the model, and there are some crucial definition errors of symbols in the arXiv version

  9. arXiv:2404.11981  [pdf, other]

    cs.CV

    Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation

    Authors: Chongjie Si, Xuehui Wang, Xiaokang Yang, Wei Shen

    Abstract: Weakly Incremental Learning for Semantic Segmentation (WILSS) leverages a pre-trained segmentation model to segment new classes using cost-effective and readily available image-level labels. A prevailing way to solve WILSS is the generation of seed areas for each new class, serving as a form of pixel-level supervision. However, a scenario usually arises where a pixel is concurrently predicted as a…

    Submitted 19 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  10. arXiv:2404.07503  [pdf, other]

    cs.CL

    Best Practices and Lessons Learned on Synthetic Data for Language Models

    Authors: Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

    Abstract: The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challeng…

    Submitted 11 April, 2024; originally announced April 2024.

  11. arXiv:2403.03163  [pdf, other]

    cs.CL cs.CV cs.CY

    Design2Code: How Far Are We From Automating Front-End Engineering?

    Authors: Chenglei Si, Yanzhe Zhang, Zhengyuan Yang, Ruibo Liu, Diyi Yang

    Abstract: Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development, in which multimodal LLMs might directly convert visual designs into code implementations. In this work, we formalize this as a Design2Code task and conduct comprehensive benchmarking. Specifically,…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Technical Report; The first two authors contributed equally

  12. arXiv:2402.17318  [pdf, other]

    cs.NE cs.CV cs.LG

    Scaling Supervised Local Learning with Augmented Auxiliary Networks

    Authors: Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan

    Abstract: Deep neural networks are typically trained using global error signals that backpropagate (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and requires huge memory consumption. Local learning, which updates each layer independently with a gradient-isolated auxiliary network, offers a promising alternative to address the above problems. How…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024

  13. arXiv:2401.10226  [pdf, other]

    cs.CV

    Towards Language-Driven Video Inpainting via Multimodal Large Language Models

    Authors: Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

    Abstract: We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 vid…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Project Page: https://jianzongwu.github.io/projects/rovi

  14. arXiv:2312.11034  [pdf, other]

    cs.LG

    Appeal: Allow Mislabeled Samples the Chance to be Rectified in Partial Label Learning

    Authors: Chongjie Si, Xuehui Wang, Yan Wang, Xiaokang Yang, Wei Shen

    Abstract: In partial label learning (PLL), each instance is associated with a set of candidate labels among which only one is ground-truth. The majority of the existing works focuses on constructing robust classifiers to estimate the labeling confidence of candidate labels in order to identify the correct one. However, these methods usually struggle to identify and rectify mislabeled samples. To help these…

    Submitted 28 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Under review. An extended version of 2024 AAAI oral paper "Partial Label Learning with a Partner"

  15. arXiv:2312.07537  [pdf, other]

    cs.CV

    FreeInit: Bridging Initialization Gap in Video Diffusion Models

    Authors: Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu

    Abstract: Though diffusion-based video generation has witnessed rapid progress, the inference results of existing models still exhibit unsatisfactory temporal consistency and unnatural dynamics. In this paper, we delve deep into the noise initialization of video diffusion models, and discover an implicit training-inference gap that attributes to the unsatisfactory inference quality. Our key findings are: 1)…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Project page: https://tianxingwu.github.io/pages/FreeInit/ Code: https://github.com/TianxingWu/FreeInit

  16. arXiv:2312.00777  [pdf, other]

    cs.CV

    VideoBooth: Diffusion-based Video Generation with Image Prompts

    Authors: Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

    Abstract: Text-driven video generation witnesses rapid progress. However, merely using text prompts is not enough to depict the desired subject appearance that accurately aligns with users' intents, especially for customized content creation. In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control beyond the text prompts. Specifically,…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Project page: https://vchitect.github.io/VideoBooth-project/

  17. arXiv:2311.17982  [pdf, other]

    cs.CV

    VBench: Comprehensive Benchmark Suite for Video Generative Models

    Authors: Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

    Abstract: Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Equal contributions from first four authors. Project page: https://vchitect.github.io/VBench-project/ Code: https://github.com/Vchitect/VBench

  18. arXiv:2311.16119  [pdf, other]

    cs.CR cs.AI cs.CL

    Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

    Authors: Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan Boyd-Graber

    Abstract: Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant securit…

    Submitted 2 March, 2024; v1 submitted 24 October, 2023; originally announced November 2023.

    Comments: 34 pages, 8 figures Codebase: https://github.com/PromptLabs/hackaprompt Dataset: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset/blob/main/README.md Playground: https://huggingface.co/spaces/hackaprompt/playground

  19. arXiv:2310.12558  [pdf, other]

    cs.CL cs.HC

    Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong

    Authors: Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber

    Abstract: Large Language Models (LLMs) are increasingly used for accessing information on the web. Their truthfulness and factuality are thus of great interest. To help users make the right decisions about the information they get, LLMs should not only provide information but also help users fact-check it. Our experiments with 80 crowdworkers compare language models with search engines (information retrieva…

    Submitted 1 April, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: NAACL 2024

  20. arXiv:2309.15103  [pdf, other]

    cs.CV

    LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

    Authors: Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

    Abstract: This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task to simultaneously a) accomplish the synthesis of visually realistic and temporally coherent videos while b) preserving the strong creative generation nature of the pre-trained T2I model. To this end, we propose L…

    Submitted 26 September, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Project webpage: https://vchitect.github.io/LaVie-project/

  21. arXiv:2309.11497  [pdf, other]

    cs.CV

    FreeU: Free Lunch in Diffusion U-Net

    Authors: Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu

    Abstract: In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly. We initially investigate the key contributions of the U-Net architecture to the denoising process and identify that its main backbone primarily contributes to denoising, whereas its skip connections mainly introduce high-frequency featur…

    Submitted 17 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Method update: we proposed structure-based scaling to enhance the performance of FreeU. Project page: https://chenyangsi.top/FreeU/

  22. arXiv:2306.11046  [pdf, other]

    cs.CV

    FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation

    Authors: Jingwen Guo, Hong Liu, Shitong Sun, Tianyu Guo, Min Zhang, Chenyang Si

    Abstract: Existing skeleton-based action recognition methods typically follow a centralized learning paradigm, which can pose privacy concerns when exposing human-related videos. Federated Learning (FL) has attracted much attention due to its outstanding advantages in privacy-preserving. However, directly applying FL approaches to skeleton videos suffers from unstable training. In this paper, we investigate…

    Submitted 19 June, 2023; originally announced June 2023.

  23. arXiv:2305.14628  [pdf, other]

    cs.CL cs.AI

    Getting MoRE out of Mixture of Language Model Reasoning Experts

    Authors: Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, Jordan Boyd-Graber

    Abstract: While recent large language models (LLMs) improve on various question answering (QA) datasets, it remains difficult for a single model to generalize across question types that require distinct reasoning abilities. We provide empirical evidence that state-of-the-art LLMs suffer from poor generalizability on reasoning types beyond those seen in the prompt. To remedy this, we propose a Mixture-of-Rea…

    Submitted 20 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Findings

  24. arXiv:2305.13299  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations

    Authors: Chenglei Si, Dan Friedman, Nitish Joshi, Shi Feng, Danqi Chen, He He

    Abstract: In-context learning (ICL) is an important paradigm for adapting large language models (LLMs) to new tasks, but the generalization behavior of ICL remains poorly understood. We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels. F…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  25. Complementary Classifier Induced Partial Label Learning

    Authors: Yuheng Jia, Chongjie Si, Min-ling Zhang

    Abstract: In partial label learning (PLL), each training sample is associated with a set of candidate labels, among which only one is valid. The core of PLL is to disambiguate the candidate labels to get the ground-truth one. In disambiguation, the existing works usually do not fully investigate the effectiveness of the non-candidate label set (a.k.a. complementary labels), which accurately indicates a set…

    Submitted 16 May, 2023; originally announced May 2023.

  26. arXiv:2303.14123  [pdf, other]

    cs.CV

    Semantic Prompt for Few-Shot Image Recognition

    Authors: Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan

    Abstract: Few-shot learning is a challenging problem since only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g. text embeddings of class names, to address the issue of rare samples through combining semantic prototypes with visual prototypes. However, these methods still suffer from the spurious visual features learned from the rare…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  27. arXiv:2303.01675  [pdf, other]

    cs.DC

    Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches

    Authors: Siyu Wang, Zongyan Cao, Chang Si, Lansong Diao, Jiamang Wang, Wei Lin

    Abstract: Pipeline parallelism has been demonstrated to be a remarkable approach to improve throughput for training deep neural networks with billions of parameters over heterogeneous clusters. The 1F1B scheduling plan is a widely adopted strategy for memory and performance optimization, which interchanges the forward and backward stage computations of different micro-batches. On the other hand, a common is…

    Submitted 2 March, 2023; originally announced March 2023.

  28. arXiv:2302.08141  [pdf, other]

    cs.DC cs.LG cs.PL

    Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform

    Authors: Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, Wei Lin

    Abstract: We present Rhino, a system for accelerating tensor programs with automatic parallelization on AI platform for real production environment. It transforms a tensor program written for a single device into an equivalent distributed program that is capable of scaling up to thousands of devices with no user configuration. Rhino firstly works on a semantically independent intermediate representation of…

    Submitted 16 February, 2023; originally announced February 2023.

  29. arXiv:2302.07324  [pdf, other]

    cs.CL

    READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises

    Authors: Chenglei Si, Zhengyan Zhang, Yingfa Chen, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun

    Abstract: For many real-world applications, the user-generated inputs usually contain various noises due to speech recognition errors caused by linguistic variations or typographical errors (typos). Thus, it is crucial to test model performance on data with realistic input noises to ensure robustness and fairness. However, little study has been done to construct such benchmarks for Chinese, where various l…

    Submitted 24 May, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: ACL 2023

  30. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  31. MetaFormer Baselines for Vision

    Authors: Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang

    Abstract: MetaFormer, the abstracted architecture of Transformer, has been found to play a significant role in achieving competitive performance. In this paper, we further explore the capacity of MetaFormer, again, without focusing on token mixer design: we introduce several baseline models under MetaFormer using the most basic or common mixers, and summarize our observations as follows: (1) MetaFormer ensu…

    Submitted 2 December, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted to TPAMI. Code: https://github.com/sail-sg/metaformer

  32. arXiv:2210.09150  [pdf, other]

    cs.CL

    Prompting GPT-3 To Be Reliable

    Authors: Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Boyd-Graber, Lijuan Wang

    Abstract: Large language models (LLMs) show impressive abilities via few-shot prompting. Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world language applications. However, the crucial problem of how to improve the reliability of GPT-3 is still under-explored. While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond t…

    Submitted 14 February, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: ICLR 2023

  33. arXiv:2208.13465  [pdf, other]

    cs.CV

    Exploring Semantic Attributes from A Foundation Model for Federated Learning of Disjoint Label Spaces

    Authors: Shitong Sun, Chenyang Si, Guile Wu, Shaogang Gong

    Abstract: Conventional centralised deep learning paradigms are not feasible when data from different sources cannot be shared due to data privacy or transmission limitation. To resolve this problem, federated learning has been introduced to transfer knowledge across multiple sources (clients) with non-shared data while optimising a globally generalised central model (server). Existing federated learning par…

    Submitted 28 November, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: Under Review

  34. arXiv:2207.04197  [pdf, other]

    cs.LG

    Multi-label Classification with High-rank and High-order Label Correlations

    Authors: Chongjie Si, Yuheng Jia, Ran Wang, Min-Ling Zhang, Yanghe Feng, Chongxiao Qu

    Abstract: Exploiting label correlations is important to multi-label classification. Previous methods capture the high-order label correlations mainly by transforming the label matrix to a latent label space with low-rank matrix factorization. However, the label matrix is generally a full-rank or approximate full-rank matrix, making the low-rank factorization inappropriate. Besides, in the latent space, the…

    Submitted 6 November, 2023; v1 submitted 9 July, 2022; originally announced July 2022.

    Comments: 2023, TKDE

  35. arXiv:2205.12956  [pdf, other]

    cs.CV cs.AI cs.LG

    Inception Transformer

    Authors: Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan

    Abstract: Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information. To tackle this issue, we present a novel and general-purpose Inception Transformer, or iFormer for short, that effectively learns comprehensive features with both high- and low-frequency information in visual d…

    Submitted 26 May, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Code and models will be released at https://github.com/sail-sg/iFormer

  36. arXiv:2205.12507  [pdf, other]

    cs.CL

    Re-Examining Calibration: The Case of Question Answering

    Authors: Chenglei Si, Chen Zhao, Sewon Min, Jordan Boyd-Graber

    Abstract: For users to trust model predictions, they need to understand model outputs, particularly their confidence - calibration aims to adjust (calibrate) models' confidence to match expected accuracy. We argue that the traditional calibration evaluation does not promote effective calibrations: for example, it can encourage always assigning a mediocre confidence score to all predictions, which does not h…

    Submitted 23 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022 Findings

  37. arXiv:2203.14415  [pdf, other]

    cs.CV cs.AI

    Mugs: A Multi-Granular Self-Supervised Learning Framework

    Authors: Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan

    Abstract: In self-supervised learning, multi-granular features are heavily desired though rarely investigated, as different downstream tasks (e.g., general and fine-grained classification) often require different or multi-granular features, e.g. fine- or coarse-grained one or their mixture. In this work, for the first time, we propose an effective MUlti-Granular Self-supervised learning (Mugs) framework to…

    Submitted 27 March, 2022; originally announced March 2022.

    Comments: code and models are available at https://github.com/sail-sg/mugs

  38. arXiv:2203.00672  [pdf, other]

    cs.CV

    Generalizable Person Re-Identification via Self-Supervised Batch Norm Test-Time Adaption

    Authors: Ke Han, Chenyang Si, Yan Huang, Liang Wang, Tieniu Tan

    Abstract: In this paper, we investigate the generalization problem of person re-identification (re-id), whose major challenge is the distribution shift on an unseen domain. As an important tool of regularizing the distribution, batch normalization (BN) has been widely used in existing methods. However, they neglect that BN is severely biased to the training domain and inevitably suffers the performance drop…

    Submitted 28 March, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: accepted by AAAI 2022

  39. arXiv:2112.10508  [pdf, other]

    cs.CL cs.LG

    Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

    Authors: Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, Matthias Gallé, Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, Samson Tan

    Abstract: What are the units of text that we want to model? From bytes to multi-word expressions, text can be analyzed and generated at many granularities. Until recently, most natural language processing (NLP) models operated over words, treating those as discrete and atomic tokens, but starting with byte-pair encoding (BPE), subword-based approaches have become dominant in many areas, enabling small vocab…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: 15 page preprint

  40. arXiv:2111.11418  [pdf, other]

    cs.CV cs.AI cs.LG

    MetaFormer Is Actually What You Need for Vision

    Authors: Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan

    Abstract: Transformers have shown great potential in computer vision tasks. A common belief is their attention-based token mixer module contributes most to their competence. However, recent works show the attention-based module in Transformers can be replaced by spatial MLPs and the resulted models still perform quite well. Based on this observation, we hypothesize that the general architecture of the Trans…

    Submitted 4 July, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: CVPR 2022 (Oral). Code: https://github.com/sail-sg/poolformer

  41. Contrast-reconstruction Representation Learning for Self-supervised Skeleton-based Action Recognition

    Authors: Peng Wang, Jun Wen, Chenyang Si, Yuntao Qian, Liang Wang

    Abstract: Skeleton-based action recognition is widely used in varied areas, e.g., surveillance and human-machine interaction. Existing models are mainly learned in a supervised manner, thus heavily depending on large-scale labeled data which could be infeasible when labels are prohibitively expensive. In this paper, we propose a novel Contrast-Reconstruction Representation Learning network (CRRL) that simul…

    Submitted 10 February, 2023; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: Published in IEEE TIP. (https://ieeexplore.ieee.org/document/9901454)

  42. arXiv:2109.05289  [pdf, other]

    cs.CL

    What's in a Name? Answer Equivalence For Open-Domain Question Answering

    Authors: Chenglei Si, Chen Zhao, Jordan Boyd-Graber

    Abstract: A flaw in QA evaluation is that annotations often only provide one gold answer. Thus, model predictions semantically equivalent to the answer but superficially different are considered incorrect. This work explores mining alias entities from knowledge bases and using them as additional gold answers (i.e., equivalent answers). We incorporate answers for two settings: evaluation with additional answ…

    Submitted 11 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021 main conference

  43. Adversarial Training for Machine Reading Comprehension with Virtual Embeddings

    Authors: Ziqing Yang, Yiming Cui, Chenglei Si, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu

    Abstract: Adversarial training (AT) as a regularization method has proved its effectiveness on various tasks. Though there are successful applications of AT on some NLP tasks, the distinguishing characteristics of NLP tasks have not been exploited. In this paper, we aim to apply AT on machine reading comprehension (MRC) tasks. Furthermore, we adapt AT for MRC tasks by proposing a novel adversarial training…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to *SEM 2021 workshop at ACL 2021

    Journal ref: Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

  44. arXiv:2106.00400  [pdf, other]

    cs.CL

    Sub-Character Tokenization for Chinese Pretrained Language Models

    Authors: Chenglei Si, Zhengyan Zhang, Yingfa Chen, Fanchao Qi, Xiaozhi Wang, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

    Abstract: Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token. However, they ignore the unique feature of the Chinese writing system where additional linguistic information exists below the character level, i.e., at the sub-character level. To utilize such information, we propose sub-character…

    Submitted 14 February, 2023; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted at TACL

  45. arXiv:2105.11874  [pdf, other]

    cs.CV

    Few-Shot Learning with Part Discovery and Augmentation from Unlabeled Images

    Authors: Wentao Chen, Chenyang Si, Wei Wang, Liang Wang, Zilei Wang, Tieniu Tan

    Abstract: Few-shot learning is a challenging task since only few instances are given for recognizing an unseen class. One way to alleviate this problem is to acquire a strong inductive bias via meta-learning on similar tasks. In this paper, we show that such inductive bias can be learned from a flat collection of unlabeled images, and instantiated as transferable representations among seen and unseen classe…

    Submitted 25 May, 2021; originally announced May 2021.

    Comments: Accepted by IJCAI 2021

  46. arXiv:2105.11085  [pdf]

    cs.LG cs.DC

    Fed-NILM: A Federated Learning-based Non-Intrusive Load Monitoring Method for Privacy-Protection

    Authors: Haijin Wang, Caomingzhe Si, Junhua Zhao, Guolong Liu, Fushuan Wen

    Abstract: Non-intrusive load monitoring (NILM) is essential for understanding customer's power consumption patterns and may find wide applications like carbon emission reduction and energy conservation. The training of NILM models requires massive load data containing different types of appliances. However, inadequate load data and the risk of power consumer privacy breaches may be encountered by local data…

    Submitted 25 June, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

  47. arXiv:2104.01618  [pdf]

    eess.SP cs.DC cs.LG

    A Federated Learning Framework for Non-Intrusive Load Monitoring

    Authors: Haijin Wang, Caomingzhe Si, Junhua Zhao

    Abstract: Non-intrusive load monitoring (NILM) aims at decomposing the total reading of the household power consumption into appliance-wise ones, which is beneficial for consumer behavior analysis as well as energy conservation. NILM based on deep learning has been a focus of research. To train a better neural network, it is necessary for the network to be fed with massive data containing various appliances…

    Submitted 4 April, 2021; originally announced April 2021.

  48. arXiv:2012.15699  [pdf, other]

    cs.CL

    Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning

    Authors: Chenglei Si, Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

    Abstract: Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve the adversarial robustness, adversarial data augmentation (ADA) has been widely adopted to cover more search space of adversarial attacks by adding textual adversarial examples during training. However, the number of adversarial examples for text augmentation is still extremely insufficient due to the exponentia…

    Submitted 5 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: ACL 2021 (Findings)

  49. CharBERT: Character-aware Pre-trained Language Model

    Authors: Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang, Guoping Hu

    Abstract: Most pre-trained language models (PLMs) construct word representations at subword level with Byte-Pair Encoding (BPE) or its variations, by which OOV (out-of-vocab) words are almost avoidable. However, those methods split a word into subword units and make the representation incomplete and fragile. In this paper, we propose a character-aware pre-trained language model named CharBERT improving on t…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: 12 pages, to appear at COLING 2020

  50. arXiv:2007.05934  [pdf, other]

    cs.CV

    Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition

    Authors: Chenyang Si, Xuecheng Nie, Wei Wang, Liang Wang, Tieniu Tan, Jiashi Feng

    Abstract: We consider the problem of semi-supervised 3D action recognition which has been rarely explored before. Its major challenge lies in how to effectively learn motion representations from unlabeled data. Self-supervised learning (SSL) has been proved very effective at learning representations from unlabeled data in the image domain. However, few effective self-supervised approaches exist for 3D actio…

    Submitted 12 July, 2020; originally announced July 2020.

    Comments: Accepted by ECCV2020