Showing 51–100 of 291 results for author: Bai, S

  1. arXiv:2309.07698  [pdf, other]

    cs.CV

    Dataset Condensation via Generative Model

    Authors: David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou

    Abstract: Dataset condensation aims to condense a large dataset with many training samples into a small set. Previous methods usually condense the dataset into pixel format. However, this suffers from slow optimization and a large number of parameters to be optimized. As image resolution and the number of classes increase, the number of learnable parameters grows accordingly, prohibiting condensation meth…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: old work, done in 2022

  2. Ethical Framework for Harnessing the Power of AI in Healthcare and Beyond

    Authors: Sidra Nasir, Rizwan Ahmed Khan, Samita Bai

    Abstract: In the past decade, the deployment of deep learning (Artificial Intelligence (AI)) methods has become pervasive across a spectrum of real-world applications, often in safety-critical contexts. This comprehensive research article rigorously investigates the ethical dimensions intricately linked to the rapid evolution of AI technologies, with a particular focus on the healthcare domain. Delving deep…

    Submitted 31 August, 2023; originally announced September 2023.

    Journal ref: IEEE Access 2024

  3. arXiv:2308.16890  [pdf, other]

    cs.CV cs.CL

    TouchStone: Evaluating Vision-Language Models by Language Models

    Authors: Shuai Bai, Shusheng Yang, Jinze Bai, Peng Wang, Xingxuan Zhang, Junyang Lin, Xinggang Wang, Chang Zhou, Jingren Zhou

    Abstract: Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting a visual receptor with large language models (LLMs). However, current assessments mainly focus on recognizing and reasoning abilities, lacking direct evaluation of conversational skills and neglecting visual s…

    Submitted 4 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: https://github.com/OFA-Sys/TouchStone

  4. arXiv:2308.12966  [pdf, other]

    cs.CV cs.CL

    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

    Authors: Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. Starting from the Qwen-LM as a foundation, we endow it with visual capacity by the meticulously designed (i) visual receptor, (ii) input-output interface, (iii) 3-stage training pipeline, and (iv) multilingual multimodal cleaned corpus. Beyon…

    Submitted 12 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: Code, demo and models are available at https://github.com/QwenLM/Qwen-VL

  5. arXiv:2308.09988  [pdf, ps, other]

    math.AP math.FA

    On $p$-Laplacian Kirchhoff-Schrödinger-Poisson type systems with critical growth on the Heisenberg group

    Authors: Shujie Bai, Yueqiang Song, Dušan D. Repovš

    Abstract: In this article, we investigate the Kirchhoff-Schrödinger-Poisson type systems on the Heisenberg group of the following form: \begin{equation*} \left\{ \begin{array}{lll} {-(a+b\int_Ω|\nabla_{H} u|^{p}dξ)Δ_{H,p}u-μφ|u|^{p-2}u}=λ|u|^{q-2}u+|u|^{Q^{\ast}-2}u &\mbox{in}\ Ω, \\ -Δ_{H}φ=|u|^{p} &\mbox{in}\ Ω, \\ u=φ=0 &\mbox{on}\ \partialΩ, \end{array} \right. \end{equation*} where $a,b$ are positive r…

    Submitted 19 August, 2023; originally announced August 2023.

    MSC Class: 35J20; 35R03; 46E35

    Journal ref: Electron. Res. Arch. 31:9 (2023), 5749-5765

  6. arXiv:2308.07209  [pdf, other]

    cs.LG cs.CV eess.IV

    Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning

    Authors: Shipeng Bai, Jun Chen, Xintian Shen, Yixuan Qian, Yong Liu

    Abstract: Structured pruning and quantization are promising approaches for reducing the inference time and memory footprint of neural networks. However, most existing methods require the original training dataset to fine-tune the model. This not only brings heavy resource consumption but is also not possible for applications with sensitive or proprietary data due to privacy and security concerns. Therefore,…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: ICCV2023

  7. arXiv:2308.06739  [pdf, other]

    cs.CV

    Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

    Authors: David Junhao Zhang, Mutian Xu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou

    Abstract: Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets that demand costly data collection and pose additional challenges due to concerns regarding data privacy. Recently, synthetic images generated by text-to-image diffusion models have shown great potential for benefiting image recognition. Although promising, there has been…

    Submitted 13 August, 2023; originally announced August 2023.

  8. arXiv:2308.04269  [pdf, other]

    cs.CV cs.AI

    Lossy and Lossless (L$^2$) Post-training Model Size Compression

    Authors: Yumeng Shi, Shihao Bai, Xiuying Wei, Ruihao Gong, Jianlei Yang

    Abstract: Deep neural networks have delivered remarkable performance and have been widely used in various visual tasks. However, their huge size causes significant inconvenience for transmission and storage. Many previous studies have explored model size compression. However, these studies often approach various lossy and lossless compression methods in isolation, leading to challenges in achieving high com…

    Submitted 8 August, 2023; originally announced August 2023.

  9. arXiv:2308.00353  [pdf, other]

    cs.CV

    Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding

    Authors: Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi

    Abstract: Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories. A key factor for the recent progress in 2D open-world perception is the availability of large-scale image-text pairs from the Interne…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: submit to TPAMI

  10. Quantum metrology in the noisy intermediate-scale quantum era

    Authors: Lin Jiao, Wei Wu, Si-Yuan Bai, Jun-Hong An

    Abstract: Quantum metrology pursues the physical realization of higher-precision measurements of physical quantities than the classically achievable limit by exploiting quantum features, such as entanglement and squeezing, as resources. It has potential applications in developing next-generation frequency standards, magnetometers, radar, and navigation. However, the ubiquitous decoherence in the quantum wor…

    Submitted 28 November, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: Minireview of quantum metrology based on Lectures given at the summer school "Fundamental and Frontiers of Quantum Metrology and Quantum Computation" held in Bohai University, China, from 23 July to 8 August

    Journal ref: Adv Quantum Technol. 2023, 2300218

  11. arXiv:2307.05358  [pdf, other]

    cs.LG cs.AI

    Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators

    Authors: Sikai Bai, Shuaicheng Li, Weiming Zhuang, Jie Zhang, Song Guo, Kunlin Yang, Jun Hou, Shuai Zhang, Junyu Gao, Shuai Yi

    Abstract: Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled…

    Submitted 11 March, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

    Journal ref: The 38th Annual AAAI Conference on Artificial Intelligence, 2024

  12. arXiv:2307.00498  [pdf, other]

    cs.LG cs.CV

    Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning

    Authors: Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, Yong Liu

    Abstract: Neural network quantization is a very promising solution in the field of model compression, but its resulting accuracy highly depends on a training/fine-tuning process and requires the original data. This not only brings heavy computation and time costs but is also not conducive to privacy and sensitive information protection. Therefore, a few recent works are starting to focus on data-free quanti…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: This paper has been accepted for publication in Pattern Recognition

    Journal ref: Pattern Recognition 2023

  13. arXiv:2306.16718  [pdf, other]

    cs.CV

    Metric-aligned Sample Selection and Critical Feature Sampling for Oriented Object Detection

    Authors: Peng Sun, Yongbin Zheng, Wenqi Wu, Wanying Xu, Shengjian Bai

    Abstract: Arbitrary-oriented object detection is a relatively emerging but challenging task. Although remarkable progress has been made, there still remain many unsolved issues due to the large diversity of patterns in orientation, scale, aspect ratio, and visual appearance of objects in aerial images. Most of the existing methods adopt a coarse-grained fixed label assignment strategy and suffer from the in…

    Submitted 10 July, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

  14. arXiv:2306.14435  [pdf, other]

    cs.CV cs.LG

    DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

    Authors: Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

    Abstract: Accurate and controllable image editing is a challenging task that has attracted significant attention recently. Notably, DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision. However, due to its reliance on generative adversarial networks (GANs), its generality is limited by the capacity of pretrained GAN models. In this…

    Submitted 7 April, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: Code is released at https://github.com/Yujun-Shi/DragDiffusion

  15. arXiv:2306.00974  [pdf, other]

    cs.CV

    Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search

    Authors: Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille

    Abstract: Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly. Common failures include: (i) natural-looking text prompts generating images with the wrong content, or (ii) different random samples of the latent variables that generate vastly different, and even unrelated, outputs despite being conditioned on the same text prompt. In this work, we aim to study and understand the f…

    Submitted 29 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Project page: https://sage-diffusion.github.io/

  16. arXiv:2305.15643  [pdf, other]

    cs.LG math.OC stat.ML

    Federated Composite Saddle Point Optimization

    Authors: Site Bai, Brian Bullins

    Abstract: Federated learning (FL) approaches for saddle point problems (SPP) have recently gained in popularity due to the critical role they play in machine learning (ML). Existing works mostly target smooth unconstrained objectives in Euclidean space, whereas ML problems often involve constraints or non-smooth regularization, which results in a need for composite optimization. Addressing these issues, we…

    Submitted 24 May, 2023; originally announced May 2023.

  17. arXiv:2305.11676  [pdf, other]

    cs.CV

    Learning Global-aware Kernel for Image Harmonization

    Authors: Xintian Shen, Jiangning Zhang, Jun Chen, Shipeng Bai, Yue Han, Yabiao Wang, Chengjie Wang, Yong Liu

    Abstract: Image harmonization aims to solve the visual inconsistency problem in composited images by adaptively adjusting the foreground pixels with the background as references. Existing methods employ local color transformation or region matching between foreground and background, which neglects the powerful proximity prior and independently distinguishes fore-/back-ground as a whole part for harmonization. A…

    Submitted 17 August, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 10 pages, 10 figures

  18. arXiv:2305.11172  [pdf, other]

    cs.CV cs.CL cs.SD eess.AS

    ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

    Authors: Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou

    Abstract: In this work, we explore a scalable way for building a general representation model toward unlimited modalities. We release ONE-PEACE, a highly extensible model with 4B parameters that can seamlessly align and integrate representations across vision, audio, and language modalities. The architecture of ONE-PEACE comprises modality adapters, shared self-attention layers, and modality FFNs. This desi…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 30 pages, 9 figures, 18 tables

  19. arXiv:2305.10545  [pdf]

    cond-mat.mtrl-sci

    Recycling Silicon Scrap for Spherical Si-C composite as High-Performance Lithium-ion Battery Anodes

    Authors: Bhagath Sreenarayanan, Marta Vicencio, Shuang Bai, Bingyu Lu, Ou Mao, Shiva Adireddy, Wurigumula Bao, Ying Shirley Meng

    Abstract: The growth of the semiconductor and solar industry has been exponential in the last two decades due to the computing and energy demands of the world. Silicon (Si) is one of the main constituents for both sectors and, thus, is used in large quantities. As a result, a lot of Si waste is generated mainly by these two industries. For a sustainable world, the circular economy is the key; thus, the wast…

    Submitted 17 May, 2023; originally announced May 2023.

  20. arXiv:2305.01239  [pdf, other]

    cs.CV cs.AI

    DRPT: Disentangled and Recurrent Prompt Tuning for Compositional Zero-Shot Learning

    Authors: Xiaocheng Lu, Ziming Liu, Song Guo, Jingcai Guo, Fushuo Huo, Sikai Bai, Tao Han

    Abstract: Compositional Zero-shot Learning (CZSL) aims to recognize novel concepts composed of known knowledge without training samples. Standard CZSL either identifies visual primitives or enhances unseen composed entities, and as a result, entanglement between state and object primitives cannot be fully utilized. Admittedly, vision-language models (VLMs) could naturally cope with CZSL through tuning promp…

    Submitted 2 May, 2023; originally announced May 2023.

  21. Non-Markovian quantum interconnect formed by a surface plasmon polariton waveguide

    Authors: Chun-Jie Yang, Xin-Yue Liu, Shi-Qiang Xia, Si-Yuan Bai, Jun-Hong An

    Abstract: Allowing the generation of effective interactions between distant quantum emitters (QEs) via flying photons, quantum interconnect (QI) is essentially a light-matter interface and acts as a building block in quantum technologies. A surface plasmon polariton (SPP) supported by a metallic waveguide provides an ideal interface to explore strong light-matter couplings and to realize QI. However, the lo…

    Submitted 20 March, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

    Journal ref: Phys. Rev. A 109, 033518 (2024)

  22. arXiv:2304.06817  [pdf]

    physics.chem-ph cond-mat.mtrl-sci

    Elucidating the Role of Prelithiation in Si-based Anodes for Interface Stabilization

    Authors: Shuang Bai, Wurigumula Bao, Kun Qian, Bing Han, Weikang Li, Baharak Sayahpour, Bhagath Sreenarayanan, Darren H. S. Tan, So-yeon Ham, Ying Shirley Meng

    Abstract: Prelithiation, as a facile and effective method to compensate for the lithium inventory loss in the initial cycle, has progressed considerably on both the anode and cathode sides. However, much less research has been devoted to the prelithiation effect on the interface stabilization for long-term cycling of Si-based anodes. An in-depth quantitative analysis of the interface that forms during the prelithiatio…

    Submitted 13 April, 2023; originally announced April 2023.

  23. arXiv:2304.03319  [pdf, ps, other]

    math.PR

    Joint sum-max limit for a class of long-range dependent processes with heavy tails

    Authors: Shuyang Bai, He Tang

    Abstract: We consider a class of stationary processes exhibiting both long-range dependence and heavy tails. Separate limit theorems for sums and for extremes have been established recently in the literature with novel objects appearing in the limits. In this article, we establish the joint sum-max limit theorems for this class of processes. In the finite-variance case, the limit consists of two independent com…

    Submitted 11 September, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: 26 pages

    MSC Class: 60F17; 60G10

  24. arXiv:2303.08242  [pdf, other]

    stat.ML cs.LG stat.AP

    Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data

    Authors: Rui Xie, Shuyang Bai, Ping Ma

    Abstract: The Internet of Things (IoT) system generates massive high-speed temporally correlated streaming data and is often connected with online inference tasks under computational or energy constraints. Online analysis of these streaming time series data often faces a trade-off between statistical efficiency and computational cost. One important approach to balance this trade-off is sampling, where only…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by The Annals of Applied Statistics

  25. arXiv:2303.08132  [pdf, other]

    cs.CV

    InstMove: Instance Motion for Object-centric Video Segmentation

    Authors: Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai

    Abstract: Despite significant efforts, cutting-edge video segmentation methods still remain sensitive to occlusion and rapid movement, due to their reliance on the appearance of objects in the form of object embeddings, which are vulnerable to these disturbances. A common solution is to use optical flow to provide motion information, but essentially it only considers pixel-level motion, which still relies o…

    Submitted 30 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023; Code: https://github.com/wjf5203/VNext

  26. arXiv:2303.06340  [pdf, other]

    q-bio.QM cs.LG eess.IV

    Intelligent diagnostic scheme for lung cancer screening with Raman spectra data by tensor network machine learning

    Authors: Yu-Jia An, Sheng-Chen Bai, Lin Cheng, Xiao-Guang Li, Cheng-en Wang, Xiao-Dong Han, Gang Su, Shi-Ju Ran, Cong Wang

    Abstract: Artificial intelligence (AI) has had a tremendous impact on biomedical sciences, from academic research to clinical applications, such as biomarker detection and diagnosis, optimization of treatment, and identification of new therapeutic targets in drug discovery. However, the contemporary AI technologies, particularly deep machine learning (ML), severely suffer from non-interpretability,…

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: 10 pages, 7 figures

  27. arXiv:2303.04366  [pdf, other]

    cs.LG

    Semantically Consistent Multi-view Representation Learning

    Authors: Yiyang Zhou, Qinghai Zheng, Shunshun Bai, Jihua Zhu

    Abstract: In this work, we devote ourselves to the challenging task of Unsupervised Multi-view Representation Learning (UMRL), which requires learning a unified feature representation from multiple views in an unsupervised manner. Existing UMRL methods mainly concentrate on the learning process in the feature space while ignoring the valuable semantic information hidden in different views. To address this i…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: 19 pages, 4 figures

  28. Floquet Engineering to Overcome No-Go Theorem of Noisy Quantum Metrology

    Authors: Si-Yuan Bai, Jun-Hong An

    Abstract: Permitting a more precise measurement of physical quantities than the classical limit by using quantum resources, quantum metrology holds promise for developing many revolutionary technologies. However, noise-induced decoherence forces its superiority to disappear, which is called the no-go theorem of noisy quantum metrology and constrains its application. We propose a scheme to overcome the no-g…

    Submitted 1 August, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Journal ref: Phys. Rev. Lett. 131, 050801 (2023)

  29. arXiv:2302.01872  [pdf, other]

    cs.CV

    MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

    Authors: Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Philip H. S. Torr, Song Bai

    Abstract: Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence. The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets. However, since the target objects in these existing datasets are usually relatively salient, dominant, and isolated, VOS under complex scenes has rarely been studied. To revisit…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: MOSE Dataset Report

    Journal ref: ICCV 2023

  30. arXiv:2212.06384  [pdf, other]

    cs.CV

    PV3D: A 3D Generative Model for Portrait Video Generation

    Authors: Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou

    Abstract: Recent advances in generative adversarial networks (GANs) have demonstrated the capabilities of generating stunning photo-realistic portrait images. While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, there are few works successfully extending GANs for generating 3D-aware portrait videos. In this work, we propose PV3D,…

    Submitted 20 June, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted to ICLR2023, Project Page https://showlab.github.io/pv3d

  31. arXiv:2212.04408  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

    Authors: Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou

    Abstract: Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Being, hopefully, an alternative to approaching general-purpose AI, existing generalist models are still at an early stage, where modality and task coverage is limited. To empower multi-modal task-scaling and speed up this line of research, we rele…

    Submitted 8 December, 2022; originally announced December 2022.

  32. arXiv:2212.02837  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO

    Pretrained Diffusion Models for Unified Human Motion Synthesis

    Authors: Jianxin Ma, Shuai Bai, Chang Zhou

    Abstract: Generative modeling of human motion has broad applications in computer animation, virtual reality, and robotics. Conventional approaches develop separate models for different motion synthesis tasks, and typically use a model of a small size to avoid overfitting the scarce data available in each setting. It remains an open question whether developing a single unified model is feasible, which may 1)…

    Submitted 6 December, 2022; originally announced December 2022.

  33. arXiv:2211.16312  [pdf, other]

    cs.CV

    PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

    Authors: Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi

    Abstract: Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary concepts. However, this success cannot be directly transferred to 3D scenarios due to the inaccessibility of large-scale 3D-text pairs. To this…

    Submitted 22 March, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: CVPR2023

  34. arXiv:2211.15846  [pdf, other]

    cs.CV cs.LG

    LUMix: Improving Mixup by Better Modelling Label Uncertainty

    Authors: Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai

    Abstract: Modern deep networks can be better generalized when trained with noisy samples and regularization techniques. Mixup and CutMix have been proven to be effective for data augmentation to help avoid overfitting. Previous Mixup-based methods linearly combine images and labels to generate additional training data. However, this is problematic if the object does not occupy the whole image as we demonstr…

    Submitted 28 November, 2022; originally announced November 2022.

  35. arXiv:2211.09973  [pdf, other]

    cs.CV

    The Runner-up Solution for YouTube-VIS Long Video Challenge 2022

    Authors: Junfeng Wu, Yi Jiang, Qihao Liu, Xiang Bai, Song Bai

    Abstract: This technical report describes our 2nd-place solution for the ECCV 2022 YouTube-VIS Long Video Challenge. We adopt the previously proposed online video instance segmentation method IDOL for this challenge. In addition, we use pseudo labels to further help contrastive learning, so as to obtain more temporally consistent instance embedding to improve tracking performance between frames. The propose…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: The Runner-up Solution for YouTube-VIS Long Video Challenge 2022, ECCV 2022 Workshop. arXiv admin note: text overlap with arXiv:2207.10661

  36. arXiv:2211.09961  [pdf, other]

    cs.LG stat.ML

    Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

    Authors: Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse

    Abstract: Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances. Recent efforts have shown promising results in this direction by making use of depth-wise recurrent networks. We show that a broad class of architectures named equilibrium models display strong upwards generalization, and find that str…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022

  37. Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning

    Authors: Nan Xue, Tianfu Wu, Song Bai, Fu-Dong Wang, Gui-Song Xia, Liangpei Zhang, Philip H. S. Torr

    Abstract: This article presents Holistically-Attracted Wireframe Parsing (HAWP), a method for geometric analysis of 2D images containing wireframes formed by line segments and junctions. HAWP utilizes a parsimonious Holistic Attraction (HAT) field representation that encodes line segments using a closed-form 4D geometric vector field. The proposed HAWP consists of three sequential components empowered by en…

    Submitted 5 September, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Journal extension of arXiv:2003.01663; Accepted by IEEE TPAMI; Code is available at https://github.com/cherubicxn/hawp

  38. arXiv:2210.11714  [pdf, other]

    cs.CL cs.HC

    Design a Sustainable Micro-mobility Future: Trends and Challenges in the United States and European Union Using Natural Language Processing Techniques

    Authors: Lilit Avetisyan, Chengxin Zhang, Sue Bai, Ehsan Moradi Pari, Fred Feng, Shan Bao, Feng Zhou

    Abstract: Micro-mobility is promising to contribute to sustainable cities in the future with its efficiency and low cost. To better design such a sustainable future, it is necessary to understand the trends and challenges. Thus, we examined people's opinions on micro-mobility in the US and the EU using Tweets. We used topic modeling based on advanced natural language processing techniques and categorized th…

    Submitted 29 October, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 33 pages, 4 figures

    ACM Class: I.5

  39. arXiv:2210.07574  [pdf, other]

    cs.CV

    Is synthetic data from generative models ready for image recognition?

    Authors: Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi

    Abstract: Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images. Though the results are astonishing to human eyes, how applicable these generated images are for recognition tasks remains under-explored. In this work, we extensively study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be use…

    Submitted 15 February, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: ICLR 2023, spotlight

  40. arXiv:2210.00226  [pdf, other]

    cs.LG

    Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning

    Authors: Yujun Shi, Jian Liang, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

    Abstract: Federated learning aims to train models collaboratively across different clients without the sharing of data for privacy considerations. However, one major challenge for this learning paradigm is the data heterogeneity problem, which refers to the discrepancies between the local data distributions among various clients. To tackle this problem, we first study how data heterogeneity affects th…

    Submitted 7 April, 2024; v1 submitted 1 October, 2022; originally announced October 2022.

    Comments: camera ready version of ICLR 2023

  41. arXiv:2209.08599  [pdf, other]

    math.SG math.AT math.DS math.GT

    Arnold conjecture over integers

    Authors: Shaoyun Bai, Guangbo Xu

    Abstract: For any closed symplectic manifold, we show that the number of 1-periodic orbits of a nondegenerate Hamiltonian thereon is bounded from below by a version of total Betti number over Z of the ambient space taking account of the total Betti number over Q and torsions of all characteristic. The proof is based on constructing a Hamiltonian Floer theory over the Novikov ring with integer coefficients,…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: 168 pages, 2 figures. Comments welcome!

  42. arXiv:2209.01386  [pdf, other]

    cs.AR cs.LG eess.SP

    SaleNet: A low-power end-to-end CNN accelerator for sustained attention level evaluation using EEG

    Authors: Chao Zhang, Zijian Tang, Taoming Guo, Jiaxin Lei, Jiaxin Xiao, Anhe Wang, Shuo Bai, Milin Zhang

    Abstract: This paper proposes SaleNet - an end-to-end convolutional neural network (CNN) for sustained attention level evaluation using prefrontal electroencephalogram (EEG). A bias-driven pruning method is proposed together with group convolution, global average pooling (GAP), near-zero pruning, weight clustering and quantization for the model compression, achieving a total compression ratio of 183.11x. Th…

    Submitted 3 September, 2022; originally announced September 2022.

    Comments: 5 pages, 4 figures, to be published in IEEE International Symposium on Circuits and Systems (ISCAS) 2022

  43. arXiv:2209.00224  [pdf, ps, other]

    cs.CV

    1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words

    Authors: Zhangzi Zhu, Chuhui Xue, Yu Hao, Wenqing Zhang, Song Bai

    Abstract: Scene text recognition has attracted increasing interest in recent years due to its wide range of applications in multilingual translation, autonomous driving, etc. In this report, we describe our solution to the Out of Vocabulary Scene Text Understanding (OOV-ST) Challenge, which aims to extract out-of-vocabulary (OOV) words from natural scene images. Our oCLIP-based model achieves 28.59% in h-m…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Report to ECCV TiE OOV competition

  44. arXiv:2208.03524  [pdf]

    eess.IV cs.CV

    Deep Learning-enabled Spatial Phase Unwrapping for 3D Measurement

    Authors: Xiaolong Luo, Wanzhong Song, Songlin Bai, Yu Li, Zhihe Zhao

    Abstract: In terms of 3D imaging speed and system cost, the single-camera system projecting single-frequency patterns is the ideal option among all proposed Fringe Projection Profilometry (FPP) systems. This system necessitates a robust spatial phase unwrapping (SPU) algorithm. However, robust SPU remains a challenge in complex scenes. Quality-guided SPU algorithms need more efficient ways to identify the u…

    Submitted 6 August, 2022; originally announced August 2022.

    Comments: 26 pages

    ACM Class: I.4.5

    Journal ref: Optics & Laser Technology, 163 (2023) 109340

  45. arXiv:2208.02747  [pdf, ps, other]

    cs.CV

    Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

    Authors: Zhangzi Zhu, Yu Hao, Wenqing Zhang, Chuhui Xue, Song Bai

    Abstract: This report presents our 2nd place solution to ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. This challenge is held in the context of ECCV 2022 workshop on Text in Everything (TiE), which aims to extract out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on the synthetic datasets, then fine-tune…

    Submitted 31 August, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

  46. arXiv:2208.00090  [pdf, other]

    cs.CV

    Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

    Authors: Qihao Liu, Yi Zhang, Song Bai, Alan Yuille

    Abstract: Occlusion poses a great threat to monocular multi-person 3D human pose estimation due to large variability in terms of the shape, appearance, and position of occluders. While existing methods try to handle occlusion with pose priors/constraints, data augmentation, or implicit reasoning, they still fail to generalize to unseen poses or occlusion cases and may make large mistakes when multiple peopl…

    Submitted 29 July, 2022; originally announced August 2022.

    Comments: ECCV 2022

  47. arXiv:2207.12955  [pdf, other]

    cs.CV

    Contextual Text Block Detection towards Scene Text Understanding

    Authors: Chuhui Xue, Jiaxing Huang, Shijian Lu, Changhu Wang, Song Bai

    Abstract: Most existing scene text detectors focus on detecting characters or words that only capture partial text messages due to missing contextual information. For a better understanding of text in scenes, it is more desired to detect contextual text blocks (CTBs) which consist of one or multiple integral text units (e.g., characters, words, or phrases) in natural reading order and transmit certain compl…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  48. arXiv:2207.10661  [pdf, other]

    cs.CV

    In Defense of Online Models for Video Instance Segmentation

    Authors: Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

    Abstract: In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance. However, online methods have their inherent advantage in handling long video sequences and ongoing videos while offline models fail due to the limit of computational resources. Therefore, it would be highl…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: ECCV 2022, Oral

  49. arXiv:2207.09161  [pdf, other]

    cs.CV

    Single Stage Virtual Try-on via Deformable Attention Flows

    Authors: Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, Hongxia Yang

    Abstract: Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image. Existing methods usually build up multi-stage frameworks to deal with clothes warping and body blending respectively, or rely heavily on intermediate parser-based labels which may be noisy or even inaccurate. To solve the above challenges, we propose a single-stage try-on framewo…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  50. arXiv:2207.06118  [pdf, other]

    cs.AI

    Stability of Weighted Majority Voting under Estimated Weights

    Authors: Shaojie Bai, Dongxia Wang, Tim Muller, Peng Cheng, Jiming Chen

    Abstract: Weighted Majority Voting (WMV) is a well-known optimal decision rule for collective decision making, given the probability of sources to provide accurate information (trustworthiness). However, in reality, the trustworthiness is not a known quantity to the decision maker - they have to rely on an estimate called trust. A (machine learning) algorithm that computes trust is called unbiased when it h…

    Submitted 30 June, 2024; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: 15 pages, 16 figures