Showing 1–50 of 105 results for author: Zhai, Y

Searching in archive cs.
  1. arXiv:2407.10937  [pdf, other]

    cs.CV

    IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

    Authors: Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

    Abstract: Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and multi-view-based methods have difficulty controlling the human appearance and motion. In this work, we present IDOL (unIfied Dual-mOdal Latent diffusio…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://yhzhai.github.io/idol/

  2. Dye4AI: Assuring Data Boundary on Generative AI Services

    Authors: Shu Wang, Kun Sun, Yan Zhai

    Abstract: Generative artificial intelligence (AI) is versatile for various applications, but security and privacy concerns with third-party AI vendors hinder its broader adoption in sensitive scenarios. Hence, it is essential for users to validate the AI trustworthiness and ensure the security of data boundaries. In this paper, we present a dye testing system named Dye4AI, which injects crafted trigger data…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.06890  [pdf, other]

    cs.CV

    Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

    Authors: Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

    Abstract: Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation wh…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Project page: https://yhzhai.github.io/mcm/

  4. arXiv:2405.15452  [pdf, other]

    cs.CL cs.AI cs.LG

    Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top

    Authors: Keyuan Cheng, Muhammad Asif Ali, Shu Yang, Gang Lin, Yuxuan Zhai, Haoyang Fei, Ke Xu, Lu Yu, Lijie Hu, Di Wang

    Abstract: Multi-hop Question Answering (MQA) under knowledge editing (KE) is a key challenge in Large Language Models (LLMs). While best-performing solutions in this domain use a plan and solve paradigm to split a question into sub-questions followed by response generation, we claim that this approach is sub-optimal as it fails for hard to decompose questions, and it does not explicitly cater to correlated…

    Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 18 pages

  5. arXiv:2405.14103  [pdf, other]

    cs.LG

    Online Self-Preferring Language Models

    Authors: Yuanzhao Zhai, Zhuo Zhang, Kele Xu, Hanyang Peng, Yue Yu, Dawei Feng, Cheng Yang, Bo Ding, Huaimin Wang

    Abstract: Aligning with human preference datasets has been critical to the success of large language models (LLMs). Reinforcement learning from human feedback (RLHF) employs a costly reward model to provide feedback for on-policy sampling responses. Recently, offline methods that directly fit responses with binary preferences in the dataset have emerged as alternatives. However, existing methods do not expl…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 20 pages, 9 figures

  6. arXiv:2405.10292  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

    Authors: Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine

    Abstract: Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic…

    Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  7. arXiv:2405.08344  [pdf, other]

    cs.CV

    No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

    Authors: Yingjie Zhai, Wenshuo Li, Yehui Tang, Xinghao Chen, Yunhe Wang

    Abstract: Current architectures for video understanding mainly build upon 3D convolutional blocks or 2D convolutions with additional operations for temporal modeling. However, these methods all regard the temporal axis as a separate dimension of the video sequence, which requires large computation and memory budgets and thus limits their usage on mobile devices. In this paper, we propose to squeeze the time…

    Submitted 14 May, 2024; originally announced May 2024.

  8. arXiv:2405.06228  [pdf, other]

    cs.CV

    Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

    Authors: Zhenliang Ni, Xinghao Chen, Yingjie Zhai, Yehui Tang, Yunhe Wang

    Abstract: Semantic segmentation is an important task for numerous applications but it is still quite challenging to achieve advanced performance with limited computational costs. In this paper, we present CGRSeg, an efficient yet competitive segmentation framework based on context-guided spatial feature reconstruction. A Rectangular Self-Calibration Module is carefully designed for spatial feature reconstru…

    Submitted 18 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: ECCV 2024

  9. arXiv:2405.02520  [pdf, other]

    cs.DC

    TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU

    Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Zizhong Chen, Franck Cappello

    Abstract: The Fast Fourier Transform (FFT), as a core computation in a wide range of scientific applications, is increasingly threatened by reliability issues. In this paper, we introduce TurboFFT, a high-performance FFT implementation equipped with a two-sided checksum scheme that detects and corrects silent data corruptions at computing units efficiently. The proposed two-sided checksum addresses the erro…

    Submitted 3 May, 2024; originally announced May 2024.

  10. arXiv:2404.13311  [pdf, other]

    cs.CV

    STAT: Towards Generalizable Temporal Action Localization

    Authors: Yangcen Liu, Ziyi Liu, Yuanhao Zhai, Wen Li, David Doermann, Junsong Yuan

    Abstract: Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action instances with only video-level labels. Despite the significant progress, existing methods suffer from severe performance degradation when transferring to different distributions and thus may hardly adapt to real-world scenarios. To address this problem, we propose the Generalizable Temporal Action Localiz…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 14 pages, LaTeX

  11. arXiv:2404.00492  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-hop Question Answering under Temporal Knowledge Editing

    Authors: Keyuan Cheng, Gang Lin, Haoyang Fei, Yuxuan Zhai, Lu Yu, Muhammad Asif Ali, Lijie Hu, Di Wang

    Abstract: Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models. However, existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts. To address this limitation, we propose a novel framework, namely TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 23 pages

  12. arXiv:2403.04193  [pdf]

    cs.CR

    VAEMax: Open-Set Intrusion Detection based on OpenMax and Variational Autoencoder

    Authors: Zhiyin Qiu, Ding Zhou, Yahui Zhai, Bo Liu, Lei He, Jiuxin Cao

    Abstract: Promptly discovering unknown network attacks is critical for reducing the risk of major loss imposed on system or equipment. This paper aims to develop an open-set intrusion detection model to classify known attacks as well as inferring unknown ones. To achieve this, we employ OpenMax and variational autoencoder to propose a dual detection model, VAEMax. First, we extract flow payload feature base…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures, 5 tables, 2024 5th ICTC

  13. arXiv:2402.18800  [pdf, other]

    cs.LG stat.ML

    BlockEcho: Retaining Long-Range Dependencies for Imputing Block-Wise Missing Data

    Authors: Qiao Han, Mingqian Li, Yao Yang, Yiteng Zhai

    Abstract: Block-wise missing data poses significant challenges in real-world data imputation tasks. Compared to scattered missing data, block-wise gaps exacerbate adverse effects on subsequent analytic and machine learning tasks, as the lack of local neighboring elements significantly reduces the interpolation capability and predictive power. However, this issue has not received adequate attention. Most SOT…

    Submitted 28 February, 2024; originally announced February 2024.

  14. arXiv:2402.18787  [pdf, other]

    cs.LG cs.CR

    Enhancing the "Immunity" of Mixture-of-Experts Networks for Adversarial Defense

    Authors: Qiao Han, Yong Huang, Xinling Guo, Yiteng Zhai, Yu Qin, Yao Yang

    Abstract: Recent studies have revealed the vulnerability of Deep Neural Networks (DNNs) to adversarial examples, which can easily fool DNNs into making incorrect predictions. To mitigate this deficiency, we propose a novel adversarial defense method called "Immunity" (Innovative MoE with MUtual information & positioN stabilITY) based on a modified Mixture-of-Experts (MoE) architecture in this work. The key…

    Submitted 28 February, 2024; originally announced February 2024.

  15. arXiv:2402.15703  [pdf, other]

    cs.LG cs.AI stat.ML

    Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement

    Authors: Ruiqi Zhang, Yuexiang Zhai, Andrea Zanette

    Abstract: What can an agent learn in a stochastic Multi-Armed Bandit (MAB) problem from a dataset that contains just a single sample for each arm? Surprisingly, in this work, we demonstrate that even in such a data-starved setting it may still be possible to find a policy competitive with the optimal one. This paves the way to reliable decision-making in settings where critical decisions must be made by rel…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 22 pages

  16. arXiv:2402.14228  [pdf, other]

    cs.LG cs.AI

    COPR: Continual Human Preference Learning via Optimal Policy Regularization

    Authors: Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, Yulan He, Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is commonly utilized to improve the alignment of Large Language Models (LLMs) with human preferences. Given the evolving nature of human preferences, continual alignment becomes more crucial and practical in comparison to traditional static alignment. Nevertheless, making RLHF compatible with Continual Learning (CL) is challenging due to its comple…

    Submitted 27 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  17. arXiv:2402.01289  [pdf, other]

    cs.CV

    UCVC: A Unified Contextual Video Compression Framework with Joint P-frame and B-frame Coding

    Authors: Jiayu Yang, Wei Jiang, Yongqi Zhai, Chunhui Yang, Ronggang Wang

    Abstract: This paper presents a learned video compression method in response to video compression track of the 6th Challenge on Learned Image Compression (CLIC), at DCC 2024. Specifically, we propose a unified contextual video compression framework (UCVC) for joint P-frame and B-frame coding. Each non-intra frame refers to two neighboring decoded frames, which can be either both from the past for P-frame com…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: DCC2024, CLIC2024

  18. STAR: An Efficient Softmax Engine for Attention Model with RRAM Crossbar

    Authors: Yifeng Zhai, Bing Li, Bonan Yan, Jing Wang

    Abstract: RRAM crossbars have been studied to construct in-memory accelerators for neural network applications due to their in-situ computing capability. However, prior RRAM-based accelerators show efficiency degradation when executing the popular attention models. We observed that the frequent softmax operations arise as the efficiency bottleneck and also are insensitive to computing precision. Thus, we pr…

    Submitted 30 January, 2024; originally announced January 2024.

    Journal ref: 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)

  19. arXiv:2401.08154  [pdf, ps, other]

    cs.CV eess.IV

    TLIC: Learned Image Compression with ROI-Weighted Distortion and Bit Allocation

    Authors: Wei Jiang, Yongqi Zhai, Hangyu Li, Ronggang Wang

    Abstract: This short paper describes our method for the track of image compression. To achieve better perceptual quality, we use the adversarial loss to generate realistic textures and a region of interest (ROI) mask to guide the bit allocation for different regions. Our team name is TLIC.

    Submitted 23 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 2nd Place in the Image Compression Track, CLIC 2024, DCC 2024

  20. arXiv:2401.06209  [pdf, other]

    cs.CV

    Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

    Authors: Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie

    Abstract: Is vision good enough for language? Recent advancements in multimodal models primarily stem from the powerful reasoning abilities of large language models (LLMs). However, the visual component typically depends only on the instance-level contrastive language-image pre-training (CLIP). Our research reveals that the visual capabilities in recent multimodal LLMs (MLLMs) still exhibit systematic short…

    Submitted 25 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Project page: https://tsb0601.github.io/mmvp_blog/

  21. arXiv:2401.05899  [pdf, other]

    cs.LG

    Optimistic Model Rollouts for Pessimistic Offline Policy Optimization

    Authors: Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Ding Bo, Huaimin Wang

    Abstract: Model-based offline reinforcement learning (RL) has made remarkable progress, offering a promising avenue for improving generalization with synthetic model rollouts. Existing works primarily focus on incorporating pessimism for policy optimization, usually via constructing a Pessimistic Markov Decision Process (P-MDP). However, the P-MDP discourages the policies from learning in out-of-distributio…

    Submitted 11 January, 2024; originally announced January 2024.

  22. arXiv:2401.04812  [pdf, other]

    cs.AI

    Sample-and-Bound for Non-Convex Optimization

    Authors: Yaoguang Zhai, Zhizhen Qin, Sicun Gao

    Abstract: Standard approaches for global optimization of non-convex functions, such as branch-and-bound, maintain partition trees to systematically prune the domain. The tree size grows exponentially in the number of dimensions. We propose new sampling-based methods for non-convex optimization that adapt Monte Carlo Tree Search (MCTS) to improve efficiency. Instead of the standard use of visitation count i…

    Submitted 19 February, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Published at AAAI 2024. Code is available at https://github.com/aaucsd/MCIR

  23. arXiv:2401.00243  [pdf, other]

    cs.LG

    Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles

    Authors: Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang

    Abstract: Reinforcement learning from human feedback (RLHF) emerges as a promising paradigm for aligning large language models (LLMs). However, a notable challenge in RLHF is overoptimization, where beyond a certain threshold, the pursuit of higher rewards leads to a decline in human preferences. In this paper, we observe the weakness of KL regularization which is commonly employed in existing RLHF methods…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: 10 pages, 5 figures

  24. arXiv:2312.16797  [pdf, other]

    cs.CV

    Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification

    Authors: Yajing Zhai, Yawen Zeng, Zhiyong Huang, Zheng Qin, Xin Jin, Da Cao

    Abstract: The fine-grained attribute descriptions can significantly supplement the valuable semantic information for person image, which is vital to the success of person re-identification (ReID) task. However, current ReID algorithms typically failed to effectively leverage the rich contextual information available, primarily due to their reliance on simplistic and coarse utilization of image attributes. R…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  25. arXiv:2312.12458  [pdf, other]

    cs.CL cs.AI

    When Parameter-efficient Tuning Meets General-purpose Vision-language Models

    Authors: Yihang Zhai, Haixin Wang, Jianlong Chang, Xinlong Yang, Jinan Sun, Shikun Zhang, Qi Tian

    Abstract: Instruction tuning has shown promising potential for developing general-purpose AI capabilities by using large-scale pre-trained models and boosts growing research to integrate multimodal information for creative applications. However, existing works still face two main limitations: the high training costs and heavy computing resource dependence of full model fine-tuning, and the lack of semantic…

    Submitted 16 December, 2023; originally announced December 2023.

  26. arXiv:2311.18377  [pdf]

    physics.chem-ph cs.LG q-bio.BM

    Transfer Learning across Different Chemical Domains: Virtual Screening of Organic Materials with Deep Learning Models Pretrained on Small Molecule and Chemical Reaction Data

    Authors: Chengwei Zhang, Yushuang Zhai, Ziyang Gong, Hongliang Duan, Yuan-Bin She, Yun-Fang Yang, An Su

    Abstract: Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecu…

    Submitted 5 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  27. arXiv:2311.18232  [pdf, other]

    cs.CL cs.AI cs.LG

    LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

    Authors: Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine

    Abstract: Large language models (LLMs) provide excellent text-generation capabilities, but standard prompting and generation methods generally do not lead to intentional or goal-directed agents and might necessitate considerable prompt tuning. This becomes particularly apparent in multi-turn conversations: even the best current LLMs rarely ask clarifying questions, engage in explicit information gathering,…

    Submitted 29 November, 2023; originally announced November 2023.

  28. arXiv:2311.13110  [pdf, other]

    cs.LG cs.CL cs.CV

    White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?

    Authors: Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma

    Abstract: In this paper, we contend that a natural objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a low-dimensional Gaussian mixture supported on incoherent subspaces. The goodness of such a representation can be evaluated by a principled measure, called sparse rate reduction, that simultaneously maximizes the intrinsic information…

    Submitted 24 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: This paper integrates the works arXiv:2306.01129 and arXiv:2308.16271 into a complete story. In this paper, we improve the writing and organization, and also add conceptual, empirical, and theoretical improvements over the previous work. V2: small typo fixes and formatting improvements

  29. arXiv:2311.12996  [pdf, other]

    cs.AI cs.RO

    RLIF: Interactive Imitation Learning as Reinforcement Learning

    Authors: Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine

    Abstract: Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online to collect correc…

    Submitted 18 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: ICLR 2024

  30. arXiv:2311.12603  [pdf, other]

    cs.CV

    Surgical Temporal Action-aware Network with Sequence Regularization for Phase Recognition

    Authors: Zhen Chen, Yuhao Zhai, Jun Zhang, Jinqiao Wang

    Abstract: To assist surgeons in the operating theatre, surgical phase recognition is critical for developing computer-assisted surgical systems, which requires comprehensive understanding of surgical videos. Although existing studies made great progress, there are still two significant limitations worthy of improvement. First, due to the compromise of resource consumption, frame-wise visual features are ext…

    Submitted 21 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted by 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2023)

  31. arXiv:2310.15694  [pdf, other]

    cs.LG cs.CL

    COPR: Continual Learning Human Preference through Optimal Policy Regularization

    Authors: Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu

    Abstract: The technique of Reinforcement Learning from Human Feedback (RLHF) is a commonly employed method to improve pre-trained Language Models (LM), enhancing their ability to conform to human preferences. Nevertheless, the current RLHF-based LMs necessitate full retraining each time novel queries or feedback are introduced, which becomes a challenging task because human preferences can vary between diff…

    Submitted 26 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

  32. arXiv:2309.10313  [pdf, other]

    cs.CL cs.AI cs.LG

    Investigating the Catastrophic Forgetting in Multimodal Large Language Models

    Authors: Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma

    Abstract: Following the success of GPT4, there has been a surge in interest in multimodal large language model (MLLM) research. This line of research focuses on developing general-purpose LLMs through fine-tuning pre-trained LLMs and vision models. However, catastrophic forgetting, a notorious phenomenon where the fine-tuned model fails to retain similar performance compared to the pre-trained model, still…

    Submitted 5 December, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  33. arXiv:2309.01265  [pdf, other]

    cs.CV

    SOAR: Scene-debiasing Open-set Action Recognition

    Authors: Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua

    Abstract: Deep learning models have a risk of utilizing spurious clues to make predictions, such as recognizing actions based on the background scene. This issue can severely degrade the open-set action recognition performance when the testing samples have different scene distributions from the training samples. To mitigate this problem, we propose a novel method, called Scene-debiasing Open-set Action Reco…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023, code: https://github.com/yhZhai/SOAR

  34. arXiv:2309.01246  [pdf, other]

    cs.CV

    Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning

    Authors: Yuanhao Zhai, Tianyu Luan, David Doermann, Junsong Yuan

    Abstract: As advanced image manipulation techniques emerge, detecting the manipulation becomes increasingly important. Despite the success of recent learning-based approaches for image manipulation detection, they typically require expensive pixel-level annotations to train, while exhibiting degraded performance when testing on images that are differently manipulated compared with training images. To addres…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023, code: https://github.com/yhZhai/WSCL

  35. arXiv:2308.10275  [pdf, other]

    q-bio.QM cs.LG

    SBSM-Pro: Support Bio-sequence Machine for Proteins

    Authors: Yizheng Wang, Yixiao Zhai, Yijie Ding, Quan Zou

    Abstract: Proteins play a pivotal role in biological systems. The use of machine learning algorithms for protein classification can assist and even guide biological experiments, offering crucial insights for biotechnological applications. We introduce the Support Bio-Sequence Machine for Proteins (SBSM-Pro), a model purpose-built for the classification of biological sequences. This model starts with raw seq…

    Submitted 4 November, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: 38 pages, 9 figures

  36. arXiv:2308.09611  [pdf, other]

    cs.CV

    Language-guided Human Motion Synthesis with Atomic Actions

    Authors: Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, Junsong Yuan

    Abstract: Language-guided human motion synthesis has been a challenging task due to the inherent complexity and diversity of human behaviors. Previous methods face limitations in generalization to novel actions, often resulting in unrealistic or incoherent motion sequences. In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem, by decomposing actions into atomic actions, and emplo…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023, code: https://github.com/yhZhai/ATOM

  37. arXiv:2308.05199  [pdf, other]

    cs.DC

    gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

    Authors: Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur

    Abstract: GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to directly integrate lossy compression into GPU-aware collectives, which can lead to serious performance issues such as underutilized GPU devices and uncontrolled data distortion. In order to address these issues, in this paper, we propose…

    Submitted 6 May, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: 12 pages, 13 figures, and 2 tables. ICS '24

  38. arXiv:2308.00245  [pdf, other]

    cs.SE cs.AI

    The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models

    Authors: Haonan Li, Yu Hao, Yizhuo Zhai, Zhiyun Qian

    Abstract: Static analysis is a widely used technique in software engineering for identifying and mitigating bugs. However, a significant hurdle lies in achieving a delicate balance between precision and scalability. Large Language Models (LLMs) offer a promising alternative, as recent advances demonstrate remarkable capabilities in comprehending, generating, and even debugging code. Yet, the logic of bugs c…

    Submitted 15 November, 2023; v1 submitted 31 July, 2023; originally announced August 2023.

  39. arXiv:2307.15421  [pdf, other]

    eess.IV cs.CV

    MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression

    Authors: Wei Jiang, Jiayu Yang, Yongqi Zhai, Feng Gao, Ronggang Wang

    Abstract: Recently, learned image compression has achieved impressive performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in enhancing rate-distortion performance. However, existing global context modules rely on computationally intensive quadratic complexity computations to capture global correlations. This quadratic complexity imposes limitat…

    Submitted 19 February, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: Compared with version presented at Neural Compression Workshop, ICML 2023 at OpenReview, in this arxiv version, we add the details of our prior work presented at ACMMM 2023, new comparisons on complexity, and more ablation studies

    Journal ref: ICML 2023 Workshop Neural Compression: From Information Theory to Applications

  40. arXiv:2307.05541  [pdf, other]

    cs.CV

    High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition

    Authors: Tianyu Luan, Yuanhao Zhai, Jingjing Meng, Zhong Li, Zhang Chen, Yi Xu, Junsong Yuan

    Abstract: Despite the impressive performance obtained by recent single-image hand modeling techniques, they lack the capability to capture sufficient details of the 3D hand mesh. This deficiency greatly limits their applications when high-fidelity hand modeling is required, e.g., personalized hand modeling. To address this problem, we design a frequency split network to generate 3D hand mesh using different…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: CVPR 2023

    Journal ref: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16795-16804. 2023

  41. arXiv:2307.00040  [pdf, other]

    cs.CV cs.AI

    DisCo: Disentangled Control for Realistic Human Dance Generation

    Authors: Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

    Abstract: Generative AI has made significant strides in computer vision, particularly in text-driven image/video synthesis (T2I/T2V). Despite the notable advancements, it remains challenging in human-centric content synthesis such as realistic dance generation. Current methodologies, primarily tailored for human motion transfer, encounter difficulties when confronted with real-world dance scenarios (e.g., s…

    Submitted 4 April, 2024; v1 submitted 30 June, 2023; originally announced July 2023.

    Comments: Accepted by CVPR24

  42. arXiv:2306.05236  [pdf, other]

    cs.CV

    Population-Based Evolutionary Gaming for Unsupervised Person Re-identification

    Authors: Yunpeng Zhai, Peixi Peng, Mengxi Jia, Shiyong Li, Weiqiang Chen, Xuesong Gao, Yonghong Tian

    Abstract: Unsupervised person re-identification has achieved great success through the self-improvement of individual neural networks. However, limited by the lack of diversity of discriminant information, a single network has difficulty learning sufficient discrimination ability by itself under unsupervised conditions. To address this limit, we develop a population-based evolutionary gaming (PEG) framework…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted in IJCV

  43. TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision

    Authors: Yukun Zhai, Xiaoqiang Zhang, Xiameng Qin, Sanyuan Zhao, Xingping Dong, Jianbing Shen

    Abstract: End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework. Typical methods heavily rely on Region-of-Interest (RoI) operations to extract local features and complex post-processing steps to produce final predictions. To address these limitations, we propose TextFormer, a query-based end-to-end text spotter with Tra…

    Submitted 1 April, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Machine Intelligence Research, MIR 2024

  44. Accelerating MPI Collectives with Process-in-Process-based Multi-object Techniques

    Authors: Jiajun Huang, Kaiming Ouyang, Yujia Zhai, Jinyang Liu, Min Si, Ken Raffenetti, Hui Zhou, Atsushi Hori, Zizhong Chen, Yanfei Guo, Rajeev Thakur

    Abstract: In the exascale computing era, optimizing MPI collective performance in high-performance computing (HPC) applications is critical. Current algorithms face performance degradation due to system call overhead, page faults, or data-copy latency, affecting HPC applications' efficiency and scalability. To address these issues, we propose PiP-MColl, a Process-in-Process-based Multi-object Inter-process…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted by ACM HPDC 2023

  45. FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs

    Authors: Shixun Wu, Yujia Zhai, Jiajun Huang, Zizhe Jian, Zizhong Chen

    Abstract: General matrix/matrix multiplication (GEMM) is crucial for scientific computing and machine learning. However, the increased scale of the computing platforms raises concerns about hardware and software reliability. In this poster, we present FT-GEMM, a high-performance GEMM capable of tolerating soft errors on the fly. We incorporate the fault tolerant functionality at algorithmic level by f…

    Submitted 8 May, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2104.00897

  46. Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

    Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen

    Abstract: General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While researchers often strive for faster performance by using large compute platforms, the increased scale of these systems can raise concerns about hardware and software reli…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 11 pages, 2023 International Conference on Supercomputing

  47. arXiv:2304.09571  [pdf, other]

    cs.CV cs.MM eess.IV

    LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression

    Authors: Wei Jiang, Peirong Ning, Jiayu Yang, Yongqi Zhai, Feng Gao, Ronggang Wang

    Abstract: The effective receptive field (ERF) plays an important role in transform coding, which determines how much redundancy can be removed during transform and how many spatial priors can be utilized to synthesize textures during inverse transform. Existing methods rely on stacks of small kernels, whose ERFs remain insufficiently large, or heavy non-local attention mechanisms, which limit the potential…

    Submitted 21 June, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted to IEEE Transactions on Multimedia 2024

  48. arXiv:2304.03890  [pdf, other]

    cs.DC

    An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression

    Authors: Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur

    Abstract: With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communications turns out to be a critical bottleneck in large-scale distributed and parallel processing. The large message size in MPI collectives is particularly concerning because it can significantly degrade the overall parallel performance. To address th…

    Submitted 17 January, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: 13 pages, 18 figures, 6 tables, IPDPS '24

  49. arXiv:2303.05479  [pdf, other]

    cs.LG cs.AI

    Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

    Authors: Mitsuhiko Nakamoto, Yuexiang Zhai, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine

    Abstract: A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning…

    Submitted 19 January, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: NeurIPS 2023. project page: https://nakamotoo.github.io/Cal-QL

  50. arXiv:2302.10270  [pdf]

    cs.CV cs.LG

    Crop mapping in the small sample/no sample case: an approach using a two-level cascade classifier and integrating domain knowledge

    Authors: Yunze Zang, Yifei Liu, Xuehong Chen, Anqi Li, Yichen Zhai, Shijie Li, Luling Liu, Chuanhai Zhu, Ruilin Chen, Shupeng Li, Na Jie

    Abstract: Mapping crops using remote sensing technology is important for food security and land management. Machine learning-based methods have become a popular approach for crop mapping in recent years. However, the key to machine learning, acquiring ample and accurate samples, is usually time-consuming and laborious. To solve this problem, a crop mapping method in the small sample/no sample case that integ…

    Submitted 26 December, 2022; originally announced February 2023.

    Comments: in Chinese language