Skip to main content

Showing 1–50 of 335 results for author: Cai, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.11867  [pdf, other

    cs.LG

    Single Layer Single Gradient Unlearning

    Authors: Zikui Cai, Yaoteng Tan, M. Salman Asif

    Abstract: Machine unlearning methods seek to revise pretrained models such that effects of certain training samples can be removed. In addition to effective erasure, low computational cost and general utility retention are also highly desirable. Existing unlearning methods usually involve iterative updates over the model parameters, which incurs a high computational cost. In this work, we propose an efficie… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11464  [pdf, other

    cs.CV

    Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes

    Authors: Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang

    Abstract: In computer vision, object detection is an important task that finds its application in many scenarios. However, obtaining extensive labels can be challenging, especially in crowded scenes. Recently, the Segment Anything Model (SAM) has been proposed as a powerful zero-shot segmenter, offering a novel approach to instance segmentation tasks. However, the accuracy and efficiency of SAM and its vari… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  3. arXiv:2407.01496  [pdf, other

    math.NA cs.LG

    Fast Iterative Solver For Neural Network Method: II. 1D Diffusion-Reaction Problems And Data Fitting

    Authors: Zhiqiang Cai, Anastassia Doktorova, Robert D. Falgout, César Herrera

    Abstract: This paper expands the damped block Newton (dBN) method introduced recently in [4] for 1D diffusion-reaction equations and least-squares data fitting problems. To determine the linear parameters (the weights and bias of the output layer) of the neural network (NN), the dBN method requires solving systems of linear equations involving the mass matrix. While the mass matrix for local hat basis funct… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 65K10; 65F05

  4. arXiv:2406.19280  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

    Authors: Junying Chen, Ruyi Ouyang, Anningzhe Gao, Shunian Chen, Guiming Hardy Chen, Xidong Wang, Ruifei Zhang, Zhenyang Cai, Ke Ji, Guangjun Yu, Xiang Wan, Benyou Wang

    Abstract: The rapid development of multimodal large language models (MLLMs), such as GPT-4V, has led to significant advancements. However, these models still face challenges in medical multimodal capabilities due to limitations in the quantity and quality of medical vision-text data, stemming from data privacy concerns and high annotation costs. While pioneering approaches utilize PubMed's large-scale, de-i… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2406.15743  [pdf, other

    cs.SE

    CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation

    Authors: Chao Ni, Xiaoya Wang, Liushan Chen, Dehai Zhao, Zhengong Cai, Shaohua Wang, Xiaohu Yang

    Abstract: Though many machine learning (ML)-based unit testing generation approaches have been proposed and indeed achieved remarkable performance, they still have several limitations in effectiveness and practical usage. More precisely, existing ML-based approaches (1) generate partial content of a unit test, mainly focusing on test oracle generation; (2) mismatch the test prefix with the test oracle seman… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages, 7 figures

  6. arXiv:2406.14024  [pdf, other

    cs.CL

    LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

    Authors: Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang

    Abstract: Mathematical verfier achieves success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately assess the solutions. To mitigate the aforementioned insufficiency of binary labels, we introduce step-wise natural language feedbacks as rationale la… ▽ More

    Submitted 8 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages

  7. arXiv:2406.13963  [pdf, ps, other

    cs.CV

    SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis

    Authors: Zijian Cai, Xinquan Yang, Xuguang Li, Xiaoling Luo, Xuechen Li, Linlin Shen, He Meng, Yongqiang Deng

    Abstract: Panoramic X-ray is a simple and effective tool for diagnosing dental diseases in clinical practice. When deep learning models are developed to assist dentist in interpreting panoramic X-rays, most of their performance suffers from the limited annotated data, which requires dentist's expertise and a lot of time cost. Although self-supervised learning (SSL) has been proposed to address this challeng… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.11116  [pdf

    cs.CL

    Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople

    Authors: Zhuang Qiu, Xufeng Duan, Zhenguang G. Cai

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's gram… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 23 pages

  9. arXiv:2406.10163  [pdf, other

    cs.CV cs.AI

    MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

    Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

    Abstract: Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

  10. arXiv:2406.08443  [pdf, other

    cs.CV cs.LG

    Transformation-Dependent Adversarial Attacks

    Authors: Yaoteng Tan, Zikui Cai, M. Salman Asif

    Abstract: We introduce transformation-dependent adversarial attacks, a new class of threats where a single additive perturbation can trigger diverse, controllable mis-predictions by systematically transforming the input (e.g., scaling, blurring, compression). Unlike traditional attacks with static effects, our perturbations embed metamorphic properties to enable different adversarial attacks as a function o… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  11. arXiv:2406.06541  [pdf, other

    cs.AR

    Global and Local Attention-based Inception U-Net for Static IR Drop Estimation

    Authors: Yilu Chen, Zhijie Cai, Min Wei, Zhifeng Lin, Jianli Chen

    Abstract: Static IR drop analysis is a fundamental and critical task in chip design since the IR drop will significantly affect the design's functionality, performance, and reliability. However, the process of IR drop analysis can be time-consuming, potentially taking several hours. Furthermore, in the process of fixing violations, it is frequently imperative to do IR drop analysis iteratively, hence exacer… ▽ More

    Submitted 27 April, 2024; originally announced June 2024.

    Comments: 7 pages, 8 figures

  12. arXiv:2406.05707  [pdf, other

    cs.CL cs.AI

    QGEval: A Benchmark for Question Generation Evaluation

    Authors: Weiping Fu, Bifan Wei, Jianxiang Hu, Zhongmin Cai, Jun Liu

    Abstract: Automatically generated questions often suffer from problems such as unclear expression or factual inaccuracies, requiring a reliable and comprehensive evaluation of their quality. Human evaluation is frequently used in the field of question generation (QG) and is one of the most accurate evaluation methods. It also serves as the standard for automatic metrics. However, there is a lack of unified… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  13. arXiv:2406.04702  [pdf, other

    cs.LG

    Marking the Pace: A Blockchain-Enhanced Privacy-Traceable Strategy for Federated Recommender Systems

    Authors: Zhen Cai, Tao Tang, Shuo Yu, Yunpeng Xiao, Feng Xia

    Abstract: Federated recommender systems have been crucially enhanced through data sharing and continuous model updates, attributed to the pervasive connectivity and distributed computing capabilities of Internet of Things (IoT) devices. Given the sensitivity of IoT data, transparent data processing in data sharing and model updates is paramount. However, existing methods fall short in tracing the flow of sh… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  14. arXiv:2406.04249  [pdf, other

    cs.CV

    Conv-INR: Convolutional Implicit Neural Representation for Multimodal Visual Signals

    Authors: Zhicheng Cai

    Abstract: Implicit neural representation (INR) has recently emerged as a promising paradigm for signal representations. Typically, INR is parameterized by a multiplayer perceptron (MLP) which takes the coordinates as the inputs and generates corresponding attributes of a signal. However, MLP-based INRs face two critical issues: i) individually considering each coordinate while ignoring the connections; ii)… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  15. arXiv:2406.04178  [pdf, other

    cs.CV

    Encoding Semantic Priors into the Weights of Implicit Neural Representation

    Authors: Zhicheng Cai, Qiu Shen

    Abstract: Implicit neural representation (INR) has recently emerged as a promising paradigm for signal representations, which takes coordinates as inputs and generates corresponding signal values. Since these coordinates contain no semantic features, INR fails to take any semantic information into consideration. However, semantic information has been proven critical in many vision tasks, especially for visu… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: ICME 2024

  16. arXiv:2406.02599  [pdf, other

    cs.CR cs.AI

    Privacy-Aware Randomized Quantization via Linear Programming

    Authors: Zhongteng Cai, Xueru Zhang, Mohammad Mahdi Khalili

    Abstract: Differential privacy mechanisms such as the Gaussian or Laplace mechanism have been widely used in data analytics for preserving individual privacy. However, they are mostly designed for continuous outputs and are unsuitable for scenarios where discrete values are necessary. Although various quantization mechanisms were proposed recently to generate discrete outputs under differential privacy, the… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  17. arXiv:2406.02575  [pdf, other

    cs.CL cs.CR cs.LG

    Cross-Modal Safety Alignment: Is textual unlearning all you need?

    Authors: Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song

    Abstract: Recent studies reveal that integrating new modalities into Large Language Models (LLMs), such as Vision-Language Models (VLMs), creates a new attack surface that bypasses existing safety training techniques like Supervised Fine-tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF). While further SFT and RLHF-based safety training can be conducted in multi-modal settings, collecting mu… ▽ More

    Submitted 27 May, 2024; originally announced June 2024.

  18. arXiv:2406.02069  [pdf, other

    cs.CL cs.AI

    PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

    Authors: Zefan Cai., Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao

    Abstract: In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling where attention is scattering widely in lower layers, progressively consolidating within specific contexts, and ultimately foc… ▽ More

    Submitted 16 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  19. arXiv:2406.01843  [pdf, other

    cs.CV

    L-MAGIC: Language Model Assisted Generation of Images with Coherence

    Authors: Zhipeng Cai, Matthias Mueller, Reiner Birkl, Diana Wofk, Shao-Yen Tseng, JunDa Cheng, Gabriela Ben-Melech Stan, Vasudev Lal, Michael Paulitsch

    Abstract: In the current era of generative AI breakthroughs, generating panoramic scenes from a single input image remains a key challenge. Most existing methods use diffusion-based iterative or simultaneous multi-view inpainting. However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: accepted to CVPR 2024

  20. arXiv:2406.01391  [pdf, other

    astro-ph.IM cs.DL

    Knowledge Graph in Astronomical Research with Large Language Models: Quantifying Driving Forces in Interdisciplinary Scientific Discovery

    Authors: Zechang Sun, Yuan-Sen Ting, Yaobo Liang, Nan Duan, Song Huang, Zheng Cai

    Abstract: Identifying and predicting the factors that contribute to the success of interdisciplinary research is crucial for advancing scientific discovery. However, there is a lack of methods to quantify the integration of new ideas and technological advancements in astronomical research and how these new technologies drive further scientific breakthroughs. Large language models, with their ability to extr… ▽ More

    Submitted 15 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: An interactive version of the knowledge graph is made publicly available at https://astrokg.github.io/. Accepted to IJCAI 2024 AI4Research Workshop. Comments are welcome

  21. arXiv:2405.20343  [pdf, other

    cs.CV cs.GR cs.LG

    Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

    Authors: Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma

    Abstract: In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from… ▽ More

    Submitted 13 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://wukailu.github.io/Unique3D

    ACM Class: I.2.10

  22. arXiv:2405.19730  [pdf

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun , et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat… ▽ More

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  23. arXiv:2405.19256  [pdf, other

    cs.LG math.NA

    Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation

    Authors: Zhiqiang Cai, Yu Cao, Yuanfei Huang, Xiang Zhou

    Abstract: Sampling invariant distributions from an Ito diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in both biased and correlated samples. Current deep learning-based method solves the stationary Fokker--Planck equation to determine the… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 24 pages,10 figures

  24. arXiv:2405.16075  [pdf, other

    cs.LG cs.AI

    Continuous Temporal Domain Generalization

    Authors: Zekun Cai, Guangji Bai, Renhe Jiang, Xuan Song, Liang Zhao

    Abstract: Temporal Domain Generalization (TDG) addresses the challenge of training predictive models under temporally varying data distributions. Traditional TDG approaches typically focus on domain data collected at fixed, discrete time intervals, which limits their capability to capture the inherent dynamics within continuous-evolving and irregularly-observed temporal domains. To overcome this, this work… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  25. arXiv:2405.15222  [pdf, other

    cs.CV cs.AI cs.RO

    Leveraging Unknown Objects to Construct Labeled-Unlabeled Meta-Relationships for Zero-Shot Object Navigation

    Authors: Yanwei Zheng, Changrui Li, Chuanlin Lan, Yaling Li, Xiao Zhang, Yifei Zou, Dongxiao Yu, Zhipeng Cai

    Abstract: Zero-shot object navigation (ZSON) addresses situation where an agent navigates to an unseen object that does not present in the training set. Previous works mainly train agent using seen objects with known labels, and ignore the seen objects without labels. In this paper, we introduce seen objects without labels, herein termed as ``unknown objects'', into training procedure to enrich the agent's… ▽ More

    Submitted 26 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  26. arXiv:2405.15205  [pdf, other

    eess.IV cs.CV

    Enhancing Generalized Fetal Brain MRI Segmentation using A Cascade Network with Depth-wise Separable Convolution and Attention Mechanism

    Authors: Zhigao Cai, Xing-Ming Zhao

    Abstract: Automatic segmentation of the fetal brain is still challenging due to the health state of fetal development, motion artifacts, and variability across gestational ages, since existing methods rely on high-quality datasets of healthy fetuses. In this work, we propose a novel cascade network called CasUNext to enhance the accuracy and generalization of fetal brain MRI segmentation. CasUNext incorpora… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  27. arXiv:2405.15115  [pdf, other

    cs.LG cs.CL stat.ML

    Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification

    Authors: Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li

    Abstract: Predicting simple function classes has been widely used as a testbed for developing theory and understanding of the trained Transformer's in-context learning (ICL) ability. In this paper, we revisit the training of Transformers on linear regression tasks, and different from all the existing literature, we consider a bi-objective prediction task of predicting both the conditional expectation… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  28. arXiv:2405.14195  [pdf, other

    cs.CV cs.AI

    Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning

    Authors: Zhenyu Wei, Yujie He, Zhanchuan Cai

    Abstract: RGB-D tracking significantly improves the accuracy of object tracking. However, its dependency on real depth inputs and the complexity involved in multi-modal fusion limit its applicability across various scenarios. The utilization of depth information in RGB-D tracking inspired us to propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  29. arXiv:2405.09153  [pdf, other

    cs.CL cs.LG

    Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser

    Authors: Jon Z. Cai, Kristin Wright-Bettner, Martha Palmer, Guergana K. Savova, James H. Martin

    Abstract: This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME)… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to the 6th Clinical NLP Workshop at NAACL, 2024

  30. arXiv:2405.05231  [pdf, other

    cs.LG

    DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training

    Authors: Renjie Liu, Yichuan Wang, Xiao Yan, Zhenkun Cai, Minjie Wang, Haitian Jiang, Bo Tang, Jinyang Li

    Abstract: Graph neural networks (GNNs) are machine learning models specialized for graph data and widely used in many applications. To train GNNs on large graphs that exceed CPU memory, several systems store data on disk and conduct out-of-core processing. However, these systems suffer from either read amplification when reading node features that are usually smaller than a disk page or degraded model accur… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  31. arXiv:2405.03076  [pdf, other

    cs.MA

    Traffic Performance GPT (TP-GPT): Real-Time Data Informed Intelligent ChatBot for Transportation Surveillance and Management

    Authors: Bingzhang Wang, Zhiyu Cai, Muhammad Monjurul Karim, Chenxi Liu, Yinhai Wang

    Abstract: The digitization of traffic sensing infrastructure has significantly accumulated an extensive traffic data warehouse, which presents unprecedented challenges for transportation analytics. The complexities associated with querying large-scale multi-table databases require specialized programming expertise and labor-intensive development. Additionally, traditional analysis methods have focused mainl… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures, submitted to 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)

  32. arXiv:2404.18209  [pdf, other

    cs.LG cs.DB

    4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs

    Authors: Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning Li, Jianheng Tang, Yanlin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, Zheng Zhang

    Abstract: Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and eva… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Under review

  33. arXiv:2404.15506  [pdf, other

    cs.CV

    Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

    Authors: Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

    Abstract: We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complimentary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recov… ▽ More

    Submitted 21 March, 2024; originally announced April 2024.

    Comments: Our project page is at https://JUGGHM.github.io/Metric3Dv2. arXiv admin note: substantial text overlap with arXiv:2307.10984

  34. arXiv:2404.15127  [pdf, other

    cs.CV cs.CL

    MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

    Authors: Sunan He, Yuxiang Nie, Zhixuan Chen, Zhiyuan Cai, Hongmei Wang, Shu Yang, Hao Chen

    Abstract: The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to con… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  35. arXiv:2404.10573  [pdf, other

    cs.AI cs.CE q-bio.BM

    AAVDiff: Experimental Validation of Enhanced Viability and Diversity in Recombinant Adeno-Associated Virus (AAV) Capsids through Diffusion Generation

    Authors: Lijun Liu, Jiali Yang, Jianfei Song, Xinglin Yang, Lele Niu, Zeqi Cai, Hui Shi, Tingjun Hou, Chang-yu Hsieh, Weiran Shen, Yafeng Deng

    Abstract: Recombinant adeno-associated virus (rAAV) vectors have revolutionized gene therapy, but their broad tropism and suboptimal transduction efficiency limit their clinical applications. To overcome these limitations, researchers have focused on designing and screening capsid libraries to identify improved vectors. However, the large sequence space and limited resources present challenges in identifyin… ▽ More

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  36. arXiv:2404.08491  [pdf, other

    cs.CL cs.AI

    Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

    Authors: Haozhe Zhao, Zefan Cai, Shuzheng Si, Liang Chen, Yufeng He, Kaikai An, Baobao Chang

    Abstract: Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow these disparities by supervise fine-tuning the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tun… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: NAACL 2024

  37. arXiv:2404.05445  [pdf, other

    stat.ME cs.LG stat.CO

    Unsupervised Training of Convex Regularizers using Maximum Likelihood Estimation

    Authors: Hong Ye Tan, Ziruo Cai, Marcelo Pereyra, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

    Abstract: Unsupervised learning is a training approach in the situation where ground truth data is unavailable, such as inverse imaging problems. We present an unsupervised Bayesian training approach to learning convex neural network regularizers using a fixed noisy dataset, based on a dual Markov chain estimation method. Compared to classical supervised adversarial regularization methods, where there is ac… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    MSC Class: 62C12; 62F15; 65C40; 65J22

  38. arXiv:2404.05064  [pdf, other

    cs.LG math.NA

    A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network

    Authors: Zhiqiang Cai, Tong Ding, Min Liu, Xinyu Liu, Jianlin Xia

    Abstract: In this paper, we propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network. The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function. By categorizing the weights and biases of the hidden and output layers of the network as nonlinear and linear parameters,… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    MSC Class: 65D15; 65K10

  39. arXiv:2404.04469  [pdf, other

    cs.CV

    Mixed-Query Transformer: A Unified Image Segmentation Architecture

    Authors: Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto

    Abstract: Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task. In this paper, we introduce the Mixed-Query Transformer (MQ-Former), a unified architecture for multi-task and multi-dataset image segmentation using a single… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  40. arXiv:2404.04458  [pdf, other

    cs.CV

    JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

    Authors: Simindokht Jahangard, Zhixi Cai, Shiki Wen, Hamid Rezatofighi

    Abstract: Understanding human social behaviour is crucial in computer vision and robotics. Micro-level observations like individual actions fall short, necessitating a comprehensive approach that considers individual behaviour, intra-group dynamics, and social group levels for a thorough understanding. To address dataset limitations, this paper introduces JRDB-Social, an extension of JRDB. Designed to fill… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://jrdb.erc.monash.edu/dataset/social

  41. arXiv:2404.01284  [pdf, other

    cs.CV

    Large Motion Model for Unified Multi-Modal Motion Generation

    Authors: Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu

    Abstract: Human motion generation, a cornerstone technique in animation and video production, has widespread applications in various tasks like text-to-motion and music-to-dance. Previous works focus on developing specialist models tailored for each task without scalability. In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation t… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Homepage: https://mingyuan-zhang.github.io/projects/LMM.html

  42. arXiv:2404.00139  [pdf

    cs.CR cs.AI

    Security Risks Concerns of Generative AI in the IoT

    Authors: Honghui Xu, Yingshu Li, Olusesi Balogun, Shaoen Wu, Yue Wang, Zhipeng Cai

    Abstract: In an era where the Internet of Things (IoT) intersects increasingly with generative Artificial Intelligence (AI), this article scrutinizes the emergent security risks inherent in this integration. We explore how generative AI drives innovation in IoT and we analyze the potential for data breaches when using generative AI and the misuse of generative AI technologies in IoT ecosystems. These risks… ▽ More

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 6 pages, 2 figures

  43. arXiv:2403.20188  [pdf, other

    cs.NI cs.AI cs.LG

    Distributed Swarm Learning for Edge Internet of Things

    Authors: Yue Wang, Zhi Tian, FXin Fan, Zhipeng Cai, Cameron Nowzari, Kai Zeng

    Abstract: The rapid growth of Internet of Things (IoT) has led to the widespread deployment of smart IoT devices at wireless edge for collaborative machine learning tasks, ushering in a new era of edge learning. With a huge number of hardware-constrained IoT devices operating in resource-limited wireless networks, edge learning encounters substantial challenges, including communication and computation bottl… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.16705

  44. arXiv:2403.17934  [pdf, other

    cs.CV

    AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

    Authors: Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai

    Abstract: Expressive human pose and shape estimation (a.k.a. 3D whole-body mesh recovery) involves the human body, hand, and expression estimation. Most existing methods have tackled this task in a two-stage manner, first detecting the human body part with an off-the-shelf detection model and inferring the different human body parts individually. Despite the impressive results achieved, these methods suffer… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Homepage: https://ttxskk.github.io/AiOS/

  45. arXiv:2403.17297  [pdf, other

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  46. arXiv:2403.15694  [pdf, other

    cs.LG cs.MM

    Group Benefits Instances Selection for Data Purification

    Authors: Zhenhuang Cai, Chuanyi Zhang, Dan Huang, Yuanbo Chen, Xiuyun Guan, Yazhou Yao

    Abstract: Manually annotating datasets for training deep models is very labor-intensive and time-consuming. To overcome such inferiority, directly leveraging web images to conduct training data becomes a natural choice. Nevertheless, the presence of label noise in web data usually degrades the model performance. Existing methods for combating label noise are typically designed and tested on synthetic noisy… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: accepted by IEEE Intelligent Systems

  47. arXiv:2403.15407  [pdf, other

    cs.CL cs.AI

    X-AMR Annotation Tool

    Authors: Shafiuddin Rehan Ahmed, Jon Z. Cai, Martha Palmer, James H. Martin

    Abstract: This paper presents a novel Cross-document Abstract Meaning Representation (X-AMR) annotation tool designed for annotating key corpus-level event semantics. Leveraging machine assistance through the Prodigy Annotation Tool, we enhance the user experience, ensuring ease and efficiency in the annotation process. Through empirical analyses, we demonstrate the effectiveness of our tool in augmenting a… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: EACL 2024 System Demonstration

  48. arXiv:2403.13678  [pdf, other

    cs.CV

    AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts

    Authors: Jun Yu, Zerui Zhang, Zhihong Wei, Gongpeng Zhao, Zhongpeng Cai, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: Leveraging the synergy of both audio data and visual data is essential for understanding human emotions and behaviors, especially in in-the-wild setting. Traditional methods for integrating such multimodal information often stumble, leading to less-than-ideal outcomes in the task of facial action unit detection. To overcome these shortcomings, we propose a novel approach utilizing audio-visual mul… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  49. arXiv:2403.13027  [pdf, other

    cs.LG cs.CR cs.IT stat.ML

    Towards Better Statistical Understanding of Watermarking LLMs

    Authors: Zhongze Cai, Shang Liu, Hanzhao Wang, Huaiyang Zhong, Xiaocheng Li

    Abstract: In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the green-red algorithm of Kirchenbauer et al. (2023a). We show that the optimal solution to the optimization problem enjoys a nice analytical property which provides a better under… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  50. arXiv:2403.12959  [pdf, other

    cs.CV cs.AI cs.GR cs.LG cs.RO

    WHAC: World-grounded Humans and Cameras

    Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

    Abstract: Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our a… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Homepage: https://wqyin.github.io/projects/WHAC/