
Showing 1–50 of 283 results for author: Lai, Y

Searching in archive cs.
  1. arXiv:2407.12684  [pdf, other]

    cs.CV

    4Dynamic: Text-to-4D Generation with Hybrid Priors

    Authors: Yu-Jie Yuan, Leif Kobbelt, Jiwen Liu, Yuan Zhang, Pengfei Wan, Yu-Kun Lai, Lin Gao

    Abstract: Due to the fascinating generative performance of text-to-image diffusion models, growing text-to-3D generation works explore distilling the 2D generative priors into 3D, using the score distillation sampling (SDS) loss, to bypass the data scarcity problem. The existing text-to-3D methods have achieved promising results in realism and 3D consistency, but text-to-4D generation still faces challenges…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.10563  [pdf, other]

    cs.CV

    Pathformer3D: A 3D Scanpath Transformer for 360° Images

    Authors: Rong Quan, Yantao Lai, Mengyu Qiu, Dong Liang

    Abstract: Scanpath prediction in 360° images can help realize rapid rendering and better user interaction in Virtual/Augmented Reality applications. However, existing scanpath prediction models for 360° images execute scanpath prediction on 2D equirectangular projection plane, which always result in big computation error owing to the 2D plane's distortion and coordinate discontinuity. In this work, we perfo…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  3. arXiv:2407.07999  [pdf, ps, other]

    cs.CV

    Fusion of Short-term and Long-term Attention for Video Mirror Detection

    Authors: Mingchen Xu, Jing Wu, Yukun Lai, Ze Ji

    Abstract: Techniques for detecting mirrors from static images have witnessed rapid growth in recent years. However, these methods detect mirrors from single input images. Detecting mirrors from video requires further consideration of temporal consistency between frames. We observe that humans can recognize mirror candidates, from just one or two frames, based on their appearance (e.g. shape, color). However…

    Submitted 10 July, 2024; originally announced July 2024.

  4. arXiv:2407.07346  [pdf, other]

    cs.LG cs.CE

    INSIGHT: Universal Neural Simulator for Analog Circuits Harnessing Autoregressive Transformers

    Authors: Souradip Poddar, Youngmin Oh, Yao Lai, Hanqing Zhu, Bosun Hwang, David Z. Pan

    Abstract: Analog front-end design heavily relies on specialized human expertise and costly trial-and-error simulations, which motivated many prior works on analog design automation. However, efficient and effective exploration of the vast and complex design space remains constrained by the time-consuming nature of SPICE simulations, making effective design automation a challenging endeavor. In this paper, w…

    Submitted 13 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  5. arXiv:2407.04185  [pdf, other]

    cs.CL

    HAF-RM: A Hybrid Alignment Framework for Reward Model Training

    Authors: Shujun Liu, Xiaoyu Shen, Yuhang Lai, Siyuan Wang, Shengbin Yue, Zengfeng Huang, Xuanjing Huang, Zhongyu Wei

    Abstract: The reward model has become increasingly important in alignment, assessment, and data construction for large language models (LLMs). Most existing researchers focus on enhancing reward models through data improvements, following the conventional training framework for reward models that directly optimizes the predicted rewards. In this paper, we propose a hybrid alignment framework HaF-RM for rewa…

    Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2407.01965  [pdf, other]

    cs.CL cs.IR

    AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment

    Authors: Yilong Lai, Jialong Wu, Congzhi Zhang, Haowen Sun, Deyu Zhou

    Abstract: Conversational Query Reformulation (CQR) has significantly advanced in addressing the challenges of conversational search, particularly those stemming from the latent user intent and the need for historical context. Recent works aimed to boost the performance of CQR through alignment. However, they are designed for one specific retrieval system, which potentially results in poor generalization. To…

    Submitted 2 July, 2024; originally announced July 2024.

  7. arXiv:2406.20078  [pdf, other]

    cs.CV

    GM-DF: Generalized Multi-Scenario Deepfake Detection

    Authors: Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

    Abstract: Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of de…

    Submitted 28 June, 2024; originally announced June 2024.

  8. arXiv:2406.18200  [pdf, other]

    cs.CL

    SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding

    Authors: Zhenglin Wang, Jialong Wu, Yilong Lai, Congzhi Zhang, Deyu Zhou

    Abstract: Large Language Models (LLMs) demonstrate remarkable emergent abilities across various tasks, yet fall short of complex reasoning and planning tasks. The tree-search-based reasoning methods address this by surpassing the capabilities of chain-of-thought prompting, encouraging exploration of intermediate steps. However, such methods introduce significant inference latency due to the systematic explo…

    Submitted 26 June, 2024; originally announced June 2024.

  9. arXiv:2406.16271  [pdf, other]

    cs.CV

    Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation

    Authors: Xueyu Liu, Guangze Shi, Rui Wang, Yexin Lai, Jianan Zhang, Lele Sun, Quan Yang, Yongfei Wu, Ming Li, Weixia Han, Wen Zheng

    Abstract: Assessment of the glomerular basement membrane (GBM) in transmission electron microscopy (TEM) is crucial for diagnosing chronic kidney disease (CKD). The lack of domain-independent automatic segmentation tools for the GBM necessitates an AI-based solution to automate the process. In this study, we introduce GBMSeg, a training-free framework designed to automatically segment the GBM in TEM images…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted for MICCAI 2024

  10. arXiv:2406.09794  [pdf, other]

    cs.CV

    SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

    Authors: Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, Yu-Kun Lai

    Abstract: SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To add…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  11. arXiv:2406.09178  [pdf, other]

    cs.RO

    AutomaChef: A Physics-informed Demonstration-guided Learning Framework for Granular Material Manipulation

    Authors: Minglun Wei, Xintong Yang, Yu-Kun Lai, Seyed Amir Tafrishi, Ze Ji

    Abstract: Due to the complex physical properties of granular materials, research on robot learning for manipulating such materials predominantly either disregards the consideration of their physical characteristics or uses surrogate models to approximate their physical properties. Learning to manipulate granular materials based on physical information obtained through precise modelling remains an unsolved p…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 pages

  12. arXiv:2406.05482  [pdf, other]

    cs.LG

    Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks

    Authors: Yurui Lai, Xiaoyang Lin, Renchi Yang, Hongtao Wang

    Abstract: In recent years, graph neural networks (GNNs) have emerged as a potent tool for learning on graph-structured data and won fruitful successes in varied fields. The majority of GNNs follow the message-passing paradigm, where representations of each node are learned by recursively aggregating features of its neighbors. However, this mechanism brings severe over-smoothing and efficiency issues over hi…

    Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: This is the technical report for the paper accepted to KDD 2024. 16 pages

  13. arXiv:2406.05250  [pdf, other]

    cs.AI cs.AR cs.LG

    LLM-Enhanced Bayesian Optimization for Efficient Analog Layout Constraint Generation

    Authors: Guojin Chen, Keren Zhu, Seunggeun Kim, Hanqing Zhu, Yao Lai, Bei Yu, David Z. Pan

    Abstract: Analog layout synthesis faces significant challenges due to its dependence on manual processes, considerable time requirements, and performance instability. Current Bayesian Optimization (BO)-based techniques for analog layout synthesis, despite their potential for automation, suffer from slow convergence and extensive data needs, limiting their practical application. This paper presents the \text…

    Submitted 19 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  14. arXiv:2406.04888  [pdf, other]

    cs.CV

    Zero-Shot Video Editing through Adaptive Sliding Score Distillation

    Authors: Lianghan Zhu, Yanqi Bao, Jing Huo, Jing Wu, Yu-Kun Lai, Wenbin Li, Yang Gao

    Abstract: The burgeoning field of text-based video generation (T2V) has reignited significant interest in the research of controllable video editing. Although pre-trained T2V-based editing models have achieved efficient editing capabilities, current works are still plagued by two major challenges. Firstly, the inherent limitations of T2V models lead to content inconsistencies and motion discontinuities betw…

    Submitted 7 June, 2024; originally announced June 2024.

  15. arXiv:2405.17069  [pdf, other]

    cs.CV cs.LG

    Training-free Editioning of Text-to-Image Models

    Authors: Jinqi Wang, Yunfei Fu, Zhangcan Ding, Bailin Deng, Yu-Kun Lai, Yipeng Qin

    Abstract: Inspired by the software industry's practice of offering different editions or versions of a product tailored to specific user groups or use cases, we propose a novel task, namely, training-free editioning, for text-to-image models. Specifically, we aim to create variations of a base text-to-image model without retraining, enabling the model to cater to the diverse needs of different user groups o…

    Submitted 27 May, 2024; originally announced May 2024.

  16. arXiv:2405.14918  [pdf, other]

    cs.LG cs.ET

    AnalogCoder: Analog Circuit Design via Training-Free Code Generation

    Authors: Yao Lai, Sungyoung Lee, Guojin Chen, Souradip Poddar, Mengkang Hu, David Z. Pan, Ping Luo

    Abstract: Analog circuit design is a significant task in modern chip technology, focusing on the selection of component types, connectivity, and parameters to ensure proper circuit functionality. Despite advances made by Large Language Models (LLMs) in digital circuit design, the complexity and scarcity of data in analog circuitry pose significant challenges. To mitigate these issues, we introduce AnalogCod…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  17. arXiv:2405.11451  [pdf, ps, other]

    math.NA cs.AI math.AP stat.ML

    Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method

    Authors: Yuling Jiao, Yanming Lai, Yang Wang

    Abstract: Machine learning is a rapidly advancing field with diverse applications across various domains. One prominent area of research is the utilization of deep learning techniques for solving partial differential equations (PDEs). In this work, we specifically focus on employing a three-layer tanh neural network within the framework of the deep Ritz method (DRM) to solve second-order elliptic equations wi…

    Submitted 19 May, 2024; originally announced May 2024.

    MSC Class: 65N12; 65N15; 68T07; 62G05; 35J25

  18. arXiv:2405.07884  [pdf, other]

    cs.LG

    Lai Loss: A Novel Loss for Gradient Control

    Authors: YuFei Lai

    Abstract: In the field of machine learning, traditional regularization methods tend to directly add regularization terms to the loss function. This paper introduces the "Lai loss", a novel loss design that integrates the regularization terms (specifically, gradients) into the traditional loss function through straightforward geometric concepts. This design penalizes the gradients with the loss itself, allow…

    Submitted 23 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 12 pages, 12 figures

  19. arXiv:2405.06758  [pdf, other]

    cs.LG

    Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

    Authors: Yao Lai, Jinxin Liu, David Z. Pan, Ping Luo

    Abstract: Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, the effectiveness of prior arithmetic design techniques proves inadequate, as it does not sufficiently optimize speed and area, resulting in a reduced processing rate and larger module size. T…

    Submitted 10 May, 2024; originally announced May 2024.

  20. arXiv:2405.06461  [pdf, other]

    cs.GR

    SketchDream: Sketch-based Text-to-3D Generation and Editing

    Authors: Feng-Lin Liu, Hongbo Fu, Yu-Kun Lai, Lin Gao

    Abstract: Existing text-based 3D generation methods generate attractive results but lack detailed geometry control. Sketches, known for their conciseness and expressiveness, have contributed to intuitive 3D modeling but are confined to producing texture-less mesh models within predefined categories. Integrating sketch and text simultaneously for 3D generation promises enhanced control over geometry and appe…

    Submitted 14 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  21. arXiv:2405.04952  [pdf, other]

    cs.RO

    Evolving R2 to R2+: Optimal, Delayed Line-of-sight Vector-based Path Planning

    Authors: Yan Kai Lai, Prahlad Vadakkepat, Cheng Xiang

    Abstract: A vector-based any-angle path planner, R2, is evolved into R2+ in this paper. By delaying line-of-sight, R2 and R2+ search times are largely unaffected by the distance between the start and goal points, but are exponential in the worst case with respect to the number of collisions during searches. To improve search times, additional discarding conditions in the overlap rule are introduced in R2+.…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Submitted. The R2 mentioned in the paper is located at https://doi.org/10.1016/j.robot.2023.104606

  22. arXiv:2405.03067  [pdf, other]

    cs.SE

    Automated Deep Learning Optimization via DSL-Based Source Code Transformation

    Authors: Ruixin Wang, Minghai Lu, Cody Hao Yu, Yi-Hsiang Lai, Tianyi Zhang

    Abstract: As deep learning models become increasingly bigger and more complex, it is critical to improve model training and inference efficiency. Though a variety of highly optimized libraries and packages (known as DL kernels) have been developed, it is tedious and time-consuming to figure out which kernel to use, where to use, and how to use them correctly. To address this challenge, we propose an Automat…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures

    ACM Class: D.2.11; I.2.0

    Journal ref: In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024)

  23. arXiv:2405.02957  [pdf, other]

    cs.AI

    Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents

    Authors: Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu

    Abstract: In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can s…

    Submitted 5 May, 2024; originally announced May 2024.

  24. arXiv:2404.13579  [pdf, other]

    cs.CV cs.AI

    LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions

    Authors: Xiaoran Zhao, Tianhao Wu, Yu Lai, Zhiliang Tian, Zhen Huang, Yahui Liu, Zejiang He, Dongsheng Li

    Abstract: Controllable text-to-image generation synthesizes visual text and objects in images with certain conditions, which are frequently applied to emoji and poster generation. Visual text rendering and layout-to-image generation tasks have been popular in controllable text-to-image generation. However, each of these tasks typically focuses on single modality generation or rendering, leaving yet-to-be-br…

    Submitted 21 April, 2024; originally announced April 2024.

  25. arXiv:2404.09412  [pdf, other]

    cs.CV

    DeferredGS: Decoupled and Editable Gaussian Splatting with Deferred Shading

    Authors: Tong Wu, Jia-Mu Sun, Yu-Kun Lai, Yuewen Ma, Leif Kobbelt, Lin Gao

    Abstract: Reconstructing and editing 3D objects and scenes both play crucial roles in computer graphics and computer vision. Neural radiance fields (NeRFs) can achieve realistic reconstruction and editing results but suffer from inefficiency in rendering. Gaussian splatting significantly accelerates rendering by rasterizing Gaussian ellipsoids. However, Gaussian splatting utilizes a single Spherical Harmoni…

    Submitted 6 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  26. arXiv:2404.05678  [pdf, other]

    stat.ML cs.CY cs.LG

    Flexible Fairness Learning via Inverse Conditional Permutation

    Authors: Yuheng Lai, Leying Guan

    Abstract: Equalized odds, as a popular notion of algorithmic fairness, aims to ensure that sensitive variables, such as race and gender, do not unfairly influence the algorithm prediction when conditioning on the true outcome. Despite rapid advancements, most of the current research focuses on the violation of equalized odds caused by one sensitive attribute, leaving the challenge of simultaneously accounti…

    Submitted 9 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  27. arXiv:2404.02538  [pdf, other]

    stat.ML cs.LG

    Convergence Analysis of Flow Matching in Latent Space with Transformers

    Authors: Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan

    Abstract: We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching. We use a pre-trained autoencoder network to map high-dimensional original inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field of the transformation from a standard normal distribution to the target latent distribution. Our error analy…

    Submitted 28 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  28. arXiv:2403.12707  [pdf, other]

    cs.CV

    Selective Domain-Invariant Feature for Generalizable Deepfake Detection

    Authors: Yingxin Lai, Guoqing Yang, Yifan He, Zhiming Luo, Shaozi Li

    Abstract: With diverse presentation forgery methods emerging continually, detecting the authenticity of images has drawn growing attention. Although existing methods have achieved impressive accuracy in training dataset detection, they still perform poorly in the unseen domain and suffer from forgery of irrelevant information such as background and identity, affecting generalizability. To solve this problem…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP 2024

  29. arXiv:2403.11073  [pdf]

    cs.CV cs.AI

    Tokensome: Towards a Genetic Vision-Language GPT for Explainable and Cognitive Karyotyping

    Authors: Haoxi Zhang, Xinxu Zhang, Yuanxin Lin, Maiqi Wang, Yi Lai, Yu Wang, Linfeng Yu, Yufeng Xu, Ran Cheng, Edward Szczerbicki

    Abstract: Automatic karyotype analysis is often defined as a visual perception task focused solely on chromosomal object-level modeling. This definition has led most existing methods to overlook componential and holistic information, significantly constraining model performance. Moreover, the lack of interpretability in current technologies hinders clinical adoption. In this paper, we introduce Tokensome, a…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Preprint. Work in progress

  30. arXiv:2403.10050  [pdf, other]

    cs.CV

    Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing

    Authors: Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang

    Abstract: 3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namel…

    Submitted 15 March, 2024; originally announced March 2024.

  31. arXiv:2403.09296  [pdf, other]

    cs.CV

    Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

    Authors: Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Large-scale vision-language models (VLMs) have shown a strong zero-shot generalization capability on unseen-domain data. However, adapting pre-trained VLMs to a sequence of downstream tasks often leads to the forgetting of previously learned knowledge and a reduction in zero-shot classification performance. To tackle this problem, we propose a unique Selective Dual-Teacher Knowledge Transfer frame…

    Submitted 17 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024. Project page: https://chuyu.org/research/snd

  32. arXiv:2403.07693  [pdf, other]

    cs.CL cs.AI

    Large, Small or Both: A Novel Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization

    Authors: Yanyue Zhang, Pengfei Li, Yilong Lai, Deyu Zhou, Yulan He

    Abstract: As more than 70% of reviews in the existing opinion summary data set are positive, current opinion summarization approaches are reluctant to generate negative summaries given the input of negative texts. To address such sentiment bias, a direct approach without the over-reliance on a specific framework is to generate additional data based on large language models to balance the emotional distri…

    Submitted 19 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  33. arXiv:2403.06754  [pdf, other]

    cs.CL cs.AI cs.LG

    ALaRM: Align Language Models via Hierarchical Rewards Modeling

    Authors: Yuhang Lai, Siyuan Wang, Shujun Liu, Xuanjing Huang, Zhongyu Wei

    Abstract: We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback (RLHF), which is designed to enhance the alignment of large language models (LLMs) with human preferences. The framework addresses the limitations of current alignment approaches, which often struggle with the inconsistency and sparsity of human supervision signals, by integrating ho…

    Submitted 16 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 15 pages, 6 figures

  34. arXiv:2403.06470  [pdf, other]

    cs.CV

    3D-aware Image Generation and Editing with Multi-modal Conditions

    Authors: Bo Li, Yi-ke Li, Zhi-fen He, Bin Liu, Yun-Kun Lai

    Abstract: 3D-consistent image generation from a single 2D semantic label is an important and challenging research topic in computer graphics and computer vision. Although some related works have made great progress in this field, most of the existing methods suffer from poor disentanglement performance of shape and appearance, and lack multi-modal control. In this paper, we propose a novel end-to-end 3D-awa…

    Submitted 11 March, 2024; originally announced March 2024.

  35. arXiv:2403.06459  [pdf, other]

    eess.IV cs.CV

    From Pixel to Cancer: Cellular Automata in Computed Tomography

    Authors: Yuxiang Lai, Xiaoxi Chen, Angtian Wang, Alan Yuille, Zongwei Zhou

    Abstract: AI for cancer detection encounters the bottleneck of data scarcity, annotation difficulty, and low prevalence of early tumors. Tumor synthesis seeks to create artificial tumors in medical images, which can greatly diversify the data and annotations for AI training. However, current tumor synthesis approaches are not applicable across different organs due to their need for specific expertise and de…

    Submitted 5 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Early accepted to MICCAI 2024

  36. arXiv:2403.01423  [pdf, other]

    cs.CR cs.LG

    Collective Certified Robustness against Graph Injection Attacks

    Authors: Yuni Lai, Bailin Pan, Kaihuang Chen, Yancheng Yuan, Kai Zhou

    Abstract: We investigate certified robustness for GNNs under graph injection attacks. Existing research only provides sample-wise certificates by verifying each node independently, leading to very limited certifying performance. In this paper, we present the first collective certificate, which certifies a set of target nodes simultaneously. To achieve it, we formulate the problem as a binary integer quadrat…

    Submitted 3 March, 2024; originally announced March 2024.

  37. arXiv:2402.18331  [pdf, other]

    cs.CV

    FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

    Authors: Ziying Pan, Kun Wang, Gang Li, Feihong He, Yongxuan Lai

    Abstract: The class-conditional image generation based on diffusion models is renowned for generating high-quality and diverse images. However, most prior efforts focus on generating images for general categories, e.g., 1000 classes in ImageNet-1k. A more challenging task, large-scale fine-grained image generation, remains the boundary to explore. In this work, we present a parameter-efficient strategy, cal…

    Submitted 3 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  38. arXiv:2402.15052  [pdf, other]

    cs.CL cs.AI

    ToMBench: Benchmarking Theory of Mind in Large Language Models

    Authors: Zhuang Chen, Jincenzi Wu, Jinfeng Zhou, Bosi Wen, Guanqun Bi, Gongyao Jiang, Yaru Cao, Mengting Hu, Yunghwei Lai, Zexuan Xiong, Minlie Huang

    Abstract: Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Under review

  39. arXiv:2402.14877  [pdf, other]

    physics.ao-ph cs.LG math.DS physics.data-an physics.pop-ph

    Machine-learning prediction of tipping and collapse of the Atlantic Meridional Overturning Circulation

    Authors: Shirin Panahi, Ling-Wei Kong, Mohammadamin Moradi, Zheng-Meng Zhai, Bryan Glaz, Mulugeta Haile, Ying-Cheng Lai

    Abstract: Recent research on the Atlantic Meridional Overturning Circulation (AMOC) raised concern about its potential collapse through a tipping point due to the climate-change caused increase in the freshwater input into the North Atlantic. The predicted time window of collapse is centered about the middle of the century and the earliest possible start is approximately two years from now. More generally,…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 6 pages, 3 figures

  40. arXiv:2402.14131  [pdf, other]

    eess.SP cs.LG physics.data-an

    Random forests for detecting weak signals and extracting physical information: a case study of magnetic navigation

    Authors: Mohammadamin Moradi, Zheng-Meng Zhai, Aaron Nielsen, Ying-Cheng Lai

    Abstract: It was recently demonstrated that two machine-learning architectures, reservoir computing and time-delayed feed-forward neural networks, can be exploited for detecting the Earth's anomaly magnetic field immersed in overwhelming complex signals for magnetic navigation in a GPS-denied environment. The accuracy of the detected anomaly field corresponds to a positioning accuracy in the range of 10 to…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 12 pages, 11 figures

    Journal ref: APL Machine Learning 2 (1), 016118 (2024)

  41. arXiv:2402.12659  [pdf, other]

    cs.CL cs.AI cs.CE

    FinBen: A Holistic Financial Benchmark for Large Language Models

    Authors: Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu, et al. (9 additional authors not shown)

    Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical…

    Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 26 pages, 11 figures

  42. arXiv:2402.12317  [pdf, other]

    cs.CL cs.AI

    ARKS: Active Retrieval in Knowledge Soup for Code Generation

    Authors: Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu

    Abstract: Recently the retrieval-augmented generation (RAG) paradigm has raised much attention for its potential in incorporating external knowledge into large language models (LLMs) without further training. While widely explored in natural language applications, its utilization in code generation remains under-explored. In this paper, we introduce Active Retrieval in Knowledge Soup (ARKS), an advanced str…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Retrieval-augmented code generation

  43. arXiv:2402.06147  [pdf, other]

    cs.AI cs.CL

    DeAL: Decoding-time Alignment for Large Language Models

    Authors: James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-an Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth

    Abstract: Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom r…

    Submitted 20 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: The appendix contains data that is offensive / disturbing in nature

  44. arXiv:2402.04796  [pdf, other

    cs.GR cs.CV

    Mesh-based Gaussian Splatting for Real-time Large-scale Deformation

    Authors: Lin Gao, Jie Yang, Bo-Tao Zhang, Jia-Mu Sun, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai

    Abstract: Neural implicit representations, including Neural Distance Fields and Neural Radiance Fields, have demonstrated significant capabilities for reconstructing surfaces with complicated geometry and topology, and generating novel views of a scene. Nevertheless, it is challenging for users to directly deform or manipulate these implicit representations with large deformations in the real-time fashion.…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 11 pages, 7 figures

  45. arXiv:2402.04178  [pdf, other

    cs.CV

    SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models

    Authors: Yichen Shi, Yuhao Gao, Yingxin Lai, Hongyang Wang, Jun Feng, Lei He, Jun Wan, Changsheng Chen, Zitong Yu, Xiaochun Cao

    Abstract: Multimodal large language models (MLLMs) have demonstrated remarkable problem-solving capabilities in various vision fields (e.g., generic object recognition and grounding) based on strong visual semantic representation and language reasoning ability. However, whether MLLMs are sensitive to subtle visual spoof/forged clues and how they perform in the domain of face attack detection (e.g., face spo…

    Submitted 6 February, 2024; originally announced February 2024.

  46. VRMM: A Volumetric Relightable Morphable Head Model

    Authors: Haotian Yang, Mingwu Zheng, Chongyang Ma, Yu-Kun Lai, Pengfei Wan, Haibin Huang

    Abstract: In this paper, we introduce the Volumetric Relightable Morphable Model (VRMM), a novel volumetric and parametric facial prior for 3D face modeling. While recent volumetric prior models offer improvements over traditional methods like 3D Morphable Models (3DMMs), they face challenges in model learning and personalized reconstructions. Our VRMM overcomes these by employing a novel training framework…

    Submitted 8 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted to SIGGRAPH 2024 (Conference); Project page: https://vrmm-paper.github.io/

  47. arXiv:2402.03703  [pdf

    cs.RO

    Hierarchical Large Language Models in Cloud Edge End Architecture for Heterogeneous Robot Cluster Control

    Authors: Zhirong Luan, Yujun Lai, Rundong Huang, Yan Yan, Jingwei Wang, Jizhou Lu, Badong Chen

    Abstract: Despite their powerful semantic understanding and code generation capabilities, Large Language Models (LLMs) still face challenges when dealing with complex tasks. Multi-agent strategy generation and motion control are highly complex domains that inherently require experts from multiple fields to collaborate. To enhance multi-agent strategy generation and motion control, we propose an innovative a…

    Submitted 16 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  48. arXiv:2402.03699  [pdf

    cs.RO cs.CV

    Automatic Robotic Development through Collaborative Framework by Large Language Models

    Authors: Zhirong Luan, Yujun Lai, Rundong Huang, Xiaruiqi Lan, Liangjun Chen, Badong Chen

    Abstract: Despite the remarkable code generation abilities of large language models (LLMs), they still face challenges in complex task handling. Robot development, a highly intricate field, inherently demands human involvement in task allocation and collaborative teamwork. To enhance robot development, we propose an innovative automated collaboration framework inspired by real-world robot developers. This fr…

    Submitted 16 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  49. arXiv:2401.10590  [pdf, other

    cs.LG cs.CR

    Adversarially Robust Signed Graph Contrastive Learning from Balance Augmentation

    Authors: Jialong Zhou, Xing Ai, Yuni Lai, Kai Zhou

    Abstract: Signed graphs consist of edges and signs, which can be separated into structural information and balance-related information, respectively. Existing signed graph neural networks (SGNNs) typically rely on balance-related information to generate embeddings. Nevertheless, the emergence of recent adversarial attacks has had a detrimental impact on the balance-related information. Similar to how struct…

    Submitted 19 January, 2024; originally announced January 2024.

  50. arXiv:2401.09754  [pdf, other

    cs.LG cs.CR cs.SI

    Universally Robust Graph Neural Networks by Preserving Neighbor Similarity

    Authors: Yulin Zhu, Yuni Lai, Xing Ai, Kai Zhou

    Abstract: Despite the tremendous success of graph neural networks in learning relational data, it has been widely investigated that graph neural networks are vulnerable to structural attacks on homophilic graphs. Motivated by this, a surge of robust models is crafted to enhance the adversarial robustness of graph neural networks on homophilic graphs. However, the vulnerability based on heterophilic graphs r…

    Submitted 18 January, 2024; originally announced January 2024.