
Showing 1–50 of 135 results for author: Liao, Z

Searching in archive cs.
  1. arXiv:2407.12479  [pdf, other]

    cs.GR cs.CV

    SENC: Handling Self-collision in Neural Cloth Simulation

    Authors: Zhouyingcheng Liao, Sinan Wang, Taku Komura

    Abstract: We present SENC, a novel self-supervised neural cloth simulator that addresses the challenge of cloth self-collision. This problem has remained unresolved due to the gap in simulation setup between recent collision detection and response approaches and self-supervised neural simulators. The former requires collision-free initial setups, while the latter necessitates random cloth instantiation duri…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  2. arXiv:2407.03008  [pdf, other]

    cs.CV

    Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering

    Authors: Zhaohe Liao, Jiangtong Li, Li Niu, Liqing Zhang

    Abstract: Despite the recent progress made in Video Question-Answering (VideoQA), these methods typically function as black-boxes, making it difficult to understand their reasoning processes and perform consistent compositional reasoning. To address these challenges, we propose a model-agnostic Video Alignment and Answer Aggregation (VA$^{3}$) framework, which is capable of enhancing both compositi…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 10 pages, CVPR

    Journal ref: CVPR (2024) 13395-13404

  3. arXiv:2406.18967  [pdf, other]

    cs.CV

    Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis

    Authors: Vu Minh Hieu Phan, Yutong Xie, Bowen Zhang, Yuankai Qi, Zhibin Liao, Antonios Perperidis, Son Lam Phung, Johan W. Verjans, Minh-Son To

    Abstract: Unpaired medical image synthesis aims to provide complementary information for accurate clinical diagnostics, and address challenges in obtaining aligned multi-modal medical scans. Transformer-based models excel in imaging translation tasks thanks to their ability to capture long-range dependencies. Although effective in supervised training settings, their performance falters in unpaired image…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: MICCAI2024 - Early Accept Top 11%

  4. arXiv:2406.18927  [pdf, other]

    cs.CV

    RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation

    Authors: Zhaokang Liao, Hao Feng, Shaokai Liu, Wengang Zhou, Houqiang Li

    Abstract: Fisheye images are categorized into central and deviated based on the optical center position. Existing rectification methods are limited to central fisheye images, while this paper proposes a novel method that extends to deviated fisheye image rectification. The challenge lies in the variant global distortion distribution pattern caused by the random optical center position. To address th…

    Submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2406.17262  [pdf, other]

    cs.CL

    D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

    Authors: Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, Wei Zhang

    Abstract: The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time a…

    Submitted 25 June, 2024; originally announced June 2024.

  6. arXiv:2406.17100  [pdf, other]

    cs.CV

    Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation

    Authors: Zhenyi Liao, Qingsong Xie, Chen Chen, Hannan Lu, Zhijie Deng

    Abstract: Diffusion models (DMs) have achieved significant success in generating imaginative images given textual descriptions. However, they are likely to fall short when it comes to real-life scenarios with intricate details. The low-quality, unrealistic human faces in text-to-image generation are one of the most prominent issues, hindering the wide application of DMs in practice. Targeting addressing such…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Under review

  7. SmartAxe: Detecting Cross-Chain Vulnerabilities in Bridge Smart Contracts via Fine-Grained Static Analysis

    Authors: Zeqin Liao, Yuhong Nan, Henglong Liang, Sicheng Hao, Juan Zhai, Jiajing Wu, Zibin Zheng

    Abstract: With the increasing popularity of blockchain, different blockchain platforms coexist in the ecosystem (e.g., Ethereum, BNB, EOSIO, etc.), which prompts a high demand for cross-chain communication. A cross-chain bridge is a specific type of decentralized application for asset exchange across different blockchain platforms. Securing the smart contracts of cross-chain bridges is urgently needed, as th…

    Submitted 22 June, 2024; originally announced June 2024.

    Journal ref: The ACM International Conference on the Foundations of Software Engineering 2024

  8. SmartState: Detecting State-Reverting Vulnerabilities in Smart Contracts via Fine-Grained State-Dependency Analysis

    Authors: Zeqin Liao, Sicheng Hao, Yuhong Nan, Zibin Zheng

    Abstract: Smart contracts written in Solidity are widely used in different blockchain platforms such as Ethereum, TRON and BNB Chain. One of the unique designs in Solidity smart contracts is its state-reverting mechanism for error handling and access control. Unfortunately, a number of recent security incidents showed that adversaries also utilize this mechanism to manipulate critical states of smart contra…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures

    Journal ref: ISSTA 2023

  9. arXiv:2406.12293  [pdf, other]

    cs.CV

    Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

    Authors: Zehui Liao, Shishuai Hu, Yong Xia

    Abstract: The challenge of addressing mixed closed-set and open-set label noise in medical image classification remains largely unexplored. Unlike natural image classification where there is a common practice of segregation and separate processing of closed-set and open-set noisy samples from clean ones, medical image classification faces difficulties due to high inter-class similarity which complicates the…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure

  10. arXiv:2406.06874  [pdf, other]

    cs.AI cs.HC cs.RO

    Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback

    Authors: Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo Garcia, Mingyi Hong

    Abstract: Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into successive stages, such as supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning (RL), each performing one specific learning task. Such a s…

    Submitted 19 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  11. arXiv:2406.05768  [pdf, other]

    cs.CV cs.AI

    MLCM: Multistep Consistency Distillation of Latent Diffusion Model

    Authors: Qingsong Xie, Zhenyi Liao, Chen Chen, Zhijie Deng, Shixiang Tang, Haonan Lu

    Abstract: Distilling large latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest. However, the majority of existing methods face a dilemma where they either (i) depend on multiple individual distilled models for different sampling budgets, or (ii) sacrifice generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8) sampling steps. To addre…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  12. arXiv:2405.19671  [pdf, other]

    cs.CV

    GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction

    Authors: Haodong Xiang, Xinghui Li, Xiansong Lai, Wanting Zhang, Zhichao Liao, Kai Cheng, Xueping Liu

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has revolutionized neural rendering with its high-quality rendering and real-time speed. However, when it comes to indoor scenes with a significant number of textureless areas, 3DGS yields incomplete and noisy reconstruction results due to the poor initialization of the point cloud and under-constrained optimization. Inspired by the continuity of signed distan…

    Submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.08270  [pdf, other]

    cs.CV

    Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation

    Authors: Shishuai Hu, Zehui Liao, Zeyou Liu, Yong Xia

    Abstract: Deep learning-based medical image segmentation models often face performance degradation when deployed across various medical centers, largely due to the discrepancies in data distribution. Test Time Adaptation (TTA) methods, which adapt pre-trained models to test data, have been employed to mitigate such discrepancies. However, existing TTA methods primarily focus on manipulating Batch Normalizat…

    Submitted 13 May, 2024; originally announced May 2024.

  14. arXiv:2404.18949  [pdf, other]

    cs.LG

    The Simpler The Better: An Entropy-Based Importance Metric To Reduce Neural Networks' Depth

    Authors: Victor Quétu, Zhu Liao, Enzo Tartaglione

    Abstract: While deep neural networks are highly effective at solving complex tasks, large pre-trained models are commonly employed even to solve consistently simpler downstream tasks, which do not necessarily require a large model's complexity. Motivated by the awareness of the ever-growing AI environmental impact, we propose an efficiency strategy that leverages prior knowledge transferred by large models.…

    Submitted 5 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.16890

  15. arXiv:2404.16890  [pdf, other]

    cs.LG cs.AI

    NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer

    Authors: Zhu Liao, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione

    Abstract: While deep neural networks are highly effective at solving complex tasks, their computational demands can hinder their usefulness in real-time applications and with limited-resources systems. Besides, for many tasks it is known that these models are over-parametrized: neoteric works have broadly focused on reducing the width of these networks, rather than their depth. In this paper, we aim to redu…

    Submitted 24 April, 2024; originally announced April 2024.

  16. arXiv:2404.12241  [pdf, other]

    cs.CL cs.AI

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

    Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…

    Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  17. arXiv:2404.11525  [pdf, other]

    cs.CV eess.IV

    JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA

    Authors: Zeyu Zhang, Xuyin Qi, Mingxi Chen, Guangxi Li, Ryan Pham, Ayub Qassim, Ella Berry, Zhibin Liao, Owen Siggs, Robert Mclaughlin, Jamie Craig, Minh-Son To

    Abstract: The oxygen saturation level in the blood (SaO2) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO2 is time-consuming and highly variable depending on patients' conditions. Recently, optical coherence tomography angiography (OCTA) has shown promising development in rapidly and effectively screening eye-related lesions, offeri…

    Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  18. arXiv:2404.10324  [pdf]

    cs.LG cs.CE eess.SY

    Graph neural network-based surrogate modelling for real-time hydraulic prediction of urban drainage networks

    Authors: Zhiyu Zhang, Chenkaixiang Lu, Wenchong Tian, Zhenliang Liao, Zhiguo Yuan

    Abstract: Physics-based models are computationally time-consuming and infeasible for real-time scenarios of urban drainage networks, and a surrogate model is needed to accelerate the online predictive modelling. Fully-connected neural networks (NNs) are potential surrogate models, but may suffer from low interpretability and efficiency in fitting complex targets. Owing to the state-of-the-art modelling powe…

    Submitted 16 April, 2024; originally announced April 2024.

  19. arXiv:2404.07921  [pdf, other]

    cs.CL

    AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs

    Authors: Zeyi Liao, Huan Sun

    Abstract: As large language models (LLMs) become increasingly prevalent and integrated into autonomous systems, ensuring their safety is imperative. Despite significant strides toward safety alignment, the recent work GCG (Zou et al., 2023) proposes a discrete token optimization algorithm and selects the single suffix with the lowest loss to successfully jailbreak aligned LLMs. In this work, we first disc…

    Submitted 1 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  20. arXiv:2404.02388  [pdf, other]

    cs.CV

    CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation

    Authors: Townim Faisal Chowdhury, Kewen Liao, Vu Minh Hieu Phan, Minh-Son To, Yutong Xie, Kevin Hung, David Ross, Anton van den Hengel, Johan W. Verjans, Zhibin Liao

    Abstract: Deep Neural Networks (DNNs) are widely used for visual classification tasks, but their complex computation process and black-box nature hinder decision transparency and interpretability. Class activation maps (CAMs) and recent variants provide ways to visually explain the DNN decision-making process by displaying 'attention' heatmaps of the DNNs. Nevertheless, the CAM explanation only offers relat…

    Submitted 4 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  21. arXiv:2403.07636  [pdf, other]

    cs.CV

    Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework

    Authors: Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, Bowen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. Verjans

    Abstract: Medical vision language pre-training (VLP) has emerged as a frontier of research, enabling zero-shot pathological recognition by comparing the query image with the textual descriptions for each disease. Due to the complex semantics of biomedical texts, current methods struggle to align medical images with key pathological findings in unstructured reports. This leads to the misalignment with the ta…

    Submitted 31 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR2024. Pre-print before final camera-ready version

    Journal ref: CVPR2024

  22. arXiv:2403.00258  [pdf, ps, other]

    stat.ML cs.LG

    "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

    Authors: Lingyu Gu, Yongqi Du, Yuan Zhang, Di Xie, Shiliang Pu, Robert C. Qiu, Zhenyu Liao

    Abstract: Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, s…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 32 pages, 4 figures, and 2 tables. Fixing typos in Theorems 1 and 2 from NeurIPS 2022 proceedings (https://proceedings.neurips.cc/paper_files/paper/2022/hash/185087ea328b4f03ea8fd0c8aa96f747-Abstract-Conference.html)

  23. arXiv:2402.15089  [pdf, other]

    cs.CL cs.AI cs.LG

    AttributionBench: How Hard is Automatic Attribution Evaluation?

    Authors: Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun

    Abstract: Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence. However, evaluating the answer's attribution, i.e., whether every claim within the generated responses is fully supported by its cited evidence, remains an open problem. This verification, traditionally dependent on costly human evaluation, underscores the urgent need for a…

    Submitted 22 February, 2024; originally announced February 2024.

  24. arXiv:2402.10196  [pdf, other]

    cs.CL cs.AI

    A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

    Authors: Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun

    Abstract: Language agents powered by large language models (LLMs) have seen exploding development. Their capability of using language as a vehicle for thought and communication lends an incredible level of flexibility and versatility. People have quickly capitalized on this capability to connect LLMs to a wide range of external components and environments: databases, tools, the Internet, robotic embodiment,…

    Submitted 15 February, 2024; originally announced February 2024.

  25. arXiv:2402.02697  [pdf, ps, other]

    cs.LG stat.ML

    Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

    Authors: Zenan Ling, Longbo Li, Zhanbo Feng, Yixuan Zhang, Feng Zhou, Robert C. Qiu, Zhenyu Liao

    Abstract: Deep equilibrium models (DEQs), as a typical implicit neural network, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis on the eigenspectra of…

    Submitted 19 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024

  26. arXiv:2401.07951  [pdf, other]

    cs.CV

    Image Similarity using An Ensemble of Context-Sensitive Models

    Authors: Zukang Liao, Min Chen

    Abstract: Image similarity has been extensively studied in computer vision. In recent years, machine-learned models have shown their ability to encode more semantics than traditional multivariate metrics. However, in labelling similarity, assigning a numerical score to a pair of images is less intuitive than determining if an image A is closer to a reference image R than another image B. In this work, we…

    Submitted 15 January, 2024; originally announced January 2024.

  27. arXiv:2312.12263  [pdf, other]

    cs.CV

    FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels

    Authors: Jichang Li, Guanbin Li, Hui Cheng, Zicheng Liao, Yizhou Yu

    Abstract: Federated learning with noisy labels (F-LNL) aims at seeking an optimal server model via collaborative distributed learning by aggregating multiple client models trained with local noisy or clean samples. On the basis of a federated learning framework, recent advances primarily adopt label noise filtering to separate clean samples from noisy ones on each client, thereby mitigating the negative imp…

    Submitted 16 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: To appear in AAAI-2024; correct formats to meet standards of the AAAI manuscript

  28. arXiv:2312.02256  [pdf, other]

    cs.CV cs.AI cs.GR

    EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation

    Authors: Wenyang Zhou, Zhiyang Dou, Zeyu Cao, Zhouyingcheng Liao, Jingbo Wang, Wenjia Wang, Yuan Liu, Taku Komura, Wenping Wang, Lingjie Liu

    Abstract: We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality. On the one hand, previous works, like motion latent diffusion, conduct diffusion within a latent space for efficiency, but learning such a la…

    Submitted 14 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project Page: https://frank-zy-dou.github.io/projects/EMDM/index.html

  29. arXiv:2311.10983  [pdf, other]

    cs.CV cs.AI cs.LG

    Multiple View Geometry Transformers for 3D Human Pose Estimation

    Authors: Ziwei Liao, Jialiang Zhu, Chunyu Wang, Han Hu, Steven L. Waslander

    Abstract: In this work, we aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation. Recent works have focused on end-to-end learning-based transformer designs, which struggle to resolve geometric information accurately, particularly during occlusion. Instead, we propose a novel hybrid model, MVGFormer, which has a series of geometric and appearance modules organized in…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: 14 pages, 8 figures

  30. arXiv:2311.09077  [pdf, other]

    cs.CV

    Spiking NeRF: Representing the Real-World Geometry by a Discontinuous Representation

    Authors: Zhanfeng Liao, Qian Zheng, Yan Liu, Gang Pan

    Abstract: A crucial reason for the success of existing NeRF-based methods is that they build a neural density field for the geometry representation via multilayer perceptrons (MLPs). MLPs are continuous functions; however, real geometry or density fields are frequently discontinuous at the interface between the air and the surface. This contradiction leads to unfaithful geometry representation. To this…

    Submitted 4 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  31. arXiv:2311.07237  [pdf, other]

    cs.CL cs.AI

    In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search

    Authors: Huihan Li, Yuting Ning, Zeyi Liao, Siyuan Wang, Xiang Lorraine Li, Ximing Lu, Wenting Zhao, Faeze Brahman, Yejin Choi, Xiang Ren

    Abstract: State-of-the-art LLMs outperform humans on reasoning tasks such as Natural Language Inference. Recent works evaluating LLMs note a marked performance drop on input data from the low-probability distribution, i.e., the long tail. Therefore, we focus on systematically generating statements involving long-tail inferential knowledge for more effective evaluation of LLMs in the reasoning space. We first…

    Submitted 27 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  32. arXiv:2311.06956  [pdf, other]

    cs.CV

    SegReg: Segmenting OARs by Registering MR Images and CT Annotations

    Authors: Zeyu Zhang, Xuyin Qi, Bowen Zhang, Biao Wu, Hien Le, Bora Jeong, Zhibin Liao, Yunxiang Liu, Johan Verjans, Minh-Son To, Richard Hartley

    Abstract: Organ at risk (OAR) segmentation is a critical process in radiotherapy treatment planning for conditions such as head and neck tumors. Nevertheless, in clinical practice, radiation oncologists predominantly perform OAR segmentations manually on CT scans. This manual process is highly time-consuming and expensive, limiting the number of patients who can receive timely radiotherapy. Additionally, CT scans offer lo…

    Submitted 1 March, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: Accepted to ISBI 2024

  33. arXiv:2311.04686  [pdf, other]

    cs.LG cs.DC stat.ML

    Robust and Communication-Efficient Federated Domain Adaptation via Random Features

    Authors: Zhanbo Feng, Yuanjie Wang, Jie Li, Fan Yang, Jiong Lou, Tiebin Mi, Robert C. Qiu, Zhenyu Liao

    Abstract: Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, fe…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 21 pages

  34. arXiv:2310.09711  [pdf, other]

    cs.CV

    LOVECon: Text-driven Training-Free Long Video Editing with ControlNet

    Authors: Zhenyi Liao, Zhijie Deng

    Abstract: Leveraging pre-trained conditional diffusion models for video editing without further tuning has gained increasing attention due to its promise in film production, advertising, etc. Yet, seminal works in this line fall short in generation length, temporal coherence, or fidelity to the source video. This paper aims to bridge the gap, establishing a simple and effective baseline for training-free di…

    Submitted 28 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: AI for Content Creation Workshop @ CVPR 2024

  35. arXiv:2309.15461  [pdf, other]

    cs.CL

    ChatCounselor: A Large Language Models for Mental Health Support

    Authors: June M. Liu, Donghao Li, He Cao, Tianhe Ren, Zeyi Liao, Jiamin Wu

    Abstract: This paper presents ChatCounselor, a large language model (LLM) solution designed to provide mental health support. Unlike generic chatbots, ChatCounselor is distinguished by its foundation in real conversations between consulting clients and professional psychologists, enabling it to possess specialized knowledge and counseling skills in the field of psychology. The training dataset, Psych8k, was…

    Submitted 27 September, 2023; originally announced September 2023.

  36. arXiv:2309.09118  [pdf, other]

    cs.CV cs.AI cs.RO

    Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors

    Authors: Ziwei Liao, Jun Yang, Jingxing Qian, Angela P. Schoellig, Steven L. Waslander

    Abstract: 3D object-level mapping is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. In this work, we propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected o…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: Manuscript submitted to ICRA 2024

  37. arXiv:2309.01256  [pdf, other]

    cs.CV cs.CL

    BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning

    Authors: Yi Zhang, Ce Zhang, Zihan Liao, Yushun Tang, Zhihai He

    Abstract: Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and ALIGN, have introduced a new paradigm for learning transferable visual representations. Recently, there has been a surge of interest among researchers in developing lightweight fine-tuning techniques to adapt these models to downstream visual tasks. We recognize that current state-of-the-art fine-tuning methods, such as Tip-Ad…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted by BMVC 2023

  38. arXiv:2308.16425  [pdf, other]

    cs.LG stat.ML

    On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint

    Authors: Zenan Ling, Zhenyu Liao, Robert C. Qiu

    Abstract: Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this,…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted by Workshop on High-dimensional Learning Dynamics, ICML 2023, Honolulu, Hawaii

  39. arXiv:2308.08772  [pdf, other]

    cs.CV

    URL: Combating Label Noise for Lung Nodule Malignancy Grading

    Authors: Xianze Ai, Zehui Liao, Yong Xia

    Abstract: Due to the complexity of annotation and inter-annotator variability, most lung nodule malignancy grading datasets contain label noise, which inevitably degrades the performance and generalizability of models. Although researchers adopt label-noise-robust methods to handle label noise for lung nodule malignancy grading, they do not consider the inherent ordinal relation among classes of this ta…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: 11 pages, accepted by DALI@MICCAI2023

  40. Can Unstructured Pruning Reduce the Depth in Deep Neural Networks?

    Authors: Zhu Liao, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione

    Abstract: Pruning is a widely used technique for reducing the size of deep neural networks while maintaining their performance. However, such a technique, despite being able to massively compress deep models, is hardly able to remove entire layers from a model (even when structured): is this an addressable task? In this study, we introduce EGP, an innovative Entropy Guided Pruning algorithm aimed at reducin…

    Submitted 18 August, 2023; v1 submitted 12 August, 2023; originally announced August 2023.

  41. arXiv:2307.16143  [pdf, other]

    eess.IV cs.CV

    Structure-Preserving Synthesis: MaskGAN for Unpaired MR-CT Translation

    Authors: Minh Hieu Phan, Zhibin Liao, Johan W. Verjans, Minh-Son To

    Abstract: Medical image synthesis is a challenging task due to the scarcity of paired data. Several methods have applied CycleGAN to leverage unpaired data, but they often generate inaccurate mappings that shift the anatomy. This problem is further exacerbated when the images from the source and target modalities are heavily misaligned. Recent methods have aimed to address this issue by incorpora…

    Submitted 31 July, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: Accepted to MICCAI 2023

    Journal ref: MICCAI 2023

  42. arXiv:2307.00842  [pdf, other]

    cs.CV

    VINECS: Video-based Neural Character Skinning

    Authors: Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann, Christian Theobalt

    Abstract: Rigging and skinning clothed human avatars is a challenging task and traditionally requires a lot of manual work and expertise. Recent methods addressing it either generalize across different characters or focus on capturing the dynamics of a single character observed under different pose configurations. However, the former methods typically predict solely static skinning weights, which perform po…

    Submitted 3 July, 2023; originally announced July 2023.

  43. arXiv:2306.11739  [pdf, other]

    cs.CV cs.AI cs.RO

    Multi-view 3D Object Reconstruction and Uncertainty Modelling with Neural Shape Prior

    Authors: Ziwei Liao, Steven L. Waslander

    Abstract: 3D object reconstruction is important for semantic scene understanding. It is challenging to reconstruct detailed 3D shapes from monocular images directly due to a lack of depth information, occlusion and noise. Most current methods generate deterministic object models without any awareness of the uncertainty of the reconstruction. We tackle this problem by leveraging a neural object representatio…

    Submitted 6 November, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Manuscript accepted by WACV 2024

  44. arXiv:2306.08489  [pdf, ps, other]

    stat.ML cs.LG math.SP

    Analysis and Approximate Inference of Large Random Kronecker Graphs

    Authors: Zhenyu Liao, Yuanqian Xia, Chengmei Niu, Yong Xiao

    Abstract: Random graph models are playing an increasingly important role in various fields ranging from social networks and telecommunication systems to physiologic and biological networks. Within this landscape, the random Kronecker graph model emerges as a prominent framework for scrutinizing intricate real-world networks. In this paper, we investigate large random Kronecker graphs, i.e., the number of gra…

    Submitted 5 February, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 27 pages, 5 figures, 2 tables

  45. arXiv:2306.05254  [pdf, other]

    cs.CV

    Devil is in Channels: Contrastive Single Domain Generalization for Medical Image Segmentation

    Authors: Shishuai Hu, Zehui Liao, Yong Xia

    Abstract: Deep learning-based medical image segmentation models suffer from performance degradation when deployed to a new healthcare center. To address this issue, unsupervised domain adaptation and multi-source domain generalization methods have been proposed, which, however, are less favorable for clinical practice due to the cost of acquiring target-domain data and the privacy concerns associated with r…

    Submitted 24 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: MICCAI 2023 Early Accept (Top 14%), 12 pages, 5 figures

  46. arXiv:2306.01340  [pdf, other]

    cs.CV

    Transformer-based Annotation Bias-aware Medical Image Segmentation

    Authors: Zehui Liao, Yutong Xie, Shishuai Hu, Yong Xia

    Abstract: Manual medical image segmentation is subjective and suffers from annotator-related bias, which can be mimicked or amplified by deep learning methods. Recently, researchers have suggested that such bias is the combination of the annotator preference and stochastic error, which are modeled by convolution blocks located after decoder and pixel-wise independent Gaussian distribution, respectively. It…

    Submitted 28 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: MICCAI 2023 Early Accept (Top 14%), 11 pages, 2 figures

  47. arXiv:2304.02205  [pdf, other]

    cs.AI cs.IR

    MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs

    Authors: Jifan Yu, Mengying Lu, Qingyang Zhong, Zijun Yao, Shangqing Tu, Zhengshan Liao, Xiaoya Li, Manli Li, Lei Hou, Hai-Tao Zheng, Juanzi Li, Jie Tang

    Abstract: Student modeling, the task of inferring a student's learning characteristics through their interactions with coursework, is a fundamental issue in intelligent education. Although the recent attempts from knowledge tracing and cognitive diagnosis propose several promising directions for improving the usability and effectiveness of current models, the existing public datasets are still insufficient…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: Accepted by SIGIR 2023

  48. Boundary-semantic collaborative guidance network with dual-stream feedback mechanism for salient object detection in optical remote sensing imagery

    Authors: Dejun Feng, Hongyu Chen, Suning Liu, Ziyang Liao, Xingyu Shen, Yakun Xie, Jun Zhu

    Abstract: With the increasing application of deep learning in various domains, salient object detection in optical remote sensing images (ORSI-SOD) has attracted significant attention. However, most existing ORSI-SOD methods predominantly rely on local information from low-level features to infer salient boundary cues and supervise them using boundary ground truth, but fail to sufficiently optimize and prot…

    Submitted 13 November, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: Accepted by TGRS

  49. arXiv:2302.03549  [pdf, ps, other]

    cs.IT

    An Achievable and Analytic Solution to Information Bottleneck for Gaussian Mixtures

    Authors: Yi Song, Kai Wan, Zhenyu Liao, Giuseppe Caire

    Abstract: In this paper, we study a remote source coding scenario in which binary phase shift keying (BPSK) modulation sources are corrupted by additive white Gaussian noise (AWGN). An intermediate node, such as a relay, receives these observations and performs additional compression to balance complexity and relevance. This problem can be further formulated as an information bottleneck (IB) problem with Be…

    Submitted 12 May, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

  50. arXiv:2212.08380  [pdf, other]

    cs.CV

    Instance-specific Label Distribution Regularization for Learning with Label Noise

    Authors: Zehui Liao, Shishuai Hu, Yutong Xie, Yong Xia

    Abstract: Modeling the noise transition matrix is a promising approach to learning with label noise. Based on the estimated noise transition matrix and the noisy posterior probabilities, the clean posterior probabilities, which are jointly called Label Distribution (LD) in this paper, can be calculated as the supervision. To reliably estimate the noise transition matrix, some methods assume that anchor p…

    Submitted 16 December, 2022; originally announced December 2022.