Showing 1–50 of 338 results for author: Hou, J

Searching in archive cs.
  1. arXiv:2408.12787  [pdf, other]

    cs.CR cs.AI

    LLM-PBE: Assessing Data Privacy in Large Language Models

    Authors: Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, Zhun Wang, Dan Hendrycks, Zhangyang Wang, Bo Li, Bingsheng He, Dawn Song

    Abstract: Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue,…

    Submitted 22 August, 2024; originally announced August 2024.

  2. arXiv:2408.09675  [pdf, other]

    cs.AI cs.MA cs.RO

    Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

    Authors: Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll

    Abstract: Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL in the multi-agent system domain, multi-agent RL (MARL) not only needs to learn the control policy but also requires consideration regarding interactions with all other agents in the environment, mutua…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 23 pages, 6 figures and 2 tables. Submitted to IEEE Journal

  3. arXiv:2408.08610  [pdf, other]

    cs.CV cs.AI cs.LG

    Generative Dataset Distillation Based on Diffusion Model

    Authors: Duo Su, Junjie Hou, Guang Li, Ren Togo, Rui Song, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper presents our method for the generative track of The First Dataset Distillation Challenge at ECCV 2024. Since the diffusion model has become the mainstay of generative models because of its high-quality generative effects, we focus on distillation methods based on the diffusion model. Considering that the track can only generate a fixed number of images in 10 minutes using a generative m…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: The Third Place Winner in Generative Track of the ECCV 2024 DD Challenge

  4. arXiv:2408.06811  [pdf]

    cs.CV

    Oracle Bone Script Similiar Character Screening Approach Based on Simsiam Contrastive Learning and Supervised Learning

    Authors: Xinying Weng, Yifan Li, Shuaidong Hao, Jialiang Hou

    Abstract: This project proposes a new method that uses fuzzy comprehensive evaluation method to integrate ResNet-50 self-supervised and RepVGG supervised learning. The source image dataset HWOBC oracle is taken as input, the target image is selected, and finally the most similar image is output in turn without any manual intervention. The same feature encoding method is not used for images of different moda…

    Submitted 13 August, 2024; originally announced August 2024.

  5. arXiv:2408.05330  [pdf, other]

    cs.IR cs.AI

    Neural Machine Unranking

    Authors: Jingrui Hou, Axel Finke, Georgina Cosma

    Abstract: We tackle the problem of machine unlearning within neural information retrieval, termed Neural Machine UnRanking (NuMuR) for short. Many of the mainstream task- or model-agnostic approaches for machine unlearning were designed for classification tasks. First, we demonstrate that these methods perform poorly on NuMuR tasks due to the unique challenges posed by neural information retrieval. Then, we…

    Submitted 21 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

  6. arXiv:2408.03166  [pdf, other]

    cs.IR

    CADRL: Category-aware Dual-agent Reinforcement Learning for Explainable Recommendations over Knowledge Graphs

    Authors: Shangfei Zheng, Hongzhi Yin, Tong Chen, Xiangjie Kong, Jian Hou, Pengpeng Zhao

    Abstract: Knowledge graphs (KGs) have been widely adopted to mitigate data sparsity and address cold-start issues in recommender systems. While existing KGs-based recommendation methods can predict user preferences and demands, they fall short in generating explicit recommendation paths and lack explainability. As a step beyond the above methods, recent advancements utilize reinforcement learning (RL) to fi…

    Submitted 6 August, 2024; originally announced August 2024.

  7. arXiv:2407.18232  [pdf, other]

    cs.CV

    LION: Linear Group RNN for 3D Object Detection in Point Clouds

    Authors: Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai

    Abstract: The benefit of transformers in large-scale 3D point cloud perception tasks, such as 3D object detection, is limited by their quadratic computation cost when modeling long-range relationships. In contrast, linear RNNs have low computational complexity and are suitable for long-range modeling. Toward this goal, we propose a simple and effective window-based framework built on LInear grOup RNN (i.e.,…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Project page: https://happinesslz.github.io/projects/LION/

  8. arXiv:2407.15138  [pdf, other]

    cs.CV

    D$^4$M: Dataset Distillation via Disentangled Diffusion Model

    Authors: Duo Su, Junjie Hou, Weizhi Gao, Yingjie Tian, Bowen Tang

    Abstract: Dataset distillation offers a lightweight synthetic dataset for fast network training with promising test accuracy. To imitate the performance of the original dataset, most approaches employ bi-level optimization and the distillation space relies on the matching architecture. Nevertheless, these approaches either suffer significant computational costs on large-scale datasets or experience performa…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted to CVPR 2024

  9. arXiv:2407.14047  [pdf, other]

    cs.CV cs.AI

    OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking

    Authors: Zekun Qian, Ruize Han, Wei Feng, Junhui Hou, Linqi Song, Song Wang

    Abstract: We study a novel yet practical problem of open-corpus multi-object tracking (OCMOT), which extends the MOT into localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes, but without the category text list as prompt. To study this problem, the top priority is to build a benchmark. In this work, we build OCTrackB, a large-scale and comprehensiv…

    Submitted 19 July, 2024; originally announced July 2024.

  10. arXiv:2407.10753  [pdf, other]

    cs.CV

    OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

    Authors: Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

    Abstract: Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  11. arXiv:2407.10749  [pdf, other]

    cs.CV

    SEED: A Simple and Effective 3D DETR in Point Clouds

    Authors: Zhe Liu, Jinghua Hou, Xiaoqing Ye, Tong Wang, Jingdong Wang, Xiang Bai

    Abstract: Recently, detection transformers (DETRs) have gradually taken a dominant position in 2D detection thanks to their elegant framework. However, DETR-based detectors for 3D point clouds are still difficult to achieve satisfactory performance. We argue that the main challenges are twofold: 1) How to obtain the appropriate object queries is challenging due to the high sparsity and uneven distribution o…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  12. arXiv:2407.09786  [pdf, other]

    cs.CV

    Self-supervised 3D Point Cloud Completion via Multi-view Adversarial Learning

    Authors: Lintai Wu, Xianjing Cheng, Junhui Hou, Yong Xu, Huanqiang Zeng

    Abstract: In real-world scenarios, scanned point clouds are often incomplete due to occlusion issues. The task of self-supervised point cloud completion involves reconstructing missing regions of these incomplete objects without the supervision of complete ground truth. Current self-supervised methods either rely on multiple views of partial observations for supervision or overlook the intrinsic geometric s…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 12 pages, 8 figures

  13. arXiv:2407.05769  [pdf, other]

    cs.CV

    Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework

    Authors: Hao Jing, Anhong Wang, Lijun Zhao, Yakun Yang, Donghan Bu, Jing Zhang, Yifan Zhang, Junhui Hou

    Abstract: In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, providing reliable geometric information. However, traditional sampling methods of preprocessing often ignore semantic features, leading to detail loss and ground point interference in 3D object detection. To address this, we propose a multi-branch two-stage 3D object detection framework using a Semantic-aware Multi-bran…

    Submitted 8 July, 2024; originally announced July 2024.

  14. arXiv:2407.05053  [pdf, other]

    cs.RO eess.SY

    Adaptive Stiffness: A Biomimetic Robotic System with Tensegrity-Based Compliant Mechanism

    Authors: Po-Yu Hsieh, June-Hao Hou

    Abstract: Biomimicry has played a pivotal role in robotics. In contrast to rigid robots, bio-inspired robots exhibit an inherent compliance, facilitating versatile movements and operations in constrained spaces. The robot implementation in fabrication, however, has posed technical challenges and mechanical complexity, thereby underscoring a noticeable gap between research and practice. To address the limita…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 14 pages, 21 figures

  15. arXiv:2407.03594  [pdf, other]

    cs.CV

    UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos

    Authors: Yuzhong Huang, Chen Liu, Ji Hou, Ke Huo, Shiyu Dong, Fred Morstatter

    Abstract: We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos. Unlike existing methods that detect planes from local observations and associate them across the video for the final reconstruction, UniPlane unifies both the detection and the reconstruction tasks in a single network, which allows us to directly optimize final reconstruction quality an…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.07710 by other authors

  16. arXiv:2407.02428  [pdf, other]

    cs.RO cs.LG eess.SY stat.ML

    Comparative Evaluation of Learning Models for Bionic Robots: Non-Linear Transfer Function Identifications

    Authors: Po-Yu Hsieh, June-Hao Hou

    Abstract: The control and modeling of bionic robot dynamics have increasingly adopted model-free control strategies using machine learning methods. Given the non-linear elastic nature of bionic robotic systems, learning-based methods provide reliable alternatives by utilizing numerical data to establish a direct mapping from actuation inputs to robot trajectories without complex kinematics models. However,…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 16 pages, 20 figures

  17. arXiv:2407.01330  [pdf, other]

    cs.CV

    Learning Unsigned Distance Fields from Local Shape Functions for 3D Surface Reconstruction

    Authors: Jiangbei Hu, Yanggeng Li, Fei Hou, Junhui Hou, Zhebin Zhang, Shengfa Wang, Na Lei, Ying He

    Abstract: Unsigned distance fields (UDFs) provide a versatile framework for representing a diverse array of 3D shapes, encompassing both watertight and non-watertight geometries. Traditional UDF learning methods typically require extensive training on large datasets of 3D shapes, which is costly and often necessitates hyperparameter adjustments for new datasets. This paper presents a novel neural framework,…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 14 pages, 11 figures

    ACM Class: I.3.5

  18. arXiv:2407.01306  [pdf, other]

    cs.LG cs.CR

    Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability

    Authors: Chenxi Li, Abhinav Kumar, Zhen Guo, Jie Hou, Reza Tourani

    Abstract: The increasing prominence of deep learning applications and reliance on personalized data underscore the urgent need to address privacy vulnerabilities, particularly Membership Inference Attacks (MIAs). Despite numerous MIA studies, significant knowledge gaps persist, particularly regarding the impact of hidden features (in isolation) on attack efficacy and insufficient justification for the root…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 20 pages, 10 figures, 4 tables

  19. arXiv:2407.00866  [pdf, other]

    cs.LG

    Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning

    Authors: Nexhi Sula, Abhinav Kumar, Jie Hou, Han Wang, Reza Tourani

    Abstract: With the continued advancement and widespread adoption of machine learning (ML) models across various domains, ensuring user privacy and data security has become a paramount concern. In compliance with data privacy regulations, such as GDPR, a secure machine learning framework should not only grant users the right to request the removal of their contributed data used for model training but also fa…

    Submitted 5 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 17 pages, 14 figures, 6 tables

  20. arXiv:2406.10175  [pdf, other]

    cs.CV

    Enhancing Incomplete Multi-modal Brain Tumor Segmentation with Intra-modal Asymmetry and Inter-modal Dependency

    Authors: Weide Liu, Jingwen Hou, Xiaoyang Zhong, Huijing Zhan, Jun Cheng, Yuming Fang, Guanghui Yue

    Abstract: Deep learning-based brain tumor segmentation (BTS) models for multi-modal MRI images have seen significant advancements in recent years. However, a common problem in practice is the unavailability of some modalities due to varying scanning protocols and patient conditions, making segmentation from incomplete MRI modalities a challenging issue. Previous methods have attempted to address this by fus…

    Submitted 14 June, 2024; originally announced June 2024.

  21. arXiv:2406.08374  [pdf, other]

    cs.CV cs.AI eess.IV

    2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

    Authors: Tianqi Chen, Jun Hou, Yinchi Zhou, Huidong Xie, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, James S. Duncan, Chi Liu, Bo Zhou

    Abstract: Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate t…

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures

  22. arXiv:2406.06329  [pdf, other]

    cs.CL eess.AS

    A Parameter-efficient Language Extension Framework for Multilingual ASR

    Authors: Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

    Abstract: Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framewo…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  23. arXiv:2406.05985  [pdf, other]

    cs.RO

    LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding

    Authors: Jiawei Hou, Wenhao Guan, Xiangyang Xue, Taiping Zeng

    Abstract: Spatial cognition empowers animals with remarkably efficient navigation abilities, largely depending on the scene-level understanding of spatial environments. Recently, it has been found that a neural population in the postrhinal cortex of rat brains is more strongly tuned to the spatial layout rather than objects in a scene. Inspired by the representations of spatial layout in local scenes to enc…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  24. arXiv:2406.00434  [pdf, other]

    cs.CV

    MoDGS: Dynamic Gaussian Splatting from Causually-captured Monocular Videos

    Authors: Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lv, Peng Wang, Wenping Wang, Junhui Hou

    Abstract: In this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but fail to reconstruct dynamic scenes on casually captured input videos whose cameras are static or move slo…

    Submitted 1 June, 2024; originally announced June 2024.

  25. arXiv:2406.00037  [pdf, other]

    cs.CL cs.AI

    Aligning LLMs through Multi-perspective User Preference Ranking-based Feedback for Programming Question Answering

    Authors: Hongyu Yang, Liyang He, Min Hou, Shuanghong Shen, Rui Li, Jiahui Hou, Jianhui Ma, Junda Zhao

    Abstract: Code Community Question Answering (CCQA) seeks to tackle programming-related issues, thereby boosting productivity in both software engineering and academic research. Recent advancements in Reinforcement Learning from Human Feedback (RLHF) have transformed the fine-tuning process of Large Language Models (LLMs) to produce responses that closely mimic human behavior. Leveraging LLMs with RLHF for p…

    Submitted 27 May, 2024; originally announced June 2024.

  26. arXiv:2405.20188  [pdf, other]

    cs.CV cs.GR

    SPARE: Symmetrized Point-to-Plane Distance for Robust Non-Rigid Registration

    Authors: Yuxin Yao, Bailin Deng, Junhui Hou, Juyong Zhang

    Abstract: Existing optimization-based methods for non-rigid registration typically minimize an alignment error metric based on the point-to-point or point-to-plane distance between corresponding point pairs on the source surface and target surface. However, these metrics can result in slow convergence or a loss of detail. In this paper, we propose SPARE, a novel formulation that utilizes a symmetrized point…

    Submitted 30 May, 2024; originally announced May 2024.

  27. arXiv:2405.19684  [pdf, other]

    cs.CV

    A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning

    Authors: Xiaofeng Cong, Yu Zhao, Jie Gui, Junming Hou, Dacheng Tao

    Abstract: Underwater image enhancement (UIE) presents a significant challenge within computer vision research. Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent. To foster future advancements, we provide a detailed overview of the UIE task from several perspectives. Firstly, we introduce the physical models, data construction processes, evaluation metrics,…

    Submitted 25 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: A survey on the underwater image enhancement task

  28. arXiv:2405.15364  [pdf, other]

    cs.CV

    NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

    Authors: Meng You, Zhiyu Zhu, Hui Liu, Junhui Hou

    Abstract: By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training. NVS-Solver adaptively modulates the diffusion sampling process with the given views to enable the creation of remarkable visual experiences from single or multiple views of static scenes…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Technical Report

  29. arXiv:2405.15034  [pdf, other]

    cs.CG

    NeCGS: Neural Compression for 3D Geometry Sets

    Authors: Siyu Ren, Junhui Hou, Wenping Wang

    Abstract: This paper explores the problem of effectively compressing 3D geometry sets containing diverse categories. We make the first attempt to tackle this fundamental and challenging problem and propose NeCGS, a neural compression paradigm, which can compress hundreds of detailed and diverse 3D mesh models (~684 MB) by about 900 times (0.76 MB) with high accuracy and preservation of detailed geo…

    Submitted 23 May, 2024; originally announced May 2024.

  30. arXiv:2405.14633  [pdf, other]

    cs.CV cs.CG

    Flatten Anything: Unsupervised Neural Surface Parameterization

    Authors: Qijian Zhang, Junhui Hou, Wenping Wang, Ying He

    Abstract: Surface parameterization plays an essential role in numerous computer graphics and geometry processing applications. Traditional parameterization approaches are designed for high-quality meshes laboriously created by specialized 3D modelers, thus unable to meet the processing demand for the current explosion of ordinary 3D data. Moreover, their working mechanisms are typically restricted to certai…

    Submitted 23 May, 2024; originally announced May 2024.

  31. arXiv:2405.14271  [pdf, other]

    cs.CV

    Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models

    Authors: Yifan Zhang, Junhui Hou

    Abstract: Contrastive image-to-LiDAR knowledge transfer, commonly used for learning 3D representations with synchronized images and point clouds, often faces a self-conflict dilemma. This issue arises as contrastive losses unintentionally dissociate features of unmatched points and pixels that share semantic labels, compromising the integrity of learned representations. To overcome this, we harness Visual F…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Under review

  32. arXiv:2405.12223  [pdf, other]

    eess.IV cs.CV

    Cascaded Multi-path Shortcut Diffusion Model for Medical Image Translation

    Authors: Yinchi Zhou, Tianqi Chen, Jun Hou, Huidong Xie, Nicha C. Dvornek, S. Kevin Zhou, David L. Wilson, James S. Duncan, Chi Liu, Bo Zhou

    Abstract: Image-to-image translation is a vital component in medical imaging processing, with many uses in a wide range of imaging modalities and clinical scenarios. Previous methods include Generative Adversarial Networks (GANs) and Diffusion Models (DMs), which offer realism but suffer from instability and lack uncertainty estimation. Even though both GAN and DM methods have individually exhibited their c…

    Submitted 14 August, 2024; v1 submitted 5 April, 2024; originally announced May 2024.

    Comments: Accepted at Medical Image Analysis Journal

  33. arXiv:2404.15802  [pdf, other]

    cs.CV cs.AI

    Raformer: Redundancy-Aware Transformer for Video Wire Inpainting

    Authors: Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han

    Abstract: Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series, offering significant time and labor savings compared to manual frame-by-frame removal. However, wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks, and often intersecting…

    Submitted 24 April, 2024; originally announced April 2024.

  34. arXiv:2404.14270  [pdf, other]

    cs.CL cs.LG

    What do Transformers Know about Government?

    Authors: Jue Hou, Anisia Katinskaia, Lari Kotilainen, Sathianpong Trangcasanchai, Anh-Duc Vu, Roman Yangarber

    Abstract: This paper investigates what insights about linguistic features and what knowledge about the structure of natural language can be obtained from the encodings in transformer language models. In particular, we explore how BERT encodes the government relation between constituents in a sentence. We use several probing classifiers, and data from two morphologically rich languages. Our experiments show t…

    Submitted 22 April, 2024; originally announced April 2024.

  35. arXiv:2404.12804  [pdf, other]

    cs.CV eess.IV

    Linearly-evolved Transformer for Pan-sharpening

    Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

    Abstract: Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages

  36. arXiv:2404.11401  [pdf, other]

    cs.CV

    RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering

    Authors: Xianqiang Lyu, Hui Liu, Junhui Hou

    Abstract: We propose RainyScape, an unsupervised framework for reconstructing clean scenes from a collection of multi-view rainy images. RainyScape consists of two main modules: a neural rendering module and a rain-prediction module that incorporates a predictor network and a learnable latent embedding that captures the rain characteristics of the scene. Specifically, based on the spectral bias property of…

    Submitted 17 April, 2024; originally announced April 2024.

  37. arXiv:2404.05997  [pdf, other]

    cs.CV

    Concept-Attention Whitening for Interpretable Skin Lesion Diagnosis

    Authors: Junlin Hou, Jilan Xu, Hao Chen

    Abstract: The black-box nature of deep learning models has raised concerns about their interpretability for successful deployment in real-world clinical applications. To address the concerns, eXplainable Artificial Intelligence (XAI) aims to provide clear and understandable explanations of the decision-making process. In the medical domain, concepts such as attributes of lesions or abnormalities serve as ke…

    Submitted 9 April, 2024; originally announced April 2024.

  38. arXiv:2404.05169  [pdf, other]

    cs.CV

    QMix: Quality-aware Learning with Mixed Noise for Robust Retinal Disease Diagnosis

    Authors: Junlin Hou, Jilan Xu, Rui Feng, Hao Chen

    Abstract: Due to the complexity of medical image acquisition and the difficulty of annotation, medical image datasets inevitably contain noise. Noisy data with wrong labels affects the robustness and generalization ability of deep neural networks. Previous noise learning methods mainly considered noise arising from images being mislabeled, i.e. label noise, assuming that all mislabeled images are of high im…

    Submitted 7 April, 2024; originally announced April 2024.

  39. arXiv:2404.00548  [pdf, other]

    cs.CV

    Modeling State Shifting via Local-Global Distillation for Event-Frame Gaze Tracking

    Authors: Jiading Li, Zhiyu Zhu, Jinhui Hou, Junhui Hou, Jinjian Wu

    Abstract: This paper tackles the problem of passive gaze estimation using both event and frame data. Considering the inherently different physiological structures, it is intractable to accurately estimate gaze purely based on a given state. Thus, we reformulate gaze estimation as the quantification of the state shifting from the current state to several prior registered anchor states. Specifically, we propo…

    Submitted 28 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

  40. arXiv:2403.18548  [pdf, other]

    cs.CV

    A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

    Authors: Xiaofeng Cong, Jie Gui, Jing Zhang, Junming Hou, Hao Shen

    Abstract: Existing research based on deep learning has extensively explored the problem of daytime image dehazing. However, few studies have considered the characteristics of nighttime hazy scenes. There are two distinctions between nighttime and daytime haze. First, there may be multiple active colored light sources with lower illumination intensity in nighttime scenes, which may cause haze, glow and noise…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by CVPR2024

  41. arXiv:2403.16649  [pdf, other]

    cs.AI

    CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment

    Authors: Feiteng Fang, Liang Zhu, Min Yang, Xi Feng, Jinchang Hou, Qixuan Zhao, Chengming Li, Xiping Hu, Ruifeng Xu

    Abstract: Reinforcement learning from human feedback (RLHF) is a crucial technique in aligning large language models (LLMs) with human preferences, ensuring these LLMs behave in beneficial and comprehensible ways to users. However, a longstanding challenge in human alignment techniques based on reinforcement learning lies in their inherent complexity and difficulty in training. To address this challenge, we…

    Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  42. arXiv:2403.15698  [pdf, other]

    cs.CV cs.AI

    SceneX:Procedural Controllable Large-scale Scene Generation via Large-language Models

    Authors: Mengqi Zhou, Yuxi Wang, Jun Hou, Chuanchen Luo, Zhaoxiang Zhang, Junran Peng

    Abstract: Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent research employs powerful generative models to create desired scenes and achieves promising results. However, most of these methods represent the scene using 3D primitives (e.g. point cloud or radiance field) incompatible with the industrial pipeline, which leads to a…

    Submitted 30 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  43. arXiv:2403.11953  [pdf, other]

    eess.IV cs.CV

    Advancing COVID-19 Detection in 3D CT Scans

    Authors: Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

    Abstract: To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior kno…

    Submitted 18 March, 2024; originally announced March 2024.

  44. arXiv:2403.11586  [pdf, other]

    cs.CV

    DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction

    Authors: Yuxin Yao, Siyu Ren, Junhui Hou, Zhi Deng, Juyong Zhang, Wenping Wang

    Abstract: This paper explores the problem of reconstructing temporally consistent surfaces from a 3D point cloud sequence without correspondence. To address this challenging task, we propose DynoSurf, an unsupervised learning framework integrating a template surface representation with a learnable deformation field. Specifically, we design a coarse-to-fine strategy for learning the template surface based on…

    Submitted 22 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  45. arXiv:2403.11498  [pdf, other]

    eess.IV cs.CV

    Domain Adaptation Using Pseudo Labels for COVID-19 Detection

    Authors: Runtian Yuan, Qingqiu Li, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

    Abstract: In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent he…

    Submitted 18 March, 2024; originally announced March 2024.

  46. arXiv:2403.10349  [pdf, other]

    cs.CV

    ParaPoint: Learning Global Free-Boundary Surface Parameterization of 3D Point Clouds

    Authors: Qijian Zhang, Junhui Hou, Ying He

    Abstract: Surface parameterization is a fundamental geometry processing problem with rich downstream applications. Traditional approaches are designed to operate on well-behaved mesh models with high-quality triangulations that are laboriously produced by specialized 3D modelers, and are thus unable to meet the processing demand for the current explosion of ordinary 3D data. In this paper, we seek to perform UV…

    Submitted 15 March, 2024; originally announced March 2024.

  47. arXiv:2403.08506  [pdf, other]

    cs.LG cs.AI cs.CV

    DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning

    Authors: Sikai Bai, Jie Zhang, Shuaicheng Li, Song Guo, Jingcai Guo, Jun Hou, Tao Han, Xiaocheng Lu

    Abstract: Federated learning (FL) has emerged as a powerful paradigm for learning from decentralized data, and federated domain generalization further considers the setting in which the test dataset (target domain) is absent from the decentralized training data (source domains). However, most existing FL methods assume that domain labels are provided during training, and their evaluation imposes explicit constraints on the numb…

    Submitted 11 March, 2024; originally announced March 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  48. Improving link prediction accuracy of network embedding algorithms via rich node attribute information

    Authors: Weiwei Gu, Jinqiang Hou, Weiyi Gu

    Abstract: Complex networks are widely used to represent an abundance of real-world relations ranging from social networks to brain networks. Inferring missing links or predicting future ones based on the currently observed network is known as the link prediction task. Recent network embedding based link prediction algorithms have demonstrated ground-breaking performance on link prediction accuracy. Those alg…

    Submitted 7 March, 2024; originally announced March 2024.

    Journal ref: Journal of Social Computing, 2023, 4(4): 326-336

  49. arXiv:2403.02998  [pdf, other]

    cs.CV

    Towards Calibrated Deep Clustering Network

    Authors: Yuheng Jia, Jianhong Cheng, Hui Liu, Junhui Hou

    Abstract: Deep clustering has exhibited remarkable performance; however, the over-confidence problem, i.e., the estimated confidence for a sample belonging to a particular cluster greatly exceeds its actual prediction accuracy, has been overlooked in prior research. To tackle this critical issue, we pioneer the development of a calibrated deep clustering framework. Specifically, we propose a novel dual-head…

    Submitted 2 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  50. arXiv:2403.02710  [pdf, other]

    cs.CV cs.RO

    FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

    Authors: Jiawei Hou, Xiaoyan Li, Wenhao Guan, Gang Zhang, Di Feng, Yuheng Du, Xiangyang Xue, Jian Pu

    Abstract: In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for a more comprehensive understanding of 3D scenes compared with traditional perception tasks, such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Researchers have recently explored various aspects of this task extensively, including view transformation techniques, ground-truth label…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted by ICRA 2024