Showing 1–50 of 92 results for author: Xiang, X

Searching in archive cs.
  1. arXiv:2407.01894  [pdf, other]

    cs.CV cs.HC

    Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection

    Authors: Zixing Li, Chao Yan, Zhen Lan, Xiaojia Xiang, Han Zhou, Jun Lai, Dengqing Tang

    Abstract: Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and ver…

    Submitted 8 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 18 pages, 15 figures

  2. arXiv:2405.02608  [pdf, other]

    cs.CV cs.AI cs.RO

    UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model

    Authors: Shuai Yuan, Lei Luo, Zhuo Hui, Can Pu, Xiaoyu Xiang, Rakesh Ranjan, Denis Demandolx

    Abstract: Traditional unsupervised optical flow methods are vulnerable to occlusions and motion boundaries due to lack of object-level information. Therefore, we propose UnSAMFlow, an unsupervised flow network that also leverages object information from the latest foundation model Segment Anything Model (SAM). We first include a self-supervised semantic augmentation module tailored to SAM masks. We also ana…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024. Code is available at https://github.com/facebookresearch/UnSAMFlow

  3. arXiv:2404.03181  [pdf, other]

    cs.CV

    MonoCD: Monocular 3D Object Detection with Complementary Depths

    Authors: Longfei Yan, Pei Yan, Shengzhou Xiong, Xuanyu Xiang, Yihua Tan

    Abstract: Monocular 3D object detection has attracted widespread attention due to its potential to accurately obtain object 3D localization from a single image at a low cost. Depth estimation is an essential but challenging subtask of monocular 3D object detection due to the ill-posedness of 2D to 3D mapping. Many methods explore multiple local depth clues such as object heights and keypoints and then formu…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  4. arXiv:2404.01654  [pdf, other]

    cs.CV cs.AI eess.IV eess.SP

    AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease

    Authors: Xiang Xiang, Zihan Zhang, Jing Ma, Yao Deng

    Abstract: Parkinson's Disease (PD) is the second most common neurodegenerative disorder. The existing assessment method for PD is usually the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to assess the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and high cost and low ef…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Technical report for AI WALKUP, an APP winning 3rd Prize of 2022 HUST GS AI Innovation and Design Competition

  5. arXiv:2404.00257   

    cs.CV cs.AI cs.LG eess.IV

    YOLOOC: YOLO-based Open-Class Incremental Object Detection with Novel Class Discovery

    Authors: Qian Wan, Xiang Xiang, Qinhao Zhou

    Abstract: Because of its use in practice, open-world object detection (OWOD) has gotten a lot of attention recently. The challenge is how can a model detect novel classes and then incrementally learn them without forgetting previously known classes. Previous approaches hinge on strongly-supervised or weakly-supervised novel-class data for novel-class detection, which may not apply to real applications. We c…

    Submitted 22 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Withdrawn because it was submitted without consent of the first author. In addition, this submission has some errors

  6. arXiv:2403.19979  [pdf, other]

    cs.CV cs.AI cs.LG

    Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer

    Authors: Yuwen Tan, Qinhao Zhou, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li

    Abstract: Class-incremental learning (CIL) aims to enable models to continuously learn new classes while overcoming catastrophic forgetting. The introduction of pre-trained models has brought new tuning paradigms to CIL. In this paper, we revisit different parameter-efficient tuning (PET) methods within the context of continual learning. We observe that adapter tuning demonstrates superiority over prompt-ba…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: To appear at CVPR 2024

  7. arXiv:2403.19962  [pdf, other]

    cs.CL cs.AI cs.LG

    Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning

    Authors: Qinhao Zhou, Zihan Zhang, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li

    Abstract: Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities, making them highly successful in a variety of tasks. However, when used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4. As intelligent agents, LLMs need to have the capabilities…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: To appear at NAACL 2024

  8. arXiv:2403.18816  [pdf, other]

    cs.CV

    Garment3DGen: 3D Garment Stylization and Texture Generation

    Authors: Nikolaos Sarafianos, Tuur Stuyck, Xiaoyu Xiang, Yilei Li, Jovan Popovic, Rakesh Ranjan

    Abstract: We introduce Garment3DGen, a new method to synthesize 3D garment assets from a base mesh given a single input image as guidance. Our proposed approach allows users to generate 3D textured clothes based on both real and synthetic images, such as those generated by text prompts. The generated assets can be directly draped and simulated on human bodies. We leverage the recent progress of image-to-3D d…

    Submitted 13 August, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Project Page and Code: https://nsarafianos.github.io/garment3dgen

  9. arXiv:2403.12317  [pdf, other]

    cs.CV

    EffiPerception: an Efficient Framework for Various Perception Tasks

    Authors: Xinhao Xiang, Simon Dräger, Jiawei Zhang

    Abstract: The accuracy-speed-memory trade-off is always the priority to consider for several computer vision perception tasks. Previous methods mainly focus on a single or small couple of these tasks, such as creating effective data augmentation, feature extractor, learning strategies, etc. These approaches, however, could be inherently task-specific: their proposed model's performance may depend on a spe…

    Submitted 18 March, 2024; originally announced March 2024.

  10. arXiv:2401.00135  [pdf]

    eess.IV cs.CV

    Deep Radon Prior: A Fully Unsupervised Framework for Sparse-View CT Reconstruction

    Authors: Shuo Xu, Yucheng Zhang, Gang Chen, Xincheng Xiang, Peng Cong, Yuewen Sun

    Abstract: Although sparse-view computed tomography (CT) has significantly reduced radiation dose, it also introduces severe artifacts which degrade the image quality. In recent years, deep learning-based methods for inverse problems have made remarkable progress and have become increasingly popular in CT reconstruction. However, most of these methods suffer several limitations: dependence on high-quality tr…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 11 pages, 12 figures, Journal paper

  11. arXiv:2312.14239  [pdf, other]

    cs.CV eess.IV

    PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

    Authors: Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram, Yuchen Fan, Christian Richardt, Ramesh Raskar, Rakesh Ranjan

    Abstract: 3D reconstruction from a single-view is challenging because of the ambiguity from monocular cues and lack of information about occluded regions. Neural radiance fields (NeRF), while popular for view synthesis and 3D reconstruction, are typically reliant on multi-view images. Existing methods for single-view 3D reconstruction with NeRF rely on either data priors to hallucinate views of occluded reg…

    Submitted 5 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project Page: https://platonerf.github.io/

  12. arXiv:2312.06736  [pdf, other]

    cs.CV

    SqueezeSAM: User friendly mobile interactive segmentation

    Authors: Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Lemeng Wu, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: The Segment Anything Model (SAM) has been a cornerstone in the field of interactive segmentation, propelling significant progress in generative AI, computational photography, and medical imaging. Despite its ability to process arbitrary user input and generate corresponding segmentation masks, SAM's 600 million parameter architecture, based on ViT-H, is not compatible with current mobile hardware…

    Submitted 20 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  13. arXiv:2312.06663  [pdf, other]

    cs.CV cs.GR

    CAD: Photorealistic 3D Generation via Adversarial Distillation

    Authors: Ziyu Wan, Despoina Paschalidou, Ian Huang, Hongyu Liu, Bokui Shen, Xiaoyu Xiang, Jing Liao, Leonidas Guibas

    Abstract: The increased demand for 3D data in AR/VR, robotics and gaming applications, gave rise to powerful generative pipelines capable of synthesizing high-quality 3D objects. Most of these models rely on the Score Distillation Sampling (SDS) algorithm to optimize a 3D representation such that the rendered image maintains a high likelihood as evaluated by a pre-trained diffusion model. However, finding a…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: http://raywzy.com/CAD/

  14. arXiv:2312.03640  [pdf, other]

    eess.IV cs.CV

    Training Neural Networks on RAW and HDR Images for Restoration Tasks

    Authors: Lei Luo, Alexandre Chapiro, Xiaoyu Xiang, Yuchen Fan, Rakesh Ranjan, Rafal Mantiuk

    Abstract: The vast majority of standard image and video content available online is represented in display-encoded color spaces, in which pixel values are conveniently scaled to a limited range (0-1) and the color distribution is approximately perceptually uniform. In contrast, both camera RAW and high dynamic range (HDR) images are often represented in linear color spaces, in which color values are linearl…

    Submitted 6 December, 2023; originally announced December 2023.

  15. arXiv:2312.00863  [pdf, other]

    cs.CV

    EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

    Authors: Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. A key component that drives the impressive performance for zero-shot transfer and high versatility is a super large Transformer model trained on the extensive high-quality SA-1B dataset. While beneficial, the huge computation cost of SAM model has limited its applications to wider real-world applications.…

    Submitted 1 December, 2023; originally announced December 2023.

  16. arXiv:2311.16892  [pdf, other]

    cs.IR

    Enhancing Item-level Bundle Representation for Bundle Recommendation

    Authors: Xiaoyu Du, Kun Qian, Yunshan Ma, Xinguang Xiang

    Abstract: Bundle recommendation approaches offer users a set of related items on a particular topic. The current state-of-the-art (SOTA) method utilizes contrastive learning to learn representations at both the bundle and item levels. However, due to the inherent difference between the bundle-level and item-level preferences, the item-level representations may not receive sufficient information from the bun…

    Submitted 28 November, 2023; originally announced November 2023.

  17. arXiv:2311.16727  [pdf]

    cond-mat.mtrl-sci cs.LG physics.atm-clus

    Sluggish and Chemically-Biased Interstitial Diffusion in Concentrated Solid Solution Alloys: Mechanisms and Methods

    Authors: Biao Xu, Haijun Fu, Shasha Huang, Shihua Ma, Yaoxu Xiong, Jun Zhang, Xuepeng Xiang, Wenyu Lu, Ji-Jung Kai, Shijun Zhao

    Abstract: Interstitial diffusion is a pivotal process that governs the phase stability and irradiation response of materials in non-equilibrium conditions. In this work, we study sluggish and chemically-biased interstitial diffusion in Fe-Ni concentrated solid solution alloys (CSAs) by combining machine learning (ML) and kinetic Monte Carlo (kMC), where ML is used to accurately and efficiently predict the m…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 30 pages, 9 figures

  18. arXiv:2311.03742  [pdf, other]

    cs.CV

    3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion

    Authors: Xinhao Xiang, Simon Dräger, Jiawei Zhang

    Abstract: Good 3D object detection performance from LiDAR-Camera sensors demands seamless feature alignment and fusion strategies. We propose the 3DifFusionDet framework in this paper, which structures 3D object detection as a denoising diffusion process from noisy 3D boxes to target boxes. In this framework, ground truth boxes diffuse in a random distribution for training, and the model learns to reverse t…

    Submitted 7 November, 2023; originally announced November 2023.

  19. arXiv:2311.03620  [pdf, other]

    cs.CV

    FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion

    Authors: Xinhao Xiang, Jiawei Zhang

    Abstract: For 3D object detection, both camera and lidar have been demonstrated to be useful sensory devices for providing complementary information about the same scenery with data representations in different modalities, e.g., 2D RGB image vs 3D point cloud. An effective representation learning and fusion of such multi-modal sensor data is necessary and critical for better 3D object detection performance.…

    Submitted 6 November, 2023; originally announced November 2023.

  20. arXiv:2310.18840  [pdf, other]

    cs.CV

    Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models

    Authors: Hai Wang, Xiaoyu Xiang, Yuchen Fan, Jing-Hao Xue

    Abstract: Personalized text-to-image (T2I) synthesis based on diffusion models has attracted significant attention in recent research. However, existing methods primarily concentrate on customizing subjects or styles, neglecting the exploration of global geometry. In this study, we propose an approach that focuses on the customization of 360-degree panoramas, which inherently possess global geometric proper…

    Submitted 7 November, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted by WACV 2024, Project Page: https://littlewhitesea.github.io/stitchdiffusion.github.io/

  21. arXiv:2310.16003  [pdf, other]

    cs.CV

    CVPR 2023 Text Guided Video Editing Competition

    Authors: Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola

    Abstract: Humans watch more than a billion hours of video per day. Most of this video was edited manually, which is a tedious process. However, AI-enabled video-generation and video-editing is on the rise. Building on text-to-image models like Stable Diffusion and Imagen, generative AI has improved dramatically on video tasks. But it's hard to evaluate progress in these video tasks because there is no stand…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Project page: https://sites.google.com/view/loveucvpr23/track4

  22. arXiv:2310.05549  [pdf, other]

    stat.ML cs.LG

    A New Transformation Approach for Uplift Modeling with Binary Outcome

    Authors: Kun Li, Jiang Tian, Xiaojia Xiang

    Abstract: Uplift modeling has been used effectively in fields such as marketing and customer retention, to target those customers who are more likely to respond due to the campaign or treatment. Essentially, it is a machine learning technique that predicts the gain from performing some action with respect to not taking it. A popular class of uplift models is the transformation approach that redefines the ta…

    Submitted 9 October, 2023; originally announced October 2023.

  23. arXiv:2306.15161  [pdf, other]

    eess.AS cs.SD

    Wespeaker baselines for VoxSRC2023

    Authors: Shuai Wang, Chengdong Liang, Xu Xiang, Bing Han, Zhengyang Chen, Hongji Wang, Wen Ding

    Abstract: This report showcases the results achieved using the wespeaker toolkit for the VoxSRC2023 Challenge. Our aim is to provide participants, especially those with limited experience, with clear and straightforward guidelines to develop their initial systems. Via well-structured recipes and strong results, we hope to offer an accessible and good enough start point for all interested individuals. In thi…

    Submitted 28 June, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

  24. arXiv:2304.10537  [pdf, other]

    cs.CV cs.GR

    Learning Neural Duplex Radiance Fields for Real-Time View Synthesis

    Authors: Ziyu Wan, Christian Richardt, Aljaž Božič, Chao Li, Vijay Rengarajan, Seonghyeon Nam, Xiaoyu Xiang, Tuotuo Li, Bo Zhu, Rakesh Ranjan, Jing Liao

    Abstract: Neural radiance fields (NeRFs) enable novel view synthesis with unprecedented visual quality. However, to render photorealistic images, NeRFs require hundreds of deep multilayer perceptron (MLP) evaluations - for each pixel. This is prohibitively expensive and makes real-time rendering infeasible, even on powerful modern GPUs. In this paper, we propose a novel approach to distill and bake NeRFs in…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. Project page: http://raywzy.com/NDRF

  25. arXiv:2303.00748  [pdf, other]

    cs.CV

    Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

    Authors: Yawei Li, Yuchen Fan, Xiaoyu Xiang, Denis Demandolx, Rakesh Ranjan, Radu Timofte, Luc Van Gool

    Abstract: The aim of this paper is to propose a mechanism to efficiently and explicitly model image hierarchies in the global, regional, and local range for image restoration. To achieve that, we start by analyzing two important properties of natural images including cross-scale similarity and anisotropic image features. Inspired by that, we propose the anchored stripe self-attention which achieves a good b…

    Submitted 25 May, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023. 12 pages, 7 figures, 11 tables

  26. arXiv:2302.01563  [pdf, other]

    cs.LG

    Causal Inference Based Single-branch Ensemble Trees For Uplift Modeling

    Authors: Fanglan Zheng, Menghan Wang, Kun Li, Jiang Tian, Xiaojia Xiang

    Abstract: In this manuscript (ms), we propose causal inference based single-branch ensemble trees for uplift modeling, namely CIET. Different from standard classification methods for predictive probability modeling, CIET aims to achieve the change in the predictive probability of outcome caused by an action or a treatment. According to our CIET, two partition criteria are specifically designed to maximize t…

    Submitted 3 February, 2023; originally announced February 2023.

  27. arXiv:2212.03961  [pdf, other]

    cs.CV

    FSID: Fully Synthetic Image Denoising via Procedural Scene Generation

    Authors: Gyeongmin Choe, Beibei Du, Seonghyeon Nam, Xiaoyu Xiang, Bo Zhu, Rakesh Ranjan

    Abstract: For low-level computer vision and image processing ML tasks, training on large datasets is critical for generalization. However, the standard practice of relying on real-world images primarily from the Internet comes with image quality, scalability, and privacy issues, especially in commercial contexts. To address this, we have developed a procedural synthetic data generation pipeline and dataset…

    Submitted 7 December, 2022; originally announced December 2022.

  28. arXiv:2211.00815  [pdf, other]

    cs.SD eess.AS

    Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

    Authors: Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian

    Abstract: Many speaker recognition challenges have been held to assess the speaker verification system in the wild and probe the performance limit. Voxceleb Speaker Recognition Challenge (VoxSRC), based on the voxceleb, is the most popular. Besides, another challenge called CN-Celeb Speaker Recognition Challenge (CNSRC) is also held this year, which is based on the Chinese celebrity multi-genre dataset CN-C…

    Submitted 1 June, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted by InterSpeech 2023

  29. arXiv:2210.17016  [pdf, other]

    cs.SD eess.AS

    Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

    Authors: Hongji Wang, Chengdong Liang, Shuai Wang, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian

    Abstract: Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is fixed-dimensional vector representation, i.e., speaker embedding. This paper introduces a research and production oriented speaker embedding learning toolkit, Wespeaker. Wespeaker contains the implementation of scalable data management, state-of-the-art speak…

    Submitted 1 November, 2022; v1 submitted 30 October, 2022; originally announced October 2022.

  30. arXiv:2209.09076  [pdf, other]

    cs.SD eess.AS

    SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022

    Authors: Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian

    Abstract: This report describes the SJTU-AISPEECH system for the Voxceleb Speaker Recognition Challenge 2022. For track1, we implemented two kinds of systems, the online system and the offline system. Different ResNet-based backbones and loss functions are explored. Our final fusion system achieved 3rd place in track1. For track3, we implemented statistic adaptation and jointly training based domain adaptat…

    Submitted 20 September, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: System description of VoxSRC 2022

  31. arXiv:2208.08721  [pdf, other]

    cs.CV

    Temporal Up-Sampling for Asynchronous Events

    Authors: Xijie Xiang, Lin Zhu, Jianing Li, Yonghong Tian, Tiejun Huang

    Abstract: The event camera is a novel bio-inspired vision sensor. When the brightness change exceeds the preset threshold, the sensor generates events asynchronously. The number of valid events directly affects the performance of event-based tasks, such as reconstruction, detection, and recognition. However, when in low-brightness or slow-moving scenes, events are often sparse and accompanied by noise, whic…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: 8 pages, 7 figures, conference

    Journal ref: ICME 2022

  32. arXiv:2207.13069  [pdf]

    cs.CR cs.DC

    Spatial data sharing with secure multi-party computation for exploratory spatial data analysis

    Authors: Shuo Shen, Xinyan Zhu, Yanlei Ma, XIe Xiang, Sun Lilin, Xie Hongjun, An Rui

    Abstract: Spatial data sharing plays a significant role in opening data research and promoting government agency transparency. However, valuable spatial data, like high-precision geographic information and personal traffic records, cannot be made public because they may incur leakage risks such as intrusion, theft, and the unauthorised sale of proprietary information. When participants with confidential dat…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: 16 Pages, 5 Figures, 6 Tables

  33. arXiv:2206.02146  [pdf, other]

    cs.CV eess.IV

    Recurrent Video Restoration Transformer with Guided Deformable Attention

    Authors: Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, Luc Van Gool

    Abstract: Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusi…

    Submitted 12 November, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted by NeurIPS 2022. Code: https://github.com/JingyunLiang/RVRT

  34. arXiv:2205.10619  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A Pilot Study of Relating MYCN-Gene Amplification with Neuroblastoma-Patient CT Scans

    Authors: Zihan Zhang, Xiang Xiang, Xuehua Peng, Jianbo Shao

    Abstract: Neuroblastoma is one of the most common cancers in infants, and the initial diagnosis of this disease is difficult. At present, the MYCN gene amplification (MNA) status is detected by invasive pathological examination of tumor samples. This is time-consuming and may have a hidden impact on children. To handle this problem, we adopt multiple machine learning (ML) algorithms to predict the presence…

    Submitted 21 May, 2022; originally announced May 2022.

  35. arXiv:2205.10611  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Lightweight Human Pose Estimation Using Heatmap-Weighting Loss

    Authors: Shiqi Li, Xiang Xiang

    Abstract: Recent research on human pose estimation exploits complex structures to improve performance on benchmark datasets, ignoring the resource overhead and inference speed when the model is actually deployed. In this paper, we lighten the computation cost and parameters of the deconvolution head network in SimpleBaseline and introduce an attention mechanism that utilizes original, inter-level, and intra…

    Submitted 21 May, 2022; originally announced May 2022.

    Comments: 7 pages, 5 figures

  36. arXiv:2205.10490  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Aligning Logits Generatively for Principled Black-Box Knowledge Distillation

    Authors: Jing Ma, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li

    Abstract: Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server. B2KD faces challenges such as limited Internet exchange and edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization di…

    Submitted 30 March, 2024; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: To appear at CVPR 2024; significantly rewritten with extra experiments since the preliminary report

  37. arXiv:2203.14863  [pdf, other]

    cs.CV cs.MM

    HIME: Efficient Headshot Image Super-Resolution with Multiple Exemplars

    Authors: Xiaoyu Xiang, Jon Morton, Fitsum A Reda, Lucas Young, Federico Perazzi, Rakesh Ranjan, Amit Kumar, Andrea Colaco, Jan Allebach

    Abstract: A promising direction for recovering the lost information in low-resolution headshot images is utilizing a set of high-resolution exemplars from the same identity. Complementary images in the reference set can improve the generated headshot quality across many different views and poses. However, it is challenging to make the best use of multiple exemplars: the quality and alignment of each exempla…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Technical Report

  38. arXiv:2203.08140  [pdf, other]

    cs.CV cs.AI cs.MM

    Learning Spatio-Temporal Downsampling for Effective Video Upscaling

    Authors: Xiaoyu Xiang, Yapeng Tian, Vijay Rengarajan, Lucas Young, Bo Zhu, Rakesh Ranjan

    Abstract: Downsampling is one of the most basic image processing operations. Improper spatio-temporal downsampling applied on videos can cause aliasing issues such as moiré patterns in space and the wagon-wheel effect in time. Consequently, the inverse task of upscaling a low-resolution, low frame-rate video in space and time becomes a challenging ill-posed problem due to information loss and aliasing artif…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: Main paper: 13 pages, 8 figures; appendix: 8 pages, 10 figures

    ACM Class: I.2; I.4.3; I.4.4

  39. Hierarchical Memory Learning for Fine-Grained Scene Graph Generation

    Authors: Youming Deng, Yansheng Li, Yongjun Zhang, Xiang Xiang, Jian Wang, Jingdong Chen, Jiayi Ma

    Abstract: As far as Scene Graph Generation (SGG), coarse and fine predicates mix in the dataset due to the crowd-sourced labeling, and the long-tail problem is also pronounced. Given this tricky situation, many existing SGG methods treat the predicates equally and learn the model under the supervision of mixed-granularity predicates in one stage, leading to relatively coarse predictions. In order to allevia…

    Submitted 21 October, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: ECCV 2022

  40. arXiv:2203.06841  [pdf, other]

    cs.CV

    STDAN: Deformable Attention Network for Space-Time Video Super-Resolution

    Authors: Hai Wang, Xiaoyu Xiang, Yapeng Tian, Wenming Yang, Qingmin Liao

    Abstract: The target of space-time video super-resolution (STVSR) is to increase the spatial-temporal resolution of low-resolution (LR) and low frame rate (LFR) videos. Recent approaches based on deep learning have made significant improvements, but most of them only use two adjacent frames, that is, short-term features, to synthesize the missing frame embedding, which cannot fully explore the information f…

    Submitted 14 July, 2022; v1 submitted 13 March, 2022; originally announced March 2022.

  41. arXiv:2203.00452  [pdf, other]

    cs.CV cs.LG

    Long-Tailed Classification with Gradual Balanced Loss and Adaptive Feature Generation

    Authors: Zihan Zhang, Xiang Xiang

    Abstract: The real-world data distribution is essentially long-tailed, which poses great challenge to the deep model. In this work, we propose a new method, Gradual Balanced Loss and Adaptive Feature Generator (GLAG) to alleviate imbalance. GLAG first learns a balanced and robust feature model with Gradual Balanced Loss, then fixes the feature model and augments the under-represented tail classes on the fea…

    Submitted 27 February, 2022; originally announced March 2022.

  42. arXiv:2202.07356  [pdf, other]

    stat.ML cs.LG

    Realistic Counterfactual Explanations with Learned Relations

    Authors: Xintao Xiang, Artem Lenskiy

    Abstract: Many existing methods of counterfactual explanations ignore the intrinsic relationships between data attributes and thus fail to generate realistic counterfactuals. Moreover, the existing models that account for relationships require domain knowledge, which limits their applicability in complex real-world applications. In this paper, we propose a novel approach to realistic counterfactual explanat…

    Submitted 29 May, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

  43. arXiv:2202.01256  [pdf, other]

    cs.AI

    Introduction to The Dynamic Pickup and Delivery Problem Benchmark -- ICAPS 2021 Competition

    Authors: Jianye Hao, Jiawen Lu, Xijun Li, Xialiang Tong, Xiang Xiang, Mingxuan Yuan, Hankz Hankui Zhuo

    Abstract: The Dynamic Pickup and Delivery Problem (DPDP) is an essential problem within the logistics domain. So far, research on this problem has mainly focused on using artificial data which fails to reflect the complexity of real-world problems. In this draft, we would like to introduce a new benchmark from real business scenarios as well as a simulator supporting the dynamic evaluation. The benchmark an…

    Submitted 18 January, 2022; originally announced February 2022.

  44. Low-Interception Waveform: To Prevent the Recognition of Spectrum Waveform Modulation via Adversarial Examples

    Authors: Haidong Xie, Jia Tan, Xiaoying Zhang, Nan Ji, Haihua Liao, Zuguo Yu, Xueshuang Xiang, Naijin Liu

    Abstract: Deep learning is applied to many complex tasks in the field of wireless communication, such as modulation recognition of spectrum waveforms, because of its convenience and efficiency. This leads to the problem of a malicious third party using a deep learning model to easily recognize the modulation format of the transmitted waveform. Some existing works address this problem directly using the conc…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: 4 pages, 4 figures, published in 2021 34th General Assembly and Scientific Symposium of the International Union of Radio Science, URSI GASS 2021

    Journal ref: URSI GASS, 2021, pp. 1-4

  45. Adversarial Jamming for a More Effective Constellation Attack

    Authors: Haidong Xie, Yizhou Xu, Yuanqing Chen, Nan Ji, Shuai Yuan, Naijin Liu, Xueshuang Xiang

    Abstract: The common jamming mode in wireless communication is band barrage jamming, which is controllable and difficult to resist. Although this method is simple to implement, it is obviously not the best jamming waveform. Therefore, based on the idea of adversarial examples, we propose the adversarial jamming waveform, which can independently optimize and find the best jamming waveform. We attack QAM with…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: 3 pages, 2 figures, published in The 13th International Symposium on Antennas, Propagation and EM Theory (ISAPE 2021)

  46. arXiv:2111.14806  [pdf, other]

    cs.CV cs.LG

    Coarse-To-Fine Incremental Few-Shot Learning

    Authors: Xiang Xiang, Yuwen Tan, Qian Wan, Jing Ma

    Abstract: Different from fine-tuning models pre-trained on a large-scale dataset of preset classes, class-incremental learning (CIL) aims to recognize novel classes over time without forgetting pre-trained classes. However, a given model will be challenged by test images with finer-grained classes, e.g., a basenji is at most recognized as a dog. Such images form a new training set (i.e., support set) so tha…

    Submitted 24 November, 2021; originally announced November 2021.

  47. arXiv:2111.07032  [pdf, other]

    cs.LG

    Learning to Evolve on Dynamic Graphs

    Authors: Xintao Xiang, Tiancheng Huang, Donglin Wang

    Abstract: Representation learning in dynamic graphs is a challenging problem because the graph topology and node features vary over time. This requires the model to be able to effectively capture both graph topology information and temporal information. Most existing works are built on recurrent neural networks (RNNs), which are used to extract temporal information of dynamic graphs, and thus they…

    Submitted 12 November, 2021; originally announced November 2021.

  48. arXiv:2108.04541  [pdf, other]

    cs.AI

    Accelerating Evolutionary Neural Architecture Search via Multi-Fidelity Evaluation

    Authors: Shangshang Yang, Ye Tian, Xiaoshu Xiang, Shichen Peng, Xingyi Zhang

    Abstract: Evolutionary neural architecture search (ENAS) has recently received increasing attention by effectively finding high-quality neural architectures, which however consumes high computational cost by training the architecture encoded by each individual for complete epochs in individual evaluation. Numerous ENAS approaches have been developed to reduce the evaluation cost, but it is often difficult f…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: 15 pages, 11 figures

    MSC Class: 68W50; 68T07 ACM Class: I.2.6

  49. arXiv:2104.09461  [pdf]

    cs.CV cs.AI

    Entropy-based Optimization via A* Algorithm for Parking Space Recommendation

    Authors: Xin Wei, Runqi Qiu, Houyu Yu, Yurun Yang, Haoyu Tian, Xiang Xiang

    Abstract: This paper addresses the path planning problems for recommending parking spaces, given the difficulties of identifying the optimal route to vacant parking spaces and the shortest time to leave the parking space. Our optimization approach is based on the entropy method and realized by the A* algorithm. Experiments have shown that the combination of A* and the entropy value induces the optimal…

    Submitted 19 April, 2021; originally announced April 2021.

  50. arXiv:2104.07473  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM eess.IV

    Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution

    Authors: Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

    Abstract: In this paper, we address the space-time video super-resolution, which aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence. A naïve method is to decompose it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). Nevertheless, temporal interpolation and spatial upscaling are intra-related in t…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Journal version of "Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution" (CVPR 2020). 14 pages, 14 figures