
Showing 1–50 of 64 results for author: Guan, W

  1. arXiv:2407.12709  [pdf, other]

    cs.CV

    MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

    Authors: Leyang Shen, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie

    Abstract: Multimodal large language models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, a generalist MLLM typically underperforms compared with a specialist MLLM on most VL tasks, which can be attributed to task interference. In this paper, we propose a mixture of multimodal experts (MoME) to mitigate task interference and obtain a generalist MLLM. Our MoM…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Github: https://github.com/JiuTian-VL/MoME

  2. arXiv:2406.15781  [pdf, other]

    cs.CL

    DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

    Authors: Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian

    Abstract: Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anoma…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.08203  [pdf, other]

    eess.AS cs.SD

    LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

    Authors: Wenhao Guan, Kaidi Wang, Wangjin Zhou, Yang Wang, Feng Deng, Hui Wang, Lin Li, Qingyang Hong, Yong Qin

    Abstract: Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of the method is accompanied by the extensive number of sampling steps, leading to an extended synthesis time necessary for generating high-quality audio. Previous…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  4. arXiv:2406.05985  [pdf, other]

    cs.RO

    LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding

    Authors: Jiawei Hou, Wenhao Guan, Xiangyang Xue, Taiping Zeng

    Abstract: Spatial cognition empowers animals with remarkably efficient navigation abilities, largely depending on the scene-level understanding of spatial environments. Recently, it has been found that a neural population in the postrhinal cortex of rat brains is more strongly tuned to the spatial layout rather than objects in a scene. Inspired by the representations of spatial layout in local scenes to enc…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  5. arXiv:2405.16279  [pdf, other]

    physics.ins-det cs.AI

    AI-Assisted Detector Design for the EIC (AID(2)E)

    Authors: M. Diefenthaler, C. Fanelli, L. O. Gerlach, W. Guan, T. Horn, A. Jentsch, M. Lin, K. Nagai, H. Nayak, C. Pecar, K. Suresh, A. Vossen, T. Wang, T. Wenaus

    Abstract: Artificial Intelligence is poised to transform the design of complex, large-scale detectors like the ePIC at the future Electron Ion Collider. Featuring a central detector with additional detecting systems in the far forward and far backward regions, the ePIC experiment incorporates numerous design parameters and objectives, including performance, physics reach, and cost, constrained by mechanical…

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 11 pages, 4 figures, AI4EIC 2023 proceeding

  6. arXiv:2404.18426  [pdf, other]

    cs.CV

    Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images

    Authors: Wenbin Guan, Zijiu Yang, Xiaohong Wu, Liqiong Chen, Feng Huang, Xiaohai He, Honggang Chen

    Abstract: Presently, the task of few-shot object detection (FSOD) in remote sensing images (RSIs) has become a focal point of attention. Numerous few-shot detectors, particularly those based on two-stage detectors, face challenges when dealing with the multiscale complexities inherent in RSIs. Moreover, these detectors present impractical characteristics in real-world applications, mainly due to their unwie…

    Submitted 16 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  7. arXiv:2404.16555  [pdf, other]

    cs.IR

    MMGRec: Multimodal Generative Recommendation with Transformer Model

    Authors: Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie

    Abstract: Multimodal recommendation aims to recommend user-preferred candidates based on her/his historically interacted items and associated multimodal information. Previous studies commonly employ an embed-and-retrieve paradigm: learning user and item representations in the same embedding space, then retrieving similar candidate items for a user via embedding inner product. However, this paradigm suffers…

    Submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.03179  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

    Authors: Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng

    Abstract: Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio…

    Submitted 3 April, 2024; originally announced April 2024.

  9. arXiv:2403.02710  [pdf, other]

    cs.CV cs.RO

    FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

    Authors: Jiawei Hou, Xiaoyan Li, Wenhao Guan, Gang Zhang, Di Feng, Yuheng Du, Xiangyang Xue, Jian Pu

    Abstract: In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for more comprehensive understandings of 3D scenes compared with traditional perception tasks, such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Recent researchers have extensively explored various aspects of this task, including view transformation techniques, ground-truth label…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted by ICRA 2024

  10. arXiv:2401.04312  [pdf, other]

    cs.IR

    Prompt-based Multi-interest Learning Method for Sequential Recommendation

    Authors: Xue Dong, Xuemeng Song, Tongliang Liu, Weili Guan

    Abstract: Multi-interest learning method for sequential recommendation aims to predict the next item according to user multi-faceted interests given the user historical interactions. Existing methods mainly consist of a multi-interest extractor that embeds the multiple user interests based on the user interactions, and a multi-interest aggregator that aggregates the learned multi-interest embeddings to deri…

    Submitted 28 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  11. arXiv:2312.11911  [pdf, other]

    cs.CV cs.RO

    EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping

    Authors: Weipeng Guan, Peiyu Chen, Huibin Zhao, Yu Wang, Peng Lu

    Abstract: Event cameras are bio-inspired, motion-activated sensors that demonstrate substantial potential in handling challenging situations, such as motion blur and high-dynamic range. In this paper, we proposed EVI-SAM to tackle the problem of 6 DoF pose tracking and 3D reconstruction using monocular event camera. A novel event-based hybrid tracking framework is designed to estimate the pose, leveraging t…

    Submitted 23 May, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  12. arXiv:2312.10687  [pdf, other]

    eess.AS cs.SD

    MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis

    Authors: Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong

    Abstract: The style transfer task in Text-to-Speech refers to the process of transferring style information into text content to generate corresponding speech with a specific style. However, most existing style transfer approaches are either based on fixed emotional labels or reference speech clips, which cannot achieve flexible style transfer. Recently, some methods have adopted text descriptions to guide…

    Submitted 31 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI2024

  13. arXiv:2312.04921  [pdf, other]

    astro-ph.IM cs.DC

    Integrating the PanDA Workload Management System with the Vera C. Rubin Observatory

    Authors: Edward Karavakis, Wen Guan, Zhaoyu Yang, Tadashi Maeno, Torre Wenaus, Jennifer Adelman-McCarthy, Fernando Barreiro Megino, Kaushik De, Richard Dubois, Michelle Gower, Tim Jenness, Alexei Klimentov, Tatiana Korchuganova, Mikolaj Kowalik, Fa-Hui Lin, Paul Nilsson, Sergey Padolski, Wei Yang, Shuwei Ye

    Abstract: The Vera C. Rubin Observatory will produce an unprecedented astronomical data set for studies of the deep and dynamic universe. Its Legacy Survey of Space and Time (LSST) will image the entire southern sky every three to four days and produce tens of petabytes of raw image data and associated calibration data over the course of the experiment's run. More than 20 terabytes of data must be stored ev…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 8 pages, 3 figures, 26th International Conference on Computing in High Energy & Nuclear Physics

  14. arXiv:2311.14326  [pdf, other]

    physics.soc-ph cs.SI

    Temporal link prediction methods based on behavioral synchrony

    Authors: Yueran Duan, Qing Guan, Petter Holme, Yacheng Yang, Wei Guan

    Abstract: Link prediction -- to identify potential missing or spurious links in temporal network data -- has typically been based on local structures, ignoring long-term temporal effects. In this chapter, we propose link-prediction methods based on agents' behavioral synchrony. Since synchronous behavior signals similarity and similar agents are known to have a tendency to connect in the future, behavioral…

    Submitted 24 November, 2023; originally announced November 2023.

    Journal ref: Temporal Network Theory (2nd ed.), Petter Holme and Jari Saramaki, eds., (Springer, Cham, 2023), pp. 381-402

  15. arXiv:2311.02327  [pdf, other]

    cs.RO cs.DB

    ECMD: An Event-Centric Multisensory Driving Dataset for SLAM

    Authors: Peiyu Chen, Weipeng Guan, Feng Huang, Yihan Zhong, Weisong Wen, Li-Ta Hsu, Peng Lu

    Abstract: Leveraging multiple sensors enhances complex environmental perception and increases resilience to varying luminance conditions and high-speed motion patterns, achieving precise localization and mapping. This paper proposes ECMD, an event-centric multisensory dataset containing 81 sequences and covering over 200 km of various challenging driving scenarios including high-speed motion, repetitive sc…

    Submitted 4 November, 2023; originally announced November 2023.

  16. arXiv:2310.07259  [pdf, other]

    cs.CV cs.AI

    Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog

    Authors: Haoyu Zhang, Meng Liu, Yaowei Wang, Da Cao, Weili Guan, Liqiang Nie

    Abstract: In contrast to conventional visual question answering, video-grounded dialog necessitates a profound understanding of both dialog history and video content for accurate response generation. Despite commendable progress made by existing approaches, they still face the challenges of incrementally understanding complex dialog history and assimilating video information. In response to these challenges…

    Submitted 22 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  17. arXiv:2309.17056  [pdf, other]

    cs.SD eess.AS

    ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech

    Authors: Wenhao Guan, Qi Su, Haodong Zhou, Shiyu Miao, Xingjia Xie, Lin Li, Qingyang Hong

    Abstract: The diffusion models including Denoising Diffusion Probabilistic Models (DDPM) and score-based generative models have demonstrated excellent performance in speech synthesis tasks. However, its effectiveness comes at the cost of numerous sampling steps, resulting in prolonged sampling time required to synthesize high-quality speech. This drawback hinders its practical applicability in real-world sc…

    Submitted 31 January, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP2024

  18. arXiv:2308.11186  [pdf, other]

    cs.CV

    Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

    Authors: Baoshuo Kan, Teng Wang, Wenpeng Lu, Xiantong Zhen, Weili Guan, Feng Zheng

    Abstract: Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated great capacity of transfer learning. Recently, learnable prompts achieve state-of-the-art performance, which however are prone to overfit to seen classes, failing to generalize to unseen classes. In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language mode…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  19. arXiv:2308.01147  [pdf, other]

    cs.CV cs.MM eess.IV

    Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation

    Authors: Guojin Zhong, Jin Yuan, Pan Wang, Kailun Yang, Weili Guan, Zhiyong Li

    Abstract: The recently rising markup-to-image generation poses greater challenges as compared to natural image generation, due to its low tolerance for errors as well as the complex sequence and context correlations between markup and rendered image. This paper proposes a novel model named "Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment" (FSA-CDM), which introduces contrastive posit…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023. The code will be released at https://github.com/zgj77/FSACDM

  20. arXiv:2307.14061  [pdf, other]

    cs.CV

    Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models

    Authors: Dong Lu, Zhiqiang Wang, Teng Wang, Weili Guan, Hongchang Gao, Feng Zheng

    Abstract: Vision-language pre-training (VLP) models have shown vulnerability to adversarial examples in multimodal tasks. Furthermore, malicious adversaries can be deliberately transferred to attack other black-box models. However, existing work has mainly focused on investigating white-box attacks. In this paper, we present the first study to investigate the adversarial transferability of recent VLP models…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: To appear in ICCV 2023

  21. arXiv:2307.08235  [pdf, other]

    cs.LG

    HeroLT: Benchmarking Heterogeneous Long-Tailed Learning

    Authors: Haohui Wang, Weijie Guan, Jianpeng Chen, Zi Wang, Dawei Zhou

    Abstract: Long-tailed data distributions are prevalent in a variety of domains, including finance, e-commerce, biomedical science, and cyber security. In such scenarios, the performance of machine learning models is often dominated by the head categories, while the learning of tail categories is significantly inadequate. Given abundant studies conducted to alleviate the issue, this work aims to provide a sy…

    Submitted 17 July, 2023; originally announced July 2023.

  22. arXiv:2306.11249  [pdf, other]

    cs.CV cs.AI

    OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning

    Authors: Cheng Tan, Siyuan Li, Zhangyang Gao, Wenfei Guan, Zedong Wang, Zicheng Liu, Lirong Wu, Stan Z. Li

    Abstract: Spatio-temporal predictive learning is a learning paradigm that enables models to learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner. Despite remarkable progress in recent years, a lack of systematic understanding persists due to the diverse settings, complex implementation, and difficult reproducibility. Without standardization, compar…

    Submitted 17 October, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: Accepted by NeurIPS 2023. 33 pages, 17 figures, 19 tables. Under review. For more details, please refer to https://github.com/chengtan9907/OpenSTL

  23. arXiv:2306.04301  [pdf, other]

    cs.SD eess.AS

    Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge

    Authors: Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li

    Abstract: With the demand for autonomous control and personalized speech generation, the style control and transfer in Text-to-Speech (TTS) is becoming more and more important. In this paper, we propose a new TTS system that can perform style transfer with interpretability and high fidelity. Firstly, we design a TTS system that combines variational autoencoder (VAE) and diffusion refiner to get refined mel-…

    Submitted 11 July, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted at Interspeech2023

  24. arXiv:2305.09979  [pdf, other]

    cs.MM

    Self-Training Boosted Multi-Faceted Matching Network for Composed Image Retrieval

    Authors: Haokun Wen, Xuemeng Song, Jianhua Yin, Jianlong Wu, Weili Guan, Liqiang Nie

    Abstract: The composed image retrieval (CIR) task aims to retrieve the desired target image for a given multimodal query, i.e., a reference image with its corresponding modification text. The key limitations encountered by existing efforts are two aspects: 1) ignoring the multi-faceted query-target matching factors; 2) ignoring the potential unlabeled reference-target image pairs in existing benchmark datas…

    Submitted 17 May, 2023; originally announced May 2023.

  25. Check Belief Propagation Decoding of LDPC Codes

    Authors: Wu Guan, Liping Liang

    Abstract: Variant belief propagation (BP) algorithms are applied to low-density parity-check (LDPC) codes. However, conventional decoders suffer from a large resource consumption due to gathering messages from all the neighbour variable-nodes and/or check-nodes through cumulative calculations. In this paper, a check-belief propagation (CBP) decoding algorithm is proposed. Check-belief is used as the probabi…

    Submitted 23 August, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: accepted by IEEE transactions on communications

  26. arXiv:2305.01898  [pdf, other]

    cs.AI cs.RO cs.SE

    VSRQ: Quantitative Assessment Method for Safety Risk of Vehicle Intelligent Connected System

    Authors: Tian Zhang, Wenshan Guan, Hao Miao, Xiujie Huang, Zhiquan Liu, Chaonan Wang, Quanlong Guan, Liangda Fang, Zhifei Duan

    Abstract: The field of intelligent connected in modern vehicles continues to expand, and the functions of vehicles become more and more complex with the development of the times. This has also led to an increasing number of vehicle vulnerabilities and many safety issues. Therefore, it is particularly important to identify high-risk vehicle intelligent connected systems, because it can inform security person…

    Submitted 3 May, 2023; originally announced May 2023.

  27. arXiv:2304.08078  [pdf, other]

    cs.CV

    Collaborative Feature Learning for Fine-grained Facial Forgery Detection and Segmentation

    Authors: Weinan Guan, Wei Wang, Jing Dong, Bo Peng, Tieniu Tan

    Abstract: Detecting maliciously falsified facial images and videos has attracted extensive attention from digital-forensics and computer-vision communities. An important topic in manipulation detection is the localization of the fake regions. Previous work related to forgery detection mostly focuses on the entire faces. However, recent forgery methods have developed to edit important facial components while…

    Submitted 17 April, 2023; originally announced April 2023.

  28. arXiv:2304.04400  [pdf, other]

    cs.CV

    Identity-Guided Collaborative Learning for Cloth-Changing Person Reidentification

    Authors: Zan Gao, Shenxun Wei, Weili Guan, Lei Zhu, Meng Wang, Shenyong Chen

    Abstract: Cloth-changing person reidentification (ReID) is a newly emerging research topic that is aimed at addressing the issues of large feature variations due to cloth-changing and pedestrian view/pose changes. Although significant progress has been achieved by introducing extra information (e.g., human contour sketching information, human body keypoints, and 3D human information), cloth-changing person…

    Submitted 17 November, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

  29. arXiv:2302.13386  [pdf, other]

    stat.AP cs.LG

    NBA2Vec: Dense feature representations of NBA players

    Authors: Webster Guan, Nauman Javed, Peter Lu

    Abstract: Understanding a player's performance in a basketball game requires an evaluation of the player in the context of their teammates and the opposing lineup. Here, we present NBA2Vec, a neural network model based on Word2Vec which extracts dense feature representations of each player by predicting play outcomes without the use of hand-crafted heuristics or aggregate statistical measures. Specifically,…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: 8 pages, 9 figures, 3 tables. Submitted to the 2018 NBA Hackathon

  30. arXiv:2302.07577  [pdf, other]

    cs.CV

    Efficient Teacher: Semi-Supervised Object Detection for YOLOv5

    Authors: Bowen Xu, Mingtao Chen, Wenlong Guan, Lulu Hu

    Abstract: Semi-Supervised Object Detection (SSOD) has been successful in improving the performance of both R-CNN series and anchor-free detectors. However, one-stage anchor-based detectors lack the structure to generate high-quality or flexible pseudo labels, leading to serious inconsistency problems in SSOD. In this paper, we propose the Efficient Teacher framework for scalable and effective one-stage anch…

    Submitted 13 March, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: 10 pages

  31. ESVIO: Event-based Stereo Visual Inertial Odometry

    Authors: Peiyu Chen, Weipeng Guan, Peng Lu

    Abstract: Event cameras that asynchronously output low-latency event streams provide great opportunities for state estimation under challenging situations. Despite event-based visual odometry having been extensively studied in recent years, most of them are based on monocular and few research on stereo event vision. In this paper, we present ESVIO, the first event-based stereo visual-inertial odometry, whic…

    Submitted 10 March, 2024; v1 submitted 26 December, 2022; originally announced December 2022.

    Journal ref: IEEE Robotics and Automation Letters (Volume: 8, Issue: 6, June 2023)

  32. arXiv:2212.11761  [pdf]

    cs.NI

    Optical Bar Code for Internet Access Application based on Optical camera communication and Bluetooth Control

    Authors: Shangsheng Wen, Manxi Liu, Yanyi Chen, Yirong Chen, Futong An, Yingcong Chen, Weipeng Guan

    Abstract: We demonstrate an internet access application based on optical camera communication and Bluetooth. The app will access the website while the camera in the phone receives the optical signal. © 2022 The Author(s)

    Submitted 31 October, 2022; originally announced December 2022.

    Comments: 3 pages, 1 figure

  33. A Dimension-Augmented Physics-Informed Neural Network (DaPINN) with High Level Accuracy and Efficiency

    Authors: Weilong Guan, Kaihan Yang, Yinsheng Chen, Zhong Guan

    Abstract: Physics-informed neural networks (PINNs) have been widely applied in different fields due to their effectiveness in solving partial differential equations (PDEs). However, the accuracy and efficiency of PINNs need to be considerably improved for scientific and commercial use. To address this issue, we systematically propose a novel dimension-augmented physics-informed neural network (DaPINN), whic…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: 33 pages, 12 figures

  34. arXiv:2210.07017  [pdf, other]

    cs.CL

    ComSearch: Equation Searching with Combinatorial Strategy for Solving Math Word Problems with Weak Supervision

    Authors: Qianying Liu, Wenyu Guan, Jianhao Shen, Fei Cheng, Sadao Kurohashi

    Abstract: Previous studies have introduced a weakly-supervised paradigm for solving math word problems requiring only the answer value annotation. While these methods search for correct value equation candidates as pseudo labels, they search among a narrow sub-space of the enormous equation space. To address this problem, we propose a novel search algorithm with combinatorial strategy ComSearch, wh…

    Submitted 7 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: EACL 2023 long paper, 14 pages

  35. arXiv:2209.12160  [pdf, other]

    cs.CV cs.RO

    PL-EVIO: Robust Monocular Event-based Visual Inertial Odometry with Point and Line Features

    Authors: Weipeng Guan, Peiyu Chen, Yuhan Xie, Peng Lu

    Abstract: Event cameras are motion-activated sensors that capture pixel-level illumination changes instead of the intensity image with a fixed frame rate. Compared with the standard cameras, it can provide reliable visual perception during high-speed motions and in high dynamic range scenarios. However, event cameras output only a little information or even noise when the relative motion between the camera…

    Submitted 26 September, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

  36. arXiv:2208.05706  [pdf]

    cs.RO

    A Cooperative Positioning Flamework for Robot and Smart Phone Based on Visible Light Communication

    Authors: Junye Chen, Fangdi Li, Futong An, Chen Yang, Hongzhan Song, Shangsheng Wen, Weipeng Guan

    Abstract: A cooperative positioning framework of human and robots based on visible light communication (VLC) is proposed. Based on the experiment system, we demonstrated that it is feasible and has high accuracy and real-time performance.

    Submitted 20 October, 2022; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: high accuracy, cooperative positioning system

  37. arXiv:2207.08387  [pdf]

    cs.CV

    A Semantic-aware Attention and Visual Shielding Network for Cloth-changing Person Re-identification

    Authors: Zan Gao, Hongwei Wei, Weili Guan, Jie Nie, Meng Wang, Shenyong Chen

    Abstract: Cloth-changing person reidentification (ReID) is a newly emerging research topic that aims to retrieve pedestrians whose clothes are changed. Since the human appearance with different clothes exhibits large variations, it is very difficult for existing approaches to extract discriminative and robust feature representations. Current works mainly focus on body shape or contour sketches, but the huma…

    Submitted 17 November, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: text overlap with arXiv:2108.04527

  38. arXiv:2205.05675  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang , et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  39. arXiv:2205.04168  [pdf, other]

    cs.IR

    Visual Encoding and Debiasing for CTR Prediction

    Authors: Si Chen, Chen Lin, Wanxian Guan, Jiayi Wei, Xingyuan Bu, He Guo, Hui Li, Xubin Li, Jian Xu, Bo Zheng

    Abstract: Extracting expressive visual features is crucial for accurate Click-Through-Rate (CTR) prediction in visual search advertising systems. Current commercial systems use off-the-shelf visual encoders to facilitate fast online service. However, the extracted visual features are coarse-grained and/or biased. In this paper, we present a visual encoding framework for CTR prediction to overcome these prob…

    Submitted 9 May, 2022; originally announced May 2022.

  40. arXiv:2201.03482  [pdf, other]

    cs.IR

    Disentangled Graph Neural Networks for Session-based Recommendation

    Authors: Ansong Li, Zhiyong Cheng, Fan Liu, Zan Gao, Weili Guan, Yuxin Peng

    Abstract: Session-based recommendation (SBR) has drawn increasingly research attention in recent years, due to its great practical value by only exploiting the limited user behavior history in the current session. Existing methods typically learn the session embedding at the item level, namely, aggregating the embeddings of items with or without the attention weights assigned to items. However, they ignore…

    Submitted 10 January, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

  41. arXiv:2112.14375  [pdf, ps, other]

    cs.LG cs.CL

    Variational Learning for the Inverted Beta-Liouville Mixture Model and Its Application to Text Categorization

    Authors: Yongfa Ling, Wenbo Guan, Qiang Ruan, Heping Song, Yuping Lai

    Abstract: The finite invert Beta-Liouville mixture model (IBLMM) has recently gained some attention due to its positive data modeling capability. Under the conventional variational inference (VI) framework, the analytically tractable solution to the optimization of the variational posterior distribution cannot be obtained, since the variational object function involves evaluation of intractable moments. Wit…

    Submitted 28 December, 2021; originally announced December 2021.

  42. arXiv:2111.09050  [pdf]

    cs.RO

    Multi-Mobile Robot Localization and Navigation based on Visible Light Positioning

    Authors: Yanyi Chen, Zhiqing Zhong, Shangsheng Wen, Weipeng Guan

    Abstract: We demonstrated multi-mobile robot navigation based on Visible Light Positioning (VLP) localization. From our experiment, the VLP can accurately locate robots' positions in navigation.

    Submitted 1 November, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

  43. arXiv:2109.12299  [pdf, other]

    cs.CV

    A Novel Patch Convolutional Neural Network for View-based 3D Model Retrieval

    Authors: Zan Gao, Yuxiang Shao, Weili Guan, Meng Liu, Zhiyong Cheng, Shengyong Chen

    Abstract: Recently, many view-based 3D model retrieval methods have been proposed and have achieved state-of-the-art performance. Most of these methods focus on extracting more discriminative view-level features and effectively aggregating the multi-view images of a 3D model, but the latent relationship among these multi-view images is not fully explored. Thus, we tackle this problem from the perspective of…

    Submitted 25 September, 2021; originally announced September 2021.

  44. arXiv:2108.04527  [pdf, other]

    cs.CV

    Multigranular Visual-Semantic Embedding for Cloth-Changing Person Re-identification

    Authors: Zan Gao, Hongwei Wei, Weili Guan, Weizhi Nie, Meng Liu, Meng Wang

    Abstract: Person reidentification (ReID) is a very hot research topic in machine learning and computer vision, and many person ReID approaches have been proposed; however, most of these methods assume that the same person has the same clothes within a short time interval, and thus their visual appearance must be similar. However, in an actual surveillance environment, a given person has a great probability…

    Submitted 10 August, 2021; originally announced August 2021.

  45. arXiv:2108.04508  [pdf, other]

    cs.CV

    TBNet:Two-Stream Boundary-aware Network for Generic Image Manipulation Localization

    Authors: Zan Gao, Chao Sun, Zhiyong Cheng, Weili Guan, Anan Liu, Meng Wang

    Abstract: Finding tampered regions in images is a hot research topic in machine learning and computer vision. Although many image manipulation location algorithms have been proposed, most of them only focus on the RGB images with different color spaces, and the frequency information that contains the potential tampering clues is often ignored. In this work, a novel end-to-end two-stream boundary-aware netwo…

    Submitted 10 August, 2021; originally announced August 2021.

  46. arXiv:2104.14755  [pdf]

    cs.RO

    Technology Report : Robotic Localization and Navigation System for Visible Light Positioning and SLAM

    Authors: Weipeng Guan, Patrick Yue

    Abstract: Visible light positioning (VLP) technology is a promising technique as it can provide high accuracy positioning based on the existing lighting infrastructure. However, existing approaches often require dense lighting distributions. Additionally, due to complicated indoor environments, it is still challenging to develop a robust VLP. In this work, we proposed loosely-coupled multi-sensor fusion met…

    Submitted 2 September, 2021; v1 submitted 30 April, 2021; originally announced April 2021.

    Comments: This is a technology report from Guan Weipeng's work in HKUST

  47. arXiv:2104.13665  [pdf, other]

    cs.CV

    Robust Face-Swap Detection Based on 3D Facial Shape Information

    Authors: Weinan Guan, Wei Wang, Jing Dong, Bo Peng, Tieniu Tan

    Abstract: Maliciously-manipulated images or videos - so-called deep fakes - especially face-swap images and videos have attracted more and more malicious attackers to discredit some key figures. Previous pixel-level artifacts based detection techniques always focus on some unclear patterns but ignore some available semantic clues. Therefore, these approaches show weak interpretability and robustness. In thi…

    Submitted 28 April, 2021; originally announced April 2021.

  48. An intelligent Data Delivery Service for and beyond the ATLAS experiment

    Authors: Wen Guan, Tadashi Maeno, Brian Paul Bockelman, Torre Wenaus, Fahui Lin, Siarhei Padolski, Rui Zhang, Aleksandr Alekseev

    Abstract: The intelligent Data Delivery Service (iDDS) has been developed to cope with the huge increase of computing and storage resource usage in the coming LHC data taking. iDDS has been designed to intelligently orchestrate workflow and data management systems, decoupling data pre-processing, delivery, and main processing in various workflows. It is an experiment-agnostic service around a workflow-orien…

    Submitted 28 February, 2021; originally announced March 2021.

    Comments: 6 pages, 5 figures

  49. arXiv:2102.10498  [pdf, ps, other]

    cs.NI cs.IT eess.SP

    Customized Slicing for 6G: Enforcing Artificial Intelligence on Resource Management

    Authors: Wanqing Guan, Haijun Zhang, Victor C. M. Leung

    Abstract: Next generation wireless networks are expected to support diverse vertical industries and offer countless emerging use cases. To satisfy stringent requirements of diversified services, network slicing is developed, which enables service-oriented resource allocation by tailoring the infrastructure network into multiple logical networks. However, there are still some challenges in cross-domain multi…

    Submitted 20 February, 2021; originally announced February 2021.

    Comments: to appear in IEEE Network Magazine

  50. arXiv:2011.10508  [pdf, other]

    cs.RO cs.CG

    Planning Folding Motion with Simulation in the Loop Using Laser Forming Origami and Thermal Behaviors as an Example

    Authors: Yue Hao, Weilin Guan, Edwin A Peraza Hernandez, Jyh-Ming Lien

    Abstract: Designing a robot or structure that can fold itself into a target shape is a process that involves challenges originated from multiple sources. For example, the designer of rigid self-folding robots must consider foldability from geometric and kinematic aspects to avoid self-intersection and undesired deformations. Recent works have shown success in estimating foldability of a design using robot m…

    Submitted 20 November, 2020; originally announced November 2020.