Showing 1–50 of 125 results for author: Fu, K

Searching in archive cs.
  1. arXiv:2407.09209  [pdf, other]

    cs.CL eess.AS

    Pronunciation Assessment with Multi-modal Large Language Models

    Authors: Kaiqi Fu, Linkai Peng, Nan Yang, Shuran Zhou

    Abstract: Large language models (LLMs), renowned for their powerful conversational abilities, are widely recognized as exceptional tools in the field of education, particularly in the context of automated intelligent instruction systems for language learning. In this paper, we propose a scoring system based on LLMs, motivated by their positive impact on text-related scoring tasks. Specifically, the speech e… ▽ More

    Submitted 18 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.01067  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC cs.LG

    Human-like object concept representations emerge naturally in multimodal large language models

    Authors: Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He

    Abstract: The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the attractive question of whether these models can also develop human-like object representations through exposure to vas… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2406.15848  [pdf, other]

    cs.CV

    Quality-guided Skin Tone Enhancement for Portrait Photography

    Authors: Shiqi Gao, Huiyu Duan, Xinyue Li, Kang Fu, Yicong Peng, Qihang Xu, Yuanyuan Chang, Jia Wang, Xiongkuo Min, Guangtao Zhai

    Abstract: In recent years, learning-based color and tone enhancement methods for photos have become increasingly popular. However, most learning-based image enhancement methods just learn a mapping from one distribution to another based on one dataset, lacking the ability to adjust images continuously and controllably. It is important to enable the learning-based enhancement models to adjust an image contin… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  4. arXiv:2406.08804  [pdf, other]

    cs.DC cs.AI cs.IR

    DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation

    Authors: Kairui Fu, Shengyu Zhang, Zheqi Lv, Jingyuan Chen, Jiwei Li

    Abstract: Due to the continuously improving capabilities of mobile edges, recommender systems start to deploy models on edges to alleviate network congestion caused by frequent mobile requests. Several studies have leveraged the proximity of edge-side to real-time data, fine-tuning them to create edge-specific models. Despite their significant progress, these methods require substantial on-edge computationa… ▽ More

    Submitted 15 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  5. arXiv:2406.02017  [pdf, other]

    cs.LG stat.ML

    On the Mode-Seeking Properties of Langevin Dynamics

    Authors: Xiwei Cheng, Kexin Fu, Farzan Farnia

    Abstract: The Langevin Dynamics framework, which aims to generate samples from the score function of a probability distribution, is widely used for analyzing and interpreting score-based generative modeling. While the convergence behavior of Langevin Dynamics under unimodal distributions has been extensively studied in the literature, in practice the data distribution could consist of multiple distinct mode… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2405.20600  [pdf, other]

    cs.AI

    Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning

    Authors: Kaicheng Fu, Changde Du, Xiaoyu Chen, Jie Peng, Huiguang He

    Abstract: Emotion decoding plays an important role in affective human-computer interaction. However, previous studies ignored the dynamic real-world scenario, where humans experience a blend of multiple emotions which are incrementally integrated into the model, leading to the multi-label class incremental learning (MLCIL) problem. Existing methods have difficulty in solving the MLCIL issue due to notorious cata… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  7. arXiv:2405.19735  [pdf, other]

    cs.CV

    Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes

    Authors: Yong-Qiang Mao, Hanbo Bi, Xuexue Li, Kaiqiang Chen, Zhirui Wang, Xian Sun, Kun Fu

    Abstract: Thanks to the application of deep learning technology in point cloud processing of the remote sensing field, point cloud segmentation has become a research hotspot in recent years, which can be applied to real-world 3D, smart cities, and other fields. Although existing solutions have made unprecedented progress, they ignore the inherent characteristics of point clouds in remote sensing fields that… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  8. arXiv:2405.19689  [pdf, other]

    cs.CV cs.IR

    Uncertainty-aware sign language video retrieval with probability distribution modeling

    Authors: Xuan Wu, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu

    Abstract: Sign language video retrieval plays a key role in facilitating information access for the deaf community. Despite significant advances in video-text retrieval, the complexity and inherent uncertainty of sign language preclude the direct application of these techniques. Previous methods achieve the mapping between sign language video and text through fine-grained modal alignment. However, due to th… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  9. arXiv:2405.17140  [pdf, other]

    cs.CV

    SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

    Authors: Yong-Qiang Mao, Hanbo Bi, Liangyu Xu, Kaiqiang Chen, Zhirui Wang, Xian Sun, Kun Fu

    Abstract: Research on multi-view stereo based on remote sensing images has promoted the development of large-scale urban 3D reconstruction. However, remote sensing multi-view image data suffers from the problems of occlusion and uneven brightness between views during acquisition, which leads to the problem of blurred details in depth estimation. To solve the above problem, we re-examine the deformable learn… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  10. arXiv:2404.13322  [pdf, other]

    cs.LG cs.AI

    MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities

    Authors: Kunxi Li, Tianyu Zhan, Kairui Fu, Shengyu Zhang, Kun Kuang, Jiwei Li, Zhou Zhao, Fei Wu

    Abstract: In this study, we focus on heterogeneous knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods (e.g., backbone sharing, knowledge distillation) often hinge on shared elements within model structures or task-specific features/labels, limiting transfers to complex model types or tasks. To overcome these challenges, we present Mer… ▽ More

    Submitted 17 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  11. arXiv:2404.08980  [pdf, other]

    cs.LG stat.ML

    Stability and Generalization in Free Adversarial Training

    Authors: Xiwei Cheng, Kexin Fu, Farzan Farnia

    Abstract: While adversarial training methods have resulted in significant improvements in the deep neural nets' robustness against norm-bounded adversarial perturbations, their generalization performance from training samples to test data has been shown to be considerably worse than standard empirical risk minimization methods. Several recent studies seek to connect the generalization behavior of adversaria… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

  12. arXiv:2404.08195  [pdf, other]

    cs.CV

    Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation

    Authors: Zhiwei Yang, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song

    Abstract: Weakly supervised semantic segmentation (WSSS) with image-level labels intends to achieve dense tasks without laborious annotations. However, due to the ambiguous contexts and fuzzy regions, the performance of WSSS, especially the stages of generating Class Activation Maps (CAMs) and refining pseudo masks, widely suffers from ambiguity while being barely noticed by previous literature. In this wor… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  13. arXiv:2403.18238  [pdf, other]

    cs.CV

    TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes

    Authors: Liangyu Xu, Wanxuan Lu, Hongfeng Yu, Yongqiang Mao, Hanbo Bi, Chenglong Liu, Xian Sun, Kun Fu

    Abstract: As drone technology advances, using unmanned aerial vehicles for aerial surveys has become the dominant trend in modern low-altitude remote sensing. The surge in aerial video data necessitates accurate prediction for future scenarios and motion states of the interested target, particularly in applications like traffic management and disaster response. Existing video prediction methods focus solely… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 17 pages, 9 figures

  14. arXiv:2403.09675  [pdf, other]

    cs.CV cs.GR

    Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases

    Authors: Rio Aguina-Kang, Maxim Gumin, Do Heon Han, Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, Daniel Ritchie

    Abstract: We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting open-universe indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of ex… ▽ More

    Submitted 4 February, 2024; originally announced March 2024.

    Comments: See ancillary files for link to supplemental material

  15. arXiv:2403.04306  [pdf, other]

    cs.CV cs.AI cs.LG

    Effectiveness Assessment of Recent Large Vision-Language Models

    Authors: Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan

    Abstract: The advent of large vision-language models (LVLMs) represents a remarkable advance in the quest for artificial general intelligence. However, the model's effectiveness in both specialized and general tasks warrants further investigation. This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of… ▽ More

    Submitted 11 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by Visual Intelligence

  16. arXiv:2403.01968  [pdf, other]

    cs.CV

    Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection

    Authors: Xin Zhang, Tao Xiao, Gepeng Ji, Xuan Wu, Keren Fu, Qijun Zhao

    Abstract: Camouflage poses challenges in distinguishing a static target, whereas any movement of the target can break this disguise. Existing video camouflaged object detection (VCOD) approaches take noisy motion estimation as input or model motion implicitly, restricting detection performance in complex dynamic scenes. In this paper, we propose a novel Explicit Motion handling and Interactive Prompting fra… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 9 pages, 6 figures

  17. arXiv:2402.18467  [pdf, other]

    cs.CV

    Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation

    Authors: Zhiwei Yang, Kexue Fu, Minghong Duan, Linhao Qu, Shuo Wang, Zhijian Song

    Abstract: Weakly supervised semantic segmentation (WSSS) with image-level labels aims to achieve segmentation tasks without dense annotations. However, attributed to the frequent coupling of co-occurring objects and the limited supervision from image-level labels, the challenging co-occurrence problem is widely present and leads to false activation of objects in WSSS. In this work, we devise a 'Separate and… ▽ More

    Submitted 21 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024

  18. SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling Transformer for Radar Echo Extrapolation

    Authors: Liangyu Xu, Wanxuan Lu, Hongfeng Yu, Fanglong Yao, Xian Sun, Kun Fu

    Abstract: Extrapolating future weather radar echoes from past observations is a complex task vital for precipitation nowcasting. The spatial morphology and temporal evolution of radar echoes exhibit a certain degree of correlation, yet they also possess independent characteristics. Existing methods learn unified spatial and temporal representations in a highly coupled feature space, emphasizing the correla… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 16 pages, 11 figures, TGRS

  19. arXiv:2402.11450  [pdf, other]

    cs.RO

    Learning to Learn Faster from Human Feedback with Language Model Predictive Control

    Authors: Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, et al. (25 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for o… ▽ More

    Submitted 31 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  20. arXiv:2402.10435  [pdf, other]

    cs.CV

    Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

    Authors: Xin Zhang, Keren Fu, Qijun Zhao

    Abstract: Person re-identification (re-ID) continues to pose a significant challenge, particularly in scenarios involving occlusions. Prior approaches aimed at tackling occlusions have predominantly focused on aligning physical body features through the utilization of external semantic cues. However, these methods tend to be intricate and susceptible to noise. To address the aforementioned challenges, we pr… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 12 pages, 6 figures

  21. arXiv:2402.09446  [pdf, other]

    cs.GR physics.comp-ph

    MeshAC: A 3D Mesh Generation and Adaptation Package for Multiscale Coupling Methods

    Authors: Kejie Fu, Mingjie Liao, Yangshuai Wang, Jianjun Chen, Lei Zhang

    Abstract: This paper introduces the MeshAC package, which generates three-dimensional adaptive meshes tailored for the efficient and robust implementation of multiscale coupling methods. While Delaunay triangulation is commonly used for mesh generation across the entire computational domain, generating meshes for multiscale coupling methods is more challenging due to intrinsic discrete structures such as de… ▽ More

    Submitted 31 January, 2024; originally announced February 2024.

  22. arXiv:2401.14579  [pdf]

    cs.CV

    Recognizing Multiple Ingredients in Food Images Using a Single-Ingredient Classification Model

    Authors: Kun Fu, Ying Dai

    Abstract: Recognizing food images presents unique challenges due to the variable spatial layout and shape changes of ingredients with different cooking and cutting methods. This study introduces an advanced approach for recognizing ingredients segmented from food images. The method localizes the candidate regions of the ingredients using the locating and sliding window techniques. Then, these regions are as… ▽ More

    Submitted 18 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 9 pages, 21 figures, 6 tables

  23. arXiv:2401.13127  [pdf, other]

    cs.RO cs.MA

    Generalization of Heterogeneous Multi-Robot Policies via Awareness and Communication of Capabilities

    Authors: Pierce Howell, Max Rudolph, Reza Torbati, Kevin Fu, Harish Ravichandar

    Abstract: Recent advances in multi-agent reinforcement learning (MARL) are enabling impressive coordination in heterogeneous multi-robot teams. However, existing approaches often overlook the challenge of generalizing learned policies to teams of new compositions, sizes, and robots. While such generalization might not be important in teams of virtual agents that can retrain policies on-demand, it is pivotal… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Presented at the 7th Conference on Robot Learning (CoRL 2023), Atlanta, USA

  24. arXiv:2401.03331  [pdf, other]

    cs.CV cs.LG

    Walnut Detection Through Deep Learning Enhanced by Multispectral Synthetic Images

    Authors: Kaiming Fu, Tong Lei, Maryia Halubok, Brian N. Bailey

    Abstract: The accurate identification of walnuts within orchards brings forth a plethora of advantages, profoundly amplifying the efficiency and productivity of walnut orchard management. Nevertheless, the unique characteristics of walnut trees, characterized by their closely resembling shapes, colors, and textures between the walnuts and leaves, present a formidable challenge in precisely distinguishing be… ▽ More

    Submitted 31 October, 2023; originally announced January 2024.

    Comments: This work was presented at IEEE/RSI International Conference on Intelligent Robots and Systems (IROS) Workshop

  25. arXiv:2401.01569  [pdf, other]

    cs.CV

    AttentionLut: Attention Fusion-based Canonical Polyadic LUT for Real-time Image Enhancement

    Authors: Kang Fu, Yicong Peng, Zicheng Zhang, Qihang Xu, Xiaohong Liu, Jia Wang, Guangtao Zhai

    Abstract: Recently, many algorithms have employed image-adaptive lookup tables (LUTs) to achieve real-time image enhancement. Nonetheless, a prevailing trend among existing methods has been the employment of linear combinations of basic LUTs to formulate image-adaptive LUTs, which limits the generalization ability of these methods. To address this limitation, we propose a novel framework named AttentionLut… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

  26. arXiv:2401.00496  [pdf, other]

    cs.CV cs.AI cs.LG

    SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

    Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, et al. (25 additional authors not shown)

    Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segme… ▽ More

    Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  27. arXiv:2401.00248  [pdf, other]

    cs.CV cs.AI

    Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation

    Authors: Xianjie Liu, Keren Fu, Qijun Zhao

    Abstract: The Segment Anything Model (SAM) represents a significant breakthrough into foundation models for computer vision, providing a large-scale image segmentation model. However, despite SAM's zero-shot performance, its segmentation masks lack fine-grained details, particularly in accurately delineating object boundaries. We have high expectations regarding whether SAM, as a foundation model, can be im… ▽ More

    Submitted 22 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

  28. arXiv:2312.04831  [pdf, other]

    cs.CV

    Towards Context-Stable and Visual-Consistent Image Inpainting

    Authors: Yikai Wang, Chenjie Cao, Ke Fan, Xiangyang Xue, Yanwei Fu

    Abstract: Recent progress in inpainting increasingly relies on generative models, leveraging their strong generation capabilities for addressing large irregular masks. However, this enhanced generation often introduces context-instability, leading to arbitrary object generation within masked regions. This paper proposes a balanced solution, emphasizing the importance of unmasked regions in guiding inpaintin… ▽ More

    Submitted 17 March, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Project page: https://yikai-wang.github.io/asuka/ where full-size PDF with appendix is available. Dataset: https://github.com/Yikai-Wang/asuka-misato. Yikai Wang and Chenjie Cao contribute equally

  29. arXiv:2312.03758  [pdf, other]

    cs.AI cs.CL

    Stock Movement and Volatility Prediction from Tweets, Macroeconomic Factors and Historical Prices

    Authors: Shengkun Wang, YangXiao Bai, Taoran Ji, Kaiqun Fu, Linhan Wang, Chang-Tien Lu

    Abstract: Predicting stock market is vital for investors and policymakers, acting as a barometer of the economic health. We leverage social media data, a potent source of public sentiment, in tandem with macroeconomic indicators as government-compiled statistics, to refine stock market predictions. However, prior research using tweet data for stock market prediction faces three challenges. First, the qualit… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

  30. ALERTA-Net: A Temporal Distance-Aware Recurrent Networks for Stock Movement and Volatility Prediction

    Authors: Shengkun Wang, YangXiao Bai, Kaiqun Fu, Linhan Wang, Chang-Tien Lu, Taoran Ji

    Abstract: For both investors and policymakers, forecasting the stock market is essential as it serves as an indicator of economic well-being. To this end, we harness the power of social media data, a rich source of public sentiment, to enhance the accuracy of stock market predictions. Diverging from conventional methods, we pioneer an approach that integrates sentiment analysis, macroeconomic indicators, se… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

  31. arXiv:2310.15482  [pdf, other]

    cs.CV

    Salient Object Detection in RGB-D Videos

    Authors: Ao Mou, Yukang Lu, Jiahao He, Dingyao Min, Keren Fu, Qijun Zhao

    Abstract: Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and… ▽ More

    Submitted 21 May, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: IEEE TIP (under major revision)

  32. arXiv:2310.15138  [pdf, other]

    cs.RO cs.CV

    Fusion-Driven Tree Reconstruction and Fruit Localization: Advancing Precision in Agriculture

    Authors: Kaiming Fu, Peng Wei, Juan Villacres, Zhaodan Kong, Stavros G. Vougioukas, Brian N. Bailey

    Abstract: Fruit distribution is pivotal in shaping the future of both agriculture and agricultural robotics, paving the way for a streamlined supply chain. This study introduces an innovative methodology that harnesses the synergy of RGB imagery, LiDAR, and IMU data, to achieve intricate tree reconstructions and the pinpoint localization of fruits. Such integration not only offers insights into the fruit di… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: This work was presented at IEEE/RSI International Conference on Intelligent Robots and Systems (IROS) Workshop

  33. arXiv:2310.03941  [pdf, other]

    cs.LG cs.SI

    LaTeX: Language Pattern-aware Triggering Event Detection for Adverse Experience during Pandemics

    Authors: Kaiqun Fu, Yangxiao Bai, Weiwei Zhang, Deepthi Kolady

    Abstract: The COVID-19 pandemic has accentuated socioeconomic disparities across various racial and ethnic groups in the United States. While previous studies have utilized traditional survey methods like the Household Pulse Survey (HPS) to elucidate these disparities, this paper explores the role of social media platforms in both highlighting and addressing these challenges. Drawing from real-time data sou… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:1911.08684

  34. arXiv:2308.04782  [pdf, other]

    cs.CV

    PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration

    Authors: Mingzhi Yuan, Kexue Fu, Zhihao Li, Yucong Meng, Manning Wang

    Abstract: Point cloud registration is a task to estimate the rigid transformation between two unaligned scans, which plays an important role in many computer vision applications. Previous learning-based works commonly focus on supervised registration, which have limitations in practice. Recently, with the advance of inexpensive RGB-D sensors, several learning-based works utilize RGB-D data to achieve unsupe… ▽ More

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted to the ICCV 2023

  35. arXiv:2308.01867  [pdf, other]

    cs.LG cs.CV

    MRQ:Support Multiple Quantization Schemes through Model Re-Quantization

    Authors: Manasa Manohara, Sankalp Dayal, Tariq Afzal, Rahul Bakshi, Kahkuen Fu

    Abstract: Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU, DPU), deploying deep learning models on edge devices with fixed-point hardware is still challenging due to complex model quantization and conversion. Existing model quantization frameworks like Tensorflow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] support only a limited set of quantization schemes (e.g., only asymmet… ▽ More

    Submitted 3 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: 8 pages, 6 figures, 3 tables, TinyML Conference

  36. arXiv:2307.11991  [pdf, other]

    cs.CL cs.AI

    Psy-LLM: Scaling up Global Mental Health Psychological Services with AI-based Large Language Models

    Authors: Tin Lai, Yukun Shi, Zicong Du, Jiajie Wu, Ken Fu, Yichao Dou, Ziqi Wang

    Abstract: The demand for psychological counselling has grown significantly in recent years, particularly with the global outbreak of COVID-19, which has heightened the need for timely and professional mental health support. Online psychological counselling has emerged as the predominant mode of providing services in response to this demand. In this study, we propose the Psy-LLM framework, an AI-based assist… ▽ More

    Submitted 1 September, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

  37. arXiv:2307.01404  [pdf]

    econ.GN cs.SI

    Social media use among American Indians in South Dakota: Preferences and perceptions

    Authors: Deepthi Kolady, Amrit Dumre, Weiwei Zhang, Kaiqun Fu, Marcia O'Leary, Laura Rose

    Abstract: Social media use data is widely being used in health, psychology, and marketing research to analyze human behavior. However, we have very limited knowledge on social media use among American Indians. In this context, this study was designed to assess preferences and perceptions of social media use among American Indians during COVID-19. We collected data from American Indians in South Dakota using… ▽ More

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 20 pages, 6 figures, 2 Tables, Appendix Tables (7)

  38. arXiv:2305.17891  [pdf, other]

    cs.CV

    The Rise of AI Language Pathologists: Exploring Two-level Prompt Learning for Few-shot Weakly-supervised Whole Slide Image Classification

    Authors: Linhao Qu, Xiaoyuan Luo, Kexue Fu, Manning Wang, Zhijian Song

    Abstract: This paper introduces the novel concept of few-shot weakly supervised learning for pathology Whole Slide Image (WSI) classification, denoted as FSWC. A solution is proposed based on prompt learning and the utilization of a large language model, GPT-4. Since a WSI is too large and needs to be divided into patches for processing, WSI classification is commonly approached as a Multiple Instance Learn… ▽ More

    Submitted 28 January, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted by NeurIPS 2023

  39. arXiv:2305.11438  [pdf, other]

    cs.CL eess.AS

    Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

    Authors: Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma

    Abstract: Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features. Deep neural networks are commonly trained to map fluency-related features into the human scores. However, the effectiveness of deep learning-based models is constrained by the limited amount of labeled training samples. To address this, we introduce a self-supervised learning (SSL) approach that take… ▽ More

    Submitted 19 May, 2023; originally announced May 2023.

  40. arXiv:2305.05260  [pdf, other]

    cs.CV

    Guided Focal Stack Refinement Network for Light Field Salient Object Detection

    Authors: Bo Yuan, Yao Jiang, Keren Fu, Qijun Zhao

    Abstract: Light field salient object detection (SOD) is an emerging research direction attributed to the richness of light field data. However, most existing methods lack effective handling of focal stacks, therefore making the latter involved in a lot of interfering information and degrade the performance of SOD. To address this limitation, we propose to utilize multi-modal features to refine focal stacks… ▽ More

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted by ICME 2023

  41. arXiv:2305.03270  [pdf, other]

    cs.RO

    Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators

    Authors: Alexander Herzog, Kanishka Rao, Karol Hausman, Yao Lu, Paul Wohlhart, Mengyuan Yan, Jessica Lin, Montserrat Gonzalez Arenas, Ted Xiao, Daniel Kappler, Daniel Ho, Jarek Rettinghouse, Yevgen Chebotar, Kuang-Huei Lee, Keerthana Gopalakrishnan, Ryan Julian, Adrian Li, Chuyuan Kelly Fu, Bob Wei, Sangeetha Ramesh, Khem Holden, Kim Kleiven, David Rendleman, Sean Kirmani, Jeff Bingham , et al. (15 additional authors not shown)

    Abstract: We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL… ▽ More

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Published at Robotics: Science and Systems 2023

  42. arXiv:2303.12238  [pdf, other]

    cs.LG cs.SI

    DG-Trans: Dual-level Graph Transformer for Spatiotemporal Incident Impact Prediction on Traffic Networks

    Authors: Yanshen Sun, Kaiqun Fu, Chang-Tien Lu

    Abstract: The prompt estimation of traffic incident impacts can guide commuters in their trip planning and improve the resilience of transportation agencies' decision-making on resilience. However, it is more challenging than node-level and graph-level forecasting tasks, as it requires extracting the anomaly subgraph or sub-time-series from dynamic graphs. In this paper, we propose DG-Trans, a novel traffic…

    Submitted 21 March, 2023; originally announced March 2023.

  43. arXiv:2302.10444  [pdf, other]

    eess.AS cs.SD

    Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

    Authors: Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

    Abstract: Recent studies on pronunciation scoring have explored the effect of introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or concatenation of reference phone embedding and actual pronunciation of the target phone as the phone-level pronunciation quality representation. In this paper, we propose to use linguistic-acoustic similarity to explicitly…

    Submitted 13 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  44. arXiv:2302.05210  [pdf, other]

    cs.RO

    Boosting 3D Point Cloud Registration by Transferring Multi-modality Knowledge

    Authors: Mingzhi Yuan, Xiaoshui Huang, Kexue Fu, Zhihao Li, Manning Wang

    Abstract: The recent multi-modality models have achieved great performance in many vision tasks because the extracted features contain the multi-modality knowledge. However, most of the current registration descriptors have only concentrated on local geometric structures. This paper proposes a method to boost point cloud registration accuracy by transferring the multi-modality knowledge of pre-trained multi…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Accepted to the 2023 IEEE International Conference on Robotics and Automation (ICRA 2023)

  45. arXiv:2301.11745  [pdf, other]

    cs.CR cs.CV

    Side Auth: Synthesizing Virtual Sensors for Authentication

    Authors: Yan Long, Kevin Fu

    Abstract: While the embedded security research community aims to protect systems by reducing analog sensor side channels, our work argues that sensor side channels can be beneficial to defenders. This work introduces the general problem of synthesizing virtual sensors from existing circuits to authenticate physical sensors' measurands. We investigate how to apply this approach and present a preliminary anal…

    Submitted 27 January, 2023; originally announced January 2023.

    Journal ref: New Security Paradigms Workshop 2022

  46. arXiv:2301.10056  [pdf]

    cs.CR cs.CV cs.MM cs.SD eess.AS

    Side Eye: Characterizing the Limits of POV Acoustic Eavesdropping from Smartphone Cameras with Rolling Shutters and Movable Lenses

    Authors: Yan Long, Pirouz Naghavi, Blas Kojusner, Kevin Butler, Sara Rampazzi, Kevin Fu

    Abstract: Our research discovers how the rolling shutter and movable lens structures widely found in smartphone cameras modulate structure-borne sounds onto camera images, creating a point-of-view (POV) optical-acoustic side channel for acoustic eavesdropping. The movement of smartphone camera hardware leaks acoustic information because images unwittingly modulate ambient sound as imperceptible distortions.…

    Submitted 26 January, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Journal ref: 2023 IEEE Symposium on Security and Privacy

  47. Elevation Estimation-Driven Building 3D Reconstruction from Single-View Remote Sensing Imagery

    Authors: Yongqiang Mao, Kaiqiang Chen, Liangjin Zhao, Wei Chen, Deke Tang, Wenjie Liu, Zhirui Wang, Wenhui Diao, Xian Sun, Kun Fu

    Abstract: Building 3D reconstruction from remote sensing images has a wide range of applications in smart cities, photogrammetry and other fields. Methods for automatic 3D urban building modeling typically employ multi-view images as input to algorithms to recover point clouds and 3D models of buildings. However, such models rely heavily on multi-view images of buildings, which are time-intensive and limit…

    Submitted 11 January, 2023; originally announced January 2023.

  48. arXiv:2211.14782  [pdf, other]

    cs.CV

    Breaking Immutable: Information-Coupled Prototype Elaboration for Few-Shot Object Detection

    Authors: Xiaonan Lu, Wenhui Diao, Yongqiang Mao, Junxi Li, Peijin Wang, Xian Sun, Kun Fu

    Abstract: Few-shot object detection, expecting detectors to detect novel classes with a few instances, has made conspicuous progress. However, the prototypes extracted by existing meta-learning based methods still suffer from insufficient representative information and lack awareness of query images, which cannot be adaptively tailored to different query images. Firstly, only the support images are involved…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: Accepted by AAAI2023

  49. Robust Point Cloud Registration Framework Based on Deep Graph Matching(TPAMI Version)

    Authors: Kexue Fu, Jiazheng Luo, Xiaoyuan Luo, Shaolei Liu, Chenxi Zhang, Manning Wang

    Abstract: 3D point cloud registration is a fundamental problem in computer vision and robotics. Recently, learning-based point cloud registration methods have made great progress. However, these methods are sensitive to outliers, which lead to more incorrect correspondences. In this paper, we propose a novel deep graph matching-based framework for point cloud registration. Specifically, we first transform p…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: accepted by TPAMI 2022. arXiv admin note: substantial text overlap with arXiv:2103.04256

  50. arXiv:2211.02629  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG q-bio.NC

    Multi-view Multi-label Fine-grained Emotion Decoding from Human Brain Activity

    Authors: Kaicheng Fu, Changde Du, Shengpei Wang, Huiguang He

    Abstract: Decoding emotional states from human brain activity plays an important role in brain-computer interfaces. Existing emotion decoding methods still have two main limitations: one is only decoding a single emotion category from a brain activity pattern and the decoded emotion categories are coarse-grained, which is inconsistent with the complex emotional expression of human; the other is ignoring the…

    Submitted 26 October, 2022; originally announced November 2022.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems