
Showing 1–50 of 169 results for author: Su, S

Searching in archive cs.
  1. arXiv:2407.08762  [pdf, ps, other]

    cs.SI cs.LG

    Commute-Time-Optimised Graphs for GNNs

    Authors: Igor Sterner, Shiye Su, Petar Veličković

    Abstract: We explore graph rewiring methods that optimise commute time. Recent graph rewiring approaches facilitate long-range interactions in sparse graphs, making such rewirings commute-time-optimal on average. However, when an expert prior exists on which node pairs should or should not interact, a superior rewiring would favour short commute times between these privileged node pairs. We const…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2406.16087  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

    Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

    Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS…

    Submitted 6 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  3. arXiv:2406.13918  [pdf, other]

    cs.HC

    Are We There Yet? Unravelling Usability Challenges and Opportunities in Collaborative Immersive Analytics for Domain Experts

    Authors: Fahim Arsad Nafis, Alexander Rose, Simon Su, Songqing Chen, Bo Han

    Abstract: In the ever-evolving discipline of high-dimensional scientific data, collaborative immersive analytics (CIA) offers a promising frontier for domain experts in complex data visualization and interpretation. This research presents a comprehensive framework for conducting usability studies on the extended reality (XR) interface of ParaView, an open-source CIA system. By employing established human-co…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted at the 26th International Conference on Human-Computer Interaction (HCII 2024), Washington, DC, USA

  4. arXiv:2406.11021  [pdf, other]

    cs.CV

    α-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion

    Authors: Sanbao Su, Nuo Chen, Felix Juefei-Xu, Chen Feng, Fei Miao

    Abstract: In the realm of autonomous vehicle (AV) perception, comprehending 3D scenes is paramount for tasks such as planning and mapping. Semantic scene completion (SSC) aims to infer scene geometry and semantics from limited observations. While camera-based SSC has gained popularity due to affordability and rich visual cues, existing methods often neglect the inherent uncertainty in models. To address thi…

    Submitted 21 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  5. arXiv:2406.04342  [pdf, other]

    cs.CV

    Learning 1D Causal Visual Representation with De-focus Attention Networks

    Authors: Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai

    Abstract: Modality differences have led to the development of heterogeneous architectures for vision and language models. While images typically require 2D non-causal modeling, texts utilize 1D causal modeling. This distinction poses significant challenges in constructing unified multi-modal models. This paper explores the feasibility of representing images using 1D causal modeling. We identify an "over-foc…

    Submitted 6 June, 2024; originally announced June 2024.

  6. arXiv:2405.03393  [pdf, other]

    cs.RO eess.SY

    On-site scale factor linearity calibration of MEMS triaxial gyroscopes

    Authors: Yaqi Li, Li Wang, Zhitao Wang, Xiangqing Li, Jiaojiao Li, Steven Weidong Su

    Abstract: The calibration of MEMS triaxial gyroscopes is crucial for achieving precise attitude estimation for various wearable health monitoring applications. However, gyroscope calibration poses greater challenges compared to accelerometers and magnetometers. This paper introduces an efficient method for calibrating MEMS triaxial gyroscopes via only a servo motor, making it well-suited for field environme…

    Submitted 10 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  7. arXiv:2405.00723  [pdf, other]

    eess.SP cs.AI cs.LG

    EEG_RL-Net: Enhancing EEG MI Classification through Reinforcement Learning-Optimised Graph Neural Networks

    Authors: Htoo Wai Aung, Jiao Jiao Li, Yang An, Steven W. Su

    Abstract: Brain-Computer Interfaces (BCIs) rely on accurately decoding electroencephalography (EEG) motor imagery (MI) signals for effective device control. Graph Neural Networks (GNNs) outperform Convolutional Neural Networks (CNNs) in this regard, by leveraging the spatial relationships between EEG electrodes through adjacency matrices. The EEG_GLT-Net framework, featuring the state-of-the-art EEG_GLT adj…

    Submitted 26 April, 2024; originally announced May 2024.

  8. arXiv:2404.14961  [pdf, other]

    cs.LG

    Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems

    Authors: Xiaoshuang Chen, Gengrui Zhang, Yao Wang, Yulin Wu, Shuo Su, Kaiqiao Zhan, Ben Wang

    Abstract: Modern large-scale recommender systems are built upon computation-intensive infrastructure and usually suffer from a huge difference in traffic between peak and off-peak periods. In peak periods, it is challenging to perform real-time computation for each request due to the limited budget of computational resources. The recommendation with a cache is a solution to this problem, where a user-wise r…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 8 pages, 8 figures

  9. arXiv:2404.12594  [pdf, other]

    cs.RO cs.AI cs.LG

    Random Network Distillation Based Deep Reinforcement Learning for AGV Path Planning

    Authors: Huilin Yin, Shengkai Su, Yinjia Lin, Pengju Zhen, Karin Festl, Daniel Watzenig

    Abstract: With the flourishing development of intelligent warehousing systems, the technology of Automated Guided Vehicle (AGV) has experienced rapid growth. Within intelligent warehousing environments, AGV is required to safely and rapidly plan an optimal path in complex and dynamic environments. Most research has studied deep reinforcement learning to address this challenge. However, in the environments w…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 6 pages, 8 figures

  10. arXiv:2404.11075  [pdf, other]

    cs.LG cs.AI eess.SP

    EEG_GLT-Net: Optimising EEG Graphs for Real-time Motor Imagery Signals Classification

    Authors: Htoo Wai Aung, Jiao Jiao Li, Yang An, Steven W. Su

    Abstract: Brain-Computer Interfaces connect the brain to external control devices, necessitating the accurate translation of brain signals such as from electroencephalography (EEG) into executable commands. Graph Neural Networks (GCN) have been increasingly applied for classifying EEG Motor Imagery signals, primarily because they incorporate the spatial relationships among EEG channels, resulting in improv…

    Submitted 17 April, 2024; originally announced April 2024.

  11. PrintListener: Uncovering the Vulnerability of Fingerprint Authentication via the Finger Friction Sound

    Authors: Man Zhou, Shuao Su, Qian Wang, Qi Li, Yuting Zhou, Xiaojing Ma, Zhengxiong Li

    Abstract: Fingerprint authentication has been extensively employed in contemporary identity verification systems owing to its rapidity and cost-effectiveness. Due to its widespread use, fingerprint leakage may cause sensitive information theft, enormous economic and personnel losses, and even a potential compromise of national security. As a fingerprint that can coincidentally match a specific proportion of…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: in Proc. of NDSS, 2024

  12. arXiv:2403.17807  [pdf, other]

    cs.HC

    Towards Inclusive Video Commenting: Introducing Signmaku for the Deaf and Hard-of-Hearing

    Authors: Si Chen, Haocong Cheng, Jason Situ, Desirée Kirst, Suzy Su, Saumya Malhotra, Lawrence Angrave, Qi Wang, Yun Huang

    Abstract: Previous research underscored the potential of danmaku--a text-based commenting feature on videos--in engaging hearing audiences. Yet, for many Deaf and hard-of-hearing (DHH) individuals, American Sign Language (ASL) takes precedence over English. To improve inclusivity, we introduce "Signmaku," a new commenting mechanism that uses ASL, serving as a sign language counterpart to danmaku. Through a…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 14 pages, CHI 2024

    ACM Class: F.2.2; I.2.7

  13. arXiv:2403.11502  [pdf, other]

    cs.NI

    Accelerating Handover in Mobile Satellite Network

    Authors: Jiasheng Wu, Shaojie Su, Xiong Wang, Jingjing Zhang, Yue Gao

    Abstract: The construction of Low Earth Orbit (LEO) satellite constellations has recently spurred tremendous attention from academia and industry. 5G and 6G standards have specified LEO satellite networks as a key component of 5G and 6G networks. However, ground terminals experience frequent, high-latency handover incurred by satellites' fast travelling speed, which deteriorates the performance of latency-se…

    Submitted 18 March, 2024; originally announced March 2024.

  14. arXiv:2402.17262  [pdf, other]

    cs.CL cs.AI

    Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue

    Authors: Zhenhong Zhou, Jiuyang Xiang, Haopeng Chen, Quan Liu, Zherui Li, Sen Su

    Abstract: Large Language Models (LLMs) have been demonstrated to generate illegal or unethical responses, particularly when subjected to "jailbreak." Research on jailbreak has highlighted the safety issues of LLMs. However, prior studies have predominantly focused on single-turn dialogue, ignoring the potential complexities and risks presented by multi-turn dialogue, a crucial mode through which humans deri…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Work in progress; 23 pages, 18 figures

  15. arXiv:2402.09846  [pdf]

    physics.ao-ph cs.LG eess.SP

    A Deep Learning Approach to Radar-based QPE

    Authors: Ting-Shuo Yo, Shih-Hao Su, Jung-Lien Chu, Chiao-Wei Chang, Hung-Chi Kuo

    Abstract: In this study, we propose a volume-to-point framework for quantitative precipitation estimation (QPE) based on the Quantitative Precipitation Estimation and Segregation Using Multiple Sensor (QPESUMS) Mosaic Radar data set. With a data volume consisting of the time series of gridded radar reflectivities over the Taiwan area, we used machine learning algorithms to establish a statistical model for…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 22 pages, 11 figures. Published in Earth and Space Science

    Journal ref: Earth Space Sci. 2021, 8, e2020EA001340

  16. arXiv:2401.09195  [pdf, other]

    cs.CV

    Training-Free Semantic Video Composition via Pre-trained Diffusion Model

    Authors: Jiaqi Guo, Sitong Su, Junchen Zhu, Lianli Gao, Jingkuan Song

    Abstract: The video composition task aims to integrate specified foregrounds and backgrounds from different videos into a harmonious composite. Current approaches, predominantly trained on videos with adjusted foreground color and lighting, struggle to address deep semantic disparities beyond superficial adjustments, such as domain gaps. Therefore, we propose a training-free pipeline employing a pre-trained…

    Submitted 17 January, 2024; originally announced January 2024.

  17. arXiv:2401.06116  [pdf, other]

    cs.CV

    Gaussian Shadow Casting for Neural Characters

    Authors: Luis Bolanos, Shih-Yang Su, Helge Rhodin

    Abstract: Neural character models can now reconstruct detailed geometry and texture from video, but they lack explicit shadows and shading, leading to artifacts when generating novel views and poses or during relighting. It is particularly difficult to include shadows as they are a global effect and the required casting of secondary rays is costly. We propose a new shadow model using a Gaussian density prox…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 14 pages, 13 figures

  18. arXiv:2401.02150  [pdf, other]

    cs.CV

    Marginal Debiased Network for Fair Visual Recognition

    Authors: Mei Wang, Weihong Deng, Sen Su

    Abstract: Deep neural networks (DNNs) are often prone to learning spurious correlations between target classes and bias attributes, like gender and race, inherent in a major portion of training data (bias-aligned samples), thus showing unfair behavior and raising controversy in the modern pluralistic and egalitarian society. In this paper, we propose a novel marginal debiased network (MDN) to learn debiase…

    Submitted 4 January, 2024; originally announced January 2024.

  19. arXiv:2312.16490  [pdf, other]

    cs.CL cs.AI cs.CY

    Understanding News Creation Intents: Frame, Dataset, and Method

    Authors: Zhengjia Wang, Danding Wang, Qiang Sheng, Juan Cao, Silong Su, Yifan Sun, Beizhe Hu, Siyuan Ma

    Abstract: With the disruptive changes in the media economy and the proliferation of alternative news media outlets, news intent has progressively deviated from ethical standards that serve the public interest. News intent refers to the purpose or intention behind the creation of a news article. While the significance of research on news intent has been widely acknowledged, the absence of a systematic news int…

    Submitted 27 December, 2023; originally announced December 2023.

  20. arXiv:2312.12747  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    ALMANACS: A Simulatability Benchmark for Language Model Explainability

    Authors: Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

    Abstract: How do we measure the efficacy of language model explainability methods? While many explainability methods have been developed, they are typically evaluated on bespoke tasks, preventing an apples-to-apples comparison. To help fill this gap, we present ALMANACS, a language model explainability benchmark. ALMANACS scores explainability methods on simulatability, i.e., how well the explanations impro…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Code is available at https://github.com/edmundmills/ALMANACS

  21. arXiv:2312.07549  [pdf, other]

    cs.CV

    Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control

    Authors: Sitong Su, Litao Guo, Lianli Gao, Heng Tao Shen, Jingkuan Song

    Abstract: Story Visualization aims to generate images aligned with story prompts, reflecting the coherence of storybooks through visual consistency among characters and scenes. However, current approaches concentrate exclusively on characters and neglect the visual consistency among contextually correlated scenes, resulting in independent character images without inter-image coherence. To tackle this issue, we…

    Submitted 6 December, 2023; originally announced December 2023.

  22. arXiv:2312.06075  [pdf, other]

    cs.CV

    Oracle Character Recognition using Unsupervised Discriminative Consistency Network

    Authors: Mei Wang, Weihong Deng, Sen Su

    Abstract: Ancient history relies on the study of ancient characters. However, real-world scanned oracle characters are difficult to collect and annotate, posing a major obstacle for oracle character recognition (OrCR). Besides, serious abrasion and inter-class similarity also make OrCR more challenging. In this paper, we propose a novel unsupervised domain adaptation method for OrCR, which enables to transf…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted by Pattern Recognition

  23. F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis

    Authors: Sitong Su, Jianzhi Liu, Lianli Gao, Jingkuan Song

    Abstract: Recently, Text-to-Video (T2V) synthesis has undergone a breakthrough by training transformers or diffusion models on large-scale datasets. Nevertheless, running inference with such large models incurs huge costs. Previous inference acceleration works either require costly retraining or are model-specific. To address this issue, instead of retraining we explore the inference process of two mainstream T2V models us…

    Submitted 6 December, 2023; originally announced December 2023.

  24. arXiv:2312.00763  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses

    Authors: Xiao Ma, Swaroop Mishra, Ariel Liu, Sophie Su, Jilin Chen, Chinmay Kulkarni, Heng-Tze Cheng, Quoc Le, Ed Chi

    Abstract: Large language model (LLM) powered chatbots are primarily text-based today, and impose a large interactional cognitive load, especially for exploratory or sensemaking tasks such as planning a trip or learning about a new city. Because the interaction is textual, users have little scaffolding in the way of structure, informational "scent", or ability to specify high-level preferences or goals. We i…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 19 pages, 11 figures

  25. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  26. arXiv:2311.16635  [pdf, other]

    cs.CV

    MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation

    Authors: Sitong Su, Litao Guo, Lianli Gao, Hengtao Shen, Jingkuan Song

    Abstract: Zero-shot Text-to-Video synthesis generates videos based on prompts without any videos. Without motion information from videos, motion priors implied in prompts are vital guidance. For example, the prompt "airplane landing on the runway" indicates motion priors that the "airplane" moves downwards while the "runway" stays static. However, the motion priors are not fully exploited in previous approac…

    Submitted 28 November, 2023; originally announced November 2023.

  27. arXiv:2311.11227  [pdf, other]

    cs.LG cs.AI cs.DC

    FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients

    Authors: Shangchao Su, Bin Li, Xiangyang Xue

    Abstract: With the increasing availability of Foundation Models, federated tuning has garnered attention in the field of federated learning, utilizing data and computation resources from multiple clients to collaboratively fine-tune foundation models. However, in real-world federated scenarios, there often exist a multitude of heterogeneous clients with varying computation and communication resources, rende…

    Submitted 12 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

  28. arXiv:2311.08870  [pdf, other]

    cs.CV cs.LG

    One-Shot Federated Learning with Classifier-Guided Diffusion Models

    Authors: Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue

    Abstract: One-shot federated learning (OSFL) has gained attention in recent years due to its low communication cost. However, most of the existing methods require auxiliary datasets or training generators, which hinders their practicality in real-world scenarios. In this paper, we explore the novel opportunities that diffusion models bring to OSFL and propose FedCADO, utilizing guidance from client classifi…

    Submitted 16 November, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

  29. arXiv:2310.14498  [pdf]

    physics.ed-ph cs.CY

    Reforming Physics Exams Using Openly Accessible Large Isomorphic Problem Banks created with the assistance of Generative AI: an Explorative Study

    Authors: Zhongzhou Chen, Emily Frederick, Colleen Cui, Munaimah Khan, Christopher Klatt, Mercedith Huang, Shiyang Su

    Abstract: This paper explores using large isomorphic problem banks to overcome many challenges of traditional exams in large STEM classes, especially the threat of content sharing websites and generative AI to the security of exam items. We first introduce an efficient procedure for creating large numbers of isomorphic physics problems, assisted by the large language model GPT-3 and several other open-sourc…

    Submitted 22 October, 2023; originally announced October 2023.

  30. arXiv:2309.13035  [pdf, other]

    cs.RO

    PyPose v0.6: The Imperative Programming Interface for Robotics

    Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, Jingnan Shi, Shaoshu Su, Yiren Lu, et al. (11 additional authors not shown)

    Abstract: PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, inco…

    Submitted 22 September, 2023; originally announced September 2023.

  31. arXiv:2309.11853  [pdf, other]

    cs.CL cs.AI

    BitCoin: Bidirectional Tagging and Supervised Contrastive Learning based Joint Relational Triple Extraction Framework

    Authors: Luyao He, Zhongbao Zhang, Sen Su, Yuxin Chen

    Abstract: Relation triple extraction (RTE) is an essential task in information extraction and knowledge graph construction. Despite recent advancements, existing methods still exhibit certain limitations. They just employ generalized pre-trained models and do not consider the specificity of RTE tasks. Moreover, existing tagging-based approaches typically decompose the RTE task into two subtasks, initially i…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: text overlap with arXiv:2112.04940 by other authors

  32. arXiv:2309.04750  [pdf, other]

    cs.CV

    Mirror-Aware Neural Humans

    Authors: Daniel Ajisafe, James Tang, Shih-Yang Su, Bastian Wandt, Helge Rhodin

    Abstract: Human motion capture either requires multi-camera systems or is unreliable when using single-view input due to depth ambiguities. Meanwhile, mirrors are readily available in urban environments and form an affordable alternative by recording two views with only a single camera. However, the mirror setting poses the additional challenge of handling occlusions of real and mirror image. Going beyond e…

    Submitted 15 May, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: The 11th International Conference on 3D Vision (3DV 2024). Project website: https://danielajisafe.github.io/mirror-aware-neural-humans/

  33. arXiv:2308.16555  [pdf, other]

    cs.CV cs.RO

    E3CM: Epipolar-Constrained Cascade Correspondence Matching

    Authors: Chenbo Zhou, Shuai Su, Qijun Chen, Rui Fan

    Abstract: Accurate and robust correspondence matching is of utmost importance for various 3D computer vision tasks. However, traditional explicit programming-based methods often struggle to handle challenging scenarios, and deep learning-based methods require large well-labeled datasets for network training. In this article, we introduce Epipolar-Constrained Cascade Correspondence (E3CM), a novel approach t…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: accepted to Neurocomputing

  34. arXiv:2308.16490  [pdf, other]

    cs.CV cs.AI cs.LG

    Latent Painter

    Authors: Shih-Chieh Su

    Abstract: Latent diffusers revolutionized generative AI and inspired creative art. When denoising the latent, the predicted original image at each step collectively animates the formation. However, the animation is limited by the denoising nature of the diffuser, and only renders a sharpening process. This work presents Latent Painter, which uses the latent as the canvas, and the diffuser predictions as…

    Submitted 1 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

  35. arXiv:2308.15727  [pdf, other]

    cs.CL

    Quantifying and Analyzing Entity-level Memorization in Large Language Models

    Authors: Zhenhong Zhou, Jiuyang Xiang, Chaomeng Chen, Sen Su

    Abstract: Large language models (LLMs) have been proven capable of memorizing their training data, which can be extracted through specifically designed prompts. As the scale of datasets continues to grow, privacy risks arising from memorization have attracted increasing attention. Quantifying language model memorization helps evaluate potential privacy risks. However, prior works on quantifying memorization…

    Submitted 5 November, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: 9 pages, 7 figures

  36. arXiv:2307.16212  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA eess.SY

    Robust Multi-Agent Reinforcement Learning with State Uncertainty

    Authors: Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao

    Abstract: In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Although robustness is increasingly important in MARL deployment, little prior work has studied state uncertainties in MARL, either in problem formulation or in algorithm design.…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: 50 pages, Published in TMLR, Transactions on Machine Learning Research (06/2023)

  37. arXiv:2307.07607  [pdf, other]

    cs.RO

    SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

    Authors: Shibo Zhao, Yuanjun Gao, Tianhao Wu, Damanpreet Singh, Rushan Jiang, Haoxiang Sun, Mansi Sarawata, Yuheng Qiu, Warren Whittaker, Ian Higgins, Yi Du, Shaoshu Su, Can Xu, John Keller, Jay Karhade, Lucas Nogueira, Sourojit Saha, Ji Zhang, Wenshan Wang, Chen Wang, Sebastian Scherer

    Abstract: Simultaneous localization and mapping (SLAM) is a fundamental task for numerous applications such as autonomous navigation and exploration. Although many SLAM datasets have been released, current SLAM solutions still struggle to deliver sustained and resilient performance. One major issue is the absence of high-quality datasets including diverse all-weather conditions and a reliable metric for assessi…

    Submitted 29 May, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, June 2024

  38. iSLAM: Imperative SLAM

    Authors: Taimeng Fu, Shaoshu Su, Yiren Lu, Chen Wang

    Abstract: Simultaneous Localization and Mapping (SLAM) stands as one of the critical challenges in robot navigation. A SLAM system often consists of a front-end component for motion estimation and a back-end system for eliminating estimation drifts. Recent advancements suggest that data-driven methods are highly effective for front-end tasks, while geometry-based methods continue to be essential in the back…

    Submitted 21 March, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: The paper has been accepted by IEEE Robotics and Automation Letters (RA-L)

  39. arXiv:2305.07740  [pdf, other]

    cs.RO eess.SY

    Double-Iterative Gaussian Process Regression for Modeling Error Compensation in Autonomous Racing

    Authors: Shaoshu Su, Ce Hao, Catherine Weaver, Chen Tang, Wei Zhan, Masayoshi Tomizuka

    Abstract: Autonomous racing control is a challenging research problem as vehicles are pushed to their limits of handling to achieve an optimal lap time; therefore, vehicles exhibit highly nonlinear and complex dynamics. Difficult-to-model effects, such as drifting, aerodynamics, chassis weight transfer, and suspension can lead to infeasible and suboptimal trajectories. While offline planning allows optimizi…

    Submitted 26 June, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: 8 Pages, 6 Figures, Accepted by IFAC 2023 (The 22nd World Congress of the International Federation of Automatic Control)

  40. arXiv:2305.05602  [pdf, other]

    cs.CV

    Collaborative Chinese Text Recognition with Personalized Federated Learning

    Authors: Shangchao Su, Haiyang Yu, Bin Li, Xiangyang Xue

    Abstract: In Chinese text recognition, to compensate for the insufficient local data and improve the performance of local few-shot character recognition, it is often necessary for one organization to collect a large amount of data from similar organizations. However, due to the natural presence of private information in text data, such as addresses and phone numbers, different organizations are unwilling to…

    Submitted 31 August, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  41. arXiv:2305.04063  [pdf, other]

    cs.CV

    Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

    Authors: Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue

    Abstract: Recently, semi-supervised federated learning (semi-FL) has been proposed to handle the commonly seen real-world scenarios with labeled data on the server and unlabeled data on the clients. However, existing methods face several challenges such as communication costs, data heterogeneity, and training pressure on client devices. To address these challenges, we introduce the powerful diffusion models…

    Submitted 12 June, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: Accepted by AAAI-24

  42. arXiv:2305.02058  [pdf, ps, other]

    cs.AI cs.HC cs.MA cs.RO

    Human Machine Co-adaption Interface via Cooperation Markov Decision Process System

    Authors: Kairui Guo, Adrian Cheng, Yaqi Li, Jun Li, Rob Duffield, Steven W. Su

    Abstract: This paper aims to develop a new human-machine interface to improve rehabilitation performance from the perspective of both the user (patient) and the machine (robot) by introducing the co-adaption techniques via model-based reinforcement learning. Previous studies focus more on robot assistance, i.e., to improve the control strategy so as to fulfill the objective of Assist-As-Needed. In this stud…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 11 pages; 4 figures

  43. arXiv:2304.08842  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite

    Authors: Sicen Guo, Jiahang Li, Yi Feng, Dacheng Zhou, Denghuang Zhang, Chen Chen, Shuai Su, Xingyi Zhu, Qijun Chen, Rui Fan

    Abstract: In the nascent domain of urban digital twins (UDT), the prospects for leveraging cutting-edge deep learning techniques are vast and compelling. Particularly within the specialized area of intelligent road inspection (IRI), a noticeable gap exists, underscored by the current dearth of dedicated research efforts and the lack of large-scale well-annotated datasets. To foster advancements in this burg…

    Submitted 1 January, 2024; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Database webpage: https://www.udtiri.com/, Kaggle webpage: https://www.kaggle.com/datasets/jiahangli617/udtiri

  44. arXiv:2304.02013  [pdf, other]

    cs.CV

    NPC: Neural Point Characters from Video

    Authors: Shih-Yang Su, Timur Bagautdinov, Helge Rhodin

    Abstract: High-fidelity human 3D models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation to canonical s…

    Submitted 1 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Project website: https://lemonatsu.github.io/npc/

  45. arXiv:2303.14346  [pdf, other]

    cs.CV

    Collaborative Multi-Object Tracking with Conformal Uncertainty Propagation

    Authors: Sanbao Su, Songyang Han, Yiming Li, Zhili Zhang, Chen Feng, Caiwen Ding, Fei Miao

    Abstract: Object detection and multiple object tracking (MOT) are essential components of self-driving systems. Accurate detection and uncertainty quantification are both critical for onboard modules, such as perception, prediction, and planning, to improve the safety and robustness of autonomous vehicles. Collaborative object detection (COD) has been proposed to improve detection accuracy and reduce uncert…

    Submitted 31 January, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: This paper has been accepted by IEEE Robotics and Automation Letters

  46. arXiv:2302.08714  [pdf, other]

    cs.IR cs.AI cs.CV

    Binary Embedding-based Retrieval at Tencent

    Authors: Yukang Gan, Yixiao Ge, Chang Zhou, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan

    Abstract: Large-scale embedding-based retrieval (EBR) is the cornerstone of search-related industrial applications. Given a user query, the system of EBR aims to identify relevant information from a large corpus of documents that may be tens or hundreds of billions in size. The storage and computation turn out to be expensive and inefficient with massive documents and high concurrent queries, making it diff…

    Submitted 17 February, 2023; originally announced February 2023.

  47. arXiv:2302.06185  [pdf, other]

    cs.CV

    PUPS: Point Cloud Unified Panoptic Segmentation

    Authors: Shihao Su, Jianyun Xu, Huanyu Wang, Zhenwei Miao, Xin Zhan, Dayang Hao, Xi Li

    Abstract: Point cloud panoptic segmentation is a challenging task that seeks a holistic solution for both semantic and instance segmentation to predict groupings of coherent points. Previous approaches treat semantic and instance segmentation as surrogate tasks, and they either use clustering methods or bounding boxes to gather instance groupings with costly computation and hand-crafted designs in the insta…

    Submitted 27 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: accepted by AAAI2023

  48. arXiv:2212.05813  [pdf, other]

    cs.CV eess.IV

    KonX: Cross-Resolution Image Quality Assessment

    Authors: Oliver Wiedemann, Vlad Hosu, Shaolin Su, Dietmar Saupe

    Abstract: Scale-invariance is an open problem in many computer vision subfields. For example, object labels should remain constant across scales, yet model predictions diverge in many cases. This problem gets harder for tasks where the ground-truth labels change with the presentation scale. In image quality assessment (IQA), downsampling attenuates impairments, e.g., blurs or compression artifacts, which ca…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: 13 pages

    ACM Class: I.4.7; I.4.8

  49. arXiv:2212.03434  [pdf, other]

    cs.CV

    Name Your Colour For the Task: Artificially Discover Colour Naming via Colour Quantisation Transformer

    Authors: Shenghan Su, Lin Gu, Yue Yang, Zenghui Zhang, Tatsuya Harada

    Abstract: The long-standing theory that a colour-naming system evolves under dual pressure of efficient communication and perceptual mechanism is supported by more and more linguistic studies, including analysing four decades of diachronic data from the Nafaanra language. This inspires us to explore whether machine learning could evolve and discover a similar colour-naming system via optimising the communic…

    Submitted 13 October, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: ICCV 2023 Oral

  50. arXiv:2212.02705  [pdf, other]

    cs.AI cs.GT cs.MA

    What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

    Authors: Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Shaofeng Zou, Fei Miao

    Abstract: Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate di…

    Submitted 12 April, 2024; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: Accepted by Transactions on Machine Learning Research (TMLR)