Showing 1–50 of 236 results for author: Lu, Q

Searching in archive cs.
  1. arXiv:2407.10048  [pdf, other]

    cs.SD eess.AS

    Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification

    Authors: Li Zhang, Ning Jiang, Qing Wang, Yue Li, Quan Lu, Lei Xie

    Abstract: Trained on 680,000 hours of massive speech data, Whisper is a multitasking, multilingual speech foundation model demonstrating superior performance in automatic speech recognition, translation, and language identification. However, its applicability in speaker verification (SV) tasks remains unexplored, particularly in low-data-resource scenarios where labeled speaker data in specific domains are…

    Submitted 13 July, 2024; originally announced July 2024.

  2. arXiv:2407.09618  [pdf, other]

    cs.LG cs.SI

    The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

    Authors: Sitao Luan, Chenqing Hua, Qincheng Lu, Liheng Ma, Lirong Wu, Xinyu Wang, Minkai Xu, Xiao-Wen Chang, Doina Precup, Rex Ying, Stan Z. Li, Jian Tang, Guy Wolf, Stefanie Jegelka

    Abstract: The homophily principle, i.e., that nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance com…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Suggestions and comments are welcomed at [email protected]!

  3. arXiv:2407.01595  [pdf, other]

    cs.LG cs.CY cs.SE

    Fairpriori: Improving Biased Subgroup Discovery for Deep Neural Network Fairness

    Authors: Kacy Zhou, Jiawen Wen, Nan Yang, Dong Yuan, Qinghua Lu, Huaming Chen

    Abstract: While deep learning has become a core functional module of most software systems, concerns regarding the fairness of ML predictions have emerged as a significant issue that affects prediction results due to discrimination. Intersectional bias, which disproportionately affects members of subgroups, is a prime example of this. For instance, a machine learning model might exhibit bias against darker-…

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: 11 pages

  4. arXiv:2407.00731  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models Struggle in Token-Level Clinical Named Entity Recognition

    Authors: Qiuhao Lu, Rui Li, Andrew Wen, Jinlian Wang, Liwei Wang, Hongfang Liu

    Abstract: Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stands out as an essential task and it plays a crucial…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: AMIA 2024 Annual Symposium Proceedings

  5. arXiv:2406.11802  [pdf, other]

    cs.CV

    PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

    Authors: Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal know…

    Submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.08451  [pdf, other]

    cs.CV

    GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

    Authors: Quanfeng Lu, Wenqi Shao, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Smartphone users often navigate across multiple applications (apps) to complete tasks such as sharing content between social media platforms. Autonomous Graphical User Interface (GUI) navigation agents can enhance user experience in communication, entertainment, and productivity by streamlining workflows and reducing manual intervention. However, prior GUI agents often trained with datasets compri…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures, a cross-app GUI navigation dataset

  7. Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models

    Authors: Zejun Zhang, Zhenchang Xing, Xiaoxue Ren, Qinghua Lu, Xiwei Xu

    Abstract: Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find it challenging to use Pythonic idioms. Adopting a rule-based approach or LLM-only approach is not sufficient to overcome three persistent challenges of code idiomatization including code miss, wrong detection and wrong refactoring. Motivated by the determinism of rules and adaptab…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by FSE 2024, 22 pages

  8. arXiv:2406.03215  [pdf, other]

    cs.CV

    Searching Priors Makes Text-to-Video Synthesis Better

    Authors: Haoran Cheng, Liang Peng, Linxuan Xia, Yuepeng Hu, Hengjia Li, Qinglin Lu, Xiaofei He, Boxi Wu

    Abstract: Significant advancements in video diffusion models have brought substantial progress to the field of text-to-video (T2V) synthesis. However, existing T2V synthesis models struggle to accurately generate complex motion dynamics, leading to a reduction in video realism. One possible solution is to collect massive data and train the model on it, but this would be extremely expensive. To alleviate this…

    Submitted 5 June, 2024; originally announced June 2024.

  9. arXiv:2406.02803  [pdf, other]

    cs.DC

    DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

    Authors: Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu

    Abstract: Despite being a powerful concept, distributed shared memory (DSM) has not been made practical due to the extensive synchronization needed between servers to implement memory coherence. This paper shows a practical DSM implementation based on the insight that the ownership model embedded in programming languages such as Rust automatically constrains the order of read and write, providing opportunit…

    Submitted 27 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2405.20039  [pdf, other]

    stat.ML cs.LG stat.ME

    Task-Agnostic Machine Learning-Assisted Inference

    Authors: Jiacheng Miao, Qiongshi Lu

    Abstract: Machine learning (ML) is playing an increasingly important role in scientific research. In conjunction with classical statistical approaches, ML-assisted analytical strategies have shown great promise in accelerating research findings. This has also opened up a whole new field of methodological research focusing on integrative approaches that leverage both ML and statistics to tackle data science…

    Submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.17837  [pdf, other]

    cs.HC

    Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces

    Authors: Qiuyu Lu, Jiawei Fang, Zhihao Yao, Yue Yang, Shiqing Lyu, Haipeng Mi, Lining Yao

    Abstract: In the field of Human-Computer Interaction (HCI), the development of interactive devices represents a significant area of focus. The advent of novel hardware and advanced fabrication techniques has underscored the demand for specialized design tools that democratize the prototyping process for such cutting-edge devices. While these tools simplify the process through parametric design and simulatio…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures

  12. arXiv:2405.10467  [pdf, other]

    cs.AI cs.SE

    Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

    Authors: Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, Jon Whittle

    Abstract: Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking…

    Submitted 24 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  13. arXiv:2405.08748  [pdf, other]

    cs.CV

    Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

    Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, et al. (20 additional authors not shown)

    Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Mu…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Project Page: https://dit.hunyuan.tencent.com/

  14. arXiv:2404.18720  [pdf, other]

    cs.RO

    Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform

    Authors: Shimian Zhang, Qiuhong Lu

    Abstract: In the rapidly advancing field of robotics, the fusion of state-of-the-art visual technologies with mobile robotic arms has emerged as a critical integration. This paper introduces a novel system that combines the Segment Anything model (SAM) -- a transformer-based visual foundation model -- with a robotic arm on a mobile platform. The design of integrating a depth camera on the robotic arm's end-…

    Submitted 29 April, 2024; originally announced April 2024.

  15. arXiv:2404.16006  [pdf, other]

    cs.CV

    MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

    Authors: Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 77 pages, 41 figures

  16. arXiv:2404.14886  [pdf, other]

    cs.LG

    GCEPNet: Graph Convolution-Enhanced Expectation Propagation for Massive MIMO Detection

    Authors: Qincheng Lu, Sitao Luan, Xiao-Wen Chang

    Abstract: Massive MIMO (multiple-input multiple-output) detection is an important topic in wireless communication and various machine learning based methods have been developed recently for this task. Expectation propagation (EP) and its variants are widely used for MIMO detection and have achieved the best performance. However, EP-based solvers fail to capture the correlation between unknown variables, lea…

    Submitted 23 April, 2024; originally announced April 2024.

  17. arXiv:2404.13820  [pdf, other]

    cs.CC cs.NE

    Prove Symbolic Regression is NP-hard by Symbol Graph

    Authors: Jinglu Song, Qiang Lu, Bozhou Tian, Jingwen Zhang, Jake Luo, Zhiguang Wang

    Abstract: Symbolic regression (SR) is the task of discovering a symbolic expression that fits a given data set from the space of mathematical expressions. Despite the abundance of research surrounding the SR problem, there's a scarcity of works that confirm its NP-hard nature. Therefore, this paper introduces the concept of a symbol graph as a comprehensive representation of the entire mathematical expressi…

    Submitted 21 April, 2024; originally announced April 2024.

  18. An Origami-Inspired Variable Friction Surface for Increasing the Dexterity of Robotic Grippers

    Authors: Qiujie Lu, Angus B. Clark, Matthew Shen, Nicolas Rojas

    Abstract: While the grasping capability of robotic grippers has shown significant development, the ability to manipulate objects within the hand is still limited. One explanation for this limitation is the lack of controlled contact variation between the grasped object and the gripper. For instance, human hands have the ability to firmly grip object surfaces, as well as slide over object faces, an aspect th…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 8 pages, 11 figures

    Journal ref: IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2538-2545, April 2020

  19. arXiv:2404.06884  [pdf, ps, other]

    cs.IT

    Demand Private Coded Caching: the Two-File Case

    Authors: Qinyi Lu, Nan Liu, Wei Kang

    Abstract: We investigate the demand private coded caching problem, which is an $(N,K)$ coded caching problem with $N$ files, $K$ users, each equipped with a cache of size $M$, and an additional privacy constraint on user demands. We first present a new virtual-user-based achievable scheme for an arbitrary number of users and files. Then, for the case of 2 files and an arbitrary number of users, we derive some new…

    Submitted 6 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  20. arXiv:2404.05388  [pdf, other]

    cs.SE cs.AI cs.CY cs.LG

    An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping

    Authors: Boming Xia, Qinghua Lu, Liming Zhu, Zhenchang Xing

    Abstract: The advent of advanced AI underscores the urgent need for comprehensive safety evaluations, necessitating collaboration across communities (i.e., AI, software engineering, and governance). However, divergent practices and terminologies across these communities, combined with the complexity of AI systems (of which models are only a part) and environmental affordances (e.g., access to tools), obstruct…

    Submitted 15 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 1st ACM International Conference on AI-powered Software (AIware)

  21. arXiv:2403.13869  [pdf, other]

    cs.LG cs.AI

    Accurately Predicting Probabilities of Safety-Critical Rare Events for Intelligent Systems

    Authors: Ruoxuan Bai, Jingxuan Yang, Weiduo Gong, Yi Zhang, Qiujing Lu, Shuo Feng

    Abstract: Intelligent systems are increasingly integral to our daily lives, yet rare safety-critical events present significant latent threats to their practical deployment. Addressing this challenge hinges on accurately predicting the probability of safety-critical events occurring within a given time step from the current state, a metric we define as 'criticality'. The complexity of predicting criticality…

    Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  22. arXiv:2403.12556  [pdf, other]

    cs.CL

    Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation

    Authors: Zhigang Chen, Benjia Zhou, Jun Li, Jun Wan, Zhen Lei, Ning Jiang, Quan Lu, Guoqing Zhao

    Abstract: Previous Sign Language Translation (SLT) methods achieve superior performance by relying on gloss annotations. However, labeling high-quality glosses is a labor-intensive task, which limits the further development of SLT. Although some approaches work towards gloss-free SLT through jointly training the visual encoder and translation network, these efforts still suffer from poor performance and ine…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING-2024

  23. arXiv:2403.11627  [pdf, other]

    cs.CV

    LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

    Authors: Yang Yang, Wen Wang, Liang Peng, Chaotian Song, Yao Chen, Hengjia Li, Xiaolong Yang, Qinglin Lu, Deng Cai, Boxi Wu, Wei Liu

    Abstract: Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we identify this straightforward method f…

    Submitted 10 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: project page: https://github.com/Young98CN/LoRA_Composer

  24. arXiv:2403.08857  [pdf, other]

    cs.CV

    DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

    Authors: Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

    Abstract: Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the need for specialized prompt engineering knowledge and the inability to perform multi-turn image generation, hindering a dynamic and iterative creation process. Recent attempts have tried to equip Multi-modal Large Language M…

    Submitted 3 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Project page: https://hunyuan-dialoggen.github.io/

  25. arXiv:2403.05753  [pdf, other]

    eess.IV cs.CV

    UDCR: Unsupervised Aortic DSA/CTA Rigid Registration Using Deep Reinforcement Learning and Overlap Degree Calculation

    Authors: Wentao Liu, Bowen Liang, Weijin Xu, Tong Tian, Qingsheng Lu, Xipeng Pan, Haoyuan Li, Siyu Tian, Huihua Yang, Ruisheng Su

    Abstract: The rigid registration of aortic Digital Subtraction Angiography (DSA) and Computed Tomography Angiography (CTA) can provide 3D anatomical details of the vasculature for the interventional surgical treatment of conditions such as aortic dissection and aortic aneurysms, holding significant value for clinical research. However, the current methods for 2D/3D image registration are dependent on manual…

    Submitted 8 March, 2024; originally announced March 2024.

  26. arXiv:2403.05748  [pdf, other]

    cs.RO

    Image-Guided Autonomous Guidewire Navigation in Robot-Assisted Endovascular Interventions using Reinforcement Learning

    Authors: Wentao Liu, Tong Tian, Weijin Xu, Bowen Liang, Qingsheng Lu, Xipeng Pan, Wenyi Zhao, Huihua Yang, Ruisheng Su

    Abstract: Autonomous robots in endovascular interventions possess the potential to navigate guidewires with safety and reliability, while reducing human error and shortening surgical time. However, current methods of guidewire navigation based on Reinforcement Learning (RL) depend on manual demonstration data or magnetic guidance. In this work, we propose an Image-guided Autonomous Guidewire Navigation (IAG…

    Submitted 8 March, 2024; originally announced March 2024.

  27. arXiv:2403.01475  [pdf, other]

    cs.LG cs.AI cs.SI

    Representation Learning on Heterophilic Graph with Directional Neighborhood Attention

    Authors: Qincheng Lu, Jiaqi Zhu, Sitao Luan, Xiao-Wen Chang

    Abstract: Graph Attention Network (GAT) is one of the most popular Graph Neural Network (GNN) architectures, which employs the attention mechanism to learn edge weights and has demonstrated promising performance in various applications. However, since it only incorporates information from the immediate neighborhood, it lacks the ability to capture long-range and global graph information, leading to unsatisfactor…

    Submitted 3 March, 2024; originally announced March 2024.

  28. arXiv:2402.10260  [pdf, other]

    cs.LG cs.CL cs.CR

    A StrongREJECT for Empty Jailbreaks

    Authors: Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

    Abstract: The rise of large language models (LLMs) has drawn attention to the existence of "jailbreaks" that allow the models to be used maliciously. However, there is no standard benchmark for measuring the severity of a jailbreak, leaving authors of jailbreak papers to create their own. We show that these benchmarks often include vague or unanswerable questions and use grading criteria that are biased tow…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Code and data at https://github.com/alexandrasouly/strongreject

  29. arXiv:2402.09558  [pdf, other]

    cs.AI cs.LG

    Bidirectional Generative Pre-training for Improving Time Series Representation Learning

    Authors: Ziyang Song, Qincheng Lu, He Zhu, Yue Li

    Abstract: Learning time-series representations for discriminative tasks has been a long-standing challenge. Current pre-training methods are limited in either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on time-series data by both next-token and previou…

    Submitted 14 February, 2024; originally announced February 2024.

  30. arXiv:2402.09181  [pdf, other]

    eess.IV cs.CV

    OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

    Authors: Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which is essential in real-world medical applications. To solve this problem, in this pape…

    Submitted 21 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  31. arXiv:2402.08788  [pdf]

    cs.CL cs.SD eess.AS

    Syllable based DNN-HMM Cantonese Speech to Text System

    Authors: Timothy Wong, Claire Li, Sam Lam, Billy Chiu, Qin Lu, Minglei Li, Dan Xiong, Roy Shing Yu, Vincent T. Y. Ng

    Abstract: This paper reports our work on building up a Cantonese Speech-to-Text (STT) system with a syllable based acoustic model. This is a part of an effort in building a STT system to aid dyslexic students who have cognitive deficiency in writing skills but have no problem expressing their ideas through speech. For Cantonese speech recognition, the basic unit of acoustic models can either be the conventi…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 7 pages, 3 figures, LREC 2016

    MSC Class: 94-06; ACM Class: I.2.7

  32. AccessLens: Auto-detecting Inaccessibility of Everyday Objects

    Authors: Nahyun Kwon, Qian Lu, Muhammad Hasham Qazi, Joanne Liu, Changhoon Oh, Shu Kong, Jeeeun Kim

    Abstract: In our increasingly diverse society, everyday physical interfaces often present barriers, impacting individuals across various contexts. This oversight, from small cabinet knobs to identical wall switches that can pose different contextual challenges, highlights an imperative need for solutions. Leveraging low-cost 3D-printed augmentations such as knob magnifiers and tactile labels seems promising…

    Submitted 23 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: CHI2024

  33. arXiv:2401.08103  [pdf, other]

    cs.CY cs.AI

    Resolving Ethics Trade-offs in Implementing Responsible AI

    Authors: Conrad Sanderson, Emma Schleiger, David Douglas, Petra Kuhnert, Qinghua Lu

    Abstract: While the operationalisation of high-level AI ethics principles into practical AI/ML systems has made progress, there is still a theory-practice gap in managing tensions between the underlying AI ethics aspects. We cover five approaches for addressing the tensions via trade-offs, ranging from rudimentary to complex. The approaches differ in the types of considered context, scope, methods for measu…

    Submitted 1 April, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    MSC Class: 68T01; ACM Class: K.4.1; I.2.m; C.4

  34. arXiv:2401.03697  [pdf, other]

    cs.SD eess.AS

    An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

    Authors: Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie

    Abstract: This paper describes our audio-quality-based multi-strategy approach for the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. Specifically, our approach adopts different extraction strategies based on the audio quality, striking a balance between interference removal and speech preservation, which benefits the back-en…

    Submitted 6 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  35. arXiv:2401.02384  [pdf, other]

    cs.CV

    ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

    Authors: Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Charts play a vital role in data visualization, understanding data patterns, and informed decision-making. However, their unique combination of graphical elements (e.g., bars, lines) and textual components (e.g., labels, legends) poses challenges for general-purpose multimodal models. While vision-language models trained on chart data excel in comprehension, they struggle with generalization. To a…

    Submitted 15 February, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Updated and corrected experimental results, removal of inappropriate experiments, and a more comprehensive experimental setup

  36. A Soft Continuum Robot with Self-Controllable Variable Curvature

    Authors: Xinran Wang, Qiujie Lu, Dongmyoung Lee, Zhongxue Gan, Nicolas Rojas

    Abstract: This paper introduces a new type of soft continuum robot, called SCoReS, which is capable of continuously self-controlling its curvature at the segment level, in contrast to previous designs which either require external forces or machine elements, or whose variable curvature capabilities are discrete -- depending on the number of locking mechanisms and segments. The ability to have a variable cur…

    Submitted 19 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE Robotics and Automation Letters in January 2024, Imperial's open access research REF 2029 open access policy

    Journal ref: IEEE Robotics and Automation Letters 2024

  37. arXiv:2312.08768  [pdf, other]

    cs.CV

    Local Conditional Controlling for Text-to-Image Diffusion Models

    Authors: Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Wei Zhao, Qinglin Lu, Boxi Wu, Wei Liu

    Abstract: Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This controlling process is globally operated on the entire image, which limits the flexibility of control regions. In this paper, we introduce a new simple yet pra…

    Submitted 6 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  38. arXiv:2312.08519  [pdf]

    q-bio.NC cs.AI

    Reconciling Shared versus Context-Specific Information in a Neural Network Model of Latent Causes

    Authors: Qihong Lu, Tan T. Nguyen, Qiong Zhang, Uri Hasson, Thomas L. Griffiths, Jeffrey M. Zacks, Samuel J. Gershman, Kenneth A. Norman

    Abstract: It has been proposed that, when processing a stream of events, humans divide their experiences in terms of inferred latent causes (LCs) to support context-dependent learning. However, when shared structure is present across contexts, it is still unclear how the "splitting" of LCs and learning of shared structure can be simultaneously achieved. Here, we present the Latent Cause Network (LCNet), a n…

    Submitted 6 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  39. arXiv:2312.06669  [pdf, ps, other]

    q-bio.QM cs.LG stat.ME

    An Association Test Based on Kernel-Based Neural Networks for Complex Genetic Association Analysis

    Authors: Tingting Hou, Chang Jiang, Qing Lu

    Abstract: The advent of artificial intelligence, especially the progress of deep neural networks, is expected to revolutionize genetic research and offer unprecedented potential to decode the complex relationships between genetic variants and disease phenotypes, which could mark a significant step toward improving our understanding of the disease etiology. While deep neural networks hold great promise for g…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 34 pages, 4 figures, 3 tables

  40. arXiv:2312.05572  [pdf, other]

    cs.CV

    R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning

    Authors: Zhiling Ye, LiangGuo Zhang, Dingheng Zeng, Quan Lu, Ning Jiang

    Abstract: Dynamic NeRFs have recently garnered growing attention for 3D talking portrait synthesis. Despite advances in rendering speed and visual quality, challenges persist in enhancing efficiency and effectiveness. We present R2-Talker, an efficient and effective framework enabling realistic real-time talking head synthesis. Specifically, using multi-resolution hash grids, we introduce a novel approach f…

    Submitted 9 December, 2023; originally announced December 2023.

  41. arXiv:2312.02850  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    A Kernel-Based Neural Network Test for High-dimensional Sequencing Data Analysis

    Authors: Tingting Hou, Chang Jiang, Qing Lu

    Abstract: The recent development of artificial intelligence (AI) technology, especially the advance of deep neural network (DNN) technology, has revolutionized many fields. While DNN plays a central role in modern AI technology, it has been rarely used in sequencing data analysis due to challenges brought by high-dimensional sequencing data (e.g., overfitting). Moreover, due to the complexity of neural netw…

    Submitted 5 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 31 pages, 5 figures and 3 tables

  42. arXiv:2312.02781  [pdf, other]

    cs.CV cs.AI

    PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features

    Authors: Tianshun Han, Shengnan Gui, Yiqing Huang, Baihui Li, Lijian Liu, Benjia Zhou, Ning Jiang, Quan Lu, Ruicong Zhi, Yanyan Liang, Du Zhang, Jun Wan

    Abstract: Speech-driven 3D facial animation has improved a lot recently while most related works only utilize acoustic modality and neglect the influence of visual and textual cues, leading to unsatisfactory results in terms of precision and coherence. We argue that visual and textual cues are not trivial information. Therefore, we present a novel framework, namely PMMTalk, using complementary Pseudo Multi-…

    Submitted 5 December, 2023; originally announced December 2023.

  43. arXiv:2312.00817  [pdf, other]

    cs.LG cs.AI

    Extrapolatable Transformer Pre-training for Ultra Long Time-Series Forecasting

    Authors: Ziyang Song, Qincheng Lu, Hao Xu, David L. Buckeridge, Yue Li

    Abstract: Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success in Natural Language Processing and Computer Vision domains. However, the development of PTMs on time-series data is lagging behind. This underscores the limitations of the existing transformer-based architectures, particularly their scalability to handle large-scale data and ability to capture long-term…

    Submitted 14 February, 2024; v1 submitted 29 November, 2023; originally announced December 2023.

  44. arXiv:2311.18252  [pdf, other]

    cs.SE cs.AI cs.CY cs.LG

    Navigating Privacy and Copyright Challenges Across the Data Lifecycle of Generative AI

    Authors: Dawen Zhang, Boming Xia, Yue Liu, Xiwei Xu, Thong Hoang, Zhenchang Xing, Mark Staples, Qinghua Lu, Liming Zhu

    Abstract: The advent of Generative AI has marked a significant milestone in artificial intelligence, demonstrating remarkable capabilities in generating realistic images, texts, and data patterns. However, these advancements come with heightened concerns over data privacy and copyright infringement, primarily due to the reliance on vast datasets for model training. Traditional approaches like differential p…

    Submitted 10 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted by 2024 IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN)

  45. arXiv:2311.17536  [pdf, other]

    cs.CV

    SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning

    Authors: Liang Peng, Haoran Cheng, Zheng Yang, Ruisi Zhao, Linxuan Xia, Chaotian Song, Qinglin Lu, Boxi Wu, Wei Liu

    Abstract: Recent one-shot video tuning methods, which fine-tune the network on a specific video based on pre-trained text-to-image models (e.g., Stable Diffusion), are popular in the community because of the flexibility. However, these methods often produce videos marred by incoherence and inconsistency. To address these limitations, this paper introduces a simple yet effective noise constraint across video…

    Submitted 6 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  46. arXiv:2311.14220  [pdf, other]

    stat.ME cs.LG stat.ML

    Assumption-lean and Data-adaptive Post-Prediction Inference

    Authors: Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, Qiongshi Lu

    Abstract: A primary challenge facing modern scientific research is the limited availability of gold-standard data which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes with easily obtained covariates. However, these predicted outcomes are often used directly in subsequent st…

    Submitted 6 February, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  47. arXiv:2311.13158  [pdf, other]

    cs.SE

    Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability

    Authors: Boming Xia, Qinghua Lu, Liming Zhu, Sung Une Lee, Yue Liu, Zhenchang Xing

    Abstract: Artificial Intelligence (AI), particularly through the advent of large-scale generative AI (GenAI) models such as Large Language Models (LLMs), has become a transformative element in contemporary technology. While these models have unlocked new possibilities, they simultaneously present significant challenges, such as concerns over data privacy and the propensity to generate misleading or fabricat…

    Submitted 17 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  48. arXiv:2311.13148  [pdf, other]

    cs.AI cs.SE

    Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based Agents

    Authors: Qinghua Lu, Liming Zhu, Xiwei Xu, Zhenchang Xing, Stefan Harrer, Jon Whittle

    Abstract: Foundation models, such as large language models (LLMs), have been widely recognised as transformative AI technologies due to their capabilities to understand and generate content, including plans with reasoning capabilities. Foundation model based agents derive their autonomy from the capabilities of foundation models, which enable them to autonomously break down a given goal into a set of manage…

    Submitted 2 April, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  49. arXiv:2311.06998  [pdf, other]

    cs.CR

    The Privacy Pillar -- A Conceptual Framework for Foundation Model-based Systems

    Authors: Tingting Bi, Guangsheng Yu, Qinghua Lu, Xiwei Xu, Nick Van Beest

    Abstract: AI and its relevant technologies, including machine learning, deep learning, chatbots, virtual assistants, and others, are currently undergoing a profound transformation of development and organizational processes within companies. Foundation models present both significant challenges and incredible opportunities. In this context, ensuring the quality attributes of foundation model-based systems i…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 10 pages

  50. arXiv:2311.02554  [pdf, other]

    cs.CR eess.SP

    Pilot-Based Key Distribution and Encryption for Secure Coherent Passive Optical Networks

    Authors: Haide Wang, Ji Zhou, Qingxin Lu, Jianrui Zeng, Yongqing Liao, Weiping Liu, Changyuan Yu, Zhaohui Li

    Abstract: The security issues of passive optical networks (PONs) have always been a concern due to broadcast transmission. Physical-layer security enhancement for the coherent PON should be as significant as improving transmission performance. In this paper, we propose the advanced encryption standard (AES) algorithm and geometric constellation shaping four-level pulse amplitude modulation (GCS-PAM4) pilot-…

    Submitted 25 December, 2023; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: The paper has been submitted to the Journal of Lightwave Technology