
Showing 1–50 of 137 results for author: Niu, X

Searching in archive cs.
  1. arXiv:2407.12435 [pdf, other]

    cs.CV

    F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

    Authors: Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang, Siyuan Huang

    Abstract: Existing 3D human-object interaction (HOI) datasets and models simply align global descriptions with the long HOI sequence, while lacking a detailed understanding of intermediate states and the transitions between states. In this paper, we argue that fine-grained semantic alignment, which utilizes state-level descriptions, offers a promising paradigm for learning semantically rich HOI representati…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV24

  2. arXiv:2407.11163 [pdf, other]

    cs.SI math.PR

    Exact Label Recovery in Euclidean Random Graphs

    Authors: Julia Gaudio, Charlie Guan, Xiaochun Niu, Ermin Wei

    Abstract: In this paper, we propose a family of label recovery problems on weighted Euclidean random graphs. The vertices of a graph are embedded in $\mathbb{R}^d$ according to a Poisson point process, and are assigned to a discrete community label. Our goal is to infer the vertex labels, given edge weights whose distributions depend on the vertex labels as well as their geometric positions. Our general mod…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.11196

  3. arXiv:2407.07589 [pdf]

    cs.RO

    MSC-LIO: An MSCKF-Based LiDAR-Inertial Odometry with Same-Plane-Point Tracking

    Authors: Tisheng Zhang, Man Yuan, Linfu Wei, Hailiang Tang, Xiaoji Niu

    Abstract: The multi-state constraint Kalman filter (MSCKF) has been proven to be more efficient than graph optimization for visual-based odometry while achieving similar accuracy. However, it has not yet been properly considered and studied for LiDAR-based odometry. In this paper, we propose a novel tightly coupled LiDAR-inertial odometry based on the MSCKF framework, named MSC-LIO. An efficient LiDAR same-plane…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 9 pages

  4. arXiv:2407.04411 [pdf, other]

    cs.CR cs.AI cs.CL

    Waterfall: Framework for Robust and Scalable Text Watermarking

    Authors: Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low

    Abstract: Protecting intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are not robust enough against such attacks nor scalable to millions of u…

    Submitted 5 July, 2024; originally announced July 2024.

  5. arXiv:2406.14473 [pdf, other]

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionately low attention from the research community. We identify four specific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  6. arXiv:2406.12331 [pdf, other]

    cs.CL cs.AI

    Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

    Authors: Weizhi Fei, Xueyan Niu, Guoqing Xie, Yanhua Zhang, Bo Bai, Lei Deng, Wei Han

    Abstract: Current Large Language Models (LLMs) face inherent limitations due to their pre-defined context lengths, which impede their capacity for multi-hop reasoning within extensive textual contexts. While existing techniques like Retrieval-Augmented Generation (RAG) have attempted to bridge this gap by sourcing external information, they fall short when direct answers are not readily available. We introd…

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.08255 [pdf, other]

    cs.CL

    M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

    Authors: Benjamin Hsu, Xiaoyu Liu, Huayang Li, Yoshinari Fujinuma, Maria Nadejde, Xing Niu, Yair Kittenplon, Ron Litman, Raghavendra Pappagari

    Abstract: Document translation poses a challenge for Neural Machine Translation (NMT) systems. Most document-level NMT systems rely on meticulously curated sentence-level parallel data, assuming flawless extraction of text from documents along with their precise reading order. These systems also tend to disregard additional visual cues such as the document layout, deeming it irrelevant. However, real-world…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: NAACL 2024, dataset at https://github.com/amazon-science/m3t-multi-modal-translation-bench

  8. arXiv:2406.07069 [pdf, other]

    cs.RO eess.SY

    Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning

    Authors: Xuezhi Niu, Kaige Tan, Lei Feng

    Abstract: This study presents an innovative approach to optimal gait control for a soft quadruped robot enabled by four Compressible Tendon-driven Soft Actuators (CTSAs). Building on our previous studies using model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further enhance the performance of the gait controller. Compared to rigid robots, the propos…

    Submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.07065 [pdf, other]

    cs.RO eess.SY

    Optimal Gait Design for a Soft Quadruped Robot via Multi-fidelity Bayesian Optimization

    Authors: Kaige Tan, Xuezhi Niu, Qinglei Ji, Lei Feng, Martin Törngren

    Abstract: This study focuses on improving the locomotion capability of a tendon-driven soft quadruped robot through an online adaptive learning approach. Leveraging the inverse kinematics model of the soft quadruped robot, we employ a central pattern generator to design a parametric gait pattern, and use Bayesian optimization (BO) to find the optimal parameters. Further, to address the challenges of model…

    Submitted 11 June, 2024; originally announced June 2024.

  10. arXiv:2405.15338 [pdf, other]

    cs.SD eess.AS

    SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

    Authors: Xinlei Niu, Jing Zhang, Christian Walder, Charles Patrick Martin

    Abstract: We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model. Unlike recent large-scale sound generation models, our model can be efficiently trained under limited computational resources. The integration of a contrastive learning strategy further enhances the connection between text conditions and the gen…

    Submitted 24 May, 2024; originally announced May 2024.

  11. arXiv:2405.11442 [pdf, other]

    cs.CV

    Unifying 3D Vision-Language Understanding via Promptable Queries

    Authors: Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, Qing Li

    Abstract: A unified model for 3D vision-language (3D-VL) understanding is expected to take various scene representations and perform a wide range of tasks in a 3D scene. However, a considerable gap exists between existing methods and such a unified model, due to the independent application of representation and insufficient exploration of 3D multi-task training. In this paper, we introduce PQ3D, a unified m…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Project page: https://pq3d.github.io

  12. arXiv:2405.08707 [pdf, other]

    cs.LG

    Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

    Authors: Xueyan Niu, Bo Bai, Lei Deng, Wei Han

    Abstract: Increasing the size of a Transformer model does not always lead to enhanced performance. This phenomenon cannot be explained by the empirical scaling laws. Furthermore, improved generalization ability occurs as the model memorizes the training samples. We present a theoretical framework that sheds light on the memorization process and performance dynamics of transformer-based language models. We m…

    Submitted 14 May, 2024; originally announced May 2024.

  13. arXiv:2405.08295 [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single column, 13 pages

  14. arXiv:2404.15637 [pdf, other]

    cs.SD cs.MM eess.AS

    HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts

    Authors: Xinlei Niu, Jing Zhang, Charles Patrick Martin

    Abstract: We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained conditional variational autoencoder (CVAE) that combines the strengths of a latent model with contrastive learning. HybridVC supports text and audio prompts, enabling more flexible voice style conversion. HybridVC models a latent distribution conditioned on speaker embeddings acquired by a pretrained speaker encoder…

    Submitted 24 April, 2024; originally announced April 2024.

  15. arXiv:2404.12713 [pdf, other]

    cs.NI

    Energy Conserved Failure Detection for NS-IoT Systems

    Authors: Guojin Liu, Jianhong Zhou, Hang Su, Biaohong Xiong, Xianhua Niu

    Abstract: Nowadays, network slicing (NS) technology has gained widespread adoption within Internet of Things (IoT) systems to meet diverse customized requirements. In NS-based IoT systems, the detection of equipment failures necessitates comprehensive equipment monitoring, which leads to significant resource utilization, particularly within large-scale IoT ecosystems. Thus, the imperative task of reduci…

    Submitted 19 April, 2024; originally announced April 2024.

  16. arXiv:2404.04681 [pdf, other]

    cs.IT

    Computation and Critical Transitions of Rate-Distortion-Perception Functions With Wasserstein Barycenter

    Authors: Chunhui Chen, Xueyan Niu, Wenhao Ye, Hao Wu, Bo Bai

    Abstract: The information rate-distortion-perception (RDP) function characterizes the three-way trade-off between description rate, average distortion, and perceptual quality measured by discrepancy between probability distributions. We study several variants of the RDP functions through the lens of optimal transport. By transforming the information RDP function into a Wasserstein Barycenter problem, we ide…

    Submitted 9 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.14611. This paper was presented in part at the 2023 IEEE International Symposium on Information Theory

  17. arXiv:2404.01204 [pdf, other]

    cs.CL

    The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis

    Authors: Chen Yang, Junzhuo Li, Xinyao Niu, Xinrun Du, Songyang Gao, Haoran Zhang, Zhaoliang Chen, Xingwei Qu, Ruibin Yuan, Yizhi Li, Jiaheng Liu, Stephen W. Huang, Shawn Yue, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: Uncovering early-stage metrics that reflect final model performance is one core principle for large-scale pretraining. The existing scaling law demonstrates the power-law correlation between pretraining loss and training FLOPs, which serves as an important indicator of the current training state for large language models. However, this principle only focuses on the model's compression properties o…

    Submitted 1 April, 2024; originally announced April 2024.

  18. arXiv:2403.05916 [pdf, other]

    cs.CV cs.AI

    GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

    Authors: Hao Lu, Xuesong Niu, Jiyao Wang, Yin Wang, Qingyong Hu, Jiaqi Tang, Yuting Zhang, Kaishen Yuan, Bin Huang, Zitong Yu, Dengbo He, Shuiguang Deng, Hao Chen, Yingcong Chen, Shiguang Shan

    Abstract: Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos. Despite their success in language understanding, it is critical to evaluate the performance of downstream tasks for better human-centric applications. This paper assesses the application of MLLMs with 5 crucial abilities for affective computing,…

    Submitted 10 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  19. arXiv:2403.04652 [pdf, other]

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01.AI: Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  20. arXiv:2402.15159 [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Machine Unlearning of Pre-trained Large Language Models

    Authors: Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue

    Abstract: This study investigates the concept of the "right to be forgotten" within the context of large language models (LLMs). We explore machine unlearning as a pivotal solution, with a focus on pre-trained models, a notably under-researched area. Our research delineates a comprehensive framework for machine unlearning in pre-trained LLMs, encompassing a critical analysis of seven diverse unlearning meth…

    Submitted 30 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: ACL 2024 main. Code and data at https://github.com/yaojin17/Unlearning_LLM

  21. arXiv:2402.10171 [pdf, other]

    cs.CL cs.AI

    Data Engineering for Scaling Language Models to 128K Context

    Authors: Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng

    Abstract: We study the continual pretraining recipe for scaling language models' context lengths to 128K, with a focus on data engineering. We hypothesize that long context modeling, in particular the ability to utilize information at arbitrary input locations, is a capability that is mostly already acquired through large-scale pretraining, and that this capability can be readily extended to contex…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Code at https://github.com/FranxYao/Long-Context-Data-Engineering

  22. arXiv:2402.08934 [pdf, other]

    eess.IV cs.CV

    Extreme Video Compression with Pre-trained Diffusion Models

    Authors: Bohan Li, Yiming Liu, Xueyan Niu, Bo Bai, Lei Deng, Deniz Gündüz

    Abstract: Diffusion models have achieved remarkable success in generating high-quality image and video data. More recently, they have also been used for image compression with high perceptual quality. In this paper, we present a novel approach to extreme video compression leveraging the predictive power of diffusion-based generative models at the decoder. The conditional diffusion model takes several neural…

    Submitted 13 February, 2024; originally announced February 2024.

  23. Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning

    Authors: Xuecheng Niu, Akinori Ito, Takashi Nose

    Abstract: Training task-oriented dialog agents based on reinforcement learning is time-consuming and requires a large number of interactions with real users. Learning a dialog policy from limited dialog experience remains an obstacle that makes the agent training process less efficient. In addition, most previous frameworks start training by randomly choosing training samples, which differs from the hu…

    Submitted 20 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted to IEEE Access

    Journal ref: IEEE Access, vol. 12, pp. 46940-46952, 2024

  24. arXiv:2401.11491 [pdf]

    cs.RO

    BA-LINS: A Frame-to-Frame Bundle Adjustment for LiDAR-Inertial Navigation

    Authors: Hailiang Tang, Tisheng Zhang, Liqiang Wang, Man Yuan, Xiaoji Niu

    Abstract: Bundle Adjustment (BA) has been proven to improve the accuracy of LiDAR mapping. However, the BA method has not yet been properly employed in a dead-reckoning navigation system. In this paper, we present a frame-to-frame (F2F) BA for LiDAR-inertial navigation, named BA-LINS. Based on the direct F2F point-cloud association, the same-plane points are associated among the LiDAR keyframes. Hence,…

    Submitted 10 February, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: 14 pages, 14 figures

  25. arXiv:2401.09340 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

    Authors: Baoxiong Jia, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, Siyuan Huang

    Abstract: 3D vision-language grounding, which focuses on aligning language with the 3D physical environment, stands as a cornerstone in the development of embodied agents. In comparison to recent advancements in the 2D domain, grounding language in 3D scenes faces several significant challenges: (i) the inherent complexity of 3D scenes due to the diverse object configurations, their rich attributes, and int…

    Submitted 6 March, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: 21 pages

  26. arXiv:2401.05731 [pdf]

    cs.IT math.RA

    On Grobner-Shirshov bases for Markov semirings

    Authors: Xiaohui Niu, Wenxi Li, Zhongzhi Wang

    Abstract: In order to investigate the relationship between Shannon information measures of random variables, scholars such as Yeung utilized information diagrams to explore the structured representation of information measures, establishing correspondences with sets. However, this method has limitations when studying information measures of five or more random variables. In this paper, we consider employing…

    Submitted 11 January, 2024; originally announced January 2024.

    MSC Class: 16Y60; 16Z10; 94A15

  27. arXiv:2312.11539 [pdf, other]

    cs.AI cs.CL cs.LG

    KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know

    Authors: Shangshang Zheng, He Bai, Yizhe Zhang, Yi Su, Xiaochuan Niu, Navdeep Jaitly

    Abstract: Measuring the alignment between a Knowledge Graph (KG) and Large Language Models (LLMs) is an effective method to assess the factualness and identify the knowledge blind spots of LLMs. However, this approach encounters two primary challenges including the translation of KGs into natural language and the efficient evaluation of these extensive and complex structures. In this paper, we present KGLen…

    Submitted 16 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

  28. arXiv:2312.09870 [pdf, other]

    cs.CR

    CABBA: Compatible Authenticated Bandwidth-efficient Broadcast protocol for ADS-B

    Authors: Mikaëla Ngamboé, Xiao Niu, Benoit Joly, Steven P Biegler, Paul Berthier, Rémi Benito, Greg Rice, José M Fernandez, Gabriela Nicolescu

    Abstract: The Automatic Dependent Surveillance-Broadcast (ADS-B) is a surveillance technology that has become mandatory in many airspaces. It improves safety, increases efficiency and reduces air traffic congestion by broadcasting aircraft navigation data. Yet, ADS-B is vulnerable to spoofing attacks as it lacks mechanisms to ensure the integrity and authenticity of the data being supplied. None of the existin…

    Submitted 12 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: The paper has been submitted to IEEE Transactions on Aerospace and Electronic Systems

  29. arXiv:2312.09571 [pdf, other]

    cs.CL cs.IT

    Extending Context Window of Large Language Models via Semantic Compression

    Authors: Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

    Abstract: Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long texts. We propose a novel semantic compression method that enables generalization to texts that are 6-8 times longer, without incurring significant computational c…

    Submitted 15 December, 2023; originally announced December 2023.

  30. arXiv:2312.04597 [pdf, other]

    cs.CR cs.LG

    TrustFed: A Reliable Federated Learning Framework with Malicious-Attack Resistance

    Authors: Hang Su, Jianhong Zhou, Xianhua Niu, Gang Feng

    Abstract: As a key technology in 6G research, federated learning (FL) enables collaborative learning among multiple clients while ensuring individual data privacy. However, malicious attackers among the participating clients can intentionally tamper with the training data or the trained model, compromising the accuracy and trustworthiness of the system. To address this issue, in this paper, we propose a hie…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 13 pages, 9 figures

  31. arXiv:2312.01809 [pdf]

    cs.RO

    SE-LIO: Semantics-enhanced Solid-State-LiDAR-Inertial Odometry for Tree-rich Environments

    Authors: Tisheng Zhang, Linfu Wei, Hailiang Tang, Liqiang Wang, Man Yuan, Xiaoji Niu

    Abstract: In this letter, we propose a semantics-enhanced solid-state-LiDAR-inertial odometry (SE-LIO) in tree-rich environments. Multiple LiDAR frames are first merged and compensated with the inertial navigation system (INS) to increase the point-cloud coverage, thus improving the accuracy of semantic segmentation. The unstructured point clouds, such as tree leaves and dynamic objects, are then removed wi…

    Submitted 4 December, 2023; originally announced December 2023.

  32. arXiv:2311.14337 [pdf, other]

    cs.CV

    TVT: Training-Free Vision Transformer Search on Tiny Datasets

    Authors: Zimian Wei, Hengyue Pan, Lujun Li, Peijie Dong, Zhiliang Tian, Xin Niu, Dongsheng Li

    Abstract: Training-free Vision Transformer (ViT) architecture search is presented to search for a better ViT with zero-cost proxies. While ViTs achieve significant distillation gains from CNN teacher models on small datasets, the current zero-cost proxies in ViTs do not generalize well to the distillation training paradigm according to our experimental observations. In this paper, for the first time, we inv…

    Submitted 24 November, 2023; originally announced November 2023.

  33. arXiv:2311.00697 [pdf, other]

    cs.CL eess.AS

    End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

    Authors: Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

    Abstract: Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combin…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023. Code: https://github.com/amazon-science/stac-speech-translation

  34. arXiv:2310.03748 [pdf]

    eess.SP cs.HC cs.LG

    Phase Synchrony Component Self-Organization in Brain Computer Interface

    Authors: Xu Niu, Na Lu, Huan Luo, Ruofan Yan

    Abstract: Phase synchrony information plays a crucial role in analyzing functional brain connectivity and identifying brain activities. A widely adopted feature extraction pipeline, composed of preprocessing, selection of EEG acquisition channels, and phase locking value (PLV) calculation, has achieved success in motor imagery (MI) classification. However, this pipeline is manual and reliant on expert knowl…

    Submitted 11 October, 2023; v1 submitted 21 September, 2023; originally announced October 2023.

  35. arXiv:2309.15889 [pdf, other]

    eess.IV cs.CV cs.IT cs.LG cs.MM

    High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models

    Authors: Selim F. Yilmaz, Xueyan Niu, Bo Bai, Wei Han, Lei Deng, Deniz Gunduz

    Abstract: We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. W…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: 6 pages, 4 figures

  36. arXiv:2309.04842 [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    Leveraging Large Language Models for Exploiting ASR Uncertainty

    Authors: Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg, Xiaochuan Niu, Ahmed Tewfik

    Abstract: While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality. This work focuses on the former scenario, where the LLM's accuracy on SLU tasks is constrained by the…

    Submitted 12 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: Added references

  37. arXiv:2309.03040 [pdf, other]

    cs.CR cs.LG

    Automated CVE Analysis for Threat Prioritization and Impact Prediction

    Authors: Ehsan Aghaei, Ehab Al-Shaer, Waseem Shadid, Xi Niu

    Abstract: Common Vulnerabilities and Exposures (CVE) entries provide pivotal information for proactive cybersecurity measures, including service patching, security hardening, and more. However, CVEs typically offer low-level, product-oriented descriptions of publicly disclosed cybersecurity vulnerabilities, often lacking the essential attack semantic information required for comprehensive weakness characterization…

    Submitted 6 September, 2023; originally announced September 2023.

  38. arXiv:2308.08244 [pdf, other]

    cs.IT cs.NI eess.SP

    A Hybrid Wireless Image Transmission Scheme with Diffusion

    Authors: Xueyan Niu, Xu Wang, Deniz Gündüz, Bo Bai, Weichao Chen, Guohua Zhou

    Abstract: We propose a hybrid joint source-channel coding (JSCC) scheme, in which the conventional digital communication scheme is complemented with a generative refinement component to improve the perceptual quality of the reconstruction. The input image is decomposed into two components: the first is a coarse compressed version, and is transmitted following the conventional separation-based approach. An a…

    Submitted 16 August, 2023; originally announced August 2023.

  39. arXiv:2308.07770 [pdf, other]

    cs.CV

    Multi-scale Promoted Self-adjusting Correlation Learning for Facial Action Unit Detection

    Authors: Xin Liu, Kaishen Yuan, Xuesong Niu, Jingang Shi, Zitong Yu, Huanjing Yue, Jingyu Yang

    Abstract: Facial Action Unit (AU) detection is a crucial task in affective computing and social robotics as it helps to identify emotions expressed through facial expressions. Anatomically, there are innumerable correlations between AUs, which contain rich information and are vital for AU detection. Previous methods used fixed AU correlations based on expert experience or statistical rules on specific bench…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 13 pages, 7 figures

  40. arXiv:2308.00183 [pdf, other]

    cs.RO eess.SY

    Hovering Control of Flapping Wings in Tandem with Multi-Rotors

    Authors: Aniket Dhole, Bibek Gupta, Adarsh Salagame, Xuejian Niu, Yizhe Xu, Kaushik Venkatesh, Paul Ghanem, Ioannis Mandralis, Eric Sihite, Alireza Ramezani

    Abstract: This work briefly covers our efforts to stabilize the flight dynamics of Northeastern's tailless bat-inspired micro aerial vehicle, Aerobat. Flapping robots are not new. A plethora of examples is mainly dominated by insect-style design paradigms that are passively stable. However, Aerobat, in addition to being tailless, possesses morphing wings that add to the inherent complexity of flight contro…

    Submitted 31 July, 2023; originally announced August 2023.

  41. arXiv:2307.11196 [pdf, other]

    cs.SI math.PR math.ST

    Exact Community Recovery in the Geometric SBM

    Authors: Julia Gaudio, Xiaochun Niu, Ermin Wei

    Abstract: We study the problem of exact community recovery in the Geometric Stochastic Block Model (GSBM), where each vertex has an unknown community label as well as a known position, generated according to a Poisson point process in $\mathbb{R}^d$. Edges are formed independently conditioned on the community labels and positions, where vertices may only be connected by an edge if they are within a prescrib…

    Submitted 5 January, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

  42. arXiv:2307.10554 [pdf, other]

    cs.CV cs.AI

    EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization

    Authors: Peijie Dong, Lujun Li, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan

    Abstract: Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improved search efficiency. However, the correlation between…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV2023

  43. arXiv:2307.06632  [pdf]

    cs.RO

    FF-LINS: A Consistent Frame-to-Frame Solid-State-LiDAR-Inertial State Estimator

    Authors: Hailiang Tang, Tisheng Zhang, Xiaoji Niu, Liqiang Wang, Linfu Wei, Jingnan Liu

    Abstract: Most of the existing LiDAR-inertial navigation systems are based on frame-to-map registrations, leading to inconsistency in state estimation. The newest solid-state LiDAR with a non-repetitive scanning pattern makes it possible to achieve a consistent LiDAR-inertial estimator by employing a frame-to-frame data association. In this letter, we propose a robust and consistent frame-to-frame LiDAR-ine…

    Submitted 13 July, 2023; originally announced July 2023.

  44. arXiv:2306.03824  [pdf, other]

    cs.LG

    Understanding Generalization of Federated Learning via Stability: Heterogeneity Matters

    Authors: Zhenyu Sun, Xiaochun Niu, Ermin Wei

    Abstract: Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications. Good generalization indicates that the model can predict unseen data correctly when trained on a limited amount of data. Federated learning (FL), which has emerged as a popular distributed learning framework, allows multiple devices or clients to train a shared model without viol…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Submitted to NeurIPS 2023

  45. arXiv:2306.02568  [pdf, other]

    stat.ML cs.LG

    Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

    Authors: Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

    Abstract: We propose the stochastic optimal path which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all…

    Submitted 25 June, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted by ICML 2024

  46. RAMP: Retrieval and Attribute-Marking Enhanced Prompting for Attribute-Controlled Translation

    Authors: Gabriele Sarti, Phu Mon Htut, Xing Niu, Benjamin Hsu, Anna Currey, Georgiana Dinu, Maria Nadejde

    Abstract: Attribute-controlled translation (ACT) is a subtask of machine translation that involves controlling stylistic or linguistic attributes (like formality and gender) of translation outputs. While ACT has garnered attention in recent years due to its usefulness in real-world applications, progress in the task is currently limited by dataset availability, since most prior approaches rely on supervised…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

    Journal ref: Proceedings of ACL (2023) 1476-1490

  47. arXiv:2305.13547  [pdf, other]

    cs.CL cs.NI

    Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

    Authors: Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, Dacheng Tao

    Abstract: Text classification tasks often encounter few-shot scenarios with limited labeled data, and addressing data scarcity is crucial. Data augmentation with mixup has been shown to be effective on various text classification tasks. However, most mixup methods do not consider the varying degree of learning difficulty in different stages of training and generate new samples with one-hot labels, resulti…

    Submitted 27 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  48. arXiv:2305.12644  [pdf]

    cs.RO

    PO-VINS: An Efficient Pose-Only LiDAR-Enhanced Visual-Inertial State Estimator

    Authors: Hailiang Tang, Xiaoji Niu, Tisheng Zhang, Liqiang Wang, Guan Wang, Jingnan Liu

    Abstract: The pose-only (PO) visual representation has been proven to be equivalent to the classical multiple-view geometry, while significantly improving computational efficiency. However, its applicability for real-world navigation in large-scale complex environments has not yet been demonstrated. In this study, we present an efficient pose-only LiDAR-enhanced visual-inertial navigation system (PO-VINS) t…

    Submitted 21 May, 2023; originally announced May 2023.

  49. arXiv:2305.11808  [pdf, other]

    cs.CL

    Pseudo-Label Training and Model Inertia in Neural Machine Translation

    Authors: Benjamin Hsu, Anna Currey, Xing Niu, Maria Nădejde, Georgiana Dinu

    Abstract: Like many other machine learning applications, neural machine translation (NMT) benefits from over-parameterized deep neural models. However, these models have been observed to be brittle: NMT model predictions are sensitive to small input changes and can show significant variation across re-training or incremental model updates. This work studies a frequently used method in NMT, pseudo-label trai…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted at ICLR 2023

  50. arXiv:2305.10029  [pdf, other]

    cs.CV

    TextSLAM: Visual SLAM with Semantic Planar Text Features

    Authors: Boying Li, Danping Zou, Yuan Huang, Xinghan Niu, Ling Pei, Wenxian Yu

    Abstract: We propose a novel visual SLAM method that integrates text objects tightly by treating them as semantic features and fully exploring their geometric and semantic priors. The text object is modeled as a texture-rich planar patch whose semantic meaning is extracted and updated on the fly for better data association. With the full exploration of locally planar characteristics and semantic meaning of t…

    Submitted 3 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: 19 pages, 23 figures. Whole project page: https://leeby68.github.io/TextSLAM/