Showing 1–50 of 374 results for author: Vu, T

  1. arXiv:2407.10817  [pdf, other]

    cs.CL cs.AI cs.LG

    Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation

    Authors: Tu Vu, Kalpesh Krishna, Salaheddin Alzubi, Chris Tar, Manaal Faruqui, Yun-Hsuan Sung

    Abstract: As large language models (LLMs) advance, it becomes more challenging to reliably evaluate their output due to the high costs of human evaluation. To make progress towards better LLM autoraters, we introduce FLAMe, a family of Foundational Large Autorater Models. FLAMe is trained on our large and diverse collection of 100+ quality assessment tasks comprising 5M+ human judgments, curated and standar…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 31 pages, 5 figures, 7 tables

  2. arXiv:2407.03611  [pdf, other]

    cs.SE cs.AI

    An Empirical Study on Capability of Large Language Models in Understanding Code Semantics

    Authors: Thu-Trang Nguyen, Thanh Trong Vu, Hieu Dinh Vo, Son Nguyen

    Abstract: Large Language Models for Code (code LLMs) have demonstrated remarkable performance across various software engineering (SE) tasks, increasing the application of code LLMs in software development. Despite the success of code LLMs, there remain significant concerns about the actual capabilities and reliability of these models, "whether these models really learn the semantics of code from the traini…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.02937  [pdf, other]

    cs.CL cs.SD eess.AS

    Probing the Feasibility of Multilingual Speaker Anonymization

    Authors: Sarina Meyer, Florian Lux, Ngoc Thang Vu

    Abstract: In speaker anonymization, speech recordings are modified in a way that the identity of the speaker remains hidden. While this technology could help to protect the privacy of individuals around the globe, current research restricts this by focusing almost exclusively on English data. In this study, we extend a state-of-the-art anonymization system to nine languages by transforming language-dependen…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted at Interspeech 2024

  4. arXiv:2406.19415  [pdf, other]

    cs.CL

    An Analysis of Multilingual FActScore

    Authors: Kim Trong Vu, Michael Krumdick, Varshini Reddy, Franck Dernoncourt, Viet Dac Lai

    Abstract: FActScore has gained popularity as a metric to estimate the factuality of long-form texts generated by Large Language Models (LLMs) in English. However, there has not been any work in studying the behavior of FActScore in other languages. This paper studies the limitations of each component in the four-component pipeline of FActScore in the multilingual setting. We introduce a new dataset for FAct…

    Submitted 20 June, 2024; originally announced June 2024.

  5. Joint Optimization of Switching Point and Power Control in Dynamic TDD Cell-Free Massive MIMO

    Authors: Martin Andersson, Tung T. Vu, Pål Frenger, Erik G. Larsson

    Abstract: We consider a cell-free massive multiple-input multiple-output (CFmMIMO) network operating in dynamic time division duplex (DTDD). The switching point between the uplink (UL) and downlink (DL) data transmission phases can be adapted dynamically to the instantaneous quality-of-service (QoS) requirements in order to improve energy efficiency (EE). To this end, we formulate a problem of optimizing th…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Presented at the Asilomar Conference on Signals, Systems, and Computers 2023

  6. arXiv:2406.12593  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval

    Authors: Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Yinwei Wei, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do

    Abstract: Differentiable Search Index (DSI) utilizes Pre-trained Language Models (PLMs) for efficient document retrieval without relying on external indexes. However, DSIs need full re-training to handle updates in dynamic corpora, causing significant computational inefficiencies. We introduce PromptDSI, a rehearsal-free, prompt-based approach for instance-wise incremental learning in document retrieval. Pr…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 21 pages

  7. arXiv:2406.10882  [pdf, other]

    cs.CL

    SCAR: Efficient Instruction-Tuning for Large Language Models via Style Consistency-Aware Response Ranking

    Authors: Zhuang Li, Yuncheng Hua, Thuy-Trang Vu, Haolan Zhan, Lizhen Qu, Gholamreza Haffari

    Abstract: Recent studies have shown that maintaining a consistent response style by human experts and enhancing data quality in training sets can significantly improve the performance of fine-tuned Large Language Models (LLMs) while reducing the number of training examples needed. However, the precise definition of style and the relationship between style, data quality, and LLM performance remains unclear.…

    Submitted 10 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 21 pages

  8. arXiv:2406.10880  [pdf, other]

    cs.CL

    Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR

    Authors: Minghan Wang, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari

    Abstract: Recent advancements in multimodal large language models (MLLMs) have made significant progress in integrating information across various modalities, yet real-world applications in educational and scientific domains remain challenging. This paper introduces the Multimodal Scientific ASR (MS-ASR) task, which focuses on transcribing scientific conference videos by leveraging visual information from s…

    Submitted 16 June, 2024; originally announced June 2024.

  9. arXiv:2406.08811  [pdf, other]

    cs.CL

    Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models

    Authors: Minghao Wu, Thuy-Trang Vu, Lizhen Qu, Gholamreza Haffari

    Abstract: Large language models (LLMs) are typically fine-tuned on diverse and extensive datasets sourced from various origins to develop a comprehensive range of skills, such as writing, reasoning, chatting, coding, and more. Each skill has unique characteristics, and these datasets are often heterogeneous and imbalanced, making the fine-tuning process highly challenging. Balancing the development of each…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Work in progress; 15 pages, 7 tables, 4 figures

  10. arXiv:2406.08113  [pdf, other]

    cs.CV cs.RO

    Valeo4Cast: A Modular Approach to End-to-End Forecasting

    Authors: Yihong Xu, Éloi Zablocki, Alexandre Boulch, Gilles Puy, Mickael Chen, Florent Bartoccioni, Nermin Samet, Oriane Siméoni, Spyros Gidaris, Tuan-Hung Vu, Andrei Bursuc, Eduardo Valle, Renaud Marlet, Matthieu Cord

    Abstract: Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect from sensor data (cameras or LiDARs) the position and past trajectories of the different elements of the scene and predict their future location. We depart from the curren…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Winning solution of the Argoverse 2 "Unified Detection, Tracking, and Forecasting" challenge, held at CVPR 2024 WAD

  11. arXiv:2406.06406  [pdf, other]

    cs.CL cs.SD eess.AS

    Controlling Emotion in Text-to-Speech with Natural Language Prompts

    Authors: Thomas Bott, Florian Lux, Ngoc Thang Vu

    Abstract: In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language. In this work, we propose a system conditioned on embeddings derived from an emotionally rich text that serves as prompt. Thereby, a joint representation of speaker and prompt embeddings is integrated at several points wi…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted at Interspeech 2024

  12. arXiv:2406.06403  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Meta Learning Text-to-Speech Synthesis in over 7000 Languages

    Authors: Florian Lux, Sarina Meyer, Lyonel Behringer, Frank Zalkow, Phat Do, Matt Coler, Emanuël A. P. Habets, Ngoc Thang Vu

    Abstract: In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech syn…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted at Interspeech 2024

  13. arXiv:2406.03820  [pdf, other]

    cs.NI cs.AI cs.CR cs.ET cs.LG

    A Survey on Intelligent Internet of Things: Applications, Security, Privacy, and Future Directions

    Authors: Ons Aouedi, Thai-Hoc Vu, Alessio Sacco, Dinh C. Nguyen, Kandaraj Piamrat, Guido Marchetto, Quoc-Viet Pham

    Abstract: The rapid advances in the Internet of Things (IoT) have promoted a revolution in communication technology and offered various customer services. Artificial intelligence (AI) techniques have been exploited to facilitate IoT operations and maximize their potential in modern application scenarios. In particular, the convergence of IoT and AI has led to a new networking paradigm called Intelligent IoT…

    Submitted 21 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: This work has been accepted by IEEE Communications Surveys & Tutorials

  14. arXiv:2406.01921  [pdf, other]

    cs.IT cs.ET cs.NI cs.PF math.NA

    A Novel Paradigm Shift for Next-Generation: Symbiotic Backscatter Rate-Splitting Multiple Access Systems

    Authors: Thai-Hoc Vu, Daniel Benevides da Costa, Bao Vo Nguyen Quoc, Sunghwan Kim

    Abstract: Next-generation wireless networks are projected to empower a broad range of Internet-of-things (IoT) applications and services with extreme data rates, posing new challenges in delivering large-scale connectivity at a low cost to current communication paradigms. Rate-splitting multiple access (RSMA) is one of the most spotlight nominees, conceived to address spectrum scarcity while reaching massiv…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by IEEE International Conference on Communications and Electronics 2024

  15. arXiv:2405.20024  [pdf, other]

    cs.NI cs.AI

    Applications of Generative AI (GAI) for Mobile and Wireless Networking: A Survey

    Authors: Thai-Hoc Vu, Senthil Kumar Jagatheesaperumal, Minh-Duong Nguyen, Nguyen Van Huynh, Sunghwan Kim, Quoc-Viet Pham

    Abstract: The success of Artificial Intelligence (AI) in multiple disciplines and vertical domains in recent years has promoted the evolution of mobile networking and the future Internet toward an AI-integrated Internet-of-Things (IoT) era. Nevertheless, most AI techniques rely on data generated by physical devices (e.g., mobile devices and network nodes) or specific applications (e.g., fitness trackers and…

    Submitted 30 May, 2024; originally announced May 2024.

  16. arXiv:2405.14169  [pdf, other]

    cs.CV

    Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography

    Authors: Nhat Chung, Sensen Gao, Tuan-Anh Vu, Jie Zhang, Aishan Liu, Yun Lin, Jin Song Dong, Qing Guo

    Abstract: Vision-Large-Language-Models (Vision-LLMs) are increasingly being integrated into autonomous driving (AD) systems due to their advanced visual-language reasoning capabilities, targeting the perception, prediction, planning, and control mechanisms. However, Vision-LLMs have demonstrated susceptibilities against various types of adversarial attacks, which would compromise their reliability and safet…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages, 5 tables, 5 figures, work in progress

  17. arXiv:2405.09335  [pdf, other]

    cs.CL

    Prompting-based Synthetic Data Generation for Few-Shot Question Answering

    Authors: Maximilian Schmidt, Andrea Bartezzaghi, Ngoc Thang Vu

    Abstract: Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: LREC-COLING 2024

  18. arXiv:2404.10922  [pdf, other]

    cs.CL cs.SD eess.AS

    Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

    Authors: Pavel Denisov, Ngoc Thang Vu

    Abstract: Recent advancements in language modeling have led to the emergence of Large Language Models (LLMs) capable of various natural language processing tasks. Despite their success in text-based tasks, applying LLMs to the speech domain remains limited and challenging. This paper presents BLOOMZMMS, a novel model that integrates a multilingual LLM with a multilingual speech encoder, aiming to harness th…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: NAACL Findings 2024

  19. arXiv:2404.10681  [pdf, other]

    cs.CV

    StyleCity: Large-Scale 3D Urban Scenes Stylization

    Authors: Yingshu Chen, Huajian Huang, Tuan-Anh Vu, Ka Chun Shum, Sai-Kit Yeung

    Abstract: Creating large-scale virtual urban scenes with variant styles is inherently challenging. To facilitate prototypes of virtual production and bypass the need for complex materials and lighting setups, we introduce the first vision-and-text-driven texture stylization system for large-scale urban scenes, StyleCity. Taking an image and text as references, StyleCity stylizes a 3D textured mesh of a larg…

    Submitted 16 July, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ECCV2024. Project page: https://chenyingshu.github.io/stylecity3d/

  20. Connectivity in Symmetric Semi-Algebraic Sets

    Authors: Cordian Riener, Robin Schabert, Thi Xuan Vu

    Abstract: A semi-algebraic set is a subset of real space defined by polynomial equations and inequalities. In this paper, we consider the problem of deciding whether two given points in a semi-algebraic set are connected. We restrict to the case when all equations and inequalities are invariant under the action of the symmetric group and their degrees are at most $d<n$, where $n$ is the number of variables. A…

    Submitted 12 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  21. arXiv:2404.08590  [pdf, other]

    cs.CV cs.AI

    Improving Referring Image Segmentation using Vision-Aware Text Features

    Authors: Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Referring image segmentation is a challenging task that involves generating pixel-wise segmentation masks based on natural language descriptions. Existing methods have relied mostly on visual features to generate the segmentation masks while treating text features as supporting components. This over-reliance on visual features can lead to suboptimal results, especially in complex scenarios where t…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 30 pages including supplementary

  22. arXiv:2403.19867  [pdf, ps, other]

    cs.DS cs.AI cs.LG

    Finding Decision Tree Splits in Streaming and Massively Parallel Models

    Authors: Huy Pham, Hoang Ta, Hoa T. Vu

    Abstract: In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split point $j$ that divides the data into two sets such that the mean squared error (for regression) or misclassification rate (for classification) is minimized. We provide various fa…

    Submitted 17 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  23. arXiv:2403.17647  [pdf, other]

    cs.CL

    Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering

    Authors: Pascal Tilli, Ngoc Thang Vu

    Abstract: The large success of deep learning based methods in Visual Question Answering (VQA) has concurrently increased the demand for explainable methods. Most methods in Explainable Artificial Intelligence (XAI) focus on generating post-hoc explanations rather than taking an intrinsic approach, the latter characterizing an interpretable model. In this work, we introduce an interpretable approach for grap…

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  24. arXiv:2403.17582  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards a Zero-Data, Controllable, Adaptive Dialog System

    Authors: Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu

    Abstract: Conversational Tree Search (Väth et al., 2023) is a recent approach to controllable dialog systems, where domain experts shape the behavior of a Reinforcement Learning agent through a dialog tree. The agent learns to efficiently navigate this tree, while adapting to information needs, e.g., domain familiarity, of different users. However, the need for additional training data hinders deployment in…

    Submitted 26 March, 2024; originally announced March 2024.

  25. arXiv:2403.05338  [pdf, other]

    cs.CL

    Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings

    Authors: Wei Zhou, Heike Adel, Hendrik Schuff, Ngoc Thang Vu

    Abstract: Attribution scores indicate the importance of different input parts and can, thus, explain model behaviour. Currently, prompt-based models are gaining popularity, i.a., due to their easier adaptability in low-resource settings. However, the quality of attribution scores extracted from prompt-based models has not been investigated yet. In this work, we address this topic by analyzing attribution sc…

    Submitted 8 March, 2024; originally announced March 2024.

  26. Cell-Free Massive MIMO with Multi-Antenna Users and Phase Misalignments: A Novel Partially Coherent Transmission Framework

    Authors: Unnikrishnan Kunnath Ganesan, Tung Thanh Vu, Erik G. Larsson

    Abstract: Cell-free massive multiple-input multiple-output (MIMO) is a promising technology for next-generation communication systems. This work proposes a novel partially coherent (PC) transmission framework to cope with the challenge of phase misalignment among the access points (APs), which is important for unlocking the full potential of cell-free massive MIMO technology. With the PC operation, the APs…

    Submitted 3 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 17 pages, 10 figures. Published in IEEE Open Journal of the Communications Society

  27. arXiv:2402.11199  [pdf, other]

    cs.CL

    Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

    Authors: Minh-Vuong Nguyen, Linhao Luo, Fatemeh Shiri, Dinh Phung, Yuan-Fang Li, Thuy-Trang Vu, Gholamreza Haffari

    Abstract: Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought (CoT) explanations alongside answers. However, previous research on evaluating LLMs has solely focused on answer accuracy, neglecting the correctness of the generated CoT. In this paper, we delve deeper into the CoT reasoning capabilities of LLMs in multi-hop question answering by utilizi…

    Submitted 19 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Minh-Vuong Nguyen and Linhao Luo are co-first authors and contributed equally to the preparation of this manuscript. Accepted to ACL24-Findings

  28. arXiv:2402.10552  [pdf, other]

    cs.CL

    Conversational SimulMT: Efficient Simultaneous Translation with Large Language Models

    Authors: Minghan Wang, Thuy-Trang Vu, Yuxia Wang, Ehsan Shareghi, Gholamreza Haffari

    Abstract: Simultaneous machine translation (SimulMT) presents a challenging trade-off between translation quality and latency. Recent studies have shown that LLMs can achieve good performance in SimulMT tasks. However, this often comes at the expense of high inference cost and latency. In this paper, we propose a conversational SimulMT framework to enhance the inference efficiency of LLM-based SimulMT throu…

    Submitted 21 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  29. arXiv:2402.09264  [pdf, other]

    cs.LG cs.HC

    UR2M: Uncertainty and Resource-Aware Event Detection on Microcontrollers

    Authors: Hong Jia, Young D. Kwon, Dong Ma, Nhat Pham, Lorena Qendro, Tam Vu, Cecilia Mascolo

    Abstract: Traditional machine learning techniques are prone to generating inaccurate predictions when confronted with shifts in the distribution of data between the training and testing phases. This vulnerability can lead to severe consequences, especially in applications such as mobile healthcare. Uncertainty estimation has the potential to mitigate this issue by assessing the reliability of a model's outp…

    Submitted 12 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  30. arXiv:2402.03805  [pdf, other]

    cs.SE

    Automated Description Generation for Software Patches

    Authors: Thanh Trong Vu, Tuan-Dung Bui, Thanh-Dat Do, Thu-Trang Nguyen, Hieu Dinh Vo, Son Nguyen

    Abstract: Software patches are pivotal in refining and evolving codebases, addressing bugs, vulnerabilities, and optimizations. Patch descriptions provide detailed accounts of changes, aiding comprehension and collaboration among developers. However, manual description creation poses challenges in terms of time consumption and variations in quality and detail. In this paper, we propose PATCHEXPLAINER, an ap…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Pre-print version of PATCHEXPLAINER

  31. arXiv:2402.01364  [pdf, other]

    cs.CL cs.LG

    Continual Learning for Large Language Models: A Survey

    Authors: Tongtong Wu, Linhao Luo, Yuan-Fang Li, Shirui Pan, Thuy-Trang Vu, Gholamreza Haffari

    Abstract: Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale. However, updates are necessary to endow LLMs with new skills and keep them up-to-date with rapidly evolving human knowledge. This paper surveys recent works on continual learning for LLMs. Due to the unique nature of LLMs, we catalog continual learning techniques in a…

    Submitted 7 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  32. arXiv:2401.06468  [pdf, other]

    cs.CL

    Adapting Large Language Models for Document-Level Machine Translation

    Authors: Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari

    Abstract: Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks. Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning. This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs. We first investigate the impact of prompt strategies on translation…

    Submitted 9 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: work in progress; 23 pages, 19 tables, 7 figures

  33. arXiv:2401.06071  [pdf, other]

    cs.CV cs.CL

    GroundingGPT: Language Enhanced Multi-modal Grounding Model

    Authors: Zhaowei Li, Qi Xu, Dong Zhang, Hang Song, Yiqing Cai, Qi Qi, Ran Zhou, Junting Pan, Zefeng Li, Van Tu Vu, Zhida Huang, Tao Wang

    Abstract: Multi-modal large language models have demonstrated impressive performance across various tasks in different modalities. However, existing multi-modal models primarily emphasize capturing global information within each modality while neglecting the importance of perceiving local information across modalities. Consequently, these models lack the ability to effectively understand the fine-grained de…

    Submitted 5 March, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  34. arXiv:2401.05425  [pdf, other]

    eess.SP cs.LG

    An Unobtrusive and Lightweight Ear-worn System for Continuous Epileptic Seizure Detection

    Authors: Abdul Aziz, Nhat Pham, Neel Vora, Cody Reynolds, Jaime Lehnen, Pooja Venkatesh, Zhuoran Yao, Jay Harvey, Tam Vu, Kan Ding, Phuc Nguyen

    Abstract: Epilepsy is one of the most common neurological diseases globally, affecting around 50 million people worldwide. Fortunately, up to 70 percent of people with epilepsy could live seizure-free if properly diagnosed and treated, and a reliable technique to monitor the onset of seizures could improve the quality of life of patients who are constantly facing the fear of random seizure attacks. The scal…

    Submitted 1 January, 2024; originally announced January 2024.

  35. arXiv:2401.02701  [pdf, ps, other]

    cs.IT eess.SP

    Joint User Association and Power Control for Cell-Free Massive MIMO

    Authors: Chongzheng Hao, Tung Thanh Vu, Hien Quoc Ngo, Minh N. Dao, Xiaoyu Dang, Chenghua Wang, Michail Matthaiou

    Abstract: This work proposes novel approaches that jointly design user equipment (UE) association and power control (PC) in a downlink user-centric cell-free massive multiple-input multiple-output (CFmMIMO) network, where each UE is only served by a set of access points (APs) for reducing the fronthaul signalling and computational complexity. In order to maximize the sum spectral efficiency (SE) of the UEs,…

    Submitted 20 May, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: minor revision of the previous version

  36. arXiv:2401.02147  [pdf, other]

    cs.CL cs.CV

    Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study

    Authors: Ziqiang Zheng, Yiwei Chen, Jipeng Zhang, Tuan-Anh Vu, Huimin Zeng, Yue Him Wong Tim, Sai-Kit Yeung

    Abstract: Large language models (LLMs) have demonstrated a powerful ability to answer various queries as a general-purpose assistant. The continuous multi-modal large language models (MLLM) empower LLMs with the ability to perceive visual signals. The launch of GPT-4 (Generative Pre-trained Transformers) has generated significant interest in the research communities. GPT-4V(ision) has demonstrated significan…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 51 pages, 36 figures, Repository: https://github.com/hkust-vgd/Marine_GPT-4V_Eval

  37. arXiv:2312.17505  [pdf, other]

    cs.CV cs.AI cs.CL

    Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation

    Authors: Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo, Binh-Son Hua, Nhat Minh Chung, Ivor W. Tsang, Sai-Kit Yeung

    Abstract: Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions. This indicates that there exists a strong correlation between the visual and textual domains. In addition, text-image discriminative models such as CLIP excel in image labelling from text prompts, thanks to the rich and diverse information available from open concepts. In t…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: This work is under review

  38. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  39. arXiv:2312.11127  [pdf, other]

    cs.IT eess.SP

    User-centric Flexible Resource Management Framework for LEO Satellites with Fully Regenerative Payload

    Authors: Sovit Bhandari, Thang X. Vu, Symeon Chatzinotas

    Abstract: The regenerative capabilities of next-generation satellite systems offer a novel approach to design low earth orbit (LEO) satellite communication systems, enabling full flexibility in bandwidth and spot beam management, power control, and onboard data processing. These advancements allow the implementation of intelligent spatial multiplexing techniques, addressing the ever-increasing demand for fu…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: To appear in IEEE JSAC

  40. arXiv:2312.09300  [pdf, other]

    cs.CL cs.AI cs.LG

    Self-Evaluation Improves Selective Generation in Large Language Models

    Authors: Jie Ren, Yao Zhao, Tu Vu, Peter J. Liu, Balaji Lakshminarayanan

    Abstract: Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely employed, recent research has demonstrated the limitations of using sequence-level probability estimates given by LLMs as reliable indicators of generation quali…

    Submitted 14 December, 2023; originally announced December 2023.

  41. arXiv:2312.09231  [pdf, other]

    cs.CV cs.LG

    Reliability in Semantic Segmentation: Can We Use Synthetic Data?

    Authors: Thibaut Loiseau, Tuan-Hung Vu, Mickael Chen, Patrick Pérez, Matthieu Cord

    Abstract: Assessing the reliability of perception models to covariate shifts and out-of-distribution (OOD) detection is crucial for safety-critical applications such as autonomous vehicles. By nature of the task, however, the relevant data is difficult to collect and annotate. In this paper, we challenge cutting-edge generative models to automatically synthesize data for assessing reliability in semantic se…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Project Page: https://valeoai.github.io/blog/publications/GenVal

  42. arXiv:2312.01284  [pdf, other]

    cs.CV

    Stable Messenger: Steganography for Message-Concealed Image Generation

    Authors: Quang Nguyen, Truong Vu, Cuong Pham, Anh Tran, Khoi Nguyen

    Abstract: In the ever-expanding digital landscape, safeguarding sensitive information remains paramount. This paper delves deep into digital protection, specifically focusing on steganography. While prior research predominantly fixated on individual bit decoding, we address this limitation by introducing "message accuracy", a novel metric evaluating the entirety of decoded messages for a more holistic eva…

    Submitted 3 December, 2023; originally announced December 2023.

  43. arXiv:2311.17922  [pdf, other]

    cs.CV

    A Simple Recipe for Language-guided Domain Generalized Segmentation

    Authors: Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette

    Abstract: Generalization to new domains not seen during training is one of the long-standing challenges in deploying neural networks in real-world applications. Existing generalization techniques either necessitate external images for augmentation, and/or aim at learning invariant representations by imposing various alignment constraints. Large-scale pretraining has recently shown promising generalization c…

    Submitted 2 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  44. arXiv:2311.14762  [pdf, other]

    cs.CV cs.AI

    The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024

    Authors: Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo, et al. (24 additional authors not shown)

    Abstract: The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obst…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Part of 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 IEEE Xplore submission as part of WACV 2024

  45. arXiv:2311.13152  [pdf, other]

    cs.CV

    Test-Time Augmentation for 3D Point Cloud Classification and Segmentation

    Authors: Tuan-Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Data augmentation is a powerful technique to enhance the performance of a deep learning task but has received less attention in 3D deep learning. It is well known that when 3D shapes are sparsely represented with low point density, the performance of the downstream tasks drops significantly. This work explores test-time augmentation (TTA) for 3D point clouds. We are inspired by the recent revoluti…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: This paper is accepted in 3DV 2024

  46. arXiv:2311.03853  [pdf, other]

    cs.NI

    On Deep Reinforcement Learning for Traffic Steering Intelligent ORAN

    Authors: Fatemeh Kavehmadavani, Van-Dinh Nguyen, Thang X. Vu, Symeon Chatzinotas

    Abstract: This paper aims to develop the intelligent traffic steering (TS) framework, which has recently been considered as one of the key developments of 3GPP for advanced 5G. Since achieving key performance indicators (KPIs) for heterogeneous services may not be possible in the monolithic architecture, a novel deep reinforcement learning (DRL)-based TS algorithm is proposed at the non-real-time (non-RT) R…

    Submitted 7 November, 2023; originally announced November 2023.

  47. Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

    Authors: Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu

    Abstract: Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intui…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Published at ISCA Interspeech 2023 https://www.isca-speech.org/archive/interspeech_2023/lux23_interspeech.html

  48. arXiv:2310.17499  [pdf, other]

    cs.CL cs.LG eess.AS

    The IMS Toucan System for the Blizzard Challenge 2023

    Authors: Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu

    Abstract: For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021. Our approach entails a rule-based text-to-phoneme processing system that includes rule-based disambiguation of homographs in the French language. It then transforms the phonemes to spectrograms as intermediate representations using a fast and efficient non-autoregressive synt…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Published at the Blizzard Challenge Workshop 2023, colocated with the Speech Synthesis Workshop 2023, a satellite event of Interspeech 2023

  49. arXiv:2310.17109  [pdf, other]

    cs.CV

    LP-OVOD: Open-Vocabulary Object Detection by Linear Probing

    Authors: Chau Pham, Truong Vu, Khoi Nguyen

    Abstract: This paper addresses the challenging problem of open-vocabulary object detection (OVOD) where an object detector must identify both seen and unseen classes in test images without labeled examples of the unseen classes in training. A typical approach for OVOD is to use joint text-image embeddings of CLIP to assign box proposals to their closest text label. However, this method has a critical issue:…

    Submitted 2 June, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

  50. arXiv:2310.15262  [pdf, other]

    cs.CL

    Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

    Authors: Injy Hamed, Nizar Habash, Ngoc Thang Vu

    Abstract: Code-switching (CSW) text generation has been receiving increasing attention as a solution to address data scarcity. In light of this growing interest, we need more comprehensive studies comparing different augmentation approaches. In this work, we compare three popular approaches: lexical replacements, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW.…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023