Showing 1–50 of 481 results for author: Wong, K

  1. arXiv:2407.11389  [pdf, ps, other]

    cs.NI eess.SP

    Spatial-spectral Cell-free Networks: A Large-scale Case Study

    Authors: Zesheng Zhu, Lifeng Wang, Xin Wang, Dongming Wang, Kai-Kit Wong

    Abstract: This paper studies the large-scale cell-free networks where dense distributed access points (APs) serve many users. As a promising next-generation network architecture, cell-free networks enable ultra-reliable connections and minimal fading/blockage, which are much favorable to the millimeter wave and Terahertz transmissions. However, conventional beam management with large phased arrays in a cell…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11078  [pdf, other]

    cs.LG cs.AI cs.CV

    Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator

    Authors: Thinh Nguyen, Khoa D Doan, Binh T. Nguyen, Danh Le-Phuoc, Kok-Seng Wong

    Abstract: Federated Class-Incremental Learning (FCIL) increasingly becomes important in the decentralized setting, where it enables multiple participants to collaboratively train a global model to perform well on a sequence of tasks without sharing their private data. In FCIL, conventional Federated Learning algorithms such as FedAVG often suffer from catastrophic forgetting, resulting in significant perfor…

    Submitted 13 July, 2024; originally announced July 2024.

    MSC Class: 68T07 (Primary); 68T45 (Secondary)

  3. arXiv:2407.10825  [pdf, other]

    cs.LG cs.CR cs.CV

    Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks

    Authors: Quang H. Nguyen, Nguyen Ngoc-Hieu, The-Anh Ta, Thanh Nguyen-Tang, Kok-Seng Wong, Hoang Thanh-Tung, Khoa D. Doan

    Abstract: Deep neural networks are vulnerable to backdoor attacks, a type of adversarial attack that poisons the training data to manipulate the behavior of models trained on such data. Clean-label attacks are a more stealthy form of backdoor attacks that can perform the attack without changing the labels of poisoned data. Early works on clean-label attacks added triggers to a random subset of the training…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.10548  [pdf, other]

    cs.IT

    Fluid Antenna Multiple Access Assisted Integrated Data and Energy Transfer: Outage and Multiplexing Gain Analysis

    Authors: Xiao Lin, Yizhe Zhao, Halvin Yang, Jie Hu, Kai-Kit Wong

    Abstract: Fluid antenna multiple access (FAMA) exploits the spatial opportunities in wireless channels to overcome multiuser interference by position (a.k.a. port) switching, which can achieve better performance compared to traditional fixed multiple-input multiple-output (MIMO) systems. Additionally, integrated data and energy transfer (IDET) is capable of providing both wireless data transfer (WDT) and wi…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  5. arXiv:2407.07917  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Non-Cooperative Backdoor Attacks in Federated Learning: A New Threat Landscape

    Authors: Tuan Nguyen, Dung Thuy Nguyen, Khoa D Doan, Kok-Seng Wong

    Abstract: Despite the promise of Federated Learning (FL) for privacy-preserving model training on distributed data, it remains susceptible to backdoor attacks. These attacks manipulate models by embedding triggers (specific input patterns) in the training data, forcing misclassification as predefined classes during deployment. Traditional single-trigger attacks and recent work on cooperative multiple-trigge…

    Submitted 5 July, 2024; originally announced July 2024.

  6. arXiv:2407.07077  [pdf, other]

    cs.CV cs.AI

    ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction

    Authors: Shaozhe Hao, Kai Han, Zhengyao Lv, Shihao Zhao, Kwan-Yee K. Wong

    Abstract: While personalized text-to-image generation has enabled the learning of a single concept from multiple images, a more practical yet challenging scenario involves learning multiple concepts within a single image. However, existing works tackling this scenario heavily rely on extensive human annotations. In this paper, we introduce a novel task named Unsupervised Concept Extraction (UCE) that consid…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project page: https://haoosz.github.io/ConceptExpress/

  7. arXiv:2407.05890  [pdf, other]

    cs.RO cs.CL

    Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation

    Authors: Jiaqi Chen, Bingqian Lin, Xinmin Liu, Xiaodan Liang, Kwan-Yee K. Wong

    Abstract: LLM-based agents have demonstrated impressive zero-shot performance in the vision-language navigation (VLN) task. However, these zero-shot methods focus only on solving high-level task planning by selecting nodes in predefined navigation graphs for movements, overlooking low-level control in realistic navigation scenarios. To bridge this gap, we propose AO-Planner, a novel affordances-oriented pla…

    Submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2407.04231  [pdf, other]

    cs.CV

    Efficient GANs for Document Image Binarization Based on DWT and Normalization

    Authors: Rui-Yang Ju, KokSheik Wong, Jen-Shiun Chiang

    Abstract: For document image binarization task, generative adversarial networks (GANs) can generate images where shadows and noise are effectively removed, which allow for text information extraction. The current state-of-the-art (SOTA) method proposes a three-stage network architecture that utilizes six GANs. Despite its excellent model performance, the SOTA network architecture requires long training and…

    Submitted 4 July, 2024; originally announced July 2024.

  9. arXiv:2407.03144  [pdf, other]

    cs.CV

    Venomancer: Towards Imperceptible and Target-on-Demand Backdoor Attacks in Federated Learning

    Authors: Son Nguyen, Thinh Nguyen, Khoa D Doan, Kok-Seng Wong

    Abstract: Federated Learning (FL) is a distributed machine learning approach that maintains data privacy by training on decentralized data sources. Similar to centralized machine learning, FL is also susceptible to backdoor attacks, where an attacker can compromise some clients by injecting a backdoor trigger into local models of those clients, leading to the global model's behavior being manipulated as des…

    Submitted 11 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  10. arXiv:2407.02442  [pdf, other]

    cs.IT

    A New Achievable Region of the $K$-User MAC Wiretap Channel with Confidential and Open Messages Under Strong Secrecy

    Authors: Hao Xu, Kai-Kit Wong, Giuseppe Caire

    Abstract: This paper investigates the achievable region of a $K$-user discrete memoryless (DM) multiple access wiretap (MAC-WT) channel, where each user transmits both secret and open messages. All these messages are intended for Bob, while Eve is only interested in the secret messages. In the achievable coding strategy, the confidential information is protected by open messages and also by the introduction…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 61 pages, 15 figures. arXiv admin note: text overlap with arXiv:2209.05403

  11. Coding-Enhanced Cooperative Jamming for Secret Communication in Fluid Antenna Systems

    Authors: Hao Xu, Kai-Kit Wong, Wee Kiat New, Guyue Li, Farshad Rostami Ghadi, Yongxu Zhu, Shi Jin, Chan-Byoung Chae, Yangyang Zhang

    Abstract: This letter investigates the secret communication problem for a fluid antenna system (FAS)-assisted wiretap channel, where the legitimate transmitter transmits an information-bearing signal to the legitimate receiver, and at the same time, transmits a jamming signal to interfere with the eavesdropper (Eve). Unlike the conventional jamming scheme, which usually transmits Gaussian noise that interfe…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, this paper has been accepted by IEEE Communications Letters

  12. arXiv:2406.16144  [pdf, other]

    cs.CL

    Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step

    Authors: Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Current research found the issue of Early Answering in large language models (LLMs), where the models already have an answer before generating the Chain-of-Thought (CoT). This phenomenon suggests a potential lack of necessary dependency between the predicted answer and the reasoning process. Consequently, two important questions arise: (1) Is CoT still necessary if the model already has an answer?…

    Submitted 23 June, 2024; originally announced June 2024.

  13. arXiv:2406.14867  [pdf, other]

    cs.LG cs.AI cs.CL

    DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

    Authors: Kyle Wong, Alfonso Amayuelas, Liangming Pan, William Yang Wang

    Abstract: Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by rationalizing about errors and generating a new program. However, code repair is primarily studied on high-resource languages like Python, and the framework's efficacy is under-explored on low…

    Submitted 21 June, 2024; originally announced June 2024.

  14. arXiv:2406.14863  [pdf, other]

    cs.CR cs.AR

    Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks

    Authors: Ning Lin, Shaocong Wang, Yue Zhang, Yangu He, Kwunhang Wong, Arindam Basu, Dashan Shang, Xiaoming Chen, Zhongrui Wang

    Abstract: Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference 2024

  15. arXiv:2406.11258  [pdf, other]

    cs.CL

    Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization

    Authors: Minda Hu, Licheng Zong, Hongru Wang, Jingyan Zhou, Jingjing Li, Yichen Gao, Kam-Fai Wong, Yu Li, Irwin King

    Abstract: Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG). However, existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries, resulting in sub-optimal performance. To address these limitations, we propose a novel plug-and-play LL…

    Submitted 17 June, 2024; originally announced June 2024.

  16. arXiv:2406.10729  [pdf, other]

    cs.LG cs.AI cs.CV

    A Comprehensive Survey of Foundation Models in Medicine

    Authors: Wasif Khan, Seowung Leem, Kyle B. See, Joshua K. Wong, Shaoting Zhang, Ruogu Fang

    Abstract: Foundation models (FMs) are large-scale deep-learning models trained on extensive datasets using self-supervised techniques. These models serve as a base for various downstream tasks, including healthcare. FMs have been adopted with great success across various domains within healthcare, including natural language processing (NLP), computer vision, graph learning, biology, and omics. Existing heal…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 44 pages, and a more compact version is under review

  17. arXiv:2406.09779  [pdf, other]

    cs.AI cs.CL cs.CV

    OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

    Authors: Jingtao Cao, Zheng Zhang, Hongru Wang, Bin Liang, Hao Wang, Kam-Fai Wong

    Abstract: Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Langu…

    Submitted 14 June, 2024; originally announced June 2024.

  18. arXiv:2406.04253  [pdf, other]

    cs.CV

    A Survey on 3D Human Avatar Modeling -- From Reconstruction to Generation

    Authors: Ruihe Wang, Yukang Cao, Kai Han, Kwan-Yee K. Wong

    Abstract: 3D modeling has long been an important area in computer vision and computer graphics. Recently, thanks to the breakthroughs in neural representations and generative models, we witnessed a rapid development of 3D modeling. 3D human modeling, lying at the core of many real-world applications, such as gaming and animation, has attracted significant attention. Over the past few years, a large body of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 30 pages, 21 figures

  19. arXiv:2406.03098  [pdf, ps, other]

    cs.IT eess.SP

    A Data and Model-Driven Deep Learning Approach to Robust Downlink Beamforming Optimization

    Authors: Kai Liang, Gan Zheng, Zan Li, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: This paper investigates the optimization of the long-standing probabilistically robust transmit beamforming problem with channel uncertainties in the multiuser multiple-input single-output (MISO) downlink transmission. This problem poses significant analytical and computational challenges. Currently, the state-of-the-art optimization method relies on convex restrictions as tractable approximations…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in the IEEE Journal on Selected Areas in Communications, Special Issue on Advanced Optimization Theory and Algorithms for Next Generation Wireless Communication Networks

  20. arXiv:2405.20431  [pdf, other]

    cs.LG cs.CV

    Exploring the Practicality of Federated Learning: A Survey Towards the Communication Perspective

    Authors: Khiem Le, Nhan Luong-Ha, Manh Nguyen-Duc, Danh Le-Phuoc, Cuong Do, Kok-Seng Wong

    Abstract: Federated Learning (FL) is a promising paradigm that offers significant advancements in privacy-preserving, decentralized machine learning by enabling collaborative training of models across distributed devices without centralizing data. However, the practical deployment of FL systems faces a significant bottleneck: the communication overhead caused by frequently exchanging large model updates bet…

    Submitted 30 May, 2024; originally announced May 2024.

  21. arXiv:2405.12386  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO

    Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression

    Authors: Sisi Shao, Junhyung Park, Weng Kee Wong

    Abstract: General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO), as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, it can also produce results that…

    Submitted 20 May, 2024; originally announced May 2024.

  22. arXiv:2405.11520  [pdf, other]

    cs.IT eess.SP

    On Performance of FAS-aided Wireless Powered NOMA Communication Systems

    Authors: Farshad Rostami Ghadi, Masoud Kaveh, Kai-Kit Wong, Riku Jantti, Zheng Yan

    Abstract: This paper studies the performance of a wireless powered communication network (WPCN) under the non-orthogonal multiple access (NOMA) scheme, where users take advantage of an emerging fluid antenna system (FAS). More precisely, we consider a scenario where a transmitter is powered by a remote power beacon (PB) to send information to the planar NOMA FAS-equipped users through Rayleigh fading channe…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: This manuscript has been submitted to the 20th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)

  23. Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner

    Authors: Xubin Wang, Yunhe Wang, Zhiqing Ma, Ka-Chun Wong, Xiangtao Li

    Abstract: Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search a…

    Submitted 6 April, 2024; originally announced April 2024.

  24. arXiv:2404.00032  [pdf, other]

    cs.HC cs.CV eess.IV

    Deployment of Deep Learning Model in Real World Clinical Setting: A Case Study in Obstetric Ultrasound

    Authors: Chun Kit Wong, Mary Ngo, Manxi Lin, Zahra Bashir, Amihai Heen, Morten Bo Søndergaard Svendsen, Martin Grønnebæk Tolsgaard, Anders Nymark Christensen, Aasa Feragen

    Abstract: Despite the rapid development of AI models in medical image analysis, their validation in real-world clinical settings remains limited. To address this, we introduce a generic framework designed for deploying image-based AI models in such settings. Using this framework, we deployed a trained model for fetal ultrasound standard plane detection, and evaluated it in real-time sessions with both novic…

    Submitted 22 March, 2024; originally announced April 2024.

    Comments: 10 pages

  25. arXiv:2404.00018  [pdf, other]

    cs.HC cs.AI cs.SI

    Can AI Outperform Human Experts in Creating Social Media Creatives?

    Authors: Eunkyung Park, Raymond K. Wong, Junbum Kwon

    Abstract: Artificial Intelligence has outperformed human experts in functional tasks such as chess and baduk. How about creative tasks? This paper evaluates AI's capability in the creative domain compared to human experts, which little research has been conducted so far. We propose a novel Prompt-for-Prompt to generate social media creatives via prompt augmentation by Large Language Models. We take the most…

    Submitted 19 March, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures

    MSC Class: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

  26. arXiv:2403.17837  [pdf, other]

    cs.CV cs.GR cs.LG cs.MM eess.IV

    GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction

    Authors: Hrishav Bakul Barua, Kalin Stefanov, KokSheik Wong, Abhinav Dhall, Ganesh Krishnasamy

    Abstract: High Dynamic Range (HDR) content (i.e., images and videos) has a broad range of applications. However, capturing HDR content from real-world scenes is expensive and time-consuming. Therefore, the challenging task of reconstructing visually accurate HDR images from their Low Dynamic Range (LDR) counterparts is gaining attention in the vision research community. A major challenge in this research pr…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE

    MSC Class: Artificial intelligence; Computer vision; Machine learning; Deep learning

    ACM Class: I.3.3; I.4.5

  27. arXiv:2403.17265  [pdf, other]

    cs.IT eess.SP

    Cache-Enabled Millimetre-Wave Fluid Antenna Systems: Modeling and Performance

    Authors: Farshad Rostami Ghadi, Kai-Kit Wong, Kin-Fai Tong, Yangyang Zhang

    Abstract: This letter investigates the performance of content caching in a heterogeneous cellular network (HetNet) consisting of fluid antenna system (FAS)-equipped mobile users (MUs) and millimeter-wave (mm-wave) single-antenna small base stations (SBSs), distributed according to the independent homogeneous Poisson point processes (HPPP). In particular, it is assumed that the most popular contents are cach…

    Submitted 25 March, 2024; originally announced March 2024.

  28. arXiv:2403.16516  [pdf, other]

    cs.CL cs.CV

    Visually Guided Generative Text-Layout Pre-training for Document Intelligence

    Authors: Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Prior study shows that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to gain abilities to perceive and reason both document texts and layouts (e.g., locations of texts and table-cells). To this end, we propose visually guided generative text-layout pre-training, named ViTLP. Given a document image, the model optimizes hier…

    Submitted 27 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 main conference. The first version of this paper was submitted to OpenReview (https://openreview.net/forum?id=ARtBIBAmNR) in June 2023

  29. arXiv:2403.15605  [pdf, other]

    cs.CV cs.LG

    Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization

    Authors: Khiem Le, Long Ho, Cuong Do, Danh Le-Phuoc, Kok-Seng Wong

    Abstract: Domain shift is a formidable issue in Machine Learning that causes a model to suffer from performance degradation when tested on unseen domains. Federated Domain Generalization (FedDG) attempts to train a global model using collaborative clients in a privacy-preserving manner that can generalize well to unseen clients possibly with domain shift. However, most existing FedDG methods either cause ad…

    Submitted 22 March, 2024; originally announced March 2024.

  30. arXiv:2403.14896  [pdf, other]

    cs.CY

    Investigating Bias in LLM-Based Bias Detection: Disparities between LLMs and Human Perception

    Authors: Luyang Lin, Lingzhi Wang, Jinsong Guo, Kam-Fai Wong

    Abstract: The pervasive spread of misinformation and disinformation in social media underscores the critical importance of detecting media bias. While robust Large Language Models (LLMs) have emerged as foundational tools for bias prediction, concerns about inherent biases within these models persist. In this work, we investigate the presence and nature of bias within LLMs and its consequential impact on me…

    Submitted 21 March, 2024; originally announced March 2024.

  31. arXiv:2403.13446  [pdf, other]

    cs.CY

    IndiTag: An Online Media Bias Analysis and Annotation System Using Fine-Grained Bias Indicators

    Authors: Luyang Lin, Lingzhi Wang, Jinsong Guo, Jing Li, Kam-Fai Wong

    Abstract: In the age of information overload and polarized discourse, understanding media bias has become imperative for informed decision-making and fostering a balanced public discourse. This paper presents IndiTag, an innovative online media bias analysis and annotation system that leverages fine-grained bias indicators to dissect and annotate bias in digital content. IndiTag offers a novel approach by i…

    Submitted 20 March, 2024; originally announced March 2024.

  32. arXiv:2403.12035  [pdf, other]

    cs.CV

    CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

    Authors: Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang

    Abstract: Recent advancements in video generation have been remarkable, yet many existing methods struggle with issues of consistency and poor text-video alignment. Moreover, the field lacks effective techniques for text-guided video inpainting, a stark contrast to the well-explored domain of text-guided image inpainting. To this end, this paper proposes a novel text-guided video inpainting model that achie…

    Submitted 18 March, 2024; originally announced March 2024.

  33. arXiv:2403.08648  [pdf, other]

    cs.IT eess.SP

    Meta Reinforcement Learning for Resource Allocation in Aerial Active-RIS-assisted Networks with Rate-Splitting Multiple Access

    Authors: Sajad Faramarzi, Sepideh Javadi, Farshad Zeinali, Hosein Zarini, Mohammad Robat Mili, Mehdi Bennis, Yonghui Li, Kai-Kit Wong

    Abstract: Mounting a reconfigurable intelligent surface (RIS) on an unmanned aerial vehicle (UAV) holds promise for improving traditional terrestrial network performance. Unlike conventional methods deploying passive RIS on UAVs, this study delves into the efficacy of an aerial active RIS (AARIS). Specifically, the downlink transmission of an AARIS network is investigated, where the base station (BS) levera…

    Submitted 13 March, 2024; originally announced March 2024.

  34. arXiv:2403.07860  [pdf, other]

    cs.CV

    Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

    Authors: Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong

    Abstract: Text-to-image generation has made significant advancements with the introduction of text-to-image diffusion models. These models typically consist of a language model that interprets user prompts and a vision model that generates corresponding images. As language and vision models continue to progress in their respective domains, there is a great potential in exploring the replacement of component…

    Submitted 12 March, 2024; originally announced March 2024.

  35. arXiv:2403.05428  [pdf, other]

    cs.MM

    Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

    Authors: Bingbing Wang, Bin Liang, Chun-Mei Feng, Wangmeng Zuo, Zhixin Bai, Shijue Huang, Kam-Fai Wong, Xi Zeng, Ruifeng Xu

    Abstract: In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations based on the context, necessitating the requirement for comprehensively understanding stickers and supporting multi-tagging. To address this challenge, we introduce StickerTAG, the first multi-tag sticker dataset comprising a collected tag set with 461 tags and 13,571 sticker-tag pairs, design…

    Submitted 16 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  36. arXiv:2403.05427  [pdf, other]

    cs.MM

    Reply with Sticker: New Dataset and Model for Sticker Retrieval

    Authors: Bin Liang, Bingbing Wang, Zhixin Bai, Qiwei Lang, Mingwei Sun, Kaiheng Hou, Kam-Fai Wong, Ruifeng Xu

    Abstract: Using stickers in online chatting is very prevalent on social media platforms, where the stickers used in the conversation can express someone's intention/emotion/attitude in a vivid, tactful, and intuitive way. Existing sticker retrieval research typically retrieves stickers based on context and the current utterance delivered by the user. That is, the stickers serve as a supplement to the curren…

    Submitted 8 March, 2024; originally announced March 2024.

  37. arXiv:2403.02756  [pdf, other]

    cs.CL

    Role Prompting Guided Domain Adaptation with General Capability Preserve for Large Language Models

    Authors: Rui Wang, Fei Mi, Yi Chen, Boyang Xue, Hongru Wang, Qi Zhu, Kam-Fai Wong, Ruifeng Xu

    Abstract: The growing interest in Large Language Models (LLMs) for specialized applications has revealed a significant challenge: when tailored to specific domains, LLMs tend to experience catastrophic forgetting, compromising their general capabilities and leading to a suboptimal user experience. Additionally, crafting a versatile model for multiple domains simultaneously often results in a decline in over…

    Submitted 5 March, 2024; originally announced March 2024.

  38. arXiv:2403.01852  [pdf, other]

    cs.CV

    PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

    Authors: Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo, Kwan-Yee K. Wong

    Abstract: Recent advancements in large-scale pre-trained text-to-image models have led to remarkable progress in semantic image synthesis. Nevertheless, synthesizing high-quality images with consistent semantics and layout remains a challenge. In this paper, we propose the adaPtive LAyout-semantiC fusion modulE (PLACE) that harnesses pre-trained models to alleviate the aforementioned issues. Specifically, w…

    Submitted 4 March, 2024; originally announced March 2024.

  39. arXiv:2402.19001  [pdf, other]

    cs.CV

    Analysis of the Two-Step Heterogeneous Transfer Learning for Laryngeal Blood Vessel Classification: Issue and Improvement

    Authors: Xinyi Fang, Xu Yang, Chak Fong Chong, Kei Long Wong, Yapeng Wang, Tiankui Zhang, Sio-Kei Im

    Abstract: Accurate classification of laryngeal vascular as benign or malignant is crucial for early detection of laryngeal cancer. However, organizations with limited access to laryngeal vascular images face challenges due to the lack of large and homogeneous public datasets for effective learning. Distinguished from the most familiar works, which directly transfer the ImageNet pre-trained models to the tar…

    Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  40. arXiv:2402.18683  [pdf, other]

    cs.IT eess.SP

    Integrated Sensing and Communication Meets Smart Propagation Engineering: Opportunities and Challenges

    Authors: Kaitao Meng, Christos Masouros, Kai-Kit Wong, Athina P. Petropulu, Lajos Hanzo

    Abstract: Both smart propagation engineering as well as integrated sensing and communication (ISAC) constitute promising candidates for next-generation (NG) mobile networks. We provide a synergistic view of these technologies, and explore their mutual benefits. First, moving beyond just intelligent surfaces, we provide a holistic view of the engineering aspects of smart propagation environments. By delving…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 5 figures, submitted to IEEE journal for possible publication

  41. arXiv:2402.17502  [pdf, other]

    cs.CV eess.IV

    FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation

    Authors: Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  42. arXiv:2402.16288  [pdf, other]

    cs.CL cs.AI cs.IR

    PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering

    Authors: Yiming Du, Hongru Wang, Zhengyi Zhao, Bin Liang, Baojun Wang, Wanjun Zhong, Zezhong Wang, Kam-Fai Wong

    Abstract: Long-term memory plays a critical role in personal interaction, considering long-term memory can better leverage world knowledge, historical information, and preferences in dialogues. Our research introduces PerLTQA, an innovative QA dataset that combines semantic and episodic memories, including world knowledge, profiles, social relationships, events, and dialogues. This dataset is collected to i…

    Submitted 25 February, 2024; originally announced February 2024.

  43. arXiv:2402.16261  [pdf, other]

    cs.CL cs.IR

    UniRetriever: Multi-task Candidates Selection for Various Context-Adaptive Conversational Retrieval

    Authors: Hongru Wang, Boyang Xue, Baohang Zhou, Rui Wang, Fei Mi, Weichao Wang, Yasheng Wang, Kam-Fai Wong

    Abstract: Conversational retrieval refers to an information retrieval system that operates in an iterative and interactive manner, requiring the retrieval of various external resources, such as persona, knowledge, and even response, to effectively engage with the user and successfully complete the dialogue. However, most previous work trained independent retrievers for each specific resource, resulting in s…

    Submitted 28 February, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  44. arXiv:2402.16116  [pdf, other]

    cs.IT eess.SP

    On Performance of RIS-Aided Fluid Antenna Systems

    Authors: Farshad Rostami Ghadi, Kai-Kit Wong, Wee Kiat New, Hao Xu, Ross Murch, Yangyang Zhang

    Abstract: This letter studies the performance of reconfigurable intelligent surface (RIS)-aided communications for a fluid antenna system (FAS) enabled receiver. Specifically, a fixed single-antenna base station (BS) transmits information through a RIS to a mobile user (MU) which is equipped with a planar fluid antenna in the absence of a direct link. We first analyze the spatial correlation structures among…

    Submitted 25 February, 2024; originally announced February 2024.

  45. arXiv:2402.15006  [pdf]

    cs.CR cs.LG

    opp/ai: Optimistic Privacy-Preserving AI on Blockchain

    Authors: Cathie So, KD Conway, Xiaohang Yu, Suning Yao, Kartin Wong

    Abstract: The convergence of Artificial Intelligence (AI) and blockchain technology is reshaping the digital world, offering decentralized, secure, and efficient AI services on blockchain platforms. Despite the promise, the high computational demands of AI on blockchain raise significant privacy and efficiency concerns. The Optimistic Privacy-Preserving AI (opp/ai) framework is introduced as a pioneering so…

    Submitted 22 February, 2024; originally announced February 2024.

  46. arXiv:2402.14298  [pdf, other]

    cs.CL

    Multi-modal Stance Detection: New Datasets and Model

    Authors: Bin Liang, Ang Li, Jingqian Zhao, Lin Gui, Min Yang, Yue Yu, Kam-Fai Wong, Ruifeng Xu

    Abstract: Stance detection is a challenging task that aims to identify public opinion from social media platforms with respect to specific targets. Previous work on stance detection largely focused on pure texts. In this paper, we study multi-modal stance detection for tweets consisting of texts and images, which are prevalent in today's fast-growing social media platforms where people often post multi-moda…

    Submitted 6 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: ACL'24 Findings

  47. arXiv:2402.14296  [pdf, other]

    cs.CL

    Mitigating Biases of Large Language Models in Stance Detection with Calibration

    Authors: Ang Li, Jingqian Zhao, Bin Liang, Lin Gui, Hui Wang, Xi Zeng, Xingwei Liang, Kam-Fai Wong, Ruifeng Xu

    Abstract: Large language models (LLMs) have achieved remarkable progress in many natural language processing tasks. However, our experiment reveals that, in stance detection tasks, LLMs may generate biased stances due to sentiment-stance spurious correlations and preference towards certain individuals and topics, thus harming their performance. Therefore, in this paper, we propose to Mitigate Biases of LLMs…

    Submitted 16 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  48. arXiv:2402.14228  [pdf, other]

    cs.LG cs.AI

    COPR: Continual Human Preference Learning via Optimal Policy Regularization

    Authors: Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, Yulan He, Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is commonly utilized to improve the alignment of Large Language Models (LLMs) with human preferences. Given the evolving nature of human preferences, continual alignment becomes more crucial and practical in comparison to traditional static alignment. Nevertheless, making RLHF compatible with Continual Learning (CL) is challenging due to its comple…

    Submitted 27 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  49. arXiv:2402.13606  [pdf, other]

    cs.CL

    A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models

    Authors: Boyang Xue, Hongru Wang, Rui Wang, Sheng Wang, Zezhong Wang, Yiming Du, Kam-Fai Wong

    Abstract: The tendency of Large Language Models (LLMs) to generate hallucinations and exhibit overconfidence in predictions raises concerns regarding their reliability. Confidence or uncertainty estimations indicating the extent of trustworthiness of a model's response are essential to developing reliable AI systems. Current research primarily focuses on LLM confidence estimations in English, remaining a vo…

    Submitted 16 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  50. arXiv:2402.13514  [pdf, other]

    cs.CL cs.AI

    Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions

    Authors: Hongru Wang, Boyang Xue, Baohang Zhou, Tianhua Zhang, Cunxiang Wang, Guanhua Chen, Huimin Wang, Kam-fai Wong

    Abstract: Retrieve-then-read and generate-then-read are two typical solutions to handle unknown and known questions in open-domain question-answering, while the former retrieves necessary external knowledge and the latter prompts the large language models to generate internal known knowledge encoded in the parameters. However, few previous works consider the compositional unknown questions, which consist o…

    Submitted 20 February, 2024; originally announced February 2024.