Search | arXiv e-print repository

doi 10.1109/TNET.2024.3423780

Federated PCA on Grassmann Manifold for IoT Anomaly Detection

Authors: Tung-Anh Nguyen, Long Tan Le, Tuan Dung Nguyen, Wei Bao, Suranga Seneviratne, Choong Seon Hong, Nguyen H. Tran

Abstract: With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they possess limitations such as the requirement for labeled data and challenges with high dimensionality. Recent unsupervised ML-IDS approaches such as AutoEncoders and Generative Adversarial Networks (GAN) offer alternative solutions but pose challenges in deployment onto resource-constrained IoT devices and in interpretability. To address these concerns, this paper proposes a novel federated unsupervised anomaly detection framework, FedPCA, that leverages Principal Component Analysis (PCA) and the Alternating Direction Method of Multipliers (ADMM) to learn common representations of distributed non-i.i.d. datasets. Building on the FedPCA framework, we propose two algorithms, FEDPE in Euclidean space and FEDPG on Grassmann manifolds. Our approach enables real-time threat detection and mitigation at the device level, enhancing network resilience while ensuring privacy. Moreover, the proposed algorithms are accompanied by theoretical convergence rates even under a subsampling scheme, a novel result. Experimental results on the UNSW-NB15 and TON-IoT datasets show that our proposed methods offer performance in anomaly detection comparable to nonlinear baselines, while providing significant improvements in communication and memory efficiency, underscoring their potential for securing IoT networks.
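
A minimal sketch of how a PCA subspace can score anomalies, to make the idea in the abstract above concrete. It is a centralized, single-device illustration under stated assumptions, not the FedPCA, FEDPE, or FEDPG algorithms: there is no ADMM, no federation, and no Grassmann-manifold update. The reconstruction-error score, the 99th-percentile threshold, the synthetic data, and the function names `fit_pca_basis` and `anomaly_scores` are all assumptions introduced here.

```python
import numpy as np


def fit_pca_basis(X_normal, k):
    """Return the feature mean and an orthonormal (d, k) basis of the top-k principal directions."""
    mean = X_normal.mean(axis=0)
    # Rows of vt are the right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    return mean, vt[:k].T


def anomaly_scores(X, mean, U):
    """Score each row by the norm of its residual after projection onto span(U)."""
    centered = X - mean
    residual = centered - centered @ U @ U.T
    return np.linalg.norm(residual, axis=1)


# Hypothetical data: "normal" samples lie close to a 2-D subspace of a 10-D feature space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 10))

mean, U = fit_pca_basis(normal, k=2)

# Assumed decision rule: flag anything above the 99th percentile of the training scores.
threshold = np.quantile(anomaly_scores(normal, mean, U), 0.99)

# Hypothetical anomalies: samples that do not follow the low-dimensional structure.
outliers = 3.0 * rng.normal(size=(20, 10))
flags = anomaly_scores(outliers, mean, U) > threshold
```

Only the `(d, k)` basis and the mean are needed at detection time, which is one reason a linear subspace model can be lighter to store and evaluate than a neural detector.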

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted for publication at IEEE/ACM Transactions on Networking

Journal ref: IEEE/ACM Transactions on Networking, pp. 1-16. Print ISSN: 1063-6692; Online ISSN: 1558-2566; Digital Object Identifier: 10.1109/TNET.2024.3423780

arXiv:2406.16937 [pdf, other]

A Complete Survey on LLM-based AI Chatbots

Authors: Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, Chaoning Zhang

Abstract: The past few decades have witnessed an upsurge in data, forming the foundation for data-hungry, learning-based AI technology. Conversational agents, often referred to as AI chatbots, rely heavily on such data to train large language models (LLMs) and generate new content (knowledge) in response to user prompts. With the advent of OpenAI's ChatGPT, LLM-based chatbots have set new standards in the AI community. This paper presents a complete survey of the evolution and deployment of LLM-based chatbots in various sectors. We first summarize the development of foundational chatbots, followed by the evolution of LLMs, and then provide an overview of LLM-based chatbots currently in use and those in the development phase. Recognizing AI chatbots as tools for generating new knowledge, we explore their diverse applications across various industries. We then discuss the open challenges, considering how the data used to train the LLMs and the misuse of the generated knowledge can cause several issues. Finally, we explore the future outlook to augment their efficiency and reliability in numerous applications. By addressing key milestones and the present-day context of LLM-based chatbots, our survey invites readers to delve deeper into this realm, reflecting on how their next generation will reshape conversational AI.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 23 pages, 10 figures

arXiv:2406.13280 [pdf, other]

Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach

Authors: Yu Min Park, Sheikh Salman Hassan, Yan Kyaw Tun, Eui-Nam Huh, Walid Saad, Choong Seon Hong

Abstract: Sixth-generation (6G) networks leverage simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) to overcome the limitations of traditional RISs. STAR-RISs offer 360-degree full-space coverage and optimized transmission and reflection for enhanced network performance and dynamic control of the indoor propagation environment. However, deploying STAR-RISs indoors presents challenges in interference mitigation, power consumption, and real-time configuration. In this work, a novel network architecture utilizing multiple access points (APs) and STAR-RISs is proposed for indoor communication. An optimization problem encompassing user assignment, access point beamforming, and STAR-RIS phase control for reflection and transmission is formulated. The inherent complexity of the formulated problem necessitates a decomposition approach for an efficient solution. This involves tackling different sub-problems with specialized techniques: a many-to-one matching algorithm is employed to assign users to appropriate access points, optimizing resource allocation. To facilitate efficient resource management, access points are grouped using a correlation-based K-means clustering algorithm. Multi-agent deep reinforcement learning (MADRL) is leveraged to optimize the control of the STAR-RIS. Within the proposed MADRL framework, a novel approach is introduced where each decision variable acts as an independent agent, enabling collaborative learning and decision-making. Additionally, the proposed MADRL approach incorporates convex approximation (CA). This technique utilizes suboptimal solutions from successive convex approximation (SCA) to accelerate policy learning for the agents, thereby leading to faster environment adaptation and convergence. Simulations demonstrate significant network utility improvements compared to baseline approaches.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 37 pages, 11 figures, submitted to IEEE Transactions on Communications. arXiv admin note: text overlap with arXiv:2311.08708

arXiv:2406.03773 [pdf, other]

Optimizing Multi-User Semantic Communication via Transfer Learning and Knowledge Distillation

Authors: Loc X. Nguyen, Kitae Kim, Ye Lin Tun, Sheikh Salman Hassan, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

Abstract: Semantic communication, notable for ensuring quality of service by jointly optimizing source and channel coding, effectively extracts data semantics, reduces transmission length, and mitigates channel noise. However, most studies overlook multi-user scenarios and resource availability, limiting real-world application. This paper addresses this gap by focusing on downlink communication from a base station to multiple users with varying computing capacities. Users employ variants of Swin transformer models for source decoding and a simple architecture for channel decoding. We propose a novel training regimen, incorporating transfer learning and knowledge distillation to improve low-computing users' performance. Extensive simulations validate the proposed methods.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 5 pages, 5 figures

arXiv:2406.02000 [pdf, other]

Advancing Ultra-Reliable 6G: Transformer and Semantic Localization Empowered Robust Beamforming in Millimeter-Wave Communications

Authors: Avi Deb Raha, Kitae Kim, Apurba Adhikary, Mrityunjoy Gain, Choong Seon Hong

Abstract: Advancements in 6G wireless technology have elevated the importance of beamforming, especially for attaining ultra-high data rates via millimeter-wave (mmWave) frequency deployment. Although promising, mmWave bands require substantial beam training to achieve precise beamforming. While initial deep learning models that use RGB camera images demonstrated promise in reducing beam training overhead, their performance suffers due to sensitivity to lighting and environmental variations. Due to this sensitivity, Quality of Service (QoS) fluctuates, eventually affecting the stability and dependability of networks in dynamic environments. This emphasizes a critical need for more robust solutions. This paper proposes a robust beamforming technique to ensure consistent QoS under varying environmental conditions. An optimization problem has been formulated to maximize users' data rates. To solve the formulated NP-hard optimization problem, we decompose it into two subproblems: the semantic localization problem and the optimal beam selection problem. To solve the semantic localization problem, we propose a novel method that leverages the k-means clustering and YOLOv8 model. To solve the beam selection problem, we propose a novel lightweight hybrid architecture that utilizes various data sources and a weighted entropy-based mechanism to predict the optimal beams. Rapid and accurate beam predictions are needed to maintain QoS. A novel metric, Accuracy-Complexity Efficiency (ACE), has been proposed to quantify this. Six testing scenarios have been developed to evaluate the robustness of the proposed model. Finally, the simulation result demonstrates that the proposed model outperforms several state-of-the-art baselines regarding beam prediction accuracy, received power, and ACE in the developed test scenarios.

Submitted 21 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.00431 [pdf, ps, other]

SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead

Authors: Minsu Kim, Walid Saad, Merouane Debbah, Choong Seon Hong

Abstract: The large communication and computation overhead of federated learning (FL) is one of the main challenges facing its practical deployment over resource-constrained clients and systems. In this work, SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead. In SpaFL, a trainable threshold is defined for each filter/neuron to prune all of its connected parameters, thereby leading to structured sparsity. To optimize the pruning process itself, only thresholds are communicated between a server and clients instead of parameters, thereby learning how to prune. Further, global thresholds are used to update model parameters by extracting aggregated parameter importance. The generalization bound of SpaFL is also derived, thereby providing key insights on the relation between sparsity and performance. Experimental results show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
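
A rough sketch of the communication pattern the abstract above describes: each neuron has its own threshold, clients exchange only thresholds, and the aggregated thresholds decide which parameters are pruned. The abstract does not specify the pruning rule or the aggregation rule, so the magnitude test (`abs(w) >= t`), the plain averaging, and the helper names `prune_by_threshold` and `average_thresholds` are assumptions made for illustration; threshold training is omitted entirely.

```python
def prune_by_threshold(weights, thresholds):
    """Zero each weight whose magnitude is below the threshold of the neuron (row) it belongs to."""
    return [
        [w if abs(w) >= t else 0.0 for w in row]
        for row, t in zip(weights, thresholds)
    ]


def average_thresholds(client_thresholds):
    """Server-side step (assumed): average the per-neuron thresholds reported by the clients."""
    n_clients = len(client_thresholds)
    return [sum(per_neuron) / n_clients for per_neuron in zip(*client_thresholds)]


# Hypothetical layer with two neurons and three incoming weights each.
weights = [
    [0.9, -0.05, 0.3],
    [0.02, -0.6, 0.2],
]

# Each client uploads one threshold per neuron instead of the six weights.
client_thresholds = [
    [0.125, 0.25],
    [0.375, 0.0],
]

global_thresholds = average_thresholds(client_thresholds)
pruned = prune_by_threshold(weights, global_thresholds)
```

In this toy layer each client sends two numbers per round rather than six, which is the source of the communication saving; the gap grows with the number of parameters attached to each neuron.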

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.19771 [pdf, other]

Data Service Maximization in Integrated Terrestrial-Non-Terrestrial 6G Networks: A Deep Reinforcement Learning Approach

Authors: Nway Nway Ei, Kitae Kim, Yan Kyaw Tun, Choong Seon Hong

Abstract: Integrating terrestrial and non-terrestrial networks has emerged as a promising paradigm to fulfill the constantly growing demand for connectivity, low transmission delay, and quality of service (QoS). This integration brings together the strengths of both, such as the reliability of terrestrial networks and the broad coverage and service continuity of non-terrestrial networks like low earth orbit (LEO) satellites. In this work, we study a data service maximization problem in an integrated terrestrial-non-terrestrial network (I-TNT) where the ground base stations (GBSs) and LEO satellites cooperatively serve the coexisting aerial users (AUs) and ground users (GUs). Then, by considering the spectrum scarcity, interference, and QoS requirements of the users, we jointly optimize the user association, AUE's trajectory, and power allocation. To tackle the formulated mixed-integer non-convex problem, we decompose it into two subproblems: 1) user association problem and 2) trajectory and power allocation problem. Since the user association problem is a binary integer programming problem, we use the standard convex optimization method to solve it. Meanwhile, the trajectory and power allocation problem is solved by the deep deterministic policy gradient (DDPG) method to cope with the problem's non-convexity and dynamic network environments. Then, the two subproblems are alternately solved by the proposed iterative algorithm. Extensive simulations comparing against baselines from the existing literature are conducted to evaluate the performance of the proposed framework.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 5 pages, 4 figures

arXiv:2405.15230 [pdf, other]

$i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization

Authors: Long Tan Le, Han Shu, Tung-Anh Nguyen, Choong Seon Hong, Nguyen H. Tran

Abstract: While astonishingly capable, large language models (LLMs) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent disseminating untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often struggle with the identified instability, whereas preference optimization methods are limited by their overfitting to pre-collected hard-label datasets. In this paper, we propose a novel LLM alignment framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization. Particularly, $i$REPO employs self-generated datasets labelled by empirical human (or AI annotator) preference to iteratively refine the aligned policy through a novel regression-based loss function. Furthermore, we introduce an innovative algorithm backed by theoretical guarantees for achieving optimal results under ideal assumptions and providing a practical performance-gap result without such assumptions. Experimental results with Phi-2 and Mistral-7B demonstrate that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators. Furthermore, our approach surpasses preference optimization baselines in evaluations using the Language Model Evaluation Harness and Multi-turn benchmarks.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Under Review

arXiv:2404.09259 [pdf, other]

FedCCL: Federated Dual-Clustered Feature Contrast Under Domain Heterogeneity

Authors: Yu Qiao, Huy Q. Le, Mengchun Zhang, Apurba Adhikary, Chaoning Zhang, Choong Seon Hong

Abstract: Federated learning (FL) facilitates a privacy-preserving neural network training paradigm through collaboration between edge clients and a central server. One significant challenge is that the distributed data is not independently and identically distributed (non-IID), typically including both intra-domain and inter-domain heterogeneity. However, recent research is limited to simply using averaged signals as a form of regularization and only focusing on one aspect of these non-IID challenges. Given these limitations, this paper clarifies these two non-IID challenges and attempts to introduce cluster representation to address them from both local and global perspectives. Specifically, we propose a dual-clustered feature contrast-based FL framework with dual focuses. First, we employ clustering on the local representations of each client, aiming to capture intra-class information based on these local clusters at a high level of granularity. Then, we facilitate cross-client knowledge sharing by pulling the local representation closer to clusters shared by clients with similar semantics while pushing them away from clusters with dissimilar semantics. Second, since the sizes of local clusters belonging to the same class may differ for each client, we further utilize clustering on the global side and conduct averaging to create a consistent global signal for guiding each local training in a contrastive manner. Experimental results on multiple datasets demonstrate that our proposal achieves comparable or superior performance gain under intra-domain and inter-domain heterogeneity.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.06776 [pdf, other]

Logit Calibration and Feature Contrast for Robust Federated Learning on Non-IID Data

Authors: Yu Qiao, Chaoning Zhang, Apurba Adhikary, Choong Seon Hong

Abstract: Federated learning (FL) is a privacy-preserving distributed framework for collaborative model training on devices in edge networks. However, challenges arise due to vulnerability to adversarial examples (AEs) and the non-independent and identically distributed (non-IID) nature of data distribution among devices, hindering the deployment of adversarially robust and accurate learning models at the edge. While adversarial training (AT) is commonly acknowledged as an effective defense strategy against adversarial attacks in centralized training, we shed light on the adverse effects of directly applying AT in FL that can severely compromise accuracy, especially in non-IID challenges. Given this limitation, this paper proposes FatCC, which incorporates local logit Calibration and global feature Contrast into the vanilla federated adversarial training (FAT) process from both logit and feature perspectives. This approach can effectively enhance the federated system's robust accuracy (RA) and clean accuracy (CA). First, we propose logit calibration, where the logits are calibrated during local adversarial updates, thereby improving adversarial robustness. Second, FatCC introduces feature contrast, which involves a global alignment term that aligns each local representation with unbiased global features, thus further enhancing robustness and accuracy in federated adversarial environments. Extensive experiments across multiple datasets demonstrate that FatCC achieves comparable or superior performance gains in both CA and RA compared to other baselines.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.05131 [pdf, other]

Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation

Authors: Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng, Jingyao Zheng, Lik-Hang Lee, Tae-Ho Kim, Choong Seon Hong, Chaoning Zhang

Abstract: The evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora, has progressed at a breakneck speed over the past seven years. While often seen as a superficial expansion of the predecessor text-to-image generation model, text-to-video generation models are developed upon carefully engineered constituents. Here, we systematically discuss these elements consisting of, but not limited to, core building blocks (vision, language, and temporal) and supporting features from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases primarily studying video synthesis using text conditions. Upon minute exploration of these manuscripts, we observe that text-to-video generation involves more intricate technologies beyond the plain extension of text-to-image generation. Our additional review into the shortcomings of Sora-generated videos pinpoints the call for more in-depth studies in various enabling aspects of video generation such as dataset, evaluation metric, efficient architecture, and human-controlled generation. Finally, we conclude that the study of text-to-video generation may still be in its infancy, requiring contribution from the cross-discipline research community towards its advancement as the first step to realize artificial general intelligence (AGI).

Submitted 7 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: First complete survey on Text-to-Video Generation, 44 pages, 20 figures

arXiv:2403.02803 [pdf, other]

Towards Robust Federated Learning via Logits Calibration on Non-IID Data

Authors: Yu Qiao, Apurba Adhikary, Chaoning Zhang, Choong Seon Hong

Abstract: Federated learning (FL) is a privacy-preserving distributed management framework based on collaborative model training of distributed devices in edge networks. However, recent studies have shown that FL is vulnerable to adversarial examples (AEs), leading to a significant drop in its performance. Meanwhile, the non-independent and identically distributed (non-IID) challenge of data distribution between edge devices can further degrade the performance of models. Consequently, both AEs and non-IID pose challenges to deploying robust learning models at the edge. In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks, which can be termed as federated adversarial training (FAT). Moreover, we address the non-IID challenge by implementing a simple yet effective logits calibration strategy under the FAT framework, which can enhance the robustness of models when subjected to adversarial attacks. Specifically, we employ a direct strategy to adjust the logits output by assigning higher weights to classes with small samples during training. This approach effectively tackles the class imbalance in the training data, with the goal of mitigating biases between local and global models. Experimental results on three dataset benchmarks, MNIST, Fashion-MNIST, and CIFAR-10, show that our strategy achieves competitive results in natural and robust accuracy compared to several baselines.
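
One common way to make rare classes count for more during training is to shift each logit by the log of its class frequency before the cross-entropy loss. The sketch below shows that generic prior-based logit adjustment; the abstract above does not give the paper's exact calibration formula, so this form, the function name `calibrated_cross_entropy`, and the example counts are assumptions, and the adversarial-training part of FAT is left out.

```python
import math


def calibrated_cross_entropy(logits, label, class_counts):
    """Cross-entropy on logits shifted by log class frequency (assumed calibration form).

    class_counts holds the local number of training samples per class; all counts must be > 0.
    """
    total = sum(class_counts)
    adjusted = [z + math.log(c / total) for z, c in zip(logits, class_counts)]
    # Numerically stable log-sum-exp over the adjusted logits.
    peak = max(adjusted)
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_norm - adjusted[label]


# Hypothetical client with a 9:1 class imbalance and an undecided model (equal logits).
counts = [900, 100]
logits = [0.0, 0.0]

loss_common = calibrated_cross_entropy(logits, 0, counts)  # equals -log(0.9)
loss_rare = calibrated_cross_entropy(logits, 1, counts)    # equals -log(0.1)
```

With equal logits the rare-class sample incurs the larger loss, so its gradient pushes the model harder toward the minority class, which is the weighting effect the abstract describes in words.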

Submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE NOMS 2024

arXiv:2402.06638 [pdf, other]

doi 10.1109/ICOIN56518.2023.10048928

Transformers with Attentive Federated Aggregation for Time Series Stock Forecasting

Authors: Chu Myaet Thwal, Ye Lin Tun, Kitae Kim, Seong-Bae Park, Choong Seon Hong

Abstract: Recent innovations in transformers have shown their superior performance in natural language processing (NLP) and computer vision (CV). The ability to capture long-range dependencies and interactions in sequential data has also triggered a great interest in time series modeling, leading to the widespread use of transformers in many time series applications. However, although forecasting is the most common and crucial application, the adaptation of transformers to time series forecasting has remained limited, with both promising and inconsistent results. In contrast to the challenges in NLP and CV, time series problems not only add the complexity of order or temporal dependence among input sequences but also involve trend, level, and seasonality information, much of which is valuable for decision making. The conventional training scheme has shown deficiencies regarding model overfitting, data scarcity, and privacy issues when working with transformers for a forecasting task. In this work, we propose attentive federated transformers for time series stock forecasting with better performance while preserving the privacy of participating enterprises. Empirical results on various stock data from the Yahoo! Finance website indicate the superiority of our proposed scheme in dealing with the above challenges and data heterogeneity in federated learning.

Submitted 22 January, 2024; originally announced February 2024.

Comments: Published in IEEE ICOIN 2023

arXiv:2401.13898 [pdf, other]

Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality

Authors: Huy Q. Le, Chu Myaet Thwal, Yu Qiao, Ye Lin Tun, Minh N. H. Nguyen, Choong Seon Hong

Abstract: Multimodal federated learning (MFL) has emerged as a decentralized machine learning paradigm, allowing multiple clients with different modalities to collaborate on training a machine learning model across diverse data sources without sharing their private data. However, challenges such as data heterogeneity and severely missing modalities pose crucial hindrances to the robustness of MFL, significantly impacting the performance of the global model. The absence of a modality introduces misalignment during the local training phase, stemming from zero-filling in the case of clients with missing modalities. Consequently, achieving robust generalization in the global model becomes imperative, especially when dealing with clients that have incomplete data. In this paper, we propose Multimodal Federated Cross Prototype Learning (MFCPL), a novel approach for MFL under severely missing modalities by conducting the complete prototypes to provide diverse modality knowledge at the modality-shared level with cross-modal regularization and at the modality-specific level with a cross-modal contrastive mechanism. Additionally, our approach introduces cross-modal alignment to provide regularization for modality-specific features, thereby enhancing overall performance, particularly in scenarios involving severely missing modalities. Through extensive experiments on three multimodal datasets, we demonstrate the effectiveness of MFCPL in mitigating these challenges and improving the overall performance.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 12 pages, 8 figures, 5 tables

arXiv:2401.11736 [pdf, other]

doi 10.1109/BigComp51126.2021.00035

Attention on Personalized Clinical Decision Support System: Federated Learning Approach

Authors: Chu Myaet Thwal, Kyi Thar, Ye Lin Tun, Choong Seon Hong

Abstract: Health management has become a primary problem as new kinds of diseases and complex symptoms are introduced to a rapidly growing modern society. Building a better and smarter healthcare infrastructure is one of the ultimate goals of a smart city. To the best of our knowledge, neural network models are already employed to assist healthcare professionals in achieving this goal. Typically, training a neural network requires a rich amount of data, but the heterogeneous and vulnerable properties of clinical data introduce a challenge for the traditional centralized network. Moreover, adding new inputs to a medical database requires re-training an existing model from scratch. To tackle these challenges, we proposed a deep learning-based clinical decision support system trained and managed under a federated learning paradigm. We focused on a novel strategy to guarantee the safety of patient privacy and overcome the risk of cyberattacks while enabling large-scale clinical data mining. As a result, we can leverage rich clinical data for training each local neural network without the need for exchanging the confidential data of patients. Moreover, we implemented the proposed scheme as a sequence-to-sequence model architecture integrating the attention mechanism. Thus, our objective is to provide a personalized clinical decision support system with evolvable characteristics that can deliver accurate solutions and assist healthcare professionals in medical diagnosis.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Published in IEEE BigComp 2021

arXiv:2401.11652 [pdf, other]

doi 10.1016/j.neunet.2023.11.044

OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning

Authors: Chu Myaet Thwal, Minh N. H. Nguyen, Ye Lin Tun, Seong Tae Kim, My T. Thai, Choong Seon Hong

Abstract: Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices while preserving privacy. The success of FL hinges on the efficiency of participating models and their ability to handle the unique challenges of distributed learning. While several variants of Vision Transformer (ViT) have shown great potential as alternatives to modern convolutional neural networks (CNNs) for centralized training, the unprecedented size and higher computational demands hinder their deployment on resource-constrained edge devices, challenging their widespread application in FL. Since client devices in FL typically have limited computing resources and communication bandwidth, models intended for such devices must strike a balance between model size, computational efficiency, and the ability to adapt to the diverse and non-IID data distributions encountered in FL. To address these challenges, we propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources. Our models incorporate image-specific inductive biases through the LCT tokenizer by leveraging efficient depthwise separable convolutions in residual linear bottleneck blocks to extract local features, while the multi-head self-attention (MHSA) mechanism in the LCT encoder implicitly facilitates capturing global representations of images. Extensive experiments on benchmark image datasets indicate that our models outperform existing lightweight vision models while having fewer parameters and lower computational demands, making them suitable for FL scenarios with data heterogeneity and communication bottlenecks.

Submitted 21 January, 2024; originally announced January 2024.

Comments: Published in Neural Networks

arXiv:2401.11647 [pdf, other]

LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning

Authors: Ye Lin Tun, Chu Myaet Thwal, Le Quang Huy, Minh N. H. Nguyen, Choong Seon Hong

Abstract: Many studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of raw training data distributed across edge devices. However, edge devices often struggle with high computation and communication costs imposed by SSL and FL algorithms. To tackle this hindrance, we propose LW-FedSSL, a layer-wise federated self-supervised learning approach that allows edge devices to incrementally train a single layer of the model at a time. Our LW-FedSSL comprises server-side calibration and representation alignment mechanisms to maintain comparable performance with end-to-end federated self-supervised learning (FedSSL) while significantly lowering clients' resource requirements. In a pure layer-wise training scheme, training one layer at a time may limit effective interaction between different layers of the model. The server-side calibration mechanism takes advantage of the resource-rich server in an FL environment to ensure smooth collaboration between different layers of the global model. During the local training process, the representation alignment mechanism encourages closeness between representations of FL local models and those of the global model, thereby preserving the layer cohesion established by server-side calibration. Our experiments show that LW-FedSSL has a $3.3 \times$ lower memory requirement and a $3.2 \times$ cheaper communication cost than its end-to-end counterpart. We also explore a progressive training strategy called Prog-FedSSL that outperforms end-to-end training with a similar memory requirement and a $1.8 \times$ cheaper communication cost.

Submitted 29 April, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.11419 [pdf, other]

Joint UAV Deployment and Resource Allocation in THz-Assisted MEC-Enabled Integrated Space-Air-Ground Networks

Authors: Yan Kyaw Tun, György Dán, Yu Min Park, Choong Seon Hong

Abstract: Multi-access edge computing (MEC)-enabled integrated space-air-ground (SAG) networks have drawn much attention recently, as they can provide communication and computing services to wireless devices in areas that lack terrestrial base stations (TBSs). Leveraging the ample bandwidth in the terahertz (THz) spectrum, in this paper, we propose MEC-enabled integrated SAG networks with collaboration among unmanned aerial vehicles (UAVs). We then formulate the problem of minimizing the energy consumption of devices and UAVs in the proposed MEC-enabled integrated SAG networks by optimizing task offloading decisions, THz sub-band assignment, transmit power control, and UAV deployment. The formulated problem is a mixed-integer nonlinear programming (MINLP) problem with a non-convex structure, which is challenging to solve. We thus propose a block coordinate descent (BCD) approach to decompose the problem into four sub-problems: 1) device task offloading decision problem, 2) THz sub-band assignment and power control problem, 3) UAV deployment problem, and 4) UAV task offloading decision problem. We then propose to use a matching game, concave-convex procedure (CCP) method, successive convex approximation (SCA), and block successive upper-bound minimization (BSUM) approaches for solving the individual sub-problems. Finally, extensive simulations are performed to demonstrate the effectiveness of our proposed algorithm.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 36 pages, 8 figures

arXiv:2312.09579 [pdf, other]

MobileSAMv2: Faster Segment Anything to Everything

Authors: Chaoning Zhang, Dongshen Han, Sheng Zheng, Jinwoo Choi, Tae-Ho Kim, Choong Seon Hong

Abstract: Segment anything model (SAM) addresses two practical yet challenging segmentation tasks: segment anything (SegAny), which utilizes a certain point to predict the mask for a single object of interest, and segment everything (SegEvery), which predicts the masks for all objects on the image. What makes SegAny slow for SAM is its heavyweight image encoder, which has been addressed by MobileSAM via decoupled knowledge distillation. The efficiency bottleneck of SegEvery with SAM, however, lies in its mask decoder because it needs to first generate numerous masks with redundant grid-search prompts and then perform filtering to obtain the final valid masks. We propose to improve its efficiency by directly generating the final masks with only valid prompts, which can be obtained through object discovery. Our proposed approach not only helps reduce the total time on the mask decoder by at least 16 times but also achieves superior performance. Specifically, our approach yields an average performance boost of 3.6% (42.5% vs. 38.9%) for zero-shot object proposal on the LVIS dataset with the mask AR@$K$ metric. Qualitative results show that our approach generates fine-grained masks while avoiding over-segmenting things. This project targeting faster SegEvery than the original SAM is termed MobileSAMv2 to differentiate from MobileSAM which targets faster SegAny. Moreover, we demonstrate that our new prompt sampling is also compatible with the distilled image encoders in MobileSAM, contributing to a unified framework for efficient SegAny and SegEvery. The code is available at the same link as the MobileSAM project: https://github.com/ChaoningZhang/MobileSAM.

Submitted 15 December, 2023; originally announced December 2023.

Comments: MobileSAM achieves faster segment anything, while MobileSAMv2 achieves faster segment everything

arXiv:2312.08714 [pdf, other]

Aerial STAR-RIS Empowered MEC: A DRL Approach for Energy Minimization

Authors: Pyae Sone Aung, Loc X. Nguyen, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

Abstract: Multi-access Edge Computing (MEC) addresses computational and battery limitations in devices by allowing them to offload computation tasks. To overcome the difficulties in establishing line-of-sight connections, integrating unmanned aerial vehicles (UAVs) has proven beneficial, offering enhanced data exchange, rapid deployment, and mobility. The utilization of reconfigurable intelligent surfaces (RIS), specifically simultaneously transmitting and reflecting RIS (STAR-RIS) technology, further extends coverage capabilities and introduces flexibility in MEC. This study explores the integration of UAV and STAR-RIS to facilitate communication between IoT devices and an MEC server. The formulated problem aims to minimize energy consumption for IoT devices and aerial STAR-RIS by jointly optimizing task offloading, aerial STAR-RIS trajectory, amplitude and phase shift coefficients, and transmit power. Given the non-convexity of the problem and the dynamic environment, solving it directly within a polynomial time frame is challenging. Therefore, deep reinforcement learning (DRL), particularly proximal policy optimization (PPO), is introduced for its sample efficiency and stability. Simulation results illustrate the effectiveness of the proposed system compared to benchmark schemes in the literature.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2311.16538 [pdf, other]

Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks

Authors: Ye Lin Tun, Chu Myaet Thwal, Ji Su Yoon, Sun Moo Kang, Chaoning Zhang, Choong Seon Hong

Abstract: Diffusion models have shown great potential for vision-related tasks, particularly for image generation. However, their training is typically conducted in a centralized manner, relying on data collected from publicly available sources. This approach may not be feasible or practical in many domains, such as the medical field, which involves privacy concerns over data collection. Despite the challenges associated with privacy-sensitive data, such domains could still benefit from valuable vision services provided by diffusion models. Federated learning (FL) plays a crucial role in enabling decentralized model training without compromising data privacy. Instead of collecting data, an FL system gathers model parameters, effectively safeguarding the private data of different parties involved. This makes FL systems vital for managing decentralized learning tasks, especially in scenarios where privacy-sensitive data is distributed across a network of clients. Nonetheless, FL presents its own set of challenges due to its distributed nature and privacy-preserving properties. Therefore, in this study, we explore the FL strategy to train diffusion models, paving the way for the development of federated diffusion models. We conduct experiments on various FL scenarios, and our findings demonstrate that federated diffusion models have great potential to deliver vision services to privacy-sensitive domains.

Submitted 28 November, 2023; originally announced November 2023.
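
Note: the abstract above states that an FL system gathers model parameters rather than data. As a rough illustration of that idea, the sketch below averages client parameters weighted by local dataset size. This is the widely used FedAvg-style rule and is an assumption here; the listing does not say which aggregation rule the paper uses. The function name `federated_average` and the dictionary-of-lists parameter layout are hypothetical.

```python
from typing import Dict, List


def federated_average(
    client_params: List[Dict[str, List[float]]],
    client_sizes: List[int],
) -> Dict[str, List[float]]:
    """Weighted average of client parameters (FedAvg-style, assumed rule).

    client_params: one dict per client, mapping a parameter name to a flat
    list of floats. client_sizes: number of local samples per client.
    Only parameters are combined; no raw data is exchanged.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    aggregated: Dict[str, List[float]] = {}
    for name in client_params[0]:
        acc = [0.0] * len(client_params[0][name])
        for params, size in zip(client_params, client_sizes):
            weight = size / total  # clients with more data count more
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        aggregated[name] = acc
    return aggregated


# Two hypothetical clients holding 1 and 3 local samples.
global_params = federated_average(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}],
    [1, 3],
)
```

With these inputs the second client contributes three quarters of each averaged value, which is how size-weighted averaging favors clients that hold more data.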

arXiv:2311.16535 [pdf, other]

doi 10.1016/j.neunet.2023.06.010

Contrastive encoder pre-training-based clustered federated learning for heterogeneous data

Authors: Ye Lin Tun, Minh N. H. Nguyen, Chu Myaet Thwal, Jinwoo Choi, Choong Seon Hong

Abstract: Federated learning (FL) is a promising approach that enables distributed clients to collaboratively train a global model while preserving their data privacy. However, FL often suffers from data heterogeneity problems, which can significantly affect its performance. To address this, clustered federated learning (CFL) has been proposed to construct personalized models for different client clusters. One effective client clustering strategy is to allow clients to choose their own local models from a model pool based on their performance. However, without pre-trained model parameters, such a strategy is prone to clustering failure, in which all clients choose the same model. Unfortunately, collecting a large amount of labeled data for pre-training can be costly and impractical in distributed environments. To overcome this challenge, we leverage self-supervised contrastive learning to exploit unlabeled data for the pre-training of FL systems. Together, self-supervised pre-training and client clustering can be crucial components for tackling the data heterogeneity issues of FL. Leveraging these two crucial strategies, we propose contrastive pre-training-based clustered federated learning (CP-CFL) to improve the model convergence and overall performance of FL systems. In this work, we demonstrate the effectiveness of CP-CFL through extensive experiments in heterogeneous FL settings, and present various interesting observations.

Submitted 28 November, 2023; originally announced November 2023.

Comments: Published in Neural Networks

arXiv:2311.11465 [pdf, other]

Understanding Segment Anything Model: SAM is Biased Towards Texture Rather than Shape

Authors: Chaoning Zhang, Yu Qiao, Shehbaz Tariq, Sheng Zheng, Chenshuang Zhang, Chenghao Li, Hyundong Shin, Choong Seon Hong

Abstract: In contrast to human vision, which mainly depends on shape for recognizing objects, deep image recognition models are widely known to be biased toward texture. Recently, the Meta research team released the first foundation model for image segmentation, termed segment anything model (SAM), which has attracted significant attention. In this work, we examine SAM from the perspective of texture vs. shape. Unlike label-oriented recognition tasks, SAM is trained to predict a mask covering the object shape based on a prompt. With this said, it seems self-evident that SAM is biased towards shape. In this work, however, we reveal an interesting finding: SAM is strongly biased towards texture-like dense features rather than shape. This intriguing finding is supported by a novel setup where we disentangle texture and shape cues and design texture-shape cue conflict for mask prediction.

Submitted 3 June, 2023; originally announced November 2023.

arXiv:2311.08708 [pdf, other]

Joint User Pairing and Beamforming Design of Multi-STAR-RISs-Aided NOMA in the Indoor Environment via Multi-Agent Reinforcement Learning

Authors: Yu Min Park, Yan Kyaw Tun, Choong Seon Hong

Abstract: The development of 6G/B5G wireless networks, which have requirements that go beyond current 5G networks, is gaining interest from academia and industry. However, to increase 6G/B5G network quality, conventional cellular networks that rely on terrestrial base stations are constrained geographically and economically. Meanwhile, NOMA allows multiple users to share the same resources, which improves the spectral efficiency of the system and has the advantage of supporting a larger number of users. Additionally, by intelligently manipulating the phase and amplitude of both the reflected and transmitted signals, STAR-RISs can achieve improved coverage, increased spectral efficiency, and enhanced communication reliability. However, STAR-RISs must simultaneously optimize the amplitude and phase shift corresponding to reflection and transmission, which makes the existing terrestrial networks more complicated and is considered a major challenge. Motivated by the above, we study the joint user pairing for NOMA and beamforming design of Multi-STAR-RISs in an indoor environment. Then, we formulate the optimization problem with the objective of maximizing the total throughput of MUs by jointly optimizing the decoding order, user pairing, active beamforming, and passive beamforming. However, the formulated problem is a MINLP. To address this challenge, we first introduce the decoding order for NOMA networks. Next, we decompose the original problem into two subproblems, namely: 1) MU pairing and 2) beamforming optimization under the optimal decoding order. For the first subproblem, we employ correlation-based K-means clustering to solve the user pairing problem. Then, to jointly deal with beamforming vector optimizations, we propose MAPPO, which can make quick decisions in the given environment owing to its low complexity.

Submitted 16 November, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 8 pages, 9 figures, submitted to IEEE/IFIP Network Operations and Management Symposium (NOMS) 2024

arXiv:2310.13236 [pdf, other]

An Efficient Federated Learning Framework for Training Semantic Communication System

Authors: Loc X. Nguyen, Huy Q. Le, Ye Lin Tun, Pyae Sone Aung, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

Abstract: Semantic communication has emerged as a pillar for the next generation of communication systems due to its capabilities in alleviating data redundancy. Most semantic communication systems are built upon advanced deep learning models whose training performance heavily relies on data availability. Existing studies often make unrealistic assumptions of a readily accessible data source, whereas in practice, data is mainly created on the client side. Due to privacy and security concerns, the transmission of data, which is necessary for conventional centralized training schemes, is restricted. To address this challenge, we explore semantic communication in a federated learning (FL) setting that utilizes client data without leaking privacy. Additionally, we design our system to tackle the communication overhead by reducing the quantity of information delivered in each global round. In this way, we can save significant bandwidth for resource-limited devices and reduce overall network traffic. Finally, we introduce a mechanism to aggregate the global model from clients, called FedLol. Extensive simulation results demonstrate the effectiveness of our proposed technique compared to baseline methods.

Submitted 9 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: 5 pages, 3 figures

arXiv:2310.09021 [pdf, other]

Generative AI-driven Semantic Communication Framework for NextG Wireless Network

Authors: Avi Deb Raha, Md. Shirajum Munir, Apurba Adhikary, Yu Qiao, Choong Seon Hong

Abstract: This work designs a novel semantic communication (SemCom) framework for the next-generation wireless network to tackle the challenges of unnecessary transmission of vast amounts of data, which causes high bandwidth consumption, higher latency, and poor quality of service (QoS). In particular, these challenges hinder applications like intelligent transportation systems (ITS), metaverse, mixed reality, and the Internet of Everything, where real-time and efficient data transmission is paramount. Therefore, to reduce communication overhead and maintain the QoS of emerging applications such as metaverse, ITS, and digital twin creation, this work proposes a novel semantic communication framework. First, an intelligent semantic transmitter is designed to capture the meaningful information (e.g., the road-side image in ITS) by designing a domain-specific Mobile Segment Anything Model (MSAM)-based mechanism to reduce the potential communication traffic while QoS remains intact. Second, the concept of generative AI is introduced for building the SemCom to reconstruct and denoise the received semantic data frame at the receiver end. In particular, the Generative Adversarial Network (GAN) mechanism is designed to maintain a superior quality reconstruction under different signal-to-noise ratio (SNR) channel conditions. Finally, we have tested and evaluated the proposed semantic communication (SemCom) framework with the real-world 6G scenario of ITS; in particular, the base station equipped with an RGB camera and a mmWave phased array. Experimental results demonstrate the efficacy of the proposed SemCom framework by achieving high-quality reconstruction across various SNR channel conditions, resulting in 93.45% data reduction in communication.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.15659 [pdf, other]

Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency

Authors: Long Tan Le, Tuan Dung Nguyen, Tung-Anh Nguyen, Choong Seon Hong, Nguyen H. Tran

Abstract: Federated Learning (FL) is a prominent distributed learning paradigm facilitating collaboration among nodes within an edge network to co-train a global model without centralizing data. By shifting computation to the network edge, FL offers robust and responsive edge-AI solutions and enhances privacy preservation. However, deploying deep FL models within edge environments is often hindered by communication bottlenecks, data heterogeneity, and memory limitations. To address these challenges jointly, we introduce FeDEQ, a pioneering FL framework that effectively employs deep equilibrium learning and consensus optimization to exploit a compact shared data representation across edge nodes, allowing the derivation of personalized models specific to each node. We delve into a unique model structure composed of an equilibrium layer followed by traditional neural network layers. Here, the equilibrium layer functions as a global feature representation that edge nodes can adapt to personalize their local layers. Capitalizing on FeDEQ's compactness and representation power, we present a novel distributed algorithm rooted in the alternating direction method of multipliers (ADMM) consensus optimization and theoretically establish its convergence for smooth objectives. Experiments across various benchmarks demonstrate that FeDEQ achieves performance comparable to state-of-the-art personalized methods while employing models up to 4 times smaller in communication size and with a 1.5 times lower memory footprint during training.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.13223 [pdf, other]

Causal Reasoning: Charting a Revolutionary Course for Next-Generation AI-Native Wireless Networks

Authors: Christo Kurisummoottil Thomas, Christina Chaccour, Walid Saad, Merouane Debbah, Choong Seon Hong

Abstract: Despite the basic premise that next-generation wireless networks (e.g., 6G) will be artificial intelligence (AI)-native, to date, most existing efforts remain either qualitative or incremental extensions to existing "AI for wireless" paradigms. Indeed, creating AI-native wireless networks faces significant technical challenges due to the limitations of data-driven, training-intensive AI. These limitations include the black-box nature of the AI models, their curve-fitting nature, which can limit their ability to reason and adapt, their reliance on large amounts of training data, and the energy inefficiency of large neural networks. In response to these limitations, this article presents a comprehensive, forward-looking vision that addresses these shortcomings by introducing a novel framework for building AI-native wireless networks, grounded in the emerging field of causal reasoning. Causal reasoning, founded on causal discovery, causal representation learning, and causal inference, can help build explainable, reasoning-aware, and sustainable wireless networks. Towards fulfilling this vision, we first highlight several wireless networking challenges that can be addressed by causal discovery and representation, including ultra-reliable beamforming for terahertz (THz) systems, near-accurate physical twin modeling for digital twins, training data augmentation, and semantic communication. We showcase how incorporating causal discovery can assist in achieving dynamic adaptability, resilience, and cognition in addressing these challenges. Furthermore, we outline potential frameworks that leverage causal inference to achieve the overarching objectives of future-generation networks, including intent management, dynamic adaptability, human-level cognition, reasoning, and the critical element of time sensitivity.

Submitted 31 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

arXiv:2308.13735 [pdf, other]

MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree

Authors: Quang Hieu Vo, Linh-Tam Tran, Sung-Ho Bae, Lok-Won Kim, Choong Seon Hong

Abstract: Binary neural networks (BNNs) have been widely adopted to reduce the computational cost and memory storage on edge-computing devices by using one-bit representation for activations and weights. However, as neural networks become wider/deeper to improve accuracy and meet practical requirements, the computational burden remains a significant challenge even on the binary version. To address these issues, this paper proposes a novel method called Minimum Spanning Tree (MST) compression that learns to compress and accelerate BNNs. The proposed architecture leverages an observation from previous works that an output channel in a binary convolution can be computed using another output channel and XNOR operations with weights that differ from the weights of the reused channel. We first construct a fully connected graph with vertices corresponding to output channels, where the distance between two vertices is the number of different values between the weight sets used for these outputs. Then, the MST of the graph with the minimum depth is proposed to reorder output calculations, aiming to reduce computational cost and latency. Moreover, we propose a new learning algorithm to reduce the total MST distance during training. Experimental results on benchmark models demonstrate that our method achieves significant compression ratios with negligible accuracy drops, making it a promising approach for resource-constrained edge-computing devices.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 11 pages, 9 figures, ICCV 2023
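
Note: the graph construction described in the abstract above can be illustrated with a small sketch. It assumes each output channel is a flat list of binary weights (+1/-1), uses the count of differing positions (Hamming distance) as the edge weight, and builds a spanning tree with Prim's algorithm. The function names `hamming` and `channel_mst` are hypothetical, and the sketch does not attempt the minimum-depth selection or the training-time learning algorithm the abstract mentions.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two binary weight sets differ."""
    return sum(x != y for x, y in zip(a, b))


def channel_mst(channels: List[Sequence[int]]) -> Tuple[List[Tuple[int, int, int]], int]:
    """Prim's algorithm on the fully connected graph of output channels.

    Returns (edges, total), where each edge is (parent, child, distance)
    and total is the sum of distances along the tree. Channel 0 is used
    as the root, an arbitrary choice made for this sketch.
    """
    n = len(channels)
    if n == 0:
        return [], 0
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known distance to the tree
    parent = [-1] * n
    best[0] = 0
    edges: List[Tuple[int, int, int]] = []
    total = 0
    for _ in range(n):
        # Pick the channel outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, int(best[u])))
            total += int(best[u])
        # Relax distances from the newly added channel.
        for v in range(n):
            if not in_tree[v]:
                d = hamming(channels[u], channels[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges, total


# Four hypothetical output channels with four binary weights each.
example_channels = [
    [1, 1, 1, 1],
    [1, 1, 1, -1],
    [1, 1, -1, -1],
    [-1, -1, -1, -1],
]
example_edges, example_total = channel_mst(example_channels)
```

In this toy example each tree edge links a channel to the closest already-computed channel, so the total tree distance (4) is smaller than the 7 differing weights that would be processed if every channel were derived from channel 0 directly.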

arXiv:2308.08279 [pdf, ps, other]

Deep Reinforcement Learning based Joint Spectrum Allocation and Configuration Design for STAR-RIS-Assisted V2X Communications

Authors: Pyae Sone Aung, Loc X. Nguyen, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

Abstract: Vehicle-to-Everything (V2X) communications play a crucial role in ensuring safe and efficient modern transportation systems. However, challenges arise in scenarios with buildings, leading to signal obstruction and coverage limitations. To alleviate these challenges, reconfigurable intelligent surface (RIS) is regarded as an effective solution for communication performance by tuning passive signal reflection. RIS has acquired prominence in 6G networks due to its improved spectral efficiency, simple deployment, and cost-effectiveness. Nevertheless, conventional RIS solutions have coverage limitations. Therefore, researchers have started focusing on the promising concept of simultaneously transmitting and reflecting RIS (STAR-RIS), which provides 360° coverage while utilizing the advantages of RIS technology. In this paper, a STAR-RIS-assisted V2X communication system is investigated. An optimization problem is formulated to maximize the achievable data rate for vehicle-to-infrastructure (V2I) users while satisfying the latency and reliability requirements of vehicle-to-vehicle (V2V) pairs by jointly optimizing the spectrum allocation, amplitudes, and phase shifts of STAR-RIS elements, digital beamforming vectors for V2I links, and transmit power for V2V pairs. Since it is challenging to solve in polynomial time, we decompose our problem into two sub-problems. For the first sub-problem, we model the control variables as a Markov Decision Process (MDP) and propose a combined double deep Q-network (DDQN) with an attention mechanism so that the model can potentially focus on relevant inputs. For the latter, a standard optimization-based approach is implemented to provide a real-time solution, reducing computational costs. Extensive numerical analysis is developed to demonstrate the superiority of our proposed algorithm compared to benchmark schemes.

Submitted 16 August, 2023; originally announced August 2023.

Comments: 12 pages, 9 figures

arXiv:2307.15469 [pdf, other]

SpaceRIS: LEO Satellite Coverage Maximization in 6G Sub-THz Networks by MAPPO DRL and Whale Optimization

Authors: Sheikh Salman Hassan, Yu Min Park, Yan Kyaw Tun, Walid Saad, Zhu Han, Choong Seon Hong

Abstract: Satellite systems face a significant challenge in effectively utilizing limited communication resources to meet the demands of ground network traffic, characterized by asymmetrical spatial distribution and time-varying characteristics. Moreover, the coverage range and signal transmission distance of low Earth orbit (LEO) satellites are restricted by notable propagation attenuation, molecular absorption, and space losses in sub-terahertz (THz) frequencies. This paper introduces a novel approach to maximize LEO satellite coverage by leveraging reconfigurable intelligent surfaces (RISs) within 6G sub-THz networks. The optimization objectives encompass enhancing the end-to-end data rate, optimizing satellite-remote user equipment (RUE) associations, data packet routing within satellite constellations, RIS phase shift, and ground base station (GBS) transmit power (i.e., active beamforming). The formulated joint optimization problem poses significant challenges owing to its time-varying environment, non-convex characteristics, and NP-hard complexity. To address these challenges, we propose a block coordinate descent (BCD) algorithm that integrates balanced K-means clustering, multi-agent proximal policy optimization (MAPPO) deep reinforcement learning (DRL), and whale optimization (WOA) techniques. The performance of the proposed approach is demonstrated through comprehensive simulation results, exhibiting its superiority over existing baseline methods in the literature.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.13214 [pdf, other]

FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

Authors: Huy Q. Le, Minh N. H. Nguyen, Chu Myaet Thwal, Yu Qiao, Chaoning Zhang, Choong Seon Hong

Abstract: Federated learning (FL) enables a decentralized machine learning paradigm for multiple clients to collaboratively train a generalized global model without sharing their private data. Most existing works simply propose typical FL systems for single-modal data, thus limiting its potential on exploiting valuable multimodal data for future personalized applications. Furthermore, the majority of FL approaches still rely on the labeled data at the client side, which is limited in real-world applications due to the inability of self-annotation from users. In light of these limitations, we propose a novel multimodal FL framework that employs a semi-supervised learning approach to leverage the representations from different modalities. Bringing this concept into a system, we develop a distillation-based multimodal embedding knowledge transfer mechanism, namely FedMEKT, which allows the server and clients to exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset. Our FedMEKT iteratively updates the generalized global encoders with the joint embedding knowledge from the participating clients. Thereby, to address the modality discrepancy and labeled data constraint in existing FL systems, our proposed FedMEKT comprises local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning. Through extensive experiments on three multimodal human activity recognition datasets, we demonstrate that FedMEKT achieves superior global encoder performance on linear evaluation and guarantees user privacy for personal data and model parameters while demanding less communication cost than other baselines.

Submitted 6 November, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.10575 [pdf, other]

Boosting Federated Learning Convergence with Prototype Regularization

Authors: Yu Qiao, Huy Q. Le, Choong Seon Hong

Abstract: As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.

Submitted 20 July, 2023; originally announced July 2023.
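
The aggregate-and-send-back loop described in the abstract above can be sketched as follows. The abstract does not define a prototype or the regularizer, so everything concrete here is an assumption for illustration: a prototype is taken to be the per-class mean embedding, the server uses an unweighted average, and the penalty is a squared Euclidean distance. All function names are hypothetical.

```python
def local_prototypes(embeddings, labels):
    """Per-class mean embedding on one client (assumed prototype definition)."""
    sums, counts = {}, {}
    for vec, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def aggregate(client_protos):
    """Server side: unweighted per-class average of the clients' prototypes."""
    sums, counts = {}, {}
    for protos in client_protos:
        for y, p in protos.items():
            acc = sums.setdefault(y, [0.0] * len(p))
            for i, x in enumerate(p):
                acc[i] += x
            counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def prototype_penalty(embedding, label, global_protos):
    """Squared distance between a sample embedding and its class's global prototype."""
    g = global_protos[label]
    return sum((a - b) ** 2 for a, b in zip(embedding, g))


# Two hypothetical clients with 2-D embeddings and skewed label sets.
client_a = local_prototypes([[0.0, 0.0], [2.0, 2.0]], [0, 0])
client_b = local_prototypes([[3.0, 3.0], [4.0, 0.0]], [0, 1])
global_protos = aggregate([client_a, client_b])
penalty = prototype_penalty([1.0, 2.0], 0, global_protos)
print(global_protos, penalty)
```

In a training loop, each client would add a weighted version of this penalty to its usual task loss, so that local embeddings are pulled toward the shared class prototypes.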

arXiv:2307.03402 [pdf, other]

Swin Transformer-Based Dynamic Semantic Communication for Multi-User with Different Computing Capacity

Authors: Loc X. Nguyen, Ye Lin Tun, Yan Kyaw Tun, Minh N. H. Nguyen, Chaoning Zhang, Zhu Han, Choong Seon Hong

Abstract: Semantic communication has gained significant attention from researchers as a promising technique to replace conventional communication in the next generation of communication systems, primarily due to its ability to reduce communication costs. However, little literature has studied its effectiveness in multi-user scenarios, particularly when there are variations in the model architectures used by users and their computing capacities. To address this issue, we explore a semantic communication system that caters to multiple users with different model architectures by using a multi-purpose transmitter at the base station (BS). Specifically, the BS in the proposed framework employs semantic and channel encoders to encode the image for transmission, while the receiver utilizes its local channel and semantic decoder to reconstruct the original image. Our joint source-channel encoder at the BS can effectively extract and compress semantic features for specific users by considering the signal-to-noise ratio (SNR) and computing capacity of the user. Based on the network status, the joint source-channel encoder at the BS can adaptively adjust the length of the transmitted signal. A longer signal ensures more information for high-quality image reconstruction for the user, while a shorter signal helps avoid network congestion. In addition, we propose a hybrid loss function for training, which enhances the perceptual details of reconstructed images. Finally, we conduct a series of extensive evaluations and ablation studies to validate the effectiveness of the proposed system.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 14 pages, 10 figures

arXiv:2307.02663 [pdf, other]

Convergence of Communications, Control, and Machine Learning for Secure and Autonomous Vehicle Navigation

Authors: Tengchan Zeng, Aidin Ferdowsi, Omid Semiari, Walid Saad, Choong Seon Hong

Abstract: Connected and autonomous vehicles (CAVs) can reduce human errors in traffic accidents, increase road efficiency, and execute various tasks ranging from delivery to smart city surveillance. Reaping these benefits requires CAVs to autonomously navigate to target destinations. To this end, each CAV's navigation controller must leverage the information collected by sensors and wireless systems for decision-making on longitudinal and lateral movements. However, enabling autonomous navigation for CAVs requires a convergent integration of communication, control, and learning systems. The goal of this article is to explicitly expose the challenges related to this convergence and propose solutions to address them in two major use cases: Uncoordinated and coordinated CAVs. In particular, challenges related to the navigation of uncoordinated CAVs include stable path tracking, robust control against cyber-physical attacks, and adaptive navigation controller design. Meanwhile, when multiple CAVs coordinate their movements during navigation, fundamental problems such as stable formation, fast collaborative learning, and distributed intrusion detection are analyzed. For both cases, solutions using the convergence of communication theory, control theory, and machine learning are proposed to enable effective and secure CAV navigation. Preliminary simulation results are provided to show the merits of proposed solutions.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 3 figures and 7 pages

arXiv:2306.14289 [pdf, other]

Faster Segment Anything: Towards Lightweight SAM for Mobile Applications

Authors: Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong

Abstract: Segment Anything Model (SAM) has attracted significant attention due to its impressive zero-shot transfer performance and high versatility for numerous vision applications (like image editing with fine-grained control). Many such applications need to run on resource-constrained edge devices, like mobile phones. In this work, we aim to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. A naive way to train such a new SAM as in the original SAM paper leads to unsatisfactory performance, especially when limited training sources are available. We find that this is mainly caused by the coupled optimization of the image encoder and mask decoder, motivated by which we propose decoupled distillation. Concretely, we distill the knowledge from the heavy image encoder (ViT-H in the original SAM) to a lightweight image encoder, which can be automatically compatible with the mask decoder in the original SAM. The training can be completed on a single GPU in less than one day, and the resulting lightweight SAM is termed MobileSAM, which is more than 60 times smaller yet performs on par with the original SAM. For inference speed, with a single GPU, MobileSAM runs around 10ms per image: 8ms on the image encoder and 4ms on the mask decoder. With superior performance, our MobileSAM is around 5 times faster than the concurrent FastSAM and 7 times smaller, making it more suitable for mobile applications. Moreover, we show that MobileSAM can run relatively smoothly on CPU. The code for our project, together with a demo, is provided at https://github.com/ChaoningZhang/MobileSAM.

Submitted 1 July, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: First work to make SAM lightweight for mobile applications

arXiv:2306.07713 [pdf, other]

Robustness of SAM: Segment Anything Under Corruptions and Beyond

Authors: Yu Qiao, Chaoning Zhang, Taegoo Kang, Donghun Kim, Chenshuang Zhang, Choong Seon Hong

Abstract: Segment anything model (SAM), as the name suggests, is claimed to be capable of cutting out any object and demonstrates impressive zero-shot transfer performance with the guidance of prompts. However, there is currently a lack of comprehensive evaluation regarding its robustness under various corruptions. Understanding the robustness of SAM across different corruption scenarios is crucial for its real-world deployment. Prior works show that SAM is biased towards texture (style) rather than shape, motivated by which we start by investigating its robustness against style transfer, which is synthetic corruption. Interpreting the effects of synthetic corruption as style changes, we then conduct a comprehensive evaluation of its robustness against 15 types of common corruption. These corruptions mainly fall into categories such as digital, noise, weather, and blur, and within each corruption category, we explore 5 severity levels to simulate real-world corruption scenarios. Beyond the corruptions, we further assess the robustness of SAM against local occlusion and local adversarial patch attacks. To the best of our knowledge, our work is the first of its kind to evaluate the robustness of SAM under style change, local occlusion, and local adversarial patch attacks. Given that patch attacks visible to human eyes are easily detectable, we further assess its robustness against global adversarial attacks that are imperceptible to human eyes. Overall, this work provides a comprehensive empirical study of the robustness of SAM, evaluating its performance under various corruptions and extending the assessment to critical aspects such as local occlusion, local adversarial patch attacks, and global adversarial attacks. These evaluations yield valuable insights into the practical applicability and effectiveness of SAM in addressing real-world challenges.

Submitted 4 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: The first work to evaluate the robustness of SAM under various corruptions such as style transfer, local occlusion, and adversarial patch attack

arXiv:2306.06211 [pdf, other]

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

Authors: Chaoning Zhang, Fachrina Dewi Puspitasari, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Lik-Hang Lee, Sung-Ho Bae, Choong Seon Hong

Abstract: Segment anything model (SAM) developed by Meta AI Research has recently attracted significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is capable of segmenting any object on a certain image. In the original SAM work, the authors turned to zero-shot transfer tasks (like edge detection) for evaluating the performance of SAM. Recently, numerous works have attempted to investigate the performance of SAM in various scenarios to recognize and segment objects. Moreover, numerous projects have emerged to show the versatility of SAM as a foundation model by combining it with other models, like Grounding DINO, Stable Diffusion, ChatGPT, etc. With the relevant papers and projects increasing exponentially, it is challenging for readers to keep up with the development of SAM. To this end, this work conducts the first comprehensive survey on SAM. This is an ongoing project and we intend to update the manuscript on a regular basis. Therefore, readers are welcome to contact us if they complete new works related to SAM so that we can include them in our next version.

Submitted 3 July, 2023; v1 submitted 12 May, 2023; originally announced June 2023.

Comments: First survey on Segment Anything Model (SAM), work in progress

arXiv:2305.06131 [pdf, other]

Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Authors: Chenghao Li, Chaoning Zhang, Atish Waghwase, Lik-Hang Lee, Francois Rameau, Yang Yang, Sung-Ho Bae, Choong Seon Hong

Abstract: Generative AI (AIGC, a.k.a. AI generated content) has made significant progress in recent years, with text-guided content generation being the most practical as it facilitates interaction between human instructions and AIGC. Due to advancements in text-to-image and 3D modeling technologies (like NeRF), text-to-3D has emerged as a nascent yet highly active research field. Our work conducts the first comprehensive survey and follows up on subsequent research progress in the overall field, aiming to help readers interested in this direction quickly catch up with its rapid development. First, we introduce 3D data representations, including both Euclidean and non-Euclidean data. Building on this foundation, we introduce various foundational technologies and summarize how recent work combines these foundational technologies to achieve satisfactory text-to-3D results. Additionally, we present mainstream baselines and research directions in recent text-to-3D technology, including fidelity, efficiency, consistency, controllability, diversity, and applicability. Furthermore, we summarize the usage of text-to-3D technology in various applications, including avatar generation, texture generation, shape editing, and scene generation.

Submitted 10 June, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

arXiv:2305.00278 [pdf, other]

Segment Anything Model (SAM) Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected

Authors: Dongsheng Han, Chaoning Zhang, Yu Qiao, Maryam Qamar, Yuna Jung, SeungKyu Lee, Sung-Ho Bae, Choong Seon Hong

Abstract: Meta AI Research has recently released SAM (Segment Anything Model), which is trained on a large segmentation dataset of over 1 billion masks. As a foundation model in the field of computer vision, SAM has gained attention for its impressive performance in generic object segmentation. Despite its strong capability in a wide range of zero-shot transfer tasks, it remains unknown whether SAM can detect things in challenging setups like transparent objects. In this work, we perform an empirical evaluation of two glass-related challenging scenarios: mirror and transparent objects. We found that SAM often fails to detect the glass in both scenarios, which raises concern for deploying SAM in safety-critical situations that have various forms of glass.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2304.06488 [pdf, other]

One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era

Authors: Chaoning Zhang, Chenshuang Zhang, Chenghao Li, Yu Qiao, Sheng Zheng, Sumit Kumar Dam, Mengchun Zhang, Jung Uk Kim, Seong Tae Kim, Jinwoo Choi, Gyeong-Moon Park, Sung-Ho Bae, Lik-Hang Lee, Pan Hui, In So Kweon, Choong Seon Hong

Abstract: OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.

Submitted 4 April, 2023; originally announced April 2023.

Comments: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated ([email protected])

arXiv:2304.01950 [pdf, other]

MP-FedCL: Multiprototype Federated Contrastive Learning for Edge Intelligence

Authors: Yu Qiao, Md. Shirajum Munir, Apurba Adhikary, Huy Q. Le, Avi Deb Raha, Chaoning Zhang, Choong Seon Hong

Abstract: Federated learning-assisted edge intelligence enables privacy protection in modern intelligent services. However, non-independent and identically distributed (non-IID) data among edge clients can impair the local model performance. The existing single prototype-based strategy represents a class by using the mean of the feature space. However, feature spaces are usually not clustered, and a single prototype may not represent a class well. Motivated by this, this paper proposes a multi-prototype federated contrastive learning approach (MP-FedCL) which demonstrates the effectiveness of using a multi-prototype strategy over a single-prototype under non-IID settings, including both label and feature skewness. Specifically, a multi-prototype computation strategy based on k-means is first proposed to capture different embedding representations for each class space, using multiple prototypes (k centroids) to represent a class in the embedding space. In each global round, the computed multiple prototypes and their respective model parameters are sent to the edge server for aggregation into a global prototype pool, which is then sent back to all clients to guide their local training. Finally, local training for each client minimizes their own supervised learning tasks and learns from shared prototypes in the global prototype pool through supervised contrastive learning, which encourages them to learn knowledge related to their own class from others and reduces the absorption of unrelated knowledge in each global iteration. Experimental results on MNIST, Digit-5, Office-10, and DomainNet show that our method outperforms multiple baselines, with an average test accuracy improvement of about 4.6% and 10.4% under feature and label non-IID distributions, respectively.

Submitted 11 October, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

Comments: Accepted by IEEE Internet of Things

arXiv:2303.12296 [pdf, other]

Prototype Helps Federated Learning: Towards Faster Convergence

Authors: Yu Qiao, Seong-Bae Park, Sun Moo Kang, Choong Seon Hong

Abstract: Federated learning (FL) is a distributed machine learning technique in which multiple clients cooperate to train a shared model without exchanging their raw data. However, heterogeneity of data distribution among clients usually leads to poor model inference. In this paper, a prototype-based federated learning framework is proposed, which can achieve better inference performance with only a few changes to the last global iteration of the typical federated learning process. In the last iteration, the server aggregates the prototypes transmitted from distributed clients and then sends them back to local clients for their respective model inferences. Experiments on two baseline datasets show that our proposal can achieve higher accuracy (at least 1%) and relatively efficient communication compared with two popular baselines under different heterogeneous settings.

Submitted 22 March, 2023; originally announced March 2023.

Comments: 3 pages, 3 figures

arXiv:2303.11717 [pdf, other]

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Authors: Chaoning Zhang, Chenshuang Zhang, Sheng Zheng, Yu Qiao, Chenghao Li, Mengchun Zhang, Sumit Kumar Dam, Chu Myaet Thwal, Ye Lin Tun, Le Luang Huy, Donguk kim, Sung-Ho Bae, Lik-Hang Lee, Yang Yang, Heng Tao Shen, In So Kweon, Choong Seon Hong

Abstract: As ChatGPT goes viral, generative AI (AIGC, a.k.a AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond. With such overwhelming media coverage, it is almost impossible for us to miss the opportunity to glimpse AIGC from a certain angle. In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT, with its most recent language model GPT-4, is just a tool out of numerous AIGC tasks. Impressed by the capability of the ChatGPT, many people are wondering about its limits: can GPT-5 (or other future GPT variants) help ChatGPT unify all AIGC tasks for diversified content creation? Toward answering this question, a comprehensive review of existing AIGC tasks is needed. As such, our work comes to fill this gap promptly by offering a first look at AIGC, ranging from its techniques to applications. Modern generative AI relies on various technical foundations, ranging from model architecture and self-supervised pretraining to generative modeling methods (like GAN and diffusion models). After introducing the fundamental techniques, this work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc., which depicts the full potential of ChatGPT's future. Moreover, we summarize their significant applications in some mainstream industries, such as education and creativity content. Finally, we discuss the challenges currently faced and present an outlook on how generative AI might evolve in the near future.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 56 pages, 548 citations

arXiv:2212.05757 [pdf, other]

Satellite-based ITS Data Offloading & Computation in 6G Networks: A Cooperative Multi-Agent Proximal Policy Optimization DRL with Attention Approach

Authors: Sheikh Salman Hassan, Yu Min Park, Yan Kyaw Tun, Walid Saad, Zhu Han, Choong Seon Hong

Abstract: The proliferation of intelligent transportation systems (ITS) has led to increasing demand for diverse network applications. However, conventional terrestrial access networks (TANs) are inadequate in accommodating various applications for remote ITS nodes, i.e., airplanes and ships. In contrast, satellite access networks (SANs) offer supplementary support for TANs, in terms of coverage flexibility and availability. In this study, we propose a novel approach to ITS data offloading and computation services based on SANs. We use low-Earth orbit (LEO) and cube satellites (CubeSats) as independent mobile edge computing (MEC) servers that schedule the processing of data generated by ITS nodes. To optimize offloading task selection, computing, and bandwidth resource allocation for different satellite servers, we formulate a joint delay and rental price minimization problem that is mixed-integer non-linear programming (MINLP) and NP-hard. We propose a cooperative multi-agent proximal policy optimization (Co-MAPPO) deep reinforcement learning (DRL) approach with an attention mechanism to deal with intelligent offloading decisions. We also decompose the remaining subproblem into three independent subproblems for resource allocation and use convex optimization techniques to obtain their optimal closed-form analytical solutions. We conduct extensive simulations and compare our proposed approach to baselines, resulting in performance improvements of 9.9%, 5.2%, and 4.2%, respectively.

Submitted 14 June, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: 18 Pages, 20 Figures, Submitted to IEEE Transactions on Mobile Computing (TMC)-(Under Major Revision)

arXiv:2211.03703 [pdf, other]

Machine Learning for Metaverse-enabled Wireless Systems: Vision, Requirements, and Challenges

Authors: Latif U. Khan, Ibrar Yaqoob, Khaled Salah, Choong Seon Hong, Dusit Niyato, Zhu Han, Mohsen Guizani

Abstract: Today's wireless systems are posing key challenges in terms of quality of service and quality of physical experience. Metaverse has the potential to reshape, transform, and add innovations to the existing wireless systems. A metaverse is a collective virtual open space that can enable wireless systems using digital twins, digital avatars, and interactive experience technologies. Machine learning (ML) is indispensable for modeling twins, avatars, and deploying interactive experience technologies. In this paper, we present the role of ML in enabling metaverse-based wireless systems. We identify and discuss a set of key requirements for advancing ML in the metaverse-based wireless systems. Moreover, we present a case study of distributed split federated learning for efficiently training meta-space models. Finally, we discuss the future challenges along with potential solutions.

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2210.15850 [pdf, other]

doi 10.1109/BigComp51126.2021.00039

Federated Learning based Energy Demand Prediction with Clustered Aggregation

Authors: Ye Lin Tun, Kyi Thar, Chu Myaet Thwal, Choong Seon Hong

Abstract: To reduce negative environmental impacts, power stations and energy grids need to optimize the resources required for power production. Thus, predicting the energy consumption of clients is becoming an important part of every energy management system. Energy usage information collected by the clients' smart homes can be used to train a deep neural network to predict the future energy demand. Collecting data from a large number of distributed clients for centralized model training is expensive in terms of communication resources. To take advantage of distributed data in edge systems, centralized training can be replaced by federated learning, where each client only needs to upload model updates produced by training on its local data. These model updates are aggregated into a single global model by the server. However, since different clients can have different attributes, model updates can have diverse weights and, as a result, it can take a long time for the aggregated global model to converge. To speed up the convergence process, we can apply clustering to group clients based on their properties and aggregate model updates from the same cluster together to produce a cluster-specific global model. In this paper, we propose a recurrent neural network-based energy demand predictor, trained with federated learning on clustered clients to take advantage of distributed data and speed up the convergence process.

Submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted by BigComp 2021

arXiv:2210.15827 [pdf, other]

doi 10.1109/BigComp57234.2023.00017

Federated Learning with Intermediate Representation Regularization

Authors: Ye Lin Tun, Chu Myaet Thwal, Yu Min Park, Seong-Bae Park, Choong Seon Hong

Abstract: In contrast to centralized model training that involves data collection, federated learning (FL) enables remote clients to collaboratively train a model without exposing their private data. However, model performance usually degrades in FL due to the heterogeneous data generated by clients of diverse characteristics. One promising strategy to maintain good performance is by limiting the local training from drifting far away from the global model. Previous studies accomplish this by regularizing the distance between the representations learned by the local and global models. However, they only consider representations from the early layers of a model or the layer preceding the output layer. In this study, we introduce FedIntR, which provides a more fine-grained regularization by integrating the representations of intermediate layers into the local training process. Specifically, FedIntR computes a regularization term that encourages the closeness between the intermediate layer representations of the local and global models. Additionally, FedIntR automatically determines the contribution of each layer's representation to the regularization term based on the similarity between local and global representations. We conduct extensive experiments on various datasets to show that FedIntR can achieve equivalent or higher performance compared to the state-of-the-art approaches. Our code is available at https://github.com/YLTun/FedIntR.

Submitted 20 April, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: IEEE BigComp 2023

arXiv:2210.06649 [pdf, other]

Neuro-symbolic Explainable Artificial Intelligence Twin for Zero-touch IoE in Wireless Network

Authors: Md. Shirajum Munir, Ki Tae Kim, Apurba Adhikary, Walid Saad, Sachin Shetty, Seong-Bae Park, Choong Seon Hong

Abstract: Explainable artificial intelligence (XAI) twin systems will be a fundamental enabler of zero-touch network and service management (ZSM) for sixth-generation (6G) wireless networks. A reliable XAI twin system for ZSM requires two composites: an extreme analytical ability for discretizing the physical behavior of the Internet of Everything (IoE) and rigorous methods for characterizing the reasoning of such behavior. In this paper, a novel neuro-symbolic explainable artificial intelligence twin framework is proposed to enable trustworthy ZSM for a wireless IoE. The physical space of the XAI twin executes a neural-network-driven multivariate regression to capture the time-dependent wireless IoE environment while determining unconscious decisions of IoE service aggregation. Subsequently, the virtual space of the XAI twin constructs a directed acyclic graph (DAG)-based Bayesian network that can infer a symbolic reasoning score over unconscious decisions through a first-order probabilistic language model. Furthermore, a Bayesian multi-arm bandits-based learning problem is proposed for reducing the gap between the expected explained score and the current obtained score of the proposed neuro-symbolic XAI twin. To address the challenges of extensible, modular, and stateless management functions in ZSM, the proposed neuro-symbolic XAI twin framework consists of two learning systems: 1) an implicit learner that acts as an unconscious learner in physical space, and 2) an explicit learner that can exploit symbolic reasoning based on implicit learner decisions and prior evidence. Experimental results show that the proposed neuro-symbolic XAI twin can achieve around 96.26% accuracy while guaranteeing from 18% to 44% more trust score in terms of reasoning and closed-loop automation.

Submitted 12 October, 2022; originally announced October 2022.

Comments: Submitted to a journal for peer review

arXiv:2209.07228 [pdf, other]

doi 10.1109/TVT.2023.3311537

Joint Trajectory and Resource Optimization of MEC-Assisted UAVs in Sub-THz Networks: A Resources-based Multi-Agent Proximal Policy Optimization DRL with Attention Mechanism

Authors: Yu Min Park, Sheikh Salman Hassan, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

Abstract: THz band communication technology will be used in the 6G networks to enable high-speed and high-capacity data service demands. However, THz-communication losses arise owing to limitations, i.e., molecular absorption, rain attenuation, and coverage range. Furthermore, to maintain steady THz-communications and overcome coverage distances in rural and suburban regions, the required number of BSs is very high. Consequently, a new communication platform that enables aerial communication services is required. Furthermore, the airborne platform supports LoS communications rather than NLoS communications, which helps overcome these losses. Therefore, in this work, we investigate the deployment and resource optimization for MEC-enabled UAVs, which can provide THz-based communications in remote regions. To this end, we formulate an optimization problem to minimize the sum of the energy consumption of both MEC-UAV and MUs and the delay incurred by MUs under the given task information. The formulated problem is a MINLP problem, which is NP-hard. We decompose the main problem into two subproblems to address the formulated problem. We solve the first subproblem with a standard optimization solver, i.e., CVXPY, due to its convex nature. To solve the second subproblem, we design a RMAPPO DRL algorithm with an attention mechanism. The considered attention mechanism is utilized for encoding a diverse number of observations. This is designed by the network coordinator to provide a differentiated fit reward to each agent in the network. The simulation results show that the proposed algorithm outperforms the benchmarks and yields a network utility that is 2.22%, 15.55%, and 17.77% higher than the benchmarks.

Submitted 15 September, 2022; originally announced September 2022.

Comments: 13 pages, 12 figures

Showing 1–50 of 137 results for author: Hong, C S