When NOMA Meets AIGC: Enhanced Wireless Federated Learning

Ding Xu, Lingjie Duan, and Hongbo Zhu

Ding Xu is with the Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China. He is also with the Pillar of Engineering Systems and Design, Singapore University of Technology and Design, Singapore 487372, Singapore. Lingjie Duan is with the Pillar of Engineering Systems and Design, Singapore University of Technology and Design, Singapore 487372, Singapore. Hongbo Zhu is with the Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China.

Abstract

Wireless federated learning (WFL) enables devices to collaboratively train a global model via local model training, uploading, and aggregation. However, WFL suffers from data scarcity/heterogeneity (i.e., data are limited and unevenly distributed among devices), which degrades the learning performance. In this regard, artificial intelligence generated content (AIGC) can synthesize various types of data to compensate for insufficient local data. Nevertheless, downloading synthetic data and iteratively uploading local models take a long time, especially for a large number of devices. To address this issue, we propose to leverage non-orthogonal multiple access (NOMA) to achieve efficient synthetic data and local model transmission. This paper is the first to combine AIGC and NOMA with WFL to maximally enhance the learning performance. For the proposed NOMA+AIGC-enhanced WFL, we investigate the problem of jointly optimizing the synthetic data distribution and the two-way communication and computation resource allocation to minimize the global learning error. The problem is an NP-hard mixed integer nonlinear program whose optimal solution is intractable to find. We first employ the block coordinate descent method to decouple the tightly coupled variables, and then use our analytical method to derive an efficient, low-complexity, locally optimal solution with partially closed-form results. Extensive simulations validate the superiority of the proposed scheme over existing and benchmark schemes such as the frequency/time division multiple access based AIGC-enhanced schemes.

Index Terms:

Non-orthogonal multiple access, artificial intelligence generated content, wireless federated learning, synthetic data.

I Introduction

Nowadays, with the explosive growth of Internet of Things (IoT) devices, the data collected by these devices need to be analyzed using machine learning techniques to support various IoT applications such as augmented reality and virtual reality [1]. However, because of limited communication resources, it is impractical to transmit all of the collected data to a data center for centralized machine learning. Fortunately, the computing capability of devices is surging thanks to the rapid development of chip technology, which motivates distributed machine learning that lets each device train a learning model locally using its collected data. Wireless federated learning (WFL) is one of the most popular distributed learning frameworks; it allows devices to collaboratively learn a global model while protecting data privacy [2]. In WFL, devices independently train local models on their local data starting from a received global model; the trained local models are then transmitted to the WFL server and aggregated into a new global model, which is redistributed to the devices, and the process is repeated until the global model converges. Therefore, wireless transmission schemes greatly affect WFL performance and need to be tailored for WFL.

Meanwhile, due to data scarcity and heterogeneity, data are limited and unevenly distributed among devices, and some important portions of data are missing locally at some devices, which degrades the convergence accuracy of learning [3]. Therefore, some measures have to be taken to improve global model accuracy. For example, device selection can be performed to choose proper devices for global model aggregation [2]. However, this introduces a fairness issue. Another method is to let devices collect the missing portions of data. However, due to physical limitations, it is sometimes difficult for the devices to collect the missing portions of local data by themselves. Even if collecting data is possible, it introduces considerable latency and energy consumption, which is undesirable for energy-limited devices and delay-sensitive services. In this regard, artificial intelligence generated content (AIGC) [4], a promising technology for data synthesis, can be adopted to generate synthetic data for local training. Specifically, AIGC can automatically create various data such as texts, images, and videos upon the devices' requests [5, 6], saving time and resources that would otherwise be spent on data collection. Since AIGC usually requires high computation capability, it can be deployed at an AIGC server in a cloud computing center, and devices can request the server to synthesize specific data and then download the synthetic data for local model training.

I-A Related Works

In the context of WFL networks, many works such as [7, 8, 9, 10, 11] have studied various important problems regarding the implementation of WFL. Specifically, in [7], an algorithm for jointly optimizing the learning, radio resource allocation and device selection based on the Hungarian method was proposed to minimize the WFL loss function. In [8], the problem of joint learning and communication resource optimization to minimize the total energy consumption was investigated, and an iterative algorithm was proposed to achieve a local optimal solution. In [9], the computation and the communication resources as well as the number of the local model parameter quantization bits were jointly optimized to minimize the WFL convergence time. In [10], an iterative algorithm for jointly scheduling devices, local iterations and radio resources was developed based on the pointer network embedded deep reinforcement learning method and the breadth-first search method. In [11], an incentive mechanism based on the Stackelberg game was designed to motivate the devices to participate in collaborative model training.

Among the above works, [7, 8, 10, 11] adopted the frequency division multiple access (FDMA) transmission scheme, and [9] adopted the time division multiple access (TDMA) transmission scheme. Both FDMA and TDMA are orthogonal multiple access (OMA) schemes. However, OMA suffers from low transmission efficiency [12, 13], since each device can only use a dedicated portion of the frequency or time resources. To fully realize the potential of WFL, transmission schemes more efficient than OMA are needed. In this context, non-orthogonal multiple access (NOMA) [13, 14], which supports multiple concurrent transmissions on the same spectrum band, is a promising technique for achieving high transmission efficiency in WFL networks. In particular, devices can upload their local models simultaneously over the same band using superposition coding, and the WFL server can decode the signals of different devices using the successive interference cancellation (SIC) technique.

There have been some works on NOMA-enhanced WFL networks. Specifically, in [15], devices were assumed to be wirelessly powered by the base station (BS), and the joint optimization of NOMA communication and computation resources to minimize the system-wide cost was investigated; a layered algorithm based on monotonic optimization was developed to solve this problem. In [16], the problem of jointly optimizing device scheduling, transmit power, and computation frequency allocation in a relay-assisted NOMA-enhanced WFL network to minimize the energy consumption was solved using graph theory. In [17], devices were grouped into different NOMA groups for local model uploading, and the transmit power and bandwidth of the NOMA groups were jointly optimized via convex optimization to maximize a system convergence metric. In [18], device selection and resource allocation were jointly optimized to minimize the total training latency in a NOMA-enhanced WFL network via monotonicity analysis and the dual decomposition method.

Meanwhile, AIGC in wireless networks has attracted a lot of attention recently, such as the AIGC-enhanced semantic communications [19, 20, 21], blockchain-enabled AIGC [22], distributed AIGC [23], and AIGC service provider selection [24]. AIGC has also been integrated with WFL in recent works [25, 26]. Particularly, the work in [25] applied WFL to achieve efficient AIGC, and presented WFL-based techniques for AIGC to generate diverse and personalized contents. While [25] focused on how WFL can empower the AIGC, the work in [26] adopted AIGC to empower WFL. Specifically, in [26], AIGC was proposed to generate more training data for devices to minimize the device energy consumption under the learning performance constraint.

I-B Motivation and Contributions

Since devices are heterogeneous and the data are limited and unevenly distributed, AIGC can be used to generate the specific data missing at the devices for more efficient local training and improved global convergence in WFL. In this regard, the pioneering work [26] considered using AIGC to enhance WFL performance. However, [26] ignored the downloading of synthetic data from the AIGC server to the devices, whose duration cannot be neglected in practice. In addition, [26] used the less efficient FDMA scheme for local model uploading, which is inadequate for a large number of devices. These research gaps motivate us to use the highly efficient NOMA transmission scheme for both synthetic data downloading and local model uploading in AIGC-enhanced WFL networks, and to jointly design the synthetic data distribution and the two-way communication and computation resource allocation to maximize the learning performance. Note that jointly designing the resource allocation policy in NOMA+AIGC-enhanced WFL networks is non-trivial, since the optimization variables are highly coupled in such complicated non-convex problems, and the SIC decoding order in NOMA further complicates the optimization.

The main contributions of this paper are summarized as follows:

(1) We are the first to propose a NOMA+AIGC-enhanced WFL system model, where devices can download synthetic data from the AIGC server based on NOMA, then train the local models based on the local data and synthetic data, and upload the local models to the WFL server based on NOMA for global model aggregation. The problem of jointly optimizing the synthetic data allocation, the time allocation, the transmit power allocation of the BS and the devices, the SIC decoding order, and the computing frequency allocation, is formulated with the objective of minimizing the global learning error, under various system constraints.

(2) We propose an efficient low-complexity algorithm with partially closed-form results to achieve a locally optimal solution to the problem. First, we analytically derive the closed-form optimal computing frequency allocation to simplify the problem. Then, the block coordinate descent (BCD) method is adopted to decouple the complex problem into two simpler subproblems. The first subproblem optimizes the synthetic data allocation and the time allocation, where the optimal time allocation is analytically obtained in closed form, and the optimal synthetic data allocation is obtained via convex optimization. The second subproblem optimizes the transmit power allocation of the BS, the transmit power of the devices, and the SIC decoding order, where the optimal transmit power allocation of the BS and the optimal transmit power of the devices are analytically obtained in recursive form, and the optimal SIC decoding order is analytically obtained in closed form.

(3) Extensive simulation results are presented to demonstrate the superiority of the proposed NOMA+AIGC-enhanced scheme and the advantages of combining NOMA and AIGC with WFL. It is shown that the proposed scheme outperforms existing and benchmark schemes, such as the FDMA+AIGC-enhanced and TDMA+AIGC-enhanced schemes, in terms of global learning accuracy under various system configurations. In particular, the learning performance improvement of the proposed scheme is more pronounced when the data synthesis capability of the AIGC server is stronger, the maximum latency requirement is more stringent, and the number of devices is larger.

The remainder of this paper is organized as follows. Section II presents the system model of the proposed NOMA+AIGC-enhanced WFL network. Section III formulates the optimization problem of maximizing the learning performance and presents a framework for solving it. Sections IV and V present algorithms to optimally solve the two subproblems. Section VI provides simulation examples to demonstrate the effectiveness of the proposed NOMA+AIGC-enhanced scheme. Finally, the paper is concluded in Section VII.

II System Model

Figure 1: NOMA+AIGC-enhanced WFL system model.

We consider a NOMA+AIGC-enhanced WFL network consisting of an AIGC server in the cloud, a BS, and $K$ devices, as shown in Fig. 1. The BS is equipped with a WFL server for the simple computation of global model aggregation, and connects to the AIGC server via a high-speed wired backhaul. The AIGC server is assumed to be deployed at the cloud computing center and equipped with powerful computation capacity for data synthesis. Each device has an original local dataset for local model training. Let $D_{k}^{\mathrm{loc}}$ denote the number of samples in the local dataset of device $k$. To improve the learning model accuracy, AIGC is adopted to synthesize the data required by the devices for local training. Specifically, each device can send a data synthesis request to the AIGC server indicating the specific data distributions/properties. After receiving the requests, the AIGC server synthesizes the requested data and sends the synthetic data to the BS, which forwards them to the devices for local model training. Let $D_{k}^{\mathrm{gen}}$ denote the number of synthetic data samples that are generated by a pre-trained AI generative model and downloaded from the AIGC server to device $k$. After the AIGC server pushes the synthetic data to the devices, the devices can train their local models on both the local dataset and the synthetic dataset.

II-A AIGC-Enhanced WFL Model

The whole AIGC-enhanced WFL procedure consists of three phases [26]. In the first phase, all the devices send data synthesis requests to the AIGC server indicating the specific data distributions/properties, and the AIGC server then synthesizes the data required by all the devices. Due to data scarcity/heterogeneity, the local data available at the devices may lack particular types of data, which may hinder the convergence of the global model. Thus, the synthetic data can compensate for the missing portions of local data to improve the learning convergence [26]. We assume that, due to physical limitations, the AIGC server can generate at most $D^{\mathrm{gen}}$ synthetic data samples [6]. Since the total amount of synthetic data received by all the devices cannot exceed this maximum, we have the following synthetic data constraint:

\sum_{k=1}^{K}D_{k}^{\mathrm{gen}}\leq D^{\mathrm{gen}}.

(1)

In the second phase, the synthetic data are transmitted from the AIGC server in the cloud to the BS and then to the devices within the time $T^{\mathrm{down}}$. Since the AIGC server connects to the BS via a high-speed wired backhaul, the time for transmitting the synthetic data from the AIGC server to the BS is ignored. In the third phase, the WFL server first broadcasts the initial global model to the devices, then each device trains its local model using the local dataset and the synthetic dataset, and finally uploads the trained local model to the WFL server for global model aggregation and update. We assume that the number of global iterations is fixed at $N$. Let $D^{\mathrm{mod}}$, $T^{\mathrm{syn}}$, $T^{\mathrm{br}}$, $T^{\mathrm{loc}}$, and $T^{\mathrm{up}}$ denote the local/global model data size in bits, the time required for generating the synthetic data for all the devices, and, in each global iteration, the time for global model broadcasting by the WFL server, the time for local model training of all the devices, and the time for local model uploading of all the devices, respectively. Then, the time of the third phase is $N\left(T^{\mathrm{br}}+T^{\mathrm{loc}}+T^{\mathrm{up}}\right)$. We assume that there is a pre-determined maximum latency $T^{\mathrm{max}}$ for the whole WFL procedure, i.e.,

T^{\mathrm{syn}}+T^{\mathrm{down}}+N\left(T^{\mathrm{br}}+T^{\mathrm{loc}}+T^{\mathrm{up}}\right)\leq T^{\mathrm{max}}.

(2)

Since the synthetic data are generated by the pre-trained AI generative model in the AIGC server, we assume that $T^{\mathrm{syn}}$ is modeled as a linear function of the amount of synthetic data generated for all the devices [6], i.e., $T^{\mathrm{syn}}=\varrho\sum_{k=1}^{K}D_{k}^{\mathrm{gen}}$ .

According to the results in [26], the global learning error of the WFL (a smaller error means a higher global model accuracy) depends on the local training dataset sizes of the devices and the number of global iterations, and can be modeled as

\triangle(\mathbf{D}^{\mathrm{gen}})=e^{\frac{N\left(\frac{\alpha}{K}\sum_{k=1}^{K}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta}-\gamma-1\right)}{\zeta}},

(3)

where $\mathbf{D}^{\mathrm{gen}}=\{D_{k}^{\mathrm{gen}},k=1,\ldots,K\},$ $\zeta$ is a positive constant parameter, and $\alpha,$ $\beta,$ $\gamma$ are positive hyper-parameters that can be obtained through curve fitting [26].
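To make the learning error model (3) concrete, the following Python sketch evaluates $\triangle(\mathbf{D}^{\mathrm{gen}})$ for given local and synthetic dataset sizes. The function name `global_learning_error` and all parameter values ($N$, $\alpha$, $\beta$, $\gamma$, $\zeta$) are illustrative assumptions, not the fitted values of [26]; the example only shows that adding synthetic samples lowers the error.

```python
import math


def global_learning_error(d_loc, d_gen, N, alpha, beta, gamma, zeta):
    """Evaluate the global learning error model of Eq. (3).

    d_loc, d_gen: per-device local and synthetic sample counts (length K).
    N, alpha, beta, gamma, zeta: model parameters of Eq. (3).
    """
    K = len(d_loc)
    # Sum of (D_k^loc + D_k^gen)^(-beta) over all devices.
    s = sum((dl + dg) ** (-beta) for dl, dg in zip(d_loc, d_gen))
    return math.exp(N * (alpha / K * s - gamma - 1) / zeta)


# Placeholder parameters (assumed for illustration only).
params = dict(N=50, alpha=2.0, beta=0.5, gamma=0.1, zeta=100.0)
base = global_learning_error([100, 400, 50], [0, 0, 0], **params)
boosted = global_learning_error([100, 400, 50], [100, 0, 150], **params)
print(boosted < base)  # True: extra samples shrink the sum, hence the error
```

Giving the synthetic samples to the two data-poor devices reduces the sum in (3) the most, which is the intuition behind optimizing $\mathbf{D}^{\mathrm{gen}}$ per device.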

II-B Computation Model

Let $f_{k}$, $w$, and $\tau$ denote the computing frequency of device $k$, the number of CPU cycles required by the devices to locally train one data sample, and the number of local epochs, respectively. The computing frequency of each device is capped by its maximum value, i.e.,

f_{k}\leq f_{k}^{\mathrm{max}},\forall k,

(4)

where $f_{k}^{\mathrm{max}}$ is the maximum computing frequency of the device $k.$ Then, the time of the device $k$ for a single-round local model training is given by

T_{k}^{\mathrm{loc}}=\frac{w\tau\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)}{f_{k}}.

(5)

Since each device has to finish the local model training within the required time $T^{\mathrm{loc}},$ we have

T_{k}^{\mathrm{loc}}\leq T^{\mathrm{loc}},\forall k.

(6)

In addition, the energy consumption per single-round local model training of the device $k$ is written as

E_{k}^{\mathrm{loc}}=w\tau\varpi_{k}f_{k}^{2}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right),

(7)

where $\varpi_{k}$ denotes the hardware energy coefficient of device $k$ [27].
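A minimal sketch of the per-round training time (5) and energy (7) for a single device, assuming illustrative values for $w$, $\tau$, and $\varpi_{k}$ (the helper names are hypothetical). It exposes the trade-off exploited later: halving $f_{k}$ doubles the training time but cuts the training energy by a factor of four.

```python
def local_training_time(w, tau, d_loc, d_gen, f):
    """Single-round local training time T_k^loc of Eq. (5), in seconds."""
    return w * tau * (d_loc + d_gen) / f


def local_training_energy(w, tau, varpi, d_loc, d_gen, f):
    """Single-round local training energy E_k^loc of Eq. (7), in joules."""
    return w * tau * varpi * f ** 2 * (d_loc + d_gen)


# Illustrative values (assumptions): 1e4 CPU cycles/sample, one local epoch,
# 200 local + 300 synthetic samples, varpi = 1e-28.
t_fast = local_training_time(1e4, 1, 200, 300, f=1e9)
e_fast = local_training_energy(1e4, 1, 1e-28, 200, 300, f=1e9)
t_slow = local_training_time(1e4, 1, 200, 300, f=5e8)
e_slow = local_training_energy(1e4, 1, 1e-28, 200, 300, f=5e8)
print(t_fast, e_fast)  # about 5e-3 s and 5e-4 J
```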

II-C NOMA-Enhanced Communication Model

Let $h_{k}$ and $g_{k}$ denote the channel gains from the BS to the device $k,$ and from the device $k$ to the BS, respectively. The system bandwidth is $B.$ There are three procedures that involve wireless transmissions, i.e., synthetic data downloading from the BS to the devices, global model broadcasting from the BS to the devices, and local model uploading from the devices to the BS.

For synthetic data downloading, downlink NOMA is applied. Specifically, according to the downlink NOMA principle, the messages intended for the devices are transmitted simultaneously using superposition coding. Each device applies SIC to cancel the messages of the devices whose channel gains are smaller than its own. Without loss of generality, we assume that the devices are sorted in ascending order of the channel gain $h_{k}$, i.e., $h_{1}<h_{2}<\ldots<h_{K}$. Let $p_{k}$ denote the transmit power of the message for device $k$. Thus, the achievable rate of device $k$ for synthetic data downloading is given by

R_{k}^{\mathrm{down}}=B\log_{2}\left(1+\frac{h_{k}p_{k}}{\sigma^{2}B+h_{k}\sum_{j>k}p_{j}}\right),

(8)

where $\sigma^{2}$ is the noise power spectral density. Since the allocated synthetic data must be completely transmitted within the time $T^{\mathrm{down}}$, we have

T^{\mathrm{down}}R_{k}^{\mathrm{down}}\geq\Gamma D_{k}^{\mathrm{gen}},\forall k,

(9)

where $\Gamma$ is the size of one data sample in bits. Furthermore, the maximum transmit power of the BS is limited as

\sum_{k=1}^{K}p_{k}\leq P,

(10)

where $P$ is the maximum transmit power of the BS.
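The following sketch evaluates the downlink NOMA rates (8) under the ascending channel ordering assumed above, together with the smallest $T^{\mathrm{down}}$ satisfying (9) for all devices. The helper names, the toy channel gains, and the power split are assumptions chosen so that the two SINRs come out to 1 and 3.

```python
import math


def downlink_noma_rates(h, p, B, sigma2):
    """Downlink NOMA rates R_k^down of Eq. (8), in bit/s.

    h must be sorted ascending; device k cancels the messages of weaker devices
    (index < k) via SIC and treats those of stronger devices (index > k) as
    interference. sigma2 is the noise power spectral density (W/Hz).
    """
    K = len(h)
    rates = []
    for k in range(K):
        interference = h[k] * sum(p[j] for j in range(k + 1, K))
        sinr = h[k] * p[k] / (sigma2 * B + interference)
        rates.append(B * math.log2(1 + sinr))
    return rates


def min_download_time(d_gen, rates, Gamma):
    """Smallest T^down meeting (9) for every device: Gamma * max_k D_k^gen / R_k^down."""
    return Gamma * max(d / r for d, r in zip(d_gen, rates))


# Toy numbers (assumptions): noise power sigma2 * B = 1, h sorted ascending.
h, p, P = [1.0, 3.0], [2.0, 1.0], 3.0
assert sum(p) <= P  # BS power budget (10)
rates = downlink_noma_rates(h, p, B=1e6, sigma2=1e-6)  # approximately [1e6, 2e6] bit/s
t_down = min_download_time([100, 300], rates, Gamma=8e3)  # about 1.2 s
```

Here the stronger device receives more synthetic data yet does not determine $T^{\mathrm{down}}$ alone; the bottleneck is the device with the largest ratio $D_{k}^{\mathrm{gen}}/R_{k}^{\mathrm{down}}$.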

For global model broadcasting, the BS is assumed to transmit at its maximum power $P,$ and in order for all the devices to successfully receive the global model, the broadcasting rate is chosen as the minimum rate achievable for all the devices, i.e., $B\log_{2}\left(1+\frac{h_{1}P}{\sigma^{2}B}\right).$ Since the global model data has to be transmitted within the time $T^{\mathrm{br}},$ we have

T^{\mathrm{br}}B\log_{2}\left(1+\frac{h_{1}P}{\sigma^{2}B}\right)\geq D^{% \mathrm{mod}}.

(11)

For local model uploading, uplink NOMA is adopted. Specifically, the messages of all devices are transmitted to the BS simultaneously. Let $q_{k}$ denote the transmit power of the device $k,$ and $Q_{k}$ denote the maximum transmit power of the device $k.$ Then, we have

q_{k}\leq Q_{k},\forall k.

(12)

At the BS, the SIC is adopted to decode the messages of all the devices. Denote by $\pi_{k}$ the SIC decoding order of the device $k$ . Let $\boldsymbol{\pi}=\{\pi_{k},\forall k\},$ and it belongs to the set $\Pi$ of all possible SIC decoding orders of all $K$ messages. Thus, the achievable rate of the device $k$ for local model uploading is expressed as

R_{k}^{\mathrm{up}}=B\log_{2}\left(1+\frac{g_{k}q_{k}}{\sigma^{2}B+\sum_{j=1,\pi_{j}>\pi_{k}}g_{j}q_{j}}\right).

(13)

In order to upload the local model to the BS within the time $T^{\mathrm{up}},$ we have

T^{\mathrm{up}}R_{k}^{\mathrm{up}}\geq D^{\mathrm{mod}},\forall k.

(14)
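Analogously, this sketch computes the uplink NOMA rates (13) for a given SIC decoding order $\boldsymbol{\pi}$ (0-indexed here, with position 0 decoded first) and the smallest $T^{\mathrm{up}}$ meeting (14). The helper names and numbers are hypothetical. The example also illustrates a standard property of uplink SIC: the decoding order changes the individual rates but not their sum.

```python
import math


def uplink_noma_rates(g, q, order, B, sigma2):
    """Uplink NOMA rates R_k^up of Eq. (13), in bit/s.

    order[k] is the SIC decoding position pi_k of device k (0 = decoded first).
    Device k is interfered by every device j decoded after it
    (order[j] > order[k]); earlier-decoded signals are already cancelled.
    """
    K = len(g)
    rates = []
    for k in range(K):
        interference = sum(g[j] * q[j] for j in range(K) if order[j] > order[k])
        rates.append(B * math.log2(1 + g[k] * q[k] / (sigma2 * B + interference)))
    return rates


def min_upload_time(rates, d_mod):
    """Smallest T^up meeting (14) for every device: D^mod / min_k R_k^up."""
    return d_mod / min(rates)


g, q = [4.0, 3.0], [1.0, 1.0]  # assumed received powers g_k q_k = 4 and 3
rates_a = uplink_noma_rates(g, q, order=[0, 1], B=1e6, sigma2=1e-6)  # device 0 decoded first
rates_b = uplink_noma_rates(g, q, order=[1, 0], B=1e6, sigma2=1e-6)  # device 1 decoded first
t_up = min_upload_time(rates_a, d_mod=1e6)  # about 1.0 s
```

Because $T^{\mathrm{up}}$ is set by the slowest device, the decoding order matters even though the sum rate does not change, which is why $\boldsymbol{\pi}$ is an optimization variable.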

In addition, the energy consumption of the device $k$ for local model uploading per single-round training is given by

E_{k}^{\mathrm{up}}=T^{\mathrm{up}}q_{k}.

(15)

III NOMA+AIGC-Enhanced WFL

III-A Problem Formulation

Based on the system description in the previous section, the energy consumption of the device $k$ for a single-round training is written as

E_{k}=E_{k}^{\mathrm{loc}}+E_{k}^{\mathrm{up}}.

(16)

We assume that there is an energy budget for each device, i.e.,

E_{k}\leq E_{k}^{\mathrm{max}},\forall k,

(17)

where $E_{k}^{\mathrm{max}}$ is the maximum energy consumption of the device $k$ per single-round training.

Our goal is to maximize the learning performance under the constraints described in the system model. Specifically, the objective is to minimize the global learning error (i.e., to maximize the global model accuracy), and the optimization variables are the synthetic data allocation $\mathbf{D}^{\mathrm{gen}}$, the time allocation $\mathbf{T}=\{T^{\mathrm{down}},T^{\mathrm{br}},T^{\mathrm{loc}},T^{\mathrm{up}}\}$, the transmit power allocation of the BS $\mathbf{p}=\{p_{k},k=1,\ldots,K\}$, the transmit power of the devices $\mathbf{q}=\{q_{k},k=1,\ldots,K\}$, the SIC decoding order $\boldsymbol{\pi}$, and the computing frequency allocation $\mathbf{f}=\{f_{k},k=1,\ldots,K\}$. Mathematically, the optimization problem for the NOMA+AIGC-enhanced WFL is formulated as


$\displaystyle\min$	$\displaystyle\>\triangle(\mathbf{D}^{\mathrm{gen}})=e^{\frac{N\left(\frac{\alpha}{K}\sum_{k=1}^{K}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta}-\gamma-1\right)}{\zeta}}$	(18a)
$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\sum_{k=1}^{K}D_{k}^{\mathrm{gen}}\leq D^{\mathrm{gen}},$	(18b)
	$\displaystyle\>T^{\mathrm{syn}}+T^{\mathrm{down}}+N\left(T^{\mathrm{br}}+T^{\mathrm{loc}}+T^{\mathrm{up}}\right)\leq T^{\mathrm{max}},$	(18c)
	$\displaystyle\>f_{k}\leq f_{k}^{\mathrm{max}},\forall k,$	(18d)
	$\displaystyle\>T_{k}^{\mathrm{loc}}\leq T^{\mathrm{loc}},\forall k,$	(18e)
	$\displaystyle\>T^{\mathrm{down}}R_{k}^{\mathrm{down}}\geq\Gamma D_{k}^{\mathrm{gen}},\forall k,$	(18f)
	$\displaystyle\>\sum_{k=1}^{K}p_{k}\leq P,$	(18g)
	$\displaystyle\>T^{\mathrm{br}}B\log_{2}\left(1+\frac{h_{1}P}{\sigma^{2}B}\right)\geq D^{\mathrm{mod}},$	(18h)
	$\displaystyle\>q_{k}\leq Q_{k},\forall k,$	(18i)
	$\displaystyle\>T^{\mathrm{up}}R_{k}^{\mathrm{up}}\geq D^{\mathrm{mod}},\forall k,$	(18j)
	$\displaystyle\>E_{k}\leq E_{k}^{\mathrm{max}},\forall k,$	(18k)
	$\displaystyle\>\boldsymbol{\pi}\in\Pi,$	(18l)
	$\displaystyle\>D_{k}^{\mathrm{gen}}\geq 0,\forall k,$	(18m)
	$\displaystyle\>p_{k}\geq 0,\forall k,$	(18n)
	$\displaystyle\>q_{k}\geq 0,\forall k,$	(18o)
	$\displaystyle\>T^{\mathrm{down}}\geq 0,T^{\mathrm{br}}\geq 0,T^{\mathrm{loc}}\geq 0,T^{\mathrm{up}}\geq 0,$	(18p)
	$\displaystyle\>f_{k}\geq 0,\forall k,$	(18q)
$\displaystyle\mathrm{o.v.}$	$\displaystyle\>\mathbf{D}^{\mathrm{gen}},\mathbf{T},\mathbf{p},\mathbf{q},\boldsymbol{\pi},\mathbf{f},$	(18r)

where the abbreviations ‘$\mathrm{s.t.}$’ and ‘$\mathrm{o.v.}$’ stand for ‘subject to’ and ‘optimization variables’, respectively. The constraint (18b) ensures that the total amount of synthetic data delivered to the devices does not exceed the maximum amount that the AIGC server can generate. The constraint (18c) requires that the total latency, comprising data synthesis, synthetic data downloading, and $N$ WFL rounds, does not exceed the pre-defined maximum latency. The constraint (18d) limits the computing frequency of each device. The constraint (18e) requires each device to finish local model training within the required time. The constraint (18f) requires the synthetic data to be completely transmitted within the synthetic data transmission time. The constraint (18g) limits the transmit power of the BS. The constraint (18h) requires the global model to be completely broadcast within the global model broadcasting time. The constraint (18i) limits the transmit power of each device. The constraint (18j) requires each local model to be completely uploaded within the local model uploading time. The constraint (18k) limits the energy consumption of each device. The constraint (18l) restricts the SIC decoding order to the set of possible orders. The constraints (18m)-(18q) ensure that the optimization variables are non-negative.

III-B Proposed Solution Framework

Since $\boldsymbol{\pi}$ is discrete and the constraints (18f), (18j), (18k) are non-convex, the problem (18) is a mixed integer nonlinear programming (MINLP) problem, which is NP-hard in general, and its optimal solution is thus generally intractable to find. Before proposing an algorithm to solve the problem (18), we present analysis to equivalently simplify the problem (18).

Lemma 1.

The constraint (18e) is satisfied with strict equality by the optimal solution to the problem (18).

Proof:

See Appendix -A. ∎

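Lemma 1 indicates that each device should use up the allocated local model training time to save energy: since the training time (5) decreases with $f_{k}$ while the training energy (7) grows with $f_{k}^{2}$, each device should run at the lowest computing frequency that still meets the deadline $T^{\mathrm{loc}}$. From Lemma 1, the optimal $\mathbf{f}$ is derived as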

f_{k}=\frac{w\tau\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)}{T^{\mathrm{loc}}}.

(19)

By inserting (19) into the constraints (18d) and (18k), we have

	$\displaystyle\frac{w\tau\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)}{T^{\mathrm{loc}}}\leq f_{k}^{\mathrm{max}},\forall k,$		(20)
	$\displaystyle\frac{\varpi_{k}w^{3}\tau^{3}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{3}}{(T^{\mathrm{loc}})^{2}}+E_{k}^{\mathrm{up}}\leq E_{k}^{\mathrm{max}},\forall k.$		(21)
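Under Lemma 1, checking the per-device constraints reduces to evaluating (19), (20), and (21). The sketch below does this for one device with assumed parameter values (the function names are hypothetical); with these numbers the training energy term in (21) evaluates to about $5\times10^{-4}$ J.

```python
def optimal_frequency(w, tau, d_loc, d_gen, t_loc):
    """Computing frequency of Eq. (19): the lowest f_k meeting T_k^loc <= T^loc."""
    return w * tau * (d_loc + d_gen) / t_loc


def device_constraints_hold(w, tau, varpi, d_loc, d_gen, t_loc, f_max, e_up, e_max):
    """Check (20) and (21) for one device after substituting Eq. (19)."""
    f = optimal_frequency(w, tau, d_loc, d_gen, t_loc)
    # E_k^loc with f_k from (19): varpi * (w*tau*D)^3 / (T^loc)^2
    e_loc = varpi * (w * tau * (d_loc + d_gen)) ** 3 / t_loc ** 2
    return f <= f_max and e_loc + e_up <= e_max


# Illustrative device (assumed values): 500 samples, 5 ms training window.
f = optimal_frequency(1e4, 1, 200, 300, t_loc=5e-3)  # about 1e9 Hz
ok = device_constraints_hold(1e4, 1, 1e-28, 200, 300, 5e-3,
                             f_max=2e9, e_up=1e-4, e_max=1e-3)
too_slow = device_constraints_hold(1e4, 1, 1e-28, 200, 300, 5e-3,
                                   f_max=5e8, e_up=1e-4, e_max=1e-3)
```

A device fails (20) when its CPU is too slow for the window and fails (21) when its energy budget is too tight; both failures can be fixed by giving it fewer synthetic samples or a longer $T^{\mathrm{loc}}$.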

In addition, since $N$, $\alpha$, $K$, and $\zeta$ are positive and the exponential function is strictly increasing, (3) shows that the objective function $\triangle(\mathbf{D}^{\mathrm{gen}})$ is a monotonically increasing function of $\sum_{k=1}^{K}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta}$. Thus, minimizing $\triangle(\mathbf{D}^{\mathrm{gen}})$ is equivalent to minimizing $\sum_{k=1}^{K}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta}$.
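The monotonicity argument can be made explicit by rewriting (3) with the shorthands $S$ and $c$, which are introduced here only for this derivation and assume $D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}>0$:

```latex
\begin{aligned}
S(\mathbf{D}^{\mathrm{gen}}) &= \sum_{k=1}^{K}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta},
\qquad c=\frac{N\alpha}{K\zeta}>0,\\
\triangle(\mathbf{D}^{\mathrm{gen}}) &= \exp\!\left(c\,S(\mathbf{D}^{\mathrm{gen}})-\frac{N(\gamma+1)}{\zeta}\right),
\qquad \frac{\partial \triangle}{\partial S}=c\,\triangle>0,\\
\frac{\partial S}{\partial D_{k}^{\mathrm{gen}}} &= -\beta\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta-1}<0,
\qquad \frac{\partial^{2} S}{\partial \left(D_{k}^{\mathrm{gen}}\right)^{2}}=\beta(\beta+1)\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta-2}>0.
\end{aligned}
```

The positive second derivative also shows that the surrogate objective $S$ is convex in $\mathbf{D}^{\mathrm{gen}}$.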

Thanks to Lemma 1, the number of optimization variables is reduced. After the above equivalent problem transformation, the problem (18) is rewritten as


$\displaystyle\min$	$\displaystyle\>\sum_{k=1}^{K}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta}$	(22a)
$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\text{(18b)},\text{(18c)},\text{(18f)-(18j)},\text{(18l)-(18p)},\text{(20)},\text{(21)},$
$\displaystyle\mathrm{o.v.}$	$\displaystyle\>\mathbf{D}^{\mathrm{gen}},\mathbf{T},\mathbf{p},\mathbf{q},\boldsymbol{\pi}.$	(22b)

The problem (22) is still a MINLP problem, and its optimal solution is hard to obtain. Therefore, we develop an efficient low-complexity algorithm based on the BCD method to achieve a local optimal solution. The overall flowchart of solving the problem (18) is shown in Fig. 2. Specifically, the problem (22) is decoupled into two subproblems. One subproblem on the left-hand-side optimizes $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ with given $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ as shown in the left part of Fig. 2, and the other one on the right-hand-side optimizes $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ with given $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ as shown in the right part of Fig. 2. The two subproblems are solved iteratively until the objective function value in (22a) converges. We will optimally solve the two subproblems in the next two sections.
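The alternating structure of the BCD framework can be sketched generically as follows. This is not the paper's algorithm: the two block solvers are passed in as callables (hypothetical stand-ins for the subproblem solvers of Sections IV and V), and a toy two-variable quadratic replaces (22a) so that the loop is runnable. Because each block update is an exact minimization, the objective is non-increasing across iterations.

```python
def block_coordinate_descent(x, y, solve_x, solve_y, objective, tol=1e-9, max_iter=100):
    """Alternate exact minimization over two blocks until the objective stalls.

    solve_x(y) must return a minimizer over x for fixed y, and solve_y(x) a
    minimizer over y for fixed x. Stops when the objective changes by less
    than tol or after max_iter iterations.
    """
    cur = objective(x, y)
    for it in range(1, max_iter + 1):
        prev = cur
        x = solve_x(y)  # in the paper's setting: update (D^gen, T) with (p, q, pi) fixed
        y = solve_y(x)  # in the paper's setting: update (p, q, pi) with (D^gen, T) fixed
        cur = objective(x, y)
        if abs(prev - cur) < tol:
            return x, y, cur, it
    return x, y, cur, max_iter


def toy_objective(x, y):
    """Assumed toy objective f(x, y) = (x - 1)^2 + (y - 2)^2 + x*y."""
    return (x - 1) ** 2 + (y - 2) ** 2 + x * y


# Setting each partial derivative to zero gives the exact block minimizers.
x, y, val, iters = block_coordinate_descent(
    0.0, 0.0,
    solve_x=lambda y: 1 - y / 2,
    solve_y=lambda x: 2 - x / 2,
    objective=toy_objective, tol=1e-12)
# converges to about x = 0, y = 2 with objective value 1
```

In general, BCD guarantees only a non-increasing objective, not global optimality, which is consistent with the local optimality claim for the overall algorithm.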

IV Efficient Algorithm for Optimally Optimizing $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$

In this section, the subproblem of optimizing $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ with given $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ (the left part of Fig. 2) is formulated as


$\displaystyle\min$	$\displaystyle\>\sum_{k=1}^{K}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta}$	(23a)
$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\text{(18b)},\text{(18c)},\text{(18f)},\text{(18h)},\text{(18j)},\text{(18m)},\text{(18p)},\text{(20)},\text{(21)},$
$\displaystyle\mathrm{o.v.}$	$\displaystyle\>\mathbf{D}^{\mathrm{gen}},\mathbf{T}.$	(23b)

In what follows, we present important properties of the problem (23).

Lemma 2.

The optimal solution to the problem (23) satisfies the constraint (18c) with strict equality.

Proof:

See Appendix -B. ∎

Lemma 2 indicates that the whole WFL procedure should use up the maximum latency $T^{\mathrm{max}}$, since any remaining slack could be allotted to local training to reduce the energy consumption of the devices. From Lemma 2, we get

T^{\mathrm{syn}}+T^{\mathrm{down}}+N\left(T^{\mathrm{br}}+T^{\mathrm{loc}}+T^{\mathrm{up}}\right)=T^{\mathrm{max}}.

(24)

Lemma 3.

The optimal solution to the problem (23) satisfies the constraint (18f) with strict equality for the device $j=\arg\max_{k}\frac{D_{k}^{\mathrm{gen}}}{R_{k}^{\mathrm{down}}}$.

Proof:

See Appendix -C. ∎

Lemma 3 indicates that the synthetic data transmission time is determined by the device with the maximum $\frac{D_{k}^{\mathrm{gen}}}{R_{k}^{\mathrm{down}}}.$

Lemma 4.

The optimal solution to the problem (23) satisfies the constraint (18h) with strict equality.

Proof:

It can be proved similarly to Lemma 3 and is thus omitted here for brevity. ∎

Lemma 4 indicates that no more time than necessary should be allocated to global model broadcasting, so that the remaining time can be used for local training to save the energy consumption of the devices.

Lemma 5.

The optimal solution to the problem (23) satisfies the constraint (18j) with strict equality for the device $j=\arg\min_{k}R_{k}^{\mathrm{up}}$.

Proof:

It can be proved similarly to Lemma 3 and is thus omitted here for brevity. ∎

Lemma 5 indicates that only the device with the minimum local model uploading rate uses up the allocated local model uploading time; the other devices, which have higher uploading rates, finish early and then wait until the uploading period ends.

Theorem 1.

The optimal $\mathbf{T}$ for the problem (23) given $\mathbf{D}^{\mathrm{gen}}$ is given by

$$\begin{aligned}
T^{\mathrm{down}}&=\Gamma\max_{k}\frac{D_{k}^{\mathrm{gen}}}{R_{k}^{\mathrm{down}}}, && (25)\\
T^{\mathrm{br}}&=\frac{D^{\mathrm{mod}}}{B\log_{2}\left(1+\frac{h_{1}P}{\sigma^{2}B}\right)}, && (26)\\
T^{\mathrm{up}}&=\frac{D^{\mathrm{mod}}}{\min_{k}R_{k}^{\mathrm{up}}}, && (27)\\
T^{\mathrm{loc}}&=\check{T}^{\mathrm{loc}}-\frac{\varrho\sum_{k=1}^{K}D_{k}^{\mathrm{gen}}}{N}-\frac{\Gamma}{N}\max_{k}\frac{D_{k}^{\mathrm{gen}}}{R_{k}^{\mathrm{down}}}, && (28)
\end{aligned}$$

where

$$\check{T}^{\mathrm{loc}}=\frac{T^{\mathrm{max}}}{N}-\frac{D^{\mathrm{mod}}}{\min_{k}R_{k}^{\mathrm{up}}}-\frac{D^{\mathrm{mod}}}{B\log_{2}\left(1+\frac{h_{1}P}{\sigma^{2}B}\right)}.\tag{29}$$

Proof:

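Expressions (25), (26), and (27) follow directly from the equalities in Lemmas 3, 4, and 5, respectively, and (28) follows by substituting them, together with the synthesis time $T^{\mathrm{syn}}=\varrho\sum_{k=1}^{K}D_{k}^{\mathrm{gen}}$, into (24). The remaining details are omitted here for brevity. ∎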
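As a minimal sketch of Theorem 1, the closed-form time allocation can be evaluated directly for a given $\mathbf{D}^{\mathrm{gen}}$. It assumes consistent SI units, that $\sigma^2$ is the noise power spectral density in W/Hz, that $h_1$ is the channel gain appearing in (26), and that $T^{\mathrm{syn}}=\varrho\sum_k D_k^{\mathrm{gen}}$, as implied by comparing (28) with (24); the function name `theorem1_times` and its argument names are hypothetical.

```python
import math


def theorem1_times(D_gen, R_down, R_up, *, Gamma, D_mod, B, h1, P,
                   sigma2, N, T_max, rho):
    """Closed-form time allocation (25)-(29) of Theorem 1 for a given D^gen.

    D_gen, R_down, R_up are per-device lists; rates are in bit/s and
    sigma2 is the noise power spectral density (W/Hz).
    """
    # (25): the slowest synthetic-data download determines T^down (Lemma 3)
    T_down = Gamma * max(d / r for d, r in zip(D_gen, R_down))
    # (26): broadcast time at the rate supported by channel gain h1
    T_br = D_mod / (B * math.log2(1 + h1 * P / (sigma2 * B)))
    # (27): the minimum uploading rate determines T^up (Lemma 5)
    T_up = D_mod / min(R_up)
    # (29): auxiliary term
    T_loc_check = T_max / N - T_up - T_br
    # (28): the remaining time per round is given to local training
    T_loc = T_loc_check - rho * sum(D_gen) / N - T_down / N
    return T_down, T_br, T_up, T_loc
```

By construction, $T^{\mathrm{syn}}+T^{\mathrm{down}}+N(T^{\mathrm{br}}+T^{\mathrm{loc}}+T^{\mathrm{up}})=T^{\mathrm{max}}$ holds for the returned values, which is a quick consistency check against (24); a negative `T_loc` would indicate that the chosen $\mathbf{D}^{\mathrm{gen}}$ leaves no time for local training.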

Using Theorem 1, the problem (23) is simplified as


$$\begin{aligned}
\min_{\mathbf{D}^{\mathrm{gen}}}\quad & \sum_{k=1}^{K}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{-\beta} && (30a)\\
\mathrm{s.t.}\quad & D_{k}^{\mathrm{gen}}+\frac{\Gamma f_{k}^{\mathrm{max}}}{Nw\tau}\max_{j}\frac{D_{j}^{\mathrm{gen}}}{R_{j}^{\mathrm{down}}}+\frac{f_{k}^{\mathrm{max}}\varrho\sum_{i=1}^{K}D_{i}^{\mathrm{gen}}}{Nw\tau}\leq\frac{f_{k}^{\mathrm{max}}\check{T}^{\mathrm{loc}}}{w\tau}-D_{k}^{\mathrm{loc}},\ \forall k, && (30b)\\
& \sqrt{\frac{\varpi_{k}w^{3}\tau^{3}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{3}}{E_{k}^{\mathrm{max}}-E_{k}^{\mathrm{up}}}}+\frac{\Gamma}{N}\max_{j}\frac{D_{j}^{\mathrm{gen}}}{R_{j}^{\mathrm{down}}}+\frac{\varrho\sum_{i=1}^{K}D_{i}^{\mathrm{gen}}}{N}\leq\check{T}^{\mathrm{loc}},\ \forall k, && (30c)\\
& \eqref{eq:p1-c1},\eqref{eq:p1-c13}. &&
\end{aligned}$$

It can be verified that the objective function in (30a) is convex in $\mathbf{D}^{\mathrm{gen}}$ (each term $x^{-\beta}$ is convex for $x>0$ and $\beta>0$), and that all the constraints are convex or linear in $\mathbf{D}^{\mathrm{gen}}$ (a pointwise maximum of linear functions is convex, and so is $x^{3/2}$ for $x\geq 0$). Thus, the problem (30) is a convex optimization problem and can be optimally solved via CVX [28].
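Before handing (30) to a solver, it can help to check a candidate $\mathbf{D}^{\mathrm{gen}}$ against (30b) and (30c). The following sketch evaluates the objective (30a) and those two constraint families; the function name `evaluate_p30` and its argument layout are hypothetical, $E_k^{\mathrm{max}}>E_k^{\mathrm{up}}$ is assumed, and the remaining constraints listed in (30) are not checked here.

```python
import math


def evaluate_p30(D_loc, D_gen, R_down, f_max, varpi, E_max, E_up, *,
                 w, tau, Gamma, N, rho, T_check, beta):
    """Objective (30a) and feasibility w.r.t. (30b)-(30c) for a candidate D^gen.

    Per-device lists: D_loc, D_gen, R_down, f_max, varpi, E_max, E_up.
    T_check is the auxiliary time (29); E_max[k] > E_up[k] is assumed.
    """
    K = len(D_loc)
    objective = sum((D_loc[k] + D_gen[k]) ** (-beta) for k in range(K))
    # Per-round shares of synthetic-data downloading and synthesis time
    t_down = Gamma / N * max(D_gen[j] / R_down[j] for j in range(K))
    t_syn = rho * sum(D_gen) / N
    feasible = True
    for k in range(K):
        D_k = D_loc[k] + D_gen[k]
        # (30b): local training must fit in T^loc at the maximum CPU frequency
        lhs_b = D_gen[k] + f_max[k] / (w * tau) * (t_down + t_syn)
        rhs_b = f_max[k] * T_check / (w * tau) - D_loc[k]
        # (30c): T^loc must be long enough for the remaining energy budget
        t_min = math.sqrt(varpi[k] * (w * tau * D_k) ** 3 / (E_max[k] - E_up[k]))
        if lhs_b > rhs_b or t_min + t_down + t_syn > T_check:
            feasible = False
    return objective, feasible
```

Such a checker is also a convenient way to sanity-check the output of a convex solver on small instances.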

The proposed algorithm for optimally solving for $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ is summarized in Algorithm 1. Since solving the problem (30) has complexity $\mathcal{O}(K^{3})$ [29] and evaluating the closed-form expressions (25)-(28) costs only $\mathcal{O}(K)$, the complexity of Algorithm 1 is $\mathcal{O}(K^{3}).$

Algorithm 1 Proposed algorithm to optimally optimize $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ based on the convex optimization.

1: Solve the problem (30) using CVX to obtain $\mathbf{D}^{\mathrm{gen}}$.

2: Obtain the closed-form expressions for $T^{\mathrm{down}}$, $T^{\mathrm{br}}$, $T^{\mathrm{up}}$, and $T^{\mathrm{loc}}$ from (25), (26), (27), and (28), respectively.

V Efficient Algorithm for Optimally Optimizing $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$

In this section, the subproblem of optimizing $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ for given $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ (the right part of Fig. 2) is investigated. It can be shown that the constraints involving $\mathbf{p}$ and those involving $\mathbf{q},\boldsymbol{\pi}$ are disjoint, i.e., no constraint couples $\mathbf{p}$ with $\mathbf{q},\boldsymbol{\pi}$. Therefore, the subproblem of optimizing $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ can be decoupled into two problems, given by

	$\displaystyle\mathrm{Find}$	$\displaystyle\>\mathbf{p}$		(31)
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\eqref{eq:p1-c6},\eqref{eq:p1-c7},\eqref{eq:p1-c14},$

and

	$\displaystyle\mathrm{Find}$	$\displaystyle\>\mathbf{q},\boldsymbol{\pi}$		(32)
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\eqref{eq:p1-c9},\eqref{eq:p1-c10},\eqref{eq:p1-c12},\eqref{eq:p1-c15},\eqref{eq:new-p1-c11}.$

Both the problems in (31) and (32) are feasibility problems, since the objective function in the original problem (22) does not depend on the optimization variables $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$. However, to find feasible solutions that are more favorable for the subproblem (23), i.e., that allow it to attain a lower objective function value, the objective functions in the problems (31) and (32) are modified respectively as

	$\displaystyle\max_{\mathbf{p}}$	$\displaystyle\>\min_{k}\frac{R_{k}^{\mathrm{down}}}{D_{k}^{\mathrm{gen}}}$		(33)
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\eqref{eq:p1-c6},\eqref{eq:p1-c7},\eqref{eq:p1-c14},$

and

	$\displaystyle\max_{\mathbf{q},\boldsymbol{\pi}}$	$\displaystyle\>\min_{k}R_{k}^{\mathrm{up}}$		(34)
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\eqref{eq:p1-c9},\eqref{eq:p1-c10},\eqref{eq:p1-c12},\eqref{eq:p1-c15},\eqref{eq:new-p1-c11}.$

By selecting the objective function in (33), a larger $\min_{k}R_{k}^{\mathrm{down}}/D_{k}^{\mathrm{gen}}$ shortens the synthetic data downloading time required by (18f), which relaxes the constraint (18f) in the subproblem (23) and can lead to a lower objective function value. Similarly, by selecting the objective function in (34), a larger $\min_{k}R_{k}^{\mathrm{up}}$ relaxes the constraint (18j) in the subproblem (23), which can also lead to a lower objective function value. In what follows, we first solve the problem (33), and then solve the problem (34).

V-A Proposed algorithm for optimally optimizing $\mathbf{p}$

By introducing an auxiliary variable $\eta=\min_{k}\frac{R_{k}^{\mathrm{down}}}{D_{k}^{\mathrm{gen}}}$ , the problem (33) can be reformulated as


$\displaystyle\max_{\mathbf{p},\eta}$	$\displaystyle\>\eta$	(35a)
$\displaystyle\mathrm{s.t.}$	$\displaystyle\>R_{k}^{\mathrm{down}}\geq\eta D_{k}^{\mathrm{gen}},\forall k,$	(35b)
	$\displaystyle\>\eqref{eq:p1-c6},\eqref{eq:p1-c7},\eqref{eq:p1-c14}.$

Note that the constraint (18f) is equivalent to $\eta\geq\frac{\Gamma}{T^{\mathrm{down}}}$; since (35) maximizes $\eta$, this constraint is inactive, and is therefore omitted, as long as the problem is feasible. Next, we present an important property of the optimal solution to the problem (35).

Lemma 6.

The optimal solution to the problem (35) satisfies the constraint (35b) with strict equality.

Proof:

See Appendix D. ∎

From Lemma 6, the optimal $\mathbf{p}$ given $\eta$ satisfies the following equalities

$$B\log_{2}\left(1+\frac{h_{k}p_{k}}{\sigma^{2}B+h_{k}\sum_{j>k}p_{j}}\right)=\eta D_{k}^{\mathrm{gen}},\ \forall k.\tag{36}$$

Theorem 2.

The optimal $p_{k},k=1,\ldots,K$ for the problem (35) given $\eta$ can be sequentially determined from $p_{K}$ to $p_{1}$ according to

$$p_{k}=\left(2^{\frac{\eta D_{k}^{\mathrm{gen}}}{B}}-1\right)\left(\frac{\sigma^{2}B}{h_{k}}+\sum_{j>k}p_{j}\right),\ \forall k.\tag{37}$$

Proof:

This theorem is a direct result of Lemma 6. Solving (36) for $p_{k}$ yields (37), in which $p_{k}$ depends only on $p_{j},j>k$. This completes the proof. ∎
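The backward recursion of Theorem 2 can be sketched as follows, assuming the lists are stored in the paper's device order $1,\ldots,K$ (index 0 corresponds to device 1) and that $\sigma^2$ is a noise power spectral density; the function name `downlink_powers` is hypothetical. Keeping a running sum of $p_j$, $j>k$, makes one pass over all devices cost $\mathcal{O}(K)$.

```python
def downlink_powers(eta, D_gen, h, *, B, sigma2):
    """BS transmit powers p_1..p_K from (37) for a given eta.

    Lists are in the paper's device order (index 0 is device 1); device k
    is interfered by devices j > k, as in (36). sigma2 is the noise PSD.
    """
    K = len(h)
    p = [0.0] * K
    tail = 0.0  # running sum of p_j over j > k
    for k in range(K - 1, -1, -1):  # from p_K down to p_1
        p[k] = (2 ** (eta * D_gen[k] / B) - 1) * (sigma2 * B / h[k] + tail)
        tail += p[k]
    return p
```

Substituting the returned powers back into (36) reproduces $\eta D_k^{\mathrm{gen}}$ for every device, which is exactly the equality stated in Lemma 6.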

After obtaining the optimal $\mathbf{p}$ for a given $\eta$, what remains is optimizing $\eta$ in the problem (35). From (37), $p_{k}$ is an increasing function of $\eta$ for every $k$. Therefore, if $\eta$ exceeds its optimal value, the constraint (18g) will be violated. Accordingly, the optimal $\eta$ for the problem (35) can be found by a bisection search over $\eta$, where $\eta$ is increased if the constraint (18g) is satisfied for the current $\eta$, and decreased otherwise.

The proposed algorithm to optimize $\mathbf{p}$ is summarized in Algorithm 2. Since the bisection search converges in a finite number of iterations that is independent of the number of devices [30], and each iteration evaluates (37) for all devices with $\mathcal{O}(K)$ operations when a running sum of $p_{j}$ is maintained, the complexity of Algorithm 2 is merely linear in the number of devices, i.e., $\mathcal{O}(K).$

Algorithm 2 Proposed algorithm to obtain the optimal $\mathbf{p}$

1: Initialize $\eta_{min}$ and $\eta_{max}$.

2: repeat

3: $\eta=\frac{\eta_{min}+\eta_{max}}{2}$.

4: for $k=K$ down to $1$ do

5: Obtain $p_{k}$ from (37).

6: end for

7: if the constraint (18g) is obeyed then

8: $\eta_{min}=\eta$.

9: else

10: $\eta_{max}=\eta$.

11: end if

12: until $\eta$ converges.
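A minimal sketch of Algorithm 2 follows. Constraint (18g) is not restated in this section, so the code assumes it is the BS sum-power constraint $\sum_{k}p_{k}\leq P$, consistent with $P$ being the maximum transmit power of the BS. The function name, the tolerance, the initial $\eta_{min}=0$, and treating a floating-point overflow of $2^{\eta D_k^{\mathrm{gen}}/B}$ as a violation of (18g) are illustrative choices.

```python
def max_eta_bisection(D_gen, h, *, B, sigma2, P, eta_max, tol=1e-9):
    """Algorithm 2 sketch: bisection on eta, assuming (18g) is sum(p) <= P.

    eta_max should be large enough that (18g) is violated there.
    Returns (eta, p) with p computed from (37) at the final eta.
    """
    K = len(h)

    def powers(eta):
        # (37), evaluated from p_K down to p_1 with a running sum of p_j, j > k
        p, tail = [0.0] * K, 0.0
        for k in range(K - 1, -1, -1):
            p[k] = (2 ** (eta * D_gen[k] / B) - 1) * (sigma2 * B / h[k] + tail)
            tail += p[k]
        return p

    def obeys_power_budget(eta):
        try:
            return sum(powers(eta)) <= P
        except OverflowError:  # 2 ** x out of range: budget certainly exceeded
            return False

    eta_min = 0.0  # eta = 0 needs zero power, so it is always feasible
    while eta_max - eta_min > tol * max(1.0, eta_max):
        eta = (eta_min + eta_max) / 2
        if obeys_power_budget(eta):
            eta_min = eta  # (18g) obeyed: try a larger eta
        else:
            eta_max = eta  # (18g) violated: shrink eta
    return eta_min, powers(eta_min)
```

Because the total power is continuous and increasing in $\eta$, the returned powers use almost the entire budget $P$ once the bisection interval is small.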

V-B Proposed algorithm for optimally optimizing $\mathbf{q},\boldsymbol{\pi}$

By combining the constraints (18i), (18o), and (21), it follows that

0\leq q_{k}\leq q_{k}^{\mathrm{max}},\forall k,

(38)

where

$$q_{k}^{\mathrm{max}}=\min\left(Q_{k},\frac{E_{k}^{\mathrm{max}}}{T^{\mathrm{up}}}-\frac{\varpi_{k}w^{3}\tau^{3}\left(D_{k}^{\mathrm{loc}}+D_{k}^{\mathrm{gen}}\right)^{3}}{(T^{\mathrm{loc}})^{2}T^{\mathrm{up}}}\right).\tag{39}$$
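For illustration, (39) can be evaluated per device as below. This is a sketch with a hypothetical function name, assuming inputs in watts, joules, seconds, samples, and cycles per sample; the computation energy $\varpi_k w^3\tau^3(D_k^{\mathrm{loc}}+D_k^{\mathrm{gen}})^3/(T^{\mathrm{loc}})^2$ is read off from (39). A negative cap would mean that $E_k^{\mathrm{max}}$ cannot even cover local computation, so (38) has no solution.

```python
def uplink_power_caps(Q, E_max, varpi, D_loc, D_gen, *, w, tau, T_loc, T_up):
    """Per-device power caps q_k^max from (39).

    Q, E_max, varpi, D_loc, D_gen are per-device lists (W, J, -, samples, samples).
    """
    caps = []
    for Q_k, E_k, varpi_k, d_loc, d_gen in zip(Q, E_max, varpi, D_loc, D_gen):
        # energy spent on local computation: varpi_k w^3 tau^3 D^3 / (T^loc)^2
        e_comp = varpi_k * (w * tau * (d_loc + d_gen)) ** 3 / T_loc ** 2
        # leftover energy spread over the uploading time, capped by Q_k
        caps.append(min(Q_k, (E_k - e_comp) / T_up))
    return caps
```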

Furthermore, given the objective function formulated in (34), the constraint (18j) is redundant as long as the problem (34) is feasible. Thus, the problem (34) is rewritten as

	$\displaystyle\max_{\mathbf{q},\boldsymbol{\pi}}$	$\displaystyle\>\min_{k}R_{k}^{\mathrm{up}}$		(40)
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\eqref{eq:p1-c12},\eqref{eq:p15-c1}.$

By defining $\mathbf{q}^{{}^{\prime}}=\{q_{k}^{{}^{\prime}},k=1,\ldots,K\}$ , where $q_{k}^{{}^{\prime}}=\frac{q_{k}}{q_{k}^{\mathrm{max}}}$ , the problem (40) can be reformulated as


$\displaystyle\max_{\mathbf{q}^{{}^{\prime}},\boldsymbol{\pi}}$	$\displaystyle\>\min_{k}R_{k}^{\mathrm{up}}$	(41a)
$\displaystyle\mathrm{s.t.}$	$\displaystyle\>0\leq q_{k}^{{}^{\prime}}\leq 1,\forall k,$	(41b)
	$\displaystyle\>\eqref{eq:p1-c12},$

where $R_{k}^{\mathrm{up}}$ is rewritten as

$$R_{k}^{\mathrm{up}}=B\log_{2}\left(1+\frac{g_{k}q_{k}^{\mathrm{max}}q_{k}^{{}^{\prime}}}{\sigma^{2}B+\sum_{j=1,\pi_{j}>\pi_{k}}g_{j}q_{j}^{\mathrm{max}}q_{j}^{{}^{\prime}}}\right).\tag{42}$$

By introducing an auxiliary variable $\theta=\min_{k}R_{k}^{\mathrm{up}}$ , the problem (41) can be reformulated as


$\displaystyle\max_{\mathbf{q}^{{}^{\prime}},\boldsymbol{\pi},\theta}$	$\displaystyle\>\theta$	(43a)
$\displaystyle\mathrm{s.t.}$	$\displaystyle\>R_{k}^{\mathrm{up}}\geq\theta,\forall k,$	(43b)
	$\displaystyle\>\eqref{eq:p1-c12},\eqref{eq:p16-c1}.$

Due to the discrete constraint (18l), it is intractable to find the optimal solution to the problem (43) by standard optimization methods. In what follows, we derive the optimal SIC decoding order $\boldsymbol{\pi}$ by exploiting the problem structure.

Theorem 3.

The optimal $\boldsymbol{\pi}$ for the problem (43) is in the descending order of $g_{k}q_{k}^{\mathrm{max}}.$

Proof:

See Appendix E. ∎

From Theorem 3, the optimal decoding order $\boldsymbol{\pi}$ for the problem (43) can be obtained, and the problem (43) is simplified as

	$\displaystyle\max_{\mathbf{q}^{{}^{\prime}},\theta}$	$\displaystyle\>\theta$		(44)
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\eqref{eq:p16-c1},\eqref{eq:p17-c1}.$

For a given $\theta$, the above problem is feasible if and only if $\theta$ does not exceed its optimal value. Hence, the optimal $\theta$ can be obtained by a simple bisection search over $\theta$, where in each search step the following problem of finding $\mathbf{q}^{{}^{\prime}}$ for the given $\theta$ is solved:

	find	$\displaystyle\>\mathbf{q}^{{}^{\prime}}$		(45)
	$\displaystyle\mathrm{s.t.}$	$\displaystyle\>\eqref{eq:p16-c1},\eqref{eq:p17-c1}.$

Lemma 7.

A feasible solution to the problem (45) satisfies the constraint (43b) with strict equality.

Proof:

See Appendix F. ∎

From Lemma 7, the feasible solution to the problem (45) satisfies the following equalities

$$B\log_{2}\left(1+\frac{g_{k}q_{k}^{\mathrm{max}}q_{k}^{{}^{\prime}}}{\sigma^{2}B+\sum_{j=1,\pi_{j}>\pi_{k}}g_{j}q_{j}^{\mathrm{max}}q_{j}^{{}^{\prime}}}\right)=\theta,\ \forall k.\tag{46}$$

Theorem 4.

The feasible $q_{k}^{{}^{\prime}},k=1,\ldots,K$ for the problem (45) can be recursively obtained from the last decoding device to the first decoding device according to

$$q_{k}^{{}^{\prime}}=\frac{2^{\frac{\theta}{B}}-1}{g_{k}q_{k}^{\mathrm{max}}}\left(\sigma^{2}B+\sum_{j=1,\pi_{j}>\pi_{k}}g_{j}q_{j}^{\mathrm{max}}q_{j}^{{}^{\prime}}\right),\ \forall k.\tag{47}$$

Proof:

This theorem is a direct result of Lemma 7. Solving (46) for $q_{k}^{{}^{\prime}}$ yields (47), which shows that $q_{k}^{{}^{\prime}}$ depends only on $q_{j}^{{}^{\prime}},\pi_{j}>\pi_{k}$. This completes the proof. ∎
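Theorem 3 and the recursion (47) can be combined in a short sketch. It assumes $\sigma^2$ is a noise power spectral density and that, as in (42), the device decoded last sees only noise; the function name `uplink_normalized_powers` is hypothetical. The returned `order` lists devices from the first to the last decoded.

```python
def uplink_normalized_powers(theta, g, q_max, *, B, sigma2):
    """Decoding order from Theorem 3 and q' from the recursion (47).

    Returns (order, q_prime): order[0] is decoded first; devices decoded
    later (larger pi) interfere with earlier ones, as in (42).
    """
    K = len(g)
    # Theorem 3: decode in descending order of g_k * q_k^max
    order = sorted(range(K), key=lambda k: g[k] * q_max[k], reverse=True)
    sinr_target = 2 ** (theta / B) - 1
    q_prime = [0.0] * K
    tail = 0.0  # sum of g_j q_j^max q'_j over devices decoded after the current one
    for k in reversed(order):  # from the last decoded device to the first
        q_prime[k] = sinr_target / (g[k] * q_max[k]) * (sigma2 * B + tail)
        tail += g[k] * q_max[k] * q_prime[k]
    return order, q_prime
```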

After $\mathbf{q}^{{}^{\prime}}$ has been obtained from (47), we check whether it satisfies the constraint (41b). Since (47) always yields $q_{k}^{{}^{\prime}}\geq 0$, only the upper bound $q_{k}^{{}^{\prime}}\leq 1$ needs to be checked: the problem (45) with the given $\theta$ is infeasible if $q_{k}^{{}^{\prime}}>1$ for some $k$, and is feasible otherwise.

The proposed algorithm for optimally solving for $\mathbf{q},\boldsymbol{\pi}$ is summarized in Algorithm 3. Since the optimal decoding order is derived in closed form and the number of bisection iterations does not depend on the number of devices [30], the complexity of Algorithm 3 is only linear in the number of devices, i.e., $\mathcal{O}(K)$, apart from the one-time sorting of $g_{k}q_{k}^{\mathrm{max}}$ that determines $\boldsymbol{\pi}$, which takes $\mathcal{O}(K\log K)$.

Algorithm 3 Proposed algorithm to obtain the optimal $\mathbf{q},\boldsymbol{\pi}$

1: Obtain the optimal decoding order $\boldsymbol{\pi}$ in the descending order of $g_{k}q_{k}^{\mathrm{max}}$.

2: repeat

3: $\theta=\frac{\theta_{min}+\theta_{max}}{2}$.

4: Obtain $\mathbf{q}^{{}^{\prime}}$ recursively from the last decoding device to the first decoding device from (47).

5: if the constraint (41b) is obeyed then

6: $\theta_{min}=\theta$.

7: else

8: $\theta_{max}=\theta$.

9: end if

10: until $\theta$ converges.

11: Obtain $q_{k}=q_{k}^{\mathrm{max}}q_{k}^{{}^{\prime}},k=1,\ldots,K$.
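A compact sketch of Algorithm 3 follows. The initial upper bound `theta_max` is an assumed input that must make (45) infeasible while keeping $2^{\theta/B}$ within floating-point range, $\theta_{min}=0$ is used as the initial lower bound, and the function name is hypothetical.

```python
def max_min_uplink_rate(g, q_max, *, B, sigma2, theta_max, tol=1e-9):
    """Algorithm 3 sketch: returns (theta, order, q) with q_k = q_k^max q'_k.

    theta_max is an assumed initial upper bound at which (45) is infeasible.
    """
    K = len(g)
    # Step 1 (Theorem 3): decode in descending order of g_k * q_k^max
    order = sorted(range(K), key=lambda k: g[k] * q_max[k], reverse=True)

    def q_prime(theta):
        # Step 4: recursion (47) from the last decoded device to the first
        qp, tail, target = [0.0] * K, 0.0, 2 ** (theta / B) - 1
        for k in reversed(order):
            qp[k] = target / (g[k] * q_max[k]) * (sigma2 * B + tail)
            tail += g[k] * q_max[k] * qp[k]
        return qp

    theta_min = 0.0  # theta = 0 gives q' = 0, which satisfies (41b)
    while theta_max - theta_min > tol * max(1.0, theta_max):
        theta = (theta_min + theta_max) / 2
        if max(q_prime(theta)) <= 1.0:  # Step 5: (41b); q'_k >= 0 holds automatically
            theta_min = theta
        else:
            theta_max = theta
    qp = q_prime(theta_min)
    # Step 11: recover the actual transmit powers
    return theta_min, order, [q_max[k] * qp[k] for k in range(K)]
```

At the returned $\theta$, every device attains the same rate, and the binding device transmits at (almost) its cap $q_k^{\mathrm{max}}$.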

V-C Convergence and Complexity Analysis of the Overall Proposed Algorithm

Algorithm 4 Overall proposed algorithm for solving the problem (18) based on the BCD method.

1: Initialize $\mathbf{D}^{\mathrm{gen}},\mathbf{T},\mathbf{p},\mathbf{q},\boldsymbol{\pi}$.

2: repeat

3: Optimize $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ with given $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ using Algorithm 1.

4: Optimize $\mathbf{p}$ with given $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ using Algorithm 2.

5: Optimize $\mathbf{q},\boldsymbol{\pi}$ with given $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ using Algorithm 3.

6: until the objective function value in (22a) converges.

7: Obtain $\mathbf{f}$ from (19).

The overall proposed algorithm for solving the problem (18) is summarized in Algorithm 4. In this subsection, we provide convergence and complexity analysis of the overall proposed algorithm.

The convergence of the proposed algorithm depends on the BCD method used for iteratively solving the problem (22), in which Algorithm 1 for optimizing $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ and Algorithms 2 and 3 for optimizing $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ are performed alternately until convergence. The following proposition presents the convergence analysis of the adopted BCD method.

Proposition 1.

The BCD method in Algorithm 4 used for iteratively solving the problem (22) converges to a locally optimal solution of the problem (22).

Proof:

See Appendix G. ∎

The above proposition establishes the convergence of the proposed algorithm to a locally optimal solution. We next analyze its complexity. The BCD method converges in a finite number of iterations that is independent of the number of devices. Thus, based on the complexity analysis in Section IV, Section V-A, and Section V-B, the total complexity of Algorithm 4 is $\mathcal{O}(K^{3}+2K)$, i.e., $\mathcal{O}(K^{3})$, which is only polynomial in the number of devices.
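The alternating structure of Algorithm 4 can be illustrated with a generic block coordinate descent loop. This is a sketch, not the paper's implementation: the function name and the dictionary-of-blocks interface are hypothetical, and the monotone decrease of `history` relies on each block update not increasing the objective, which holds for the exact block minimizations in the toy example below.

```python
def block_coordinate_descent(x, block_updates, objective, *, tol=1e-9, max_iter=100):
    """Alternately update variable blocks until the objective converges.

    x: dict mapping block names to values (updated in place).
    block_updates: list of (name, update) pairs; update(x) returns a new
    value for block `name` with all other blocks held fixed.
    """
    history = [objective(x)]
    for _ in range(max_iter):
        for name, update in block_updates:
            x[name] = update(x)  # each update sees the blocks refreshed before it
        history.append(objective(x))
        # stop when the relative change of the objective is small (cf. step 6)
        if abs(history[-2] - history[-1]) <= tol * max(1.0, abs(history[-2])):
            break
    return x, history


def toy_objective(x):
    # Convex toy objective (its Hessian [[2, 1], [1, 2]] is positive definite)
    return (x["a"] - 1) ** 2 + (x["b"] - 2) ** 2 + x["a"] * x["b"]


# Exact minimizers of toy_objective over one block with the other block fixed
toy_updates = [
    ("a", lambda x: 1 - x["b"] / 2),
    ("b", lambda x: 2 - x["a"] / 2),
]

solution, history = block_coordinate_descent({"a": 5.0, "b": -3.0}, toy_updates, toy_objective)
# solution approaches the unique minimizer (a, b) = (0, 2)
```

In Algorithm 4 the blocks would be $(\mathbf{D}^{\mathrm{gen}},\mathbf{T})$, $\mathbf{p}$, and $(\mathbf{q},\boldsymbol{\pi})$, updated by Algorithms 1, 2, and 3, respectively.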

VI Simulation Results

This section provides illustrative simulation results to demonstrate the effectiveness of the proposed NOMA+AIGC-enhanced WFL scheme. Unless otherwise noted, the simulation parameters are set as follows. The setting of the AIGC server is similar to [26]: the AIGC server is equipped with eight RTX A5000 GPUs, and synthesizing one data sample requires approximately $0.0646$ s, i.e., $\varrho=0.0646$ s. The number of devices is $K=15$, and the devices are randomly distributed around the BS at distances within $[150,300]$ m. The wireless channels are assumed to follow Rayleigh fading, where the channel gains $h_{k}$ and $g_{k}$ are modeled as $h_{k}=\hat{h}_{k}\tilde{h}_{k}$ and $g_{k}=\hat{g}_{k}\tilde{g}_{k}$, respectively. Specifically, $\hat{h}_{k}$ (or $\hat{g}_{k}$) is the mean value of $h_{k}$ (or $g_{k}$) and corresponds to a path loss of $128.1+37.6\log_{10}(d)$ dB [31], where $d$ is the distance in km. $\tilde{h}_{k}$ (or $\tilde{g}_{k}$) is an exponentially distributed random variable with unit mean. The parameters $D_{k}^{\mathrm{loc}}$, $D^{\mathrm{gen}}$, $D^{\mathrm{mod}}$, $w$, and $f_{k}^{\mathrm{max}}$ are assumed to be uniformly distributed within $[300,500]$ samples, $[3000,5000]$ samples, $[1.5,2.5]$ Mbits, $[1,2]\times 10^{6}$ cycles/sample, and $[1,2]$ GHz, respectively. In addition, we set $B=1$ MHz, $\sigma^{2}=-160$ dBm/Hz, $N=100$, $T^{\mathrm{max}}=900$ s, $\tau=1$, $\zeta=50$, $\alpha=3.819$, $\beta=0.198$, $\gamma=0.231$ [26], $P=35$ dBm, $Q_{k}=20$ dBm, $\varpi=10^{-27}$ [32], $\Gamma=20$ kbits, and $E_{k}^{\mathrm{max}}=1.2$ J.
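The channel model above can be sampled as in the following sketch. It assumes that distances are uniform in $[150,300]$ m (the text only states that the devices are randomly distributed within this range) and that $h_k$ and $g_k$ are drawn independently by separate calls; the function names are hypothetical. The helper `noise_power_watts` converts $\sigma^2$ in dBm/Hz to the noise power $\sigma^2B$ in watts that appears in the SINR expressions.

```python
import math
import random


def channel_gains(K, *, d_min_m=150.0, d_max_m=300.0, seed=None):
    """Draw K channel power gains hat_h_k * tilde_h_k (sketch).

    Assumes distances uniform in [d_min_m, d_max_m] metres, path loss
    128.1 + 37.6 log10(d[km]) dB, and Rayleigh fading, i.e., a
    unit-mean exponential power gain.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(K):
        d_km = rng.uniform(d_min_m, d_max_m) / 1000.0
        path_loss_db = 128.1 + 37.6 * math.log10(d_km)
        mean_gain = 10 ** (-path_loss_db / 10)          # hat_h_k
        gains.append(mean_gain * rng.expovariate(1.0))  # times tilde_h_k
    return gains


def noise_power_watts(psd_dbm_per_hz, bandwidth_hz):
    """Noise power sigma^2 * B in watts for a PSD given in dBm/Hz."""
    return 10 ** ((psd_dbm_per_hz - 30) / 10) * bandwidth_hz


h = channel_gains(15, seed=1)  # downlink gains h_k
g = channel_gains(15, seed=2)  # uplink gains g_k, drawn independently
noise = noise_power_watts(-160, 1e6)  # about 1e-13 W for B = 1 MHz
```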

For the purpose of comparison, the following four schemes developed in existing literature or coined for benchmarking are considered:

• FDMA+AIGC: In this scheme, FDMA is used for both synthetic data downloading and local model uploading as in [26]. Note that since [26] did not consider the synthetic data downloading phase, we modify the algorithm developed in [26] to suit our considered model.
• TDMA+AIGC: In this scheme, TDMA is used for both synthetic data downloading and local model uploading.
• NOMA-w/o-AIGC: In this scheme, synthetic data are not transmitted to the devices, and the proposed NOMA scheme is used for the local model uploading.
• FDMA-w/o-AIGC: In this scheme, synthetic data are not transmitted to the devices, and FDMA is used for local model uploading as in [26].

Fig. 3 illustrates the impact of the maximum transmit power of the BS $P$ on the learning performance. It shows that the proposed NOMA+AIGC scheme outperforms all the other schemes, including FDMA+AIGC, which indicates that the proposed scheme is more effective in improving the WFL performance. Meanwhile, the learning performance with AIGC is significantly better than that without AIGC. This is because downloading synthetic data from the server makes more training data available for local training. It is also shown that TDMA+AIGC even underperforms the NOMA and FDMA schemes without AIGC. This is because, in TDMA, the devices that are not scheduled for synthetic data downloading must wait, which significantly lowers the time resource utilization and makes the maximum latency constraint much harder to satisfy.

As $P$ increases, Fig. 3 shows that the learning performance improves. For the schemes with AIGC, a higher $P$ allows the server to transmit more synthetic data to the devices. For the schemes without AIGC, a higher $P$ allows the BS to broadcast the global model in less time, so it is easier to satisfy the maximum latency constraint and make the investigated problem feasible; note that we set the learning error to one when the problem is infeasible. When $P$ is very large, the learning performance saturates. This is because the system is then restricted by other factors, such as the energy consumption constraint and the latency constraint, so no more synthetic data can be transmitted to the devices even if $P$ increases further.

Fig. 4 illustrates the impact of the total amount of synthetic data $D^{\mathrm{gen}}$ on the learning performance. The learning performance of the schemes without AIGC remains unchanged as $D^{\mathrm{gen}}$ increases, since these schemes do not use synthetic data for local training. In contrast, the schemes with AIGC achieve better learning performance as $D^{\mathrm{gen}}$ increases, because a larger amount of synthetic data available at the server allows the server to transmit more synthetic data to the devices. However, when $D^{\mathrm{gen}}$ is very large, increasing it further does not improve the learning performance, since the energy consumption constraint and the latency constraint limit the amount of synthetic data the devices can receive, regardless of how much the server has available. Furthermore, the proposed NOMA+AIGC scheme achieves the best learning performance among all the schemes, and its improvement is more pronounced when $D^{\mathrm{gen}}$ is larger. This means that the proposed scheme utilizes the synthetic data more effectively when more synthetic data are available at the server.

Fig. 5 illustrates the impact of the maximum latency constraint $T^{\mathrm{max}}$ on the learning performance. All the schemes achieve better learning performance as $T^{\mathrm{max}}$ increases. This is because a larger $T^{\mathrm{max}}$ allows the server to transmit more synthetic data to the devices for the schemes with AIGC, and makes the maximum latency constraint easier to satisfy for the schemes without AIGC. When $T^{\mathrm{max}}$ is very large, the schemes with AIGC converge to the same learning performance, and so do the schemes without AIGC. This is because a very large $T^{\mathrm{max}}$ allows the server to transmit as much data as the other constraints, such as the energy consumption constraint, permit, so the efficiency of the multiple access scheme no longer affects the learning performance. A gap remains between the schemes with and without AIGC, since for the schemes without AIGC a larger $T^{\mathrm{max}}$ only makes the problem constraints easier to satisfy, and their learning performance stays the same once the problem is feasible.

In addition, Fig. 5 shows that the performance improvement of the proposed scheme over FDMA+AIGC is more pronounced when $T^{\mathrm{max}}$ is smaller, since the proposed scheme utilizes the time resource more efficiently than FDMA+AIGC. The performance of TDMA+AIGC also improves more markedly with increasing $T^{\mathrm{max}}$ than that of the other schemes. This is because the time resource utilization efficiency of TDMA+AIGC is low, and a larger $T^{\mathrm{max}}$ effectively offsets the impact of this low efficiency on the learning performance.

Fig. 6 illustrates the impact of the model data size $D^{\mathrm{mod}}$ on the learning performance. The learning performance degrades as $D^{\mathrm{mod}}$ increases. This is because a larger $D^{\mathrm{mod}}$ requires more time for local model uploading and global model broadcasting, which leaves less time for synthetic data downloading and may even render the problem infeasible. TDMA+AIGC quickly saturates at the worst learning performance, because it has the lowest time resource utilization efficiency, so a larger $D^{\mathrm{mod}}$ makes its required time for local model uploading and global model broadcasting much larger. Moreover, the proposed NOMA+AIGC scheme achieves the best learning performance, and its performance degradation due to the increase of $D^{\mathrm{mod}}$ is the smallest among all the schemes. This means that the proposed scheme best mitigates the impact of the increased time needed for local model uploading and global model broadcasting.

Fig. 7 illustrates the impact of the energy budget $E_{k}^{\mathrm{max}}$ on the learning performance. The learning performance improves as $E_{k}^{\mathrm{max}}$ increases. This is because a higher energy budget allows each device to train with more data, and also to finish local model uploading in less time, which helps satisfy the maximum latency constraint. The learning performance saturates when $E_{k}^{\mathrm{max}}$ is very large, because other constraints, such as the transmit power constraint and the maximum latency constraint, then restrict the amount of synthetic data that can be downloaded from the server and become the main bottleneck of the learning performance.

Furthermore, Fig. 7 shows that as $E_{k}^{\mathrm{max}}$ increases, the learning performance of the schemes without AIGC saturates earlier than that of the schemes with AIGC, except TDMA+AIGC. The reasons are as follows. A larger $E_{k}^{\mathrm{max}}$ allows the devices to upload the local model to the server more quickly or to finish local training in time, so the constraints of the problem for the schemes without AIGC are satisfied more easily. However, since their training data cannot be increased, the learning performance of the schemes without AIGC is capped. For the schemes with AIGC, in addition to this benefit, a higher $E_{k}^{\mathrm{max}}$ allows more synthetic data to be downloaded from the server for local training, which further improves the learning performance. The performance gain of the proposed NOMA+AIGC scheme over the other schemes is also significant and does not change as $E_{k}^{\mathrm{max}}$ varies.

Fig. 8 illustrates the impact of the number of devices $K$ on the learning performance. The learning performance degrades as $K$ increases. This is because, under the maximum latency constraint, more devices lead to a higher probability of violating this constraint, since the global model broadcasting rate decreases and the required broadcasting time increases as $K$ increases. Besides, more devices also leave less synthetic data available for each device, which degrades the learning performance, since the objective function in (22a) is a convex function of the amount of synthetic data. The impact of $K$ on the learning performance is the most severe for TDMA+AIGC, because TDMA+AIGC allocates less time to each device for a larger $K$, which makes the constraints of the problem much harder to satisfy. The proposed NOMA+AIGC scheme achieves the best learning performance, and its performance gain remains significant even when $K$ is large.

VII Conclusions

In this paper, AIGC and NOMA are jointly adopted to enhance the WFL performance. The synthetic data distribution, two-way communication, and computation resource allocation are jointly optimized to minimize the global learning error under various system constraints. Specifically, an efficient low-complexity locally optimal solution with partial closed-form results is proposed based on the BCD method and the analytical method. Extensive simulation results verify the superiority of the proposed NOMA+AIGC-enhanced scheme over the existing and benchmark schemes, such as the FDMA+AIGC and TDMA+AIGC schemes, under various system configurations. Our results demonstrate the effectiveness of jointly combining NOMA and AIGC to enhance the WFL performance.

A Proof of Lemma 1

Suppose that the optimal computing frequency allocation is $\mathbf{f}^{*}=\{f_{k}^{*},k=1,\ldots,K\}$, where the constraint (18e) is satisfied with strict inequality by $f_{j}^{*}$ for some $j$, i.e., $f_{j}^{*}>\frac{w\tau\left(D_{j}^{\mathrm{loc}}+D_{j}^{\mathrm{gen}}\right)}{T^{\mathrm{loc}}}$. Then, we consider another computing frequency allocation $\mathbf{f}^{\star}=\{f_{k}^{\star},k=1,\ldots,K\}$ with $f_{k}^{\star}=f_{k}^{*},k\neq j$, and $f_{j}^{\star}=\frac{w\tau\left(D_{j}^{\mathrm{loc}}+D_{j}^{\mathrm{gen}}\right)}{T^{\mathrm{loc}}}$. Clearly, $f_{j}^{\star}<f_{j}^{*}$, and $T_{j}^{\mathrm{loc}}=T^{\mathrm{loc}}$ with $f_{j}=f_{j}^{\star}$. Since $\mathbf{f}$ appears only in the constraints (18d), (18e), (18k), and (18q) of the problem (18), it can be easily verified that $\mathbf{f}^{\star}$ is a feasible solution to the problem (18) and achieves the same objective function value as the optimal solution $\mathbf{f}^{*}$. Hence, $\mathbf{f}^{\star}$ is also optimal. From (7) and (16), $E_{j}$ with $f_{j}=f_{j}^{\star}$ is smaller than that with $f_{j}=f_{j}^{*}$, i.e., $\mathbf{f}^{\star}$ has lower energy consumption than $\mathbf{f}^{*}$. Thus, the optimal solution $\mathbf{f}^{\star}$ is more desirable than $\mathbf{f}^{*}$. This completes the proof.

B Proof of Lemma 2

Suppose that the optimal time allocation solution is $\hat{\mathbf{T}}=\{\hat{T}^{\mathrm{down}},\hat{T}^{\mathrm{br}},\hat{T}^{\mathrm{loc}},\hat{T}^{\mathrm{up}}\}$, where $T^{\mathrm{syn}}+\hat{T}^{\mathrm{down}}+N\left(\hat{T}^{\mathrm{br}}+\hat{T}^{\mathrm{loc}}+\hat{T}^{\mathrm{up}}\right)<T^{\mathrm{max}}$. Then, we construct another time allocation solution $\tilde{\mathbf{T}}=\{\tilde{T}^{\mathrm{down}},\tilde{T}^{\mathrm{br}},\tilde{T}^{\mathrm{loc}},\tilde{T}^{\mathrm{up}}\}$ with $\tilde{\mathbf{T}}=\hat{\mathbf{T}}$ except that $\tilde{T}^{\mathrm{loc}}=\frac{T^{\mathrm{max}}-T^{\mathrm{syn}}-\hat{T}^{\mathrm{down}}}{N}-\hat{T}^{\mathrm{br}}-\hat{T}^{\mathrm{up}}$. Then, the constraint (18c) is satisfied with equality by $\tilde{\mathbf{T}}$, and $\tilde{T}^{\mathrm{loc}}>\hat{T}^{\mathrm{loc}}$. It can be shown that all the remaining constraints of the problem (23) are still satisfied by $\tilde{\mathbf{T}}$, and that $\tilde{\mathbf{T}}$ achieves the same objective function value as the optimal solution $\hat{\mathbf{T}}$. Thus, $\tilde{\mathbf{T}}$ is not only a feasible solution but also an optimal solution. From (21), the energy consumption of each device under $\tilde{\mathbf{T}}$ is lower than that under $\hat{\mathbf{T}}$, which means that the optimal solution $\tilde{\mathbf{T}}$ is more desirable than the optimal solution $\hat{\mathbf{T}}$. This completes the proof.

C Proof of Lemma 3

Suppose that the optimal time allocation solution is $\hat{\mathbf{T}}=\{\hat{T}^{\mathrm{down}},\hat{T}^{\mathrm{br}},\hat{T}^{\mathrm{loc}},\hat{T}^{\mathrm{up}}\}$, where $\hat{T}^{\mathrm{down}}R_{j}^{\mathrm{down}}>\Gamma D_{j}^{\mathrm{gen}}$ for $j=\arg\max_{k}\frac{D_{k}^{\mathrm{gen}}}{R_{k}^{\mathrm{down}}}$. Since $j$ maximizes $D_{k}^{\mathrm{gen}}/R_{k}^{\mathrm{down}}$, this implies $\hat{T}^{\mathrm{down}}R_{k}^{\mathrm{down}}>\Gamma D_{k}^{\mathrm{gen}},\forall k$. Then, we construct another solution $\tilde{\mathbf{T}}=\{\tilde{T}^{\mathrm{down}},\tilde{T}^{\mathrm{br}},\tilde{T}^{\mathrm{loc}},\tilde{T}^{\mathrm{up}}\}$ with $\tilde{\mathbf{T}}=\hat{\mathbf{T}}$ except that $\tilde{T}^{\mathrm{down}}=\Gamma\max_{k}\frac{D_{k}^{\mathrm{gen}}}{R_{k}^{\mathrm{down}}}$ and $\tilde{T}^{\mathrm{loc}}=\hat{T}^{\mathrm{loc}}+\frac{\hat{T}^{\mathrm{down}}-\tilde{T}^{\mathrm{down}}}{N}$, so that the total latency in (18c) is unchanged, since $T^{\mathrm{loc}}$ is incurred in each of the $N$ rounds while $T^{\mathrm{down}}$ is incurred once. Hence, $\tilde{T}^{\mathrm{down}}R_{j}^{\mathrm{down}}=\Gamma D_{j}^{\mathrm{gen}}$, $\tilde{T}^{\mathrm{down}}R_{k}^{\mathrm{down}}\geq\Gamma D_{k}^{\mathrm{gen}},k\neq j$, $\tilde{T}^{\mathrm{down}}<\hat{T}^{\mathrm{down}}$, and $\tilde{T}^{\mathrm{loc}}>\hat{T}^{\mathrm{loc}}$. It can be verified that $\tilde{\mathbf{T}}$ satisfies all the constraints of the problem (23) and achieves the same objective function value as the optimal solution $\hat{\mathbf{T}}$. Thus, $\tilde{\mathbf{T}}$ is also an optimal solution. Since $\tilde{T}^{\mathrm{loc}}>\hat{T}^{\mathrm{loc}}$, it follows from (21) that $\tilde{\mathbf{T}}$ achieves lower energy consumption than $\hat{\mathbf{T}}$. This means that the optimal solution $\tilde{\mathbf{T}}$ is more desirable than the optimal solution $\hat{\mathbf{T}}$. This completes the proof.

-D Proof of Lemma 6

Suppose that the optimal solution to problem (35) is $\hat{\mathbf{p}}=\{\hat{p}_{k},k=1,\ldots,K\},\hat{\eta}$, where $B\log_{2}\left(1+\frac{h_{\hat{k}}\hat{p}_{\hat{k}}}{\sigma^{2}B+h_{\hat{k}}\sum_{j>\hat{k}}\hat{p}_{j}}\right)>\hat{\eta}D_{\hat{k}}^{\mathrm{gen}}$ for a given $\hat{k}$, and $B\log_{2}\left(1+\frac{h_{k}\hat{p}_{k}}{\sigma^{2}B+h_{k}\sum_{j>k}\hat{p}_{j}}\right)=\hat{\eta}D_{k}^{\mathrm{gen}}$ for $k\neq\hat{k}$. Then, we can consider another solution $\tilde{\mathbf{p}}=\{\tilde{p}_{k},k=1,\ldots,K\},\tilde{\eta}$ with $\tilde{\mathbf{p}}=\hat{\mathbf{p}}$ and $\tilde{\eta}=\hat{\eta}$, except that $\tilde{p}_{\hat{k}}$ is determined by $B\log_{2}\left(1+\frac{h_{\hat{k}}\tilde{p}_{\hat{k}}}{\sigma^{2}B+h_{\hat{k}}\sum_{j>\hat{k}}\tilde{p}_{j}}\right)=\hat{\eta}D_{\hat{k}}^{\mathrm{gen}}$. Clearly, $\tilde{p}_{\hat{k}}<\hat{p}_{\hat{k}}$. Since $p_{\hat{k}}$ appears only in the interference terms of the devices $k<\hat{k}$, reducing it can only increase their rates, while the rates of the devices $k>\hat{k}$ are unchanged. Hence, all the constraints of problem (35) are satisfied by $\tilde{\mathbf{p}},\tilde{\eta}$. Therefore, $\tilde{\mathbf{p}},\tilde{\eta}$ is a feasible solution to problem (35) that consumes less power than the optimal solution, which indicates that it is more desirable than the optimal solution $\hat{\mathbf{p}},\hat{\eta}$. This completes the proof.
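The key step, that lowering $p_{\hat{k}}$ cannot hurt any other device, can be illustrated with a small Python sketch of the rate expression used in the proof. The helper names and the normalized values ($B=\sigma^{2}=1$, equal rate targets $\hat{\eta}D_{k}^{\mathrm{gen}}=1$) are assumptions made for illustration only.

```python
import math


def downlink_rates(p, h, bandwidth, noise_psd):
    """R_k = B log2(1 + h_k p_k / (sigma^2 B + h_k * sum_{j>k} p_j)), 0-indexed."""
    return [bandwidth * math.log2(1.0 + h[k] * p[k]
                                  / (noise_psd * bandwidth + h[k] * sum(p[k + 1:])))
            for k in range(len(p))]


def tighten_power(p, k_hat, h, target, bandwidth, noise_psd):
    """Lower p[k_hat] so that device k_hat meets target[k_hat] with equality.

    p[k_hat] only appears in the interference of devices k < k_hat, so their
    rates can only grow; devices k > k_hat are unaffected.
    """
    sinr = 2.0 ** (target[k_hat] / bandwidth) - 1.0
    out = list(p)
    out[k_hat] = sinr * (noise_psd * bandwidth / h[k_hat] + sum(p[k_hat + 1:]))
    return out


# Hypothetical normalized setup: B = sigma^2 = 1 and eta * D_k^gen = 1 for all k.
h = [1.0, 2.0, 4.0]
target = [1.0, 1.0, 1.0]
p_hat = [3.0, 1.0, 0.5]   # feasible; the last device (index 2) has a rate surplus
p_tilde = tighten_power(p_hat, 2, h, target, 1.0, 1.0)
print(p_tilde, sum(p_hat), sum(p_tilde))
print(downlink_rates(p_hat, h, 1.0, 1.0))
print(downlink_rates(p_tilde, h, 1.0, 1.0))
```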

-E Proof of Theorem 3

We consider two devices $\hat{k}$ and $\tilde{k}$ with adjacent decoding orders and $g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}<g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}$. There are two possible decoding orders for these devices: either device $\hat{k}$ or device $\tilde{k}$ is decoded first. Let $I$ denote the interference caused by devices $\hat{k}$ and $\tilde{k}$ to the devices decoded before them (i.e., with smaller decoding orders), and let $I^{\prime}$ denote the interference caused to devices $\hat{k}$ and $\tilde{k}$ by the devices decoded after them (i.e., with larger decoding orders). To guarantee the performance of the other devices, the interference caused by devices $\hat{k}$ and $\tilde{k}$ is restricted as $g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}q_{\hat{k}}^{\prime}+g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}q_{\tilde{k}}^{\prime}\leq I$. The problem of optimizing $q_{\hat{k}}^{\prime}$ and $q_{\tilde{k}}^{\prime}$ is formulated as


$$
\begin{aligned}
\max_{q_{\hat{k}}^{\prime},\,q_{\tilde{k}}^{\prime},\,\theta}\quad & \theta && \text{(48a)}\\
\mathrm{s.t.}\quad & R_{k}^{\mathrm{up}}\geq\theta,\quad k\in\{\hat{k},\tilde{k}\}, && \text{(48b)}\\
& g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}q_{\hat{k}}^{\prime}+g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}q_{\tilde{k}}^{\prime}\leq I, && \text{(48c)}\\
& 0\leq q_{k}^{\prime}\leq 1,\quad k\in\{\hat{k},\tilde{k}\}. && \text{(48d)}
\end{aligned}
$$

Suppose that the device $\hat{k}$ is decoded first. Then, we have

$$
\begin{aligned}
R_{\hat{k}}^{\mathrm{up}}&=B\log_{2}\left(1+\frac{g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}q_{\hat{k}}^{\prime}}{\sigma^{2}B+\sum_{j=1,\pi_{j}>\pi_{\hat{k}}}g_{j}q_{j}^{\mathrm{max}}q_{j}^{\prime}}\right)\\
&=B\log_{2}\left(\frac{\sigma^{2}B+g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}q_{\hat{k}}^{\prime}+g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}q_{\tilde{k}}^{\prime}+I^{\prime}}{\sigma^{2}B+g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}q_{\tilde{k}}^{\prime}+I^{\prime}}\right), && \text{(49)}\\
R_{\tilde{k}}^{\mathrm{up}}&=B\log_{2}\left(1+\frac{g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}q_{\tilde{k}}^{\prime}}{\sigma^{2}B+\sum_{j=1,\pi_{j}>\pi_{\tilde{k}}}g_{j}q_{j}^{\mathrm{max}}q_{j}^{\prime}}\right)\\
&=B\log_{2}\left(\frac{\sigma^{2}B+g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}q_{\tilde{k}}^{\prime}+I^{\prime}}{\sigma^{2}B+I^{\prime}}\right). && \text{(50)}
\end{aligned}
$$

An optimal solution to problem (48) can always be chosen to satisfy constraint (48b) with equality for both devices. Otherwise, $q_{\hat{k}}^{\prime}$ or $q_{\tilde{k}}^{\prime}$ could be decreased until (48b) holds with equality, which saves energy, or $\theta$ could be increased to achieve a higher objective function value. Thus, from $R_{k}^{\mathrm{up}}=\theta$, $k\in\{\hat{k},\tilde{k}\}$, we have

$$
\begin{aligned}
q_{\hat{k}}^{\prime}&=\frac{\left(2^{\frac{\theta}{B}}-1\right)\left(\sigma^{2}B+I^{\prime}\right)2^{\frac{\theta}{B}}}{g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}}, && \text{(51)}\\
q_{\tilde{k}}^{\prime}&=\frac{\left(2^{\frac{\theta}{B}}-1\right)\left(\sigma^{2}B+I^{\prime}\right)}{g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}}. && \text{(52)}
\end{aligned}
$$
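To see that (51) and (52) indeed equalize the two rates, the Python sketch below plugs them back into (49) and (50). Here `a_hat` and `a_tilde` stand for $g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}$ and $g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}$, `noise0` stands for $\sigma^{2}B+I^{\prime}$, and the helper names and normalized numbers are assumptions for illustration.

```python
import math


def powers_khat_first(theta, bandwidth, noise0, a_hat, a_tilde):
    """q'_{k-hat} and q'_{k-tilde} from (51)-(52); noise0 = sigma^2 B + I'."""
    x = 2.0 ** (theta / bandwidth)
    base = (x - 1.0) * noise0
    return base * x / a_hat, base / a_tilde


def rates_khat_first(q_hat, q_tilde, bandwidth, noise0, a_hat, a_tilde):
    """Uplink rates (49)-(50) when device k-hat is decoded first."""
    r_hat = bandwidth * math.log2(1.0 + a_hat * q_hat / (noise0 + a_tilde * q_tilde))
    r_tilde = bandwidth * math.log2(1.0 + a_tilde * q_tilde / noise0)
    return r_hat, r_tilde


# Hypothetical values: B = 1, sigma^2 B + I' = 2, a_hat = 10 < a_tilde = 20.
theta, B, noise0, a_hat, a_tilde = 1.0, 1.0, 2.0, 10.0, 20.0
q_hat, q_tilde = powers_khat_first(theta, B, noise0, a_hat, a_tilde)
r_hat, r_tilde = rates_khat_first(q_hat, q_tilde, B, noise0, a_hat, a_tilde)
interference = a_hat * q_hat + a_tilde * q_tilde   # left-hand side of (48c)
print(q_hat, q_tilde, r_hat, r_tilde, interference)
```

The generated interference equals $(2^{2\theta/B}-1)(\sigma^{2}B+I^{\prime})$, which is exactly what turns (48c) into the bound (53).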

By inserting (51) and (52) into the constraints (48c) and (48d), we have

$$
\begin{aligned}
\theta&\leq\frac{B}{2}\log_{2}\left(1+\frac{I}{\sigma^{2}B+I^{\prime}}\right), && \text{(53)}\\
g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}&\geq\left(2^{\frac{\theta}{B}}-1\right)\left(\sigma^{2}B+I^{\prime}\right)2^{\frac{\theta}{B}}, && \text{(54)}\\
g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}&\geq\left(2^{\frac{\theta}{B}}-1\right)\left(\sigma^{2}B+I^{\prime}\right). && \text{(55)}
\end{aligned}
$$

Inequality (53) limits the maximum $\theta$ that can be supported under the given interference constraint $I$, while inequalities (54) and (55) limit the maximum $\theta$ that can be supported under the given $g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}$ and $g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}$, respectively. Since $2^{\frac{\theta}{B}}\geq1$, the right-hand side of (54) is no smaller than that of (55); together with $g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}<g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}$, this makes (54) tighter than (55). Thus, inequality (55) is redundant, and the maximum $\theta$ follows from (53) and (54).
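Solving (53) and (54) for $\theta$ gives the largest common rate in closed form. With $x=2^{\theta/B}$ and $c=\sigma^{2}B+I^{\prime}$, inequality (54) reads $x(x-1)\leq g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}/c$, whose admissible root is $x=\frac{1}{2}\left(1+\sqrt{1+4g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}/c}\right)$. The Python sketch below, with a hypothetical helper name and illustrative numbers, takes the minimum of the two resulting bounds.

```python
import math


def max_theta_khat_first(bandwidth, noise0, i_budget, a_hat):
    """Largest theta allowed by (53) and (54); noise0 = sigma^2 B + I'.

    a_hat = g_{k-hat} q_{k-hat}^max. Constraint (55) is omitted because it is
    redundant when a_hat < a_tilde.
    """
    theta_53 = 0.5 * bandwidth * math.log2(1.0 + i_budget / noise0)
    # Positive root of x (x - 1) = a_hat / noise0, with x = 2^(theta / B).
    x_54 = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * a_hat / noise0))
    theta_54 = bandwidth * math.log2(x_54)
    return min(theta_53, theta_54)


# A case where both bounds coincide, and a power-limited case (hypothetical values).
print(max_theta_khat_first(1.0, 2.0, 6.0, 4.0))   # both bounds equal 1.0 here
print(max_theta_khat_first(1.0, 2.0, 6.0, 1.0))   # (54) is the binding bound
```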

Then, assume that the device $\tilde{k}$ is decoded first, and we get

$$
\begin{aligned}
R_{\hat{k}}^{\mathrm{up}}&=B\log_{2}\left(1+\frac{g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}q_{\hat{k}}^{\prime}}{\sigma^{2}B+\sum_{j=1,\pi_{j}>\pi_{\hat{k}}}g_{j}q_{j}^{\mathrm{max}}q_{j}^{\prime}}\right)\\
&=B\log_{2}\left(\frac{\sigma^{2}B+g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}q_{\hat{k}}^{\prime}+I^{\prime}}{\sigma^{2}B+I^{\prime}}\right), && \text{(56)}\\
R_{\tilde{k}}^{\mathrm{up}}&=B\log_{2}\left(1+\frac{g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}q_{\tilde{k}}^{\prime}}{\sigma^{2}B+\sum_{j=1,\pi_{j}>\pi_{\tilde{k}}}g_{j}q_{j}^{\mathrm{max}}q_{j}^{\prime}}\right)\\
&=B\log_{2}\left(\frac{\sigma^{2}B+g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}q_{\hat{k}}^{\prime}+g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}q_{\tilde{k}}^{\prime}+I^{\prime}}{\sigma^{2}B+g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}q_{\hat{k}}^{\prime}+I^{\prime}}\right). && \text{(57)}
\end{aligned}
$$

Similarly, when device $\tilde{k}$ is decoded first, the optimal $q_{\hat{k}}^{\prime}$ and $q_{\tilde{k}}^{\prime}$ for problem (48) satisfy constraint (48b) with equality. Thus, from $R_{k}^{\mathrm{up}}=\theta$, $k\in\{\hat{k},\tilde{k}\}$, we get

$$
\begin{aligned}
q_{\hat{k}}^{\prime}&=\frac{\left(2^{\frac{\theta}{B}}-1\right)\left(\sigma^{2}B+I^{\prime}\right)}{g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}}, && \text{(58)}\\
q_{\tilde{k}}^{\prime}&=\frac{\left(2^{\frac{\theta}{B}}-1\right)\left(\sigma^{2}B+I^{\prime}\right)2^{\frac{\theta}{B}}}{g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}}. && \text{(59)}
\end{aligned}
$$

By substituting (58) and (59) into the constraints (48c) and (48d), we get

$$
\begin{aligned}
\theta&\leq\frac{B}{2}\log_{2}\left(1+\frac{I}{\sigma^{2}B+I^{\prime}}\right), && \text{(60)}\\
g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}&\geq\left(2^{\frac{\theta}{B}}-1\right)\left(\sigma^{2}B+I^{\prime}\right), && \text{(61)}\\
g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}&\geq\left(2^{\frac{\theta}{B}}-1\right)\left(\sigma^{2}B+I^{\prime}\right)2^{\frac{\theta}{B}}. && \text{(62)}
\end{aligned}
$$

Note that (53) and (60) are identical. Constraint (61) is looser than constraint (54), since both have the left-hand side $g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}$ but the right-hand side of (61) lacks the factor $2^{\frac{\theta}{B}}\geq1$. Constraint (62) is also looser than constraint (54), since both have the same right-hand side and $g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}<g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}$. Thus, the maximum allowable $\theta$ under constraints (60), (61), and (62), i.e., when device $\tilde{k}$ is decoded first, is equal to or larger than that when device $\hat{k}$ is decoded first. Therefore, the optimal decoding order is to decode device $\tilde{k}$ first.
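The comparison of the two decoding orders can also be checked numerically. In the Python sketch below (hypothetical helper name, randomly drawn test values), `a_first` is the $g_{k}q_{k}^{\mathrm{max}}$ of the device decoded first, which carries the extra factor $2^{\theta/B}$ as in (54) and (62), and `a_second` is that of the device decoded second, as in (55) and (61); the interference bound (53)/(60) is shared by both orders.

```python
import math
import random


def max_theta(a_first, a_second, bandwidth, noise0, i_budget):
    """Largest common rate for a two-device SIC pair; noise0 = sigma^2 B + I'."""
    x_interf = math.sqrt(1.0 + i_budget / noise0)                     # (53) / (60)
    x_first = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * a_first / noise0))  # (54) / (62)
    x_second = 1.0 + a_second / noise0                                # (55) / (61)
    return bandwidth * math.log2(min(x_interf, x_first, x_second))


random.seed(0)
for _ in range(10000):
    a_hat = random.uniform(0.01, 10.0)
    a_tilde = a_hat + random.uniform(0.01, 10.0)   # a_hat < a_tilde as in the proof
    noise0, i_budget = random.uniform(0.1, 5.0), random.uniform(0.1, 20.0)
    hat_first = max_theta(a_hat, a_tilde, 1.0, noise0, i_budget)
    tilde_first = max_theta(a_tilde, a_hat, 1.0, noise0, i_budget)
    assert tilde_first >= hat_first - 1e-12
print("decoding the larger g_k q_k^max first never lowered theta")
```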

We have thus proved that, for two devices $\hat{k}$ and $\tilde{k}$ with adjacent decoding orders and $g_{\hat{k}}q_{\hat{k}}^{\mathrm{max}}<g_{\tilde{k}}q_{\tilde{k}}^{\mathrm{max}}$, the optimal decoding order is to decode device $\tilde{k}$ first. For multiple devices, any pair of adjacent devices that violates this criterion can be swapped without reducing the achievable $\theta$; hence, the optimal decoding order is the descending order of $g_{k}q_{k}^{\mathrm{max}}$. This completes the proof.
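The multi-device conclusion can be illustrated by brute force on a small toy instance. This Python sketch assumes that only the per-device limits $q_{k}^{\prime}\leq1$ are imposed (no interference budget $I$) and uses a normalized $c_{0}=\sigma^{2}B+I^{\prime}$. Under these assumptions, repeating the equal-rate argument behind (51) and (52) shows that a device followed by $m$ later-decoded devices needs received power $(2^{\theta/B}-1)2^{m\theta/B}c_{0}$. The helper name and the values are hypothetical.

```python
import itertools


def max_rate_factor(order, a, c0, iters=200):
    """Largest x = 2^(theta/B) at which every device can reach the rate theta.

    order[0] is decoded first; a[k] = g_k q_k^max; c0 = sigma^2 B + I'.
    A device followed by m later-decoded devices needs received power
    (x - 1) x^m c0, so q'_k <= 1 gives (x - 1) x^m <= a_k / c0.
    """
    best = float("inf")
    for pos, k in enumerate(order):
        m = len(order) - 1 - pos              # number of devices decoded after k
        lo, hi = 1.0, 2.0 + a[k] / c0         # (x - 1) x^m is increasing for x >= 1
        for _ in range(iters):                # bisection on (x - 1) x^m = a_k / c0
            mid = 0.5 * (lo + hi)
            if (mid - 1.0) * mid ** m <= a[k] / c0:
                lo = mid
            else:
                hi = mid
        best = min(best, lo)
    return best


a = [3.0, 12.0, 0.8, 6.0]    # hypothetical g_k q_k^max values
c0 = 1.0                     # normalized sigma^2 B + I'
descending = sorted(range(len(a)), key=lambda k: a[k], reverse=True)
brute = max(max_rate_factor(o, a, c0) for o in itertools.permutations(range(len(a))))
print(descending, max_rate_factor(descending, a, c0), brute)
```

Exhaustive search over all $4!$ orders finds no order that beats the descending one, consistent with the pairwise exchange argument.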

-F Proof of Lemma 7

Suppose that a feasible solution to problem (43) satisfies $R_{\hat{k}}^{\mathrm{up}}>\theta$ for a given $\hat{k}$. Then, $q_{\hat{k}}^{\prime}$ can be decreased until $R_{\hat{k}}^{\mathrm{up}}=\theta$. This only reduces the interference experienced by the devices decoded before device $\hat{k}$ and leaves the rates of the devices decoded after it unchanged, so all the constraints remain satisfied. Thus, constraint (43b) can be satisfied with equality by a feasible solution to problem (43). This completes the proof.
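The argument can be illustrated with a short Python sketch of SIC uplink rates for a given decoding order. Here `a[k]` stands for $g_{k}q_{k}^{\mathrm{max}}$, `qp[k]` for $q_{k}^{\prime}$, and `noise0` for $\sigma^{2}B$; the helper names and the normalized numbers are assumptions for illustration. Lowering the surplus device's $q_{k}^{\prime}$ to the equality value leaves every device at or above $\theta$.

```python
import math


def uplink_rates(qp, a, order, bandwidth, noise0):
    """SIC uplink rates: device order[i] is interfered by devices order[i+1:].

    a[k] = g_k q_k^max, qp[k] = q'_k and noise0 = sigma^2 B (hypothetical inputs).
    """
    rates = [0.0] * len(a)
    for i, k in enumerate(order):
        interf = sum(a[j] * qp[j] for j in order[i + 1:])
        rates[k] = bandwidth * math.log2(1.0 + a[k] * qp[k] / (noise0 + interf))
    return rates


def tighten_uplink(qp, k_hat, a, order, theta, bandwidth, noise0):
    """Lower q'_{k_hat} so that device k_hat attains the rate theta exactly."""
    pos = order.index(k_hat)
    interf = sum(a[j] * qp[j] for j in order[pos + 1:])
    out = list(qp)
    out[k_hat] = (2.0 ** (theta / bandwidth) - 1.0) * (noise0 + interf) / a[k_hat]
    return out


a, order, theta = [8.0, 4.0, 2.0], [0, 1, 2], 1.0   # B = noise0 = 1 (normalized)
q_feasible = [0.8, 0.8, 0.5]                       # device 1 exceeds theta
q_tight = tighten_uplink(q_feasible, 1, a, order, theta, 1.0, 1.0)
print(q_tight)
print(uplink_rates(q_feasible, a, order, 1.0, 1.0))
print(uplink_rates(q_tight, a, order, 1.0, 1.0))
```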

-G Proof of Proposition 1

Let $F(\mathbf{D}^{\mathrm{gen}},\mathbf{T},\mathbf{p},\mathbf{q},\boldsymbol{\pi})$ denote the objective function in (22a). Then, in the $l$-th iteration of the BCD method, $F$ satisfies

$$
\begin{aligned}
&F(\mathbf{D}^{\mathrm{gen}}(l),\mathbf{T}(l),\mathbf{p}(l),\mathbf{q}(l),\boldsymbol{\pi}(l))\\
\leq\;&F(\mathbf{D}^{\mathrm{gen}}(l-1),\mathbf{T}(l-1),\mathbf{p}(l),\mathbf{q}(l),\boldsymbol{\pi}(l))\\
\leq\;&F(\mathbf{D}^{\mathrm{gen}}(l-1),\mathbf{T}(l-1),\mathbf{p}(l-1),\mathbf{q}(l-1),\boldsymbol{\pi}(l-1)). && \text{(63)}
\end{aligned}
$$

The first inequality in (63) holds because the problem of optimizing $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ with given $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ is solved optimally, so $\mathbf{D}^{\mathrm{gen}}(l),\mathbf{T}(l)$ achieves an objective function value no larger than that of $\mathbf{D}^{\mathrm{gen}}(l-1),\mathbf{T}(l-1)$. The second inequality in (63) holds because the problem of optimizing $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ with given $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$ is solved optimally, so $\mathbf{p}(l),\mathbf{q}(l),\boldsymbol{\pi}(l)$ achieves an objective function value no larger than that of $\mathbf{p}(l-1),\mathbf{q}(l-1),\boldsymbol{\pi}(l-1)$. Hence, the objective function value is non-increasing over the iterations. Since $F(\mathbf{D}^{\mathrm{gen}},\mathbf{T},\mathbf{p},\mathbf{q},\boldsymbol{\pi})$ is clearly lower-bounded, the BCD method used for iteratively solving problem (22) converges to a local optimal solution. This completes the proof.
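The chain of inequalities in (63) can be mimicked on a toy problem. The Python sketch below runs two-block coordinate descent with exact block minimization on a hypothetical stand-in objective $F(x,y)=x^{2}+y^{2}+xy-3x-3y$, chosen only because each block has a closed-form minimizer; it is not the paper's problem (22). The block $y$ plays the role of $\mathbf{p},\mathbf{q},\boldsymbol{\pi}$ and the block $x$ that of $\mathbf{D}^{\mathrm{gen}},\mathbf{T}$.

```python
def bcd_toy(x, y, n_iter=20):
    """Two-block coordinate descent with exact block minimization.

    Hypothetical stand-in for F in (63): F(x, y) = x^2 + y^2 + x y - 3x - 3y.
    """
    def f(u, v):
        return u * u + v * v + u * v - 3.0 * u - 3.0 * v

    history = [f(x, y)]
    for _ in range(n_iter):
        y = (3.0 - x) / 2.0   # argmin over y with x fixed: 2y + x - 3 = 0
        history.append(f(x, y))
        x = (3.0 - y) / 2.0   # argmin over x with y fixed: 2x + y - 3 = 0
        history.append(f(x, y))
    return x, y, history


x, y, hist = bcd_toy(5.0, -4.0)
print(x, y, hist[-1])
```

Every recorded value is no larger than the previous one, as in (63). Because this toy objective is convex, BCD reaches its global minimum $F(1,1)=-3$; for the non-convex problem (22), the monotonicity argument guarantees convergence of the objective value but not global optimality.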

References

[1] F. Javed, M. K. Afzal, M. Sharif, and B.-S. Kim, “Internet of things (IoT) operating systems support, networking technologies, applications, and challenges: A comparative review,” IEEE Commun. Surveys Tuts., vol. 20, no. 3, pp. 2062–2100, 2018.
[2] L. U. Khan, W. Saad, Z. Han, E. Hossain, and C. S. Hong, “Federated learning for internet of things: Recent advances, taxonomy, and open challenges,” IEEE Commun. Surveys Tuts., vol. 23, no. 3, pp. 1759–1799, 2021.
[3] S. Wang, Y.-C. Wu, M. Xia, R. Wang, and H. V. Poor, “Machine intelligence at the edge with learning centric power allocation,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7293–7308, 2020.
[4] A. Celik and A. M. Eltawil, “At the dawn of generative AI era: A tutorial-cum-survey on new frontiers in 6G wireless intelligence,” IEEE Open J. Commun. Soc., vol. 5, pp. 2433–2489, 2024.
[5] A. Karapantelakis, P. Alizadeh, A. Alabassi, K. Dey, and A. Nikou, “Generative AI in mobile networks: A survey,” Ann. Telecommun., vol. 79, no. 1, pp. 15–33, 2024.
[6] M. Xu et al., “Unleashing the power of edge-cloud generative AI in mobile networks: A survey of AIGC services,” IEEE Commun. Surveys Tuts., 2024.
[7] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 269–283, 2020.
[8] Z. Yang, M. Chen, W. Saad, C. S. Hong, and M. Shikh-Bahaei, “Energy efficient federated learning over wireless communication networks,” IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1935–1949, 2021.
[9] P. S. Bouzinis, P. D. Diamantoulakis, and G. K. Karagiannidis, “Wireless quantized federated learning: a joint computation and communication design,” IEEE Trans. Commun., vol. 71, no. 5, pp. 2756–2770, 2023.
[10] J. Zhang, S. Chen, X. Zhou, X. Wang, and Y.-B. Lin, “Joint scheduling of participants, local iterations, and radio resources for fair federated learning over mobile edge networks,” IEEE Trans. Mobile Comput., vol. 22, no. 07, pp. 3985–3999, 2023.
[11] H. Zhao, M. Zhou, W. Xia, Y. Ni, G. Gui, and H. Zhu, “Economic and energy-efficient wireless federated learning based on stackelberg game,” IEEE Trans. Veh. Technol., vol. 73, no. 2, pp. 2995–2999, 2024.
[12] D. Xu and H. Zhu, “Sum-rate maximization of wireless powered primary users for cooperative CRNs: NOMA or TDMA at cognitive users?” IEEE Trans. Commun., vol. 69, no. 7, pp. 4862–4876, 2021.
[13] Y. Liu et al., “Evolution of NOMA toward next generation multiple access (NGMA) for 6G,” IEEE J. Sel. Areas Commun., vol. 40, no. 4, pp. 1037–1071, 2022.
[14] D. Xu, “Device scheduling and computation offloading in mobile edge computing networks: A novel NOMA scheme,” IEEE Trans. Veh. Technol., early access, Jan. 10, 2024, doi: 10.1109/TVT.2024.3352262.
[15] Y. Wu, Y. Song, T. Wang, L. Qian, and T. Q. Quek, “Non-orthogonal multiple access assisted federated learning via wireless power transfer: A cost-efficient approach,” IEEE Trans. Commun., vol. 70, no. 4, pp. 2853–2869, 2022.
[16] M. S. Al-Abiad, M. Z. Hassan, and M. J. Hossain, “Energy-efficient resource allocation for federated learning in NOMA-enabled and relay-assisted Internet of Things networks,” IEEE Internet Things J., vol. 9, no. 24, pp. 24 736–24 753, 2022.
[17] W. Li, T. Lv, Y. Cao, W. Ni, and M. Peng, “Multi-carrier NOMA-empowered wireless federated learning with optimal power and bandwidth allocation,” IEEE Trans. Wireless Commun., vol. 22, no. 12, pp. 9762–9777, 2023.
[18] B. Wu, F. Fang, and X. Wang, “Joint age-based client selection and resource allocation for communication-efficient federated learning over NOMA networks,” IEEE Trans. Commun., vol. 72, no. 1, pp. 179–192, 2024.
[19] G. Liu et al., “Semantic communications for artificial intelligence generated content (AIGC) toward effective content creation,” IEEE Netw., early access, Jan. 11, 2024, doi: 10.1109/MNET.2024.3352917.
[20] R. Cheng, Y. Sun, D. Niyato, L. Zhang, L. Zhang, and M. A. Imran, “A wireless AI-generated content (AIGC) provisioning framework empowered by semantic communication,” arXiv preprint arXiv:2310.17705, 2023.
[21] T. Wu et al., “CDDM: Channel denoising diffusion models for wireless semantic communications,” IEEE Trans. Wireless Commun., early access, Mar. 28, 2024, doi: 10.1109/TWC.2024.3379244.
[22] Y. Liu et al., “Blockchain-empowered lifecycle management for AI-generated content products in edge networks,” IEEE Wireless Commun., early access, Feb. 05, 2024, doi: 10.1109/MWC.003.2300053.
[23] H. Du et al., “Exploring collaborative distributed diffusion-based AI-generated content (AIGC) in wireless networks,” IEEE Netw., early access, Jul. 03, 2024, doi: 10.1109/MNET.006.2300223.
[24] ——, “Diffusion-based reinforcement learning for edge-enabled AI-generated content services,” IEEE Trans. Mobile Comput., early access, Jan. 19, 2024, doi: 10.1109/TMC.2024.3356178.
[25] X. Huang et al., “Federated learning-empowered AI-generated content in wireless networks,” IEEE Netw., early access, Jan. 12, 2024, doi: 10.1109/MNET.2024.3353377.
[26] P. Li et al., “Filling the missing: Exploring generative AI for enhanced federated learning over heterogeneous mobile edge devices,” IEEE Trans. Mobile Comput., early access, Feb. 29, 2024, doi: 10.1109/TMC.2024.3371772.
[27] D. Xu, “Latency minimization for TDMA-based wireless federated learning networks,” IEEE Trans. Veh. Technol., early access, Apr. 16, 2024, doi: 10.1109/TVT.2024.3389972.
[28] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” http://cvxr.com/cvx, Mar. 2014.
[29] Z. Yang, M. Chen, W. Saad, W. Xu, and M. Shikh-Bahaei, “Sum-rate maximization of uplink rate splitting multiple access (RSMA) communication,” IEEE Trans. Mobile Comput., vol. 21, no. 07, pp. 2596–2609, 2022.
[30] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[31] “Further advancements for E-UTRA physical layer aspects (release 9),” 3GPP, TS 36.814 (V9.0.0), Mar. 2010.
[32] P. S. Bouzinis, P. D. Diamantoulakis, and G. K. Karagiannidis, “Wireless federated learning (WFL) for 6G networks–part I: Research challenges and future trends,” IEEE Commun. Lett., vol. 26, no. 1, pp. 3–7, 2022.
