Learning-based Power Control for Secure Covert Semantic Communication

Yansheng Liu, Jinbo Wen, Zongyao Zhang, Kun Zhu, Jiawen Kang

Y. Liu, J. Wen, Z. Zhang, and K. Zhu are with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China, and also with the Collaborative Innovation Center of Novel Software Technology and Industrialization (e-mails: [email protected]; [email protected]; [email protected]; [email protected]). J. Kang is with the School of Automation, Guangdong University of Technology, China (e-mail: [email protected]). Corresponding author: Kun Zhu.
Abstract

Despite progress in semantic communication (SemCom), research on SemCom security is still in its infancy. To bridge this gap, we propose a general covert SemCom framework for wireless networks, reducing eavesdropping risk. Our approach transmits semantic information covertly, making it difficult for wardens to detect. Given the aim of maximizing covert SemCom performance, we formulate a power control problem in covert SemCom under energy constraints. Furthermore, we propose a learning-based approach based on the soft actor-critic algorithm, optimizing the power of the transmitter and friendly jammer. Numerical results demonstrate that our approach effectively enhances the performance of covert SemCom.

Index Terms:
Covert semantic communication, security, power control, deep reinforcement learning.

I Introduction

As a new communication paradigm, semantic communication (SemCom) focuses on conveying task-relevant information at the semantic level rather than on accurate bit-level transmission [1]. As semantic models become more lightweight, this emerging form of communication presents unprecedented opportunities for network edge devices. However, research on SemCom security is still in its infancy. Traditional communication security technologies cannot be directly applied to SemCom [1]. Moreover, highly complex encryption mechanisms impose enormous pressure on low-power network edge devices [2]. With the development of covert communication and SemCom, covert SemCom has emerged as a novel approach to tackle the inherent security challenges in SemCom. In covert SemCom, the jammer transmits noise signals to conceal the transmitter’s activity, so that potential wardens cannot even confirm that communication is taking place [3]. The communication process is therefore hidden. Moreover, even if the communication is partially sniffed, the warden may not understand the true meaning of the sniffed information because semantic-level encoding is adopted [4], which enhances the concealment and security of communication under limited energy conditions.

However, a significant challenge for covert SemCom lies in improving concealment while maximizing the performance of SemCom under limited energy conditions. Current research on covert communication primarily focuses on traditional communication paradigms [5, 6, 7, 8]. In addition, research in the domain of SemCom security is still at a nascent stage [4, 9]. To the best of our knowledge, covert SemCom has merely been mentioned in [10], which is tailored to specific image question-answering tasks, lacking generality. Thus, covert SemCom necessitates more comprehensive investigations.

To address the challenge mentioned above, we propose a novel general covert SemCom framework to tackle the security challenges of SemCom under limited energy conditions. Specifically, we consider the power control problem in covert SemCom to ensure semantic communication quality under overall energy limitations. Then, we propose a learning-based approach based on the Soft Actor-Critic (SAC) algorithm [11] to optimize power control within SemCom systems. This approach not only safeguards the effectiveness of covert communication but also accurately decodes the semantic information. Our contributions are summarized as follows:

  • A general covert semantic system framework: The proposed framework seamlessly integrates covert communication principles, enabling its application across diverse data modalities including text, image, and audio. This ensures its versatility and practicality in the context of multimodal data processing. Irrespective of the data type involved, our framework facilitates efficient and concealed semantic transmission.

  • Power control for covert semantic communication: We consider power control for the covert SemCom scenario composed of a transmitter, a friendly jammer, a receiver, a warden, and a power regulator. Considering energy constraints and the aim of maximizing the performance of covert SemCom, we optimize the power allocation between the transmitter and the jammer, thus enhancing covert SemCom performance under limited energy conditions.

  • Learning-based approach for covert semantic communication power control: We propose a SAC-based approach to power control in covert semantic communication, which not only safeguards the efficiency of covert communication but also achieves high-quality semantic decoding. Numerical results demonstrate that the proposed approach outperforms other deep reinforcement learning algorithms.

II System Model and Problem Formulation

II-A System Model

The proposed framework for covert SemCom is depicted in Fig. 1. The communication scenario is composed of five components: a transmitter, a receiver, a friendly jammer, a warden, and a power regulator, all of which operate within an open wireless environment. For simplicity, we take text SemCom as an example. The primary objective of the network system is to enable the transmitter to send extracted textual semantic information to the receiver without being detected by the warden, while ensuring that the receiver can successfully comprehend the semantic information. The transmitter consists of a semantic encoder that extracts semantic features from the text to be transmitted and a channel encoder that generates symbols to facilitate subsequent transmission. The receiver is equipped with a channel decoder for symbol detection and a semantic decoder for text estimation [12].

To enhance communication security, we embed covert communication techniques throughout the system by introducing a friendly jammer and a power regulator into the environment, where the power regulator optimizes power control between the transmitter and the jammer. The role of the jammer is to inject noise that hinders any potential warden from intercepting the communication, thereby preventing unauthorized access to the transmitted information. The role of the power regulator is to formulate and enact power strategies, gather feedback from the receiver, and continually fine-tune these strategies for optimum performance. Unlike other physical layer security methods, covert communication improves the security of information by hiding the act of communication itself [3]. Specifically, the warden employs statistical methods to determine the transmitter’s activity status, assessing two possible scenarios by collecting signal samples from the channel. The null hypothesis, $\mathcal{H}_{0}$, indicates that the transmitter is inactive, in which case the signal power detected by the warden can be expressed as [13]

$$P_{\text{eval}_{\mathcal{H}_{0}}}=P_{j}\left(\frac{\lambda}{4\pi D_{jw}}\right)^{\alpha_{jw}}\left|h_{jw}\right|^{2}+\mathcal{K}^{2}. \tag{1}$$

The alternative hypothesis, $\mathcal{H}_{1}$, indicates that the transmitter is active. Under this hypothesis, the signal power detected by the warden can be expressed as [13]

$$P_{\text{eval}_{\mathcal{H}_{1}}}=\mathcal{K}^{2}+P_{t}\left(\frac{\lambda}{4\pi D_{tw}}\right)^{\alpha_{tw}}|h_{tw}|^{2}+P_{j}\left(\frac{\lambda}{4\pi D_{jw}}\right)^{\alpha_{jw}}|h_{jw}|^{2}, \tag{2}$$

where $P_{t}$ is the transmit power, $P_{j}$ is the jamming power, $\mathcal{K}^{2}$ is the Gaussian noise, and $\lambda$ is the wavelength of the carrier signal. The distances between the jammer and the warden, and between the transmitter and the warden, are given by $D_{jw}$ and $D_{tw}$, respectively. The path loss exponents $\alpha_{jw}$ and $\alpha_{tw}$ account for the attenuation of the signal as it propagates through the medium on their respective links. The small-scale fading effects are captured by $h_{jw}$ and $h_{tw}$, representing the multipath propagation effects. The wavelength $\lambda$ can be derived from the frequency of the signal as $\lambda=\frac{c}{f}$, where $c$ is the speed of light in a vacuum and $f$ is the frequency of the transmitted signal.
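As a minimal sketch of (1) and (2), the following Python snippet evaluates the power seen by the warden under both hypotheses. The function names, the 2.4 GHz carrier frequency, and the numeric values in the usage lines are illustrative assumptions and do not come from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # c in m/s


def path_gain(wavelength, distance, alpha, h):
    """Link gain (lambda / (4 pi D))^alpha * |h|^2 used in (1) and (2)."""
    return (wavelength / (4.0 * math.pi * distance)) ** alpha * abs(h) ** 2


def warden_power(p_t, p_j, wavelength, d_tw, d_jw,
                 alpha_tw, alpha_jw, h_tw, h_jw,
                 noise_power, transmitter_active):
    """Power observed by the warden: (1) if inactive, (2) if active."""
    power = p_j * path_gain(wavelength, d_jw, alpha_jw, h_jw) + noise_power
    if transmitter_active:
        power += p_t * path_gain(wavelength, d_tw, alpha_tw, h_tw)
    return power


# Illustrative values only (assumed, not taken from the paper).
wavelength = SPEED_OF_LIGHT / 2.4e9  # lambda = c / f for an assumed 2.4 GHz carrier
p_h0 = warden_power(10.0, 100.0, wavelength, 100.0, 100.0,
                    1.2, 1.4, 1.0, 1.0, 1e-3, transmitter_active=False)
p_h1 = warden_power(10.0, 100.0, wavelength, 100.0, 100.0,
                    1.2, 1.4, 1.0, 1.0, 1e-3, transmitter_active=True)
print(p_h0, p_h1)
```

The only difference between the two hypotheses is the additional transmitter term, which is what the warden tries to detect.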

Figure 1: An illustration of learning-based power control for secure covert semantic communication.

Based on the aforementioned hypotheses, the decision of the warden is denoted as $\mathcal{D}_{0}$ for the null hypothesis $\mathcal{H}_{0}$ and $\mathcal{D}_{1}$ for the alternative hypothesis $\mathcal{H}_{1}$, following a threshold rule [14]. The warden employs statistical hypothesis testing to classify the transmitter’s activity as either $\mathcal{H}_{0}$ or $\mathcal{H}_{1}$. Specifically, there are two types of detection errors for the warden: a false alarm, where the decision is $\mathcal{D}_{1}$ under $\mathcal{H}_{0}$, and a miss detection, where the decision is $\mathcal{D}_{0}$ under $\mathcal{H}_{1}$. The Detection Error Probability (DEP) can be characterized as

$$\mathbb{P}_{\text{DEP}}=\mathbb{P}_{\text{FA}}+\mathbb{P}_{\text{MD}}=\mathbb{P}(P_{\text{eval}_{\mathcal{H}_{0}}}>\tau)+\mathbb{P}(P_{\text{eval}_{\mathcal{H}_{1}}}<\tau), \tag{3}$$

where $\tau$ denotes the detection threshold, $\mathbb{P}_{\text{FA}}$ denotes the probability of false alarm, and $\mathbb{P}_{\text{MD}}$ denotes the probability of miss detection. Covert communication is considered successful when the DEP exceeds a threshold $\zeta_{\text{th}}$ that is close to 1 [13].
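The DEP in (3) can be estimated numerically. The sketch below is a Monte Carlo estimate under stated assumptions: the large-scale gains `g_tw` and `g_jw` stand for $(\lambda/4\pi D)^{\alpha}$ on each link, and $|h|^{2}$ is drawn from a unit-mean Gamma distribution with shape $\mu$, which corresponds to $\alpha$-$\mu$ fading with $\alpha=2$ as configured in Section IV. The function names and sample count are hypothetical, and this is not the authors' evaluation code.

```python
import random


def sample_fading_power(mu, rng):
    """Draw |h|^2 as Gamma(shape=mu, scale=1/mu), i.e. unit mean (assumed model)."""
    return rng.gammavariate(mu, 1.0 / mu)


def estimate_dep(p_t, p_j, g_tw, g_jw, noise_power, tau,
                 mu=4.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of P_DEP = P_FA + P_MD from (3)."""
    rng = random.Random(seed)
    false_alarms = 0
    miss_detections = 0
    for _ in range(n_samples):
        # H0: only the jammer and noise are present, as in (1).
        p_h0 = p_j * g_jw * sample_fading_power(mu, rng) + noise_power
        if p_h0 > tau:
            false_alarms += 1
        # H1: transmitter, jammer, and noise are present, as in (2).
        p_h1 = (p_t * g_tw * sample_fading_power(mu, rng)
                + p_j * g_jw * sample_fading_power(mu, rng)
                + noise_power)
        if p_h1 < tau:
            miss_detections += 1
    return (false_alarms + miss_detections) / n_samples


# Illustrative call with assumed, unit-free values.
print(estimate_dep(p_t=0.5, p_j=5.0, g_tw=1.0, g_jw=1.0, noise_power=0.1, tau=5.0))
```

When the transmit power is zero the two hypotheses are statistically identical, so the estimate approaches 1, which is the regime the covertness constraint aims for.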

II-B Problem Formulation

For simplicity, taking text SemCom as an example, we employ the SemCom system based on the transformer architecture designed in [12]. In this scheme, the transmitter processes the input raw data, referred to as $Info_{s}$. Upon receiving the semantic information, the receiver utilizes a corresponding decoder to reconstruct the original informational content, hereafter referred to as $Info_{r}$.

During channel transmission, the Signal-to-Interference-plus-Noise Ratio (SINR) is influenced by the power of the transmitter and the jammer, which directly alters the Bit Error Rate (BER) and consequently affects the receiver’s information reconstruction accuracy. For text transmission, however, BER does not accurately reflect semantic performance. Besides human judgment of the similarity between sentences, the Bilingual Evaluation Understudy (BLEU) score is commonly used to measure results in machine translation [15]. We therefore adopt the BLEU metric as our objective function to assess the semantic retention and reconstruction quality of the information. The energy factors $\eta_{t}$ and $\eta_{j}$ denote the energy cost per unit of transmit power and jamming power, respectively. We formulate the optimization problem as

$$\begin{aligned}
\max_{P_{t},P_{j}}\quad & \text{BLEU}(Info_{s},Info_{r}) && \text{(4a)}\\
\text{s.t.}\quad & \mathbb{P}_{\text{DEP}}>\zeta_{\text{th}}, && \text{(4b)}\\
& \eta_{t}P_{t}\leq E_{t}, && \text{(4c)}\\
& \eta_{j}P_{j}\leq E_{j}, && \text{(4d)}
\end{aligned}$$

where (4b) is the covert communication constraint with threshold $\zeta_{\text{th}}$, i.e., covert communication is considered successful when the DEP exceeds $\zeta_{\text{th}}$, which is close to 1 [13]. Constraints (4c) and (4d) are the energy constraints of the transmitter and the jammer, respectively.
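To make the objective in (4a) concrete, the following is a simplified, single-reference, sentence-level BLEU sketch with clipped n-gram precision and a brevity penalty. It applies no smoothing and uses whitespace tokenization, both of which are assumptions for illustration. The paper does not state its exact BLEU configuration, so this should not be read as the authors' implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Simplified BLEU between a source sentence and its reconstruction."""
    ref = reference.split()
    cand = candidate.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty discourages reconstructions shorter than the source.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


info_s = "the cat sat on the mat"
info_r = "the cat sat on a mat"
print(bleu(info_s, info_r, max_n=1))
```

A perfect reconstruction scores 1, and word errors introduced by a low SINR reduce the score, which is why BLEU serves as the semantic objective.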

Our optimization problem aims to maximize the BLEU score while keeping the energy consumption of the transmitter and the jammer, $\eta_{t}\times P_{t}$ and $\eta_{j}\times P_{j}$, within the budgets $E_{t}$ and $E_{j}$ and satisfying the covert communication constraint. Under limited energy $E_{t}$ and $E_{j}$, a lower transmit power $P_{t}$ or a higher jamming power $P_{j}$ makes the covert communication constraint easier to maintain. However, such a setting tends to raise the bit error probability (BEP) due to a reduced SINR, which in turn diminishes the BLEU score. Joint optimization is therefore essential to balance covertness against the quality of the reconstructed information. The problem is complicated because there is no explicit mathematical relationship between the semantic metric BLEU and the powers $P_{t}$ and $P_{j}$, which makes it challenging to solve with traditional mathematical techniques. Deep reinforcement learning, particularly SAC, is effective in dynamic and high-dimensional environments, as it adapts to environmental changes and learns efficiently in large problem spaces. Its maximum-entropy principle encourages thorough exploration during decision-making, which helps avoid local optima and enhances the robustness of learning [11]. Therefore, we adopt SAC for optimal power control and present the solution to the optimization problem in Section III.

III SAC-based Optimal Power Control

We now elaborate on the use of the SAC algorithm for optimal power control. The algorithm outputs a power control scheme $\{P_{t},P_{j}\}$, i.e., the optimization variables in (4a). The reward of the agent (i.e., the power regulator) is determined by the actions it executes in a particular state of the environment. The details of the SAC formulation are as follows:

  • State space: The state, denoted as $\mathbf{s}$, is a vector of measurements that captures the conditions of the environment. In line with Section II-B, it comprises all variables that influence the optimal power control scheme, given by

    $$\begin{split}\mathbf{s}=\{&D_{\text{tw}},D_{\text{tr}},D_{\text{jw}},D_{\text{jr}},\alpha_{\text{tw}},\alpha_{\text{tr}},\alpha_{\text{jw}},\alpha_{\text{jr}},\\&\zeta_{\text{th}},h_{\text{tw}},h_{\text{tr}},h_{\text{jw}},h_{\text{jr}},\mathcal{K}^{2},\lambda,\tau\}.\end{split} \tag{5}$$
  • Action space: Based on the environment state, the agent initiates an action, denoted as $\mathbf{a}$. The action space represents the control actions taken by the power regulator, structured with several control variables. The proposed scheme uses the transmit power $P_{t}$ and the jamming power $P_{j}$ as the actions of the power regulator, given by

    $$\mathbf{a}=\{P_{t},P_{j}\}. \tag{6}$$
  • Immediate reward: In optimal power control, the goal is to maximize the BLEU score while keeping the energy consumption $\eta_{t}\times P_{t}$ and $\eta_{j}\times P_{j}$ within budget and satisfying the covert communication constraint. To speed up convergence, we incorporate the covertness and energy constraints into the reward function, which is expressed as

    $$R(t)=\begin{cases}\text{BLEU}(t), & \text{if (4b), (4c), and (4d) are met},\\ \Psi, & \text{otherwise},\end{cases} \tag{7}$$

    where $R(t)$ is the reward at time $t$, $\text{BLEU}(t)$ represents the degree of semantic retention and reconstruction quality between the source data and the reconstructed data at time $t$, and $\Psi$ represents the penalty factor, which varies with the scale of the environment.

  • Value function: SAC optimizes a stochastic policy, maximizing both long-term entropy and expected lifetime reward [11]. It concurrently learns a policy $\pi_{\tau}(\mathbf{a}|\mathbf{s})$ and two Q-functions, $Q_{\theta_{1}}$ and $Q_{\theta_{2}}$, and uses the minimum of the two Q-values to construct the target in the Bellman error functions, which is given by

    $$y(r,\mathbf{s},d)=\gamma(1-d)\Big(\min_{i=1,2}Q_{\theta_{i}}(\mathbf{s},\mathbf{a})-\alpha\log\pi_{\tau}(\mathbf{a}|\mathbf{s})\Big)+r, \tag{8}$$

    where $d$ is the done signal, $\alpha$ is the coefficient that controls the trade-off between reward and entropy, $\gamma$ is the discount factor, and $r$ denotes the reward. In every state, the policy executes actions to maximize the expected future return along with the expected future entropy, i.e., it optimizes the state value function $V^{\pi}(\mathbf{s})$, which is given by

    $$V^{\pi}(\mathbf{s})=\mathbb{E}_{\mathbf{a}\sim\pi}\left[Q^{\pi_{\tau}}(\mathbf{s},\mathbf{a})-\alpha\log\pi_{\tau}(\mathbf{a}|\mathbf{s})\right]. \tag{9}$$
  • SAC algorithm design: We consider the power regulator as a SAC agent, and a stochastic policy $\pi(\mathbf{a}_{n}|\mathbf{s}_{n})$ is defined to map states to a probability distribution over actions. At each training iteration, the power regulator observes the state $\mathbf{s}_{n}$ and executes an action $\mathbf{a}_{n}$ sampled from the current policy $\pi$. Then, the environment transitions to the next state $\mathbf{s}_{n+1}$ and sends the reward $r_{n}$ back to the power regulator. The resulting transition tuple $\mathbf{e}_{n}$ is stored as experience in a data set $\mathcal{D}$ called the replay buffer. By sampling from $\mathcal{D}$ and updating the policy periodically according to a learning algorithm, the power regulator eventually finds an optimal policy $\pi^{*}$ that maximizes the long-term reward (10). The specific learning process is given in Algorithm 1.

    $$R_{\pi}=\mathbb{E}_{\pi}\left[\sum_{n=0}^{\infty}\gamma^{n}r(\mathbf{s}_{n},\mathbf{a}_{n})\right]. \tag{10}$$
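The reward in (7) and the long-term reward in (10) can be sketched as follows. This is a minimal illustration: the function names are hypothetical, the penalty value of -1.0 in the usage lines is an assumed choice for $\Psi$ (the paper only states that it varies with the environment), and the infinite sum in (10) is truncated to a finite episode.

```python
def reward(bleu_score, dep, p_t, p_j, zeta_th, eta_t, eta_j, e_t, e_j, penalty):
    """Reward (7): BLEU if (4b), (4c), and (4d) all hold, otherwise the penalty Psi."""
    covert = dep > zeta_th             # (4b)
    tx_within = eta_t * p_t <= e_t     # (4c)
    jam_within = eta_j * p_j <= e_j    # (4d)
    return bleu_score if (covert and tx_within and jam_within) else penalty


def discounted_return(rewards, gamma):
    """Finite-horizon version of (10) for one sampled trajectory."""
    total = 0.0
    # Accumulate backwards so that each step applies one factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Illustrative values only (assumed): one feasible and one infeasible step.
r_ok = reward(0.8, dep=0.97, p_t=1.0, p_j=2.0, zeta_th=0.95,
              eta_t=1.0, eta_j=1.0, e_t=5.0, e_j=5.0, penalty=-1.0)
r_bad = reward(0.9, dep=0.60, p_t=1.0, p_j=2.0, zeta_th=0.95,
               eta_t=1.0, eta_j=1.0, e_t=5.0, e_j=5.0, penalty=-1.0)
print(r_ok, r_bad, discounted_return([r_ok, r_bad], gamma=0.99))
```

Because any violated constraint replaces the BLEU score with the penalty, the agent is steered toward actions that satisfy covertness and energy budgets before it improves semantic quality.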
Algorithm 1 SAC-based Power Control in Covert SemCom
1:Input: States from the environment 𝐬𝐬\mathbf{s}bold_s;
2:Output: Actions to the environment 𝐚={Pt,Pj,S}𝐚subscript𝑃𝑡subscript𝑃𝑗𝑆\mathbf{a}=\{P_{t},P_{j},S\}bold_a = { italic_P start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_S };
3:Initialize policy network parameters τ𝜏\tauitalic_τ, Q-function parameters θ1subscript𝜃1\theta_{1}italic_θ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and θ2subscript𝜃2\theta_{2}italic_θ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT;
4:Initialize the target network parameters to the primary parameters: $\hat{\theta}_{1}\leftarrow\theta_{1}$, $\hat{\theta}_{2}\leftarrow\theta_{2}$;
5:for episodes $e=1,\ldots,N$ do
6:     Reset environment state $s_{0}$ and replay buffer $\mathcal{D}$;
7:     for time-slot $t=1,\ldots,T$ do
8:         Take action $\mathbf{a}(t)$ based on $\pi_{\tau}(.|\mathbf{s})$;
9:         Execute the control actions in the environment;
10:         Calculate rewards through (7);
11:         Update $\mathbf{s}_{t}$ into $\mathbf{s}_{t+1}$, and store the transition into $\mathcal{D}$;
12:         Sample a random mini-batch of data $\mathcal{B}$ of size $N$ from $\mathcal{D}$;
13:         Update the target Q functions using (8);
14:         Update the Q function as
15:$\Delta\theta_{i}=\frac{1}{|N|}\sum_{(\mathbf{s},\mathbf{a},r,\mathbf{s}')\in\mathcal{B}}\left(Q_{\theta_{i}}(\mathbf{s},\mathbf{a})-y(r,\mathbf{s}',d)\right)^{2}$, for $i=1,2$;
16:         Update the target network as
17:$\hat{\theta}_{i}\leftarrow\beta\hat{\theta}_{i}+(1-\beta)\theta_{i}$, for $i=1,2$;
18:         Update the policy network as
19:$\Delta\tau=\frac{1}{|N|}\sum_{\mathbf{s}\in\mathcal{B}}\left(Q_{\theta_{i}}(\mathbf{s},a_{\tau}(\mathbf{s}))-\alpha\log\pi_{\tau}(a_{\tau}(\mathbf{s})|\mathbf{s})-y(r,\mathbf{s}',d)\right)^{2}$, for $i=1,2$;
20:     end for
21:     if reach maximum episodes N then
22:         Break;
23:     end if
24:end for
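
As a minimal illustration of two steps of the algorithm, the sketch below computes the mini-batch squared-error term used in the Q-function update and applies the soft target-network update with coefficient β. It is a plain-Python sketch under stated assumptions, not the authors' implementation: the function names `critic_loss` and `soft_update` are hypothetical, network parameters are represented as flat lists of floats, and the target values y are assumed to have been computed beforehand.

```python
from typing import List, Sequence


def critic_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) and the target y over a mini-batch."""
    if len(q_values) != len(targets):
        raise ValueError("q_values and targets must have the same length")
    if not q_values:
        raise ValueError("mini-batch must not be empty")
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target: Sequence[float], primary: Sequence[float], beta: float) -> List[float]:
    """Return beta * target + (1 - beta) * primary, element by element."""
    if len(target) != len(primary):
        raise ValueError("parameter vectors must have the same length")
    return [beta * t + (1.0 - beta) * p for t, p in zip(target, primary)]


if __name__ == "__main__":
    # Hypothetical toy values, only to show the call pattern.
    print(critic_loss([1.0, 2.0], [0.0, 4.0]))
    print(soft_update([0.0, 1.0], [1.0, 1.0], beta=0.5))
```

A value of β close to 1 makes the target parameters track the primary parameters slowly, which is the usual reason for keeping a separate target network.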

IV Numerical Results

We illustrate the effects of increasing the transmit power on the covert rate (defined as the data rate attainable during covert communication), the DEP, and the BEP. These effects are observed when the jamming power $P_{j}$ is set to $45\,\mathrm{dBW}$ and $\tau$ is $50$. Within a Cartesian coordinate grid with units in meters, the transmitter, warden, receiver, jammer, and power regulator are positioned at coordinates $(0,0)$, $(0,100)$, $(100,0)$, $(100,100)$, and $(50,50)$, respectively. The path loss exponents are set to $\alpha_{tr}=1$, $\alpha_{tw}=1.2$, and $\alpha_{jw}=\alpha_{jr}=1.4$. The small-scale fading coefficients $h_{tw}$, $h_{tr}$, $h_{jw}$, and $h_{jr}$ follow the $\alpha$-$\mu$ fading model with parameters $\alpha=2$ and $\mu=4$ [16].
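
The setup above can be reproduced numerically with a short sketch. The code below encodes the node coordinates and path loss exponents from the text, converts dBW to watts, and draws an α-μ fading sample. Several parts are assumptions rather than details taken from the paper: the large-scale gain is modeled as d^(-α), the α-μ envelope is normalized with an α-root mean value `r_hat = 1`, and all function and variable names are hypothetical. The sampler relies on the property that the envelope raised to the power α is gamma distributed with shape μ and scale `r_hat**alpha / mu`.

```python
import math
import random

# Node coordinates in meters, as listed in the simulation setup.
NODES = {
    "transmitter": (0.0, 0.0),
    "warden": (0.0, 100.0),
    "receiver": (100.0, 0.0),
    "jammer": (100.0, 100.0),
}

# Path loss exponents for each (source, destination) link.
PATH_LOSS_EXP = {
    ("transmitter", "receiver"): 1.0,
    ("transmitter", "warden"): 1.2,
    ("jammer", "warden"): 1.4,
    ("jammer", "receiver"): 1.4,
}


def dbw_to_watt(p_dbw: float) -> float:
    """Convert a power level from dBW to watts."""
    return 10.0 ** (p_dbw / 10.0)


def distance(src: str, dst: str) -> float:
    """Euclidean distance between two named nodes, in meters."""
    return math.dist(NODES[src], NODES[dst])


def path_gain(src: str, dst: str) -> float:
    """Large-scale gain under the assumed d^(-alpha) path loss model."""
    return distance(src, dst) ** (-PATH_LOSS_EXP[(src, dst)])


def sample_alpha_mu_power_gain(alpha: float = 2.0, mu: float = 4.0, r_hat: float = 1.0) -> float:
    """Draw one squared envelope |h|^2 from an alpha-mu fading channel."""
    # envelope**alpha ~ Gamma(shape=mu, scale=r_hat**alpha / mu)
    r_alpha = random.gammavariate(mu, r_hat ** alpha / mu)
    envelope = r_alpha ** (1.0 / alpha)
    return envelope ** 2


if __name__ == "__main__":
    for link in PATH_LOSS_EXP:
        print(link, distance(*link), path_gain(*link))
    print("jamming power in watts:", dbw_to_watt(45.0))
    print("one fading sample:", sample_alpha_mu_power_gain())
```

With this geometry all four links have the same length, so under the assumed model the differences between links come only from the path loss exponents and the fading samples.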

The adopted modulation scheme is binary phase-shift keying (BPSK). As demonstrated in Fig. 2, for a given jamming power, increasing the transmit power directly increases the covert rate. However, as the transmit power increases, the signal power received by the warden gradually reaches its detection range. This increases the probability of being detected by the warden, i.e., the DEP of the warden gradually decreases. At the receiver, the environment contains not only data signals from the transmitter but also noise signals from the friendly jammer. When the transmit power is low and the noise power is high, the final recovery result is very poor because of the high BEP, even though the trained SemCom model can extract the semantic information well. As shown in Fig. 2, a transmit power of approximately $49.5\,\mathrm{dBW}$ ensures covert communication while creating good channel conditions for semantic information reconstruction.
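
The dependence of the BEP on the transmit and jamming powers can be illustrated with the textbook BPSK bit error expression for an additive white Gaussian noise channel, 0.5·erfc(√γ), where γ is the received signal-to-interference-plus-noise ratio. This is a sketch under assumptions and not necessarily the expression used in this paper: the jamming signal is treated as additional Gaussian noise, fading is folded into fixed link gains, and the names `sinr` and `bpsk_bep` as well as the numeric values in the demo are hypothetical.

```python
import math


def sinr(p_t: float, g_tr: float, p_j: float, g_jr: float, noise: float) -> float:
    """Received SINR at the receiver, with jamming treated as extra noise."""
    if noise <= 0.0:
        raise ValueError("noise power must be positive")
    return (p_t * g_tr) / (p_j * g_jr + noise)


def bpsk_bep(gamma: float) -> float:
    """Textbook BPSK bit error probability over AWGN at linear SINR gamma."""
    if gamma < 0.0:
        raise ValueError("SINR must be non-negative")
    return 0.5 * math.erfc(math.sqrt(gamma))


if __name__ == "__main__":
    # Hypothetical link gains and noise power, chosen only for illustration.
    g_tr, g_jr, noise = 1e-2, 1.6e-3, 1.0
    p_j = 10.0 ** (45.0 / 10.0)  # jamming power of 45 dBW in watts
    for p_t_dbw in (40.0, 45.0, 50.0):
        p_t = 10.0 ** (p_t_dbw / 10.0)
        print(p_t_dbw, bpsk_bep(sinr(p_t, g_tr, p_j, g_jr, noise)))
```

Under these assumptions the BEP decreases monotonically as the transmit power grows for a fixed jamming power, which matches the qualitative trend described for the receiver.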

Figure 2: The covert rate, DEP, and BEP versus the transmit power, with the jamming power $P_{j}$ set to $45\,\mathrm{dBW}$.
Figure 3: Test rewards of the proposed SAC-based algorithm and other algorithms, with batch size $N=512$, discount factor $\gamma=0.95$, and a learning rate of $10^{-4}$ for the actor and critic networks.
Figure 4: The quality of data reconstruction at the receiver with models of different scales under different BEP conditions.

Figure 3 presents the performance analysis of the proposed SAC-based algorithm for optimal power control. We compare the proposed SAC-based algorithm with two other algorithms: 1) a PPO-based algorithm, in which the power regulator prioritizes optimizing control strategies based on data reconstruction performance; and 2) a GDM-based algorithm, in which the power regulator designs schemes by progressively denoising initial Gaussian noise to recover or discover the optimal actions [17]. As shown in Fig. 3, the proposed SAC-based algorithm outperforms the PPO and GDM algorithms. Specifically, the SAC-based algorithm converges to good results faster than PPO, which achieves satisfactory results only after approximately 3500 iterations, indicating that the proposed SAC-based algorithm finds the optimal scheme more effectively. The reason is that SAC, by maximizing policy entropy, converges to the optimal solution more effectively, enabling it to reach better solutions in a shorter time than the PPO and GDM algorithms [11, 17].

Figure 4 shows the reconstruction process under varying BEPs. We can observe that when the BEP is relatively small in Case 1, the large-scale models significantly outperform the other two. This is because the transmission power is higher than the noise, and under sufficiently good channel conditions, large-scale models can effectively extract and reconstruct semantic information. However, as the BEP gradually increases from $0.11$ to $0.24$ in Case 2, small-scale models clearly achieve better performance. This is due to low transmission power or high interference power, which results in a low signal-to-noise ratio. The small-scale models alleviate the interference caused by noise more effectively than the large-scale models. Finally, when the BEP exceeds $0.24$ in Case 3, the channel conditions have become very poor and no type of model is able to decode the semantic information correctly. The reconstructed data always appear as noise, resulting in relatively low BLEU scores. This further emphasizes the importance of optimizing power control.
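
The three cases above can be summarized as a simple threshold rule on the BEP. The sketch below is illustrative only: the function name `reconstruction_regime` and the returned labels are hypothetical, and the boundary between Case 1 and Case 2 is taken to be 0.11 on the assumption that "relatively small" means below the lower end of the Case 2 range.

```python
def reconstruction_regime(bep: float) -> str:
    """Map a bit error probability to the regime described for Fig. 4."""
    if not 0.0 <= bep <= 1.0:
        raise ValueError("BEP must lie in [0, 1]")
    if bep < 0.11:
        # Case 1: good channel, large-scale models reconstruct best.
        return "case 1: large-scale model preferred"
    if bep <= 0.24:
        # Case 2: low SNR, small-scale models are more robust to noise.
        return "case 2: small-scale model preferred"
    # Case 3: channel too poor for any model to decode correctly.
    return "case 3: reconstruction fails"


if __name__ == "__main__":
    for value in (0.05, 0.20, 0.30):
        print(value, "->", reconstruction_regime(value))
```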

V Conclusion

In this paper, we proposed a learning-based approach to optimal power control for covert SemCom, aimed at resolving the security dilemmas inherent in SemCom. Specifically, we introduced a power regulator and a friendly jammer to achieve covert communication within the context of SemCom. To address the increased energy usage introduced by covert SemCom, we formulated power control as an optimization problem and utilized the SAC algorithm to optimize the power of the transmitter and the friendly jammer, striking a balance between the effectiveness of covert transmission and that of SemCom. Numerical results have demonstrated the effectiveness of the proposed optimization strategy within the covert SemCom framework. For future work, we will explore ways to integrate the Mixture of Experts (MoE) concept into our covert SemCom framework to ensure reliable communication across varied network conditions.

References

  • [1] H. Du, J. Wang, D. Niyato, J. Kang, Z. Xiong, M. Guizani, and D. I. Kim, “Rethinking wireless communication security in semantic Internet of Things,” IEEE Wireless Communications, vol. 30, no. 3, pp. 36–43, 2023.
  • [2] A. Chorti, A. N. Barreto, S. Köpsell, M. Zoli, M. Chafii, P. Sehier, G. Fettweis, and H. V. Poor, “Context-aware security for 6G wireless: The role of physical layer security,” IEEE Communications Standards Magazine, vol. 6, no. 1, pp. 102–108, 2022.
  • [3] X. Chen, J. An, Z. Xiong, C. Xing, N. Zhao, F. R. Yu, and A. Nallanathan, “Covert communications: A comprehensive survey,” IEEE Communications Surveys & Tutorials, 2023.
  • [4] Z. Yang, M. Chen, G. Li, Y. Yang, and Z. Zhang, “Secure semantic communications: Fundamentals and challenges,” IEEE Network, 2024.
  • [5] L. Zhang, B. Zhang, R. Guo, Z. Wang, G. Wang, J. Qiu, S. Su, Y. Liu, G. Xu, Z. Tian et al., “Research on covert communication technology based on matrix decomposition of digital currency transaction amount,” KSII Transactions on Internet and Information Systems (TIIS), vol. 18, no. 4, pp. 1020–1041, 2024.
  • [6] I. Makhdoom, M. Abolhasan, and J. Lipman, “A comprehensive survey of covert communication techniques, limitations and future challenges,” Computers & Security, vol. 120, p. 102784, 2022.
  • [7] Y. Zhang, W. Ni, Y. Mao, B. Ning, S. Xiao, W. Tang, and D. Niyato, “Rate-splitting multiple access for covert communications,” IEEE Wireless Communications Letters, vol. 13, no. 6, pp. 1685–1689, 2024.
  • [8] B. Che, H. Shi, W. Yang, X. Guan, and R. Ma, “Covert wireless communication against jamming-assisted proactive detection,” IEEE Wireless Communications Letters, vol. 12, no. 8, pp. 1304–1308, 2023.
  • [9] X. Luo, Z. Chen, M. Tao, and F. Yang, “Encrypted semantic communication using adversarial training for privacy preserving,” IEEE Communications Letters, 2023.
  • [10] Y. Wang, Y. Hu, H. Du, T. Luo, and D. Niyato, “Multi-agent reinforcement learning for covert semantic communications over wireless networks,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2023, pp. 1–5.
  • [11] J. W. Mock and S. S. Muknahallipatna, “A comparison of PPO, TD3 and SAC reinforcement algorithms for quadruped walking gait generation,” Journal of Intelligent Learning Systems and Applications, vol. 15, no. 1, pp. 36–56, 2023.
  • [12] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Transactions on Signal Processing, vol. 69, pp. 2663–2675, 2021.
  • [13] H. Du, G. Liu, D. Niyato, J. Zhang, J. Kang, Z. Xiong, B. Ai, and D. I. Kim, “Generative AI-aided joint training-free secure semantic communications via multi-modal prompts,” in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2024, pp. 12 896–12 900.
  • [14] X. Chen, J. An, Z. Xiong, C. Xing, N. Zhao, F. R. Yu, and A. Nallanathan, “Covert communications: A comprehensive survey,” IEEE Communications Surveys & Tutorials, 2023.
  • [15] M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio, A. Courville, and D. Hjelm, “Mutual information neural estimation,” in International conference on machine learning.   PMLR, 2018, pp. 531–540.
  • [16] M. D. Yacoub, “The $\alpha$-$\mu$ distribution: A physical fading model for the Stacy distribution,” IEEE Transactions on Vehicular Technology, vol. 56, no. 1, pp. 27–34, 2007.
  • [17] J. Wen, J. Nie, J. Kang, D. Niyato, H. Du, Y. Zhang, and M. Guizani, “From generative AI to generative Internet of Things: Fundamentals, framework, and outlooks,” IEEE Internet of Things Magazine, vol. 7, no. 3, pp. 30–37, 2024.