Knowledge-Assisted Actor Critic Proximal Policy Optimization-Based Service Function Chain Reconfiguration Algorithm for 6G IoT Scenario

Bei Liu; Shuting Long; Xin Su

doi:10.3390/e26100820

Entropy (Basel). 2024 Sep 25;26(10):820. doi: 10.3390/e26100820.

Authors

Bei Liu¹, Shuting Long¹, Xin Su²

Affiliations

¹ School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
² Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.

Abstract

Future 6G networks will inherit and further develop the Network Function Virtualization (NFV) architecture. An NFV-enabled network architecture makes it possible to establish different virtual networks within the same infrastructure, create different Virtual Network Functions (VNFs) in different virtual networks, and form Service Function Chains (SFCs) that meet different service requirements through the orderly combination of VNFs. These SFCs can be deployed to physical entities as needed to provide network functions that support different services. To meet the highly dynamic service requirements of the future 6G Internet of Things (IoT) scenario, a highly flexible and efficient SFC reconfiguration algorithm is a key research direction. Deep-learning-based algorithms have shown their advantages in solving this type of dynamic optimization problem. However, the efficiency of the traditional Actor Critic (AC) algorithm is limited, as the policy does not directly participate in the value function update. In this paper, we use the Proximal Policy Optimization (PPO) clip function to restrict the difference between the new policy and the old policy, which ensures the stability of the update process. We combine PPO with AC and further introduce historical decision information as network knowledge to provide better initial policies and accelerate training. We also propose a Knowledge-Assisted Actor Critic Proximal Policy Optimization (KA-ACPPO)-based SFC reconfiguration algorithm to ensure the Quality of Service (QoS) of end-to-end services. Simulation results show that the proposed KA-ACPPO algorithm can effectively reduce computing cost and power consumption.
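
The clipping idea mentioned in the abstract can be illustrated with a minimal sketch of the standard PPO clipped surrogate objective. This is not the authors' KA-ACPPO implementation, which the abstract does not specify. The function name `ppo_clip_objective`, its parameters, and the clip range `eps=0.2` are assumptions chosen for illustration only.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    new and old policies. advantages: advantage estimates, e.g. from a critic.
    eps: clip range (0.2 is an assumed, commonly used value).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above `1 + eps` earns no extra objective value, so the update has no incentive to move the new policy further from the old one. With a negative advantage, the same holds for ratios below `1 - eps`.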

Keywords: 6G IoT; ACPPO algorithm; SFC reconfiguration; knowledge assisted.

Grants and funding

This work was supported by the National Key R&D Program of China (No. 2020YFB1806702).