Search Results (863)

Search Parameters:
Keywords = actor–critic

15 pages, 1279 KiB  
Article
Knowledge-Assisted Actor Critic Proximal Policy Optimization-Based Service Function Chain Reconfiguration Algorithm for 6G IoT Scenario
by Bei Liu, Shuting Long and Xin Su
Entropy 2024, 26(10), 820; https://doi.org/10.3390/e26100820 - 25 Sep 2024
Viewed by 294
Abstract
Future 6G networks will inherit and develop the Network Function Virtualization (NFV) architecture. With an NFV-enabled network architecture, it becomes possible to establish different virtual networks within the same infrastructure, create different Virtual Network Functions (VNFs) in different virtual networks, and form Service Function Chains (SFCs) that meet different service requirements through the orderly combination of VNFs. These SFCs can be deployed to physical entities as needed to provide network functions that support different services. To meet the highly dynamic service requirements of the future 6G Internet of Things (IoT) scenario, highly flexible and efficient SFC reconfiguration algorithms are a key research direction. Deep-learning-based algorithms have shown their advantages in solving this type of dynamic optimization problem. The efficiency of the traditional Actor Critic (AC) algorithm is limited because the policy does not directly participate in the value-function update. In this paper, we use the Proximal Policy Optimization (PPO) clip function to restrict the difference between the new policy and the old policy, ensuring the stability of the update process. We combine PPO with AC and further incorporate historical decision information as network knowledge to provide better initial policies and accelerate training. We then propose the Knowledge-Assisted Actor Critic Proximal Policy Optimization (KA-ACPPO)-based SFC reconfiguration algorithm to ensure the Quality of Service (QoS) of end-to-end services. Simulation results show that the proposed KA-ACPPO algorithm can effectively reduce computing cost and power consumption.
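The PPO clip function the abstract refers to is the standard clipped surrogate objective. A minimal PyTorch sketch with illustrative names, not the authors' implementation:

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective: keeps the new policy close to the old
    one by bounding the probability ratio within [1 - eps, 1 + eps]."""
    ratio = torch.exp(log_probs_new - log_probs_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (lower) surrogate, then negate for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```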

27 pages, 2504 KiB  
Perspective
Learning-Based Optimisation for Integrated Problems in Intermodal Freight Transport: Preliminaries, Strategies, and State of the Art
by Elija Deineko, Paul Jungnickel and Carina Kehrt
Appl. Sci. 2024, 14(19), 8642; https://doi.org/10.3390/app14198642 - 25 Sep 2024
Viewed by 352
Abstract
Intermodal freight transport (IFT) requires a large number of optimisation measures to ensure its attractiveness. This involves numerous control decisions on different time scales, making integrated optimisation with traditional methods almost unfeasible. Recently, a new trend in optimisation science has emerged: the application of Deep Learning (DL) to combinatorial problems. Neural combinatorial optimisation (NCO) enables real-time decision-making under uncertainty by considering rich context information—a crucial factor for seamless synchronisation, optimisation, and, consequently, the competitiveness of IFT. The objective of this study is twofold. First, we systematically analyse and identify the key actors, operations, and optimisation problems in IFT and categorise them into six major classes. Second, we collect and structure the key methodological components of the NCO framework, including DL models, training algorithms, and design strategies, and review the current state of the art with a focus on NCO and hybrid DL models. Through this synthesis, we integrate the latest research efforts from three closely related fields: optimisation, transport planning, and NCO. Finally, we critically discuss and outline methodological design patterns and derive potential opportunities and obstacles for learning-based frameworks for integrated optimisation problems. Together, these efforts aim to enable a better integration of advanced DL techniques into transport logistics. We hope that this will help researchers and practitioners in related fields to expand their intuition and foster the development of intelligent decision-making systems and algorithms for tomorrow's transport systems.
(This article belongs to the Special Issue Transportation in the 21st Century: New Vision on Future Mobility)

16 pages, 2895 KiB  
Article
Sequence Decision Transformer for Adaptive Traffic Signal Control
by Rui Zhao, Haofeng Hu, Yun Li, Yuze Fan, Fei Gao and Zhenhai Gao
Sensors 2024, 24(19), 6202; https://doi.org/10.3390/s24196202 - 25 Sep 2024
Viewed by 393
Abstract
Urban traffic congestion poses significant economic and environmental challenges worldwide. To mitigate these issues, Adaptive Traffic Signal Control (ATSC) has emerged as a promising solution. Recent advancements in deep reinforcement learning (DRL) have further enhanced ATSC's capabilities. This paper introduces a novel DRL-based ATSC approach named the Sequence Decision Transformer (SDT), which employs DRL enhanced with attention mechanisms and leverages the robust capabilities of sequence decision models, akin to those used in advanced natural language processing, adapted here to tackle the complexities of urban traffic management. First, the ATSC problem is modeled as a Markov Decision Process (MDP), with the observation space, action space, and reward function carefully defined. Subsequently, we propose SDT, specifically tailored to solve the MDP problem. The SDT model uses a transformer-based architecture with an encoder and decoder in an actor–critic structure. The encoder processes observations and outputs both encoded data for the decoder and value estimates for parameter updates. The decoder, as the policy network, outputs the agent's actions. Proximal Policy Optimization (PPO) is used to update the policy network based on historical data, enhancing decision-making in ATSC. This approach significantly reduces training times, effectively manages larger observation spaces, captures dynamic changes in traffic conditions more accurately, and enhances traffic throughput. Finally, the SDT model is trained and evaluated in synthetic scenarios by comparing the number of vehicles, average speed, and queue length against three baselines: PPO, a DQN tailored for ATSC, and FRAP, a state-of-the-art ATSC algorithm. SDT shows improvements of 26.8%, 150%, and 21.7% over traditional ATSC algorithms, and 18%, 30%, and 15.6% over FRAP. This research underscores the potential of integrating Large Language Models (LLMs) with DRL for traffic management, offering a promising solution to urban congestion.
(This article belongs to the Section Vehicular Sensing)
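The encoder-decoder actor-critic layout described in the abstract can be sketched as follows; layer sizes, head counts, and the pooled value head are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class EncoderDecoderActorCritic(nn.Module):
    """Illustrative transformer actor-critic: the encoder embeds the
    observation sequence and yields a value estimate (critic); the
    decoder serves as the policy head (actor)."""
    def __init__(self, obs_dim, n_actions, d_model=64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.value_head = nn.Linear(d_model, 1)                # critic output
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.query = nn.Parameter(torch.zeros(1, 1, d_model))  # learned action query
        self.policy_head = nn.Linear(d_model, n_actions)       # actor output

    def forward(self, obs_seq):                    # obs_seq: (batch, seq, obs_dim)
        memory = self.encoder(self.embed(obs_seq))
        value = self.value_head(memory.mean(dim=1))            # pooled value estimate
        q = self.query.expand(obs_seq.size(0), -1, -1)
        action_logits = self.policy_head(self.decoder(q, memory)).squeeze(1)
        return action_logits, value
```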

21 pages, 370 KiB  
Article
Contested Borderlands: Experimental Governance and Statecraft in the Laos Golden Triangle Special Economic Zone
by Josto Luzzu
Soc. Sci. 2024, 13(10), 500; https://doi.org/10.3390/socsci13100500 - 24 Sep 2024
Viewed by 291
Abstract
The Golden Triangle Special Economic Zone (GTSEZ) in northwest Laos exemplifies an experimental governance model in which sovereign powers are partially privatized to drive economic development. Established in 2007 through a 99-year concession with the Kings Romans Group (KRG), a Chinese gaming company, the GTSEZ is integral to Laos's strategy of leveraging Special Economic Zones (SEZs) for modernization. This paper examines the complex dynamics between the Lao state and non-state actors within the GTSEZ, highlighting the state's fragmented yet pragmatic statecraft. Drawing on extensive fieldwork from 2014 to 2018, the study traces the GTSEZ's historical connections to the opium trade and its contemporary socio-political and economic roles. The zone's creation has generated both enthusiasm and criticism, particularly concerning sovereignty, local impacts, and controversial activities. The paper also discusses the broader implications of SEZs in Laos's nation-building efforts and the GTSEZ's balancing of economic openness with regulatory oversight, enhancing the understanding of experimental governance in Southeast Asia.
(This article belongs to the Special Issue Contemporary Local Governance, Wellbeing and Sustainability)
6 pages, 834 KiB  
Proceeding Paper
Actor–Critic Algorithm for the Dynamic Scheduling Problem of Unrelated Parallel Batch Machines
by Xue Zhao, Yarong Chen and Mudassar Rauf
Eng. Proc. 2024, 75(1), 12; https://doi.org/10.3390/engproc2024075012 - 23 Sep 2024
Viewed by 92
Abstract
With the continuous development of the information industry, semiconductor manufacturing has become a key basic industry of the information age. Because of process demands, batch processing is common in semiconductor manufacturing, for example in chip aging tests. In this paper, in the context of semiconductor manufacturing, we consider the unrelated parallel batch processing machine (UPBPM) scheduling problem, in which jobs have different processing times, arrival times, sizes, and processing eligibility constraints, the machines have different capacity constraints, and the objective is to minimize the makespan. We propose an actor–critic algorithm incorporating a rolling time window (the R-AC algorithm) to solve the UPBPM scheduling problem. Simulation experiments show that the R-AC algorithm outperforms the individual heuristic scheduling rules.
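For context, here is a minimal one-step advantage actor-critic update of the kind R-AC builds on; it is a generic sketch (the paper's rolling-time-window batching is not reproduced, and all names are illustrative):

```python
import torch
import torch.nn.functional as F

def actor_critic_step(actor, critic, opt_a, opt_c, s, a, r, s2, done, gamma=0.99):
    """One-step advantage actor-critic update. `critic(s)` returns V(s)
    with shape (batch, 1); r and done are (batch, 1) float tensors."""
    v = critic(s)                                        # V(s)
    with torch.no_grad():
        target = r + gamma * (1 - done) * critic(s2)     # TD target
    advantage = (target - v).detach().squeeze(-1)

    critic_loss = F.mse_loss(v, target)                  # move V(s) toward target
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    log_prob = torch.distributions.Categorical(logits=actor(s)).log_prob(a)
    actor_loss = -(log_prob * advantage).mean()          # policy-gradient step
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```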

25 pages, 3938 KiB  
Article
Enhancing the Minimum Awareness Failure Distance in V2X Communications: A Deep Reinforcement Learning Approach
by Anthony Kyung Guzmán Leguel, Hoa-Hung Nguyen, David Gómez Gutiérrez, Jinwoo Yoo and Han-You Jeong
Sensors 2024, 24(18), 6086; https://doi.org/10.3390/s24186086 - 20 Sep 2024
Viewed by 470
Abstract
Vehicle-to-everything (V2X) communication is pivotal in enhancing cooperative awareness in vehicular networks. Typically, awareness is viewed as a vehicle's ability to perceive and share real-time kinematic information. We present a novel definition of awareness in V2X communications, conceptualizing it as a multi-faceted concept involving vehicle detection, tracking, and maintaining their safety distances. To enhance this awareness, we propose a deep reinforcement learning framework for the joint control of beacon rate and transmit power (DRL-JCBRTP). Our DRL-JCBRTP framework integrates LSTM-based actor networks and MLP-based critic networks within the Soft Actor-Critic (SAC) algorithm to effectively learn optimal policies. Leveraging local state information, the DRL-JCBRTP scheme uses an innovative reward function to increase the minimum awareness failure distance. Our SLMLab-Gym-VEINS simulations show that the DRL-JCBRTP scheme outperforms existing beaconing schemes in minimizing awareness failure probability and maximizing awareness distance, ultimately improving driving safety.
(This article belongs to the Special Issue Vehicle-to-Everything (V2X) Communication Networks)
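A compact sketch of the network layout the abstract describes: an LSTM-based actor over a history of local states and an MLP-based critic, as used inside SAC. The hidden sizes and the two-dimensional (beacon rate, transmit power) action encoding are assumptions:

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Recurrent actor mapping a history of local states to a joint
    (beacon rate, transmit power) action distribution."""
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, 2)        # mean of [rate, power]
        self.log_std = nn.Linear(hidden, 2)   # log-std for SAC's stochastic policy

    def forward(self, state_seq):             # (batch, time, state_dim)
        h, _ = self.lstm(state_seq)
        h = h[:, -1]                          # last step summarizes the history
        return self.mu(h), self.log_std(h).clamp(-5, 2)

class MLPCritic(nn.Module):
    """Q(s, a) critic over the flattened state history and the 2-D action."""
    def __init__(self, state_dim, seq_len, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim * seq_len + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state_seq, action):
        x = torch.cat([state_seq.flatten(1), action], dim=1)
        return self.net(x)
```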

22 pages, 2746 KiB  
Article
Robust Design of Two-Level Non-Integer SMC Based on Deep Soft Actor-Critic for Synchronization of Chaotic Fractional Order Memristive Neural Networks
by Majid Roohi, Saeed Mirzajani, Ahmad Reza Haghighi and Andreas Basse-O’Connor
Fractal Fract. 2024, 8(9), 548; https://doi.org/10.3390/fractalfract8090548 - 20 Sep 2024
Viewed by 317
Abstract
In this study, a model-free PIφ-sliding mode control (PIφ-SMC) methodology is proposed to synchronize a specific class of chaotic fractional-order memristive neural network systems (FOMNNSs) with delays and input saturation. The fractional-order Lyapunov stability theory is used to design a two-level PIφ-SMC that can effectively manage the inherent chaotic behavior of delayed FOMNNSs and achieve finite-time synchronization. At the outset, an initial sliding surface is introduced. Subsequently, a robust PIφ-sliding surface is designed as a second sliding surface, based on proportional–integral (PI) rules. The finite-time asymptotic stability of both surfaces is demonstrated. The final step involves the design of a dynamic-free control law that is robust against system uncertainties, input saturations, and delays. The independence of the control rules from the system's functions is accomplished through the norm-boundedness property inherent in the states of chaotic systems. The deep soft actor-critic (SAC) algorithm is utilized to optimally adjust the coefficients embedded in the two-level PIφ-SMC controller's structure. The SAC agent's deep neural networks find the optimal policy by maximizing a reward signal. This approach ensures that the sliding motion meets the reachability condition within a finite time. The validity of the proposed protocol is subsequently demonstrated through extensive simulation results and two numerical examples.
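One way to picture the SAC-based gain tuning: the agent's bounded action is mapped to positive controller coefficients, and the reward penalizes synchronization error and control effort. A hedged sketch; the bounds and weights below are invented for illustration:

```python
import numpy as np

def apply_agent_action(action, k_bounds=(0.1, 50.0)):
    """Map a SAC action in [-1, 1]^n to positive controller gains for the
    two-level PIφ-SMC (bounds are illustrative assumptions)."""
    lo, hi = k_bounds
    return lo + (hi - lo) * (np.asarray(action) + 1.0) / 2.0

def reward(sync_error, control_effort, w_e=1.0, w_u=0.01):
    """Reward small synchronization error and modest control effort, so
    that maximizing reward drives the sliding motion toward reachability."""
    return -(w_e * np.linalg.norm(sync_error) + w_u * np.linalg.norm(control_effort))
```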

21 pages, 5729 KiB  
Article
An Effective Training Method for Counterfactual Multi-Agent Policy Network Based on Differential Evolution Algorithm
by Shaochun Qu, Ruiqi Guo, Zijian Cao, Jiawei Liu, Baolong Su and Minghao Liu
Appl. Sci. 2024, 14(18), 8383; https://doi.org/10.3390/app14188383 - 18 Sep 2024
Viewed by 357
Abstract
Due to the advantages of a centralized critic for estimating the Q-function and decentralized actors for optimizing the agents' policies, counterfactual multi-agent (COMA) policy gradients stand out among multi-agent reinforcement learning (MARL) algorithms. The sharing of policy parameters can improve sampling efficiency and learning effectiveness, but it may lead to a lack of policy diversity. Hence, balancing parameter sharing and diversity among agents in COMA has been a persistent research topic. In this paper, an effective training method for a COMA policy network based on a differential evolution (DE) algorithm, named DE-COMA, is proposed. DE-COMA introduces individuals in a population as computational units to construct the policy network with operations such as mutation, crossover, and selection. The average return of DE-COMA is set as the fitness function, and the best policy-network individual is chosen for the next generation. By maintaining parameter sharing while enhancing parameter diversity, multi-agent strategies become more exploratory. To validate the effectiveness of DE-COMA, experiments were conducted in the StarCraft II environment with the 2s_vs_1sc, 2s3z, 3m, and 8m battle scenarios. Experimental results demonstrate that DE-COMA significantly outperforms traditional COMA and most other multi-agent reinforcement learning algorithms in terms of win rate and convergence speed.
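The mutation, crossover, and selection loop over policy-network weights can be sketched as classic differential evolution with average return as the fitness. A generic DE loop under those assumptions, not the DE-COMA code:

```python
import numpy as np

def de_search(evaluate, dim, pop_size=20, F=0.5, CR=0.9, generations=100, rng=None):
    """Differential evolution over flattened policy-network weights.
    `evaluate` maps a weight vector to average episode return (fitness)."""
    rng = rng or np.random.default_rng(0)
    pop = rng.normal(0.0, 0.1, size=(pop_size, dim))
    fitness = np.array([evaluate(ind) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = a + F * (b - c)                  # mutation
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True           # at least one gene crosses
            trial = np.where(cross, mutant, pop[i])   # crossover
            f_trial = evaluate(trial)
            if f_trial > fitness[i]:                  # selection (maximize return)
                pop[i], fitness[i] = trial, f_trial
    return pop[np.argmax(fitness)]                    # best individual
```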

16 pages, 1860 KiB  
Article
CHAM-CLAS: A Certificateless Aggregate Signature Scheme with Chameleon Hashing-Based Identity Authentication for VANETs
by Ahmad Kabil, Heba Aslan, Marianne A. Azer and Mohamed Rasslan
Cryptography 2024, 8(3), 43; https://doi.org/10.3390/cryptography8030043 - 17 Sep 2024
Viewed by 342
Abstract
Vehicular ad hoc networks (VANETs), which are the backbone of intelligent transportation systems (ITSs), facilitate critical data exchanges between vehicles. This necessitates secure transmission, which requires guarantees of message availability, integrity, source authenticity, and user privacy. Moreover, the traceability of network participants is essential as it deters malicious actors and allows lawful authorities to identify message senders for accountability. This introduces a challenge: balancing privacy with traceability. Conditional privacy-preserving authentication (CPPA) schemes are designed to mitigate this conflict. CPPA schemes utilize cryptographic protocols, including certificate-based schemes, group signatures, identity-based schemes, and certificateless schemes. Due to the critical time constraints in VANETs, efficient batch verification techniques are crucial. Combining certificateless schemes with batch verification leads to certificateless aggregate signature (CLAS) schemes. In this paper, cryptanalysis of Xiong's CLAS scheme revealed its vulnerabilities to partial key replacement and identity replacement attacks, alongside mathematical errors in the batch verification process. Our proposed CLAS scheme remedies these issues by incorporating an identity authentication module that leverages chameleon hashing within elliptic curve cryptography (CHAM-CLAS). The signature and verification modules are also redesigned to address the identified vulnerabilities in Xiong's scheme. Additionally, we implemented the small exponents test within the batch verification module to achieve Type III security. While this enhances security, it introduces a slight performance trade-off. Our scheme has been subjected to formal security and performance analyses to ensure robustness.
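The small exponents test mentioned above combines many signature checks into one by weighting each with a short random exponent. A toy Schnorr-style illustration over a deliberately tiny, insecure subgroup; the actual CHAM-CLAS construction works over elliptic curves with certificateless aggregate signatures:

```python
import hashlib, secrets

# Toy parameters (insecure, illustration only): p = 2q + 1, both prime;
# g generates the order-q subgroup mod p.
q, p, g = 11, 23, 4

def h(msg, R):
    return int.from_bytes(hashlib.sha256(msg + str(R).encode()).digest(), "big") % q

def keygen():
    x = secrets.randbelow(q - 1) + 1
    return x, pow(g, x, p)                   # secret x, public y = g^x

def sign(x, msg):
    r = secrets.randbelow(q - 1) + 1
    R = pow(g, r, p)
    return R, (r + h(msg, R) * x) % q        # (commitment, response)

def batch_verify(pubs, msgs, sigs, ell=8):
    """Small exponents test: random multipliers z_i collapse the n checks
    g^s_i == R_i * y_i^h_i into one, while foiling forged cancellations."""
    z = [secrets.randbelow(1 << ell) | 1 for _ in sigs]
    lhs = pow(g, sum(zi * s for zi, (_, s) in zip(z, sigs)) % q, p)
    rhs = 1
    for zi, y, m, (R, _) in zip(z, pubs, msgs, sigs):
        rhs = rhs * pow(R, zi, p) * pow(y, zi * h(m, R) % q, p) % p
    return lhs == rhs
```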

20 pages, 3488 KiB  
Article
Sea-Based UAV Network Resource Allocation Method Based on an Attention Mechanism
by Zhongyang Mao, Zhilin Zhang, Faping Lu, Yaozong Pan, Tianqi Zhang, Jiafang Kang, Zhiyong Zhao and Yang You
Electronics 2024, 13(18), 3686; https://doi.org/10.3390/electronics13183686 - 17 Sep 2024
Viewed by 370
Abstract
As humans continue to exploit the ocean, the number of UAV nodes at sea and the demand for their services are increasing. Given the dynamic nature of marine environments, traditional resource allocation methods lead to inefficient service transmission and ping-pong effects. This study enhances the alignment between network resources and node services by introducing an attention mechanism and a double deep Q-learning (DDQN) algorithm that optimizes the service-access strategy, curbs action outputs, and improves service-node compatibility, thereby constituting a novel method for UAV network resource allocation in marine environments. A selective suppression module minimizes the variability in action outputs, effectively mitigating the ping-pong effect, and an attention-aware module is designed to strengthen node-service compatibility, thereby significantly enhancing service transmission efficiency. Simulation results indicate that the proposed method increases both the number and the total value of completed services compared with the DDQN, soft actor–critic (SAC), and deep deterministic policy gradient (DDPG) algorithms.
(This article belongs to the Special Issue Parallel, Distributed, Edge Computing in UAV Communication)
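For reference, the double deep Q-learning target that the proposed method builds on decouples action selection from action evaluation. A short PyTorch sketch with illustrative names:

```python
import torch

def ddqn_target(q_online, q_target, reward, next_state, done, gamma=0.99):
    """Double DQN target: the online network selects the best next action,
    the target network evaluates it, which curbs Q-value overestimation."""
    with torch.no_grad():
        next_a = q_online(next_state).argmax(dim=1, keepdim=True)   # selection
        next_q = q_target(next_state).gather(1, next_a).squeeze(1)  # evaluation
        return reward + gamma * (1.0 - done) * next_q
```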

22 pages, 3897 KiB  
Article
Exploring Stakeholders in Elderly Community Retrofit Projects: A Tripartite Evolutionary Game Analysis
by Li Guo, Ren-Jye Dzeng, Shuya Hao, Chaojie Zhang, Shuang Zhang and Liyaning Tang
Sustainability 2024, 16(18), 8016; https://doi.org/10.3390/su16188016 - 13 Sep 2024
Viewed by 419
Abstract
Renovating aging housing is a critical grassroots project of social governance and a significant aspect of public welfare. However, renovation processes often encounter difficulties due to conflicts among multi-level stakeholders, influenced by multiple factors. This paper investigates the stakeholders involved in Elderly Community Retrofit Projects (ECRPs), categorizing them into three primary groups: government organizations, renovation enterprises, and elderly families. The study of evolutionary game models shows that bounded rational actors continually adjust their optimal strategies in response to environmental changes. The government occupies a central role among the stakeholders involved in ECRPs. During renovation, governments and enterprises should provide elderly households with material or other welfare subsidies as far as possible to promote their active cooperation and participation. The integrity of enterprises is closely tied to the strength of governmental enforcement measures; hence, governments should establish a unified standard system, clarify regulatory content, and foster the orderly development of ECRPs.
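Tripartite evolutionary game analyses of this kind typically track each group's probability of playing its cooperative strategy via replicator dynamics. A hedged numerical sketch, with a user-supplied payoff function standing in for the paper's payoff matrices:

```python
import numpy as np

def replicator_step(x, y, z, payoffs, dt=0.01):
    """One Euler step of three-population replicator dynamics: x, y, z are
    the probabilities that government, enterprise, and elderly family play
    their cooperative strategy. `payoffs(x, y, z)` returns, per population,
    the expected payoffs (U_coop, U_defect) of the two pure strategies."""
    (ug1, ug0), (ue1, ue0), (uf1, uf0) = payoffs(x, y, z)
    x += dt * x * (1 - x) * (ug1 - ug0)   # dx/dt = x(1-x)(U_coop - U_defect)
    y += dt * y * (1 - y) * (ue1 - ue0)
    z += dt * z * (1 - z) * (uf1 - uf0)
    return x, y, z
```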

18 pages, 689 KiB  
Article
Research on Energy Scheduling Optimization Strategy with Compressed Air Energy Storage
by Rui Wang, Zhanqiang Zhang, Keqilao Meng, Pengbing Lei, Kuo Wang, Wenlu Yang, Yong Liu and Zhihua Lin
Sustainability 2024, 16(18), 8008; https://doi.org/10.3390/su16188008 - 13 Sep 2024
Viewed by 773
Abstract
Due to the volatility and intermittency of renewable energy, integrating a large amount of renewable energy into the grid can significantly affect its stability and security. In this paper, we propose a tiered dispatching strategy for compressed air energy storage (CAES) and use it to balance the power output of wind farms, achieving intelligent dispatching of the source–storage–grid system. The energy dispatching problem of CAES is described within the Markov decision process framework and addressed with the Actor–Critic (AC) algorithm. To address the stability and low sampling efficiency of the AC algorithm in continuous action spaces, we employ the deep deterministic policy gradient (DDPG) algorithm, a model-free deep reinforcement learning algorithm based on a deterministic policy. Furthermore, improving DDPG with Neuroevolution of Augmenting Topologies (NEAT) enhances the algorithm's adaptability in complex environments and improves its performance. The results show that the scheduling accuracy of the DDPG-NEAT algorithm reached 91.97%, which is 15.43% and 31.5% higher than that of the SAC and DDPG algorithms, respectively. The algorithm exhibits excellent performance and stability in CAES energy dispatching.
(This article belongs to the Section Energy Sustainability)
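For context, the DDPG core that DDPG-NEAT augments performs a critic regression toward a bootstrapped target, a deterministic policy-gradient actor step, and Polyak target updates. A compact sketch with illustrative names; the NEAT topology search is not shown:

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_t, critic_t, batch, opt_a, opt_c,
                gamma=0.99, tau=0.005):
    """One DDPG update on a replay batch (s, a, r, s2, done)."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        target_q = r + gamma * (1 - done) * critic_t(s2, actor_t(s2)).squeeze(1)
    critic_loss = F.mse_loss(critic(s, a).squeeze(1), target_q)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(s, actor(s)).mean()           # ascend Q(s, pi(s))
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p_, pt in zip(net.parameters(), net_t.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p_.data)  # Polyak averaging
```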

19 pages, 11959 KiB  
Article
Learning Autonomous Navigation in Unmapped and Unknown Environments
by Naifeng He, Zhong Yang, Chunguang Bu, Xiaoliang Fan, Jiying Wu, Yaoyu Sui and Wenqiang Que
Sensors 2024, 24(18), 5925; https://doi.org/10.3390/s24185925 - 12 Sep 2024
Viewed by 328
Abstract
Autonomous decision-making is a hallmark of intelligent mobile robots and an essential element of autonomous navigation. The challenge is to enable mobile robots to complete autonomous navigation tasks in mapless environments, or environments with only low-precision maps, relying solely on low-precision sensors. To address this, we propose an innovative autonomous navigation algorithm called PEEMEF-DARC. The algorithm consists of three parts: Double Actors Regularized Critics (DARC), a priority-based excellence-experience data collection mechanism, and a multi-source experience fusion strategy mechanism. It can perform autonomous navigation tasks in unmapped and unknown environments without maps or prior knowledge. Our enhanced algorithm improves the agent's exploration capabilities and uses regularization to mitigate the overestimation of state-action values. Additionally, the priority-based excellence-experience data collection module and the multi-source experience fusion strategy module significantly reduce training time. Experimental results demonstrate that the proposed method excels in unmapped and unknown environments, achieving effective navigation without relying on maps or precise localization.

20 pages, 6757 KiB  
Article
A Task Offloading and Resource Allocation Strategy Based on Multi-Agent Reinforcement Learning in Mobile Edge Computing
by Guiwen Jiang, Rongxi Huang, Zhiming Bao and Gaocai Wang
Future Internet 2024, 16(9), 333; https://doi.org/10.3390/fi16090333 - 11 Sep 2024
Viewed by 555
Abstract
Task offloading and resource allocation is a research hotspot in cloud-edge collaborative computing. Much existing research adopts single-agent reinforcement learning to solve this problem, which has defects such as low robustness, a large decision space, and the neglect of delayed rewards. In view of these deficiencies, this paper constructs a cloud-edge collaborative computing model with associated task-queue, delay, and energy-consumption models, and formulates a joint optimization problem for task offloading and resource allocation under multiple constraints. Then, to solve the joint optimization problem, this paper designs a decentralized offloading and scheduling scheme based on "task-oriented" multi-agent reinforcement learning. In this scheme, we present information synchronization protocols and offloading scheduling rules and use edge servers as agents to construct a multi-agent system based on the Actor–Critic framework. To handle delayed rewards, this paper models the offloading and scheduling problem as a "task-oriented" Markov decision process, which abandons the commonly used equidistant time-slot model in favor of dynamic, parallel slots aligned with task processing times. Finally, an offloading decision algorithm, TOMAC-PPO, is proposed. The algorithm applies proximal policy optimization to the multi-agent system and combines it with a Transformer neural network model to memorize and predict network state information. Experimental results show that this algorithm converges faster and can effectively reduce the service cost, energy consumption, and task drop rate under high load and high failure rates. For example, the proposed TOMAC-PPO can reduce the average cost by 19.4% to 66.6% compared to other offloading schemes under the same network load. In addition, the drop rate for critical tasks under some baseline algorithms with 50 users can reach 62.5%, while that of the proposed TOMAC-PPO is only 5.5%.
(This article belongs to the Special Issue Convergence of Edge Computing and Next Generation Networking)

28 pages, 7828 KiB  
Article
Reinforcement Learning for Fair and Efficient Charging Coordination for Smart Grid
by Amr A. Elshazly, Mahmoud M. Badr, Mohamed Mahmoud, William Eberle, Maazen Alsabaan and Mohamed I. Ibrahem
Energies 2024, 17(18), 4557; https://doi.org/10.3390/en17184557 - 11 Sep 2024
Viewed by 776
Abstract
The integration of renewable energy sources, such as rooftop solar panels, into smart grids poses significant challenges for managing customer-side battery storage. In response, this paper introduces a novel reinforcement learning (RL) approach aimed at optimizing the coordination of these batteries. Our approach utilizes a single-agent, multi-environment RL system designed to balance power saving, customer satisfaction, and fairness in power distribution. The RL agent dynamically allocates charging power while accounting for individual battery levels and grid constraints, employing an actor–critic algorithm. The actor determines the optimal charging power based on real-time conditions, while the critic iteratively refines the policy to enhance overall performance. The key advantages of our approach include: (1) Adaptive Power Allocation: The RL agent effectively reduces overall power consumption by optimizing grid power allocation, leading to more efficient energy use. (2) Enhanced Customer Satisfaction: By increasing the total available power from the grid, our approach significantly reduces instances of battery levels falling below the critical state of charge (SoC), thereby improving customer satisfaction. (3) Fair Power Distribution: Fairness improvements are notable, with the highest fair reward rising by 173.7% across different scenarios, demonstrating the effectiveness of our method in minimizing discrepancies in power distribution. (4) Improved Total Reward: The total reward also shows a significant increase, up by 94.1%, highlighting the efficiency of our RL-based approach. Experimental results using a real-world dataset confirm that our RL approach markedly improves fairness, power efficiency, and customer satisfaction, underscoring its potential for optimizing smart grid operations and energy management systems.
(This article belongs to the Section A1: Smart Grids and Microgrids)
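A composite reward of the kind the abstract describes might combine power saving, a low-SoC penalty, and a fairness term such as Jain's index; the weights and the index choice here are assumptions, not the paper's reward:

```python
import numpy as np

def charging_reward(alloc, soc, soc_min=0.2, w_save=1.0, w_sat=1.0, w_fair=1.0):
    """Composite reward balancing power saving, customer satisfaction,
    and fairness. `alloc` holds per-customer charging power; `soc` holds
    per-battery state of charge in [0, 1]."""
    power_saving = -w_save * alloc.sum()                        # use less grid power
    satisfaction = -w_sat * np.maximum(soc_min - soc, 0).sum()  # penalize low SoC
    jain = alloc.sum() ** 2 / (len(alloc) * (alloc ** 2).sum() + 1e-9)
    return power_saving + satisfaction + w_fair * jain          # jain in (0, 1]
```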
