
Informatics and Automation, 2026, Volume 25, Issue 1, Pages 234–261 (Mi trspy1417)

Artificial Intelligence, Knowledge and Data Engineering

Research on reinforcement learning algorithms for network latency reduction in edge computing

I. Filianin^a, A. Kapitonov^b, A. Timoshchuk-Bondar^a

^a ITMO University
^b New Uzbekistan University

Abstract: Current research on decision-making algorithms in multi-access edge computing (MEC) for resource allocation often relies on simplified network topology abstractions, which limits the applicability of the results in real-world mobile network operations. This work aims to develop a realistic cellular network model using stochastic geometry methods and to comprehensively evaluate the effectiveness of modern reinforcement learning algorithms in minimizing network latency in edge computing. To create a mathematically sound model of the network environment, we used stochastic geometry methods combined with real statistical data on cellular user distribution. Applying stochastic geometry ensured accurate modeling of the spatial placement of base stations and the calculation of inter-node distances, which are critically important for determining network latency. Experimental evaluation was conducted on a refined Lightweight MEC Platform Simulator (LWMECPS) with an extended Gymnasium API, supporting Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC) algorithms. We developed a communication network model that considers the realistic spatial distribution of network elements and the temporal dynamics of user load. Based on this model, a virtualized test environment was created in LWMECPS, allowing for reproducible experiments with controllable parameters. Experimental results revealed distinct performance characteristics across the algorithms: PPO achieved a consistent latency reduction of up to 20% with stable convergence; SAC demonstrated the highest absolute improvement (a latency reduction of 38%) but exhibited initialization instability; TD3 showed moderate effectiveness (an improvement of up to 11%) but high sensitivity to hyperparameter tuning. The comparative analysis of reinforcement learning algorithms revealed key features of their application in MEC systems. The discrete nature of service placement tasks makes PPO the most suitable for practical implementation due to its convergence stability and natural support for discrete action spaces, despite SAC achieving higher peak performance. While SAC’s superior absolute results are promising, its initialization challenges and original design for continuous action spaces require additional consideration for MEC deployment. The obtained results provide scientifically sound recommendations for MEC platform developers regarding the selection of optimal algorithmic solutions based on specific system requirements and constraints.
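The two technical pillars of the abstract can be illustrated with short sketches. First, the stochastic-geometry modeling step: base stations drawn from a homogeneous Poisson point process, users attached to their nearest station, and inter-node distances feeding a latency estimate. This is a minimal sketch of the general technique; the region size, station density, user count, and latency constants below are illustrative assumptions, not values from the paper, and the uniform user distribution stands in for the real cellular statistics the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Homogeneous Poisson point process on a 10 km x 10 km region.
# lambda_bs (stations per km^2) is an illustrative value.
area_km = 10.0
lambda_bs = 0.5
n_bs = rng.poisson(lambda_bs * area_km ** 2)
bs_xy = rng.uniform(0.0, area_km, size=(n_bs, 2))

# Users sampled uniformly for brevity; the paper instead uses real
# statistical data on cellular user distribution.
n_users = 200
user_xy = rng.uniform(0.0, area_km, size=(n_users, 2))

# Pairwise user-to-station distances; each user attaches to the
# nearest base station.
dist = np.linalg.norm(user_xy[:, None, :] - bs_xy[None, :, :], axis=-1)
nearest = dist.argmin(axis=1)

# Toy latency proxy: a fixed processing term plus a term that grows
# with access distance (constants are assumptions).
latency_ms = 0.1 + 0.05 * dist[np.arange(n_users), nearest]
print(f"{n_bs} stations, mean access latency {latency_ms.mean():.3f} ms")
```

Second, a minimal sketch of the kind of Gymnasium environment the evaluation relies on: a discrete action selects which edge node hosts a service, and the reward is negative latency. The class name, observation shape, and reward model below are hypothetical stand-ins for LWMECPS, which is far richer; Stable-Baselines3's PPO is used only to show that the discrete placement action space is handled natively, one reason the study finds PPO the most practical choice.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class ToyPlacementEnv(gym.Env):
    """Toy MEC service-placement environment (a stand-in, not LWMECPS).

    Observation: per-node load in [0, 1]. Action: index of the edge
    node that hosts the service. Reward: negative latency.
    """

    def __init__(self, n_nodes: int = 4):
        super().__init__()
        self.n_nodes = n_nodes
        self.action_space = spaces.Discrete(n_nodes)
        self.observation_space = spaces.Box(0.0, 1.0, (n_nodes,), np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._load = self.np_random.uniform(size=self.n_nodes).astype(np.float32)
        return self._load, {}

    def step(self, action):
        # Toy latency model: placing the service on a loaded node
        # costs more (constants are assumptions).
        latency = 1.0 + 5.0 * float(self._load[action])
        self._load = self.np_random.uniform(size=self.n_nodes).astype(np.float32)
        return self._load, -latency, False, False, {}


# PPO supports the discrete action space out of the box; SAC and TD3
# target continuous actions and need adaptation for this task.
model = PPO("MlpPolicy", ToyPlacementEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```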

Keywords: reinforcement learning, multi-access edge computing (MEC), proximal policy optimization (PPO), soft actor-critic (SAC), twin delayed deep deterministic policy gradient (TD3), LWMECPS, Weights & Biases (WandB).

UDC: 006.72

Received: 10.08.2025

Language: English

DOI: 10.15622/ia.25.1.8
