Computer Research and Modeling, 2023, Volume 15, Issue 1, Pages 45–56 (Mi crm1044)

NUMERICAL METHODS AND THE BASIS FOR THEIR APPLICATION

Modern ways to overcome neural networks catastrophic forgetting and empirical investigations on their structural issues

A. A. Kutalev^a, A. A. Lapina^b

^a PJSC Sberbank, 32 Kutuzovskiy ave., Moscow, 121170, Russia
^b MY.GAMES, 39/79 Leningradskiy ave., Moscow, 125167, Russia

Abstract: This paper presents the results of an experimental validation of several structural issues concerning the practical use of methods for overcoming catastrophic forgetting in neural networks. Current effective methods such as EWC (Elastic Weight Consolidation) and WVA (Weight Velocity Attenuation) are compared, and their advantages and disadvantages are considered. It is shown that EWC is better suited to tasks where full retention of learned skills is required across all tasks in the training queue, while WVA is more suitable for sequential tasks under very limited computational resources, or when reuse of representations and acceleration of learning from task to task matter more than exact retention of skills. The attenuation in the WVA method must be applied to the optimization step, i.e. to the increments of the neural network weights, rather than to the loss-function gradient itself; this holds for any gradient optimization method except the simplest stochastic gradient descent (SGD). The choice of the optimal weight-attenuation function between a hyperbolic function and an exponential one is also considered. It is shown that hyperbolic attenuation is preferable: although the two give comparable quality at the optimal value of the WVA hyperparameter (which balances preservation of old skills against learning a new skill), the hyperbolic form is more robust to deviations of this hyperparameter from its optimal value. Empirical observations are presented that support the hypothesis that the optimal value of this hyperparameter does not depend on the number of tasks in the sequential learning queue; consequently, the hyperparameter can be tuned on a small number of tasks and then used on longer sequences.
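
To make the abstract's key structural point concrete, below is a minimal sketch (not the authors' reference implementation) of applying a WVA-style attenuation to the optimizer step rather than to the raw gradients, which is where the abstract says it must go for any optimizer other than plain SGD. The hyperbolic form 1/(1 + lambda*v), the per-weight velocity tensors, and all function names here are illustrative assumptions; only the placement of the attenuation on the weight increments follows the text.

```python
import torch

def hyperbolic_attenuation(velocity, lam):
    """Illustrative hyperbolic attenuation factor in (0, 1].

    The abstract argues a hyperbolic form is more robust to deviations
    of the hyperparameter `lam` from its optimum than an exponential
    alternative (e.g. torch.exp(-lam * velocity)); the exact formula
    here is an assumption, not taken from the paper.
    """
    return 1.0 / (1.0 + lam * velocity)

def wva_step(params, optimizer, velocities, lam):
    """Attenuate the optimization step, i.e. the weight increments.

    Key point from the abstract: for any optimizer with internal state
    (Adam, momentum SGD, ...) the attenuation must scale the increment
    the optimizer actually applies, not the loss gradient itself.
    `velocities` is assumed to hold one per-weight importance tensor
    per parameter, accumulated elsewhere (accumulation not shown).
    """
    before = [p.detach().clone() for p in params]
    optimizer.step()  # let the optimizer compute its usual update
    with torch.no_grad():
        for p, p0, v in zip(params, before, velocities):
            delta = p - p0  # the increment the optimizer applied
            p.copy_(p0 + hyperbolic_attenuation(v, lam) * delta)
```

With plain SGD the two placements coincide, since the step is just the negative scaled gradient; with a stateful optimizer such as Adam they differ, which is why the placement matters.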

Keywords: catastrophic forgetting, elastic weight consolidation, EWC, weight velocity attenuation, WVA, neural networks, continual learning, machine learning, artificial intelligence.

UDC: 004.853

Received: 12.10.2022
Revised: 14.12.2022
Accepted: 24.12.2022

DOI: 10.20537/2076-7633-2023-15-1-45-56


