
Program Systems: Theory and Applications, 2025 Volume 16, Issue 4, Pages 267–285 (Mi ps483)

Hardware, software and distributed supercomputer systems

Using multilevel data sources to prepare training sets for cyberattack detection

D. D. Kononov, S. V. Isaev

Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences, Krasnoyarsk, Russia

Abstract: Network traffic analysis is an integral part of securing information and telecommunication systems. Machine learning enables modern approaches to achieve higher cyber threat detection rates.
A new approach to generating training datasets is proposed. It introduces a new aggregation unit, the “session”, and uses signature analysis together with multilevel data sources, including heterogeneous ones. A list of dataset requirements is formulated: preserving the first packets of a connection, preserving hidden areas of packets, and including extended information about traffic sources (country and autonomous system number, ASN). This additional information makes it possible to detect attacks of the “hidden communication channel” type. Based on the proposed approach, a software package has been developed for creating training datasets from multilevel sources at the L7, L4, and L3 layers of the OSI model. Unlike existing work, the approach uses real network activity data and covers long time intervals. The resulting training sets can be used to build more effective machine learning methods for intrusion detection and prevention.
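The abstract does not define how a “session” is built, so the following is only an illustrative sketch of the stated dataset requirements, not the authors' software. It assumes, hypothetically, that a session groups packets sharing a source, destination, and destination port, that an idle gap closes a session, and that country and ASN come from a plain lookup table. All names (`Packet`, `Session`, `build_sessions`, `FIRST_PACKETS`, `IDLE_TIMEOUT`) are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

FIRST_PACKETS = 3    # assumed number of leading packets preserved per session
IDLE_TIMEOUT = 60.0  # assumed idle gap (seconds) that closes a session


@dataclass
class Packet:
    ts: float        # capture timestamp, seconds
    src: str         # source IP address
    dst: str         # destination IP address
    dport: int       # destination port
    payload: bytes   # raw bytes kept for later analysis


@dataclass
class Session:
    src: str
    dst: str
    dport: int
    start: float
    end: float
    country: str     # extended source information
    asn: int         # autonomous system number of the source
    packet_count: int = 0
    first_packets: List[bytes] = field(default_factory=list)


def build_sessions(packets: Iterable[Packet],
                   geo: Dict[str, Tuple[str, int]]) -> List[Session]:
    """Aggregate packets into sessions keyed by (src, dst, dport)."""
    open_sessions: Dict[Tuple[str, str, int], Session] = {}
    closed: List[Session] = []
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = (p.src, p.dst, p.dport)
        s = open_sessions.get(key)
        if s is None or p.ts - s.end > IDLE_TIMEOUT:
            if s is not None:
                closed.append(s)  # idle gap exceeded: close the old session
            # Unknown sources get placeholder values instead of failing.
            country, asn = geo.get(p.src, ("??", 0))
            s = Session(p.src, p.dst, p.dport, p.ts, p.ts, country, asn)
            open_sessions[key] = s
        s.end = p.ts
        s.packet_count += 1
        if len(s.first_packets) < FIRST_PACKETS:
            s.first_packets.append(p.payload)  # preserve the first packets
    closed.extend(open_sessions.values())
    return sorted(closed, key=lambda sess: sess.start)
```

Keeping the leading packets and the source's country and ASN on each session record means a downstream feature extractor can work from one row per session rather than from individual packets.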

Key words and phrases: Internet, network security, cyber threats, network traffic analysis, datasets, machine learning.

UDC: 004.89+004.056
BBK: 32.972.1

MSC: Primary 68M25; Secondary 68-11, 62N86

Received: 10.07.2025
Accepted: 03.10.2025

DOI: 10.25209/2079-3316-2025-16-4-267-285


