
Informatsionnye Tekhnologii i Vychislitel'nye Sistemy, 2022, Issue 1, Pages 40–46 (Mi itvs757)

DATA PROCESSING AND ANALYSIS

Detection of pauses between word fragments of speech recordings

E. G. Zhilyakovᵃ, S. P. Belovᵇ, A. S. Belovᵇ, A. A. Medvedevaᵃ

a Federal State Autonomous Educational Institution of Higher Education "Belgorod State National Research University", Belgorod, Russia
b Autonomous non-profit organization of higher education "Belgorod University of Cooperation, Economics and Law", Belgorod, Russia

Abstract: The paper considers the problem of segmenting speech recordings into segments produced while speech is present (word segments) and the pauses between them. Such segmentation is an important stage in identifying speech components from their features. The signal segments within pauses are assumed to be samples from a stationary sequence (noise in pauses). As the main characteristic of this noise, the paper proposes estimates, obtained from a training sample, of the mathematical expectations of the parts of the energy of noise segments of a given finite duration that fall in predetermined frequency bands (subband analysis). It is shown that taking the maximum, over the frequency bands, of the ratio of the energy parts of the currently analyzed segment to the corresponding mathematical expectations for noise segments accounts for the possible presence of a speech component to the greatest extent. This is equivalent to maximizing the signal-to-noise ratio, so the proposed decision function is optimal in that sense.
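The decision rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the subband energy parts are computed, so the sketch assumes equal-width bands taken from an FFT power spectrum, and the function names, the number of bands, the segment length, and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def subband_energies(segment, n_bands):
    """Energy of the segment in each of n_bands equal-width frequency bands.

    Assumption: bands are formed by splitting the FFT power spectrum;
    the paper's own subband analysis may be defined differently.
    """
    power = np.abs(np.fft.rfft(segment)) ** 2
    return np.array([band.sum() for band in np.array_split(power, n_bands)])

def train_noise_profile(noise, seg_len, n_bands):
    """Estimate the expected subband energies of pause noise.

    The training recording (noise only) is cut into non-overlapping
    segments of seg_len samples and the subband energies are averaged.
    """
    starts = range(0, len(noise) - seg_len + 1, seg_len)
    rows = [subband_energies(noise[s:s + seg_len], n_bands) for s in starts]
    return np.mean(rows, axis=0)

def decision_value(segment, noise_profile):
    """Maximum over bands of (segment subband energy / expected noise energy)."""
    energies = subband_energies(segment, len(noise_profile))
    return float(np.max(energies / noise_profile))

def detect_pauses(signal, noise_profile, seg_len, threshold):
    """Return one flag per segment: True where the segment looks like a pause.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    starts = range(0, len(signal) - seg_len + 1, seg_len)
    values = [decision_value(signal[s:s + seg_len], noise_profile) for s in starts]
    return np.array(values) <= threshold
```

Under these assumptions, a segment containing only pause noise yields ratios close to 1 in every band, while a segment with a speech component concentrated in some band pushes the maximum ratio well above 1, which is why the maximum is used as the decision statistic.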

Keywords: segmentation of speech recordings, subband analysis, optimal decision function.

DOI: 10.14357/20718632220105




