Abstract:
The paper considers the problem of segmenting speech recordings into segments produced while speech is present (word segments) and the pauses between them. Such segmentation is an important stage in identifying speech components from their features. The portions of the signal that fall in pauses are assumed to be samples of a stationary random sequence (pause noise). As the main characteristic of pause noise, the paper proposes estimates, computed from a training sample, of the mathematical expectations of the energy parts of noise segments of a fixed finite duration in predetermined frequency bands (subband analysis). It is shown that the maximum, over the frequency bands, of the ratios of the energy parts of the analyzed segment to the corresponding mathematical expectations for noise segments accounts for the possible presence of a speech component to the greatest extent. This is equivalent to maximizing the signal-to-noise ratio, so the proposed decision function is optimal in this sense.
Keywords: segmentation of speech recordings, subband analysis, optimal decision function.
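
The decision function described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the "energy part" is taken to be the energy of a segment in one of several equal-width frequency bands, computed with a plain DFT; the expectations are estimated as sample means over training noise segments; and the names subband_energies, train_noise_means and decision_statistic, the segment length, the number of bands, and the synthetic tone standing in for speech are all hypothetical.

```python
import cmath
import math
import random


def subband_energies(segment, n_bands):
    """Energy of `segment` in each of `n_bands` equal-width frequency bands."""
    n = len(segment)
    # Naive DFT power spectrum for bins 1..n/2 (the DC bin is skipped).
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment))
        power.append(abs(coeff) ** 2)
    width = len(power) // n_bands  # assumes n/2 is divisible by n_bands
    return [sum(power[r * width:(r + 1) * width]) for r in range(n_bands)]


def train_noise_means(noise_segments, n_bands):
    """Per-band sample means of the band energies over pause-noise segments."""
    totals = [0.0] * n_bands
    for seg in noise_segments:
        for r, energy in enumerate(subband_energies(seg, n_bands)):
            totals[r] += energy
    return [t / len(noise_segments) for t in totals]


def decision_statistic(segment, noise_means):
    """Maximum over bands of (segment band energy / mean noise band energy)."""
    energies = subband_energies(segment, len(noise_means))
    return max(e / m for e, m in zip(energies, noise_means))


# Synthetic illustration: white Gaussian noise plays the role of pause noise,
# and a single tone added to noise stands in for a speech component.
rng = random.Random(0)
N, BANDS = 64, 4


def noise_segment():
    return [rng.gauss(0.0, 1.0) for _ in range(N)]


noise_means = train_noise_means([noise_segment() for _ in range(50)], BANDS)
pause = noise_segment()
voiced = [x + 3.0 * math.cos(2 * math.pi * 5 * t / N)
          for t, x in enumerate(noise_segment())]

pause_stat = decision_statistic(pause, noise_means)
voiced_stat = decision_statistic(voiced, noise_means)
print(pause_stat, voiced_stat)
```

In this sketch a segment would be labeled as speech when its statistic exceeds a threshold. How that threshold is chosen is not stated in the abstract and is left open here.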