Informatics and Automation, 2025, Volume 24, Issue 6, Pages 1683–1720 (Mi trspy1403)

Artificial Intelligence, Knowledge and Data Engineering

Analytical review of speech and multimodal methods for cognitive impairments recognition in people

M. Dolgushin, A. Karpov

St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)

Abstract: Over the past decade, there has been a noticeable increase in the number of scientific, technical, and medical publications dedicated to the automatic detection of cognitive impairments in humans based on speech and visual data. These impairments are often associated with neurodegenerative diseases such as dementia, Alzheimer’s disease, Parkinson’s disease, and other disorders. Despite the high prevalence of these conditions and their significant contribution to mortality and early disability, effective treatment options remain unavailable or severely limited in current medical practice. Consequently, early diagnosis and symptom alleviation have become areas of considerable research interest. Current studies focus on the development of automated and automatic systems based on quantitative and objective methods, neural network approaches, the integration of various modalities, and explainable artificial intelligence techniques. This paper presents a comprehensive review and analysis of key studies published since 2022 that address the automatic detection of cognitive impairments using unimodal and multimodal approaches. The review includes the most commonly used multimodal datasets in this domain, such as ADReSS, ADReSSo, and TAUKADIAL. It discusses state-of-the-art methods for detecting cognitive impairments from various modalities, including those presented in international competitions such as TAUKADIAL-2024, as well as methods developed outside of such events. According to competition results, the most effective approaches to recognizing cognitive impairments are ensemble probabilistic models trained on explainable hand-crafted features and neural features extracted from text and audio data. The review also explores multimodal approaches that incorporate visual modalities for training deep neural networks. A new direction in the field is examined, namely, the applicability of large language models to the analysis of medical texts and interpretable disease prediction. The paper systematizes methods for extracting informative features and the classifiers employed. Based on the review, key requirements for systems aimed at the automated detection of cognitive impairments are formulated.

Keywords: automatic detection of cognitive impairment, speech technologies in healthcare, explainable artificial intelligence, machine learning.

UDC: 004.934

Received: 14.04.2025

DOI: 10.15622/ia.24.6.6


