Abstract:
The use of multimodal data in emotion recognition systems has great potential for applications in various fields, including healthcare, human-machine interfaces, operator monitoring, and marketing. Until recently, the development of emotion recognition systems based on multimodal data was constrained by insufficient computing power. However, with the advent of high-performance GPU-based systems and the development of efficient deep neural network architectures, there has been a surge of research aimed at using multiple modalities, such as audio, video, and physiological signals, to accurately detect human emotions. In addition, physiological data from wearable devices has become increasingly important because of the relative ease of its collection and the recognition accuracy it enables. This paper discusses architectures and methods for applying deep neural networks to the analysis of multimodal data in order to improve the accuracy and reliability of emotion recognition systems, presenting current approaches to implementing such algorithms and surveying existing open multimodal datasets.