Abstract:
This paper presents the first mathematically rigorous multimodal explainability system for three-channel physiological signals (electrocardiogram (ECG), photoplethysmogram (PPG), and arterial blood pressure (ABP)) that distinguishes true from false ventricular tachycardia (VT) alarms in intensive care units (ICUs). A novel explanation-consistency metric, Coherence, is introduced; it compares temporal Integrated Gradients attributions across modalities, and its connection to local surrogate stability is justified theoretically. The proposed ResNetFusionClassifier architecture processes each modality in a specialized branch and then fuses the resulting features through an adaptive attention mechanism. Experimental validation on the extended VTaC dataset (1,247 episodes from 982 patients) [clifford2016false] yields an accuracy of 0.873, an F1-score of 0.873, and an AUC-ROC of 0.926, with a statistically significant difference in Coherence between true and false alarms ($p < 0.001$). In practical use, the detection system achieves high recall on critical cases (Recall = 0.878) while significantly reducing false alarms, confirming the clinical applicability of the approach to the problem of "alarm fatigue" in ICUs.
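
To make the idea of cross-modal explanation consistency concrete, the following is a minimal illustrative sketch, not the method of this paper. It approximates Integrated Gradients with a midpoint Riemann sum and then scores agreement between modalities as the mean pairwise Pearson correlation of absolute temporal attributions. That scoring rule is an assumption made purely for illustration, since the abstract does not state the definition of Coherence. The toy quadratic model, the five-sample signals, and the names `integrated_gradients`, `pearson`, and `coherence_sketch` are likewise hypothetical.

```python
import math
from itertools import combinations


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g
    # Average path gradient, scaled by the input-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def coherence_sketch(attributions):
    """ASSUMED definition: mean pairwise correlation of |attribution| series."""
    mags = [[abs(v) for v in series] for series in attributions.values()]
    pairs = list(combinations(mags, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy model F(x) = sum(x_i ** 2), with analytic gradient 2 * x_i.
def toy_model(x):
    return sum(v * v for v in x)


def toy_grad(x):
    return [2.0 * v for v in x]


# Hypothetical five-sample windows; every modality peaks at the same time step.
signals = {
    "ECG": [0.1, 0.2, 1.0, 0.2, 0.1],
    "PPG": [0.0, 0.3, 0.9, 0.3, 0.0],
    "ABP": [0.2, 0.1, 0.8, 0.1, 0.2],
}
baseline = [0.0] * 5
attributions = {
    name: integrated_gradients(toy_grad, sig, baseline)
    for name, sig in signals.items()
}
score = coherence_sketch(attributions)
print(f"coherence (illustrative): {score:.3f}")
```

In this toy setting the attributions of all three channels concentrate on the same time step, so the illustrative score is close to 1. Attributions that peak at unrelated times across channels would lower it.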