Abstract:
We present JDCEmb, a new framework for training universal vector representations for goal-oriented dialogue tasks. Text encoders play a crucial role in such systems, and their quality largely determines overall dialogue performance. Modern approaches to training dialogue encoders often rely on contrastive methods, which improve the distinguishability of representations but are sensitive to the selection of positive and negative pairs, which can lead to the loss of important semantic information. Knowledge distillation-based methods, on the other hand, transfer richer context but struggle to distinguish similar utterances and handle subtle semantic differences poorly. JDCEmb combines the strengths of both approaches through a teacher-student architecture in which the student model is simultaneously trained contrastively and aligned with the teacher model's vector representations. This combination preserves semantic richness while enhancing the distinctiveness of vector representations, which is crucial for dialogue systems. Experimental results on key dialogue tasks demonstrate the effectiveness of the approach: JDCEmb consistently matches or surpasses state-of-the-art performance, outperforming strong current baselines.
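The abstract does not specify the exact loss formulation, but the described combination of contrastive training and teacher alignment can be illustrated with a minimal sketch. The function below, `jdcemb_style_loss`, and its weighting parameter `alpha` are hypothetical names; the sketch assumes an in-batch InfoNCE contrastive term plus an MSE alignment term toward frozen teacher embeddings, which may differ from the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def jdcemb_style_loss(student_emb, positive_emb, teacher_emb,
                      temperature=0.05, alpha=0.5):
    """Hypothetical joint objective: InfoNCE contrastive term plus an
    alignment term to frozen teacher embeddings (assumed form, not the
    paper's exact loss)."""
    # Normalize embeddings so dot products act as cosine similarities.
    s = F.normalize(student_emb, dim=-1)    # (B, D) student anchors
    p = F.normalize(positive_emb, dim=-1)   # (B, D) positive counterparts
    t = F.normalize(teacher_emb, dim=-1)    # (B, D) teacher embeddings of anchors

    # In-batch contrastive loss: each anchor should match its own positive,
    # with the rest of the batch serving as negatives.
    logits = s @ p.T / temperature          # (B, B) similarity matrix
    labels = torch.arange(s.size(0), device=s.device)
    contrastive = F.cross_entropy(logits, labels)

    # Alignment loss: pull student embeddings toward the teacher's.
    alignment = F.mse_loss(s, t)

    return contrastive + alpha * alignment
```

In this sketch, the contrastive term sharpens the distinguishability of utterance embeddings, while the alignment term transfers the teacher's semantic structure, reflecting the trade-off the abstract describes.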