Computer Research and Modeling, 2024, Volume 16, Issue 7, Pages 1555–1567 (Mi crm1233)

SPECIAL ISSUE

Automating high-quality concept banks: leveraging LLMs and multimodal evaluation metrics

U. Ahmad, V. Ivanov

Innopolis University, 1 Universitetskaya st., Innopolis, 420500, Russia

Abstract: Interpretability of modern deep learning models has become a central focus of research, particularly in sensitive domains such as healthcare and finance. Concept bottleneck models have emerged as a promising approach for achieving transparency and interpretability by leveraging a set of human-understandable concepts as an intermediate representation before the prediction layer. However, manual concept annotation is impractical due to the time and effort involved. Our work explores the potential of large language models (LLMs) for generating high-quality concept banks and proposes a multimodal evaluation metric to assess the quality of the generated concepts. We investigate three key research questions: the ability of LLMs to generate concept banks comparable to existing knowledge bases such as ConceptNet, the sufficiency of unimodal text-based semantic similarity for evaluating concept-class label associations, and the effectiveness of multimodal information in quantifying concept generation quality compared to unimodal concept-label semantic similarity. Our findings reveal that multimodal models outperform unimodal approaches in capturing concept-class label similarity. Furthermore, our generated concepts for the CIFAR-10 and CIFAR-100 datasets surpass those obtained from ConceptNet and the baseline, demonstrating the standalone capability of LLMs to generate high-quality concepts. The ability to automatically generate and evaluate high-quality concepts will let researchers adapt to a new dataset quickly, with little to no manual effort, before feeding the concepts into concept bottleneck models.
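
As a rough illustration of the comparison the abstract describes, the sketch below scores concept-class label associations two ways: with a unimodal sentence-embedding model and with a CLIP text encoder, whose embedding space is image-aligned by multimodal training. This is a minimal sketch, not the authors' metric; the model names, the toy concept list, and the scoring details are illustrative assumptions, and the paper's actual multimodal metric may use image features directly.

    # Sketch: unimodal vs. multimodal concept-class label similarity.
    # Assumptions (not from the paper): all-MiniLM-L6-v2 as the unimodal
    # encoder, CLIP ViT-B/32 as the multimodal one, and a toy concept bank
    # for the CIFAR-10 class "airplane".
    import torch
    from sentence_transformers import SentenceTransformer
    from transformers import CLIPModel, CLIPProcessor

    class_label = "airplane"
    concepts = ["wings", "jet engine", "fuselage", "runway", "fur"]
    texts = [class_label] + concepts

    # Unimodal: cosine similarity between text-only sentence embeddings.
    text_model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = text_model.encode(texts, convert_to_tensor=True)
    uni_scores = torch.nn.functional.cosine_similarity(emb[0:1], emb[1:])

    # Multimodal: cosine similarity in CLIP's image-aligned text space.
    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = proc(text=texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = clip.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    multi_scores = (feats[0:1] @ feats[1:].T).squeeze(0)

    for c, u, m in zip(concepts, uni_scores.tolist(), multi_scores.tolist()):
        print(f"{c:12s} unimodal={u:.3f}  multimodal={m:.3f}")

Under this setup, a visually irrelevant distractor such as "fur" would be expected to rank lower under the CLIP-based scores than under purely textual similarity, which is the kind of gap the abstract's multimodal evaluation metric is meant to capture.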

Keywords: interpretability, large language models, concept bottleneck models, machine learning

UDC: 004.056

Received: 28.10.2024
Revised: 16.11.2024
Accepted: 25.11.2024

Language: English

DOI: 10.20537/2076-7633-2024-16-7-1555-1567
