Abstract:
Design patterns play a key role in software development, representing best practices that enhance code maintainability and understandability. Identifying them in source code is essential for analyzing and maintaining legacy systems. Modern large language models (LLMs) trained on code enable new approaches to automatic design pattern detection. However, the impact of different code representations on classification accuracy with LLMs remains underexplored. This study evaluates classifiers trained on embeddings generated by CodeT5, DeepSeek-Coder, and LLaMA (7B and 13B), using the DPD-Att dataset (14 categories, including "Unknown"). CodeT5 embeddings yield the highest and most stable results (up to 85% accuracy), while DeepSeek-Coder and LLaMA demonstrate competitive but less consistent performance.
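The pipeline summarized above — embed each code sample with an LLM, then train a classifier over the embedding space — can be sketched as follows. This is a minimal illustrative example, not the study's implementation: the embeddings here are synthetic stand-ins (in the actual study they would come from CodeT5, DeepSeek-Coder, or LLaMA), the pattern labels are a hypothetical subset, and a simple nearest-centroid rule stands in for whatever classifier the authors trained.

```python
# Illustrative sketch only: design pattern classification over code
# embeddings using a nearest-centroid classifier. The embeddings are
# synthetic; a real pipeline would obtain them from an LLM encoder.
import math
import random

random.seed(0)
DIM = 8  # embedding dimensionality (real LLM embeddings are far larger)
PATTERNS = ["Singleton", "Factory", "Observer", "Unknown"]  # hypothetical subset

def fake_embedding(label):
    """Synthetic embedding: each pattern clusters around its own point."""
    base = float(PATTERNS.index(label))
    return [base + random.gauss(0, 0.1) for _ in range(DIM)]

def centroid(vectors):
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# "Train": compute one centroid per pattern from labeled embeddings.
train = [(fake_embedding(p), p) for p in PATTERNS for _ in range(20)]
centroids = {
    p: centroid([v for v, lbl in train if lbl == p]) for p in PATTERNS
}

def classify(vec):
    """Assign the pattern whose centroid is nearest to the embedding."""
    return min(centroids, key=lambda p: distance(vec, centroids[p]))

# Evaluate on held-out synthetic samples.
test = [(fake_embedding(p), p) for p in PATTERNS for _ in range(5)]
accuracy = sum(classify(v) == lbl for v, lbl in test) / len(test)
print(f"accuracy = {accuracy:.2f}")
```

Because the synthetic clusters are well separated, this toy setup classifies nearly perfectly; the study's point is that with real code embeddings, accuracy depends heavily on which LLM produced them.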