Abstract:
Fine-tuning of large language models (LLMs) is a standard step in their development and deployment, enabling the adaptation of pre-trained models to specific tasks with limited labeled data. Traditional first-order stochastic optimization methods such as SGD and Adam, although widely used in practice, do not always guarantee optimal convergence. Matrix-oriented optimization algorithms are currently under active development and can outperform classical methods by better exploiting the structure of model parameters. One such method is Muon, which projects gradients onto the space of semi-orthogonal matrices, providing stable and rapid convergence with reduced sensitivity to hyperparameters. To further reduce memory requirements, zero-order algorithms are considered: they estimate gradients using only forward passes, without backpropagation. This work studies orthogonal matrix sampling methods for the Muon algorithm in the zero-order setting for LLM fine-tuning. Various sampling strategies are compared, and their impact on fine-tuning quality and computational efficiency is evaluated. The results of this study can inform future research on zero-order optimization methods for LLM fine-tuning.
Keywords: large language models, fine-tuning, zero-order optimization, matrix optimization.
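
As a minimal illustrative sketch (not the algorithm studied in this work), the snippet below shows how a zero-order update might combine a two-point, forward-pass-only gradient estimate with a randomly sampled semi-orthogonal direction. The QR-based sampling, the function names (sample_semi_orthogonal, zo_semi_orthogonal_step), and all hyperparameter values are hypothetical choices made here for illustration only.

    import numpy as np

    def sample_semi_orthogonal(m, n, rng):
        # One possible sampling strategy: orthogonalize a Gaussian matrix via QR.
        g = rng.standard_normal((m, n))
        if m >= n:
            q, _ = np.linalg.qr(g)      # columns are orthonormal, shape (m, n)
            return q
        q, _ = np.linalg.qr(g.T)        # otherwise orthonormalize the rows
        return q.T

    def zo_semi_orthogonal_step(loss_fn, w, lr, eps, rng):
        # Zero-order step: estimate the directional derivative of loss_fn along a
        # random semi-orthogonal direction U with two forward passes, then move
        # along U, which is already orthogonal in the spirit of Muon.
        u = sample_semi_orthogonal(*w.shape, rng)
        g_hat = (loss_fn(w + eps * u) - loss_fn(w - eps * u)) / (2.0 * eps)
        return w - lr * g_hat * u

    # Toy usage on a quadratic surrogate: fit W to a fixed target matrix A.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 4))
    W = np.zeros_like(A)
    loss = lambda X: float(np.sum((X - A) ** 2))
    for _ in range(500):
        W = zo_semi_orthogonal_step(loss, W, lr=5e-2, eps=1e-4, rng=rng)
    print("final loss:", loss(W))   # decreases toward 0 without any backward pass

Comparing such QR-based sampling with alternative orthogonalization schemes (e.g., the Newton-Schulz iteration used by Muon) illustrates the kind of quality-versus-cost trade-off that sampling-strategy studies of this type evaluate.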