Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Dokl. RAN. Math. Inf. Proc. Upr., 2025, Volume 527, Pages 217–228 (Mi danma680)

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

Sampling of semi-orthogonal matrices for the Muon algorithm

E. D. Petrov (a, b), G. V. Evseev (a), A. V. Antonov (a, b), A. S. Veprikov (a, c, d), N. A. Bushkov (a, b), S. V. Moiseev (b), A. N. Beznosikov (a, c, d)

a Moscow Institute of Physics and Technology, Moscow, Russia
b T-technologies, Moscow
c Ivannikov Institute for System Programming of the RAS
d Innopolis University

Abstract: Fine-tuning is widely used in the development and deployment of large language models (LLMs), as it adapts pre-trained models to specific tasks with limited labeled data. Traditional first-order stochastic optimization methods such as SGD and Adam are widely applied in practice but do not always guarantee optimal convergence. Matrix-oriented optimization algorithms are now under active development; they surpass classical methods by better exploiting the internal structure of model parameters. One such method is Muon, which projects gradients onto the space of semi-orthogonal matrices, providing stable and rapid convergence with reduced sensitivity to hyperparameters. To further reduce memory requirements, zero-order algorithms are considered, which estimate gradients solely through forward passes, without backpropagation. This work studies methods for sampling orthogonal matrices for the Muon algorithm within zero-order optimization during LLM fine-tuning. Various sampling strategies are compared, and their impact on fine-tuning quality and computational efficiency is evaluated. The results can inform future research on zero-order optimization methods for LLM fine-tuning.
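
To make the two ingredients named in the abstract concrete, the sketch below draws a random semi-orthogonal matrix and uses it as the direction in a two-point zero-order gradient estimate. It is an illustration under stated assumptions, not the procedure from the paper: the QR-of-Gaussian sampler and the central-difference estimator are standard textbook choices that may or may not be among the strategies the authors compare, and the names `sample_semi_orthogonal`, `zo_directional_estimate`, and the toy quadratic `loss` are hypothetical.

```python
import numpy as np


def sample_semi_orthogonal(m, n, rng):
    """Draw an m x n matrix with orthonormal columns (m >= n) or rows (m < n).

    Assumption: QR of a Gaussian matrix, one common sampler; the paper may
    study other strategies.
    """
    tall = m >= n
    g = rng.standard_normal((m, n) if tall else (n, m))
    q, r = np.linalg.qr(g)  # reduced QR: q has orthonormal columns
    # Fix the sign ambiguity of QR so the draw does not depend on the
    # factorization's sign convention.
    d = np.sign(np.diag(r))
    d[d == 0] = 1.0
    q = q * d
    return q if tall else q.T


def zo_directional_estimate(loss, w, e, tau=1e-3):
    """Two-point zero-order estimate along direction e.

    Uses only two evaluations of `loss` (forward passes), no backpropagation.
    """
    scale = (loss(w + tau * e) - loss(w - tau * e)) / (2.0 * tau)
    return scale * e


# Toy usage: a quadratic loss stands in for an LLM forward pass (assumption).
rng = np.random.default_rng(0)
target = rng.standard_normal((6, 3))
w = rng.standard_normal((6, 3))


def loss(x):
    return 0.5 * np.sum((x - target) ** 2)


e = sample_semi_orthogonal(6, 3, rng)
g_hat = zo_directional_estimate(loss, w, e)
w_next = w - 0.1 * g_hat  # plain descent step along the sampled direction
```

Because the sampled direction already has orthonormal columns, the update direction is semi-orthogonal by construction, which is the property Muon otherwise obtains by projecting the gradient.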

Keywords: large language models, fine-tuning, zero-order optimization, matrix optimization.

UDC: 004.8

Received: 21.08.2025
Accepted: 22.09.2025

DOI: 10.7868/S2686954325070185


