RUS  ENG
Full version
JOURNALS // Trudy Instituta Matematiki i Mekhaniki UrO RAN // Archive

Trudy Inst. Mat. i Mekh. UrO RAN, 2025 Volume 31, Number 3, Pages 150–166 (Mi timm2202)

Minimax approach to the Gaussian multi-armed bandit

A. V. Kolnogorov

Yaroslav-the-Wise Novgorod State University

Abstract: We consider the multi-armed bandit problem in an application to batch processing of big data if there are more than two alternative processing methods with different a priori unknown efficiencies. During processing, it is necessary to determine a more effective method and ensure its preferential use. Control is performed on the basis of cumulative incomes in batches, which, by virtue of the central limit theorem, have approximately Gaussian distributions. An important feature of batch processing is that it almost does not lead to an increase in maximum losses of the total expected income, i.e., to an increase in minimax risk if the numbers of processed data and batches into which the data is divided are large enough. This means that Gaussian multi-armed bandit provides universal approach to optimal control of big data when one-step incomes satisfy the central limit theorem. According to this approach, minimax strategy and risk are sought using the main theorem of game theory as Bayesian ones calculated relative to the worst-case prior distribution at which the Bayesian risk is maximal. For this purpose, the characterization of the worst-case prior distribution is given and recursive equations in a usual and invariant form with a control horizon equal to one are obtained. In the limiting case when the number of processed batches grows infinitely, a second-order partial differential equation is obtained. A numerical example of finding minimax risk and strategy for a 3-armed bandit is given.

Keywords: Gaussian multi-armed bandit, Game with nature, Bayesian and minimax approaches, Main theorem of game theory, Batch processing.

UDC: 519.83 + 519.244

MSC: 91A35, 62L05, 62C10

Received: 12.05.2025
Revised: 16.06.2025
Accepted: 23.06.2025

DOI: 10.21538/0134-4889-2025-31-3-fon-06



Bibliographic databases:


© Steklov Math. Inst. of RAS, 2026