Abstract:
We consider the multi-armed bandit problem in an application to batch processing of big data if there are more than two alternative processing methods with different a priori unknown efficiencies. During processing, it is necessary to determine a more effective method and ensure its preferential use. Control is performed on the basis of cumulative incomes in batches, which, by virtue of the central limit theorem, have approximately Gaussian distributions. An important feature of batch processing is that it almost does not lead to an increase in maximum losses of the total expected income, i.e., to an increase in minimax risk if the numbers of processed data and batches into which the data is divided are large enough. This means that Gaussian multi-armed bandit provides universal approach to optimal control of big data when one-step incomes satisfy the central limit theorem. According to this approach, minimax strategy and risk are sought using the main theorem of game theory as Bayesian ones calculated relative to the worst-case prior distribution at which the Bayesian risk is maximal. For this purpose, the characterization of the worst-case prior distribution is given and recursive equations in a usual and invariant form with a control horizon equal to one are obtained. In the limiting case when the number of processed batches grows infinitely, a second-order partial differential equation is obtained. A numerical example of finding minimax risk and strategy for a 3-armed bandit is given.
Keywords:Gaussian multi-armed bandit, Game with nature, Bayesian and minimax approaches, Main theorem of game theory, Batch processing.