
Vestnik YuUrGU. Ser. Mat. Model. Progr., 2025, Volume 18, Issue 2, Pages 102–111 (Mi vyuru762)

Programming & Computer Software

The impact of dataset size on the reliability of model testing and ranking

A. V. Chuiko^a, V. V. Arlazarov^b, S. A. Usilin^b

^a Federal Research Center “Computer Science and Control” RAS, Moscow, Russian Federation
^b LLC “Smart Engines Service”, Moscow, Russian Federation

Abstract: Machine learning is applied across diverse domains, and research teams continually develop new recognition models that compete on open datasets. In some tasks accuracy exceeds 99%, and the differences between top-performing models are often marginal, measured in hundredths of a percent. Such minimal differences, combined with the varying sizes of the benchmark datasets, call into question the reliability of model evaluation and ranking. This paper introduces a method for determining the dataset size necessary for reliable hypothesis testing of model performance, and examines the statistical significance of accuracy rankings reported in recent studies on the MNIST, CIFAR-10, and CIFAR-100 datasets.
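
The method itself is developed in the full text; as a rough illustration of the statistics involved, the sketch below estimates the test-set size needed to separate two accuracies with a standard two-proportion z-test at a given significance level and power. This is a minimal sketch, not the authors' procedure; the function name required_test_size and its default parameters are illustrative assumptions.

    # Minimal sketch (illustrative, not the paper's method): test-set size
    # needed to distinguish two model accuracies with a two-proportion z-test.
    # Assumes independent errors and equally sized test sets for both models.
    import math
    from statistics import NormalDist

    def required_test_size(acc1: float, acc2: float,
                           alpha: float = 0.05, power: float = 0.8) -> int:
        """Samples per model needed to detect the gap acc1 - acc2
        (classic two-proportion sample-size formula)."""
        z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
        z_b = NormalDist().inv_cdf(power)          # quantile for target power
        p_bar = (acc1 + acc2) / 2                  # pooled accuracy under H0
        pooled = math.sqrt(2 * p_bar * (1 - p_bar))
        separate = math.sqrt(acc1 * (1 - acc1) + acc2 * (1 - acc2))
        n = ((z_a * pooled + z_b * separate) / (acc1 - acc2)) ** 2
        return math.ceil(n)

    # A 0.03-percentage-point gap near 99.9% accuracy:
    print(required_test_size(0.9987, 0.9984))  # ~250,000 images per model

For gaps of hundredths of a percent near 99.9% accuracy, this standard formula already calls for test sets over an order of magnitude larger than MNIST's 10,000 images, which is the scale of concern the abstract raises.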

Keywords: dataset size, object recognition, statistical significance, model evaluation, recognition quality assessment.

UDC: 519.248

MSC: 62B15

Received: 24.12.2024

Language: English

DOI: 10.14529/mmp250209


