
Proceedings of ISP RAS, 2024 Volume 36, Issue 6, Pages 7–18 (Mi tisp936)

Improving estimation models by merging independent data sources

F. Valdés-Souto, J. Valeriano-Assem

National Autonomous University of Mexico

Abstract: Software cost/effort estimation has been a key research topic for over six decades because of its impact on industry. Despite the many models proposed, regression-based approaches dominate the literature. Two recurring challenges are the scarcity of datasets with enough data points and the arbitrary integration of databases from different sources. This study proposes using the Kruskal-Wallis test to validate the integration of distinct source databases, with the aim of avoiding the mixing of unrelated data, increasing the number of data points, and improving estimation models. A case study was conducted with data from an international company's Mexico office, which provides software development for "Microservices and APIs". Data from 2020 were analyzed. The quality of the estimation model improved significantly: MMRE decreased by 25.4 percentage points (from 78.6% to 53.2%), the standard deviation dropped by 97.2 percentage points (from 149.7% to 52.5%), and the Pred(25%) indicator rose by 3.2 percentage points. The number of data points increased, and the linear regression constraints were met. Validating database integration with the Kruskal-Wallis test thus effectively improved the estimation models.
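The workflow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that two source databases are compared on a single per-project quantity (for example, hours of effort per COSMIC function point, which is an assumption made here) and that a significance level such as 0.05 is used to decide whether merging is acceptable. The function names `kruskal_wallis_two_groups`, `mmre`, and `pred` are hypothetical, and the p-value uses the usual chi-square approximation with one degree of freedom, which holds only for the two-group case.

```python
import math
from typing import Sequence, Tuple


def kruskal_wallis_two_groups(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H statistic and approximate p-value for two samples."""
    # Pool the observations, remembering which group (0 or 1) each came from.
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    sizes = (len(a), len(b))
    h = 12 / (n * (n + 1)) * sum(r**2 / m for r, m in zip(rank_sums, sizes)) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # undefined if every observation is identical
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    # Chi-square survival function with 1 degree of freedom (two groups only).
    p = math.erfc(math.sqrt(h / 2))
    return h, p


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return sum(abs(y - f) / y for y, f in zip(actual, predicted)) / len(actual)


def pred(actual: Sequence[float], predicted: Sequence[float], level: float = 0.25) -> float:
    """Fraction of projects whose relative error does not exceed `level`."""
    hits = sum(1 for y, f in zip(actual, predicted) if abs(y - f) / y <= level)
    return hits / len(actual)
```

Under these assumptions, a p-value above the chosen significance level gives no evidence that the two sources differ in distribution, so they may be merged before fitting the regression model. `mmre` and `pred` can then be computed before and after merging to compare model quality, as the abstract reports.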

Keywords: linear regression model, software estimation, effort estimation, cost estimation, functional size, COSMIC method, Kruskal-Wallis

Language: English

DOI: 10.15514/ISPRAS-2024-36(6)-1


