Abstract:
The article addresses issues that arise when forming a training dataset for machine learning models, in particular the handling of uncharacteristic data. Methods for refining and filtering labels are proposed to improve the quality of the training dataset. The method for refining label certainty provides a more accurate correspondence between labels and objects, while the filtering method identifies and eliminates erroneous or anomalous labels, which leads to the exclusion or correction of the corresponding objects. The effectiveness of the proposed methods is confirmed by two examples: early detection of breast cancer using microwave radiothermometry, and image classification on a standard dataset.
Keywords: machine learning, training dataset, uncharacteristic labels, microwave radiothermometry, breast cancer.