Abstract:
The analysis of data for the effects of an unknown but present cause arises in many tasks. One example considered in this work is the search for signs of fraud among the many recipients of consumer loans at a bank. To build the initial data, a method was chosen in which signs of fraud appear in transactional activity after the loan is received; namely, the signs are based on how the funds are withdrawn. This example is a special case of situations in which the effects of a single cause are present and repeated in a limited set of high-dimensional precedents. Under these conditions, the task of finding repeated effects is of great importance. An algorithm for such a search is constructed, and its complexity is less than quadratic. The complexity of the constructed algorithm for finding all coincidences in $m$ ordered precedents does not exceed $mN$, where $N$ is the length of all precedents. Taking into account the cost of ordering each precedent, given an initial ordering of the entire set of characteristics, the complexity of solving the problem does not exceed $m N \log_2 N$.
Keywords: complexity of the classification task, machine learning, cause-and-effect relationships, searching for coincidences in a sequence of sets.
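
The complexity bounds stated in the abstract can be illustrated with a minimal sketch. It is not the algorithm constructed in the work. It assumes that each precedent is a collection of integer characteristic identifiers, that the integer order stands in for the initial ordering of the characteristics, and that a "coincidence" means a characteristic present in every precedent. The function names `sorted_intersection` and `common_characteristics` are hypothetical. Under these assumptions, ordering the $m$ precedents costs at most $m N \log_2 N$ comparisons, and each of the subsequent merge passes is linear in $N$, giving at most $mN$ steps for the search itself.

```python
def sorted_intersection(a, b):
    """Two-pointer intersection of two ascending lists.

    Runs in O(len(a) + len(b)) because each step advances at least one index.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def common_characteristics(precedents):
    """Return the characteristics present in every precedent (assumed meaning
    of 'coincidence' for this sketch), in ascending order."""
    if not precedents:
        return []
    # Ordering step: at most N log N per precedent, m precedents in total.
    ordered = [sorted(set(p)) for p in precedents]
    # Search step: the running result never exceeds N elements,
    # so each pass is linear in N and the whole loop is at most m * N.
    common = ordered[0]
    for p in ordered[1:]:
        common = sorted_intersection(common, p)
        if not common:
            break  # nothing can coincide any more
    return common


# Hypothetical data: three precedents over characteristic ids 1..9.
example = [[7, 3, 9, 1], [3, 9, 4], [9, 2, 3, 8]]
print(common_characteristics(example))  # prints [3, 9]
```

The running intersection is used here only because it keeps every pass within the length of a single precedent; a different notion of coincidence, such as repetition in at least two precedents, would need a different merge.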