Abstract:
At present, Data Mining expands usage of statistical machine learning methods. The similarity-based
approach uses the probabilistic combinatorial formal method (VKF (variational Kalman filter) method). The main
algorithm is based on a coupling Markov chain. The authors propose a mechanism to convert lengths of preliminary
trajectories (before coalescence) to an upper bound on which it is necessary to stop excessively long successive
runs. The main result claims that the change of probabilities is exponentially small with respect to total variation
distance, if the chain uses sufficient number of preliminary runs. This proposal may be useful when there exists
a small fraction of long trajectories with respect to the rest, because it provides a balance between the size of the
bound and changes of probabilities.
Keywords:similarity; Markov chain; VKF candidate; total variation; coupling.