Matematicheskaya Biologiya i Bioinformatika

Mat. Biolog. Bioinform., 2016 Volume 11, Issue 1, Pages 14–23 (Mi mbb248)

This article is cited in 1 paper

Bioinformatics

Number of overlaps in patterns

E. I. Furletova$^{a}$, M. A. Roytberg$^{a,b,c}$

$^{a}$ Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
$^{b}$ Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
$^{c}$ Higher School of Economics, Moscow, Russia

Abstract: The aim of the paper is to estimate the number of overlaps in a given pattern. A pattern is a set of words of the same length $m$ over an alphabet $A$. We present theoretical and experimental bounds on the number of overlaps for two types of patterns. First, we consider random patterns under the uniform probability model, i.e. all letters of the alphabet and, correspondingly, all words of the same length are equiprobable. We prove that the average number of overlaps $P$ for random patterns consisting of $n$ words of length $m$ depends linearly on the pattern size $n$ and is independent of the word length $m$. In the computer experiments the ratio $P/n$ ranged from $0.33$ to $1.06$; the theoretical estimates of the ratio for these patterns do not exceed $1.67$. Second, we study patterns described by position weight matrices (PWMs) from the HOCOMOCO database with various cut-offs. For such patterns the ratio $P/n$ in the experiments ranged from $0.004$ to $1$; for most of the patterns it is smaller than $0.1$.
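The abstract does not restate the definition of an overlap, so the following is only a minimal sketch under an assumption: an overlap of a pattern is taken to be a nonempty word that is both a proper prefix of some pattern word and a proper suffix of some (possibly the same) pattern word. The function names `count_overlaps`, `overlap_ratio`, and `random_pattern`, the default alphabet `"ACGT"`, and the seed are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def count_overlaps(pattern):
    """Count distinct overlaps of a pattern (a set of equal-length words).

    Assumed definition: an overlap is a nonempty word that is a proper
    prefix of some pattern word and a proper suffix of some pattern word.
    """
    words = set(pattern)
    if len({len(w) for w in words}) > 1:
        raise ValueError("all pattern words must have the same length")
    # Proper nonempty prefixes and suffixes: lengths 1 .. m-1.
    prefixes = {w[:k] for w in words for k in range(1, len(w))}
    suffixes = {w[-k:] for w in words for k in range(1, len(w))}
    return len(prefixes & suffixes)


def overlap_ratio(pattern):
    """Return P/n: number of overlaps divided by the number of distinct words."""
    words = set(pattern)
    return count_overlaps(words) / len(words)


def random_pattern(n, m, alphabet="ACGT", seed=0):
    """Draw n distinct words of length m with equiprobable letters.

    Illustrative sampler for the uniform model; not the authors' procedure.
    """
    if n > len(alphabet) ** m:
        raise ValueError("n exceeds the number of words of length m")
    rng = random.Random(seed)
    words = set()
    while len(words) < n:
        words.add("".join(rng.choices(alphabet, k=m)))
    return words
```

For example, under this assumed definition `count_overlaps({"ABA"})` counts the single overlap `A`, while `count_overlaps({"ABC"})` finds none. Calling `overlap_ratio(random_pattern(n, m))` for several values of `m` gives a rough way to explore how $P/n$ behaves; the values obtained this way depend on the assumed definition and are not a reproduction of the reported experiments.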

Key words: overlap, pattern, pattern occurrence in a sequence.

UDC: 510.52:519.21

Received 19.11.2015, Published 27.01.2016

DOI: 10.17537/2016.11.14


