Matematicheskaya Biologiya i Bioinformatika

Mat. Biolog. Bioinform., 2017, Volume 12, Issue 2, Pages 547–558 (Mi mbb312)

This article is cited in 8 papers

Bioinformatics

Short unique sequences in bacterial genomes as strain- and species-specific signatures

V. V. Panyukov (a, b), S. S. Kiselev (c), O. V. Alikina (c), N. N. Nazipova (a, b), O. N. Ozoline (c, b)

a 142290, Institute of Mathematical Problems of Biology – the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
b 142290, Pushchino Research Center of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
c 142290, Institute of Cell Biophysics of Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation

Abstract: The paper presents a new approach to phylotyping that can potentially be used for both pure cultures and mixed bacterial populations. It is based on short unique nucleotide sequences ($k$-mers) that are present in the genomes of all strains of the same species and absent from the bacterial genomes of other taxonomic groups. We show that the number $N$ of such sequences depends on the percentage bias towards $\mathrm{A/T}$ or $\mathrm{G/C}$ base pairs, increasing for genomes with approximately equal composition. We found that the largest contribution to the set of primarily unique sequences comes from $16$–$17$-mers, while sigmoidal curves reflecting the dependence of $N$ on the length of $k$-mers showed the maximum slope increment ($\Delta N/\Delta k$) for $k = 17, 18$. Unique sequences $16$–$18$ bases long can therefore be offered as potential markers. Comparing the sets of unique $k$-mers in the genomes of four Enterobacter strains, we estimated the level of their intraspecies stability and interspecies plasticity. As a result, we suggest discriminatory subsets as stencils for phylotyping, thereby extending the list of genotyping markers with signatures of a new type.
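
The selection rule described in the abstract (keep $k$-mers found in every strain of a species and in no genome from other taxa) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmers` and `signature_kmers`, the toy sequences, and the choice to scan only the forward strand (no reverse complements) are all assumptions made for illustration.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (forward strand only, an assumption)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(strains: Iterable[str], others: Iterable[str], k: int) -> Set[str]:
    """k-mers shared by every strain and absent from all other genomes."""
    strain_sets = [kmers(s, k) for s in strains]
    if not strain_sets:
        return set()
    # Keep only k-mers present in all strains of the species.
    shared = set.intersection(*strain_sets)
    # Discard any k-mer that also occurs in a genome of another taxon.
    for genome in others:
        shared -= kmers(genome, k)
    return shared


# Toy sequences, invented for illustration only (not real genomes).
strains = ["ACGTACGGA", "TTACGGACG"]
others = ["ACGTTTTT"]

# N as a function of k, the quantity whose growth the abstract discusses.
counts = {k: len(signature_kmers(strains, others, k)) for k in (3, 4, 5)}
```

On real genomes the same set logic would be applied for $k$ in the range of interest, and differences between consecutive values of `counts` would give a rough estimate of $\Delta N/\Delta k$. Holding full $k$-mer sets in memory is only practical for small inputs; a production tool would likely need a more compact index.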

Key words: microbiomes, bacterial genomes, genotyping, unique nucleotide sequences.

UDC: 579:252

Received 25.11.2017, Published 19.12.2017

Language: English

DOI: 10.17537/2017.12.547


