Abstract:
This paper presents a new approach to phylotyping that can potentially be used for both pure cultures and mixed bacterial populations. It is based on short unique nucleotide sequences ($k$-mers) that are present in the genomes of all strains of the same species and absent from the bacterial genomes of other taxonomic groups. We show that the number $N$ of such sequences depends on the compositional bias of the genome towards $\mathrm{A/T}$ or $\mathrm{G/C}$ base pairs, and is larger for genomes in which the two are approximately equally represented. The largest contribution to the set of primarily unique sequences comes from $16$–$17$-mers, while the sigmoidal curves describing the dependence of $N$ on $k$-mer length have their maximum slope increment ($\Delta N/\Delta k$) at $k = 17$ and $k = 18$. Unique sequences of $16$–$18$ bases can therefore be proposed as potential markers. By comparing the sets of unique $k$-mers in the genomes of four Enterobacter strains, we estimated their level of intraspecies stability and interspecies plasticity. On this basis, we suggest discriminatory subsets as stencils for phylotyping, thereby extending the list of genotyping markers with signatures of a new type.
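The selection criterion described above (a $k$-mer present in every strain of the target species and absent from all other genomes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmers`, `unique_kmers` and `count_by_length` are hypothetical, the sequences in the usage note are toy strings rather than real genomes, and the sketch assumes a single strand only (reverse complements are not considered) and exact matching.

```python
from typing import Dict, Iterable, Sequence, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq (single strand)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(target_genomes: Sequence[str],
                 other_genomes: Iterable[str],
                 k: int) -> Set[str]:
    """k-mers shared by ALL target genomes and found in NONE of the others."""
    target_sets = [kmers(g, k) for g in target_genomes]
    if not target_sets:
        return set()
    # Keep only k-mers present in every strain of the target species.
    shared = set.intersection(*target_sets)
    # Discard any k-mer that also occurs in a genome of another group.
    for genome in other_genomes:
        shared -= kmers(genome, k)
    return shared


def count_by_length(target_genomes: Sequence[str],
                    other_genomes: Sequence[str],
                    ks: Iterable[int]) -> Dict[int, int]:
    """Number N of unique k-mers for each k, i.e. the N(k) dependence."""
    return {k: len(unique_kmers(target_genomes, other_genomes, k)) for k in ks}
```

For example, `count_by_length(["ACGTACGTGA", "TTACGTGACC"], ["ACGTAC"], range(4, 6))` tabulates $N$ for $k = 4$ and $k = 5$ on toy input. For real genomes, sets of Python strings become memory-hungry at $k = 16$–$18$, so a dedicated $k$-mer counter or a compact integer encoding would likely be needed.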