Abstract:
Running different genome assemblers, or a single assembler with different parameters, on the same input data commonly produces widely varying results. However, there is no generally recognized method for choosing the best assembly. This article introduces a new reference-free method, based on the Jellyfish software, for evaluating genome assemblies by k-mer frequency analysis. The proposed method establishes a correspondence between the short reads obtained from the sequencer and the assembled genome, which allows a more accurate assessment of assembly quality. The method was validated on different assemblies of the fungus Encephalitozoon cuniculi. In most cases its results correlated with reference-dependent metrics, and it correctly identified the best assembly. Furthermore, no relationship was observed between assembly quality and standard reference-free metrics.
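
The general idea of relating read k-mers to assembly k-mers can be illustrated with a minimal Python sketch. It is not the scoring procedure of the article, which is not specified in this abstract. The function names, the `min_read_count` threshold, and the two ratios are hypothetical choices made for illustration only, and for real data the counting would be done by a dedicated tool such as Jellyfish rather than in pure Python.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(sequences, k):
    """Count canonical k-mers over an iterable of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_support(reads, contigs, k=21, min_read_count=2):
    """Compare distinct k-mers of the reads with those of the assembly.

    Returns two illustrative ratios (assumptions, not the article's metric):
      completeness: share of frequent read k-mers that occur in the assembly
      supported:    share of assembly k-mers that occur in at least one read
    """
    read_counts = count_kmers(reads, k)
    assembly_kmers = set(count_kmers(contigs, k))
    # Hypothetical filter: k-mers seen fewer than min_read_count times in the
    # reads are treated as likely sequencing errors.
    solid = {km for km, c in read_counts.items() if c >= min_read_count}
    completeness = len(solid & assembly_kmers) / len(solid) if solid else 0.0
    supported = (
        len(assembly_kmers & set(read_counts)) / len(assembly_kmers)
        if assembly_kmers
        else 0.0
    )
    return completeness, supported
```

Under these assumptions, an assembly that omits frequent read k-mers lowers the first ratio, while one that contains k-mers never seen in the reads lowers the second. Neither ratio needs a reference genome.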