
Computer Research and Modeling, 2015 Volume 7, Issue 2, Pages 329–345 (Mi crm191)

This article is cited in 2 papers

MODELS IN PHYSICS AND TECHNOLOGY

An efficient algorithm for comparing LaTeX documents

K. V. Chuvilin

Moscow Institute of Physics and Technology (SU), 9 Institutskii per., Dolgoprudny, Moscow Region, 141700, Russia

Abstract: The problem addressed is the construction of the differences that arise when LaTeX documents are edited. Each document is represented as a parse tree whose nodes are called tokens. The smallest possible text representation of the document that does not change the syntax tree is constructed. The whole text is split into fragments whose boundaries correspond to tokens. A mapping from the fragment sequence of the initial text to the analogous sequence of the edited document, corresponding to the minimum distance, is built with the Hirschberg algorithm. A mapping of text characters corresponding to the mapping of fragment sequences is then constructed. Tokens whose characters are all deleted, all inserted, or all unchanged are selected in the parse trees. The mapping for the trees formed by the remaining tokens is built using the Zhang-Shasha algorithm.

Keywords: automation, edit distance, text analysis, lexeme, machine learning, metric, parse tree, syntax tree, token, LaTeX.

UDC: 519.226

Received: 16.07.2013
Revised: 04.02.2015

DOI: 10.20537/2076-7633-2015-7-2-329-345
