
Vestn. Astrakhan State Technical Univ. Ser. Management, Computer Sciences and Informatics, 2012 Number 1, Pages 136–141 (Mi vagtu45)

COMPUTER SOFTWARE AND COMPUTING EQUIPMENT

Principles of construction of the multidimensional space of terms in the analysis of object-oriented collection of documents

R. V. Khrunichev

Ryazan State Radio Engineering University

Abstract: The paper considers information retrieval in an object-oriented collection of documents. It examines how documents can be searched for with a modified search model based on the vector model. The modification is the use of an object-oriented glossary of terms at the text preprocessing stage, which reduces the number of terms passed to the subsequent frequency analysis of the text. Zipf's law and Luhn's principle, applied during frequency analysis, can further reduce the number of analyzed terms significantly. The paper shows how the multidimensional space of terms is constructed from the vectors that describe documents, and gives the principles by which these vectors are formed. It also lists the advantages of applying the object-oriented vocabulary when constructing the term space: composite terms can be separated out, and, as a result, a document can be positioned more accurately in the search results returned for a query.
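The processing chain outlined in the abstract (glossary-based preprocessing, frequency cut-offs, document vectors in a term space, ranking for a query) can be illustrated with a minimal Python sketch. This is not the author's implementation. The glossary `GLOSSARY`, the two-word matching rule, the cut-off parameters `min_count` and `max_share`, raw term-frequency weights, and cosine ranking are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical glossary of composite terms (assumption: the paper's glossary is not given).
GLOSSARY = {("data", "warehouse"), ("information", "retrieval")}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def merge_composite_terms(tokens, glossary=GLOSSARY):
    """Join adjacent token pairs listed in the glossary into a single composite term."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in glossary:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def select_terms(docs_terms, min_count=2, max_share=0.2):
    """Luhn-style cut-offs (assumed thresholds): drop terms that are too rare
    (count below min_count) or too frequent (share of all tokens above max_share)."""
    counts = Counter(t for terms in docs_terms for t in terms)
    total = sum(counts.values())
    return sorted(t for t, c in counts.items()
                  if c >= min_count and c / total <= max_share)


def to_vector(terms, axes):
    """Term-frequency vector of one document along the axes of the term space."""
    tf = Counter(terms)
    return [tf[a] for a in axes]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(docs, query, min_count=2, max_share=0.2):
    """Build the term space from the collection and order documents by similarity to the query."""
    docs_terms = [merge_composite_terms(tokenize(d)) for d in docs]
    axes = select_terms(docs_terms, min_count, max_share)
    vectors = [to_vector(t, axes) for t in docs_terms]
    q = to_vector(merge_composite_terms(tokenize(query)), axes)
    scores = [cosine(q, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return axes, order, scores


docs = [
    "Data warehouse loading and data warehouse design.",
    "Information retrieval over a document collection; information retrieval quality.",
    "Document collection and data warehouse.",
]
axes, order, scores = rank(docs, "data warehouse")
print(axes)
print(order)
```

In this sketch the glossary turns "data warehouse" into one axis, `data_warehouse`, instead of two separate axes, so a query for the composite term matches only documents that contain the phrase. On this toy collection the axes are `and`, `collection`, `data_warehouse`, `document` and `information_retrieval`, and the query ranks the first document highest. A real system would likely use a larger glossary, a weighting scheme such as TF-IDF, and cut-offs tuned to the collection.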

Keywords: object-oriented collection of documents, frequency analysis of the text, data warehouse, space of terms.

UDC: 002.513.5

Received: 30.11.2011
Revised: 19.12.2011


