Abstract:
The paper considers information retrieval in an object-oriented collection of documents and the possibility of searching for documents with a modified search model based on the vector model. The modification consists in using an object-oriented glossary of terms at the text preprocessing stage, which reduces the number of terms passed to the subsequent frequency analysis of the text. Zipf's law and Luhn's principle, applied during the frequency analysis, can further reduce the number of analyzed terms significantly. The paper shows how the multidimensional space of terms is constructed from the vectors that describe the documents, and gives the principles by which these vectors are formed. The paper also lists the advantages of using the object-oriented vocabulary when constructing the space of terms: composite terms can be separated, which allows a document to be positioned more accurately in the results returned for a query.
Keywords: object-oriented collection of documents, frequency analysis of the text, data warehouse, space of terms.
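
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the glossary contents, the function names (`extract_terms`, `luhn_filter`, `cosine`), the frequency thresholds, and the use of cosine similarity for ranking are all assumptions made for illustration. The sketch keeps only glossary terms during preprocessing (matching composite terms first), counts their frequencies, drops terms outside a Luhn-style frequency band, and compares the resulting term vectors.

```python
import math
import re
from collections import Counter

# Hypothetical glossary; multi-word entries stand for composite terms.
GLOSSARY = {"data warehouse", "vector model", "document"}


def extract_terms(text, glossary):
    """Return glossary terms found in text, trying the longest phrases first."""
    words = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(entry.split()) for entry in glossary)
    terms, i = [], 0
    while i < len(words):
        for n in range(max_len, 0, -1):
            if i + n > len(words):
                continue  # phrase would run past the end of the text
            phrase = " ".join(words[i:i + n])
            if phrase in glossary:
                terms.append(phrase)
                i += n
                break
        else:
            i += 1  # word is not in the glossary, so it is discarded
    return terms


def luhn_filter(counts, low=1, high=None):
    """Keep terms whose frequency lies within [low, high] (assumed cut-offs)."""
    return {
        term: freq
        for term, freq in counts.items()
        if freq >= low and (high is None or freq <= high)
    }


def document_vector(text, glossary=GLOSSARY, low=1, high=None):
    """Build a sparse term-frequency vector for one document."""
    return luhn_filter(Counter(extract_terms(text, glossary)), low, high)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(freq * b.get(term, 0) for term, freq in a.items())
    norm_a = math.sqrt(sum(f * f for f in a.values()))
    norm_b = math.sqrt(sum(f * f for f in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc = "The data warehouse stores each document. The data warehouse is large."
    query = "document in a data warehouse"
    print(document_vector(doc))
    print(cosine(document_vector(doc), document_vector(query)))
```

Matching the longest phrase first is one simple way to treat a composite term such as "data warehouse" as a single dimension of the term space instead of two unrelated words; other matching strategies would serve equally well in this sketch.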