Abstract:
This paper presents an effective method for retrieving topically similar documents, and proposes an exploratory
patent search approach based on it. By providing computer assistance for patent search and analysis, the method
reduces the complexity and duration of patent examination. Both phrases extracted by a parser and single lexemes
are used as document descriptors. This approach prevents exponential growth of the feature space and enables
efficient indexing even for large text collections. Experimental results show that the proposed method
significantly outperforms a baseline keyword-based approach. We also discuss the prospects of applying the method
to other problems, such as source retrieval for plagiarism detection and full-text clustering.
Keywords: exploratory search; patent search; topic modeling; topically similar document retrieval; search and
analytical engines.