Abstract:
A method is proposed for compiling a list of words recommended for expanding the RuSentiLex sentiment dictionary (developed by N. Lukashevich). Words are selected for the list by a classification algorithm based on their semantic similarity to words already in RuSentiLex. This similarity is computed from co-occurrence statistics within groups of semantically similar terms, and these groups are in turn obtained from the Word2Vec neural network. A sentiment consistency coefficient is also proposed. It ranks the recommended words by how strongly their sentiment is confirmed by associative links in the neural network. The classification algorithm was evaluated by cross-validation and determined the positive/negative sentiment of a word correctly in 98% of cases. The resulting list contains 6061 words recommended for addition to the dictionary. A comparison of these words with the KartaSlovSent dictionary found 1909 words in common, and 94.7% of them had matching sentiment labels.