Abstract:
This paper describes an approach to implementing a system that automatically generates a database model from a natural language description provided by the user. Several machine learning techniques, such as transformers, named entity recognition, and relation extraction, are considered and applied. The neural network model is implemented with the spaCy framework, which is used to organize a generic training pipeline. Some individual components are off-the-shelf spaCy implementations, while the rest are custom. We also describe the process of gathering and preparing raw data for training the neural network model and generating a suitable corpus from it. For this purpose we use Doccano, a specialized, freely available annotation tool that satisfies all of our requirements. Finally, the paper presents the model parameters used in training and the performance metrics obtained. We achieved strong results for the named entity recognition component, while the performance of the relation extraction component still leaves room for improvement. The paper concludes with possible directions for further work on the described system, including improving the relation extraction component and implementing new features.
Keywords: natural language processing, named entity recognition, relation extraction, text analysis, classification, relational databases, model building.