Abstract:
Due to the increasing heterogeneity of data models and schemas in the modern world, robust data integration is a high-priority issue. Data integration systems have been extensively deployed across various domains, including astronomy, land use management, and materials science. However, data integration programs can be very complicated. Thus, formal verification of their correctness has emerged as an important task.
In this article, an approach to verify the correctness of data integration in an integrated system of databases on the properties of inorganic substances and materials is considered. The system, developed at the A.A. Baikov Institute of Metallurgy and Materials Science of the Russian Academy of Sciences, employs a two-stage data integration process: during the first stage, the source data marked for deletion, modification, or insertion are converted into an intermediate XML representation; in the second stage, the system invokes the corresponding procedures for XML elements in the target integrated database and updates it accordingly. The data integration programs are implemented by combining an imperative programming language with a declarative language of relational databases. Verification is performed by defining the semantics of the data schemas and data integration programs in a formal specification language and proving the correctness of data integration using automated provers.
Keywords:data integration, verification, program semantics, proof of correctness.