Redundantization of interdependent program modules for parallel control computing systems: Organization, estimation of fault-tolerance, formalized description
Abstract:
Using formalized descriptions and mathematical models of the computing processes, the fault tolerance of a parallel control computing system was determined analytically as the probability that an arbitrary complex of interdependent program modules completes successfully within the user-defined schedule time, given random module execution times, for both synchronous and asynchronous redundantization of the modules. For the first time, the structure of a unified diagnostic program unit was formally substantiated, designed, and described in logical terms; this unit determines the coordinate of a single error (a processor failure or fault) in the system's computing resources and identifies the program module whose execution results were distorted.
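As a rough illustration of the quantity described above (the probability that a complex of interdependent modules finishes within the schedule time when module execution times are random), the following minimal sketch estimates that probability by simulation. It is not the analytical model of the paper. The dependency graph, the exponential distribution of execution times, the assumption that a module starts as soon as all its predecessors finish (enough processors are always free), and the names `completion_time` and `estimate_success_probability` are all assumptions made for this example; redundantization and diagnostics are not modeled.

```python
import random


def completion_time(deps, durations):
    """Finish time of the whole complex of modules.

    deps maps each module to the list of modules it depends on
    (assumed acyclic); durations maps each module to its execution
    time. A module is assumed to start as soon as all of its
    predecessors have finished.
    """
    finish = {}

    def finish_of(module):
        if module not in finish:
            start = max((finish_of(p) for p in deps[module]), default=0.0)
            finish[module] = start + durations[module]
        return finish[module]

    return max(finish_of(m) for m in deps)


def estimate_success_probability(deps, mean_times, deadline,
                                 trials=20000, seed=0):
    """Monte Carlo estimate of P(completion time <= deadline).

    Execution times are drawn from exponential distributions with
    the given means; this distribution is an assumption of the
    sketch, not something stated in the abstract.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        durations = {m: rng.expovariate(1.0 / mean_times[m]) for m in deps}
        if completion_time(deps, durations) <= deadline:
            hits += 1
    return hits / trials


# Hypothetical complex: B and C depend on A, D depends on B and C.
example_deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
example_means = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 1.0}

if __name__ == "__main__":
    p = estimate_success_probability(example_deps, example_means, deadline=10.0)
    print(f"Estimated probability of finishing on schedule: {p:.3f}")
```

An analytical treatment would replace the sampling loop with a closed-form expression for the distribution of the completion time; the simulation is only a convenient way to check such expressions on small examples.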