Abstract:
This article formulates the problem of simultaneously selecting both responses and explanatory variables in multivariate linear regressions. This problem is called “key responses and relevant features selection”. The regressions are estimated by the ordinary least squares method. First, the problem of selecting a given number of key responses and relevant features by the criterion of the maximum sum of the regressions' coefficients of determination is reduced to a mixed 0–1 integer linear programming problem. Then, restrictions on the signs of the estimates are introduced into it, which makes it possible to select optimal structures of multivariate regressions. After that, restrictions on the absolute contributions of regressors to the overall determinations are added, which makes it possible to control the number of explanatory variables. In computational experiments on real data with a fixed number of key responses, constructing multivariate models with the proposed method took approximately 67.3 times less time than constructing them with the generating all subsets method. Moreover, tightening the restrictions on the absolute contributions of regressors further reduced the time required to solve the problems.
Keywords: multivariate linear regression, ordinary least squares, coefficient of determination, key responses and relevant features selection, mixed 0–1 integer linear programming problem, absolute contribution of a variable to determination, generating all subsets method.
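
To make the selection criterion concrete, the exhaustive baseline mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments. The function names `r_squared` and `best_subsets` are hypothetical, NumPy is assumed to be available, and each regression is assumed to include an intercept. Because the criterion is a sum over responses, for a fixed feature subset the best responses are simply those with the largest coefficients of determination, so only the feature subsets are enumerated explicitly.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """Coefficient of determination of an OLS fit of y on X (intercept assumed)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss


def best_subsets(X, Y, n_responses, n_features):
    """Exhaustive search maximizing the sum of R^2 over the chosen responses.

    Returns (total R^2, response indices, feature indices).
    Hypothetical helper for illustration only; the cost grows
    combinatorially with the number of candidate features.
    """
    best = (-np.inf, None, None)
    for feats in combinations(range(X.shape[1]), n_features):
        cols = list(feats)
        # R^2 of every candidate response on this feature subset
        r2 = np.array([r_squared(X[:, cols], Y[:, j]) for j in range(Y.shape[1])])
        # The sum is separable, so the top-k responses are optimal for this subset
        top = np.argsort(r2)[::-1][:n_responses]
        total = float(r2[top].sum())
        if total > best[0]:
            best = (total, tuple(sorted(int(j) for j in top)), feats)
    return best
```

The number of feature subsets examined is the binomial coefficient of the candidate count and `n_features`, which is the growth that a mixed 0–1 integer linear programming formulation is intended to avoid enumerating explicitly.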