Information Theory
Comparison of contraction coefficients for $f$-divergences
A. Makur, L. Zheng
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Abstract:
Contraction coefficients are distribution-dependent constants that are used to sharpen standard data processing inequalities for $f$-divergences (or relative $f$-entropies) and produce so-called "strong" data processing inequalities. For any bivariate joint distribution, i.e., any probability vector and stochastic matrix pair, it is known that contraction coefficients for $f$-divergences are upper bounded by unity and lower bounded by the contraction coefficient for $\chi^2$-divergence. In this paper, we elucidate that the upper bound is achieved when the joint distribution is decomposable, and the lower bound can be achieved by driving the input $f$-divergences of the contraction coefficients to zero. Then we establish a linear upper bound on the contraction coefficients of joint distributions for a certain class of $f$-divergences using the contraction coefficient for $\chi^2$-divergence, and refine this upper bound for the salient special case of Kullback–Leibler (KL) divergence. Furthermore, we present an alternative proof of the fact that the contraction coefficients for KL and $\chi^2$-divergences are equal for bivariate Gaussian distributions (where the former coefficient may impose a bounded second moment constraint). Finally, we generalize the well-known result that contraction coefficients of stochastic matrices (after extremizing over all possible probability vectors) are equal for all nonlinear operator convex $f$-divergences. In particular, we prove that the so-called "less noisy" preorder over stochastic matrices can be equivalently characterized by any nonlinear operator convex $f$-divergence. As an application of this characterization, we also derive a generalization of Samorodnitsky's strong data processing inequality.
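For context, the central quantity compared in the abstract admits the following standard formulation in this literature (the notation $\eta_f$, $P_X$, $W$, $Q_X$ below is supplied for illustration, not quoted from the paper): given an input probability vector $P_X$ and a stochastic matrix $W$, the contraction coefficient for an $f$-divergence $D_f(\cdot\|\cdot)$ is
$$ \eta_f(P_X, W) \,=\, \sup_{Q_X \,:\, 0 < D_f(Q_X \| P_X) < \infty} \frac{D_f(Q_X W \,\|\, P_X W)}{D_f(Q_X \,\|\, P_X)} \, . $$
The standard data processing inequality states that $D_f(Q_X W \,\|\, P_X W) \le D_f(Q_X \,\|\, P_X)$, which yields the upper bound $\eta_f(P_X, W) \le 1$; the sandwich bounds referenced above then read $\eta_{\chi^2}(P_X, W) \le \eta_f(P_X, W) \le 1$. The channel quantity obtained "after extremizing over all possible probability vectors" is $\eta_f(W) = \sup_{P_X} \eta_f(P_X, W)$.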
Keywords:
contraction coefficient, $f$-divergence/relative $f$-entropy, strong data processing inequality, less noisy preorder, maximal correlation.
UDC: 621.391.1:519.72
Received: 17.10.2019
Revised: 17.10.2019
Accepted: 09.03.2020
DOI: 10.31857/S0555292320020011