Probl. Peredachi Inf., 2020, Volume 56, Issue 2, Pages 3–63 (Mi ppi2315)

Information Theory

Comparison of contraction coefficients for $f$-divergences

A. Makur, L. Zheng

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

Abstract: Contraction coefficients are distribution-dependent constants that are used to sharpen standard data processing inequalities for $f$-divergences (or relative $f$-entropies) and produce so-called “strong” data processing inequalities. For any bivariate joint distribution, i.e., any probability vector and stochastic matrix pair, it is known that contraction coefficients for $f$-divergences are upper bounded by unity and lower bounded by the contraction coefficient for $\chi^2$-divergence. In this paper, we elucidate that the upper bound is achieved when the joint distribution is decomposable, and the lower bound can be achieved by driving the input $f$-divergences of the contraction coefficients to zero. Then, we establish a linear upper bound on the contraction coefficients of joint distributions for a certain class of $f$-divergences using the contraction coefficient for $\chi^2$-divergence, and refine this upper bound for the salient special case of Kullback–Leibler (KL) divergence. Furthermore, we present an alternative proof of the fact that the contraction coefficients for KL and $\chi^2$-divergences are equal for bivariate Gaussian distributions (where the former coefficient may impose a bounded second moment constraint). Finally, we generalize the well-known result that contraction coefficients of stochastic matrices (after extremizing over all possible probability vectors) are equal for all nonlinear operator convex $f$-divergences. In particular, we prove that the so-called “less noisy” preorder over stochastic matrices can be equivalently characterized by any nonlinear operator convex $f$-divergence. As an application of this characterization, we also derive a generalization of Samorodnitsky's strong data processing inequality.
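For reference, a brief sketch of the standard definitions behind the abstract (the notation $P_X$ for the input probability vector, $W$ for the stochastic matrix, and $Q_X$ for an alternative input distribution is assumed here and does not appear on this page): the contraction coefficient of the pair $(P_X, W)$ for an $f$-divergence is conventionally defined as
$$ \eta_f(P_X, W) = \sup_{Q_X \,:\, 0 < D_f(Q_X \| P_X) < \infty} \frac{D_f(Q_X W \| P_X W)}{D_f(Q_X \| P_X)}, $$
which yields the strong data processing inequality $D_f(Q_X W \| P_X W) \le \eta_f(P_X, W)\, D_f(Q_X \| P_X)$ and, combined with the bounds mentioned above, the sandwich relation $\eta_{\chi^2}(P_X, W) \le \eta_f(P_X, W) \le 1$.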

Keywords: contraction coefficient, $f$-divergence/relative $f$-entropy, strong data processing inequality, less noisy preorder, maximal correlation.

UDC: 621.391.1 : 519.72

Received: 17.10.2019
Revised: 17.10.2019
Accepted: 09.03.2020

DOI: 10.31857/S0555292320020011


English version:
Problems of Information Transmission, 2020, 56:2, 103–156
