Seminar of the Department of Discrete Mathematics, Steklov Mathematical Institute of RAS
February 10, 2026 16:00, Moscow, Steklov Mathematical Institute, Room 313 (8 Gubkina) + online


A modification of the Heller-Heller-Gorfine test

A. P. Buzin

Lomonosov Moscow State University

Abstract: Consider an $m$-sample problem. We have $m$ independent samples with i.i.d. observations: $X_{j,1}, \dots, X_{j,n_j} \sim F_j,\ j = 1,2,\dots,m$, where $F_j,\ j = 1,2,\dots,m$ are continuous cumulative distribution functions (c.d.f.'s).
Let $n=n_1+n_2+\ldots+n_m$ and assume that the following limits exist:
$$ \lim_{n\to \infty} n_j/n := \alpha_j,\ j=1,2,\dots,m. $$
For a distribution function $G$, we use the notation $G(\Delta) := G(b)-G(a)$, where $\Delta=(a,b]$ is a half-interval, or a ray $(-\infty,b]$ or $(a,+\infty)$ with the conventions $G(-\infty)=0$, $G(+\infty)=1$.
Throughout, $k\ge 2$ is a fixed natural number. Let $T$ be a partition of $\mathbb{R}$ into $k$ half-intervals and rays $\Delta_i$, $i=1,\dots, k$, and let
$$\widehat{\chi}^2_n(T) := \sum_{j=1}^{m}\sum_{i=1}^{k}\frac{\left( \widehat{F}_{j,n_j}(\Delta_i) -\widehat{H}_n (\Delta_i) \right)^{2} n_j } { \widehat{H}_n(\Delta_i) },$$
where $\widehat{F}_{j,n_j}$, $j=1,\dots,m$, are the empirical c.d.f.'s (e.c.d.f.'s) of the corresponding samples and $\widehat{H}_n$ is the e.c.d.f. of the pooled sample. This is the classical $m$-sample chi-square statistic.
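The statistic $\widehat{\chi}^2_n(T)$ can be sketched in Python as follows. This is a minimal illustration, not the speaker's code, and it rests on two assumptions: the partition is given by sorted cut points $t_1<\dots<t_{k-1}$, so that the cells are $(-\infty,t_1], (t_1,t_2], \dots, (t_{k-1},+\infty)$, and a cell with $\widehat{H}_n(\Delta_i)=0$ (where the formula is undefined) is taken to contribute zero. The name `chi2_statistic` is hypothetical. Since $n_j\widehat{F}_{j,n_j}(\Delta_i)$ is the number of observations of sample $j$ in $\Delta_i$, the value coincides with Pearson's chi-square statistic for the $m\times k$ contingency table of these counts.

```python
from bisect import bisect_left
from typing import Sequence


def chi2_statistic(samples: Sequence[Sequence[float]], cuts: Sequence[float]) -> float:
    """m-sample chi-square statistic for the partition given by sorted cut points.

    Hypothetical helper: cells are (-inf, t_1], (t_1, t_2], ..., (t_{k-1}, +inf).
    Assumes every sample is non-empty and `cuts` is sorted increasingly.
    """
    k = len(cuts) + 1
    sizes = [len(s) for s in samples]          # n_j
    n = sum(sizes)                             # pooled sample size
    # counts[j][i] = number of observations of sample j in cell i (0-based).
    # bisect_left maps x <= cuts[0] -> 0, cuts[i-1] < x <= cuts[i] -> i,
    # and x > cuts[-1] -> k-1, which matches the half-open cells (a, b].
    counts = [[0] * k for _ in samples]
    for j, sample in enumerate(samples):
        for x in sample:
            counts[j][bisect_left(cuts, x)] += 1
    total = 0.0
    for i in range(k):
        h = sum(row[i] for row in counts) / n  # \hat H_n(Delta_i)
        if h == 0.0:
            continue                           # empty cell: assumed to contribute 0
        for j, n_j in enumerate(sizes):
            f = counts[j][i] / n_j             # \hat F_{j,n_j}(Delta_i)
            total += (f - h) ** 2 * n_j / h
    return total


# Two samples separated by the cut point 0.25: the table of counts is [[2, 0], [0, 2]].
print(chi2_statistic([[0.1, 0.2], [0.3, 0.4]], [0.25]))  # 4.0
```

For the $2\times 2$ table above, every expected count equals $1$, and Pearson's formula gives $4\cdot(1)^2/1 = 4$, in agreement with the printed value.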
A well-known weakness of chi-square tests is their lack of power when the partition is poorly chosen. Several modifications of chi-square tests therefore search over partitions by brute force. One of the best-known solutions was proposed by Heller, Heller and Gorfine (Heller R., Gorfine M., Heller Y. A class of multivariate distribution-free tests of independence based on graphs // Journal of Statistical Planning and Inference. 2012. Vol. 142, No. 12. pp. 3097-3106.).
Their method considers all possible partitions $T$ and computes either the maximum or the sample mean of $\widehat{\chi}^2_n(T)$. The resulting test is powerful but computationally expensive. The authors did not derive the limiting distributions of these statistics; moreover, the maximum diverges as $n\to\infty$. We expect the mean to converge to a non-degenerate limit, but no formal results of this kind are available. Consequently, critical values have to be computed separately for every sample size, and since the statistic is expensive to evaluate, Monte Carlo simulation is feasible only for small samples (100-500 observations).
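To make the computational cost concrete, here is a naive brute-force sketch of the maximum and the mean of $\widehat{\chi}^2_n(T)$ over partitions. It illustrates the idea described above and is not the HHG authors' implementation. It assumes that there are no ties (as with continuous $F_j$), that two partitions splitting the pooled sample in the same way are counted once, and that every cell contains at least one pooled observation. Under these assumptions a partition corresponds to a choice of $k-1$ of the $n-1$ gaps between consecutive pooled order statistics, so there are $\binom{n-1}{k-1}$ partitions, a number that grows like $n^{k-1}$. The name `max_and_mean_chi2` is hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Sequence, Tuple


def max_and_mean_chi2(samples: Sequence[Sequence[float]], k: int) -> Tuple[float, float]:
    """Brute-force max and mean of the m-sample chi-square statistic over all
    data-induced partitions into k non-empty cells (hypothetical helper, no ties)."""
    m = len(samples)
    sizes = [len(s) for s in samples]
    # Pooled sample sorted by value, each point tagged with its sample label j.
    pooled = sorted((x, j) for j, s in enumerate(samples) for x in s)
    n = len(pooled)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # prefix[j][p] = number of points of sample j among the p smallest pooled points,
    # so any cell count is obtained in O(1) as a difference of prefix counts.
    prefix = [[0] * (n + 1) for _ in range(m)]
    for p, (_, label) in enumerate(pooled):
        for j in range(m):
            prefix[j][p + 1] = prefix[j][p] + int(j == label)
    best, total, count = float("-inf"), 0.0, 0
    # A partition = k-1 cut positions among the n-1 gaps between order statistics.
    for gaps in combinations(range(1, n), k - 1):
        bounds = (0, *gaps, n)
        stat = 0.0
        for lo, hi in zip(bounds, bounds[1:]):
            h = (hi - lo) / n                      # \hat H_n of the cell, always > 0
            for j in range(m):
                f = (prefix[j][hi] - prefix[j][lo]) / sizes[j]
                stat += (f - h) ** 2 * sizes[j] / h
        best = max(best, stat)
        total += stat
        count += 1
    assert count == comb(n - 1, k - 1)             # sanity check on the enumeration
    return best, total / count
```

For instance, `max_and_mean_chi2([[0.1, 0.2], [0.3, 0.4]], 2)` examines three partitions with statistics $4/3$, $4$ and $4/3$ and returns the maximum $4$ and the mean $20/9$, up to floating-point rounding. With $n=500$ and $k=4$ the loop would already visit $\binom{499}{3}\approx 2.06\cdot 10^7$ partitions.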
We introduce a modification of the HHG statistics and prove several limit theorems for the modified statistics.
Consider the pooled c.d.f.
$$H(\cdot):=\alpha_1 F_1(\cdot) + \alpha_2 F_2(\cdot) + \ldots + \alpha_m F_m(\cdot).$$

Let $\mathcal{T}_{\varepsilon, n}$ be the set of all partitions $T$ such that every cell of the partition contains at least $\varepsilon n$ points of the pooled sample. We consider the statistics
$$ D_\varepsilon:= \sup_{T \in \mathcal{T}_{\varepsilon, n} } \widehat{\chi}^2_n(T),\quad D_\varepsilon':= \frac{1}{|\mathcal{T}_{\varepsilon, n}|} \sum_{T \in \mathcal{T}_{\varepsilon, n} } \widehat{\chi}^2_n(T). $$
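A minimal brute-force sketch of $D_\varepsilon$ and $D'_\varepsilon$ follows. It assumes that there are no ties and that partitions inducing the same split of the pooled sample are identified, which makes $|\mathcal{T}_{\varepsilon, n}|$ finite; these are our assumptions, not statements from the talk. Since each of the $k$ cells must hold at least $\varepsilon n$ points, the set $\mathcal{T}_{\varepsilon, n}$ is empty when $k\varepsilon > 1$, and the sketch then raises an error. The name `restricted_statistics` is hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def restricted_statistics(samples: Sequence[Sequence[float]], k: int,
                          eps: float) -> Tuple[float, float]:
    """Return (D_eps, D'_eps): sup and mean of the chi-square statistic over
    partitions whose every cell holds at least eps * n pooled points
    (hypothetical helper; partitions identified with pooled-sample splits)."""
    m = len(samples)
    sizes = [len(s) for s in samples]
    pooled = sorted((x, j) for j, s in enumerate(samples) for x in s)
    n = len(pooled)
    # prefix[j][p] = number of points of sample j among the p smallest pooled points.
    prefix = [[0] * (n + 1) for _ in range(m)]
    for p, (_, label) in enumerate(pooled):
        for j in range(m):
            prefix[j][p + 1] = prefix[j][p] + int(j == label)
    stats = []
    for gaps in combinations(range(1, n), k - 1):
        bounds = (0, *gaps, n)
        cells = list(zip(bounds, bounds[1:]))
        if any(hi - lo < eps * n for lo, hi in cells):
            continue                               # partition is not in T_{eps,n}
        stat = 0.0
        for lo, hi in cells:
            h = (hi - lo) / n                      # \hat H_n(Delta_i)
            for j in range(m):
                f = (prefix[j][hi] - prefix[j][lo]) / sizes[j]
                stat += (f - h) ** 2 * sizes[j] / h
        stats.append(stat)
    if not stats:
        raise ValueError("no admissible partition: T_{eps,n} is empty")
    return max(stats), sum(stats) / len(stats)
```

With the toy samples $\{0.1, 0.2\}$ and $\{0.3, 0.4\}$ and $k=2$, taking $\varepsilon=0.5$ leaves only the partition that cuts between $0.2$ and $0.3$, so both statistics equal $4$; with $\varepsilon=0.25$ all three splits are admissible and the pair becomes $(4, 20/9)$.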

Theorem. Under the null hypothesis $F_1 = F_2 = \dots = F_m$ and for a fixed $\varepsilon$, the statistics $D_\varepsilon$ and $D'_\varepsilon$ have non-degenerate limiting distributions as $n\to\infty$.
In the talk we will discuss the consistency of the tests based on the statistics $D_\varepsilon$ and $D'_\varepsilon$, as well as the limiting distributions of these statistics under the null hypothesis and under alternatives.
For the case where $\varepsilon_n \to 0$ as $n\to \infty$, we consider the following modified statistics:
$$ \begin{aligned}& D_0 := \sup_{T:\ \widehat{H}_n(\Delta_i(T))>\varepsilon_n,\ i=1,\dots,k} \left( \sum_{j=1}^m \sum_{i=1}^{k}\frac{\left( \widehat{F}_{j,n_j}(\Delta_i) -\widehat{H}_n (\Delta_i) \right)^{2} n_j } { \widehat{H}_n(\Delta_i) \ln^2 \left(\widehat{H}_n(\Delta_i)/2 \right) } \right), \\ & D_0':= \frac{1}{|\mathcal{T}_{\varepsilon_n, n}|} \sum_{T \in \mathcal{T}_{\varepsilon_n, n} } \left( \sum_{j=1}^m \sum_{i=1}^{k}\frac{\left( \widehat{F}_{j,n_j}(\Delta_i) -\widehat{H}_n (\Delta_i) \right)^{2} n_j } { \widehat{H}_n(\Delta_i) \ln^2 \left(\widehat{H}_n(\Delta_i)/2 \right) } \right). \end{aligned} $$
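The log-weighted statistics can be sketched by the same kind of brute-force enumeration. As written above, the supremum in $D_0$ runs over partitions with $\widehat{H}_n(\Delta_i) > \varepsilon_n$ in every cell, while the mean in $D_0'$ runs over $\mathcal{T}_{\varepsilon_n, n}$ (at least $\varepsilon_n n$ points per cell); the sketch keeps the two conditions separate. Because $\widehat{H}_n(\Delta_i)/2 \le 1/2$, the factor $\ln^2(\widehat{H}_n(\Delta_i)/2)$ is at least $\ln^2 2 > 0$, so the weight never blows up. We assume no ties and identify partitions with the split they induce on the pooled sample; the value of $\varepsilon_n$ for the given $n$ is passed in as an input, since no particular sequence is fixed here. The name `log_weighted_statistics` is hypothetical.

```python
from itertools import combinations
from math import log
from typing import Sequence, Tuple


def log_weighted_statistics(samples: Sequence[Sequence[float]], k: int,
                            eps_n: float) -> Tuple[float, float]:
    """Return (D_0, D_0') for a given value eps_n (hypothetical brute-force helper)."""
    m = len(samples)
    sizes = [len(s) for s in samples]
    pooled = sorted((x, j) for j, s in enumerate(samples) for x in s)
    n = len(pooled)
    # prefix[j][p] = number of points of sample j among the p smallest pooled points.
    prefix = [[0] * (n + 1) for _ in range(m)]
    for p, (_, label) in enumerate(pooled):
        for j in range(m):
            prefix[j][p + 1] = prefix[j][p] + int(j == label)
    sup_stats, mean_stats = [], []
    for gaps in combinations(range(1, n), k - 1):
        bounds = (0, *gaps, n)
        cells = list(zip(bounds, bounds[1:]))
        stat = 0.0
        for lo, hi in cells:
            h = (hi - lo) / n                      # \hat H_n(Delta_i), in (0, 1]
            weight = log(h / 2) ** 2               # >= (ln 2)^2 because h <= 1
            for j in range(m):
                f = (prefix[j][hi] - prefix[j][lo]) / sizes[j]
                stat += (f - h) ** 2 * sizes[j] / (h * weight)
        if all((hi - lo) / n > eps_n for lo, hi in cells):
            sup_stats.append(stat)                 # admissible in the sup defining D_0
        if all(hi - lo >= eps_n * n for lo, hi in cells):
            mean_stats.append(stat)                # T in T_{eps_n, n}, averaged in D_0'
    if not sup_stats or not mean_stats:
        raise ValueError("no admissible partition for this eps_n")
    return max(sup_stats), sum(mean_stats) / len(mean_stats)
```

For the toy samples $\{0.1, 0.2\}$ and $\{0.3, 0.4\}$ with $k=2$ and $\varepsilon_n = 0.3$, only the split between $0.2$ and $0.3$ is admissible, and both values equal $4/\ln^2(1/4)$.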
In the talk we will also discuss the limiting properties of the statistics $D_0$ and $D_0'$.


© Steklov Math. Inst. of RAS, 2026