where (X, X', Y, Y') are independent, the cdf of X and X' is F, the cdf of Y and Y' is G, is the expected value, and || . || denotes the length of a vector. Energy distance satisfies all axioms of a metric thus energy distance characterizes the equality of distributions: D(F,G) = 0 if and only if F = G.
Energy distance for statistical applications was introduced in 1985 by Gábor J. Székely, who proved that for real-valued random variables is exactly twice Harald Cramér's distance:
For a simple proof of this equivalence, see Székely (2002).
In higher dimensions, however, the two distances are different because the energy distance is rotation invariant while Cramér's distance is not. (Notice that Cramér's distance is not the same as the distribution-freeCramer-von-Mises criterion.)
Generalization to metric spaces
One can generalize the notion of energy distance to probability distributions on metric spaces. Let be a metric space with its Borel sigma algebra. Let denote the collection of all probability measures on the measurable space. If μ and ν are probability measures in , then the energy-distance of μ and ν can be defined as the square root of
This is not necessarily non-negative, however. If is a strongly negative definite kernel, then is a metric, and conversely. This condition is expressed by saying that has negative type. Negative type is not sufficient for to be a metric; the latter condition is expressed by saying that has strong negative type. In this situation, the energy distance is zero if and only if X and Y are identically distributed. An example of a metric of negative type but not of strong negative type is the plane with the taxicab metric. All Euclidean spaces and even separable Hilbert spaces have strong negative type.
In the literature on kernel methods for machine learning, these generalized notions of energy distance are studied under the name of maximum mean discrepancy. Equivalence of distance based and kernel methods for hypothesesis testing is covered by several authors.
A related statistical concept, the notion of E-statistic or energy-statistic was introduced by Gábor J. Székely in the 1980s when he was giving colloquium lectures in Budapest, Hungary and at MIT, Yale, and Columbia. This concept is based on the notion of Newton’s potential energy. The idea is to consider statistical observations as heavenly bodies governed by a statistical potential energy which is zero only when an underlying statistical null hypothesis is true. Energy statistics are functions of distances between statistical observations.
Energy distance and E-statistic were considered as N-distances and N-statistic in Zinger A.A., Kakosyan A.V., Klebanov L.B. Characterization of distributions by means of mean values of some statistics in connection with some probability metrics, Stability Problems for Stochastic Models. Moscow, VNIISI, 1989,47-55. (in Russian), English Translation: A characterization of distributions by mean values of statistics and certain probabilistic metrics A. A. Zinger, A. V. Kakosyan, L. B. Klebanov in Journal of Soviet Mathematics (1992). In the same paper there was given a definition of strongly negative definite kernel, and provided a generalization on metric spaces, discussed above. The book  gives these results and their applications to statistical testing as well. The book contains also some applications to recovering the measure from its potential.
Testing for equal distributions
Consider the null hypothesis that two random variables, X and Y, have the same probability distributions: . For statistical samples from X and Y:
the following arithmetic averages of distances are computed between the X and the Y samples:
The E-statistic of the underlying null hypothesis is defined as follows:
One can prove  that and that the corresponding population value is zero if and only if X and Y have the same distribution (). Under this null hypothesis the test statistic
The E-coefficient of inhomogeneity can also be introduced. This is always between 0 and 1 and is defined as
where denotes the expected value. H = 0 exactly when X and Y have the same distribution.
A multivariate goodness-of-fit measure is defined for distributions in arbitrary dimension (not restricted by sample size). The energy goodness-of-fit statistic is
where X and X' are independent and identically distributed according to the hypothesized distribution, and . The only required condition is that X has finite moment under the null hypothesis. Under the null hypothesis , and the asymptotic distribution of Qn is a quadratic form of centered Gaussian random variables. Under an alternative hypothesis, Qn tends to infinity stochastically, and thus determines a statistically consistent test. For most applications the exponent 1 (Euclidean distance) can be applied. The important special case of testing multivariate normality is implemented in the energy package for R. Tests are also developed for heavy tailed distributions such as Pareto (power law), or stable distributions by application of exponents in (0,1).
^M. L. Rizzo and G. J. Székely (2010). DISCO Analysis: A Nonparametric Extension of Analysis of Variance, Annals of Applied Statistics Vol. 4, No. 2, 1034–1055. PDF
^Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, Nov. (5). Reprint.
Ledlie, Jonathan and Pietzuch, Peter and Seltzer, Margo, (2006). Stable and Accurate Network Coordinates,. Sovetskaia meditsina. ICDCS '06,. Washington, DC, USA: IEEE Computer Society,. pp. 74–83, . doi:10.1109/ICDCS.2006.79. ISBN 0-7695-2540-7. PMID1154085. PDF
^Székely, G. J., Rizzo M. L. and Bakirov, N. K. (2007). "Measuring and testing independence by correlation of distances", The Annals of Statistics, 35, 2769–2794. PDF
^Székely, G. J. and Rizzo, M. L. (2009). "Brownian distance covariance", The Annals of Applied Statistics, 3/4, 1233–1308. PDF
T. Gneiting; A. E. Raftery (2007). "Strictly Proper Scoring Rules, Prediction, and Estimation". Journal of the American Statistical Association. 102 (477): 359–378. doi:10.1198/016214506000001437.
^Klebanov L.B. A class of Probability Metrics and its Statistical Applications, Statistics in Industry and Technology: Statistical Data Analysis, Yadolah Dodge, Ed. Birkhauser, Basel, Boston, Berlin, 2002,241-252.
^Statistics and Data Analysis, 2006, 50, 12, 3619-3628Rui Hu, Xing Qiu, Galina Glazko, Lev Klebanov, Andrei Yakovlev Detecting intergene correlation changes in microarray analysis: a new approach to gene selection, BMCBioinformatics, Vol.10, 20 (2009), 1-15.
^Yuanhui Xiao, Robert Frisina, Alexander Gordon, Lev Klebanov, Andrei Yakovlev Multivariate Search for Differentially Expressed Gene Combinations BMC Bioinformatics, 2004, 5:164; Antoni Almudevar, Lev Klebanov, Xing Qiu, Andrei Yakovlev Utility of correlation measures in analysis of gene expression, In: NeuroRX, 2006, 3, 3, 384-395; Klebanov Lev, Gordon Alexander, Land Hartmut, Yakovlev Andrei A permutation test motivated by microarray data analysis
^Viktor Benes, Radka Lechnerova, Lev Klebanov, Margarita Slamova, Peter Slama Statistical comparison of the geometry of second-phase particles, Materials Characterization , Vol. 60 (2009 ), 1076 - 1081.
^E. Vaiciukynas, A. Verikas, A. Gelzinis, M. Bacauskiene, and I. Olenina (2015)
Exploiting statistical energy test for comparison of multiple groups in morphometric and chemometric data, Chemometrics and Intelligent Laboratory Systems, 146, 10-23.
This page is based on a Wikipedia article written by authors
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.