Chi-squared distribution

In probability theory and statistics, the chi-squared distribution (also chi-square or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in the construction of confidence intervals.[2][3][4][5] When it is being distinguished from the more general noncentral chi-squared distribution, this distribution is sometimes called the central chi-squared distribution.

The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests also use this distribution, such as Friedman's analysis of variance by ranks.

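chi-squared
Figure: Probability density function (chi-square pdf)
Figure: Cumulative distribution function (chi-square cdf)
Notation: $\chi^2(k)$ or $\chi^2_k$
Parameters: $k \in \mathbb{N}_{>0}$ (known as "degrees of freedom")
Support: $x \in (0, +\infty)$ if $k = 1$, otherwise $x \in [0, +\infty)$
PDF: $\frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}$
CDF: $\frac{1}{\Gamma(k/2)}\, \gamma\!\left(\frac{k}{2}, \frac{x}{2}\right)$
Mean: $k$
Median: $\approx k\left(1 - \frac{2}{9k}\right)^3$
Mode: $\max(k - 2, 0)$
Variance: $2k$
Skewness: $\sqrt{8/k}$
Ex. kurtosis: $12/k$
Entropy: $\frac{k}{2} + \ln\!\left(2\,\Gamma(k/2)\right) + \left(1 - \frac{k}{2}\right)\psi(k/2)$
MGF: $(1 - 2t)^{-k/2}$ for $t < \tfrac{1}{2}$
CF: $(1 - 2it)^{-k/2}$ [1]
PGF: $(1 - 2\ln t)^{-k/2}$ for $0 < t < \sqrt{e}$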

Definition

If $Z_1, \ldots, Z_k$ are independent, standard normal random variables, then the sum of their squares,

$$Q = \sum_{i=1}^{k} Z_i^2,$$

is distributed according to the chi-squared distribution with k degrees of freedom. This is usually denoted as

$$Q \sim \chi^2(k) \quad \text{or} \quad Q \sim \chi^2_k.$$

The chi-squared distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (the number of $Z_i$).
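The definition can be checked by simulation. The following is a minimal sketch using only the Python standard library; the function name `sample_chi_squared`, the seed, and the sample size are illustrative choices, not part of the definition. It sums k squared standard normal draws and compares the sample mean and variance with the known values k and 2k.

```python
import random
import statistics

def sample_chi_squared(k, rng):
    """Draw one chi-squared(k) variate as a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)   # fixed seed so the run is repeatable
k = 5
draws = [sample_chi_squared(k, rng) for _ in range(50_000)]

# The mean should be close to k and the variance close to 2k.
print("sample mean    :", statistics.fmean(draws))
print("sample variance:", statistics.variance(draws))
```

With 50,000 draws the two printed values typically agree with 5 and 10 to within a few percent.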

Introduction

The chi-squared distribution is used primarily in hypothesis testing. Unlike more widely known distributions such as the normal distribution and the exponential distribution, the chi-squared distribution is not as often applied in the direct modeling of natural phenomena. It arises, among others, in the chi-squared tests of goodness of fit and of independence in contingency tables, and in likelihood-ratio tests.

It is also a component of the definition of the t-distribution and the F-distribution used in t-tests, analysis of variance, and regression analysis.

The primary reason that the chi-squared distribution is used extensively in hypothesis testing is its relationship to the normal distribution. Many hypothesis tests use a test statistic, such as the t-statistic in a t-test. For these hypothesis tests, as the sample size, n, increases, the sampling distribution of the test statistic approaches the normal distribution (central limit theorem). Because the test statistic (such as t) is asymptotically normally distributed, provided the sample size is sufficiently large, the distribution used for hypothesis testing may be approximated by a normal distribution. Testing hypotheses using a normal distribution is well understood and relatively easy. The simplest chi-squared distribution is the square of a standard normal distribution. So wherever a normal distribution could be used for a hypothesis test, a chi-squared distribution could be used.

Specifically, suppose that Z is a standard normal random variable, with mean 0 and variance 1: Z ~ N(0,1). A sample drawn at random from Z is a sample from the distribution shown in the graph of the standard normal distribution. Define a new random variable Q. To generate a random sample from Q, take a sample from Z and square the value. The distribution of the squared values is given by the random variable Q = Z². The distribution of the random variable Q is an example of a chi-squared distribution: $Q \sim \chi^2_1$. The subscript 1 indicates that this particular chi-squared distribution is constructed from only 1 standard normal distribution. A chi-squared distribution constructed by squaring a single standard normal distribution is said to have 1 degree of freedom. Thus, as the sample size for a hypothesis test increases, the distribution of the test statistic approaches a normal distribution, and the distribution of the square of the test statistic approaches a chi-squared distribution. Just as extreme values of the normal distribution have low probability (and give small p-values), extreme values of the chi-squared distribution have low probability.

An additional reason that the chi-squared distribution is widely used is that it arises as the large-sample distribution of likelihood ratio tests (LRTs).[6] LRTs have several desirable properties; in particular, LRTs commonly provide the highest power to reject the null hypothesis (Neyman–Pearson lemma). However, the normal and chi-squared approximations are only valid asymptotically. For this reason, it is preferable to use the t distribution rather than the normal approximation or the chi-squared approximation for small sample sizes. Similarly, in analyses of contingency tables, the chi-squared approximation will be poor for small sample sizes, and it is preferable to use Fisher's exact test. Ramsey shows that the exact binomial test is always more powerful than the normal approximation.[7]

Lancaster shows the connections among the binomial, normal, and chi-squared distributions, as follows.[8] De Moivre and Laplace established that a binomial distribution could be approximated by a normal distribution. Specifically they showed the asymptotic normality of the random variable

$$\chi = \frac{m - Np}{\sqrt{Npq}},$$

where m is the observed number of successes in N trials, where the probability of success is p, and q = 1 − p.

Squaring both sides of the equation gives

$$\chi^2 = \frac{(m - Np)^2}{Npq}.$$

Using N = Np + N(1 − p), N = m + (N − m), and q = 1 − p, this equation simplifies to

$$\chi^2 = \frac{(m - Np)^2}{Np} + \frac{(N - m - Nq)^2}{Nq}.$$

The expression on the right is of the form that Pearson would generalize to the form:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$

where

$\chi^2$ = Pearson's cumulative test statistic, which asymptotically approaches a $\chi^2$ distribution.
$O_i$ = the number of observations of type i.
$E_i = N p_i$ = the expected (theoretical) frequency of type i, asserted by the null hypothesis that the fraction of type i in the population is $p_i$.
$n$ = the number of cells in the table.

In the case of a binomial outcome (flipping a coin), the binomial distribution may be approximated by a normal distribution (for sufficiently large n). Because the square of a standard normal distribution is the chi-squared distribution with one degree of freedom, the probability of a result such as 1 head in 10 trials can be approximated either by the normal or the chi-squared distribution. However, many problems involve more than the two possible outcomes of a binomial, and instead require 3 or more categories, which leads to the multinomial distribution. Just as de Moivre and Laplace sought and found the normal approximation to the binomial, Pearson sought and found a multivariate normal approximation to the multinomial distribution. Pearson showed that the chi-squared distribution, which arises from sums of squares of normal variables, provides such an approximation for the multinomial distribution.[8]
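The equivalence between the squared normal statistic and Pearson's two-cell statistic can be verified on the coin example (1 head in 10 tosses of a fair coin). This is a small sketch in standard-library Python; the function names are illustrative. It assumes the usual forms z = (m − Np)/√(Npq) and Σ (O − E)²/E, and uses the fact that the two-sided normal tail probability of z equals the chi-squared(1) tail probability of z².

```python
import math

def z_statistic(m, N, p):
    """De Moivre-Laplace standardised count: (m - Np) / sqrt(Npq)."""
    q = 1.0 - p
    return (m - N * p) / math.sqrt(N * p * q)

def pearson_statistic(observed, expected):
    """Pearson's sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

N, p, m = 10, 0.5, 1   # 1 head in 10 tosses of a fair coin
z = z_statistic(m, N, p)
x2 = pearson_statistic([m, N - m], [N * p, N * (1 - p)])

# Two-sided normal tail probability P(|Z| >= |z|), which is also the
# chi-squared(1) tail probability P(Q >= z^2).
p_value = math.erfc(abs(z) / math.sqrt(2.0))

print(z ** 2, x2)   # the two statistics coincide
print(p_value)
```

With only 10 trials this is an asymptotic approximation, so the exact binomial probability will differ somewhat from the printed tail probability.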

Characteristics

Further properties of the chi-squared distribution can be found in the summary box near the top of this article.

Probability density function

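The probability density function (pdf) of the chi-squared distribution is

$$f(x;\,k) = \begin{cases} \dfrac{x^{k/2-1} e^{-x/2}}{2^{k/2}\,\Gamma\!\left(\frac{k}{2}\right)}, & x > 0; \\ 0, & \text{otherwise}, \end{cases}$$

where $\Gamma(k/2)$ denotes the gamma function, which has closed-form values for integer k.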

For derivations of the pdf in the cases of one, two and k degrees of freedom, see Proofs related to chi-squared distribution.
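As an illustration, the density $x^{k/2-1} e^{-x/2} / (2^{k/2}\Gamma(k/2))$ can be evaluated directly with the standard library. The sketch below (the name `chi2_pdf` is illustrative) works on the log scale via `math.lgamma` so that large k does not overflow, and for simplicity treats x = 0 as outside the support for every k.

```python
import math

def chi2_pdf(x, k):
    """Density of the chi-squared distribution with k degrees of freedom at x."""
    if x <= 0:
        # Sketch-level simplification: the boundary point x = 0 is
        # treated as outside the support for every k.
        return 0.0
    half_k = k / 2.0
    log_density = ((half_k - 1.0) * math.log(x) - x / 2.0
                   - half_k * math.log(2.0) - math.lgamma(half_k))
    return math.exp(log_density)

# For k = 2 the formula reduces to exp(-x/2) / 2.
print(chi2_pdf(3.0, 2), 0.5 * math.exp(-1.5))

# A midpoint Riemann sum over (0, 60] should be very close to 1 for k = 4.
step = 0.001
area = sum(chi2_pdf((i + 0.5) * step, 4) * step for i in range(60_000))
print(area)
```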

Cumulative distribution function

Figure: Chernoff bound for the CDF and tail (1 − CDF) of a chi-squared random variable with ten degrees of freedom (k = 10).

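Its cumulative distribution function is:

$$F(x;\,k) = \frac{\gamma\!\left(\frac{k}{2}, \frac{x}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)} = P\!\left(\frac{k}{2}, \frac{x}{2}\right),$$

where $\gamma(s, t)$ is the lower incomplete gamma function and $P(s, t)$ is the regularized gamma function.

In the special case of k = 2 this function has a simple form:

$$F(x;\,2) = 1 - e^{-x/2},$$

and the integer recurrence of the gamma function makes it easy to compute for other small even k.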

Tables of the chi-squared cumulative distribution function are widely available and the function is included in many spreadsheets and all statistical packages.
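Where no package is at hand, the CDF can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not a production routine: it evaluates the regularized lower incomplete gamma function P(s, x) by its power series (adequate for moderate x, but slow and prone to overflow for very large x), and it also gives the finite-sum closed form for even k, $1 - e^{-x/2}\sum_{j<k/2} (x/2)^j/j!$. All function names are illustrative.

```python
import math

def regularized_lower_gamma(s, x, tol=1e-15, max_terms=10_000):
    """P(s, x) = gamma(s, x) / Gamma(s), using the series
    gamma(s, x) = x^s e^(-x) * sum_{n>=0} x^n / (s (s+1) ... (s+n))."""
    if x <= 0:
        return 0.0
    term = 1.0 / s
    total = term
    for n in range(1, max_terms):
        term *= x / (s + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))

def chi2_cdf(x, k):
    """CDF of the chi-squared distribution: P(k/2, x/2)."""
    return regularized_lower_gamma(k / 2.0, x / 2.0)

def chi2_cdf_even(x, k):
    """Closed form for even k: 1 - exp(-x/2) * sum_{j < k/2} (x/2)^j / j!."""
    if x <= 0:
        return 0.0
    half_x = x / 2.0
    term = total = 1.0
    for j in range(1, k // 2):
        term *= half_x / j
        total += term
    return 1.0 - math.exp(-half_x) * total

print(chi2_cdf(5.0, 2), 1.0 - math.exp(-2.5))      # k = 2 special case
print(chi2_cdf(9.49, 4), chi2_cdf_even(9.49, 4))   # series vs closed form
print(chi2_cdf(3.84, 1))                           # close to 0.95
```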

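Letting $z \equiv x/k$, Chernoff bounds on the lower and upper tails of the CDF may be obtained.[9] For the cases when $0 < z < 1$ (which include all of the cases when this CDF is less than half):

$$F(zk;\,k) \leq \left(z e^{1-z}\right)^{k/2}.$$

The tail bound for the cases when $z > 1$, similarly, is

$$1 - F(zk;\,k) \leq \left(z e^{1-z}\right)^{k/2}.$$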

For another approximation for the CDF modeled after the cube of a Gaussian, see under Noncentral chi-squared distribution.
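The Chernoff bounds can be checked numerically for k = 10. The sketch below assumes the bound takes the form $(z e^{1-z})^{k/2}$ with z = x/k, applied to the CDF when 0 < z < 1 and to the upper tail when z > 1, and compares it with the exact CDF for even k (finite-sum closed form). Function names are illustrative.

```python
import math

def chi2_cdf_even(x, k):
    """Exact CDF for even k: 1 - exp(-x/2) * sum_{j < k/2} (x/2)^j / j!."""
    if x <= 0:
        return 0.0
    half_x = x / 2.0
    term = total = 1.0
    for j in range(1, k // 2):
        term *= half_x / j
        total += term
    return 1.0 - math.exp(-half_x) * total

def chernoff_bound(z, k):
    """(z * e^(1 - z))^(k/2), the bound assumed for both tails."""
    return (z * math.exp(1.0 - z)) ** (k / 2.0)

k = 10
for z in (0.2, 0.5, 0.8):    # lower tail, 0 < z < 1
    cdf = chi2_cdf_even(z * k, k)
    print(f"z={z}: CDF={cdf:.5f} <= bound={chernoff_bound(z, k):.5f}")
for z in (1.5, 2.0, 3.0):    # upper tail, z > 1
    tail = 1.0 - chi2_cdf_even(z * k, k)
    print(f"z={z}: tail={tail:.5f} <= bound={chernoff_bound(z, k):.5f}")
```

The bound is loose near z = 1 and tightens, in relative terms, further out in either tail.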

Additivity

It follows from the definition of the chi-squared distribution that the sum of independent chi-squared variables is also chi-squared distributed. Specifically, if $X_1, \ldots, X_n$ are independent chi-squared variables with $k_1, \ldots, k_n$ degrees of freedom, respectively, then $Y = X_1 + \cdots + X_n$ is chi-squared distributed with $k_1 + \cdots + k_n$ degrees of freedom.

Sample mean

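The sample mean of $n$ i.i.d. chi-squared variables of degree $k$ is distributed according to a gamma distribution with shape $\alpha$ and scale $\theta$ parameters:

$$\overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \sim \operatorname{Gamma}\!\left(\alpha = \frac{nk}{2},\ \theta = \frac{2}{n}\right), \quad \text{where } X_i \sim \chi^2(k).$$

Asymptotically, given that for a shape parameter $\alpha$ going to infinity, a gamma distribution converges towards a normal distribution with expectation $\mu = \alpha\theta$ and variance $\sigma^2 = \alpha\theta^2$, the sample mean converges towards:

$$\overline{X} \xrightarrow{n \to \infty} N\!\left(\mu = k,\ \sigma^2 = \frac{2k}{n}\right).$$

Note that we would have obtained the same result invoking instead the central limit theorem, noting that for each chi-squared variable of degree $k$ the expectation is $k$, and its variance is $2k$ (and hence the variance of the sample mean $\overline{X}$ is $\sigma^2 = 2k/n$).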

Entropy

The differential entropy is given by

$$h = \frac{k}{2} + \ln\!\left[2\,\Gamma\!\left(\frac{k}{2}\right)\right] + \left(1 - \frac{k}{2}\right)\psi\!\left(\frac{k}{2}\right),$$

where ψ(x) is the digamma function.

The chi-squared distribution is the maximum entropy probability distribution for a random variate X for which $E(X) = k$ and $E(\ln X) = \psi(k/2) + \ln 2$ are fixed. Since the chi-squared distribution is in the family of gamma distributions, this can be derived by substituting appropriate values in the expectation of the log moment of the gamma distribution. For a derivation from more basic principles, see the derivation in moment-generating function of the sufficient statistic.

Noncentral moments

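The moments about zero of a chi-squared distribution with k degrees of freedom are given by[10][11]

$$E(X^m) = k(k+2)(k+4)\cdots(k+2m-2) = 2^m\, \frac{\Gamma\!\left(m + \frac{k}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)}.$$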
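As a worked check, the raw moments can be computed in two ways, assuming the standard closed form $E(X^m) = k(k+2)\cdots(k+2m-2) = 2^m\,\Gamma(m + k/2)/\Gamma(k/2)$. The sketch below (function names are illustrative) confirms that both expressions agree and that they reproduce the mean k and variance 2k.

```python
import math

def raw_moment_product(k, m):
    """E[X^m] as the product k (k+2) ... (k+2m-2)."""
    result = 1
    for j in range(m):
        result *= k + 2 * j
    return result

def raw_moment_gamma(k, m):
    """E[X^m] as 2^m * Gamma(m + k/2) / Gamma(k/2), computed via log-gamma."""
    return 2.0 ** m * math.exp(math.lgamma(m + k / 2.0) - math.lgamma(k / 2.0))

k = 7
mean = raw_moment_product(k, 1)
variance = raw_moment_product(k, 2) - mean ** 2
print(mean, variance)                                   # k and 2k
print(raw_moment_product(k, 3), raw_moment_gamma(k, 3))
```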

Cumulants

The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the characteristic function:

$$\kappa_n = 2^{n-1}(n-1)!\,k.$$

Asymptotic properties

By the central limit theorem, because the chi-squared distribution is the sum of k independent random variables with finite mean and variance, it converges to a normal distribution for large k. For many practical purposes, for k > 50 the distribution is sufficiently close to a normal distribution for the difference to be ignored.[12] Specifically, if X ~ χ²(k), then as k tends to infinity, the distribution of $(X - k)/\sqrt{2k}$ tends to a standard normal distribution. However, convergence is slow as the skewness is $\sqrt{8/k}$ and the excess kurtosis is 12/k.

The sampling distribution of ln(χ²) converges to normality much faster than the sampling distribution of χ²,[13] as the logarithm removes much of the asymmetry.[14] Other functions of the chi-squared distribution converge more rapidly to a normal distribution. Some examples are:

  • If X ~ χ²(k) then $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2k-1}$ and unit variance (R. A. Fisher, 1922; see (18.23), p. 426 of Johnson, Kotz and Balakrishnan[4]).
  • If X ~ χ²(k) then $\sqrt[3]{X/k}$ is approximately normally distributed with mean $1 - \frac{2}{9k}$ and variance $\frac{2}{9k}$.[15] This is known as the Wilson–Hilferty transformation; see (18.24), p. 426 of Johnson, Kotz and Balakrishnan[4].
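The two normalizing transformations can be compared against the exact CDF. The sketch below assumes Fisher's approximation in the form √(2X) ≈ N(√(2k − 1), 1) and the Wilson–Hilferty approximation in the form ∛(X/k) ≈ N(1 − 2/(9k), 2/(9k)); the exact CDF uses the finite-sum closed form valid for even k. Function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_cdf_even(x, k):
    """Exact CDF for even k: 1 - exp(-x/2) * sum_{j < k/2} (x/2)^j / j!."""
    half_x = x / 2.0
    term = total = 1.0
    for j in range(1, k // 2):
        term *= half_x / j
        total += term
    return 1.0 - math.exp(-half_x) * total

def fisher_cdf(x, k):
    """Fisher: sqrt(2X) is roughly N(sqrt(2k - 1), 1)."""
    return normal_cdf(math.sqrt(2.0 * x) - math.sqrt(2.0 * k - 1.0))

def wilson_hilferty_cdf(x, k):
    """Wilson-Hilferty: (X/k)^(1/3) is roughly N(1 - 2/(9k), 2/(9k))."""
    mean = 1.0 - 2.0 / (9.0 * k)
    sd = math.sqrt(2.0 / (9.0 * k))
    return normal_cdf(((x / k) ** (1.0 / 3.0) - mean) / sd)

k, x = 10, 18.31   # 18.31 is the tabulated 95% point for 10 degrees of freedom
exact = chi2_cdf_even(x, k)
print("exact          :", exact)
print("Fisher         :", fisher_cdf(x, k))
print("Wilson-Hilferty:", wilson_hilferty_cdf(x, k))
```

At this point the Wilson–Hilferty value is noticeably closer to the exact CDF than Fisher's approximation, consistent with its faster convergence.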

Relation to other distributions

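Figure: Approximate formula for median compared with numerical quantile (top). Difference between numerical quantile and approximate formula (bottom).
  • As $k \to \infty$, $(\chi^2_k - k)/\sqrt{2k} \xrightarrow{d} N(0, 1)$ (normal distribution)
  • $\chi^2_k \sim {\chi'}^2_k(0)$ (noncentral chi-squared distribution with non-centrality parameter $\lambda = 0$)
  • If $Y \sim F(\nu_1, \nu_2)$ then $X = \lim_{\nu_2 \to \infty} \nu_1 Y$ has the chi-squared distribution $\chi^2_{\nu_1}$
  • As a special case, if $Y \sim F(1, \nu_2)$ then $X = \lim_{\nu_2 \to \infty} Y$ has the chi-squared distribution $\chi^2_1$
  • $\|N_{i=1,\ldots,k}(0, 1)\|^2 \sim \chi^2_k$ (The squared norm of k standard normally distributed variables is a chi-squared distribution with k degrees of freedom)
  • If $X \sim \chi^2_\nu$ and $c > 0$, then $cX \sim \Gamma(k = \nu/2,\ \theta = 2c)$. (gamma distribution)
  • If $X \sim \chi^2_k$ then $\sqrt{X} \sim \chi_k$ (chi distribution)
  • If $X \sim \chi^2_2$, then $X \sim \operatorname{Exp}(1/2)$ is an exponential distribution. (See gamma distribution for more.)
  • If $X \sim \operatorname{Rayleigh}(1)$ (Rayleigh distribution) then $X^2 \sim \chi^2_2$
  • If $X \sim \operatorname{Maxwell}(1)$ (Maxwell distribution) then $X^2 \sim \chi^2_3$
  • If $X \sim \chi^2_\nu$ then $1/X \sim \operatorname{Inv-}\chi^2_\nu$ (Inverse-chi-squared distribution)
  • The chi-squared distribution is a special case of the type 3 Pearson distribution
  • If $X \sim \chi^2_{\nu_1}$ and $Y \sim \chi^2_{\nu_2}$ are independent then $\frac{X}{X+Y} \sim \operatorname{Beta}\!\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)$ (beta distribution)
  • If $X \sim U(0, 1)$ (uniform distribution) then $-2\ln X \sim \chi^2_2$
  • If $X_1, \ldots, X_n$ are independent with $X_i \sim \operatorname{Laplace}(\mu, \beta)$, then $\sum_{i=1}^{n} \frac{2|X_i - \mu|}{\beta} \sim \chi^2_{2n}$ (a transformation of the Laplace distribution)
  • If $X_1, \ldots, X_n$ are independent and each $X_i$ follows the generalized normal distribution (version 1) with parameters $\mu, \alpha, \beta$, then $\sum_{i=1}^{n} 2\left(\frac{|X_i - \mu|}{\alpha}\right)^{\beta} \sim \chi^2_{2n/\beta}$ [16]
  • The chi-squared distribution is a transformation of the Pareto distribution
  • Student's t-distribution is a transformation of the chi-squared distribution
  • Student's t-distribution can be obtained from the chi-squared distribution and the normal distribution
  • The noncentral beta distribution can be obtained as a transformation of the chi-squared distribution and the noncentral chi-squared distribution
  • The noncentral t-distribution can be obtained from the normal distribution and the chi-squared distribution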

A chi-squared variable with k degrees of freedom is defined as the sum of the squares of k independent standard normal random variables.

If Y is a k-dimensional Gaussian random vector with mean vector μ and rank k covariance matrix C, then $X = (Y - \mu)^{T} C^{-1} (Y - \mu)$ is chi-squared distributed with k degrees of freedom.

The sum of squares of statistically independent unit-variance Gaussian variables which do not have mean zero yields a generalization of the chi-squared distribution called the noncentral chi-squared distribution.

If Y is a vector of k i.i.d. standard normal random variables and A is a k×k symmetric, idempotent matrix with rank k − n, then the quadratic form $Y^{T} A Y$ is chi-squared distributed with k − n degrees of freedom.

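If $\Sigma$ is a $p \times p$ positive-semidefinite covariance matrix with strictly positive diagonal entries, then for $X \sim N(0, \Sigma)$ and $w$ a random $p$-vector independent of $X$ such that $w_1 + \cdots + w_p = 1$ and $w_i \geq 0$, $i = 1, \ldots, p$, it holds that

$$\frac{1}{\left(\frac{w_1}{X_1}, \ldots, \frac{w_p}{X_p}\right) \Sigma \left(\frac{w_1}{X_1}, \ldots, \frac{w_p}{X_p}\right)^{\top}} \sim \chi^2_1.$$

[14]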

The chi-squared distribution is also naturally related to other distributions arising from the Gaussian. In particular,

  • Y is F-distributed, Y ~ F(k1, k2), if $Y = \frac{X_1/k_1}{X_2/k_2}$, where X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent.
  • If X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent, then X1 + X2 ~ χ²(k1 + k2). If X1 and X2 are not independent, then X1 + X2 is in general not chi-squared distributed.

Generalizations

The chi-squared distribution is obtained as the sum of the squares of k independent, zero-mean, unit-variance Gaussian random variables. Generalizations of this distribution can be obtained by summing the squares of other types of Gaussian random variables. Several such distributions are described below.

Linear combination

If $X_1, \ldots, X_n$ are chi-squared random variables and $a_1, \ldots, a_n \in \mathbb{R}_{>0}$, then a closed expression for the distribution of $X = \sum_{i=1}^{n} a_i X_i$ is not known. It may, however, be approximated efficiently using the property of characteristic functions of chi-squared random variables.[17]

Chi-squared distributions

Noncentral chi-squared distribution

The noncentral chi-squared distribution is obtained from the sum of the squares of independent Gaussian random variables having unit variance and nonzero means.

Generalized chi-squared distribution

The generalized chi-squared distribution is obtained from the quadratic form z′Az where z is a zero-mean Gaussian vector having an arbitrary covariance matrix, and A is an arbitrary matrix.

Gamma, exponential, and related distributions

The chi-squared distribution $X \sim \chi^2_k$ is a special case of the gamma distribution, in that $X \sim \Gamma\!\left(\frac{k}{2}, \frac{1}{2}\right)$ using the rate parameterization of the gamma distribution (or $X \sim \Gamma\!\left(\frac{k}{2}, 2\right)$ using the scale parameterization of the gamma distribution), where k is an integer.

Because the exponential distribution is also a special case of the gamma distribution, we also have that if $X \sim \chi^2_2$, then $X \sim \operatorname{Exp}(1/2)$ is an exponential distribution.

The Erlang distribution is also a special case of the gamma distribution and thus we also have that if $X \sim \chi^2_k$ with even k, then X is Erlang distributed with shape parameter k/2 and rate parameter 1/2 (equivalently, scale parameter 2).
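These relationships can be illustrated by sampling. The sketch below relies on two standard-library generators: `random.gammavariate(alpha, beta)`, whose second argument is the scale, and `random.expovariate(lambd)`, whose argument is the rate. It compares the upper-tail frequency of each sample with the tabulated 5% points 11.07 (5 degrees of freedom) and 5.99 (2 degrees of freedom); seed, sample size and helper names are illustrative.

```python
import random

rng = random.Random(1)   # fixed seed for repeatability
n = 40_000

def tail_fraction(sample, threshold):
    """Fraction of the sample strictly above the threshold."""
    return sum(value > threshold for value in sample) / len(sample)

# chi-squared(5) built from its definition (sum of squared standard normals) ...
k = 5
from_normals = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n)]
# ... and drawn as a gamma variate with shape k/2 and scale 2.
from_gamma = [rng.gammavariate(k / 2.0, 2.0) for _ in range(n)]

# chi-squared(2) drawn as an exponential variate with rate 1/2.
from_exponential = [rng.expovariate(0.5) for _ in range(n)]

print(tail_fraction(from_normals, 11.07))      # near 0.05
print(tail_fraction(from_gamma, 11.07))        # near 0.05
print(tail_fraction(from_exponential, 5.99))   # near 0.05
```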

Occurrence and applications

The chi-squared distribution has numerous applications in inferential statistics, for instance in chi-squared tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by their respective degrees of freedom.

Following are some of the most common situations in which the chi-squared distribution arises from a Gaussian-distributed sample.

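  • If X1, ..., Xn are i.i.d. N(μ, σ²) random variables, then $\sum_{i=1}^{n} (X_i - \overline{X})^2 \sim \sigma^2 \chi^2_{n-1}$, where $\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
  • The box below shows some statistics based on $X_i \sim \operatorname{Normal}(\mu_i, \sigma_i^2)$, i = 1, ⋯, k, independent random variables that have probability distributions related to the chi-squared distribution:

| Name | Statistic |
|---|---|
| chi-squared distribution | $\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$ |
| noncentral chi-squared distribution | $\sum_{i=1}^{k} \left(\frac{X_i}{\sigma_i}\right)^2$ |
| chi distribution | $\sqrt{\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2}$ |
| noncentral chi distribution | $\sqrt{\sum_{i=1}^{k} \left(\frac{X_i}{\sigma_i}\right)^2}$ |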
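The first situation, where the sum of squared deviations from the sample mean divided by σ² follows a chi-squared distribution with n − 1 degrees of freedom, can be checked by simulation. The sketch below uses illustrative values of μ, σ and n and compares the simulated mean and variance with n − 1 and 2(n − 1).

```python
import random
import statistics

rng = random.Random(2)   # fixed seed for repeatability
mu, sigma, n = 3.0, 2.0, 6

def scaled_sum_of_squares():
    """Sum of squared deviations from the sample mean, divided by sigma^2."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    return sum((x - xbar) ** 2 for x in xs) / sigma ** 2

values = [scaled_sum_of_squares() for _ in range(20_000)]

# A chi-squared(n - 1) variable has mean n - 1 and variance 2 (n - 1).
print(statistics.fmean(values), n - 1)
print(statistics.variance(values), 2 * (n - 1))
```

Note that one degree of freedom is lost because the deviations are taken from the sample mean rather than from μ.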

The chi-squared distribution is also often encountered in magnetic resonance imaging.[18]

Table of χ2 values vs p-values

The p-value is the probability of observing a test statistic at least as extreme in a chi-squared distribution. Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value. A low p-value, below the chosen significance level, indicates statistical significance, i.e., sufficient evidence to reject the null hypothesis. A significance level of 0.05 is often used as the cutoff between significant and not-significant results.

The table below gives χ² values matching a number of p-values for the first 10 degrees of freedom.

χ² value[19] by degrees of freedom (df, rows) and p-value (probability, columns):

| df | 0.95 | 0.90 | 0.80 | 0.70 | 0.50 | 0.30 | 0.20 | 0.10 | 0.05 | 0.01 | 0.001 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.004 | 0.02 | 0.06 | 0.15 | 0.46 | 1.07 | 1.64 | 2.71 | 3.84 | 6.63 | 10.83 |
| 2 | 0.10 | 0.21 | 0.45 | 0.71 | 1.39 | 2.41 | 3.22 | 4.61 | 5.99 | 9.21 | 13.82 |
| 3 | 0.35 | 0.58 | 1.01 | 1.42 | 2.37 | 3.66 | 4.64 | 6.25 | 7.81 | 11.34 | 16.27 |
| 4 | 0.71 | 1.06 | 1.65 | 2.20 | 3.36 | 4.88 | 5.99 | 7.78 | 9.49 | 13.28 | 18.47 |
| 5 | 1.14 | 1.61 | 2.34 | 3.00 | 4.35 | 6.06 | 7.29 | 9.24 | 11.07 | 15.09 | 20.52 |
| 6 | 1.63 | 2.20 | 3.07 | 3.83 | 5.35 | 7.23 | 8.56 | 10.64 | 12.59 | 16.81 | 22.46 |
| 7 | 2.17 | 2.83 | 3.82 | 4.67 | 6.35 | 8.38 | 9.80 | 12.02 | 14.07 | 18.48 | 24.32 |
| 8 | 2.73 | 3.49 | 4.59 | 5.53 | 7.34 | 9.52 | 11.03 | 13.36 | 15.51 | 20.09 | 26.12 |
| 9 | 3.32 | 4.17 | 5.38 | 6.39 | 8.34 | 10.66 | 12.24 | 14.68 | 16.92 | 21.67 | 27.88 |
| 10 | 3.94 | 4.87 | 6.18 | 7.27 | 9.34 | 11.78 | 13.44 | 15.99 | 18.31 | 23.21 | 29.59 |

These values can be calculated by evaluating the quantile function (also known as “inverse CDF” or “ICDF”) of the chi-squared distribution;[20] e.g., for p = 0.05 and df = 7, the χ² ICDF evaluated at 1 − p = 0.95 yields 14.06714 ≈ 14.07, as in the table above.
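The p-value and quantile calculations can be sketched without a statistics package. The code below is illustrative only: it evaluates the CDF through the power series of the regularized lower incomplete gamma function (adequate for moderate arguments), takes the p-value as 1 − CDF, and inverts the CDF by bisection. The function names and tolerances are assumptions of this sketch, not a reference implementation.

```python
import math

def chi2_cdf(x, k, tol=1e-15, max_terms=10_000):
    """CDF P(k/2, x/2) via the series for the lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    s, t = k / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    for n in range(1, max_terms):
        term *= t / (s + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(s * math.log(t) - t - math.lgamma(s))

def chi2_p_value(x, k):
    """Upper-tail probability of a value at least as extreme as x."""
    return 1.0 - chi2_cdf(x, k)

def chi2_quantile(prob, k, tol=1e-10):
    """Inverse CDF by bisection, for 0 < prob < 1."""
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < prob:   # widen the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(chi2_quantile(0.95, 7))    # close to 14.067, the df = 7, p = 0.05 entry
print(chi2_p_value(14.07, 7))    # close to 0.05
```

Bisection is slow but robust; it only needs the CDF to be increasing, which makes it a reasonable choice for a short sketch.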

History and name

This distribution was first described by the German statistician Friedrich Robert Helmert in papers of 1875–6,[21][22] where he computed the sampling distribution of the sample variance of a normal population. Thus in German this was traditionally known as the Helmert'sche ("Helmertian") or "Helmert distribution".

The distribution was independently rediscovered by the English mathematician Karl Pearson in the context of goodness of fit, for which he developed his Pearson's chi-squared test, published in 1900, with a computed table of values published in (Elderton 1902), collected in (Pearson 1914, pp. xxxi–xxxiii, 26–28, Table XII). The name "chi-squared" ultimately derives from Pearson's shorthand for the exponent in a multivariate normal distribution with the Greek letter chi, writing −½χ² for what would appear in modern notation as −½xᵀΣ⁻¹x (Σ being the covariance matrix).[23] The idea of a family of "chi-squared distributions", however, is not due to Pearson but arose as a further development due to Fisher in the 1920s.[21]

References

  1. ^ M.A. Sanders. "Characteristic function of the central chi-squared distribution" (PDF). Archived from the original (PDF) on 2011-07-15. Retrieved 2009-03-06.
  2. ^ Abramowitz, Milton; Stegun, Irene Ann, eds. (1983) [June 1964]. "Chapter 26". Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Applied Mathematics Series. 55 (Ninth reprint with additional corrections of tenth original printing with corrections (December 1972); first ed.). Washington D.C.; New York: United States Department of Commerce, National Bureau of Standards; Dover Publications. p. 940. ISBN 978-0-486-61272-0. LCCN 64-60036. MR 0167642. LCCN 65-12253.
  3. ^ NIST (2006). Engineering Statistics Handbook – Chi-Squared Distribution
  4. ^ a b c Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). "Chi-Squared Distributions including Chi and Rayleigh". Continuous Univariate Distributions. 1 (Second ed.). John Wiley and Sons. pp. 415–493. ISBN 978-0-471-58495-7.
  5. ^ Mood, Alexander; Graybill, Franklin A.; Boes, Duane C. (1974). Introduction to the Theory of Statistics (Third ed.). McGraw-Hill. pp. 241–246. ISBN 978-0-07-042864-5.
  6. ^ Westfall, Peter H. (2013). Understanding Advanced Statistical Methods. Boca Raton, FL: CRC Press. ISBN 978-1-4665-1210-8.
  7. ^ Ramsey, PH (1988). "Evaluating the Normal Approximation to the Binomial Test". Journal of Educational Statistics. 13 (2): 173–82. doi:10.2307/1164752. JSTOR 1164752.
  8. ^ a b Lancaster, H.O. (1969), The Chi-squared Distribution, Wiley
  9. ^ Dasgupta, Sanjoy D. A.; Gupta, Anupam K. (January 2003). "An Elementary Proof of a Theorem of Johnson and Lindenstrauss" (PDF). Random Structures and Algorithms. 22 (1): 60–65. doi:10.1002/rsa.10073. Retrieved 2012-05-01.
  10. ^ Chi-squared distribution, from MathWorld, retrieved Feb. 11, 2009
  11. ^ M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN 978-0-387-34657-1
  12. ^ Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 118. ISBN 978-0471093152.
  13. ^ Bartlett, M. S.; Kendall, D. G. (1946). "The Statistical Analysis of Variance-Heterogeneity and the Logarithmic Transformation". Supplement to the Journal of the Royal Statistical Society. 8 (1): 128–138. doi:10.2307/2983618. JSTOR 2983618.
  14. ^ a b Pillai, Natesh S. (2016). "An unexpected encounter with Cauchy and Lévy". Annals of Statistics. 44 (5): 2089–2097. arXiv:1505.01957. doi:10.1214/15-aos1407.
  15. ^ Wilson, E. B.; Hilferty, M. M. (1931). "The distribution of chi-squared". Proc. Natl. Acad. Sci. USA. 17 (12): 684–688. Bibcode:1931PNAS...17..684W. doi:10.1073/pnas.17.12.684. PMC 1076144. PMID 16577411.
  16. ^ Bäckström, T.; Fischer, J. (January 2018). "Fast Randomization for Distributed Low-Bitrate Coding of Speech and Audio". IEEE/ACM Transaction on Audio, Speech, and Language Processing. 26 (1): 19–30. doi:10.1109/TASLP.2017.2757601.
  17. ^ Bausch, J. (2013). "On the Efficient Calculation of a Linear Combination of Chi-Square Random Variables with an Application in Counting String Vacua". J. Phys. A: Math. Theor. 46 (50): 505202. arXiv:1208.2691. Bibcode:2013JPhA...46X5202B. doi:10.1088/1751-8113/46/50/505202.
  18. ^ den Dekker, A. J.; Sijbers, J. (2014). "Data distributions in magnetic resonance images: a review". Physica Medica.
  19. ^ Chi-Squared Test Table B.2. Dr. Jacqueline S. McLaughlin at The Pennsylvania State University. In turn citing: R. A. Fisher and F. Yates, Statistical Tables for Biological Agricultural and Medical Research, 6th ed., Table IV. Two values have been corrected, 7.82 with 7.81 and 4.60 with 4.61
  20. ^ R Tutorial: Chi-squared Distribution
  21. ^ a b Hald 1998, pp. 633–692, 27. Sampling Distributions under Normality.
  22. ^ F. R. Helmert, "Ueber die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler und über einige damit im Zusammenhange stehende Fragen", Zeitschrift für Mathematik und Physik 21, 1876, pp. 102–219
  23. ^ R. L. Plackett, Karl Pearson and the Chi-Squared Test, International Statistical Review, 1983, 61f. See also Jeff Miller, Earliest Known Uses of Some of the Words of Mathematics.

Chi-squared

The term chi-squared or χ² has various uses in statistics:

Chi-squared test

A chi-squared test, also written as χ² test, is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, 'chi-squared test' is often used as short for Pearson's chi-squared test. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

In the standard applications of the test, the observations are classified into mutually exclusive classes, and there is some theory, or say null hypothesis, which gives the probability that any observation falls into the corresponding class. The purpose of the test is to evaluate how likely the observations that are made would be, assuming the null hypothesis is true.

Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem. A chi-squared test can be used to attempt rejection of the null hypothesis that the data are independent.

Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.

Chi distribution

In probability theory and statistics, the chi distribution is a continuous probability distribution. It is the distribution of the positive square root of the sum of squares of a set of independent random variables each following a standard normal distribution, or equivalently, the distribution of the Euclidean distance of the random variables from the origin. It is thus related to the chi-squared distribution by describing the distribution of the positive square roots of a variable obeying a chi-squared distribution.

The most familiar examples are the Rayleigh distribution (chi distribution with two degrees of freedom) and the Maxwell–Boltzmann distribution of the molecular speeds in an ideal gas (chi distribution with three degrees of freedom).

If $X_1, \ldots, X_k$ are k independent, normally distributed random variables with means $\mu_i$ and standard deviations $\sigma_i$, then the statistic

$$Y = \sqrt{\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2}$$

is distributed according to the chi distribution. Accordingly, dividing by the mean of the chi distribution (scaled by the square root of n − 1) yields the correction factor in the unbiased estimation of the standard deviation of the normal distribution. The chi distribution has one parameter: $k$, which specifies the number of degrees of freedom (i.e. the number of $X_i$).

Generalized chi-squared distribution

In probability theory and statistics, the specific name generalized chi-squared distribution (also generalized chi-square distribution) arises in relation to one particular family of variants of the chi-squared distribution. There are several other such variants for which the same term is sometimes used, or which clearly are generalizations of the chi-squared distribution, and which are treated elsewhere: some are special cases of the family discussed here, for example the noncentral chi-squared distribution and the gamma distribution, while the generalized gamma distribution is outside this family. The type of generalisation of the chi-squared distribution that is discussed here is of importance because it arises in the context of the distribution of statistical estimates in cases where the usual statistical theory does not hold. For example, if a predictive model is fitted by least squares but the model errors have either autocorrelation or heteroscedasticity, then a statistical analysis of alternative model structures can be undertaken by relating changes in the sum of squares to an asymptotically valid generalized chi-squared distribution. More specifically, the distribution can be defined in terms of a quadratic form derived from a multivariate normal distribution.

Generalized integer gamma distribution

In probability and statistics, the generalized integer gamma distribution (GIG) is the distribution of the sum of independent gamma distributed random variables, all with integer shape parameters and different rate parameters. This is a special case of the generalized chi-squared distribution. A related concept is the generalized near-integer gamma distribution (GNIG).

Goodness of fit

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-squared test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.

Inverse-chi-squared distribution

In probability and statistics, the inverse-chi-squared distribution (or inverted-chi-square distribution) is a continuous probability distribution of a positive-valued random variable. It is closely related to the chi-squared distribution and its specific importance is that it arises in the application of Bayesian inference to the normal distribution, where it can be used as the prior and posterior distribution for an unknown variance.

Inverse-gamma distribution

In probability theory and statistics, the inverse gamma distribution is a two-parameter family of continuous probability distributions on the positive real line, which is the distribution of the reciprocal of a variable distributed according to the gamma distribution. Perhaps the chief use of the inverse gamma distribution is in Bayesian statistics, where the distribution arises as the marginal posterior distribution for the unknown variance of a normal distribution, if an uninformative prior is used, and as an analytically tractable conjugate prior, if an informative prior is required. However, it is common among Bayesians to consider an alternative parametrization of the normal distribution in terms of the precision, defined as the reciprocal of the variance, which allows the gamma distribution to be used directly as a conjugate prior. Other Bayesians prefer to parametrize the inverse gamma distribution differently, as a scaled inverse chi-squared distribution.

Jarque–Bera test

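In statistics, the Jarque–Bera test is a goodness-of-fit test of whether sample data have the skewness and kurtosis matching a normal distribution. The test is named after Carlos Jarque and Anil K. Bera. The test statistic JB is defined as

$$\mathit{JB} = \frac{n - k + 1}{6}\left(S^2 + \frac{1}{4}(C - 3)^2\right),$$

where n is the number of observations (or degrees of freedom in general); S is the sample skewness, C is the sample kurtosis, and k is the number of regressors (being 1 outside a regression context):

$$S = \frac{\hat{\mu}_3}{\hat{\sigma}^3} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}, \qquad C = \frac{\hat{\mu}_4}{\hat{\sigma}^4} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{2}},$$

where $\hat{\mu}_3$ and $\hat{\mu}_4$ are the estimates of third and fourth central moments, respectively, $\bar{x}$ is the sample mean, and $\hat{\sigma}^2$ is the estimate of the second central moment, the variance.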

If the data comes from a normal distribution, the JB statistic asymptotically has a chi-squared distribution with two degrees of freedom, so the statistic can be used to test the hypothesis that the data are from a normal distribution. The null hypothesis is a joint hypothesis of the skewness being zero and the excess kurtosis being zero. Samples from a normal distribution have an expected skewness of 0 and an expected excess kurtosis of 0 (which is the same as a kurtosis of 3). As the definition of JB shows, any deviation from this increases the JB statistic.
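A minimal sketch of the calculation follows, assuming the usual form JB = (n/6)(S² + (C − 3)²/4) outside a regression context (k = 1), with S and C computed from the 1/n central-moment estimates. Because the reference distribution has two degrees of freedom, its upper-tail probability has the closed form exp(−JB/2), so no special functions are needed. The function name and the toy data are illustrative, and, as the article notes, the asymptotic p-value is unreliable for a sample this small.

```python
import math

def jarque_bera(data):
    """Return (JB, asymptotic p-value) for a sequence of numbers, with k = 1."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Upper tail of chi-squared with 2 degrees of freedom: exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

jb, p_value = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])
print(jb, p_value)
```

For the toy data the skewness is 0 and the kurtosis is 1.7, so the statistic comes entirely from the kurtosis term.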

For small samples the chi-squared approximation is overly sensitive, often rejecting the null hypothesis when it is true. Furthermore, the distribution of p-values departs from a uniform distribution and becomes a right-skewed uni-modal distribution, especially for small p-values. This leads to a large Type I error rate. The table below shows some p-values approximated by a chi-squared distribution that differ from their true alpha levels for small samples.

Calculated p-value equivalents to true alpha levels at given sample sizes

True α level    n = 20    n = 30    n = 50    n = 70    n = 100
0.1             0.307     0.252     0.201     0.183     0.1560
0.05            0.1461    0.109     0.079     0.067     0.062
0.025           0.051     0.0303    0.020     0.016     0.0168
0.01            0.0064    0.0033    0.0015    0.0012    0.002

(These values have been approximated using Monte Carlo simulation in MATLAB.)

In MATLAB's implementation, the chi-squared approximation for the JB statistic's distribution is only used for large sample sizes (> 2000). For smaller samples, it uses a table derived from Monte Carlo simulations in order to interpolate p-values.
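The small-sample behaviour discussed above can be explored by simulation. The following is a rough sketch, not the procedure used to produce the table: it draws normal samples, applies the chi-squared approximation with two degrees of freedom (critical value −2 ln α), and reports how often the null hypothesis is rejected. The function names, the seed, and the number of trials are arbitrary choices for illustration.

```python
import math
import random


def jb_statistic(xs):
    """Jarque-Bera statistic for a plain sample (leading factor n / 6)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + 0.25 * (kurt - 3) ** 2)


def rejection_rate(n, alpha, trials=2000, seed=0):
    """Fraction of normal samples of size n rejected at nominal level alpha."""
    rng = random.Random(seed)
    # Upper-alpha quantile of the chi-squared distribution with 2 df.
    critical = -2 * math.log(alpha)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if jb_statistic(sample) > critical:
            rejections += 1
    return rejections / trials


for size in (20, 50, 100):
    print(size, rejection_rate(size, 0.05))
```

Comparing the printed rates with the nominal level shows how far the asymptotic approximation is from the actual rejection rate at each sample size; more trials give a more stable estimate.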

List of probability distributions

Many probability distributions that are important in theory or applications have been given specific names.

Noncentral F-distribution

In probability theory and statistics, the noncentral F-distribution is a continuous probability distribution that is a generalization of the (ordinary) F-distribution. It describes the distribution of the quotient (X/n1)/(Y/n2), where the numerator X has a noncentral chi-squared distribution with n1 degrees of freedom and the denominator Y has a central chi-squared distribution with n2 degrees of freedom. It is also required that X and Y are statistically independent of each other.

It is the distribution of the test statistic in analysis of variance problems when the null hypothesis is false. The noncentral F-distribution is used to find the power function of such a test.
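The quotient definition can be simulated directly. This is a sketch under stated assumptions: the noncentrality parameter λ is taken to be the sum of the squared means of the unit-variance normal variables in the numerator, and the helper names are hypothetical. The sample mean is compared with the known mean n2(n1 + λ) / (n1(n2 − 2)), which holds for n2 > 2.

```python
import random


def noncentral_chi2(rng, df, lam):
    """Sum of df squared unit-variance normals whose squared means sum to lam."""
    # All of the noncentrality is placed on the first component; only the
    # sum of squared means affects the distribution.
    total = rng.gauss(lam ** 0.5, 1.0) ** 2
    total += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))
    return total


def noncentral_f(rng, n1, n2, lam):
    """One draw of (X / n1) / (Y / n2) with X noncentral and Y central."""
    x = noncentral_chi2(rng, n1, lam)
    y = noncentral_chi2(rng, n2, 0.0)  # lam = 0 gives the central case
    return (x / n1) / (y / n2)


rng = random.Random(42)
n1, n2, lam = 3, 20, 4.0
draws = [noncentral_f(rng, n1, n2, lam) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
theoretical_mean = n2 * (n1 + lam) / (n1 * (n2 - 2))
print(sample_mean, theoretical_mean)
```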

Noncentral chi-squared distribution

In probability theory and statistics, the noncentral chi-squared distribution (or noncentral χ² distribution) is a generalization of the chi-squared distribution. This distribution often arises in the power analysis of statistical tests in which the null distribution is (perhaps asymptotically) a chi-squared distribution; important examples of such tests are the likelihood-ratio tests.
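A power calculation of this kind can be sketched without external libraries for a test whose null distribution is chi-squared with two degrees of freedom. The sketch assumes the standard representation of the noncentral chi-squared distribution with k degrees of freedom and noncentrality λ as a Poisson(λ/2) mixture of central chi-squared distributions with k + 2j degrees of freedom, and uses the closed-form survival function available for even degrees of freedom. All function names are hypothetical, and the series is truncated at a fixed number of terms, which is adequate only for moderate λ.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for a central chi-squared variable with even df >= 2."""
    if df < 2 or df % 2:
        raise ValueError("df must be an even integer >= 2")
    half = x / 2
    term, total = 1.0, 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def noncentral_chi2_sf(x, df, lam, terms=200):
    """P(X > x) for a noncentral chi-squared variable (even df, truncated series)."""
    weight = math.exp(-lam / 2)  # Poisson(lam / 2) probability of j = 0
    total = 0.0
    for j in range(terms):
        total += weight * chi2_sf_even(x, df + 2 * j)
        weight *= (lam / 2) / (j + 1)
    return total


def power_two_df(alpha, lam):
    """Power of a level-alpha test with a chi-squared(2) null distribution."""
    critical = -2 * math.log(alpha)  # upper-alpha quantile for 2 df
    return noncentral_chi2_sf(critical, 2, lam)


for lam in (0.0, 1.0, 5.0, 10.0):
    print(lam, power_two_df(0.05, lam))
```

With λ = 0 the power reduces to the significance level, which is a quick sanity check on the mixture formula.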

Noncentrality parameter

Noncentrality parameters are parameters of families of probability distributions that are related to other "central" families of distributions. Whereas the central distribution describes how a test statistic is distributed when the difference tested is null, noncentral distributions describe the distribution of a test statistic when the null is false (so the alternative hypothesis is true). This leads to their use in calculating statistical power.

If the noncentrality parameter of a distribution is zero, the distribution is identical to a distribution in the central family. For example, the Student's t-distribution is the central family of distributions for the noncentral t-distribution family.

Noncentrality parameters are used in the following distributions:

Noncentral t-distribution

Noncentral chi-squared distribution

Noncentral chi-distribution

Noncentral F-distribution

Noncentral beta distribution

In general, noncentrality parameters occur in distributions that are transformations of a normal distribution. The "central" versions are derived from normal distributions that have a mean of zero; the noncentral versions generalize to arbitrary means. For example, the standard (central) chi-squared distribution is the distribution of a sum of squares of independent standard normal random variables, i.e., normal variables with mean 0 and variance 1. The noncentral chi-squared distribution generalizes this to normal variables with arbitrary means and unit variance.

Each of these distributions has a single noncentrality parameter. However, there are extended versions of these distributions which have two noncentrality parameters: the doubly noncentral beta distribution, the doubly noncentral F distribution and the doubly noncentral t distribution. These types of distributions occur for distributions that are defined as the quotient of two independent distributions. When both source distributions are central (either with a zero mean or a zero noncentrality parameter, depending on the type of distribution), the result is a central distribution. When one is noncentral, a (singly) noncentral distribution results, while if both are noncentral, the result is a doubly noncentral distribution. As an example, a t-distribution is defined (ignoring constant values) as the quotient of a normal distribution and the square root of an independent chi-squared distribution. Extending this definition to encompass a normal distribution with arbitrary mean produces a noncentral t-distribution, while further extending it to allow a noncentral chi-squared distribution in the denominator produces a doubly noncentral t-distribution.

Note also that there are some "noncentral distributions" that are not usually formulated in terms of a "noncentrality parameter": see noncentral hypergeometric distributions, for example.

The noncentrality parameter of the t-distribution may be negative or positive, while the noncentrality parameters of the other distributions listed above cannot be negative.

Pearson's chi-squared test

Pearson's chi-squared test (χ²) is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance. It is suitable for unpaired data from large samples. It is the most widely used of many chi-squared tests (e.g., Yates, likelihood ratio, portmanteau test in time series, etc.) – statistical procedures whose results are evaluated by reference to the chi-squared distribution. Its properties were first investigated by Karl Pearson in 1900. In contexts where it is important to make a distinction between the test statistic and its distribution, names similar to Pearson χ-squared test or statistic are used.

It tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. The events considered must be mutually exclusive and have total probability 1. A common case for this is where the events each cover an outcome of a categorical variable.

A simple example is the hypothesis that an ordinary six-sided die is "fair" (i.e., all six outcomes are equally likely to occur).
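The die example can be worked through numerically. The sketch below assumes the usual Pearson statistic, the sum over categories of (observed − expected)² / expected, compared against a chi-squared distribution with 6 − 1 = 5 degrees of freedom. The tallies are invented for illustration, and 11.070 is the standard tabulated 5% critical value for 5 degrees of freedom.

```python
def pearson_chi2(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical tallies for faces 1..6 from 60 rolls (illustrative data only).
counts = [5, 8, 9, 8, 10, 20]
expected = [sum(counts) / 6] * 6  # a fair die gives 10 per face

statistic = pearson_chi2(counts, expected)

# Tabulated upper 5% point of chi-squared with 5 degrees of freedom.
CRITICAL_5PCT_DF5 = 11.070
reject_fairness = statistic > CRITICAL_5PCT_DF5
print(statistic, reject_fairness)
```

For these made-up counts the statistic exceeds the critical value, so the hypothesis of a fair die would be rejected at the 5% level.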

Proofs related to chi-squared distribution

The following are proofs of several characteristics related to the chi-squared distribution.

Scaled inverse chi-squared distribution

The scaled inverse chi-squared distribution is the distribution for x = 1/s², where s² is a sample mean of the squares of ν independent normal random variables that have mean 0 and inverse variance 1/σ² = τ². The distribution is therefore parametrised by the two quantities ν and τ², referred to as the number of chi-squared degrees of freedom and the scaling parameter, respectively.

This family of scaled inverse chi-squared distributions is closely related to two other distribution families, those of the inverse-chi-squared distribution and the inverse-gamma distribution. Compared to the inverse-chi-squared distribution, the scaled distribution has an extra parameter τ², which scales the distribution horizontally and vertically, representing the inverse-variance of the original underlying process. Also, the scaled inverse chi-squared distribution is presented as the distribution for the inverse of the mean of ν squared deviates, rather than the inverse of their sum. The two distributions thus have the relation that if

$$X \sim \text{Scale-inv-}\chi^2(\nu, \tau^2) \quad \text{then} \quad \frac{X}{\tau^2 \nu} \sim \text{inv-}\chi^2(\nu)$$

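Compared to the inverse gamma distribution, the scaled inverse chi-squared distribution describes the same data distribution, but using a different parametrization, which may be more convenient in some circumstances. Specifically, if

$$X \sim \text{Scale-inv-}\chi^2(\nu, \tau^2) \quad \text{then} \quad X \sim \text{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu \tau^2}{2}\right)$$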
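The equivalence with the inverse gamma distribution can be checked by simulation. This sketch assumes the standard correspondence Scale-inv-χ²(ν, τ²) = Inv-Gamma(ν/2, ντ²/2), with the second inverse-gamma parameter a scale, and the known mean ντ²/(ν − 2) for ν > 2. The function names are hypothetical. One sampler follows the definition in terms of 1/s², the other draws the reciprocal of a gamma variable.

```python
import random


def draw_from_definition(rng, nu, tau2):
    """1 / s^2, where s^2 is the mean square of nu normals with variance 1 / tau2."""
    sd = (1.0 / tau2) ** 0.5
    s2 = sum(rng.gauss(0.0, sd) ** 2 for _ in range(nu)) / nu
    return 1.0 / s2


def draw_as_inverse_gamma(rng, nu, tau2):
    """Reciprocal of a Gamma(shape = nu / 2, rate = nu * tau2 / 2) draw."""
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    return 1.0 / rng.gammavariate(nu / 2, 2.0 / (nu * tau2))


rng = random.Random(7)
nu, tau2, size = 10, 2.0, 20000
mean_definition = sum(draw_from_definition(rng, nu, tau2) for _ in range(size)) / size
mean_inv_gamma = sum(draw_as_inverse_gamma(rng, nu, tau2) for _ in range(size)) / size
theoretical_mean = nu * tau2 / (nu - 2)
print(mean_definition, mean_inv_gamma, theoretical_mean)
```

Agreement of the two sample means with the theoretical value is only a coarse check; comparing quantiles of the two samples would be a stricter one.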

Either form may be used to represent the maximum entropy distribution for a fixed first inverse moment $E(1/X)$ and first logarithmic moment $E(\ln X)$.

The scaled inverse chi-squared distribution also has a particular use in Bayesian statistics, somewhat unrelated to its use as a predictive distribution for x = 1/s². Specifically, the scaled inverse chi-squared distribution can be used as a conjugate prior for the variance parameter of a normal distribution. In this context the scaling parameter is denoted by σ₀² rather than by τ², and has a different interpretation. The application has been more usually presented using the inverse-gamma distribution formulation instead; however, some authors, following in particular Gelman et al. (1995/2004), argue that the inverse chi-squared parametrisation is more intuitive.

Variance

In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers is spread out from its average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by σ², s², or Var(X).
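As a small worked example, the variance of a single roll of a fair six-sided die can be computed exactly from the definition. The sketch uses exact fractions and also checks the equivalent identity Var(X) = E[X²] − (E[X])²; the variable names are arbitrary.

```python
from fractions import Fraction

outcomes = [Fraction(face) for face in range(1, 7)]
prob = Fraction(1, 6)  # each face is equally likely

mean = sum(prob * x for x in outcomes)
# Expectation of the squared deviation from the mean.
variance = sum(prob * (x - mean) ** 2 for x in outcomes)
# Equivalent form: second raw moment minus the squared mean.
second_moment = sum(prob * x ** 2 for x in outcomes)

print(mean, variance, second_moment - mean ** 2)
```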

Welch–Satterthwaite equation

In statistics and uncertainty analysis, the Welch–Satterthwaite equation is used to calculate an approximation to the effective degrees of freedom of a linear combination of independent sample variances, also known as the pooled degrees of freedom, corresponding to the pooled variance.

For n sample variances $s_i^2$ (i = 1, ..., n), each respectively having $\nu_i$ degrees of freedom, often one computes the linear combination

$$\chi' = \sum_{i=1}^{n} k_i s_i^2$$

where $k_i$ is a real positive number, typically $k_i = \frac{1}{\nu_i + 1}$. In general, the probability distribution of χ' cannot be expressed analytically. However, its distribution can be approximated by another chi-squared distribution, whose effective degrees of freedom are given by the Welch–Satterthwaite equation

$$\nu_{\chi'} \approx \frac{\left(\sum_{i=1}^{n} k_i s_i^2\right)^2}{\sum_{i=1}^{n} \frac{(k_i s_i^2)^2}{\nu_i}}$$

There is no assumption that the underlying population variances $\sigma_i^2$ are equal. This is known as the Behrens–Fisher problem.

The result can be used to perform approximate statistical inference tests. The simplest application of this equation is in performing Welch's t-test.
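The effective degrees of freedom are straightforward to compute. This is a minimal sketch assuming the usual form of the equation, (Σ kᵢsᵢ²)² / Σ((kᵢsᵢ²)² / νᵢ), with default weights kᵢ = 1/(νᵢ + 1); the function name and the example numbers are hypothetical.

```python
def welch_satterthwaite(variances, dofs, weights=None):
    """Approximate effective degrees of freedom of sum(k_i * s_i^2)."""
    if weights is None:
        # Typical choice: k_i = 1 / (nu_i + 1), i.e. 1 / n_i for a sample of size n_i.
        weights = [1.0 / (nu + 1) for nu in dofs]
    terms = [k * s2 for k, s2 in zip(weights, variances)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / nu for t, nu in zip(terms, dofs))
    return numerator / denominator


# Two samples of size 10 (9 degrees of freedom each).
print(welch_satterthwaite([4.0, 4.0], [9, 9]))
print(welch_satterthwaite([1.0, 100.0], [9, 9]))
```

With equal variances and equal sample sizes the result equals the pooled value ν₁ + ν₂; as the variances become more unequal it moves toward the degrees of freedom of the dominant term.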

Zero degrees of freedom

In statistics, the non-central chi-squared distribution with zero degrees of freedom can be used in testing the null hypothesis that a sample is from a uniform distribution on the interval (0, 1). This distribution was introduced by Andrew F. Siegel in 1979.

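The chi-squared distribution with n degrees of freedom is the probability distribution of the sum

$$X_1^2 + \cdots + X_n^2$$

where

$$X_1, \ldots, X_n \sim \text{i.i.d. } N(0, 1).$$

However, if

$$X_k \sim N(\mu_k, 1)$$

and $X_1, \ldots, X_n$ are independent, then the sum of squares above has a non-central chi-squared distribution with n degrees of freedom and "noncentrality parameter"

$$\mu_1^2 + \cdots + \mu_n^2.$$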

It is trivial that a "central" chi-square distribution with zero degrees of freedom concentrates all probability at zero.

All of this leaves open the question of what happens with zero degrees of freedom when the noncentrality parameter is not zero.

The noncentral chi-squared distribution with zero degrees of freedom and with noncentrality parameter μ is the distribution of

$$X \sim \chi^2_{2K} \quad \text{with} \quad K \sim \text{Poisson}\left(\tfrac{\mu}{2}\right).$$

This concentrates probability $e^{-\mu/2}$ at zero; thus it is a mixture of discrete and continuous distributions.
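The mixture can be simulated in a few lines. The sketch assumes the representation X ~ χ² with 2K degrees of freedom, K ~ Poisson(μ/2), where K = 0 is read as a point mass at zero, so the probability at zero is e^(−μ/2). The Poisson sampler is Knuth's multiplication method, and the function names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def noncentral_chi2_zero_df(rng, mu):
    """One draw from the noncentral chi-squared with zero degrees of freedom."""
    k = poisson(rng, mu / 2)
    if k == 0:
        return 0.0  # discrete part: point mass at zero
    # Chi-squared with 2k degrees of freedom is Gamma(shape = k, scale = 2).
    return rng.gammavariate(k, 2.0)


rng = random.Random(3)
mu, size = 2.0, 20000
draws = [noncentral_chi2_zero_df(rng, mu) for _ in range(size)]
zero_fraction = sum(1 for d in draws if d == 0.0) / size
sample_mean = sum(draws) / size
print(zero_fraction, math.exp(-mu / 2), sample_mean)
```

The fraction of exact zeros should be close to e^(−μ/2), and the sample mean close to μ, since the mean of a noncentral chi-squared variable is its degrees of freedom plus its noncentrality parameter.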

This page is based on a Wikipedia article.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.