Covariance

In probability theory and statistics, covariance is a measure of the joint variability of two random variables.[1] If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, (i.e., the variables tend to show similar behavior), the covariance is positive.[2] In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, (i.e., the variables tend to show opposite behavior), the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.

A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which in addition to serving as a descriptor of the sample, also serves as an estimated value of the population parameter.

Definition

The covariance between two jointly distributed real-valued random variables X and Y with finite second moments is defined as the expected product of their deviations from their individual expected values:[3][4]:p. 119

cov(X, Y) = E[(X − E[X]) (Y − E[Y])]        (Eq.1)

where E[X] is the expected value of X, also known as the mean of X. The covariance is also sometimes denoted σ_XY or σ(X, Y), in analogy to variance. By using the linearity property of expectations, this can be simplified to the expected value of their product minus the product of their expected values:

cov(X, Y) = E[XY] − E[X] E[Y],

but this equation is susceptible to catastrophic cancellation (see the section on numerical computation below).
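
As an illustration of the two equivalent formulas, the following sketch estimates both from simulated data (a minimal example using NumPy; the variable names and the simulated distribution are choices made here for illustration only):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.0, size=100_000)
    y = 0.5 * x + rng.normal(scale=0.5, size=100_000)   # y is positively related to x

    # Definition: expected product of deviations from the means
    cov_def = np.mean((x - x.mean()) * (y - y.mean()))

    # Shortcut: E[XY] - E[X]E[Y]
    cov_short = np.mean(x * y) - x.mean() * y.mean()

    print(cov_def, cov_short)   # both close to the true covariance 0.5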

The units of measurement of the covariance cov(X, Y) are those of X times those of Y. By contrast, correlation coefficients, which depend on the covariance, are a dimensionless measure of linear dependence. (In fact, correlation coefficients can simply be understood as a normalized version of covariance.)

Definition for complex random variables

The covariance between two complex random variables Z and W is defined as[4]:p. 119

cov(Z, W) = E[(Z − E[Z]) (W − E[W])*] = E[Z W*] − E[Z] E[W*].

Notice the complex conjugation (denoted by *) of the second factor in the definition.

Discrete random variables

If the random variable pair (X, Y) can take on the values (x_i, y_i) for i = 1, ..., n, with equal probabilities p_i = 1/n, then the covariance can be equivalently written in terms of the means E[X] and E[Y] as

cov(X, Y) = (1/n) Σ_{i=1}^{n} (x_i − E[X]) (y_i − E[Y]).

It can also be equivalently expressed, without directly referring to the means, as[5]

cov(X, Y) = (1/n²) Σ_{i=1}^{n} Σ_{j>i} (x_i − x_j) (y_i − y_j) = (1/(2n²)) Σ_{i=1}^{n} Σ_{j=1}^{n} (x_i − x_j) (y_i − y_j).

More generally, if there are n possible realizations of (X, Y), namely (x_i, y_i) for i = 1, ..., n, but with possibly unequal probabilities p_i, then the covariance is

cov(X, Y) = Σ_{i=1}^{n} p_i (x_i − E[X]) (y_i − E[Y]).

Example

Suppose that X and Y have the following joint probability mass function,[6] in which the six central cells give the discrete joint probabilities f(x, y) of the six hypothetical realizations (x, y) = (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3):

                 y
  f(x, y)     1      2      3     f_X(x)
  x    1     1/4    1/4     0      1/2
       2      0     1/4    1/4     1/2
  f_Y(y)     1/4    1/2    1/4      1

X can take on two values (1 and 2) while Y can take on three (1, 2, and 3). Their means are μ_X = 3/2 and μ_Y = 2. The population standard deviations of X and Y are σ_X = 1/2 and σ_Y = 1/√2. Then:

  cov(X, Y) = σ_XY = Σ_{(x,y)} f(x, y) (x − μ_X)(y − μ_Y)
            = (1/4)(1 − 3/2)(1 − 2) + (1/4)(1 − 3/2)(2 − 2) + 0 · (1 − 3/2)(3 − 2)
              + 0 · (2 − 3/2)(1 − 2) + (1/4)(2 − 3/2)(2 − 2) + (1/4)(2 − 3/2)(3 − 2)
            = 1/4.
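
The same numbers can be checked mechanically. The sketch below (an illustrative script using NumPy; the array layout simply mirrors the table above) computes the means, standard deviations, and covariance directly from the joint probability mass function:

    import numpy as np

    # Joint pmf f(x, y): rows are x = 1, 2; columns are y = 1, 2, 3
    f = np.array([[0.25, 0.25, 0.00],
                  [0.00, 0.25, 0.25]])
    x_vals = np.array([1.0, 2.0])
    y_vals = np.array([1.0, 2.0, 3.0])

    mu_x = np.sum(f.sum(axis=1) * x_vals)          # 1.5
    mu_y = np.sum(f.sum(axis=0) * y_vals)          # 2.0
    X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")
    cov_xy = np.sum(f * (X - mu_x) * (Y - mu_y))   # 0.25

    sigma_x = np.sqrt(np.sum(f.sum(axis=1) * (x_vals - mu_x) ** 2))  # 0.5
    sigma_y = np.sqrt(np.sum(f.sum(axis=0) * (y_vals - mu_y) ** 2))  # ~0.707
    print(mu_x, mu_y, sigma_x, sigma_y, cov_xy)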

Properties

Covariance with itself

The variance is a special case of the covariance in which the two variables are identical (that is, in which one variable always takes the same value as the other):[4]:p. 121

cov(X, X) = var(X) = σ²(X) = σ_X².

Covariance of linear combinations

If X, Y, W, and V are real-valued random variables and a, b, c, d are real-valued constants, then the following facts are a consequence of the definition of covariance:

cov(X, a) = 0
cov(X, X) = var(X)
cov(X, Y) = cov(Y, X)
cov(aX, bY) = ab cov(X, Y)
cov(X + a, Y + b) = cov(X, Y)
cov(aX + bY, cW + dV) = ac cov(X, W) + ad cov(X, V) + bc cov(Y, W) + bd cov(Y, V)

For a sequence X_1, ..., X_n of real-valued random variables and constants a_1, ..., a_n, we have

var(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i² var(X_i) + 2 Σ_{i<j} a_i a_j cov(X_i, X_j) = Σ_{i,j} a_i a_j cov(X_i, X_j).
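
These identities can be spot-checked numerically. The following sketch (illustrative only; the constants and the simulated distributions are arbitrary choices) compares var(aX + bY) against a² var(X) + b² var(Y) + 2ab cov(X, Y) on simulated data:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    x = rng.normal(size=n)
    y = 0.3 * x + rng.normal(size=n)      # correlated with x
    a, b = 2.0, -1.5

    lhs = np.var(a * x + b * y)
    rhs = a**2 * np.var(x) + b**2 * np.var(y) \
          + 2 * a * b * np.cov(x, y, bias=True)[0, 1]
    print(lhs, rhs)                       # the two agree up to sampling noise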

Hoeffding's Covariance Identity

A useful identity to compute the covariance between two random variables X and Y is Hoeffding's covariance identity:[7]

cov(X, Y) = ∫∫ ( F_{(X,Y)}(x, y) − F_X(x) F_Y(y) ) dx dy,

where F_{(X,Y)}(x, y) is the joint cumulative distribution function of the random vector (X, Y) and F_X(x), F_Y(y) are the marginals.

Uncorrelatedness and independence

Random variables whose covariance is zero are called uncorrelated[4]:p. 121. Similarly, the components of random vectors whose covariance matrix is zero in every entry outside the main diagonal are also called uncorrelated.

If X and Y are independent random variables, then their covariance is zero.[4]:p. 123[8] This follows because under independence,

E[XY] = E[X] E[Y],

so cov(X, Y) = E[XY] − E[X] E[Y] = 0.

The converse, however, is not generally true. For example, let X be uniformly distributed in [−1, 1] and let Y = X². Clearly, X and Y are not independent, but

cov(X, Y) = cov(X, X²) = E[X · X²] − E[X] E[X²] = E[X³] − E[X] E[X²] = 0 − 0 · E[X²] = 0.

In this case, the relationship between Y and X is non-linear, while correlation and covariance are measures of linear dependence between two random variables. This example shows that if two random variables are uncorrelated, that does not in general imply that they are independent. However, if two variables are jointly normally distributed (but not if they are merely individually normally distributed), uncorrelatedness does imply independence.
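
A quick simulation illustrates this counterexample (a sketch using NumPy; the sample size and seed are arbitrary): the estimated covariance of X and Y = X² is close to zero even though Y is a deterministic function of X.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(-1.0, 1.0, size=500_000)
    y = x ** 2                           # fully dependent on x, but not linearly

    print(np.cov(x, y)[0, 1])            # approximately 0: uncorrelated
    print(np.corrcoef(x, y)[0, 1])       # correlation also approximately 0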

Relationship to inner products

Many of the properties of covariance can be extracted elegantly by observing that it satisfies similar properties to those of an inner product:

  1. bilinear: for constants a and b and random variables X, Y, Z, cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z);
  2. symmetric: cov(X, Y) = cov(Y, X);
  3. positive semi-definite: σ²(X) = cov(X, X) ≥ 0 for all random variables X, and cov(X, X) = 0 implies that X is a constant random variable (almost surely).

In fact these properties imply that the covariance defines an inner product over the quotient vector space obtained by taking the subspace of random variables with finite second moment and identifying any two that differ by a constant. (This identification turns the positive semi-definiteness above into positive definiteness.) That quotient vector space is isomorphic to the subspace of random variables with finite second moment and mean zero; on that subspace, the covariance is exactly the L2 inner product of real-valued functions on the sample space.

As a result, for random variables with finite variance, the inequality

|cov(X, Y)| ≤ √(σ²(X) σ²(Y))

holds via the Cauchy–Schwarz inequality.

Proof: If σ²(Y) = 0, then it holds trivially. Otherwise, let the random variable

Z = X − (cov(X, Y) / σ²(Y)) Y.

Then we have

0 ≤ σ²(Z) = σ²(X) − 2 (cov(X, Y) / σ²(Y)) cov(X, Y) + (cov(X, Y) / σ²(Y))² σ²(Y) = σ²(X) − cov(X, Y)² / σ²(Y),

so cov(X, Y)² ≤ σ²(X) σ²(Y), which is the claimed inequality.

Calculating the sample covariance

The sample covariances among K variables based on N observations of each, drawn from an otherwise unobserved population, are given by the K × K matrix q = [q_{jk}] with the entries

q_{jk} = (1/(N − 1)) Σ_{i=1}^{N} (X_{ij} − X̄_j)(X_{ik} − X̄_k),

which is an estimate of the covariance between variable j and variable k.

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector X, a vector whose jth element (j = 1, ..., K) is one of the random variables. The reason the sample covariance matrix has N − 1 in the denominator rather than N is essentially that the population mean E(X) is not known and is replaced by the sample mean X̄. If the population mean E(X) is known, the analogous unbiased estimate is given by

q_{jk} = (1/N) Σ_{i=1}^{N} (X_{ij} − E(X_j))(X_{ik} − E(X_k)).
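
In practice the sample covariance matrix is usually computed with a library routine. The sketch below (illustrative; the data are simulated) compares NumPy's np.cov, which uses the N − 1 denominator by default, with the explicit formula:

    import numpy as np

    rng = np.random.default_rng(3)
    data = rng.normal(size=(500, 3))        # N = 500 observations of K = 3 variables

    q_lib = np.cov(data, rowvar=False)      # K x K matrix, denominator N - 1

    centered = data - data.mean(axis=0)
    q_manual = centered.T @ centered / (data.shape[0] - 1)

    print(np.allclose(q_lib, q_manual))     # True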

Generalizations

Auto-covariance matrix of real random vectors

For a vector X = (X_1, X_2, ..., X_m)ᵀ of m jointly distributed random variables with finite second moments, its auto-covariance matrix (also known as the variance–covariance matrix or simply the covariance matrix) Σ(X) (also denoted by cov(X, X)) is defined as[9]:p.335

Σ(X) = cov(X, X) = E[(X − E[X])(X − E[X])ᵀ] = E[X Xᵀ] − E[X] E[X]ᵀ.

Let X be a random vector with covariance matrix Σ(X), and let A be a matrix that can act on X on the left. The covariance matrix of the matrix-vector product A X is

Σ(AX) = cov(AX, AX) = A Σ(X) Aᵀ.

This is a direct result of the linearity of expectation and is useful when applying a linear transformation, such as a whitening transformation, to a vector.
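
The rule Σ(AX) = A Σ(X) Aᵀ is easy to check by simulation, and it underlies whitening: choosing A so that the transformed covariance becomes the identity. A minimal sketch follows (the particular Σ, the matrix A, and the Cholesky-based whitening matrix are assumptions made for illustration):

    import numpy as np

    rng = np.random.default_rng(4)
    sigma = np.array([[4.0, 1.5],
                      [1.5, 1.0]])                     # assumed covariance matrix
    x = rng.multivariate_normal(mean=[0, 0], cov=sigma, size=200_000)

    A = np.array([[1.0, 2.0],
                  [0.0, 3.0]])
    emp = np.cov(x @ A.T, rowvar=False)                # empirical covariance of A X
    print(np.round(emp, 2))
    print(A @ sigma @ A.T)                             # nearly equal to the line above

    # Whitening: with W = L^{-1}, where sigma = L L^T, cov(W X) is the identity
    L = np.linalg.cholesky(sigma)
    W = np.linalg.inv(L)
    print(np.round(np.cov(x @ W.T, rowvar=False), 2))  # approximately the identity matrix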

Cross-covariance matrix of real random vectors

For real random vectors X ∈ R^m and Y ∈ R^n, the m × n cross-covariance matrix is equal to[9]:p.336

cov(X, Y) = E[(X − E[X])(Y − E[Y])ᵀ] = E[X Yᵀ] − E[X] E[Y]ᵀ        (Eq.2)

where Yᵀ is the transpose of the vector (or matrix) Y.

The (i, j)-th element of this matrix is equal to the covariance cov(X_i, Y_j) between the i-th scalar component of X and the j-th scalar component of Y. In particular, cov(Y, X) is the transpose of cov(X, Y).
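
A sample-based estimate of Eq.2 is a small matrix computation; the sketch below (the data, dimensions, and mixing matrix are illustrative assumptions) estimates the m × n cross-covariance matrix from paired observations:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(size=(10_000, 2))                  # observations of X in R^2
    mix = np.array([[1.0, 0.0, 2.0],
                    [0.5, 1.0, 0.0]])
    y = x @ mix + rng.normal(size=(10_000, 3))        # related observations of Y in R^3

    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cross_cov = xc.T @ yc / (len(x) - 1)              # 2 x 3 cross-covariance matrix
    print(cross_cov.shape)
    print(np.round(cross_cov, 2))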

Numerical computation

When E[XY] ≈ E[X] E[Y], the equation cov(X, Y) = E[XY] − E[X] E[Y] is prone to catastrophic cancellation when computed with floating point arithmetic and thus should be avoided in computer programs when the data have not been centered before.[10] Numerically stable algorithms should be preferred in this case.[11]
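
The sketch below illustrates the problem and one common remedy, a two-pass computation on centered data (the offset of 10^9 is an artificial choice to force cancellation; a single-pass online update in the style of Welford's algorithm is another standard option):

    import numpy as np

    rng = np.random.default_rng(6)
    offset = 1e9                                    # large common offset, tiny true covariance
    x = offset + rng.normal(scale=1e-3, size=100_000)
    y = offset + rng.normal(scale=1e-3, size=100_000)

    # Naive shortcut formula E[XY] - E[X]E[Y]: suffers catastrophic cancellation
    naive = np.mean(x * y) - np.mean(x) * np.mean(y)

    # Two-pass formula on centered data: numerically stable
    stable = np.mean((x - x.mean()) * (y - y.mean()))

    print(naive, stable)   # the naive value can be wildly wrong; the stable one is ~0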

Comments

The covariance is sometimes called a measure of "linear dependence" between the two random variables. That does not mean the same thing as in the context of linear algebra (see linear dependence). When the covariance is normalized, one obtains the Pearson correlation coefficient, which gives the goodness of the fit for the best possible linear function describing the relation between the variables. In this sense covariance is a linear gauge of dependence.

Applications

In genetics and molecular biology

Covariance is an important measure in biology. Certain sequences of DNA are conserved more than others among species, and thus to study secondary and tertiary structures of proteins, or of RNA structures, sequences are compared in closely related species. If sequence changes are found or no changes at all are found in noncoding RNA (such as microRNA), sequences are found to be necessary for common structural motifs, such as an RNA loop. In genetics, covariance serves as a basis for computation of the genetic relationship matrix (GRM) (also known as the kinship matrix), enabling inference on population structure from a sample with no known close relatives as well as estimation of the heritability of complex traits.

In financial economics

Covariances play a key role in financial economics, especially in portfolio theory and in the capital asset pricing model. Covariances among various assets' returns are used to determine, under certain assumptions, the relative amounts of different assets that investors should (in a normative analysis) or are predicted to (in a positive analysis) choose to hold in a context of diversification.

In meteorological and oceanographic data assimilation

The covariance matrix is important in estimating the initial conditions required for running weather forecast models. The 'forecast error covariance matrix' is typically constructed between perturbations around a mean state (either a climatological or ensemble mean). The 'observation error covariance matrix' is constructed to represent the magnitude of combined observational errors (on the diagonal) and the correlated errors between measurements (off the diagonal). This is an example of its widespread application to Kalman filtering and more general state estimation for time-varying systems.

In micrometeorology

The eddy covariance technique is a key atmospheric measurement technique in which the covariance between the instantaneous deviation of the vertical wind speed from its mean value and the instantaneous deviation of the gas concentration from its mean value is the basis for calculating the vertical turbulent fluxes.
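
In this setting the flux is, up to a density factor, simply the sample covariance of the two high-frequency series. A minimal sketch follows (synthetic series; the 10 Hz sampling rate, the 30-minute window, and the variable names are assumptions made for illustration):

    import numpy as np

    rng = np.random.default_rng(7)
    n = 10 * 60 * 30                     # 30 minutes of 10 Hz samples
    w = rng.normal(scale=0.3, size=n)    # vertical wind speed (m/s)
    c = 400 + 0.5 * w + rng.normal(scale=2.0, size=n)   # gas concentration, partly driven by w

    w_prime = w - w.mean()               # instantaneous deviations from the mean
    c_prime = c - c.mean()
    flux = np.mean(w_prime * c_prime)    # eddy covariance estimate of the vertical flux
    print(flux)                          # close to 0.5 * var(w) = 0.045 here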

In feature extraction

The covariance matrix is used to capture the spectral variability of a signal.[12]

See also

References

  1. ^ Rice, John (2007). Mathematical Statistics and Data Analysis. Belmont, CA: Brooks/Cole Cengage Learning. p. 138. ISBN 978-0534-39942-9.
  2. ^ Weisstein, Eric W. "Covariance". MathWorld.
  3. ^ Oxford Dictionary of Statistics, Oxford University Press, 2002, p. 104.
  4. ^ a b c d e Park, Kun Il (2018). Fundamentals of Probability and Stochastic Processes with Applications to Communications. Springer. ISBN 978-3-319-68074-3.
  5. ^ Zhang, Yuli; Wu, Huaiyu; Cheng, Lei (June 2012). Some new deformation formulas about variance and covariance. Proceedings of the 4th International Conference on Modelling, Identification and Control (ICMIC2012). pp. 987–992.
  6. ^ "Covariance of X and Y | STAT 414/415". The Pennsylvania State University. 12/9/2016. Retrieved 12/9/2016. Check date values in: |access-date=, |date= (help)
  7. ^ Papoulis (1991). Probability, Random Variables and Stochastic Processes. McGraw-Hill.
  8. ^ Siegrist, Kyle. "Covariance and Correlation". University of Alabama in Huntsville. Retrieved 12/9/2016.
  9. ^ a b Gubner, John A. (2006). Probability and Random Processes for Electrical and Computer Engineers. Cambridge University Press. ISBN 978-0-521-86470-1.
  10. ^ Donald E. Knuth (1998). The Art of Computer Programming, volume 2: Seminumerical Algorithms, 3rd edn., p. 232. Boston: Addison-Wesley.
  11. ^ Schubert, Erich; Gertz, Michael (2018). "Numerically stable parallel computation of (co-)variance". Proceedings of the 30th International Conference on Scientific and Statistical Database Management - SSDBM '18. Bozen-Bolzano, Italy: ACM Press: 1–12. doi:10.1145/3221269.3223036. ISBN 9781450365055.
  12. ^ Sahidullah, Md.; Kinnunen, Tomi (March 2016). "Local spectral variability features for speaker verification". Digital Signal Processing. 50: 1–11. doi:10.1016/j.dsp.2015.10.011.

Analysis of covariance

Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV) often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV) or nuisance variables. Mathematically, ANCOVA decomposes the variance in the DV into variance explained by the CV(s), variance explained by the categorical IV, and residual variance. Intuitively, ANCOVA can be thought of as 'adjusting' the DV by the group means of the CV(s).

The ANCOVA model assumes a linear relationship between the response (DV) and covariate (CV):

y_{ij} = μ + τ_i + B(x_{ij} − x̄) + ε_{ij}.

In this equation, the DV, y_{ij}, is the jth observation under the ith categorical group; the CV, x_{ij}, is the jth observation of the covariate under the ith group. Variables in the model that are derived from the observed data are μ (the grand mean) and x̄ (the global mean for covariate x). The variables to be fitted are τ_i (the effect of the ith level of the categorical IV), B (the slope of the line) and ε_{ij} (the associated unobserved error term for the jth observation in the ith group).

Under this specification, the a categorical treatment effects sum to zero: Σ_{i=1}^{a} τ_i = 0. The standard assumptions of the linear regression model are also assumed to hold.
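
As a practical illustration, an ANCOVA of this form can be fitted as an ordinary linear model with one categorical and one continuous predictor. The sketch below uses the statsmodels formula interface; the data frame, column names, and simulated effects are assumptions made only for the example:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(8)
    n_per_group = 50
    groups = np.repeat(["control", "treatA", "treatB"], n_per_group)
    x = rng.normal(size=groups.size)                       # covariate (CV)
    effect = {"control": 0.0, "treatA": 1.0, "treatB": 2.0}
    y = (np.array([effect[g] for g in groups])             # group (treatment) effect
         + 0.8 * x                                         # common slope on the covariate
         + rng.normal(scale=0.5, size=groups.size))        # error term

    df = pd.DataFrame({"y": y, "x": x, "group": groups})
    model = smf.ols("y ~ C(group) + x", data=df).fit()     # DV ~ categorical IV + covariate
    print(model.summary())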

Autocorrelation

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

Different fields of study define autocorrelation differently, and not all of these definitions are equivalent. In some fields, the term is used interchangeably with autocovariance.

Unit root processes, trend stationary processes, autoregressive processes, and moving average processes are specific forms of processes with autocorrelation.
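
For a discrete series, the normalized sample autocorrelation at lag k can be computed directly; the sketch below (a noisy sinusoid is used purely as an illustrative signal) estimates it by correlating the centered series with a lagged copy of itself:

    import numpy as np

    rng = np.random.default_rng(9)
    t = np.arange(1000)
    x = np.sin(2 * np.pi * t / 50) + rng.normal(scale=0.5, size=t.size)  # period-50 signal + noise

    def autocorr(x, lag):
        # Sample autocorrelation of x at the given lag (lag 0 is 1 by definition)
        xc = x - x.mean()
        return np.sum(xc[:-lag] * xc[lag:]) / np.sum(xc * xc) if lag > 0 else 1.0

    for lag in (0, 25, 50, 100):
        print(lag, round(autocorr(x, lag), 3))   # peaks again near multiples of the period (50)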

Correlation and dependence

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data. In the broadest sense correlation is any statistical association, though in common usage it most often refers to how close two variables are to having a linear relationship with each other. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a limited supply product and its price.

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling. However, in general, the presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., correlation does not imply causation).

Formally, random variables are dependent if they do not satisfy a mathematical property of probabilistic independence. In informal parlance, correlation is synonymous with dependence. However, when used in a technical sense, correlation refers to any of several specific types of relationship between mean values. There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other). Other correlation coefficients have been developed to be more robust than the Pearson correlation – that is, more sensitive to nonlinear relationships. Mutual information can also be applied to measure dependence between two variables.

Covariance and contravariance

Covariance and contravariance may refer to:

Covariance and contravariance of vectors, in mathematics and theoretical physics

Covariance and contravariance of functors, in category theory

Covariance and contravariance (computer science), whether a type system preserves the ordering ≤ of types

Covariance and contravariance of vectors

In multilinear algebra and tensor analysis, covariance and contravariance describe how the quantitative description of certain geometric or physical entities changes with a change of basis.

In physics, a basis is sometimes thought of as a set of reference axes. A change of scale on the reference axes corresponds to a change of units in the problem. For instance, by changing scale from meters to centimeters (that is, dividing the scale of the reference axes by 100), the components of a measured velocity vector are multiplied by 100. Vectors exhibit this behavior of changing scale inversely to changes in the scale of the reference axes and consequently are called contravariant. As a result, vectors often have units of distance or distance with other units (as, for example, velocity has units of distance divided by time).

In contrast, covectors (also called dual vectors) typically have units of the inverse of distance or the inverse of distance with other units. An example of a covector is the gradient, which has units of a spatial derivative, or distance−1. The components of covectors change in the same way as changes to scale of the reference axes and consequently are called covariant.

A third concept related to covariance and contravariance is invariance. An example of a physical observable that does not change with a change of scale on the reference axes is the mass of a particle, which has units of mass (that is, no units of distance). The single, scalar value of mass is independent of changes to the scale of the reference axes and consequently is called invariant.

Under more general changes in basis:

Curvilinear coordinate systems, such as cylindrical or spherical coordinates, are often used in physical and geometric problems. Associated with any coordinate system is a natural choice of coordinate basis for vectors based at each point of the space, and covariance and contravariance are particularly important for understanding how the coordinate description of a vector changes by passing from one coordinate system to another.

The terms covariant and contravariant were introduced by James Joseph Sylvester in 1851 in the context of associated algebraic forms theory. In the lexicon of category theory, covariance and contravariance are properties of functors; unfortunately, it is the lower-index objects (covectors) that generically have pullbacks, which are contravariant, while the upper-index objects (vectors) instead have pushforwards, which are covariant. This terminological conflict may be avoided by calling contravariant functors "cofunctors"—in accord with the "covector" terminology, and continuing the tradition of treating vectors as the concept and covectors as the coconcept.

Tensors are objects in multilinear algebra that can have aspects of both covariance and contravariance.

Covariance matrix

In probability theory and statistics, a covariance matrix, also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix, is a matrix whose element in the i, j position is the covariance between the i-th and j-th elements of a random vector. A random vector is a random variable with multiple dimensions. Each element of the vector is a scalar random variable. Each element has either a finite number of observed empirical values or a finite or infinite number of potential values. The potential values are specified by a theoretical joint probability distribution.

Intuitively, the covariance matrix generalizes the notion of variance to multiple dimensions. As an example, the variation in a collection of random points in two-dimensional space cannot be characterized fully by a single number, nor would the variances in the and directions contain all of the necessary information; a matrix would be necessary to fully characterize the two-dimensional variation.

Because the covariance of the i-th random variable with itself is simply that random variable's variance, each element on the principal diagonal of the covariance matrix is the variance of one of the random variables. Because the covariance of the i-th random variable with the j-th one is the same thing as the covariance of the j-th random variable with the i-th random variable, every covariance matrix is symmetric. Also, every covariance matrix is positive semi-definite.

The auto-covariance matrix of a random vector X is typically denoted by K_XX or Σ.
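
The defining properties mentioned above (symmetry and positive semi-definiteness, with variances on the diagonal) are easy to verify numerically; the following sketch checks them on a sample covariance matrix of simulated data (illustrative only):

    import numpy as np

    rng = np.random.default_rng(10)
    data = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))   # correlated columns
    k = np.cov(data, rowvar=False)

    print(np.allclose(k, k.T))                                 # symmetric
    print(np.all(np.linalg.eigvalsh(k) >= -1e-12))             # eigenvalues >= 0: positive semi-definite
    print(np.allclose(np.diag(k), data.var(axis=0, ddof=1)))   # diagonal entries are the variances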

Cross-correlation

In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other. This is also known as a sliding dot product or sliding inner-product. It is commonly used for searching a long signal for a shorter, known feature. It has applications in pattern recognition, single particle analysis, electron tomography, averaging, cryptanalysis, and neurophysiology.

The cross-correlation is similar in nature to the convolution of two functions. In an autocorrelation, which is the cross-correlation of a signal with itself, there will always be a peak at a lag of zero, and its size will be the signal energy.

In probability and statistics, the term cross-correlations is used for referring to the correlations between the entries of two random vectors X and Y, while the correlations of a random vector X are considered to be the correlations between the entries of X itself, those forming the correlation matrix (matrix of correlations) of X. If each of X and Y is a scalar random variable which is realized repeatedly in a temporal sequence (a time series), then the correlations of the various temporal instances of X are known as autocorrelations of X, and the cross-correlations of X with Y across time are temporal cross-correlations.

Furthermore, in probability and statistics the definition of correlation always includes a standardising factor in such a way that correlations have values between −1 and +1.

If X and Y are two independent random variables with probability density functions f and g, respectively, then the probability density of the difference Y − X is formally given by the cross-correlation (in the signal-processing sense) f ⋆ g; however, this terminology is not used in probability and statistics. In contrast, the convolution f ∗ g (equivalent to the cross-correlation of f(t) and g(−t)) gives the probability density function of the sum X + Y.
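
In the signal-processing sense, cross-correlation is just a sliding dot product, which NumPy computes directly; the sketch below (a short template embedded in noise, chosen only for illustration) locates the known feature at the peak of the cross-correlation:

    import numpy as np

    rng = np.random.default_rng(11)
    template = np.array([1.0, 2.0, 3.0, 2.0, 1.0])        # short, known feature
    signal = rng.normal(scale=0.2, size=200)
    signal[120:125] += template                           # hide the feature at offset 120

    xcorr = np.correlate(signal, template, mode="valid")  # sliding dot product
    print(int(np.argmax(xcorr)))                          # ~120: estimated location of the feature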

Functor

In mathematics, a functor is a map between categories. Functors were first considered in algebraic topology, where algebraic objects (such as the fundamental group) are associated to topological spaces, and maps between these algebraic objects are associated to continuous maps between spaces. Nowadays, functors are used throughout modern mathematics to relate various categories. Thus, functors are important in all areas within mathematics to which category theory is applied.

The word functor was borrowed by mathematicians from the philosopher Rudolf Carnap, who used the term in a linguistic context;

see function word.

Galilean invariance

Galilean invariance or Galilean relativity states that the laws of motion are the same in all inertial frames. Galileo Galilei first described this principle in 1632 in his Dialogue Concerning the Two Chief World Systems using the example of a ship travelling at constant velocity, without rocking, on a smooth sea; any observer below the deck would not be able to tell whether the ship was moving or stationary.

General covariance

In theoretical physics, general covariance, also known as diffeomorphism covariance or general invariance, consists of the invariance of the form of physical laws under arbitrary differentiable coordinate transformations. The essential idea is that coordinates do not exist a priori in nature, but are only artifices used in describing nature, and hence should play no role in the formulation of fundamental physical laws.

Hotelling's T-squared distribution

In statistics Hotelling's T-squared distribution (T2) is a multivariate distribution proportional to the F-distribution and arises importantly as the distribution of a set of statistics which are natural generalizations of the statistics underlying Student's t-distribution.

Hotelling's t-squared statistic (t2) is a generalization of Student's t-statistic that is used in multivariate hypothesis testing.

Kalman filter

In statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, one of the primary developers of its theory.

The Kalman filter has numerous applications in technology. A common application is for guidance, navigation, and control of vehicles, particularly aircraft and spacecraft. Furthermore, the Kalman filter is a widely applied concept in time series analysis used in fields such as signal processing and econometrics. Kalman filters also are one of the main topics in the field of robotic motion planning and control, and they are sometimes included in trajectory optimization. The Kalman filter also works for modeling the central nervous system's control of movement. Due to the time delay between issuing motor commands and receiving sensory feedback, use of the Kalman filter supports a realistic model for making estimates of the current state of the motor system and issuing updated commands.

The algorithm works in a two-step process. In the prediction step, the Kalman filter produces estimates of the current state variables, along with their uncertainties. Once the outcome of the next measurement (necessarily corrupted with some amount of error, including random noise) is observed, these estimates are updated using a weighted average, with more weight being given to estimates with higher certainty. The algorithm is recursive. It can run in real time, using only the present input measurements and the previously calculated state and its uncertainty matrix; no additional past information is required.

Using a Kalman filter does not assume that the errors are Gaussian. However, the filter yields the exact conditional probability estimate in the special case that all errors are Gaussian.

Extensions and generalizations to the method have also been developed, such as the extended Kalman filter and the unscented Kalman filter which work on nonlinear systems. The underlying model is similar to a hidden Markov model except that the state space of the latent variables is continuous and all latent and observed variables have Gaussian distributions.
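
A minimal one-dimensional example conveys the two-step structure. The sketch below tracks a constant value observed through noise (a toy model; the noise variances, the constant-state dynamics, and the variable names are assumptions made for illustration, not a navigation-grade filter):

    import numpy as np

    rng = np.random.default_rng(12)
    true_value = 5.0
    meas_var = 4.0                     # observation noise variance R
    proc_var = 1e-4                    # process noise variance Q (nearly constant state)
    z = true_value + rng.normal(scale=np.sqrt(meas_var), size=50)   # noisy measurements

    x_est, p_est = 0.0, 100.0          # initial state estimate and its variance
    for zk in z:
        # Prediction step: constant-state model, uncertainty grows by Q
        x_pred, p_pred = x_est, p_est + proc_var
        # Update step: weighted average of prediction and measurement (Kalman gain)
        k_gain = p_pred / (p_pred + meas_var)
        x_est = x_pred + k_gain * (zk - x_pred)
        p_est = (1.0 - k_gain) * p_pred

    print(round(x_est, 2))             # close to the true value 5.0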

Lorentz covariance

In relativistic physics, Lorentz symmetry, named for Hendrik Lorentz, is an equivalence of observation or observational symmetry due to special relativity implying that the laws of physics stay the same for all observers that are moving with respect to one another within an inertial frame. It has also been described as "the feature of nature that says experimental results are independent of the orientation or the boost velocity of the laboratory through space".

Lorentz covariance, a related concept, is a property of the underlying spacetime manifold. Lorentz covariance has two distinct, but closely related meanings:

A physical quantity is said to be Lorentz covariant if it transforms under a given representation of the Lorentz group. According to the representation theory of the Lorentz group, these quantities are built out of scalars, four-vectors, four-tensors, and spinors. In particular, a Lorentz covariant scalar (e.g., the space-time interval) remains the same under Lorentz transformations and is said to be a Lorentz invariant (i.e., they transform under the trivial representation).

An equation is said to be Lorentz covariant if it can be written in terms of Lorentz covariant quantities (confusingly, some use the term invariant here). The key property of such equations is that if they hold in one inertial frame, then they hold in any inertial frame; this follows from the result that if all the components of a tensor vanish in one frame, they vanish in every frame. This condition is a requirement according to the principle of relativity; i.e., all non-gravitational laws must make the same predictions for identical experiments taking place at the same spacetime event in two different inertial frames of reference.

On manifolds, the words covariant and contravariant refer to how objects transform under general coordinate transformations. Both covariant and contravariant four-vectors can be Lorentz covariant quantities.

Local Lorentz covariance, which follows from general relativity, refers to Lorentz covariance applying only locally in an infinitesimal region of spacetime at every point. There is a generalization of this concept to cover Poincaré covariance and Poincaré invariance.

Poincaré group

The Poincaré group, named after Henri Poincaré (1906), was first defined by Minkowski (1908) as the group of Minkowski spacetime isometries. It is a ten-dimensional non-abelian Lie group of fundamental importance in physics.

Principal component analysis

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. If there are n observations with p variables, then the number of distinct principal components is min(n − 1, p). This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors (each being a linear combination of the variables and containing n observations) are an uncorrelated orthogonal basis set. PCA is sensitive to the relative scaling of the original variables.

PCA was invented in 1901 by Karl Pearson, as an analogue of the principal axis theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT) in signal processing, the Hotelling transform in multivariate quality control, proper orthogonal decomposition (POD) in mechanical engineering, singular value decomposition (SVD) of X (Golub and Van Loan, 1983), eigenvalue decomposition (EVD) of XᵀX in linear algebra, factor analysis (for a discussion of the differences between PCA and factor analysis see Ch. 7 of Jolliffe's Principal Component Analysis), Eckart–Young theorem (Harman, 1960), or empirical orthogonal functions (EOF) in meteorological science, empirical eigenfunction decomposition (Sirovich, 1987), empirical component analysis (Lorenz, 1956), quasiharmonic modes (Brooks et al., 1988), spectral decomposition in noise and vibration, and empirical modal analysis in structural dynamics.

PCA is mostly used as a tool in exploratory data analysis and for making predictive models. It is often used to visualize genetic distance and relatedness between populations. PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after a normalization step of the initial data. The normalization of each attribute consists of mean centering – subtracting each data value from its variable's measured mean so that its empirical mean (average) is zero – and, possibly, normalizing each variable's variance to make it equal to 1; see Z-scores. The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score). If component scores are standardized to unit variance, loadings must contain the data variance in them (and that is the magnitude of eigenvalues). If component scores are not standardized (therefore they contain the data variance) then loadings must be unit-scaled, ("normalized") and these weights are called eigenvectors; they are the cosines of orthogonal rotation of variables into principal components or back.

PCA is the simplest of the true eigenvector-based multivariate analyses. Often, its operation can be thought of as revealing the internal structure of the data in a way that best explains the variance in the data. If a multivariate dataset is visualised as a set of coordinates in a high-dimensional data space (1 axis per variable), PCA can supply the user with a lower-dimensional picture, a projection of this object when viewed from its most informative viewpoint. This is done by using only the first few principal components so that the dimensionality of the transformed data is reduced.

PCA is closely related to factor analysis. Factor analysis typically incorporates more domain specific assumptions about the underlying structure and solves eigenvectors of a slightly different matrix.

PCA is also related to canonical correlation analysis (CCA). CCA defines coordinate systems that optimally describe the cross-covariance between two datasets while PCA defines a new orthogonal coordinate system that optimally describes variance in a single dataset.
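
The covariance-matrix route to PCA is a few lines of linear algebra: center the data, form the sample covariance matrix, and take its eigendecomposition. The sketch below is illustrative (simulated two-dimensional data with one dominant direction); np.linalg.eigh is used because the covariance matrix is symmetric:

    import numpy as np

    rng = np.random.default_rng(13)
    x = rng.multivariate_normal(mean=[0, 0],
                                cov=[[3.0, 1.2], [1.2, 1.0]], size=5000)

    xc = x - x.mean(axis=0)                     # mean-center each variable
    cov = np.cov(xc, rowvar=False)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]           # sort components by explained variance
    components = eigvecs[:, order]
    scores = xc @ components                    # principal component scores

    print(eigvals[order])                       # variance explained by each component
    print(np.round(np.cov(scores, rowvar=False), 3))   # approximately diagonal: uncorrelated scores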

Sample mean and covariance

The sample mean or empirical mean and the sample covariance are statistics computed from a collection (the sample) of data on one or more random variables.

The sample mean and sample covariance are estimators of the population mean and population covariance, where the term population refers to the set from which the sample was taken.

The sample mean is a vector each of whose elements is the sample mean of one of the random variables – that is, each of whose elements is the arithmetic average of the observed values of one of the variables. The sample covariance matrix is a square matrix whose i, j element is the sample covariance (an estimate of the population covariance) between the sets of observed values of two of the variables and whose i, i element is the sample variance of the observed values of one of the variables. If only one variable has had values observed, then the sample mean is a single number (the arithmetic average of the observed values of that variable) and the sample covariance matrix is also simply a single value (a 1x1 matrix containing a single number, the sample variance of the observed values of that variable).

Due to their ease of calculation and other desirable characteristics, the sample mean and sample covariance are widely used in statistics and applications to numerically represent the location and dispersion, respectively, of a distribution.

Stationary process

In mathematics and statistics, a stationary process (a.k.a. a strict/strictly stationary process or strong/strongly stationary process) is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance also do not change over time.

Since stationarity is an assumption underlying many statistical procedures used in time series analysis, non-stationary data is often transformed to become stationary. The most common cause of violation of stationarity is a trend in the mean, which can be due either to the presence of a unit root or of a deterministic trend. In the former case of a unit root, stochastic shocks have permanent effects, and the process is not mean-reverting. In the latter case of a deterministic trend, the process is called a trend stationary process, and stochastic shocks have only transitory effects after which the variable tends toward a deterministically evolving (non-constant) mean.

A trend stationary process is not strictly stationary, but can easily be transformed into a stationary process by removing the underlying trend, which is solely a function of time. Similarly, processes with one or more unit roots can be made stationary through differencing. An important type of non-stationary process that does not include a trend-like behavior is a cyclostationary process, which is a stochastic process that varies cyclically with time.

For many applications strict-sense stationarity is too restrictive. Other forms of stationarity such as wide-sense stationarity or N-th order stationarity are then employed. The definitions for different kinds of stationarity are not consistent among different authors (see Other terminology).

Structural equation modeling

Structural equation modeling (SEM) is a form of causal modeling that includes a diverse set of mathematical models, computer algorithms, and statistical methods that fit networks of constructs to data. SEM includes confirmatory factor analysis, confirmatory composite analysis, path analysis, partial least squares path modeling, and latent growth modeling. The concept should not be confused with the related concept of structural models in econometrics, nor with structural models in economics. Structural equation models are often used to assess unobservable 'latent' constructs. They often invoke a measurement model that defines latent variables using one or more observed variables, and a structural model that imputes relationships between latent variables. The links between constructs of a structural equation model may be estimated with independent regression equations or through more involved approaches such as those employed in LISREL.

Use of SEM is commonly justified in the social sciences because of its ability to impute relationships between unobserved constructs (latent variables) from observable variables. To provide a simple example, the concept of human intelligence cannot be measured directly as one could measure height or weight. Instead, psychologists develop a hypothesis of intelligence and write measurement instruments with items (questions) designed to measure intelligence according to their hypothesis. They would then use SEM to test their hypothesis using data gathered from people who took their intelligence test. With SEM, "intelligence" would be the latent variable and the test items would be the observed variables.

A simplistic model suggesting that intelligence (as measured by four questions) can predict academic performance (as measured by SAT, ACT, and high school GPA) can be drawn as an SEM path diagram. In SEM diagrams, latent variables are commonly shown as ovals and observed variables as rectangles. Such a diagram shows how error (e) influences each intelligence question and the SAT, ACT, and GPA scores, but does not influence the latent variables. SEM provides numerical estimates for each of the parameters (arrows) in the model to indicate the strength of the relationships. Thus, in addition to testing the overall theory, SEM allows the researcher to diagnose which observed variables are good indicators of the latent variables.

Various methods in structural equation modeling have been used in the sciences, business, and other fields. Criticism of SEM methods often addresses pitfalls in mathematical formulation, weak external validity of some accepted models and philosophical bias inherent to the standard procedures.

Weighted arithmetic mean

The weighted arithmetic mean is similar to an ordinary arithmetic mean (the most common type of average), except that instead of each of the data points contributing equally to the final average, some data points contribute more than others. The notion of weighted mean plays a role in descriptive statistics and also occurs in a more general form in several other areas of mathematics.

If all the weights are equal, then the weighted mean is the same as the arithmetic mean. While weighted means generally behave in a similar fashion to arithmetic means, they do have a few counterintuitive properties, as captured for instance in Simpson's paradox.
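
A weighted mean is the sum of weight-times-value divided by the sum of the weights; the sketch below (the values and weights are invented for the example) computes it both by hand and with NumPy's np.average:

    import numpy as np

    values = np.array([80.0, 90.0, 70.0])
    weights = np.array([0.2, 0.5, 0.3])       # need not sum to 1; they are normalized by the division

    manual = np.sum(weights * values) / np.sum(weights)
    library = np.average(values, weights=weights)
    print(manual, library)                    # both 82.0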
