Covariance


Covariance provides a measure of the strength of the correlation between two or more sets of random variates. The covariance for two random variates X and Y, each with sample size N, is defined by the expectation value

\operatorname{cov}(X,Y) = \langle (X - \mu_X)(Y - \mu_Y) \rangle
(1)
= \langle XY \rangle - \mu_X \mu_Y
(2)

where mu_X=<X> and mu_Y=<Y> are the respective means, which can be written out explicitly as

\operatorname{cov}(X,Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}).
(3)
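
A minimal Python sketch of equation (3) (an illustration not in the original, assuming NumPy is available). Note that (3) uses the divisor N, whereas numpy.cov defaults to the unbiased divisor N-1, so ddof=0 is passed for comparison:

import numpy as np

def sample_covariance(x, y):
    # Sample covariance per equation (3): mean product of deviations, divisor N
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

print(sample_covariance(x, y))     # direct implementation of (3)
print(np.cov(x, y, ddof=0)[0, 1])  # NumPy equivalent with divisor N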

For uncorrelated variates,

\operatorname{cov}(X,Y) = \langle XY \rangle - \mu_X \mu_Y = \langle X \rangle \langle Y \rangle - \mu_X \mu_Y = 0,
(4)

so the covariance is zero. However, if the variables are correlated in some way, then their covariance will be nonzero. In fact, if cov(X,Y)>0, then Y tends to increase as X increases, and if cov(X,Y)<0, then Y tends to decrease as X increases. Note that while statistically independent variables are always uncorrelated, the converse is not necessarily true.
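
For example (an illustration not in the original), take X symmetric about zero and Y=X^2: Y is completely determined by X, yet cov(X,Y)=<X^3>-<X><X^2>=0. A short Python sketch, assuming NumPy, shows the empirical covariance is correspondingly close to zero:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)  # symmetric about zero
y = x**2                          # a deterministic function of x, hence dependent

# Empirical covariance (divisor N) is near zero despite the dependence
print(np.mean((x - x.mean()) * (y - y.mean())))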

In the special case of Y=X,

\operatorname{cov}(X,X) = \langle X^2 \rangle - \langle X \rangle^2
(5)
= \sigma_X^2,
(6)

so the covariance reduces to the usual variance sigma_X^2=var(X). This motivates the use of the symbol sigma_(XY)=cov(X,Y), which then provides a consistent way of denoting the variance as sigma_(XX)=sigma_X^2, where sigma_X is the standard deviation.
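
A quick numerical check of equations (5)-(6), again a Python sketch assuming NumPy (np.var also uses the divisor N by default):

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)

# cov(X, X) computed with divisor N coincides with var(X)
print(np.cov(x, x, ddof=0)[0, 1])
print(np.var(x))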

The derived quantity

\operatorname{cor}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}
(7)
= \frac{\sigma_{XY}}{\sqrt{\sigma_{XX} \sigma_{YY}}},
(8)

is called the statistical correlation of X and Y.
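
As an illustration (not in the original), a Python sketch of equation (7), assuming NumPy, compared against the built-in np.corrcoef:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 3.0, 5.0])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # covariance per equation (3)
cor_xy = cov_xy / (np.std(x) * np.std(y))          # correlation per equation (7)

print(cor_xy)
print(np.corrcoef(x, y)[0, 1])  # agrees: the divisor N cancels in the ratio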

The covariance is especially useful when looking at the variance of the sum of two random variates, since

\operatorname{var}(X+Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X,Y).
(9)
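
Equation (9) can be checked numerically; a Python sketch assuming NumPy (the identity holds exactly for sample moments when the same divisor N is used throughout):

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(10_000)
y = 0.5 * x + rng.standard_normal(10_000)  # correlated with x

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Left- and right-hand sides of equation (9) agree to machine precision
print(np.var(x + y))
print(np.var(x) + np.var(y) + 2 * cov_xy)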

The covariance is symmetric by definition since

\operatorname{cov}(X,Y) = \operatorname{cov}(Y,X).
(10)

Given n random variates denoted X_1, ..., X_n, the covariance sigma_(ij)=cov(X_i,X_j) of X_i and X_j is defined by

\operatorname{cov}(X_i,X_j) = \langle (X_i - \mu_i)(X_j - \mu_j) \rangle
(11)
= \langle X_i X_j \rangle - \mu_i \mu_j,
(12)

where mu_i=<X_i> and mu_j=<X_j> are the means of X_i and X_j, respectively. The matrix (V_(ij)) of the quantities V_(ij)=cov(X_i,X_j) is called the covariance matrix.
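
In Python (assuming NumPy), the covariance matrix of several variates stored as rows is returned by np.cov; ddof=0 matches the divisor-N convention of equation (3):

import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal((3, 1000))  # three variates X_1, X_2, X_3 as rows

V = np.cov(data, ddof=0)    # V[i, j] = cov(X_i, X_j)
print(V.shape)              # (3, 3)
print(np.allclose(V, V.T))  # True: the matrix is symmetric, by equation (10)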

The covariance obeys the identities

\operatorname{cov}(X+Z,Y) = \langle (X+Z)Y \rangle - \langle X+Z \rangle \langle Y \rangle
(13)
= \langle XY \rangle + \langle ZY \rangle - (\langle X \rangle + \langle Z \rangle) \langle Y \rangle
(14)
= \langle XY \rangle - \langle X \rangle \langle Y \rangle + \langle ZY \rangle - \langle Z \rangle \langle Y \rangle
(15)
= \operatorname{cov}(X,Y) + \operatorname{cov}(Z,Y).
(16)

By induction, it therefore follows that

\operatorname{cov}\!\left( \sum_{i=1}^{n} X_i,\, Y \right) = \sum_{i=1}^{n} \operatorname{cov}(X_i, Y)
(17)
\operatorname{cov}\!\left( \sum_{i=1}^{n} X_i,\, \sum_{j=1}^{m} Y_j \right) = \sum_{i=1}^{n} \operatorname{cov}\!\left( X_i,\, \sum_{j=1}^{m} Y_j \right)
(18)
= \sum_{i=1}^{n} \operatorname{cov}\!\left( \sum_{j=1}^{m} Y_j,\, X_i \right)
(19)
= \sum_{i=1}^{n} \sum_{j=1}^{m} \operatorname{cov}(Y_j, X_i)
(20)
= \sum_{i=1}^{n} \sum_{j=1}^{m} \operatorname{cov}(X_i, Y_j).
(21)
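
A numerical spot-check of equation (17), again a Python sketch assuming NumPy (the identity is exact for sample covariances computed with a common divisor, so the two sides match to machine precision):

import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 2000))  # five variates X_1, ..., X_5 as rows
y = rng.standard_normal(2000)

def cov(a, b):
    # Sample covariance with divisor N, as in equation (3)
    return np.mean((a - a.mean()) * (b - b.mean()))

print(cov(X.sum(axis=0), y))          # covariance of the sum with Y
print(sum(cov(x_i, y) for x_i in X))  # sum of the individual covariances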
