Jan 22

Why probability contours for the multivariate Gaussian are elliptical

Every 2D Gaussian concentrates its mass at a particular point (a “bump”), with mass falling off steadily away from its peak.  If we plot regions that have the *same* height on the bump (the same density under the PDF), it turns out they have a particular form: an ellipse.  In this post, I’ll use math to show why it is an ellipse.

Here’s an example of the kind of contour plot I’m talking about (note that this shows *many* 2D Gaussians all on one plot). The horizontal axis shows the first coordinate, x_1, and the vertical axis shows x_2. The contours illustrate values of (x_1,x_2) that have the same probability under a particular Gaussian distribution, defined by parameters \mu, \Sigma.

The position of the contours is governed by the mean parameter \mu for each Gaussian.

The shape of the elliptical contours for each Gaussian is governed exclusively by \Sigma. For example,

  • the vertical (tall) ellipses (like the dark blue one) have covariance:

    (1)   \begin{align*} \Sigma = \left[ \begin{array}{c c} 0.005 & 0 \\ 0 & 0.05 \end{array} \right] \notag \end{align*}

  • the forward leaning (“/”) ellipses (like the orange one) have covariance:

    (2)   \begin{align*} \Sigma = \left[ \begin{array}{c c} .025 & .02 \\ .02 & .025 \end{array} \right] \notag \end{align*}

The goal of this post is to understand *why* this elliptical structure emerges no matter what covariance matrix we specify.

Multivariate Gaussian Math Basics

To start, we’ll remind ourselves of the basic math behind the multivariate Gaussian.

We are generating data x, which is a D-dimensional column vector.

We have two parameters:

  • a mean location \mu, which is a D-dimensional column vector
  • a covariance matrix \Sigma, a D-by-D positive definite matrix

The probability density function (PDF) looks like

    \[ p( x | \mu, \Sigma ) = \frac{1}{2\pi}^{D/2} \frac{1}{|\Sigma|}^{1/2} e^{ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\]

For the rest of this discussion, we’ll assume that D=2, since we’re interested in plotting in just 2 dimensions.

Level Sets and Ellipses

We are interested in finding a set \mathcal{X} of possible x vectors such that *every* entry x' in \mathcal{X} has the same value c = p( x' ).  Furthermore, we want this set to be exhaustive, meaning no vector not in the set also has the PDF value c.  This set \mathcal{X} often called a level set of the function p().  Note that each choice of a particular constant c defines a unique level set \mathcal{X}.

How can we find the level set for the Gaussian PDF? Well, we can see by the form of p() that x only influences the outcome through the term in the exponent.  So we might as well find a level set \mathcal{X} that holds constant the transformed function

    \[ L(x) = (x-\mu)^T \Sigma^{-1} (x-\mu) \]

This function L(x) is simpler to work with, so we prefer it to the original p(x).  The level set for L(x) given some constant c will be equivalent to the level set for the original PDF given some transformed constant c' = f(c).   It’s a nice exercise to find this transformation via algebra, but it’s not super relevant to this discussion so I’ll leave it to the reader.


We’d like to show that the level set of L(x) for some constant c defines an ellipse. Recall the general form of an axis-aligned ellipse in 2D:

    \[ 1 = \frac{ x_1^2 }{ a^2 } + \frac{ x_2^2 }{ b^2 } \]

Our goal is to show that the level set of L(x) can be shown to have elliptical form.  One immediate point to make is that the function L(x) is said to have “quadratic form“, and so does the standard elliptical form.  So intuitively, we’re on the right track.  Let’s proceed.

Change of Coordinates

The first step is to change the coordinate system.  Instead of having our system centered at the origin, (0,0), we’ll have it centered at \mu = (\mu_1, \mu_2).  This requires rewriting x as x', where x' = x + \mu.  Our new function to find level sets for is now:

    \[ L(x') = x'^T \Sigma^{-1} x' \]

If we can find a level set \mathcal{X}' for this function, we can recover the level set for L(x) just by inverting our coordinate transform.

The next step requires some knowledge of linear algebra, so we’ll make a brief aside to brush up.

Covariance Matrix Math Properties

Earlier, we mentioned that the covariance matrix \Sigma must be symmetric and positive definite.  This property ensures that we can always *invert* the matrix (that is, \Sigma^{-1} exists, which is not true for all 2D matrices).  But it carries other useful properties as well.

Most importantly, if \Sigma is positive definite, then so is its inverse \Sigma^{-1} (see this fact sheet). This implies that \Sigma^{-1} can be decomposed into a set of D eigenvector/eigenvalue pairs \{u_d, \lambda_d\} such that

  • the eigenvalues are all positive reals: \lambda_d >0
  • the eigenvectors are all orthogonal:  u_d ^T u_{d'} = 0 unless d = d'

So, we can write the inverse covariance matrix in terms of its eigen-decomposition.

    \[ \Sigma^{-1} = U \Lambda U^T \]

Using the following matrix definitions:

(3)   \begin{align*} U = \left[ \begin{array}{cc c c} | &| & & | \\ u_1 &u_2 &\cdots & u_D\\ | &| & &| \end{array} \right] \\ \Lambda = \left[ \begin{array}{cc c c} \lambda_1 &0 & &0 \\ 0 &\lambda_2 &\cdots & 0 \\ 0 &0 & &\lambda_D \end{array} \right] \\ \end{align*}

Remember that because of the orthonormal requirement, we know that U U^T = I (the identity matrix), and this means U^T = U^{-1}. This fact will be useful later.

Using this Eigendecomposition to Find Level Sets

Now, we can return to our function of interest and substitute in the decomposition:

(4)   \begin{align*} L( x' ) &= x'^T \Sigma^{-1} x' \\ &= x'^T U \Lambda U^T x' \end{align*}

Inspecting this function closely, we can make another coordinate system transform that makes our lives simpler. If we let \tilde{x} = U^T x', then

    \[ L( \tilde{x} ) = \tilde{x}' \Lambda \tilde{x} \]

Note that this transform x' \rightarrow \tilde{x} corresponds to a single 2D rotation of the coordinate system.

Why?  Recall that any matrix R is a rotation matrix if it is orthogonal and has unit determinant (see here to review).  We can show that U^T satisfies both of these properties.

  • U (and hence U^T) is orthogonal by definition, because its column vectors are orthonormal.
  • U^T has unit determinant:  we know I = UU^T, so det(I) = det(U)det(U^T). Since det(I)=1 we have

(5)   \begin{align*} 1 &= det(U) det(U^T)  \\   &=  det(U)^2 \end{align*}

which implies det(U) = \pm1. If det(U)=1, we have a valid rotation matrix. If det(U)=-1, U performs both rotation and reflection (e.g. flipping x_1 and x_2 in the output). Thus, the transform \tilde{x} =  U^T x' is a valid rotation (with a possible additional reflection).

What’s the advantage of this rotation? Because \Lambda is just a diagonal matrix, this can be rewritten simply as a weighted sum of squares:

    \[ L( \tilde{x} ) = \sum_{d=1}^D \lambda_d \tilde{x}_d^2 \]

Now, finding the level set for a constant C in 2D reduces to solving:

    \[ C = \lambda_1 \tilde{x}_1^2 + \lambda_2 \tilde{x}_2^2 \]

which can be written in elliptical form!

    \[ 1 = \frac{ \tilde{x}_1^2 }{ a^2 } + \frac{ \tilde{x}_2^2 }{ b^2 } \]

where a^2 = C/\lambda_1 and b^2 = C/\lambda_2. Note that it’s crucial that the eigenvectors \lambda_d are all positive for this analysis to hold.

We have thus accomplished our goal. Given any arbitrary covariance matrix \Sigma, the level sets of the probability density function of the Gaussian will have elliptical form.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>