Saturday, December 29, 2012

Conditioning a multivariate normal vector on partial, noisy gaussian evidence


'cause I'm tired of re-deriving this stuff!

Lemma 1

Suppose we partition the mean and covariance of a gaussian vector \(y\) as $$ y = \left[ \begin{array}{c} y_1 \\ y_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} u_1 \\ u_2 \end{array} \right] , \left[ \begin{array}{cc} C_{11} & C_{12} \\ C_{21} & C_{22} \end{array} \right] \right) $$ then the distribution of the latter part of the vector \(y_2\) conditioned on the former taking a known value \(y_1=a_1\) is multivariate gaussian with mean and variance indicated below: $$ y_2 \mid y_1=a_1 \sim N \left( u_2 + C_{21} C_{11}^{-1}(a_1-u_1), C_{22}-C_{21}C_{11}^{-1}C_{12} \right) $$ We note this here for convenience. See Wikipedia notes on conditional multivariate gaussian, for example.
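For reference, a minimal numpy sketch of Lemma 1; the function name and argument layout are my own, nothing standard.

```python
import numpy as np

def condition_on_first_block(u1, u2, C11, C12, C21, C22, a1):
    """Mean and covariance of y2 given y1 = a1, per Lemma 1."""
    K = C21 @ np.linalg.inv(C11)      # C21 C11^{-1}
    mean = u2 + K @ (a1 - u1)
    cov = C22 - K @ C12
    return mean, cov
```

In practice one would solve against \(C_{11}\) (or use a Cholesky factor) rather than form the explicit inverse; the inverse is kept here only to mirror the formula.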

Lemma 2a

Suppose \(x\) is partitioned as $$ x = \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] , \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right) $$ If we hit \(x\) with a linear transformation and also subtract an independent gaussian random variable, viz: $$ y = \left[ \begin{array}{cc} I & 0 \\ I & 0 \\ 0 & I \end{array} \right] x - \left[ \begin{array}{c} \eta \\ 0 \\ 0 \end{array} \right] $$ where \(\eta\) is a multivariate normal random vector, independent of \(x\), $$ \eta \sim N \left(\tilde{\mu}_1 - \tilde{a}_1, \tilde{\Sigma}_{11} \right) $$ with offset mean parameter \( \tilde{\mu}_1\) and variance \(\tilde{\Sigma}_{11}\) (the use of the offset \(\tilde{a}_1\) will become apparent), then by properties of affine transformations (see Wikipedia again) and sums of independent multivariate normal random variables we have $$ y \sim N\left( \left[ \begin{array}{c} \mu_1 - \tilde{\mu}_1 + \tilde{a}_1 \\ \mu_1 \\ \mu_2 \end{array} \right], \left[ \begin{array}{ccc} \Sigma_{11} + \tilde{\Sigma}_{11} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{11} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{21} & \Sigma_{22} \end{array} \right] \right) $$
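As a quick sanity check of the covariance block structure (my own scaffolding, not part of the derivation), one can build the affine map explicitly and compare:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 2, 3
M = rng.standard_normal((d1 + d2, d1 + d2))
Sigma = M @ M.T                                   # an arbitrary prior covariance for x
Sigma11_tilde = np.eye(d1)                        # covariance of eta

A = np.block([[np.eye(d1), np.zeros((d1, d2))],
              [np.eye(d1), np.zeros((d1, d2))],
              [np.zeros((d2, d1)), np.eye(d2)]])  # the linear map applied to x

# Cov(y) = A Sigma A' + Cov([eta; 0; 0]), since eta is independent of x
cov_eta = np.zeros((2 * d1 + d2, 2 * d1 + d2))
cov_eta[:d1, :d1] = Sigma11_tilde
cov_y = A @ Sigma @ A.T + cov_eta

S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]
claimed = np.block([[S11 + Sigma11_tilde, S11, S12],
                    [S11, S11, S12],
                    [S21, S21, S22]])
assert np.allclose(cov_y, claimed)
```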

Lemma 2b

With \(x,y\) as in Lemma 2a, let us reuse the notation \(x_1\) and \(x_2\) for the second and third blocks of \(y\), since the transformation evidently leaves them untouched, and write \(\tilde{x}_1\) for the part that is changed: $$ y = \left[ \begin{array}{c} \tilde{x}_1 \\ x_1 \\ x_2 \end{array} \right] $$ But now if we condition on \(\tilde{x}_1=\tilde{a}_1\) then the conditional distribution of the second and third parts of \(y\) will of course change. By application of Lemma 1 it is gaussian with mean and covariance given by $$ \left[ \begin{array}{c} x_1^{+} \\ x_2^{+} \end{array} \right] \sim N\left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] + \left[ \begin{array}{c} \Sigma_{11} \\ \Sigma_{21} \end{array} \right] \left( \Sigma_{11} + \tilde{\Sigma}_{11} \right)^{-1} \left( \tilde{\mu}_1 - \mu_1 \right), \ \Sigma - \left[\begin{array}{c} \Sigma_{11} \\ \Sigma_{21} \end{array} \right] \left( \Sigma_{11} + \tilde{\Sigma}_{11} \right)^{-1} \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \end{array} \right] \right) $$ where \(\Sigma\) denotes the full prior covariance of \(x\).
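Here is a minimal numpy sketch of the Lemma 2b update in the notation above; the function name, and the convention that the observed block comes first with dimension d1, are assumptions of mine.

```python
import numpy as np

def noisy_evidence_update(mu, Sigma, d1, mu1_tilde, Sigma11_tilde):
    """Posterior mean and covariance of x after conditioning x1 on noisy
    evidence with mean mu1_tilde and noise covariance Sigma11_tilde."""
    cols = Sigma[:, :d1]                          # [Sigma11; Sigma21]
    S = Sigma[:d1, :d1] + Sigma11_tilde           # Sigma11 + Sigma11_tilde
    gain = cols @ np.linalg.inv(S)                # Kalman-style gain
    mu_post = mu + gain @ (mu1_tilde - mu[:d1])
    Sigma_post = Sigma - gain @ cols.T            # cols.T = [Sigma11  Sigma12]
    return mu_post, Sigma_post
```

This coincides with the Kalman filter measurement update for observation matrix \(H = [I \; 0]\), observation \(\tilde{\mu}_1\) and noise covariance \(\tilde{\Sigma}_{11}\).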

Interpretation of Lemmas 2a and 2b

The formulas have the following interpretation. Suppose we assume \(x\) has some prior distribution $$ x = \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] , \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right) $$ and we now condition on \(x_1 -\nu = 0\) where \( \nu \sim N \left(\tilde{\mu}_1, \tilde{\Sigma}_{11} \right) \). Then the posterior distribution of \(x = [x_1\ x_2]'\) is precisely the multivariate gaussian vector \(x^{+} = [x^{+}_1 \ x^{+}_2]'\) given above. This is the posterior that arises when one part of the vector is conditioned not on a precise vector of values but on a noisy one, as in Kalman filtering. In the calculation above we conditioned on \(x_1 - \eta = \tilde{a}_1\) where \( \eta \sim N \left(\tilde{\mu}_1 - \tilde{a}_1, \tilde{\Sigma}_{11} \right) \), but that is evidently the same thing as conditioning on \(x_1 = \nu \) where \( \nu \sim N \left(\tilde{\mu}_1 , \tilde{\Sigma}_{11} \right) \).
We remark that if \(\tilde{\Sigma}_{11}\) is small relative to \(\Sigma_{11}\) then this more or less forces one part of \(x\) to adopt the new mean and variance \(\tilde{\mu}_1,\tilde{\Sigma}_{11}\). For example, in the scalar case with \( \Sigma_{11} = \sigma_{11} \) and \( \tilde{\Sigma}_{11} = \tilde{\sigma}_{11} \) the posterior mean of \(x_1\) is \(\tilde{\mu}_1- \frac{\tilde{\sigma}_{11}}{\sigma_{11}+\tilde{\sigma}_{11}}(\tilde{\mu}_1-\mu_1) \), and this tends to the new, prescribed mean \(\tilde{\mu}_1\) as \( \tilde{\sigma}_{11}/\sigma_{11}\) tends to zero.
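A quick scalar check of that limit, with arbitrarily chosen numbers:

```python
# As sigma11_tilde / sigma11 -> 0, the posterior mean of x1 approaches mu1_tilde.
mu1, sigma11, mu1_tilde = 0.0, 1.0, 3.0
for sigma11_tilde in (1.0, 0.1, 0.001):
    post_mean = mu1_tilde - sigma11_tilde / (sigma11 + sigma11_tilde) * (mu1_tilde - mu1)
    print(sigma11_tilde, post_mean)   # 1.5, then 2.7272..., then 2.9970...
```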
