## Lemma 1

Suppose we partition the mean and covariance of a gaussian vector $$y$$ as $$y = \left[ \begin{array}{c} y_1 \\ y_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} u_1 \\ u_2 \end{array} \right] , \left[ \begin{array}{cc} C_{11} & C_{12} \\ C_{21} & C_{22} \end{array} \right] \right)$$ Then the distribution of the latter part of the vector $$y_2$$, conditioned on the former taking a known value $$y_1=a_1$$, is multivariate gaussian with mean and covariance as follows: $$y_2 \mid y_1=a_1 \sim N \left( u_2 + C_{21} C_{11}^{-1}(a_1-u_1),\ C_{22}-C_{21}C_{11}^{-1}C_{12} \right)$$ We note this here for convenience; see, for example, the Wikipedia article on the conditional distributions of a multivariate gaussian.
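As a sanity check, here is a minimal numpy sketch of Lemma 1 (all numbers illustrative). It computes the conditional mean and covariance from the formula above, then verifies the covariance against an independent identity: the conditional covariance of $$y_2 \mid y_1$$ equals the inverse of the corresponding block of the precision matrix $$C^{-1}$$.

```python
import numpy as np

# Illustrative 3-dim gaussian, partitioned into y1 (first two coords)
# and y2 (last coord); all numbers are made up for the sketch.
u = np.array([1.0, 2.0, 3.0])
C = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.5, 0.4],
              [0.3, 0.4, 1.0]])
u1, u2 = u[:2], u[2:]
C11, C12 = C[:2, :2], C[:2, 2:]
C21, C22 = C[2:, :2], C[2:, 2:]

a1 = np.array([0.5, 1.0])  # observed value of y1

# Lemma 1: conditional mean and covariance of y2 | y1 = a1.
cond_mean = u2 + C21 @ np.linalg.solve(C11, a1 - u1)
cond_cov = C22 - C21 @ np.linalg.solve(C11, C12)

# Independent check: the conditional covariance equals the inverse of
# the lower-right block of the precision matrix C^{-1}.
P = np.linalg.inv(C)
assert np.allclose(cond_cov, np.linalg.inv(P[2:, 2:]))
```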

## Lemma 2a

Suppose $$x$$ is partitioned as $$x = \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] , \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right)$$ If we hit $$x$$ with a linear transformation and also subtract another gaussian random variable, viz: $$y = \left[ \begin{array}{cc} I & 0 \\ I & 0 \\ 0 & I \end{array} \right] x - \left[ \begin{array}{c} \eta \\ 0 \\ 0 \end{array} \right]$$ where $$\eta$$ is the multivariate normal random vector $$\eta \sim N \left(\tilde{\mu}_1 - \tilde{a}_1, \tilde{\Sigma}_{11} \right)$$, independent of $$x$$, with offset mean parameter $$\tilde{\mu}_1$$ and variance $$\tilde{\Sigma}_{11}$$ (the use of the offset $$\tilde{a}_1$$ will become apparent), then by properties of affine transformations (see Wikipedia again) and addition of independent multivariate random variables we have $$y \sim N\left( \left[ \begin{array}{c} \mu_1 - \tilde{\mu}_1 + \tilde{a}_1 \\ \mu_1 \\ \mu_2 \end{array} \right], \left[ \begin{array}{ccc} \Sigma_{11} + \tilde{\Sigma}_{11} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{11} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{21} & \Sigma_{22} \end{array} \right] \right)$$
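The covariance of $$y$$ can be checked mechanically: with $$A$$ the stacked block matrix, $$\mathrm{Cov}(y) = A \Sigma A' + \mathrm{Cov}(\eta)$$ placed in the top-left block (using independence of $$\eta$$ and $$x$$). A numpy sketch with illustrative scalar blocks:

```python
import numpy as np

# Illustrative prior for x = (x1, x2), both scalar, and scalar noise
# variance for eta; eta is assumed independent of x.
Sigma = np.array([[2.0, 0.7],
                  [0.7, 1.5]])
Sigma_tilde_11 = 0.4

# A stacks x into (x1, x1, x2); the -eta term only hits the first block.
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Cov(y) = A Sigma A' plus Var(-eta) in the top-left entry.
cov_y = A @ Sigma @ A.T
cov_y[0, 0] += Sigma_tilde_11

# This should reproduce the displayed block covariance of y.
expected = np.array([
    [Sigma[0, 0] + Sigma_tilde_11, Sigma[0, 0], Sigma[0, 1]],
    [Sigma[0, 0],                  Sigma[0, 0], Sigma[0, 1]],
    [Sigma[1, 0],                  Sigma[1, 0], Sigma[1, 1]],
])
assert np.allclose(cov_y, expected)
```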

## Lemma 2b

With $$x,y$$ as in Lemma 2a, let us reuse notation for the second and third blocks of $$y$$, i.e. $$x_1$$ and $$x_2$$, since the transformation evidently leaves them untouched. We write $$\tilde{x}_1$$ for the part that is changed: $$y = \left[ \begin{array}{c} \tilde{x}_1 \\ x_1 \\ x_2 \end{array} \right]$$ But now if we condition on $$\tilde{x}_1=\tilde{a}_1$$ then the conditional distribution of the second and third parts of $$y$$ will of course change. It is gaussian by application of Lemma 1 and has mean and covariance given by $$\left[ \begin{array}{c} x_1^{+} \\ x_2^{+} \end{array} \right] \sim N\left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] + \left[ \begin{array}{c} \Sigma_{11} \\ \Sigma_{21} \end{array} \right] \left( \Sigma_{11} + \tilde{\Sigma}_{11} \right)^{-1} \left( \tilde{\mu}_1 - \mu_1 \right),\ \Sigma - \left[\begin{array}{c} \Sigma_{11} \\ \Sigma_{21} \end{array} \right] \left( \Sigma_{11} + \tilde{\Sigma}_{11} \right)^{-1} \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \end{array} \right] \right)$$ where $$\Sigma$$ denotes the full prior covariance of $$x$$. (In the mean, Lemma 1's term $$a_1 - u_1$$ becomes $$\tilde{a}_1 - (\mu_1 - \tilde{\mu}_1 + \tilde{a}_1) = \tilde{\mu}_1 - \mu_1$$, so the offset $$\tilde{a}_1$$ drops out.)
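The closed form of Lemma 2b can be verified numerically: apply Lemma 1 directly to the three-block joint distribution from Lemma 2a and compare. A numpy sketch with illustrative scalar blocks:

```python
import numpy as np

# Illustrative prior for x = (x1, x2), both scalar.
mu = np.array([1.0, 2.0])          # (mu_1, mu_2)
Sigma = np.array([[2.0, 0.7],
                  [0.7, 1.5]])
mu_t = np.array([0.2])             # tilde mu_1
S_t = np.array([[0.4]])            # tilde Sigma_11

# Lemma 2b closed form.
col = Sigma[:, :1]                              # [Sigma_11; Sigma_21]
gain = col @ np.linalg.inv(Sigma[:1, :1] + S_t)
post_mean = mu + gain @ (mu_t - mu[:1])
post_cov = Sigma - gain @ col.T                 # col.T = [Sigma_11, Sigma_12]

# Brute force: build the joint of (x1 - eta, x1, x2) per Lemma 2a
# (taking tilde a_1 = 0) and condition on the first coordinate via Lemma 1.
a_t = np.array([0.0])
joint_mean = np.array([mu[0] - mu_t[0] + a_t[0], mu[0], mu[1]])
joint_cov = np.array([
    [Sigma[0, 0] + S_t[0, 0], Sigma[0, 0], Sigma[0, 1]],
    [Sigma[0, 0],             Sigma[0, 0], Sigma[0, 1]],
    [Sigma[1, 0],             Sigma[1, 0], Sigma[1, 1]],
])
C11, C21, C12 = joint_cov[:1, :1], joint_cov[1:, :1], joint_cov[:1, 1:]
m1, m2 = joint_mean[:1], joint_mean[1:]
lem1_mean = m2 + C21 @ np.linalg.solve(C11, a_t - m1)
lem1_cov = joint_cov[1:, 1:] - C21 @ np.linalg.solve(C11, C12)

assert np.allclose(post_mean, lem1_mean)
assert np.allclose(post_cov, lem1_cov)
```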

## Interpretation of Lemmas 2a and 2b

The formulas have the following interpretation. Suppose we assume $$x$$ has some prior distribution $$x = \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] , \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right)$$ and we now condition on $$x_1 -\nu = 0$$ where $$\nu \sim N \left(\tilde{\mu}_1, \tilde{\Sigma}_{11} \right)$$. Then the posterior distribution of $$x = [x_1\ x_2]'$$ is precisely the multivariate gaussian vector $$x^{+} = [x^{+}_1 \ x^{+}_2]'$$ given above. It has the interpretation of a posterior distribution when one part of the vector is conditioned not on a precise vector of values, but a noisy one as with Kalman filtering. In the calculation above we conditioned on $$x_1 - \eta = \tilde{a}_1$$ where $$\eta \sim N \left(\tilde{\mu}_1 - \tilde{a}_1, \tilde{\Sigma}_{11} \right)$$, but that is evidently the same thing as conditioning on $$x_1 = \nu$$ where $$\nu \sim N \left(\tilde{\mu}_1 , \tilde{\Sigma}_{11} \right)$$.
We remark that if $$\tilde{\Sigma}_{11}$$ is relatively small with respect to $$\Sigma_{11}$$ then this more or less forces one part of $$x$$ to adopt a new mean and variance $$\tilde{\mu}_1,\tilde{\Sigma}_{11}$$. For example in the scalar case with $$\Sigma_{11} = \sigma_{11}$$ and $$\tilde{\Sigma}_{11} = \tilde{\sigma}_{11}$$ the posterior mean of $$x_1$$ is $$\tilde{\mu}_1- \frac{1}{1+\sigma_{11}/\tilde{\sigma}_{11}}(\tilde{\mu}_1-\mu_1)$$ and this tends to the new, prescribed mean $$\tilde{\mu}_1$$ as $$\tilde{\sigma}_{11}/\sigma_{11}$$ tends to zero.
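The scalar limiting behaviour is easy to see numerically. A small sketch (illustrative numbers) showing both extremes, using the equivalent form $$\mu_1 + \frac{\sigma_{11}}{\sigma_{11}+\tilde{\sigma}_{11}}(\tilde{\mu}_1-\mu_1)$$:

```python
# Scalar illustration: as the observation noise tilde_sigma shrinks
# relative to the prior variance sigma11, the posterior mean of x_1
# approaches the prescribed mean mu1_t; as it grows, the prior mean wins.
mu1, sigma11 = 1.0, 2.0
mu1_t = 5.0

def posterior_mean(sigma_t):
    # Equivalent to mu1 + sigma11 / (sigma11 + sigma_t) * (mu1_t - mu1).
    return mu1_t - (mu1_t - mu1) / (1.0 + sigma11 / sigma_t)

assert abs(posterior_mean(1e-9) - mu1_t) < 1e-6  # tiny noise: adopts mu1_t
assert abs(posterior_mean(1e9) - mu1) < 1e-6     # huge noise: prior unchanged
```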