'cause I'm tired of re-deriving this stuff!
Lemma 1
Suppose we partition the mean and covariance of a gaussian vector $y$ as
$$ y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N\!\left( \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} \right). $$
Then the distribution of the latter part of the vector, $y_2$, conditioned on the former taking a known value $y_1 = a_1$, is multivariate gaussian with mean and covariance indicated below:
$$ y_2 \mid y_1 = a_1 \sim N\!\left( u_2 + C_{21} C_{11}^{-1} (a_1 - u_1),\ C_{22} - C_{21} C_{11}^{-1} C_{12} \right). $$
We note this here for convenience. See the Wikipedia notes on the conditional multivariate gaussian, for example.
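This is easy to sanity-check numerically. Below is a minimal sketch (mine, not from any library; the dimensions `n1`, `n2` and all variable names are illustrative) that compares the Schur-complement formulas above against the equivalent precision-matrix form of the conditional:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 2, 3
A = rng.standard_normal((n1 + n2, n1 + n2))
C = A @ A.T + np.eye(n1 + n2)      # random symmetric positive-definite covariance
u = rng.standard_normal(n1 + n2)
a1 = rng.standard_normal(n1)       # observed value of y1

C11, C12 = C[:n1, :n1], C[:n1, n1:]
C21, C22 = C[n1:, :n1], C[n1:, n1:]
u1, u2 = u[:n1], u[n1:]

# Lemma 1: conditional mean and covariance of y2 | y1 = a1
mean_cond = u2 + C21 @ np.linalg.solve(C11, a1 - u1)
cov_cond = C22 - C21 @ np.linalg.solve(C11, C12)

# Same quantities via the precision matrix L = C^{-1}:
# cov(y2 | y1) = inv(L22), E[y2 | y1 = a1] = u2 - inv(L22) L21 (a1 - u1)
L = np.linalg.inv(C)
L21, L22 = L[n1:, :n1], L[n1:, n1:]
mean_prec = u2 - np.linalg.solve(L22, L21 @ (a1 - u1))
cov_prec = np.linalg.inv(L22)

assert np.allclose(mean_cond, mean_prec)
assert np.allclose(cov_cond, cov_prec)
```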
Lemma 2a
Suppose $x$ is partitioned as
$$ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right). $$
If we hit $x$ with a linear transformation and also add another gaussian random variable, independent of $x$, viz:
$$ y = \begin{bmatrix} I & 0 \\ I & 0 \\ 0 & I \end{bmatrix} x - \begin{bmatrix} \eta \\ 0 \\ 0 \end{bmatrix}, $$
where $\eta$ is the multivariate normal random vector $\eta \sim N(\tilde\mu_1 - \tilde a_1,\ \tilde\Sigma_{11})$ with offset mean parameter $\tilde\mu_1$ and variance $\tilde\Sigma_{11}$ (the use of the offset $\tilde a_1$ will become apparent), then by properties of affine transformations (see Wikipedia again) and sums of independent multivariate gaussian variables we have
$$ y \sim N\!\left( \begin{bmatrix} \mu_1 - \tilde\mu_1 + \tilde a_1 \\ \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} + \tilde\Sigma_{11} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{11} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right). $$
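The block mean and covariance can be verified by Monte Carlo. A small sketch (again mine; the sample size `N`, the tolerances and the seed are arbitrary, and it assumes $\eta$ independent of $x$ as above):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, N = 2, 2, 400_000
A = rng.standard_normal((n1 + n2, n1 + n2))
Sigma = A @ A.T + np.eye(n1 + n2)   # prior covariance of x
mu = rng.standard_normal(n1 + n2)   # prior mean of x
mu_t = rng.standard_normal(n1)      # tilde-mu_1
a_t = rng.standard_normal(n1)       # tilde-a_1
B = rng.standard_normal((n1, n1))
Sigma_t = B @ B.T + np.eye(n1)      # tilde-Sigma_11

x = rng.multivariate_normal(mu, Sigma, size=N)
eta = rng.multivariate_normal(mu_t - a_t, Sigma_t, size=N)
y = np.hstack([x[:, :n1] - eta, x]) # y = [x1 - eta, x1, x2]

# Stated mean: [mu_1 - tilde-mu_1 + tilde-a_1, mu_1, mu_2]
mean_stated = np.concatenate([mu[:n1] - mu_t + a_t, mu])
# Stated covariance: the top-left block gains tilde-Sigma_11,
# everything else just repeats the blocks of Sigma
cov_stated = np.block([
    [Sigma[:n1, :n1] + Sigma_t, Sigma[:n1, :n1], Sigma[:n1, n1:]],
    [Sigma[:n1, :n1],           Sigma[:n1, :n1], Sigma[:n1, n1:]],
    [Sigma[n1:, :n1],           Sigma[n1:, :n1], Sigma[n1:, n1:]],
])

# Loose tolerances to allow for Monte Carlo error
assert np.allclose(y.mean(axis=0), mean_stated, atol=0.05)
assert np.allclose(np.cov(y.T), cov_stated, atol=0.2)
```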
Lemma 2b
With $x, y$ as in Lemma 2a, let us reuse notation for the second and third blocks of $y$, i.e. $x_1$ and $x_2$, since the transformation evidently leaves them untouched. We write $\tilde x_1$ for the part that is changed:
$$ y = \begin{bmatrix} \tilde x_1 \\ x_1 \\ x_2 \end{bmatrix}. $$
But now if we condition on $\tilde x_1 = \tilde a_1$ then the conditional distribution of the second and third parts of $y$ will of course change. It is gaussian by application of Lemma 1 (note that the innovation term collapses: $\tilde a_1 - (\mu_1 - \tilde\mu_1 + \tilde a_1) = \tilde\mu_1 - \mu_1$, which is why the offset $\tilde a_1$ drops out) and has mean, covariance given by
$$ \begin{bmatrix} x_1^+ \\ x_2^+ \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \begin{bmatrix} \Sigma_{11} \\ \Sigma_{21} \end{bmatrix} \left( \Sigma_{11} + \tilde\Sigma_{11} \right)^{-1} (\tilde\mu_1 - \mu_1),\ \ \Sigma - \begin{bmatrix} \Sigma_{11} \\ \Sigma_{21} \end{bmatrix} \left( \Sigma_{11} + \tilde\Sigma_{11} \right)^{-1} \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \end{bmatrix} \right), $$
where $\Sigma$ is the full prior covariance of $x$.
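Readers who know the Kalman filter will recognize this as the measurement update with observation matrix $H = [I\ 0]$, noise covariance $R = \tilde\Sigma_{11}$ and "measurement" $z = \tilde\mu_1$ (the Kalman connection is made explicit in the interpretation below). A sketch checking that the gain form reproduces the stated formulas, with my own naming throughout:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 2, 3
A = rng.standard_normal((n1 + n2, n1 + n2))
Sigma = A @ A.T + np.eye(n1 + n2)   # prior covariance of x
mu = rng.standard_normal(n1 + n2)   # prior mean of x
mu_t = rng.standard_normal(n1)      # tilde-mu_1, the noisy "measurement"
B = rng.standard_normal((n1, n1))
R = B @ B.T + np.eye(n1)            # tilde-Sigma_11, measurement noise

# Kalman measurement update with H = [I 0], z = tilde-mu_1
H = np.hstack([np.eye(n1), np.zeros((n1, n2))])
K = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + R)   # Kalman gain
mean_kalman = mu + K @ (mu_t - H @ mu)
cov_kalman = (np.eye(n1 + n2) - K @ H) @ Sigma

# Lemma 2b, written with explicit blocks
cross = Sigma[:, :n1]               # [Sigma_11; Sigma_21]
S11 = Sigma[:n1, :n1]
mean_stated = mu + cross @ np.linalg.solve(S11 + R, mu_t - mu[:n1])
cov_stated = Sigma - cross @ np.linalg.solve(S11 + R, cross.T)

assert np.allclose(mean_kalman, mean_stated)
assert np.allclose(cov_kalman, cov_stated)
```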
Interpretation of Lemmas 2a and 2b
The formulas have the following interpretation. Suppose we assume $x$ has some prior distribution
$$ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right) $$
and we now condition on $x_1 - \nu = 0$, where $\nu \sim N(\tilde\mu_1, \tilde\Sigma_{11})$. Then the posterior distribution of $x = [x_1\ x_2]'$ is precisely the multivariate gaussian vector $x^+ = [x_1^+\ x_2^+]'$ given above. It has the interpretation of a posterior distribution when one part of the vector is conditioned not on a precise vector of values, but on a noisy one, as with Kalman filtering. In the calculation above we conditioned on $x_1 - \eta = \tilde a_1$ where $\eta \sim N(\tilde\mu_1 - \tilde a_1, \tilde\Sigma_{11})$, but that is evidently the same thing as conditioning on $x_1 = \nu$ where $\nu \sim N(\tilde\mu_1, \tilde\Sigma_{11})$.

We remark that if $\tilde\Sigma_{11}$ is relatively small with respect to $\Sigma_{11}$ then this more or less forces one part of $x$ to adopt the new mean and variance $\tilde\mu_1, \tilde\Sigma_{11}$. For example, in the scalar case with $\Sigma_{11} = \sigma_{11}$ and $\tilde\Sigma_{11} = \tilde\sigma_{11}$, the posterior mean of $x_1$ is
$$ \mu_1 + \frac{1}{1 + \tilde\sigma_{11}/\sigma_{11}} (\tilde\mu_1 - \mu_1), $$
and this tends to the new, prescribed mean $\tilde\mu_1$ as $\tilde\sigma_{11}/\sigma_{11}$ tends to zero.
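The scalar limit is easy to see numerically; a tiny illustration (the numbers are arbitrary):

```python
mu1, mu1_t = 0.0, 5.0   # prior mean of x1 and prescribed (noisy) mean tilde-mu_1
for ratio in [10.0, 1.0, 0.1, 0.001]:   # ratio = tilde-sigma_11 / sigma_11
    post_mean = mu1 + (mu1_t - mu1) / (1.0 + ratio)
    print(f"ratio = {ratio:>6}: posterior mean of x1 = {post_mean:.4f}")
# As ratio -> 0 the posterior mean tends to 5.0 = tilde-mu_1, as claimed.
```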