Processing math: 100%

Wednesday, May 28, 2014

Sensitivities of a Kalman Filter estimate with respect to all past observations

Consider a sequence of observations t1,...,tk at which a latent vector process x is observed indirectly, via an observation equation yti=Hixti+ϵi We assume ϵi is mean zero multivariate gaussian with covariance Ri. For brevity we refer to yti as yi, xti as xi and so forth. We assume the evolution of x in between the times specified can be written xi+1=Aixi+ui where ui are also gaussian. In this linear gaussian system the recursive estimation of xt is achieved by the well known Kalman filter, and the contemporaneous impact of the next observation yk+1 is also (it is merely proportional to the Kalman gain).

But less well appreciated is a related computation, the derivatives of the Kalman filter estimate with respect to a past observation yi. This note establishes how said computation can be achieved by making two observations. The first is a re-representation of the Kalman estimate in the form of a weighted least squares problem (not dissimilar to the Duncan Horn representation). The second observation is that sensitivities of any weighted least squares problem can be computed using an adjoint trick.

Step 1: The Kalman filter solution as a (particular) least squares problem

We shall set up a least squares problem involving the current state xk only, and all the previous observations. We argue that the solution to this problem is identical to the Kalman filter. Since the current estimate ˆyk is a simple linear function of the current state xk, this allows us to compute the derivative of the current estimate with respect to all previous observations.

In the Kalman filter we assume a gaussian prior on the initial state x0. This can introduce annoying special cases in what follows, but we can clean up the notation instead by introducing: y1=H1x1+ϵ1x0=A1x1+u1 provided H1 and A1 are identity matrices, ϵ1 is identically zero, y1 is set equal to the mean of our prior and u0 adopts its covariance. With the boundary conditions cleaned up in this fashion we can invert the dynamical equations, assuming only that A's have left inverses A1, as follows: xj=A1j(xj+1uj) and then re-arrange the observation equations so that the only value of xi that appears is xk. yk=Hkxk+ϵkyk1=Hk1xk1+ϵk1=Hk1(A1k1(xkuk1))+ϵk1=Hk1A1k1xkHk1A1k1uk1+ϵk1yk2=Hk2xk2+ϵk2=Hk2(A1k2(xk1uk2))+ϵk2=Hk2A1k2xk1Hk2A1k2uk2+ϵk2=Hk2A1k2(A1k1(xkuk1))Hk2A1k2uk2+ϵk2=Hk2A1k2A1k1xkHk2A1k2A1k1uk1Hk2A1k2uk2+ϵk2 from which it is apparent that if we write Y=(yk,yk1,yk2,...,y1) then Y=Gxk+η where G is the concatenation of the coefficients of xk given above and η is the gaussian random variable equal to the sum of uk's and ϵk's (again, with coefficients as above, leading to a non-trivial covariance structure).

Step 2. (A review of the adjoint trick)

Suppose x solves Qx=b(y). The adjoint trick can be used to compute the derivative of g(x) w.r.t. y. In particular, if y is the observation and x the solution of a generalized least squares problem with error covariance R we can cast it in this form by writing: g(x)=HxQ=HTR1Hb(y)=HTR1y Consider now f(x,y)=0 where f(x,y)=Qxb(y) We use derivatives of ˜g=gλTf(x,y) with respect to y as a means of computing derivatives of g with respect to y. Note that ˜gy=gxxyλT(fxxy+fy) and this will simplify if we choose λ judiciously as a solution of gx=λTfx which we call the adjoint equation. For then ˜gy=gxxyλT(fxxy+fy)=λTfy=λTby Now specializing to g(x)=Hx and b(y) as above we can solve for this convenient choice of λ by writing H=gx=λTfx=λTQ=λTHTR1H where the second equality is the adjoint equation. It should be clear from this how to compute derivatives of ˜g with respect to y, and thereby compute derivatives of g with respect to y.