Readers will be familiar with statistical mechanics. But what about mechanical statistics?
This post is purely expository and concerns the simplest one-dimensional Kalman filter. You'll recall the setup. We periodically observe Brownian motion subject to gaussian measurement error and attempt to infer the ground truth from those measurements. But rather than dive into a bunch of conditional expectations for gaussian linear systems, we'll look for some physical intuition. We'll pretend that we've been robbed of our computers and are forced to build analog varieties, just as in the good old days.
We make an observation that isn't always stressed up front in the statistical literature (Harrison and West, for example) or in the control systems perspective (such as you will find in the Wikipedia entry for "Kalman filter"). Then we pursue the analogy between statistics and physics a little further, and show how updating the location estimate of a gaussian distribution amounts to a combination of center of mass and reduced mass calculations.
The perspective we arrive at is that most statistical calculations of a filtering variety are directly mapped to a corresponding physical system containing both fixed and free masses. Resolving these physical systems requires a change of reference frame. The calculations are therefore, somewhat unsurprisingly, precisely "center of mass" and "reduced mass" calculations from physics.
Kalman filter equations are just a center of mass calculation
Suppose the prior estimate of location for a particle is \(m\) and the prior variance is \(P\). Suppose we make an observation \(y\) with error variance \(R\). Our posterior belief is gaussian with location \(m'\), say, and variance \(P'\). The update is usually written
\begin{eqnarray}
m' & = & m + K ( y - m ) \\
P' & = & P(1-K), \ \ {\rm where} \\
K & = & \frac{P}{P+R}
\end{eqnarray}
However, it is in many ways more natural to use the inverses of the variances instead. If we write \(\varphi = 1/R\), \(p = 1/P\) and \( p' = 1/P'\) and multiply through by \( \frac{P+R}{PR} \) we notice that the Kalman filter update is merely a center of mass calculation:
\begin{eqnarray}
m' & = & \frac{m/P + y/R} { 1/R + 1/P } = \frac{ pm + \varphi y }{ \varphi + p } \\
p' & = & \frac{1}{P'} = \frac{P+ R}{PR} = \frac{1}{P} + \frac{1}{R} = \varphi + p
\end{eqnarray}
The analogy works if we treat precision as mass. And in what follows we'll be equally interested in the analogy between force and the derivative of the negative log likelihood function.
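The equivalence of the two forms is easy to check numerically. The following is a minimal Python sketch, not part of the original derivation: the function names and the sample numbers are made up for illustration.

```python
def kalman_update(m, P, y, R):
    """Textbook form: gain K applied to the innovation y - m."""
    K = P / (P + R)
    return m + K * (y - m), P * (1 - K)


def center_of_mass_update(m, p, y, phi):
    """Precision form: location is the precision-weighted mean, masses add."""
    p_post = p + phi
    return (p * m + phi * y) / p_post, p_post


# Illustrative numbers only (not taken from the post).
m, P, y, R = 1.0, 4.0, 3.0, 1.0
m_kf, P_kf = kalman_update(m, P, y, R)
m_cm, p_cm = center_of_mass_update(m, 1 / P, y, 1 / R)

print("gain form:          ", m_kf, P_kf)
print("center of mass form:", m_cm, 1 / p_cm)
```

Up to floating point rounding the two lines printed should agree: the heavier (more precise) of the two "masses" drags the posterior location towards itself.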
An "analogue" gaussian smoother using perfect Hookean springs
Furthermore, the fact that combining two pieces of evidence for one latent variable can sometimes be as simple as merging the two observations at their "center of precision" corresponds to a nice accident when forces grow linearly with distance: the impact of two masses on a third is unchanged if they coalesce at their center of mass.
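To spell the accident out, here is a short worked calculation. It assumes the convention \(e^{-p(z-m)^2}\) for a gaussian with precision \(p\), and writes \(z\) for the latent location (a symbol introduced here purely for illustration). The force on \(z\) is minus the derivative of the negative log likelihood \(p(z-m)^2 + \varphi(z-y)^2\), so \begin{eqnarray} F(z) & = & -2p(z-m) - 2\varphi(z-y) \\ & = & -2(p+\varphi)\left( z - \frac{pm + \varphi y}{p + \varphi} \right) \\ & = & -2p'(z - m') \end{eqnarray} which is exactly the pull of a single spring of stiffness \(p' = p + \varphi\) anchored at the center of mass \(m'\). The step from the first line to the second only works because each force is linear in the displacement.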
But there is more to the story...
Figure 1. Hierarchical model where location of a gaussian distribution is itself gaussian
Suppose our prior is \begin{eqnarray} P( x | \mu ) & \propto & e^{-\rho(x-\mu)^2} \\ P(\mu) & \propto & e^{-p(\mu-m)^2} \end{eqnarray} where this time \(m\) represents our guess as to the location of the center of the distribution. Symbolically we might represent the prior with the following diagram.
Simplifying a spring diagram using reduced mass
The game begins in earnest when we introduce noisy evidence of the unknown location parameter \(\mu\) for our mysterious distribution. Suppose we take a draw \(x_2\) from said distribution. Suppose we don't observe \(x_2\) itself but instead a noisy measurement \(y\) whose precision (or "mass", if you will) is \(\varphi\). The noisy measurement's distribution conditional on \(x_2\) is \( P(y|x_2) \propto e^{-\varphi(y-x_2)^2}\) and corresponds to the following spring diagram.
Figure 3. Spring diagram representing noisy evidence
Figure 4. Prior location belief plus a noisy measurement
We ought to recall here the rules for combining springs in series, or to be more direct, the "reduced mass" trick for replacing a three body problem with a two body problem. In either situation physics reminds us that the combined action of the rightmost two masses can be simplified:
Figure 5. Prior belief plus a noisy measurement simplified using reduced mass
We replace the mass \(\varphi\) with a reduced mass \(\frac{\varphi}{\varphi+\rho}\) because the intervening unit mass reduces the pull. Since it is well covered elsewhere I will not derive the reduced mass expression, but notice why the reduced mass makes sense in the limits. If \(\varphi \rightarrow 0\) the relative size of the yellow unit mass is huge, and so the mass at \(\mu\) hardly feels the pull from the green mass at \(y\) at all. In the other extreme case, when \(\varphi \rightarrow \infty\), the unit mass is sucked into the green mass and is, for all intents and purposes, stationary. Thus it acts like a fixed unit mass pulling the mass at \(\mu\) rather than a floating one.
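A quick numerical sanity check of the reduced mass, offered as a sketch rather than a derivation. It assumes masses are measured in units of \(\rho\) (so the intermediate mass counts as 1), and it relaxes the free point \(x_2\) to its equilibrium, which for gaussians yields the same quadratic in \(\mu\) as integrating \(x_2\) out. The function names and numbers are hypothetical.

```python
def reduced_mass(phi, rho):
    """Effective mass of phi acting through a free intermediate mass rho,
    expressed in units of rho."""
    return phi / (phi + rho)


def relaxed_spring_energy(mu, y, phi, rho):
    """Energy rho*(x2-mu)^2 + phi*(y-x2)^2 with the free point x2 at equilibrium."""
    x2 = (rho * mu + phi * y) / (rho + phi)  # force balance on the free mass
    return rho * (x2 - mu) ** 2 + phi * (y - x2) ** 2


# Illustrative numbers only.
mu, y, phi, rho = 0.5, 2.0, 3.0, 1.5
direct = relaxed_spring_energy(mu, y, phi, rho)
via_reduced = rho * reduced_mass(phi, rho) * (y - mu) ** 2
print(direct, via_reduced)  # the two energies should agree

# The two limits discussed in the text, with rho = 1.
tiny = reduced_mass(1e-9, 1.0)  # measurement barely pulls on mu
huge = reduced_mass(1e9, 1.0)   # behaves like a fixed unit mass
print(tiny, huge)
```

The two-spring chain and the single reduced spring store the same energy for any \(\mu\), which is all that the mass at \(\mu\) can feel.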
We proceed to the final simplification of the diagram. This is pretty easy as the two green masses are inertial. Their impact on the yellow mass is equivalent to a single inertial mass at their center of mass. Thus:
Figure 6. Simplification of Figure 4 by reduced mass and center of mass calculation.
This closes the loop and demonstrates how updating can be performed for the hierarchical model in Figure 1.
Recovering the Kalman filter update
As a parting note, recall that the masses in the diagrams are measured in units of \(\rho\). Restoring those units (multiplying the combined mass by \(\rho\)) and taking the limit \(\rho \rightarrow \infty\) leads to update equations \begin{eqnarray} p' & = & \rho \left( \frac{p}{\rho} + \frac{\varphi}{\varphi + \rho} \right) = p + \frac{\varphi \rho}{\varphi +\rho} \rightarrow \varphi + p \\ m' & = & \frac{ m \frac{p}{\rho} + y \frac{\varphi}{\varphi+\rho} } { \frac{p}{\rho} + \frac{\varphi}{\varphi + \rho} } \rightarrow \frac{ pm + \varphi y }{ \varphi + p } \end{eqnarray} which is the Kalman update as before. This is to be expected, since in the limit \(\rho \rightarrow \infty\) the problem of locating a distribution with unknown location (noisily observed) shrinks down to the problem of locating a point mass with unknown location (noisily observed).
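The limit can also be watched numerically. The sketch below is an illustration under stated assumptions: it works in absolute units (the diagram masses multiplied by \(\rho\)), so the measurement's effective mass is \(\varphi\rho/(\varphi+\rho)\), and the function names and sample values are invented for the example.

```python
def hierarchical_update(m, p, y, phi, rho):
    """Posterior location and precision for mu, in absolute units.

    The measurement at y pulls on mu through the free mass rho, so its
    effective mass is the series combination phi*rho/(phi+rho).
    """
    effective = phi * rho / (phi + rho)
    p_post = p + effective
    return (p * m + effective * y) / p_post, p_post


def kalman_precision_update(m, p, y, phi):
    """Plain center of mass update with no intermediate mass."""
    p_post = p + phi
    return (p * m + phi * y) / p_post, p_post


# Illustrative numbers only.
m, p, y, phi = 1.0, 0.25, 3.0, 1.0
target = kalman_precision_update(m, p, y, phi)
for rho in (1.0, 1e2, 1e4, 1e8):
    print(rho, hierarchical_update(m, p, y, phi, rho))
print("Kalman:", target)
```

As \(\rho\) grows, the printed pairs should approach the Kalman values: a stiff spring between \(\mu\) and \(x_2\) makes the two points move as one.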