Saturday, December 29, 2012

Conditioning a multivariate normal vector on partial, noisy gaussian evidence


'cause I'm tired of re-deriving this stuff!

Lemma 1

Suppose we partition the mean and covariance of a gaussian vector \(y\) as $$ y = \left[ \begin{array}{c} y_1 \\ y_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} u_1 \\ u_2 \end{array} \right] , \left[ \begin{array}{cc} C_{11} & C_{12} \\ C_{21} & C_{22} \end{array} \right] \right) $$ then the distribution of the latter part of the vector \(y_2\) conditioned on the former taking a known value \(y_1=a_1\) is multivariate gaussian with mean and variance indicated below: $$ y_2 \mid y_1=a_1 \sim N \left( u_2 + C_{21} C_{11}^{-1}(a_1-u_1), C_{22}-C_{21}C_{11}^{-1}C_{12} \right) $$ We note this here for convenience. See Wikipedia notes on conditional multivariate gaussian, for example.
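Since the point is to stop re-deriving this, here is a minimal NumPy sketch of Lemma 1. The function name `condition_gaussian` and the toy numbers are my own illustrative choices, and the sketch assumes \(C\) is symmetric with \(C_{11}\) invertible.

```python
import numpy as np

def condition_gaussian(u, C, k, a1):
    """Condition y ~ N(u, C) on its first k coordinates equalling a1.

    Returns the mean and covariance of the remaining coordinates (Lemma 1).
    Assumes C is symmetric and its leading k-by-k block is invertible.
    """
    u1, u2 = u[:k], u[k:]
    C11, C12 = C[:k, :k], C[:k, k:]
    C22 = C[k:, k:]
    # Solve against C11 instead of forming its inverse explicitly.
    gain = np.linalg.solve(C11, C12).T   # equals C21 @ inv(C11) for symmetric C
    mean = u2 + gain @ (a1 - u1)
    cov = C22 - gain @ C12
    return mean, cov

# Hypothetical two-dimensional example.
u = np.array([0.0, 1.0])
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])
mean, cov = condition_gaussian(u, C, 1, np.array([1.0]))
print(mean, cov)
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience only; the formula is the one stated in the lemma.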

Lemma 2a

Suppose \(x\) is partitioned as $$ x = \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] , \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right) $$ If we hit \(x\) with a linear transformation and also subtract another gaussian random vector \(\eta\), independent of \(x\), viz: $$ y = \left[ \begin{array}{cc} I & 0 \\ I & 0 \\ 0 & I \end{array} \right] x - \left[ \begin{array}{c} \eta \\ 0 \\ 0 \end{array} \right] $$ where \(\eta\) is the multivariate normal random vector $$ \eta \sim N \left(\tilde{\mu}_1 - \tilde{a}_1, \tilde{\Sigma}_{11} \right) $$ with offset mean parameter \( \tilde{\mu}_1\) and variance \(\tilde{\Sigma}_{11}\) (and the use of the offset \(\tilde{a}_1\) will become apparent), then by properties of affine transformation (see Wikipedia again) and addition of independent multivariate normal random variables we have $$ y \sim N\left( \left[ \begin{array}{c} \mu_1 - \tilde{\mu}_1 + \tilde{a}_1 \\ \mu_1 \\ \mu_2 \end{array} \right], \left[ \begin{array}{ccc} \Sigma_{11} + \tilde{\Sigma}_{11} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{11} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{21} & \Sigma_{22} \end{array} \right] \right) $$
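A quick numerical sanity check of the block covariance in Lemma 2a, assuming \(\eta\) is independent of \(x\). The block sizes and the helper `random_spd` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 2, 3                      # illustrative block sizes for x1 and x2
n = n1 + n2

def random_spd(k):
    """Hypothetical helper: a random symmetric positive definite matrix."""
    m = rng.standard_normal((k, k))
    return m @ m.T + k * np.eye(k)

Sigma = random_spd(n)              # covariance of x
Sigma_tilde = random_spd(n1)       # covariance of eta

# The stacking matrix maps x = [x1; x2] to [x1; x1; x2].
A = np.block([
    [np.eye(n1), np.zeros((n1, n2))],
    [np.eye(n1), np.zeros((n1, n2))],
    [np.zeros((n2, n1)), np.eye(n2)],
])

# Independent noise only enters the first block of y.
noise_cov = np.zeros((n1 + n, n1 + n))
noise_cov[:n1, :n1] = Sigma_tilde
cov_y = A @ Sigma @ A.T + noise_cov

S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]
expected = np.block([
    [S11 + Sigma_tilde, S11, S12],
    [S11,               S11, S12],
    [S21,               S21, S22],
])
print(np.allclose(cov_y, expected))
```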

Lemma 2b

With \(x,y\) as in Lemma 2a, let us reuse notation for the second and third blocks of \(y\), i.e. \(x_1\) and \(x_2\), since the transformation evidently leaves them untouched. We write \(\tilde{x}_1\) for the part that is changed: $$ y = \left[ \begin{array}{c} \tilde{x}_1 \\ x_1 \\ x_2 \end{array} \right] $$ But now if we condition on \(\tilde{x}_1=\tilde{a}_1\) then the conditional distribution of the second and third parts of \(y\) will of course change. It is gaussian by application of Lemma 1 and has mean and covariance given by $$ \left[ \begin{array}{c} x_1^{+} \\ x_2^{+} \end{array} \right] \sim N\left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] + \left[ \begin{array}{c} \Sigma_{11} \\ \Sigma_{21} \end{array} \right] \left( \Sigma_{11} + \tilde{\Sigma}_{11} \right)^{-1} \left( \tilde{\mu}_1 - \mu_1 \right), \Sigma - \left[\begin{array}{c} \Sigma_{11} \\ \Sigma_{21} \end{array} \right] \left( \Sigma_{11} + \tilde{\Sigma}_{11} \right)^{-1} \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \end{array} \right] \right) $$
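The update in Lemma 2b is short enough to keep around as a function. What follows is a sketch under the lemma's assumptions (symmetric \(\Sigma\), noise independent of \(x\)); the name `condition_on_noisy` and the example numbers are hypothetical.

```python
import numpy as np

def condition_on_noisy(mu, Sigma, k, mu_tilde, Sigma_tilde):
    """Posterior of x ~ N(mu, Sigma) after noisy evidence on its first k coordinates.

    The evidence has mean mu_tilde and covariance Sigma_tilde, as in Lemma 2b.
    """
    S_left = Sigma[:, :k]                     # [Sigma11; Sigma21]
    S = Sigma[:k, :k] + Sigma_tilde           # Sigma11 + Sigma_tilde
    # gain = [Sigma11; Sigma21] (Sigma11 + Sigma_tilde)^{-1}, using symmetry of S
    gain = np.linalg.solve(S, S_left.T).T
    post_mean = mu + gain @ (mu_tilde - mu[:k])
    post_cov = Sigma - gain @ S_left.T        # S_left.T is [Sigma11 Sigma12]
    return post_mean, post_cov

# Hypothetical example: two correlated scalars, noisy evidence on the first.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
post_mean, post_cov = condition_on_noisy(
    mu, Sigma, 1, np.array([2.0]), np.array([[1.0]])
)
print(post_mean)
print(post_cov)
```

With equal prior and evidence variance the first coordinate moves halfway to the evidence mean, and the second coordinate is dragged along through the correlation.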

Interpretation of Lemmas 2a and 2b

The formulas have the following interpretation. Suppose we assume \(x\) has some prior distribution $$ x = \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \sim N \left( \left[ \begin{array}{c} \mu_1 \\ \mu_2 \end{array} \right] , \left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array} \right] \right) $$ and we now condition on \(x_1 -\nu = 0\) where \( \nu \sim N \left(\tilde{\mu}_1, \tilde{\Sigma}_{11} \right) \). Then the posterior distribution of \(x = [x_1\ x_2]'\) is precisely the multivariate gaussian vector \(x^{+} = [x^{+}_1 \ x^{+}_2]'\) given above. It has the interpretation of a posterior distribution when one part of the vector is conditioned not on a precise vector of values, but a noisy one as with Kalman filtering. In the calculation above we conditioned on \(x_1 - \eta = \tilde{a}_1\) where \( \eta \sim N \left(\tilde{\mu}_1 - \tilde{a}_1, \tilde{\Sigma}_{11} \right) \), but that is evidently the same thing as conditioning on \(x_1 = \nu \) where \( \nu \sim N \left(\tilde{\mu}_1 , \tilde{\Sigma}_{11} \right) \).
We remark that if \(\tilde{\Sigma}_{11}\) is relatively small with respect to \(\Sigma_{11}\) then this more or less forces one part of \(x\) to adopt a new mean and variance \(\tilde{\mu}_1,\tilde{\Sigma}_{11}\). For example in the scalar case with \( \Sigma_{11} = \sigma_{11} \) and \( \tilde{\Sigma}_{11} = \tilde{\sigma}_{11} \) the posterior mean of \(x_1\) is \(\mu_1 + \frac{1}{1+\tilde{\sigma}_{11}/\sigma_{11}}(\tilde{\mu}_1-\mu_1) \) and this tends to the new, prescribed mean \(\tilde{\mu}_1\) as \( \tilde{\sigma}_{11}/\sigma_{11}\) tends to zero.
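To see the scalar limit numerically, here is a small sketch. The function `scalar_posterior` is an illustrative name, and its arguments `var` and `var_tilde` are variances, playing the roles of \(\sigma_{11}\) and \(\tilde{\sigma}_{11}\).

```python
def scalar_posterior(mu, var, mu_tilde, var_tilde):
    """Scalar case of Lemma 2b: posterior mean and variance of x1."""
    weight = var / (var + var_tilde)          # same as 1 / (1 + var_tilde / var)
    post_mean = mu + weight * (mu_tilde - mu)
    post_var = var * var_tilde / (var + var_tilde)
    return post_mean, post_var

# Prior N(0, 1), evidence centred at 2 with shrinking variance.
for var_tilde in (1.0, 0.1, 0.001):
    m, v = scalar_posterior(0.0, 1.0, 2.0, var_tilde)
    print(f"var_tilde={var_tilde}: mean={m:.4f}, var={v:.6f}")
```

As `var_tilde` shrinks the printed mean approaches 2 and the printed variance approaches `var_tilde`, in line with the remark above.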

I'm missing something about contragredient transformations and portfolio return

This entry probably just indicates that I am missing something obvious about portfolio return.
The instantaneous return \(\gamma^{\pi}_t\) on a portfolio with weights \(\pi\) decomposes as $$ \gamma^{\pi}_t = \left< \frac{dZ_t}{Z_t} \right>_t = \pi_t^\top \gamma_t + \underbrace{\frac{1}{2} \left( \pi_t^\top diag(\xi \xi^{T}) - \overbrace{\pi^{T}_t \xi \xi^{T} \pi_t }^{portfolio\ variance} \right)}_{excess\ return\ process} $$ where \(\gamma_t^i\) is the drift of \(\log(X^i_t)\), the log of the \(i\)'th asset, and the \(i\)'th row \(\xi^i\) of \(\xi\) represents the factor decomposition of its diffusion into \(n\) independent Brownian motions comprising a vector \(dW_t\). $$ d (\log(X^i_t)) = \gamma^i_t dt + \xi^i dW_t $$ Consider the map taking vectors \(x \mapsto y = \exp(C \log(x)) \) and thereby defining a new asset vector \(Y_t\). That is, \(Y_t\) is related to \(X_t\) by a simple matrix multiplication in log coordinates.
We consider portfolios of the new assets \(Y^i\) with weights \(\varpi_i\) say. And again, the portfolio return is related to the asset return via $$ \frac{dZ^\varpi}{Z^{\varpi}} = \varpi^\top \frac{dY_t}{Y_t} $$ where the fractions indicate coordinate-wise (i.e. pointwise) division. Let us suppose further that any instantaneous portfolio return using assets \(\{X^i\}\) can be replicated by a portfolio using assets \(\{Y^i\}\) only, and vice versa. $$ \varpi^\top \frac{dY_t}{Y_t} = \frac{dZ^\varpi}{Z^{\varpi}} = \frac{dZ^\pi}{Z^{\pi}} = \pi^\top \frac{dX_t}{X_t} $$
By a slightly heavy handed application of the multivariate Ito's Lemma (hey it's good to have it lying around) with \(g(x) = C x\) we have $$ \begin{eqnarray} d (\log Y_t) & = & \frac{\partial g}{\partial t} dt + \left(\nabla g\right)^\top d \left( \log (X_t) \right) + \frac{1}{2} \left(d(\log X_t)\right)^\top \left(\nabla^2 g \right) \left(d(\log X_t)\right) \\ & = & 0 + C d \left(\log (X_t)\right) + 0 \\ & = & C\gamma dt + C\xi dW_t \\ \end{eqnarray} $$ so writing the decomposition of portfolio return with \(C\gamma\) in place of \(\gamma\) and \(C\xi\) in place of \(\xi\) we observe $$ \begin{eqnarray} \left< \frac{dZ^{\varpi}_t}{Z^{\varpi}} \right>_t & = & \varpi_t^\top C\gamma_t + \frac{1}{2} \left( \varpi_t^\top diag(C\xi \xi^{\top}C^\top) - \varpi^\top_t C \xi \xi^\top C^\top \varpi_t \right) \\ & = & \pi_t^\top \gamma_t + \frac{1}{2} \left( \pi_t^\top C^{-1} diag(C\xi \xi^{\top} C^\top) - \pi^\top_t \xi \xi^\top \pi_t \right) \end{eqnarray} $$ if we use contragredient weights \(\varpi_t = (C^\top)^{-1} \pi_t\), so that \(\varpi_t^\top = \pi_t^\top C^{-1}\). But does the contragredient choice actually result in the same drift? The linear and portfolio variance terms are the same, but on the other hand, in general, $$ \pi_t^\top C^{-1} {\rm diag}(C\xi \xi^{\top} C^\top) \not= \pi_t^\top {\rm diag}\left(\xi \xi^{\top} \right) $$
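The mismatch is easy to reproduce numerically. The sketch below uses a made-up two-asset example (the particular \(\xi\), \(C\), \(\gamma\) and \(\pi\) are arbitrary) and compares the three terms of the decomposition under the contragredient weights.

```python
import numpy as np

# Hypothetical two-asset example.
xi = np.array([[1.0, 0.0],
               [0.5, 1.0]])
C = np.array([[1.0, 1.0],
              [0.0, 1.0]])
gamma = np.array([0.1, 0.2])
pi = np.array([0.5, 0.5])

# Contragredient weights varpi = (C^T)^{-1} pi.
varpi = np.linalg.solve(C.T, pi)

cov_x = xi @ xi.T                 # log-covariance of the X assets
cov_y = C @ cov_x @ C.T           # log-covariance of the Y assets

linear_x, linear_y = pi @ gamma, varpi @ (C @ gamma)
var_x, var_y = pi @ cov_x @ pi, varpi @ cov_y @ varpi
diag_x, diag_y = pi @ np.diag(cov_x), varpi @ np.diag(cov_y)

print("linear term:   ", linear_x, linear_y)
print("variance term: ", var_x, var_y)
print("diag term:     ", diag_x, diag_y)
```

In this example the linear and portfolio variance terms agree while the diagonal terms do not, which is exactly the puzzle posed above. Note also that the contragredient weights here do not sum to one.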



When A Bar-Bell Bond Portfolio Optimizes Modified Excess Return

Here's a little curiosity I might get around to publishing some day: a bar-bell portfolio maximizes modified excess return. At first I thought it maximized excess return (in the sense of Stochastic Portfolio Theory) but felt that couldn't be right. Sure enough I made a mistake and thus was born "modified excess return", a shamelessly reverse engineered criterion to make what follows work.

A simple model for zero coupon bond dynamics

Assume a lattice of zero coupon bonds with prices \(B^i(t) = B(t;t+\tau^{i})\) and integer times to maturity \(\tau^{i}=i\) as \(i\) ranges from \(1\) to \(n\) years. We assume that all bonds are priced off the same piecewise constant forward curve with knot points also at integer years. We write $$ B^{i}(t) = \exp\left(- \int_t^{t+i} f(t,s) ds \right) $$ and assume further that the changes in forward rates \( f(t,s) \) at time \(t\) for different years are independent. We presume the forward rates are driven by independent standard Brownian motions scaled by the same standard deviation \(\eta\). They may also have non-trivial drift but here it suffices to observe that the vector of bonds has dynamics given by $$ d \left[ \begin{array}{c} \log B^{1}(t) \\ \log B^{2}(t) \\ \vdots \\ \log B^{n}(t) \end{array} \right] = \left[ \begin{array}{c} \gamma^1(t) \\ \gamma^2(t) \\ \vdots \\ \gamma^n(t) \end{array} \right] \ dt + \eta \left[ \begin{array}{cccc} 1 & 0 & \dots & 0 \\ 1 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 1 & 1 & 1 & 1 \end{array} \right] \left[ \begin{array}{c} dW^1(t) \\ dW^2(t) \\ \vdots \\ dW^n(t) \end{array} \right] $$ or more succinctly $$ d (\log B) = \gamma \ dt + \eta J \ dW $$ for scalar constant \(\eta\), an \(n\) by \(n\) lower triangular matrix of ones \(J\) (implicitly defined by the above) and some drift coefficients \(\gamma\) that we don't care too much about in this particular exercise.

Defining modified excess return for a bond portfolio

We consider a portfolio of these bonds with weights \(\pi\) summing to unity. By analogy with Stochastic Portfolio Theory we consider a modified excess return given by $$ {modified\ excess\ return} = \sum_{i=1}^n \pi_i \sigma_{ii} - 2 \sum_{i,j=1}^{n} \pi_i \pi_j \sigma_{ij} + \sum_{i=1}^n \pi_i^2 \sigma_{ii} $$ where, following Stochastic Portfolio Theory notation, \(\sigma_{ij}\) is the log-asset covariance, here equal to \(\eta^2\) multiplied by the \(i\),\(j\)'th element of \(J J^{\top}\). We make no statement as to what modified excess return represents, except to compare it to $$ excess\ return = \sum_{i=1}^n \pi_i \sigma_{ii} - \sum_{i,j=1}^{n} \pi_i \pi_j \sigma_{ij} $$ which (up to the customary factor of one half) is most certainly meaningful. Indeed, the log-optimal investor may seek to maximize excess return. In contrast, the modified excess return makes the covariance term more important so one might reason that, all else being equal, choosing this modification over the bona fide excess return represents a sacrifice of long term growth in exchange for reduced portfolio variance - though the difference picks up the between-asset terms only, not the variances. $$ \begin{eqnarray*} modified\ excess\ return - excess\ return & = & - \sum_{i,j=1}^{n} \pi_i \pi_j \sigma_{ij} + \sum_{i=1}^n \pi_i^2 \sigma_{ii} \\ & = & -\sum_{i \not= j} \pi_i \pi_j \sigma_{ij} \end{eqnarray*} $$ A similar tradeoff is made, also inadvertently, by those constructing minimum variance portfolios in the tradition of Markowitz and de Finetti. In the minimum variance prescription only the portfolio variance term \(\sum_{i,j=1}^{n} \pi_i \pi_j \sigma_{ij}\) is contemplated, not \(\sum_{i=1}^n \pi_i \sigma_{ii}\).
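For concreteness, the two quantities as written above can be coded directly, with the double sums running over all ordered pairs \(i,j\). The function names and the small covariance matrix below are illustrative assumptions, not part of the bond model.

```python
import numpy as np

def excess_return(pi, sigma):
    """sum_i pi_i sigma_ii - sum_{i,j} pi_i pi_j sigma_ij (no factor of 1/2)."""
    return pi @ np.diag(sigma) - pi @ sigma @ pi

def modified_excess_return(pi, sigma):
    """Excess return with the covariance term doubled and sum_i pi_i^2 sigma_ii added."""
    return pi @ np.diag(sigma) - 2 * (pi @ sigma @ pi) + (pi ** 2) @ np.diag(sigma)

# Hypothetical 3-asset covariance matrix and weights summing to one.
sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 3.0]])
pi = np.array([0.2, 0.3, 0.5])

gap = modified_excess_return(pi, sigma) - excess_return(pi, sigma)
# Off-diagonal part of the quadratic form: sum over i != j of pi_i pi_j sigma_ij.
off_diagonal = pi @ sigma @ pi - (pi ** 2) @ np.diag(sigma)
print(gap, -off_diagonal)
```

The two printed numbers coincide, which is the statement that the difference picks up the between-asset terms only.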

Maximizing the modified excess return

Whatever the modification contemplated may imply, we proceed towards its surprising implication by mentally multiplying \(J\) above and noting that \( (J J')_{i,j} = \min(i,j) \) because $$ \left[ \begin{array}{cccc} 1 & 0 & \dots & 0 \\ 1 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 1 & 1 & 1 & 1 \end{array} \right] \left[ \begin{array}{cccc} 1 & 1 & \dots & 1 \\ 0 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & 1 \\ 0 & 0 & 0 & 1 \end{array} \right] = \left[ \begin{array}{cccc} 1 & 1 & \dots & 1 \\ 1 & 2 & \dots & 2 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & \dots & n \end{array} \right] $$ Thus in this slightly contrived forward rate model we have modified excess return proportional to $$ \psi(\pi) = \sum_{i=1}^n i \pi_i - 2\sum_{i,j=1}^{n} \min(i,j) \pi_i \pi_j + \sum_{i=1}^n i \pi_i^2 $$ This leaves us with a cute little optimization. We claim that \(\psi(\pi)\) and hence the modified excess return is maximized, subject to \(\sum_{i=1}^n \pi_i = 1\), by setting the portfolio equal to a barbell. Half the portfolio is invested in the first (shortest maturity) bond and the other half in the last (longest maturity) bond. That is, $$ \pi^* = \left[ \begin{array}{c} 1/2 \\ 0 \\ \vdots \\ 0 \\ 1/2 \end{array} \right] $$ corresponding to a modified excess return of \(\psi(\pi^*) = \frac{n-1}{4}\). To prove this observe that \(\psi(\pi)\) can be re-written as follows (count the number of times each \(\pi_i\) and \(\pi_i \pi_j\) occurs) $$ \begin{eqnarray*} \psi(\pi) & = & - 2 \sum_{i,j=1}^{n} \min(i,j) \pi_i \pi_j + \sum_{i=1}^n i \pi_i^2 + \sum_{i=1}^n i \pi_i \\ & = & -(\pi_1 + \pi_2 + ... + \pi_n)^2 + (\pi_1 + \pi_2 + ... + \pi_n) \\ & & - (\pi_2 + ... + \pi_n)^2 + (\pi_2 + ... + \pi_n)\\ & & \vdots \\ & & - (\pi_{n-1}+\pi_{n})^2 + (\pi_{n-1}+\pi_{n}) \\ & & - (\pi_n)^2 + \pi_n \\ & = & \sum_{i=0}^{n-1} (-u_i^2 + u_i) \\ & = & \sum_{i=1}^{n-1} (-u_i^2 + u_i) \\ & = & \sum_{i=1}^{n-1} \left(-(u_i-1/2)^2+1/4 \right) \\ & = & \frac{n-1}{4} - \sum_{i=1}^{n-1} (u_i-1/2)^2 \end{eqnarray*} $$ where we have introduced \(u_i = \sum_{j=i+1}^n \pi_j \) as the sum of portfolio weights leaving out the first \(i\), and applied the constraint \(u_0=1\), which makes the \(i=0\) term vanish. The expression is clearly maximized by setting \(u_1, \dots, u_{n-1}\) equal to \(1/2\). By back substitution beginning with \(\pi_n = u_{n-1}\) this implies \(\pi = \pi^*\) as claimed.
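Two pieces of this argument are easy to check numerically: that \(J J^\top\) has entries \(\min(i,j)\), and that the final tail-sum expression \(\sum_{i=1}^{n-1}(u_i - u_i^2)\) is maximized by the barbell with value \((n-1)/4\). The sketch below does only that; it does not re-derive the rewriting of \(\psi\) in terms of tail sums. The choice \(n=6\), the function name `tail_sum_form` and the random long-only test portfolios are arbitrary illustrative assumptions.

```python
import numpy as np

n = 6
J = np.tril(np.ones((n, n)))                 # lower triangular matrix of ones
idx = np.arange(1, n + 1)
M = np.minimum.outer(idx, idx)               # M[i-1, j-1] = min(i, j)
print(np.array_equal(J @ J.T, M))

def tail_sum_form(pi):
    """sum over i = 1..n-1 of (u_i - u_i^2), with u_i = pi_{i+1} + ... + pi_n."""
    tails = np.cumsum(pi[::-1])[::-1]        # tails[k] = pi_{k+1} + ... + pi_n
    u = tails[1:]                            # drop u_0, which equals 1
    return np.sum(u - u ** 2)

barbell = np.zeros(n)
barbell[0] = barbell[-1] = 0.5
best = tail_sum_form(barbell)

# Compare against random long-only portfolios summing to one.
rng = np.random.default_rng(0)
trials = rng.dirichlet(np.ones(n), size=1000)
worst_gap = min(best - tail_sum_form(p) for p in trials)
print(best, worst_gap)
```

The random portfolios are only a spot check; the completed square in the derivation is what shows no portfolio summing to one can beat the barbell in the tail-sum form.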

Excess return on a portfolio of lognormal assets

Here's my attempt at "Stochastic Portfolio Theory in a Nutshell", for those who, like me, hadn't noticed a bug in Markowitz Theory.

Stochastic Portfolio Theory considers the decomposition of the instantaneous return on a continuously rebalanced portfolio into the instantaneous returns of constituent assets \begin{equation} \frac{ d Z_t} {Z_t} = \pi_t^\top \frac{ dX_t}{X_t} \label{portfolio} \end{equation} Here \(Z_t\) is the value of the portfolio, \(\pi_t\) is a vector of portfolio weights, the fractions indicate coordinate-wise (i.e. pointwise) division and \(X_t=(X^1_t,...,X^n_t) \) is an \(n \times 1\) vector of stock prices with lognormal dynamics: $$ d \left(\log X_t\right)_{n \times 1} = \left( \gamma_t \right)_{n \times 1} dt + \left( \xi \right)_{n \times n} \left(dW_t\right)_{n \times 1} $$ where \(dW_t\) is a vector of independent Brownian increments and we've emphasized the dimensions throughout. Note that by Ito's Lemma we have $$ \frac{ dX_t}{X_t} = \left( \gamma_t + \frac{1}{2} diag \left( \xi \xi^{T} \right) \right) dt + \left( \xi \right)_{n \times n} dW_t $$ where \(diag\) extracts a vector of diagonal entries. So if we mentally hit this on the left with the transpose of the portfolio weights \(\pi^\top\) we can see that the right hand side of our instantaneous return equation will have a Brownian term \(\pi_t^{T} \xi dW_t\) and a drift that we'll get back to momentarily. And if we are on the ball we'll remember that \( \frac{dZ}{Z} \) and \(d(\log Z)_t \) have the same Brownian terms. Thus \( d(\log Z)_t \) will also be driven by \(\pi_t^{T} \xi dW_t\) and we can write, for some as yet unspecified drift \(\gamma^{\pi}\), $$ d (\log Z_t) = \gamma^{\pi}_t dt + \pi_t^{T} \xi dW_t\ $$ To clean this up we apply Ito's Lemma again to retrieve \( Z_t = \exp(\log(Z_t))\) and thereby: $$ \frac{dZ_t}{Z_t} = \left\{ \gamma^{\pi}_t + \frac{1}{2} \pi^{T}_t \xi \xi^{T} \pi_t \right\} dt + \pi_t^{T} \xi dW_t $$ which reveals the drift term on the left hand side of the instantaneous return equation expressed in terms of a highly relevant quantity: the drift of the logarithm of portfolio wealth. Indeed we call \(\gamma^{\pi}_t\) the portfolio growth process. 
Equating this drift with \(\pi_t^\top\) times the drift of \(dX_t/X_t\), we observe the important equality appearing in too few investment textbooks, if any besides Bob's: $$ \gamma^{\pi} = \pi_t \cdot \gamma_t + \underbrace{\frac{1}{2} \left( \pi_t \cdot diag(\xi \xi^{T}) - \overbrace{\pi^{T}_t \xi \xi^{T} \pi_t }^{portfolio\ variance} \right)}_{excess\ return\ process} $$ Thus in log space we might say that the portfolio growth is the linear combination of the growth in individual stocks plus the term marked by the underbrace. We refer to that additional kick as the excess growth rate. And we further observe that it is half the difference between the weighted combination of stock variances and the portfolio variance process, denoted $$ \sigma^{\pi\pi}_t = \pi^{T}_t \xi \xi^{T} \pi_t $$ That's it for now. We note that if one is interested in the growth of the logarithm of one's portfolio value then the full decomposition into linear and excess return is obviously more pertinent than the linear term alone. And we see why minimizing portfolio variance subject to a known linear return is not, despite its significant popularity, the most relevant exercise.
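A tiny worked example may make the kick concrete. The sketch below assumes two independent stocks with the same log drift and the same 20% volatility, held in equal weights; the function name `portfolio_growth` and all the numbers are illustrative.

```python
import numpy as np

def portfolio_growth(pi, gamma, xi):
    """Return (portfolio growth rate, excess growth rate) for weights pi."""
    cov = xi @ xi.T                                  # log-asset covariance
    portfolio_variance = pi @ cov @ pi
    excess = 0.5 * (pi @ np.diag(cov) - portfolio_variance)
    return pi @ gamma + excess, excess

gamma = np.array([0.05, 0.05])      # log drifts of the two stocks
xi = 0.2 * np.eye(2)                # independent, 20% volatility each
pi = np.array([0.5, 0.5])

growth, excess = portfolio_growth(pi, gamma, xi)
print(growth, excess)

# Putting everything in one stock gives no excess growth.
_, excess_single = portfolio_growth(np.array([1.0, 0.0]), gamma, xi)
print(excess_single)
```

Each stock grows at 5% in log terms, yet the rebalanced equal-weight portfolio grows at 6%: the extra 1% is \(\frac{1}{2}(0.04 - 0.02)\), the weighted average variance minus the portfolio variance, halved.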

The decomposition is useful independent of the investor's utility function.

For more see Dr Fernholz's book on Amazon, or this summary paper.