
Wednesday, October 29, 2014

Some special cases of the Lambert W function

Suppose f(x) is given by the functional relation \begin{equation} e^{a(x) f(x) + b(x)} = c(x) f(x) + d(x) \end{equation} then \begin{equation} f(x) = -\frac{1}{a(x)} W\left( -\frac{a(x)}{c(x)} e^{b(x) - \frac{a(x) d(x)}{c(x)}} \right) - \frac{d(x)}{c(x)} \end{equation} where W is the Lambert W function. You can click through to Wikipedia for a proof. Here I merely list some special cases which every good citizen should instantly recognize (and it can be mildly tedious to reproduce them).
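As a quick sanity check, the closed form can be verified numerically with scipy.special.lambertw (a sketch; the principal branch is assumed, the test values are arbitrary, and domain restrictions on the argument of W are glossed over):

```python
import numpy as np
from scipy.special import lambertw

def solve_f(a, b, c, d):
    """Solve e^(a*f + b) = c*f + d for f via the Lambert W function."""
    w = lambertw(-(a / c) * np.exp(b - a * d / c))  # principal branch
    return np.real(-w / a - d / c)

a, b, c, d = 1.0, 0.0, 1.0, 2.0                     # arbitrary test values
f = solve_f(a, b, c, d)
assert abs(np.exp(a * f + b) - (c * f + d)) < 1e-10
```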

Recovering the definition of W(x)

Just to check we are on the right planet, set a=-1, c=1, b=\ln(x) and d=0 to get x e^{-f(x)} = f(x), or equivalently f(x) e^{f(x)} = x, which is the definition of the Lambert W function. Sure enough those substitutions recover f(x)=W(x) as expected. A minor modification uses a(x)=-1, c(x)=1, b(x)=0 but leaves d(x) general.

The solution to e^{-f(x)} - f(x) = d(x)

is \begin{equation} f(x) = W\left( e^{d(x)} \right) - d(x) \end{equation} Only slightly more generally, set a(x)=-a, b=\ln(g(x)), c=1 and d=0 for those cases where g(x) is on the wrong side, as it were.

The solution to e^{-a f(x)} = \frac{f(x)}{g(x)}

or equivalently \frac{ f(x) e^{a f(x)} }{ g(x) } = 1 is \begin{equation} f(x) = \frac{1}{a} W\left( a\, g(x) \right) \end{equation} In particular,

The solution to x^k e^{a f(x)} f(x) = 1

which seems to crop up a fair bit for your author is \begin{equation} f(x) = \frac{1}{a} W\left( \frac{a}{x^k} \right) \end{equation}

Similarly, setting a=1 and k=1 ...

The solution to x e^{f(x)} f(x) = 1

(i.e. where x is on the wrong side but we otherwise have the Lambert W definition) must be \begin{equation} f(x) = W\left( \frac{1}{x} \right) \end{equation} We might also take b=\ln(g(x)) and thus

The solution to g(x) e^{-a f(x)} = f(x)

or equivalently f(x) e^{a f(x)} = g(x) is \begin{equation} f(x) = \frac{1}{a} W\left( a\, g(x) \right) \end{equation} which reduces to the Lambert W function for a=1, as we expect. It is also pretty obvious from first principles, because if we multiply both sides by a we have \left( a f \right) e^{a f} = a g and thus a f = W(a g). Next suppose we want a power of f to appear. Let b=\beta(x)/k, c=\gamma^{1/k}, a=\alpha/k and d=0. And then raise both sides to the power k. It follows that...

The solution to e^{\alpha f(x) + \beta(x)} = \gamma f(x)^k

is \begin{equation} f(x) = -\frac{k}{\alpha} W\left( -\frac{\alpha}{k} \frac{ e^{\frac{1}{k}\beta(x)} }{ \gamma^{1/k} } \right) \end{equation} and if, in particular, \beta(x) = -\ln(g(x)) then

The solution to e^{\alpha f(x)} = g(x)\, \gamma f(x)^k

is \begin{equation} f(x) = -\frac{k}{\alpha} W\left( -\frac{\alpha}{k}\, g(x)^{-1/k} \gamma^{-1/k} \right) \end{equation} and if we take \gamma = \frac{1}{c} and g(x)=x and \alpha=-s^2 and k=2 then

The solution to e^{s^2 f(x)} x f(x)^2 = c

is \begin{equation} f(x) = \frac{2}{s^2} W\left( \frac{s^2}{2} \sqrt{ \frac{c}{x} } \right) \end{equation}
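A numerical spot check of this last identity, with arbitrary positive values for s^2, x and c:

```python
import numpy as np
from scipy.special import lambertw

s2, x, c = 1.7, 0.9, 1.3   # arbitrary positive values for s^2, x and c
f = np.real((2.0 / s2) * lambertw((s2 / 2.0) * np.sqrt(c / x)))
assert abs(np.exp(s2 * f) * x * f**2 - c) < 1e-10
```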

Wednesday, August 13, 2014

A new use for the number 59575553 (ISDA shortens 20 character Legal Entity Identifiers to 10 character Unique Trade Identifier Prefixes)

The International Swaps and Derivatives Association (ISDA) faced a minor problem recently. In shortening their 20 character Legal Entity Identifiers (LEI's) into 10 character Unique Trade Identifier (UTI) prefixes they ran into collisions.

Somewhat randomly, this came across my desk. The proposal under consideration is an improvement for hashing LEI's into UTI prefixes: the modulus operation, lifted from the integers to the space of case insensitive alpha-numeric strings.

You can grab the python code here. Of course it's hardly a new idea.
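The idea fits in a few lines. Here is a minimal sketch, assuming the LEI is read as a base-36 integer; the modulus 36**10 below is purely illustrative and keeps the sketch self-contained, whereas the linked code pins down the actual choice (presumably where the number in the title comes in):

```python
import string

ALPHABET = string.digits + string.ascii_uppercase   # base 36, case insensitive

def uti_prefix(lei, length=10):
    """Hash an alpha-numeric LEI to a shorter alpha-numeric prefix by
    lifting the modulus operation from the integers: read the string in
    base 36, reduce it, and re-encode."""
    n = int(lei.upper(), 36) % 36**length           # illustrative modulus
    out = []
    for _ in range(length):
        n, r = divmod(n, 36)
        out.append(ALPHABET[r])
    return ''.join(reversed(out))

print(uti_prefix('ABCDEF1234GHIJKL5678'))           # arbitrary 20 character example
```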

Wednesday, May 28, 2014

Sensitivities of a Kalman Filter estimate with respect to all past observations

Consider a sequence of times t_1,\ldots,t_k at which a latent vector process x is observed indirectly, via an observation equation \begin{equation} y_{t_i} = H_i x_{t_i} + \epsilon_i \end{equation} We assume \epsilon_i is mean zero multivariate gaussian with covariance R_i. For brevity we refer to y_{t_i} as y_i, x_{t_i} as x_i and so forth. We assume the evolution of x in between the times specified can be written \begin{equation} x_{i+1} = A_i x_i + u_i \end{equation} where the u_i are also gaussian. In this linear gaussian system the recursive estimation of x_k is achieved by the well known Kalman filter, and the contemporaneous impact of the next observation y_{k+1} is also well known (it is merely proportional to the Kalman gain).

But less well appreciated is a related computation: the derivative of the Kalman filter estimate with respect to a past observation y_i. This note establishes how said computation can be achieved by making two observations. The first is a re-representation of the Kalman estimate in the form of a weighted least squares problem (not dissimilar to the Duncan-Horn representation). The second is that sensitivities of any weighted least squares problem can be computed using an adjoint trick.

Step 1: The Kalman filter solution as a (particular) least squares problem

We shall set up a least squares problem involving the current state x_k only, and all the previous observations. We argue that the solution to this problem is identical to the Kalman filter. Since the current estimate \hat{y}_k is a simple linear function of the current state x_k, this allows us to compute the derivative of the current estimate with respect to all previous observations.

In the Kalman filter we assume a gaussian prior on the initial state x_0. This can introduce annoying special cases in what follows, but we can clean up the notation instead by introducing \begin{eqnarray} y_{-1} & = & H_{-1} x_{-1} + \epsilon_{-1} \\ x_0 & = & A_{-1} x_{-1} + u_{-1} \end{eqnarray} provided H_{-1} and A_{-1} are identity matrices, \epsilon_{-1} is identically zero, y_{-1} is set equal to the mean of our prior and u_{-1} adopts its covariance. With the boundary conditions cleaned up in this fashion we can invert the dynamical equations, assuming only that the A's have left inverses A_j^{-1}, as follows: \begin{equation} x_j = A_j^{-1}\left( x_{j+1} - u_j \right) \end{equation} and then re-arrange the observation equations so that the only value of x_i that appears is x_k: \begin{eqnarray} y_k & = & H_k x_k + \epsilon_k \\ y_{k-1} & = & H_{k-1} x_{k-1} + \epsilon_{k-1} \\ & = & H_{k-1}\left( A_{k-1}^{-1}\left( x_k - u_{k-1} \right) \right) + \epsilon_{k-1} \\ & = & H_{k-1} A_{k-1}^{-1} x_k - H_{k-1} A_{k-1}^{-1} u_{k-1} + \epsilon_{k-1} \\ y_{k-2} & = & H_{k-2} x_{k-2} + \epsilon_{k-2} \\ & = & H_{k-2}\left( A_{k-2}^{-1}\left( x_{k-1} - u_{k-2} \right) \right) + \epsilon_{k-2} \\ & = & H_{k-2} A_{k-2}^{-1} x_{k-1} - H_{k-2} A_{k-2}^{-1} u_{k-2} + \epsilon_{k-2} \\ & = & H_{k-2} A_{k-2}^{-1}\left( A_{k-1}^{-1}\left( x_k - u_{k-1} \right) \right) - H_{k-2} A_{k-2}^{-1} u_{k-2} + \epsilon_{k-2} \\ & = & H_{k-2} A_{k-2}^{-1} A_{k-1}^{-1} x_k - H_{k-2} A_{k-2}^{-1} A_{k-1}^{-1} u_{k-1} - H_{k-2} A_{k-2}^{-1} u_{k-2} + \epsilon_{k-2} \end{eqnarray} from which it is apparent that if we write Y = \left( y_k, y_{k-1}, y_{k-2}, \ldots, y_{-1} \right) then \begin{equation} Y = G x_k + \eta \end{equation} where G is the concatenation of the coefficients of x_k given above and \eta is the gaussian random variable equal to the sum of the u_j's and \epsilon_j's (again, with coefficients as above, leading to a non-trivial covariance structure).
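To make Step 1 concrete, here is a small sketch assembling G for randomly generated H_i and A_i (dimensions and values are placeholders, and the tidy-up rows for the prior are omitted):

```python
import numpy as np

# Rows of G: H_k, then H_{k-1} A_{k-1}^{-1}, then H_{k-2} A_{k-2}^{-1} A_{k-1}^{-1}, ...
k, dx, dy = 4, 3, 2                        # steps, state dim, observation dim
rng = np.random.default_rng(0)
H = [rng.normal(size=(dy, dx)) for _ in range(k + 1)]
A = [np.eye(dx) + 0.1 * rng.normal(size=(dx, dx)) for _ in range(k)]

rows = [H[k]]                              # coefficient of x_k in y_k
prod = np.eye(dx)
for j in range(k - 1, -1, -1):             # y_{k-1}, y_{k-2}, ..., y_0
    prod = np.linalg.inv(A[j]) @ prod      # A_j^{-1} A_{j+1}^{-1} ... A_{k-1}^{-1}
    rows.append(H[j] @ prod)               # coefficient of x_k in y_j
G = np.vstack(rows)                        # so that Y = G x_k + eta
```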

Step 2: A review of the adjoint trick

Suppose x solves Q x = b(y). The adjoint trick can be used to compute the derivative of g(x) w.r.t. y. In particular, if y is the observation and x the solution of a generalized least squares problem with error covariance R, we can cast it in this form by writing \begin{eqnarray} g(x) & = & H x \\ Q & = & H^T R^{-1} H \\ b(y) & = & H^T R^{-1} y \end{eqnarray} Consider now f(x,y) = 0 where \begin{equation} f(x,y) = Q x - b(y) \end{equation} We use derivatives of \tilde{g} = g - \lambda^T f(x,y) with respect to y as a means of computing derivatives of g with respect to y. Note that \begin{equation} \tilde{g}_y = g_x x_y - \lambda^T\left( f_x x_y + f_y \right) \end{equation} and this will simplify if we choose \lambda judiciously as a solution of \begin{equation} g_x = \lambda^T f_x \end{equation} which we call the adjoint equation. For then \begin{equation} \tilde{g}_y = g_x x_y - \lambda^T\left( f_x x_y + f_y \right) = -\lambda^T f_y = \lambda^T b_y \end{equation} Now specializing to g(x) = H x and b(y) as above, we can solve for this convenient choice of \lambda by writing \begin{equation} H = g_x = \lambda^T f_x = \lambda^T Q = \lambda^T H^T R^{-1} H \end{equation} where the second equality is the adjoint equation. It should be clear from this how to compute derivatives of \tilde{g} with respect to y, and thereby compute derivatives of g with respect to y.
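A compact numerical sketch of the trick (random placeholder matrices; the adjoint solve recovers exactly what differentiating the explicit least squares solution gives):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3                                  # observations, state dimension
H = rng.normal(size=(m, n))
R = np.diag(rng.uniform(0.5, 2.0, size=m))   # error covariance
Rinv = np.linalg.inv(R)
Q = H.T @ Rinv @ H                           # f(x, y) = Q x - H^T R^{-1} y

# Adjoint equation: g_x = lambda^T f_x, i.e. H = lambda^T Q.
lamT = np.linalg.solve(Q.T, H.T).T           # lambda^T = H Q^{-1}
dgdy = lamT @ H.T @ Rinv                     # tilde{g}_y = lambda^T b_y

# Check against brute force differentiation of x(y) = Q^{-1} H^T R^{-1} y.
dxdy = np.linalg.solve(Q, H.T @ Rinv)
assert np.allclose(dgdy, H @ dxdy)
```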

Monday, January 13, 2014

A game played on an Ornstein-Uhlenbeck bridge

Let X_t denote the Ornstein-Uhlenbeck process \begin{equation} dX_t = -\kappa X_t\, dt + \sigma\, dW_t \end{equation} where X_0 = b is assumed known. Let Y_t = \exp(X_t) and, without serious loss of generality, choose \kappa and \sigma so that the unconditional distribution of X_t is unit variance - that is to say \frac{\sigma^2}{2\kappa} = 1.
We play a two period game in which we choose a time t_1 and then, after Y_{t_1} has been revealed to us (or equivalently X_{t_1}), we choose a second time t_2. We seek to maximize a utility function \begin{equation} U = c Y_{t_1} + Y_{t_2} \end{equation} We can evidently assume t_1 > 0 without loss of generality. The coefficient c is the relative importance of the first period payoff versus the second, and for concreteness one might set c=1.
A time coordinate is used merely to make the mathematical representation more familiar. It does not necessarily represent a temporal choice, but rather an abstraction of a more general search space (such as a location to drill for oil, a choice of marketing strategy, or a technology product). The process Y_t, unknown to us, represents the yet to be discovered variation in revenue as we move across the search space.
We seek to analyze how much exploration one should perform, versus sticking to what we know. Intuitively, if the initial point X(0)=b (or Y(0)=e^b) is very high we might not want to risk any departure. Whereas if b is less than zero we're better off forgetting it altogether (that is, choosing t_1 \rightarrow \infty). In what follows, we sharpen this intuition.

Lemma A: The single period game.

Conditioned on X_0=b the mean and variance of X_t are given by \begin{eqnarray} \mu(t) & = & b e^{-\kappa t} \\ \nu(t) & = & \frac{\sigma^2}{2\kappa}\left( 1 - e^{-2\kappa t} \right) \end{eqnarray} So using our simplification \frac{\sigma^2}{2\kappa}=1 we ought to choose \begin{equation} t_1 = \arg\max_t E\left[ e^{X_t} \right] = \arg\max_t \left( b \lambda_t + \frac{1}{2}\left( 1 - \lambda_t^2 \right) \right) \end{equation} for \lambda_t = e^{-\kappa t } taking values in [0,1]. This has qualitatively different behaviour depending on which of three regions b falls into. The three possibilities are tabulated below.

Region | \lambda^*_t | Optimal t | \log E[ e^{X_t} ] | Strategy
b < 0 | 0 | \infty | \frac{1}{2} | "Reset"
0 < b < 1 | b | -\frac{1}{\kappa} \log( b ) | \frac{1}{2}\left( b^2+1 \right) | "Explore"
b > 1 | 1 | 0 | b | "Stay"

Thus the one period problem has utility \zeta(b) where \begin{equation} \zeta( x ) = \left\{ \begin{array}{cc} e^{\frac{1}{2}} & x < 0 \\ e^{\frac{1}{2} \left( x^2+1 \right)} & 0 \le x \le 1 \\ e^x & x > 1 \end{array} \right. \end{equation} We denote the corresponding optimal time t_1 by \pi^{(1)}( b ) . Namely: \begin{equation} \pi^{(1)}( b ) = \left\{ \begin{array}{cc} \infty & b < 0 \\ -\frac{1}{\kappa} \log( b ) & 0 \le b \le 1 \\ 0 & b > 1 \end{array} \right. \end{equation}
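In code, the one period value and policy look like this (a sketch; kappa=1 is an arbitrary normalization and np.inf stands in for the unbounded "reset" time):

```python
import numpy as np

def zeta(b):
    """Optimal one period utility E[exp(X_t)] started from X_0 = b."""
    if b < 0:
        return np.exp(0.5)                  # "Reset"
    if b <= 1:
        return np.exp(0.5 * (b**2 + 1))     # "Explore"
    return np.exp(b)                        # "Stay"

def pi1(b, kappa=1.0):
    """The optimizing time t_1, with np.inf encoding the reset region."""
    if b < 0:
        return np.inf
    if b <= 1:
        return -np.log(b) / kappa
    return 0.0

# Brute force check of the explore region against the table above.
b, lam = 0.7, np.linspace(0.0, 1.0, 100001)
assert abs(np.exp(b * lam + 0.5 * (1 - lam**2)).max() - zeta(b)) < 1e-6
```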

Lemma B: Optimizing over "outside" strategies.

For brevity we denote two period strategies by a decision function \pi( x; t, b ) that returns the second time choice t_2 = \pi( X(t_1); t, b ) after X(t_1) is revealed. A simple one parameter family of strategies, indexed by the choice t_1, is given by \begin{equation} t_2 = \pi^{(2)}( x ; t_1, b ) = \left\{ \begin{array}{cc} - \pi^{(1)}(b) & x < b \\ t_1 + \pi^{(1)}(x) & x > b \end{array} \right. \end{equation} We call these the "outside" strategies because we assume the best choice for t_2 lies outside the open interval (0, t_1). If X(t_1) is greater than our starting point X(0)=b we will never move left, but instead use the one period solution (started at X(t_1)) to move right (or stay). On the other hand, if the first point is revealed to be lower than our starting point we explore, if anywhere, to the left of 0 instead, again using the one period solution (started at X(0)=b, since conditional on X_0 the past is independent of X(t_1)).
The evaluation of this strategy amounts to integration of the one period utility against the (gaussian) distribution of X(t_1). For instance, if 0 < b < 1: \begin{eqnarray} L( \pi^{(2)} ) & = & c E[ e^{ X_{t_1} } ] + E[ e^{ X_{t_2} } ] \\ & = & c E[ e^{ X_{t_1} } ] + P( X_{t_1} < b ) \zeta( b ) + P( X_{t_1} > b ) E[ \zeta( X_{t_1} ) | X_{t_1} > b ] \end{eqnarray} We've already noted that the first term is just c e^{ \mu(t_1) + \frac{1}{2} \nu(t_1) }. But since \log \zeta(x) is piecewise quadratic, the others are also integrals of the type \begin{eqnarray} I(\mu,\sigma;x_1,x_2;a_0,a_1,a_2) := \frac{1}{\sqrt{2 \pi} \sigma } \int_{x_1}^{x_2} e^{-\frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2} e^{a_0 + a_1 x + a_2 x^2} dx \end{eqnarray} admitting analytical solution. It is sufficient to note the following equalities, each achieved by substitution. \begin{eqnarray} \frac{1}{\sqrt{2 \pi} \sigma } \int_{x_1}^{x_2} e^{-\frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2} e^{a_1 x + a_2 x^2} dx & = & \frac{1}{\sqrt{2 \pi} } \int_{x_1/\sigma}^{x_2/\sigma} e^{-\frac{1}{2}\left( u-\mu/\sigma \right)^2} e^{a_1 \sigma u + a_2 \sigma^2 u^2} du \\ \frac{1}{\sqrt{2 \pi}} \int_{x_1}^{x_2} e^{-\frac{1}{2}\left( x-\mu \right)^2} e^{a_1 x + a_2 x^2} dx & = & \frac{1}{\sqrt{2 \pi}} \int_{x_1-\mu}^{x_2-\mu} e^{-\frac{1}{2}u^2} e^{ (a_1 \mu + a_2 \mu^2) + (a_1 + 2 a_2 \mu) u + a_2 u^2} du \\ \frac{1}{\sqrt{2 \pi}} \int_{x_1}^{x_2} e^{-\frac{1}{2}x^2 } e^{a_1 x + a_2 x^2} dx & = & \frac{1}{\sqrt{p}} \frac{1}{ \sqrt{2 \pi} } \int_{ x_1 \sqrt{p} }^{ x_2 \sqrt{p} } e^{ -\frac{1}{2}u^2 } e^{ \frac{a_1}{\sqrt{p}} u } du \\ \frac{1}{\sqrt{2 \pi}} \int_{x_1}^{x_2} e^{-\frac{1}{2}x^2 } e^{a_1 x} dx & = & e^{\frac{1}{2}a_1^2} \frac{1}{ \sqrt{2 \pi} } \int_{ x_1-a_1 }^{ x_2 - a_1 } e^{ -\frac{1}{2}u^2 } du \end{eqnarray} In the second to last equality p = 1 - 2 a_2 and validity requires p > 0 .
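Chaining those substitutions gives I in closed form in terms of the normal CDF. A sketch, checked against quadrature on arbitrary values:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def I(mu, sigma, x1, x2, a0, a1, a2):
    """The integral I above, by completing the square; requires p > 0."""
    b0 = a0 + a1 * mu + a2 * mu**2           # constant after x = mu + sigma*u
    b1 = (a1 + 2 * a2 * mu) * sigma          # linear coefficient in u
    b2 = a2 * sigma**2                       # quadratic coefficient in u
    p = 1 - 2 * b2
    assert p > 0, "integral diverges otherwise"
    lo, hi = (x1 - mu) / sigma, (x2 - mu) / sigma
    c1 = b1 / np.sqrt(p)
    scale = np.exp(b0 + 0.5 * c1**2) / np.sqrt(p)
    return scale * (norm.cdf(hi * np.sqrt(p) - c1) - norm.cdf(lo * np.sqrt(p) - c1))

mu, sigma, x1, x2, a0, a1, a2 = 0.3, 1.2, -1.0, 2.0, 0.1, 0.4, 0.2
numeric, _ = quad(lambda x: np.exp(-0.5 * ((x - mu) / sigma)**2)
                  / (np.sqrt(2 * np.pi) * sigma)
                  * np.exp(a0 + a1 * x + a2 * x**2), x1, x2)
assert abs(I(mu, sigma, x1, x2, a0, a1, a2) - numeric) < 1e-7
```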

Lemma C: If inside strategies are never optimal in the symmetric case, they are never optimal.

For simplicity we'd like to assume what would seem to be the critical case, X(t_1)=b. And we shall indeed show that irrespective of b we never want to choose t_2 \in (0,t_1). Intuitively it should also be true that we never wish to choose an inside strategy in the asymmetric case either. To tidy up this dangling thread and establish some formulas we'll need, we set X(t_1)=d. We may assume d < b without loss of generality (otherwise exchange the roles, remembering that we shall establish our coming result for all values of b). Now suppose there is a point t_2 \in (0,t_1). Using the unconditional means and variances used in the one period problem we apply Bayes' rule to find the conditional density on the bridge: \begin{eqnarray} \rho( x; t_2 ) & \propto & e^{ -\frac{1}{2} \left( \frac{ x - b e^{-\kappa t_2} } { \sqrt{ 1-e^{-2\kappa t_2} } } \right)^2 } e^{-\frac{1}{2} \left( \frac{ x e^{-\kappa(t_1-t_2) } - d } { \sqrt{ 1-e^{-2\kappa (t_1-t_2) } } } \right)^2 } \\ & = & e^{-\frac{1}{2} \frac{ \left( x - b \lambda_2 \right)^2 } { 1-\lambda_2^2 } } e^{-\frac{1}{2} \frac{ \left( x \lambda - d \right)^2 } { 1-\lambda^2 } } \\ & = & e^{ -\frac{1}{2}\left( g_2 x^2 - 2 g_1 x + \dots \right) } \end{eqnarray} where \lambda_2 = e^{-\kappa t_2}, \lambda = e^{-\kappa (t_1-t_2)} and matching coefficients of x^2 and x in the exponent respectively we find \begin{eqnarray} g_2 & = & \frac{1}{1-\lambda_2^2} + \frac{\lambda^2}{1-\lambda^2} \\ g_1 & = & \frac{b \lambda_2}{1-\lambda_2^2} + \frac{d \lambda}{1-\lambda^2} \end{eqnarray} Now the gaussian conditional density \begin{equation} \rho(x) \propto e^{ -\frac{1}{2}\left( g_2 x^2 - 2 g_1 x + \dots \right) } = e^{ -\frac{1}{2} g_2 \left( x - g_1/g_2 \right)^2 + \dots } \end{equation} evidently has precision g_2 and mean g_1/g_2. The precision (i.e. the reciprocal of the variance) is independent of d whereas the conditional mean is increasing in d. It follows that if the low side of the bridge were raised, the inside option would become more attractive.

Lemma D: The "outside" strategy is no worse than backtracking to the middle of the bridge

From the one period problem and the additive nature of the payoff we know that all other strategies with t_2 \ge t_1 or t_2 < 0 are worse. So specializing the formula for the conditional mean and precision given above to the case d=b we write the conditional mean and variance as \begin{eqnarray} \mu^c(t) & = & \frac{ \frac{b \lambda_2}{1-\lambda_2^2} + \frac{b \lambda}{1-\lambda^2} } { \frac{1}{1-\lambda_2^2} + \frac{\lambda^2}{1-\lambda^2} } = b \frac{ \lambda_2(1-\lambda^2) + \lambda(1-\lambda_2^2) } { 1-\lambda_2^2 \lambda^2 } = b \frac{ \lambda_2 + \lambda}{ 1 +\lambda \lambda_2}\\ \nu^c(t) & = & \left( \frac{1}{1-\lambda_2^2} + \frac{\lambda^2}{1-\lambda^2} \right)^{-2} = \frac{ (1-\lambda_2^2)^2 (1-\lambda^2)^2 } { ( 1- \lambda_2^2 \lambda^2 )^2 } \end{eqnarray} In the case \lambda_2 = \lambda, which is the middle of the bridge, this simplifies further \begin{eqnarray} \mu^c(\lambda_2=\lambda) & = & b \frac{2\lambda}{1+\lambda^2} < b \\ \nu^c(\lambda_2=\lambda) & = & \left( \frac{ 1-\lambda^2 }{ 1+ \lambda^2} \right)^2 \end{eqnarray} But there is an outside point with \tilde{\lambda} = \frac{2\lambda}{1+\lambda^2} we might choose instead. This, by construction, will have the same mean b \tilde{\lambda}. We notice it also has the same variance: \begin{eqnarray} \nu(\tilde{\lambda}) & = & 1- \tilde{\lambda}^2 \\ & = & \frac{ (1 + \lambda^2)^2 - 4 \lambda^2 }{ (1 + \lambda^2)^2 } \\ & = & \frac{ (1 - \lambda^2)^2 }{ (1 + \lambda^2)^2 } \\ & = & \nu^c(t). \end{eqnarray} Thus we have established an outside point with equivalent utility.
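A quick numerical confirmation of that equivalence, using the formulas above:

```python
import numpy as np

b, lam = 1.3, 0.6                            # arbitrary bridge parameters
lam_tilde = 2 * lam / (1 + lam**2)           # the matching outside point
mu_mid = b * 2 * lam / (1 + lam**2)          # conditional mean at the midpoint
nu_mid = ((1 - lam**2) / (1 + lam**2))**2    # conditional variance formula above
assert np.isclose(b * lam_tilde, mu_mid)     # same mean, by construction
assert np.isclose(1 - lam_tilde**2, nu_mid)  # and the same variance
```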

Exhibit E: Pictures looking for proofs.

Returning to the symmetric case X(t_1)=b we ask: is there any value for t_2 inside the interval (0,t_1) that is a better choice than going outside the bridge (and using the optimal one period solution)? To put it another way, is there any value for \lambda for which there is any choice of \lambda_2 such that \begin{equation} D(\lambda_2,\lambda,b) = \log \zeta(b) - \psi(\lambda_2,\lambda;b) < 0 \end{equation} where \psi(\lambda_2,\lambda;b) = \mu^c(t) + \frac{1}{2} \nu^c(t) ? Or, using the formulas above, is there a combination \lambda_2, \lambda for which \begin{equation} \log \zeta(b) - \left( b \frac{ \lambda_2 + \lambda}{ 1 +\lambda \lambda_2} + \frac{1}{2} \frac{ (1-\lambda_2^2)^2 (1-\lambda^2)^2 } { ( 1- \lambda_2^2 \lambda^2 )^2 } \right) < 0 ? \end{equation} Here is a picture of the difference between the outside option and inside options in the case b=1.5 that would suggest the difference is always positive:


Here is another, for the case b=1.5.
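The pictures can be reproduced with a quick grid scan; numerical evidence rather than proof:

```python
import numpy as np

def D(lam2, lam, b):
    """Outside minus inside utility (in logs), per the formulas above."""
    log_zeta = 0.5 if b < 0 else 0.5 * (b**2 + 1) if b <= 1 else b
    mu = b * (lam2 + lam) / (1 + lam * lam2)
    nu = ((1 - lam2**2) * (1 - lam**2) / (1 - lam2**2 * lam**2))**2
    return log_zeta - (mu + 0.5 * nu)

b = 1.5
grid = np.linspace(0.01, 0.99, 99)
vals = [D(l2, l, b) for l2 in grid for l in grid]
print(min(vals) > 0)   # True for this b, in line with the pictures
```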



Lemma F: For b \in (0,1) the function S(u) = D(\lambda_2=u,\lambda=u;b) satisfies S(u)\ge0 for u \in (0,1) with a single zero at u=\frac{1-\sqrt{1-b^2}}{b}

We consider the middle of the symmetric bridge once more, this time varying the width of the bridge. We claim that for b \in (0,1) there is a particular width bridge for which we are indifferent as to whether we choose the middle of the bridge (i.e. t_2=\frac{1}{2} t_1) or the optimal outside solution. Specializing to this case implies \lambda_2=\lambda so we write S(u) = D(\lambda_2=u,\lambda=u;b). From the formula for D given above we have \begin{eqnarray} S(u) &= & \frac{1}{2}(1+b^2) - \left( b \frac{ 2u}{ 1 + u^2} + \frac{1}{2} \frac{ (1-u^2)^4 } { ( 1- u^4 )^2 } \right) \\ &= & \frac{1}{2}(1+b^2) - \rho(u;b) \end{eqnarray} where \begin{equation} \rho(u;b) = b \frac{ 2u}{ 1 + u^2} + \frac{1}{2} \left( \frac{1-u^2}{1+u^2} \right)^2 \end{equation} A little algebra shows that \begin{equation} \frac{ \partial }{\partial u} \rho(u;b) = \frac{ 2(1-u^2)(bu^2-2u+b)}{ (1+u^2)^3} \end{equation} which is zero for any solution of b u^2-2u+b=0. If b<1 we do indeed want the root in (0,1) given by \begin{equation} u = \frac{1-\sqrt{1-b^2}}{b} \end{equation} So substituting u^2 = \frac{2u-b}{b} back into \rho yields, after a little simplification: \begin{equation} \rho\left(u=\frac{1-\sqrt{1-b^2}}{b};b\right) = \frac{1}{2}(1+b^2) \end{equation} which we recognize as precisely the value of the one period problem. This shows that \begin{eqnarray} S(u) &= & \frac{1}{2}(1+b^2) - \rho(u;b) \\ & \ge & \frac{1}{2}(1+b^2) - \max_u \rho(u;b) \\ & = & \frac{1}{2}(1+b^2) - \frac{1}{2}(1+b^2) \\ & = & 0 \end{eqnarray} as claimed.
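A numerical check of the root location and the non-negativity of S:

```python
import numpy as np

def rho(u, b):
    return 2 * b * u / (1 + u**2) + 0.5 * ((1 - u**2) / (1 + u**2))**2

for b in (0.2, 0.5, 0.9):
    u0 = (1 - np.sqrt(1 - b**2)) / b                 # root of b u^2 - 2u + b = 0
    assert np.isclose(rho(u0, b), 0.5 * (1 + b**2))  # attains the one period value
    u = np.linspace(1e-4, 1 - 1e-4, 9999)
    assert (0.5 * (1 + b**2) - rho(u, b) >= -1e-12).all()   # S(u) >= 0
```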

Lemma G: For b > 1 the function S(u) = D(\lambda_2=u,\lambda=u;b) satisfies S(u) > 0 for u \in (0,1). It approaches zero as u \rightarrow 1, corresponding to the case of a very short bridge.

The limiting case is obvious algebraically and geometrically, since for b>1 we have \begin{eqnarray} S(u) &= & b - \left( b \frac{ 2u}{ 1 + u^2} + \frac{1}{2} \left( \frac{1-u^2}{1+u^2} \right)^2 \right) \\ & \rightarrow & 0 \end{eqnarray} as u\rightarrow 1. Moreover, since b - b\frac{2u}{1+u^2} = b\frac{(1-u)^2}{1+u^2} and (1-u^2)^2 = (1-u)^2(1+u)^2, we can simplify to \begin{eqnarray} S(u) &= & \frac{(1-u)^2}{1+u^2} \left( b - \frac{1}{2} \frac{(1+u)^2}{1+u^2} \right) \end{eqnarray} which is evidently greater than zero, because \frac{(1+u)^2}{1+u^2} \le 2 and hence the bracketed term is at least b-1 > 0.
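And the corresponding numerical check of the factorization and positivity:

```python
import numpy as np

b = 1.7                                      # any b > 1
u = np.linspace(1e-3, 1 - 1e-3, 999)
S = b - (2 * b * u / (1 + u**2) + 0.5 * ((1 - u**2) / (1 + u**2))**2)
S_factored = (1 - u)**2 / (1 + u**2) * (b - 0.5 * (1 + u)**2 / (1 + u**2))
assert np.allclose(S, S_factored)
assert (S > 0).all()
```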