Wednesday, April 20, 2016

Filtering bond and credit default swap markets


I presented this talk at the NYU-Tandon faculty seminar. I covered my own work on the rudimentary framework for Benchmark Solutions pricing service (now BMRK on Bloomberg) but did not go into details on all the other efforts of everyone involved. Some of the material has already been covered in my blog post on Kalman sensitivities via the adjoint state method.

Monday, December 14, 2015

Domino dancing. A medium data research workflow with Ruffus on AWS.

I don't normally blog about technology because frankly, it is too much like work. But I have been trying Domino data labs of late, a nice product by a team mostly out of Bridgewater who clearly have good experience trying to make life easier for quants. They have built a research platform most readily used with Amazon compute but with the option of an in-house install. I certainly feel their efforts are worthy of note.

I don't really know what a data science platform should be necessarily, but perhaps it is a little like picking a dance partner: they have to know when to get out of the way in order for things to move effortlessly. That was a quality I found lacking in both in-house and competing platforms that seem to assume you are trying to step in a certain direction and in the process, inadvertently make it difficult to do simple things you feel you have already mastered. In short, they feel a bit like dancing with ... err ... me.

My workflow

I'll describe my setup for context, though I doubt there is anything atypical about my needs. I had already decided on Amazon over some in-house technology, incidentally, for several key reasons including the on-demand capability and availability of an attached file system. Another preference was use of a single beefy machine rather than many small ones because some of my steps require all the data together. With 8 GPU machines standard, and a 100 processor option coming in 2016 it is hard to see the merit in administering my own box.

My econometric/algo work (a.k.a. data science these days) pushes thousands of data files through a few dozen processing steps. After running local tests I sync both data and code using the Domino client to AWS, then kick off a job there. When the job is done, I simply sync again and examine the results locally.

I use the free python library Ruffus to create a non-blocking pipeline and handle the splitting and recombining of data files therein. I can't show you my actual computation graph as it is proprietary, so here's a stock image from ruffus instead:

Imagine the same with a few dozen steps and you get the idea. Much of what I do lives on one or more of these growing DAGs.

Serving results behind a REST API

The nodes on my computation tree comprise between one and a few thousand files of the same type. The leaves comprise summary tables (sometimes documentation) for this and that. Some output becomes lookup tables for REST API's deployed on Amazon. Domino provides simple version management for the code and data serving those endpoints, making this effortless.

One merely provides the entry point: in my case a plain python function to respond to the request (no wrapping/unwrapping). At present Domino only allows a single entry point but that is hardly a restriction. I merely register the following dispatcher function to automatically exposes every other function I drop in the same module.

That's REST, pretty much for free.

Monitoring runs

No complaints here. Domino presents your runs to colleagues alongside run diagnostics such as the following.

It could be slightly easier to arrange the results, arguably, but one way around that is to maintain your own markdown index page (README.md) and point users to where you really want them to be. As a minor side point I discovered that the markdown rendering was just slightly non-standard - but I'm sure that will be knocked off the issue list by the time you read this.

Serving richer results

By default Domino tracks the new files produced by your runs and renders them to collaborators browsing the results. If you want to serve up richer content than a pile of .csv's you can have your run create an .html file and use javascript. As suggested by one of the Domino developers, here is one way of adding sortability to output tables:


Thanks John. The same goes for plots. Need something fancy? Just add the package to your requirements.txt and it will be installed prior to the run starting.

Reproducible research

In a sense Domino is pioneering what truly reproducible research might look like. I made a very poor stab at the same vague concept many years back so I am sympathetic as to the difficulties. (That effort lived on this site incidentally, and was so dismal I'd forgotten it myself until just now.)

There is an opportunity. While the workflow for developing code is obviously well established in source control programs and accompanying verbiage, the organizing of code, data and intermediate temporary data tends to be more bespoke. The Domino starting point, philosophically, is that code and data be kept in sync and this ensures absolute reproducibility of results. But they have not baked any religious assumptions into the product and one is free to break that symmetry. I'll get to why it helped me to do so a little later.

Certainly the major annoyances (code sharing with package management) and organizational features (basic comment tracking etc) are cleanly solved if your colleagues are also using Domino. One points them to a link to start the run that reproduces your work, or they can select from a menu on your recent runs.

Transparent research

While multi-directional collaboration is a marketing point for Domino, I find that the weaker variety of this (call it transparency) is a sufficient motivation. There is certainly no barrier here for non-technical end-users who want to re-run your research with different parameters and then compare the results. And that comparison is straightforward. One selects two finished runs in the run history and selects the "compare" checkbox to get the full diff.

Note however, that if you keep your code in the same project this comparison will include all your source changes. If you don't want that for yourself or your readers, a simple hack here is to use .dominoresults to hide your code and echo major meta-parameters to their own output file in lieu. Then the Domino presentation of your backtest results (say) is actually very helpful.

I'm reasonably happy with the level of transparency that the Domino platform provides. I feel it will replace a fair amount of presentation preparation (I'm a bit lazy on that front). I maintain a stable version of data and code for this purpose - or at least I aspire to - so that others can drill down into very large runs. One might hope that the days of manually converting piles of data into long powerpoint presentations are behind me, though that might be optimistic.

Perhaps an area for improvement or enhancement of the Domino platform would involve documentation. Some support for Latex generation of documents for instance.

Separating code and data again

As my data sets grew, I ultimately found it more convenient to use a separate project for code and several different projects for data. There are distinct categories of data living remotely and locally. Some data needs to be synced between local and remote instances to make it easier to run quick experiments. Some data is merely temporary. Some data should only ever live on Amazon due to it's size.

Then there is the pragmatic concern with code. I personally didn't want the possibility of large globs of data corrupting my defacto code repo. Nor was there any need for the bulk of my code to be shared with those wanting to make minor hacks to the results.

These concerns are addressed. The Domino team was very helpful and introducing read-only sharing between projects. Thereafter I moved to a pattern of "code only" and "code-light data heavy" projects. In my project holding source code I allow read-only export to the other projects.

To minimize syncing of large files (and lots of files) to amazon I use a chain of Domino data projects. My medium-data flow is as follows:

  1. Ingest data locally to ~/zipped_data_project/
  2. Sync the compressed data to Amazon with Domino client
  3. Unzip on Amazon to a /mnt//unzipped_data_project/... by allowing this second project to import files from /zipped_data_project, (and also from the source project containing the script in question and the rest of one's code).
  4. Allow a results project /mnt//endpoint_project to read the unzipped data and run the pipeline, creating result files and lookup tables. Use .dominoignore config to decide what is worth keeping.
  5. Publish the endpoint.
I've leaving out source control & backup since there are many ways to do that. I use my own hacks to ensure I always backup right before syncing, because very occasionally one may need to recover from a git issue under the hood. The implicit version control achieved by syncing with the Domino client is, for me, more of a secondary line of defense.

I suspect some future improvements to Domino might obviate the minor inconvenience of the zip/unzip shuffle. And some of my issues there relate mostly to sub-par connectivity.

Overall impression

Sweet. Hopefully not sweet as in the Pepsi versus Coke trick (i.e. ask me again when I've drunk the whole can) but even if it is regarded as mere sugar for Amazon compute, Domino data labs seems to me to represent good, very reasonably priced sugar that doesn't get in the way of itself or you. It isn't trying to replace anything, just bring together a few critical ingredients like git, docker, jupyter, python/R/Julia package management and some friends from the Apache family.

The Domino team has cut the time for basics like REST APIs, job management, launchers for non-technical users, scheduling, sensible run management and results rendering. It is drawn together seamlessly with Scala (okay the fact I know that is a tell. But let's say 99% seamless with only the occasional stack trace finding its way to the user :-). You don't need a separate AWS account and overall, it is very simple.

Many quants or supporting groups might no doubt build their own versions of the same but I've come to realize that pulling all this together, or even a poor man's version, can easily fall foul of Hofstadter's Law. It is a little too soon to say that Domino have commoditized this area, but they seem to be off to a good start and with a decent size engineering team in place, I expect Domino to get better. As a quant, that's good news. Liberating even.

Costs seem to be well below internal costs. And my recommendation is that even for hobby projects involving only python, R and/or Julia one might want to consider whether pricing like the following is unreasonable when it obviates a great deal of messing around.


That's my 2 cents for now (or my 9.3 cents as the case may be). For those who are interested I've written up a few more hacks in this followup post.

Wednesday, October 29, 2014

Some special cases of the Lambert W function

Suppose \(f(x)\) is given by the functional relation \begin{equation} e^{-a(x) f(x) + b(x)} = c(x) f(x) + d(x) \end{equation} then \begin{equation} f(x) = \frac{1}{a(x)}W\left( \frac{a(x)}{c(x)} e^{b(x)+\frac{a(x)d(x)}{c(x)} } \right) - \frac{d(x)}{c(x)} \end{equation} where \(W\) is the Lambert W Function. You can click through to Wikipedia for a proof. Here I merely list some special cases which every good citizen should instantly recognize (and it can be mildly tedious to reproduce them).

Recovering the definition of \(W(x)\)

Just to check we are on the right planet set \(a=1\), \(c=1\), \(b=\ln(x)\) and \(d=0\) to get \begin{equation} x e^{-f(x)} = f(x) \end{equation} or equivalently \( f(x)e^{f(x)} = x\) which is the definition of the Lambert W function. Sure enough those substitutions recover $$ f(x) = W(x) $$ as expected. A minor modification uses \(a(x)=1\), \(c(x)=1\), \(b(x)=0\) but leaves \(d(x)\) general.

The solution to \( e^{-f(x)} - f(x) = d(x) \)

is \begin{equation} f(x) = W \left( e^{d(x)} \right) \end{equation} Only slightly more generally, set \(a(x)=a\), \(b=\ln(g(x)), c=1, d=1\) for those cases where \(g(x)\) is on the wrong side, as it were.

The solution to \( e^{-af(x)} = f(x) g(x) \)

or equivalently \( f(x) e^{af(x)} g(x) = 1 \) is \begin{equation} f(x) = \frac{1}{a} W\left( a g(x) \right) \end{equation} In particular,

The solution to \( x^k e^{-af(x)} f(x) = 1 \)

which seems to crop up a fair bit for your author is \begin{equation} f(x) = \frac{1}{a} W\left( a x^{-k} \right) \end{equation}

Similarly setting \(a=-1\) ...

The solution to \( x e^{f(x)} f(x) = 1 \)

(i.e. where \(x\) is on the wrong side but we otherwise have the Lambert W definition) must be \begin{equation} f(x) = - W\left( -\frac{1}{x} \right) \end{equation} We might also take \(b = \ln(g(x))\) and thus

The solution to \( g(x) e^{-af(x)} = f(x) \)

or equivalently \( f(x)e^{af(x)} = g(x) \) is \begin{equation} f(x) = \frac{1}{a} W\left( a g(x) \right) \end{equation} which reduces to the Lambert W function for \(a=1\), as we expect. It is also pretty obvious from first principles, because if we multiply both sides by \(a\) we have \begin{equation} (af) e^{(af)} = ag \end{equation} and thus \(af = W(ag)\). Next suppose we want a power of \(f\) to appear. Let \(b=\beta(x)/k\), \(c = \gamma^{1/k}\), \(a = k/\alpha\) and \(d=0\). And then raise both sides to the power \(k\). It follows that...

The solution to \( e^{\alpha f(x) + \beta(x)} = \gamma f(x)^k \)

is \begin{equation} f(x) = \frac{\alpha}{k}W\left( \frac{k e^{\frac{1}{k}\beta(x)} }{\alpha \gamma^{1/k}} \right) \end{equation} and if, in particular, \(\beta(x) = -\ln(g(x))\) then

The solution to \( e^{\alpha f(x)} = g(x) \gamma f(x)^k \)

is \begin{equation} f(x) = \frac{\alpha}{k}W\left( \frac{k }{\alpha g(x)^{1/k} \gamma^{1/k}} \right) \end{equation} and if we take \(c = \frac{1}{\gamma}\) and \(g(x)=x\) and \(\alpha = \frac{s}{2}\) and \(k=2\) then p>

The solution to \( e^{-\frac{s}{2}f(x)} x f(x)^2 = c \)

is \begin{equation} f(x) = \frac{s}{2k}W\left( \frac{2k}{s} \sqrt{ \frac{c}{x}} \right) \end{equation}

Wednesday, August 13, 2014

A new use for the number 59575553 (ISDA shorten's 20 character Legal Entity Identifiers to 10 character Unique Trade Identifier Prefixes)

The International Swaps and Dealers Association (ISDA) faced a minor problem recently. In shortening their 20 character Legal Entity Identifiers (LEI's) into 10 character Universal Trade Identifier (UTI) prefixes they ran into collisions.

Somewhat randomly this came across my desk. So under proposal is an improvement for hashing LEI's into UTI prefixes: the modulus operation lifted from integers to the space of case insensitive alpha-numeric strings.

You can grab the python code here. Of course it's hardly a new idea.

Wednesday, May 28, 2014

Sensitivities of a Kalman Filter estimate with respect to all past observations

Consider a sequence of observations \(t_1,...,t_k\) at which a latent vector process \(x\) is observed indirectly, via an observation equation \begin{equation} y_{t_i} = H_i x_{t_i} + \epsilon_i \end{equation} We assume \(\epsilon_i\) is mean zero multivariate gaussian with covariance \(R_i\). For brevity we refer to \(y_{t_i}\) as \(y_i\), \(x_{t_i}\) as \(x_i\) and so forth. We assume the evolution of \(x\) in between the times specified can be written \begin{equation} x_{i+1} = A_i x_i + u_i \end{equation} where \(u_i\) are also gaussian. In this linear gaussian system the recursive estimation of \(x_t\) is achieved by the well known Kalman filter, and the contemporaneous impact of the next observation \(y_{k+1}\) is also (it is merely proportional to the Kalman gain).

But less well appreciated is a related computation, the derivatives of the Kalman filter estimate with respect to a past observation \(y_i\). This note establishes how said computation can be achieved by making two observations. The first is a re-representation of the Kalman estimate in the form of a weighted least squares problem (not dissimilar to the Duncan Horn representation). The second observation is that sensitivities of any weighted least squares problem can be computed using an adjoint trick.

Step 1: The Kalman filter solution as a (particular) least squares problem

We shall set up a least squares problem involving the current state \(x_k\) only, and all the previous observations. We argue that the solution to this problem is identical to the Kalman filter. Since the current estimate \(\hat{y}_k\) is a simple linear function of the current state \(x_k\), this allows us to compute the derivative of the current estimate with respect to all previous observations.

In the Kalman filter we assume a gaussian prior on the initial state \(x_0\). This can introduce annoying special cases in what follows, but we can clean up the notation instead by introducing: \begin{eqnarray} y_{-1} & = & H_{-1} x_{-1} + \epsilon_{-1} \\ x_0 & = & A_{-1} x_{-1} + u_{-1} \end{eqnarray} provided \(H_{-1}\) and \(A_{-1}\) are identity matrices, \(\epsilon{-1}\) is identically zero, \(y_{-1}\) is set equal to the mean of our prior and \(u_0\) adopts its covariance. With the boundary conditions cleaned up in this fashion we can invert the dynamical equations, assuming only that \(A\)'s have left inverses \(A^{-1}\), as follows: \begin{equation} x_j = A^{-1}_{j}\left( x_{j+1} - u_j \right) \end{equation} and then re-arrange the observation equations so that the only value of \(x_i\) that appears is \(x_k\). \begin{eqnarray} y_k & = & H_k x_{k} + \epsilon_k \\ y_{k-1} & = & H_{k-1} x_{k-1} + \epsilon_{k-1} \\ & = & H_{k-1} \left( A^{-1}_{k-1}\left( x_{k} - u_{k-1} \right) \right) + \epsilon_{k-1} \\ & = & H_{k-1} A^{-1}_{k-1} x_{k} - H_{k-1} A^{-1}_{k-1} u_{k-1} + \epsilon_{k-1} \\ y_{k-2} & = & H_{k-2} x_{k-2} + \epsilon_{k-2} \\ & = & H_{k-2} \left( A^{-1}_{k-2}\left( x_{k-1} - u_{k-2} \right) \right) + \epsilon_{k-2} \\ & = & H_{k-2} A^{-1}_{k-2} x_{k-1} - H_{k-2} A^{-1}_{k-2} u_{k-2} + \epsilon_{k-2} \\ & = & H_{k-2} A^{-1}_{k-2} \left( A^{-1}_{k-1}\left( x_{k} - u_{k-1} \right) \right) - H_{k-2} A^{-1}_{k-2} u_{k-2} + \epsilon_{k-2} \\ & = & H_{k-2} A^{-1}_{k-2} A^{-1}_{k-1} x_{k} - H_{k-2} A^{-1}_{k-2} A^{-1}_{k-1} u_{k-1} - H_{k-2} A^{-1}_{k-2} u_{k-2} + \epsilon_{k-2} \\ & \dots & \end{eqnarray} from which it is apparent that if we write \(Y = (y_k, y_{k-1}, y_{k-2},...,y_{-1} ) \) then \begin{equation} Y = G x_{k} + \eta \end{equation} where \(G\) is the concatenation of the coefficients of \(x_k\) given above and \(\eta\) is the gaussian random variable equal to the sum of \(u_k\)'s and \(\epsilon_k\)'s (again, with coefficients as above, leading to a non-trivial covariance structure).

Step 2. (A review of the adjoint trick)

Suppose \(x\) solves \(Qx = b(y)\). The adjoint trick can be used to compute the derivative of \(g(x)\) w.r.t. y. In particular, if \(y\) is the observation and \(x\) the solution of a generalized least squares problem with error covariance \(R\) we can cast it in this form by writing: \begin{eqnarray} g(x)& = & H x Q & = & H^T R^{-1} H \\ b(y) & = & H^T R^{-1} y \end{eqnarray} Consider now \begin{equation} f(x,y) = 0 \end{equation} where \begin{equation} f(x,y) = Q x - b(y) \end{equation} We use derivatives of \begin{equation} \tilde{g} = g - \lambda^T f(x,y) \end{equation} with respect to \(y\) as a means of computing derivatives of \(g\) with respect to \(y\). Note that \begin{equation} \frac{\partial \tilde{g}}{\partial y} = \frac{\partial g}{\partial x}\frac{\partial x}{\partial y} - \lambda^T \left( \frac{ \partial f }{\partial x }\frac{\partial x}{\partial y} + \frac{\partial f}{\partial y} \right) \end{equation} and this will simplify if we choose \(\lambda\) judiciously as a solution of \begin{equation} \frac{\partial g}{\partial x } = \lambda^T \frac{\partial f}{\partial x} \end{equation} which we call the adjoint equation. For then \begin{eqnarray} \frac{\partial \tilde{g}}{\partial y} & = & \frac{\partial g}{\partial x}\frac{\partial x}{\partial y} - \lambda^T \left( \frac{ \partial f }{\partial x }\frac{\partial x}{\partial y} + \frac{\partial f}{\partial y} \right) \\ & = & -\lambda^T \frac{\partial f}{\partial y} \\ & = & \lambda^T \frac{\partial b}{\partial y} \end{eqnarray} Now specializing to \begin{equation} g(x) = H x \end{equation} and \(b(y)\) as above we can solve for this convenient choice of \(\lambda\) by writing \begin{eqnarray} H & = & \frac{\partial g}{\partial x} \\ & = & \lambda^T \frac{\partial f}{\partial x} \\ & = & \lambda^T Q \\ & = & \lambda^T H^T R^{-1} H \end{eqnarray} where the second equality is the adjoint equation. It should be clear from this how to compute derivatives of \(\tilde{g}\) with respect to \(y\), and thereby compute derivatives of \(g\) with respect to \(y\).

Monday, January 13, 2014

A game played on an Ornstein-Uhlenbeck bridge

Let \( X_t \) denote the Ornstein-Uhlenbeck process \begin{equation} dX_t = -\kappa X_t dt + \sigma dW_t \end{equation} where \(X_0=b\) is assumed known. Let \(Y_t = \exp(X_t)\) and without serious loss of generality, choose \(\kappa\) and \(\sigma\) so that the unconditional distribution of \(X_t\) is unit variance - that is to say \(\frac{\sigma^2}{2\kappa} = 1\).
We play a two period game in which we choose a time \(t_1\) and then, after \(Y_{t_1}\) has been revealed to us (or equivalently \(X_{t_1}\)) we choose a second time \(t_2\). We seek to maximize a utility function \begin{equation} U = c Y_{t_1} + Y_{t_2} \end{equation} We can evidently assume \(t_1>0\) without loss of generality. The coefficient \(c\) is the relative importance of the first period payoff versus the second, and for concreteness one might set \(c=1\).
A time coordinate is used merely to make the mathematical representation more familiar. It does not represent a temporal choice, necessarily, but rather an abstraction of a more general search space (such as a location to drill for oil, a choice of marketing strategy, or a technology product). The process \(Y_t\), unknown to us, represent the yet to be discovered variation in revenue as we move across the search space.
We seek to analyze how much exploration one should perform, versus sticking to what we know. Intuitively, if the initial point \(X(0)=b\) (or \(Y(0)=e^b\)) is very high we might not want to risk any departure. Whereas if \(b\) is less than zero we're better off forgetting it altogether (that is, choosing \(t_1 \rightarrow \infty\)). In what follows, we sharpen this intuition.

Lemma A: The single period game.

Conditioned on \(X_0 = b\) the mean and variance of \(X_t\) are given by \begin{eqnarray} \mu(t) & = & b e^{-\kappa t} \\ \nu(t) & = & \frac{ \sigma^2 }{2 \kappa } \left( 1 - e^{-2 \kappa t} \right) \end{eqnarray} So using our simplification \(\frac{\sigma^2}{2\kappa} = 1\) we ought to choose \begin{eqnarray} t_1^* & = & \arg \max E[ e^{X(t)} ] \\ & = & \arg \max \log E[ e^{X(t)} ] \\ & = & \arg \max \left( \mu(t) + \frac{1}{2} \nu(t) \right) \\ & = & \arg \max \left( b \lambda_t + \frac{1}{2} ( 1 - \lambda_t ^2 ) \right) \\ & = & \arg \max -\frac{1}{2} \left( \lambda_t - b \right)^2 + \frac{1}{2} \left( b^2 + 1 \right) \end{eqnarray} for \(\lambda_t = e^{-\kappa t } \) taking values in \( [0,1]\). This has qualitatively different behaviour depending on which of three regions \(b\) falls into. The three possibilities are tabulated below.

Region \(\lambda^*_t\) Optimal \(t\) \( \log E[ e^{X_t} ] \) Strategy
\(b < 0\) 0 \(\infty\) \( \frac{1}{2}\) "Reset"
\(0 < b < 1\) b \(-\frac{1}{\kappa} \log( b )\) \(\frac{1}{2}\left( b^2+1 \right) \)"Explore"
\( b > 1\) 1 \(0\) \(b \)"Stay"

Thus the one period problem has utility \( \zeta(b)\) where \begin{equation} \zeta( x ) = \left\{ \begin{array}{cc} e^{\frac{1}{2}} & x < 0 \\ e^{\frac{1}{2} \left( x^2+1 \right)} & 0 \le x \le 1 \\ e^x & x > 1 \end{array} \right. \end{equation} We denote the corresponding optimal time \(t_1\) by \( \pi^{(1)}( b ) \). Namely: \begin{equation} \pi^{(1)}( b ) = \left\{ \begin{array}{cc} \infty & b < 0 \\ -\frac{1}{\kappa} \log( b ) & 0 \le b \le 1 \\ 0 & b > 1 \end{array} \right. \end{equation}

Lemma B: Optimizing over "outside" strategies.

For brevity we denote two period strategies by a decision function \( \pi( x; t, b ) \) that returns the second time choice \( t_2 = \pi( X(t_1); t, b ) \) after \(X(t_1)\) is revealed. A simple one parameter family of strategies, indexed by the choice \(t_1\), is given by \begin{equation} t_2 = \pi^{(2)}( x ; t_1, b ) = \left\{ \begin{array}{cc} - \pi^{(1)}(b) & x < b \\ t_1 + \pi^{(1)}(b) & x > b \end{array} \right. \end{equation} We call these the "outside" strategies because we assume the best choice for \(t_2\) lies outside the open interval \( (0, t_1) \). If \(X(t_1)\) is greater than our starting point \( X(0)=b \) we will never move left, but instead use the one period solution to move right (or stay). On the other hand if the first point is revealed to be lower than our starting point we explore, if anywhere, to the left of \(0\) instead, again using the one period solution.
The evaluation of this strategy amounts to integration of the one period utility against the (gaussian) distribution of \(X(t_1)\). For instance if \(0 < b < 1\). \begin{eqnarray} L( \pi^{(2)} ) & = & c E[ e^{ X_{t_1} } ] + E[ e^{ X_{t_2} } ] \\ & = & c E[ e^{ X_{t_1} } ] + P( X_{t_1} < b ) \zeta( b ) + P( X_{t_1} > b ) E[ \zeta( X_{t_1} ) | X_{t_1} > b ] \end{eqnarray} We've already noted that the first term is just \( c e^{ \mu(t_1) + \frac{1}{2} \nu(t_1) } \). But since \(\log \zeta(x) \) is piecewise quadratic, the others are also integrals of the type \begin{eqnarray} I(\mu,\sigma;x_1,x_2;a_0,a_1,a_2) := \frac{1}{\sqrt{2 \pi} \sigma } \int_{x_1}^{x_2} e^{-\frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)} e^{a_0 + a_1 x + a_2 x^2} dx \end{eqnarray} admitting analytical solution. It is sufficient to note the following equalities, each achieved by substitution. \begin{eqnarray} \frac{1}{\sqrt{2 \pi} \sigma } \int_{x_1}^{x_2} e^{-\frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2} e^{a_1 x + a_2 x^2} dx & = & \frac{1}{\sqrt{2 \pi} } \int_{x_1/\sigma}^{x_2/\sigma} e^{-\frac{1}{2}\left( u-\mu/\sigma \right)^2} e^{a_1 \sigma x + a_2 \sigma^2 u^2} du \\ \frac{1}{\sqrt{2 \pi}} \int_{x_1}^{x_2} e^{-\frac{1}{2}\left( x-\mu \right)^2} e^{a_1 x + a_2 x^2} dx & = & \frac{1}{\sqrt{2 \pi}} \int_{x_1-\mu}^{x_2-\mu} e^{-\frac{1}{2}u^2} e^{ (a_1 \mu + a_2 \mu^2) + (a_1 + 2 a_2 \mu^2) u + a_2 u^2} du \\ \frac{1}{\sqrt{2 \pi}} \int_{x_1}^{x_2} e^{-\frac{1}{2}x^2 } e^{a_1 x + a_2 x^2} dx & = & \frac{1}{\sqrt{p}} \frac{1}{ \sqrt{2 \pi} } \int_{ x_1 \sqrt{p} }^{ x_2 \sqrt{p} } e^{ -\frac{1}{2}u^2 } e^{ \frac{a_1}{\sqrt{p}} u } du \\ \frac{1}{\sqrt{2 \pi}} \int_{x_1}^{x_2} e^{-\frac{1}{2}x^2 } e^{a_1 x} dx & = & e^{\frac{1}{2}a_1^2} \frac{1}{ \sqrt{2 \pi} } \int_{ x_1-a_1 }^{ x_2 - a_1 } e^{ -\frac{1}{2}u^2 } du \end{eqnarray} In the second to last equality \( p = 1 - 2 a_2 \) and validity requires \( p > 0 \).

Lemma C: If inside strategies are never optimal in the symmetric case, they are never optimal.

For simplicity we'd like to assume what would seem to be the critical case, \(X(t_1)=b\). And we shall indeed show that irrespective of \(b\) we never want to choose \(t_2 \in (0,t_1)\). Intuitively it should also be true that we never wish to choose an inside strategy in the asymmetric case either. To tidy up this dangling thread and establish some formulas we'll need, we set \( X(t_1)=d \). We may assume \( d < b \) without loss of generality (otherwise exchange the roles, remembering that we shall establish our coming result for all values of \( b \) ). Now suppose there is a point \( t_2 \in (0,t_1)\). Using the unconditional means and variances used in the one period problem we apply Bayes Rule to find the conditional density on the Bridge: \begin{eqnarray} \rho( x; t_2 ) & \propto & e^{ -\frac{1}{2} \left( \frac{ x - b e^{-\kappa t_2} } { \sqrt{ 1-e^{-2\kappa t_2} } } \right)^2 } e^{-\frac{1}{2} \left( \frac{ x e^{-\kappa(t_1-t_2) } - d } { \sqrt{ 1-e^{-2\kappa (t_1-t_2) } } } \right)^2 } \\ & = & e^{-\frac{1}{2} \frac{ \left( x - b \lambda_2 \right)^2 } { 1-\lambda_2^2 } } e^{-\frac{1}{2} \frac{ \left( x \lambda - d \right)^2 } { 1-\lambda^2 } } \\ & = & e^{ -\frac{1}{2}\left( g_2 x^2 - 2 g_1 x + \dots \right) } \end{eqnarray} where \( \lambda_2 = e^{-\kappa t_2}\), \( \lambda = e^{-\kappa (t_1-t_2)} \) and matching coefficients of \(x^2\) and \(x\) in the exponent respectively we find \begin{eqnarray} g_2 & = & \frac{1}{1-\lambda_2^2} + \frac{\lambda^2}{1-\lambda^2} \\ g_1 & = & \frac{b \lambda_2}{1-\lambda_2^2} + \frac{d \lambda}{1-\lambda^2} \end{eqnarray} Now the gaussian conditional density \begin{equation} \rho(x) \propto e^{ -\frac{1}{2}\left( g_2 x^2 - 2 g_1 x + \dots \right) } = e^{ -\frac{1}{2} g_2 \left( x - g_1/g_2 \right)^2 + \dots } \end{equation} evidently has precision \(g_2\) and mean \(g_1/g_2\). The precision (i.e. \(1/variance^2\)) is independent of \(d\) whereas the conditional mean is increasing in \( d \). It follows that if the low side of the bridge were raised, the inside option would become more attractive.

Lemma D: The "outside" strategy is no worse than backtracking to the middle of the bridge

From the one period problem and the additive nature of the payoff we know that all other strategies with \( t_2 \ge t_1 \) or \( t_2 < 0 \) are worse. So specializing the formula for the conditional mean and precision given above to the case \(d=b\) we write the conditional mean and variance as \begin{eqnarray} \mu^c(t) & = & \frac{ \frac{b \lambda_2}{1-\lambda_2^2} + \frac{b \lambda}{1-\lambda^2} } { \frac{1}{1-\lambda_2^2} + \frac{\lambda^2}{1-\lambda^2} } = b \frac{ \lambda_2(1-\lambda^2) + \lambda(1-\lambda_2^2) } { 1-\lambda_2^2 \lambda^2 } = b \frac{ \lambda_2 + \lambda}{ 1 +\lambda \lambda_2}\\ \nu^c(t) & = & \left( \frac{1}{1-\lambda_2^2} + \frac{\lambda^2}{1-\lambda^2} \right)^{-2} = \frac{ (1-\lambda_2^2)^2 (1-\lambda^2)^2 } { ( 1- \lambda_2^2 \lambda^2 )^2 } \end{eqnarray} In the case \(\lambda_2 = \lambda\), which is the middle of the bridge, this simplifies further \begin{eqnarray} \mu^c(\lambda_2=\lambda) & = & b \frac{2\lambda}{1+\lambda^2} < b \\ \nu^c(\lambda_2=\lambda) & = & \left( \frac{ 1-\lambda^2 }{ 1+ \lambda^2} \right)^2 \end{eqnarray} But there is an outside point with \(\tilde{\lambda} = \frac{2\lambda}{1+\lambda^2}\) we might choose instead. This, by construction, will have the same mean \(b \tilde{\lambda}\). We notice it also has the same variance: \begin{eqnarray} \nu(\tilde{\lambda}) & = & 1- \tilde{\lambda}^2 \\ & = & \frac{ (1 + \lambda^2)^2 - 4 \lambda^2 }{ (1 + \lambda^2)^2 } \\ & = & \frac{ (1 - \lambda^2)^2 }{ (1 + \lambda^2)^2 } \\ & = & \nu^c(t). \end{eqnarray} Thus we have established an outside point with equivalent utility.

Exhibit E: Pictures looking for proofs.

Returning to the symmetric case \( X(t_1)=b \) we ask is there any value for \(t_2 \) inside the interval \( (0,t_1) \) that is a better choice that going outside the bridge (and using the optimal one period solution). To put it another way, is there any value for \(\lambda\) for which there is any choice of \(\lambda_2\) such that \begin{equation} D(\lambda_2,\lambda,b) = \log \xi(b) - \psi(b) < 0 \end{equation} where \( \psi(b) = \mu^c(t) + \frac{1}{2} \nu^c(t) \) ? Or, using the formulas above, is there a combination \(\lambda_2, \lambda\) for which \begin{equation} \log \xi(b) - \left( b \frac{ \lambda_2 + \lambda}{ 1 +\lambda \lambda_2} + \frac{1}{2} \frac{ (1-\lambda_2^2)^2 (1-\lambda^2)^2 } { ( 1- \lambda_2^2 \lambda^2 )^2 } \right) < 0 ? \end{equation} Here is a picture of the difference between the outside option and inside options in the case \(b=1.5\) that would suggest the difference is always positive:


Here is another, for the case \(b=1.5\).



Lemma F: For \(b \in (0,1)\) the function \(S(u) = D(\lambda_2=u,\lambda=u;b)\) satisfies \(S(u)\ge0 \) for \(u \in (0,1)\) with a single zero at \(u=\frac{1-\sqrt{1+b}}{b}\)

We consider the middle of the symmetric bridge once more, this time varying the width of the bridge. We claim that for \(b \in (0,1)\) there is a particular width bridge for which we are indifferent as to whether we choose the middle of the bridge (i.e. \(t_2=\frac{1}{2} t_1\) or the optimal outside solution. Specializing to this case implies \(\lambda_2=\lambda\) so we write \(S(u) = D(\lambda_2=u,\lambda=u;b)\). From the formula for \(D\) given above we have \begin{eqnarray} S(u) &= & \frac{1}{2}(1+b^2) - \left( b \frac{ 2u}{ 1 + u^2} + \frac{1}{2} \frac{ (1-u^2)^4 } { ( 1- u^4 )^2 } \right) \\ &= & \frac{1}{2}(1+b^2) - \rho(u;b) \end{eqnarray} where \begin{equation} \rho(u;b) = b \frac{ 2u}{ 1 + u^2} + \frac{1}{2} \left( \frac{1-u^2}{1+u^2} \right)^2 \end{equation} A little algebra shows that \begin{equation} \frac{ \partial }{\partial u} \rho(u;b) = \frac{ 2(u^2-1)(bu^2-2u+b)}{ (1+u^2)^3} \end{equation} which is zero for any solution of \(b u^2-2u+b=0\). If \(b<1\) we do indeed want the root in \(0,1\) given by \begin{equation} u = \frac{1-\sqrt{1+b}}{b} \end{equation} So substituting \(u^2 = \frac{2u-b}{b}\) back into \(\rho\) yields, after a little simplification: \begin{equation} \rho\left(u=\frac{1-\sqrt{1+b}}{b};b\right) = \frac{1}{2}(1+b^2) \end{equation} which we recognize as precisely the value of the one period problem. This shows that \begin{eqnarray} S(u) &= & \frac{1}{2}(1+b^2) - \rho(u;b) \\ & \ge & \frac{1}{2}(1+b^2) - \max \rho(u;b) \\ & = & \frac{1}{2}(1+b^2) - \frac{1}{2}(1+b^2) \\ & = & 0 \end{eqnarray} as claimed.

Lemma G: For \(b >1 \) the function \(S(u) = D(\lambda_2=u,\lambda=u;b)\) satisfies \(S(u) > 0 \) for \(u \in (0,1)\). It approaches value zero as \(u \rightarrow 1\), corresponding to the case of a very short bridge.

The limiting case is obvious algebraically and geometrically, since for \(b>1\) we have \begin{eqnarray} S(u) &= & b - \left( b \frac{ 2u}{ 1 + u^2} + \frac{1}{2} \left( \frac{1-u^2}{1+u^2} \right)^2 \right) \\ & \rightarrow & 0 \end{eqnarray} as \(u\rightarrow 1\). Moreover we can simplify to \begin{eqnarray} S(u) &= & \left(\frac{1-u^2}{1+u^2} \right) \left( 1+ \frac{1}{2} \frac{(1-u)^2}{(1+u^2)} \right) \end{eqnarray} which is evidently greater than zero.

Wednesday, November 20, 2013

Keeping punters log-happy: Some properties of a "pristine parimutuel" market clearing mechanism

Is there such a thing as a perfect probabilistic paradise for punters? A place where participants' prior probabilistic prophecies are pulled apart and placed into a philosophically pristine parimutuel posterior, without painful departure from previously perceived pseudo-optimal particulars? I believe the platonic possibility is promising (though to provoke, in practice we puritans may be prevented from partaking in this powerful portal whilst we are paralyzed by the paranoid parlance of the present day).

                                               A pristine market

Well now that I'm out of spit, here are some characteristics of a market for mutually exclusive, exhaustive outcomes that I would like to see:
  1. Every participant is forced to invest in a manner that optimizes their long run wealth. They will allocate their wealth to maximize \(E^P[\log {\rm wealth}] \) with respect to their best estimate of real world probability \(P\).
  2. Every participant's estimate of probability, on which 1. is evidently based, must include both private information (from which they might derive a probability measure \(R\) say) and the market probabilities themselves (call them \(Q\)). In contrast, to simply equate \(P\) with \(R\) in 1. is a dreadful, though rather common, mistake.
  3. Criteria 2. is achieved without the need for participants to monitor and respond to price feedback. In contrast the usual mechanism for including market information, as we find in racetrack parimutuels, is to provide provisional estimates of \(Q\) in realtime so that punters can react.
The last requirement motivated my ponderings on this topic. In our perceived utopia participants are given inventive to provide their own private probabilities \(R\) well before "race time" (as it were). They can then put their feet up and relax safe in the knowledge that 1. will be satisfied automatically.

Such is non-trivial because even investors whose bets are tiny with respect to the overall market volume (and thus have negligible price impact) must quite rationally react to partial price discovery even though in "theory", they should simply bet proportionately irrespective of the odds on offer (a lovely accident mentioned below). I use scare quotes because unless the odds are truly known, \(Q\) should really enter the optimization via the back door: an update to \(P\) acknowledging the market's ability to aggregate information.

In this setup the market clearing mechanism does the heavy lifting normally performed iteratively and imperfectly. So long as there is a rule governing the manner in which "\(R + Q = P\)" (figuratively speaking) we can anticipate every player's estimate of \(P\) given \(Q\), and thereby solve simultaneously for Q and log-optimal allocations that presumably influence Q. Here we consider the simplest rule of that sort. We shall assume P is a convex combination of \(R\) and \(Q\) with fixed coefficients.

Also for simplicity we consider the case where the market is not subsidized (though that might be an interesting direction for generalization). Then linear pricing forces us to adopt the most straightforward parimutuel clearing mechanism once we known the allocations: divide all the money wagered amongst those choosing the winning horse, in proportion to their bet.

                                       Necessary optimality conditions

Suppose market participant \(k\) allocates all his wealth \(W^k\) across \(I\) mutually exclusive outcomes. Suppose his estimate of probability for the \(i\)'th state is given by \begin{equation} p^k_i = \eta^k r^k_i + (1- \eta^k) q_i \end{equation} where \(r^k\) is his best estimate using only private information, and \(q_i\) is the market implied probability arrived at by means to be discussed.

As noted this equation is a statement of a seemingly rational philosophy, independent of how the market operates. Investor \(k\) might have noticed in the past that his private information adds some explanatory power to the market, but he probably shouldn't ignore the market prices altogether in arriving at the best estimate of real world probability.

We shall further suppose, in what follows, that all \(K\) participants are rational in another sense. They wish to optimize the log of their posterior wealth. Now it is well appreciated that if \(p^k_i\) are considered fixed this log-optimality leads simply to proportional betting, but that is not the case here. Only \(r^k_i\) and \(\eta^k\)'s are fixed, and we shall attempt to construct allocations \(u = \{u^k\}\) and clearing prices \(q_i\) that overtly depend on the investments made by participants.

To that end let \(u^k\) denote the fraction of wealth investor \(k\) invests in the \(i\)'th state. Suppose that the market clears in parimutuel fashion, meaning that all participants receive the same price. The market probability for the \(i\)'th state must be \begin{equation} q_i = \frac{ \sum_k u^k_i W^k } { \sum_{k=1}^K W^k } = \sum_k u^k_i W^k \end{equation} since we might as well suppose w.l.o.g. that \( \sum_k W^k = 1 \).

The question then arises, is there a choice of \(\{u^k_i \}_{i,k}\) such that investment by each participant is log-optimal? Intuitively one would expect a market equilibrium provided \(\eta^k\)'s are strictly between \(0\) and \(1\).

The utility function for the \(k\)'th investor is \begin{eqnarray} U^k( u^k ) & = & E^p\left[ \log\left( \frac{u^k_i}{q_i} \right) \right] \\ & = & \sum_{i=1}^I p^k_i \log\left( \frac{u^k_i}{ \sum_k u^k_i W^k } \right) \\ & = & \sum_{i=1}^I ( \eta^k r^k_i + (1- \eta^k) q_i ) \log\left( \frac{u^k_i}{ \sum_k u^k_i W^k } \right) \end{eqnarray} and by definition of \(u^k_i\) the constraints are \(\sum_i u^k_i = 1 \). Or for every \(k\) we might write \( g(u^k)=0\) where \(g(u^k) = \sum_i u^k_i - 1\). This sets up the first order Lagrangian equations for \( \Lambda(u,\lambda) = U(u) - \lambda \cdot g( u ) \) where we collect the components \(k=1,..K\). As usual for these problems we set \( \nabla \Lambda = 0 \) because for optimality the derivative of the objective function must be proportional to the derivative of the constraint function. This leads to equations of the form \begin{eqnarray} 0 & = & p^k_i( q_i(u)) \left( \frac{1}{u^k_i} - \frac{W^k}{q_i(u)} \right) - \eta^k W^k \log \left( \frac{u_i}{q_i(u)} \right) - \lambda^k, \ \ \ {\rm and} \\ 0 & = & \sum_i u^k_i - 1 \end{eqnarray} relating the allocations \(u^k_i\). Notice there are \(KI+K\) equations and \(KI+K\) free variables, including both the \(u^k_i\)'s and the \(\lambda^k\) multipliers (the \(W^k\) are fixed parameters and we have \(q_i(u) = \sum_k u^k_i W^k\) as noted above). The solution to this system of non-linear equations defines a pristine parimutuel clearing mechanism.

                                  Comparison to proportional betting

In contrast if we imagine that \(q_i\) are not a function of allocations \(u^k_i\) but fixed, and further suppose \(\eta^k = 1\) then we return to the overly stylized world where participants don't take market prices into account, preferring to believe their own homework is a sufficient statistic. The optimality conditions are instead: \begin{eqnarray} 0 & = & \frac{p^r_i}{u^k_i} - \lambda^k, \ \ \ {\rm and} \\ 0 & = & \sum_i u^k_i - 1 \end{eqnarray} It is apparent from the first equation that \(u^k_i \propto r^k_i\) and then, from the second, that \(u^k_i = r^k_i\). We recover the remarkable accident referred to as proportional betting, where the price \(q_i\) does not enter the picture. This works perfectly well for blackjack where \(r^k = p^k\) but not most real world games where markets inevitably supplement one's private information.


                                         
                                                     Critique

While I have caged this discussion in market language, it should be apparent that we have derived an algorithm for combining multiple probabilistic forecasts into a single probability distribution \(Q = \{q_i\}\), for which there is a large literature. See the references to Ranjan and Gneiting for example.

I shall concede that what is philosophically inconsistent is my use of simple linear pooling to arrive at the individual's subjective probability estimates \(p^k = \{p^k_i\}_{i=1}^I\) based on their prior \(r^k\) and the final market price \(q_i\), given that we then derive a more sophisticated combination scheme for meshing between individuals. Why not use the "better" scheme to combine \(Q\) and \(R\)?

I suppose a flippant response is "why not put a picture of a boy holding a cereal box on the cereal box?". Let me think about a more satisfying answer and get back to you.