Sunday, September 15, 2013

The Horse Race Problem: A Subspace Solution

The "horse race problem" asks for a reasonable estimate of the relative ability of entrants in a competition given that we know the probabilities of winning. The question is vague, but implies the competition awards a win to the contestant with the highest (or lowest) score, and perhaps further implies that the distribution of contestant scores is a translation family where "ability" is a location parameter.  This post formalizes the problem and provides a simple practical solution.

Geelong Cup Photo Finish 2005

Inferring running time distributions from win probabilities

The question is hardly new. As noted in Probability models for horse race outcomes by Mukhtar M. Ali, one way to estimate race outcome probabilities given a liquid market for the winner is to calibrate distributions of their running times. For example, one might assume the distribution of the \(i\)'th horse's running time \(\{\tau_i\}\)is gaussian, with known variance, and that the location of each horses' distribution is a free parameter to be estimated. More generally we assume all running time distributions belong to a translation family where \(f\) is fixed and a single parameter \(a_i\) characterizes each horse via its running time density $$ f_i(x) = f(x;a_i) = f(x-a_i) $$ We denote the corresponding probability that the \(i\)'th horse's running time \(\tau_i\) is less than \(x\) by $$ F(x - a_i) = Prob( \tau_i < x).$$ Under these assumptions the probability \(p_i\) that the \(i\)'th horse wins the race is\begin{equation} p_i = \int f(x;a_i) \prod_{j \not= i}^k \left(1-F\left(x;a_j\right) \right) dx. \end{equation} If we arbitrarily fix \(a_1=0\) we can presumably solve for \(\{a_i\}_{i>1}\). More on that in a moment.

Motivation (or lack thereof)

On application of the horse race problem is the estimation of so-called exotic probabilities: the probabilities of relevance for racegoers betting on quinellas, exactas (forecast) and trifectas. This presupposes that the market for horses winning a race is more efficient than the markets for various permutations of finishers. There is a small literature aimed at exploiting inefficiencies between the win and place (show) markets at racetracks.

The evidence is not altogether compelling, incidentally. It has been argued here that place prices on at least one betting exchange are roughly as efficient as the win market. And the ability for professional punters to beat the track take is greater for trifectas than the win market (because knowledge of two or three horses can be multiplied, as it were) suggesting that the smart money might be in trifectas instead. In a past life I gathered a fair amount of evidence to suggest just that, so don't rush off to the Dapto dogs armed only with a quaint assumption. Still, the horse race problem as stated remains a interesting mathematical problem and has motivated some people to study order statistics.

Normal running times

As noted by Ali, in the case where \(f = \phi \) is the normal distribution an approximate solution for the \(a_i\) given known \(p_i\) is given by Henery $$ a_i = \frac{ (k-1) \phi\left( \Phi^{-1}(\frac{1}{k}) \right) \left(\Phi^{-1}(p_i)- \Phi^{-1}(\frac{1}{k})\right) }{ \Phi^{-1}\left( \frac{i-\frac{3}{8}}{k + \frac{3}{4} } \right) } $$ Some other approximations are mentioned in this presentation by Victor Lo who summarizes some empirical findings.

Gamma running times

I leave it to the reader to explore another territory where analytic solutions are available. For a treatise on ranking models with gamma processes see Gamma processes, paired comparisons, and ranking by Hal Stern. (Related: Generalized Bradley-Terry models).

Arbitrarily distributed running times by brute force optimization

Here I restrict myself to some observations on the problem where \(f\) is arbitrary. As mentioned above we can set \(a_1=0\) and throw  \(\{a_i\}_{i>1}\) to a solver for \(F=0\). Here \(F = F\left(\{a_i\}_{i\not=1}\right)\) is an \(n\) dimensional vector function taking \(n-1\) arguments, and the \(i\)'th entry of \(F\) is the difference between the computed (\(q_i\)) and desired probabilities: \begin{equation} F_i = \overbrace{\int f(x;a_i) \prod_{j \not= i}^k \left(1-F\left(x;a_j\right) \right) dx}^{q_i} - p_i.\end{equation}This works fine for moderate \(n\), but for \(n>100\) it is time consuming, as can be seen in the following figure.

Computation time in seconds (left) as a function of the number of runners in the race.
Log scale on right. Matlab's implementation of Levenberg-Marquardt optimizer was used.
Solving a race with 200 runners takes on the order of one hour.

A novelty? Arbitrarily distributed running times by subspace optimization

We now introduce a method that cuts the computation time dramatically while retaining a high level of accuracy in the solution. The idea is to restrict the \(n-1\) dimensional search to a subspace of dimension \(m < n-1\). In particular if \(m\) is fixed (say \(m=15\)) then we can compute the solution in approximately linear time. A one hour computation for a large race becomes a one minute computation (not as cool as detexity, but not bad).

Obviously the subspace must be chosen with some care, lest it lie nowhere near the best solution. But before getting to that it also helps to introduce a scaled measure of discrepancy between the desired vector \(p\) and the one computed. We use the log-odds discrepancy:
$$ D(p_i, q_i) = log(p_i)-log(1-p_i)-log(q_i)+log(1-q_i)$$ rather than the naked difference of probabilities in the equation above.

Now to optimize over a subspace we define a map \(\phi(z \in R^m) \mapsto a \in R^n\) as follows. We assume the probabilities are sorted from highest to lowest. We let \(\{c_k\}_{k=1}^{m-1}\) be a collection of centroids obtained by k-means clustering of the differenced log-probabilities \(\{\log(p_1)-\log(p_i)\}\) into \(m-1\) clusters. We then enlarge the collection by adding \(0\) and \(\log(p_1)-\log(p_n)\). After sorting the enlarged list of cluster centroids (padded with extreme points) we have a collection \(c = \{0, \dots, \log(p_1)-\log(p_n)\}\) of cardinality \(m+1\). Then we define an \(n\) by \(m+1\) matrix \(A\) by setting \(A_{1j} = \delta_{1j}\), \(A_{n,j} = \delta_{m+1,j}\) where \(\delta\) is the Dirac delta function. For \(2 \le i \le m\), choosing the convex combination of the nearest two cluster points that returns \(p_i\). Thus there are two non-zero entries of \(A_{ij}\) for all but the first and last rows (unless a \(p_i\) coincides with a centroid). With slight abuse of notation, \(A\) as a linear transformation from \(R^{m+1}\) to \(R^n\) satisfying \(p = A(c)\).

The map \(\phi(z \in R^m) \mapsto a \in R^n\) is defined by composition:
       \phi = A \cdot \iota
$$where in matlab notation we would write \(\iota(z) = [0;z]\). That is to say \(\iota\) is the trivial map from \(R^m\) to \(R^{m+1}\) that assumes the first coordinate is zero. Finally then, we pass the vector function \(\tilde{F}\) taking \(m\) arguments to a solver. The \(i\)'th coordinate \(\tilde{F}\) is given by
      \tilde{F}_i = D\left( q_i\left( \phi\left(z\right)\right), p_i \right)
This works surprisingly well. The intuition is that the sorted abilities \(\{a_i\}\) can be interpolated reasonably (and very roughly, vary with the \(\log\) of \(p_i\)) so long as we have enough knot points. It helps to place the knot points near population centers to reduce the amount of interpolation. As we vary \(z\) we are varying the locations of the knot points and forcing the \(a_i\)'s to move in a subspace, but since the interpolation is pretty good the subspace lies quite close to the solution.

And as noted, it is much faster. The figure below shows computation time as a function of the number of starters when we use subspaces of dimension \(5\) and \(15\). No code optimization has been attempted and undoubtedly, better numbers could be achieved in any number of ways. But the point is that using a subspace search alone results in a \(50\)x improvement or so for \(n \approx 200\). The improvement would of course be much greater, in relative terms, for \(n \gg 200\).

Computation time as a function of race size when using a subspace search
The error introduced by the subspace approximation will vary, of course, but as a rough guide the ratio of \(p_i\) to \(q_i\) differs from unity by about half a percent when using a subspace of dimension five, even when there are \(200\) runners. So a horse at 100/1 might be calibrated at 99.5/1. This error drops by another order of magnitude when using a subspace of dimension \(15\), whereupon the calibrated and desired probabilities are, as a practical matter, indistinguishable.

Possible improvements

For known distributions we ought to be able to use more accurate interpolation. Here I merely assumed that probabilities fall off as the exponential of ability difference. For very large races we should use the stability postulate instead, at least for those in the tail.

A generalization?

It's usually around this point where start to suspect there can't be anything new here. After all, any similar optimization problem where we have an ounce of intuition as to the solution that could be similarly addressed (i.e. by minimizing interpolation with clustering). Perhaps somebody could point me to method XYZ and conclude that the horse race solution presented here is merely an application of the same.


  1. This comment has been removed by the author.

  2. Hello,

    Interesting that someone linked to my website from yours and yet there is no reference to my site ( on yours.

    I wrote a piece about place pricing for horse races, which discussed the efficiency of such prices on the Betfair betting exchange. Conclusion - they are.

    You may be interested in Edelman's alternative approach on page 72 of his The Compleat Horseplayer, which is a lot easier/faster to compute than Henery/Harville et al and yields a comparable figure.

    I've never seen a 200 runner horse race though you do get 40 runner fields in The National. Would be an interesting calavalry charge to the first fence.

  3. Hi James,

    There is a link to your column from this article (where it says "it has been argued..."). Wasn't entirely clear how to reference the article better. I certainly found it interesting. Thanks for mentioning Compleat Horseplayer. David's an old friend.

    Kind regards,


    1. Now I see it. The visited link colour threw me.

      The betting exchanges are much more efficient compared to the pari-mutuel markets. I think I can be sure in assuming that market makers on Betfair's place market are using one of the pricing models to mark-up initial prices. I once built a bot to trade on skewed place prices and only managed to break even.

      A combination of the win and place derived probabilities are then used to mark-up the forecast (1-2) and reverse-forecast (1-2 & 2-1) markets. The pari-mutuel market probs are more hidden than on an online exchange and need teasing out.

      I have corresponded with David Edelman whilst building an algorithm to calculate the Edelman Formula for place prices.

      Your blog is very interesting and I'll continue to tune in.

  4. Lets say there are 8 horses in a race. The favourite is $3.30 (30.3% chance of winning, ignoring the bookies clip on the event). From just these two pieces of information can we calcualte (a) the probability of the horse coming 2nd (b) and prob of coming 3rd? I thought it might be 30.3% * 69.7% = 21% prob of coming 2nd and 30.3% * (21% + 30.3%) = prob of coming 3rd. This results in a 67% probability of coming 1,2,3 meaning place odds greater than $1.49 are generous. Or am I way off?

    1. Your answer assumes that the conditional probability of coming second (if you don't come first) is proportional to your unconditional probability of winning. While that is very roughly true, it isn't accurate enough for more than government work.

    2. So you answer is (in translation) "impossible to calculate any probability based on the limited amount of information".

    3. You need to know the probabilities of all the horses in the race and then using either Harville, Henery or Stern methods to determine place and show probabilities.

      This was a common calculation when there was a big disconnect between win/place/show prices on the Tote (Paris-Mutuel). With exchange betting I am not so sure such inefficiencies exist anymore.

    4. Renee: yes. We don't have enough information to determine the place probabilities from win only. That estimate will ultimately rest on the reasonableness of assumptions made and, of course, empirical evidence where it exists. My point is merely that some methods are more reasonable than others. Here I've considered an approach where the underlying assumption is conceptually simple but the calculations less so. Alternatives mentioned by James have simpler calculations (esp. for Harville) but the underlying assumptions they make are at least partially motivated by convenience of calculation. But as James also points out, any attempt to use these devices, simple or complex, rests on the assumption that win markets are more efficient than show/place to begin with. One would be strongly advised to test that assumption, as James has.