The "horse race problem" asks for a reasonable estimate of the relative ability of entrants in a competition given that we know the probabilities of winning. The question is vague, but implies the competition awards a win to the contestant with the highest (or lowest) score, and perhaps further implies that the distribution of contestant scores is a translation family where "ability" is a location parameter. This post formalizes the problem and provides a simple practical solution.

Geelong Cup Photo Finish 2005 
Inferring running time distributions from win probabilities
The question is hardly new. As noted in
Probability models for horse race outcomes by Mukhtar M. Ali, one way to estimate race outcome probabilities given a liquid market for the winner is to calibrate distributions of the horses' running times. For example, one might assume the distribution of the \(i\)'th horse's running time \(\tau_i\) is Gaussian, with known variance, and that the location of each horse's distribution is a free parameter to be estimated.
More generally we assume all running time distributions belong to a translation family where \(f\) is fixed and a single parameter \(a_i\) characterizes each horse via its running time density
$$ f_i(x) = f(x;a_i) = f(x-a_i) $$
We denote the corresponding probability that the \(i\)'th horse's running time \(\tau_i\) is less than \(x\) by
$$ F(x; a_i) = \mathrm{Prob}( \tau_i < x).$$
Under these assumptions the probability \(p_i\) that the \(i\)'th horse wins the race is\begin{equation} p_i = \int f(x;a_i) \prod_{j \not= i}^k \left(1-F\left(x;a_j\right) \right) dx. \end{equation} If we arbitrarily fix \(a_1=0\) we can presumably solve for \(\{a_i\}_{i>1}\). More on that in a moment.
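For concreteness, the win probability integral can be evaluated numerically. Below is a minimal sketch for the Gaussian case with unit variance; the function name and integration grid are my own choices, not anything from the references.

```python
# Numerical sketch of the win probability integral for normal running
# times with unit variance. The grid and function name are illustrative.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

def win_probabilities(a, grid=np.linspace(-10, 10, 2001)):
    """p_i = int f(x - a_i) prod_{j != i} (1 - F(x - a_j)) dx."""
    a = np.asarray(a, dtype=float)
    f = norm.pdf(grid[:, None] - a[None, :])        # densities f(x - a_j)
    S = 1.0 - norm.cdf(grid[:, None] - a[None, :])  # survival 1 - F(x - a_j)
    p = np.empty(len(a))
    for i in range(len(a)):
        others = np.prod(np.delete(S, i, axis=1), axis=1)
        p[i] = trapezoid(f[:, i] * others, grid)
    return p

p = win_probabilities([0.0, 0.0, 0.0])   # identical horses: each about 1/3
```

By symmetry three identical horses should each win with probability one third, which makes a convenient sanity check on the quadrature.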
Motivation (or lack thereof)
One application of the horse race problem is the estimation of so-called exotic probabilities: the probabilities of relevance for racegoers betting on quinellas, exactas (forecast) and trifectas. This presupposes that the market for horses winning a race is more efficient than the markets for various permutations of finishers. There is a small literature aimed at exploiting inefficiencies between the win and place (show) markets at racetracks.
The evidence is not altogether compelling, incidentally. It has been argued
here that place prices on at least one betting exchange are roughly as efficient as the win market. And the ability of professional punters to beat the track take is greater for trifectas than for the win market (because knowledge of two or three horses can be multiplied, as it were), suggesting that the smart money might be in trifectas instead. In a past life I gathered a fair amount of evidence to suggest just that, so don't rush off to the Dapto dogs armed only with a quaint assumption. Still, the horse race problem as stated remains an interesting mathematical problem and has motivated some people to study order statistics.
Normal running times
As noted by Ali, in the case where \(f = \phi \) is the normal distribution an approximate solution for the \(a_i\) given known \(p_i\) is given by Henery
$$
a_i = \frac{ (k-1)\, \phi\left( \Phi^{-1}\left(\tfrac{1}{k}\right) \right) \left(\Phi^{-1}(p_i) - \Phi^{-1}\left(\tfrac{1}{k}\right)\right) }{ \Phi^{-1}\left( \frac{i-\frac{3}{8}}{k + \frac{3}{4} } \right) }
$$
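Transcribed directly into Python the approximation reads as follows; the indexing assumes probabilities are sorted from highest to lowest so that \(i\) is the rank, and the function name is mine.

```python
# A direct transcription of the Henery approximation displayed above,
# assuming probabilities sorted from highest to lowest. A sketch only;
# the function name is illustrative.
import numpy as np
from scipy.stats import norm

def henery_abilities(p):
    p = np.asarray(p, dtype=float)   # sorted descending
    k = len(p)
    i = np.arange(1, k + 1)
    num = (k - 1) * norm.pdf(norm.ppf(1.0 / k)) * (norm.ppf(p) - norm.ppf(1.0 / k))
    den = norm.ppf((i - 3.0 / 8.0) / (k + 3.0 / 4.0))
    return num / den
```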
Some other approximations are mentioned in
this presentation by Victor Lo who summarizes some empirical findings.
Gamma running times
I leave it to the reader to explore another territory where analytic solutions are available. For a treatise on ranking models with gamma processes see
Gamma processes, paired comparisons, and ranking by Hal Stern. (Related:
Generalized Bradley-Terry models).
Arbitrarily distributed running times by brute force optimization
Here I restrict myself to some observations on the problem where \(f\) is arbitrary. As mentioned above we can set \(a_1=0\) and throw \(\{a_i\}_{i>1}\) to a solver for \(F=0\). Here \(F = F\left(\{a_i\}_{i\not=1}\right)\) is an \(n\)-dimensional vector function taking \(n-1\) arguments, and the \(i\)'th entry of \(F\) is the difference between the computed (\(q_i\)) and desired probabilities: \begin{equation} F_i = \overbrace{\int f(x;a_i) \prod_{j \not= i}^k \left(1-F\left(x;a_j\right) \right) dx}^{q_i} - p_i.\end{equation}This works fine for moderate \(n\), but for \(n>100\) it is time consuming, as can be seen in the following figure.

Computation time in seconds (left) as a function of the number of runners in the race.
Log scale on right. Matlab's implementation of the Levenberg-Marquardt optimizer was used. 
Solving a race with 200 runners takes on the order of one hour.
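The brute-force setup can be sketched as follows, again for the unit-variance normal case. I use scipy's Levenberg-Marquardt solver in place of Matlab's, and all function names are my own.

```python
# Brute-force solver sketch: pin a_1 = 0 and hand the remaining
# abilities to a Levenberg-Marquardt solver that drives q_i - p_i to
# zero. Unit-variance normal running times; names are illustrative.
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import least_squares
from scipy.stats import norm

GRID = np.linspace(-10, 10, 2001)

def computed_probs(a):
    """q_i for abilities a under the translation model."""
    f = norm.pdf(GRID[:, None] - a[None, :])
    S = 1.0 - norm.cdf(GRID[:, None] - a[None, :])
    return np.array([trapezoid(f[:, i] * np.prod(np.delete(S, i, axis=1), axis=1), GRID)
                     for i in range(len(a))])

def solve_abilities(p):
    p = np.asarray(p, dtype=float)
    def residual(a_rest):
        a = np.concatenate(([0.0], a_rest))   # arbitrarily fix a_1 = 0
        return computed_probs(a) - p          # F_i = q_i - p_i
    res = least_squares(residual, x0=np.zeros(len(p) - 1), method='lm')
    return np.concatenate(([0.0], res.x))
```

The cost grows quickly with \(n\) because every residual evaluation involves \(n\) integrals, each with an \(n-1\) term product inside.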
A novelty? Arbitrarily distributed running times by subspace optimization
We now introduce a method that cuts the computation time dramatically while retaining a high level of accuracy in the solution. The idea is to restrict the \(n-1\) dimensional search to a subspace of dimension \(m < n-1\). In particular if \(m\) is fixed (say \(m=15\)) then we can compute the solution in approximately linear time. A one hour computation for a large race becomes a one minute computation (not as cool as
detexity, but not bad).
Obviously the subspace must be chosen with some care, lest it lie nowhere near the best solution. But before getting to that it also helps to introduce a scaled measure of discrepancy between the desired vector \(p\) and the one computed. We use the log-odds discrepancy:
$$ D(p_i, q_i) = \log(p_i)-\log(1-p_i)-\log(q_i)+\log(1-q_i)$$ rather than the naked difference of probabilities in the equation above.
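In code the discrepancy is a one-liner; a sketch, with the function name my own:

```python
# Log-odds discrepancy between desired probabilities p and computed
# probabilities q. The function name is illustrative.
import numpy as np

def log_odds_discrepancy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.log(p) - np.log1p(-p) - np.log(q) + np.log1p(-q)
```

Working in log-odds weights a longshot's relative error sensibly: a horse calibrated at \(q_i = 0.02\) against a desired \(p_i = 0.01\) registers a large discrepancy even though the naked difference is only a percent.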
Now to optimize over a subspace we define a map \(\phi(z \in R^m) \mapsto a \in R^n\) as follows. We assume the probabilities are sorted from highest to lowest. We let \(\{c_k\}_{k=1}^{m-1}\) be a collection of centroids obtained by k-means clustering of the differenced log-probabilities \(\{\log(p_1)-\log(p_i)\}\) into \(m-1\) clusters. We then enlarge the collection by adding \(0\) and \(\log(p_1)-\log(p_n)\). After sorting the enlarged list of cluster centroids (padded with extreme points) we have a collection \(c = \{0, \dots, \log(p_1)-\log(p_n)\}\) of cardinality \(m+1\). Then we define an \(n\) by \(m+1\) matrix \(A\) by setting \(A_{1j} = \delta_{1j}\) and \(A_{nj} = \delta_{m+1,j}\), where \(\delta\) is the Kronecker delta. Each remaining row \(2 \le i \le n-1\) is filled by choosing the convex combination of the nearest two cluster points that returns \(p_i\) (more precisely, its differenced log-probability). Thus there are two nonzero entries of \(A_{ij}\) for all but the first and last rows (unless a \(p_i\) coincides with a centroid). With slight abuse of notation, we regard \(A\) as a linear transformation from \(R^{m+1}\) to \(R^n\) satisfying \(p = A(c)\).
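The construction of \(A\) might be sketched as follows. The k-means step comes from scipy, and since scipy's kmeans may occasionally return fewer centroids than requested the matrix is sized off the knot list; function names are mine.

```python
# Sketch of the interpolation matrix A: cluster the differenced
# log-probabilities, pad the centroids with the two extremes, and fill
# each interior row with the convex combination of the two nearest
# knots. Probabilities are assumed sorted from highest to lowest.
import numpy as np
from scipy.cluster.vq import kmeans

def build_interpolation_matrix(p, m):
    p = np.asarray(p, dtype=float)
    n = len(p)
    d = np.log(p[0]) - np.log(p)                  # differenced log-probabilities
    centroids, _ = kmeans(d[:, None], m - 1)      # m - 1 cluster centroids
    c = np.sort(np.concatenate(([0.0], centroids.ravel(), [d[-1]])))
    A = np.zeros((n, len(c)))
    A[0, 0] = 1.0                                 # first row pinned to the 0 knot
    A[-1, -1] = 1.0                               # last row pinned to the extreme knot
    for i in range(1, n - 1):
        j = np.searchsorted(c, d[i], side='right') - 1
        j = min(max(j, 0), len(c) - 2)
        w = (d[i] - c[j]) / (c[j + 1] - c[j])     # convex weight on the upper knot
        A[i, j], A[i, j + 1] = 1.0 - w, w
    return A, c
```

By construction \(A\) applied to the knot vector \(c\) reproduces the differenced log-probabilities exactly, whatever centroids the clustering happens to find.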
The map \(\phi(z \in R^m) \mapsto a \in R^n\) is defined by composition:
$$
\phi = A \cdot \iota
$$where in Matlab notation we would write \(\iota(z) = [0;z]\). That is to say \(\iota\) is the trivial map from \(R^m\) to \(R^{m+1}\) that assumes the first coordinate is zero. Finally then, we pass the vector function \(\tilde{F}\) taking \(m\) arguments to a solver. The \(i\)'th coordinate of \(\tilde{F}\) is given by
$$
\tilde{F}_i = D\left( q_i\left( \phi\left(z\right)\right), p_i \right)
$$
This works surprisingly well. The intuition is that the sorted abilities \(\{a_i\}\) can be interpolated reasonably (and, very roughly, they vary with the \(\log\) of \(p_i\)) so long as we have enough knot points. It helps to place the knot points near population centers to reduce the amount of interpolation. As we vary \(z\) we are varying the locations of the knot points and forcing the \(a_i\)'s to move in a subspace, but since the interpolation is pretty good the subspace lies quite close to the solution.
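Putting the pieces together, here is a compact self-contained sketch of the subspace search for the unit-variance normal case. For brevity the knots are placed at evenly spaced quantiles of the differenced log-probabilities rather than by k-means (which does not change the mechanics), the integration helper is repeated so the block stands alone, and all names are illustrative.

```python
# End-to-end sketch of the subspace search: build knots c and matrix A,
# then solve for z in R^m with a_1 pinned to zero via iota(z) = [0; z],
# using the log-odds discrepancy as the residual. Knots here are simple
# quantiles rather than k-means centroids; names are illustrative.
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import least_squares
from scipy.stats import norm

GRID = np.linspace(-10, 10, 2001)

def computed_probs(a):
    f = norm.pdf(GRID[:, None] - a[None, :])
    S = 1.0 - norm.cdf(GRID[:, None] - a[None, :])
    return np.array([trapezoid(f[:, i] * np.prod(np.delete(S, i, axis=1), axis=1), GRID)
                     for i in range(len(a))])

def solve_subspace(p, m):
    p = np.sort(np.asarray(p, dtype=float))[::-1]     # highest to lowest
    n = len(p)
    d = np.log(p[0]) - np.log(p)                      # differenced log-probabilities
    c = np.interp(np.linspace(0, 1, m + 1), np.linspace(0, 1, n), d)  # m + 1 knots
    A = np.zeros((n, m + 1))
    for i in range(n):
        j = np.clip(np.searchsorted(c, d[i], side='right') - 1, 0, m - 1)
        w = (d[i] - c[j]) / (c[j + 1] - c[j])         # convex weight on the upper knot
        A[i, j], A[i, j + 1] = 1.0 - w, w
    def residual(z):
        q = computed_probs(A @ np.concatenate(([0.0], z)))   # phi = A . iota
        return (np.log(q) - np.log1p(-q)) - (np.log(p) - np.log1p(-p))
    z = least_squares(residual, x0=c[1:]).x           # start from a = d
    return A @ np.concatenate(([0.0], z))
```

Note the solver now sees only \(m\) free coordinates, so the finite-difference Jacobian costs \(m+1\) residual evaluations per iteration rather than \(n\), which is where the speedup comes from.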
And as noted, it is much faster. The figure below shows computation time as a function of the number of starters when we use subspaces of dimension \(5\) and \(15\). No code optimization has been attempted and undoubtedly better numbers could be achieved in any number of ways. But the point is that using a subspace search alone results in a \(50\)x improvement or so for \(n \approx 200\). The improvement would of course be much greater, in relative terms, for \(n \gg 200\).

Computation time as a function of race size when using a subspace search 
The error introduced by the subspace approximation will vary, of course, but as a rough guide the ratio of \(p_i\) to \(q_i\) differs from unity by about half a percent when using a subspace of dimension five, even when there are \(200\) runners. So a horse at 100/1 might be calibrated at 99.5/1. This error drops by another order of magnitude when using a subspace of dimension \(15\), whereupon the calibrated and desired probabilities are, as a practical matter, indistinguishable.
Possible improvements
For known distributions we ought to be able to use more accurate interpolation. Here I merely assumed that probabilities fall off as the exponential of ability difference. For very large races we should use the stability postulate instead, at least for those in the tail.
A generalization?
It's usually around this point that I start to suspect there can't be anything new here. After all, any similar optimization problem where we have an ounce of intuition as to the solution could be similarly addressed (i.e. by minimizing interpolation error with clustering). Perhaps somebody could point me to method XYZ and conclude that the horse race solution presented here is merely an application of the same.