## Monday, September 2, 2013

### Quasi-random sampling subject to linear and moment constraints (or: "how to bypass model calibration")

Its a rainy long weekend in New York and I've been dusting off an old experiment while listening to the thunder at a friend's place. I was particularly impressed by the fact they had an outlet on their porch positioned right by a comfy chair.

Here's the motivation. In finance we sometimes find ourselves attempting to calibrate a probabilistic model for an underlying random variable (such as a stock price) subject to known constraints (such as the price of an option on the same). Often, we then draw samples from the same model for the purpose of pricing other securities, performing risk calculations or satisfying some other need.

\begin{equation} observed\ prices \rightarrow calibrated\ model \rightarrow random\ samples \end{equation} This has several advantages including the built in regularization we get by using a parametrized model. There is on obvious drawback however: one has to come up with the model!

A while ago I was motivated to skip the calibration step. I was inspired by the herding literature in machine learning, and I started playing with a more direct path: \begin{equation} observed\ prices \rightarrow random\ samples \end{equation}.The idea is that we devise quasi-random sampling schemes that automatically achieve the calibration while striving to fill out the space in a way that achieves some of the objectives of a calibrated model.

Best described by an example I feel. And in this one we consider samples on a discrete lattice where smoothness of the sampled distribution is controlled using the discrete Laplacian.

Example

I will lead with the picture. The middle plot demonstrates that we are able to generate samples whose histogram is quite smooth, yet satisfy the constraints provided. In this case we've chosen a two dimensional state space, as might commonly occur when (strike, maturity) or say (attachment point, maturity) axes are in play. Discrete Laplacian minimizing quasi-random samples (middle) approximating the true distribution (left). For comparison we also show a minimal surface (right) generated using the same constraints with direct optimization.
We've started with some fake distribution on the left, created some functions of the same and used those functions as constraints.

Aspirations and notation

We allow a lattice to be determined by axes $$x^{(1)}=\{x^{(1)}_1 >...> x^{(1)}_{n_1}\}$$ and $$x^{(2)}=\{x^{(2)}_1 >... > x^{(2)}_{n_2}\}$$. We suppose that $$F = \{f_k\}_{k \in K}$$ is a collection of functions on the implied lattice whose means are known constants $$\mu_k$$. That is,
\begin{equation}
\mu_k = E[f_k] \ \  \ \ k \in K.
\end{equation}
with respect to some density that we wish to sample from. Specifically we want to draw samples $$z=\{z^i\}_{i=1}^I = \{(z^i_1,z^i_2)\}_{i=1}^I$$ such that
\begin{equation}
\hat{f_k} := \frac{1}{I}\sum_{i=1}^I f_k(z^i)
\end{equation}
satisfies approximate equalities
\begin{equation}
\hat{f_k}(z) \approx \mu_k, \ \ k \in K.
\end{equation}
It will turn out that using equalities here involves no loss of generality. Indeed a cute feature of the algorithm exhibited here is that inequalities are treated in the same manner as equalities: we simply use the same function twice with the same weight $$w$$ but different $$\mu$$'s. The lower $$\mu$$ is interpreted as a lower bound and the higher as an upper bound.

Now for notational convenience we might as well consider $$f_k$$ to be functions defined on the integer lattice $$\{1..n_1\} \times \{1..n_2\}$$, identifying $$f(r,s) := f(x^{(1)}_r,x^{(2)}_s)$$. We also identify samples $$(z^{(1)}_r,z^{(2)}_s)$$ with their integer pairs $$(r,s)$$. We denote by $$Z^i(r,s)$$ the number of samples falling on the lattice point.

A fast algorithm for generating quasi-random samples

In what follows we will use a prior probability $$\phi(r,s).$$ defined on the lattice. It might might be determined by minimization of the square of the discrete Laplacian subject to the equality and inequality constraints given, or by other means.

The idea is to start lobbing samples on the lattice (like stackable Go stones) and watch the distribution emerge. By monitoring the discrete Laplacian we can try to coerce the samples to fill in holes - smoothing things out as it were. By monitoring the expectation of the constraint functions $$f_k$$ with respect to the pseudo-sample distribution thus far, we can correct for bias.

For this to be efficient we need a very efficient means of bookkeeping.

Initialization:

At the outset we fix a constant lattice function $$A_k(r,s) = signum \left( \mu_k - f(r,s) \right)$$ for $$k \in K$$. We initialize the laplacian $$D(r,s)=0$$ for all lattice points $$(r,s)$$. We initialize counters $$S^k_0 = 0$$ for $$k \in K$$.

Selecting the next quasi-random sample:

Beginning with quasi-random sample $$i=1$$ we perform the following steps.

1. Set urgency $$\nu(r,s) = D(r,s)$$ then for $$k \in K$$ update it by $$\nu(r,s) \rightarrow \nu(r,s) + w_k A_k(r,s) signum( S^k - i\mu_k )$$
2. Choose the most urgent lattice point after weighting: $$(r_i,s_i) = \arg \max \nu(r,s) \phi(r,s)$$
3. Increment the sample count for the lattice point:  $$Z(r_i,s_i) \rightarrow Z(r_i,s_i) + 1$$
4. Decrement $$D(r_i,s_i) \rightarrow D(r_i,s_i) -1$$ then increment $$D(r^*,s^*) \rightarrow D(r^*,s^*) + \frac{1}{4}$$ for all lattice points $$r^*,s^*$$ neighbouring $$r_i,s_i$$.
5. Update $$S^k \rightarrow S^k + f^k(r_i,s_i)$$.

What we are doing here is selecting a new point that improves as many of the biases as possible, subject to the discrete Laplacian not getting out of whack.

Selecting the next quasi-random sample (slightly more efficient):

It turns out we can be a little sneakier and initialize $$\nu(r,s) = \phi(r,s)$$. Then the following algorithm  is equivalent and somewhat more efficient.
1. Choose the most urgent lattice point after weighting: $$(r_i,s_i) = \arg \max \nu(r,s) \phi(r,s)$$
2. Increment the sample count for the lattice point:  $$Z(r_i,s_i) \rightarrow Z(r_i,s_i) + 1$$
3. Update $$S^k = S^k_{prev} + f^k(r_i,s_i)$$
4. For $$k \in K$$ update $$\nu(r,s) \rightarrow \nu(r,s) + w_k A_k(r,s) \left( signum(S^k-i \mu_k) - signum(S^k_{prev} - (i-1) \mu_k \right)$$
5. Decrement $$\nu(r_i,s_i) \rightarrow \nu(r_i,s_i) -1$$ then for all four neighbours $$(r^*,s^*)$$ of $$(r_i,s_i)$$ increment $$\nu(r_i,s_i) = \nu(r_i,s_i) + \frac{1}{4}$$.
6. Set $$S^k_{prev} = S^k$$.
Notice that we don't really need to consider the discrete Laplacian and the moment terms separately, since we only care about their direct tradeoff.

A minor practical tweak

It seems to help to use a tuning parameter $$\lambda$$ that weighs the relative importance of satisfying the constraints $$f_k$$ versus minimizing the discrete Laplacian. Left as an exercise for the interested reader.

#### 1 comment:

1. Hello, This looks interesting. Did you try to do for other distribution ? Kev