Aug 8, 2007: On the complexity of the Mean Variance Optimization (MVO) problem & its practical solutions

  • Complexity of Mean Variance Optimization (MVO)
  • Using a fast greedy algorithm to quickly solve MVO
  • Using Simulated Annealing to better solve MVO
  • More algorithm options
  • On the complexity of MVO

    Consider the task of picking, say, 14 ETFs out of the top 40 ETFs. This task which sounds so innocent turns out to be a hard computational problem since as we know from our combinatorics classes, the number of different ways to pick K items out of N items can be very large:

    Where K=14 out of N=40 gives over 23 billion possibilities:
    C(40,14) = (40!/(14!*(40-14)!)) = 23,206,929,840

    Our portfolio composition problem is even harder because in order to be able to tell which of any two portfolios is better, we need to compute the overall portfolio mean correlation for each and every portfolio we compare. More significantly: the optimal solution may not even be in the "14 out of 40" space, it may be a result of picking 10 out of the top 30 candidates, or whatever.

    The solution lies in a compromise: we want to find a solution that is good enough and likely close enough to the optimal solution without having to go over all possible solutions.

    Using a fast greedy algorithm to quickly solve MVO

    One of the algorithms I implemented to find a near optimal solution in a large space really fast is a Greedy algorithm (wikipedia link) implemented this way:

    This algorithm is very fast for portfolio sizes of 6-20 out of 20-50 or so candidates. It takes less than 2 seconds to find a near optimal portfolio of 14 components out of 45 on my dual-core Opteron(tm) desktop using an interpreted language (perl). It is simple and straight forward, and it is easy to run it serially over NxM combinations of N and M (where N < M) and pick the best solution among all runs.

    Using Simulated Annealing to better solve MVO

    An even better algorithm, which takes longer, but gets closer to an optimal solution in most cases is Simulated Annealing (wikipedia link). Using the simulated annealing algorithm with some little clever adjustments, I'm able to converge on a pretty good solution (usually better than the greedy algorithm version) with a low mean correlation in about 5-10 seconds.

    The idea of simulated annealing is to look for a global maximum in a terrain, by starting with some arbitrary solution, then wandering around trying to find better neighboring solutions. To pick a "neighboring" solution we apply one small change to the portfolio: e.g. add, remove, or replace one ETF. To decide if it is better we evaluate the new solution in terms of fitness (goodness). In our case better means a lower mean-correlation between components.

    We always keep a record of the best solution found so far.

    If a new solution is better, we switch to it and continue the loop looking for the next neighbor.

    If the neighbor portfolio is not better, we randomly, with some (low) probability switch to the neighbor anyway.

    Finally, we gradually lower the probability of switching to a random neighbor the closer we get to the end.

    The occasional random jump is designed to avoid getting stuck in a local maximum, i.e. to keep searching for the global maximum.

    The ending condition can be either of

    There's more to it. For example:

    and a few more small details each of which has actually made a difference in the speed of converging on a good solution fast, in my implementation.

    Acknowledgement: I would like to thank Dror Sneh and Gideon Intrater for suggesting better algorithms and Simulated Annealing to me during the last 4th of July weekend, when I described the problem to them. (It sounds so obvious in hindsight.)

    Note: there are several machine-learning optimization techniques that should converge much faster (e.g. see: Gradient Descent, Quasi-Newton methods, SR1 method, BFGS ) but they require a differentiable function (being able to know the gradient at each solution) so they aren't applicable in this case where there is a very large number of local maxima, and the target function is not continuous/differentiable.

    More algorithm options

    To make my implementation more flexible I added two options to both algorithms:

    Here's an example of a Simulated Annealing run trying to pick the best Sharpe-ratio sub-portfolio out of 45 candidates, in 20011 iterations. A precondition (-r 0.85) requires than no two ETFs in the portfolio will have a higher correlation than 0.85.

    In this example run, the algorithm progressively finds better and better solutions (17 times). The best solution in a row is found in iteration 4003 of 20011.

    The result is a strong 7-ETF portfolio with nice weekly Sharpe-ratio of 0.22 and a low mean internal correlation of 0.72. In back-testing it shows a return of 8.06% in the past 18 weeks (a bit over 4 months), including the recent (July-Aug 2007) correction. The assumption is that it is likely to continue the strong performance with low risk, in the coming months:

    $ etf-cor -alps -R 5 -r 0.85 14 18w 0w 1w DIA PRFN JKF JKI TTH JKD DFE DKA
    Annealing:       1/20011     [best #1 found @ 1] 0.1655
    Annealing:       3/20011     [best #2 found @ 3] 0.1850
    Annealing:       4/20011     [best #3 found @ 4] 0.1856
    Annealing:       5/20011     [best #4 found @ 5] 0.2009
    Annealing:       7/20011     [best #5 found @ 7] 0.2535
    Annealing:       8/20011     [best #6 found @ 8] 0.2874
    Annealing:      10/20011     [best #7 found @ 10] 0.2967
    Annealing:      11/20011     [best #8 found @ 11] 0.3046
    Annealing:      12/20011     [best #9 found @ 12] 0.3287
    Annealing:      18/20011     [best #10 found @ 18] 0.3403
    Annealing:      20/20011     [best #11 found @ 20] 0.3467
    Annealing:      21/20011     [best #12 found @ 21] 0.3526
    Annealing:      28/20011     [best #13 found @ 28] 0.3723
    Annealing:      48/20011     [best #14 found @ 48] 0.3838
    Annealing:   20011/20011     [best #17 found @ 4003] 0.4217   rrrr
    Portfolio[7]:   TTH EWY VDE UTH VOX PRFN DIA
    %Change[18]:    1.45 0.25 1.65 1.56 0.56 1.69 1.04 0.86 1.23 -2.57 2.84
    %0.03 -1.07 3.01 2.21 -0.53 -4.94 -1.16
    Correlation matrix [18w..0w/1w]:
           TTH  EWY  VDE  UTH  VOX PRFN  DIA
    TTH      -  .41  .40  .67  .80  .59  .68
    EWY    .41    -  .83  .57  .67  .73  .68
    VDE    .40  .83    -  .70  .76  .85  .81
    UTH    .67  .57  .70    -  .78  .82  .92
    VOX    .80  .67  .76  .78    -  .85  .85
    PRFN   .59  .73  .85  .82  .85    -  .94
    DIA    .68  .68  .81  .92  .85  .94    -
                                            Mean correlation: 0.729536
    [18w-0w/1w 7]   Portfolio          SPY  Portfolio-vs-SPY (> 1.0 is better)
    -------------   ---------       ------  ----------------
    %Total_Return:       8.06         1.12              7.17
    %Annualized:        24.24         3.18              7.62
    %Return_Mean:        0.43         0.06              6.94
    %Return_StdDev:      1.95         1.87              0.96
    %Max_DrawDown:      -4.94        -5.47              1.11
    %Alpha(annual):     20.44         0.00                 -
    Beta:                1.02         1.00              0.99
    Alpha/Beta:         20.13         0.00                 -
    R(correlation):      0.97         1.00              1.03
    %R2:                94.49       100.00              1.06
    Sharpe_ratio:      0.2217       0.0332              6.68
    Sortino_ratio:     0.4248       0.0612              6.94
    Current (2007-08-03) MMVR rankings:
    1        2.8472 EWY     iShares MSCI South Korea Index
    2        2.3886 TTH     Telecom HOLDRs
    15       1.5022 VOX     Vanguard Telecom Services VIPERs
    43       1.0927 VDE     Vanguard Energy VIPERs
    50       1.0524 DIA     DIAMONDS Trust, Series 1
    102      0.4591 UTH     Utilities HOLDRs
    120      0.3420 PRFN    PowerShares FTSE RAFI Industrials

    -- ariel