March 22, 2008: Other lessons from the Netflix Challenge

  • Market update
  • Other lessons from the Netflix Challenge

  • Introduction

    It is not often that one gets a chance to watch the "best of the best" brains in the world competing for a $1M prize while getting a rare peek into their way of thinking: the times when ideas hit a wall, and the times when significant breakthroughs are made.

    Last year I had this chance, and the more I looked, the more obvious it became to me that what I saw had direct applicability to my personal quest for an investing model.

    Then I had a moment of enlightenment.

    Market Update

    One of the worst ways to invest is to follow prevailing sentiment.

    When you feel really bad and have an urge to sell everything and run for the hills, and when everyone around you is losing their last hope, that's usually a strong contrarian indicator.

    A few observations on the recent events:

    Yes, it can get worse. Nothing is 100% certain in the markets, but I would strongly advise covering all shorts at times like this, when so many signals point to at least a meaningful intermediate bottom.

    Moreover, some big winners in the financial sector are starting to emerge. JP Morgan is the most obvious one. JPM got Bear Stearns essentially for free (235 million dollars, when its building alone is estimated to be worth $1.5B) plus a guarantee from the government covering the most toxic pieces of Bear Stearns' balance sheet. This is essentially a repeat of the panic of 1907 (Wikipedia), when JP Morgan made off like a bandit while weaker players got totally wiped out.

    And lastly, check this classic bottom indicator: expectations are so bad that terrible news starts looking good when compared to end-of-the-world scenarios. Both Goldman Sachs and Lehman Brothers announced pretty bad results: much worse than last year, big write-offs from sub-prime related fallout, you name it.

    What did their stocks do? They jumped sharply up. Here's one reference: Lehman net income drops 57%, Stock surges 46%.

    Another observation: Lehman Brothers Holdings Inc. is the fourth-biggest U.S. securities firm. So we just had the failure of the fifth-biggest firm, and the restoration of confidence in the fourth-biggest. Sounds almost like closure to me.

    Other lessons from the Netflix Challenge

    Alice is a deep value investor; she only buys when price/book values are at 0.8 or less.

    Bob is a momentum investor; he watches MACD indicators like a hawk. He would never buy a stock if its 50-day moving average price is below its 100-day moving average price.

    Cleo is a contrarian investor; sentiment indicators are her guide. She follows the ARMS index, the ISE Investor Sentiment Index, and the VIX.

    All three are very good at what they do, and all three ignore the noise and focus on their pure data-driven approaches. Yet all three have had times when their well-researched, fact-based methods broke down.

    Even the best investors (Warren Buffett comes to mind) have dry spells. In the late 1990s, value stocks, which outperform over the long term, lagged growth stocks for several years. Valuations were sky high and Alice, like Buffett, found it very challenging to find stocks to invest in.

    Bob did very badly in 2004 when the market traded sideways, whipsawing momentum chasers as soon as they jumped on a stock that looked like it was about to take off.

    Cleo thought 2001 was the bottom as sentiment kept hitting record lows, only to see another leg down in 2002.

    Back in 2006, Netflix issued a challenge on the net: improve our current movie recommendation system (Cinematch) by 10% and win a million dollars.

    It was a brilliant move by the company. Thousands of the best machine-learning experts all over the world took the challenge and started working around the clock on a better model. Netflix got a lot of visibility and goodwill in the machine-learning community, not to mention being able to draft the best minds to work for them for free. Meeting the challenge would "cost" Netflix a million dollars, but in return, it could significantly improve their already legendary 90% customer satisfaction rate and boost average revenue per user by encouraging customers to order more movies.

    At first all was quiet. You couldn't tell how many minds around the world were working on the problem. It took several weeks and many submissions until one of the ~3000 competing teams was able to improve on the baseline Cinematch RMSE (Root Mean Square Error) of 0.9514 for the first time since the competition began. The leading approach used an SVD (Singular Value Decomposition) algorithm. You may think of SVD as a way to factor a matrix into three fundamental components, akin to factoring any natural number into its prime factors. For the interested, here's a site explaining SVD with some graphical animations.
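
    To make the "three components" idea concrete, here is a minimal sketch using NumPy. The ratings matrix is made up for illustration and, unlike the real Netflix data, has no missing entries, so this shows only the textbook factorization and not how any competing team actually applied it.

```python
import numpy as np

# Toy ratings matrix (rows = users, columns = movies). Values are invented
# for illustration and are not taken from the Netflix data set.
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Factor the matrix into three components: U, the singular values s, and V^T.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Multiplying the three factors back together recovers the original matrix.
reconstructed = U @ np.diag(s) @ Vt

# Keeping only the k largest singular values gives a compact approximation.
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
rank_k_error = float(np.sqrt(np.mean((ratings - approx) ** 2)))
```

    The low-rank approximation is the useful part for recommendations: a few hidden "taste" dimensions can explain most of the ratings.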

    To be clear, SVD wasn't enough to win the prize; that required a 10% improvement, or a score of 0.85626 (0.9514 * 0.90). Nevertheless, using SVD marked a significant breakthrough in the competition.

    Little by little, RMSE improvements added up, and different teams took the lead away from each other. On Oct 1st, 2007, one year after the contest began, a team going by the name BellKor, from AT&T Bell Labs in New Jersey, was in the lead. Their best RMSE, 0.8712, was 8.43% better than Netflix's existing system, Cinematch, but still short of the 10% required. Not enough to win the grand prize.
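
    For readers who want to check the arithmetic, here is a small sketch of how RMSE and the percentage improvement can be computed. The function names are my own for illustration; only the 0.9514 baseline, the 10% target, and the 0.8712 score come from the text above.

```python
import math

def rmse(predicted, actual):
    """Root Mean Square Error between two equal-length lists of ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def improvement_pct(score, baseline):
    """Percentage by which `score` beats `baseline` (lower RMSE is better)."""
    return (baseline - score) / baseline * 100.0

CINEMATCH_RMSE = 0.9514

# A 10% improvement means reaching 90% of the baseline error.
target_rmse = CINEMATCH_RMSE * 0.90

# BellKor's best score one year into the contest.
bellkor_improvement = improvement_pct(0.8712, CINEMATCH_RMSE)

print(f"target RMSE: {target_rmse:.5f}")
print(f"BellKor improvement: {bellkor_improvement:.2f}%")
```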

    According to the Netflix prize rules, if the target had not been achieved after a year of competition, a "progress prize" of $50,000 would be awarded to the leaders, and the $1M challenge would stay in place. All competing teams were encouraged to continue their quest until the 10% target was achieved.

    It turns out that improvements to these kinds of models get more and more difficult. Progress becomes almost asymptotic to the target as time goes by. Achieving the next 1.6% improvement may prove as difficult as the prior 8.43% (maybe even more). At the time of this writing (Mar 22, 2008), the best result still belongs to BellKor, with RMSE=0.8655, a 9.03% improvement over the Cinematch baseline. The following chart, courtesy of Yehuda Koren of the BellKor team, shows the changing leadership among teams over time.

    What does all this have to do with our quest for a great all-weather investing model? Both are challenging, real-world problems where one can keep gaining insights and improving the model for better predictions. They are also similar in that the winning strategy must combine multiple, independent approaches to the problem. To be a really successful investor, one must combine the expertise of Alice (Fundamentals), Bob (Technicals), and Cleo (Sentiment). This, in fact, is standard machine learning practice: build multiple models and consult all of them, as if they were a team of experts, before making a final decision. Several machine learning strategies, such as bagging (Breiman 1994) and boosting (Schapire, Freund, 1990), rely on this idea in order to come up with better models. This is often referred to as the "committee" or "ensemble" approach.

    The BellKor team has published several papers since their win. One of them, Lessons from the Netflix Challenge, details the three independent approaches they used to win the progress prize. I recently had the pleasure of meeting a member of the winning team, Yehuda Koren, when he visited the west coast. The most important lesson, according to Yehuda, is to use an ensemble of complementary models. In his words: Two half tuned models are worth more than a single fully tuned model.
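
    A toy simulation shows why this can be true. It is only a sketch under strong assumptions of my own: each model is the true rating plus random noise, the errors of the two half tuned models are independent of each other, the noise levels (0.8 and 1.0) are made up, and the blend is a plain average. It is not the BellKor method.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical "true" ratings between 1 and 5 stars.
truth = rng.uniform(1.0, 5.0, size=n)

def rmse(predicted, actual):
    """Root Mean Square Error between two arrays of ratings."""
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# One fully tuned model (smaller error) and two half tuned models
# (larger, but mutually independent, errors). Noise levels are assumptions.
fully_tuned = truth + rng.normal(0.0, 0.8, size=n)
half_tuned_a = truth + rng.normal(0.0, 1.0, size=n)
half_tuned_b = truth + rng.normal(0.0, 1.0, size=n)

# The simplest possible ensemble: average the two half tuned predictions.
blend = (half_tuned_a + half_tuned_b) / 2.0

print("fully tuned :", round(rmse(fully_tuned, truth), 3))
print("half tuned A:", round(rmse(half_tuned_a, truth), 3))
print("half tuned B:", round(rmse(half_tuned_b, truth), 3))
print("blend of A+B:", round(rmse(blend, truth), 3))
```

    Averaging cancels part of the independent errors, so the blend ends up with a lower RMSE than either half tuned model and, with these noise levels, lower than the single better model too. If the two models made the same mistakes, the benefit would disappear, which is why the models need to complement each other.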

    Go broad:
    Applying this to investing means that it is better to have, say, a combo of a fundamental and a technical model than to have only one highly tuned fundamental model. Even better would be to add a third, sentiment model to the first two. You could avoid committing money to the markets as long as the three don't agree, but jump in with both feet, possibly even with leverage, when all of them agree.
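
    As a rough sketch of that unanimity rule, the committee could be coded as follows. The class and function names are hypothetical, each model's opinion is reduced to a simple yes/no, and the all-or-nothing allocation is a simplification for illustration, not a tested strategy.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    """One yes/no opinion per committee member (hypothetical structure)."""
    fundamental_bullish: bool  # Alice: fundamentals
    technical_bullish: bool    # Bob: technicals
    sentiment_bullish: bool    # Cleo: sentiment

def committee_allocation(signals: Signals) -> float:
    """Return the fraction of capital to commit: all in only on full agreement."""
    votes = [
        signals.fundamental_bullish,
        signals.technical_bullish,
        signals.sentiment_bullish,
    ]
    return 1.0 if all(votes) else 0.0

# Two out of three is not enough; the committee stays in cash.
print(committee_allocation(Signals(True, True, False)))  # 0.0
# Unanimous agreement; jump in with both feet.
print(committee_allocation(Signals(True, True, True)))   # 1.0
```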

    Go deep:
    Furthermore, even within one approach, the "ensemble" idea can be applied. For example, in the technical approach, a good model should look at several different MACD time-frames. This makes sense, since different investors look at different time-frames, so the more they all agree, the more likely it is for money to flow into or out of the market.

    In an upcoming article, I hope to discuss one of the strongest elements in my developing model, which I expect to be the first and foremost committee member in my ultimate all-season model.

    As always, any feedback, question, request, or criticism is welcome.
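
    Here is a minimal sketch of what looking at several MACD time-frames could mean in code. The three (fast, slow) pairs are my own assumption, with (12, 26) being the commonly used default. The sketch uses only the MACD line itself (fast EMA minus slow EMA) and leaves out the usual signal line, and the price series are synthetic.

```python
def ema(prices, span):
    """Exponential moving average of the series, seeded with the first price."""
    alpha = 2.0 / (span + 1)
    value = prices[0]
    for price in prices[1:]:
        value = alpha * price + (1 - alpha) * value
    return value

def macd(prices, fast, slow):
    """MACD line at the end of the series: fast EMA minus slow EMA."""
    return ema(prices, fast) - ema(prices, slow)

# Hypothetical short, medium, and long time-frames as (fast, slow) spans.
TIMEFRAMES = [(6, 13), (12, 26), (24, 52)]

def macd_agreement(prices, timeframes=TIMEFRAMES):
    """Fraction of time-frames whose MACD line is currently positive."""
    bullish = sum(1 for fast, slow in timeframes if macd(prices, fast, slow) > 0)
    return bullish / len(timeframes)

# Synthetic price series: a steady climb and a steady decline.
rising = [100.0 + day for day in range(120)]
falling = [220.0 - day for day in range(120)]

print(macd_agreement(rising))   # 1.0, every time-frame is bullish
print(macd_agreement(falling))  # 0.0, no time-frame is bullish
```

    A value near 1.0 or 0.0 means the time-frames agree; anything in between means they disagree, which in the committee spirit is a reason to wait.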

    -- ariel