Casual Rocket Scientists:
An Interview with a Layman
Leading the Netflix Prize, Martin Chabbert
By Eric Siegel, Ph.D.
Conference Chair, Predictive Analytics World
A buddy and I are thinking of building a spaceship next year. Thing is, with absolutely no training or background, I am frankly "no rocket scientist". But, who cares — I want to go to Mars.
This is essentially what a couple of non-analytical friends in Montreal did in launching a mission to win the $1 million Netflix Prize, arguably the most high profile analytical competition to date.
And these "casual part-timers" have succeeded by developing one of three solutions that together won the Prize by improving upon Netflix's movie recommendation capabilities by 10%, as announced September 21, 2009. And it appears their success actually hinged more on adept engineering than innovative science.
Want to know how they managed to break the rules and succeed without the proper university degree? Well, Martin Chabbert of the Netflix champion uber-team BellKor's Pragmatic Chaos provided a tell-all interview, which appears below. But first, here's some background to give context to his words of wisdom.
The Netflix Prize competition has attracted a well-earned white hot spotlight within the data mining community and beyond. It's a well-reputed, objective competition like others, such as the annual KDD-cup, but augments prestige with no small bit of bling: A $1 million cash prize. The competition bears the promise of near-term commercial deployment by way of its sponsorship by the slick, hip Netflix, and focuses on the sky-rocketing realm of product recommendation systems, as it pertains to everyone's favorite medium: movies. It's got everyone screaming that crowdsourcing, which very publicly privatizes R&D, is truly taking off. This competition is so cutting-edge, it makes the space race look like good ol' "Flash Gordon".
But with the darkest of horses, a pair of casual part-timers, racing ahead in the Netflix contest, there emerges an uncanny parallel to SpaceShipOne, the first privately funded human spaceflight, which won the $10 million Ansari X PRIZE. According to some, that underdog, short on resources with a spend of $25 million, put the established, gargantuan NASA to shame. (Note that Netflix winners will show a better ROI!)
Who Are These Guys?
Martin Chabbert and Martin Piotte are friends who work at "a medium size international company (~300 employees) in the telecommunications field, more specifically Voice over IP. Our jobs consist in pure software design and development."
The apparently-dynamic duo joined to form the team PragmaticTheory (see also their often-entertaining blog). The thing is, neither has a background in statistics or analytics, let alone recommendation systems in particular.
But their "Pragmatic" approach proved groundbreaking. The 2-person team, one component of uber-team BellKor's Pragmatic Chaos, was at one point by itself the number one contender, and during the final months of the competition was often in the top echelons. Indeed, these engineers may have circumvented established "dogmatic" theories by effortlessly thinking outside a box they knew nothing of in the first place and boldly going where no one has gone before (some of their core techniques are outlined in a New York Times article). Freddie Mercury lauded such triumph by an "everyman" when he sang that Flash Gordon was "Nothing but a man // And he can never fail // No one but the pure at heart // May find the Golden Grail".
In wonder of their success, I queried expert and uber-teammate Chris Volinsky, the Director of the Statistics Research Department at AT&T Research, and winner of the 2007 and 2008 Progress Prizes as a member of team BellKor. I asked, "What was your impression when you learned these astute competitors were not experts in the field, and what do you make of that fact now that you've gotten to know them?" Volinsky replied:
"From the beginning I thought it was awesome how many people in the top of the leaderboard were what could be called 'amateurs', in fact our group had no experience with collaborative filtering when we started either! I think the openness of the competition, and the fact that it was a real dataset, on real customers about a topic (movies) that everyone could relate to was a big factor in getting so many people involved.
"Our teammates are very talented engineers and computer scientists and have a real intuition for data and analysis. It just goes to show that sometimes it takes a fresh perspective from outside the field to make progress."
How Did They Do It?
Joining forces. Only by an international collaboration, and the combining of methodologies, did the leading teams hit the mark. BellKor's Pragmatic Chaos is composed of three teams that have also competed independently, located in the U.S., Canada, Austria and Israel (the Austrian component includes February 2009's Predictive Analytics World speaker, Andreas Töscher).
Combined methodology made simple. Each team had developed an intricate approach. Once they agreed to collaborate, how intricately did they have to integrate their systems? Actually, not so intricately at all. Rather than dig in, think hard, and assess where one system's weaknesses might be compensated for by another team's strengths, they let predictive modeling do it. At least, that's how Mr. Töscher indicates they did it when two of these sub-teams combined to form "BellKor in BigChaos" and took the lead several months ago.
In this approach, each system may conveniently be treated as a "black box," training a new "meta-system" to combine the respective outputs into one better output. This is called meta-learning, ensemble methods, or blending, and elicits the concept of collective intelligence. For more information on blending/ensembles, see this Predictive Analytics World workshop and session.
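To make the "black box" idea concrete, here is a minimal sketch of blending, not any team's actual method: two hypothetical predictors are combined by learning least-squares weights over their outputs on held-out ratings. All data and numbers here are invented for illustration.

```python
# Minimal blending sketch: treat two models as black boxes and learn
# least-squares weights over their outputs on held-out ratings.
# All data here is synthetic, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

true_ratings = rng.uniform(1, 5, size=200)        # held-out "probe" ratings
pred_a = true_ratings + rng.normal(0, 0.9, 200)   # model A: noisy predictions
pred_b = true_ratings + rng.normal(0, 0.9, 200)   # model B: independently noisy

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# The meta-system sees only each model's outputs, never its internals.
X = np.column_stack([pred_a, pred_b, np.ones_like(pred_a)])
weights, *_ = np.linalg.lstsq(X, true_ratings, rcond=None)
blend = X @ weights

print("model A RMSE:", rmse(pred_a, true_ratings))
print("model B RMSE:", rmse(pred_b, true_ratings))
print("blend RMSE:  ", rmse(blend, true_ratings))
```

Because the two models' errors are partly independent, the weighted blend scores better than either model alone, which is exactly why the leading teams could combine systems without integrating their internals.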
Don't Quit Your Day Job
It was a close call. BellKor's Pragmatic Chaos held only a narrow edge over another team that also qualified for the Grand Prize, The Ensemble.
An "Ensemble" indeed, this arch-nemesis gave the winners a serious run for their money. A virtual "Borg" collective, its open policy assimilated any and all teams that wished to participate. On the other hand, BellKor's Pragmatic Chaos combined the efforts of only three teams. However these three included the very top competitors — and, per Chabbert's remarks below, the teams collaborated closely, jointly engineering their final solution in a more "hands-on" fashion than analytical blending alone. This may have been an important factor in sustaining a winning margin.
In the final days, hours and minutes of the competition, multiple such "meta-teams" cropped up and made for a dramatic finish, more exciting than a horse race, as well-described by KDnuggets.com, as well as by the competitors themselves, PragmaticTheory and The Ensemble.
Intrigued by the outstanding success of these hobbyists? So was I!
[Eric Siegel] What about this analytical challenge first attracted you to participate and compete?
[Martin Chabbert] Martin [Piotte] and I had been looking for a hobby project to do together for a little while. This one seemed fun and accessible. By quickly reading through some of the threads on the forum, we also figured that a more pragmatic and less dogmatic approach might yield some good results.
[ES] With no relevant background in statistics — let alone product recommendations specifically — what capabilities or background did make your success possible? Do you consider yourselves mathematicians, or at least strong with math?
[MC] I am certainly not a mathematician - I have engineering-level skill. I consider Martin Piotte to have an exceptional mathematical mind (he participated successfully in international math contests when he was a student) even though he never formally studied in that field. In the end, the mathematics used in this contest seem very complex, but are really rather simple. Contrary to what most people think, this was more of an engineering contest than a mathematical contest [See Martin's response below for elaboration on this central point. -Ed]. Also, I think that having a perhaps less in-depth but wider array of skills and knowledge helped us.
[ES] You've said, when first getting started, you learned many core strategies/techniques from the Netflix Prize discussion board. Did you do much reading or research elsewhere to ramp up?
[MC] Having started late in the competition, the forum was a good starting point as many avenues had already been explored and links had been posted to many interesting papers. In the end though, reading and getting a good understanding of the actual research papers was a very important step. The forum was also a place where people proposed new (sometimes far fetched) ideas; these ideas often inspired us to come up with our own creative innovations.
[ES] Although the competition is not yet technically over, what can you tell us about your core technical approach and the key to its success?
[MC] We cannot give much detail on our technical approach at this time, because of the Netflix verification process. If all goes well, a paper will be published which will give detailed information on our techniques. At a higher level, I think that we were successful because of three things. First, our pragmatic approach. We threw everything we could at this problem; no holds barred; no stone left unturned. We had an incredible amount of failures, but also a good level of success. Second, our ability to find patterns in the data, or psychological aspects of how people rate, and then translate those into a working model. Many people came up with (often good) ideas as to what should be captured, but translating those words into a mathematical formula is the complicated part. Finally, I think that our background in engineering and software was key. In this contest, there was a fine line between a bad idea and a bug in the code. Often you would think that the model was simply bad because it didn't yield the expected results, but in fact the problem was a bug in the code. Having the ability to write code with few bugs and the skill to actually find the bugs before giving up on the model is something that definitely helped a lot.
[ES] A year ago you blogged that external movie data (e.g., from IMDB) doesn't help. Do you still stand by that?
[MC] Yes, I still stand by that. This sort of data helps people put words on their tastes and try to make sense of what they like and don't like. But the contest was about predicting ratings, not explaining them. The algorithms which find the actual patterns in the data, in infinite shades of gray, are much more powerful than any sort of meta-data which assigns movies to black and white boxes.
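Those "infinite shades of gray" evoke the latent-factor models that, per the published Progress Prize papers, dominated the Netflix Prize. The following is a toy sketch of that general family, not PragmaticTheory's actual code: each user and movie gets a small real-valued vector learned by stochastic gradient descent, in place of any hand-assigned genre boxes. The ratings table, dimensions, and hyperparameters are all made up.

```python
# Toy latent-factor model fit by SGD on a tiny made-up ratings table.
# The learned factor values are the "shades of gray": continuous traits
# inferred from ratings alone, with no external meta-data.
import numpy as np

rng = np.random.default_rng(42)
# (user, movie, stars) triples -- entirely fictional data
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 5), (2, 2, 2)]

n_users, n_movies, k = 3, 3, 2
P = rng.normal(0, 0.1, (n_users, k))   # user factor vectors
Q = rng.normal(0, 0.1, (n_movies, k))  # movie factor vectors

lr, reg = 0.05, 0.02                   # learning rate, L2 regularization
for _ in range(500):                   # SGD passes over the ratings
    for u, m, r in ratings:
        pu = P[u].copy()               # use the pre-update user vector
        err = r - pu @ Q[m]            # prediction error on this rating
        P[u] += lr * (err * Q[m] - reg * pu)
        Q[m] += lr * (err * pu - reg * Q[m])

rmse = np.sqrt(np.mean([(r - P[u] @ Q[m]) ** 2 for u, m, r in ratings]))
print("training RMSE:", round(float(rmse), 3))
```

After training, the dot product of a user vector and a movie vector predicts the rating; taste emerges from the data rather than from labels like genre, director, or cast.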
[ES] What existing tools and programming languages did you use?
[MC] In general, we used Java and C++. We initially used Java because it was convenient and we are used to the development environment. After a little while, performance became an issue and we switched to C++ for the more complex models. As professional software engineers, these pure programming languages are much more natural than integrated scripting languages like, say, Matlab.
[ES] How did you collaborate as a team of two — what was your modus operandi, and in what manner did you divide and conquer?
[MC] Throughout the contest, we always had a long "todo" list of ideas that we wanted to try out. We simply prioritized that list regularly, but mostly each picked the ideas from the list that we were excited about at that time. Since we had so many different things going on, it never became an issue of us stepping on each other's toes. Blending then becomes the glue that puts all these ideas together.
[ES] So then did the pair of you have two separate implementations, joined only by the verbal exchange of ideas and a blending method? That is, as if you could have been two separately competing teams who joined up?
[MC] No, we definitely cannot be seen as two separate teams joined only by blending. First, we had a common code base which contained various tools, infrastructure and framework code. Most of all though, we were constantly discussing new ideas, implementation details, issues, results, etc. I believe that this sort of exchange is what makes a true team better than individuals alone.
[ES] You've said you worked 2 or 3 hours a night on this project. Could you estimate the total hours you invested? Did you use much of your vacation time?
[MC] I ended up working roughly 10 hours a week on average. Martin P. worked a bit more, roughly 20 hours on average. Of course, the final months were a bit more hectic so these numbers are not steady state. The only time that we used vacations to work on Netflix was a few weeks ago to produce our initial documentation, after the end of the contest.
[ES] So between your "day job", four children and the Prize... are you a coffee drinker?
[MC] :) Yes, I do drink some coffee, but I don't think I can be considered a junkie. You know what they say: eat right, get plenty of exercise (yes I do that too)... that helps keep the energy level up.
[ES] Was it a difficult decision to team up and become a smaller part of a larger team? How did you time the moment of pulling the trigger on the commitment to join forces?
[MC] The decision to team up with others wasn't really difficult. We had anticipated that once someone had crossed the 10% threshold, a big coalition would form with most of the other contestants. It was a question of having enough weight on our side to keep the lead. We actually thought that having the number one and two teams together would be enough... and we were right... but just barely. What people don't necessarily know was that our coalition had been working together for a few months before we actually submitted over 10%. It wasn't simply a case of blending everything and we were there. We actually worked together, sharing our knowledge and techniques. We improved our score a lot between the team-up and going public. The actual timing for the submission was a mixture of having hit a certain threshold where we felt somewhat comfortable, having implemented some of the more interesting ideas we still had and also timing around everyone's vacations.
[ES] There is a long tradition of at-home innovation taking place in the garage — after all, one fairly high-ranking Netflix Prize team is entitled, "Just a guy in a garage" (also, check out the movie Primer!). You did most of the work in your dining room — how about your teammate? Is there a room you'd best recommend for future Prize competitors? :)
[MC] Laptops are a great invention. They give you the freedom to move to the room that inspires you the most on a specific day.
[ES] You've said some of your favorite movies are Fight Club, Seven, American Beauty, Memento and Jackass. Do you feel your movie preferences are predictable? To the extent they are not, do you feel you as a consumer are an anomaly?
[MC] I don't think that certain preferences are harder to predict than others. People tend to fantasize that they are exceptional (or anomalies, as you say), but the fact is that, in a large enough population, you can always find others that display similar taste patterns. What makes prediction difficult is more in the way people translate their likes or dislikes into 5-star ratings. This mental step is not an easy one, and being consistent in that task is quite difficult. This thought process can yield inconsistent ratings, which adds to the overall complexity. Also, sparseness is an issue, so anyone wanting to become more "predictable" can simply rate more items.
[ES] Do you plan to compete for the next, forthcoming Netflix Prize?
[MC] It will depend on what the goal and structure of the contest [are]. I'm sure it'll be hard to resist participating, but then again, if it's too similar, I'm not sure that we will have the motivation needed to invest all of those hours again.
Well, something tells me these two are likely to dive in again. I mean, hello — sequel! ("Netflix Prize II: Electric Boogaloo"?)
About the author: Eric Siegel, Ph.D.
The president of Prediction Impact, Inc., and the program chair for Predictive Analytics World, Eric Siegel is an expert in predictive analytics and data mining and a former computer science professor at Columbia University, where he won the engineering school's award for teaching and taught graduate-level courses in machine learning and intelligent systems - the academic terms for predictive analytics. After Columbia, Dr. Siegel co-founded two software companies for customer profiling and data mining, and then started Prediction Impact in 2003, providing predictive analytics services and training to mid-tier through Fortune 100 companies.
Dr. Siegel is the instructor of the acclaimed training program, Predictive Analytics for Business, Marketing and Web, and the online version, Predictive Analytics Applied. He has published over 20 papers and articles in data mining research and computer science education, has served on 10 conference program committees, and has co-chaired an Association for the Advancement of Artificial Intelligence Symposium held at MIT.