Tuesday, September 22, 2009

Netflix Awards $1 Million Prize and Starts a New Contest

Netflix, the movie rental company, has decided its million-dollar-prize competition was such a good investment that it is planning another one.

The company’s challenge, begun in October 2006, was both geeky and formidable: come up with recommendation software that could predict the movies customers would like more accurately than Netflix’s in-house software, Cinematch. To qualify for the prize, entries had to be at least 10 percent better than Cinematch.

The winner, formally announced Monday morning, is a seven-person team of statisticians, machine-learning experts and computer engineers from the United States, Austria, Canada and Israel. The multinational team calls itself BellKor’s Pragmatic Chaos. The group — a merger of teams — was the longtime frontrunner in the contest, and in late June it finally surpassed the 10 percent barrier. Under the rules of the contest, that set off a 30-day period in which other teams could try to beat them.

That, in turn, prompted a wave of mergers among competing teams, which joined forces at the last minute to try to top the leader. In late July, Netflix declared the contest over and said two teams had passed the 10 percent threshold: BellKor and The Ensemble, a global alliance with some 30 members. Publicly, Netflix said the finish was too close to call, but its officials privately told BellKor at the time that, pending a final review of the algorithms by expert judges, it appeared to have won. That is how it turned out.

But the race was even closer than had been thought, as Netflix’s chief executive, Reed Hastings, explained for the first time at a press conference in New York on Monday. The BellKor team presented its final submission 20 minutes before the deadline, Mr. Hastings said. Then, just before time ran out, The Ensemble made its last entry. The two were a dead tie, mathematically. But under contest rules, when there is a tie, the first team past the post wins.

“That 20 minutes was worth $1 million,” Mr. Hastings said.

The Netflix contest has been widely followed because its lessons could extend well beyond improving movie picks. The researchers from around the world were grappling with a huge data set — 100 million movie ratings — and the challenges of large-scale predictive modeling, which can be applied across the fields of science, commerce and politics.

The way teams came together, especially late in the contest, and the improved results that were achieved suggest that this kind of Internet-enabled approach, known as crowdsourcing, can be applied to complex scientific and business challenges.

That certainly seemed to be a principal lesson for the winners. The blending of different statistical and machine-learning techniques “only works well if you combine models that approach the problem differently,” said Chris Volinsky, a scientist at AT&T Research and a leader of the BellKor team. “That’s why collaboration has been so effective, because different people approach problems differently.”
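Mr. Volinsky’s point is easy to see in a toy example. The Python sketch below is purely illustrative and not taken from any contest entry: three invented models each miss the true ratings in different directions, and a plain average of their predictions scores better, by root-mean-square error, than any single one.

import numpy as np

# Ratings (1-5 stars) that four customers actually gave a movie, and
# hypothetical predictions from three models that approach the problem
# differently, so their errors point in different directions.
actual = np.array([3.0, 4.0, 2.0, 5.0])
neighbor_preds = np.array([3.5, 3.6, 2.3, 4.6])      # neighborhood-style model
factor_preds = np.array([2.6, 4.5, 1.6, 5.3])        # factorization-style model
regression_preds = np.array([3.2, 4.2, 2.4, 4.7])    # regression-style model

def rmse(pred, truth):
    # Root-mean-square error; lower is better.
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# The simplest possible blend: average the three prediction sets.
# Contest teams tuned per-model weights on held-out data instead.
blend = np.mean(np.vstack([neighbor_preds, factor_preds, regression_preds]), axis=0)

for name, preds in [("neighborhood", neighbor_preds),
                    ("factorization", factor_preds),
                    ("regression", regression_preds),
                    ("blend", blend)]:
    print(f"{name:13s} RMSE = {rmse(preds, actual):.3f}")

With these invented numbers, the blend’s error comes out well below that of the best single model, which is the effect the merged teams were chasing on a vastly larger scale.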

Yet the sort of sophisticated teamwork deployed in the Netflix contest, it seems, is a tricky business. Over three years, thousands of teams from 186 countries made submissions, but only two cleared the 10 percent hurdle. “Having these big collaborations may be great for innovation, but it’s very, very difficult,” said Greg McAlpin, a software consultant and a leader of The Ensemble. “Out of thousands, you have only two that succeeded. The big lesson for me was that most of those collaborations don’t work.”

The data set for the first contest was 100 million movie ratings, with the personally identifying information stripped out. Contestants worked with the data to try to predict which movies particular customers would prefer, and their predictions were then compared with how the customers actually rated those movies later, on a scale of one to five stars.
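The article does not name the scoring metric, but the contest measured that comparison by root-mean-square error (RMSE), and the 10 percent requirement was relative to Cinematch’s RMSE. A minimal sketch with invented numbers (the baseline figure below is not Cinematch’s real score):

import numpy as np

# Hypothetical predicted ratings versus the ratings customers later gave.
predicted = np.array([4.2, 2.5, 1.8, 4.1, 3.9])
actual = np.array([4.0, 3.0, 3.0, 5.0, 3.0])

# Entries were scored by root-mean-square error on such pairs.
rmse = np.sqrt(np.mean((predicted - actual) ** 2))

# Qualifying meant beating the baseline's RMSE by at least 10 percent.
# This baseline value is invented for illustration, not Cinematch's score.
baseline_rmse = 0.95
improvement = (baseline_rmse - rmse) / baseline_rmse * 100
print(f"RMSE = {rmse:.4f}, improvement over baseline = {improvement:.1f}%")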

The new contest will present contestants with demographic and behavioral data and ask them to model individuals’ “taste profiles,” the company said. The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies. Unlike the first challenge, the new contest will have no specific accuracy target. Instead, $500,000 will be awarded to the team in the lead after six months, and another $500,000 to the leader after 18 months.
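Netflix has not published a schema for the new data set, so any concrete layout is guesswork. Purely as an illustration of the categories listed above, one renter’s record might be sketched like this; every field name and value is invented:

from dataclasses import dataclass, field
from typing import Dict, List

# A hypothetical record built only from the categories the article mentions
# (age, gender, ZIP code, genre ratings, previously chosen movies).
@dataclass
class RenterProfile:
    renter_id: int
    age: int
    gender: str
    zip_code: str
    genre_ratings: Dict[str, float]               # e.g. average rating per genre
    previous_rentals: List[str] = field(default_factory=list)

example = RenterProfile(
    renter_id=12345,
    age=34,
    gender="F",
    zip_code="94107",
    genre_ratings={"drama": 4.5, "comedy": 3.0, "horror": 1.5},
    previous_rentals=["The Godfather", "Amelie"],
)
print(example)

Modeling a “taste profile” would then mean learning, from many such records, a compact representation of what each renter tends to enjoy.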

The payoff for Netflix? “Accurately predicting the movies Netflix members will love is a key component of our service,” said Neil Hunt, chief product officer.
