In October 2006, Netflix launched a $1,000,000 contest to improve its rating prediction/movie recommendation algorithms. No one has won the prize yet (surprisingly).
I read the latest Wired mag (I know, I know), which featured a contestant. I’m easily inspired to work on difficult challenges. I figure this will be good learning AND good research into software collaboration (one of my favorite topics). Yes, I’ve entered the fray as Team SocialMode. Heck, I’ve been looking for a meaty project to put a bunch of my thoughts together. This prize is perfect for that, and it takes nothing more than access to cheap CPU cycles and a brain (I have both, though which I have more of I don’t know).
My approach will not be highly abstract. I’ve built several Pay Per Click engines/behavioral targeting systems, simple content recommendation engines, search algos and a few collaborative filtering systems. That experience leads me to believe an improved algorithm will come from practical analysis of rating behavior, user interface behavior, exposure to movies to be rated (cognitive dissonance type concepts), clustering of practical movie metadata (e.g. I like anything with George Clooney, 10 explosions or more, or dinosaurs) and normalizing simple “flags” (everyone dislikes the Star Wars with Jar Jar; you just need to adjust for an individual’s rating scale).
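To make the "adjust for an individual’s rating scale" idea concrete, here is a minimal sketch of one common way to do it: convert each rating to a z-score against that user’s own mean and spread. This is just an illustration, not my final method. The `(user, movie, rating)` tuple layout and the `normalize_ratings` name are assumptions I made up for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_ratings(ratings):
    """Rescale each rating relative to the rating habits of the user who gave it.

    ratings: list of (user, movie, rating) tuples (a hypothetical layout).
    Returns a dict mapping (user, movie) to a z-score.
    """
    # Collect every rating each user has given.
    by_user = defaultdict(list)
    for user, _movie, rating in ratings:
        by_user[user].append(rating)

    # Per-user mean and population standard deviation.
    stats = {user: (mean(vals), pstdev(vals)) for user, vals in by_user.items()}

    normalized = {}
    for user, movie, rating in ratings:
        mu, sigma = stats[user]
        # A user who always gives the same score carries no scale information.
        normalized[(user, movie)] = 0.0 if sigma == 0 else (rating - mu) / sigma
    return normalized


sample = [
    ("generous", "m1", 5), ("generous", "m2", 3), ("generous", "m3", 1),
    ("harsh", "m1", 3), ("harsh", "m2", 2), ("harsh", "m3", 1),
]
scores = normalize_ratings(sample)
```

In this toy data the harsh rater’s 3 for `m1` and the generous rater’s 5 for `m1` land on the same normalized score, which is the whole point: both are that person’s top mark.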
Some assumptions of mine:
- People rate things they haven’t seen
- People rate in batches
- People don’t rate as they watch
- Viewing experience affects rating
- Technical quality affects rating
- There are Gender and Age differences
- Every individual has a different 1-5 scale
- It is cognitively easier to rate something as 1 or 5 (love or hate) than 2-4
- We deal with bits better than in-between values
- User interface widgets often make it harder to rate in-between values
I’ll have more to say on this.
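Some of these assumptions are cheap to check. As a rough sketch (not a real analysis), the "1 or 5 is easier" hunch could be tested by measuring what share of a set of ratings sits at the ends of the scale. The `extreme_share` helper and the sample list are made up for illustration.

```python
from collections import Counter


def extreme_share(ratings, low=1, high=5):
    """Return the fraction of ratings that sit at either end of the scale."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no ratings, nothing to measure
    return (counts[low] + counts[high]) / total


# Hypothetical ratings from one user.
share = extreme_share([1, 5, 5, 3, 4])
```

If five equally likely values were being picked, you would expect about 40% at the two ends, so a share well above that across many users would support the love-or-hate assumption.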
My solution will use Python and the Orange library, along with data from IMDB, BoxOfficeMojo and RottenTomatoes.
Some interesting links:
Let’s roll! All progress will be posted.