Now that the first two batches of challenges have completed (10
challenges in all) we can start to take stock of how the challenge
arbitration process is working. I don't think there's any disagreement
that 'better' images are bubbling to the top. However, there's been keen
interest (in the forums and via feedback emails) in our method
for separating 2nd from 3rd, 18th from 19th, etc.
Is this thing better than that thing?
Early in the planning phase of the challenges project we scoured our own forums and the wider web to gain a feel for different flavours of online content competitions. When it comes to democratic judging models there are 'star rating' systems (e.g. Amazon), +/- systems (e.g. Digg), +1 systems (e.g. electoral), point packets (e.g. Pentax forum with 5,3,1), point-spreads (n points between <=n images, e.g. Sony Talk forum) and more. These models fall into two broad groups: rating models (inherent merit independent of competitors) and voting models (merit relative to the merit of ALL competitors). We knew we had to support many judging models eventually (and our system is extensible such that we can), but we had to pick a model to start with. We went with a rating model (stars) and here's why.
Challenges have to be fun
Meaningless constraints, menial tasks and feelings of obligation are all decidedly un-fun, so we had to steer clear of them as much as possible. Consider also that aesthetic sensibility varies from user to user, as do enthusiasm, attention span, available time and a host of other factors. With these in mind let's consider voting models versus rating models.
Pure voting systems: everyone votes, most opinions ignored
Consider a hypothetical voting system in which a user is required to vote for their favourite 3 images (and rank them). Our hypothetical user starts with the first image they see and compares it with each further image (one at a time), keeping their favourite until they've seen them all, at which point the selected image is 'the best'. The process of picking places 2 and 3 then becomes an exercise in searching through the gallery to find 'those other good ones'. To have any hope of a fair decision, each voter has to make several passes (i.e. a menial task), which is both boring and impractical for people taking a coffee break. Furthermore, this model only yields rankings for the cream of the crop, leaving the aspiring photographers (i.e. most entrants) with no feedback on their work (again, not fun). In algorithm parlance this voting method turns voters into bubble sorting machines (when determining 1st place, with some optimizations for subsequent places). Sound fun?
Even less enjoyable is the realisation that in all voting systems (especially electoral systems) votes for quirky images will have essentially no impact on the outcome. This saps people's will to express their opinions if their own tastes are a little left-field. Not cool!
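To put a rough number on that menial task, here is a minimal Python sketch of the favourite-hunting passes described above. The names pick_favourite and pick_top, and the prefers callback standing in for a voter's taste, are illustrative assumptions and not part of the challenges system.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pick_favourite(gallery: Sequence[T],
                   prefers: Callable[[T, T], bool]) -> Tuple[T, int]:
    """One pass through the gallery, keeping the favourite seen so far.

    prefers(a, b) is a hypothetical stand-in for the voter's taste:
    it returns True when the voter likes a more than b.
    Returns the favourite and the number of comparisons made.
    """
    if not gallery:
        raise ValueError("gallery is empty")
    favourite = gallery[0]
    comparisons = 0
    for candidate in gallery[1:]:
        comparisons += 1
        if prefers(candidate, favourite):
            favourite = candidate
    return favourite, comparisons


def pick_top(gallery: Sequence[T],
             prefers: Callable[[T, T], bool],
             places: int = 3) -> Tuple[List[T], int]:
    """Repeat the pass once per place, removing each winner in turn."""
    remaining = list(gallery)
    podium: List[T] = []
    total_comparisons = 0
    for _ in range(min(places, len(remaining))):
        favourite, comparisons = pick_favourite(remaining, prefers)
        podium.append(favourite)
        remaining.remove(favourite)
        total_comparisons += comparisons
    return podium, total_comparisons


# Images are represented here by a made-up "appeal" score for one voter.
podium, work = pick_top([3, 9, 1, 7, 5], lambda a, b: a > b)
print(podium, work)
```

For a gallery of n images and three places, this naive approach costs (n-1) + (n-2) + (n-3) comparisons per voter, which is roughly three full trips through the gallery.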
Our rating system: no-one's opinion wasted
Instead consider a hypothetical rating system in which users rate an image not compared to ALL other images in the challenge but according to its own merits (factoring in a voter's taste and their interpretation of the rules). If enough users rate an arbitrary number of images each, considering each image in isolation, an amazing thing happens: consensus. True, we may never know whether randomUser99 liked the 29th placed entry in the geometry challenge if they didn't rate it, but we can say that people generally thought it was one of the better images in the challenge (in the context of the challenge theme). The image owner (and anyone else) also has access to 40 people's opinions of their image, 7 of whom thought it worthy of 5 stars versus only 1 person deeming it 0.5 stars. Every entrant also has an overall crowd-sourced rating (3.208 in the case above) to improve upon in subsequent challenges, making the results interesting for everyone, not just a handful of winners.
Voting results are interesting for winners, rating results are interesting for all participants
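As a rough illustration of the kind of per-image feedback a rating model makes possible, the sketch below summarises a list of star ratings into a count, a histogram and an average. The ratings are made up, the function name summarise is hypothetical, and the plain arithmetic mean is an assumption for illustration only: this post does not describe how the overall rating is actually calculated.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Union


def summarise(ratings: List[float]) -> Dict[str, Union[int, float, Dict[float, int]]]:
    """Summarise one image's star ratings (0.5 to 5 in half-star steps)."""
    if not ratings:
        raise ValueError("no ratings to summarise")
    # How many voters gave each star value, ordered from lowest to highest.
    histogram = dict(sorted(Counter(ratings).items()))
    return {
        "count": len(ratings),
        "histogram": histogram,
        # Plain mean: an assumption, not the site's algorithm.
        "mean": round(mean(ratings), 3),
    }


# Hypothetical ratings for a single entry.
summary = summarise([5.0, 5.0, 4.0, 3.5, 3.0, 0.5])
print(summary)
```

Every entrant gets a summary like this whether they finish 1st or 29th, which is the point of the rating approach.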
Is it really that simple?
So rating systems are great. Well, it turns out they have a few vulnerabilities, a few pitfalls for the unwary developer. Firstly, you need a sufficient number of ratings on each entry, with the number (not necessarily the magnitude) of ratings distributed as evenly as possible across entries. Secondly, you need a robust technique to derive an overall rating from many user ratings in such a way that overall ratings are comparable (to facilitate ranking). Each of these topics is a blog post in itself (this post's already a bit lengthy) but suffice to say that we feel we've come up with a neat solution to both.
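To make the two pitfalls concrete, here is a small Python sketch of one common way to approach each. It is not our solution (that is for another post): least_rated simply serves up the entry with the fewest ratings so far, and damped_mean is the well-known 'Bayesian average' trick of pulling thinly rated entries toward a prior. All names and parameter values are hypothetical.

```python
from typing import Dict, List


def least_rated(rating_counts: Dict[str, int]) -> str:
    """Pick the entry with the fewest ratings so far (ties broken by id).

    Showing this entry to the next voter keeps the number of ratings
    per entry roughly even.
    """
    if not rating_counts:
        raise ValueError("no entries")
    return min(rating_counts, key=lambda entry: (rating_counts[entry], entry))


def damped_mean(ratings: List[float], prior_mean: float, prior_weight: float) -> float:
    """Bayesian average: behave as if prior_weight voters had already
    rated the entry at prior_mean, then add the real ratings.

    Entries with few ratings stay close to prior_mean; entries with many
    ratings converge on their own plain mean, so scores stay comparable.
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


print(least_rated({"entry-a": 5, "entry-b": 2, "entry-c": 2}))
print(damped_mean([5.0], prior_mean=3.0, prior_weight=4))
print(damped_mean([5.0] * 16, prior_mean=3.0, prior_weight=4))
```

With these made-up numbers, a single 5-star rating only lifts an entry a little above the prior of 3, while sixteen 5-star ratings lift it most of the way to 5, so one enthusiastic voter cannot leapfrog a broadly admired image.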
If it's rating, why call it voting?
We (I) feel that rating systems are preferable to voting, given the current system configuration (a few large concurrent challenges run by dpreview staff). However, we plan to allow user-created challenges within the next few months and our citizen challenge hosts are likely to have their own ideas. That's why we've planned for a variety of judging models (including various voting models) for them to choose from. Although we plan to support both rating and voting judging models at some point, we had to pick a label for UI gee-gaws (buttons, titles etc.) and stick with it. Why 'vote'? Well, given the two terms 'rate' and 'vote', our 30-second co-worker survey revealed that 'vote' is a more compelling call to action. So there you have it.
Stay tuned
By this point in the post, the anxiety of needing to know the exact algorithm for calculating overall image ratings may be causing some readers to gnaw their own fingers off. Well, luckily humans have plenty of fingers because this isn't that post. Fear not, the draft of 'the algorithm post' is gestating nicely and should emerge soon enough. Until then, keep voting (rating) and curb that gnawing.