Now that the two initial batches of challenges have completed (10 challenges in all), we can start to take stock of how the challenge arbitration process is working. I don't think there's any disagreement that 'better' images are bubbling to the top; however, there's been keen interest (in the forums and via feedback emails) in our method for separating 2nd from 3rd, 18th from 19th, etc.
Is this thing better than that thing?
Early in the planning phase of the challenges project we scoured our own forums and the wider web to gain a feel for different flavours of online content competitions. When it comes to democratic judging models there are 'star rating' systems (e.g. Amazon), +/- systems (e.g. Digg), +1 systems (e.g. electoral), point packets (e.g. Pentax forum with 5,3,1), point-spreads (n points between <=n images, e.g. Sony Talk forum) and more. These models fall into two broad groups: rating models (inherent merit independent of competitors) and voting models (merit relative to the merit of ALL competitors). We knew we had to support many judging models eventually (and our system is extensible such that we can); however, we had to pick a model to start with. We went with a rating model (stars) and here's why.
Challenges have to be fun
Meaningless constraints, menial tasks and feelings of obligation are all decidedly un-fun, so we had to steer clear of them as much as possible. Consider also that aesthetic sensibility varies from user to user, as does enthusiasm, attention span, available time and a host of other factors. With these in mind, let's consider voting models versus rating models.
[Image: example entry] In a voting model: unplaced. Our rating model: 29th (top 10%), overall rating: 3.208.
Pure voting systems: everyone votes, most opinions ignored
Consider a hypothetical voting system in which a user is required to vote for their favourite 3 images (and rank them). Our hypothetical user starts with the first image they see, comparing it to more images (one at a time), keeping their favourite until they've seen them all at which point the selected image is 'the best'. The process of picking places 2 and 3 then becomes an exercise of searching through the gallery to find 'those other good ones'. To have any hope of a fair decision, several passes are required by each voter (i.e. a menial task) which is both boring and impractical for people taking a coffee break. Furthermore, this model also only yields rankings for the cream of the crop leaving the aspiring photographers (i.e. most entrants) with no feedback on their work (again, not fun). In algorithm parlance this voting method turns voters into bubble sorting machines (when determining 1st place, some optimizations for subsequent places). Sound fun?
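To put a rough number on that menial task, here is a minimal sketch of the keep-your-favourite scan described above, repeated once per place. It is only an illustration: the function name `pick_top` and the numeric 'scores' standing in for a voter's taste are assumptions, and it does not model the optimizations for subsequent places mentioned above.

```python
def pick_top(scores, k):
    """Pick the k favourites by repeated one-at-a-time comparison.

    `scores` is a hypothetical stand-in for how much one voter likes
    each image. Returns the chosen scores (best first) and the number
    of pairwise comparisons the voter had to make.
    """
    remaining = list(scores)
    chosen = []
    comparisons = 0
    for _ in range(min(k, len(remaining))):
        best = 0  # start with the first image seen
        for i in range(1, len(remaining)):
            comparisons += 1  # "is this one better than my favourite?"
            if remaining[i] > remaining[best]:
                best = i
        chosen.append(remaining.pop(best))
    return chosen, comparisons


favourites, effort = pick_top([3, 1, 4, 1, 5, 9, 2, 6], 3)
print(favourites, effort)  # [9, 6, 5] 18
```

With only 8 images the voter makes 7 + 6 + 5 = 18 comparisons. For a hypothetical gallery of 300 entries the same three places would cost 299 + 298 + 297 = 894 comparisons, and every image outside the top 3 still gets no feedback.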
Even less enjoyable is the realisation that in all voting systems (especially electoral systems) votes for quirky images will have essentially no impact on the outcome. This saps people's will to express their opinions if their own tastes are a little left-field. Not cool!
Our rating system: no-one's opinion wasted
Instead consider a hypothetical rating system in which users rate an image not compared to ALL other images in the challenge but according to its own merits (factoring in a voter's taste and their interpretation of the rules). If enough users rate an arbitrary number of images each, considering each image in isolation, an amazing thing happens: consensus. True, we may never know whether randomUser99 liked the 29th placed entry in the geometry challenge if they didn't rate it, but we can say that people generally thought it was one of the better images in the challenge (in the context of the challenge theme). The image owner (and anyone else) also has access to 40 people's opinions of their image, 7 of whom thought it worthy of 5 stars versus only 1 person deeming it 0.5 stars. Every entrant also has an overall crowd-sourced rating (3.208 in the case above) to improve upon in subsequent challenges, making the results interesting for everyone, not just a handful of winners.
Voting results are interesting for winners, rating results are interesting for all participants
Is it really that simple?
So rating systems are great. Well, it turns out they have a few vulnerabilities, a few pitfalls for the unwary developer. Firstly, you need sufficient voter input on each entry, distributing the number (not necessarily the magnitude) of ratings as evenly as possible. Secondly, you need a robust technique to derive an overall rating from many user ratings in such a way that overall ratings are comparable (to facilitate ranking). Each of these topics is a blog post in itself (this post's already a bit lengthy), but suffice to say that we feel we've come up with a neat solution to both.
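As a rough illustration of the first requirement only, the sketch below shows one simple way a gallery could spread the number of ratings evenly: serve the entry with the fewest ratings that this user has not yet rated, breaking ties at random. This is an assumption for illustration, not a description of our actual mechanism; the names `next_entry`, `rating_counts` and `already_rated` are hypothetical.

```python
import random


def next_entry(rating_counts, already_rated, rng=random):
    """Choose which entry to show a voter next.

    rating_counts: dict mapping entry id -> number of ratings so far.
    already_rated: set of entry ids this voter has already rated.
    Returns an entry id, or None if the voter has rated everything.
    """
    candidates = {
        entry: count
        for entry, count in rating_counts.items()
        if entry not in already_rated
    }
    if not candidates:
        return None
    fewest = min(candidates.values())
    # Random tie-break so that no entry gets a fixed "prime" position.
    least_rated = sorted(e for e, c in candidates.items() if c == fewest)
    return rng.choice(least_rated)


counts = {"a": 3, "b": 1, "c": 1}
print(next_entry(counts, already_rated={"b"}))  # c
```

The design choice here is that the count of ratings, not their magnitude, drives what is shown, so a poorly rated entry is still offered to voters as often as a well rated one.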
If it's rating, why call it voting?
Even though we (I) feel that rating systems are preferable to voting, given the current system configuration (a few large concurrent challenges run by dpreview staff), we plan to allow user-created challenges within the next few months and our citizen challenge hosts are likely to have their own ideas. That's why we've planned for a variety of judging models (including various voting models) for them to choose from. Even though we plan to support both rating and voting judging models at some point, we had to pick a label for UI gee-gaws (buttons, titles etc.) and stick with it. Why 'vote'? Well, given the two terms 'rate' and 'vote', our 30-second co-worker survey revealed that 'vote' is a more compelling call to action. So there you have it.
Stay tuned
By this point in the post, the anxiety of needing to know the exact algorithm for calculating overall image ratings may be causing some readers to gnaw their own fingers off. Well, luckily humans have plenty of fingers because this isn't that post. Fear not, the draft of 'the algorithm post' is gestating nicely and should emerge soon enough. Until then, keep voting (rating) and curb that gnawing.


One problem I see with a "no-one's opinion wasted" system: as long as the list of entries is displayed through a fixed process (e.g., first challenge submissions are listed first), those entries at the beginning of the list are more likely to receive user attention than the entries elsewhere.
Perhaps some kind of randomizing system that does not always show entries in chronological order of submission?
Posted by: Charles Hueter | Jan 21, 2009 10:11:16 PM
@Charles Hueter
Indeed it would be a fatal flaw in the system if early-bird submissions always had the prime voting real estate. Your suggestion of randomising entries in the voting gallery is such a good one we implemented it before the challenge beta even began ;) I plan to cover the mechanics of vote distribution (and why randomisation during voting is crucial) more fully in a future post.
Posted by: Jaysen Marais | Jan 21, 2009 10:32:47 PM
The challenges have too many entries, once you get to 50 or so photos it's impossible for anyone to actually look through them all. Not to mention that looking at them in thumbnail form will give one set of results, looking at them full-screen quite another, as the technical flaws are much easier to see.
Posted by: touristguy87 | Jan 22, 2009 1:24:52 AM
@touristguy87
There is no obligation to look through all images. You can view images larger by clicking the image; from there you can vote and move next/previous. We are mulling over ways to make this a more fluid experience.
Posted by: Jaysen Marais | Jan 22, 2009 12:02:59 PM
1. With hundreds of entries, I bet there is a tendency to skip subtlety in images so that the thumbnail has enough punch to get someone interested. You could argue this is a good thing, that an image should look good from a distance as well as close in, and I am pondering my own image philosophy in this light. But it also means that some images can't compete as well in this "market."
2. When I do decide to vote on an image, it is already one I like, so I'm likely to give it 2 to 4 stars, saving 5 stars for something truly outstanding (and rare), and never giving 1 star (why bother?). So I think the stats are probably skewed towards the high end.
Posted by: eopix | Jan 22, 2009 6:14:48 PM
Meh.
If you want a neat voting system:
- users rate -2, -1, 1 or 2 points
- Average the 50% (or any funny %) most voted
and you got it
Suggestions: show # of views and votes in every image.
A final conclusion: You could as well give random ratings since Dpreview audience is not very sophisticated, to say the least (there's no other explanation for the fact that my submission ranked 120th).
Oh, and what's with the blog system? Doesn't it recognize one's dpreview user?
Posted by: igb | Jan 30, 2009 8:07:05 PM
I think you are putting a lot of thought into this and it definitely shows well. I agree with users mentioning the difficulty of going through so many images. This is idealistic, in my opinion: there are so many images that I rated a lot but gave up; I might have missed the best ones.
This being said, the "screening" model from 1X.com could be an inspiration. The way the images are presented for screening is awesome. You may want to have a look at www.1x.com
Thanks
Posted by: Jean-Yves | Jan 31, 2009 1:54:17 PM
randomisation is crucial, but you could improve it by randomizing the not-yet-voted-on and already-voted-on images separately. Then display the pictures a user has not voted on yet before the others. Otherwise when you want to pick up voting for a gallery you get lots of images that you already voted on. I don't know how the others vote, but I sometimes vote on every single photo in a gallery.
Posted by: don | Feb 3, 2009 9:25:36 AM
When will complain be fixed? It would be nice if this were working so that one could direct to the host one's view of a picture not meeting the rules. Since when is a cloud a "pictures of subjects in mid-air,freezing their movement."? Or another one of someone standing on a bowsprit.
Waiting till the voting stage to remove by low or no vote is not as effective, especially when the challenge might have reached its max before voting begins..... see another point of view which had many that did not meet "This challenge is about pictures of buildings and places that people know, captured in an way they're not used to"!
Posted by: ready123 | Feb 8, 2009 11:46:41 AM
http://blog.dpreview.com/dev/2009/01/wheat-from-chaff-how-our-voting-system-works.html
I just have one question. After reading all this it becomes painfully obvious that this is a completely subjective process which you are trying to make *objective* by processing, by applying logic in some way to, a bunch of subjective opinions.
Has it ever occurred to you that you really cannot do that? In the end you still have to correlate the results with what you consider to be "great photography"...and the most that you can get out of this process is that a lot of people agree with you, about these images.
...what if that doesn't happen, does that invalidate their opinions? Does that invalidate your processing? Ultimately all you are doing is searching for reinforcement for *your* opinions with regards to these photos.
It seems to me that the only real way to know which ones people really like is to offer them for sale and see which ones sell the best. "Votes" are cheap.
Posted by: touristguy87 | Feb 8, 2009 4:33:27 PM
"Is it really that simple?
So rating systems are great. Well, it turns out they have a few vulnerabilities, a few pitfalls for the unwary developer."
...one big one being that they inevitably put the developers' own "spin" on both the system and the results that it generates. As you said at the beginning there are already several sites out there that do this. Yours will be better, "more objective", than all of them? Or just another flower in the field, just another piece of art in the museum? How do you determine the "value" of your efforts vs those of others? That's completely subjective.
Posted by: touristguy87 | Feb 8, 2009 4:37:17 PM
I have posted about "spoilers" in the Challenges forum, but didn't get any feedback from you at DPR. I'd define a spoiler, in real life, as someone that scores an image much below what he/she thinks it's worth, to reduce its final score. Statistically, no intent considered, as an outlier (see more below).
Depending on how many spoilers an image gets, they could move it a lot in the rankings and change the final results.
For example, if you typically have 50 people voting for an image and, say, 5 spoilers, they could reduce the final score of an image (by giving it 0.5 stars) by a typical value of 0.2 to 0.3. This is enough to move an image from 1st to 10th position or more in most challenges; if you're in the midrange, more than 30 positions.
And they are there: how come many 1st place images have scores of 0.5 and 1? Are those votes in good faith?
My suggestion is to remove them as statistical outliers. Many criteria exist to establish that; a good one is that anything below the average score minus 2 standard deviations could be seen as an outlier (it depends on the number of voters, but that's a good rule of thumb).
Here's an example, with votes (series is for votes from 0.5 thru 5, total of 50 votes, 5 scoring 0.5):
5;0;1;3;5;6;11;11;6;2
Total of votes: 50
Average score: 3.2
SD: 1.2
AS - 2*SD = 0.8
Thus, scores 0.5 are considered outliers.
Removing those votes:
Average Score: 3.5
Difference: 0.3 points
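The arithmetic above can be checked with a short sketch. It uses the vote counts from the example, a plain arithmetic mean, and the population standard deviation; the choice of population rather than sample SD, and the names `expand` and `trim_low_outliers`, are assumptions made for illustration, not anything DPR has confirmed using.

```python
from statistics import mean, pstdev

# Star values 0.5, 1.0, ..., 5.0 and the number of votes for each,
# taken from the example series above.
SCORES = [0.5 * k for k in range(1, 11)]
COUNTS = [5, 0, 1, 3, 5, 6, 11, 11, 6, 2]


def expand(scores, counts):
    """Turn a histogram into a flat list of individual votes."""
    return [s for s, c in zip(scores, counts) for _ in range(c)]


def trim_low_outliers(votes, k=2.0):
    """Drop votes below (average - k * standard deviation)."""
    cutoff = mean(votes) - k * pstdev(votes)
    return [v for v in votes if v >= cutoff]


votes = expand(SCORES, COUNTS)
kept = trim_low_outliers(votes)

print(len(votes), round(mean(votes), 1), round(pstdev(votes), 1))  # 50 3.2 1.2
print(len(kept), round(mean(kept), 1))                             # 45 3.5
```

Only the five 0.5-star votes fall below the cutoff of roughly 0.8, so 45 votes remain and the average moves from 3.2 to 3.5.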
Posted by: rhlpetrus | Feb 16, 2009 9:13:46 PM
Remark about high votes as outliers: one could also consider a score above AS + 2*SD as an outlier, even though I'd say give those people the benefit of the doubt; maybe they actually think the image is outstanding for some reason. And, from the actual votes, I didn't see much of that happening: no low-scoring image with 5-star votes.
Posted by: rhlpetrus | Feb 16, 2009 9:18:31 PM
Is getting a vote of 0.5 star, better or worse than getting no vote at all?
Am I saying: I like this more than one I haven't given any vote to
Or, am I saying: this is rubbish?
I thought the former, but having read this blog, I'm not so sure now.
This needs to be made clear.
Posted by: Adam | May 22, 2009 7:37:16 PM
Hi,
The rating system looks efficient. I am impressed by the way you have analysed and described the voting system.
Posted by: bluetooth freisprecheinrichtung | Oct 21, 2009 11:34:24 AM