How to tell hammers from spammers in Amazon Mechanical Turk

Reasoning about probability is hard. God knows I have made my share of mistakes. A recent paper by Karger, Oh, and Shah, "Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems," offers another example: it claims that spamming workers on a crowdsourcing platform like Amazon Mechanical Turk would be 50% correct. This is incorrect. A spammer could be anywhere from 0 to 100 percent correct. Accuracy cannot distinguish shirking workers from conscientious ones on these platforms, but our technology can.

Let me first explain, with some simple math, why accuracy is not a sign of honest work. We all make mistakes, so we accept that honest workers will make some too. Now suppose you send out a crowdsourcing task that consists of deciding whether an Internet photograph is appropriate for children. This is a two-label task: yes or no. The results come back: worker A is 85% correct and worker B is 80% correct. Who is the spammer? Are they both conscientious? Well, there is not enough information in what I just said to answer either question.

To understand this, let me make up a possible scenario. The task you sent out consists of 85% age appropriate photos. A spammer could spend a little time noticing that most of the photographs trend this way and just answer that every photo is age appropriate. They would be right on every appropriate photo and wrong on every inappropriate one, for a final accuracy of 100% × 85% + 0% × 15% = 85%.

An honest worker on the same data could be 100% accurate on the age inappropriate photos and 76.5% correct on the appropriate ones. This would make their overall performance 80% accurate (85% × 76.5% + 15% × 100% ≈ 80%). The honest worker is less accurate than the spammer on this task! Accuracy is NOT a measure of spamming in crowdsourcing!
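If you want to check the arithmetic yourself, here is a small simulation of this made-up scenario. All of the rates are the hypothetical numbers above, not real Mechanical Turk data:

```python
import random

random.seed(0)

N = 100_000              # number of photos in the simulated task
P_APPROPRIATE = 0.85     # 85% of the photos are age appropriate

def spammer(is_appropriate):
    # Always answers "appropriate", regardless of the photo.
    return True

def honest_worker(is_appropriate):
    # 76.5% correct on appropriate photos, 100% correct on inappropriate ones.
    if is_appropriate:
        return random.random() < 0.765
    return False

def accuracy(worker):
    correct = 0
    for _ in range(N):
        is_appropriate = random.random() < P_APPROPRIATE
        correct += worker(is_appropriate) == is_appropriate
    return correct / N

print(f"spammer accuracy:       {accuracy(spammer):.3f}")        # ~0.850
print(f"honest worker accuracy: {accuracy(honest_worker):.3f}")  # ~0.800
```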

What tells hammers apart from spammers is how they answer their tasks. You have to look at all of their answers and see how they align. This is not the place to explain how our wonderful technology does this (see our other posts). But you can get an intuitive sense of how it can work by realizing that being honest aligns workers in their answers. To say it tongue in cheek, the truth biases them. In the hypothetical case above, the spammer answered that all of the age inappropriate photos were OK, while the honest worker did not.
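To make that intuition concrete, here is a toy sketch (emphatically not our algorithm). Split a worker's answers by the true label: a spammer answers the same way no matter what the truth is, so their per-label answer rates are identical, while an honest worker's rates differ. Note that I cheat here and use the true labels to keep the sketch short; a real system has to estimate these rates from worker agreement alone, for example with an EM-style estimator such as Dawid–Skene:

```python
import random

random.seed(1)

# True = age appropriate; same hypothetical mix as above.
truths  = [random.random() < 0.85 for _ in range(100_000)]
spammer = [True for _ in truths]                               # always "appropriate"
honest  = [random.random() < 0.765 if t else False for t in truths]

def answer_rates(answers):
    """P(answer = 'appropriate') computed separately for each true label."""
    rates = {}
    for label in (True, False):
        subset = [a for a, t in zip(answers, truths) if t == label]
        rates[label] = sum(subset) / len(subset)
    return rates

for name, answers in (("spammer", spammer), ("honest ", honest)):
    r = answer_rates(answers)
    # If both rates are equal, the answers carry no information about the truth.
    informativeness = abs(r[True] - r[False])
    print(f"{name} rates: appropriate={r[True]:.3f}, "
          f"inappropriate={r[False]:.3f}, informativeness={informativeness:.3f}")

# The spammer's two rates are identical (informativeness ~0) despite their
# 85% accuracy; the honest worker's rates differ sharply.
```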

Contact us if you are interested in finding out how we can improve your crowdsourcing platform.

 

MathML errors when viewing site

A web visitor just alerted us to browser errors when attempting to view the math equations on this blog. Serving up math on the web is not trivial! We are working on the problem. In the meantime, we suggest viewing this site with Firefox, if possible, until we resolve the issue.