Voting Patterns in the Labeling Task – Data Engines Corporation

Consider a collection of recognizers (annotators, labelers) that have to label a data set. Unsupervised inference in this task is the problem of inferring the true prevalence of the labels in the data and the conditional recognition probabilities of the recognizers.

The most abstract formulation of this problem treats the recognizers as black boxes. All we get to see about the decision process of each recognizer is its label for each data point. This abstraction is very helpful in practice. Treating the recognizers as black boxes allows us to develop algorithms that can readily accommodate heterogeneous decision makers. For example, it is common practice to have human annotators decide a gold standard in situations where computer algorithms can also label. Instead of assuming that the human annotators are perfect (numerous studies have shown they are not!), an algorithm that treats everyone as a black box can pool computer and human decisions and output an accuracy estimate for each of them.

The only information available in this abstract setting is the set of label voting patterns that the collection of recognizers could output. For example, if we have three recognizers carrying out a binary labeling task, such as information retrieval engines deciding whether a document is relevant to a search query, the number of possible voting patterns is $2^3 = 8$. In general, for $R$ recognizers trying to label data with one of $L$ labels, the number of observable label voting patterns is $L^R$. There could be fewer states than this, for example in the case of speech recognizers that have insertion and deletion errors. In other words, the event space of label voting patterns depends on the type of errors that the recognizers make. We will discuss the event space of recognizers that have deletion and insertion errors in a future post.
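
As a minimal sketch of the counting argument above, the following Python snippet enumerates the voting patterns for $R$ recognizers and $L$ labels. The function name `voting_patterns` and the encoding of labels as the integers $0, \ldots, L-1$ are assumptions made purely for illustration.

```python
from itertools import product


def voting_patterns(num_recognizers, num_labels):
    """Enumerate every label voting pattern for R recognizers and L labels.

    Labels are encoded as the integers 0..L-1 (an illustrative assumption).
    Each pattern is a tuple holding one label per recognizer.
    """
    return list(product(range(num_labels), repeat=num_recognizers))


# Three recognizers on a binary labeling task: 2**3 patterns.
patterns = voting_patterns(3, 2)
print(len(patterns))  # 8
for pattern in patterns:
    print(pattern)
```

The list grows as $L^R$, so enumerating it explicitly is only practical for a small number of recognizers and labels.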

The frequency of each of these label voting patterns is directly observable by any algorithm that tries to carry out unsupervised inference. In our next post, we will show how each of these observable frequencies results in an inference equation. It is the collection of these inference equations that determines whether a particular labeling task is solvable in an unsupervised setting.
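
To make the notion of an observable frequency concrete, here is a small Python sketch that tallies how often each voting pattern occurs. The function name `pattern_frequencies`, the integer label encoding, and the example votes are all hypothetical and chosen only for illustration; the sketch covers only the simple case in which every recognizer emits exactly one label per data point.

```python
from collections import Counter
from itertools import product


def pattern_frequencies(votes, num_labels):
    """Return the observed frequency of every label voting pattern.

    votes: one sequence per data point, holding one label per recognizer.
    Labels are assumed to be integers 0..num_labels-1.
    Patterns that never occur are reported with frequency 0.0.
    """
    votes = [tuple(v) for v in votes]
    num_recognizers = len(votes[0])
    counts = Counter(votes)
    total = len(votes)
    all_patterns = product(range(num_labels), repeat=num_recognizers)
    return {p: counts[p] / total for p in all_patterns}


# Made-up votes from three binary recognizers on eight data points.
votes = [
    (0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 1),
    (0, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 1),
]
freqs = pattern_frequencies(votes, num_labels=2)
for pattern, freq in freqs.items():
    print(pattern, freq)
```

Note that the tally uses nothing but the recognizers' outputs, which is what makes these frequencies available in the black-box setting.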