In our last post we derived a comparison between the number of observable label voting pattern frequencies and the number of parameters we would need to solve for in the case of independent recognizers. That led us to the concept of a minimum number of recognizers needed for unsupervised inference. It so happens that we can prove that the minimum is always greater than two, a somewhat satisfying result reminiscent of the idea of triangulation in geometry. More than two decision makers are always needed whenever we want to infer the accuracy of their labeling performance without any knowledge of the ground truth.
In this post we want to talk about the possibility of carrying out unsupervised inference even when the recognizers are correlated. As far as we know, our approach is unique in dealing with this situation, not only because the inference equations associated with the label voting pattern frequencies give us much more information than traditional approaches such as maximizing the log-likelihood of the data, but also because we can quantify when recognizers are too correlated. Unsupervised inference is not always possible, and a viable, practical algorithm for carrying it out needs to be able to detect when the recognizers have become too correlated.
The statistical quantities relevant to a discussion about the quality of the recognizers are the conditional recognition probabilities,
$$p(\{\ell_1,\ell_2,\ell_3,\ldots\} | \ell_t).$$
For each of the $L$ labels in a particular task, the $R$-dimensional table of these conditional probabilities sums to one,
$$\sum_{e} p(e | \ell_t) = 1,$$
where $e=\{\ell_1,\ell_2,\ell_3,\ldots\}$ is a label voting pattern event and we sum over all possible label voting patterns. The most general conditional probability table for a given label would then require that we estimate $L^R - 1$ parameters. Compare this to the case of independent recognizers, where the table only requires that we estimate $R(L-1)$ parameters. We immediately deduce that the unsupervised inference problem in which $L^R - 1$ parameters are required for each label is not solvable, since the number of parameters we have to estimate,
$$L(L^R-1) + (L-1),$$
is always greater than the number of inference equations we have,
$$L^R -1.$$
Restating what we have just said,
$$L(L^R - 1) + (L-1) \gt L^R - 1,$$
since the left-hand side equals $L^{R+1} - 1$ and
$$L^{R+1} \gt L^R,$$
for any $L \ge 2$ and $R \gt 1$.
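To make the counting argument concrete, here is a minimal Python sketch (our own illustration, not part of the original derivation) that tabulates the parameter count for arbitrarily correlated recognizers against the number of available inference equations for a few small values of $L$ and $R$.

```python
# A minimal sketch: compare the number of parameters in the fully general
# (correlated) model with the number of inference equations available from
# the label voting pattern frequencies.

def general_parameters(L, R):
    """L conditional tables with L**R - 1 free entries each, plus L - 1 prevalences."""
    return L * (L**R - 1) + (L - 1)

def inference_equations(L, R):
    """At most L**R - 1 linearly independent voting pattern frequencies."""
    return L**R - 1

for L in (2, 3):
    for R in (2, 3, 4):
        p, e = general_parameters(L, R), inference_equations(L, R)
        print(f"L={L}, R={R}: {p} parameters > {e} equations -> not solvable")
```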
While we have been discussing unsupervised inference assuming that all $L^R$ label voting patterns are observable, it should be clear to the reader that the real issue is whether the system of inference equations allows for unique solutions. If you consider the case of completely correlated recognizers, only $L$ label voting patterns would be observed (the patterns where all the votes agree). But we would still need to solve for $L(L-1) + (L-1)$ statistical parameters, and this is always greater than the $L-1$ linearly independent inference equations.
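As a quick illustration with hypothetical numbers of our own choosing, the snippet below simulates three perfectly correlated recognizers on a binary labeling task and confirms that only $L$ of the $L^R$ possible voting patterns ever appear.

```python
# Perfectly correlated recognizers always cast identical votes, so only L of
# the L**R possible voting patterns are ever observed.
import random

L, R = 2, 3
labels = list(range(L))
patterns = set()
for _ in range(10_000):
    vote = random.choice(labels)   # one shared vote ...
    patterns.add((vote,) * R)      # ... copied by all R recognizers
print(len(patterns))               # L = 2 observed patterns, not L**R = 8
# Those L patterns yield only L - 1 independent frequencies, far fewer than
# the L*(L-1) + (L-1) parameters we would need to solve for.
```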
Having discussed at length when unsupervised inference is impossible, we now come to the title of this post: the window of opportunity for unsupervised inference. Whenever the number of parameters needed to specify the prevalence of labels and the conditional recognition probabilities, call it $P$, lies in the range
$$(L-1)(RL+1) \le P \le L^R - 1,$$
the unsupervised inference problem in the labeling task is solvable. The lower limit corresponds to the case of completely independent recognizers ($L-1$ label prevalences plus $R(L-1)$ conditional recognition probabilities for each of the $L$ labels), and the upper limit corresponds to the maximum number of linearly independent inference equations.
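As a small sketch of this window, again our own illustration rather than anything from the derivation above, the following snippet finds the smallest $R$ for which the independent-recognizer parameter count fits under the equation count. It recovers the result from the previous post that more than two recognizers are always needed.

```python
# Find the smallest number of recognizers R for which the window of
# opportunity is non-empty at the independent-recognizer lower limit.

def independent_parameters(L, R):
    """L - 1 prevalences plus R*(L - 1) conditional probabilities per true label."""
    return L * R * (L - 1) + (L - 1)   # == (L - 1) * (R*L + 1)

def window_exists(L, R):
    return independent_parameters(L, R) <= L**R - 1

for L in (2, 3):
    min_R = next(R for R in range(1, 20) if window_exists(L, R))
    print(f"L={L}: window of opportunity opens at R={min_R}")   # R = 3 in both cases
```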
In our next post we will begin detailing the application of this approach to a set of eight information retrieval engines used during the 2003 Reliable Information Access (RIA) Workshop at MITRE.