Equations for unsupervised inference

In our previous post, we discussed the observables in an unsupervised setting: the label voting patterns of the recognizers. In this post we will discuss how we can use the frequency of these voting patterns to create a system of equations that can be used for unsupervised inference. We will use the case of three recognizers carrying out a binary labeling task for ease of discussion but hasten to add that the formalism works for any number of labels.

Take one of the eight possible label voting patterns that the three recognizers can produce. We will use the label voting pattern $\{A,B,A\}$ where $A$ and $B$ represent our two labels. The inference equation for this voting pattern is,
$$\color{red} f_{\{A,B,A\}} = p(\{A,B,A\} | A)p(A) + p(\{A,B,A\} | B)p(B).$$
The frequency of the $\{A,B,A\}$ voting pattern is equal to the fraction of data points whose correct label was $A$ and for which the recognizers output the labels $\{A,B,A\}$, plus the fraction of data points whose correct label was $B$ and for which they output the same voting pattern. Similar equations can be written for each of the eight label voting patterns.
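To make the equation concrete, here is a minimal sketch that evaluates its right-hand side for all eight voting patterns. The prevalences and per-recognizer accuracies are hypothetical numbers chosen only for illustration, and the function `p_votes_given_label` assumes the recognizers err independently given the true label. That assumption is ours, made to keep the example small; the inference equation itself does not require it.

```python
from fractions import Fraction
from itertools import product

LABELS = ("A", "B")

# Hypothetical prevalences p(A) and p(B), for illustration only.
prevalence = {"A": Fraction(3, 10), "B": Fraction(7, 10)}

# Hypothetical accuracies: accuracy[i][label] is the probability that
# recognizer i outputs `label` when `label` is the correct one.
accuracy = [
    {"A": Fraction(8, 10), "B": Fraction(9, 10)},
    {"A": Fraction(7, 10), "B": Fraction(6, 10)},
    {"A": Fraction(9, 10), "B": Fraction(8, 10)},
]


def p_votes_given_label(votes, true_label):
    """p(votes | true_label), ASSUMING independent errors given the label."""
    prob = Fraction(1)
    for vote, acc in zip(votes, accuracy):
        prob *= acc[true_label] if vote == true_label else 1 - acc[true_label]
    return prob


def pattern_frequency(votes):
    """Right-hand side of the inference equation for one voting pattern."""
    return sum(p_votes_given_label(votes, label) * prevalence[label]
               for label in LABELS)


# One equation per voting pattern: eight in total for three recognizers.
frequencies = {votes: pattern_frequency(votes)
               for votes in product(LABELS, repeat=3)}

for votes, freq in frequencies.items():
    print(votes, freq)
```

In an actual unsupervised setting we would only see the left-hand sides, the eight frequencies, and would have to work backwards to the prevalences and conditional probabilities.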

Each of these equations relates an observable, the frequency of a label voting pattern, which we can measure without any knowledge of the correct labels for the data points, to two types of statistical quantities we would like to know: the prevalence of the labels in the data, $p(\ell)$, and the conditional recognition probabilities, $p(\text{votes} | \ell).$
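Written once for a generic voting pattern, and using the same notation as above, the eight equations share the form
$$f_{\text{votes}} = \sum_{\ell \in \{A,B\}} p(\text{votes} | \ell)\,p(\ell),$$
where the unknowns on the right-hand side are constrained by normalization:
$$p(A) + p(B) = 1, \qquad \sum_{\text{votes}} p(\text{votes} | \ell) = 1 \text{ for each } \ell \in \{A,B\}.$$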

We can immediately deduce from these equations that carrying out unsupervised inference when the recognizers are completely correlated is impossible. For example, if the three recognizers are completely correlated, they would output just two patterns, $\{A,A,A\}$ and $\{B,B,B\}$. This observation can be explained in one of two ways. Either the recognizers are perfect, and the observed frequencies of the two voting patterns give us the prevalences of the $A$ and $B$ labels in the data, or they are imperfect, and there are infinitely many solutions for the prevalences and conditional recognition probabilities.
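A small numerical sketch of this ambiguity follows. The function `observed_frequencies` and the two parameter settings are hypothetical, picked only so that a perfect set of recognizers and an imperfect one produce exactly the same observable frequencies.

```python
from fractions import Fraction


def observed_frequencies(p_a, p_aaa_given_a, p_aaa_given_b):
    """Frequencies of {A,A,A} and {B,B,B} for completely correlated recognizers.

    Because the recognizers always vote together, the only possible patterns
    are {A,A,A} and {B,B,B}, so p({B,B,B} | label) = 1 - p({A,A,A} | label).
    """
    p_b = 1 - p_a
    f_aaa = p_aaa_given_a * p_a + p_aaa_given_b * p_b
    f_bbb = (1 - p_aaa_given_a) * p_a + (1 - p_aaa_given_b) * p_b
    return f_aaa, f_bbb


# Case 1: perfect recognizers, with label A making up 60% of the data.
perfect = observed_frequencies(Fraction(3, 5), Fraction(1), Fraction(0))

# Case 2: imperfect recognizers, with label A making up 50% of the data.
imperfect = observed_frequencies(Fraction(1, 2), Fraction(9, 10), Fraction(3, 10))

print(perfect)
print(imperfect)
print(perfect == imperfect)
```

Both settings give the same pair of frequencies, so the observed voting patterns alone cannot tell them apart.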

The interesting science occurs in the window of opportunity that these equations create. We will show in our next post that there is a minimum number of recognizers needed to carry out unsupervised inference, and that, as long as the recognizers are not too correlated, it is possible to solve the joint unsupervised inference problem for the labeling task.