I picked up Algebraic Statistics this morning and discovered a simple proof of the Marriage Lemma in the first page of the Introduction. The book is by Pistone, Riccomagno and Wynn and dates to 2001. The subtitle describes its mathematical bent well: “Computational Commutative Algebra in Statistics”. Basically, they show that polynomial algebra problems abound in statistics and the book is an explanation of the mathematical machinery from algebraic geometry that can be used to gain insight into traditional statistical techniques.
The problem they start with in the Introduction is the “simplest example possible”, as they say. Here is their description:
Suppose that two people (small enough) stand together on a bathroom scale. Our model is that the measurement is additive, so that if there is no error, and $\theta_1$ and $\theta_2$ are the two weights, the reading should be
$$
Y = \theta_1 + \theta_2
$$
Without any other information it is not possible to estimate, or compute, the individual weights $\theta_1$ and $\theta_2$. If there is an unknown zero correlation $\theta_0$ then $Y=\theta_1 + \theta_2 + \theta_0$ and we are in worse trouble.
This simple linear algebra problem (and its solution) is the core of the Marriage Lemma: Two recognizers will never be able to tell how accurate the other one is. The weight problem described in the book is equivalent if we think of accuracies as weights. Let me explain.
Clearly, if we are forced to weigh two people together whenever we use a weight scale, we could never determine the individual weights of a single pair. Or to state it explicitly, the single linear equation $Y=\theta_1 + \theta_2$ where we have a single measurement cannot be used to solve for the two unknown weights. This is the marriage lemma: two people living together will never be able to figure out their individual weights when they use this two-at-a-time weight scale.
But if three people live together, they can do so. This is equivalent to our machine learning claim that three independent recognizers can measure their own accuracies without any knowledge of the correct labels/values for the data.
The following equations would apply for the case of finding the weight of three people when you are forced to use a two-person scale
\begin{align}
Y_{12} &= \theta_1 + \theta_2 \\
Y_{13} &= \theta_1 + \theta_3 \\
Y_{23} &= \theta_2 + \theta_3
\end{align}
You weigh the three possible pairs and then you get three linear equations for three unknowns. Assuming the weight scale is perfect – this will always have a solution. When noise is present in the scale you minimize the least squares of the equation residuals.
[update]: The case of a two-person scale is not as far fetched as it sounds. It would apply to a weight scale that only recorded weights above some non-zero value. For example, a scale that only gave readings above 200 pounds. A group of three people, each of whom weighed between 100 and 200 lbs, could then use this algorithm to figure out their individual weights.