Proofreading Example

Finding and correcting typos in a long manuscript requires experience and concentration. Often two professionals proofread the same manuscript; as a result, not only are more typos detected, but comparing their results also helps estimate the number of typos that remain. This is how it is done.

Assume that the two proofreaders read the same manuscript independently of each other and detect any given error with probabilities p and q, respectively. Assume also that each detection is independent of the rest so that, for each proofreader, every typo in the manuscript is a Bernoulli trial with success probability p (for the first) or q (for the second). Assume one found A errors, the other B, and that there are C errors noticed by both. The total number of typos found by the pair is A + B - C.
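The model just described can be tried out numerically. The following is only a minimal sketch under the stated independence assumptions; the function name simulate_proofreading and the parameters m, p, q, seed are illustrative and not part of the original discussion.

```python
import random


def simulate_proofreading(m, p, q, seed=None):
    """Simulate two independent proofreaders on a manuscript with m typos.

    Each typo is caught by the first reader with probability p and by the
    second with probability q, independently. Returns the counts (A, B, C):
    found by the first, found by the second, found by both.
    """
    rng = random.Random(seed)
    a = b = c = 0
    for _ in range(m):
        found_first = rng.random() < p
        found_second = rng.random() < q
        a += found_first
        b += found_second
        c += found_first and found_second
    return a, b, c


# Hypothetical run: 1000 typos, detection probabilities 0.8 and 0.7.
a, b, c = simulate_proofreading(1000, 0.8, 0.7, seed=1)
print(a, b, c, "found by at least one:", a + b - c)
```

For a large number of typos the three counts should land close to pM, qM, and pqM, which is the approximation used in what follows.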

There are now a couple of ways to proceed. What is the relation of A, B, C to the probabilities p and q? Feller (Ch. 6, §10, #23) and later G. Polya chose to treat A, B, and C as the expected values in Bernoulli trials. Assuming the total number of errors is (a large) M,

A ≈ pM, B ≈ qM, C ≈ pqM

so that

M = (pM · qM) / (pqM) ≈ AB/C.

From here, the number U of unnoticed typos can be estimated as

	U	= M - (A + B - C)
		≈ AB/C - (A + B - C)
		= (A - C)(B - C) / C.
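As a small worked illustration of the two formulas above, here is a hedged sketch in Python. The function names estimate_total and estimate_unnoticed and the sample counts are assumptions made for the example, not data from the sources cited.

```python
def estimate_total(a, b, c):
    """Estimate of the total number of typos, M ≈ AB/C."""
    if c == 0:
        raise ValueError("estimate is undefined when no typo was found by both")
    return a * b / c


def estimate_unnoticed(a, b, c):
    """Estimate of the typos missed by both readers, U ≈ (A - C)(B - C)/C."""
    if c == 0:
        raise ValueError("estimate is undefined when no typo was found by both")
    return (a - c) * (b - c) / c


# Hypothetical counts: A = 30, B = 24, C = 20.
# M ≈ 30 * 24 / 20 = 36, found by at least one reader: 30 + 24 - 20 = 34,
# so U ≈ 36 - 34 = 2, which agrees with (30 - 20)(24 - 20) / 20 = 2.
print(estimate_total(30, 24, 20), estimate_unnoticed(30, 24, 20))
```

The explicit check for C = 0 reflects the fact that both formulas divide by C.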

Perhaps curiously, the result does not include M, the total number of errors. This is only because the latter has been assumed to be large. (This is the context in which the problem has been mentioned in the second edition of the very entertaining book by P. Nahin.)

According to a 1976 editorial (Monthly, p. 801),

The article Probabilities in Proofreading by G. Polya (this Monthly, 83 (1976) 42) has stimulated a lot of reader response. V. N. Murty has informed us that the estimate obtained by G. Polya for the number of unnoticed misprints was obtained by Edward Deming and Chandra Sekhar (J. Amer. Stat. Assoc., 44 (1949) 101-15) and that demographers use it to estimate vital events. Ralph Winter writes that if the number C of misprints noticed by both proofreaders is 0, then Polya's estimate is undefined. He then adds that if most probable numbers (rather than expected numbers) are used, the estimate becomes (A - C)(B - C) / (C + 1). L. Glickman has informed us that the problem Polya solves appears as Exercise 23 on page 170 of W. Feller's book An Introduction to Probability Theory and Its Applications, vol. I, 3rd edition (J. Wiley, New York, 1968).

(I should mention that the problem appears already in the second edition of Feller's book, where he comments that the approach has been used by Ernest Rutherford to count the number of scintillations, which means it was known in the 1920s-30s.)
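Winter's remark in the editorial quoted above can be illustrated with a short sketch. The function name estimate_unnoticed_winter and the sample counts are assumptions chosen for the example; only the formula (A - C)(B - C) / (C + 1) comes from the quoted text.

```python
def estimate_unnoticed_winter(a, b, c):
    """Variant quoted in the editorial: (A - C)(B - C) / (C + 1).

    Unlike the version that divides by C, this stays defined when C = 0.
    """
    return (a - c) * (b - c) / (c + 1)


# Hypothetical counts with no overlap: A = 5, B = 3, C = 0 gives 5 * 3 / 1 = 15.
print(estimate_unnoticed_winter(5, 3, 0))

# With A = 30, B = 24, C = 20 the variant gives 40 / 21, a little under the
# value 40 / 20 = 2 obtained when dividing by C.
print(estimate_unnoticed_winter(30, 24, 20))
```

For large C the two versions differ very little; the difference matters mainly when the overlap is small.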

References

W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, 2nd edition, John Wiley & Sons, 1958
P. Nahin, Duelling Idiots and Other Probability Puzzlers, Princeton University Press, 2000
G. Polya, Probabilities in Proofreading, Amer. Math. Monthly, 83, no. 1 (Jan. 1976), p. 42
