Proofreading Example
Finding and correcting typos in a long manuscript requires experience and concentration. Often two professionals proofread the same manuscript. As a result, not only are more typos detected, but comparing the two sets of findings also helps estimate the number of typos that remain. This is how it is done.
Assume that the two proofreaders read the same manuscript independently of each other and detect errors with probabilities p and q, respectively. Assume also that each error detection is independent of the rest so that, for each proofreader, finding any given typo is a Bernoulli trial with success probability p for the first proofreader and q for the second. Suppose one found A errors, the other B, and that C errors were noticed by both. The total number of typos found by the pair is A + B - C.
There are now a couple of ways to proceed. How are A, B, and C related to the probabilities p and q? Feller (Ch. 6, §10, #23) and later G. Polya chose to treat A, B, and C as the expected values in Bernoulli trials. Assuming the total number of errors is (a large) M,
A ≈ pM, B ≈ qM, C ≈ pqM
so that
M = (pM · qM) / (pqM) ≈ AB/C.
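The estimate M ≈ AB/C can be sketched in a few lines of Python. This is only an illustration under the assumptions above: the function names and the sample counts (A = 30, B = 24, C = 20) are hypothetical and do not come from Feller or Polya. The second function uses the same relations A ≈ pM, B ≈ qM, C ≈ pqM to back out rough values p ≈ C/B and q ≈ C/A.

```python
def estimate_total(a: int, b: int, c: int) -> float:
    """Estimate the total number of typos M as A*B/C."""
    if c == 0:
        # With no typo found by both readers, AB/C is undefined.
        raise ValueError("C must be positive for the estimate AB/C")
    return a * b / c


def estimate_detection_rates(a: int, b: int, c: int) -> tuple[float, float]:
    """Rough estimates of (p, q): C/B = pqM/qM = p and C/A = pqM/pM = q."""
    if a == 0 or b == 0:
        raise ValueError("A and B must be positive")
    return c / b, c / a


# Hypothetical counts, chosen only for illustration.
A, B, C = 30, 24, 20
print(estimate_total(A, B, C))            # 36.0
print(estimate_detection_rates(A, B, C))  # roughly (0.833, 0.667)
```

With these made-up counts, the estimated rates are consistent with the estimate of M: 0.833 · 36 ≈ 30 = A and 0.667 · 36 ≈ 24 = B.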
From here, the number U of unnoticed typos can be estimated as
U = M - (A + B - C)
  ≈ AB/C - (A + B - C)
  = (A - C)(B - C) / C.
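As a sanity check, the estimate of U can be compared with a small simulation. The sketch below is not part of the original argument: the names `simulate` and `estimate_unnoticed`, the parameter values (M = 1000, p = 0.8, q = 0.7), and the fixed seed are all assumptions chosen for illustration. Each typo is treated as an independent Bernoulli trial for each proofreader, as in the model described earlier.

```python
import random


def estimate_unnoticed(a: int, b: int, c: int) -> float:
    """Estimate the number of unnoticed typos as (A - C)(B - C)/C."""
    if c == 0:
        raise ValueError("C must be positive for the estimate")
    return (a - c) * (b - c) / c


def simulate(m: int, p: float, q: float, seed: int = 0) -> tuple[int, int, int]:
    """Simulate two independent proofreaders on m typos; return (A, B, C)."""
    rng = random.Random(seed)
    a = b = c = 0
    for _ in range(m):
        first = rng.random() < p    # first proofreader catches this typo
        second = rng.random() < q   # second proofreader catches this typo
        a += int(first)
        b += int(second)
        c += int(first and second)
    return a, b, c


# Hypothetical parameters, chosen only for illustration.
M_TRUE, P, Q = 1000, 0.8, 0.7
A, B, C = simulate(M_TRUE, P, Q)
true_unnoticed = M_TRUE - (A + B - C)
print("A, B, C:", A, B, C)
print("actually unnoticed:", true_unnoticed)
print("estimated unnoticed:", estimate_unnoticed(A, B, C))
```

Under these assumptions the expected number of unnoticed typos is M(1 - p)(1 - q) = 60, so both printed values should land in that neighborhood, although any single run will deviate somewhat.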
Perhaps curiously, the result does not include M, the total number of errors. This is only because the latter has been assumed to be large. (This is the context in which the problem has been mentioned in the second edition of the very entertaining book by P. Nahin.)
According to a 1976 editorial (Monthly, p. 801),
The article Probabilities in Proofreading by G. Polya (this Monthly, 83 (1976) 42) has stimulated a lot of reader response. V. N. Murty has informed us that the estimate obtained by G. Polya for the number of unnoticed misprints was obtained by Edward Deming and Chandra Sekhar (J. Amer. Stat. Assoc., 44 (1949) 101-15) and that demographers use it to estimate vital events. Ralph Winter writes that if the number C of misprints noticed by both proofreaders is 0, then Polya's estimate is undefined. He then adds that if most probably numbers (rather than expected numbers) are used, the estimate becomes
(I should mention that the problem already appears in the second edition of Feller's book, where he comments that the approach was used by Ernest Rutherford to count scintillations, which means it was known in the 1920s-30s.)
References
- W. Feller, An Introduction to Probability Theory and Its Applications, Vol.1, John Wiley & Sons; 2nd edition (1958)
- P. Nahin, Duelling Idiots and Other Probability Puzzlers, Princeton University Press, 2000
- G. Polya, Probabilities in Proofreading, Amer. Math. Monthly, 83, no. 1 (Jan. 1976), p. 42
Copyright © 1996-2018 Alexander Bogomolny