Benford's Law and Zipf's Law
Given the eerie but uniform distribution of digits in randomly selected numbers, it comes as a great surprise that, if the numbers under investigation are not entirely random but somehow socially or naturally related, the distribution of the first digit is not uniform. More precisely, digit D appears as the first digit with a frequency proportional to $\log_{10}(1 + 1/D)$.
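As a quick numerical check, the sketch below tabulates the first-digit frequencies under the standard statement of Benford's Law, log10(1 + 1/D). The function name `benford_probability` is an illustrative choice, not something taken from the sources cited on this page.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1 through 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Tabulate the nine frequencies; they add up to log10(10) = 1.
benford = {d: benford_probability(d) for d in range(1, 10)}
for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```

Digit 1 comes out at about 30.1% and digit 9 at about 4.6%. Because the nine values sum to 1, they can be read directly as frequencies.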
The law was discovered in 1881 by the American astronomer Simon Newcomb, who noticed that the first pages of books of logarithms were soiled much more than the remaining pages. In 1938, Frank Benford arrived at the same formula after a comprehensive investigation of listings of data covering a variety of natural phenomena. (Benford's original data table can be found on Eric Weisstein's Treasure Troves of Mathematics - Benford's Law page.) The law applies to budget, income tax or population figures as well as to street addresses of people listed in the book American Men of Science. In the face of such universality, it's quite astonishing that there exists a more general framework, Zipf's Law, which, in turn, falls under the more general rubric of scaling phenomena.
Strange as it sounds, Benford's Law may be explained from first principles, chief among which is the sheer universality of mathematics. Budget data gathered from the yearly reports of a thousand corporations may appear random at first glance, but it is also quite reasonable to assume that corporate budgets depend on a few parameters: the size of the corporation, the particular industry it belongs to, the quality of its management, the state of the market. The size of a river basin is a function of the river's depth and breadth. Most such dependencies are expressed more or less accurately by simple formulas: linear, power or exponential, oscillating, leading to saturation.
Functional dependencies abound. This is why studying Calculus makes so much sense. But the notion of function does not reduce to a functional relation. A function is a collection of three attributes: domain, range, and a particular form of dependency of elements from the range on elements from the domain. The same functional dependency expressed by the formula
For example, statistical data on river basins will probably not include streams below a certain size. Small streams are called brooks, and socially (picnic areas vs. beach fronts), politically (brooks that dry up in summer can't serve as good state borders), militarily (troops just wade across without special training), and ecologically (spawning place for mosquitoes vs. salmon) they play roles different from those of rivers. At the other extreme, there is a natural upper limit on the breadth of a water conduit, beyond which it's more reasonable to talk of lakes, bays, seas, or oceans. Similarly, population data may skip a hamlet of 17 households at the lower end (if, for example, it was gathered by a cable company), and, of course, at the upper end, there are the facts on the ground: only a very limited number of super cities have populations in the millions.
With this in mind, it's natural that statistical data for a phenomenon that obeys one of the power laws (b) is biased towards the lower part of the range, whereas that for a phenomenon with saturation (d) tends to be biased towards the upper part of the range.
Mark Nigrini of Southern Methodist University, who in recent years pioneered the application of Benford's Law to the detection of tax evasion and other fraud, offers an example from the stock market. (See his book Digital Analysis Using Benford's Law: Tests & Statistics for Auditors.) Assume that, in a bull period, a market average indicator starts at $1,000 and grows 20% a year. For the next 10 years, we'll get the following statistical data on the market indicator:
With 40% of the data starting with digit 1.
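The table of yearly values is not reproduced here, but the arithmetic is easy to redo. The sketch below assumes that the ten data points are the starting value of $1,000 followed by nine years of 20% growth. That reading is an assumption, chosen because it reproduces the 40% figure quoted above. The variable names are illustrative.

```python
def leading_digit(x: float) -> int:
    """First significant digit of a number that is at least 1."""
    return int(str(int(x))[0])

start, growth, points = 1000.0, 1.20, 10  # assumed reading of the example

# Starting value plus nine years of compound growth.
values = [start * growth ** year for year in range(points)]
digits = [leading_digit(v) for v in values]

for year, (v, d) in enumerate(zip(values, digits)):
    print(f"year {year}: {v:9.2f}  first digit {d}")

share_of_ones = digits.count(1) / len(digits)
print(f"share starting with 1: {share_of_ones:.0%}")
```

The indicator spends four of the ten data points between 1,000 and 2,000 (1000, 1200, 1440, 1728), because at a fixed growth rate it takes longer to double from 1,000 than to climb from, say, 4,000 to 5,000.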
For further investigation, here's a short list of Internet resources devoted to Benford's Law and, further down the page, two book excerpts describing Zipf's Law and Zipf's personality.
- M. Nigrini, Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection, Wiley, 2012 (a companion web site)
- Benford's Law page, Eric Weisstein
- Benford's Law, Kevin Brown
- BENFORD ONLINE BIBLIOGRAPHY by Arno Berger and Ted Hill
- Terry Tao's take
This is an excerpt from
The Quark and the Jaguar
by Murray Gell-Mann, Freeman & Co, 1994
... Often, however, we encounter less than ideal cases. We may find regularities, predict that similar regularities will occur elsewhere, discover that the prediction is confirmed, and thus identify a robust pattern: however, it may be a pattern for which the explanation continues to elude us. In such a case we speak of an "empirical" or "phenomenological" theory, using fancy words to mean basically that we see what is going on but do not yet understand it. There are many such empirical theories that connect together facts encountered in everyday life.
Suppose we pick up a book of statistical facts, like the World Almanac. Looking inside, we find a list of U.S. metropolitan areas in order of decreasing population, together with the population figures. There may also be corresponding lists for the cities in individual states and in other countries. In each list every city can be assigned a rank, equal to 1 for the most populous city, 2 for the next most populous, and so on. Is there a general rule for all these lists that describes how the population decreases as the rank increases? Roughly speaking, yes. With fair accuracy, the population is inversely proportional to the rank; in other words, the successive populations are roughly proportional to 1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9, 1/10, 1/11, and so on.
Now let us look at the list of the largest business firms in decreasing order of volume of business (say the monetary value of sales during a given year). Is there an approximate rule that describes how the sales figures of the firms vary with their ranks? Yes, and it is the same rule as for populations. The volume of business is approximately in inverse proportion to the rank of the firm.
How about the exports from a given country in a given year in decreasing order of monetary value? Again, we find the same rule is a fair approximation.
An interesting consequence of that rule is easily verified by perusing any of the lists mentioned, for example a list of cities with their populations. First let us look at, say, the third digit of each population figure. As expected, the third digit is randomly distributed; the numbers of 0s, 1s, 2s, 3s, etc. in the third place are all roughly equal. A totally different situation obtains for the distribution of first digits, however. There is an overwhelming preponderance of 1s, followed by 2s, and so forth. The percentage of population figures with initial 9s is extremely small. That behavior of the first digit is predicted by the rule, which, if exactly obeyed, would give a proportion of initial 1s to initial 9s of 45 to 1.
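The 45 to 1 figure can be checked with a short count. If the values are exactly C/n, then within each decade the ranks n whose value begins with digit d fill an interval of length proportional to 1/d - 1/(d+1) = 1/(d(d+1)), which gives 1/2 for d = 1 against 1/90 for d = 9. The sketch below counts first digits for a hypothetical exact Zipf list; the scale of 10,000,000 and the list length of 1000 are arbitrary illustrative choices, not values from the excerpt.

```python
from collections import Counter

def first_digit_counts(scale: int, ranks: int) -> Counter:
    """Count leading digits of scale/n for n = 1..ranks (an exact Zipf list)."""
    # Integer division leaves the leading digit unchanged and avoids
    # floating-point rounding; it requires ranks <= scale.
    return Counter(int(str(scale // n)[0]) for n in range(1, ranks + 1))

counts = first_digit_counts(scale=10_000_000, ranks=1000)
for digit in range(1, 10):
    print(digit, counts[digit])

print("ratio of initial 1s to initial 9s:", counts[1] / counts[9])
```

With these settings the list contains 556 values beginning with 1 and 12 beginning with 9, a ratio of about 46 to 1. The exact ratio for a finite list depends on where the list is cut off.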
|Rank n||City||Population||Zipf's law: 10,000,000 divided by n||Modified Zipf's law: 5,000,000 divided by (n - 2/5)^(3/4)|
|31||Kansas City, Mo.||434,829||322,581||384,308|
|37||Virginia Beach, Va.||393,089||270,270||336,015|
|73||Baton Rouge, La.||219,531||136,986||201,033|
Populations of U.S. cities from the 1994 World Almanac compared with Zipf's original law and a modified version of it.
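The two fitted columns can be recomputed from the rank alone. In the sketch below the scale constants 10,000,000 and 5,000,000 are assumptions inferred from the surviving table values (for example, 322,581 × 31 is very nearly 10,000,000), and the function names are illustrative.

```python
def zipf(rank: int, scale: float = 10_000_000) -> float:
    """Zipf's original law: value inversely proportional to rank."""
    return scale / rank

def modified_zipf(rank: int, scale: float = 5_000_000,
                  shift: float = -2 / 5, power: float = 3 / 4) -> float:
    """Modified law: a constant added to the rank, and a power other than 1."""
    return scale / (rank + shift) ** power

# (rank, city, population) for the three rows shown in the table.
rows = [
    (31, "Kansas City, Mo.", 434_829),
    (37, "Virginia Beach, Va.", 393_089),
    (73, "Baton Rouge, La.", 219_531),
]

for rank, city, population in rows:
    print(f"{rank:3d} {city:20s} {population:9,d} "
          f"{zipf(rank):11,.0f} {modified_zipf(rank):11,.0f}")
```

For all three cities the modified law lands closer to the actual population than the original law does, which is the point the table is meant to make.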
What if we put down the World Almanac and pick up a book on secret codes, containing a list of the most common words in a certain kind of English text arranged in decreasing order of frequency of occurrence? What is the approximate rule for the frequency of occurrence of each word as a function of its rank? Again, we encounter the same rule, which works for other languages as well.
Many of these relationships were noticed in the early 1930s by a certain George Kingsley Zipf, who taught German at Harvard, and they are all aspects of what is now called Zipf's law. Today, we would say that Zipf's law is one of many examples of so-called scaling laws or power laws, encountered in many places in the physical, biological, and behavioral sciences. But in the 1930s such laws were still something of a novelty.
In Zipf's law the quantity under study is inversely proportional to the rank, that is, proportional to 1, 1/2, 1/3, 1/4, etc. Benoit Mandelbrot has shown that a more general power law (nearly the most general) is obtained by subjecting this sequence successively to two kinds of modification. The first alteration is to add a constant to the rank, giving 1/(1 + constant), 1/(2 + constant), 1/(3 + constant), 1/(4 + constant), etc. The further change allows, instead of these fractions, their squares or their cubes or their square roots or any other powers of them. The choice of the squares, for instance, would yield the sequence 1/(1 + constant)², 1/(2 + constant)², 1/(3 + constant)², 1/(4 + constant)², etc. The power in the more general power law is 1 for Zipf's law, 2 for the squares, 3 for the cubes, 1/2 for the square roots, and so on. Mathematics gives a meaning to intermediate values of the power as well, such as 3/4 or 1.0237. In general, we can think of the power as 1 plus a second constant. Just as the first constant was added to the rank, so the second one is added to the power. Zipf's law is then the special case in which those two constants are zero.
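In symbols, the generalization described in this paragraph can be written as follows. The letters are notation introduced here for illustration and do not appear in the excerpt: n is the rank, f(n) the quantity under study, a the constant added to the rank, and b the constant added to the power.

$$
f(n) \;\propto\; \frac{1}{(n + a)^{\,1 + b}}, \qquad a = b = 0 \ \text{ gives Zipf's law } f(n) \propto \frac{1}{n}.
$$

In this notation, the modified law used in the population table, with values divided by (n - 2/5)^(3/4), corresponds to a = -2/5 and b = -1/4.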
Mandelbrot's generalization of Zipf's law is still very simple: the additional complexity lies only in the introduction of the two new adjustable constants, a number added to the rank and a number added to the power 1. (An adjustable constant, by the way, is called a "parameter," a word that has been widely misused lately, perhaps under the influence of the somewhat similar word "perimeter." The modified power law has two additional parameters.) In any given case, instead of comparing data with Zipf's original law, one can introduce those two constants and adjust them for an optimal fit to the data. We can see in the chart on page 94 how a slightly modified version of Zipf's law fits some population data significantly better than Zipf's original rule (with both constants set equal to zero), which already works fairly well. "Slightly modified" means that the new constants have rather small values in the altered power law used for the comparison. (The constants in the chart were chosen by mere inspection of the data. An optimal fit would have yielded even better agreement with the actual populations.)
When Zipf first described his law, at a time when very few other scaling laws were known, he tried to make an important issue of how his principle distinguished the behavioral from the physical sciences, where such laws were supposedly absent. Today, after so many power laws have been discovered in physics, those remarks tend to detract from Zipf's reputation rather than enhance it. Another circumstance is said to have worked against his reputation as well, namely that he indicated a certain sympathy with Hitler's territorial rearrangements of Europe, perhaps justifying his attitude by arguing that those conquests tended to make the populations of European countries conform more closely to Zipf's law.
This is an excerpt from
The Fractal Geometry of Nature
by Benoit Mandelbrot, Freeman & Co, 1983
GEORGE KINGSLEY ZIPF
Zipf, an American scholar, started as a philologist but came to describe himself as a statistical human ecologist. He was for twenty years a Lecturer at Harvard, and died just after having published, apparently at his own expense, Human Behavior and the Principle of Least Effort, (Zipf 1949-1965).
This is one of those books (Fournier 1907 is another) in which flashes of genius, projected in many directions, are nearly overwhelmed by a gangue of wild notions and extravagance. On the one hand, it deals with the shape of sexual organs and justifies the Anschluss of Austria into Germany because it improved the fit of a mathematical formula. On the other hand, it is filled with figures and tables that hammer away ceaselessly at the empirical law that, in social science statistics, the best combination of mathematical convenience and empirical fit is often given by a scaling probability distribution. Some examples are studied in Chapter 38.
Natural scientists recognize in "Zipf's laws" the counterparts of the scaling laws which physics and astronomy accept with no extraordinary emotion - when evidence points out their validity. Therefore physicists would find it hard to imagine the fierceness of the opposition when Zipf - and Pareto before him - followed the same procedure, with the same outcome, in the social sciences. The most diverse attempts continue to be made, to discredit in advance all evidence based on the use of doubly logarithmic graphs. But I think this method would have remained uncontroversial, were it not for the nature of the conclusion to which it leads. Unfortunately, a straight doubly logarithmic graph indicates a distribution that flies in the face of the Gaussian dogma, which long ruled uncontested. The failure of applied statisticians and social scientists to heed Zipf helps account for the striking backwardness of their fields.
Zipf brought encyclopedic fervor to collecting examples of hyperbolic laws in social sciences, and unyielding stamina to defending his findings and analogous findings by others. However, the present Essay makes it obvious that his basic belief was without merit. It is not true that frequency distributions are always hyperbolic in the social sciences, and always Gaussian in the natural sciences. An even more serious failing was that Zipf tied his findings together with empty verbal argument, and came nowhere close to integrating them into a body of thought.
At a critical point in my life (Chapter 42), I read a wise review of Human Behavior by the mathematician J. L. Walsh. By only mentioning what was good, this review influenced greatly my early scientific work, and its indirect influence continues. Therefore, I owe a great deal to Zipf through Walsh.
Otherwise Zipf's influence is likely to remain marginal. One sees in him, in the clearest fashion - even in caricature - the extraordinary difficulties that surround any interdisciplinary approach.
W. Li from Rockefeller University gathered a comprehensive bibliography on Zipf's Law.