 Subject: Probability in integer sequences
Date: Mon, 15 Nov 2000 09:50:53 -0500
From: Bernd Liebermann

Hello,

A few years ago I developed a simple characteristic value for the trend of a series of ordinal measurements. It expresses, on a scale from -1 to +1, whether the series tends to decrease or increase. It is computed by taking each number, starting with the second, and comparing it to all previous numbers. The number of negative comparisons (x(i) is smaller than a previous number) is then subtracted from the number of positive comparisons (x(i) is greater than a previous number), and this difference is divided by the total number of comparisons. If the absolute value of this ratio exceeds a certain threshold of significance, the change in the measurements can be said to be most probably due to a real underlying trend rather than to measurement errors.
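
The computation described above can be sketched in Python as follows. This is only an illustration: the function name trend_ratio is made up here, and the sketch assumes that a comparison between two equal values counts towards the total but towards neither the positive nor the negative side.

```python
from fractions import Fraction

def trend_ratio(xs):
    """Return (greater - smaller) / comparisons as an exact fraction."""
    greater = smaller = total = 0
    for i in range(1, len(xs)):      # start with the second number
        for j in range(i):           # compare it with every previous number
            total += 1
            if xs[i] > xs[j]:
                greater += 1
            elif xs[i] < xs[j]:
                smaller += 1
            # Assumption: equal values add to the total only.
    if total == 0:
        raise ValueError("need at least two measurements")
    return Fraction(greater - smaller, total)

print(trend_ratio([1, 2, 3]))   # 1
print(trend_ratio([3, 2, 2]))   # -2/3
```

Exact fractions are used so that values such as -2/3 are not blurred by floating-point rounding.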

To determine these thresholds I ran computer simulations with random integer sequences and so obtained the standard deviations of the trend ratio distributions as a function of the length of the sequence and the range of the values. As a first step that is fine for practical purposes, but it is theoretically unsatisfying.
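
A simulation of this kind might look roughly like the Python sketch below. It is not the original program: the names trend_ratio and simulated_sd, the number of trials and the seed are all invented for illustration, and the sketch assumes that the integers are drawn independently and uniformly from 1 to R and that the sequence has at least two values.

```python
import random
import statistics

def trend_ratio(xs):
    """(greater - smaller) / comparisons; ties count in the total only."""
    n = len(xs)
    diff = sum((xs[i] > xs[j]) - (xs[i] < xs[j])
               for i in range(1, n) for j in range(i))
    return diff / (n * (n - 1) // 2)

def simulated_sd(length, value_range, trials=20000, seed=0):
    """Estimate the standard deviation of the trend ratio by random sampling."""
    rng = random.Random(seed)        # fixed seed so that a run is repeatable
    ratios = [
        trend_ratio([rng.randint(1, value_range) for _ in range(length)])
        for _ in range(trials)
    ]
    return statistics.pstdev(ratios)

print(simulated_sd(length=3, value_range=3))
```

The estimate carries sampling noise, so it only approximates the true standard deviation, which is why an exact approach remains desirable.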

I tried an exact computational approach to determine the probability of a given trend ratio under random conditions, but got stuck and am now looking for good ideas from others. Let me tell you what I think I have understood so far.

First, consider the simple case of random sequences with range R=3 and length N=3, e.g. 1, 2, 3 or 3, 2, 2. Each of the R^N=27 possible sequences results in a trend ratio of TR = -1, -2/3, -1/3, 0, 1/3, 2/3 or 1. As the distribution is symmetric and the expected value is 0, we can disregard the sign. Thus, the variance of the distribution is p(|TR| = 1/3) * (1/3)^2 + p(|TR| = 2/3) * (2/3)^2 + p(|TR| = 1) * 1^2. By simply counting you get

Var(TR) = (4/27 * 1/9) + (12/27 * 4/9) + (2/27 * 9/9) = 70/243 = 0.288...
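
The counting for R=3 and N=3 can be checked by brute force. The Python sketch below lists all 27 sequences, tallies the absolute trend ratios and averages their squares directly. The variable names are chosen here for illustration only, and ties are assumed to count in the divisor but not in the difference.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

R, N = 3, 3
comparisons = N * (N - 1) // 2       # 3 comparisons for N = 3

counts = Counter()                   # |TR| -> number of sequences
for xs in product(range(1, R + 1), repeat=N):
    diff = sum((xs[i] > xs[j]) - (xs[i] < xs[j])
               for i in range(1, N) for j in range(i))
    counts[Fraction(abs(diff), comparisons)] += 1

# The mean of TR is 0, so the variance is the mean of TR^2 over all sequences.
variance = sum(tr ** 2 * Fraction(n, R ** N) for tr, n in counts.items())

for tr in sorted(counts):
    print(tr, counts[tr])            # 0 9, 1/3 4, 2/3 12, 1 2
print(variance)                      # 70/243
```

The tallies 4, 12 and 2 for |TR| = 1/3, 2/3 and 1 match the numerators used above. The remaining 9 sequences have TR = 0.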

The divisor 27 in each product comes from R^N, which is clear. The divisor 9 is the square of (N-1) + (N-2) = 2 + 1 = 3, which is the number of possible absolute values of TR other than zero and is, of course, equal to the total number of comparisons.

So, in general:

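Var(TR) = Sum{(Fi / R^N) * (i^2 / Sum{j}^2)} .

with j running from 1 to N-1 and i from 1 to Sum{j} = N(N-1)/2, the total number of comparisons (which equals N only in the case N=3 above). Fi is the number of sequences whose positive and negative comparisons differ by i in absolute value.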
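
Once the frequencies are known, the formula is straightforward to evaluate. The Python sketch below takes a table of Fi values and uses Sum{j} = N(N-1)/2 as the divisor. The function name variance_from_frequencies and its parameter names are hypothetical.

```python
from fractions import Fraction

def variance_from_frequencies(freq, value_range, length):
    """freq maps an absolute difference i to the number F_i of sequences with it."""
    comparisons = length * (length - 1) // 2     # Sum{j} for j = 1 .. N-1
    total = value_range ** length                # R^N
    return sum(Fraction(f, total) * Fraction(i, comparisons) ** 2
               for i, f in freq.items())

# Frequencies counted above for R = 3, N = 3.
print(variance_from_frequencies({1: 4, 2: 12, 3: 2}, value_range=3, length=3))  # 70/243
```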

What I am looking for now are the values of F, in other words the frequency of each absolute TR when you build all R^N possible sequences for given R and N.

The problem reduces to the question: Given a sequence of N random integers between 1 and R, if you compare each number to all its predecessors, what is the probability p(x) that in x of the comparisons the successor turns out to be greater than the predecessor?
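
For small R and N this probability can at least be tabulated by brute force, which gives reference values to test any closed formula against. The Python sketch below reads p(x) as the probability that exactly x of the N(N-1)/2 comparisons come out "greater". That reading, the assumption of uniformly drawn integers and the name greater_count_distribution are assumptions of this sketch and not a solution to the general problem.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def greater_count_distribution(value_range, length):
    """Map x -> probability that exactly x comparisons come out 'greater'."""
    counts = Counter()
    for xs in product(range(1, value_range + 1), repeat=length):
        greater = sum(xs[i] > xs[j]
                      for i in range(1, length) for j in range(i))
        counts[greater] += 1
    total = value_range ** length    # all sequences are taken as equally likely
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}

for x, p in greater_count_distribution(3, 3).items():
    print(x, p)                      # 0 10/27, 1 8/27, 2 8/27, 3 1/27
```

The running time grows as R^N, so this is only practical for short sequences and small ranges.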

I have tortured my mind, but I still haven't found the solution. Maybe it's easy. Maybe you know it?!

Best regards,
Bernd Liebermann 