Constructive Existence of the de Bruijn Cycles

There are nk strings of length k written with n distinct symbols, such as, e.g., 0, 1, 2, ..., n-1. We shall call such strings (n, k)-sequences.

de Bruijn cycle, say dBC(n, k), is a sequence written on a circle so that each of the nk (n, k)-sequences appears in it as a subsequence exactly once.

Elsewhere the existence of dBC(n, k), for all n, k > 0, has been shown by modeling the sequences on a graph. Here we shall formulate an algorithm for forming de Bruijn sequences and prove that this is what it does. The algorithm has been implemented as a Java applet.

Algorithm

  1. Write down k zeros.
  2. Append to the sequence S already generated the largest symbol possible such that the newly formed (n, k)-sequence (i.e., the last k symbols of S) has not already appeared. Repeat when possible.
  3. Remove the last k - 1 symbols of the sequence.

The algorithm falls into the category of "greedy algorithms" for, on every step, it makes an extreme choice among the existing possibilities. It is worth noting that the algorithm is of the sort produced by an Euler cycle on the de Bruijn graph, for the given n and k. The difference is in that now we employ the greedy approach - and this in the hope that the algorithm will eventually lead to an Euler cycle.

Note that starting with k zeros is a spurious requirement; for, we are forming (or at least expect to form) an Euler cycle that could be created starting at any position. Also, there are other ways to express the greediness of the approach. We could form any permutation of the k symbols, which would induce a certain order among the symbols. Then, on step 2 choose the first (according to the new order) still available symbol. The algorithm - as presented - uses the sequence of n symbols in the reverse order of magnitudes.

In any event, what we need to show is that the sequence constructed by the algorithm is of length nk, while when the algorithm leaves step 2, the length of the string is nk + k - 1.

So, for the proof, assume that the algorithm slipped to step 3 with the string ending in k symbols abc...de. This means that the string contains at least n + 1 occurrences of bc...de, the first n followed by one of the given n symbols ( a different one each time). On the other hand, the n + 1 occurrences of bc...de must have been preceded by n + 1 (unless of course that is the sequence we started with). But this would contradict the fact that there are n symbols an all. But if bc...de is the sequence we started with, it may be removed (Step 3).

What remains? First of all, since we removed the k - 1 symbols that me started with, the string that remained could be "folded" into a cycle. As such, its construction could have started at any point whatsoever, provided the order of the n symbols has been adjusted accordingly (by a cycle permutation.) This exactly means that any sequence of k - 1 symbols appears exactly n times in the final string, meaning that every sequence of length k appears exactly once. Thus the length of the output string is indeed k.

References

  1. S. B. Maurer, A. Ralston, Discrete Algorithmic Mathematics, A K Peters, 3rd edition, 2004.
  2. A. Ralston, De Bruijn Sequences: A Model Example of the Interaction of Discrete Mathematics and Computer Science, Mathematics Magazine, Vol. 55, No. 3 (May, 1982), pp. 131-143

de Bruijn Cycles

|Contact| |Front page| |Contents| |Algebra|

Copyright © 1996-2018 Alexander Bogomolny

71923573