Descartes' Rule of Signs

Scott E. Brodie

1/1/99

In Descartes' revolutionary work, La Geometrie, as the discussion turns to the roots of polynomial equations, we find, without hint of a proof, the statement:

On connoift auffy de cecy combien il peut y auoir de vrayes racines, & combien de fauffes en chafque Equation. A fçauoir il y en peut auoir autant de vrayes, que les fignes + & -- s'y trouuent de fois eftre changés; & autant de fauffes qu'il s'y trouue de fois deux fignes +, ou deux fignes -- quie s'entrefuiuent.

"We can determine also the number of true and false roots that any equation can have, as follows: An equation can have as many true roots as it contains changes of sign, from + to - or from - to +; and as many false roots as the number of times two + signs or two - signs are found in succession."

(By the term "true roots" Descartes means "positive roots"; by "false roots" he means "negative roots".)

This statement is the basis of the attribution to Descartes of the proposition now known as "Descartes' Rule of Signs":

The number of positive roots of a polynomial with real coefficients is equal to the number of "changes of sign" in the list of coefficients, or is less than this number by a multiple of 2. [Since the negative roots of the polynomial equation f(x) = 0 are positive roots of the equation f(-x) = 0, the rule can be readily applied to help count the negative roots as well.]

Descartes' Rule is a staple of high-school algebra courses, but a proof is seldom seen, even at the college level -- this sort of material from the "theory of equations" being seriously out of fashion these days.

Before attempting a proof, it may be helpful to examine some simple cases:

If f(x) is a linear polynomial (that is, of degree 1), say, f(x) = mx + b then the only root of f(x) is x = -b / m , which is positive only when m and b are of opposite sign.

If f(x) is a quadratic polynomial (that is, of degree 2), it is convenient to divide by the coefficient of the x² term, so as to obtain a new polynomial with leading coefficient equal to 1. This polynomial, say f(x) = x² + bx + c , shares the same roots, and the same pattern of variations in the signs of the coefficients, as the original polynomial. Suppose we have identified the two roots, say r and s. Then we may express f(x) in factored form:

f(x)	= ( x - r )( x - s )
	= x² - ( r + s )x + rs.

Compare coefficients b = - ( r + s ) and c = rs. Thus c is positive only when r and s are of the same sign, that is, whenever f(x) has two positive roots or two negative roots. But if c and x are both positive, the only way the expression f(x) = x² + bx + c can equal zero is for b to be negative (in which case there are two variations in sign). Of course, according to the quadratic formula, the two roots are given by the formula

x = (-b ± √b² - 4ac) / 2a,

And, if c is large enough, there will be no real roots at all. Thus, if f(x) has two variations in sign, there will be either two positive real roots, or none at all. On the other hand, if c is negative, there will be one variation in sign (regardless of whether b is positive or negative), and there will be two real roots. Since c = rs, the roots will be of opposite sign - that is, there will be exactly one positive root.

These ad hoc arguments verify Descartes' Rule of Signs for linear and quadratic polynomials. Of course, it would not be possible to proceed much further in similar fashion - the formulas for the roots of cubic and quartic polynomials are unwieldy in the extreme, and there are no analogous formulas for the roots of polynomials of higher degree. In order to treat the general case, a more systematic approach is needed.

For further reference, if f(x) is a polynomial, let's denote the number of variations in sign of its coefficients by V[f(x)], and denote the number of positive real roots of f(x) by P[f(x)].

As in the quadratic case, to simplify matters, we stipulate that in any polynomial f(x), the leading coefficient (the coefficient of the highest power of the variable x in the polynomial) is never zero. If we divide the entire polynomial by this leading coefficient, we get a new polynomial with leading coefficient equal to 1. (Such a polynomial is called monic.) The new monic polynomial will have the same roots, and the same pattern of variations in the signs of the coefficients, as did the original polynomial f(x). Thus, without loss of generality, we may assume our polynomial is monic.

Similarly, observe that if the "constant term" a₀ of f(x) is zero, we may divide the polynomial f(x) by some positive power x^m of x to obtain another polynomial with non-zero constant term, with the same positive roots and the same variations in sign as f(x), since a factor of the form x^m contributes neither positive roots nor variations in sign to f(x). Thus, "without loss of generality", we may assume that the constant term a₀ in our polynomial is not zero.

Thus we may denote our polynomial in the form:

f(x) = xⁿ + a_n-1 x^n-1 + ... + a₁x + a₀

Now it is time to start counting. Begin with a simple observation:

I. Let f(x) = xⁿ + a_n-1 x^n-1 + ... + a₁x + a₀. If a₀ < 0, then V[f(x)] is odd; if a₀ > 0, then V[f(x)] is even.

This is an example of a basic property of two-state systems: If you change their state an odd number of times, you end up in the opposite state from the state with which you started; if you change state an even number of times, you recover the original state.

Amazingly, we can make a similar statement about the positive roots of f(x):

II. Let f(x) = xⁿ + a_n-1 x^n-1 + ... + a₁x + a₀. If a₀ < 0, then P[f(x)] is odd; if a₀ > 0, then P[f(x)] is even.

We can give a simple proof by induction as follows:

We have already observed that this observation holds for polynomials of degree 1 and of degree 2.

Now suppose that II holds for polynomials of degree less than k. Consider a polynomial f(x) = x^k + a_k-1 x^k-1 + ... + a₁x + a₀, of degree k.

Suppose first that a₀ < 0. Then f(0) = a₀ < 0, while, for large enough values of x, the x^k term of the polynomial will dominate the others, so that for some positive value of x, f(x) > 0. Since every polynomial is a continuous function, we may apply Bolzano's Theorem ("If f(x) is a continuous real-valued function on the closed interval [a,b], with f(a) < 0 < f(b), then there exists a real number c, between a and b, such that f(c) = 0.") to conclude that f(x) has a positive root, say at x = p. Then (x - p) divides f(x), and we may set f(x) / (x - p) = g(x), where g(x) is a monic polynomial of degree k - 1. Indeed, the constant term, say, b₀, of g(x), must be positive (since a₀= - pb₀ < 0). Applying the induction hypothesis, we know that P[g (x)] is even. But

P[f(x)] = P[g(x) (x - p)]

= P[g(x)] + 1,

so P[f(x)] is odd.

On the other hand, if a₀ > 0, we must consider two possibilities. If P[f(x)] = 0, then II holds, as 0 is an even number. If P[f(x)] > 0, then f(x) has a positive root, say at x = p. Reasoning as in the previous case, the constant term of g(x) = f(x) / (x - p) must be negative, and thus by the induction hypothesis, P[g (x)] must be odd, and P[f(x)] must be even.

We may combine I and II in the form

III. Let f(x) = xⁿ + a_n-1 x^n-1 + ... + a₁x + a₀. Then V[f(x)] and P[f(x)] are both even or both odd - that is, V[f(x)] and P[f(x)] differ by an integer multiple of 2.

The final piece of Descartes' Rule follows from the next observation:

IV. Let f(x) = xⁿ + a_n-1 x^n-1 + ... + a₁x + a₀, and suppose p > 0. Then V[f(x)(x - p)] > V[f(x)].

For simplicity, suppose for now that none of the coefficients a_i is equal to zero - that is, none of the terms of f(x) is "missing". We need only keep track of the sequence of signs of the coefficients, say, + + - - - +. In this skeletal notation, we lay out the multiplication of f(x) by (x - p) in the manner of ordinary two-digit multiplication:

+ + - - - +

+ -

+ + - - - +

- - + + + -

+ ? - ? ? + -

In the final product polynomial, the leading coefficient, that is, the coefficient of xⁿ⁺¹, is positive (in fact, it is equal to 1); the constant term is of opposite sign from a₀, the constant term of f(x). In some instances, the sign of the remaining terms will depend on the actual values of the number p and the coefficients a_k. The signs of these terms have been indicated by a "?" in the example. However, wherever there is a variation in sign in the original polynomial (as from the second to the third term in the example), the sign of the corresponding term in the product is unambiguously determined: it is necessarily the same as that of the term at the head of the column in the diagram, and opposite to the sign of the preceding unambiguously determined term. Evidently, each variation in sign of f(x) induces a corresponding variation in sign in the product.

What of the intervening "?" terms? They do not matter! If there is only one "?" term between terms of known opposite sign, there will be one variation in sign, no matter what sign the "?" represents. If there are more than one "?" terms in succession, there will be at least one, and possibly more, variations in sign between the known terms. (As pointed out in I above, the number of variations in sign between two terms of known opposite sign will necessarily be odd.) Any such additional variations in sign only further increase the number of variations of sign in the product polynomial.

The possibility that some of the coefficients of f(x) are equal to zero adds no important difficulties. In general, we can describe the sequence of signs of such a polynomial in the form, say, + ³ ³ - £ £ £ +, where the variations in sign are due to the "+"s and "-"s, and the intervening "³ " and "£" symbols indicate terms whose coefficients are assumed only to be greater than or equal, or less than or equal, to zero, respectively, so as not to create a variation in sign. As before, our multiplication problem may be depicted schematically:

+ ³ ³ - £ £ £ +

+ -

+ ³ ³ - £ £ £ +

- £ £ + ³ ³ ³ -

+ ? ? - ? ? ? + -

Evidently, each variation in sign from the original polynomial still propagates to the product as before.

In summary, we have V[f(x)(x - p)] ³ V[f(x)].

But the constant term of f(x)(x - p) and the constant term of f(x) are of opposite sign, so by I, exactly one of V[f(x)(x - p)] and V[f(x)] is even, while the other one must be odd. In particular, they cannot be equal. Thus we conclude V[f(x)(x - p)] > V[f(x)].

Now return to our original polynomial, f(x) = xⁿ + a_n-1 x^n-1 + ... + a₁x + a₀. We can express f(x) in factored form as

f(x) = N(x)(x - p₁)(x - p₂) ... (x - p_m),

where N(x) has no positive roots (but may, of course, nevertheless have variations in sign), and the p_i are the m positive roots of f(x), listed repeatedly, if necessary, according to their multiplicity. Applying IV repeatedly, once for each root p_i, we find

V. Let f(x) = xⁿ + a_n-1 x^n-1 + ... + a₁x + a₀. Then V[f(x)] ³ P[f(x)].

Thus, either the number of positive roots is equal to the number of variations in sign, or (by III) is less than the number of variations in sign by an integer multiple of 2, Q.E.D.

References

The Geometry of Rene Descartes with a facsimile of the first edition, translated by D. E. Smith and M.L. Latham, New York, Dover Publications, 1954.
B. E. Meserve, Fundamental Concepts of Algebra, New York, Dover Publications, 1982.

73384807