Descartes’ Rule of Signs
Scott E. Brodie
1/1/99
In Descartes' revolutionary work, La Geometrie, as the discussion turns to the roots of polynomial equations, we find, without hint of a proof, the statement:
| |
On connoift auffy de cecy combien il peut y auoir de vrayes racines, & combien de fauffes en chafque Equation. A fçauoir il y en peut auoir autant de vrayes, que les fignes + & -- s'y trouuent de fois eftre changés; & autant de fauffes qu'il s'y trouue de
fois deux fignes +, ou deux fignes -- quie s'entrefuiuent.
|
| |
"We can determine also the number of true and false roots that any
equation can have, as follows: An equation can have as many true roots
as it contains changes of sign, from + to - or from - to +; and as many
false roots as the number of times two + signs or two - signs are found
in succession."
|
(By the term "true roots" Descartes means "positive roots"; by "false
roots" he means "negative roots".)
This statement is the basis of the attribution to Descartes of the proposition
now known as "Descartes' Rule of Signs":
| |
The number of positive roots of a polynomial with real coefficients is equal to the number of "changes of sign" in the list of coefficients, or is less than this number by a multiple of 2. [Since the negative roots of the polynomial equation f(x) = 0 are positive roots of the equation f(–x) = 0, the rule can be readily applied to help count the negative roots as well.]
|
Descartes' Rule is a staple of high-school algebra courses, but a proof is seldom seen, even at the college level -- this sort of material from the "theory of equations" being seriously out of fashion these days.
Before attempting a proof, it may be helpful to examine some simple cases:
If f(x) is a linear polynomial (that is, of degree 1),
say, f(x) = mx + b then the only root of f(x)
is x = –b / m , which is positive only when m
and b are of opposite sign.
If f(x) is a quadratic polynomial (that is, of degree
2), it is convenient to divide by the coefficient of the x2
term, so as to obtain a new polynomial with leading coefficient equal to
1. This polynomial, say f(x) = x2 + bx
+ c , shares the same roots, and the same pattern of variations
in the signs of the coefficients, as the original polynomial. Suppose we
have identified the two roots, say r and s. Then we may express
f(x) in factored form:
| |
| f(x) | = ( x – r )( x – s ) |
| | = x2 – ( r + s )x + rs. |
|
Compare coefficients b = – ( r + s ) and c = rs.
Thus c is positive only when r and s are of the same
sign, that is, whenever f(x) has two positive roots or two
negative roots. But if c and x are both positive, the only
way the expression f(x) = x2 + bx
+ c can equal zero is for b to be negative (in which case
there are two variations in sign). Of course, according to the quadratic
formula, the two roots are given by the formula
| |
x = (-b ± √b² - 4ac) / 2a,
|
And, if c is large enough, there will be no real roots at all. Thus,
if f(x) has two variations in sign, there will be either
two positive real roots, or none at all. On the other hand, if c
is negative, there will be one variation in sign (regardless of whether
b is positive or negative), and there will be two real roots. Since
c = rs, the roots will be of opposite sign – that is, there will
be exactly one positive root.
These ad hoc arguments verify Descartes’ Rule of Signs for linear
and quadratic polynomials. Of course, it would not be possible to proceed
much further in similar fashion – the formulas for the roots of cubic and
quartic polynomials are unwieldy in the extreme, and there are no analogous
formulas for the roots of polynomials of higher degree. In order to treat
the general case, a more systematic approach is needed.
For further reference, if f(x) is a polynomial, let’s
denote the number of variations in sign of its coefficients by V[f(x)],
and denote the number of positive real roots of f(x) by P[f(x)].
As in the quadratic case, to simplify matters, we stipulate that in any polynomial
f(x), the leading coefficient (the coefficient of the highest
power of the variable x in the polynomial) is never zero. If we
divide the entire polynomial by this leading coefficient, we get a new
polynomial with leading coefficient equal to 1. (Such a polynomial is called
monic.) The new monic polynomial will have the same roots, and the same
pattern of variations in the signs of the coefficients, as did the original
polynomial f(x). Thus, without loss of generality, we may
assume our polynomial is monic.
Similarly, observe that if the "constant term" a0 of f(x)
is zero, we may divide the polynomial f(x) by some positive
power xm of x to obtain another polynomial with
non-zero constant term, with the same positive roots and the same variations
in sign as f(x), since a factor of the form xm
contributes neither positive roots nor variations in sign to f(x).
Thus, "without loss of generality", we may assume that the constant term a0
in our polynomial is not zero.
Thus we may denote our polynomial in the form:
| |
f(x) = xn + an-1 xn-1
+ … + a1x + a0
|
Now it is time to start counting. Begin with a simple observation:
I. Let f(x) = xn + an-1
xn-1 + … + a1x + a0. If a0
< 0, then V[f(x)] is odd; if a0
> 0, then V[f(x)] is even.
This is an example of a basic property of two-state systems: If you
change their state an odd number of times, you end up in the opposite state
from the state with which you started; if you change state an even number
of times, you recover the original state.
Amazingly, we can make a similar statement about the positive roots of
f(x):
II. Let f(x) = xn + an-1
xn-1 + … + a1x + a0. If a0
< 0, then P[f(x)] is odd; if a0
> 0, then P[f(x)] is even.
We can give a simple proof by induction as follows:
We have already observed that this observation holds for polynomials
of degree 1 and of degree 2.
Now suppose that II holds for polynomials of degree less than k.
Consider a polynomial f(x) = xk + ak-1 xk-1 + … + a1x + a0, of degree k.
Suppose first that a0 < 0. Then f(0) = a0 < 0, while, for large enough values of x, the xk
term of the polynomial will dominate the others, so that for some positive
value of x, f(x) > 0. Since every polynomial is a
continuous function, we may apply Bolzano's
Theorem ("If f(x) is a continuous real-valued function
on the closed interval [a,b], with f(a) < 0 < f(b),
then there exists a real number c, between a and b,
such that f(c) = 0.") to conclude that f(x)
has a positive root, say at x = p. Then (x -
p) divides f(x), and we may set f(x) /
(x - p) = g(x), where
g(x) is a monic polynomial of degree k - 1. Indeed, the constant term, say, b0, of g(x),
must be positive (since a0 = -
pb0 < 0). Applying the induction hypothesis, we know
that P[g (x)] is even. But
| |
| P[f(x)] | = P[g(x) (x - p)] |
| | = P[g(x)] + 1, |
|
so P[f(x)] is odd.
On the other hand, if a0 > 0, we must consider two
possibilities. If P[f(x)] = 0, then II
holds, as 0 is an even number. If P[f(x)] > 0, then f(x)
has a positive root, say at x = p. Reasoning as in the previous
case, the constant term of g(x) = f(x) / (x
- p) must be negative, and thus by the induction
hypothesis, P[g (x)] must be odd, and P[f(x)]
must be even.
We may combine I and II in the form
III. Let f(x) = xn +
an-1 xn-1 + … + a1x + a0.
Then V[f(x)] and P[f(x)] are both even or both odd – that is, V[f(x)] and P[f(x)] differ by an integer
multiple of 2.
The final piece of Descartes’ Rule follows from the next observation:
IV. Let f(x) = xn
+ an-1 xn-1 + … + a1x + a0,
and suppose p > 0. Then V[f(x)(x -
p)] > V[f(x)].
For simplicity, suppose for now that none of the coefficients ai is equal to zero – that is, none of the terms of f(x) is "missing". We need only keep track of the sequence of signs of the coefficients,
say, + + – – – +.
In this skeletal notation, we lay out the multiplication of f(x)
by (x - p) in the manner of ordinary
two-digit multiplication:
+ +
– – –
+
+ –
+ +
– – –
+
– –
+ +
+ –
+ ? –
? ? +
–
In the final product polynomial, the leading coefficient, that is, the
coefficient of xn+1, is positive (in fact, it is equal
to 1); the constant term is of opposite sign from a0,
the constant term of f(x). In some instances, the sign of
the remaining terms will depend on the actual values of the number p
and the coefficients ak. The signs of these terms
have been indicated by a "?" in the example. However, wherever there is
a variation in sign in the original polynomial (as from the second to the
third term in the example), the sign of the corresponding term in the product
is unambiguously determined: it is necessarily the same as that of the
term at the head of the column in the diagram, and opposite to the sign
of the preceding unambiguously determined term. Evidently, each variation
in sign of f(x) induces a corresponding variation in sign
in the product.
What of the intervening "?" terms? They do not matter! If there
is only one "?" term between terms of known opposite sign, there will be
one variation in sign, no matter what sign the "?" represents. If there
are more than one "?" terms in succession, there will be at least one,
and possibly more, variations in sign between the known terms. (As pointed
out in I above, the number of variations in sign between
two terms of known opposite sign will necessarily be odd.) Any such
additional variations in sign only further increase the number of variations
of sign in the product polynomial.
The possibility that some of the coefficients of f(x)
are equal to zero adds no important difficulties. In general, we can describe
the sequence of signs of such a polynomial in the form, say, + ³
³ – £
£ £ +,
where the variations in sign are due to the "+"s and "–"s, and the intervening
"³ " and "£
" symbols indicate terms whose coefficients are assumed only to be greater
than or equal, or less than or equal, to zero, respectively, so as not
to create a variation in sign. As before, our multiplication problem may
be depicted schematically:
+ ³
³ –
£ £
£ +
+ –
+ ³
³ –
£ £
£ +
– £
£ +
³ ³
³ –
+ ? ?
– ? ?
? + –
Evidently, each variation in sign from the original polynomial still
propagates to the product as before.
In summary, we have V[f(x)(x -
p)] ³ V[f(x)].
But the constant term of f(x)(x -
p) and the constant term of f(x) are of opposite sign,
so by I, exactly one of V[f(x)(x
- p)] and V[f(x)] is even,
while the other one must be odd. In particular, they cannot be equal. Thus
we conclude V[f(x)(x -
p)] > V[f(x)].
Now return to our original polynomial, f(x) = xn
+ an-1 xn-1 + … + a1x + a0.
We can express f(x) in factored form as
f(x) = N(x)(x -
p1)(x - p2)
… (x - pm),
where N(x) has no positive roots (but may, of course,
nevertheless have variations in sign), and the pi are
the m positive roots of f(x), listed repeatedly, if
necessary, according to their multiplicity. Applying IV
repeatedly, once for each root pi, we find
V. Let f(x) = xn + an-1
xn-1 + … + a1x + a0. Then V[f(x)]
³ P[f(x)].
Thus, either the number of positive roots is equal to the number of
variations in sign, or (by III) is less than the number
of variations in sign by an integer multiple of 2, Q.E.D.
References
- The Geometry of Rene Descartes with a facsimile of the first edition,
translated by D. E. Smith and M.L. Latham, New York, Dover Publications,
1954.
- B. E. Meserve, Fundamental Concepts of Algebra, New York, Dover Publications, 1982.
Copyright © 1996-2009 Alexander Bogomolny
|