Our treatment of eigenvectors in Section 5.4 was motivated in part by the objective of finding particularly simple matrix representations $[T]_B$ of a linear transformation $T\colon V\to V$. The simplest situation we could hope for is that there is a choice of basis $B$ for which $[T]_B$ is diagonal. We say that the basis $B$ diagonalizes the transformation $T$ in this case, and that $T$ is diagonalizable. In this section we develop theoretical and computational tools for determining whether a linear transformation $T$ is diagonalizable, and for finding a diagonalizing basis $B$ when $T$ is in fact diagonalizable.
Let $V$ be a finite-dimensional vector space. A linear transformation $T\colon V\to V$ is diagonalizable if there exists an ordered basis $B$ of $V$ for which $[T]_B$ is a diagonal matrix. In this case, we say the basis $B$ diagonalizes $T$.
As was already laid out in Section 5.4, a matrix representation $[T]_B$ is diagonal if the elements of $B$ are eigenvectors of $T$. According to Theorem 5.5.2, the converse is also true.
Let $B = (\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n)$ be an ordered basis of $V$. The matrix $[T]_B$ will be diagonal if and only if for each $1 \le j \le n$ the $j$-th column of $[T]_B$ is of the form
\[
c_j\mathbf{e}_j = (0, \dots, 0, c_j, 0, \dots, 0)^T
\]
for some $c_j \in \mathbb{R}$. By Definition 5.2.1 the $j$-th column of $[T]_B$ is the coordinate vector $[T(\mathbf{v}_j)]_B$. Thus $[T]_B$ is diagonal if and only if for all $1 \le j \le n$ we have $[T(\mathbf{v}_j)]_B = c_j\mathbf{e}_j$ for some $c_j \in \mathbb{R}$. Next, by definition of $[\,\cdot\,]_B$, we have
\[
[T(\mathbf{v}_j)]_B = c_j\mathbf{e}_j \iff T(\mathbf{v}_j) = c_j\mathbf{v}_j.
\]
We conclude that $[T]_B$ is diagonal if and only if $\mathbf{v}_j$ is an eigenvector of $T$ for all $1 \le j \le n$. Furthermore, when this is the case, we see that the $j$-th diagonal entry of $[T]_B$ is the corresponding eigenvalue $\lambda_j = c_j$. This proves statements (1) and (2). Statement (3) follows from (1) and Definition 5.5.1.
The phrase “an ordered basis consisting of eigenvectors of $T$” is a bit of a mouthful. The definition below allows us to shorten this to simply “an eigenbasis of $T$”.
We saw in Example 5.4.9 that $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors of $T_A$ with eigenvalues $\lambda_1$ and $\lambda_2$, respectively. It is clear that the two eigenvectors are linearly independent, and hence that $B = (\mathbf{v}_1, \mathbf{v}_2)$ is an eigenbasis of $\mathbb{R}^2$. It follows from Theorem 5.5.2 that $T_A$ is diagonalizable, and that in fact
\[
[T_A]_B = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.
\]
As is easily computed, $\lambda$ is the only eigenvalue of $A$, and $\dim W_\lambda = 1$. It follows that any two eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ lie in the one-dimensional space $W_\lambda$, and hence are scalar multiples of one another. Thus we cannot find two linearly independent eigenvectors of $A$. We conclude that $A$ does not have an eigenbasis, and hence is not diagonalizable.
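The matrix of this example is not reproduced above, so as a stand-in the following sketch (using Python's sympy library, and assuming the classic $2 \times 2$ shear matrix purely for illustration) verifies the phenomenon just described: a single eigenvalue with a one-dimensional eigenspace, hence no eigenbasis.

```python
import sympy as sp

# Hypothetical stand-in: the 2x2 shear matrix, a standard example of a
# matrix with one eigenvalue and a one-dimensional eigenspace.
A = sp.Matrix([[1, 1],
               [0, 1]])

print(A.eigenvals())          # {1: 2}: lambda = 1 is the only eigenvalue
print(A.eigenvects())         # a single eigenvector (1, 0)^T up to scaling
print(A.is_diagonalizable())  # False: no eigenbasis exists
```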
Roughly put, Theorem 5.5.2 tells us that $T$ is diagonalizable if it has “enough” eigenvectors: more precisely, if we can find a large enough collection of linearly independent eigenvectors. So when exactly can we do this? Our first examples were deceptively simple in this regard due to their low-dimensional setting. For transformations of higher-dimensional spaces we need more theory, which we now develop. Theorem 5.5.7 will serve as one of the key results for our purposes. It tells us that eigenvectors chosen from different eigenspaces are linearly independent.
Let $T\colon V\to V$ be a linear transformation, and let $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r\}$ be a set of eigenvectors of $T$ satisfying $T(\mathbf{v}_i) = \lambda_i\mathbf{v}_i$ for all $1 \le i \le r$. If the eigenvalues $\lambda_i$ are distinct (i.e., $\lambda_i \ne \lambda_j$ for $i \ne j$), then $S$ is linearly independent.
We prove the result by contradiction. Suppose we can find a finite set of eigenvectors of $T$ with distinct eigenvalues that is linearly dependent. It follows that we can find such a set of minimum cardinality. In other words, there is a positive integer $r$ satisfying the following properties: (i) we can find a linearly dependent set of $r$ eigenvectors of $T$ with distinct eigenvalues; (ii) for all $r' < r$, any set of $r'$ eigenvectors of $T$ with distinct eigenvalues is linearly independent.
Now assume $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r\}$ is a set of minimal cardinality satisfying $T(\mathbf{v}_i) = \lambda_i\mathbf{v}_i$ for all $1 \le i \le r$ and $\lambda_i \ne \lambda_j$ for all $i \ne j$. First observe that we must have $r \ge 2$: eigenvectors are nonzero by definition, and thus any set consisting of a single eigenvector is linearly independent. Next, since $S$ is linearly dependent we have
\[
c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_r\mathbf{v}_r = \mathbf{0}, \tag{5.5.1}
\]
where $c_i \ne 0$ for some $1 \le i \le r$. After reordering, we may assume without loss of generality that $c_1 \ne 0$. Next we apply $T$ to both sides of (5.5.1):
\begin{align}
T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_r\mathbf{v}_r) &= T(\mathbf{0}) = \mathbf{0} \tag{5.5.2} \\
c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) + \cdots + c_rT(\mathbf{v}_r) &= \mathbf{0} \tag{5.5.3} \\
c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_r\lambda_r\mathbf{v}_r &= \mathbf{0}. \tag{5.5.4}
\end{align}
From equation (5.5.1) and the equation in (5.5.4) we have
\[
\lambda_r(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_r\mathbf{v}_r) - (c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_r\lambda_r\mathbf{v}_r) = \mathbf{0},
\]
and hence
\[
c_1(\lambda_r - \lambda_1)\mathbf{v}_1 + c_2(\lambda_r - \lambda_2)\mathbf{v}_2 + \cdots + c_{r-1}(\lambda_r - \lambda_{r-1})\mathbf{v}_{r-1} = \mathbf{0}. \tag{5.5.5}
\]
Since $c_1 \ne 0$ and $\lambda_r \ne \lambda_1$, we have $c_1(\lambda_r - \lambda_1) \ne 0$. Thus equation (5.5.5) implies that the set $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_{r-1}\}$ is a linearly dependent set of $r - 1$ eigenvectors of $T$ with distinct eigenvalues, contradicting the minimality of $r$. This completes our proof by contradiction.
Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a set of $n$ eigenvectors of $T$ with distinct eigenvalues. According to Theorem 5.5.7 the set $S$ is linearly independent. Since $\dim V = n$, it follows that $S$ is an eigenbasis for $V$ and hence $T$ is diagonalizable.
Since $A$ has three distinct eigenvalues $\lambda_1, \lambda_2, \lambda_3$, the linear transformation $T_A$ is diagonalizable. Indeed, any choice of eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ with $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$ is guaranteed to be linearly independent, and hence gives rise to an eigenbasis of $\mathbb{R}^3$. For example, the usual procedure allows us to easily find eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ corresponding to $\lambda_1, \lambda_2, \lambda_3$.
Let $T\colon V\to V$ be a linear transformation, $\dim V = n$. It cannot be stressed enough that having $n$ distinct eigenvalues is a sufficient, but not necessary, condition for $T$ to be diagonalizable. In other words we have
\[
T \text{ has } n \text{ distinct eigenvalues} \implies T \text{ is diagonalizable},
\]
but the reverse implication does not hold in general.
A good counterexample to keep in mind is $T_{I_n}\colon \mathbb{R}^n \to \mathbb{R}^n$, where $I_n$ is the $n \times n$ identity matrix. The transformation $T_{I_n}$ is clearly diagonalizable since $[T_{I_n}]_B = I_n$ is diagonal, where $B$ is the standard basis; and yet $\lambda = 1$ is the only eigenvalue of $T_{I_n}$.
Theorem 5.5.7 makes no assumption about the dimension of $V$ and can thus be applied to linear transformations of infinite-dimensional spaces. The differential operator provides an interesting example.
Let $V = C^\infty(\mathbb{R})$, and let $T\colon V\to V$ be defined as $T(f) = f'$. For each $\lambda \in \mathbb{R}$ let $f_\lambda(x) = e^{\lambda x}$. In Example 5.4.12 we saw that the functions $f_\lambda$ are eigenvectors of $T$ with eigenvalue $\lambda$: i.e., $T(f_\lambda) = \lambda f_\lambda$. It follows from Corollary 5.5.8 that for any distinct values $\lambda_1, \lambda_2, \dots, \lambda_r$ the set $\{f_{\lambda_1}, f_{\lambda_2}, \dots, f_{\lambda_r}\}$ is linearly independent, and thus that the (uncountably) infinite set $\{f_\lambda \colon \lambda \in \mathbb{R}\}$ is linearly independent.
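As a quick sanity check of this independence claim, the sketch below (sympy, with the sample values $\lambda = 0, 1, 2$ chosen arbitrarily by us) confirms that the only linear combination of $e^{0x}, e^{x}, e^{2x}$ vanishing at three sample points is the trivial one; since a genuine linear relation would have to vanish at every point, this forces linear independence.

```python
import sympy as sp

x = sp.symbols('x')
lams = [0, 1, 2]                      # any distinct values will do
fs = [sp.exp(lam * x) for lam in lams]

c0, c1, c2 = cs = sp.symbols('c0 c1 c2')
combo = sum(c * f for c, f in zip(cs, fs))

# A relation c0*f0 + c1*f1 + c2*f2 = 0 must hold at every point, in
# particular at x = 0, 1, 2; solving shows all coefficients vanish.
eqs = [sp.Eq(combo.subs(x, pt), 0) for pt in (0, 1, 2)]
print(sp.solve(eqs, cs))              # {c0: 0, c1: 0, c2: 0}
```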
The next corollary is a useful strengthening of Theorem 5.5.7, and will be used to prove Theorem 5.5.13. Roughly speaking, it says that eigenspaces associated to distinct eigenvalues are “linearly independent”. Be careful: the phrase in quotes currently has no real meaning for us. We know what it means for vectors to be linearly independent, but not subspaces. However, it is a decent shorthand for the precise statement of Corollary 5.5.12.
Before proving the result, we point out one subtlety here: although $\mathbf{w}_i \in W_{\lambda_i}$ for all $1 \le i \le r$, we cannot assume that each $\mathbf{w}_i$ is an eigenvector. Indeed, $\mathbf{w}_i$ is an eigenvector in this case if and only if $\mathbf{w}_i \ne \mathbf{0}$. This observation guides the proof that follows.
To pick out the terms of (5.5.6) that are nonzero (if any), we define
\[
S = \{\, i \colon 1 \le i \le r,\ \mathbf{w}_i \ne \mathbf{0} \,\}.
\]
Assume by contradiction that $S$ is nonempty: i.e., $S = \{i_1, i_2, \dots, i_s\}$ with $s \ge 1$. In this case we would have
\[
\mathbf{w}_{i_1} + \mathbf{w}_{i_2} + \cdots + \mathbf{w}_{i_s} = \mathbf{0},
\]
since $\mathbf{w}_i = \mathbf{0}$ for all $i \notin S$. But then the left-hand side would be a nontrivial linear combination of the eigenvectors $\mathbf{w}_{i_1}, \mathbf{w}_{i_2}, \dots, \mathbf{w}_{i_s}$ equal to $\mathbf{0}$. Since these eigenvectors have distinct eigenvalues, this contradicts Theorem 5.5.7. Thus $S = \emptyset$. Equivalently, $\mathbf{w}_i = \mathbf{0}$ for all $1 \le i \le r$, as desired.
Assume $T$ is diagonalizable. From Theorem 5.5.2, there is an eigenbasis $B$ of $V$. After reordering we may assume that
\[
B = (\mathbf{v}_{1,1}, \dots, \mathbf{v}_{1,n_1}, \mathbf{v}_{2,1}, \dots, \mathbf{v}_{2,n_2}, \dots, \mathbf{v}_{r,1}, \dots, \mathbf{v}_{r,n_r}),
\]
where for each $1 \le i \le r$ and each $1 \le j \le n_i$, the element $\mathbf{v}_{i,j}$ is an eigenvector with eigenvalue $\lambda_i$: i.e., $T(\mathbf{v}_{i,j}) = \lambda_i\mathbf{v}_{i,j}$. Observe that since $B$ is a list of $\dim V$ vectors, we have
\[
n_1 + n_2 + \cdots + n_r = \dim V.
\]
We claim that for all $1 \le i \le r$ the set $B_i = \{\mathbf{v}_{i,1}, \mathbf{v}_{i,2}, \dots, \mathbf{v}_{i,n_i}\}$ is a basis of $W_{\lambda_i}$. The desired result follows in this case since
\[
\dim W_{\lambda_1} + \dim W_{\lambda_2} + \cdots + \dim W_{\lambda_r} = n_1 + n_2 + \cdots + n_r = \dim V.
\]
Proceeding then to the claim, observe that each set $B_i$ is linearly independent, since the underlying set of $B$ is linearly independent. Thus it suffices to show that $\operatorname{span} B_i = W_{\lambda_i}$ for all $1 \le i \le r$. To this end, fix an index $i$ with $1 \le i \le r$ and take any $\mathbf{w} \in W_{\lambda_i}$. Since $B$ is a basis we can write
\[
\mathbf{w} = \mathbf{w}_1 + \mathbf{w}_2 + \cdots + \mathbf{w}_r,
\]
where for each $1 \le k \le r$ we have
\[
\mathbf{w}_k = c_{k,1}\mathbf{v}_{k,1} + c_{k,2}\mathbf{v}_{k,2} + \cdots + c_{k,n_k}\mathbf{v}_{k,n_k} \in W_{\lambda_k}.
\]
Bringing $\mathbf{w}$ to the right-hand side of the equation above yields
\[
\mathbf{0} = \mathbf{w}_1 + \cdots + \mathbf{w}_{i-1} + (\mathbf{w}_i - \mathbf{w}) + \mathbf{w}_{i+1} + \cdots + \mathbf{w}_r.
\]
Recall that $\mathbf{w} \in W_{\lambda_i}$, and thus $\mathbf{w}_i - \mathbf{w} \in W_{\lambda_i}$. Since $\mathbf{w}_k \in W_{\lambda_k}$ for all $k$, it follows from Corollary 5.5.12 that $\mathbf{w}_k = \mathbf{0}$ for all $k \ne i$ and that $\mathbf{w}_i - \mathbf{w} = \mathbf{0}$. Thus $\mathbf{w} = \mathbf{w}_i \in \operatorname{span} B_i$, which proves the claim.
We now collect our various results about diagonalizability into one procedure that (a) decides whether a linear transformation $T$ is diagonalizable, and (b) if it is, computes an eigenbasis for $T$. The procedure applies to any linear transformation of a finite-dimensional vector space, not just matrix transformations. As usual, the first step is to choose a matrix representation $A = [T]_B$ of $T$.
For the most part the validity of this procedure is a direct consequence of Theorem 5.5.2 and Theorem 5.5.13. However, there are two details that need to be pointed out.
That $T$ is diagonalizable if and only if $A$ is diagonalizable follows from the fact that the coordinate vector transformation $[\,\cdot\,]_B$ maps a basis of the $\lambda$-eigenspace of $T$ to a basis of the $\lambda$-eigenspace of $A$.
That the ordered list described in Step 3 is in fact a basis is shown in the proof of Theorem 5.5.13.
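Once a matrix representation is chosen, the heart of the procedure is mechanical, and the following sketch shows one way to carry it out with sympy. The function name and test matrix are our own inventions, and the dimension count assumes all eigenvalues of $A$ are real.

```python
import sympy as sp

def eigenbasis_or_none(A):
    """Sketch of the computational steps: collect eigenspace bases and
    compare the total dimension to n. Assumes the eigenvalues are real."""
    n = A.rows
    vectors = []
    # eigenvects() yields (eigenvalue, algebraic multiplicity, eigenspace basis)
    for lam, mult, basis in A.eigenvects():
        vectors.extend(basis)
    return vectors if len(vectors) == n else None

A = sp.Matrix([[2, 1],
               [0, 3]])               # hypothetical test matrix
print(eigenbasis_or_none(A))          # a 2-element eigenbasis: diagonalizable
```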
Note first that $A = [T]_B$, where $B$ is the standard basis of $V$. (See Theorem 5.2.3.) Since $A$ is upper triangular, we easily see that its characteristic polynomial is $p(t) = (t - \lambda_1)^2(t - \lambda_2)$, where $\lambda_1 \ne \lambda_2$ are the entries appearing on the diagonal of $A$. Next we investigate the eigenspaces:
\[
W_{\lambda_1} = \operatorname{null}(\lambda_1 I - A), \qquad W_{\lambda_2} = \operatorname{null}(\lambda_2 I - A).
\]
By inspection we see that both $\lambda_1 I - A$ and $\lambda_2 I - A$ have rank 2, and hence nullity $1$ by the rank-nullity theorem. Thus both eigenspaces have dimension one, and we have $\dim W_{\lambda_1} + \dim W_{\lambda_2} = 2 < 3 = \dim V$. We conclude that $T$, and hence $A$, is not diagonalizable.
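The rank computations in such examples are easy to check mechanically. The sketch below (sympy, with a hypothetical upper triangular matrix of ours standing in for the one above) computes each eigenspace dimension via rank-nullity.

```python
import sympy as sp

# Hypothetical stand-in: upper triangular, with a repeated diagonal entry.
A = sp.Matrix([[2, 1, 0],
               [0, 2, 3],
               [0, 0, 7]])

for lam in A.eigenvals():
    M = lam * sp.eye(3) - A
    # dim W_lam = nullity(lam*I - A) = 3 - rank(lam*I - A)
    print(lam, 3 - M.rank())          # both eigenspaces are one-dimensional
```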
The diagonalizability examples in this text will focus largely on the special case of matrix transformations $T_A\colon \mathbb{R}^n \to \mathbb{R}^n$. However, our conscience demands that we give at least one full example of a more abstract linear transformation.
Let $T\colon V\to V$ be the given linear transformation. Decide whether $T$ is diagonalizable. If yes, find an eigenbasis $B$ for $T$ and compute the corresponding matrix $[T]_B$ representing $T$.
In this subsection we will focus on matrix transformations $T_A\colon \mathbb{R}^n \to \mathbb{R}^n$. Recall (5.2.3) that in this situation we have $A = [T_A]_B$, where $B$ is the standard basis of $\mathbb{R}^n$. As such, Procedure 5.5.14 boils down to steps (2)-(3), and the eigenbasis $B'$ of $\mathbb{R}^n$ found in (3) is itself an eigenbasis for $T_A$. Letting $D = [T_A]_{B'}$, the change of basis formula (5.3.20) yields
\[
D = P^{-1}AP, \tag{5.5.7}
\]
where $P$ is the change of basis matrix from $B'$ to $B$. Lastly, since $B$ is the standard basis of $\mathbb{R}^n$, the change of basis matrix $P$ is obtained by placing the $j$-th element of $B'$ as the $j$-th column of $P$ for all $1 \le j \le n$. We record these observations as a separate procedure specifically for matrix transformations.
The process of finding an invertible $P$ and diagonal $D$ satisfying (5.5.7) is called diagonalizing the matrix $A$; and we say that the matrix $P$ diagonalizes $A$ in this case. (Of course this is possible if and only if $A$ is diagonalizable.)
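In sympy this whole computation is packaged as Matrix.diagonalize, which returns a pair $(P, D)$ satisfying (5.5.7); the matrix below is a hypothetical example of ours, not one from the text.

```python
import sympy as sp

A = sp.Matrix([[4, 1],
               [2, 3]])    # hypothetical matrix; eigenvalues 2 and 5

# diagonalize() returns P and D with A = P*D*P**-1, i.e. D = P**-1*A*P;
# it raises an error if A is not diagonalizable.
P, D = A.diagonalize()
assert P.inv() * A * P == D
print(P)                   # columns form an eigenbasis of R^2
print(D)                   # diag(2, 5)
```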
To factor $p(t)$, we first look for integer roots dividing the constant term of $p(t)$: i.e., we test whether any of its (positive or negative) divisors are roots. Luckily, we see that $t = \lambda_1$ is a root of $p(t)$. Doing polynomial division of $p(t)$ by $t - \lambda_1$ yields
\[
p(t) = (t - \lambda_1)\,q(t).
\]
Repeating this factoring technique on $q(t)$, we see that $q(\lambda_2) = 0$, and thus can continue to factor:
\[
p(t) = (t - \lambda_1)(t - \lambda_2)(t - \lambda_3).
\]
We conclude that the eigenvalues of $A$ are $\lambda_1$, $\lambda_2$, and $\lambda_3$. We now compute bases for the corresponding eigenspaces. The bases below were obtained using Procedure 3.8.10. We omit the details of the Gaussian elimination performed in each case. (Check for yourself!)
\[
W_{\lambda_1} = \operatorname{span}\{\mathbf{v}_1\}, \qquad W_{\lambda_2} = \operatorname{span}\{\mathbf{v}_2\}, \qquad W_{\lambda_3} = \operatorname{span}\{\mathbf{v}_3\}.
\]
Since
\[
\dim W_{\lambda_1} + \dim W_{\lambda_2} + \dim W_{\lambda_3} = 3 = \dim \mathbb{R}^3,
\]
we conclude that $A$ is diagonalizable. Furthermore, we have $D = P^{-1}AP$, where $P$ is the matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ and $D$ is the diagonal matrix with diagonal entries $\lambda_1, \lambda_2, \lambda_3$.
Recall that two square matrices $A$ and $B$ are similar if $B = P^{-1}AP$ for some invertible matrix $P$ (5.3.27). From the foregoing discussion it follows that a matrix is diagonalizable if and only if it is similar to a diagonal matrix.
An $n \times n$ matrix $A$ is diagonalizable if and only if it is similar to a diagonal matrix: i.e., if and only if there is an invertible matrix $P$ and a diagonal matrix $D$ such that
\[
D = P^{-1}AP.
\]
According to Theorem 5.3.28, the matrix $A$ is similar to a diagonal matrix $D$ if and only if there is a linear transformation $T\colon V\to V$ and ordered bases $B$ and $B'$ of $V$ such that $[T]_B = A$ and $[T]_{B'} = D$. By definition such a $T$ would be diagonalizable, since $[T]_{B'} = D$ is diagonal. Since $T$ is diagonalizable if and only if $A = [T]_B$ is diagonalizable, we conclude that $A$ is similar to a diagonal matrix if and only if $A$ is diagonalizable.
We know from Theorem 5.3.28 that similar matrices can be thought of as two matrix representations of the same overlying linear transformation $T$. As such, similar matrices share many of the same algebraic properties, as Theorem 5.5.21 details.
Let $p_A(t)$ and $p_B(t)$ be the characteristic polynomials of $A$ and $B$, respectively. We have
\begin{align*}
p_B(t) &= \det(tI - B) \\
&= \det(tI - P^{-1}AP) \\
&= \det(P^{-1}(tI)P - P^{-1}AP) && \text{(algebra)} \\
&= \det(P^{-1}(tI - A)P) && \text{(left/right dist.)} \\
&= \det(P^{-1})\det(tI - A)\det(P) && \text{(2.5.26)} \\
&= \det(tI - A) = p_A(t).
\end{align*}
This proves statement (2).
Statement (3) follows from (2) since the eigenvalues of a matrix are the real roots of its characteristic polynomial. Furthermore, by Theorem 5.4.25 the trace and determinant of a matrix are equal to the sum and product of the roots of its characteristic polynomial, respectively. Thus (4) also follows from (2).
The proofs of statements (5)-(6) are left as exercises.
A diagonalizable matrix is similar to a diagonal matrix (5.5.20), and similar matrices share many essential properties (5.3.28, 5.5.21). In this spirit, a good way of thinking about a diagonalizable matrix is that it is “as good as diagonal”.
where $D = P^{-1}AP$ is diagonal (equivalently, $A = PDP^{-1}$). This allows us to answer questions about $A$ by first answering them for $D$, and then using the equations in (5.5.8) to translate the results back to $A$. What makes this method effective is that algebraic questions involving diagonal matrices are easy to answer! Before getting to some illustrative examples, we need a few results about the operation $A \mapsto P^{-1}AP$, which is called conjugation by $P$.
Furthermore, since $D$ is diagonal, it follows that $f(D)$ is also diagonal, and in fact its diagonal entries are given by $f(d_{ii})$, where $d_{ii}$ are the diagonal entries of $D$. This gives us an easy method of computing arbitrary polynomials of the matrix $A$: namely, $f(A) = P\,f(D)\,P^{-1}$.
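Here is a sketch of this method in sympy for the particular polynomial $f(t) = t^{10}$, with a hypothetical diagonalizable matrix of ours; the point is that computing $f(A) = P\,f(D)\,P^{-1}$ requires only powers of scalars.

```python
import sympy as sp

A = sp.Matrix([[4, 1],
               [2, 3]])              # hypothetical diagonalizable matrix
P, D = A.diagonalize()               # A = P*D*P**-1

# f(D) for f(t) = t**10 is diagonal with entries d_ii**10.
D10 = sp.diag(*[d**10 for d in D.diagonal()])
A10 = P * D10 * P.inv()
assert A10 == A**10                  # agrees with direct matrix multiplication
print(A10)
```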
A square-root of an $n \times n$ matrix $A$ is a matrix $B$ such that $B^2 = A$. If $A$ and $A'$ are similar matrices satisfying $A' = P^{-1}AP$, then $A$ has a square-root if and only if $A'$ has a square-root. Indeed, if $B$ satisfies $B^2 = A$, then $B' = P^{-1}BP$ satisfies
\[
(B')^2 = (P^{-1}BP)(P^{-1}BP) = P^{-1}B^2P = P^{-1}AP = A'.
\]
So when exactly does a diagonal matrix $D$ have a square-root? Clearly, it is sufficient that the diagonal entries satisfy $d_{ii} \ge 0$ for all $i$, as in the example above. Interestingly, this is not a necessary condition! Indeed, consider the following example:
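The example itself is not reproduced above; a standard instance (our assumption) is the diagonal matrix $-I_2$, whose square-root is rotation by $90$ degrees, as the following check confirms.

```python
import sympy as sp

# Rotation by 90 degrees: B*B = -I, so the diagonal matrix -I (with
# negative diagonal entries) nonetheless has a real square-root.
B = sp.Matrix([[0, -1],
               [1,  0]])
print(B * B)   # Matrix([[-1, 0], [0, -1]])
```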
We end this section with a deeper look at what the characteristic polynomial reveals about eigenspaces. To begin with, we first define the characteristic polynomial of a general linear transformation $T\colon V\to V$, where $V$ is a finite-dimensional vector space.
Let $T\colon V\to V$ be a linear transformation, where $V$ is finite-dimensional. Let $B$ be an ordered basis of $V$, and let $A = [T]_B$. We define the characteristic polynomial of $T$ to be the characteristic polynomial of $A$: i.e., the characteristic polynomial of $T$ is
\[
p(t) = \det(tI - A).
\]
For the characteristic polynomial of a linear transformation to be well-defined, it should not depend on the choice of basis. This is true thanks to Theorem 5.5.21 and Theorem 5.3.20. Indeed, given two choices of ordered bases $B$ and $B'$ of $V$, the matrices $[T]_B$ and $[T]_{B'}$ are similar (5.3.20), and thus their characteristic polynomials are equal (5.5.21, (2)).
Let $T\colon V\to V$ be a linear transformation, where $V$ is finite-dimensional. If $\lambda$ is an eigenvalue of $T$, then we can factor the characteristic polynomial of $T$ as $p(t) = (t - \lambda)^m q(t)$, where $\lambda$ is not a root of $q(t)$. As we will see, the exponent $m$ is an upper bound for the dimension of the eigenspace $W_\lambda$. We call $m$ the algebraic multiplicity of the eigenvalue $\lambda$.
Let $T\colon V\to V$ be a linear transformation, where $V$ is finite-dimensional, and let $p(t)$ be the characteristic polynomial of $T$. Given an eigenvalue $\lambda$ of $T$, we can factor $p(t)$ as $p(t) = (t - \lambda)^m q(t)$, where $\lambda$ is not a root of the polynomial $q(t)$: i.e., $q(\lambda) \ne 0$. We call $m$ the algebraic multiplicity of the eigenvalue $\lambda$, and we call $\dim W_\lambda$ its geometric multiplicity. If $m \ge 2$, we say $\lambda$ is a repeated eigenvalue of $T$.
Let $T\colon V\to V$ be a linear transformation, where $\dim V = n$, let $p(t)$ be the characteristic polynomial of $T$, and suppose $\lambda$ is an eigenvalue of $T$ of algebraic multiplicity $m$: i.e., $p(t) = (t - \lambda)^m q(t)$ and $q(\lambda) \ne 0$. We have
\[
1 \le \dim W_\lambda \le m.
\]
Since $\lambda$ is an eigenvalue, we have $W_\lambda \ne \{\mathbf{0}\}$, and thus $\dim W_\lambda \ge 1$. Assume by contradiction that $\dim W_\lambda > m$. Let $k = \dim W_\lambda$, and let $\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ be a basis for $W_\lambda$. We can extend this set to an ordered basis
$B = (\mathbf{v}_1, \dots, \mathbf{v}_k, \mathbf{v}_{k+1}, \dots, \mathbf{v}_n)$ of $V$. By definition, the characteristic polynomial of $T$ is given by $p(t) = \det(tI - A)$, where $A = [T]_B$. Since $\mathbf{v}_1, \dots, \mathbf{v}_k$ are $\lambda$-eigenvectors of $T$, the matrix $A$ is of the form
\[
A = \begin{pmatrix} \lambda I_k & C \\ 0 & E \end{pmatrix}
\]
for some $k \times (n - k)$ matrix $C$ and $(n - k) \times (n - k)$ matrix $E$.
An easy proof by induction on $k$ shows that for such a matrix we have $p(t) = (t - \lambda)^k g(t)$ for some polynomial $g(t)$. On the other hand, since $\lambda$ has algebraic multiplicity $m$ we have $p(t) = (t - \lambda)^m q(t)$ for some polynomial $q(t)$ with $q(\lambda) \ne 0$. Setting these two expressions equal to one another we see that
\[
(t - \lambda)^k g(t) = (t - \lambda)^m q(t),
\]
or equivalently,
\[
q(t) = (t - \lambda)^{k - m} g(t).
\]
Since $k - m > 0$, it follows that $q(\lambda) = 0$. Contradiction! We conclude that $\dim W_\lambda \le m$, as desired.
In other words, $T$ is diagonalizable if and only if all roots of $p(t)$ are real, and the geometric multiplicity of each eigenvalue is equal to its algebraic multiplicity.
If (2) is true, then each $\lambda_i$ is an eigenvalue of $T$ satisfying $\dim W_{\lambda_i} = m_i$, and we have
\[
\dim W_{\lambda_1} + \dim W_{\lambda_2} + \cdots + \dim W_{\lambda_r} = m_1 + m_2 + \cdots + m_r = n,
\]
by counting degrees in (5.5.9). It follows from Theorem 5.5.2 that $T$ is diagonalizable.
Implication: $(1) \implies (2)$.
If $T$ is diagonalizable, then there is an ordered basis $B$ of $V$ for which $D = [T]_B$ is diagonal. Letting $d_i$ be the $i$-th diagonal element of $D$, we have
\[
p(t) = \det(tI - D) = (t - d_1)(t - d_2)\cdots(t - d_n).
\]
This expression tells us that $d_1, d_2, \dots, d_n$ are the roots of $p(t)$, and hence that all roots are real, since $d_i \in \mathbb{R}$ for all $i$. On the other hand, each $d_i$ is a root of $p(t)$, and thus $d_i \in \{\lambda_1, \lambda_2, \dots, \lambda_r\}$ for all $i$. It follows that $\lambda_1, \lambda_2, \dots, \lambda_r$ are the distinct eigenvalues of $T$. By Theorem 5.5.13, since $T$ is diagonalizable we must have
\[
\dim W_{\lambda_1} + \dim W_{\lambda_2} + \cdots + \dim W_{\lambda_r} = n. \tag{5.5.10}
\]
Since $\dim W_{\lambda_i} \le m_i$ for all $i$ (5.5.30), and since $m_1 + m_2 + \cdots + m_r \le n$ (counting degrees in (5.5.9)), for the equality (5.5.10) to hold we must have $\dim W_{\lambda_i} = m_i$ for all $i$, as desired.
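The two multiplicities are easy to compare computationally; the sketch below (sympy, with a hypothetical matrix of ours having a repeated eigenvalue) illustrates the strict inequality $\dim W_\lambda < m$ and the resulting failure of diagonalizability.

```python
import sympy as sp

A = sp.Matrix([[5, 1],
               [0, 5]])              # hypothetical: repeated eigenvalue 5

for lam, alg_mult, basis in A.eigenvects():
    geo_mult = len(basis)            # geometric multiplicity dim W_lam
    print(lam, alg_mult, geo_mult)   # 5 2 1: geometric < algebraic
print(A.is_diagonalizable())         # False, as Corollary 5.5.31 predicts
```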
From Theorem 5.5.30 and Corollary 5.5.31 we can deduce a much finer picture of the eigenspaces of a linear transformation from its factored characteristic polynomial. This often reduces our workload when treating questions of diagonalizability, as the next examples illustrate.
The eigenvalues of $A$ are $\lambda_1$ and $\lambda_2$, and each has algebraic multiplicity $2$. Thus $p(t) = (t - \lambda_1)^2(t - \lambda_2)^2$, and $A$ is diagonalizable if and only if
\[
\dim W_{\lambda_1} = \dim W_{\lambda_2} = 2.
\]
By inspection we see that $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent $\lambda_1$-eigenvectors, and thus we must have $\dim W_{\lambda_1} = 2$. Next we have $W_{\lambda_2} = \operatorname{null}(\lambda_2 I - A)$, where the matrix $\lambda_2 I - A$ is computed directly from $A$.
It is not difficult to see (either using Gaussian elimination or inspection) that this matrix has rank 3, and hence nullity 1. We conclude that $\dim W_{\lambda_2} = 1 < 2$, and hence that $A$ is not diagonalizable.
For each matrix $A$ below, use Procedure 5.5.14 to determine whether it is diagonalizable. If yes, then produce an invertible matrix $P$ and diagonal matrix $D$ satisfying $D = P^{-1}AP$. For the last matrix the characteristic polynomial is provided for convenience.
According to Theorem 5.5.21, if $A$ and $B$ are similar, then they have the same characteristic polynomial. Show that the converse is false by showing that the matrices $A$ and $B$ below have the same characteristic polynomial but are not similar.
Each matrix $A$ below has the characteristic polynomial $p(t)$ indicated. Use Procedure 5.5.14 to decide whether $A$ is diagonalizable. If yes, provide an invertible $P$ and diagonal $D$ satisfying $D = P^{-1}AP$.