Diagonalization

Section 5.5 Diagonalization

Our treatment of eigenvectors in Section 5.4 was motivated in part by the objective of finding particularly simple matrix representations \([T]_B\) of a linear transformation \(T\colon V\rightarrow V\text{.}\) The simplest situation we could hope for is that there is a choice of basis \(B\) for which \([T]_B\) is diagonal. We say that the basis \(B\) diagonalizes the transformation \(T\) in this case, and that \(T\) is diagonalizable. In this section we develop theoretical and computational tools for determining whether a linear transformation \(T\) is diagonalizable, and for finding a diagonalizing basis \(B\) when \(T\) is in fact diagonalizable.

Subsection 5.5.1 Diagonalizable transformations

Definition 5.5.1. Diagonalizable.

Let \(V\) be a finite-dimensional vector space. A linear transformation \(T\colon V\rightarrow V\) is diagonalizable if there exists an ordered basis \(B\) of \(V\) for which \([T]_B\) is a diagonal matrix. In this case, we say the basis \(B\) diagonalizes \(T\text{.}\)

An \(n\times n\) matrix \(A\) is diagonalizable if the matrix transformation \(T_A\colon \R^n\rightarrow \R^n\) is diagonalizable.

As was already laid out in Section 5.4 a matrix representation \([T]_B\) is diagonal if the elements of \(B\) are eigenvectors of \(T\text{.}\) According to Theorem 5.5.2, the converse is also true.

Theorem 5.5.2. Diagonalizabilty: basis of eigenvectors.

Let \(T\colon V\rightarrow V\) be a linear transformation, and let \(B=(\boldv_1, \boldv_2, \dots, \boldv_n)\) be an ordered basis of \(V\text{.}\)

The matrix \([T]_B\) is diagonal if and only if \(B\) consists of eigenvectors of \(T\text{.}\)
If \([T]_B\) is diagonal, then the \(j\)-th diagonal entry of \([T]_B\) is the eigenvalue \(\lambda_j\) associated to the eigenvector \(\boldv_j\text{.}\)
The transformation \(T\) is diagonalizable if and only if there is an ordered basis of \(V\) consisting of eigenvectors of \(T\text{.}\)

Proof.

Let \(B=(\boldv_1, \boldv_2, \dots, \boldv_n)\) be an ordered basis of \(V\text{.}\) The matrix \([T]_B\) will be diagonal if and only if for each \(1\leq j\leq n\) the \(j\)-th column of \(A\) is of the form

\begin{equation*} (0,\dots, \lambda_j,0,\dots, 0)=\lambda_j\bolde_j \end{equation*}

for some \(\lambda_j\text{.}\) By Definition 5.2.1 the \(j\)-th column of \([T]_{B}\) is the coordinate vector \([T(\boldv_j)]_{B}\text{.}\) Thus \([T]_{B}\) is diagonal if and only if for all \(1\leq j\leq n\) we have \([T(\boldv_j)]_B=\lambda_j\bolde_j\) for some \(\lambda_j\in \R\text{.}\) Next, by definition of \([\phantom{v}]_B\text{,}\) we have

\begin{equation*} [T(\boldv_j]_B=(0,\dots, \lambda_j,0,\dots, 0)\iff T(\boldv_j)=\lambda_j\boldv_j\text{.} \end{equation*}

We conclude that \([T]_B\) is diagonal if and only if \(\boldv_j\) is an eigenvector of \(T\) for all \(1\leq j\leq n\text{.}\) Furthermore, when this is the case, we see that the \(j\)-th diagonal entry of \([T]_B\) is the corresponding eigenvalue \(\lambda_j\text{.}\) This proves statements (1) and (2). Statement (3) follows from (1) and Definition 5.5.1.

The phrase “an ordered basis consisting of eigenvectors of \(T\)” is a bit of a mouthful. The definition below allows us to shorten this to simply “an eigenbasis of \(T\)”.

Definition 5.5.3. Eigenbasis.

Let \(T\colon V\rightarrow V\) be a linear transformation. An ordered basis \(B=(\boldv_1, \boldv_2,\dots, \boldv_n)\) is an eigenbasis of \(T\) if \(\boldv_j\) is an eigenvector of \(T\) for all \(1\leq j\leq n\text{.}\)

Example 5.5.4.

Let \(T=T_A\text{,}\) where

\begin{equation*} A=\frac{1}{5}\begin{amatrix}[rr]-3\amp 4\\ 4\amp 3 \end{amatrix}\text{.} \end{equation*}

We saw in Example 5.4.9 that \(\boldv_1=(1,2)\) and \(\boldv_2=(-1,2)\) are eigenvectors of \(T\) with eigenvalues \(1, -1\text{,}\) respectively. It is clear that the two eigenvectors are linearly independent, and hence that \(B'=(\boldv_1, \boldv_2)\) is an eigenbasis of \(T\text{.}\) It follows from Theorem 5.5.2 that \(T\) is diagonalizable, and that in fact

\begin{equation*} [T]_{B'}=\begin{amatrix}[rr] 1\amp 0\\ 0\amp -1 \end{amatrix}\text{,} \end{equation*}

as one easily verifies.

Example 5.5.5.

Let \(T\colon \R^2\rightarrow \R^2\) be rotation by \(\pi/4\text{:}\) i.e., \(T=T_A\text{,}\) where

\begin{equation*} A=\frac{1}{2}\begin{amatrix}[rr]\sqrt{2}\amp -\sqrt{2}\\ \sqrt{2}\amp \sqrt{2} \end{amatrix}\text{.} \end{equation*}

As discussed in Example 5.4.10, \(T\) has no eigenvectors whatsoever. It follows that there is no eigenbasis of \(T\text{,}\) and hence that \(T\) is not diagonalizable.

Example 5.5.6.

Let \(T=T_A\text{,}\) where

\begin{equation*} A=\begin{amatrix}[rr] 2\amp 1\\ 0\amp 2 \end{amatrix}\text{.} \end{equation*}

As is easily computed, \(\lambda=2\) is the only eigenvalue of \(T\text{,}\) and \(W_2=\Span\{(1,0)\}\text{.}\) It follows that any two eigenvectors \(\boldv_1\) and \(\boldv_2\) lie in the one-dimensional space \(W_2\text{,}\) and hence are scalar multiples of one another. Thus we cannot find two linearly independent eigenvectors of \(T\text{.}\) We conclude that \(T\) does not have an eigenbasis, and hence is not diagonalizable.

Subsection 5.5.2 Linear independence of eigenvectors

Roughly put, Theorem 5.5.2 tells us that \(T\) is diagonalizable if it has “enough” eigenvectors: more precisely, if we can find a large enough collection of linearly independent eigenvectors. So when exactly can we do this? Our first examples were deceptively simple in this regard due to their low-dimensional setting. For transformations of higher-dimensional spaces we need more theory, which we now develop. Theorem 5.5.7 will serve as one of the key results for our purposes. It tells us that eigenvectors chosen from different eigenspaces are linearly independent.

Theorem 5.5.7. Linear independence of eigenvectors.

Let \(T\colon V\rightarrow V\) be a linear transformation, and let \(S=\{\boldv_1,\dots, \boldv_r\}\) be a set of eigenvectors of \(T\) satisfying \(T\boldv_i=\lambda_i\boldv_i\text{.}\) If the eigenvalues \(\lambda_i\) are distinct (i.e., \(\lambda_i\ne \lambda_j\) for \(i\ne j\)), then \(S\) is linearly independent.

Proof.

We prove the result by contradiction. Suppose we can find a finite set of eigenvectors with distinct eigenvalues that is linearly dependent. It follows that we can find such a set of minimum cardinality. In other words, there is positive integer \(r\) satisfying the following properties: (i) we can find a linearly dependent set of \(r\) eigenvectors of \(T\) with distinct eigenvalues; (ii) for all \(k\lt r\text{,}\) any set of \(k\) eigenvectors of \(T\) with distinct eigenvalues is linearly independent¹.

Now assume \(S=\{\boldv_1, \boldv_2, \dots, \boldv_r\}\) is a set of minimal cardinality satisfying \(T(\boldv_i)=\lambda_i\boldv_i\) for all \(1\leq i\leq n\) and \(\lambda_i\ne \lambda j\) for all \(1\leq i\lt j\leq n\text{.}\) First observe that we must have \(r\gt 1\text{:}\) eigenvectors are nonzero by definition, and thus any set consisting of a single eigenvector is linearly independent. Next, since \(S\) is linearly dependent we have

\begin{equation} c_1\boldv_1+c_2\boldv_2+\cdots +c_r\boldv_r=\boldzero\text{,}\tag{5.5.1} \end{equation}

where \(c_i\ne 0\) for some \(1\leq i\leq r\text{.}\) After reordering, we may assume without loss of generality that \(c_1\ne 0\text{.}\) Next we apply \(T\) to both sides of (5.5.1):

\begin{align} c_1\boldv_1+c_2\boldv_2\cdots +c_r\boldv_r=\boldzero \amp\implies T(c_1\boldv_1+c_2\boldv_2+\cdots +c_r\boldv_r)=T(\boldzero) \tag{5.5.2}\\ \amp\implies c_1T(\boldv_1)+c_2T(\boldv_2)+\cdots +c_rT(\boldv_r)=\boldzero \tag{5.5.3}\\ \amp\implies c_1\lambda_1\boldv_1+c_2\lambda_2\boldv_2+\cdots +c_r\lambda_r\boldv_r=\boldzero \text{.}\tag{5.5.4} \end{align}

From equation (5.5.1) and the equation in (5.5.4) we have

\begin{equation*} \lambda_r(c_1\boldv_1+c_2\boldv_2\cdots +c_r\boldv_r)- (c_1\lambda_1\boldv_1+c_2\lambda_2\boldv_2\cdots +c_r\lambda_r\boldv_r)=\boldzero\text{,} \end{equation*}

and hence

\begin{equation} c_1(\lambda_r-\lambda_1)\boldv_1+\cdots +c_{r-1}(\lambda_r-\lambda_{r-1})+\cancel{c_r(\lambda_r-\lambda_r)}\boldv_r=\boldzero\text{.}\tag{5.5.5} \end{equation}

Since \(c_1\ne 0\) and \(\lambda_1\ne \lambda_r\text{,}\) we have \(c_1(\lambda_r-\lambda_1)\ne 0\text{.}\) Thus equation (5.5.5) implies that the set \(S'=\{\boldv_1, \boldv_2, \dots, \boldv_{r-1}\}\) is a linearly dependent set of eigenvectors of \(T\) with distinct eigenvalues, contradicting the minimality of \(r\text{.}\) This completes our proof by contradiction.

Corollary 5.5.8. Diagonalizable if distinct eigenvalues.

Let \(T\colon V\rightarrow V\) be a linear transformation, and suppose \(\dim V=n\text{.}\) If \(T\) has \(n\) distinct eigenvalues, then \(T\) is diagonalizable.

Proof.

Let \(S=\{\boldv_1, \boldv_2,\dots, \boldv_n\}\) be a set eigenvectors of \(T\) with distinct eigenvalues. According to Theorem 5.5.7 the set \(S\) is linearly independent. Since \(\val{S}=n=\dim V\) it follows that \(B=(\boldv_1,\boldv_2,\dots, \boldv_n)\) is an eigenbasis for \(T\) and hence \(T\) is diagonalizable.

Example 5.5.9.

Let \(T=T_A\text{,}\) where

\begin{equation*} A=\begin{amatrix}[rrr] 6 \amp 6 \amp -2 \\ -8 \amp -13 \amp 7 \\ -8 \amp -16 \amp 10 \end{amatrix}\text{.} \end{equation*}

The characteristic polynomial of \(A\) is

\begin{equation*} p(t)=t^{3} - 3 t^{2} - 4 t + 12=(t+2)(t-2)(t-3)\text{.} \end{equation*}

Since \(A\) has three distinct eigenvalues the linear transformation \(T_A\) is diagonalizable. Indeed, any choice of eigenvectors \(\boldv_1, \boldv_2, \boldv_3\) with \(\boldv_1\in W_{-2}, \boldv_2\in W_2, \boldv_3\in W_3\) is guaranteed to be linearly independent, and hence gives rise to an eigenbasis \(B=(\boldv_1, \boldv_2, \boldv_3)\) of \(T_A\text{.}\) For example the usual procedure allows us to easily find eigenvectors

\begin{equation*} \boldv_1=(1,-2,-2), \boldv_2=(1,-1,-1),\boldv_3=(2,-1,0) \end{equation*}

from the three eigenspaces. You can verify for yourself that these three vectors are indeed linearly independent.

Remark 5.5.10.

Let \(T\colon V\rightarrow V\) be a linear transformation, \(\dim V=n\text{.}\) It cannot be stressed enough that having \(n\) distinct eigenvalues is a sufficient, but not necessary condition for \(T\) to be diagonalizable. In other words we have

\begin{equation*} T \text{ has distinct eigenvalues} \implies T \text{ diagonalizable} \end{equation*}

but

\begin{equation*} T \text{ diagonalizable }\;\not\!\!\!\!\!\implies T \text{ has distinct eigenvalues}\text{.} \end{equation*}

A good counterexample to keep in mind is \(T_I\colon \R^n\rightarrow \R^n\text{,}\) where \(I=I_n\) is the \(n\times n\) identity matrix. The transformation is clearly diagonalizable since \([T]_B=I\text{,}\) where \(B=(\bolde_1, \bolde_2,\dots, \bolde_n)\) is the standard basis; and yet \(\lambda=1\) is the only eigenvalue of \(T\text{.}\)

Theorem 5.5.7 makes no assumption about the dimension of \(V\) and can thus can be applied to linear transformations of infinite-dimensional spaces. The differential operator \(T(f)=f'\) provides an interesting example.

Example 5.5.11.

Let \(V=C^\infty(\R)\text{,}\) and let \(T\colon V\rightarrow V\) be defined as \(T(f)=f'\text{.}\) For each \(\lambda\in \R\) let \(f_{\lambda}(x)=e^{\lambda x}\text{.}\) In Example 5.4.12 we saw that the functions \(f_\lambda\) are eigenvectors of \(T\) with eigenvalue \(\lambda\text{:}\) i.e., \(T(f_\lambda)=\lambda f_\lambda\text{.}\) It follows from Corollary 5.5.8 that for any distinct values \(\lambda_1, \lambda_2, \dots, \lambda_r\) the set \(\{e^{\lambda_1x}, e^{\lambda_2x}, \dots, e^{\lambda_rx}\}\) is linearly independent, and thus that the (uncountably) infinite set \(S=\{e^{\lambda x}\colon \lambda\in \R\}\subseteq C^{\infty}(\R)\) is linearly independent.

The next corollary is a useful strengthening of Theorem 5.5.7, and will be used to prove Theorem 5.5.13. Roughly speaking, it says that eigenspaces associated to distinct eigenvalues are “linearly independent”. Be careful: the phrase in quotes currently has no real meaning for us. We know what it means for vectors to be linearly independent, but not subspaces. However, it is a decent shorthand for the precise statement of Corollary 5.5.12.

Corollary 5.5.12.

Let \(T\colon V\rightarrow V\) be a linear transformation, where \(\dim V=n\text{.}\) Let \(\lambda_1, \lambda_2, \dots, \lambda_r\) be distinct eigenvalues of \(T\text{,}\) and for each \(1\leq i\leq r\) let \(W_{\lambda_i}\) be the \(\lambda_i\)-eigenspace. If

\begin{equation} \boldw_1+\boldw_2+\cdots +\boldw_r=\boldzero\text{,}\tag{5.5.6} \end{equation}

where \(\boldw_i\in W_{\lambda_i}\text{,}\) then \(\boldw_i=\boldzero\) for all \(i\text{.}\)

Proof.

Before proving the result, we point out one subtlety here: although the \(\boldw_i\in W_{\lambda_i}\) for all \(i\text{,}\) we cannot assume that each \(\boldw_i\) is an eigenvector. Indeed, \(\boldw_i\) is an eigenvector in this case if and only if \(\boldw_i\ne 0\text{.}\) This observation guides the proof that follows.

To pick out the terms of (5.5.6) that are nonzero (if any), we define

\begin{equation*} J=\{j \colon \boldw_j\ne 0\}=\{j_1, j_2,\dots, j_k\}\text{.} \end{equation*}

Assume by contradiction that \(J\) is nonempty: i.e., \(\val{J}=k\geq 1\text{.}\) In this case we would have

\begin{align*} \boldzero \amp= \boldw_1+\boldw_2+\cdots \boldw_r \\ \amp = \boldw_{j_1}+\boldw_{j_2}+\cdots +\boldw_{j_k} \text{,} \end{align*}

since \(\boldw_i=\boldzero\) for all \(i\notin J\text{.}\) But then

\begin{equation*} \boldw_{j_1}+\boldw_{j_2}+\cdots +\boldw_{j_k}=\boldzero \end{equation*}

would be a nontrivial linear combination of the eigenvectors \(\boldw_{j_i}\) equal to \(\boldzero\text{.}\) Since the eigenvectors \(\boldw_{j_i}\) have distinct eigenvalues, this contradicts Theorem 5.5.7. Thus \(J=\{\, \}\text{.}\) Equivalently, \(\boldw_i=\boldzero\) for all \(1\leq i\leq r\text{,}\) as desired.

At last we are ready to state and prove what will be our main tool for determining whether a linear transformation is diagonalizable.

Theorem 5.5.13. Diagonalizability: dimension of eigenspaces.

Let \(T\colon V\rightarrow V\) be a linear transformation, where \(\dim V=n\text{.}\) Let \(\lambda_1, \lambda_2, \dots, \lambda_r\) be the distinct eigenvalues of \(T\text{,}\) and for each \(1\leq i\leq r\text{,}\) let \(W_{\lambda_i}\) be the \(\lambda_i\)-eigenspace. We have

\begin{equation*} T \text{ is diagonalizable } \iff \sum_{i=1}^r\dim W_{\lambda_i}=n\text{.} \end{equation*}

Proof.

We prove the two implications separately. In each we use the equivalence

\begin{equation*} T \text{ is diagonalizable} \iff T \text{ has an eigenbasis } B\text{,} \end{equation*}

proved in Theorem 5.5.2.

Proof: \(T\) diagonalizable \(\implies \sum_{i=1}^r\dim W_{\lambda_i}=n\).

Assume \(T\) is diagonalizable. From Theorem 5.5.2, there is an eigenbasis \(B\) of \(T\text{.}\) After reordering we may assume that

\begin{equation*} B=(\underset{W_{\lambda_1}}{\underbrace{\boldv_{\lambda_1,1},\dots, \boldv_{\lambda_1,n_1}}},\underset{W_{\lambda_2}}{\underbrace{\boldv_{\lambda_2,1},\dots, \boldv_{\lambda_2,n_2}}},\dots, \underset{W_{\lambda_r}}{\underbrace{\boldv_{\lambda_r,1},\dots, \boldv_{\lambda_r,n_r}}} )\text{,} \end{equation*}

where for each \(1\leq i\leq r\) and each \(1\leq j\leq n_i\text{,}\) the element \(\boldv_{\lambda_i,j}\) is an eigenvector with eigenvalue \(\lambda_i\text{:}\) i.e., \(\boldv_{\lambda_i,j}\in W_{\lambda_i}\text{.}\) Observer that since \(B\) is a list of \(n\) vectors, we have

\begin{equation*} n=n_1+n_2+\cdots+n_r\text{.} \end{equation*}

We claim that for all \(1\leq i\leq r\) the set \(S_{\lambda_i}=\{\boldv_{\lambda_i, 1}, \dots, \boldv_{\lambda_i, n_i}\}\) is a basis of \(W_{\lambda_i}\text{.}\) The desired result follows in this case since

\begin{align*} \sum_{i=1}^r\dim W_{\lambda_i} \amp=\sum_{i=1}^r\val{S_{\lambda_i}} \\ \amp = \sum_{i=1}^r n_i \\ \amp = n\text{.} \end{align*}

Proceeding then to the claim, observe that each set \(S_{\lambda_i}\) is linearly independent, since the underlying set of \(B\) is linearly independent. Thus it suffices to show that \(\Span S_{\lambda_i}=W_{\lambda_i}\) for all \(1\leq i\leq r\text{.}\) To this end, fix an \(i\) with \(1\leq i\leq n\) and take any \(\boldv\in W_{\lambda_i}\text{.}\) Since \(B\) is a basis we can write

\begin{align*} \boldv \amp= \underset{\boldw_{\lambda_1}}{\underbrace{\sum_{j=1}^{n_1}c_{1,j}\boldv_{\lambda_1, j}}}+\dots +\underset{\boldw_{\lambda_i}}{\underbrace{\sum_{j=1}^{n_i}c_{i,j}\boldv_{\lambda_i, j}}}+\dots \underset{\boldw_{\lambda_r}}{\underbrace{\sum_{j=1}^{n_r}c_{r,j}\boldv_{\lambda_r, j}}} \\ \amp=\boldw_1+\boldw_2+\cdots +\boldw_r \text{,} \end{align*}

where for each \(1\leq k\leq r\) we have

\begin{equation*} \boldw_k=\sum_{i=1}^{n_k}c_{k,j}\boldv_{\lambda_k, j}\in W_{\lambda_k}\text{.} \end{equation*}

Bringing \(\boldv\) to the right-hand side of the equation above yields

\begin{equation*} \boldzero=\boldw_1+\boldw_2+\cdots +(\boldw_i-\boldv)+\cdots +\boldw_r\text{.} \end{equation*}

Recall that \(\boldv\in W_{\lambda_i}\text{,}\) and thus \(\boldw_{i}-\boldv\in W){\lambda_i}\text{.}\) Since \(\boldw_k\in W_{\lambda_k}\) for all \(k\ne i\text{,}\) it follows from Corollary 5.5.12 that

\begin{equation*} \boldw_1=\boldw_2=\dots=(\boldw_i-\boldv)=\dots =\boldw_r=0\text{.} \end{equation*}

Thus

\begin{equation*} \boldv=w_i=\sum_{j=1}^{n_i}c_{i,j}\boldv_{\lambda_i, j}\text{,} \end{equation*}

showing that \(\boldv\in \Span S_{\lambda_i}\text{,}\) as desired.

Proof: \(\sum_{i=1}^r\dim W_{\lambda_i}=n\implies T\) is diagonalizable.

Let \(n_i=\dim W_{\lambda_i}\) for all \(1\leq i\leq r\) . We assume that

\begin{equation*} n=\dim W_{\lambda_1}+ \dim W_{\lambda_2}+\cdots \dim W_{\lambda_r}=n_1+n_2+\cdots +n_r\text{.} \end{equation*}

For each \(1\leq i\leq n\text{,}\) let

\begin{equation*} S_{\lambda_i}=\{\boldv_{\lambda_i, 1}, \boldv_{\lambda_i,2},\dots, \boldv_{\lambda_{i,n_i}}\} \end{equation*}

be a basis of the eigenspace \(W_{\lambda_i}\text{.}\) We claim

is an eigenbasis of \(T\text{.}\) Since \(\boldzero\ne \boldv_{\lambda_i, j}\in W_{\lambda_i}\) for all \(1\leq i\leq r\) and \(1\leq j\leq n_i\text{,}\) we see that \(B\) consists of eigenvectors of \(T\text{.}\) Since

\begin{equation*} n_1+n_2+\cdots n_r=n=\dim V\text{,} \end{equation*}

to show that \(B\) is a basis it suffices to show that it is linearly independent. To this end, assume we have

\begin{align*} \boldzero \amp= \underset{\boldw_{\lambda_1}}{\underbrace{\sum_{j=1}^{n_1}c_{1,j}\boldv_{\lambda_1, j}}} +\underset{\boldw_{\lambda_2}}{\underbrace{\sum_{j=1}^{n_2}c_{2,j}\boldv_{\lambda_2, j}}}+\dots \underset{\boldw_{\lambda_r}}{\underbrace{\sum_{j=1}^{n_r}c_{r,j}\boldv_{\lambda_r, j}}} \\ \amp=\boldw_1+\boldw_2+\cdots +\boldw_r \text{,} \end{align*}

where for each \(1\leq i\leq r\) we have

\begin{equation*} \boldw_i=\sum_{i=1}^{n_i}c_{i,j}\boldv_{\lambda_i, j}\in W_{\lambda_k}\text{.} \end{equation*}

By Corollary 5.5.12 we must have

\begin{equation*} \boldzero=\boldw_i=\sum_{i=1}^{n_i}c_{i,j}\boldv_{\lambda_i, j} \end{equation*}

for all \(i\text{.}\) Finally, since the set

\begin{equation*} S_{\lambda_i}=\{\boldv_{\lambda_i, 1}, \boldv_{\lambda_i,2},\dots, \boldv_{\lambda_{i,n_i}}\} \end{equation*}

is linearly independent for each \(i\text{,}\) we must have \(c_{i,j}=0\) for all \(1\leq i\leq r\) and \(1\leq j\leq n_i\text{.}\) This proves that \(B\) is linearly independent, hence a basis.

We now collect our various results about diagonalizability into one procedure that (a) decides whether a linear transformation \(T\) is diagonalizable, and (b) if it is, computes an eigenbasis for \(T\text{.}\) The procedure applies to any linear transformation of a finite-dimensional vector space, not just matrix transformations. As usual, the first step is to choose a matrix representation \(A=[T]_B\) for \(T\text{.}\)

Procedure 5.5.14. Deciding whether a linear transformation is diagonalizable.

Let \(T\colon V\rightarrow V\) be a linear transformation, where \(\dim V=n\text{.}\) To decide whether \(T\) is diagonalizable proceed as follows.

Pick any ordered basis \(B\) of \(V\) and compute \(A=[T]_B\text{.}\) We have \(T\) diagonalizable if and only if \(A\) diagonalizable.
Let \(\lambda_1, \lambda_2, \dots, \lambda_r\) be the distinct eigenvalues of \(A\text{.}\) Compute \(n_i=\dim W_{\lambda_i}\) for each \(1\leq i\leq r\text{.}\) We have

\begin{equation*} A \text{ diagonalizable }\iff \sum_{i=1}^r\dim W_{\lambda_i}=n\text{.} \end{equation*}
Assume \(A\) is diagonalizable according to Step (2). For each \(1\leq i\leq r\) compute a basis \(S_{\lambda_i}=\{\boldv_{\lambda_i, 1}, \boldv_{\lambda_i,2},\dots, \boldv_{\lambda_i, n_i} \}\) of \(W_{\lambda_i}\text{.}\) The ordered list

\begin{equation*} B'=(\boldv_{\lambda_1,1},\dots, \boldv_{\lambda_1,n_1},\boldv_{\lambda_2,1},\dots, \boldv_{\lambda_2,n_2},\dots,\boldv_{\lambda_r,1},\dots, \boldv_{\lambda_r,n_r} )\text{,} \end{equation*}

is an eigenbasis of \(A\text{.}\)
“Lifting” the basis \(B'\) back to \(V\) via the coordinate transformation \([\phantom{\boldv}]_B\) yields an eigenbasis \(B''\) of \(T\text{.}\) The matrix \([T]_{B''}\) is diagonal, of the form

\begin{equation*} [T]_{B''}= \begin{bmatrix} \lambda_1 \amp \amp \amp \amp \amp \amp \\ \amp \ddots \amp \amp \amp \amp \\ \amp \amp \lambda_2 \amp \amp \amp \\ \amp \amp \amp \ddots \amp \amp \\ \amp \amp \amp \amp \lambda_r \amp \\ \amp \amp \amp \amp \amp \ddots \\ \amp \amp \amp \amp \amp \amp \lambda_r \end{bmatrix}\text{.} \end{equation*}

Proof.

For the most part the validity of this procedure is a direct consequence of Theorem 5.5.2 and Theorem 5.5.13. However, there are two details that need to be pointed out.

That \(T\) is diagonalizable if and only if \(A=[T]_B\) is diagonalizable follows from the fact that a basis of the \(\lambda\)-eigenspace of \(A\) to a basis of the \(\lambda\)-eigenspace of \(T\) using the coordinate vector transformation \([\phantom{v}]_B\text{.}\)
That the ordered list \(B'\) described in Step 3 is in fact a basis is shown in the proof of Theorem 5.5.13.

Example 5.5.15.

Let \(T=T_A\text{,}\) where

\begin{equation*} A=\begin{amatrix}[rrr] 2\amp 1\amp 1\\ 0\amp 3\amp 2\\ 0\amp 0\amp 3 \end{amatrix}\text{.} \end{equation*}

Decide whether \(T\) is diagonalizable. If yes, find an eigenbasis of \(T\) and compute the corresponding matrix representing \(T\text{.}\)

Solution.

Note first that \(A=[T]_B\) where \(B\) is the standard basis of \(\R^3\text{.}\) (See Theorem 5.2.3.) Since \(A\) is upper triangular, we easily see that its characteristic polynomial is \(p(t)=(t-1)(t-3)^2\text{.}\) Next we investigate the eigenspaces:

\begin{equation*} W_2=\NS(2I-A)=\NS \begin{amatrix}[rrr]0\amp -1\amp -1\\ 0\amp 1\amp -2\\ 0\amp 0\amp 1 \end{amatrix},coordinatecoordinate W_3=\NS(3I-A)=\NS \begin{amatrix}[rrr]-1\amp -1\amp -1\\ 0\amp 0\amp -2\\ 0\amp 0\amp 0 \end{amatrix}\text{.} \end{equation*}

By inspection we see that both \(2I-A\) and \(3I-A\) have rank 2, and hence nullity \(3-2=1\) by the rank-nullity theorem. Thus both eigenspaces have dimension one, and we have \(\dim W_2+\dim W_3=1+1=2\lt 3\text{.}\) We conclude that \(A\text{,}\) and hence \(T_A\text{,}\) is not diagonalizable.

The diagonalizability examples in this text will focus largely on the special case of matrix transformations \(T_A\colon \R^n\rightarrow \R^n\text{.}\) However, our conscience demands that we give at least one full example of a more abstract linear transformation.

Example 5.5.16. Transposition.

Let \(S\colon M_{22}\rightarrow M_{22}\) be the linear transformation defined as \(S(A)=A^T\text{.}\) Decide whether \(S\) is diagonalizable. If yes, find an eigenbasis for \(S\) and compute the corresponding matrix representing \(S\text{.}\)

Solution.

We saw in Example 5.4.24 that

\begin{equation*} [S]_B=\begin{bmatrix} 1\amp 0\amp 0\amp 0\\ 0\amp 0\amp 1\amp 0\\ 0\amp 1\amp 0\amp 0\\ 0\amp 0\amp 0\amp 1 \end{bmatrix}\text{,} \end{equation*}

where \(B=(E_{11}, E_{12}, E_{21}, E_{22})\) is the standard ordered basis of \(M_{22}\text{.}\) Furthermore, we saw that \(1\) and \(-1\) are the distinct eigenvalues of \(A=[S]_B\text{,}\) and that

\begin{equation*} S_1=\{(1,0,0,0), (0,1,1,0), (0,0,0,1)\}, S_{-1}=\Span\{(0,1,-1,0)\} \end{equation*}

are bases of \(W_1\) and \(W_{-1}\text{,}\) respectively. It follows that \(\dim W_1+\dim W_{-1}=3+1=4\text{,}\) that \(A\) is diagonalizable, and that

\begin{equation*} B'=((1,0,0,0), (0,1,1,0), (0,0,0,1), (0,1,-1,0)) \end{equation*}

is an eigenbasis of \(A\text{.}\) We conclude that \(S\) is diagonalizable, and we lift \(B'\) via \([\phantom{v}]_B\) to the eigenbasis

\begin{equation*} B''=\left\{ \begin{amatrix}[rr]1\amp 0\\ 0\amp 0 \end{amatrix}, \begin{amatrix}[rr]0\amp 1\\ 1\amp 0 \end{amatrix}, \begin{amatrix}[rr]0\amp 0\\ 0\amp 1 \end{amatrix}, \begin{amatrix}[rr]0\amp 1\\ -1\amp 0 \end{amatrix} \right\} \end{equation*}

of \(S\text{.}\) Lastly, we have

\begin{equation*} [S]_{B''}= \begin{amatrix}[rrrr]1\amp 0\amp 0\amp 0\\ 0\amp 1\amp 0\amp 0\\ 0\amp 0\amp 1\amp 0\\ 0\amp 0\amp 0\amp -1 \end{amatrix}\text{.} \end{equation*}

Video example: deciding if diagonalizable.

Figure 5.5.17. Video: deciding if diagonalizable

Subsection 5.5.3 Diagonalizable matrices

In this subsection we will focus on matrix transformations \(T_A\colon \R^n\rightarrow \R^n\text{.}\) Recall (5.2.3) that in this situation we have \(A=[T]_B\) where \(B\) is the standard basis of \(\R^n\text{.}\) As such Procedure 5.5.14 boils down to steps (2)-(3), and the eigenbasis \(B'\) of \(A\) found in (3) is itself an eigenbasis for \(T=T_A\text{.}\) Letting \(D=[T]_{B'}\) the change of basis formula (5.3.20) yields

\begin{equation*} D=P^{-1}AP\text{,} \end{equation*}

where \(P=\underset{B'\rightarrow B}{P}\text{.}\) Lastly, since \(B\) is the standard basis of \(\R^n\text{,}\) the change of basis matrix \(\underset{B'\rightarrow B}{P}\) is obtained by placing the \(j\)-th element of \(B'\) as the \(j\)-th column for all \(1\leq j\leq n\text{.}\) We record these observations as a separate procedure specifically for matrix transformations.

Procedure 5.5.18. Deciding whether a matrix is diagonalizable.

Let \(A\) be an \(n\times n\) matrix, and let \(T=T_A\) be its corresponding matrix transformation. To decide whether \(A\) is diagonalizable, proceed as follows.

Let \(W_1, W_2, \dots, W_r\) be the nonzero eigenspaces of \(A\text{.}\) We have

\begin{equation*} A \text{ diagonalizable}\iff \sum_{i=1}^r\dim W_i=n\text{.} \end{equation*}
Assume \(A\) is diagonalizable and let \(B'=(\boldv_1, \boldv_2, \dots, \boldv_n)\) be an eigenbasis of \(A\) satisfying \(A\boldv_i=\lambda_i\boldv_i\) for all \(1\leq i\leq n\text{.}\) (We do not assume the \(\lambda_i\) are distinct here.) Letting

\begin{equation*} P=\begin{amatrix}[rrrr]\vert\amp \vert\amp \amp \vert \\ \boldv_1\amp \boldv_2\amp \cdots\amp \boldv_n \\ \vert\amp \vert\amp \amp \vert \end{amatrix}, D=\begin{amatrix}[rrrr] \lambda_1\amp 0\amp \dots \amp 0\\ 0\amp \lambda_2\amp \dots \amp 0 \\ \vdots \amp \amp \amp \vdots \\ 0\amp 0 \amp \dots \amp \lambda_n \end{amatrix}\text{,} \end{equation*}

we have

\begin{equation} D=P^{-1}AP\text{.}\tag{5.5.7} \end{equation}

The process of finding \(P\) and \(D\) satisfying (5.5.7) is called diagonalizing the matrix \(A\text{;}\) and we say that the matrix \(P\) diagonalizes \(A\) in this case. (Of course this is possible if and only if \(A\) is diagonalizable.)

Example 5.5.19.

The matrix

\begin{equation*} A=\begin{amatrix}[rrrr]14 \amp 21 \amp 3 \amp -39 \\ 12 \amp 25 \amp 3 \amp -41 \\ 12 \amp 24 \amp 5 \amp -42 \\ 12 \amp 22 \amp 3 \amp -38 \end{amatrix} \end{equation*}

has characteristic polynomial \(p(t)=t^4 - 6t^3 + 9t^2 + 4t - 12\text{.}\) Decide whether \(A\) is diagonalizable. If yes, find an invertible matrix \(P\) and diagonal matrix \(D\) such that \(D=P^{-1}AP\text{.}\)

Solution.

To factor \(p(t)\text{,}\) we first look for integer roots dividing the constant term \(-12\text{:}\) i.e., we test whether any of \(\pm 1, \pm 2, \pm 3, \pm 4, \pm 6, \pm 12\) are roots. Luckily, we see that \(-1\) is a root of \(p(t)\text{.}\) Doing polynomial division of \(p(t)\) by \((t+1)\) yields

\begin{equation*} p(t)=(t+1)\underset{q(t)}{(t^3-7t^2+16t-12)}\text{.} \end{equation*}

Repeating this factoring technique on \(q(t)\text{,}\) we see that \(q(2)=0\text{,}\) and thus can continue to factor:

\begin{align*} p(t)\amp=(t+1)(t^3-7t^2+16t-12)\\ \amp=(t+1)(t-2)(t^2-5t+6) \\ \amp = (t+1)(t-2)^2(t-3)\text{.} \end{align*}

We conclude that the eigenvalues of \(A\) are \(-1\text{,}\) \(2\text{,}\) and \(3\text{.}\) We now compute bases for the corresponding eigenspaces. The bases below were obtained using Procedure 3.8.10. We omit the details of the Gaussian elimination performed in each case. (Check for yourself!)

\begin{align*} W_{-1} \amp =\NS \begin{amatrix}[rrrr] -15\amp -21\amp -3\amp 39\\ -12\amp -26\amp -3\amp 41\\ -12\amp -24\amp -6\amp 42\\ -12\amp -22\amp -3\amp -37 \end{amatrix}=\Span\{(1,1,1,1)\} \\ W_{2} \amp =\NS \begin{amatrix}[rrrr] -12\amp -21\amp -3\amp 39\\ -12\amp -23\amp -3\amp 41\\ -12\amp -24\amp -3\amp 42\\ -12\amp -22\amp -3\amp 40 \end{amatrix}=\Span\{(3,2,0,2),(1,1,2,1)\} \\ W_{-1} \amp =\NS \begin{amatrix}[rrrr] -11\amp -21\amp -3\amp 39\\ -12\amp -22\amp -3\amp 41\\ -12\amp -24\amp -2\amp 42\\ -12\amp -22\amp -3\amp 41 \end{amatrix}=\Span\{(3,5,6,4)\} \text{.} \end{align*}

We have ski Since

\begin{equation*} \dim W_{-1}+\dim W_{2}+\dim W_{3}=1+2+1=4=\dim \R^4\text{,} \end{equation*}

we conclude that \(A\) is diagonalizable. Furthermore, we have \(D=P^{-1}AP\text{,}\) where

\begin{equation*} P=\begin{amatrix}[rrrr] 1\amp 3\amp 1\amp 3\\ 1\amp 2\amp 1\amp 5\\ 1\amp 0\amp 2\amp 6\\ 1\amp 2\amp 1\amp 4 \end{amatrix}, D=\begin{amatrix}[rrrr] -1\amp 0 \amp 0 \amp 0 \\ 0 \amp 2\amp 0\amp 0\\ 0\amp 0\amp 2\amp 0\\ 0\amp 0\amp 0\amp 3 \end{amatrix}\text{.} \end{equation*}

Recall that two square matrices \(A\) and \(A'\) are similar if \(A'=P^{-1}AP\) for some invertible matrix \(P\) (5.3.27). From the foregoing discussion it follows that a matrix \(A\) is diagonalizable if and only if it is similar to a diagonal matrix.

Corollary 5.5.20. Diagonalizabilty and similarity.

An \(n\times n\) matrix \(A\) is diagonalizable if and only if it is similar to a diagonal matrix: i.e., if and only if there is an invertible matrix \(P\) and a diagonal matrix \(D\) such that

\begin{equation*} D=P^{-1}AP\text{.} \end{equation*}

Proof.

According to Theorem 5.3.28 the matrix \(A\) is similar to a diagonal matrix \(D\) if and only if there is a linear transformation \(T\colon \R^n\rightarrow \R^n\) and ordered bases \(B, B'\) of \(\R^n\) such that \([T]_B=A\) and \([T]_{B'}=D\text{.}\) By definition such a \(T\) would be diagonalizable, since \([T]_{B'}=D\) is diagonal. Since \(T\) is diagonalizable if and only if \(A=[T]_B\) is diagonalizable, we conclude that \(A\) is similar to a diagonal matrix \(D\) if and only if \(A\) is diagonalizable.

We know from Theorem 5.3.28 that similar matrices can be thought of as two matrix representations of the same overlying linear transformation \(T\text{.}\) As such similar matrices share many of the same algebraic properties, as Theorem 5.5.21 details.

Theorem 5.5.21. Properties of similarity.

Suppose \(A\) is similar to \(A'\text{:}\) i.e., there is an invertible matrix \(P\) such that \(A'=P^{-1}AP\text{.}\) The following hold:

\(A'\) is similar to \(A\text{:}\) i.e., there is an invertible matrix \(Q\) such that \(A=Q^{-1}AQ\text{.}\)
\(A\) and \(A'\) have the same characteristic polynomial.
\(A\) and \(A'\) have the same eigenvalues.
\(A\) and \(A'\) have the same trace and determinant.
\(A\) and \(A'\) have the same rank.
For any \(\lambda\in \R\) we have \(\dim W_{\lambda}=\dim W_{\lambda}'\text{,}\) where \(W_\lambda, W_\lambda'\) are the \(\lambda\)-eigenspaces of \(A\) and \(A'\text{,}\) respectively.

Proof.

Statement (1) follows by taking \(Q=P^{-1}\text{.}\)

Let \(p_A(t)\) and \(p_{A'}(t)\) be the characteristic polynomials of \(A\) and \(A'\text{,}\) repsectively. We have

\begin{align*} p_{A'}(t)\amp =\det(tI-A')\\ \amp =\det(tI-P^{-1}AP) \amp (A'=P^{-1}AP)\\ \amp = \det(P^{-1}tIP-P^{-1}AP) \amp (\text{algebra}) \\ \amp = \det(P^{-1}(tI-A)P) \amp (\text{left/right dist.}) \\ \amp = \det(P^{-1})\det(tI-A)\det(P) \amp (\knowl{./knowl/th_det_mult.html}{\text{2.5.26}}) \\ \amp = (\det(P))^{-1}\det(P)\det(tI-A) \\ \amp = \det(tI-A)=p_A(t)\text{.} \end{align*}

This proves statement (2).

Statement (3) follows from (2) since the eigenvalues of a matrix are the real roots of its characteristic polynomial. Furthermore, by Theorem 5.4.25 the trace and determinant of a matrix are equal to the sum and product of the roots of its characteristic polynomial. Thus (4) also follows from (2).

The proofs of statements (5)-(6) are left as exercises.

A diagonalizable matrix is similar to a diagonal matrix (5.5.20) and similar matrices share many essential properties (5.3.28, 5.5.21) In this spirit, a good way of thinking about a diagonalizable matrix is that it is “as good as diagonal”.

Mantra 5.5.22. Diagonalizable mantra.

A diagonalizable matrix is as good as diagonal.

In practical terms, if \(A\) is diagonalizable, then we have

\begin{align} D\amp=P^{-1}AP \amp A\amp=PDP^{-1} \tag{5.5.8} \end{align}

where \(D\) is diagonal. This allows us to answer questions about \(A\) by first answering the question for \(D\) and then use the equations in (5.5.8) to translate the results back to \(A\text{.}\) What makes this method effective is that algebraic questions involving diagonal matrices are easy to answer! Before getting to some illustrative examples, we need a few results about the operation \(A\mapsto P^{-1}AP\text{,}\) which is called conjugation by \(P\text{.}\)

Theorem 5.5.23. Properties of conjugation.

Let \(P\) be an invertible \(n\times n\) matrix.

Conjugation is linear.
For all \(A_1,A_2\in M_{nn}\) and \(c_1,c_2\in \R\text{,}\) we have \(P^{-1}(c_1A_1+c_2A_2)P=c_1P^{-1}A_1P+c_2P^{-1}A_2P\text{.}\)
Conjugation commutes with powers.

For all \(A\in M_{nn}\) and integers \(k\geq 0\text{,}\) we have

\begin{equation*} (P^{-1}AP)^k=P^{-1}A^kP\text{.} \end{equation*}

If \(A\) is invertible, this equality holds for all integers \(n\text{.}\)
Conjugation commutes with polynomials.

Given any polynomial \(f(x)=a_rx^r+a_{r-1}x^{r-1}+\cdots +a_1x+a_0\) with real coefficients, we have

\begin{equation*} f(P^{-1}AP)=P^{-1}f(A)P\text{.} \end{equation*}

Proof.

The proof is left as an exercise.

Example 5.5.24. Diagonalizable: matrix powers.

Assume \(D=P^{-1}AP\text{,}\) where \(D\) is diagonal. The normally difficult computation \(A^{k}\) can be accomplished by first computing \(D^{k}\) (easy) and then observing that

\begin{align*} A^k\amp = (PDP^{-1})^k \amp \\ \amp =PD^kP^{-1} \amp (\knowl{./knowl/th_conjugation.html}{\text{Theorem 5.5.23}}, (2)) \text{.} \end{align*}

For example, the matrix

\begin{equation*} A=\begin{amatrix}[rr]1\amp 3\\ 1\amp -1 \end{amatrix} \end{equation*}

is diagonalizable and satisfies \(D=P^{-1}AP\text{,}\) where

\begin{equation*} P=\begin{amatrix}[rr]3\amp 1\\ 1\amp -1 \end{amatrix}, D=\begin{amatrix}[rr]2\amp 0\\ 0\amp -2 \end{amatrix}\text{.} \end{equation*}

It follows that for any \(k\in \Z\) we have

\begin{align*} A^k \amp=PD^kP^{-1} \\ \amp = P\begin{amatrix}[rr]2^{k}\amp 0\\ 0\amp (-2)^{k} \end{amatrix} P^{-1}\\ \amp = \frac{1}{4}\begin{amatrix}[rr]3\cdot2^k+(-2)^k\amp 3\cdot 2^k-3(-2)^{k}\\ 2^{k}-(-2)^k\amp 2^k+3(-2)^{k} \end{amatrix}\text{.} \end{align*}

Example 5.5.25. Diagonalizable: matrix polynomials.

Assume \(D=P^{-1}AP\text{,}\) where \(D\) is a diagonal \(n\times n\) matrix. Let \([D]_{ii}=d_{i}\text{.}\) Given any polynomial \(f(x)=\anpoly\text{,}\) we have

\begin{align*} f(A) \amp= f(PDP^{-1}) \\ \amp =Pf(D)P^{-1} \amp (\knowl{./knowl/th_conjugation.html}{\text{Theorem 5.5.23}},(3))\text{.} \end{align*}

Furthermore, since \(D\) is diagonal, it follows that \(f(D)\) is also diagonal, and in fact its diagonal entries are given by \(f(d_i)\text{.}\) This gives us an easy method of computing arbitrary polynomials of the matrix \(A\text{.}\)

Consider again the matrix \(A\) (and \(P\) and \(D\)) from Example 5.5.24. Let \(f(x)=x^2 -4\text{.}\) Since \(f(2)=f(-2)=0\text{,}\) it follows that \(f(D)=D^2-4I=\boldzero\text{.}\) We conclude that

\begin{equation*} f(A)=A^2-4I=Pf(D)P^{-1}=P\boldzero P^{-1}=\boldzero\text{,} \end{equation*}

as you can check directly.

Example 5.5.26.

A square-root of an \(n\times n\) matrix \(A\) is a matrix \(B\) such that \(B^2=A\text{.}\) If \(A\) and \(A'\) are similar matrices, satisfying \(A'=P^{-1}AP\text{,}\) then \(A\) has a square-root if and only if \(A'\) has a square-root. Indeed, if \(B\) satisfies \(B^2=A\text{,}\) then \(C=P^{-1}BP\) satisfies

\begin{equation*} C^2=(P^{-1}BP)^2=P^{-1}B^2P=P^{-1}AP=A'\text{.} \end{equation*}

Similarly, if \(C\) satisfies \(C^2=A'\text{,}\) then \(B=PCP^{-1}\) satisfies

\begin{equation*} B^2=(PCP^{-1})^2=PC^2P^{-1}=PA'P^{-1}=A\text{.} \end{equation*}

As an example, the matrix

\begin{equation*} A=\begin{amatrix}[rr]0\amp -2\\ 1 \amp 3 \end{amatrix} \end{equation*}

satisfies \(D=P^{-1}AP\text{,}\) where

\begin{equation*} P=\begin{amatrix}[rr]2\amp 1\\ -1\amp -1 \end{amatrix}, D=\begin{amatrix}[rr]1\amp 0\\ 0\amp 2 \end{amatrix}\text{.} \end{equation*}

Since

\begin{equation*} C=\begin{bmatrix}1\amp 0\\ 0\amp \sqrt{2} \end{bmatrix} \end{equation*}

is a square-root of \(D\text{,}\)

\begin{equation*} B=PCP^{-1}=\begin{amatrix}[rr]2-\sqrt{2}\amp 2-2\sqrt{2}\\ -1+\sqrt{2}\amp -1+2\sqrt{2} \end{amatrix} \end{equation*}

is a square-root of \(A\text{.}\)

So when exactly does a diagonal matrix \(D\) have a square-root? Clearly, it is sufficient that the diagonal entries \(d_i\) satisfy \(d_i\geq 0\) for all \(i\text{,}\) as in the example above. Interestingly, this is not a necessary condition! Indeed, consider the following example:

\begin{equation*} \begin{amatrix}[rr]-1\amp 0\\ 0\amp -1 \end{amatrix} =\begin{amatrix}[rr]0\amp -1\\ 1\amp 0 \end{amatrix} ^2\text{.} \end{equation*}

Subsection 5.5.4 Algebraic and geometric multiplicity

We end this section with a deeper look at what the characteristic polynomial reveals about eigenspaces. To begin with, we first define the characteristic polynomial of a general linear transformation \(T\colon V\rightarrow V\text{,}\) where \(V\) is a finite-dimensional vector space.

Definition 5.5.27. Characteristic polynomial of a transformation.

Let \(T\colon V\rightarrow V\) be a linear transformation, where \(V\) is finite-dimensional. Let \(B\) be an ordered basis of \(V\text{,}\) and let \(A=[T]_B\text{.}\) We define the characteristic polynomial of \(T\) to be the characteristic polynomial of \(A\text{:}\) i.e., the characteristic polynomial of \(T\) is

\begin{equation*} p(t)=\det(tI-A)\text{.} \end{equation*}

Remark 5.5.28.

For the characteristic polynomial of a linear transformation \(T\colon V\rightarrow V\) to be well-defined, it should not depend on the choice of basis. This is true thanks to Theorem 5.5.21 and Theorem 5.3.20. Indeed, given two choice of ordered bases \(B, B'\) of \(V\text{,}\) the matrices \(A=[T]_B\) and \(A'=[T]_{B'}\) are similar (5.3.20), and thus their characteristic polynomials are equal (5.5.21,(2)).

Let \(T\colon V\rightarrow V\) be a linear transformation, where \(V\) is finite-dimensional. If \(\lambda\in \R\) is an eigenvalue of \(T\text{,}\) then we can factor the chacteristic polynomial \(p(t)\) of \(T\) as \(p(t)=(t-\lambda)^mq(t)\text{,}\) where \(\lambda\) is not a root of \(q(t)\text{.}\) As we will see, the exponent \(m\) is an upper bound for the dimension of \(W_\lambda\text{.}\) We call \(m\) the algebraic multiplicity of the eigenvalue \(\lambda\text{.}\)

Definition 5.5.29. Algebraic/geometric multiplicity.

Let \(T\colon V\rightarrow V\) be a linear transformation, where \(V\) is finite-dimensional, and let \(p(t)\) be the characteristic polynomial of \(T\text{.}\) Given an eigenvalue \(\lambda\in \R\) of \(T\text{,}\) we can factor \(p(t)\) as \(p(t)=(t-\lambda)^mq(t) \text{,}\) where \(\lambda\) is not a root of the polynomial \(q(t)\text{:}\) i.e., \(q(\lambda)\ne 0\text{.}\) We call \(m\) the geometric multiplicity of the eigenvalue \(\lambda\text{,}\) and we call \(\dim W_\lambda\) its geometric multiplicity. If \(m\gt 1\text{,}\) we say \(\lambda\) is a repeated eigenvalue of \(T\text{.}\)

Theorem 5.5.30. Algebraic and geometric multiplicity.

Let \(T\colon V\rightarrow V\) be a linear transformation, where \(dim V=n\text{,}\) let \(p(t)\) be the characteristic polynomial of \(T\text{,}\) and suppose \(\lambda\in\R\) is an eigenvalue of \(T\) of algebraic multiplicity \(m\geq 1\text{:}\) i.e., \(p(t)=(t-\lambda)^mq(t)\) and \(q(\lambda)\ne 0\text{.}\) We have

\begin{equation*} 1\leq \dim W_\lambda\leq m\text{.} \end{equation*}

In other words, the geometric multiplicity of an eigenvalue is bounded above by its algebraic multiplicity.

Proof.

Since \(\lambda\) is an eigenvalue, we have \(W_\lambda\ne \{\boldzero\}\text{,}\) and thus \(\dim W_\lambda\geq 1\text{.}\) Assume by contradiction that \(\dim W_{\lambda}\gt m\text{.}\) Let \(m'=\dim W_{\lambda}\text{,}\) and let \(S_{\lambda}=\{\boldv_1,\boldv_2,\dots, \boldv_{m'}\}\) be a basis for \(W_{\lambda}\text{.}\) We can extend \(S_{\lambda}\) to an ordered basis

\begin{equation*} B=(\boldv_1, \dots, \boldv_{m'}, \boldv_{m'+1},\dots, \boldv_n) \end{equation*}

of \(V\text{.}\) By definition, the characteristic polynomial of \(T\) is given my \(p(t)=\det(tI-A)\text{,}\) where \(A=[T]_B\)w. Since \(\boldv_1,\boldv_2,\dots, \boldv_{m'}\) are \(\lambda\)-eigenvectors of \(T\text{,}\) the matrix \(A=[T]_B\) is of the form

An easy proof by induction on \(m'\) shows that for such a matrix \(A\) we have \(p(t)=\det(tI-A)=(t-\lambda)^{m'}r(t)\) for some polynomial \(r(t)\text{.}\) On the other hand, since \(\lambda\) has algebraic multiplicity \(m\) we have \(p(t)=(t-\lambda)^mq(t)\) for some polynomial \(q(t)\) with \(q(\lambda)=0\text{.}\) Setting these two expressions equal to one another we see that

\begin{equation*} (t-\lambda)^{m'}r(t)=(t-\lambda)^mq(t)\text{,} \end{equation*}

or equivalently,

\begin{equation*} (t-\lambda)^{m'-m}r(t)=q(t)\text{.} \end{equation*}

Since \(m'\gt m\) it follows that \(q(\lambda)=(\lambda-\lambda)^{m'-m}r(\lambda)=0\text{.}\) Contradiction! We conclude that \(\dim W_{\lambda}\leq m\text{,}\) as desired.

Corollary 5.5.31.

Let \(T\colon V\rightarrow V\) be a linear transformation, where \(\dim V=n\text{,}\) and suppose the characteristic polynomial of \(T\) factors over \(\mathbb{C}\) as

\begin{equation} p(t)=(t-\lambda_1)^{m_1}(t-\lambda_2)^{m_2}\cdots (t-\lambda_r)^{m_r}\text{,}\tag{5.5.9} \end{equation}

where \(\lambda_i\ne \lambda_j\) for all \(1\leq i\lt j\leq n\text{.}\) The following are equivalent:

\(T\) is diagonalizable.
For all \(1\leq i\leq n\) we have \(\lambda_i\in \R\) and \(\dim W_{\lambda_i}=m_i\text{.}\)

In other words, \(T\) is diagonalizable if and only if all roots of \(p(t)\) are real, and the geometric multiplicity of each eigenvalue is equal to its algebraic multiplicity.

Proof.

Implication: \((2)\implies (1)\).

If (2) is true, then each \(\lambda_i\) is an eigenvalue of \(T\) and we have

\begin{equation*} \sum_{i=1}^r\dim W_{\lambda_i}=\sum_{i=1}^r m_i=n\text{,} \end{equation*}

by counting degrees in (5.5.9). It follows from Theorem 5.5.2 that \(T\) is diagonalizable.

Implication: \((1)\implies (2)\).

If \(T\) is diagonalizable, then there is an ordered basis \(B\) of \(V\) for which \(D=[T]_B\) is diagonal. Letting \(d_i\) be the \(i\)-th diagonal element of \(D\text{,}\) we have

\begin{equation*} p(t)=\det(tI-D)=(t-d_1)(t-d_2)\dots (t-d_n)\text{.} \end{equation*}

This expression tells us that \(d_1, d_2, \dots, d_n\) are the roots of \(p(t)\text{,}\) and hence that all roots are real since since \(d_i\in \R\) for all \(1\leq i\leq n\text{.}\) On the other hand each \(\lambda_i\) is a root of \(p(t)\text{,}\) and thus \(\lambda_i\in \R\) for all \(1\leq i\leq r\text{.}\) It follows that \(\lambda_1, \lambda_2, \dots, \lambda_r\) are the distinct eigenvalues of \(T\text{.}\) By Theorem 5.5.13, since \(T\) is diagonalizable we must have

\begin{equation} \sum_{i=1}^r\dim W_{\lambda_i}=n\text{.}\tag{5.5.10} \end{equation}

Since \(\dim W_{\lambda_i}\leq m_i\) for all \(1\leq i\leq n\) (5.5.30), and since \(\sum_{i=1}^rm_i=n\) (counting degrees in (5.5.9)), for the equality (5.5.10) to hold we must have \(\dim W_{\lambda_i}=m_i\) for all \(1\leq i\leq r\text{,}\) as desired.

From Theorem 5.5.30 and Corollary 5.5.31 we can deduce a much finer picture of the eigenspaces of a linear transformation from its factored characteristic polynomial. This often reduces our workload when treating questions of diagonalizability, as the next examples illustrate.

Example 5.5.32.

The matrix

\begin{equation*} A=\begin{amatrix}[rrrr] 2 \amp -1 \amp 1 \amp 0 \\ -4 \amp 2 \amp 0 \amp -4 \\ -4 \amp 1 \amp 1 \amp -4 \\ -4 \amp 1 \amp -1 \amp -2 \end{amatrix} \end{equation*}

has characteristic polynomial \(p(t)=(t-1)(t+2)(t-2)^2\text{.}\) Decide whether \(A\) is diagonalizable.

Solution.

The eigenvalues of \(A\) are \(1,-2, 2\text{.}\) Since the eigenvalues \(1\) and \(-2\) both have algebraic multiplicity \(1\text{,}\) we have by Theorem 5.5.30

\begin{equation*} 1\leq \dim W_1, \dim W_{-2}\leq 1\text{,} \end{equation*}

and hence

\begin{equation*} \dim W_1=\dim W_{-2}=1\text{.} \end{equation*}

It follows that \(A\) is diagonalizable if and only if \(\dim W_{2}=2\text{.}\) We have \(W_{2}=\NS(2I-A)\text{,}\) where

\begin{equation*} 2I-A= \begin{amatrix}[rrrr] 0 \amp 1 \amp -1 \amp 0 \\ 4 \amp 0 \amp 0 \amp -4 \\ 4 \amp -1 \amp 1 \amp -4 \\ 4 \amp -1 \amp 1 \amp -4 \end{amatrix}\text{.} \end{equation*}

This matrix clearly has rank 2 (the first two columns form a basis for its column space), and hence nullity \(4-2=2\text{.}\) We conclude that \(A\) is diagonalizable.

Example 5.5.33.

The matrix

\begin{equation*} A=\begin{amatrix}[rrrr] 1 \amp 0 \amp -3 \amp 1 \\ 0 \amp 1 \amp 1 \amp 1 \\ 0 \amp 0 \amp -2 \amp 1 \\ 0 \amp 0 \amp -1 \amp 0 \end{amatrix} \end{equation*}

has characterisic polynomial \(p(t)=(t-1)^2(t+1)^2\text{.}\) Decide whether \(A\) is diagonalizable.

Solution.

The eigenvalues of \(A\) are \(1\) and \(-1\text{,}\) and each has algebraic multiplicity \(2\text{.}\) Thus \(1\leq\dim W_1, \dim W_{-1}\leq 2\text{,}\) and \(A\) is diagonalizable if and only if

\begin{equation*} \dim W_1=\dim W_{-1}=2\text{.} \end{equation*}

By inspection we see that \((1,0,0,0)\) and \((0,1,0,0)\) are \(1\)-eigenvectors, and thus we must have \(\dim W_1=2\text{.}\) Next we have \(W_{-1}=\NS(-I-A)\) where

\begin{equation*} -I-A=\begin{amatrix}[rrrr] -2 \amp 0 \amp 3 \amp -1 \\ 0 \amp -2 \amp -1 \amp -1 \\ 0 \amp 0 \amp 1 \amp -1 \\ 0 \amp 0 \amp 1 \amp -1 \end{amatrix}\text{.} \end{equation*}

It is not difficult to see (either using Gaussian elimination or inspection) that this matrix has rank 3, and hence nullity 1. We conclude that \(\dim W_{-1}=1\lt 2\text{,}\) and hence \(A\) is not diagonalizable.

Exercises 5.5.5 Exercises

Exercise Group.

For each matrix \(A\) use Procedure 5.5.14 to determine whether it is diagonalizable. If yes, then produce an invertible matrix \(P\) and diagonal matrix \(D\) satisfying \(D=P^{-1}AP\text{.}\) For the last matrix the characteristic polynomial \(p(t)\) is provided for convenience.

1.

\(A=\begin{amatrix}[rrr] 3\amp 0\amp 0 \\ 0\amp 2\amp 0\\ 0\amp 1\amp 2 \end{amatrix}\)

2.

\(A=\begin{amatrix}[rrr] -1\amp 4\amp -2\\ -3\amp 4\amp 0\\ -3\amp 1\amp 3 \end{amatrix}\)

3.

\(A=\begin{amatrix}[rrr] 0\amp 0\amp 0\\ 0\amp 0\amp 0 \\ 3\amp 0\amp 1 \end{amatrix}\)

4.

\(A=\begin{amatrix}[rrr] 5\amp 0\amp 0\\ 1\amp 5\amp 0 \\ 0\amp 1\amp 5 \end{amatrix}\)

5.

\(A=\begin{amatrix}[rrr] 19\amp -9\amp -6\\ 25\amp -11\amp -9\\ 17\amp -9\amp -4 \end{amatrix} \text{;}\) \(p(t)=t^3-4t^2+5t-2\)

6.

Let \(A=\begin{bmatrix} a \amp b \\ c \amp d\end{bmatrix} \text{.}\) Show that \(A\) is diagonalizable if and only if either \((a-d)^2+4bc\gt 0\) or \(A=aI\) (i.e., \(a=d\) and \(b=c=0\)).

7.

Prove Theorem 5.5.21.

Hint.

Show that for any \(c\in \R\) we have \(cI-P^{-1}AP=P^{-1}(cI-A)P\text{.}\)

Exercise Group.

For each exercise construct a \(3\times 3\) matrix \(A\) satisfying the given conditions. Begin by showing that the given \(A\) must be diagonalizable.

8.

\(A\) has eigenspaces \(W_2=\Span\{(1,0,1),(1,1,1)\) and \(W_{-1}=\Span\{(1,0,-1)\}\text{.}\)

9.

\(A\boldw=\boldw\) for all \(\boldw\in W=\{(x,y,z)\colon x+y+z=0\}\text{,}\) \(A\boldx=\boldzero\) for \(\boldx=(1,1,1)\text{.}\)

10.

Assume \(A\) is a \(3\times 3\) matrix with eigenvalues \(0\text{,}\) \(1\text{,}\) and \(-1\text{.}\)

Show that \(A\) is diagonalizable. Provide an explicit diagonal matrix \(D\) that \(A\) is similar to.
Prove that \(A^n=A\) for all odd integers \(n\geq 1\text{.}\)

11.

Prove statement (5) of Theorem 5.4.25.

Hint.

Use Theorem 5.3.28 and Theorem 5.2.9.

12.

Prove statement (6) of Theorem 5.4.25.

Hint.

Use Theorem 5.3.28 and Procedure 5.4.23.

13.

According to Theorem 5.5.21 if \(A\) and \(B\) are similar, then they have the same rank. Show that the converse is false by showing that the matrices

\begin{equation*} A=\begin{amatrix}[rr] 1\amp 0\\ 0\amp 0 \end{amatrix}, B=\begin{amatrix}[rr] 0\amp 1\\ 0\amp 0 \end{amatrix} \end{equation*}

have the same rank, but are not similar.

14.

According to Theorem 5.5.21 if \(A\) and \(B\) are similar, then they have the same characteristic polynomial. Show that the converse is false by showing that the matrices

\begin{equation*} A=\begin{amatrix}[rr] 1\amp 1\\ 0\amp 1 \end{amatrix}, B=\begin{amatrix}[rr] 1\amp 0\\ 0\amp 1 \end{amatrix} \end{equation*}

have the same characteristic polynomial, but are not similar.

15.

Prove all statements of Theorem 5.5.23.

16.

(a)

\(A\in M_{33}\text{,}\) \(p(t)=\det(tI-A)=t^3-t^2\text{,}\) \(\nullity A=2\)

(b)

\(A\in M_{33}\text{,}\) \(p(t)=\det(tI-A)=t^3+t^2-t\)

(c)

\(A\in M_{22}\text{,}\) \(\tr A=4\text{,}\) \(\det A=3\)

Hint.

See Remark 5.4.26.

17.

Each matrix \(A\) below has characteristic polynomial \(p(t)=t^3-3t+2\text{.}\) Use Procedure 5.5.14 to decide whether \(A\) is diagonalizable. If yes, provide an inverible \(P\) and diagonal \(D\) satisfying \(D=P^{-1}AP\text{.}\)

\(\displaystyle A=\begin{amatrix}[rrr] -5\amp 0\amp 3\\ -6\amp 1\amp 3\\ -6\amp 0\amp 4 \end{amatrix}\)
\(\displaystyle A=\begin{amatrix}[rrr] -2\amp -3\amp -3\\ -3\amp -3\amp 4\\ -3\amp -4\amp 5 \end{amatrix}\)

18.

Let

\begin{equation*} A=\begin{amatrix}[rrr] -5\amp 0\amp 3\\ -6\amp 1\amp 3\\ -6\amp 0\amp 4 \end{amatrix}\text{.} \end{equation*}

Use your work from Exercise 5.5.5.17 to find a matrix \(C\) satisfying \(C^3=A\text{.}\)

That we can find a minimal \(r\) in this sense is plausible enough, but we are secretly using the well-ordering principle of the integers here.