The spectral theorem

Section 5.6 The spectral theorem

Among the many takeaways from Section 5.5 is the simple fact that not all matrices are diagonalizable. In principle Theorem 5.5.13 gives a complete answer to the question of diagonalizability in terms of eigenspaces. However, you should not be mislead by the artificially simple examples treated in Section 5.5. In practice even the determination (or approximation) of the distinct eigenvalues of an \(n\times n\) matrix poses a very challenging computational problem as \(n\) gets large. As such the general question of whether a matrix is diagonalizable remains an intractable one. This makes all the more welcoming the main result of this section: all symmetric matrices are diagonalizable! This surprising fact is a consequence of the spectral theorem for self-adjoint operators: a result which itself fits into a larger suite of spectral theorems that treat the diagonalizability of various families of linear transformations of inner product spaces (both finite and infinite dimensional).

Subsection 5.6.1 Self-adjoint operators

Though we are mainly interested in the diagonalizability of symmetric matrices, our arguments are made more elegant by abstracting somewhat to the realm of linear transformations of inner product spaces. In this setting the appropriate analogue of a symmetric matrix is a self-adjoint linear transformation.

Definition 5.6.1. Self-adjoint operators.

Let \((V, \langle\, , \rangle)\) be a finite-dimensional inner product space. A linear transformation \(T\colon V\rightarrow V\) is called a self-adjoint operator if

\begin{equation} \langle T(\boldv), \boldw\rangle=\langle \boldv, T(\boldw)\rangle\tag{5.6.1} \end{equation}

for all \(\boldv, \boldw\in V\text{.}\)

The next theorem makes explicit the connection between self-adjoint operators and symmetric matrices.

Theorem 5.6.2. Self-adjoint operators and symmetry.

Let \((V, \langle\, , \rangle)\) be a finite-dimensional inner product space, let \(T\colon V\rightarrow V\) be a linear transformation, and let \(B\) be an orthonormal ordered basis of \(V\text{.}\) The following statements are equivalent.

\(T\) is self-adjoint.
\(A=[T]_B\) is symmetric.

Proof.

Let \(B=(\boldv_1, \boldv_2, \dots, \boldv_n)\text{.}\) We have

\begin{equation*} A=\begin{amatrix}[cccc]\vert \amp \vert \amp \amp\vert \\ \left[T(\boldv_1)\right]_B\amp [T(\boldv_2)]_B\amp \cdots \amp [T(\boldv_n)]_B\\ \vert \amp \vert \amp \amp\vert \end{amatrix}\text{.} \end{equation*}

Furthermore, since \(B\) is orthonormal, the \(i\)-th entry of \([T(\boldv_j)]_B\) is computed as \(\langle T(\boldv_j), \boldv_i\rangle\) (4.2.7). Thus \(A=[a_{ij}]\text{,}\) where

\begin{equation*} a_{ij}=\langle T(\boldv_j), \boldv_i\rangle. \end{equation*}

It follows that

\begin{align*} A \text{ symmetric} \amp\iff a_{ij}=a_{ji} \text{ for all } 1\leq i,j\leq n \\ \amp\iff \langle T(\boldv_j), \boldv_i\rangle=\langle T(\boldv_i), \boldv_j\rangle \text{ for all } 1\leq i,j\leq n \\ \amp \iff \langle T(\boldv_j), \boldv_i\rangle=\langle \boldv_j, T(\boldv_i)\rangle \text{ for all } 1\leq i,j\leq n \amp (\knowl{./knowl/d_innerproduct.html}{\text{4.1.1}}, ii)\text{.} \end{align*}

The last equality in this chain of equivalences states that \(T\) satisfies property (5.6.1) for all elements of \(B\text{.}\) Not surprisingly, this is equivalent to \(T\) satisfying the property for all elements in \(V\text{.}\) (See Exercise 5.6.3.9.) We conclude that \(A\) is symmetric if and only if \(T\) is self-adjoint.

Corollary 5.6.3. Self-adjoint operators and symmetry.

Let \(A\in M_{nn}\text{.}\) The following statements are equivalent.

\(A\) is symmetric.
\(T_A\) is self-adjoint with respect to the dot product.
\(A\boldx\, \cdot \boldy=\boldx\cdot A\boldy\) for all \(\boldx, \boldy\in\R^n\text{.}\)

Proof.

Since \(A=[T_A]_B\text{,}\) where \(B\) is the standard ordered basis of \(\R^n\text{,}\) and since \(B\) is orthonormal with respect to the dot product, it follows from Theorem 5.6.2 that statements (1) and (2) are equivalent. Statements (2) and (3) are equivalent since by definition \(T_A(\boldx)=A\boldx\) for all \(\boldx\in \R^n\text{.}\)

The next result, impressive in its own right, is also key to the induction argument we will use to prove Theorem 5.6.8. A proper proof would require a careful treatment of complex vector spaces: a topic which lies just outside the scope of this text. The “proof sketch” we provide can easily be upgraded to a complete argument simply by justifying a few statements about \(\C^n\) and its standard inner product.

Theorem 5.6.4. Eigenvalues of self-adjoint operators.

Let \((V, \langle\, , \rangle)\) be a finite-dimensional inner product space. If \(T\colon V\rightarrow V\) is self adjoint, then all roots of its characteristic polynomial \(p(x)\) are real: i.e., we have

\begin{equation*} p(x)=(x-\lambda_1)(x-\lambda_2)\cdots (x-\lambda_n)\text{,} \end{equation*}

where \(\lambda_i\in \R\) for all \(1\leq i\leq n\text{.}\)

Proof sketch of Theorem 5.6.2.

Pick an orthonormal ordered basis \(B\) of \(V\text{,}\) and let \(A=[T]_B\text{.}\) By Theorem 5.6.2, \(A\) is symmetric. To prove that all roots of the characteristic polynomial \(p(t)=\det(tI-A)\) are real, we make a slight detour into complex vector spaces. The set

\begin{equation*} \C^n=\{(z_1,z_2,\dots, z_n)\colon z_i\in \C \text{ for all } 1\leq i\leq n\} \end{equation*}

of all complex \(n\)-tuples, together with the operations

\begin{equation*} (z_1,z_2,\dots, z_n)+(w_1, w_2,\dots, w_n)=(z_1+w_1, z_2+w_2, \dots, z_n+w_n) \end{equation*}

and

\begin{equation*} \alpha (z_1, z_2, \dots, z_n)=(\alpha z_1, \alpha z_2, \dots, \alpha z_n), \end{equation*}

where \(\alpha\in \C\text{,}\) forms what is called a vector space over \(\C\). This means that \(V=\C^n\) satisfies the strengthened axioms of Definition 3.1.1 obtained by replacing every mention of a scalar \(c\in \R\) with a scalar \(\alpha\in \C\text{.}\) Additionally, the vector space \(\C^n\) has the structure of a complex inner product defined as

\begin{equation*} \langle (z_1,z_2,\dots, z_n), (w_1,w_2,\dots, w_n)\rangle=z_1\overline{w_1}+z_2\overline{w_2}+\cdots +z_n\overline{w_n}\text{,} \end{equation*}

where \(\overline{w_i}\) denotes the complex conjugate of \(w_i\) for each \(i\text{.}\) Essentially all of our theory of real vector spaces can be “transported” to complex vector spaces, including definitions and results about eigenvectors and inner products. The rest of this argument makes use of this principle by citing without proof some of these properties, and this is why it has been downgraded to a “proof sketch”.

We now return to \(A\) and its characteristic polynomial \(p(x)\text{.}\) Recall that we want to show that all roots of \(p(x)\) are real. Let \(\lambda\in \C\) be a root of \(p(x)\text{.}\) The complex theory of eigenvectors implies that there is a nonzero vector \(\boldz\in \C^n\) satisfying \(A\boldz=\lambda \boldz\text{.}\) On the one hand, we have

\begin{equation*} \langle A\boldz, \boldz\rangle =\langle \lambda\boldz, \boldz\rangle=\lambda\langle \boldz, \boldz\rangle \end{equation*}

using properties of our complex inner product. On the other hand, since \(A^T=A\) it is easy to see that Corollary 5.6.3 extends to our complex inner product: i.e.,

\begin{equation*} \langle A\boldz, \boldw\rangle=\langle \boldz, A\boldw\rangle \end{equation*}

for all \(\boldz, \boldw\in \C^n\text{.}\) Thus

\begin{align*} \langle A\boldz, \boldz\rangle \amp= \\ \amp= \langle \boldz, A\boldz\rangle \\ \amp = \langle \boldz, \lambda\boldz\rangle\\ \amp =\overline{\lambda}\langle \boldz, \boldz\rangle \text{.} \end{align*}

(In the last equality we use the fact that our complex inner product satisfies \(\langle \boldz, \alpha\boldw\rangle=\overline{\alpha} \langle \boldz, \boldw\rangle\) for any \(\alpha\in \C\) and vectors \(\boldz, \boldw\in \C^n\text{.}\)) It follows that

\begin{equation*} \lambda\langle \boldz, \boldz\rangle=\overline{\lambda}\langle \boldz, \boldz\rangle\text{.} \end{equation*}

Since \(\boldz\ne \boldzero\text{,}\) we have \(\langle \boldz, \boldz\rangle\ne 0\) (another property of our complex inner product), and thus \(\lambda=\overline{\lambda}\text{.}\) Since a complex number \(z=a+bi\) satisfies \(\overline{z}=z\) if and only if \(b=0\) if and only if \(z\) is real, we conclude that \(\lambda\) is a real number, as claimed.

Corollary 5.6.5. Eigenvalues of self-adjoint operators.

If \(T\) is a self-adjoint operator on a finite-dimensional inner product space \(V\text{,}\) then \(T\) has an eigenvalue: i.e., there is a \(\lambda\in \R\) and nonzero \(\boldv\in V\) such that \(T(\boldv)=\lambda\boldv\text{.}\)

Proof.

The corollary follows from Theorem 5.6.4 and the fact that the eigenvalues of \(T\) are the real roots of its characteristic polynomial (5.4.25).

From Theorem 5.6.4 and Corollary 5.6.3 it follows that the characterisitic polynomial of any symmetric matrix must factor as a product of linear terms over \(\R\text{,}\) as illustrated by the next two examples.

Example 5.6.6. Symmetric \(2\times 2\) matrices.

Verify that the characteristic polynomial of any symmetric \(2\times 2\) matrix factors into linear terms over \(\R\text{.}\)

Solution.

Given a symmetric \(2\times 2\) matrix

\begin{equation*} A=\begin{bmatrix} a\amp b\\ b\amp c \end{bmatrix}\text{,} \end{equation*}

we have

\begin{equation*} p(x)=\det(xI-A)=x^2-(a+c)x+(ac-b^2).\text{.} \end{equation*}

Using the quadratic formula and some algebra, we see that the roots of \(p(x)\) are given by where (using the quadratic formula)

\begin{equation*} \frac{(a+c)\pm \sqrt{(a+c)^2-4ac+4b^2}}{2}=\frac{(a+c)\pm \sqrt{(a-c)^2+4b^2}}{2}\text{.} \end{equation*}

Since \((a-c)^2+4b^2\geq 0\text{,}\) we see that both these roots are real. Thus \(p(x)=(x-\lambda_1)(x-\lambda_2)\text{,}\) where \(\lambda_1, \lambda_2\in \R\text{.}\)

Example 5.6.7. Symmetric \(4\times 4\) matrix.

Verify that the characteristic polynomial of the symmetric matrix

\begin{equation*} A=\begin{amatrix}[rrrr] 6 \amp 2\amp 4\amp 0\\ 2\amp 6\amp 0\amp 4\\ 4\amp 0\amp -6\amp -2\\ 0\amp 4\amp -2\amp -6 \end{amatrix} \end{equation*}

factors into linear terms over \(\R\text{.}\)

Solution.

The characteristic polynomial of \(A\) is \(p(x)=x^4-112x^2+2560\text{.}\) We can use the quadratic equation to solve \(p(x)=0\) for \(u=x^2\text{,}\) yielding

\begin{equation*} u=\frac{112\pm\sqrt{(112)^2-4(2560)}}{2}=56\pm 24\text{.} \end{equation*}

We conclude that \(x^2=32\) or \(x^2=80\text{,}\) and thus \(x=\pm 4\sqrt{2}\) or \(x=\pm 4\sqrt{5}\text{.}\) It follows that

\begin{equation*} p(x)=(x-4\sqrt{2})(x+4\sqrt{2})(x-4\sqrt{5})(x+4\sqrt{5})\text{.} \end{equation*}

Subsection 5.6.2 The spectral theorem for self-adjoint operators

Our version of the spectral theorem concerns self-adjoint linear transformations on a finite-dimensional inner product space. It tells us two remarkable things: (a) every such linear transformation has an eigenbasis (and hence is diagonalizable); and furthermore, (b) the eigenbasis can be chosen to be orthogonal, or even orthonormal.

Theorem 5.6.8. Spectral theorem for self-adjoint operators.

Let \((V, \angvec{\, , \,})\) be a finite-dimensional vector space, and let \(T\colon V\rightarrow V\) be a linear transformation. The following statements are equivalent.

\(T\) is self-adjoint.
\(T\) is diagonalizable and eigenvectors with distinct eigenvalues are orthogonal.
\(T\) has an orthogonal eigenbasis.
\(T\) has an orthonormal eigenbasis.

Proof.

We will prove the cycle of implications \((1)\implies (2)\implies (3)\implies (4)\implies (1)\text{.}\)

Implication: \((1)\implies (2)\).

Assume \(T\) is self adjoint. First we show that eigenvectors with distinct eigenvalues are orthogonal. To this end, suppose we have \(T(\boldv)=\lambda\boldv\) and \(T(\boldv')=\lambda'\boldv'\text{,}\) where \(\lambda\ne \lambda'\text{.}\) Using the definition of self-adjoint, we have

\begin{align*} \angvec{T(\boldv), \boldv'}=\angvec{\boldv, T(\boldv')} \amp\implies \angvec{\lambda\boldv, \boldv'}=\angvec{\boldv, \lambda'\boldv'} \\ \amp\implies \lambda\angvec{\boldv, \boldv'}=\lambda'\angvec{\boldv, \boldv'} \\ \amp \implies \angvec{\boldv, \boldv'}=0 \amp (\lambda\ne \lambda')\text{.} \end{align*}

We now prove by induction on \(\dim V\) that if \(T\) is self-adjoint, then \(T\) is diagonalizable. The base case \(\dim V=1\) is trivial. Assume the result is true of any \(n\)-dimensional inner product space, and suppose \(\dim V=n+1\text{.}\) By Corollary 5.6.5 there is a nonzero \(\boldv\in V\) with \(T(\boldv)=\lambda\boldv\text{.}\) Let \(W=\Span\{\boldv\}\text{.}\) Since \(\dim W=1\text{,}\) we have \(\dim W^\perp=\dim V-1=n\text{.}\) The following two facts are crucial for the rest of the argument and are left as an exercise (5.6.3.10).

For all \(\boldv\in W^\perp\) we have \(T(\boldv)\in W^\perp\text{,}\) and thus by restricting \(T\) to \(W^\perp\) we get a linear transformation \(T\vert_{W^{\perp}}\colon W^\perp\rightarrow W^\perp\text{.}\)
The restriction \(T\vert_{W^\perp}\) is self-adjoint, considered as a linear transformation of the inner product space \(W^\perp\text{.}\) Here the inner product on the subspace \(W^\perp\) is inherited from \((V, \angvec{\, , \,})\) by restriction.

Now since \(\dim W^\perp=n-1\) and \(T\vert_{W^\perp}\) is self-adjoint, we may assume by induction that \(T\vert_{W^\perp}\) has an eigenbasis \(B'=(\boldv_1, \boldv_2,\dots, \boldv_n)\text{.}\) We claim that \(B=(\boldv, \boldv_2, \dots, \boldv_n)\) is an eigenbasis of \(V\text{.}\) Since by definition \(T\vert_{W^\perp}(\boldw')=T(\boldw')\) for all \(\boldw'\in W^\perp\text{,}\) we see that the vectors \(\boldv_i\) are also eigenvectors of \(T\text{,}\) and thus that \(B\) consists of eigenvectors. To show \(B\) is a basis it is enought to prove linear independence, since \(\dim V=n+1\text{.}\) Suppose we have

\begin{equation*} c\boldv+c_1\boldv_1+\cdots +c_n\boldv_n=\boldzero \end{equation*}

for scalars \(c, c_i\in \R\text{.}\) Taking the inner product with \(\boldv\text{,}\) we have :

\begin{align*} c\boldv+c_1\boldv_1+\cdots +c_n\boldv_n=\boldzero\amp\implies\\ \langle\boldv, c\boldv+c_1\boldv_1+\cdots +c_n\boldv_n\rangle=\langle\boldv, \boldzero\rangle \amp\implies c\angvec{\boldv, \boldv}+\sum_{i=1}^nc_i\angvec{\boldv, \boldv_i}=0 \\ \amp \implies c\angvec{\boldv, \boldv}=0 \amp (\angvec{\boldv, \boldv_i}=0)\\ \amp \implies c=0 \amp (\angvec{\boldv, \boldv}\ne 0)\text{.} \end{align*}

It follows that we have

\begin{equation*} c_1\boldv_1+\cdots +c_n\boldv_n=\boldzero\text{,} \end{equation*}

and thus \(c_i=0\) for all \(1\leq i\leq n\text{,}\) since \(B'\) is linearly independent. Having proved that \(B\) is an eigenbasis, we conclude that \(T\) is diagonalizable.

Implication: \((2)\implies (3)\).

Let \(\lambda_1, \lambda_2, \dots, \lambda_r\) be the distinct eigenvalues of \(T\text{.}\) Since \(T\) is assumed to be diagonalizable, following Procedure 5.5.14 we can create an eigenbasis \(B\) by picking bases \(B_i\) of each eigenspace \(W_{\lambda_i}\) and combining them. We can always choose these bases so that each \(B_i\) is orthogonal. When we do so, the assembled \(B\) will be orthogonal as a whole. Indeed given any two elements \(\boldv, \boldv'\) of \(B\text{,}\) if both vectors are elements of \(B_i\) for some \(i\text{,}\) then they are orthogonal by design; furthermore, if \(\boldv\) is an element of basis \(B_i\) and \(\boldv'\) is an element of basis \(B_j\) with \(i\ne j\text{,}\) then they are eigenvectors with distinct eigenvalues, and hence orthogonal by assumption!

Implication: \((3)\implies (4)\).

This is easy to see since an orthonormal eigenbasis can be obtained from an orthogonal eigenbasis by scaling each element by the reciprocal of its norm.

Implication: \((4)\implies (1)\).

Assume \(B\) is an orthonormal eigenbasis of \(T\text{.}\) Since \(B\) is an eigenbasis, \([T]_B\) is a diagonal matrix, and hence symmetric. Since \(B\) is orthonormal with respect to the dot product, we conclude from Theorem 5.6.2 that \(T\) is self-adjoint.

An operator that admits an orthogonal (and hence an orthonormal) eigenbasis is called orthogonally diagonalizable.

Definition 5.6.9. Orthogonally diagonalizable.

Let \(V\) be a finite-dimensional inner product space. A linear transformation \(T\colon V\rightarrow V\) is orthogonally diagonalizable if there exists an orthogonal (equivalently, an orthonormal) eigenbasis of \(T\text{.}\)

This new language affords us a more succinct articulation of Theorem 5.6.8: to be self-adjoint is to be orthogonally diagonalizable. Think of this as a sort of “diagonalizable+” condition.

Mantra 5.6.10. Self-adjoint mantra.

To be self-adjoint on a finite-dimensional inner product space is to be “diagonalizable+”. In more detail:

\begin{equation} T \text{ is self-adjoint} \iff T \text{ is orthogonally diagonalizable}\text{.}\tag{5.6.2} \end{equation}

As an immediate consequence of Theorem 5.6.8, we have the following result about symmetric matrices.

Corollary 5.6.11. Spectral theorem for symmetric matrices.

Let \(A\) be an \(n\times n\) matrix. The following statements are equivalent.

\(A\) is symmetric.
\(A\) is diagonalizable and eigenvectors with distinct eigenvalues are orthogonal with respect to the dot product.
\(A\) is orthogonally diagonalizable.
There exists an orthogonal matrix \(Q\) and diagonal matrix \(D\) such that

\begin{equation} D=Q^{-1}AQ=Q^TAQ\text{.}\tag{5.6.3} \end{equation}

Proof.

By Corollary 5.6.3 we have \(A\) symmetric if and only if \(T_A\) is self-adjoint with respect to the dot product. Statements (1)-(3) are seen to be equivalent by applying Theorem 5.6.8 to \(T_A\) (with respect to the dot product). Let \(B\) be the standard basis of \(\R^n\text{.}\) We see that (4) is equivalent to (3) by observing that \(B'\) is an orthonormal eigenbasis of \(T_A\) if and only if the matrix \(Q=\underset{B'\rightarrow B}{P}\) obtained by placing the elements of \(B'\) as columns is orthogonal and diagonalizes \(A\text{.}\)

The process of finding matrices \(Q\) and \(D\) satisfying (5.6.3) is called orthogonal diagonalization. A close look at the proof of Theorem 5.6.8 gives rise to the following orthogonal diagonalization method for matrices.

Procedure 5.6.12. Orthogonal diagonalization.

Let \(A\) be a symmetric matrix. To orthogonally diagonalize \(A\) proceed as follows.

Let \(\lambda_1, \lambda_2, \dots, \lambda_r\) be the distinct eigenvalues of \(A\text{.}\) For each \(1\leq i\leq r\text{,}\) compute an orthonormal ordered basis of \(W_{\lambda_i}\text{.}\)
Let \(B'=(\boldv_1, \boldv_2, \dots, \boldv_n)\) be the ordered basis obtained by concatenating the orthonormal bases computed in (1). This is an orthonormal basis of eigenvectors. It follows that the matrix

\begin{equation*} Q=\begin{bmatrix} \vert\amp \vert\amp \amp \vert\\ \boldv_1\amp \boldv_2\amp\cdots \amp \boldv_n\\ \vert\amp \vert\amp \amp \vert \end{bmatrix} \end{equation*}

is orthogonal (i.e., \(Q^{-1}=Q^T\)), and the matrix \(D=Q^{-1}AQ=Q^TAQ\) is diagonal.

Example 5.6.13. Orthogonal diagonalization.

The symmetric matrix

\begin{equation*} A=\frac{1}{3}\begin{amatrix}[rrr] -1\amp 2\amp 2\\ 2 \amp -1 \amp 2 \\ 2\amp 2 \amp -1 \end{amatrix} \end{equation*}

has characteristic polynomial \(p(x)=x^3+x^2-x-1\text{.}\) Find an orthogonal matrix \(Q\) and diagonal matrix \(D\) such that \(D=Q^TAQ\text{.}\)

Solution.

First we factor \(p(x)\text{.}\) Looking at the constant term we see that the only possible integer roots of \(p(x)\) are \(\pm 1\text{.}\) It is easily verified that \(p(1)=0\text{,}\) and polynomial division yields the factorization \(p(x)=(x-1)(x^2+2x+1)\text{.}\) Further factorization of \(x^2+2x+1\) gives us \(p(x)=(x-1)(x+1)^2\text{.}\)

Next we compute orthonormal bases of the eigenspaces \(W_1\) and \(W_{-1}\text{,}\) yielding

\begin{align*} B_1\amp=\left(\frac{1}{\sqrt{3}}(1,1,1)\right) \amp B_{2}\amp =\left(\frac{1}{\sqrt{2}}(1,-1,0), \frac{1}{\sqrt{6}}(1,1,-2)\right)\text{.} \end{align*}

Assembling these bases elements into the orthogonal matrix

\begin{equation*} Q=\begin{amatrix}[rrr]1/\sqrt{3}\amp 1/\sqrt{2}\amp 1/\sqrt{6}\\ 1/\sqrt{3}\amp -1/\sqrt{2}\amp 1/\sqrt{6}\\ 1/\sqrt{3}\amp 0\amp -2/\sqrt{6}\end{amatrix}\text{,} \end{equation*}

we conclude that \(D=Q^{-1}AQ=Q^TAQ\text{,}\) where

\begin{equation*} D=\begin{amatrix}[rrr]1\amp 0\amp 0\\ 0\amp -1\amp 0\\ 0\amp 0\amp -1 \end{amatrix}\text{.} \end{equation*}

Observe that the two eigenspaces \(W_1\) and \(W_{-1}\) of the matrix \(A\) in Example 5.6.13 are orthogonal to one another, as predicted by the spectral theorem. Indeed, \(W_1\) is the line passing through the origin with direction vector \(\boldn=(1,1,1)\text{,}\) and \(W_{-1}\) is its orthogonal complement, the plane passing through the origin with normal vector \(\boldn\text{.}\) Figure 5.6.14 depicts the orthogonal configuration of the eigenspaces of this example. This is an excellent illustration of what makes the diagonalizability of symmetric matrices (and self-adjoint operators) special. Keep it in mind!

Figure 5.6.14. Eigenspaces of a symmetric matrix are orthogonal

Do not overlook the reverse implication of equivalence (5.6.2). As the next example illustrates, we can show an operator is self-adjoint by examining the geometry of its eigenspaces.

Example 5.6.15. Orthogonal projections are self-adjoint.

Let \((V,\angvec{\, , \,})\) be a finite-dimensional inner product space, let \(W\) be a subpsace of \(V\text{,}\) and let \(T=\operatorname{proj}_W\) be orthogonal projection onto \(W\text{.}\) Prove that \(T\) is self-adjoint.

Solution.

By Theorem 5.6.8 it suffices to show that \(T\) is orthogonally diagonalizable. According to Exercise 4.3.6.12 we have

\begin{align*} \boldv\in W \amp \iff T(\boldv)=\boldv\\ \boldv\in W^\perp \amp \iff T(\boldv)=\boldzero\text{.} \end{align*}

Equivalently, \(W=W_1\) and \(W^\perp=W_0\) are the 1- and 0-eigenspaces of \(T\text{,}\) respectively. Since \(\dim W+\dim W^\perp=\dim V\) we conclude that \(T\) is diagonalizable. Since clearly \(W\) and \(W^\perp\) are orthogonal, we conclude that \(T\) is in fact othogonally diagonalizable, hence self-adjoint.

Exercises 5.6.3 Exercises

Exercise Group.

Orthogonally diagonalize the given symmetric matrix \(A\) following Procedure 5.6.12: i.e. find a diagonal matrix \(D\) and orthogonal matrix \(Q\) satisfying \(D=Q^{T}AQ\text{.}\)

1.

\(A=\begin{amatrix}[rr] 2\amp 1\\ 1\amp 2 \end{amatrix}\)

2.

\(A=\begin{amatrix}[rr]-1\amp 1\\ 1\amp 2 \end{amatrix}\)

3.

\(A=\begin{amatrix}[rrr]-1 \amp 1 \amp -2 \\ 1 \amp -1 \amp -2 \\ -2 \amp -2 \amp 2 \end{amatrix}\)

4.

\(A=\frac{1}{6}\begin{amatrix}[rrr] 1\amp 7\amp -2\\ 7\amp 1\amp -2\\ -2\amp -2\amp 10 \end{amatrix}\)

5.

\(A=\begin{amatrix}[rrr] 1\amp 2\amp 1\\ 2\amp 0\amp 0\\ 1\amp 0\amp 3 \end{amatrix}\)

6.

\(A=\begin{amatrix}[rrrr] 0 \amp 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 0 \amp 1 \\ 1 \amp 0 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \amp 0 \end{amatrix}\)

7.

\(A=\begin{amatrix}[rrrr]0 \amp 0 \amp 1 \amp 0 \\ 0 \amp 1 \amp 0 \amp 0 \\ 1 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 1 \end{amatrix}\)

8.

\(A=\begin{amatrix}[rrrr] 1\amp 1\amp 1\amp 1\\ 1\amp 1\amp 1\amp 1 \\ 1\amp 1\amp 1\amp 1 \\ 1\amp 1\amp 1\amp 1 \end{amatrix}\)

9.

Let \((V, \langle\, , \rangle)\) be a finite-dimensional inner product space, let \(T\colon V\rightarrow V\) be a linear transformation, and let \(B=(\boldv_1, \boldv_2, \dots, \boldv_n)\) be an ordered basis of \(V\text{.}\) Prove: \(T\) is self-adjoint if and only if

\begin{equation*} \langle T(\boldv_i),\boldv_j\rangle=\langle \boldv_i, T(\boldv_j)\rangle \end{equation*}

for all \(1\leq i,j\leq n\text{.}\) In other words, to prove \(T\) is self-adjoint it suffices to show property (5.6.1) holds for all elements of a basis of \(V\text{.}\)

10.

Let \((V, \langle\, , \rangle)\) be a finite-dimensional inner product space, let \(T\colon V\rightarrow V\) be a self-adjoint operator, and let \(W\) be a subspace of \(V\text{.}\)

Prove: if \(\boldv\in W^\perp\text{,}\) then \(T(\boldv)\in W^\perp\text{.}\)
By (a), restricting \(T\) to \(W^\perp\) defines a linear transformation

\begin{align*} T\vert_{W^\perp}\colon W^\perp\amp\rightarrow W^\perp \\ \boldv \amp \mapsto T(\boldv)\text{.} \end{align*}

Prove that \(T\vert_{W^\perp}\) is self-adjoint. Here the inner product on the subspace \(W^\perp\) is inherited from \((V, \angvec{\, , \,})\) by restriction.

11.

Assume \(A\in M_{nn}\) is symmetric and orthogonal. Prove that the characteristic polynomial of \(A\) factors as \(p(x)=(x-1)^r(x+1)^s\) for some nonnegative integers \(r,s\text{.}\) In particular, the eigenvalues of \(A\) are among \(1\) and \(-1\text{.}\)

Exercise Group.

Let \(\mathcal{C}\subseteq \R^2\) be a conic curve defined by a quadratic equation of the form

\begin{equation} \mathcal{C}\colon ax^2+bxy+cy^2=d\tag{5.6.4} \end{equation}

where \(a,b,c\in \R\) are fixed constants. You may have learned that \(\mathcal{C}\) can be rotated to a conic \(\mathcal{C}'\) with a “standard equation” of the form \(ex^2+fy^2=d\text{.}\) In the following exercises we will see why this is true.

12.

Find a symmetric matrix \(A\in M_{22}\) satisfying the following property: \(\boldx=(x,y)\) satisfies (5.6.4) if and only if

\begin{equation} \boldx \cdot (A\boldx)=\boldx^TA\boldx=d\text{.}\tag{5.6.5} \end{equation}

(Here we conflate the \(1\times 1\) matrix \(\begin{bmatrix}d \end{bmatrix}\) with the scalar \(d\in \R\text{.}\))

13.

Show that there is a rotation matrix \(Q\in M_{22}\) satisfying \(D=Q^TAQ\text{,}\) where

\begin{equation*} D=\begin{amatrix}[rr] e\amp 0\\ 0\amp f \end{amatrix} \end{equation*}

for some \(e,f\in \R\text{.}\)

Hint.

See Exercise 5.3.5.16.

14.

Show that \(\boldx\) satisfies (5.6.5) if and only if \(\boldx'=Q^{-1}\boldx=Q^T\boldx\) satisfies

\begin{equation} ex+fy=d\text{.}\tag{5.6.6} \end{equation}

15.

Explain why we can conclude that there is a rotation that maps the conic \(\mathcal{C}\) with equation (5.6.4) to the conic \(\mathcal{C}'\) with “standard equation” (5.6.6).

16.

Let \(\mathcal{C}\subseteq\R^2\) be the conic curve with equation

\begin{equation*} x^2+4xy+y^2=1\text{.} \end{equation*}

Find an angle \(\theta\) and constants \(a,b\in \R\) such that the rotation \(\rho_\theta\) maps \(\mathcal{C}\) to a conic \(\mathcal{C}'\) with defining equation

\begin{equation*} ax^2+by^2=1\text{.} \end{equation*}
First graph \(\mathcal{C}'\text{,}\) and then graph \(\mathcal{C}\) using the result of (a). What type of conics (parabolas, ellipses, hyperbolas) are \(\mathcal{C}\) and \(\mathcal{C'}\) ?