
Section 4.3 Orthogonal projection

A trick we learn early on in physics-- specifically, in dynamics problems in \(\R^2\)-- is to pick a convenient axis and then decompose any relevant vectors (force, acceleration, velocity, position, etc.) into a sum of two components: one that points along the chosen axis, and one that points perpendicularly to it. As we will see in this section, this technique can be vastly generalized. Namely, instead of \(\R^2\) we can take any inner product space \((V, \langle\, , \rangle)\text{;}\) and instead of a chosen axis in \(\R^2\text{,}\) we can choose any finite-dimensional subspace \(W\subseteq V\text{;}\) then any \(\boldv\in V\) can be decomposed in the form
\begin{equation*} \boldv=\boldw+\boldw^\perp, \end{equation*}
where \(\boldw\in W\) and \(\boldw^\perp\) is a vector orthogonal to \(W\text{,}\) in a sense we will make precise below. Just as in our toy physics example, this manner of decomposing vectors helps simplify computations in problems where the subspace \(W\) chosen is of central importance.

Subsection 4.3.1 Orthogonal complement

We begin by making sense of what it means for a vector to be orthogonal to a subspace.

Definition 4.3.1. Orthogonal complement.

Let \((V,\langle \ , \rangle)\) be an inner product vector space, and let \(W\subseteq V\) be a subspace.
A vector \(\boldv\) is orthogonal to \(W\) if it is orthogonal to every element of \(W\text{:}\) i.e., if \(\langle \boldv, \boldw\rangle=0\) for all \(\boldw\in W\text{.}\)
The orthogonal complement of \(W\text{,}\) denoted \(W^\perp\text{,}\) is the set of all elements of \(V\) orthogonal to \(W\text{:}\) i.e.,
\begin{equation*} W^\perp=\{\boldv\in V\colon \langle \boldv, \boldw\rangle=0 \text{ for all } \boldw\in W\}\text{.} \end{equation*}

Remark 4.3.2. Computing \(W^\perp\).

According to Definition 4.3.1, to verify that a vector \(\boldv\) lies in \(W^\perp\text{,}\) we must show that \(\langle \boldv, \boldw\rangle=0\) for all \(\boldw\in W\text{.}\) The “for all” quantifier here can potentially make this an onerous task: there are in principle infinitely many \(\boldw\) to check! In the special case where \(W\) has a finite spanning set, so that \(W=\Span \{\boldw_1, \boldw_2,\dots, \boldw_r\}\) for some vectors \(\boldw_i\text{,}\) deciding whether \(\boldv\in W^\perp\) reduces to checking whether \(\langle \boldv, \boldw_i\rangle=0\) for all \(1\leq i\leq r\text{.}\) In other words, we have
\begin{equation*} \boldv\in W^\perp\iff \langle \boldv, \boldw_i\rangle=0 \text{ for all } 1\leq i\leq r\text{.} \end{equation*}
The forward implication of this equivalence is clear: if \(\boldv\) is orthogonal to all elements of \(W\text{,}\) then clearly it is orthogonal to each \(\boldw_i\text{.}\) The reverse implication is left as an exercise. (See Exercise 4.3.6.7.)
We illustrate this computational technique in the next examples.

Example 4.3.3.

Consider the inner product space \(\R^2\) together with the dot product. Let \(W=\Span\{(1,1)\}=\{(t,t)\colon t\in \R\}\text{:}\) the line \(\ell\subseteq \R^2\) with equation \(y=x\text{.}\) Compute \(W^\perp\) and identify it as a familiar geometric object in \(\R^2\text{.}\)
Solution.
According to Remark 4.3.2, since \(W=\Span\{(1,1)\}\text{,}\) we have
\begin{equation*} \boldx\in W^\perp \iff \boldx\cdot (1,1)=0\text{.} \end{equation*}
Letting \(\boldx=(x,y)\text{,}\) we see that \(\boldx\cdot (1,1)=0\) if and only if \(x+y=0\text{,}\) if and only if \(y=-x\text{.}\) Thus \(W^\perp=\{(x,y)\colon y=-x\}\) is the line \(\ell'\subseteq \R^2\) with equation \(y=-x\text{.}\) Observe that the lines \(\ell\) and \(\ell'\) are indeed perpendicular to one another. (Graph them!)

Example 4.3.4.

Consider the inner product space \(\R^3\) together with the dot product. Let \(W\subseteq \R^3\) be the plane with equation \(x-2y-z=0\text{.}\) Compute \(W^\perp\) and identify this as a familiar geometric object in \(\R^3\text{.}\)
Solution.
First, solving \(x-2y-z=0\) for \((x,y,z)\text{,}\) we see that
\begin{equation*} W=\{(2s+t,s,t)\colon s,t\in \R\}=\Span\{(2,1,0),(1,0,1)\}\text{.} \end{equation*}
Next, according to Remark 4.3.2 we have
\begin{equation*} \boldx\in W^\perp\iff \boldx\cdot (2,1,0)=0 \text{ and } \boldx\cdot (1,0,1)=0\text{.} \end{equation*}
It follows that \(W^\perp\) is the set of vectors \(\boldx=(x,y,z)\) satisfying the linear system
\begin{equation*} \begin{linsys}{3} 2x\amp +\amp y \amp \amp \amp = \amp 0\\ x\amp \amp \amp +\amp z \amp = \amp 0 \end{linsys}. \end{equation*}
Solving this system using Gaussian elimination we conclude that
\begin{equation*} W^\perp=\{(t,-2t,-t)\colon t\in \R\}=\Span\{(1,-2,-1)\}\text{,} \end{equation*}
which we recognize as the line \(\ell\subseteq \R^3\) passing through the origin with direction vector \((1,-2,-1)\text{.}\) This is none other than the normal line to the plane \(W\) passing through the origin.
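For readers who like to check such computations with software, here is a small numerical sketch (not part of the formal development; it assumes the NumPy library and uses variable names of our own) that recovers \(W^\perp\) as the null space of the matrix whose rows are the spanning vectors \((2,1,0)\) and \((1,0,1)\text{.}\)

```python
import numpy as np

# Rows span W; by Remark 4.3.2, W-perp is the null space of this matrix.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

# Null space via the SVD: right-singular vectors whose singular values vanish.
U, s, Vt = np.linalg.svd(M)
null_basis = Vt[np.sum(s > 1e-10):]

print(null_basis)        # one row, proportional to (1, -2, -1)
print(M @ null_basis.T)  # essentially zero: that row is orthogonal to both spanning vectors
```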

Example 4.3.6.

Consider the inner product space \(\R^3\) with the dot product. Let \(W=\Span\{(1,1,1)\}\subset \R^3\text{,}\) the line passing through the origin with direction vector \((1,1,1)\text{.}\) The orthogonal complement \(W^\perp\) is the set of vectors orthogonal to \((1,1,1)\text{.}\) Using the definition of dot product, this is the set of solutions \((x,y,z)\) to the equation
\begin{equation*} x+y+z=0\text{,} \end{equation*}
which we recognize as the plane passing through the origin with normal vector \((1,1,1)\text{.}\) Note that we have
\begin{equation*} \dim W+\dim W^\perp=1+2=3, \end{equation*}
as predicted in Theorem 4.3.5.
The notion of orthogonal complement gives us a more conceptual way of understanding the relationship between the various fundamental spaces of a matrix.
  1. Using the dot product method of matrix multiplication, we see that a vector \(\boldx\in\NS(A)\) if and only if \(\boldx\cdot\boldr_i=0\) for each row \(\boldr_i\) of \(A\text{,}\) if and only if \(\boldx\cdot \boldw=0\) for all \(\boldw\in \Span \{\boldr_1, \boldr_2, \dots, \boldr_m\}=\RS A\) (see Remark 4.3.2), if and only if \(\boldx\in (\RS A)^\perp\text{.}\) This shows \(\NS A=(\RS A)^\perp\text{.}\)
    We can use Corollary 4.3.13 to conclude \(\RS A=(\NS A)^\perp\text{.}\) Alternatively, and more directly, the argument above shows that \(\boldw\in \RS A\implies \boldw\in (\NS A)^\perp\text{,}\) proving \(\RS A\subseteq (\NS A)^\perp\text{.}\) Next, by the rank-nullity theorem we have \(\dim \RS A=n-\dim\NS A\text{;}\) and by Theorem 4.3.5 we have \(\dim (\NS A)^\perp=n-\dim\NS A\text{.}\) It follows that \(\dim\RS A=\dim (\NS A)^\perp\text{.}\) Since \(\RS A\subseteq (\NS A)^\perp\) and \(\dim \RS A=\dim (\NS A)^\perp\text{,}\) we conclude by Corollary 3.7.10 that \(\RS A=(\NS A)^\perp\text{.}\)
  2. This follows from (1) and the fact that \(\CS(A)=\RS(A^T)\text{.}\)

Example 4.3.8.

Understanding the orthogonal relationship between \(\NS A \) and \(\RS A\) allows us in many cases to quickly determine/visualize the one from the other. As an example, consider \(A=\begin{bmatrix}1\amp -1\amp 1\\ 1\amp -1\amp -1 \end{bmatrix}\text{.}\) Looking at the columns, we see easily that \(\rank A =2\text{,}\) which implies that \(\nullity A=3-2=1\text{.}\) Since \((1,-1,0)\) is an element of \(\NS(A)\) and \(\dim\NS A=1\text{,}\) we must have \(\NS A=\Span\{(1,-1,0)\}\text{,}\) a line. By orthogonality, we conclude that
\begin{equation*} \RS A=(\NS A)^\perp\text{,} \end{equation*}
which is the plane with normal vector \((1,-1,0)\) passing through the origin.
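A quick numerical sanity check of this orthogonality, again an informal sketch assuming NumPy, with names of our own:

```python
import numpy as np

A = np.array([[1.0, -1.0,  1.0],
              [1.0, -1.0, -1.0]])
n = np.array([1.0, -1.0, 0.0])     # candidate spanning vector for NS(A)

print(np.linalg.matrix_rank(A))    # 2, so nullity(A) = 3 - 2 = 1
print(A @ n)                       # (0, 0): each row of A is orthogonal to n,
                                   # so all of RS(A) lies in the plane with normal (1, -1, 0)
```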

Subsection 4.3.2 Orthogonal Projection

Let \(B=\{\boldw_1, \boldw_2, \dots, \boldw_r\}\) be an orthogonal basis of \(W\text{.}\) We first show that the vectors
\begin{equation} \boldw=\sum_{i=1}^r\frac{\angvec{\boldv,\boldw_i}}{\angvec{\boldw_i, \boldw_i}}\boldw_i\tag{4.3.4} \end{equation}
and \(\boldw^\perp=\boldv-\boldw\) satisfy the conditions in (4.3.1). It is clear that the \(\boldw\) defined in (4.3.4) is an element of \(W\text{,}\) since it is a linear combination of the \(\boldw_i\text{.}\) Furthermore, we see easily that our choice \(\boldw^\perp=\boldv-\boldw\) satisfies
\begin{equation*} \boldw+\boldw^\perp=\boldw+(\boldv-\boldw)=\boldv\text{.} \end{equation*}
It remains only to show that \(\boldw^\perp=\boldv-\boldw\in W^\perp\text{.}\) Since \(B\) is a basis of \(W\text{,}\) it suffices to show that \(\langle \boldw^\perp,\boldw_j\rangle=0\) for all \(1\leq j\leq r\text{.}\) We compute:
\begin{align*} \langle \boldw^\perp, \boldw_j \rangle\amp = \langle \boldv-\proj{\boldv}{W}, \boldw_j \rangle\\ \amp = \left\langle \boldv-\sum_{i=1}^r\frac{\angvec{\boldv,\boldw_i}}{\angvec{\boldw_i, \boldw_i}}\boldw_i, \boldw_j\right\rangle \\ \amp = \langle \boldv, \boldw_j\rangle -\sum_{i=1}^r\frac{\angvec{\boldv,\boldw_i}}{\angvec{\boldw_i, \boldw_i}}\langle \boldw_i, \boldw_j\rangle \\ \amp = \langle \boldv, \boldw_j\rangle -\frac{\langle \boldv, \boldw_j\rangle}{\cancel{\langle \boldw_j, \boldw_j\rangle}}\cancel{\langle \boldw_j, \boldw_j\rangle}\amp (\text{orthogonality of } B)\\ \amp = 0 \text{,} \end{align*}
as desired.
Having shown that a decomposition of \(\boldv\) of the form (4.3.1) exists, we now show it is unique in the sense specified. Suppose we have
\begin{equation*} \boldv=\boldw+\boldw^\perp=\boldu+\boldu^\perp\text{,} \end{equation*}
where \(\boldw, \boldu\in W\) and \(\boldw^\perp, \boldu^\perp\in W^\perp\text{.}\) Rearranging, we see that
\begin{equation*} \boldw-\boldu=\boldu^\perp-\boldw^\perp\text{.} \end{equation*}
We now claim that \(\boldw-\boldu=\boldu^\perp-\boldw^\perp=\boldzero\text{,}\) in which case \(\boldw=\boldu\) and \(\boldw^\perp=\boldu^\perp\text{,}\) as desired. To see why the claim is true, consider the vector \(\boldv'=\boldw-\boldu=\boldu^\perp-\boldw^\perp\text{.}\) Since \(\boldv'=\boldw-\boldu\text{,}\) and \(\boldw, \boldu\in W\text{,}\) we have \(\boldv'\in W\text{.}\) On the other hand, since \(\boldv'=\boldu^\perp-\boldw^\perp\text{,}\) and \(\boldu^\perp, \boldw^\perp\in W^\perp\text{,}\) we have \(\boldv'\in W^\perp\text{.}\) Thus \(\boldv'\in W\cap W^\perp\text{.}\) Since \(W\cap W^\perp=\{\boldzero\}\) (Theorem 4.3.5), we conclude \(\boldv'=\boldw-\boldu=\boldu^\perp-\boldw^\perp=\boldzero\text{,}\) as claimed.
At this point we have proved both (1) and (2), and it remains only to show that (4.3.3) holds for all \(\boldw'\in W\text{.}\) To this end we compute:
\begin{align*} \norm{\boldv-\boldw'}^2\amp = \norm{\boldw^\perp+(\boldw-\boldw')}^2\\ \amp =\norm{\boldw^\perp}^2+\norm{\boldw-\boldw'}^2 \amp (\knowl{./knowl/ex_ortho_pythag.html}{\text{Exercise 4.2.3.13}})\\ \amp \geq \norm{\boldw^\perp}^2\\ \amp =\norm{\boldv-\boldw}^2\text{.} \end{align*}
This shows \(\norm{\boldv-\boldw'}^2\geq \norm{\boldv-\boldw}^2\text{.}\) Taking square-roots now proves the desired inequality.

Remark 4.3.10. Orthogonal projection formula.

The formula (4.3.2) is very convenient for computing an orthogonal projection \(\proj{\boldv}{W}\text{,}\) but mark well this important detail: to apply the formula we must first provide an orthogonal basis of \(W\text{.}\) Thus unless one is provided, our first step in an orthogonal projection computation is to produce an orthogonal basis of \(W\text{.}\) In some simple cases (e.g., when \(W\) is 1- or 2-dimensional) this can be done by inspection. Otherwise, we use the Gram-Schmidt procedure.
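For experimentation, here is a bare-bones sketch of the Gram-Schmidt procedure for \(\R^n\) with the dot product. The function name and interface are our own, it assumes NumPy, and it is only an illustration of the idea, not the textbook's official implementation.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis (w.r.t. the dot product) for Span(vectors).

    A bare-bones sketch: subtract from each vector its projection onto the
    orthogonal vectors produced so far, dropping (numerically) zero vectors.
    """
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for b in basis:
            w = w - (w @ b) / (b @ b) * b
        if np.linalg.norm(w) > 1e-12:
            basis.append(w)
    return basis

# Example: an orthogonal basis for the plane x + y + z = 0 in R^3,
# starting from the (non-orthogonal) spanning set {(1,-1,0), (1,0,-1)}.
print(gram_schmidt([(1, -1, 0), (1, 0, -1)]))   # (1,-1,0) and (0.5, 0.5, -1)
```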

Example 4.3.11.

Consider the inner product space \(\R^3\) with the dot product. Let \(W\subseteq\R^3\) be the plane with equation \(x+y+z=0\text{.}\) Compute \(\proj{\boldv}{W}\) for each \(\boldv\) below.
  1. \(\displaystyle \boldv=(3,-2,2)\)
  2. \(\displaystyle \boldv=(2,1,-3)\)
  3. \(\displaystyle \boldv=(-7,-7,-7)\)
Solution.
According to Remark 4.3.10 our first step is to produce an orthogonal basis of \(W\text{.}\) We do so by inspection. Since \(\dim W=2\text{,}\) we simply need to find two solutions to \(x+y+z=0\) that are orthogonal to one another: e.g., \(\boldw_1=(1,-1,0)\) and \(\boldw_2=(1,1,-2)\text{.}\) Thus we choose \(B=\{ (1,-1,0), (1,1,-2)\}\) as our orthogonal basis, and our computations become a matter of applying (4.3.2), which in this case becomes
\begin{equation*} \proj{\boldv}{W}=\frac{\boldv\cdot\boldw_1}{\boldw_1\cdot \boldw_1}\boldw_1+\frac{\boldv\cdot\boldw_2}{\boldw_2\cdot \boldw_2}\boldw_2= \frac{\boldv\cdot\boldw_1}{2}\boldw_1+\frac{\boldv\cdot\boldw_2}{6}\boldw_2\text{.} \end{equation*}
Now compute:
\begin{align*} \proj{(3,-2,2)}{W} \amp =\frac{5}{2}(1,-1,0)+\frac{-3}{6}(1,1,-2)=(2,-3,1)\\ \proj{(2,1,-3)}{W} \amp =\frac{1}{2}(1,-1,0)+\frac{9}{6}(1,1,-2)=(2,1,-3)\\ \proj{(-7,-7,-7)}{W} \amp =\frac{0}{2}(1,-1,0)+\frac{0}{6}(1,1,-2)=(0,0,0)\text{.} \end{align*}
The last two computations might give you pause. Why do we have \(\proj{(2,1,-3)}{W}=(2,1,-3)\) and \(\proj{(-7,-7,-7)}{W}=(0,0,0)\text{?}\) The answer is that \((2,1,-3)\) is already an element of \(W\text{,}\) so it stands to reason that its projection is itself; and \((-7,-7,-7)\) is already orthogonal to \(W\) (it is a scalar multiple of \((1,1,1)\)), so it stands to reason that its projection is equal to \(\boldzero\text{.}\) See Exercise 4.3.6.12 for a rigorous proof of these claims.
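These computations are easy to reproduce with software. The following sketch (our own helper, assuming NumPy) implements formula (4.3.2) for the dot product, given an orthogonal basis, and applies it to the three vectors above.

```python
import numpy as np

def proj(v, orth_basis):
    """Orthogonal projection of v onto Span(orth_basis), per formula (4.3.2).
    Assumes orth_basis is orthogonal with respect to the dot product."""
    v = np.array(v, dtype=float)
    return sum((v @ w) / (w @ w) * w for w in map(np.asarray, orth_basis))

B = [np.array([1.0, -1.0, 0.0]), np.array([1.0, 1.0, -2.0])]   # orthogonal basis of W

print(proj((3, -2, 2), B))     # (2, -3,  1)
print(proj((2, 1, -3), B))     # (2,  1, -3): already in W
print(proj((-7, -7, -7), B))   # (0,  0,  0): orthogonal to W
```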

Video example: orthogonal projection in function space.

Figure 4.3.12. Video: orthogonal projection in function space
Clearly \(W\subseteq (W^\perp)^\perp\text{.}\) For the other direction, take \(\boldv\in (W^\perp)^\perp\text{.}\) Using the orthogonal projection theorem, we can write \(\boldv=\boldw+\boldw^\perp\) with \(\boldw\in W\) and \(\boldw^\perp\in W^\perp\text{.}\) We will show \(\boldw^\perp=\boldzero\text{.}\)
Since \(\boldv\in (W^\perp)^\perp\) we have \(\angvec{\boldv,\boldw^\perp}=0\text{.}\) Then we have
\begin{align*} 0\amp =\angvec{\boldv,\boldw^\perp}\\ \amp =\angvec{\boldw+\boldw^\perp,\boldw^\perp}\\ \amp =\angvec{\boldw,\boldw^\perp}+\angvec{\boldw^\perp,\boldw^\perp} \amp \text{ (since \(W\perp W^\perp\)) }\\ \amp =0+\angvec{\boldw^\perp,\boldw^\perp} \end{align*}
Thus \(\angvec{\boldw^\perp,\boldw^\perp}=0\text{.}\) It follows that \(\boldw^\perp=\boldzero\text{,}\) and hence \(\boldv=\boldw+\boldzero=\boldw\in W\text{.}\)
  1. We must show that \(\proj{c\boldv+d\boldw}{W}=c\,\proj{\boldv}{W}+d\,\proj{\boldw}{W}\) for all \(c,d\in\R\) and \(\boldv,\boldw\in V\text{.}\) We pick an orthogonal basis \(B=\{\boldv_1,\boldv_2, \dots, \boldv_r\}\) of \(W\) and compute, using formula (4.3.2):
    \begin{align*} \proj{c\boldv+d\boldw}{W} \amp=\sum_{i=1}^{r}\frac{\langle c\boldv+d\boldw, \boldv_i\rangle}{\langle\boldv_i, \boldv_i\rangle }\boldv_i \\ \amp=\sum_{i=1}^r\frac{c\langle \boldv,\boldv_i\rangle+d\langle \boldw, \boldv_i\rangle}{\langle \boldv_i, \boldv_i\rangle}\boldv_i \\ \amp =c\sum_{i=1}^r\frac{\angvec{\boldv, \boldv_i}}{\angvec{\boldv_i, \boldv_i}}\boldv_i+d\sum_{i=1}^r\frac{\angvec{\boldw, \boldv_i}}{\angvec{\boldv_i, \boldv_i}}\boldv_i\\ \amp = c\,\proj{\boldv}{W}+d\,\proj{\boldw}{W}\text{.} \end{align*}
  2. By definition we have \(\proj{\boldv}{W}\in W\) for all \(\boldv\in V\text{,}\) and thus \(\im \operatorname{proj}_W \subseteq W\text{.}\) For the other direction, if \(\boldw\in W\text{,}\) then \(\boldw=\proj{\boldw}{W}\) (Exercise 4.3.6.12), and thus \(\boldw\in \im\operatorname{proj}_W\text{.}\) This proves \(\im\operatorname{proj}_W=W\text{.}\)
    The fact that \(\NS\operatorname{proj}_W=W^\perp\) follows from the equivalence stated in (b) of Exercise 4.3.6.12.

Subsection 4.3.3 Orthogonal projection in \(\R^2\) and \(\R^3\)

For this subsection we will always work within Euclidean space: i.e., \(V=\R^n\) with the dot product. In applications we often want to compute the projection of a point onto a line (in \(\R^2\) or \(\R^3\)) or plane (in \(\R^3\)). According to Corollary 4.3.14 the operation of projecting onto any subspace \(W\subseteq \R^n\) is in fact a linear transformation \(\operatorname{proj}_W\colon \R^n\rightarrow \R^n\text{.}\) By Corollary 3.6.16 we have \(\operatorname{proj}_W=T_A\text{,}\) where
\begin{equation*} A=\begin{bmatrix} \vert \amp \vert \amp \amp \vert \\ \proj{\bolde_1}{W}\amp \proj{\bolde_2}{W}\amp \cdots \amp \proj{\bolde_n}{W} \\ \vert \amp \vert \amp \amp \vert \end{bmatrix}\text{.} \end{equation*}
Lastly, (4.3.2) gives us an easy formula for computing \(\proj{\bolde_j}{W}\) for all \(j\text{,}\) once we have selected an orthogonal basis for \(W\text{.}\) As a result we can easily derive matrix formulas for projection onto any subspace \(W\) of any Euclidean space \(\R^n\text{.}\) We illustrate this with some examples in \(\R^2\) and \(\R^3\) below.

Example 4.3.15. Projection onto a line \(\ell\subseteq \R^3\).

Any line in \(\R^3\) passing through the origin can be described as \(\ell=\Span\{\boldv\}\text{,}\) for some \(\boldv=(a,b,c)\ne 0\text{.}\) The set \(\{(a,b,c)\}\) is trivially an orthogonal basis of \(\ell\text{.}\) Using (4.3.2), we have
\begin{equation*} \proj{\boldx}{\ell}=\frac{\boldx\cdot \boldv}{\boldv\cdot\boldv}\boldv=\frac{ax+by+cz}{a^2+b^2+c^2}(a,b,c)\text{.} \end{equation*}
It follows that \(\operatorname{proj}_\ell=T_A\text{,}\) where
\begin{equation*} A=\begin{bmatrix} \vert \amp \vert \amp \vert \\ \proj{(1,0,0)}{\ell}\amp \proj{(0,1,0)}{\ell}\amp \proj{(0,0,1)}{\ell} \\ \vert \amp \vert \amp \vert \end{bmatrix}=\frac{1}{a^2+b^2+c^2}\begin{bmatrix} a^2\amp ab\amp ac\\ ab\amp b^2\amp bc\\ ac\amp bc\amp c^2 \end{bmatrix}\text{.} \end{equation*}
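In matrix form this says \(A=\frac{1}{\boldv\cdot\boldv}\boldv\boldv^T\) for \(\boldv=(a,b,c)\text{.}\) Here is a small numerical sketch (assuming NumPy; variable names are our own) using the direction vector \((1,2,1)\) of the next example.

```python
import numpy as np

v = np.array([1.0, 2.0, 1.0])      # direction vector of the line (as in the next example)
A = np.outer(v, v) / (v @ v)       # matrix of projection onto Span{v}

print(6 * A)                        # [[1,2,1],[2,4,2],[1,2,1]]
print(A @ v)                        # (1, 2, 1): vectors on the line are fixed
```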

Example 4.3.16.

Consider the line \(\ell=\Span\{(1,2,1)\}\subseteq \R^3\text{.}\)
  1. Find the matrix \(A\) such that \(\operatorname{proj}_\ell=T_A\text{.}\)
  2. Use your matrix formula from (a) to compute \(\proj{(-2,3,1)}{\ell}\text{,}\) \(\proj{(-2,-4,-2)}{\ell}\text{,}\) and \(\proj{(1,-1,1)}{\ell}\text{.}\)
  3. Compute \(d((-2,3,1), \ell)\) and \(d((-2,-4,-2), \ell)\text{.}\)
Solution.
  1. Using the general formula described in Example 4.3.15, we have
    \begin{equation*} A=\frac{1}{6}\begin{bmatrix} 1\amp 2\amp 1\\ 2\amp 4\amp 2\\ 1\amp 2\amp 1 \end{bmatrix}\text{.} \end{equation*}
  2. Now compute
    \begin{align*} \proj{(-2,3,1)}{\ell}\amp= \frac{1}{6}\begin{bmatrix} 1\amp 2\amp 1\\ 2\amp 4\amp 2\\ 1\amp 2\amp 1 \end{bmatrix} \begin{amatrix}[r] -2 \\ 3\\ 1 \end{amatrix}= \frac{1}{6}\begin{amatrix}[r] 5\\ 10\\ 5 \end{amatrix}\\ \proj{(-2,-4,-2)}{\ell}\amp= \frac{1}{6}\begin{bmatrix} 1\amp 2\amp 1\\ 2\amp 4\amp 2\\ 1\amp 2\amp 1 \end{bmatrix} \begin{amatrix}[r] -2 \\ -4\\ -2 \end{amatrix}= \frac{1}{6}\begin{amatrix}[r] -12\\ -24\\ -12 \end{amatrix}=\begin{amatrix}[r] -2\\ -4\\ -2 \end{amatrix}\\ \proj{(1,-1,1)}{\ell}\amp= \frac{1}{6}\begin{bmatrix} 1\amp 2\amp 1\\ 2\amp 4\amp 2\\ 1\amp 2\amp 1 \end{bmatrix} \begin{amatrix}[r] 1\\ -1\\ 1 \end{amatrix}= \frac{1}{6}\begin{amatrix}[r] 0\\ 0\\ 0 \end{amatrix}\text{.} \end{align*}
    The last two computations, \(\proj{(-2,-4,-2)}{\ell}=(-2,-4,-2)\) and \(\proj{(1,-1,1)}{\ell}=(0,0,0)\text{,}\) should come as no surprise, since \((-2,-4,-2)\in \ell\) and \((1,-1,1)\in \ell^\perp\text{.}\) (See Exercise 4.3.6.12.)
  3. We have
    \begin{align*} d((-2,3,1), \ell) \amp=\norm{(-2,3,1)-\proj{(-2,3,1)}{\ell}} \\ \amp = \norm{\frac{1}{6}(-17,8,1)}=\frac{\sqrt{354}}{6} \\ d((-2,-4,-2), \ell) \amp=\norm{(-2,-4,-2)-\proj{(-2,-4,-2)}{\ell}} \\ \amp = \norm{(0,0,0)}=0 \text{.} \end{align*}
    Again, the second computation should come as no surprise. Since \((-2,-4,-2)\) is itself an element of \(\ell\text{,}\) it stands to reason that its distance to \(\ell\) is equal to zero.

Example 4.3.17. Projection onto planes in \(\R^3\).

Any plane \(W\subseteq\R^3\) passing through the origin can be described as \(W=\{(x,y,z)\in \R^3\colon ax+by+cz=0\}\text{.}\) Equivalently, \(W\) is the set of all \(\boldx\in \R^3\) satisfying \(\boldx\cdot (a,b,c)=0\text{:}\) i.e., \(W=\ell^\perp\text{,}\) where \(\ell=\Span\{(a,b,c)\}\text{.}\) Consider the orthogonal decomposition with respect to \(\ell\text{:}\)
\begin{equation*} \boldx=\proj{\boldx}{\ell}+(\boldx-\proj{\boldx}{\ell})\text{.} \end{equation*}
Since \(\boldx-\proj{\boldx}{\ell}\in \ell^\perp=W\) and \(\proj{\boldx}{\ell}\in \ell=W^\perp\text{,}\) we see that this is also an orthogonal decomposition with respect to \(W\text{!}\) Using the matrix formula for \(\operatorname{proj}_\ell\) from Example 4.3.15, we have
\begin{align*} \proj{\boldx}{W} \amp =\boldx-\proj{\boldx}{\ell} \\ \amp = I\boldx - A\boldx\amp \left(A=\frac{1}{a^2+b^2+c^2}\begin{bmatrix} a^2\amp ab\amp ac\\ ab\amp b^2\amp bc\\ ac\amp bc\amp c^2 \end{bmatrix}\right)\\ \amp = (I-A)\boldx\\ \amp = \frac{1}{a^2+b^2+c^2}\begin{bmatrix}b^2+c^2\amp -ab\amp -ac\\ -ab\amp a^2+c^2\amp -bc\\ -ac\amp -bc\amp a^2+b^2 \end{bmatrix}\boldx\text{.} \end{align*}
We conclude that \(\operatorname{proj}_W=T_B\text{,}\) where
\begin{equation*} B=\frac{1}{a^2+b^2+c^2}\begin{bmatrix}b^2+c^2\amp -ab\amp -ac\\ -ab\amp a^2+c^2\amp -bc\\ -ac\amp -bc\amp a^2+b^2 \end{bmatrix}\text{.} \end{equation*}
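The identity \(\proj{\boldx}{W}=\boldx-\proj{\boldx}{\ell}\) behind this formula is easy to test numerically. Here is a short sketch (assuming NumPy, with our own variable names) using the normal vector \((1,-2,1)\) of the next example.

```python
import numpy as np

n = np.array([1.0, -2.0, 1.0])           # normal vector of the plane x - 2y + z = 0
A = np.outer(n, n) / (n @ n)             # projection onto the normal line ell
B = np.eye(3) - A                        # projection onto the plane W = ell-perp

print(6 * B)                             # [[5,2,-1],[2,2,2],[-1,2,5]]
print(B @ n)                             # (0,0,0): the normal vector projects to zero
print(B @ np.array([1.0, 1.0, 1.0]))     # (1,1,1): vectors in W are fixed
```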

Example 4.3.18.

Consider the plane \(W=\{(x,y,z)\in\R^3\colon x-2y+z=0\}\text{.}\)
  1. Find the matrix \(A\) such that \(\operatorname{proj}_W=T_A\text{.}\)
  2. Use your matrix formula from (a) to compute \(\proj{(2,1,1)}{W}\) and \(\proj{(1,1,1)}{W}\text{.}\)
  3. Compute \(d((2,1,1),W)\) and \(d((1,1,1), W)\text{.}\)
Solution.
  1. Using the general formula described in Example 4.3.17, we have
    \begin{equation*} A=\frac{1}{6}\begin{amatrix}[rrr] 5\amp 2\amp -1\\ 2\amp 2\amp 2\\ -1\amp 2\amp 5 \end{amatrix}\text{.} \end{equation*}
  2. Now compute
    \begin{align*} \proj{(2,1,1)}{W}\amp= \frac{1}{6}\begin{amatrix}[rrr] 5\amp 2\amp -1\\ 2\amp 2\amp 2\\ -1\amp 2\amp 5 \end{amatrix} \begin{amatrix}[r] 2 \\ 1\\ 1 \end{amatrix}= \frac{1}{6}\begin{amatrix}[r] 11\\ 8\\ 5 \end{amatrix}\\ \proj{(1,1,1)}{W}\amp= \frac{1}{6}\begin{amatrix}[rrr] 5\amp 2\amp -1\\ 2\amp 2\amp 2\\ -1\amp 2\amp 5 \end{amatrix} \begin{amatrix}[r] 1 \\ 1\\ 1 \end{amatrix}= \frac{1}{6}\begin{amatrix}[r] 6\\ 6\\ 6 \end{amatrix}=\begin{amatrix}[r] 1\\ 1\\ 1 \end{amatrix}\text{.} \end{align*}
    The second result should come as no surprise: \((1,1,1)\) satisfies \(x-2y+z=0\text{,}\) so it lies in \(W\) and projects to itself.
  3. We have
    \begin{align*} d((2,1,1), W) \amp = \norm{(2,1,1)-\proj{(2,1,1)}{W}}\\ \amp =\norm{\frac{1}{6}(1,-2,1)}=\frac{\sqrt{6}}{6} \\ d((1,1,1), W) \amp = \norm{(1,1,1)-\proj{(1,1,1)}{W}}\\ \amp =\norm{(0,0,0)}=0 \text{.} \end{align*}

Subsection 4.3.4 Trigonometric polynomial approximation

Consider the inner product space consisting of \(C([0,2\pi])\) along with the integral inner product \(\langle f, g\rangle=\int_0^{2\pi}f(x)g(x) \, dx\text{.}\) In Example 4.2.4 we saw that the set
\begin{equation*} B=\{1, \cos(x),\sin(x),\cos(2x),\sin(2x), \dots , \cos(nx),\sin(nx)\} \end{equation*}
is orthogonal with respect to this inner product. Thus \(B\) is an orthogonal basis of
\begin{equation*} \Span B=\{g\in C([0,2\pi])\colon g(x)=a_0+\sum_{k=1}^na_k\cos kx +b_k\sin kx \text{ for some } a_i, b_i\in \R\}\text{.} \end{equation*}
We call \(W=\Span B\) the space of trigonometric polynomials of degree at most \(n\).
Since \(B\) is an orthogonal basis of \(W\text{,}\) given an arbitrary function \(f(x)\in C[0,2\pi]\text{,}\) its orthogonal projection \(\hat{f}=\proj{f}{W}\) is given by
\begin{equation*} \hat{f}(x)=a_0+a_1\cos(x)+b_1\sin(x)+a_2\cos(2x)+b_2\sin(2x)+\cdots +a_n\cos(nx)+b_n\sin(nx)\text{,} \end{equation*}
where
\begin{equation*} a_0=\frac{1}{2\pi}\int_0^{2\pi} f(x) \ dx, \ a_j=\frac{1}{\pi}\int_0^{2\pi}f(x)\cos(jx)\, dx, \ b_k=\frac{1}{\pi}\int_0^{2\pi}f(x)\sin(kx)\, dx\text{.} \end{equation*}
Here we are using (4.3.2), as well as the inner product formulas \(\angvec{1,1}=2\pi\) and \(\angvec{\cos n x, \cos n x}=\angvec{\sin n x, \sin n x}=\pi\) from Example 4.2.4.
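As a rough numerical illustration (a sketch of our own, assuming NumPy and approximating the integrals by a simple Riemann sum), here is how one might compute these coefficients for the sample choice \(f(x)=x\) and \(n=3\text{:}\)

```python
import numpy as np

f = lambda x: x                          # illustrative choice of f on [0, 2*pi]
n = 3                                    # degree of the trigonometric approximation

x = np.linspace(0.0, 2.0 * np.pi, 20001)
dx = x[1] - x[0]
integrate = lambda values: np.sum(values) * dx   # crude Riemann-sum approximation

a0 = integrate(f(x)) / (2.0 * np.pi)
a = [integrate(f(x) * np.cos(j * x)) / np.pi for j in range(1, n + 1)]
b = [integrate(f(x) * np.sin(k * x)) / np.pi for k in range(1, n + 1)]

print(a0)   # approximately pi
print(a)    # approximately [0, 0, 0]
print(b)    # approximately [-2, -1, -2/3]
```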
What is the relationship between \(f\) and \(\hat{f}\text{?}\) Theorem 4.3.9 tells us that \(\hat{f}\) is the “best” trigonometric polynomial approximation of \(f(x)\) of degree at most \(n\) in the following sense: given any other trigonometric polynomial \(g\in W\text{,}\) we have
\begin{equation*} \left\vert\left\vert f-\hat{f}\right\vert\right\vert\leq \norm{f-g}\text{.} \end{equation*}
Unpacking the definition of norm in this inner product space, we conclude that
\begin{equation*} \int_0^{2\pi} (f-\hat{f})^2\, dx\leq \int_0^{2\pi} (f-g)^2 \, dx \end{equation*}
for all \(g\in W\text{.}\)
Thus, given a continuous function \(f\) on \([0,2\pi]\text{,}\) linear algebra shows us how to find its best trigonometric polynomial approximation of the form
\begin{equation*} g(x)=a_0+\sum_{k=1}^na_k\cos kx +b_k\sin kx\text{.} \end{equation*}
However, linear algebra does not tell us just how good this approximation is. This question, among others, is tackled by another mathematical theory: Fourier analysis. There we learn that the trigonometric polynomial approximations get arbitrarily close to \(f\) as we let \(n\) increase. More precisely, letting \(\hat{f}_n\) be the orthogonal projection of \(f\) onto the space of trigonometric polynomials of degree at most \(n\text{,}\) we have
\begin{equation*} \lim_{n\to\infty}\norm{f-\hat{f}_n}=0\text{.} \end{equation*}

Subsection 4.3.5 Least-squares solution to linear systems

In statistics we often wish to approximate a scatter plot of points \(P_i=(X_i, Y_i)\text{,}\) \(1\leq i\leq m\text{,}\) with a line \(\ell\colon y=mx+b\) that “best fits” the data. “Finding” this line amounts to finding the appropriate slope \(m\) and \(y\)-intercept \(b\text{:}\) i.e., in this setup, the points \(P_i=(X_i, Y_i)\) are given, and \(m\) and \(b\) are the unknowns we wish to find. For the line to perfectly fit the data, we would want
\begin{equation*} Y_i=mX_i+b \text{ for all } 1\leq i\leq m\text{.} \end{equation*}
In other words \((m,b)\) would be a solution to the matrix equation \(A\boldx=\boldy\text{,}\) where
\begin{equation*} \boldx=\begin{bmatrix}m \\ b\end{bmatrix},A=\begin{bmatrix} X_1\amp 1\\ X_2\amp 1\\ \vdots \amp \vdots \\ X_m\amp 1 \end{bmatrix}, \boldy=\begin{bmatrix} Y_1\\ Y_2\\ \vdots \\ Y_m\end{bmatrix}\text{.} \end{equation*}
Of course in most situations the provided points do not lie on a line, and thus there is no solution \(\boldx\) to the given matrix equation \(A\boldx=\boldy\text{.}\) When this is the case we can use the theory of orthogonal projection to find what is called a least-squares solution, which we now describe in detail.
The least-squares method applies to any matrix equation
\begin{equation} \underset{m\times n}{A}\, \underset{n\times 1}{\boldx}=\underset{m\times 1}{\boldy}\text{,}\tag{4.3.5} \end{equation}
where \(A\) and \(\boldy\) are given, and \(\boldx\) is treated as an unknown vector. Recall that
\begin{align*} A\boldx=\boldy \text{ has a solution } \amp\iff \boldy\in \CS A \amp (\knowl{./knowl/th_fundspaces_matrixtransform.html}{\text{Theorem 3.8.6}}) \text{.} \end{align*}
When \(\boldy\notin \CS A\text{,}\) and hence (4.3.5) does not have a solution, the least-squares method proceeds by replacing \(\boldy\) with the element of \(W=\CS A\) closest to it: that is, with its orthogonal projection onto \(W\text{.}\) Let \(\hat{\boldy}=\proj{\boldy}{W}\text{,}\) where orthogonal projection is taken with respect to the dot product on \(\R^m\text{,}\) and consider the adjusted matrix equation
\begin{equation} A\boldx=\hat{\boldy}\text{.}\tag{4.3.6} \end{equation}
By definition of \(\operatorname{proj}_W\text{,}\) we have \(\hat{\boldy}\in W=\CS A\text{,}\) and thus there is a solution \(\hat{\boldx}\) to (4.3.6). We call \(\hat{\boldx}\) a least-squares solution to (4.3.5). Observe that \(\hat{\boldx}\) does not necessarily satisfy \(A\hat{\boldx}=\boldy\text{;}\) rather, it satisfies \(A\hat{\boldx}=\hat{\boldy}\text{.}\) What makes this a “least-squares” solution is that \(A\hat{\boldx}=\hat{\boldy}\) is the element of \(W=\CS A\) closest to \(\boldy\text{.}\) With respect to the dot product, this means that a least-squares solution \(\hat{\boldx}\) minimizes the quantity
\begin{equation*} \norm{\boldy-A\boldx}=\sqrt{(y_1-y_1')^2+(y_2-y_2')^2+\cdots +(y_m-y_m')^2}\text{,} \end{equation*}
among all \(\boldx\in \R^n\text{,}\) where \(\boldy=(y_1,y_2,\dots,y_m)\) and \(A\boldx=(y_1',y_2',\dots,y_m')\text{.}\)

Example 4.3.19. Best fitting line.

Suppose we wish to find a line \(\ell\colon y=mx+b\) that best fits (in the least-square sense) the following data points: \(P_1=(-3,1), P_2=(1,2), P_3=(2,3)\text{.}\) Following the discussion above, we seek a solution \(\boldx=(m,b)\) to the matrix equation \(A\boldx=\boldy\text{,}\) where
\begin{equation*} \boldx=\begin{bmatrix}m \\ b \end{bmatrix}, A=\begin{amatrix}[rr]-3\amp 1\\ 1\amp 1\\ 2\amp 1 \end{amatrix} , \boldy=\begin{bmatrix}1\\ 2\\ 3 \end{bmatrix}\text{.} \end{equation*}
Using Gaussian elimination, we see easily that this equation has no solution: equivalently, \(\boldy\notin W=\CS A\text{.}\) Accordingly, we compute \(\hat{\boldy}=\proj{\boldy}{W}\) and find a solution to \(A\hat{\boldx}=\hat{\boldy}\text{.}\) Conveniently, the set \(B=\{(-3,1,2), (1,1,1)\}\) is already an orthogonal basis of \(W=\CS A\text{,}\) allowing us to use (4.3.2):
\begin{equation*} \hat{\boldy}=\frac{\boldy\cdot (-3,1,2)}{(-3,1,2)\cdot (-3,1,2)}(-3,1,2)+\frac{\boldy\cdot(1,1,1)}{(1,1,1)\cdot (1,1,1)}(1,1,1)=\frac{1}{14}(13, 33, 38)\text{.} \end{equation*}
Lastly, solving \(A\hat{\boldx}=\hat{\boldy}\) yields \((m,b)=\hat{\boldx}=(5/14, 2)\text{,}\) and we conclude the line \(\ell\colon y=(5/14)x+2\) is the one that best fits the data in the least-squares sense.
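As a numerical check (an informal sketch of our own; `numpy.linalg.lstsq` returns a least-squares solution in precisely the sense described above):

```python
import numpy as np

A = np.array([[-3.0, 1.0],
              [ 1.0, 1.0],
              [ 2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

x_hat, residual, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(x_hat)      # approximately (0.3571..., 2.0) = (5/14, 2)
print(A @ x_hat)  # y-hat, approximately (13/14, 33/14, 38/14)
```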

Remark 4.3.20. Visualizing least-squares.

Figure 4.3.21 helps us give a graphical interpretation of how the line \(\ell\colon y=(5/14)x+2\) best approximates the points \(P_1=(-3,1), P_2=(1,2), P_3=(2,3)\text{.}\)
Figure 4.3.21. Least-squares visualization
Let \(\boldy=(1,2,3)=(y_1,y_2,y_3)\) be the given \(y\)-values of the points, and let \(\hat{\boldy}=(y_1',y_2',y_3')\) be the orthogonal projection of \(\boldy\) onto \(\CS A\text{.}\) In the graph the values \(\epsilon_i\) denote the vertical differences \(\epsilon_i=y_i-y_i'\) between the data points and our fitting line. The projection \(\hat{\boldy}\) makes the error \(\norm{\boldy-\hat{\boldy}}=\sqrt{ \epsilon_1^2+\epsilon_2^2+\epsilon_3^2}\) as small as possible. This means that if we draw any other line and compute the corresponding differences \(\epsilon_i'\) at the \(x\)-values \(-3\text{,}\) \(1\) and \(2\text{,}\) then
\begin{equation*} \epsilon_1^2+\epsilon_2^2+\epsilon_3^2\leq (\epsilon_1')^2+(\epsilon_2')^2+(\epsilon_3')^2\text{.} \end{equation*}
To compute a least-squares solution to \(A\boldx=\boldy\) we must first compute the orthogonal projection of \(\boldy\) onto \(W=\CS A\text{;}\) and this in turn requires first producing an orthogonal basis of \(\CS A\text{,}\) which may require using the Gram-Schmidt procedure. The following result bypasses these potentially onerous steps by characterizing a least-squares solution to \(A\boldx=\boldy\) as a solution to the matrix equation
\begin{equation*} A^TA\boldx=A^T\boldy\text{.} \end{equation*}
Let \(W=\CS A\text{,}\) and let \(\hat{\boldy}=\proj{\boldy}{W}\text{.}\) The key observation is that a vector \(\hat{\boldx}\) satisfies \(A\hat{\boldx}=\hat{\boldy}\) if and only if
\begin{equation*} \boldy=A\hat{\boldx}+(\boldy-A\hat{\boldx}) \end{equation*}
is an orthogonal decomposition of \(\boldy\) with respect to \(W=\CS A\text{;}\) and this is true if and only if \(\boldy-A\hat{\boldx}\in (\CS A)^\perp\text{.}\) Thus we have
\begin{align*} A\hat{\boldx}=\hat{\boldy} \amp\iff \boldy-A\hat{\boldx}\in (\CS A)^\perp \\ \amp\iff \boldy-A\hat{\boldx}\in \NS A^T \amp ((\CS A)^\perp=\NS A^T, \knowl{./knowl/th_row_null_comp.html}{\text{Theorem 4.3.7}}) \\ \amp\iff A^T(\boldy-A\hat{\boldx})=\boldzero \\ \amp \iff A^T\boldy-A^TA\hat{\boldx}=\boldzero\\ \amp \iff A^TA\hat{\boldx}=A^T\boldy\text{.} \end{align*}

Example 4.3.23.

Consider again the matrix equation \(A\boldx=\boldy\) from Example 4.3.19. According to Theorem 4.3.22 the least-squares solution can be found by solving the equation \(A^TA\boldx=A^T\boldy\) for \(\boldx\text{.}\) We compute
\begin{align*} A^TA\amp=\begin{amatrix}[rr]14\amp 0\\ 0\amp 3 \end{amatrix} \amp A^T\boldy\amp =\begin{amatrix}[r] 5\\ 6 \end{amatrix} \end{align*}
and solve
\begin{equation*} \begin{amatrix}[rr]14\amp 0\\ 0\amp 3 \end{amatrix}\boldx=\begin{amatrix}[r] 5\\ 6 \end{amatrix}\iff \boldx=(5/14, 2), \end{equation*}
just as before.
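The normal equations are equally easy to assemble and solve with software; a minimal sketch (assuming NumPy, with our own variable names):

```python
import numpy as np

A = np.array([[-3.0, 1.0],
              [ 1.0, 1.0],
              [ 2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ y)   # normal equations A^T A x = A^T y
print(x_hat)                                 # (5/14, 2) = (0.3571..., 2.0)
```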

Exercises 4.3.6 Exercises


In each exercise below you are given an inner product space \(V\text{,}\) a subspace \(W=\Span B\) where \(B\) is orthogonal, and a vector \(\boldv\in V\text{.}\) Compute \(\proj{\boldv}{W}\text{.}\)
1.
\(V=\R^4\) with the dot product; \(W=\Span\{(1,1,1,1),(1,-1,1,-1), (1,1,-1,-1)\}\text{;}\) \(\boldv=(2,3,-1,1)\)
2.
\(V=\R^3\) with the weighted dot product with weights \(k_1=1, k_2=2, k_3=1\text{;}\) \(W=\Span\{(1,1,1), (1,-1,1),(1,0,-1)\}\text{;}\) \(\boldv=(1,2,3)\)
3.
\(V=C([0,2\pi])\) with the integral inner product; \(W=\Span\{\cos x, \cos 2x, \sin x\}\text{;}\) \(f(x)=3\) for all \(x\in [0,2\pi]\)

4.

Let \(\mathcal{P}\subseteq \R^3\) be the plane passing through the origin with normal vector \(\boldn=(1,2,-1)\text{.}\) Find the orthogonal projection of \((1,1,1)\) onto \(\mathcal{P}\) with respect to the dot product.

5.

Recall that the trace \(\tr A\) of a square matrix is the sum of its diagonal entries. Let \(V=M_{22}\) with inner product \(\angvec{A,B}=\tr(A^TB)\text{.}\) (You may take for granted that this operation is indeed an inner product on \(M_{22}\text{.}\)) Define \(W=\{A\in M_{22}\colon \tr A=0\}\text{.}\)
  1. Compute an orthogonal basis for \(W\text{.}\) You can do this either by inspection (the space is manageable), or by starting with any basis of \(W\) and applying the Gram-Schmidt procedure.
  2. Compute \(\proj{A}{W}\text{,}\) where
    \begin{equation*} A=\begin{bmatrix}1\amp 2\\ 1\amp 1 \end{bmatrix}\text{.} \end{equation*}

6.

Let \(V=C([0,1])\) with the integral inner product, and let \(f(x)=x\text{.}\) Find the function of the form \(g(x)=a+b\cos(2\pi x)+c\sin(2\pi x)\) that “best approximates” \(f(x)\) in terms of this inner product: i.e., find the \(g(x)\) of this form that minimizes \(d(g,f)\text{.}\)
Hint.
The set \(S=\{1, \cos(2\pi x), \sin(2\pi x)\}\) is orthogonal with respect to the given inner product.

7.

Let \((V, \langle , \rangle )\) be an inner product space, let \(S=\{\boldw_1, \boldw_2, \dots, \boldw_r\}\subseteq V\text{,}\) and let \(W=\Span S\text{.}\) Prove:
\begin{equation*} \boldv\in W^\perp \text{ if and only if } \langle \boldv,\boldw_i \rangle=0 \text{ for all } 1\leq i\leq r\text{.} \end{equation*}
In other words, to check whether an element is in \(W^\perp\text{,}\) it suffices to check that it is orthogonal to each element of its spanning set \(S\text{.}\)

8.

Consider the inner product space \(\R^4\) together with the dot product. Let
\begin{equation*} W=\{(x_1,x_2,x_3,x_4)\in \R^4\colon x_1=x_3 \text{ and } x_2=x_4\}. \end{equation*}
Provide orthogonal bases for \(W\) and \(W^\perp\text{.}\)

10. Dimension of \(W^\perp\).

Prove statement (3) of Theorem 4.3.5: if \((V, \ \angvec{\ , \ })\) is an inner product space of dimension \(n\text{,}\) and \(W\) is a subspace of \(V\text{,}\) then
\begin{equation*} \dim W+\dim W^\perp=n\text{.} \end{equation*}
Hint.
By Corollary 4.2.10 there is an orthogonal basis \(B=\{\boldv_1,\dots ,\boldv_r\}\) of \(W\text{,}\) and furthermore, we can extend \(B\) to an orthogonal basis \(B'=\{\boldv_1,\boldv_2, \dots, \boldv_r, \boldu_1,\dots , \boldu_{n-r}\}\) of all of \(V\text{.}\) Show the \(\boldu_i\) form a basis for \(W^\perp\text{.}\)

12.

Let \(V\) be an inner product space, and let \(W\subseteq V\) be a finite-dimensional subspace. Prove the following statements:
  1. \(\boldv\in W\) if and only if \(\proj{\boldv}{W}=\boldv\text{;}\)
  2. \(\boldv\in W^\perp\) if and only if \(\proj{\boldv}{W}=\boldzero\text{.}\)

13.

We consider the problem of fitting a collection of data points \((x,y)\) with a quadratic curve of the form \(y=f(x)=ax^2+bx+c\text{.}\) Thus we are given some collection of points \((x,y)\text{,}\) and we seek parameters \(a, b, c\) for which the graph of \(f(x)=ax^2+bx+c\) “best fits” the points in some way.
  1. Show, using linear algebra, that if we are given any three points \((x,y)=(r_1,s_1), (r_2,s_2), (r_3,s_3)\text{,}\) where the \(x\)-coordinates \(r_i\) are all distinct, then there is a unique choice of \(a,b,c\) such that the corresponding quadratic function agrees precisely with the data. In other words, given just about any three points in the plane, there is a unique quadratic curve connecting them.
  2. Now suppose we are given the four data points
    \begin{equation*} P_1=(0,2), P_2=(1,0), P_3=(2,2), P_4=(3,6)\text{.} \end{equation*}
    1. Use the least-squares method described in the lecture notes to come up with a quadratic function \(y=f(x)\) that “best fits” the data.
    2. Graph the function \(f\) you found, along with the points \(P_i\text{.}\) (You may want to use technology.) Use your graph to explain precisely in what sense \(f\) “best fits” the data.