$p=\hat x a$
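Projecting $b$ onto the line through $a$ leaves the error $e = b - \hat x a$. This error is perpendicular to $a$: $a\cdot(b-\hat x a)=0$, so $\hat x=\frac{a\cdot b}{a\cdot a}=\frac{a^Tb}{a^Ta}$.
$p=a\hat x=a\frac{a^Tb}{a^Ta}=Pb$, where the projection matrix is $P=\frac{aa^T}{a^Ta}$.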
When $P$ is a projection, so is $I-P$: when $P$ projects onto one subspace, $I-P$ projects onto the perpendicular subspace.
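
As a quick check of the formulas above, here is a minimal sketch in plain Python. The vectors $a=(1,2,2)$ and $b=(1,1,1)$ and the helper names (`dot`, `project_onto_line`, `projection_matrix`) are illustrative assumptions, not part of the notes. Exact fractions are used so the perpendicularity $a\cdot e=0$ holds without rounding.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_line(a, b):
    """Return (x_hat, p, e) for the projection of b onto the line through a.

    Assumes a is nonzero and has integer entries (so Fraction accepts them).
    """
    x_hat = Fraction(dot(a, b), dot(a, a))   # x_hat = a^T b / a^T a
    p = [x_hat * ai for ai in a]             # p = x_hat * a
    e = [bi - pi for bi, pi in zip(b, p)]    # e = b - p
    return x_hat, p, e

def projection_matrix(a):
    """P = a a^T / (a^T a), returned as a list of rows."""
    aa = dot(a, a)
    return [[Fraction(ai * aj, aa) for aj in a] for ai in a]

def matvec(M, x):
    """Matrix-vector product M x."""
    return [dot(row, x) for row in M]

# Hypothetical example vectors, chosen only for illustration.
a = [1, 2, 2]
b = [1, 1, 1]

x_hat, p, e = project_onto_line(a, b)
P = projection_matrix(a)

# I - P projects onto the subspace perpendicular to a.
n = len(a)
I_minus_P = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]

print("x_hat =", x_hat)
print("p     =", p)
print("e     =", e)
```

In this sketch `matvec(P, b)` reproduces `p` and `matvec(I_minus_P, b)` reproduces `e`, which matches the statement that $P$ and $I-P$ project onto perpendicular subspaces.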
Start with $n$ vectors $a_1, \ldots, a_n$ in $R^m$. Assume that these $a$'s are linearly independent. Problem: find the combination $p = x_1 a_1 + \cdots + x_n a_n$ closest to a given vector $b$. With the $a_i$ as the columns of $A$, this combination is $p=A\hat x$.
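Geometrically, the error $b-A\hat x$ is perpendicular to every column $a_i$, so $a^T_i(b-A\hat x)=0$ for $i=1,\ldots,n$.
Stacking these $n$ equations gives $A^T(b-A\hat x)=0$, that is, $A^TA\hat x=A^Tb$. Hence $\hat x=(A^TA)^{-1}A^Tb$, $p=A\hat x=A(A^TA)^{-1}A^Tb$, and $P=A(A^TA)^{-1}A^T$.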
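
The equation $A^TA\hat x=A^Tb$ can be sketched numerically as follows. This is an illustrative sketch only: the matrix `A`, the vector `b`, and the helpers `transpose`, `matmul`, `solve`, and `project_onto_columns` are assumptions made for the example, and `solve` is a simple Gauss-Jordan elimination that assumes $A^TA$ is invertible (independent columns).

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Matrix product X Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with exact fractions.

    Assumes M is square and invertible; a singular M raises StopIteration.
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def project_onto_columns(A, b):
    """Return (x_hat, p, e) where A^T A x_hat = A^T b, p = A x_hat, e = b - p."""
    At = transpose(A)
    x_hat = solve(matmul(At, A), matvec(At, b))
    p = matvec(A, x_hat)
    e = [bi - pi for bi, pi in zip(b, p)]
    return x_hat, p, e

# Hypothetical example: two independent columns in R^3.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]

x_hat, p, e = project_onto_columns(A, b)
print("x_hat =", x_hat)
print("p     =", p)
print("e     =", e)
print("A^T e =", matvec(transpose(A), e))
```

The last printed vector is $A^Te$, which should be zero because the error is perpendicular to every column of $A$.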
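In linear algebra terms, the same equation follows in a quick and beautiful way: the error $b-A\hat x$ is perpendicular to the column space of $A$, so it lies in the nullspace of $A^T$, which means $A^T(b-A\hat x)=0$.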
Proof that $A^T A$ is invertible when the columns of $A$ are linearly independent:
$A^T A$ is a square matrix $(n\times n)$. For every matrix $A$, we will now show that $A^T A$ has the same nullspace as $A$. When the columns of $A$ are linearly independent, its nullspace contains only the zero vector. Then $A^T A$, with this same nullspace, is invertible.
Let $A$ be any matrix. If $x$ is in its nullspace, then $Ax = 0$. Multiplying by $A^T$ gives $A^T Ax = 0$.
So $x$ is also in the nullspace of $A^T A$. Now start with the nullspace of $A^T A$. From $A^T Ax = 0$, we must prove $Ax = 0$. We can't multiply by $(A^T)^{-1}$, which generally doesn't exist. Just multiply by $x^T$:
$x^T A^T Ax = 0$, or $(Ax)^T(Ax) = 0$, or $||Ax||^2=0$. We have shown: if $A^T Ax = 0$, then $Ax$ has length zero. Therefore $Ax = 0$. Every vector $x$ in one nullspace is in the other nullspace. If $A^T A$ has dependent columns, so has $A$. If $A^T A$ has independent columns, so has $A$. This is the good case: $A^T A$ is invertible.
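
The key identity $x^TA^TAx=||Ax||^2$ and the shared nullspace can be spot-checked on a small example. This is only a sketch under assumed data: the matrix `A_dep` (deliberately given dependent columns), the test vectors, and the helper names are hypothetical and chosen for illustration.

```python
def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def gram(A):
    """A^T A, computed from the dot products of the columns of A."""
    cols = list(zip(*A))
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

def quadratic_form(A, x):
    """x^T (A^T A) x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(gram(A), x)))

def squared_length(v):
    """||v||^2."""
    return sum(vi * vi for vi in v)

# Hypothetical matrix whose second column is twice the first (dependent columns).
A_dep = [[1, 2], [2, 4], [3, 6]]
x_null = [2, -1]    # in the nullspace of A_dep: 2*(col 1) - (col 2) = 0
x_other = [1, 1]    # not in the nullspace

print("A x_null      =", matvec(A_dep, x_null))
print("A^T A x_null  =", matvec(gram(A_dep), x_null))
print("x^T A^T A x   =", quadratic_form(A_dep, x_other))
print("||A x||^2     =", squared_length(matvec(A_dep, x_other)))
```

Here `x_null` is sent to zero by both $A$ and $A^TA$, while for `x_other` the quadratic form and the squared length agree, as the proof requires.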