Solve $\min \frac{1}{2}\lVert Ax-b\rVert^2$.
Check whether $x^*$ solves it via the normal equations $A^TAx^* = A^Tb$,
i.e., $A^T(Ax^*-b) = 0$.
In practice, with residual $r^* = Ax^* - b$, stop when $\lVert A^Tr^*\rVert \le \varepsilon$.
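A minimal sketch of this check in NumPy; the helper name, tolerance, and test data are illustrative, not from the notes:

```python
import numpy as np

def is_least_squares_solution(A, b, x_star, eps=1e-8):
    """Test the optimality condition ||A^T (A x* - b)|| <= eps."""
    r_star = A @ x_star - b          # residual at the candidate point
    return np.linalg.norm(A.T @ r_star) <= eps

# Illustrative data: x* from the pseudoinverse should pass the test.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
x_star = np.linalg.pinv(A) @ b
print(is_least_squares_solution(A, b, x_star))   # True
```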
$\min_{x \in \mathbb{R}^n} f(x)$, where $f:\mathbb{R}^n \to \mathbb{R}$ is smooth
(later: constrained problems, $x \in C \subseteq \mathbb{R}^n$)
Optimality
$\bar{x}$ is a …
- local min of $f$ if $f(\bar{x}) \le f(x)\ \forall x \in B_\varepsilon(\bar{x})$ for some small $\varepsilon > 0$,
  where $B_\varepsilon(\bar{x}) = \{x \mid \lVert x - \bar{x}\rVert_2 \le \varepsilon\}$
- global min of $f$ if $f(\bar{x}) \le f(x)\ \forall x \in \mathbb{R}^n$
$\bar{x}$ is a maximizer of $f$ $\iff$ $\bar{x}$ is a minimizer of $-f$
Ex (lack of attainment): $\lim_{x \to \infty} e^{-x} = 0$ and $\inf_x e^{-x} = 0$, but no $x$ attains the infimum; $e^{-x}$ is not coercive.
Definition: a function $f$ is coercive if $\lim_{\lVert x\rVert \to \infty} f(x) = \infty$.
Equivalently, the level sets $\{x \mid f(x) \le \tau\}$ are compact (closed & bounded) for every $\tau \in \mathbb{R}$.
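For instance, working directly from the definition: for $f(x) = e^{-x}$ on $\mathbb{R}$ and any $\tau \in (0, 1]$, the level set is $\{x \mid e^{-x} \le \tau\} = [-\ln \tau, \infty)$, which is unbounded, so $e^{-x}$ is not coercive and its infimum $0$ is never attained. By contrast, $f(x) = x^2$ has compact level sets $[-\sqrt{\tau}, \sqrt{\tau}]$ for $\tau \ge 0$, is coercive, and attains its minimum at $x = 0$.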
First-order necessary condition
$\bar{x}$ is a local min of $f:\mathbb{R}^n \to \mathbb{R}$ only if $\nabla f(\bar{x}) = 0$.
$[\,\bar{x} \text{ local min} \to \nabla f(\bar{x}) = 0\,]$
$[\,\bar{x} \text{ local max} \to \nabla f(\bar{x}) = 0\,]$
Proof sketch: to first order,
$f(\bar{x} + \alpha d) - f(\bar{x}) = \nabla f(\bar{x})^T(\alpha d) + o(\alpha \lVert d\rVert)$.
Because $\bar{x}$ is a local min,
$0 \le \lim_{\alpha \to 0^+} \frac{f(\bar{x} + \alpha d) - f(\bar{x})}{\alpha} = \lim_{\alpha \to 0^+} \left( \nabla f(\bar{x})^T d + \frac{o(\alpha \lVert d\rVert)}{\alpha} \right) = \nabla f(\bar{x})^T d$.
This holds for every direction $d$; applying it to both $d$ and $-d$ forces $\nabla f(\bar{x})^T d = 0$ for all $d$, hence $\nabla f(\bar{x}) = 0$.
2nd-order sufficient condition
$\bar{x}$ is a local min of $f:\mathbb{R}^n \to \mathbb{R}$ if
- $\nabla f(\bar{x}) = 0$ (stationary), and
- $\nabla^2 f(\bar{x}) > 0$ (Hessian positive definite).
$[\,\nabla f(\bar{x}) = 0 \text{ and } \nabla^2 f(\bar{x}) > 0 \Rightarrow \bar{x} \text{ local min}\,]$
(Sketch: a 1-D function with several stationary points, classified by $f''$)
- $f''(x_{\mathrm{global}}) > 0$
- $f''(x_{\mathrm{local}}) > 0$
- $f''(x'_{\mathrm{local}}) = 0$
- $f''(x_{\mathrm{max}}) < 0$
Ex: $f(x) = x^4$, with $f'(0) = 0$ and $f''(0) = 0$: the sufficient condition fails at $x = 0$, yet $0$ is a global minimizer, so the condition is not necessary.
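A minimal sketch of applying the second-order test numerically; the classifier and example Hessians are assumptions for illustration, not part of the notes:

```python
import numpy as np

def classify_stationary_point(hess):
    """Classify a stationary point from its (symmetric) Hessian's eigenvalues."""
    eig = np.linalg.eigvalsh(hess)
    if np.all(eig > 0):
        return "local min (sufficient condition holds)"
    if np.all(eig < 0):
        return "local max"
    if np.any(eig < 0) and np.any(eig > 0):
        return "saddle point"
    return "inconclusive (some eigenvalue is zero)"

# f(x) = x^4 at x = 0: f''(0) = 0, so the test is inconclusive,
# even though 0 is in fact a global minimizer.
print(classify_stationary_point(np.array([[0.0]])))

# f(x, y) = x^2 + 3y^2 at (0, 0): Hessian diag(2, 6) > 0 => local min.
print(classify_stationary_point(np.diag([2.0, 6.0])))
```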
Definition: a square symmetric matrix $A$ is positive semi-definite (i.e., $A \ge 0$) if the following equivalent conditions hold:
- $x^TAx \ge 0\ \forall x \in \mathbb{R}^n$
- $\lambda_i(A) \ge 0\ \forall i = 1, \dots, n$
- $A = R^TR$ for some $R$ (a Cholesky-type factorization exists)
Positive definite ($A > 0$):
- $x^TAx > 0\ \forall x \ne 0$
- $\lambda_i(A) > 0\ \forall i$
- $A = R^TR$ with $R$ non-singular
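A minimal sketch of checking these equivalent conditions in NumPy (function names are illustrative); `np.linalg.cholesky` raising `LinAlgError` signals that the matrix is not positive definite:

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check A > 0 via the eigenvalues of a symmetric matrix."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def cholesky_says_pd(A):
    """np.linalg.cholesky succeeds iff A is (numerically) positive definite."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3
print(is_positive_definite(A), cholesky_says_pd(A))   # True True

B = np.array([[1.0, 2.0], [2.0, 1.0]])     # eigenvalues 3 and -1
print(is_positive_definite(B), cholesky_says_pd(B))   # False False
```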
Proof sketch: to second order,
$f(\bar{x} + \alpha d) - f(\bar{x}) = \alpha \underbrace{\nabla f(\bar{x})^T}_{0} d + \underbrace{\tfrac{1}{2}\alpha^2 d^T \nabla^2 f(\bar{x}) d}_{> 0} + o(\alpha^2 \lVert d\rVert^2)$.
Take $\alpha$ "small": the positive quadratic term dominates the $o(\alpha^2 \lVert d\rVert^2)$ remainder, so $f(\bar{x} + \alpha d) > f(\bar{x})$ for every $d \ne 0$
$\Rightarrow \bar{x}$ is a local min.
2nd-order necessary condition
$\bar{x}$ is a local min of $f:\mathbb{R}^n \to \mathbb{R}$ only if
- $\nabla f(\bar{x}) = 0$ (stationary), and
- $\nabla^2 f(\bar{x}) \ge 0$ (positive semi-definite).
Eigenpair $(x, \lambda)$ of a square symmetric $A$: $Ax = \lambda x$, so for $\lVert x\rVert = 1$,
$x^TAx = \lambda \underbrace{x^Tx}_{1} = \lambda$.
($\lambda > 0$ means $A$ does not flip the direction of $x$: $Ax$ points the same way as $x$.)
Ex (nonlinear least squares): $\min \frac{1}{2}\lVert r(x)\rVert^2 = \frac{1}{2}\sum_{i=1}^m r_i(x)^2$, with $r_i:\mathbb{R}^n \to \mathbb{R}$.
$f(x) = \frac{1}{2}\lVert r(x)\rVert^2 = \frac{1}{2}r_1(x)^2 + \frac{1}{2}r_2(x)^2 + \dots$
$$\begin{aligned}\nabla f(x) &= \nabla r_1(x)\, r_1(x) + \nabla r_2(x)\, r_2(x) + \dots\\ &= \begin{bmatrix}\mid & \mid & & \mid\\ \nabla r_1(x) & \nabla r_2(x) & \ldots & \nabla r_m(x)\\ \mid & \mid & & \mid\end{bmatrix} \begin{bmatrix}r_1(x)\\ r_2(x)\\ \vdots\\ r_m(x)\end{bmatrix} = J(x)^T r(x)\end{aligned}$$
Stop when $\lVert J(x)^T r(x)\rVert = \lVert \nabla f(x)\rVert$ is small.
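A minimal sketch of this gradient computation and stopping test; the residual function and Jacobian below are an assumed toy example:

```python
import numpy as np

def residual(x):
    # Illustrative residuals r_i: R^2 -> R, with m = 2.
    return np.array([x[0]**2 - x[1], x[0] - 1.0])

def jacobian(x):
    # Rows are the (transposed) gradients of each r_i.
    return np.array([[2 * x[0], -1.0],
                     [1.0,       0.0]])

x = np.array([1.0, 1.0])                 # r(x) = 0 here
grad = jacobian(x).T @ residual(x)       # J(x)^T r(x) = grad f(x)
print(np.linalg.norm(grad) <= 1e-8)      # True: first-order condition holds
```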
Least Squares (Linear)
$f(x) = \frac{1}{2}\lVert Ax - b\rVert^2 = \frac{1}{2}\lVert r(x)\rVert^2$,
where $r_i(x) = a_i^Tx - b_i$ and $\nabla r_i(x) = a_i$.
$$\begin{aligned}\nabla f(x) &= \begin{bmatrix}\mid & \mid & & \mid\\ a_1 & a_2 & \ldots & a_m\\ \mid & \mid & & \mid\end{bmatrix} \begin{bmatrix}a_1^Tx - b_1\\ a_2^Tx - b_2\\ \vdots\\ a_m^Tx - b_m\end{bmatrix} = A^T(Ax - b),\end{aligned}$$
where $A = \begin{bmatrix}-a_1^T-\\ -a_2^T-\\ \vdots\\ -a_m^T-\end{bmatrix}$ and each $a_i$ is an $n \times 1$ vector.
$\nabla f(x) = 0 \iff A^TAx = A^Tb$ (the normal equations)
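A minimal sketch, assuming $A^TA$ is nonsingular: solve the normal equations directly and confirm against NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))         # m = 20 samples, n = 3 unknowns
b = rng.standard_normal(20)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)   # fine when A^T A is nonsingular
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))                     # True
print(np.linalg.norm(A.T @ (A @ x_normal - b)) < 1e-10)   # gradient ~ 0
```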
Gradient Descent
- step size selection (linesearch)
- search direction selection
$\min_{x \in \mathbb{R}^n} f(x)$, $f:\mathbb{R}^n \to \mathbb{R}$ continuously differentiable
Descent Direction
$x^0$ given
for $k = 0, 1, 2, \dots$:
$x^{k+1} = x^k + \alpha^k d^k$
- $d^k \in \mathbb{R}^n$: search direction
- $\alpha^k \in \mathbb{R}_+$: step length
“Linesearch Methods”
If $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $d \in \mathbb{R}^n$ is a descent direction at $x$ (i.e., $f'(x; d) = \nabla f(x)^T d < 0$), then $\exists\, \varepsilon > 0$ such that $f(x + \alpha d) < f(x)\ \forall \alpha \in (0, \varepsilon]$.
Proof Sketch
Because $f'(x; d) < 0$: $\lim_{\alpha \to 0^+} \frac{f(x + \alpha d) - f(x)}{\alpha} \equiv f'(x; d) < 0$.
Because $f$ is continuous, the quotient stays negative for all sufficiently small $\alpha > 0$, so $f(x + \alpha d) < f(x)$.
Conceptual Algorithm
$x^0$ given; for $k = 0, 1, 2, \dots$:
- compute search direction $d^k$ (given $x^k$)
- compute step size $\alpha^k$ such that $f(x^k + \alpha^k d^k) < f(x^k)$
- update $x^{k+1} = x^k + \alpha^k d^k$
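A minimal sketch of this conceptual loop; `direction` and `step_size` are placeholder callables standing in for the choices discussed next:

```python
import numpy as np

def descent_method(f, grad, x0, direction, step_size,
                   tol=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:      # first-order stopping test
            break
        d = direction(x, g)               # search direction d^k
        a = step_size(f, x, g, d)         # step length alpha^k
        x = x + a * d                     # x^{k+1} = x^k + alpha^k d^k
    return x
```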
Step Size Selection ($\alpha^k$)
- constant step: $\alpha^k \equiv \alpha$ with $\alpha > 0$, $\forall k$
- exact linesearch (not always possible): choose $\alpha^k$ to solve $\min_{\alpha \ge 0} \phi(\alpha) := f(x^k + \alpha d^k)$
- backtracking linesearch ("Armijo"): reduce $\alpha$ in steps (e.g., $\alpha \leftarrow \alpha/2$) until $f(x^k + \alpha d^k) < f(x^k)$; see the sketch below
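A minimal sketch of backtracking. With `c = 0` it is essentially the simple decrease test above; `c` in $(0, 1)$, e.g., `1e-4`, gives the usual Armijo sufficient-decrease condition. The signature matches `step_size` in the loop sketch earlier:

```python
def backtracking(f, x, g, d, alpha0=1.0, c=1e-4, shrink=0.5, max_halvings=50):
    slope = g @ d                      # directional derivative, < 0 for descent d
    alpha = alpha0
    for _ in range(max_halvings):
        if f(x + alpha * d) <= f(x) + c * alpha * slope:
            return alpha               # (sufficient) decrease achieved
        alpha *= shrink                # alpha <- alpha / 2
    return alpha
```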
Exact Linesearch
always possible for quadratic functions
$f(x) = \frac{1}{2}x^TAx + b^Tx + \mathrm{const}$, with $A$ symmetric (so $\nabla f(x) = Ax + b$):
$$\begin{aligned}\phi(\alpha) &= f(x + \alpha d)\\ &= \frac{1}{2}(x + \alpha d)^T A (x + \alpha d) + b^T(x + \alpha d) + \mathrm{const}\\ &= \frac{1}{2}x^TAx + b^Tx + \frac{1}{2}\alpha^2 d^TAd + \alpha b^Td + \alpha d^TAx + \mathrm{const}\\ &= f(x) + \alpha^2 \frac{1}{2} d^TAd + \alpha \underbrace{(Ax + b)^T}_{\equiv\, \nabla f(x)^T} d + \mathrm{const}\\ &= f(x) + \alpha^2 \frac{1}{2} d^TAd + \alpha \nabla f(x)^T d + \mathrm{const}\end{aligned}$$
At the (unconstrained) minimizer of $\phi$: $0 = \phi'(\alpha) = \alpha\, d^TAd + \nabla f(x)^Td \iff \alpha = \dfrac{-\nabla f(x)^Td}{d^TAd}$.
(a) $d$ is a descent direction $\Rightarrow \nabla f(x)^Td < 0$
(b) $d^TAd > 0$ (e.g., if $A > 0$)
(a) & (b) $\Rightarrow \alpha > 0$:
if $A > 0$ and $d$ is a descent direction (i.e., $\nabla f(x)^Td < 0$),
$\Rightarrow \alpha = \dfrac{-\nabla f(x)^Td}{d^TAd} > 0$.
(If instead $\nabla f(x)^Td = 0$, then $\alpha = 0$: the exact linesearch makes no progress along $d$, consistent with $d$ not being a descent direction.)
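A minimal sketch of this exact step on an assumed toy quadratic with $A > 0$:

```python
import numpy as np

def exact_step(A, g, d):
    """alpha* = -grad^T d / (d^T A d) minimizes f(x + alpha d) for quadratic f."""
    return -(g @ d) / (d @ A @ d)

A = np.array([[3.0, 0.0], [0.0, 1.0]])     # symmetric positive definite
b = np.array([0.0, 0.0])
f = lambda z: 0.5 * z @ A @ z + b @ z

x = np.array([1.0, 1.0])
g = A @ x + b                      # gradient of the quadratic at x
d = -g                             # a descent direction
alpha = exact_step(A, g, d)        # here: 10/28 > 0
assert f(x + alpha * d) <= f(x)    # the exact step never increases f
print(alpha)
```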
Choosing Search Direction d
steepest descent: $d^k = -\nabla f(x^k) \equiv -\nabla f_k$
This always provides a descent direction:
$$\begin{aligned}f'(x^k; d^k) &= \nabla f(x^k)^T d^k = \nabla f(x^k)^T(-\nabla f(x^k)) = -\lVert \nabla f(x^k) \rVert^2\end{aligned}$$
If $\nabla f(x^k) \ne 0$ (i.e., $x^k$ is not stationary), then $f'(x^k; d^k) < 0$
$\Rightarrow$ $d^k = -\nabla f_k$ is a descent direction.
“Steepest” Descent is “Greedy”
First-order Taylor approximation of $f$ at $x^k$:
$f(x + d) \approx f(x) + \nabla f(x)^Td \Rightarrow f(x + d) - f(x) \approx \nabla f(x)^Td$
Choose $d$ to minimize $f(x + d) - f(x)$,
i.e., $\min_{\lVert d \rVert_2 \le 1} \nabla f(x)^Td \Rightarrow$ choose $d = \dfrac{-\nabla f(x)}{\lVert \nabla f(x) \rVert}$ (so $\lVert d \rVert_2 = 1$), which gives
$$\begin{aligned}\nabla f(x)^Td &= -\nabla f(x)^T \frac{\nabla f(x)}{\lVert \nabla f(x) \rVert_2} = -\frac{\lVert \nabla f(x) \rVert_2^2}{\lVert \nabla f(x) \rVert_2} = -\lVert \nabla f(x) \rVert_2\end{aligned}$$
By the Cauchy–Schwarz inequality, $-\lVert a \rVert_2 \lVert b \rVert_2 \le a^Tb \le \lVert a \rVert_2 \lVert b \rVert_2\ \forall a, b \in \mathbb{R}^n$, so $-\lVert \nabla f(x) \rVert_2$ is the smallest value $\nabla f(x)^Td$ can attain over $\lVert d \rVert_2 \le 1$: the normalized negative gradient is optimal for the linearized model.
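Putting the pieces together, a minimal sketch of steepest descent with exact linesearch on an assumed strictly convex quadratic:

```python
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])    # symmetric positive definite
b = np.array([-1.0, 2.0])
grad = lambda x: A @ x + b                # grad f(x) = Ax + b for f = 0.5 x^T A x + b^T x

x = np.array([5.0, -5.0])
for _ in range(100):
    g = grad(x)
    if np.linalg.norm(g) <= 1e-10:        # stop at a stationary point
        break
    d = -g                                # steepest-descent direction
    alpha = (g @ g) / (d @ A @ d)         # exact step: -g^T d / d^T A d
    x = x + alpha * d

print(np.allclose(A @ x, -b))             # True: x solves grad f(x) = 0
```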