Least squares: solve $\min \frac{1}{2} \lVert Ax-b\rVert^2$.
Check whether $x^*$ solves it via the normal equations $A^TAx^*=A^Tb$, i.e.,
$A^T(Ax^*-b)=0$.
Numerically, accept $x^*$ when $\lVert A^Tr^*\rVert\le \varepsilon$, where $r^*=Ax^*-b$ is the residual.
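As a quick numerical sanity check (pure Python, on a small hypothetical $3\times 2$ system; the matrix and vector are my own toy data), the gradient $A^T(Ax^*-b)$ should vanish at the normal-equations solution:

```python
# Tiny least-squares sanity check: x* minimizes (1/2)||Ax - b||^2
# exactly when A^T (A x* - b) = 0 (the normal equations).

def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [1.0, 2.0, 2.0]

# Normal equations for this example, solved by hand:
# A^T A = [[2, 1], [1, 2]],  A^T b = [3, 4]  =>  x* = (2/3, 5/3).
x_star = [2.0 / 3.0, 5.0 / 3.0]

residual = [ri - bi for ri, bi in zip(matvec(A, x_star), b)]
grad = matvec(transpose(A), residual)      # A^T (A x* - b)
print(max(abs(g) for g in grad))           # ~0 up to roundoff
```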
$\underset{x\in \mathbb{R}^n}{\min}f(x)$, with $f:\mathbb{R}^n \to \mathbb{R}$ smooth
(later: $x\in C \subseteq \mathbb{R}^n$)
Optimality
$\bar{x}$ is a …
local min of $f$ if $f(\bar{x}) \le f(x),\ \forall x \in B_\varepsilon(\bar{x})$, for some small $\varepsilon>0$,
where $B_\varepsilon(\bar{x})=\{x \mid \lVert x-\bar{x}\rVert_2 \le \varepsilon\}$
global min of $f$ if $f(\bar{x})\le f(x),\ \forall x \in \mathbb{R}^n$
$\bar{x}$ is a maximizer of $f$ $\iff$ $\bar{x}$ is a minimizer of $-f$
Ex: lack of attainment
$\underset{x \to \infty}{\lim} e^{-x}=0$ and $\underset{x}{\inf}\, e^{-x}=0$, but no $x$ attains the infimum.
$e^{-x}$ is not coercive.
First-order necessary condition
$\left[\ \overset{\bar{x}}{\text{local max}} \to \nabla f(\bar{x})=0\ \right]$
Proof Sketch: up to first order
2nd-order sufficient condition
Positive Definite
Proof Sketch: up to second order
2nd-order necessary condition
Least-Squares (Linear)
Gradient Descent
step size selection (linearized)
search direction selection
Descent Direction
“Linesearch Methods”
Proof Sketch
Conceptual Algorithm
exact linesearch (not always possible)
backtracking linesearch ("Armijo")
Exact Linesearch
always possible for quadratic functions
Choosing Search Direction d
Always provides a descent direction:
“Steepest” Descent is “Greedy”
Definition: A function $f$ is coercive if $\underset{\lVert x\rVert \to \infty}{\lim} f(x) = \infty$.
Equivalently: the level sets $\{x \mid f(x) \le \tau\}$ are compact (closed & bounded) for every $\tau \in \mathbb{R}$.
$\bar{x}$ is a local min of $f:\mathbb{R}^n \to \mathbb{R}$ only if $\nabla f(\bar{x})=0$.
$$\left[\ \overset{\bar{x}}{\text{local min}} \to \nabla f(\bar{x})=0\ \right]$$
$$\left[\ \overset{\bar{x}}{\text{local max}} \to \nabla f(\bar{x})=0\ \right]$$
Proof sketch: up to first order,
$$f(\bar{x}+\alpha d)-f(\bar{x})=\nabla f(\bar{x})^T(\alpha d)+o(\alpha \lVert d\rVert).$$
Because $\bar{x}$ is a local min,
$$0\le \underset{\alpha \downarrow 0}{\lim}\frac{f(\bar{x}+\alpha d)-f(\bar{x})}{\alpha}=\underset{\alpha \downarrow 0}{\lim}\left[\nabla f(\bar{x})^Td+\frac{o(\alpha \lVert d\rVert)}{\alpha}\right]=\nabla f(\bar{x})^Td.$$
This holds for every direction $d$ (in particular $d=\pm e_i$), so $\nabla f(\bar{x})=0$.
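The condition can also be probed numerically: at a known minimizer, central finite differences of $f$ should be (approximately) zero in every coordinate. A small sketch on a hypothetical quadratic whose minimizer $(1,0)$ is known in advance:

```python
# Finite-difference check of the first-order necessary condition.
# f(x, y) = (x - 1)^2 + 2*y^2 has its (global, hence local) min at (1, 0),
# where both partial derivatives must vanish.

def f(x, y):
    return (x - 1.0) ** 2 + 2.0 * y ** 2

h = 1e-6
x_bar, y_bar = 1.0, 0.0
dfdx = (f(x_bar + h, y_bar) - f(x_bar - h, y_bar)) / (2 * h)
dfdy = (f(x_bar, y_bar + h) - f(x_bar, y_bar - h)) / (2 * h)
print(abs(dfdx), abs(dfdy))   # both ~0
```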
$\bar{x}$ is a local min of $f:\mathbb{R}^n \to \mathbb{R}$ if
$\nabla f(\bar{x})=0$ (stationary) and
$\nabla^2 f(\bar{x})\gt 0$ (Hessian positive definite).
$$\left[\ \nabla f(\bar{x})=0\ \text{and}\ \nabla^2 f(\bar{x})\gt 0\ \Rightarrow\ \bar{x}\ \text{local min}\ \right]$$
(Sketch of a 1-D function: $f''(x_{\text{global}})\gt 0$, $f''(x_{\text{local}})\gt 0$, $f''(x'_{\text{local}})=0$ at a degenerate stationary point, $f''(x_{\text{max}})\lt 0$.)
Ex: $f(x)=x^4$ has a global min at $x=0$ even though $f''(0)=0$, so the condition is sufficient but not necessary.
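A one-line check of this example (the function and its second derivative come from the text; the sample grid is arbitrary):

```python
# f(x) = x^4: x = 0 is a global minimizer even though f''(0) = 0,
# so the second-order sufficient condition can fail at a true minimum.

def f(x):
    return x ** 4

def fpp(x):          # f''(x) = 12 x^2
    return 12.0 * x ** 2

print(fpp(0.0))                                            # 0.0: Hessian test inconclusive
print(all(f(0.0) <= f(t / 10.0) for t in range(-20, 21)))  # True: still a min on this grid
```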
Definition: A square symmetric matrix $A$ is positive semi-definite (i.e., $A \ge 0$) if the following equivalent conditions hold:
$x^TAx \ge 0,\ \forall x\in \mathbb{R}^n$
$\lambda_i(A) \ge 0,\ \forall i=1,\dots,n$
$A=R^TR$ for some $R$ (a Cholesky factorization exists)
and positive definite (i.e., $A \gt 0$) if:
$x^TAx \gt 0,\ \forall x \ne 0$
$\lambda_i(A) \gt 0,\ \forall i$
$A=R^TR$ with $R$ nonsingular
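These equivalent conditions can be tested directly. A minimal sketch for the $2\times 2$ symmetric case (helper names and the example matrices are mine, not from the notes):

```python
import math

# Positive-definiteness of a 2x2 symmetric matrix [[a, b], [b, c]],
# checked two ways: via its eigenvalues and via a Cholesky factor A = R^T R.

def eigenvalues_2x2(a, b, c):
    tr, det = a + c, a * c - b * b
    s = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return tr / 2.0 - s, tr / 2.0 + s

def cholesky_2x2(a, b, c):
    # Returns upper-triangular R with A = R^T R, or None if A is not PD.
    if a <= 0:
        return None
    r11 = math.sqrt(a)
    r12 = b / r11
    rest = c - r12 * r12
    if rest <= 0:
        return None
    return [[r11, r12], [0.0, math.sqrt(rest)]]

lo, hi = eigenvalues_2x2(2.0, 1.0, 2.0)          # eigenvalues 1 and 3 => PD
print(lo, hi)
print(cholesky_2x2(2.0, 1.0, 2.0) is not None)   # True: factor exists
print(cholesky_2x2(1.0, 2.0, 1.0) is not None)   # False: not PD
```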
$$f(\bar{x}+\alpha d)-f(\bar{x})=\alpha\underbrace{\nabla f(\bar{x})^T}_0d+\underbrace{\frac{1}{2}\alpha^2d^T\nabla^2 f(\bar{x})d}_{\gt 0}+o(\alpha^2\lVert d\rVert^2)\gt 0\ \text{for small}\ \alpha$$
$\Rightarrow \bar{x}$ is a local min.
$\bar{x}$ is a local min of $f:\mathbb{R}^n \to \mathbb{R}$ only if
$\nabla f(\bar{x})=0$ (stationary) and
$\nabla^2 f(\bar{x})\ge 0$ (positive semi-definite).
Eigenpair $(x,\lambda)$ of square symmetric $A$, with $\lVert x\rVert=1$:
$$x^TAx=\lambda \underbrace{x^Tx}_1=\lambda$$
($\lambda \gt 0$ means $A$ does not flip the direction: $Ax=\lambda x$ points the same way as $x$.)
Ex: $\min \frac{1}{2}\lVert r(x)\rVert^2=\frac{1}{2}\sum^m_{i=1}r_i(x)^2$, with each $r_i:\mathbb{R}^n \to \mathbb{R}$.
$$f(x)=\frac{1}{2}\lVert r(x)\rVert^2=\frac{1}{2}r_1(x)^2+\frac{1}{2}r_2(x)^2+\dots$$
$$\begin{aligned}\nabla f(x) & =\nabla r_1(x)\cdot r_1(x)+\nabla r_2(x)\cdot r_2(x)+\dots\\ &= \begin{bmatrix}\mid & \mid & & \mid\\ \nabla r_1(x) & \nabla r_2(x) & \ldots & \nabla r_m(x)\\ \mid & \mid & & \mid\\ \end{bmatrix}\begin{bmatrix}r_1(x)\\ r_2(x)\\ \vdots\\ r_m(x)\end{bmatrix}\\ &= J(x)^Tr(x)\end{aligned}$$
Stop when $\lVert J(x)^T r(x)\rVert=\lVert \nabla f(x)\rVert$ is small.
Linear case: $f(x)=\frac{1}{2}\lVert Ax-b\rVert^2=\frac{1}{2}\lVert r(x)\rVert^2$,
where $r_i(x)=a_i^Tx-b_i$ and $\nabla r_i(x)=a_i$, so
$$\begin{aligned}\nabla f(x) & =\begin{bmatrix}\mid & \mid & & \mid\\ a_1 & a_2 & \ldots & a_m\\ \mid & \mid & & \mid \end{bmatrix}\begin{bmatrix}a_1^Tx-b_1\\ a_2^Tx-b_2\\ \vdots\\ a_m^Tx-b_m\end{bmatrix}\\ &= A^T(Ax-b),\end{aligned}$$
where $A=\begin{bmatrix}-a^T_1-\\-a^T_2-\\ \vdots \\ -a^T_m-\end{bmatrix}$ and each $a_i$ is an $n \times 1$ vector.
$\nabla f(x)=0 \iff A^TAx=A^Tb$ (the normal equations).
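The formula $\nabla f(x)=A^T(Ax-b)$ can be verified against a central finite difference of $f$; a pure-Python sketch with an arbitrary small $A$, $b$, and evaluation point:

```python
# Verify grad f(x) = A^T (A x - b) for f(x) = (1/2)||Ax - b||^2
# against a central finite difference, coordinate by coordinate.

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

A = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
b = [1.0, -1.0, 2.0]
x = [0.5, -0.25]

def f(x):
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(ri * ri for ri in r)

r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
grad = matvec(list(map(list, zip(*A))), r)   # A^T (A x - b)

h = 1e-6
errs = []
for i in range(2):
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    errs.append(abs((f(xp) - f(xm)) / (2 * h) - grad[i]))
print(errs)     # both entries ~0
```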
$\underset{x \in \mathbb{R}^n}{\min} f(x)$, with $f:\mathbb{R}^n \to \mathbb{R}$ continuously differentiable.
For $k = 0,1,2,\dots$:
$$x^{k+1}=x^k+\alpha^kd^k$$
$d^k \in \mathbb{R}^n$: search direction
$\alpha^k \in \mathbb{R}_+$: step length
If $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $d \in \mathbb{R}^n$ is a descent direction at $x$, then there exists $\varepsilon \gt 0$ such that $f(x+\alpha d) \lt f(x),\ \forall \alpha \in (0,\varepsilon]$.
Because $d$ is a descent direction, $\underset{\alpha \downarrow 0}{\lim}\frac{f(x+\alpha d)-f(x)}{\alpha} \equiv f'(x;d) \lt 0$,
so the quotient is negative for all sufficiently small $\alpha \gt 0$, i.e., $f(x+\alpha d) \lt f(x)$.
$x^0$ given; for $k=0,1,2,\dots$:
compute a search direction $d^k$ (given $x^k$)
compute a step size $\alpha^k$ such that $f(x^k+\alpha^kd^k)\lt f(x^k)$
update $x^{k+1}=x^k+\alpha^kd^k$
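A minimal instance of this loop, with direction $d^k=-\nabla f(x^k)$ and a constant step, on a hypothetical quadratic whose minimizer $(3,-1)$ is known (the step size and tolerance are illustrative choices):

```python
# Conceptual gradient-descent loop on f(x, y) = (x - 3)^2 + (y + 1)^2.

def grad_f(x, y):
    return (2.0 * (x - 3.0), 2.0 * (y + 1.0))

x, y = 0.0, 0.0
alpha = 0.25                                  # constant step (small enough here)
for k in range(100):
    gx, gy = grad_f(x, y)
    if (gx * gx + gy * gy) ** 0.5 < 1e-10:    # stop when ||grad f|| is small
        break
    x, y = x - alpha * gx, y - alpha * gy     # x^{k+1} = x^k + alpha * d^k

print(round(x, 6), round(y, 6))               # converges to the stationary point (3, -1)
```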
Step Size Selection ($\alpha^k$)
constant step: $\alpha^k \equiv \alpha$ for some fixed $\alpha \gt 0$, $\forall k$
exact linesearch: choose $\alpha^k$ to solve $\underset{\alpha \ge 0}{\min}\ \phi(\alpha) := f(x^k+\alpha d^k)$
backtracking: reduce $\alpha$ in steps (e.g., $\alpha \leftarrow \alpha/2$) until $f(x^k+\alpha d^k) \lt f(x^k)$
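A sketch of the backtracking rule with an Armijo-style sufficient-decrease test (the constant $c$, shrink factor, and the example function are conventional illustrative choices, not from the notes):

```python
# Backtracking (Armijo) linesearch: halve alpha until
# f(x + a*d) <= f(x) + c * a * grad_f(x)^T d.

def backtracking(f, grad_dot_d, x, d, alpha0=1.0, c=1e-4, shrink=0.5):
    alpha = alpha0
    fx = f(x)
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + c * alpha * grad_dot_d:
        alpha *= shrink
    return alpha

# Example on f(x) = x1^2 + 10*x2^2 at x = (1, 1) with d = -grad f(x):
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
x = [1.0, 1.0]
g = [2.0 * x[0], 20.0 * x[1]]              # grad f(x) = (2, 20)
d = [-gi for gi in g]                      # steepest-descent direction
alpha = backtracking(f, sum(gi * di for gi, di in zip(g, d)), x, d)
x_new = [xi + alpha * di for xi, di in zip(x, d)]
print(alpha, f(x_new) < f(x))              # accepted step in (0, 1]; decrease holds
```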
$$f(x)=\frac{1}{2}x^TAx+b^Tx+\mathrm{const}\quad (A\ \text{symmetric})$$
$$\begin{aligned} \phi(\alpha) & = f(x+\alpha d) \\ & = \frac{1}{2}(x+\alpha d)^TA(x+\alpha d)+b^T(x+\alpha d)+\mathrm{const} \\ & = \frac{1}{2}x^TAx+b^Tx+\frac{1}{2}\alpha^2d^TAd+\alpha b^Td+\alpha d^T Ax+ \mathrm{const}\\ & = f(x)+\frac{1}{2}\alpha^2 d^TAd+\alpha\underbrace{(Ax+b)^T}_{\equiv \nabla f(x)^T}d \\ & = f(x)+\frac{1}{2}\alpha^2d^TAd+\alpha \nabla f(x)^Td \end{aligned}$$
At the (unconstrained) minimizer of $\phi$: $0=\phi'(\alpha)=\alpha d^T Ad+\nabla f(x)^T d \iff \alpha =\frac{-\nabla f(x)^Td}{d^TAd}$.
(a) $d$ is a descent direction $\Rightarrow \nabla f(x)^Td \lt 0$
(b) $A \gt 0 \Rightarrow d^TAd \gt 0$
(a) & (b) $\Rightarrow \alpha \gt 0$.
In short: if $A \gt 0$ and $d$ is a descent direction (i.e., $\nabla f(x)^Td \lt 0$), then
$$\alpha = \frac{-\nabla f(x)^Td}{d^TAd} \gt 0.$$
(If instead $\nabla f(x)^Td=0$, then $\alpha=0$: no progress along $d$.)
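The closed-form step can be exercised on a toy positive-definite quadratic (my own example); the exact step is positive, and the slope $\phi'(\alpha)$ vanishes at the new point:

```python
# Exact linesearch for f(x) = (1/2) x^T A x + b^T x with A PD:
# alpha = -grad_f(x)^T d / (d^T A d), and phi'(alpha) = 0 afterward.

A = [[2.0, 0.0], [0.0, 8.0]]               # PD diagonal example
b = [-2.0, -8.0]                           # grad f(x) = A x + b, minimizer (1, 1)

def grad(x):
    return [A[i][0] * x[0] + A[i][1] * x[1] + b[i] for i in range(2)]

x = [0.0, 0.0]
g = grad(x)                                # (-2, -8)
d = [-gi for gi in g]                      # steepest-descent direction (2, 8)
Ad = [A[i][0] * d[0] + A[i][1] * d[1] for i in range(2)]
alpha = -sum(gi * di for gi, di in zip(g, d)) / sum(di * adi for di, adi in zip(d, Ad))

x_new = [x[0] + alpha * d[0], x[1] + alpha * d[1]]
slope = sum(gi * di for gi, di in zip(grad(x_new), d))   # phi'(alpha)
print(alpha > 0, abs(slope) < 1e-9)        # positive step, zero slope along d
```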
Steepest descent: $d^k=-\nabla f(x^k)\equiv -\nabla f_k$
$$\begin{aligned}f'(x^k;d^k) &= \nabla f(x^k)^Td^k\\ & = \nabla f(x^k)^T(-\nabla f(x^k)) \\ & = -\nabla f(x^k)^T \nabla f(x^k) \\ & = -\lVert \nabla f(x^k) \rVert^2 \end{aligned}$$
If $\nabla f(x^k) \ne 0$ (i.e., $x^k$ is not stationary), then
$d^k = -\nabla f_k$ is a descent direction.
First-order Taylor approximation of $f$ at $x^k$:
$$f(x+d) \approx f(x)+\nabla f(x)^Td \Rightarrow f(x+d)-f(x) \approx \nabla f(x)^Td$$
Choose $d$ to minimize $f(x+d)-f(x)$,
i.e., $\underset{\lVert d \rVert_2 \le 1}{\min}\ \nabla f(x)^Td$ $\Rightarrow$ choose $d = \frac{-\nabla f(x)}{\lVert \nabla f(x) \rVert}$ (so $\lVert d \rVert_2 =1$):
$$\begin{aligned} \nabla f(x)^Td &= -\nabla f(x)^T\frac{\nabla f(x)}{\lVert \nabla f(x) \rVert_2} \\ & = - \frac{\lVert \nabla f(x) \rVert_2^2}{\lVert \nabla f(x) \rVert_2} \\ & = - \lVert \nabla f(x) \rVert_2\end{aligned}$$
By Cauchy-Schwarz, $-\lVert a \rVert_2 \cdot \lVert b \rVert_2 \le a^Tb \le \lVert a \rVert_2 \cdot \lVert b \rVert_2,\ \forall a,b \in \mathbb{R}^n$, so no unit direction achieves a smaller value.
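The Cauchy-Schwarz claim can be probed empirically: among unit vectors, no direction gives a slope below $-\lVert \nabla f(x)\rVert_2$. A sketch with a hypothetical gradient $g=(3,-4)$ and random unit directions:

```python
import math
import random

# The normalized negative gradient minimizes g^T d over unit vectors d:
# every random unit direction does at least as badly as -||g||_2.

random.seed(0)
g = [3.0, -4.0]                                       # hypothetical gradient, ||g||_2 = 5
best = -sum(gi * gi for gi in g) / math.hypot(*g)     # g^T (-g/||g||) = -||g||_2

for _ in range(1000):
    v = [random.gauss(0, 1), random.gauss(0, 1)]
    n = math.hypot(*v)
    d = [vi / n for vi in v]                          # random unit vector
    assert sum(gi * di for gi, di in zip(g, d)) >= best - 1e-12

print(best)    # -5.0, the smallest achievable slope
```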
Definition: A search direction $d \in \mathbb{R}^n$ is a descent direction at $x$ if the directional derivative is negative: $f'(x;d)=\nabla f(x)^Td \lt 0$.