Gradients

$f: \mathbb{R}^n \to \mathbb{R}$

Assume the $i^{th}$ partial derivative exists:

$$\frac{\partial f(x)}{\partial x_i} = \lim_{\varepsilon \to 0}\frac{f(x+\varepsilon e_i)-f(x)}{\varepsilon}$$

where $e_i$ is the $i^{th}$ standard basis vector, with a 1 in the $i^{th}$ position and 0 elsewhere:

$$e_i=\begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0\end{bmatrix}$$

The gradient of $f$ is the collection of partials:

$$\nabla f(x)= \begin{bmatrix}\partial f(x)/\partial x_1\\ \partial f(x)/\partial x_2 \\ \vdots\\ \partial f(x)/\partial x_n\end{bmatrix} \in \mathbb{R}^n$$

Directional derivative of $f$ at $x\in \mathbb{R}^n$ in the direction $d \in \mathbb{R}^n$:

$$f'(x,d)=\lim_{\varepsilon \to 0}\frac{f(x+\varepsilon d)-f(x)}{\varepsilon}$$

The function $f$ is differentiable at $x \in \mathbb{R}^n$ if $f'(x,d)$ exists for all $d \in \mathbb{R}^n$ and $f'(x,d)=\nabla f(x)^Td$.

Fact: the directional derivative is positively homogeneous of degree 1, i.e., $f'(x,\alpha d)=\alpha f'(x,d)$ for all $\alpha \geq 0$.

Special Directions

For a general direction $d=\begin{bmatrix}d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}$, choosing $d=e_i$ recovers the $i^{th}$ partial derivative:

$$f'(x,e_i)=\nabla f(x)^Te_i=[\nabla f(x)]_i=\frac{\partial f(x)}{\partial x_i}$$
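The identity $f'(x,d)=\nabla f(x)^Td$ and the homogeneity fact can be checked numerically. The sketch below is only illustrative: the test function `f`, the point `x`, the direction `d`, and the step `eps` are arbitrary choices made for this example, not part of the notes.

```python
def f(x):
    # Hypothetical smooth test function: f(x) = x1^2 + 3*x1*x2
    x1, x2 = x
    return x1 ** 2 + 3.0 * x1 * x2

def grad_f(x):
    # Gradient of the test function, worked out by hand
    x1, x2 = x
    return [2.0 * x1 + 3.0 * x2, 3.0 * x1]

def directional_derivative(f, x, d, eps=1e-6):
    # Difference quotient (f(x + eps*d) - f(x)) / eps for one small eps;
    # this approximates the limit rather than computing it exactly.
    shifted = [xi + eps * di for xi, di in zip(x, d)]
    return (f(shifted) - f(x)) / eps

x = [1.0, 2.0]
d = [1.0, -2.0]

numeric = directional_derivative(f, x, d)
analytic = sum(g * di for g, di in zip(grad_f(x), d))  # grad f(x)^T d

# Positive homogeneity: scaling d by alpha >= 0 scales the derivative by alpha
alpha = 3.0
scaled = directional_derivative(f, x, [alpha * di for di in d])

# Special direction d = e_1 picks out the first partial derivative
partial_1 = directional_derivative(f, x, [1.0, 0.0])

print(numeric, analytic, scaled, partial_1)
```

The numeric values agree with the analytic ones only up to the truncation error of the difference quotient, which shrinks with `eps`.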

Calculus Rules for $\mathbb{R}^n$

$f:\mathbb{R}^n \to \mathbb{R}$

Linear function: $f(x)=a^Tx=\sum^{n}_{i=1}a_ix_i$

$$\nabla f(x)=\begin{bmatrix}a_1\\a_2\\ \vdots\\ a_n \end{bmatrix}=a$$

Quadratic functions:

$$f(x)=x^TQx+c^Tx+\alpha$$

where $Q$ is an $n\times n$ matrix, $c\in \mathbb{R}^n$, and $\alpha \in \mathbb{R}$.

$$\nabla f(x)=Qx+Q^Tx+c=(Q+Q^T)x+c$$

If $Q$ is symmetric, i.e., $Q=Q^T$, then $\nabla f(x)=2Qx+c$.

WLOG, assume $Q$ is symmetric.

If not, observe that $x^TQx$ is a scalar and therefore equals its own transpose, $x^TQx=(x^TQx)^T=x^TQ^Tx$, so:

$$\begin{aligned}f(x) &=x^TQx+c^Tx+\alpha\\ &=\frac{1}{2}x^TQx+\frac{1}{2}x^TQx+c^Tx+\alpha\\ &=\frac{1}{2}x^TQ^Tx+\frac{1}{2}x^TQx+c^Tx+\alpha \\ &=x^T\left(\frac{1}{2}Q^T+\frac{1}{2}Q\right)x+c^Tx+\alpha \\ &=x^T\overline{Q}x+c^Tx+\alpha\end{aligned}$$

where $\overline{Q}=\frac{1}{2}Q^T+\frac{1}{2}Q$ is called the "symmetric part" of $Q$.
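As a small sanity check of the quadratic rules above, the following sketch evaluates $f$ and its gradient for a non-symmetric $Q$ and for its symmetric part $\overline{Q}$. The specific values of `Q`, `c`, `alpha`, and `x`, and all helper names, are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def quad(Q, c, alpha, x):
    # f(x) = x^T Q x + c^T x + alpha
    return dot(x, matvec(Q, x)) + dot(c, x) + alpha

def quad_grad(Q, c, x):
    # grad f(x) = Q x + Q^T x + c, valid for any square Q
    Qx = matvec(Q, x)
    QTx = matvec(transpose(Q), x)
    return [a + b + ci for a, b, ci in zip(Qx, QTx, c)]

def symmetric_part(Q):
    # Q_bar = (Q^T + Q) / 2
    QT = transpose(Q)
    return [[0.5 * (q + qt) for q, qt in zip(row, row_t)]
            for row, row_t in zip(Q, QT)]

# Arbitrary example data with a non-symmetric Q
Q = [[1.0, 4.0], [0.0, 3.0]]
c = [1.0, -1.0]
alpha = 2.0
x = [2.0, -1.0]

Q_bar = symmetric_part(Q)

# Same function value with Q and with its symmetric part
f_Q = quad(Q, c, alpha, x)
f_Q_bar = quad(Q_bar, c, alpha, x)

# General rule (Q + Q^T) x + c versus symmetric rule 2 Q_bar x + c
g_general = quad_grad(Q, c, x)
g_symmetric = [2.0 * v + ci for v, ci in zip(matvec(Q_bar, x), c)]

print(f_Q, f_Q_bar, g_general, g_symmetric)
```

Replacing $Q$ by $\overline{Q}$ changes neither the function nor its gradient, which is why the symmetric formula $2\overline{Q}x+c$ can be used in general.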

Directional Derivative:

$$f'(x,d)=\nabla f(x)^Td$$

Computing Gradients

  1. Numerical → "finite differencing"

  2. Symbolic

  3. Automatic Differentiation

Numerical:

Taylor’s Theorem:

$$f(x+\varepsilon d)=f(x)+\varepsilon \nabla f(x)^Td+o(\varepsilon), \qquad g \in o(\varepsilon) \text{ if } \lim_{\varepsilon \to 0}\frac{g(\varepsilon)}{\varepsilon}=0$$

Dividing by $\varepsilon$ and dropping the remainder term gives the approximation

$$f'(x,d)=\nabla f(x)^Td \approx \frac{f(x+\varepsilon d)-f(x)}{\varepsilon}$$

Choose $d=e_i$:

$$\frac{\partial f(x)}{\partial x_i} \approx \frac{f(x+\varepsilon e_i)-f(x)}{\varepsilon} \quad \text{("forward finite differencing")}$$

Central Differencing:

$$\frac{\partial f(x)}{\partial x_i}=\frac{f(x+\varepsilon e_i)-f(x-\varepsilon e_i)}{2\varepsilon}+O(\varepsilon^2)$$
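The two differencing schemes can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the step sizes are arbitrary defaults, the function names are hypothetical, and the test function is the one used as the worked example in these notes, $f(x_1,x_2)=\ln(x_1)+x_1x_2-\sin(x_2)$.

```python
import math

def f(x):
    # Worked example from these notes: ln(x1) + x1*x2 - sin(x2)
    x1, x2 = x
    return math.log(x1) + x1 * x2 - math.sin(x2)

def forward_diff_grad(f, x, eps=1e-6):
    # (f(x + eps*e_i) - f(x)) / eps for each coordinate i
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        grad.append((f(xp) - fx) / eps)
    return grad

def central_diff_grad(f, x, eps=1e-5):
    # (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps) for each coordinate i
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2.0 * eps))
    return grad

x = [2.0, 5.0]
exact = [1.0 / x[0] + x[1], x[0] - math.cos(x[1])]  # hand-derived gradient
fwd = forward_diff_grad(f, x)
cen = central_diff_grad(f, x)

print("exact  ", exact)
print("forward", fwd)
print("central", cen)
```

Forward differencing needs $n+1$ function evaluations and central differencing $2n$; the extra cost buys a truncation error of order $\varepsilon^2$ instead of order $\varepsilon$. In floating point, making $\varepsilon$ too small increases cancellation error, so the step cannot be shrunk indefinitely.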

Symbolic Differentiation

  1. $\frac{\partial}{\partial x}(f_1(x)+f_2(x))=\frac{\partial}{\partial x}f_1(x)+\frac{\partial}{\partial x} f_2(x)$

  2. $\frac{\partial}{\partial x}(f_1(x) \cdot f_2(x))=\left(\frac{\partial}{\partial x}f_1(x)\right)f_2(x)+\left(\frac{\partial}{\partial x}f_2(x)\right)f_1(x)$

  3. $\frac{\partial}{\partial x} f_1(f_2(x))= \frac{\partial}{\partial f_2(x)}f_1(f_2(x)) \cdot \frac{\partial}{\partial x} f_2(x)$

$f:\mathbb{R}^n \to \mathbb{R}$

$$\nabla f(x)=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \vdots \\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}$$

Example: $f(x_1,x_2)=\ln(x_1)+x_1x_2-\sin(x_2)$

$$\frac{\partial f(x_1,x_2)}{\partial x_1}=\frac{1}{x_1}+x_2$$

$$\frac{\partial f(x_1,x_2)}{\partial x_2}=x_1-\cos(x_2)$$
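The sum, product, and chain rules can be applied mechanically to an expression tree. The sketch below is a toy symbolic differentiator written for this example only; the class names (`Const`, `Var`, `Add`, `Mul`, `Fn`) and helpers are hypothetical, and no simplification of the resulting expression is attempted.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

@dataclass(frozen=True)
class Fn:
    name: str   # one of "ln", "sin", "cos", "inv" (reciprocal)
    arg: object

def evaluate(e, env):
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    if isinstance(e, Mul):
        return evaluate(e.left, env) * evaluate(e.right, env)
    u = evaluate(e.arg, env)
    if e.name == "ln":
        return math.log(u)
    if e.name == "sin":
        return math.sin(u)
    if e.name == "cos":
        return math.cos(u)
    return 1.0 / u  # "inv"

def outer_derivative(e):
    # Derivative of the outer function, evaluated at the inner expression
    u = e.arg
    if e.name == "ln":
        return Fn("inv", u)
    if e.name == "sin":
        return Fn("cos", u)
    if e.name == "cos":
        return Mul(Const(-1.0), Fn("sin", u))
    return Mul(Const(-1.0), Mul(Fn("inv", u), Fn("inv", u)))  # "inv"

def diff(e, wrt):
    if isinstance(e, Const):
        return Const(0.0)
    if isinstance(e, Var):
        return Const(1.0 if e.name == wrt else 0.0)
    if isinstance(e, Add):   # rule 1: sum rule
        return Add(diff(e.left, wrt), diff(e.right, wrt))
    if isinstance(e, Mul):   # rule 2: product rule
        return Add(Mul(diff(e.left, wrt), e.right),
                   Mul(diff(e.right, wrt), e.left))
    return Mul(outer_derivative(e), diff(e.arg, wrt))  # rule 3: chain rule

x1, x2 = Var("x1"), Var("x2")
# f = ln(x1) + x1*x2 - sin(x2), with subtraction written as adding (-1)*sin(x2)
f_expr = Add(Add(Fn("ln", x1), Mul(x1, x2)), Mul(Const(-1.0), Fn("sin", x2)))

env = {"x1": 2.0, "x2": 5.0}
df_dx1 = evaluate(diff(f_expr, "x1"), env)
df_dx2 = evaluate(diff(f_expr, "x2"), env)
print(df_dx1, df_dx2)
```

Evaluated at $(x_1,x_2)=(2,5)$, the derived expressions agree with the hand-computed partials $\frac{1}{x_1}+x_2$ and $x_1-\cos(x_2)$. The unsimplified derivative tree is noticeably larger than the original expression, a known drawback of naive symbolic differentiation.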

Computational Graph

Input: $(x_1,x_2)=(2,5)$

$v_1=x_1=2$

$v_2=x_2=5$

$v_3=\ln(v_1)=\ln(2)\approx 0.69$

$v_4=v_1 \cdot v_2 = 2 \cdot 5 = 10$

$v_5 = \sin(v_2)=\sin(5)\approx -0.96$

$v_6=v_3+v_4\approx 10.69$

$v_7=v_6-v_5\approx 11.65$

$y=v_7$
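The evaluation trace above can be written out directly as straight-line code, one assignment per graph node. The function name `forward_trace` and the dictionary return value are choices made for this sketch.

```python
import math

def forward_trace(x1, x2):
    # One assignment per node of the computational graph
    v1 = x1
    v2 = x2
    v3 = math.log(v1)
    v4 = v1 * v2
    v5 = math.sin(v2)
    v6 = v3 + v4
    v7 = v6 - v5
    return {"v1": v1, "v2": v2, "v3": v3, "v4": v4,
            "v5": v5, "v6": v6, "v7": v7}

trace = forward_trace(2.0, 5.0)
for name, value in trace.items():
    print(name, round(value, 2))
```

Keeping every intermediate $v_i$ matters because both automatic differentiation modes reuse these values when applying the chain rule.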

Forward Mode Auto Diff

Let $\dot{v}_i=\frac{\partial v_i}{\partial x_1}$ (this pass gives the partial derivative with respect to $x_1$; repeat with $\dot{v}_1=0$, $\dot{v}_2=1$ for $x_2$)

$\dot{v}_1=\frac{\partial v_1}{\partial x_1} = 1$

$\dot{v}_2=\frac{\partial v_2}{\partial x_1} = 0$

$\dot{v}_3=\frac{\partial v_3}{\partial x_1}=\frac{\partial v_3}{\partial v_1}\cdot \frac{\partial v_1}{\partial x_1}=\frac{\partial v_3}{\partial v_1}\cdot\dot{v}_1=\frac{1}{v_1}\cdot \dot{v}_1=\frac{1}{2}$

$\dot{v}_4=\frac{\partial v_4}{\partial x_1}=\frac{\partial v_4}{\partial v_1}\cdot \frac{\partial v_1}{\partial x_1}+\frac{\partial v_4}{\partial v_2} \cdot \frac{\partial v_2}{\partial x_1}=\frac{\partial v_4}{\partial v_1}\cdot \dot{v}_1+\frac{\partial v_4}{\partial v_2}\cdot\dot{v}_2=v_2\cdot 1+v_1\cdot 0=5$

$\dot{v}_5=\cos(v_2)\cdot \dot{v}_2=0$

$\dot{v}_6=\dot{v}_3+\dot{v}_4=\frac{1}{2}+5=5.5$

$\dot{v}_7=\dot{v}_6-\dot{v}_5=5.5-0=5.5$
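One common way to implement forward mode is with dual numbers, where each value carries its derivative $\dot{v}_i$ alongside it and every operation updates both. The sketch below assumes that approach; the `Dual` class and the `ln`, `sin`, and `gradient` helpers are hypothetical names covering only the operations this example needs.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: float  # primal value v_i
    dot: float  # tangent value, d v_i / d (seeded input)

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __sub__(self, other):
        return Dual(self.val - other.val, self.dot - other.dot)

    def __mul__(self, other):
        # Product rule
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def ln(a):
    return Dual(math.log(a.val), a.dot / a.val)

def sin(a):
    return Dual(math.sin(a.val), math.cos(a.val) * a.dot)

def f(x1, x2):
    return ln(x1) + x1 * x2 - sin(x2)

def gradient(x1, x2):
    # One forward pass per input: seed that input's tangent with 1, the other with 0
    df_dx1 = f(Dual(x1, 1.0), Dual(x2, 0.0)).dot
    df_dx2 = f(Dual(x1, 0.0), Dual(x2, 1.0)).dot
    return df_dx1, df_dx2

g1, g2 = gradient(2.0, 5.0)
print(g1, g2)
```

The first component reproduces $\dot{v}_7=5.5$ from the trace above. Because each pass differentiates with respect to a single seeded input, a full gradient of $f:\mathbb{R}^n\to\mathbb{R}$ costs $n$ forward passes.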

Reverse Mode Auto Diff

Let $\bar{v}_i=\frac{\partial y}{\partial v_i}$ ("adjoint")

$\bar{v}_7=\frac{\partial y}{\partial v_7}=1$

$\bar{v}_6=\frac{\partial y}{\partial v_6}=\frac{\partial y}{\partial v_7}\cdot \frac{\partial v_7}{\partial v_6}=\bar{v}_7\cdot \frac{\partial v_7}{\partial v_6}=1\cdot 1=1$

$\bar{v}_5=\frac{\partial y}{\partial v_5}=\frac{\partial y}{\partial v_7}\cdot\frac{\partial v_7}{\partial v_5}=1\cdot(-1)=-1$

$\bar{v}_4=\frac{\partial y}{\partial v_4}=\bar{v}_6\cdot \frac{\partial v_6}{\partial v_4}=1\cdot 1=1$

$\bar{v}_3=\frac{\partial y}{\partial v_3}=\bar{v}_6\cdot\frac{\partial v_6}{\partial v_3}=1\cdot 1=1$

$\frac{\partial f(x_1,x_2)}{\partial x_2}=\frac{\partial y}{\partial v_2}=\bar{v}_2=\bar{v}_4\cdot\frac{\partial v_4}{\partial v_2}+\bar{v}_5\cdot\frac{\partial v_5}{\partial v_2}=1\cdot 2+(-1)\cdot \cos(5)\approx 2-0.28=1.72$

$\frac{\partial f(x_1,x_2)}{\partial x_1}=\frac{\partial y}{\partial v_1}=\bar{v}_1=\bar{v}_3\cdot\frac{\partial v_3}{\partial v_1}+\bar{v}_4\cdot\frac{\partial v_4}{\partial v_1}=1\cdot\frac{1}{2}+1\cdot 5=5.5$
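The reverse sweep can be sketched by recording, for each node, its parents and the local partial derivatives, then accumulating adjoints from the output back to the inputs. This is a minimal illustration under those assumptions; the `Node` class and the `ln`, `sin`, and `backward` helpers are hypothetical and cover only the operations in this example.

```python
import math

class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, d self / d parent)
        self.adjoint = 0.0      # accumulates d y / d self

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Node(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

def ln(a):
    return Node(math.log(a.value), ((a, 1.0 / a.value),))

def sin(a):
    return Node(math.sin(a.value), ((a, math.cos(a.value)),))

def backward(output):
    # Topological order via depth-first search: parents come before children
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.adjoint = 1.0
    # Visit children before parents so each adjoint is complete when it is used
    for node in reversed(order):
        for parent, local in node.parents:
            parent.adjoint += node.adjoint * local

x1, x2 = Node(2.0), Node(5.0)
y = ln(x1) + x1 * x2 - sin(x2)  # forward pass builds the graph
backward(y)                      # single reverse pass
print(x1.adjoint, x2.adjoint)
```

A single reverse pass yields both partial derivatives, matching $\bar{v}_1=5.5$ and $\bar{v}_2\approx 1.72$ above. For $f:\mathbb{R}^n\to\mathbb{R}$ this is the main appeal of reverse mode: the whole gradient comes from one backward sweep, at the price of storing the graph from the forward pass.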
