# Gradients

$$f: \R^n \to \R$$

Assume the $$i^{th}$$ partial derivative exists:

$$\frac{\partial f(x)}{\partial x\_i} = \underset{\varepsilon \to 0}{\lim}\frac{f(x+\varepsilon e\_i)-f(x)}{\varepsilon}$$

$$e\_i=\begin{bmatrix} 0 \\ \vdots \\ 1 \\ 0 \\ \vdots \\ 0\end{bmatrix}$$ ($$1$$ in the $$i^{th}$$ position)                    <img src="https://596692103-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FuVC1Ieh2j1XfxqLWFoSa%2Fuploads%2FdSSY5Y9kZ26KnvAQV5Aq%2FUntitled.png?alt=media&#x26;token=b17481e4-2b66-4732-a908-2e24701ccb4c" alt="" data-size="original">

The gradient of $$f$$ is the collection of partials:

$$
\nabla f(x)= \begin{bmatrix}\partial f(x)/\partial x\_1 \\ \partial f(x)/\partial x\_2 \\ \vdots \\ \partial f(x)/\partial x\_n\end{bmatrix} \in \mathbb{R}^n
$$

### Directional derivative of $$f$$ at $$x\in \R^n$$

in the direction $$d \in \R^n$$

$$f'(x,d)=\underset{\varepsilon \to 0}{\lim}\frac{f(x+\varepsilon d)-f(x)}{\varepsilon}$$

The function $$f$$ is differentiable at $$x \in \R^n$$ if $$f'(x,d)$$ exists for all $$d \in \R^n$$, in which case $$f'(x,d)=\nabla f(x)^Td$$.

> **Fact**: the directional derivative is positively homogeneous of degree 1, i.e., $$f'(x,\alpha d)=\alpha f'(x,d), \forall \alpha≥0$$
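This follows from the limit definition by the substitution $$\varepsilon' = \alpha\varepsilon$$ (for $$\alpha > 0$$; the case $$\alpha = 0$$ is immediate since both sides vanish):

```latex
f'(x,\alpha d)
= \lim_{\varepsilon \to 0}\frac{f(x+\varepsilon\,\alpha d)-f(x)}{\varepsilon}
= \lim_{\varepsilon' \to 0}\frac{f(x+\varepsilon' d)-f(x)}{\varepsilon'/\alpha}
= \alpha \lim_{\varepsilon' \to 0}\frac{f(x+\varepsilon' d)-f(x)}{\varepsilon'}
= \alpha f'(x,d).
```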

**Special Directions:**

$$d=\begin{bmatrix}d\_1 \\ d\_2 \\ \vdots \\ d\_n \end{bmatrix}$$ → $$d=e\_i$$

$$f'(x,e\_i)=\nabla f(x)^Te\_i=\[\nabla f(x)]\_i=\frac{\partial f(x)}{\partial x\_i}$$

### Calculus Rules for $$\R^n$$

$$f:\mathbb{R}^n→\mathbb{R}$$

Linear function: $$f(x)=a^Tx=\sum^{n}\_{i=1}a\_ix\_i$$

$$\nabla f(x)=\begin{bmatrix}a\_1 \\ a\_2 \\ \vdots \\ a\_n \end{bmatrix}=a$$

**Quadratic functions:**

$$f(x)=x^TQx+c^Tx+\alpha$$

$$Q$$ is an $$n\times n$$ matrix, $$c\in \mathbb{R}^n$$, and $$\alpha \in \mathbb{R}$$

$$\nabla f(x)=Qx+Q^Tx+c=(Q+Q^T)x+c$$

If $$Q$$ is symmetric, i.e., $$Q=Q^T$$, $$\nabla f(x)=2Qx+c$$

WLOG, assume $$Q$$ is symmetric.

If not, observe:

$$\begin{aligned}f(x) &=x^TQx+c^Tx+\alpha \\ &=\frac{1}{2}x^TQx+\frac{1}{2}x^TQx+c^Tx+\alpha \\ &=\frac{1}{2}x^TQ^Tx+\frac{1}{2}x^TQx+c^Tx+\alpha \\ &=x^T(\frac{1}{2}Q^T+\frac{1}{2}Q)x+c^Tx+\alpha \\ &=x^T\overline{Q}x+c^Tx+\alpha\end{aligned}$$

where $$\overline{Q}=\frac{1}{2}Q^T+\frac{1}{2}Q$$ is called the “symmetric part of $$Q$$”
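The symmetrization argument is easy to sanity-check numerically; a minimal NumPy sketch with a random (made-up) $$Q$$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q = rng.standard_normal((n, n))   # generic, not symmetric
c = rng.standard_normal(n)
alpha = 1.5

Qbar = 0.5 * (Q + Q.T)            # symmetric part of Q

def f(x, M):
    return x @ M @ x + c @ x + alpha

x = rng.standard_normal(n)
# Replacing Q by its symmetric part leaves f unchanged ...
print(np.isclose(f(x, Q), f(x, Qbar)))                    # True
# ... and the gradient (Q + Q^T) x + c equals 2 Qbar x + c
print(np.allclose((Q + Q.T) @ x + c, 2 * Qbar @ x + c))   # True
```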

**Directional Derivative:**

$$f'(x,d)=\nabla f(x)^Td$$

<figure><img src="https://596692103-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FuVC1Ieh2j1XfxqLWFoSa%2Fuploads%2Fy0dT1m7i2JRWq2FSKryY%2Fimage.png?alt=media&#x26;token=d624770c-e9ee-466b-bc51-8c79c1adb4b4" alt=""><figcaption></figcaption></figure>

### Computing Gradients

1. Numerical → “finite differencing”
2. Symbolic
3. Automatic Differentiation

**Numerical:**

Taylor’s Theorem:

$$f(x+\varepsilon d)=f(x)+\varepsilon \nabla f(x)^Td+o(\varepsilon)$$, where $$g \in o(\varepsilon) \text{ if } \underset{\varepsilon \to 0}{\lim}\frac{g(\varepsilon)}{\varepsilon}=0$$

$$f'(x,d)=\nabla f(x)^Td \approx \frac{f(x+\varepsilon d)-f(x)}{\varepsilon}$$

Choose $$d=e\_i$$

$$\frac{\partial f(x)}{\partial x\_i} \approx \frac{f(x+\varepsilon e\_i)-f(x)}{\varepsilon}$$ “forward finite differencing”
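A direct implementation of forward differencing, one coordinate at a time; a minimal NumPy sketch with a made-up test function:

```python
import numpy as np

def forward_diff_grad(f, x, eps=1e-6):
    """Approximate the gradient of f at x by forward differences."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0                          # standard basis vector e_i
        g[i] = (f(x + eps * e) - fx) / eps
    return g

# Made-up test function: f(x) = x1^2 + 3 x2, so grad f = (2 x1, 3)
f = lambda x: x[0]**2 + 3 * x[1]
print(forward_diff_grad(f, [2.0, 5.0]))     # ≈ [4. 3.]
```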

**Central Differencing:**

$$\frac{\partial f(x)}{\partial x\_i}=\frac{f(x+\varepsilon e\_i)-f(x-\varepsilon e\_i)}{2\varepsilon}+O(\varepsilon^2)$$
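The two schemes can be compared empirically; a sketch (NumPy assumed, $$f=\sin$$ chosen only for illustration):

```python
import numpy as np

def forward_diff(f, x, i, eps):
    e = np.zeros_like(x)
    e[i] = 1.0
    return (f(x + eps * e) - f(x)) / eps

def central_diff(f, x, i, eps):
    e = np.zeros_like(x)
    e[i] = 1.0
    return (f(x + eps * e) - f(x - eps * e)) / (2 * eps)

f = lambda x: np.sin(x[0])        # exact derivative: cos(x0)
x = np.array([1.0])
exact = np.cos(1.0)
for eps in (1e-1, 1e-2, 1e-3):
    print(eps,
          abs(forward_diff(f, x, 0, eps) - exact),   # error shrinks like eps
          abs(central_diff(f, x, 0, eps) - exact))   # error shrinks like eps^2
```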

**Symbolic Differentiation**

1. $$\frac{\partial}{\partial x}(f\_1(x)+f\_2(x))=\frac{\partial}{\partial x}f\_1(x)+\frac{\partial}{\partial x} f\_2(x)$$
2. $$\frac{\partial}{\partial x}(f\_1(x) \cdot f\_2(x))=(\frac{\partial}{\partial x}f\_1(x))f\_2(x)+(\frac{\partial}{\partial x}f\_2(x))f\_1(x)$$
3. $$\frac{\partial}{\partial x} f\_1(f\_2(x))= \frac{\partial}{\partial f\_2(x)}f\_1(f\_2(x)) \cdot \frac{\partial}{\partial x} f\_2(x)$$

$$f:\mathbb{R}^n→\mathbb{R}$$

$$\nabla f(x)=\begin{bmatrix}\frac{\partial f(x)}{\partial x\_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x\_n}\end{bmatrix}$$

Example: $$f(x\_1,x\_2)=\ln(x\_1)+x\_1x\_2-\sin(x\_2)$$

$$\frac{\partial f(x\_1,x\_2)}{\partial x\_1}=\frac{1}{x\_1}+x\_2$$

$$\frac{\partial f(x\_1,x\_2)}{\partial x\_2}=x\_1-\cos(x\_2)$$
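Symbolic systems apply exactly the rules above; a quick check of this example with SymPy (assumed available):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)
f = sp.ln(x1) + x1 * x2 - sp.sin(x2)

df_dx1 = sp.diff(f, x1)   # 1/x1 + x2
df_dx2 = sp.diff(f, x2)   # x1 - cos(x2)
print(df_dx1)
print(df_dx2)
```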

**Computational Graph**

<figure><img src="https://596692103-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FuVC1Ieh2j1XfxqLWFoSa%2Fuploads%2Ffj6RmUxytjP8oDmJM48n%2Fimage.png?alt=media&#x26;token=3586570d-5356-4fa9-ba45-59e385632e33" alt=""><figcaption></figcaption></figure>

input $$(x\_1,x\_2)=(2,5)$$

$$v\_1=x\_1=2$$

$$v\_2=x\_2=5$$

$$v\_3=\ln(v\_1)=\ln(2)$$

$$v\_4=v\_1 \cdot v\_2 = 2 \cdot 5 = 10$$

$$v\_5 = \sin(v\_2)=\sin(5)=-0.96$$

$$v\_6=v\_3+v\_4=10.69$$

$$v\_7=v\_6-v\_5=11.65$$

$$y=v\_7$$
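The evaluation trace above, transcribed as plain Python (standard library only):

```python
import math

x1, x2 = 2.0, 5.0          # input from the example
v1 = x1
v2 = x2
v3 = math.log(v1)          # ln(2) ≈ 0.693
v4 = v1 * v2               # 10
v5 = math.sin(v2)          # sin(5) ≈ -0.959
v6 = v3 + v4               # ≈ 10.69
v7 = v6 - v5               # ≈ 11.65
y = v7
print(round(y, 2))         # 11.65
```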

**Forward Mode Auto Diff**

Let $$\dot{v}\_i=\frac{\partial v\_i}{\partial x\_1}$$ (for the gradient with respect to $$x\_1$$; repeat for $$x\_2$$)

$$\dot{v}\_1=\frac{\partial v\_1}{\partial x\_1} = 1$$

$$\dot{v}\_2=\frac{\partial v\_2}{\partial x\_1} = 0$$

$$\dot{v}\_3=\frac{\partial v\_3}{\partial x\_1}=\frac{\partial v\_3}{\partial v\_1}\cdot \frac{\partial v\_1}{\partial x\_1}=\frac{\partial v\_3}{\partial v\_1}\dot{v}\_1=\frac{1}{v\_1}\cdot \dot{v}\_1=\frac{1}{2}$$

$$\dot{v}\_4=\frac{\partial v\_4}{\partial x\_1}=\frac{\partial v\_4}{\partial v\_1}\cdot \frac{\partial v\_1}{\partial x\_1}+\frac{\partial v\_4}{\partial v\_2} \cdot \frac{\partial v\_2}{\partial x\_1}=\frac{\partial v\_4}{\partial v\_1}\cdot \dot{v}\_1+\frac{\partial v\_4}{\partial v\_2}\dot{v}\_2=v\_2\cdot 1+v\_1\cdot 0=5$$

$$\dot{v}\_5=\cos(v\_2)\cdot \dot{v}\_2=0$$

$$\dot{v}\_6=\dot{v}\_3+\dot{v}\_4=\frac{1}{2}+5=5.5$$

$$\dot{v}\_7=\dot{v}\_6-\dot{v}\_5=5.5-0=5.5$$
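Forward mode carries each tangent $$\dot{v}\_i$$ alongside the value $$v\_i$$; a plain-Python transcription of the trace above, seeded for $$\partial/\partial x\_1$$:

```python
import math

x1, x2 = 2.0, 5.0
# Seed the tangents for d/dx1: (xdot1, xdot2) = (1, 0)
v1, v1d = x1, 1.0
v2, v2d = x2, 0.0
v3, v3d = math.log(v1), v1d / v1            # d ln(u) = u'/u
v4, v4d = v1 * v2, v1d * v2 + v2d * v1      # product rule
v5, v5d = math.sin(v2), math.cos(v2) * v2d  # chain rule
v6, v6d = v3 + v4, v3d + v4d
v7, v7d = v6 - v5, v6d - v5d
print(v7d)   # 5.5 = df/dx1 at (2, 5)
```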

**Reverse Mode Auto Diff**

Let $$\bar{v}\_i=\frac{\partial y}{\partial v\_i}$$ ("adjoint")

$$\bar{v}\_7=\frac{\partial y}{\partial v\_7}=1$$

$$\bar{v}\_6=\frac{\partial y}{\partial v\_6}=\frac{\partial y}{\partial v\_7}\cdot \frac{\partial v\_7}{\partial v\_6}=\bar{v}\_7\cdot \frac{\partial v\_7}{\partial v\_6}=1\cdot 1=1$$

$$\bar{v}\_5=\frac{\partial y}{\partial v\_5}=\frac{\partial y}{\partial v\_7}\cdot\frac{\partial v\_7}{\partial v\_5}=1\cdot(-1)=-1$$

$$\bar{v}\_4=\frac{\partial y}{\partial v\_4}=\bar{v}\_6\cdot \frac{\partial v\_6}{\partial v\_4}=1\cdot 1=1$$

$$\bar{v}\_3=\frac{\partial y}{\partial v\_3}=\bar{v}\_6\cdot\frac{\partial v\_6}{\partial v\_3}=1\cdot1=1$$

$$\bar{v}\_2=\frac{\partial y}{\partial v\_2}=\bar{v}\_4\cdot\frac{\partial v\_4}{\partial v\_2}+\bar{v}\_5\cdot\frac{\partial v\_5}{\partial v\_2}=1\cdot 2+(-1)\cdot \cos(5)=2-0.28=1.72$$

$$\frac{\partial f(x\_1,x\_2)}{\partial x\_1}=\frac{\partial y}{\partial v\_1}=\bar{v}\_1=\bar{v}\_3\cdot\frac{\partial v\_3}{\partial v\_1}+\bar{v}\_4\cdot\frac{\partial v\_4}{\partial v\_1}=1\cdot\frac{1}{2}+1\cdot 5=5.5$$
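The reverse-mode pass can likewise be transcribed directly; a plain-Python sketch of the forward trace followed by the backward (adjoint) sweep, which yields both partials in one pass:

```python
import math

# Forward pass (same trace as above)
v1, v2 = 2.0, 5.0
v3 = math.log(v1)
v4 = v1 * v2
v5 = math.sin(v2)
v6 = v3 + v4
v7 = v6 - v5

# Backward pass: adjoints vbar_i = dy/dv_i, seeded with vbar_7 = 1
v7b = 1.0
v6b = v7b * 1.0                        # v7 = v6 - v5
v5b = v7b * (-1.0)
v4b = v6b * 1.0                        # v6 = v3 + v4
v3b = v6b * 1.0
v2b = v4b * v1 + v5b * math.cos(v2)    # v4 = v1*v2, v5 = sin(v2)
v1b = v3b * (1.0 / v1) + v4b * v2      # v3 = ln(v1), v4 = v1*v2
print(v1b, v2b)                        # 5.5 and ≈ 1.72
```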
