Nonlinear Least-Squares

  • general definition

  • linearization

  • Gauss-Newton

Nonlinear LS:

\underset{x\in \mathbb{R}^n}{\min}\frac{1}{2}\lVert r(x)\rVert^2_2, where

r(x)=\begin{bmatrix}r_1(x) \\ r_2(x)\\ \vdots\\ r_m(x)\end{bmatrix}

r_i(x):\mathbb{R}^n\to\mathbb{R}

i=1,\ldots,m

m outputs, n inputs

Special case: Linear Least-Squares:

\begin{aligned}r(x)=\begin{bmatrix}r_1(x) \\ r_2(x)\\ \vdots \\ r_m(x)\end{bmatrix}& =\begin{bmatrix}b_1-a^T_1x \\ b_2-a^T_2x \\ \vdots \\ b_m-a^T_mx \end{bmatrix}=\big[\,b_i-a^T_ix\,\big]^m_{i=1}\\ &=b-Ax\end{aligned}

where A=\begin{bmatrix}-a^T_1-\\-a^T_2-\\ \vdots \\ -a^T_m-\end{bmatrix}, and A is m \times n.

m > n: overdetermined

m < n: underdetermined

More generally, nonlinear model:

r_i(x)=b_i-f_i(x), \quad f_i(x):\mathbb{R}^n\to\mathbb{R}

Example: position estimation → “sensor localization”

Nonlinear LS form to obtain the position estimate:

\underset{x\in \mathbb{R}^2}{\min}\sum_{i=1}^m(\lVert x-b_i\rVert_2-s_i)^2

m beacons b_i \in \mathbb{R}^2, i=1,\ldots,m, with s_i the measured distance to beacon i
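As a concrete illustration of this objective, here is a minimal Python/NumPy sketch; the beacon positions, the measured distances s_i, and the trial point are made-up values for illustration only.

```python
import numpy as np

# Hypothetical data: m = 3 beacons b_i in R^2 and measured distances s_i
beacons = np.array([[0.0, 0.0],
                    [4.0, 0.0],
                    [0.0, 3.0]])
s = np.array([2.2, 3.6, 1.5])

def objective(x):
    # sum_i (||x - b_i||_2 - s_i)^2
    residuals = np.linalg.norm(x - beacons, axis=1) - s
    return np.sum(residuals ** 2)

print(objective(np.array([1.0, 1.5])))  # objective value at a trial position
```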

Nonlinear Least-Squares

\underset{x \in \mathbb{R}^n}{\min}\frac{1}{2}\lVert r(x) \rVert^2=\frac{1}{2}\sum_{i=1}^m r_i(x)^2

r(x)=\begin{bmatrix} r_1(x) \\ \vdots \\ r_m(x) \end{bmatrix}, \quad r_i(x):\mathbb{R}^n\to\mathbb{R}

Sensor localization example:

r(x)= \begin{bmatrix} \lVert x-b_1\rVert-f_1\\ \lVert x-b_2\rVert-f_2\\ \vdots \\ \lVert x-b_m\rVert-f_m\end{bmatrix}

residual r_i(x)=\lVert x-b_i \rVert_2-f_i

Special case:

r_i(x)=a_i^Tx-b_i

r(x)=Ax-b

\frac{1}{2}\lVert r(x)\rVert^2=\frac{1}{2}\lVert Ax-b\rVert^2: linear least squares
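In this linear special case the minimizer can be computed directly; a small NumPy sketch with arbitrary made-up data:

```python
import numpy as np

# Hypothetical overdetermined problem: m = 5 residuals, n = 2 unknowns
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Minimize (1/2)||Ax - b||^2
x, *_ = np.linalg.lstsq(A, b, rcond=None)

print(A.T @ (A @ x - b))  # optimality check: A^T r should be ~0
```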

Gauss-Newton

  • start with some guess for solution x^{(0)}

  • for k = 0,1,2,3,\ldots

    • linearize the residuals r_i at the current iterate x^{(k)}

      • \bar{r}^{(k)}(x) ← linearization at x^{(k)}

    • x^{(k+1)}=\underset{x\in \mathbb{R}^n}{\argmin} \frac{1}{2} \lVert \bar{r}^{(k)}(x)\rVert^2

Linearization of a generic f:\mathbb{R}^n\to \mathbb{R} at a point x\in \mathbb{R}^n:

f(x+d)=f(x)+\nabla f(x)^Td+o(\lVert d\rVert)

or equivalently, with z=x+d (i.e., d=z-x):

f(z)=f(x)+\nabla f(x)^T(z-x)+o(\lVert z-x\rVert)

(little “oh”) g\in o(\varepsilon) if \underset{\varepsilon \to 0}{\lim}\frac{g(\varepsilon)}{\varepsilon}=0
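As a concrete check of the little-o term, take f(x)=\lVert x\rVert_2^2:

f(x+d)=\lVert x+d\rVert_2^2=\lVert x\rVert_2^2+2x^Td+\lVert d\rVert_2^2, \qquad \nabla f(x)=2x

and the leftover term \lVert d\rVert_2^2 is o(\lVert d\rVert), since \lVert d\rVert_2^2/\lVert d\rVert_2=\lVert d\rVert_2 \to 0.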

Sensor Localization Example

r(x)= \begin{bmatrix} \lVert x-b_1\rVert-f_1\\ \lVert x-b_2\rVert-f_2 \\ \vdots \\ \lVert x-b_m\rVert-f_m\end{bmatrix}

linearization of the vector-valued function r(x):\mathbb{R}^n\to\mathbb{R}^m: expanding each component around x^{(k)},

\begin{aligned}r(x)& =\begin{bmatrix} r_1(x^{(k)})+\nabla r_1(x^{(k)})^T(x-x^{(k)})+o(\lVert x-x^{(k)}\rVert) \\ \vdots \\ r_m(x^{(k)})+\nabla r_m(x^{(k)})^T(x-x^{(k)})+o(\lVert x-x^{(k)}\rVert)\end{bmatrix}\\ & =\begin{bmatrix}r_1(x^{(k)})\\ \vdots\\ r_m(x^{(k)})\end{bmatrix}+\begin{bmatrix}\nabla r_1(x^{(k)})^T\\ \vdots\\ \nabla r_m(x^{(k)})^T\end{bmatrix}(x-x^{(k)})+o(\lVert x-x^{(k)}\rVert)\\&=r(x^{(k)})+J(x^{(k)})(x-x^{(k)})+o(\lVert x-x^{(k)}\rVert)\end{aligned}

where J is the Jacobian of r at x^{(k)}; dropping the o(\cdot) term gives the linearization \bar{r}^{(k)}(x)
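For the localization residuals specifically, each gradient has a closed form (assuming x \neq b_i, so the norm is differentiable):

\nabla r_i(x)=\nabla\big(\lVert x-b_i\rVert_2-f_i\big)=\frac{x-b_i}{\lVert x-b_i\rVert_2}

so the i-th row of the Jacobian is \frac{(x-b_i)^T}{\lVert x-b_i\rVert_2}.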

Jacobian matrix of a function r:\mathbb{R}^n\to\mathbb{R}^m

J(x)=\begin{bmatrix}\nabla r_1(x)^T\\ \vdots\\ \nabla r_m(x)^T\end{bmatrix}

is an m\times n matrix

“Gradient” of r:\mathbb{R}^n\to\mathbb{R}^m

\nabla r(x)=J(x)^T

Goal: x^{(0)},x^{(1)},x^{(2)},\ldots \to x_t

  1. \bar{r}^{(k)}(x)=r(x^{(k)})+J(x^{(k)})(x-x^{(k)}), the linearization at x^{(k)}

  2. \begin{aligned} x^{(k+1)} & = \underset{x}{\argmin} \frac{1}{2} \lVert r(x^{(k)})+J(x^{(k)})(x-x^{(k)})\rVert^2 \\ & = \argmin \frac{1}{2} \lVert J(x^{(k)})x-(J(x^{(k)})\cdot x^{(k)}-r(x^{(k)})) \rVert^2 \\&= \argmin\frac{1}{2}\lVert A_kx-b_k\rVert^2\end{aligned}

where A_k=J(x^{(k)}) and b_k=J(x^{(k)})x^{(k)}-r(x^{(k)}),

i.e., x^{(k+1)}=A_k \backslash b_k
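Equivalently, writing the step as d^{(k)}=x^{(k+1)}-x^{(k)}, the normal equations of this subproblem read

J(x^{(k)})^TJ(x^{(k)})\,d^{(k)}=-J(x^{(k)})^Tr(x^{(k)})

and when A_k has full column rank, the solution returned by A_k \backslash b_k is exactly the x^{(k+1)} satisfying this system.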

pseudo-code for Gauss-Newton

x^{(0)} \in \mathbb{R}^n given; \varepsilon > 0 given

for k=0,1,2,3,\ldots

x^{(k+1)} = A_k \backslash b_k

where A_k=J(x^{(k)}), b_k=A_kx^{(k)}-r(x^{(k)})

if \lVert x^{(k+1)}-x^{(k)}\rVert < \varepsilon, exit

return x^{(k+1)}
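A runnable sketch of this pseudo-code in Python/NumPy, applied to the sensor localization example; the beacon positions, the true position used to synthesize measurements, and the starting guess are made-up for illustration, and np.linalg.lstsq plays the role of A_k \backslash b_k.

```python
import numpy as np

# Hypothetical beacons b_i in R^2 and (noiseless) measured distances f_i
beacons = np.array([[0.0, 0.0],
                    [4.0, 0.0],
                    [0.0, 3.0]])
x_true = np.array([1.0, 2.0])
f = np.linalg.norm(x_true - beacons, axis=1)

def r(x):
    # residuals r_i(x) = ||x - b_i||_2 - f_i
    return np.linalg.norm(x - beacons, axis=1) - f

def J(x):
    # Jacobian: i-th row is (x - b_i)^T / ||x - b_i||_2
    d = x - beacons
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def gauss_newton(x0, eps=1e-8, max_iter=50):
    x = x0
    for _ in range(max_iter):
        A_k = J(x)
        b_k = A_k @ x - r(x)
        x_new, *_ = np.linalg.lstsq(A_k, b_k, rcond=None)  # A_k \ b_k
        if np.linalg.norm(x_new - x) < eps:
            return x_new
        x = x_new
    return x

print(gauss_newton(np.array([3.0, 0.5])))  # should converge to roughly (1, 2)
```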

Gauss-Newton Implementation

Least-Squares: x is optimal for \min\frac{1}{2} \lVert Ax-b \rVert^2 iff A^T(Ax-b)=0, or A^Tr=0.
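The same condition carries over to the nonlinear problem, since by the chain rule

\nabla\Big(\frac{1}{2}\lVert r(x)\rVert^2\Big)=J(x)^Tr(x)

so a point where Gauss-Newton stops moving (the linearized subproblem is already minimized by x^{(k)} itself) satisfies J(x^{(k)})^Tr(x^{(k)})=0, the first-order optimality condition for the nonlinear problem.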
