Regularization

Motivation:

We wish to choose $x$ to balance the two objective values

$f_1(x)=\lVert Ax-b\rVert^2$ and $f_2(x)=\lVert Cx-d\rVert^2$.

Generally, we can make $f_1(x)$ or $f_2(x)$ small, but not both.

$b$ is the (noisy) observation

Naive least-squares formulation:

true signal $x_0$ recovered via $\underset{x}{\min}\ \lVert x-b\rVert^2$

$b=x_0+\text{noise}$, so $b-x_0=\text{noise}$; the naive minimizer is simply $x=b$, which keeps all of the noise.

Need to incorporate “side” information

“side” ↔ “prior” ⇒ the signal $x_0$ is smooth

Combining two models

$$\underset{x}{\min}\ f_1(x)+f_2(x)$$

Three ways to balance the competing objectives $f_1$ and $f_2$:

  1. $\underset{x}{\min}\,\{f_1(x)+\lambda f_2(x)\}$, $\lambda \ge 0$ (a tradeoff sketch follows this list)

    $\lambda$: “regularization” parameter

    $f_2(x)$: regularization function

  2. $\underset{x}{\min}\,\{f_1(x)\mid f_2(x)\le\tau\}$, $\tau \ge 0$

    equivalently, $\min\{f_1(x)\mid x \in \operatorname{lev}_\tau f_2\}$, where $\operatorname{lev}_\tau f_2 = \{x \mid f_2(x)\le\tau\}$ is the $\tau$-sublevel set of $f_2$

  3. $\min\{f_2(x)\mid f_1(x)\le \Delta\}$, $\Delta \ge 0$
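As a quick illustration of formulation 1, the NumPy sketch below (synthetic $A,b,C,d$ and $\lambda$ values are assumptions for illustration, not taken from these notes) sweeps the regularization parameter and prints the resulting pair $(f_1(x_\lambda), f_2(x_\lambda))$: as $\lambda$ grows, $f_2$ shrinks while $f_1$ grows.

```python
import numpy as np

# Synthetic data (assumption, for illustration only):
# f1(x) = ||Ax - b||^2 and f2(x) = ||Cx - d||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
C, d = rng.standard_normal((15, 5)), rng.standard_normal(15)

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    # Formulation 1: min_x f1(x) + lam * f2(x), solved by stacking it into
    # one least-squares problem  [A; sqrt(lam) C] x ~= [b; sqrt(lam) d].
    M = np.vstack([A, np.sqrt(lam) * C])
    v = np.concatenate([b, np.sqrt(lam) * d])
    x, *_ = np.linalg.lstsq(M, v, rcond=None)
    f1 = np.linalg.norm(A @ x - b) ** 2
    f2 = np.linalg.norm(C @ x - d) ** 2
    print(f"lambda={lam:7.1f}   f1={f1:8.3f}   f2={f2:8.3f}")
```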

Tikhonov regularization

A particularly common example of regularized least-squares is Tikhonov regularization, which has the form

$$\underset{x}{\min}\ \frac{1}{2}\lVert Ax-b\rVert^2+\lambda \frac{1}{2}\lVert Dx\rVert^2$$

for which the objective can be expressed as

$$\lVert Ax-b\rVert^2+\lambda \lVert Dx\rVert^2=\left\lVert \begin{bmatrix}A\\ \sqrt{\lambda}D\end{bmatrix}x-\begin{bmatrix}b\\ 0\end{bmatrix}\right\rVert^2$$

If $D$ has full column rank, then the stacked matrix

$$\begin{bmatrix}A\\\sqrt{\lambda}D\end{bmatrix}$$

necessarily also has full column rank for any positive $\lambda$, which implies that the regularized problem always has a well-defined unique solution.
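A minimal NumPy sketch of the stacked formulation (shapes, data, and the choice $D=I$ are assumptions for illustration); for comparison it also solves the equivalent normal equations $(A^TA+\lambda D^TD)x=A^Tb$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 10
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
D = np.eye(n)          # simplest full-column-rank choice of D (assumption)
lam = 0.5

# Stacked least-squares form: || [A; sqrt(lam) D] x - [b; 0] ||^2
A_hat = np.vstack([A, np.sqrt(lam) * D])
b_hat = np.concatenate([b, np.zeros(n)])
x_stacked, *_ = np.linalg.lstsq(A_hat, b_hat, rcond=None)

# Equivalent normal equations: (A^T A + lam D^T D) x = A^T b
x_normal = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)

print(np.allclose(x_stacked, x_normal))  # True: same unique solution
```

With $D=I$ this is ordinary ridge regression; the same code works for any full-column-rank $D$, such as the difference matrix in the example below.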

💡 Example.

$$\min\{f_2(x)\mid \lVert x-b\rVert^2\le\Delta\}$$

Choose $f_2$ that promotes “smoothness”

signal $x=(x_1,x_2,x_3,\ldots,x_n)$

Measure smoothness by the change between $x_{i+1}$ and $x_i$: $f_2(x)=\sum^{n-1}_{i=1}(x_i-x_{i+1})^2$

$$\underset{x}{\min}\ \frac{1}{2}\underbrace{\lVert x-b\rVert^2}_{f_1(x)}+\lambda \frac{1}{2} \underbrace{\sum^{n-1}_{i=1}(x_i-x_{i+1})^2}_{f_2(x)}.$$

Matrix notation:

$$D = \begin{bmatrix} 1 & -1 & 0 & \ldots & \ldots & 0\\ 0 & 1 & -1 & 0 & \ldots & 0\\ 0 & 0 & 1 & -1 & \ldots & 0\\ & &\ddots \\ 0 & 0 & \ldots & 0 & 1 & -1\end{bmatrix}$$

$f_2(x)=\lVert Dx\rVert^2$

$$Dx = \begin{bmatrix} x_1-x_2\\ x_2-x_3\\ \vdots \\ x_{n-1}-x_n \end{bmatrix}$$
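A tiny NumPy sketch (purely illustrative, with an assumed test vector) that builds this $(n-1)\times n$ difference matrix and confirms that $Dx$ stacks the consecutive differences $x_i-x_{i+1}$:

```python
import numpy as np

n = 6
# (n-1) x n forward-difference matrix:
# row i has a 1 in column i and a -1 in column i+1.
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)

x = np.arange(1.0, n + 1.0)    # x = [1, 2, 3, 4, 5, 6]
print(D @ x)                   # [-1. -1. -1. -1. -1.]  (each x_i - x_{i+1})
print(np.sum((D @ x) ** 2))    # f2(x) = sum of squared differences = 5.0
```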

This allows the regularized least-squares objective to be rewritten as a familiar ordinary least-squares objective:

$$\lVert x-b\rVert^2+\lambda \lVert Dx\rVert^2=\left\lVert \underbrace{\begin{bmatrix}I\\\sqrt{\lambda}D\end{bmatrix}}_{\hat{A}}x-\underbrace{\begin{bmatrix}b\\ 0\end{bmatrix}}_{\hat{b}}\right\rVert^2$$

So the solution of this regularized least-squares problem satisfies the normal equation $\hat{A}^T\hat{A}x=\hat{A}^T\hat{b}$, which simplifies to

$$(I+\lambda D^TD)x=b.$$
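Putting the pieces together, here is a small denoising sketch (synthetic signal; the noise level and $\lambda$ are assumed values for illustration) that solves $(I+\lambda D^TD)x=b$ directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0.0, 1.0, n)
x0 = np.sin(2 * np.pi * t)                 # smooth "true" signal (assumption)
b = x0 + 0.2 * rng.standard_normal(n)      # noisy observation b = x0 + noise

# (n-1) x n forward-difference matrix, as defined above.
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)

lam = 50.0                                 # regularization parameter (assumed)
x = np.linalg.solve(np.eye(n) + lam * D.T @ D, b)

# The regularized solution should track x0 much more closely than b does.
print("||b - x0|| =", np.linalg.norm(b - x0))
print("||x - x0|| =", np.linalg.norm(x - x0))
```

Larger $\lambda$ yields a smoother $x$ at the price of a larger data misfit $\lVert x-b\rVert^2$, matching the $\lambda$-tradeoff described earlier.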
