Motivation:
We wish to choose $x$ to balance the two objective values
$f_1(x) = \|Ax - b\|^2$ and $f_2(x) = \|Cx - d\|^2$.
Generally, we can make $f_1(x)$ or $f_2(x)$ small, but not both.
$b$ is a noisy observation
Naive least-squares formulation:
true signal $x_0$ recovered via $\min_x \|x - b\|^2$
$b = x_0 + \text{noise} \;\Rightarrow\; b - x_0 = \text{noise}$, so the minimizer is simply $x = b$: we recover the noise along with the signal
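A minimal NumPy sketch of this point (the signal, noise level, and sizes below are made up purely for illustration): the naive formulation just hands back the noisy observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = np.linspace(0, 1, n)
x0 = np.sin(2 * np.pi * t)              # smooth "true" signal (illustrative)
b = x0 + 0.1 * rng.standard_normal(n)   # noisy observation

# min_x ||x - b||^2 is minimized at x = b: the noise is recovered, not x0.
x_naive = b.copy()
print(np.linalg.norm(x_naive - x0))     # error equals the norm of the noise
```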
Need to incorporate “side” information
“side” ↔ “prior” ⇒ signal $x_0$ is smooth
Combining two models
$\min_x \; f_1(x) + f_2(x)$
Balance competing objectives $f_1$ and $f_2$
$\min_x \{\, f_1(x) + \lambda f_2(x) \,\}, \quad \lambda \ge 0$
$\lambda$: “regularization” parameter
$f_2(x)$: regularization function
$\min_x \{\, f_1(x) \mid f_2(x) \le \tau \,\}, \quad \tau \ge 0$, i.e. minimize $f_1(x)$ over all $x$ s.t. $f_2(x) \le \tau$
$\min \{\, f_1(x) \mid x \in \operatorname{lev}_\tau f_2 \,\}$
$\min \{\, f_2(x) \mid f_1(x) \le \Delta \,\}, \quad \Delta \ge 0$
Tikhonov regularization
A particularly common example of regularized least-squares is Tikhonov, which has the form
$$\min_x \; \tfrac{1}{2}\|Ax - b\|^2 + \tfrac{\lambda}{2}\|Dx\|^2,$$
for which the objective can be expressed as
$$\|Ax - b\|^2 + \lambda\|Dx\|^2 = \left\| \begin{bmatrix} A \\ \sqrt{\lambda}\, D \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|^2.$$
If $D$ has full column rank, then the stacked matrix $\begin{bmatrix} A \\ \sqrt{\lambda}\, D \end{bmatrix}$ necessarily also has full column rank for any positive $\lambda$, which implies that the regularized problem always has a well-defined unique solution.
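As a quick sanity check, here is a minimal NumPy sketch of this stacking trick (the matrices $A$ and $D$ and the data $b$ are arbitrary placeholders, not from the text): an ordinary least-squares solver applied to the stacked system gives the same answer as the regularized normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 20
A = rng.standard_normal((m, n))          # illustrative data matrix
b = rng.standard_normal(m)               # illustrative observations
D = np.eye(n)                            # simplest choice: D = I (full column rank)
lam = 0.5                                # regularization parameter

# Stack [A; sqrt(lam) * D] and pad b with zeros, then solve ordinary least squares.
A_hat = np.vstack([A, np.sqrt(lam) * D])
b_hat = np.concatenate([b, np.zeros(n)])
x_stacked, *_ = np.linalg.lstsq(A_hat, b_hat, rcond=None)

# Equivalent normal-equations solve: (A^T A + lam * D^T D) x = A^T b
x_normal = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)
print(np.allclose(x_stacked, x_normal))  # True
```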
💡 Example.
$\min \{\, f_2(x) \mid \|x - b\|^2 \le \Delta \,\}$
Choose f2 that promotes “smoothness”
signal $x = (x_1, x_2, x_3, \dots, x_n)$
Measure smoothness by measuring the change between $x_{i+1}$ and $x_i$: $f_2(x) = \sum_{i=1}^{n-1} (x_i - x_{i+1})^2$
$$\min_x \; \tfrac{1}{2}\underbrace{\|x - b\|^2}_{f_1(x)} + \tfrac{\lambda}{2}\underbrace{\sum_{i=1}^{n-1}(x_i - x_{i+1})^2}_{f_2(x)}.$$
Matrix notation:
$$D = \begin{bmatrix}
1 & -1 & 0 & \cdots & 0 \\
0 & 1 & -1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1 & -1
\end{bmatrix}$$
$f_2(x) = \|Dx\|^2$
$$Dx = \begin{bmatrix} x_1 - x_2 \\ x_2 - x_3 \\ \vdots \\ x_{n-1} - x_n \end{bmatrix}$$
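A short NumPy sketch (with an arbitrary test vector, purely for illustration) of building this difference matrix and checking that $\|Dx\|^2$ equals the sum of squared differences:

```python
import numpy as np

n = 6
x = np.arange(n, dtype=float) ** 2       # arbitrary test vector

# (n-1) x n forward-difference matrix: row i is e_i - e_{i+1}
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)

# ||Dx||^2 equals the finite-difference smoothness penalty sum_i (x_i - x_{i+1})^2
lhs = np.linalg.norm(D @ x) ** 2
rhs = np.sum((x[:-1] - x[1:]) ** 2)
print(np.allclose(lhs, rhs))             # True
```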
Writing $f_2(x) = \|Dx\|^2$ allows for a reformulation of the regularized least-squares objective into a familiar least-squares objective:
$$\|x - b\|^2 + \lambda\|Dx\|^2 = \Biggl\| \underbrace{\begin{bmatrix} I \\ \sqrt{\lambda}\, D \end{bmatrix}}_{\hat{A}} x - \underbrace{\begin{bmatrix} b \\ 0 \end{bmatrix}}_{\hat{b}} \Biggr\|^2.$$
So the solution to the regularized least-squares minimization program satisfies the normal equation $\hat{A}^T \hat{A}\, x = \hat{A}^T \hat{b}$, which simplifies to
$$(I + \lambda D^T D)\, x = b.$$
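Putting the example together, here is a minimal NumPy sketch (the signal, noise level, and value of $\lambda$ are made up for illustration) that smooths a noisy signal by solving this normal equation, and checks it against the stacked least-squares formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0, 1, n)
x0 = np.sin(2 * np.pi * t)                       # smooth "true" signal (illustrative)
b = x0 + 0.1 * rng.standard_normal(n)            # noisy observation
lam = 50.0                                       # regularization parameter

# Forward-difference matrix D from above
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)

# Solve the normal equation (I + lam * D^T D) x = b
x_smooth = np.linalg.solve(np.eye(n) + lam * D.T @ D, b)

# Same answer from ordinary least squares on the stacked system [I; sqrt(lam) D]
A_hat = np.vstack([np.eye(n), np.sqrt(lam) * D])
b_hat = np.concatenate([b, np.zeros(n - 1)])
x_lstsq, *_ = np.linalg.lstsq(A_hat, b_hat, rcond=None)
print(np.allclose(x_smooth, x_lstsq))            # True

# Smoothing typically reduces the error relative to the raw observation
print(np.linalg.norm(b - x0), np.linalg.norm(x_smooth - x0))
```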