## Robust Least Squares for fitting data (planar surface)

Damn, it’s a good while ago since the last activity on this blog…! In the future I will try to avoid such long breaks, as it’s also very useful for me to make notes about last approaches / news / papers / tendencies in the computer vision (CV) society. Today I have an interesting and (hopefully) useful topic. Sooner or later working in the CV domain you face the problem of fitting data to the model. Let’s assume that you have a response data coming from somewhere (or somebody) and want to find a model for it. By the model we understand here a function with some coefficients describing the data best. The least squares method is a very well known technique for estimation of the model coefficients, you can find more information about the method itself following this link.

The least squares technique is very often preferred due to its simplicity and high time performance as compared to other methods. It minimizes the summed square of residuals, where the residual is defined as the difference between the observed response value and the fitted response value. Thus, the residual is considered as the error associated with the data. For instance, assume that you want to fit a set of data points in the three-dimensional space with a planar surface described by three parameters $\bf{a_{1}}$, $\bf{a_{2}}$, and $\bf{a_{3}}$ as a function of $\bf{x}$ and $\bf{y}$ as follows: $\bf{z = a_{1}x + a_{2}y + a_{3}}$.

Sure, for more complex surfaces you will need more parameters and nonlinear dependencies of $\bf{x}$ and $\bf{y}$, so you can describe also curved surfaces. The general form of the surface model is a polynomial $\bf{z(x,y) = \sum\limits_{k=1}^M a_{k}\cdot\varphi_{k}(x,y)}$,

where $\bf{\varphi_{1}(x,y),\ldots,\varphi_{M}(x,y)}$ are arbitrary fixed functions of $\bf{x}$ and $\bf{y}$, called the basis functions. To solve the surface fitting problem, parameters $\bf{a_{k}}$, for which the model function $\bf{z(x,y)}$ looks like the data, need to be determined. In the current entry I will consider only the linear least squares method, where “linear” means the following. The basis functions $\bf{\varphi_{k}(x,y)}$ can be nonlinear functions of $\bf{x}$ and $\bf{y}$, while the dependence of the model on its parameters $\bf{a_{k}}$ is linear. Therefore, “linear” refers only to the model’s dependence on its parameters $\bf{a_{k}}$. An extensive mathematical description of the linear least squares solution can be found on the Documentation Center of the MathWorks here. Note, the following types of the linear least squares are considered: Linear least squares, Weighted linear least squares, and Robust least squares.

The goal of this post is to show the difference between the robust and non-robust estimate performed using the linear least squares. Let’s start with the regular (non-robust) method. Here and later I will use the Python programming language for explanation, since I find it quite compact for showing the main idea. You need just two Python packages: NumPy and Matplotlib to get the code running.

First of all we predefine the surface model with the so-called “ground truth” coefficients which we will need for evaluation of the fitting data (changes in the code required for a general quadric surface will be posted later):


#! /usr/bin/env python

import numpy as np
import matplotlib.pyplot as plt

# coefficients of the model
a1, a2, a3 = 0.1, -0.2, 4.0

# ground truth
A_gt = [a1, a2, a3]

print 'A_gt = ', A_gt

# create a coordinate matrix
nx = np.linspace(-1, 1, 41)
ny = np.linspace(-1, 1, 41)
x, y = np.meshgrid(nx, ny)

# make the estimation
z = a1*x + a2*y + a3



In this experiment we build a response data, i.e., data, for which a model needs to be found, out of the ground truth by involving some noise into it. Here two cases need to be considered. In the first case we corrupt the ground truth data by the Gaussian noise only which makes the input data just slightly different from the ground truth:


#######################################################
# CASE 1: data is corrupted by gaussian noise         #
#         Regular linear least squares method is used #
#######################################################

# let's add some gaussian noise
z_noise = z + 0.1*np.random.standard_normal(z.shape)



Then we find the coefficients using the regular (non-robust) linear least squares and display the output of the found model:


# non-robust least squares estimation
# X*A = Z

x_fl = x.flatten()
y_fl = y.flatten()
z_ones = np.ones([x.size,1])

X = np.hstack((np.reshape(x_fl, ([len(x_fl),1])), np.reshape(y_fl, ([len(y_fl),1])), z_ones))

Z = np.zeros(z_noise.shape)
Z[:] = z_noise
Z_fl = Z.flatten()
Z = np.reshape(Z_fl, ([len(Z_fl),1]))

A_lsq = np.linalg.lstsq(X,Z)

z_least_squares = np.dot(X, A_lsq)
z_least_squares = np.reshape(z_least_squares, z.shape)

lsq_non_robust_noise = np.hstack((z, z_noise, z_least_squares))

plt.figure()
plt.title('Non-robust estimate (corrupted only by noise)')
plt.imshow(lsq_non_robust_noise)
plt.clim(z.min(), z.max())

plt.show()



As an output you should get the following image. Left panel: the ground truth data for the pre-defined model. Middle panel: the noisy input for the linear least squares fit. Right panel: data computed with the found model. As you can see, it looks pretty good and the found model is very close to the ground truth. But let’s make the input data more different from the ground truth. For this purpose we corrupt it by both the Gaussian noise and so-called “outliers” which makes our experiment more real, since outliers are (unfortunately) a big part of the CV life. Then we run again the regular (non-robust) linear least squares and display the output:


############################################################
# CASE 2: data is corrupted by gaussian noise AND outliers #
#         Regular linear least squares method is used      #
############################################################

# create outliers
outlier_prop = 0.3
outlier_IND = np.random.permutation(x.size)
outlier_IND = outlier_IND[0:np.floor(x.size * outlier_prop)]

z_noise_outlier = np.zeros(z_noise.shape)
z_noise_outlier[:] = z_noise
z_noise_outlier_flt = z_noise_outlier.flatten()

z_noise_outlier_flt[outlier_IND] = z_noise_outlier_flt[outlier_IND] + 10*np.random.standard_normal(z_noise_outlier_flt[outlier_IND].shape)
z_noise_outlier = np.reshape(z_noise_outlier_flt, z.shape)

# non-robust least squares estimation
Z = np.zeros(z_noise_outlier.shape)
Z[:] = z_noise_outlier
Z_fl = Z.flatten()
Z = np.reshape(Z_fl, ([len(Z_fl),1]))

A_lsq_outlier = np.linalg.lstsq(X,Z)

z_lsq_outlier = np.dot(X, A_lsq_outlier)
z_lsq_outlier = np.reshape(z_lsq_outlier, z.shape)

lsq_non_robust_outlier = np.hstack((z, z_noise_outlier, z_lsq_outlier))

plt.figure()
plt.title('Non-robust estimate (corrupted by noise AND outliers)')
plt.imshow(lsq_non_robust_outlier)
plt.clim(z.min(), z.max())

plt.show()



This time the output (on the right panel) looks very bad and the obtained model is obviously far away from our ground truth. It leads us to the following conclusion: the least squares fitting is very sensitive to outliers. Outliers have a large influence on the fit because squaring the residuals magnifies the effects of these extreme data points. To minimize the influence of outliers the robust least-squares regression is required. You can find more details here on the MathWorks. Here I use the robust estimate with bisquare weights which is an iteratively reweighted least-squares algorithm. Shortly, the method starts with the previous solution, given by the non-robust estimate, and every iteration reduces the weight of high-leverage data points, which have a large effect on the least-squares fit. This procedure runs till convergence. For more details check please the ‘robustfit‘ function from the Matlab here. Let’s check what the robust estimate can do with our input data containing outliers:


############################################################
# CASE 3: data is corrupted by gaussian noise AND outliers #
#         Robust least squares method is used              #
############################################################

# robust least sqaures (starting with the least squares solution)
A_robust = A_lsq_outlier
n_robust_it = 10

# iterate till the fit converges
for robust_it in range(n_robust_it):

# compute absolute value of residuals (fit minus data)
abs_resid = abs(np.dot(X, A_robust) - Z)

# compute the scaling factor for the standardization of residuals
# using the median absolute deviation of the residuals
# 6.9460 is a tuning constant (4.685/0.6745)
abs_res_scale = 6.9460 * np.median(abs_resid)

# standardize residuals
w = abs_resid / abs_res_scale

# compute the robust bisquare weights excluding outliers
outliers = (w > 1)*1
w[ outliers.nonzero() ] = 0

good_values = (w != 0)*1

# calculate robust weights for 'good' points
# Note that if you supply your own regression weight vector,
# the final weight is the product of the robust weight and the regression weight.
tmp = 1 - np.power(w[ good_values.nonzero() ], 2)
w[ good_values.nonzero() ] = np.power(tmp, 2)

# get weighted X'es
XW = np.tile(w, (1, 3)) * X

a = np.dot(XW.T, X)
b = np.dot(XW.T, Z)

# get the least-squares solution to a linear matrix equation
A_robust = np.linalg.lstsq(a,b)

z_robust = np.dot(X, A_robust)
z_robust = np.reshape(z_robust, z.shape)

lsq_robust = np.hstack((z, z_noise_outlier, z_robust))

plt.figure()
plt.title('Robust estimate (corrupted by noise AND outliers)')
plt.imshow(lsq_robust)
plt.clim(z.min(), z.max())

plt.show()



And here is the output which looks again pretty good despite many outliers in the input data. The robust fit follows the bulk of the data and is not strongly influenced by the outliers. Awesome, isn’t it?;-) At this point I wish you a lot of fun with the robust least squares despite outliers (hope you won’t get too many);-) Also I thank Dr. Karl Pauwels from the University of Granada in Spain who inspired me with this approach.

Best wishes,
Alexey

This entry was posted in Uncategorized and tagged . Bookmark the permalink.