Robust linear model estimation using RANSAC – Python implementation

RANSAC, or “RANdom SAmple Consensus”, is an iterative method to estimate the parameters of a mathematical model from a set of observed data that contains outliers. It is one of the classical techniques in computer vision. My motivation for this post was triggered by the fact that Python does not have a proper RANSAC implementation so far.

[Figure: line model fitted to data with outliers, from the Scipy Cookbook RANSAC recipe]

The single implementation I have found, on the Scipy Cookbook, is actually not the original RANSAC algorithm. Looking at the figure above, taken from that page, we see a nicely fitted model describing data that contains outliers. However, going through the implementation you will see that it judges the model using the squared error between the model and the testing points. Testing points satisfying the model are added to the set of previously selected points (those used for fitting), and a new model is estimated from all of these points together. The final model is then evaluated against all selected points using the squared error. That is not a bad idea, but there are two problems. First, it is not the original RANSAC algorithm, which judges the model by computing distances from the testing points to the model and classifies as outliers those points whose distance exceeds a pre-defined threshold. Second, that implementation shows some inconsistencies, classifying points that lie far away from the model as inliers (see the points marked with blue crosses in the figure below).

Therefore I have decided to implement the RANSAC algorithm in Python myself. Before we look at the Python code, here are some details about the method and its implementation. RANSAC is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, and this probability increases as more iterations are allowed. You can find more details about the RANSAC algorithm here and in the external links at the bottom of the page. The introduced implementation performs as follows:

  • The input to the RANSAC algorithm is a set of data points which contains outliers. The goal is to find a model describing the inliers of the given data set.
  • RANSAC achieves its goal by iteratively selecting a random subset of the original data. In the case of a line in a two-dimensional plane, two points are sufficient to fit the model.
  • The randomly chosen points are considered hypothetical inliers, and all other points are then tested against the fitted model. Those points that fit the estimated model well are considered part of the consensus set.
  • The estimated model is reasonably good if sufficiently many points (with respect to a pre-defined ratio) have been classified as inliers, i.e. if the consensus set is large enough.
  • Optionally, the model can be improved by re-estimating it using all members of the consensus set. This step is omitted in the current implementation (a minimal sketch of it is shown right after this list).
  • This procedure is repeated a fixed number of times, every iteration producing a new model. The method stops once either a model with a sufficiently large consensus set is found, or after a pre-defined number of iterations, selecting the model with the largest consensus set (i.e. the fewest outliers) as the best one.
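
As a minimal sketch of that optional refinement step (my addition, not part of the implementation below), one could simply re-fit the line to all members of the consensus set with least squares, e.g. using NumPy's polyfit:

import numpy as np

def refine_line_model(x_inliers, y_inliers):
    # hypothetical helper, not used in the code below:
    # re-estimate the line y = m*x + c from the whole consensus set
    # with a degree-1 least-squares polynomial fit
    m, c = np.polyfit(x_inliers, y_inliers, 1)
    return m, c

In the loop below this would be applied to the x_inliers and y_inliers arrays of the best iteration before accepting the model.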

Now we are ready to have a look at the Python code. First of all, we set all RANSAC parameters and generate a sample set:


import numpy as np
import matplotlib.pyplot as plt
import math
import sys

# Ransac parameters
ransac_iterations = 20  # number of iterations
ransac_threshold = 3    # inlier distance threshold
ransac_ratio = 0.6      # ratio of inliers required to assert
                        # that the model fits the data well

# generate sparse input data
n_samples = 500               # number of input points
outliers_ratio = 0.4          # ratio of outliers

n_inputs = 1
n_outputs = 1

# generate samples
x = 30*np.random.random((n_samples,n_inputs) )

# generate line's slope (called here perfect fit)
perfect_fit = 0.5*np.random.normal(size=(n_inputs,n_outputs) )

# compute output
y = np.dot(x, perfect_fit)

Since almost all real measurements contain noise and outliers, we add some as well. We will try the method with both Gaussian and non-Gaussian outliers; the difference is just one line of code.


# add a little Gaussian noise
x_noise = x + np.random.normal(size=x.shape)
y_noise = y + np.random.normal(size=y.shape)

# add some outliers to the point set
n_outliers = int(outliers_ratio*n_samples)   # integer number of outliers, needed for indexing
indices = np.arange(x_noise.shape[0])
np.random.shuffle(indices)
outlier_indices = indices[:n_outliers]

x_noise[outlier_indices] = 30*np.random.random(size=(n_outliers,n_inputs))

# Gaussian outliers
y_noise[outlier_indices] = 30*np.random.normal(size=(n_outliers, n_outputs))

# non-Gaussian outliers (only on one side)
#y_noise[outlier_indices] = 30*(np.random.normal(size=(n_outliers,n_outputs))**2)

Before we go through the RANSAC routine, we implement two short functions essential for the model estimation and its evaluation. The first function estimates the parameters of the line model y = mx + c from two randomly selected points, where m is the line's slope and c its y-intercept. For two randomly chosen points (x_1, y_1) and (x_2, y_2), m and c are found as m = (y_2 - y_1) / (x_2 - x_1) and c = y_2 - m*x_2.


def find_line_model(points):
    """ find a line model for the given points
    :param points selected points for model fitting
    :return line model
    """

    # [WARNING] vertical and horizontal lines should be treated differently
    #           here we just add some noise to avoid division by zero

    # find a line model for these points
    m = (points[1,1] - points[0,1]) / (points[1,0] - points[0,0] + sys.float_info.epsilon)  # slope (gradient) of the line
    c = points[1,1] - m * points[1,0]                                     # y-intercept of the line

    return m, c
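
A quick sanity check of this function (my addition; numpy is already imported above): the points (1, 2) and (3, 6) lie on the line y = 2x, so we expect a slope of about 2 and an intercept of about 0:

pts = np.array([[1., 2.],
                [3., 6.]])
m, c = find_line_model(pts)
print(m, c)   # approximately 2.0 and 0.0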

The next function finds the intercept point of the normal dropped from a point (x_0, y_0) onto the estimated line model. The point (x_0, y_0) is taken from the testing set used for the model evaluation.


def find_intercept_point(m, c, x0, y0):
    """ find an intercept point of the line model with
        a normal from point (x0,y0) to it
    :param m slope of the line model
    :param c y-intercept of the line model
    :param x0 point's x coordinate
    :param y0 point's y coordinate
    :return intercept point
    """

    # intersection point with the model
    x = (x0 + m*y0 - m*c)/(1 + m**2)
    y = (m*x0 + (m**2)*y0 - (m**2)*c)/(1 + m**2) + c

    return x, y
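
Just as a quick verification (my addition, not in the original post): the distance that the main loop computes from this intercept point can also be obtained directly from the closed-form point-to-line distance |m*x0 - y0 + c| / sqrt(1 + m^2). With the imports and the function above, both ways should agree:

m, c = 2.0, 1.0     # some line y = 2x + 1
x0, y0 = 4.0, 0.0   # some test point

# distance via the intercept (foot) point
xi, yi = find_intercept_point(m, c, x0, y0)
dist_foot = math.sqrt((xi - x0)**2 + (yi - y0)**2)

# distance via the closed-form formula
dist_formula = abs(m*x0 - y0 + c) / math.sqrt(1 + m**2)

print(dist_foot, dist_formula)   # both around 4.02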

Now we have everything to start an iterative model search:


data = np.hstack( (x_noise,y_noise) )

ratio = 0.
model_m = 0.
model_c = 0.

# perform RANSAC iterations
for it in range(ransac_iterations):

    # pick up two random points
    n = 2

    all_indices = np.arange(x_noise.shape[0])
    np.random.shuffle(all_indices)

    indices_1 = all_indices[:n]
    indices_2 = all_indices[n:]

    maybe_points = data[indices_1,:]
    test_points = data[indices_2,:]

    # find a line model for these points
    m, c = find_line_model(maybe_points)

    x_list = []
    y_list = []
    num = 0

    # find orthogonal lines to the model for all testing points
    for ind in range(test_points.shape[0]):

        x0 = test_points[ind,0]
        y0 = test_points[ind,1]

        # find an intercept point of the model with a normal from point (x0,y0)
        x1, y1 = find_intercept_point(m, c, x0, y0)

        # distance from point to the model
        dist = math.sqrt((x1 - x0)**2 + (y1 - y0)**2)

        # check whether it's an inlier or not
        if dist < ransac_threshold:
            x_list.append(x0)
            y_list.append(y0)
            num += 1

    x_inliers = np.array(x_list)
    y_inliers = np.array(y_list)

    # in case a new model is better - cache it
    if num/float(n_samples) > ratio:
        ratio = num/float(n_samples)
        model_m = m
        model_c = c

    print('  inlier ratio = ', num/float(n_samples))
    print('  model_m = ', model_m)
    print('  model_c = ', model_c)

    # plot the current step
    ransac_plot(it, x_noise,y_noise, m, c, False, x_inliers, y_inliers, maybe_points)

    # we are done in case we have enough inliers
    if num > n_samples*ransac_ratio:
        print('The model is found!')
        break

# plot the final model
ransac_plot(0, x_noise,y_noise, model_m, model_c, True)

print('\nFinal model:\n')
print('  ratio = ', ratio)
print('  model_m = ', model_m)
print('  model_c = ', model_c)

Of course this code could be optimized a bit, but this version makes the main idea easy to follow. You can also pick more than two points for the model fitting, but in that case you need to reimplement the find_line_model() function, e.g. using linear least squares to estimate the m and c values. Below is the function used in the code for plotting the input data, the intermediate steps and the final result.
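
As an illustration of that idea (my addition, a hedged sketch rather than part of the original code), such a least-squares variant could look as follows, using np.linalg.lstsq on a simple design matrix:

def find_line_model_lsq(points):
    """ fit the line y = m*x + c to an arbitrary number of points
        using linear least squares (hypothetical replacement for find_line_model)
    :param points array of shape (n, 2) holding the selected points
    :return m, c of the fitted line
    """
    A = np.column_stack((points[:, 0], np.ones(points.shape[0])))  # design matrix [x, 1]
    (m, c), *_ = np.linalg.lstsq(A, points[:, 1], rcond=None)
    return m, c

With such a function in place you would simply pick n > 2 indices per iteration instead of two.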


def ransac_plot(n, x, y, m, c, final=False, x_in=(), y_in=(), points=()):
    """ plot the current RANSAC step
    :param n      iteration
    :param points picked up points for modeling
    :param x      samples x
    :param y      samples y
    :param m      slope of the line model
    :param c      shift of the line model
    :param x_in   inliers x
    :param y_in   inliers y
    """

    fname = "output/figure_" + str(n) + ".png"
    line_width = 1.
    line_color = '#0080ff'
    title = 'iteration ' + str(n)

    if final:
        fname = "output/final.png"
        line_width = 3.
        line_color = '#ff0000'
        title = 'final solution'

    plt.figure("Ransac", figsize=(15., 15.))

    # axis limits for the plot
    grid = [x.min() - 10, x.max() + 10, y.min() - 20, y.max() + 20]
    plt.axis(grid)

    # put a grid on the plot
    plt.grid(True, which='major', color='0.75', linestyle='--')
    plt.xticks(range(int(x.min()) - 10, int(x.max()) + 10, 5))
    plt.yticks(range(int(y.min()) - 20, int(y.max()) + 20, 10))

    # plot input points
    plt.plot(x[:,0], y[:,0], marker='o', label='Input points', color='#00cc00', linestyle='None', alpha=0.4)

    # draw the current model
    plt.plot(x, m*x + c, label='Line model', color=line_color, linewidth=line_width)

    # draw inliers
    if not final:
        plt.plot(x_in, y_in, marker='o', label='Inliers', linestyle='None', color='#ff0000', alpha=0.6)

    # draw points picked up for the modeling
    if not final:
        plt.plot(points[:,0], points[:,1], marker='o', label='Picked points', color='#0000cc', linestyle='None', alpha=0.6)

    plt.title(title)
    plt.legend()
    plt.savefig(fname)
    plt.close()

Let’s have a look at the outputs of the RANSAC algorithm: some intermediate steps and the final result. Green points are samples from the input set, blue points are the points randomly selected for the model fitting, and red points are those classified as part of the consensus set (inliers). The fitted model is depicted by the blue line; the red line on the last plot shows the final solution.




The next run shows how the RANSAC algorithm performs in the case of non-Gaussian outliers:

# non Gaussian outliers (only on one side)

y_noise[outlier_indices] = 30*(np.random.normal(size=(n_outliers, n_outputs))**2)





The most interesting parameters to play with are the ratio of outliers in the sample set and the number of iterations. In the case of a low outlier ratio it might make sense to pick more points for the modelling, especially for polynomials of a higher degree (e.g. quadratic or cubic); this increases the probability of getting a good model. This is not the case when the number of outliers is high, since the probability of drawing an outlier-free sample then decreases dramatically. A small disadvantage of RANSAC is indeed that all parameters need to be adapted to each specific task, but by fitting various models and playing with the parameters you quickly get a feeling for them.
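
A side note that is not covered above: standard RANSAC theory gives a rule of thumb for the number of iterations. To draw at least one outlier-free sample of size s with probability p, given an inlier ratio w, one needs roughly N = log(1 - p) / log(1 - w^s) iterations. A small sketch (my addition, using the math module imported earlier):

def ransac_iterations_needed(p, w, s):
    # number of iterations needed to draw at least one outlier-free
    # sample of size s with probability p, given an inlier ratio w
    return math.ceil(math.log(1 - p) / math.log(1 - w**s))

# e.g. 60% inliers, samples of two points, 99% confidence
print(ransac_iterations_needed(0.99, 0.6, 2))   # about 11 iterations

This suggests that the 20 iterations chosen above are plenty for the setup with 40% outliers, assuming the distance threshold is sensible.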

Enjoy RANSAC in Python,
Alexey


2 Responses to Robust linear model estimation using RANSAC – Python implementation

  1. P says:

    Thank you for this very informative post!!
    I found a little mistake (more obvious for a small sample count):
    In line: if num/float(n_samples) > ratio:
    You would have to add “2” to the ‘num’ variable or subtract “2” from ‘n_samples’, because the two random “maybe” points will not be counted as inliers otherwise, which again makes the ratio wrong.

    e.g. if you use 5 input samples – 2 are randomly selected – and the line fits all other 3 as inliers, the equation would result in:
    3 / 5 … but really should be 5 / 5

    Hope this helps any future reader.
    Best regards,
    P

  2. This is very interesting. I am looking for a solution to match 2D points in two lists of different lengths, i.e. I need to match a point from one list to a point in the other list. My points are features in images, and I need to match these features. Do you have any idea how to modify your code for this purpose?
