RANSAC or “RANdom SAmple Consensus” is an iterative method to estimate parameters of a mathematical model from a set of observed data which contains outliers. It is one of classical techniques in computer vision. My motivation for this post has been triggered by a fact that Python doesn’t have a RANSAC implementation so far.
The single one I have found on the Scipy Cookbook is actually not an original RANSAC implementation. Looking on that page at the figure above we see a nicely fitted model describing the data containing outliers. However, going through the implementation you will see that it judges the model using the squared error between the model and testing points. According to this, testing points satisfying the model are added to a set of previously selected points (considered for the modeling) and a new model is estimated using all these points altogether. The final model is tested against all selected points using the squared error. It’s not a bad idea, but there are two problems. First, it is not an initial RANSAC algorithm which judges the model by calculating distances from testing points to the model and classifies those ones as outliers whose distances are higher than a pre-defined threshold. Second, the present implementation contains some inconsistencies classifying points being far away from the model as inliers (see points marked with blue crosses in the figure below).
Therefore I’ve decided to implement the RANSAC algorithm in Python by myself. Before we will have a look at the Python code there are some details about the method and its implementation. RANSAC is a non-deterministic algorithm in a sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. More details about the RANSAC algorithm you can find here and on external links in the bottom of the page. The introduced implementation performs as follows:
- The input to the RANSAC algorithm is a set of data points which contains outliers. The goal is to find a model describing inliers from the given data set.
- RANSAC achieves its goal by iteratively selecting a random subset of the original data. In case of a line in a two-dimensional plane two points are sufficient to fit a model.
- The randomly chosen points are considered as hypothetical inliers and all other points are then tested against the fitted model. Those points that fit the estimated model well are considered as a part of the consensus set.
- The estimated model is reasonably good if sufficiently many points (due to the pre-defined threshold) have been classified as inliers (a part of the consensus set).
- Optionally the model can be improved by reestimating it using all members of the consensus set. This step is omitted in the current implementation.
- This procedure is repeated a fixed number of times, every iteration producing a new model. The method stops once either a model with an appropriate consensus set size is found, or after a pre-defined number of iterations selecting a model with less outliers as the best model.
Now we are ready to have a look at the Python code. First of all we set all RANSAC parameters and generate a sample set:
import numpy as np import scipy import matplotlib.pyplot as plt import math import sys # Ransac parameters ransac_iterations = 20 # number of iterations ransac_threshold = 3 # threshold ransac_ratio = 0.6 # ratio of inliers required to assert # that a model fits well to data # generate sparse input data n_samples = 500 # number of input points outliers_ratio = 0.4 # ratio of outliers n_inputs = 1 n_outputs = 1 # generate samples x = 30*np.random.random((n_samples,n_inputs) ) # generate line's slope (called here perfect fit) perfect_fit = 0.5*np.random.normal(size=(n_inputs,n_outputs) ) # compute output y = scipy.dot(x,perfect_fit)
Since almost all measurements usually contain noise and outliers, we also add some. We will try the method with both Gaussian and non Gaussian outliers, the difference is just one line of code.
# add a little gaussian noise x_noise = x + np.random.normal(size=x.shape) y_noise = y + np.random.normal(size=y.shape) # add some outliers to the point-set n_outliers = outliers_ratio*n_samples indices = np.arange(x_noise.shape) np.random.shuffle(indices) outlier_indices = indices[:n_outliers] x_noise[outlier_indices] = 30*np.random.random(size=(n_outliers,n_inputs)) # gaussian outliers y_noise[outlier_indices] = 30*np.random.normal(size=(n_outliers,n_outputs)) # non-gaussian outliers (only on one side) #y_noise[outlier_indices] = 30*(np.random.normal(size=(n_outliers,n_outputs))**2)
Before we will go through the RANSAC routine, we implement two short functions essential for the model estimation and its judgement. The first function estimates parameters of the line model for two randomly selected points, where is line’s slope and its y-intercept. For two randomly chosen points and and values are found as: , and .
def find_line_model(points): """ find a line model for the given points :param points selected points for model fitting :return line model """ # [WARNING] vertical and horizontal lines should be treated differently # here we just add some noise to avoid division by zero # find a line model for these points m = (points[1,1] - points[0,1]) / (points[1,0] - points[0,0] + sys.float_info.epsilon) # slope (gradient) of the line c = points[1,1] - m * points[1,0] # y-intercept of the line return m, c
The next function finds an intercept point of the normal from point to the estimated line model. Point is taken from the testing set used for the model evaluation.
def find_intercept_point(m, c, x0, y0): """ find an intercept point of the line model with a normal from point (x0,y0) to it :param m slope of the line model :param c y-intercept of the line model :param x0 point's x coordinate :param y0 point's y coordinate :return intercept point """ # intersection point with the model x = (x0 + m*y0 - m*c)/(1 + m**2) y = (m*x0 + (m**2)*y0 - (m**2)*c)/(1 + m**2) + c return x, y
Now we have everything to start an iterative model search:
data = np.hstack( (x_noise,y_noise) ) ratio = 0. model_m = 0. model_c = 0. # perform RANSAC iterations for it in range(ransac_iterations): # pick up two random points n = 2 all_indices = np.arange(x_noise.shape) np.random.shuffle(all_indices) indices_1 = all_indices[:n] indices_2 = all_indices[n:] maybe_points = data[indices_1,:] test_points = data[indices_2,:] # find a line model for these points m, c = find_line_model(maybe_points) x_list =  y_list =  num = 0 # find orthogonal lines to the model for all testing points for ind in range(test_points.shape): x0 = test_points[ind,0] y0 = test_points[ind,1] # find an intercept point of the model with a normal from point (x0,y0) x1, y1 = find_intercept_point(m, c, x0, y0) # distance from point to the model dist = math.sqrt((x1 - x0)**2 + (y1 - y0)**2) # check whether it's an inlier or not if dist < ransac_threshold: x_list.append(x0) y_list.append(y0) num += 1 x_inliers = np.array(x_list) y_inliers = np.array(y_list) # in case a new model is better - cache it if num/float(n_samples) > ratio: ratio = num/float(n_samples) model_m = m model_c = c print ' inlier ratio = ', num/float(n_samples) print ' model_m = ', model_m print ' model_c = ', model_c # plot the current step ransac_plot(it, x_noise,y_noise, m, c, False, x_inliers, y_inliers, maybe_points) # we are done in case we have enough inliers if num > n_samples*ransac_ratio: print 'The model is found !' break # plot the final model ransac_plot(0, x_noise,y_noise, model_m, model_c, True) print '\nFinal model:\n' print ' ratio = ', ratio print ' model_m = ', model_m print ' model_c = ', model_c
Of course this code might be optimized a bit, but this version is quite clear to catch up the main idea. You can also pick more than two points for the model fitting, but in this case you need to reimplement find_line_model() function, e.g. using the linear least squares for estimation of and values. Below is a function used in the code for plotting input data, intermediate steps and the final result:
def ransac_plot(n, x, y, m, c, final=False, x_in=(), y_in=(), points=()): """ plot the current RANSAC step :param n iteration :param points picked up points for modeling :param x samples x :param y samples y :param m slope of the line model :param c shift of the line model :param x_in inliers x :param y_in inliers y """ fname = "output/figure_" + str(n) + ".png" line_width = 1. line_color = '#0080ff' title = 'iteration ' + str(n) if final: fname = "output/final.png" line_width = 3. line_color = '#ff0000' title = 'final solution' plt.figure("Ransac", figsize=(15., 15.)) # grid for the plot grid = [min(x) - 10, max(x) + 10, min(y) - 20, max(y) + 20] plt.axis(grid) # put grid on the plot plt.grid(b=True, which='major', color='0.75', linestyle='--') plt.xticks([i for i in range(min(x) - 10, max(x) + 10, 5)]) plt.yticks([i for i in range(min(y) - 20, max(y) + 20, 10)]) # plot input points plt.plot(x[:,0], y[:,0], marker='o', label='Input points', color='#00cc00', linestyle='None', alpha=0.4) # draw the current model plt.plot(x, m*x + c, 'r', label='Line model', color=line_color, linewidth=line_width) # draw inliers if not final: plt.plot(x_in, y_in, marker='o', label='Inliers', linestyle='None', color='#ff0000', alpha=0.6) # draw points picked up for the modeling if not final: plt.plot(points[:,0], points[:,1], marker='o', label='Picked points', color='#0000cc', linestyle='None', alpha=0.6) plt.title(title) plt.legend() plt.savefig(fname) plt.close()
Let’s have a look at outputs of the RANSAC algorithm: some intermediate steps and the final result. Green points are samples from the sample set, blue points are points selected randomly for the model fitting, red points are those classified as a part of the consensus set (inliers). The fitted model is depicted by the blue line. The red line on the last plot shows the final solution.
The next run shows how the RANSAC algorithm performs in case of non Gaussian outliers:
# non Gaussian outliers (only on one side) y_noise[outlier_indices] = 30*\ (np.random.normal(size=(n_outliers,n_outputs))**2)
The most interesting parameters to play with are the ratio of outliers in the sample set and the number of iterations. In case of a low outliers ratio it might make sense to pick more points for the modelling, especially for polynomials of a higher degree (e.g. quadratic or cubic). It will increase a probability of getting a good model. But this is not the case when a number of outliers is sufficiently high, since a probability of getting a good model decreases dramatically. Actually it’s a small disadvantage of RANSAC that all parameters need to be adopted to every specific task. But fitting various models and playing with parameters you get a feeling for all RANSAC parameters very quickly.
Enjoy RANSAC in Python,