Friday, November 19, 2010

Faster median calculation and generic rank selections in MatLab via nth_element

Update (2011-09-03): I've added new functions to the release that edit the arrays in-place. This is not good Matlab style but gives significant speedups for the very performance conscious. It appears it may be useful for me to add a multithreaded version as well, so this may come in an upcoming release.

C++ std::nth_element is a standard specification for an efficient rank selection algorithm. This algorithm can be used to solve a number of rank selection problems efficiently.

One example is median finding. A naive median finding algorithm, as implemented in MatLab (at least up to R2010a) is to sort the entire list of values and then just take the value in the middle position. But sorting the entire list is overkill; it is more efficient to do a sort of partial sort just to the point where the desired value is in the correct position. The version of this typically implemented in nth_element is known as quickselect or Hoare's Selection Algorithm. See Wikipedia: Selection algorithm for more details.

I find that a MatLab fast_median function based on nth_element runs about twice as fast as the native MatLab median function based on sort. Theoretically, the average complexity of quickselect is O(n), while the best performance complexity of the sort based method is O(n log n).

Finding median values is central to robust statistics, for example calculating the Median Absolute Deviation.

If you are interested, my MatLab wrapping of nth_element and an example implementation of fast_median are available at MatLab Central File Exchange.