Back To The Basics – Sorting

Every couple of years it can be fun to revisit the basics. Given that I’m back in interview mode, I figured now was a good time to rewrite some of those sorting algorithms from sophomore year… in Go of course. You can find all the code in this post on GitHub under the sort directory here. The benchmark results can be reproduced by running the “gotest -v -bench=.” command from within the sort package directory. All of the results stem from the same input data and each algorithm is run with both sorted and unsorted data.

The easiest to implement and the most widely used sorting algorithm for small sets of data is Selection sort. It uses two loops and checks each entry for a suitable replacement.

package sort

func swap(a []int, x int, y int){
	var temp = a[x]
	a[x] = a[y]
	a[y] = temp
}

func SelectionSort(a []int) {
	for i, _ := range a {
		for j := i + 1; j < len(a); j++ {
			if a[j] < a[i] {
				swap(a, i, j)
			}
		}
	}
}

The results show that Selection sort performs substantially faster if the array is already sorted. The worst-case, average-case, and best-case running time are all quadratic which makes it a bad choice for large quantities of data.

Selection Sort: O(n^2)
sort.BenchmarkSelectionSort_Unsorted      500      4792526 ns/op
sort.BenchmarkSelectionSort_Sorted        2000      1026659 ns/op

Another simple and natural sorting algorithm is Insertion sort. Similar to Selection sort it uses two loops to sort but offers a much better best-case running time. When the data set is nearly sorted it can execute in linear time as the second loop only executes if the value to the left is greater then itself.

func InsertionSort(a []int) {
	for j := 1; j < len(a); j++ {
		var key = a[j]
		var i = j-1
		for i >= 0 && a[i] > key {
			a[i+1] = a[i]
			i = i -1
		}
		a[i+1] = key
	}
}

The results show a substantial increase over Selection sort above. Both the worst-case and average-case running times are quadratic, but the best-case is linear. Notice the benchmark time for an already sorted array.

Insertion Sort: O(n^2)
sort.BenchmarkInsertionSort_Unsorted      1000       1835507 ns/op
sort.BenchmarkInsertionSort_Sorted      200000       8698 ns/op

Quadratic sorting time just isn’t acceptable for large sets of data. To help achieve a linear sorting time several divide-and-conquer recursive algorithms exists. Quicksort is the most common, although slightly more complicated than the above quadratic algorithms. It offers a substantial performance increase when dealing with large unsorted data sets. A complete breakdown of Quicksort is outside the scope of this post, if necessary you can study it more in-depth here. Basically, Quicksort recursively sorts the data set by continually dividing the set into subsets centered around a pivot point (in this case a[r] in partition() below).

func partition(a []int, p int, r int) int {
	var x = a[r]
	var i = p - 1
	for j := p; j <= r - 1; j++ {
		if a[j] <= x {
			i = i + 1
			swap(a,i,j)
		}
	}
	swap(a,i+1,r)
	return i + 1
}

func quickSort(a []int, p int, r int) {
	if p < r {
		var q = partition(a, p, r)
		quickSort(a,p,q-1)
		quickSort(a,q+1,r)
	}
}

func QuickSort(a []int) {
	quickSort(a, 0, len(a)-1)
}

The worst-case running time is quadratic and happens when the array is already sorted. The results show a slower worst-case sorting time then Selection sort, making Quicksort a bad choice if the data set is already or nearly sorted. However, the average-case and best-case running times are linear making it a good all-around sorting algorithm for large data sets.

Quicksort: O(n log n)
sort.BenchmarkQuickSort_Unsorted      500      2836278 ns/op
sort.BenchmarkQuickSort_Sorted          500      6091908 ns/op

Despite the obvious benefits of Quicksort, it’s often a poor choice for applications requiring a consistent predictable sort time (such as in real-time embedded systems). This is because it is often not possible to know how sorted the data set is at any given time, meaning that it’s difficult to predict if the sort operation will be O(n^2) or O(n log n). In these scenarios using the Heap sort algorithm is a better choice. Although at first, it appears more complicated then Quicksort, it offers both a worst-case and average-case O(n log n) sort time.

A heap is a binary tree where the entries of the nodes can be compared with the less-than operator, meaning that the value of a node is never less than its children and each level of the tree is complete with non-complete leaves. The deepest level is as far left as possible. For a complete description of Heap sort look here.

// in Python these would be a lambda and C++ would be an inline
func parent(i int) int {
	return ((i-1)/2)
}

func left(i int) int {
	return ((2*i)+1)
}

func right(i int) int {
	return ((2*i)+2)
}

func MaxHeapify(a []int, i int, size int) {
	var left = left(i)
	var right = right(i)
	var largest = i

	if left  a[i] {
		largest = left
	}

	if right  a[largest] {
		largest = right
	}

	if largest != i {
		swap(a,i,largest)
		MaxHeapify(a, largest, size)
	}
}

func BuildMaxHeap(a []int) {
	for i:=(len(a)/2); i >= 0; i-- {
		MaxHeapify(a,i,len(a))
	}
}

func HeapSort(a []int) {
	BuildMaxHeap(a)
	var size = len(a)
	for i:=len(a)-1; i >=1; i-- {
		swap(a,0,i)
		size = size - 1
		MaxHeapify(a,0, size)
	}
}

Heap Sort: O(n log n)
sort.BenchmarkHeapSort_Unsorted      10000       233454 ns/op
sort.BenchmarkHeapSort_Sorted          10000       262941 ns/op