Every couple of years it can be fun to revisit the basics. Given that I’m back in interview mode, I figured now was a good time to rewrite some of those sorting algorithms from sophomore year… in Go of course. You can find all the code in this post on GitHub under the sort directory here. The benchmark results can be reproduced by running the “gotest -v -bench=.” command from within the sort package directory. All of the results stem from the same input data and each algorithm is run with both sorted and unsorted data.
The easiest to implement and the most widely used sorting algorithm for small sets of data is Selection sort. It uses two loops and checks each entry for a suitable replacement.
package sort
func swap(a []int, x int, y int){
var temp = a[x]
a[x] = a[y]
a[y] = temp
}
func SelectionSort(a []int) {
for i, _ := range a {
for j := i + 1; j < len(a); j++ {
if a[j] < a[i] {
swap(a, i, j)
}
}
}
}
The results show that Selection sort performs substantially faster if the array is already sorted. The worst-case, average-case, and best-case running time are all quadratic which makes it a bad choice for large quantities of data.
Selection Sort: O(n^2)
sort.BenchmarkSelectionSort_Unsorted 500 4792526 ns/op
sort.BenchmarkSelectionSort_Sorted 2000 1026659 ns/op
Another simple and natural sorting algorithm is Insertion sort. Similar to Selection sort it uses two loops to sort but offers a much better best-case running time. When the data set is nearly sorted it can execute in linear time as the second loop only executes if the value to the left is greater then itself.
func InsertionSort(a []int) {
for j := 1; j < len(a); j++ {
var key = a[j]
var i = j-1
for i >= 0 && a[i] > key {
a[i+1] = a[i]
i = i -1
}
a[i+1] = key
}
}
The results show a substantial increase over Selection sort above. Both the worst-case and average-case running times are quadratic, but the best-case is linear. Notice the benchmark time for an already sorted array.
Insertion Sort: O(n^2)
sort.BenchmarkInsertionSort_Unsorted 1000 1835507 ns/op
sort.BenchmarkInsertionSort_Sorted 200000 8698 ns/op
Quadratic sorting time just isn’t acceptable for large sets of data. To help achieve a linear sorting time several divide-and-conquer recursive algorithms exists. Quicksort is the most common, although slightly more complicated than the above quadratic algorithms. It offers a substantial performance increase when dealing with large unsorted data sets. A complete breakdown of Quicksort is outside the scope of this post, if necessary you can study it more in-depth here. Basically, Quicksort recursively sorts the data set by continually dividing the set into subsets centered around a pivot point (in this case a[r] in partition() below).
func partition(a []int, p int, r int) int {
var x = a[r]
var i = p - 1
for j := p; j <= r - 1; j++ {
if a[j] <= x {
i = i + 1
swap(a,i,j)
}
}
swap(a,i+1,r)
return i + 1
}
func quickSort(a []int, p int, r int) {
if p < r {
var q = partition(a, p, r)
quickSort(a,p,q-1)
quickSort(a,q+1,r)
}
}
func QuickSort(a []int) {
quickSort(a, 0, len(a)-1)
}
The worst-case running time is quadratic and happens when the array is already sorted. The results show a slower worst-case sorting time then Selection sort, making Quicksort a bad choice if the data set is already or nearly sorted. However, the average-case and best-case running times are linear making it a good all-around sorting algorithm for large data sets.
Quicksort: O(n log n)
sort.BenchmarkQuickSort_Unsorted 500 2836278 ns/op
sort.BenchmarkQuickSort_Sorted 500 6091908 ns/op
Despite the obvious benefits of Quicksort, it’s often a poor choice for applications requiring a consistent predictable sort time (such as in real-time embedded systems). This is because it is often not possible to know how sorted the data set is at any given time, meaning that it’s difficult to predict if the sort operation will be O(n^2) or O(n log n). In these scenarios using the Heap sort algorithm is a better choice. Although at first, it appears more complicated then Quicksort, it offers both a worst-case and average-case O(n log n) sort time.
A heap is a binary tree where the entries of the nodes can be compared with the less-than operator, meaning that the value of a node is never less than its children and each level of the tree is complete with non-complete leaves. The deepest level is as far left as possible. For a complete description of Heap sort look here.
// in Python these would be a lambda and C++ would be an inline
func parent(i int) int {
return ((i-1)/2)
}
func left(i int) int {
return ((2*i)+1)
}
func right(i int) int {
return ((2*i)+2)
}
func MaxHeapify(a []int, i int, size int) {
var left = left(i)
var right = right(i)
var largest = i
if left a[i] {
largest = left
}
if right a[largest] {
largest = right
}
if largest != i {
swap(a,i,largest)
MaxHeapify(a, largest, size)
}
}
func BuildMaxHeap(a []int) {
for i:=(len(a)/2); i >= 0; i-- {
MaxHeapify(a,i,len(a))
}
}
func HeapSort(a []int) {
BuildMaxHeap(a)
var size = len(a)
for i:=len(a)-1; i >=1; i-- {
swap(a,0,i)
size = size - 1
MaxHeapify(a,0, size)
}
}
Heap Sort: O(n log n)
sort.BenchmarkHeapSort_Unsorted 10000 233454 ns/op
sort.BenchmarkHeapSort_Sorted 10000 262941 ns/op