Tag Archives: Go

Back To The Basics – Sorting

Every couple of years it can be fun to revisit the basics. Given that I’m back in interview mode, I figured now was a good time to rewrite some of those sorting algorithms from sophomore year… in Go of course. You can find all the code in this post on GitHub under the sort directory here. The benchmark results can be reproduced by running the “gotest -v -bench=.” command from within the sort package directory. All of the results stem from the same input data and each algorithm is run with both sorted and unsorted data.

The easiest to implement and the most widely used sorting algorithm for small sets of data is Selection sort. It uses two loops and checks each entry for a suitable replacement.

package sort

func swap(a []int, x int, y int){
	var temp = a[x]
	a[x] = a[y]
	a[y] = temp
}

func SelectionSort(a []int) {
	for i, _ := range a {
		for j := i + 1; j < len(a); j++ {
			if a[j] < a[i] {
				swap(a, i, j)
			}
		}
	}
}

The results show that Selection sort performs substantially faster if the array is already sorted. The worst-case, average-case, and best-case running time are all quadratic which makes it a bad choice for large quantities of data.

Selection Sort: O(n^2)
sort.BenchmarkSelectionSort_Unsorted      500      4792526 ns/op
sort.BenchmarkSelectionSort_Sorted        2000      1026659 ns/op

Another simple and natural sorting algorithm is Insertion sort. Similar to Selection sort it uses two loops to sort but offers a much better best-case running time. When the data set is nearly sorted it can execute in linear time as the second loop only executes if the value to the left is greater then itself.

func InsertionSort(a []int) {
	for j := 1; j &lt; len(a); j++ {
		var key = a[j]
		var i = j-1
		for i >= 0 && a[i] > key {
			a[i+1] = a[i]
			i</span> = i -1
		}
		a[i+1] = key
	}
}

The results show a substantial increase over Selection sort above. Both the worst-case and average-case running times are quadratic, but the best-case is linear. Notice the benchmark time for an already sorted array.

Insertion Sort: O(n^2)
sort.BenchmarkInsertionSort_Unsorted      1000       1835507 ns/op
sort.BenchmarkInsertionSort_Sorted      200000       8698 ns/op

Quadratic sorting time just isn’t acceptable for large sets of data. To help achieve a linear sorting time several divide-and-conquer recursive algorithms exists. Quicksort is the most common, although slightly more complicated than the above quadratic algorithms. It offers a substantial performance increase when dealing with large unsorted data sets. A complete breakdown of Quicksort is outside the scope of this post, if necessary you can study it more in depth here. Basically, Quicksort recursively sorts the data set by continually dividing the set into subsets centered around a pivot point (in this case a[r] in partition() below).

func partition(a []int, p int, r int) int {
	var x = a[r]
	var i = p - 1
	for j := p; j <= r - 1; j++ {
		if a[j] <= x {
			i = i + 1
			swap(a,i,j)
		}
	}
	swap(a,i+1,r)
	return i + 1
}

func quickSort(a []int, p int, r int) {
	if p < r {
		var q = partition(a, p, r)
		quickSort(a,p,q-1)
		quickSort(a,q+1,r)
	}
}

func QuickSort(a []int) {
	quickSort(a, 0, len(a)-1)
}

The worst-case running time is quadratic and happens when the array is already sorted. The results show a slower worst-case sorting time then Selection sort, making Quicksort a bad choice if the data set is already or nearly sorted. However, the average-case and best-case running times are linear making it a good all around sorting algorithm for large data sets.

Quicksort: O(n log n)
sort.BenchmarkQuickSort_Unsorted      500      2836278 ns/op
sort.BenchmarkQuickSort_Sorted          500      6091908 ns/op

Despite the obvious benefits of Quicksort, it’s often a poor choice for applications requiring a consistent predictable sort time (such as in real-time embedded systems). This is because it is often not possible to know how sorted the data set is at any given time, meaning that it’s difficult to predict if the sort operation will be O(n^2) or O(n log n). In these scenarios using the Heap sort algorithm is a better choice. Although at first it appears more complicated then Quicksort, it offers both a worst-case and average-case O(n log n) sort time.

A heap is a binary tree where the entries of the nodes can be compared with the less-than operator, meaning that the value of a node is never less then its children and each level of the tree is complete with non-complete leaves. The deepest level are as far left as possible. For a complete description of Heap sort look here.

// in Python these would be a lambda and C++ would be an inline
func parent(i int) int {
	return ((i-1)/2)
}

func left(i int) int {
	return ((2*i)+1)
}

func right(i int) int {
	return ((2*i)+2)
}

func MaxHeapify(a []int, i int, size int) {
	var left = left(i)
	var right = right(i)
	var largest = i

	if left  a[i] {
		largest = left
	}

	if right  a[largest] {
		largest = right
	}

	if largest != i {
		swap(a,i,largest)
		MaxHeapify(a, largest, size)
	}
}

func BuildMaxHeap(a []int) {
	for i:=(len(a)/2); i >= 0; i-- {
		MaxHeapify(a,i,len(a))
	}
}

func HeapSort(a []int) {
	BuildMaxHeap(a)
	var size = len(a)
	for i:=len(a)-1; i >=1; i-- {
		swap(a,0,i)
		size = size - 1
		MaxHeapify(a,0, size)
	}
}

Heap Sort: O(n log n)
sort.BenchmarkHeapSort_Unsorted      10000       233454 ns/op
sort.BenchmarkHeapSort_Sorted          10000       262941 ns/op

Unit Testing & Benchmarking – Google Go

Writing unit level tests and benchmarks are a critical aspect to ensuring software remaining reliable and performing as expected as a project grows. With many languages this requires the implementer to go out of their way to create a lot of special code, code that rarely gets updated as the unit properties change. Go makes this super easy and I am very impressed with the built in support for both of these operations. As an example, let’s look at a unit level test and benchmark for two different rot13 methods. Below are two methods for calculating a rot13 translation, one uses a look up table and the other calculates each individual value using the mod operator.

Code:

// rot13(rot13(x)) = x
// ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
// NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm

var table = map[byte]byte{'A':'N','B':'O','C':'P','D':'Q','E':'R','F':'S','G':'T','H':'U','I':'V','J':'W',
'K':'X','L':'Y','M':'Z','N':'A','O':'B','P':'C','Q':'D','R':'E','S':'F','T':'G','U':'H','V':
'I','W':'J','X':'K','Y':'L','Z':'M','a':'n','b':'o','c':'p','d':'q','e':'r','f':'s','g':'t',
'h':'u','i':'v','j':'w','k':'x','l':'y','m':'z','n':'a','o':'b','p':'c','q':'d','r':'e','s':
'f','t':'g','u':'h','v':'i','w':'j','x':'k','y':'l','z':'m'} 

func Rot13Table(input []byte) {
	for i, c := range input {
		input[i] = table[c]
	}
}

func Rot13Mod(input []byte) {
        for i, c := range input {
                base := byte(0)
                if c &gt;= 'a' &amp;&amp; c &lt;= 'z' {
                        base = 'a'
                }else if c &gt;= 'A' &amp;&amp; c &lt;= 'Z' {
                        base = 'A'
                }
                if base != 0 {
                        input[i] = byte((c - base + 13) % 26 + base)
                }
        }
}

Using Go’s built in approach for writing tests and benchmarks we can write a *_test.go file for ensuring our methods are producing the correct result and for profiling their performance. In Go, any method that starts with ‘Test’ or ‘Benchmark’ in a _test.go file are automatically compiled and executed when the ‘gotest’ command is run.

Code:

import (
	&quot;testing&quot;
	&quot;fmt&quot;
)

func BenchmarkRot13Mod(b *testing.B){
	translate := &quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz&quot;

	temp := []byte(translate)
	for i:=0; i &lt; b.N; i++ {
		Rot13Mod(temp)
	}
}

func BenchmarkRot13Table(b *testing.B){
	translate := &quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz&quot;

	temp := []byte(translate)
	for i:=0; i &lt; b.N; i++ {
		Rot13Table(temp)
	}
}

The above (partial) snippet contains two benchmark tests, one for testing the table version and one for the mod operator approach. To run the test, simply execute “gotest -v -bench=.” within the unit and wait for the result. Each benchmark is looped b.N times and is repeated until Go acquires a high enough confidence in the value.

myHttp.BenchmarkRot13Mod 5000000 397 ns/op
myHttp.BenchmarkRot13Table 500000 3761 ns/op

Based on the above results, we can clearly see that the look up table approach is about 10x slower than the mod operator. Having this information available to future developers is highly desirable as it allows them to measure the impact of a unit level change and make an informed decision on which algorithm to use for a solution.

Expanding The Toolbox – Google Go

I’ve decided to experiment with 12 new programming languages or frameworks over the next year. Just like spoken languages, mastery of a coding language can take years even with daily use. With just one month per language I hope to develop a basic understanding of the languages strengths and weaknesses. Languages are tools that help us solve problems and having a broad perspective of what’s in the toolbox can simplify our solutions. Lots of problems can be solved with C, but why use it for the subset that Python can solve faster and more reliably?

All programming languages have a unique feel. Python fells concise and lucid whereas C++ feels bulky and ridged. For August’s language I picked Google’s Go. I selected Go because it supposedly has the feel of a dynamic interpreted language with the speed and type safety of a compiled language.

To get a feel for the syntax I wrote a simple closure that returns two functions, one for adding a list of integers to the sum and one for calculating the average. As you can see, Go certainly has the feel of a scripting language but requires variable type declaration. Like Python it also allows for returning multiple results from a method.

// Note: WordPress doesn't have a 'Go' syntax highlight yet so I'm using javascript
package main

import (
	"fmt"
)

//Closure that returns a sum and average method
func stats() (func(a []int) int, func() float64){
	sum := 0
	entries := 0
	log := func(){
		fmt.Printf("sum: %d, entries: %d\n",sum,entries)
	}
	return func(l []int) int {
		defer log()
		for i:=0;i<len(l);i++ {
			sum += l[i]
			entries += 1
		}
		return sum
	},
	func() float64 {
		return float64(sum)/float64(entries)
	}
}

func main() {
	sum,avg := stats()
	s := sum([]int{1,2,3})
	s = sum([]int{1,1,1,4,1})
	fmt.Printf("sum: %d avg: %f\n", s, avg())
}

As a long time C programmer I initially found the reversed type and declaration syntax to be awkward, but after writing a few hundred lines it actually flows nicely. Rob Pike put together a great three day tutorial that you can access online or in the /go/doc directory. For more information on the syntax check out the day one slides. By far the most compelling feature of Go is the built in support for concurrency. As someone who has done lots of multi-threaded and multi-process development I believe the Go development team did a fantastic job at making it a fundamentally ingrained principle. To illustrate this feature I wrote a concurrent Pell Number generator below.

package pell

// Pell Numbers
// 0, 1, 2, 5, 12, 29, 70, 169, 408, 985, 2378
// P0 = 0, P1 = 1, PN = 2PN-1 + PN-2

// Day 3: Do not communicate by sharing memory.
// Instead, share memory by communicating.

func Pell(ch chan<- uint64) {
	var fn_1, fn_2 uint64 = 1, 0
	ch <- 0
	ch <- 1
	for {
		fn_2, fn_1 = fn_1, ((2*fn_1) + fn_2)
		ch <- fn_1
	}
}

There is nothing special about the above Pell routine. Its parameter is a communication pipe (called a channel in Go) that guarantees synchronization because it is a blocking write. To start the thread you use the “go” keyword and read data from the same channel passed to the routine. Upon a read it unblocks the thread which then calculates the next number in the sequence.

package main

import ( "fmt"
	 "flag"
	 "pell"
)

func main() {
	c := make(chan uint64)
	var iterations *int = flag.Int("i", 10, "number of iterations to run")
	flag.Parse()

	go pell.Pell(c)
	for i:=0;i < *iterations; i++ {
		var num uint64 = <-c
		fmt.Println(num)
	}

}

From my perspective the one major red flag is the lack of support for object oriented design. Although the ability to add a method to a struct or any data type is very cool I don’t yet see how it can fully replace the concept of a class. I added a small miscellaneous Go repository on gitHub that anyone interested in some social coding can fork and modify at will.