Author Archives: cpiekarski

About cpiekarski

create; explore; share.

Colorado Chicago Basin

This past August I had the pleasure of exploring an area of Colorado called the Chicago Basin with my friends Tim & Catone. The area is very remote, requires at least two nights and includes lots of hoofing it to access the Basin. The main back country camping area provides access to three of the states most remote 14ers (Windom, Sunlight and Eolus). The Basin is absolutely stunning and I can’t wait to visit again.
The best way to experience the Basin is to utilize the Durango Narrow Gauge Railroad. The train will drop you and your pack off at the start of the main trailhead, cutting the hike to the main camping area down to ~8 miles. The train is a true piece of Colorado history operating just like it did back in 1880. The train conductor operates in a very casual manner, as long as you paid for a full round trip ticket you can pretty much take it out of the Basin any day you want. So, once you get there and decide to spend another day enjoying the area, don’t worry about the date on your ticket :). As an added bonus, if you end up taking a different route out of the Basin simply flag the train down at any point in the valley and they’ll stop and pick you up.

The Basin supports a very healthy goat population that has been desensitized to humans. Make sure you goat proof your camp before heading out to bag the 14ers. The goats are very aggressive at sniffing out your scraps of dropped food and love the salt in your urine. Our camp was overrun by goats multiple times. The goats are harmless, but you can help minimize the blending of the worlds by hanging left over food instead of burying it and urinate on large rocks instead of the dirt.

If you haven’t already bagged these peaks, make sure you put them on your list for next year. Exploring the area is definitely worth 14 hours in a car, 5 hours on a train, 16 miles by foot (not including the miles required to summit the peaks) and three nights in a tent at 10,000 feet. Enjoy!

Python – Recursive Glob & Line Counter

The other day I needed a recursive glob to find all the *.py files in my home directory. Much to my amazement Python doesn’t have one built into the glob module. So, I built my own and decided to share it for anyone else who might need it. Fork it here and install it with easy_install off PyPi.

"""
Recursive Glob Module
Methods:
	rglob(base, pattern)
	rglob_(pattern)
	lcount(base, pattern, func=lambda x : True)
"""
import glob
import os

def _getDirs(base):
	return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]

def _count(files, func):
	lines = 0
	for f in files:
		lines += sum([1 for l in open(f) if func(l)])
	return lines

def rglob(base, pattern):
	""" Recursive glob starting in specified directory """
	flist = []
	flist.extend(glob.glob(os.path.join(base,pattern)))
	dirs = _getDirs(base)
	if len(dirs):
		for d in dirs:
			flist.extend(rglob(os.path.join(base,d), pattern))
	return flist

def rglob_(pattern):
	""" Performs a recursive glob in the current working directory """
	return rglob(os.getcwd(), pattern)

def lcount(base, pattern, func = lambda x : True):
	""" Counts the number of lines in each file found matching pattern.
		Params:
			base - root directory to start the search
			pattern - pattern for glob to match (i.e '*.py')
			func - boolean filter function
				example: lambda x : True if len(x.strip()) else False #filter empty lines
				default: lambda x : True
	"""
	allFiles = rglob(base, pattern)
	return _count(allFiles, func)

if __name__ == "__main__":
	#filter out empty lines and comments
	filterFunc = lambda x : True if (len(x.strip()) and x.strip()[0] != '#') else False

	pyFiles = rglob(os.path.dirname(__file__), "*.py")
	print " {} total lines".format(_count(pyFiles, filterFunc))

	pyFiles_ = rglob_("*.py")
	print " {} total lines".format(_count(pyFiles_, filterFunc))
	print " {} total lines".format(lcount(os.path.dirname(__file__), "*.py", filterFunc))

It works by starting off in a base directory where it uses a generator expression to iterate through a glob iterator that matches any subdirectory. It then goes through these subdirectories one at a time and performs a glob pattern match on the contents of said directory. This is repeated recursively until all subdirectories off the base are inspected.

Back To The Basics – Sorting

Every couple of years it can be fun to revisit the basics. Given that I’m back in interview mode, I figured now was a good time to rewrite some of those sorting algorithms from sophomore year… in Go of course. You can find all the code in this post on GitHub under the sort directory here. The benchmark results can be reproduced by running the “gotest -v -bench=.” command from within the sort package directory. All of the results stem from the same input data and each algorithm is run with both sorted and unsorted data.

The easiest to implement and the most widely used sorting algorithm for small sets of data is Selection sort. It uses two loops and checks each entry for a suitable replacement.

package sort

func swap(a []int, x int, y int){
	var temp = a[x]
	a[x] = a[y]
	a[y] = temp
}

func SelectionSort(a []int) {
	for i, _ := range a {
		for j := i + 1; j < len(a); j++ {
			if a[j] < a[i] {
				swap(a, i, j)
			}
		}
	}
}

The results show that Selection sort performs substantially faster if the array is already sorted. The worst-case, average-case, and best-case running time are all quadratic which makes it a bad choice for large quantities of data.

Selection Sort: O(n^2)
sort.BenchmarkSelectionSort_Unsorted      500      4792526 ns/op
sort.BenchmarkSelectionSort_Sorted        2000      1026659 ns/op

Another simple and natural sorting algorithm is Insertion sort. Similar to Selection sort it uses two loops to sort but offers a much better best-case running time. When the data set is nearly sorted it can execute in linear time as the second loop only executes if the value to the left is greater then itself.

func InsertionSort(a []int) {
	for j := 1; j &lt; len(a); j++ {
		var key = a[j]
		var i = j-1
		for i >= 0 && a[i] > key {
			a[i+1] = a[i]
			i</span> = i -1
		}
		a[i+1] = key
	}
}

The results show a substantial increase over Selection sort above. Both the worst-case and average-case running times are quadratic, but the best-case is linear. Notice the benchmark time for an already sorted array.

Insertion Sort: O(n^2)
sort.BenchmarkInsertionSort_Unsorted      1000       1835507 ns/op
sort.BenchmarkInsertionSort_Sorted      200000       8698 ns/op

Quadratic sorting time just isn’t acceptable for large sets of data. To help achieve a linear sorting time several divide-and-conquer recursive algorithms exists. Quicksort is the most common, although slightly more complicated than the above quadratic algorithms. It offers a substantial performance increase when dealing with large unsorted data sets. A complete breakdown of Quicksort is outside the scope of this post, if necessary you can study it more in depth here. Basically, Quicksort recursively sorts the data set by continually dividing the set into subsets centered around a pivot point (in this case a[r] in partition() below).

func partition(a []int, p int, r int) int {
	var x = a[r]
	var i = p - 1
	for j := p; j <= r - 1; j++ {
		if a[j] <= x {
			i = i + 1
			swap(a,i,j)
		}
	}
	swap(a,i+1,r)
	return i + 1
}

func quickSort(a []int, p int, r int) {
	if p < r {
		var q = partition(a, p, r)
		quickSort(a,p,q-1)
		quickSort(a,q+1,r)
	}
}

func QuickSort(a []int) {
	quickSort(a, 0, len(a)-1)
}

The worst-case running time is quadratic and happens when the array is already sorted. The results show a slower worst-case sorting time then Selection sort, making Quicksort a bad choice if the data set is already or nearly sorted. However, the average-case and best-case running times are linear making it a good all around sorting algorithm for large data sets.

Quicksort: O(n log n)
sort.BenchmarkQuickSort_Unsorted      500      2836278 ns/op
sort.BenchmarkQuickSort_Sorted          500      6091908 ns/op

Despite the obvious benefits of Quicksort, it’s often a poor choice for applications requiring a consistent predictable sort time (such as in real-time embedded systems). This is because it is often not possible to know how sorted the data set is at any given time, meaning that it’s difficult to predict if the sort operation will be O(n^2) or O(n log n). In these scenarios using the Heap sort algorithm is a better choice. Although at first it appears more complicated then Quicksort, it offers both a worst-case and average-case O(n log n) sort time.

A heap is a binary tree where the entries of the nodes can be compared with the less-than operator, meaning that the value of a node is never less then its children and each level of the tree is complete with non-complete leaves. The deepest level are as far left as possible. For a complete description of Heap sort look here.

// in Python these would be a lambda and C++ would be an inline
func parent(i int) int {
	return ((i-1)/2)
}

func left(i int) int {
	return ((2*i)+1)
}

func right(i int) int {
	return ((2*i)+2)
}

func MaxHeapify(a []int, i int, size int) {
	var left = left(i)
	var right = right(i)
	var largest = i

	if left  a[i] {
		largest = left
	}

	if right  a[largest] {
		largest = right
	}

	if largest != i {
		swap(a,i,largest)
		MaxHeapify(a, largest, size)
	}
}

func BuildMaxHeap(a []int) {
	for i:=(len(a)/2); i >= 0; i-- {
		MaxHeapify(a,i,len(a))
	}
}

func HeapSort(a []int) {
	BuildMaxHeap(a)
	var size = len(a)
	for i:=len(a)-1; i >=1; i-- {
		swap(a,0,i)
		size = size - 1
		MaxHeapify(a,0, size)
	}
}

Heap Sort: O(n log n)
sort.BenchmarkHeapSort_Unsorted      10000       233454 ns/op
sort.BenchmarkHeapSort_Sorted          10000       262941 ns/op

Unit Testing & Benchmarking – Google Go

Writing unit level tests and benchmarks are a critical aspect to ensuring software remaining reliable and performing as expected as a project grows. With many languages this requires the implementer to go out of their way to create a lot of special code, code that rarely gets updated as the unit properties change. Go makes this super easy and I am very impressed with the built in support for both of these operations. As an example, let’s look at a unit level test and benchmark for two different rot13 methods. Below are two methods for calculating a rot13 translation, one uses a look up table and the other calculates each individual value using the mod operator.

Code:

// rot13(rot13(x)) = x
// ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
// NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm

var table = map[byte]byte{'A':'N','B':'O','C':'P','D':'Q','E':'R','F':'S','G':'T','H':'U','I':'V','J':'W',
'K':'X','L':'Y','M':'Z','N':'A','O':'B','P':'C','Q':'D','R':'E','S':'F','T':'G','U':'H','V':
'I','W':'J','X':'K','Y':'L','Z':'M','a':'n','b':'o','c':'p','d':'q','e':'r','f':'s','g':'t',
'h':'u','i':'v','j':'w','k':'x','l':'y','m':'z','n':'a','o':'b','p':'c','q':'d','r':'e','s':
'f','t':'g','u':'h','v':'i','w':'j','x':'k','y':'l','z':'m'} 

func Rot13Table(input []byte) {
	for i, c := range input {
		input[i] = table[c]
	}
}

func Rot13Mod(input []byte) {
        for i, c := range input {
                base := byte(0)
                if c &gt;= 'a' &amp;&amp; c &lt;= 'z' {
                        base = 'a'
                }else if c &gt;= 'A' &amp;&amp; c &lt;= 'Z' {
                        base = 'A'
                }
                if base != 0 {
                        input[i] = byte((c - base + 13) % 26 + base)
                }
        }
}

Using Go’s built in approach for writing tests and benchmarks we can write a *_test.go file for ensuring our methods are producing the correct result and for profiling their performance. In Go, any method that starts with ‘Test’ or ‘Benchmark’ in a _test.go file are automatically compiled and executed when the ‘gotest’ command is run.

Code:

import (
	&quot;testing&quot;
	&quot;fmt&quot;
)

func BenchmarkRot13Mod(b *testing.B){
	translate := &quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz&quot;

	temp := []byte(translate)
	for i:=0; i &lt; b.N; i++ {
		Rot13Mod(temp)
	}
}

func BenchmarkRot13Table(b *testing.B){
	translate := &quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz&quot;

	temp := []byte(translate)
	for i:=0; i &lt; b.N; i++ {
		Rot13Table(temp)
	}
}

The above (partial) snippet contains two benchmark tests, one for testing the table version and one for the mod operator approach. To run the test, simply execute “gotest -v -bench=.” within the unit and wait for the result. Each benchmark is looped b.N times and is repeated until Go acquires a high enough confidence in the value.

myHttp.BenchmarkRot13Mod 5000000 397 ns/op
myHttp.BenchmarkRot13Table 500000 3761 ns/op

Based on the above results, we can clearly see that the look up table approach is about 10x slower than the mod operator. Having this information available to future developers is highly desirable as it allows them to measure the impact of a unit level change and make an informed decision on which algorithm to use for a solution.

Expanding The Toolbox – Google Go

I’ve decided to experiment with 12 new programming languages or frameworks over the next year. Just like spoken languages, mastery of a coding language can take years even with daily use. With just one month per language I hope to develop a basic understanding of the languages strengths and weaknesses. Languages are tools that help us solve problems and having a broad perspective of what’s in the toolbox can simplify our solutions. Lots of problems can be solved with C, but why use it for the subset that Python can solve faster and more reliably?

All programming languages have a unique feel. Python fells concise and lucid whereas C++ feels bulky and ridged. For August’s language I picked Google’s Go. I selected Go because it supposedly has the feel of a dynamic interpreted language with the speed and type safety of a compiled language.

To get a feel for the syntax I wrote a simple closure that returns two functions, one for adding a list of integers to the sum and one for calculating the average. As you can see, Go certainly has the feel of a scripting language but requires variable type declaration. Like Python it also allows for returning multiple results from a method.

// Note: WordPress doesn't have a 'Go' syntax highlight yet so I'm using javascript
package main

import (
	"fmt"
)

//Closure that returns a sum and average method
func stats() (func(a []int) int, func() float64){
	sum := 0
	entries := 0
	log := func(){
		fmt.Printf("sum: %d, entries: %d\n",sum,entries)
	}
	return func(l []int) int {
		defer log()
		for i:=0;i<len(l);i++ {
			sum += l[i]
			entries += 1
		}
		return sum
	},
	func() float64 {
		return float64(sum)/float64(entries)
	}
}

func main() {
	sum,avg := stats()
	s := sum([]int{1,2,3})
	s = sum([]int{1,1,1,4,1})
	fmt.Printf("sum: %d avg: %f\n", s, avg())
}

As a long time C programmer I initially found the reversed type and declaration syntax to be awkward, but after writing a few hundred lines it actually flows nicely. Rob Pike put together a great three day tutorial that you can access online or in the /go/doc directory. For more information on the syntax check out the day one slides. By far the most compelling feature of Go is the built in support for concurrency. As someone who has done lots of multi-threaded and multi-process development I believe the Go development team did a fantastic job at making it a fundamentally ingrained principle. To illustrate this feature I wrote a concurrent Pell Number generator below.

package pell

// Pell Numbers
// 0, 1, 2, 5, 12, 29, 70, 169, 408, 985, 2378
// P0 = 0, P1 = 1, PN = 2PN-1 + PN-2

// Day 3: Do not communicate by sharing memory.
// Instead, share memory by communicating.

func Pell(ch chan<- uint64) {
	var fn_1, fn_2 uint64 = 1, 0
	ch <- 0
	ch <- 1
	for {
		fn_2, fn_1 = fn_1, ((2*fn_1) + fn_2)
		ch <- fn_1
	}
}

There is nothing special about the above Pell routine. Its parameter is a communication pipe (called a channel in Go) that guarantees synchronization because it is a blocking write. To start the thread you use the “go” keyword and read data from the same channel passed to the routine. Upon a read it unblocks the thread which then calculates the next number in the sequence.

package main

import ( "fmt"
	 "flag"
	 "pell"
)

func main() {
	c := make(chan uint64)
	var iterations *int = flag.Int("i", 10, "number of iterations to run")
	flag.Parse()

	go pell.Pell(c)
	for i:=0;i < *iterations; i++ {
		var num uint64 = <-c
		fmt.Println(num)
	}

}

From my perspective the one major red flag is the lack of support for object oriented design. Although the ability to add a method to a struct or any data type is very cool I don’t yet see how it can fully replace the concept of a class. I added a small miscellaneous Go repository on gitHub that anyone interested in some social coding can fork and modify at will.

Python Generators

Generators are an easy-to-use tool in Python for making any object method or class an iterator. I’ve found them to be a very useful development tool and wanted to briefly expand on my previous PyQt Charts post.

The simplest generator is an object method that uses the “yield” keyword. Yield is just like return only the instruction counter is persistent in the method. In the example below, a generator method is created to return characters of a string one at a time in reverse order. A more real world example would be making a read socket a generator that returns packet data.

def reversed(s):
    index = len(s)
    while index >= 0:
        yield s[index]
        index = index - 1

revGen = reversed("Android")
for c in revGen:
     print c

The second and more useful approach is to let classes be generators. This is my preferred method because OOD is always better :-). Every class definition can overload the __iter__ and next methods. When using the class approach the StopIteration exception is responsible for terminating the iteration.

class QCharts(object):
	def __init__(self, **kwargs):
		self.charts = []
		self.index = 0
		self.len = 0

	def __iter__(self):
		self.index = 0
		return self

	def next(self):
		if self.index == self.len:
			raise StopIteration
		self.index = self.index + 1
		return self.charts[self.index-1]

	def add(self, c):
		self.charts.append(c)
		self.len = len(self.charts)

Now, using this iterator class we can loop through a series of QWidget based charts. Below is a slightly modified MainWindow class from the PyQt Charts post.


from PyQt4 import QtCore, QtGui
from pygooglechart import PieChart3D

class QChart(parent, type, **kwargs):...
class QCharts(object):...	

class MainWindow(QtGui.QMainWindow):
	def __init__(self, **kwargs):
		super(QtGui.QMainWindow, self).__init__(**kwargs)
		self.charts = QCharts()

	def addPieChart(self):
		i = self.charts.len
		t = QChart(self, PieChart3D, pos=QtCore.QPoint(i*75,i*75), size=QtCore.QSize(250,100))
		t.add_data([(20*i)+10,(i*10)+20])
		t.set_pie_labels(['Hello', 'World'])
		t.download()
		self.charts.add(t)

	def debug(self):
		for x in self.charts:
			print x

if __name__ == '__main__':
	app = QtGui.QApplication(sys.argv)
	app.setApplicationName("Blank PyQt Demo")
	app.setQuitOnLastWindowClosed(True)

	scaledSize = QtCore.QSize(scale.width(500),scale.height(500))
	window = MainWindow(size=scaledSize)
	window.addPieChart()
	window.addPieChart()
	window.addPieChart()
	window.debug()
	window.setWindowTitle("Immersive Blank PyQt Demo")
	window.show()

	sys.exit(app.exec_())

For more information on generators check out this great write up here. Another great feature of generators is that they can be used in generator expressions. These expressions look like this:

#extract the even numbers from the series
ints = [0,1,2,3,4,5,6,7,8,9,10]
even = [x for x in ints if x % 2 == 0 ]

#get the directory names from glob
import glob
def dirs(p):
     return [x for x in glob.iglob(os.path.join( p, '*')) if os.path.isdir(x) ]
dirs("/")

Immersive – The Code Inception

I snapped this photo the moment before I wrote the first line of code for Immersive. I was on an extended month long vacation with friends in Kauai, Hawaii. The trip was a celebration of life for my good late friend Ed Bortolini. We were staying on Poipu Beach and enjoyed everything from helicopter rides, scuba diving, surfing, kayaking, great dinning, and even backpacked the Kalalau trail.

I started with a programming language I didn’t know, using a framework I had never heard of, solving a problem I had never thought about in an industry I knew nothing about. My only motivation was to create something cool and impact the world. Oh… and at the time I did it for free with no legal contract in place. Immersive’s code base still uses much of this code today and demos of it working are now plastered all over the internet. A mere 9 months later Immersive has generated a lot of press and obtained several clients. Even though I no longer server as the VP of Engineering I’m still a big fan of the company. See below for just a small fraction of the coverage. E ho’a’o no i pau kuhihewa.


CNN Money Article


Live CNN Video


Mashable Article


New York Times Article


Huffington Post Article and Video


ADWEEK Article


The Next Web Article


PSFK Article


BNET Article


BetaBeat Article


BrandChannel Article


CNN Forune Article


PC World Article


Sprouter Article


Business Insider Top 25 Startups

Also check out the YouTube Channel.