Category Archives: Code

Windows Resource – Access Denied

In the Windows PC industry it is very common for applications to attached resources to an executable. Resources can be anything from bitmaps, UI components, string translations and copyright information. These resources are then used by the executable at runtime and also by Windows Explorer (for icons and properties information).

Resources can be modified post compilation by using three methods provided by MSDN: BeginUpdateResource, UpdateResource and EndUpdateResource. I recently had to use these methods and painfully discovered a nasty race condition between my resource updater application and Windows Explorer. I found information on these random failures to be scarce, hence the blog post…

When you are modifying the resource of an executable or DLL and Windows Explorer is open and showing the directory of the file being updated (regardless of visibility), a file access race condition exists that produces one of the following two errors:

The system cannot open the device or file specified.

Access denied.

The numbers I collected show an average file access failure rate of 2.3% when updating a resource that Explorer is also showing. If you are experiencing these errors simply close Windows Explorer and your resource updater application will function as expected every time. Given that Windows is closed source I can’t confirm exactly what Explorer is doing but my hypotheses is that Explorer temporarily opens and reads the resources of each file in the directory (most likely so it can determine which icon to show) and refreshes that resource when it notices a change to the file. If your updater application requests a file handle during this time you will see one of the above errors.

Python – Recursive Glob & Line Counter

The other day I needed a recursive glob to find all the *.py files in my home directory. Much to my amazement Python doesn’t have one built into the glob module. So, I built my own and decided to share it for anyone else who might need it. Fork it here and install it with easy_install off PyPi.

"""
Recursive Glob Module
Methods:
	rglob(base, pattern)
	rglob_(pattern)
	lcount(base, pattern, func=lambda x : True)
"""
import glob
import os

def _getDirs(base):
	return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]

def _count(files, func):
	lines = 0
	for f in files:
		lines += sum([1 for l in open(f) if func(l)])
	return lines

def rglob(base, pattern):
	""" Recursive glob starting in specified directory """
	flist = []
	flist.extend(glob.glob(os.path.join(base,pattern)))
	dirs = _getDirs(base)
	if len(dirs):
		for d in dirs:
			flist.extend(rglob(os.path.join(base,d), pattern))
	return flist

def rglob_(pattern):
	""" Performs a recursive glob in the current working directory """
	return rglob(os.getcwd(), pattern)

def lcount(base, pattern, func = lambda x : True):
	""" Counts the number of lines in each file found matching pattern.
		Params:
			base - root directory to start the search
			pattern - pattern for glob to match (i.e '*.py')
			func - boolean filter function
				example: lambda x : True if len(x.strip()) else False #filter empty lines
				default: lambda x : True
	"""
	allFiles = rglob(base, pattern)
	return _count(allFiles, func)

if __name__ == "__main__":
	#filter out empty lines and comments
	filterFunc = lambda x : True if (len(x.strip()) and x.strip()[0] != '#') else False

	pyFiles = rglob(os.path.dirname(__file__), "*.py")
	print " {} total lines".format(_count(pyFiles, filterFunc))

	pyFiles_ = rglob_("*.py")
	print " {} total lines".format(_count(pyFiles_, filterFunc))
	print " {} total lines".format(lcount(os.path.dirname(__file__), "*.py", filterFunc))

It works by starting off in a base directory where it uses a generator expression to iterate through a glob iterator that matches any subdirectory. It then goes through these subdirectories one at a time and performs a glob pattern match on the contents of said directory. This is repeated recursively until all subdirectories off the base are inspected.

Back To The Basics – Sorting

Every couple of years it can be fun to revisit the basics. Given that I’m back in interview mode, I figured now was a good time to rewrite some of those sorting algorithms from sophomore year… in Go of course. You can find all the code in this post on GitHub under the sort directory here. The benchmark results can be reproduced by running the “gotest -v -bench=.” command from within the sort package directory. All of the results stem from the same input data and each algorithm is run with both sorted and unsorted data.

The easiest to implement and the most widely used sorting algorithm for small sets of data is Selection sort. It uses two loops and checks each entry for a suitable replacement.

package sort

func swap(a []int, x int, y int){
	var temp = a[x]
	a[x] = a[y]
	a[y] = temp
}

func SelectionSort(a []int) {
	for i, _ := range a {
		for j := i + 1; j < len(a); j++ {
			if a[j] < a[i] {
				swap(a, i, j)
			}
		}
	}
}

The results show that Selection sort performs substantially faster if the array is already sorted. The worst-case, average-case, and best-case running time are all quadratic which makes it a bad choice for large quantities of data.

Selection Sort: O(n^2)
sort.BenchmarkSelectionSort_Unsorted      500      4792526 ns/op
sort.BenchmarkSelectionSort_Sorted        2000      1026659 ns/op

Another simple and natural sorting algorithm is Insertion sort. Similar to Selection sort it uses two loops to sort but offers a much better best-case running time. When the data set is nearly sorted it can execute in linear time as the second loop only executes if the value to the left is greater then itself.

func InsertionSort(a []int) {
	for j := 1; j &lt; len(a); j++ {
		var key = a[j]
		var i = j-1
		for i >= 0 && a[i] > key {
			a[i+1] = a[i]
			i</span> = i -1
		}
		a[i+1] = key
	}
}

The results show a substantial increase over Selection sort above. Both the worst-case and average-case running times are quadratic, but the best-case is linear. Notice the benchmark time for an already sorted array.

Insertion Sort: O(n^2)
sort.BenchmarkInsertionSort_Unsorted      1000       1835507 ns/op
sort.BenchmarkInsertionSort_Sorted      200000       8698 ns/op

Quadratic sorting time just isn’t acceptable for large sets of data. To help achieve a linear sorting time several divide-and-conquer recursive algorithms exists. Quicksort is the most common, although slightly more complicated than the above quadratic algorithms. It offers a substantial performance increase when dealing with large unsorted data sets. A complete breakdown of Quicksort is outside the scope of this post, if necessary you can study it more in depth here. Basically, Quicksort recursively sorts the data set by continually dividing the set into subsets centered around a pivot point (in this case a[r] in partition() below).

func partition(a []int, p int, r int) int {
	var x = a[r]
	var i = p - 1
	for j := p; j <= r - 1; j++ {
		if a[j] <= x {
			i = i + 1
			swap(a,i,j)
		}
	}
	swap(a,i+1,r)
	return i + 1
}

func quickSort(a []int, p int, r int) {
	if p < r {
		var q = partition(a, p, r)
		quickSort(a,p,q-1)
		quickSort(a,q+1,r)
	}
}

func QuickSort(a []int) {
	quickSort(a, 0, len(a)-1)
}

The worst-case running time is quadratic and happens when the array is already sorted. The results show a slower worst-case sorting time then Selection sort, making Quicksort a bad choice if the data set is already or nearly sorted. However, the average-case and best-case running times are linear making it a good all around sorting algorithm for large data sets.

Quicksort: O(n log n)
sort.BenchmarkQuickSort_Unsorted      500      2836278 ns/op
sort.BenchmarkQuickSort_Sorted          500      6091908 ns/op

Despite the obvious benefits of Quicksort, it’s often a poor choice for applications requiring a consistent predictable sort time (such as in real-time embedded systems). This is because it is often not possible to know how sorted the data set is at any given time, meaning that it’s difficult to predict if the sort operation will be O(n^2) or O(n log n). In these scenarios using the Heap sort algorithm is a better choice. Although at first it appears more complicated then Quicksort, it offers both a worst-case and average-case O(n log n) sort time.

A heap is a binary tree where the entries of the nodes can be compared with the less-than operator, meaning that the value of a node is never less then its children and each level of the tree is complete with non-complete leaves. The deepest level are as far left as possible. For a complete description of Heap sort look here.

// in Python these would be a lambda and C++ would be an inline
func parent(i int) int {
	return ((i-1)/2)
}

func left(i int) int {
	return ((2*i)+1)
}

func right(i int) int {
	return ((2*i)+2)
}

func MaxHeapify(a []int, i int, size int) {
	var left = left(i)
	var right = right(i)
	var largest = i

	if left  a[i] {
		largest = left
	}

	if right  a[largest] {
		largest = right
	}

	if largest != i {
		swap(a,i,largest)
		MaxHeapify(a, largest, size)
	}
}

func BuildMaxHeap(a []int) {
	for i:=(len(a)/2); i >= 0; i-- {
		MaxHeapify(a,i,len(a))
	}
}

func HeapSort(a []int) {
	BuildMaxHeap(a)
	var size = len(a)
	for i:=len(a)-1; i >=1; i-- {
		swap(a,0,i)
		size = size - 1
		MaxHeapify(a,0, size)
	}
}

Heap Sort: O(n log n)
sort.BenchmarkHeapSort_Unsorted      10000       233454 ns/op
sort.BenchmarkHeapSort_Sorted          10000       262941 ns/op

Unit Testing & Benchmarking – Google Go

Writing unit level tests and benchmarks are a critical aspect to ensuring software remaining reliable and performing as expected as a project grows. With many languages this requires the implementer to go out of their way to create a lot of special code, code that rarely gets updated as the unit properties change. Go makes this super easy and I am very impressed with the built in support for both of these operations. As an example, let’s look at a unit level test and benchmark for two different rot13 methods. Below are two methods for calculating a rot13 translation, one uses a look up table and the other calculates each individual value using the mod operator.

Code:

// rot13(rot13(x)) = x
// ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
// NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm

var table = map[byte]byte{'A':'N','B':'O','C':'P','D':'Q','E':'R','F':'S','G':'T','H':'U','I':'V','J':'W',
'K':'X','L':'Y','M':'Z','N':'A','O':'B','P':'C','Q':'D','R':'E','S':'F','T':'G','U':'H','V':
'I','W':'J','X':'K','Y':'L','Z':'M','a':'n','b':'o','c':'p','d':'q','e':'r','f':'s','g':'t',
'h':'u','i':'v','j':'w','k':'x','l':'y','m':'z','n':'a','o':'b','p':'c','q':'d','r':'e','s':
'f','t':'g','u':'h','v':'i','w':'j','x':'k','y':'l','z':'m'} 

func Rot13Table(input []byte) {
	for i, c := range input {
		input[i] = table[c]
	}
}

func Rot13Mod(input []byte) {
        for i, c := range input {
                base := byte(0)
                if c &gt;= 'a' &amp;&amp; c &lt;= 'z' {
                        base = 'a'
                }else if c &gt;= 'A' &amp;&amp; c &lt;= 'Z' {
                        base = 'A'
                }
                if base != 0 {
                        input[i] = byte((c - base + 13) % 26 + base)
                }
        }
}

Using Go’s built in approach for writing tests and benchmarks we can write a *_test.go file for ensuring our methods are producing the correct result and for profiling their performance. In Go, any method that starts with ‘Test’ or ‘Benchmark’ in a _test.go file are automatically compiled and executed when the ‘gotest’ command is run.

Code:

import (
	&quot;testing&quot;
	&quot;fmt&quot;
)

func BenchmarkRot13Mod(b *testing.B){
	translate := &quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz&quot;

	temp := []byte(translate)
	for i:=0; i &lt; b.N; i++ {
		Rot13Mod(temp)
	}
}

func BenchmarkRot13Table(b *testing.B){
	translate := &quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz&quot;

	temp := []byte(translate)
	for i:=0; i &lt; b.N; i++ {
		Rot13Table(temp)
	}
}

The above (partial) snippet contains two benchmark tests, one for testing the table version and one for the mod operator approach. To run the test, simply execute “gotest -v -bench=.” within the unit and wait for the result. Each benchmark is looped b.N times and is repeated until Go acquires a high enough confidence in the value.

myHttp.BenchmarkRot13Mod 5000000 397 ns/op
myHttp.BenchmarkRot13Table 500000 3761 ns/op

Based on the above results, we can clearly see that the look up table approach is about 10x slower than the mod operator. Having this information available to future developers is highly desirable as it allows them to measure the impact of a unit level change and make an informed decision on which algorithm to use for a solution.

Expanding The Toolbox – Google Go

I’ve decided to experiment with 12 new programming languages or frameworks over the next year. Just like spoken languages, mastery of a coding language can take years even with daily use. With just one month per language I hope to develop a basic understanding of the languages strengths and weaknesses. Languages are tools that help us solve problems and having a broad perspective of what’s in the toolbox can simplify our solutions. Lots of problems can be solved with C, but why use it for the subset that Python can solve faster and more reliably?

All programming languages have a unique feel. Python fells concise and lucid whereas C++ feels bulky and ridged. For August’s language I picked Google’s Go. I selected Go because it supposedly has the feel of a dynamic interpreted language with the speed and type safety of a compiled language.

To get a feel for the syntax I wrote a simple closure that returns two functions, one for adding a list of integers to the sum and one for calculating the average. As you can see, Go certainly has the feel of a scripting language but requires variable type declaration. Like Python it also allows for returning multiple results from a method.

// Note: WordPress doesn't have a 'Go' syntax highlight yet so I'm using javascript
package main

import (
	"fmt"
)

//Closure that returns a sum and average method
func stats() (func(a []int) int, func() float64){
	sum := 0
	entries := 0
	log := func(){
		fmt.Printf("sum: %d, entries: %d\n",sum,entries)
	}
	return func(l []int) int {
		defer log()
		for i:=0;i<len(l);i++ {
			sum += l[i]
			entries += 1
		}
		return sum
	},
	func() float64 {
		return float64(sum)/float64(entries)
	}
}

func main() {
	sum,avg := stats()
	s := sum([]int{1,2,3})
	s = sum([]int{1,1,1,4,1})
	fmt.Printf("sum: %d avg: %f\n", s, avg())
}

As a long time C programmer I initially found the reversed type and declaration syntax to be awkward, but after writing a few hundred lines it actually flows nicely. Rob Pike put together a great three day tutorial that you can access online or in the /go/doc directory. For more information on the syntax check out the day one slides. By far the most compelling feature of Go is the built in support for concurrency. As someone who has done lots of multi-threaded and multi-process development I believe the Go development team did a fantastic job at making it a fundamentally ingrained principle. To illustrate this feature I wrote a concurrent Pell Number generator below.

package pell

// Pell Numbers
// 0, 1, 2, 5, 12, 29, 70, 169, 408, 985, 2378
// P0 = 0, P1 = 1, PN = 2PN-1 + PN-2

// Day 3: Do not communicate by sharing memory.
// Instead, share memory by communicating.

func Pell(ch chan<- uint64) {
	var fn_1, fn_2 uint64 = 1, 0
	ch <- 0
	ch <- 1
	for {
		fn_2, fn_1 = fn_1, ((2*fn_1) + fn_2)
		ch <- fn_1
	}
}

There is nothing special about the above Pell routine. Its parameter is a communication pipe (called a channel in Go) that guarantees synchronization because it is a blocking write. To start the thread you use the “go” keyword and read data from the same channel passed to the routine. Upon a read it unblocks the thread which then calculates the next number in the sequence.

package main

import ( "fmt"
	 "flag"
	 "pell"
)

func main() {
	c := make(chan uint64)
	var iterations *int = flag.Int("i", 10, "number of iterations to run")
	flag.Parse()

	go pell.Pell(c)
	for i:=0;i < *iterations; i++ {
		var num uint64 = <-c
		fmt.Println(num)
	}

}

From my perspective the one major red flag is the lack of support for object oriented design. Although the ability to add a method to a struct or any data type is very cool I don’t yet see how it can fully replace the concept of a class. I added a small miscellaneous Go repository on gitHub that anyone interested in some social coding can fork and modify at will.

Python Generators

Generators are an easy-to-use tool in Python for making any object method or class an iterator. I’ve found them to be a very useful development tool and wanted to briefly expand on my previous PyQt Charts post.

The simplest generator is an object method that uses the “yield” keyword. Yield is just like return only the instruction counter is persistent in the method. In the example below, a generator method is created to return characters of a string one at a time in reverse order. A more real world example would be making a read socket a generator that returns packet data.

def reversed(s):
    index = len(s)
    while index >= 0:
        yield s[index]
        index = index - 1

revGen = reversed("Android")
for c in revGen:
     print c

The second and more useful approach is to let classes be generators. This is my preferred method because OOD is always better :-). Every class definition can overload the __iter__ and next methods. When using the class approach the StopIteration exception is responsible for terminating the iteration.

class QCharts(object):
	def __init__(self, **kwargs):
		self.charts = []
		self.index = 0
		self.len = 0

	def __iter__(self):
		self.index = 0
		return self

	def next(self):
		if self.index == self.len:
			raise StopIteration
		self.index = self.index + 1
		return self.charts[self.index-1]

	def add(self, c):
		self.charts.append(c)
		self.len = len(self.charts)

Now, using this iterator class we can loop through a series of QWidget based charts. Below is a slightly modified MainWindow class from the PyQt Charts post.


from PyQt4 import QtCore, QtGui
from pygooglechart import PieChart3D

class QChart(parent, type, **kwargs):...
class QCharts(object):...	

class MainWindow(QtGui.QMainWindow):
	def __init__(self, **kwargs):
		super(QtGui.QMainWindow, self).__init__(**kwargs)
		self.charts = QCharts()

	def addPieChart(self):
		i = self.charts.len
		t = QChart(self, PieChart3D, pos=QtCore.QPoint(i*75,i*75), size=QtCore.QSize(250,100))
		t.add_data([(20*i)+10,(i*10)+20])
		t.set_pie_labels(['Hello', 'World'])
		t.download()
		self.charts.add(t)

	def debug(self):
		for x in self.charts:
			print x

if __name__ == '__main__':
	app = QtGui.QApplication(sys.argv)
	app.setApplicationName("Blank PyQt Demo")
	app.setQuitOnLastWindowClosed(True)

	scaledSize = QtCore.QSize(scale.width(500),scale.height(500))
	window = MainWindow(size=scaledSize)
	window.addPieChart()
	window.addPieChart()
	window.addPieChart()
	window.debug()
	window.setWindowTitle("Immersive Blank PyQt Demo")
	window.show()

	sys.exit(app.exec_())

For more information on generators check out this great write up here. Another great feature of generators is that they can be used in generator expressions. These expressions look like this:

#extract the even numbers from the series
ints = [0,1,2,3,4,5,6,7,8,9,10]
even = [x for x in ints if x % 2 == 0 ]

#get the directory names from glob
import glob
def dirs(p):
     return [x for x in glob.iglob(os.path.join( p, '*')) if os.path.isdir(x) ]
dirs("/")

JavaScript & Python Closures

After a hard day of climbing in the canyon my fellow climbing/coder friend Tim started to tell me about a feature of JavaScript he had just started to use called Closures. Even though I have written thousands of lines of JavaScript I had never heard of Closures. As a self proclaimed eternal student I was intrigued with what he was telling me.

Turns out that Closures are generally considered to be JavaScripts most advanced and useful feature, and hence, is worthy of a blog post :-). As a C++ coder I like to think of Closures as a method with a privately malloc’d stack that is persistent across multiple invocations. As a Python coder I like to think of Closures as a method object with a preserved variable environment. Regardless of how you want to think about them, Closures allow us to redefine how we approach coding up methods.

Take the following simple JS example. Here, we are creating a very simple incrementer. The method inc() takes a starting value and returns a method that simply adds one to the start value (but not until its actually executed in the loop below).

function inc(start){
	var i = start;
	return function() { i = i + 1; document.write(i); }
}

addOne = inc(0)
for (i = 0; i < 10; ++i){
	addOne();
}
// prints 12345678910

I’m probably just super geeky but I find this feature of the language very cool. Turns out we can also do the same thing with Python.

#Works with Python 3.x only
def inc(start):
    y = start
    def adder():
        nonlocal y
        y += 1
        print y
        return y
    return adder
 
addOne = inc(0)
x = 0
for x in range(10):
    addOne()
#prints 12345678910

The ramifications of Closures to popular programming languages are still in its infancy even though the concept has been around for decades (adding them to Java has been a debate for years). It will be extremely interesting to see how the wide spread adoption of Closures will ultimately impact our coding habits in the near future.