Poor man’s parallel processing

Here’s a nice trick I learned on how you could implement simple parallel processing capabilities to speed up computations. This trick is only applicable in certain simple cases though, and does not scale very well, so it is best used in one-off scripts rather than in scripts that is used routinely or by others.

Suppose you have a list or an array that you are going to loop trough. Each of the elements in the list takes a long time to process and each iteration is NOT dependent on the result of any of the previous iterations. This is exactly the kind of situation where this trick is applicable.

The trick is to save the result for each iteration in a file whose name is unique to the iteration, and at the beginning of each iteration you simply check if that file already exists. If it does, the script skips to the next iteration. If it doesn’t, you create the file. This way you could run many instances of the script simultaneously, without doing the same iteration twice.

With this trick the results will be spread across different files, but if they are named and formated in a consistent way it is not hard to go trough the files and merge them into a single file.

Here is how it could be done in python:

import os.path

myList = ['bill', 'george', 'barack', 'ronald']

for president in myList:
	fileName = 'result_{}'.format(president)
	if os.path.isfile(fileName):
		print('File {} already exists, continues to the next iteration')
	f = open(filename, 'w')

	#Do your thing here

	#myResults is the object where your results are stored

And in R:

myList <- c('bill', 'george', 'barack', 'ronald')

for (president in myList){

	file.name <- paste('results', president, sep='_')

	if (file.exists(file.name)){
		cat('File', file.name, 'already exists, continues to the next iteration\n')


	#Do your thing here
	#Save the my.result object