Monday, October 04, 2010

A Poor Man's Parallel Processing

A very crude, but often good enough, way to achieve parallel processing (e.g., on multi-core computers) is to partition a large input file into smaller chunks, run the program on each chunk in parallel, and then merge the output files back together. This works whenever each line (or record) can be processed independently of the others. Fortunately, the whole process can be done easily with two Unix utilities: split and cat.
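The following is only a minimal sketch of the idea, not a definitive recipe. The file names (input.txt, output.txt), the chunk_ prefix, the chunk size of 250 lines, and the process function are all hypothetical stand-ins; replace process with whatever program you actually want to run, provided it reads standard input and writes standard output.

```shell
#!/bin/sh
# Sketch: split an input file, process the chunks in parallel, merge the results.
set -eu

workdir=$(mktemp -d)    # scratch directory (hypothetical layout)
cd "$workdir"

# Stand-in input: 1000 lines, one number per line.
seq 1 1000 > input.txt

# Stand-in for "the program": doubles the number on each line.
process() {
    awk '{ print $1 * 2 }'
}

# 1. Partition into chunks of 250 lines: chunk_aa, chunk_ab, ...
split -l 250 input.txt chunk_

# 2. Start one background job per chunk; the OS decides which core runs which.
for f in chunk_??; do
    process < "$f" > "$f.out" &
done
wait    # block until every background job has finished

# 3. Merge the outputs. The glob expands in sorted order, so the
#    original line order of input.txt is preserved.
cat chunk_??.out > output.txt
```

Note that this starts one job per chunk, so the chunk size should be chosen to give roughly as many chunks as there are cores. A plain wait also does not report whether any individual job failed.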

2 comments:

Seth Grimes said...

If you're at that level, I wonder how you can make sure that the processing of your split-up files happens in parallel, on separate cores?

But in any case, worth noting: Some problems may be best partitioned vertically rather than horizontally. In this situation, the shell commands "cut" and "paste" could potentially be used with the same effect as horizontal partitioning via "split" and "cat".
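A minimal sketch of that vertical variant, under stated assumptions: the file table.tsv, its three tab-separated columns, and the use of tr as the per-column "program" are all hypothetical placeholders. It only works if the per-column program emits exactly one output line per input line, so that the rows still line up when pasted back together.

```shell
#!/bin/sh
# Sketch: partition a table by column, process columns in parallel, re-join.
set -eu

workdir=$(mktemp -d)    # scratch directory (hypothetical layout)
cd "$workdir"

# Stand-in input: three tab-separated columns.
printf 'a\t1\tx\nb\t2\ty\nc\t3\tz\n' > table.tsv

# 1. Partition vertically: one file per column (tab is cut's default delimiter).
for i in 1 2 3; do
    cut -f "$i" table.tsv > "col_$i"
done

# 2. Process each column as a background job; as a placeholder, upper-case it.
for i in 1 2 3; do
    tr 'a-z' 'A-Z' < "col_$i" > "col_$i.out" &
done
wait    # block until every background job has finished

# 3. Merge the columns side by side (paste joins with tabs by default).
paste col_1.out col_2.out col_3.out > result.tsv
```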

Dell Zhang said...

Thanks very much for the cut/paste tip. I would rely on the OS to assign the tasks to separate cores. The computation can usually be spread out (almost) evenly, as shown by the CPU usage chart in the task manager, though it is not guaranteed.