A very crude, but often good enough, method of achieving parallel processing (e.g., on multi-core computers) is to partition a large input data file into small chunks, run the program on each chunk in parallel, and then merge the output files back together. Fortunately, this can be done easily with two standard Unix utilities: split and cat.
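A minimal sketch of the idea, assuming a line-oriented file input.txt whose lines can be processed independently, and a hypothetical filter program ./process that reads stdin and writes stdout:

    # Split input.txt into chunks of 100,000 lines each
    # (named chunk_aa, chunk_ab, ...).
    split -l 100000 input.txt chunk_

    # Launch one process per chunk in the background; the OS
    # scheduler spreads the jobs across the available cores.
    for f in chunk_*; do
        ./process < "$f" > "$f.out" &
    done
    wait    # block until every background job has finished

    # Concatenate the partial outputs in the original order.
    cat chunk_*.out > output.txt

    # Remove the intermediate chunk and output files.
    rm chunk_*

Because split names the chunks in lexicographic order and the shell glob expands them in the same order, the final cat reassembles the results in the order of the original file.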
Monday, October 04, 2010
2 comments:
If you're at that level, I wonder how you can make sure that the processing of your split-up files happens in parallel, on separate cores?
But in any case, worth noting: Some problems may be best partitioned vertically rather than horizontally. In this situation, the shell commands "cut" and "paste" could potentially be used with the same effect as horizontal partitioning via "split" and "cat".
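A minimal sketch of such vertical partitioning, assuming a tab-separated file data.tsv with six columns and the same hypothetical ./process filter; note that paste can only reassemble the pieces correctly if the processing preserves the number of lines:

    # Split the file vertically into two column groups.
    cut -f 1-3 data.tsv > part1.tsv
    cut -f 4-6 data.tsv > part2.tsv

    # Process the column groups in parallel.
    ./process < part1.tsv > part1.out &
    ./process < part2.tsv > part2.out &
    wait

    # Rejoin the processed columns side by side (tab-delimited).
    paste part1.out part2.out > output.tsv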
Thanks very much for the cut/paste tip. I would rely on the OS to assign the tasks to separate cores. The load is usually spread out (almost) evenly, as the CPU usage chart in the task manager shows, though this is not guaranteed.