Python multiprocessing adventures

Trying to figure out some multiprocessing code from last night.
As one would expect, the single-threaded code maxes out a single core, and the other three (it’s a dual-core HT chip, so four logical cores) are very lightly loaded with random background crap.
For multiprocessing, the average load per core sits between the peak and the idle-core loads from the single-threaded run, and it is close to the same on all four. Basically what you’d expect to see with smaller tasks spread around.
RAM usage remains flat, or too close to flat for Ubuntu’s System Monitor to show a difference.
But the runtime is about 0.8 s longer. Less stress on the CPU cores, the same stress on RAM, and it takes longer.
Hmm. Some work to do on optimizing the algorithm.
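
One thing worth checking before touching the algorithm itself: when the individual tasks are small, the cost of starting worker processes and pickling arguments and results between them can outweigh whatever is gained from the extra cores. I haven't confirmed that is what is happening here, so treat the following as a rough sketch for measuring it, not as the actual code from last night. The `work` function, the input sizes, and the `chunksize` values are all made-up stand-ins.

```python
import time
from multiprocessing import Pool


def work(n):
    """Hypothetical CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_serial(inputs):
    """Run every task in the current process, one after another."""
    return [work(n) for n in inputs]


def run_parallel(inputs, processes=4, chunksize=1):
    """Spread the tasks over a pool of worker processes.

    A larger chunksize sends tasks to the workers in bigger batches,
    which means fewer round trips between processes.
    """
    with Pool(processes=processes) as pool:
        return pool.map(work, inputs, chunksize=chunksize)


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Many small tasks: the case where overhead is most likely to dominate.
    inputs = [2_000] * 2_000

    _, serial_s = timed(run_serial, inputs)
    print(f"serial:                 {serial_s:.3f} s")

    # Assumed values to try; the best one depends on the machine and task.
    for chunksize in (1, 50, 500):
        _, parallel_s = timed(run_parallel, inputs, chunksize=chunksize)
        print(f"pool, chunksize={chunksize:<4}:   {parallel_s:.3f} s")
```

If the pool timings drop noticeably as `chunksize` grows, overhead per task is at least part of the problem, and batching the work into fewer, larger pieces may help more than tuning the algorithm.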