Originally posted by vjs
k7 -v -dickson 30 -resume stage1.txt 1 1550000000000-1650000000000

655mb consumed (<-- this is total system memory consumed at its peak)

Notice I'm chopping the B2 range into 0.1T pieces. It seems fast, and I believe it works the same as P-1 in this regard... but I'm open to options and opinions.
I'd suggest using the -k parameter instead. I'm not sure whether the results are comparable, but my gut feeling says that splitting into blocks with -k works basically the same way as your approach.
You can simply time both methods and see which is faster.
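
In case it helps with the comparison, here is a rough Python sketch of the chopped-up approach: it only builds the command lines for consecutive B2 pieces. The flags are copied from your command line above; the function names (b2_chunks, chunk_command) are made up for this example, and the sketch makes no claim about how k7 handles the pieces internally.

```python
def b2_chunks(b2_start, b2_end, step):
    """Yield (low, high) pairs covering b2_start..b2_end in pieces of size step."""
    low = b2_start
    while low < b2_end:
        high = min(low + step, b2_end)  # the last piece may be shorter
        yield low, high
        low = high


def chunk_command(low, high):
    # Flags taken verbatim from the quoted command line; only the bounds change.
    return ["k7", "-v", "-dickson", "30", "-resume", "stage1.txt",
            "1", f"{low}-{high}"]


# Example: print the command lines for two 0.1T pieces starting at 1.55T.
# for low, high in b2_chunks(1550000000000, 1750000000000, 100000000000):
#     print(" ".join(chunk_command(low, high)))
```

Pasting the printed lines into a *.bat gives the same runs you are doing by hand.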

not sure how treefile works etc.
Just add e.g. "-treefile treefile" somewhere before the bounds. As a result, a number of files called treefile.0, treefile.1, etc. get written at the beginning of a curve and erased again at the end.

Mystwalker... If you write me a *.bat, I'd give it a try to see if it's faster, better, etc.
All in all, just try
k7 -v -k 16 -treefile treefile -resume stage1.txt 1 2.9e9-1.5e12
and then run it again with "-k 16" changed to "-k 64".
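
To time the two -k settings against each other, something like the following Python sketch should do. It assumes k7 and stage1.txt are in the current directory; time_command and k_command are names invented for this example, and the command line is the one given above with only the -k value varied. It measures wall-clock time, so run it on an otherwise idle machine.

```python
import subprocess
import time


def time_command(args):
    """Run a command to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)  # raises if the command exits with an error
    return time.perf_counter() - start


def k_command(k):
    # Same command line as in the post; only the value after -k changes.
    return ["k7", "-v", "-k", str(k), "-treefile", "treefile",
            "-resume", "stage1.txt", "1", "2.9e9-1.5e12"]


# Example (needs k7 and stage1.txt in the current directory):
# for k in (16, 64):
#     print(k, time_command(k_command(k)), "seconds")
```

The same time_command helper can be pointed at the 0.1T-piece command lines, summing the times of the individual pieces for a fair comparison.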

I have a fast 15K SCSI drive on a controller, so swapping (disk access) isn't an issue.
Swapping is always an issue, as even the fastest HDD is still several orders of magnitude slower than system memory...