Well, all this competition has proven productive: I have produced a version that is 45% faster than the 1.10 version at p=150G, though you may get different results depending on your platform and the p values you are using. That said, I have tried to be as friendly as possible to different processors.
At the end of the day, Phil and I both know that 99% of the time in the code is spent doing a couple of things, and you can only optimise those things so far. I would expect that, in the limit, the only difference between our software on the same platform would be the OS overhead, which is probably why his Linux version is faster than the Windows version!