Originally posted by Joe O
I don't know if proth_sieve_sse2 "fixes" the problem, but it is faster than proth_sieve_cmov on two SSE2 capable machines that I have tried it on.
At least on SSE2-enabled Intel CPUs, the SSE2 version is ~10% faster than the CMOV one. (I have no AMDs to try it on, but I'm pretty sure it holds there as well.)
When the FSB is clocked at 800 MHz, sieving speed is "not bad" - but not good either.

I'd also suggest P-1 factoring or normal PRPing. The P4s fly at both.