From http://mersenneforum.org/viewtopic.php?t=862&start=0

Actually, I'm pretty sure I can get another 10% for Athlon and Pentium 3 computers by employing one of the tricks I learned in the P4 optimizations. Unfortunately, it requires a major rewrite of the x86 FFT code. This is not something I relish doing!

Sadly, I also know of a way to improve the Proth prime search (Seventeen or Bust) by a substantial amount, but again it requires a major coding effort.