To get a rough estimate of SoB performance across various architectures, you could look at the GIMPS (Prime95) benchmarking page.
While absolute performance will not be identical (or perhaps even all that close), RELATIVE performance, at least with regard to architecture and instruction set optimization, should be pretty similar.
http://www.mersenne.org/bench.htm
Code:
Type      Speed  Memory  L2 Cache  L2 Cache  6.52M to 7.76M  7.76M to 9.04M  9.04M to 10.33M
          (MHz)  Speed   Size      Speed     (384K)          (448K)          (512K)
AMD K6-2  400    100     1024      Bus       0.529           0.640           0.708
Celeron   400    66      128       Full      0.235           0.282           0.315
P-II      400    100     512       Half      0.207           0.247           0.276
Formatting sucks, but it is better than nothing.
Smaller iteration times are better (faster).
At the smallest exponent range listed (6.52M to 7.76M, 384K FFT size):
Code:
AMD K6-2 400 0.529
Celeron 400 0.235
P-II 400 0.207
The FFT algorithms and code hit the system bus hard, love low-latency, high-speed memory, and are cache-dependent, but the type of work being done is still very FPU-intensive.
At 20.4M to 25.35M exponents (1280K FFT size):
Code:
AMD K6-2 400 2.424
Celeron 400 0.917
P-II 400 0.769
At this point, the FFT is quite obviously too big to fit entirely into the cache of any mainstream CPU.
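As a rough sanity check on that, here is a small sketch comparing the size of the FFT data with the L2 cache sizes from the table above. It assumes each FFT element is one 8-byte double and that the cache sizes in the table are in KB; neither is stated on the benchmark page excerpt, so treat the numbers as ballpark only. The function name is just for illustration.

```python
# Assumption: each FFT element is one 8-byte double-precision float.
BYTES_PER_ELEMENT = 8

def fft_working_set_kb(fft_size_k):
    """Approximate FFT data size in KB for an FFT length given in K elements."""
    return fft_size_k * BYTES_PER_ELEMENT

# L2 cache sizes from the benchmark table (assumed to be in KB).
l2_cache_kb = {"AMD K6-2": 1024, "Celeron": 128, "P-II": 512}

data_kb = fft_working_set_kb(1280)  # the 1280K FFT size discussed above
for cpu, cache_kb in l2_cache_kb.items():
    print(f"{cpu}: {data_kb} KB of FFT data vs {cache_kb} KB of L2 "
          f"({data_kb / cache_kb:.0f}x the cache)")
```

Under those assumptions the data is about 10 MB, so even the K6-2's 1024 KB of cache holds only a small slice of it at a time.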
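Here we see the Pentium II core CPUs (the Celeron and the P-II) just destroy the K6-2: at the same 400 MHz clock their iteration times are roughly 2.6 and 3.2 times shorter.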
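To put numbers on the relative performance, here is a quick sketch that divides the K6-2 iteration time by each CPU's iteration time, using only the figures quoted above. The helper name `relative_speed` is made up for this example.

```python
# Iteration times copied from the benchmark figures above (smaller is faster).
timings = {
    "384K": {"AMD K6-2 400": 0.529, "Celeron 400": 0.235, "P-II 400": 0.207},
    "1280K": {"AMD K6-2 400": 2.424, "Celeron 400": 0.917, "P-II 400": 0.769},
}

def relative_speed(times, baseline):
    """Return how many times faster each CPU is than the baseline CPU."""
    base = times[baseline]
    return {cpu: base / t for cpu, t in times.items()}

for fft_size, times in timings.items():
    for cpu, ratio in relative_speed(times, "AMD K6-2 400").items():
        print(f"{fft_size} FFT  {cpu:<13} {ratio:.2f}x")
```

The gap between the K6-2 and the Pentium II cores gets wider at the larger FFT size, not narrower.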