This is shamelessly being pulled from a post by BlisteringSheep over on Ars Forum.
I'm afraid the thread will roll off and this good information will eventually be lost or hard to find.
It really depends on the specific chip. All of my P4-Xeons have had higher aggregate rates when using all HT cores. My 3.06 GHz Northwood, running WinXP, is faster using just one core, while the 3.20 GHz Nocona, also running WinXP, is faster using both cores. Here are some rates gathered from currently running machines, not from -bench or -benchmark. Unless otherwise specified, all x86 machines are running 32-bit Linux, all PowerPC machines are running 64-bit Linux, and all cores are active (num_threads=-1):
  • 3.06 GHz Northwood, using both HT cores, WinXP, 11 Mnodes/s/core, 22 Mnodes/s aggregate
  • 3.06 GHz Northwood, using one core, WinXP, 29 Mnodes/s
  • 3.20 GHz Nocona, using both HT cores, WinXP, 17 Mnodes/s/core, 33-34 Mnodes/s aggregate
  • 3.20 GHz Nocona, using one core, WinXP, 28 Mnodes/s
Random other timings:
  • Intel
      • 2.83 GHz E5440, 44 Mnodes/s/core
      • 2.8 GHz HT Xeon, 13-14 Mnodes/s/core
      • 2.8 GHz Xeon (non-HT), 17 Mnodes/s/core
      • 2.5 GHz Northwood (non-HT), 26 Mnodes/s
      • 2.40 GHz HT Xeon, 13 Mnodes/s/core
      • 200 MHz Pentium MMX, 1.3 Mnodes/s
  • AMD
      • Opteron 240, 17-18 Mnodes/s/core
      • 1.0 GHz Athlon, 13 Mnodes/s
  • Misc
      • 400 MHz UltraSPARC-IIi, Solaris 6, 4.7 Mnodes/s
      • Sony Playstation 3, 64-bit Linux, 227-232 Mnodes/s
  • PowerPC
      • 2.5 GHz PPC970MP, 23-24 Mnodes/s/core
      • 2.3 GHz PPC970FX Xserve, 22 Mnodes/s/core
      • 2.2 GHz PPC970FX, 20-21 Mnodes/s/core
      • 2.0 GHz PPC970 PowerMac, 19 Mnodes/s/core
      • 1.9 GHz POWER5, 11-12 Mnodes/s/core
      • 1.65 GHz POWER5, 10-11 Mnodes/s/core
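
The Hyper-Threading comparison at the top of the list comes down to simple arithmetic: multiply the per-core rate by the number of logical cores and compare that aggregate against the single-core rate. Here is a rough Python sketch of that check for the two WinXP machines. The function names and the dictionary layout are made up for illustration, and it assumes two logical cores per HT chip, which matches the per-core and aggregate figures quoted above.

```python
# Rates in Mnodes/s, copied from the two WinXP measurements above.
# Keys are just the clock speeds of the two chips; the layout is illustrative.
HT_RESULTS = {
    "3.06 GHz": {"per_core_ht": 11, "single": 29},
    "3.20 GHz": {"per_core_ht": 17, "single": 28},
}

# Assumption: one HT-enabled physical core shows up as two logical cores.
HT_LOGICAL_CORES = 2


def ht_aggregate(per_core_rate, logical_cores=HT_LOGICAL_CORES):
    """Aggregate rate when every logical core runs its own thread."""
    return per_core_rate * logical_cores


def ht_speedup(per_core_rate, single_rate, logical_cores=HT_LOGICAL_CORES):
    """Ratio of the HT aggregate rate to the single-core rate.

    A value above 1.0 means running on all HT cores is the faster setup.
    """
    return ht_aggregate(per_core_rate, logical_cores) / single_rate


def summarize(results):
    """Return one human-readable line per chip."""
    lines = []
    for chip, rates in results.items():
        aggregate = ht_aggregate(rates["per_core_ht"])
        ratio = ht_speedup(rates["per_core_ht"], rates["single"])
        verdict = "use all HT cores" if ratio > 1.0 else "use one core"
        lines.append(
            f"{chip}: {aggregate} Mnodes/s with HT vs "
            f"{rates['single']} Mnodes/s single, ratio {ratio:.2f}, {verdict}"
        )
    return lines


if __name__ == "__main__":
    for line in summarize(HT_RESULTS):
        print(line)
```

With the numbers above, the 3.06 GHz chip comes out below 1.0 and the 3.20 GHz chip above it, which is the "it depends on the chip" point in a nutshell. To check your own machine, swap in rates measured on a running client rather than benchmark figures.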