I noticed some of the CHUD tools were mentioned, but has anyone actually tried them and done some profiling?
Performance, Debugging, Profiling
Here is an example of the kind of info they can provide...
This is a partial analysis of a sample of 100,000,000 instructions from the foldtrajlite process running under OS X. The trace was captured with amber and analyzed with acid.
Code:
Total Instruction Count = 100000000
------------------------------------------------------------------
Instruction Type   |      Count | % of Total
------------------------------------------------------------------
Integer            |   39601893 |      39.60
Floating Point     |    5291767 |       5.29
Altivec            |          0 |       0.00
Branch             |   18743879 |      18.74
Load               |   18273513 |      18.27
Store              |    9192360 |       9.19
Cache Control      |         21 |       0.00
Data Stream        |          0 |       0.00
Miscellaneous      |    8896567 |       8.90
------------------------------------------------------------------
Revealing, eh? The full report gives much more detail.
Looks to me like you do use floating point a bit, but far less than integer math: about 5% of the sampled instructions versus almost 40%.
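If anyone wants to play with the numbers, here is a rough Python sketch that recomputes the percentages from the raw counts in the report and derives two extra figures. The counts are copied from the table; the names `percent`, `int_to_fp` and `memory_share` are just my own illustration and are not something acid prints.

```python
# Instruction counts copied from the acid report in this post.
counts = {
    "Integer": 39601893,
    "Floating Point": 5291767,
    "Altivec": 0,
    "Branch": 18743879,
    "Load": 18273513,
    "Store": 9192360,
    "Cache Control": 21,
    "Data Stream": 0,
    "Miscellaneous": 8896567,
}

# The per-type counts should add up to the sampled instruction count.
total = sum(counts.values())


def percent(kind):
    """Share of the sampled instructions that fall in one category."""
    return 100.0 * counts[kind] / total


# Derived figures (my own additions, not part of the acid output).
int_to_fp = counts["Integer"] / counts["Floating Point"]
memory_share = percent("Load") + percent("Store")

# Print the categories from most to least frequent.
for kind, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kind:<16}{count:>12}{percent(kind):>8.2f}")

print(f"integer : floating point = {int_to_fp:.1f} : 1")
print(f"loads + stores = {memory_share:.2f}% of the sample")
```

Keep in mind this only describes the mix of instructions in one sample. It says nothing about where the time actually goes, so it is a starting point for profiling rather than a conclusion.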
You could also use AltiVec for your pointer tree traversal, if you wanted.
It sure would be nice to see the speed benefits of someone spending some time on this.