And I heard there were significant speed improvements, is this true? Any idea when it will be out?
Originally Posted by Joe O
Improving, Yes.
Originally Posted by Matt
It goes to 2^52, page faults less, has yet to go into a loop or leak memory.
Joe O
I figured out how to resolve the thrashing issue, but I don't understand what causes it. When I pull some of the functions out and put them into a separate file, I am then able to compile with -O3 and get a 5% gain over -O2. Is gcc inlining one of those functions? If so, why is it killing performance so badly? Maybe one of you guys knows gcc much better than I do and can answer the question.
Originally Posted by rogue
Mark,
For starters, check the symbol table addresses of the functions in the external file and see whether you are getting page alignments. Also, pulling those functions out lets the functions 'above' and 'below' the removed one(s) land on the same page and/or fit in cache at the same time. Locality of reference with respect to cache is key.
You have control of inlining with GCC, as you know, and it shouldn't inline functions unless your default CFLAGS enable it, or you specify it in code or on the command line. Check the flags that '-O3' turns on for your processor; it may enable automatic inlining, as you suspect. You can override it on the command line, of course.
The x86 version behaves differently, but it is still largely 'cache centric'. Cache line size versus the memory bus interface is the other factor: the number of cycles required to get a cache line into the CPU makes a big difference, and the AMD and Intel parts have major differences right here. I suspect you are seeing something similar.
Email me if you wish and we can discuss off board.
C.
Chuck,
Since Alex (who is aware of my findings) is leading the effort on the PPC port, I'll leave it to him to find a solution. To me it isn't an issue since this workaround works perfectly well.
I am also interested in a version, but for the Intel MacBook Pros.
Thank you