Originally posted by Brian the Fist
Ok, I'll just pretend I didn't see that last statement. Now please explain how making an 'MPI client' will make it easier to run on a 256-node cluster. You mean so there could be one EXE which then forked 256 times and did 256 separate jobs? I'm not clear why MPI would be needed to do this or how it would help. If you don't want to explain it, then don't expect to get it.
Not for 256 separate jobs, but to speed up the current job. Send a separate process (an MPI rank) to each processor and have each one work on part of the problem, thereby speeding up the entire run. It may not give a 256x boost in performance, but with a little rewriting, especially if your program already splits its work into threads, it would speed things up.

If you have N processors (think node-wise), give each processor a piece of the current problem, and in the ideal case performance should increase by up to a factor of N.
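To make the "give each processor a piece" idea concrete, here is a minimal sketch of a block decomposition followed by a reduce step. It is only an illustration under stated assumptions: the ranks are simulated by a plain loop in one process (a real MPI code would run each rank as its own process and combine results with something like a reduce call), and `block_range`, `work`, and the sum-of-squares workload are hypothetical stand-ins for the real job.

```python
def block_range(rank, nprocs, n):
    """Return the [start, stop) range of n items owned by `rank` out of `nprocs`.

    The first n % nprocs ranks get one extra item, so chunk sizes differ by at most 1.
    """
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def work(chunk):
    # Hypothetical per-rank workload: sum of squares stands in for the real computation.
    return sum(x * x for x in chunk)


def simulated_parallel_run(data, nprocs):
    # Each loop iteration plays the role of one rank working only on its own slice.
    partials = []
    for rank in range(nprocs):
        start, stop = block_range(rank, nprocs, len(data))
        partials.append(work(data[start:stop]))
    # "Reduce" step: combine the per-rank partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1000))
    print(simulated_parallel_run(data, 4))  # same answer as work(data), computed in 4 pieces
```

Because each rank only touches its own contiguous slice, the pieces can run independently; the only communication needed is the final combine.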

If your problem is too small, it probably won't help, because the communication overhead eats the gain. Lots of factors matter here (interconnect, memory, code, load), but generally there is a decent boost. I see it all the time.
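One standard way to see why the boost is "decent" rather than a full factor of N is Amdahl's law: if only a fraction p of the runtime can be parallelized, the speedup on N processors is at most 1 / ((1 - p) + p / N). The sketch below just evaluates that formula; it ignores interconnect and load-balancing costs, so real-world numbers will be lower still.

```python
def amdahl_speedup(parallel_fraction, nprocs):
    """Upper bound on speedup when only `parallel_fraction` of the runtime parallelizes.

    Communication and load-imbalance overheads are not modeled.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nprocs)


if __name__ == "__main__":
    for p in (0.5, 0.95, 1.0):
        print(p, round(amdahl_speedup(p, 256), 1))
```

Even a code that is 95% parallel tops out below 20x on 256 processors, which is why a "too linear" algorithm may gain little from a big cluster.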

Who knows, it may take a heroic effort to adapt your code to MPI, but 99% of the researchers I know are willing to go through the frustration of MPI and parallelizing their code in an effort to get the best bang for their budget.

Without actually digging through your algorithm, I am not sure it would help (perhaps it's too linear). But my thought is this: I have 45 nodes that are going to essentially sit idle until school starts, and they could provide you with a decent amount of processing power.

On the Solaris subject, if you have access to the latest Sun ONE tools (Forte 7), you will squeeze out better performance with a recompile, especially for UltraSPARC IIIs. Also, do you link against Sun's math libraries?

Forte Performance Docs - http://docs.sun.com/db/doc/816-2461
http://docs.sun.com/db/doc/816-2463

Sun's BLAS stuff is very, very nice.

- derek