running on SMP machines

igor

10-01-2002, 09:55 PM

Hi,

I have a dual-CPU machine running Windows. I'd like to be able to run 2 instances of the client. Right now it's not possible.

Thanks

Igor

jjjjL

10-02-2002, 10:28 AM

Not true. You can't run two copies of the newest client, but the instance check is only a passive one.

You can start a copy of v0.9.7 and then, after that one starts, start a copy of v0.9.2. That version is still available in my engineering space at U of M.

http://www-personal.engin.umich.edu/~lhelm/sbust092.exe

I've been working on SMP. Hopefully next version or v1.

-L

igor

10-02-2002, 06:37 PM

I was unable to get 0.9.2 and 0.9.7 working simultaneously, but it's a minor thing. What I did notice is that there's no scalability on an SMP machine: if I'm running the 0.9.7 client at normal priority and I start another cache-intensive application, the speed drops by 50%. That means there would be no gain whatsoever from running a 2nd SB process on an SMP machine.

The Linux client (0.9.2) exhibits the same behavior. However, I believe FreeBSD scales well; I'll double-check.

Igor

jjjjL

10-02-2002, 08:12 PM

interesting. i think it's still beneficial to run two clients, since the vast majority of the time nothing else is running on most computers... you may have a special circumstance, though.

if you want to do benchmarking across several OSes, I would be most grateful. believe it or not, i have done little study on how many cycles the GUI in the win client consumes. I assume it is less than 10% but i could be wrong. a friend of mine said he believes the linux client is much faster, but i haven't had time to test it myself. if you could get a rough idea for me, that would be awesome.

-L

Alien88

10-03-2002, 12:53 AM

Here are some rough benchmarks:

linux running 2.4.18 or .19
p2 450 w/ 128mb ram
Sep 30 12:47 n=1827357
Oct 02 20:57

p4 1.7ghz w/ 256mb ram
Sep 30 15:14 n=1829697
Oct 01 23:08

p3 933 w/ 1G ram
Sep 30 21:03 n=1833297
Oct 02 10:07

Duron 1.2ghz w/ 384MB ram
Sep 29 00:02 n=1809537
Sep 30 18:52

fbsd 5.0-CURRENT
??? w/ 256mb ram
Sep 30 04:37
Oct 02 23:14

Win2k Athlon 1.2ghz w/ 384MB RAM
Oct 01 18:11 n=1842081
Oct 02 20:02

Win2k Athlon 1ghz w/ 256MB DDR RAM
Oct 01 05:56 n=1837401
Oct 02 07:22

Alien88

10-03-2002, 11:20 PM

the fbsd one is a p2-400

igor

10-04-2002, 12:14 PM

Hi,

I did some SMP benchmarking on Linux and FreeBSD. In both cases the underlying hardware is a dual P3-1GHz with 256k full-speed L2 cache. Speedwise, Linux and FreeBSD went head to head. The runtime for 1 iteration was 12m running just 1 job, and 18m running 2 jobs simultaneously. That means the 2nd CPU only gives you a 33% efficiency increase (2 iterations per 18m versus 1 per 12m, i.e. 24/18 ≈ 1.33x), as opposed to 100% in the case of ideal scalability.
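The 33% figure is a throughput ratio (total iterations per minute with both jobs versus one job alone), not a ratio of per-job times. A minimal Python sketch of that calculation, assuming both jobs run for the whole measurement window; the function name smp_speedup is hypothetical and not part of any client:

```python
def smp_speedup(t_single: float, t_concurrent: float, jobs: int = 2) -> float:
    """Aggregate throughput of `jobs` concurrent copies relative to one copy running alone.

    t_single:     minutes per iteration with a single job on the machine
    t_concurrent: minutes per iteration of each job while `jobs` copies run at once
    """
    single_rate = 1.0 / t_single         # iterations per minute, one job
    combined_rate = jobs / t_concurrent  # iterations per minute, all copies together
    return combined_rate / single_rate


s = smp_speedup(12, 18)  # dual P3-1GHz figures quoted above
print(f"speedup {s:.2f}x, gain from 2nd CPU {(s - 1) * 100:.0f}%")
# prints "speedup 1.33x, gain from 2nd CPU 33%"
```

By the same formula, perfect scaling means each job keeps its single-job time (2.0x), while a per-job time of twice the single-job time means the second CPU adds nothing (1.0x).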

Now, on a machine with a larger L2 cache, the efficiency increase is even smaller: only 10% on a 2.4GHz Xeon with 512k full-speed L2 cache. This tells me that L2 thrashing takes place in the SMP case and there are many more cache misses. I wish you had a client for commercial Unix platforms like Solaris, HP-UX and Tru64, so that I could see whether that's still the case there.

Igor

jjjjL

10-04-2002, 12:25 PM

windows clients experience 100% speedup on my dual celeron box. linux must need explicit cpu affinity calls so it doesn't swap them between processors. silly linux. :rolleyes:
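For reference, the explicit call on Linux is sched_setaffinity(2), which first appeared in the 2.5.8 development kernel, so it isn't available on the stock 2.4.18 kernels discussed in this thread. A minimal sketch using Python's os.sched_setaffinity wrapper, assuming a modern Linux host; pin_to_cpu is a hypothetical helper, not something the client provides:

```python
import os


def pin_to_cpu(cpu: int, pid: int = 0) -> set:
    """Restrict process `pid` (0 means the calling process) to one CPU; return the resulting mask."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity calls are not available on this platform")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    print("CPUs allowed before pinning:", allowed)
    print("CPUs allowed after pinning:", pin_to_cpu(allowed[0]))
```

From a shell, util-linux's taskset does the same for a command you launch, e.g. taskset -c 1 followed by the client command to start a second copy on CPU 1.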

the code does not thrash the cache. thrashing implies it is overwriting its own cache lines. this doesn't happen... the cache design is incredibly intelligent so that it hardly ever touches mem or overwrites its own cache. we'd have to be testing much larger numbers before cache misses become an issue (considering my celeron doesn't experience them yet).
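To make the distinction concrete: in a set-associative cache, thrashing in this sense happens when more hot lines map to one set than the set has ways, so each access evicts a line that is needed again soon. A small Python model with LRU replacement; the cache parameters (256 KB, 8-way, 32-byte lines) are illustrative assumptions, not measurements of any CPU in this thread, and the helper names are hypothetical:

```python
from collections import OrderedDict


def cache_set(addr: int, cache_size: int, line_size: int, ways: int) -> int:
    """Index of the set that byte address `addr` maps to in a set-associative cache."""
    num_sets = cache_size // (line_size * ways)
    return (addr // line_size) % num_sets


def count_misses(addrs, cache_size, line_size, ways, passes=10):
    """Replay `addrs` `passes` times through an LRU set-associative cache; return the miss count."""
    sets = {}  # set index -> OrderedDict of resident line numbers, least recently used first
    misses = 0
    for _ in range(passes):
        for addr in addrs:
            line = addr // line_size
            resident = sets.setdefault(cache_set(addr, cache_size, line_size, ways), OrderedDict())
            if line in resident:
                resident.move_to_end(line)  # hit: mark as most recently used
            else:
                misses += 1
                resident[line] = True
                if len(resident) > ways:
                    resident.popitem(last=False)  # evict the least recently used line
    return misses


# Illustrative parameters (assumed): 256 KB, 8-way, 32-byte lines -> 1024 sets.
SIZE, LINE, WAYS = 256 * 1024, 32, 8
stride = SIZE // WAYS  # addresses 32 KB apart share a set index
addrs = [i * stride for i in range(WAYS + 1)]  # 9 lines, one more than a set holds

print([cache_set(a, SIZE, LINE, WAYS) for a in addrs])  # prints [0, 0, 0, 0, 0, 0, 0, 0, 0]
print(count_misses(addrs, SIZE, LINE, WAYS))  # prints 90: every access misses
print(count_misses(addrs[:WAYS], SIZE, LINE, WAYS))  # prints 8: only the first pass misses
```

Nine 32-byte lines are a tiny working set, yet every access misses because they all compete for the same 8-way set; with eight lines, only the compulsory first-pass misses remain.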

i'll see what i can do. The_Man would like a faster linux client for his cluster of dual linux computers.

-L

igor

10-04-2002, 04:54 PM

I beg to differ. I originally also thought that CPU affinity was not enforced on Linux. However, the top command shows that a process runs on the same CPU it started on for its entire lifetime. And BTW, your 100% speedup on a Celeron confirms my observation: the L2 cache on a Celeron is tiny and sb misses it most of the time anyway (you can do profiling if you'd like). Try it on a dual Xeon, or at least a regular P3/P4, and you'll observe what I'm observing, I'm sure.
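The per-process CPU number that top displays can also be read directly: per proc(5), field 39 of /proc/<pid>/stat ("processor") is the CPU the task last executed on. A minimal Python sketch for checking it yourself, assuming a Linux host; last_cpu and last_cpu_of are hypothetical helper names:

```python
import os


def last_cpu(stat_line: str) -> int:
    """Return field 39 ("processor") of a /proc/<pid>/stat line: the CPU the task last ran on."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so split only what follows the last ')'.
    rest = stat_line.rpartition(")")[2].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return int(rest[39 - 3])


def last_cpu_of(pid: int) -> int:
    """Read /proc/<pid>/stat and return the CPU that process last ran on."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    print("this process last ran on CPU", last_cpu_of(os.getpid()))
```

A single read is only a point sample, so repeated polling (as top does on each refresh) is what shows a process staying put; it can't rule out brief migrations between samples.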

Igor

igor

10-04-2002, 05:02 PM

Maybe I need to take back something I said in my previous post. If you're saying your cache hit percentage is high on your Celeron (I assume you've done profiling), then something different is going on. However, I still stand behind my original statement: the smaller the L2 cache, the closer the SMP speedup gets to a 2x factor. With a different application, which is purely memory/cache intensive, I observed on my dual Xeon that if 1 instance of a job finishes in 1 unit of time, then 2 simultaneous jobs take 5(!) units of time each. Amazing.

Igor

Alien88

10-04-2002, 10:29 PM

Well, I know that on my dual p3-933 linux box running 2.4.18, one instance averages 12 minutes per iteration and the second takes roughly 17 minutes per iteration. But since I'm using both processors and running at idle priority, one of them is obviously going to take longer due to the other stuff on the box (httpd, mysql, etc.).

*shrug*