Linux Client - Server Connection Inefficient



jamroga
01-27-2008, 03:47 AM
When the Linux client connects to the server to report back blocks, it stops its calculations on the current k/n pair. With a fast DNS and a fast SOB server response, this generally amounts to only a few seconds. If DNS fails or the SOB server is busy (or your internet line is offline), the time wasted appears to be 40 seconds or more per block. Current blocks on a newer P4 might take only around 80 seconds, so as much as 33% less work could be done when there are DNS, server or offline problems. Although I run my Linux servers 24x7, for power and security reasons my internet connection is offline except when needed (perhaps a few hours a day). This wastes a significant amount of CPU time that could have gone to SOB. With a small change to the Linux client, it should be possible to queue each block when it is done and continue processing, then contact the server and handle any online processing in a separate fork (process, thread, etc.). Years ago, when a block might take an hour on a fast client, the 40-second delay was a minor performance hit, but today it can be significant.
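The 33% figure comes from comparing the stall with the total time per block: an 80-second block followed by a 40-second stall loses 40 / (80 + 40) of the wall-clock time. A minimal sketch of that arithmetic; `lost_fraction` is a hypothetical helper, and the 3-second "fast response" stall is an assumed stand-in for "a few seconds".

```python
def lost_fraction(block_seconds: float, stall_seconds: float) -> float:
    """Share of wall-clock time spent idle when the client stops computing
    while it reports a block: stall / (compute + stall)."""
    if block_seconds < 0 or stall_seconds < 0:
        raise ValueError("times must be non-negative")
    total = block_seconds + stall_seconds
    return 0.0 if total == 0 else stall_seconds / total


# ~80 s block on a newer P4; 40 s stall on DNS/server trouble versus an
# assumed 3 s stall when DNS and the server respond quickly.
for label, stall in [("DNS/server trouble", 40.0), ("fast response (assumed 3 s)", 3.0)]:
    print(f"{label}: {lost_fraction(80.0, stall):.1%} of wall-clock time lost")
```

This prints 33.3% for the slow case and 3.6% for the fast one.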

I think this issue may concern most Linux client users, and I invite them to comment in this thread.

PS - Please note the recent DNS issues, which locked up most Linux clients and cost days of lost work; this might have been avoided if the client-server handshaking had been a separate process.
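The "queue the block and report from a separate thread" idea can be sketched as below. This is a minimal Python illustration, not the SoB client's code: `BlockReporter`, the `send` callable and the retry-on-next-block policy are all assumptions. In Python, DNS failures (`socket.gaierror`) and socket timeouts are subclasses of `OSError`, so one handler covers them.

```python
import queue
import threading


class BlockReporter:
    """Upload finished blocks from a background thread so the compute loop
    never waits on DNS or the server. Hypothetical sketch, not SoB code."""

    def __init__(self, send):
        self._send = send              # assumed callable that uploads one block
        self._pending = queue.Queue()  # finished blocks waiting to be sent
        self._unsent = []              # blocks whose upload failed; retried later
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, block):
        """Called by the compute loop; returns immediately."""
        self._pending.put(block)

    def close(self):
        """Stop the worker and return any blocks that could not be sent."""
        self._pending.put(None)        # sentinel: no more blocks
        self._worker.join()
        return list(self._unsent)

    def _run(self):
        while True:
            block = self._pending.get()
            if block is None:
                return
            # Retry earlier failures first, then the new block, preserving order.
            batch, self._unsent = self._unsent + [block], []
            for i, item in enumerate(batch):
                try:
                    self._send(item)
                except OSError:
                    # Offline or server busy: keep this block and the rest for
                    # the next attempt instead of stalling the computation.
                    self._unsent = batch[i:]
                    break
```

In the compute loop, `reporter.submit(result)` would replace the blocking upload; at shutdown, whatever `close()` returns would need to be saved to the cache so it can be sent on the next run.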

DOSGuy
01-28-2008, 12:48 AM
The Windows client does the same thing. When we were doing first pass, I was finishing a block every 50 seconds, and then I would watch my rate fall for 3 or 4 seconds while it waited for the block to send. Figuring that I was losing about 8% by sending intermediate blocks, I would uncheck "Transmit intermediate blocks" for most of the day, and then check it again before I shut down my system for the day.

You're absolutely right that the recent DNS failure was brutal. Second pass blocks were starting to finish in as little as 70 seconds (most of my work is performed by two Core 2 Duos and a Core 2 Quad), and they would idle for up to 20 seconds while trying to resolve the DNS. I estimated that I took a 25% performance hit during that fiasco. You're absolutely right that the client should move on to the next block while sending, but the simple solution is to just stop sending intermediate blocks.

tqft
01-28-2008, 06:45 AM
My clients would try to connect, fail somewhere deep in name resolution, and crash out. The only way to avoid that was to turn off the connection, but I was trying to download the Gutsy Gibbon install ISO, so I lost a few days' worth of crunching.

Those running on remotely located machines must be cranky.

jamroga
02-18-2008, 03:08 AM
Now that our queuing issues seem to be resolved for months, I would like to reopen the discussion of client/server communication issues.

I'm glad I am not alone with client/server connection issues and their effect on performance. Given the time and effort some of the major SOB teams have put into discussing, testing and practicing hardware, software and operating-system tweaks, it seems that a few simple changes to the client would produce larger gains and would benefit every user contributing to SOB.

A previous suggestion to skip intermediate reporting and report only when whole tests complete works only when you have a machine fast enough to finish the whole test before the (2 week?) dropped-test timeout occurs. Probably only the most modern machines with guaranteed 24x7 operation can do this.

I suggest an alternative, simple client change that would not require server changes and would be backward compatible with current client configuration files.

In Linux, the sclient.conf file contains the configuration variable "Transmit".
This variable has two settings (example from a live sclient.conf):
# 1 - Enable
# 0 - Disable
Transmit 1

Basically, 0 means transmit to the server only at completion of the test; 1 means transmit to the server after each completed block.

My suggestion is that this variable's meaning be expanded so that it can be set to any number N, meaning "transmit to the server after every N blocks".

A setting of "Transmit 10" would mean connect and transmit to server after every 10 blocks completed and of course upon completion of the test.

On a very fast machine, one might want to set "Transmit 100" to contact the server only every 100 blocks. This would let the user cut the communications overhead significantly (by a factor of 100) while still connecting to the server often enough to keep statistics up to date and to ensure the current test is not timed out and dropped.

The change to the client would involve adding some additional logic to count down (or up) to the "Transmit" count before contacting the server. In C/C++ this would literally require changing only a few lines of code (probably only 5 or so).
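The counting logic described above might look like this. It is a minimal Python sketch of the proposed semantics only (the real client is C/C++); `TransmitCounter` and `block_done` are hypothetical names, and treating 0 as "only at test completion" follows the existing meaning of `Transmit 0`, so `Transmit 1` keeps its current per-block behavior.

```python
class TransmitCounter:
    """Decide when to contact the server under the proposed 'Transmit N'
    meaning. Hypothetical sketch of the suggested client logic."""

    def __init__(self, every_n_blocks: int):
        # 0 = transmit only when the test completes; N >= 1 = every N blocks.
        if every_n_blocks < 0:
            raise ValueError("Transmit must be >= 0")
        self.every = every_n_blocks
        self.since_last = 0

    def block_done(self, test_complete: bool = False) -> bool:
        """Record one finished block; return True if the client should transmit now."""
        self.since_last += 1
        due = test_complete or (self.every > 0 and self.since_last >= self.every)
        if due:
            self.since_last = 0  # restart the count after each upload
        return due
```

With `TransmitCounter(10)`, the 10th, 20th, 30th... calls return True, and the final block of a test always triggers an upload regardless of the count.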

I would like to hear feedback from other users on this suggested client improvement.
I would personally be willing to test such an improved client, and I would also be willing to help make these changes, as I am very familiar with C/C++.

I am sure a similar simple change could be made to the Windows client to give it the same capabilities.

If you like (or hate) this idea please comment within this thread. I am sure with enough support this change could be implemented.

IronBits
02-18-2008, 09:35 AM
I LIKE it ! Please make it so. :cheers: :|party|: :clap:

Vato
02-19-2008, 08:41 AM
I was thinking *exactly* the same thing about 30 minutes ago. Although it is not as nice as handling communications with the server in a separate thread, it should be very simple to implement. As a nice side effect, if enough people used it, it would also reduce the load on the server significantly, especially when the server is having difficulties. I'd suggest that any client that implements this should ship with a default setting in sclient.conf of between 10 and 100. We can then reduce or increase this as we wish.

Alien88
02-25-2008, 02:02 AM
Regarding the pausing when the domain name expires: I really don't know what causes it. I've stared at the code numerous times and can't understand where it would hang. Hopefully it'll be a non-issue, since the domain was renewed for a much longer period.

(edit: see later post)

Uncompress it and add 'TransmitBlocks X' to the config, where X is the number of blocks after which you want it to transmit. E.g., 'TransmitBlocks 10' will transmit an update after every 10 blocks.
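How the beta client reads its configuration isn't shown here. This Python sketch only assumes the 'Key value' / '#'-comment layout of the sclient.conf excerpt earlier in the thread; `parse_conf`, `transmit_blocks` and the fallback of 1 (every block) are hypothetical, and values containing '#' are not supported.

```python
def parse_conf(text: str) -> dict:
    """Parse sclient.conf-style text: one 'Key value' pair per line,
    '#' starts a comment, blank lines are ignored (assumed format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def transmit_blocks(settings: dict, default: int = 1) -> int:
    """Blocks per upload from a 'TransmitBlocks X' entry. Falls back to
    `default` (assumed to mean 'every block') if missing, non-numeric or < 1."""
    try:
        value = int(settings.get("TransmitBlocks", default))
    except ValueError:
        return default
    return value if value >= 1 else default


example = """# 1 - Enable
# 0 - Disable
Transmit 1
TransmitBlocks 10
"""
print(transmit_blocks(parse_conf(example)))  # 10
```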

Caveats: YMMV. This is beta. This also fixes another bug; the fix may cause other problems, but probably not. I won't be doing this for the Windows client. I may build one for FreeBSD.

Vato
02-25-2008, 08:32 AM
Excellent - Thanks for turning this round so quickly.

Looks good so far - running with 100 for the moment.

jamroga
02-25-2008, 09:44 AM
I am trying the new sb2.5.05 on various Linux systems.
I have access to a mix of Fedora Core 5 (FC5), FC8 and CentOS 5.x Linux systems.
Some systems have a GUI; some are text only.

On some systems sob does not start, and reports:

error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory

On the systems where it does start, it seems to run fine.

Any chance of a static compile that would run
on the systems missing libstdc++.so.5?

Here is what was previously installed on all systems:
4048581 Aug 28 2006 sb-v2.5-gcc4
(Warning: the date on the file might be arbitrary)

I do not have the ability to update libraries on these
machines, as they are production servers
locked down with a standard build formula.

Thank you for the quick response with this
newer client software.

Alien88
02-25-2008, 12:19 PM
Ah, crap. Yeah.. I'll get a new build out later tonight.

Alien88
02-25-2008, 05:28 PM
Can you give me a sha1sum for sb-v2.5-gcc4? I'm trying to figure out what the difference is. The version I posted wasn't compiled any differently from the last time I did v2.5.

jamroga
02-25-2008, 05:49 PM
md5sum sb-v2.5-gcc4
7869f4e903c7934196122f48b91072e7 sb-v2.5-gcc4

sha1sum sb-v2.5-gcc4
cf837119c55c39177c558263b6829a36c8bbdaa1 sb-v2.5-gcc4

4048581 Aug 28 2006 sb-v2.5-gcc4

After some digging, I now think the file date
stamp is in fact correct.

Regards,

Alien88
02-25-2008, 11:35 PM
I'm assuming I did a custom build for you (or someone else) at some point, then. Hm.

jamroga
02-26-2008, 01:11 AM
I did some tracking down; this version of sb 2.5 was produced as
a result of this discussion thread:

http://www.free-dc.org/forum/showthread.php?t=11604&highlight=linux

On the positive side, every Linux system this binary was running on has
since been upgraded to a newer O/S environment, and the binary
and all its data carried over without any changes whatsoever (most systems
also had their hardware upgraded over the same period).

I searched all the systems for the missing library, libstdc++.so.5,
and it only exists on the oldest Fedora Core 5 (FC5) system, one that
is scheduled to be upgraded in the near future.

libstdc++.so.6 or higher seems to be the oldest standard C++ library on the
other systems.

Here is the ldd output; it looks like the binary was compiled/linked against the later shared library.

ldd -v sb-v2.5-gcc4
linux-gate.so.1 => (0x00fa4000)
libm.so.6 => /lib/libm.so.6 (0x00110000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00266000)
libc.so.6 => /lib/libc.so.6 (0x00883000)
/lib/ld-linux.so.2 (0x00866000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00c5c000)

Version information:
./sb-v2.5-gcc4:
libstdc++.so.6 (CXXABI_1.3) => /usr/lib/libstdc++.so.6
libc.so.6 (GLIBC_2.3) => /lib/libc.so.6
libc.so.6 (GLIBC_2.1) => /lib/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/libc.so.6
libm.so.6 (GLIBC_2.0) => /lib/libm.so.6
/lib/libm.so.6:
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
libc.so.6 (GLIBC_2.1.3) => /lib/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/libc.so.6
/usr/lib/libstdc++.so.6:
ld-linux.so.2 (GLIBC_2.3) => /lib/ld-linux.so.2
libc.so.6 (GLIBC_2.4) => /lib/libc.so.6
libc.so.6 (GLIBC_2.1.3) => /lib/libc.so.6
libc.so.6 (GLIBC_2.2) => /lib/libc.so.6
libc.so.6 (GLIBC_2.3) => /lib/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/libc.so.6
libc.so.6 (GLIBC_2.1) => /lib/libc.so.6
libgcc_s.so.1 (GCC_3.3) => /lib/libgcc_s.so.1
libgcc_s.so.1 (GCC_4.2.0) => /lib/libgcc_s.so.1
libgcc_s.so.1 (GCC_3.0) => /lib/libgcc_s.so.1
libgcc_s.so.1 (GLIBC_2.0) => /lib/libgcc_s.so.1
/lib/libc.so.6:
ld-linux.so.2 (GLIBC_2.1) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.3) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
/lib/libgcc_s.so.1:
libc.so.6 (GLIBC_2.2.4) => /lib/libc.so.6
libc.so.6 (GLIBC_2.1.3) => /lib/libc.so.6
libc.so.6 (GLIBC_2.4) => /lib/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/libc.so.6

Hope this helps

Alien88
02-26-2008, 11:14 AM
(edit: see later post)

Give that a try. If others could try it and see whether it works on their older machines, I may just default to building it that way from now on.

KriZp
02-26-2008, 01:11 PM
[root@localhost SB]# ./sb
[Tue Feb 26 18:01:50 2008] client process [v2.5.05] invoked
[Tue Feb 26 18:01:50 2008] priority set to idle
[Tue Feb 26 18:01:50 2008] got k and n from cache
[Tue Feb 26 18:01:50 2008] Dual Core AMD Opteron(tm) Processor 165 detected. Enabling cpu specific optimizations.
[Tue Feb 26 18:01:51 2008] restarting proth test from cache (k=55459, n=7209418) [0.5%]
Segmentation fault

[root@Athlon64 SB]# ./sb /sieve/SB/sclient.conf
[Tue Feb 26 19:00:39 2008] client process [v2.5.05] invoked
[Tue Feb 26 19:00:39 2008] priority set to idle
[Tue Feb 26 19:00:39 2008] got k and n from cache
[Tue Feb 26 19:00:39 2008] AMD Athlon(tm) 64 Processor 3000+ detected. Enabling cpu specific optimizations.
[Tue Feb 26 19:05:56 2008] restarting proth test from cache (k=55459, n=7209418) [0.5%]
Segmentation fault

[root@p3 SB]# ./sb sclient.conf
[Tue Feb 26 19:12:08 2008] client process [v2.5.05] invoked
[Tue Feb 26 19:12:08 2008] priority set to idle
[Tue Feb 26 19:12:08 2008] got k and n from cache
[Tue Feb 26 19:12:08 2008] Intel(R) Pentium(R) III processor detected. Enabling cpu specific optimizations.
[Tue Feb 26 19:12:14 2008] restarting proth test from cache (k=55459, n=7209418) [0.5%]
[Tue Feb 26 19:26:43 2008] iteration: 40000/7209434 (0.55%) k = 55459 n = 7209418

[root@bp6 SB]# ./sb sclient.conf
[Tue Feb 26 20:34:07 2008] client process [v2.5.05] invoked
[Tue Feb 26 20:34:07 2008] priority set to idle
[Tue Feb 26 20:34:07 2008] got k and n from cache
[Tue Feb 26 20:34:07 2008] Intel Celeron processor detected. Enabling cpu specific optimizations.
[Tue Feb 26 20:34:14 2008] restarting proth test from cache (k=55459, n=7209418) [0.5%]
[Tue Feb 26 20:53:07 2008] iteration: 40000/7209434 (0.55%) k = 55459 n = 7209418

jamroga
02-26-2008, 01:27 PM
Runs fine so far on P3s, but on P4s
I can confirm the segmentation fault:
--------------- cut ----------
[Tue Feb 26 13:05:47 2008] client process [v2.5.05] invoked
[Tue Feb 26 13:05:47 2008] priority set to idle
[Tue Feb 26 13:05:47 2008] got k and n from cache
[Tue Feb 26 13:05:47 2008] Intel(R) Pentium(R) 4 CPU 1400MHz detected. Enabling cpu specific optimizations.
[Tue Feb 26 13:05:54 2008] restarting proth test from cache (k=24737, n=14555791) [89.5%]
Segmentation fault
------------- cut --------------

Here are my testing results (operating system, CPU -> result):
FC5 P3 -> fine
FC8 P3 -> fine
FC5 P4 -> segmentation fault
CentOS 5.x P4 -> segmentation fault

Both P4s are single core
(one with Hyper-Threading enabled).

Alien88
02-26-2008, 01:54 PM
How quickly does it segfault?

KriZp
02-26-2008, 03:52 PM
Immediately, as far as I can tell. I should add that all these computers run Fedora 7; the Athlon and Opteron run the 64-bit version.

On the Opteron I tried 2.5.0 as well, and it seemed to work fine.

Alien88
02-26-2008, 04:41 PM
Hrmph. I'll keep digging.. I finally found a P4 to reproduce it on.

Theadalus
02-26-2008, 06:23 PM
Why not create a built-in cron?

Transmitting once a day is difficult to translate into the Transmit = x blocks method.
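One way to get a once-a-day upload without an external cron is to transmit when either a block count or a time limit is reached, whichever comes first. A minimal Python sketch under that assumption; `TransmitSchedule`, `max_interval` and the injectable `clock` are hypothetical and do not describe how the beta client schedules uploads.

```python
import time


class TransmitSchedule:
    """Transmit after every N blocks, or once `max_interval` seconds have
    passed since the last upload, whichever comes first. Hypothetical sketch."""

    def __init__(self, every_n_blocks, max_interval, clock=time.monotonic):
        self.every = every_n_blocks    # 0 disables the block-count trigger
        self.max_interval = max_interval
        self.clock = clock             # injectable for testing
        self.blocks = 0
        self.last_sent = clock()

    def block_done(self):
        """Record one finished block; return True if it is time to transmit."""
        self.blocks += 1
        now = self.clock()
        due = (self.every > 0 and self.blocks >= self.every) or (
            now - self.last_sent >= self.max_interval
        )
        if due:
            self.blocks = 0
            self.last_sent = now
        return due
```

For example, `TransmitSchedule(1000, 86400)` uploads after 1000 blocks or after one day, so a slow machine still reports daily and a fast one is not limited to a single upload per day.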


[slightly offtopic]
I also noticed that a log entry is written every 10000 iterations. Can this be reduced to only client/test start and finish entries (like the Windows client), or can the number of iterations between log entries be made configurable?
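A configurable logging interval could be as simple as the check below. This is a sketch assuming a hypothetical `log_every` setting, where 0 keeps only the start and finish entries; the client's actual logging code is not shown in this thread.

```python
def should_log(iteration: int, total: int, log_every: int) -> bool:
    """Log at the start and end of a test, plus every `log_every`
    iterations in between; log_every = 0 keeps only start/finish."""
    if iteration == 0 or iteration >= total:
        return True
    return log_every > 0 and iteration % log_every == 0
```

With `log_every = 10000` this reproduces the current behavior of one entry per 10000 iterations.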

Alien88
02-27-2008, 01:15 AM
I've redirected this over to the new 'Beta Client' subforum. Please follow up and check there for updates:

http://www.free-dc.org/forum/forumdisplay.php?f=126

DOSGuy
02-28-2008, 04:48 AM
I love the idea of Transmit x blocks. Transmitting every 100 blocks would eliminate 99 out of every 100 delays but still keep me up to date. This would be a great way to increase performance (I had estimated 8%). I just want to point out that, with my suggestion, it was never necessary to complete the entire test before it expired. I did mention that I used to turn "Transmit intermediate blocks" back on at least once a day for stats purposes, so there's no risk of having the test expire unless you go weeks without turning it back on. Still, it was a minor nuisance to switch it manually, so it would be great to have an automatic way of reducing the number of times the client stops to transmit results.

I'm running Windows, so I don't get to play with this new toy. My Core 2 Duos, my Quad and I await the day when a Windows SOB client with this feature is available. :)

Alien88
02-28-2008, 09:33 AM
I'll see what I can do regarding Windows. I've built a new client (as can be seen in the forum), so now it's a matter of writing the code for it. It may be a registry-only option to start with.