Combustion
08-22-2005, 08:55 AM
Originally posted by graeme
Hey, thanks for the offer -- that's very generous. What we really need, though, is someone who knows something about client/server code. I don't understand why our server runs into trouble when it does. When it starts giving communication errors, it is well within its physical memory (4 GB), using about 1/100th of the theoretical bandwidth limit (10 Mbit/s of a 1000 Mbit/s gigabit line), and almost none of the CPU (dual Opteron 246). I think there is something in our server code which is hitting a limit -- perhaps something to do with timing, threads, or the TCP/IP stack. I'm pretty convinced that with better software, our hardware would be fine. We are truly a bunch of network neophytes.
Firstly: 100 Mbit Ethernet will always deliver a stable 8.7 MB/sec. However, if you read the Gigabit Ethernet spec, you will a) wonder how it works at all and b) observe that you don't ever get much more than about 30 MB/sec. I don't use Gigabit yet for those reasons.
Given you are running at an observed 10 Mbit/sec, I would drop the config of the Opteron's Ethernet card to 100 Mbit. I find that two bridged 100s provide more stable delivery/transport than Gigabit. (I work with this all the time at work (L-3)).
I personally am also using Opterons... 250s to be precise. I have both ASUS and Tyan motherboards. The Tyan has 'auto-bridging' built into the chipset (7751 Ethernet chipset), and by putting eth0 @ 192.168.x.1 and eth2 @ 192.168.x.2, they bridge beautifully. The inbound traffic gets a full, low-latency response. I am also using Fedora FC4 and Windows XP SP2 for this. It's all done in hardware and controlled by the BIOS, with the exception of IP address assignment.
Regarding your possible thread problems, I would suggest you simply ensure your threads are properly spawned and 'thread safe', with appropriate semaphores as needed. I would then have a single receiver thread read in a packet, drop it in a 'to-be-processed slot' for a worker thread, and then release the semaphore for that thread. The worker thread will then do its thing, and put itself back to sleep as it comes around and waits for the next packet.
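Here is a rough Python sketch of that receiver/worker arrangement, just to make the idea concrete. The names `Worker`, `submit`, and `receive_loop`, the round-robin hand-off, and the 4096-byte read size are all my own assumptions for illustration, not anything from your server code.

```python
import socket
import threading
from collections import deque


class Worker(threading.Thread):
    """Worker that sleeps on a semaphore until the receiver hands it a packet."""

    def __init__(self, handler):
        super().__init__(daemon=True)
        self._slot = deque()                   # the 'to-be-processed slot'
        self._ready = threading.Semaphore(0)   # starts at 0, so the worker sleeps
        self._handler = handler                # hypothetical per-packet callback

    def submit(self, packet):
        # Called by the receiver thread: fill the slot, then wake the worker.
        self._slot.append(packet)
        self._ready.release()

    def run(self):
        while True:
            self._ready.acquire()              # sleep until a packet is waiting
            packet = self._slot.popleft()
            if packet is None:                 # None is used here as a stop signal
                return
            self._handler(packet)


def receive_loop(sock: socket.socket, workers):
    """The single receiver: the only thread that ever reads from the socket."""
    index = 0
    while True:
        packet = sock.recv(4096)
        if not packet:                         # empty read: the peer closed
            break
        workers[index % len(workers)].submit(packet)   # round-robin hand-off
        index += 1
    for worker in workers:
        worker.submit(None)                    # tell every worker to finish
```

Only `receive_loop` touches the socket, so the workers never compete for reads; each one blocks on its own semaphore until there is something in its slot.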
If you have multiple threads attempting to read against the single socket, you may run into trouble UNLESS you have each one sitting on the 'accept()' where the connection starts, and then handle all outbound socket comm back to the client on the local thread-stack. Make sure to close the socket, and make sure that you aren't getting stuck in a 'FIN-WAIT' state after the socket close call when processing is complete. I have a hunch you are getting stuck here due to corrupted gigabit data or dropped connections, for which you might not be handling the timeout properly. My rule of thumb is 5 retries at 10-second intervals... UNLESS IT IS OBVIOUSLY A DIALUP DATAFLOW RATE...
Then it's adjusted to 5 retries and no more than 2 overlapping streams. If you deduce it's a dialup user, immediately drop to a single data stream. General initial setup would be to start as a single stream (1 at a time) from server to client and go from there.
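To show what I mean by the retry rule of thumb, here is a small Python sketch. The helper name `with_retries` and its parameters are made up for illustration, and I am reading "5 retries" as one initial attempt plus up to five more, which is an assumption on my part.

```python
import time


def with_retries(operation, retries=5, interval=10.0, sleep=time.sleep):
    """Call operation(); on a socket-level error, wait and try again.

    Assumption: '5 retries at 10-second intervals' means one initial attempt
    plus up to 5 further attempts, with a 10-second pause before each retry.
    """
    last_error = None
    for attempt in range(1 + retries):
        try:
            return operation()
        except OSError as exc:      # covers timeouts and reset/dropped connections
            last_error = exc
            if attempt < retries:   # no point sleeping after the final attempt
                sleep(interval)
    raise last_error
```

You would wrap a single send or request in it, something like `with_retries(lambda: send_chunk(conn, chunk))`, where `send_chunk` stands in for whatever your own code does. Once the retries are used up, the last error is raised so the caller can close the connection cleanly instead of hanging.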
Users on broadband/DSL will be able to obtain more connections (typically 5) using this method, and it works until you sort out your streaming problem.
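Going back to the accept()-per-thread arrangement, a minimal Python sketch could look like the following. It is an echo server purely for illustration; `serve_connection`, `accept_loop`, `start_server`, the 10-second timeout, and the thread count are all assumptions of mine, and your real per-connection work would replace the echo.

```python
import socket
import threading


def serve_connection(conn, timeout=10.0):
    """Handle one client entirely on the thread that accepted it."""
    conn.settimeout(timeout)            # a dead client cannot block us forever
    try:
        while True:
            data = conn.recv(4096)
            if not data:                # client closed its side
                break
            conn.sendall(data)          # placeholder work: echo the data back
    except OSError:
        pass                            # timeout or dropped connection: give up
    finally:
        try:
            conn.shutdown(socket.SHUT_RDWR)   # announce that we are done
        except OSError:
            pass                        # peer may already be gone
        conn.close()                    # always release the socket


def accept_loop(listener):
    """Each thread sits on accept() and owns whatever connection it gets."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:                 # listener was closed
            return
        serve_connection(conn)


def start_server(host="127.0.0.1", port=0, n_threads=4):
    listener = socket.create_server((host, port))   # port 0: OS picks a free port
    threads = [
        threading.Thread(target=accept_loop, args=(listener,), daemon=True)
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    return listener, threads
```

The part that matters for the stuck-connection theory is the `finally` block: every connection is shut down and closed whether it ended normally, timed out, or was dropped. Whether that actually cures a FIN-WAIT pile-up on your box I can't promise, since that also depends on the client and the kernel settings.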
I am available to help with kernel, mobo, and application tuning (server or client) as needed.
CC