View Full Version : eOn server/client



Combustion
08-22-2005, 08:55 AM
Originally posted by graeme
Hey, thanks for the offer -- that's very generous. What we really need, though, is someone who knows something about client/server code. I don't understand why our server runs into trouble when it does. When it starts giving communication errors, it is well within its physical memory (4GB), using about 1/100th of the theoretical bandwidth limit (10 mbits/s of a 1000 mbits/s gigabit line), and almost none of the cpu (dual opteron 246). I think there is something in our server code which is hitting a limit -- perhaps something to do with timing, threads, or the tcp/ip stack. I'm pretty convinced that with better software, our hardware would be fine. We are truly a bunch of network neophytes.

Firstly: 100mb ethernet will always deliver a stable 8.7mb/sec. However, if you read the Gigabit Ethernet spec, you will a) wonder how it works at all and b) observe that you don't ever get much more than about 30mb/sec. I don't use Gigabit yet for those reasons.

Given you are running at an observed 10mb/sec, I would drop the config of the opteron's ethernet card to 100mb. I find that two bridged 100's provide more stable delivery/transport than Gigabit. (I work with this all the time at work (L-3)).

I personally am also using Opterons... 250's to be precise. I have both ASUS and Tyan motherboards. The Tyan has 'auto-bridging' built into the chipset (7751 ethernet chipset) and by putting eth0 @ 192.168.x.1 and eth2 @ 192.168.x.2, they bridge beautifully. The inbound traffic gets full, low latency response. I am also using Fedora FC4 and Windows XP SP2 for this. It's all done in hardware and controlled by the BIOS, with the exception of IP addy assignment.

In response to your possible threads, I would suggest you simply ensure your threads are properly spawned and 'thread safe' with appropriate semaphores as needed. I would then have a single receiver thread read in a packet, drop it in a 'to-be-processed slot', and then release the semaphore for that thread. The thread will then do its thing, and put itself back to sleep as it comes around and waits for the next read.
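A rough sketch of that hand-off, written in Python rather than the C calls the server actually uses. The `Worker` class, `receive_all`, and the `handler` callback are made-up names for illustration, and `None` as a shutdown signal is my own assumption:

```python
import threading
from collections import deque


class Worker(threading.Thread):
    """One worker with its own 'to-be-processed' slot and semaphore."""

    def __init__(self, handler):
        super().__init__(daemon=True)
        self.handler = handler               # hypothetical per-packet callback
        self.slot = deque()                  # the to-be-processed slot
        self.ready = threading.Semaphore(0)  # starts at 0, so the worker sleeps
        self.results = []

    def submit(self, packet):
        # Called by the receiver thread: fill the slot, then wake the worker.
        self.slot.append(packet)
        self.ready.release()

    def run(self):
        while True:
            self.ready.acquire()             # sleep until the receiver releases us
            packet = self.slot.popleft()
            if packet is None:               # None is the (assumed) shutdown signal
                break
            self.results.append(self.handler(packet))


def receive_all(packets, workers):
    """Single receiver: take each packet and hand it to the workers in turn."""
    for i, packet in enumerate(packets):
        workers[i % len(workers)].submit(packet)
    for w in workers:
        w.submit(None)                       # tell each worker to finish
    for w in workers:
        w.join()
```

Only the receiver touches the socket in this layout; the workers only ever see what lands in their own slot, so there is no contention on the read side.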
If you have multiple threads attempting to read against the single socket, you may run into trouble UNLESS you have each one sitting on the 'accept()' where the connection starts and then handle all outbound socket comm back to the client on the local thread-stack. Make sure to close the socket, and make sure that you aren't getting stuck in a 'fin-wait' state after the socket close call when processing is complete. I have a hunch you are getting stuck here due to corrupted gigabit data or dropped connections, for which you might not be handling the timeout properly. My rule of thumb is 5 retries at 10 second intervals... UNLESS IT IS OBVIOUSLY A DIALUP DATAFLOW RATE...
Then it's adjusted to 5 retries and no more than 2 overlapping streams. If you deduce it's a dialup user, immediately drop to a single data stream. General initial setup would be to start as a single stream (1 at a time) from server to client and go from there.
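The retry rule of thumb could look something like the sketch below. It is Python, not the server's own code; `send_with_retries` and the `send` callable are hypothetical, and I am reading "5 retries" as five attempts in total, which is an assumption:

```python
import socket
import time


def send_with_retries(send, payload, retries=5, interval=10.0, sleep=time.sleep):
    """Call send(payload) until it succeeds or the attempts run out.

    Returns the number of attempts used; re-raises the last error on failure.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            send(payload)
            return attempt
        except (socket.timeout, ConnectionError) as exc:
            last_error = exc
            if attempt < retries:
                sleep(interval)      # wait before the next attempt
    raise last_error
```

The point is that a timeout or a dropped connection is caught and bounded, so a thread cannot wait forever on a client that has gone away.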

Users on broadband / DSL will be able to obtain more connections (typically 5) using this method, but it works until you settle out your streaming problem.

I am available to help with kernel, mobo, and appl tuning (server or client) as needed.

CC

PCZ
08-22-2005, 10:04 AM
Combustion wrote:

"Firstly: 100mb ethernet will always deliver a stable 8.7mb/sec. However, if you read the Gigabit Ethernet spec, you will a) wonder how it works at all and b) observe that you don't ever get much more than about 30mb/sec. I don't use Gigabit yet for those reasons."


You are making a bold statement about 1Gb ethernet, aren't you?
I use 1Gb ethernet and it delivers a lot more than you are giving it credit for.

We as a company stream media, real and WMT.
This is extremely bandwidth intensive, it's not uncommon for us to be delivering aggregate bandwidths well in excess of 10Gb.

In the past this meant having lots of streamers with 100Mb nics.
100s of them.

The modern servers with 1Gb nics are a real blessing for us as you need a lot fewer of them.

So how much do they deliver?
We usually allow them to run up to 600Mb before increasing the number of servers in a pool.
If you get a sudden demand, a news story just hits for example, the servers often go up to an average of 800Mb each.

It is only when pushing above 80% of the nic's capacity that we get concerned.
Connected to modern switches 1Gb nics work well.

If it has been your observation that 1Gb nics only deliver 30MB (I presume you mean bytes, not bits) then your network infrastructure really needs looking at.
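To keep bits and bytes straight, the figures quoted in this thread can be put on the same scale. This is only arithmetic in Python; the function names are made up, and it assumes 8 bits per byte with no allowance for protocol overhead:

```python
def mbytes_to_mbits(mbytes_per_s):
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_s * 8


def utilisation(rate_mbits, link_mbits):
    """Fraction of the link's nominal capacity that a given rate uses."""
    return rate_mbits / link_mbits


print(mbytes_to_mbits(30))      # 240: 30 MB/s expressed in Mb/s
print(utilisation(240, 1000))   # 0.24 of a 1Gb link
print(utilisation(800, 1000))   # 0.8, the level where we get concerned
print(utilisation(10, 1000))    # 0.01, the eOn server's observed rate
```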

graeme
08-22-2005, 08:36 PM
I think this issue of handling the network connections with threads is probably the source of the problem.

Although our server code seems to max out at 10 Mb/s, I can transfer files to and from the machine, while the server is running, and get transfer rates of about 300-400 Mb/s.

Using both ethernet ports is a neat idea. I didn't realize the motherboard could handle this with hardware.

If you are willing to put any more time into this, it would be extremely helpful if you could take a look at the client/server code. It sounds like you might be able to point to a problem area, or suggest a change that I could try. The parts of the code dealing with the sockets and threads are relatively small, so this might not take too much time.

The communication code, fida, is available through anonymous cvs from theory.cm.utexas.edu. There is also a tar file at http://theory.cm.utexas.edu/fida/ . The thread loop is in the main() routine at the end of the fida/source/server.c file. Basically the code listens for a connection and passes it to an unused thread when a connection comes in. I think this is the structure you are suggesting.
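For anyone who does not want to dig through server.c, the general shape of that loop is roughly the following. This is a Python sketch using the standard library, not the actual fida code or the cosm calls; `handle`, `serve`, the echo behaviour, and the 10 second timeout are all illustrative assumptions:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Hypothetical per-connection work: echo one message, then close."""
    with conn:                        # closes the socket when the block exits
        conn.settimeout(10.0)         # do not let a dead client hold the thread
        data = conn.recv(4096)
        if data:
            conn.sendall(data)


def serve(listener, max_connections, workers=4):
    """Accept connections on one thread and pass each to a free pool thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_connections):
            conn, _addr = listener.accept()
            pool.submit(handle, conn)
```

Here `listener` is a socket that is already bound and listening, and `max_connections` only exists so the sketch terminates; a real server would loop until told to stop.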

I do see some connections in FIN_WAIT and TIME_WAIT, but more in SYN_RECV and LAST_ACK states, using netstat. But I couldn't say if this is normal tcp/ip behavior or not.
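To put numbers on that rather than eyeballing the list, something like this could tally the states. It is a Python sketch that assumes the six-column layout printed by `netstat -tan` on Linux, with the state in the last column; `count_tcp_states` is a made-up name:

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state from `netstat -tan` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) == 6 and fields[0] in ("tcp", "tcp6"):
            counts[fields[5]] += 1
    return counts
```

Fed the text output of `netstat -tan`, it returns a Counter keyed by state name, which makes it easier to watch whether SYN_RECV or LAST_ACK keeps growing as the communication errors start.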

We are not using sockets directly. Instead we are using the cosm package (http://www.mithral.com/projects/cosm/). All network and thread calls are contained in this package. These calls are easy to identify because they are all prefixed with v3, such as v3NetAccept and v3ThreadBegin. The main point of this package is to provide cross platform client/server functionality.

Anyways, I understand how tedious it is to debug someone else's code, so I don't expect anyone to dive in here. But any suggestions would be greatly appreciated.