Slow Server

**Mikus** · 06-21-2003, 07:48 AM

This morning a multi-generation upload is going quite slowly. I estimate that the elapsed time to upload will exceed 2% of the elapsed time it took to generate those filesets.

A disproportionately large amount of time to do an upload got to be a problem with Phase I, until the change was made to smaller filesets. Today's problem reminds me of those earlier slow times, but seems to involve server turnaround time -- my modem lights blink, then stay dark for more than 15 seconds, then blink again.

mikus

**Terminator** · 06-21-2003, 11:43 AM

Same here, it's taking anywhere from 1 to 3 minutes to upload each generation.

Update: the latest attempt has timed out completely.

**Chaser** · 06-21-2003, 12:26 PM

It's uploading... not very fast, but it is.

**TyOI** · 06-21-2003, 06:24 PM

At this rate, an upload appointment schedule should be considered.

Book me for a 96-hour connection, as at the current upload rate that is approximately what will be required to upload our current workload.

We will return the cluster to the project once the problem has been resolved.

Admin
TyOI

**Hua Luo Han** · 06-21-2003, 07:58 PM

And I have over a hundred buffered generations to send. Why so slow?

:sleepy:

I left D2OL because of their slow server; I hope DF can resolve this problem.

**Hua Luo Han** · 06-22-2003, 02:11 AM

Can someone address this issue, please!

**Scotttheking** · 06-22-2003, 03:10 AM

I've got machines sitting there idle, trying to upload.
This is getting absurd. At any one time I can check my machines and find at least one using no processor.
My error logs are filled with "cannot connect to server" errors.

Edit: Collectively I have about 75 generations waiting to upload.

I can't put my secondary (and more powerful) farm onto DF until these problems are resolved since I don't have daily access to it.

**alpha** · 06-22-2003, 03:55 AM

If uploading is slow, there is obviously something wrong that needs to be addressed. Instead of whining, why don't we all run the clients offline until the problem is resolved?

**Grumpy** · 06-22-2003, 03:59 AM

Because when people connect after running offline, they end up losing hundreds of generations trying to upload. We at OCworkbench have lost a very large amount of work because of this. Our offline Folders are getting mighty upset; they are planning a lynching for someone.

**alpha** · 06-22-2003, 04:20 AM

I run all my clients offline and have yet to lose any work when uploading buffered generations. If you and your team members are losing work due to the slow uploading speeds, why not buffer offline until the server is fixed?

**pfb** · 06-22-2003, 05:50 AM

Folding offline for the moment, as the server (for me) went down around 0830 GMT...

**rstarr** · 06-22-2003, 10:48 AM

I've got 12 machines all online, and only occasionally will they upload the last gen. Most systems have 10 or more gens buffered, but when the PC sends, it only sends the last completed gen. All the others stay in the buffer.

I found a way, if the server doesn't time out, to upload the gens buffered by the DOS text box client.

Here it is: I had only the DOS text box client running, then downloaded and ran GUI 3.01. I configured the GUI the way I wanted it and then stopped the client. I then hit the upload button, and most of the time that clears out the buffer, but not always. I think trying at off-peak hours would be best.

**cedricvonck** · 06-22-2003, 10:56 AM

Yep, same problem here with me....
Last Friday I lost about 15 generations and 8 hours of work...

**Brian the Fist** · 06-22-2003, 11:01 AM

The only 'out of the ordinary' thing I see is that our database server has about 200 processes running simultaneously (as opposed to the usual 2 or 3). The bottleneck appears to be the speed with which the database manager can process and handle requests to insert into the database. We have a pretty powerful machine handling this and its CPUs are not overloaded; they are in fact mostly idle. So it may be an I/O problem.

We have been making some hardware changes recently but none that we expect to affect the speed. I suspect the main difference is that the new algorithm uploads data much more frequently than the past one. Do others agree with this?

I have fairly little knowledge about databases in general, so if this is the case I'm not sure what we could do short of increasing the generation size a bit to reduce the data influx. We'll see what we can come up with, though.
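A rough back-of-envelope sketch of the load argument above. It assumes (not confirmed anywhere in this thread) that every finished generation is uploaded immediately as one request, and that generation time grows linearly with generation size. The client count and timings are made-up illustrative numbers, and `uploads_per_hour` is a hypothetical helper, not part of the DF client or server.

```python
# Back-of-envelope model of server load (hypothetical; not the DF code).
# Assumptions: every finished generation is uploaded immediately as one
# request, and generation time grows linearly with generation size.


def uploads_per_hour(clients: int, minutes_per_generation: float,
                     size_factor: float = 1.0) -> float:
    """Estimated upload requests per hour arriving at the server.

    size_factor scales the generation size; under the linear assumption a
    generation that is size_factor times larger takes size_factor times
    longer, so each client uploads size_factor times less often.
    """
    if clients < 0 or minutes_per_generation <= 0 or size_factor <= 0:
        raise ValueError("invalid load parameters")
    minutes = minutes_per_generation * size_factor
    return clients * 60.0 / minutes


if __name__ == "__main__":
    # Illustrative numbers only: 10,000 clients, 30-minute generations.
    for factor in (1.0, 2.0, 4.0):
        rate = uploads_per_hour(10_000, 30.0, factor)
        print(f"size x{factor:g}: {rate:,.0f} uploads/hour")
```

Under these assumptions, doubling the generation size halves the insert requests reaching the database (20,000 down to 10,000 per hour for the illustrative numbers). The model ignores retries: clients that time out try again later, which would add requests on top of this estimate.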

**Hua Luo Han** · 06-22-2003, 11:06 AM

I have over 300 gens buffered; I hope I won't lose them.

**rstarr** · 06-22-2003, 11:12 AM

It's possible your database is written in SQL and not SQL+.

A flat database vs. a smart database.

But there are problems with uploading right now. I've seen it before (slow uploading) but it's really bad today.

Has anyone checked the server today?

**HaloJones** · 06-22-2003, 11:21 AM

With an upload every 5000 "old" structures, it might be uploading once per hour. Now, sometimes the generations go by in just one or two minutes, sometimes 2 hours, so sometimes it will be uploading much more often, sometimes less often.

Either way, I'm getting generations queueing up all over the place.

**TyOI** · 06-22-2003, 11:53 AM


Our apologies, but we have had to remove the cluster completely from the project until the problem is resolved (in excess of 6 million generations spread across 70 folders); we are basically getting "cannot connect to server" messages whenever a client tries to access DF.

As advised earlier, we will keep watching the project and return once the upload problem has been resolved.

Admin
TyOI

**Scotttheking** · 06-22-2003, 01:43 PM

I'm about ready to pull my machines too.
I'm checking boxes, and more clients died.

```
scott@linuxbox:/distcomp/distribfold$ tail error.log
ERROR: [010.003] {taskapi.c, line 1217} [ReadServerResponse] Timeout waiting for response, got 0 chars.
ERROR: [000.000] {foldtrajlite2.c, line 4127} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [010.003] {taskapi.c, line 1217} [ReadServerResponse] Timeout waiting for response, got 0 chars.
ERROR: [000.000] {foldtrajlite2.c, line 4127} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [000.000] {foldtrajlite2.c, line 4127} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [010.003] {taskapi.c, line 1217} [ReadServerResponse] Timeout waiting for response, got 0 chars.
ERROR: [000.000] {foldtrajlite2.c, line 4127} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [000.000] {foldtrajlite2.c, line 4050} Warning during upload: STATUS 910 MISSING PREVIOUS OR ILLEGAL GENERATION

ERROR: [000.000] {foldtrajlite2.c, line 4127} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
```

In all honesty, what good is a DC project that keeps losing work and won't accept completed work?

**IronBits** · 06-22-2003, 01:50 PM

Patience...
If more folks would have helped out during the BETA phase, we might not be having some of these problems.

Howard IS working on it.

**Scotttheking** · 06-22-2003, 02:18 PM

> Originally posted by IronBits
> Patience...
> If more folks would have helped out during the BETA phase, we might not be having some of these problems.
>
> Howard IS working on it.

If Howard had had a Mac client out during the beta test, I would have.

**furballexpress** · 06-22-2003, 02:20 PM

Same here as most everyone else. I have 250+ WUs waiting, and growing. I would imagine part of the issue is the hundreds more processes running than Howard thinks there should be. Sounds like yet another issue of having a few more users (and results) than Howard originally anticipated... or the backend I/O is waaaay slower than everyone thought it would be.

And IB: although I applaud your sentiments, and don't disagree up to a point, this may or may not be something which would have been discovered with more testing. I've been in a number of DCPs where the beta was large in volume and widespread, and everything was fine. Then the project started for real... and the response was so overwhelming that the beta could have been 10 times as big and not shown the ill effects of the true project launch.

Sadly, this is likely a two or more-part problem. Sit tight if you can, boys and girls...I bet it will get better. Also remember - it's not like YOU'RE DOWN and everyone else ISN'T, so no one's really getting that far ahead of anyone else.

**Scotttheking** · 06-22-2003, 02:26 PM

Part of the problem is that the client also keeps failing.
When it fails, you have to remove filelist.txt to restart.
I just removed 66 or so buffered generations. I figure I've lost at least 100, in addition to not ever making it to the higher ones, which give more points per generation.

**Brian the Fist** · 06-22-2003, 02:46 PM

We are victims of our own success. Unfortunately, the beta testing did not include load testing, and the load is indeed higher than in Phase I now. Not to fear though, we'll find a way around it.

**Scotttheking** · 06-22-2003, 02:49 PM

> Originally posted by Brian the Fist
> We are victims of our own success. Unfortunately the beta testing did not include load testing and the load is indeed higher than Phase I now. Not to fear though, we'll find a way around it.

Good to hear

**DB7654321** · 06-22-2003, 03:53 PM

Bah... Stop whining and be patient. I'm sure Howard wants these problems resolved more than anyone else. It will get done -- give it time.
