
Thread: Feature Request: Upload mode retries after failure

  1. #1

    Feature Request: Upload mode retries after failure

    I have 2 folding machines that are offline, so I have no choice but to transport their results to a networked machine and upload them in batches every couple of days when I get a chance.

    Since this protein is so fast and the servers are loaded to the max, it took me from noon yesterday until about 8am this morning to upload all the structures, because the clients kept getting "Cannot connect to server, will try later" messages and quitting (I am doing an upload-only run, -u t, on one of my networked machines).

    It would be really nice if, when the DF client is running in -u t mode and loses its connection to the server while it still has structures buffered, it would try again in 30 seconds, then 60 seconds, and so on. That way I wouldn't have to babysit it, and the structures would eventually get uploaded with little intervention from me.

    My guess is that a lot of other people would find this useful as well. Any possibility of considering adding a feature like that?

    Thanks,
    Jeff.
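The request above amounts to a retry loop with a growing delay. The Python sketch below only illustrates that logic; it is not part of the DF client. `upload_once` is a hypothetical stand-in for one upload-only (`-u t`) run and is assumed to return how many structures are still buffered when the run ends.

```python
import time
from typing import Callable


def upload_with_retries(
    upload_once: Callable[[], int],
    first_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call upload_once() until nothing is left buffered; return the run count.

    upload_once is a hypothetical stand-in for one upload-only client run.
    It is assumed to return the number of structures still buffered.
    """
    attempts = 0
    delay = first_delay
    while True:
        attempts += 1
        if upload_once() == 0:
            return attempts
        # Connection dropped with work left: wait, then double the next wait
        # (30 s, 60 s, 120 s, ...).
        sleep(delay)
        delay *= 2


# Simulated client that manages 5 structures per run before the link drops.
buffered = [12]


def fake_run() -> int:
    buffered[0] = max(0, buffered[0] - 5)
    return buffered[0]


waits = []
print(upload_with_retries(fake_run, sleep=waits.append))  # 3
print(waits)  # [30.0, 60.0]
```

Passing `sleep=waits.append` records the pauses instead of actually waiting, which makes the schedule easy to check.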

  2. #2
    I'm in total agreeance. It would be very beneficial if it could wait a preset time and then recommence the upload procedure.

    I've got 3 offline machines (there is no network connection to my garage), and each time I've brought one in to upload, it manages to upload 5 or 6 gens and then stops. If it could recommence 60 seconds later, I might have a chance of getting the almost 5,000 gens on those machines up to the server.

    Is there some way of writing a batch file to do it perhaps?
    Crunching for OCAU

  3. #3
    Senior Member | Join Date: Mar 2003 | Location: Gilbert, AZ | Posts: 157
    Originally posted by deranged128[OCAU]
    I'm in total agreeance. It would be very beneficial if it could wait a preset time and then recommence the upload procedure.

    I've got 3 offline machines (there is no network connection to my garage) and each time I've brought one in to upload it manages to upload 5 or 6 gens then stops. If it could recommence 60 seconds later I might have a chance to get to almost 5,000 gens on those machines up to the server.

    Is there some way of writing a batch file to do it perhaps?


    A simple batch file / script...

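    @echo off
    REM cd\where-DF-is is a placeholder: use the folder that holds foldtrajlite
    cd\where-DF-is
    :START
    REM Upload-only run; returns when the upload finishes or times out
    .\foldtrajlite -f protein -n native -u t
    echo sleeping for 30 seconds
    REM sleep is not built into Windows (it ships with the Resource Kit).
    REM Without it, "ping -n 31 127.0.0.1 > nul" waits roughly 30 seconds.
    sleep 30
    goto START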


    This should keep restarting when it times out...

    >>TJ

  4. #4
    Thanks TeeJay. I'm assuming you could tailor the sleep time to whatever delay you want. I'll try it out tonight after work.

    Cheers

  5. #5
    many thanks.
    many many thanks.
    about 2000 thanks actually, since that is how many gens i have buffered.
    Driving home the sky accelerates and the clouds all form a geometric shape.

  6. #6
    Senior Member | Join Date: Feb 2003 | Location: Wigan, UK | Posts: 200
    Originally posted by TeeJay
    A simple batch file / script...

    @echo off
    cd\where-DF-is
    :START
    .\foldtrajlite -f protein -n native -ut
    echo sleeping for 30 seconds
    sleep 30
    goto START


    This should keep restarting when it times out...

    >>TJ
    How does this work?

    I don't quite understand how it will know when the server has timed out.

    Oh, and is it totally safe? (I don't want to risk my 2000+ buffered gens, which I planned on leaving buffered until the servers calmed down [which they don't appear to be doing].)

    Does anyone fancy commenting the bat file to show what each line is doing? (I'm pretty useless at working with them.)


    cheers

  7. #7
    add a little checking:

    Look in the directory for the foldblahblah file belonging to the second-to-last pair (the last pair that still needs uploading). In my case that is fold_22_xxxxxxxx_80_xxxxxxxx_protein_36.log.bz2

    ****batch file contents****

    :start
    REM Keep uploading for as long as the chosen file is still waiting to go
    if not exist fold_22_xxxxxxxx_80_xxxxxxxx_protein_36.log.bz2 goto exit

    :upload
    .\foldtrajlite -n native -f protein -u t
    goto start

    :exit

    ****batch file end****

    So this will try to upload over and over until only the most recent pair is still in place.

    If you have a directory with large numbers of files waiting to upload, run the above batch file while crunching off-line in a separate directory.

    Whether this is a good idea for the project is another matter. The last thing we need is even more attempted connections.
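The same stop condition can be sketched in Python: keep running the upload while the chosen file still exists. `run_upload` is a hypothetical placeholder for one `foldtrajlite -n native -f protein -u t` run, and the `max_runs` safety limit is an addition that the batch file does not have.

```python
import os
import tempfile
from typing import Callable


def upload_until_marker_gone(marker_path: str,
                             run_upload: Callable[[], None],
                             max_runs: int = 10_000) -> int:
    """Repeat run_upload() while marker_path exists; return how many runs happened.

    marker_path plays the role of fold_22_xxxxxxxx_80_xxxxxxxx_protein_36.log.bz2
    in the batch file, assuming (as the batch file does) that the client
    removes it once it has been uploaded.
    """
    runs = 0
    while os.path.exists(marker_path) and runs < max_runs:
        run_upload()
        runs += 1
    return runs


# Demo: a fake upload that "uploads" (deletes) the marker on its third run.
with tempfile.TemporaryDirectory() as folder:
    marker = os.path.join(folder, "marker.log.bz2")
    open(marker, "w").close()
    calls = []

    def fake_upload() -> None:
        calls.append(1)
        if len(calls) == 3:
            os.remove(marker)

    print(upload_until_marker_gone(marker, fake_upload))  # 3
```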

  8. #8
    Senior Member | Join Date: Feb 2003 | Location: Wigan, UK | Posts: 200
    Originally posted by HaloJones

    .\foldtrajlite -n native -f protein -u t
    goto start

    Does this bit execute ".\foldtrajlite -n native -f protein -u t" and only advance to the "goto start" bit once foldtrajlite has uploaded what it can (whether its connection was broken or not) and then exited?

  9. #9
    Thanks for the extra code. I was using the first batch file successfully but did wonder what it would do when it got to the end of the buffered gens.

    I'm currently uploading from one of my normally offline machines. It started with 1300 buffered gens 4 hours ago and is now down to 1068 remaining. That's 232 in 4 hours, or about 58 per hour, so at this rate it will take roughly another 18 hours to finish uploading. This was the smallest number of buffered gens on any of my offline machines. Whilst it's doing the upload I've got it running an 'alternate' project rather than exacerbate the problem by accruing yet more gens that will take too long to upload.

    As it is, when each of my offline machines completes its upload I'll be moving it to another project. I'll still have a number of online computers folding away, but if the problems continue to get worse I will have to start thinking seriously about moving them as well. I understand some of the project's major contributors (Bong88 from OCW comes to mind) are out of action due to the server problems, and I can only imagine how much worse it would be if contributors like that were also trying to upload.

    Food for thought, but I did like Ironbits' suggestion in another thread regarding the handling of data at the backend.

  10. #10
    Originally posted by jonnyw
    does this bit execute ".\foldtrajlite -n native -f protein -u t" and only advance to the "goto start" bit once foldtrajlite has uploaded what it can (whether its connection was broken or not) and then exited?
    Yes... once foldtrajlite exits (which it does after the upload finishes or times out) the batch file starts the client again, and again...
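To see why the loop only restarts after the client exits, here is a rough Python equivalent of TeeJay's loop (a sketch, not part of the DF tools). `subprocess.run()` waits for the child process to exit, just as the batch file waits for foldtrajlite. The `max_runs` limit is an assumption added so the example can stop, and the demo runs a tiny Python child process in place of foldtrajlite.

```python
import subprocess
import sys
import time
from typing import List, Optional, Sequence


def restart_loop(cmd: Sequence[str], pause_s: float = 30.0,
                 max_runs: Optional[int] = None) -> List[int]:
    """Run cmd, wait for it to exit, pause, and run it again.

    max_runs=None loops forever, like the :START / goto START batch file.
    Returns the exit code of every run.
    """
    codes: List[int] = []
    while True:
        # Blocks until the child exits (upload finished or timed out).
        codes.append(subprocess.run(cmd).returncode)
        if max_runs is not None and len(codes) >= max_runs:
            return codes
        time.sleep(pause_s)


# The real command would be something like
# [r".\foldtrajlite", "-f", "protein", "-n", "native", "-u", "t"];
# here a short Python child stands in for it.
demo = [sys.executable, "-c", "print('client run finished')"]
print(restart_loop(demo, pause_s=0.1, max_runs=2))  # [0, 0]
```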

  11. #11
    Ol' retired IT geezer | Join Date: Feb 2003 | Location: Scarborough | Posts: 92

    Too Many Connections

    Whether this is a good idea for the project is another matter. The last thing we need is even more attempted connections
    Alas... I agree with Halo... The use of these never ending bat files is swamping the back end... Now the online folders are starting to accumulate generations. That had not happened as of yesterday...

    Ned

  12. #12

    Re: Too Many Connections

    Originally posted by Ned
    Alas... I agree with Halo... The use of these never ending bat files is swamping the back end... Now the online folders are starting to accumulate generations. That had not happened as of yesterday...

    Ned
    Funny, I've had recurring problems with my 'online' clients caching generations due to time-out problems. These problems were there in a big way last Friday (Australian time) and occurred off and on over the past 2 days. The only clean run I've had (I managed to upload over 5,000 cached generations from the weekend) was on Monday (Sunday in the USA/Canada). The online clients usually do catch up after a few more attempts, though.

    You may have noticed that connecting is usually not the problem, it's once connected and the dialogue starts between the upload server and database server that the timeout occurs.

    As for swamping the backend, I'm pretty sure that was happening a long time before people like myself started using batch files in an effort to upload long cached work.

  13. #13
    Ol' retired IT geezer | Join Date: Feb 2003 | Location: Scarborough | Posts: 92
    As for swamping the backend, I'm pretty sure that was happening a long time before people like myself started using batch files in an effort to upload long cached work.
    I'll agree that the servers were already swamped before... It just seemed like it was WORSE this morning than previously. I hadn't noticed my online machines accumulating generations before this morning...

    Ned

  14. #14
    The batch file is interesting, but as someone pointed out, the first one will continue forever even after the client has finished uploading, and the second one is kind of a pain since you have to figure out which is the second-to-last pair of structure files, etc...

    It would be really nice if that feature was built into the client so it could take care of it automatically and stop when it is finished.

    But in the meantime, the batch file will be useful, thanks for posting it.

  15. #15
    We will consider adding a re-try feature; however, as a few members have already pointed out, in the long run it may result in even greater waiting times, as the server will constantly be bombarded with connection attempts.
    Elena Garderman

  16. #16
    Senior Member | Join Date: Apr 2002 | Location: Santa Barbara CA | Posts: 355
    I don't know how to do this in Windows, but if you checked how many lines filelist.txt has, you could stop trying to upload once it was down to 2 lines.
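A minimal Python sketch of that stop rule. It assumes, as the post suggests, that filelist.txt lists the buffered files one per line and that 2 remaining lines means only the current work is left; neither detail is verified here. (On Windows, `find /c /v "" filelist.txt` is a common stand-in for `wc -l` when all you need is the count.)

```python
import os
import tempfile


def line_count(path: str) -> int:
    """Count the non-blank lines in a text file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if line.strip())


def should_keep_uploading(path: str = "filelist.txt", done_at: int = 2) -> bool:
    """True while the file still lists more than done_at entries."""
    return line_count(path) > done_at


# Demo with a throwaway filelist.txt holding three entries.
with tempfile.TemporaryDirectory() as folder:
    listing = os.path.join(folder, "filelist.txt")
    with open(listing, "w", encoding="utf-8") as f:
        f.write("entry1\nentry2\nentry3\n")
    print(should_keep_uploading(listing))  # True
```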

  17. #17
    Paratima (Ancient Programmer) | Join Date: Dec 2001 | Location: West Central Florida | Posts: 3,296
    Windows doesn't have "wc"?
    HOME: A physical construct for keeping rain off your computers.

  18. #18
    Originally posted by Stardragon
    We will consider adding a re-try feature; however, as a few members have already pointed out, in the long run it may result in even greater waiting times, as the server will constantly be bombarded with connection attempts.
    Well, it's better that the software does it than me having to do it manually a million times...

    But that is why you would have the retry interval back off to some maximum: if it failed the first time you wait 30 seconds, then 60, then 120, and so on until you hit the cap. That way you won't get constantly bombarded with connection attempts, just periodic ones.
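The capped back-off described here can be sketched as a Python generator. The 30-second start and doubling come from the post; the cap value is an illustrative assumption, since the post only says "some maximum".

```python
from itertools import islice
from typing import Iterator


def retry_delays(first: float = 30.0, factor: float = 2.0,
                 cap: float = 3600.0) -> Iterator[float]:
    """Yield waits of 30, 60, 120, ... seconds, never more than cap.

    The default one-hour cap is an assumption for illustration.
    """
    delay = first
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First six waits with a five-minute cap.
print(list(islice(retry_delays(cap=300.0), 6)))
# [30.0, 60.0, 120.0, 240.0, 300.0, 300.0]
```

Once the cap is reached every later attempt waits the same amount, so a client that never gets through settles into one connection attempt per cap period instead of hammering the server.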

  19. #19
    Alive and XXXXing | Join Date: Nov 2003 | Location: GMT +3 | Posts: 55

    Batch Script

    The following batch file is what I wrote and use. It intelligently stops when everything is done. It's not idiot-proof, but I've been using it for over a week now with no problems.

    If you want to stop uploading, delete the "upload.lock" file in the directory. The BAT checks for it every time it restarts the upload process. If you just kill the BAT, you can end up losing your error.log file.

    The "if [%E2%]==[] set E2=0. . . . ." (first 30 or so lines) at the beginning is just for the counter. Don't delete it!

    Code:
    @echo off
    
    IF EXIST error.log REN error.log error.log.old
    
    :restart
    
    
    :: *************************
    ::          COUNTER
    :: *************************
    
    :: Increments a three digit number
    :: Works by comparing each digit
    :: E2=hundreds, E1=tens, E0=ones
    
    if [%E2%]==[] set E2=0
    if [%E1%]==[] set E1=0
    if [%E0%]==[] set E0=0
    :E0
    if %E0%==9 goto E1
    if %E0%==8 set E0=9
    if %E0%==7 set E0=8
    if %E0%==6 set E0=7
    if %E0%==5 set E0=6
    if %E0%==4 set E0=5
    if %E0%==3 set E0=4
    if %E0%==2 set E0=3
    if %E0%==1 set E0=2
    if %E0%==0 set E0=1
    goto DONE
    :E1
    set E0=0
    if %E1%==9 goto E2
    if %E1%==8 set E1=9
    if %E1%==7 set E1=8
    if %E1%==6 set E1=7
    if %E1%==5 set E1=6
    if %E1%==4 set E1=5
    if %E1%==3 set E1=4
    if %E1%==2 set E1=3
    if %E1%==1 set E1=2
    if %E1%==0 set E1=1
    goto DONE
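    :E2
    set E1=0
    :: Jump out on 9: setting E2=0 here would fall through to the
    :: "0 -> 1" line below and turn 999 into 100 instead of 000
    if %E2%==9 goto E2WRAP
    if %E2%==8 set E2=9
    if %E2%==7 set E2=8
    if %E2%==6 set E2=7
    if %E2%==5 set E2=6
    if %E2%==4 set E2=5
    if %E2%==3 set E2=4
    if %E2%==2 set E2=3
    if %E2%==1 set E2=2
    if %E2%==0 set E2=1
    goto DONE
    :E2WRAP
    set E2=0
    :DONE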
    
    :: *************************
    ::       COUNTER END
    :: *************************
    
    
    	ECHO ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    	ECHO Starting Upload (Attempt: %E2%%E1%%E0%)
    	echo Upload lock file > upload.lock
    	START /w /m foldtrajlite -f protein -n native -df -u t 
    	
    	FIND /C "NO RESPONSE FROM SERVER" error.log > nul
    	:: FIND returns an errorlevel of 1 or higher if the search string WASN'T found.
    	IF ERRORLEVEL 1 GOTO finished
    	IF NOT EXIST Upload.lock GOTO quit
    
    	copy error.log.old+error.log error.log.tmp > nul
    	del error.log.old
    	del error.log 
    	ren error.log.tmp error.log.old
    	
    	GOTO restart
    
    :finished
    
    	echo Upload finished !!!
    	GOTO end
    
    :quit
    	echo Lock file not found, quitting . . . 
    	GOTO end
    
    :end
    	IF EXIST Upload.lock DEL Upload.lock
    	@copy error.log.old+error.log error.log.tmp
    	del error.log.old
    	del error.log 
    	ren error.log.tmp error.log
    
    	@echo off
    	cls
P.S. Forgot to mention - this runs fine on Win98 (my office machine with fast i-net). Didn't check with 2K or XP - but hey, this is generic BAT code, it should run. Let me know if it doesn't.
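For readers less used to BAT files, the decision the script makes after each client run can be sketched in Python. The file names and the "NO RESPONSE FROM SERVER" text are taken from the script; the function itself is only an illustration and skips the log merging and the attempt counter.

```python
import os
import tempfile

TIMEOUT_TEXT = "NO RESPONSE FROM SERVER"


def next_step(log_path: str = "error.log", lock_path: str = "upload.lock") -> str:
    """Return "finished", "quit" or "retry", mirroring the checks after START /w.

    - no timeout message in the log -> finished (FIND errorlevel >= 1)
    - lock file deleted by the user -> quit
    - otherwise                     -> retry
    """
    try:
        with open(log_path, encoding="utf-8", errors="replace") as f:
            timed_out = TIMEOUT_TEXT in f.read()
    except FileNotFoundError:
        # FIND also sets a non-zero errorlevel when the file is missing,
        # so the batch file treats that case as finished too.
        timed_out = False
    if not timed_out:
        return "finished"
    if not os.path.exists(lock_path):
        return "quit"
    return "retry"


with tempfile.TemporaryDirectory() as folder:
    log = os.path.join(folder, "error.log")
    lock = os.path.join(folder, "upload.lock")
    with open(log, "w", encoding="utf-8") as f:
        f.write(TIMEOUT_TEXT + "\n")
    print(next_step(log, lock))  # quit (no lock file yet)
    open(lock, "w").close()
    print(next_step(log, lock))  # retry
```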

  20. #20
    gOhAsE (Registered User) | Join Date: Jul 2003 | Location: Germany | Posts: 24
    That Batchfile looks great!

    I'll try it tomorrow and report.

    Thank you!

  21. #21
    Alive and XXXXing | Join Date: Nov 2003 | Location: GMT +3 | Posts: 55
    Hmm - looks like under XP you need to change

    Code:
    START /w /m foldtrajlite -f protein -n native -df -u t
    to

    Code:
    START /w foldtrajlite -f protein -n native -df -u t
    i.e. get rid of the "/m".

    Also, the ECHO commands seem to be a bit screwed up - i.e. different from Win98 - but otherwise it runs fine. Get rid of the

    Code:
     ECHO ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    line, and it will even look OK.

  22. #22
    FoBoT (Fixer of Broken Things) | Join Date: Dec 2001 | Location: Holden MO | Posts: 2,137


    Xelas

    Use the right tool for the right job!
