
Thread: farming feature request

  1. #1
guru (Big Fat Gorilla)
Join Date: Dec 2001
Location: Warren, OR
Posts: 501

farming feature request

Is there any way we can get an option added to foldtrajlite to specify that the system is not networked, and to package the results into a specified path? Then have a separate mini client whose only purpose is to upload those results once they are copied to a system that has internet access?

Basically I want the client to dump all results into a specified directory. That way I can have all the clients dump their results via NFS, or scp them onto a single system. From there they can be uploaded via a mini version of foldtrajlite which only has the code to upload results.

It would make sneakernetting much easier, since the client would never have to be stopped and restarted to collect the work results. It would also make it easy to create a third-party application that could act as a proxy to upload results from people's farms.

It might also help with daytime server load, since the upload process could be automated and uploads could take place at night when the server is less busy.
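
For the collection half, something like this nightly pass is all I'm picturing (the host names and paths are made up for illustration):

###########################
#!/bin/sh

# pull each farm node's buffered results onto the one
# internet-connected box, one subdirectory per node
for host in node1 node2 node3
do
    scp -r $host:/fold/results "/srv/collected/$host"
done
###########################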

    thanks,
    guru

    It's great to be back.
    I'm having fun!!! I'm just not sure if it's net fun or gross fun.

  2. #2

    Re: farming feature request

    Originally posted by guru
It might also help with daytime server load, since the upload process could be automated and uploads could take place at night when the server is less busy.
I agree the option would be great, but please remember that your night is my day.

  3. #3
guru (Big Fat Gorilla)
Join Date: Dec 2001
Location: Warren, OR
Posts: 501
Yes, but a third-party proxy could space out the uploads as needed, depending on a load value given to it by the main server. If the main server is lightly loaded, it would let the proxy know to upload as much as possible; if it was under heavy load, the proxy could wait a given time before it tries to upload the next result, evening out the main server's load. This would help prevent large farmers from overloading the main server when they do their daily dumps.
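
A rough sketch of the pacing loop I mean (the helper commands and the server's suggested-wait interface are pure invention on my part; nothing like it exists today):

###########################
#!/bin/sh

# hypothetical proxy pacing loop: upload one result, then sleep for
# however long the server asks before trying the next one
while results_queued              # made-up helper: any buffered results left?
do
    upload_one_result             # made-up helper: submit a single result
    WAIT=`fetch_suggested_wait`   # made-up helper: ask the server for a delay (seconds)
    sleep ${WAIT:-60}             # default to a minute if the server doesn't say
done
###########################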

    guru

  4. #4
    I have some scripts that I wrote to do this exact thing. I'll post the UNIX version that I run on my FreeBSD box.

The only caveat is that you have to restart the folding program manually afterwards; the client needs to read in the new filelist.txt anyway.

This Perl script prints out how many gens are buffered:

    ###########################

#!/usr/bin/perl -w

use strict;

# listqueue.pl: print the number of gens still buffered for upload.
# filelist.txt lists two files per gen, plus a CurrentStruc line whose
# sixth field records how many gens have already been uploaded.

my $uploaddir  = "/tmp/upload";
my $uploaddir2 = "$uploaddir/distribfold";
my $filelist   = "$uploaddir2/filelist.txt";
my $filepairs  = 0;
my $uploaded   = 0;

# no filelist means nothing is buffered
if (!-e $filelist) {
    print "0\n";
    exit(0);
}

open(READ, $filelist) || die "Could not open $filelist: $!\n";

while (<READ>) {
    chomp($_);

    if (/^\.\//) {
        # a buffered data file (two per gen)
        $filepairs++;
    }
    elsif (/^CurrentStruc/) {
        # sixth field is the count of gens already uploaded
        my @tmp = split(/ /, $_);
        $uploaded = $tmp[5];
    }
}
close(READ);

# two files per gen, so halve the count to get the number of gens
$filepairs /= 2;

printf("%d\n", $filepairs - $uploaded);

    ###########################

This shell script checks whether there are gens already buffered for upload and, if so, resumes uploading them. Otherwise it stops the running client, copies the folding directory, and then uploads until all of the buffered gens have been submitted.

    ###########################
#!/bin/bash

FOLDDIR=/fold/distribfold
FOLDBIN=$FOLDDIR/foldtrajlite
LOCKFILE=$FOLDDIR/foldtrajlite.lock
UPLOADDIR=/tmp/upload
UPLOADBIN=$UPLOADDIR/distribfold/upload
CP=/usr/local/bin/gcp
COUNT=1

# keep calling the upload binary until listqueue.pl reports
# that no more gens are buffered
do_uploads()
{
    cd $UPLOADDIR/distribfold

    GENS=`$UPLOADDIR/listqueue.pl`

    while [ $GENS -gt 1 ]
    do
        echo "Connection $COUNT"
        $UPLOADBIN
        GENS=`$UPLOADDIR/listqueue.pl`
        COUNT=`expr $COUNT + 1`
    done
}

cd $UPLOADDIR

GENS=`$UPLOADDIR/listqueue.pl`

# if a previous run left gens buffered, just finish uploading them
if [ $GENS -gt 1 ]
then
    echo "Previous upload incomplete. Resuming."
    do_uploads
    exit 1
fi

# remove the lock file to stop the folding program
rm $LOCKFILE
echo "Stopping folding program"
sleep 5

# remove the previous upload directory and copy in a fresh snapshot
rm -Rf $UPLOADDIR/distribfold
$CP -a $FOLDDIR .

# purge the upload list from the active folding directory
echo "Purging upload list from the active folding directory"
cd $FOLDDIR
$FOLDBIN -purgeuploadlist 10000

echo "Please restart folding program"

# upload everything in the snapshot
do_uploads
    ########################################


Finally, there's a script/batch file that you need in your folding directory that does the upload only. I just called the script 'upload'; it's a copy of the foldit script with the '-u t' flag appended to the foldtrajlite call so that it will upload only.
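
Something along these lines, assuming your foldit script looks like the stock one (the -f/-n arguments below are placeholders; copy whatever your own foldit script actually passes):

###########################
#!/bin/sh

# 'upload': same invocation as the stock foldit script, with '-u t'
# appended so foldtrajlite only submits buffered results
cd /tmp/upload/distribfold
./foldtrajlite -f protein -n native -u t
###########################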

Yeah, so unfortunately it requires three scripts, but once you get it working it should be smooth sailing. One thing you should probably add is something that backs up the directory with all of the gens (in case something goes wrong). I don't want to be responsible for lost work. Hope that helps.
    --phil

  5. #5
guru (Big Fat Gorilla)
Join Date: Dec 2001
Location: Warren, OR
Posts: 501
Thanks Phil, but I've already created my own scripts for passing the work around. The big problem is that things don't always go as planned: some part of the data file gets hosed and all the work is lost. It would be much easier if you could specify an output directory where the data files are put when the client is run with the no-net option. Second, instead of a single file that contains a list of all the work files, the output should be put into easy-to-manage files. Combine all the files for a single upload unit into a single .bz file with the checksum in the name of the file. That way it's easy to move the files around without losing some small part that renders the rest of the work units useless.
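
To make the idea concrete, here's roughly what I mean the client could do per upload unit (the gen file names are made up):

###########################
#!/bin/sh

# bundle one upload unit into a single archive whose name carries
# its own checksum, so a damaged copy is obvious before upload
tar cf unit.tar gen0042.val gen0042.log   # made-up gen file names
SUM=`md5sum unit.tar | cut -d' ' -f1`     # (md5 -q on BSD)
bzip2 -c unit.tar > unit-$SUM.tar.bz2
rm unit.tar
###########################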

    guru

  6. #6
guru (Big Fat Gorilla)
Join Date: Dec 2001
Location: Warren, OR
Posts: 501

Enough is enough!

    Is anyone listening?

I just lost about 3800 results to a zip failure: it couldn't compress the results because of the sheer number of files in the directory, and died with a "too many arguments" error. I lost all the results on 4 out of 10 systems over the weekend.
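
For the record, that error is the shell's argument-list limit: 'zip results.zip *' expands every file name onto one command line. Streaming the names to zip on standard input sidesteps it:

###########################
#!/bin/sh

# stream file names to zip instead of globbing them, so the
# command line never exceeds the kernel's ARG_MAX limit
find . -maxdepth 1 -type f -print | zip results.zip -@
###########################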

We need better packaging of the work units right out of foldtrajlite. Simply dumping them into a directory with a single file that contains the list of files is not robust enough. I'm sure I speak for many others who have lost data while trying to move results off non-internet-connected systems.

Based on what the program currently does, it can't be that hard to implement what I have asked for. If you don't understand what I'm asking for, then please respond with some questions. I will be more than happy to answer them.

    guru

  7. #7
IronBits (Target Butt)
Join Date: Dec 2001
Location: Morrisville, NC
Posts: 8,619
I'll second that, guru.

  8. #8
Ol' retired IT geezer
Join Date: Feb 2003
Location: Scarborough
Posts: 92
Originally posted by guru
We need better packaging of the work units right out of foldtrajlite. Simply dumping them into a directory with a single file that contains the list of files is not robust enough.
I've already suggested that they package their data into multi-generation units to streamline and minimise the processing at the server end. I think this is just another reason it should be done... That way, the system only has to hang onto the data files until it has enough for a multi-gen result, process them into a combined generation result, then delete the data files. It may need to keep another file with the multi-generation results until it can upload, but that is a whole lot fewer files... (and it could keep duplicates for extra precaution...)

I'm sure they can come up with a design that keeps their security intact over multiple data records, just as it does for the current single-record design... Ditto for their DB design.

    Ned

  9. #9
I am not clear what the issue is with having several files versus one packaged file. How can you 'lose' a file? It's not like you have a briefcase full of papers and one could fall out. Anyway, having a single packaged upload with multiple generations in it is something we are planning to add in a future release, but it will require some rewriting and significant testing. Definitely planned, though.
    Howard Feldman

  10. #10
Alive and XXXXing
Join Date: Nov 2003
Location: GMT +3
Posts: 55
Originally posted by Howard Feldman
I am not clear what the issue is with having several files versus one packaged file. How can you 'lose' a file? It's not like you have a briefcase full of papers and one could fall out. Anyway, having a single packaged upload with multiple generations in it is something we are planning to add in a future release, but it will require some rewriting and significant testing. Definitely planned, though.
Easily! filelist.txt is just a single text file, and it gets corrupted all the time by the client. It borks if the machine crashes (HD buffer not getting written ASAP?), and when it does, all the (perfectly good) results in the directory become so much wasted space.

    Happens all the time. It's just not reliable to hang so much work on one scrappy txt file!

Look at any decent file system (NTFS, whatever Linux commonly uses, etc.). Look at any decent database. They all implement journaling and ROLLBACK. If something borks, you only lose the latest little bit.
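
They wouldn't even need full journaling; the classic write-then-rename idiom would cover it (sketched here in shell for illustration; the stand-in helper is made up):

###########################
#!/bin/sh

# crash-safe update: build the new list in a temp file, flush it
# to disk, then atomically rename it over the old one. A crash
# leaves either the old list or the new one, never half of each.
write_new_filelist filelist.txt.tmp   # made-up stand-in for the client's rewrite
sync
mv filelist.txt.tmp filelist.txt
###########################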
