
Thread: A couple requests/modifications for the client.

  1. #1
    Junior Member
    Join Date
    Jul 2003
    Location
    Connecticut, the dumbest place on earth
    Posts
    9

A couple requests/modifications for the client.

    Hi there. I have a couple of requests for the client:

    A vectorized version for OS X that takes advantage of AltiVec.

    An MPI-based client. I am sure there are many people out there who would like an MPI-based client and have resources to spare (I have 90 processors sitting idle right now).

    Even better: an AltiVec-optimized, MPI-based client for OS X.

    Also, is the SPARC client compiled with Forte, or did you do it with gcc? I would expect the 64-bit version to be compiled with Forte, as gcc 3.3 doesn't produce the cleanest 64-bit code.
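
    To make the first request concrete, here is the flavor of inner loop I mean. This is a toy sketch, not the client's actual code: it adds two float arrays four elements at a time, and it assumes the arrays are 16-byte aligned.

    /* Toy AltiVec example: c[i] = a[i] + b[i], four floats per step.
       Hypothetical illustration, not the client's code. The arrays
       must be 16-byte aligned for vec_ld/vec_st. Build with a
       compiler in AltiVec mode, e.g. gcc -maltivec. */
    #include <altivec.h>

    void vadd(const float *a, const float *b, float *c, int n)
    {
        int i;
        for (i = 0; i + 4 <= n; i += 4) {
            vector float va = vec_ld(0, &a[i]);
            vector float vb = vec_ld(0, &b[i]);
            vec_st(vec_add(va, vb), 0, &c[i]);
        }
        for (; i < n; i++)      /* scalar cleanup for the tail */
            c[i] = a[i] + b[i];
    }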

    - Derek
    Ars Technica Team Stir Fry

  2. #2

    Re: A couple requests/modifications for the client.

    Originally posted by derek
    Hi there. I have a couple of requests for the client:

    A vectorized version for OS X that takes advantage of AltiVec.

    An MPI-based client. I am sure there are many people out there who would like an MPI-based client and have resources to spare (I have 90 processors sitting idle right now).

    Even better: an AltiVec-optimized, MPI-based client for OS X.

    Also, is the SPARC client compiled with Forte, or did you do it with gcc? I would expect the 64-bit version to be compiled with Forte, as gcc 3.3 doesn't produce the cleanest 64-bit code.

    - Derek
    Ars Technica Team Stir Fry
    I know that Howard examined AltiVec optimizations in the past and concluded they would not significantly speed up the client.
    A member of TSF http://teamstirfry.net/

  3. #3
    For the 32-bit Solaris version, cc -V gives:

    cc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2

    for the 64-bit one:

    cc: Sun WorkShop 6 2000/04/07 C 5.1

    AltiVec will not significantly speed things up (there are old threads on this if you search). I am not clear on how 'making an MPI client' will somehow make it better.
    Howard Feldman

  4. #4
    AltiVec will not significantly speed things up (there are old threads on this if you search). I am not clear on how 'making an MPI client' will somehow make it better.
    Um. Some people have 256-node clusters they could run folding on. Some of those people don't have time to install it on every single machine (sometimes twice).

    You don't see how making an MPI client would make it better? You're a dumbass.

  5. #5
    Administrator Dyyryath's Avatar
    Join Date
    Dec 2001
    Location
    North Carolina
    Posts
    1,850
    Originally posted by ddn
    Um. Some people have 256-node clusters they could run folding on. Some of those people don't have time to install it on every single machine (sometimes twice).

    You don't see how making an MPI client would make it better? You're a dumbass.
    Um...easy there, ddn. I think a simple, "We've got clusters that could easily be of use if we had an MPI client" would have been sufficient.

    The personal attack at the end doesn't help your case at all.
    "So utterly at variance is destiny with all the little plans of men." - H.G. Wells

  6. #6
    Originally posted by ddn
    Um. Some people have 256-node clusters they could run folding on. Some of those people don't have time to install it on every single machine (sometimes twice).

    You don't see how making an MPI client would make it better? You're a dumbass.
    OK, I'll just pretend I didn't see that last statement. Now please explain how making an 'MPI client' will make it easier to run on a 256-node cluster. You mean so there could be one executable which then forked 256 times and did 256 separate jobs? I'm not clear on why MPI would be needed to do this or how it would help. If you don't want to explain it, then don't expect to get it.
    Howard Feldman

  7. #7
    Junior Member
    Join Date
    Jul 2003
    Location
    Connecticut, the dumbest place on earth
    Posts
    9
    Originally posted by Brian the Fist
    OK, I'll just pretend I didn't see that last statement. Now please explain how making an 'MPI client' will make it easier to run on a 256-node cluster. You mean so there could be one executable which then forked 256 times and did 256 separate jobs? I'm not clear on why MPI would be needed to do this or how it would help. If you don't want to explain it, then don't expect to get it.
    Not for 256 separate jobs; for utilization within the current job. Send a separate thread of work to each processor, thereby speeding up the entire process. It may not give a 256x boost in performance, but with a little rewriting, and if your program uses threads, it would speed it up.

    If you have N processors (think node-wise), give each processor a piece of the current problem, and performance should increase by nearly N (less whatever serial work and communication overhead eat up).

    If your problem is too small, it probably won't help. There are lots of factors here (interconnect, memory, code, load), but generally there is a decent boost. I see it all the time.

    Who knows, it may take considerable effort to adapt your code to MPI, but 99% of the researchers I know are willing to go through the frustration of MPI and parallelizing their code to get the best bang for their budget.

    Without actually digging through your algorithm I am not sure it would help (perhaps it's too linear), but my thought is this: I have 45 nodes that are going to sit essentially idle until school starts, and this could provide you with a decent amount of processing power.
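
    To sketch what I mean (hypothetical code: fold_one_structure() stands in for whatever unit of work the client does per structure, and this is obviously not your source), the embarrassingly parallel version is only a few lines of MPI:

    /* Minimal MPI sketch: split independent folding jobs across ranks.
       fold_one_structure() is a hypothetical stand-in for one unit of
       the client's work; this is an illustration, not the DFP code. */
    #include <mpi.h>

    extern void fold_one_structure(int seed);   /* hypothetical */

    int main(int argc, char **argv)
    {
        int rank, nprocs, i;
        const int total_jobs = 256;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* each rank takes every nprocs-th job; no communication needed */
        for (i = rank; i < total_jobs; i += nprocs)
            fold_one_structure(i);

        MPI_Finalize();
        return 0;
    }

    Launched with mpirun -np N, one binary and one install cover the whole cluster.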

    On the Solaris subject: if you have access to the latest Sun ONE tools (Forte 7), you will squeeze out better performance with a recompile, especially for UltraSPARC IIIs. Also, do you link against Sun's math libraries?

    Forte Performance Docs - http://docs.sun.com/db/doc/816-2461
    http://docs.sun.com/db/doc/816-2463

    Sun's BLAS libraries are very, very nice.

    - derek

  8. #8
    Junior Member
    Join Date
    Jul 2003
    Location
    Rochester, MN
    Posts
    5
    Whether MPI is valuable or not is largely based on the aims and desires of the project. If you are serious about absolute performance and results, then MPI is a very good route to efficiency, ease of deployment, and ultimately more results.

    If you are using the "flock of chickens" approach, it may be entirely satisfactory to just throw more chickens at the problem, rather than make them more efficient and effective. Since this is a volunteer project and nobody is really paying for the cycles, then chickens might be okay.

    But then again, the people doing serious high-performance computing, whether that be distributed, clusters, or traditional supercomputing, would laugh and point at you: effectively the "rice-boy" of performance computing.

    I would be really happy to see a properly pthreaded version of the client, as would, I imagine, a lot of other dual- (or better) CPU box owners. It would make life much easier on the 128-CPU Origin 2000.

    -Dean

  9. #9
    Member
    Join Date
    Jul 2002
    Location
    Down the road from Mr. Fist :D
    Posts
    76
    The thing Howard would have to weigh, though, is the amount of CPU power that would be donated full-time to the project against the time (and the frustration of supporting yet another client), to see whether the pros outweigh the cons. Hence the reason (IMO) that the Linux and Windows clients come first (Windows being even higher priority than Linux), due to the number of Linux and Windows users compared to others.

    I'm sure probably 85% of the clients being run are Windows, with Linux coming in second and the rest split (only Howard may know this for sure, from the results coming in from the respective clients). I myself use a mixture of Linux and Windows, but there are a lot of people who use Windows only (I think IB has something like 30 boxen that are mostly, if not all, Windows).

    I'm sure he'd give it some good consideration, though, as it sounds as if there could be some good CPU cycles sitting idle.

  10. #10
    Junior Member
    Join Date
    Jul 2003
    Location
    Connecticut, the dumbest place on earth
    Posts
    9
    Originally posted by ^7_of_9
    The thing Howard would have to weigh, though, is the amount of CPU power that would be donated full-time to the project against the time (and the frustration of supporting yet another client), to see whether the pros outweigh the cons. Hence the reason (IMO) that the Linux and Windows clients come first (Windows being even higher priority than Linux), due to the number of Linux and Windows users compared to others.

    I'm sure probably 85% of the clients being run are Windows, with Linux coming in second and the rest split (only Howard may know this for sure, from the results coming in from the respective clients). I myself use a mixture of Linux and Windows, but there are a lot of people who use Windows only (I think IB has something like 30 boxen that are mostly, if not all, Windows).

    I'm sure he'd give it some good consideration, though, as it sounds as if there could be some good CPU cycles sitting idle.

    OK, fair enough (although I would put Linux in first place). However, I would venture to say that a 128-processor Origin 2000 would utterly destroy ANY competing Windows solution. Even an Origin (or, for that matter, any piece of big iron or midrange equipment) working part-time could produce more results faster than lots of small Windows boxes, especially when people are playing games on them or doing whatever.

    - derek

  11. #11
    Senior Member
    Join Date
    Jan 2002
    Location
    England, near Europe
    Posts
    211
    Even still, I have dual Windows boxen and dual Linux boxen. Life would be a whole lot easier with a single client like the d.net one.
    Train hard, fight easy


  12. #12
    Member
    Join Date
    Jul 2002
    Location
    Down the road from Mr. Fist :D
    Posts
    76
    Originally posted by derek
    OK, fair enough (although I would put Linux in first place). However, I would venture to say that a 128-processor Origin 2000 would utterly destroy ANY competing Windows solution. Even an Origin (or, for that matter, any piece of big iron or midrange equipment) working part-time could produce more results faster than lots of small Windows boxes, especially when people are playing games on them or doing whatever.

    - derek

    Even when you've got, say, over 10,000-15,000 Windows machines, plus any/all the Linux boxen hanging around? (I'm sure it's actually more; remember there are 20k+ members, and some (not all, of course) run farms of multiple machines. I've got a few machines running myself, and only one that I play games on; the rest are 24/7 crunchers.)

    The only thing is: if Howard puts in the effort to build a client for what you and others need, can you guarantee those processors for the remainder of this project?

    Now, I'd like to see the client you're proposing, because I'd really like to see the output of what you've got running/available.

    (The numbers are off the top of my head, as what I thought *may* be a decent estimate, but personally I think it would be interesting if Howard could somehow get statistics/graphs on the number of machines/clients running within, say, the last two weeks, with totals included of course.)

    [EDIT] I put Windows in first place because I think the majority of people are using Windows. Linux hasn't made THAT big of a dent ... YET

  13. #13
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    For the two teams I've been associated with, Windows is what the majority are running. OCWorkbench was almost exclusively Windows, with one or two dabbling in Linux. (Ask Grumpy if that's changed much.)
    The folks at TheGenomeCollective seem a bit more diverse, with a bit higher representation from the PsYcHo PeNgUiN genre and at least one other OS (at least in the past). But the majority is still Windows.

  14. #14
    Junior Member
    Join Date
    Jul 2003
    Location
    Connecticut, the dumbest place on earth
    Posts
    9
    Originally posted by ^7_of_9
    Even when you've got, say, over 10,000-15,000 Windows machines, plus any/all the Linux boxen hanging around? (I'm sure it's actually more; remember there are 20k+ members, and some (not all, of course) run farms of multiple machines. I've got a few machines running myself, and only one that I play games on; the rest are 24/7 crunchers.)
    There are 20,000 users registered. There are not that many people running it all the time (as in dedicated hosts). If that were the case, I would not be 783rd overall (after 2.5 days). I am testing with around 10-12 machines. Dean's comment about the "ricer" approach is accurate.


    The only thing is: if Howard puts in the effort to build a client for what you and others need, can you guarantee those processors for the remainder of this project?
    Check my previous post. I said I have some spare processing time right now. That's all. However, that's not to say I wouldn't mind dedicating some machines when I experience slow processing times (assuming the client improves in stability; I have had to restart my clients several times on SPARC and OS X).


    Now, I'd like to see the client you're proposing, because I'd really like to see the output of what you've got running/available.
    I would love to see the numbers as well. Like I said earlier, I am 783rd overall (as of 10:41 EST). That's after 2.5 days of work, and with almost no machines. I know several people who have access to huge clusters (much larger than my tiny grid) and who have downtime right now until fall. And those folks are pretty competitive (i.e. they like testing these types of projects). I find this stuff absolutely amazing, and that's one of the reasons I work in research.

    - derek

  15. #15
    I for one would love to see a cluster/Beowulf approach where one could take older machines and bond them together as one, producing greater output than the individual machines combined.

    But then again, I'm a technoFreak, imagining things that may not be physically possible.

    Ahhhhh! Who knows, one day someone will eliminate the data dependencies of parallel processing. But as for now, this data is not well suited to parallel processing: each step is dependent upon the previous step.

    Achieving the ultimate would require bonding the processor registers to the network client, i.e. not "here's a bunch of work to do", but "here is one thing to do and 1000 processes are waiting; could you hurry!"

    Then you run into network limitations, which make chopping things up to make them run faster impossible.

    Damn data-dependency calculations.

    But you could calculate your mortgage or car payment really, really, really, really fast.

    I think, therefore I am a PinHead

  16. #16
    Junior Member
    Join Date
    Jul 2003
    Location
    Rochester, MN
    Posts
    5
    The only thing is: if Howard puts in the effort to build a client for what you and others need, can you guarantee those processors for the remainder of this project?

    Now, I'd like to see the client you're proposing, because I'd really like to see the output of what you've got running/available.
    I am perfectly willing to work on the code as well, although I leave on a two-week vacation in a couple of days. I was part of the optimization team at SGI for SETI@home in the early days, and I work on very big iron every day. On the side, I am the sysadmin for a research group using Amber, running on a couple of decent-sized clusters that I built, one with Myrinet. Before that, I put together a 420-CPU cluster for high-throughput therapeutic drug screening. A long time ago I vectorized and parallelized Zuker's RNA-folding software.

    If those in charge of the code want more specifics about my qualifications, they can give me a holler.

    -Dean

  17. #17
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    I would imagine we at OCworkbench are 95% Windows, though the recent stuff-up has got a couple of members to dual-boot or at least give Linux a go. As said previously, it is a matter of resources going to what will give the maximum results. I would love to be running Linux on all my boxes, but after a long fight with the Grim Reaper I have corrupted memory and cracked solder joints, so this Genuine Dumbass is left with the computer OS for the lazy
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  18. #18
    While there may be a few large multi-node systems out there with little to do, this project would benefit far more from ensuring that the Windows and Linux clients (particularly the Windows one) work 100%.

    Because of the memory leak in the Phase 2 client, I have lost access to 160 GHz worth of P4s. That's pretty major, but it pales into insignificance when you consider how many people have given up in the last week.

    Reassuring the user base that the software release procedures have been changed would do more for this project than anything else.

  19. #19
    I happen to have about 5 years of experience with parallel computing and Beowulf clusters, actually. We have had a Beowulf cluster here for quite some time, and across the street at the Hospital for Sick Children is a 128-CPU Origin 2000 (or something like that). Both have been running Distributed Folding happily. The client was designed intentionally so that each CPU can do an autonomous piece of work, with no communication with other CPUs. Each CPU carries out its own separate evolutionary experiment, and the more experiments done, the better the chance of finding a 'good' one.

    So I guess my main problem with this is: suppose you have a 256-CPU supercomputer. Right now you must install 256 copies and run them separately. Now yes, it might take up 5 GB of disk space (big deal), and yes, it might use 25 GB of RAM with the -rt option (and you'd better have more than that in a 256-CPU box). But in doing so you achieve essentially a full 256x speedup. All the starting, stopping, and monitoring can easily be scripted, and so can the installation. I installed DFP on our Beowulf cluster in about 5 minutes, and that is independent of the number of nodes since it's all scripted.

    So I'm still not sure what we gain by using MPI, and what the MPI would actually DO for us when we do not require communication between nodes. One possibility is that you could do all 50 structures of a given generation in parallel on 50 CPUs, but then you are still stuck waiting for the slowest one to complete and get a less than optimal speedup. The truth of the matter is, you get the best speedup by keeping each CPU on its own independent process so it can go at its own pace.

    P.S. For those talking about clusters with 'downtime' now: please remember that by default the client runs at the lowest possible priority, so it is safe to leave it running all year round. When any other processes are running, the client comes to a dead halt on any well-behaved operating system (i.e. not Windows )
    Howard Feldman

  20. #20
    Fixer of Broken Things FoBoT's Avatar
    Join Date
    Dec 2001
    Location
    Holden MO
    Posts
    2,137
    Originally posted by ddn
    You're a dumbass.
    that isn't very nice
    Use the right tool for the right job!

  21. #21
    Junior Member
    Join Date
    Jul 2003
    Location
    Rochester, MN
    Posts
    5
    So I'm still not sure what we gain by using MPI, and what the MPI would actually DO for us when we do not require communication between nodes. One possibility is that you could do all 50 structures of a given generation in parallel on 50 CPUs, but then you are still stuck waiting for the slowest one to complete and get a less than optimal speedup. The truth of the matter is, you get the best speedup by keeping each CPU on its own independent process so it can go at its own pace.
    The correct way to code that would be to have the N slaves complete their work and come back for more. Then fast nodes keep busy while the slower nodes finish. An old boss referred to that as a "hungry puppy" algorithm. In PVM, I did this for a 3D elastic registration algorithm and it worked fabulously, because not all chunks of data took the same amount of time to complete. I got a 13x speedup on 15 nodes. Another benefit is that you don't have to assume a node count and split things up beforehand. Just let 'er rip.
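
    In MPI it looks roughly like this (a sketch under obvious assumptions: score_structure() is a made-up stand-in for one unit of folding work, and a real version would ship results back properly):

    /* "Hungry puppy" master/worker sketch in MPI. Rank 0 hands out job
       numbers on demand, so fast workers simply come back sooner.
       score_structure() is hypothetical, not the DFP code. */
    #include <mpi.h>

    #define TAG_WORK 1
    #define TAG_STOP 2

    extern double score_structure(int job);    /* hypothetical */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        if (rank == 0) {                        /* master */
            int next = 0, stopped = 0, total = 1000;
            double result;
            MPI_Status st;
            while (stopped < nprocs - 1) {
                /* a worker checks in with its latest result */
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &st);
                if (next < total) {             /* feed the hungry puppy */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE,
                             TAG_WORK, MPI_COMM_WORLD);
                    next++;
                } else {                        /* no work left: dismiss it */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE,
                             TAG_STOP, MPI_COMM_WORLD);
                    stopped++;
                }
            }
        } else {                                /* worker */
            double result = 0.0;                /* dummy first check-in */
            int job;
            MPI_Status st;
            for (;;) {
                MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
                MPI_Recv(&job, 1, MPI_INT, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP)
                    break;
                result = score_structure(job);
            }
        }
        MPI_Finalize();
        return 0;
    }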

    P.S. For those talking about clusters with 'downtime' now: please remember that by default the client runs at the lowest possible priority, so it is safe to leave it running all year round. When any other processes are running, the client comes to a dead halt on any well-behaved operating system (i.e. not Windows )
    The only problem with leaving them running is that people see them running during the day and *assume* they are slowing things down. Facts don't seem to placate those weenies. I monitor my big Origin runs (90+ copies) and it really does stay out of the way. The only time it gets in the way is when memory is tight, and then it does visibly slow things down. Thankfully the Origins have something like 64 GB of memory.

  22. #22
    Senior Member
    Join Date
    Jan 2003
    Location
    North Carolina
    Posts
    184
    Originally posted by dtj
    The correct way to code that would be to have the N slaves complete their work and come back for more. Then fast nodes keep busy while the slower nodes finish.
    That's how it works now. Each node has the task of completing a set of generations, and the nodes that finish first start on a new set.

  23. #23
    Junior Member
    Join Date
    Jul 2003
    Location
    Rochester, MN
    Posts
    5
    Originally posted by AMD_is_logical
    That's how it works now. Each node has the task of completing a set of generations, and the nodes that finish first start on a new set.
    I'm talking about an MPI version, rather than the single-threaded way it is done now.

    -Dean

  24. #24
    Senior Member
    Join Date
    Jan 2003
    Location
    North Carolina
    Posts
    184
    Originally posted by dtj
    I'm talking about an MPI version, rather than the single-threaded way it is done now.
    So currently: running a script creates N processes to run on the N nodes.

    You want: a single process that creates N threads (or MPI ranks) to run on the N nodes.

    I agree with Howard: doing that would gain nothing. It would just be a waste of time.

    And programming time is valuable. A little time spent on the algorithm could easily increase the crunching speed by an order of magnitude for the entire project.

  25. #25
    Junior Member
    Join Date
    Jul 2003
    Location
    Rochester, MN
    Posts
    5
    And programming time is valuable. A little time spent on the algorithm could easily increase the crunching speed by an order of magnitude for the entire project.
    That is precisely why I offered to help. I rescind my offer; I will just let you PC weenies play at science and take my cycles to a project actually interested in performance computing.

    -Dean

  26. #26
    Senior Member
    Join Date
    Jan 2003
    Location
    North Carolina
    Posts
    184
    Originally posted by dtj
    That is precisely why I offered to help. I rescind my offer; I will just let you PC weenies play at science and take my cycles to a project actually interested in performance computing.

    -Dean
    I was under the impression that you were offering to add MPI, which wouldn't help throughput.

    However, optimizing the program using the current algorithm wouldn't be very useful either, since the algorithm itself is still being worked out. Folding speed could be dramatically improved, though, by making improvements to the algorithm itself.

    During the beta, there was a time when I found a trick to force a very short timeout for stuck structures. The result was a very dramatic increase in folding speed. I also got very good RMSD values.

    I assume Howard isn't doing this because the native structure is tight, and can't be created when using such a short timeout. So what's needed is a change to the algorithm that will make it possible to create the native structure while keeping the time per structure under one second (on a reasonably fast computer).

    There are several ways one might do this. One approach would be as follows. After trying to place a residue a number of times, take the best placement found so far (no matter how bad) and continue building the structure. Then relax the completed structure to allow atoms to slide off of each other. There are only about 100 residues, and each is only near a few others, so it is easy to find which way each is being "pushed", and how hard. The residues can then be moved in a series of small steps until the atom overlaps have been resolved. It should only take a few milliseconds to do an adequate job of this. Then the structure can be scored.
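
    To make that concrete, here is a rough sketch of the relaxation step. Everything here is illustrative: the residue representation, clash distance, step size, and iteration count are made up, and a real version would also need to preserve bond lengths.

    /* Relaxation sketch: nudge clashing residues apart in small steps.
       All constants and the Vec3 representation are invented for
       illustration; this is not the DFP code. */
    #include <math.h>

    typedef struct { double x, y, z; } Vec3;

    #define N_RES   100
    #define CLASH   4.0    /* assumed clash distance, in Angstroms */
    #define STEP    0.1    /* fraction of the net push applied per pass */
    #define N_ITER  50

    void relax(Vec3 pos[N_RES])
    {
        Vec3 push[N_RES];
        int it, i, j;

        for (it = 0; it < N_ITER; it++) {
            for (i = 0; i < N_RES; i++)
                push[i].x = push[i].y = push[i].z = 0.0;

            /* accumulate a repulsive "push" for every clashing pair */
            for (i = 0; i < N_RES; i++)
                for (j = i + 1; j < N_RES; j++) {
                    double dx = pos[i].x - pos[j].x;
                    double dy = pos[i].y - pos[j].y;
                    double dz = pos[i].z - pos[j].z;
                    double d  = sqrt(dx*dx + dy*dy + dz*dz);
                    if (d > 0.0 && d < CLASH) {
                        double f = (CLASH - d) / d;   /* deeper = harder */
                        push[i].x += f*dx; push[i].y += f*dy; push[i].z += f*dz;
                        push[j].x -= f*dx; push[j].y -= f*dy; push[j].z -= f*dz;
                    }
                }

            /* move each residue a small step along its net push */
            for (i = 0; i < N_RES; i++) {
                pos[i].x += STEP * push[i].x;
                pos[i].y += STEP * push[i].y;
                pos[i].z += STEP * push[i].z;
            }
        }
    }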

  27. #27
    Member lemonsqzz's Avatar
    Join Date
    Sep 2002
    Location
    Mountain View, CA
    Posts
    97
    Still on the original topic, I hope... Any chance of getting a date/timestamp in the error log, or making that a command-line option? It would help me sort out how long a problem has existed. Most issues here, of course, are networking related: proxies going down.

  28. #28
    After this last update, is there any chance of getting a -v switch for foldtrajlite.exe, i.e. to dump the version to the screen? Or maybe a version stamp in the file?

  29. #29
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Vote #1: timestamps. Very much requested by OCworkbench folders. And put the version # at the beginning of the error log when it starts up. That would work, I would think.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  30. #30
    Originally posted by Brian the Fist
    I happen to have about 5 years of experience with parallel computing and Beowulf clusters, actually. We have had a Beowulf cluster here for quite some time, and across the street at the Hospital for Sick Children is a 128-CPU Origin 2000 (or something like that). Both have been running Distributed Folding happily. The client was designed intentionally so that each CPU can do an autonomous piece of work, with no communication with other CPUs. Each CPU carries out its own separate evolutionary experiment, and the more experiments done, the better the chance of finding a 'good' one.

    Care to provide a linky to that Beowulf client?
    PinHead wants to play with it!

  31. #31
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    Timestamps on every error and log output would be a terrific idea.

  32. #32
    Member
    Join Date
    Jan 2003
    Location
    Connecticut, USA
    Posts
    96
    Originally posted by PinHead
    After this last update, is there any chance of getting a -v switch for foldtrajlite.exe, i.e. to dump the version to the screen? Or maybe a version stamp in the file?
    If you run the client without any options, it'll print the version and a list of available options. Maybe not exactly what you want, but it's something. A UNIX example follows:

    # ./foldtrajlite
    Distributed Foldtraj v2003.07.25 arguments:

    -f Input Trajectory Distribution File (NO EXTENSION) [File In]
    -n Native structure filename (NO EXTENSION) [File In]
    .
    .
    .


    Shortfinal

  33. #33
    Yes, the version is indicated when you run it with no arguments. As for timestamps in the log, I think we added them for some messages but not others; it is just a matter of going through the code and doing it. However, the network errors come not from our code but from the NCBI toolkit, and I don't want to go changing error messages in there. There may be an option to print the timestamp directly in the error, though, so I'll take a look. You DO get a timestamp every time the program starts, of course.
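
    For the curious, it is only a few lines per message. Something like this sketch (with a made-up wrapper name; not the actual client or NCBI toolkit code):

    /* Sketch of a timestamped log line. log_message() is a made-up
       wrapper, not the client's or the NCBI toolkit's actual API. */
    #include <stdio.h>
    #include <time.h>

    void log_message(FILE *log, const char *msg)
    {
        char stamp[32];
        time_t now = time(NULL);

        strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S",
                 localtime(&now));
        fprintf(log, "[%s] %s\n", stamp, msg);
    }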
    Howard Feldman

  34. #34
    Member
    Join Date
    Jun 2002
    Location
    Atlanta, GA
    Posts
    35
    Howard, I just want to go off topic to say thanks for being BIG enough to overlook the moronic statements of a few around here, and for being willing to turn this into a rather interesting thread... For whatever that's worth...

    I'm glad a few clowns around here don't reflect on the whole...

    Mash

  35. #35
    Member
    Join Date
    Jun 2002
    Location
    Atlanta, GA
    Posts
    35
    Oh, and Dean, impressive qualifications..... Really.....

    Mash

  36. #36
    Administrator Dyyryath's Avatar
    Join Date
    Dec 2001
    Location
    North Carolina
    Posts
    1,850
    OFFTOPIC: MashRinx! Haven't seen you around in a while, buddy. You really need to stop by the forums more often.
    "So utterly at variance is destiny with all the little plans of men." - H.G. Wells
