Results 1 to 40 of 56

Thread: Server down?

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    Quote Originally Posted by endless mike View Post
    On the PrimeGrid message board, someone mentioned that a SOB work unit had to be sent out on average of 4.7 times to get a matching doublecheck. That post is three years old, but I can't image the situation is much different now. I still think double checking is valuable.
    While I'm solidly in the "double checking is a necessity camp" (and I'm one of the people making the decisions), let me correct that "4.7" statistic. While it's true that some of our sub-projects require a lot of tasks to be sent out in order to get two matching results, it's not because the results don't match. It's because most of the tasks either don't get returned at all (or at least not by the deadline), or have some sort of error that prevents the result from completing. Bad residues are more common than I'd like, but they're not THAT common. Here's some hard data on our SoB tasks currently in the database:

    SoB:
    Completed workunits: 1089
    Average number of tasks per WU (2 matching tasks are required, and 2 are sent out initially): 3.7805 tasks per workunit (4117 tasks total)
    Number of tasks successfully returned but eventually proven to be incorrect: 61

    As you can see, about 6% of the workunits had tasks that looked like they returned a correct result, but in fact didn't. These are SoB tasks -- the same as you run here. We use LLR, but it uses the same gwnum library as you do here, so the error rates are going to be comparable. LLR has lots of internal consistency checks, so many computation errors are caught and not even returned to us. That's just the ones that slipped through all the checks and made it to the end.

    At PrimeGrid we detect the errors, so the user gets an immediate indication that's something's wrong. On projects that don't double check, the users never know there's a problem, so the error rate might be higher.

    The numbers are worse on GPU calculations. It's much harder to get GPUs to work at all, resulting in many tasks which fail immediately. On our GFN (n=22) tasks, which are GIMPS-sized numbers:

    GFN-22:
    Completed WUs: 2217
    Tasks: 17996 (about 8 tasks per WU)
    Completed but incorrect tasks: 85 (about 4%)

    Some of those tasks are CPU tasks, but the vast majority are GPU tasks.

    So there's your hard data: On the long tasks (SoB on CPU, GFN-22 on GPU), about 6% of workunits had seemingly correct results from CPUs which turned out to be wrong, and about 4% of the workunits had GPU tasks which were wrong.

    (Frankly I'm surprised that the CPU error rate is higher than the GPU error rate.)
    Last edited by AG5BPilot; 05-21-2016 at 01:51 PM.

  2. #2
    Senior Member engracio's Avatar
    Join Date
    Jun 2004
    Location
    Illinois
    Posts
    237
    A while back we ran a double check up to 30M. After most wu units were returned 22699K only had 1 wu needed to be submitted. The higher k had less than 10 wu each to be return, Unfortunately I am not sure if the results were matched with the previous results. Mike any Idea??

  3. #3
    Quote Originally Posted by engracio View Post
    A while back we ran a double check up to 30M. After most wu units were returned 22699K only had 1 wu needed to be submitted. The higher k had less than 10 wu each to be return, Unfortunately I am not sure if the results were matched with the previous results. Mike any Idea??
    None at all.

  4. #4
    Quote Originally Posted by AG5BPilot View Post

    SoB:
    Completed workunits: 1089
    What is a "workunit"?

  5. #5
    Quote Originally Posted by jMcCranie View Post
    What is a "workunit"?
    For the purposes of this discussion, "a candidate", .i.e., a number to be tested, is a reasonable definition.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •