
Thread: Server down?


  1. #1
    Quote Originally Posted by endless mike View Post
    A false negative would mean a missed prime. Potentially wasted years of computing would count as harm in my book. That's the main reason I gave up on SOB and went back to GIMPS.
    Yes, but the purpose of SOB is to try to prove the Sierpiński conjecture. Large primes are very rare and a false positive is extremely rare. Double-checking in SOB cuts the throughput down by half. In other words, double-checking essentially doubles the expected computing that has to be done to prove the conjecture.

  2. #2
    Quote Originally Posted by jMcCranie View Post
    Yes, but the purpose of SOB is to try to prove the Sierpiński conjecture. Large primes are very rare and a false positive is extremely rare. Double-checking in SOB cuts the throughput down by half. In other words, double-checking essentially doubles the expected computing that has to be done to prove the conjecture.
    Consider the highly unlikely possibility that there's only one prime for a given k. A false negative with no double checking means we crunch that k forever and never prove the conjecture. Unlikely to happen that way, but not impossible. People more in the know claim an error rate of about 4% (IIRC) on GIMPS. On the PrimeGrid message board, someone mentioned that a SOB work unit had to be sent out an average of 4.7 times to get a matching doublecheck. That post is three years old, but I can't imagine the situation is much different now. I still think double checking is valuable.

  3. #3
    Quote Originally Posted by endless mike View Post
    On the PrimeGrid message board, someone mentioned that a SOB work unit had to be sent out an average of 4.7 times to get a matching doublecheck. That post is three years old, but I can't imagine the situation is much different now. I still think double checking is valuable.
    While I'm solidly in the "double checking is a necessity" camp (and I'm one of the people making the decisions), let me correct that "4.7" statistic. While it's true that some of our sub-projects require a lot of tasks to be sent out in order to get two matching results, it's not because the results don't match. It's because most of the tasks either don't get returned at all (or at least not by the deadline), or have some sort of error that prevents the result from completing. Bad residues are more common than I'd like, but they're not THAT common. Here's some hard data on our SoB tasks currently in the database:

    SoB:
    Completed workunits: 1089
    Average number of tasks per WU (2 matching tasks are required, and 2 are sent out initially): 3.7805 tasks per workunit (4117 tasks total)
    Number of tasks successfully returned but eventually proven to be incorrect: 61

    As you can see, about 6% of the workunits had tasks that looked like they returned a correct result, but in fact didn't. These are SoB tasks -- the same as you run here. We use LLR, but it uses the same gwnum library as you do here, so the error rates are going to be comparable. LLR has lots of internal consistency checks, so many computation errors are caught and not even returned to us. Those 61 bad results are just the ones that slipped through all the checks and made it to the end.

    At PrimeGrid we detect the errors, so the user gets an immediate indication that something's wrong. On projects that don't double check, the users never know there's a problem, so the error rate might be higher.

    The numbers are worse on GPU calculations. It's much harder to get GPUs to work at all, resulting in many tasks which fail immediately. On our GFN (n=22) tasks, which are GIMPS-sized numbers:

    GFN-22:
    Completed WUs: 2217
    Tasks: 17996 (about 8 tasks per WU)
    Completed but incorrect tasks: 85 (about 4%)

    Some of those tasks are CPU tasks, but the vast majority are GPU tasks.

    So there's your hard data: On the long tasks (SoB on CPU, GFN-22 on GPU), about 6% of workunits had seemingly correct results from CPUs which turned out to be wrong, and about 4% of the workunits had GPU tasks which were wrong.

    (Frankly I'm surprised that the CPU error rate is higher than the GPU error rate.)
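    In Python, the quoted percentages can be checked against the raw counts from the tables above:

    ```python
    # Raw counts copied from the SoB and GFN-22 tables above.
    sob_wus, sob_tasks, sob_bad = 1089, 4117, 61
    gfn_wus, gfn_tasks, gfn_bad = 2217, 17996, 85

    print(f"SoB tasks per WU:    {sob_tasks / sob_wus:.4f}")  # 3.7805
    print(f"SoB bad-result rate: {sob_bad / sob_wus:.1%}")    # 5.6%, i.e. "about 6%"
    print(f"GFN tasks per WU:    {gfn_tasks / gfn_wus:.2f}")  # 8.12, "about 8"
    print(f"GFN bad-result rate: {gfn_bad / gfn_wus:.1%}")    # 3.8%, i.e. "about 4%"
    ```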
    Last edited by AG5BPilot; 05-21-2016 at 01:51 PM.

  4. #4
    Senior Member engracio | Join Date: Jun 2004 | Location: Illinois | Posts: 237
    A while back we ran a double check up to 30M. After most work units were returned, k=22699 had only 1 WU that still needed to be submitted. The higher k's had fewer than 10 WUs each left to be returned. Unfortunately I am not sure if the results were matched against the previous results. Mike, any idea?

  5. #5
    Quote Originally Posted by engracio View Post
    A while back we ran a double check up to 30M. After most work units were returned, k=22699 had only 1 WU that still needed to be submitted. The higher k's had fewer than 10 WUs each left to be returned. Unfortunately I am not sure if the results were matched against the previous results. Mike, any idea?
    None at all.

  6. #6
    Quote Originally Posted by AG5BPilot View Post

    SoB:
    Completed workunits: 1089
    What is a "workunit"?

  7. #7
    Quote Originally Posted by jMcCranie View Post
    What is a "workunit"?
    For the purposes of this discussion, "a candidate", i.e., a number to be tested, is a reasonable definition.

  8. #8
    Quote Originally Posted by endless mike View Post
    Consider the highly unlikely possibility that there's only one prime for a given k. A false negative with no double checking means we crunch that k forever and never prove the conjecture. Unlikely to happen that way, but not impossible...
    OK, so how about: no double checking to try to resolve the conjecture nearly twice as quickly, but if and when it gets down to only one k with unknown status, run a double check on those.

    ----Added----

    I'll make an analogy. Suppose that there are a large number of boxes. A small number of boxes contain a diamond and you want to find diamonds. The first time you look in a specific box, if it contains a diamond, there is a 5% chance that you will not see it.

    Should you (1) spend half of your time double-checking boxes you have already opened, or (2) open as many boxes as you can? I would open as many boxes as I can.
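    The tradeoff in the analogy can be put in numbers. A minimal sketch, where the 5% miss rate comes from the post but the fraction of boxes holding a diamond is an illustrative assumption:

    ```python
    # Expected diamonds found under each strategy, with a fixed budget of "looks".
    P_DIAMOND = 0.01     # fraction of boxes holding a diamond (assumed for illustration)
    MISS      = 0.05     # chance of overlooking a diamond on one look (from the post)
    BUDGET    = 100_000  # total looks available

    # Strategy 2: open as many fresh boxes as possible; 5% of diamonds missed.
    single_pass = BUDGET * P_DIAMOND * (1 - MISS)

    # Strategy 1: look at every box twice; half as many boxes opened, but a
    # diamond is missed only if both looks fail (probability MISS**2).
    double_check = (BUDGET // 2) * P_DIAMOND * (1 - MISS**2)

    print(f"single pass : {single_pass:.1f} diamonds expected")   # 950.0
    print(f"double check: {double_check:.2f} diamonds expected")  # 498.75
    ```

    With identical boxes the math favors opening new ones, which is the point being made here; the replies that follow argue the boxes are not identical.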
    Last edited by jMcCranie; 05-22-2016 at 08:37 PM.

  9. #9
    Moderator Joe O | Join Date: Jul 2002 | Location: West Milford, NJ | Posts: 643
    Quote Originally Posted by jMcCranie View Post
    OK, so how about: no double checking to try to resolve the conjecture nearly twice as quickly, but if and when it gets down to only one k with unknown status, run a double check on those.

    ----Added----

    I'll make an analogy. Suppose that there are a large number of boxes. A small number of boxes contain a diamond and you want to find diamonds. The first time you look in a specific box, if it contains a diamond, there is a 5% chance that you will not see it.

    Should you (1) spend half of your time double-checking boxes you have already opened, or (2) open as many boxes as you can? I would open as many boxes as I can.
    It is important to note that the boxes are numbered, and
    1) The lower numbered boxes are more likely to contain a diamond than the higher numbered boxes.
    2) The higher numbered boxes are harder to open than the lower numbered boxes.
    Joe O

  10. #10
    Quote Originally Posted by Joe O View Post
    It is important to note that the boxes are numbered, and
    1) The lower numbered boxes are more likely to contain a diamond than the higher numbered boxes.
    2) The higher numbered boxes are harder to open than the lower numbered boxes.
    Taking that a bit further, the difficulty of opening the boxes is proportional to the square of the box number, and the overall chance of finding a diamond (taking into account how hard it is to open the box as well as the likelihood of a given box containing a diamond) is inversely proportional approximately to the cube of the box number times the logarithm of the box number. Diamonds in higher numbered boxes are much harder to find. You really don't want to miss the easy ones, ever.

    The allure of progressing twice as fast is obvious, but the penalty for missing a prime is tremendous.
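    Rough numbers on that penalty, using the heuristic quoted above with the constant of proportionality dropped (only ratios matter here):

    ```python
    import math

    def payoff(n: float) -> float:
        """Relative chance of finding a diamond per unit of effort spent at
        box n, per the heuristic above: ~ 1 / (n^3 * log n)."""
        return 1.0 / (n ** 3 * math.log(n))

    # Doubling the box number divides the payoff by 2^3 times a log factor.
    # A diamond missed at box 10 must be re-found among much worse boxes:
    ratio = payoff(10) / payoff(20)
    print(f"box 10 pays off ~{ratio:.1f}x better than box 20")  # ~10.4x
    ```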

  11. #11
    Senior Member tim | Join Date: Jan 2003 | Location: WA/ND/CA | Posts: 177
    Mike, please check your email for my results.txt files.

  12. #12
    Greetings,

    about the double check discussion.

    I'd like to remind you guys that at least one of the primes was found via the second pass. That means a prime was missed by the first-pass tests; in other words, we already had a false negative, right in the SoB project (the one at ~3M).


    Chris

  13. #13
    Is there any word on the SeventeenOrBust project?

  14. #14
    Quote Originally Posted by jMcCranie View Post
    Is there any word on the SeventeenOrBust project?
    Nothing new to report.

    Rest assured that someone (probably me, unless Louie jumps in) will let you know any information as soon as we know anything.

    If I were Louie, I wouldn't give up until every last possibility was tried. And that may take a while.

  15. #15
    Senior Member | Join Date: Dec 2002 | Location: Australia | Posts: 118
    Just emailed a results.txt - close to 5 years worth

    If I can find the others I will send them too

  16. #16
    I also believe double checks are worth it. Basically, even with a 5% error rate, it's faster on balance to find a prime if we can complete a double check in 1/20th the time of an initial check.

    A recent change at GIMPS is to send out a double check assignment to everyone when they first join and once every year. This helps the project find bad computers quickly so their work can be double checked immediately. Sending out a double check at the beginning is also good in that new users get to finish something sooner.
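    That break-even argument can be sketched numerically. Assumptions for illustration: a 5% silent error rate, double checks costing 1/20th of a first-pass test, and mismatches re-run until two results agree (so essentially no prime is ultimately missed):

    ```python
    # Primes found per unit of computing, with and without double checks.
    ERROR   = 0.05      # chance a first-pass test silently misses a prime
    DC_COST = 1 / 20    # double-check cost relative to a first-pass test

    no_dc   = 1 - ERROR          # all effort on first passes, 5% of primes lost
    with_dc = 1 / (1 + DC_COST)  # ~5% fewer candidates tested, but none lost

    print(f"without double checks: {no_dc:.4f}")    # 0.9500
    print(f"with double checks   : {with_dc:.4f}")  # 0.9524
    ```

    On these assumptions double checking comes out slightly ahead, and it also delivers the bad-host detection described above as a side benefit.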

  17. #17
    Quote Originally Posted by AG5BPilot View Post
    Nothing new to report.

    Rest assured that someone (probably me, unless Louie jumps in) will let you know any information as soon as we know anything.
    I did say I'd let you know as soon as I heard anything, so here it is. It's not good news, unfortunately.

    There's no hope of recovering the data or software from the SoB server. It's gone. SoB is not coming back.

    Louie has asked us to take over the entire SoB search. We intend to do so, but I can't tell you exactly what that means. For now, we are crunching all 6 Ks in the 31M < n < 32M range, and we'll continue with that until we decide how to move forward.

    You are all, of course, welcome to come on over to PrimeGrid and help with SoB.

    I'd like, at this point, to sincerely thank everyone who sent us their log files. Of all the information we've been able to gather from different sources, I suspect that your log files may end up being the most useful, since they contain the most recent information about the largest tasks.

  18. #18
    Thanks for the update, bad news is better than not knowing. I've been peeking at this forum and PrimeGrid's forum every few days to see if there's been any updates.
