Server down?

**endless mike** · 05-20-2016, 11:22 PM

Originally Posted by jMcCranie

Yes, but the purpose of SOB is to try to prove the Sierpinski conjecture. Large primes are very rare and a false positive is extremely rare. Double-checking in SOB cuts the throughput down by half. In other words, double-checking essentially doubles the expected computing that has to be done to prove the conjecture.

Consider the highly unlikely possibility that there's only one prime for a given k. A false negative with no double checking means we crunch that k forever and never prove the conjecture. Unlikely to happen that way, but not impossible. People more in the know claim an error rate of about 4% (IIRC) on GIMPS. On the PrimeGrid message board, someone mentioned that a SOB work unit had to be sent out an average of 4.7 times to get a matching doublecheck. That post is three years old, but I can't imagine the situation is much different now. I still think double checking is valuable.

**AG5BPilot** · 05-21-2016, 07:06 AM

Originally Posted by endless mike

On the PrimeGrid message board, someone mentioned that a SOB work unit had to be sent out an average of 4.7 times to get a matching doublecheck. That post is three years old, but I can't imagine the situation is much different now. I still think double checking is valuable.

While I'm solidly in the "double checking is a necessity" camp (and I'm one of the people making the decisions), let me correct that "4.7" statistic. While it's true that some of our sub-projects require a lot of tasks to be sent out in order to get two matching results, it's not because the results don't match. It's because most of the tasks either don't get returned at all (or at least not by the deadline), or have some sort of error that prevents the result from completing. Bad residues are more common than I'd like, but they're not THAT common. Here's some hard data on our SoB tasks currently in the database:

SoB:
Completed workunits: 1089
Average number of tasks per WU (2 matching tasks are required, and 2 are sent out initially): 3.7805 tasks per workunit (4117 tasks total)
Number of tasks successfully returned but eventually proven to be incorrect: 61

As you can see, about 6% of the workunits had tasks that looked like they returned a correct result, but in fact didn't. These are SoB tasks -- the same as you run here. We use LLR, but it uses the same gwnum library as you do here, so the error rates are going to be comparable. LLR has lots of internal consistency checks, so many computation errors are caught and not even returned to us. That's just the ones that slipped through all the checks and made it to the end.

At PrimeGrid we detect the errors, so the user gets an immediate indication that something's wrong. On projects that don't double check, the users never know there's a problem, so the error rate might be higher.

The numbers are worse on GPU calculations. It's much harder to get GPUs to work at all, resulting in many tasks which fail immediately. On our GFN (n=22) tasks, which are GIMPS-sized numbers:

GFN-22:
Completed WUs: 2217
Tasks: 17996 (about 8 tasks per WU)
Completed but incorrect tasks: 85 (about 4%)

Some of those tasks are CPU tasks, but the vast majority are GPU tasks.

So there's your hard data: On the long tasks (SoB on CPU, GFN-22 on GPU), about 6% of workunits had seemingly correct results from CPUs which turned out to be wrong, and about 4% of the workunits had GPU tasks which were wrong.

(Frankly I'm surprised that the CPU error rate is higher than the GPU error rate.)
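For readers who want to check the arithmetic, here is a minimal sketch that recomputes the ratios from the counts quoted above. The dictionary layout and the `summarize` helper are made up for illustration; only the six counts come from the post. Dividing bad results by workunits matches the "per workunit" percentages only under the assumption that no workunit had more than one bad result.

```python
# Counts quoted in the post above; the structure and names are illustrative.
sob = {"workunits": 1089, "tasks": 4117, "bad_results": 61}
gfn22 = {"workunits": 2217, "tasks": 17996, "bad_results": 85}


def summarize(stats):
    """Return (tasks per workunit, bad results per workunit) for one sub-project."""
    tasks_per_wu = stats["tasks"] / stats["workunits"]
    # Assumption: at most one bad result per workunit, so this ratio can be
    # read as the share of workunits affected.
    bad_per_wu = stats["bad_results"] / stats["workunits"]
    return tasks_per_wu, bad_per_wu


for name, stats in (("SoB", sob), ("GFN-22", gfn22)):
    tasks_per_wu, bad_per_wu = summarize(stats)
    print(f"{name}: {tasks_per_wu:.4f} tasks/WU, {bad_per_wu:.1%} bad results per WU")
```

The SoB line reproduces the 3.7805 tasks per workunit given above, and both bad-result ratios round to the "about 6%" and "about 4%" figures.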

**engracio** · 05-21-2016, 04:18 PM

A while back we ran a double check up to 30M. After most WUs were returned, 22699K had only 1 WU left to be submitted. The higher k's had fewer than 10 WUs each still to be returned. Unfortunately, I am not sure if the results were matched against the previous results. Mike, any idea?

**AG5BPilot** · 05-21-2016, 04:37 PM

Originally Posted by engracio

A while back we ran a double check up to 30M. After most WUs were returned, 22699K had only 1 WU left to be submitted. The higher k's had fewer than 10 WUs each still to be returned. Unfortunately, I am not sure if the results were matched against the previous results. Mike, any idea?

None at all.

**jMcCranie** · 05-21-2016, 09:33 PM

Originally Posted by AG5BPilot

SoB:
Completed workunits: 1089

What is a "workunit"?

**AG5BPilot** · 05-21-2016, 11:50 PM

Originally Posted by jMcCranie

What is a "workunit"?

For the purposes of this discussion, "a candidate", i.e., a number to be tested, is a reasonable definition.

**jMcCranie** · 05-21-2016, 09:32 PM

Originally Posted by endless mike

Consider the highly unlikely possibility that there's only one prime for a given k. A false negative with no double checking means we crunch that k forever and never prove the conjecture. Unlikely to happen that way, but not impossible...

OK, so how about: no double checking to try to resolve the conjecture nearly twice as quickly, but if and when it gets down to only one k with unknown status, run a double check on those.

----Added----

I'll make an analogy. Suppose that there are a large number of boxes. A small number of boxes contain a diamond and you want to find diamonds. The first time you look in a specific box, if it contains a diamond, there is a 5% chance that you will not see it.

Should you (1) spend half of your time double-checking boxes you have already opened, or (2) open as many boxes as you can? I would open as many boxes as I can.

**Joe O** · 05-23-2016, 07:37 AM

Originally Posted by jMcCranie

OK, so how about: no double checking to try to resolve the conjecture nearly twice as quickly, but if and when it gets down to only one k with unknown status, run a double check on those.

----Added----

I'll make an analogy. Suppose that there are a large number of boxes. A small number of boxes contain a diamond and you want to find diamonds. The first time you look in a specific box, if it contains a diamond, there is a 5% chance that you will not see it.

Should you (1) spend half of your time double-checking boxes you have already opened, or (2) open as many boxes as you can? I would open as many boxes as I can.

It is important to note that the boxes are numbered, and
1) The lower numbered boxes are more likely to contain a diamond than the higher numbered boxes.
2) The higher numbered boxes are harder to open than the lower numbered boxes.

**AG5BPilot** · 05-23-2016, 10:14 AM

Originally Posted by Joe O

It is important to note that the boxes are numbered, and
1) The lower numbered boxes are more likely to contain a diamond than the higher numbered boxes.
2) The higher numbered boxes are harder to open than the lower numbered boxes.

Taking that a bit further, the difficulty of opening the boxes is proportional to the square of the box number, and the overall chance of finding a diamond (taking into account how hard it is to open the box as well as the likelihood of a given box containing a diamond) is approximately inversely proportional to the cube of the box number times the logarithm of the box number. Diamonds in higher numbered boxes are much harder to find. You really don't want to miss the easy ones, ever.

The allure of progressing twice as fast is obvious, but the penalty for missing a prime is tremendous.
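To put rough numbers on that scaling, here is a small sketch. It takes the two proportionalities from the post at face value (cost proportional to n squared, chance of a find per unit of work proportional to 1 / (n^3 log n)) and ignores the unknown constants, so only ratios are meaningful. The two box numbers are illustrative values, not project data, and the function names are made up for this example.

```python
import math


def relative_cost(n):
    """Cost of opening box n, up to a constant factor: proportional to n^2."""
    return n ** 2


def relative_yield(n):
    """Chance of a find per unit of work, up to a constant: 1 / (n^3 * log n)."""
    return 1.0 / (n ** 3 * math.log(n))


# Illustrative box numbers only (assumption): a "low" and a "high" box.
low, high = 3_000_000, 30_000_000

cost_ratio = relative_cost(high) / relative_cost(low)
yield_ratio = relative_yield(low) / relative_yield(high)

print(f"opening box {high:,} costs about {cost_ratio:.0f}x as much as box {low:,}")
print(f"work spent near box {low:,} is about {yield_ratio:.0f}x more likely to pay off")
```

The base of the logarithm does not matter here, because it cancels when the two yields are divided. Under these assumptions a tenfold increase in box number makes each box a hundred times more expensive to open and cuts the payoff per unit of work by more than a factor of a thousand.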

**tim** · 05-29-2016, 12:34 AM

Mike, please check your email for my results.txt files.

**chris** · 05-30-2016, 02:41 PM

Greetings,

about the double check discussion.

I'd like to remind you guys that at least one of the primes was found via secondpass - that means a prime was missed with the firstpass tests, aka we already had a false negative - right in the SoB project. (the one at ~3M)

Chris

**engracio** · 06-01-2016, 10:42 AM

All true. Just saying. Hate to go 10 miles down the road and find out we missed the turn.

Originally Posted by chris

Greetings,

about the double check discussion.

I'd like to remind you guys that at least one of the primes was found via secondpass - that means a prime was missed with the firstpass tests, aka we already had a false negative - right in the SoB project. (the one at ~3M)

Chris

**jMcCranie** · 06-01-2016, 07:17 PM

Is there any word on the SeventeenOrBust project?

**AG5BPilot** · 06-01-2016, 08:53 PM

Originally Posted by jMcCranie

Is there any word on the SeventeenOrBust project?

Nothing new to report.

Rest assured that someone (probably me, unless Louie jumps in) will let you know any information as soon as we know anything.

If I were Louie, I wouldn't give up until every last possibility was tried. And that may take a while.

**tqft** · 06-07-2016, 04:48 PM

Just emailed a results.txt - close to 5 years worth

If I can find the others I will send them too

**shifted** · 06-08-2016, 10:18 AM

I also believe double checks are worth it. Basically, if we have a 5% error rate, then it's even faster to find a prime if we can complete a double check in 1/20th the time of an initial check.
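The break-even point behind that claim can be sketched with a very simple model. The assumptions, none of which come from the thread: a first-pass test costs 1 unit and misses an existing prime with probability `miss_rate`; a double check costs `cost_ratio` units, is itself reliable, and the candidate being rechecked was as likely to be prime as the fresh one. The function names are hypothetical.

```python
def break_even_cost_ratio(miss_rate):
    """Largest double-check cost (relative to a first-pass test) at which
    rechecking finds primes at least as fast per unit of work.

    First pass:   yield per unit work is p * (1 - miss_rate) / 1
    Double check: yield per unit work is p * miss_rate / cost_ratio
    Setting the two equal gives cost_ratio = miss_rate / (1 - miss_rate).
    """
    return miss_rate / (1.0 - miss_rate)


def double_check_is_faster(miss_rate, cost_ratio):
    """True if rechecking beats fresh tests under the assumptions above."""
    return cost_ratio < break_even_cost_ratio(miss_rate)


miss_rate = 0.05  # the 5% error rate used in the post
print(break_even_cost_ratio(miss_rate))
print(double_check_is_faster(miss_rate, 1 / 20))
```

With a 5% miss rate the break-even ratio works out to 1/19, so a double check that takes 1/20th the time of a new test comes out slightly ahead in this model, in line with the estimate above.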

A recent change at GIMPS is to send out a double check assignment to everyone when they first join and once every year. This helps the project find bad computers quickly so their work can be double checked immediately. Sending out a double check at the beginning is also good in that new users get to finish something sooner.

**AG5BPilot** · 07-26-2016, 09:54 AM

Originally Posted by AG5BPilot

Nothing new to report.

Rest assured that someone (probably me, unless Louie jumps in) will let you know any information as soon as we know anything.

I did say I'd let you know as soon as I heard anything, so here it is. It's not good news, unfortunately.

There's no hope of recovering the data or software from the SoB server. It's gone. SoB is not coming back.

Louie has asked us to take over the entire SoB search. We intend to do so, but I can't tell you exactly what that means. For now, we are crunching all 6 Ks in the 31M < n < 32M range, and we'll continue with that until we decide how to move forward.

You are all, of course, welcome to come on over to PrimeGrid and help with SoB.

I'd like, at this point, to sincerely thank everyone who sent us their log files. Of all the information we've been able to gather from different sources, I suspect that your log files may end up being the most useful. They contain the most recent information, regarding the largest tasks. Those are most useful.

**endless mike** · 07-26-2016, 11:46 PM

Thanks for the update, bad news is better than not knowing. I've been peeking at this forum and PrimeGrid's forum every few days to see if there's been any updates.

**AG5BPilot** · 11-06-2016, 08:43 AM

Want some good news?

There's now 5 Ks remaining in the Sierpinski Problem.
