sieve stats are going crazy



Moo_the_cow
06-17-2003, 01:08 PM
I just looked at Mike H's sieve stats, and things don't look good
:(

First of all, my score dropped by 1500 points :cry:
Second, we have 28,000 MORE candidates to test :cry:
Third, biwema somehow makes up 66% of the
sieving effort, while 2 days ago he made up only 3-4% :shocked:

It appears that P-1 factoring is causing this.
I'm obsessed with stats, so unless this is fixed, what is
the point of P-1 factoring? :bs:

Important note: I am not blaming Mike H for this. He has done a
great job of constructing and maintaining the sieve stats, but unfortunately, the P-1 factors make almost all methods of
calculating scores ineffective :(

Moo_the_cow
06-17-2003, 01:45 PM
I investigated this problem a bit and it turns out that a
change in the results.txt file was also a reason for this problem,
not only P-1 factoring. My bad :blush:

Nuri
06-17-2003, 01:53 PM
Originally posted by Moo_the_cow
First of all, my score dropped by 1500 points :cry:
Second, we have 28,000 MORE candidates to test :cry:
Third, biwema somehow makes up 66% of the
sieving effort, while 2 days ago he made up only 3-4% :shocked:


The reason for the first two is that Louie changed the lower bound for results.txt from 1T to 3T. As far as I can see, Mike's program takes the data in results.txt and incorporates the changes in it, while assuming that the data below the file's lower bound is unchanged. Since the lower bound is now 3T, today's stats do not reflect the factors for 1T<p<3T. I'm sure Mike will fix this soon.

As far as the third point is concerned, you're right. It's because of P-1 factors. I'm sure that will also be fixed as soon as we decide how to score P-1 factors.

EDIT: Moo, you found the reason faster than I wrote. ;)

garo
06-17-2003, 03:23 PM
Woo hoo! I'm number 2 in the stats!!!:|party|:

Ah I better enjoy it till the stats are fixed.

biwema
06-17-2003, 08:30 PM
Hi,

It was easy to predict that the new distribution of the p-1 factors will mess up the statistics quite a bit. One factor could score up to 18 million (64 bits).
It was really funny to see myself at the top of the statistics, but believe me, it is quite embarrassing and I feel quite exposed. I hope that this will be fixed soon because this scoring is not fair at all. Sorry if I caused some trouble with that.

What can be done?
The best thing is to separate the factors found by sieving and p-1.
The scoring of the factors found by sieving is quite good, and we can leave it as it is. The factors found by p-1 should be scored in proportion to the effort they save, namely the PRP test we no longer need to run. Due to the fft-sizes, this saved effort will be more or less proportional to the computing effort of the p-1 test.

How can we separate these factors?
If the range is not too high, we can easily recognize the factors which are found by sieving, because a whole range is submitted by one user.
Factors found by p-1 lie somewhere beyond that and are therefore probably not in a reserved range.

Now it is possible that some users find big factors and want them scored as sieved factors. To prevent people from simply reserving the range around such a factor, we can do the following test:

If a big factor is found and there is a small range reserved around it:

* Did this user find 2 or more factors in that range (but the expected number of factors should not be significantly higher)?
If so, it is not very probable that p-1 finds 2 factors which are close to each other, and these factors might have been found by sieving.

* If there is one factor in the range: Is the expected number of factors in that range more than 0.2?
If not, it is very unlikely that this user just picked that range and found that factor by sieving. Hence, this factor is probably a p-1 factor.
If the expected number of factors is more than 0.2 and the factor is big, that range must be 100G or more, which would be quite suspicious for users who have not completed that much factoring before (check smoothness here). If the factor is not so big, the range is smaller, but in that case this user won’t get a high score with one factor. So he needs to do that with several factors, which would look suspicious (also check the smoothness of the factor).

* Check if p-1 is smooth.
The bigger the factor gets, the smaller the probability that it can be found with a b2 < 100 million. For example, all five factors beyond 1P which were found by sieving are not smooth (a b2 of more than 200 million would be necessary).
If not,

Using these 3 tests, it might be easy to separate all the factors. I guess these tests would not be necessary too often if people didn’t try to cheat.
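The smoothness test can be sketched in a few lines of Python. This is only a rough illustration, not anything the stats pages actually run. The function names and the bounds in the usage example are made up for the sketch, and the criterion is simplified: p-1 must consist only of primes up to b1, apart from at most one extra prime up to b2. Real p-1 programs also limit prime powers and only need the order of the base to be smooth, so treat this as an approximation.

```python
from math import isqrt


def is_prime(m: int) -> bool:
    """Trial-division primality test; only used on cofactors <= b2."""
    if m < 2:
        return False
    for d in range(2, isqrt(m) + 1):
        if m % d == 0:
            return False
    return True


def p_minus_1_reachable(p: int, b1: int, b2: int) -> bool:
    """Simplified check (an assumption of this sketch): is p - 1 made of
    primes <= b1, apart from at most one additional prime <= b2?"""
    rest = p - 1
    for q in range(2, b1 + 1):
        # Composite q never divides here, because their prime factors
        # have already been removed from rest.
        while rest % q == 0:
            rest //= q
        if rest == 1:
            return True
    # rest now has no prime factor <= b1.
    if rest == 1:
        return True
    # Stage 2 can pick up exactly one remaining prime, and only if it is <= b2.
    return rest <= b2 and is_prime(rest)


# Toy example: 211 - 1 = 2 * 3 * 5 * 7
print(p_minus_1_reachable(211, 5, 7))   # one leftover prime (7) within b2
print(p_minus_1_reachable(211, 5, 6))   # leftover prime 7 exceeds b2
```

A factor for which this returns False at the bounds people actually use is unlikely to have come from p-1, which is the point of the third test.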

Scoring of p-1 factors:
The goal is to set the weight of a successful p-1 to such a value that a computer scores as much with sieving as with p-1 factoring. I think it will be something like score=c*exp² or similar.

I hope we can clean up the sieving stats soon, because they are one of the main motivators for many sievers.

Nevertheless, have fun

biwema

MikeH
06-18-2003, 02:48 AM
Originally posted by Nuri
Since the lowerbound is changed to 3T, today's stats do not reflect the factors for 1T<p<3T. I'm sure Mike will fix this soon.

Sorry everyone, I had a 7 hour power outage at home last night, so I wasn't able to do anything. Hopefully all will be OK tonight.

Mystwalker
06-18-2003, 06:53 AM
biwema:

Don't put too much effort into distinguishing sieving factors from P-1 factors. Once I find the time for a project of mine, it will no longer be possible to circumvent the reservation criteria by reserving the range the P-1 factor lies in. It will still take 2-3 weeks from now, though...

MikeH
06-18-2003, 04:50 PM
OK, the stats are now using the new 3T file and updated results file, but they are not yet fixed. :swear:

I am experimenting with a few ideas (from biwema, others and myself), but basically I will make the score for a p-1 factor roughly equal to that of a PRP test of that n, but cap the n at a little (maybe 0.5-1M) above the top of the current PRP window.

I will also change stats for sieving so that again, the max score for any factor will be the same as that for a PRP test, but I'll make sure that any scores up to the point before p-1 factoring started will be maintained, so no one will lose out.

If I don't get this done in the next few days, it'll be about 10 days before I can get to it. Watch this space. P-1 factorers, enjoy your five minutes of fame. :D

Louie, I have noticed one potential issue/problem. In my Tuesday update I had a factor for garo (961094450858074349 | 67607*2^4022171). Neither this factor nor any other for this k/n pair can be seen in the current results.txt (which means garo goes from position 2 to 60). Where has it gone?
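For anyone who wants to double-check a reported factor by hand, here is a minimal Python sketch. It assumes the candidates have the form k*2^n+1; the post writes the pair without the "+1", so that part is an assumption of the sketch, and the function name is made up. Modular exponentiation keeps the numbers small, so nothing of size 2^4022171 is ever built.

```python
def divides(p: int, k: int, n: int) -> bool:
    """Return True if p divides k*2^n + 1 (the '+1' form is assumed here)."""
    # pow(2, n, p) computes 2^n mod p without forming the huge power.
    return (k * pow(2, n, p) + 1) % p == 0


# Toy checks with small numbers: 3*2^1 + 1 = 7 and 3*2^3 + 1 = 25
print(divides(7, 3, 1))
print(divides(5, 3, 3))

# The pair from the post would be checked like this:
# divides(961094450858074349, 67607, 4022171)
```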

jjjjL
06-18-2003, 05:03 PM
Originally posted by MikeH
Louie, I have noticed one potential issue/problem. In my Tuesday update I had a factor for garo (961094450858074349 | 67607*2^4022171). Neither this factor nor any other for this k/n pair can be seen in the current results.txt (which means garo goes from position 2 to 60). Where has it gone?

I forgot a -f in the gzip line of the script, so it wasn't updating the file. It's working again.

-Louie

MikeH
06-19-2003, 04:28 PM
I've made a quick fix to the stats that should keep everyone a little happier for the next two weeks, by which time I should have something a little better sorted.

The simple change is to cap all scores at 3500 points. This roughly equates to the sieve score you'd get for sieving for as long as it would take to PRP a candidate at n=5M.
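In code, the quick fix amounts to something like the Python sketch below. It assumes the cap is applied to each factor's score before a user's total is summed, which fits the earlier plan that no single factor should score more than a PRP test. The function names and the raw scores in the example are invented for illustration; only the 3500 figure comes from the post.

```python
SCORE_CAP = 3500  # cap quoted in the post


def capped_score(raw_score: float, cap: float = SCORE_CAP) -> float:
    """Limit a single factor's score to the cap."""
    return min(raw_score, cap)


def user_total(raw_scores: list[float], cap: float = SCORE_CAP) -> float:
    """Sum a user's factor scores after capping each one (per-factor cap assumed)."""
    return sum(capped_score(s, cap) for s in raw_scores)


# Hypothetical raw scores: one ordinary sieve factor, one huge p-1 factor,
# and one factor sitting exactly on the cap.
print(user_total([120.0, 18_000_000.0, 3500.0]))  # 7120.0
```

With the cap in place, a single 18 million point factor counts the same as any other factor at the limit, so one lucky p-1 hit can no longer dominate the table.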

Thus some normality is restored. :banana: