Works well on my systems, too (Win2k).
Got a nice speed increase on my Duron system: ~45kp/sec (NbeGon 010) -> ~80kp/sec (SoBSieve 1.12)
Speed of my Intel system stayed more or less the same, though...
I have no problems with the new version.
Tested on Intel P III 700 with win 2K
and AMD XP2000+ on Win XP pro.
Lars
www.Rechenkraft.net - most comprehensive German website about distributed computing projects
Yea! SoBSieve is much faster now... even faster than (the albeit more portable) NBeGon:
Athlon 1.33GHz WinXP
----------------------------------------------------
88k p/sec with NBeGon v0.10 (d=3.5)
117k p/sec with SoBSieve v1.22 (a=2.75)
33% speedup is always nice
I'd recommend people sieving in Windows use SoBSieve again. Ideally, Windows machines should go back to SB when sieving loses most of its steam... however, with all the speedups, it's hard to say where that will be exactly.
I remember initially setting a goal of 1T before n > 3000000 starts getting assigned. But hey, that was in the ancient history of... 8 days ago... back when NBeGon didn't exist, SoBSieve was v1.06, and the best rates being reported were around 9000 p/sec. Fast forward a week to the present, and we're up to around 100k p/sec on the same kinds of computers! That changes the optimal sieve limits quite a bit. Unfortunately, SB hasn't managed to go 11x faster in the last week like the sieve. Long story short: sieving higher than we planned makes sense now.
I think the new goal should be 5T. 10T would be closer to optimal but may be unrealistic to do in the next month. We'll see. If I put The Man to work on it, 10T may be realistic after all. What do you guys think?
-Louie
Phil's page has some info on optimal sieve depth. I'd calculate it myself, but I don't know how long a PRP test takes.
Yeah!!!!!!!!
SoBSieve v.1.22 alpha 2.75 ----- 60,000p/s
(PIII 800 MHZ)
ahhhh.... what a contrast to the prehistoric days
when everyone was sieving 10X slower.
I've been trying to estimate the optimal sieve depth for about a week now. I have a hunch it's in the hundreds of T.
I generally assume that a PRP test takes 1 day, that the project will go to 20M, and that an average machine is 1.5 GHz (roughly an Athlon 1.2, which is what I have).
I can sieve about 6G per day. As you can see from Nuri's graph, the rate of removal halves with each doubling of p. If you extend that graph, you get 0.12 factors per G (one per day) somewhere around 200T.
Here's another view, based on Phil's statement that the number of candidates remaining is inversely proportional to the natural log of p. As of 25G we had about 750,000 candidates. At 200T we will be down to about 550,000.
Sieving to 200T is a month's work, systemwide, and will remove 6-7 months' PRP work. Sieving to 400T is another month's work, and will remove an additional 12k candidates, or only a half-month's work.
So there you have it. Of course, PRP'ing will slow down, so we might go to 500T or a quadrillion or two, but at that point you're talking about very long timeframes and the ability to predict breaks down.
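As a rough check of the figures above, here is a minimal sketch that assumes the candidate count falls off as 1/ln(p), anchored at the 750,000 candidates reported at 25G. The function name and defaults are made up for illustration; nothing here comes from the sieve programs themselves.

```python
import math

def candidates_remaining(p, p0=25e9, c0=750_000):
    """Estimate candidates left after sieving to p.

    Assumption: the count scales as 1/ln(p), calibrated so that
    c0 candidates remain at depth p0 (750,000 at 25G in the post).
    """
    return c0 * math.log(p0) / math.log(p)

for depth in (200e12, 400e12):
    left = candidates_remaining(depth)
    print(f"{depth / 1e12:.0f}T: ~{left:,.0f} candidates left")

# Extra candidates removed by doubling the depth from 200T to 400T
removed = candidates_remaining(200e12) - candidates_remaining(400e12)
print(f"200T -> 400T removes ~{removed:,.0f} more")
```

Under that assumption the 200T figure comes out near 545,000 and the extra gain from 400T is a little over 11,000, which is in line with the rounded numbers in the post.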
yeah, you're right on the high sieve limit being optimal thing. there are several things your model most likely doesn't take into account that may cause you to overestimate the amount of sieving that should be done... such as repeated factoring of the same number, finding a prime for a given k, etc., but most of that can't be accurately predicted anyway. the rule of thumb for those of you who aren't as into the math should probably be:
if you want to sieve it, you probably should
in other words, I don't think that it is likely that we will ever reach the optimal sieve level no matter how many people sieve.
however, it is likely that we will get arbitrarily close to the optimal level. in other words, the levels of sub-optimal-ness of our sieve limit will not be too high so long as at least a decent threshold like p = 1 - 5T is achieved. if you were to picture it as a graph, it would look something like:
for those of you who are into the more intricate, mathy details, some of my comments on the matter can be found here:
http://www.free-dc.org/forum/showthr...&threadid=2424
-Louie
Hi guys,
First of all, the software will not work with the K6, sorry. That is because it requires the CMOV instructions, which are available on practically every other processor in use today. I should put a check in for that.
As for the message WARNING: 64 bit integers are badly aligned as qwords!, this is only a warning and while the software will run fine, it will just not be as fast as possible. This is something that I intend to fix today, if I can - it is probably just a few instructions to the compiler.
Regards,
Paul.
Originally posted by jjjjL
Yea! SoBSieve is much faster now... even faster than (the albeit more portable) NBeGon:
Yeah, NbeGon stopped development (or I stopped development on it), 30 seconds before I handed my source code and description to Paul. I realised that the only thing that I could do further was optimise, and as Paul said from the outset this shouldn't be an optimisation competition. I'd spent the previous week working out _algorithmics_, proof of principle, not optimising. As a proof of principle I don't think it was too bad.
Paul's software has since then had two major algorithmic boosts, one of which I had already put in NbeGon; the other isn't in there, it was merely <taps head> up here. So in theory I _could_ put together one final NbeGon, which would be at the same "tech" as Paul's, but not as highly optimised, as mine would be all C.
To be honest, I've not looked at the problem, save exchanging mails with Paul, for several days, I've had several other things on my plate. (Anyone want a wacky replacement to Pollard's P+1 factoring algorithm?).
Anyway, back on topic, I noticed mention of using candidate-by-candidate factoring algorithms in the thread, and it's worth trying to analyse to see what they say:
If we pluck a figure out of the air for sieving depth, say 10T, and pretend that the average machine does <100000p/s, then we conclude that the job's >1000 CPU days. What else could we do in that number of CPU days? If we were to sieve half way, then we'd have 500 days spare for factoring. Say we have 700000 candidates left (I'm in 1s.f. land here), then that's about 1 minute per candidate. You can't do anything useful in one minute. Not even 10 minutes would be useful. Once you know no factors are below 5e12, and you're time limited, you're looking probably at just running Pollard's P-1, with B1>10000, B2=B1*30, and on numbers of this size that's just prohibitive.
i.e. sieving 700000 candidates is _much_ more cost-effective than factoring.
The reason GIMPS use P-1 and ECM to factor candidates is because they _can't_ sieve. Even the old style trial-division NewPGen was probably more effective than factoring, and we're 100 times faster than that now.
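The back-of-envelope figures in this post can be reproduced with a few lines. This is only a sketch of the arithmetic, using the post's own round numbers (10T depth, 100,000 p/sec, 700,000 candidates, 500 spare CPU days); the helper names are made up.

```python
SECONDS_PER_DAY = 86_400

def sieve_cpu_days(depth_p, rate_p_per_sec):
    """CPU days needed to sieve up to depth_p at a given rate."""
    return depth_p / rate_p_per_sec / SECONDS_PER_DAY

def minutes_per_candidate(cpu_days, candidates):
    """Time budget per candidate if cpu_days were spent on factoring instead."""
    return cpu_days * 24 * 60 / candidates

total_days = sieve_cpu_days(10e12, 100_000)
per_candidate = minutes_per_candidate(500, 700_000)

print(f"sieving to 10T: ~{total_days:.0f} CPU days")
print(f"500 spare days over 700,000 candidates: ~{per_candidate:.2f} min each")
```

The first figure lands a bit above 1000 CPU days and the second at roughly one minute per candidate, matching the ">1000 CPU days" and "about 1 minute per candidate" quoted above.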
Phil
Phil said
Paul's software has since then had two major algorithmic boosts, one of which I had already put in NbeGon, the other of which isn't, it was merely <taps head> up here. So in theory I _could_ put together one final NbeGon, which would be at the same "tech" as Paul's, but not as highly optimised, as mine would be all C.
Phil, I would be very very grateful if you could put together that one final version. I've found that your Windows version is very easy to build into an automated p range distributor and factor collector.
10% faster now would mean 10% faster from here on in.
Paul's client is great on a 'regular' desktop PC, but not on unattended PCs.
Please, please, please. And how about the Linux guys - do it if only for them. Please, please please.
Is there a reason why nbegon 1.0 for windows would fail to write a batch file upon exiting? I'm playing around with it and the sob.bat file isn't appearing like it used to.
Hi Louie,
Would it be possible to package a new Sob.dat - esp since ceselb has returned 25-50G - with 1.22 and put that on the sieve page? I'm sure the number of tests has gone below 746k now.
Thanks,
Originally posted by dudlio
Is there a reason why nbegon 1.0 for windows would fail to write a batch file upon exiting? I'm playing around with it and the sob.bat file isn't appearing like it used to.
Odd. I decreased the progress reporting frequency by a factor of 8, and in theory didn't touch the batch file frequency (10 mins IIRC). It doesn't write a file when exiting, only when a 10 minute period expires, so if you do a really short run you won't get a batch file.
I'll be looking at NbeGon again this weekend, as I've just completed my new factoring algorithm. I shall bear this issue in mind as I reacquaint myself with my code.
Phil
Phil,
Could you change it to write a batch file when it exits?
thanks
PS: Any chances of a ver 010 or 011 for Sun?
OK, it works. I wasn't letting it run long enough. Next question: can Tk spawn a separate thread?
Heh. I can make a GUI that runs nbegon but the gui hangs lol.
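One common way around a hanging Tk GUI is to run the sieve in a background thread and let the GUI poll a queue. The sketch below uses Python's tkinter purely as an illustration (the post does not say which Tk binding is in use), and the `nbegon` command line shown in the usage comment is a placeholder, not a documented invocation.

```python
import queue
import subprocess
import threading

def start_worker(cmd, out_queue):
    """Run cmd in a background thread, pushing each output line onto out_queue."""
    def run():
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        for line in proc.stdout:
            out_queue.put(line.rstrip("\n"))
        proc.wait()
        out_queue.put(None)  # sentinel: the process has finished

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread

def build_gui(cmd):
    """Create a Tk window that displays the program's output without blocking."""
    import tkinter as tk  # imported here so start_worker also works headless

    root = tk.Tk()
    text = tk.Text(root, height=20, width=80)
    text.pack()
    lines = queue.Queue()
    start_worker(cmd, lines)

    def poll():
        # Drain whatever the worker has produced, then reschedule ourselves.
        try:
            while True:
                line = lines.get_nowait()
                if line is None:
                    text.insert("end", "[finished]\n")
                    return
                text.insert("end", line + "\n")
        except queue.Empty:
            pass
        root.after(200, poll)

    poll()
    return root

# Usage (hypothetical command; real arguments omitted):
#   build_gui(["nbegon"]).mainloop()
```

The key point is that only the main thread touches Tk widgets; the worker thread just fills the queue, so the event loop keeps running while the sieve works.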
A new sieve file is bundled with v1.22 of SoBSieve on the sieve page
http://www.seventeenorbust.com/sieve/
Reduced from 746961 --> 674551 candidates left.
All currently submitted factored numbers have been removed, but the lower bound is technically only p < 100 billion.
Just so you know, this file shouldn't increase performance or reduce memory usage. However, it will mean that fewer numbers have multiple factors submitted, so I recommend you upgrade anyway. It is possible that there may be a slight speedup since fewer factors will need to be verified. Enjoy.
-Louie
only been sieving for a day now...running v1.22 on a dell machine with a P4 1.8 ghz and 512 ram.
Only getting about 55000 p/sec with an alpha of 2.9
that seems awful slow to me for this computer....
Slatz
The PIV isn't really that fast at this.
My 1.5 gets 41kp/sec, so your figure is quite normal.
It seems like P4's aren't doing as well on sieving as compared to other processors. My guess is that this type of work isn't as heavily SSE2 optimized as the normal SoB client, so your numbers may seem bad compared to others. I was only getting around 55000 p/sec with a 2GHz P4 too, so don't think it's just you.
thanks for the replies...guess it will just crunch along at whatever speed it will go
Slatz
Originally posted by garo
Phil,
Could you change it to write a batch file when it exits?
thanks
PS: Any chances of a ver 010 or 011 for Sun?
The program doesn't exit cleanly unless it's finished. The only other way to stop it is to kill it. So detecting that requires signal handling, which I can do under most Unices, but I wouldn't know how well it translates into 'doze.
I stopped NbeGon at 009 on Sun because the maths in the floating point unit was slower than the integer unit, unlike all other processors. 010 relied on the FPU more, and 011 relies entirely on the FPU. I don't reckon that the small speedup I'll get in 010->011 will be enough to counter the slow-down. I'll try it, obviously, but if it's slower than 009, then I won't be surprised.
Phil
what would be the optimal alpha value in sobsieve 1.22 for an amd 2000+ running windows xp?
Unfortunately, a decent benchmarking option is missing.
The only chance I see is to set a value and look at the rate. But this is very inaccurate.
With NbeGon, there's a better way of finding the optimal alpha value, though...
I guess your alpha value should be a bit higher than the default 2.5 - it's 3.4 @ my Duron. Don't know if the architectural differences of the AthlonXP change this a lot...
Originally posted by FatPhil
The program doesn't exit cleanly unless it's finished. The only other way to stop it is to kill it. So detecting that requires signal handling, which I can do under most Unices, but I wouldn't know how well it translates into 'doze.
Phil
Okay! I don't know if anyone else is using Sun so it's probably not worth your time to put in a signal handler. Let's see if other people - esp Linux folks - want it.
Thanks for trying and seeing if the FPU on Sun can go faster. BTW, what machine do you have/are compiling on? I am running this on a dog slow Sparcstation5 110 MHz and still getting a decent 1700p/sec. If you do not mind sending me the src code I could see if I can try and optimize this on my machine. But since there really aren't going to be very many people using that machine I'll understand if you don't want to.
As a fairly serious user of NbeGon on 'dose right now, for me a signal handler would be no gain.
The client gets killed, I lose (at most) 10 minutes work. No problem.
Originally posted by garo
Okay! I don't know if anyone else is using Sun so it's probably not worth your time to put in a signal handler. Let's see if other people - esp Linux folks - want it.
Thanks for trying and seeing if the FPU on Sun can go faster. BTW, what machine do you have/are compiling on? I am running this on a dog slow Sparcstation5 110 MHz and still getting a decent 1700p/sec. If you do not mind sending me the src code I could see if I can try and optimize this on my machine. But since there really aren't going to be very many people using that machine I'll understand if you don't want to.
When I do an 011 build (this weekend) I'll gather together a source archive. I can't distribute the full source, but I can point you to the missing files - a marginally slower version of the prime generator (that mine's based on, and hence mine is not 'mine').
There's a far easier method of losing less than 10 minutes, and that's to have the save time more frequent. 1 minute fast enough for ya?
Phil
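The two options discussed in this exchange (saving more often, or catching the kill signal) can be combined. Below is a minimal sketch, not NbeGon's actual mechanism: the progress file here is a made-up single-line format holding the last p reached, whereas NbeGon itself writes a restart batch file. The class and method names are hypothetical.

```python
import signal
import time

class Checkpointer:
    """Save progress every `interval` seconds, and once more when asked to stop."""

    def __init__(self, path, interval=60.0):
        self.path = path
        self.interval = interval          # 60 s mirrors the "1 minute" suggestion
        self.last_save = time.monotonic()
        self.stop_requested = False

    def install_signal_handlers(self):
        # Must be called from the main thread. On Windows only a subset of
        # signals is supported by Python's signal module.
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self._on_signal)

    def _on_signal(self, signum, frame):
        # Only set a flag; the main loop does the actual save and exit.
        self.stop_requested = True

    def save(self, p):
        with open(self.path, "w") as f:
            f.write(f"{p}\n")
        self.last_save = time.monotonic()

    def maybe_save(self, p):
        """Save if a stop was requested or the interval has elapsed."""
        due = time.monotonic() - self.last_save >= self.interval
        if self.stop_requested or due:
            self.save(p)
            return True
        return False
```

A sieve loop would call `maybe_save(p)` after each block of work and break out once `stop_requested` is set. A hard kill (for example from the Windows task manager) cannot be caught this way, which is why the short save interval is the more robust of the two.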
Oh yeah!! I think 1minute is cool! Don't worry about anything else. Just make it 1min
Originally posted by MikeH
As a fairly serious user of NbeGon on 'dose right now, for me a signal handler would be no gain.
The client gets killed, I lose (at most) 10 minutes work. No problem.
Actually, I found that if you take a screenshot right before you kill the program (alt-printscrn when you have focus on nbegon), you can use the last number it outputted and start it from there. Then you lose almost nothing besides the time it took you to end and restart the program.
You can just mark the p number and press the right mouse button (under W2k; with other Windows versions, it's a little different). That way, you copied the value and can paste it into the batch file.
A question for Phil. Which version of GCC did you use for the Mac OS X version of your sieve? What level of optimization was used?
--Mark
random stats:
667094 candidates 3000000 < n < 20000000
243210 factors submitted total
143218 factors submitted by me [i did do the lowest range plus i have access to super computers]
17647 unfinished tests n < 3000000
70 factors submitted for n < 3000000 [by me and one other person]
all in all, things are looking good.
-Louie
"""Originally posted by rogue
A question for Phil. Which version of GCC did you use for the Mac OS X version of your sieve? What level of optimization was used?
--Mark
$ gcc --version
[xxxxxxxx:~/Temporary Items/Prime] music% gcc --version
gcc (GCC) 3.1 20020420 (prerelease)
$ uname -a
Darwin xxxxxx.xxxxx.xxx.xxx 6.3 Darwin Kernel Version 6.3: Sat Dec 14 03:11:25 PST 2002; root:xnu/xnu-344.23.obj~4/RELEASE_PPC Power Macintosh powerpc
"""
Compilation options -O3
The problem with the Mac is that it's only a 32-bit processor, and, like the P4, is relatively underpowered when using just the ordinary FPU. You've got to use the SIMD instructions to get it to fly.
A 32-bit processor is seriously hindered for this kind of work, as a 64-bit product requires 4 times the work.
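The "4 times the work" figure follows from schoolbook multiplication: with 32-bit halves, one 64x64-bit product needs four 32x32-bit partial products. The sketch below only illustrates that operation count; it is not taken from either sieve, and the function name is made up.

```python
MASK32 = 0xFFFFFFFF

def mul64_via_32(a, b):
    """Multiply two 64-bit unsigned ints using only 32x32 -> 64-bit products."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # The four partial products a 32-bit machine has to compute.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Recombine with the appropriate shifts to get the full 128-bit result.
    return ll + ((lh + hl) << 32) + (hh << 64)

a, b = 0xFEDCBA9876543210, 0x0123456789ABCDEF
print(mul64_via_32(a, b) == a * b)
```

On real hardware the recombination also costs additions with carry, so four multiplies is a lower bound on the extra work rather than the whole story.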
Our baby-steps is unusual in that we don't double or halve, we do full modular multiplies, and with the x86 it's possible to do some of those entirely in the FPU. With the less capable FPUs (all other chips), there isn't that option. So non-x86 are partly at a disadvantage, but PPCs are doubly disadvantaged.
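For readers who have not met the term, here is a textbook baby-step giant-step search for an exponent n with p dividing k*2^n + 1, handling a single k. It is the generic form only: the baby steps below advance by doubling, which is exactly what the post says the real code does not do, so treat it as background rather than a description of NbeGon or SoBSieve. It assumes p is an odd prime that does not divide k, and needs Python 3.8+ for modular inverses via `pow`.

```python
from math import isqrt

def bsgs_find_n(k, p, n_max):
    """Return the smallest n in [0, n_max] with k*2^n + 1 == 0 (mod p), or None.

    Assumes p is an odd prime not dividing k.
    """
    # k*2^n + 1 == 0 (mod p)  <=>  2^n == -k^(-1) (mod p)
    target = (-pow(k, -1, p)) % p
    m = isqrt(n_max) + 1

    # Baby steps: remember the smallest j with a given value of 2^j, 0 <= j < m.
    baby = {}
    cur = 1
    for j in range(m):
        baby.setdefault(cur, j)
        cur = cur * 2 % p

    # Giant steps: multiply the target by 2^(-m) until it hits a baby step.
    giant = pow(2, -m, p)
    gamma = target
    for i in range(m + 1):
        j = baby.get(gamma)
        if j is not None:
            n = i * m + j
            if n <= n_max:
                return n
        gamma = gamma * giant % p
    return None

print(bsgs_find_n(3, 7, 100))  # 3*2^1 + 1 = 7, so n = 1
```

The cost is about 2*sqrt(n_max) modular multiplies per prime instead of n_max, which is where the large speedups over trial division come from.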
Sorry, that's just the way that things are.
My bet is that it really flies on a proper Power4 chip, however...
Phil
And it's also unfortunate that AltiVec can only handle 32 bit floats and not 64 bit ones. I was curious as to the version of GCC as I know (from personal experience) that GCC 3.1 optimizes much better than GCC 2.95 in OS X. I've seen 10 to 15 percent gains on stuff I've written.
I don't have access to a Power4, but if the rumors are true about Apple using IBM's 970 chip (a stripped down Power4), then one would get both 64 bit support and 2 FPUs to boot.
In case you or anyone else is curious, I get about 6400 p/sec with d=2.0 on a G4-500. It's a little higher with d<2.0, but there are many more rehashes. At the current rate, I will finish the range in about 9 days.
--Mark
Well, here is a 1.24 release. This contains the following:
- minor GUI changes to ensure that it all looks nice;
- the selection of displaying rate or status is toggled by clicking in the box, rather than through a menu item;
- when the range is finished, the data is automatically placed in the clipboard ready to be pasted into the web page;
- the priority and info display selection are saved and restored when you run again;
- there may be a slight performance gain on some processors - I see a boost for the Athlon; no gain for PIII.
Regards,
Paul.
Originally posted by paul.jobling
- the selection of displaying rate or status is toggled by clicking in the box, rather than through a menu item;
Cool
Originally posted by paul.jobling
- there may be a slight performance gain on some processors - I see a boost for the Athlon; no gain for PIII.
8% increase for my P4-1700. Even cooler.
Nice job Paul, thanx.
Originally posted by paul.jobling
I see a boost for the Athlon; no gain for PIII.
Well, ~5% speed increase on my Duron, but 11% on my P3!
Good job, Paul!
Here are some "alpha benchmark" results of my Celeron Tualatin @ 10x112 MHz (SoBSieve 1.24):
Alpha | ~p/s
------------------
1.000 | 92000
1.500 | 96500
2.000 | 98000
2.250 | 98000
2.500 | 98000
2.750 | 97500
3.000 | 97000
3.250 | 96500
3.500 | 95500
5.000 | 90500
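Since there is no built-in benchmark, the "set a value and look at the rate" approach boils down to tabulating rates and picking the best. A minimal sketch using the figures from the table above (the helper name is made up):

```python
# (alpha, approximate p/sec) pairs copied from the table above
results = [
    (1.000, 92000), (1.500, 96500), (2.000, 98000), (2.250, 98000),
    (2.500, 98000), (2.750, 97500), (3.000, 97000), (3.250, 96500),
    (3.500, 95500), (5.000, 90500),
]

def best_alphas(results):
    """Return every alpha whose measured rate ties for the maximum."""
    best_rate = max(rate for _, rate in results)
    return [alpha for alpha, rate in results if rate == best_rate]

print(best_alphas(results))  # [2.0, 2.25, 2.5]
```

The plateau from 2.0 to 2.5 suggests the rate is not very sensitive to alpha near the optimum on this machine, so small measurement errors matter less than the table's rounding might imply.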
yay i got about 10% increase on my p3 600. great job
I have a request for one minor change for NbeGon. Can you put in an option to state how frequently you want to see output? Currently it is 10 seconds, but it would be nice to change that to minutes or even hours. Thanks.
--Mark
Originally posted by rogue
I have a request for one minor change for NbeGon. Can you put in an option to state how frequently you want to see output? Currently it is 10 seconds, but it would be nice to change that to minutes or even hours. Thanks.
--Mark
I was going to implement it that way. I've not had a chance to work on it yet, as when I finished my other project I fell ill with a flu which has prevented me from thinking straight. I'm psyching myself up for tackling it tomorrow.
Phil