Has anyone given any thought to an FPGA siver for SoB?
For anyone who doesn't know, an FPGA is a Field Programmable Gate Array. You design a circuit in a procedural language and it is implemented via circuit-logic.
Traditionally they are no good at mathematical operations (division, exponentiation, multiplication by non-powers of 2) but I think there may be a way to implement a sieve for proth numbers relatively easily.
I've got an FPGA development kit (costs around $120 - Spartan-3 from Xilinx) and I'm willing to have a go but I've never done anything near as complicated as this with it.
A naive sieve, that would be difficult to implement in an FPGA, would be to calculate N=(k*2^n)+1 and use trial division by increasing p's, however this just wouldn't fly.
In a seperate project I'm trying to implement a reasonably fast pure C (i.e. portable to non-x86 architectures) siever and I'm using the principle of the cycle generated by computing the remainder for n=1 for a given k and p and then iterating this function.
Given r=(k*2^n)+1 mod p
The remainder for n+1 would be:-
(2*(r-1))+1 mod p
or 2r-1 mod p.
If r is ever equal to 1 then it will remain at 1 for ever more. You would not bother computing it any further. p will never factor k*(2^n)+1 whatever n you use.
If r is ever equal to 0 then you have a factor. (You can also just set r=p-1 for the next iteration).
The algorithm looks like this:-
Code:
Consider a given k and p.
Compute initial_r=k*2+1 mod p. (This is the remainder for n=1)
r = initial_r; /* Thanks for spotting this missing line Joe O */
if( r == 1 ) {
stop. p will never be a factor for any n.
}
set n=1
while( n <= upper_bound_n)
if( r == 0 ) {
factor found at n. record f1=n. stop.
}
r=r<<1; /* left shift r by one bit == multiply by 2 */
r--; /* decrement r by 1 */
if( r >= p ) {
r=r-p; /* simple subtraction is quicker than modulus */
}
n++;
if( r == initial_r ) {
stop. in a non-factoring loop of period n-1
}
}
(You don't have to stop when r=0, you can continue by setting it to p-1 and waiting for it to reach r=initial_r. When it does you know that the remainder runs in a cycle of period=n-1. This also tells you that p factors k*2^n+1 for n = f1, f1+period, f1+2*period, f1+3*period, etc...
As far as an FPGA goes, the code in the while loop is relatively easy to do. It only involves comparisons, a left shift, decrement by one (r is guaranteed to be >1 so this is simple), a 'complex' subtraction (FPGA's have ALU's to support this) and increment by one.
My FPGA has 200,000 gates, which gives 4320 logic cells equivalents.
As a wild guess, I think the design would fit into ~100 logic cells, which would mean roughly 40 concurrent tests could be performed at the same time.
The kit I have would interface to a controlling PC via Parallel port direct to the I/O pins, or via the onboard RS-232 Serial port.
The controlling computer would do the initial maths that FPGA's aren't good at:-
- pick k value
- pick p value
- compute intial_r=k*2+1 mod p
- send these to a piece of code on the board that listens on the port and stores the data in the 1M-byte of onboard memory.
The individual sievers speak to this controller code and ask for a chunk of work. The work is marked as given out and off the siever goes.
When it finishes it tells the controller code the result and it stores this in the memory. This can be retreived by the controlling computer via the serial/parallel port.
The controlling computer would simply be sending it work and grabbing back the results.
What you have to get your head around when dealing with FPGAs is that everything can happen at once, it's not a processor following instructions. For example, the 3 steps involving r (left shift, decrement and comparison) will have to executed one after each other, but at the same time it can be decrementing n.
Now imagine 40 sievers on the same board, all running at the same time. For a bit more money you can get larger and larger FPGAs.
Also the on board control code (which handles the interface between controlling computer and memory, and sievers and memory) does not take up any 'time' of the FPGA's processing, only chip 'real-estate' (i.e. gates, logic cells).
The board is only 50MHz, but with 40 sievers you're getting the same performance as a 2GHz single instance siever on the same chip. The actual level of performance is completely unknown until I implement it.
If you have a look at some of the other things that FGPA's are good at (encryption) you'll see that $1000 spent on FPGA's performs hundreds, if not thousands, of times faster than $1000 of PC.
The recent FPGA hack on the encryption used on RFIDs was a good example. http://rfidanalysis.org/. Cracking one key would have taken 2 weeks using 10 very fast PCs. A single $200 FPGA was able to do the same work in just over 10 hours. With 16 FPGAs (cost ~ $3500) they cracked 5 unique keys in 2 hours (compared to 10 weeks for 10 very fast PCs).
The board also has a few other nifty features (push button switches, 8 LEDs, 4 7 segment displays) and a VGA-port . Not sure what I'd do with that as the VGA controller would take up lots of space on the FPGA but it could be amusing. At least a basic screen using blocks of colour to indicate progress/status.
I will let you know of the progress although my time is a bit limited in the next few months due to fact I have to go watch us England get beaten by the Australians at cricket over and over again (but you can't beat a whole weekend drinking in the sun).
Anyway, if anyone has any FPGA experience or would like to comment/suggest/advise please let me know.