PDA

View Full Version : $10,000



jlcooke
06-16-2004, 07:10 PM
http://www.certainkey.com/md5challenge/

Disclaimer: Jean-Luc is a co-founder of CertainKey.

jlcooke
06-17-2004, 12:52 AM
1.0.7 Client released.

Changelog: http://www.md5crk.com/dl/CHANGELOG.txt
Go get 'em: http://www.md5crk.com/?sec=download

Digital Parasite
06-17-2004, 08:30 AM
Any press pick up on the release yet?

jlcooke
06-17-2004, 08:44 AM
Originally posted by Digital Parasite
Any press pick up on the release yet?

Ottawa Sun Pg 58 (Money Section) Photo and all.
Ottawa Citizen Pg C3 (Business Section, fron page banner mention as well) no photo.

The write ups are as you'd expect for a newspaper, but still good to see. Next: Local TV, national newspapers, newswire services.

Digital Parasite
06-17-2004, 10:50 AM
Cool picture with all the bills in a briefcase. So did you know that you were going to get your picture taken or do you always carry that much cash on you? :D

jlcooke
06-17-2004, 11:09 AM
Originally posted by Digital Parasite
Cool picture with all the bills in a briefcase. So did you know that you were going to get your picture taken or do you always carry that much cash on you? :D

I usually carry more. :P No, we just robbed (:spank: ) a few politicains on the way to the shoot.

Mystwalker
06-17-2004, 11:50 AM
Share of prize money:

User(s) finding input to collision

This will most likely be two persons, as there have to be to disjoint inputs in order to get the collision, right?


Project Organizers $5,000

Hm, considering that next to noone else is going to set up such a challenge (well, I think there already is this other project) ans that the project organizers of this project are those who made the reward, this more or less results in $5,000 really leaving the company... ;)

2 additional questions:

- Is the website provider capable of bearing the stress that will be put on the website once this message hits certain news sites (eg ./ or heise)?

- Is the "Probability of Completion At Current Rate" correct?

24 Hrs 1 : 1,234,831
30 Days 1 : 683,901
365 Days 1 : 45,987
730 Days 1 : 14,084

AFAI get it, the 30 Days percentage should be at least 30 times better than the 24 Hrs percentage. This statistic could play a major role if someone consideres to participate in this project...

jlcooke
06-17-2004, 12:06 PM
Originally posted by Mystwalker
This will most likely be two persons, as there have to be to disjoint inputs in order to get the collision, right?


Yes very likly.



Hm, considering that next to noone else is going to set up such a challenge (well, I think there already is this other project) ans that the project organizers of this project are those who made the reward, this more or less results in $5,000 really leaving the company... ;)


Organizers (in my view) include CORE developers (very few of my orginal cores are still used), translators, proof readers, etc. I'd say I'd end up getting <$2k of that money. Maybe less depending if I hand the project over sometime in the future...doubt I will though.



- Is the website provider capable of bearing the stress that will be put on the website once this message hits certain news sites (eg ./ or heise)?


Yes. It's hosted at 1and1.com. They can take it.



- Is the "Probability of Completion At Current Rate" correct?

24 Hrs 1 : 1,234,831
30 Days 1 : 683,901
365 Days 1 : 45,987
730 Days 1 : 14,084

AFAI get it, the 30 Days percentage should be at least 30 times better than the 24 Hrs percentage. This statistic could play a major role if someone consideres to participate in this project...

The probability calculation is done like this:
N = Total DPs found so far + DPs in the next X hrs/days assuming current rate
Prob = <html> 1 /(1 - e<sup>N*(1-N) / 2<sup>129</sup></sup>) </html>

As you cacn see, the value of Prob increases much faster the closer you get to C * 2^64. So the linear assumption used in d.net and other "key search" type distriburted project doesn't apply.

Mystwalker
06-17-2004, 01:31 PM
Originally posted by jlcooke
Yes. It's hosted at 1and1.com. They can take it.

Yeah, I know. I use them, too. :)



The probability calculation is done like this:
N = Total DPs found so far + DPs in the next X hrs/days assuming current rate
Prob = <html> 1 /(1 - e<sup>N*(1-N) / 2<sup>129</sup></sup>) </html>

As you cacn see, the value of Prob increases much faster the closer you get to C * 2^64. So the linear assumption used in d.net and other "key search" type distriburted project doesn't apply.

That's what I meant with "at least". I know it's even better, so my question is:

Why is "30 Days percentage" < 30 * "24 Hrs. percentage"? (It's even smaller than 2 * "24 Hrs. percentage")
With a linear assumption, it would be equal (leaving out special events like slowdown on weekends etc.), thus the 30 Days percentage should be even better than 30 * 24 Hrs. percentage...

jlcooke
06-18-2004, 02:10 PM
N_1day = (A + R) * 2^32
N_30day = (A + 30*R) * 2^32

Prob_1day = 1 / (1 - exp(N_1day * (1-N_1day) / 2^129 ) )
Prob_30day = 1 / (1 - exp(N_30day * (1-N_30day) / 2^129 ) )

Use a spreadsheet, you'll see the curve.

GP500
06-18-2004, 06:21 PM
Nice that there is a price.


"current rater" why do you take 24 hours because in 24 ther rate goes from min. too max.

not a nice average.
I's rather see you take a 7 days as an current rate.
Too take out the daily swing and the weekly swing.

the getting out of bed :sleepy: & the weekend hangover :bonk: . :rotfl:

jlcooke
06-19-2004, 11:37 AM
Originally posted by GP500
Nice that there is a price.
I's rather see you take a 7 days as an current rate.


Getting the 7 day rate may be possible. I'll have to make minor changes to the DB and nightly scripts.

I also welcome input on a proposed distribution of prize money: http://www.md5crk.com/?sec=prize


JLC

jkr
06-19-2004, 02:09 PM
I agree with Mystwalker: the 30 days chances should be atleast 30
times better than the 1 day chances.

Starting at the beginning of the experiment, the probability
that we have no collision after calculating N md5 values is roughly

exp( -N^2 / ( 2T ))

where T is 2^128, the total number of possible md5 hash values.
So the probability of a collision before calculating N md5 calculation is

1 - exp( -N^2 / ( 2T ) )

So the one in n chances type number is given by

1 / ( 1 - exp( -N^2 / ( 2T ) ) )

Jean-Luc has a similar expression, but has -N(N-1) where I have -N^2.
Jean-Luc's expressions are more accurate for small N, but make little
difference when N is several million.

However, what the web-site is should display is the conditional
probability that a collision will occur in the next 1, 30, or 365 days
given that it hasn't occured so far.

We want to know the chances of a collision in the next R samples given
that we haven't had a collision after the initial A samples. We
approach the calculation like this:

The probability of no collision after A+R samples, given that there
hasn't been a collision after A samples is

exp( -(A+R)^2 / (2T) ) / exp( -A^2 / (2T) )

= exp( ( A^2 - (A+R)^2 ) / 2T )

= exp( - R ( 2A + R ) / (2T) )

So probability of collision in the next R samples, given no collision
after the first A samples is

1 - exp( - R ( 2A + R ) / (2T) )

and the 1 in n type chances of a collision are

1 in 1 / [ 1-exp( -R(2K+R) / (2T) ) ]

In the comparatively short timescales of 1 and 30 days, we have

1-exp( -R(2K+R) / 2T) = roughly R(2K+R)/2T

so the expressions for 1 and 30 days, writing R for the number of
samples per day, will be

Prob_1_day = 2T / ( R(2K+R) )
and Prob_30_day = 2T / ( 30R(2K+30R) )

These expressions give the intuitively expected result that 30 days are
atleast 30 times more likely to produce a collision than 1 day.

jlcooke
06-19-2004, 05:41 PM
let me get some data for you in a second...

N = DPs upto today
R = DPs in last 24hrs

Prob_now = 1 / (1 - exp(N(1-N) / 2^129) )
Prob_now = 1 / (1 - exp(N-N*N / 2^129) )

Prob_1day = 1 / (1 - exp((N+1R)-(N+1R)(N+1R) / 2^129) )
Prob_T-day = 1 / (1 - exp((N+TR)-(N+TR)(N+TR) / 2^129) )
= 1 / (1 - exp((N+30R - N^2 -2TRN -(TR)^2) / 2^129) )
Gamma = N+TR - N^2 -2TRN -(TR)^2
= N - N^2 -(2RN+R)T -(R^2)T^2

The Gamma term is a parabolic term in T. The ratio of 1day to 30day probability is less than the linear expectation. As time goes on, the radio will "accelerate" if you will. Even if you use the simpler Gamma = -(N+TR)^2 term, you'll see a parabolic curve.

Hope this helps.

JLC

Mystwalker
06-19-2004, 06:29 PM
Originally posted by jlcooke
I also welcome input on a proposed distribution of prize money: http://www.md5crk.com/?sec=prize

Well, first we have the usual suspects:

- Free Software Foundation
- Electronic Frontier Foundation

Then maybe:
- Wikipedia
- Mozilla
- "MyPersonalPensionFond" Oops, how did that come here?!? :jester:

How will the donation work? Does only the most voted institution get all the money or if that part of the prize spread across them according to the distribution of the votes?


Concerning the stats maths, I'll have to get me a pen and some sheets of paper and do some calculations. Already began with this and found out some things (e.g. that in the long run (starting from a couple of months), my version doesn't differ that much from Jean-Luc's when it comes to the actual likeliness). But I still have to figure out a lot more.
I'm just interested in the math that lies behind all these projects. :thumbs:

jlcooke
06-19-2004, 06:35 PM
Originally posted by Mystwalker
I'm just interested in the math that lies behind all these projects. :thumbs:

That's what get people interested ;)

ps. we're getting about 40 new users every day now. Can someone with contacts @ slashdot or other large online news sites drop them a line? Use the prize money as an angle.

jkr
06-20-2004, 08:55 AM
The statistics this morning show:

Number DP reported so far: 5,645,904
Number DP reported in 24 Hrs: 58,655

so these give

Number of MD5 calculated so far: 5645904 x 2^32
Number of MD5 calculates in 24 hours 58655 x 2^32

Using these numbers, we can draw the following graph:

http://www.arrows.demon.co.uk/md5crack/graph1.gif

The Red Line is calculated using (nearly) the Probability of completion expression used on the Stats web page:

p_completion = 1 - exp( -n*n/(2*t) )

where
n = Number-so-far + days * number-per-day
t = 2^128

The green line is based on the conditional probability of finding a collision in the next x days given that we haven't found a collision so far, which gives an expression

conditional_p = 1 - (exp( -(N + x*D)^2 / (2T) ) / exp( -N^2 / (2T) ) )

where
N is the number of MD5 calculations so far
x is the number of days
D is the number of MD5 calculations per day
T is the total number of possible MD5 values, 2^128

Notice how the conditional probability curve starts at 0, showing that the probability of completing in the next nano-second is zero, and gradually curves upwards from linear.

----------------------------------------------------------------------------------------------------------
I append the commands I passed into GNUPLOT to draw the graph.......
----------------------------------------------------------------------------------------------------------

##
## Input file to GNUPLOT to draw curves showing probability of completion
## of the md5crk project in the next few days...
##
## John Smith, 20 June 2004
##
##

p_incomplete( n, t ) = exp( -n*n/(2*t) )

prob_complete(n,t) = 1-p_incomplete(n,t)

total_md5 = (2.0 ** 128)

##
## statistics taken from md5crk website, 20 June 2000, 10:00 GMT
##
daily_rate = 58655.0 * (2.0 ** 32)
current_total = 5645904.0 * (2.0 ** 32)

set xrange [0:60]

conditional_p_incomplete(now,extra,t) = \
p_incomplete(now+extra,t)/p_incomplete(now,t)

conditional_prob_complete(now,extra,t) = 1-conditional_p_incomplete(now,extra,t)

set xlabel "Extra Days of Calculation"
set ylabel "Probability"

set label 1 "Daily Rate: 58655 * 2^32 MD5 per day" at graph 0.1,0.8
set label 2 "Current Total: 5645904 * 2^32 MD5 per day" at graph 0.1,0.75

set label 3 "Prob complete after 0 day: raw: 8.64e-07 " at graph 0.1,0.65
set label 4 "Prob complete after 1 day: raw: 8.62e-07 conditional: 1.80e-08" at graph 0.1,0.60
set label 5 "Prob complete after 30 day: raw: 14.87e-07 conditional: 62.24e-08" at graph 0.1,0.55


plot prob_complete( current_total + x*daily_rate, total_md5 ) title "Probability complete", \
conditional_prob_complete( current_total,x*daily_rate,total_md5) title "Conditional Probability Complete"

jlcooke
06-20-2004, 11:53 AM
Ok so. You you still dont' agree that probDay30 < 30*probDay1?

Check out mathworld's equations for birthday probabilities. they use the same as mine (-n(1-n)/2^2d).

I will make you table that shows how probDay30 < 30*probDay1 UP TO A CERTAIN CRITICAL POINT and then probDay30 > 30*probDay1 takes over.

JLC

Mystwalker
06-20-2004, 01:41 PM
Originally posted by jlcooke
we're getting about 40 new users every day now.

New users or addiotional computers? The new user rate is above average, but 8 << 40... ;)


Can someone with contacts @ slashdot or other large online news sites drop them a line? Use the prize money as an angle.

Just send a mail to Heise, one of german's biggest online news sites. I guess there will be some bandwidth demands on Monday or Tuesday... ;)

jkr
06-20-2004, 05:34 PM
I'm still worried about the expressions used on http://www.md5crk.com/stats/ for the probabilities of completion. I've tried to explain, without success, that conditional probabilities are needed. This time I'll try a reductio ad absurdum argument.

Using the expression
N = DPs upto today
R = DPs in last 24hrs

Prob_T-day = 1 / (1 - exp((N+TR)-(N+TR)(N+TR) / 2^65))

with N= 5,663,785 and R = 56,268 we can calculate the following table of
probability of completion at current rate:

1 us 1: 1150102.3
1 ms 1: 1150102.3
1 second 1: 1150102.0
1 minute 1: 1150086.4
1 hour 1: 1149150.7
1 day 1: 1127586.5
30 day 1: 682589.8
365 day 1: 53740.0
730 day 1: 16888.7
10 years 1: 828.8
100 years 1: 9.2
200 years 1: 2.7

The values for 1, 30, 365 and 730 days agree with the current figures on
http://www.md5crk.com/stats/. (In a few days time they will differ but, trust me, today they match.)

My point is that it goes against all intuition to suggest that the chances of success in the next microsecond are roughly the same as the chances of success in the next day. One would reasonably expect the chances of success in the next day to be about 85 billion times greater than the chances of success in the next microsecond.

One can go further and calculate the probability of completion yesterday, that is after -1 more days.
the result is 1:1173299. This is clearly wrong. The probability that we completed yesterday is zero.
We know we didn't complete! (OK, this paragraph is a bit suspect, but it makes one worry)

I have checked mathworld's equations for birthday probabilities. They give probabilities that a room with n people in it will have two with the same birthday. They don't give the probability that, starting with a room of N people with different birthdays, then adding another R people will cause a clash of birthdays.

Am I making any sense?

John

jlcooke
06-20-2004, 08:51 PM
Yes I see your points:
1) prob @ -1 days doesn't make sense since we didn't complete
2) 1us from now is about the same as 1 day from now - doesn't make sense.

The issue with #1 is classic. The statement of probability is: "The probability of a collision with N values".

As you put it. "The probability of two people in the room having the same birthday given N people." We calculate this value for N (today) N+R (tomorrow) N+30R (30 days from now), N-R (yesterday) and N+R/(1000*1000*60*60*24) (1us from now)

Agreed that prob @ N-R is academic, and likly so is prob@1us - but this doesn't invalidate the statement. An electron in your CPU has a very real (though remote) possibility to exist in my brain any any point time. Weither it ever happens or not is not important.

#2 - Yes it's not intuative that 1us from now be about the same a 1day from now. But try to change the N value to something 1000 times larger. See what happens to the ratio of Prob@1us and Prob@1day. Is that radio greater or lesser then N of today? What does this tell us? That the "probability of collision" starts increasing much faster the closer we get to the magic 1.17 x 2^64 value.

"Lies, damned lies and statistics." - Edmond Disraili.

Einstein really had a problem with using stats in Quantum Physics because he didn't like how everything was left up to games chance and complex probability equations (MUCH more complex than collision searchs). I think you're having the same dilema.

JLC - playing the ultimate game of chance - trying to spell.

AySz88
06-21-2004, 09:13 PM
I think the problem is this (and, be warned, I don't understand all of what is being said):

The statistics on the MD5CRK web page are for "probability of completing AT OR BEFORE time from now, including the past".

You (jkr) are talking about "probability of completing BETWEEN now and time from now".

Matt
06-23-2004, 07:39 AM
Originally posted by jlcooke
Can someone with contacts @ slashdot or other large online news sites drop them a line? Use the prize money as an angle.

I have submitted the story to slashdot so we will see what happens...

Matt
06-24-2004, 03:09 AM
My article was rejected :(

Mystwalker
06-28-2004, 07:31 PM
Originally posted by AySz88
I think the problem is this (and, be warned, I don't understand all of what is being said):

The statistics on the MD5CRK web page are for "probability of completing AT OR BEFORE time from now, including the past".

You (jkr) are talking about "probability of completing BETWEEN now and time from now".

That's one thing that circled around in head, too.
I'm certainly not sure, but maybe the probabilities on the webpage show the likeliness the collision will be found on that day?
Esp. the 24 hours probability has decreased a lot in the last two weeks, although the DPs reported in the last 24 hours is more or less the same. But of course, there are more DPs already found.
The 30 Days likeliness is a bit worse than double as good as the 24 Days likeliness. As the project is running for 3 months now and there was a small increase of productivity, the probability of 1 : 590,222 is something I expect to have around end July when goes well - which coincides with my assumption given above...