Results 1 to 25 of 25

Thread: $10,000

  1. #1

    $10,000

    http://www.certainkey.com/md5challenge/

    Disclaimer: Jean-Luc is a co-founder of CertainKey.
    Last edited by jlcooke; 06-16-2004 at 07:18 PM.

  2. #2

  3. #3
    Any press pick up on the release yet?

  4. #4
    Originally posted by Digital Parasite
    Any press pick up on the release yet?
    Ottawa Sun Pg 58 (Money Section) Photo and all.
    Ottawa Citizen Pg C3 (Business Section, fron page banner mention as well) no photo.

    The write ups are as you'd expect for a newspaper, but still good to see. Next: Local TV, national newspapers, newswire services.

  5. #5
    Cool picture with all the bills in a briefcase. So did you know that you were going to get your picture taken or do you always carry that much cash on you?

  6. #6
    Originally posted by Digital Parasite
    Cool picture with all the bills in a briefcase. So did you know that you were going to get your picture taken or do you always carry that much cash on you?
    I usually carry more. :P No, we just robbed ( ) a few politicains on the way to the shoot.
    Last edited by jlcooke; 06-17-2004 at 11:17 AM.

  7. #7
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Share of prize money:
    User(s) finding input to collision
    This will most likely be two persons, as there have to be to disjoint inputs in order to get the collision, right?

    Project Organizers $5,000
    Hm, considering that next to noone else is going to set up such a challenge (well, I think there already is this other project) ans that the project organizers of this project are those who made the reward, this more or less results in $5,000 really leaving the company...

    2 additional questions:

    - Is the website provider capable of bearing the stress that will be put on the website once this message hits certain news sites (eg ./ or heise)?

    - Is the "Probability of Completion At Current Rate" correct?

    24 Hrs 1 : 1,234,831
    30 Days 1 : 683,901
    365 Days 1 : 45,987
    730 Days 1 : 14,084

    AFAI get it, the 30 Days percentage should be at least 30 times better than the 24 Hrs percentage. This statistic could play a major role if someone consideres to participate in this project...

  8. #8
    Originally posted by Mystwalker
    This will most likely be two persons, as there have to be to disjoint inputs in order to get the collision, right?
    Yes very likly.

    Hm, considering that next to noone else is going to set up such a challenge (well, I think there already is this other project) ans that the project organizers of this project are those who made the reward, this more or less results in $5,000 really leaving the company...
    Organizers (in my view) include CORE developers (very few of my orginal cores are still used), translators, proof readers, etc. I'd say I'd end up getting <$2k of that money. Maybe less depending if I hand the project over sometime in the future...doubt I will though.

    - Is the website provider capable of bearing the stress that will be put on the website once this message hits certain news sites (eg ./ or heise)?
    Yes. It's hosted at 1and1.com. They can take it.

    - Is the "Probability of Completion At Current Rate" correct?

    24 Hrs 1 : 1,234,831
    30 Days 1 : 683,901
    365 Days 1 : 45,987
    730 Days 1 : 14,084

    AFAI get it, the 30 Days percentage should be at least 30 times better than the 24 Hrs percentage. This statistic could play a major role if someone consideres to participate in this project...
    The probability calculation is done like this:
    N = Total DPs found so far + DPs in the next X hrs/days assuming current rate
    Prob = <html> 1 /(1 - e<sup>N*(1-N) / 2<sup>129</sup></sup>) </html>

    As you cacn see, the value of Prob increases much faster the closer you get to C * 2^64. So the linear assumption used in d.net and other "key search" type distriburted project doesn't apply.

  9. #9
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Originally posted by jlcooke
    Yes. It's hosted at 1and1.com. They can take it.
    Yeah, I know. I use them, too.


    The probability calculation is done like this:
    N = Total DPs found so far + DPs in the next X hrs/days assuming current rate
    Prob = <html> 1 /(1 - e<sup>N*(1-N) / 2<sup>129</sup></sup>) </html>

    As you cacn see, the value of Prob increases much faster the closer you get to C * 2^64. So the linear assumption used in d.net and other "key search" type distriburted project doesn't apply.
    That's what I meant with "at least". I know it's even better, so my question is:

    Why is "30 Days percentage" < 30 * "24 Hrs. percentage"? (It's even smaller than 2 * "24 Hrs. percentage")
    With a linear assumption, it would be equal (leaving out special events like slowdown on weekends etc.), thus the 30 Days percentage should be even better than 30 * 24 Hrs. percentage...
    Last edited by Mystwalker; 06-17-2004 at 06:15 PM.

  10. #10
    N_1day = (A + R) * 2^32
    N_30day = (A + 30*R) * 2^32

    Prob_1day = 1 / (1 - exp(N_1day * (1-N_1day) / 2^129 ) )
    Prob_30day = 1 / (1 - exp(N_30day * (1-N_30day) / 2^129 ) )

    Use a spreadsheet, you'll see the curve.

  11. #11
    Nice that there is a price.


    "current rater" why do you take 24 hours because in 24 ther rate goes from min. too max.

    not a nice average.
    I's rather see you take a 7 days as an current rate.
    Too take out the daily swing and the weekly swing.

    the getting out of bed :sleepy: & the weekend hangover .

  12. #12
    Originally posted by GP500
    Nice that there is a price.
    I's rather see you take a 7 days as an current rate.
    Getting the 7 day rate may be possible. I'll have to make minor changes to the DB and nightly scripts.

    I also welcome input on a proposed distribution of prize money: http://www.md5crk.com/?sec=prize


    JLC

  13. #13
    I agree with Mystwalker: the 30 days chances should be atleast 30
    times better than the 1 day chances.

    Starting at the beginning of the experiment, the probability
    that we have no collision after calculating N md5 values is roughly

    exp( -N^2 / ( 2T ))

    where T is 2^128, the total number of possible md5 hash values.
    So the probability of a collision before calculating N md5 calculation is

    1 - exp( -N^2 / ( 2T ) )

    So the one in n chances type number is given by

    1 / ( 1 - exp( -N^2 / ( 2T ) ) )

    Jean-Luc has a similar expression, but has -N(N-1) where I have -N^2.
    Jean-Luc's expressions are more accurate for small N, but make little
    difference when N is several million.

    However, what the web-site is should display is the conditional
    probability that a collision will occur in the next 1, 30, or 365 days
    given that it hasn't occured so far.

    We want to know the chances of a collision in the next R samples given
    that we haven't had a collision after the initial A samples. We
    approach the calculation like this:

    The probability of no collision after A+R samples, given that there
    hasn't been a collision after A samples is

    exp( -(A+R)^2 / (2T) ) / exp( -A^2 / (2T) )

    = exp( ( A^2 - (A+R)^2 ) / 2T )

    = exp( - R ( 2A + R ) / (2T) )

    So probability of collision in the next R samples, given no collision
    after the first A samples is

    1 - exp( - R ( 2A + R ) / (2T) )

    and the 1 in n type chances of a collision are

    1 in 1 / [ 1-exp( -R(2K+R) / (2T) ) ]

    In the comparatively short timescales of 1 and 30 days, we have

    1-exp( -R(2K+R) / 2T) = roughly R(2K+R)/2T

    so the expressions for 1 and 30 days, writing R for the number of
    samples per day, will be

    Prob_1_day = 2T / ( R(2K+R) )
    and Prob_30_day = 2T / ( 30R(2K+30R) )

    These expressions give the intuitively expected result that 30 days are
    atleast 30 times more likely to produce a collision than 1 day.

  14. #14
    let me get some data for you in a second...

    N = DPs upto today
    R = DPs in last 24hrs

    Prob_now = 1 / (1 - exp(N(1-N) / 2^129) )
    Prob_now = 1 / (1 - exp(N-N*N / 2^129) )

    Prob_1day = 1 / (1 - exp((N+1R)-(N+1R)(N+1R) / 2^129) )
    Prob_T-day = 1 / (1 - exp((N+TR)-(N+TR)(N+TR) / 2^129) )
    = 1 / (1 - exp((N+30R - N^2 -2TRN -(TR)^2) / 2^129) )
    Gamma = N+TR - N^2 -2TRN -(TR)^2
    = N - N^2 -(2RN+R)T -(R^2)T^2

    The Gamma term is a parabolic term in T. The ratio of 1day to 30day probability is less than the linear expectation. As time goes on, the radio will "accelerate" if you will. Even if you use the simpler Gamma = -(N+TR)^2 term, you'll see a parabolic curve.

    Hope this helps.

    JLC
    Last edited by jlcooke; 06-19-2004 at 06:01 PM.

  15. #15
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Originally posted by jlcooke
    I also welcome input on a proposed distribution of prize money: http://www.md5crk.com/?sec=prize
    Well, first we have the usual suspects:

    - Free Software Foundation
    - Electronic Frontier Foundation

    Then maybe:
    - Wikipedia
    - Mozilla
    - "MyPersonalPensionFond" Oops, how did that come here?!?

    How will the donation work? Does only the most voted institution get all the money or if that part of the prize spread across them according to the distribution of the votes?


    Concerning the stats maths, I'll have to get me a pen and some sheets of paper and do some calculations. Already began with this and found out some things (e.g. that in the long run (starting from a couple of months), my version doesn't differ that much from Jean-Luc's when it comes to the actual likeliness). But I still have to figure out a lot more.
    I'm just interested in the math that lies behind all these projects.

  16. #16
    Originally posted by Mystwalker
    I'm just interested in the math that lies behind all these projects.
    That's what get people interested

    ps. we're getting about 40 new users every day now. Can someone with contacts @ slashdot or other large online news sites drop them a line? Use the prize money as an angle.

  17. #17
    The statistics this morning show:

    Number DP reported so far: 5,645,904
    Number DP reported in 24 Hrs: 58,655

    so these give

    Number of MD5 calculated so far: 5645904 x 2^32
    Number of MD5 calculates in 24 hours 58655 x 2^32

    Using these numbers, we can draw the following graph:



    The Red Line is calculated using (nearly) the Probability of completion expression used on the Stats web page:

    p_completion = 1 - exp( -n*n/(2*t) )

    where
    n = Number-so-far + days * number-per-day
    t = 2^128

    The green line is based on the conditional probability of finding a collision in the next x days given that we haven't found a collision so far, which gives an expression

    conditional_p = 1 - (exp( -(N + x*D)^2 / (2T) ) / exp( -N^2 / (2T) ) )

    where
    N is the number of MD5 calculations so far
    x is the number of days
    D is the number of MD5 calculations per day
    T is the total number of possible MD5 values, 2^128

    Notice how the conditional probability curve starts at 0, showing that the probability of completing in the next nano-second is zero, and gradually curves upwards from linear.

    ----------------------------------------------------------------------------------------------------------
    I append the commands I passed into GNUPLOT to draw the graph.......
    ----------------------------------------------------------------------------------------------------------

    ##
    ## Input file to GNUPLOT to draw curves showing probability of completion
    ## of the md5crk project in the next few days...
    ##
    ## John Smith, 20 June 2004
    ##
    ##

    p_incomplete( n, t ) = exp( -n*n/(2*t) )

    prob_complete(n,t) = 1-p_incomplete(n,t)

    total_md5 = (2.0 ** 128)

    ##
    ## statistics taken from md5crk website, 20 June 2000, 10:00 GMT
    ##
    daily_rate = 58655.0 * (2.0 ** 32)
    current_total = 5645904.0 * (2.0 ** 32)

    set xrange [0:60]

    conditional_p_incomplete(now,extra,t) = \
    p_incomplete(now+extra,t)/p_incomplete(now,t)

    conditional_prob_complete(now,extra,t) = 1-conditional_p_incomplete(now,extra,t)

    set xlabel "Extra Days of Calculation"
    set ylabel "Probability"

    set label 1 "Daily Rate: 58655 * 2^32 MD5 per day" at graph 0.1,0.8
    set label 2 "Current Total: 5645904 * 2^32 MD5 per day" at graph 0.1,0.75

    set label 3 "Prob complete after 0 day: raw: 8.64e-07 " at graph 0.1,0.65
    set label 4 "Prob complete after 1 day: raw: 8.62e-07 conditional: 1.80e-08" at graph 0.1,0.60
    set label 5 "Prob complete after 30 day: raw: 14.87e-07 conditional: 62.24e-08" at graph 0.1,0.55


    plot prob_complete( current_total + x*daily_rate, total_md5 ) title "Probability complete", \
    conditional_prob_complete( current_total,x*daily_rate,total_md5) title "Conditional Probability Complete"
    Attached Images Attached Images

  18. #18
    Ok so. You you still dont' agree that probDay30 < 30*probDay1?

    Check out mathworld's equations for birthday probabilities. they use the same as mine (-n(1-n)/2^2d).

    I will make you table that shows how probDay30 < 30*probDay1 UP TO A CERTAIN CRITICAL POINT and then probDay30 > 30*probDay1 takes over.

    JLC

  19. #19
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Originally posted by jlcooke
    we're getting about 40 new users every day now.
    New users or addiotional computers? The new user rate is above average, but 8 << 40...

    Can someone with contacts @ slashdot or other large online news sites drop them a line? Use the prize money as an angle.
    Just send a mail to Heise, one of german's biggest online news sites. I guess there will be some bandwidth demands on Monday or Tuesday...

  20. #20
    I'm still worried about the expressions used on http://www.md5crk.com/stats/ for the probabilities of completion. I've tried to explain, without success, that conditional probabilities are needed. This time I'll try a reductio ad absurdum argument.

    Using the expression
    N = DPs upto today
    R = DPs in last 24hrs

    Prob_T-day = 1 / (1 - exp((N+TR)-(N+TR)(N+TR) / 2^65))

    with N= 5,663,785 and R = 56,268 we can calculate the following table of
    probability of completion at current rate:

    1 us 1: 1150102.3
    1 ms 1: 1150102.3
    1 second 1: 1150102.0
    1 minute 1: 1150086.4
    1 hour 1: 1149150.7
    1 day 1: 1127586.5
    30 day 1: 682589.8
    365 day 1: 53740.0
    730 day 1: 16888.7
    10 years 1: 828.8
    100 years 1: 9.2
    200 years 1: 2.7

    The values for 1, 30, 365 and 730 days agree with the current figures on
    http://www.md5crk.com/stats/. (In a few days time they will differ but, trust me, today they match.)

    My point is that it goes against all intuition to suggest that the chances of success in the next microsecond are roughly the same as the chances of success in the next day. One would reasonably expect the chances of success in the next day to be about 85 billion times greater than the chances of success in the next microsecond.

    One can go further and calculate the probability of completion yesterday, that is after -1 more days.
    the result is 1:1173299. This is clearly wrong. The probability that we completed yesterday is zero.
    We know we didn't complete! (OK, this paragraph is a bit suspect, but it makes one worry)

    I have checked mathworld's equations for birthday probabilities. They give probabilities that a room with n people in it will have two with the same birthday. They don't give the probability that, starting with a room of N people with different birthdays, then adding another R people will cause a clash of birthdays.

    Am I making any sense?

    John

  21. #21
    Yes I see your points:
    1) prob @ -1 days doesn't make sense since we didn't complete
    2) 1us from now is about the same as 1 day from now - doesn't make sense.

    The issue with #1 is classic. The statement of probability is: "The probability of a collision with N values".

    As you put it. "The probability of two people in the room having the same birthday given N people." We calculate this value for N (today) N+R (tomorrow) N+30R (30 days from now), N-R (yesterday) and N+R/(1000*1000*60*60*24) (1us from now)

    Agreed that prob @ N-R is academic, and likly so is prob@1us - but this doesn't invalidate the statement. An electron in your CPU has a very real (though remote) possibility to exist in my brain any any point time. Weither it ever happens or not is not important.

    #2 - Yes it's not intuative that 1us from now be about the same a 1day from now. But try to change the N value to something 1000 times larger. See what happens to the ratio of Prob@1us and Prob@1day. Is that radio greater or lesser then N of today? What does this tell us? That the "probability of collision" starts increasing much faster the closer we get to the magic 1.17 x 2^64 value.

    "Lies, damned lies and statistics." - Edmond Disraili.

    Einstein really had a problem with using stats in Quantum Physics because he didn't like how everything was left up to games chance and complex probability equations (MUCH more complex than collision searchs). I think you're having the same dilema.

    JLC - playing the ultimate game of chance - trying to spell.

  22. #22
    I think the problem is this (and, be warned, I don't understand all of what is being said):

    The statistics on the MD5CRK web page are for "probability of completing AT OR BEFORE time from now, including the past".

    You (jkr) are talking about "probability of completing BETWEEN now and time from now".

  23. #23
    Originally posted by jlcooke
    Can someone with contacts @ slashdot or other large online news sites drop them a line? Use the prize money as an angle.
    I have submitted the story to slashdot so we will see what happens...



  24. #24
    My article was rejected



  25. #25
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Originally posted by AySz88
    I think the problem is this (and, be warned, I don't understand all of what is being said):

    The statistics on the MD5CRK web page are for "probability of completing AT OR BEFORE time from now, including the past".

    You (jkr) are talking about "probability of completing BETWEEN now and time from now".
    That's one thing that circled around in head, too.
    I'm certainly not sure, but maybe the probabilities on the webpage show the likeliness the collision will be found on that day?
    Esp. the 24 hours probability has decreased a lot in the last two weeks, although the DPs reported in the last 24 hours is more or less the same. But of course, there are more DPs already found.
    The 30 Days likeliness is a bit worse than double as good as the 24 Days likeliness. As the project is running for 3 months now and there was a small increase of productivity, the probability of 1 : 590,222 is something I expect to have around end July when goes well - which coincides with my assumption given above...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •