Thread: Test restarted

  1. #1
    Junior Member
    Join Date
    Jun 2003
    Location
    Ottawa, CAN
    Posts
    16

    Test restarted

    I just booted up this morning and my test that was 90% complete restarted with the same k&n but from scratch (180 blocks lost!). Here's the log (username: boojiboy):

    [Tue Jun 17 20:46:46 2003] resolving hostname
    [Tue Jun 17 20:46:46 2003] opening connection
    [Tue Jun 17 20:46:46 2003] receiving from server
    [Tue Jun 17 20:46:46 2003] logging into server
    [Tue Jun 17 20:46:46 2003] login successful
    [Tue Jun 17 20:46:47 2003] n.high = 3215897 . 25 blocks left in test
    [Tue Jun 17 21:17:47 2003] block processing paused
    [Tue Jun 17 23:54:32 2003] got k and n from cache
    [Tue Jun 17 23:54:34 2003] restarting proth test from cache (k=22699, n=3667222) [87.9%]
    [Tue Jun 17 23:55:06 2003] block processing paused
    [Wed Jun 18 10:16:38 2003] got k and n from cache
    [Wed Jun 18 11:09:54 2003] block processing paused
    [Wed Jun 18 11:10:03 2003] got k and n from cache
    [Wed Jun 18 11:10:03 2003] restarting proth test from cache (k=22699, n=3667222) [0.4%]
    [Wed Jun 18 11:16:06 2003] resolving hostname
    [Wed Jun 18 11:16:07 2003] opening connection
    [Wed Jun 18 11:16:07 2003] receiving from server
    [Wed Jun 18 11:16:07 2003] logging into server
    [Wed Jun 18 11:16:08 2003] login successful
    [Wed Jun 18 11:16:08 2003] n.high = 18589 . 197 blocks left in test

    None of the registry keys look interesting so I'm assuming my previous work is gone... Is there any chance of recovery??

  2. #2
    Member
    Join Date
    Jan 2003
    Location
    Germany
    Posts
    36
    Same thing here. That's really not good!!!


    [Thu Jun 19 01:47:33 2003] got k and n from cache
    [Thu Jun 19 01:47:33 2003] restarting proth test from cache (k=55459, n=3877546) [54.4%]
    [Thu Jun 19 01:50:00 2003] resolving hostname
    [Thu Jun 19 01:50:00 2003] opening connection
    [Thu Jun 19 01:50:00 2003] receiving from server
    [Thu Jun 19 01:50:01 2003] logging into server
    [Thu Jun 19 01:50:01 2003] login successful
    [Thu Jun 19 01:50:02 2003] n.high = 2111629 . 107 blocks left in test

    ...

    [Thu Jun 19 03:07:59 2003] resolving hostname
    [Thu Jun 19 03:08:00 2003] opening connection
    [Thu Jun 19 03:08:00 2003] receiving from server
    [Thu Jun 19 03:08:00 2003] logging into server
    [Thu Jun 19 03:08:01 2003] login successful
    [Thu Jun 19 03:08:01 2003] n.high = 2194764 . 102 blocks left in test
    [Thu Jun 19 08:21:14 2003] got k and n from cache
    [Thu Jun 19 09:32:53 2003] got k and n from cache
    [Thu Jun 19 09:32:54 2003] restarting proth test from cache (k=55459, n=3877546) [0.4%]

    This is SB v1.10
    Last edited by rosebud; 06-19-2003 at 05:02 AM.

  3. #3
    my only thought is that it somehow didn't set the working directory correctly and so your cache got put somewhere other than the SB directory on one of your runs.

    this could happen if you didn't install SB on that computer (i.e. you copied an install from another directory, or moved it after installing and then deleted the original directory).

    even if you didn't do either of those things, search your hard drive for zXXXXXXXX files and make sure it's not putting them somewhere weird. it will put them in the directory pointed to by the reg key HKLM\Software\LhDn\sob\config\dir . if that key is invalid or doesn't exist, it should give you an error, but there are cases where it might not realize the key is invalid and write your cache files to all kinds of places depending on which version of windows you're running. If I remember right, windows 95/98 drop them in C:\, 2k and NT put them in c:\WINNT, and XP puts them in the documents directory of the currently logged-in user.

    that key is set by the installer so moving the client or not correctly installing it could bork stuff.
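The search described above can be scripted. What follows is only a rough Python sketch: it walks a directory tree and lists files whose names look like SB cache files. The pattern ("z" followed by digits only) is an assumption based on the zXXXXXXXX naming and the z4431193 example in this thread, and `find_cache_files` is a made-up helper name, not part of the client.

```python
import os
import re

# Assumption: cache files are named "z" followed by digits only (e.g. z4431193).
CACHE_NAME = re.compile(r"z\d+", re.IGNORECASE)

def find_cache_files(root):
    """Return sorted paths under root whose file names look like SB cache files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if CACHE_NAME.fullmatch(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)

if __name__ == "__main__":
    import sys
    # Default to the current directory; pass e.g. C:\ to scan a whole drive.
    for path in find_cache_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Any hit outside the SB directory would suggest the client wrote its cache somewhere unexpected on one of its runs.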

    -Louie

  4. #4
    Junior Member
    Join Date
    Jun 2003
    Location
    Ottawa, CAN
    Posts
    16
    No extra zXXX cache files are hiding on my (win98) computer.

    I noticed with the old client that if I exited sb without pausing first I would lose the progress on the current block, maybe I zapped the whole test this time?

  5. #5
    Member
    Join Date
    Jan 2003
    Location
    Germany
    Posts
    36
    i can't find another zxxxxxx file either...

    And the registry key is set correctly to my sb directory.

    Maybe the reason is that somebody (my brother) killed the sb-client before it restarted its work:

    [Thu Jun 19 08:21:14 2003] got k and n from cache <--
    [Thu Jun 19 09:32:53 2003] got k and n from cache
    [Thu Jun 19 09:32:54 2003] restarting proth test from cache (k=55459, n=3877546) [0.4%]

    and so the cache was somehow cleared. Is that possible??

  6. #6
    Member
    Join Date
    Nov 2002
    Location
    Haverhill, MA
    Posts
    76
    i had a similar problem a while back, and then when the testing completed, for some reason my stats went skewy, like it had been a test value or something and the result was wrong???


    anyway i know that the directory was in the correct place since i had installed over the old install

  7. #7
    Member
    Join Date
    Jan 2003
    Location
    Germany
    Posts
    36
    It happened again, might this be a bug in v1.10??

  8. #8
    Junior Member
    Join Date
    Jun 2003
    Location
    Ottawa, CAN
    Posts
    16

    Scary

    Your computer must be faster than mine.... Did yours stop at the same block it did last time? I won't know if mine can go farther for another 5-8 days.

  9. #9
    Member
    Join Date
    Jan 2003
    Location
    Germany
    Posts
    36
    "Did yours stop at the same block it did last time?"
    No, it was a different block. The full test was at about 85%, now it's at zero again.

    This is really annoying...

  10. #10
    Junior Member
    Join Date
    Jun 2003
    Location
    Ottawa, CAN
    Posts
    16

    Time to retire?

    My test got restarted again, the same k&n, this time at 59% complete:

    [Tue Jul 01 13:58:27 2003] login successful
    [Tue Jul 01 13:58:27 2003] n.high = 2137735 . 83 blocks left in test
    [Tue Jul 01 14:36:05 2003] block processing paused
    [Tue Jul 01 16:15:09 2003] got k and n from cache
    [Tue Jul 01 16:15:13 2003] restarting proth test from cache (k=22699, n=3667222) [58.8%]
    [Tue Jul 01 16:17:37 2003] resolving hostname
    [Tue Jul 01 16:17:38 2003] opening connection
    [Tue Jul 01 16:17:38 2003] receiving from server
    [Tue Jul 01 16:17:38 2003] logging into server
    [Tue Jul 01 16:17:38 2003] login successful
    [Tue Jul 01 16:17:39 2003] n.high = 2156324 . 82 blocks left in test
    [Tue Jul 01 23:18:43 2003] got k and n from cache
    [Tue Jul 01 23:18:45 2003] restarting proth test from cache (k=22699, n=3667222) [59.0%]
    [Tue Jul 01 23:19:18 2003] block processing paused
    [Tue Jul 01 23:21:22 2003] got k and n from cache
    [Tue Jul 01 23:21:24 2003] restarting proth test from cache (k=22699, n=3667222) [59.0%]
    [Tue Jul 01 23:21:27 2003] block processing paused
    [Wed Jul 02 08:55:14 2003] got k and n from cache
    [Wed Jul 02 09:39:54 2003] resolving hostname
    [Wed Jul 02 09:39:54 2003] opening connection
    [Wed Jul 02 09:39:55 2003] receiving from server
    [Wed Jul 02 09:39:55 2003] logging into server
    [Wed Jul 02 09:39:55 2003] login successful
    [Wed Jul 02 09:39:56 2003] n.high = 18589 . 197 blocks left in test
    [Wed Jul 02 10:23:21 2003] resolving hostname
    [Wed Jul 02 10:23:21 2003] opening connection
    [Wed Jul 02 10:23:21 2003] receiving from server
    [Wed Jul 02 10:23:22 2003] logging into server
    [Wed Jul 02 10:23:22 2003] login successful
    [Wed Jul 02 10:23:22 2003] n.high = 37178 . 196 blocks left in test
    [Wed Jul 02 11:31:35 2003] resolving hostname
    [Wed Jul 02 11:31:35 2003] opening connection
    [Wed Jul 02 11:31:35 2003] receiving from server
    [Wed Jul 02 11:31:35 2003] logging into server
    [Wed Jul 02 11:31:35 2003] login successful
    [Wed Jul 02 11:31:36 2003] n.high = 55767 . 195 blocks left in test
    [Wed Jul 02 18:34:08 2003] got k and n from cache
    [Wed Jul 02 18:34:11 2003] restarting proth test from cache (k=22699, n=3667222) [1.7%]


    I'll give it one more shot.... Any ideas as to what's causing this?

  11. #11
    Junior Member
    Join Date
    Jun 2003
    Location
    Ottawa, CAN
    Posts
    16

    Changed k&n

    Well, this test crashed for the third time now, this time at 70%. I've abandoned this test (k=22699, n=3667222) and picked a new one. If this new one does not complete I'd say that the client has a serious issue (at least on a 1 GHz Win 98 box like mine).

  12. #12
    Junior Member
    Join Date
    Jun 2003
    Location
    Ottawa, CAN
    Posts
    16

    AAAaaack

    My computer has failed to finish its 4th straight test. Louie, I (boojiboy) am assigned k=33661, n=4139160 but I am uninstalling the client so I will not finish this test.

    It restarted after the full test was at 83%. I believe the cause is some bug with the pause button but I'm not sure.

    Anyways, when a new version is out I'll re-install. Until then, peace.

  13. #13
    If you're unhappy with the PRP client, perhaps you could offer some assistance to the P-1 factoring effort or the sieve effort. That way you wouldn't have to abandon this project, and you can be productive while waiting for the new and improved (hopefully) client to be here (hopefully) soon.

  14. #14
    Junior Member
    Join Date
    Jun 2003
    Location
    Ottawa, CAN
    Posts
    16

    Good idea...

    Which effort needs more help at the moment (from a 1GHz box)?

  15. #15
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    As it doesn't seem to be a P4, I think it's better suited for sieving. Factoring likes SSE(2) again...

  16. #16
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    Factoring is the part of the project which needs more power, and I would say it needs it urgently.
    Even if P4s are well suited only for PRPing and factoring, that doesn't mean that other processors aren't well suited for that work! For example, Athlons are very well suited for any kind of work.

  17. #17
    Junior Member
    Join Date
    Jun 2003
    Location
    Turku, Finland
    Posts
    3
    Why is this made so difficult?
    I can't understand what the upsides are of storing information in the Windows registry and making things hard for
    1) uninstalling the client,
    2) people who want to migrate from machine to machine,
    3) people who like to back up their progress, and
    4) making a non-Windows port.

    Why not just
    1) read status from sob.ini on startup
    2) save status every 30 min to that file
    3) save status on close request
    4) no hassle

    An easy, reliable (no lost work/blocks, none), simple and comfortable client keeps people on this project. An uncomfortable and confusing one does not.
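The scheme proposed above could look roughly like this. It is a minimal Python sketch, not how the SB client actually works: the file name sob.ini comes from the post, while the `[status]` section, its keys, and the helper names `save_status` and `load_status` are invented for illustration. Writing to a temporary file and then renaming it means that a kill in the middle of a save leaves the previous status file intact.

```python
import configparser
import os

def save_status(path, k, n, n_high):
    """Write the status to a temp file, then atomically swap it into place."""
    cfg = configparser.ConfigParser()
    cfg["status"] = {"k": str(k), "n": str(n), "n_high": str(n_high)}
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        cfg.write(f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the swap
    os.replace(tmp, path)     # atomic when tmp and path are on the same filesystem

def load_status(path):
    """Return (k, n, n_high), or None if there is no usable status file."""
    cfg = configparser.ConfigParser()
    if not cfg.read(path) or "status" not in cfg:
        return None
    s = cfg["status"]
    return int(s["k"]), int(s["n"]), int(s["n_high"])
```

A client built this way would call `load_status` once at startup and `save_status` from a 30-minute timer and from its close handler.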

  18. #18
    Senior Member
    Join Date
    Dec 2002
    Location
    australia
    Posts
    118
    What in the registry is the client so sensitive to?

    What I do not understand is that after installing an anti-virus product (NOD32) and using Spybot Search & Destroy, SB freaked and went back to the beginning of the test. I had the client (installed as service) paused or stopped at the time.

    If anyone wants to investigate I will post the log.

    Now I wish I had backed up the zxxxxxx file.

    No drama with the (k,n) pair but it did not resume from the zxxxxxx file, and I did not notice to start with - so the zxxxxxx file was overwritten.

    I still think something got through my firewall - machine playing up slightly but can't find it (mind you it is windows)
    Last edited by tqft; 08-28-2003 at 11:45 PM.

  19. #19
    Junior Member
    Join Date
    Jun 2003
    Location
    Ottawa, CAN
    Posts
    16
    I believe the problem lies with the pause feature as I used it before every crash I had....

    My problem was that if I exited sb without pausing first, I would lose the progress on the current block (about 20 minutes of work). But pausing before exiting was like playing Russian roulette, because once in a while the whole test would disappear. Now I simply exit without regard for the current block and haven't had a crash *yet*.

  20. #20
    Team Anandtech
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    50
    This has just happened to me twice in a row, after never happening before, so I'm guessing it's something weird at the server end, Louie. Both times the test restarted after exiting the client (not manually, just a shutdown)

    [Sun Aug 31 07:04:14 2003] n.high = 792670 . 284 blocks left in test
    [Sun Aug 31 07:18:20 2003] resolving hostname
    [Sun Aug 31 07:18:20 2003] opening connection
    [Sun Aug 31 07:18:21 2003] receiving from server
    [Sun Aug 31 07:18:22 2003] logging into server
    [Sun Aug 31 07:18:22 2003] login successful
    [Sun Aug 31 07:18:23 2003] n.high = 805455 . 283 blocks left in test
    [Sun Aug 31 07:32:29 2003] resolving hostname
    [Sun Aug 31 07:32:29 2003] opening connection
    [Sun Aug 31 07:32:30 2003] receiving from server
    [Sun Aug 31 07:32:31 2003] logging into server
    [Sun Aug 31 07:32:31 2003] login successful
    [Sun Aug 31 07:32:32 2003] n.high = 818240 . 282 blocks left in test

    (shutdown here)

    (boot up a few hours later)

    [Sun Aug 31 12:28:33 2003] got k and n from cache

    Bang, test lost. 0%. I thought this was just a freak occurrence after it happened once, but now it's twice in a row, and I've made no changes whatsoever at my end. I'm using the 1.10 windows client with the service handler. The reg setting hasn't changed, and all the cache files (2 of them, each 400-500k) are in the SB directory. It's not every reboot, either... I rebooted at least three times yesterday because I was moving the computer around, and kept the test each time. It was only when it was turned off this morning that the test was lost.

    If it suddenly started happening to all these people, couldn't something have gone wrong at the server end?

  21. #21
    Team Anandtech
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    50
    Actually, now that I've had a dig through my logs, it looks like I've been doing this test for a number of days (since it was assigned on Thursday, k=4847, n=4421871), and it's restarted a LOT. There's definitely some weird stuff going on.

    [Sat Aug 30 13:22:23 2003] n.high = 102280 . 338 blocks left in test
    [Sat Aug 30 13:33:46 2003] got k and n from cache
    [Sat Aug 30 13:48:11 2003] resolving hostname
    [Sat Aug 30 13:48:11 2003] opening connection
    [Sat Aug 30 13:48:11 2003] receiving from server
    [Sat Aug 30 13:48:12 2003] logging into server
    [Sat Aug 30 13:48:12 2003] login successful
    [Sat Aug 30 13:48:13 2003] n.high = 12785 . 345 blocks left in test
    [Sat Aug 30 14:01:51 2003] resolving hostname
    [Sat Aug 30 14:01:51 2003] opening connection
    [Sat Aug 30 14:01:51 2003] receiving from server
    [Sat Aug 30 14:01:52 2003] logging into server
    [Sat Aug 30 14:01:53 2003] login successful
    [Sat Aug 30 14:01:54 2003] n.high = 25570 . 344 blocks left in test

    Why the jump from 338 to 345?

    And...

    (After a whole lot of work while not connected)
    [Sat Aug 30 11:02:47 2003] got k and n from cache
    [Sat Aug 30 11:02:47 2003] restarting proth test from cache (k=4847, n=4421871) [24.0%]
    [Sat Aug 30 11:02:56 2003] got k and n from cache
    [Sat Aug 30 11:19:21 2003] resolving hostname
    [Sat Aug 30 11:19:21 2003] opening connection
    [Sat Aug 30 11:19:22 2003] receiving from server
    [Sat Aug 30 11:19:23 2003] logging into server
    [Sat Aug 30 11:19:23 2003] login successful
    [Sat Aug 30 11:19:24 2003] n.high = 12785 . 345 blocks left in test

    Another restart back to 345 blocks.
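Restarts like these are easy to miss by eye. As a sketch only (Python, with the line format inferred from the logs quoted in this thread; `find_resets` is a name made up here), the client log can be scanned for every place where the "blocks left" count goes up instead of down:

```python
import re

# Format inferred from the log lines in this thread, e.g.
# "[Sat Aug 30 13:48:13 2003] n.high = 12785 . 345 blocks left in test"
PROGRESS = re.compile(r"\[(.+?)\] n\.high = (\d+) \. (\d+) blocks left in test")

def find_resets(lines):
    """Return (timestamp, previous_left, new_left) wherever blocks left increased."""
    resets = []
    prev_left = None
    for line in lines:
        m = PROGRESS.search(line)
        if not m:
            continue  # not a progress line
        left = int(m.group(3))
        if prev_left is not None and left > prev_left:
            resets.append((m.group(1), prev_left, left))
        prev_left = left
    return resets
```

This also flags the legitimate start of a new, larger test, so each hit still has to be checked against the k and n lines around it.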

    And then the one I mentioned in my previous post. I ditched the test and got a new one, because there was just no way it was going to finish. If I find out it was restarting because there was a prime in there, I'm going to be irritated

  22. #22
    Member Cmarc's Avatar
    Join Date
    Dec 2002
    Location
    SF Bay Area
    Posts
    70
    It seems xyzzy has also been experiencing this problem
    <Xyzzy> [Mon Sep 08 14:41:13 2003] logging into server
    <Xyzzy> [Mon Sep 08 14:41:13 2003] login successful
    <Xyzzy> [Mon Sep 08 14:41:13 2003] n.high = 1184076 . 256 blocks left in test
    <Xyzzy> [Mon Sep 08 14:47:41 2003] resolving hostname
    <Xyzzy> [Mon Sep 08 14:47:41 2003] opening connection
    <Xyzzy> [Mon Sep 08 14:48:02 2003] temporarily unable to connect to server -- block added to submit queue
    <Xyzzy> [Mon Sep 08 14:54:40 2003] resolving hostname
    <Xyzzy> [Mon Sep 08 14:54:40 2003] opening connection
    <Xyzzy> [Mon Sep 08 14:54:40 2003] receiving from server
    <Xyzzy> [Mon Sep 08 14:54:40 2003] logging into server
    <Xyzzy> [Mon Sep 08 14:54:41 2003] login successful
    <Xyzzy> [Mon Sep 08 14:54:51 2003] block report stalled -- block added to submit queue
    <Xyzzy> [Mon Sep 08 15:01:26 2003] resolving hostname
    <Xyzzy> [Mon Sep 08 15:01:26 2003] opening connection
    <Xyzzy> [Mon Sep 08 15:01:29 2003] receiving from server
    <Xyzzy> [Mon Sep 08 15:01:30 2003] logging into server
    <Xyzzy> [Mon Sep 08 15:01:30 2003] login successful
    <Xyzzy> [Mon Sep 08 15:01:30 2003] n.high = 1222272 . 253 blocks left in test
    <Xyzzy> [Mon Sep 08 15:08:11 2003] resolving hostname
    <Xyzzy> [Mon Sep 08 15:08:11 2003] opening connection
    <Xyzzy> [Mon Sep 08 15:08:11 2003] receiving from server
    <Xyzzy> [Mon Sep 08 15:08:12 2003] logging into server
    <Xyzzy> [Mon Sep 08 15:08:12 2003] login successful
    <Xyzzy> [Mon Sep 08 15:08:12 2003] n.high = 1235004 . 252 blocks left in test
    <Xyzzy> [Mon Sep 08 15:34:46 2003] connecting to server
    <Xyzzy> [Mon Sep 08 15:34:48 2003] logging into server
    <Xyzzy> [Mon Sep 08 15:34:48 2003] requesting a block
    <Xyzzy> [Mon Sep 08 15:34:50 2003] block processing paused
    <Xyzzy> z4431193 is still in folder
    <Xyzzy> 541kb
    <Xyzzy> 4th time this week
    <Xyzzy> if the ****ing blocks were smaller this would not be a problem
    <Xyzzy> 242163 28433•2^4431193+1 66.57.64.82 Mon Sep 8 08:29:57 2003 Mon Sep 8 19:08:05 2003 1235004 27 % 633355
    <Xyzzy> 27% done
    Somewhat frustrating. We can't have everyone doing sieving or factoring, and most users don't keep a close eye on their clients and are likely to miss restarts.

  23. #23
    perhaps something should be incorporated into the client to allow the work to be backed up and restarted if the client stalls. Perhaps that would also allow for a "stop and return results so far" button. That would probably be very helpful and would prevent a lot of the loss that occurs from this problem and from people quitting tests prematurely. Perhaps even have the client save its progress with the server at, say, the 1/3 and 2/3 marks.

  24. #24
    Senior Member eatmadustch's Avatar
    Join Date
    Nov 2002
    Location
    Switzerland
    Posts
    154
    I agree this is a serious problem and something needs to be done, but saving the block on the server is not going to work. The size of such a block is now more than 500kb, which means a whole megabyte of traffic if you want to back it up twice! I really don't think that is going to be possible on the already very limited budget Seventeen or Bust has!
    On checking the size of the cache files I just noticed 5 z******* files in the SB directory, does that mean my client has abandoned a block 4 times? That really isn't good
    EatMaDust


    Stop Microsoft turning into Big Brother!
    http://www.againsttcpa.com

  25. #25
    Member
    Join Date
    Nov 2002
    Location
    Haverhill, MA
    Posts
    76
    [Wed Sep 10 21:30:07 2003] resolving hostname
    [Wed Sep 10 21:30:08 2003] opening connection
    [Wed Sep 10 21:30:08 2003] receiving from server
    [Wed Sep 10 21:30:09 2003] logging into server
    [Wed Sep 10 21:30:09 2003] login successful
    [Wed Sep 10 21:30:10 2003] n.high = 2949840 . 127 blocks left in test
    [Wed Sep 10 21:48:07 2003] got k and n from cache
    [Wed Sep 10 21:48:08 2003] restarting proth test from cache (k=5359, n=4509990) [65.6%]
    [Wed Sep 10 21:48:27 2003] got k and n from cache
    [Wed Sep 10 22:10:50 2003] resolving hostname
    [Wed Sep 10 22:10:50 2003] opening connection
    [Wed Sep 10 22:10:53 2003] receiving from server
    [Wed Sep 10 22:10:53 2003] logging into server
    [Wed Sep 10 22:10:54 2003] login successful
    [Wed Sep 10 22:10:55 2003] n.high = 12291 . 366 blocks left in test
    [Wed Sep 10 22:29:16 2003] resolving hostname



    and the sobsvc.log showed this:
    [2003/09/10 21:48:06.953]: Parms retrieved: NumClients = 1, AffType = 0, WUQueue = 0, PeriodicRestart = 0, AutodialType = 0
    [2003/09/10 21:48:06.968]: TrueIdle = FALSE, MonitorRestart = TRUE, StuckRestart = TRUE, KeepVisible = TRUE
    [2003/09/10 21:48:06.968]: NormalizeXP = FALSE
    [2003/09/10 21:48:06.968]: AutoRestart OFF
    [2003/09/10 21:48:06.968]: ** SB Client version 1.10 **
    [2003/09/10 21:48:06.984]: Starting Service
    [2003/09/10 21:48:06.984]: starting client 1
    [2003/09/10 21:48:07.906]: Client start successful: 0x90 (0x6ec) - 0x10080 (0x100b6)
    [2003/09/10 21:48:14.656]: KeepVisible restart initiated...
    [2003/09/10 21:48:14.656]: Stopping client 1...
    [2003/09/10 21:48:26.984]: Restarting client 1...
    [2003/09/10 21:48:27.625]: Restart successful: 0xa0 (0x8f4) - 0x200ba (0x20084)

  26. #26
    Here's a thought, folks - Stricker's problem seems to have manifested itself on a KeepVisible restart initiated by the service handler. Are the rest of you experiencing the problem using the service handler? If so, has the problem occurred (mostly or exclusively) on the KeepVisible restart that occurs when a user logs on?

    I ask this because the service handler delays between requesting a shutdown from the client and allowing the shutdown to proceed ("pressing the OK button") in order to allow the client time to cache its progress. The amount of delay was chosen after experimentation to be sufficient on the oldest processor I could lay my hands on and then bumped up a bit from there just to be sure...you can force the client to fail to cache (at least on a slow machine) by pressing the Exit button on the GUI and then immediately pressing the OK button on the message box - you have to be *really* quick, but it can be done!

    HOWEVER during boot and/or logon, the processor is often busy with other things, so maybe the amount of delay that works just fine normally wouldn't be sufficient at the time of the KeepVisible restart.

    Of course, if Stricker's the only one whose problem occurred at a KeepVisible restart, then I have no idea what's happening...an all-too-common occurrence, I'm afraid!

  27. #27
    Member
    Join Date
    Nov 2002
    Location
    Haverhill, MA
    Posts
    76
    i have an athlon XP 1900+ with 1gig of ram so i doubt that it is my system being too slow but you are correct about it being during login since i had restarted my system just before it happened

  28. #28
    Try disabling some of the startup applications and disable windows update. I had a similar problem with my windows client once and that solved it. BTW, is this the windows version you are discussing?

  29. #29
    Member
    Join Date
    Nov 2002
    Location
    Haverhill, MA
    Posts
    76
    yes this is my windows box
    it is on winxp
    and all but 2 of my startup programs i use consistently so i have already cleaned it up about as much as i'm willing to go

  30. #30
    Team Anandtech
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    50
    MathGuy:

    I'm using the service handler with keep clients visible on... now that I think about it, the restarts probably only started after I turned on +k

    If that is the problem, I hope it's fixable, because my machine often sits idle at the login prompt, but I also like to be able to check my progress

  31. #31
    Well, it's certainly easy enough to check and see if this is the problem. I'll get you a new version to test in a day or so (I'm flu-ridden at the moment...trust me, you don't want to run code written by me right now!)

    Edit: for the brave of heart - I went ahead and made a quick change that should fix this problem if it's what I think it might be...
    it's available at http://kerryjones.home.mindspring.com/sobsvcx.zip
    and to install it, you stop the service, unzip into \program files\sb, then restart.

    Essentially all I did was to make the service handler wait longer on KeepVisible restarts to confirm the client stop. If it turns out that this does solve the problem, then I'll try to work out a way to do it "right" (meaning to wait only until the client has finished saving its progress).
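Waiting "only until the client has finished saving" could be approximated by watching the cache file instead of sleeping for a fixed time. The following is a Python sketch under assumptions that do not come from the service handler itself: the function name, the polling interval, and the rule "unchanged for a few polls in a row" are all invented for illustration.

```python
import os
import time

def wait_until_settled(path, timeout=60.0, poll=0.5, quiet_polls=3):
    """Return True once path's size and mtime stay unchanged for quiet_polls polls.

    Returns False if that does not happen within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    last = None
    stable = 0
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
            current = (st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            current = None  # file not (re)written yet
        if current is not None and current == last:
            stable += 1
            if stable >= quiet_polls:
                return True
        else:
            stable = 0
        last = current
        time.sleep(poll)
    return False
```

A file that was never touched also looks settled, so a real version would first confirm that the file changed after the stop request. On a machine that is busy booting, the timeout would need to be generous.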
    Last edited by MathGuy; 09-11-2003 at 11:00 PM.

  32. #32
    Team Anandtech
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    50
    Wow, well done for the quick work

    I'll give it a try, but I won't really be able to let you know if it's solved it or not, because the problem doesn't happen with every reset...

  33. #33
    Team Anandtech
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    50
    no joy.

    [Sun Sep 14 02:19:48 2003] n.high = 376185 . 344 blocks left in test
    [Sun Sep 14 12:09:50 2003] got k and n from cache
    [Sun Sep 14 12:09:50 2003] restarting proth test from cache (k=55459, n=4538794) [8.4%]
    [Sun Sep 14 12:11:41 2003] got k and n from cache
    [Sun Sep 14 12:12:25 2003] got k and n from cache
    [Sun Sep 14 12:12:32 2003] got k and n from cache
    [Sun Sep 14 12:13:31 2003] got k and n from cache

    i've taken off the service handler for now, just to see whether that really is the thing causing the problem... but two of my friends using the linux client have had their tests restarted repeatedly as well. i'm stumped :/

  34. #34
    Eh, rough luck, lad...

    I've never seen that repeated restart thing.

    Was the service handler doing anything during that period of time (that is, does sobsvc.log show any activity at the time of those restarts)?

  35. #35
    Team Anandtech
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    50
    [2003/09/14 12:09:47.750]: Parms retrieved: NumClients = 1, AffType = 0, WUQueue = 0, PeriodicRestart = 0, AutodialType = 0
    [2003/09/14 12:09:47.796]: TrueIdle = FALSE, MonitorRestart = TRUE, StuckRestart = TRUE, KeepVisible = TRUE
    [2003/09/14 12:09:47.796]: NormalizeXP = FALSE
    [2003/09/14 12:09:47.796]: AutoRestart OFF
    [2003/09/14 12:09:47.796]: ** SB Client version 1.10 **
    [2003/09/14 12:09:49.015]: Starting Service
    [2003/09/14 12:09:49.093]: starting client 1
    [2003/09/14 12:09:50.109]: Client start successful: 0x94 (0x52c) - 0x10076 (0x100ac)
    [2003/09/14 12:11:35.483]: KeepVisible restart initiated...
    [2003/09/14 12:11:35.483]: Stopping client 1...
    [2003/09/14 12:11:38.061]: Restarting client 1...
    [2003/09/14 12:11:43.030]: Restart successful: 0xa4 (0x198) - 0x1011e (0x1015a)
    [2003/09/14 12:12:25.358]: Process 0x198 (client 1) terminated
    [2003/09/14 12:12:25.358]: Attempting restart
    [2003/09/14 12:12:25.733]: Restart successful: 0xa0 (0xb38) - 0x20154 (0x20158)
    [2003/09/14 12:12:32.561]: Process 0xb38 (client 1) terminated
    [2003/09/14 12:12:32.561]: Attempting restart
    [2003/09/14 12:12:32.764]: Restart successful: 0xa8 (0xc1c) - 0x30154 (0x30158)

  36. #36
    I just lost a test for the first time: 90% complete, too.

    I'm almost completely sure this happened on a "keep clients visible" restart because the log shows my test functioning normally and then right after I logged back in it restarted the test.

    I'm running winxp on a p4 2.8c with 512 mb of ram. The only programs I have running on startup are the service (obviously) and norton internet security 2003. I do have automatic windows update on, though.

    I am also running in -o2 "mode" since I have hyperthreading. Therefore I have two clients. Only one client had its test restarted....the other didn't. According to the logs, the one that restarted (when the system came up...not referring to the test) first was the one that lost its progress. It took a good 30 seconds or so before the system logged back in, too. Finally, by looking at the run time on the clients, I noticed they had restarted about 30 seconds apart....which, by my reckoning, means one restarted while the system was still coming back up and the other restarted after the system was already up.

    Some factors to take into consideration: I have a WD Raptor HD (10000 RPM). I'm guessing if the system was "caching" the data it should take considerably less time considering my access time is faster than 7200 RPM HDs' seek times. However, I am also running two tests. Furthermore, I had been running my system for a good 40 hours straight at 100% cpu usage for SB. According to my temperature monitor, though, it was only 46 degrees Celsius near the cpu. I had also logged out and logged back in many times during this 40 hour period and had had no problems at all.....not even lost blocks.

    I'm guessing that I should just give my computer a break about every 35 hours or so. I had been doing this before anyway because my speed seemed to decrease (even with intermediate restarts of the client) after this amount of time, but I didn't have a chance to do this today because I was out. Furthermore, logging out of my system did seem to raise my speed a bit and also prevent speed loss over time.

    I'll try the new version of the service and see if I have any future problems. If the problem keeps occurring I guess I could try one of two things: 1) never log out of my computer (thereby decreasing speed by a bit) or 2) after returning to my computer after having logged out, restart my computer instead of logging back in, thus (I think) giving me visible icons and no problems.

    I don't know much about programming (or how SB is currently programmed), but here are my thoughts on a more permanent solution: if the client allowed queuing and the server recorded the n.high as reported by the client (as compared to having everything transferred to the server), then if the client suddenly reported a lower n.high on the same test, the server could direct the client to stop whatever it was currently doing and start working from the highest previous n.high reported to the server. This would also only work if the server knew what the n.high should be for each test at the end of each block, in order to make sure that the client was actually performing the full test. The only situation this would not be plausible in is if SB currently has primes sent to the server by the test and not the block. If the converse were true, I think this would at least be a logical, but not necessarily perfect (because of the resources necessary), solution.
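The server-side half of that idea can be sketched as follows. This is Python and purely illustrative: `ProgressTracker` and its methods are made up here, and nothing in this thread says the SB server works this way. It only remembers the highest n.high reported for each test and flags a report that falls below it.

```python
class ProgressTracker:
    """Remember the highest n.high each user reported for each (k, n) test."""

    def __init__(self):
        self._best = {}  # (user, k, n) -> highest n.high seen so far

    def report(self, user, k, n, n_high):
        """Record a block report and return 'ok' or 'regressed'."""
        key = (user, k, n)
        best = self._best.get(key, 0)
        if n_high < best:
            return "regressed"  # client fell back, e.g. restarted from scratch
        self._best[key] = n_high
        return "ok"

    def highest(self, user, k, n):
        """Return the highest n.high recorded for this test, or 0 if none."""
        return self._best.get((user, k, n), 0)
```

Detecting the regression is the easy part. Resuming from the earlier n.high would still require the intermediate data, which the server never receives in this scheme.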

    Oh yeah, there are no out-of-place .zxxxx files on my computer and the k and n of the restarted test were 21181 and 4535348.

  37. #37
    Senior Member
    Join Date
    Dec 2002
    Location
    australia
    Posts
    118

    Hopefully...

    [Mon Sep 15 07:39:14 2003] restarting proth test from cache (k=4847, n=4381983) [98.7%]
    [Mon Sep 15 12:22:43 2003] residue: AAE5DFAE152A57D2
    [Mon Sep 15 12:22:43 2003] completed proth test(k=4847, n=4381983): result 3

    Hopefully this result was registered. Machine definitely chucked a wobbly when it tried to submit.

    It does look odd when starting up - my machine is slow enough when starting up that I can see some of what it does, and SB startup has seemed funnier since I added some more startup stuff.

    BTW I haven't lost a block or had much drama since I started backing up the z file every day or two. I have had to use the backup - when pausing the client using the GUI interface (service handler is running - should I be doing this?), it has wiped the z file. Hence the handiness of the backup file.
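A backup like that is easy to script. This is a minimal Python sketch, assuming the cache is a single z file and that it is copied while the client is stopped (copying in the middle of a write could capture a half-written file). `backup_cache` and the naming scheme for the copies are made up here.

```python
import os
import shutil
import time

def backup_cache(cache_path, backup_dir, keep=7):
    """Copy cache_path into backup_dir with a timestamp; keep the newest `keep` copies (keep >= 1)."""
    os.makedirs(backup_dir, exist_ok=True)
    base = os.path.basename(cache_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_dir, f"{base}.{stamp}.bak")
    shutil.copy2(cache_path, dest)  # copy2 also preserves the file's timestamps
    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(f for f in os.listdir(backup_dir)
                    if f.startswith(base + ".") and f.endswith(".bak"))
    for old in copies[:-keep]:
        os.remove(os.path.join(backup_dir, old))
    return dest
```

To restore, stop the client and copy the newest .bak file back over the z file in the SB directory.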

    Can't run SB and European Air War at the same time.

    PII 450MHz Win98SE, ver1.1 with service handler.
