I'm Trying to Help!



Dyyryath
06-30-2003, 04:16 PM
Unfortunately, this is what's holding me back:

http://www.zerothelement.com/offsite-images/dfp2-ss.png

That's not really even a bad one. I had two clients this morning that were over 450mb of RAM. :(

I'm not sure how it will affect my output, but I'm going to add a cron job to restart the clients on several of my boxes every night. If it helps, I'll reload it on some of my home machines.

Unfortunately, I can't even consider running this client on my work boxes until they get this all sorted out. :bang: :bang:
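A nightly cron restart works around the leak blindly. Below is a minimal Python sketch of a more targeted check: read the client's resident memory from `/proc/<pid>/status` and restart only when it passes a chosen limit, such as the 450mb figure above. It assumes a Linux box and that you already know the client's PID. The helper names and the 400 MiB default are hypothetical, not part of the DF client.

```python
import os


def rss_kib_from_status(text: str) -> int:
    """Return the VmRSS value (in KiB) from the contents of /proc/<pid>/status."""
    for line in text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  460800 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def rss_mib(pid: int) -> float:
    """Resident set size of a running process in MiB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return rss_kib_from_status(f.read()) / 1024


def needs_restart(pid: int, limit_mib: float = 400.0) -> bool:
    # Hypothetical policy: restart only once the client has grown past limit_mib.
    return rss_mib(pid) > limit_mib


if __name__ == "__main__":
    print(f"this process: {rss_mib(os.getpid()):.1f} MiB resident")
```

A cron job could call `needs_restart()` every hour or so instead of restarting unconditionally every night.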

Dyyryath
06-30-2003, 04:25 PM
LOL, here's another one! :D

http://www.zerothelement.com/offsite-images/dfp2-ss-2.png

This one's funny because it shows the swap daemon going nuts when I log in and require resources for something other than DF:

http://www.zerothelement.com/offsite-images/dfp2-ss-3.png

I definitely can't have this kind of nonsense going on in a box that actually has to do something other than crunch. :(

rsbriggs
06-30-2003, 04:34 PM
Ugh. I never run client applications as root...
Anyone know of an AIX port that has been done?

Dyyryath
06-30-2003, 04:43 PM
On a real box, I wouldn't either, but these are boxes solely dedicated to DF. That is, in fact, the only reason the client is still running on them. :D

As for an AIX client, I haven't heard anything.

xj10bt
06-30-2003, 05:03 PM
Hey Dyyryath,

Same thing here. With the new client, I figured I'd give DF another shot and loaded it on some machines to check it out, but had to remove it.

I hoped this client would be more bug-free.

rsbriggs
06-30-2003, 05:49 PM
Originally posted by Dyyryath
As for an AIX client, I haven't heard anything.



Too bad - I have LOTS of AIX box cpu available.....


Is it only the UNIX clients that have memory problems? I have 15 or so copies running on XP or Win2k and have never seen a single memory problem....

rsbriggs
06-30-2003, 05:54 PM
Originally posted by Dyyryath
On a real box, I wouldn't either, but these are boxes solely dedicated to DF. That is, in fact, the only reason the client is still running on them.


(Doesn't matter.) You might try running this as some other user, just to see if the behavior is different. My guess would be that even the guy that did the *NIX port didn't run them as root. Been running unix boxes for nearly 30 years, and don't run or do ANYTHING as root that can be done as some other user. Learned long ago that a mistake like rm dash rf slash star enter can ruin your whole day, most of your week, and much of your job sometimes.

Angus
06-30-2003, 06:40 PM
The memory problem is rampant in Windows boxes as well.

I just added another P4 HT 3.0 box. Everything's going full tilt :mouserun:

Beyond
06-30-2003, 06:53 PM
Just in from work, and on the first box I check, the client is using 208MB RAM and 234MB VM and has had 1,487,633 page faults. All this since I restarted the client this morning, less than 12hrs ago. :bang: :swear: I am tempted to redeploy to a different project, and it's looking like that will happen very soon.


/edited for clarity/

rsbriggs
06-30-2003, 07:24 PM
Well, that's very curious. I've got 15 copies of the client running on various OSes from XP pro to Win2k Pro to Win2k Advanced Server, and I've never seen memory utilization over (roughly) 142k.

But then, I don't have less than 1Gig of RAM on any of these boxes, and some have up to 4....

I don't understand why you would be seeing all those page faults, unless your swap partition was set too small -

EDIT - the picture didn't get inserted. Grrr....

Tell me, under control panel / System / Advanced / Performance tab / Advanced / Virtual Memory - (or the equivalent of these selections on the version of the OS you are running), what the settings are on that page for paging file size....

IronBits
06-30-2003, 07:37 PM
Dyyryath - I restart all my clients each and every day, at least once per 24hrs is highly recommended. :)
Thanks to everyone else that is having this problem and still trying to work with some of the boxen, to help out as much as possible!

:cheers:

Paratima
06-30-2003, 07:44 PM
And if our Number One Cruncher says he does it, then that's a strong hint! :notworthy

rsbriggs - An AIX port has been talked about on and off. At one point last fall, Howard even went so far as to predict it might be happening soonish. However, as he would tell anyone who wants a particular port, he has to have one locally for compiling and testing. As we haven't seen an AIX client, I guess that no RISC-6000s have been beamed on board the DF mother ship. :cool:

This is unfortunate, as I could throw in a couple of RISCs myself, if we had it. I thought I had one I could donate, but our QA people grabbed it.

rsbriggs
06-30-2003, 07:49 PM
But that's the crazy thing - I have 5 sets of clients that have been running for 4 days on various boxes and OSes, and - just checked - max memory use, as shown by taskman, on any of the 5 of them is 97K.... Nary a glitch.

Now, what do you suppose is the difference in settings, or whatever, between these OK boxes, and the boxes people are having so much trouble with????

One is XP, one is XP Pro, one is Win2k Pro, 2 are Win2k Advanced Server. I'd be happy to compare any/all settings if we thought it might give us a clue to the difference !!

PCZ
06-30-2003, 07:49 PM
I checked my remote PC's today.
They all had very high memory utilisation.
The DF clients had either stopped or were running so slowly they might as well have.
They have been running about ten days and the memory used was between 350 and 450 meg per instance. The dual boxes had no memory left, 20k at most.
The OS is W2K Server.


Brian

rsbriggs
06-30-2003, 07:53 PM
Is Howard the only one that does the porting? Is, for example, Linux source available anywhere? It wouldn't take much work to make Linux code run under AIX, and it might compile with GCC and work fine as is.

Angus
06-30-2003, 07:53 PM
Yup. That's what happens. They need to have DF stopped and restarted at least every couple of days until there is a fix. :bang: :bang: :bang: :bang: :bang:

magnav0x
06-30-2003, 08:00 PM
I'm tapped out on boxen, all of 'em are running DF and I've got some really tough proteins on 'em. They don't seem to realize that the Ars takeover is happening very soon :swear:

Beyond
06-30-2003, 08:02 PM
Originally posted by rsbriggs
Tell me, under control panel / System / Advanced / Performance tab / Advanced / Virtual Memory - (or the equivalent of these selections on the version of the OS you are running), what the settings are on that page for paging file size....

14 clients running: 5 running W2k Pro, 2 running Win98SE, 1 W2k Server, the rest Linux of various flavours. All systems except 1 of the Win98s have the paging file set at 768 MB. 3 of them are showing over 1 million page faults, the rest I've checked are in the tens and hundreds of thousands, and all are at or near 200MB RAM used. The excessive RAM usage is also present on the Linux boxes. :(

PCZ
06-30-2003, 08:03 PM
I can't see myself putting the client on any more boxes until it is fixed.
Certainly not risking it on the Corporate boxes.


Brian

rshepard
06-30-2003, 08:13 PM
This is very strange-- I have 4 win2K server duallies at work, a Win2K pro box, a Linux (Mandrake 9.1) box, and another Mandrake box at home-- NONE of them have thrown a memory error. The worst mem usage I have seen is ~ 253 on one of the WIN2K servers. Now I grant you, the servers are carrying a Gig of RAM, so they aren't too likely to go down; but I'm not having to restart the client on them. The Linux box at the house is only carrying 128meg; but I've even tried running it with the -rt switch set and it won't fold up (no pun intended) --it runs slow as anything if I want to open a browser, for example, but it swaps out the memory and then picks it back up when I close out.
One thought- forgive me if this is really crazy- but all the boxes at work are also running the ChessBrain client in parallel with the DF client. When CB kicks in and forces the DF client to "pause", could it be releasing some memory, or at least clearing out some of the space the DF client has allocated, thereby keeping the mem usage from going out the window? In other words, is that pause acting in some way like restarting the client?

rsbriggs
06-30-2003, 08:33 PM
Page faults are not the exception - they occur when a block of memory has been swapped out, or a new page requested for allocation. A million page faults, over a one hour period, is not even an interesting event.

This box, for example, has 1 gig of ram, but the act of just bringing up IE, navigating to this forum, and typing in this many characters has produced (alt-tabbing to task manager) *213,024* page faults on this new IEXPLORER task alone.

Hummm. Now why do you suppose that is - there is only 233 MB of ram in use in the entire system, yet every character I type into this message results in even more.
Page faults for IEXPLORER are now up to 238,523. I suspect that if I navigate around a little, open a couple new windows, and return to this message that I could easily top 1M page faults on the IEXPLORER task over the next couple of minutes.

Ugh - My background McAfee virus check that kicked in 1/2 hour ago is showing well over 120,000,000 page faults...... Guess I'll turn that sucker off - that will start to bother my performance a little if it keeps up at that rate....


PS - IEXPLORER task is now up to... 451,276 page faults over the course of about 4 minutes. While that would be unusual on a Linux box, it's probably the norm here on Windows.

Beyond
06-30-2003, 08:49 PM
So nothing to worry about with page faults then..never could understand all those windows thingies. :)

Well, still cranking workunits out, so we will let them go till it falls over. :cool:

Paratima
06-30-2003, 09:06 PM
Originally posted by rsbriggs
Is Howard the only one that does the porting? Is, for example, Linux source available anywhere? It wouldn't take much work to make Linux code run under AIX, and it might compile with GCC and work fine as is.

You are correct as to the likely ease of that particular port. However, if you read back thru these threads to a year or further back, Howard and the project management at the hospital are quite adamant about retaining control of the source code. There are some good reasons for this. I don't agree with all of them, but they do have some good points. So, for the nonce, Howard does all the porting. :rolleyes:

rsbriggs
06-30-2003, 09:26 PM
Can't say I blame him, I guess, although there ARE those of us who are professional programmers, and would honor an NDA...

(and I'd REALLY like to run the benchmark piece on an AIX box)

bwkaz
06-30-2003, 09:43 PM
Well, page faults don't happen *all* the time. Only when something bad happens in the paging subsystem of the processor -- i.e., a page is marked not present, or it's marked readonly but the calling process is trying to write to it, or it's marked as a supervisor page (user/supervisor bit is either cleared or set, I don't remember which, in the page table), and user code is trying to access it. These are not the only causes, though -- check the Intel processor manuals to find out what all of them are (I don't remember at the moment).

The vast majority of the time, page faults are handled by swapping the requested page back into RAM and restarting the offending instruction. If it's a permissions thing, though, then the process is generally killed (using signal 11, SIGSEGV, in Unix, or the "this program has generated errors and needs to close" message box in Windows).

The high number of page faults is probably caused by a lot of stuff getting swapped out. It shouldn't have anything to do with swapfile size, but rather swapfile usage (and what you're trying to access that isn't in main memory). Especially if the page faults are happening in another process -- then it's almost assuredly because there isn't enough physical RAM for everything, and one or more bits of one or more processes has to get put into swap. And then it causes a page fault when it gets accessed and comes back out of swap.
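To see the distinction bwkaz draws, here is a minimal Python sketch (Unix only, since it relies on the standard `resource` and `mmap` modules) that counts this process's own page faults. The kernel reports minor faults, resolved without disk I/O (for example the first write to freshly mapped memory), separately from major faults, where the page had to be read back from disk or swap. Only the latter reflect the swapping described above, so a large raw fault count on its own, like the million-plus figures posted earlier, says little. The function names are illustrative only.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def fault_counts() -> tuple[int, int]:
    """Return (minor, major) page faults incurred so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def touch_new_pages(n_pages: int) -> tuple[int, int]:
    """Map fresh anonymous memory, write one byte per page, return the fault deltas."""
    before = fault_counts()
    buf = mmap.mmap(-1, n_pages * PAGE)  # anonymous mapping: no RAM assigned yet
    for offset in range(0, n_pages * PAGE, PAGE):
        buf[offset] = 1  # first write to a page faults it in (normally a minor fault)
    after = fault_counts()
    buf.close()
    return after[0] - before[0], after[1] - before[1]


if __name__ == "__main__":
    minor, major = touch_new_pages(1024)
    print(f"minor faults: {minor}, major faults: {major}")
```

On a box with plenty of free RAM, nearly all of the faults this produces should be minor ones.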

Darkness Productions
06-30-2003, 09:52 PM
Are any of you running the client with any flags other than -rt, -qt and -it? I'm just wondering if one of the other flags could have something to do with it....

Paratima
06-30-2003, 09:57 PM
Nope. Just -rt & -qt here. And "Run Hidden" using dfGUI.
Everything else is defaults.

IronBits
06-30-2003, 10:06 PM
Originally posted by Darkness Productions
Are any of you running the client with any flags other than -rt, -qt and -it? I'm just wondering if one of the other flags could have something to do with it....

Not that I could tell. :(
http://www.free-dc.org/forum/showthread.php?threadid=3385

Darkness Productions
06-30-2003, 10:23 PM
New question. Can you all compare laxness levels to memory usage? I'm thinking there might be a connection there, but as usual, probably not...

IronBits
06-30-2003, 10:28 PM
Whatever you guys/gals are doing, KEEP IT UP! :D
:fireboun: Looks like most of the cavalry is arriving just in time :cheers:

Dyyryath
06-30-2003, 11:03 PM
Originally posted by rsbriggs
Been running unix boxes for nearly 30 years, and don't run or do ANYTHING as root that can be done as some other user. Learned long ago that a mistake like rm dash rf slash star enter can ruin your whole day, most of your week, and much of your job sometimes.
LOL :D

Running it as root isn't contributing to the problem or helping it. I've actually still got it on two 'non-dc' boxes running under a user account and it behaves the same way.

The boxes I run it as root on don't actually have *any* regular user accounts. In fact, they are essentially running in single user mode all the time. They're stripped down Linux installs that boot directly to an open console with very little other than the kernel & a few basic tools & libs required to make the client run. As I said, they're *strictly* DF boxes. They even live on their own network here at the house. They were originally cluster nodes. :cool:

I've got the client running in the background on my development workstation upstairs as a regular user and it's around 420mb or so right now. The box has 1.5gb of RAM in it, so it's not a huge deal on that box, but it still needs to be fixed before I can start installing it elsewhere. :(

Dyyryath
06-30-2003, 11:09 PM
Originally posted by IronBits
Whatever you guys/gals are doing, KEEP IT UP! :D
:fireboun: Looks like most of the cavalry is arriving just in time :cheers:

I turned two dual MP2600's and 2 P4-2ghz that I had here to crunching with a cron job set to restart them every 24 hours. It's not much, but it should help...

rsbriggs
06-30-2003, 11:27 PM
I just put an old Athlon 1200+ in a case. No hard drive - not certain how to boot it off the network. Might be able to get it running tomorrow - getting too late to do anything else with it tonight... Other than AIX boxen, don't have anything else to run DF on....

rsbriggs
06-30-2003, 11:31 PM
Actually - it appears that team Intel Corp down at number 16 is producing more than both FreeDC and ARS combined. Look for them to be catching up in a couple of days...

IronBits
06-30-2003, 11:33 PM
Originally posted by Dyyryath
I turned two dual MP2600's and 2 P4-2ghz that I had here to crunching with a cron job set to restart them every 24 hours. It's not much, but it should help...

:elephant: Watch for the process to terminate! It hangs there sometimes as long as 3 minutes...
You don't want/need two processes fighting over the same work :scared:

I hope xj10bt can find a few boxen so he can try to keep up.
I know I'm gonna regret it but, can't help it :D
xj10bt - this is what it looks like to me most of the time...
:moon: :moon: :harhar: ;)
/me goes back to making more Grape Koolaid to make my boxen smarter :crazy:
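IronBits's warning about a client that can take minutes to exit applies to any scheduled restart: start the new copy only after the old one is really gone. Below is a minimal Python sketch of that stop-wait-start sequence using the standard `subprocess` module. The command line is a hypothetical stand-in for however you launch the client, and the 300-second grace period is an assumption chosen to cover the roughly 3-minute hang described above.

```python
import subprocess
import sys


def restart(proc: subprocess.Popen, cmd: list[str], grace_s: float = 300.0) -> subprocess.Popen:
    """Stop `proc`, wait until it has actually exited, then launch a fresh `cmd`."""
    proc.terminate()  # polite stop request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_s)  # a busy client may need minutes to shut down
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort (SIGKILL on POSIX)
        proc.wait()  # reap it so no zombie is left behind
    # Only now can the replacement start without two copies fighting over the same work.
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    # Hypothetical stand-in for the real client command line.
    cmd = [sys.executable, "-c", "import time; time.sleep(3600)"]
    client = subprocess.Popen(cmd)
    client = restart(client, cmd, grace_s=10)
    print("restarted, new pid:", client.pid)
    client.terminate()
    client.wait()
```

This assumes the script that launched the client keeps its `Popen` handle, for example a small supervisor loop that calls `restart()` every 24 hours, rather than a separate cron job that would first have to look up the old PID.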

rsbriggs
07-01-2003, 04:51 AM
Gahhh. Broke down and ordered a new 2.8 GHz 800 MHz FSB HT box from Dell. Be here middle of next week. Going to need a new hub/switch too - the current one is maxed out once the AMD XP box gets going.

It's starting to get a little crowded in here. :rolleyes:

Too bad we can't convince djp to quit running the client for a while, or switch over to FreeDC :D - has anyone looked at those numbers ??

Darkness Productions
07-01-2003, 09:26 AM
You want to order one for me too? :spray:

IronBits
07-01-2003, 09:45 AM
Originally posted by rsbriggs
Gahhh. Broke down and ordered new 2.8 GHz 800 Mhz FSB HT box from Dell. Be here middle of next week. Going to need a new hub/switch too - the current one is maxed out once the AMD XP box gets going.

Nice! :D
:thumbs: :cheers:

rsbriggs
07-01-2003, 09:55 AM
Originally posted by Darkness Productions
You want to order one for me too?
Sure - Boxen for everyone !!! :banana: :drums: :cheers: :|party|: :banana:

Might add couplea-k-per-hour to my output. IronBits must have a whole HOUSE full of boxen.... :notworthy

Hey - if smarter means lower RMSD, and lower RMSD means longer generation times and fewer points per hour - what do I feed my computers to make them DUMBER?

magnav0x
07-01-2003, 11:13 AM
My computers are naturally stupid......but they don't know that means they get to go faster....maybe dumber isn't better :rotfl:

Angus
07-01-2003, 04:22 PM
Originally posted by rsbriggs
Actually - it appears that team Intel Corp down at number 16 is producing more than both FreeDC and ARS combined. Look for them to be catching up in a couple of days...

I think not. :spank:


It looks like a dump from one user made a momentary spike. Their Daily and Weekly rates are nowhere near ours or Ars'.

:banana: