Minimizing energy (?)

**bwkaz** · 08-13-2003, 11:31 PM

I said something about this during the beta (I think; either that, or soon after the phase 2 switchover), but it appears to be worse now. I'm guessing that it might be because the client is minimizing energy on every structure during gen 0 now?

Anyway, the attached gkrellm image shows the CPU usage I'm seeing. Green is normal (user-space) nice time, which is usually what the client gets. Orange is kernel-mode time. The right side of the graph is "now", and it scrolls about one pixel per second.

Relevant output of ps aux:

```
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
<user>     368 97.6  7.9 76444 61992 ?       RN   05:42 115:56 ./foldtrajlite
```

This is Linux 2.4.20 (though the kernel version probably doesn't matter too much). I can give any other system details, but at the moment I don't think they'd be relevant.

My question, basically, is: why is it using so much kernel time? Should I run it through strace to try to figure out what in the heck it's doing in kernel mode? Also, these orange spikes happen every time it starts to minimize energy (between generations), but they are much less frequent once gen 0 is done.
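For reference, the green/orange split in the graph is the per-process user-mode versus kernel-mode CPU time that the kernel accounts. As a minimal sketch, assuming a POSIX system, a program can read the same split for itself with getrusage(). The helper `cpu_split` is a hypothetical name used for illustration, not anything in the client.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: CPU time the calling process has spent in user mode
 * and in kernel mode so far. Returns 0 on success, or -1 on failure with
 * errno set by getrusage(). */
int cpu_split(double *user_s, double *sys_s)
{
    struct rusage ru;

    assert(user_s != NULL && sys_s != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_s = tv_seconds(ru.ru_utime);   /* time running user code */
    *sys_s = tv_seconds(ru.ru_stime);    /* time the kernel spent on our behalf */
    return 0;
}
```

Sampling it before and after a phase such as an energy minimization, and subtracting, would show whether that phase's time is being charged to user or kernel mode.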

**Brian the Fist** · 08-14-2003, 10:39 AM

Don't know what the kernel's doing, but sure, try strace. Whatever it is, I can assure you it's nothing naughty.

**bwkaz** · 08-14-2003, 06:32 PM

Originally posted by Brian the Fist:

> Whatever it is, I can assure you it's nothing naughty

I'm sure it's not; I was just wondering whether you might have known.

I'm running strace on it now, but (probably due to the power outage) the DF servers aren't reachable, so it's going through the long timeout process before it'll even start.

Will post results here, though. Thanks.

**jlandgr** · 08-14-2003, 06:48 PM

> it's going through the long timeout process before it'll even start.

You can try inserting the "-if" option in foldit.bat to tell the client to run "offline".
Jérôme

**bwkaz** · 08-14-2003, 07:06 PM

Oh, duh, forgot about that. It had already timed out, but thanks anyway! However...

OK, the strace log is filled (and I mean quite literally, filled) with some strangeness.

Apparently the code is calling time(NULL) about, oh, 42425 times every second. Then, approximately once every six seconds (at the end of second <something huge>55, and again at the end of second <same huge number>61), the code goes through the following:

1. Open blpotential.txt for reading (O_RDONLY | O_LARGEFILE).
2. fstat64() the file, probably to get its size.
3. old_mmap something (not sure if this is you or the OS, though): 4096 bytes get mapped as private, anonymous, read, and write.
4. Do a bunch of read()s of the blpotential file (actually, one too many: the last one returns 0, but the one before it already returns less than 4096).
5. munmap() the previously mmap()ped area.
6. Call brk() a bunch of times (17 tries, 4096 bytes more per try), adding a total of 0x11000 bytes to your address space.
7. close() the blpotential file.

At this point, it goes back to calling time(NULL) 40,000 times/sec for another few seconds, then repeats.
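One possible, unconfirmed reading of this sequence is plain buffered stdio at work. With glibc of that era, fopen() plus a few line reads and fclose() typically appears in strace as open(), an fstat64() used to pick a buffer size, a one-page anonymous mapping for the FILE buffer, a series of read()s, and the mapping and descriptor released again at fclose(); newer glibc tends to take that buffer from malloc() instead. The final read() returning 0 is probably not wasted either: for a regular file, a 0-byte read is how the library detects end of file. The brk() calls would fit a parser growing its heap as it stores what it reads. The sketch below is hypothetical (`load_lines` and `free_lines` are invented names, not client code); running it under strace gives a pattern to compare against.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical loader: read every line of a text file into a growing heap
 * array. stdio supplies the open/fstat/read/close calls; malloc/realloc
 * grow the heap (typically via brk). Lines longer than 255 characters are
 * stored in pieces. Returns NULL (with *count == 0) if nothing was read. */
char **load_lines(const char *path, size_t *count)
{
    FILE *fp;
    char buf[256];
    char **lines = NULL;
    size_t n = 0, cap = 0;

    assert(count != NULL);
    *count = 0;
    fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (n == cap) {                         /* grow the array geometrically */
            size_t new_cap = cap ? cap * 2 : 16;
            char **tmp = realloc(lines, new_cap * sizeof *tmp);
            if (tmp == NULL)
                break;
            lines = tmp;
            cap = new_cap;
        }
        lines[n] = malloc(strlen(buf) + 1);
        if (lines[n] == NULL)
            break;
        strcpy(lines[n], buf);
        n++;
    }
    fclose(fp);         /* releases the stdio buffer and the descriptor */
    *count = n;
    return lines;
}

/* Free everything load_lines() allocated. */
void free_lines(char **lines, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        free(lines[i]);
    free(lines);
}
```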

Now, I don't think strace follows threads, so it is possible that something else is going on in another thread (and that's what's causing the kernel time). However, judging from the time between the open() and close() calls, it "feels" like the culprit is the sequence listed above. My current guess is the reads, the mmap, or the brk() calls.

If this time corresponds to you reading in the whole of blpotential.txt at once (which the fstat64() call suggests), then I wouldn't expect it to be the cause unless blpotential.txt were gigantic, and it isn't. So I'm thinking it's probably another thread... in which case I don't think I have any way of seeing it anyway. Crud.

Don't suppose you could get rid of those time() calls, could you? Or make them happen less often (sleep(1))?
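The sleep(1) suggestion can be sketched as follows, assuming the delay is a loop that polls time(NULL) until a deadline. Sleeping between checks cuts the calls from tens of thousands per second to roughly one per second while keeping the polling structure. `wait_until` is a hypothetical name, and the returned call count is there only to make the difference visible.

```c
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical polling delay: check the clock about once per second and
 * sleep in between, instead of calling time(NULL) in a tight loop.
 * Returns how many time() calls were made, to show how few are needed. */
unsigned long wait_until(time_t deadline)
{
    unsigned long calls = 0;

    assert(deadline != (time_t)-1);   /* -1 is time()'s error value */
    for (;;) {
        time_t now = time(NULL);
        calls++;
        if (now >= deadline)
            break;
        sleep(1);                     /* suspended here, using no CPU */
    }
    return calls;
}
```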

**Welnic** · 08-15-2003, 02:30 AM

Try running with -g 25 and see if that changes anything.

**Brian the Fist** · 08-15-2003, 04:38 PM

Not sure at the moment where the time() calls are coming from, but I'll take a look later and, if it's not necessary, make it less frequent.

**bwkaz** · 08-15-2003, 06:38 PM

Welnic -- I'm not sure, but I don't think it would be related to the -g setting. It appears to do it after every structure in gen 0, and once per generation after that, not once per 5 structures. I am running with the default -g value (5). Unless you have another reason for suggesting that?

Howard -- thanks.

Would you happen to know of any way to strace one of the other threads? Or do you think all threads were getting traced? (I think a fairly easy way to find out would be to check which thread or threads look at blpotential.txt: if it's just the first one, the one that starts the others up and sleeps, then that was all strace was tracing; if any of the others check the file, then it was probably tracing all of them.)

**Brian the Fist** · 08-18-2003, 10:47 AM

OK, I checked, and that is a 5-second delay loop. I can probably replace it with a sleep(5) for the same purpose, though I doubt it makes much difference in practice other than making strace prettier. Anyhow, I'll change it for style's sake.
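Whether the swap matters can be checked directly. The sketch below is hypothetical: `spin_delay` is a guess at what a delay loop polling time(NULL) looks like, not the client's actual code. It uses clock(), which reports the processor time used by the process, to compare that loop with a sleep()-based delay. The spin should consume roughly as much CPU time as the delay lasts, while the sleep() version should consume almost none. On a 2.4 i386 kernel each time() call enters the kernel, so a spin on time() is charged largely as kernel time; on current x86-64 systems glibc usually serves time() from the vDSO without a system call, so the same spin would show up mostly as user time and may not appear in strace at all.

```c
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical busy-wait: poll time(NULL) until the deadline passes.
 * The CPU stays busy for the whole delay. */
static void spin_delay(unsigned int seconds)
{
    time_t deadline = time(NULL) + (time_t)seconds;

    while (time(NULL) < deadline)
        ;                               /* one time() call per iteration */
}

/* Replacement: sleep() suspends the calling thread. It may return early
 * if a signal arrives, reporting the seconds left, so loop on that. */
static void sleep_delay(unsigned int seconds)
{
    while (seconds > 0)
        seconds = sleep(seconds);
}

/* Processor time (seconds, user plus system) consumed by one delay call. */
double cpu_seconds_used(void (*delay)(unsigned int), unsigned int seconds)
{
    clock_t start;

    assert(delay != NULL);
    start = clock();
    delay(seconds);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Note that time() has one-second resolution, so spin_delay(n) actually lasts somewhere between n - 1 and n seconds.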

**rsbriggs** · 08-18-2003, 11:11 AM

Ummm....

Howard, actually there is a BIG difference between doing a sleep(5) and doing a 5-second spin in a hard loop that makes a system call and burns CPU time...

NB: there is also a major difference between doing a "sleep" and doing a "thread sleep"...

**Paratima** · 08-18-2003, 12:14 PM

My good friend Mr. rsbriggs has a point, but is perhaps a bit reticent to tell it like it is...

There is a HUGE difference!

**Welnic** · 08-18-2003, 02:53 PM

Originally posted by bwkaz:

> Welnic -- I'm not sure, but I don't think it would be related to the -g setting. It appears to do it after every structure in gen 0, and once per generation after that, not once per 5 structures. I am running with the default -g value (5). Unless you have another reason for suggesting that?
> ...

I just thought that you might have had -g 1 set since some people were doing that for better statistics during the later generations. I thought the writing to disk might show up as system time and that it would have a big effect during gen 0 since the calculations are so much faster than the later generations. But it sounds like you have the real cause tracked down.

**Brian the Fist** · 08-19-2003, 11:02 AM

Originally posted by rsbriggs:

> Ummm....
>
> Howard, actually there is a BIG difference between doing a sleep(5) and doing a 5-second spin in a hard loop that makes a system call and burns CPU time...
>
> NB: there is also a major difference between doing a "sleep" and doing a "thread sleep"...

OK, I'll bite: what is a sleep vs. a thread sleep?
All I see is sleep() on UNIX or Sleep() on Winblows.

**bwkaz** · 08-19-2003, 06:13 PM

Well, on Linux at least, there is no difference between a process and a thread. The sleep() call suspends the current task (to the kernel, they're both tasks) for some number of seconds.
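A minimal pthreads sketch of that point, with hypothetical names: while the main thread is inside sleep(), a second thread keeps running and counting. (With the LinuxThreads library on 2.4 kernels each thread was a separate task with its own PID, as described above; with NPTL on later kernels the threads share a PID, but sleep() still suspends only the caller.) Compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_long ticks;        /* advanced continuously by the worker */
static atomic_int stop_worker;   /* set to 1 to make the worker exit */

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker))
        atomic_fetch_add(&ticks, 1);   /* unaffected by sleep() in another thread */
    return NULL;
}

/* Put the calling thread to sleep and report how far the worker's
 * counter advanced meanwhile; a positive result means it kept running. */
long ticks_during_sleep(unsigned int seconds)
{
    pthread_t tid;
    long before, after;
    int rc;

    atomic_store(&stop_worker, 0);
    rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);

    before = atomic_load(&ticks);
    sleep(seconds);                    /* suspends only this thread */
    after = atomic_load(&ticks);

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    return after - before;
}
```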

If you write a program that's multithreaded, you can't get all the threads to suspend themselves at the same time with one function like sleep(). You'd have to use some sort of inter-thread signalling mechanism (a barrier, probably, where all threads block until a preset number of them get there, then they all get released) to synchronize the threads first, then call sleep() in each one.
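The barrier-then-sleep approach might look like this with POSIX threads, as a sketch with hypothetical names: each thread blocks in pthread_barrier_wait() until all NTHREADS have arrived, then calls sleep() for itself, so all of them are suspended over roughly the same interval. The returned spread between the earliest and latest wake-up should be small. POSIX barriers are an optional feature; glibc on Linux provides them, but some platforms (macOS, for one) do not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

#define NTHREADS 4

static pthread_barrier_t barrier;
static double woke_at[NTHREADS];     /* monotonic time each thread woke up */

static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static void *sleeper(void *arg)
{
    int id = *(const int *)arg;

    pthread_barrier_wait(&barrier);  /* block until all NTHREADS arrive */
    sleep(1);                        /* suspends this thread only */
    woke_at[id] = now_seconds();
    return NULL;
}

/* Run NTHREADS sleepers released together by a barrier and return the
 * spread (in seconds) between the earliest and latest wake-up. */
double synchronized_sleep_spread(void)
{
    pthread_t tids[NTHREADS];
    int ids[NTHREADS];
    double lo, hi;
    int i, rc;

    rc = pthread_barrier_init(&barrier, NULL, NTHREADS);
    assert(rc == 0);
    for (i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        rc = pthread_create(&tids[i], NULL, sleeper, &ids[i]);
        assert(rc == 0);
    }
    for (i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);   /* joining also makes woke_at[] visible */
    pthread_barrier_destroy(&barrier);

    lo = hi = woke_at[0];
    for (i = 1; i < NTHREADS; i++) {
        if (woke_at[i] < lo) lo = woke_at[i];
        if (woke_at[i] > hi) hi = woke_at[i];
    }
    return hi - lo;
}
```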

In Windows, there is a decent-sized difference between a process and a thread (threads don't get PIDs, AFAIK, and processes are much, much slower than threads to create). But Sleep() (the Win32 version anyway) still suspends only the current thread. From the MSDN docs on the Sleep() function:

> The Sleep function suspends the execution of the current thread for at least the specified interval.

http://msdn.microsoft.com/library/de...base/sleep.asp

I also don't know of any way to suspend all threads in a process with Win32, other than the barrier -> Sleep() solution.

**m0ti** · 08-20-2003, 08:12 AM

As far as I know sleep always affects only the calling thread; it would be very strange (and probably result in very unpredictable behavior) to have a process sleep function, where all threads are put to sleep due to the result of a call that one of them made, without them going through synchronization (via a barrier as mentioned) first.
