IRIX 32-bit problem



showdown
06-17-2003, 05:06 PM
(w00t! first phase II bug(?) report)

The new client stops quite often (every 15-20 gen.) with this error:

[foldtrajlite] FATAL ERROR: [001.001] sin(phi) was zero - hit return

And I have to restart it.

Is it just me, or is anyone else experiencing this?

Brian the Fist
06-17-2003, 06:10 PM
Please post the full msg - there should be a line number with the message in the error.log I think.

It sounds like an FPU problem. What is the exact hardware you get this on? Thanks

showdown
06-17-2003, 06:23 PM
This is the full output:

========================[ Jun 17, 2003 10:42 PM ]========================
FATAL ERROR: [001.001] {charmm.c, line 2909} sin(phi) was zero

========================[ Jun 17, 2003 10:47 PM ]========================
FATAL ERROR: [001.001] {charmm.c, line 2909} sin(phi) was zero

========================[ Jun 17, 2003 10:53 PM ]========================
FATAL ERROR: [001.001] {charmm.c, line 2909} sin(phi) was zero

========================[ Jun 17, 2003 11:13 PM ]========================
FATAL ERROR: [001.001] {charmm.c, line 2909} sin(phi) was zero



The hardware is an Indy R5000PC 150MHz (it's slow, I know :) ) I have 160MB RAM, so I don't use the Extra RAM switch.

showdown
06-20-2003, 12:45 PM
A little more info: I noticed that the error occurs when it starts the "calculating energy" phase.

Brian the Fist
06-21-2003, 11:34 AM
That is very interesting because it is theoretically impossible. Does this occur on only one machine? Can anyone else verify this error message? If you are the only one getting it, I would suspect your hardware, perhaps the FPU...

bwkaz
06-21-2003, 12:57 PM
Howard, is this related to the bug report that someone (not me) had a long time ago, where you said something about some point being calculated as right at the top (AKA north pole) of some sphere?

Well, hang on, no it's not. The sine of an angle is the y-coordinate of the point on the unit circle; if that's 0, then that means this point is right on the equator of that sphere (if it's even the same sphere, or the same point, or whatever).

Never mind... ;)

showdown
07-03-2003, 10:42 AM
*Bump*

Does anyone have a 32-bit Irix box they can test with? I only have the one Indy...

MrMr
07-04-2003, 11:37 AM
Just downloaded the client for an antique Indy.
I will post results when and if they come in

MrMr
07-04-2003, 11:40 AM
Yes, that was quick (but this is a real racing machine: 180MHz R5k)
On structure 6 generation 0:
[foldtrajlite] FATAL ERROR: ... sin(phi) was zero

(this box is running Irix 6.5.18m)

showdown
07-05-2003, 10:21 AM
Thanks a lot!

At least it's not just me. Could it be the code, or perhaps something strange with the R5000?

showdown
07-06-2003, 09:37 AM
I've got one more report of the same. Veerappan from the Ars forum was kind enough to test the 32-bit client on an R12000 CPU and got the exact same error:

========================[ Jul 5, 2003 12:55 PM ]========================
FATAL ERROR: [001.001] {charmm.c, line 2909} sin(phi) was zero

showdown
07-10-2003, 11:17 AM
I also got this tip from Coredog64:

I2 with an R4400 gets the same results.

My guess is that it's an issue with the code and the implicit "o32" ABI that's being used. They should probably produce two 32 bit binaries. One that's "o32" for really old machines (or pre 6.2 Irix) and one that's "n32" for recent 32 bit and hybrid 32 bit/64 bit machines (i.e. O2s with R10K and above CPUs).

Besides, wouldn't -mips3 perform better on an R4k than mips1? Now that I think about it, they probably need 3 32 bit clients. One "o32" and two "n32": One for mips3 and one for mips4.
Or just drop support for anything less than an R4k and be done with it...
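
To confirm which ABI a given binary was actually built for, a small C sketch like the following could be compiled with the same flags as the client. It assumes the compiler predefines `_MIPS_SIM` and that `_ABIO32`, `_ABIN32` and `_ABI64` are available (on IRIX they come from `<sgidefs.h>`, which may need to be included explicitly); treat those macro names as an assumption to verify on the target system. The function name `mips_abi_name` is hypothetical.

```c
#include <stdio.h>

/* Returns a label for the MIPS ABI this translation unit was compiled for.
 * Falls back to "unknown" when the assumed macros are not defined, so the
 * file still compiles on non-MIPS systems. */
const char *mips_abi_name(void)
{
#if defined(_MIPS_SIM) && defined(_ABIO32) && (_MIPS_SIM == _ABIO32)
    return "o32";
#elif defined(_MIPS_SIM) && defined(_ABIN32) && (_MIPS_SIM == _ABIN32)
    return "n32";
#elif defined(_MIPS_SIM) && defined(_ABI64) && (_MIPS_SIM == _ABI64)
    return "64";
#else
    return "unknown";
#endif
}

/* Prints the detected ABI label on one line. */
void print_mips_abi(void)
{
    printf("compiled for ABI: %s\n", mips_abi_name());
}
```

Calling `print_mips_abi()` from a test program built once per flag combination would show whether the "32-bit" client is really o32 or n32.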

MrMr
07-10-2003, 01:17 PM
Originally posted by showdown
I also got this tip from Coredog64:
OTOH, is the 64-bit client useful for such a tiny program? Perhaps just o32 and n32 clients are enough. I don't think the Charmm code runs any faster on 64 bits, and I expect that to be the most time-consuming part. (Assuming of course the memory leakage doesn't cross the 4Gig ;) )

Brian the Fist
07-10-2003, 10:19 PM
If it is on multiple machines, it is clearly a compiler thing. This could possibly be related to the Tru64 FPE problem as well. I will test it out on our ancient SGI and if I can reproduce it I may be able to figure it out in the debugger. We do not officially support any machines slower than 233 MHz though :D