More Bugs. I woke up to find one of my nodes not crunching, and the following in the error.log:
FATAL ERROR: [023.024] {trajtools.c, line 2492} RLEUnPack failed, size=2, should=400*0 - likely this is caused by overclocked or faulty RAM chips, please test your RAM
Note that none of my nodes are overclocked, and all have passed memtest86.
The switches were: -rt -if -qt -p0 -g0
This node was using the +=185 realtime clock acceleration (for a 2 second timeout).
(During beta3 I saw a similar error on a different node, which was also using the +=185 acceleration. As I recall, that earlier error occurred during the minimize.)
Sometime after the bug hit, my automatic upload script woke up. It removes the f*.lock file and waits for the client on the node to exit. In the morning it was still waiting. I suppose the node had put an error message on the screen and was waiting for keyboard input, but it had no monitor or keyboard so I couldn't tell. The easiest thing to do was to just reset the node.
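For what it's worth, the indefinite wait could be avoided by giving the upload script a timeout. Below is a minimal Python sketch of that idea, not my actual script: the function names, the `is_running` callback (whatever check tells you the client process is still alive), and the timeout values are all assumptions made for illustration. Only the f*.lock pattern comes from the description above.

```python
import time
from pathlib import Path


def remove_lock_files(workdir, pattern="f*.lock"):
    """Delete lock files matching the pattern; return the removed names, sorted."""
    removed = []
    for path in Path(workdir).glob(pattern):
        path.unlink()
        removed.append(path.name)
    return sorted(removed)


def wait_for_exit(is_running, timeout_s, poll_s=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll is_running() until it returns False or timeout_s elapses.

    is_running is a hypothetical caller-supplied check (for example, one that
    looks for the client's process). Returns True if the client exited in
    time, False if the wait timed out.
    """
    deadline = clock() + timeout_s
    while is_running():
        if clock() >= deadline:
            return False  # give up instead of waiting all night
        sleep(poll_s)
    return True
```

If `wait_for_exit` returns False, the script could log the timeout or send a notification, so that a hung client is noticed before morning.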
Then I got hit by a second bug. The error.log contained:
ERROR: [001.001] {trajtools.c, line 3465} Unable to open trajectory distribution file <handle>_protein_239.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 4237} Unable to read trajectory distribution, please create a new one
Indeed that file wasn't there. (How do you create a new one?) There was a file <handle>_protein_240.trj though.
Apparently the reset left the work files in an inconsistent state, and the client was unable to recover. I consider that a bug.
I made a backup of the directory, then I renamed <handle>_protein_240.trj to <handle>_protein_239.trj and the client was happy.
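The manual workaround (back up first, then rename the next-numbered file into place) can be written down as a short Python sketch. This only automates what I did by hand; I don't know whether the rename is safe in general. The function name and its parameters are hypothetical, and the only thing taken from the client is the <handle>_protein_<N>.trj naming seen in the error.log.

```python
import shutil
from pathlib import Path


def recover_missing_trj(workdir, handle, missing_index, backup_dir):
    """Rename <handle>_protein_<N+1>.trj to <handle>_protein_<N>.trj after a backup.

    Returns True if the rename was done, False if there was nothing to do
    (the wanted file already exists, or the next-numbered file is absent).
    backup_dir must not exist yet and must be outside workdir.
    """
    work = Path(workdir)
    wanted = work / f"{handle}_protein_{missing_index}.trj"
    newer = work / f"{handle}_protein_{missing_index + 1}.trj"
    if wanted.exists() or not newer.exists():
        return False
    # Copy the whole work directory first so the rename can be undone.
    shutil.copytree(work, backup_dir)
    newer.rename(wanted)
    return True
```

In my case the call would have been something like `recover_missing_trj(work_dir, handle, 239, backup_dir)`, with the three names filled in for the node in question.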
BTW, the structure being crunched was the 5.18 one on my "AMD beta test account".