Filelist.txt Corrupted



PinHead
07-25-2003, 11:57 AM
Error Log:

========================[ Jul 24, 2003 12:31 AM ]========================
FATAL ERROR: [003.001] {foldtrajlite2.c, line 1247} filelist.txt has been corrupted, cannot continue - please delete it and try again


OK, I vaguely remember this from beta, but I thought it was a thing of the past.

Was this caused by a setting or is it just a random unlucky crash?

It's on a dual processor machine and the other client is clicking along just fine, so I know the machine itself hasn't crashed.

Darkness Productions
07-25-2003, 01:37 PM
Is there any way to have the client take care of things like this automagically, so that the user wouldn't have to babysit it? Just log the fact that filelist.txt was corrupt, remove it, and start over immediately?

Brian the Fist
07-25-2003, 06:26 PM
I do not believe anyone has reported this error before; it refers to an internal inconsistency in the program. If you could post the filelist.txt (xxxx out your handle first if you like), it would be very helpful and I could tell you more. It is not likely a random event but rather some unusual bad combination of events.

DP:

Anything that is marked as a fatal error is not recoverable and thus, in theory, should never happen. We need to know about these so they can get fixed. If the client just deleted things and carried on, we might never know there was a problem, nor would the user. We have yet to see any of the fatal errors occur on our systems, but they are likely caused by two things happening at once and interfering with each other in some way, which is why they are rare and hard to reproduce and fix.

bob.sherunkel
07-25-2003, 08:21 PM
Machine 1: W2k

========================[ Jul 25, 2003 12:31 PM ]========================
FATAL ERROR: [000.000] {foldtrajlite2.c, line 3458} Unable to find file .\xzty2hjd_0_protein_0006149.val; cannot continue - replace file and start again, or manually delete filelist.txt

Machine 2: W2k

========================[ Jul 24, 2003 4:14 PM ]========================
FATAL ERROR: [000.000] {foldtrajlite2.c, line 3458} Unable to find file xzty2hjd_0_protein_0002955.val; cannot continue - replace file and start again, or manually delete filelist.txt

========================[ Jul 25, 2003 8:42 AM ]========================
FATAL ERROR: [003.001] {foldtrajlite2.c, line 1247} filelist.txt has been corrupted, cannot continue - please delete it and try again

Machine 3: W2k

========================[ Jul 23, 2003 6:30 PM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file .\xzty2hjd_protein_7.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution .\xzty2hjd_protein_7, please create a new one

========================[ Jul 24, 2003 12:57 PM ]========================
FATAL ERROR: [000.000] {foldtrajlite2.c, line 3458} Unable to find file .\xzty2hjd_0_protein_0000272.val; cannot continue - replace file and start again, or manually delete filelist.txt

Machine 4: W2k

========================[ Jul 25, 2003 9:43 AM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file xzty2hjd_protein_3.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution xzty2hjd_protein_3, please create a new one

========================[ Jul 25, 2003 9:44 AM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file xzty2hjd_protein_3.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution xzty2hjd_protein_3, please create a new one

Machine 5: W2K

========================[ Jul 23, 2003 6:36 PM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file .\xzty2hjd_protein_3.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution .\xzty2hjd_protein_3, please create a new one

========================[ Jul 24, 2003 9:07 AM ]========================
FATAL ERROR: [000.000] {foldtrajlite2.c, line 3458} Unable to find file .\xzty2hjd_1_xzty2hjd_protein_11_0000022.val; cannot continue - replace file and start again, or manually delete filelist.txt

========================[ Jul 24, 2003 12:56 PM ]========================
FATAL ERROR: [000.000] {foldtrajlite2.c, line 3458} Unable to find file .\xzty2hjd_0_xzty2hjd_protein_1_0000018.val; cannot continue - replace file and start again, or manually delete filelist.txt

========================[ Jul 25, 2003 9:04 AM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file .\xzty2hjd_protein_2.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution .\xzty2hjd_protein_2, please create a new one

Machine 6: W2k

========================[ Jul 25, 2003 9:19 AM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file xzty2hjd_protein_10.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution xzty2hjd_protein_10, please create a new one

Machine 7: NT4

========================[ Jul 25, 2003 9:19 AM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file xzty2hjd_protein_7.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution xzty2hjd_protein_7, please create a new one

========================[ Jul 25, 2003 9:20 AM ]========================

========================[ Jul 25, 2003 7:36 PM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file xzty2hjd_protein_2.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution xzty2hjd_protein_2, please create a new one

========================[ Jul 25, 2003 7:37 PM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file xzty2hjd_protein_2.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution xzty2hjd_protein_2, please create a new one

Machine 8: NT4

========================[ Jul 24, 2003 12:38 PM ]========================
ERROR: [001.001] {trajtools.c, line 3507} Unable to open trajectory distribution file .\xzty2hjd_protein_1.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 5262} Unable to read trajectory distribution .\xzty2hjd_protein_1, please create a new one

Bob "Rare Fatal Error" Sherunkel

PinHead
07-25-2003, 08:43 PM
Here is the filelist.txt :

CurrentStruc 0 31 127 34 1 22 9.718 -2236.183 -855.503 -932.116 78047016.000 1.450 2.700 1337.565 ---------------------HHHHHHH-HHHH-----------------------HHHHH----------HHHH----------HHHHH------
6819d8128c22c439d8afec470f76e424

Comparing it to the filelist.txt of another client, to figure out where my handle might have been in the file, I see it has lost track of the best-structure .val and .log.bz2 files.

/shrug

Anyway, I kept a copy of the directory immediately after hitting Enter to clear the error message from the screen. Also, that was the only entry in the error log.

PinHead
07-25-2003, 09:00 PM
bob.sherunkel

You may want to check the size of your foldtrajlite.exe file. There were many known problems with the first few attempts at the update.

Judging by the number of errors and the dates and times, I would guess that you are running the original update that was built using .net.

foldtrajlite.exe should be 3,026,944 bytes in size or 2.88 MB or 2.9 MB depending on how your OS displays it.
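For anyone checking many machines, a small script can do the comparison. This is a sketch, not part of the client: the expected size is the 3,026,944-byte figure above, and the helper names are made up for illustration. The arithmetic also explains the two displayed values: 3,026,944 / 1,048,576 = 2.88671875 MB (binary megabytes), which shows as 2.88 MB when truncated and 2.9 MB when rounded to one decimal.

```python
import os

# Size of the correct foldtrajlite.exe build, as quoted in this thread.
EXPECTED_SIZE = 3_026_944

def size_in_mb(num_bytes: int) -> float:
    """Convert bytes to binary megabytes (1 MB = 1,048,576 bytes here)."""
    return num_bytes / (1024 * 1024)

def has_expected_size(path: str, expected: int = EXPECTED_SIZE) -> bool:
    """True if the file at `path` is exactly `expected` bytes long."""
    return os.path.getsize(path) == expected

if __name__ == "__main__":
    exe = "foldtrajlite.exe"  # assumes the script runs in the client directory
    if os.path.exists(exe):
        actual = os.path.getsize(exe)
        status = "OK" if has_expected_size(exe) else "UNEXPECTED SIZE"
        print(f"{exe}: {actual:,} bytes ({size_in_mb(actual):.2f} MB) {status}")
    else:
        print(f"{exe} not found in {os.getcwd()}")
```

Comparing exact byte counts avoids the truncation/rounding ambiguity between different OS displays.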

bob.sherunkel
07-25-2003, 09:13 PM
Originally posted by PinHead
bob.sherunkel

You may want to check the size of your foldtrajlite.exe file. There were many known problems with the first few attempts at the update.

Judging by the number of errors and the dates and times, I would guess that you are running the original update that was built using .net.

foldtrajlite.exe should be 3,026,944 bytes in size or 2.88 MB or 2.9 MB depending on how your OS displays it.

At approx. 1400 EST on the day the update came out, I started phasing in the 1:09 executable update. I continued having problems with the 2.88 MB version, so after any subsequent failure I removed the old installation and used the 1:43 complete client install. I still get the [001.001] errors ...

Bob.NOT Sherunkel.NET

PinHead
07-25-2003, 09:51 PM
So you are using the correct build and still getting errors.

/sigh

PinHead
07-25-2003, 10:21 PM
Dammit bob.sherunkel

Now you gave it to me! Just mentioning it in my thread has caused it to spread.

NT4 Server:

First the Dr. Watson message.
Clean and clear .lock, restart and then:

FATAL ERROR: [002.003] Unable to read trajectory distribution xxxxxxxx_protein_21, please create a new one.

^7_of_9
07-26-2003, 12:37 AM
This is the new client on a dual AMD MP 1900+ system. Windows 2000 Server with 512MB of DDR. Using DFGUI.



Error.log:



========================[ Jul 23, 2003 1:01 AM ]========================

========================[ Jul 23, 2003 1:01 AM ]========================

========================[ Jul 23, 2003 10:21 PM ]========================

========================[ Jul 24, 2003 9:32 AM ]========================

========================[ Jul 24, 2003 11:51 PM ]========================

========================[ Jul 25, 2003 3:08 AM ]========================

========================[ Jul 25, 2003 5:33 AM ]========================

========================[ Jul 25, 2003 7:25 PM ]========================
FATAL ERROR: [000.000] {foldtrajlite2.c, line 3458} Unable to find file .\3w6mtm7n_0_3w6mtm7n_protein_182_0000002.val; cannot continue - replace file and start again, or manually delete filelist.txt

========================[ Jul 25, 2003 11:57 PM ]========================
FATAL ERROR: [000.000] {foldtrajlite2.c, line 3458} Unable to find file 3w6mtm7n_0_3w6mtm7n_protein_182_0000002.val; cannot continue - replace file and start again, or manually delete filelist.txt

========================[ Jul 26, 2003 12:20 AM ]========================
FATAL ERROR: [000.000] {foldtrajlite2.c, line 3458} Unable to find file 3w6mtm7n_0_3w6mtm7n_protein_182_0000002.val; cannot continue - replace file and start again, or manually delete filelist.txt




filelist.txt:

fold_0_3w6mtm7n_5680_protein.log.bz2
3w6mtm7n_0_protein_0005688.val
.\fold_0_3w6mtm7n_0_3w6mtm7n_protein_1.log.bz2
.\3w6mtm7n_0_3w6mtm7n_protein_1_0000003.val
.\fold_0_3w6mtm7n_40_3w6mtm7n_protein_2.log.bz2
.\3w6mtm7n_0_3w6mtm7n_protein_2_0000044.val
.\fold_0_3w6mtm7n_0_3w6mtm7n_protein_3.log.bz2
.\3w6mtm7n_0_3w6mtm7n_protein_3_0000003.val
.
.
.
.
.
.
.\fold_0_3w6mtm7n_20_3w6mtm7n_protein_179.log.bz2
.\3w6mtm7n_0_3w6mtm7n_protein_179_0000039.val
.\fold_0_3w6mtm7n_40_3w6mtm7n_protein_180.log.bz2
.\3w6mtm7n_0_3w6mtm7n_protein_180_0000047.val
.\fold_0_3w6mtm7n_40_3w6mtm7n_protein_181.log.bz2
.\3w6mtm7n_0_3w6mtm7n_protein_181_0000042.val
.\fold_0_3w6mtm7n_0_3w6mtm7n_protein_182.log.bz2
.\3w6mtm7n_0_3w6mtm7n_protein_182_0000002.val
CurrentStruc 0 51 127 182 0 2 5.600 -1689.249 283.823 -772.794 42277736.000 1.350 2.500 1011.385 -HHHH--------------HHHHH-------HHHHHHHHH----------------HHHHHHH------------------------HHHHH----
cdef0f2a4b0c3529be2dd1bbe0444e2c



I've got the whole directory saved and available on my Web server if Howard (or StarDragon) wants a copy. Just drop me a line at seven.of.nine@NOSPAM.cuic.ca (You know what ya gotta remove :D ) and I'll give ya the addy.
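The filelist.txt listings posted in this thread suggest a simple layout: result file names (.log.bz2 and .val, sometimes prefixed with .\), then a CurrentStruc line, then a 32-hex-digit line that looks like a checksum. The sketch below is a read-only consistency check built on that inferred layout; it is not the client's own validation, and `parse_filelist` / `check_filelist` are hypothetical names. It only looks for conditions reported here (a listed .val or .log.bz2 file that no longer exists, or no file entries at all) and does not try to verify the last line, since the algorithm behind it isn't known. It never writes to filelist.txt, because the client appears to treat a hand-edited filelist.txt as corrupted.

```python
import os
import re

# Assumed layout, inferred from listings posted in this thread:
#   zero or more result file names (.log.bz2 / .val), possibly prefixed with ".\"
#   one line starting with "CurrentStruc"
#   one line of 32 hex digits (presumably a checksum; not verified here)
HEX32 = re.compile(r"^[0-9a-fA-F]{32}$")

def parse_filelist(text: str):
    """Split filelist.txt text into (entries, currentstruc_line, last_line)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries, current, digest = [], None, None
    for i, ln in enumerate(lines):
        if ln.startswith("CurrentStruc"):
            current = ln
            digest = lines[i + 1] if i + 1 < len(lines) else None
            break
        entries.append(ln)
    return entries, current, digest

def check_filelist(client_dir: str):
    """Return a list of problems found; an empty list means none were detected."""
    path = os.path.join(client_dir, "filelist.txt")
    if not os.path.exists(path):
        return ["filelist.txt is missing"]
    with open(path, "r", encoding="latin-1") as f:  # read-only, never modified
        entries, current, digest = parse_filelist(f.read())
    problems = []
    if current is None:
        problems.append("no CurrentStruc line")
    if digest is None or not HEX32.match(digest):
        problems.append("last line does not look like a 32-hex-digit checksum")
    if not entries:
        problems.append("no result-file entries listed")
    for name in entries:
        # Strip a leading ".\" or "./" so the name resolves on any OS.
        bare = name[2:] if name[:2] in (".\\", "./") else name
        if not os.path.exists(os.path.join(client_dir, bare)):
            problems.append(f"listed file not found: {name}")
    return problems
```

Run against a stopped client's directory, a "listed file not found" result corresponds to the [000.000] "Unable to find file ... .val" error, and "no result-file entries listed" to the case PinHead posted earlier.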

Brian the Fist
07-26-2003, 03:27 PM
Attn: bob.


It is clear that you are doing something different which is causing those errors to appear. Those 3 (or 4?) errors have been in the client since its initial phase 2 release; however, try as we might, we cannot get a single one to occur on our systems (mostly W2K). Can you think of any 'unusual' programs you are running that might interfere with the client - virus scanners, firewalls, other memory-resident programs? And how do you start and stop the client - manually, or as a service? We cannot fix these errors until we can get them on our own machines, and I'm literally banging my head on the wall :bang: trying to figure out what you guys are doing differently. It may be helpful to us if you could tar up one example directory for each distinct error type and e-mail them (in separate messages in case they bounce due to size) to us at trades@mshri.on.ca, or alternatively put them on a web site or FTP site somewhere we can grab them from. This may at least help us to debug these remaining issues.

Thanks.

PinHead
07-27-2003, 10:16 AM
I have had filelist.txt become corrupt 2 more times now.
Once was on the same machine as listed above, again with filelist.txt missing the 2 file entries.

The other time it occurred on a new machine, but there filelist.txt does not exist at all. Only filelist.txt.tmp exists.

I have overwritten both directories with clean clients to see if this will fix the problem. During the overwrite, only 2 files appeared to be different. Both were database.* files, and it struck me as odd that the new files were bigger than the files that were in use.

Brian the Fist
07-28-2003, 12:08 AM
When you get this or a similar error, please do NOT erase the directory but let us have a look at it. The more I see, the faster I can fix it. If possible, stick it on an FTP or web site I can access, or a Yahoo Briefcase for example. As a last resort, e-mail it to trades@mshri.on.ca (but preferably ask first so we don't get an influx of 10MB e-mails!!!)

Help us help you ;)

bob.sherunkel
07-28-2003, 08:45 AM
Originally posted by Brian the Fist
Attn: bob.


It is clear that you are doing something different which is causing those errors to appear. Those 3 (or 4?) errors have been in the client since its initial phase 2 release; however, try as we might, we cannot get a single one to occur on our systems (mostly W2K). Can you think of any 'unusual' programs you are running that might interfere with the client - virus scanners, firewalls, other memory-resident programs? And how do you start and stop the client - manually, or as a service? We cannot fix these errors until we can get them on our own machines, and I'm literally banging my head on the wall :bang: trying to figure out what you guys are doing differently. It may be helpful to us if you could tar up one example directory for each distinct error type and e-mail them (in separate messages in case they bounce due to size) to us at trades@mshri.on.ca, or alternatively put them on a web site or FTP site somewhere we can grab them from. This may at least help us to debug these remaining issues.

Thanks.

I can give you 12 directories of failed clients, running NT4 or W2k, as they all stopped sometime this weekend. All machines are on a domain, are monitored by UniversalDP, and have most programs and services shut down. The only thing unique to all these failing workstations is that they are running Sophos AntiVirus version 3.71.

Let me know if you want any of those directories, though I don't know where to put them ...

Bob "Stop in the name of ..." Sherunkel

PinHead
07-28-2003, 01:05 PM
Howard,

Here are 3 zips of client directories where filelist.txt has become corrupt: files (http://www.pcliquid.com/howard)

I tried the following yesterday:

memtest86 came up clean.

Flip-flopped folders and processors. This is a dual-processor machine on Win2k, so I tried folder 2 on processor 1 and folder 1 on processor 2. Folder 2 still had filelist.txt corrupt.

On one occasion, filelist.txt did not corrupt, but the client lost track of one of the generations so the work just queued up.

PinHead
07-28-2003, 05:29 PM
What would happen on a slower machine if a monitor program opened filelist.txt for read only and then while it was open the client renamed filelist.txt to filelist.txt.tmp?
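On Windows, the likely answer is that the rename would fail rather than wait: a file opened without delete sharing (the usual default, including for Python's built-in open()) cannot be renamed or deleted by another process until that handle is closed. How the client reacts to such a failure is not known from this thread. A monitor can at least shrink the window by opening the file, reading it in one go, closing it immediately, and then working on the in-memory copy. The sketch below does that; `snapshot` and its retry parameters are hypothetical, and this reduces the chance of a collision without eliminating it.

```python
import time
from typing import Optional

def snapshot(path: str, retries: int = 5, delay: float = 0.2) -> Optional[bytes]:
    """Return the full contents of `path`, holding it open as briefly as possible.

    If the file is missing or locked (for example, mid-rename by the writer),
    wait `delay` seconds and try again, up to `retries` attempts in total.
    Returns None if every attempt fails.
    """
    for attempt in range(retries):
        try:
            with open(path, "rb") as f:
                return f.read()  # one read; the handle closes on leaving the block
        except (FileNotFoundError, PermissionError):
            if attempt < retries - 1:
                time.sleep(delay)
    return None
```

A monitor built this way parses the returned bytes at leisure instead of keeping filelist.txt open while it works.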

PinHead
07-28-2003, 08:35 PM
OK this is officially driving me nuts!

Clearing the temp directory gets the client past gen 0 and past the "filelist.txt is corrupt" error.

But it still loses track of filelist.txt, keeps filelist.txt.tmp, and then comes up with a missing gen error, probably for the gen that was listed in filelist.txt.tmp.

Right now it is queueing up gens that it will never upload.

PinHead
07-29-2003, 10:02 AM
Both demon boxes seem to now be operational!! :cheers:

Demon 1:
Filelist.txt would corrupt every time in gen 0

solution - clear temp dir and remove copies of corrupt directories that begin with "distribfold". As in distribfold_cor1.


Demon 2:
Client would lose track of filelist.txt and get 1 of the following 3 errors:
filelist.txt is corrupt
missing previous generation
unable to read the trajectory file

solution - my monitoring program was reading filelist.txt at the same time the client was writing. Turned off the monitor and it has now made it thru gen 3.
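On the writer's side, the symptom described in this thread (filelist.txt.tmp present, filelist.txt gone) is consistent with an update sequence in which the old filelist.txt is moved or deleted before the new one is in place, and a later step then fails. That is a guess about the client's internals, not something confirmed here. A more robust pattern is to write the temporary file and swap it in with a single replace call, retrying if the target is briefly locked, so the old copy is never removed as a separate step. A sketch with a hypothetical `write_atomically` helper:

```python
import os
import time

def write_atomically(path: str, data: bytes, retries: int = 10, delay: float = 0.1) -> None:
    """Write `data` to `path` by way of `path + ".tmp"` and os.replace().

    The old file is never deleted as a separate step: os.replace() overwrites
    it in one call. On Windows that call can raise PermissionError while
    another process has `path` open, so it is retried a few times.
    """
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush the bytes to disk before the swap
    for attempt in range(retries):
        try:
            os.replace(tmp, path)
            return
        except PermissionError:
            if attempt == retries - 1:
                raise  # give up; the old `path` and the new `.tmp` both still exist
            time.sleep(delay)
```

The guarantee is strongest on POSIX systems, where rename is atomic; on Windows it still avoids leaving a gap in which filelist.txt is deliberately absent.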

Brian the Fist
07-30-2003, 09:07 PM
What monitor program were you using? This may account for some of the problems people have been having, and we may be able to remedy it somewhat (though I don't think it can ever be 100% eliminated).

Chinasaur
07-30-2003, 10:42 PM
Howard,

Sorry, didn't know about uploading the dir.

Running Knoppix on dual XP1800. I heard a CD spin up which was unusual.

Checked the machine and one instance of DF had just died. No error msg in error.log. Nothing. Tried to restart and got "missing file" in filelist.txt... If it happens again I'll send the dir. Of course, I ignorantly tried to modify the filelist.txt file and got the "corrupted" msg... didn't know you had protected the filelist.txt file since Phase 1, as I haven't had this problem before. Live and learn.

Running nothing but Knoppix with Read/Write on a Win98 disk. 512MB RAM.

I can still upload it even tho I restarted it if that would be helpful?

PinHead
07-30-2003, 10:59 PM
The monitor program is one that I am still in the middle of writing, and the only other person to have a copy received it sometime around 3pm today.

Taking the monitor program off of that particular machine didn't stop the error; it only made it occur less often.

I have watched the error, forgive the pun, unfold several times over the last few days and here is what little I have gained from watching.

1.) If you can ever clear gen 0 without the filelist.txt file disappearing, the error almost never happens after that.

2.) Sometimes, when a better fold is found, filelist.txt.tmp is created but filelist.txt is not recreated.

3.) The next time a better fold is found, 1 of 2 things happens. Either filelist.txt is recreated with only one set of files, in which case you eventually get the missing gen error, or filelist.txt is recreated at the second better fold after it went missing, and you get a filelist.txt with no file entries in it.

4.) Once the error occurs, the tmp files must be cleared, or else you get either the filelist.txt corrupt error message or the unable to read the trajectory error.

5.) It almost never happens the same way twice, but it always begins with filelist.txt.tmp being created and filelist.txt nowhere in sight.
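The state in items 2 and 5 (filelist.txt.tmp present, filelist.txt absent) can be detected without reading either file's contents, which is roughly what a monitor alarm needs. Because a .tmp file may legitimately exist for a moment during a normal update, a monitor would probably want to alarm only if "tmp_only" persists across several polls. The function below is a hypothetical sketch; it only reports the state and deliberately does not repair anything, in line with the developers' request that fatal errors be surfaced and the directory preserved rather than silently patched.

```python
import os

def filelist_state(client_dir: str) -> str:
    """Classify the filelist.txt / filelist.txt.tmp pair in `client_dir`.

    Returns:
      "ok"       - filelist.txt present, no .tmp
      "both"     - both present (possibly mid-update)
      "tmp_only" - only filelist.txt.tmp present (the state reported in this thread)
      "missing"  - neither present
    Only existence is checked; neither file's contents are read.
    """
    has_main = os.path.exists(os.path.join(client_dir, "filelist.txt"))
    has_tmp = os.path.exists(os.path.join(client_dir, "filelist.txt.tmp"))
    if has_main:
        return "both" if has_tmp else "ok"
    return "tmp_only" if has_tmp else "missing"
```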


Today, just because! And because my monitor alarms when there is any problem reading filelist.txt, I caught the alarm and found filelist.txt.tmp and no filelist.txt. I figured what the heck, it hasn't made it past gen 0 in 3 days, so I copied filelist.txt.tmp to filelist.txt while the client was running. She kept on chugging and is now up to gen 35 with no errors and no queued data.

None of this means anything, it's just what I saw!!

MgKnight
08-04-2003, 12:10 PM
Said problem has occurred on my XP220. The machine is not net connected and had been running without problem as of Friday afternoon. At 12:25 yesterday afternoon the program stopped. This morning I got the "tampered with filelist.txt" message when attempting to restart/upload. The directory is intact, with 170 completed generations, and my only recourse will be to email it tonight via my cable connection; dialup here just will not cut it.

I attempted to look at the filelist.txt file, but when it opens in Notepad it is blank, even though it is 17KB in size in the directory.

Awaiting instructions and praying I do not lose all this work... :confused:

IronBits
08-04-2003, 12:59 PM
If it's Windows based, wait no longer :)
http://www.free-dc.org/forum/showthread.php?threadid=3806
Get the new TEST client ;)

PinHead
08-04-2003, 01:44 PM
Got it about 2 hours ago, looks pretty good so far!