
Problems on diskless computers with /tmp



Syonyk
06-19-2003, 06:38 PM
Wow. That's an INSANE number of animated smilies.

*5 minutes of watching the smilies later*

I'm trying to run Folding on a bunch of diskless cluster servers. I've got permission to do so (since they're basically my boxes), etc.

However, the client won't run. It bails out with an error:

ERROR: [010.000] {trajtools.c, line 2022} Could not open/create data file /tmp/fileTy0OvR
FATAL ERROR: [010.000] {trajtools.c, line 2024} Unable to continue

Normally, /tmp is symlinked to /home/tmp, which is writable. However, the root filesystem is read-only. I tried mounting a read/write NFS share on /tmp, and while I could create a file there manually, the Folding client still had problems.

Ideas/suggestions? I've got 30000+ of CPU power just idle right now :-)

-=Russ=-

bwkaz
06-19-2003, 07:19 PM
I'm pretty sure that /tmp (wherever it ends up) needs to be on the same filesystem as the DF client (apparently the client does a link()/unlink() when it moves the files in /tmp, instead of shelling out to the mv binary or something). If DF is installed in your home directory, then that should be good enough, though I'm not quite sure what the software is doing if/when it checks for that.

If it just does a naive check for /tmp in /etc/fstab, and also a check for whatever directories it's in, then that might cause issues, because /tmp won't be in there, but /home will be (...right? is that how it gets to be read-write?).
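For anyone wondering what that move pattern looks like, here is a minimal Python sketch, assuming bwkaz's guess is right and the client really does link() the file at its new name and then unlink() the old one (nobody in the thread confirms this). move_by_link() and same_filesystem() are hypothetical helper names, not anything from the DF client. A hard link can't cross filesystems, so if the two paths are on different devices the link call fails with EXDEV.

```python
import errno
import os
import tempfile


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both existing paths live on the same device (filesystem)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_by_link(src: str, dst: str) -> None:
    """Move src to dst by hard-linking the new name, then removing the old one.

    Hypothetical helper mimicking the suspected link()/unlink() pattern.
    Raises OSError with errno EXDEV if dst is on a different filesystem.
    """
    os.link(src, dst)
    os.unlink(src)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "data.tmp")
        dst = os.path.join(workdir, "data.moved")
        with open(src, "w") as f:
            f.write("work unit\n")
        try:
            move_by_link(src, dst)
            print("moved:", os.path.exists(dst))
        except OSError as e:
            if e.errno == errno.EXDEV:
                print("different filesystems: link()/unlink() can't move this file")
            else:
                raise
```

Pointing same_filesystem() at the temp directory and the DF directory on a node is a quick way to test the "same filesystem" theory before digging further.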

Maybe if you set TMPDIR (is that the variable's name? can't remember anymore, argh) to /home/tmp instead of using the default /tmp, might that help?
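The usual Unix convention (an assumption here; nobody in the thread checked what the DF client actually does) is that a program looks at $TMPDIR first and only falls back to /tmp when it's unset or unusable. A minimal Python sketch of that lookup; pick_tmp_dir() and make_data_file() are hypothetical names, and the "file" prefix only mirrors the fileTy0OvR name in the error message.

```python
import os
import tempfile


def pick_tmp_dir(default: str = "/tmp") -> str:
    """Directory a TMPDIR-aware program would use for scratch files.

    Assumed convention: use $TMPDIR if it names an existing directory we can
    write to and search; otherwise fall back to the default.
    """
    candidate = os.environ.get("TMPDIR")
    if candidate and os.path.isdir(candidate) and os.access(candidate, os.W_OK | os.X_OK):
        return candidate
    return default


def make_data_file(prefix: str = "file") -> str:
    """Create a uniquely named, empty scratch file and return its path."""
    fd, path = tempfile.mkstemp(prefix=prefix, dir=pick_tmp_dir())
    os.close(fd)
    return path


if __name__ == "__main__":
    print("scratch dir:", pick_tmp_dir())
    path = make_data_file()
    print("created:", path)
    os.unlink(path)
```

With TMPDIR=/home/tmp exported in the client's environment, make_data_file() would create a file under /home/tmp whose name starts with "file", instead of one under /tmp.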

Welnic
06-19-2003, 07:35 PM
The following is an excerpt from my /etc/exports file:

# The lines between the 'LTS-begin' and the 'LTS-end' were added
# on: Mon Jan 27 03:50:53 PST 2003 by the ltsp installation script.
# For more information, visit the ltsp homepage
# at http://www.ltsp.org
#

#/opt/ltsp/i386 192.168.0.100(ro,no_root_squash)
/var/opt/ltsp/swapfiles 192.168.0.100(rw,no_root_squash)
/opt/ltsp/i386 192.168.0.100(rw)
#/var/opt/ltsp/swapfiles 192.168.0.100(rw)

# the following were entered by a clueless person on Feb 1
/farm/acre000 192.168.0.100(rw,no_root_squash,sync)
/farm/tmp000 192.168.0.100(rw,no_root_squash,sync)

I added the last two lines: one is the working directory and the other is my tmp. My machine has a /farm directory containing acre000 through acre003, and each of those has a distribfold and a tmp directory. I don't think this is the whole story, though, since there is no tmp000 directory.

I used this article to set my rig up: http://www.extremeoverclocking.com/articles/howto/FAH_Diskless_Farm_1.html

I know there is one more file I had to change from his setup to get it to use the right tmp, but I can't remember which one at the moment. I'll think about it. It is definitely using the tmp directory in acre000.

Syonyk
06-19-2003, 07:44 PM
No love.

I set it to use /home/rgraves/tmp as its temp directory... and it's now running out of /home/rgraves/71.

Same error message, just with /home/rgraves/tmp instead of /tmp.

That's certainly the same filesystem.

-=Russ=-

Welnic
06-19-2003, 07:51 PM
Originally posted by bwkaz

...
Maybe if you set TMPDIR (is that the variable's name? can't remember anymore, argh) to /home/tmp instead of using the default /tmp, might that help?

When I ran the diskless node with a keyboard attached, I was able to set the TMPDIR variable and run that way. But I never got this to work from a boot script.

Welnic
06-19-2003, 08:06 PM
I think I have it figured out. Right below the section in /opt/ltsp/i386/etc/rc.local where the Ramdisk is created, I found these lines:

echo "Creating ramdisk, not"
/bin/mount -n -t nfs -o nolock 192.168.0.1:/farm/${HOSTNAME}/tmp /tmp

I have the Ramdisk lines all commented out. The first time this runs, everything is happy; after that, there are complaints that the directories and files it tries to create already exist. This is from watching the node boot up on a monitor. It doesn't seem to hurt anything, though, and it does work.
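To confirm what actually ended up on /tmp after a boot script like that runs, one option is to read /proc/mounts on the node (assuming a Linux node with /proc mounted; NFS mounts made with nolock normally list it among their options there). A minimal Python sketch; mount_options() is a hypothetical helper, and the sample lines below are made up to resemble this setup (the node name "node1" stands in for ${HOSTNAME}). When several mounts sit on the same mount point, the last one listed is the one in effect, so the loop keeps the last match.

```python
import os
from typing import Optional


def mount_options(mountpoint: str, mounts_text: str) -> Optional[set[str]]:
    """Return the options of the mount currently on mountpoint, or None.

    mounts_text uses the /proc/mounts layout: device, mount point, fs type,
    comma-separated options, dump, pass. A later line for the same mount
    point shadows earlier ones, so the last match wins.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = set(fields[3].split(","))
    return found


if __name__ == "__main__":
    sample = (
        "tmpfs /tmp tmpfs rw 0 0\n"
        "192.168.0.1:/farm/node1/tmp /tmp nfs rw,vers=3,nolock,addr=192.168.0.1 0 0\n"
    )
    print("nolock" in mount_options("/tmp", sample))  # True
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(mount_options("/tmp", f.read()))
```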

Syonyk
06-19-2003, 08:31 PM
Even running it out of /tmp/71 doesn't work with TMPDIR set to /tmp...

Heck, it doesn't even work with TMPDIR set to "." - it can't write to its own directory. Yet it can write to its own directory. How exactly is it handling file I/O???

-=Russ=-

AMD_is_logical
06-19-2003, 10:44 PM
Originally posted by bwkaz
I'm pretty sure that /tmp (wherever it ends up) needs to be on the same filesystem as the DF client (apparently the client does a link()/unlink() when it moves the files in /tmp, instead of shelling out to the mv binary or something).

I'm running the client on diskless nodes with /tmp on a local ramdisk and the DF directory on NFS. It runs fine.

I don't know why you're having problems writing in your /tmp directory. Perhaps there is a permission problem with your /tmp? Or maybe the file already exists in /tmp and was created by a different UID? :confused:
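Both of those guesses can be checked from the node itself. A minimal Python sketch, assuming it runs as the same user the client runs as; diagnose_tmp() and foreign_files() are hypothetical helpers. The first reports whether the directory can be searched and written (os.access also reports failure for write access on a read-only filesystem), the second lists entries owned by some other UID, which on a sticky-bit directory like /tmp you can't delete or rename unless you own the directory or are root.

```python
import os
import stat
import sys


def diagnose_tmp(path: str) -> list[str]:
    """Return a list of reasons the current user can't create files in path."""
    try:
        st = os.stat(path)
    except OSError as e:
        return [f"cannot stat {path}: {e.strerror}"]
    problems = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path} is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by uid {os.getuid()}")
    return problems


def foreign_files(path: str) -> list[str]:
    """Names of entries in path owned by a UID other than ours."""
    me = os.getuid()
    names = []
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                if entry.stat(follow_symlinks=False).st_uid != me:
                    names.append(entry.name)
            except FileNotFoundError:
                continue  # entry vanished while we were scanning
    return sorted(names)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    print(diagnose_tmp(target) or "ok")
    print(foreign_files(target))
```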

Brian the Fist
06-20-2003, 12:38 PM
If your /tmp partition is an NFS one, you get this exact same error if it is NOT mounted with the nolock option. I'm not sure if this can directly apply to you, but for the NFS case, mounting the partition with 'nolock' fixed the problem.
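Nobody in the thread explains why nolock helps. One plausible reading, and it is only an assumption: the client takes a POSIX lock (fcntl/lockf) on its data file. On an NFS mount without nolock, Linux forwards that lock to the server over the NLM lock protocol, which typically needs rpc.statd (and the portmapper) running on the client, something a stripped-down diskless boot image may not start. With nolock, locks are only enforced locally on the node, so they succeed. A minimal Python sketch to test that on a given directory; can_lock_in() is a hypothetical helper, and on a mount where NLM isn't available the lock call would be expected to fail (typically with ENOLCK) or stall rather than succeed.

```python
import errno
import fcntl
import os
import sys
import tempfile


def can_lock_in(directory: str) -> tuple[bool, str]:
    """Create a scratch file in directory and try a non-blocking POSIX lock on it.

    Returns (True, "ok") on success, or (False, reason) if creating or
    locking the file fails. The scratch file is always removed.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="locktest", dir=directory)
    except OSError as e:
        return False, f"create failed: {e.strerror}"
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fcntl(F_SETLK) underneath
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True, "ok"
    except OSError as e:
        name = errno.errorcode.get(e.errno, str(e.errno))
        return False, f"lock failed: {name} ({e.strerror})"
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    print(target, can_lock_in(target))
```

Running it against /tmp once with and once without nolock on the NFS mount would show whether locking is the piece that changes.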

Syonyk
06-20-2003, 02:47 PM
Originally posted by Brian the Fist
If your /tmp partition is an NFS one, you get this exact same error if it is NOT mounted with the nolock option. I'm not sure if this can directly apply to you, but for the NFS case, mounting the partition with 'nolock' fixed the problem.

That fixed it. Thanks! Time to enjoy cluster-love.

Now... care to explain *why* that fixed it? :-)

-=Russ=-

Brian the Fist
06-20-2003, 05:23 PM
Nope, no clue. But if there's something to be learned here, it's: always check the "known bugs" page first. :D