Trouble trying to get diskless node to work. [Archive]

Welnic

02-07-2003, 11:21 AM

I am struggling to get a diskless node to work using Debian Linux. I am pretty clueless about the whole thing but I have gotten to the point where the node boots and I can access the server hard drive. I have a distribfold directory in the node's home but when I try to run I get this error:

========================[ Feb 6, 2003 3:30 PM ]========================
FATAL ERROR: CoreLib [002.005] {ncbifile.c, line 689} File write error

I am guessing that it is trying to write a file in /var/temp or something like that. As a test I was able to run a distributed.net client, but that seems to be self-contained in one folder. I need to know either which directory it needs to be able to write to, or how to change where it writes.

Scoofy12

02-07-2003, 01:08 PM

You could try setting an environment var DFPTEMP (as howard actually just mentioned in the other active thread in this forum) to somewhere writable and see if that helps.

AMD_is_logical

02-07-2003, 02:07 PM

The client puts files in the /tmp directory. It's about 10MB worth. The ramdisk root dir on my nodes is 16MB, and that includes the /tmp directory, and so far I haven't had a problem.

AMD_is_logical

02-07-2003, 03:49 PM

Another thing, if you're making a cluster there is a "gotcha" in the way the client seeds the random number generator. The client uses the pid, which will be identical for identical nodes started at the same time, and the time (with one second resolution), which can be the same if the client is started at the same time on both. If the seeds are the same, then the results will be identical, and the duplicate results will be rejected.
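
To see why this bites, here is a minimal sketch in C. How the client really combines the pid and the time is not stated in this thread, so naive_seed (a hypothetical name) simply assumes the two are XORed together; any function of only those two inputs has the same weakness.

```c
#include <stdint.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical stand-in for the client's seeding: only pid and time go in.
 * The real client's formula may differ; what matters is the inputs. */
uint32_t naive_seed(pid_t pid, time_t now)
{
    return (uint32_t)now ^ (uint32_t)pid;
}

/* Returns 1 if two nodes would end up with the same seed, 0 otherwise. */
int seeds_collide(pid_t pid_a, time_t time_a, pid_t pid_b, time_t time_b)
{
    return naive_seed(pid_a, time_a) == naive_seed(pid_b, time_b);
}
```

Two identical nodes booted from the same image that both start the client with the same pid in the same second collide: seeds_collide(137, t, 137, t) returns 1 for any t.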

PinHead

02-08-2003, 01:09 AM

Try

export TMPDIR=/home/username

before you start the client (where username is your own login name).

Had a similar situation with Linux and LTSP!

Also, if this is a headless client, make sure to set the -qt switch on the client.
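
For what it's worth, the usual pattern behind a fix like this looks roughly like the sketch below: a program reads TMPDIR and falls back to /tmp when it is unset. Whether the client does exactly this is an assumption, and pick_temp_dir and dir_is_writable are made-up names.

```c
#include <stdlib.h>
#include <unistd.h>

/* Return the directory named by TMPDIR, or "/tmp" if it is unset or empty. */
const char *pick_temp_dir(void)
{
    const char *dir = getenv("TMPDIR");
    return (dir != NULL && dir[0] != '\0') ? dir : "/tmp";
}

/* Return 1 if the current user may create files in dir, 0 otherwise. */
int dir_is_writable(const char *dir)
{
    return access(dir, W_OK | X_OK) == 0;
}
```

Calling dir_is_writable(pick_temp_dir()) before starting work would turn a cryptic "File write error" into something easier to diagnose on a node with a read-only root.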

Brian the Fist

02-08-2003, 11:08 AM

Originally posted by AMD_is_logical
Another thing, if you're making a cluster there is a "gotcha" in the way the client seeds the random number generator. The client uses the pid, which will be identical for identical nodes started at the same time, and the time (with one second resolution), which can be the same if the client is started at the same time on both. If the seeds are the same, then the results will be identical, and the dublicate results will be rejected.

Yes. I believe someone made the suggestion of using IP and/or MAC address to further randomize the seed for such cases. Does anyone have any idea how to access either of these in a relatively platform-independent way using standard C calls?

bwkaz

02-08-2003, 12:14 PM

I don't know of any way inside the C library to do it, no. But if your Windows implementation has BSD sockets capability, you could call gethostbyaddr("127.0.0.1", whatever, whatever) to get a hostent struct filled out with info on localhost. One of the fields in hostent is for aliases, and one of those should be the IP address. AFAIK almost all *nix-like OSes have BSD sockets... right?

I think there's a Winsock header you can include to get most of the BSD functions, as well. winsock2.h (linked against ws2_32.lib) or something similar?

If you wanted to go with the MAC address... I have no idea short of talking to the lowlevel NIC driver. Obviously that's not platform-independent.

I realize all of this is a long shot, though...

Welnic

02-08-2003, 12:42 PM

Thanks for the help, guys. export TMPDIR=/home/username works. I didn't get a chance to try to figure out how to make /tmp writeable, and changing the TMPDIR variable seems easier so I will probably just go with that. :D

I think that the tcp/ip numbers would be better than the MAC address, especially if they are easier to get. While the MAC address is guaranteed to be unique, if you buy a bunch of NICs at the same time (like I just did) they can have consecutive numbers. Then if the times that they booted were off by a second you could have a duplication. With the tcp/ip numbers, on the other hand, you can just give them out in increments of 10, or even increment the second set (or whatever you call the little groups separated by dots) from the right.

IronBits

02-08-2003, 12:46 PM

Originally posted by Brian the Fist
Yes. I believe someone made the suggestion of using IP and/or MAC address to further randomize the seed for such cases. Does anyone have any idea how to access either of these in a relatively platform-independent way using standard C calls?

Does this help? :D
http://tangentsoft.net/wskfaq/examples/getmac-snmp.html
AIX
/bin/netstat -v or /bin/entstat en0 -- It's listed next to Hardware Address.
use "netstat -i" which gives you a list of all configured interfaces
Assuming that you're trying to get the source MAC address from the machine on which your program will run, I found this bit of code at http://www.linuxrouter.org, which should do the trick:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h> /* contains the struct ifreq definition */
#include <sys/ioctl.h>
#include <sys/socket.h>

int main(void)
{
    unsigned char *ch;
    int i;

    int fd;
    struct ifreq ifr;

    /* any socket will do for the ioctl; a plain UDP socket needs no root */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
    {
        perror("socket");
        return 1;
    }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1); /* assuming we want eth0 */

    if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0) /* retrieve MAC address */
    {
        perror("ioctl");
        close(fd);
        return 1;
    }

    /*
     * at this point, the interface's HW address is in
     * 6 bytes in ifr.ifr_hwaddr.sa_data. You can copy them
     * out with memcpy() or something similar. You'll have to
     * translate the bytes into a readable hex notation.
     *
     * this is done below with a %X format of printf, and a for loop to
     * add the traditional colons in the appropriate places
     */

    ch = (unsigned char *) ifr.ifr_hwaddr.sa_data;

    for ( i = 0; i <= 5; i++ )
    {
        printf( "%02X", *ch );
        if ( i != 5 )
            printf( ":" );
        else
            printf( "\n" );
        ch++;
    }

    close(fd);
    return 0;
}

Running this gets you output like:
00:90:27:D0:89:FF
Which is a 3Com nic... Note that the same code written with a SOCK_PACKET socket (which needs root) and no error checks, when run as a regular user, yields:

04:08:54:FB:FF:BF
Which is not really anything...

http://techsupt.winbatch.com/TS/T000005061F5.html
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=391&lngWId=10
http://www.dotnet247.com/247reference/msgs/4/21219.aspx
http://www.codeproject.com/internet/getmac.asp
http://bdn.borland.com/article/0,1410,26040,00.html
http://www.codeguru.com/network/GetMAC.html (scroll down 1/2 way)
http://cplus.kompf.de/macaddr.html
http://www.ndis.com/faq/QA01030302.htm
http://www.coders.eu.org/manualy/win/wskfaq/examples/getmac-netbios.html
http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q118623
http://www-hlab.iis.u-tokyo.ac.jp/~maoxc/vb/8995.htm (scroll down past non-english content)
I'm not a programmer, but I hope some of the above helps you reach your goal. Using the MAC is the single most bestest way to ensure you have a unique ID ;) IMHO of course.

Darkness Productions

02-08-2003, 12:50 PM

I think using the MAC address would be best, simply because no two in the world are supposed to be alike. This should definitely keep problems with diskless clusters from happening...

AMD_is_logical

02-08-2003, 01:17 PM

If getting the MAC or IP in a platform independent way turns out to be too hard, you could just add a switch that passes an integer to the client. The seed would then be hashed with that integer. Then anyone with a cluster could come up with their own way of ensuring that the client on each node was given a different integer using that switch.

Paratima

02-08-2003, 03:52 PM

Or, since all God's chillun got to have a unique IP, assigned or DHCP'd, make the last operation before starting the client be to get the IP and sleep the yyy seconds in xxx.xxx.xxx.yyy.

AMD_is_logical

02-08-2003, 05:52 PM

Originally posted by Paratima
Or, since all God's chillun got to have a unique IP, assigned or DHCP'd, make the last operation before starting the client be to get the IP and sleep the yyy seconds in xxx.xxx.xxx.yyy.

There are three problems with the sleep idea.

First, cluster nodes often boot using their own local clocks, which drift over time. Thus, sleeping varying amounts is as likely to cause two clients to start at the same time (according to their local clocks) as it is to prevent it.

Second, some clusters don't use TCP/IP, so this doesn't solve the problem in general. (In fact, there may be some clusters that don't use standard network cards at all.)

Third, even with a small cluster I am averse to the idea of having nodes sleeping when they could be working, even for a short time. For a really large cluster, quite a bit of work would be lost.

A command line switch would be easy to do, and could always be made to work. Each node would be using its own DF directory anyway, so each directory could have a startup script that passes a unique integer to the client. Or each directory could have a subdirectory with a name like "123" (different for each directory), and a common script could pass 1* to the client, which the shell would expand to "123" (or whatever the name was in that directory). (Making 123 a subdirectory would allow it to survive "rm *", which is convenient when doing protein changeovers.)
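
A sketch of what parsing such a switch could look like. The switch name -node is invented for illustration and is not an existing client option; parse_node_id is likewise hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Look for "-node N" in argv and store N in *id.
 * Returns 0 on success, -1 if the switch is missing or N is not a valid
 * non-negative integer. "-node" is an invented switch name. */
int parse_node_id(int argc, char **argv, long *id)
{
    int i;

    for (i = 1; i + 1 < argc; i++) {
        char *end;
        long value;

        if (strcmp(argv[i], "-node") != 0)
            continue;
        errno = 0;
        value = strtol(argv[i + 1], &end, 10);
        if (errno != 0 || end == argv[i + 1] || *end != '\0' || value < 0)
            return -1;
        *id = value;
        return 0;
    }
    return -1;
}
```

With the subdirectory trick, a common startup script would pass -node 1* and the shell would expand that to -node 123 before the client ever sees it.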

bwkaz

02-08-2003, 05:55 PM

The other thing, Paratima, is that "all God's chillun" don't have a unique IP address. ;) I'm sure there are thousands of computers in the world that are 192.168.0.1. If the computer is NAT'ed or proxied or otherwise on a private subnet, that won't work.

MAC address, OTOH, would. As would passing a command-line arg, assuming the user is intelligent.

IronBits

02-08-2003, 07:25 PM

Originally posted by bwkaz
user is intelligent.

Oxymoron alert! :jester: :D /ducking

bwkaz

02-08-2003, 07:31 PM

:D

Which is why MAC address would be more foolproof.

Then again, on a sneakernet setup, there may be no NIC, so then that wouldn't work either.

:-/

IronBits

02-08-2003, 07:47 PM

Well, if that's the case, and, it was Intel, we could get the processor ID ... :rolleyes: :crazy:

Welnic

02-09-2003, 01:26 AM

It seems to me that having a file with an integer in it in the distribfold directory would work well. It could just be 0 for people with separate machines. People with several nodes that all start at the same time could put different values for every node.
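
Reading such a file takes only a few lines. In this sketch the file name is whatever the caller passes in (no name was proposed in the thread), and a missing or unreadable file counts as 0, in line with the suggestion above. read_node_offset is a hypothetical name.

```c
#include <stdio.h>

/* Read one integer from the file at path.
 * Returns 0 if the file is missing or does not start with an integer,
 * so single machines need no file at all. */
long read_node_offset(const char *path)
{
    FILE *fp = fopen(path, "r");
    long value = 0;

    if (fp == NULL)
        return 0;
    if (fscanf(fp, "%ld", &value) != 1)
        value = 0;
    fclose(fp);
    return value;
}
```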

And the people that think they get their best results just after doing a fresh install could try to figure out the lucky number that will crank out that low RMS result. Everybody would be happy. :D

Brian the Fist

02-09-2003, 11:13 AM

I did actually originally have a switch to let the user choose the random seed, way back. However, this can lead to other problems, such as people deliberately duplicating data. (Yes, you don't get credit for that but someone is bound to do it anyways, for one reason or another...). Anyhow, the MAC address seems like the best thing to go with, since IPs can easily be repeated, for example with 192.168.*.* (and if it has no NIC, it'll just default to the way it is done now). The only problem I see with IronBits' code there is that it assumes the interface is called 'eth0', which of course is a Linux naming convention.

So this begs the question, does anyone know a platform independent way to get the name of the primary NIC interface? :)
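
Not for every platform, but POSIX does define if_nameindex() in <net/if.h>, which lists interface names without hard-coding eth0. A sketch, assuming a *nix system: first_nic_name is a made-up helper, "primary" is taken to mean simply the first interface not named lo or lo0 (a heuristic, not a rule), and Windows would still need its own code.

```c
#include <net/if.h>
#include <string.h>

/* Copy the name of the first interface that is not named like a loopback
 * device into buf (of size len). Returns 0 on success, -1 if none is found. */
int first_nic_name(char *buf, size_t len)
{
    struct if_nameindex *list = if_nameindex();
    struct if_nameindex *p;
    int rc = -1;

    if (list == NULL)
        return -1;

    /* The array ends with an entry whose if_index is 0 and if_name is NULL. */
    for (p = list; len > 0 && p->if_index != 0 && p->if_name != NULL; p++) {
        if (strcmp(p->if_name, "lo") == 0 || strcmp(p->if_name, "lo0") == 0)
            continue;
        strncpy(buf, p->if_name, len - 1);
        buf[len - 1] = '\0';
        rc = 0;
        break;
    }
    if_freenameindex(list);
    return rc;
}
```

The name found this way could be used in place of a hard-coded "eth0" wherever an interface name is needed.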

AMD_is_logical

02-09-2003, 12:47 PM

Originally posted by Brian the Fist
I did actually originally have a switch to let the user choose the random seed, way back.

Just in case I wasn't clear, the integer I'm suggesting would be used in combination with the pid and time to create the seed. If the cluster is restarted, the clients would be given the same integers that they had before, but since the startup times are different the seeds would end up being different.

Further, the numbers would not be combined by simple addition, as that would mean a node with a one greater integer started one second earlier would end up with the same seed. The time, PID, and the integer should be hashed together in such a way that a different integer will produce a completely different seed.
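
One way to do that kind of mixing is sketched below using 32-bit FNV-1a, chosen only because it is short and well known; nothing in this thread says which hash the client would use. mix_seed and its parameters are hypothetical.

```c
#include <stdint.h>

/* Feed the four bytes of v, lowest byte first, into an FNV-1a hash state. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    int i;

    for (i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;            /* FNV prime */
    }
    return h;
}

/* Combine start time, pid and the per-node integer into one seed. */
uint32_t mix_seed(uint32_t now, uint32_t pid, uint32_t node_id)
{
    uint32_t h = 2166136261u;      /* FNV offset basis */

    h = fnv1a_u32(h, now);
    h = fnv1a_u32(h, pid);
    h = fnv1a_u32(h, node_id);
    return h;
}
```

With plain addition, time 1000 with integer 1 and time 999 with integer 2 give the same sum for the same pid. Here each input byte scrambles the whole state, so a shift in one input is very unlikely to be cancelled by a shift in another.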

It wouldn't matter to me if the integer was passed to the client with a switch or a file (as Welnic suggested). Either way would be easy for me to set up on my (currently 6-node) cluster.

Using the MAC would work for me, but I have no idea how to get the MAC in a platform-independent way.

Darkness Productions

02-09-2003, 01:03 PM

Originally posted by Brian the Fist
So this begs the question, does anyone know a platform independent way to get the name of the primary NIC interface? :)

No, but (and I know this isn't an optimal solution, just a temp workaround) you could make the user put the name of the device in a text file, then you could scan that. However, I think if you decide to do that, you should probably make a change so that all the config options are in one file. Keeping up with 3 or 4 (handle, proxy file, foldtraj_fileserver, plus this new one) gets to be a hassle. Maybe one main config file, set up kinda like:

handle=xxxxxxxxx
foldtraj_fileserver=http://blah.com//distribfold
proxyserver=blah

and so on. Just my $.02
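
A single key=value file like the one in this post is straightforward to parse. A sketch in C for one line at a time, assuming plain key=value with no quoting or comments (the thread does not define the format beyond the example); split_config_line is a made-up name.

```c
#include <string.h>

/* Split "key=value" in place at the first '='.
 * A trailing line ending is stripped. On success *key and *value point
 * into line and 0 is returned; lines without '=' or with an empty key
 * return -1. */
int split_config_line(char *line, char **key, char **value)
{
    char *eq;

    line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */
    eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return -1;
    *eq = '\0';
    *key = line;
    *value = eq + 1;
    return 0;
}
```

Splitting at the first '=' only means a value such as a URL can itself contain '=' without confusing the parser.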

bwkaz

02-10-2003, 02:29 PM

Heh, here's one way around it.

I was reading on dslreports.com the other day where it's possible to guess at how many machines are behind a NAT firewall, using the ID field in the IP header. The way around it is for the NAT box to rewrite ID fields pseudo-randomly.

The grsecurity set of Linux kernel patches does this, along with a lot of other things.

So I was perusing what else grsecurity has available, and found an interesting kernel option that the patchset adds:

CONFIG_GRKERNSEC_RANDPID:

If you say Y here, all PIDs created on the system will be pseudo-randomly generated. This is extremely effective along with the /proc restrictions to disallow an attacker from guessing pids of daemons, etc. PIDs are also used in some cases as part of a naming system for temporary files, so this option would keep those filenames from being predicted as well. We also use code to make sure that PID numbers aren't reused too soon. If the sysctl option is enabled, a sysctl option with name "rand_pids" is created.

Sounds like this might be useful if you can rebuild a kernel on each of your cluster's nodes. It depends on how random the generator is, though.

PinHead

02-24-2003, 10:51 PM

Originally posted by Brian the Fist
So this begs the question, does anyone know a platform independent way to get the name of the primary NIC interface? :)

It's not 100% OS independent, but it looks like most programmers either poke around in the ARP cache (arp -a for Windows or /sbin/arp -a for Linux). This seems to be a more universal solution since almost all TCP/IP connections will have this.

Or they use ioctl.h and sockets to determine the MAC address. This method seems much less OS independent.

Anyway, those are the results of my brief research. Not thorough by any means.