The database seems to have been down (not accepting uploads) since about 24:00 EST.
Time once again for some Draino/Liquid Plumber.
Login does not work - what's up?
Error
Unable to Initialize Databases.
If the problem appears to be server related, contact the administrator for this site trades@mshri.on.ca
ANYONE THERE??
How about an update people?
Would like to do a or sixteen
Still not uploading...
no go, I'm switching all to offline now, at least it will save time from seeking the server... right?
Originally posted by Juxu
Still not uploading...
That is probably the way to go, Hua Luo Han...
I guess they are only just waking up where the servers are !
Perhaps they are moving to Kansas in case Isabel decides to pay them a visit...
someone should have been in for at least a few hours now.
As we are all wondering what is going on, it would be nice to get an update, of any sort.
I tried to reach them via email, but have not received any reply yet.
I agree, some info on what's up would be nice at the moment. There was nothing in the "News" section of the DF project website, either. :sleepy:
jack-knifed somewhere up the trail.
or the seal blubber generator ran out.
Folks -
A little calm here, please? Haven't you ever gotten to work and found things either broken or in complete disarray? If so, what was the first thing you did? Notify the users immediately? If that's first on your triage list, you aren't a very good sysadmin.
Be calm, fold offline, and I'm sure we'll hear within a few hours what's happened. Hell, if they just got there, I doubt they're finished even ASSESSING the damage/problem.
Well, if not the first thing, it's right up there.
Originally posted by furballexpress
Folks -
A little calm here, please? Haven't you ever gotten to work and found things either broken or in complete disarray? If so, what was the first thing you did? Notify the users immediately? If that's first on your triage list, you aren't a very good sysadmin.
Assess problem
Call vendor if needed
Notify users
Get to work fixing problem
If those users aren't notified, I end up with every management type in the building with their noses pressed against the glass, making noises to get my attention.
So, it doesn't take a whole heck of a lot of time to post a simple message, like "The HP9000 rolled over and died - HP is due on-site in x hours" or "the database is corrupt, we're loading a backup now, and expect to be back online by xx:00 today."
It's simple respect for the users, and lets you get on with doing your job.
Hmm, not sure I agree with your definition of a good sysadmin. First course of action is to have a cup of coffee, then ensure no more calls are received for the same problem by placing a message for all users to see that says you know about the problem and are working on it.
Users will tolerate almost any inconvenience as long as they are kept informed.
I am CALM and I also happen to agree with Halo & Angus
I think it is wise to inform people that there's a problem and that somebody is looking into it. Otherwise you'll be flooded with questions & messages about the problem...it should only take about 2-3 mins to draft a quick message to the users to keep them happy.
But of course, I am not a good sysadmin, and actually not a sysadmin at all. (Lucky me?)
only foolin' with these hot under the collar Canadian types.
In my job we have 15 minutes to update the customer on a server-based issue. Every 15 minutes after that they require an update to all members of the senior management team.
Resolution or notification of what will resolve the issue is expected within the hour.
Then again these guys are paying top dollar for our services so they should expect this type of customer service.
We ain't paying jacques merde so one would postulate that is why we have heard nada from the powers that be.
Did you all also consider that maybe nobody has gotten in yet?
They might be in class or something to that effect. Not everybody can afford a system that will page you when a machine/database goes down.
Just stay calm, and hope that they get to working on it soon.
We haven't abandoned you, we are working on fixing the database backend, which appears to have been overloaded. Please fold offline for now, we will notify you as soon as everything is back to normal. Thank you for your patience.
We are also more than happy to receive donations of new hardware for the DB
Elena Garderman
Thanks!
Wonderful news, now we have knowledge and much more than just hope to look forward to !
What DB?
Originally posted by Stardragon
We haven't abandoned you, we are working on fixing the database backend, which appears to have been overloaded. Please fold offline for now, we will notify you as soon as everything is back to normal. Thank you for your patience.
We are also more than happy to receive donations of new hardware for the DB
What hardware? Still an HP9000?
I'm trying to imagine overloading one of my L or N class servers.....
That's all we are looking for, thanks!
Originally posted by Stardragon
We haven't abandoned you, we are working on fixing the database backend, which appears to have been overloaded. Please fold offline for now, we will notify you as soon as everything is back to normal. Thank you for your patience.
We are also more than happy to receive donations of new hardware for the DB
Folding 24/7 for a cause
www.hardCOREware.net
HCW DF Team!
So what exactly does "overloaded" mean? Too much traffic? Too much data????? Any data getting lost?
Originally posted by Stardragon
We haven't abandoned you, we are working on fixing the database backend, which appears to have been overloaded. Please fold offline for now, we will notify you as soon as everything is back to normal. Thank you for your patience.
We are also more than happy to receive donations of new hardware for the DB
Crunchin D.F. for www.procooling.com
If those users aren't notified, I end up with every management type in the building with their noses pressed against the glass, making noises to get my attention.
Users will tolerate almost any inconvenience as long as they are kept informed.
I think it is wise to inform people that there's a problem and that somebody is looking into it. Otherwise you'll be flooded with questions & messages about the problem...it should only take about 2-3 mins to draft a quick message to the users to keep them happy.
We ain't paying jacques merde so one would postulate that is why we have heard nada from the powers that be.
I guess my first reaction is: "The project folks are in the Eastern time zone, so they're likely JUST getting there and finding out something's wrong. Do they 1) tell us something is wrong, that they don't know what, and that they will get back to us when they have an idea and an estimated outage time (which, in reality, means contacting the user community TWICE at the beginning of/during a crisis outage), or 2) wait until they discover at least SOME information, THEN inform the users what they've found out and give some approximation of the outage window?"
I still go with #2. Why tell someone something's wrong but you're not sure what yet, especially on a project like this, where it's not a fairly "computer ignorant community" you're dealing with, it's a number of computer enthusiasts who might have not only experience with such problems but two cents they want to contribute? Why not assess for, well, around 15-30 minutes (which would be about the amount of time it took to get a response here, given they got to work around 9:30am ET), then give a more informative response?
I can see both sides, and I think waiting an hour or two, given all that can go wrong, isn't totally unreasonable.
But given the responses, is there any way to "standardize" the response from our fearless project leaders? Say, they find something amiss, blurb here that something's up within a half hour or hour, then get back to us later with more info? Conversely, we expect to hear something within the half hour/hour, and don't expect a full explanation after that until, say, 4 hours have passed? For brevity's sake, let's attach some scheduling to that: we have to assume some work schedule, so this is only valid Monday to Friday from 8am-5pm ET? I would have to say we probably should get some notice, seeing as we're dedicating our time, systems and energy to the project. What form that notice should take, if standardized, isn't entirely mine to decide.
Discuss.
there is no SLA between howard and us, so give'em a break
i noticed i couldn't upload at 12:15 am local time, so i copied my stuff to my laptop and went home to bed, no skin off my nose. stuff happens
Use the right tool for the right job!
It is our policy to post a note on the News section of the website that there is a problem, once such a problem has been identified and investigated for 10-15 minutes. Basically, if it is not something that is immediately obvious and trivially fixable, we will post such a message to notify the users. It is indeed only courteous to do so (and, more importantly, it reduces all the pesky e-mails mentioned above).
We normally monitor the forum daily, even on weekends, so we should know if there's some sort of problem as long as someone posts here.
In this case, our mighty N9000 seems to have run out of memory. This has not happened before and we're not sure why at the moment. A simple reboot fixed it of course, but it may happen again if we can't figure out what's going on exactly. In the meantime we'll keep an eye on memory usage for the next few days.
We may be able to fix up some of the general server/connection issues shortly too by doing some minor rewrites to how the server processes uploads, so stay tuned.
Howard Feldman
What kind of memory does that thing take?
Would it even help in these cases to give it more, or are you suspecting something like a leak, either in the DB software itself or in the way the web server talks to it? Because if the former, somebody might be willing to donate a bit; if the latter, then that won't help anyway.
"If you fail to adjust your notion of fairness to the reality of the Universe, you will probably not be happy."
-- Originally posted by Paratima
Honestly I don't think more memory is the answer here. Unfortunately, we simply do not know what is really going on yet. Some sort of leak seems possible but I do not know why we never saw it in the past. More monitoring of the system is required before we can pinpoint the problem.
Howard Feldman
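For anyone curious what "keeping an eye on memory usage" could look like in practice, here is a minimal sketch, not the project's actual monitoring. It assumes you already log (hours elapsed, MB in use) readings somewhere. The function names, the threshold and the sample numbers are all made up for illustration. The idea is simply to fit a straight line to the readings: a leak tends to show up as a steady upward slope, while a healthy server wobbles around a roughly flat line.

```python
from statistics import mean


def growth_per_hour(samples):
    """Least-squares slope of memory use (MB) against time (hours).

    `samples` is a list of (hours_since_start, used_mb) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_bar = mean(t for t, _ in samples)
    m_bar = mean(m for _, m in samples)
    num = sum((t - t_bar) * (m - m_bar) for t, m in samples)
    den = sum((t - t_bar) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def looks_like_leak(samples, threshold_mb_per_hour=1.0):
    """True if usage climbs faster than the (arbitrary) threshold."""
    return growth_per_hour(samples) > threshold_mb_per_hour


# Hypothetical readings, NOT real numbers from the project's server.
steady = [(0, 512), (1, 530), (2, 505), (3, 520), (4, 510)]
leaking = [(0, 512), (1, 560), (2, 610), (3, 655), (4, 700)]

print(looks_like_leak(steady), looks_like_leak(leaking))
```

A straight-line fit is about the crudest test there is, and a slow leak can hide behind daily load swings, so a few days of readings would be needed before trusting the answer either way.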
Howard,
Is there some advantage you get from using the (?) NCBI (?) socket library? As someone who has spent the last 20 years writing sockets code, I don't think it would be difficult to write plug-compatible replacement code, just given a description of the API....
And I'd be willing to bet that if platform independence is an issue, a post of that same code would probably provide more offers to port it to various platforms than you could ever actually support...
If you are at all interested, you could email me at
rsbriggs AT mailblocks DOT com
HEHEHE - now we know where the memory leaks in the client disappeared to - they decided to take up residence over in the server code
It's not nice at all that the database server stopped working.
But fortunately we can be happy if no data was lost, can't we?
We use the NCBI socket library mainly for its platform independence, and also for convenience - we make use of other libraries in the NCBI toolkit that are more directly related to molecular structure as well.
The API calls are fairly standard and probably easily replaceable by a similar library, yes. They are fairly low level though, allowing us to do things like write code to talk to SMTP servers, and talk to proxy servers to handle authentication (though I suspect other libraries already support this without the need to manually implement it).
Anyhow, the library is actively being worked on and improved by very knowledgeable people, so I have faith that whatever kinks it may have will be ironed out in the next couple of releases. You and I have already helped identify several issues with the lib on several platforms, which they've fixed promptly.
Again, it IS open source so if you are a network guru, download the code and take a look - maybe you can find problems/fixes yourself? The full NCBI toolkit is available in various compression formats here:
ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/CURRENT
Howard Feldman
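To give a rough idea of what "talking to an SMTP server" with low-level socket calls involves, here is a minimal sketch using Python's standard socket module. This is NOT the NCBI library's API and not the project's code. The function names, hosts and addresses are hypothetical, and the reply handling is deliberately naive (no status-code checks, no authentication, no proxy support).

```python
import socket


def smtp_dialogue(client_host, sender, recipient, message):
    """Return the byte strings a client sends, in order, for one message."""
    # Normalise line endings, then "dot-stuff": a body line starting with "."
    # gets an extra "." so it is not mistaken for the end-of-data marker.
    lines = message.replace("\r\n", "\n").split("\n")
    stuffed = ["." + ln if ln.startswith(".") else ln for ln in lines]
    data = "\r\n".join(stuffed) + "\r\n.\r\n"
    return [
        f"HELO {client_host}\r\n".encode("ascii"),
        f"MAIL FROM:<{sender}>\r\n".encode("ascii"),
        f"RCPT TO:<{recipient}>\r\n".encode("ascii"),
        b"DATA\r\n",
        data.encode("ascii"),
        b"QUIT\r\n",
    ]


def send_mail(server, port, client_host, sender, recipient, message):
    """Send one message over a plain TCP socket, reading a reply per step."""
    with socket.create_connection((server, port), timeout=30) as sock:
        replies = [sock.recv(1024)]          # server greeting
        for chunk in smtp_dialogue(client_host, sender, recipient, message):
            sock.sendall(chunk)
            replies.append(sock.recv(1024))  # naive: assumes one recv per reply
        return replies
```

Keeping the command building separate from the socket I/O means the protocol part can be checked without a mail server. A real client would also verify each reply code before moving on to the next command.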