Linux-Need badblocks program advice



jasong
02-07-2009, 12:29 AM
Okay, I've got a hard drive that might have a bad write head. But if it IS bad, it only goes wrong about 1 time in 500 million, counting individual bits (1s and 0s). So I'd never notice a problem in a text file, but it could easily screw up data for distributed computing.

So, unless there's a better program out there, I need help with a good combination of options for the badblocks program. I read something about hard drive buffers, and I know very little about hard drives, so I'm hoping someone can feed me a command to try. The hard drive is about 120GB in size (base-10 GB, about 112GiB) and I think the first pass should trigger a mistake if the problem is truly the write head.
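As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The 1-in-500-million rate is the poster's own guess, and the function names here are made up for illustration; nothing below measures a real drive.

```python
# Back-of-the-envelope check of the numbers quoted in the post.
DRIVE_BYTES = 120 * 10**9        # "about 120GB", base-10
BITS_PER_ERROR = 500_000_000     # the poster's guess: 1 bad bit per 500 million


def gib(n_bytes):
    """Convert a byte count to binary gigabytes (GiB)."""
    return n_bytes / 2**30


def expected_bit_errors(n_bytes, bits_per_error=BITS_PER_ERROR):
    """Expected number of flipped bits when n_bytes are written once."""
    return n_bytes * 8 / bits_per_error


if __name__ == "__main__":
    print(f"{gib(DRIVE_BYTES):.1f} GiB")
    print(f"{expected_bit_errors(DRIVE_BYTES):.0f} expected bad bits per full write pass")
```

If the guessed rate were anywhere near right, a single full write pass over the drive would be expected to hit well over a thousand bad bits, which fits the hunch that the first pass should already show a mistake.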

Lastly, don't worry about ruining the data on the drive. Anything important that was on it is long gone. Right now I just need to determine whether I need to ditch the drive. (I mean recycle when I say ditch, I'm a good boy :) )
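Since the data is expendable, one candidate is the destructive write-mode test that badblocks offers. The sketch below only builds the command line and does not run it. `/dev/sdX` is a placeholder, not the poster's actual device, and the helper names and the 4096-byte block size are assumptions chosen for illustration.

```python
import shlex


def badblocks_write_test(device, block_size=4096, log_path=None):
    """Build (but do not run) a destructive badblocks command line."""
    cmd = [
        "badblocks",
        "-w",                    # write-mode test: writes patterns, then reads them back (destroys data)
        "-s",                    # show progress
        "-v",                    # verbose output
        "-b", str(block_size),   # block size in bytes
    ]
    if log_path is not None:
        cmd += ["-o", log_path]  # write the list of bad blocks to this file
    cmd.append(device)
    return cmd


def as_shell(cmd):
    """Render the argument list as a copy-pastable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # /dev/sdX is a placeholder; double-check the real device name first.
    print(as_shell(badblocks_write_test("/dev/sdX")))
```

The resulting command would need to be run as root against an unmounted drive. By default the write-mode test makes several passes with different patterns, so a full run over a drive this size can take many hours.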

IronBits
02-07-2009, 12:48 AM
Need make and model of HDD.
Each one is different.
Manufacturers usually have a utility on their websites you can download and use to test/diagnose and map out the bad sectors.

alpha
02-07-2009, 01:18 AM
Yeah, the utilities IB is referring to can be hit or miss sometimes, but that's what I'd start with too. I've had some success with them in the past. Most of them are burnable ISOs which you then boot from a CD, thus removing any OS-interoperability issues.

How did you go about diagnosing the problem and deciding there is one error every 500 million writes?

jasong
02-07-2009, 10:55 PM
Well, my conclusion isn't totally scientific, but basically...

I've been having trouble with my quad-core for a while. I like to have the same Linux distro on my laptop as on my quad-core, but it's only the quad-core that consistently suffers from problems. I tried to run P-1 on it a few weeks ago, and it got some fairly bad errors, so I assumed it was a CPU or RAM problem. The thing is that even though I continuously have errors with installs and distributed computing stuff, the Prime95 torture test and a bootable ISO RAM test revealed absolutely nothing. After I thought about it, I realized that the errors disappeared when the hard drive activity involved only (or mostly) reads. The 1 in 500 million is a SWAG, based on the fact that some installs boot and some don't. Suffice it to say that you might not notice the errors if you were just using the computer for word processing.
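The hunch that writes go wrong while reads are fine can be checked on a small scale with a write-then-verify loop. This is a minimal sketch with hypothetical function names, not a replacement for badblocks. Note that the operating system may serve the read-back from its cache in RAM rather than from the platters, so a serious test would need a file larger than RAM or a test against the raw device.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # 1 MiB per chunk


def chunk_data(index, seed=b"seed"):
    """Deterministic filler for chunk `index`: a SHA-256 digest repeated to CHUNK bytes."""
    digest = hashlib.sha256(seed + index.to_bytes(8, "big")).digest()
    return digest * (CHUNK // len(digest))


def write_pattern(path, n_chunks):
    """Write n_chunks of known data and ask the OS to flush it to the disk."""
    with open(path, "wb") as f:
        for i in range(n_chunks):
            f.write(chunk_data(i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, n_chunks):
    """Read the file back and return the indices of chunks that do not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(n_chunks):
            if f.read(CHUNK) != chunk_data(i):
                bad.append(i)
    return bad
```

Calling `write_pattern("testfile.bin", 1024)` followed by `verify_pattern("testfile.bin", 1024)` on the suspect drive would write and check 1 GiB. An empty list means no mismatch was seen, which by itself does not clear the drive.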

alpha
02-08-2009, 04:49 AM
How long are you running the Prime95 test? How long are you running the RAM test? These need to be run for 12+ hours really. What are your temps like?

Have you run the hard disk manufacturer's diagnostic utility yet? Were there any errors? How is your disk's SMART status?

When you're referring to the "errors" and "problems" that give you reason to think something is wrong, what are they, specifically? If you can give actual error messages it always helps to track the problem down.

Need lots more info to narrow this down.

jasong
02-09-2009, 06:06 PM
I ran the RAM test for 88.5 hours without an error, and I ran Prime95 (blend test) for a couple of days with no errors.

I really wanted to get my stuff working again, so I'm running BOINC on a bootable Linux ISO. Because of (1) my anxiety disorder, (2) my father's father being in really bad health, meaning my dad is stressed, and (3) my dad and I not getting along that well anyway, I've decided it would be best to just ignore the problem and run BOINC on projects that can tolerate errors, assuming it's the CPU. But I still think it's the hard drive.

I might decide to re-google the badblocks and hard drive stuff to try to solve the problem, but I can't afford to get fixated on it because that's when I start yelling at the computer. And I don't want to subject my father to that.

Thanks for the help, though. :)

alpha
02-10-2009, 02:07 AM
That's why we were trying to help. If you answer all the questions in my last post then we might be able to help you solve it so you won't have to shout at it.

Ignoring the problem (if there is one) is not a good idea, because it will only cause you grief again later.

gopher_yarrowzoo
02-10-2009, 04:17 AM
Very true words, them.
I should know, I was busy muttering at my PC last night. Ever tried to play an online game with a group of people when you've got a flaky connection where latency goes from 0 to impossible in the time it takes to type this (from about 250ms to 2000ms and beyond)? I mean, I can just about play it if the latency stays stable, but then it disconnected me from the game, oh, 4 - 5 times in the same area. I saw latency times of, well, 16000ms, yes, 16s of latency... Managed to get done what needed doing and got out of there, went to a major city and was disconnected again, so I just gave up.

Here's how to test if it's the HDD: swap it out for a known good one, preferably one with Linux on it or at least a copy of Windows you can wipe out and reinstall (since Windows isn't very bright when it comes to hardware changes :bang:).
Hope that helps a little.
Write down the problem and break it into bits, and keep breaking the bits down until you have easy, manageable, identifiable chunks. This is also how I learned to code / problem solve.

alpha
02-10-2009, 06:00 AM
Gopher, if your internet connection is provided wirelessly and you're running XP or Vista, make sure you disable the "Wireless Zero Configuration" service and use the client software provided by the manufacturer of the wireless card instead. Otherwise you get a 1000ms+ ping spike every 60 seconds when Windows searches for new wireless networks. :looney:

Couldn't believe it when I first saw those ping spikes appearing, then discovered it was another MS-ism and just rolled my eyes.

gopher_yarrowzoo
02-10-2009, 06:24 AM
Fully wired, mate. Even got pings dropped from the router itself when ping testing a known good address, e.g. yahoo.com

alpha
02-10-2009, 06:40 AM
Ah, must be the Scottish snow freezing your pipe. :scarf:

gopher_yarrowzoo
02-10-2009, 06:53 AM
Aye, tell me about it. It will be a joint somewhere. All I will say is BT cable + joints + cold weather = fail some of the time, since they don't always seal the joint correctly, and it obviously affects DSL more than analogue.

Roll on 21CN, when this sort of stuff will be a major headache, unless they do FTTH.

gopher_yarrowzoo
02-11-2009, 05:12 AM
Oh and on a side note:
It may not be bad blocks, it may be a glitch with the MBT for that drive or the swap space... Speaking of which, why can't Windows do as Linux does and create a swap-only partition? How fragmented can a swap file become? How about very: 1,536 parts for 1 swap file. Time to move it somewhere else, maybe! Yes, I've not really been defragging my system very often..
It's now being worked on by Diskeeper, the full version, not the cut-down version that is XP's defrag tool. That's right, they replaced the nice defrag they had (well, it was pretty) with the console one and bought a cut-down "free" version of Diskeeper from Exec Software. Don't believe me? start > run > cmd > defrag c: -a and see what it says.