


Old 01-07-2009   #1
Tech Support Team
 
georgeks's Avatar
 
Join Date: Sep 2006
Location: The Netherlands
Posts: 1,119
PC Experience: Enough
HDD Failures and S.M.A.R.T. Records

It is with scores of butterflies in my (alas) considerable stomach that I decided to put together this tutorial/discussion. Much controversy surrounds the subject of S.M.A.R.T. and HDD failure prediction, and passions run high. Let's try to make the best of it, and learn whatever we can in the process.
Additional links I have found useful:
Google HDD Failures Report (Summary)

At the Conference on File and Storage Technologies, Google presented a report that challenged the accepted explanations for HDD failures, claiming that many more factors affect a hard disk’s life expectancy, and that failure prediction (if possible at all) is much more complex than established dogma would have us believe.
The mere fact that Google's server infrastructure, estimated at more than 450,000 systems, uses consumer-grade hard disks with capacities between 80 and 400 GB makes the report interesting, to say the least.
More than 100,000 drives were covered, all manufactured from 2001 onwards. Rotational speeds ranged from 5400 to 7200 rpm, and the drives were of various (9) models from different well-known and established manufacturers.
Google claims that collection and storage of "vital information" about all of its systems occurs every few minutes.

Environmental factors (e.g. temperature), activity levels and S.M.A.R.T. records, all commonly considered good indicators of a hard drive’s health, are examples of the information collected.

In general, the failure rate of Google's hard drive population increased with the age of the drive.


·Within the group of hard drives up to one year old, 1.7% of the devices had to be replaced due to failure.
·The rate jumps to 8% in year 2 and 8.6% in year 3.
·The failure rate levels out after that, but Google believes that the reliability of drives older than 4 years is influenced more by "the particular models in that vintage than by disk drive aging effects."
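
To put the yearly figures above into perspective, here is a minimal Python sketch that chains them into a cumulative survival figure. The assumption that each year's rate applies only to the drives that survived the previous years is mine, not Google's, and the function and variable names are purely illustrative.

```python
# Yearly failure rates quoted in the summary above (fraction of drives replaced).
ANNUAL_FAILURE_RATES = {1: 0.017, 2: 0.08, 3: 0.086}


def cumulative_survival(rates_by_year):
    """Return {year: fraction of the original population still in service}.

    Assumption (not from the report): each year's rate applies only to the
    drives that survived all previous years.
    """
    alive = 1.0
    survival = {}
    for year in sorted(rates_by_year):
        alive *= 1.0 - rates_by_year[year]
        survival[year] = alive
    return survival


if __name__ == "__main__":
    for year, alive in cumulative_survival(ANNUAL_FAILURE_RATES).items():
        print(f"after year {year}: {alive:.1%} of drives still in service")
```

Under that assumption roughly 83% of a batch would still be running after three years, which is a rough illustration only and not a number taken from the report.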
Breaking out different levels of utilization, the Google study shows an interesting result:

Only drives aged six months or younger show a definitely higher probability of failure when put into a high-activity environment.

Once a drive survives its first months, the probability of failure due to high usage decreases in years 1, 2, 3 and 4, and increases significantly in year 5.

Equally surprising was Google's temperature research:
·Failures do not increase when the average temperature increases.
·In fact, there is a clear trend showing that lower temperatures are associated with higher failure rates.
·Only at very high temperatures is there a slight reversal of this trend, the authors of the study found.
In contrast, Google reported that certain SMART parameters have a noticeable effect on drive failures.
Drives typically scan the disk surface in the background and report errors as they discover them. Significant scan errors can hint at surface defects, and Google reports that fewer than 2% of its drives show scan errors.
However, drives with scan errors turned out to be ten times more likely to fail than drives without scan errors.
Approximately 70% of Google's drives with scan errors survived the first eight months after the first scan error was reported.

Similarly, reallocation counts, a number that results from re-mapping faulty sectors to a new physical sector, can have a dramatic impact on a hard drive's life:

Google found that drives with one or more reallocations fail more often than those with none.
On average, the failure rate of such drives was higher by a factor of 3 to 6, while about 85% of them survived more than eight months after the first reallocation.

Google found that other SMART parameters had similar effects on hard drives, but the bottom line is that 56% of all failed drives had no count in any of these categories, which means that:

More than half of all failed drives were put out of operation by factors other than scan errors, reallocation count, offline reallocation and probational counts.
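
The consequence for anyone trying to predict failures can be sketched in a few lines of Python. This is only an illustration: the record layout (a plain dict of raw counts) and all names are my own assumptions, and only the 56% figure comes from the report.

```python
# The four SMART signals named in the summary above.
STRONG_SIGNALS = (
    "scan_errors",
    "reallocation_count",
    "offline_reallocation",
    "probational_count",
)

# Share of failed drives that showed none of the four signals (from the report).
FAILED_WITHOUT_SIGNAL = 0.56


def has_strong_signal(record):
    """True if any of the four counters in `record` is non-zero.

    `record` is a hypothetical dict of raw counts; missing keys count as 0.
    """
    return any(record.get(name, 0) > 0 for name in STRONG_SIGNALS)


def best_case_detection_rate(failed_without_signal=FAILED_WITHOUT_SIGNAL):
    """Upper bound on the share of failures such a check could flag in advance."""
    return 1.0 - failed_without_signal


if __name__ == "__main__":
    print(has_strong_signal({"scan_errors": 2}))
    print(f"best case: {best_case_detection_rate():.0%} of failures flagged")
```

In other words, even a check that never misses a non-zero counter could warn about at most 44% of the failures in Google's population.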

Bottom line:

Google's research does not provide adequate means of predicting when hard drives are likely to fail. It shows that temperature and high usage alone are not responsible for failures by default.
It also raises awareness of a trend christened the "infant mortality phase": a time frame early in a hard drive's life that shows increased probabilities of failure under certain circumstances.
The report does not provide us with a definite conclusion, and the authors indicate that currently there is no promising approach for HDDs failure prediction:
"Powerful predictive models need to make use of signals beyond those provided by SMART."
S.M.A.R.T. is as good as it gets:
If you read the FULL report (highly recommended) you’ll see that HDD manufacturers have an “Open Season” in manipulating the output of the various S.M.A.R.T. attributes.
Hard Drive failures peak when new and when old, with failures leveling out in between.
From personal experience I have found that a hard drive with fewer than 100 power cycles (fairly new, unless it has stayed powered on for extended periods of time) and more than 1000 reallocated sectors is one to avoid buying.
In an initial, personal and limited piece of research, the S.M.A.R.T. records of 40 HDDs (SATA and IDE) were examined.
Disks with more than 900 reallocated sectors and fewer than 100 power cycles appeared problematic (they failed Secure/Normal Erase with various utilities, or failed when installing/running FreeDOS or Windows XP).
A second study of 500 disks (2.5”/3.5” EIDE/SATA) gave similar results, along with a few spectacular successes, but also failures, in predicting an HDD’s health.
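
For anyone who wants to apply this rule of thumb to their own S.M.A.R.T. records, a minimal Python sketch follows. The function name and parameters are hypothetical, the thresholds are simply the 100 power cycles and 900 reallocated sectors mentioned above, and, as the results above show, the rule is a personal heuristic that sometimes fails.

```python
def looks_problematic(power_cycles, reallocated_sectors,
                      max_power_cycles=100, min_reallocated=900):
    """Rule of thumb from this post: few power cycles but many reallocated sectors.

    The thresholds are the author's personal observations, not a standard.
    """
    return (power_cycles < max_power_cycles
            and reallocated_sectors > min_reallocated)


if __name__ == "__main__":
    # Hypothetical example values, not measurements from the study.
    print(looks_problematic(power_cycles=50, reallocated_sectors=1200))
    print(looks_problematic(power_cycles=4000, reallocated_sectors=3))
```

Both thresholds are keyword parameters so that the stricter 1000-sector limit can be tried as well.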
Obtaining a drive’s S.M.A.R.T. record requires about a minute.
SMARTUDM, the tool I use(d), comes from SysInfo Lab (http://www.sysinfolab.com/), the makers of ASTRA - Advanced Sysinfo Tool and Reporting Assistant.
It does not detect 2.5” SATA drives when they are connected to a notebook, but it DOES detect them when they are connected to a desktop PC; the power/data connections are the same.
I have a SATA I 80GB drive which has been in constant use (and mostly, abuse) since 2002; I paid 185€ for it then.
It has been formatted/partitioned hundreds of times (no joke, really!) and has been, and currently is, in a RAID 0 with a similar drive running Windows Vista Home Premium.
The drive had 3890 power cycles and 3 (three) reallocated sectors in September 2008, and that had changed to 4250/3 by Monday 05/01/2009.
I was careful, cautious (yeah ..Hundreds of times) and most probably lucky.
I’d like to hear your views, but please refrain from mentioning specific Brands, as everyone has his/her favorites, and the one that he/she abhors.

georgeks
Attached Files
File Type: pdf disk_failures1.pdf (172.3 KB, 4 views)
__________________
You are, What you do, When it Counts

Old 01-07-2009   #2
PCHF Founder & Owner
 
Hengis's Avatar
 
Join Date: Jan 2004
Location: The PCHF Bunker
Posts: 14,069
PC Experience: Microsoft Certified Professional
Default Re: HDD Failures and S.M.A.R.T. Records

Very interesting paper you have put together there. TBH, a lot of it goes over my head but it has been carefully researched and well worded.

I echo your request not to turn this into a preference thread, thank you.
