Hardware issues. How many per server per year?

SolidShellSecur

New member
This year alone we have had two hard drives and a SATA cable replaced, all within maybe a 6-10 month time frame on one server.

Anyone else have such bad moments?
 
How heavily used are your servers? It depends on load and all.

Two drives isn't too bad - what drives are these? A failed SATA cable, though, is rare.
 
Personally I think two hard drives on the same server in 6 to 10 months is not good, but as concert049 mentioned, it does depend on the server's usage. I've also seen cases where the chassis crimps the SATA cable and causes it to fail, but that's also very rare.
 
Hopefully you were running at least RAID 1 or RAID 5 so that your users weren't impacted by the hardware failure!

For us, if there were two failures in a 12-month period, there would be some serious problems going on. The hardware that we use in our machines is NEW hardware, however, so that really helps cut down on the possibilities. If you're using recycled hardware, or older models, you run a higher risk of hardware failure.

Usage will play a part in things. The general school of thought is that a home computer hard drive will last between 3 and 5 years (I have some drives that are 10 years old, still running, and used daily, just not 24 hours a day).

Another thing to be aware of is that, in general, hard drives are not designed to run 24x7 with massive reads/writes for years on end. If you're looking for stability when it comes to drives, look into investing in "server grade" hardware. There are drives specifically designed to handle the high stresses of servers.

When all else fails - mirror :) A RAID 5 is good, a RAID 10 is better, and with a RAID 50 you'll not have to be concerned about any failures at all. You can have several drives fail at the same time and you won't even bat an eye at it!

As always, the logs are your friends. Before a server fully tanks, there's usually some sort of heads up (not always, but many times you'll get a warning). If you notice SMART errors or anything else related to failures, then you need to address it. Some server owners disable the FSCK after "X" bootups (usually 10 boots), but this is a very bad thing to do. The FSCK checks the integrity of the filesystem on the drive and determines whether any errors are found. Not doing this on a semi-regular basis, or at least after every "X" boots, can lead to issues down the road.
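For example, here's a minimal sketch of how you might keep an eye on SMART health from a script (assuming smartmontools is installed; the device paths are placeholders, so adjust them for your own drives):

```python
#!/usr/bin/env python3
"""Minimal SMART health check sketch (assumes smartmontools is installed)."""
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]  # hypothetical device list - adjust for your server

for dev in DEVICES:
    # 'smartctl -H' prints the drive's overall SMART health assessment.
    result = subprocess.run(["smartctl", "-H", dev], capture_output=True, text=True)
    output = result.stdout
    if "PASSED" in output or "OK" in output:
        print(f"{dev}: SMART health looks fine")
    else:
        # A failing (or unreadable) health assessment is the kind of
        # early warning worth acting on before the drive actually dies.
        print(f"{dev}: check this drive!\n{output}")
```

Run something like that from cron every few hours and you get a cheap early warning on top of whatever your RAID controller reports.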

If you're replacing drives within a matter of months of each other, 9 times out of 10 you're using recycled hardware. Install NEW drives. (Hardware degradation starts the moment they spin up, but they USUALLY don't fail right out of the box.)

Final advice regarding hard drive selection - don't buy the cheap ones with a 1 year warranty; get one with a 3-5 year warranty. Reason? Manufacturers don't want to be replacing drives, so they use quality parts (usually). A 1TB drive for $60 is generally not made the same way a 1TB drive for $300 is made.
 
In addition to the points others have made, I think the brand of the HDD is another important factor.
I've been working with different servers for more than 3 years and have only had one HDD replaced.
 
How did your SATA cable fail? I would check the temperature of your hard drives. We used to have some servers with a very popular colo company in LA and had some hardware failures due to improper cooling. Two failures in 10 months is definitely too much.
 
The failure rate depends on the usage of your server.

It's important to use RAID protection to guard against data loss from hard drive failures.
 
What do you expect with a desktop hard drive? Go with SCSI for reliability.

I don't think the OP mentioned anywhere they were using desktop drives?

They are using SATA, which, as long as enterprise-grade drives are used, is perfectly acceptable.

SCSI or SAS will generally give greater performance, but reliability should not necessarily be the deciding factor.

Steve
 
I don't think the OP mentioned anywhere they were using desktop drives?

They are using SATA
Steve


Exactly. What is the ratio of SATA drives to SAS/SCSI drives in desktops compared to enterprise-grade, mission-critical servers? Do I need to answer that question for you, or the next question: and why?

FACT: 99%+ of SATA drives aren't designed to work 24/7/365 under duress. Also, putting them under busy server workloads can dramatically affect their MTBF.
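To put some rough numbers on why MTBF matters (a toy calculation with made-up MTBF figures, not vendor specs; it just shows how a shorter effective MTBF translates into a higher chance of a failure in any given year):

```python
import math

HOURS_PER_YEAR = 8766  # average year, including leap years

def annualized_failure_rate(mtbf_hours: float) -> float:
    """Approximate probability that a drive fails within one year,
    assuming a simple exponential failure model for the given MTBF."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

# Illustrative figures only: a drive whose effective MTBF drops
# when it's run 24/7 under a heavy server workload.
for label, mtbf in [("light desktop duty", 600_000),
                    ("heavy 24/7 server load", 150_000)]:
    afr = annualized_failure_rate(mtbf)
    print(f"{label}: MTBF {mtbf:,} h -> ~{afr:.1%} chance of failing per year")
```

The exact figures don't matter; the point is that cutting the effective MTBF by a few times multiplies the odds of a replacement in any given year.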
 
Exactly. What is the ratio of SATA drives to SAS/SCSI drives in desktops compared to enterprise-grade, mission-critical servers? Do I need to answer that question for you, or the next question: and why?

FACT: 99%+ of SATA drives aren't designed to work 24/7/365 under duress. Also, putting them under busy server workloads can dramatically affect their MTBF.

It's not the fault of an HDD interface standard if the wrong type of drive is used (desktop drives in a server environment). It's the fault of the person installing the wrong grade of drive in a server.

Just because the majority of SATA drives are designed for desktops, it doesn't mean you should discount the "1%" that are designed for the purpose. It's perfectly acceptable to use SATA drives in a server as long as they are enterprise-grade drives and are suitable for what the server will be used for.

Steve
 
Exactly. What is the ratio of SATA drives to SAS/SCSI drives in desktops compared to enterprise-grade, mission-critical servers? Do I need to answer that question for you, or the next question: and why?

FACT: 99%+ of SATA drives aren't designed to work 24/7/365 under duress. Also, putting them under busy server workloads can dramatically affect their MTBF.

In addition, SCSI controllers are much more reliable than SATA controllers (and faster, too).
 
When all else fails - mirror :) A RAID 5 is good, a RAID 10 is better, and with a RAID 50 you'll not have to be concerned about any failures at all. You can have several drives fail at the same time and you won't even bat an eye at it!
  • RAID 5
    • Disk failure has a medium impact on throughput.
    • Difficult to rebuild in the event of a disk failure (as compared to RAID level 1).
  • RAID 10
    • RAID 10 has the same fault tolerance as RAID level 1.
    • Under certain circumstances, a RAID 10 array can sustain multiple simultaneous drive failures.
  • RAID 50
    • RAID 50 is more fault tolerant than RAID 5 but has twice the parity overhead.
    • Failure of two drives in one of the RAID 5 segments renders the whole array unusable.
Even RAID 50 carries a chance of your entire array being FUBARed. ;-)
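To put rough numbers behind that smiley, here's a toy model that assumes independent drive failures and a made-up 5% annual failure rate per drive, and ignores rebuild windows (which matter a lot in practice):

```python
def p_raid5_survives(n: int, p: float) -> float:
    """A RAID 5 set of n drives survives the year if at most one drive fails."""
    q = 1 - p
    return q**n + n * p * q**(n - 1)

def p_raid10_survives(pairs: int, p: float) -> float:
    """RAID 10 survives as long as no mirrored pair loses both of its drives."""
    return (1 - p**2) ** pairs

def p_raid50_survives(legs: int, drives_per_leg: int, p: float) -> float:
    """RAID 50 survives only if every RAID 5 leg loses at most one drive."""
    return p_raid5_survives(drives_per_leg, p) ** legs

p = 0.05  # assumed 5% annual failure rate per drive (illustrative only)
print(f"RAID 5,  6 drives:          {p_raid5_survives(6, p):.4f}")
print(f"RAID 10, 3 mirrored pairs:  {p_raid10_survives(3, p):.4f}")
print(f"RAID 50, 2 x 3-drive legs:  {p_raid50_survives(2, 3, p):.4f}")
```

With those assumptions RAID 10 actually comes out ahead of RAID 50, and even RAID 50 is left with roughly a 1-2% chance per year of losing the whole array, so no RAID level lets you stop watching the drives entirely.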
 
