How to test new hard drives
By Sergey Nosov
July 21, 2014
Everybody has his or her own procedure for deploying new hard drives. Some do not test drives at all; others let a quick test run and put the drive straight into service; especially scrupulous individuals put new arrivals through grueling surface tests of various kinds.
With modern hard drives that support Self-Monitoring, Analysis, and Reporting Technology (SMART), here is the routine I settled on. It lets me test multiple hard drives simultaneously in about two days. The outline of the testing procedure is as follows.
- Run short SMART test.
- Run block level surface test.
- Run long SMART test.
If any problems show up at any stage, the drive goes back to the store or manufacturer.
Multiple hard drives can be tested at the same time. Furthermore, I can hot-plug additional hard drives without interrupting tests that are already running.
For the testing environment I use the Ubuntu 14.04 desktop, amd64 operating system, which I boot from a Universal Serial Bus (USB) stick. This method should work just as well with any kind of Linux, FreeBSD, or other Unix-like operating system. Even though I use Windows for my daily work, I often employ open source software for various bare-metal testing labs and environments, so that I do not have to keep track of licensing and activation issues.
To communicate with the hard drive SMART interface I use the smartctl utility. It was not included in the Linux distribution that I use, but the first time I tried to run the utility, I got a helpful prompt with the one command I needed to run to download and install smartctl (sudo apt-get install smartmontools).
smartctl --help
Run the above to make sure you have the smartctl utility, and to see options available.
To test new hard drives I open up several terminal (command prompt) windows, one for each hard drive to be tested. To open a terminal window in Ubuntu desktop, you can use the “Ctrl-Alt-t” key combination.
To see a list of all hard drives available to the operating system you can run the following command.
ls /dev/disk/by-id/
Any Advanced Technology Attachment (ATA) connected disks will start with “ata-”, and Small Computer System Interface (SCSI) connected disks will start with “scsi-”. Depending on your controller, the same disk may show up as both ATA and SCSI.
It is also important to note that this testing method will only work with controllers and host bus adapters that give the operating system full access to the disks (the majority of controllers built into motherboards do). If the disks are connected to a full-fledged hardware RAID controller, then you should use the controller’s interface to test the disks, or connect them somewhere else (unless the RAID controller can be placed in the so-called IT-mode, which again gives full control over the disks to the operating system).
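The listing also contains an entry for every partition, with a “-partN” suffix, which makes it harder to spot the whole disks. Here is a minimal sketch of a filter that keeps only whole-disk ATA and SCSI entries. The function name filter_whole_disks is my own invention for illustration, not part of any tool.

```shell
# Keep only whole-disk entries: names that start with "ata-" or "scsi-"
# and do not end in "-partN" (those are partitions).
# filter_whole_disks is a hypothetical helper name.
filter_whole_disks() {
    grep -E '^(ata|scsi)-' | grep -Ev -- '-part[0-9]+$'
}
```

It would be used as: ls /dev/disk/by-id/ | filter_whole_disks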
Now, the first test I usually run is the short SMART self-test. Here is the command to start it.
sudo smartctl -t short /dev/disk/by-id/ata-YOUR-DRIVE-ID
“sudo” is the Linux equivalent of “run as administrator”. Replace “ata-YOUR-DRIVE-ID” with your disk identifier; it is going to be one of those shown by the ls command you ran earlier.
The SMART function of the hard drive usually turns on automatically. If it does not, you may get a prompt about a command you need to run to turn it on. Also make sure your computer supports SMART (most modern ones do), and that it was not turned off in the motherboard BIOS. The controller mode, if changeable, should be set to the newer AHCI rather than the older IDE mode.
What SMART tests actually do is up to hard drive manufacturers. A short SMART test usually takes less than five minutes to run.
To see results of the test, and all other SMART data, run the following command.
sudo smartctl --all /dev/disk/by-id/ata-YOUR-DRIVE-ID
This command will show a great deal of information about the drive, including the device model, serial number, parameters, status, SMART capabilities, a list of SMART attributes, and a log of the SMART tests that were run.
Once the short test completes, in the SMART self-test logs section you will see a new entry, something akin to, “Short offline Completed without error.”
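If you would rather not scroll through the full report, smartctl can print the self-test log alone with the -l selftest option, where the most recent test is listed as entry “# 1”. The sketch below checks that entry for the “Completed without error” status. The function name last_selftest_ok is hypothetical, and the exact log layout is an assumption based on typical smartctl output for ATA drives.

```shell
# Succeeds (exit status 0) if the most recent self-test log entry,
# listed as "# 1", reports "Completed without error".
# Reads "smartctl -l selftest" style output on stdin.
# last_selftest_ok is a hypothetical helper name.
last_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}
```

It would be used as: sudo smartctl -l selftest /dev/disk/by-id/ata-YOUR-DRIVE-ID | last_selftest_ok && echo OK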
A very interesting section in the SMART report, for us, is the list of SMART Attributes. That will be two dozen or so attributes, each one starting with the ID# and name (for example ID# 1, ATTRIBUTE_NAME: Raw_Read_Error_Rate) and a few other values, the last of which is the RAW_VALUE. Depending on the width of your terminal window each attribute is displayed on its own line, or it may wrap to the next line of text.
The attributes to pay special attention to are the following.
- Reallocated_Sector_Ct
- Current_Pending_Sector
- Offline_Uncorrectable
All of these should have the RAW_VALUE of zero, for a new drive.
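With several drives on the bench, this check is easy to script. The following is a minimal sketch that reads the attribute table and reports any of the three attributes with a non-zero raw value. It assumes the usual ten-column layout in which RAW_VALUE is the tenth field, and the function name check_critical_attrs is made up for this example.

```shell
# Print any of the three critical attributes whose RAW_VALUE is not zero.
# Reads "smartctl -A" (or "--all") output on stdin.
# Exit status is 1 if at least one of them is non-zero, 0 otherwise.
# Assumes RAW_VALUE is the 10th whitespace-separated field.
check_critical_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 != 0) { print $2 ": " $10; bad = 1 }
        }
        END { exit bad }
    '
}
```

It would be used as: sudo smartctl -A /dev/disk/by-id/ata-YOUR-DRIVE-ID | check_critical_attrs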
There are also a few error rate attributes, such as the Raw_Read_Error_Rate and the Seek_Error_Rate, that for most hard drive manufacturers, with the notable exception of Seagate, should also have the RAW_VALUE of zero. For Seagate drives the least significant 32 bits of the Raw_Read_Error_Rate and the Seek_Error_Rate hold the number of operations, and the most significant 16 bits hold the error count. As such, for Seagate drives, if the raw values of the error rate attributes are below 4294967296 (2^32), then no errors have occurred.
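The split described above is simple integer arithmetic, which the shell can do directly. This is a small sketch under the stated assumption that the raw value is a 48-bit number with the error count in the upper 16 bits. The function name decode_seagate_raw is hypothetical.

```shell
# Split a Seagate raw value into its two parts:
#   errors     = upper 16 bits (raw shifted right by 32 bits)
#   operations = lower 32 bits (raw masked with 2^32 - 1)
# Needs a shell with 64-bit arithmetic, such as bash or dash on amd64.
decode_seagate_raw() {
    echo "errors=$(( $1 >> 32 )) operations=$(( $1 & 4294967295 ))"
}
```

For example, decode_seagate_raw 8589934597 prints errors=2 operations=5, because 8589934597 is 2 × 2^32 + 5, while any value below 4294967296 yields errors=0.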
If you have not noticed anything wrong with your disks up until now, it is time to run more advanced tests.
Some hard drive manufacturers program disks with a special self-test, the conveyance test, whose function is to check for damage the drive may have sustained during transport. To start this test, run the following command.
sudo smartctl -t conveyance /dev/disk/by-id/ata-YOUR-DRIVE-ID
Conveyance tests usually finish in less than ten minutes. Any of the tests in this article can (and should, to save time) be run in parallel on as many new drives as are connected to your computer. Upon test completion, check the SMART status for results again.
sudo smartctl --all /dev/disk/by-id/ata-YOUR-DRIVE-ID
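Because smartctl -t returns right away while the test itself runs inside the drive, starting the same test on several drives can be done from a single loop instead of separate terminal windows. Here is a minimal sketch. The function name start_selftests and the SMARTCTL override variable are my own inventions for illustration; setting SMARTCTL=echo gives a dry run that only prints the arguments.

```shell
# Start the same SMART self-test on several drives, one after another.
# Each smartctl -t call returns immediately, so the drives end up
# testing in parallel.
# $1 is the test type (short, conveyance, long); the rest are drive IDs
# as shown under /dev/disk/by-id/.
# SMARTCTL is a hypothetical override, e.g. SMARTCTL=echo for a dry run.
start_selftests() {
    test_type=$1
    shift
    for id in "$@"; do
        ${SMARTCTL:-sudo smartctl} -t "$test_type" "/dev/disk/by-id/$id"
    done
}
```

It would be used as: start_selftests conveyance ata-FIRST-DRIVE-ID ata-SECOND-DRIVE-ID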
If everything is still fine, the next step is to write something to every accessible block on the disk, and see if the same values can be read back. Do not run this test on solid state drives (SSDs). SSDs are designed with a certain failure rate of the underlying storage in mind, and that rate increases over the drive’s lifetime. The reliability of SSDs is ensured by built-in controllers that employ wear-leveling and error-correction algorithms.
If you run the block level test on an SSD you will be wearing it out in vain; it will not actually test all the storage, as it is up to the controller to decide where to store data. So, run the following test on magnetic hard drives only. Warning: this is a destructive test; do not run this test on disks with data that you want to keep.
sudo badblocks -ws /dev/disk/by-id/ata-YOUR-DRIVE-ID
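Before launching the command above, it may be worth double-checking that the target really is a magnetic disk. On Linux the kernel exposes a per-disk flag in /sys/block/NAME/queue/rotational, where 1 means rotational. The sketch below reads that flag. The function name is_rotational and its second parameter are hypothetical, and some USB bridges and RAID controllers are known to report the flag incorrectly, so treat the result as a hint rather than a guarantee.

```shell
# Succeeds (exit status 0) only if the kernel reports the disk as rotational.
# $1 is the kernel device name (for example sda).
# $2 is a hypothetical override of the sysfs mount point, used for testing.
is_rotational() {
    sysfs=${2:-/sys}
    [ "$(cat "$sysfs/block/$1/queue/rotational" 2>/dev/null)" = "1" ]
}
```

The kernel name for a by-id entry can be found with: basename "$(readlink -f /dev/disk/by-id/ata-YOUR-DRIVE-ID)"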
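This test takes a long time; I usually let it run overnight or over a weekend. With the -w option, badblocks writes a pattern to every available block, reads it back and compares, and then repeats with the next pattern (0xaa, 0x55, 0xff, and 0x00 in turn). You can let all four passes finish, or interrupt the test with Ctrl-C once you are satisfied. The no-error results will look something like the following: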
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: ^C7.95% done, 20:20:04 elapsed. (0/0/0 errors)
Finally, we are ready to run the long SMART self-test:
sudo smartctl -t long /dev/disk/by-id/ata-YOUR-DRIVE-ID
For modern 4TB drives, long tests take in the neighborhood of ten hours to complete. Upon test completion, examine the SMART status in detail yet again to see if anything nasty has popped up.
sudo smartctl --all /dev/disk/by-id/ata-YOUR-DRIVE-ID
If everything is fine, congratulations; if not, send the disk back.
I hope you find my routine for testing new hard drives helpful. It starts with quick tests, so that we do not waste too much time on obviously defective drives, and then progresses to longer tests as needed. Do not forget that you can run these tests on several hard drives simultaneously; this will save a lot of time if you have many new disks to test.
Good luck!