Basic Failed Hard Disk Fault Finding and Wrecking/Fixing With fdisk, sfdisk, cfdisk and badblocks

I learned some major lessons today. Hard drives are amazing bits of kit - which I already knew - but they are also incredibly robust given half a chance, even when you think they are totally dead or only partially working. The cause is often a strange partitioning structure - maybe one used to wipe data securely etc.

I had 2 x IDEs and a WD320 SATA disk that I was CONVINCED were dead, but I had made that assumption using the USB drive gizmo. These gizmos have their place with known good drives, but for drives with "problems" they should NOT be used, as they can give incorrect results or compound problems, as seen below. The whole section on the "36 Terabyte" (yes, you read right) size IDE drive below came about from erroneous data and misbehaviour caused by the USB gizmo with a flaky drive - I have left it as I wrote it earlier as a lesson...

Plug them into real motherboards to test them properly before writing them off!

----------------------------------------------------------

SATA, IDE and SCSI disks can fail totally or partially in 3 main ways: mechanical, electrical or both.

The mechanical faults such as mangled heads (from mechanical shock etc.), failed motor (age, electrical surge etc.), or worn/damaged spindles or platters will usually mean binning the drive unless you have lost valuable data and want to pay for expensive, professional data recovery services. Recovery cannot be guaranteed, obviously...

It's expensive because it requires specialist kit and knowledge, and is time consuming, as in the video:

A seemingly dead or unseen drive may NOT be totally gone for the home user though if you check some basic things. A dead WD320? Take a closer look...

If a drive can be heard spinning up but doesn't show as a readable device in Windows, Apple or Linux, the logic board may have an electronic problem that stops the drive communicating correctly, or the MBR/partition data may have been wiped in a way that makes it invisible to the GUI, as in the first Gparted screenshot above. It may be recoverable - maybe at least for long enough to image the data or just get key files off it, even if it is near total death.

If a drive works but makes clicking, ticking or other strange noises, you need to get data off it NOW. The reasons for the noises are explained in the excellent set of pro-level vids below by Scott Moulton - view them on YouTube for continuous play of the parts - fascinating on the tech workings:

Noisy drives could fail completely at any moment. (You have multiple backups of everything important of course, so not a big deal eh...???? EH????!!!!).

And I mean multiple "different media" backups (local and off site) - i.e. no single point of failure, which is what the annoying tech dick "linus" left himself wide open to with total reliance on ridiculous "nerd" kit with only 1 motherboard, for the sake of a super tech kit buzz...KISS: keep it simple, stupid!

Corrosion in electronic circuits is often overlooked - it is a simple cause of some problems, especially on older second-hand, boot-sale type drives like the ones I pick up occasionally.

The logic board passes power and data to the drive through connectors that barely touch, in surface area terms, so even slight corrosion matters. It can be cleaned off with some IPA/contact cleaner, a cotton bud, a pencil eraser - or a hard scraper if bad. Not all drives have non-corroding, gold-plated connectors, and even then dirt/dust can get in. Bad corrosion on the power pins may stop the drive spinning up.

Removing the screws using a Torx no. 8 driver allows the board to be lifted:

Surface corrosion seemed like holes in the data contacts until lightly scraped off using the driver end:

Also run it over the sharp pins on the drive side that dig in to the contacts:

Now you have optimised the chances of drive life for the electrical contact aspect of the drive - but whether the physical internals/electronics/logical data still function fully is another matter...

If the disk is attached and spins up but is not seen, all you can try is looking for a device ID in the syslog.

My drive below still has major problems despite the corrosion cleaning - its logic chips may have become locked and garbled, or the platters are corrupt, judging by the weird info coming out:

{THIS IS THE START OF THE ABOVE MENTIONED ASSUMPTION ERRORS SECTION I MADE USING THE USB GIZMO and not using a direct to mboard connection}

tail -f /var/log/syslog

Feb 6 17:48:36 hpmint kernel: [ 2489.581086] sd 5:0:0:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 6 17:48:36 hpmint kernel: [ 2489.581092] sd 5:0:0:0: [sdc] Sense Key : Hardware Error [current]
Feb 6 17:48:36 hpmint kernel: [ 2489.581097] sd 5:0:0:0: [sdc] Add. Sense: Logical unit communication CRC error (Ultra-DMA/32)
Feb 6 17:48:36 hpmint kernel: [ 2489.581101] sd 5:0:0:0: [sdc] CDB:
Feb 6 17:48:36 hpmint kernel: [ 2489.581104] Read(16): 88 00 00 00 40 00 52 a1 de 98 00 00 00 08 00 00
Feb 6 17:48:36 hpmint kernel: [ 2489.581123] blk_update_request: I/O error, dev sdc, sector 70370130517656
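The CDB bytes in that log can be decoded by hand to see what the kernel was actually asking for. A minimal Python sketch, assuming the standard SCSI READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13); the function name decode_read16 is made up for illustration:

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),      # starting sector
        "blocks": int.from_bytes(cdb[10:14], "big"),  # sectors requested
    }

# CDB copied from the syslog line above
info = decode_read16("88 00 00 00 40 00 52 a1 de 98 00 00 00 08 00 00")
print(info["lba"], info["blocks"])
print(hex(info["lba"]))
```

The decoded LBA is the same number as the sector in the I/O error line, so the request was for 8 sectors at an address far beyond anything a 160GB drive holds. That points at a bogus size being reported for the drive rather than at one particular bad sector.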

Gparted can't see it, fdisk can't write it, and parted gets write issues part way through creating a new partition:

(parted) help
  align-check TYPE N                        check partition N for TYPE(min|opt) alignment
  check NUMBER                              do a simple check on the file system
  cp [FROM-DEVICE] FROM-NUMBER TO-NUMBER    copy file system to another partition
  help [COMMAND]                            print general help, or help on COMMAND
  mklabel,mktable LABEL-TYPE                create a new disklabel (partition table)
  mkfs NUMBER FS-TYPE                       make a FS-TYPE file system on partition NUMBER
  mkpart PART-TYPE [FS-TYPE] START END      make a partition
  mkpartfs PART-TYPE FS-TYPE START END      make a partition with a file system
  resizepart NUMBER END                     resize partition NUMBER
  move NUMBER START END                     move partition NUMBER
  name NUMBER NAME                          name partition NUMBER as NAME
  print [devices|free|list,all|NUMBER]      display the partition table, available devices, free space, all found partitions, or a particular partition
  quit                                      exit program
  rescue START END                          rescue a lost partition near START and END
  resize NUMBER START END                   resize partition NUMBER and its file system
  rm NUMBER                                 delete partition NUMBER
  select DEVICE                             choose the device to edit
  set NUMBER FLAG STATE                     change the FLAG on partition NUMBER
  toggle [NUMBER [FLAG]]                    toggle the state of FLAG on partition NUMBER
  unit UNIT                                 set the default unit to UNIT
  version                                   display the version number and copyright information of GNU Parted

(parted) mklabel msdos
Error: Input/output error during write on /dev/sdc
Retry/Ignore/Cancel?

(parted) align-check min 1
Error: Partition doesn't exist.

(parted) mktable gpt
Warning: The existing disk label on /dev/sdc will be destroyed and all data on
this disk will be lost. Do you want to continue?
Yes/No? Yes

Error: Input/output error during write on /dev/sdc
Retry/Ignore/Cancel?

It's a 160GB IDE drive that has no partition or size info available - this is trial and error stuff...but I think the drive is truly gone, logically.

(parted) resize 1 0 160000
WARNING: you are attempting to use parted to operate on (resize) a file system.
parted's file system manipulation code is not as robust as what you'll find in
dedicated, file-system-specific packages like e2fsprogs. We recommend
you use parted only to manipulate partition tables, whenever possible.
Support for performing most operations on most types of file systems
will be removed in an upcoming release.
Error: /dev/sdc: unrecognised disk label

Wow! I have a supersize "36TB" quantum universe drive! (fdisk below actually reports 36029.5 TB, which is about 36 petabytes.)

sudo fdisk /dev/sdc

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x10eeb217.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: The size of this disk is 36029.5 TB (36029506825052160 bytes).
DOS partition table format can not be used on drives for volumes
larger than (2199023255040 bytes) for 512-byte sectors. Use parted(1) and GUID
partition table format (GPT).
Command (m for help): m

Command action
a toggle a bootable flag
b edit bsd disklabel
c toggle the DOS compatibility flag
d delete a partition
l list known partition types
m print this menu
n add a new partition
o create a new empty DOS partition table
p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition's system id
u change display/entry units
v verify the partition table
w write table to disk and exit
x extra functionality (experts only)

Command (m for help): x

Expert command (m for help): m
Command action
b move beginning of data in a partition
c change number of cylinders
d print the raw data in the partition table
e list extended partitions
f fix partition order
g create an IRIX (SGI) partition table
h change number of heads
i change the disk identifier
m print this menu
p print the partition table
q quit without saving changes
r return to main menu
s change number of sectors/track
v verify the partition table
w write table to disk and exit

Expert command (m for help): p

Disk /dev/sdc: 255 heads, 63 sectors, -1 cylinders

Nr AF Hd Sec Cyl Hd Sec Cyl Start Size ID
1 00 0 0 0 0 0 0 0 0 00
2 00 0 0 0 0 0 0 0 0 00
3 00 0 0 0 0 0 0 0 0 00
4 00 0 0 0 0 0 0 0 0 00

-1 cylinders doesn't sound good eh...?
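The numbers fdisk printed can be sanity-checked with a little arithmetic. A minimal Python sketch, assuming 512-byte sectors as fdisk's own warning does; the variable names are mine, and the byte count is copied from the output above:

```python
SECTOR = 512                          # bytes per sector, as in fdisk's warning
reported_bytes = 36029506825052160    # size fdisk printed for /dev/sdc

# Largest size a DOS (MBR) table can describe: sector counts are 32-bit
mbr_limit = (2**32 - 1) * SECTOR

# One cylinder in the 255-head, 63-sector geometry fdisk shows
cylinder_bytes = 255 * 63 * SECTOR
cylinders = reported_bytes // cylinder_bytes

print(mbr_limit)                      # compare with the limit in fdisk's warning
print(reported_bytes // SECTOR)       # reported size in sectors
print(cylinder_bytes, cylinders)
print(round(reported_bytes / 10**15, 1), "PB")
```

The MBR limit comes out at exactly the 2199023255040 bytes quoted in the warning, and the reported size needs over 4 billion cylinders, which does not fit in a signed 32-bit number. That may be why fdisk gives up and prints -1, though that part is my guess.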

Expert command (m for help): m
Command action
b move beginning of data in a partition
c change number of cylinders
d print the raw data in the partition table
e list extended partitions
f fix partition order
g create an IRIX (SGI) partition table
h change number of heads
i change the disk identifier
m print this menu
p print the partition table
q quit without saving changes
r return to main menu
s change number of sectors/track
v verify the partition table
w write table to disk and exit

Expert command (m for help): c
Number of cylinders (1-1048576):

That's way too many cylinders for a default, surely?! A good way for a tech to wipe data from a drive though..if that's the reason..

These Seagates don't have the CHS values printed on them, but a 500GB WD drive I have shows values of only 16383/16/63...

I'll have to look up the specs online then see if I can reset the correct values. Now I'm suspecting this disk was deliberately wiped to be unrecoverable.

http://www.seagate.com/staticfiles/support/disc/manuals/ce/DB35%20Series/DB35.3%20Series/100439554f.pdf

Default sectors per track 63
Default read/write heads 16

Default cylinders 16,383

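Hmm...same as the WD 500GB spec...how can that be? It turns out 16383/16/63 is just the legacy CHS ceiling that every larger ATA drive reports; the real capacity is addressed by LBA, so identical CHS values say nothing about the size of the drive.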
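The arithmetic behind the 16383/16/63 figure is quick to check. A minimal Python sketch, assuming 512-byte sectors (the function name is just for illustration):

```python
def chs_capacity_bytes(cylinders: int, heads: int, sectors: int,
                       sector_size: int = 512) -> int:
    """Bytes addressable by a cylinders/heads/sectors-per-track geometry."""
    return cylinders * heads * sectors * sector_size

legacy = chs_capacity_bytes(16383, 16, 63)
print(legacy)                       # bytes addressable through 16383/16/63
print(round(legacy / 10**9, 2))     # the same in decimal GB
```

That is only about 8.4GB, so neither a 160GB nor a 500GB drive fits inside it. The CHS values in the spec sheet are a placeholder, and the usable size comes from the LBA sector count instead.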

Expert command (m for help): c
Number of cylinders (1-1048576): 16383

Expert command (m for help): h
Number of heads (1-256, default 255): 16

Expert command (m for help): s
Number of sectors (1-63, default 63):
Using default value 63

Expert command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Best to reboot and see what it reads like now...Nope. Stayed as it was:

Feb 6 19:26:15 hpmint kernel: [ 95.035392] sd 5:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).

Ah, well, was a learning experience! But now you know how to really wreck drives with fdisk by changing CHS values if you need...

As fdisk had ruled out a DOS label at this size, a Sun label was the only option I tried.

This sums it up:

stevee@hpmint ~ $ sudo fdisk /dev/sdc
Detected sun disklabel with wrong sanity

cfdisk can't make sense of it either - seems that 36TB value is locked in there!

What about sfdisk? I've never used it...seems it knows when things aren't right at least:

But what values to use? Clues from Testdisk:

http://www.cgsecurity.org/wiki/Menu_Geometry

How to find the correct number of heads?

If the HD geometry mismatches the geometry used when creating the partition table, warning messages such as: Bad sector count, Bad relative sector or Bad ending head are displayed when Analyse is selected from the main menu. If you see such errors, you may need to use the Geometry menu to change the logical number of heads. Try 255, 16, 32, 64, 128 and 240 heads until TestDisk finds all your partitions. 255 and 240 are the most common head values. If you installed Linux as the only OS on your hard drive, it tends to default to only 16 heads.

How to find the correct number of sectors?

Usually the number of sectors per head is 63, but on some USB devices, the value 32 can be found.
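Why the head count matters so much: an old-style partition table entry stores CHS addresses, and the same CHS triple points at a different sector for every assumed geometry. A minimal Python sketch of the classic CHS to LBA mapping (this is the textbook formula, not TestDisk's own code; the function name is hypothetical):

```python
def chs_to_lba(c: int, h: int, s: int, heads: int,
               sectors_per_track: int = 63) -> int:
    """Classic CHS -> LBA mapping; sectors are numbered from 1."""
    return (c * heads + h) * sectors_per_track + (s - 1)

# The same CHS address (cylinder 1, head 0, sector 1) under each head count
# from the TestDisk list above
for heads in (255, 16, 32, 64, 128, 240):
    print(heads, chs_to_lba(1, 0, 1, heads))
```

With the wrong head count every partition boundary is looked for in the wrong place, which fits the "Bad relative sector" and "Bad ending head" warnings TestDisk describes.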

Read the sfdisk man page...

Seems that, at a minimum, you can use the format

0 + L

for "start, max size, and Linux part type" to write a new partition table. The old values the kernel reads from this drive still persist, yet sfdisk creates a seemingly valid PT, which can then be formatted as ext4 to create an even bigger drive!! Of course, it fails to mount or be read...the logic is so screwed, but how can the Linux kernel allow this to appear valid??

Here's the "valid" output from the PT creation then format with ext4:

stevee@hpmint ~ $ sudo sfdisk /dev/sdc
[sudo] password for stevee:
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdc: 4380338034 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an MSDOS signature
/dev/sdc: unrecognised partition table type
Old situation:
No partitions found
Input in the following format; absent fields get a default value.
<start> <size> <type [E,S,L,X,hex]> <bootable [-,*]> <c,h,s> <c,h,s>
Usually you only need to specify <start> and <size> (and perhaps <type>).

/dev/sdc1 :0 + L
/dev/sdc1 0+ 4380338033 4380338034- 35185065258104+ 83 Linux
/dev/sdc2 :
/dev/sdc2 0 - 0 0 0 Empty
/dev/sdc3 :
/dev/sdc3 0 - 0 0 0 Empty
/dev/sdc4 :
/dev/sdc4 0 - 0 0 0 Empty
New situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sdc1 0+ 4380338033 4380338034- 35185065258104+ 83 Linux
/dev/sdc2 0 - 0 0 0 Empty
/dev/sdc3 0 - 0 0 0 Empty
/dev/sdc4 0 - 0 0 0 Empty
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Do you want to write this to disk? [ynq] y
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

mkfs mkfs.ext3 mkfs.hfs mkfs.msdos mkfs.xfs
stevee@hpmint ~ $ sudo mkfs.ext4 /dev/sdc1
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
9773056 inodes, 39072542 blocks
1953627 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1193 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
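It is worth working out how big a filesystem mke2fs actually built from the figures it printed. A minimal Python sketch that reads the block size and block count out of the output (the function name and the trimmed sample string are mine; the numbers are copied from the transcript above):

```python
import re

def fs_size_bytes(mke2fs_output: str) -> int:
    """Filesystem size = block count x block size, both read from mke2fs output."""
    block_size = int(re.search(r"Block size=(\d+)", mke2fs_output).group(1))
    blocks = int(re.search(r"(\d+) inodes, (\d+) blocks", mke2fs_output).group(2))
    return blocks * block_size

sample = """Block size=4096 (log=2)
9773056 inodes, 39072542 blocks
Maximum filesystem blocks=4294967296"""

size = fs_size_bytes(sample)
print(size, round(size / 10**9, 2))   # bytes, then decimal GB
```

By that reckoning the filesystem itself is about 160GB, whatever the partition table claims. As far as I understand it, the "Maximum filesystem blocks" line is the ceiling the filesystem could later be grown to rather than the size just created; the same figure also appears in the GParted log for the 320GB drive further down.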

Just as I really got fed up:

Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1):
Using default value 1
First sector (2048-4294967295, default 2048): 0
Value out of range.
First sector (2048-4294967295, default 2048): 2048
Last sector, +sectors or +size{K,M,G} (2048-4294967294, default 4294967294): +160G

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Then, with not so weird numbers and no other partitions, I saw, close enough to 160GB:

but it all went stupid again with the FS creation...

sudo mkfs.ext4 /dev/sdc1

Maximum filesystem blocks=4294967296

Time for the bin...

Just as well I'm stubborn...I dug out a spare PC with real IDE connectors on the mboard, and...with a bit of fdisking, it was all resolved! It helps to run the BIOS SMART tests before even getting to Linux to know what to expect.

All of these drives responded to the above commands when attached to mboards, giving ordinary errors rather than very weird results. sfdisk worked great for writing totally new partitions on the 2 drives that had shown "weird", unexpected CHS values via the USB gizmo, values that could not be overwritten through the gizmo. Just start at 0 (or the number that sfdisk expects), use the max disk size (+) and a Linux partition type (L):

/dev/sdd1: 0 + L

Once the partition is created, run mkfs, then check it with fsck and reboot:

After the reboot, check with Disks for SMART and benchmark - done:

The WD320 SATA that caused big read problems via the USB gizmo has 1452 bad sectors:

It reads correctly now as 320GB and can mount.

A final step now is running badblocks in its read/write test mode (-w, which is destructive and overwrites everything on the target) to clean up any bad sectors if possible - for that initial "36TB" Seagate 160GB IDE drive:

sudo badblocks -vvvw /dev/sda1

And finally for the WD320GB SATA with the buffer I/O errors and 1452 bad sectors:

Badblocks first writes a hex pattern of 0xaa (10 x 16 + 10 = 170, or 10101010 in binary) across the device, then reads it back to verify in the 2nd stage. Whether the drive can pass the failed SMART tests after the badblocks run remains to be seen...
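For reference, the badblocks man page lists four test patterns for -w mode: 0xaa, 0x55, 0xff and 0x00. A small Python sketch to show them as bits (the tuple name is mine):

```python
# Patterns badblocks writes in -w mode, in order (per its man page)
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)

for p in PATTERNS:
    print(f"0x{p:02x} = {p:3d} = {p:08b}")

# 0xaa and 0x55 are bitwise complements, so between them every bit
# position gets written and read back as both a 1 and a 0
assert 0xAA ^ 0x55 == 0xFF
```

Each pattern is a full write pass followed by a full read pass, which is why a complete run takes several times longer than the first 0xaa stage.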

For a 160GB IDE drive, stage 0xaa alone takes about 3 hrs, before further patterns are tested...yawn...7hrs ish to complete, but both 80 and 160GB drives were fine in all respects after:

The WD320 still failed after 30 hrs of badblocks. No bad blocks were shown, but a new partition still cannot be created and formatted on this disk, so overall it is a fail, even though it is not completely dead.

Gparted's saved web (HTML) report summarises the fails:

GParted 0.18.0 --enable-libparted-dmraid --enable-online-resize

Libparted 2.3

Create Primary Partition #1 (ext4, 298.09 GiB) on /dev/sdc  00:00:07    ( ERROR )
calibrate New Partition #1  00:00:00    ( SUCCESS )
path: /dev/sdc-1
start: 63
end: 625137344
size: 625137282 (298.09 GiB)
create empty partition  00:00:00    ( SUCCESS )
path: /dev/sdc1
start: 2048
end: 625141759
size: 625139712 (298.09 GiB)
clear old file system signatures in /dev/sdc1  00:00:00    ( SUCCESS )
write 68.00 KiB of zeros at byte offset 0  00:00:00    ( SUCCESS )
write 4.00 KiB of zeros at byte offset 67108864  00:00:00    ( SUCCESS )
write 4.00 KiB of zeros at byte offset 274877906944  00:00:00    ( SUCCESS )
write 4.00 KiB of zeros at byte offset 320071528448  00:00:00    ( SUCCESS )
flush operating system cache of /dev/sdc  00:00:00    ( SUCCESS )
set partition type on /dev/sdc1  00:00:00    ( SUCCESS )
new partition type: ext4
create new ext4 file system  00:00:07    ( ERROR )
mkfs.ext4 -L "" /dev/sdc1
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
19537920 inodes, 78142464 blocks
3907123 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
2385 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: 26/2385
mke2fs 1.42.9 (4-Feb-2014)

Warning, had trouble writing out superblocks.
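The sizes in that log are at least consistent with each other, which can be checked quickly. A minimal Python sketch, assuming 512-byte sectors (the log only gives counts; the names are mine):

```python
SECTOR = 512  # bytes per sector; an assumption, the log gives sizes in sectors

def sectors_to_gib(sectors: int, sector_size: int = SECTOR) -> float:
    """Convert a sector count to binary gigabytes (GiB)."""
    return sectors * sector_size / 2**30

partition_sectors = 625139712      # "size" of /dev/sdc1 in the log
fs_bytes = 78142464 * 4096         # blocks x block size from mkfs.ext4

print(round(sectors_to_gib(partition_sectors), 2))   # GiB, compare with the log
print(partition_sectors * SECTOR == fs_bytes)        # partition and fs agree?
print(round(26 / 2385 * 100, 1))                     # % of block groups reached
```

So the partition and filesystem sizes are sane for a 320GB drive this time, and the failure comes very early in the superblock write, at group 26 of 2385. That looks more like the disk refusing writes than like a geometry problem.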
