Restoring data from a broken disk

example: #

broken disk: /dev/sdb
- was mounted on /mnt/faulty
external usb disk: /dev/sde
- formatted in ext4 and mounted on /mnt/recovery_8TB

what happened: #

the system started reporting errors on a non-redundant disk:

Device: /dev/sdb [SAT], 1025 Currently unreadable (pending) sectors
Device: /dev/sdb [SAT], 255 Offline uncorrectable sectors
Device: /dev/sdb [SAT], ATA error count increased from 306 to 903
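
to check the same counters by hand, the SMART attribute table can be filtered for the sector-related entries. this is a minimal sketch, assuming smartmontools is installed and that the drive reports the usual attribute names (Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable); the helper name smart_bad_sectors is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# name=raw_value for the attributes that usually point at failing sectors.
# Assumes the standard 10-column attribute table (RAW_VALUE is column 10).
smart_bad_sectors() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2 "=" $10
    }'
}

# Typical use (needs root; /dev/sdb is the faulty disk from this page):
#   smartctl -A /dev/sdb | smart_bad_sectors
```

some drives use different attribute names or do not expose all of them, so an empty result does not prove the disk is healthy.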

trying to rsync the files to another disk resulted in read errors and failed transfers

data recovery with ddrescue: #

ddrescue can copy the faulty disk to an image file (or to another disk), with options that control how aggressively it works on the unreadable areas

we want to create a full disk image on the first pass without adding too much stress to an already faulty drive,
so we'll tune our ddrescue settings to reflect that, for example:

ddrescue --idirect --no-scrape /dev/sdx sdx.img sdx.log

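the options and arguments are:

--idirect: use direct access for the input device, bypassing the os caches
--no-scrape: skip the scraping phase, so ddrescue does not grind through the bad areas sector by sector (less stress on the disk)
/dev/sdx: the faulty device
sdx.img: target disk image
sdx.log: mapfile (log) that records which areas were rescued and which failed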

the sdx.log file is very important because it maps the bad parts of the disk, so in a second pass we can avoid rescanning the whole drive and just insist on the faulty parts
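
to see what the mapfile currently records, ddrescue ships a companion tool (ddrescuelog, whose -t option should print a status summary). the sketch below does a similar sum by hand, assuming the usual mapfile layout: comment lines starting with "#", one current-position line, then "pos size status" lines with hexadecimal values. the function name mapfile_summary is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: sum the bytes per status in a ddrescue mapfile.
# Status characters: ? non-tried, * non-trimmed, / non-scraped,
#                    - bad sector, + finished (rescued).
mapfile_summary() (
    nontried=0 nontrimmed=0 nonscraped=0 bad=0 finished=0
    first=1
    while read -r pos size status rest; do
        # skip blank lines and comments
        case "$pos" in
            ''|'#'*) continue ;;
        esac
        # the first data line is the current position/status, not a block
        if [ "$first" = 1 ]; then first=0; continue; fi
        # sizes are hex (0x...), which shell arithmetic understands
        case "$status" in
            '?') nontried=$((nontried + $size)) ;;
            '*') nontrimmed=$((nontrimmed + $size)) ;;
            '/') nonscraped=$((nonscraped + $size)) ;;
            '-') bad=$((bad + $size)) ;;
            '+') finished=$((finished + $size)) ;;
        esac
    done < "$1"
    printf 'non-tried=%d non-trimmed=%d non-scraped=%d bad-sector=%d finished=%d\n' \
        "$nontried" "$nontrimmed" "$nonscraped" "$bad" "$finished"
)

# Typical use:
#   mapfile_summary sdx.log
```

all values are in bytes; anything left in non-tried, non-trimmed or non-scraped is what a later pass can still work on.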

in the subsequent passes we can omit "--no-scrape" since we already have a full disk image and we want to recover as much as we can, even at the risk of breaking the drive:

ddrescue --idirect -r3 /dev/sdx sdx.img sdx.log

the only difference from the first pass is that "--no-scrape" was removed and "-r3" was added, which sets the number of retry passes ddrescue makes over the bad sectors

actual data recovery: #

unmount the faulty disk first:

umount /dev/sdb

we are working on the external usb drive, at /mnt/recovery_8TB, so the image and the log file are written there:

first pass:

ddrescue --idirect --no-scrape /dev/sdb sdb.img sdb.log

output:

Current status
     ipos:   60025 MB, non-trimmed:   131072 B,  current rate:    166 MB/s
     opos:   60025 MB, non-scraped:        0 B,  average rate:  48489 kB/s
     ipos:    1781 GB, non-trimmed:        0 B,  current rate:   13312 B/s
     opos:    1781 GB, non-scraped:    1213 kB,  average rate:  72448 kB/s
non-tried:        0 B,  bad-sector:    93184 B,    error rate:     170 B/s
  rescued:    2000 GB,   bad areas:      145,        run time:  7h 39m 25s
pct rescued:   99.99%, read errors:      306,  remaining time:      1m 35s
                              time since last successful read:          0s
Finished

second pass:

ddrescue --idirect -r3 /dev/sdb sdb.img sdb.log

output:

Current status
     ipos:    1781 GB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:    1781 GB, non-scraped:        0 B,  average rate:      79 B/s
non-tried:        0 B,  bad-sector:   524800 B,    error rate:     170 B/s
  rescued:    2000 GB,   bad areas:      182,        run time:  2h 44m 23s
pct rescued:   99.99%, read errors:     3924,  remaining time:         n/a
                              time since last successful read:     30m 35s
Finished
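
the bad-sector figures are easier to read as sector counts. a small worked example, assuming 512-byte logical sectors (the sector size is an assumption, it does not appear in the output above):

```shell
#!/bin/sh
# bad-sector bytes reported by ddrescue, divided by an assumed sector size
sector_size=512
echo "first pass: $((93184 / sector_size)) sectors"    # 182
echo "second pass: $((524800 / sector_size)) sectors"  # 1025
```

under that assumption the second pass ends with 1025 bad sectors, which lines up with the 1025 pending sectors reported at the start. the number presumably grew between the passes because the second pass scraped the 1213 kB that the first pass left as non-scraped, and part of it turned out to be unreadable.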

transferring recovered data: #

now we can copy the data from the image file we created to another location;
to do that we first need to mount the image file

detect partitions on the image file:

kpartx -av sdb.img

check which loop device and partition mapping were created:

root@machine:/mnt/recovery_8TB# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0       7:0    0   1,8T  0 loop  
`-loop0p1 253:0    0   1,8T  0 part

mount the partition from the image in read-only mode:

mount -o loop,ro /dev/mapper/loop0p1 /mnt/recovery_8TB/restored_volume

rsync the data somewhere else:

rsync -avh --progress /mnt/recovery_8TB/restored_volume/ /mnt/recovery_8TB/recovered_data/
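
once the copy is done (and checked), the image can be released again. a minimal sketch, assuming the image was attached with kpartx as shown above; the helper names run and release_image and the DRY_RUN variable are made up for this example:

```shell
#!/bin/sh
# Hypothetical helpers: unmount the restored volume and remove the
# partition mappings that "kpartx -av" created for the image.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

release_image() {
    # $1: mountpoint of the restored volume, $2: image file
    run umount "$1" &&
    run kpartx -dv "$2"
}

# Typical use (needs root):
#   release_image /mnt/recovery_8TB/restored_volume sdb.img
```

running it with DRY_RUN=1 first is a cheap way to double check the paths before anything is unmounted.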

compare data: #

tree -pugsDx -o tree_faulty.txt -a /mnt/faulty/

tree -pugsDx -o tree_restored.txt -a /mnt/recovery_8TB/recovered_data/

diff tree_*
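
the tree listings compare names, sizes, owners and dates, but not file contents. a possible extra check is to compare checksums of the mounted image against the rsync copy, which avoids reading the faulty disk again. this is a sketch, assuming GNU find, sort, xargs and sha256sum; the function names manifest and compare_trees are made up for this example:

```shell
#!/bin/sh
# Hypothetical helpers: build a checksum list per directory and diff them.
manifest() {
    # print "checksum  ./relative/path" for every regular file under $1,
    # sorted by path so two manifests can be compared line by line
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

compare_trees() {
    # $1 and $2: directories to compare; returns 0 when contents match
    a=$(mktemp) b=$(mktemp)
    manifest "$1" > "$a"
    manifest "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Typical use:
#   compare_trees /mnt/recovery_8TB/restored_volume /mnt/recovery_8TB/recovered_data
```

no output and exit status 0 mean that every file has the same checksum in both trees; this reads all the data twice, so it can take a while on a 2 TB volume.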

tree options used above:

p: print the protections (permissions) for each file
u: print the file owner or UID number
g: print the file group owner or GID number
s: print the size in bytes of each file
D: print the date of last modification
x: do not traverse into other filesystems
a: include hidden files
o: write the output to the given file
