| author | Vito Graffagnino <vito@graffagnino.xyz> | 2020-09-08 18:10:49 +0100 |
|---|---|---|
| committer | Vito Graffagnino <vito@graffagnino.xyz> | 2020-09-08 18:10:49 +0100 |
| commit | 3b0142cedcde39e4c2097ecd916a870a3ced5ec6 (patch) | |
| tree | 2116c49a845dfc0945778f2aa3e2118d72be428b /vimwiki/Replacing A Failed Disk in a mdadm RAID.md | |
| parent | 8cc927e930d5b6aafe3e9862a61e81705479a1b4 (diff) | |
Added the relevant parts of the .config directory. Also add ssh config
Diffstat (limited to 'vimwiki/Replacing A Failed Disk in a mdadm RAID.md')
| -rw-r--r-- | vimwiki/Replacing A Failed Disk in a mdadm RAID.md | 63 |
1 files changed, 63 insertions, 0 deletions
diff --git a/vimwiki/Replacing A Failed Disk in a mdadm RAID.md b/vimwiki/Replacing A Failed Disk in a mdadm RAID.md
new file mode 100644
index 0000000..8f80365
--- /dev/null
+++ b/vimwiki/Replacing A Failed Disk in a mdadm RAID.md
@@ -0,0 +1,63 @@
+
+If disk errors are reported, there may be hardware problems with the disk. Check dmesg for errors of the following type:
+
+`[737961.360080] raid5_end_read_request: 64 callbacks suppressed`
+`[737961.360087] md/raid:md125: read error corrected (8 sectors at 2722701256 on sdc1)`
+`[737961.360093] md/raid:md125: read error corrected (8 sectors at 2722701264 on sdc1)`
+`[737961.360095] md/raid:md125: read error corrected (8 sectors at 2722701272 on sdc1)`
+`[737961.360098] md/raid:md125: read error corrected (8 sectors at 2722701280 on sdc1)`
+`[737961.360100] md/raid:md125: read error corrected (8 sectors at 2722701288 on sdc1)`
+`[737961.360102] md/raid:md125: read error corrected (8 sectors at 2722701296 on sdc1)`
+`[737961.360105] md/raid:md125: read error corrected (8 sectors at 2722701304 on sdc1)`
+`[737961.360107] md/raid:md125: read error corrected (8 sectors at 2722701312 on sdc1)`
+`[737961.360109] md/raid:md125: read error corrected (8 sectors at 2722701320 on sdc1)`
+`[737961.360112] md/raid:md125: read error corrected (8 sectors at 2722701328 on sdc1)`
+`[742462.760119] md: md125: data-check done.`
+
+Use SMART to investigate the hard drive.
+
+`$ smartctl -i /dev/sdc`
+
+The drive can be tested with the following command:
+
+`$ smartctl -t long /dev/sdc`
+
+The long test will take a while; a short test (`-t short`) is also available.
+The results can be viewed using:
+
+`$ smartctl -l selftest /dev/sdc`
+` `
+`smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.36.3.el7.x86_64] (local build)`
+`Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org`
+``
+`=== START OF READ SMART DATA SECTION ===`
+`SMART Self-test log structure revision number 1`
+`Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error`
+`# 1 Extended offline Completed: read failure 40% 21930 2722703304`
+
+The read failure means the drive needs to be replaced. To identify the physical drive, use hdparm to get its serial number.
+
+`$ hdparm -i /dev/sdc | grep SerialNo`
+`Model=ST2000DM001-1ER164, FwRev=CC27, SerialNo=Z4Z5QAY5`
+
+Before shutting down and replacing the drive, use mdadm to mark it as failed so that it can
+be removed from the RAID.
+
+`$ mdadm --manage /dev/md125 --fail /dev/sdc1`
+`$ mdadm --manage /dev/md125 --remove /dev/sdc1`
+
+Before the old drive is removed, the partition table can be dumped using:
+
+`$ sfdisk -d /dev/sdc > sdc.out`
+
+Once the new drive has been swapped in, the old partition table can be restored onto it:
+
+`$ sfdisk /dev/sdc < sdc.out`
+
+The new disk is now ready to be included in the RAID:
+
+`$ mdadm --manage /dev/md125 --add /dev/sdc1`
+
+Finally, monitor the progress of the rebuild using:
+
+`$ cat /proc/mdstat`
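The two checks in the procedure that involve reading output by eye (the SMART self-test log and the rebuild progress) could be scripted. The following is only a sketch and not part of the commit: the function names `selftest_failed` and `rebuild_progress` are hypothetical, and the patterns assume the `smartctl -l selftest` log format shown in the note and the usual `recovery = N%` progress line in `/proc/mdstat`.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical and are not part of
# smartmontools or mdadm.

# Reads `smartctl -l selftest` output on stdin.
# Exit status 0 if any logged self-test completed with a failure.
selftest_failed() {
    grep -Eq '^# *[0-9]+ .*Completed: .*failure'
}

# Reads /proc/mdstat-style text on stdin.
# Prints the recovery/resync percentage of each array being rebuilt.
rebuild_progress() {
    grep -Eo '(recovery|resync) *= *[0-9.]+%' | grep -Eo '[0-9.]+%'
}

# Typical use (needs root and real devices, so left commented out):
#   smartctl -l selftest /dev/sdc | selftest_failed && echo "replace /dev/sdc"
#   rebuild_progress < /proc/mdstat
```

Both helpers read from stdin rather than calling the tools themselves, so they can be tried against saved output before being pointed at a live array.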
