Disk dropping out of the array

Hello,

Today a disk dropped out of my array yet again, so I am asking for help interpreting what is going on.
After the last rebuild I thought everything would be fine, but today even SMART was reluctant to respond:

Quote:

[root@server ~]# smartctl -a /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
[root@server ~]# smartctl -T permissive -a /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Short INQUIRY response, skip product id
SMART Health Status: OK
Read defect list: asked for grown list but didn't get it

Error Counter logging not supported
Device does not support Self Test logging


I booted the server into rescue mode, and the graphical mode showed the sda disk on a red background. I could not find any description of whether that means anything, but I have never seen a disk highlighted this way before.
I did manage to display the SMART data, though:

Quote:

smartctl 5.40 2010-02-03 r3060 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDS723020BLA642
Serial Number: MN5220F31MKPXK
Firmware Version: MN6OA5C0
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Sep 27 01:26:53 2011 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (20377) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   096   096   016    Pre-fail  Always       -       327683
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       90
  3 Spin_Up_Time            0x0007   152   152   024    Pre-fail  Always       -       406 (Average 347)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000b   089   089   067    Pre-fail  Always       -       12
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1397
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       264
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       264
194 Temperature_Celsius     0x0002   171   171   000    Old_age   Always       -       35 (Lifetime Min/Max 21/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       15
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       252         -
# 2  Short offline       Completed without error       00%         6         -
# 3  Short offline       Completed without error       00%         4         -
# 4  Short offline       Completed without error       00%         4         -
# 5  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


After a short self-test it looked like this:

Quote:

smartctl 5.39.1 2010-01-28 r3054 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDS723020BLA642
Serial Number: MN5220F31MKPXK
Firmware Version: MN6OA5C0
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Sep 27 02:34:01 2011 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x85) Offline data collection activity
was aborted by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (20377) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       90
  3 Spin_Up_Time            0x0007   152   152   024    Pre-fail  Always       -       406 (Average 347)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1398
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       264
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       264
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Lifetime Min/Max 21/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       15
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1397         -
# 2  Short offline       Completed without error       00%      1397         -
# 3  Short offline       Completed without error       00%      1397         -
# 4  Short offline       Completed without error       00%      1397         -
# 5  Short offline       Completed without error       00%       252         -
# 6  Short offline       Completed without error       00%         6         -
# 7  Short offline       Completed without error       00%         4         -
# 8  Short offline       Completed without error       00%         4         -
# 9  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


For comparison, here is sdb:

Quote:

smartctl 5.40 2010-02-03 r3060 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: Hitachi HDS723020BLA642
Serial Number: MN5220F31SMU6K
Firmware Version: MN6OA5C0
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Sep 27 01:26:57 2011 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (20377) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       96
  3 Spin_Up_Time            0x0007   152   152   024    Pre-fail  Always       -       406 (Average 349)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1397
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       15
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       15
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Lifetime Min/Max 21/41)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1394         -
# 2  Short offline       Completed without error       00%       252         -
# 3  Short offline       Completed without error       00%         6         -
# 4  Short offline       Completed without error       00%         4         -
# 5  Short offline       Completed without error       00%         4         -
# 6  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
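
The sda and sdb attribute tables above can also be compared mechanically instead of by eye. Below is a minimal Python sketch of that comparison, not anything smartctl provides: the names `parse_attributes` and `diff_raw` are made up for illustration, and the sample text holds only a few rows taken from the two outputs above. It assumes the attribute row layout shown in this post and keeps only the leading integer of the RAW_VALUE column.

```python
import re

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED,
# WHEN_FAILED, then the raw value (only its leading integer is captured).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} for every line that matches ROW."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(3))
    return attrs


def diff_raw(a, b):
    """Return {name: (raw_a, raw_b)} for attributes whose raw values differ."""
    return {name: (a[name], b[name]) for name in a if name in b and a[name] != b[name]}


# A few rows from the sda output in this post (hypothetical variable name).
SDA = """
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 6
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 264
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 15
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
"""

# The same rows from the sdb output in this post.
SDB = """
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 15
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
"""

differences = diff_raw(parse_attributes(SDA), parse_attributes(SDB))
for name, (sda_raw, sdb_raw) in differences.items():
    print(f"{name}: sda={sda_raw} sdb={sdb_raw}")
```

For these rows the sketch reports the reallocation counters (6 sectors and 15 events on sda versus 0 on sdb) and the Power-Off_Retract_Count (264 versus 15); Current_Pending_Sector is 0 on both disks and is therefore not listed.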


In addition, I have this in the logs:

Quote:

Sep 25 23:52:16 ns383655 kernel: sd 0:0:0:0: [sda] Unhandled error code
Sep 25 23:52:16 ns383655 kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 25 23:52:16 ns383655 kernel: sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 e2 c5 c7 02 00 00 08 00
Sep 25 23:52:16 ns383655 kernel: end_request: I/O error, dev sda, sector 3804612354
….
Sep 26 22:46:25 ns383655 kernel: sd 0:0:0:0: [sda] Unhandled error code
Sep 26 22:46:25 ns383655 kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 26 22:46:25 ns383655 kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Sep 26 22:46:25 ns383655 kernel: end_request: I/O error, dev sda, sector 0
Sep 26 22:46:25 ns383655 kernel: Buffer I/O error on device sda, logical block 0
Sep 26 22:46:25 ns383655 kernel: Buffer I/O error on device sda, logical block 3
Sep 26 22:51:45 ns383655 kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
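
The CDB bytes in the kernel lines above can be decoded to check that they describe the same sectors as the end_request messages. The following is a small Python sketch under the standard 10-byte READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2 to 5, transfer length in bytes 7 to 8); the function name `decode_rw10` is hypothetical and only these two opcodes are handled.

```python
def decode_rw10(cdb_hex):
    """Decode a 10-byte SCSI READ(10)/WRITE(10) CDB given as hex bytes.

    Returns (operation_name, starting_lba, block_count).
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between the hex bytes are ignored
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    names = {0x28: "Read(10)", 0x2A: "Write(10)"}
    if cdb[0] not in names:
        raise ValueError("not a READ(10)/WRITE(10) opcode")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return names[cdb[0]], lba, blocks


# CDBs copied from the log lines above.
print(decode_rw10("2a 00 e2 c5 c7 02 00 00 08 00"))  # ('Write(10)', 3804612354, 8)
print(decode_rw10("28 00 00 00 00 00 00 00 08 00"))  # ('Read(10)', 0, 8)
```

The decoded LBAs, 3804612354 and 0, match the sector numbers in the corresponding end_request lines, and both requests cover 8 blocks.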


I would appreciate any help, or a pointer toward the right interpretation of the problem.

Regards,
SinuS