Solid-state drives (SSDs) are becoming more and more common in ESXi hosts. They are used for caching (vFlash Read Cache, PernixData FVP), Virtual SAN, or plain datastores. A problem that comes with SSDs is the limited lifetime of each cell. Depending on the technology, each cell can be overwritten from about 1,000 times in consumer TLC SSDs up to 100,000 times in enterprise SLC-based SSDs.
The value to keep an eye on is the guaranteed TBW (Total Bytes Written or Terabytes Written), which vendors typically provide in their specifications. This value describes how many terabytes can be written to the entire device before the warranty expires. The current value can be read out with S.M.A.R.T. from the Total_LBAs_Written field.
Unfortunately, VMware makes it hard to read out raw S.M.A.R.T. values on ESXi hosts. For that reason I've ported a version of smartctl, which is part of smartmontools, to ESXi. I've made the package available as a VIB. The download link is at the bottom of this post.
First of all, let's look at what you can see on an ESXi host regarding endurance without smartctl. In this example I'm using a Samsung SSD 850 EVO M.2 250GB, which is currently in use as a local datastore. The warranty for this device covers 75 TBW. Note that this is a consumer-grade SSD. The lowest endurance class for Virtual SAN, for example, starts at 365 TBW.
ESXCLI can display S.M.A.R.T. stats with:
esxcli storage core device smart get -d [device]
# esxcli storage core device smart get -d t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              N/A    N/A        N/A
Power-on Hours                99     0          99
Power Cycle Count             99     0          99
Reallocated Sector Count      100    10         100
Raw Read Error Rate           N/A    N/A        N/A
Drive Temperature             N/A    N/A        N/A
Driver Rated Max Temperature  49     0          34
Write Sectors TOT Count       100    0          100
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A
What do these values mean? Actually, only that the drive is "healthy". The output does not provide the information we are looking for. ESXi also keeps track of the health status with the smartd daemon and writes the status to /var/log/syslog.log, as in the following example:
2016-05-18T14:54:23Z smartd: [warn] t10.ATA_____ST9500325AS_________________________________________S2WB2XXB: above TEMPERATURE threshold (40 > 0)
ESXCLI can also display device stats, which are very close to what we are looking for:
# esxcli storage core device stats get -d t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
   Device: t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
   Successful Commands: 93483233
   Blocks Read: 205579211
   Blocks Written: 2123298938
   Read Operations: 3240880
   Write Operations: 90144369
   Reserve Operations: 39107
   Reservation Conflicts: 0
   Failed Commands: 22
   Failed Blocks Read: 0
   Failed Blocks Written: 0
   Failed Read Operations: 0
   Failed Write Operations: 0
   Failed Reserve Operations: 0
ESXi keeps track of all read and write operations to the disk, but these counters are reset when ESXi is rebooted. So this does not help to determine wear either.
This is where smartctl comes into play:
# smartctl -d sat --all /dev/disks/t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.0.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 850 EVO M.2 250GB
Serial Number: S24BNXAG805065D
LU WWN Device Id: 5 002538 d404b9f9f
Firmware Version: EMT21B6Q
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed May 16 15:25:26 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
[...]
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 5039
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 35
177 Wear_Leveling_Count 0x0013 094 094 000 Pre-fail Always - 122
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 049 034 000 Old_age Always - 51
195 ECC_Error_Rate 0x001a 200 200 000 Old_age Always - 0
199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 POR_Recovery_Count 0x0012 099 099 000 Old_age Always - 26
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 6343034492
In the SMART Attributes section, ID #241 holds our Total_LBAs_Written value. This value needs to be multiplied by the sector size, which is 512 bytes, and divided by 1099511627776 (1024^4) to get terabytes.
Total_LBAs_Written * Sector Size / 1024^4 = TBW
6343034492 * 512 / 1099511627776 = 2.95 TBW
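The conversion above can also be scripted. The following is a minimal sketch in POSIX shell that hands the floating-point division to awk. The function name lbas_to_tbw is made up for this illustration and is not part of smartmontools; the 512-byte sector size is the one reported in the smartctl output above.

```shell
#!/bin/sh
# Convert a raw Total_LBAs_Written value into terabytes written (1024^4 bytes).
# lbas_to_tbw is a hypothetical helper name, not part of smartmontools.
# $1 = raw Total_LBAs_Written, $2 = sector size in bytes (defaults to 512)
lbas_to_tbw() {
    awk -v lbas="$1" -v sector="${2:-512}" \
        'BEGIN { printf "%.2f\n", lbas * sector / (1024 ^ 4) }'
}

# Raw value of attribute 241 from the Samsung 850 EVO above
lbas_to_tbw 6343034492 512    # prints 2.95
```

The sector size is a parameter because the formula only holds when the drive counts in logical sectors; drives that count in other units need a different factor.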
I've used 3 TBW of my guaranteed 75 TBW. According to Power_On_Hours, which can be found in SMART ID #9, the device has been in use for about 200 days (24/7 online, of course). Guess I have another 13 years to go...
This also shows that the value in "esxcli storage core device stats get" is not the lifetime total: it is only counted since the last reboot. Blocks Written according to this command is 2123298938, which results in about 1 TB.
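The remaining-lifetime estimate is a simple linear extrapolation and can be sketched as follows. This assumes the write rate seen so far stays constant, which a real workload will not guarantee. The helper name years_remaining is hypothetical.

```shell
#!/bin/sh
# Estimate the years left until the rated TBW is reached, assuming the
# write rate observed so far stays constant (a simplifying assumption).
# years_remaining is a hypothetical helper name used only for this sketch.
# $1 = power-on hours, $2 = TB written so far, $3 = rated TBW
years_remaining() {
    awk -v hours="$1" -v used="$2" -v rated="$3" \
        'BEGIN { printf "%.1f\n", hours * (rated - used) / used / 8760 }'
}

# Rounded figures from the text: 200 days (4800 hours), 3 of 75 TBW
years_remaining 4800 3 75       # prints 13.2

# Unrounded SMART values: 5039 hours, 2.95 of 75 TBW
years_remaining 5039 2.95 75    # prints 14.0
```

The rounded figures reproduce the "about 13 years" mentioned above. The unrounded SMART values come out closer to 14 years.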
How to get smartctl
!!! Please note that the use of this VIB is absolutely unsupported. Use at your own risk !!!
I've tested the package with ESXi 6.0 only
- Download smartctl-6.6-4321.x86_64.vib
- Copy the VIB to the /tmp/ directory of an ESXi host
- SSH to the ESXi host
- Set the VIB acceptance level to CommunitySupported
# esxcli software acceptance set --level=CommunitySupported
- Install the package (Maintenance Mode or Reboot is not required)
# esxcli software vib install -v /tmp/smartctl-6.6-4321.x86_64.vib
The tool is located at /opt/smartmontools/smartctl and works just like the Linux version.
Locate physical disks with ls -l /dev/disks/
/opt/smartmontools/smartctl -d [Device Type] --all /dev/disks/[DISK]
# smartctl -d sat --all /dev/disks/t10.ATA_____Samsung_SSD_850_EVO_M.2_250GB___________S24BNXAG805065D_____
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.0.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 850 EVO M.2 250GB
Serial Number: S24BNXAG805065D
LU WWN Device Id: 5 002538 d404b9f9f
Firmware Version: EMT21B6Q
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 0) seconds.
Offline data collection capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 133) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 5040
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 35
177 Wear_Leveling_Count 0x0013 094 094 000 Pre-fail Always - 122
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 049 034 000 Old_age Always - 51
195 ECC_Error_Rate 0x001a 200 200 000 Old_age Always - 0
199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 POR_Recovery_Count 0x0012 099 099 000 Old_age Always - 26
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 6345601655
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
255 0 65535 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
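The ls -l /dev/disks/ listing mentioned above also contains partition entries and alias names. A minimal sketch for narrowing it down to whole disks could look like this. It assumes that partition entries carry a colon suffix and that vml.* names are only aliases for the same devices. The helper name list_whole_disks is hypothetical.

```shell
#!/bin/sh
# List whole-disk device names, skipping partition entries (assumed to contain
# a colon, e.g. "<disk>:1") and vml.* names (assumed to be aliases).
# list_whole_disks is a hypothetical helper; the directory defaults to /dev/disks.
list_whole_disks() {
    ls -1 "${1:-/dev/disks}" | grep -v ':' | grep -v '^vml\.'
}

# Only run the listing where the ESXi device directory actually exists
if [ -d /dev/disks ]; then
    list_whole_disks
fi
```

Each name printed this way can be appended to /dev/disks/ and passed to smartctl as shown above.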
Looks like it doesn't work properly for an SM951 NVMe device:
[root@esxi:/tmp] /opt/smartmontools/smartctl -d nvme --all /dev/disks/t10.NVMe____SAMSUNG_MZVPV512HDGL2D00000______________xxxxxxxxxxxxxx______00000001
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.0.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Function not implemented
I tried a scan but that failed:
[root@esxi:/tmp] /opt/smartmontools/smartctl --scan
Segmentation fault
I can get my hands on a system with a NVMe device tomorrow. I will look into it.
How does it work with disks behind a hardware RAID controller?
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-5.5.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org
Smartctl open device: /dev/disks/naa.600605b00516c6971f219afb0f3cd956 [megaraid_disk_01] [SAT] failed: can't get bus number
Sadly, I have the same error.
Just out of curiosity what did you use for a compile environment for the static smartctl?
I'm currently using an aged CentOS 3.9 and was wondering if something newer was valid.
I've experimented with a couple other options but always seem to go back to that one for one reason or another.
(The latest addition to my custom local vib is a static version of whiptail for an experimental frontend to ghettoVCB)
I did not compile it myself. I used the latest Linux-compatible nightly build from http://builds.smartmontools.org/ and made a VIB package out of it.
Thank you for smartctl. I installed it on all my ESXi hosts
(and found 3 failed disks!).
I would like to check the disks behind a MegaRAID controller:
/opt/lsi/storcli/storcli -CfgDsply -a0 | grep "Device Id\|DISK"
Number of DISK GROUPS: 1
DISK GROUP: 0
Device Id: 5
Device Id: 4
/opt/smartmontools/smartctl -d sat+megaraid,5 -a /dev/disks/naa.600605b006eb32f01a806e721f93a9a4
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.0.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org
Smartctl open device: /dev/disks/naa.600605b006eb32f01a806e721f93a9a4 [megaraid_disk_05] [SAT] failed: can't get bus number
http://guides.ovh.com/LsiMegaraid (replace MegaCli with storcli)
All the best
I have the same problem. Do you have a solution?
On my regular non-ESXi hosts I use smartctl to check drive health behind RAID controllers like so:
smartctl -a -d sat+megaraid,$deviceid /dev/sd[a-d]
esxi 6.5
Installation Result
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
VIBs Installed: smartmontools_bootbank_smartctl_6.6-4321
VIBs Removed:
VIBs Skipped:
can't find smartctl, there is no /opt/smartmontools
smartctl is not working properly for NVMe drives. Can you please let us know whether there is an alternative command in esxcli to get SMART details for an NVMe drive?
[root@esxi:/tmp] /opt/smartmontools/smartctl -d nvme --all /dev/disks/t10.NVMe___MZVPV512HDGL2D00000______________xxxxxxxxxxxxxx______00000001
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.0.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Function not implemented
Hi,
First of all, congrats on the great work done on this, as well as your entire blog. I found it useful numerous times.
On the topic - it's worth mentioning that most vendors nowadays report the Total LBAs Written in 32MB blocks. It's still the same attribute with ID# 241. However, the calculation will be:
Total_LBAs_Written * 32 / 1024^2 = TBW
I did some additional testing and got this proactively reported to Zabbix via simple triggers and zabbix trapper in the latest 3.4 release. If interested, drop me a line.
Cheers.
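For drives that count attribute 241 in 32 MB units as described in this comment, the conversion could be sketched like this. Whether a given drive really uses that unit has to be checked against the vendor documentation. The helper name units32_to_tbw is hypothetical.

```shell
#!/bin/sh
# Convert a raw attribute 241 value reported in 32 MiB units into TB written.
# Only valid for drives that really count in 32 MiB units (an assumption here).
# units32_to_tbw is a hypothetical helper name for this sketch.
units32_to_tbw() {
    awk -v units="$1" 'BEGIN { printf "%.2f\n", units * 32 / (1024 ^ 2) }'
}

# Sanity check: 32768 units * 32 MiB = 1 TiB
units32_to_tbw 32768    # prints 1.00
```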
It does not seem to work with ESX 6.7 (properly), it has a bunch of unknown attributes...
/opt/smartmontools/smartctl -d sat --all /dev/disks/t10.ATA_____INTEL_SSDSC2BW180A3L__________________00_CVCV224003TC180EGN__
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.7.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BW180A3L
Serial Number: CVCV224003TC180EGN
LU WWN Device Id: 5 001517 bb296e50b
Firmware Version: LE1i
User Capacity: 180,045,766,656 bytes [180 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Jul 26 08:43:24 2018 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x7f) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 48) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0021) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 067 067 000 Old_age Always - 29774 (38 130 0)
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 698
170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
171 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
174 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 673
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 2
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 673
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
225 Unknown_SSD_Attribute 0x0032 100 100 000 Old_age Always - 937268
226 Unknown_SSD_Attribute 0x0032 100 100 000 Old_age Always - 65535
227 Unknown_SSD_Attribute 0x0032 100 100 000 Old_age Always - 67
228 Power-off_Retract_Count 0x0032 100 100 000 Old_age Always - 65535
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 937268
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 1984387
249 Unknown_Attribute 0x0013 100 100 000 Pre-fail Always - 22056
SMART Error Log not supported
SMART Self-test Log not supported
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
How long do you think a 240GB SSD with an 80TBW endurance rating will last under a normal workload?
It's hard to define "normal", but:
Datastore: Yes
vSAN: No
My esxi is set to use efi/secure boot. Am I correct that if acceptance level is set to community, esxi host will not boot with secure mode enabled?
Hi, works great!
How can we "bribe" you to build a new version of the tools with NVMe support? The built-in commands, even in 6.7, still don't cut it.
I've tried but even the latest version of smartctl does not work with NVMe on an ESXi host.
Hello.
What is the latest version of smartctl for ESXi 6.7?
Looks like it does not work with Samsung SSD NVMe.
[:/opt/smartmontools] ./smartctl -d nvme -H /dev/disks/t10.NVMe____SAMSUNG_MZPLL3T2HAJQ2D00005______________S4CCNA0M800098______00000001
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.7.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Function not implemented
[:/opt/smartmontools] ./smartctl -d nvme --all /dev/disks/t10.NVMe____SAMSUNG_MZPLL3T2HAJQ2D00005______________S4CCNA0M800098______00000001
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.7.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Function not implemented
[:/opt/smartmontools] ./smartctl -d nvme -x /dev/disks/t10.NVMe____SAMSUNG_MZPLL3T2HAJQ2D00005______________S4CCNA0M800098______00000001
smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.7.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Function not implemented
I've found another way to get NVMe S.M.A.R.T.:
# get adapter name of storage device:
esxcfg-scsidevs --hba-device-list
# getting S.M.A.R.T.:
esxcli nvme device log smart get -A vmhba1
# getting raw data:
# Data Units Written: 0x35074e
# Converting raw data from hex to dec and calculating:
# "Data Units Written" * 512 / 1048576 = X.XX GBW
Which ESXi Version is used?
Should work with 6.5 and 6.7
Alternatively, instead of doing * 512 / 1048576, you can simply divide by 2048 to reduce the amount of math. This comment really saved me when getting NVMe SMART data on ESXi, though.
Thank you! Here is my result in ESXi 6.7 with an ADATA NVMe drive:
[root@esxi:~] esxcli nvme device log smart get -A vmhba2
SMART And Health Info:
Available Spare Space Below Threshold: false
Temperature Warning: false
NVM Subsystem Reliability Degradation: false
Read Only Mode: false
Volatile Memory Backup Device Failure: false
Composite Temperature: 301 K
Available Spare: 100 %
Available Spare Threshold: 10 %
Percentage Used: 18 %
Data Units Read: 0x5cc257c
Data Units Written: 0x4dd3bbb
Host Read Commands: 0x4a312a2b
Host Write Commands: 0x6d749ad1
Controller Busy Time: 0x18036
Power Cycles: 0xaf
Power On Hours: 0x5e4f
Unsafe Shutdowns: 0x4f
Media Errors: 0x60
Number of Error Info Log Entries: 0x0
Warning Composite Temperature Time: 0 Mins
Critical Composite Temperature Time: 0 Mins
So, Data Units Written: 0x4dd3bbb in hex is 81607611 in decimal.
Total written is 81607611*512/1048576/1024 = 38.91 TB
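The hex conversion and the arithmetic above can be combined in a small sketch. The first function uses the formula from these comments. The second uses the NVMe definition of one data unit as 1000 × 512 bytes and reports decimal terabytes, so its result differs slightly. Both function names are hypothetical.

```shell
#!/bin/sh
# Convert the hexadecimal "Data Units Written" value from
# "esxcli nvme device log smart get" into terabytes.
# Both function names are hypothetical helpers for this sketch.

# Formula used in the comments above: units * 512 / 1048576 / 1024
nvme_units_to_tb() {
    units=$(( $1 ))    # shell arithmetic understands the 0x prefix
    awk -v u="$units" 'BEGIN { printf "%.2f\n", u * 512 / 1048576 / 1024 }'
}

# Variant based on the NVMe definition of one data unit as 1000 * 512 bytes,
# expressed in decimal terabytes (10^12 bytes)
nvme_units_to_tb_spec() {
    units=$(( $1 ))
    awk -v u="$units" 'BEGIN { printf "%.2f\n", u * 512000 / 1e12 }'
}

nvme_units_to_tb 0x4dd3bbb         # prints 38.91
nvme_units_to_tb_spec 0x4dd3bbb    # prints 41.78
```

The gap between the two results comes from the unit size (512 × 1024 versus 512 × 1000 bytes) and from binary versus decimal terabytes.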
Nice work. Is it possible to get the new release as a VIB? Thanks, Juergen
Hi, nice work. I'd like an update to the latest version to support JSON output.
I have compiled the latest version, but it doesn't read any values.
Can you share how you compiled it? Currently I've only done
./configure LDFLAGS="-static"
FYI, I am getting "Function not implemented"
[root@esx01:/tmp] ./smartctl -d ata -x /dev/disks/t10.ATA__DISK
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-6.5.0] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
Smartctl open device: /dev/disks/t10.ATA__DISKfailed: Function not implemented
Any plans on giving hints?
I'm guessing no?
Hi @fgrehl.
Smartmontools 7.1 was released 2019-12-30; any chance you could update the vib?
Or/also, could you please give the steps how you ported to/compiled the Smartmontools source for ESXi as a vib?
Many Thanks,
Pin
I've asked that too, but it's very quiet on the writer's end :(
I would be very interested in that too - since ESXi 6.7 my scheduled self-tests won't work anymore.
Would be nice to know, how to port it from source files ...
Hi
I hope you can help: how do I get smartmontools working with an aacraid card, please?
aacraid,H,L,ID, but not sure what command to run.
Any help appreciated
Many Thanks AP
I mean, my latest 8TB SSD is good for 2,888 TBW.
Hi!
Small oneliner sh script to print TBW for all disk (SATA), smartctl must be installed:
disk=0; for i in `ls /dev/disks/ -1A | grep -v : | grep t10`; do let "disk++" ; echo -e "\e[4mDisk n°$disk\e[0m"; echo $i | sed -r "s/_{2,}/ /g" | awk {'printf "Model => " $2 "\nSerial => " $3 "\nTBW => "'};
var=`/opt/smartmontools/smartctl -d sat --all /dev/disks/$i|grep Total_LBAs_Written | awk {'print $10'}`; [[ -z $var ]] && echo -e "\x1B[31mNaN\e[0m" || echo -e "\x1B[31m$(( $var * 512 / 1099511627776 )) TB\e[0m\n"; done
Script output :
Disk n°1
Model => CT240BX500SSD1
Serial => 2002E3E142FA
TBW => NaN
Disk n°2
Model => Samsung_SSD_870_QVO_1TB
Serial => S5SVNF0NC00481A
TBW => 0 TB
Disk n°3
Model => Samsung_SSD_870_QVO_1TB
Serial => S5SVNF0NC00494A
TBW => 0 TB
Disk n°4
Model => Samsung_SSD_870_QVO_1TB
Serial => S5SVNF0NC00511D
TBW => 0 TB
Disk n°5
Model => Samsung_SSD_870_QVO_1TB
Serial => S5SVNF0NC00514B
TBW => 16 TB
PS: The script only does integer math, so it cannot show TBW values below 1 TB :)
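The integer limitation could be worked around by letting awk do the division. The following is only a sketch. It assumes 512-byte sectors and that the raw value is the 10th column, as in the attribute tables in this post. The helper name tbw_from_smartctl is hypothetical.

```shell
#!/bin/sh
# Read "smartctl --all" output on stdin and print TB written with two decimals.
# Assumes 512-byte sectors and that the raw value is the 10th column, as in the
# attribute tables shown in this post. tbw_from_smartctl is a hypothetical name.
tbw_from_smartctl() {
    awk '$2 == "Total_LBAs_Written" {
             printf "%.2f TB\n", $10 * 512 / (1024 ^ 4); found = 1
         }
         END { if (!found) print "NaN" }'
}

# Example with the attribute line from the 850 EVO output in the post
echo "241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 6343034492" | tbw_from_smartctl
```

On a host it would be fed from the real tool, for example /opt/smartmontools/smartctl -d sat --all /dev/disks/[DISK] | tbw_from_smartctl. Drives that do not expose Total_LBAs_Written print NaN.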
I have 4 HDD and 1 NVMe drives on ESXi 6.7 and only one HDD shows the TBW, while all other 4 show NaN.
Thank you anyway!
Thank you for this post. I have used the details in the past with great success. Unfortunately the link to smartctl-6.6-4321.x86_64.vib now appears to be broken and I do not have a copy. Would it be possible to fix the link please, when it is convenient?
Thank you, J
I used this solution for a few years with ESXi 7 on my home server. I read the values in PRTG Network Monitor. This way I could track the health of my SSD. It actually worked, because the SSD broke last week and I got informed that the SMART values were decreasing.
I bought a new server, which is running ESXi 8. But the VIB from this blog isn't working anymore:
[ProfileValidationError]
In ImageProfile (Updated) ESXi-8.0U1-21495797-standard, the payload(s) in VIB smartmontools_bootbank_smartctl_6.6-4321 does not have sha-256 gunzip checksum. This will prevent VIB security verification and secure boot from functioning properly. Please remove this VIB or please check with your vendor for a replacement of this VIB
Please refer to the log file for more details.
Any way to make this work in 2023?
Run esxcli software vib remove --vibname=smartctl so that you can upgrade esxi and can then reinstall this after.
I've just had the same issue as @Frank on ESXi 8 with HPE extensions.