Question: Why are there 2 features in NVMe that do the same task, i.e. monitoring the temperature?

G_Balaji

Junior Member
Sep 7, 2020
Hi Guys,
I have a few doubts about the “Host Controlled Thermal Management” (HCTM) and “Temperature Threshold” features. Why are there 2 different features that do the same task, i.e. monitoring the temperature? I understand that HCTM throttles based on the temperature, while Temperature Threshold just raises an event. Why can't the throttling be done while handling the async event?

A) Host Controlled Thermal Management:
This feature allows the host system to specify 2 temperature thresholds, Thermal Management Temperature 1 (TMT1) and Thermal Management Temperature 2 (TMT2), at which the drive should perform light and heavy throttling to reduce the drive's temperature. When the composite temperature reaches or exceeds TMT1, light throttling is performed; when it reaches or exceeds TMT2, heavy throttling is performed.
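To make the two-threshold behaviour concrete, here is a minimal Python sketch. The function names are hypothetical. It assumes temperatures are in kelvins, that the comparison is "greater than or equal to", and that Set Features Command Dword 11 for HCTM carries TMT1 in bits 31:16 and TMT2 in bits 15:00. What the drive actually does at each level is vendor specific and not modelled here.

```python
from enum import Enum


class Throttle(Enum):
    NONE = "none"
    LIGHT = "light"
    HEAVY = "heavy"


def hctm_throttle_level(composite_k: int, tmt1_k: int, tmt2_k: int) -> Throttle:
    """Pick a throttle level from the composite temperature (all values in kelvins).

    Hypothetical helper: assumes TMT1 < TMT2 and a >= comparison.
    """
    if tmt1_k >= tmt2_k:
        raise ValueError("TMT1 must be lower than TMT2")
    if composite_k >= tmt2_k:
        return Throttle.HEAVY
    if composite_k >= tmt1_k:
        return Throttle.LIGHT
    return Throttle.NONE


def encode_hctm_dword11(tmt1_k: int, tmt2_k: int) -> int:
    """Pack TMT1 (bits 31:16) and TMT2 (bits 15:00) into one 32-bit value."""
    for value in (tmt1_k, tmt2_k):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("temperature must fit in 16 bits")
    return (tmt1_k << 16) | tmt2_k
```

For example, with TMT1 = 343 K (70 °C) and TMT2 = 353 K (80 °C), a composite temperature of 345 K falls in the light-throttling band, and the packed value is `0x01570161`.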


B) Temperature Threshold (Feature Identifier 04h)
There are 9 temperature values in the SMART / Health Information log (i.e., the Composite Temperature and Temperature Sensor 1 through Temperature Sensor 8).
Each of these 9 temperature values has an associated under temperature threshold and over temperature threshold.

The default value of the over temperature threshold feature for Composite Temperature is the value in the WCTEMP field in the Identify Controller.
The default value of the under temperature threshold feature for Composite Temperature is implementation specific.
These values can be changed using the Set Features command.
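As a rough illustration of how a host might build that Set Features request, the sketch below packs Command Dword 11 for Feature Identifier 04h. The function name is hypothetical. The layout is my reading of the spec and should be checked against the revision your drive implements: threshold in kelvins in bits 15:00, sensor select in bits 19:16 (0 = Composite Temperature, 1 to 8 = Temperature Sensor 1 to 8), and threshold type in bits 21:20 (0 = over, 1 = under).

```python
def encode_temp_threshold_dword11(threshold_k: int, sensor: int = 0, under: bool = False) -> int:
    """Pack a Temperature Threshold (FID 04h) Command Dword 11 value.

    Hypothetical helper; field positions are an assumption from the spec text.
    """
    if not 0 <= threshold_k <= 0xFFFF:
        raise ValueError("threshold must fit in 16 bits")
    if not 0 <= sensor <= 8:
        raise ValueError("sensor select must be 0 (composite) or 1..8")
    threshold_type = 1 if under else 0  # 0 = over temperature, 1 = under temperature
    return (threshold_type << 20) | (sensor << 16) | threshold_k
```

So an over temperature threshold of 353 K on the Composite Temperature encodes as `0x00000161`, and an under temperature threshold of 273 K on Temperature Sensor 1 encodes as `0x00110111`.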

When a temperature is greater than or equal to its corresponding over temperature threshold or less than or equal to its corresponding under temperature threshold, then bit one of the Critical Warning field in the SMART / Health Information Log is set to one. This may trigger an asynchronous event.
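The rule above can be sketched in a few lines of Python. This is only a model of the comparison, not drive firmware: the `Reading` type and function name are hypothetical, and every temperature is assumed to be in kelvins with both thresholds explicitly set.

```python
from typing import Iterable, NamedTuple

# Bit 1 of the Critical Warning field: a temperature crossed one of its thresholds.
TEMPERATURE_WARNING_BIT = 1 << 1


class Reading(NamedTuple):
    temperature_k: int  # current value (composite or a single sensor)
    under_k: int        # under temperature threshold
    over_k: int         # over temperature threshold


def critical_warning(readings: Iterable[Reading]) -> int:
    """Return the Critical Warning temperature bit if any reading is out of range."""
    for r in readings:
        if r.temperature_k >= r.over_k or r.temperature_k <= r.under_k:
            return TEMPERATURE_WARNING_BIT
    return 0
```

Note that the only output is a status bit (and possibly an asynchronous event). Nothing here throttles the drive, which is the difference from HCTM that the question is asking about.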
 

Billy Tallis

Senior member
Aug 4, 2015
Host Controlled Thermal Management is configured by the host system. It's for cases where the host system wants to impose lower temperature limits on the SSD than its own self-imposed limits. The regular SMART temperature thresholds are set by the drive manufacturer rather than by the operating system of whatever machine the SSD ends up installed in.

HCTM is an optional feature that not all drives bother to implement, and that operating systems don't necessarily make use of unless they have extra knowledge of the overall system's temperature and cooling capabilities. By contrast, the basic SMART temperature monitoring and throttling thresholds are pretty much necessary for the drive to be a reasonably robust device capable of protecting itself from damage.