7.2 Thermal Management Framework

Linux divides a System-on-Chip (SoC) into thermal zones, each corresponding to an area of the silicon die where the temperature is deemed uniform. On SAMA7G54, a single thermal zone is available (thermal_zone0), so the temperature is treated as uniform across the die. The thermal zone properties can be displayed with the following command:

# ls /sys/class/thermal/thermal_zone0/
available_policies  k_pu               trip_point_0_type
cdev0               mode               trip_point_1_hyst
cdev0_trip_point    offset             trip_point_1_temp
cdev0_weight        policy             trip_point_1_type
cdev1               power              trip_point_2_hyst
cdev1_trip_point    slope              trip_point_2_temp
cdev1_weight        subsystem          trip_point_2_type
integral_cutoff     sustainable_power  type
k_d                 temp               uevent
k_i                 trip_point_0_hyst
k_po                trip_point_0_temp

The SAMA7G54 die temperature can be read from the “temp” file:

# cat /sys/class/thermal/thermal_zone0/temp
55201

The result is expressed in millidegrees Celsius. In this example, the SAMA7G54 die temperature is 55.201°C.
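The conversion from the raw reading to degrees Celsius can be sketched in a few lines of POSIX shell. The function name millic_to_c and the ZONE variable are hypothetical, introduced only for this illustration. Integer arithmetic is used so that the sketch does not depend on awk or bc being present on the target.

```shell
#!/bin/sh
# millic_to_c (hypothetical helper): print a millidegree Celsius value
# as degrees Celsius with three decimals, using integer arithmetic only.
millic_to_c() {
    m=$1
    sign=""
    if [ "$m" -lt 0 ]; then
        # Keep the sign separately so that e.g. -500 prints as -0.500.
        sign="-"
        m=$(( 0 - m ))
    fi
    printf '%s%d.%03d\n' "$sign" "$(( m / 1000 ))" "$(( m % 1000 ))"
}

# ZONE is an assumed variable; it defaults to the zone path used above.
ZONE="${ZONE:-/sys/class/thermal/thermal_zone0}"
if [ -r "$ZONE/temp" ]; then
    echo "Die temperature: $(millic_to_c "$(cat "$ZONE/temp")") degrees C"
fi
```

With the reading shown above, millic_to_c 55201 prints 55.201.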

Among the important attributes of a thermal zone are the temperature trip points. These points define when the Linux thermal governor makes decisions to start and stop cooling down the device (CPU frequency decrease, fan activation, etc.). These trip points may be of the following types:

  • "active": a trip point to enable active cooling (e.g., external fan)
  • "passive": a trip point to enable passive cooling (e.g., reduced CPU frequency)
  • "hot": a trip point that signals an emergency
  • "critical": a trip point beyond which the hardware is no longer reliable. The CPU goes into Power-down mode.
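The trip points configured for a zone can be inspected through the trip_point_N_type, trip_point_N_temp and trip_point_N_hyst files shown in the listing above. The following is a minimal sketch, assuming the trip points are numbered consecutively from 0; the function name list_trips is hypothetical.

```shell
#!/bin/sh
# list_trips (hypothetical helper): print one line per trip point found
# in the thermal zone directory given as first argument.
# Temperatures and hysteresis values are printed as stored, in
# millidegrees Celsius.
list_trips() {
    zone=$1
    i=0
    # Assumption: trip points are numbered 0, 1, 2, ... without gaps.
    while [ -r "$zone/trip_point_${i}_type" ]; do
        ttype=$(cat "$zone/trip_point_${i}_type")
        ttemp=$(cat "$zone/trip_point_${i}_temp")
        thyst=$(cat "$zone/trip_point_${i}_hyst")
        echo "trip $i: type=$ttype temp=$ttemp hyst=$thyst"
        i=$(( i + 1 ))
    done
}
```

On the target, the function would be called as list_trips /sys/class/thermal/thermal_zone0.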

The policies available to the thermal governor are listed below. By default, the policy is set to “step wise”.

  • step wise (step_wise in sysfs): open-loop control, based on temperature thresholds and trend. Walks through the cooling states of each cooling device, step by step.
  • fair share (fair_share): weight-based. Determines the cooling device state based on the weight assigned to each device.
  • bang bang (bang_bang): uses a hysteresis to abruptly switch a cooling device on or off. It is intended for fans that cannot be throttled, only switched on or off.
  • power allocator (power_allocator): closed-loop control, based on power budget, temperature, and current power consumption of each involved device.
  • user space (user_space): hands control of a thermal zone over to user space.
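The governor in use can be read from the "policy" file and changed by writing one of the names listed in "available_policies" (the sysfs names use underscores, for example step_wise). The sketch below checks that the requested policy is offered by the kernel before writing it. The function name set_policy is hypothetical, and writing to the real sysfs file is assumed to require root privileges.

```shell
#!/bin/sh
# set_policy (hypothetical helper): select a thermal governor for the
# zone directory given as first argument, but only if the kernel lists
# it in available_policies. Returns 1 and leaves the policy untouched
# otherwise.
set_policy() {
    zone=$1
    wanted=$2
    # available_policies holds a space-separated list of governor names.
    for p in $(cat "$zone/available_policies"); do
        if [ "$p" = "$wanted" ]; then
            echo "$wanted" > "$zone/policy"
            return 0
        fi
    done
    echo "policy '$wanted' is not available for $zone" >&2
    return 1
}
```

On the target, a call would look like set_policy /sys/class/thermal/thermal_zone0 step_wise. Checking the list first gives a clearer message than the write error the kernel would otherwise return.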

As an example, the following settings can be used as a starting point for an industrial application. They may be adjusted to suit the specifics of each system:

  • Trip point 0 “passive” (DVFS CPUfreq) at 90°C
  • Trip point 1 “hot” at 95°C
  • Trip point 2 “critical” at 100°C

These values account for the temperature sensor accuracy (±5°C).
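To illustrate how a reading relates to these example thresholds, the sketch below maps a millidegree value to the highest trip point it has reached. It is only a user-space illustration for monitoring or logging, not the mechanism the kernel uses. The function name classify_temp and the three variables are hypothetical, and the thresholds are the example values listed above.

```shell
#!/bin/sh
# Example thresholds from the list above, in millidegrees Celsius.
# Variable names are hypothetical; adjust the values per system.
PASSIVE_MC=90000
HOT_MC=95000
CRITICAL_MC=100000

# classify_temp (hypothetical helper): print the highest example trip
# point reached by the millidegree reading given as first argument,
# or "normal" if the reading is below all of them.
classify_temp() {
    t=$1
    if [ "$t" -ge "$CRITICAL_MC" ]; then
        echo critical
    elif [ "$t" -ge "$HOT_MC" ]; then
        echo hot
    elif [ "$t" -ge "$PASSIVE_MC" ]; then
        echo passive
    else
        echo normal
    fi
}
```

For instance, classify_temp 55201 prints normal, and classify_temp 92000 prints passive.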