mirror of
https://github.com/RGBCube/Site
synced 2025-07-31 13:07:46 +00:00
269 lines
7.2 KiB
Markdown
269 lines
7.2 KiB
Markdown
---
|
|
title: "Why more `/sys/class/hwmon/*/temp*_label` than CPU cores?"
|
|
|
|
color: "#99CF9F"
|
|
thumbnail: /assets/images/cpu-dumb.webp
|
|
|
|
tags:
|
|
- hardware
|
|
---
|
|
|
|
So, I am currently working on a Linux tool+daemon to replace
|
|
[`auto-cpufreq`](https://github.com/AdnanHodzic/auto-cpufreq) with a more
|
|
efficient, tunable and observable alternative.
|
|
|
|
And obviously, to tune a CPU well, knowing how hot it is is a requirement.
|
|
|
|
Then, how do you actually see how hot your CPU is in Linux? Using
|
|
`/sys/class/hwmon` of course.
|
|
|
|
This path contains hardware monitoring devices. In the laptop I'm testing this
|
|
on, `hwmon4` under this directory corresponds to the CPU.
|
|
|
|
Let's run a `tree`:
|
|
|
|
```no-highlight
|
|
/sys/class/hwmon/hwmon4 -> ../../devices/platform/coretemp.0/hwmon/hwmon4
|
|
├── device -> ../../../coretemp.0
|
|
│ ├── driver_override
|
|
│ ├── hwmon
|
|
│ ├── modalias
|
|
│ ├── power
|
|
│ ├── subsystem -> ../../../bus/platform
|
|
│ └── uevent
|
|
├── name
|
|
├── power
|
|
│ ├── autosuspend_delay_ms
|
|
│ ├── control
|
|
│ ├── runtime_active_time
|
|
│ ├── runtime_status
|
|
│ └── runtime_suspended_time
|
|
├── subsystem -> ../../../../../class/hwmon [recursive, not followed]
|
|
├── temp10_crit
|
|
├── temp10_crit_alarm
|
|
├── temp10_input
|
|
├── temp10_label
|
|
├── temp10_max
|
|
├── temp14_crit
|
|
├── temp14_crit_alarm
|
|
├── temp14_input
|
|
├── temp14_label
|
|
├── temp14_max
|
|
├── temp18_crit
|
|
├── temp18_crit_alarm
|
|
├── temp18_input
|
|
├── temp18_label
|
|
├── temp18_max
|
|
├── temp1_crit
|
|
├── temp1_crit_alarm
|
|
├── temp1_input
|
|
├── temp1_label
|
|
├── temp1_max
|
|
├── temp22_crit
|
|
├── temp22_crit_alarm
|
|
├── temp22_input
|
|
├── temp22_label
|
|
├── temp22_max
|
|
├── temp26_crit
|
|
├── temp26_crit_alarm
|
|
├── temp26_input
|
|
├── temp26_label
|
|
├── temp26_max
|
|
├── temp2_crit
|
|
├── temp2_crit_alarm
|
|
├── temp2_input
|
|
├── temp2_label
|
|
├── temp2_max
|
|
├── temp30_crit
|
|
├── temp30_crit_alarm
|
|
├── temp30_input
|
|
├── temp30_label
|
|
├── temp30_max
|
|
├── temp34_crit
|
|
├── temp34_crit_alarm
|
|
├── temp34_input
|
|
├── temp34_label
|
|
├── temp34_max
|
|
├── temp35_crit
|
|
├── temp35_crit_alarm
|
|
├── temp35_input
|
|
├── temp35_label
|
|
├── temp35_max
|
|
├── temp36_crit
|
|
├── temp36_crit_alarm
|
|
├── temp36_input
|
|
├── temp36_label
|
|
├── temp36_max
|
|
├── temp37_crit
|
|
├── temp37_crit_alarm
|
|
├── temp37_input
|
|
├── temp37_label
|
|
├── temp37_max
|
|
├── temp38_crit
|
|
├── temp38_crit_alarm
|
|
├── temp38_input
|
|
├── temp38_label
|
|
├── temp38_max
|
|
├── temp39_crit
|
|
├── temp39_crit_alarm
|
|
├── temp39_input
|
|
├── temp39_label
|
|
├── temp39_max
|
|
├── temp40_crit
|
|
├── temp40_crit_alarm
|
|
├── temp40_input
|
|
├── temp40_label
|
|
├── temp40_max
|
|
├── temp41_crit
|
|
├── temp41_crit_alarm
|
|
├── temp41_input
|
|
├── temp41_label
|
|
├── temp41_max
|
|
├── temp42_crit
|
|
├── temp42_crit_alarm
|
|
├── temp42_input
|
|
├── temp42_label
|
|
├── temp42_max
|
|
├── temp43_crit
|
|
├── temp43_crit_alarm
|
|
├── temp43_input
|
|
├── temp43_label
|
|
├── temp43_max
|
|
├── temp44_crit
|
|
├── temp44_crit_alarm
|
|
├── temp44_input
|
|
├── temp44_label
|
|
├── temp44_max
|
|
├── temp45_crit
|
|
├── temp45_crit_alarm
|
|
├── temp45_input
|
|
├── temp45_label
|
|
├── temp45_max
|
|
├── temp46_crit
|
|
├── temp46_crit_alarm
|
|
├── temp46_input
|
|
├── temp46_label
|
|
├── temp46_max
|
|
├── temp47_crit
|
|
├── temp47_crit_alarm
|
|
├── temp47_input
|
|
├── temp47_label
|
|
├── temp47_max
|
|
├── temp48_crit
|
|
├── temp48_crit_alarm
|
|
├── temp48_input
|
|
├── temp48_label
|
|
├── temp48_max
|
|
├── temp49_crit
|
|
├── temp49_crit_alarm
|
|
├── temp49_input
|
|
├── temp49_label
|
|
├── temp49_max
|
|
├── temp6_crit
|
|
├── temp6_crit_alarm
|
|
├── temp6_input
|
|
├── temp6_label
|
|
├── temp6_max
|
|
└── uevent
|
|
```
|
|
|
|
Let's `cat` all the `_label` files:
|
|
|
|
```no-highlight
|
|
/sys/class/hwmon/hwmon4/temp1_label:
|
|
Package id 0
|
|
/sys/class/hwmon/hwmon4/temp2_label:
|
|
Core 0
|
|
/sys/class/hwmon/hwmon4/temp6_label:
|
|
Core 4
|
|
/sys/class/hwmon/hwmon4/temp10_label:
|
|
Core 8
|
|
/sys/class/hwmon/hwmon4/temp14_label:
|
|
Core 12
|
|
/sys/class/hwmon/hwmon4/temp18_label:
|
|
Core 16
|
|
/sys/class/hwmon/hwmon4/temp22_label:
|
|
Core 20
|
|
/sys/class/hwmon/hwmon4/temp26_label:
|
|
Core 24
|
|
/sys/class/hwmon/hwmon4/temp30_label:
|
|
Core 28
|
|
/sys/class/hwmon/hwmon4/temp34_label:
|
|
Core 32
|
|
/sys/class/hwmon/hwmon4/temp35_label:
|
|
Core 33
|
|
/sys/class/hwmon/hwmon4/temp36_label:
|
|
Core 34
|
|
/sys/class/hwmon/hwmon4/temp37_label:
|
|
Core 35
|
|
/sys/class/hwmon/hwmon4/temp38_label:
|
|
Core 36
|
|
/sys/class/hwmon/hwmon4/temp39_label:
|
|
Core 37
|
|
/sys/class/hwmon/hwmon4/temp40_label:
|
|
Core 38
|
|
/sys/class/hwmon/hwmon4/temp41_label:
|
|
Core 39
|
|
/sys/class/hwmon/hwmon4/temp42_label:
|
|
Core 40
|
|
/sys/class/hwmon/hwmon4/temp43_label:
|
|
Core 41
|
|
/sys/class/hwmon/hwmon4/temp44_label:
|
|
Core 42
|
|
/sys/class/hwmon/hwmon4/temp45_label:
|
|
Core 43
|
|
/sys/class/hwmon/hwmon4/temp46_label:
|
|
Core 44
|
|
/sys/class/hwmon/hwmon4/temp47_label:
|
|
Core 45
|
|
/sys/class/hwmon/hwmon4/temp48_label:
|
|
Core 46
|
|
/sys/class/hwmon/hwmon4/temp49_label:
|
|
Core 47
|
|
```
|
|
|
|
Notice something? I do:
|
|
|
|
- `temp1_label` is `Package id 0` - what is that?
|
|
- The core numbers make no sense. This device only has 32 cores, there shouldn't
|
|
be any gaps within numbers and the numbers shouldn't go that high.
|
|
|
|
The explanation for the first point is simple, looking at the kernel
|
|
[`coretemp.c`](https://github.com/torvalds/linux/blob/ec7714e4947909190ffb3041a03311a975350fe0/drivers/hwmon/coretemp.c#L348)
|
|
implementation, we can see that it is the temperature of the CPU as a whole:
|
|
|
|
```c
|
|
static ssize_t show_label(
|
|
struct device *dev,
|
|
struct device_attribute *devattr,
|
|
char *buf
|
|
) {
|
|
struct platform_data *pdata = dev_get_drvdata(dev);
|
|
struct temp_data *tdata = container_of(devattr, struct temp_data, sd_attrs[ATTR_LABEL]);
|
|
|
|
if (is_pkg_temp_data(tdata))
|
|
return sprintf(buf, "Package id %u\n", pdata->pkg_id);
|
|
|
|
return sprintf(buf, "Core %u\n", tdata->cpu_core_id);
|
|
}
|
|
```
|
|
|
|
That leaves us the second question. Why do we have CPUs 1, 2, 6, 10, 14, 18, 22,
|
|
26, 30, 34-49, instead of the expected 0-31?
|
|
|
|
It turns out that CPU It's common for chip manufacturers to disable faulty or
|
|
degraded cores before shipping. Or if a lower tier SKU[^Stock Keeping Unit] is
|
|
selling more, cores of higher tier SKUs are disabled to match expectations.
|
|
|
|
This process is called `binning`, and it exists to not waste silicon. Since
|
|
silicon manufacturing is not perfect, some CPUs are more faulty, thus slower
|
|
than others. The process determines how faulty a CPU is and sorts them into
|
|
"bins". This is also why lower tier CPUs who are almost exactly the same as
|
|
higher tier CPUs exist.
|
|
|
|
The likelihood of faulty silicon also increases with the smaller the
|
|
architecture size gets (the Apple M4 is 4nm, which is crazy), so this method of
|
|
recycling worse chips is becoming much more valuable by the day.
|
|
|
|
So, in summary the 32 core CPU I was testing this on was most likely just the 64
|
|
core version with a bunch of the cores disabled.
|