Thermal reading wrong

I seem to be getting weird readings from the CPU temperature sensors:

  root@amlogic-s905x:~# while true; do sleep 1; cat /sys/class/thermal/thermal_zone0/temp ; done
86000
71000
71000
76000
75000
71000
86000
86000
-1000
86000

This is also causing a lot of kernel noise on the console, like

[ 45.362429@0] thermal thermal_zone0: temp:-1000, hyst:5000, trip_temp:85000,

My setup:
Linux ‘ubuntu’ branch from https://github.com/khadas/linux

Is anyone else seeing this?

You can comment out this print statement for now; we will investigate and then commit a fix to GitHub:

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index b934f3c..45dd264 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -439,9 +439,9 @@ static void handle_critical_trips(struct thermal_zone_device
                        tz->enter_hot = 0;
                else
                        tz->enter_hot++;
-               dev_info(&tz->device,
-                        "temp:%d, hyst:%ld, trip_temp:%ld, hot:%d\n",
-                        tz->temperature, hyst, trip_temp, tz->enter_hot);
+               //dev_emerg(&tz->device,
+                        //"temp:%d, hyst:%ld, trip_temp:%ld, hot:%d\n",
+                        //tz->temperature, hyst, trip_temp, tz->enter_hot);
                if (tz->ops->notify)
                        tz->ops->notify(tz, trip, trip_type);
        }

Thanks for the quick answer. ‘dmesg -n 5’ does the job, too. I’m also seeing this with the default Android setup the board came with.
I ran some more tests to see whether the temperature values correlate with the CPU heat at all. When all cores are at 100% load, some of the values tend to increase, but there are still false ones. Besides, they are likely way too high, as if they were in °F. The occasional value of -1000 is confusing…
This is not high priority for me; I just wanted to know whether a sensor is broken or my DTS is flaky.
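
On the units question: the Linux thermal sysfs interface reports `temp` in millidegrees Celsius, so 86000 would mean 86 °C, not a Fahrenheit value. Below is a minimal shell sketch for making the raw values readable. The helper name `format_temp` is hypothetical, and treating every negative reading as a sensor error is an assumption based on the -1000 values seen in this thread.

```shell
#!/bin/sh
# format_temp is a hypothetical helper name, not part of any existing tool.
# It converts a raw thermal_zone reading (millidegrees Celsius) to a string.
format_temp() {
    raw=$1
    if [ "$raw" -lt 0 ]; then
        # Assumption: negative readings such as -1000 are sensor errors.
        echo "invalid ($raw)"
    else
        # Integer arithmetic only: whole degrees plus one decimal digit.
        echo "$((raw / 1000)).$((raw % 1000 / 100)) C"
    fi
}

# Example call on a live system:
# format_temp "$(cat /sys/class/thermal/thermal_zone0/temp)"
```

With this, a reading of 43500 is shown as "43.5 C", and the error value stands out instead of looking like a temperature.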

I see the same log output here on Ubuntu, but not on Android.

We will figure it out ASAP when we are back from the Chinese Spring Festival.

Hi, M.S.:
We’ve just figured out the issue and have committed the fix to our GitHub. Just sync the u-boot source code to resolve it.

If your problem has been fixed, please use the Problem Solved button to mark it as solved.

Good luck & thanks!

root@Khadas:~# while true; do sleep 1; cat /sys/class/thermal/thermal_zone0/temp ;done
44000
44000
44000
44000
44000
44000
44000
44000
44000
44000
44000
43000
44000
44000
44000
44000
44000
44000
44000
44000
43000
43000
43000
43000
43000
43000
43000
43000
43000
44000
44000
44000
43000
44000
43000
44000
44000
43000
43000
43000
43000
43000
44000
43000
44000
44000
43000
43000
43000
44000
43000
43000
43000
43000
44000
44000
43000
44000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
44000
44000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
44000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000
43000

...

44000
43000
43000
^C
root@Khadas:~# 

I added this patch to my kernel (S905X branch) to correct the temperature output. I tested it on an S905 TV box (not a VIM) and no longer see negative values. However, clearly overstated values still slip into the stream occasionally. For example, the readings stay around 46-47 degrees and then a single line shows 62 degrees.

For VIM: this is not related to the kernel but to u-boot, which should enable SARADC for the temperature sensor.

Hi Gouwa,

thanks for that hint, indeed my u-boot throws an error on the saradc command. Will look into fixing this. Let’s consider it solved.

Greetings & thanks again!

Marc sent me an improved patch that corrects the erroneous -1000 output. I built a kernel with this patch and checked it; the output is now correct.

I’ve checked the patch you committed, and I don’t think it is a good way to handle this, because:

  • the patch just reuses the last temperature value whenever the error value (-1000) is read
  • I think the right way to solve it is to figure out why the error value is returned in the first place.

PS: enabling SARADC in u-boot fixes the error temperature value issue.
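
To make the "reuse the last value" idea concrete, here is a minimal userspace sketch in shell. It is not Marc’s kernel patch, only an illustration of the same filtering approach applied to a stream of raw readings. The function name `filter_temps` is hypothetical, and the error value -1000 is taken from the readings posted in this thread.

```shell
#!/bin/sh
# filter_temps is a hypothetical name; this is an illustration, not the kernel patch.
# It reads one raw reading per line on stdin and prints a filtered stream:
# an error reading is replaced by the last good reading, and it is dropped
# if no good reading has been seen yet.
filter_temps() {
    error_value=-1000   # error value as reported in this thread
    last_good=""
    while read -r raw; do
        if [ "$raw" = "$error_value" ]; then
            if [ -n "$last_good" ]; then
                echo "$last_good"
            fi
        else
            last_good=$raw
            echo "$raw"
        fi
    done
}

# Example call on a live system:
# while true; do sleep 1; cat /sys/class/thermal/thermal_zone0/temp; done | filter_temps
```

Note that a filter like this only hides the symptom; it does not explain why the sensor returns the error value.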

IMHO (I am not imposing anything, only expressing my point of view): do you think that simply removing the variable (data) from the code is better than discarding only the incorrect value? For your information (from experience with many platforms), erroneous values do not appear that often and the temperature does not jump instantly, so using the previous value is better than using a wrong one. I think that, until this error is fully resolved, Marc’s patch is preferable to dropping the variable.

I wasn’t objecting to the way the error values are handled:

  • Using the last value when an error value is read is a good and valid approach
  • But in this case, I mean the best way is to figure out why a wrong temperature value (-1000) is returned at all
  • I’ve tested here and found that the error value no longer appears once SARADC is enabled in u-boot

Hi Gouwa, balbes150,

The patch I implemented is indeed not a definitive measure. It is a temporary workaround. However, on my box, get_cpu_temp in drivers/amlogic/thermal/aml_thermal_hw.c returns -1000 every now and then. This is some kind of error.
The unfortunate side effect is that it triggers the trip code and throttles the CPU down for no good reason.
So the fix is a hack, and one should rather figure out why we get the error value every now and then. On the other hand, my system runs much more smoothly now. Note that the -1000 can occur in sequences of cool temperatures around 40 degrees, yet the code treats -1000 as hot.
Perhaps a comment in the code indicating that this is a workaround would not hurt.

Thanks

Can you try enabling SARADC in U-Boot? I think this is the solution.

You mean I should edit the DTS file?

Nope, just run ‘saradc open 1’ in u-boot before loading the kernel image.

If you are running on a Khadas VIM, just update to the newest u-boot to fix it:

Thanks for your kind help. I am not a u-boot expert.
I run Gentoo on an S905 minimx-g and use the aml_autoscript and s905_autoscript in /boot kindly provided by balbes.
I have no clue which one to change or how. Perhaps balbes can help.
If the change can be made in s905_autoscript.cmd, then I can try it (if I know exactly what to write there). Sorry for the newbie question…
Here is all I see about saradc:
% fw_printenv | grep -i saradc
upgrade_key=saradc open 0; if saradc get_in_range 0x0 0x50; then echo detect upgrade key; run update;fi;

As you can see, there are three ADC channels on the S905X. The one you found (channel 0) is for the upgrade button (the Function button on VIM), but I guess the temperature sensor of the S905X is on channel 1.

For Armbian, you can try the following change in the autoscript:

Current:

setenv boot_start booti ${kernel_loadaddr} ${initrd_loadaddr} ${dtb_mem_addr}

New:

setenv boot_start "saradc open 1; booti ${kernel_loadaddr} ${initrd_loadaddr} ${dtb_mem_addr}"

I haven’t tested the above yet, and there may be a better way to achieve this; treat it as a reference only.

Good luck.

Thanks!
I’ll try that, and I will add a dev_info as a test to check whether -1000 ever occurs again.
I’ll keep you informed.
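
As a userspace alternative to an extra dev_info, the check could also be done by sampling the sysfs value and counting error readings. The sketch below assumes the error value is exactly -1000, as seen earlier in this thread, and `count_error_readings` is a hypothetical helper name.

```shell
#!/bin/sh
# count_error_readings is a hypothetical helper name.
# It reads one raw reading per line on stdin and prints how many lines
# are exactly -1000 (the error value reported in this thread).
count_error_readings() {
    # -c prints the number of matching lines, -x matches whole lines only,
    # -e is needed because the pattern starts with a dash.
    # grep exits non-zero when the count is 0, so that status is ignored.
    grep -c -x -e '-1000' || true
}

# Example call on a live system (60 samples, one per second):
# for i in $(seq 60); do cat /sys/class/thermal/thermal_zone0/temp; sleep 1; done | count_error_readings
```

A result of 0 over a long enough sampling period would suggest the ‘saradc open 1’ change had the intended effect, although it cannot prove the error never occurs.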

Note again that I’m not sure whether this approach works for the S905; I have only tested it on VIM/S905X.