Vim kernel panic under load with 3.14.29

Hi,
I have just received my VIM Pro. I have tried some compilations with make -j4 and some runs of stress -c 4 on the Ubuntu desktop 3.14.29 image (the https://github.com/150balbes version), booting both from mmc and from ssd.

Under CPU load, whenever the temperature goes above 69-70 degrees, I always get a kernel panic.
With Android the board works fine.

This is the message (gpufreq_get_requested_power appears in it; is the cause related to the GPU?):

[ 601.174933@0] [] kthread_data+0x24/0x2c
[ 601.180280@0] [] __schedule+0x4a4/0x654
[ 601.185624@0] [] schedule+0x2c/0x80
[ 601.190628@0] [] do_exit+0x584/0x974
[ 601.195716@0] [] die+0x18c/0x1a8
[ 601.200460@0] [] arm64_notify_die+0x40/0x7c
[ 601.206151@0] [] bad_mode+0x94/0xa8
[ 601.211156@0] [] gpufreq_get_requested_power+0x48/0xac
[ 601.217797@0] [] power_allocator_throttle+0x304/0x5d8
[ 601.224350@0] [] handle_thermal_trip+0x5c/0x1f8
[ 601.230387@0] [] thermal_zone_device_update+0x6c/0xb0
[ 601.236943@0] [] thermal_zone_device_check+0x1c/0x2c
[ 601.243414@0] [] process_one_work+0x148/0x438
[ 601.249276@0] [] worker_thread+0x140/0x

and

[ 599.140459@0] Call trace:
[ 599.143048@0] [< (null)>] (null)
[ 599.147887@0] [] power_allocator_throttle+0x304/0x5d8
[ 599.154438@0] [] handle_thermal_trip+0x5c/0x1f8
[ 599.160473@0] [] thermal_zone_device_update+0x6c/0xb0
[ 599.167028@0] [] thermal_zone_device_check+0x1c/0x2c
[ 599.173499@0] [] process_one_work+0x148/0x438
[ 599.179363@0] [] worker_thread+0x140/0x3d0
[ 599.184969@0] [] kthread+0xd8/0xf0
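
To compare the two traces frame by frame, the symbol names can be pulled out with a small parser. This is only a sketch: it assumes the line format exactly as pasted in this thread (addresses stripped to `[]`), and the helper names `parse_frame` and `common_frames` are made up for illustration.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE" after a closing bracket. The size may be
# empty because one pasted line is cut off after "/0x".
FRAME_RE = re.compile(
    r"\]\s+(?P<sym>[A-Za-z_]\w*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]*)"
)


def parse_frame(line):
    """Return (symbol, offset, size) for a trace line, or None if no frame."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    size_text = match.group("size")
    size = int(size_text, 16) if size_text else None
    return match.group("sym"), int(match.group("off"), 16), size


def common_frames(trace_a, trace_b):
    """List the symbols of trace_a (in order) that also appear in trace_b."""
    symbols_b = {frame[0] for frame in map(parse_frame, trace_b) if frame}
    return [
        frame[0]
        for frame in map(parse_frame, trace_a)
        if frame and frame[0] in symbols_b
    ]
```

Feeding both pasted traces to `common_frames` would show which functions they share, which helps to see where the two crashes take the same path.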

This triggered the thermal protection. Add a cooling system.

I also got this problem a lot when running the Armbian rootfs, but never on the Ubuntu-Mate image built by ourselves.

Did you update to the latest U-Boot?

Did you notice the topic? We’ve solved this issue.

As I understand it, the user is running an intensive workload, so the actual temperature rises above 70 degrees, and he doesn’t have a fan or heatsink on the VIM.

It’s not only about that. We found that if U-Boot doesn’t enable the SARADC, the thermal sensor reports a wrong temperature value, because the thermal sensor built into the CPU is based on the SARADC.

We’ve compared the new Ubuntu-mate ROM (to be released soon) with the old Armbian ROM:

  • Ubuntu-mate: played video for around 48 hours without any heat sink or cooling fan; the VIM didn’t crash
  • Armbian ROM built before: crashed often

We will test the Armbian ROM again with the new U-Boot to verify that.

What is the physical temperature on the surface of the processor (measured with an external laser thermometer)?
A thermal imager would be a good way to monitor the temperature, since it can display all temperature zones on the processor (CPU, GPU, etc.).

Can I get this version of U-Boot?

Check our Github for that.

I think the physical temperature we can measure on the CPU surface differs from the reading of the thermal sensor (inside the CPU). Basically, the surface temperature is around 55-60 degrees when playing videos, and the ambient temperature is around 25-30 degrees.

Can you write up the details of the test (settings)?

  1. The state of the case (original as shipped, or not)
  2. Screen resolution (is the desktop resolution 720 or higher?)
  3. Size of the played video (windowed or full screen), and the program used to play the test video
  4. What the temperature monitoring outputs in Ubuntu
  5. What processor frequency the monitoring shows
  6. What video acceleration was used (if any): software or hardware
  7. What temperature the monitoring showed

I am using this version:
http://www.mediafire.com/file/8dyn2y9z9hz1f13/Vim_Uboot_170121.7z

I used the command: stress -c 4
Compilation using only 1 or 2 cores is fine, and stress -c 2 also works fine.

While monitoring with armbianmonitor -m during these operations, the clock was 1.51 GHz. When the temperature reaches 69-70 degrees Celsius there is always a kernel panic.
I don’t know if these values are the real temperature.
The CPU has no heatsink, but I think the kernel should not panic at 69 degrees Celsius. Or is something triggering the panic because the temperature is rising too fast?
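
To cross-check what the monitoring tool prints, the raw values can also be read straight from sysfs while the stress test runs. The sketch below is only an illustration: it assumes the usual Linux locations `/sys/class/thermal/thermal_zone0/temp` and `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq`, which may be numbered differently (or report different units) on this 3.14.29 kernel, and the function names are hypothetical.

```python
import time
from pathlib import Path

# Assumed sysfs locations; the zone and CPU indices may differ on this board.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")
FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def parse_temp_celsius(raw):
    """Convert a sysfs temperature string to degrees Celsius.

    Mainline kernels report millidegrees, some vendor kernels report whole
    degrees. Assumption: magnitudes above 1000 are millidegrees.
    """
    value = int(raw.strip())
    return value / 1000.0 if abs(value) > 1000 else float(value)


def parse_freq_mhz(raw):
    """Convert a cpufreq value (reported in kHz) to MHz."""
    return int(raw.strip()) / 1000.0


def sample(temp_path=TEMP_PATH, freq_path=FREQ_PATH):
    """Return (temperature in C, frequency in MHz), or None if unreadable."""
    try:
        temp = parse_temp_celsius(Path(temp_path).read_text())
        freq = parse_freq_mhz(Path(freq_path).read_text())
    except (OSError, ValueError):
        return None
    return temp, freq


def monitor(samples=60, interval=1.0):
    """Print one reading per interval, flushing so the last line survives."""
    for _ in range(samples):
        reading = sample()
        if reading is None:
            print("sensor files not readable", flush=True)
            return
        print(f"{reading[0]:.1f} C  {reading[1]:.0f} MHz", flush=True)
        time.sleep(interval)
```

Calling `monitor()` over a serial console or SSH session while `stress -c 4` runs in another shell would show the last temperature and frequency logged before the panic.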

I don’t think playing a video is a good test method for this; software decoding usually can’t use all the cores at the same time.
A stress test such as compiling a kernel or a big library with make -j4 or -j8 loads the system better (the stress tool is fine too).

If you have the opportunity, try testing the behavior of this image (when run from external media) in two ways:

  1. Run the system WITHOUT a “dtb.img” file in the root of the FAT partition, then run the stress test.
  2. Copy the file “gxl_p212_2g.dtb” from “dtb” to the root, rename it to “dtb.img”, boot this variant, and run the stress test.

Armbian_5.24_S9xxx_mate_Ubuntu_xenial_3.14.29_desktop_20170205.img.xz

I haven’t verified this, it is only my assumption: the crash happens because, upon reaching the first threshold (70), the system should automatically begin reducing the frequency and/or the number of active cores to reduce the load and limit the temperature rise. But because of an error in the code (mine, or an unsuitable kernel), it crashes instead.
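
The assumption above can be illustrated with a toy model. In the second trace posted earlier, the top frame is `(null)` and its caller is `power_allocator_throttle`, which would be consistent with (but does not prove) a jump through an unset callback once the trip point is reached. The sketch below is not the kernel's code: the real power allocator governor is more involved, and `CoolingDevice`, `throttle` and the numbers are hypothetical. It only shows why nothing goes wrong below the threshold and why a missing callback has to be guarded.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CoolingDevice:
    name: str
    # Hypothetical callback returning requested power in mW.
    # None models a driver that never registered one.
    get_requested_power: Optional[Callable[[], int]] = None


def total_requested_power(devices):
    """Sum the requested power, skipping devices without a callback."""
    total = 0
    for dev in devices:
        if dev.get_requested_power is None:
            continue  # calling it unguarded is the analogue of a NULL jump
        total += dev.get_requested_power()
    return total


def throttle(temp_c, trip_c, devices, budget_mw):
    """Return a per-device power grant, or None below the trip point."""
    if temp_c < trip_c:
        return None  # the allocation path is never entered
    requested = total_requested_power(devices)
    if requested == 0:
        return {}
    # Scale every request down proportionally so the sum fits the budget.
    scale = min(1.0, budget_mw / requested)
    return {
        dev.name: int(dev.get_requested_power() * scale)
        for dev in devices
        if dev.get_requested_power is not None
    }
```

With a trip point of 70, `throttle(65, 70, ...)` returns `None` without touching any callback, which matches the observation that everything is fine until the temperature reaches about 70.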

Hi, I have tried Armbian_5.24_S9xxx_mate_Ubuntu_xenial_3.14.29_desktop_20170205. Without dtb.img I get the panic at 70 (monitored with armbianmonitor -m); with gxl_p212_2g.dtb the bootloader didn’t load the kernel.
I am using this bootloader: http://www.mediafire.com/file/8dyn2y9z9hz1f13/Vim_Uboot_170121.7z

Now I have tried a small heatsink; the temperature rises more slowly, but when it reaches 70 the panic still occurs.

Where can I find the kernel branch/tag? If you give me the branch and the repo I can try to do some debugging (I don’t usually work on the kernel side, but I can take a quick look).

Do I have to use your /lib repo to build the kernel and bootloader?

[ 601.211156@0] [] gpufreq_get_requested_power+0x48/0xac
[ 601.217797@0] [] power_allocator_throttle+0x304/0x5d8

Hi, Pier:
Can you kindly have a try with the latest Ubuntu-server? Check the Topic for details.

As I mentioned, I’ve tested a lot and didn’t get the issue.

Visit the Khadas Github for that, and note that the ubuntu branch is for Linux distros.

As for Balbes150’s Armbian ROM, you can visit his Github for details.

Sure, I’ll try the server version today.

bad_mode in the stack does not look good.
I have seen something similar on other hardware.
arch/arm64/kernel/traps.c says:
bad_mode handles the impossible case in the exception vector.

Hi, Tasi:
Sorry, I don’t quite understand. What do you mean?