VIM3 running totally unstable / keeps freezing / crashing

mcbain · August 14, 2020, 1:16am

Dear all,

I’m having real trouble getting my VIM3 to run stably. It crashes or freezes completely at least once every 1 or 2 days.

Two days ago I was able to gather the following Kernel message dump when it froze:

[21900.298170] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
[21900.301313] Mem abort info:
[21900.304072]   ESR = 0x96000004
[21900.307092]   EC = 0x25: DABT (current EL), IL = 32 bits
[21900.312352]   SET = 0, FnV = 0
[21900.315370]   EA = 0, S1PTW = 0
[21900.318474] Data abort info:
[21900.321322]   ISV = 0, ISS = 0x00000004
[21900.325116]   CM = 0, WnR = 0
[21900.328052] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000920d5000
[21900.334432] [00000000000000a8] pgd=0000000000000000
[21900.339267] Internal error: Oops: 96000004 [#1] SMP
[21900.344093] Modules linked in: binfmt_misc(E) veth(E) xt_nat(E) xt_tcpudp(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_ondemand(E) cpufreq_powersave(E) macvlan(E) xfs(E) btsdio(E) hci_uart(E) btqca(E) btrtl(E) brcmfmac(E) btbcm(E) btintel(E) brcmutil(E) bluetooth(E) cfg80211(E) ecdh_generic(E) ecc(E) nvmem_meson_efuse(E) ip_tables(E) x_tables(E) meson_mx_sdio(E) rtc_meson_vrtc(E) meson_rng(E) rng_core(E) dwmac_generic(E) [last unloaded: reset_meson_audio_arb]
[21900.406631] CPU: 4 PID: 25622 Comm: kworker/4:2 Tainted: G            E     5.7.0 #1
[21900.414386] Hardware name: amlogic w400/w400, BIOS 2020.04 08/03/2020
[21900.420784] Workqueue: events dbs_work_handler
[21900.425170] pstate: 60000085 (nZCv daIf -PAN -UAO)
[21900.429919] pc : regmap_update_bits_base+0x70/0x9c
[21900.434658] lr : regmap_update_bits_base+0x6c/0x9c
[21900.439398] sp : ffff8000128bba50
[21900.442675] x29: ffff8000128bba50 x28: 0000000000000000
[21900.447936] x27: ffff00007ff16d00 x26: ffff00007ff14de0
[21900.453198] x25: 0000000000000000 x24: 0000000000000000
[21900.458459] x23: 0000000000000000 x22: 0000000000000090
[21900.463720] x21: 00000000040003f0 x20: 0000000000000208
[21900.468981] x19: ffff0000d9ebf400 x18: 0000000000000000
[21900.474242] x17: 0000000000000000 x16: 0000000000000000
[21900.479504] x15: 0000000000000000 x14: 0000000000b71b00
[21900.484765] x13: 00000000016e3600 x12: 0000000000000000
[21900.490026] x11: 0000000000000000 x10: 00000000ffffffc3
[21900.495288] x9 : ffff800010811584 x8 : 0000000000000006
[21900.500549] x7 : 0000000000000000 x6 : 0000000000000000
[21900.505810] x5 : 0000000000000000 x4 : 0000000000000000
[21900.511071] x3 : ffff8000108112c8 x2 : 245cb3377b54f800
[21900.516332] x1 : 0000000000000090 x0 : 0000000000000000
[21900.521594] Call trace:
[21900.524017]  regmap_update_bits_base+0x70/0x9c
[21900.528416]  meson_clk_cpu_dyndiv_set_rate+0xf8/0x110
[21900.533414]  clk_change_rate+0x160/0x2d0
[21900.537294]  clk_change_rate+0x260/0x2d0
[21900.541176]  clk_core_set_rate_nolock+0x16c/0x19c
[21900.545833]  clk_set_rate+0x44/0x78
[21900.549284]  _generic_set_opp_clk_only+0x20/0x58
[21900.553854]  dev_pm_opp_set_rate+0x450/0x484
[21900.558080]  set_target+0x48/0x78
[21900.561361]  __cpufreq_driver_target+0x220/0x2f4
[21900.565937]  od_dbs_update+0xec/0x170 [cpufreq_ondemand]
[21900.571194]  dbs_work_handler+0x48/0x80
[21900.574986]  process_one_work+0x1b0/0x2b0
[21900.578952]  worker_thread+0x1ec/0x284
[21900.582664]  kthread+0xe0/0xf0
[21900.585682]  ret_from_fork+0x10/0x30
[21900.589224] Code: 2a1403e1 aa1303e0 97fffbb0 2a0003f4 (a9428261)
[21900.595256] ---[ end trace 5557c55222dffdc6 ]---

It looks like it has something to do with the cpufreq driver (ondemand) crashing one of the CPUs and freezing the machine.
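A quick way to check that impression against a saved dump is to pull the function names out of the "Call trace:" section and flag the ones that look clock or cpufreq related. The following is only a minimal Python sketch: the helper names and the marker substrings are assumptions made for illustration, and the sample is a shortened excerpt of the trace above.

```python
import re

# A frame line looks like "[21900.524017]  symbol+0x70/0x9c [module]".
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Assumption: these substrings are a good enough hint for clock/cpufreq code.
SUSPECT_MARKERS = ("cpufreq", "dbs_", "clk_", "opp_")


def call_trace_symbols(dmesg_text):
    """Return the function names listed after each 'Call trace:' line, in order."""
    symbols = []
    in_trace = False
    for line in dmesg_text.splitlines():
        if "Call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.match(line)
            if match:
                symbols.append(match.group(1))
            else:
                # First non-frame line (e.g. "Code: ...") ends the trace.
                in_trace = False
    return symbols


def suspect_frames(symbols):
    """Keep only the frames whose name contains one of the marker substrings."""
    return [s for s in symbols if any(m in s for m in SUSPECT_MARKERS)]


SAMPLE = """\
[21900.521594] Call trace:
[21900.524017]  regmap_update_bits_base+0x70/0x9c
[21900.528416]  meson_clk_cpu_dyndiv_set_rate+0xf8/0x110
[21900.533414]  clk_change_rate+0x160/0x2d0
[21900.553854]  dev_pm_opp_set_rate+0x450/0x484
[21900.561361]  __cpufreq_driver_target+0x220/0x2f4
[21900.565937]  od_dbs_update+0xec/0x170 [cpufreq_ondemand]
[21900.571194]  dbs_work_handler+0x48/0x80
[21900.589224] Code: 2a1403e1 aa1303e0 97fffbb0 2a0003f4 (a9428261)
"""

if __name__ == "__main__":
    frames = call_trace_symbols(SAMPLE)
    print("frames:  ", frames)
    print("suspects:", suspect_frames(frames))
```

In this excerpt every frame below regmap_update_bits_base is flagged, which is consistent with the crash being reached through the ondemand governor's frequency change. It does not prove where the underlying bug is.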

Just a few minutes ago the machine froze again (the heartbeat LED stopped blinking). Luckily I had the serial console connected, which gave the following kernel message dump:

[146491.652619] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[146491.653436] rcu:    2-...0: (5 ticks this GP) idle=96a/1/0x4000000000000000 softirq=712849/712849 fqs=2588
[146491.662989] rcu:    5-...0: (4 ticks this GP) idle=7e6/1/0x4000000000000000 softirq=652052/652053 fqs=2589
[146491.672531]         (detected by 1, t=5256 jiffies, g=2398493, q=291)
[146491.678376] Task dump for CPU 2:
[146491.681652] kworker/2:2     R  running task        0  3705      2 0x0000002a
[146491.688789] Workqueue: events dbs_work_handler
[146491.693203] Call trace:
[146491.695748]  __switch_to+0xd0/0x124
[146491.699263]  0x0
[146491.701150] Task dump for CPU 5:
[146491.704419] node            R  running task        0  1325    810 0x00000002
[146491.711485] Call trace:
[146491.714025]  __switch_to+0xd0/0x124
[146491.717544]  0xffff0000dad9e040
[146554.672647] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[146554.673468] rcu:    2-...0: (5 ticks this GP) idle=96a/1/0x4000000000000000 softirq=712849/712849 fqs=4219
[146554.683018] rcu:    5-...0: (4 ticks this GP) idle=7e6/1/0x4000000000000000 softirq=652052/652053 fqs=4219
[146554.692562]         (detected by 0, t=21007 jiffies, g=2398493, q=494)
[146554.698490] Task dump for CPU 2:
[146554.701766] kworker/2:2     R  running task        0  3705      2 0x0000002a
[146554.708900] Workqueue: events dbs_work_handler
[146554.713317] Call trace:
[146554.715861]  __switch_to+0xd0/0x124
[146554.719376]  0x0
[146554.721262] Task dump for CPU 5:
[146554.724533] node            R  running task        0  1325    810 0x00000002
[146554.731598] Call trace:
[146554.734137]  __switch_to+0xd0/0x124
[146554.737657]  0xffff0000dad9e040
[146554.740885] rcu: rcu_preempt kthread starved for 12372 jiffies! g2398493 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
[146554.751525] rcu: RCU grace-period kthread stack dump:
[146554.756616] rcu_preempt     I    0    10      2 0x00000028
[146554.762130] Call trace:
[146554.764670]  __switch_to+0xd0/0x124
[146554.768211]  __schedule+0x398/0x444
[146554.771739]  schedule+0x84/0xd4
[146554.774940]  schedule_timeout+0xc8/0xf0
[146554.778820]  rcu_gp_kthread+0x42c/0x808
[146554.782696]  kthread+0xec/0xfc
[146554.785797]  ret_from_fork+0x10/0x18
[146617.692607] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[146617.693426] rcu:    2-...0: (5 ticks this GP) idle=96a/1/0x4000000000000000 softirq=712849/712849 fqs=12083
[146617.703069] rcu:    5-...0: (4 ticks this GP) idle=7e6/1/0x4000000000000000 softirq=652052/652053 fqs=12084
[146617.712700]         (detected by 1, t=36764 jiffies, g=2398493, q=512)
[146617.718631] Task dump for CPU 2:
[146617.721906] kworker/2:2     R  running task        0  3705      2 0x0000002a
[146617.729036] Workqueue: events dbs_work_handler
[146617.733458] Call trace:
[146617.736000]  __switch_to+0xd0/0x124
[146617.739516]  0x0
[146617.741402] Task dump for CPU 5:
[146617.744674] node            R  running task        0  1325    810 0x00000002
[146617.751739] Call trace:
[146617.754279]  __switch_to+0xd0/0x124
[146617.757796]  0xffff0000dad9e040
[146680.712607] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[146680.713418] rcu:    2-...0: (5 ticks this GP) idle=96a/1/0x4000000000000000 softirq=712849/712849 fqs=19953
[146680.723061] rcu:    5-...0: (4 ticks this GP) idle=7e6/1/0x4000000000000000 softirq=652052/652053 fqs=19954
[146680.732693]         (detected by 1, t=52518 jiffies, g=2398493, q=527)
[146680.738624] Task dump for CPU 2:
[146680.741899] kworker/2:2     R  running task        0  3705      2 0x0000002a
[146680.749028] Workqueue: events dbs_work_handler
[146680.753451] Call trace:
[146680.755997]  __switch_to+0xd0/0x124
[146680.759509]  0x0
[146680.761397] Task dump for CPU 5:
[146680.764666] node            R  running task        0  1325    810 0x00000002
[146680.771733] Call trace:
[146680.774273]  __switch_to+0xd0/0x124
[146680.777790]  0xffff0000dad9e040

Again, it looks like it has something to do with the CPU.
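To see which work each stalled CPU was doing, the "Task dump for CPU N:" sections of such a stall report can be scanned for a "Workqueue:" line. This is a rough Python sketch, not an official tool: the function name is made up, and the sample is copied from the first stall report above.

```python
import re

TS = r"^\[\s*\d+\.\d+\]\s*"  # leading "[146491.678376] " timestamp
CPU_RE = re.compile(TS + r"Task dump for CPU (\d+):")
WQ_RE = re.compile(TS + r"Workqueue: \S+ (\S+)")
RCU_RE = re.compile(TS + r"rcu:")


def stalled_workqueues(dmesg_text):
    """Map each dumped CPU to its workqueue function, or None if none is shown."""
    result = {}
    current = None
    for line in dmesg_text.splitlines():
        if RCU_RE.match(line):
            current = None  # a new rcu report section starts
            continue
        match = CPU_RE.match(line)
        if match:
            current = int(match.group(1))
            result.setdefault(current, None)
            continue
        match = WQ_RE.match(line)
        if match and current is not None:
            result[current] = match.group(1)
    return result


SAMPLE = """\
[146491.652619] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[146491.678376] Task dump for CPU 2:
[146491.681652] kworker/2:2     R  running task        0  3705      2 0x0000002a
[146491.688789] Workqueue: events dbs_work_handler
[146491.693203] Call trace:
[146491.695748]  __switch_to+0xd0/0x124
[146491.699263]  0x0
[146491.701150] Task dump for CPU 5:
[146491.704419] node            R  running task        0  1325    810 0x00000002
"""

if __name__ == "__main__":
    print(stalled_workqueues(SAMPLE))
```

For the sample this reports dbs_work_handler on CPU 2 and no workqueue on CPU 5 (a user process, node), so the governor's work item shows up in this stall just as it does in the first oops.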

Running like this, it really makes no sense for me to use the VIM3 in a real production environment.

I’m using the U-Boot version from here (https://github.com/hyphop/khadas-uboot/releases) and the Linux kernel 5.7.0 from here (https://github.com/hyphop/khadas-linux-kernel) with an unchanged config (defconfig). The system is a clean Debian Buster (debootstrapped) running from the eMMC. The power supply is an original Raspberry Pi 5.1V 3A one.

Does anyone have a clue what’s causing these freezes and how to fix them?

Thanks and best regards
mcbain

numbqq · August 14, 2020, 1:20am

So the image is not one of our official releases. Have you tried our official images?

https://dl.khadas.com/Firmware/VIM3/Ubuntu/SD_USB/

RDFTKV · August 14, 2020, 2:16am

Hello, just to rule it out, I would try a different power supply. Depending on your connected peripherals, 5.1 volts may not be sufficient. If possible, try a supply with a higher voltage, for example the Khadas USB-C PD 24W supply.
Also try a different USB-C cable if available.
Good luck.

Vladimir.v.v · August 14, 2020, 5:48am

hello, your voltage is low, you need 12V 2A

mcbain · August 14, 2020, 10:03am

Okay, thanks for your suggestions. At the next freeze I will switch the USB power supply to a 60W USB-C PD supply (5V 3A, 9V 3A, 12V 3A, 15V 3A, 20V 3A) - that should be sufficient, right?

Of course I can try the official release, but that doesn’t make much sense to me. I hate using prepackaged images on any kind of SBC. I simply want to run a clean, minimalistic headless server (Debian) with a mainline kernel, and the version from hyphop seems to be suitable, doesn’t it?

If it happens again with the changed power supply, I can certainly try the official Ubuntu image on an SD card and see whether it happens there too, but I’d really like to solve the problem instead of changing the system.

Thanks again, will give an update if it happens again!

Vladimir.v.v · August 14, 2020, 10:09am

this one seems the most suitable!

Electr1 · August 14, 2020, 10:24am

You can try using Fenix to build fresh Ubuntu/Debian Linux images from source on your Ubuntu Linux PC…

Vladimir.v.v · August 14, 2020, 10:27am

buddy, his problem is not with the firmware

Electr1 · August 14, 2020, 11:26am

Fenix was recommended for building the firmware locally, so that you don’t have to use prepackaged firmware…

Vladimir.v.v · August 14, 2020, 11:39am

… what follows from this?

mcbain · August 19, 2020, 5:44am

Hi again,

okay, after a few days of testing with 2 different power supplies (which should have plenty of power) and different USB-C cables, everything is still the same:

[384618.539551] Unable to handle kernel write to read-only memory at virtual address ffff8000115b3714
[384618.542881] Mem abort info:
[384618.545725]   ESR = 0x9600004e
[384618.548832]   EC = 0x25: DABT (current EL), IL = 32 bits
[384618.554177]   SET = 0, FnV = 0
[384618.557282]   EA = 0, S1PTW = 0
[384618.560473] Data abort info:
[384618.563406]   ISV = 0, ISS = 0x0000004e
[384618.567287]   CM = 0, WnR = 1
[384618.570309] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000001c50000
[384618.577035] [ffff8000115b3714] pgd=00000000f4806003, pud=00000000f4805003, pmd=0040000001400791
[384618.585751] Internal error: Oops: 9600004e [#1] PREEMPT SMP
[384618.591352] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 br_netfilter bridge stp llc macvlan xfs hci_uart btqca btbcm btintel btsdio brcmfmac bluetooth ecdh_generic ecc cfg80211 brcmutil reset_meson_audio_arb dw_hdmi_cec dw_hdmi_i2s_audio ip_tables x_tables ipv6 nf_defrag_ipv6 meson_mx_sdio meson_rng rng_core rtc_meson_vrtc [last unloaded: rc_core]
[384618.634831] CPU: 5 PID: 4286 Comm: kworker/5:0 Not tainted 5.7.0 #1
[384618.641119] Hardware name: amlogic w400/w400, BIOS 2020.04 08/03/2020
[384618.647610] Workqueue: events dbs_work_handler
[384618.652075] pstate: 60000085 (nZCv daIf -PAN -UAO)
[384618.656909] pc : _raw_spin_unlock_irqrestore+0x0/0x38
[384618.661998] lr : regmap_unlock_spinlock+0x14/0x20
[384618.666733] sp : ffff80001064ba60
[384618.670097] x29: ffff80001064ba60 x28: 0000000000000000 
[384618.675444] x27: ffff0000f2a48ae0 x26: ffff800011d29000 
[384618.680792] x25: 0000000000000000 x24: 0000000000000000 
[384618.686139] x23: 0000000000000000 x22: 0000000000000030 
[384618.691487] x21: 00000000040003f0 x20: 0000000000000000 
[384618.696834] x19: ffff0000f1314800 x18: ffff800011d9d878 
[384618.702182] x17: 0000000000000000 x16: 0000000000000000 
[384618.707529] x15: 00000013de435000 x14: 0000000000b71b00 
[384618.712877] x13: 00000000016e3600 x12: 0000000000000000 
[384618.718224] x11: 0000000000000001 x10: 0000000000000960 
[384618.723572] x9 : ffff80001064b8a0 x8 : 0000000000000006 
[384618.728919] x7 : ffff0000f1314800 x6 : 0000000000000000 
[384618.734266] x5 : 0000000000000000 x4 : 0000000000000000 
[384618.739614] x3 : ffff8000115b370c x2 : 4d181ec4637fa400 
[384618.744962] x1 : 0000000000000000 x0 : ffff0000f1314800 
[384618.750310] Call trace:
[384618.752819]  _raw_spin_unlock_irqrestore+0x0/0x38
[384618.757564]  regmap_update_bits_base+0x74/0x94
[384618.762046]  meson_clk_cpu_dyndiv_set_rate+0xf0/0x108
[384618.767134]  clk_change_rate+0xe4/0x1f8
[384618.771012]  clk_change_rate+0x18c/0x1f8
[384618.774980]  clk_core_set_rate_nolock+0x130/0x160
[384618.779723]  clk_set_rate+0x3c/0x70
[384618.783265]  _generic_set_opp_clk_only+0x20/0x58
[384618.787918]  dev_pm_opp_set_rate+0x3e4/0x434
[384618.792233]  set_target+0x40/0x70
[384618.795593]  __cpufreq_driver_target+0x188/0x230
[384618.800251]  od_dbs_update+0xe4/0x168
[384618.803959]  dbs_work_handler+0x40/0x78
[384618.807849]  process_one_work+0x178/0x1e4
[384618.811896]  worker_thread+0x1e4/0x274
[384618.815692]  kthread+0xec/0xfc
[384618.818798]  ret_from_fork+0x10/0x18
[384618.822424] Code: 97fff483 a8c17bfd d50323bf d65f03c0 (d503233f) 
[384618.828544] ---[ end trace 8dc2eedc19f08c29 ]---
[384618.833206] note: kworker/5:0[4286] exited with preempt_count 1
[409380.232721] ------------[ cut here ]------------
[409380.232809] percpu ref (cgroup_bpf_release_fn) <= 0 (-21) after switching to atomic
[409380.232874] WARNING: CPU: 5 PID: 1 at lib/percpu-refcount.c:163 percpu_ref_switch_to_atomic_rcu+0xa0/0x110
[409380.249166] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 br_netfilter bridge stp llc macvlan xfs hci_uart btqca btbcm btintel btsdio brcmfmac bluetooth ecdh_generic ecc cfg80211 brcmutil reset_meson_audio_arb dw_hdmi_cec dw_hdmi_i2s_audio ip_tables x_tables ipv6 nf_defrag_ipv6 meson_mx_sdio meson_rng rng_core rtc_meson_vrtc [last unloaded: rc_core]
[409380.292644] CPU: 5 PID: 1 Comm: systemd Tainted: G      D           5.7.0 #1
[409380.299708] Hardware name: amlogic w400/w400, BIOS 2020.04 08/03/2020
[409380.306181] pstate: 60000005 (nZCv daif -PAN -UAO)
[409380.311012] pc : percpu_ref_switch_to_atomic_rcu+0xa0/0x110
[409380.316615] lr : percpu_ref_switch_to_atomic_rcu+0xa0/0x110
[409380.322219] sp : ffff80001002be20
[409380.325583] x29: ffff80001002be20 x28: ffff800011d260d0 
[409380.330930] x27: 0000000000000000 x26: 0000000000000000 
[409380.336278] x25: ffff800011d45800 x24: ffff800011d29c50 
[409380.341625] x23: 00007dfeddbf32f8 x22: ffff800011d29970 
[409380.346973] x21: ffffffffffffffea x20: ffff0000d64e6ee0 
[409380.352320] x19: ffff0000d64e6f08 x18: 000000000000002d 
[409380.357667] x17: 0000000000000000 x16: 0000000000000000 
[409380.363015] x15: 000000000000000a x14: ffffffffffffffeb 
[409380.368362] x13: ffff800011e04271 x12: ffffffffffffffff 
[409380.373710] x11: 0000000000000020 x10: 00000000fffffff9 
[409380.379057] x9 : ffff800011e03ebb x8 : 7420676e69686374 
[409380.384405] x7 : 6977732072657466 x6 : 0000000000000000 
[409380.389752] x5 : 00ffffffffffffff x4 : 000000000000000f 
[409380.395100] x3 : 0000000000000000 x2 : 0000000000000000 
[409380.400447] x1 : 4d181ec4637fa400 x0 : 0000000000000000 
[409380.405796] Call trace:
[409380.408305]  percpu_ref_switch_to_atomic_rcu+0xa0/0x110
[409380.413569]  rcu_core+0x2ac/0x3d4
[409380.416927]  rcu_core_si+0x10/0x1c
[409380.420376]  efi_header_end+0x1a4/0x1e4
[409380.424261]  irq_exit+0x58/0xa8
[409380.427452]  __handle_domain_irq+0x70/0xa0
[409380.431590]  gic_handle_irq+0x68/0xa8
[409380.435298]  el0_irq_naked+0x4c/0x54
[409380.438918] ---[ end trace 8dc2eedc19f08c2a ]---

and

[ 6361.838118] Internal error: SP/PC alignment exception: 8a000000 [#1] SMP
[ 6361.839185] Modules linked in: veth(E) xt_nat(E) xt_tcpudp(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_ondemand(E) cpufreq_powersave(E) macvlan(E) xfs(E) hci_uart(E) btqca(E) btrtl(E) btbcm(E) btsdio(E) btintel(E) brcmfmac(E) bluetooth(E) snd_soc_hdmi_codec(E) brcmutil(E) ir_nec_decoder(E) cfg80211(E) ecdh_generic(E) ecc(E) snd_soc_meson_g12a_tohdmitx(E) snd_soc_meson_axg_sound_card(E) rc_khadas(E) reset_meson_audio_arb(E) snd_soc_meson_card_utils(E) snd_soc_meson_codec_glue(E) snd_soc_meson_axg_tdmout(E) snd_soc_meson_axg_tdm_interface(E) snd_soc_meson_axg_frddr(E) snd_soc_meson_axg_tdm_formatter(E) snd_soc_meson_axg_fifo(E) meson_ir(E) rc_core(E) snd_soc_core(E) snd_pcm_dmaengine(E) snd_pcm(E) dw_hdmi_cec(E) dw_hdmi_i2s_audio(E)
[ 6361.839267]  snd_timer(E) snd(E) soundcore(E) nvmem_meson_efuse(E) ip_tables(E) x_tables(E) meson_mx_sdio(E) meson_rng(E) rng_core(E) rtc_meson_vrtc(E) dwmac_generic(E)
[ 6361.940617] CPU: 2 PID: 12134 Comm: kworker/2:1 Tainted: G            E     5.7.0 #1
[ 6361.948375] Hardware name: amlogic w400/w400, BIOS 2020.04 08/03/2020
[ 6361.954771] Workqueue: events dbs_work_handler
[ 6361.959157] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 6361.963902] pc : 0x4153f397ffee72
[ 6361.967181] lr : clk_recalc+0x48/0x68
[ 6361.970799] sp : ffff80001254bac0
[ 6361.974077] x29: ffff80001254bac0 x28: 0000000000000000 
[ 6361.979338] x27: ffff0000b244ba00 x26: ffff0000b244d8e0 
[ 6361.984599] x25: ffff800011389988 x24: 0000000000000000 
[ 6361.989860] x23: 0000000003f940aa x22: 000000003b9ac9f1 
[ 6361.995121] x21: ffff0000b28ed100 x20: 000000003b9ac9f1 
[ 6362.000382] x19: ffff80001254bae0 x18: 0000000000000000 
[ 6362.005644] x17: 0000000000000000 x16: 0000000000000000 
[ 6362.010905] x15: 0000000000000000 x14: 0000000000b71b00 
[ 6362.016166] x13: 00000000016e3600 x12: 0000000000000000 
[ 6362.021427] x11: 0000000000000000 x10: 00000000ffffffc3 
[ 6362.026689] x9 : ffff8000107f7e70 x8 : 0000000000000006 
[ 6362.031950] x7 : 0000000000000000 x6 : 0000000000000000 
[ 6362.037211] x5 : 0000000000000000 x4 : 0000000000000000 
[ 6362.042472] x3 : ffff8000108112c8 x2 : a94153f397ffee72 
[ 6362.047734] x1 : 000000003b9ac9f1 x0 : ffff0000b28c7b00 
[ 6362.052996] Call trace:
[ 6362.055412]  0x4153f397ffee72
[ 6362.058346]  clk_change_rate+0x1c4/0x2d0
[ 6362.062225]  clk_change_rate+0x260/0x2d0
[ 6362.066107]  clk_core_set_rate_nolock+0x16c/0x19c
[ 6362.070764]  clk_set_rate+0x44/0x78
[ 6362.074215]  _generic_set_opp_clk_only+0x20/0x58
[ 6362.078786]  dev_pm_opp_set_rate+0x450/0x484
[ 6362.083012]  set_target+0x48/0x78
[ 6362.086290]  __cpufreq_driver_target+0x220/0x2f4
[ 6362.090866]  od_dbs_update+0xec/0x170 [cpufreq_ondemand]
[ 6362.096123]  dbs_work_handler+0x48/0x80
[ 6362.099918]  process_one_work+0x1b0/0x2b0
[ 6362.103884]  worker_thread+0x1ec/0x284
[ 6362.107596]  kthread+0xe0/0xf0
[ 6362.110613]  ret_from_fork+0x10/0x30
[ 6362.114152] Code: bad PC value
[ 6362.117168] ---[ end trace aec64d809f2af8ed ]---

So it’s getting really frustrating for me, as there seems to be no way to get the VIM3 running stably or even usably for my applications.

I took a deeper look into the Fenix repo, and it seems to me that the U-Boot version, the linux-mainline version and the patches/configs are identical to the repos (hyphop) I linked in my first post. So I really think it will make no difference if I use Fenix to build the image.

Judging by the error / kernel messages, it seems to have something to do with the CPU frequency governor / switching the CPU frequencies. My next try is to stop using the ondemand governor and set it to performance, so that the CPU no longer switches frequencies. I really hate this, but let’s see whether it stops the crashing so I can investigate further.

Does anyone else have some good advice or ideas to fix this?!
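For reference, switching the governor from userspace could look roughly like the Python sketch below. It assumes the kernel exposes the usual cpufreq sysfs layout (/sys/devices/system/cpu/cpufreq/policy*/scaling_governor) and that the script runs as root; set_governor is a hypothetical helper name, not an existing tool.

```python
from pathlib import Path

# Assumption: standard cpufreq sysfs location, one "policyN" directory per cluster.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def set_governor(governor, root=CPUFREQ_ROOT):
    """Write `governor` to every policy under `root`; return the policy names."""
    changed = []
    for policy in sorted(Path(root).glob("policy*")):
        available = (policy / "scaling_available_governors").read_text().split()
        if governor not in available:
            raise ValueError(f"{policy.name}: {governor!r} not in {available}")
        (policy / "scaling_governor").write_text(governor)
        changed.append(policy.name)
    return changed
```

Calling set_governor("performance") as root would pin every policy to the performance governor until the next reboot; the setting is not persistent, so it would have to be reapplied at boot (for example from a startup script).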

Best regards
mcbain

mcbain · August 19, 2020, 6:51am

Hi again,

looking deeper into the Fenix repo I found the following lines in the file

config/boards/VIM3.conf

...
CPUMIN=500000
CPUMAX=2208000
GOVERNOR=performance
...

So it looks like CPU frequency scaling is effectively disabled by default in Fenix by using the performance governor. That makes sense given the kernel messages above, which were produced with other governors.

So to conclude: dynamic CPU frequency scaling is NOT SUPPORTED on VIM3 with the mainline kernel?!

Best regards

Vladimir.v.v · August 19, 2020, 6:57am

hello, try changing to “interactive”, although I don’t think this is the cause of the board failure problem

numbqq · August 19, 2020, 6:59am

It is supported with mainline kernel.

mcbain · August 19, 2020, 7:13am

Do you have another idea? I thought so because nearly every kernel message on crash refers to meson_clk_cpu_dyndiv_set_rate / __cpufreq_driver_target / clk_change_rate.

Ok, then I’ll test the governors ondemand/conservative/interactive/performance one by one and see if it still crashes or stops with one of them.
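Before testing them one by one, it helps to check which governors the running kernel actually offers, since only the built-in or loaded ones can be selected. The interactive governor, as far as is known, comes from Android/vendor kernels and may not be listed on a mainline build. A small Python sketch, again assuming the standard cpufreq sysfs layout and using a made-up function name:

```python
from pathlib import Path


def governor_status(root="/sys/devices/system/cpu/cpufreq"):
    """Return {policy name: (current governor, list of available governors)}."""
    status = {}
    for policy in sorted(Path(root).glob("policy*")):
        current = (policy / "scaling_governor").read_text().strip()
        available = (policy / "scaling_available_governors").read_text().split()
        status[policy.name] = (current, available)
    return status


if __name__ == "__main__":
    for name, (current, available) in governor_status().items():
        print(f"{name}: current={current} available={', '.join(available)}")
```

Recording this output together with the uptime at each crash would make it easier to tell afterwards which governor was active when the board froze.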

Could it possibly be a hardware problem with the board itself?!

numbqq · August 19, 2020, 7:27am

Waiting for your test results.

It shouldn’t be. You can test with the 4.9 kernel to see whether this error exists there or not. Maybe it’s a bug in the mainline kernel.

Vladimir.v.v · August 19, 2020, 7:35am

Do you observe CPU overheating? Then you can somehow combine it

mcbain · August 19, 2020, 7:48am

The CPU/DDR temperature (thermal_zone0 + 1) is constantly between 40 and 47; I monitored this as well, as I thought it might be a problem. The load is not really high either: I’m running just a few simple bash scripts with cron, and the maximum load is about 0.3.
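Readings like these can be collected with a few lines of Python. The sketch below assumes the standard thermal sysfs layout, where each /sys/class/thermal/thermal_zone*/temp file holds the temperature in millidegrees Celsius; the function name is only illustrative.

```python
from pathlib import Path


def read_temperatures(root="/sys/class/thermal"):
    """Return {zone name: temperature in degrees Celsius} for all thermal zones."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        millidegrees = int((zone / "temp").read_text().strip())
        temps[zone.name] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for name, celsius in read_temperatures().items():
        print(f"{name}: {celsius:.1f} C")
```

Run from cron and appended to a log file, this would show whether the temperature was still in the normal range shortly before a freeze.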

Vladimir.v.v · August 19, 2020, 7:52am

Yes, everything is good enough here!
Have you changed the power supply and cable to 12v 2a?

mcbain · August 19, 2020, 7:53am

Yes, at the moment I’m running with the 60W-rated one connected (5V 3A, 9V 3A, 12V 3A, 15V 3A, 20V 3A).