USB port hangs using Coral AI accelerator

RichardPar · February 22, 2023, 9:09am

Yuk! That's unfortunate…

If you give me some ideas or direction, I can be your hands, or I can offer remote connectivity (SSH access to a host that the Khadas board is attached to). I will connect the serial console and network.

RichardPar · February 22, 2023, 10:55am

Update -

I can make the USB/Coral go bad by writing data to the NVMe disk and doing AI tasks at the same time (this also happens with USB storage!).

numbqq · February 23, 2023, 1:00am

Hello @RichardPar

When the error occurs could you provide the dmesg log?

RichardPar · February 23, 2023, 7:52am

There is no change in the logs when the errors start. Below is what is in the log.

When the Coral test is started, the reset SuperSpeed line appears (along with the 3 other lines).
It does not change when the error occurs.

[66642.027349] usb 2-1: reset SuperSpeed Gen 1x2 USB device number 2 using xhci-hcd
[66642.047234] usb 2-1: LPM exit latency is zeroed, disabling LPM.
[66642.047489] xhci-hcd xhci-hcd.0.auto: ##### crg set max_burst 0
[66642.048162] xhci-hcd xhci-hcd.0.auto: ##### crg set max_burst 0
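
One way to confirm that no extra resets are logged when the errors start is to count the reset lines before and after a test run. The sketch below is only an illustration: `count_usb_resets` is a hypothetical helper name, not something from this thread, and it assumes the reset message contains the text "reset SuperSpeed" as in the log above.

```shell
#!/bin/sh
# Hypothetical helper: count "reset SuperSpeed" lines in a saved dmesg dump.
# With no argument it reads the live kernel log via dmesg (may need root).
count_usb_resets() {
    if [ -n "$1" ]; then
        grep -c 'reset SuperSpeed' "$1" || true
    else
        dmesg | grep -c 'reset SuperSpeed' || true
    fi
}
```

Running `count_usb_resets` once before the Coral test and once after the failure shows whether the count moved.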

RichardPar · February 27, 2023, 10:12am

Any assistance? Or should I just write this off as ‘wasted money’ and go back to Raspberry Pi? The current Linux kernel is not fit for purpose. I have 5 VIM4s and planned to get 20 more.

It's a shame it doesn't run HomeAssistant+AI properly.

hyphop · February 27, 2023, 2:40pm

As far as I understand, in your setup the VIM4 supplies power for the NVMe + WiFi + Coral AI USB etc., right?

The cause can be anything, but most of the time it is the power supply.

Maybe the problem is trivial: not enough power for everything at peak load?

What about testing the Coral alone?

Or providing separate additional power for the USB device?

RichardPar · February 27, 2023, 6:24pm

Thanks…
It's just VIM4+NVMe+Coral (no WiFi).

The USB reset occurs as libusb is getting the card into a known state to upload firmware.

It's not the power supply: it happens on an isolated PSU, and the VIM4 is powered by a 25W USB-C PD PSU.
The Coral alone works… (as far as I can see)
Adding load on the CPU/disk (NVMe) seems to cause the USB stack handler to slow down. The same problem occurs on the USB 3 and USB 2 sockets (both with independently powered USB hubs). Other devices seem to work, but I don't have another USB device that uses the same libusb bulk I/O functionality, so I can't remove the Coral from the setup.

I took out the NVMe and replaced it with a powered USB hard disk. The problem persisted (VIM4+USB-HDD+Coral).

I set the disk I/O scheduler in cgroups to limit writes to disk to 10 megabytes/second; the problem persisted. To me, it looks like the problems occur when Linux writes out the dirty cache (but that is an observation; nothing scientific).
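
To make the dirty-cache observation a bit less anecdotal, the `Dirty` and `Writeback` counters in `/proc/meminfo` can be sampled while the write load and the Coral test run. This is a minimal sketch; `dirty_status` is a hypothetical helper name, and the optional file argument exists only so it can be tried on a saved copy.

```shell
#!/bin/sh
# Hypothetical helper: print the Dirty and Writeback counters (in kB).
# Reads /proc/meminfo by default; a different file can be passed for testing.
dirty_status() {
    awk '/^(Dirty|Writeback):/ { printf "%s %s kB\n", $1, $2 }' "${1:-/proc/meminfo}"
}
```

Calling it in a loop, for example `while :; do dirty_status; sleep 1; done`, next to the failing test would show whether the errors line up with a burst of writeback.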

RichardPar · February 28, 2023, 6:51am

I changed the PSU to a 5V/20A unit, so there is no reason for power to be a problem…

After 2 minutes of tests, it just rebooted itself.

hyphop · February 28, 2023, 7:29am

My suggestion for narrowing down the problem: test only a minimal configuration such as VIM4 with eMMC + USB Coral + original power adapter!

After these results we can move on to the next step.

RichardPar · February 28, 2023, 9:22am

There is no ‘original power adapter’; all the boards come in a tiny box with antennas.
I will repeat with eMMC.

I used dd to create a 10GB file…

to NVMe - Coral fails after 5GB+ transferred to drive

to eMMC - Coral fails after
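
For anyone wanting to reproduce the write load, a dd invocation along these lines should do. The exact dd options used above are not given in the thread, so the block size and the helper name `write_load` are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: write SIZE_MB mebibytes of zeros to FILE with dd.
# The dd options are assumptions; the thread does not show the real command.
write_load() {
    file=$1
    size_mb=$2
    # bs is given in bytes (1 MiB) so it works with any POSIX dd
    dd if=/dev/zero of="$file" bs=1048576 count="$size_mb"
}
```

A 10GB file would then be `write_load <path on the NVMe or eMMC> 10240`, run while the Coral test is looping.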

After the failure, the device takes a long time to recover… below example shows it going from error state to working.

lsusb shows the USB device as present…and dmesg does not show any USB plugging events.

RichardPar · February 28, 2023, 9:42am

Memory Plot

Red arrow shows the place where the USB/Coral errors start happening (Writing to eMMC)

Writing to NVME - Same thing, just quicker

RichardPar · February 28, 2023, 10:45am

UPDATE:

When Coral is not working, dropping the cache makes it work again…

root@Khadas:/home/khadas# echo 3 > /proc/sys/vm/drop_caches

hyphop · February 28, 2023, 11:40am

I saw messages like "resources exhausted", which means something is wrong with software memory allocation etc.…

Looks like it's not a USB port problem? What do you think?

RichardPar · February 28, 2023, 12:03pm

I think it's a memory issue inside the kernel; I just don't know where to start looking. I don't know why the cache is causing the problem. Below ~1GB of MemFree it doesn't behave well.

The same problem happens on USB2 and USB3, which use different drivers in the kernel (one is xhci and the other is DWC3).

I am running a test now with min_free_kbytes set to 2GB … seeing if it survives.

UPDATE:

echo 500000 > /proc/sys/vm/min_free_kbytes

The Coral/USB has not gone bad yet… (yes, it's a totally silly number!) MemFree is hovering at about 1.4GB.
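
A note on units: `min_free_kbytes` is expressed in kB (1024-byte units), so 500000 corresponds to roughly 488 MiB rather than 2GB. The small sketch below does the conversion; both function names are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Convert a kB value (as used by min_free_kbytes and /proc/meminfo) to whole MiB.
kb_to_mib() {
    echo $(( $1 / 1024 ))
}

# Hypothetical helper: show the current reserve in MiB.
# Reads /proc/sys/vm/min_free_kbytes unless another file is given.
show_min_free() {
    kb_to_mib "$(cat "${1:-/proc/sys/vm/min_free_kbytes}")"
}
```

`kb_to_mib 500000` prints 488; a 2 GiB reserve would be a setting of 2097152.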

hyphop · February 28, 2023, 1:29pm

Thanks for the exploration! I will check it on my side and try to provide some solutions and suggestions for similar problems.

PS: please check the system logs for oom-killer matches

PPS: please share your logs as plain text
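
A quick way to do the oom-killer check on a plain-text log is a case-insensitive grep. This is a rough sketch: `find_oom` is a hypothetical helper, and the pattern covers the usual kernel wording ("invoked oom-killer", "Out of memory", "oom_reaper") but may miss vendor-specific messages.

```shell
#!/bin/sh
# Hypothetical helper: list OOM-killer related lines from a plain-text log.
# With no argument it reads the live kernel log via dmesg (may need root).
find_oom() {
    pattern='oom-killer|out of memory|oom_reaper'
    if [ -n "$1" ]; then
        grep -i -E "$pattern" "$1" || true
    else
        dmesg | grep -i -E "$pattern" || true
    fi
}
```

Saving the log first with `dmesg > dmesg.txt` gives a plain-text file that can be both searched and attached to a post. Empty output from `find_oom dmesg.txt` means no matching lines were found.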

RichardPar · February 28, 2023, 2:00pm

No OOM tasks have been executed…

My application just starts, loads the AI model, runs it on a picture of a parrot and exits.

RichardPar · March 1, 2023, 10:42am

The early boot memory layout looks different between images. Is it meant to?

Android

earlycon: aml-uart0 at MMIO 0x00000000fe078000 (options '')
[ 0.000000@0] printk: bootconsole [aml-uart0] enabled
[ 0.000000@0] 08400000 - 08500000, 1024 KB, ramoops@0x07400000
[ 0.000000@0] CMA pool @0x0000000005000000, size 52 MiB need clear mmu map
[ 0.000000@0] 05000000 - 08400000, 53248 KB, linux,secmon
[ 0.000000@0] 40000000 - 41000000, 16384 KB, linux,dsp_fw
[ 0.000000@0] 3f800000 - 40000000, 8192 KB, linux,meson-fb
[ 0.000000@0] CMA pool @0x00000000c0400000, size 508 MiB need clear mmu map
[ 0.000000@0] c0400000 - e0000000, 520192 KB, linux,codec_mm_cma
[ 0.000000@0] CMA pool @0x00000000a5400000, size 432 MiB need clear mmu map
[ 0.000000@0] a5400000 - c0400000, 442368 KB, linux,nvme_ssd
[ 0.000000@0] node linux,di_cma compatible matching fail
[ 0.000000@0] Reserved memory: created DMA memory pool at 0x00000000e0000000, size 0 MiB
[ 0.000000@0] e0000000 - e0000000, 0 KB, linux,ppmgr
[ 0.000000@0] 9d400000 - a5400000, 131072 KB, linux,isp_cma
[ 0.000000@0] 99400000 - 9d400000, 65536 KB, linux,adapt_cma
[ 0.000000@0] 91400000 - 99400000, 131072 KB, linux,cam_cma
[ 0.000000@0] 87c00000 - 91400000, 155648 KB, linux,ion-dev
[ 0.000000@0] 7ac00000 - 87c00000, 212992 KB, linux,ion-fb
[ 0.000000@0] 79800000 - 7ac00000, 20480 KB, linux,vdin1_cma

Ubuntu Server
[ 0.000000@0] Machine model: Khadas VIM4
[ 0.000000@0] earlycon: aml-uart0 at MMIO 0x00000000fe078000 (options '')
[ 0.000000@0] printk: bootconsole [aml-uart0] enabled
[ 0.000000@0] 08400000 - 08500000, 1024 KB, ramoops@0x07400000
[ 0.000000@0] CMA pool @0x0000000005000000, size 52 MiB need clear mmu map
[ 0.000000@0] 05000000 - 08400000, 53248 KB, linux,secmon
[ 0.000000@0] 40000000 - 41000000, 16384 KB, linux,dsp_fw
[ 0.000000@0] 3f800000 - 40000000, 8192 KB, linux,meson-fb
[ 0.000000@0] CMA pool @0x00000000c5000000, size 432 MiB need clear mmu map
[ 0.000000@0] c5000000 - e0000000, 442368 KB, linux,codec_mm_cma
[ 0.000000@0] node linux,di_cma compatible matching fail
[ 0.000000@0] Reserved memory: created DMA memory pool at 0x00000000e0000000, size 0 MiB
[ 0.000000@0] e0000000 - e0000000, 0 KB, linux,ppmgr
[ 0.000000@0] bd000000 - c5000000, 131072 KB, linux,isp_cma
[ 0.000000@0] bb800000 - bd000000, 24576 KB, linux,adapt_cma
[ 0.000000@0] b2000000 - bb800000, 155648 KB, linux,ion-dev
[ 0.000000@0] node linux,ion-fb compatible matching fail
[ 0.000000@0] b0c00000 - b2000000, 20480 KB, linux,vdin1_cma
[ 0.000000@0] 21fc00000 - 220000000, 4096 KB, linux,ldc_mem
[ 0.000000@0] cma: Reserved 8 MiB at 0x00000000b0400000

Debian 10
[ 0.000000@0] Machine model: Khadas VIM4
[ 0.000000@0] earlycon: aml-uart0 at MMIO 0x00000000fe078000 (options '')
[ 0.000000@0] printk: bootconsole [aml-uart0] enabled
[ 0.000000@0] swiotlb,default value: noforce
[ 0.000000@0] swiotlb,dts value: normal
[ 0.000000@0] 08400000 - 08500000, 1024 KB, ramoops@0x07400000
[ 0.000000@0] CMA pool @0x0000000005000000, size 52 MiB need clear mmu map
[ 0.000000@0] 05000000 - 08400000, 53248 KB, linux,secmon
[ 0.000000@0] 40000000 - 41000000, 16384 KB, linux,dsp_fw
[ 0.000000@0] 3f800000 - 40000000, 8192 KB, linux,meson-fb
[ 0.000000@0] CMA pool @0x00000000c5000000, size 432 MiB need clear mmu map
[ 0.000000@0] c5000000 - e0000000, 442368 KB, linux,codec_mm_cma
[ 0.000000@0] node linux,di_cma compatible matching fail
[ 0.000000@0] Reserved memory: created DMA memory pool at 0x00000000e0000000, size 0 MiB
[ 0.000000@0] e0000000 - e0000000, 0 KB, linux,ppmgr
[ 0.000000@0] bd000000 - c5000000, 131072 KB, linux,isp_cma
[ 0.000000@0] bb800000 - bd000000, 24576 KB, linux,adapt_cma
[ 0.000000@0] b2000000 - bb800000, 155648 KB, linux,ion-dev
[ 0.000000@0] node linux,ion-fb compatible matching fail
[ 0.000000@0] b0c00000 - b2000000, 20480 KB, linux,vdin1_cma
[ 0.000000@0] 21fc00000 - 220000000, 4096 KB, linux,ldc_mem
[ 0.000000@0] cma: Reserved 8 MiB at 0x00000000b0400000
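
To compare the three layouts without reading them line by line, the reserved regions can be listed and summed per image. The sketch below assumes each boot log has been saved to a text file and that region lines keep the `start - end, <size> KB, <name>` layout shown above; `reserved_total` is a hypothetical helper name. Lines reported in MiB (the "CMA pool" and "cma: Reserved" lines) are not counted.

```shell
#!/bin/sh
# Hypothetical helper: list "<size> KB, <name>" reserved regions from an early
# boot log and print their total. Assumes the line layout shown in this thread.
reserved_total() {
    awk -F', ' '/ KB, / {
        split($2, size, " ")          # $2 looks like "53248 KB"
        printf "%-22s %8d KB\n", $3, size[1]
        total += size[1]
    }
    END { printf "%-22s %8d KB\n", "total", total }' "$1"
}
```

Running it on the Android and Ubuntu logs in turn and diffing the output makes it easier to see which regions (for example `linux,nvme_ssd` or `linux,ion-fb`) exist in only one image.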

RichardPar · March 3, 2023, 10:55am

I don't believe it's the memory anymore. I ran memtester on all the RAM and things work.

I am now thinking there is a kernel option causing this. I think the buffer/cache flushing is taking priority and causing userland/IRQs to not be serviced.
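
If the flushing theory is right, the kernel's dirty-page writeback settings are a natural thing to record alongside the test results. The sketch below only reads them and does not claim that changing any of them fixes the problem; `show_writeback_settings` is a hypothetical helper, and the directory argument is there purely for testing.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel's dirty-page writeback settings.
# Reads /proc/sys/vm by default; another directory can be passed for testing.
show_writeback_settings() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_background_ratio dirty_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
    return 0
}
```

Comparing the output between the Ubuntu, Debian and Android images might show whether one of them flushes more aggressively.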

I am currently doing a criminal hack… and the problem has not occurred (as the Cached memory stays high), but this is not a fix!

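#!/bin/sh
# Temporary workaround, not a fix: drop the caches every 10 seconds.
# Writing to /proc/sys/vm/drop_caches requires root.

while :
do
    # 3 = free the page cache plus dentries and inodes
    echo 3 > /proc/sys/vm/drop_caches
    sleep 10
done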

numbqq · March 6, 2023, 2:41am

Hello @RichardPar

Could you reproduce this issue with devices other than the Coral?

aaaaaaaaaaaaaa · March 12, 2024, 1:22am

Thank you for your temporary solution, it works for me. I have almost the same problem with Edge2 + Gemini2 (Orbbec depth camera). When running a ROS node on the Edge2 for a while, the USB device seems to be reset, and I get messages like these in dmesg:

[ 3118.406616] usb 8-1: reset SuperSpeed Gen 1 USB device number 2 using xhci-hcd
[ 3118.424211] uvcvideo: Unknown video format 20343159-0000-0010-8000-00aa00389b71
[ 3118.424231] uvcvideo: Found UVC 1.10 device Orbbec(R) DaBai DCL(TM) (2bc5:0701)
[ 3118.426814] uvcvideo: Found UVC 1.10 device Orbbec(R) DaBai DCL(TM) (2bc5:0701)
[ 3118.429363] uvcvideo: Found UVC 1.10 device Orbbec(R) DaBai DCL(TM) (2bc5:0701)
[ 3120.375584] usb 8-1: usbfs: process 20472 (component_conta) did not claim interface 0 before use
[ 3125.375904] usb 8-1: usbfs: process 20498 (component_conta) did not claim interface 0 before use
[ 3130.376150] usb 8-1: usbfs: process 20524 (component_conta) did not claim interface 0 before use