USB port hangs using Coral AI accelerator

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu and Debian 11 (Fenix). I have installed HomeAssistant (VIM3 Edition) + Frigate AI.

Which version of system do you use? Khadas official images, self built images, or others?

Official and home-built

Please describe your issue below:

Using a Coral AI accelerator - after a while the USB port stops responding. Hotplugging does not recover it.

The same happens on a custom image and on Ubuntu…

Post a console log of your issue below:

I wish I could get a log… The only message I get is

[2023-02-18 10:02:53] ws4py                          INFO    : Using epoll
[2023-02-18 10:02:56] frigate.edgetpu                INFO    : TPU found
F driver/usb/usb_driver.cc:870] ProcessIo [1-2] async transfer out failed. Abort. Resource exhausted: USB error -11 [AsyncBulkOutTransfer]
Fatal Python error: Aborted
Thread 0x0000007f63dc91e0 (most recent call first):
  File "/usr/lib/python3.9/threading.py", line 312 in wait
  File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
  File "/usr/lib/python3.9/threading.py", line 892 in run
  File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner

The kernel shows no errors in dmesg or the kern log.

Is the Coral a USB or an M.2 device?

Could you provide the details needed to reproduce this issue? We will check on our side.

A standard USB Coral accelerator plugged in to USB-2 or USB-3 (happens on both) - I have also tried different Coral TPU modules to make sure TPU was good.


I use the Fenix Debian-11 image (because I added the USBMON kernel option to try to debug this), but the standard Ubuntu server images do the same.


My setup has 4 CCTV cameras running - a 1TB NVME drive is fitted for recording.

Installed Debian11
Install HomeAssistant
Plugins for HA
RTS2P - RTSP Proxy
MQTT
Frigate 0.11.1

That's about it - just let it run.

Regards,
Richard

Hello @RichardPar

Okay, we will check on our side.
We also need to know how you reproduce this issue. Do you just run the Coral examples?

Sorry… I edited my previous post as I realised I was too vague… :smiley:

It happens 100% of the time on Frigate (after a while, though). The example doesn't work either, but I am still trying to establish whether this is a Docker issue or a bare-metal issue.

Output from the example when the USB is in a bad state (I am trying to figure out an easy way to trigger the error):

khadas@Khadas:~/coral/pycoral$ python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg -c 5000
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
F driver/usb/usb_driver.cc:870] ProcessIo [1-2] async transfer out failed. Abort. Resource exhausted: USB error -11 [AsyncBulkOutTransfer]
Aborted

Output of lsusb (the Coral is Bus 002 Device 002):

khadas@Khadas:~/coral/pycoral$ lsusb
Bus 002 Device 002: ID 18d1:9302 Google Inc.
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 1a86:7523 QinHeng Electronics CH340 serial converter
Bus 001 Device 003: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
Bus 001 Device 002: ID 1a40:0101 Terminus Technology Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Hotplugging the Coral device:

[ 7496.964827] usb 2-1: USB disconnect, device number 2
[ 7502.220894] usb 2-1: new SuperSpeed Gen 1x2 USB device number 3 using xhci-hcd
[ 7502.241599] usb 2-1: New USB device found, idVendor=1a6e, idProduct=089a, bcdDevice= 1.00
[ 7502.241967] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
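
For reference, the two USB IDs seen in this thread can be told apart in a script. The following is a minimal Python sketch, not an official tool. It assumes that 1a6e:089a is how the Coral enumerates before the Edge TPU runtime has uploaded its firmware, and that 18d1:9302 is how it appears afterwards, which matches the dmesg and lsusb output above. The names CORAL_IDS and find_coral are made up for illustration.

```python
import re

# Assumption: these are the two IDs shown in this thread's lsusb/dmesg output.
CORAL_IDS = {
    "1a6e:089a": "present, firmware not loaded yet",
    "18d1:9302": "present, firmware loaded",
}

# Matches the start of a standard lsusb line, e.g.
# "Bus 002 Device 002: ID 18d1:9302 Google Inc."
LSUSB_LINE = re.compile(r"Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}:[0-9a-f]{4})")


def find_coral(lsusb_output):
    """Return (bus, device, usb_id, state) for every Coral-looking device."""
    found = []
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.search(line)
        if match and match.group(3) in CORAL_IDS:
            bus, device, usb_id = match.groups()
            found.append((bus, device, usb_id, CORAL_IDS[usb_id]))
    return found


# Two lines copied from the lsusb output earlier in this thread.
sample = (
    "Bus 002 Device 002: ID 18d1:9302 Google Inc.\n"
    "Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub\n"
)
for bus, device, usb_id, state in find_coral(sample):
    print(f"Bus {bus} Device {device}: {usb_id} ({state})")
```

On a live system the text passed to find_coral would come from running lsusb. As noted later in the thread, the device can still be listed while transfers fail, so this only shows enumeration state, not health.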

Running the Coral example gives the same output as above. After a cold reboot it works again (a warm reboot does not seem to fix anything):

khadas@Khadas:~/coral/pycoral$ python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg -c 10
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
14.8ms
4.9ms
4.8ms
4.6ms
4.6ms
4.6ms
4.6ms
4.8ms
4.9ms
4.7ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.75781

Hello @RichardPar

Just found that we can’t get Coral from China…

Yuk! That's unfortunate…

If you give me some ideas/direction, I can be your hands, or I can offer remote connectivity (SSH access to a host that the Khadas is on). I will connect the serial console and network.

Update -

I can make the USB/Coral go bad by writing data to the NVME disk and doing AI tasks at the same time (this also happens on USB storage!)
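
The trigger described above could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions: the helper names (write_load, run_until_failure, stress) are hypothetical, the target path must point at the NVME or USB disk, and the command is whatever Coral test is already in use (for example the pycoral classify_image.py call shown earlier).

```python
import os
import subprocess
import threading


def write_load(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of zeros to path in chunks, then fsync the file."""
    chunk = b"\0" * chunk_bytes
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return written


def run_until_failure(cmd, writer, max_runs=1000):
    """Run cmd repeatedly while the writer thread is alive.

    Returns (number of runs, return code of the first failing run or 0).
    The command is always run at least once.
    """
    runs = 0
    while True:
        runs += 1
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return runs, result.returncode
        if not writer.is_alive() or runs >= max_runs:
            return runs, 0


def stress(cmd, path, total_bytes):
    """Write to disk in the background while running the inference command."""
    writer = threading.Thread(target=write_load, args=(path, total_bytes))
    writer.start()
    outcome = run_until_failure(cmd, writer)
    writer.join()
    return outcome
```

A call would pass the Coral command as a list of arguments, a file path on the disk under test, and a size such as 10 * 1024**3 for the 10GB case. A process that aborts the way the example does should report a non-zero return code.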

Hello @RichardPar

When the error occurs, could you provide the dmesg log?

There is no change in the logs when the errors start. Below is what is in the log:

When the Coral test is started, the "reset SuperSpeed" line appears (along with the 3 other lines).
The log does not change when the error occurs.

[66642.027349] usb 2-1: reset SuperSpeed Gen 1x2 USB device number 2 using xhci-hcd
[66642.047234] usb 2-1: LPM exit latency is zeroed, disabling LPM.
[66642.047489] xhci-hcd xhci-hcd.0.auto: ##### crg set max_burst 0
[66642.048162] xhci-hcd xhci-hcd.0.auto: ##### crg set max_burst 0

Any assistance? Or should I just take this as ‘wasted money’ and go back to RaspberryPi? The current Linux kernel is not fit for purpose. I have 5 VIM4s and planned to get 20 more.

It's a shame it doesn't run HomeAssistant+AI properly.

As I understand your situation,
the VIM4 supplies power for the NVME + Wi-Fi + Coral AI USB, etc. Right?

The reason can be anything, but most of the time it is the power supply.

Maybe the problem is trivial: not enough power for everything at peak load?

What about testing just the Coral alone?

Or providing separate additional power for the USB device?

Thanks…
It's just VIM4+NVME+Coral (no Wi-Fi).

The USB reset occurs because libusb is getting the device into a known state to upload firmware.

It's not the power supply: it happens on an isolated PSU, and the VIM4 is powered by a 25 W USB-C PD PSU.
Coral alone works (as far as I can see).
Adding load on the CPU/disk (NVME) seems to cause the USB stack handler to slow down. The same problem occurs on the USB-3 and USB-2 sockets (both on independent, powered USB hubs). Other devices seem to work, but I don't have another USB device that uses the same libusb bulk I/O functionality, so I can't take the Coral out of the setup to compare.

I took out the NVME and replaced with a powered USB Hard disk - The problem still persisted. (VIM4+USB-HDD+Coral)

I set the disk I/O scheduler limit in cgroups to cap writes to disk at 10 megabytes/second, and the problem persisted. To me, it looks like the problems occur when Linux writes out the dirty cache (but that is an observation, nothing scientific).
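
One way to test the dirty-cache observation is to log the kernel's Dirty and Writeback counters from /proc/meminfo while the write is running, and note their values at the moment the Coral error appears. The following is a minimal Python sketch. The function names are hypothetical, and it only reads counters, so it cannot prove causation.

```python
def parse_meminfo(text, keys=("Dirty", "Writeback")):
    """Extract the requested counters (in kB) from /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys:
            # Lines look like "Dirty:              1234 kB".
            values[name] = int(rest.split()[0])
    return values


def read_dirty_kb(path="/proc/meminfo"):
    """Read the current Dirty/Writeback counters from the running kernel."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling read_dirty_kb() once per second during the dd run and printing the result next to a timestamp would give numbers that can be lined up against the time of the first "USB error -11".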

I changed the PSU to a 5V/20A unit; there is no reason for power to be a problem…

After 2 minutes of tests, it just rebooted itself.

My suggestion for narrowing down the problem: test only a minimal configuration, such as VIM4 with eMMC + USB Coral + original power adapter!

After these results we can move on to the next step.

There is no ‘original power adapter’ - all the boards come in a tiny box with antennas.
I will repeat with eMMC

I used dd to create a 10GB file…

to NVME - Coral fails after 5GB+ transferred to drive

to eMMC - Coral fails after

After the failure, the device takes a long time to recover. The example below shows it going from the error state to working.

lsusb shows the USB device as present…and dmesg does not show any USB plugging events.

Memory Plot

The red arrow shows where the USB/Coral errors start happening (writing to eMMC).

Writing to NVME - Same thing, just quicker :smiley:

UPDATE:

When Coral is not working, dropping the cache makes it work again…

root@Khadas:/home/khadas# echo 3 > /proc/sys/vm/drop_caches
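
The same workaround can be triggered from a script, for example from a watchdog that notices the Coral error. This is a minimal Python sketch, assuming it runs as root. The function name drop_caches is hypothetical, and the sync beforehand follows the kernel documentation's note that dirty objects are not freeable.

```python
import os


def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    """Ask the kernel to drop clean caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    Writing to the default path requires root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # flush dirty pages first so more of the cache can be dropped
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

This only works around the symptom. It does not explain why a full page cache makes the USB transfers fail.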

I saw messages like "Resource exhausted", which means something is wrong with software memory allocation, etc…

It looks like it's not a USB port problem? :wink: What do you think?
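
The memory-allocation reading can be checked against the libusb error codes. The following is a minimal Python sketch that pulls the number out of the log line and looks it up. It assumes the -11 printed by the Edge TPU driver is a raw libusb return code, which the "Resource exhausted" label suggests but the log itself does not state. The function name explain_usb_error is hypothetical; the code names are the ones defined in libusb.h.

```python
import re

# Error codes as defined by libusb (enum libusb_error).
LIBUSB_ERRORS = {
    0: "LIBUSB_SUCCESS",
    -1: "LIBUSB_ERROR_IO",
    -2: "LIBUSB_ERROR_INVALID_PARAM",
    -3: "LIBUSB_ERROR_ACCESS",
    -4: "LIBUSB_ERROR_NO_DEVICE",
    -5: "LIBUSB_ERROR_NOT_FOUND",
    -6: "LIBUSB_ERROR_BUSY",
    -7: "LIBUSB_ERROR_TIMEOUT",
    -8: "LIBUSB_ERROR_OVERFLOW",
    -9: "LIBUSB_ERROR_PIPE",
    -10: "LIBUSB_ERROR_INTERRUPTED",
    -11: "LIBUSB_ERROR_NO_MEM",
    -12: "LIBUSB_ERROR_NOT_SUPPORTED",
    -99: "LIBUSB_ERROR_OTHER",
}


def explain_usb_error(log_line):
    """Return (code, libusb name) for a 'USB error N' log line, else None."""
    match = re.search(r"USB error (-?\d+)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LIBUSB_ERRORS.get(code, "unknown")


line = ("F driver/usb/usb_driver.cc:870] ProcessIo [1-2] async transfer out "
        "failed. Abort. Resource exhausted: USB error -11 [AsyncBulkOutTransfer]")
print(explain_usb_error(line))
```

Under that assumption the failure is a memory allocation error reported while submitting the transfer, which fits the observation that dropping the page cache brings the Coral back.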