Which system do you use? Android, Ubuntu, OOWOW or others?
Ubuntu and Debian11 Fenix - I have installed HomeAssistant(VIM3 Edition)+Frigate AI.
Which version of system do you use? Khadas official images, self built images, or others?
Official images and a self-built (home) image
Please describe your issue below:
Using a Coral AI accelerator: after a while the USB port stops responding. Hotplugging does not recover it.
The same happens on the custom image and on Ubuntu.
Post a console log of your issue below:
I wish I could get a log… The only message I get is
[2023-02-18 10:02:53] ws4py INFO : Using epoll
[2023-02-18 10:02:56] frigate.edgetpu INFO : TPU found
F driver/usb/usb_driver.cc:870] ProcessIo [1-2] async transfer out failed. Abort. Resource exhausted: USB error -11 [AsyncBulkOutTransfer]
Fatal Python error: Aborted
Thread 0x0000007f63dc91e0 (most recent call first):
File "/usr/lib/python3.9/threading.py", line 312 in wait
File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
File "/usr/lib/python3.9/threading.py", line 892 in run
File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
A standard USB Coral accelerator plugged in to USB-2 or USB-3 (happens on both) - I have also tried different Coral TPU modules to make sure TPU was good.
I use the Fenix Debian 11 image (I added the USBMON kernel option to try to debug this), but the standard Ubuntu server images behave the same.
My setup has 4 CCTV cameras running - a 1TB NVME drive is fitted for recording.
Installed Debian11
Install HomeAssistant
Plugins for HA
RTS2P - RTSP Proxy
MQTT
Frigate 0.11.1
Sorry… I edited my previous post as I realised I was too vague…
It happens 100% of the time with Frigate (after a while, though). The example doesn't work either, but I am trying to establish whether this is caused by Docker or is a bare-metal issue.
Output from the example when USB is in a bad state (I am trying to figure out an easy way to trigger the error):
khadas@Khadas:~/coral/pycoral$ python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg -c 5000
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
F driver/usb/usb_driver.cc:870] ProcessIo [1-2] async transfer out failed. Abort. Resource exhausted: USB error -11 [AsyncBulkOutTransfer]
Aborted
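For reference, the `USB error -11` in the abort message looks like a libusb return code. Assuming the Edge TPU driver passes the libusb code through unchanged (I have not checked the driver source), -11 is `LIBUSB_ERROR_NO_MEM`, which would fit the "Resource exhausted" wording. A minimal sketch to decode such lines; the helper name `decode_usb_error` is made up for illustration:

```python
import re

# Error codes as listed in libusb.h (enum libusb_error).
LIBUSB_ERRORS = {
    -1: "LIBUSB_ERROR_IO",
    -2: "LIBUSB_ERROR_INVALID_PARAM",
    -3: "LIBUSB_ERROR_ACCESS",
    -4: "LIBUSB_ERROR_NO_DEVICE",
    -5: "LIBUSB_ERROR_NOT_FOUND",
    -6: "LIBUSB_ERROR_BUSY",
    -7: "LIBUSB_ERROR_TIMEOUT",
    -8: "LIBUSB_ERROR_OVERFLOW",
    -9: "LIBUSB_ERROR_PIPE",
    -10: "LIBUSB_ERROR_INTERRUPTED",
    -11: "LIBUSB_ERROR_NO_MEM",
    -12: "LIBUSB_ERROR_NOT_SUPPORTED",
    -99: "LIBUSB_ERROR_OTHER",
}

_USB_ERROR_RE = re.compile(r"USB error (-?\d+)")


def decode_usb_error(log_line):
    """Return (code, name) for a 'USB error N' log line, or None if absent.

    Assumption: N is a raw libusb error code.
    """
    match = _USB_ERROR_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LIBUSB_ERRORS.get(code, "unknown")
```

Calling `decode_usb_error(...)` on the abort line above gives the code and its libusb name. On Linux, libusb reports `LIBUSB_ERROR_NO_MEM` when the kernel refuses a transfer with ENOMEM, so the usbfs buffer limit in `/sys/module/usbcore/parameters/usbfs_memory_mb` may be worth a look, though I can't say it is the cause here.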
Output of lsusb (the Coral is Bus 002 Device 002):
khadas@Khadas:~/coral/pycoral$ lsusb
Bus 002 Device 002: ID 18d1:9302 Google Inc.
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 1a86:7523 QinHeng Electronics CH340 serial converter
Bus 001 Device 003: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
Bus 001 Device 002: ID 1a40:0101 Terminus Technology Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Hotplugging Coral device
[ 7496.964827] usb 2-1: USB disconnect, device number 2
[ 7502.220894] usb 2-1: new SuperSpeed Gen 1x2 USB device number 3 using xhci-hcd
[ 7502.241599] usb 2-1: New USB device found, idVendor=1a6e, idProduct=089a, bcdDevice= 1.00
[ 7502.241967] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
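Note the two different IDs: lsusb shows 18d1:9302, while the hotplug log shows 1a6e:089a. As far as I know, a USB Coral enumerates as 1a6e:089a until the Edge TPU runtime uploads its firmware on first use, and then re-enumerates as 18d1:9302, so the ID after hotplug is probably expected. A rough sketch that classifies the state from `lsusb` output, assuming those two IDs are the only ones involved (`find_coral` is an illustrative name):

```python
import re

# Assumption: these IDs are the pre- and post-firmware states of a USB Coral.
CORAL_STATES = {
    "1a6e:089a": "bootloader (firmware not loaded yet)",
    "18d1:9302": "runtime (firmware loaded)",
}

_LSUSB_RE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}:[0-9a-f]{4})"
)


def find_coral(lsusb_output):
    """Return a list of (bus, device, usb_id, state) for Coral devices."""
    found = []
    for line in lsusb_output.splitlines():
        match = _LSUSB_RE.match(line.strip())
        if match and match.group(3) in CORAL_STATES:
            bus, dev, usb_id = match.groups()
            found.append((int(bus), int(dev), usb_id, CORAL_STATES[usb_id]))
    return found
```

It can be fed live output with `find_coral(subprocess.run(["lsusb"], capture_output=True, text=True).stdout)`.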
Running the Coral example after hotplugging gives the same output as above. After a cold reboot it works again (a warm reboot does not seem to fix anything):
khadas@Khadas:~/coral/pycoral$ python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg -c 10
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
14.8ms
4.9ms
4.8ms
4.6ms
4.6ms
4.6ms
4.6ms
4.8ms
4.9ms
4.7ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.75781
If you give me some ideas/direction, I can be your hands - or I can offer remote connectivity… (SSH access to a host which khadas is on) - will connect serial console and network.
There is no change in the logs when the errors start. Below is what is in the log.
When the Coral test is started, the "reset SuperSpeed" line appears (along with the 3 other lines);
it does not change when the error occurs.
[66642.027349] usb 2-1: reset SuperSpeed Gen 1x2 USB device number 2 using xhci-hcd
[66642.047234] usb 2-1: LPM exit latency is zeroed, disabling LPM.
[66642.047489] xhci-hcd xhci-hcd.0.auto: ##### crg set max_burst 0
[66642.048162] xhci-hcd xhci-hcd.0.auto: ##### crg set max_burst 0
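To check that nothing else is logged for the port around the failure, the kernel log can be filtered down to one USB port. A small sketch, assuming the default `dmesg` format with `[seconds.micros]` timestamps as in the lines above (`usb_events` is an illustrative name):

```python
import re

# Matches lines such as "[66642.027349] usb 2-1: reset SuperSpeed ..."
_KMSG_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+usb (\S+): (.*)$")


def usb_events(dmesg_text, port):
    """Return [(seconds_since_boot, message)] for kernel lines about one port."""
    events = []
    for line in dmesg_text.splitlines():
        match = _KMSG_RE.match(line.strip())
        if match and match.group(2) == port:
            events.append((float(match.group(1)), match.group(3)))
    return events
```

For example, `usb_events(log_text, "2-1")` keeps only the `usb 2-1:` lines and skips the `xhci-hcd` ones.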
Any assistance? Or should I just take this as ‘wasted money’ and go back to Raspberry Pi? The current Linux kernel is not fit for purpose. I have 5 VIM4s and planned to get 20 more.
It's a shame it doesn't run HomeAssistant+AI properly.
The USB reset occurs because libusb is putting the device into a known state to upload firmware.
It's not the power supply: it happens on an isolated PSU, and the VIM4 is powered by a 25 W USB-C PD PSU.
Coral alone works… (as far as I can see)
Adding load on the CPU/disk (NVMe) seems to cause the USB stack handler to slow down. The same problem occurs on the USB-3 and USB-2 sockets (both with independently powered USB hubs). Other devices seem to work, but I don't have another USB device that uses the same libusb/bulk I/O functionality, so I can't take the Coral out of the setup to compare.
I took out the NVMe drive and replaced it with a powered USB hard disk. The problem still persisted (VIM4 + USB HDD + Coral).
I set the disk I/O scheduler in cgroups to limit write speed to 10 MB/s; the problem persisted. To me, it looks like the problems occur when Linux writes out the dirty cache (but that is an observation, nothing scientific).
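To make the dirty-cache observation a bit more measurable, the `Dirty` and `Writeback` counters in `/proc/meminfo` could be sampled while Frigate runs and compared against the time of the abort. A minimal sketch; the function names are made up for illustration, and the one-second interval is an arbitrary choice:

```python
import re
import time

# /proc/meminfo lines look like "Dirty:               1234 kB".
_MEMINFO_RE = re.compile(r"^(\w+):\s+(\d+) kB$", re.MULTILINE)


def dirty_and_writeback_kb(meminfo_text):
    """Return (Dirty, Writeback) in kB from the text of /proc/meminfo."""
    fields = {name: int(value) for name, value in _MEMINFO_RE.findall(meminfo_text)}
    return fields["Dirty"], fields["Writeback"]


def sample(interval_s=1.0, count=10, path="/proc/meminfo"):
    """Print a timestamped Dirty/Writeback reading every interval_s seconds."""
    for _ in range(count):
        with open(path) as meminfo:
            dirty, writeback = dirty_and_writeback_kb(meminfo.read())
        print(f"{time.strftime('%H:%M:%S')} Dirty={dirty} kB Writeback={writeback} kB")
        time.sleep(interval_s)
```

Running `sample(count=3600)` in a second terminal would give an hour of readings; a spike in `Writeback` just before the abort would support the theory, and no spike would count against it.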