Which Khadas SBC do you use?
VIM3 A311D
Which system do you use? Android, Ubuntu, OOWOW or others?
Kernel 4.9 Ubuntu 20.04
Which version of system do you use? Khadas official images, self built images, or others?
gnome
Please describe your issue below:
I have a tflite model for object detection that I want to run on the NPU for acceleration. I used the NPU TFLite delegate as shown in the examples. Inference with the models from the examples is fast, but inference with our model is slow (e.g. detection time is 4 sec with the NPU, but only 0.20 sec on the CPU).
Please guide me on the following points:
- Is it a problem with the model? (PF model below)
- How can I tell that the VX delegate is working, i.e. that the model is actually running on the NPU?
- Do I need to build the VX delegate from source?
- Can I use the “libvx_delegate.so” provided with the examples directly with my object-detection model?
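For context, this is roughly how I load the delegate; the library path and model name here are placeholders, and the log-parsing helper is just a hypothetical way to spot ops that fall back to the CPU from the delegate's stderr output:

```python
import re

def count_cpu_fallbacks(log_text):
    """Count 'Fallback unsupported op N to TfLite' warnings in the
    delegate's log output; each hit is an op that runs on the CPU
    instead of the NPU."""
    return len(re.findall(r"Fallback unsupported op (\d+) to TfLite", log_text))

if __name__ == "__main__":
    # Loading the delegate itself requires the VIM3 hardware and
    # tflite_runtime, so it is only sketched here:
    # import tflite_runtime.interpreter as tflite
    # delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")  # path is an assumption
    # interpreter = tflite.Interpreter(model_path="model.tflite",
    #                                  experimental_delegates=[delegate])
    # interpreter.allocate_tensors()
    sample = "WARNING: Fallback unsupported op 32 to TfLite\n"
    print(count_cpu_fallbacks(sample))  # one op falls back to the CPU
```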
Log with NPU:
Vx delegate: allowed_cache_mode set to 0.
Vx delegate: device num set to 0.
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
WARNING: Fallback unsupported op 32 to TfLite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W [HandleLayoutInfer:291]Op 162: default layout inference pass. (repeated 10 times)
[[0.21484375 0.16015625 0.09765625 0.0859375 0.07421875 0.05859375
0.05078125 0.046875 0.046875 0.046875 0.046875 0.046875
0.0390625 0.03515625 0.03515625 0.03515625 0.03515625 0.03125
0.03125 0.03125 0.03125 0.02734375 0.02734375 0.02734375
0.02734375]]
4.172280788421631 sec
Log with CPU:
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[[0.20703125 0.16796875 0.09765625 0.078125 0.05859375 0.0546875
0.05078125 0.046875 0.04296875 0.04296875 0.04296875 0.04296875
0.0390625 0.03515625 0.03515625 0.03515625 0.03515625 0.03515625
0.03515625 0.03515625 0.03125 0.03125 0.03125 0.03125
0.02734375]]
0.19845318794250488 sec
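The timings above are from a single invoke. A minimal timing sketch I could use instead, which discards warm-up runs (my assumption is that the first NPU invoke may include one-off graph compilation, which would inflate a single-run measurement):

```python
import time

def time_inference(invoke, warmup=1, runs=10):
    """Average latency of `invoke` in seconds, discarding `warmup`
    initial calls so one-off setup cost is excluded."""
    for _ in range(warmup):
        invoke()
    start = time.perf_counter()
    for _ in range(runs):
        invoke()
    return (time.perf_counter() - start) / runs

# Usage with a real TFLite interpreter would be:
#   avg = time_inference(interpreter.invoke)
if __name__ == "__main__":
    avg = time_inference(lambda: None, warmup=1, runs=5)
    print(f"{avg:.9f} sec")
```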