Vim3 NPU work time

Hello!
First of all, thank you very much for your earlier help with YOLOv8n and my custom dataset, but now I've hit a new problem.
Inference of YOLOv8n on the VIM3 Pro takes 0.06 s (on my data it's 0.04 s), and wow, that's fast!
But the remaining data processing takes 0.338 - 0.06 = 0.278 s.

And that's a lot. When I run the Khadas against my IP camera, it manages only 3 frames per second. How come? My OrangePi's CPU was giving me at least 4 FPS.
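Here is roughly how I split the timing (a minimal sketch; `nn_inference` and `post_process` are stand-ins for the actual calls in the demo script):

```python
# Sketch: time the NPU call and the Python post-processing separately
# to confirm where the 0.278 s goes. The two stage functions below are
# dummy stand-ins so the sketch runs on its own.
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

def nn_inference(frame):
    return [frame]          # placeholder for the KSNN inference call

def post_process(raw):
    return raw              # placeholder for the demo's decode + NMS

raw, t_infer = timed(nn_inference, "frame")
dets, t_post = timed(post_process, raw)
print(f"inference: {t_infer:.3f}s  post-process: {t_post:.3f}s")
```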

Here I have three options:

  1. I’m doing something wrong and the Khadas NPU isn’t actually being used (I’m running it with KSNN → your example script + converted weights)
  2. You have a suggestion for how to speed it up
  3. It won’t work any faster => I throw it in the garbage and move to RP

Waiting for your advice

Hello @Agent_kapo

The issue lies mainly in the bottleneck introduced by post-processing of the outputs; the NPU is definitely active and doing the inference with the model.

Python itself is part of the reason for the slow processing: the post-processing runs single-threaded in the interpreter, so this performance drop is platform-agnostic.

The KSNN code sample for YOLOv8n is meant as a template for creating an optimized model that runs on the NPU.

There are a few ways you can try to improve the efficiency of this post-processing.

  • Implement your own multi-threaded post-processing code. Recommendations would be the Python multiprocessing library, an optimized Python implementation such as PyPy, Numba, or even something more experimental like your own OpenCL post-processing kernel…
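One way to structure the multi-threaded variant is a two-stage pipeline: run inference on frame N+1 while the CPU post-processes frame N, so throughput is limited by the slower stage rather than the sum of both. A minimal sketch with dummy stages (this assumes the native inference call releases the GIL while the NPU runs; if it does not, the same structure works with multiprocessing):

```python
# Sketch: overlap "inference" and "post-processing" of consecutive
# frames. The NPU call and the CPU post-process run on different
# hardware, so running them concurrently hides most of the
# post-processing latency. Both stages here are time.sleep stand-ins.
import queue
import threading
import time

def infer_stage(frames, q):
    for f in frames:
        time.sleep(0.01)      # stand-in for the ~0.06 s NPU inference
        q.put(f)
    q.put(None)               # sentinel: no more frames

def post_stage(q, results):
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.01)      # stand-in for the slow Python post-process
        results.append(item)

frames = list(range(10))
q, results = queue.Queue(maxsize=2), []
t = threading.Thread(target=infer_stage, args=(frames, q))
t.start()
post_stage(q, results)        # runs concurrently with the producer
t.join()
print(results)                # frames come out in order: 0..9
```

The bounded queue (`maxsize=2`) keeps the producer from running far ahead of the consumer, so memory stays flat even on a long camera stream.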

Please note that the number of possible ways to implement the post-processing steps is huge, and we have not been able to test all of them, but it would be interesting to see something innovative.
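Before reaching for Numba or extra processes, one low-effort option is to replace the demo's per-anchor Python loops with whole-array NumPy operations. A sketch of the idea, assuming a flattened output of shape `(N, 4 + num_classes)` with box coordinates followed by class scores (the actual KSNN YOLOv8 output layout may differ, so adapt the slicing):

```python
# Sketch: vectorised confidence filtering versus the per-row loop.
# Both functions produce the same detections; the NumPy version does
# the work in a handful of array operations instead of N iterations.
import numpy as np

def decode_loop(pred, conf_thres=0.25):
    """Per-row Python loop, similar in spirit to the demo code (slow)."""
    keep = []
    for row in pred:
        scores = row[4:]
        cls = int(np.argmax(scores))
        if scores[cls] >= conf_thres:
            keep.append((row[:4], cls, float(scores[cls])))
    return keep

def decode_vec(pred, conf_thres=0.25):
    """Same result computed with whole-array NumPy ops (fast)."""
    scores = pred[:, 4:]
    cls = scores.argmax(axis=1)                    # best class per row
    best = scores[np.arange(len(pred)), cls]       # its score
    mask = best >= conf_thres
    return pred[mask, :4], cls[mask], best[mask]

pred = np.random.rand(8400, 84).astype(np.float32)  # e.g. 8400 anchors, 80 classes
boxes, classes, confs = decode_vec(pred)
```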

Happy Holidays

Thank you very much for your advice; I was actually already thinking about Numba.
I’ll try Numba and subprocesses. I haven’t worked with OpenCL, but maybe I’ll give it a try.

Happy Holidays!

Hello! Were you able to speed it up? The demo’s post-processing is too slow for actual use. It is doing slow matrix calculations. I wasn’t able to optimize it.