Hello!
Thank you very much for your earlier help with YOLOv8n and the custom dataset, but now I've run into a new problem.
Inference of YOLOv8n on the VIM3 Pro takes 0.06 s (on my data it's 0.04 s), and wow, that's fast!
But the remaining processing of the data takes 0.338 - 0.06 = 0.278 s.
The issue lies mainly in the bottleneck introduced by post-processing of the outputs; the NPU is definitely active and running the model inference.
Python itself is a large part of the reason for the slow processing, since the post-processing runs single-threaded, so this performance drop is platform-agnostic.
The KSNN code sample for YOLOv8n is intended as a template for creating an optimized model that will run on the NPU.
There are a few ways you can try to improve the efficiency of this post-processing:
implement your own multi-threaded post-processing code (the Python multiprocessing library would be a good starting point), use an optimized Python interpreter such as PyPy, use Numba, or even try something more experimental like your own OpenCL post-processing…
Please note that the number of possible ways to do the post-processing steps is huge, and we have not had a chance to test all of them, but it would be interesting to come up with something innovative.
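As a rough illustration of the Numba route: this is only a sketch, not the actual demo code, and the array shapes, names, and threshold here are assumptions, but the idea is to move the per-box Python loop (for example, the confidence filter before NMS) into a JIT-compiled function:

```python
import numpy as np
from numba import njit

# Hypothetical confidence-filter step; the (N, 4) / (N, num_classes) layout is
# an assumption, not the real KSNN YOLOv8n demo output format.
@njit(cache=True)
def filter_boxes(boxes, scores, conf_thres):
    n = boxes.shape[0]
    out_boxes = np.empty((n, 4), dtype=np.float32)
    out_cls = np.empty(n, dtype=np.int64)
    out_conf = np.empty(n, dtype=np.float32)
    m = 0
    for i in range(n):
        c = np.argmax(scores[i])   # best class for this candidate box
        s = scores[i, c]
        if s >= conf_thres:        # keep only confident candidates for NMS
            out_boxes[m] = boxes[i]
            out_cls[m] = c
            out_conf[m] = s
            m += 1
    return out_boxes[:m], out_cls[:m], out_conf[:m]

# Example call with dummy data; the first invocation pays the JIT compile cost,
# later calls run as compiled code.
boxes = np.random.rand(8400, 4).astype(np.float32)
scores = np.random.rand(8400, 80).astype(np.float32)
kept_boxes, kept_cls, kept_conf = filter_boxes(boxes, scores, 0.5)
```

Remember to warm the function up once before timing, since the first call includes compilation; the same pattern can be applied to the box-decoding step.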
Thank you very much for the advice; I was actually already thinking about Numba.
I'll try Numba and subprocesses. I haven't worked with OpenCL, but maybe I'll give it a try.
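For the subprocess part, what I have in mind is roughly the following; it's just a sketch, and the yolov8.nn_inference call and the output shape are placeholders rather than the real KSNN API, but it shows how NPU inference on the main process could overlap with post-processing in worker processes:

```python
import numpy as np
from multiprocessing import Pool

# Placeholder for the demo's post-processing (decode + NMS); in practice this
# would wrap the existing KSNN YOLOv8n post-process function.
def postprocess(raw_outputs):
    return [float(o.sum()) for o in raw_outputs]

def main():
    with Pool(processes=2) as pool:
        pending = []
        for _ in range(10):
            # raw = yolov8.nn_inference(...)   # real NPU inference would go here
            raw = [np.random.rand(1, 84, 8400).astype(np.float32)]  # fake output
            # Hand the frame to a worker so the NPU can start on the next frame.
            pending.append(pool.apply_async(postprocess, (raw,)))
        results = [p.get() for p in pending]
    print(len(results), "frames post-processed")

if __name__ == "__main__":
    main()
```

I realize that pickling large output arrays between processes has its own cost, so I'll have to profile whether this actually helps.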
Hello! Were you able to speed it up? The demo's post-processing is too slow for actual use; it does slow matrix calculations, and I wasn't able to optimize it.