YOLOv8n VIM3 Demo is too slow

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Please provide the version of the system here:

Linux Khadas 5.15.119 #1.6.4 SMP PREEMPT Sat Feb 3 11:21:08 CST 2024 aarch64 aarch64 aarch64 GNU/Linux

Please describe your issue below:

Issue 1

I tried to make a program through the link above.

I modified main.cpp so that the existing camera program can also take a video as input.
No changes were made except to main.cpp.

  • Then I measured the processing speed.
    $ ./bin_r_cv4/yolov8n_demo_x11_usb -m ./nn_data/yolov8n_88.nb -v ./road5_640.mp4 -w 640 -h 360
    Preprocess Time: 0.123304 sec
    Inference Time: 0.041984 sec
    Postprocess Time: 0.074343 sec

  • When tested with Python, the timings were as follows:
    Preprocess Time: 0.0165 sec
    Inference Time: 0.1092 sec
    Postprocess Time: 0.0364 sec

Compared with Python, the C++ demo's inference time is much lower (0.042 sec vs 0.109 sec), but its preprocessing and postprocessing times are much higher (0.123 sec vs 0.017 sec, and 0.074 sec vs 0.036 sec).
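
To put the two runs side by side, here is a small sketch that adds up the per-stage times reported above and converts them to an approximate frame rate. It assumes the three stages run one after another for every frame with no overlap, which this thread does not confirm, so treat the FPS figures as rough estimates.

```python
# Per-stage times in seconds, copied from the two runs reported above.
runs = {
    "C++ demo": {"Preprocess": 0.123304, "Inference": 0.041984, "Postprocess": 0.074343},
    "Python": {"Preprocess": 0.0165, "Inference": 0.1092, "Postprocess": 0.0364},
}

def summarize(stages):
    """Return total seconds per frame, estimated FPS, and each stage's share."""
    total = sum(stages.values())
    # Assumption: stages run sequentially, so FPS is simply 1 / total.
    shares = {name: seconds / total for name, seconds in stages.items()}
    return total, 1.0 / total, shares

summary = {name: summarize(stages) for name, stages in runs.items()}

for name, (total, fps, shares) in summary.items():
    print(f"{name}: {total:.4f} sec/frame, about {fps:.1f} FPS")
    for stage, share in shares.items():
        print(f"  {stage}: {share:.0%} of the frame time")
```

On these numbers the C++ demo spends most of each frame outside the NPU, which is why the faster inference does not translate into a faster program overall.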

Is there something wrong with my settings or code?

I would like to know how fast it works for others when running the demo and if there is a way to make it faster.
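
For anyone who wants to report comparable numbers, below is a minimal sketch of per-stage timing in Python. The three stage functions are hypothetical stand-ins that do trivial work, not the demo's actual code; only the timing pattern around them is the point. Averaging over several frames helps smooth out one-off costs such as the first call.

```python
import time

def time_stages(stages, frame, repeats=20):
    """Return the mean wall-clock seconds each stage takes over `repeats` runs."""
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(repeats):
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)  # the output of one stage feeds the next
            totals[name] += time.perf_counter() - start
    return {name: total / repeats for name, total in totals.items()}

# Hypothetical placeholder stages; replace them with the real calls.
def preprocess(frame):
    return [value / 255.0 for value in frame]

def inference(tensor):
    return [value * 2.0 for value in tensor]

def postprocess(outputs):
    return max(outputs)

stages = [
    ("Preprocess", preprocess),
    ("Inference", inference),
    ("Postprocess", postprocess),
]
timings = time_stages(stages, list(range(256)))

for name, seconds in timings.items():
    print(f"{name} Time: {seconds:.6f} sec")
```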


Issue 2

I tried converting the model myself by looking at “YOLOv8n VIM3 Demo Lite - 2”.
https://docs.khadas.com/products/sbc/vim3/npu/vim3_demo_lite/yolov8n

The folder attached below was created.

I put vnn_yolov8n.c and vnn_yolov8n.h from the created folder into the video program I modified and compiled it.

  • Then I ran it as follows:
    $ ./bin_r_cv4/yolov8n_demo_x11_usb -m ./nn_data/yolov8n.nb -v ./road5_640.mp4 -w 640 -h 360
    Preprocess Time: 0.120346 sec
    Inference Time: 0.033668 sec
    Postprocess Time: 1.804054 sec

The post-processing time jumped from about 0.074 sec to about 1.8 sec, and the detection accuracy was not at all satisfactory.

The output shapes of my converted model are:

[80, 80, 144, 1]
[40, 40, 144, 1]
[20, 20, 144, 1]

The output shapes of the existing demo model are:

[144, 80, 80, 1]
[144, 40, 40, 1]
[144, 20, 20, 1]

What’s the problem?

Your kind response would be greatly appreciated.

https://drive.google.com/drive/folders/1c1ro2yL3E7nsb3iVYxO5s8xRg8BS21IC?usp=sharing

Here is my source code.

Hello @GHdevlog ,

About Issue 1, I ran a quick test of our demo and it takes about as long as yours. I have passed the problem to our engineers and will let you know as soon as I receive feedback.

About Issue 2, sorry, I checked and the doc has a mistake. The correct output shape is (1, 80, 80, 144). Modify the line in head.py as follows.

def forward_export(self, x):
    results = []
    for i in range(self.nl):
        dfl = self.cv2[i](x[i]).contiguous()
        cls = self.cv3[i](x[i]).contiguous()
-        results.append(torch.cat([cls, dfl], 1))
+        results.append(torch.cat([cls, dfl], 1).permute(0, 2, 3, 1))
    return tuple(results)
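
To illustrate why the dimension order matters, here is a minimal sketch in plain Python. It assumes that the post-processing code indexes the flat output buffer in channel-last order, and that the shapes in this thread are listed fastest-varying dimension first, so that [144, 80, 80, 1] corresponds to (1, 80, 80, 144). Both points are inferred from the posts above rather than confirmed. The helper `read_channel_last` is hypothetical and only mimics that kind of indexing.

```python
C, H, W = 144, 80, 80  # channels and grid size of the first output

# Channel-first (NCHW) flat buffer: element (c, y, x) sits at c*H*W + y*W + x.
# Each value is set to its own NCHW index so that reads are easy to check.
nchw_flat = list(range(C * H * W))

# Channel-last (NHWC) flat buffer, the layout permute(0, 2, 3, 1) produces
# once the tensor is stored contiguously.
nhwc_flat = [
    nchw_flat[c * H * W + y * W + x]
    for y in range(H)
    for x in range(W)
    for c in range(C)
]

def read_channel_last(flat, y, x, c):
    """Read channel c of grid cell (y, x), assuming a channel-last buffer."""
    return flat[(y * W + x) * C + c]

y, x, c = 3, 5, 7
expected = c * H * W + y * W + x                 # value stored for (c, y, x)
good = read_channel_last(nhwc_flat, y, x, c)     # buffer matches the reader
bad = read_channel_last(nchw_flat, y, x, c)      # buffer in the wrong order

print("expected:", expected)
print("from channel-last buffer:", good)
print("from channel-first buffer:", bad)
```

Reading a channel-first buffer with channel-last indexing returns values from unrelated cells and channels, which would be consistent with the poor accuracy reported for the first conversion.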

Thanks to @Louis-Cheng-Liu

Issue 2

I converted the model again according to your instructions.

  • input  - [640, 640, 3, 1]
    output - [144, 80, 80, 1]
             [144, 40, 40, 1]
             [144, 20, 20, 1]

I had been struggling with how to change the dimension order of the outputs, but thanks to your fix I was able to solve it easily.

Afterwards I replaced the files in my program with the files generated according to your instructions.

  • Preprocess Time: 0.122897 sec
    Inference Time: 0.041305 sec
    Postprocess Time: 0.073177 sec

When I ran it, the timings were as shown above, which is similar to the demo code.
The predictions were also correct.

I hope that Issue 1 will also be improved and good results will be achieved.

Thank you again.

Hello @GHdevlog ,

About Issue 1, the preprocessing and postprocessing need to be modified. Our engineers will improve them within the next two weeks. I will notify you when there is an update.
