Slow NPU Inference Performance Using YOLOv8n Compared to Benchmark (Using rknnlite API)

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Khadas official images, self built images, or others?

Latest Khadas Ubuntu image installed with OOWOW

Please describe your issue below:

Based on NPU benchmarks of YOLOv8n, I expected each inference to take ~100ms. When I run inference using the rknnlite API provided by Rockchip, I am seeing times of at least ~300ms.

        start_time = time.time()
        outputs = self.rknn_lite.inference(inputs=[frame])
        end_time = time.time()
        print(f"Inference duration: {end_time - start_time}")

The time is measured immediately before and after the call to the RKNNLite detector object; self.rknn_lite is an instance of rknnlite.api.RKNNLite. I am using version 1.6.0 of the rknn wheel and librknnrt.so from the rknn-toolkit2 repository. The model was created with the conversion script mentioned in the rknn_model_zoo repository. I assume the yolov8n model in the edge2-npu repository is built the same way, the only difference being that the edge2-npu repository targets RKNN library version 1.4.0.
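
For reference, that conversion roughly follows the standard rknn-toolkit2 flow on the PC side; the ONNX path, target platform, and quantization dataset below are illustrative assumptions, not my exact settings:

    # Rough sketch of the PC-side model conversion with rknn-toolkit2.
    # The ONNX path, target platform, and quantization dataset are assumptions.
    from rknn.api import RKNN

    rknn = RKNN()
    rknn.config(target_platform='rk3588')
    rknn.load_onnx(model='yolov8n.onnx')
    rknn.build(do_quantization=True, dataset='./dataset.txt')
    rknn.export_rknn('./yolov8n.rknn')
    rknn.release()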

While it is running, I can see that my NPU load is only about 20% across all three cores and my CPU is not loaded.
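
For context, the runtime-side setup around the timed call looks roughly like the sketch below; the model path is illustrative, and the core_mask argument may be relevant, since as far as I understand the default core mask schedules the model on a single NPU core unless NPU_CORE_0_1_2 is requested:

    # Minimal sketch of the RKNNLite runtime setup (model path is illustrative).
    import numpy as np
    from rknnlite.api import RKNNLite

    rknn_lite = RKNNLite()
    rknn_lite.load_rknn('./yolov8n.rknn')
    # NPU_CORE_0_1_2 asks the runtime to use all three RK3588 NPU cores;
    # the default core mask runs the model on a single core.
    rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)

    # Dummy 640x640 RGB frame just to exercise the call.
    frame = np.zeros((640, 640, 3), dtype=np.uint8)
    outputs = rknn_lite.inference(inputs=[frame])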

Hello @Eliasin

@Louis-Cheng-Liu will help you then.

Hello @Eliasin ,

I have not used rknnlite.api.RKNNLite. If you find its inference too slow, you can try edge2-npu/C++/yolov8n instead. It takes about 20ms per frame.

Do you happen to have any examples of a CPython binding that would let me run inference and post-processing in C but consume the results as something like a numpy array in Python? The rest of my application logic is in Python, and while I could rewrite it or use some IPC, it would be nicer if I didn't have to.
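
For illustration only, one common pattern (not an official Khadas or Rockchip example) is a small ctypes wrapper around a C shared library: the C side writes detections into a caller-provided buffer, and Python views that buffer as a numpy array without copying. The library name, function name, and output layout below are hypothetical:

    # Hypothetical sketch: call a C shared library from Python via ctypes and
    # interpret the result as a numpy array. The library name, function name,
    # and output layout are assumptions, not an existing API.
    import ctypes
    import numpy as np

    lib = ctypes.CDLL('./libyolov8_infer.so')  # hypothetical library
    lib.run_inference.argtypes = [
        ctypes.POINTER(ctypes.c_uint8),  # input image buffer (HWC, uint8)
        ctypes.c_int,                    # input length in bytes
        ctypes.POINTER(ctypes.c_float),  # output buffer (caller-allocated)
        ctypes.c_int,                    # output capacity in floats
    ]
    lib.run_inference.restype = ctypes.c_int  # number of detections written

    def infer(frame, max_dets=100):
        frame = np.ascontiguousarray(frame, dtype=np.uint8)
        out = np.zeros(max_dets * 6, dtype=np.float32)  # x, y, w, h, score, class
        n = lib.run_inference(
            frame.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8)),
            int(frame.nbytes),
            out.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
            int(out.size),
        )
        return out[:n * 6].reshape(-1, 6)  # detections as a numpy array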

Hello @Eliasin ,

Sorry, we do not have an example like the one you describe.

I simply ran the YOLOv8n model with rknn_lite.inference on Edge2, and it only takes about 32ms per frame.