NPU Performance Level Log

keikeigd · February 23, 2024, 2:55am

Hello all,

I am following the NPU Performance Analysis documentation to evaluate NPU performance when running the model on the NPU.

This is the command I use:

python3 yolov5s_tflite.py --model /home/khadas/Desktop/yolov5.nb --library /home/khadas/Desktop/libnn_yolov5.so --picture ./data/cars.jpg --level 0
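To collect the logs for different levels in a repeatable way, the command above can be wrapped in a small Python helper. This is a minimal sketch: `build_command` and `run_at_level` are hypothetical helper names (not part of KSNN), the paths are copied from the command above, and it assumes the profiling output is printed by the child process to stdout or stderr, so both streams are captured.

```python
import subprocess
from typing import List

# Paths copied from the command above; adjust them for your board.
SCRIPT = "yolov5s_tflite.py"
MODEL = "/home/khadas/Desktop/yolov5.nb"
LIBRARY = "/home/khadas/Desktop/libnn_yolov5.so"
PICTURE = "./data/cars.jpg"


def build_command(level: int) -> List[str]:
    """Return the argv list for one run of the demo at the given --level."""
    return [
        "python3", SCRIPT,
        "--model", MODEL,
        "--library", LIBRARY,
        "--picture", PICTURE,
        "--level", str(level),
    ]


def run_at_level(level: int) -> str:
    """Run the demo once and return everything it printed.

    stderr is appended to stdout in case part of the profiling log is
    written there (an assumption; the post does not say which stream).
    """
    result = subprocess.run(
        build_command(level), capture_output=True, text=True, check=True
    )
    return result.stdout + result.stderr
```

For example, `run_at_level(0)` and `run_at_level(2)` would return the two console logs as strings, ready to be parsed or saved side by side.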

With --level 0, the console log shows no NPU performance data:

 |---+ KSNN Version: v1.3 +---| 
Start init neural network ...
Total processing time: 0.2203, FPS: 4.5403
Total inference time: 0.0880, inference per sec: 11.3604

To see more detailed NPU performance data, I set --level 2. However, both the overall processing time and the NPU inference time become significantly higher:

execution time:            633485 us
[     1] TOTAL_READ_BANDWIDTH  (MByte): 107.082538
[     2] TOTAL_WRITE_BANDWIDTH (MByte): 196.149869
[     3] AXI_READ_BANDWIDTH  (MByte): 64.275588
[     4] AXI_WRITE_BANDWIDTH (MByte): 49.775287
[     5] DDR_READ_BANDWIDTH (MByte): 42.806950
[     6] DDR_WRITE_BANDWIDTH (MByte): 146.374582
[     7] GPUTOTALCYCLES: 501447320
[     8] GPUIDLECYCLES: 407465799
VPC_ELAPSETIME: 633796
*********
Run the 1 time: 645.00ms or 645802.00us
vxProcessGraph execution time:
Total   645.00ms or 645870.00us
Average 645.87ms or 645870.00us
Total processing time: 0.8310, FPS: 1.2033
Total inference time: 0.6696, inference per sec: 1.4935
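The numbered counters in the --level 2 log can be pulled into a dictionary so runs are easier to compare. The sketch below is a hypothetical parser written against the log format shown above (`parse_counters` and `busy_fraction` are illustrative names, not KSNN APIs). `busy_fraction` assumes that GPUIDLECYCLES counts the idle portion of GPUTOTALCYCLES over the same window; the log itself does not confirm that interpretation.

```python
import re
from typing import Dict

# Matches counter lines such as "[     7] GPUTOTALCYCLES: 501447320"
# and "[     1] TOTAL_READ_BANDWIDTH  (MByte): 107.082538".
COUNTER_RE = re.compile(r"^\[\s*\d+\]\s+(\w+)\s*(?:\(MByte\))?:\s*([\d.]+)\s*$")


def parse_counters(log: str) -> Dict[str, float]:
    """Collect the numbered NPU counters from a --level 2 log into a dict."""
    counters = {}
    for line in log.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group(1)] = float(match.group(2))
    return counters


def busy_fraction(counters: Dict[str, float]) -> float:
    """Share of total cycles not reported as idle.

    Assumption: GPUIDLECYCLES is a subset of GPUTOTALCYCLES.
    """
    total = counters["GPUTOTALCYCLES"]
    idle = counters["GPUIDLECYCLES"]
    return (total - idle) / total


# Counter lines copied from the --level 2 log above; VPC_ELAPSETIME
# has no "[n]" prefix, so the parser skips it.
LOG = """\
[     1] TOTAL_READ_BANDWIDTH  (MByte): 107.082538
[     2] TOTAL_WRITE_BANDWIDTH (MByte): 196.149869
[     3] AXI_READ_BANDWIDTH  (MByte): 64.275588
[     4] AXI_WRITE_BANDWIDTH (MByte): 49.775287
[     5] DDR_READ_BANDWIDTH (MByte): 42.806950
[     6] DDR_WRITE_BANDWIDTH (MByte): 146.374582
[     7] GPUTOTALCYCLES: 501447320
[     8] GPUIDLECYCLES: 407465799
VPC_ELAPSETIME: 633796
"""

counters = parse_counters(LOG)
print(f"NPU busy fraction: {busy_fraction(counters):.1%}")
```

Note that these counters come from the profiled run, so they describe the slower --level 2 execution rather than the --level 0 one.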

This suggests that to produce NPU performance data, the system has to collect, compute, and print a large amount of profiling information, which increases processing time and lowers FPS (inference time rises from 0.0880 s to 0.6696 s, and FPS drops from 4.54 to 1.20). Am I correct?
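The size of the overhead can be checked directly from the two summary blocks. A minimal sketch, with `parse_summary` as a hypothetical helper that assumes the "Total processing time" / "Total inference time" line format shown in both logs; with the numbers from this post, processing time grows by roughly 3.8x and inference time by roughly 7.6x at --level 2.

```python
import re
from typing import Dict

# Captures the kind ("processing" or "inference") and the time in seconds.
SUMMARY_RE = re.compile(r"Total (processing|inference) time:\s*([\d.]+)")


def parse_summary(log: str) -> Dict[str, float]:
    """Return {'processing': seconds, 'inference': seconds} from a demo log."""
    return {kind: float(value) for kind, value in SUMMARY_RE.findall(log)}


# Summary lines copied from the two logs in this post.
LEVEL0 = """\
Total processing time: 0.2203, FPS: 4.5403
Total inference time: 0.0880, inference per sec: 11.3604
"""

LEVEL2 = """\
Total processing time: 0.8310, FPS: 1.2033
Total inference time: 0.6696, inference per sec: 1.4935
"""

fast = parse_summary(LEVEL0)
slow = parse_summary(LEVEL2)
for kind in ("processing", "inference"):
    added = slow[kind] - fast[kind]   # extra seconds per run
    ratio = slow[kind] / fast[kind]   # slowdown factor
    print(f"{kind}: +{added:.4f} s ({ratio:.1f}x slower at --level 2)")
```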

Therefore, with --level 0 I get the best FPS but no detailed NPU performance data, right? Is there any way to keep the FPS seen at level 0 while still collecting detailed NPU performance data?

Thank you.