Strange Result of NPU Inference on YOLOv8n POSE

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Khadas official images, self built images, or others?

Khadas Official Image from OOWOW
Linux Khadas 5.10.160 #1.6.2

Please describe your issue below:

The output from running inference with YOLOv8n POSE on the NPU looks wrong. The output format is 8600 rows of 56 attributes: the first 4 are the bounding box, the next is a confidence, and the remaining 51 are 17 keypoints (2 spatial coordinates and a confidence value each). This is similar to the YOLOv8 detect output, which has the 4 bounding box attributes followed by the confidences for all the classes. In the output I get, every bounding box value and confidence value is 0.00, and there are also unexpected rows and columns of 0.00 (these may correspond to keypoint confidence). Here are some example row outputs.

[  0.          0.          0.          0.          0.        227.57094
   7.7581      0.        230.15697     5.1720667   0.        224.9849
   5.1720667   0.        232.743       0.          0.        222.39886
   2.5860333   0.        235.32904     5.1720667   0.        217.2268
   5.1720667   0.        237.91507    18.102234    0.        214.64076
  18.102234    0.        230.15697    20.688267    0.        219.81284
  20.688267    0.        232.743      20.688267    0.        222.39886
  20.688267    0.        232.743      10.344133    0.        222.39886
  10.344133    0.        232.743      23.2743      0.        224.9849
  23.2743      0.       ]
> /home/khadas/pose_perf/yolo_test.py(363)<module>()
-> for row in range(out.shape[0]):
(Pdb) c
[  0.          0.          0.          0.          0.        237.91507
   7.7581      0.        240.5011      5.1720667   0.        235.32904
   5.1720667   0.        240.5011      0.          0.        230.15697
   2.5860333   0.        245.67317     5.1720667   0.        227.57094
   5.1720667   0.        248.2592     18.102234    0.        224.9849
  18.102234    0.        240.5011     20.688267    0.        230.15697
  20.688267    0.        243.08713    20.688267    0.        232.743
  20.688267    0.        243.08713    10.344133    0.        230.15697
  10.344133    0.        243.08713    23.2743      0.        235.32904
  23.2743      0.       ]

Here is an example of what this output should look like. Even though the confidence value for the row is low, the bounding box coordinates are not near zero.

(Pdb) p np_out[0]
array([     12.353,      16.379,       36.39,       46.78,  4.5582e-06,      4.6046,     -8.2559,     0.14382,      4.6362,     -9.6881,     0.12751,      4.0202,     -9.5472,     0.12229,      2.6991,     -8.9242,     0.23624,      5.5221,     -8.7413,     0.22851,      3.0457,     -3.6397,     0.37018,       5.138,
           -3.5511,     0.36872,      4.2668,     -1.6557,     0.18525,      4.4674,     -1.5006,     0.18891,      3.7493,     -0.1979,     0.15563,      5.2914,    -0.15677,     0.16213,      4.0072,      5.2005,     0.21152,      5.5032,      5.3291,     0.21354,      4.6373,      8.0499,      0.1262,      4.2513,
            8.4074,     0.12954,      5.0615,      14.274,      0.1409,      4.5339,      14.378,     0.14318], dtype=float32)

My inference results with the YOLOv8 detect model are as expected and don’t have conspicuous zero rows.
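
The 56-attribute layout described above can be checked with a short NumPy sketch. It splits the first NPU row quoted in this post into box, confidence, and keypoints. The function name `split_pose_row` is hypothetical and only illustrates the layout; it is not part of any RKNN or Ultralytics API.

```python
import numpy as np

NUM_KEYPOINTS = 17  # keypoint count stated in the post

def split_pose_row(row):
    """Split one 56-value row into (box, confidence, keypoints)."""
    row = np.asarray(row, dtype=np.float32)
    expected = 4 + 1 + NUM_KEYPOINTS * 3
    if row.shape != (expected,):
        raise ValueError(f"expected {expected} values, got {row.shape}")
    box = row[:4]                             # 4 bounding box attributes
    conf = float(row[4])                      # detection confidence
    kpts = row[5:].reshape(NUM_KEYPOINTS, 3)  # (x, y, keypoint confidence)
    return box, conf, kpts

# First NPU row from the post, copied verbatim.
npu_row = [
    0.0, 0.0, 0.0, 0.0, 0.0, 227.57094,
    7.7581, 0.0, 230.15697, 5.1720667, 0.0, 224.9849,
    5.1720667, 0.0, 232.743, 0.0, 0.0, 222.39886,
    2.5860333, 0.0, 235.32904, 5.1720667, 0.0, 217.2268,
    5.1720667, 0.0, 237.91507, 18.102234, 0.0, 214.64076,
    18.102234, 0.0, 230.15697, 20.688267, 0.0, 219.81284,
    20.688267, 0.0, 232.743, 20.688267, 0.0, 222.39886,
    20.688267, 0.0, 232.743, 10.344133, 0.0, 222.39886,
    10.344133, 0.0, 232.743, 23.2743, 0.0, 224.9849,
    23.2743, 0.0,
]

box, conf, kpts = split_pose_row(npu_row)
print("box:", box)
print("confidence:", conf)
print("keypoint confidences:", kpts[:, 2])
```

Read this way, the regularly spaced zeros in the row line up with the third value of every keypoint, which is consistent with the guess that they are keypoint confidences.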

Hello @Eliasin

@Louis-Cheng-Liu will help you then.

Hello @Eliasin ,

Could you provide your original model and RKNN model? From your description, I cannot be sure what is wrong with it.

It seems I can only upload pictures, so here are Dropbox links to the converted and original model. I used version 1.6.0 of rknn_toolkit2 and librknnrt.so.

Following others’ advice on YOLOv8 pose, it seems the issue stems from quantization: it struggles to represent the outputs because the confidence scores are normalized to 0-1 while the other values depend on the input image size, which in this case is (640, 640). I scaled the input down to (224, 224) and got some OK results, but I’m going to check the int16 support because (224, 224) is quite small for what I’m doing.
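
The effect described above can be reproduced with a small simulation. This is a sketch under assumptions, not the RKNN quantizer itself: the step size 2.5860333 is inferred from the NPU rows earlier in the thread (it is the smallest non-zero value, and the other values are multiples of it), and `fake_quantize` is a hypothetical helper that models plain affine int8 quantization.

```python
import numpy as np

def fake_quantize(x, scale, zero_point=-128, qmin=-128, qmax=127):
    """Quantize to int8 with one affine scale, then dequantize again."""
    x = np.asarray(x, dtype=np.float64)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# Assumed shared scale for the whole output tensor (inferred from the logs).
SHARED_STEP = 2.5860333

confs = np.array([0.05, 0.42, 0.95])      # values in the 0-1 range
coords = np.array([227.57094, 18.102234])  # pixel-scale values from the logs

# One scale for everything: confidences fall below half a step and become 0.
shared_confs = fake_quantize(confs, SHARED_STEP)
shared_coords = fake_quantize(coords, SHARED_STEP)

# A separate scale for the 0-1 values keeps them within 1/255 of the input.
separate_confs = fake_quantize(confs, 1.0 / 255.0)

print("shared scale, confidences:  ", shared_confs)
print("shared scale, coordinates:  ", shared_coords)
print("separate scale, confidences:", separate_confs)
```

With a step of about 2.59, any value below roughly 1.29 rounds to zero, which would explain why the confidences vanish while the pixel coordinates survive. A smaller input such as (224, 224) shrinks the coordinate range and therefore the step, which fits the partial improvement reported above.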

Let me know if you think there might be some other factors impacting the results.

Hello @Eliasin ,

I think it is a problem with quantization in the RKNN conversion tool. We ran into the same problem when we converted the YOLOv8 detect model: the confidence scores were all zero. But when we converted the RKNN model without quantization, it inferred the correct results. We guess the problem is precision loss caused by quantization. We traced the problem to the last structure of the model.

So the solution we use is to remove the last structure and perform the same operations as that structure ourselves. You can refer to our YOLOv8n doc and try the same approach with YOLOv8-pose.
YOLOv8n OpenCV Edge2 Demo - 2 [Khadas Docs]
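
As a rough illustration of doing the final step on the host, the sketch below assumes the removed structure includes the sigmoid that produces the confidence scores, so the NPU returns raw int8 logits in their own output tensor. The thread does not spell out what the last structure contains, so treat this as an assumption. The scale and zero point are hypothetical placeholders; real values would come from the converted model.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Affine dequantization: real = (q - zero_point) * scale."""
    return (np.asarray(q, dtype=np.float32) - zero_point) * scale

def sigmoid(x):
    """Logistic function, applied on the CPU instead of inside the model."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical int8 logits from a score output tensor.
raw_logits_q = np.array([-128, -20, 0, 35, 127], dtype=np.int8)

# Hypothetical quantization parameters for that tensor.
LOGIT_SCALE = 0.1
LOGIT_ZERO_POINT = 0

# Dequantize first, then apply the sigmoid in floating point.
scores = sigmoid(dequantize(raw_logits_q, LOGIT_SCALE, LOGIT_ZERO_POINT))
print(scores)
```

Because the logits get their own quantization scale and the sigmoid runs in float32 on the host, small scores are no longer forced through a step size chosen for pixel coordinates.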

If you think that is too complex, you can convert the RKNN model without quantization. However, inference will take longer.

Do you have the steps you took to convert YOLOv8 to RKNN? I have followed pretty much every guide I can find, but I always end up with an error about not being able to do something to argb, and it fails. I would really love a step-by-step guide so I can figure out what I’m doing wrong.