[NPU] Converted and compiled YOLOv8 model does not detect anything

Which Khadas SBC do you use?

VIM3 Pro

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Khadas official images, self built images, or others?

vim3-ubuntu-20.04-gnome-linux-4.9-fenix-1.4-221229.img

Please describe your issue below:

Dear all, @Frank

Hi. I’m trying to run YOLOv8 on the VIM3 Pro's NPU, but I’m having trouble.

I converted the yolov8n.onnx weights to .nb and .so files using the command below from that repository.
The weights used in the conversion are in the corresponding Google Drive.

./convert \
    --model-name yolov8n \
    --platform onnx \
    --model yolov8n.onnx \
    --source-files ./data/dataset/dataset0.txt \
    --mean-values '0 0 0 0.0039' \
    --quantized-dtype asymmetric_affine \
    --kboard VIM3 --print-level 1 \
    --input-size-list '640,640,3'
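For reference, here is a sketch of the normalization that `--mean-values '0 0 0 0.0039'` is assumed to request. It treats the first three numbers as per-channel means and the fourth as a single scale, i.e. `(pixel - mean) * scale`. That reading is an assumption based on the KSNN conversion examples, not documented behaviour. With mean 0 and scale 0.0039 (about 1/255), it approximates the `pixel / 255` normalization Ultralytics YOLOv8 uses during training. The function name `npu_style_normalize` is hypothetical.

```python
import numpy as np

def npu_style_normalize(img, means=(0.0, 0.0, 0.0), scale=0.0039):
    """Hypothetical: apply (pixel - mean) * scale per channel, the assumed meaning of --mean-values."""
    img = img.astype(np.float32)
    return (img - np.asarray(means, dtype=np.float32)) * scale

# Every possible 8-bit value, repeated for 3 channels
pixels = np.tile(np.arange(256, dtype=np.uint8).reshape(-1, 1), (1, 3))
npu = npu_style_normalize(pixels)
ultralytics_style = pixels.astype(np.float32) / 255.0

# The gap grows linearly with the pixel value and is largest at 255
max_err = float(np.abs(npu - ultralytics_style).max())
print(f"max difference vs /255: {max_err:.5f}")
```

The gap stays well under 1% of the input range, so the exact scale value is unlikely to explain all-zero class scores on its own.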

After running object detection with the converted model and library, I checked the output tensor: it contains only bounding-box coordinates, and the class scores are all 0.

Below is the code I ran, followed by its output.

Code

from ultralytics import YOLO
from ultralytics.yolo.v8.detect.predict import DetectionPredictor
from PIL import Image
import cv2
import time
import argparse
import torch

from ksnn.api import KSNN
from ksnn.types import *
from test import non_max_suppression

def detect_camera(model, device):
    # Open the webcam selected with --device and request a 1920x1080 stream
    cap = cv2.VideoCapture(device)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
    while True:
        ret, img = cap.read()
        if not ret:
            break

        start = time.time()
        # The YOLOv8 ONNX export has a single output tensor
        data = model.nn_inference([img], platform="ONNX", reorder="2 1 0", output_tensor=1, output_format=output_format.OUT_FORMAT_FLOAT32)
        print("inference time :", time.time() - start)

        # reshape() returns a new array, and shape is an attribute, not a method
        data = data[0].reshape(1, 84, 8400)
        print(data.shape)

        cv2.imshow("capture", img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

def detect_picture(model, picture):
    img = cv2.imread(picture, cv2.IMREAD_COLOR)

    start = time.time()
    data = model.nn_inference([img], platform="ONNX", reorder="2 1 0", output_tensor=1, output_format=output_format.OUT_FORMAT_FLOAT32)
    print("inference time :", time.time() - start)

    print(len(data))     # number of output tensors
    print(len(data[0]))  # 84 * 8400 = 705600 values
    data = data[0]

    # (84, 8400) -> (8400, 84): one row per candidate, 4 box values + 80 class scores
    data = data.reshape(84, 8400).transpose()
    print(data[:, :10])


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--library", help="Path to the compiled .so library")
    parser.add_argument("--model", help="Path to nbg file")
    parser.add_argument("--device", type=int, help="webcam num")
    parser.add_argument("--picture", help="Path to input picture")
    parser.add_argument("--level", type=int, default=0, help="Information printer level: 0/1/2")

    args = parser.parse_args()

    yolov8 = KSNN("VIM3")
    yolov8.nn_init(library=args.library, model=args.model, level=args.level)

    # Compare with None so that webcam 0 is still accepted
    if args.device is not None:
        detect_camera(yolov8, args.device)
    else:
        detect_picture(yolov8, args.picture)
khadas@Khadas:~/ultralytics/ultralytics/yolo/v8/detect$ python3 yolov8-npu.py --library ~/ksnn_weight/0_0_0_0.0039/libnn_yolov8n.so --model ~/ksnn_weight/0_0_0_0.0039/yolov8n.nb --picture ~/yolov5/data/images/bus.jpg

Output

/usr/local/lib/python3.8/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 
  warn(f"Failed to load image Python extension: {e}")
inferece time : 0.058927059173583984
1
705600
[[     4.9911      7.4867      7.4867 ...           0           0           0]
 [      22.46      4.9911       44.92 ...           0           0           0]
 [     32.442      4.9911      62.389 ...           0           0           0]
 ...
 [     519.08       618.9      249.56 ...           0           0           0]
 [     549.03      598.94      209.63 ...           0           0           0]
 [     598.94      578.97       89.84 ...           0           0           0]]
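One possible explanation, inferred from the printed values rather than confirmed: every nonzero value above is a multiple of roughly 2.4956 (4.9911 = 2 × 2.4956, 7.4867 = 3 × 2.4956, 598.94 = 240 × 2.4956). That pattern is what uint8 `asymmetric_affine` quantization would produce if the single output tensor, which concatenates box coordinates (up to about 640) and class scores (0 to 1), shares one scale. With a step of about 2.5, every score below about 1.25 rounds to 0. The sketch below simulates this with the standard min/max asymmetric quantization formula. It does not call the Khadas toolkit, and the tensor range of 0 to 636.4 is an assumption read off the log.

```python
import numpy as np

def asymmetric_affine_params(x_min, x_max, num_bits=8):
    """Scale and zero point for unsigned asymmetric (min/max) quantization."""
    qmax = 2 ** num_bits - 1
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # the range must contain 0
    scale = (x_max - x_min) / qmax
    zero_point = int(round(-x_min / scale))
    return scale, zero_point

def fake_quantize(x, scale, zero_point, num_bits=8):
    """Quantize to uint8, then dequantize back to float."""
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** num_bits - 1)
    return (q - zero_point) * scale

# Assumed range of the concatenated output: boxes reach ~636 px, scores stay in [0, 1]
shared_scale, shared_zp = asymmetric_affine_params(0.0, 636.4)

boxes = np.array([5.0, 22.5, 598.9])
scores = np.array([0.05, 0.5, 0.95])

boxes_dq = fake_quantize(boxes, shared_scale, shared_zp)
scores_dq = fake_quantize(scores, shared_scale, shared_zp)
print("shared step:", round(shared_scale, 4))
print("boxes  ->", boxes_dq)   # within half a step (~1.25 px) of the originals
print("scores ->", scores_dq)  # every score is below half a step, so all round to 0

# If the scores were a separate tensor with their own range, they would survive
score_scale, score_zp = asymmetric_affine_params(0.0, 1.0)
print("scores with own range ->", fake_quantize(scores, score_scale, score_zp))
```

If this is the cause, it would also explain why changing `--mean-values` makes no difference. A plausible direction would be a model export that keeps boxes and class scores in separate output tensors, or a higher-precision quantization setting, but neither has been verified on the VIM3 here.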

I’ve verified that all the other examples in ksnn/example run fine.

I tried several different --mean-values settings, but the results were the same.
Is my inference code using the converted model incorrectly,
or is my conversion method wrong?

Additionally, I would appreciate a list of webcams that are compatible with Khadas VIM3 Pro.
Thank you.

@juppi I think you need to set up the output tensor.

data = yolov3.nn_inference(cv_img, platform='DARKNET', reorder='2 1 0', output_tensor=3, output_format=output_format.OUT_FORMAT_FLOAT32)

If your YOLOv8 model has the same outputs as YOLOv3, output_tensor should be 3.
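A small sketch of why `output_tensor` has to match the model. Judging from the printed `len(data) == 1` and `len(data[0]) == 705600` earlier in the thread, `nn_inference` appears to return a list with one flattened array per requested output. YOLOv8's ONNX export has a single output of shape (1, 84, 8400), which is exactly 705600 values, whereas a Darknet YOLOv3 has three output heads. The helper `reshape_outputs` is hypothetical (not part of KSNN) and only checks the count and size before reshaping.

```python
import numpy as np

def reshape_outputs(outputs, expected_shapes):
    """Hypothetical helper: reshape a list of flat outputs into the expected shapes."""
    if len(outputs) != len(expected_shapes):
        raise ValueError(f"got {len(outputs)} outputs, expected {len(expected_shapes)}")
    shaped = []
    for i, (flat, shape) in enumerate(zip(outputs, expected_shapes)):
        flat = np.asarray(flat)
        needed = int(np.prod(shape))
        if flat.size != needed:
            raise ValueError(f"output {i} has {flat.size} values, shape {shape} needs {needed}")
        shaped.append(flat.reshape(shape))
    return shaped

# Simulated YOLOv8n result: one flat tensor of 84 rows (4 box + 80 classes) x 8400 candidates
fake_outputs = [np.zeros(84 * 8400, dtype=np.float32)]
(yolov8_out,) = reshape_outputs(fake_outputs, [(84, 8400)])
print(yolov8_out.shape)
```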


Thanks for the answer!

However, the YOLOv8 model I use has a different output layout from YOLOv3 (a single output of 84 × 8400 values), so when I set output_tensor to 2 or 3 and run it, I get an error.

The problem does not seem to be the shape of the output tensor; rather, the class scores inside it all come out as zero. Is this because I converted the model from .onnx?

I ask because a custom YOLOv3 model (.cfg, .weights) converted to .nb and .so gives correct predictions, so I’m wondering whether the difference comes from Darknet versus ONNX.

I’m also wondering whether YOLOv8 is incompatible with KSNN, since it was released only recently.
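Once the scores are non-zero, the (84, 8400) output still needs decoding. Below is a sketch assuming the standard Ultralytics YOLOv8 export layout: rows 0 to 3 hold box center x, center y, width and height in input-image pixels, and rows 4 to 83 hold the 80 class scores (already sigmoid-activated, no separate objectness row). The greedy class-agnostic NMS is a plain NumPy stand-in for the `non_max_suppression` imported from `test`, whose behaviour isn't shown in the thread. Printing the maximum class score is a quick check of whether the scores really are all zero.

```python
import numpy as np

def decode_yolov8(output, conf_thres=0.25, iou_thres=0.45):
    """Decode an (84, N) YOLOv8 output into (boxes_xyxy, scores, class_ids)."""
    preds = output.T                                  # (N, 84): one row per candidate
    boxes_cxcywh, cls_scores = preds[:, :4], preds[:, 4:]
    class_ids = cls_scores.argmax(axis=1)
    scores = cls_scores.max(axis=1)

    keep = scores > conf_thres                        # confidence filter
    boxes_cxcywh, scores, class_ids = boxes_cxcywh[keep], scores[keep], class_ids[keep]

    # (center x, center y, width, height) -> (x1, y1, x2, y2)
    xy, wh = boxes_cxcywh[:, :2], boxes_cxcywh[:, 2:]
    boxes = np.concatenate([xy - wh / 2, xy + wh / 2], axis=1)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    # Greedy NMS: keep the best box, drop the remaining boxes that overlap it too much
    order = scores.argsort()[::-1]
    selected = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        selected.append(i)
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        order = rest[iou <= iou_thres]
    selected = np.array(selected, dtype=int)
    return boxes[selected], scores[selected], class_ids[selected]

# Synthetic output: two overlapping class-5 boxes and one separate class-0 box
out = np.zeros((84, 8400), dtype=np.float32)
out[:4, 0] = [100, 100, 50, 50]
out[4 + 5, 0] = 0.9
out[:4, 1] = [102, 101, 50, 50]
out[4 + 5, 1] = 0.8
out[:4, 2] = [400, 300, 80, 40]
out[4 + 0, 2] = 0.7

print("max class score:", float(out[4:].max()))  # 0.0 on real output means the scores were lost
boxes, scores, cls = decode_yolov8(out)
print(boxes, scores, cls)
```

With real data the call would be `decode_yolov8(data[0].reshape(84, 8400))`; the boxes are in 640 × 640 input coordinates and would still need scaling back to the original image size.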