Which Khadas SBC do you use?
VIM3 Pro
Which system do you use? Android, Ubuntu, OOWOW or others?
Ubuntu
Which version of system do you use? Khadas official images, self built images, or others?
vim3-ubuntu-20.04-gnome-linux-4.9-fenix-1.4-221229.img
Please describe your issue below:
Dear all, @Frank
Hi. I’m trying to run yolov8 on an NPU, but I’m having trouble.
I converted the yolov8n.onnx weights to .nb and .so files using the commands below from that repository.
The weights used in the conversion are in the corresponding Google Drive.
./convert \
--model-name yolov8n \
--platform onnx \
--model yolov8n.onnx \
--source-files ./data/dataset/dataset0.txt \
--mean-values '0 0 0 0.0039' \
--quantized-dtype asymmetric_affine \
--kboard VIM3 --print-level 1 \
--input-size-list '640,640,3'
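For reference, here is a minimal sketch of what I understand the --mean-values setting to do. It assumes the four numbers are three per-channel means followed by a scale factor, and that the converter applies (pixel - mean) * scale; the normalize helper is hypothetical and not part of the conversion tool.

```python
import numpy as np

def normalize(img, means=(0.0, 0.0, 0.0), scale=0.0039):
    """Apply (pixel - mean) * scale per channel (assumed converter behaviour)."""
    img = np.asarray(img, dtype=np.float32)
    return (img - np.asarray(means, dtype=np.float32)) * np.float32(scale)

# One pixel with channel values 0, 128 and 255.
pixels = np.array([[[0, 128, 255]]], dtype=np.uint8)
normalized = normalize(pixels)
print(normalized)
```

Under that assumption, '0 0 0 0.0039' maps 8-bit pixel values to roughly the 0..1 range, which is the input range YOLOv8 is normally trained with.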
After performing object detection using the converted weights and the library, I checked the output tensor and found that only bbox coordinates were present and the classification scores were all 0.
Below are the code I ran and its output.
Code
from ultralytics import YOLO
from ultralytics.yolo.v8.detect.predict import DetectionPredictor
from PIL import Image
import cv2
import time
import argparse
import torch
from ksnn.api import KSNN
from ksnn.types import *
from test import non_max_suppression

def detect_camera(model):
    cap = cv2.VideoCapture(1)
    cap.set(3, 1920)  # frame width
    cap.set(4, 1080)  # frame height
    while 1:
        cv_img = list()
        ret, img = cap.read()
        cv_img.append(img)
        start = time.time()
        data = model.nn_inference(cv_img, platform="ONNX", reorder="2 1 0", output_format=output_format.OUT_FORMAT_FLOAT32)
        print("inferece time :", time.time() - start)
        data = data[0]
        # reshape returns a new array, so the result has to be assigned
        data = data.reshape(1, 84, 8400)
        # shape is an attribute, not a method
        print(data.shape)
        cv2.imshow("capture", img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()

def detect_picture(model, picture):
    img = cv2.imread(picture, cv2.IMREAD_COLOR)
    start = time.time()
    data = model.nn_inference([img], platform="ONNX", reorder="2 1 0", output_tensor=1, output_format=output_format.OUT_FORMAT_FLOAT32)
    print("inferece time :", time.time() - start)
    print(len(data))
    print(len(data[0]))
    data = data[0]
    # transpose to one row per candidate box: (8400, 84)
    data = data.reshape(84, 8400).transpose()
    print(data[:, :10])

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--library", help="Path to C static library file")
    parser.add_argument("--model", help="Path to nbg file")
    parser.add_argument("--device", help="webcam num")
    parser.add_argument("--picture", help="Path to input picture")
    parser.add_argument("--level", help="Information printer level: 0/1/2")
    args = parser.parse_args()

    yolov8 = KSNN("VIM3")
    yolov8.nn_init(library=args.library, model=args.model, level=0)

    if args.device:
        detect_camera(yolov8)
    else:
        detect_picture(yolov8, args.picture)
khadas@Khadas:~/ultralytics/ultralytics/yolo/v8/detect$ python3 yolov8-npu.py --library ~/ksnn_weight/0_0_0_0.0039/libnn_yolov8n.so --model ~/ksnn_weight/0_0_0_0.0039/yolov8n.nb --picture ~/yolov5/data/images/bus.jpg
Output
/usr/local/lib/python3.8/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
inferece time : 0.058927059173583984
1
705600
[[ 4.9911 7.4867 7.4867 ... 0 0 0]
[ 22.46 4.9911 44.92 ... 0 0 0]
[ 32.442 4.9911 62.389 ... 0 0 0]
...
[ 519.08 618.9 249.56 ... 0 0 0]
[ 549.03 598.94 209.63 ... 0 0 0]
[ 598.94 578.97 89.84 ... 0 0 0]]
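To make the check on the output tensor explicit, here is a minimal sketch of how the flat 705600-value output can be split into box coordinates and class scores. It assumes the usual YOLOv8 layout of 84 rows (4 box values followed by 80 COCO class scores) by 8400 candidates; split_predictions and summarize are hypothetical helpers, and the array below is synthetic stand-in data rather than real NPU output.

```python
import numpy as np

NUM_CLASSES = 80  # assumption: COCO-trained yolov8n

def split_predictions(raw, num_classes=NUM_CLASSES, num_anchors=8400):
    """Reshape a flat output into (num_anchors, 4) boxes and (num_anchors, num_classes) scores."""
    preds = np.asarray(raw, dtype=np.float32).reshape(4 + num_classes, num_anchors).T
    return preds[:, :4], preds[:, 4:]

def summarize(raw):
    """Report the largest box value, the largest class score and the non-zero score count."""
    boxes, scores = split_predictions(raw)
    return {
        "max_box": float(boxes.max()),
        "max_score": float(scores.max()),
        "nonzero_scores": int(np.count_nonzero(scores)),
    }

# Synthetic stand-in that mimics the symptom: boxes filled in, class scores all zero.
rng = np.random.default_rng(0)
fake = np.zeros((84, 8400), dtype=np.float32)
fake[:4] = rng.uniform(0, 640, size=(4, 8400))
print(summarize(fake.ravel()))
```

On the real output, a nonzero_scores count of 0 would confirm that every class score is exactly zero rather than just small.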
I’ve verified that all the other code in ksnn/example works fine.
I changed the mean-values option and ran several experiments, but the results are the same.
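One possible angle, offered only as a guess: the printed box values look like integer multiples of a common step of about 2.4956, which might mean the single output tensor is quantized with one shared scale. The sketch below checks this against the values copied from the output above, then shows what the same step would do to scores in the 0..1 range. The step value and the zero-point of 0 are assumptions, not something read from the converter.

```python
import numpy as np

# Values copied from the printed output above.
observed = np.array([4.9911, 7.4867, 22.46, 44.92, 32.442, 62.389,
                     519.08, 618.9, 249.56, 549.03, 598.94, 209.63,
                     578.97, 89.84])

# Hypothetical quantization step, guessed from the smallest printed value.
step = 4.9911 / 2

# How far each value is from the nearest integer multiple of the step.
multiples = observed / step
deviation = np.abs(multiples - np.round(multiples))
print("step:", step)
print("max deviation from an integer multiple:", deviation.max())

# Class scores in 0..1 quantized with the same step (zero-point assumed to be 0).
scores = np.array([0.1, 0.5, 0.9, 1.0])
quantized = np.round(scores / step) * step
print("quantized scores:", quantized)
```

If this guess is right, any score below about half the step would round to 0, which would match what I see regardless of the mean-values setting.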
Is the code that runs the converted weight files wrong?
Or is my conversion method wrong?
Additionally, I would appreciate a list of webcams that are compatible with Khadas VIM3 Pro.
Thank you.