Khadas VIM3 NPU issues

@Louis-Cheng-Liu
@numbqq
@Electr1

import numpy as np
import os
import argparse
import sys
from ksnn.api import KSNN
from ksnn.types import *
import cv2 as cv

cam_index = 0

names = {
    0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck',
    8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench',
    14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear',
    22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase',
    29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat',
    35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle',
    40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana',
    47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza',
    54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table',
    61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone',
    68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock',
    75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'
}

def draw_bounding_box(frame, box, class_index, confidence):
    pass  # stub, not implemented yet

if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("--library", help="Path to C static library file")
    parser.add_argument("--model", help="Path to nbg file")
    parser.add_argument("--level", help="Information printer level: 0/1/2")

    args = parser.parse_args()
    if args.model:
        if not os.path.exists(args.model):
            sys.exit('Model \'{}\' does not exist'.format(args.model))
        model = args.model
    else:
        sys.exit("NBG file not found !!! Please use format: --model")
    if args.library:
        if not os.path.exists(args.library):
            sys.exit('C static library \'{}\' does not exist'.format(args.library))
        library = args.library
    else:
        sys.exit("C static library not found !!! Please use format: --library")
    if args.level == '1' or args.level == '2':
        level = int(args.level)
    else:
        level = 0

    resnet50 = KSNN('VIM3')
    print(' |--- KSNN Version: {} +---| '.format(resnet50.get_nn_version()))
    print('Start init neural network ...')
    resnet50.nn_init(library=library, model=model, level=level)
    print('Done.')

    cap = cv.VideoCapture(cam_index)
    assert cap.isOpened()
    color = (0, 255, 0)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv.resize(frame, (640, 480))
        outputs = resnet50.nn_inference(frame, platform='ONNX', reorder='2 1 0', output_format=output_format.OUT_FORMAT_FLOAT32)
        outputs = outputs[0]
        predictions = np.array(outputs).reshape(8400, 144)
        boxes = predictions[:, :64]
        confidences = predictions[:, 64:]
        confidences = 1 / (1 + np.exp(-confidences))
        predicted_classes = np.argmax(confidences, axis=1)
        predicted_class_names = [names[class_index] for class_index in predicted_classes]
        
        
        cv.imshow('Camera Stream', frame)
        if cv.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv.destroyAllWindows()


I'm unable to understand why there are 64 elements for the box location.
If the model is giving out confidences for 80 classes, shouldn't there be 80 elements for the boxes too?
Also, the model output is like
[[actual output list of 1209600 elements]]
which works out to 144×8400,
but please tell me why there are 64 elements for the boxes.

It would be great if you could help out with the postprocessing of the output.

Hello @Arjun_Gupta ,

Yes, 1209600 = 144×8400.

The 64 elements need to be decoded into 4 elements; this part is the decoding process.

First, you need to divide the 64 elements into 4 parts, each with 16 elements. Then, each part goes through a softmax and is multiplied by a constant matrix. The constant matrix is 0-15, shape (1, 16). You will then have 4 elements, but these elements are only box information, so third, add the location information. Name them x1, y1, x2, y2 in order. The formula is as follows:

box_left = (j + 0.5 - x1) * stride / input.shape[1]
box_top = (i + 0.5 - y1) * stride / input.shape[0]
box_right = (j + 0.5 + x2) * stride / input.shape[1]
box_bottom = (i + 0.5 + y2) * stride / input.shape[0]

Here i and j are the coordinates in the feature map and stride is the convolution reduction ratio. YOLO has three strides, 8, 16, and 32, so it has three different sizes of feature maps. For your model the input is 640×640, so the sizes are 640×640 divided by 8, 16, and 32: 80×80, 40×40, and 20×20. For example, suppose a cell in the 40×40 map (stride 16) has coordinates [9, 0] ([i, j]):

box_left = (0 + 0.5 - x1) * 16 / 640
box_top = (9 + 0.5 - y1) * 16 / 640
box_right = (0 + 0.5 + x2) * 16 / 640
box_bottom = (9 + 0.5 + y2) * 16 / 640

As for knowing which map and which location an element belongs to: 8400 is 80×80 + 40×40 + 20×20, so [:, :6400] is the 80×80 map, [:, 6400:8000] is the 40×40 map, and [:, 8000:] is the 20×20 map. Within each map the elements are arranged with j varying fastest: elements 0-79 are [0, 0] to [0, 79], elements 80-159 are [1, 0] to [1, 79], and so on.
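
Putting the steps above together, here is a minimal NumPy sketch of the decode for a single anchor. It assumes the output has already been reshaped to (144, 8400) with the 64 box logits in rows 0-63 and the 80 class logits after them, as described above; softmax and decode_one are illustrative helpers, not KSNN APIs:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_one(pred, i, j, stride, input_size=640):
    # pred is one column of the (144, 8400) output: 64 box logits, then 80 class logits.
    # Softmax each of the 4 groups of 16 bins, then take the expectation over 0..15.
    dist = softmax(pred[:64].reshape(4, 16)) @ np.arange(16)   # -> x1, y1, x2, y2
    x1, y1, x2, y2 = dist
    box_left   = (j + 0.5 - x1) * stride / input_size
    box_top    = (i + 0.5 - y1) * stride / input_size
    box_right  = (j + 0.5 + x2) * stride / input_size
    box_bottom = (i + 0.5 + y2) * stride / input_size
    scores = 1 / (1 + np.exp(-pred[64:]))                      # sigmoid class confidences
    return (box_left, box_top, box_right, box_bottom), scores

Calling decode_one(predictions[:, k], i, j, stride) for each anchor index k, with i, j, and stride taken from the map layout described above, yields corner coordinates normalized to [0, 1].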

A suggestion: your model input is 640×640 and KSNN will forcibly resize your input to this size. To avoid distortion, I suggest padding the input before providing it to KSNN.
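
A minimal sketch of such padding with OpenCV, assuming a constant border; the letterbox helper and the pad value 114 are illustrative choices, not KSNN requirements:

import cv2 as cv

def letterbox(frame, size=640, pad_value=114):
    # Scale the longer side to `size`, then pad right/bottom so the result
    # is size x size without changing the aspect ratio.
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    resized = cv.resize(frame, (int(w * scale), int(h * scale)))
    return cv.copyMakeBorder(resized, 0, size - resized.shape[0],
                             0, size - resized.shape[1],
                             cv.BORDER_CONSTANT, value=(pad_value,) * 3)

If you pad this way, remember to undo the same scale and padding offsets when mapping decoded boxes back to the original frame.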

A mistake: the shape you reshape the output to should be the same as the model's output shape, [1, 144, 8400], i.e. 144 rows first:

- predictions = np.array(outputs).reshape(8400, 144)
+ predictions = np.array(outputs).reshape(144, 8400)

Hello @Arjun_Gupta ,

Maybe I did not explain it clearly. You can try searching for ‘YOLOv8 postprocess’ to find other explanations.

The YOLOv8 KSNN demo will be released this week. If it is too complex for you, you can wait for the demo.


Hello @Arjun_Gupta ,

Sorry, my explanation had a mistake. I have corrected it:

- box_left = (j + 0.5 - x1) / stride
- box_top = (i + 0.5 - y1) / stride
- box_right = (j + 0.5 + x2) / stride
- box_bottom = (i + 0.5 + y2) / stride
+ box_left = (j + 0.5 - x1) * stride / input.shape[1]
+ box_top = (i + 0.5 - y1) * stride / input.shape[0]
+ box_right = (j + 0.5 + x2) * stride / input.shape[1]
+ box_bottom = (i + 0.5 + y2) * stride / input.shape[0]

@Louis-Cheng-Liu
This is the code I came up with.
For some reason the x, y, width, and height values come out on the order of 10^-5.
What is causing this?

import numpy as np
import os
import argparse
import sys
from ksnn.api import KSNN
from ksnn.types import *
import cv2 as cv

cam_index = 1

names = {
    0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck',
    8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench',
    14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear',
    22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase',
    29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat',
    35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle',
    40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana',
    47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza',
    54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table',
    61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone',
    68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock',
    75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'
}

def decode_boxes(predictions, input_shape, strides):
    num_classes = 80
    boxes_per_anchor = 4

    predictions = np.array(predictions).reshape(144, 8400)
    decoded_boxes = []

    for i in range(len(predictions)):
        class_confidences = predictions[i][:80]
        box_data = predictions[i][80:]

        class_probs = np.exp(class_confidences) / np.sum(np.exp(class_confidences), axis=-1)

        box_data = np.exp(box_data) / np.sum(np.exp(box_data), axis=-1)

        box_data = box_data.reshape(-1, boxes_per_anchor)

        x_data, y_data, width_data, height_data = box_data.T

        decoded_boxes.append(np.stack([x_data, y_data, width_data, height_data], axis=-1))

    decoded_boxes = np.array(decoded_boxes)

    boxes = []
    for i in range(decoded_boxes.shape[0]):
        for j in range(decoded_boxes.shape[1]):
            box = decoded_boxes[i, j, :]

            x, y, width, height = box

            box_left = (j + 0.5 - x) * strides / input_shape[1]
            box_top = (i + 0.5 - y) * strides / input_shape[0]
            box_right = (j + 0.5 + width) * strides / input_shape[1]
            box_bottom = (i + 0.5 + height) * strides / input_shape[0]

            boxes.append([box_left, box_top, box_right, box_bottom])

    return np.array(boxes)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("--library", help="Path to C static library file")
    parser.add_argument("--model", help="Path to nbg file")
    parser.add_argument("--level", help="Information printer level: 0/1/2")

    args = parser.parse_args()
    if args.model:
        if not os.path.exists(args.model):
            sys.exit('Model \'{}\' does not exist'.format(args.model))
        model = args.model
    else:
        sys.exit("NBG file not found !!! Please use format: --model")
    if args.library:
        if not os.path.exists(args.library):
            sys.exit('C static library \'{}\' does not exist'.format(args.library))
        library = args.library
    else:
        sys.exit("C static library not found !!! Please use format: --library")
    if args.level == '1' or args.level == '2':
        level = int(args.level)
    else:
        level = 0

    resnet50 = KSNN('VIM3')
    print(' |--- KSNN Version: {} +---| '.format(resnet50.get_nn_version()))
    print('Start init neural network ...')
    resnet50.nn_init(library=library, model=model, level=level)
    print('Done.')

    cap = cv.VideoCapture(cam_index)
    assert cap.isOpened()
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv.resize(frame, (640, 640))

        outputs = resnet50.nn_inference(frame, platform='ONNX', reorder='2 1 0', output_format=output_format.OUT_FORMAT_FLOAT32)
        
        #decode boxes
        input_shape = (640, 640)
        strides = 8
        decoded_boxes = decode_boxes(outputs[0], input_shape, strides)

        #bounding boxes
        for box in decoded_boxes:
            box_left, box_top, box_right, box_bottom = box
            color = (0, 255, 0)  # Green color for bounding boxes
            label = "{}".format(names[np.argmax(box)])
            cv.rectangle(frame, (int(box_left), int(box_top)), (int(box_right), int(box_bottom)), color, 2)
            cv.putText(frame, label, (int(box_left), int(box_top) - 5), cv.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        cv.imshow('Camera Stream', frame)
        if cv.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv.destroyAllWindows()

Hello @Arjun_Gupta ,


First, there are three strides: [8, 16, 32].


Second, you reshape the predictions into [144, 8400], so len(predictions) is 144 and the shape of predictions[i][80:] is (8320,), which is not the per-anchor data you intended.

Third, you need to do softmax on the four parts separately: divide the [64, :] block into four [16, :] parts and do softmax on each part.

Last, you missed this step:

> Then, each part does softmax and multiplies a constant matrix. The constant matrix is 0-15, shape (1, 16).

The constant matrix is [0, 1, 2, ..., 15], shape (1, 16).
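
For reference, a vectorized sketch of the whole decode under the same assumptions (box logits in rows 0-63 of the (144, 8400) output, class logits in rows 64-143); decode_all is an illustrative helper, not part of KSNN:

import numpy as np

def decode_all(output, input_size=640, strides=(8, 16, 32)):
    const = np.arange(16)                               # the constant matrix 0..15
    dfl = output[:64].reshape(4, 16, -1)                # four parts of 16 bins each
    dfl = np.exp(dfl - dfl.max(axis=1, keepdims=True))  # softmax over each 16-bin part
    dist = (dfl / dfl.sum(axis=1, keepdims=True) * const[None, :, None]).sum(axis=1)

    # Per-anchor grid coordinates: 8400 = 80*80 + 40*40 + 20*20, j varying fastest.
    gi, gj, gs = [], [], []
    for s in strides:
        n = input_size // s
        ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
        gi.append(ii.ravel()); gj.append(jj.ravel()); gs.append(np.full(n * n, s))
    gi, gj, gs = np.concatenate(gi), np.concatenate(gj), np.concatenate(gs)

    x1, y1, x2, y2 = dist                               # each of shape (8400,)
    left   = (gj + 0.5 - x1) * gs / input_size
    top    = (gi + 0.5 - y1) * gs / input_size
    right  = (gj + 0.5 + x2) * gs / input_size
    bottom = (gi + 0.5 + y2) * gs / input_size
    scores = 1 / (1 + np.exp(-output[64:]))             # (80, 8400) class confidences
    return np.stack([left, top, right, bottom], axis=-1), scores

After this, keep only anchors whose best class score passes a threshold and run NMS, as standard YOLOv8 postprocessing does.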

The YOLOv8 demo was released last Friday. You can refer to it:
khadas/ksnn: Khadas Software Neural Network (github.com)

Docs
YOLOv8n KSNN Demo - 2 [Khadas Docs]

This demo’s model outputs are [1, 144, 80, 80], [1, 144, 40, 40], and [1, 144, 20, 20]. You should separate and reshape your output to match them.


@Louis-Cheng-Liu
regarding the YOLOv8 demo, I'm receiving this error:

[22663] Failed to execute script pegasus
Traceback (most recent call last):
  File "pegasus.py", line 131, in <module>
  File "pegasus.py", line 112, in main
  File "acuitylib/app/importer/commands.py", line 245, in execute
  File "acuitylib/vsi_nn.py", line 171, in load_onnx
  File "acuitylib/app/importer/import_onnx.py", line 123, in run
  File "acuitylib/converter/onnx/convert_onnx.py", line 61, in __init__
  File "acuitylib/converter/onnx/convert_onnx.py", line 761, in _shape_inference
  File "acuitylib/onnx_ir/onnx_numpy_backend/shape_inference.py", line 65, in infer_shape
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_graph_engine.py", line 70, in smart_onnx_scanner
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_node.py", line 48, in calc_and_assign_smart_info
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 636, in multi_direction_broadcast_shape
ValueError: operands could not be broadcast together with shapes (1,0,160,160) (1,16,160,160)

Hello @Arjun_Gupta ,

Is this reported when you convert the model? Could you provide your new model?

Another thing: you do not have to modify the model. [1, 144, 8400] is okay, too. Divide and reshape it into [1, 144, 80, 80], [1, 144, 40, 40], and [1, 144, 20, 20]:

output_1 = output[:, :, :6400].reshape(1, 144, 80, 80)
output_2 = output[:, :, 6400:8000].reshape(1, 144, 40, 40)
output_3 = output[:, :, 8000:].reshape(1, 144, 20, 20)

@Louis-Cheng-Liu
This issue occurs when I try to convert a stock YOLOv8n model:

https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt

ONNX format -

Hello @Arjun_Gupta ,

Have you done this step?

Your ONNX model does not have that structure removed.

I used your .pt model and converted it successfully.

@Louis-Cheng-Liu
Yes, I edited the file as required,
and I tried both models, one with the problematic structure and one without; both give the same error.

Hello @Arjun_Gupta ,

Do you use the official code? Please provide your code.

@Louis-Cheng-Liu
No, I meant that both models give the error when I try to convert them via the ACUITY toolkit.
The error was the same whether I removed the problematic structure from the model or not.
The error is from trying to convert the yolov8n.onnx model.

Also, when I run the yolov8n example it is really laggy.
Could you please verify whether this issue is on my side or a hardware limitation?

Hello @Arjun_Gupta ,

I can reproduce your problem when I use your ONNX model for the conversion. But when I convert your .pt model to ONNX and then to nb, it succeeds. So I think the problem occurs in your conversion code, and I hope you can provide your training code.

Can you be more specific about ‘laggy’? Does inferring one picture take a long time, or does loading the model take a long time?

Hi @Louis-Cheng-Liu
I’m using the same code as in the guide to convert the model to ONNX:

from ultralytics import YOLO
model = YOLO("./runs/detect/train/weights/best.pt")
results = model.export(format="onnx")

The model loading time is OK, but the inference time is 60-70 ms, which is about 15 FPS.

Hello @Arjun_Gupta ,

What versions of the PyTorch and ONNX libraries are you using? My PyTorch version is 1.10.1 and ONNX is 1.14.0.

On my VIM3, the inference time is 50-70 ms.

Modify the opset version and try to convert again:

import onnx

model = onnx.load("./yolov8n.onnx")
model.opset_import[0].version = 12

onnx.save(model, "./yolov8n_1.onnx")
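
If the conversion still fails after this, it may be worth a quick sanity check that the modified model is still structurally valid; a small sketch using the standard onnx checker, with the file name following the save above:

import onnx
from onnx import checker

model = onnx.load("./yolov8n_1.onnx")
checker.check_model(model)               # raises if the graph is structurally invalid
print(model.opset_import[0].version)     # should now print 12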

@Louis-Cheng-Liu
On the VM in which I convert the models using KSNN:

torch==2.1.2
onnx==1.13.0

In the environment where I'm converting from torch to ONNX:

torch==2.1.2
onnx==1.15.0

By the way, what OS/distribution are you using?
I'm using Ubuntu Server with LXDE, since with GNOME the inferencing was even slower.
Could it be a power issue?

I also tried:

import onnx

model = onnx.load("./yolov8n.onnx")
model.opset_import[0].version = 12

onnx.save(model, "./yolov8n_1.onnx")

and I got this error:

Traceback (most recent call last):
  File "pegasus.py", line 131, in <module>
  File "pegasus.py", line 112, in main
  File "acuitylib/app/importer/commands.py", line 245, in execute
  File "acuitylib/vsi_nn.py", line 171, in load_onnx
  File "acuitylib/app/importer/import_onnx.py", line 123, in run
  File "acuitylib/converter/onnx/convert_onnx.py", line 61, in __init__
  File "acuitylib/converter/onnx/convert_onnx.py", line 761, in _shape_inference
  File "acuitylib/onnx_ir/onnx_numpy_backend/shape_inference.py", line 65, in infer_shape
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_graph_engine.py", line 70, in smart_onnx_scanner
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_node.py", line 48, in calc_and_assign_smart_info
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 1100, in reshape_shape
  File "<__array_function__ internals>", line 6, in reshape
  File "numpy/core/fromnumeric.py", line 301, in reshape
  File "numpy/core/fromnumeric.py", line 61, in _wrapfunc
ValueError: cannot reshape array of size 604800 into shape (1,4,16,8400)

Hello @Arjun_Gupta ,

Do you use this official codes?

Could you provide your complete code? 604800 is 72×8400, which should not appear in YOLOv8.

The kernel I use is 5.15:
https://dl.khadas.com/products/vim3/firmware/ubuntu/emmc/vim3-ubuntu-22.04-gnome-linux-5.15-fenix-1.6-231229-emmc.img.xz

Which one do you use? Is it much slower than mine?

Hi @Louis-Cheng-Liu

After a reinstall, everything is working fine. The inference time is around 40 ms for me, with a USB-PD 30 W supply and XFCE on Ubuntu Server.
Thanks a lot!

Hi @Louis-Cheng-Liu ,
I am stuck on a problem and need your help. I have a model which I want to run on an older fenix release, so in order to get the required Galcore driver and the aml-npu library needed by the model, I had to upgrade the kernel of this older fenix release, shown below.
I used:
linux-dtb-amlogic-4.9_1.0.9_arm64.deb
linux-image-amlogic-4.9_1.0.9_arm64.deb
linux-headers-amlogic-4.9_1.0.9_arm64.deb
for the kernel upgrade. After sync and reboot, my WiFi seems to not be working, and I don't know why.

cat /etc/fenix-release
# PLEASE DO NOT EDIT THIS FILE
BOARD=VIM3
VENDOR=Amlogic
VERSION=0.8.3
ARCH=arm64
INITRD_ARCH=arm64
INSTALL_TYPE=EMMC
IMAGE_RELEASE_VERSION=V0.8.3-20200302

swarmx@P100:/etc$ cat os-release
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

And these are some logs I am seeing after reboot with the upgraded kernel: