Help getting started with VIM3 NPU Usage

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Please provide the version of the system here:

20.04

Please describe your issue below:

I am new to the VIM3/Amlogic platform. I have an app running a trained densenet121-based model in ONNX format that I previously ran on a Jetson Nano board, and I am in the process of porting it to the Khadas VIM3, but I’m confused about where to start.

I assume I have to convert my ONNX model to KSNN format, but I’m unclear on where to get the mean values for the converter, and I’m very confused about where to get started. Is the DDK_6.4.8.7_SDK API doc in the aml_npu_sdk archive the right place to start?

Is there a way to run the ONNX model as-is, or does it absolutely have to be converted to KSNN format?

Basically, I’m just confused about what I need to do to port my model to the VIM3 NPU.

Hello @mjasner, welcome to the community

You can check the Model Transcoding and Running User Guide

Instead of /bin/pegasus, you need to use the convert tool, which is present inside SDK/acuity-toolkit/python.

The mean value is based on the model quantization: since it will be UINT8, a value of 1 on the +127/-128 scale is 1/128, or 0.0078125.
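For a quick sanity check, here is a minimal sketch (the (pixel - mean) * scale formula is an assumption about what the converter applies) of what a mean of 128 and a scale of 0.0078125 do to UINT8 pixel values:

import numpy as np

# Hypothetical check: converter-style normalization (pixel - mean) * scale
pixels = np.array([0, 128, 255], dtype=np.float32)
normalized = (pixels - 128.0) * 0.0078125  # 0.0078125 == 1/128
print(normalized)  # [-1.0, 0.0, 0.9921875] -- roughly the [-1, 1] range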

P.S. For the model sample input, you can use a .npy-format input sample, which you specify in dataset.txt.
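For example (a hypothetical sketch; the file name and shape are assumptions based on the model’s input), you could save a sample as .npy and list its path in dataset.txt:

import numpy as np

# Hypothetical sample input for quantization; the file path then goes
# on its own line in dataset.txt
sample = np.random.rand(1, 3, 256, 256).astype(np.float32)
np.save("sample_0.npy", sample)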

Hello @mjasner ,

We suggest you refer to the doc Instructions for KSNN conversion tool [Khadas Docs] to convert your model.

The mean and scale values are decided by the settings used when the model was trained. Before feeding input to the model, you must normalize the input data. You can get the mean and scale values from your inference code or from your training code.
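For instance (a hypothetical sketch, assuming the model was trained in PyTorch with torchvision-style normalization on [0, 1] inputs), the training-time mean/std would translate to converter values like this:

# Hypothetical: training used transforms.Normalize(mean=[0.5]*3, std=[0.5]*3)
# on inputs scaled to [0, 1]. The converter works on 0-255 pixels, so the
# equivalent per-channel mean and scale would be:
train_mean, train_std = 0.5, 0.5
mean_value = train_mean * 255      # 127.5, commonly rounded to 128
scale = 1.0 / (train_std * 255)    # ~0.00784, close to 0.0078125
print(mean_value, scale)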

If you want to run inference on the NPU, you must use the VIM3 convert tool to convert the model to .nb format. ONNX models can only be run on the CPU.

Here are all the docs for the VIM3 NPU. You can refer to them.
VIM3/3L NPU Notes [Khadas Docs]

If you have any questions, you can ask me for help.


Excellent! Thank you for the help and the quick replies, guys! I’m sure I’ll have a lot more questions, but this should be enough to get me started, and I really appreciate the help.

Thanks again!

Two further questions on this topic: what is the dataset parameter for, and what kind of images do I need to provide?

Also, if the model is pre-trained, how are the weights incorporated into the transcoding process?

Hello @mjasner ,

dataset is a txt file in which you write the paths of your quantization images. The quantization images should ideally come from your working scenario. Around 200-500 images is appropriate.

For a pre-trained model, you need to convert it into an inference model first, then convert the inference model into ONNX format (we suggest converting to ONNX). Finally, use the convert tool to convert it to an .nb model. The pre-trained weights are carried inside the ONNX file, so they are picked up automatically during conversion.
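For example (a hypothetical sketch, assuming the pre-trained model is a PyTorch checkpoint; the file names and opset are assumptions), exporting to ONNX could look like:

import torch
import torchvision

# Hypothetical: load a pre-trained network and switch to inference mode
model = torchvision.models.densenet121(weights="DEFAULT")
model.eval()

# Export with a fixed batch size of 1, since KSNN only supports batch size 1
dummy = torch.randn(1, 3, 256, 256)
torch.onnx.export(model, dummy, "model.onnx", opset_version=9,
                  input_names=["input"], output_names=["output"])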

So are the files being passed in through the dataset argument a training dataset, or just representative of what the input from the app will look like (in my case, frames captured from an attached camera)?

Hello @mjasner ,

You need to save the frames as images, and then write the paths of the images in a txt file, as in the sketch below.

[screenshots: example txt file and data folder]
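As an illustration (a hypothetical sketch; the camera index, file names, and frame count are assumptions), saving camera frames and building the txt file could look like:

import os
import cv2 as cv

# Hypothetical: grab a few hundred frames from the attached camera and
# record their paths, one per line, in the dataset file
os.makedirs("./dataset", exist_ok=True)
cap = cv.VideoCapture(0)
with open("mjdataset.txt", "w") as f:
    for i in range(200):
        ok, frame = cap.read()
        if not ok:
            break
        path = "./dataset/frame_{:04d}.jpg".format(i)
        cv.imwrite(path, frame)
        f.write(path + "\n")
cap.release()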

Ok, so there isn’t anything special about these images other than that they are sample inputs from the application. That makes sense. I am going to finish setting up the conversion environment and try it. I’m sure I’ll have more questions after that! I appreciate all of the help so far.

I gathered all (at least I think it’s all) of the things I will need to convert my ONNX file to .nb, but I’m getting an error I’m not sure about. I created a new 0_import_model.sh based on the examples in the Model Transcoding and Running User Guide. I used just the first command, so my shell script looks like this:

#!/bin/bash

NAME=pose_densenet121_body
ACUITY_PATH=../bin/

pegasus=${ACUITY_PATH}pegasus
if [ ! -e "$pegasus" ]; then
    pegasus=${ACUITY_PATH}pegasus.py
fi

$pegasus import onnx --model ./model/${NAME}.onnx \
 --output-data ${NAME}.data --output-model ${NAME}.json \
 --inputs "input" --input-size-list "1, 3, 256, 256"\
 --outputs "cmap paf"

When I try and run that (via the convert-in-docker.sh script) I get the following output:

$ sudo ./convert-in-docker.sh
docker run -it --name npu-vim3 --rm -v /home/marc/src/khadas/workspace/aml_npu_sdk:/home/khadas/npu -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro -v /home/root:/home/root numbqq/npu-vim3
2025-02-20 21:18:46.017828: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/khadas/npu/acuity-toolkit/bin/acuitylib
2025-02-20 21:18:46.017855: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
I Namespace(import='onnx', input_dtype_list=None, input_size_list='1, 3, 256, 256', inputs='input', model='./model/pose_densenet121_body.onnx', output_data='pose_densenet121_body.data', output_model='pose_densenet121_body.json', outputs='cmap paf', size_with_batch=None, which='import')
I Start importing onnx...
WARNING: ONNX Optimizer has been moved to https://github.com/onnx/optimizer.
All further enhancements and fixes to optimizers will be done in this new repo.
The optimizer code in onnx/onnx repo will be removed in 1.9 release.

W Call onnx.optimizer.optimize fail, skip optimize
I Current ONNX Model use ir_version 6 opset_version 9
I Call acuity onnx optimize 'eliminate_option_const' success
W Call acuity onnx optimize 'froze_const_branch' fail, skip this optimize
I Call acuity onnx optimize 'froze_if' success
I Call acuity onnx optimize 'merge_sequence_construct_concat_from_sequence' success
I Call acuity onnx optimize 'merge_lrn_lowlevel_implement' success
Traceback (most recent call last):
  File "pegasus.py", line 131, in <module>
  File "pegasus.py", line 112, in main
  File "acuitylib/app/importer/commands.py", line 245, in execute
  File "acuitylib/vsi_nn.py", line 171, in load_onnx
  File "acuitylib/app/importer/import_onnx.py", line 123, in run
  File "acuitylib/converter/onnx/convert_onnx.py", line 61, in __init__
  File "acuitylib/converter/onnx/convert_onnx.py", line 761, in _shape_inference
  File "acuitylib/onnx_ir/onnx_numpy_backend/shape_inference.py", line 65, in infer_shape
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_graph_engine.py", line 70, in smart_onnx_scanner
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_node.py", line 48, in calc_and_assign_smart_info
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 1317, in conv_shape
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 1287, in _conv_shape
IndexError: list index out of range
[9] Failed to execute script pegasus

The same error occurs when I try to use the KSNN python converter:

$ ./convert --model-name torch-jit-export --platform onnx --model /home/marc/src/khadas/workspace/aml_npu_sdk/acuity-toolkit/demo/model/pose_densenet121_body.onnx  --input-size-list '1,3,256,256' --inputs input --outputs "'cmap paf'" --mean-values  "128 128 128 0.0078125" --quantized-dtype dynamic_fixed_point --qtype int8 --source-files /home/marc/src/khadas/workspace/aml_npu_sdk/acuity-toolkit/demo/mjdataset.txt --kboard VIM3 --print-level 0


--+ KSNN Convert tools v1.4 +--


Start import model ...
2025-02-20 21:32:36.782140: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/marc/src/khadas/workspace/aml_npu_sdk/acuity-toolkit/bin/acuitylib:/tmp/_MEIFLEoqn
2025-02-20 21:32:36.782166: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
I Namespace(import='onnx', input_dtype_list=None, input_size_list='1,3,256,256', inputs='input', model='/home/marc/src/khadas/workspace/aml_npu_sdk/acuity-toolkit/demo/model/pose_densenet121_body.onnx', output_data='Model.data', output_model='Model.json', outputs='cmap paf', size_with_batch=None, which='import')
I Start importing onnx...
WARNING: ONNX Optimizer has been moved to https://github.com/onnx/optimizer.
All further enhancements and fixes to optimizers will be done in this new repo.
The optimizer code in onnx/onnx repo will be removed in 1.9 release.

W Call onnx.optimizer.optimize fail, skip optimize
I Current ONNX Model use ir_version 6 opset_version 9
I Call acuity onnx optimize 'eliminate_option_const' success
W Call acuity onnx optimize 'froze_const_branch' fail, skip this optimize
I Call acuity onnx optimize 'froze_if' success
I Call acuity onnx optimize 'merge_sequence_construct_concat_from_sequence' success
I Call acuity onnx optimize 'merge_lrn_lowlevel_implement' success
[20200] Failed to execute script pegasus
Traceback (most recent call last):
  File "pegasus.py", line 131, in <module>
  File "pegasus.py", line 112, in main
  File "acuitylib/app/importer/commands.py", line 245, in execute
  File "acuitylib/vsi_nn.py", line 171, in load_onnx
  File "acuitylib/app/importer/import_onnx.py", line 123, in run
  File "acuitylib/converter/onnx/convert_onnx.py", line 61, in __init__
  File "acuitylib/converter/onnx/convert_onnx.py", line 761, in _shape_inference
  File "acuitylib/onnx_ir/onnx_numpy_backend/shape_inference.py", line 65, in infer_shape
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_graph_engine.py", line 70, in smart_onnx_scanner
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_node.py", line 48, in calc_and_assign_smart_info
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 1317, in conv_shape
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 1287, in _conv_shape
IndexError: list index out of range

There is no further information given, so I’m unclear on what that error is and how to resolve it. Any advice would be greatly appreciated. Thanks in advance.

Hello @mjasner ,

Could you provide your model? We will try to reproduce the problem.

I can share the model if you want, but I think I resolved the issue. When I was going through the Model Transcoding and Running User Guide, I saw the following command to use for the first step of converting an ONNX model:

./bin/pegasus import onnx --model xxxx.onnx \
--output-data xxx.data --output-model xxx.json \
# --inputs "0 1 2" --input-size-list "288,288,3#288,288,3#288,288,3" \
# --outputs "327 328"

I assumed that those last two lines were optional. Since I knew the values for those fields (at least I thought I did), I uncommented them and put my values in. That was the cause of the problem. When I commented those lines back out, everything worked fine. I guess I was trying to be too smart/clever and ended up shooting myself in the foot.

I was now able to complete the conversion and have a .nb model to start testing with! Thanks for all of the help and patience.

So I was able to get the model to convert with the acuity-toolkit, but I haven’t been able to get it to convert with the python converter yet (which is the one I really need). It seems like the input-size-list parameter is the issue. The command I’m using is:

./convert --model-name mobilenet_ssd --platform onnx --model /home/marc/src/khadas/pose_densenet121_body.onnx --input-size-list "'1, 3, 256, 256'" --inputs input --outputs "'cmap paf'" --mean-values '128 128 128 0.0078125' --quantized-dtype asymmetric_affine --source-files ./mjdataset.txt --kboard VIM3 --print-level 0

The output is similar to before:

--+ KSNN Convert tools v1.4 +--


Start import model ...
2025-02-23 21:16:30.659543: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/marc/src/khadas/workspace/aml_npu_sdk/acuity-toolkit/bin/acuitylib:/tmp/_MEIvFSBJv
2025-02-23 21:16:30.659568: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
I Namespace(import='onnx', input_dtype_list=None, input_size_list='1, 3, 256, 256', inputs='input', model='/home/marc/src/khadas/pose_densenet121_body.onnx', output_data='Model.data', output_model='Model.json', outputs='cmap paf', size_with_batch=None, which='import')
I Start importing onnx...
WARNING: ONNX Optimizer has been moved to https://github.com/onnx/optimizer.
All further enhancements and fixes to optimizers will be done in this new repo.
The optimizer code in onnx/onnx repo will be removed in 1.9 release.

W Call onnx.optimizer.optimize fail, skip optimize
I Current ONNX Model use ir_version 6 opset_version 9
I Call acuity onnx optimize 'eliminate_option_const' success
W Call acuity onnx optimize 'froze_const_branch' fail, skip this optimize
I Call acuity onnx optimize 'froze_if' success
I Call acuity onnx optimize 'merge_sequence_construct_concat_from_sequence' success
I Call acuity onnx optimize 'merge_lrn_lowlevel_implement' success
[29214] Failed to execute script pegasus
Traceback (most recent call last):
  File "pegasus.py", line 131, in <module>
  File "pegasus.py", line 112, in main
  File "acuitylib/app/importer/commands.py", line 245, in execute
  File "acuitylib/vsi_nn.py", line 171, in load_onnx
  File "acuitylib/app/importer/import_onnx.py", line 123, in run
  File "acuitylib/converter/onnx/convert_onnx.py", line 61, in __init__
  File "acuitylib/converter/onnx/convert_onnx.py", line 761, in _shape_inference
  File "acuitylib/onnx_ir/onnx_numpy_backend/shape_inference.py", line 65, in infer_shape
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_graph_engine.py", line 70, in smart_onnx_scanner
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_node.py", line 48, in calc_and_assign_smart_info
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 1317, in conv_shape
  File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 1287, in _conv_shape
IndexError: list index out of range

The values I got for that list came from the onnx Python package, via this script:

import onnx

model = onnx.load(r"pose_densenet121_body.onnx")

# The model is represented as a protobuf structure and it can be accessed
# using the standard python-for-protobuf methods

# iterate through inputs of the graph
for input in model.graph.input:
    print (input.name, end=": ")
    # get type of input tensor
    tensor_type = input.type.tensor_type
    # check if it has a shape:
    if (tensor_type.HasField("shape")):
        # iterate through dimensions of the shape:
        for d in tensor_type.shape.dim:
            # the dimension may have a definite (integer) value or a symbolic identifier or neither:
            if (d.HasField("dim_value")):
                print (d.dim_value, end=", ")  # known dimension
            elif (d.HasField("dim_param")):
                print (d.dim_param, end=", ")  # unknown dimension with symbolic name
            else:
                print ("?", end=", ")  # unknown dimension with no name
    else:
        print ("unknown rank", end="")
    print()

which gives the output:

input: 1, 3, 256, 256,

For reference my model can be downloaded from https://www.dropbox.com/scl/fi/9ef9dc033w7aloohc8zbo/densenet121_body.tgz?e=2&noscript=1&rlkey=2czvhvwey517pi77mcwh2k83b&st=wysvnf4e&dl=0

Hello @mjasner ,

For a single-input ONNX model, you do not need to add the inputs and outputs parameters. The convert tool will detect the model’s inputs and outputs automatically.

Here is my convert command, which converts the model successfully.

./convert --model-name pose_densenet121_body \
          --platform onnx \
          --model pose_densenet121_body.onnx \
          --mean-values '128 128 128 0.0078125' \
          --quantized-dtype asymmetric_affine \
          --source-files dataset.txt \
          --batch-size 1 \
          --iterations 1 \
          --kboard VIM3 \
          --print-level 0

Ok, that worked. I’m curious as to why, when I convert the model with the acuity-toolkit demo scripts, the resulting model is around 15 MB, but it’s 22 MB when I use the convert tool. Just curious what the differences are.

Thanks again for the help.

Hello @mjasner ,

They are the same if the parameters are the same. The Python convert tool contains all the steps from the three scripts.

I tried converting the model with the scripts, and both results are 21.7 MB. You can check whether the parameters are the same.

Thanks, that makes sense. I used the ONNX KSNN example project to write a little script that reads in the images from my dataset and tries to pass them to the inference engine. It’s a simple script:

#!/usr/bin/python3
import numpy as np
import os
import argparse
import sys
from ksnn.api import KSNN
from ksnn.types import *
import cv2 as cv
import time

model = KSNN('VIM3')
print(' |--- KSNN Version: {} +---| '.format(model.get_nn_version()))

model.nn_init(library='model/libnn_pose_densenet121_body.so', model='model/pose_densenet121_body.nb', level=1)

with open("mjdataset.txt", "r") as file:
  for line in file:
    # Process each line here
    print(line.strip())  # Print the line after removing leading/trailing whitespace
    orig_img = cv.imread(line.strip(), cv.IMREAD_COLOR)

    start = time.time()
    outputs = model.nn_inference(orig_img, platform = 'ONNX', reorder='2 1 0', output_format=output_format.OUT_FORMAT_FLOAT32)
    end = time.time()
    print('Done. inference time: ', end - start)

However when I run this I get:


$ ./test.py
 |--- KSNN Version: v1.4 +---|
Create Neural Network: 21ms or 21908us
./dataset/dataset_0124.jpg
Segmentation fault

Am I correct that the issue is that I’m passing 1920x1080 images to the neural network, but I need to adjust them to the input size of 1, 3, 256, 256? I’m unclear on what the 1 is. Do I only need to worry about the 3, 256, 256?

Hello @mjasner ,

The 1 is the batch size, meaning one picture is processed per inference call. Currently, KSNN only supports models with a batch size of 1.

The error is about the input type. You can refer to the preprocessing code in the other demos. And remember to do normalization.

And I assume I need to change the 640 to 256 based on the parameters of the model, correct?

Hello @mjasner ,

Yes, resize the original image to your model’s input size, and then do normalization. Transpose the shape from (256, 256, 3) to (3, 256, 256). Finally, put it in a list and pass it to nn_inference.
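Putting those steps together, a minimal sketch of the preprocessing (assuming the same mean/scale used at conversion time; the exact nn_inference arguments may vary by KSNN version, and model is the KSNN instance from the script above) might be:

import cv2 as cv
import numpy as np

# Hypothetical preprocessing for a 1x3x256x256 model, mirroring the
# mean/scale (128, 0.0078125) used when the model was converted
orig_img = cv.imread("./dataset/dataset_0124.jpg", cv.IMREAD_COLOR)
img = cv.resize(orig_img, (256, 256)).astype(np.float32)
img = (img - 128.0) * 0.0078125    # normalize to roughly [-1, 1]
img = img.transpose(2, 0, 1)       # (256, 256, 3) -> (3, 256, 256)

# Pass the input as a list, as described above
outputs = model.nn_inference([img], platform='ONNX', reorder='2 1 0',
                             output_format=output_format.OUT_FORMAT_FLOAT32)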