YOLOv11 (and Ultralytics?) on Khadas NPU

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Please provide the version of the system here:

20.04

Please describe your issue below:

I’m porting some code from a Jetson device to a VIM3 and evaluating the performance of several models used on the Jetson platform. One is a DenseNet121-based model that I’ve gotten a lot of help with on here.

Another model I’m looking at is a YOLOv11 model that is run through Ultralytics. I’ve seen some references to being able to run YOLOv11 on the Khadas NPU. Does that mean it can be run without converting it to .nb format, or do I still need to convert it? As I understand it, Ultralytics more or less takes care of all of the pre- and post-processing for the model. Can I still use that on the Khadas NPU, or do I need to manually port that code?

Just looking for some help getting started with this model.

Thanks

Hello @mjasner ,

If you want to run inference on the NPU, you must convert the model to .nb format. You can use the KSNN Python demo: install Ultralytics and use it for the pre- and post-processing, and use KSNN for the inference part.

We do not have a YOLOv11 demo yet, but we have a YOLOv8 demo. If you want to use it, you can refer to this doc:
YOLOv8n KSNN Demo - 2 [Khadas Docs]

You can also refer to that doc to convert YOLOv11 and run it. If you have any questions, ask me for help.
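A minimal sketch of that split, assuming the KSNN Python API used in the YOLOv8 demo (the library name, model path, image file, and inference arguments here are placeholders and may differ by KSNN version; check the demo source for the exact call):

```python
import cv2
from ksnn.api import KSNN
from ksnn.types import output_format

# Hypothetical paths: use your own compiled library and converted .nb model
net = KSNN('VIM3')
net.nn_init(library='./libs/libnn_yolo11n_pose.so',
            model='./models/VIM3/yolo11n-pose.nb', level=0)

img = cv2.imread('test.jpg')
# reorder='2 1 0' converts OpenCV's BGR to RGB, as in the YOLOv8 demo;
# adjust output_tensor to the number of outputs your converted model has
outputs = net.nn_inference([img], platform='ONNX', reorder='2 1 0',
                           output_tensor=3,
                           output_format=output_format.OUT_FORMAT_FLOAT32)
# decode `outputs` with Ultralytics-style post-processing on the CPU
```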

So I need to convert the YOLO model to ONNX and then convert the ONNX model to .nb?

Hello @mjasner ,

Yes, only an .nb model can be inferred by the NPU.
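For reference, the first step is the standard Ultralytics export call (a minimal sketch; substitute your own weights file):

```python
from ultralytics import YOLO

# Load the trained weights and export to ONNX (the input to the .nb converter)
model = YOLO("yolo11n-pose.pt")
model.export(format="onnx")  # writes yolo11n-pose.onnx alongside the weights
```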

On the desktop that I use for model conversion, I followed the instructions in the link. I updated Ultralytics (I changed head.py per the instructions, though the version of Ultralytics that pip installed on my machine had different line numbers, probably because it’s a different version), and when I tried to export the YOLOv11 model in ONNX format I got:

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: export failure ❌ 0.2s: expected Tensor as element 0 in argument 0, but got tuple
Traceback (most recent call last):
  File "onnxexport.py", line 3, in <module>
    results = model.export(format="onnx")
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/engine/model.py", line 728, in export
    return Exporter(overrides=args, _callbacks=self.callbacks)(model=self.model)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/engine/exporter.py", line 433, in __call__
    f[2], _ = self.export_onnx()
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/engine/exporter.py", line 181, in outer_func
    raise e
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/engine/exporter.py", line 176, in outer_func
    f, model = inner_func(*args, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/engine/exporter.py", line 559, in export_onnx
    torch.onnx.export(
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/onnx/utils.py", line 551, in export
    _export(
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/onnx/utils.py", line 1648, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/onnx/utils.py", line 1170, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/onnx/utils.py", line 1046, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/onnx/utils.py", line 950, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/jit/_trace.py", line 1497, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/jit/_trace.py", line 141, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/jit/_trace.py", line 132, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1543, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/nn/tasks.py", line 114, in forward
    return self.predict(x, *args, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/nn/tasks.py", line 132, in predict
    return self._predict_once(x, profile, visualize, embed)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/nn/tasks.py", line 153, in _predict_once
    x = m(x)  # run
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1543, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/marc/src/yolo11n-pose/yoloconv/lib/python3.8/site-packages/ultralytics/nn/modules/head.py", line 261, in forward
    return torch.cat([x, pred_kpt], 1) if self.export else (torch.cat([x[0], pred_kpt], 1), (x[1], kpt))
TypeError: expected Tensor as element 0 in argument 0, but got tuple

I’ve done some googling but haven’t found any answers. Do you know what this might mean?

Thanks

At the start of the convert script it shows this output with the versions of Ultralytics and PyTorch:

Ultralytics 8.3.97 🚀 Python-3.8.0 torch-2.4.1+cu121 CPU (Intel Core(TM) i5-4440 3.10GHz)
YOLO11n-pose summary (fused): 109 layers, 2,866,468 parameters, 0 gradients, 7.4 GFLOPs

Hello @mjasner

In this code, x is a tuple, but pred_kpt is a torch.Tensor, so they cannot be concatenated.

The following code is my modification of the Pose class in head.py. The early return outputs the raw detection and keypoint feature maps and skips the decode step, and with it the model converts to ONNX successfully.

@@ -253,12 +264,16 @@ class Pose(Detect):
     def forward(self, x):
         """Perform forward pass through YOLO model and return predictions."""
         bs = x[0].shape[0]  # batch size
-        kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
+        # kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
+        kpt = [self.cv4[i](x[i]) for i in range(self.nl)]
+        for i in range(self.nl):
+            kpt[i] = kpt[i].permute(0, 2, 3, 1)
         x = Detect.forward(self, x)
+        return x, kpt
         if self.training:
             return x, kpt
         pred_kpt = self.kpts_decode(bs, kpt)
         return torch.cat([x, pred_kpt], 1) if self.export else (torch.cat([x[0], pred_kpt], 1), (x[1], kpt))

I added that change to head.py along with the other change from the instructions, and it was able to convert the model. When I check the model with Netron I get a large model graph, but I don’t see anything that looks like the screenshot on the conversion page.

When I run a script to list the inputs and outputs I get:

Inputs:  ['images']
Outputs:  ['output0', '554', '575', '510', '511', '512']
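
The script is essentially the following (a minimal sketch using the onnx package):

```python
import onnx

# Load the exported model and print the graph's input/output tensor names
model = onnx.load("yolo11n-pose.onnx")
print("Inputs: ", [i.name for i in model.graph.input])
print("Outputs: ", [o.name for o in model.graph.output])
```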

I used the following command to convert it to .nb format:
./convert --model-name yolo11n-pose --platform onnx --model yolo11n-pose.onnx --mean-values '0 0 0 0.00392156' --quantized-dtype asymmetric_affine --source-files mjdataset.txt --batch-size 1 --iterations 671 --kboard VIM3 --print-level 0

That does not succeed. I get the following output:

> $ ./convert --model-name yolo11n-pose --platform onnx --model yolo11n-pose.onnx --mean-values '0 0 0 0.00392156' --quantized-dtype asymmetric_affine --source-files mjdataset.txt --batch-size 1 --iterations 671 --kboard VIM3 --print-level 0
> 
> 
> --+ KSNN Convert tools v1.4 +--
> 
> 
> Start import model ...
> 2025-04-01 21:05:15.031811: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/marc/src/khadas/workspace/aml_npu_sdk/acuity-toolkit/bin/acuitylib:/tmp/_MEItWluDu
> 2025-04-01 21:05:15.031933: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
> I Namespace(import='onnx', input_dtype_list=None, input_size_list=None, inputs=None, model='yolo11n-pose.onnx', output_data='Model.data', output_model='Model.json', outputs=None, size_with_batch=None, which='import')
> I Start importing onnx...
> WARNING: ONNX Optimizer has been moved to https://github.com/onnx/optimizer.
> All further enhancements and fixes to optimizers will be done in this new repo.
> The optimizer code in onnx/onnx repo will be removed in 1.9 release.
> 
> W Call onnx.optimizer.optimize fail, skip optimize
> I Current ONNX Model use ir_version 9 opset_version 19
> I Call acuity onnx optimize 'eliminate_option_const' success
> /home/marc/src/khadas/workspace/aml_npu_sdk/acuity-toolkit/bin/acuitylib/acuitylib/onnx_ir/onnx_numpy_backend/ops/split.py:15: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
>   if inputs[1] == '':
> W Call acuity onnx optimize 'froze_const_branch' fail, skip this optimize
> I Call acuity onnx optimize 'froze_if' success
> I Call acuity onnx optimize 'merge_sequence_construct_concat_from_sequence' success
> I Call acuity onnx optimize 'merge_lrn_lowlevel_implement' success
> [663] Failed to execute script pegasus
> Traceback (most recent call last):
>   File "pegasus.py", line 131, in <module>
>   File "pegasus.py", line 112, in main
>   File "acuitylib/app/importer/commands.py", line 245, in execute
>   File "acuitylib/vsi_nn.py", line 171, in load_onnx
>   File "acuitylib/app/importer/import_onnx.py", line 123, in run
>   File "acuitylib/converter/onnx/convert_onnx.py", line 61, in __init__
>   File "acuitylib/converter/onnx/convert_onnx.py", line 761, in _shape_inference
>   File "acuitylib/onnx_ir/onnx_numpy_backend/shape_inference.py", line 65, in infer_shape
>   File "acuitylib/onnx_ir/onnx_numpy_backend/smart_graph_engine.py", line 70, in smart_onnx_scanner
>   File "acuitylib/onnx_ir/onnx_numpy_backend/smart_node.py", line 48, in calc_and_assign_smart_info
>   File "acuitylib/onnx_ir/onnx_numpy_backend/smart_toolkit.py", line 636, in multi_direction_broadcast_shape
> ValueError: operands could not be broadcast together with shapes (1,0,160,160) (1,16,160,160)

The model (in both the original PyTorch and the converted ONNX format) is available at https://www.dropbox.com/scl/fo/osyo3l2p1topgpie7sz5n/AKYXZ3Hw4PK2x3YvX-gBO7U?rlkey=riezhmuma5083j2krpv8l6h4a&st=4461istv&dl=0

Any thoughts on this?

Hello @mjasner ,

Sorry, this problem may be occurring in the convert tool. Our engineers are still looking for the cause.

Actually, this problem has been encountered before:
Unable to convert yolov8n to .nb - VIM3 - Khadas Community

The solution is to use the same Ultralytics and PyTorch versions as our demo.


But that version does not have YOLOv11.

I will inform you at once when we solve the problem.

So at the moment there is no way to get this model converted? Would converting it to something other than ONNX and then to .nb work any better?

Hello @mjasner ,

Bad news: our convert tool does not support the new YOLO structure, which is what causes this error. However, we do not have plans to update the tool and kernel in the near term. Please use YOLOv8 for now. If you still need to use YOLOv11, you will need to wait for the update.