Errors in 0_import_model.sh when importing a PyTorch JDE model

Hello!

I’m trying to convert the Joint Detection and Embedding (JDE) model for use in the Khadas Android NPU app for the VIM3.

I tested demo.py with CUDA and it works. I also tested demo.py on the CPU (after removing all the .cuda() calls) and it works too.

I converted the JDE-576x320 PyTorch model to a TorchScript (JIT) model with the cvt2jit.py script found in the JDE C++ implementation.

With the CUDA version, when I try to convert the resulting file (jde_576x320_torch14.pt) with 0_import_model.sh, I get this error:

$ sh 0_import_model_pt.sh
I Start importing pytorch...
WARNING: Token 'NEWLINE' defined, but not used
WARNING: There is 1 unused token
Traceback (most recent call last):
  File "convertpytorch.py", line 96, in <module>
  File "convertpytorch.py", line 86, in main
  File "acuitylib/vsi_nn.py", line 219, in load_pytorch_by_onnx_backend
  File "acuitylib/onnx_ir/frontend/pytorch_frontend/pytorch_frontend.py", line 62, in model_import
  File "acuitylib/onnx_ir/frontend/pytorch_frontend/pytorch_frontend.py", line 134, in _model_parser
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[37] Failed to execute script convertpytorch
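
For the CPU version below, I re-traced the model entirely on the CPU, so that no CUDA tensors get baked into the traced graph. It was roughly like this (a minimal sketch, not the exact cvt2jit.py code; the Darknet import, config path, and checkpoint layout are assumptions based on the JDE repository):

import torch
from models import Darknet  # JDE's model definition (assumed import path)

# Build the model on the CPU instead of calling .cuda()
model = Darknet("cfg/yolov3_576x320.cfg")  # config name is an assumption
checkpoint = torch.load("jde_576x320_uncertainty.pt", map_location="cpu")
model.load_state_dict(checkpoint["model"])  # checkpoint key is an assumption
model.eval()

# Trace with a dummy CPU input; NCHW = (1, 3, 320, 576) for the 576x320 model
dummy = torch.randn(1, 3, 320, 576)
traced = torch.jit.trace(model, dummy)
traced.save("jde_576x320_torch14.pt")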

With the CPU version, when I try to convert the resulting file (jde_576x320_torch14.pt) with 0_import_model.sh, I get this error:

$ sh 0_import_model_pt.sh
I Start importing pytorch...
WARNING: Token 'NEWLINE' defined, but not used
WARNING: There is 1 unused token
I Save onnx model: tmp_onnx_model_file.onnx
E Unsupport tensor 2638 and node aten::select_2638 and
schema aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a)) ;
I ----------------Warning(0)----------------
Traceback (most recent call last):
  File "convertpytorch.py", line 96, in <module>
  File "convertpytorch.py", line 86, in main
  File "acuitylib/vsi_nn.py", line 225, in load_pytorch_by_onnx_backend
  File "acuitylib/onnx_ir/frontend/pytorch_frontend/pytorch_frontend.py", line 241, in model_export
  File "acuitylib/onnx_ir/frontend/pytorch_frontend/pytorch_lower_to_onnx.py", line 110, in lower_to_onnx
  File "acuitylib/onnx_ir/frontend/pytorch_frontend/pytorch_lower_to_onnx.py", line 105, in lower_match
  File "acuitylib/acuitylog.py", line 251, in e
ValueError: Unsupport tensor 2638 and node aten::select_2638 and
schema aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a)) ;
[16807] Failed to execute script convertpytorch

I know CUDA isn’t supported by the NPU, but I was wondering whether this is expected behavior.

Should I avoid using models with CUDA code in them?
Should I try to convert one of the JDE models into a format other than PyTorch?

I don’t know how to solve this problem, and I hope you can help me.

Thank you very much!

@Alexy You should check the docs. If the model has an unsupported layer, it can’t be converted.

Hi @Frank! Thank you for your answer, and sorry for not replying earlier.

I do think some layers are unsupported. I loaded the traced JDE model into Netron to inspect the layers.
I think the problematic operations are torch.reciprocal, torch.contiguous and torch.repeat, but I’m not sure, since the docs don’t show a PyTorch-to-Acuity operator mapping.
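
If those really are the blockers, one workaround might be to rewrite them in the model’s forward pass using more common primitives before tracing. A rough sketch of the kind of substitutions I mean (illustrative only, not JDE’s actual code):

import torch

def decode_sketch(x: torch.Tensor) -> torch.Tensor:
    # torch.reciprocal(x) can be written as plain division:
    inv = 1.0 / x
    # .repeat() can sometimes become .expand() on a broadcastable view,
    # which avoids materializing a tiled copy:
    tiled = inv.unsqueeze(0).expand(4, *inv.shape)
    # .contiguous().view(...) pairs can usually be replaced by .reshape(...),
    # which only copies when it has to:
    return tiled.reshape(4, -1)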

I converted the PyTorch model to ONNX, but the 0_import_model.sh script still fails.
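
For reference, the export itself was roughly this (a sketch; the model construction matches the tracing sketch above, and the shape and file names are just what I used):

import torch
from models import Darknet  # assumed import path, as above

model = Darknet("cfg/yolov3_576x320.cfg")
model.load_state_dict(
    torch.load("jde_576x320_uncertainty.pt", map_location="cpu")["model"]
)
model.eval()

dummy = torch.randn(1, 3, 320, 576)
torch.onnx.export(
    model,
    dummy,
    "jde_576x320.onnx",
    opset_version=10,  # I also tried 11, see below
    input_names=["input"],
    output_names=["output"],
)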

When I convert the PyTorch model to ONNX with opset_version = 10, I get this error:

$ sh 0_import_model_onnx.sh
I Start importing onnx...
I Current ONNX Model use ir_version 4 opset_version 10
W Call acuity onnx optimize fail, skip optimize

I Try match ConstantOfShape_719:out0
W Not match tensor ConstantOfShape_719:out0
Traceback (most recent call last):
  File "convertonnx.py", line 56, in <module>
  File "convertonnx.py", line 45, in main
  File "acuitylib/vsi_nn.py", line 139, in load_onnx
  File "acuitylib/app/importer/import_onnx.py", line 124, in run
  File "acuitylib/converter/convert_onnx.py", line 953, in match_paragraph_and_param
  File "acuitylib/converter/convert_onnx.py", line 856, in _onnx_push_ready_tensor
TypeError: 'NoneType' object is not iterable
[16790] Failed to execute script convertonnx

It seems to be an unsupported layer or operation (ConstantOfShape) again.
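
One thing that sometimes helps with ConstantOfShape is constant-folding the ONNX graph before handing it to the converter, e.g. with onnx-simplifier; with a fixed input shape, shape-computation subgraphs often fold away into plain initializers. I’m not sure it applies here, but the attempt is cheap:

import onnx
from onnxsim import simplify  # pip install onnx-simplifier

model = onnx.load("jde_576x320.onnx")
# simplify() constant-folds subgraphs (often including ConstantOfShape
# and the Gather/Concat chains feeding it) given static input shapes.
model_simp, check = simplify(model)
assert check, "simplified model failed the equivalence check"
onnx.save(model_simp, "jde_576x320_sim.onnx")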

When I convert the PyTorch model to ONNX with opset_version = 11, I get this error:

D Calc tensor Concat_1496 (1, 4, 20, 36, 4, 5)
E Calc node ScatterND : ScatterND_909 output shape fail
W ----------------Warning(5)----------------
Traceback (most recent call last):
  File "acuitylib/onnx_ir/onnx_numpy_backend/shape_inference.py", line 52, in infer_shape
  File "/home/alexy/acuity/aml_npu_sdk/acuity-toolkit/bin/acuitylib/acuitylib/onnx_ir/onnx_numpy_backend/ops/scatter_nd.py", line 28, in ScatterND
    return scatter_nd_impl(data, indices, updates)
  File "/home/alexy/acuity/aml_npu_sdk/acuity-toolkit/bin/acuitylib/acuitylib/onnx_ir/onnx_numpy_backend/ops/scatter_nd.py", line 21, in scatter_nd_impl
    output[indices[i]] = updates[i]
IndexError: arrays used as indices must be of integer (or boolean) type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "convertonnx.py", line 56, in <module>
  File "convertonnx.py", line 45, in main
  File "acuitylib/vsi_nn.py", line 139, in load_onnx
  File "acuitylib/app/importer/import_onnx.py", line 120, in run
  File "acuitylib/converter/convert_onnx.py", line 69, in __init__
  File "acuitylib/converter/convert_onnx.py", line 413, in _shape_inference
  File "acuitylib/onnx_ir/onnx_numpy_backend/shape_inference.py", line 54, in infer_shape
  File "acuitylib/acuitylog.py", line 251, in e
ValueError: Calc node ScatterND : ScatterND_909 output shape fail
[9480] Failed to execute script convertonnx

I found a similar problem on this forum, but I’m not sure whether it was solved, or whether I can actually modify ScatterND.
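
If modifying it is acceptable, the IndexError at least looks mechanical: the backend’s ScatterND reference indexes a numpy array with a non-integer indices array. A possible local patch to scatter_nd_impl in acuitylib (a guess based on the traceback, modeled on the ONNX reference implementation) would be to cast the indices before the loop:

import numpy as np

def scatter_nd_impl(data, indices, updates):
    # Cast to an integer dtype first; the IndexError says the indices
    # arrive here as a float array.
    indices = indices.astype(np.int64)
    output = np.copy(data)
    for i in np.ndindex(indices.shape[:-1]):
        output[tuple(indices[i])] = updates[i]
    return output

Even if that unblocks shape inference in the converter, whether the NPU runtime itself supports ScatterND is a separate question.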

So now I’m not sure I can convert this model without modifying it (and I’m not sure I can modify it).
I would appreciate any ideas you have on how to solve this problem.

I would also like to know whether your team is planning to add support for more recent PyTorch versions, layers, and operations. It would be very important for us to know.

Thanks!

@Alexy This support is not done on our side; it is done by the tool provider. We can only wait for the tool provider to upgrade the PyTorch version.