PyTorch 1.2.0 model conversion fails, even for a .pth model trained in the official Docker environment

ideafold · November 8, 2021, 10:39am

Training environment: PyTorch 1.2.0

../bin/convertpytorch --pytorch-model  /model/model_last.pth --net-output dddnuscenes_v2.json --data-output dddnuscenes_v2.data

root@d68113bdadef:/acuity-toolkit/conversion_scripts# ../bin/convertpytorch --pytorch-model  /model/model_last.pth --net-output dddnuscenes_v2.json --data-output dddnuscenes_v2.data

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

/acuity-toolkit/bin/acuitylib/onnx_tf/common/__init__.py:87: UserWarning: FrontendHandler.get_outputs_names is deprecated. It will be removed in future release.. Use node.outputs instead.
  warnings.warn(message)
I Start importing pytorch...
/model/model_last.pth ********************
Traceback (most recent call last):
  File "convertpytorch.py", line 51, in <module>
  File "convertpytorch.py", line 41, in main
  File "acuitylib/vsi_nn.py", line 146, in load_pytorch
  File "acuitylib/app/importer/import_pytorch.py", line 92, in run
  File "acuitylib/converter/convert_pytorch.py", line 508, in __init__
  File "torch/jit/__init__.py", line 162, in load
RuntimeError: [enforce fail at inline_container.cc:137] . PytorchStreamReader failed reading zip archive: failed finding central directory
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7ff256ecfe17 in /acuity-toolkit/bin/acuitylib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::valid(char const*) + 0x6b (0x7ff259e5875b in /acuity-toolkit/bin/acuitylib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::init() + 0x9a (0x7ff259e5c20a in /acuity-toolkit/bin/acuitylib/libtorch.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x60 (0x7ff259e5f270 in /acuity-toolkit/bin/acuitylib/libtorch.so)
frame #4: torch::jit::import_ir_module(std::shared_ptr<torch::jit::script::CompilationUnit>, std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x38 (0x7ff25af3e088 in /acuity-toolkit/bin/acuitylib/libtorch.so)
frame #5: <unknown function> + 0x4d6abc (0x7ff2a123fabc in /acuity-toolkit/bin/acuitylib/libtorch_python.so)
frame #6: <unknown function> + 0x1d3f04 (0x7ff2a0f3cf04 in /acuity-toolkit/bin/acuitylib/libtorch_python.so)
<omitting python frames>
frame #29: ../bin/convertpytorch() [0x402ca1]
frame #30: ../bin/convertpytorch() [0x403087]
frame #31: __libc_start_main + 0xe7 (0x7ff31eecdb97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #32: ../bin/convertpytorch() [0x401a9e]

[2107] Failed to execute script convertpytorch
root@d68113bdadef:/acuity-toolkit/conversion_scripts#

Original PyTorch model
Link: Baidu Netdisk (enter the extraction code)
Extraction code: u7wg

Frank · November 8, 2021, 10:46am

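@ideafold Was your model trained with PyTorch 1.2? This zip archive error comes from a mismatch between a newer and an older PyTorch version.
PS: Please use markdown when pasting code so it is easier to read.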

ideafold · November 8, 2021, 11:03am

Yes, it is 1.2.

Output of pip3 list:
torch 1.2.0
torchvision 0.4.0

Frank · November 8, 2021, 11:16am

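@ideafold I have run into models like this before. In every case the user's training environment was newer than 1.2 while the conversion environment was 1.2, which caused this error.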
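
To see what the converter is actually being handed, a minimal sketch may help. The traceback shows `torch/jit/__init__.py` `load` failing inside `PytorchStreamReader` while reading a zip archive, so the converter expects the file to be a zip container. The helper below uses only Python's standard `zipfile` module to report whether a file is a zip archive at all. The name `describe_checkpoint` is hypothetical and not part of the Acuity toolkit, and the check does not reveal which PyTorch version wrote the file.

```python
import zipfile


def describe_checkpoint(path):
    """Report whether the file at `path` is packaged as a zip archive.

    Hypothetical helper for illustration only; it does not load the model
    and does not depend on torch.
    """
    if not zipfile.is_zipfile(path):
        # No zip central directory was found in the file.
        return "not a zip archive"
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    return "zip archive with {} entries".format(len(names))


if __name__ == "__main__":
    # Path taken from the command in the original post.
    print(describe_checkpoint("/model/model_last.pth"))
```

If this reports "not a zip archive", the file is in a container format that the zip reader in the traceback cannot open, whatever the reason, and it would need to be re-exported in a form the 1.2 conversion environment can read.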

luleibo · November 16, 2021, 10:37am

Does the SDK currently support only PyTorch 1.2 as the training environment?

Frank · November 17, 2021, 1:16am

@luleibo Yes, the SDK currently supports only up to 1.2. Support for newer versions may be added in the next release.