Problems caused by batch size and Reshape when quantizing an ONNX model?

@Frank
Hello, I have an ONNX model. After importing it with 0_import_model.sh, the quantization step 1_quantize_model.sh fails with the following problem:

1_quantize_model.sh:

#!/bin/bash
# Usage: ./1_quantize_model.sh <model-name>
NAME=$1

ACUITY_PATH=../../../linux_sdk_6.4.3/acuity-toolkit/bin/

tensorzone=${ACUITY_PATH}tensorzonex

# --quantized-dtype options: dynamic_fixed_point-i8 or asymmetric_affine-u8
"$tensorzone" \
    --action quantization \
    --source text \
    --source-file ../imagedata/dataset.txt \
    --channel-mean-value '103.939 116.779 123.68 127.0' \
    --model-input "${NAME}.json" \
    --model-data "${NAME}.data" \
    --model-quantize "${NAME}.quantize" \
    --quantized-dtype dynamic_fixed_point-i8 \
    --quantized-rebuild
Error log:

[TRAINER]Quantization start…
I Init validate tensor provider.
I Enqueue samples 100
I Init provider with 100 samples.
D set up a quantize net
D Process input.1_31 …
D Acuity output shape(input): (100 32 160 1)
W:tensorflow:From tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
D Real output shape: (100, 32, 160, 1)
D Process Conv_55_30 …
D Acuity output shape(convolution): (100 32 160 64)
D Real output shape: (100, 32, 160, 64)
D Process Relu_57_29 …
D Acuity output shape(relu): (100 32 160 64)
D Real output shape: (100, 32, 160, 64)
D Process MaxPool_58_28 …
D Acuity output shape(pooling): (100 16 80 64)
D Real output shape: (100, 16, 80, 64)
D Process Conv_59_27 …
D Acuity output shape(convolution): (100 16 80 128)
D Real output shape: (100, 16, 80, 128)
D Process Relu_61_26 …
D Acuity output shape(relu): (100 16 80 128)
D Real output shape: (100, 16, 80, 128)
D Process MaxPool_62_25 …
D Acuity output shape(pooling): (100 8 40 128)
D Real output shape: (100, 8, 40, 128)
D Process Conv_63_24 …
D Acuity output shape(convolution): (100 8 40 256)
D Real output shape: (100, 8, 40, 256)
D Process Relu_65_23 …
D Acuity output shape(relu): (100 8 40 256)
D Real output shape: (100, 8, 40, 256)
D Process Conv_66_22 …
D Acuity output shape(convolution): (100 8 40 256)
D Real output shape: (100, 8, 40, 256)
D Process Relu_68_21 …
D Acuity output shape(relu): (100 8 40 256)
D Real output shape: (100, 8, 40, 256)
D Process MaxPool_69_20 …
D Acuity output shape(pooling): (100 4 40 256)
D Real output shape: (100, 4, 40, 256)
D Process Conv_70_19 …
D Acuity output shape(convolution): (100 4 40 512)
D Real output shape: (100, 4, 40, 512)
D Process Relu_72_18 …
D Acuity output shape(relu): (100 4 40 512)
D Real output shape: (100, 4, 40, 512)
D Process Conv_73_17 …
D Acuity output shape(convolution): (100 4 40 512)
D Real output shape: (100, 4, 40, 512)
D Process Relu_75_16 …
D Acuity output shape(relu): (100 4 40 512)
D Real output shape: (100, 4, 40, 512)
D Process MaxPool_76_15 …
D Acuity output shape(pooling): (100 2 40 512)
D Real output shape: (100, 2, 40, 512)
D Process Conv_77_14 …
D Acuity output shape(convolution): (100 1 39 512)
D Real output shape: (100, 1, 39, 512)
D Process Relu_79_13 …
D Acuity output shape(relu): (100 1 39 512)
D Real output shape: (100, 1, 39, 512)
D Process Reshape_97_12_acuity_mark_perm_34 …
D Acuity output shape(permute): (100 512 1 39)
D Real output shape: (100, 512, 1, 39)
D Process Reshape_97_12 …
D Acuity output shape(reshape): (1 512 39)
Traceback (most recent call last):
File "tensorflow_core/python/framework/ops.py", line 1610, in _create_c_op
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot reshape a tensor with 1996800 elements to shape [1,512,39] (19968 elements) for 'Reshape_97_12/Reshape_97_12' (op: 'Reshape') with input shapes: [100,512,1,39], [3] and with input tensors computed as partial shapes: input[1] = [1,512,39].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "tensorzonex.py", line 446, in
File "tensorzonex.py", line 383, in main
File "acuitylib/app/tensorzone/quantization.py", line 156, in run
File "acuitylib/app/tensorzone/quantization.py", line 103, in _run_quantization
File "acuitylib/app/tensorzone/workspace.py", line 172, in _setup_graph
File "acuitylib/app/tensorzone/graph.py", line 59, in generate
File "acuitylib/acuitynetbuilder.py", line 282, in build
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 312, in build_layer
File "acuitylib/acuitynetbuilder.py", line 344, in build_layer
File "acuitylib/layer/acuitylayer.py", line 280, in compute_tensor
File "acuitylib/layer/reshapelayer.py", line 108, in compute_out_tensor
File "tensorflow_core/python/ops/array_ops.py", line 131, in reshape
File "tensorflow_core/python/ops/gen_array_ops.py", line 8117, in reshape
File "tensorflow_core/python/framework/op_def_library.py", line 793, in _apply_op_helper
File "tensorflow_core/python/util/deprecation.py", line 507, in new_func
File "tensorflow_core/python/framework/ops.py", line 3360, in create_op
File "tensorflow_core/python/framework/ops.py", line 3429, in _create_op_internal
File "tensorflow_core/python/framework/ops.py", line 1773, in __init__
File "tensorflow_core/python/framework/ops.py", line 1613, in _create_c_op
ValueError: Cannot reshape a tensor with 1996800 elements to shape [1,512,39] (19968 elements) for 'Reshape_97_12/Reshape_97_12' (op: 'Reshape') with input shapes: [100,512,1,39], [3] and with input tensors computed as partial shapes: input[1] = [1,512,39].
[4811] Failed to execute script tensorzonex

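When I reduce the number of images listed in --source-file ../imagedata/dataset.txt to one, there is no problem. With 100 images I get the error above: the Reshape fails. It looks as if quantization feeds all the calibration images in at once, with batchsize=100.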
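The numbers in the error message are consistent with this guess: the input to Reshape_97_12 is [100,512,1,39], i.e. 1,996,800 elements, while the target shape [1,512,39] holds only 19,968, so a target with a fixed leading 1 can only match when the batch is 1. A quick shell check of that arithmetic (the shapes are copied from the log; nothing here calls the Acuity toolkit):

```shell
#!/bin/sh
# Target shape of Reshape_97_12, copied from the error message: [1,512,39]
target=$((1 * 512 * 39))

for batch in 1 100; do
  # Input to Reshape_97_12 is [batch,512,1,39] (after the inserted permute)
  input=$((batch * 512 * 1 * 39))
  if [ "$input" -eq "$target" ]; then
    echo "batch=$batch: $input -> $target elements, reshape OK"
  else
    echo "batch=$batch: $input -> $target elements, reshape fails"
  fi
done
```

It prints `batch=1: 19968 -> 19968 elements, reshape OK` and `batch=100: 1996800 -> 19968 elements, reshape fails`, matching the one-image and 100-image behaviour described above. If the leading 1 is a batch dimension hardcoded in the exported ONNX graph, exporting with -1 (or a dynamic batch axis) in that position might be one thing to try; that is an untested assumption, not something verified in this thread.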

[Model download] Link: https://pan.baidu.com/s/16wBkQ1dq2nGWTy8HN7ocbQ Password: jo77

@Frank Also, is there a way to make the VIM3's USB support only USB 1.1, without USB 2.0/3.0?

@librazxc Please paste the contents of your dataset.txt so I can check whether a formatting problem is causing this.

@librazxc Then you need to modify the source, build it yourself, and disable the USB 2.0 and 3.0 nodes in the dts.

@Frank By "modifying the source", do you mean changing the USB 1.1 descriptors in the source code?

@Frank Hi, dataset.txt is in the imagedata folder at the download link I gave. I don't think that's the problem.

@librazxc I mean disabling the USB 2.0 and 3.0 nodes in the VIM3 dts. What is your goal in disabling 2.0 and 3.0?

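@librazxc I haven't downloaded the content from your link. If you need me to verify it, it will take a while. You can just paste the contents of the file here so I can see how you've laid it out.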

…/imagedata/word_1.png
…/imagedata/word_2.png
…/imagedata/word_3.png
…/imagedata/word_4.png
…/imagedata/word_5.png
…/imagedata/word_6.png
…/imagedata/word_7.png
…/imagedata/word_8.png
…/imagedata/word_9.png
…/imagedata/word_10.png
…/imagedata/word_11.png
…/imagedata/word_12.png
…/imagedata/word_13.png
…/imagedata/word_14.png
…/imagedata/word_15.png
…/imagedata/word_16.png
…/imagedata/word_17.png
…/imagedata/word_18.png
…/imagedata/word_19.png
…/imagedata/word_20.png
…/imagedata/word_21.png
…/imagedata/word_22.png
…/imagedata/word_23.png
…/imagedata/word_24.png
…/imagedata/word_25.png
…/imagedata/word_26.png
…/imagedata/word_27.png
…/imagedata/word_28.png
…/imagedata/word_29.png
…/imagedata/word_30.png
…/imagedata/word_31.png
…/imagedata/word_32.png
…/imagedata/word_33.png
…/imagedata/word_34.png
…/imagedata/word_35.png
…/imagedata/word_36.png
…/imagedata/word_37.png
…/imagedata/word_38.png
…/imagedata/word_39.png
…/imagedata/word_40.png
…/imagedata/word_41.png
…/imagedata/word_42.png
…/imagedata/word_43.png
…/imagedata/word_44.png
…/imagedata/word_45.png
…/imagedata/word_46.png
…/imagedata/word_47.png
…/imagedata/word_48.png
…/imagedata/word_49.png
…/imagedata/word_50.png
…/imagedata/word_51.png
…/imagedata/word_52.png
…/imagedata/word_53.png
…/imagedata/word_54.png
…/imagedata/word_55.png
…/imagedata/word_56.png
…/imagedata/word_57.png
…/imagedata/word_58.png
…/imagedata/word_59.png
…/imagedata/word_60.png
…/imagedata/word_61.png
…/imagedata/word_62.png
…/imagedata/word_63.png
…/imagedata/word_64.png
…/imagedata/word_65.png
…/imagedata/word_66.png
…/imagedata/word_67.png
…/imagedata/word_68.png
…/imagedata/word_69.png
…/imagedata/word_70.png
…/imagedata/word_71.png
…/imagedata/word_72.png
…/imagedata/word_73.png
…/imagedata/word_74.png
…/imagedata/word_75.png
…/imagedata/word_76.png
…/imagedata/word_80.png
…/imagedata/word_82.png
…/imagedata/word_90.png
…/imagedata/word_91.png
…/imagedata/word_92.png
…/imagedata/word_95.png
…/imagedata/word_96.png
…/imagedata/word_97.png
…/imagedata/word_98.png
…/imagedata/word_99.png
…/imagedata/word_100.png
…/imagedata/word_101.png
…/imagedata/word_102.png
…/imagedata/word_103.png
…/imagedata/word_104.png
…/imagedata/word_105.png
…/imagedata/word_106.png
…/imagedata/word_107.png
…/imagedata/word_108.png
…/imagedata/word_109.png
…/imagedata/word_110.png
…/imagedata/word_111.png
…/imagedata/word_112.png
…/imagedata/word_113.png
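The list above has 100 entries (word_1 through word_76, then a non-contiguous run up to word_113), which matches "Enqueue samples 100" in the log. Before looking at the model itself, a list like this can be sanity-checked with a small shell function that counts the samples and reports paths that do not exist. `check_dataset` is a hypothetical helper, not part of the Acuity toolkit, and it assumes relative paths resolve against the directory you run it from:

```shell
#!/bin/sh
# check_dataset: hypothetical helper, not part of the Acuity toolkit.
# Counts the samples in a quantization list (one image path per line,
# optionally followed by ", label") and reports paths that do not exist.
# Relative paths are checked against the current working directory.
check_dataset() {
  list=$1
  total=0
  missing=0
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}          # drop a Windows carriage return, if any
    path=${line%%,*}            # keep only the path field
    [ -z "$path" ] && continue  # skip blank lines
    total=$((total + 1))
    if [ ! -f "$path" ]; then
      echo "missing: $path"
      missing=$((missing + 1))
    fi
  done < "$list"
  echo "samples: $total, missing: $missing"
  [ "$missing" -eq 0 ]          # exit status 0 only if every file exists
}
```

For example, `check_dataset ../imagedata/dataset.txt`, run from the same directory as 1_quantize_model.sh. It only checks the file list; it says nothing about whether the tool handles a batch of that size.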

@Frank

@librazxc

aml_npu_sdk/acuity-toolkit/conversion_scripts/data$ cat validation_tf.txt
./space_shuttle_224.jpg, 813

Follow the format of the validation file described in the documentation, and include each image's expected result as well.
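Following that suggestion, a plain path list like the dataset.txt posted above could be rewritten into the same `path, label` layout with a one-line awk filter. This is a sketch: `to_validation_format` is a hypothetical helper, not part of the toolkit, and the label is a placeholder, because the thread does not establish what label, if any, a CRNN model should carry here.

```shell
#!/bin/sh
# to_validation_format: hypothetical helper, not part of the Acuity toolkit.
# Rewrites a plain list (one image path per line) into "path, label" lines,
# the layout shown in validation_tf.txt. The label is a placeholder
# (default 0); the correct label for a CRNN model is an open question.
to_validation_format() {
  src=$1
  dst=$2
  label=${3:-0}
  # Strip Windows carriage returns, skip blank lines, append ", label".
  tr -d '\r' < "$src" | awk -v lbl="$label" 'NF { print $0 ", " lbl }' > "$dst"
}
```

For example, `to_validation_format ../imagedata/dataset.txt ../imagedata/dataset_labeled.txt 0` writes a new file and leaves the original list untouched. The input must be a plain path list; lines that already carry a label would get a second one.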

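@Frank I need to test a project that must run over USB 1.1 only. The setup is:

Host: VIM3, USB 1.1
Device: DSP, USB 2.0/1.1

Because the DSP's USB 2.0 and 1.1 share one PHY and can't be separated, I'd like the host to support only USB 1.1, so that I can test whether 1.1 works properly.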
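Whatever ends up limiting the host, one way to confirm on the VIM3 that the DSP actually enumerated at USB 1.1 full speed (12 Mbit/s) rather than high speed (480 Mbit/s) is the `speed` attribute that Linux exposes for each device under /sys/bus/usb/devices/. A minimal sketch, assuming a Linux image with sysfs mounted; it only reports the negotiated speed and does not restrict the controller. `usb_speed_name` is a hypothetical helper:

```shell
#!/bin/sh
# usb_speed_name: hypothetical helper mapping the sysfs "speed" value
# (in Mbit/s) to a readable USB speed name.
usb_speed_name() {
  case "$1" in
    1.5)  echo "low speed (USB 1.x)" ;;
    12)   echo "full speed (USB 1.x)" ;;
    480)  echo "high speed (USB 2.0)" ;;
    5000) echo "SuperSpeed (USB 3.x)" ;;
    *)    echo "other ($1 Mbit/s)" ;;
  esac
}

# List every enumerated USB device (including root hubs) with its speed.
for dev in /sys/bus/usb/devices/*; do
  [ -r "$dev/speed" ] || continue
  speed=$(cat "$dev/speed")
  echo "$(basename "$dev"): $(usb_speed_name "$speed")"
done
```

With the DSP plugged in, its entry should read "full speed (USB 1.x)" if the link really came up at 1.1.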

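@Frank Mine isn't a classification or detection model, it's a CRNN model. Do you mean I should include the character labels as well? When I quantized a DBNet model without labels, the conversion also succeeded.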

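@librazxc Are the results correct when you use only one image? For text models, I'm not sure how well this tool supports them; I haven't tried one myself.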

@Frank Hi, I ran some tests, and the problem seems to be caused by the ONNX Reshape operation. With a TF model (tflite or pb) I don't run into the Reshape problem.

@librazxc Then you could consider using a different model. A TF model is best: the tool itself is built on TF, so its TF support is the most complete.

OK, then I guess I'll have to rewrite it in TF...

@Frank ... Any suggestions on the USB question?

@librazxc I'll try it tomorrow, but I suspect it can't simply be switched off like that. Disabling 3.0 should be possible, though.

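@librazxc I've tested it. You can't simply leave only USB 1.1 enabled; that causes a lot of problems. Removing 3.0 works, but there's no good way to separate 2.0 from 1.1; that would probably require modifying the driver.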

@Frank
OK, thank you very much for your trouble.