Input data format for quantised models

I’m using three different quantisation types that are supported (uint8, int8, int16) on the same model (resnet50), and I’m noticing a huge difference in their execution times:

uint8 is the fastest (100 inferences/second) and has the highest accuracy
int16 is the slowest (1 inference/second) and has the lowest accuracy

Do we need to change the input data (test image) format or quantise it to improve the inference results/speed?

@johndoe Each model has its own expected input data type, and only that type will give correct results. uint8 is the quantisation method recommended by Google. Different input types have different meanings, and the right one is determined by the model itself. If your data lies in [0, 255], uint8 is recommended; if it lies in [-128, 127], int8 is recommended; beyond those ranges, consider int16.
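As an illustration of the range rule above, here’s a minimal NumPy sketch (the helper name `suggest_input_type` is hypothetical, not part of any runtime API):

```python
import numpy as np

def suggest_input_type(data: np.ndarray) -> str:
    """Pick a quantised input type from the observed value range."""
    lo, hi = data.min(), data.max()
    if lo >= 0 and hi <= 255:
        return "uint8"   # data fits the [0, 255] range
    if lo >= -128 and hi <= 127:
        return "int8"    # data fits the [-128, 127] range
    return "int16"       # wider range: fall back to int16

print(suggest_input_type(np.array([0, 128, 255])))  # uint8
print(suggest_input_type(np.array([-5, 100])))      # int8
```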

I see. And how do I figure out my data range, i.e. whether it’s in [0, 255] or [-128, 127]?

@johndoe It depends on your model itself and on what your dataset contains.

The models I’m considering are all trained on the ImageNet dataset. So what range am I dealing with in this case?

Is there any reference manual for looking up the expected input type for each model?

@johndoe For example, if your input is a picture, you can convert its data type through the OpenCV interface and choose uint8, because pixel values range from 0 to 255. If instead your input is something like a geometric sequence whose values exceed 255, you can choose int16.
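For instance, OpenCV already loads images as uint8, so you can check the type and range directly (a minimal sketch; `test.jpg` is a placeholder path):

```python
import cv2
import numpy as np

img = cv2.imread("test.jpg")  # OpenCV returns a uint8 BGR array by default
print(img.dtype)              # uint8
print(img.min(), img.max())   # pixel values lie within [0, 255]

# If a model expects a wider type instead, convert explicitly:
img_i16 = img.astype(np.int16)
```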

Okay. So you’re recommending that I change my input image’s type to uint8, int8, or int16 before passing it to the inference function, right?

[EDIT] Or should I figure out the best quantisation technique after knowing what my image’s data type is?

@johndoe You should convert the data to the model’s expected type (uint8, for example) before passing it to the model; otherwise you will lose information.
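A minimal sketch of that conversion, assuming NumPy arrays: clipping before casting makes out-of-range values saturate rather than wrap around, which is the data loss to avoid.

```python
import numpy as np

def to_uint8(x: np.ndarray) -> np.ndarray:
    # Saturate to [0, 255] first; a bare astype(np.uint8) would wrap
    # (e.g. 256 -> 0) and silently corrupt the input.
    return np.clip(x, 0, 255).astype(np.uint8)

x = np.array([-3.0, 100.5, 300.0])
print(to_uint8(x))  # [  0 100 255]
```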

Got it. Thanks a lot for the help, @Frank