I’m using three different supported quantisation techniques (uint8, int8, int16) on the same model (ResNet-50), and I’m noticing a huge difference in their execution times:
uint8 is the fastest (100 inferences/second) and has the highest accuracy
int16 is the slowest (1 inference/second) and has the lowest accuracy
Do we need to change the input data (test image) format or quantise it to improve the inference results/speed?
@johndoe Each model expects a specific input data type, and only that type will produce correct results. The uint8 quantisation method is the one recommended by Google. The different input types have different meanings, and the right one is determined by the model itself: if your data lies in [0, 255], uint8 is recommended; if it lies in [-128, 127], int8 is recommended; beyond those ranges, consider int16.
@johndoe For example, if your input is a picture, you can convert its data type through the OpenCV interface and choose uint8, because pixel values range from 0 to 255. If instead your input is something like a geometric sequence whose values exceed 255, choose int16.
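For the image case specifically: `cv2.imread` already returns a uint8 array, so a photo needs no range conversion for a uint8 model, and re-centring it for an int8 model is a single offset. A minimal NumPy sketch (the `cv2.imread` call is commented out because the image path is hypothetical; a random array stands in for the picture):

```python
import numpy as np

# img = cv2.imread("input.jpg")  # OpenCV loads images as uint8, values 0..255
img = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)  # stand-in

# For a uint8-quantised model, the image can be fed as-is.
uint8_input = img

# For an int8-quantised model with the same scale, shift the zero point:
# 0..255 (uint8) maps to -128..127 (int8) by subtracting 128.
int8_input = (img.astype(np.int16) - 128).astype(np.int8)
```

The intermediate cast to int16 avoids uint8 underflow during the subtraction.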
Okay. So you’re recommending that I change my input image’s type to uint8, int8 or int16 before passing it to the inference function, right?
[EDIT] Or should I figure out the best quantisation technique after knowing what my image’s data type is?