Help getting started with VIM3 NPU Usage

I had typed up a long reply here showing that this still did not work, but upon further investigation I found the model WAS now working correctly; another issue was occurring in the post-processing. I had been using the version of OpenCV from the instructions here because I had been playing with the OpenCV DNN support. I was never able to get that to work, but that’s a different topic.

I created a fresh virtual environment that does not use ANY of the system site-packages, and installed whatever version of OpenCV pip gave me along with KSNN 1.4, and it works with the test image! I was able to get a pose detection out of the model with KSNN!

There is still one issue, though. It doesn’t get any detections when using inputs from the attached camera. If I run the sample images I put in Dropbox yesterday through the model I get nothing. Can you please advise? Do I need to re-quantize the model with a dataset of images from the camera? What difference would that make?

Hello @mjasner ,

Could you provide your camera demo code? If the model can detect on the picture, it should be able to detect on camera input, too.

The code is the same, I just changed the name of the input image from pose1.jpg to the name of a capture from the camera.
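For reference, my script follows the stock KSNN picture examples; roughly like the sketch below. The library/model paths, the 256x256 input size, and the output tensor count are placeholders for my setup, and the exact nn_inference keywords can vary between KSNN releases:

# Rough sketch of the picture test, patterned on the stock KSNN examples.
# Paths, input size, and output_tensor count are placeholders -- they
# depend on the converted model.
import cv2 as cv
from ksnn.api import KSNN
from ksnn.types import *

pose = KSNN('VIM3')
pose.nn_init(library='./libnn_pose_densenet121_body.so',
             model='./pose_densenet121_body.nb', level=0)

img = cv.imread('capture1.jpg', cv.IMREAD_COLOR)  # was pose1.jpg
img = cv.resize(img, (256, 256))                  # network input size (assumption)

outputs = pose.nn_inference([img], platform='ONNX',
                            reorder='2 1 0',
                            output_tensor=2,
                            output_format=output_format.OUT_FORMAT_FLOAT32)
# ... post-processing of the output tensors happens here ...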

The images can be found at these links:
Image 1
Image 2
Image 3

I’ve tried resizing the images, cropping the images, changing to black and white, etc. I get no detections when running images from the camera through the network.

Do you think re-converting the model using a dataset of pictures from the camera instead of the COCO dataset will have any effect?

Hello @mjasner ,

As I said, it is best to quantize with actual scene images or test-dataset images. The reason we used COCO is that we do not have your scene images, so we made a simple attempt with COCO data.

The problem is that the model loses a lot of accuracy in quantization. We ran your three test images through the .nb model and it could not detect anything either. We then quantized the model in int16, and the int16 model can detect.

We suggest you re-convert with your scene images first; that gives a more precise quantized model for the actual scenario. If it still detects nothing, try the int16 model.
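For example, you can build the quantization dataset list from camera frames with a rough sketch like this (device index, frame count, and paths are placeholders):

# Sketch: save camera frames and list them in dataset.txt for quantization.
# Device index 0, 100 frames, and the output paths are placeholders.
import cv2
import os

os.makedirs('calib_images', exist_ok=True)
cap = cv2.VideoCapture(0)

with open('dataset.txt', 'w') as f:
    saved = 0
    while saved < 100:
        ret, frame = cap.read()
        if not ret:
            break
        path = 'calib_images/frame_%04d.jpg' % saved
        cv2.imwrite(path, frame)
        f.write(path + '\n')
        saved += 1

cap.release()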

Here is the command we used for int16 (int16 dynamic fixed point keeps more quantization precision than the default int8, at some cost in inference speed):

./convert --model-name pose_densenet121_body \
          --platform onnx \
          --model pose_densenet121_body.onnx \
          --mean-values '0 0 0 0.0039215686' \
          --quantized-dtype dynamic_fixed_point \
          --qtype int16 \
          --source-files dataset.txt \
          --batch-size 1 \
          --iterations 500 \
          --kboard VIM3 \
          --print-level 0

Using a dataset of captures from the camera helped improve the accuracy of the model. I’m going to spend a few days doing further testing. I may also try int16 as a point of comparison. I’ll let you know if I run into any issues with it.

Thanks again, you have been very patient and super helpful! It’s greatly appreciated!

Just FYI, I had to use int16 to get it to consistently give results from the camera feed, but the model is now giving steady results. Thanks for the help and patience!