When trying the demo with my own images (which are 480x848), nothing was predicted.
As a sanity check, I resized the example image (1080p.bmp) to 480x848 and ran it through detect_demo_x11. Again, nothing was predicted (see below).
(npu) khadas@biped1 detect_demo_picture git:(master) ✗ ./detect_demo_x11 -m 4 -p 848p.bmp
W Detect_api:[det_set_log_level:19]Set log level=1
W Detect_api:[det_set_log_level:21]output_format not support Imperfect, default to DET_LOG_TERMINAL
W Detect_api:[det_set_log_level:26]Not exist VSI_NN_LOG_LEVEL, Setenv set_vsi_log_error_level
det_set_log_config Debug
det_set_model success!!
model.width:416
model.height:416
model.channel:3
Det_set_input START
Det_set_input END
Det_get_result START
Det_get_result END
resultData.detect_num=0
result type is 0
I am surprised because, looking through the source code (i.e. int ge2d_init(int width, int height) in aml_npu_app/detect_library/sample_demo_x11/main.cpp), it looks to me like the image is resized to the model's input shape (416x416) in any case.
Does anyone know why the input size of the image matters so much?
The source code is available, so you can study it yourself. It is actually very simple: you just use the OpenCV interface to scale the image. The demo uses 1920x1080 by default.
@arthurgassner I think your OpenCV resize is not being handled correctly; you should take a moment to understand how OpenCV scales images. I suspect your data is being seriously deformed here, which leads to biased recognition results.
Do you think it's my OpenCV install that is wrong, then?
I kept the image's aspect ratio (to avoid deforming the image, even slightly) and changed sample_demo_x11/main.cpp to
#define MAX_HEIGHT 480
#define MAX_WIDTH 853
Compiling the code and testing it on a 480x853 image, nothing is detected.
But if I manually rescale the 480x853 image to 1080x1920 (resulting in a slightly pixelated image), then the person is detected (after changing MAX_HEIGHT and MAX_WIDTH in main.cpp accordingly).